Introduction to Xapian
Xapian search engine · Pan Junyong (潘俊勇) · Everydo (易度), everydo.com
I use ZODB
ZODB's indexing problems
A standalone search module is needed
Candidate products
MySQL
Lucene / Solr
Xapian: simple, search-engine-like services
Xapian performance
Sphinx: all kinds of complex applications
A brief comparison
We chose Xapian
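[Diagram: a search service (Index, Search) backed by an index database; various data sources (relational databases, files, NoSQL databases, web applications); asynchronous and real-time.]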
Xapian features
Xapian terminology
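A query first looks in the terms to find the matching documents, which narrows the scope. It then looks up the corresponding value for each match and sorts on it; with a large data set this step can be slow. If needed, the document data is fetched afterwards. The trick, therefore, is to keep the result set of the first step as small as possible. Sphinx works on the same principle, except that it keeps all values in memory to improve performance.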
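The three steps described on this slide can be sketched with a toy in-memory model. This is not Xapian's API: `ToyIndex`, `Doc`, and the slot number `0` are hypothetical names invented for illustration, and the sketch assumes every document stores a value in the slot being sorted on. It only aims to show why the cost of the sort step grows with the number of documents matched in the first step.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    terms: set    # index terms used for matching
    values: dict  # slot number -> sortable value
    data: str     # opaque payload handed back to the caller


class ToyIndex:
    """Hypothetical in-memory model of terms, values, and data."""

    def __init__(self):
        self.docs = {}      # docid -> Doc
        self.postings = {}  # term -> set of docids containing it

    def add(self, docid, doc):
        self.docs[docid] = doc
        for term in doc.terms:
            self.postings.setdefault(term, set()).add(docid)

    def search(self, terms, sort_slot, limit=10):
        # Step 1: intersect the posting lists to narrow the candidates.
        posting_sets = [self.postings.get(t, set()) for t in terms]
        if not posting_sets:
            return []
        candidates = set.intersection(*posting_sets)
        # Step 2: look up one value per candidate and sort on it.
        # The work here is proportional to len(candidates).
        ranked = sorted(candidates,
                        key=lambda d: self.docs[d].values[sort_slot])
        # Step 3: fetch the data only for the documents actually returned.
        return [self.docs[d].data for d in ranked[:limit]]


index = ToyIndex()
index.add(1, Doc({"xapian", "search"}, {0: 3}, "doc-1"))
index.add(2, Doc({"xapian"}, {0: 1}, "doc-2"))
index.add(3, Doc({"xapian", "search"}, {0: 2}, "doc-3"))

narrow = index.search(["xapian", "search"], sort_slot=0)
broad = index.search(["xapian"], sort_slot=0, limit=2)
```

Here the narrower query sorts two candidates while the broader one sorts three before truncating to the limit, which mirrors the advice to shrink the first step's result set.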
What the database stores
Xapian backend storage formats
Xapian's Python interface
Building an index
Slow writes?
Concurrent modification?
Searching
Combined queries
Using QueryParser
Sorting
Multiple indexed fields?
Searching
Increase the cache to speed things up?
Querying across multiple databases
Distributed search?
Backup?
"Compacting" the database
Replication
Xapian strengths
Xapian problems
Conclusion
