Introduction to Lucene and Solr - 1
1. Day 1 - Introduction to Lucene/Solr
Core Tech @Trend Micro
吳奕慶 YI-CHING WU
2. Agenda
What is a search engine?
Introduction to Lucene and Solr
Advantages of Solr
Solr Architecture
Query Syntax
Setup Solr Configuration files
Working with Solr: feed data, query data
13. What is a search engine?
[Diagram: data is indexed by the Indexing Component into Index Files; users send a search query to the Search Component and receive search results.]
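The two components in the diagram can be sketched in a few lines. This is only a toy illustration, not how Lucene stores its index: the function names `build_index` and `search` and the sample documents are made up for the example, and the "index files" are just an in-memory dictionary.

```python
from collections import defaultdict

def build_index(docs):
    """Indexing component: map each lowercase word to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Search component: return the ids of docs that contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample data
docs = {
    1: "Lucene is a search library",
    2: "Solr is a search server built on Lucene",
    3: "A web crawler fetches pages",
}
index = build_index(docs)
print(sorted(search(index, "search lucene")))
```

The user never touches the raw data at query time: the search component only reads what the indexing component wrote.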
14. Introducing Lucene
Created by Doug Cutting
Not an application but a full-text search library (written in Java)
Open source project (since March 2000)
Mature
Easy to learn API
Stores its index as files on disk
No Web Crawler
http://lucene.sourceforge.net/talks/pisa/
19. Fields of Lucene
Indexed: put the content in the inverted index
Analyzed: split the content into normalized terms to be added to the inverted index
Stored: keep the original content on disk
Multivalued: repeat the same field multiple times in the same document with different values
OmitNorms: skip norms, which carry the index-time field boost and length normalization
TermVector: store per-document term vectors, e.g. WITH_POSITIONS_OFFSETS
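To make the difference between indexed, analyzed, stored, and multivalued concrete, here is a toy model in Python. It is a sketch under stated assumptions, not the Lucene API: the `Field` and `ToyIndex` classes, the `analyze` function, and the field names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Field:
    name: str
    value: str
    indexed: bool = True    # put the content in the inverted index
    analyzed: bool = True   # split into normalized terms before indexing
    stored: bool = False    # keep the original content for retrieval

def analyze(text):
    """Toy analyzer: split on whitespace, strip punctuation, lowercase."""
    return [t.strip(".,!?").lower() for t in text.split() if t.strip(".,!?")]

class ToyIndex:
    def __init__(self):
        self.postings = {}  # (field name, term) -> set of doc ids
        self.stored = {}    # doc id -> {field name: [original values]}

    def add(self, doc_id, fields):
        for f in fields:
            if f.indexed:
                # A non-analyzed field is indexed as one exact term.
                terms = analyze(f.value) if f.analyzed else [f.value]
                for term in terms:
                    self.postings.setdefault((f.name, term), set()).add(doc_id)
            if f.stored:
                # Repeating a field name makes it multivalued.
                self.stored.setdefault(doc_id, {}).setdefault(f.name, []).append(f.value)

idx = ToyIndex()
idx.add(1, [
    Field("id", "DOC-1", analyzed=False, stored=True),
    Field("title", "Introducing Lucene", stored=True),
    Field("body", "Lucene stores its index on disk."),  # searchable, not retrievable
    Field("tag", "search", analyzed=False, stored=True),
    Field("tag", "java", analyzed=False, stored=True),
])
print(idx.stored[1])
```

In this sketch the `body` field can be searched but its original text cannot be returned, while `id` is only findable by its exact value.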
24. Query with Lucene
Ask Lucene “What documents contain these words?”
Lucene applies an Analyzer to each queried word.
Queries can be built programmatically or written in the powerful query syntax.
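The points above can be illustrated with a small sketch: the same analyzer runs on documents at index time and on each queried word, and queries are composed from objects. The classes `Term`, `And`, and `Or`, the stopword list, and the sample documents are hypothetical stand-ins, not Lucene's query classes.

```python
import re

STOPWORDS = {"a", "an", "the", "is", "this", "what"}

def analyze(text):
    """Toy analyzer: lowercase, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def build_index(docs):
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

class Term:
    def __init__(self, word):
        # The analyzer is applied to the queried word as well.
        terms = analyze(word)
        self.term = terms[0] if terms else None

    def matches(self, index):
        return set(index.get(self.term, set()))

class And:
    def __init__(self, *clauses):
        self.clauses = clauses

    def matches(self, index):
        sets = [c.matches(index) for c in self.clauses]
        return set.intersection(*sets) if sets else set()

class Or:
    def __init__(self, *clauses):
        self.clauses = clauses

    def matches(self, index):
        return set().union(*(c.matches(index) for c in self.clauses))

docs = {
    1: "Lucene is a full-text search library.",
    2: "Solr is an HTTP application built around Lucene.",
    3: "Nutch is a web crawler.",
}
index = build_index(docs)
query = And(Term("LUCENE"), Or(Term("Library"), Term("HTTP")))
print(sorted(query.matches(index)))
```

Because both sides go through `analyze`, the query word "LUCENE" matches the indexed term "lucene".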
27. Relevancy scoring
Documents and queries are represented as N-dimensional vectors
The score represents how close the vectors are
TF-IDF (term frequency-inverse document frequency)
Documents containing many of the search terms are scored higher
Shorter documents are scored higher
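A rough sketch of the vector-space idea, using TF-IDF weights and cosine similarity. This is a textbook simplification and not Lucene's actual scoring formula; the particular IDF variant, the function names, and the sample documents are assumptions chosen for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_idf(docs):
    """Inverse document frequency: rarer terms get larger weights."""
    n = len(docs)
    df = Counter()
    for text in docs:
        df.update(set(tokenize(text)))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def tfidf_vector(text, idf):
    """Sparse vector: term -> term frequency * inverse document frequency."""
    tf = Counter(tokenize(text))
    return {t: freq * idf[t] for t, freq in tf.items() if t in idf}

def cosine(a, b):
    """Closeness of two sparse vectors (1.0 means same direction)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical sample data
docs = [
    "lucene search library",
    "lucene search library written in java with an easy api and index files on disk",
    "web crawler fetches pages",
]
idf = build_idf(docs)
query = tfidf_vector("lucene search", idf)
scores = [cosine(query, tfidf_vector(d, idf)) for d in docs]
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

The first two documents both contain every query term, but the shorter one ranks higher because the extra terms in the longer document pull its vector away from the query.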
29. Introducing Solr
Created by Yonik Seeley (since 2004)
Open source (released in 2006)
HTTP application built around Lucene
Makes it easy to develop search solutions
Most programming tasks in Lucene are configuration tasks in Solr
Advanced features developed on top of Lucene
Data importer, faceting, filters, similarity, replication and distributed search support, dynamic fields, etc.
As of 2010, the Lucene and Solr development codebases are merged
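Because Solr is an HTTP application, a search is just a URL. The sketch below only builds such a URL and does not send it. The host, port, core name (`collection1`), and field names are assumptions for illustration; `q`, `fq`, `facet`, `facet.field`, `rows`, and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, core, params):
    """Build a URL for the select handler of one Solr core."""
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"

url = solr_select_url(
    "http://localhost:8983/solr",  # assumed host and port
    "collection1",                 # assumed core name
    [
        ("q", "title:lucene"),      # main query (hypothetical field)
        ("fq", "type:slide"),       # filter query (hypothetical field)
        ("facet", "true"),          # turn on faceting
        ("facet.field", "author"),  # facet on a hypothetical field
        ("rows", "10"),
        ("wt", "json"),             # response format
    ],
)
print(url)
```

What would be query-building code in Lucene becomes request parameters here, which is the sense in which programming tasks turn into configuration tasks.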
37. Solr Schema
Solr lets you administer one or more Lucene indexes
Each index has its own schema
Lists all fields allowed for an index
Defines the analyzers for each field
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
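The role of a schema can be sketched as a mapping from each allowed field to its analyzer. This is a toy model only: a real Solr schema is an XML configuration file with field types, dynamic fields, and more, and the names `SCHEMA`, `keyword`, `lowercase_words`, and `analyze_document` are hypothetical.

```python
def keyword(text):
    """Analyzer that keeps the whole value as a single term."""
    return [text]

def lowercase_words(text):
    """Analyzer that lowercases and splits on whitespace."""
    return text.lower().split()

# Toy schema: every allowed field and the analyzer defined for it.
SCHEMA = {
    "id": keyword,
    "title": lowercase_words,
    "body": lowercase_words,
}

def analyze_document(doc, schema):
    """Reject fields the schema does not list, then analyze each field."""
    unknown = sorted(set(doc) - set(schema))
    if unknown:
        raise ValueError(f"unknown field(s): {', '.join(unknown)}")
    return {name: schema[name](value) for name, value in doc.items()}

print(analyze_document({"id": "DOC-1", "title": "Solr Schema"}, SCHEMA))
```

A document with a field that is not in the schema is rejected before anything is indexed.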
Relevance ranking
Integrates different data sources (web pages, email, files, databases, etc.)
#A Activates version-dependent features in Lucene
#B Lib directives indicate where Solr can find JAR files for extensions
#C Index management settings
#D Enables JMX instrumentation of Solr MBeans
#E Update handler for indexing documents
#F Cache-management settings
#G Register event handlers for searcher events, for example, queries to execute to warm new searchers
#H Unified request dispatcher
#I Request handler to process queries using a chain of search components
#J Example search component for doing spell correction on queries
#K Extends indexing behavior using update-request processors, such as language detection
#L Formats the response as JSON
#M Declares a custom function for boosting, ranking, or sorting documents
#N Transforms result documents