Introduction to Lucene and Solr - 1

  1. Day 1 - Introduction to Lucene/Solr. Core Tech @ Trend Micro. 吳奕慶 (YI-CHING WU)
  2. Agenda
     • What is a search engine?
     • Introduction to Lucene and Solr
     • Advantages of Solr
     • Solr architecture
     • Query syntax
     • Setting up Solr configuration files
     • Working with Solr: feeding data, querying data
  3. Reference
     • Solr in Action
  4. Why do I need a search engine?
  5. Why do I need a search engine?
  6. Let’s start with indexing
     • Raw information is like a pile of garbage
     • No structure
     • Comes in all kinds of shapes, sizes, and formats
  7. Let’s start with indexing
     • This is what an index does
     • Makes data available in a structured format that is easy to reach through search
  8. Which ones can be indexed and searched? Various file formats: HTML, text files, Word, PDF, PPT, …
  9. (Figure only)
  10. (Figure only)
  11. And now the search component
  12. (Figure only)
  13. What is a search engine? (Diagram labels: Data, is indexed, Indexing Component, Index Files, Search Component, Users, sends search query, receives search results)
  14. Introducing Lucene
      • Created by Doug Cutting
      • Not an application but a full-text search library written in Java
      • Open source project (since March 2000)
      • Mature
      • Easy-to-learn API
      • Stores its index as files on disk
      • No web crawler
      • http://lucene.sourceforge.net/talks/pisa/
  15. Typical search application
  16. Search?
      • If you want to find a word in a book, how do you do it?
      • Naïve approach: linear search
      • O(n): slow
      • Inverted index
  17. Inverted index
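
The contrast between a linear scan and an inverted index can be illustrated with a minimal Python sketch. This is not Lucene code. The function names and the sample documents are hypothetical, and whitespace splitting stands in for real text analysis.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def linear_search(docs, term):
    """Naive approach: scan every document for the term (O(n) in the amount of text)."""
    return sorted(
        doc_id for doc_id, text in docs.items() if term in text.lower().split()
    )


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Lucene is a search library",
    2: "Solr is built around Lucene",
    3: "An index makes search fast",
}

index = build_inverted_index(docs)
print(index["lucene"])  # [1, 2]
print(index["search"])  # [1, 3]
```

Once the index is built, a lookup is a single dictionary access instead of a scan over every document, which is the reason the slides move from linear search to the inverted index.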
  18. Indexing with Lucene
  19. Fields of Lucene
      • Indexed: put the content in the inverted index
      • Analyzed: split the content into normalized terms to be added to the inverted index
      • Stored: keep the original content on disk
      • Multivalued: repeat the same field multiple times in the same document with different values
      • OmitNorms: skip storing norms, which disables index-time field boosts and length normalization
      • TermVector: WITH_POSITIONS_OFFSETS
  20. Analyzer: PerFieldAnalyzerWrapper
  21. Analyzer
  22. Analyzer
  23. Custom Analyzers
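
The analyzer slides are figures only, so the following is a rough Python sketch of the idea rather than Lucene's API. It shows an analyzer that splits text into normalized terms, plus a per-field dispatch in the spirit of PerFieldAnalyzerWrapper. The stop-word list, the function names, and the sample document are all assumptions made for illustration.

```python
import re

# Illustrative stop-word list; this is NOT the list Lucene ships with.
STOP_WORDS = {"a", "an", "the", "is", "of", "and"}


def analyze(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in stop_words]


def keyword_analyze(text):
    """Treat the whole value as a single term, e.g. for identifiers."""
    return [text]


def analyze_fields(doc, per_field, default=analyze):
    """Apply a field-specific analyzer where one is configured, else the default."""
    return {
        field: per_field.get(field, default)(value) for field, value in doc.items()
    }


# Hypothetical document: the id must stay intact, the title is analyzed normally.
doc = {"id": "KB-1001", "title": "The Architecture of Solr"}
terms = analyze_fields(doc, {"id": keyword_analyze})
print(terms)  # {'id': ['KB-1001'], 'title': ['architecture', 'solr']}
```

The same analysis has to be applied at query time, otherwise a query for "Architecture" would not match the indexed term "architecture".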
  24. Query with Lucene
      • Ask Lucene, “Which documents contain these words?”
      • Lucene applies an Analyzer to each queried word.
      • Queries can be built programmatically or written in Lucene's powerful query syntax.
  25. Query code. Query syntax:
      • http://www.lucenetutorial.com/lucene-query-syntax.html
      • http://lucene.apache.org/core/3_5_0/queryparsersyntax.html
  26. Luke for Lucene Index
  27. Relevancy scoring
      • Documents and queries are represented as N-dimensional vectors
      • The score represents how close the vectors are
      • TF-IDF (term frequency-inverse document frequency)
      • Documents containing many of the search terms score higher
      • Shorter documents score higher
  28. Default Similarity Scoring Algorithm
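
The scoring slide itself is a figure, so the sketch below is only a simplified illustration of the TF-IDF ideas listed above, not Lucene's full scoring formula. It assumes a square-root term frequency, a logarithmic inverse document frequency, and a 1/sqrt(length) length norm, and it leaves out query normalization, coordination factors, and boosts. The function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def score(query_terms, doc_terms, all_docs):
    """Simplified TF-IDF score of one document (a list of terms) for a query."""
    if not doc_terms:
        return 0.0
    counts = Counter(doc_terms)
    # Length norm: shorter documents get a larger weight.
    length_norm = 1.0 / math.sqrt(len(doc_terms))
    total = 0.0
    for term in query_terms:
        freq = counts[term]
        if freq == 0:
            continue
        doc_freq = sum(1 for d in all_docs if term in d)
        tf = math.sqrt(freq)  # more occurrences score higher
        idf = 1.0 + math.log(len(all_docs) / (doc_freq + 1))  # rarer terms score higher
        total += tf * idf * idf * length_norm
    return total


# Hypothetical, already-analyzed documents.
docs = [
    ["solr", "search"],
    ["solr", "search", "server", "built", "around", "lucene"],
    ["lucene", "library"],
]
scores = [score(["solr", "search"], d, docs) for d in docs]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

The first two documents contain both query terms, but the shorter one ranks first, which matches the "shorter documents score higher" point on the slide.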
  29. Introducing Solr
      • Created by Yonik Seeley (since 2004)
      • Open source (released in 2006)
      • HTTP application built around Lucene
      • Makes it easy to develop search solutions
      • Most programming tasks in Lucene are configuration tasks in Solr
      • Advanced features developed on top of Lucene: data importer, faceting, filters, similarity, replication and distributed search support, dynamic fields, etc.
      • As of 2010, the Lucene and Solr development codebases are merged
  30. Solr Architecture
  31. Solr Archived Folders and Files
  32. Understanding Solr Home
  33. Solr Features
      • Dismax
      • Edismax
      • Text highlighting
      • Spell checking
      • More Like This
      • Cache
      • Replication
      • Database connector
      • Spatial (geolocation)
  34. Solr Administration Console
  35. solr.xml
  36. Diagram of the main components of Solr 4.x
  37. Solr Schema
      • Solr can administer one or more Lucene indexes
      • Each index has its own schema
      • Lists all fields allowed for an index
      • Defines the analyzers for each field
      • http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
  38. Three main steps to index a document
  39. Solr Schema - conf/schema.xml
  40. Solr Schema - conf/schema.xml
  41. Solr - solrconfig.xml
  42. Solr Request Handler
  43. How do request handlers process queries?
  44. Solr Indexing
      • HTTP POST
      • XML by default, but also JSON and CSV
      • Multithreaded
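
As a hedged illustration of feeding data over HTTP POST, the Python sketch below builds (but does not send) a JSON update request using only the standard library. The host, port, and core name `collection1` are assumptions about a local Solr 4.x install, the helper `build_update_request` is hypothetical, and the exact update path may differ between Solr versions.

```python
import json
import urllib.request


def build_update_request(base_url, docs, commit=True):
    """Build an HTTP POST request that sends documents to Solr as JSON."""
    url = base_url.rstrip("/") + "/update"
    if commit:
        url += "?commit=true"  # make the new documents searchable right away
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local core URL and a hypothetical document.
request = build_update_request(
    "http://localhost:8983/solr/collection1",
    [{"id": "1", "title": "Hello Solr"}],
)
print(request.get_method(), request.full_url)

# Sending requires a running Solr instance, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The fields in each document must exist in the core's schema (or match a dynamic field), otherwise Solr rejects the update.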
  45. Solr Query
      • HTTP GET or HTTP POST
      • Query parameters
      • Response in XML by default, but other formats are supported (JSON, PHP, Ruby, etc.)
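
A query is just an HTTP request with parameters, which the following Python sketch makes concrete by building a select URL. The host, port, and core name are assumptions about a local install, and `build_select_url` is a hypothetical helper. The parameters `q` (query), `fl` (field list), `rows` (result count), and `wt` (response format) are standard Solr query parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url, query, fields=None, rows=10, fmt="json"):
    """Build a Solr select URL from common query parameters."""
    params = {"q": query, "rows": rows, "wt": fmt}
    if fields:
        params["fl"] = ",".join(fields)  # restrict which stored fields come back
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Assumed local core URL; the query searches a hypothetical "title" field.
url = build_select_url(
    "http://localhost:8983/solr/collection1",
    "title:solr",
    fields=["id", "title"],
)
print(url)

# Fetching the URL requires a running Solr instance, for example:
# import urllib.request
# with urllib.request.urlopen(url) as response:
#     print(response.read().decode("utf-8"))
```

Omitting `wt` gives the XML response that the slide describes as the default.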
  46. Solr Query using Administration Console
  47. Solr Query Parameters
  48. Solr Response in XML
  49. Solr simple example
  50. Q&A
  51. Solr Demo
      • Using the Trend Micro Support knowledge base
      • Indexed using Solr DataImporter
  52. Thank You!

This is Solr/Lucene search engine induction material. If you would like to learn about the Solr search engine, this will be a quick reference for you.
