Elasticsearch - Guide to Search

This presentation covers the concepts of full-text search and shows the possibilities of Elasticsearch as a technology for building an intelligent search engine.

Presented at the 2nd Wrocław PHPErs Conference, which took place on 10.08.2015.

Published in: Technology

  1. Elasticsearch: Guide to Search #1, by Antoni Orfin (antoniorfin@gmail.com)
  2. USE CASES 1: Intelligent search engines. Learning from users' behaviour: "Search for cats that I would love from a 3M database". Forgiving spelling mistakes: "Search for Mihael Jakson photos and show Michael Jackson photos".
  3. USE CASES 2: Autocomplete. "Show the most relevant suggestions that start with search…"
  4. USE CASES 3: Geo-search (geospatial). "Search for restaurants that are nearest to …"
  5. USE CASES 4: Search by colors (ColorSearch). "Search for flowers that are …"
  6. OLD SCHOOL: Searching in MySQL. Queries: SELECT * FROM photos WHERE title LIKE '%cat%'; SELECT * FROM photos WHERE title LIKE '%cats%'. Table photos (Id [PK], title): 1 = Cute cat and dog; 2 = Cat plays with a dog; 3 = Cats playing piano; …; 3 000 000 = Hidden cat.
  7. SEARCH THEORY: Building Inverted Index. Documents: #1 Cute cat and dog; #2 Cat plays with a dog; #3 Cats playing piano. Index (Term [PK] → Id): cute → 1; cat → 1, 2, 3; dog → 1, 2; play → 2, 3; …
  8. SEARCH THEORY: Text Analysis. Input: "Puppy and kitten with guinea pig". Step 1, tokenization: [Puppy] [and] [kitten] [with] [guinea] [pig]. Step 2, filtering tokens: [dog] [cat] [guinea] [pig]. Two separate tokens? :(
  9. SEARCH THEORY: Text Analysis. ASCII folding: róża → roza. Lowercase: Cat → cat. Synonyms: kitten → cat, puppy → dog. Stopwords (common words to remove): and, what, with, or. Stemming (reducing inflected words to their base form): cats → cat; fishing, fisher, fished → fish.
  10. SEARCH THEORY: Text Analysis (Polish example). "Lekarz Chorób Wewnętrznych" (doctor of internal diseases) → stemming → "Lekarz Choroba Wewnętrzny" → asciifolding, lowercase → "lekarz choroba wewnetrzny" → synonyms → "internista" (internist).
  11. TECHNOLOGIES: Search Engines Overview
  12. SOLUTION: Elasticsearch is a flexible and powerful open-source, distributed, real-time search and analytics engine.
  13. ELASTICSEARCH: Architecture (4 shards, 1 replica). Node 1: Shard 1, Shard 2, Replica 3, Replica 4. Node 2: Shard 3, Shard 4, Replica 1, Replica 2.
  14. ELASTICSEARCH: Nomenclature (Elasticsearch term → MySQL equivalent). Node → Instance; Index → Database; Type → Table; Document → Row; Attribute → Column.
  15. ELASTICSEARCH: Mapping. PUT [localhost:9200]/pixers/photos/_mapping { "photos" : { "properties" : { "title" : {"type" : "string", "analyzer" : "pl"}, "categories" : {"type" : "nested", ...} } } } Types: string, float, double, byte, short, integer, long, date, nested, geo_point, geo_shape, … etc.
  16. ELASTICSEARCH: REST API. URL pattern: localhost:9200/{index}/{type}/{document id}. Requests: PUT [localhost:9200]/pixers/photos/1 { "title" : "Cute cat and dog sitting on books", "keywords": ["cat", "dog"] }; GET [localhost:9200]/pixers/photos/1; DELETE [localhost:9200]/pixers/photos/1
  17. ELASTICSEARCH: REST API. Searching: GET /pixers/photos/_search { "query" : { "match" : { "title" : "cat" } } } Real life query >>
  18. ELASTICSEARCH: Searching. Query vs Filter. Query String: "likes:[10 TO *] AND title:(+cat -dog)". Match: "funny cat". Fuzzy: "funy cad". More Like This.
  19. ELASTICSEARCH: Searching. Query vs Filter. Terms: [some, tags]. Range: likes > 10. Geo Distance: Lat=50; Lon=20; Distance=200m.
  20. ELASTICSEARCH: Searching. Query vs Filter. Nested. Bool: MUST / MUST NOT / SHOULD. Function Score.
  21. ELASTICSEARCH: Analytics. Aggregations: get likes stats and a histogram of the created_at date, grouped by categories. terms: category; stats: likes; histogram: created_at.
  22. Thank you! Questions & Answers. Contact me at: antoniorfin@gmail.com, linkedin.com/in/antoniorfin, twitter.com/antoniorfin, www.pixersize.com
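
The text-analysis and inverted-index slides (7 to 9) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Elasticsearch or Lucene implement analysis: `stem` is a naive suffix stripper standing in for a real stemmer, the stopword "a" is added to the slide's list so that document #2 contributes only content words, and every function name here is made up for the sketch.

```python
import re
import unicodedata
from collections import defaultdict

# Stopwords from slide 9, plus "a" (an assumption of this sketch).
STOPWORDS = {"and", "what", "with", "or", "a"}
# Synonyms from slide 9.
SYNONYMS = {"kitten": "cat", "puppy": "dog"}

# The three example documents from slides 6 and 7.
DOCS = {
    1: "Cute cat and dog",
    2: "Cat plays with a dog",
    3: "Cats playing piano",
}


def fold_ascii(token):
    """Drop combining accents after decomposition, e.g. 'róża' -> 'roza'."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(token):
    """Naive stand-in for a stemmer: strips a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, then fold, lowercase, drop stopwords, stem, map synonyms."""
    terms = []
    for token in re.findall(r"\w+", text):  # step 1: tokenization
        token = fold_ascii(token).lower()   # step 2: token filters
        if token in STOPWORDS:
            continue
        token = stem(token)
        terms.append(SYNONYMS.get(token, token))
    return terms


def build_index(docs):
    """Map each analyzed term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every analyzed query term."""
    postings = [set(index.get(term, [])) for term in analyze(query)]
    return sorted(set.intersection(*postings)) if postings else []


index = build_index(DOCS)
print(index["cat"])            # [1, 2, 3]
print(index["play"])           # [2, 3]
print(search(index, "cats"))   # [1, 2, 3]
```

Because the query goes through the same `analyze` function as the documents, a search for "cats" finds all three photos, which the `LIKE '%cats%'` query from slide 6 would not.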
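
Slides 18 to 21 name query, filter, and aggregation types without showing them combined in one request. The sketch below only builds such a request body as a Python dictionary and prints it as JSON; it does not contact a cluster. Assumptions: the `filter` clause of the `bool` query and the `calendar_interval` parameter belong to Elasticsearch releases newer than the one the deck appears to target, so clause names may need adjusting for a given version; the `location` field and the helper name `build_search_body` are hypothetical; `title`, `likes`, `category`, and `created_at` are taken from the slides.

```python
import json


def build_search_body(text, min_likes, lat, lon, distance):
    """Build a search body: fuzzy match + range and geo filters + aggregations."""
    return {
        "query": {
            "bool": {
                # Scored part: fuzzy full-text match on the title (slide 18).
                "must": [
                    {"match": {"title": {"query": text, "fuzziness": "AUTO"}}}
                ],
                # Non-scored part: range and geo-distance filters (slide 19).
                "filter": [
                    {"range": {"likes": {"gt": min_likes}}},
                    {
                        "geo_distance": {
                            "distance": distance,
                            # "location" is a hypothetical geo_point field.
                            "location": {"lat": lat, "lon": lon},
                        }
                    },
                ],
            }
        },
        # Slide 21: likes stats and a created_at histogram per category.
        "aggs": {
            "by_category": {
                "terms": {"field": "category"},
                "aggs": {
                    "likes_stats": {"stats": {"field": "likes"}},
                    "created_over_time": {
                        "date_histogram": {
                            "field": "created_at",
                            "calendar_interval": "month",
                        }
                    },
                },
            }
        },
    }


body = build_search_body("funy cad", 10, 50.0, 20.0, "200m")
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a `_search` request such as the one on slide 17; keeping the range and geo conditions under `filter` reflects the deck's "Query vs Filter" distinction, where filters narrow the result set without affecting relevance scores.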
