Elasticsearch: Introduction to Data Model, Search & Aggregations

An overview of Elasticsearch features that covers several topics. It explains how to perform smart search and data aggregations, and how to control relevance through scoring functions. It also shows how Elasticsearch works as a distributed, scalable data store. Finally, it presents some use cases that are becoming core functionality at Zalando.

  1. Elasticsearch at Zalando, by Alaa Elhadba
  2. Table of Contents
  3. Why Elasticsearch
  5. Elasticsearch at scale
  6. Index / Type
      - An index is a collection of documents that should be grouped together for a common reason.
      - A type is a collection of documents that all share an identical (or very similar) schema.
  7. Sharding
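A routing formula decides which shard a document lands on. This is a minimal sketch that assumes the classic rule `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document ID. Elasticsearch itself uses a Murmur3 hash. Here `zlib.crc32` is a stand-in, so the concrete shard numbers will not match a real cluster. `pick_shard` is a hypothetical helper.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Return the primary shard a document would be stored on.

    Assumption: zlib.crc32 stands in for the Murmur3 hash Elasticsearch uses,
    so the numbers differ from a real cluster; only the modulo idea is the same.
    """
    if number_of_primary_shards <= 0:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


if __name__ == "__main__":
    # The routing value defaults to the document _id (hypothetical IDs here).
    for doc_id in ["product-1", "product-2", "product-3"]:
        print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

The result depends on the shard count. Changing the number of primary shards would therefore send existing documents to different shards, which is why that number is fixed when the index is created.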
  8. Talking to data
  9. Distribution: Elasticsearch node (cluster state: yellow)
  10. Scaling: Cluster (cluster state: yellow)
  11. Replication: Cluster (cluster state: green)
  14. Replication: Cluster (cluster state: red)
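The cluster-state colours on these slides follow fixed rules:

- Red: at least one primary shard is unassigned.
- Yellow: every primary is assigned, but at least one replica is not.
- Green: every copy is assigned.

A single node shows yellow because a replica is never placed on the same node as its primary. This sketch models those rules with a hypothetical `ShardCopy` record, a toy `allocate` function and a `fail_node` helper. It is not Elasticsearch's real allocation logic, which is far more involved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ShardCopy:
    shard: int            # shard number within the index
    primary: bool         # True for the primary copy, False for a replica
    node: Optional[str]   # node holding the copy, or None if unassigned


def allocate(num_shards: int, num_replicas: int, nodes: List[str]) -> List[ShardCopy]:
    """Toy allocator: copies of the same shard never share a node."""
    copies = []
    for shard in range(num_shards):
        used = set()
        for i in range(num_replicas + 1):
            free = [n for n in nodes if n not in used]
            node = free[shard % len(free)] if free else None  # rotate to spread shards
            if node is not None:
                used.add(node)
            copies.append(ShardCopy(shard, primary=(i == 0), node=node))
    return copies


def fail_node(copies: List[ShardCopy], failed: str) -> List[ShardCopy]:
    """Unassign every copy on `failed`; promote a surviving replica if a primary was lost."""
    groups: Dict[int, List[ShardCopy]] = {}
    for c in copies:
        node = None if c.node == failed else c.node
        groups.setdefault(c.shard, []).append(ShardCopy(c.shard, c.primary, node))
    for group in groups.values():
        primary = next(c for c in group if c.primary)
        if primary.node is None:
            survivor = next((c for c in group if not c.primary and c.node), None)
            if survivor is not None:
                primary.primary, survivor.primary = False, True
    return [c for group in groups.values() for c in group]


def cluster_health(copies: List[ShardCopy]) -> str:
    if any(c.primary and c.node is None for c in copies):
        return "red"      # at least one primary shard is unavailable
    if any(not c.primary and c.node is None for c in copies):
        return "yellow"   # all primaries allocated, some replicas are not
    return "green"        # every primary and replica is allocated


if __name__ == "__main__":
    print(cluster_health(allocate(3, 1, ["node-1"])))
    print(cluster_health(allocate(3, 1, ["node-1", "node-2"])))
```

Adding a second node lets the replicas be placed, so the state turns green. With zero replicas, losing a node that holds a primary turns it red.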
  15. Data Modeling
  16. Schema: Type, Index, Doc_values
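The mapping parameters named on this slide (`type`, `index`, `doc_values`) are set per field. This is a sketch of a mapping body that assumes Elasticsearch 2.x-era syntax. In that syntax, string fields take `index` set to `analyzed`, `not_analyzed` or `no`; Elasticsearch 5 and later replaced this with the `text` and `keyword` types. The index, type and field names and the `exact_value_fields` helper are hypothetical.

```python
import json

# Hypothetical "product" type; ES 2.x-era mapping syntax.
product_mapping = {
    "mappings": {
        "product": {                                             # the type
            "properties": {
                "name": {"type": "string", "index": "analyzed"},     # full-text search
                "sku": {"type": "string", "index": "not_analyzed",   # exact-value filtering
                        "doc_values": True},                         # columnar data for sorting/aggs
                "notes": {"type": "string", "index": "no"},          # kept in _source, not searchable
                "price": {"type": "float"},
            }
        }
    }
}


def exact_value_fields(mapping: dict, type_name: str) -> list:
    """List the fields indexed as exact values (usable in term filters)."""
    props = mapping["mappings"][type_name]["properties"]
    return sorted(name for name, spec in props.items() if spec.get("index") == "not_analyzed")


if __name__ == "__main__":
    print(json.dumps(product_mapping, indent=2))  # body for creating a hypothetical index
    print(exact_value_fields(product_mapping, "product"))
```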
  17. Relationships
      ● Application Side Joins
      ● Parent-Child
      ● Nested objects
  19. Relationships
      ● Application Side Joins
      ● Parent-Child
      ● Nested objects
      ● Parent-child queries can be 5 to 10 times slower than the equivalent nested query!
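An application-side join runs two queries and joins their results in the client. The first query finds the parent IDs; the second filters the children on those IDs. This sketch uses two hypothetical in-memory "indices" as stand-ins for real Elasticsearch queries. In practice the second step would be a `terms` filter on the user field.

```python
from typing import List

# Hypothetical in-memory stand-ins for a "users" index and a "blogposts" index.
USERS = [
    {"id": 1, "name": "John Smith"},
    {"id": 2, "name": "Jane Doe"},
]
BLOGPOSTS = [
    {"title": "Relationships", "user": 1},
    {"title": "Nested objects", "user": 2},
    {"title": "Parent-child", "user": 1},
]


def find_user_ids(name_term: str) -> List[int]:
    """Query 1: users whose name contains the term (a crude stand-in for a match query)."""
    term = name_term.lower()
    return [u["id"] for u in USERS if term in u["name"].lower().split()]


def posts_for_users(user_ids: List[int]) -> List[str]:
    """Query 2: posts whose user field is one of the ids (like a terms filter)."""
    wanted = set(user_ids)
    return [p["title"] for p in BLOGPOSTS if p["user"] in wanted]


def application_side_join(name_term: str) -> List[str]:
    """Join in the application: feed the ids from query 1 into query 2."""
    return posts_for_users(find_user_ids(name_term))


if __name__ == "__main__":
    print(application_side_join("john"))
```

The trade-off is an extra round trip per join. In exchange, users and posts stay in separate, independently updatable documents.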
  21. Searching
  22. Searching
      STRUCTURED SEARCH: A filter asks a yes/no question of every document and is used for fields that contain exact values.
      - Is a date within the range 2012 to 2015?
      - Is the status “Approved”?
      - Is the language code “DE”?
      UNSTRUCTURED SEARCH: A query calculates how relevant each document is to the query and assigns it a relevance score, which is later used to sort the matching documents.
      - Documents containing the word run, but maybe also matching runs, running, jog, or sprint
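In the query DSL, both kinds of search can be combined in one `bool` query. Clauses under `must` are scored, while clauses under `filter` only answer yes/no and do not affect the score. This sketch builds such a request body as a Python dict. It assumes Elasticsearch 2.0 or later, where `bool` accepts `filter`, and hypothetical field names `title`, `date`, `status` and `language`. The `term` clauses also assume those fields hold exact, not-analyzed values.

```python
import json


def build_search(text: str) -> dict:
    """Combine unstructured (scored) and structured (yes/no) search in one bool query."""
    return {
        "query": {
            "bool": {
                # Unstructured search: analyzed full-text match, contributes to _score.
                "must": {"match": {"title": text}},
                # Structured search: exact-value checks that filter without scoring.
                "filter": [
                    {"range": {"date": {"gte": "2012-01-01", "lte": "2015-12-31"}}},
                    {"term": {"status": "Approved"}},
                    {"term": {"language": "DE"}},
                ],
            }
        }
    }


if __name__ == "__main__":
    # Body for a POST /<index>/_search request.
    print(json.dumps(build_search("run"), indent=2))
```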
  24. Terms Query Example
  25. Unstructured Search (Full Text)
      Input: Quick brown foxes leap over lazy dogs in summer
      Tokenize: Quick, brown, foxes, leap, over, lazy, dogs, in, summer
      Remove stop words: Quick, brown, foxes, leap, lazy, dogs, summer
      Stem: Quick, brown, fox, leap, lazy, dog, summer
      Apply synonyms: fast, brown, fox, jump, lazy, dog, summer
      tsar -> star
      Inverted Index
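The rows on this slide are the stages of an analysis chain: tokenize, drop stop words, stem, and apply synonyms. Each remaining term is then added to the inverted index. This toy sketch implements that chain. The stop-word list, the stem lookup table (standing in for a real stemmer) and the synonym map are hypothetical and cover only this sentence. Queries go through the same chain, which is why a search for "quick fox" can match the document.

```python
from collections import defaultdict
from typing import Dict, List, Set

STOP_WORDS = {"over", "in"}                    # hypothetical, tiny stop-word list
STEMS = {"foxes": "fox", "dogs": "dog"}        # hypothetical lookup instead of a real stemmer
SYNONYMS = {"quick": "fast", "leap": "jump"}   # hypothetical synonym map


def analyze(text: str) -> List[str]:
    """Run text through the toy analysis chain."""
    tokens = text.lower().split()                          # tokenize and lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]    # remove stop words
    tokens = [STEMS.get(t, t) for t in tokens]             # stem
    return [SYNONYMS.get(t, t) for t in tokens]            # apply synonyms


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each analyzed term to the set of document ids containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """OR semantics: a document matches if it contains any analyzed query term."""
    result: Set[int] = set()
    for term in analyze(query):
        result |= index.get(term, set())
    return result


if __name__ == "__main__":
    idx = build_inverted_index({1: "Quick brown foxes leap over lazy dogs in summer"})
    print(sorted(idx))
    print(search(idx, "quick fox"))
```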
  26. Relevance
  27. Scoring & Relevance in Full-Text Search: Relevance scoring is the algorithm that calculates how similar the contents of a field are to a query. TF/IDF combines three factors:
      - Term Frequency: How often does the term appear in the field?
      - Inverse Document Frequency: How often does each term appear in the index?
      - Field Length Norm: How long is the field?
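Lucene's classic TF/IDF similarity, the Elasticsearch default before version 5.0 switched to BM25, turns the three factors into simple formulas:

- tf = sqrt(freq)
- idf = 1 + ln(numDocs / (docFreq + 1))
- norm = 1 / sqrt(numTerms)

This sketch implements those formulas. It leaves out query-time parts of the full score, such as boosts, query normalization and the coordination factor. It also ignores that Lucene stores the norm in a single lossy byte.

```python
import math


def tf(freq: int) -> float:
    """Term frequency: more occurrences in the field mean more weight, sub-linearly."""
    return math.sqrt(freq)


def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency: terms found in fewer documents weigh more."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms: int) -> float:
    """Field-length norm: a match in a short field counts more than in a long one."""
    return 1.0 / math.sqrt(num_terms)


def term_weight(freq: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    """Weight of a single term in a single field (boosts and query-time factors omitted)."""
    return tf(freq) * idf(num_docs, doc_freq) * field_norm(num_terms)


if __name__ == "__main__":
    # Hypothetical numbers: the term appears twice in a 4-word title
    # and occurs in 10 of 1,000 documents.
    print(round(term_weight(freq=2, num_docs=1000, doc_freq=10, num_terms=4), 4))
```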
  28. Vector Space Model: The vector space model provides a way of comparing a multiterm query against a document.
      - The model represents both the document and the query as vectors.
  29. Vector Space Model
      Example documents: (1) I am happy in summer. (2) After Christmas I’m a hippopotamus. (3) The happy hippopotamus helped Harry.
      - By measuring the angle between the query vector and a document vector, it is possible to assign a relevance score to each document.
      - If the angle between a document and the query is large, the document is of low relevance.
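The angle is usually compared through its cosine: identical directions give 1 and unrelated ones give 0. This sketch scores the three example documents against the query "happy hippopotamus". It uses hypothetical term weights (2 for the common "happy", 5 for the rarer "hippopotamus") in place of real TF/IDF weights.

```python
import math
import re
from typing import Dict, List, Tuple

TERMS = ["happy", "hippopotamus"]
WEIGHTS = {"happy": 2.0, "hippopotamus": 5.0}  # hypothetical, IDF-like weights

DOCS = {
    1: "I am happy in summer.",
    2: "After Christmas I’m a hippopotamus.",
    3: "The happy hippopotamus helped Harry.",
}


def vectorize(text: str) -> List[float]:
    """One dimension per query term, holding that term's weight if present."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [WEIGHTS[t] if t in words else 0.0 for t in TERMS]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, docs: Dict[int, str]) -> List[Tuple[int, float]]:
    """Score every document by the cosine of its angle to the query vector."""
    q = vectorize(query)
    scores = [(doc_id, cosine(q, vectorize(text))) for doc_id, text in docs.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    for doc_id, score in rank("happy hippopotamus", DOCS):
        print(doc_id, round(score, 3))
```

Document 3 points in exactly the same direction as the query, so its cosine is 1. Document 2 outranks document 1 because "hippopotamus" carries the larger weight.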
  30. Constant Score
  31. Field Value Factor
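`field_value_factor` is a `function_score` function that folds a numeric document field, such as a popularity count, into the relevance score. This sketch implements its arithmetic under the default `boost_mode` of `multiply`. The new score is `_score * modifier(factor * doc[field])`, and `missing` is used when the field is absent. Only some of the documented modifiers are included. The function name and arguments are this sketch's own, not a client API.

```python
import math
from typing import Callable, Dict, Optional

# A subset of the documented modifiers ("log" variants are base 10, "ln" are natural).
MODIFIERS: Dict[str, Callable[[float], float]] = {
    "none": lambda x: x,
    "log1p": lambda x: math.log10(x + 1),
    "log2p": lambda x: math.log10(x + 2),
    "ln1p": lambda x: math.log(x + 1),
    "ln2p": lambda x: math.log(x + 2),
    "sqrt": math.sqrt,
    "square": lambda x: x * x,
}


def field_value_factor(query_score: float, field_value: Optional[float],
                       factor: float = 1.0, modifier: str = "none",
                       missing: Optional[float] = None) -> float:
    """Scale a relevance score by a document's numeric field (boost_mode=multiply)."""
    if field_value is None:
        if missing is None:
            raise ValueError("document has no value for the field and no 'missing' default")
        field_value = missing
    return query_score * MODIFIERS[modifier](factor * field_value)


if __name__ == "__main__":
    # Hypothetical popularity counts for three equally relevant documents.
    for votes in [0, 9, 99]:
        print(votes, field_value_factor(1.0, votes, modifier="log2p"))
```

With `log1p`, a document whose field is 0 ends up with a score of 0, since log10(1) = 0. `log2p` avoids that because its result never drops below log10(2).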
  33. Script Scoring
  34. Aggregations
  35. Aggregation: Search vs. Analytics
      - Business requirement: “Help me find the best documents.” (search) vs. “What do these documents tell me about my business?” (analytics)
      - Enablers: matching, relevance, filtering, auto-completion, ... (search) vs. summaries, patterns, trends, outliers, predictions, visualization (analytics)
      - Aggregations help build complex summaries and analytics of the indexed data.
  36. Aggregation: Terms, Significant Terms
  37. Bucket Aggregations
  38. Nested Aggregations
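A bucket aggregation groups documents, for example one bucket per distinct term, and a nested aggregation then computes a value inside every bucket. This sketch shows what a `terms` bucket aggregation with a nested `avg` metric returns. It computes the result in plain Python over hypothetical `brand`/`price` documents and keeps the equivalent request body alongside for reference. Buckets are ordered by document count, which is the Elasticsearch default; breaking ties by key is this sketch's simplification.

```python
from collections import defaultdict
from typing import Dict, List

# Equivalent query-DSL request body (hypothetical field names).
AGG_REQUEST = {
    "size": 0,
    "aggs": {
        "by_brand": {
            "terms": {"field": "brand"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    },
}


def terms_with_avg(docs: List[dict], group_field: str, metric_field: str,
                   size: int = 10) -> List[dict]:
    """Bucket docs by group_field, then average metric_field inside each bucket."""
    buckets: Dict[object, List[float]] = defaultdict(list)
    for doc in docs:
        buckets[doc[group_field]].append(doc[metric_field])
    result = [
        {"key": key, "doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in buckets.items()
    ]
    # Largest buckets first; ties broken by key (a simplification).
    result.sort(key=lambda b: (-b["doc_count"], str(b["key"])))
    return result[:size]


if __name__ == "__main__":
    products = [
        {"brand": "A", "price": 10.0},
        {"brand": "B", "price": 30.0},
        {"brand": "A", "price": 20.0},
    ]
    for bucket in terms_with_avg(products, "brand", "price"):
        print(bucket)
```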
  39. Metrics Aggregations
      ● Extended Stats Aggregation
      ● Geo Bounds Aggregation
      ● Geo Centroid Aggregation
      ● Percentiles Aggregation
      ● Stats Aggregation
      ● Value Count Aggregation
      ● Avg, Sum, Min, Max Aggregations
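The `stats` and `extended_stats` metrics return several numbers from a single pass over a numeric field. This sketch computes the extended set:

- count, min, max, avg and sum
- sum of squares, variance and standard deviation
- the bounds `avg ± sigma * std_deviation`, with `sigma` defaulting to 2

It assumes the population variance. It mirrors the shape of the response rather than Elasticsearch's implementation.

```python
import math
from typing import Dict, List


def extended_stats(values: List[float], sigma: float = 2.0) -> Dict[str, object]:
    """Summary statistics of a numeric field, in the shape of an extended_stats result."""
    count = len(values)
    if count == 0:
        # Nothing to summarise; this sketch reports None for every metric.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": None,
                "sum_of_squares": None, "variance": None, "std_deviation": None,
                "std_deviation_bounds": None}
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    avg = total / count
    # Population variance; clamp tiny negative values caused by float rounding.
    variance = max(sum_of_squares / count - avg * avg, 0.0)
    std = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }


if __name__ == "__main__":
    print(extended_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```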
  46. Significant Terms
  47. What’s uncommonly common about this sub-group?
  48. Significant Terms
      - significant_terms analyzes your data and finds terms that appear with a frequency that is statistically anomalous compared to the background data.
      - It can uncover surprisingly sophisticated trends and correlations in your data.
      - It is used to discover anomalies.
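Significance compares two frequencies for each term. The foreground is the sub-group, for example the people who liked certain products; the background is everyone. This sketch uses the JLH heuristic, which is the default `significant_terms` scoring in Elasticsearch. When a term is more frequent in the foreground, its score is (fg% - bg%) * (fg% / bg%); otherwise it is 0. Documents are modelled as hypothetical sets of style tags. The background is assumed to include the foreground, as it does when it is the whole index.

```python
from collections import Counter
from typing import List, Set, Tuple


def jlh(fg_count: int, fg_size: int, bg_count: int, bg_size: int) -> float:
    """JLH score: absolute change in probability times relative change."""
    if fg_count == 0 or bg_count == 0:
        return 0.0
    fg = fg_count / fg_size
    bg = bg_count / bg_size
    if fg <= bg:
        return 0.0  # not over-represented in the sub-group
    return (fg - bg) * (fg / bg)


def significant_terms(foreground: List[Set[str]], background: List[Set[str]],
                      top: int = 3) -> List[Tuple[str, float]]:
    """Rank terms that are 'uncommonly common' in the foreground relative to the background."""
    fg_counts = Counter(t for doc in foreground for t in doc)
    bg_counts = Counter(t for doc in background for t in doc)
    scored = [(t, jlh(c, len(foreground), bg_counts[t], len(background)))
              for t, c in fg_counts.items()]
    scored = [s for s in scored if s[1] > 0]
    scored.sort(key=lambda s: (-s[1], s[0]))
    return scored[:top]


if __name__ == "__main__":
    # Hypothetical style tags of ten shoppers; the last three liked the products.
    everyone = [
        {"sneakers", "denim"}, {"sneakers", "denim"}, {"sneakers", "chinos"},
        {"sneakers", "denim"}, {"sneakers", "chinos"}, {"sneakers", "denim"},
        {"sneakers", "denim"}, {"sneakers", "boots", "leather"},
        {"sneakers", "boots", "leather"}, {"boots", "leather"},
    ]
    likers = everyone[-3:]
    print(significant_terms(likers, everyone))
```

"sneakers" is frequent among the likers but even more frequent overall, so it scores 0. "boots" and "leather" are the uncommonly common terms.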
  49. Significant Terms: Find all people who like these products, then summarise how their style differs from everyone else's.
  50. Significant Terms
  51. Kibana: Data Visualization
  52. Kibana
  53. Contact
