This talk gives an overview of Elasticsearch features and explains how to perform smart search, data aggregations, and relevance tuning through scoring functions. It also covers how Elasticsearch works as a distributed, scalable data store. Finally, it showcases some use cases that are becoming core functionality at Zalando.
6. Index / Type
- An index is a collection of documents that should be grouped together for a
common reason.
- A type is a collection of documents that all share an identical (or very similar)
schema.
19. Relationships
● Application-side joins
● Parent-child
● Nested objects
Note: parent-child queries can be 5 to 10 times slower than the equivalent nested query!
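As a rough illustration of the nested-object option, the sketch below builds a mapping fragment and a matching nested query as plain Python dictionaries. The field names (`comments`, `author`, `text`) and the search values are hypothetical; only the `nested` field type and the `nested` query are taken from the Elasticsearch query DSL, and the exact mapping wrapper around `properties` depends on the Elasticsearch version in use.

```python
import json

# Mapping fragment: each comment is indexed as its own hidden nested document,
# so author and text of the same comment stay tied together.
# Field names are hypothetical.
nested_mapping = {
    "properties": {
        "title": {"type": "string"},
        "comments": {
            "type": "nested",
            "properties": {
                "author": {"type": "string"},
                "text": {"type": "string"},
            },
        },
    }
}

# Nested query: both conditions must hold within the same comment object.
nested_query = {
    "query": {
        "nested": {
            "path": "comments",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"comments.author": "alice"}},
                        {"match": {"comments.text": "great"}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(nested_query, indent=2))
```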
22. Searching
STRUCTURED SEARCH
A filter asks a yes|no question of every document and is
used for fields that contain exact values:
- Is a date within the range 2012 to 2015?
- Is the status “Approved”?
- Is the language code “DE”?
UNSTRUCTURED SEARCH
A query calculates how relevant each document is to the
query and assigns it a relevance score, which is later used
to sort the matching documents.
- For example: containing the word run, but maybe also matching
runs, running, jog, or sprint
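The two kinds of clause can be combined in one request. The following is a minimal sketch of a request body, assuming an Elasticsearch version whose `bool` query accepts a `filter` clause (2.0 and later); older versions express the same idea differently. The field names `description`, `created`, `status`, and `language` are hypothetical, and the `term` clauses assume those fields hold exact, non-analyzed values.

```python
import json

def build_search_body(text):
    """Combine a scored full-text query with unscored yes|no filters."""
    return {
        "query": {
            "bool": {
                # Scored part: contributes to the relevance of each hit.
                "must": [{"match": {"description": text}}],
                # Unscored part: documents either pass or they do not.
                "filter": [
                    {"range": {"created": {"gte": "2012-01-01",
                                           "lte": "2015-12-31"}}},
                    {"term": {"status": "Approved"}},
                    {"term": {"language": "DE"}},
                ],
            }
        }
    }

body = build_search_body("run")
print(json.dumps(body, indent=2))
```

Whether a search for run also matches runs or running depends on how the hypothetical `description` field is analyzed at index time, which this sketch does not configure.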
27. Scoring & Relevance in Full-Text Search
Relevance scoring is the algorithm that calculates how similar the contents of a field are to a query.
TF/IDF
Term Frequency
How often does the term appear in the field?
Inverse Document Frequency
How often does each term appear in the index?
Field Length Norm
How long is the field?
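The three factors can be sketched in a few lines of Python. The formulas below follow the classic Lucene TF/IDF similarity as it is commonly documented for Elasticsearch; the function names are made up for this sketch, and the final product is a deliberate simplification of the full scoring formula. Newer Elasticsearch versions default to BM25 instead.

```python
import math

def term_frequency(freq):
    """The more often a term appears in the field, the higher the weight."""
    return math.sqrt(freq)

def inverse_document_frequency(num_docs, doc_freq):
    """The more documents contain the term, the lower the weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def field_length_norm(num_terms):
    """The shorter the field, the higher the weight."""
    return 1.0 / math.sqrt(num_terms)

def term_weight(freq, num_docs, doc_freq, num_terms):
    # Simplified combination of the three factors (assumption of this sketch).
    return (term_frequency(freq)
            * inverse_document_frequency(num_docs, doc_freq)
            * field_length_norm(num_terms))

# A rare term in a short field outweighs a common term in a long field.
rare = term_weight(freq=1, num_docs=1000, doc_freq=1, num_terms=4)
common = term_weight(freq=1, num_docs=1000, doc_freq=500, num_terms=100)
print(rare > common)
```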
28. Vector Space Model
The vector space model provides a way of
comparing a multiterm query against a document.
- The model represents both the document and the
query as vectors.
29. Vector Space Model
1. I am happy in summer.
2. After Christmas I’m a hippopotamus.
3. The happy hippopotamus helped Harry.
- By measuring the angle between the query vector
and the document vector, it is possible to assign a
relevance score to each document.
- If the angle between a document and the query is
large, the document is of low relevance.
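A small sketch of the idea, using the three sentences above. The query "happy hippopotamus" and the term weights (2 for happy, 5 for hippopotamus) are assumptions made for illustration; in practice the weights would come from the scoring factors. A small angle means a cosine close to 1.

```python
import math

# Assumed term weights; each query term is one dimension of the vector space.
TERM_WEIGHTS = {"happy": 2.0, "hippopotamus": 5.0}
TERMS = ["happy", "hippopotamus"]

def to_vector(text):
    """Map a text to a vector: the term's weight if present, else 0."""
    words = set(text.lower().replace(".", "").split())
    return [TERM_WEIGHTS[t] if t in words else 0.0 for t in TERMS]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

documents = {
    1: "I am happy in summer.",
    2: "After Christmas I’m a hippopotamus.",
    3: "The happy hippopotamus helped Harry.",
}

query_vector = to_vector("happy hippopotamus")
scores = {doc_id: cosine_similarity(query_vector, to_vector(text))
          for doc_id, text in documents.items()}

# Highest cosine (smallest angle) first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Under these assumed weights, document 3 contains both terms and points in the same direction as the query, so it ranks first.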
35. Aggregation
Search
- Business requirement: “Help me find the best documents?”
- Enablers: Matching, Relevance, Filtering, Auto-completion, ...
Analytics
- Business requirement: “What do these documents tell me about my business?”
- Enablers: Summaries, Patterns, Trends, Outliers, Predictions, Visualization
- Aggregations help build complex summaries & analytics of the indexed data.
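As an illustration, the request body below groups documents into buckets with a `terms` aggregation and computes an `avg` metric per bucket. The field names `brand` and `price` and the aggregation names are hypothetical; `"size": 0` asks for the summary only, without the matching documents.

```python
import json

aggregation_body = {
    "size": 0,  # return only aggregation results, no search hits
    "aggs": {
        # Bucket aggregation: one bucket per distinct brand value.
        "by_brand": {
            "terms": {"field": "brand"},
            # Metric sub-aggregation, computed inside each bucket.
            "aggs": {
                "average_price": {"avg": {"field": "price"}}
            },
        }
    },
}

print(json.dumps(aggregation_body, indent=2))
```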
51. Significant Terms
- The significant_terms aggregation analyzes your data and finds terms that appear with a frequency that is
statistically anomalous compared to the background data.
- It can uncover surprisingly sophisticated trends and correlations in your data.
- It is used for discovering anomalies.
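The difference between a popular term and a significant one can be sketched in plain Python. This is a toy model, not the Elasticsearch implementation: the scoring (absolute change in frequency times relative change) is a simplified heuristic chosen for this sketch, and the sample documents are invented.

```python
from collections import Counter

def significant_terms(foreground, background, min_count=2):
    """Rank terms that are unusually frequent in the foreground set."""
    fg_counts = Counter(t for doc in foreground for t in set(doc))
    bg_counts = Counter(t for doc in background for t in set(doc))
    scores = {}
    for term, fg_count in fg_counts.items():
        bg_count = bg_counts.get(term, 0)
        if fg_count < min_count or bg_count == 0:
            continue  # too rare to judge
        fg_rate = fg_count / len(foreground)
        bg_rate = bg_count / len(background)
        if fg_rate > bg_rate:
            # Simplified score: absolute change times relative change.
            scores[term] = (fg_rate - bg_rate) * (fg_rate / bg_rate)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented sample data: the foreground is a subset of the background.
foreground = [["shoes", "black"], ["shoes", "white"],
              ["shoes", "black"], ["jacket", "black"]]
background = foreground + [["jacket", "black"], ["shirt", "white"],
                           ["shirt", "black"], ["jacket", "white"]]

result = significant_terms(foreground, background)
print(result)
```

In this sample, black is the most common term overall, yet shoes scores higher because its share in the foreground is double its share in the background.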