Privileged and Confidential
Point Field Types in Solr
Evolution of Range Filters
amikryukov@griddynamics.com
Agenda
1. Recap: From query parser to TopDocsCollector.
2. TermQuery search flow.
3. How are Range Filters implemented?
4. Optimizations for Range Filters.
5. Point Fields.
Recap: From query parser to TopDocsCollector
? What is a query parser?
? What is the difference between Query and Scorer? And why do you need a Collector?
Recap: Query execution flow
LeafReader
Recap: From query parser to TopDocsCollector
? What is the difference between TermsEnum and PostingsEnum?
Recap: inverted index, terms, posting list
TermQuery search flow
q=Price:10
TermQuery(field='Price', val='10')
TermQuery -> TermWeight -> TermScorer
TermQuery
Idea: iterate over the posting list of the term `10`.
q=Price:10
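The idea can be sketched with a toy inverted index. This is a minimal model in plain Java, not Lucene's actual classes: the class name `ToyTermQuery`, the `search` method and the sample postings are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of an inverted index for one field; not Lucene's real data structures.
class ToyTermQuery {
    // term -> sorted posting list (document ids); the sample data is made up
    static final TreeMap<String, int[]> PRICE = new TreeMap<>(Map.of(
        "10", new int[] {2, 5, 9},
        "25", new int[] {1},
        "99", new int[] {3, 5}));

    // A term query looks the term up in the dictionary, then walks its posting list.
    static List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        int[] postings = PRICE.get(term);
        if (postings == null) {
            return hits; // term is not in the dictionary: nothing matches
        }
        for (int docId : postings) {
            hits.add(docId); // a real scorer would also compute a score here
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("10")); // [2, 5, 9]
    }
}
```

The cost is one dictionary lookup plus a walk over a single posting list, which is why a single-term query is cheap.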
TermQuery source code
TermScorer source code
How are Range Filters implemented?
term -> document ids
421 -> [1]
423 -> [2]
445 -> [3]
446 -> [3]
448 -> [4]
521 -> [5]
522 -> [7]
632 -> [5]
633 -> [6]
634 -> [7]
641 -> [5]
642 -> [6]
644 -> [7]
q=PRICE:[423 TO 642]
q=PRICE:[423 TO 642] -> q=PRICE:423 PRICE:445 PRICE:446 … PRICE:642
MultiTermQuery
Naive implementation
In total: 11 SHOULD clauses.
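A minimal sketch of the naive expansion, using the term table from the slides. It is a toy model in plain Java, not Lucene code: the class `NaiveRangeQuery` and its methods are hypothetical, and terms are kept as integers for brevity (a real index stores and sorts them as bytes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model: expand a range into one clause per indexed term, then union the postings.
class NaiveRangeQuery {
    // term -> posting list, copied from the slide's table
    static final TreeMap<Integer, int[]> PRICE = new TreeMap<>();
    static {
        int[][] rows = {
            {421, 1}, {423, 2}, {445, 3}, {446, 3}, {448, 4}, {521, 5}, {522, 7},
            {632, 5}, {633, 6}, {634, 7}, {641, 5}, {642, 6}, {644, 7}};
        for (int[] row : rows) {
            PRICE.put(row[0], new int[] {row[1]});
        }
    }

    // Every indexed term inside [lo, hi] becomes one SHOULD clause.
    static List<Integer> expand(int lo, int hi) {
        return new ArrayList<>(PRICE.subMap(lo, true, hi, true).keySet());
    }

    // The result is the union of the posting lists of all clauses.
    static Set<Integer> search(int lo, int hi) {
        Set<Integer> docs = new TreeSet<>();
        for (int term : expand(lo, hi)) {
            for (int doc : PRICE.get(term)) {
                docs.add(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(expand(423, 642).size()); // 11
        System.out.println(search(423, 642));        // [2, 3, 4, 5, 6, 7]
    }
}
```

The number of clauses grows with the number of distinct terms in the range, which is the weakness the next slides address.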
Optimizations for Range Filters
? How can we improve the naive implementation of RangeFilterQuery?
Original values
421 -> [1]
423 -> [2]
445 -> [3]
446 -> [3]
448 -> [4]
521 -> [5]
522 -> [7]
632 -> [5]
633 -> [6]
634 -> [7]
641 -> [5]
642 -> [6]
644 -> [7]
Trie
Trie*Field index time
Original values
421 -> [1]
423 -> [2]
445 -> [3]
446 -> [3]
448 -> [4]
521 -> [5]
522 -> [7]
632 -> [5]
633 -> [6]
634 -> [7]
641 -> [5]
642 -> [6]
644 -> [7]
Additional values
Shift 1:
42* -> [1, 2]
44* -> [3, 4]
52* -> [5, 7]
63* -> [5, 6, 7]
64* -> [5, 6, 7]
Shift 2:
4** -> [1, 2, 3, 4]
5** -> [5, 7]
6** -> [5, 6, 7]
Shift 0 is the original values at full precision.
Exploit the Trie*Field (since Lucene 2.9)
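How the additional values could be produced at index time, as a decimal toy model. Real Trie*Field terms are binary-encoded with a configurable precision step; the class `ToyTrieIndexer`, the fixed 3-digit width and the `*` notation are assumptions made only to mirror the slide.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Decimal toy version of trie indexing: each value is also indexed under shorter prefixes.
class ToyTrieIndexer {
    static final int DIGITS = 3; // assumption: all values have at most 3 digits

    static final int[][] ROWS = { // {value, docId}, copied from the slide
        {421, 1}, {423, 2}, {445, 3}, {446, 3}, {448, 4}, {521, 5}, {522, 7},
        {632, 5}, {633, 6}, {634, 7}, {641, 5}, {642, 6}, {644, 7}};

    // Replace the last `shift` digits with '*', e.g. prefixTerm(423, 1) gives "42*".
    static String prefixTerm(int value, int shift) {
        StringBuilder sb = new StringBuilder(String.format("%0" + DIGITS + "d", value));
        for (int i = DIGITS - shift; i < DIGITS; i++) {
            sb.setCharAt(i, '*');
        }
        return sb.toString();
    }

    // Index every value at shift 0 (original), shift 1 and shift 2.
    static Map<String, TreeSet<Integer>> index(int[][] rows) {
        Map<String, TreeSet<Integer>> terms = new TreeMap<>();
        for (int[] row : rows) {
            for (int shift = 0; shift < DIGITS; shift++) {
                terms.computeIfAbsent(prefixTerm(row[0], shift), k -> new TreeSet<>()).add(row[1]);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, TreeSet<Integer>> terms = index(ROWS);
        System.out.println(terms.get("42*")); // [1, 2]
        System.out.println(terms.get("4**")); // [1, 2, 3, 4]
    }
}
```

In this toy data set the 13 original terms grow to 21 terms in total, which is the storage cost paid for faster range queries.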
Trie*Field query time
In total: 6 SHOULD clauses in the end (423, 44*, 5**, 63*, 641, 642).
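A sketch of how the range [423 TO 642] could be covered at query time, in the same decimal toy model. This is not Lucene's NumericRangeQuery code: the class `ToyTrieRangeQuery` and the greedy covering strategy are illustrative assumptions, and values are assumed to fit in 3 digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Decimal toy version of trie range splitting: cover the range with the largest aligned blocks.
class ToyTrieRangeQuery {
    static final int DIGITS = 3; // assumption: values are in 0..999
    static final int[] VALUES = {421, 423, 445, 446, 448, 521, 522, 632, 633, 634, 641, 642, 644};

    // Replace the last `shift` digits with '*', e.g. prefixTerm(500, 2) gives "5**".
    static String prefixTerm(int value, int shift) {
        StringBuilder sb = new StringBuilder(String.format("%0" + DIGITS + "d", value));
        for (int i = DIGITS - shift; i < DIGITS; i++) {
            sb.setCharAt(i, '*');
        }
        return sb.toString();
    }

    // All terms present in the toy index: every value at every shift.
    static Set<String> indexedTerms() {
        Set<String> terms = new TreeSet<>();
        for (int v : VALUES) {
            for (int shift = 0; shift < DIGITS; shift++) {
                terms.add(prefixTerm(v, shift));
            }
        }
        return terms;
    }

    // Walk from lo to hi, always taking the largest block (1, 10 or 100 values)
    // that is aligned and fits in the range; keep only blocks that exist as terms.
    static List<String> clauses(int lo, int hi) {
        Set<String> indexed = indexedTerms();
        List<String> out = new ArrayList<>();
        int cur = lo;
        while (cur <= hi) {
            int shift = 0;
            int size = 1;
            while (shift + 1 < DIGITS && cur % (size * 10) == 0 && cur + size * 10 - 1 <= hi) {
                shift++;
                size *= 10;
            }
            String term = prefixTerm(cur, shift);
            if (indexed.contains(term)) {
                out.add(term);
            }
            cur += size;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(clauses(423, 642)); // [423, 44*, 5**, 63*, 641, 642]
    }
}
```

Full precision is needed only at the two edges of the range; the middle is covered by a few coarse terms.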
Isn't it enough? What about the distribution of terms?
The trie-based approach does not take the distribution of the terms into account.
q=PRICE:[100 TO 2002222]
Original values
1 -> [1]
100 -> [2]
2000001 -> [3]
2000022 -> [3]
2000222 -> [4]
2002222 -> [5]
50000005 -> [7]
Isn't it enough?
IO efficiency.
We need to store all original and additional values.
We need to read all Terms of the field at search time.
Original values
1 -> [1]
100 -> [2]
2000001 -> [3]
2000022 -> [3]
2000222 -> [4]
2002222 -> [5]
50000005 -> [7]
Additional values
10* -> [2]
1** -> [1, 2]
200002* -> [3]
200022* -> [4]
20002** -> [4]
200**** -> [3, 4, 5]
200222* -> [5]
20022** -> [5]
2002*** -> [5]
Point Fields
Point fields replace the now-deprecated numeric fields (Trie*Field) and the numeric range query: they perform better overall and are more general, supporting multiple dimensions. (since Lucene 6.0)
● Based on the Bkd-Tree ("Bkd-Tree: A Dynamic Scalable kd-Tree").
The tree naturally adapts to each data set's particular distribution, in contrast to legacy numeric fields, which always index the same precision levels for every value regardless of how the points are distributed.
● Most of the data structure resides in on-disk blocks, with a small in-heap binary tree index structure to locate the blocks at search time.
● Supports multi-dimensional points (maps, 3D models).
Bkd-Tree
Binary Space Partitioning tree
B - Blocked
Number of points in the cell = 2
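A one-dimensional sketch of the blocking idea: sort the values and keep halving at the median until every cell holds at most two points. It is a toy model, not Lucene's BKD writer. The class `ToyBlockSplit` is hypothetical, and the sample values are the skewed prices from the earlier slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// 1-D toy of a blocked kd-tree: recursive median splits produce the leaf cells.
class ToyBlockSplit {
    // Split sorted[from, to) at the median until a cell has at most maxPoints values.
    static void split(long[] sorted, int from, int to, int maxPoints, List<long[]> cells) {
        if (to - from <= maxPoints) {
            cells.add(Arrays.copyOfRange(sorted, from, to)); // small enough: this is a leaf cell
            return;
        }
        int mid = (from + to) >>> 1;
        split(sorted, from, mid, maxPoints, cells);
        split(sorted, mid, to, maxPoints, cells);
    }

    static List<long[]> build(long[] values, int maxPoints) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        List<long[]> cells = new ArrayList<>();
        split(sorted, 0, sorted.length, maxPoints, cells);
        return cells;
    }

    public static void main(String[] args) {
        long[] prices = {1, 100, 2000001, 2000022, 2000222, 2002222, 50000005};
        for (long[] cell : build(prices, 2)) {
            System.out.println(Arrays.toString(cell));
        }
    }
}
```

Because the split points come from the data itself, dense regions get many small cells and sparse regions get few. In more than one dimension the same split is applied to one dimension at a time.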
Bkd-Tree adapts to particular distribution
Example from
https://www.elastic.co/blog/lucene-points-6.0
Point Fields: index time
Disk
Heap
In Lucene, the number of points in a cell is 1024.
Point Fields: search time
Disk
Heap
q=PRICE:[100 TO 2002222]
If a block overlaps the query, we have to check every value inside it.
If a block is fully contained within the query, the documents with values in that cell are collected efficiently, without testing each point.
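The two cases can be sketched over a handful of leaf cells. This is a toy model with cells of at most two points, as on the Bkd-Tree slide, and it scans the cells linearly instead of walking the in-heap tree. The class `ToyBlockSearch`, the `Cell` type and the `valueChecks` counter are hypothetical; values and document ids are taken from the earlier slides.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy range search over leaf cells: skip, bulk-collect, or check value by value.
class ToyBlockSearch {
    // One leaf cell: values sorted ascending, with the doc id of each value alongside.
    static final class Cell {
        final long[] values;
        final int[] docs;
        Cell(long[] values, int[] docs) {
            this.values = values;
            this.docs = docs;
        }
        long min() { return values[0]; }
        long max() { return values[values.length - 1]; }
    }

    static final List<Cell> CELLS = List.of(
        new Cell(new long[] {1}, new int[] {1}),
        new Cell(new long[] {100, 2000001}, new int[] {2, 3}),
        new Cell(new long[] {2000022, 2000222}, new int[] {3, 4}),
        new Cell(new long[] {2002222, 50000005}, new int[] {5, 7}));

    static int valueChecks; // how many individual values the last search had to test

    static Set<Integer> search(long lo, long hi) {
        valueChecks = 0;
        Set<Integer> hits = new TreeSet<>();
        for (Cell cell : CELLS) {
            if (cell.max() < lo || cell.min() > hi) {
                continue; // cell is outside the query: skip it entirely
            }
            if (lo <= cell.min() && cell.max() <= hi) {
                for (int doc : cell.docs) {
                    hits.add(doc); // fully contained: collect without testing values
                }
            } else {
                for (int i = 0; i < cell.values.length; i++) {
                    valueChecks++; // partial overlap: every value must be tested
                    if (cell.values[i] >= lo && cell.values[i] <= hi) {
                        hits.add(cell.docs[i]);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<Integer> hits = search(100, 2002222);
        System.out.println(hits);        // [2, 3, 4, 5]
        System.out.println(valueChecks); // 2
    }
}
```

Only the one cell that straddles the upper bound is checked value by value; the two cells inside the range are collected in bulk.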
Performance testing (Lucene 6.0)
Point Fields
Links
Numeric Range Queries in Lucene/Solr
http://blog-archive.griddynamics.com/2014/10/numeric-range-queries-in-lucenesolr.html
Lucene Search Essentials: Scorers, Collectors and Custom Queries
https://www.slideshare.net/lucenerevolution/lucene-search-essentials-scorers-collectors-and-custom-queries-dublin13
Multi-dimensional points, coming in Apache Lucene 6.0
https://www.elastic.co/blog/lucene-points-6.0
Bkd-Tree: A Dynamic Scalable kd-Tree
https://users.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
The Evolution of Lucene & Solr Numerics from Strings to Points
https://www.slideshare.net/lucidworks/the-evolution-of-lucene-solr-numerics-from-strings-to-points-presented-by-steve-rowe-lucidworks?from_action=save
30
Privileged and Confidential 31

More Related Content

What's hot

Oracle db performance tuning
Oracle db performance tuningOracle db performance tuning
Oracle db performance tuningSimon Huang
 
The MySQL Query Optimizer Explained Through Optimizer Trace
The MySQL Query Optimizer Explained Through Optimizer TraceThe MySQL Query Optimizer Explained Through Optimizer Trace
The MySQL Query Optimizer Explained Through Optimizer Traceoysteing
 
New methods for exploiting ORM injections in Java applications
New methods for exploiting ORM injections in Java applicationsNew methods for exploiting ORM injections in Java applications
New methods for exploiting ORM injections in Java applicationsMikhail Egorov
 
TECH TALK 2021/08/10 一歩進んだQlikアプリの開発~Qlik専用QVDファイルでシステムの効率アップ
TECH TALK 2021/08/10 一歩進んだQlikアプリの開発~Qlik専用QVDファイルでシステムの効率アップTECH TALK 2021/08/10 一歩進んだQlikアプリの開発~Qlik専用QVDファイルでシステムの効率アップ
TECH TALK 2021/08/10 一歩進んだQlikアプリの開発~Qlik専用QVDファイルでシステムの効率アップQlikPresalesJapan
 
MySQL Query And Index Tuning
MySQL Query And Index TuningMySQL Query And Index Tuning
MySQL Query And Index TuningManikanda kumar
 
Qlik Alerting で実現する Qlik Sense Enterprise Client-managed (Windows版) の高度でインテリジ...
Qlik Alerting で実現する Qlik Sense Enterprise Client-managed (Windows版) の高度でインテリジ...Qlik Alerting で実現する Qlik Sense Enterprise Client-managed (Windows版) の高度でインテリジ...
Qlik Alerting で実現する Qlik Sense Enterprise Client-managed (Windows版) の高度でインテリジ...QlikPresalesJapan
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsCommand Prompt., Inc
 
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaThe Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaSpark Summit
 
MySQL Indexing : Improving Query Performance Using Index (Covering Index)
MySQL Indexing : Improving Query Performance Using Index (Covering Index)MySQL Indexing : Improving Query Performance Using Index (Covering Index)
MySQL Indexing : Improving Query Performance Using Index (Covering Index)Hemant Kumar Singh
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performanceoysteing
 
なかったらINSERTしたいし、あるならロック取りたいやん?
なかったらINSERTしたいし、あるならロック取りたいやん?なかったらINSERTしたいし、あるならロック取りたいやん?
なかったらINSERTしたいし、あるならロック取りたいやん?ichirin2501
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?lucenerevolution
 
Comprehensive View on Intervals in Apache Spark 3.2
Comprehensive View on Intervals in Apache Spark 3.2Comprehensive View on Intervals in Apache Spark 3.2
Comprehensive View on Intervals in Apache Spark 3.2Databricks
 
Packages,interfaces and exceptions
Packages,interfaces and exceptionsPackages,interfaces and exceptions
Packages,interfaces and exceptionsMavoori Soshmitha
 
Qlik Tips 日付データの取り扱い
Qlik Tips 日付データの取り扱いQlik Tips 日付データの取り扱い
Qlik Tips 日付データの取り扱いQlikPresalesJapan
 
OpenGurukul : Database : PostgreSQL
OpenGurukul : Database : PostgreSQLOpenGurukul : Database : PostgreSQL
OpenGurukul : Database : PostgreSQLOpen Gurukul
 

What's hot (20)

Oracle db performance tuning
Oracle db performance tuningOracle db performance tuning
Oracle db performance tuning
 
The MySQL Query Optimizer Explained Through Optimizer Trace
The MySQL Query Optimizer Explained Through Optimizer TraceThe MySQL Query Optimizer Explained Through Optimizer Trace
The MySQL Query Optimizer Explained Through Optimizer Trace
 
New methods for exploiting ORM injections in Java applications
New methods for exploiting ORM injections in Java applicationsNew methods for exploiting ORM injections in Java applications
New methods for exploiting ORM injections in Java applications
 
TECH TALK 2021/08/10 一歩進んだQlikアプリの開発~Qlik専用QVDファイルでシステムの効率アップ
TECH TALK 2021/08/10 一歩進んだQlikアプリの開発~Qlik専用QVDファイルでシステムの効率アップTECH TALK 2021/08/10 一歩進んだQlikアプリの開発~Qlik専用QVDファイルでシステムの効率アップ
TECH TALK 2021/08/10 一歩進んだQlikアプリの開発~Qlik専用QVDファイルでシステムの効率アップ
 
MySQL Query And Index Tuning
MySQL Query And Index TuningMySQL Query And Index Tuning
MySQL Query And Index Tuning
 
Qlik Alerting で実現する Qlik Sense Enterprise Client-managed (Windows版) の高度でインテリジ...
Qlik Alerting で実現する Qlik Sense Enterprise Client-managed (Windows版) の高度でインテリジ...Qlik Alerting で実現する Qlik Sense Enterprise Client-managed (Windows版) の高度でインテリジ...
Qlik Alerting で実現する Qlik Sense Enterprise Client-managed (Windows版) の高度でインテリジ...
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System Administrators
 
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaThe Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
 
MySQL Indexing : Improving Query Performance Using Index (Covering Index)
MySQL Indexing : Improving Query Performance Using Index (Covering Index)MySQL Indexing : Improving Query Performance Using Index (Covering Index)
MySQL Indexing : Improving Query Performance Using Index (Covering Index)
 
Oracle 索引介紹
Oracle 索引介紹Oracle 索引介紹
Oracle 索引介紹
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
 
なかったらINSERTしたいし、あるならロック取りたいやん?
なかったらINSERTしたいし、あるならロック取りたいやん?なかったらINSERTしたいし、あるならロック取りたいやん?
なかったらINSERTしたいし、あるならロック取りたいやん?
 
Exception handling
Exception handlingException handling
Exception handling
 
Performance tuning in sql server
Performance tuning in sql serverPerformance tuning in sql server
Performance tuning in sql server
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
 
How to Design Indexes, Really
How to Design Indexes, ReallyHow to Design Indexes, Really
How to Design Indexes, Really
 
Comprehensive View on Intervals in Apache Spark 3.2
Comprehensive View on Intervals in Apache Spark 3.2Comprehensive View on Intervals in Apache Spark 3.2
Comprehensive View on Intervals in Apache Spark 3.2
 
Packages,interfaces and exceptions
Packages,interfaces and exceptionsPackages,interfaces and exceptions
Packages,interfaces and exceptions
 
Qlik Tips 日付データの取り扱い
Qlik Tips 日付データの取り扱いQlik Tips 日付データの取り扱い
Qlik Tips 日付データの取り扱い
 
OpenGurukul : Database : PostgreSQL
OpenGurukul : Database : PostgreSQLOpenGurukul : Database : PostgreSQL
OpenGurukul : Database : PostgreSQL
 

Similar to Point field types in Solr. Evolution of the Range Queries.

Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comJungsu Heo
 
Writing efficient sql
Writing efficient sqlWriting efficient sql
Writing efficient sqlj9soto
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" DataArt
 
Interactive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupInteractive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupSease
 
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xOscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xshradha ambekar
 
Druid at naver.com - part 1
Druid at naver.com - part 1Druid at naver.com - part 1
Druid at naver.com - part 1Jungsu Heo
 
OQL querying and indexes with Apache Geode (incubating)
OQL querying and indexes with Apache Geode (incubating)OQL querying and indexes with Apache Geode (incubating)
OQL querying and indexes with Apache Geode (incubating)Jason Huynh
 
Scaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersScaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersJonathan Levin
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2Itamar Haber
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheusBob Cotton
 
MongoDB Roadmap
MongoDB RoadmapMongoDB Roadmap
MongoDB RoadmapMongoDB
 
SQL Server Deep Drive
SQL Server Deep Drive SQL Server Deep Drive
SQL Server Deep Drive DataArt
 
Top 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & TricksTop 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & TricksNeo4j
 
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptxGraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptxjexp
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr PerformanceLucidworks
 
Introducing Apache Carbon Data - Hadoop Native Columnar Data Format
Introducing Apache Carbon Data - Hadoop Native Columnar Data FormatIntroducing Apache Carbon Data - Hadoop Native Columnar Data Format
Introducing Apache Carbon Data - Hadoop Native Columnar Data FormatVimal Das Kammath
 
MongoDB Roadmap
MongoDB RoadmapMongoDB Roadmap
MongoDB RoadmapMongoDB
 

Similar to Point field types in Solr. Evolution of the Range Queries. (20)

Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.com
 
Writing efficient sql
Writing efficient sqlWriting efficient sql
Writing efficient sql
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
 
Interactive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupInteractive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval Meetup
 
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xOscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
 
Apache Solr for begginers
Apache Solr for begginersApache Solr for begginers
Apache Solr for begginers
 
Druid at naver.com - part 1
Druid at naver.com - part 1Druid at naver.com - part 1
Druid at naver.com - part 1
 
OQL querying and indexes with Apache Geode (incubating)
OQL querying and indexes with Apache Geode (incubating)OQL querying and indexes with Apache Geode (incubating)
OQL querying and indexes with Apache Geode (incubating)
 
Scaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersScaling MySQL Strategies for Developers
Scaling MySQL Strategies for Developers
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheus
 
MongoDB Roadmap
MongoDB RoadmapMongoDB Roadmap
MongoDB Roadmap
 
SQL Server Deep Drive
SQL Server Deep Drive SQL Server Deep Drive
SQL Server Deep Drive
 
Top 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & TricksTop 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & Tricks
 
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptxGraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr Performance
 
Introducing Apache Carbon Data - Hadoop Native Columnar Data Format
Introducing Apache Carbon Data - Hadoop Native Columnar Data FormatIntroducing Apache Carbon Data - Hadoop Native Columnar Data Format
Introducing Apache Carbon Data - Hadoop Native Columnar Data Format
 
MongoDB Roadmap
MongoDB RoadmapMongoDB Roadmap
MongoDB Roadmap
 

Recently uploaded

^Clinic ^%[+27788225528*Abortion Pills For Sale In soweto
^Clinic ^%[+27788225528*Abortion Pills For Sale In soweto^Clinic ^%[+27788225528*Abortion Pills For Sale In soweto
^Clinic ^%[+27788225528*Abortion Pills For Sale In sowetokasambamuno
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Eraconfluent
 
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...Neo4j
 
Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?Maxim Salnikov
 
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...OnePlan Solutions
 
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024MulesoftMunichMeetup
 
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale IbridaUNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale IbridaNeo4j
 
Jax, FL Admin Community Group 05.14.2024 Combined Deck
Jax, FL Admin Community Group 05.14.2024 Combined DeckJax, FL Admin Community Group 05.14.2024 Combined Deck
Jax, FL Admin Community Group 05.14.2024 Combined DeckMarc Lester
 
The Strategic Impact of Buying vs Building in Test Automation
The Strategic Impact of Buying vs Building in Test AutomationThe Strategic Impact of Buying vs Building in Test Automation
The Strategic Impact of Buying vs Building in Test AutomationElement34
 
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...drm1699
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Andreas Granig
 
Weeding your micro service landscape.pdf
Weeding your micro service landscape.pdfWeeding your micro service landscape.pdf
Weeding your micro service landscape.pdftimtebeek1
 
The Evolution of Web App Testing_ An Ultimate Guide to Future Trends.pdf
The Evolution of Web App Testing_ An Ultimate Guide to Future Trends.pdfThe Evolution of Web App Testing_ An Ultimate Guide to Future Trends.pdf
The Evolution of Web App Testing_ An Ultimate Guide to Future Trends.pdfkalichargn70th171
 
GraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with Graph
GraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with GraphGraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with Graph
GraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with GraphNeo4j
 
What is a Recruitment Management Software?
What is a Recruitment Management Software?What is a Recruitment Management Software?
What is a Recruitment Management Software?NYGGS Automation Suite
 
Transformer Neural Network Use Cases with Links
Transformer Neural Network Use Cases with LinksTransformer Neural Network Use Cases with Links
Transformer Neural Network Use Cases with LinksJinanKordab
 
The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)Roberto Bettazzoni
 
Effective Strategies for Wix's Scaling challenges - GeeCon
Effective Strategies for Wix's Scaling challenges - GeeConEffective Strategies for Wix's Scaling challenges - GeeCon
Effective Strategies for Wix's Scaling challenges - GeeConNatan Silnitsky
 
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Andrea Goulet
 

Recently uploaded (20)

^Clinic ^%[+27788225528*Abortion Pills For Sale In soweto
^Clinic ^%[+27788225528*Abortion Pills For Sale In soweto^Clinic ^%[+27788225528*Abortion Pills For Sale In soweto
^Clinic ^%[+27788225528*Abortion Pills For Sale In soweto
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
 
Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?
 
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
 
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
Anypoint Code Builder - Munich MuleSoft Meetup - 16th May 2024
 
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale IbridaUNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
 
Jax, FL Admin Community Group 05.14.2024 Combined Deck
Jax, FL Admin Community Group 05.14.2024 Combined DeckJax, FL Admin Community Group 05.14.2024 Combined Deck
Jax, FL Admin Community Group 05.14.2024 Combined Deck
 
The Strategic Impact of Buying vs Building in Test Automation
The Strategic Impact of Buying vs Building in Test AutomationThe Strategic Impact of Buying vs Building in Test Automation
The Strategic Impact of Buying vs Building in Test Automation
 
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
 
Abortion Clinic In Polokwane ](+27832195400*)[ 🏥 Safe Abortion Pills in Polok...
Abortion Clinic In Polokwane ](+27832195400*)[ 🏥 Safe Abortion Pills in Polok...Abortion Clinic In Polokwane ](+27832195400*)[ 🏥 Safe Abortion Pills in Polok...
Abortion Clinic In Polokwane ](+27832195400*)[ 🏥 Safe Abortion Pills in Polok...
 
Weeding your micro service landscape.pdf
Weeding your micro service landscape.pdfWeeding your micro service landscape.pdf
Weeding your micro service landscape.pdf
 
The Evolution of Web App Testing_ An Ultimate Guide to Future Trends.pdf
The Evolution of Web App Testing_ An Ultimate Guide to Future Trends.pdfThe Evolution of Web App Testing_ An Ultimate Guide to Future Trends.pdf
The Evolution of Web App Testing_ An Ultimate Guide to Future Trends.pdf
 
GraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with Graph
GraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with GraphGraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with Graph
GraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with Graph
 
What is a Recruitment Management Software?
What is a Recruitment Management Software?What is a Recruitment Management Software?
What is a Recruitment Management Software?
 
Transformer Neural Network Use Cases with Links
Transformer Neural Network Use Cases with LinksTransformer Neural Network Use Cases with Links
Transformer Neural Network Use Cases with Links
 
The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)
 
Effective Strategies for Wix's Scaling challenges - GeeCon
Effective Strategies for Wix's Scaling challenges - GeeConEffective Strategies for Wix's Scaling challenges - GeeCon
Effective Strategies for Wix's Scaling challenges - GeeCon
 
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
 

Point field types in Solr. Evolution of the Range Queries.

  • 1. Privileged and Confidential Point Field Types in Solr Evolution of Range Filters amikryukov@griddynamics.com
  • 2. Privileged and Confidential Agenda 1. Recap: From query parser to TopDocsCollector. 2. TermQuery search flow. 3. How Range Filters are implemented? 4. Optimizations for Range Filters. 5. Point Fields. 2
  • 3. Privileged and Confidential Recap: From query parser to TopDocsCollector ? What is query parser? 3
  • 4. Privileged and Confidential Recap: From query parser to TopDocsCollector ? What is query parser? ? What is the difference between Query and Scorer? And why you need a Collector? 4
  • 5. Privileged and Confidential Recap: Query execution flow LeafReader 5
  • 6. Privileged and Confidential Recap: From query parser to TopDocsCollector ? What is query parser? ? What is the difference between Query and Scorer? ? What is the difference between TermsEnum and PosingsEnum? 6
  • 7. Recap: inverted index, terms, posting list.
  • 8. TermQuery search flow: q=Price:10
  • 9. TermQuery search flow: q=Price:10 -> TermQuery(field='Price', val='10'); TermQuery -> TermWeight -> TermScorer
  • 10. TermQuery: q=Price:10. Idea: iterate over the posting list of the term `10`.
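
The TermQuery idea from slide 10 can be sketched roughly as follows. This is a toy Python model rather than Lucene code: the `POSTINGS` data, the class `ToyTermScorer`, and the function `collect` are hypothetical names invented for illustration, and every hit gets a constant score.

```python
# Toy model of the TermQuery flow; all names and data here are illustrative
# assumptions, not Lucene classes.
NO_MORE_DOCS = 2**31 - 1  # sentinel doc id (Lucene uses Integer.MAX_VALUE)

# Hypothetical inverted index for the field Price: term -> sorted doc ids.
POSTINGS = {"10": [1, 4, 9], "20": [2, 4]}


class ToyTermScorer:
    """Iterates over the posting list of a single term."""

    def __init__(self, doc_ids, score=1.0):
        self._doc_ids = doc_ids
        self._pos = -1
        self._score = score

    def next_doc(self):
        """Advance to the next matching doc id, or NO_MORE_DOCS at the end."""
        self._pos += 1
        if self._pos < len(self._doc_ids):
            return self._doc_ids[self._pos]
        return NO_MORE_DOCS

    def score(self):
        """Constant score; a real scorer would use term statistics."""
        return self._score


def collect(scorer, limit):
    """Play the collector role: pull hits from the scorer, keep at most `limit`."""
    hits = []
    doc = scorer.next_doc()
    while doc != NO_MORE_DOCS and len(hits) < limit:
        hits.append((doc, scorer.score()))
        doc = scorer.next_doc()
    return hits


# q=Price:10 -> look up the term, then iterate over its posting list.
print(collect(ToyTermScorer(POSTINGS.get("10", [])), limit=2))
# [(1, 1.0), (4, 1.0)]
```

The split mirrors the recap questions: the scorer only knows how to walk matching documents, while the collector decides which hits to keep.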
  • 13. How are Range Filters implemented? Term -> document ids: 421 -> [1], 423 -> [2], 445 -> [3], 446 -> [3], 448 -> [4], 521 -> [5], 522 -> [7], 632 -> [5], 633 -> [6], 634 -> [7], 641 -> [5], 642 -> [6], 644 -> [7]. Query: q=PRICE:[423 TO 642]
  • 14. How are Range Filters implemented? Term -> document ids: 421 -> [1], 423 -> [2], 445 -> [3], 446 -> [3], 448 -> [4], 521 -> [5], 522 -> [7], 632 -> [5], 633 -> [6], 634 -> [7], 641 -> [5], 642 -> [6], 644 -> [7]. Query: q=PRICE:[423 TO 642] is rewritten as q=PRICE:423 PRICE:445 PRICE:446 … PRICE:642
  • 15. MultiTermQuery. Term -> document ids: 421 -> [1], 423 -> [2], 445 -> [3], 446 -> [3], 448 -> [4], 521 -> [5], 522 -> [7], 632 -> [5], 633 -> [6], 634 -> [7], 641 -> [5], 642 -> [6], 644 -> [7]. Query: q=PRICE:[423 TO 642] is rewritten as q=PRICE:423 PRICE:445 PRICE:446 … PRICE:642
  • 16. Naive implementation. Term -> document ids: 421 -> [1], 423 -> [2], 445 -> [3], 446 -> [3], 448 -> [4], 521 -> [5], 522 -> [7], 632 -> [5], 633 -> [6], 634 -> [7], 641 -> [5], 642 -> [6], 644 -> [7]. Query: q=PRICE:[423 TO 642] is rewritten as q=PRICE:423 PRICE:445 PRICE:446 … PRICE:642. In total: 11 SHOULD clauses.
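
The naive expansion above can be reproduced with a small Python sketch. It is an illustration under stated assumptions, not Lucene's MultiTermQuery code: the function names `expand_range` and `naive_range_search` are hypothetical, and the term dictionary is modelled as a sorted list of integers.

```python
from bisect import bisect_left, bisect_right

# Toy inverted index copied from the slides: term -> document ids.
INDEX = {
    421: [1], 423: [2], 445: [3], 446: [3], 448: [4],
    521: [5], 522: [7], 632: [5], 633: [6], 634: [7],
    641: [5], 642: [6], 644: [7],
}


def expand_range(index, low, high):
    """Return every indexed term in [low, high]; each one becomes a SHOULD clause."""
    terms = sorted(index)
    return terms[bisect_left(terms, low):bisect_right(terms, high)]


def naive_range_search(index, low, high):
    """Union the posting lists of all expanded terms."""
    docs = set()
    for term in expand_range(index, low, high):
        docs.update(index[term])
    return sorted(docs)


clauses = expand_range(INDEX, 423, 642)
print(len(clauses))                         # 11
print(naive_range_search(INDEX, 423, 642))  # [2, 3, 4, 5, 6, 7]
```

The clause count grows with the number of distinct terms inside the range, which is the cost the following slides try to reduce.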
  • 17. Optimizations for Range Filters. How can we improve the naive implementation of RangeFilterQuery? Original values: 421 -> [1], 423 -> [2], 445 -> [3], 446 -> [3], 448 -> [4], 521 -> [5], 522 -> [7], 632 -> [5], 633 -> [6], 634 -> [7], 641 -> [5], 642 -> [6], 644 -> [7].
  • 18. Trie. Original values: 421 -> [1], 423 -> [2], 445 -> [3], 446 -> [3], 448 -> [4], 521 -> [5], 522 -> [7], 632 -> [5], 633 -> [6], 634 -> [7], 641 -> [5], 642 -> [6], 644 -> [7].
  • 19. Trie*Field index time (since Lucene 2.9). Original values: 421 -> [1], 423 -> [2], 445 -> [3], 446 -> [3], 448 -> [4], 521 -> [5], 522 -> [7], 632 -> [5], 633 -> [6], 634 -> [7], 641 -> [5], 642 -> [6], 644 -> [7]. Additional values: 42* -> [1, 2], 44* -> [3, 4], 52* -> [5, 7], 63* -> [5, 6, 7], 64* -> [5, 6, 7], 4** -> [1, 2, 3, 4], 5** -> [5, 7], 6** -> [5, 6, 7]. Precision levels in the diagram: Shift 2, Shift 1, Shift 0. Exploit the Trie*Field.
  • 20. Trie*Field query time. Original values: 421 -> [1], 423 -> [2], 445 -> [3], 446 -> [3], 448 -> [4], 521 -> [5], 522 -> [7], 632 -> [5], 633 -> [6], 634 -> [7], 641 -> [5], 642 -> [6], 644 -> [7]. Additional values: 42* -> [1, 2], 44* -> [3, 4], 52* -> [5, 7], 63* -> [5, 6, 7], 64* -> [5, 6, 7], 4** -> [1, 2, 3, 4], 5** -> [5, 7], 6** -> [5, 6, 7]. Exploit the Trie*Field. In total: 6 SHOULD clauses in the end.
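
The Trie*Field idea above can be sketched in Python as a decimal analogue. This is a simplification under stated assumptions: Lucene's trie fields shift bits rather than decimal digits, and the names `build_trie_index`, `split_range`, and `trie_range_search` are hypothetical. A term is modelled as a `(shift, prefix)` pair, so `(1, 44)` stands for `44*` and `(2, 5)` for `5**`.

```python
from collections import defaultdict

# Original values from the slides: value -> document ids.
ORIGINAL = {
    421: [1], 423: [2], 445: [3], 446: [3], 448: [4],
    521: [5], 522: [7], 632: [5], 633: [6], 634: [7],
    641: [5], 642: [6], 644: [7],
}
MAX_SHIFT = 2  # assumption: up to two trailing decimal digits may be dropped


def build_trie_index(original, max_shift=MAX_SHIFT):
    """Index time: store each value at every precision level."""
    index = defaultdict(set)
    for value, docs in original.items():
        for shift in range(max_shift + 1):
            index[(shift, value // 10 ** shift)].update(docs)
    return index


def split_range(low, high, max_shift=MAX_SHIFT):
    """Query time: cover [low, high] with the largest aligned blocks that fit."""
    blocks = []
    pos = low
    while pos <= high:
        shift = max_shift
        # Lower the precision level until the block is aligned and inside the range.
        while shift > 0 and (pos % 10 ** shift != 0 or pos + 10 ** shift - 1 > high):
            shift -= 1
        blocks.append((shift, pos // 10 ** shift))
        pos += 10 ** shift
    return blocks


def trie_range_search(index, low, high):
    """Keep only blocks that exist as terms, then union their posting lists."""
    clauses = [block for block in split_range(low, high) if block in index]
    docs = set()
    for clause in clauses:
        docs |= index[clause]
    return clauses, sorted(docs)


index = build_trie_index(ORIGINAL)
clauses, docs = trie_range_search(index, 423, 642)
print(clauses)  # [(0, 423), (1, 44), (2, 5), (1, 63), (0, 641), (0, 642)]
print(docs)     # [2, 3, 4, 5, 6, 7]
```

The six surviving clauses correspond to 423, 44*, 5**, 63*, 641 and 642, and they match the same documents as the 11 clauses of the naive expansion.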
  • 21. Isn't it enough? Distribution of terms: the trie-based approach does not take the distribution of the terms into account. Query: q=PRICE:[100 TO 2002222]. Original values: 1 -> [1], 100 -> [2], 2000001 -> [3], 2000022 -> [3], 2000222 -> [4], 2002222 -> [5], 50000005 -> [7].
  • 22. Isn't it enough? IO efficiency: we need to store all original and additional values, and we need to read all terms of the field at search time. Original values: 1 -> [1], 100 -> [2], 2000001 -> [3], 2000022 -> [3], 2000222 -> [4], 2002222 -> [5], 50000005 -> [7]. Additional values: 10* -> [2], 1** -> [1, 2], 200002* -> [3], 200022* -> [4], 20002** -> [4], 200**** -> [3, 4, 5], 200222* -> [5], 20022** -> [5], 2002*** -> [5].
  • 23. Point Fields (since Lucene 6.0). This feature replaces the now-deprecated numeric fields (Trie*Field) and numeric range query, since it has better overall performance and is more general, allowing multiple dimensions.
      ● Based on "Bkd-Tree: A Dynamic Scalable kd-Tree". It naturally adapts to each data set's particular distribution, in contrast to legacy numeric fields, which always index the same precision levels for every value regardless of how the points are distributed.
      ● Most of the data structure resides in on-disk blocks, with a small in-heap binary tree index structure to locate the blocks at search time.
      ● Supports multi-dimensional points (maps, 3D models).
  • 24. Bkd-Tree: a binary space partitioning tree; the B stands for "blocked". In the example, the number of points in a cell is 2.
  • 25. Bkd-Tree adapts to the particular distribution. Example from https://www.elastic.co/blog/lucene-points-6.0
  • 26. Point Fields: index time (diagram labels: Disk, Heap). In Lucene, the number of points in a cell is 1024.
  • 27. Point Fields: search time (diagram labels: Disk, Heap). Query: q=PRICE:[100 TO 2002222]. If a block only partially overlaps the query, we have to check every value inside it. If a block is fully contained within the query, the documents with values in that cell are collected efficiently without having to test each point.
  • 28. Performance testing (Lucene 6.0).
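
The search-time rule above (skip cells outside the query, collect fully contained cells without testing, check every value in partially overlapping cells) can be sketched for the one-dimensional case. This is a minimal in-memory Python illustration, assuming two points per cell as in the Bkd-Tree example slide; it does not model Lucene's on-disk blocks, and the names `Node`, `build`, and `range_search` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[int, int]  # (value, document id)


@dataclass
class Node:
    lo: int                               # smallest value in this cell
    hi: int                               # largest value in this cell
    points: Optional[List[Point]] = None  # set on leaf cells only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(sorted_points: List[Point], max_points_in_leaf: int = 2) -> Node:
    """Recursively split the sorted points at the median until cells are small."""
    lo, hi = sorted_points[0][0], sorted_points[-1][0]
    if len(sorted_points) <= max_points_in_leaf:
        return Node(lo, hi, points=sorted_points)
    mid = len(sorted_points) // 2
    return Node(lo, hi,
                left=build(sorted_points[:mid], max_points_in_leaf),
                right=build(sorted_points[mid:], max_points_in_leaf))


def collect_all(node: Node, docs: set) -> None:
    """Add every document in the subtree without looking at the values."""
    if node.points is not None:
        docs.update(doc for _, doc in node.points)
    else:
        collect_all(node.left, docs)
        collect_all(node.right, docs)


def range_search(root: Node, low: int, high: int):
    """Return (matching doc ids, number of individual values that were tested)."""
    docs, checked = set(), 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node.hi < low or node.lo > high:
            continue                      # cell is outside the query
        if low <= node.lo and node.hi <= high:
            collect_all(node, docs)       # cell is fully inside the query
        elif node.points is not None:
            for value, doc in node.points:  # cell crosses the query boundary
                checked += 1
                if low <= value <= high:
                    docs.add(doc)
        else:
            stack.extend((node.left, node.right))
    return sorted(docs), checked


# Original values from the "Isn't it enough?" slides: (value, doc id).
POINTS = sorted([(1, 1), (100, 2), (2000001, 3), (2000022, 3),
                 (2000222, 4), (2002222, 5), (50000005, 7)])
tree = build(POINTS)
print(range_search(tree, 100, 2002222))  # ([2, 3, 4, 5], 2)
```

Only the two values in the single boundary cell are tested; the other matching cells are collected wholesale, and the cells are shaped by the data rather than by fixed precision levels.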
  • 30. Links:
      ● Numeric Range Queries in Lucene/Solr: http://blog-archive.griddynamics.com/2014/10/numeric-range-queries-in-lucenesolr.html
      ● Lucene Search Essentials: Scorers, Collectors and Custom Queries: https://www.slideshare.net/lucenerevolution/lucene-search-essentials-scorers-collectors-and-custom-queries-dublin13
      ● Multi-dimensional points, coming in Apache Lucene 6.0: https://www.elastic.co/blog/lucene-points-6.0
      ● Bkd-Tree: A Dynamic Scalable kd-Tree: https://users.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
      ● The Evolution of Lucene & Solr Numerics from Strings to Points: https://www.slideshare.net/lucidworks/the-evolution-of-lucene-solr-numerics-from-strings-to-points-presented-by-steve-rowe-lucidworks?from_action=save