SlideShare a Scribd company logo
1 of 56
Download to read offline
Approaching Join Index 
yet another one join algorithm 
Mikhail Khludnev 
principal engineer
• Grid Dynamics is a Silicon Valley-based leading provider of scalable, next-generation 
commerce technology solutions 
• Record of outperformance with Tier 1 retail clients 
• Fortune 1000 client relationships 
PRIVILEGED AND CONFIDENTIAL
About Me 
● principal engineer at Grid Dynamics 
● spoke at few last LuceneRevolutions 
● contributed BlockJoin query parser for Solr - {!parent} 
● blogged about it at http://blog.griddynamics.com/ 
● tried to fix threads at DataImportHandler 
http://google.com/+MikhailKhludnev
You are expected to know 
● how Lucene searches/filters 
● how it counts facets 
● that there are segments 
● what is DocValues 
● why to join 
● RDBMS joins: nested loop join, sort-merge join and hash join.
I’m expected to know 
● query-time join 
● index-time join 
● yet another one join
Lucene/Solr Is Strong 
● searching 
○ filtering 
● analytics 
○ facets 
○ pivots 
○ stats
Lucene/Solr Is Weak in 
● robust joins 
○ multiple entities 
○ relations
Joins in General
Joins in General
Joins in General 
PK=FK
Joins in General 
PK=FK
Joins in General 
PK=FK
children 
Joins in General 
1:M 
parents
Executing Join Query 
q
Executing Join Query 
q
Executing Join Query 
q
Executing Join Query 
q 
fq
Executing Join Query 
q 
fq
Join in General 
parents ∩ join-relation ∩ children
Join in General 
parents ∩ join-relation ∩ children 
● input 
● output 
● enumeration order
JoinUtil 
q 
FK[doc#] 
“17” 
“17” 
“25” 
“25” 
“56” 
“56” 
“56” 
“25” 
“4” 
“61” 
“25”
JoinUtil 
q 
FK[doc#] 
“17” 
“17” 
“25” 
“25” 
“56” 
“56” 
“56” 
“25” 
“4” 
“61” 
“25” 
“25” 
“17” 
...
JoinUtil 
q 
FK 
“1” →△ 
… 
“17”→△ 
“25”→△ 
... 
“25” 
“17” 
... 
FK[doc#] 
“17” 
“17” 
“25” 
“25” 
“56” 
“56” 
“56” 
“25” 
“4” 
“61” 
“25”
JoinUtil 
q 
FK FK[doc#] 
“1” →△ 
… 
“17”→△ 
“25”→△ 
... 
“25” 
“17” 
... 
fq 
“17” 
“17” 
“25” 
“25” 
“56” 
“56” 
“56” 
“25” 
“4” 
“61” 
“25”
Block Join 
doc#
Block Join 
doc#
Block Join 
1 doc# 
0 
0 
1 
0 
0 
1 
0 
0 
1 
0
Block Join 
1 doc# 
0 
0 
1 
0 
0 
1 
0 
0 
1 
0 
q
Block Join 
1 doc# 
0 
0 
1 
0 
0 
1 
0 
0 
1 
0 
q 
fq
Block Join 
1 doc# 
0 
0 
1 
0 
0 
1 
0 
0 
1 
0 
q 
fq
Comparison 
JoinUtil BlockJoin 
searching 
is fast 
reindexing 
is fast
Comparison 
JoinUtil BlockJoin 
searching 
is fast 
reindexing 
is fast
Comparison 
JoinUtil BlockJoin 
searching 
is fast < ? < 
reindexing 
is fast > ? >
Comparison 
JoinUtil BlockJoin 
searching 
is fast < ? < 
reindexing 
is fast > ? > 
doc#
Comparison 
JoinUtil BlockJoin 
searching 
is fast < ? < 
reindexing 
is fast > ? > 
doc#
Comparison 
JoinUtil BlockJoin 
searching 
is fast < ? < 
reindexing 
is fast > ? > 
doc#
Join Index 
q 
doc#[doc#] 
3 
6 
0 
3 
10 
0 
6 
3
Join Index 
q 
doc#[doc#] 
3 
6 
0 
3 
10 
0 
6 
3
Join Index 
q 
doc#[doc#] 
fq 3 
6 
0 
3 
10 
0 
6 
3
Join Index 
q 
fq 
doc#[doc#] 
4 
1 
2 
6 
8 
5 10 
9
Join Index 
q 
doc#[doc#] 
fq 
4 
1 
2 
6 
8 
5 10 
9
Join Index 
q 
doc#[doc#] 
fq 
4 
1 
2 
6 
8 
5 10 
9
Benchmarking 2.9 M docs 
Latency, ms 
the bigger the worse 
27 
14 
10 
BlockJoin 
(i-time) 
Join Index 
JoinUtil 
(q-time) 
https://github.com/m-khl/lucene-solr/tree/dvjoin-benchmark
Indexing is still a problem 
doc#[doc#] 
4 
3 
6 
1 
0 
3 
2 
10 
0 
6 
3 
8 
5 10 
9
Further Plans 
● incremental join-index update 
● perhaps just calculate and cache it 
● join in both directions 
● calculate optimal execution plan of segments enumeration
Summary 
● Joins in General 
● JoinUtil vs Block-join 
● {!scorejoin } - SOLR-6234 
● updatable DocValues 
● opportunities for improving query-time joins: 
○ eliminate term enum 
○ choose lower cardinality side for enumeration
References 
● Searching relational content with Lucene's BlockJoinQuery 
http://blog.mikemccandless.com/ 
● Solr Experience: search parent-child relations. Part I 
Solr block-join support 
http://blog.griddynamics.com/ 
● https://wiki.apache.org/solr/Join 
● http://www.slideshare.net/martijnvg/document-relations 
● https://issues.apache.org/jira/browse/SOLR-6234 {!scorejoin } 
● Updatable DocValues Under the Hood 
http://shaierera.blogspot.com/ 
● Subject: How to openIfChanged the most recent merge? 
at: java-dev@lucene.apache.org
Thanks for Joining us!
Off scope
Joins’ Zoo in Lucene 
True Joins 
● query-time join 
○ JoinUtil 
○ {!join } 
○ {!scorejoin } - SOLR-6234 
● index-time join aka block-join {! 
parent}
Joins’ Zoo in Lucene 
Workarounds 
● term positions/SpanQueries 
● FieldCollapsing/Grouping 
● term decoration 
○ spatial 
● multivalue fields 
True Joins 
● query-time join 
○ JoinUtil 
○ {!join }, 
○ {!scorejoin } - SOLR-6234 
● index-time join aka block-join {! 
parent}
Joins’ Zoo in Lucene 
Workarounds 
● term positions/SpanQueries 
● FieldCollapsing/Grouping 
● term decoration 
○ spatial 
● multivalue fields 
True Joins 
● query-time join 
○ JoinUtil 
○ {!join }, 
○ {!scorejoin } - SOLR-6234 
● index-time join aka block-join {! 
parent}
Two phase update problem 
Subject: How to openIfChanged the most recent merge? 
at: java-dev@lucene.apache.org
JoinUtil 
● query-time 
● indexing is fast 
● searching is slow, why? 
○ expensive term enum 
○ single enumeration order 
BlockJoin 
● index-time 
● reindexing whole block is as expensive as mandatory 
● searching is darn fast, however 
○ can’t reorder child docs
Join Index Yet Another Join Algorithm

More Related Content

What's hot

Learning W3C Linked Data Platform with examples
Learning W3C Linked Data Platform with examplesLearning W3C Linked Data Platform with examples
Learning W3C Linked Data Platform with examplesNandana Mihindukulasooriya
 
Introduction to Linked Data Platform (LDP)
Introduction to Linked Data Platform (LDP)Introduction to Linked Data Platform (LDP)
Introduction to Linked Data Platform (LDP)Hector Correa
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Sematext Group, Inc.
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
LDP4j: A framework for the development of interoperable read-write Linked Da...
LDP4j: A framework for the development of interoperable read-write Linked Da...LDP4j: A framework for the development of interoperable read-write Linked Da...
LDP4j: A framework for the development of interoperable read-write Linked Da...Nandana Mihindukulasooriya
 
Describing LDP Applications with the Hydra Core Vocabulary
Describing LDP Applications with the Hydra Core VocabularyDescribing LDP Applications with the Hydra Core Vocabulary
Describing LDP Applications with the Hydra Core VocabularyNandana Mihindukulasooriya
 
W3C Linked Data Platform Overview
W3C Linked Data Platform OverviewW3C Linked Data Platform Overview
W3C Linked Data Platform OverviewSteve Speicher
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)Karel Minarik
 
Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1YI-CHING WU
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
 
REST meets Semantic Web
REST meets Semantic WebREST meets Semantic Web
REST meets Semantic WebSteve Speicher
 
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...Lucidworks
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache SolrAndy Jackson
 
Sem tech 2010_integrity_constraints
Sem tech 2010_integrity_constraintsSem tech 2010_integrity_constraints
Sem tech 2010_integrity_constraintsClark & Parsia LLC
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksErik Hatcher
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 

What's hot (20)

Learning W3C Linked Data Platform with examples
Learning W3C Linked Data Platform with examplesLearning W3C Linked Data Platform with examples
Learning W3C Linked Data Platform with examples
 
Introduction to Linked Data Platform (LDP)
Introduction to Linked Data Platform (LDP)Introduction to Linked Data Platform (LDP)
Introduction to Linked Data Platform (LDP)
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Introduction to W3C Linked Data Platform
Introduction to W3C Linked Data PlatformIntroduction to W3C Linked Data Platform
Introduction to W3C Linked Data Platform
 
LDP4j: A framework for the development of interoperable read-write Linked Da...
LDP4j: A framework for the development of interoperable read-write Linked Da...LDP4j: A framework for the development of interoperable read-write Linked Da...
LDP4j: A framework for the development of interoperable read-write Linked Da...
 
Describing LDP Applications with the Hydra Core Vocabulary
Describing LDP Applications with the Hydra Core VocabularyDescribing LDP Applications with the Hydra Core Vocabulary
Describing LDP Applications with the Hydra Core Vocabulary
 
W3C Linked Data Platform Overview
W3C Linked Data Platform OverviewW3C Linked Data Platform Overview
W3C Linked Data Platform Overview
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)
 
Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
REST meets Semantic Web
REST meets Semantic WebREST meets Semantic Web
REST meets Semantic Web
 
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
 
How Solr Search Works
How Solr Search WorksHow Solr Search Works
How Solr Search Works
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Sem tech 2010_integrity_constraints
Sem tech 2010_integrity_constraintsSem tech 2010_integrity_constraints
Sem tech 2010_integrity_constraints
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Tricks
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 

Similar to Join Index Yet Another Join Algorithm

Building Bridges with Taxonomy: Enabling Semantic Integration
Building Bridges with Taxonomy: Enabling Semantic IntegrationBuilding Bridges with Taxonomy: Enabling Semantic Integration
Building Bridges with Taxonomy: Enabling Semantic IntegrationDesign for Context
 
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutionsAshwin Kumar
 
Migrating to Database 12c Multitenant - New Opportunities To Get It Right!
Migrating to Database 12c Multitenant - New Opportunities To Get It Right!Migrating to Database 12c Multitenant - New Opportunities To Get It Right!
Migrating to Database 12c Multitenant - New Opportunities To Get It Right!Performance Tuning Corporation
 
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docxtienmixon
 
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docxadkinspaige22
 
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docxpoulterbarbara
 
Qcon beijing 2010
Qcon beijing 2010Qcon beijing 2010
Qcon beijing 2010Vonbo
 
Bring your Content into Focus using Data Driven Analytics.pptx
Bring your Content into Focus using Data Driven Analytics.pptxBring your Content into Focus using Data Driven Analytics.pptx
Bring your Content into Focus using Data Driven Analytics.pptxRachelWalters28
 
Webinar: Deploy Microsoft Teams and stay in control
Webinar: Deploy Microsoft Teams and stay in controlWebinar: Deploy Microsoft Teams and stay in control
Webinar: Deploy Microsoft Teams and stay in controlShareGate
 
Navigating the mess of a Shared Network Drive Migration to SharePoint - SPS B...
Navigating the mess of a Shared Network Drive Migration to SharePoint - SPS B...Navigating the mess of a Shared Network Drive Migration to SharePoint - SPS B...
Navigating the mess of a Shared Network Drive Migration to SharePoint - SPS B...Joanne Klein
 
10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise WideDatabricks
 
Spsbe 18-04-15 - should i move my network folders to office 365
Spsbe   18-04-15 - should i move my network folders to office 365Spsbe   18-04-15 - should i move my network folders to office 365
Spsbe 18-04-15 - should i move my network folders to office 365BIWUG
 
10 Reasons Snowflake Is Great for Analytics
10 Reasons Snowflake Is Great for Analytics10 Reasons Snowflake Is Great for Analytics
10 Reasons Snowflake Is Great for AnalyticsSenturus
 
The Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixThe Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixCodemotion Tel Aviv
 
Pl sql student guide v 3
Pl sql student guide v 3Pl sql student guide v 3
Pl sql student guide v 3Nexus
 
Towards a common deposit api (the dataverse example) Elizabeth Quigley + Phil...
Towards a common deposit api (the dataverse example) Elizabeth Quigley + Phil...Towards a common deposit api (the dataverse example) Elizabeth Quigley + Phil...
Towards a common deposit api (the dataverse example) Elizabeth Quigley + Phil...datascienceiqss
 
Google for Work Applications: Enterprise-Class Collaboration and Search Integ...
Google for Work Applications: Enterprise-Class Collaboration and Search Integ...Google for Work Applications: Enterprise-Class Collaboration and Search Integ...
Google for Work Applications: Enterprise-Class Collaboration and Search Integ...Fishbowl Solutions
 
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Databricks
 
Intern Project Showcase.pptx
Intern Project Showcase.pptxIntern Project Showcase.pptx
Intern Project Showcase.pptxritikgarg48
 
Iterative Methodology for Personalization Models Optimization
 Iterative Methodology for Personalization Models Optimization Iterative Methodology for Personalization Models Optimization
Iterative Methodology for Personalization Models OptimizationSonya Liberman
 

Similar to Join Index Yet Another Join Algorithm (20)

Building Bridges with Taxonomy: Enabling Semantic Integration
Building Bridges with Taxonomy: Enabling Semantic IntegrationBuilding Bridges with Taxonomy: Enabling Semantic Integration
Building Bridges with Taxonomy: Enabling Semantic Integration
 
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
 
Migrating to Database 12c Multitenant - New Opportunities To Get It Right!
Migrating to Database 12c Multitenant - New Opportunities To Get It Right!Migrating to Database 12c Multitenant - New Opportunities To Get It Right!
Migrating to Database 12c Multitenant - New Opportunities To Get It Right!
 
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
 
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
 
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
 
Qcon beijing 2010
Qcon beijing 2010Qcon beijing 2010
Qcon beijing 2010
 
Bring your Content into Focus using Data Driven Analytics.pptx
Bring your Content into Focus using Data Driven Analytics.pptxBring your Content into Focus using Data Driven Analytics.pptx
Bring your Content into Focus using Data Driven Analytics.pptx
 
Webinar: Deploy Microsoft Teams and stay in control
Webinar: Deploy Microsoft Teams and stay in controlWebinar: Deploy Microsoft Teams and stay in control
Webinar: Deploy Microsoft Teams and stay in control
 
Navigating the mess of a Shared Network Drive Migration to SharePoint - SPS B...
Navigating the mess of a Shared Network Drive Migration to SharePoint - SPS B...Navigating the mess of a Shared Network Drive Migration to SharePoint - SPS B...
Navigating the mess of a Shared Network Drive Migration to SharePoint - SPS B...
 
10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide
 
Spsbe 18-04-15 - should i move my network folders to office 365
Spsbe   18-04-15 - should i move my network folders to office 365Spsbe   18-04-15 - should i move my network folders to office 365
Spsbe 18-04-15 - should i move my network folders to office 365
 
10 Reasons Snowflake Is Great for Analytics
10 Reasons Snowflake Is Great for Analytics10 Reasons Snowflake Is Great for Analytics
10 Reasons Snowflake Is Great for Analytics
 
The Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixThe Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, Wix
 
Pl sql student guide v 3
Pl sql student guide v 3Pl sql student guide v 3
Pl sql student guide v 3
 
Towards a common deposit api (the dataverse example) Elizabeth Quigley + Phil...
Towards a common deposit api (the dataverse example) Elizabeth Quigley + Phil...Towards a common deposit api (the dataverse example) Elizabeth Quigley + Phil...
Towards a common deposit api (the dataverse example) Elizabeth Quigley + Phil...
 
Google for Work Applications: Enterprise-Class Collaboration and Search Integ...
Google for Work Applications: Enterprise-Class Collaboration and Search Integ...Google for Work Applications: Enterprise-Class Collaboration and Search Integ...
Google for Work Applications: Enterprise-Class Collaboration and Search Integ...
 
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
 
Intern Project Showcase.pptx
Intern Project Showcase.pptxIntern Project Showcase.pptx
Intern Project Showcase.pptx
 
Iterative Methodology for Personalization Models Optimization
 Iterative Methodology for Personalization Models Optimization Iterative Methodology for Personalization Models Optimization
Iterative Methodology for Personalization Models Optimization
 

More from Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceLucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesLucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchLucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondLucidworks
 

More from Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Recently uploaded

VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 

Join Index Yet Another Join Algorithm

  • 1.
  • 2. Approaching Join Index yet another one join algorithm Mikhail Khludnev principal engineer
  • 3. • Grid Dynamics is a Silicon Valley-based leading provider of scalable, next-generation commerce technology solutions • Record of outperformance with Tier 1 retail clients • Fortune 1000 client relationships PRIVILEGED AND CONFIDENTIAL
  • 4. About Me ● principal engineer at Grid Dynamics ● spoke at few last LuceneRevolutions ● contributed BlockJoin query parser for Solr - {!parent} ● blogged about it at http://blog.griddynamics.com/ ● tried to fix threads at DataImportHandler http://google.com/+MikhailKhludnev
  • 5. You are expected to know ● how Lucene searches/filters ● how it counts facets ● that there are segments ● what is DocValues ● why to join ● RDBMS joins: nested loop join, sort-merge join and hash join.
  • 6. I’m expected to know ● query-time join ● index-time join ● yet another one join
  • 7. Lucene/Solr Is Strong ● searching ○ filtering ● analytics ○ facets ○ pivots ○ stats
  • 8. Lucene/Solr Is Weak in ● robust joins ○ multiple entities ○ relations
  • 14. children Joins in General 1:M parents
  • 20. Join in General parents ∩ join-relation ∩ children
  • 21. Join in General parents ∩ join-relation ∩ children ● input ● output ● enumeration order
  • 22. JoinUtil q FK[doc#] “17” “17” “25” “25” “56” “56” “56” “25” “4” “61” “25”
  • 23. JoinUtil q FK[doc#] “17” “17” “25” “25” “56” “56” “56” “25” “4” “61” “25” “25” “17” ...
  • 24. JoinUtil q FK “1” →△ … “17”→△ “25”→△ ... “25” “17” ... FK[doc#] “17” “17” “25” “25” “56” “56” “56” “25” “4” “61” “25”
  • 25. JoinUtil q FK FK[doc#] “1” →△ … “17”→△ “25”→△ ... “25” “17” ... fq “17” “17” “25” “25” “56” “56” “56” “25” “4” “61” “25”
  • 28. Block Join 1 doc# 0 0 1 0 0 1 0 0 1 0
  • 29. Block Join 1 doc# 0 0 1 0 0 1 0 0 1 0 q
  • 30. Block Join 1 doc# 0 0 1 0 0 1 0 0 1 0 q fq
  • 31. Block Join 1 doc# 0 0 1 0 0 1 0 0 1 0 q fq
  • 32. Comparison JoinUtil BlockJoin searching is fast reindexing is fast
  • 33. Comparison JoinUtil BlockJoin searching is fast reindexing is fast
  • 34. Comparison JoinUtil BlockJoin searching is fast < ? < reindexing is fast > ? >
  • 35. Comparison JoinUtil BlockJoin searching is fast < ? < reindexing is fast > ? > doc#
  • 36. Comparison JoinUtil BlockJoin searching is fast < ? < reindexing is fast > ? > doc#
  • 37. Comparison JoinUtil BlockJoin searching is fast < ? < reindexing is fast > ? > doc#
  • 38. Join Index q doc#[doc#] 3 6 0 3 10 0 6 3
  • 39. Join Index q doc#[doc#] 3 6 0 3 10 0 6 3
  • 40. Join Index q doc#[doc#] fq 3 6 0 3 10 0 6 3
  • 41. Join Index q fq doc#[doc#] 4 1 2 6 8 5 10 9
  • 42. Join Index q doc#[doc#] fq 4 1 2 6 8 5 10 9
  • 43. Join Index q doc#[doc#] fq 4 1 2 6 8 5 10 9
  • 44. Benchmarking 2.9 M docs Latency, ms the bigger the worse 27 14 10 BlockJoin (i-time) Join Index JoinUtil (q-time) https://github.com/m-khl/lucene-solr/tree/dvjoin-benchmark
  • 45. Indexing is still a problem doc#[doc#] 4 3 6 1 0 3 2 10 0 6 3 8 5 10 9
  • 46. Further Plans ● incremental join-index update ● perhaps just calculate and cache it ● join in both directions ● calculate optimal execution plan of segments enumeration
  • 47. Summary ● Joins in General ● JoinUtil vs Block-join ● {!scorejoin } - SOLR-6234 ● updatable DocValues ● opportunities for improving query-time joins: ○ eliminate term enum ○ choose lower cardinality side for enumeration
  • 48. References ● Searching relational content with Lucene's BlockJoinQuery http://blog.mikemccandless.com/ ● Solr Experience: search parent-child relations. Part I Solr block-join support http://blog.griddynamics.com/ ● https://wiki.apache.org/solr/Join ● http://www.slideshare.net/martijnvg/document-relations ● https://issues.apache.org/jira/browse/SOLR-6234 {!scorejoin } ● Updatable DocValues Under the Hood http://shaierera.blogspot.com/ ● Subject: How to openIfChanged the most recent merge? at: java-dev@lucene.apache.org
  • 51. Joins’ Zoo in Lucene True Joins ● query-time join ○ JoinUtil ○ {!join } ○ {!scorejoin } - SOLR-6234 ● index-time join aka block-join {! parent}
  • 52. Joins’ Zoo in Lucene Workarounds ● term positions/SpanQueries ● FieldCollapsing/Grouping ● term decoration ○ spatial ● multivalue fields True Joins ● query-time join ○ JoinUtil ○ {!join }, ○ {!scorejoin } - SOLR-6234 ● index-time join aka block-join {! parent}
  • 53. Joins’ Zoo in Lucene Workarounds ● term positions/SpanQueries ● FieldCollapsing/Grouping ● term decoration ○ spatial ● multivalue fields True Joins ● query-time join ○ JoinUtil ○ {!join }, ○ {!scorejoin } - SOLR-6234 ● index-time join aka block-join {! parent}
  • 54. Two phase update problem Subject: How to openIfChanged the most recent merge? at: java-dev@lucene.apache.org
  • 55. JoinUtil ● query-time ● indexing is fast ● searching is slow, why? ○ expensive term enum ○ single enumeration order BlockJoin ● index-time ● reindexing whole block is as expensive as mandatory ● searching is darn fast, however ○ can’t reorder child docs