Join Index Yet Another Join Algorithm

Approaching Join Index
yet another one join algorithm
Mikhail Khludnev
principal engineer

• Grid Dynamics is a Silicon Valley-based leading provider of scalable, next-generation
commerce technology solutions
• Record of outperformance with Tier 1 retail clients
• Fortune 1000 client relationships
PRIVILEGED AND CONFIDENTIAL

About Me
● principal engineer at Grid Dynamics
● spoke at few last LuceneRevolutions
● contributed BlockJoin query parser for Solr - {!parent}
● blogged about it at http://blog.griddynamics.com/
● tried to fix threads at DataImportHandler
http://google.com/+MikhailKhludnev

You are expected to know
● how Lucene searches/filters
● how it counts facets
● that there are segments
● what is DocValues
● why to join
● RDBMS joins: nested loop join, sort-merge join and hash join.

I’m expected to know
● query-time join
● index-time join
● yet another one join

Lucene/Solr Is Strong
● searching
○ filtering
● analytics
○ facets
○ pivots
○ stats

Lucene/Solr Is Weak in
● robust joins
○ multiple entities
○ relations

children
Joins in General
1:M
parents

Join in General
parents ∩ join-relation ∩ children

Join in General
parents ∩ join-relation ∩ children
● input
● output
● enumeration order

JoinUtil
q
FK[doc#]
“17”
“17”
“25”
“25”
“56”
“56”
“56”
“25”
“4”
“61”
“25”

JoinUtil
q
FK[doc#]
“17”
“17”
“25”
“25”
“56”
“56”
“56”
“25”
“4”
“61”
“25”
“25”
“17”
...

JoinUtil
q
FK
“1” →△
…
“17”→△
“25”→△
...
“25”
“17”
...
FK[doc#]
“17”
“17”
“25”
“25”
“56”
“56”
“56”
“25”
“4”
“61”
“25”

JoinUtil
q
FK FK[doc#]
“1” →△
…
“17”→△
“25”→△
...
“25”
“17”
...
fq
“17”
“17”
“25”
“25”
“56”
“56”
“56”
“25”
“4”
“61”
“25”

Block Join
1 doc#
0
0
1
0
0
1
0
0
1
0

Block Join
1 doc#
0
0
1
0
0
1
0
0
1
0
q

Block Join
1 doc#
0
0
1
0
0
1
0
0
1
0
q
fq

Comparison
JoinUtil BlockJoin
searching
is fast
reindexing
is fast

Comparison
JoinUtil BlockJoin
searching
is fast < ? <
reindexing
is fast > ? >

Comparison
JoinUtil BlockJoin
searching
is fast < ? <
reindexing
is fast > ? >
doc#

Join Index
q
doc#[doc#]
3
6
0
3
10
0
6
3

Join Index
q
doc#[doc#]
fq 3
6
0
3
10
0
6
3

Join Index
q
fq
doc#[doc#]
4
1
2
6
8
5 10
9

Join Index
q
doc#[doc#]
fq
4
1
2
6
8
5 10
9

Benchmarking 2.9 M docs
Latency, ms
the bigger the worse
27
14
10
BlockJoin
(i-time)
Join Index
JoinUtil
(q-time)
https://github.com/m-khl/lucene-solr/tree/dvjoin-benchmark

Indexing is still a problem
doc#[doc#]
4
3
6
1
0
3
2
10
0
6
3
8
5 10
9

Further Plans
● incremental join-index update
● perhaps just calculate and cache it
● join in both directions
● calculate optimal execution plan of segments enumeration

Summary
● Joins in General
● JoinUtil vs Block-join
● {!scorejoin } - SOLR-6234
● updatable DocValues
● opportunities for improving query-time joins:
○ eliminate term enum
○ choose lower cardinality side for enumeration

References
● Searching relational content with Lucene's BlockJoinQuery
http://blog.mikemccandless.com/
● Solr Experience: search parent-child relations. Part I
Solr block-join support
http://blog.griddynamics.com/
● https://wiki.apache.org/solr/Join
● http://www.slideshare.net/martijnvg/document-relations
● https://issues.apache.org/jira/browse/SOLR-6234 {!scorejoin }
● Updatable DocValues Under the Hood
http://shaierera.blogspot.com/
● Subject: How to openIfChanged the most recent merge?
at: java-dev@lucene.apache.org

Joins’ Zoo in Lucene
True Joins
● query-time join
○ JoinUtil
○ {!join }
○ {!scorejoin } - SOLR-6234
● index-time join aka block-join {!
parent}

Joins’ Zoo in Lucene
Workarounds
● term positions/SpanQueries
● FieldCollapsing/Grouping
● term decoration
○ spatial
● multivalue fields
True Joins
● query-time join
○ JoinUtil
○ {!join },
○ {!scorejoin } - SOLR-6234
● index-time join aka block-join {!
parent}

Two phase update problem
Subject: How to openIfChanged the most recent merge?
at: java-dev@lucene.apache.org

JoinUtil
● query-time
● indexing is fast
● searching is slow, why?
○ expensive term enum
○ single enumeration order
BlockJoin
● index-time
● reindexing whole block is as expensive as mandatory
● searching is darn fast, however
○ can’t reorder child docs

Join Index Yet Another Join Algorithm

Join Index Yet Another Join Algorithm

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Join Index Yet Another Join Algorithm

Similar to Join Index Yet Another Join Algorithm (20)

More from Lucidworks

More from Lucidworks (20)

Recently uploaded

Recently uploaded (20)

Join Index Yet Another Join Algorithm