More on "More Like This"
Recommendations in SOLR
Oana Brezai
Software Engineer @ eSolutions
Outline
Use Case
How does Search work
How does MLT work
A limitation of MLT
Quality of the results
Conclusions
Use Case:
Build a
Recommendation
Application
Requirements
● Movie Store
● ~ 85 K Movies
● Use Open Source Software
Solution
● Fast
● High Quality Results
Why
Apache SOLR ?
Solr (NoSQL DB)
● Popular
● Blazing-fast
● Highly scalable
● Open source enterprise search platform
● Built on Apache Lucene
Who Uses SOLR
“Movie Store”
Use Case
When
● A user views the details of a movie
Then
● The application recommends
“similar” movies
Example
Target Movie
● The Lord of the Rings: The
Fellowship of the Ring
Recommendations
1) The Lord of the Rings: The Return of
the King
2) The Lord of the Rings: The Two
Towers
3) The Lord of the Rings
4) Lord of War
5) The Lord Protector
What Does
“Similar”
Mean?
Target Movie
● “The Lord of the Rings: The Fellowship of the Ring”
○ Action / Adventure / Drama
○ 8.8 on IMDb
Recommended (Similar) Movies
● The same words in the title
● The same movie genre
● The same words in the description
● Similar IMDB vote
Questions
Questions for our
Recommendation System
● Do all the words have the
same importance?
● Do all the fields have the same
importance?
● How does the engine
differentiate between results?
Let’s START!
Add Data
to SOLR
Create a Collection (~Table)
● movie_content
Populate the Collection with
Data
● 85855 movies
Data
Structure
Movie Fields
● imdb_title_id (movie id)
● original_title
● description
● genre
● avg_vote (imdb vote)
Movie Fields -> with Types
● imdb_title_id -> string
● original_title -> “analyzed” text
● description -> “analyzed” text
● genre -> array of strings
● avg_vote -> number
String vs “Analyzed” Text Field Types
● Field Type: String
○ Example: “Comedy” (field: genre)
○ Indexed: “Comedy”
● Field Type: “Analyzed” Text
○ Example: “The Lord of the Rings: The Fellowship of the Ring” (field: original_title)
○ Indexed (lowercased and without stopwords): “lord”, “rings”, “fellowship”, “ring”
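The analysis step for an “analyzed” text field can be sketched in a few lines of Python. This is only an illustration of lowercasing and stop-word removal, not Solr's actual analyzer chain; the stop-word list is an assumption modelled on a typical English list, and the function name `analyze` is made up for this example.

```python
import re

# Assumption: a small English stop-word list; the real list depends on the field type config.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}

def analyze(text):
    """Lowercase the text, split it on non-alphanumeric characters, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

title_terms = analyze("The Lord of the Rings: The Fellowship of the Ring")
print(title_terms)  # ['lord', 'rings', 'fellowship', 'ring']
```

A string field such as genre skips this step, which is why “Comedy” is indexed unchanged.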
“The Lord of the Rings: The Fellowship of the
Ring”
● Movie Id (imdb_title_id): tt0120737
● Original Title: “The Lord of the Rings: The Fellowship of the Ring”
● Description: “A meek Hobbit from the Shire and eight companions set out on a journey to destroy the powerful One Ring and save Middle-earth from the Dark Lord Sauron.”
● Genre: “Action, Adventure, Drama”
● IMDb vote (avg_vote): 8.8
“More Like
This” Feature
in SOLR
More Like This
● Given a movie id => list
“similar” movies
● Uses the “Search” functionality
How Does
“Search”
Work in SOLR?
“Search”
Example 1:
Query
original_title: “Lord of the Rings”
Results
● No movies found
“Search”
Example 2:
Query
original_title: “Lord” AND
original_title: “Rings”
Results (4)
1) "The Lord of the Rings"
2) "The Lord of the Rings: The
Fellowship of the Ring"
3) "The Lord of the Rings: The
Return of the King"
4) "The Lord of the Rings: The Two
Towers”
Execution time: 21 ms
How Does the Search original_title: “Lord” AND original_title: “Rings” Work?
● Searches the original_title index for all the movies that contain both the words “lord” AND “rings” (lowercased!)
● Computes search score based on Boosting, Term Frequency (TF)
and Inverse Document Frequency (IDF)
● Displays the results in descending order of the score
The TF / IDF Scoring Formula
score[movie] = ∑ (boost(field[j]) * tf(word[i]) * idf(word[i]))
where:
boost(field[j]) = custom weight given to the field j
tf(word[i]) = countTermFreq / (countTermFreq + 1.2 * (1 - 0.75 + 0.75 * fieldLength / avgFieldLength))
idf(word[i]) = log(1 + (docCount - countDocumentFreq + 0.5) / (countDocumentFreq + 0.5))
word[i] = every word in the field, excluding stop words (in our case)
countTermFreq = number of times the word occurs in the field of this movie
countDocumentFreq = number of movies whose field contains the word
docCount = total number of movies that have the field
fieldLength = count of words in the field, excluding stop words (in our case)
avgFieldLength = average length of the field over all movies
original_title = “The Lord of the Rings”
genre = “Animation, Adventure, Fantasy”
description = “The Fellowship of the Ring embark ...”
score = 1 * tf(“lord”) * idf(“lord”) +
1 * tf(“rings”) * idf(“rings”) +
1 * tf(“Animation”) * idf(“Animation”) + ...
Debug the Scoring Formula
score[movie] =∑(boost(field[j]) * tf(word[i]) * idf(word[i]))
Debug the TF / IDF Formula for the
QUERY = original_title:Lord AND original_title:Rings
Original title | CTF (field): Lord | CTF (field): Rings | CDF (corpus): Lord | CDF (corpus): Rings | Field length | Score
The Lord of the Rings | 1 | 1 | 26 | 10 | 2 | 8.29
The Lord of the Rings: The Fellowship of the Ring | 1 | 1 | 26 | 10 | 4 | 6.06
The Lord of the Rings: The Return of the King | 1 | 1 | 26 | 10 | 4 | 6.06
The Lord of the Rings: The Two Towers | 1 | 1 | 26 | 10 | 4 | 6.06
tf(word[i]) = countTermFreq / (countTermFreq + 1.2 * (1 - 0.75 + 0.75 * fieldLength / avgFieldLength))
idf(word[i]) = log(1 + (docCount - countDocumentFreq + 0.5) / (countDocumentFreq + 0.5)), with docCount = total number of movies that have the field
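The scores in the debug table can be approximately reproduced with a short Python sketch. It uses the BM25 form of the two factors: the tf shown on the slide, and idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), where docFreq is the CDF column. Two inputs are assumptions that the slides do not state: docCount is taken to be the full corpus of 85855 movies, and avgFieldLength is set to 2.365, a value back-solved from the table so that both score rows match.

```python
import math

K1, B = 1.2, 0.75          # the constants that appear in the tf formula
DOC_COUNT = 85855          # assumption: every movie has an original_title
AVG_FIELD_LENGTH = 2.365   # assumption: back-solved from the table, not given in the slides

def tf(term_freq, field_length):
    """Saturating term frequency with field-length normalization."""
    norm = K1 * (1 - B + B * field_length / AVG_FIELD_LENGTH)
    return term_freq / (term_freq + norm)

def idf(doc_freq):
    """Rare words (small doc_freq) get a large weight."""
    return math.log(1 + (DOC_COUNT - doc_freq + 0.5) / (doc_freq + 0.5))

def score(field_length, boost=1.0):
    # "lord" occurs in 26 titles and "rings" in 10; each occurs once in a matching title
    return sum(boost * tf(1, field_length) * idf(doc_freq) for doc_freq in (26, 10))

short_title = score(field_length=2)  # "The Lord of the Rings"
long_title = score(field_length=4)   # the three longer titles
print(round(short_title, 2), round(long_title, 2))
```

The shorter title wins only because of length normalization: all four titles contain each query word exactly once, so tf differs purely through fieldLength.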
“Search”
in SOLR
High Quality
● Scoring Formula
○ TF / IDF
○ Boosting
Fast
● Inverted Index
Inverted Index (original_title)
Id (imdb_title_id) | Title (original_title)
tt0120737 | The Lord of the Rings: The Fellowship of the Ring
tt0167260 | The Lord of the Rings: The Return of the King
tt0167261 | The Lord of the Rings: The Two Towers
tt0077869 | The Lord of the Rings

Word | Ids (imdb_title_id)
lord | tt0120737, tt0167260, tt0167261, tt0077869
rings | tt0120737, tt0167260, tt0167261, tt0077869
ring | tt0120737
fellowship | tt0120737
return | tt0167260
king | tt0167260
towers | tt0167261
two | tt0167261
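A toy version of this inverted index fits in a few lines of Python. The sketch below is not how Lucene stores its index; it only shows the word-to-ids mapping and why an AND query becomes a set intersection. The stop-word set and the function names are assumptions made for the example.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "of"}  # assumption: just enough stop words for these four titles

TITLES = {
    "tt0120737": "The Lord of the Rings: The Fellowship of the Ring",
    "tt0167260": "The Lord of the Rings: The Return of the King",
    "tt0167261": "The Lord of the Rings: The Two Towers",
    "tt0077869": "The Lord of the Rings",
}

def build_index(docs):
    """Map every non-stop word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(doc_id)
    return index

def search_and(index, *words):
    """Return the ids of the documents that contain all the given words."""
    postings = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*postings) if postings else set()

index = build_index(TITLES)
hits = search_and(index, "Lord", "Rings")
print(sorted(hits))
```

The lookup cost depends on the length of the posting lists for “lord” and “rings”, not on the 85855 movies in the collection, which is what makes the search fast.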
How Does
“More Like This”
Work in SOLR?
“More Like
This”
Example
Query
● q = imdb_title_id:tt0120737
(“The Lord of the Rings: The
Fellowship of the Ring”)
● Other parameters:
○ mlt = true
○ mlt.fl = original_title, description, genre, avg_vote
○ mlt.mintf = 1
○ mlt.count = 5
“More Like
This”
Example URL
http://localhost:8983/solr/movie_content/select?mlt=true&mlt.mintf=1&mlt.fl=original_title,description,genre,avg_vote&q=imdb_title_id:tt0120737&mlt.count=5
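The same request can be assembled programmatically so that the parameters are URL-encoded correctly. The sketch below only builds the URL with the Python standard library and does not contact a server; the helper name `mlt_url` is hypothetical, while the host, collection, and parameters are the ones from the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://localhost:8983/solr/movie_content/select"

def mlt_url(movie_id, fields, count=5, mintf=1):
    """Build a More Like This request for one movie id."""
    params = {
        "q": f"imdb_title_id:{movie_id}",
        "mlt": "true",
        "mlt.fl": ",".join(fields),
        "mlt.mintf": mintf,
        "mlt.count": count,
    }
    return f"{BASE_URL}?{urlencode(params)}"

url = mlt_url("tt0120737", ["original_title", "description", "genre", "avg_vote"])
print(url)

# Decoding the query string again gives back the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["mlt.fl"])
```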
Results
Results (“The Lord of the
Rings: The Fellowship of the
Ring”)
● Execution Time: <100 ms
● Total Results: 62387
Results for “The Lord of the Rings: The Fellowship of the Ring” (Action, Adventure, Drama - 8.8)
Score | Title | Year | Genre | Vote
24.49 | The Lord of the Rings | 1978 | Animation / Adventure / Fantasy | 6.2
14.78 | The Ring Thing | 2004 | Adventure / Comedy | 3.5
13.11 | The Dork of the Rings | 2006 | Adventure / Comedy / Fantasy | 3.2
12.65 | The Lord of the Rings: The Return of the King | 2003 | Action / Adventure / Drama | 8.9
11.23 | The Lord Protector | 1996 | Action / Adventure / Fantasy | 4.2
Improve Query:
Add Boosting
Boost Fields (Add Weight)
● original_title
● description
● genre
● avg_vote
Importance of Fields
avg_vote >> genre >> original_title >> description
Boosting factors:
● avg_vote -> 40
● genre -> 30
● original_title -> 20
● description -> 1
For every word in (original_title, description, genre):
score += boost(field) * tf(word) * idf(word)
Scoring Formula
genre = “Animation, Adventure, Fantasy” -- BOOSTING 30
original_title = “The Lord of the Rings” --- BOOSTING 20
description = “The Fellowship of the Ring embark ...” -- BOOSTING 1
score = 30 * tf(“Animation”) * idf(“Animation”) +
30 * tf(“Adventure”) * idf(“Adventure”) +
30 * tf(“Fantasy”) * idf(“Fantasy”) +
20 * tf(“lord”) * idf(“lord”) + ...
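The effect of the boosting factors on the sum can be illustrated with a small Python sketch. The tf and idf numbers below are hypothetical and chosen only to keep the arithmetic easy to follow; only the boost values come from the slides.

```python
BOOSTS = {"avg_vote": 40, "genre": 30, "original_title": 20, "description": 1}

def boosted_score(matches, boosts=BOOSTS):
    """matches: iterable of (field, tf, idf), one entry per matching word."""
    return sum(boosts.get(field, 1) * tf * idf for field, tf, idf in matches)

# Hypothetical tf / idf values, not taken from the real index.
matches = [
    ("genre", 0.5, 2.0),
    ("original_title", 0.5, 8.0),
    ("description", 0.5, 6.0),
]

unboosted = boosted_score(matches, boosts={})
boosted = boosted_score(matches)
print(unboosted, boosted)  # 8.0 113.0
```

Without boosts the rare title and description words dominate; with boosts a genre match contributes 30 of the 113 points even though genre words are common and therefore have a low idf.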
Debug Scoring Formula with Boosting
SOLR: More Like This URL Request
http://localhost:8983/solr/movie_content/select?mlt=true&mlt.mindf=1&mlt.mintf=1&mlt.fl=original_title,description,genre,avg_vote&q=imdb_title_id:tt0120737&mlt.boost=true&mlt.qf=avg_vote^40 genre^30 original_title^20 description&mlt.count=5
Results for “The Lord of the Rings: The Fellowship of the Ring” (Action, Adventure, Drama - 8.8)
Score | Title | Year | Genre | Vote
1132 | The Lord of the Rings | 1978 | Animation / Adventure / Fantasy | 6.2
894 | The Lord of the Rings: The Return of the King | 2003 | Action / Adventure / Drama | 8.9
881 | The Lord of the Rings: The Two Towers | 2002 | Action / Adventure / Drama | 8.7
667 | Rings | 2017 | Drama / Horror / Mystery | 4.5
661 | The Dork of the Rings | 2006 | Adventure / Comedy / Fantasy | 3.2
A Limitation of
“More Like This”
Numeric Fields
Ignored in MLT
Issue
● Only text fields are used in MLT
queries
Solution
● Rewrite the whole query as a search query and also include the numeric fields
More on
“More Like This”
in SOLR
“More Like This”
Steps
1) Extract the “interesting terms”
from the target movie
2) Add boostings / field (as given in
the query) for every interesting term
3) Perform a Search with those words
and boostings
“More Like This” Step 1
1) Extract the “interesting terms” from the target movie (from the field list in the query): take all the words from all the fields, compute their relevance, and keep the top 25.
Example: the word “ring” is very relevant for the movie “The Lord of the Rings: The Fellowship of the Ring”:
- 2 occurrences in the movie: once in “original_title” and once in “description”
- in the whole corpus of 85855 movies:
- 35 occurrences in the field “original_title”
- 282 occurrences in the field “description”
2) Add boostings / field (as given in the query) for every interesting term
3) Perform a Search with those words and boostings
List of Interesting Terms for MovieId
tt0120737
genre:Drama
genre:Action
genre:Adventure
description:one
description:set
description:save
description:journey
description:middle
description:meek
description:hobbit
description:shire
description:sauron
original_title:fellowship
original_title:ring
original_title:lord
original_title:rings
description:dark
description:earth
description:powerful
description:destroy
description:lord
description:ring
description:eight
description:companions
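How a list like this could be selected is sketched below in Python. It is a simplified illustration, not the actual Lucene implementation: each word is ranked by tf * idf with a classic idf of 1 + log((numDocs + 1) / (docFreq + 1)), which is an assumption, and the real feature applies further filters such as mlt.mintf. The counts for “ring” come from the slides and are treated here as document frequencies; the count for “one” is hypothetical.

```python
import math

NUM_DOCS = 85855  # corpus size from the slides

def idf(doc_freq):
    # Assumption: classic TF-IDF style idf, used only to rank the candidate terms.
    return 1 + math.log((NUM_DOCS + 1) / (doc_freq + 1))

def interesting_terms(term_stats, max_terms=25):
    """term_stats: list of (field, word, tf_in_target_movie, doc_freq_in_corpus)."""
    scored = [(tf * idf(df), f"{field}:{word}") for field, word, tf, df in term_stats]
    scored.sort(reverse=True)  # most relevant first
    return [term for _, term in scored[:max_terms]]

stats = [
    ("description", "one", 1, 20000),    # hypothetical count for a common word
    ("description", "ring", 1, 282),     # count from the slides
    ("original_title", "ring", 1, 35),   # count from the slides
]
top = interesting_terms(stats, max_terms=2)
print(top)  # ['original_title:ring', 'description:ring']
```

Words that are rare in the corpus but present in the target movie rise to the top, which is why “ring” in the title outranks a common word such as “one”.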
“More Like This” Step 2
1) Extract the “interesting terms” from the target movie (from the field list in
the query)
2) Add boostings / field (as given in the query) for every interesting term:
avg_vote^40 genre^30 original_title^20 description
3) Perform a Search with those words and boostings
Interesting Terms for tt0120737 with Boosting
genre:Drama^30
genre:Action^30
genre:Adventure^30
description:one
description:set
description:save
description:journey
description:middle
description:meek
description:hobbit
description:shire
description:sauron
original_title:fellowship^20
original_title:ring^20
original_title:lord^20
original_title:rings^20
description:dark
description:earth
description:powerful
description:destroy
description:lord
description:ring
description:eight
description:companions
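Attaching the boosts to the interesting terms is a simple string transformation. A minimal Python sketch, assuming the terms arrive as "field:word" strings and that fields without an explicit boost (here description) are left unchanged; the function name `with_boosts` is made up for this example.

```python
BOOSTS = {"genre": 30, "original_title": 20}  # description keeps the default boost

def with_boosts(terms, boosts=BOOSTS):
    """Append ^boost to every 'field:word' term whose field has a boost."""
    clauses = []
    for term in terms:
        field = term.split(":", 1)[0]
        boost = boosts.get(field)
        clauses.append(f"{term}^{boost}" if boost else term)
    return clauses

terms = ["genre:Drama", "original_title:ring", "description:hobbit"]
boosted_terms = with_boosts(terms)
print(boosted_terms)  # ['genre:Drama^30', 'original_title:ring^20', 'description:hobbit']
```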
“More Like This” Step 3
1) Extract the “interesting terms” from the target movie (from the field list in
the query)
2) Add boostings / field (as given in the query) for every interesting term
3) Perform a Search with those words and boostings
Results for “The Lord of the Rings: The Fellowship of the Ring” (Action, Adventure, Drama - 8.8)
Score | Title | Year | Genre | Vote
1132 | The Lord of the Rings | 1978 | Animation / Adventure / Fantasy | 6.2
894 | The Lord of the Rings: The Return of the King | 2003 | Action / Adventure / Drama | 8.9
881 | The Lord of the Rings: The Two Towers | 2002 | Action / Adventure / Drama | 8.7
667 | Rings | 2017 | Drama / Horror / Mystery | 4.5
661 | The Dork of the Rings | 2006 | Adventure / Comedy / Fantasy | 3.2
Add Numeric
Fields to
“More Like This”
1) SOLR Request 1: perform an MLT query and get the “interesting terms”
2) Add boostings
3) Add numeric fields with their boostings
4) SOLR Request 2: perform a Search with the numeric fields and the “interesting terms”, each with its respective boosting
Example of Numeric Field Syntax
Target movie: avg_vote = 8.8
=> a similar movie would have:
avg_vote: [8.8 - 1.5 TO 8.8 + 1.5]
=> add boosting factor:
avg_vote: [7.3 TO 10.3] ^ 40
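The range clause can be generated from the target movie's value. A minimal Python sketch, assuming a symmetric window around the value and one decimal place in the output; the function name `range_clause` is made up for this example.

```python
def range_clause(field, value, delta, boost):
    """Build a boosted range clause centred on the target movie's value."""
    low, high = value - delta, value + delta
    # Format with one decimal to hide floating-point noise such as 7.300000000000001.
    return f"{field}:[{low:.1f} TO {high:.1f}]^{boost}"

clause = range_clause("avg_vote", 8.8, 1.5, 40)
print(clause)  # avg_vote:[7.3 TO 10.3]^40
```

The window size (1.5 here) controls how strict “similar vote” is: a smaller delta rewards fewer movies with the boost.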
Final SOLR Search Query
Q =
genre:Drama^30
genre:Action^30
genre:Adventure^30
description:one
description:set
description:save
description:journey
description:middle
description:meek
description:hobbit
description:shire
description:sauron
original_title:fellowship^20
original_title:ring^20
original_title:lord^20
original_title:rings^20
description:dark
description:earth
description:powerful
description:destroy
description:lord
description:ring
description:eight
description:companions
avg_vote:[7.3 TO 10.3]^40
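Putting the pieces together for the second request might look like the Python sketch below. It joins the boosted term clauses and the numeric clause with spaces, which assumes the query parser treats space-separated clauses as optional (OR) clauses, and URL-encodes the result for the /select handler. The `rows` and `fl` values and the helper names are illustrative choices, not part of the original request; only a shortened clause list is shown.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://localhost:8983/solr/movie_content/select"

def final_query(term_clauses, numeric_clauses):
    """Join all clauses into one query string (assumed to be OR-ed by the parser)."""
    return " ".join(list(term_clauses) + list(numeric_clauses))

def select_url(query, rows=5):
    """URL for the second request; the score is requested to inspect the ranking."""
    params = {"q": query, "rows": rows, "fl": "original_title,genre,avg_vote,score"}
    return f"{BASE_URL}?{urlencode(params)}"

query = final_query(
    ["genre:Drama^30", "original_title:ring^20", "description:hobbit"],
    ["avg_vote:[7.3 TO 10.3]^40"],
)
url = select_url(query)
print(url)

# The encoded URL decodes back to the exact query string.
decoded_q = parse_qs(urlsplit(url).query)["q"][0]
print(decoded_q)
```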
Final Results for “The Lord of the Rings: The Fellowship of the Ring” (Action, Adventure, Drama - 8.8)
Score | Title | Year | Genre | Vote
249 | The Lord of the Rings: The Return of the King | 2003 | Action / Adventure / Drama | 8.9
246 | The Lord of the Rings: The Two Towers | 2002 | Action / Adventure / Drama | 8.7
222 | The Lord of the Rings | 1978 | Animation / Adventure / Fantasy | 6.2
161 | Lord of War | 2005 | Action / Crime / Drama | 7.6
157 | The Lord Protector | 1996 | Action / Adventure / Fantasy | 4.2
Quality of the
Results
Quality
Recommended Products
Ordered
● Based on history of sales
Recommended Products
Viewed
● Based on history of browsing
Conclusions
Conclusions
MLT in SOLR
● Inverted Index
● TF/IDF Scoring Formula
● Boosting
Quality Measurement
Feedback Loop
● Recommended Products Ordered
● Recommended Products Viewed
References
● https://solr.apache.org/
● https://lucidworks.com/post/who-uses-lucenesolr/
● https://www.kaggle.com/stefanoleone992/imdb-extensive-dataset?select=IMDb+ratings.csv
● https://www.esolutions.ro/streaming-expressions-in-apache-solr
● https://github.com/oanabrezai/moreLikeThisSOLR
Thank you
Oana Brezai
oana.brezai@esolutions.ro

More on "More Like This" Recommendations in SOLR

  • 1. More on "More Like This" Recommendations in SOLR Oana Brezai Software Engineer @ eSolutions
  • 2. Outline Use Case How does Search work How does MLT work A limitation of MLT Quality of the results Conclusions
  • 3. Use Case: Build a Recommendation Application Requirements ● Movie Store ● ~ 85 K Movies ● Use Open Source Software Solution ● Fast ● High Quality Results
  • 4. Why Apache SOLR ? Solr (NoSQL DB) ● Popular ● Blazing-fast ● Highly scalable ● Open source enterprise search platform ● Built on Apache Lucene
  • 6. “Movie Store” Use Case When ● A user visualizes the details of a movie Then ● The application recommends “similar” movies
  • 7. Example Target Movie ● The Lord of the Rings: The Fellowship of the Ring Recommendations 1) The Lord of the Rings: The Return of the King 2) The Lord of the Rings: The Two Towers 3) The Lord of the Rings 4) Lord of War 5) The Lord Protector
  • 8. What Does “Similar” Mean? Target Movie ● “The Lord of the Rings: The Fellowship of the Ring”  Action / Adventure / Drama  8.8 on IMDB Recommended (Similar) Movies ● The same words in the title ● The same movie genre ● The same words in the description ● Similar IMDB vote
  • 9. Questions Questions for our Recommendation System ● Do all the words have the same importance? ● Do all the fields have the same importance? ● How does the engine differentiate between results?
  • 11. Add Data to SOLR Create a Collection (~Table) ● movie_content Populate the Collection with Data ● 85855 movies
  • 12. Data Structure Movie Fields ● imdb_title_id (movie id) ● original_title ● description ● genre ● avg_vote (imdb vote)
  • 13. Movie Fields -> with Types ● imdb_title_id -> string ● original_title -> “analyzed” text ● description -> “analyzed” text ● genre -> array of strings ● avg_vote -> number
  • 14. String vs “Analyzed” Text Field Types ● Field Type: String ● Example: “Comedy” (field: genre)  Indexed: “Comedy” ● Field Type: “Analyzed” Text ● Example: “The Lord of the Rings: The Fellowship of the Ring” (field: original_title)  Indexed (lowercased and without stopwords): ○ “lord” ○ “rings” ○ “fellowship” ○ “ring”
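The difference between the two field types can be sketched in a few lines of Python. This is only an illustration of the idea of "analysis" (lowercasing, tokenizing, dropping stop words); the stop-word list below is an assumption, and the real behavior depends on the analyzer configured for the field in the Solr schema.

```python
import re

# ASSUMPTION: a tiny illustrative stop-word list; Solr's real list comes
# from the analyzer configured for the field.
STOP_WORDS = {"the", "of", "a", "an", "and", "to", "in"}


def analyze(text):
    """Lowercase the text, split on non-alphanumeric characters, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def index_string(value):
    """A string field is indexed as a single, unmodified token."""
    return [value]


print(index_string("Comedy"))
print(analyze("The Lord of the Rings: The Fellowship of the Ring"))
# prints ['Comedy'] and then ['lord', 'rings', 'fellowship', 'ring']
```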
  • 15. “The Lord of the Rings: The Fellowship of the Ring” ● Movie Id (imdb_title_id): tt0120737 ● Original Title  “The Lord of the Rings: The Fellowship of the Ring” ● Description  “A meek Hobbit from the Shire and eight companions set out on a journey to destroy the powerful One Ring and save Middle-earth from the Dark Lord Sauron.” ● Genre  “Action, Adventure, Drama” ● Imdb vote (avg_vote): 8.8
  • 17. “More Like This” Feature in SOLR More Like This ● Given a movie id => list “similar” movies ● Uses the “Search” functionality
  • 19. “Search” Example 1: Query original_title: “Lord of the Rings” Results ● No movies found
  • 20. “Search” Example 2: Query original_title: “Lord” AND original_title: “Rings” Results (4) 1) “The Lord of the Rings” 2) “The Lord of the Rings: The Fellowship of the Ring” 3) “The Lord of the Rings: The Return of the King” 4) “The Lord of the Rings: The Two Towers” Execution time: 21 ms
  • 21. How Does the Search original_title: “Lord” AND original_title: “Rings” Function? ● Searches in the original_title index all the movies that contain the words “lord” AND “rings” (lowercased!) ● Computes search score based on Boosting, Term Frequency (TF) and Inverse Document Frequency (IDF) ● Displays the results in descending order of the score
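  • 22. The TF / IDF Scoring Formula
      score[movie] = ∑(boost(field[j]) * tf(word[i]) * idf(word[i]))
      where:
      boost(field[j]) = custom weight given to the field j
      tf(word[i]) = countTermFreq / (countTermFreq + 1.2 * (1 - 0.75 + 0.75 * fieldLength / avgFieldLength))
      idf(word[i]) = log(1 + (countDocs - countDocumentFreq + 0.5) / (countDocumentFreq + 0.5))
      word[i] = every word in the field, excluding stop words (in our case)
      countTermFreq = number of times the word occurs in the field of this movie
      countDocumentFreq = number of movies whose field contains the word
      countDocs = total number of movies that have the field
      fieldLength = count of words in the field, excluding stop words (in our case)
      avgFieldLength = average length of the field over all movies
      (the constants 1.2 and 0.75 are the BM25 parameters k1 and b)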
  • 23. Debug the Scoring Formula
      score[movie] = ∑(boost(field[j]) * tf(word[i]) * idf(word[i]))
      original_title = “The Lord of the Rings”
      genre = “Animation, Adventure, Fantasy”
      description = “The Fellowship of the Ring embark ...”
      score = 1 * tf(“lord”) * idf(“lord”) + 1 * tf(“rings”) * idf(“rings”) + 1 * tf(“Animation”) * idf(“Animation”) + ...
  • 24. Debug the TF / IDF Formula for the QUERY = original_title:Lord AND original_title:Rings
      (CTF = countTermFreq in the field, CDF = countDocumentFreq in the corpus)
      Original title | CTF Lord | CTF Rings | CDF Lord | CDF Rings | Field Length | Score
      The Lord of the Rings | 1 | 1 | 26 | 10 | 2 | 8.29
      The Lord of the Rings: The Fellowship of the Ring | 1 | 1 | 26 | 10 | 4 | 6.06
      The Lord of the Rings: The Return of the King | 1 | 1 | 26 | 10 | 4 | 6.06
      The Lord of the Rings: The Two Towers | 1 | 1 | 26 | 10 | 4 | 6.06
      tf(word[i]) = countTermFreq / (countTermFreq + 1.2 * (1 - 0.75 + 0.75 * fieldLength / avgFieldLength))
      idf(word[i]) = log(1 + (countDocs - countDocumentFreq + 0.5) / (countDocumentFreq + 0.5)), where countDocs = total number of movies
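The debug table can be approximated with a short Python sketch. It assumes the standard BM25 form, in which idf is computed from the total number of documents and the number of documents containing the word. The average title length is not given in the slides, so the value below is purely an assumption, and the printed scores will only roughly track the table.

```python
import math

K1 = 1.2   # BM25 saturation constant, as on the slide
B = 0.75   # BM25 length-normalization constant, as on the slide


def tf(term_freq, field_length, avg_field_length):
    """Term-frequency part: shorter fields give a higher value."""
    norm = K1 * (1 - B + B * field_length / avg_field_length)
    return term_freq / (term_freq + norm)


def idf(doc_count, doc_freq):
    """Inverse document frequency: rarer words give a higher value."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def score(term_stats, field_length, avg_field_length, doc_count, boost=1.0):
    """term_stats: list of (term_freq, doc_freq) pairs, one per query word."""
    return sum(
        boost * tf(freq, field_length, avg_field_length) * idf(doc_count, df)
        for freq, df in term_stats
    )


DOC_COUNT = 85855           # movies in the collection (from the slides)
AVG_TITLE_LENGTH = 2.4      # ASSUMPTION: not stated in the slides
stats = [(1, 26), (1, 10)]  # (CTF, CDF) for "lord" and "rings"

short_title = score(stats, 2, AVG_TITLE_LENGTH, DOC_COUNT)  # "The Lord of the Rings"
long_title = score(stats, 4, AVG_TITLE_LENGTH, DOC_COUNT)   # the three sequels
print(round(short_title, 2), round(long_title, 2))
```

The point of the sketch is the ordering: with identical term and document frequencies, the title with field length 2 outranks the titles with field length 4, which is why the 1978 movie comes first in the table.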
  • 25. “Search” in SOLR High Quality ● Scoring Formula  TF / IDF  Boosting Fast ● Inverted Index
  • 26. Inverted Index (original_title)
      Id (imdb_title_id) | Title (original_title)
      tt0120737 | The Lord of the Rings: The Fellowship of the Ring
      tt0167260 | The Lord of the Rings: The Return of the King
      tt0167261 | The Lord of the Rings: The Two Towers
      tt0077869 | The Lord of the Rings

      Word | Ids (imdb_title_id)
      lord | tt0120737, tt0167260, tt0167261, tt0077869
      rings | tt0120737, tt0167260, tt0167261, tt0077869
      ring | tt0120737
      fellowship | tt0120737
      return | tt0167260
      king | tt0167260
      towers | tt0167261
      two | tt0167261
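A minimal Python sketch of the inverted index above, built from the four titles on the slide. The stop-word list and the helper names (`tokenize`, `build_index`, `search_and`) are illustrative assumptions, not Solr or Lucene APIs.

```python
from collections import defaultdict

STOP_WORDS = {"the", "of"}  # ASSUMPTION: minimal stop-word list for this example

TITLES = {
    "tt0120737": "The Lord of the Rings: The Fellowship of the Ring",
    "tt0167260": "The Lord of the Rings: The Return of the King",
    "tt0167261": "The Lord of the Rings: The Two Towers",
    "tt0077869": "The Lord of the Rings",
}


def tokenize(text):
    """Lowercase, replace punctuation with spaces, drop stop words."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return [w for w in cleaned.split() if w not in STOP_WORDS]


def build_index(docs):
    """Map every word to the set of document ids whose title contains it."""
    index = defaultdict(set)
    for doc_id, title in docs.items():
        for word in tokenize(title):
            index[word].add(doc_id)
    return index


def search_and(index, *words):
    """AND query: intersect the posting sets of all the words."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()


index = build_index(TITLES)
print(sorted(search_and(index, "Lord", "Rings")))  # all four movie ids
print(sorted(search_and(index, "ring")))           # ['tt0120737']
```

A lookup touches only the posting sets of the query words rather than every title, which is where the speed of the search comes from.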
  • 27. How Does “More Like This” Work in SOLR?
  • 28. “More Like This” Example Query ● q = imdb_title_id:tt0120737 (“The Lord of the Rings: The Fellowship of the Ring”) ● Other parameters:  mlt = true  mlt.fl=original_title, description, genre, avg_vote  mlt.mintf = 1  mlt.count = 5
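As a rough sketch, the request above could be assembled in Python as follows. The host, port, and `/select` handler path are assumptions about a local installation; only the collection name `movie_content` and the `mlt.*` parameters come from the slides.

```python
from urllib.parse import urlencode

# ASSUMPTION: local Solr on its default port, MoreLikeThis enabled on /select.
SOLR_BASE = "http://localhost:8983/solr/movie_content/select"


def build_mlt_url(movie_id, fields, min_tf=1, count=5):
    """Build the URL of a More Like This request for one movie id."""
    params = {
        "q": f"imdb_title_id:{movie_id}",
        "mlt": "true",
        "mlt.fl": ",".join(fields),
        "mlt.mintf": min_tf,
        "mlt.count": count,
    }
    return f"{SOLR_BASE}?{urlencode(params)}"


url = build_mlt_url(
    "tt0120737", ["original_title", "description", "genre", "avg_vote"]
)
print(url)
```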
  • 30. Results Results (“The Lord of the Rings: The Fellowship of the Ring”) ● Execution Time: <100 ms ● Total Results: 62387
  • 31. Results for “The Lord of the Rings: The Fellowship of the Ring” (Action, Adventure, Drama - 8.8)
      Score | Title | Year | Genre | Vote
      24.49 | The Lord of the Rings | 1978 | Animation / Adventure / Fantasy | 6.2
      14.78 | The Ring Thing | 2004 | Adventure / Comedy | 3.5
      13.11 | The Dork of the Rings | 2006 | Adventure / Comedy / Fantasy | 3.2
      12.65 | The Lord of the Rings: The Return of the King | 2003 | Action / Adventure / Drama | 8.9
      11.23 | The Lord Protector | 1996 | Action / Adventure / Fantasy | 4.2
  • 33. Improve Query: Add Boosting Boost Fields (Add Weight) ● original_title ● description ● genre ● avg_vote Importance of Fields avg_vote >> genre >> original_title >> description
  • 34. Scoring Formula
      Boosting factors: ● avg_vote -> 40 ● genre -> 30 ● original_title -> 20 ● description -> 1
      For every word in (original_title, description, genre) do
        score += boosting(field) * tf(word) * idf(word)
  • 35. Debug Scoring Formula with Boosting
      genre = “Animation, Adventure, Fantasy” -- BOOSTING 30
      original_title = “The Lord of the Rings” -- BOOSTING 20
      description = “The Fellowship of the Ring embark ...” -- BOOSTING 1
      score = 30 * tf(“Animation”) * idf(“Animation”) + 30 * tf(“Adventure”) * idf(“Adventure”) + 30 * tf(“Fantasy”) * idf(“Fantasy”) + 20 * tf(“lord”) * idf(“lord”) + ...
  • 37. Results for “The Lord of the Rings: The Fellowship of the Ring” (Action, Adventure, Drama - 8.8)
      Score | Title | Year | Genre | Vote
      1132 | The Lord of the Rings | 1978 | Animation / Adventure / Fantasy | 6.2
      894 | The Lord of the Rings: The Return of the King | 2003 | Action / Adventure / Drama | 8.9
      881 | The Lord of the Rings: The Two Towers | 2002 | Action / Adventure / Drama | 8.7
      667 | Rings | 2017 | Drama / Horror / Mystery | 4.5
      661 | The Dork of the Rings | 2006 | Adventure / Comedy / Fantasy | 3.2
  • 39. A Limitation of “More Like This”
  • 40. Numeric Fields Ignored in MLT Issue ● Only text fields are used in MLT queries Solution ● Rewrite the whole query as a search query that also includes the numeric fields
  • 41. More on “More Like This” in SOLR
  • 42. “More Like This” Steps 1) Extract the “interesting terms” from the target movie 2) Add boostings / field (as given in the query) for every interesting term 3) Perform a Search with those words and boostings
  • 43. “More Like This” Step 1 1) Extract the “interesting terms” from the target movie (from the field list in the query): take all the words from all the fields, compute their relevance, and keep the top 25. Example: the word “ring” is very relevant for the movie “The Lord of the Rings: The Fellowship of the Ring”: it occurs twice in the movie (once in “original_title” and once in “description”), while in the whole corpus of 85855 movies it occurs 35 times in the field “original_title” and 282 times in the field “description”. 2) Add boostings / field (as given in the query) for every interesting term 3) Perform a Search with those words and boostings
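Step 1 can be sketched as a ranking of (field, word) pairs by term frequency times inverse document frequency, keeping the top 25. This is a simplified illustration, not Lucene's actual implementation: the idf form is an assumption, and apart from the two “ring” counts quoted on the slide, the document frequencies below are hypothetical.

```python
import math
from collections import Counter


def interesting_terms(target_fields, doc_freqs, doc_count, max_terms=25):
    """Rank the words of the target movie by tf * idf and keep the best ones.

    target_fields: {field: [tokens of the target movie]}
    doc_freqs:     {(field, token): number of movies containing the token}
    """
    scored = []
    for field, tokens in target_fields.items():
        for token, freq in Counter(tokens).items():
            df = doc_freqs.get((field, token), 1)
            # ASSUMPTION: a classic smoothed idf; the exact form may differ.
            idf = 1 + math.log((doc_count + 1) / (df + 1))
            scored.append((freq * idf, f"{field}:{token}"))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [term for _, term in scored[:max_terms]]


target = {
    "original_title": ["ring"],
    "description": ["ring", "one"],
}
doc_freqs = {
    ("original_title", "ring"): 35,   # from the slide
    ("description", "ring"): 282,     # from the slide
    ("description", "one"): 20000,    # HYPOTHETICAL count for a common word
}
print(interesting_terms(target, doc_freqs, doc_count=85855))
```

Rare words rank first, so a frequent word such as “one” falls to the end of the list and is the first to drop out when the cut-off applies.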
  • 44. List of Interesting Terms for MovieId tt0120737 genre:Drama genre:Action genre:Adventure description:one description:set description:save description:journey description:middle description:meek description:hobbit description:shire description:sauron original_title:fellowship original_title:ring original_title:lord original_title:rings description:dark description:earth description:powerful description:destroy description:lord description:ring description:eight description:companions
  • 45. “More Like This” Step 2 1) Extract the “interesting terms” from the target movie (from the field list in the query) 2) Add boostings / field (as given in the query) for every interesting term: avg_vote^40 genre^30 original_title^20 description 3) Perform a Search with those words and boostings
  • 46. Interesting Terms for tt0120737 with Boosting genre:Drama^30 genre:Action^30 genre:Adventure^30 description:one description:set description:save description:journey description:middle description:meek description:hobbit description:shire description:sauron original_title:fellowship^20 original_title:ring^20 original_title:lord^20 original_title:rings^20 description:dark description:earth description:powerful description:destroy description:lord description:ring description:eight description:companions
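Step 2 amounts to appending each field's weight to its terms. A small Python sketch, with a hypothetical helper name `apply_boosts` and the boosting factors from the slides:

```python
# Boosting factors for the text fields, as given in the slides.
BOOSTS = {"genre": 30, "original_title": 20, "description": 1}


def apply_boosts(terms, boosts):
    """Append ^weight to every 'field:word' term; a weight of 1 is left implicit."""
    boosted = []
    for term in terms:
        field, _, _ = term.partition(":")
        weight = boosts.get(field, 1)
        boosted.append(term if weight == 1 else f"{term}^{weight}")
    return boosted


terms = ["genre:Drama", "original_title:ring", "description:one"]
print(apply_boosts(terms, BOOSTS))
# prints ['genre:Drama^30', 'original_title:ring^20', 'description:one']
```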
  • 47. “More Like This” Step 3 1) Extract the “interesting terms” from the target movie (from the field list in the query) 2) Add boostings / field (as given in the query) for every interesting term 3) Perform a Search with those words and boostings
  • 48. Results for “The Lord of the Rings: The Fellowship of the Ring” (Action, Adventure, Drama - 8.8)
      Score | Title | Year | Genre | Vote
      1132 | The Lord of the Rings | 1978 | Animation / Adventure / Fantasy | 6.2
      894 | The Lord of the Rings: The Return of the King | 2003 | Action / Adventure / Drama | 8.9
      881 | The Lord of the Rings: The Two Towers | 2002 | Action / Adventure / Drama | 8.7
      667 | Rings | 2017 | Drama / Horror / Mystery | 4.5
      661 | The Dork of the Rings | 2006 | Adventure / Comedy / Fantasy | 3.2
  • 49. Add Numeric Fields to “More Like This” 1) SOLR Request 1: perform a MLT and get the “interesting terms” 2) Add boostings 3) Add numeric fields with their boostings 4) SOLR Request 2: perform a Search with numeric fields and “interesting terms” with their respective boostings
  • 50. Example of Numeric Field Syntax Target movie: avg_vote = 8.8 => a similar movie would have avg_vote in the range [8.8 - 1.5 TO 8.8 + 1.5] => with the boosting factor added: avg_vote:[7.3 TO 10.3]^40
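The range clause and the final query string can be sketched in Python as below. The helper names are hypothetical; the tolerance of 1.5 and the boost of 40 are the values from the slide, and the terms are joined with spaces as in the deck's final query.

```python
def range_clause(field, value, tolerance, boost):
    """Build a boosted range clause such as avg_vote:[7.3 TO 10.3]^40."""
    low = value - tolerance
    high = value + tolerance
    # One decimal place avoids float noise such as 7.300000000000001.
    return f"{field}:[{low:.1f} TO {high:.1f}]^{boost}"


def build_final_query(boosted_terms, numeric_clauses):
    """Join the boosted interesting terms and the numeric clauses into one query."""
    return " ".join(list(boosted_terms) + list(numeric_clauses))


vote_clause = range_clause("avg_vote", 8.8, 1.5, 40)
query = build_final_query(
    ["genre:Drama^30", "original_title:ring^20", "description:hobbit"],
    [vote_clause],
)
print(vote_clause)  # avg_vote:[7.3 TO 10.3]^40
print(query)
```

The upper bound of 10.3 exceeds the maximum possible vote of 10, which is harmless for a range query; a real implementation might clamp it.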
  • 51. Final SOLR Search Query Q = genre:Drama^30 genre:Action^30 genre:Adventure^30 description:one description:set description:save description:journey description:middle description:meek description:hobbit description:shire description:sauron original_title:fellowship^20 original_title:ring^20 original_title:lord^20 original_title:rings^20 description:dark description:earth description:powerful description:destroy description:lord description:ring description:eight description:companions avg_vote:[7.3 TO 10.3]^40
  • 53. Final Results for “The Lord of the Rings: The Fellowship of the Ring” (Action, Adventure, Drama - 8.8)
      Score | Title | Year | Genre | Vote
      249 | The Lord of the Rings: The Return of the King | 2003 | Action / Adventure / Drama | 8.9
      246 | The Lord of the Rings: The Two Towers | 2002 | Action / Adventure / Drama | 8.7
      222 | The Lord of the Rings | 1978 | Animation / Adventure / Fantasy | 6.2
      161 | Lord of War | 2005 | Action / Crime / Drama | 7.6
      157 | The Lord Protector | 1996 | Action / Adventure / Fantasy | 4.2
  • 55. Quality Recommended Products Ordered ● Based on history of sales Recommended Products Viewed ● Based on history of browsing
  • 57. Conclusions MLT in SOLR ● Inverted Index ● TF/IDF Scoring Formula ● Boosting Quality Measurement Feedback Loop ● Recommended Products Ordered ● Recommended Products Viewed
  • 58. References ● https://solr.apache.org/ ● https://lucidworks.com/post/who-uses-lucenesolr/ ● https://www.kaggle.com/stefanoleone992/imdb-extensive-dataset?select=IMDb+ratings.csv ● https://www.esolutions.ro/streaming-expressions-in-apache-solr ● https://github.com/oanabrezai/moreLikeThisSOLR