Tweaking the Base Score:
Lucene/Solr Similarities Explained
Demo: github.com/sematext/activate/tree/master/2019
More info: sematext.com/blog/search-relevance-solr-elasticsearch-similarity
Radu Gheorghe
Rafał Kuć
www.sematext.com
Agenda
BM25 - Best Match: the default
DFR - Divergence From Randomness framework
DFI - Divergence From Independence
IB - Information-Based models
LM - Language Models
Custom similarity
Putting it all together
TF*IDF
You know, for historical reasons
BM25 - the TF part
freq / (freq + k1 * (1 - b + b * dl / avgdl))
Best for Most
BM25 tunables
freq / (freq + k1 * (1 - b + b * dl / avgdl))
k1 - raise or lower ceiling
b - doc length normalization
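To get a feel for the two tunables, here is a minimal sketch that evaluates the formula above in plain Java. The class and method names are hypothetical and this is not Lucene's BM25Similarity, only the term-frequency part as written on the slide. The values k1 = 1.2 and b = 0.75 are Lucene's BM25 defaults; the document lengths are made up.

```java
// Minimal sketch of the BM25 term-frequency part from the slide.
// Names are hypothetical; this is not Lucene's BM25Similarity.
final class Bm25TfSketch {

    // freq / (freq + k1 * (1 - b + b * dl / avgdl))
    static double tf(double freq, double k1, double b, double dl, double avgdl) {
        double lengthNorm = 1 - b + b * dl / avgdl; // 1.0 for an average-length doc
        return freq / (freq + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75; // Lucene's BM25 defaults
        for (int freq : new int[] {1, 2, 5, 10, 50}) {
            // same term frequency in an average-length doc and in one twice as long
            System.out.printf("freq=%d avgLength=%.3f doubleLength=%.3f%n",
                freq, tf(freq, k1, b, 100, 100), tf(freq, k1, b, 200, 100));
        }
    }
}
```

With b = 0 the document length drops out of the formula entirely, and a larger k1 lets repeated occurrences keep counting for longer before the curve flattens.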
BM25 demo
yes, that's how we look
when we give demos
BM25
Good default. You can
tune the weight of freq
and docLength.
Divergence From Randomness
Basic Model
G, I(n), I(ne), I(F)
After Effect
L, B
Normalization
H1, H2, H3, Z, none
Divergence From Randomness - H1
tf * c * avgFieldLength / docFieldLength
No normalization, and H1 with c == 1, 3, 5, 7
Divergence From Randomness - H2
tf * log2(1 + c * (avgFieldLength / docFieldLength))
No normalization, and H2 with c == 1, 3, 5, 7
Divergence From Randomness - Z
tf * (avgFieldLength / docFieldLength)^z
No normalization, and Z with z == 0.1, 0.2, 0.3, 0.4
Divergence From Randomness - H3
(tf + mu * ((totalTermFreq + 1) / (#fieldTokens + 1))) / (docFieldLength + mu) * mu
No normalization, and H3 with mu == 1, 3, 5, 7
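The four DFR length normalizations can be compared side by side with a small plain-Java sketch. Class and method names are hypothetical and the statistics are made up. H3 is written in the Dirichlet-prior form, (tf + mu * p) / (docFieldLength + mu) * mu, which is how we understand Lucene's NormalizationH3 to compute it; treat that as an assumption.

```java
// Minimal sketch of the DFR term-frequency normalizations.
// Names are hypothetical; these are not Lucene's Normalization classes.
final class DfrNormalizationSketch {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // H1: tf * c * avgFieldLength / docFieldLength
    static double h1(double tf, double c, double avgLen, double docLen) {
        return tf * c * avgLen / docLen;
    }

    // H2: tf * log2(1 + c * avgFieldLength / docFieldLength)
    static double h2(double tf, double c, double avgLen, double docLen) {
        return tf * log2(1 + c * avgLen / docLen);
    }

    // Z: tf * (avgFieldLength / docFieldLength)^z
    static double zNorm(double tf, double z, double avgLen, double docLen) {
        return tf * Math.pow(avgLen / docLen, z);
    }

    // H3 (assumed Dirichlet-prior form): smooth tf with the collection-wide term probability
    static double h3(double tf, double mu, double totalTermFreq,
                     double fieldTokens, double docLen) {
        double p = (totalTermFreq + 1) / (fieldTokens + 1);
        return (tf + mu * p) / (docLen + mu) * mu;
    }

    public static void main(String[] args) {
        double tf = 2, avgLen = 100;
        for (double docLen : new double[] {50, 100, 200}) {
            System.out.printf("docLen=%.0f H1=%.3f H2=%.3f Z=%.3f H3=%.3f%n",
                docLen,
                h1(tf, 1, avgLen, docLen),
                h2(tf, 1, avgLen, docLen),
                zNorm(tf, 0.3, avgLen, docLen),
                h3(tf, 5, 400, 10000, docLen));
        }
    }
}
```

For an average-length document, H1 and H2 with c == 1 and Z with any z leave tf unchanged; longer documents get a smaller normalized tf.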
DFR demo
Only one, I promise
DFR
Framework. Tunable:
choose algorithm and
tune parameters for
both IDF* and
docLength.
* generic name for importance
of this term
Divergence From Independence
expected frequency:
docLength * totalTermFrequency / numberOfFieldTokens
DFI: Standardized
(actual - expected)/sqrt(expected)
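The two DFI formulas chain together: first the expected frequency, then how far the actual frequency diverges from it. A minimal plain-Java sketch, with hypothetical names and made-up statistics (this is not Lucene's DFISimilarity, which as far as we know also wraps the measure in a log and skips terms that occur no more often than expected):

```java
// Minimal sketch of the DFI "standardized" measure from the slides.
// Names are hypothetical; this is not Lucene's DFISimilarity.
final class DfiSketch {

    // How often we would expect the term in this doc if it were spread evenly
    static double expected(double docLength, double totalTermFreq, double fieldTokens) {
        return docLength * totalTermFreq / fieldTokens;
    }

    // (actual - expected) / sqrt(expected)
    static double standardized(double actual, double expected) {
        return (actual - expected) / Math.sqrt(expected);
    }

    public static void main(String[] args) {
        // a 100-token doc; the term occurs 400 times in a 10,000-token field
        double exp = expected(100, 400, 10000);
        for (int actual : new int[] {4, 6, 12}) {
            System.out.printf("actual=%d expected=%.1f standardized=%.3f%n",
                actual, exp, standardized(actual, exp));
        }
    }
}
```

A term that shows up exactly as often as expected carries no signal; the measure grows as the actual frequency pulls away from the expectation.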
DFI demo
Oh, but don't remove
stopwords*!
1) arbitrarily chops field length
2) stopwords aren't always
stopwords ;)
DFI
Simple. Parameterless.
Flexible: works well
with various datasets.
Information Based
how much information do we get from this term?
Information Based
Distribution
Log-Logistic, Smoothed Power-Law
Lambda
DF, TTF
Normalization
H1, H2, H3, Z, none
Information Based - Log-Logistic
log( tfn / lambda + 1 )
Information Based - Log-Logistic
lambda: 0.1 (red), 0.3 (black), 0.8 (blue)
Information Based - Retrieval Function
the average of the document information brought
by each query term
Information Based - Retrieval Function - DF
number of matching documents
(docFrequency + 1) / (numberOfDocuments + 1)
Information Based - Retrieval Function - TTF
total number of term occurrences
(totalTermFrequency + 1) / (numberOfDocuments + 1)
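To see how lambda and the log-logistic distribution fit together, here is a minimal plain-Java sketch. Class and method names are hypothetical and the statistics are made up. The distribution is written as log(tfn / lambda + 1), which is our reading of Lucene's DistributionLL; treat it as an assumption. tfn stands for the term frequency after one of the normalizations (H1, H2, H3, Z or none).

```java
// Minimal sketch of the Information-Based pieces from the slides.
// Names are hypothetical; these are not Lucene's IBSimilarity classes.
final class IbSketch {

    // Lambda from document frequency: (docFrequency + 1) / (numberOfDocuments + 1)
    static double lambdaDf(double docFreq, double numDocs) {
        return (docFreq + 1) / (numDocs + 1);
    }

    // Lambda from total term frequency: (totalTermFrequency + 1) / (numberOfDocuments + 1)
    static double lambdaTtf(double totalTermFreq, double numDocs) {
        return (totalTermFreq + 1) / (numDocs + 1);
    }

    // Log-logistic information (assumed form): log(tfn / lambda + 1)
    static double logLogistic(double tfn, double lambda) {
        return Math.log(tfn / lambda + 1);
    }

    public static void main(String[] args) {
        double numDocs = 999;
        // a rare term (9 matching docs) versus a common one (499 matching docs)
        for (double docFreq : new double[] {9, 499}) {
            double lambda = lambdaDf(docFreq, numDocs);
            System.out.printf("docFreq=%.0f lambda=%.3f info(tfn=1)=%.3f info(tfn=3)=%.3f%n",
                docFreq, lambda, logLogistic(1, lambda), logLogistic(3, lambda));
        }
    }
}
```

The smaller the lambda (the rarer the term), the more information each occurrence brings.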
IB demo
IB
Framework, like DFR.
Even has the same
normalization options.
But newer and, in the
paper, better.
Language Models
probability of a term being our term:
totalTermFreq / totalFieldTokens
Language Models: Jelinek-Mercer
log( ((1 - λ) * tf / docLength) / (λ * probability) )
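Jelinek-Mercer weighs how often the term occurs in this document against how likely it is in the whole collection, with λ deciding how much the collection counts. A minimal plain-Java sketch follows, with hypothetical names and made-up statistics. The "1 +" inside the log is an assumption based on our understanding of Lucene's LMJelinekMercerSimilarity; it keeps the score at zero or above, while the slide shows only the ratio.

```java
// Minimal sketch of Jelinek-Mercer smoothing.
// Names are hypothetical; this is not Lucene's LMJelinekMercerSimilarity.
final class JelinekMercerSketch {

    // probability of a term being our term: totalTermFreq / totalFieldTokens
    static double collectionProbability(double totalTermFreq, double totalFieldTokens) {
        return totalTermFreq / totalFieldTokens;
    }

    // log(1 + ((1 - lambda) * tf / docLength) / (lambda * probability))
    // The leading "1 +" is an assumption, see the note above the code.
    static double score(double tf, double docLength, double lambda, double probability) {
        double inDocument = (1 - lambda) * tf / docLength;
        double inCollection = lambda * probability;
        return Math.log(1 + inDocument / inCollection);
    }

    public static void main(String[] args) {
        double p = collectionProbability(10, 10000); // term is 0.1% of all tokens
        for (double lambda : new double[] {0.1, 0.4, 0.7}) {
            System.out.printf("lambda=%.1f score=%.3f%n", lambda, score(2, 100, lambda, p));
        }
    }
}
```

A small λ trusts the document's own frequency more; a large λ leans on the collection statistics and flattens the differences between documents.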
LM demo
feat. Jelinek-Mercer
LM
Two probabilistic
models. Similar
approach to DFI, but
tunable.
Custom Similarity
compute a similarity score using custom code
Custom Similarity - Activate Similarity Factory
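import org.apache.lucene.search.similarities.Similarity;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.schema.SimilarityFactory;

// Solr-side factory: hands the custom Similarity to the schema.
public class ActivateSimilarityFactory extends SimilarityFactory {
  // volatile so a fully built instance is visible to all threads
  private volatile Similarity similarity;

  @Override
  public void init(SolrParams params) {
    super.init(params);
  }

  @Override
  public Similarity getSimilarity() {
    // lazy init; a race only creates a spare stateless instance
    if (similarity == null) {
      similarity = new ActivateSimilarity();
    }
    return similarity;
  }
}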
Custom Similarity - Similarity
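import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.search.similarities.Similarity;
public class ActivateSimilarity extends Similarity {
  public ActivateSimilarity() {}
  // index-time norm: constant, so field length never affects the score
  @Override
  public long computeNorm(FieldInvertState state) { return 1; }
  // boost and statistics are ignored; every term gets the same scorer
  @Override
  public Similarity.SimScorer scorer(float boost,
      CollectionStatistics collectionStats, TermStatistics... termStats) {
    return new ActivateSimScorer();
  }
}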
Custom Similarity - SimScorer
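import org.apache.lucene.search.similarities.Similarity;

public class ActivateSimScorer extends Similarity.SimScorer {
  // score is the raw term frequency; the norm (doc length) is ignored
  @Override
  public float score(float freq, long norm) {
    return freq;
  }
}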
Custom
Similarity
demo
Custom
When you need
something special, like
disregarding term
frequency.
Multiple
similarities
demo
THANK YOU
Activate 2019: Tweaking the Base Score: Lucene/Solr Similarities Explained