Learning to rank fulltext results from clicks
Tomáš Kramár
@tkramar
@synopsitv
Let's build a fulltext search engine.
Query → Find matches → Rank results
Finding matches:
● ElasticSearch
● LIKE %%
● ...
Ranking results:
● By number of hits
● By PageRank
● By date
● ...
How do you choose relevant results?
Feature                        Document A      Document B
Number of keywords in title    2               2
Number of keywords in text     2               0
Domain                         careerjet.sk    vienna-rb.at
Category                       Job search      Programming
Language                       Slovak          English
Document feature           How much I care about it (the higher, the more I care)
# keywords in title        2.1
# keywords in text         1
Domain is careerjet.sk     -2
Domain is vienna-rb.at     3.5
Category is Job Search     -1
Category is Programming    4.2
Language is Slovak         0.9
Language is English        1.5
Document feature           How much I care about it (u)    Document A (d)    Document B (d)
# keywords in title        2.1                             2                 2
# keywords in text         1                               2                 0
Domain is careerjet.sk     -2                              1                 0
Domain is vienna-rb.at     3.5                             0                 1
Category is Job Search     -1                              1                 0
Category is Programming    4.2                             0                 1
Language is Slovak         0.9                             1                 0
Language is English        1.5                             0                 1
rank = d . u                                               = 4.1             = 13.4
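The scoring above is a plain dot product, so it can be checked in a few lines. The sketch below assumes the feature order of the table; the names `doc_a`, `doc_b` and `rank` are made up for illustration. Evaluating the sums directly gives about 4.1 for the first document and about 13.4 for the second.

```python
# Weights u ("how much I care"), in the order the features appear in the table.
u = [2.1, 1.0, -2.0, 3.5, -1.0, 4.2, 0.9, 1.5]

# Feature vectors d for the two example results (hypothetical names).
doc_a = [2, 2, 1, 0, 1, 0, 1, 0]  # job search domain, Job Search, Slovak
doc_b = [2, 0, 0, 1, 0, 1, 0, 1]  # vienna-rb.at, Programming, English


def rank(d, u):
    """Dot product d . u: each feature value times its weight, summed."""
    return sum(d_i * u_i for d_i, u_i in zip(d, u))


score_a = rank(doc_a, u)
score_b = rank(doc_b, u)
print(round(score_a, 1), round(score_b, 1))
```

The higher score wins, so the second document is ranked first for this user.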
Rate each result on a scale of 1-5.
$$\text{rating} = d \cdot u = d_1 u_1 + d_2 u_2 + \dots + d_n u_n$$

$$
\begin{aligned}
d_{1,1} u_1 + d_{1,2} u_2 + \dots + d_{1,n} u_n &= 3 \\
d_{2,1} u_1 + d_{2,2} u_2 + \dots + d_{2,n} u_n &= 5 \\
d_{3,1} u_1 + d_{3,2} u_2 + \dots + d_{3,n} u_n &= 1 \\
d_{4,1} u_1 + d_{4,2} u_2 + \dots + d_{4,n} u_n &= 3
\end{aligned}
$$
The $d_{i,j}$ are known: solve this system of equations and you have $u$. Done.
Except...
● You don't know the explicit ratings
● User preferences change over time
● Those equations probably don't have a solution
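A small sketch of the last point, using made-up numbers: with two features, two ratings pin down $u$ exactly, and a third rating then usually contradicts it. The helper `solve_2x2` and the data are assumptions for illustration only.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a * u = b with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("system has no unique solution")
    u1 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    u2 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [u1, u2]


# Three rated documents, two features each (hypothetical values).
docs = [[2, 2], [2, 0], [1, 1]]
ratings = [3, 5, 1]

# The first two equations determine u exactly.
u = solve_2x2(docs[:2], ratings[:2])

# The third equation is then either satisfied or not; here it is not.
predicted_third = docs[2][0] * u[0] + docs[2][1] * u[1]
print(u, predicted_third, ratings[2])
```

Here the first two ratings force `u = [2.5, -1.0]`, which predicts 1.5 for the third document while the user rated it 1. With more ratings than features this is the normal case, which is why an approximation is needed.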
Clicked! Assume rating 1.
Not clicked. Assume rating 0.
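As a minimal sketch, this labelling rule turns one logged result page into training ratings. The function name and the document ids are hypothetical.

```python
def implicit_ratings(shown, clicked):
    """Map every shown document to 1 if it was clicked, else 0."""
    clicked = set(clicked)
    return {doc: 1 if doc in clicked else 0 for doc in shown}


# One result page with three documents, of which only "b" was clicked.
ratings = implicit_ratings(shown=["a", "b", "c"], clicked=["b"])
print(ratings)
```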
Approximation function
$h(d): d \to \text{rank}$
$h(d) = d_1 u_1 + \dots + d_n u_n = \text{estimated\_rank}$
If the function is good, it should make minimal errors:
$\text{error} = (\text{estimated\_rank} - \text{real\_rank})^2$
Gradient descent
1. Set user preferences (u) to arbitrary values
2. Calculate the estimated rank h(d) for each document
3. Calculate the mean square error
4. Adjust preferences u in a way that minimizes the error
5. Repeat until the error converges
[Figure: cost function. Mean square error plotted against u for "# of keywords in title".]
Calculate the derivative of the cost function at this point and it will give you the direction to move in.
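Preference update
$u_i := u_i - \alpha \, \frac{\partial\, \text{error}}{\partial u_i}$
$\alpha$: learning rate
$\frac{\partial\, \text{error}}{\partial u_i}$: partial derivative of the cost function with respect to $u_i$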
Learning rate $\alpha$: how fast you move. Too low: slow progress. Too high: you will overshoot.
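The partial derivative is nothing scary. You can find these online for standard cost functions.
For the squared error of a single document:
$\frac{\partial\, \text{error}}{\partial u_i} = 2 \, (h(d) - \text{rank}(d)) \, d_i$
(the constant 2 is usually absorbed into $\alpha$)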
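Differentiating the squared error $(h(d) - \text{rank}(d))^2$ with respect to $u_i$ gives $2\,(h(d) - \text{rank}(d))\,d_i$. The sketch below checks that formula against a finite-difference estimate and applies one preference update. All names and numbers are assumptions for illustration.

```python
def h(d, u):
    """Estimated rank: dot product of document features and preferences."""
    return sum(d_i * u_i for d_i, u_i in zip(d, u))


def squared_error(d, real_rank, u):
    return (h(d, u) - real_rank) ** 2


def gradient(d, real_rank, u):
    """Analytic gradient of the squared error with respect to each u_i."""
    residual = h(d, u) - real_rank
    return [2 * residual * d_i for d_i in d]


def numerical_gradient(d, real_rank, u, eps=1e-6):
    """Central finite differences, used only to sanity-check the formula."""
    grads = []
    for i in range(len(u)):
        up = u[:i] + [u[i] + eps] + u[i + 1:]
        down = u[:i] + [u[i] - eps] + u[i + 1:]
        diff = squared_error(d, real_rank, up) - squared_error(d, real_rank, down)
        grads.append(diff / (2 * eps))
    return grads


def update(u, grad, alpha):
    """One preference update: move against the gradient, scaled by alpha."""
    return [u_i - alpha * g_i for u_i, g_i in zip(u, grad)]


d = [2, 2, 0, 1]            # hypothetical document features
u = [0.5, -0.25, 1.0, 0.0]  # hypothetical starting preferences
real_rank = 1

analytic = gradient(d, real_rank, u)
numeric = numerical_gradient(d, real_rank, u)
u_next = update(u, analytic, alpha=0.1)
print(analytic, squared_error(d, real_rank, u), squared_error(d, real_rank, u_next))
```

Note that the gradient is proportional to the document features $d_i$, so features that are zero for a document leave the matching preference untouched.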
Gradient descent
1. Set user preferences (u) to arbitrary values
2. Calculate the estimated rank h(d) for each document
3. Calculate the square error
4. Adjust preferences u in a way that minimizes the error
5. Repeat until the error converges
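The five steps above can be sketched as a short batch gradient descent loop. This is one possible reading rather than the talk's implementation: it assumes a mean squared error cost, starts from all-zero preferences, and stops when the error changes by less than a tolerance. The data, `alpha`, `tolerance` and `max_steps` are made up.

```python
def h(d, u):
    """Estimated rank for one document."""
    return sum(d_i * u_i for d_i, u_i in zip(d, u))


def mean_squared_error(docs, ratings, u):
    return sum((h(d, u) - r) ** 2 for d, r in zip(docs, ratings)) / len(docs)


def gradient_descent(docs, ratings, alpha=0.05, tolerance=1e-12, max_steps=10000):
    n = len(docs[0])
    u = [0.0] * n                                    # step 1: arbitrary start
    previous = mean_squared_error(docs, ratings, u)
    for _ in range(max_steps):
        grad = [0.0] * n
        for d, r in zip(docs, ratings):
            residual = h(d, u) - r                   # step 2: estimate vs. real
            for i in range(n):
                grad[i] += 2 * residual * d[i] / len(docs)
        u = [u_i - alpha * g_i for u_i, g_i in zip(u, grad)]   # step 4
        current = mean_squared_error(docs, ratings, u)         # step 3
        if abs(previous - current) < tolerance:      # step 5: converged
            break
        previous = current
    return u


# Two features per document; ratings are 1 (clicked) or 0 (not clicked).
docs = [[2, 2], [2, 0], [0, 1], [1, 1]]
ratings = [1, 0, 1, 0]

start_error = mean_squared_error(docs, ratings, [0.0, 0.0])
u = gradient_descent(docs, ratings)
final_error = mean_squared_error(docs, ratings, u)
print(u, start_error, final_error)
```

The error does not reach zero here because no exact solution exists for these four ratings; gradient descent settles on the preferences with the smallest mean squared error.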
Clicked! Assume rating 1.
Clicked! Assume rating 1.
Or? Doesn't this mean result #1 is not relevant?
Clicked! Assume nothing.
Clicked! Assume it is better than #2 and #3.
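One way to turn this reading of clicks into training data is to emit a preference pair for every clicked result and each unclicked result shown above it. The sketch below is an assumption about how that could look; the function name is hypothetical and results are identified by their position.

```python
def preferences_from_clicks(ranking, clicked):
    """Return (better, worse) pairs: clicked result vs. skipped results above it."""
    clicked = set(clicked)
    pairs = []
    for position, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        for skipped in ranking[:position]:
            if skipped not in clicked:
                pairs.append((doc, skipped))
    return pairs


# Results #1..#4 were shown; the user clicked #1 and #4.
pairs = preferences_from_clicks(ranking=[1, 2, 3, 4], clicked=[1, 4])
print(pairs)
```

The click on #1 yields no pair because nothing was skipped above it, while the click on #4 yields "#4 is better than #2" and "#4 is better than #3".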
What's changed?
We no longer have ratings, just document comparisons.
Cost function: something that considers ordering, e.g., Kendall's τ (based on the number of concordant and discordant pairs).
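For reference, a minimal sketch of Kendall's τ between two orderings of the same items. It counts concordant and discordant pairs and divides their difference by the total number of pairs (the tau-a variant, where ties count as neither). The function name and inputs are illustrative.

```python
from itertools import combinations


def kendall_tau(ranks_a, ranks_b):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(ranks_a)), 2):
        product = (ranks_a[i] - ranks_a[j]) * (ranks_b[i] - ranks_b[j])
        if product > 0:
            concordant += 1      # both orderings agree on this pair
        elif product < 0:
            discordant += 1      # the orderings disagree on this pair
    total_pairs = len(ranks_a) * (len(ranks_a) - 1) / 2
    return (concordant - discordant) / total_pairs


same = kendall_tau([1, 2, 3, 4], [1, 2, 3, 4])
reversed_order = kendall_tau([1, 2, 3, 4], [4, 3, 2, 1])
one_swap = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
print(same, reversed_order, one_swap)
```

Identical orderings score 1, fully reversed orderings score -1, and a single swapped neighbouring pair out of six pairs scores 2/3.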
$h$ is now a function of two parameters: $h(d_1, d_2)$. But you can just take the difference $d_2 - d_1$ and learn on that.
$d_4 > d_3$
$d_4 > d_2$
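Learning on differences can be sketched as follows: for every preference "better > worse", form the difference of the two feature vectors and push $u$ so that its dot product with that difference becomes positive. This sketch swaps in a logistic loss trained by gradient descent, which is an assumption and not something the slides specify. The feature values are made up.

```python
import math


def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def train_pairwise(preferences, n_features, alpha=0.1, steps=200):
    """Learn u so that dot(u, better) > dot(u, worse) for each preference."""
    u = [0.0] * n_features
    for _ in range(steps):
        for better, worse in preferences:
            diff = [b - w for b, w in zip(better, worse)]
            # Logistic loss log(1 + exp(-u.diff)); its gradient with respect
            # to u is -diff * sigmoid(-u.diff), so we step along +diff.
            weight = 1.0 / (1.0 + math.exp(dot(u, diff)))
            u = [u_i + alpha * weight * d_i for u_i, d_i in zip(u, diff)]
    return u


# Hypothetical two-feature vectors for results #2, #3 and #4.
d2 = [1.0, 0.0]
d3 = [1.0, 1.0]
d4 = [0.0, 2.0]

# The click told us: d4 > d3 and d4 > d2.
u = train_pairwise([(d4, d3), (d4, d2)], n_features=2)
print(u, dot(u, d2), dot(u, d3), dot(u, d4))
```

After training, scoring with the same dot product as before puts #4 above both #2 and #3, so the learned $u$ can be used for ranking exactly like the rating-based one.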