Learning to rank fulltext results from clicks

tkramar

1. Learning to rank fulltext results from clicks Tomáš Kramár @tkramar @synopsitv

2. Let's build a fulltext search engine.
Query → Find matches → Rank results

3. Find matches:
● ElasticSearch
● LIKE %%
● ...

4. Rank results:
● By number of hits
● By PageRank
● By date
● ...

6. How do you choose relevant results?

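7. Features of two example documents:

| Feature | Document 1 | Document 2 |
|---|---|---|
| Number of keywords in title | 2 | 2 |
| Number of keywords in text | 2 | 0 |
| Domain | careerjet.sk | vienna-rb.at |
| Category | Job search | Programming |
| Language | Slovak | English |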

8. How much I care about each document feature (the higher, the more I care):

| Document feature | How much I care about it |
|---|---|
| # keywords in title | 2.1 |
| # keywords in text | 1 |
| Domain is careerjet.sk | -2 |
| Domain is vienna-rb.at | 3.5 |
| Category is Job Search | -1 |
| Category is Programming | 4.2 |
| Language is Slovak | 0.9 |
| Language is English | 1.5 |

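9. rank = d · u: multiply each feature value (d) by how much I care about it (u) and sum the products.

| Document feature | How much I care about it (u) | Document 1 (d) | Document 2 (d) |
|---|---|---|---|
| # keywords in title | 2.1 | 2 | 2 |
| # keywords in text | 1 | 2 | 0 |
| Domain is careerjet.sk | -2 | 1 | 0 |
| Domain is vienna-rb.at | 3.5 | 0 | 1 |
| Category is Job Search | -1 | 1 | 0 |
| Category is Programming | 4.2 | 0 | 1 |
| Language is Slovak | 0.9 | 1 | 0 |
| Language is English | 1.5 | 0 | 1 |
| rank = d · u | | 4.1 | 13.4 |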
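A minimal sketch of the rank = d · u computation, using the feature values and weights from the table on this slide. The variable and function names are illustrative, not from the slides.

```python
# Feature order: keywords in title, keywords in text, domain careerjet.sk,
# domain vienna-rb.at, category job search, category programming,
# language Slovak, language English.

# u: how much the user cares about each feature (weights from the slide).
preferences = [2.1, 1.0, -2.0, 3.5, -1.0, 4.2, 0.9, 1.5]

# d: one feature vector per document (values from the slide).
job_search_page = [2, 2, 1, 0, 1, 0, 1, 0]
programming_page = [2, 0, 0, 1, 0, 1, 0, 1]


def rank(document, preferences):
    """Return the dot product d . u of a document with the preferences."""
    return sum(d_i * u_i for d_i, u_i in zip(document, preferences))


scores = {
    "job_search_page": rank(job_search_page, preferences),
    "programming_page": rank(programming_page, preferences),
}

# Show the best-scoring document first.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.1f}")
```

Summed term by term, the job search page scores 4.1 and the programming page scores 13.4, so the programming page is ranked first.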

10. Rate each result on a scale of 1 to 5.

11. Each rated result gives one equation in the unknown preferences u:

$$\text{rating} = d \cdot u = d_1 u_1 + d_2 u_2 + \dots + d_n u_n$$

$$
\begin{aligned}
d_{1,1} u_1 + d_{1,2} u_2 + \dots + d_{1,n} u_n &= 3 \\
d_{2,1} u_1 + d_{2,2} u_2 + \dots + d_{2,n} u_n &= 5 \\
d_{3,1} u_1 + d_{3,2} u_2 + \dots + d_{3,n} u_n &= 1 \\
d_{4,1} u_1 + d_{4,2} u_2 + \dots + d_{4,n} u_n &= 3
\end{aligned}
$$

12. The $d_{i,j}$ are known: solve this system of equations and you have u. Done.
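As a sketch of "solve this system and you have u", here is a tiny square system solved exactly with Gauss-Jordan elimination. The two feature vectors are hypothetical (the slides give no concrete d values); only the ratings 3 and 5 are taken from the slide, and `solve_linear_system` is an illustrative helper.

```python
from fractions import Fraction


def solve_linear_system(rows, ratings):
    """Solve rows . u = ratings by Gauss-Jordan elimination (square systems only)."""
    n = len(rows)
    # Augmented matrix [D | ratings], kept in exact rational arithmetic.
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(rows, ratings)]
    for col in range(n):
        # Pick a row with a non-zero entry in this column as the pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the system has no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Normalise the pivot row, then clear this column in every other row.
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


documents = [[1, 1], [1, 2]]  # hypothetical d_{i,j}: two documents, two features
ratings = [3, 5]              # explicit ratings for those two documents
preferences = solve_linear_system(documents, ratings)
print(preferences)
```

For this toy input the recovered preferences are u = (1, 2). The helper raises `ValueError` when the matrix is singular; a non-square or inconsistent system is outside what it handles.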

13. Except...
● You don't know the explicit ratings
● User preferences change over time
● Those equations probably don't have a solution

14. Clicked! Assume rating 1. Not clicked. Assume rating 0.
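A minimal sketch of turning a click log into these implicit ratings. The result ids and the `ratings_from_clicks` helper are hypothetical.

```python
def ratings_from_clicks(shown_results, clicked_results):
    """Map each shown result to rating 1 if it was clicked, else 0."""
    clicked = set(clicked_results)
    return {result: 1 if result in clicked else 0 for result in shown_results}


shown = ["doc_a", "doc_b", "doc_c", "doc_d"]  # hypothetical result ids, in display order
clicked = ["doc_b"]                           # the user clicked only the second result
ratings = ratings_from_clicks(shown, clicked)
print(ratings)
```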

16. Approximation function h(d): d → rank

$$h(d) = d_1 u_1 + \dots + d_n u_n = \text{estimated\_rank}$$

If the function is good, it should make minimal errors:

$$\text{error} = (\text{estimated\_rank} - \text{real\_rank})^2$$
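A small sketch of h(d) and the squared error for a single document. The feature values, preferences, and real rank below are made up for illustration.

```python
def h(document, preferences):
    """Approximation function: estimated rank as the dot product d . u."""
    return sum(d_i * u_i for d_i, u_i in zip(document, preferences))


def squared_error(document, preferences, real_rank):
    """Squared difference between the estimated and the real rank."""
    return (h(document, preferences) - real_rank) ** 2


preferences = [0.5, -0.25]  # hypothetical current guess for u
document = [2, 1]           # hypothetical feature vector d
real_rank = 1               # e.g. the result was clicked

estimated_rank = h(document, preferences)                 # 2*0.5 + 1*(-0.25) = 0.75
error = squared_error(document, preferences, real_rank)   # (0.75 - 1)^2 = 0.0625
print(estimated_rank, error)
```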

17. Gradient descent
  1. Set user preferences (u) to arbitrary values
  2. Calculate the estimated rank h(d) for each document
  3. Calculate the mean square error
  4. Adjust preferences u in a way that minimizes the error
  5. Repeat until the error converges

18. [Plot: the cost function, with the mean square error on the vertical axis and u for "# of keywords in title" on the horizontal axis]

19. Calculate the derivative of the cost function at this point and it will give you the direction to move in.

20. Preference update

$$u_i \leftarrow u_i - \alpha \, \frac{\partial\,\text{error}}{\partial u_i}$$

● α is the learning rate
● ∂error/∂u_i is the partial derivative of the cost function with respect to u_i

21. The learning rate α sets how fast you move. Too low: slow progress. Too high: you will overshoot.

22. The partial derivative is nothing scary. You can find these online for standard cost functions. For the squared error:

$$\frac{\partial\,\text{error}}{\partial u_i} = 2\,\bigl(h(d) - \text{rank}(d)\bigr)\, d_i$$
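For reference, a short derivation of that partial derivative, assuming the squared error from slide 16, error = (h(d) - real_rank)^2, with h(d) = d_1 u_1 + ... + d_n u_n. Only the chain rule is needed, together with the fact that h(d) is linear in each u_i.

```latex
\begin{aligned}
\frac{\partial\,\mathrm{error}}{\partial u_i}
  &= \frac{\partial}{\partial u_i}\bigl(h(d) - \mathrm{real\_rank}\bigr)^2 \\
  &= 2\,\bigl(h(d) - \mathrm{real\_rank}\bigr)\,\frac{\partial h(d)}{\partial u_i} \\
  &= 2\,\bigl(h(d) - \mathrm{real\_rank}\bigr)\,d_i
\end{aligned}
```

The update for one document is therefore u_i ← u_i - 2α (h(d) - real_rank) d_i. The constant 2 is often absorbed into the learning rate α.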

23. Gradient descent
  1. Set user preferences (u) to arbitrary values
  2. Calculate the estimated rank h(d) for each document
  3. Calculate the square error
  4. Adjust preferences u in a way that minimizes the error
  5. Repeat until the error converges
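The five steps can be sketched as batch gradient descent on the mean square error. This is an illustration under stated assumptions, not the author's implementation: the documents, the click-derived ratings, the learning rate, and the stopping tolerance are all made up.

```python
def h(document, preferences):
    """Estimated rank: dot product d . u."""
    return sum(d_i * u_i for d_i, u_i in zip(document, preferences))


def mean_square_error(documents, ratings, preferences):
    errors = [(h(d, preferences) - r) ** 2 for d, r in zip(documents, ratings)]
    return sum(errors) / len(errors)


def gradient_descent(documents, ratings, learning_rate=0.1,
                     tolerance=1e-12, max_steps=100_000):
    n_features = len(documents[0])
    preferences = [0.0] * n_features  # step 1: arbitrary starting values
    previous_error = mean_square_error(documents, ratings, preferences)
    error = previous_error
    for _ in range(max_steps):
        # Step 2: estimated rank for each document, kept as residual h(d) - rating.
        residuals = [h(d, preferences) - r for d, r in zip(documents, ratings)]
        # Step 4: gradient of the mean square error with respect to each u_i.
        gradient = [
            2 * sum(res * d[i] for res, d in zip(residuals, documents)) / len(documents)
            for i in range(n_features)
        ]
        preferences = [u - learning_rate * g for u, g in zip(preferences, gradient)]
        # Step 3: mean square error after the update.
        error = mean_square_error(documents, ratings, preferences)
        # Step 5: stop once the error no longer changes noticeably.
        if abs(previous_error - error) < tolerance:
            break
        previous_error = error
    return preferences, error


# Hypothetical features per document: [# keywords in title, category is programming]
documents = [[2, 1], [1, 0], [0, 1], [2, 0]]
ratings = [1, 0, 1, 0]  # 1 = clicked, 0 = not clicked

learned, final_error = gradient_descent(documents, ratings)
print(learned, final_error)
```

In this toy data the clicks depend only on the second feature, so the learned preferences approach u = (0, 1) and the error approaches 0. Real click data will rarely fit this cleanly.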

24. Clicked! Assume rating 1. Clicked! Assume rating 1. Or does this mean result #1 is not relevant?

25. Clicked! Assume nothing. Clicked! Assume it is better than #2 and #3.
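One way to read this rule: a clicked result is assumed to be better than every non-clicked result shown above it, and nothing else is assumed. A minimal sketch under that reading, with hypothetical document ids and an illustrative `preference_pairs` helper:

```python
def preference_pairs(ranking, clicked):
    """Return (better, worse) pairs: each clicked result beats every
    non-clicked result that was shown above it."""
    clicked = set(clicked)
    pairs = []
    for position, result in enumerate(ranking):
        if result in clicked:
            pairs.extend(
                (result, skipped)
                for skipped in ranking[:position]
                if skipped not in clicked
            )
    return pairs


ranking = ["d1", "d2", "d3", "d4"]  # results in the order they were shown
clicked = ["d1", "d4"]              # the user clicked #1 and #4
pairs = preference_pairs(ranking, clicked)
print(pairs)  # [('d4', 'd2'), ('d4', 'd3')]
```

The click on #1 produces no pairs because nothing was shown above it. The click on #4 produces d4 > d2 and d4 > d3.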

26. What's changed?
● We no longer have ratings, just document comparisons, e.g. d4 > d3 and d4 > d2.
● The cost function must be something that considers ordering, e.g. Kendall's τ (based on the number of concordant and discordant pairs).
● h is now a function of 2 parameters: h(d1, d2). But you can just compute d2 - d1 and learn on that.
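A sketch of "learn on the difference": each comparison becomes a difference vector, and u is adjusted until u · (better - worse) is positive for every pair. The perceptron-style update used here is swapped in for illustration, since the slides do not prescribe a particular learner, and the feature vectors are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def difference(better, worse):
    """Feature-wise difference between the preferred and the other document."""
    return [b - w for b, w in zip(better, worse)]


def learn_from_pairs(pairs, n_features, learning_rate=0.1, epochs=100):
    """Perceptron-style learner: nudge u towards every wrongly ordered pair."""
    preferences = [0.0] * n_features
    for _ in range(epochs):
        mistakes = 0
        for better, worse in pairs:
            diff = difference(better, worse)
            if dot(preferences, diff) <= 0:  # pair is tied or ordered wrongly
                preferences = [u + learning_rate * x for u, x in zip(preferences, diff)]
                mistakes += 1
        if mistakes == 0:  # every pair is ordered correctly
            break
    return preferences


def concordant_discordant(pairs, preferences):
    """Count pairs ordered correctly; ties are counted as discordant here."""
    concordant = sum(1 for b, w in pairs if dot(preferences, difference(b, w)) > 0)
    return concordant, len(pairs) - concordant


# Hypothetical features: [# keywords in title, category is programming]
d2, d3, d4 = [2, 0], [1, 0], [1, 1]
pairs = [(d4, d3), (d4, d2)]  # d4 > d3 and d4 > d2, as on the slide

preferences = learn_from_pairs(pairs, n_features=2)
concordant, discordant = concordant_discordant(pairs, preferences)
print(preferences, concordant, discordant)
```

With these two comparisons a single update is enough: both pairs end up concordant, which is the best case for an ordering-based measure such as Kendall's τ.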
