Recruiting Solutions 
Learning-to-Rank Search Results 
Ganesh Venkataraman 
http://www.linkedin.com/in/npcomplete 
@gvenkataraman
Audience Classification
• Search background
• ML background
• Search + ML background
[Chart: share of the audience with each background (Search, ML, Search+ML), on a 0% to 70% scale]
Outline
• Search overview
• Why Learning to Rank (LTR)?
• Biases with collecting training data from click logs
– Sampling bias
– Presentation bias
• Three basic approaches
– Pointwise
– Pairwise
– Listwise
• Key takeaways/summary
tl;dr
• Ranking interacts heavily with retrieval and query understanding
• Ground truth > features > model*
• Listwise > pairwise > pointwise
* Airbnb engineering blog: http://nerds.airbnb.com/architecting-machine-learning-system-risk/
Primer on Search 
Bird’s-eye view of how a search engine works
[Diagram: user: information need → query → select from results; system: rank using IR model]
Pre Retrieval/Retrieval/Post Retrieval
• Pre retrieval
– Process the input query: rewrite it, check spelling, etc.
– Send the appropriate query to one or (potentially) several search nodes
• Retrieval
– Given a query, retrieve all documents matching the query, along with a score
• Post retrieval
– Merge-sort the results from the different search nodes
– Add relevant information to the search results for use by the front end
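Because each search node already returns its results sorted by score, the post-retrieval merge only has to interleave sorted lists. A minimal sketch, assuming each node returns hypothetical `(score, doc_id)` tuples in descending score order; `merge_node_results` and the sample data are illustrative, not part of any real search system:

```python
import heapq
import itertools
from typing import Iterable, List, Tuple

Result = Tuple[float, str]  # (score, doc_id)

def merge_node_results(node_results: Iterable[List[Result]], k: int) -> List[Result]:
    """Merge per-node result lists (each sorted by descending score) and keep the global top k."""
    # heapq.merge lazily merges already-sorted inputs; reverse=True expects descending order.
    merged = heapq.merge(*node_results, key=lambda r: r[0], reverse=True)
    return list(itertools.islice(merged, k))

# Hypothetical results from two search nodes.
node_a = [(0.92, "doc-17"), (0.55, "doc-3")]
node_b = [(0.87, "doc-42"), (0.60, "doc-8"), (0.10, "doc-9")]
print(merge_node_results([node_a, node_b], k=3))
# [(0.92, 'doc-17'), (0.87, 'doc-42'), (0.6, 'doc-8')]
```

The merge is lazy, so the broker touches only about k results rather than every result from every node.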
Claim #1: Search is about understanding the query/user intent
Understanding intent
[Figure: a query with its segments tagged TITLE, CO, and GEO]
• TITLE-237: software engineer, software developer, programmer, …
• CO-1441: Google Inc. (Industry: Internet)
• GEO-7583: Country: US; Lat: 42.3482 N; Long: 75.1890 W
(Recognized tags: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL)
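The slide maps query segments to typed entities such as TITLE-237 and CO-1441. One way to sketch that idea is a dictionary lookup with greedy longest-match segmentation. The phrase dictionary, `tag_query`, and `max_len` below are hypothetical, and this is not the tagger behind the slide:

```python
from typing import Dict, List, Tuple

# Hypothetical phrase dictionary: phrase -> (tag type, entity id).
# The ids mirror the slide's example; the entries themselves are illustrative.
PHRASES: Dict[str, Tuple[str, str]] = {
    "software engineer": ("TITLE", "TITLE-237"),
    "software developer": ("TITLE", "TITLE-237"),
    "programmer": ("TITLE", "TITLE-237"),
    "google": ("CO", "CO-1441"),
}

def tag_query(query: str, phrases: Dict[str, Tuple[str, str]] = PHRASES,
              max_len: int = 3) -> List[Tuple[str, str, str]]:
    """Greedy longest match: return (phrase, tag, entity id) for each recognized span."""
    tokens = query.lower().split()
    tags = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in phrases:
                tag, entity = phrases[phrase]
                tags.append((phrase, tag, entity))
                i += n
                break
        else:
            i += 1  # no phrase starts here; leave the token untagged
    return tags

print(tag_query("Software Developer Google"))
# [('software developer', 'TITLE', 'TITLE-237'), ('google', 'CO', 'CO-1441')]
```

Because synonyms such as "software engineer" and "programmer" resolve to the same id, downstream retrieval can match on TITLE-237 rather than on raw strings.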
Fixing user errors
• Typos
• Help users spell names
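One simple way to suggest a spelling for a mistyped name is to pick the closest known name by edit distance. A minimal sketch, assuming a hypothetical name vocabulary and a cutoff of two edits; real spelling correction would also weigh name popularity and query context:

```python
from typing import Iterable, Optional

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with one rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def suggest(term: str, vocabulary: Iterable[str], max_dist: int = 2) -> Optional[str]:
    """Return the closest vocabulary entry within max_dist edits, or None."""
    best = min(vocabulary, key=lambda v: edit_distance(term, v), default=None)
    if best is not None and edit_distance(term, best) <= max_dist:
        return best
    return None

NAMES = ["venkataraman", "tunkelang", "joachims"]  # hypothetical vocabulary
print(suggest("venkatraman", NAMES))  # venkataraman
```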
Claim #2: Search is about understanding systems
The Search Index
• Inverted index: mapping from (search) terms to the list of documents they are present in
• Forward index: mapping from documents to metadata about them
Posting List
D0 = “it is what it is”
D1 = “what is it”
D2 = “it is a banana”

Term | Posting list (DocId: frequency)
a | D2: 1
banana | D2: 1
is | D0: 2, D1: 1, D2: 1
it | D0: 2, D1: 1, D2: 1
what | D0: 1, D1: 1
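The posting lists above can be rebuilt from the three documents in a few lines. A minimal sketch, assuming whitespace tokenization; the forward index here stores hypothetical metadata (document length and text) only to illustrate the doc-to-metadata direction:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

DOCS = {
    0: "it is what it is",
    1: "what is it",
    2: "it is a banana",
}

def build_indexes(docs: Dict[int, str]):
    """Build an inverted index (term -> [(doc_id, frequency)]) and a toy forward index."""
    inverted: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    forward: Dict[int, dict] = {}
    for doc_id in sorted(docs):  # visiting ids in order keeps every posting list sorted
        counts = Counter(docs[doc_id].split())
        for term, freq in counts.items():
            inverted[term].append((doc_id, freq))
        # Hypothetical per-document metadata for the forward index.
        forward[doc_id] = {"length": sum(counts.values()), "text": docs[doc_id]}
    return dict(inverted), forward

inverted, forward = build_indexes(DOCS)
for term in sorted(inverted):
    print(term, inverted[term])
```

Keeping posting lists sorted by DocId is what makes the merge-style intersection used for candidate selection possible.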
Candidate selection for “abraham lincoln”
• Posting lists
– “abraham” => {5, 7, 8, 23, 47, 101}
– “lincoln” => {7, 23, 101, 151}
• Query = “abraham AND lincoln”
– Retrieved set => {7, 23, 101}
• Some systems-level issues
– How should posting lists be represented efficiently?
– How does one traverse a very long posting list (for words like “the”, “an”, etc.)?
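An AND query is an intersection of sorted posting lists. The sketch below shows the basic linear merge and, for very unequal list lengths, one common alternative not described in the slides: binary-searching each id of the short list inside the long one. Function names are illustrative:

```python
from bisect import bisect_left
from typing import List

def intersect(a: List[int], b: List[int]) -> List[int]:
    """Linear-time merge intersection of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def intersect_skewed(short: List[int], long: List[int]) -> List[int]:
    """Binary-search each id of the short list in the long list.
    The search start only moves forward, so already-passed entries are skipped."""
    out = []
    lo = 0
    for doc in short:
        lo = bisect_left(long, doc, lo)
        if lo == len(long):
            break
        if long[lo] == doc:
            out.append(doc)
    return out

abraham = [5, 7, 8, 23, 47, 101]
lincoln = [7, 23, 101, 151]
print(intersect(abraham, lincoln))  # [7, 23, 101]
```

The skewed variant costs roughly O(m log n) for lists of lengths m ≪ n, which is why skipping matters for terms like “the”.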
Claim #3: Search is a ranking problem
What is search ranking?
• Ranking
– Find an ordered list of documents according to the relevance between documents and query
• Traditional search
– f(query, document) => score
• Social networks context
– f(query, document, user) => score
– Find an ordered list of documents according to the relevance between documents, query, and user
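In both settings, ranking means scoring every candidate and sorting by score; personalization only changes the scoring function. A minimal sketch in which the features are assumed to be precomputed for one query and one searcher. `text_match`, `network_distance`, and the 0.5 weight are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Doc:
    doc_id: str
    text_match: float      # hypothetical query-document relevance feature
    network_distance: int  # hypothetical: degrees of separation from the searcher (>= 1)

def rank(docs: List[Doc], score: Callable[[Doc], float]) -> List[str]:
    """Ranking = score every candidate, then sort by descending score."""
    return [d.doc_id for d in sorted(docs, key=score, reverse=True)]

def traditional(d: Doc) -> float:
    """f(query, document): depends only on query-document match."""
    return d.text_match

def personalized(d: Doc) -> float:
    """f(query, document, user): adds a made-up boost for close connections."""
    return d.text_match + 0.5 / d.network_distance

candidates = [Doc("A", 0.8, 3), Doc("B", 0.7, 1)]
print(rank(candidates, traditional))   # ['A', 'B']
print(rank(candidates, personalized))  # ['B', 'A']
```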
Why LTR?
• Manual models become hard to tune with a very large number of features and non-convex interactions
• Leverages large volumes of click-through data in an automated way
• Unique challenges involved in crowdsourcing personalized ranking
• Key issues
– How do we collect training data?
– How do we avoid biases?
– How do we train the model?
Training
[Diagram: documents for training → features; human evaluation → labels; features and labels → machine learning model]
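Training options – crowdsourcing judgments
Crowdsourced judgments
• (query, user, document) -> label
• Labels in {1, 2, 3, 4, 5}; higher label => better
• Issues
– Personalized world: relevance depends on the searcher, so a judge has to assess results on someone else's behalf
– Difficult to scale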
Mining click stream
Approach: Clicked = Relevant, Not-Clicked = Not Relevant
[Figure: results page annotated with the user's eye-scan direction; unclicked results are marked “Unfairly penalized?”]
Position Bias
• “Accurately interpreting clickthrough data as implicit feedback” – Joachims et al., ACM SIGIR 2005
– Experiment #1
  • Users were presented with normal Google search results
  • 55.56% of users clicked the first result
  • 5.56% clicked the second result
– Experiment #2
  • Same result page as in the first experiment, but with the 1st and 2nd results flipped
  • 57.14% of users clicked the first result
  • 7.14% clicked the second result
• Users still click the top position most, even when it holds the originally lower-ranked result, so a click reflects position as well as relevance
FAIR PAIRS
• Fair Pairs:
• Randomize, Clicked = R, Skipped = NR
• Great at dealing with position bias
• Does not invert models
[Figure: result list with a pair of adjacent results flipped]
[Radlinski and Joachims, AAAI’06]
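A minimal sketch of FairPairs-style randomization, based on our reading of Radlinski and Joachims: adjacent results are grouped into pairs (starting at a random offset), and each pair is flipped with probability 0.5 before display. Preferences are then recorded only within a pair. Here we assume the rule “lower result clicked, upper result skipped ⇒ clicked preferred over skipped”; check the paper for the exact extraction rules. Function names are hypothetical:

```python
import random
from typing import List, Sequence, Set, Tuple

def fair_pairs(ranking: Sequence[str], rng: random.Random) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Pair adjacent results from a random offset (0 or 1) and flip each pair with
    probability 0.5. Returns the list to display and the (upper, lower) position pairs."""
    shown = list(ranking)
    pairs = []
    for i in range(rng.randint(0, 1), len(shown) - 1, 2):
        if rng.random() < 0.5:
            shown[i], shown[i + 1] = shown[i + 1], shown[i]
        pairs.append((i, i + 1))
    return shown, pairs

def preferences(shown: Sequence[str], pairs: Sequence[Tuple[int, int]],
                clicked: Set[str]) -> List[Tuple[str, str]]:
    """Within each pair, a click on the lower result while the upper one is skipped
    yields (clicked, skipped), i.e. Clicked = R is preferred over Skipped = NR."""
    prefs = []
    for upper, lower in pairs:
        top, bottom = shown[upper], shown[lower]
        if bottom in clicked and top not in clicked:
            prefs.append((bottom, top))
    return prefs

shown, pairs = fair_pairs(["d1", "d2", "d3", "d4", "d5"], random.Random(0))
```

Because each document in a pair is equally likely to appear on top, the top position's click advantage cancels out over many impressions, while the displayed ranking changes by at most one position per result.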
Issue #2 – Sampling Bias
• Sampling bias
– Users click or skip only what is shown.
– What about low-scoring results from the existing model?
– Add low-scoring results as ‘easy negatives’ so the model learns about bad results that are never presented to the user.
[Figure: low-scoring results from pages 1, 2, 3, …, n, each assigned label 0]
Avoiding Sampling Bias – Easy negatives
• Invasive way
– For a small sample of users, add bad results to the SERP to test that the results were indeed bad
– Not really recommended, since it affects UX
• Non-invasive way
– Assume we have a decent model
– Take tail results and add them to the training data as “easy negatives”
– A similar approach can be used for “easy positives”, depending on the application
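The non-invasive approach can be sketched as sampling from the tail of the current model's ranking and labeling those results 0. The parameters `tail_start` (where the “tail” begins) and `n_samples`, and the function name, are hypothetical choices:

```python
import random
from typing import List, Sequence, Tuple

def add_easy_negatives(ranked: Sequence[str],
                       labeled: List[Tuple[str, int]],
                       tail_start: int,
                       n_samples: int,
                       rng: random.Random) -> List[Tuple[str, int]]:
    """Append label-0 examples sampled from results ranked at or below position tail_start
    by the current ("decent") model, skipping documents that already have a label."""
    seen = {doc for doc, _ in labeled}
    tail = [d for d in ranked[tail_start:] if d not in seen]
    sampled = rng.sample(tail, min(n_samples, len(tail)))
    return labeled + [(doc, 0) for doc in sampled]

ranked = [f"d{i}" for i in range(100)]   # hypothetical model ranking, best first
click_labels = [("d0", 1), ("d1", 0)]    # hypothetical labels mined from the SERP
training = add_easy_negatives(ranked, click_labels, tail_start=50, n_samples=5,
                              rng=random.Random(1))
```

The approach rests on the slide's assumption of a decent model: if the model is poor, its tail may contain relevant results that would be mislabeled.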
How to collect training data?
• Implicit relevance judgments from click logs, including clicked and unclicked results from the SERP (avoids position bias)
• Add easy negatives (avoids sampling bias)
Mining click stream
Approach: Relevance labels
• Label = 0 (least relevant)
• Label = 5 (most relevant)
[Figure: example results with labels 0, 2, and 5]
Learning to Rank
• Pointwise: Reduce ranking to binary classification
[Figure: labeled documents per query, Q1: +, +, +, -; Q2: +, -, -, -; Q3: +, +, -, -]
Limitations
• Assumes relevance is absolute
• Relevant documents associated with different queries are put into the same class
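A minimal pointwise sketch: pool every (features, label) row across queries, fit a binary logistic-regression classifier, and rank candidates by its score. The features and data are hypothetical. Pooling rows from different queries into one set is exactly the limitation noted above:

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w: List[float], x: Sequence[float]) -> float:
    """Linear score w . x + bias (the bias is stored as the last weight)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]

def train_pointwise(data: Sequence[Tuple[List[float], int]],
                    epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Logistic regression on (features, 0/1 label) rows, trained with plain SGD."""
    w = [0.0] * (len(data[0][0]) + 1)
    for _ in range(epochs):
        for x, y in data:
            g = sigmoid(score(w, x)) - y  # gradient of the log loss w.r.t. the score
            for k, xk in enumerate(x):
                w[k] -= lr * g * xk
            w[-1] -= lr * g
    return w

# Hypothetical rows; the query id is not a feature, so the model never sees it.
data = [
    ([0.9, 0.8], 1), ([0.8, 0.6], 1), ([0.2, 0.1], 0),  # Q1
    ([0.7, 0.9], 1), ([0.3, 0.2], 0),                   # Q2
    ([0.1, 0.4], 0),                                    # Q3
]
w = train_pointwise(data)
candidates = [[0.2, 0.1], [0.9, 0.8], [0.5, 0.5]]
ranked = sorted(candidates, key=lambda x: score(w, x), reverse=True)
```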
Learning to Rank
• Pairwise: Reduce ranking to classification of document pairs w.r.t. the same query
– {(Q1, A>B), (Q2, C>D), (Q3, E>F)}
Learning to Rank
• Pairwise
– No longer assumes absolute relevance
– Limitation: does not differentiate inversions at top vs. bottom positions
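A minimal pairwise sketch: build preference pairs only within the same query, then minimize a logistic loss on score differences with a linear scorer. This is the RankNet-style pairwise loss without the neural network. Data and names are hypothetical. Every mis-ordered pair costs the same wherever it sits in the list, which illustrates the limitation above:

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Features = List[float]

def make_pairs(queries: Dict[str, List[Tuple[Features, int]]]) -> List[Tuple[Features, Features]]:
    """Turn graded labels into (preferred, less preferred) pairs, only within the same query."""
    pairs = []
    for docs in queries.values():
        for (xa, ya), (xb, yb) in combinations(docs, 2):
            if ya > yb:
                pairs.append((xa, xb))
            elif yb > ya:
                pairs.append((xb, xa))
            # equal labels carry no preference and are skipped
    return pairs

def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise(pairs: Sequence[Tuple[Features, Features]], dim: int,
                   epochs: int = 200, lr: float = 0.1) -> List[float]:
    """SGD on log(1 + exp(-(s_a - s_b))) with a linear scorer s = w . x."""
    w = [0.0] * dim
    for _ in range(epochs):
        for xa, xb in pairs:
            diff = dot(w, xa) - dot(w, xb)
            g = -1.0 / (1.0 + math.exp(min(diff, 50.0)))  # d(loss)/d(diff); clamp avoids overflow
            for k in range(dim):
                w[k] -= lr * g * (xa[k] - xb[k])
    return w

queries = {
    "Q1": [([0.9, 0.2], 2), ([0.5, 0.1], 1), ([0.1, 0.3], 0)],
    "Q2": [([0.4, 0.9], 1), ([0.2, 0.8], 0)],
}
pairs = make_pairs(queries)
w = train_pairwise(pairs, dim=2)
```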
Listwise approach - DCG
• Objective: come up with a function that converts an entire set of ranked search results, each with a relevance label, into a single score
• Characteristics of such a function
– Higher relevance in the ranked set => higher score
– Higher relevance on higher positions in the ranked set => higher score
• There are p documents in the search results; each document i has a relevance rel_i.
$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$
DCG
Rank | Discounted gain = (2^relevance - 1) / log2(1 + Rank)
1 | 3
2 | 4.4
3 | 0.5
DCG (sum) | 7.9
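The worked example can be checked directly. The slide's gains (3, 4.4, 0.5) come out exactly with a base-2 logarithm and relevance labels (2, 3, 1) for ranks 1 to 3. Those labels are our inference from the arithmetic, not values shown on the slide:

```python
import math
from typing import List, Optional, Sequence

def discounted_gains(rels: Sequence[int]) -> List[float]:
    """Per-position terms (2^rel_i - 1) / log2(i + 1), with ranks i starting at 1."""
    return [(2 ** rel - 1) / math.log2(i + 1) for i, rel in enumerate(rels, start=1)]

def dcg(rels: Sequence[int], k: Optional[int] = None) -> float:
    """DCG@k: sum of the discounted gains of the top k positions (all positions if k is None)."""
    return sum(discounted_gains(rels[:k]))

# Inferred labels for ranks 1-3 that reproduce the slide's gains.
print([round(g, 1) for g in discounted_gains([2, 3, 1])])  # [3.0, 4.4, 0.5]
print(round(dcg([2, 3, 1]), 1))                             # 7.9
```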
NDCG based optimization
• NDCG@k = DCG@k / IDCG@k, where IDCG@k is the DCG@k of the ideal ordering (results sorted by relevance label)
• The normalization ensures the value is between 0.0 and 1.0
• Since NDCG directly represents the “value” of a particular ranking given the relevance labels, one can directly formulate ranking as maximizing NDCG@k (say k = 5)
• Directly pluggable into a variety of algorithms, including coordinate ascent
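A minimal sketch of NDCG@k and of using it directly as the training objective. It is a deliberately simplified greedy coordinate ascent over the weights of a linear scorer. The starting weights, step sizes, and toy queries are assumptions, and RankLib's coordinate ascent ranker is more elaborate than this:

```python
import math
from typing import List, Optional, Sequence, Tuple

def dcg(rels: Sequence[int], k: Optional[int] = None) -> float:
    return sum((2 ** r - 1) / math.log2(i + 1) for i, r in enumerate(rels[:k], start=1))

def ndcg(rels: Sequence[int], k: Optional[int] = None) -> float:
    """NDCG@k = DCG@k / IDCG@k; IDCG@k is the DCG of the label-sorted (ideal) order."""
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0

Query = List[Tuple[List[float], int]]  # (features, relevance label) per candidate

def mean_ndcg(w: Sequence[float], queries: Sequence[Query], k: int) -> float:
    """Rank each query's candidates by w . x and average NDCG@k over queries."""
    total = 0.0
    for docs in queries:
        ranked = sorted(docs, key=lambda d: sum(wi * xi for wi, xi in zip(w, d[0])), reverse=True)
        total += ndcg([label for _, label in ranked], k)
    return total / len(queries)

def coordinate_ascent(queries: Sequence[Query], dim: int, k: int = 5,
                      steps: Sequence[float] = (-1.0, -0.1, 0.1, 1.0),
                      rounds: int = 10) -> Tuple[List[float], float]:
    """Adjust one weight at a time, keeping a change only if mean NDCG@k improves."""
    w = [1.0] * dim
    best = mean_ndcg(w, queries, k)
    for _ in range(rounds):
        improved = False
        for j in range(dim):
            for step in steps:
                cand = list(w)
                cand[j] += step
                val = mean_ndcg(cand, queries, k)
                if val > best:
                    w, best, improved = cand, val, True
        if not improved:
            break
    return w, best

queries = [  # hypothetical training queries
    [([0.9, 0.1], 2), ([0.5, 0.9], 0), ([0.7, 0.2], 1)],
    [([0.3, 0.8], 0), ([0.8, 0.3], 1)],
]
w, best = coordinate_ascent(queries, dim=2)
```

The metric is piecewise constant in the weights, so gradient methods do not apply directly. A derivative-free search like this one sidesteps that at the cost of many metric evaluations.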
Learning to Rank
Pointwise
✓ Simple to understand and debug
✓ Straightforward to use
✕ Query independent
✕ Assumes relevance is absolute
Pairwise
✓ Assumes relevance is relative
✓ Depends on query
✕ Loss function agnostic to position
Listwise
✓ Directly operates on ranked lists
✓ Loss function aware of position
✕ More complicated, non-convex functions, higher training time
Search Ranking
[Diagram: click logs → training data → model → offline evaluation → online A/B test/debug]
score = f(query, user, document)
tl;dr revisited
• Ranking interacts heavily with retrieval and query understanding
– Query understanding affects intent detection, fixing user errors, etc.
– Retrieval affects candidate selection, speed, etc.
• Ground truth > features > model*
– Truth data is affected by biases
• Listwise > pairwise > pointwise
– Listwise, while more complicated, avoids some model-level issues in pairwise and pointwise methods
* Airbnb engineering blog: http://nerds.airbnb.com/architecting-machine-learning-system-risk/
Useful references
• “From RankNet to LambdaRank to LambdaMART: An Overview” – Christopher Burges
• “Learning to Rank for Information Retrieval” – Tie-Yan Liu
• RankLib – has implementations of several LTR approaches
LinkedIn search is powered by …
We are hiring!
careers.linkedin.com

 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Learn to Rank search results

  • 1. Learning-to-Rank Search Results: Ganesh Venkataraman, Recruiting Solutions (http://www.linkedin.com/in/npcomplete, @gvenkataraman)
  • 2. Audience Classification
      – Search background
      – ML background
      – Search + ML background
      (Bar chart: share of the audience in each group, Search, ML, Search+ML, on a 0% to 70% axis)
  • 3. Outline
      – Search Overview
      – Why Learning to Rank (LTR)?
      – Biases in collecting training data from click logs
          – Sampling bias
          – Presentation bias
      – Three basic approaches
          – Pointwise
          – Pairwise
          – Listwise
      – Key Takeaways/Summary
  • 4. tl;dr
      – Ranking interacts heavily with retrieval and query understanding
      – Ground truth > features > model*
      – Listwise > pairwise > pointwise
      * Airbnb engineering blog: http://nerds.airbnb.com/architecting-machine-learning-system-risk/
  • 6. Bird’s-eye view of how a search engine works
      – User: has an information need, expresses it as a query, and selects from the results
      – System: ranks the results using an IR model
  • 7. Pre-retrieval / Retrieval / Post-retrieval
      – Pre-retrieval
          – Process the input query: rewrite it, check spelling, etc.
          – Send the appropriate query to the search node(s) (potentially several)
      – Retrieval
          – Given a query, retrieve all documents matching it, each with a score
      – Post-retrieval
          – Merge the sorted results from the different search nodes
          – Add the information the front end needs to the search results
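The post-retrieval merge step can be sketched as a k-way merge: if each search node returns its hits already sorted by descending score, the broker can merge them lazily instead of re-sorting everything. This is a minimal sketch, assuming a hypothetical `(score, doc_id)` hit layout and a global cut-off `k`; the function name `merge_node_results` is illustrative only.

```python
import heapq
import itertools
from typing import Iterable, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id): a hypothetical hit layout


def merge_node_results(node_results: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Merge per-node hit lists, each sorted by descending score, into a global top-k."""
    # heapq.merge performs a lazy k-way merge; reverse=True because the inputs are descending.
    merged = heapq.merge(*node_results, key=lambda hit: hit[0], reverse=True)
    # Only the first k hits are ever pulled out of the merge.
    return list(itertools.islice(merged, k))


node_a = [(0.9, "d1"), (0.4, "d7")]
node_b = [(0.8, "d3"), (0.5, "d2"), (0.1, "d9")]
print(merge_node_results([node_a, node_b], k=3))  # [(0.9, 'd1'), (0.8, 'd3'), (0.5, 'd2')]
```

Because the merge is lazy, the cost grows with `k` and the number of nodes rather than with the total number of retrieved hits.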
  • 9. Claim #1: Search is about understanding the query/user intent
  • 10. Understanding intent
      – Query segments are tagged TITLE, CO (company), and GEO, and each is resolved to a standardized entity:
          – TITLE-237: software engineer, software developer, programmer, …
          – CO-1441: Google Inc. (Industry: Internet)
          – GEO-7583: Country: US; Lat: 42.3482 N; Long: 75.1890 W
      – Recognized tags: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL
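Query tagging can be sketched as a greedy longest-match lookup of query segments against a phrase dictionary. This is a minimal illustration, not LinkedIn's tagger: the `DICTIONARY` contents, the `max_len` phrase limit, and the `KEYWORD` fallback tag are all hypothetical. Only the TITLE-237 and CO-1441 IDs come from the slide, and GEO would be handled the same way.

```python
from typing import Dict, List, Tuple

# Hypothetical phrase dictionary: phrase -> (tag, standardized entity id).
# Entries mirror the slide's TITLE-237 and CO-1441 examples; a real one would be far larger.
DICTIONARY: Dict[str, Tuple[str, str]] = {
    "software engineer": ("TITLE", "TITLE-237"),
    "software developer": ("TITLE", "TITLE-237"),
    "programmer": ("TITLE", "TITLE-237"),
    "google": ("CO", "CO-1441"),
    "google inc.": ("CO", "CO-1441"),
}


def tag_query(query: str, max_len: int = 3) -> List[Tuple[str, str, str]]:
    """Greedy longest-match tagging; returns (phrase, tag, entity_id) for each segment."""
    tokens = query.lower().split()
    tags: List[Tuple[str, str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in DICTIONARY:
                tag, entity = DICTIONARY[phrase]
                tags.append((phrase, tag, entity))
                i += n
                break
        else:
            # No dictionary match: fall back to a hypothetical free-text KEYWORD tag.
            tags.append((tokens[i], "KEYWORD", ""))
            i += 1
    return tags


print(tag_query("Software Developer Google Inc."))
# [('software developer', 'TITLE', 'TITLE-237'), ('google inc.', 'CO', 'CO-1441')]
```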
  • 11. Fixing user errors
      – Correct typos
      – Help users spell names
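One simple way to help users spell names is to suggest the known name closest in Levenshtein edit distance. This is a minimal sketch: the candidate name list and the `max_dist` threshold are hypothetical, and the slide does not say which correction technique is used in production.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def suggest_name(typed: str, known_names: Iterable[str], max_dist: int = 2) -> Optional[str]:
    """Return the closest known name within max_dist edits, or None if nothing is close."""
    typed = typed.lower()
    best = min(known_names, key=lambda name: levenshtein(typed, name.lower()), default=None)
    if best is not None and levenshtein(typed, best.lower()) <= max_dist:
        return best
    return None


print(suggest_name("Venkatraman", ["Venkataraman", "Venkatesh", "Raman"]))  # Venkataraman
```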
  • 12. Claim #2: Search is about understanding systems
  • 13. The Search Index
      – Inverted index: mapping from (search) terms to the list of documents they appear in
      – Forward index: mapping from documents to metadata about them
  • 14. Posting List
      – Documents: D0 = “it is what it is”, D1 = “what is it”, D2 = “it is a banana”
      – Posting lists, as DocId: frequency
          Term      Posting list
          a         2: 1
          banana    2: 1
          is        0: 2, 1: 1, 2: 1
          it        0: 2, 1: 1, 2: 1
          what      0: 1, 1: 1
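Both indexes from slides 13 and 14 can be built in one pass over the slide's three documents. This is a minimal sketch: whitespace tokenization and the forward-index fields (`length`, `term_counts`) are assumptions chosen for illustration, not a description of a real index format.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# The three documents from the slide.
DOCS = {0: "it is what it is", 1: "what is it", 2: "it is a banana"}


def build_indexes(docs: Dict[int, str]):
    """Return (inverted, forward).

    inverted: term -> posting list of (doc_id, term frequency), sorted by doc_id.
    forward:  doc_id -> metadata; here just length and term counts (hypothetical fields).
    """
    inverted: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    forward: Dict[int, dict] = {}
    for doc_id in sorted(docs):  # visiting ids in order keeps every posting list sorted
        counts = Counter(docs[doc_id].split())
        for term, freq in counts.items():
            inverted[term].append((doc_id, freq))
        forward[doc_id] = {"length": sum(counts.values()), "term_counts": counts}
    return dict(inverted), forward


inverted, forward = build_indexes(DOCS)
for term in sorted(inverted):
    print(term, inverted[term])
# a [(2, 1)]
# banana [(2, 1)]
# is [(0, 2), (1, 1), (2, 1)]
# it [(0, 2), (1, 1), (2, 1)]
# what [(0, 1), (1, 1)]
```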
  • 15. Candidate selection for “abraham lincoln”
      – Posting lists
          – “abraham” => {5, 7, 8, 23, 47, 101}
          – “lincoln” => {7, 23, 101, 151}
      – Query = “abraham AND lincoln”
          – Retrieved set => {7, 23, 101}
      – Some systems-level issues
          – How should posting lists be represented efficiently?
          – How does one traverse a very long posting list (for words like “the”, “an”, etc.)?
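The AND query on slide 15 can be sketched as a two-pointer intersection of ascending posting lists, applied shortest-list first. This is a minimal illustration on the slide's doc IDs. The function names are hypothetical, and this linear walk is exactly what gets expensive for the very long lists the slide asks about.

```python
from typing import List, Sequence


def intersect(p1: Sequence[int], p2: Sequence[int]) -> List[int]:
    """Two-pointer intersection of two ascending posting lists in O(len(p1) + len(p2))."""
    i = j = 0
    out: List[int] = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance whichever list is behind
        else:
            j += 1
    return out


def conjunctive_query(posting_lists: List[Sequence[int]]) -> List[int]:
    """AND of several terms; starting from the shortest list keeps intermediate results small."""
    if not posting_lists:
        return []
    ordered = sorted(posting_lists, key=len)
    result = list(ordered[0])
    for plist in ordered[1:]:
        result = intersect(result, plist)
    return result


abraham = [5, 7, 8, 23, 47, 101]
lincoln = [7, 23, 101, 151]
print(conjunctive_query([abraham, lincoln]))  # [7, 23, 101]
```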
  • 16. Claim #3: Search is a ranking problem
  • 17. What is search ranking?
      – Ranking: find an ordered list of documents according to the relevance between the documents and the query
      – Traditional search: f(query, document) => score
      – Social network context: f(query, document, user) => score
          – Find an ordered list of documents according to the relevance between the documents, the query, and the user
  • 18. Why LTR?
      – Manual models become hard to tune with a very large number of features and non-convex interactions
      – LTR leverages the large volume of click-through data in an automated way
      – Crowdsourcing judgments for personalized ranking involves unique challenges
      – Key issues
          – How do we collect training data?
          – How do we avoid biases?
          – How do we train the model?
  • 19. Training (diagram): documents for training are turned into features and, through human evaluation, into labels; the features and labels are used to train a machine learning model
  • 21. Training options: crowdsourcing judgments
      – Crowdsourced judgments
          – (query, user, document) -> label
          – Labels in {1, 2, 3, 4, 5}; a higher label means a better result
      – Issues
          – Personalized world: relevance depends on who is searching
          – Difficult to scale
  • 22. Mining the click stream
      – Approach: Clicked = Relevant, Not clicked = Not relevant
      – Diagram: the user's eyes scan the results from top to bottom, so are unclicked results lower down unfairly penalized?
  • 23. Position Bias
      – “Accurately interpreting clickthrough data as implicit feedback”, Joachims et al., ACM SIGIR, 2005
      – Experiment #1
          – Present users with normal Google search results
          – 55.56% of users clicked the first result
          – 5.56% clicked the second result
      – Experiment #2
          – Same result page as in the first experiment, but the 1st and 2nd results were flipped
          – 57.14% of users clicked the first result
          – 7.14% clicked the second result
      – Takeaway: the first position draws most of the clicks whichever result occupies it, so clicks reflect position as well as relevance
  • 26. FairPairs [Radlinski and Joachims, AAAI’06]
      – Randomize: flip adjacent pairs of results at random before showing them
      – Within a pair, Clicked = R (relevant), Skipped = NR (not relevant)
      – Great at dealing with position bias
      – Does not invert models
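The randomize-and-pair idea can be sketched in a few lines. This is a minimal sketch, not the paper's reference implementation. It assumes that pairs start at a random offset of 0 or 1, that each pair is flipped with probability 0.5, and that a preference is recorded only when the lower result of a pair is clicked and the upper one is skipped. The function names and doc IDs are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple


def fair_pairs_present(ranking: Sequence[str],
                       rng: random.Random) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Group adjacent results into pairs and flip each pair with probability 0.5.

    Returns the list actually shown and the (upper, lower) positions of each pair.
    """
    shown = list(ranking)
    start = rng.randint(0, 1)  # pairs start at position 0 or 1 (0-based), chosen at random
    pairs: List[Tuple[int, int]] = []
    for upper in range(start, len(shown) - 1, 2):
        lower = upper + 1
        if rng.random() < 0.5:
            shown[upper], shown[lower] = shown[lower], shown[upper]
        pairs.append((upper, lower))
    return shown, pairs


def fair_pairs_preferences(shown: Sequence[str], pairs: Sequence[Tuple[int, int]],
                           clicked: Set[str]) -> List[Tuple[str, str]]:
    """Record (preferred, other) when a pair's lower result is clicked and its upper one skipped."""
    prefs: List[Tuple[str, str]] = []
    for upper, lower in pairs:
        if shown[lower] in clicked and shown[upper] not in clicked:
            prefs.append((shown[lower], shown[upper]))
    return prefs


shown, pairs = fair_pairs_present(["d1", "d2", "d3", "d4", "d5"], random.Random(7))
print(shown, pairs)
```

Because every pair is shown in each order equally often, the top-of-pair position advantage applies equally to both documents in the pair, which is why this kind of randomization helps with position bias.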
  • 27. Issue #2: Sampling Bias
      – Sampling bias
          – Users click or skip only what is shown.
          – What about results that the existing model scores low?
          – Add low-scoring results as ‘easy negatives’ so the model learns about bad results that are never presented to the user.
      – Diagram: low-scoring results from pages 1, 2, 3, …, n, each assigned label 0
  • 29. Avoiding Sampling Bias: Easy negatives
      – Invasive way
          – For a small sample of users, add bad results to the SERP to test whether they are indeed bad
          – Not really recommended, since it affects the user experience (UX)
      – Non-invasive way
          – Assume we have a decent model
          – Take tail results and add them to the training data as “easy negatives”
          – A similar approach can be used for “easy positives”, depending on the application
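The non-invasive approach can be sketched as sampling documents from the tail of the current model's ranking and labeling them 0. This is a minimal sketch. The `tail_start` cut-off, the sample size `n`, and the doc IDs are hypothetical, and the approach relies on the slide's assumption that the current model is decent.

```python
import random
from typing import List, Sequence, Tuple


def sample_easy_negatives(ranked_doc_ids: Sequence[str], tail_start: int, n: int,
                          rng: random.Random) -> List[Tuple[str, int]]:
    """Sample up to n documents ranked at or beyond tail_start by the current model.

    They are labeled 0 ("easy negatives"), on the assumption that the current model is
    decent, so its tail results are almost never relevant.
    """
    tail = list(ranked_doc_ids[tail_start:])
    picks = rng.sample(tail, min(n, len(tail)))
    return [(doc_id, 0) for doc_id in picks]


ranked = [f"doc{i}" for i in range(1, 501)]  # hypothetical model ranking for one query
print(sample_easy_negatives(ranked, tail_start=200, n=5, rng=random.Random(0)))
```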
  • 30. How to collect training data?
      – Implicit relevance judgments from click logs, including clicked and unclicked results from the SERP (avoids position bias)
      – Add easy negatives (avoids sampling bias)
  • 31. Mining the click stream
      – Approach: graded relevance labels, from Label = 0 (least relevant) through Label = 2 up to Label = 5 (most relevant)
  • 35. Learning to Rank
      – Pointwise: reduce ranking to binary classification
          – Q1: + + + -
          – Q2: + - - -
          – Q3: + + - -
      – Limitations
          – Assumes relevance is absolute
          – Relevant documents associated with different queries are put into the same class
  • 36. Learning to Rank
      – Pairwise: reduce ranking to classification of document pairs with respect to the same query
          – {(Q1, A>B), (Q2, C>D), (Q3, E>F)}
  • 38. Learning to Rank
      – Pairwise
          – No longer assumes absolute relevance
          – Limitation: does not differentiate inversions at top vs. bottom positions
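The pairwise reduction can be sketched as follows: build (preferred, other) pairs within one query from graded labels, then train a linear scorer with a RankNet-style logistic loss on score differences. This is a minimal sketch, not the method behind the slide. The features, labels, learning rate, and the overflow cap of 50 are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

Doc = Tuple[List[float], int]           # (feature vector, graded relevance label)
Pair = Tuple[List[float], List[float]]  # (preferred doc features, other doc features)


def make_pairs(docs_for_one_query: Sequence[Doc]) -> List[Pair]:
    """All (preferred, other) pairs within ONE query whose labels differ."""
    return [(xa, xb) for xa, la in docs_for_one_query
            for xb, lb in docs_for_one_query if la > lb]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_pairwise(pairs: Sequence[Pair], lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Linear scorer trained by SGD on the pairwise logistic loss log(1 + exp(-(s_a - s_b)))."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for xa, xb in pairs:
            diff = [a - b for a, b in zip(xa, xb)]
            margin = dot(w, diff)  # s_a - s_b under the current weights
            # Gradient scale is sigmoid(-margin); margin is capped to avoid math.exp overflow.
            g = 1.0 / (1.0 + math.exp(min(margin, 50.0)))
            w = [wj + lr * g * dj for wj, dj in zip(w, diff)]
    return w


query = [([1.0, 0.0], 2), ([0.5, 0.5], 1), ([0.0, 1.0], 0)]  # hypothetical features and labels
w = train_pairwise(make_pairs(query))
print([round(dot(w, x), 2) for x, _ in query])
```

Each mis-ordered pair contributes the same loss wherever the two documents sit in the list, which is the position-agnostic limitation noted on slide 38.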
  • 39. Listwise approach: DCG
      – Objective: come up with a function that converts the entire set of ranked search results, each with a relevance label, into a score
      – Characteristics of such a function
          – Higher relevance in the ranked set => higher score
          – Higher relevance at higher positions in the ranked set => higher score
      – With p documents in the search results, where document i has relevance rel_i:
          $$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$
  • 40. DCG worked example, with discounted gain = (2^relevance - 1) / log2(1 + rank)
          Rank    Discounted gain
          1       3
          2       4.4
          3       0.5
          DCG     7.9
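The DCG formula and the worked example can be checked in code. The relevance labels are not legible on the slide. Labels 2, 3, 1 at ranks 1 to 3, with a base-2 logarithm, reproduce its per-position gains (3, 4.4, 0.5) exactly, so they are assumed here.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG_p = sum over i = 1..p of (2**rel_i - 1) / log2(i + 1), with 1-based ranks i."""
    rels = relevances if k is None else relevances[:k]
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(rels, start=1))


labels = [2, 3, 1]  # assumed labels at ranks 1..3 (they reproduce the slide's gains)
print([round((2 ** r - 1) / math.log2(i + 1), 1) for i, r in enumerate(labels, start=1)])  # [3.0, 4.4, 0.5]
print(round(dcg_at_k(labels), 1))  # 7.9
```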
  • 41. NDCG-based optimization
      – NDCG@k = Normalized(DCG@k), i.e., DCG@k divided by the DCG@k of the ideal ordering
      – This keeps the value between 0.0 and 1.0
      – Since NDCG directly represents the “value” of a particular ranking given the relevance labels, ranking can be formulated directly as maximizing NDCG@k (say k = 5)
      – Directly pluggable into a variety of algorithms, including coordinate ascent
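Direct NDCG@k optimization can be sketched with NDCG@k (normalized by the DCG of the label-sorted ideal ordering) and a simplified coordinate ascent over the weights of a linear scorer. This is loosely in the spirit of coordinate-ascent rankers such as the one in RankLib (slide 45), not a port of it. The starting weights, step sizes, and round limit are assumptions, as is the 0.0 convention for queries with no relevant documents. The features and labels are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Query = List[Tuple[List[float], int]]  # (feature vector, relevance label) for each document


def dcg_at_k(rels: Sequence[int], k: int) -> float:
    return sum((2 ** r - 1) / math.log2(i + 1) for i, r in enumerate(rels[:k], start=1))


def ndcg_at_k(rels: Sequence[int], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0  # assumed: 0 if nothing is relevant


def mean_ndcg(w: Sequence[float], queries: Sequence[Query], k: int) -> float:
    """Rank each query's documents by the linear score w.x, then average NDCG@k."""
    total = 0.0
    for docs in queries:
        ranked = sorted(docs, key=lambda d: sum(wi * xi for wi, xi in zip(w, d[0])), reverse=True)
        total += ndcg_at_k([label for _, label in ranked], k)
    return total / len(queries)


def coordinate_ascent(queries: Sequence[Query], n_features: int, k: int = 5,
                      steps: Sequence[float] = (-1.0, -0.1, 0.1, 1.0), rounds: int = 10):
    """Simplified coordinate ascent: change one weight at a time, keep any change that raises mean NDCG@k."""
    w = [1.0] * n_features
    best = mean_ndcg(w, queries, k)
    for _ in range(rounds):
        improved = False
        for j in range(n_features):
            for step in steps:
                cand = list(w)
                cand[j] += step
                value = mean_ndcg(cand, queries, k)
                if value > best:
                    w, best, improved = cand, value, True
        if not improved:
            break  # no single-coordinate step helps: a local optimum for these step sizes
    return w, best


query = [([1.0, 0.0], 2), ([0.5, 0.6], 1), ([0.0, 1.0], 0)]  # hypothetical features and labels
print(round(mean_ndcg([1.0, 1.0], [query], k=3), 3))
w, best = coordinate_ascent([query], n_features=2, k=3)
print(w, best)  # [2.0, 1.0] 1.0
```

Because the search only compares metric values, it works even though NDCG@k is not differentiable in the weights.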
  • 42. Learning to Rank: comparison (✓ = pro, ✕ = con)
      – Pointwise
          ✓ Simple to understand and debug
          ✓ Straightforward to use
          ✕ Query independent
          ✕ Assumes relevance is absolute
      – Pairwise
          ✓ Assumes relevance is relative
          ✓ Depends on query
          ✕ Loss function agnostic to position
      – Listwise
          ✓ Directly operates on ranked lists
          ✓ Loss function aware of position
          ✕ More complicated, non-convex functions, higher training time
  • 43. Search Ranking: Click Logs → Training Data → Model → Offline Evaluation → Online A/B test/debug, where score = f(query, user, document)
  • 44. tl;dr revisited
      – Ranking interacts heavily with retrieval and query understanding
          – Query understanding affects intent detection, fixing user errors, etc.
          – Retrieval affects candidate selection, speed, etc.
      – Ground truth > features > model*
          – Truth data is affected by biases
      – Listwise > pairwise > pointwise
          – Listwise, while more complicated, avoids some model-level issues of pairwise and pointwise methods
      * Airbnb engineering blog: http://nerds.airbnb.com/architecting-machine-learning-system-risk/
  • 45. Useful references
      – “From RankNet to LambdaRank to LambdaMART: An Overview”, Christopher Burges
      – “Learning to Rank for Information Retrieval”, Tie-Yan Liu
      – RankLib: has implementations of several LTR approaches
  • 46. LinkedIn search is powered by …
      – We are hiring! careers.linkedin.com