Learning to rank (LTR from now on) is the application of machine learning techniques, typically supervised, in the formulation of ranking models for information retrieval systems.
With LTR becoming more and more popular (Apache Solr has supported it since January 2017), organisations struggle with the problem of how to collect and structure the relevance signals necessary to train their ranking models.
This talk is a technical guide to explore and master various techniques to generate your training set(s) correctly and efficiently.
Expect to learn how to:
– model and collect the necessary feedback from the users (implicit or explicit)
– calculate for each training sample a relevance label that is meaningful and unambiguous (Click Through Rate, Sales Rate …)
– transform the raw data collected into an effective training set (in the numerical vector format most LTR training libraries expect)
Join us as we explore real world scenarios and dos and don’ts from the e-commerce industry.
How to Build your Training Set for a Learning To Rank Project
1. London Information Retrieval Meetup
21 Oct 2019
How to Build your Training Set
for a Learning to Rank Project
Alessandro Benedetti, Software Engineer
21st October 2019
2. London Information Retrieval Meetup
Sease
Search Services
● Open Source Enthusiasts
● Apache Lucene/Solr experts
● Community Contributors
● Active Researchers
● Hot Trends : Learning To Rank, Document Similarity,
Search Quality Evaluation, Relevancy Tuning
3. London Information Retrieval Meetup
Who I am
▪ Search Consultant
▪ R&D Software Engineer
▪ Master in Computer Science
▪ Apache Lucene/Solr Enthusiast
▪ Passionate about Semantic, NLP and Machine Learning
technologies
▪ Beach Volleyball Player & Snowboarder
Alessandro Benedetti
4. London Information Retrieval Meetup
Agenda
● Learning To Rank
● Training Set Definition
● Implicit/Explicit Feedback
● Feature Engineering
● Relevance Label
● Metric Evaluation/Loss Function
● Training Set Split
5. London Information Retrieval Meetup
Learning To Rank
What is it ?
Learning from user implicit/explicit feedback
To
Rank documents (sensu lato)
6. London Information Retrieval Meetup
Learning To Rank - What is it
“Learning to rank is the application of machine
learning, typically supervised, semi-supervised or
reinforcement learning, in the construction of
ranking models for information retrieval
systems.” Wikipedia
[Diagram: users’ interactions flow from the UI into an Interactions Logger / Judgement Collector, and the collected feedback feeds the Learning To Rank training]
7. London Information Retrieval Meetup
• A sentient system that learns by itself
• A system continuously improving itself by ingesting additional feedback
• Easy to set up and tune
• Easy to give a human-understandable explanation of why the model
operates in certain ways
Learning To Rank - What is NOT
8. London Information Retrieval Meetup
• A set of examples to train the model on
• It will be considered the ground truth
• Each example is composed of:
- relevance rating
- query Id
- feature vector
• The feature vector is composed of N features (<id>:<value>)
Training Set: What is it?
3 qid:1 0:3.4 1:0.7 2:1.5 3:0
2 qid:1 0:5.0 1:0.4 2:1.3 3:0
0 qid:1 0:2.4 1:0.7 2:1.5 3:1
1 qid:2 0:5.7 1:0.2 2:1.1 3:0
3 qid:2 0:0.0 1:0.5 2:4.0 3:0
0 qid:3 0:1.0 1:0.7 2:1.5 3:1
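A minimal sketch of how a file in this SVMrank/RankLib-style format could be parsed into labels, query Ids and feature vectors; the file name is just an assumption:

```python
# Hedged sketch: parse the SVMrank/RankLib-style training set format shown above.
def parse_line(line):
    # e.g. "3 qid:1 0:3.4 1:0.7 2:1.5 3:0"
    tokens = line.split()
    relevance = int(tokens[0])                   # relevance rating
    query_id = int(tokens[1].split(":")[1])      # query Id
    features = {}                                # feature vector as {feature_id: value}
    for token in tokens[2:]:
        feature_id, value = token.split(":")
        features[int(feature_id)] = float(value)
    return relevance, query_id, features

with open("training_set.txt") as f:              # hypothetical file name
    samples = [parse_line(line) for line in f if line.strip()]
```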
9. London Information Retrieval Meetup
Training Set: Gather Feedback
[Diagram: explicit feedback (a ratings set gathered by the Judgements Collector) and implicit feedback (gathered by the Interactions Logger), illustrated on the query “Queen music” with the results Bohemian Rhapsody, Dancing Queen and Queen Albums]
10. London Information Retrieval Meetup
Feature Engineering : Feature Level
Each sample is a <query,document> pair; the feature vector describes this pair numerically.

Document level
This feature describes a property of the DOCUMENT.
The value of the feature depends only on the document instance.
e.g.
Document Type = E-commerce Product
<Product price> is a Document Level feature.
<Product colour> is a Document Level feature.
<Product size> is a Document Level feature.
Document Type = Hotel Stay
<Hotel star rating> is a Document Level feature.
<Hotel price> is a Document Level feature.
<Hotel food rating> is a Document Level feature.

Query level
This feature describes a property of the QUERY.
The value of the feature depends only on the query instance.
e.g.
Query Type = E-commerce Search
<Query length> is a Query Level feature.
<User device> is a Query Level feature.
<User budget> is a Query Level feature.

Query Dependent
This feature describes a property of the QUERY in correlation with the DOCUMENT.
The value of the feature depends on both the query and the document instance.
e.g.
Query Type = E-commerce Search
Document Type = E-commerce Product
<first Query Term TF in Product title> is a Query Dependent feature.
<first Query Term DF in Product title> is a Query Dependent feature.
<query selected categories intersecting the product categories> is a Query Dependent feature.
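As a rough illustration of the three feature levels combined into one vector per <query,document> pair; every field name (price, device, title, categories …) is an illustrative assumption:

```python
# Hedged sketch: building one feature vector for a <query,document> pair.
def build_feature_vector(query, product):
    return [
        product["price"],                                # document level
        product["star_rating"],                          # document level
        len(query["text"].split()),                      # query level: query length
        1.0 if query["device"] == "mobile" else 0.0,     # query level: user device
        product["title"].lower().count(                  # query dependent:
            query["text"].split()[0].lower()),           # first query term TF in title
        len(set(query["categories"]) &                   # query dependent:
            set(product["categories"])),                 # category intersection size
    ]

query = {"text": "queen albums", "device": "mobile", "categories": ["music"]}
product = {"price": 12.99, "star_rating": 4.5, "title": "Queen Greatest Hits",
           "categories": ["music", "cd"]}
print(build_feature_vector(query, product))
```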
11. London Information Retrieval Meetup
Feature Engineering : Feature Type
Quantitative / Ordinal / Categorical

Quantitative
A quantitative feature describes a property for which the possible values are
a measurable quantity.
e.g.
Document Type = E-commerce Product
<Product price> is a quantity
e.g.
Document Type = Hotel Stay
<Hotel distance from city center> is a quantity

Ordinal
An ordinal feature describes a property for which the possible values are ordered.
Ordinal variables can be considered “in between” categorical and quantitative variables.
e.g.
Educational level might be categorized as
1: Elementary school education
2: High school graduate
3: Some college
4: College graduate
5: Graduate degree
1 < 2 < 3 < 4 < 5

Categorical
A categorical feature represents an attribute of an object that has a set of distinct possible values.
In computer science it is common to call the possible values of a categorical feature an enumeration.
e.g.
Document Type = E-commerce Product
<Product colour> is a categorical feature
<Product brand> is a categorical feature
N.B. It is easy to observe that giving an order to the values of a categorical feature
does not make any sense.
For the Colour feature:
red < blue < black has no general meaning.
12. London Information Retrieval Meetup
Feature Engineering : One Hot Encoding
Categorical Features
e.g.
Document Type = E-commerce Product
<Product colour> is a categorical feature
Values: Red, Green, Blue, Other
Encoded Features:
Given a cardinality of N, we build N-1 encoded
binary features
product_colour_red = 0/1
product_colour_green = 0/1
product_colour_blue = 0/1
product_colour_other = 0/1
N.B. keeping all N binary features leads to the Dummy Variable Trap:
one feature's value can be predicted from the other features (hence N-1)
High Cardinality Categoricals
you may need to encode only the most frequent values
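A minimal sketch with pandas (the DataFrame and column names are assumptions); drop_first=True keeps N-1 binary columns and so avoids the dummy variable trap:

```python
import pandas as pd

# Hedged sketch: one hot encoding of a categorical product colour feature.
products = pd.DataFrame({"colour": ["Red", "Green", "Blue", "Other", "Red"]})

# drop_first=True drops one category, leaving N-1 binary features
encoded = pd.get_dummies(products, columns=["colour"],
                         prefix="product_colour", drop_first=True)
print(encoded)
```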
13. London Information Retrieval Meetup
Feature Engineering : Binary Encoding
Categorical Features
e.g.
Document Type = E-commerce Product
<Product colour> is a categorical feature
Values: Red, Green, Blue, Other
Encoded Features:
1) Ordinal Encoding
Red=0, Green=1, Blue=2, Other=3
2) Binary Encoding
product_colour_bit1 = 0/1
product_colour_bit2 = 0/1
Better for high cardinality categorical features
Multi valued?
you may have collisions and not be able to use binary encoding
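A minimal hand-rolled sketch of the two steps (ordinal encoding, then splitting the ordinal value into bits); libraries such as category_encoders offer this out of the box, but the code below only assumes the standard library:

```python
import math

# Hedged sketch: binary encoding of a categorical colour feature.
values = ["Red", "Green", "Blue", "Other"]

# 1) Ordinal encoding: map each distinct value to an integer
ordinal = {v: i for i, v in enumerate(values)}      # Red=0, Green=1, Blue=2, Other=3

# 2) Binary encoding: represent the ordinal as ceil(log2(N)) bit features
n_bits = max(1, math.ceil(math.log2(len(values))))  # 2 bits for 4 values

def binary_encode(value):
    code = ordinal[value]
    # product_colour_bit1 .. product_colour_bitN, most significant bit first
    return [(code >> bit) & 1 for bit in reversed(range(n_bits))]

print(binary_encode("Blue"))   # -> [1, 0]
```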
14. London Information Retrieval Meetup
Feature Engineering : Missing Values
● Sometimes a missing value is equivalent to a 0-value semantic
e.g.
Domain: e-commerce products
Feature: Discount Percentage - [quantitative, document level feature]
a missing discount percentage could model a 0 discount percentage,
so missing values can be filled with 0 values
● Sometimes a missing feature value can have a completely different semantic
e.g.
Domain: Hotel Stay
Feature: Star Rating - [quantitative, document level feature]
a missing star rating is not equivalent to a 0 star rating, so an additional feature
should be added to distinguish the two cases
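A minimal sketch of both cases with pandas (the column names are assumptions):

```python
import pandas as pd

# Hedged sketch: two ways of handling missing feature values.
samples = pd.DataFrame({"discount_percentage": [10.0, None, 25.0],
                        "star_rating": [4.0, None, 3.0]})

# Case 1: missing means "no discount", so 0 is a faithful fill value
samples["discount_percentage"] = samples["discount_percentage"].fillna(0.0)

# Case 2: a missing star rating is NOT a 0-star rating, so add an indicator feature
samples["star_rating_missing"] = samples["star_rating"].isna().astype(int)
samples["star_rating"] = samples["star_rating"].fillna(0.0)
print(samples)
```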
15. London Information Retrieval Meetup
Relevance Label : Signal Intensity
N.B. this approach can generate discordant training samples (identical feature vectors with different relevance labels)
● Each sample is a user interaction (click, add to cart, sale, etc.)
● Some samples are impressions (we showed the document to the user)
● A rank is attributed to the user interaction types
e.g.
0-Impression < 1-click < 2-add to cart < 3-sale
● The rank becomes the relevance label for the sample
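A minimal sketch of this mapping (the interaction type names are assumptions):

```python
# Hedged sketch: map each logged interaction type to a graded relevance label.
INTERACTION_RANK = {"impression": 0, "click": 1, "add_to_cart": 2, "sale": 3}

def relevance_label(interaction_type):
    return INTERACTION_RANK[interaction_type]

print(relevance_label("add_to_cart"))   # -> 2
```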
16. London Information Retrieval Meetup
Relevance Label : Simple Click Model
● Each sample is a user interaction (click, add to cart, sale, etc.)
● Some samples are impressions (we showed the document to the user)
● One interaction type is set as the target of the optimisation
● Identical samples are aggregated; the new sample generated will have a new value:
[Interaction Type Count / Impressions]
e.g. CTR (Click Through Rate) = for the sample, number of clicks / number of impressions
● We then take the resulting score (for CTR, 0 < x < 1) and normalise it to get the relevance label.
The relevance label scale will depend on the training algorithm chosen
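A minimal sketch of this aggregation with pandas (column names and the 0..4 relevance scale are assumptions):

```python
import pandas as pd

# Hedged sketch of a simple click model: each row is one logged event
# for a <query, document> pair.
events = pd.DataFrame({
    "query_id":    [1, 1, 1, 1, 2, 2],
    "document_id": ["a", "a", "a", "b", "a", "a"],
    "event":       ["impression", "impression", "click", "impression",
                    "impression", "click"],
})

grouped = events.groupby(["query_id", "document_id"])["event"]
ctr = (grouped.apply(lambda e: (e == "click").sum() / (e == "impression").sum())
              .rename("ctr").reset_index())

# Normalise the 0..1 CTR onto the relevance scale the training algorithm expects (e.g. 0..4)
ctr["relevance_label"] = (ctr["ctr"] * 4).round().astype(int)
print(ctr)
```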
17. London Information Retrieval Meetup
Relevance Label : Advanced Click Model
Given a sample:
● We have the CTR from the previous model
● We compare it with the average CTR across all samples
● We take into account the statistical significance
of that estimate (how many raw samples generated it?)
● The relevance label will be the product of those two factors
(scaled up according to the training algorithm scale)
More info in John Berryman's blog:
http://blog.jnbrymn.com/2018/04/16/better-click-tracking-1/
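A rough sketch of the idea, not the exact model from the blog post: the CTR relative to the global average is damped by a confidence factor that grows with the number of impressions. The smoothing constant and the 0..4 relevance scale are arbitrary assumptions.

```python
# Hedged sketch of a confidence-weighted click model (a simplification, not the
# exact formulation from John Berryman's blog post).
def relevance_label(clicks, impressions, avg_ctr, scale=4, smoothing=20):
    ctr = clicks / impressions
    ratio = ctr / avg_ctr                                  # factor 1: CTR vs global average
    confidence = impressions / (impressions + smoothing)   # factor 2: crude significance proxy
    score = ratio * confidence
    return max(0, min(scale, round(score)))                # clamp onto the relevance scale

print(relevance_label(clicks=30, impressions=100, avg_ctr=0.1))  # well observed, high CTR
print(relevance_label(clicks=3, impressions=10, avg_ctr=0.1))    # same CTR, less evidence
```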
18. London Information Retrieval Meetup
Point-wise / Pair-wise / List-wise
How many documents do you consider at a time when calculating your loss function?

Point-wise
● Single document
● You estimate a function that predicts the best score for each document
● Results are ranked on the predicted score
● The score of a document is independent of the other scores in the same result list
● You can use any regression or classification algorithm

Pair-wise
● Pair of documents
● You estimate the optimal local ordering, to maximise the quality of the global one
● The objective is to set the local orderings so as to minimise the number of inversions across all pairs
● Works better than point-wise because predicting a local ordering is closer to solving the ranking problem than just estimating a regression score

List-wise
● Entire list of documents for a given query
● Direct optimisation of IR measures such as NDCG
● Minimise a specific loss function
● The evaluation measure is averaged across the queries
● Works better than pair-wise
Learning To Rank : Metric Evaluation
19. London Information Retrieval Meetup
Offline Evaluation Metrics[1/3]
• precision = TruePositives / (TruePositives + FalsePositives)
• precision@K = TruePositives / (TruePositives + FalsePositives), computed on the top K results
• (precision@1, precision@2, precision@10)
• recall = TruePositives / (TruePositives + FalseNegatives)
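A minimal sketch of precision@K and recall computed from a ranked result list and a set of relevant document Ids (variable names are assumptions):

```python
# Hedged sketch: precision@K and recall for one ranked result list.
def precision_at_k(ranked_doc_ids, relevant_doc_ids, k):
    top_k = ranked_doc_ids[:k]
    true_positives = sum(1 for doc_id in top_k if doc_id in relevant_doc_ids)
    return true_positives / k

def recall(ranked_doc_ids, relevant_doc_ids):
    true_positives = sum(1 for doc_id in ranked_doc_ids if doc_id in relevant_doc_ids)
    return true_positives / len(relevant_doc_ids)

ranked = ["d3", "d1", "d7", "d4"]
relevant = {"d1", "d4", "d9"}
print(precision_at_k(ranked, relevant, k=2))   # -> 0.5
print(recall(ranked, relevant))                # -> 0.666...
```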
Learning To Rank : Metric Evaluation
20. London Information Retrieval Meetup
Offline Evaluation Metrics[2/3]
Let’s combine Precision and recall:
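Assuming the formula shown on the original slide (likely as an image) is the standard F-measure:
F1 = 2 · (precision · recall) / (precision + recall)
More generally, Fβ = (1 + β²) · (precision · recall) / (β² · precision + recall)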
Learning To Rank : Metric Evaluation
22. London Information Retrieval Meetup
Learning To Rank : List-wise and NDCG
● The list is the result set of documents for a given query Id
● In LambdaMART, NDCG@K per list is often used
● The evaluation metric is averaged over the query Ids
when evaluating a training iteration
It is extremely important to assess
the distribution of training samples
per queryId

Query1
Model1  Model2  Model3  Ideal
1       1       1       1
1       1       1       1

Query2
Model1  Model2  Model3  Ideal
3       3       3       3
7       7       7       7

Under-sampled queryIds can
potentially skyrocket your NDCG
avg
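A minimal sketch of NDCG@K per query, averaged over the query Ids (the graded relevance labels and the exponential gain are assumptions about the chosen formulation):

```python
import math

# Hedged sketch: NDCG@K per query, then averaged across queries.
def dcg_at_k(labels, k):
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(labels[:k]))

def ndcg_at_k(labels_in_ranked_order, k):
    ideal = sorted(labels_in_ranked_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(labels_in_ranked_order, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# relevance labels of the results returned for each queryId, in ranked order;
# the under-sampled query 2 trivially scores a perfect NDCG
results_per_query = {1: [3, 0, 2, 1], 2: [1, 1]}
scores = [ndcg_at_k(labels, k=10) for labels in results_per_query.values()]
print(sum(scores) / len(scores))   # average NDCG@10 over the queries
```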
23. London Information Retrieval Meetup
Build the Lists: QueryId Hashing
● Assess how to calculate your queryId:
each queryId should bring a separate set of results
● No free-text query? Group the query-level features to form the queryId
● Target: a uniform distribution of samples per query
● Drop training samples if their queryIds are undersampled
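A minimal sketch of deriving a queryId by hashing the query-level features (the chosen features and the normalisation are assumptions):

```python
import hashlib

# Hedged sketch: derive a stable queryId from the query level features.
def query_id(query_text, user_device, user_budget):
    key = f"{query_text.strip().lower()}|{user_device}|{user_budget}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()[:12]

print(query_id("Queen albums", "mobile", "0-20"))
print(query_id("queen albums ", "mobile", "0-20"))   # same queryId after normalisation
```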
24. London Information Retrieval Meetup
Split the Set: Training/Validation(dev)/Test
● Each training iteration will assess the evaluation metric on the training and validation sets
● At the end of the iterations the final model will be evaluated on an unseen Test Set
● This split could be random
● This split could depend on the time the interactions were collected
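A minimal sketch of a random split that keeps all samples of a queryId in the same partition, using scikit-learn's GroupShuffleSplit; a time-based split would instead sort by interaction timestamp before cutting. The data here is toy data, purely an assumption.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hedged sketch: Training/Validation/Test split grouped by queryId,
# so that all samples of a query land in the same partition.
X = np.random.rand(100, 4)                     # feature vectors (toy data)
y = np.random.randint(0, 4, size=100)          # relevance labels
query_ids = np.random.randint(0, 20, size=100)

# 80% train+validation, 20% test
outer = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_val_idx, test_idx = next(outer.split(X, y, groups=query_ids))

# split the remaining 80% into training and validation, again by queryId
inner = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, val_idx = next(inner.split(X[train_val_idx], y[train_val_idx],
                                      groups=query_ids[train_val_idx]))
```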