On Statistical Analysis and
Optimization of Information Retrieval
Effectiveness Metrics
Jun Wang
Joint work with Jianhan Zhu
Department of Computer Science
University College London
J.Wang@cs.ucl.ac.uk
Motivation
IR Models
Calculate (relevance)
scores for individual documents
Probability Indexing
BM25
Language Models
The Binary Independent Rel. Model
Motivation
A general definition of an IR metric:
m(a rank order | “true” relevance of documents)
[Figure: example documents marked ✔ (relevant) or ✖ (non-relevant)]
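To make the general definition concrete, the sketch below treats a metric as a function of a rank order and the "true" binary relevance of the documents. The function names, the document ids, and the choice to normalise Average Precision by the number of relevant documents in the list are assumptions made for illustration; they are not taken from the slides.

```python
import math

def reciprocal_rank(ranked_rel):
    """1/rank of the first relevant document; 0 if none is relevant."""
    for k, rel in enumerate(ranked_rel, start=1):
        if rel:
            return 1.0 / k
    return 0.0

def average_precision(ranked_rel):
    """Mean of precision@k over the ranks k that hold a relevant document."""
    hits, total = 0, 0.0
    for k, rel in enumerate(ranked_rel, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

def dcg(ranked_rel):
    """Discounted cumulative gain with gain 2^rel - 1 and a log2(k + 1) discount."""
    return sum((2 ** rel - 1) / math.log2(k + 1)
               for k, rel in enumerate(ranked_rel, start=1))

def metric(m, order, relevance):
    """m(rank order | relevance): `order` lists document ids from rank 1 down,
    `relevance` maps each id to 0 or 1."""
    return m([relevance[d] for d in order])

# Hypothetical judgments: relevant, non-relevant, relevant, non-relevant.
relevance = {"d1": 1, "d2": 0, "d3": 1, "d4": 0}
order = ["d1", "d2", "d3", "d4"]
for m in (reciprocal_rank, average_precision, dcg):
    print(m.__name__, metric(m, order, relevance))
```

The same relevance judgments score differently under each metric, which is the sense in which each metric encodes a different rank preference.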
Motivation
We have different rank preferences and thus different IR
metrics: NDCG, MRR, MAP, …
IR Models → ? → IR metrics
Something is missing in between
Motivation
The fundamental question
What is the underlying generative retrieval process?
Outline
• What is happening right now
• The statistical retrieval process
• Text retrieval experiments
What is happening right now (1)?
• Still focusing on the (relevance) score, but
acknowledging the final rank context
– The “less is more” model [Chen&Karger 2006] extended
the relevance model
– It assumed that the previously retrieved documents are
non-relevant when calculating the relevance of documents
for the current rank position,
– which is equivalent to maximizing the Reciprocal Rank measure
What is happening right now (2)?
• Still focusing on the (relevance) score, but
acknowledging the final rank context
– In the Language Model framework, various loss
functions were defined to incorporate various ranking
strategies [Zhai&Lafferty 2006]
What is happening right now (3)?
• Focusing on IR metrics and Ranking
– bypass the step of estimating the relevance states of
individual documents
– construct a document ranking model from training data
by directly optimizing an IR metric [Volkovs&Zemel
2009]
• However, not all IR metrics necessarily
summarize the (training) data well; thus, training
data may not be fully explored
A “balanced” view of the retrieval process
– First, let us understand
(infer) the relevance of
documents as accurately as
possible,
– and summarize it by the
joint probability of
documents’ relevance
– dependency between
documents is considered
– Secondly, rank preference
is specified by an IR
metric.
– The rank decision making
is a stochastic one due to
the uncertainty about the
relevance
– As a result, the optimal
ranking action is the one
that maximizes the
expected value of the IR
metric
Given an IR Metric
The statistical document ranking process
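\[
\hat{a} = \arg\max_{a} E(m \mid q)
        = \arg\max_{a_1,\dots,a_N} \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)
\]
where
– p(r_1,...,r_N | q) is the joint probability of relevance given a query q
– m(a_1,...,a_N | r_1,...,r_N) is the IR metric, with inputs:
1. A rank order a_1,...,a_N
2. Relevance of docs. r_1,...,r_N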
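The ranking rule on this slide can be sketched by brute force for a handful of documents: enumerate every rank order, sum the metric over every relevance vector weighted by its joint probability, and keep the order with the largest expectation. Everything below (function names, the two toy joint distributions, the use of Reciprocal Rank) is an illustrative assumption, and exhaustive enumeration is only feasible for very small N; it is not the authors' algorithm.

```python
from itertools import permutations, product

def reciprocal_rank(ranked_rel):
    """1/rank of the first relevant document; 0 if none is relevant."""
    for k, rel in enumerate(ranked_rel, start=1):
        if rel:
            return 1.0 / k
    return 0.0

def expected_metric(metric, order, joint):
    """E(m | q): sum over relevance vectors r of m(order | r) * p(r | q).
    `joint` maps relevance tuples (r_1, ..., r_N) to probabilities."""
    return sum(p * metric([r[d] for d in order]) for r, p in joint.items())

def optimal_ranking(metric, joint, n_docs):
    """Rank order (tuple of document indices) maximising the expected metric."""
    return max(permutations(range(n_docs)),
               key=lambda order: expected_metric(metric, order, joint))

def independent_joint(marginals):
    """Joint distribution of binary relevance, assuming independent documents."""
    joint = {}
    for r in product((0, 1), repeat=len(marginals)):
        p = 1.0
        for ri, pi in zip(r, marginals):
            p *= pi if ri else 1.0 - pi
        joint[r] = p
    return joint

# Independent documents with hypothetical marginal probabilities of relevance.
indep = independent_joint([0.2, 0.7, 0.5])
print(optimal_ranking(reciprocal_rank, indep, 3))

# Dependent documents: docs 0 and 1 are always relevant together
# (e.g. near-duplicates); doc 2 is independent of them.
dependent = {(1, 1, 1): 0.3, (1, 1, 0): 0.3, (0, 0, 1): 0.2, (0, 0, 0): 0.2}
print(optimal_ranking(reciprocal_rank, dependent, 3))
```

In the dependent case the less probable document 2 is placed second, ahead of the duplicate, which illustrates why the joint probability rather than the marginals alone drives the ranking decision.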
The Optimal Ranker
For a fixed IR metric m:
Input: a rank order a_1,...,a_N and the uncertainty about relevance, p(r_1,...,r_N | q)
Output: the estimated performance score E(m | q)
\[
E(m \mid q) = \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)
\]
Now the question is how to calculate the
expected IR metric under the joint probability
of relevance
if we predefine the IR metric m(a_1,...,a_N | r_1,...,r_N):
\[
E(m \mid q) = \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)
\]
We worked it out for the major IR metrics
(Average Precision, DCG, Precision at N,
Reciprocal Rank)
• Certain assumptions are needed
• The joint distribution of relevance p(r_1,...,r_N | q)
is summarized by the marginal
means E(r_1 | q),...,E(r_N | q)
and covariances cov(r_i, r_j | q)
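Precision at N is the simplest case in which means and covariances are enough, because the metric is linear in the relevance variables: its expectation depends only on the marginal means of the top N documents, and its variance only on their covariances. The sketch below assumes binary relevance; the function names and the toy numbers are illustrative, not values from the talk.

```python
def expected_precision_at_n(means, order, n):
    """E[P@N] = (1/N) * sum of E(r_i | q) over the top-N documents."""
    return sum(means[d] for d in order[:n]) / n

def variance_precision_at_n(cov, order, n):
    """Var[P@N] = (1/N^2) * sum of cov(r_i, r_j | q) over all pairs in the top N."""
    top = order[:n]
    return sum(cov[i][j] for i in top for j in top) / (n * n)

# Hypothetical inputs: docs 0 and 1 are perfectly correlated, doc 2 is independent.
means = [0.6, 0.6, 0.5]
cov = [
    [0.24, 0.24, 0.00],  # diagonal entries are p * (1 - p) for binary relevance
    [0.24, 0.24, 0.00],
    [0.00, 0.00, 0.25],
]

for order in ([0, 1, 2], [0, 2, 1]):
    print(order,
          expected_precision_at_n(means, order, 2),
          variance_precision_at_n(cov, order, 2))
```

Swapping the correlated document out of the top two lowers the expected precision slightly but roughly halves its variance, so the covariances matter even for a metric whose expectation ignores them.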
Some of the results
• Expected Average Precision:
• Expected Reciprocal Rank (two documents):
E[ m ]
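The formulas on this slide did not survive extraction. As a hedged reconstruction of the two-document Reciprocal Rank case only, the derivation below assumes binary relevance and that document 1 is ranked above document 2; it follows from the definition of the metric and may differ in form from the expression shown in the talk.

```latex
\begin{align*}
RR &= r_1 + \tfrac{1}{2}\,(1 - r_1)\,r_2 \\
E(RR \mid q) &= E(r_1 \mid q) + \tfrac{1}{2}\,E(r_2 \mid q) - \tfrac{1}{2}\,E(r_1 r_2 \mid q) \\
             &= E(r_1 \mid q) + \tfrac{1}{2}\,E(r_2 \mid q)
                - \tfrac{1}{2}\bigl(E(r_1 \mid q)\,E(r_2 \mid q) + \operatorname{cov}(r_1, r_2 \mid q)\bigr)
\end{align*}
```

Under these assumptions the expectation depends on the joint distribution only through the two marginal means and the covariance, and a negative covariance between the two documents raises the expected Reciprocal Rank.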
Properties of IR metrics under the uncertainty
But can this analysis be used in practice?
• The key question is how to obtain the joint
probability of relevance.
– Click-through data
– Marginal means E(r_1 | q),...,E(r_N | q)
• Current IR models – relevance models, language models
– Covariance of relevance cov(r_i, r_j | q)
- Use the documents’ score correlation to estimate the relevance
correlation.
- It is query-independent. We approximate it by sampling queries
and calculating the correlation between documents’ ranking
scores
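The covariance estimate described here can be sketched as follows: score every document against a sample of queries, take the Pearson correlation between each pair of documents' score columns, and scale it by the standard deviations implied by the marginal means. The scaling step (binary relevance with variance p(1 - p)), the function names, and the toy score table are assumptions added for illustration; the slides only state that score correlation is used to estimate relevance correlation.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def score_correlation_matrix(scores):
    """scores[s][d] is the ranking score of document d for sampled query s."""
    n_docs = len(scores[0])
    cols = [[row[d] for row in scores] for d in range(n_docs)]
    return [[pearson(cols[i], cols[j]) for j in range(n_docs)]
            for i in range(n_docs)]

def covariance_from_correlation(corr, means):
    """Assumption: relevance is binary, so sd_i = sqrt(p_i * (1 - p_i)) and
    cov(r_i, r_j) is approximated by corr_ij * sd_i * sd_j."""
    sd = [math.sqrt(p * (1.0 - p)) for p in means]
    n = len(means)
    return [[corr[i][j] * sd[i] * sd[j] for j in range(n)] for i in range(n)]

# Hypothetical scores of 3 documents for 3 sampled queries.
scores = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 2.0],
]
corr = score_correlation_matrix(scores)
cov = covariance_from_correlation(corr, means=[0.5, 0.5, 0.5])
print(corr[0][1], corr[0][2], cov[0][2])
```

Because the correlation matrix is computed once over sampled queries, it can be reused across queries, while the marginal means come from whichever retrieval model is used for the query at hand.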
TREC evaluation
No free lunch
The idea can be applied to evaluation too.
For a fixed IR metric m:
Input: an IR model (rank order a_1,...,a_N) and relevance judgments
Uncertainty: p(r_1,...,r_N | q)
Output: the estimated performance score E(m | q)

On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

  • 1. On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics Jun Wang Joint work with Jianhan Zhu Department of Computer Science University College London J.Wang@cs.ucl.ac.uk
  • 2. Motivation IR Models Calculate (relevance) scores for individual documents Probability Indexing BM25 Language Models The Binary Independent Rel. Model
  • 3. Motivation ✔ ✖ ✔ ✖ m (a rank order | “true” relevance of documents)) A general definition:
  • 4. Motivation We have different rank preferences and thus IR metrics NDCG IR Models MRR MAP ? … Something missing in between
  • 5. Motivation The fundamental question What is the underlying generative retrieval process?
  • 6. Outline • What is happening right now • The statistical retrieval process • Text retrieval experiments
  • 7. What is happening right now (1)? • Still focusing on (relevance) score, but with the acknowledgement the final rank context – The “less is more” model [Chen&Karger 2006] extended the relevance model – assumed the previously retrieved documents non- relevant when calculating the rel. of documents for the current rank position, – equivalent to maximizing the Reciprocal Rank measure
  • 8. What is happening right now (2)? • Still focusing on (relevance) score, but with the acknowledgement the final rank context – In the Language Model framework, various loss functions were defined to incorporate various ranking strategies [Zhai&Lafferty 2006]
  • 9. What is happening right now (3)? • Focusing on IR metrics and Ranking – bypass the step of estimating the relevance states of individual documents – construct a document ranking model from training data by directly optimizing an IR metric [Volkovs&Zemel 2009] • However, not all IR metrics necessarily summarize the (training) data well; thus, training data may not be fully explored
  • 10. A “balanced” view of the retrieval process – First, let us understand (infer) the relevance of documents as accurately as possible, and summarize it by the joint probability of the documents’ relevance – dependency between documents is considered – Second, the rank preference is specified by an IR metric – Given an IR metric, rank decision making is stochastic due to the uncertainty about relevance – As a result, the optimal ranking action is the one that maximizes the expected value of the IR metric
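  • 11. The statistical document ranking process: $\hat{a} = \arg\max_{a} E(m \mid q) = \arg\max_{a_1,\ldots,a_N} \sum_{r_1,\ldots,r_N} m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)\, p(r_1,\ldots,r_N \mid q)$, where $p(r_1,\ldots,r_N \mid q)$ is the joint probability of relevance given a query, and the IR metric $m$ takes as input (1) a rank order $a_1,\ldots,a_N$ and (2) the relevance of the documents $r_1,\ldots,r_N$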
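  • 12. The Optimal Ranker: under uncertainty about relevance, described by $p(r_1,\ldots,r_N \mid q)$, and for a fixed IR metric $m$ and rank order $a_1,\ldots,a_N$, the output is the estimated performance score $E(m \mid q) = \sum_{r_1,\ldots,r_N} m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)\, p(r_1,\ldots,r_N \mid q)$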
  • 13. Now the question is how to calculate the expected IR metric under the joint probability of relevance once the IR metric $m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)$ is predefined: $E(m \mid q) = \sum_{r_1,\ldots,r_N} m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)\, p(r_1,\ldots,r_N \mid q)$
  • 14. We worked it out for the major IR metrics (Average Precision, DCG, Precision at N, Reciprocal Rank) • Certain assumptions are needed • The joint distribution of relevance $p(r_1,\ldots,r_N \mid q)$ is summarized by the marginal means $E(r_1 \mid q),\ldots,E(r_N \mid q)$ and the covariances $\mathrm{cov}(r_i, r_j \mid q)$
  • 15. Some of the results • Expected Average Precision: • Expected Reciprocal Rank (two documents): E[ m ]
  • 16. Properties of IR metrics under the uncertainty
  • 17. But can this analysis be used in practice? • The key question is how to obtain the joint probability of relevance – Click-through data – Marginal means $E(r_1 \mid q),\ldots,E(r_N \mid q)$: current IR models (relevance models, language models) – Covariance of relevance $\mathrm{cov}(r_i, r_j \mid q)$: use the documents’ score correlation to estimate the relevance correlation. It is query-independent. We approximate it by sampling queries and calculating the correlation between the documents’ ranking scores
  • 20. The idea can be applied to evaluation too. Inputs: an IR model and relevance judgments, together with the fixed IR metric $m$, the rank order $a_1,\ldots,a_N$ and the uncertainty about relevance $p(r_1,\ldots,r_N \mid q)$. Output: the estimated performance score $E(m \mid q)$
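
The ranking rule on slides 11 to 13 can be sketched by brute force for a handful of documents. This is only an illustration under stated assumptions, not the closed-form analysis the slides refer to: the function names and the toy joint distribution are hypothetical, relevance is taken to be binary, and Reciprocal Rank stands in for the metric m.

```python
from itertools import permutations


def reciprocal_rank(ranking, relevance):
    """m(a | r): 1 / position of the first relevant document, 0 if none is relevant."""
    for position, doc in enumerate(ranking, start=1):
        if relevance[doc]:
            return 1.0 / position
    return 0.0


def expected_metric(metric, ranking, joint):
    """E(m | q): sum over relevance vectors r of m(a | r) * p(r | q)."""
    return sum(prob * metric(ranking, rel) for rel, prob in joint.items())


def optimal_ranking(metric, joint, n_docs):
    """Arg max of E(m | q) over all rank orders; only feasible for very small N."""
    return max(
        permutations(range(n_docs)),
        key=lambda ranking: expected_metric(metric, ranking, joint),
    )


# Hypothetical joint distribution p(r0, r1, r2 | q) over binary relevance.
# Documents 1 and 2 tend to be relevant together, so the relevance of
# different documents is not independent here.
joint = {
    (1, 0, 0): 0.45,
    (0, 1, 1): 0.30,
    (0, 0, 1): 0.05,
    (0, 0, 0): 0.20,
}

best = optimal_ranking(reciprocal_rank, joint, n_docs=3)
print(best, expected_metric(reciprocal_rank, best, joint))
```

The enumeration grows as N! times the number of relevance vectors, which is why a summary of the joint distribution by means and covariances, as on slide 14, is needed in practice.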

Editor's Notes

  1. The focus is still on designing a scoring function for a document, but with acknowledgement of the various retrieval goals and the final rank context.
  2. The focus is still on designing a scoring function for a document, but with acknowledgement of the various retrieval goals and the final rank context.
  3. Informativeness argument: some evaluation metrics are less informative than others [4]. Some IR metrics thus do not necessarily summarize the (training) data well; if we begin optimizing IR metrics right from the data, the statistics of the data may not be fully explored and utilized. It is also not really adaptive, as the whole training has to be redone if we want to optimize another metric.
  4. In the first stage, the aim is to estimate the relevance of documents as accurately as possible, and to summarize it by the joint probability of the documents’ relevance. Only in the second stage is the rank preference specified, possibly by an IR metric. Rank decision making is stochastic due to the uncertainty about relevance. As a result, the optimal ranking action is the one that maximizes the expected value of the IR metric.
  5. In the first stage, the aim is to estimate the relevance of documents as accurately as possible, and to summarize it by the joint probability of the documents’ relevance. Only in the second stage is the rank preference specified, possibly by an IR metric. Rank decision making is stochastic due to the uncertainty about relevance. As a result, the optimal ranking action is the one that maximizes the expected value of the IR metric.
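
As a small worked example of the second stage, consider two documents with binary relevance and Reciprocal Rank as the metric. The derivation below follows directly from the definition of Reciprocal Rank; it is a sketch and not necessarily the form of the result shown on slide 15. The subscripts 12 and 21 (document 1 ranked first, or document 2 ranked first) are notation introduced here.

```latex
% Reciprocal Rank when document 1 is ranked above document 2, with r_1, r_2 in {0, 1}:
\mathrm{RR}_{12} = r_1 + \tfrac{1}{2}\,(1 - r_1)\, r_2

% Taking the expectation under p(r_1, r_2 \mid q):
E(\mathrm{RR}_{12} \mid q) = E(r_1 \mid q) + \tfrac{1}{2}\bigl(E(r_2 \mid q) - E(r_1 r_2 \mid q)\bigr),
\qquad
E(r_1 r_2 \mid q) = \mathrm{cov}(r_1, r_2 \mid q) + E(r_1 \mid q)\, E(r_2 \mid q)

% The reverse order swaps the roles of the two documents:
E(\mathrm{RR}_{21} \mid q) = E(r_2 \mid q) + \tfrac{1}{2}\bigl(E(r_1 \mid q) - E(r_1 r_2 \mid q)\bigr)

% The covariance term cancels in the comparison:
E(\mathrm{RR}_{12} \mid q) - E(\mathrm{RR}_{21} \mid q) = \tfrac{1}{2}\bigl(E(r_1 \mid q) - E(r_2 \mid q)\bigr)
```

So with only two documents the covariance changes the expected score but not the optimal order, which ranks the document with the larger marginal mean first. With more documents the covariance can also change the optimal order.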