On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics

Jun Wang
Joint work with Jianhan Zhu
Department of Computer Science
University College London
J.Wang@cs.ucl.ac.uk

Motivation
IR models calculate (relevance) scores for individual documents:
• Probability Indexing
• BM25
• Language Models
• The Binary Independence Model

Motivation
[Figure: a ranked list with documents marked relevant (✔) or non-relevant (✖): ✔ ✖ ✔ ✖]
A general definition of an IR metric:
m(a rank order | “true” relevance of documents)
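To make the definition m(a rank order | relevance) concrete, here is a minimal sketch of the four metrics discussed later, each written as a function of a rank order `a` (document ids, best first) and a relevance vector `r` (indexed by document id). Binary relevance is assumed for Precision at k, Reciprocal Rank and Average Precision; the function names are illustrative, not taken from the slides.

```python
import math
from typing import Sequence

# Rank order a: document ids in rank order (a[0] is ranked first).
# Relevance r: r[d] is the relevance of document d (0/1; DCG also accepts graded values).

def precision_at_k(a: Sequence[int], r: Sequence[float], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(r[d] for d in a[:k]) / k

def reciprocal_rank(a: Sequence[int], r: Sequence[float]) -> float:
    """1 / rank of the first relevant document; 0 if no document is relevant."""
    for rank, d in enumerate(a, start=1):
        if r[d]:
            return 1.0 / rank
    return 0.0

def average_precision(a: Sequence[int], r: Sequence[float]) -> float:
    """Mean of precision@rank taken at each relevant document.
    Assumes every relevant document appears somewhere in a."""
    hits, total = 0, 0.0
    for rank, d in enumerate(a, start=1):
        if r[d]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def dcg(a: Sequence[int], r: Sequence[float]) -> float:
    """Discounted cumulative gain: gain r[d], discount log2(rank + 1)."""
    return sum(r[d] / math.log2(rank + 1) for rank, d in enumerate(a, start=1))

# The ranked list ✔ ✖ ✔ ✖: documents 0..3 in rank order, documents 0 and 2 relevant.
a, r = [0, 1, 2, 3], [1, 0, 1, 0]
print(precision_at_k(a, r, 2), reciprocal_rank(a, r), average_precision(a, r), dcg(a, r))
```

The same relevance vector scored under different rank orders gives different values, and each metric weights the positions differently, which is the "different rank preferences" point made next.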

Motivation
We have different rank preferences, and thus different IR metrics (NDCG, MRR, MAP, …).
[Figure: IR models on one side, IR metrics on the other, with a “?” between them]
Something is missing in between.

Motivation
The fundamental question
What is the underlying generative retrieval process?

Outline
• What is happening right now
• The statistical retrieval process
• Text retrieval experiments

What is happening right now? (1)
• Still focusing on the (relevance) score, but acknowledging the final rank context
– The “less is more” model [Chen&Karger 2006] extended the relevance model:
– it assumed the previously retrieved documents to be non-relevant when calculating the relevance of documents for the current rank position,
– which is equivalent to maximizing the Reciprocal Rank measure

– In the Language Model framework, loss functions were defined to incorporate different ranking strategies [Zhai&Lafferty 2006]

• Focusing on IR metrics and ranking
– bypass the step of estimating the relevance states of individual documents
– construct a document ranking model from training data by directly optimizing an IR metric [Volkovs&Zemel 2009]
• However, not all IR metrics summarize the (training) data well; thus, the training data may not be fully exploited

A “balanced” view of the retrieval process
– Let us first understand (infer) the relevance of documents as accurately as possible,
– and summarize it by the joint probability of the documents’ relevance,
– so that the dependency between documents is considered.
– Second, given an IR metric, the rank preference is specified by that metric.
– The rank decision is a stochastic one, due to the uncertainty about relevance.
– As a result, the optimal ranking action is the one that maximizes the expected value of the IR metric.

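The statistical document ranking process
$$\hat{a} = \arg\max_{a} E(m \mid q) = \arg\max_{a_1,\ldots,a_N} \sum_{r_1,\ldots,r_N} m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)\, p(r_1,\ldots,r_N \mid q)$$
where
– $m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)$ is the IR metric, whose inputs are 1. a rank order $a_1,\ldots,a_N$ and 2. the relevance of the documents $r_1,\ldots,r_N$;
– $p(r_1,\ldots,r_N \mid q)$ is the joint probability of relevance given a query $q$.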
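For a handful of documents the argmax can be evaluated by brute force. The sketch below assumes binary relevance and a hypothetical joint distribution p(r_1,...,r_N | q) stored as a Python dict; `expected_metric` and `optimal_ranking` are illustrative names. Enumerating all N! rank orders is only feasible for tiny N and is not meant as the authors' optimization procedure.

```python
from itertools import permutations

def reciprocal_rank(a, r):
    """1 / rank of the first relevant document in rank order a; 0 if none."""
    for rank, d in enumerate(a, start=1):
        if r[d]:
            return 1.0 / rank
    return 0.0

def expected_metric(metric, a, joint):
    """E(m | q) = sum over relevance vectors r of m(a | r) * p(r | q).
    joint maps each relevance tuple (r_1, ..., r_N) to p(r_1, ..., r_N | q)."""
    return sum(p * metric(a, r) for r, p in joint.items())

def optimal_ranking(metric, joint, n_docs):
    """arg max over all N! rank orders of E(m | q); brute force, tiny N only.
    On ties, max() keeps the first rank order in permutation order."""
    return max(permutations(range(n_docs)),
               key=lambda a: expected_metric(metric, a, joint))

# Hypothetical joint distribution over 3 documents: documents 0 and 1 are
# relevant together; document 2 is relevant exactly when they are not.
joint = {(1, 1, 0): 0.55, (0, 0, 1): 0.45}

best = optimal_ranking(reciprocal_rank, joint, 3)
print(best, expected_metric(reciprocal_rank, best, joint))
print((0, 1, 2), expected_metric(reciprocal_rank, (0, 1, 2), joint))
```

With these numbers, sorting by the marginal probabilities of relevance (0.55, 0.55, 0.45) gives (0, 1, 2) with E[RR] = 0.70, whereas the argmax places document 2 second, giving (0, 2, 1) with E[RR] = 0.775 (tied with (1, 2, 0)). Because documents 0 and 1 are relevant together, ranking both at the top adds little under Reciprocal Rank.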

Now the question is how to calculate the expected IR metric under the joint probability of relevance, given a predefined IR metric $m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)$:
$$E(m \mid q) = \sum_{r_1,\ldots,r_N} m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)\, p(r_1,\ldots,r_N \mid q)$$
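For metrics that are linear in the relevance values, such as DCG and Precision at N, linearity of expectation means the sum above depends on p(r_1,...,r_N | q) only through the marginal means E(r_d | q); covariances do not enter. The sketch below checks this on a hypothetical, strongly correlated joint distribution. It illustrates a general property of expectations, not the authors' derivation; the function names are illustrative.

```python
import math

def dcg(a, r):
    """DCG of rank order a: gain r[d], discount log2(rank + 1). Linear in r."""
    return sum(r[d] / math.log2(rank + 1) for rank, d in enumerate(a, start=1))

def expected_dcg_enumeration(a, joint):
    """Exact E(DCG | q): sum over relevance vectors of DCG(a | r) * p(r | q)."""
    return sum(p * dcg(a, r) for r, p in joint.items())

def expected_dcg_from_means(a, joint):
    """Same quantity from the marginal means E(r_d | q) only: because DCG is
    linear in r, E[DCG(a | r)] = DCG(a | E[r])."""
    n_docs = len(next(iter(joint)))
    means = [sum(p * r[d] for r, p in joint.items()) for d in range(n_docs)]
    return dcg(a, means)

# Hypothetical joint distribution over 3 documents (documents 0 and 1 correlated).
joint = {(1, 1, 0): 0.55, (0, 0, 1): 0.45}
a = (0, 2, 1)
print(expected_dcg_enumeration(a, joint), expected_dcg_from_means(a, joint))
```

Reciprocal Rank and Average Precision are not linear in r, so their expectations also involve how the relevance of different documents co-varies.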

We worked it out for the major IR metrics (Average Precision, DCG, Precision at N, Reciprocal Rank)
• Certain assumptions are needed
• The joint distribution of relevance $p(r_1,\ldots,r_N \mid q)$ is summarized by the marginal means $E(r_1 \mid q),\ldots,E(r_N \mid q)$ and the covariances $\mathrm{cov}(r_i, r_j \mid q)$

Some of the results
• Expected Average Precision: [formula not preserved]
• Expected Reciprocal Rank (two documents): [formula not preserved]
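Since the slide's formula did not survive extraction, here is a derivation of the two-document case under the assumption of binary relevance; it is offered as a worked check, not as a reproduction of the slide. Writing r_1, r_2 for the relevance of the documents at ranks 1 and 2, RR = r_1 + ½(1 − r_1) r_2, and since E(r_1 r_2) = E(r_1)E(r_2) + cov(r_1, r_2):
$E[RR] = E(r_1 \mid q) + \tfrac{1}{2}\big(E(r_2 \mid q) - E(r_1 \mid q)E(r_2 \mid q) - \mathrm{cov}(r_1, r_2 \mid q)\big)$.
The sketch compares this moment formula against direct enumeration on a hypothetical joint distribution; the function names are illustrative.

```python
def rr_two_docs(r1, r2):
    """Reciprocal rank of a two-document list: 1 if the top document is
    relevant, 1/2 if only the second one is, 0 otherwise (binary relevance)."""
    return r1 + 0.5 * (1 - r1) * r2

def expected_rr_enumeration(joint):
    """joint maps (r1, r2) -> p(r1, r2 | q) for the documents at ranks 1 and 2."""
    return sum(p * rr_two_docs(r1, r2) for (r1, r2), p in joint.items())

def moments(joint):
    """Marginal means E(r1 | q), E(r2 | q) and covariance cov(r1, r2 | q)."""
    m1 = sum(p * r1 for (r1, _), p in joint.items())
    m2 = sum(p * r2 for (_, r2), p in joint.items())
    e12 = sum(p * r1 * r2 for (r1, r2), p in joint.items())
    return m1, m2, e12 - m1 * m2

def expected_rr_from_moments(m1, m2, cov12):
    """E[RR] = E(r1) + 0.5 * (E(r2) - E(r1) E(r2) - cov(r1, r2))."""
    return m1 + 0.5 * (m2 - m1 * m2 - cov12)

# Hypothetical joint distribution with positively correlated documents.
joint = {(1, 1): 0.4, (1, 0): 0.1, (0, 1): 0.2, (0, 0): 0.3}
print(expected_rr_enumeration(joint), expected_rr_from_moments(*moments(joint)))
```

Here E(r_1) = 0.5, E(r_2) = 0.6 and cov = 0.1, giving E[RR] = 0.6; with the same means and zero covariance it would be 0.65. Positive correlation between the top two documents lowers the expected Reciprocal Rank.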

Properties of IR metrics under uncertainty

But can this analysis be used in practice?
• The key question is how to obtain the joint probability of relevance, summarized by $E(r_1 \mid q),\ldots,E(r_N \mid q)$ and $\mathrm{cov}(r_i, r_j \mid q)$
– Click-through data
– Marginal means: from current IR models (relevance models, language models)
– Covariance of relevance: use the correlation between documents’ scores to estimate the correlation between their relevance. It is treated as query-independent: we approximate it by sampling queries and calculating the correlation between the documents’ ranking scores
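A minimal sketch of the score-correlation step, assuming a matrix `scores[q][d]` holding the ranking score of document d for each sampled query q (hypothetical numbers below). The Pearson correlation of two documents' score columns stands in for their relevance correlation. Turning that correlation into a covariance with the Bernoulli variance μ(1 − μ) of binary relevance is an extra assumption made here for illustration, and all function names are hypothetical.

```python
import math
from typing import List

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def score_correlation(scores: List[List[float]]) -> List[List[float]]:
    """scores[q][d] = ranking score of document d for sampled query q.
    Returns the document-by-document correlation matrix across the queries,
    used as a query-independent estimate of the relevance correlation."""
    n_docs = len(scores[0])
    columns = [[row[d] for row in scores] for d in range(n_docs)]
    return [[pearson(columns[i], columns[j]) for j in range(n_docs)]
            for i in range(n_docs)]

def relevance_covariance(corr: List[List[float]], means: List[float]) -> List[List[float]]:
    """Assumption: binary relevance, so var(r_i | q) = mu_i * (1 - mu_i) and
    cov(r_i, r_j | q) ~= corr_ij * sqrt(var_i * var_j)."""
    sd = [math.sqrt(m * (1 - m)) for m in means]
    n = len(means)
    return [[corr[i][j] * sd[i] * sd[j] for j in range(n)] for i in range(n)]

# Hypothetical scores of 3 documents over 4 sampled queries.
scores = [
    [2.0, 1.9, 0.5],
    [1.0, 1.1, 0.7],
    [3.0, 2.9, 0.6],
    [0.5, 0.6, 0.9],
]
corr = score_correlation(scores)
cov = relevance_covariance(corr, means=[0.6, 0.5, 0.3])  # means, e.g., from an IR model
```

Documents 0 and 1 receive similar scores across the sampled queries, so their estimated correlation is close to 1, while document 2 moves against them and gets a negative correlation.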

The idea can be applied to evaluation too.
• Fix an IR metric m
• Input: an IR model’s rank order $a_1,\ldots,a_N$, and the relevance judgments, with their uncertainty expressed as $p(r_1,\ldots,r_N \mid q)$
• Output: the estimated performance score $E(m \mid q)$
