IR Models 
Part I | Foundations 
Thomas Roelleke, Queen Mary University of London 
Ingo Frommholz, University of Bedfordshire 
Autumn School for Information Retrieval and Foraging 
Schloss Dagstuhl, September 2014 
Acknowledgements
The knowledge presented in this tutorial and the Morgan & Claypool book is the result of many, many discussions with colleagues.
People involved in the production and reviewing: Gianna Amati and Djoerd Hiemstra (the experts), Diane Cerra and Gary Marchionini (Morgan & Claypool), Ricardo Baeza-Yates, Norbert Fuhr, and Mounia Lalmas.
Thomas' PhD students (who had no choice): Jun Wang, Hengzhi Wu, Fred Forst, Hany Azzam, Sirvan Yahyaei, Marco Bonzanini, Miguel Martinez-Alvarez.
Many more IR experts including Fabio Crestani, Keith van Rijsbergen, Stephen Robertson, Fabrizio Sebastiani, Arjen de Vries, Tassos Tombros, Hugo Zaragoza, ChengXiang Zhai.
And non-IR experts Fabrizio Smeraldi, Andreas Kaltenbrunner and Norman Fenton.
Table of Contents 
1 Introduction 
2 Foundations of IR Models 
Introduction 
Warming Up 
Background: Time-Line of IR Models 
Notation
Introduction 
Warming Up
Information Retrieval Conceptual Model 
[Figure: conceptual model of IR, after [Fuhr, 1992] — a query Q and a document D are each mapped to representations; a retrieval function compares the representations, and relevance judgements relate Q and D]
Vector Space Model, Term Space
The Vector Space Model (VSM) is still one of the most prominent IR frameworks
A term space is a vector space where each dimension represents one term in our vocabulary
If we have n terms in our collection, we get an n-dimensional term or vector space
Each document and each query is represented by a vector in the term space
Formal Description
Set of terms in our vocabulary: $T = \{t_1, \dots, t_n\}$
$T$ spans an n-dimensional vector space
Document $d_j$ is represented by a vector of document term weights
Query $q$ is represented by a vector of query term weights
Document Vector
Document $d_j$ is represented by a vector of document term weights $d_{ji} \in \mathbb{R}$:
$\vec{d}_j = (d_{j1}, d_{j2}, \dots, d_{jn})^T$, where $d_{ji}$ is the weight of term $t_i$ in document $d_j$
Document term weights can be computed, e.g., using tf and idf (see below)
Query Vector
Like documents, a query $q$ is represented by a vector of query term weights $q_i \in \mathbb{R}$:
$\vec{q} = (q_1, q_2, \dots, q_n)^T$
$q_i$ denotes the query term weight of term $t_i$
$q_i$ is 0 if the term does not appear in the query.
$q_i$ may be set to 1 if the term does appear in the query.
Further query term weights are possible, for example:
2 if the term is important
1 if the term is just "nice to have"
Retrieval Function
The retrieval function computes a retrieval status value (RSV) using a vector similarity measure, e.g. the scalar product:
$RSV(d_j, q) = \vec{d}_j \cdot \vec{q} = \sum_{i=1}^{n} d_{ji} \cdot q_i$
[Figure: two-dimensional term space with axes $t_1$ and $t_2$, showing the query vector $\vec{q}$ and document vectors $\vec{d}_1$, $\vec{d}_2$]
Ranking of documents according to decreasing RSV
Example Query
Query: "side effects of drugs on memory and cognitive abilities"

$t_i$                Query $\vec{q}$   $\vec{d}_1$   $\vec{d}_2$   $\vec{d}_3$   $\vec{d}_4$
side effect          2                 1             0.5           1             1
drug                 2                 1             1             1             1
memory               1                 1             0             1             0
cognitive ability    1                 0             1             1             0.5
RSV                                    5             4             6             4.5

Produces the ranking $d_3 > d_1 > d_4 > d_2$
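A minimal Python sketch (names and data layout are our own, not from the slides) that reproduces this example with the scalar product:

    # Scalar-product RSV for the example query (vectors as dicts: term -> weight).
    query = {"side effect": 2, "drug": 2, "memory": 1, "cognitive ability": 1}
    docs = {
        "d1": {"side effect": 1,   "drug": 1, "memory": 1, "cognitive ability": 0},
        "d2": {"side effect": 0.5, "drug": 1, "memory": 0, "cognitive ability": 1},
        "d3": {"side effect": 1,   "drug": 1, "memory": 1, "cognitive ability": 1},
        "d4": {"side effect": 1,   "drug": 1, "memory": 0, "cognitive ability": 0.5},
    }

    def rsv(doc, q):
        """Retrieval status value: scalar product of document and query vector."""
        return sum(w * q.get(t, 0) for t, w in doc.items())

    ranking = sorted(docs, key=lambda d: rsv(docs[d], query), reverse=True)
    print([(d, rsv(docs[d], query)) for d in ranking])
    # -> [('d3', 6), ('d1', 5), ('d4', 4.5), ('d2', 4)]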
Term weights: Example Text
In his address to the CBI, Mr Cameron is expected to say: "Scotland does twice as much trade with the rest of the UK than with the rest of the world put together - trade that helps to support one million Scottish jobs." Meanwhile, Mr Salmond has set out six job-creating powers for Scotland that he said were guaranteed with a Yes vote in the referendum. During their televised BBC debate on Monday, Mr Salmond had challenged Better Together head Alistair Darling to name three job-creating powers that were being offered to the Scottish Parliament by the pro-UK parties in the event of a No vote.
Source: http://www.bbc.co.uk/news/uk-scotland-scotland-politics-28952197
What are good descriptors for the text? Which are more, which are less important? Which are informative? Which are good discriminators?
How can a machine answer these questions?
Frequencies
The answer is counting
Different assumptions:
The more frequently a term appears in a document, the more suitable it is to describe its content
Location-based count. Think of term positions or locations. In how many locations of a text do we observe the term?
The term 'scotland' appears in 2 out of 138 locations in the example text
The fewer documents a term occurs in, the more discriminative or informative it is
Document-based count. In how many documents do we observe the term?
Think of stop-words like 'the', 'a' etc.
Location- and document-based frequencies are the building blocks of all (probabilistic) models to come
Introduction 
Background: Time-Line of IR Models
Timeline of IR Models: 50s, 60s and 70s 
Zipf and Luhn: distribution of document frequencies; 
[Croft and Harper, 1979]: BIR without relevance; 
[Robertson and Sparck-Jones, 1976]: BIR; 
[Salton, 1971, Salton et al., 1975]: VSM, TF-IDF; 
[Rocchio, 1971]: Relevance feedback; [Maron and Kuhns, 1960]: 
On Relevance, Probabilistic Indexing, and IR 
Timeline of IR Models: 80s
[Cooper, 1988, Cooper, 1991, Cooper, 1994]: Beyond Boole, Probability Theory in IR: An Encumbrance;
[Dumais et al., 1988, Deerwester et al., 1990]: Latent semantic indexing;
[van Rijsbergen, 1986, van Rijsbergen, 1989]: $P(d \to q)$;
[Bookstein, 1980, Salton et al., 1983]: Fuzzy, extended Boolean
Timeline of IR Models: 90s
[Ponte and Croft, 1998]: LM;
[Brin and Page, 1998, Kleinberg, 1999]: Pagerank and Hits;
[Robertson et al., 1994, Singhal et al., 1996]: Pivoted Document Length Normalisation;
[Wong and Yao, 1995]: $P(d \to q)$;
[Robertson and Walker, 1994, Robertson et al., 1995]: 2-Poisson, BM25;
[Margulis, 1992, Church and Gale, 1995]: Poisson;
[Fuhr, 1992]: Probabilistic Models in IR;
[Turtle and Croft, 1990, Turtle and Croft, 1991]: PIN's;
[Fuhr, 1989]: Models for Probabilistic Indexing
Timeline of IR Models: 00s
ICTIR 2009 and ICTIR 2011;
[Roelleke and Wang, 2008]: TF-IDF Uncovered;
[Luk, 2008, Robertson, 2005]: Event Spaces;
[Roelleke and Wang, 2006]: Parallel Derivation of Models;
[Fang and Zhai, 2005]: Axiomatic approach;
[He and Ounis, 2005]: TF in BM25 and DFR;
[Metzler and Croft, 2004]: LM and PIN's;
[Robertson, 2004]: Understanding IDF;
[Sparck-Jones et al., 2003]: LM and Relevance;
[Croft and Lafferty, 2003, Lafferty and Zhai, 2003]: LM book;
[Zaragoza et al., 2003]: Bayesian extension to LM;
[Bruza and Song, 2003]: probabilistic dependencies in LM;
[Amati and van Rijsbergen, 2002]: DFR;
[Lavrenko and Croft, 2001]: Relevance-based LM;
[Hiemstra, 2000]: TF-IDF and LM;
[Sparck-Jones et al., 2000]: probabilistic model: status
Timeline of IR Models: 2010 and Beyond 
Models for interactive and dynamic IR (e.g. 
iPRP [Fuhr, 2008]) 
Quantum models 
[van Rijsbergen, 2004, Piwowarski et al., 2010] 
Introduction 
Notation
Notation 
A tedious start ... but a must-have. 
Sets 
Locations 
Documents 
Terms 
Probabilities 
Notation: Sets

Notation         Description
$t, d, q, c, r$  term $t$, document $d$, query $q$, collection $c$, relevant $r$
$D_c$, $D_r$     $D_c = \{d_1, \dots\}$: set of Documents in collection $c$; $D_r$: relevant documents
$T_c$, $T_r$     $T_c = \{t_1, \dots\}$: set of Terms in collection $c$; $T_r$: terms that occur in relevant documents
$L_c$, $L_r$     $L_c = \{l_1, \dots\}$: set of Locations in collection $c$; $L_r$: locations in relevant documents
Notation: Locations

Notation      Description                                                      Traditional notation
$n_L(t,d)$    number of Locations at which term $t$ occurs in document $d$    tf, $tf_d$
$N_L(d)$      number of Locations in document $d$ (document length)           dl
$n_L(t,q)$    number of Locations at which term $t$ occurs in query $q$       qtf, $tf_q$
$N_L(q)$      number of Locations in query $q$ (query length)                 ql
Notation: Locations (continued)

Notation      Description                                                        Traditional notation
$n_L(t,c)$    number of Locations at which term $t$ occurs in collection $c$    TF, cf(t)
$N_L(c)$      number of Locations in collection $c$
$n_L(t,r)$    number of Locations at which term $t$ occurs in the set $L_r$
$N_L(r)$      number of Locations in the set $L_r$
Notation: Documents

Notation      Description                                                                            Traditional notation
$n_D(t,c)$    number of Documents in which term $t$ occurs in the set $D_c$ of collection $c$       $n_t$, df(t)
$N_D(c)$      number of Documents in the set $D_c$ of collection $c$                                $N$
$n_D(t,r)$    number of Documents in which term $t$ occurs in the set $D_r$ of relevant documents   $r_t$
$N_D(r)$      number of Documents in the set $D_r$ of relevant documents                            $R$
Notation: Terms

Notation      Description
$n_T(d,c)$    number of Terms in document $d$ in collection $c$
$N_T(c)$      number of Terms in collection $c$
Notation: Average and Pivoted Length
Let $u$ denote a collection associated with a set of documents. For example: $u = c$, or $u = r$, or $u = \bar{r}$.

Notation          Description                                                                                              Traditional notation
$avgdl(u)$        average document length: $avgdl(u) = N_L(u)/N_D(u)$ (avgdl if collection implicit)                       avgdl
$pivdl(d,u)$      pivoted document length: $pivdl(d,u) = N_L(d)/avgdl(u) = dl/avgdl(u)$ ($pivdl(d)$ if collection implicit)  pivdl
$\lambda(t,u)$    average term frequency over all documents in $D_u$: $n_L(t,u)/N_D(u)$
$avgtf(t,u)$      average term frequency over elite documents in $D_u$: $n_L(t,u)/n_D(t,u)$
Notation: Location-based Probabilities

Notation                         Description                                          Traditional notation
$P_L(t|d) := n_L(t,d)/N_L(d)$    Location-based within-document term probability      $P(t|d) = tf_d/|d|$, $|d| = dl = N_L(d)$
$P_L(t|q) := n_L(t,q)/N_L(q)$    Location-based within-query term probability         $P(t|q) = tf_q/|q|$, $|q| = ql = N_L(q)$
$P_L(t|c) := n_L(t,c)/N_L(c)$    Location-based within-collection term probability    $P(t|c) = tf_c/|c|$, $|c| = N_L(c)$
$P_L(t|r) := n_L(t,r)/N_L(r)$    Location-based within-relevance term probability

Event space $P_L$: Locations (LM, TF)
Notation: Document-based Probabilities

Notation                                 Description                                                                                  Traditional notation
$P_D(t|c) := n_D(t,c)/N_D(c)$            Document-based within-collection term probability                                            $P(t) = n_t/N$, $N = N_D(c)$
$P_D(t|r) := n_D(t,r)/N_D(r)$            Document-based within-relevance term probability                                             $P(t|r) = r_t/R$, $R = N_D(r)$
$P_T(d|c) := n_T(d,c)/N_T(c)$            Term-based document probability
$P_{avg}(t|c) := avgtf(t,c)/avgdl(c)$    probability that $t$ occurs in a document with average length; $avgtf(t,c) \le avgdl(c)$

Event space $P_D$: Documents (BIR, IDF)
Toy Example

Notation      Value
$N_L(c)$      20
$N_D(c)$      10
$avgdl(c)$    20/10 = 2

Notation        doc1   doc2   doc3
$N_L(d)$        2      3      3
$pivdl(d,c)$    2/2    3/2    3/2

Notation          sailing   boats
$n_L(t,c)$        8         6
$n_D(t,c)$        6         5
$P_L(t|c)$        8/20      6/20
$P_D(t|c)$        6/10      5/10
$\lambda(t,c)$    8/10      6/10
$avgtf(t,c)$      8/6       6/5
Foundations of IR Models 
TF-IDF 
PRF: The Probability of Relevance Framework 
BIR: Binary Independence Retrieval 
Poisson and 2-Poisson 
BM25 
LM: Language Modelling 
PIN's: Probabilistic Inference Networks 
Relevance-based Models 
Foundations: Summary
Foundations of IR Models 
TF-IDF
TF-IDF
Still a very popular model
Best known outside IR research, very intuitive
TF-IDF is not a model; it is just a weighting scheme in the vector space model
TF-IDF is purely heuristic; it has no probabilistic roots.
But:
TF-IDF and LM are dual models that can be shown to be derived from the same root.
Simplified version of BM25
TF Variants: TF(t, d)
$TF_{total}(t,d) := lf_{total}(t,d) := n_L(t,d) \quad (= tf_d)$
$TF_{sum}(t,d) := lf_{sum}(t,d) := \frac{n_L(t,d)}{N_L(d)} = P_L(t|d) \quad \left(= \frac{tf_d}{dl}\right)$
$TF_{max}(t,d) := lf_{max}(t,d) := \frac{n_L(t,d)}{n_L(t_{max},d)}$
$TF_{log}(t,d) := lf_{log}(t,d) := \log(1 + n_L(t,d)) \quad (= \log(1 + tf_d))$
$TF_{frac,K}(t,d) := lf_{frac,K}(t,d) := \frac{n_L(t,d)}{n_L(t,d) + K_d} \quad \left(= \frac{tf_d}{tf_d + K_d}\right)$
$TF_{BM25,k_1,b}(t,d) := \dots := \frac{n_L(t,d)}{n_L(t,d) + k_1 \cdot (b \cdot pivdl(d,c) + (1-b))}$
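A small Python sketch of these TF variants (helper names are our own; tf_d, dl, avgdl follow the notation above):

    import math

    def tf_total(tf_d):
        return tf_d

    def tf_sum(tf_d, dl):
        # within-document term probability P_L(t|d)
        return tf_d / dl

    def tf_log(tf_d):
        return math.log(1 + tf_d)

    def tf_frac(tf_d, K):
        return tf_d / (tf_d + K)

    def tf_bm25(tf_d, dl, avgdl, k1=1.2, b=0.75):
        # K_BM25 grows with the pivoted document length dl/avgdl
        K = k1 * (b * (dl / avgdl) + (1 - b))
        return tf_d / (tf_d + K)

    for tf_d in (1, 2, 5, 20):
        print(tf_d, round(tf_log(tf_d), 3), round(tf_bm25(tf_d, dl=100, avgdl=100), 3))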
TF Variants: Collection-wide TF(t, c)
Analogously to TF(t, d), the next definition defines the variants of TF(t, c), the collection-wide term frequency.
Not considered any further here.
TFtotal and TFlog
[Figure: $TF_{total}$ grows linearly in $n_L(t,d)$; $TF_{log}$ grows logarithmically]
Bias towards documents with many terms (e.g. books vs. Twitter tweets)
$TF_{total}$:
too steep
assumes all occurrences are independent (same impact)
$TF_{log}$:
less impact to subsequent occurrences
the base of the logarithm is ranking invariant, since it is a constant:
$TF_{log,base}(t,d) := \frac{\ln(1 + tf_d)}{\ln(base)}$
Logarithmic TF: Dependence Assumption
The logarithmic TF assigns less impact to subsequent occurrences than the total TF does. This aspect becomes clear when reconsidering that the logarithm is an approximation of the harmonic sum:
$TF_{log}(t,d) = \ln(1 + tf_d) \approx 1 + \frac{1}{2} + \dots + \frac{1}{tf_d}, \quad tf_d \ge 0$
Note: $\ln(n+1) = \int_1^{n+1} \frac{1}{x}\,dx$.
Whereas: $TF_{total}(t,d) = 1 + 1 + \dots + 1$
The first occurrence of a term counts in full, the second counts 1/2, the third counts 1/3, and so forth.
This gives a particular insight into the type of dependence that is reflected by bending the total TF into a saturating curve.
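A quick numeric check of this approximation (our own sketch):

    import math

    for n in (1, 5, 10, 100):
        harmonic = sum(1 / k for k in range(1, n + 1))
        print(n, round(math.log(n + 1), 3), round(harmonic, 3))
    # ln(n+1) tracks the harmonic sum; the gap stays below Euler's constant (~0.5772)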
TFsum, TFmax and TFfrac: Graphical Illustration
[Figure: $TF_{sum}$ and $TF_{max}$ are linear in $n_L(t,d)$, with the slope depending on $N_L(d)$ resp. $n_L(t_{max},d)$; $TF_{frac}$ saturates towards 1, the faster the smaller $K$ (curves for K = 1, 5, 10, 20, 100, 200, 1000)]
TFsum, TFmax and TFfrac: Analysis 
Document length normalisation (TFfrac: K may depend on 
document length) 
Usually TFmax yields higher TF-values than TFsum 
Linear TF variants are not really important anymore, since 
TFfrac (TFBM25) delivers better and more stable quality 
TFfrac yields relatively high TF-values already for small 
frequencies, and the curve saturates for large frequencies 
The good and stable performance of BM25 indicates that this 
non-linear nature is key for achieving good retrieval quality. 
Fractional TF: Dependence Assumption
What we refer to as fractional TF is a ratio:
$ratio(x,y) = \frac{x}{x+y}; \quad \frac{tf_d}{tf_d + K_d} = TF_{frac,K}(t,d)$
The ratio is related to the harmonic sum of squares:
$\frac{n}{n+1} \approx 1 + \frac{1}{2^2} + \dots + \frac{1}{n^2}, \quad n \ge 0$
This approximation is based on the following integral:
$\int_1^{n+1} \frac{1}{z^2}\,dz = \left[-\frac{1}{z}\right]_1^{n+1} = 1 - \frac{1}{n+1} = \frac{n}{n+1}$
$TF_{frac}$ assumes more dependence than $TF_{log}$: the $k$-th occurrence of a term has an impact of $1/k^2$.
TFBM25
$TF_{BM25}$ is a special $TF_{frac}$ used in the BM25 model
$K$ is proportional to the pivoted document length ($pivdl(d,c) = dl/avgdl(c)$) and involves adjustment parameters ($k_1$, $b$).
The common definition is:
$K_{BM25,k_1,b}(d,c) := k_1 \cdot (b \cdot pivdl(d,c) + (1-b))$
For $b = 1$, $K$ is equal to $k_1$ for average documents, less than $k_1$ for short documents and greater than $k_1$ for long documents.
Large $b$ and $k_1$ lead to a strong variation of $K$, with a high impact on the retrieval score
Documents shorter than the average have an advantage over documents longer than the average
Inverse Document Frequency IDF
The IDF (inverse document frequency) is the negative logarithm of the DF (document frequency).
Idea: the fewer documents a term appears in, the more discriminative or 'informative' it is
DF Variants
DF(t, c) is a quantification of the document frequency, df(t, c). The main variants are:
$df(t,c) := df_{total}(t,c) := n_D(t,c)$
$df_{sum}(t,c) := \frac{n_D(t,c)}{N_D(c)} = P_D(t|c) \quad \left(= \frac{df(t,c)}{N_D(c)}\right)$
$df_{sum,smooth}(t,c) := \frac{n_D(t,c) + 0.5}{N_D(c) + 1}$
$df_{BIR}(t,c) := \frac{n_D(t,c)}{N_D(c) - n_D(t,c)}$
$df_{BIR,smooth}(t,c) := \frac{n_D(t,c) + 0.5}{N_D(c) - n_D(t,c) + 0.5}$
IDF Variants
IDF(t, c) is the negative logarithm of a DF quantification. The main variants are:
$idf_{total}(t,c) := -\log df_{total}(t,c)$
$idf(t,c) := idf_{sum}(t,c) := -\log df_{sum}(t,c) = -\log P_D(t|c)$
$idf_{sum,smooth}(t,c) := -\log df_{sum,smooth}(t,c)$
$idf_{BIR}(t,c) := -\log df_{BIR}(t,c)$
$idf_{BIR,smooth}(t,c) := -\log df_{BIR,smooth}(t,c)$
IDF is high for rare terms and low for frequent terms.
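A minimal sketch of the main IDF variants in Python (function names are our own):

    import math

    def idf_sum(n_D_t, N_D):
        # idf(t,c) = -log P_D(t|c)
        return -math.log(n_D_t / N_D)

    def idf_bir(n_D_t, N_D):
        return -math.log(n_D_t / (N_D - n_D_t))

    def idf_bir_smooth(n_D_t, N_D):
        # the 0.5-smoothing avoids division by zero and log(0)
        return -math.log((n_D_t + 0.5) / (N_D - n_D_t + 0.5))

    # 'sailing' occurs in 6 of 10 documents (toy example above); note that
    # idf_BIR is negative for terms occurring in more than half the documents.
    print(round(idf_sum(6, 10), 3), round(idf_bir(6, 10), 3))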
Burstiness
A term is bursty if it occurs often in the documents in which it occurs
Burstiness is measured by the average term frequency in the elite set:
$avgtf(t,c) = \frac{n_L(t,c)}{n_D(t,c)}$
Intuition (relevant vs. non-relevant documents):
A good term is rare (not frequent, high IDF) and solitude (not bursty, low avgtf) in all documents (all non-relevant documents)
Among relevant documents, a good term is frequent (low IDF, appears in many relevant documents) and bursty (high avgtf)
IDF and Burstiness
[Figure: idf(t,c) plotted against $P_D(t|c)$, and avgtf(t,c) plotted against $P_{BIR}(t|c)$, for $\lambda(t,c) = 1, 2, 4$; the plots partition terms into RARE/FREQUENT and SOLITUDE/BURSTY regions]
Example: $n_D(t_1,c) = n_D(t_2,c) = 1{,}000$. Same IDF.
$n_L(t_1,d) = 1$ for 1,000 documents, whereas $n_L(t_2,d) = 1$ for 999 documents, and $n_L(t_2,d) = 1{,}001$ for one doc.
$n_L(t_1,c) = 1{,}000$ and $n_L(t_2,c) = 2{,}000$.
$avgtf(t_1,c) = 1$ and $avgtf(t_2,c) = 2$.
$\lambda(t,c) = avgtf(t,c) \cdot P_D(t|c)$
TF-IDF Term Weight
Definition (TF-IDF term weight $w_{TF\text{-}IDF}$):
$w_{TF\text{-}IDF}(t,d,q,c) := TF(t,d) \cdot TF(t,q) \cdot IDF(t,c)$
TF-IDF RSV
Definition (TF-IDF retrieval status value $RSV_{TF\text{-}IDF}$):
$RSV_{TF\text{-}IDF}(d,q,c) := \sum_t w_{TF\text{-}IDF}(t,d,q,c)$
$RSV_{TF\text{-}IDF}(d,q,c) = \sum_t TF(t,d) \cdot TF(t,q) \cdot IDF(t,c)$
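Putting the pieces together, a hedged Python sketch of TF-IDF retrieval over a toy collection (the toy data and all names are our own; it combines $TF_{log}$ and $idf_{sum}$ from the definitions above):

    import math
    from collections import Counter

    docs = {
        "d1": "sailing boats sailing".split(),
        "d2": "sailing east".split(),
        "d3": "boats and boats".split(),
    }
    N_D = len(docs)
    df = Counter(t for terms in docs.values() for t in set(terms))  # n_D(t,c)

    def rsv_tfidf(doc_terms, query_terms):
        tf_d, tf_q = Counter(doc_terms), Counter(query_terms)
        score = 0.0
        for t in tf_q:
            if t not in tf_d:
                continue
            idf = -math.log(df[t] / N_D)                    # idf_sum(t,c)
            score += math.log(1 + tf_d[t]) * tf_q[t] * idf  # TF_log * TF(t,q) * IDF
        return score

    query = "sailing boats".split()
    for d, terms in docs.items():
        print(d, round(rsv_tfidf(terms, query), 3))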
Probabilistic IDF: Probability of Being Informative
IDF so far is not probabilistic
In probabilistic scenarios, a normalised IDF value such as
$0 \le \frac{idf(t,c)}{maxidf(c)} \le 1$
can be useful
$maxidf(c) := -\log\frac{1}{N_D(c)} = \log(N_D(c))$ is the maximal value of idf(t,c) (when a term occurs in only 1 document)
[Figure: maxidf grows logarithmically with $N_D(c)$]
The normalisation does not affect the ranking
Probability that term t is informative
A probabilistic semantics of a max-normalised IDF can be achieved by introducing an informativeness-based probability [Roelleke, 2003], as opposed to the usual notion of an occurrence-based probability. We denote this probability as $P(t \text{ informs}|c)$, to contrast it with the usual $P(t|c) := P(t \text{ occurs}|c) = \frac{n_D(t,c)}{N_D(c)}$.
Foundations of IR Models 
PRF: The Probability of Relevance 
Framework
PRF: The Probability of Relevance Framework 
Relevance is at the core of any information retrieval model 
With r denoting relevance, d a document and q a query, the 
probability that a document is relevant is 
$P(r|d,q)$
Probability Ranking Principle
[Robertson, 1977], "The Probability Ranking Principle (PRP) in IR", describes the PRP as a framework to discuss formally "what is a good ranking?". [Robertson, 1977] quotes Cooper's formal statement of the PRP:
"If a reference retrieval system's response to each request is a ranking of the documents in the collections in order of decreasing probability of usefulness to the user who submitted the request, ..., then the overall effectiveness of the system ... will be the best that is obtainable on the basis of that data."
Formally, we can capture the principle as follows. Let A and B be rankings. Then, ranking A is better than ranking B if at every rank, the probability of satisfaction in A is higher than in B, i.e.:
$\forall \text{rank}: P(\text{satisfactory}|\text{rank}, A) \ge P(\text{satisfactory}|\text{rank}, B)$
PRF: Illustration
Example (Probability of relevance)
Let four users $u_1, \dots, u_4$ have judged document-query pairs: $d_1$ and $d_2$ are each judged by all four users, and $d_3$ by $u_1$ and $u_2$; each judgement is relevant ($r$) or non-relevant ($\bar{r}$).
$P_U(r|d_1,q_1) = 3/4$, $P_U(r|d_2,q_1) = 1/4$, and $P_U(r|d_3,q_1) = 0/2$.
Subscript in $P_U$: event space, a set of users.
Total probability:
$P_U(r|d,q) = \sum_{u \in U} P(r|d,q,u) \cdot P(u)$
Bayes' Theorem
Relevance judgements can be incomplete
In any case, for a new query we often do not have judgements
The probability of relevance is estimated via Bayes' Theorem:
$P(r|d,q) = \frac{P(d,q|r) \cdot P(r)}{P(d,q)} = \frac{P(d,q,r)}{P(d,q)}$
The decision whether or not to retrieve a document is based on the so-called Bayesian decision rule:
retrieve document $d$ if the probability of relevance is greater than the probability of non-relevance:
retrieve $d$ for $q$ if $P(r|d,q) > P(\bar{r}|d,q)$
Probabilistic Odds, Rank Equivalence
We can express the probabilistic odds of relevance:
$O(r|d,q) = \frac{P(r|d,q)}{P(\bar{r}|d,q)} = \frac{P(d,q,r)}{P(d,q,\bar{r})} = \frac{P(d,q|r)}{P(d,q|\bar{r})} \cdot \frac{P(r)}{P(\bar{r})}$
Since $P(r)/P(\bar{r})$ is a constant, the following rank equivalence holds:
$O(r|d,q) \stackrel{rank}{=} \frac{P(d,q|r)}{P(d,q|\bar{r})}$
Often we don't need the exact probability, but an easier-to-compute value that is rank equivalent (i.e. it preserves the ranking)
Probabilistic Odds
The document-query probabilities can be decomposed in two ways:
$\frac{P(d,q|r)}{P(d,q|\bar{r})} = \frac{P(d|q,r) \cdot P(q|r)}{P(d|q,\bar{r}) \cdot P(q|\bar{r})} \quad (\Rightarrow \text{BIR/Poisson/BM25})$
$= \frac{P(q|d,r) \cdot P(d|r)}{P(q|d,\bar{r}) \cdot P(d|\bar{r})} \quad (\Rightarrow \text{LM?})$
The equation where $d$ depends on $q$ is the basis of BIR, Poisson and BM25: document likelihood $P(d|q,r)$
The equation where $q$ depends on $d$ has been related to LM, $P(q|d)$ [Lafferty and Zhai, 2003], Probabilistic Relevance Models Based on Document and Query Generation
This relationship, and the assumptions required to establish it, are controversial [Luk, 2008]
Documents as Feature Vectors
The next step represents document $d$ as a vector $\vec{d} = (f_1, \dots, f_n)$ in a space of features $f_i$:
$P(d|q,r) = P(\vec{d}|q,r)$
A feature could be, for example, the frequency of a word (term), the document length, document creation time, time of last update, document owner, number of in-links, or number of out-links.
See also the Vector Space Model in the Introduction; here we used term weights as features
Assumptions based on features:
Feature independence assumption
Non-query term assumption
Term frequency split
Feature Independence Assumption
The features (terms) are independent events:
$P(\vec{d}|q,r) \approx \prod_i P(f_i|q,r)$
Weaker assumption for the fraction of feature probabilities (linked dependence):
$\frac{P(d|q,r)}{P(d|q,\bar{r})} \approx \prod_i \frac{P(f_i|q,r)}{P(f_i|q,\bar{r})}$
Here it is not required to distinguish between these two assumptions
$P(f_i|q,r)$ and $P(f_i|q,\bar{r})$ may be estimated, for instance, by means of relevance judgements
Non-Query Term Assumption
Non-query terms can be ignored for retrieval (ranking)
This reduces the number of features/terms/dimensions to consider when computing probabilities
For non-query terms, the feature probability is the same in relevant documents and non-relevant documents:
for all non-query terms: $\frac{P(f_i|q,r)}{P(f_i|q,\bar{r})} = 1$
Then the product over all features reduces to a product over the query terms.
Term Frequency Split
The product over query terms is split into two parts
The first part captures the $f_i > 0$ features, i.e. the document terms
The second part captures the $f_i = 0$ features, i.e. the non-document terms
Foundations of IR Models 
BIR: Binary Independence Retrieval
BIR: Binary Independence Retrieval
The BIR instantiation of the PRF assumes the vector components to be binary term features, i.e. $\vec{d} = (x_1, x_2, \dots, x_n)$, where $x_i \in \{0, 1\}$
Term occurrences are represented in a binary feature vector $\vec{d}$ in the term space
The event $x_t = 1$ is expressed as $t$, and $x_t = 0$ as $\bar{t}$
BIR Term Weight
We save the derivation ... after a few steps from $O(r|d,q)$ ...
Event Space: $d$ and $q$ are binary vectors.
Definition (BIR term weight $w_{BIR}$):
$w_{BIR}(t,r,\bar{r}) := \log\left(\frac{P_D(t|r)}{P_D(t|\bar{r})} \cdot \frac{P_D(\bar{t}|\bar{r})}{P_D(\bar{t}|r)}\right)$
Simplified form (referred to as F1), considering term presence only:
$w_{BIR,F1}(t,r,c) := \log\frac{P_D(t|r)}{P_D(t|c)}$
BIR RSV
Definition (BIR retrieval status value $RSV_{BIR}$):
$RSV_{BIR}(d,q,r,\bar{r}) := \sum_{t \in d \cap q} w_{BIR}(t,r,\bar{r})$
$RSV_{BIR}(d,q,r,\bar{r}) = \sum_{t \in d \cap q} \log\frac{P(t|r) \cdot P(\bar{t}|\bar{r})}{P(t|\bar{r}) \cdot P(\bar{t}|r)}$
BIR Estimations
Relevance judgements for 20 documents (example from [Fuhr, 1992]):

$d_i$    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$x_1$    1 1 1 1 1 1 1 1 1 1  0  0  0  0  0  0  0  0  0  0
$x_2$    1 1 1 1 1 0 0 0 0 0  1  1  1  1  1  0  0  0  0  0
$R$      12 documents are judged relevant ($r$), 8 non-relevant ($\bar{r}$)

$N_D(r) = 12$, $N_D(\bar{r}) = 8$; $P(t_1|r) = P(x_1{=}1|r) = 8/12 = 2/3$
$P(t_1|\bar{r}) = 2/8 = 1/4$; $P(\bar{t}_1|r) = 4/12 = 1/3$; $P(\bar{t}_1|\bar{r}) = 6/8 = 3/4$
Ditto for $t_2$.
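A small Python sketch computing the BIR term weight from these counts (our own helper; the counts are those of the slide):

    import math

    def w_bir(n_r_t, N_r, n_nr_t, N_nr):
        """BIR term weight from document counts in the relevant (r)
        and non-relevant (r-bar) sets."""
        p = n_r_t / N_r      # P(t|r)
        q = n_nr_t / N_nr    # P(t|r-bar)
        return math.log((p * (1 - q)) / (q * (1 - p)))

    # t1 occurs in 8 of 12 relevant and 2 of 8 non-relevant documents:
    print(round(w_bir(8, 12, 2, 8), 3))  # log((2/3 * 3/4) / (1/4 * 1/3)) = log 6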
Missing Relevance Information I
Estimation for non-relevant documents:
Take the collection-wide term probability as approximation ($\bar{r} \approx c$):
$P(t|\bar{r}) \approx P(t|c)$
Use the set of all documents minus the set of relevant documents ($\bar{r} = c \setminus r$):
$P(t|\bar{r}) \approx \frac{n_D(t,c) - n_D(t,r)}{N_D(c) - N_D(r)}$
Missing Relevance Information II
Empty-set problem: for $N_D(r) = 0$ (no relevant documents), $P(t|r)$ is not defined. Smoothing deals with this situation.
Add the query to the set of relevant documents and the collection:
$P(t|r) = \frac{n_D(t,r) + 1}{N_D(r) + 1}$
Other forms of smoothing, e.g.:
$P(t|r) = \frac{n_D(t,r) + 0.5}{N_D(r) + 1}$
This variant may be justified by Laplace's law of succession.
RSJ Term Weight
Recall $w_{BIR,F1}(t,r,c) := \log\frac{P_D(t|r)}{P_D(t|c)}$
Definition (RSJ term weight $w_{RSJ}$):
The RSJ term weight is a smooth BIR term weight. The probability estimation is as follows:
$P(t|r) := \frac{n_D(t,r) + 0.5}{N_D(r) + 1}$
$P(t|\bar{r}) := \frac{(n_D(t,c) + 1) - (n_D(t,r) + 0.5)}{(N_D(c) + 2) - (N_D(r) + 1)}$
$w_{RSJ,F4}(t,r,\bar{r},c) := \log\frac{(n_D(t,r) + 0.5)\,/\,(N_D(r) - n_D(t,r) + 0.5)}{(n_D(t,c) - n_D(t,r) + 0.5)\,/\,((N_D(c) - N_D(r)) - (n_D(t,c) - n_D(t,r)) + 0.5)}$
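A sketch of the RSJ F4 weight in Python (our own function name, following the definition above):

    import math

    def w_rsj_f4(n_t_r, N_r, n_t_c, N_c):
        """Smoothed RSJ term weight (F4). n_t_r: relevant docs containing t,
        N_r: relevant docs, n_t_c: docs containing t, N_c: all docs."""
        rel = (n_t_r + 0.5) / (N_r - n_t_r + 0.5)
        n_t_nonrel = n_t_c - n_t_r
        nonrel = (n_t_nonrel + 0.5) / ((N_c - N_r) - n_t_nonrel + 0.5)
        return math.log(rel / nonrel)

    # Counts from the BIR example: t1 in 8 of 12 relevant docs, 10 of 20 overall.
    print(round(w_rsj_f4(8, 12, 10, 20), 3))
    # Without relevance information (N_r = n_t_r = 0) it acts as a smoothed IDF:
    print(round(w_rsj_f4(0, 0, 2, 20), 3))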
Foundations of IR Models 
Poisson and 2-Poisson
Poisson Model I
A less known model; however, there are good reasons to look at this model:
Demystify the Poisson probability - some research students resign when hearing "Poisson" ;-)
Next to the BIR model, the natural instantiation of a PRF model; the BIR model is a special case of the Poisson model
The 2-Poisson probability is arguably the foundation of the BM25-TF quantification: [Robertson and Walker, 1994], "Some Simple Effective Approximations to the 2-Poisson Model"
Poisson Model II
The Poisson probability is a model of randomness. Divergence from randomness (DFR) is based on the probability $P(t \in d|\text{collection}) = P(tf_d > 0|\text{collection})$. The probability $P(tf_d|\text{collection})$ can be estimated by a Poisson probability.
The Poisson parameter $\lambda(t,c) = n_L(t,c)/N_D(c)$, i.e. the average number of term occurrences, relates Document-based and Location-based probabilities:
$avgtf(t,c) \cdot P_D(t|c) = \lambda(t,c) = avgdl(c) \cdot P_L(t|c)$
We refer to this relationship as the Poisson Bridge, since the average term frequency is the parameter of the Poisson probability.
Poisson Model III 
The Poisson model yields a foundation of TF-IDF (see Part II) 
The Poisson bridge helps to relate TF-IDF and LM (see 
Part II) 
Poisson Distribution
Let $t$ be an event that occurs on average $\lambda_t$ times. The Poisson probability is:
$P_{Poisson,\lambda_t}(k) := \frac{\lambda_t^k}{k!} \cdot e^{-\lambda_t}$
Poisson Example
The probability that $k = 4$ sunny days occur in a week, given the average $\lambda = p \cdot n = 180/360 \cdot 7 = 3.5$ sunny days per week, is:
$P_{Poisson,\lambda=3.5}(k=4) = \frac{3.5^4}{4!} \cdot e^{-3.5} \approx 0.1888$
[Figure: Poisson distribution $P_{3.5}(k)$ for $k = 0, \dots, 15$]
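A quick Python check of this value (our own sketch):

    import math

    def poisson(k, lam):
        # P_{Poisson,lam}(k) = lam^k / k! * e^(-lam)
        return lam ** k / math.factorial(k) * math.exp(-lam)

    print(round(poisson(4, 3.5), 4))  # -> 0.1888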
Poisson PRF: Basic Idea
The Poisson model can be used to estimate the document probability $P(d|q,r)$ as the product of the probabilities of the within-document term frequencies $k_t$:
$P(d|q,r) = P(\vec{d}|q,r) = \prod_t P(k_t|q,r), \quad k_t := tf_d := n_L(t,d)$
$\vec{d}$ is a vector of term frequencies (cf. BIR: binary vector of occurrence/non-occurrence)
Poisson PRF: Odds
Rank equivalence, the non-query term assumption and probabilistic odds lead us to:
$O(r|d,q) \stackrel{rank}{=} \prod_{t \in q} \frac{P(k_t|r)}{P(k_t|\bar{r})}, \quad k_t := tf_d := n_L(t,d)$
Splitting the product into document and non-document terms yields:
$O(r|d,q) \stackrel{rank}{=} \prod_{t \in d \cap q} \frac{P(k_t|r)}{P(k_t|\bar{r})} \cdot \prod_{t \in q \setminus d} \frac{P(0|r)}{P(0|\bar{r})}$
Poisson PRF: Meaning of Poisson Probability
For the set $r$ ($\bar{r}$) and a document $d$ of length dl, we look at:
How many times would we expect to see a term $t$ in $d$?
How many times do we actually observe $t$ in $d$ ($= k_t$)?
$P(k_t|r)$ is highest if our expectation is met by our observation.
$\lambda(t,d,r) = dl \cdot P_L(t|r)$ is the number of expected occurrences in a document of length dl (how many times will we draw $t$ if we had dl trials?)
$P(k_t|r) = P_{Poisson,\lambda(t,d,r)}(k_t)$: the probability to observe $k_t$ occurrences of term $t$ in dl trials
Poisson PRF: Example
$dl = 5$; $P_L(t|r) = 5/10 = 1/2$
$\lambda(t,d,r) = 5 \cdot 1/2 = 2.5$
$P(2|r) = P_{Poisson,2.5}(2) = 0.2565156$
[Figure: Poisson distribution $P_{2.5}(k)$ for $k = 0, \dots, 10$]
Poisson Term Weight
Event Space: $d$ and $q$ are frequency vectors.
Definition (Poisson term weight $w_{Poisson}$):
$w_{Poisson}(t,d,q,r,\bar{r}) := TF(t,d) \cdot \log\frac{\lambda(t,d,r)}{\lambda(t,d,\bar{r})}$
$w_{Poisson}(t,d,q,r,\bar{r}) = TF(t,d) \cdot \log\frac{P_L(t|r)}{P_L(t|\bar{r})}$
$\lambda(t,d,r) = dl \cdot P_L(t|r)$
$TF(t,d) = k_t$ (smarter TF quantification possible)
Poisson RSV
Definition (Poisson retrieval status value $RSV_{Poisson}$):
$RSV_{Poisson}(d,q,r,\bar{r}) := \left[\sum_{t \in d \cap q} w_{Poisson}(t,d,q,r,\bar{r})\right] + \text{len\_norm}_{Poisson}$
$RSV_{Poisson}(d,q,r,\bar{r}) = \left[\sum_{t \in d \cap q} TF(t,d) \cdot \log\frac{P_L(t|r)}{P_L(t|\bar{r})}\right] + dl \cdot \sum_t (P_L(t|\bar{r}) - P_L(t|r))$
2-Poisson
Viewed as a motivation for the BM25-TF quantification: [Robertson and Walker, 1994], "Some Simple Effective Approximations to the 2-Poisson Model". In an exchange with Stephen Robertson, he explained:
"The investigation into the 2-Poisson probability motivated the BM25-TF quantification. Regarding the combination of TF and RSJ weight in BM25, TF can be viewed as a factor to reflect the uncertainty about whether the RSJ weight $w_{RSJ}$ is correct; for terms with a relatively high within-document TF, the weight is correct; for terms with a relatively low within-document TF, there is uncertainty about the correctness. In other words, the TF factor can be viewed as a weight to adjust the impact of the RSJ weight."
2-Poisson Example: How many cars to expect? I
How many cars are expected on a given commuter car park?
Approach 1: On average, there are 700 cars per week. The daily average is: $\lambda = 700/7 = 100$ cars/day.
Then, $P_{\lambda=100}(k)$ is the probability that there are $k$ cars wanting to park on a given day.
This estimation is less accurate than an estimation based on a 2-dimensional model: Mon-Fri are the busy days, and on weekends, the car park is nearly empty.
This means that a distribution such as (130, 130, 130, 130, 130, 25, 25) is more likely than 100 each day.
2-Poisson Example: How many cars to expect? II
Approach 2: In a more detailed analysis, we observe 650 cars Mon-Fri (work days) and 50 cars Sat-Sun (weekend days).
The averages are: $\lambda_1 = 650/5 = 130$ cars per work-day, $\lambda_2 = 50/2 = 25$ cars per weekend-day.
Then, $P_{\pi_1=5/7,\lambda_1=130,\pi_2=2/7,\lambda_2=25}(k)$ is the 2-dimensional Poisson probability that there are $k$ cars looking for a car park.
2-Poisson Example: How many cars to expect? III
Main idea of the 2-Poisson probability: combine (interpolate, mix) two Poisson probabilities:
$P_{2\text{-}Poisson,\lambda_1,\lambda_2,\pi}(k_t) := \pi \cdot \frac{\lambda_1^{k_t}}{k_t!} \cdot e^{-\lambda_1} + (1 - \pi) \cdot \frac{\lambda_2^{k_t}}{k_t!} \cdot e^{-\lambda_2}$
Such a mixture model is also used with LM (later)
$\lambda_1$ could be over all documents whereas $\lambda_2$ could be over an elite set (e.g. documents that contain at least one query term)
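A sketch of the 2-Poisson mixture for the car-park example (our own code; the poisson helper repeats the definition from above for self-containment):

    import math

    def poisson(k, lam):
        return lam ** k / math.factorial(k) * math.exp(-lam)

    def two_poisson(k, lam1, lam2, pi):
        # mixture of two Poisson distributions with mixture weight pi
        return pi * poisson(k, lam1) + (1 - pi) * poisson(k, lam2)

    # P(100 cars) under the single-Poisson vs. the 2-Poisson model:
    print(round(poisson(100, 100), 4))                  # ~0.0399
    print(round(two_poisson(100, 130, 25, 5 / 7), 6))   # much smaller: 100 cars is untypical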
Foundations of IR Models 
BM25
BM25
One of the most prominent IR models
The ingredients have already been prepared:
$TF_{BM25,K}(t,d)$
$w_{RSJ,F4}(t,r,\bar{r},c)$
Wikipedia 2014 formulation: given a query $Q$ containing keywords $q_1, \dots, q_n$:
$\text{score}(D,Q) = \sum_{i=1}^{n} IDF(q_i) \cdot \frac{f(q_i,D) \cdot (k_1 + 1)}{f(q_i,D) + k_1 \cdot (1 - b + b \cdot \frac{|D|}{avgdl})}$
with $f(q_i,D) = n_L(q_i,D)$ ($q_i$'s term frequency)
Here:
Ignore $(k_1 + 1)$ (ranking invariant!)
Use $w_{RSJ,F4}(t,r,\bar{r},c)$. If relevance information is missing: $w_{RSJ,F4}(t,r,\bar{r},c) \approx IDF(t)$
BM25 Term Weight
Definition (BM25 term weight $w_{BM25}$):
$w_{BM25,k_1,b,k_3}(t,d,q,r,\bar{r}) := TF_{BM25,k_1,b}(t,d) \cdot TF_{BM25,k_3}(t,q) \cdot w_{RSJ}(t,r,\bar{r})$
$TF_{BM25,k_1,b}(t,d) := \frac{tf_d}{tf_d + k_1 \cdot (b \cdot pivdl(d) + (1-b))}$
BM25 RSV
Definition (BM25 retrieval status value $RSV_{BM25}$):
$RSV_{BM25,k_1,b,k_2,k_3}(d,q,r,\bar{r},c) := \left[\sum_{t \in d \cap q} w_{BM25,k_1,b,k_3}(t,d,q,r,\bar{r})\right] + \text{len\_norm}_{BM25,k_2}$
Additional length normalisation:
$\text{len\_norm}_{BM25,k_2}(d,q,c) := k_2 \cdot ql \cdot \frac{avgdl(c) - dl}{avgdl(c) + dl}$
(ql is the query length). Suppresses long documents (negative length norm).
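A hedged Python sketch of BM25 scoring without relevance information (names and toy data are our own; the RSJ weight degenerates to a smoothed IDF, and the optional $k_2$ length norm and query-TF saturation are omitted):

    import math
    from collections import Counter

    def bm25_score(doc_terms, query_terms, df, N_D, avgdl, k1=1.2, b=0.75):
        tf = Counter(doc_terms)
        dl = len(doc_terms)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            # RSJ weight without relevance information: a smoothed IDF
            # (can be negative when t occurs in more than half the documents)
            idf = math.log((N_D - df[t] + 0.5) / (df[t] + 0.5))
            K = k1 * (b * (dl / avgdl) + (1 - b))  # pivoted document length in K
            score += idf * tf[t] / (tf[t] + K)     # TF_BM25 * w_RSJ
        return score

    docs = {"d1": "sailing boats sailing".split(),
            "d2": "sailing east".split(),
            "d3": "boats and boats".split()}
    df = Counter(t for terms in docs.values() for t in set(terms))
    avgdl = sum(len(ts) for ts in docs.values()) / len(docs)
    for d, ts in docs.items():
        print(d, round(bm25_score(ts, ["sailing", "boats"], df, len(docs), avgdl), 3))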
BM25: Summary
We have discussed the foundations of BM25, an instance of the probability of relevance framework (PRF).
TF quantification $TF_{BM25}$ using $TF_{frac}$ and the pivoted document length
$TF_{BM25}$ can be related to probability theory through semi-subsumed event occurrences (out of scope of this talk)
RSJ term weight $w_{RSJ}$ as a smooth variant of the BIR term weight (the 0.5-smoothing can be explained through Laplace's law of succession)
Document- and Query-Likelihood Models
We are now leaving the world of document-likelihood models and move towards LM, the query-likelihood model.
$\frac{P(d,q|r)}{P(d,q|\bar{r})} = \frac{P(d|q,r) \cdot P(q|r)}{P(d|q,\bar{r}) \cdot P(q|\bar{r})} \quad (\Rightarrow \text{BIR/Poisson/BM25})$
$= \frac{P(q|d,r) \cdot P(d|r)}{P(q|d,\bar{r}) \cdot P(d|\bar{r})} \quad (\Rightarrow \text{LM?})$
Foundations of IR Models 
LM: Language Modelling
Language Modelling (LM)
Popular retrieval model since the late 90s [Ponte and Croft, 1998, Hiemstra, 2000, Croft and Lafferty, 2003]
Compute the probability $P(q|d)$ that a document generates a query
$q$ is a conjunction of term events: $P(q|d) \propto \prod_t P(t|d)$
Zero-probability problem: terms $t$ that don't appear in $d$ lead to $P(t|d) = 0$ and hence $P(q|d) = 0$
Mix the within-document term probability $P(t|d)$ and the collection-wide term probability $P(t|c)$
Probability Mixture
Let three events $x, y, z$, and two conditional probabilities $P(z|x)$ and $P(z|y)$ be given.
Then, $P(z|x,y)$ can be estimated as a linear combination/mixture of $P(z|x)$ and $P(z|y)$:
$P(z|x,y) \approx \lambda_x \cdot P(z|x) + (1 - \lambda_x) \cdot P(z|y)$
Here, $0 \le \lambda_x \le 1$ is the mixture parameter.
The mixture parameters can be constant (Jelinek-Mercer mixture), or can be set proportional to the total probabilities.
Probability Mixture Example
Let $P(\text{sunny}, \text{warm}, \text{rainy}, \text{dry}, \text{windy}|\text{glasgow})$ describe the probability that a day in Glasgow is sunny, the next day is warm, the next rainy, and so forth
If for one event (e.g. sunny) the probability were zero, then the probability of the conjunction (product) would be zero. A mixture solves the problem.
For example, mix $P(x|\text{glasgow})$ with $P(x|\text{uk})$ where $P(x|\text{uk}) > 0$ for each event $x$.
Then, in a week in winter, when $P(\text{sunny}|\text{glasgow}) = 0$, and for the whole of the UK the weather office reports 2 of 7 days as sunny, the mixed probability is:
$P(\text{sunny}|\text{glasgow},\text{uk}) = \lambda \cdot \frac{0}{7} + (1 - \lambda) \cdot \frac{2}{7}$
LM1 Term Weight
Event Space: $d$ and $q$ are sequences of terms.
Definition (LM1 term weight $w_{LM1}$):
$P(t|d,c) := \lambda_d \cdot P(t|d) + (1 - \lambda_d) \cdot P(t|c)$
$w_{LM1,\lambda_d}(t,d,q,c) := TF(t,q) \cdot \log\underbrace{(\lambda_d \cdot P(t|d) + (1 - \lambda_d) \cdot P(t|c))}_{= P(t|d,c)}$
$P(t|d)$: foreground probability
$P(t|c)$: background probability
Language Modelling Independence Assumption
We assume terms are independent, meaning the product over the term probabilities $P(t|d)$ is equal to $P(q|d)$.
Probability that the mixture of background and foreground probabilities generates the query as a sequence of terms: $P(q|d,c) = \prod_{t \text{ IN } q} P(t|d,c) = \prod_{t \in q} P(t|d,c)^{TF(t,q)}$
$t$ IN $q$: sequence of terms (e.g. $q$ = (sailing, boat, sailing))
$t \in q$: set of terms; $TF(\text{sailing}, q) = 2$
Note: $\log\left(P(t|d,c)^{TF(t,q)}\right) = TF(t,q) \cdot \log(P(t|d,c))$
LM1 RSV
Definition (LM1 retrieval status value $RSV_{LM1}$):
$RSV_{LM1,\lambda_d}(d,q,c) := \sum_{t \in q} w_{LM1,\lambda_d}(t,d,q,c)$
JM-LM Term Weight
For constant $\lambda$, the score can be divided by $\prod_{t \text{ IN } q}(1 - \lambda)$. This leads to the following equation [Hiemstra, 2000]:
$\frac{P(q|d,c)}{P(q|c) \cdot \prod_{t \text{ IN } q}(1 - \lambda)} = \prod_{t \in d \cap q}\left(1 + \frac{\lambda}{1 - \lambda} \cdot \frac{P(t|d)}{P(t|c)}\right)^{TF(t,q)}$
Definition (JM-LM (Jelinek-Mercer) term weight $w_{JM\text{-}LM}$):
$w_{JM\text{-}LM,\lambda}(t,d,q,c) := TF(t,q) \cdot \log\left(1 + \frac{\lambda}{1 - \lambda} \cdot \frac{P(t|d)}{P(t|c)}\right)$
JM-LM RSV
Definition (JM-LM retrieval status value $RSV_{JM\text{-}LM}$):
$RSV_{JM\text{-}LM,\lambda}(d,q,c) := \sum_{t \in d \cap q} w_{JM\text{-}LM,\lambda}(t,d,q,c)$
$RSV_{JM\text{-}LM,\lambda}(d,q,c) = \sum_{t \in d \cap q} TF(t,q) \cdot \log\left(1 + \frac{\lambda}{1 - \lambda} \cdot \frac{P(t|d)}{P(t|c)}\right)$
We only need to look at terms that appear in both document and query.
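A minimal Python sketch of JM-LM scoring (names and toy data are our own; $P(t|d)$ and $P(t|c)$ are the location-based probabilities from the notation section):

    import math
    from collections import Counter

    docs = {"d1": "sailing boats sailing".split(),
            "d2": "sailing east".split(),
            "d3": "boats and boats".split()}
    coll_tf = Counter(t for terms in docs.values() for t in terms)  # n_L(t,c)
    N_L_c = sum(coll_tf.values())                                   # N_L(c)

    def rsv_jm(doc_terms, query_terms, lam=0.8):
        tf_d = Counter(doc_terms)
        score = 0.0
        for t, tf_q in Counter(query_terms).items():
            if t not in tf_d:
                continue  # terms outside d contribute log(1) = 0
            p_td = tf_d[t] / len(doc_terms)  # P(t|d)
            p_tc = coll_tf[t] / N_L_c        # P(t|c)
            score += tf_q * math.log(1 + lam / (1 - lam) * p_td / p_tc)
        return score

    for d, ts in docs.items():
        print(d, round(rsv_jm(ts, ["sailing", "boats"]), 3))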
Dirichlet-LM Term Weight
Document-dependent mixture parameter: $\lambda_d = \frac{dl}{dl + \mu}$
Definition (Dirich-LM term weight $w_{Dirich\text{-}LM}$):
$w_{Dirich\text{-}LM,\mu}(t,d,q,c) := TF(t,q) \cdot \log\left(\frac{\mu}{\mu + |d|} + \frac{|d|}{|d| + \mu} \cdot \frac{P(t|d)}{P(t|c)}\right)$
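The same idea with a Dirichlet mixture instead of a constant $\lambda$ (our own sketch; $\mu = 2000$ is a commonly used default and an assumption here, not from the slides):

    import math

    def w_dirich(tf_d_t, dl, p_tc, tf_q=1, mu=2000):
        # TF(t,q) * log( mu/(mu+dl) + dl/(dl+mu) * P(t|d)/P(t|c) )
        p_td = tf_d_t / dl
        return tf_q * math.log(mu / (mu + dl) + dl / (dl + mu) * p_td / p_tc)

    # A term with tf_d = 3 in a 100-term document, P(t|c) = 0.001:
    print(round(w_dirich(3, 100, 0.001), 3))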
Dirich-LM RSV
Definition (Dirich-LM retrieval status value $RSV_{Dirich\text{-}LM}$):
$RSV_{Dirich\text{-}LM,\mu}(d,q,c) := \sum_{t \in q} w_{Dirich\text{-}LM,\mu}(t,d,q,c)$
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 

Information Retrieval Models Part I

  • 13. IR Models Introduction Warming Up Example Query. Query: "side effects of drugs on memory and cognitive abilities".

    t_i                 query q   d1    d2    d3    d4
    side effect            2       1    0.5    1     1
    drug                   2       1    1      1     1
    memory                 1       1    0      1     0
    cognitive ability      1       0    1      1    0.5
    RSV                            5    4      6    4.5

  Produces the ranking d3, d1, d4, d2. 13 / 133
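A minimal Python sketch (not part of the slides) that reproduces this scalar-product computation; the dictionaries simply encode the example table above:

```python
# Scalar-product RSV for the example query; weights taken from the table.
query = {"side effect": 2, "drug": 2, "memory": 1, "cognitive ability": 1}

docs = {
    "d1": {"side effect": 1, "drug": 1, "memory": 1, "cognitive ability": 0},
    "d2": {"side effect": 0.5, "drug": 1, "memory": 0, "cognitive ability": 1},
    "d3": {"side effect": 1, "drug": 1, "memory": 1, "cognitive ability": 1},
    "d4": {"side effect": 1, "drug": 1, "memory": 0, "cognitive ability": 0.5},
}

def rsv(doc, q):
    # RSV(d, q) = sum_i d_i * q_i (scalar product)
    return sum(w * q.get(t, 0) for t, w in doc.items())

ranking = sorted(docs, key=lambda d: rsv(docs[d], query), reverse=True)
print({d: rsv(docs[d], query) for d in docs})  # d1: 5, d2: 4.0, d3: 6, d4: 4.5
print(ranking)                                 # ['d3', 'd1', 'd4', 'd2']
```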
  • 14. IR Models Introduction Warming Up Term weights: Example Text. "In his address to the CBI, Mr Cameron is expected to say: Scotland does twice as much trade with the rest of the UK than with the rest of the world put together – trade that helps to support one million Scottish jobs. Meanwhile, Mr Salmond has set out six job-creating powers for Scotland that he said were guaranteed with a Yes vote in the referendum. During their televised BBC debate on Monday, Mr Salmond had challenged Better Together head Alistair Darling to name three job-creating powers that were being offered to the Scottish Parliament by the pro-UK parties in the event of a No vote." Source: http://www.bbc.co.uk/news/uk-scotland-scotland-politics-28952197. What are good descriptors for the text? Which are more, which are less important? Which are informative? Which are good discriminators? How can a machine answer these questions? 14 / 133
  • 15. IR Models Introduction Warming Up Frequencies. The answer is counting. Different assumptions: The more frequently a term appears in a document, the more suitable it is to describe its content. Location-based count: think of term positions or locations; in how many locations of a text do we observe the term? The term 'scotland' appears in 2 out of 138 locations in the example text. The fewer documents a term occurs in, the more discriminative or informative it is. Document-based count: in how many documents do we observe the term? Think of stop-words like 'the', 'a' etc. Location- and document-based frequencies are the building blocks of all (probabilistic) models to come. 15 / 133
  • 17. IR Models Introduction Background: Time-Line of IR Models Timeline of IR Models: 50s, 60s and 70s Zipf and Luhn: distribution of document frequencies; [Croft and Harper, 1979]: BIR without relevance; [Robertson and Sparck-Jones, 1976]: BIR; [Salton, 1971, Salton et al., 1975]: VSM, TF-IDF; [Rocchio, 1971]: Relevance feedback; [Maron and Kuhns, 1960]: On Relevance, Probabilistic Indexing, and IR 17 / 133
  • 18. IR Models Introduction Background: Time-Line of IR Models Timeline of IR Models: 80s. [Cooper, 1988, Cooper, 1991, Cooper, 1994]: Beyond Boole, Probability Theory in IR: An Encumbrance; [Dumais et al., 1988, Deerwester et al., 1990]: Latent semantic indexing; [van Rijsbergen, 1986, van Rijsbergen, 1989]: P(d → q); [Bookstein, 1980, Salton et al., 1983]: Fuzzy, extended Boolean. 18 / 133
  • 19. IR Models Introduction Background: Time-Line of IR Models Timeline of IR Models: 90s. [Ponte and Croft, 1998]: LM; [Brin and Page, 1998, Kleinberg, 1999]: Pagerank and Hits; [Robertson et al., 1994, Singhal et al., 1996]: Pivoted Document Length Normalisation; [Wong and Yao, 1995]: P(d → q); [Robertson and Walker, 1994, Robertson et al., 1995]: 2-Poisson, BM25; [Margulis, 1992, Church and Gale, 1995]: Poisson; [Fuhr, 1992]: Probabilistic Models in IR; [Turtle and Croft, 1990, Turtle and Croft, 1991]: PIN's; [Fuhr, 1989]: Models for Probabilistic Indexing. 19 / 133
  • 20. IR Models Introduction Background: Time-Line of IR Models Timeline of IR Models: 00s. ICTIR 2009 and ICTIR 2011; [Roelleke and Wang, 2008]: TF-IDF Uncovered; [Luk, 2008, Robertson, 2005]: Event Spaces; [Roelleke and Wang, 2006]: Parallel Derivation of Models; [Fang and Zhai, 2005]: Axiomatic approach; [He and Ounis, 2005]: TF in BM25 and DFR; [Metzler and Croft, 2004]: LM and PIN's; [Robertson, 2004]: Understanding IDF; [Sparck-Jones et al., 2003]: LM and Relevance; [Croft and Lafferty, 2003, Lafferty and Zhai, 2003]: LM book; [Zaragoza et al., 2003]: Bayesian extension to LM; [Bruza and Song, 2003]: probabilistic dependencies in LM; [Amati and van Rijsbergen, 2002]: DFR; [Lavrenko and Croft, 2001]: Relevance-based LM; [Hiemstra, 2000]: TF-IDF and LM; [Sparck-Jones et al., 2000]: probabilistic model: status. 20 / 133
  • 21. IR Models Introduction Background: Time-Line of IR Models Timeline of IR Models: 2010 and Beyond Models for interactive and dynamic IR (e.g. iPRP [Fuhr, 2008]) Quantum models [van Rijsbergen, 2004, Piwowarski et al., 2010] 21 / 133
  • 23. IR Models Introduction Notation Notation A tedious start ... but a must-have. Sets Locations Documents Terms Probabilities 23 / 133
  • 24. IR Models Introduction Notation Notation: Sets. t, d, q, c, r: term t, document d, query q, collection c, relevant r. D_c = {d_1, ...}: set of Documents in collection c; D_r: relevant documents. T_c = {t_1, ...}: set of Terms in collection c; T_r: terms that occur in relevant documents. L_c = {l_1, ...}: set of Locations in collection c; L_r: locations in relevant documents. 24 / 133
  • 25. IR Models Introduction Notation Notation: Locations. n_L(t, d): number of Locations at which term t occurs in document d (traditional notation: tf, tf_d). N_L(d): number of Locations in document d, i.e. the document length (traditional: dl). n_L(t, q): number of Locations at which term t occurs in query q (traditional: qtf, tf_q). N_L(q): number of Locations in query q, i.e. the query length (traditional: ql). 25 / 133
  • 26. IR Models Introduction Notation Notation: Locations. n_L(t, c): number of Locations at which term t occurs in collection c (traditional: TF, cf(t)). N_L(c): number of Locations in collection c. n_L(t, r): number of Locations at which term t occurs in the set L_r. N_L(r): number of Locations in the set L_r. 26 / 133
  • 27. IR Models Introduction Notation Notation: Documents. n_D(t, c): number of Documents in which term t occurs in the set D_c of collection c (traditional: n_t, df(t)). N_D(c): number of Documents in the set D_c of collection c (traditional: N). n_D(t, r): number of Documents in which term t occurs in the set D_r of relevant documents (traditional: r_t). N_D(r): number of Documents in the set D_r of relevant documents (traditional: R). 27 / 133
  • 28. IR Models Introduction Notation Notation: Terms. n_T(d, c): number of Terms in document d in collection c. N_T(c): number of Terms in collection c. 28 / 133
  • 29. IR Models Introduction Notation Notation: Average and Pivoted Length. Let u denote a collection associated with a set of documents. For example: u = c, or u = r, or u = r̄. avgdl(u): average document length, avgdl(u) = N_L(u)/N_D(u) (avgdl if the collection is implicit). pivdl(d, u): pivoted document length, pivdl(d, u) = N_L(d)/avgdl(u) = dl/avgdl(u) (pivdl(d) if the collection is implicit). λ(t, u): average term frequency over all documents in D_u, n_L(t, u)/N_D(u). avgtf(t, u): average term frequency over elite documents in D_u, n_L(t, u)/n_D(t, u). 29 / 133
  • 30. IR Models Introduction Notation Notation: Location-based Probabilities. P_L(t|d) := n_L(t, d)/N_L(d): Location-based within-document term probability (traditional: P(t|d) = tf_d/|d|, |d| = dl = N_L(d)). P_L(t|q) := n_L(t, q)/N_L(q): Location-based within-query term probability (traditional: P(t|q) = tf_q/|q|, |q| = ql = N_L(q)). P_L(t|c) := n_L(t, c)/N_L(c): Location-based within-collection term probability (traditional: P(t|c) = tf_c/|c|, |c| = N_L(c)). P_L(t|r) := n_L(t, r)/N_L(r): Location-based within-relevance term probability. Event space P_L: Locations (LM, TF). 30 / 133
  • 31. IR Models Introduction Notation Notation: Document-based Probabilities. P_D(t|c) := n_D(t, c)/N_D(c): Document-based within-collection term probability (traditional: P(t) = n_t/N, N = N_D(c)). P_D(t|r) := n_D(t, r)/N_D(r): Document-based within-relevance term probability (traditional: P(t|r) = r_t/R, R = N_D(r)). P_T(d|c) := n_T(d, c)/N_T(c): Term-based document probability. P_avg(t|c) := avgtf(t, c)/avgdl(c): probability that t occurs in a document with average length. Event space P_D: Documents (BIR, IDF). 31 / 133
  • 32. IR Models Introduction Notation Toy Example.
    N_L(c) = 20; N_D(c) = 10; avgdl(c) = 20/10 = 2.
                      doc1   doc2   doc3
    N_L(d)              2      3      3
    pivdl(d, c)        2/2    3/2    3/2
                    sailing  boats
    n_L(t, c)           8      6
    n_D(t, c)           6      5
    P_L(t|c)           8/20   6/20
    P_D(t|c)           6/10   5/10
    λ(t, c)            8/10   6/10
    avgtf(t, c)        8/6    6/5
    32 / 133
  • 33. Foundations of IR Models TF-IDF PRF: The Probability of Relevance Framework BIR: Binary Independence Retrieval Poisson and 2-Poisson BM25 LM: Language Modelling PIN's: Probabilistic Inference Networks Relevance-based Models Foundations: Summary
  • 34. Foundations of IR Models TF-IDF
  • 35. IR Models Foundations of IR Models TF-IDF TF-IDF. Still a very popular model. Best known outside IR research, very intuitive. TF-IDF is not a model; it is just a weighting scheme in the vector space model. TF-IDF is purely heuristic; it has no probabilistic roots. But: TF-IDF and LM are dual models that can be shown to be derived from the same root. A simplified version of BM25. 35 / 133
  • 37. IR Models Foundations of IR Models TF-IDF TF Variants: TF(t, d).
    TF_total(t, d) := lf_total(t, d) := n_L(t, d) (= tf_d)
    TF_sum(t, d) := lf_sum(t, d) := n_L(t, d) / N_L(d) = P_L(t|d) = tf_d / dl
    TF_max(t, d) := lf_max(t, d) := n_L(t, d) / n_L(t_max, d)
    TF_log(t, d) := lf_log(t, d) := log(1 + n_L(t, d)) (= log(1 + tf_d))
    TF_frac,K(t, d) := lf_frac,K(t, d) := n_L(t, d) / (n_L(t, d) + K_d) = tf_d / (tf_d + K_d)
    TF_BM25,k1,b(t, d) := n_L(t, d) / (n_L(t, d) + k1 · (b · pivdl(d, c) + (1 - b)))
    36 / 133
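These variants translate directly into code. A hedged sketch in Python; tf_d, dl, K_d and pivdl mirror the definitions above, while the defaults k1 = 1.2 and b = 0.75 are common settings, not values from the slides:

```python
import math

# TF variants; tf_d = n_L(t,d), dl = N_L(d), tf_max_d = n_L(t_max, d),
# K_d as in TF_frac, pivdl = dl / avgdl.
def tf_total(tf_d):
    return tf_d

def tf_sum(tf_d, dl):
    return tf_d / dl                     # = P_L(t|d)

def tf_max(tf_d, tf_max_d):
    return tf_d / tf_max_d

def tf_log(tf_d):
    return math.log(1 + tf_d)

def tf_frac(tf_d, K_d):
    return tf_d / (tf_d + K_d)

def tf_bm25(tf_d, pivdl, k1=1.2, b=0.75):    # k1, b: assumed common defaults
    return tf_d / (tf_d + k1 * (b * pivdl + (1 - b)))
```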
  • 38. IR Models Foundations of IR Models TF-IDF TF Variants: Collection-wide TF(t, c). Analogously to TF(t, d), one defines the variants of TF(t, c), the collection-wide term frequency. Not considered any further here. 37 / 133
  • 41. IR Models Foundations of IR Models TF-IDF TFtotal and TFlog. [Plots: TF_total(t, d) and TF_log(t, d) against n_L(t, d).] Bias towards documents with many terms (e.g. books vs. Twitter tweets). TF_total: too steep; assumes all occurrences are independent (same impact). TF_log: less impact for subsequent occurrences. The base of the logarithm is ranking invariant, since it is a constant: TF_log,base(t, d) := ln(1 + tf_d) / ln(base). 38 / 133
  • 42. IR Models Foundations of IR Models TF-IDF Logarithmic TF: Dependence Assumption. The logarithmic TF assigns less impact to subsequent occurrences than the total TF does. This aspect becomes clear when reconsidering that the logarithm is an approximation of the harmonic sum: TF_log(t, d) = ln(1 + tf_d) ≈ 1 + 1/2 + ... + 1/tf_d, tf_d ≥ 0. Note: ln(n + 1) = ∫_1^{n+1} (1/x) dx. Whereas: TF_total(t, d) = 1 + 1 + ... + 1. The first occurrence of a term counts in full, the second counts 1/2, the third counts 1/3, and so forth. This gives a particular insight into the type of dependence that is reflected by bending the total TF into a saturating curve. 39 / 133
  • 44. IR Models Foundations of IR Models TF-IDF TFsum, TFmax and TFfrac: Graphical Illustration. [Plots against n_L(t, d): TF_sum for N_L(d) = 200 and 2000; TF_max for n_L(t_max, d) = 400 and 500; TF_frac for K = 1, 5, 10, 20, 100, 200, 1000.] 40 / 133
  • 45. IR Models Foundations of IR Models TF-IDF TFsum, TFmax and TFfrac: Analysis. Document length normalisation (TF_frac: K may depend on the document length). Usually TF_max yields higher TF-values than TF_sum. Linear TF variants are not really important anymore, since TF_frac (TF_BM25) delivers better and more stable quality. TF_frac yields relatively high TF-values already for small frequencies, and the curve saturates for large frequencies. The good and stable performance of BM25 indicates that this non-linear nature is key for achieving good retrieval quality. 41 / 133
  • 46. IR Models Foundations of IR Models TF-IDF Fractional TF: Dependence Assumption. What we refer to as fractional TF is a ratio: ratio(x, y) = x/(x + y); TF_frac,K(t, d) = tf_d/(tf_d + K_d). The ratio is related to the harmonic sum of squares: n/(n + 1) = Σ_{k=1}^{n} 1/(k(k + 1)), and 1/(k(k + 1)) behaves like 1/k². This is based on the following integral: ∫_1^{n+1} (1/z²) dz = [-1/z]_1^{n+1} = 1 - 1/(n + 1) = n/(n + 1). TF_frac assumes more dependence than TF_log: the k-th occurrence of a term has an impact of approximately 1/k². 42 / 133
  • 47. IR Models Foundations of IR Models TF-IDF TFBM25. TF_BM25 is a special TF_frac used in the BM25 model. K is proportional to the pivoted document length (pivdl(d, c) = dl/avgdl(c)) and involves adjustment parameters (k1, b). The common definition is: K_BM25,k1,b(d, c) := k1 · (b · pivdl(d, c) + (1 - b)). For b = 1, K is equal to k1 for average documents, less than k1 for short documents and greater than k1 for long documents. Large b and k1 lead to a strong variation of K with a high impact on the retrieval score. Documents shorter than the average have an advantage over documents longer than the average. 43 / 133
  • 49. IR Models Foundations of IR Models TF-IDF Inverse Document Frequency IDF. The IDF (inverse document frequency) is the negative logarithm of the DF (document frequency). Idea: the fewer documents a term appears in, the more discriminative or 'informative' it is. 44 / 133
  • 50. IR Models Foundations of IR Models TF-IDF DF Variants. DF(t, c) is a quantification of the document frequency, df(t, c). The main variants are:
    df(t, c) := df_total(t, c) := n_D(t, c)
    df_sum(t, c) := n_D(t, c) / N_D(c) = P_D(t|c) = df(t, c) / N_D(c)
    df_sum,smooth(t, c) := (n_D(t, c) + 0.5) / (N_D(c) + 1)
    df_BIR(t, c) := n_D(t, c) / (N_D(c) - n_D(t, c))
    df_BIR,smooth(t, c) := (n_D(t, c) + 0.5) / (N_D(c) - n_D(t, c) + 0.5)
    45 / 133
  • 52. IR Models Foundations of IR Models TF-IDF IDF Variants. IDF(t, c) is the negative logarithm of a DF quantification. The main variants are:
    idf_total(t, c) := -log df_total(t, c)
    idf(t, c) := idf_sum(t, c) := -log df_sum(t, c) = -log P_D(t|c)
    idf_sum,smooth(t, c) := -log df_sum,smooth(t, c)
    idf_BIR(t, c) := -log df_BIR(t, c)
    idf_BIR,smooth(t, c) := -log df_BIR,smooth(t, c)
    IDF is high for rare terms and low for frequent terms. 46 / 133
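A small illustrative sketch of these DF/IDF variants, with n_t = n_D(t, c) and N = N_D(c) as in the notation slides:

```python
import math

def df_sum(n_t, N):
    return n_t / N                          # P_D(t|c)

def df_sum_smooth(n_t, N):
    return (n_t + 0.5) / (N + 1)

def df_bir(n_t, N):
    return n_t / (N - n_t)

def df_bir_smooth(n_t, N):
    return (n_t + 0.5) / (N - n_t + 0.5)

def idf(df_value):
    return -math.log(df_value)              # IDF = negative log of a DF variant

# Toy-example term 'boats': n_D(t,c) = 5, N_D(c) = 10
print(idf(df_sum(5, 10)))                   # ~0.693
```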
  • 54. IR Models Foundations of IR Models TF-IDF Burstiness. A term is bursty if it occurs often in the documents in which it occurs. Burstiness is measured by the average term frequency in the elite set: avgtf(t, c) = n_L(t, c) / n_D(t, c). Intuition (relevant vs. non-relevant documents): A good term is rare (not frequent, high IDF) and solitary (not bursty, low avgtf) across all documents (all non-relevant documents). Among relevant documents, a good term is frequent (low IDF, appears in many relevant documents) and bursty (high avgtf). 47 / 133
  • 55. IR Models Foundations of IR Models TF-IDF IDF and Burstiness. [Plot: idf(t, c) and avgtf(t, c) against P_D(t|c), with curves for λ(t, c) = 1, 2, 4 and the quadrants BURSTY FREQUENT, BURSTY RARE, SOLITUDE RARE, SOLITUDE FREQUENT.] Example: n_D(t1, c) = n_D(t2, c) = 1,000: same IDF. n_L(t1, d) = 1 for 1,000 documents, whereas n_L(t2, d) = 1 for 999 documents, and n_L(t2, d) = 1,001 for one document. n_L(t1, c) = 1,000 and n_L(t2, c) = 2,000. avgtf(t1, c) = 1 and avgtf(t2, c) = 2. λ(t, c) = avgtf(t, c) · P_D(t|c). 48 / 133
  • 56. IR Models Foundations of IR Models TF-IDF TF-IDF Term Weight. Definition (TF-IDF term weight w_TF-IDF): w_TF-IDF(t, d, q, c) := TF(t, d) · TF(t, q) · IDF(t, c). 49 / 133
  • 58. IR Models Foundations of IR Models TF-IDF TF-IDF RSV. Definition (TF-IDF retrieval status value RSV_TF-IDF): RSV_TF-IDF(d, q, c) := Σ_t w_TF-IDF(t, d, q, c) = Σ_t TF(t, d) · TF(t, q) · IDF(t, c). 50 / 133
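A possible end-to-end TF-IDF scorer, here pairing TF_log with idf_sum; this is one of several legitimate pairings of the variants above, and the function and variable names are illustrative:

```python
import math

def rsv_tfidf(doc_tf, query_tf, df, N):
    """doc_tf/query_tf: term -> n_L(t,d) resp. n_L(t,q); df: term -> n_D(t,c); N = N_D(c)."""
    score = 0.0
    for t, qtf in query_tf.items():
        if t in doc_tf and t in df:
            tf = math.log(1 + doc_tf[t])     # TF_log(t, d)
            idf_t = -math.log(df[t] / N)     # idf_sum(t, c)
            score += tf * qtf * idf_t        # w_TF-IDF(t, d, q, c)
    return score
```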
  • 60. IR Models Foundations of IR Models TF-IDF Probabilistic IDF: Probability of Being Informative. IDF so far is not probabilistic. In probabilistic scenarios, a normalised IDF value such as 0 ≤ idf(t, c)/maxidf(c) ≤ 1 can be useful. maxidf(c) := -log(1/N_D(c)) = log(N_D(c)) is the maximal value of idf(t, c) (when a term occurs in only 1 document). [Plot: maxidf against N_D(c).] The normalisation does not affect the ranking. 51 / 133
  • 61. IR Models Foundations of IR Models TF-IDF Probability that term t is informative. A probabilistic semantics of a max-normalised IDF can be achieved by introducing an informativeness-based probability, [Roelleke, 2003], as opposed to the usual notion of occurrence-based probability. We denote this probability as P(t informs|c), to contrast it with the usual P(t|c) := P(t occurs|c) = n_D(t, c)/N_D(c). 52 / 133
  • 62. Foundations of IR Models PRF: The Probability of Relevance Framework
  • 63. IR Models Foundations of IR Models PRF: The Probability of Relevance Framework PRF: The Probability of Relevance Framework. Relevance is at the core of any information retrieval model. With r denoting relevance, d a document and q a query, the probability that a document is relevant is P(r|d, q). 54 / 133
  • 64. IR Models Foundations of IR Models PRF: The Probability of Relevance Framework Probability Ranking Principle. [Robertson, 1977], The Probability Ranking Principle (PRP) in IR, describes the PRP as a framework to discuss formally "what is a good ranking?". [Robertson, 1977] quotes Cooper's formal statement of the PRP: "If a reference retrieval system's response to each request is a ranking of the documents in the collections in order of decreasing probability of usefulness to the user who submitted the request, ..., then the overall effectiveness of the system ... will be the best that is obtainable on the basis of that data." Formally, we can capture the principle as follows. Let A and B be rankings. Then, a ranking A is better than a ranking B if at every rank, the probability of satisfaction in A is higher than in B, i.e.: ∀ rank: P(satisfactory|rank, A) > P(satisfactory|rank, B). 55 / 133
  • 65. IR Models Foundations of IR Models PRF: The Probability of Relevance Framework PRF: Illustration. Example (Probability of relevance): Let users u1, ..., u4 have judged document-query pairs for query q1: document d1 was judged relevant by three of four users, d2 by one of four, and d3 by none of two. This yields P_U(r|d1, q1) = 3/4, P_U(r|d2, q1) = 1/4, and P_U(r|d3, q1) = 0/2. Subscript U in P_U: the event space, a set of users. Total probability: P_U(r|d, q) = Σ_{u∈U} P(r|d, q, u) · P(u). 56 / 133
  • 66. IR Models Foundations of IR Models PRF: The Probability of Relevance Framework Bayes Theorem. Relevance judgements can be incomplete. In any case, for a new query we often do not have judgements. The probability of relevance is estimated via Bayes' Theorem: P(r|d, q) = P(d, q|r) · P(r) / P(d, q) = P(d, q, r) / P(d, q). The decision whether or not to retrieve a document is based on the so-called Bayesian decision rule: retrieve document d if the probability of relevance is greater than the probability of non-relevance, i.e. retrieve d for q if P(r|d, q) > P(r̄|d, q). 57 / 133
  • 67. IR Models Foundations of IR Models PRF: The Probability of Relevance Framework Probabilistic Odds, Rank Equivalence. We can express the probabilistic odds of relevance: O(r|d, q) = P(r|d, q)/P(r̄|d, q) = P(d, q, r)/P(d, q, r̄) = [P(d, q|r)/P(d, q|r̄)] · [P(r)/P(r̄)]. Since P(r)/P(r̄) is a constant, the following rank equivalence holds: O(r|d, q) is rank-equivalent to P(d, q|r)/P(d, q|r̄). Often we don't need the exact probability, but an easier-to-compute value that is rank equivalent (i.e. it preserves the ranking). 58 / 133
  • 68. IR Models Foundations of IR Models PRF: The Probability of Relevance Framework Probabilistic Odds. The document-query pair probabilities can be decomposed in two ways: P(d, q|r)/P(d, q|r̄) = [P(d|q, r) · P(q|r)] / [P(d|q, r̄) · P(q|r̄)] (⇒ BIR/Poisson/BM25) = [P(q|d, r) · P(d|r)] / [P(q|d, r̄) · P(d|r̄)] (⇒ LM?). The equation where d depends on q is the basis of BIR, Poisson and BM25: the document likelihood P(d|q, r). The equation where q depends on d has been related to LM, P(q|d), [Lafferty and Zhai, 2003], Probabilistic Relevance Models Based on Document and Query Generation. This relationship, and the assumptions required to establish it, are controversial [Luk, 2008]. 59 / 133
  • 69. IR Models Foundations of IR Models PRF: The Probability of Relevance Framework Documents as Feature Vectors. The next step represents document d as a vector d⃗ = (f1, ..., fn) in a space of features f_i: P(d|q, r) = P(d⃗|q, r). A feature could be, for example, the frequency of a word (term), the document length, document creation time, time of last update, document owner, number of in-links, or number of out-links. See also the Vector Space Model in the Introduction; there we used term weights as features. Assumptions based on features: feature independence assumption; non-query term assumption; term frequency split. 60 / 133
  • 70. IR Models Foundations of IR Models PRF: The Probability of Relevance Framework Feature Independence Assumption. The features (terms) are independent events: P(d⃗|q, r) ≈ Π_i P(f_i|q, r). A weaker assumption for the fraction of feature probabilities (linked dependence): P(d|q, r)/P(d|q, r̄) ≈ Π_i P(f_i|q, r)/P(f_i|q, r̄). Here it is not required to distinguish between these two assumptions. P(f_i|q, r) and P(f_i|q, r̄) may be estimated for instance by means of relevance judgements. 61 / 133
  • 71. IR Models Foundations of IR Models PRF: The Probability of Relevance Framework Non-Query Term Assumption. Non-query terms can be ignored for retrieval (ranking). This reduces the number of features/terms/dimensions to consider when computing probabilities. For non-query terms, the feature probability is the same in relevant and non-relevant documents: for all non-query terms, P(f_i|q, r)/P(f_i|q, r̄) = 1. Then the product over all features reduces to a product over the query terms. 62 / 133
  • 72. IR Models Foundations of IR Models PRF: The Probability of Relevance Framework Term Frequency Split. The product over query terms is split into two parts: the first part captures the f_i > 0 features, i.e. the document terms; the second part captures the f_i = 0 features, i.e. the non-document terms. 63 / 133
  • 73. Foundations of IR Models BIR: Binary Independence Retrieval
  • 74. IR Models Foundations of IR Models BIR: Binary Independence Retrieval BIR: Binary Independence Retrieval. The BIR instantiation of the PRF assumes the vector components to be binary term features, i.e. d⃗ = (x1, x2, ..., xn), where x_i ∈ {0, 1}. Term occurrences are represented in a binary feature vector d⃗ in the term space. The event x_t = 1 is expressed as t, and x_t = 0 as t̄. 65 / 133
  • 75. IR Models Foundations of IR Models BIR: Binary Independence Retrieval BIR Term Weight. We save the derivation ... after a few steps from O(r|d, q) ... Event space: d and q are binary vectors. Definition (BIR term weight w_BIR): w_BIR(t, r, r̄) := log [ (P_D(t|r) / P_D(t|r̄)) · (P_D(t̄|r̄) / P_D(t̄|r)) ]. Simplified form (referred to as F1), considering term presence only: w_BIR,F1(t, r, c) := log ( P_D(t|r) / P_D(t|c) ). 66 / 133
  • 78. IR Models Foundations of IR Models BIR: Binary Independence Retrieval BIR RSV. Definition (BIR retrieval status value RSV_BIR): RSV_BIR(d, q, r, r̄) := Σ_{t∈d∩q} w_BIR(t, r, r̄) = Σ_{t∈d∩q} log [ (P(t|r) · P(t̄|r̄)) / (P(t|r̄) · P(t̄|r)) ]. 67 / 133
  • 80. IR Models Foundations of IR Models BIR: Binary Independence Retrieval BIR Estimations. Relevance judgements for 20 documents (example from [Fuhr, 1992]), with binary term features: x1 = 1 for documents 1-10, x2 = 1 for documents 1-5 and 11-15; each document is judged relevant (r) or non-relevant (r̄). N_D(r) = 12; N_D(r̄) = 8. P(t1|r) = P(x1 = 1|r) = 8/12 = 2/3; P(t1|r̄) = 2/8 = 1/4; P(t̄1|r) = 4/12 = 1/3; P(t̄1|r̄) = 6/8 = 3/4. Ditto for t2. 68 / 133
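Plugging these estimates into the BIR term weight gives a quick numerical check (an illustration, not slide material):

```python
import math

# w_BIR(t, r, rbar) = log[(P(t|r)/P(t|rbar)) * (P(tbar|rbar)/P(tbar|r))]
def w_bir(p_t_r, p_t_rbar):
    return math.log((p_t_r / p_t_rbar) * ((1 - p_t_rbar) / (1 - p_t_r)))

# t1: P(t1|r) = 2/3, P(t1|rbar) = 1/4  ->  log 6
print(w_bir(2/3, 1/4))   # ~1.79
```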
  • 81. IR Models Foundations of IR Models BIR: Binary Independence Retrieval Missing Relevance Information I. Estimation for non-relevant documents: Take the collection-wide term probability as approximation (r̄ ≈ c): P(t|r̄) ≈ P(t|c). Or use the set of all documents minus the set of relevant documents (r̄ = c \ r): P(t|r̄) ≈ (n_D(t, c) - n_D(t, r)) / (N_D(c) - N_D(r)). 69 / 133
  • 82. IR Models Foundations of IR Models BIR: Binary Independence Retrieval Missing Relevance Information II. Empty set problem: for N_D(r) = 0 (no relevant documents), P(t|r) is not defined. Smoothing deals with this situation. Add the query to the set of relevant documents and the collection: P(t|r) = (n_D(t, r) + 1) / (N_D(r) + 1). Other forms of smoothing, e.g. P(t|r) = (n_D(t, r) + 0.5) / (N_D(r) + 1). This variant may be justified by Laplace's law of succession. 70 / 133
  • 85. IR Models Foundations of IR Models BIR: Binary Independence Retrieval RSJ Term Weight. Recall w_BIR,F1(t, r, c) := log ( P_D(t|r) / P_D(t|c) ). Definition (RSJ term weight w_RSJ): The RSJ term weight is a smooth BIR term weight. The probability estimation is as follows: P(t|r) := (n_D(t, r) + 0.5) / (N_D(r) + 1); P(t|r̄) := ((n_D(t, c) + 1) - (n_D(t, r) + 0.5)) / ((N_D(c) + 2) - (N_D(r) + 1)). w_RSJ,F4(t, r, r̄, c) := log [ (n_D(t, r) + 0.5) / (N_D(r) - n_D(t, r) + 0.5) ] / [ (n_D(t, c) - n_D(t, r) + 0.5) / ((N_D(c) - N_D(r)) - (n_D(t, c) - n_D(t, r)) + 0.5) ]. 71 / 133
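The F4 weight translates directly into code. A sketch, with r_t = n_D(t, r), R = N_D(r), n_t = n_D(t, c), N = N_D(c):

```python
import math

def w_rsj_f4(n_t, N, r_t, R):
    relevant = (r_t + 0.5) / (R - r_t + 0.5)
    non_relevant = (n_t - r_t + 0.5) / ((N - R) - (n_t - r_t) + 0.5)
    return math.log(relevant / non_relevant)

# With no relevance information (r_t = R = 0) this reduces to a smooth IDF:
print(w_rsj_f4(n_t=5, N=100, r_t=0, R=0))   # ~2.85
```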
  • 87. Foundations of IR Models Poisson and 2-Poisson
  • 88. IR Models Foundations of IR Models Poisson and 2-Poisson Poisson Model I. A less known model; however, there are good reasons to look at it. Demystify the Poisson probability: some research students resign when hearing "Poisson" ;-). Next to the BIR model, it is the natural instantiation of a PRF model; the BIR model is a special case of the Poisson model. The 2-Poisson probability is arguably the foundation of the BM25-TF quantification, [Robertson and Walker, 1994], Some Simple Effective Approximations to the 2-Poisson Model. 73 / 133
  • 90. IR Models Foundations of IR Models Poisson and 2-Poisson Poisson Model II. The Poisson probability is a model of randomness; cf. divergence from randomness (DFR), which is based on the probability P(t ∈ d|collection) = P(tf_d > 0|collection). The probability P(tf_d|collection) can be estimated by a Poisson probability. The Poisson parameter λ(t, c) = n_L(t, c)/N_D(c), i.e. the average number of term occurrences, relates Document-based and Location-based probabilities: avgtf(t, c) · P_D(t|c) = λ(t, c) = avgdl(c) · P_L(t|c). We refer to this relationship as the Poisson Bridge since the average term frequency is the parameter of the Poisson probability. 74 / 133
  • 91. IR Models Foundations of IR Models Poisson and 2-Poisson Poisson Model III The Poisson model yields a foundation of TF-IDF (see Part II) The Poisson bridge helps to relate TF-IDF and LM (see Part II) 75 / 133
  • 92. IR Models Foundations of IR Models Poisson and 2-Poisson Poisson Distribution. Let t be an event that occurs on average λ_t times. The Poisson probability is: P_Poisson,λt(k) := (λ_t^k / k!) · e^(-λ_t). 76 / 133
  • 93. IR Models Foundations of IR Models Poisson and 2-Poisson Poisson Example. The probability that k = 4 sunny days occur in a week, given the average λ = p · n = (180/360) · 7 = 3.5 sunny days per week, is: P_Poisson,λ=3.5(k = 4) = (3.5^4 / 4!) · e^(-3.5) ≈ 0.1888. [Plot: P_3.5(k) for k = 0, ..., 15.] 77 / 133
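The Poisson probability in code, checking the sunny-days number (illustrative only):

```python
import math

def poisson(k, lam):
    # P_Poisson,lam(k) = lam^k / k! * e^(-lam)
    return lam ** k / math.factorial(k) * math.exp(-lam)

print(poisson(4, 3.5))   # ~0.1888, as on the slide
```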
  • 94. IR Models Foundations of IR Models Poisson and 2-Poisson Poisson PRF: Basic Idea. The Poisson model could be used to estimate the document probability P(d|q, r) as the product of the probabilities of the within-document term frequencies k_t: P(d|q, r) = P(d⃗|q, r) = Π_t P(k_t|q, r), with k_t := tf_d := n_L(t, d). d⃗ is a vector of term frequencies (cf. BIR: binary vector of occurrence/non-occurrence). 78 / 133
  • 95. IR Models Foundations of IR Models Poisson and 2-Poisson Poisson PRF: Odds. Rank equivalence, the non-query term assumption and probabilistic odds lead us to: O(r|d, q) is rank-equivalent to Π_{t∈q} P(k_t|r)/P(k_t|r̄), k_t := tf_d := n_L(t, d). Splitting the product into document and non-document terms yields: O(r|d, q) is rank-equivalent to Π_{t∈d∩q} [P(k_t|r)/P(k_t|r̄)] · Π_{t∈q\d} [P(0|r)/P(0|r̄)]. 79 / 133
  • 96. IR Models Foundations of IR Models Poisson and 2-Poisson Poisson PRF: Meaning of the Poisson Probability. For the set r (or r̄) and a document d of length dl, we look at: How many times would we expect to see a term t in d? How many times do we actually observe t in d (= k_t)? P(k_t|r) is highest if our expectation is met by our observation. λ(t, d, r) = dl · P_L(t|r) is the number of expected occurrences in a document with length dl (how many times will we draw t if we had dl trials?). P(k_t|r) = P_Poisson,λ(t,d,r)(k_t): probability to observe k_t occurrences of term t in dl trials. 80 / 133
  • 97. IR Models Foundations of IR Models Poisson and 2-Poisson Poisson PRF: Example. dl = 5; P_L(t|r) = 5/10 = 1/2; λ(t, d, r) = 5 · 1/2 = 2.5; P(2|r) = P_Poisson,2.5(2) ≈ 0.2565. [Plot: P_2.5(k) for k = 0, ..., 10.] 81 / 133
  • 98. IR Models Foundations of IR Models Poisson and 2-Poisson Poisson Term Weight. Event space: d and q are frequency vectors. Definition (Poisson term weight w_Poisson): w_Poisson(t, d, q, r, r̄) := TF(t, d) · log ( λ(t, d, r) / λ(t, d, r̄) ) = TF(t, d) · log ( P_L(t|r) / P_L(t|r̄) ), with λ(t, d, r) = dl · P_L(t|r) and TF(t, d) = k_t (smarter TF quantifications are possible). 82 / 133
  • 101. IR Models Foundations of IR Models Poisson and 2-Poisson Poisson RSV. Definition (Poisson retrieval status value RSV_Poisson): RSV_Poisson(d, q, r, r̄) := [ Σ_{t∈d∩q} w_Poisson(t, d, q, r, r̄) ] + len_norm_Poisson = [ Σ_{t∈d∩q} TF(t, d) · log ( P_L(t|r) / P_L(t|r̄) ) ] + dl · Σ_t ( P_L(t|r̄) - P_L(t|r) ). 83 / 133
  • 103. IR Models Foundations of IR Models Poisson and 2-Poisson 2-Poisson. Viewed as a motivation for the BM25-TF quantification, [Robertson and Walker, 1994], Some Simple Effective Approximations to the 2-Poisson Model. In an exchange with Stephen Robertson, he explained: the investigation into the 2-Poisson probability motivated the BM25-TF quantification. Regarding the combination of TF and RSJ weight in BM25, TF can be viewed as a factor to reflect the uncertainty about whether the RSJ weight w_RSJ is correct; for terms with a relatively high within-document TF, the weight is correct; for terms with a relatively low within-document TF, there is uncertainty about the correctness. In other words, the TF factor can be viewed as a weight to adjust the impact of the RSJ weight. 84 / 133
  • 106. IR Models Foundations of IR Models Poisson and 2-Poisson 2-Poisson Example: How many cars to expect? I. How many cars are expected on a given commuter car park? Approach 1: On average, there are 700 cars per week. The daily average is λ = 700/7 = 100 cars/day. Then, P_λ=100(k) is the probability that there are k cars wanting to park on a given day. This estimation is less accurate than an estimation based on a 2-dimensional model: Mon-Fri are the busy days, and on week-ends the car park is nearly empty. This means that a distribution such as (130, 130, 130, 130, 130, 25, 25) is more likely than 100 each day. 85 / 133
  • 107. IR Models Foundations of IR Models Poisson and 2-Poisson 2-Poisson Example: How many cars to expect? II. Approach 2: In a more detailed analysis, we observe 650 cars Mon-Fri (work days) and 50 cars Sat-Sun (week-end days). The averages are: λ1 = 650/5 = 130 cars/work-day, λ2 = 50/2 = 25 cars/week-end-day. Then, P_{π1=5/7, λ1=130, π2=2/7, λ2=25}(k) is the 2-dimensional Poisson probability that there are k cars looking for a car park. 86 / 133
  • 108. IR Models Foundations of IR Models Poisson and 2-Poisson 2-Poisson Example: How many cars to expect? III. Main idea of the 2-Poisson probability: combine (interpolate, mix) two Poisson probabilities: P_2-Poisson,λ1,λ2,π(k_t) := π · (λ1^{k_t}/k_t!) · e^(-λ1) + (1 - π) · (λ2^{k_t}/k_t!) · e^(-λ2). Such a mixture model is also used with LM (later). λ1 could be over all documents whereas λ2 could be over an elite set (e.g. documents that contain at least one query term). 87 / 133
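A sketch of the 2-Poisson mixture with the car-park numbers (π = 5/7 work days with λ1 = 130; 2/7 week-end days with λ2 = 25):

```python
import math

def poisson(k, lam):
    return lam ** k / math.factorial(k) * math.exp(-lam)

def two_poisson(k, lam1, lam2, pi):
    # pi * P_lam1(k) + (1 - pi) * P_lam2(k)
    return pi * poisson(k, lam1) + (1 - pi) * poisson(k, lam2)

print(two_poisson(130, 130, 25, 5/7))   # a typical work-day count
print(two_poisson(25, 130, 25, 5/7))    # a typical week-end count
```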
  • 109. Foundations of IR Models BM25
  • 110. IR Models Foundations of IR Models BM25 BM25. One of the most prominent IR models. The ingredients have already been prepared: TF_BM25,K(t, d) and w_RSJ,F4(t, r, r̄, c). Wikipedia 2014 formulation: Given a query Q containing keywords q1, ..., qn: score(D, Q) = Σ_{i=1}^{n} IDF(q_i) · [ f(q_i, D) · (k1 + 1) ] / [ f(q_i, D) + k1 · (1 - b + b · |D|/avgdl) ], with f(q_i, D) = n_L(q_i, D) (q_i's term frequency). Here: ignore (k1 + 1) (ranking invariant!) and use w_RSJ,F4(t, r, r̄, c). If relevance information is missing: w_RSJ,F4(t, r, r̄, c) ≈ IDF(t). 89 / 133
  • 111. IR Models Foundations of IR Models BM25 BM25 Term Weight. Definition (BM25 term weight w_BM25): w_BM25,k1,b,k3(t, d, q, r, r̄) := TF_BM25,k1,b(t, d) · TF_BM25,k3(t, q) · w_RSJ(t, r, r̄), with TF_BM25,k1,b(t, d) := tf_d / (tf_d + k1 · (b · pivdl(d) + (1 - b))). 90 / 133
  • 113. IR Models Foundations of IR Models BM25 BM25 RSV. Definition (BM25 retrieval status value RSV_BM25): RSV_BM25,k1,b,k2,k3(d, q, r, r̄, c) := [ Σ_{t∈d∩q} w_BM25,k1,b,k3(t, d, q, r, r̄) ] + len_norm_BM25,k2. Additional length normalisation: len_norm_BM25,k2(d, q, c) := k2 · ql · (avgdl(c) - dl) / (avgdl(c) + dl) (ql is the query length). Suppresses long documents (negative length norm). 91 / 133
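A compact BM25 sketch for the missing-relevance case, where w_RSJ is approximated by a smooth IDF; the defaults k1 = 1.2 and b = 0.75 are common settings, not prescribed by the slides:

```python
import math

def bm25(doc_tf, query_terms, df, N, dl, avgdl, k1=1.2, b=0.75):
    """doc_tf: term -> n_L(t,d); df: term -> n_D(t,c); N = N_D(c)."""
    pivdl = dl / avgdl
    score = 0.0
    for t in query_terms:
        if t in doc_tf and t in df:
            tf = doc_tf[t] / (doc_tf[t] + k1 * (b * pivdl + (1 - b)))
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5))  # smooth IDF ~ w_RSJ,F4
            score += tf * idf
    return score
```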
  • 115. IR Models Foundations of IR Models BM25 BM25: Summary. We have discussed the foundations of BM25, an instance of the probability of relevance framework (PRF). TF quantification TF_BM25 using TF_frac and the pivoted document length. TF_BM25 can be related to probability theory through semi-subsumed event occurrences (out of scope of this talk). RSJ term weight w_RSJ as smooth variant of the BIR term weight (the 0.5-smoothing can be explained through Laplace's law of succession). 92 / 133
  • 117. IR Models Foundations of IR Models BM25 Document- and Query-Likelihood Models. We are now leaving the world of document-likelihood models and move towards LM, the query-likelihood model. P(d, q|r)/P(d, q|r̄) = [P(d|q, r) · P(q|r)] / [P(d|q, r̄) · P(q|r̄)] (⇒ BIR/Poisson/BM25) = [P(q|d, r) · P(d|r)] / [P(q|d, r̄) · P(d|r̄)] (⇒ LM?). 93 / 133
  • 118. Foundations of IR Models LM: Language Modelling
  • 119. IR Models Foundations of IR Models LM: Language Modelling Language Modelling (LM). Popular retrieval model since the late 90s [Ponte and Croft, 1998, Hiemstra, 2000, Croft and Lafferty, 2003]. Compute the probability P(q|d) that a document generates a query. q is a conjunction of term events: P(q|d) ∝ Π_t P(t|d). Zero-probability problem: terms t that don't appear in d lead to P(t|d) = 0 and hence P(q|d) = 0. Mix the within-document term probability P(t|d) and the collection-wide term probability P(t|c). 95 / 133
  • 120. IR Models Foundations of IR Models LM: Language Modelling Probability Mixture. Let three events x, y, z, and two conditional probabilities P(z|x) and P(z|y) be given. Then, P(z|x, y) can be estimated as a linear combination/mixture of P(z|x) and P(z|y): P(z|x, y) ≈ λ_x · P(z|x) + (1 - λ_x) · P(z|y). Here, 0 ≤ λ_x ≤ 1 is the mixture parameter. The mixture parameters can be constant (Jelinek-Mercer mixture), or can be set proportional to the total probabilities. 96 / 133
  • 121. IR Models Foundations of IR Models LM: Language Modelling Probability Mixture Example. Let P(sunny, warm, rainy, dry, windy|glasgow) describe the probability that a day in Glasgow is sunny, the next day is warm, the next rainy, and so forth. If for one event (e.g. sunny) the probability were zero, then the probability of the conjunction (product) is zero. A mixture solves the problem. For example, mix P(x|glasgow) with P(x|uk), where P(x|uk) > 0 for each event x. Then, in a week in winter, when P(sunny|glasgow) = 0, and for the whole of the UK the weather office reports 2 of 7 days as sunny, the mixed probability is: P(sunny|glasgow, uk) = λ · 0/7 + (1 - λ) · 2/7. 97 / 133
  • 122. IR Models Foundations of IR Models LM: Language Modelling LM1 Term Weight. Event space: d and q are sequences of terms. Definition (LM1 term weight w_LM1): P(t|d, c) := λ_d · P(t|d) + (1 - λ_d) · P(t|c); w_LM1,λd(t, d, q, c) := TF(t, q) · log ( λ_d · P(t|d) + (1 - λ_d) · P(t|c) ). P(t|d): foreground probability; P(t|c): background probability. 98 / 133
  • 124. IR Models Foundations of IR Models LM: Language Modelling Language Modelling Independence Assumption. We assume terms are independent, meaning the product over the term probabilities P(t|d) is equal to P(q|d): the probability that the mixture of background and foreground probabilities generates the query as a sequence of terms. t IN q: sequence of terms (e.g. q = (sailing, boat, sailing)); t ∈ q: set of terms; TF(sailing, q) = 2. Note: log P(t|d, c)^{TF(t,q)} = TF(t, q) · log P(t|d, c). 99 / 133
  • 125. IR Models Foundations of IR Models LM: Language Modelling LM1 RSV. Definition (LM1 retrieval status value RSV_LM1): RSV_LM1,λd(d, q, c) := Σ_{t∈q} w_LM1,λd(t, d, q, c). 100 / 133
  • 127. IR Models Foundations of IR Models LM: Language Modelling JM-LM Term Weight. For constant λ, the score can be divided by Π_{t IN q}(1 - λ). This leads to the following equation, [Hiemstra, 2000]: P(q|d, c) / ( P(q|c) · Π_{t IN q}(1 - λ) ) = Π_{t∈d∩q} ( 1 + λ/(1 - λ) · P(t|d)/P(t|c) )^{TF(t,q)}. Definition (JM-LM (Jelinek-Mercer) term weight w_JM-LM): w_JM-LM,λ(t, d, q, c) := TF(t, q) · log ( 1 + λ/(1 - λ) · P(t|d)/P(t|c) ). 101 / 133
  • 129. IR Models Foundations of IR Models LM: Language Modelling JM-LM RSV. Definition (JM-LM retrieval status value RSV_JM-LM): RSV_JM-LM,λ(d, q, c) := Σ_{t∈d∩q} w_JM-LM,λ(t, d, q, c) = Σ_{t∈d∩q} TF(t, q) · log ( 1 + λ/(1 - λ) · P(t|d)/P(t|c) ). We only need to look at terms that appear in both document and query. 102 / 133
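A JM-LM scorer along the lines of the definition; λ = 0.2 is an arbitrary illustrative choice:

```python
import math

def rsv_jm_lm(doc_tf, dl, query_tf, coll_tf, coll_len, lam=0.2):
    """doc_tf: term -> n_L(t,d); coll_tf: term -> n_L(t,c); coll_len = N_L(c)."""
    score = 0.0
    for t, qtf in query_tf.items():
        if t in doc_tf:                      # only terms in both d and q contribute
            p_t_d = doc_tf[t] / dl           # P_L(t|d)
            p_t_c = coll_tf[t] / coll_len    # P_L(t|c)
            score += qtf * math.log(1 + lam / (1 - lam) * p_t_d / p_t_c)
    return score
```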
  • 131. IR Models Foundations of IR Models LM: Language Modelling Dirichlet-LM Term Weight. Document-dependent mixture parameter: λ_d = dl/(dl + μ). Definition (Dirich-LM term weight w_Dirich-LM): w_Dirich-LM,μ(t, d, q, c) := TF(t, q) · log ( μ/(μ + |d|) + |d|/(|d| + μ) · P(t|d)/P(t|c) ). 103 / 133
  • 133. IR Models Foundations of IR Models LM: Language Modelling Dirich-LM RSV. Definition (Dirich-LM retrieval status value RSV_Dirich-LM): RSV_Dirich-LM,μ(d, q, c) := Σ_{t∈q} w_Dirich-LM,μ(t, d, q, c) = Σ_{t∈q} TF(t, q) · log ( μ/(μ + |d|) + |d|/(|d| + μ) · P(t|d)/P(t|c) ). 104 / 133
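The Dirichlet-smoothed weight as a one-function sketch; μ = 2000 is a customary setting assumed here, not taken from the slides. Note that the sum runs over all query terms, so tf_d = 0 is allowed:

```python
import math

def w_dirich_lm(qtf, tf_d, dl, p_t_c, mu=2000.0):
    # TF(t,q) * log( mu/(mu+|d|) + |d|/(|d|+mu) * P(t|d)/P(t|c) )
    return qtf * math.log(mu / (mu + dl) + dl / (dl + mu) * (tf_d / dl) / p_t_c)
```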
  • 135. Foundations of IR Models PIN's: Probabilistic Inference Networks
  • 136. IR Models Foundations of IR Models PIN's: Probabilistic Inference Networks PIN's: Probabilistic Inference Networks. Random variables and conditional dependencies as a directed acyclic graph (DAG). Minterm: conjunction of term events (e.g. t1 ∧ t2). Decomposition over the (disjoint) minterms X leads to the computation: P(q|d) = Σ_{x∈X} P(q|x) · P(x|d). 106 / 133
  • 137. IR Models Foundations of IR Models PIN's: Probabilistic Inference Networks Link Matrix. Probability flow in a PIN can be described via a link or transition matrix. The link matrix L contains the transition probabilities P(target|x, source). Usually, P(target|x, source) = P(target|x) is assumed, and this assumption is referred to as the linked independence assumption. 107 / 133
  • 138. IR Models Foundations of IR Models PIN's: Probabilistic Inference Networks PIN Link Matrix. ( P(q|d), P(q̄|d) )ᵀ = L · ( P(x1|d), ..., P(xn|d) )ᵀ, where L := [ P(q|x1) ... P(q|xn) ; P(q̄|x1) ... P(q̄|xn) ]. 108 / 133
  • 139. IR Models Foundations of IR Models PIN's: Probabilistic Inference Networks Link Matrix for 3 Terms. For illustrating the link matrix, a matrix for three terms is shown next. Lᵀ has one row per minterm, enumerating all eight sign combinations of (t1, t2, t3): Lᵀ = [ P(q|t1,t2,t3) P(q̄|t1,t2,t3) ; P(q|t1,t2,t̄3) P(q̄|t1,t2,t̄3) ; P(q|t1,t̄2,t3) P(q̄|t1,t̄2,t3) ; P(q|t1,t̄2,t̄3) P(q̄|t1,t̄2,t̄3) ; P(q|t̄1,t2,t3) P(q̄|t̄1,t2,t3) ; P(q|t̄1,t2,t̄3) P(q̄|t̄1,t2,t̄3) ; P(q|t̄1,t̄2,t3) P(q̄|t̄1,t̄2,t3) ; P(q|t̄1,t̄2,t̄3) P(q̄|t̄1,t̄2,t̄3) ]. 109 / 133
  • 140. IR Models Foundations of IR Models PIN's: Probabilistic Inference Networks Special Link Matrices. The matrices L_or and L_and reflect the boolean combination of the linkage between a source and a target: L_or = [ 1 1 1 1 1 1 1 0 ; 0 0 0 0 0 0 0 1 ]; L_and = [ 1 0 0 0 0 0 0 0 ; 0 1 1 1 1 1 1 1 ]. 110 / 133
  • 141. IR Models Foundations of IR Models PIN's: Probabilistic Inference Networks y = A · x. ( P(q|d), P(q̄|d) )ᵀ = L · ( P(t1,t2,t3|d), P(t1,t2,t̄3|d), P(t1,t̄2,t3|d), P(t1,t̄2,t̄3|d), P(t̄1,t2,t3|d), P(t̄1,t2,t̄3|d), P(t̄1,t̄2,t3|d), P(t̄1,t̄2,t̄3|d) )ᵀ. 111 / 133
  • 142. IR Models Foundations of IR Models PIN's: Probabilistic Inference Networks Turtle/Croft Link Matrix. Let w_t := P(q|t) be the query term probabilities. Then, estimate the link matrix elements P(q|x), where x is a boolean combination of terms, as follows (w0 is a normalising constant): L_Turtle/Croft = [ 1, (w1+w2)/w0, (w1+w3)/w0, w1/w0, (w2+w3)/w0, w2/w0, w3/w0, 0 ; 0, w3/w0, w2/w0, (w2+w3)/w0, w1/w0, (w1+w3)/w0, (w1+w2)/w0, 1 ]. [Turtle and Croft, 1992, Croft and Turtle, 1992] 112 / 133
  • 143. IR Models Foundations of IR Models PIN's: Probabilistic Inference Networks PIN Term Weight. Definition (PIN-based term weight w_PIN): w_PIN(t, d, q, c) := [ 1 / Σ_{t'} P(q|t', c) ] · P(q|t, c) · P(t|d, c). 113 / 133
  • 145. IR Models Foundations of IR Models PIN's: Probabilistic Inference Networks PIN RSV. Definition (PIN-based retrieval status value RSV_PIN): RSV_PIN(d, q, c) := Σ_t w_PIN(t, d, q, c) = [ 1 / Σ_t P(q|t, c) ] · Σ_{t∈d∩q} P(q|t, c) · P(t|d, c). P(q|t, c) is proportional to pidf(t, c); P(t|d, c) is proportional to TF(t, d). 114 / 133
  • 147. IR Models Foundations of IR Models PIN's: Probabilistic Inference Networks PIN RSV Computation Example. Let the following term probabilities be given: sailing: P(t|d) = 2/3, P(q|t) = 1/10,000; boats: P(t|d) = 1/2, P(q|t) = 1/1,000. Terms are independent events. P(t|d) is proportional to the within-document term frequency, and P(q|t) is proportional to the IDF. The RSV is: RSV_PIN(d, q, c) = [ 1 / (1/10,000 + 1/1,000) ] · ( 1/10,000 · 2/3 + 1/1,000 · 1/2 ) = [ 1 / (11/10,000) ] · ( 2/30,000 + 15/30,000 ) = (10,000/11) · (17/30,000) = 17/33 ≈ 0.51. 115 / 133
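The same computation as a few lines of Python (values copied from the example):

```python
p_q_t = {"sailing": 1 / 10_000, "boats": 1 / 1_000}   # ~ IDF
p_t_d = {"sailing": 2 / 3, "boats": 1 / 2}            # ~ within-document TF

norm = 1 / sum(p_q_t.values())
rsv = norm * sum(p_q_t[t] * p_t_d[t] for t in p_q_t)
print(rsv, 17 / 33)   # both ~0.515
```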
  • 148. Foundations of IR Models Relevance-based Models
  • 149. IR Models Foundations of IR Models Relevance-based Models Relevance-based Models. VSM (Rocchio's formulae). PRF. [Lavrenko and Croft, 2001], Relevance-based Language Models: LM-based approach to estimate P(t|r). Massive query expansion. 117 / 133
  • 150. IR Models Foundations of IR Models Relevance-based Models Relevance Feedback in the VSM. [Rocchio, 1971], Relevance Feedback in Information Retrieval, is the must-have reference and background for what a relevance feedback model aims at. There are two formulations that aggregate term weights: weight(t, q) = weight(t, q) + (1/|R|) Σ_{d∈R} d⃗ - (1/|R̄|) Σ_{d∈R̄} d⃗; and weight(t, q) = weight(t, q) + β Σ_{d∈R} d⃗ - γ Σ_{d∈R̄} d⃗. 118 / 133
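A Rocchio-style update as a sketch; the parameter names alpha, beta, gamma and their values are illustrative defaults only, not taken from the slides:

```python
def rocchio_weight(q_weight, rel_weights, nonrel_weights,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Update one term's query weight from its weights in judged documents."""
    pos = sum(rel_weights) / len(rel_weights) if rel_weights else 0.0
    neg = sum(nonrel_weights) / len(nonrel_weights) if nonrel_weights else 0.0
    return alpha * q_weight + beta * pos - gamma * neg

print(rocchio_weight(1.0, [0.8, 0.6], [0.1]))   # 1.51
```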
  • 152. Foundations of IR Models Foundations: Summary
  • 153. IR Models Foundations of IR Models Foundations: Summary Foundations: Summary. 1. TF-IDF: semantics of TF quantification. 2. PRF: basis of BM25. LM? 3. BIR. 4. Poisson. 5. BM25: the P(d|q)/P(d) side of IR models. 6. LM: the P(q|d)/P(q) side of IR models. 7. PIN's: special link matrix. 8. Relevance-based models. 120 / 133
  • 155. IR Models Foundations of IR Models Foundations: Summary Model Overview: TF-IDF, BIR, Poisson. RSV_TF-IDF(d, q, c) := Σ_t w_TF-IDF(t, d, q, c); w_TF-IDF(t, d, q, c) := TF(t, d) · TF(t, q) · IDF(t, c). RSV_BIR(d, q, r, r̄) := Σ_{t∈d∩q} w_BIR(t, r, r̄); w_BIR(t, r, r̄) := log [ (P_D(t|r) · P_D(t̄|r̄)) / (P_D(t|r̄) · P_D(t̄|r)) ]. RSV_Poisson(d, q, r, r̄) := [ Σ_{t∈d∩q} w_Poisson(t, d, q, r, r̄) ] + len_norm_Poisson; w_Poisson(t, d, q, r, r̄) := TF(t, d) · log ( λ(t, d, r)/λ(t, d, r̄) ) = TF(t, d) · log ( P_L(t|r)/P_L(t|r̄) ). 121 / 133
  • 156. IR Models Foundations of IR Models Foundations: Summary Model Overview: BM25, LM, DFR. RSV_BM25,k1,b,k2,k3(d, q, r, r̄, c) := [ Σ_{t∈d∩q} w_BM25,k1,b,k3(t, d, q, r, r̄) ] + len_norm_BM25,k2; w_BM25,k1,b,k3(t, d, q, r, r̄) := TF_BM25,k1,b(t, d) · TF_BM25,k3(t, q) · w_RSJ(t, r, r̄). RSV_JM-LM,λ(d, q, c) := Σ_{t∈d∩q} w_JM-LM,λ(t, d, q, c); w_JM-LM,λ(t, d, q, c) := TF(t, q) · log ( 1 + λ/(1 - λ) · P(t|d)/P(t|c) ). RSV_Dirich-LM,μ(d, q, c) := Σ_{t∈q} w_Dirich-LM,μ(t, d, q, c); w_Dirich-LM,μ(t, d, q, c) := TF(t, q) · log ( μ/(μ + |d|) + |d|/(|d| + μ) · P(t|d)/P(t|c) ). RSV_DFR,M(d, q, c) := Σ_{t∈d∩q} w_DFR,M(t, d, c); w_DFR-1,M(t, d, c) := -log P_M(t ∈ d|c); w_DFR-2,M(t, d, c) := -log P_M(tf_d|c). 122 / 133
  • 157. IR Models Foundations of IR Models Foundations: Summary The End (of Part I). Material from the book: IR Models: Foundations and Relationships. Morgan & Claypool, 2013. References and textual explanations of the formulae are in the book. 123 / 133
  • 158. IR Models Foundations of IR Models Foundations: Summary Bib I. Amati, G. and van Rijsbergen, C. J. (2002). Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems (TOIS), 20(4):357–389. Bookstein, A. (1980). Fuzzy requests: An approach to weighted Boolean searches. Journal of the American Society for Information Science, 31:240–247. Brin, S. and Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1-7):107–117. Bruza, P. and Song, D. (2003). A comparison of various approaches for using probabilistic dependencies in language modeling. In SIGIR, pages 419–420. ACM. Church, K. and Gale, W. (1995). Inverse document frequency (idf): A measure of deviation from Poisson. In Proceedings of the Third Workshop on Very Large Corpora, pages 121–130. 124 / 133
  • 159. IR Models Foundations of IR Models Foundations: Summary Bib II. Cooper, W. (1991). Some inconsistencies and misnomers in probabilistic IR. In Bookstein, A., Chiaramella, Y., Salton, G., and Raghavan, V., editors, Proceedings of the Fourteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 57–61, New York. Cooper, W. S. (1988). Getting beyond Boole. Information Processing and Management, 24(3):243–248. Cooper, W. S. (1994). Triennial ACM SIGIR award presentation and paper: The formalism of probability theory in IR: A foundation for an encumbrance. In [Croft and van Rijsbergen, 1994], pages 242–248. Croft, B. and Lafferty, J., editors (2003). Language Modeling for Information Retrieval. Kluwer. Croft, W. and Harper, D. (1979). Using probabilistic models of document retrieval without relevance information. Journal of Documentation, 35:285–295. 125 / 133
  • 160. IR Models Foundations of IR Models Foundations: Summary Bib III. Croft, W. and Turtle, H. (1992). Retrieval of complex objects. In Pirotte, A., Delobel, C., and Gottlob, G., editors, Advances in Database Technology – EDBT'92, pages 217–229, Berlin et al. Springer. Croft, W. B. and van Rijsbergen, C. J., editors (1994). Proceedings of the Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, London, et al. Springer-Verlag. Deerwester, S., Dumais, S., Furnas, G., Landauer, T., and Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407. Dumais, S. T., Furnas, G. W., Landauer, T. K., and Deerwester, S. (1988). Using latent semantic analysis to improve information retrieval. pages 281–285. Fang, H. and Zhai, C. (2005). An exploration of axiomatic approaches to information retrieval. In SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 480–487, New York, NY, USA. ACM. 126 / 133
  • 161. IR Models Foundations of IR Models Foundations: Summary Bib IV
  Fuhr, N. (1989). Models for retrieval with probabilistic indexing. Information Processing and Management, 25(1):55–72.
  Fuhr, N. (1992). Probabilistic models in information retrieval. The Computer Journal, 35(3):243–255.
  Fuhr, N. (2008). A probability ranking principle for interactive information retrieval. Information Retrieval, 11:251–265.
  He, B. and Ounis, I. (2005). Term frequency normalisation tuning for BM25 and DFR models. In ECIR, pages 200–214.
  Hiemstra, D. (2000). A probabilistic justification for using tf.idf term weighting in information retrieval. International Journal on Digital Libraries, 3(2):131–139.
  Kleinberg, J. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46.
  127 / 133
  • 163. IR Models Foundations of IR Models Foundations: Summary Bib V
  Lafferty, J. and Zhai, C. (2003). Probabilistic Relevance Models Based on Document and Query Generation, chapter 1. In [Croft and Lafferty, 2003].
  Lavrenko, V. and Croft, W. B. (2001). Relevance-based language models. In SIGIR, pages 120–127. ACM.
  Luk, R. W. P. (2008). On event space and rank equivalence between probabilistic retrieval models. Inf. Retr., 11(6):539–561.
  Margulis, E. (1992). N-Poisson document modelling. In Belkin, N., Ingwersen, P., and Pejtersen, M., editors, Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 177–189, New York.
  Maron, M. and Kuhns, J. (1960). On relevance, probabilistic indexing, and information retrieval. Journal of the ACM, 7:216–244.
  Metzler, D. and Croft, W. B. (2004). Combining the language model and inference network approaches to retrieval. Information Processing and Management, 40(5):735–750.
  128 / 133
  • 164. IR Models Foundations of IR Models Foundations: Summary Bib VI
  Piwowarski, B., Frommholz, I., Lalmas, M., and Van Rijsbergen, K. (2010). What can Quantum Theory Bring to Information Retrieval? In Proc. 19th International Conference on Information and Knowledge Management, pages 59–68.
  Ponte, J. and Croft, W. (1998). A language modeling approach to information retrieval. In Croft, W. B., Moffat, A., van Rijsbergen, C. J., Wilkinson, R., and Zobel, J., editors, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 275–281, New York. ACM.
  Robertson, S. (1977). The probability ranking principle in IR. Journal of Documentation, 33:294–304.
  Robertson, S. (2004). Understanding inverse document frequency: On theoretical arguments for idf. Journal of Documentation, 60:503–520.
  Robertson, S. (2005). On event spaces and probabilistic models in information retrieval. Information Retrieval Journal, 8(2):319–329.
  129 / 133
  • 165. IR Models Foundations of IR Models Foundations: Summary Bib VII
  Robertson, S., Walker, S., Jones, S., Hancock-Beaulieu, M., and Gatford, M. (1994). Okapi at TREC-3. In Text REtrieval Conference.
  Robertson, S. and Sparck-Jones, K. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science, 27:129–146.
  Robertson, S. E. and Walker, S. (1994). Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In [Croft and van Rijsbergen, 1994], pages 232–241.
  Robertson, S. E., Walker, S., and Hancock-Beaulieu, M. (1995). Large test collection experiments on an operational interactive system: Okapi at TREC. Information Processing and Management, 31:345–360.
  Rocchio, J. (1971). Relevance feedback in information retrieval. In [Salton, 1971].
  Roelleke, T. (2003). A frequency-based and a Poisson-based probability of being informative. In ACM SIGIR, pages 227–234, Toronto, Canada.
  130 / 133
  • 166. IR Models Foundations of IR Models Foundations: Summary Bib VIII
  Roelleke, T. and Wang, J. (2006). A parallel derivation of probabilistic information retrieval models. In ACM SIGIR, pages 107–114, Seattle, USA.
  Roelleke, T. and Wang, J. (2008). TF-IDF uncovered: A study of theories and probabilities. In ACM SIGIR, pages 435–442, Singapore.
  Salton, G., editor (1971). The SMART Retrieval System – Experiments in Automatic Document Processing. Prentice Hall, Englewood Cliffs, New Jersey.
  Salton, G., Fox, E., and Wu, H. (1983). Extended Boolean information retrieval. Communications of the ACM, 26:1022–1036.
  Salton, G., Wong, A., and Yang, C. (1975). A vector space model for automatic indexing. Communications of the ACM, 18:613–620.
  131 / 133
  • 167. IR Models Foundations of IR Models Foundations: Summary Bib IX
  Singhal, A., Buckley, C., and Mitra, M. (1996). Pivoted document length normalisation. In Frei, H., Harman, D., Schäuble, P., and Wilkinson, R., editors, Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 21–39, New York. ACM.
  Sparck-Jones, K., Robertson, S., Hiemstra, D., and Zaragoza, H. (2003). Language modelling and relevance. Language Modelling for Information Retrieval, pages 57–70.
  Sparck-Jones, K., Walker, S., and Robertson, S. E. (2000). A probabilistic model of information retrieval: development and comparative experiments: Part 1. Information Processing and Management, 36:779–808.
  Turtle, H. and Croft, W. (1991). Efficient probabilistic inference for text retrieval. In Proceedings RIAO 91, pages 644–661, Paris, France.
  Turtle, H. and Croft, W. (1992). A comparison of text retrieval models. The Computer Journal, 35.
  132 / 133
  • 168. IR Models Foundations of IR Models Foundations: Summary Bib X
  Turtle, H. and Croft, W. B. (1990). Inference networks for document retrieval. In Vidick, J.-L., editor, Proceedings of the 13th International Conference on Research and Development in Information Retrieval, pages 1–24, New York. ACM.
  van Rijsbergen, C. J. (1986). A non-classical logic for information retrieval. The Computer Journal, 29(6):481–485.
  van Rijsbergen, C. J. (1989). Towards an information logic. In Belkin, N. and van Rijsbergen, C. J., editors, Proceedings of the Twelfth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 77–86, New York.
  van Rijsbergen, C. J. (2004). The Geometry of Information Retrieval. Cambridge University Press, New York, NY, USA.
  Wong, S. and Yao, Y. (1995). On modeling information retrieval with probabilistic inference. ACM Transactions on Information Systems, 13(1):38–68.
  Zaragoza, H., Hiemstra, D., Tipping, M. E., and Robertson, S. E. (2003). Bayesian extension to the language model for ad hoc information retrieval. In ACM SIGIR, pages 4–9, Toronto, Canada.
  133 / 133