SlideShare a Scribd company logo
1 of 36
Download to read offline
Querylog-based Assessment of
Retrievability Bias in Delpher
Myriam C. Traub, Thaer Samar, Jacco van Ossenbruggen, Jiyin He,
Arjen de Vries, Lynda Hardman
1
Motivation
• Users want to be able
• to access all (relevant) documents
in Delpher
• to get a fair overview of Delpher’s
content
• However,
• data collections are implicitly
biased,
• users are biased,
• and technology induces even more
bias(es)
2
Motivation
• Users want to be able
• to access all (relevant) documents
in Delpher
• to get a fair overview of Delpher’s
content
• However,
• data collections are implicitly
biased,
• users are biased,
• and technology induces even more
bias(es)
… which I can
deal with if the bias is
made explicit.
#toolcrit
2
Motivation
• Users want to be able
• to access all (relevant) documents
in Delpher
• to get a fair overview of Delpher’s
content
• However,
• data collections are implicitly
biased,
• users are biased,
• and technology induces even more
bias(es)
… which I can
deal with if the bias is
made explicit.
#toolcrit
2
Note:
Bias is not necessarily a
bad thing!
Research Questions
RQ1: Is the access to the digitized
newspaper collection in Delpher
influenced by a retrievability bias?
RQ2: Can we correlate the features of a
document (such as document length, time
of publishing, type of document, etc.) with
its retrievability scores?
RQ3: To what extent are retrievability 

experiments using simulated queries 

representative for the search behavior of 

real users?
3
Retrievability
• Introduced by Azzopardi et al. [1] in 2008 in a study based
on born-digital documents and simulated queries
• Measures the accessibility of all documents in a collection for
a given set of queries
• Retrievability score r(d) measures how 

often a document d is retrieved by a 

given set of queries
• Gini coefficient and Lorenz curves can visualize and 

quantify inequality in the distribution of r(d) scores
4
[1]  L. Azzopardi and V. Vinay. Retrievability: An evaluation measure for higher order information access tasks. In Proceedings of the 17th ACM Conference on
Information and Knowledge Management, CIKM ’08, pages 561–570, New York, NY, USA, 2008. ACM.
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Lorenz curve
% of population
%ofincome
0, 0, 0, 0, 1
1, 1, 1, 1, 1
0, 0, 1, 1, 2
Lorenz Curve 

& Gini Coefficient
• Introduced by economists to
visualize inequality in wealth
distribution
• Ranges between 0 and 1:
• perfect tyranny (G=0.8)
• perfect communist (G=0)
• in-between (G=0.5)
• There is no good or bad G.
5
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Lorenz curve
% of population
%ofincome
0, 0, 0, 0, 1
1, 1, 1, 1, 1
0, 0, 1, 1, 2
Lorenz Curve 

& Gini Coefficient
• Introduced by economists to
visualize inequality in wealth
distribution
• Ranges between 0 and 1:
• perfect tyranny (G=0.8)
• perfect communist (G=0)
• in-between (G=0.5)
• There is no good or bad G.
5
% of documents
%ofaccumulatedr(d)
Document Collection:

KB Newspaper Archive
June 1618 - December 1995
Total Size 102,718,528
Vocabulary Size 353,086,358
Articles 67% 69,237,655
Advertisements 29% 29,591,599
Notifications* 2% 1,918,375
Captions 2% 1,970,899
* Familiebericht
6
Simulated Queries
• Followed similar strategy as previous studies
• Top 2 million single terms from the
preprocessed corpus + top 2 million bigram
terms
• No filtering for OCR errors
7
Real Queries
• User logs collected between March and July 2015 on
Delpher
• Extracted queries and view data related to newspaper
archive
• Total of 957,239 unique queries
8
Experiment /
Parameters
• Real queries, simulated queries
• Standard Information Retrieval models: TFIDF,
LM1000, BM25
• Pre-processing: Stemming, stopword removal,
operator removal
• Cutoff values: c=10, c=100, c=1000
9
Results of
Quantifying
Retrievability Bias
10
Inequality
Real queries, c=1000
GBM25 = 0.76
Simulated queries, c=100
GBM25 = 0.5211
Inequality
c=10
Real queries
GBM25 = 0.97
Simulated queries
GBM25 = 0.85
12
Inequality
c=10
Real queries
GBM25 = 0.97
Simulated queries
GBM25 = 0.85
12
A large fraction of documents
scores r(d)=0
• The Lorenz curves and Gini values
• are strongly influenced by 0 values,
• can indicate the degree of bias, but they 

tell us nothing about the type of bias.
13
Limitations
• The Lorenz curves and Gini values
• are strongly influenced by 0 values,
• can indicate the degree of bias, but they 

tell us nothing about the type of bias.
13
Limitations
Does it
arise from the users’
interest / search behavior?
Or a technological bias towards a
particular document
feature?
Frequencies of r(d)
values
14
• Real queries (top):
• maximum r(d)=4319
• tend to retrieve a few
documents more often
• Simulated queries (bottom):
• maximum r(d)=807
• tend to retrieve a larger
number of documents
1
5
10
50
100
500
1000
5000
10000
50000
100000
500000
1000000
5000000
10000000
30000000
0 500 1000 1500 2000 2500 3000 3500 4000
r(d)
counts
1
5
10
50
100
500
1000
5000
10000
50000
100000
500000
1000000
5000000
10000000
0 50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800
r(d)
counts
R(d) Values Meaningful?
• Created 4 subsets of documents according to their r(d) score and
selected a set of target documents from each subset
• Generated queries from selected documents, tailored to retrieve these
specific documents
• Performed the search tasks and measured ranks of the target documents
• Showed that documents with lower r(d) score are actually harder to find
15
Hardly retr.
Few times retr.
Often retr.
Very often retr.
Document Features
16
●
●
●
● ●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
● ●
● ●
●
●
●
●
●
●
●
● ●
● ●
●
● ●
●
●
●
●
●
● ● ● ● ●
●
●
●
●
●
●
●
● ●
●
●
0
1
2
3
1618−1862
1862−1891
1891−1904
1904−1913
1913−1920
1920−1926
1926−1929
1929−1932
1932−1935
1935−1939
1939−1941
1941−1948
1948−1956
1956−1963
1963−1969
1969−1974
1974−1979
1979−1984
1984−1989
1989−2011
Meanr(d)perbin
Time of Publishing
• Documents ordered by
publishing date
• Split into 20 bins of
equal size
• Mean r(d) per bin
17
●●●●●●●●
●
●●●●●●●●
●
●
●
●●●●●
●
●●●●●●●
●●
●●●●●●●
●●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●●●●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●
●
●●●●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●●
●●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●●
●
●●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●●
●
●●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●●
●●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●●
●●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●●
●
●
●
●
●●
●
●
●●
●●●
●
●
●●●
●●●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●●●
●
●
●●●
●●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●●●
●●●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●
●●●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●●●
●
●
●
●
●
●●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●●
●
●
●
●●●
●
●
●●
●
●
●●
●
●
●
●●
●●
●
●
●●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●●●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●●●●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●
●●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●●●
●
●
●
●●●
●
●
●●
●
●
●●
●●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●●●●
●
●●
●
●
●●
●
●
●●
●
●●
●
●●
●●●●
●
●
●
●●●
●
●
●
●●
●
●
●●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●●
●
●
●
●●
●
●●
●
●
●●
●●
●●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●●
●
●●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●●
●●
●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●●
●●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●●●
●
●
●
●●
●
●
●●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●●●
●●
●
●
●
●
●
●●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●●
●●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●●
●
●
●
●●
●●
●
●
●
●●●
●
●
0.5
1.0
1.5
2.0
0 1000 2000 3000 4000 5000
Bins based on page confidence (PC)
Meanr(d)perbin
OCR Confidence Scores
• Documents ordered by
page confidence (PC)
• Split into bins according
to PC value
• Mean r(d) per bin
18
Document Length
• Varies from 33 to 381,563 words (mean = 362)
• Documents ordered by length and split into bins of 20,000
• LM1000 (left): upward trend, longer documents more retrievable
• BM25 and TFIDF (right): seem to be better at retrieving documents of
medium length
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●
●
●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●
●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●
●●●●●●●●●
●
●●●●●●●●●●●●●●
●●●●
●
●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●
●
●●●●●●
●●●●●●●●●●●●●●●●●●
●●
●●●
●●●●●●
●●●
●●●●●●●●●●●●●●●
●●●●●●●●●●●●●
●●●
●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●
●
●●●
●●●●●●●●●●●●●
●●●●●●●●●●
●
●●●●●●●●●●●●
●
●
●
●
●●●●●●
●
●●●●●●●●●●●●●●●●●●
●
●●●●●
●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●
●●●●●●●●●●●●
●●●
●●●●●●●●●
●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●
●●●●●●●●●●●
●
●
●●●●●
●●●●●●●●●●●●●●●
●●●
●●●●●
●●●●
●●●
●●●●●●●●●●
●●●●●
●
●
●●●●●●●●●●●●●●●●●●●●●●
●
●●
●●●
●●●
●●●●●●●●
●
●●●●●
●
●●●
●●●
●●●●●●●●
●
●
●
●●●●●
●●●●
●●●●●●●●●●●●
●
●●●●●●
●●●
●●●
●●●●●●●●●●●●●●●●
●●●
●●●
●●●●●●●●●●●●●●●●
●
●●
●●●
●●●●●●●●●●●
●
●●●●●●●
●●●
●
●●●●●●●●
●
●●●●●
●●●●●
●●●●●●●●●●●●●●●●●
●●
●
●
●●●
●●●●●●●
●
●●
●●●●●
●●
●●
●●●●●●●●●●●
●
●
●●●●
●
●
●●●
●
●●●●●●●●●●●●●●●●
●●●●
●
●●●●●●●●●●●●●●●●
●●●●
●
●●●●●●●●●●●●●●●●
●●
●
●●
●
●
●●●●
●●
●
●
●
●●●●●
●●
●
●●●●
●●●●●●●●●●●●●
●●●
●●●
●
●●●●●●●
●●●●●●
●●●
●●
●
●
●●●●
●●●●●●●●●
●●●
●●●●●●●●●●●
●
●●●●●
●●●
●
●●●●●●●●
●
●
●
●●●●●●●●●●
●
●●
●●●●●●
●●●●●
●●●
●
●●●●●●●●●●●●●●●
●●●
●
●●●●●●●●
●
●
●●●●●
●●●
●
●●●●●●●●
●●●●●●●
●●●
●●●●●●●●●●●●●●●●
●●●
●
●
●●●●●●●●
●
●●●●
●●●●●●●●●●●●
●●●●●●●
●●●
●
●
●
●●●●●●
●
●
●●●●
●●●
●
●●●●
●●●●●●
●●●●●
●
●
●●●●●
●●●●●●
●●●●●
●●
●●●●●●●●●
●●●●●●●
●●●●●●●●●●●●●
●
●●●●●●●●●●●●●●●●
●
●●●
●
●●●
●
●●●●●●●
●●●●●●●
●●
●●●●●●●●●●
●
●●●●
●
●●●
●●●●●●●●●
●●●●
●●●
●
●
●●●●●●
●●●●●●●●●
●
●
●●●●●●
●●●●●●
●●●
●
●
●●●●●●●●●●●
●
●●●
●
●
●
●●●●●●
●
●●●●
●
●●●●●●●●●●
●●●●●●●●
●
●●●●●●●●●
●●
●
●
●●●●●●●●●●●●●●●●
●●●
●
●●●●●●●
●●●●●●●●
●
●
●●●●●●●
●
●●●●
●●
●
●●●●●●●●
●
●●●●
●●
●
●
●
●●●●●●
●●●●
●
●●●
●
●●●●
●●●●●●●●●●●●●●●●
●
●●
●●●
●
●
●
●●●●●●●●●
●
●
●●●
●●
●
●
●●●●●●●●●●
●
●
●
●●●●●●●●●●●●●
●●●●●●●●●●●
●
●●●●●●
●
●●●●●●●
●
●●●●●●
●
●
●●
●●●●●
●●●●
●●
●
●
●●
●●●●●●●●●●●
●●●●●●●●●●●
●
●
●
●●●●●●
●
●●●●●●
●
●
●
●●●●●●●●●●●
●●
●●●●●●
●●●●●●●
●●
●
●
●●
●●●●
●●●
●
●
●
●
●●●●●
●●●
●●●●●●●●●●●
●●●●●●●
●●●●●●●
●
●●●●●●
●●●●●●●●●●●●●●●●●●●●●
●
●●●●●
●●
●●●●
●
●
●●
●
●●
●
●●●
●
●●
●●●●●
●
●
●
●●
●
●●●●●●●
●●●
●
●
●
●
●●●●
●●●●●
●
●●●
●●●●
●
●●●●●
●●
●
●
●●●
●●●●●●
●●
●
●
●●
●●
●●●●●●●
●
●●●
●●●●●●●●●
●●
●●●●●●●●●●●
●●●●●●●●
●●●
●
●●
●
●●
●●●●●●●
●●
●
●
●●●●●●●●●
●
●
●
●●●●
●●
●●●●
●●●●●●●
●●●●●
●●
●
●●●●
●●●●●●
●●●●●
●●●●●●●
●●
●
●●
●●
●●●●●●●
●
●
●●●
●●
●●●●●
●
●
●●
●●●●●●●
●●●●●
●●●●●●●
●●●●●●
●●●
●
●●
●●●●●●
●●●●●●
●
●
●
●●●
●●●●●●●
●
●●
●●●
●
●●●
●●
●
●●
●●
●●●●●
●●●●●
●
●●●●●●●
●
●●●●
●●●●●
●●●●●
●●●●●●●
●
●
●●
●●
●●●●●
●●●●
●●
●●●●●●
●
●●●
●●●●●●●●
●
●●
●●●●●●●
●●●●
●●
●●●●
●
●●●●
●●
●●●●●
●●
●●●●
●●●●●
●●●
●
●●
●●●●●
●●●●
●●
●●●●●●
●
●●
●●●●●●●●
●●●
●
●●●●●
●●●●
●●
●●●●●
●
●
●●●●●●●●●●
●●
●●●●●●●
●●
●
●
●●●●●●
●●●●●●●●●●●●
●●●
●●●
●●●
●
●
●●
●●●●●●
●●●●
●●
●●●●●●
●●●●
●●●●
●●
●●●●
●●●
●●●
●●●●●●●
●●●
●●●
●●●●●●●
●●
●●
●●●●
●●●●●
●
●●●●
●●●●
●
●●●●●
●●●●
●●●●●●
●
●
●●●
●●●
●●●●●●●
●
●●●
●●●●
●●●●●
●●
●●●●
●●●●
●
●
●●●●●●
●●●●●●
●●
●●●
●●●
●●
●●●●
●●●●
●●●
●●
●
●●●
●●●●●
●●
●●
●●●●●
●●
●●
●●●●●●
●●●
●●
●
●●●●
●●●
●●●●●
●●●
●
●●●●
●●●●
●
●●●●●
●●●
●
●●●
●
●
●
●●
●●●●●
●●
●
●●
●
●●
●
●●●
●
●●●
●●
●●●
●●●●●
●●●
●●
●
●●
●●●●
●●●●●
●●
●●
●●●●
●
●●●
●
●●●●
●●●
●
●●
●
●
●●●●●
●●●
●
●●●●
●●●●
●●●●●
●●●
●●●
●
●●●●
●
●●
●●
●
●●
●
●●●
●
●
●●
●
●●●
●
●●●●
●●●
●
●
●●
●
●●●
●
●
●●●
●●●
●
●
●●
●
●
●
●●
●●●
●
●
●●
●●
●●
●●●
●
●
●●
●
●●●
●
●●●
●
●●
●●
●●●
●●●
●●
●●●
●
●●
●
●●●
●
●
●
●
●●
●
●
●●●
●
●
●●
●●●●
●●
●
●
●
●●
●
●
●●
●
●●
●
●
●●
●
●●●
●
●●●
●●●
●
●●
●●
●
●●●
●●
●
●●
●
●●
●
●
●●●
●
●
●●
●●●
●
●●●
●
●●
●
●●●
●
●
●
●
●●
●
●
●●
●
●●●
●
●
●●
●●
●●
●
●●
●●
●
●
●●●
●●●
●●●
●
●●●
●
●●
●
●●●
●●●
●●●
●
●●●
●
●
●
●
●
●●
●●●
●●●
●
●●
●
●●
●
●●●
●
●●
●
●●●
●
●
●
●
●●
●
●
●●
●●●
●●
●
●●●
●
●●
●
●●●
●●
●●
●●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●●
●
●●●
●●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●●
●
●●
●
●●
●
●●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●●
●●
●
●●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●●
●●
●●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●●●
●
●●
●
●
●
●
●
●
●●●
●
●●●●●
●●
●
●●●●●
●
●●
●
●
●●
●
●●●●●●
●
●
●●●●●
●
●●●
●
●
●●
●●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●●
●
●●
●
●●
●●
●
●●●
●●
●
●●
●
●
●
●
●●
●
●●●
●
●
●●●●●●●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●●
●
●●
●●
●
●
●
●
●●
●
●●
●●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●●●
●
●
●
●●●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●●
●
●
●●
●●●●●●
●●
●●
●
●
●
●●●
●●●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●●●●●
●
●●●
●●●
●
●
●●●●
●
●
●
●●●●●●●●
●
●●●●●
●
●
●
●●
●
●
●●
●
●
●●
●●
●●
●
●●
●●
●
●
●●
●●
●
●
●
●
●
●
0
2
4
6
0 1000 2000 3000 4000 5000
Bins based on document length
Meanr(d)perbin
●
●
●
●
●
●
●●●
●
●
●
●●●
●●
●●●
●●
●●
●
●●
●●●
●
●
●●
●
●
●●
●
●●
●
●●
●
●●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●●●
●●●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●●
●
●
●
●
●●●●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●●
●●
●
●●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●●●
●●
●●
●●
●●
●
●
●●
●●●●
●●
●
●●
●●●
●
●●
●
●●
●
●
●●
●●●●●
●
●
●
●●
●
●●
●
●●
●
●●●●
●●
●●
●●●●●●
●●●●●●
●●●●●●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●●●
●
●●
●●●
●●
●●
●●●●
●
●●
●●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●●
●
●
●
●
●●
●●
●●●●●●
●●●●●
●●
●●
●
●
●●
●●
●
●●●●●●●●
●
●
●
●
●●●●●●
●●●
●
●●●
●
●
●●
●
●●●
●
●
●
●●
●
●
●●●●
●
●
●
●
●●
●●●
●
●
●●
●●●
●
●
●●●●●●●
●●
●●●
●
●●
●
●●●●●●
●●●
●
●
●
●
●●
●
●●●●●●●
●
●●●●●
●
●●●
●●●●●●
●
●●●●●●●●
●●●
●●●●
●
●●●●
●●●●●●
●
●●●
●
●●●
●
●
●●
●
●
●●●●●●
●
●
●●
●
●
●●●●●●
●
●●●●●●●●●
●●●●
●
●
●●
●
●●
●●●
●
●
●
●●●●●●●●●
●●
●
●●●●●
●●●
●●●
●
●
●●●●
●●●●●●●●●●●●●●●
●
●●
●
●●
●
0.0
0.5
1.0
1.5
0 1000 2000 3000 4000 5000
Bins based on document length
Meanr(d)perbin
19
Newspaper Titles
• Number of articles range
from one to 16,348,557
(mean 82,638, median 127)
• Subset of the 10 most
prevalent newspaper titles
• Mean r(d)
• Top 3 titles are regional
ones
20
Newspaper Title Mean r(d)
Leeuwarder courant: hoofdblad van
Friesland 0.15
Nieuwsblad van het Noorden 0.14
Limburgsch dagblad 0.12
Het vrije volk: democratisch-
socialistisch dagblad 0.10
De Tijd: godsdienstig-staatkundig
dagblad 0.08
Het Vaderland: staat- en letterkundig
nieuwsblad 0.07
Leeuwarder courant 0.07
Algemeen Handelsblad 0.06
De Telegraaf 0.06
Rotterdamsch nieuwsblad 0.05
Document Types
• Hardly any differences
for simulated queries
• In real queries, the
official notifications
stand out with a much
higher score
Real Simulated
Article 0.90 3.89
Advertisement 0.51 3.32
Notification* 4.80 3.22
Caption 0.84 3.06
Mean r(d) for BM25, c=100
21
* Familiebericht
Representativeness
of our Study
22
Top retrieved article 

for real queries
23
Top retrieved article(s)

for simulated queries
24
Differences between
query sets
• Real queries:
• Mean length: 2.32 terms
• Unique terms: 253,637
• 56 references to persons
or locations in top 100
terms
• Simulated queries:
• Mean length: 1.5 terms
• Unique terms: 2,028,617
• 5 references to persons
or locations in top 100
terms
25
1
5
10
50
100
500
1000
5000
10000
50000
100000
500000
1000000
5 10 15 20 25 30 35 40 50 60 65 70 90 110 170 700
Number of Views
Counts
Document views
• 2.7M out of 102M documents were viewed by users
• Shape of the frequency distribution plot is very similar to the r(d)
frequency plots
• Most documents only viewed once, very few are viewed more often
26
Overlap with views
• How many documents were viewed
by Delpher users, but not retrieved in
our study?
• Many non-retrieved documents
• were found using facets, operators
• scored a rank just below the cutoff
• Use a smoother cost function based
on the ranking
• Better representation of the real
search engine, taking faceted search /
operators into account
0
0.75
1.5
2.25
3
c=10 c=100 c=1000
Retrieved
Non-Retrieved
27
Document Types -
revisited
R(d)
Real
R(d)
Simulated
Viewed
Article 0.90 3.89 2.61%
Advertisement 0.51 3.32 2.07%
Official
Notification
4.80 3.22 40.10%
Caption 0.84 3.06 4.01%
28
Parameter Sets for
Preprocessing
[1]  L. Azzopardi and V. Vinay. Retrievability: An evaluation measure for higher order information access tasks. In Proceedings of the 17th ACM Conference on
Information and Knowledge Management, CIKM ’08, pages 561–570, New York, NY, USA, 2008. ACM.
29
Parameters Stemming Stopwords Operators
PS1 

(as used by[1])
yes removed removed
PS2 no kept removed
PS3
(only LM1000)
yes removed kept
0
500000
1000000
1500000
2000000
BM25 TFIDF LM1000 BM25 TFIDF LM1000
PS1
PS2
PS3
c=10 c=100
Overlap Retrieved
Documents and Viewed
30
Conclusions
• Real and simulated queries
differ in regard to
• composition of query sets
• number of (unique) terms
used
• use of named entities
• Apart from document length
and page confidence, we did
not find strong evidence for
technical bias
• Using real queries is important
for realistic results
• Simulation strategies for
queries need to be improved
31

More Related Content

Similar to Querylog-based Assessment of Retrievability Bias in Delpher

Querylog-based Assessment of Retrievability Bias in a Large Newspaper Corpus
Querylog-based Assessment of Retrievability Bias in a  Large Newspaper CorpusQuerylog-based Assessment of Retrievability Bias in a  Large Newspaper Corpus
Querylog-based Assessment of Retrievability Bias in a Large Newspaper CorpusMyriam Traub
 
A Practical Use of Artificial Intelligence in the Fight Against Cancer by Bri...
A Practical Use of Artificial Intelligence in the Fight Against Cancer by Bri...A Practical Use of Artificial Intelligence in the Fight Against Cancer by Bri...
A Practical Use of Artificial Intelligence in the Fight Against Cancer by Bri...Data Con LA
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Andre Freitas
 
The economics of information (1)
The economics of information (1)The economics of information (1)
The economics of information (1)WiLS
 
Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Andre Freitas
 
Fairness in Machine Learning
Fairness in Machine LearningFairness in Machine Learning
Fairness in Machine LearningDelip Rao
 
Quant Data Analysis
Quant Data AnalysisQuant Data Analysis
Quant Data AnalysisSaad Chahine
 
Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...
Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...
Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...Jonathan Stray
 
BIG DATA | How to explain it & how to use it for your career?
BIG DATA | How to explain it & how to use it for your career?BIG DATA | How to explain it & how to use it for your career?
BIG DATA | How to explain it & how to use it for your career?Tuan Yang
 
Rsqrd AI - ML Interpretability: Beyond Feature Importance
Rsqrd AI - ML Interpretability: Beyond Feature ImportanceRsqrd AI - ML Interpretability: Beyond Feature Importance
Rsqrd AI - ML Interpretability: Beyond Feature ImportanceAlessya Visnjic
 
Northern New England TUG - January 2024
Northern New England TUG  -  January 2024Northern New England TUG  -  January 2024
Northern New England TUG - January 2024patrickdtherriault
 
Northern New England TUG - January 2024
Northern New England TUG -  January 2024Northern New England TUG -  January 2024
Northern New England TUG - January 2024patrickdtherriault
 
Data Science Salon: An Experiment on Data Science Algorithms Enabled by a Pil...
Data Science Salon: An Experiment on Data Science Algorithms Enabled by a Pil...Data Science Salon: An Experiment on Data Science Algorithms Enabled by a Pil...
Data Science Salon: An Experiment on Data Science Algorithms Enabled by a Pil...Formulatedby
 
Module 1.3 data exploratory
Module 1.3  data exploratoryModule 1.3  data exploratory
Module 1.3 data exploratorySara Hooker
 
CarolinaCon Presentation on Streaming Analytics
CarolinaCon Presentation on Streaming AnalyticsCarolinaCon Presentation on Streaming Analytics
CarolinaCon Presentation on Streaming AnalyticsJohn Eberhardt
 
Telefonica Lunch Seminar
Telefonica Lunch SeminarTelefonica Lunch Seminar
Telefonica Lunch SeminarNeal Lathia
 
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Greg Makowski
 
Digital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data VisualisationDigital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data VisualisationJen Stirrup
 
Discovery Analytics: Tracking Ebola Spread
Discovery Analytics: Tracking Ebola SpreadDiscovery Analytics: Tracking Ebola Spread
Discovery Analytics: Tracking Ebola SpreadHPCC Systems
 

Similar to Querylog-based Assessment of Retrievability Bias in Delpher (20)

Querylog-based Assessment of Retrievability Bias in a Large Newspaper Corpus
Querylog-based Assessment of Retrievability Bias in a  Large Newspaper CorpusQuerylog-based Assessment of Retrievability Bias in a  Large Newspaper Corpus
Querylog-based Assessment of Retrievability Bias in a Large Newspaper Corpus
 
A Practical Use of Artificial Intelligence in the Fight Against Cancer by Bri...
A Practical Use of Artificial Intelligence in the Fight Against Cancer by Bri...A Practical Use of Artificial Intelligence in the Fight Against Cancer by Bri...
A Practical Use of Artificial Intelligence in the Fight Against Cancer by Bri...
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
 
The economics of information (1)
The economics of information (1)The economics of information (1)
The economics of information (1)
 
Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)
 
Fairness in Machine Learning
Fairness in Machine LearningFairness in Machine Learning
Fairness in Machine Learning
 
Quant Data Analysis
Quant Data AnalysisQuant Data Analysis
Quant Data Analysis
 
Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...
Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...
Frontiers of Computational Journalism week 1 - Introduction and High Dimensio...
 
BIG DATA | How to explain it & how to use it for your career?
BIG DATA | How to explain it & how to use it for your career?BIG DATA | How to explain it & how to use it for your career?
BIG DATA | How to explain it & how to use it for your career?
 
Rsqrd AI - ML Interpretability: Beyond Feature Importance
Rsqrd AI - ML Interpretability: Beyond Feature ImportanceRsqrd AI - ML Interpretability: Beyond Feature Importance
Rsqrd AI - ML Interpretability: Beyond Feature Importance
 
Northern New England TUG - January 2024
Northern New England TUG  -  January 2024Northern New England TUG  -  January 2024
Northern New England TUG - January 2024
 
Northern New England TUG - January 2024
Northern New England TUG -  January 2024Northern New England TUG -  January 2024
Northern New England TUG - January 2024
 
Data Science Salon: An Experiment on Data Science Algorithms Enabled by a Pil...
Data Science Salon: An Experiment on Data Science Algorithms Enabled by a Pil...Data Science Salon: An Experiment on Data Science Algorithms Enabled by a Pil...
Data Science Salon: An Experiment on Data Science Algorithms Enabled by a Pil...
 
Module 1.3 data exploratory
Module 1.3  data exploratoryModule 1.3  data exploratory
Module 1.3 data exploratory
 
6.6 Family and Youth Program Measurement Simplified
6.6 Family and Youth Program Measurement Simplified6.6 Family and Youth Program Measurement Simplified
6.6 Family and Youth Program Measurement Simplified
 
CarolinaCon Presentation on Streaming Analytics
CarolinaCon Presentation on Streaming AnalyticsCarolinaCon Presentation on Streaming Analytics
CarolinaCon Presentation on Streaming Analytics
 
Telefonica Lunch Seminar
Telefonica Lunch SeminarTelefonica Lunch Seminar
Telefonica Lunch Seminar
 
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
 
Digital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data VisualisationDigital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data Visualisation
 
Discovery Analytics: Tracking Ebola Spread
Discovery Analytics: Tracking Ebola SpreadDiscovery Analytics: Tracking Ebola Spread
Discovery Analytics: Tracking Ebola Spread
 

More from Myriam Traub

Impact of Crowdsourcing OCR Improvements on Retrievability Bias
Impact of Crowdsourcing OCR Improvements  on Retrievability Bias Impact of Crowdsourcing OCR Improvements  on Retrievability Bias
Impact of Crowdsourcing OCR Improvements on Retrievability Bias Myriam Traub
 
Effectiveness of Gamesourcing Expert Painting Annotations
Effectiveness of Gamesourcing Expert Painting AnnotationsEffectiveness of Gamesourcing Expert Painting Annotations
Effectiveness of Gamesourcing Expert Painting AnnotationsMyriam Traub
 
The Nature Of Digitally-Produced Data: Towards Social-Scientific Tool Criticism
The Nature Of Digitally-Produced Data: Towards Social-Scientific Tool CriticismThe Nature Of Digitally-Produced Data: Towards Social-Scientific Tool Criticism
The Nature Of Digitally-Produced Data: Towards Social-Scientific Tool CriticismMyriam Traub
 
Estimating the Impact of OCR Quality on Research Tasks in the Digital Humanities
Estimating the Impact of OCR Quality on Research Tasks in the Digital HumanitiesEstimating the Impact of OCR Quality on Research Tasks in the Digital Humanities
Estimating the Impact of OCR Quality on Research Tasks in the Digital HumanitiesMyriam Traub
 
Measuring the Effectiveness of Gamesourcing Expert Oil Painting Annotations
Measuring the Effectiveness of Gamesourcing Expert Oil Painting AnnotationsMeasuring the Effectiveness of Gamesourcing Expert Oil Painting Annotations
Measuring the Effectiveness of Gamesourcing Expert Oil Painting AnnotationsMyriam Traub
 

More from Myriam Traub (6)

Impact of Crowdsourcing OCR Improvements on Retrievability Bias
Impact of Crowdsourcing OCR Improvements  on Retrievability Bias Impact of Crowdsourcing OCR Improvements  on Retrievability Bias
Impact of Crowdsourcing OCR Improvements on Retrievability Bias
 
Effectiveness of Gamesourcing Expert Painting Annotations
Effectiveness of Gamesourcing Expert Painting AnnotationsEffectiveness of Gamesourcing Expert Painting Annotations
Effectiveness of Gamesourcing Expert Painting Annotations
 
The Nature Of Digitally-Produced Data: Towards Social-Scientific Tool Criticism
The Nature Of Digitally-Produced Data: Towards Social-Scientific Tool CriticismThe Nature Of Digitally-Produced Data: Towards Social-Scientific Tool Criticism
The Nature Of Digitally-Produced Data: Towards Social-Scientific Tool Criticism
 
Tool Criticism
Tool CriticismTool Criticism
Tool Criticism
 
Estimating the Impact of OCR Quality on Research Tasks in the Digital Humanities
Estimating the Impact of OCR Quality on Research Tasks in the Digital HumanitiesEstimating the Impact of OCR Quality on Research Tasks in the Digital Humanities
Estimating the Impact of OCR Quality on Research Tasks in the Digital Humanities
 
Measuring the Effectiveness of Gamesourcing Expert Oil Painting Annotations
Measuring the Effectiveness of Gamesourcing Expert Oil Painting AnnotationsMeasuring the Effectiveness of Gamesourcing Expert Oil Painting Annotations
Measuring the Effectiveness of Gamesourcing Expert Oil Painting Annotations
 

Recently uploaded

preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxnoordubaliya2003
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptJoemSTuliba
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...navyadasi1992
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuinethapagita
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 

Recently uploaded (20)

preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptx
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 

Querylog-based Assessment of Retrievability Bias in Delpher