Querylog-based Assessment of Retrievability Bias in a Large Newspaper Corpus
Myriam C. Traub, Thaer Samar, Jacco van Ossenbruggen, Jiyin He, Arjen de Vries, Lynda Hardman
Motivation
• Users want to be able
  • to get a fair overview of the archive’s content
  • to access all (relevant) documents in the archive
• However,
  • data collections are implicitly and explicitly biased,
  • users are biased,
  • and technology induces even more bias(es)
… which I can deal with if the bias is made explicit.
Retrievability Bias
• Bias in search results
• Potential sources are:
  • User interest
  • Search skills of users
  • Users’ willingness to explore results
  • Collection bias (indexed documents)
  • OCR errors
  • Side-effects of ranking algorithm
  • Side-effects of result presentation
Research Questions
RQ1: Detecting and quantifying retrievability bias
RQ2: Influence of document features on retrievability bias
RQ3: Representativeness of simulated queries and experimental setup
Retrievability
• Introduced by Azzopardi and Vinay [1] in 2008 in a study based on born-digital documents and simulated queries
• The retrievability score counts how often a document is retrieved as one of the top K documents by a given set of queries
• Gini coefficient and Lorenz curves can visualize and quantify inequality in the distribution of the scores
[1] L. Azzopardi and V. Vinay. Retrievability: An evaluation measure for higher order information access tasks. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM ’08, pages 561–570, New York, NY, USA, 2008. ACM.
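The score described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the cumulative form of the measure, where r(d) is the number of queries that return d within the top c results (c plays the role of K). The function name `retrievability_scores` and the toy result lists are hypothetical and not taken from the study.

```python
from collections import Counter

def retrievability_scores(ranked_results, doc_ids, c):
    """Count, for every document, how many queries return it in the top c."""
    # Start every known document at zero so never-retrieved documents are kept.
    scores = Counter({doc_id: 0 for doc_id in doc_ids})
    for ranking in ranked_results.values():
        for doc_id in ranking[:c]:
            scores[doc_id] += 1
    return dict(scores)

# Toy data (hypothetical): query -> ranked list of document ids.
ranked_results = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d1", "d3", "d4"],
    "q3": ["d2", "d1", "d5"],
}
doc_ids = ["d1", "d2", "d3", "d4", "d5", "d6"]

scores = retrievability_scores(ranked_results, doc_ids, c=2)
print(scores)
```

Keeping zero-score documents in the result matters for the inequality analysis, because documents that are never retrieved are part of the distribution.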
Lorenz Curve & Gini Coefficient
• Introduced by economists to express and visualize inequality in wealth distribution
• Gini coefficient (G):
  • perfect communist (G=0)
  • in-between (G=0.5)
  • perfect tyranny (G=0.8)
• There is no good or bad G.
[Figure: Lorenz curves for n=5, for the distributions (1, 1, 1, 1, 1), (0, 0, 1, 1, 2) and (0, 0, 0, 0, 1). x-axis: % of population (% of documents); y-axis: % of income (% of accumulated r(d)).]
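The three G values on this slide can be reproduced with a short Python sketch. It assumes the standard definition of the Gini coefficient as the mean absolute difference between all pairs of values, divided by twice the mean. The function names are illustrative, and the quadratic pairwise sum is only meant for small examples such as n=5.

```python
def lorenz_curve(values):
    """Return (cumulative population share, cumulative value share) points."""
    ordered = sorted(values)
    total = sum(ordered)  # assumed to be greater than zero
    points = [(0.0, 0.0)]
    running = 0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / len(ordered), running / total))
    return points

def gini(values):
    """Gini coefficient: sum of |x - y| over all pairs / (2 * n * sum of values)."""
    n = len(values)
    total = sum(values)  # assumed to be greater than zero
    diff_sum = sum(abs(x - y) for x in values for y in values)
    return diff_sum / (2 * n * total)

equal = [1, 1, 1, 1, 1]
in_between = [0, 0, 1, 1, 2]
one_takes_all = [0, 0, 0, 0, 1]

for name, dist in [("equal", equal), ("in-between", in_between), ("one takes all", one_takes_all)]:
    print(name, round(gini(dist), 2), lorenz_curve(dist))
```

With n=5 the most unequal distribution gives G=0.8 rather than 1, because the maximum of this estimator is (n-1)/n.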
Experimental setup / Parameters
• Digitized collection of Dutch historic newspapers
• View data extracted from user logs
• Real queries, simulated queries
• Standard Information Retrieval models: TFIDF, LM1000, BM25 (using Lemur framework)
• Pre-processing (corpus & queries): Stemming, stopword removal, operator removal
• Cutoff values: c=10, c=100, c=1000
Document Collection: Dutch Newspaper Archive
June 1618 - December 1995

Document type     Share   Count
Articles          67%     69,237,655
Advertisements    29%     29,591,599
Notifications*    2%      1,918,375
Captions          2%      1,970,899
Total Size                102,718,528
Vocabulary Size           353,086,358

* Familiebericht (family notices)
Simulated Queries
• Followed a strategy similar to previous studies
• Top 2 million single terms from the preprocessed corpus + top 2 million bigram terms
• No filtering for OCR errors
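The query simulation above can be sketched as follows. This is a minimal illustration, assuming that "top" means most frequent in the preprocessed corpus and that documents are already tokenized; the function name `simulated_queries` and the toy documents are hypothetical.

```python
from collections import Counter

def simulated_queries(documents, top_n):
    """Return the top_n most frequent unigrams and bigrams as query strings."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in documents:
        unigrams.update(tokens)
        # Adjacent token pairs within one document form the bigrams.
        bigrams.update(zip(tokens, tokens[1:]))
    single = [term for term, _ in unigrams.most_common(top_n)]
    double = [" ".join(pair) for pair, _ in bigrams.most_common(top_n)]
    return single + double

# Toy corpus (hypothetical), already stemmed and stopword-filtered.
documents = [
    ["king", "visits", "amsterdam"],
    ["king", "visits", "rotterdam"],
    ["king", "market"],
]

queries = simulated_queries(documents, top_n=1)
print(queries)
```

In the study the cutoff was 2 million terms of each kind. Because no OCR filtering is applied, misrecognized tokens that occur often enough would end up as queries too.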
Real Queries
• User logs collected between March and July 2015 on Delpher, the online web service of the National Library of the Netherlands
• Extracted queries and viewed items related to the newspaper archive
• Total of 957,239 unique queries
RQ1: Detecting and Quantifying Retrievability Bias
Inequality
c=10
Real queries: G_BM25 = 0.97
Simulated queries: G_BM25 = 0.85
A very large fraction of documents is never retrieved.
Inequality
Real queries, c=1000: G_BM25 = 0.76
Simulated queries, c=100: G_BM25 = 0.52
Limitations
• The Lorenz curves and Gini values
  • are strongly influenced by non-retrieved documents,
  • can indicate the degree of bias, but they tell us nothing about the type of bias.
Does the inequality arise from the users’ interest / search behavior? Or from a technological bias towards a particular document feature?
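The first limitation, the sensitivity of G to non-retrieved documents, can be illustrated with a small Python sketch. The numbers are made up for illustration and are not results from the study. The `gini` helper uses the sort-based form of the standard Gini formula.

```python
def gini(values):
    """Gini coefficient computed from the sorted values (requires sum > 0)."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    # Weight each value by its 1-based rank in ascending order.
    weighted = sum(rank * v for rank, v in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

retrieved = [1, 1, 1, 1]            # four documents, equally retrievable
with_zeros = retrieved + [0] * 96   # add 96 documents that are never retrieved

g_retrieved = gini(retrieved)       # 0.0: perfectly equal among retrieved documents
g_all = gini(with_zeros)            # about 0.96: dominated by the zero scores
print(g_retrieved, g_all)
```

In this toy case the high G reflects only the share of documents that are never retrieved, not how scores are spread among the retrieved ones, which is why a single G value says little about the type of bias.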
Retrievability Scores: Meaningful?
• Created 4 subsets of documents according to their score and selected a set of target documents from each subset
• Generated queries from selected documents, tailored to retrieve these specific documents
• Performed search tasks and measured ranks of target documents
• Showed that documents with a lower score are actually harder to find
[Figure labels: Rarely, Sometimes, Often, Very often]
RQ2: Influence of Document Features
OCR Confidence Scores
• Generated by OCR engine during digitization
• Documents ordered by page confidence (PC) and split into bins
• Mean score per bin
[Figure: mean r(d) per bin (y-axis, 0.5 to 2.0) over bins based on page confidence (PC) (x-axis, 0 to 5000).]
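The binning procedure above can be sketched in Python as follows. This is a minimal illustration, assuming equal-sized bins (the deck states a bin size of 20,000 for the document-length analysis) and a last bin that may be smaller; the function name `mean_score_per_bin` and the toy values are hypothetical.

```python
def mean_score_per_bin(docs, bin_size):
    """docs: list of (feature_value, r_d) pairs. Returns the mean r(d) of each bin."""
    # Order documents by the feature, for example page confidence.
    ordered = sorted(docs, key=lambda pair: pair[0])
    means = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        means.append(sum(score for _, score in chunk) / len(chunk))
    return means

# Toy data (hypothetical): (page confidence, retrievability score).
docs = [(0.95, 4), (0.40, 0), (0.70, 2), (0.55, 1), (0.85, 3), (0.30, 1)]

bin_means = mean_score_per_bin(docs, bin_size=2)
print(bin_means)
```

The same function works for any document feature, since only the first element of each pair is used for ordering.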
Document Length
• Documents ordered by length and split into bins of 20,000
• LM1000 (left): upward trend, longer documents more retrievable
• BM25 and TFIDF (right): seem to be better at retrieving documents of medium length
[Figure: two plots of mean r(d) per bin (y-axis) over bins based on document length (x-axis, 0 to 5000); LM1000 on the left, BM25 and TFIDF on the right.]
●●
●●
●●●●●●
●●●●●●
●●●●●●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●●●
●
●●
●●●
●●
●●
●●●●
●
●●
●●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●●
●
●
●
●
●●
●●
●●●●●●
●●●●●
●●
●●
●
●
●●
●●
●
●●●●●●●●
●
●
●
●
●●●●●●
●●●
●
●●●
●
●
●●
●
●●●
●
●
●
●●
●
●
●●●●
●
●
●
●
●●
●●●
●
●
●●
●●●
●
●
●●●●●●●
●●
●●●
●
●●
●
●●●●●●
●●●
●
●
●
●
●●
●
●●●●●●●
●
●●●●●
●
●●●
●●●●●●
●
●●●●●●●●
●●●
●●●●
●
●●●●
●●●●●●
●
●●●
●
●●●
●
●
●●
●
●
●●●●●●
●
●
●●
●
●
●●●●●●
●
●●●●●●●●●
●●●●
●
●
●●
●
●●
●●●
●
●
●
●●●●●●●●●
●●
●
●●●●●
●●●
●●●
●
●
●●●●
●●●●●●●●●●●●●●●
●
●●
●
●●
●
0.0
0.5
1.0
1.5
0 1000 2000 3000 4000 5000
Bins based on document length
Meanr(d)perbin
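A minimal sketch of how a mean r(d) per bin plot over document length could be computed. It assumes that r(d) is the number of queries for which document d is retrieved within the rank cutoff, and that bins hold equal numbers of documents sorted by length; the slides do not state the binning scheme, and the function name and inputs are hypothetical.

```python
def mean_r_per_bin(doc_lengths, r_scores, docs_per_bin):
    """Return the mean r(d) for consecutive bins of documents sorted by length.

    doc_lengths: dict mapping document id -> length (e.g. number of terms)
    r_scores:    dict mapping document id -> r(d); missing ids count as 0
    docs_per_bin: number of documents per bin (assumed equal-population bins)
    """
    # Sort document ids by length; ties are broken by id for determinism.
    ordered = sorted(doc_lengths, key=lambda d: (doc_lengths[d], d))
    means = []
    for start in range(0, len(ordered), docs_per_bin):
        chunk = ordered[start:start + docs_per_bin]
        # Documents that were never retrieved contribute r(d) = 0.
        means.append(sum(r_scores.get(d, 0) for d in chunk) / len(chunk))
    return means


lengths = {"a": 10, "b": 20, "c": 300, "d": 400}
scores = {"b": 1, "c": 2, "d": 4}
print(mean_r_per_bin(lengths, scores, docs_per_bin=2))
```

A rising sequence of bin means would indicate that longer documents are retrieved more often than shorter ones.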
18
RQ3:
Representativeness of
Simulated Queries and
Experimental Setup
19
Top retrieved article for real queries
20
Top retrieved article(s) for simulated queries
21
Differences between query sets
• Real queries:
• Mean length: 2.32 terms
• Unique terms: 253,637
• 56 references to persons or locations in top 100 terms
• Simulated queries:
• Mean length: 1.5 terms
• Unique terms: 2,028,617
• 5 references to persons or locations in top 100 terms
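A minimal sketch of how the descriptive statistics of a query set (mean length, unique terms, most frequent terms) could be computed. Whitespace tokenisation and lowercasing are assumptions, and the function name is hypothetical; counting references to persons or locations would additionally need named-entity annotation, which is not covered here.

```python
from collections import Counter


def query_set_stats(queries, top_n=100):
    """Summarise a list of query strings.

    Tokenisation is a simple lowercase whitespace split (an assumption;
    the study may have used a different analyser).
    """
    term_counts = Counter()
    total_terms = 0
    for query in queries:
        terms = query.lower().split()
        total_terms += len(terms)
        term_counts.update(terms)
    mean_length = total_terms / len(queries) if queries else 0.0
    return {
        "mean_length": mean_length,
        "unique_terms": len(term_counts),
        "top_terms": [term for term, _ in term_counts.most_common(top_n)],
    }


sample = ["amsterdam 1945", "koningin wilhelmina amsterdam", "brand"]
print(query_set_stats(sample, top_n=3))
```

Running the same function on the real and the simulated query sets gives directly comparable figures.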
22
Actual views
[Figure: counts of documents (log-scaled axis, 1 to 1,000,000) by number of views (5 to 700)]
• Only 2.7M out of 102M documents were viewed by users (G = 0.98)
• most documents have not been viewed at all
• many documents only viewed once
• very few are viewed multiple times
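A minimal sketch of how an inequality value such as G could be computed from per-document view counts, assuming G denotes the Gini coefficient (0 means all documents are viewed equally often, values near 1 mean the views concentrate on very few documents). The function name and the toy data are illustrative only.

```python
def gini(values):
    """Gini coefficient of a list of non-negative counts.

    Uses G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n,
    with x_i sorted in ascending order and i starting at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no documents or no views: treat as perfectly equal
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Toy collection: 97 documents never viewed, 3 documents viewed once.
views = [0] * 97 + [1] * 3
print(round(gini(views), 2))
```

Including the documents with zero views in the input matters: leaving them out would hide most of the inequality.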
23
Overlap with views
• How many documents were viewed by the users, but not retrieved in our study?
• Many non-retrieved documents
• were found using facets or operators
• scored a rank just below the cutoff
• A better representation of the real search engine is needed, taking faceted search and operators into account
[Figure: chart of retrieved vs. non-retrieved documents at cutoffs c=10, c=100, and c=1000 (y-axis 0 to 3)]
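A minimal sketch of how the overlap between viewed and retrieved documents could be checked at different cutoffs c. It assumes a set of viewed document ids taken from the logs and, for each query, a ranked list of retrieved document ids; all names and the toy data are hypothetical.

```python
def viewed_but_not_retrieved(viewed, ranked_results, cutoff):
    """Return viewed documents that no query retrieves within the top `cutoff` ranks.

    viewed:         set of document ids that users opened
    ranked_results: dict mapping query -> list of document ids, best rank first
    cutoff:         rank cutoff c
    """
    retrieved = set()
    for ranking in ranked_results.values():
        retrieved.update(ranking[:cutoff])
    return viewed - retrieved


rankings = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d4", "d1"],
}
viewed_docs = {"d2", "d3", "d5"}
for c in (1, 2, 3):
    missing = viewed_but_not_retrieved(viewed_docs, rankings, c)
    print(c, sorted(missing))
```

Documents that drop out of the missing set as c grows are the ones that scored a rank just below a smaller cutoff; documents that stay missing at every cutoff were reached some other way, for example through facets or operators.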
24
Document Types Viewed
Document type    Simulated   Real   Viewed
Article          3.89        0.90   2.61%
Advertisement    3.32        0.51   2.07%
Notification     3.22        4.80   40.10%
Caption          3.06        0.84   4.01%
25
Conclusions
• Real and simulated queries differ with regard to
• composition of query sets
• number of (unique) terms used
• use of named entities
• Apart from document length and page confidence, we did not find strong evidence for technical bias
• Using real queries is important for realistic results
• Simulation strategies for queries need to be improved
• Retrievability studies should take faceted search and operators into account
26
We would like to thank the
for making the newspaper corpus and the
(sensitive) user data available to us for research.
travel grant
Supported by
Querylog-based Assessment of
Retrievability Bias in a Large
Newspaper Corpus
