WSDM Tutorial, February 2nd 2015
Grace Hui Yang, Marc Sloan, Jun Wang
Guest Speaker: Charlie Clarke
Dynamic Information Retrieval Modeling
Belief Updates (B) – example continued
 q1 = “best US destinations”, observation = NRR
 q2 = “distance New York Boston”, observation = RT
 q3 = “maps.bing.com…
Coffee Break
Apply an MDP to an IR Problem – Example
 User agent in session sea…
 The agent uses a state estimator to update its belief about the hidden states: b′ = SE(b, a, o′)
 b′(s′) = P(s′ | o′, a, b…
POMDP → Bellman Equation
 The Bellman equation for a POMDP:
    V(b) = max_a [ Σ_s b(s) R(s, a) + γ Σ_ω P(ω | b, a) V(b′) ]
Applying POMDP to Dynamic IR
POMDP → Dynamic IR mapping: Environment → Documents…
Session Search Example – States
 SRT: Relevant & Exploitation
 SRR: Relevant & Exploration
 SNRT: Non-Relevant & Exploitation
 SNRR: Non-Relevant & Exploration
Session Search Example – Actions (Au, Ase)
 User Action (Au)
 Add query terms (+Δq)
 Remove query terms (−Δq)
 Keep query terms (qtheme)…
TREC Session Tracks (2010-2012)
 Given a series of queries {q1, q2, …, qn}, top 10 retrieval results {D1, …, Di−1} for q1 to qi−1…
Query change is an important form of feedback
 We define query change as the syntactic editing changes between two adjacent queries…
Where do these query changes come from?
 Given TREC Session settings, we consider two sources of query change:
 the previous…
Previous search results could influence query change in quite complex ways
 Merck lobbyists → Merck lobbying US policy
 …
POMDP
 Rich Interactions ↔ actions
 Hidden, Evolving Information Needs ↔ hidden stat…
 A Long Term Goal
 Temporal Dependency
Recap – Characteristics of Dynamic IR
 Rich interactions: query formulation…
Modeling Query Change
 A framework that is inspired by Reinforcement Learning
 Reinforcement Learning for Markov Decision…
Outline
 Introduction & Theory
 Session Search
 Dynamic Ranking…
 Markov Process
 Hidden Markov Model
 Markov Decision Process
 …
Multi Armed Bandits (MAB)
Which slot machine should I select…
I won! Is this the best slot machine? Re…
MAB Definition
 A tuple (S, A, R, B)
 S: hidden reward distributions…
Comparison with Markov Models
 Single-state Markov Decision Process…
MAB Policy Reward
 An MAB algorithm describes a policy 𝜋 for choosing…
Exploration vs Exploitation
 Exploration: try out bandits to find…
MAB – Index Algorithms
 Gittins index1
 Play the bandit with the highest…
Comparison of Markov Models
 Markov Process – a fully observable…
Outline
 Introduction & Theory
 Session Search
 Dynamic Ranking…
UCB Algorithm
    x̄_i + √(2 ln t / T_i)
 Calculate for all i and select the highest…
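To make the index concrete, here is a minimal UCB1 sketch in Python; the Bernoulli arm probabilities and the pull budget are illustrative assumptions, not values from the tutorial.

```python
import math
import random

def ucb1(arm_probs, n_pulls=1000):
    """Minimal UCB1: pull every arm once, then always pull the arm with
    the highest index  x_bar_i + sqrt(2 ln t / T_i)."""
    n = len(arm_probs)
    counts = [0] * n    # T_i: how many times arm i has been pulled
    means = [0.0] * n   # x_bar_i: empirical mean reward of arm i
    for t in range(1, n_pulls + 1):
        if t <= n:
            i = t - 1   # initialisation round: try each arm once
        else:
            i = max(range(n), key=lambda j: means[j]
                    + math.sqrt(2.0 * math.log(t) / counts[j]))
        reward = 1.0 if random.random() < arm_probs[i] else 0.0  # Bernoulli arm
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # incremental mean
    return means, counts

# Example: three slot machines with hidden success rates.
print(ucb1([0.2, 0.5, 0.7]))
```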
Iterative Expectation
 Documents i
 Average proba…
 The UCB index adapted step by step:
    x̄_i + √(2 ln t / T_i)  →  r̄_i + √(2 ln t / T_i)  →  r̄_i + √(2 ln t / γ_i(t))  →  r̄_i + λ √(2 ln t / γ_i(t))
M. Sloan and J. Wang ’1…
Portfolio Theory of IR
 Portfolio Theory maximises expected return…
Portfolio Ranking
 Documents are dependent on each other
 Co-click…
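As a sketch of how this dependence can enter a ranking rule, the greedy mean-variance ranker below penalises each candidate's variance and its covariance with already-ranked documents. The risk parameter b and the reuse of the toy example's covariance matrix are assumptions for illustration; Portfolio IR's full objective additionally weights rank positions.

```python
import numpy as np

def portfolio_rank(means, cov, b=2.0):
    """Greedy mean-variance ranking: at each position pick the document
    maximising  E[relevance] - b * (own variance + 2 * covariance with the
    documents already ranked)."""
    ranked, remaining = [], set(range(len(means)))
    while remaining:
        def objective(i):
            risk = cov[i, i] + 2.0 * sum(cov[i, j] for j in ranked)
            return means[i] - b * risk
        best = max(remaining, key=objective)
        ranked.append(best)
        remaining.remove(best)
    return ranked

# The four-image "jaguar" toy example, reusing the tutorial's covariance matrix.
means = np.array([0.5, 0.51, 0.9, 0.49])
cov = np.array([[1.0, 0.8, 0.1, 0.0],
                [0.8, 1.0, 0.1, 0.0],
                [0.1, 0.1, 1.0, 0.95],
                [0.0, 0.0, 0.95, 1.0]])
print(portfolio_rank(means, cov))
```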
Outline
 Introduction & Theory
 Session Search
 Dynamic Ranking…
Multi Page Search
[Diagram: rankings over Page 1 and Page 2]
X Jin, M. Sloan and J. Wang ’13
Multi Page Search Example – States & Actions
State: relevance of…
Model
 Prior relevance estimates: N(θ¹, Σ¹)
 θ¹ – prior estimate of relevance
 Σ¹ – prior es…
 Rank action for page 1
 Feedback from page 1: r ~ N(θ_s¹, Σ_s¹)
 Update estimates using r¹: θ¹ and Σ¹ are conditioned on the observed feedback (partitioned-Gaussian update of θ_s′ and Σ_s′…)
 Rank using PRP
 Utility of ranking:
    λ Σ_{j=1}^{M} θ_{s_j}¹ / log₂(j+1) + (1 − λ) Σ_{j=1}…
Model – Bellman Equation
 Optimize s¹ to improve U_s²
 V(θ¹, Σ…
λ
 Balances exploration and exploitation in page 1
 Tuned for di…
Approximation
 Monte Carlo Sampling
 ≈ max_{s¹} [ λ θ_s¹ · W¹ + max_{s²}…
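A Monte Carlo sketch of this two-page lookahead, assuming Gaussian feedback and a myopic second page; θ, the covariance scale, λ, and the sample count are illustrative, and the paper's utility uses DCG-style position weights rather than the plain sums here.

```python
import itertools
import numpy as np

def plan_page_one(theta, cov, lam=0.5, per_page=2, n_samples=200):
    """For each candidate page-1 ranking s1, sample feedback r ~ N(theta_s1,
    cov_s1), condition the Gaussian on it, and score page 2 myopically on
    the posterior means.  Returns the page-1 set with the best two-page value."""
    docs = list(range(len(theta)))
    best_s1, best_val = None, -np.inf
    for s1 in itertools.permutations(docs, per_page):
        s1 = list(s1)
        rest = [d for d in docs if d not in s1]
        inv = np.linalg.inv(cov[np.ix_(s1, s1)])
        value = lam * theta[s1].sum()          # immediate page-1 utility
        for _ in range(n_samples):
            r = np.random.multivariate_normal(theta[s1], cov[np.ix_(s1, s1)])
            # posterior mean of unshown documents given simulated feedback
            post = theta[rest] + cov[np.ix_(rest, s1)] @ inv @ (r - theta[s1])
            value += (1 - lam) * np.sort(post)[-per_page:].sum() / n_samples
        if value > best_val:
            best_s1, best_val = s1, value
    return best_s1, best_val

theta = np.array([0.5, 0.51, 0.9, 0.49])       # toy "jaguar" relevances
cov = 0.05 * np.array([[1.0, 0.8, 0.1, 0.0], [0.8, 1.0, 0.1, 0.0],
                       [0.1, 0.1, 1.0, 0.95], [0.0, 0.0, 0.95, 1.0]])
print(plan_page_one(theta, cov))
```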
Experiment Data
 Difficult to evaluate without access to live users…
User Simulation
 Rank M documents
 Simulated user clicks according…
Investigating λ
Baselines
 λ determined experimentally
 BM25
 BM25 with conditi…
Results
[Result figures]
Outline
 Introduction & Theory
 Session Search
 Dynamic Ranking…
Interactive Recommender Systems
 Cold-start problem in recommender systems
 Possible solutions
Objective
 An interactive mechanism for CF for the cold-start problem
Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. “Interactive collaborative filtering.” CIKM 2013.
Proposed EE algorithms
 Thompson Sampling
 Linear-UCB
 General Linear-UCB
Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. “Interactive collaborative filtering.” CIKM 2013.
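For flavour, a generic Beta-Bernoulli Thompson sampling sketch; the cited paper applies these exploration-exploitation ideas to latent-factor collaborative filtering models rather than this simple per-item model, so the code only illustrates the exploration mechanism.

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling: sample a plausible click rate for
    every item from its Beta posterior, recommend the argmax, then update
    the posterior with the observed click."""
    def __init__(self, n_items):
        self.alpha = [1.0] * n_items  # prior "successes" + 1
        self.beta = [1.0] * n_items   # prior "failures" + 1

    def recommend(self):
        draws = [random.betavariate(a, b)
                 for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, item, clicked):
        if clicked:
            self.alpha[item] += 1.0
        else:
            self.beta[item] += 1.0

# Simulate a cold-start user with hidden preferences over three items.
prefs = [0.1, 0.3, 0.8]
ts = ThompsonSampler(len(prefs))
for _ in range(500):
    i = ts.recommend()
    ts.update(i, random.random() < prefs[i])
print(ts.alpha, ts.beta)
```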
Cold-start users
Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. “Interactive collaborative filtering.” CIKM 2013.
Ad selection problem
 How online publishers could optimally select…
Problem formulation
Objective function
Belief update
Results
Sequential Selection of Correlated Ads by POMDPs. Shuai Yuan…
Outline
 Introduction & Theory
 Session Search
 Dynamic Ranking…
Dynamic Information Retrieval Evaluation
Guest talk at the WSDM 2015 tutorial on Dynamic Information Retrieval Modeling
Charles Clarke, University of Waterloo
Moving from static ranking to dynamic domains
• How to extend IR evaluation methodologies to dynamic domains?
• Three key …
Evaluating Information Access Systems
searching, browsing, summarization, visualization…
Performance 101: Is this a good search result?
How to evaluate?
Study users
Users in the wild:
• A/B Testing
• Result interleaving…
Unfortunately user studies are
• Slow
• Expensive
• Conditions can never be exactly duplicated (e.g., learning to rank)
Alternative: User performance prediction
Can we predict the impact of a proposed change to an information access system (w…
Traditional Evaluation of Rankers
• Test collection:
– Documents
– Queries
– Relevance judgments
• Each ranker generates a...
Example of a good-old-fashioned IR Metric
1. Non-relevant
2. Relevant
3. Non-relevan…
General form of effectiveness measures
Nearly all standard effectiveness measures have the same basic form (including nDCG…
Implicit user model…
• User works down the ranked list spending equal time on each document. Captions, navigation, etc., h…
Traditional Evaluation of Rankers
• Many effectiveness measures: precision, recall, average precision, rank-biased precision…
How to better reflect user variation and system performance?
Example: What’s th…
Reading speed distribution (from users in the lab)
Empirical distribution of reading speeds…
Stopping time distribution (from users in the wild)
Empirical distribution of stopping times…
Evaluating a search result
1) Generate a reading speed from the distribution
2) …
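A hedged sketch of one such simulated user pass; the log-normal speed draw, the document lengths, and the exponential give-up rule stand in for the empirical lab and log distributions on the preceding slides.

```python
import random

def simulate_user(ranked_docs, mean_speed=25.0, halflife=200.0):
    """One simulated pass over a ranked list: draw a per-user reading speed,
    scan down the list, and give up according to an exponential stopping
    rule.  Returns (time spent, useful characters read)."""
    speed = mean_speed * random.lognormvariate(0.0, 0.3)  # chars per second
    time_spent, useful = 0.0, 0.0
    for n_chars, relevant in ranked_docs:   # (length in chars, is_relevant)
        read_time = n_chars / speed
        if random.random() > 0.5 ** (read_time / halflife):
            break                           # user abandons the list here
        time_spent += read_time
        if relevant:
            useful += n_chars               # "useful characters read"
    return time_spent, useful

docs = [(1200, True), (900, False), (1500, True)]
print(simulate_user(docs))
```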
Useful characters read vs. Characters read
Useful characters read vs. Time spent reading
Time well spent vs. Time spent reading
Distribution of time well spent
Temporal precision vs. Time spent reading
Distribution of temporal precision
[Figures: performance of run york04ha1 on TREC 2004 HARD…]
General Framework (Part I): Cumulative Gain
• Consider the performance of a system in terms of a cost-benefit (cumulative …
General Framework (Part II): Decay
• Consider the user’s willingness to invest time in terms of a decay curve D(t), which …
General form of effectiveness measures (REMINDER)
Nearly all standard effectiveness measures have the same basic form (inc…
General Framework (Part III): Time-biased gain
Overall system performance may be expressed as expected cumulative gain (wh…
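A minimal time-biased gain sketch under an assumed exponential decay D(t) = exp(−t ln 2 / h); the gains, arrival times, and half-life below are illustrative, not values from the talk.

```python
import math

def time_biased_gain(gains, arrival_times, halflife=224.0):
    """Expected cumulative gain where each relevant document's gain is
    discounted by the decay curve D(t) = exp(-t ln2 / h): the probability
    that a user is still reading at the time t the document is reached."""
    return sum(g * math.exp(-t * math.log(2.0) / halflife)
               for g, t in zip(gains, arrival_times))

# Three relevant documents reached after 30s, 90s and 260s of reading.
print(time_biased_gain([1.0, 1.0, 1.0], [30.0, 90.0, 260.0]))
```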
General Framework (Part IV): Multiple users
• Cumulative gain may be computed by
– Simulation (drawing a set of parameters…
General Framework
Most of the evaluation proposals in the references can be reformulated in terms of this general framework…
Session search example
• Two (or more) result lists, e.g., from query reformulation, query suggestion, or switching search…
Simulation of searchers switching between lists: A vs. B
User starts on list A.…
[Figure: simulated switching between lists A and B]
Summary
• Primary goal of IR evaluation: Predict how changes to an IR system will impact the user experience.
• Evaluation…
A few key papers
• Leif Azzopardi. 2009. Usage based effectiveness measures: monitoring application performance in information retrieval…
A few more key papers
• Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance…
And yet more key papers
• Tetsuya Sakai and Zhicheng Dou. 2013. Summaries, ranked retrieval and sessions: a unified framework for information access evaluation…
Dynamic Information Retrieval Evaluation
Guest talk at the WSDM 2015 tutorial on Dynamic Information Retrieval Modeling
Charles Clarke, University of Waterloo
Outline
 Introduction & Theory
 Session Search
 Dynamic Ranking…
Apply an MDP to an IR Problem
 We can model IR systems using a Markov Decision Process…
Apply an MDP to an IR Problem – Example
 User agent in session se…
Apply an MDP to an IR Problem – Example
 Search engine’s perspect…
Applying POMDP to Dynamic IR
POMDP → Dynamic IR mapping: Environment → Documents…
SIGIR Tutorial, July 7th 2014
Grace Hui Yang, Marc Sloan, Jun Wang
Guest Speaker: Emine Yilmaz
Dynamic Information Retrieval Modeling
Outline
 Introduction & Theory
 Session Search
 Dynamic Ranking…
Conclusions
 Dynamic IR describes a new class of interactive models…
Dynamic IR Book
 Published by Morgan & Claypool
 ‘Synthesis Lect…
TREC 2015 Dynamic Domain Track
 Co-organized by Grace Hui Yang, John Frank, Ian Soboroff
 Underexplored subsets of Web c…
Task
 An interactive search with multiple runs
 Starting point: the system is given a search query
 Iterate:
 System returns…
Domains
Domain          Corpus
Illicit goods   30k forum posts from 5-10 forums (total ~300k posts); which users are working together …
Timeline
 TREC Call for Participation: January 2015
 Data Available: March
 Detailed Guidelines: April/May
 Topics, T…
TREC 2015 Total Recall Track
 Co-organized by Gord Cormack, Maura Grossman, Adam Roegiest, Charlie Clarke
 Explores hi…
Acknowledgment
 We thank Prof. Charlie Clarke for his guest l…
References
Static IR
 Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto…
Interactive IR
 Relevance Feedback in Information Retrieval. J. J. Rocchio…
Dynamic IR
 A hidden Markov model information retrieval system…
 Mortal multi-armed bandits. Deepayan Chakrabarti…
 Using Control Theory for Stable and Efficient Recommender Systems…
 Win-win search: Dual-agent stochastic game in session search…
Markov Processes
 A markovian decision process. R. Bellman…
 Learning to predict by the methods of temporal differences…
 Finding approximate POMDP solutions…
 The optimal control of partially observable Markov processes…
Dynamic Information Retrieval Tutorial - WSDM 2015


In Dynamic Information Retrieval modeling we model dynamic systems which change or adapt over time or a sequence of events using a range of techniques from artificial intelligence and reinforcement learning. Many of the open problems in current IR research can be described as dynamic systems, for instance, session search or computational advertising. State of the art research provides solutions to these
problems that are responsive to a changing environment, learn from past interactions and predict future utility. Advances in IR interface, personalization and ad display demand models that can react to users in real time and in an intelligent, contextual way.

The objective of this tutorial is to provide a comprehensive and up-to-date introduction to Dynamic Information Retrieval Modeling. We motivate a conceptual model linking static, interactive and dynamic retrieval and use this to define dynamics within the context of IR. We then cover a number of algorithms and techniques from the
artificial intelligence (AI) and online learning literature such as Markov Decision Processes (MDP), their partially observable variation (POMDP) and multi-armed bandits. Following this we describe how to identify dynamics in an IR problem and demonstrate how to model them using the described techniques. The remainder of the tutorial will then cover an array of state-of-the-art research on dynamic systems in IR and how they can be modeled using dynamic IR. We use research on session search, multi-page search and online advertising as in-depth examples of such work.

An older version of this tutorial presented at SIGIR 2014 can be found here http://www.slideshare.net/marcCsloan/dynamic-information-retrieval-tutorial
That older version had a greater emphasis on the underlying theory and included a guest lecture on evaluation by Dr Emine Yilmaz. This newer version presents a wider range of applications of DIR in state-of-the-art research and includes a guest lecture on evaluation by Prof Charles Clarke.

http://www.dynamic-ir-modeling.org/

@inproceedings{Yang:2015:DIR:2684822.2697038,
author = {Yang, Hui and Sloan, Marc and Wang, Jun},
title = {Dynamic Information Retrieval Modeling},
booktitle = {Proceedings of the Eighth ACM International Conference on Web Search and Data Mining},
series = {WSDM '15},
year = {2015},
isbn = {978-1-4503-3317-7},
location = {Shanghai, China},
pages = {409--410},
numpages = {2},
url = {http://doi.acm.org/10.1145/2684822.2697038},
doi = {10.1145/2684822.2697038},
acmid = {2697038},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {dynamic information retrieval modeling, probabilistic relevance model, reinforcement learning},
}

Dynamic Information Retrieval Tutorial - WSDM 2015

  1. 1. WSDM Tutorial February 2nd 2015 Grace Hui Yang Marc Sloan Jun Wang Guest Speaker: Charlie Clarke Dynamic Information Retrieval Modeling
  2. 2. Dynamic Information Retrieval Dynamic Information Retrieval Modeling Tutorial 20152 Document s to explore Informatio n need Observed document s User Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren’t, and satisfy their information need.
  3. 3. Evolving IR Dynamic Information Retrieval Modeling Tutorial 20153  Paradigm shifts in IR as new models emerge  e.g. VSM → BM25 → Language Model  Different ways of defining relationship between query and document  Static → Interactive → Dynamic  Evolution in modeling user interaction with search engine
  4. 4. Outline Dynamic Information Retrieval Modeling Tutorial 20154  Introduction & Theory  Static IR  Interactive IR  Dynamic IR  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  5. 5. Conceptual Model – Static IR Dynamic Information Retrieval Modeling Tutorial 20155 Static IR Interactive IR Dynamic IR  No feedback
  6. 6. Characteristics of Static IR Dynamic Information Retrieval Modeling Tutorial 20156  Does not learn directly from user  Parameters updated periodically
  7. 7. Dynamic Information Retrieval Modeling Tutorial 20157 Commonly Used Static IR Models BM25 PageRank Language Model Learning to Rank
  8. 8. Feedback in IR Dynamic Information Retrieval Modeling Tutorial 20158
  9. 9. Outline Dynamic Information Retrieval Modeling Tutorial 20159  Introduction & Theory  Static IR  Interactive IR  Dynamic IR  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  10. 10. Conceptual Model – Interactive IR Dynamic Information Retrieval Modeling Tutorial 201510 Static IR Interactive IR Dynamic IR  Exploit Feedback
  11. 11. Learn the user’s taste interactively! At the same time, provide good recommendations! Dynamic Information Retrieval Modeling Tutorial 201511 Interactive Recommender Systems
  12. 12. Toy Example Dynamic Information Retrieval Modeling Tutorial 201512  Multi-Page search scenario  User image searches for “jaguar”  Rank two of the four results over two pages: 𝑟 = 0.5 𝑟 = 0.51 𝑟 = 0.9𝑟 = 0.49
  13. 13. Toy Example – Static Ranking Dynamic Information Retrieval Modeling Tutorial 201513  Ranked according to PRP Page 1 Page 2 1. 2. 𝑟 = 0.9 𝑟 = 0.51 1. 2. 𝑟 = 0.5 𝑟 = 0.49
  14. 14. Toy Example – Relevance Feedback  Interactive Search  Improve 2nd page based on feedback from 1st page  Use clicks as relevance feedback  Rocchio1 algorithm on terms in image webpage:
    w_q′ = α·w_q + (β/|D_r|) Σ_{d∈D_r} w_d − (γ/|D_n|) Σ_{d∈D_n} w_d
     New query closer to relevant documents and different to non-relevant documents. 1Rocchio, J. J., ’71; Baeza-Yates & Ribeiro-Neto ‘99 (a minimal sketch of this update follows below)
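A minimal Rocchio sketch over term-weight vectors; the α/β/γ values are common textbook defaults and the three-term toy vectors are invented for illustration, not taken from the tutorial.

```python
import numpy as np

def rocchio(w_q, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update on term-weight vectors:
    w_q' = alpha*w_q + (beta/|Dr|)*sum(Dr) - (gamma/|Dn|)*sum(Dn)."""
    w = alpha * w_q
    if len(rel_docs):
        w = w + beta * np.mean(rel_docs, axis=0)
    if len(nonrel_docs):
        w = w - gamma * np.mean(nonrel_docs, axis=0)
    return np.clip(w, 0.0, None)  # negative term weights are usually dropped

q = np.array([1.0, 0.0, 0.0])           # query: the term "jaguar" only
clicked = np.array([[0.9, 0.8, 0.0]])   # terms of the clicked (car) page
skipped = np.array([[0.8, 0.0, 0.9]])   # terms of the skipped (animal) page
print(rocchio(q, clicked, skipped))
```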
  15. 15. Toy Example – Relevance Feedback Dynamic Information Retrieval Modeling Tutorial 201515  Ranked according to PRP and Rocchio Page 1 Page 2 2. 𝑟 = 0.9 𝑟 = 0.51 1. 2. 𝑟 = 0.5 𝑟 = 0.49 * 1. * Click
  16. 16. Toy Example – Relevance Feedback Dynamic Information Retrieval Modeling Tutorial 201516  No click when searching for animals Page 1 Page 2 2. 𝑟 = 0.9 𝑟 = 0.51 1. 2. 1. ? ?
  17. 17. Toy Example – Value Function  Optimize both pages using dynamic IR  Bellman equation for value function  Simplified example:
    V_t(θ^t, Σ^t) = max_{s^t} [ θ_s^t + E(V_{t+1}(θ^{t+1}, Σ^{t+1}) | C^t) ]
     θ^t, Σ^t = relevance and covariance of documents for page t  C^t = clicks on page t  V_t = ‘value’ of ranking on page t  Maximize value over all pages based on estimated feedback. X Jin, M. Sloan and J. Wang ’13
  18. 18. Toy Example – Covariance  Covariance matrix represents similarity between images:
    ⎡ 1     0.8   0.1   0    ⎤
    ⎢ 0.8   1     0.1   0    ⎥
    ⎢ 0.1   0.1   1     0.95 ⎥
    ⎣ 0     0     0.95  1    ⎦
    X Jin, M. Sloan and J. Wang ’13
  19. 19. Toy Example – Myopic Value Dynamic Information Retrieval Modeling Tutorial 201519  For myopic ranking, 𝑉2 = 16.380 Page 1 2. 1. X Jin, M. Sloan and J. Wang ’13
  20. 20. Toy Example – Myopic Ranking Dynamic Information Retrieval Modeling Tutorial 201520  Page 2 ranking stays the same regardless of clicksPage 1 Page 2 2. 1. 2. 1. X Jin, M. Sloan and J. Wang ’13
  21. 21. Toy Example – Optimal Value Dynamic Information Retrieval Modeling Tutorial 201521  For optimal ranking, 𝑉2 = 16.528 Page 1 2. 1. X Jin, M. Sloan and J. Wang ’13
  22. 22. Toy Example – Optimal Ranking Dynamic Information Retrieval Modeling Tutorial 201522  If car clicked, Jaguar logo is more relevant on next pagePage 1 Page 2 2. 1. 2. 1. X Jin, M. Sloan and J. Wang ’13
  23. 23. Toy Example – Optimal Ranking Dynamic Information Retrieval Modeling Tutorial 201523  In all other scenarios, rank animal first on next pagePage 1 Page 2 2. 1. 2. 1. X Jin, M. Sloan and J. Wang ’13
  24. 24. Static IR Visualization  [Scatter plot of documents in a vector space: X = docs about apple fruit, O = docs about apple iphone, plus a doc about apple ceo]  Documents exist in vector space. Marc Sloan and Jun Wang, Dynamic Ranking Principle, Under submission, 2015
  25. 25. Static IR Visualization  [Same plot with query Q]  t = 1: Static IR considers Relevancy
  26. 26. Static IR Visualization  [Same plot]  t = 1: Static IR considers Relevancy
  27. 27. Interactive IR Update  [Plot: Q moves to Q′ after +1/−1 relevance feedback]  t = 1: Static IR considers Relevancy; t = 2: Interactive considers local gain
  28. 28. Interactive IR Update  [Same plot]  t = 1: Static IR considers Relevancy; t = 2: Interactive considers local gain
  29. 29. Dynamic Ranking Principle  [Plot]  t = 1: Relevancy + Variance
  30. 30. Dynamic Ranking Principle  [Plot]  t = 1: Relevancy + Variance + |Correlations|
  31. 31. Dynamic Ranking Principle  [Plot]  t = 1: Relevancy + Variance + |Correlations| – Diversified, exploratory relevance ranking
  32. 32. Dynamic Ranking Principle  [Plot]  t = 1: Relevancy + Variance + |Correlations| – Diversified, exploratory relevance ranking; t = 2: Personalized Re-ranking
  33. 33. Interactive vs Dynamic IR Dynamic Information Retrieval Modeling Tutorial 201533 • Treats interactions independently • Responds to immediate feedback • Static IR used before feedback received • Optimizes over all interaction • Long term gains • Models future user feedback • Also used at beginning of interaction Interactive Dynamic
  34. 34. Interactive & Dynamic Techniques Dynamic Information Retrieval Modeling Tutorial 201534 • Rocchio equation in Relevance Feedback • Collaborative filtering in recommender systems • Active learning in interactive retrieval • POMDP in multi page search and ad recommendati on • Multi Armed Bandits in Online Evaluation • MDP in session search Interactive Dynamic
  35. 35. Outline Dynamic Information Retrieval Modeling Tutorial 201535  Introduction & Theory  Static IR  Interactive IR  Dynamic IR  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  36. 36. Conceptual Model – Interactive IR Dynamic Information Retrieval Modeling Tutorial 201536 Static IR Interactive IR Dynamic IR  Explore and exploit Feedback
  37. 37. Characteristics of Dynamic IR Dynamic Information Retrieval Modeling Tutorial 201537 Rich interactions Query formulation Document clicks Document examination Eye movement Mouse movements etc. [Luo et al., IRJ under revision 2014]
  38. 38. Characteristics of Dynamic IR Dynamic Information Retrieval Modeling Tutorial 201538 Temporal dependency clicked documentsquery D1 ranked documents q1 C1 D2 q2 C2 …… …… Dn qn Cn I information need iteration 1 iteration 2 iteration n [Luo et al., IRJ under revision 2014]
  39. 39. Characteristics of Dynamic IR Dynamic Information Retrieval Modeling Tutorial 201539 Overall goal Optimize over all iterations for goal IR metric or user satisfaction Optimal policy [Luo et al., IRJ under revision 2014]
  40. 40. 40/33 Dynamic Information Retrieval Dynamic Relevance Dynamic Users Dynamic Queries Dynamic Documents Dynamic Information Needs Users change behavior over time, user history Topic Trends, Filtering, document content change User perceived relevance changes Changing query definition i.e. ‘Twitter’ Information needs evolve over time Next generation Search Engine
  41. 41. Why Not Existing Supervised Learning for Dynamic IR Modeling?  Lack of enough training data  Dynamic IR problems contain a sequence of dynamic interactions  E.g. a series of queries in a session  Rare to find repeated sequences (close to zero)  Even in large query logs (WSCD 2013 & 2014, query logs from Yandex)  Chance of finding repeated adjacent query pairs is also low:
    Dataset      Repeated Adjacent Query Pairs   Total Adjacent Query Pairs   Repeated Percentage
    WSCD 2013    476,390                         17,784,583                   2.68%
    WSCD 2014    1,959,440                       35,376,008                   5.54%
  42. 42. Our Solution Dynamic Information Retrieval Modeling Tutorial 201542 Try to find an optimal solution through a sequence of dynamic interactions Trial and Error: learn from repeated, varied attempts which are continued until success No (or less) Supervised Learning
  43. 43. Trial and Error Dynamic Information Retrieval Modeling Tutorial 201543  q1 – "dulles hotels"  q2 – "dulles airport"  q3 – "dulles airport location"  q4 – "dulles metrostop"
  44. 44. What is a Desirable Model for Dynamic IR Dynamic Information Retrieval Modeling Tutorial 201544  Model interactions, which means it needs to have place holders for actions;  Model information need hidden behind user queries and other interactions;  Set up a reward mechanism to guide the entire search algorithm to adjust its retrieval strategies;  Represent Markov properties to handle the temporal dependency. A model in Trial and Error setting will do! A Markov Model will do!
  45. 45. Markov Decision Process Dynamic Information Retrieval Modeling Tutorial 201545  MDP extends MC with actions and rewards1 si– state ai – action ri – reward pi – transition probability p0 p1 p2 ……s0 s1 r0 a0 s2 r1 a1 s3 r2 a2 1R. Bellman, ‘57 (S, M, A, R, γ)
  46. 46. Definition of MDP Dynamic Information Retrieval Modeling Tutorial 201546  A tuple (S, M, A, R, γ)  S : state space  M: transition matrix Ma(s, s') = P(s'|s, a)  A: action space  R: reward function R(s,a) = immediate reward taking action a at state s  γ: discount factor, 0< γ ≤1  policy π π(s) = the action taken at state s  Goal is to find an optimal policy π* maximizing the expected total rewards.
  47. 47. Optimality — Bellman Equation  The Bellman equation1 for an MDP is a recursive definition of the optimal (state-)value function V*(·):
    V*(s) = max_a [ R(s, a) + γ Σ_{s′} M_a(s, s′) V*(s′) ]
     Optimal policy:
    π*(s) = argmax_a [ R(s, a) + γ Σ_{s′} M_a(s, s′) V*(s′) ]
    1R. Bellman, ‘57 (a runnable sketch of solving this equation follows below)
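A textbook value-iteration sketch for solving this Bellman equation; the two-state, two-action MDP below is made up for illustration and is not an IR-specific model.

```python
import numpy as np

def value_iteration(M, R, gamma=0.9, tol=1e-6):
    """Solve the Bellman equation by value iteration.
    M[a][s][s'] = P(s'|s,a), R[s][a] = immediate reward.
    Returns the optimal value function V* and a greedy policy pi*."""
    n_actions, n_states = len(M), len(R)
    V = np.zeros(n_states)
    while True:
        # Q[a, s] = R(s, a) + gamma * sum_s' M_a(s, s') V(s')
        Q = np.array([[R[s][a] + gamma * M[a][s] @ V
                       for s in range(n_states)] for a in range(n_actions)])
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Two states, two actions: action 1 tends to move into state 1, which pays.
M = np.array([[[0.9, 0.1], [0.9, 0.1]],    # transitions under action 0
              [[0.1, 0.9], [0.1, 0.9]]])   # transitions under action 1
R = np.array([[0.0, 0.0],                  # R[s][a]: no reward in state 0
              [1.0, 1.0]])                 # reward 1 whenever in state 1
print(value_iteration(M, R))
```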
  48. 48. MDP algorithms Dynamic Information Retrieval Modeling Tutorial 201548  Value Iteration  Policy Iteration  Modified Policy Iteration  Prioritized Sweeping  Temporal Difference (TD) Learning  Q-Learning Model free approaches Model-based approaches [Bellman, ’57, Howard, ‘60, Puterman and Shin, ‘78, Singh & Sutton, ‘96, Sutton & Barto, ‘98, Richard Sutton, ‘88, Watkins, ‘92] Solve Bellman equation Optimal value V*(s) Optimal policy *(s) [Slide altered from Carlos Guestrin’s ML lecture]
  49. 49. Apply an MDP to an IR Problem Dynamic Information Retrieval Modeling Tutorial 201549  We can model IR systems using a Markov Decision Process  Is there a temporal component?  States – What changes with each time step?  Actions – How does your system change the state?  Rewards – How do you measure feedback or effectiveness in your problem at each time step?  Transition Probability – Can you determine this?
  50. 50. Outline Dynamic Information Retrieval Modeling Tutorial 201550  Introduction & Theory  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  51. 51. TREC Session Tracks (2010- now)  Given a series of queries {q1,q2,…,qn}, top 10 retrieval results {D1, … Di-1 } for q1 to qi-1, and click information  The task is to retrieve a list of documents for the current/last query, qn  Relevance judgment is made based on how relevant the documents are for qn, and how relevant they are for information needs for the entire session (in topic description)  no need to segment the sessions 51 Dynamic Information Retrieval Modeling Tutorial 2015
  52. 52. 1.pocono mountains pennsylvania 2.pocono mountains pennsylvania hotels 3.pocono mountains pennsylvania things to do 4.pocono mountains pennsylvania hotels 5.pocono mountains camelbeach 6.pocono mountains camelbeach hotel 7.pocono mountains chateau resort 8.pocono mountains chateau resort attractions 9.pocono mountains chateau resort getting to 10.chateau resort getting to 11.pocono mountains chateau resort directions TREC 2012 Session 6 52 Information needs: You are planning a winter vacation to the Pocono Mountains region in Pennsylvania in the US. Where will you stay? What will you do while there? How will you get there? In a session, queries change constantly Dynamic Information Retrieval Modeling Tutorial 2015
  53. 53. Markov Decision Process  We propose to model session search as a Markov decision process (MDP)  Two agents: the User and the Search Engine 53 [Guan, Zhang and Yang SIGIR 2013]
  54. 54. Settings of the Session MDP  States: Queries  Environments: Search results  Actions:  User actions:  Add/remove/ unchange the query terms  Nicely correspond to our definition of query change  Search Engine actions:  Increase/ decrease /remain term weights 54 [Guan, Zhang and Yang SIGIR 2013]
  55. 55. Search Engine Agent’s Actions
            ∈ Di−1   action      Example
    qtheme  Y        increase    “pocono mountain” in s6
            N        increase    “france world cup 98 reaction” in s28, france world cup 98 reaction stock market → france world cup 98 reaction
    +∆q     Y        decrease    ‘policy’ in s37, Merck lobbyists → Merck lobbyists US policy
            N        increase    ‘US’ in s37, Merck lobbyists → Merck lobbyists US policy
    −∆q     Y        decrease    ‘reaction’ in s28, france world cup 98 reaction → france world cup 98
            N        no change   ‘legislation’ in s32, bollywood legislation → bollywood law
    [Guan, Zhang and Yang SIGIR 2013]
  56. 56. Bellman Equation  In an MDP, it is believed that a future reward is not worth quite as much as a current reward, and thus a discount factor γ ∈ (0,1) is applied to future rewards.  The Bellman Equation gives the optimal value (expected long-term reward starting from state s and continuing with policy π from then on) for an MDP:
    V*(s) = max_a [ R(s, a) + γ Σ_{s′} P(s′ | s, a) V*(s′) ]
  57. 57. Our Tweak  In a MDP, it is believed that a future reward is not worth quite as much as a current reward and thus a discount factor γ ϵ (0,1) is applied to future rewards.  In session search, a past reward is not worth quite as much as a current reward and thus a discount factor γ should be applied to past rewards  We model the MDP for session search in a reverse order 57
  58. 58. Query Change Retrieval Model (QCM)  The Bellman Equation gives the optimal value for an MDP:
    V*(s) = max_a [ R(s, a) + γ Σ_{s′} P(s′ | s, a) V*(s′) ]
     The reward function is used as the document relevance score function and is tweaked backwards from the Bellman equation:
    Score(q_i, d) = P(q_i | d) + γ Σ_a P(q_i | q_{i−1}, D_{i−1}, a) max_{D_{i−1}} P(q_{i−1} | D_{i−1})
    (document relevance score = current reward/relevance score + query transition model × maximum past relevance)
    [Guan, Zhang and Yang SIGIR 2013]
  59. 59. Calculating the Transition Model  According to query change and search engine actions:
    Score(q_i, d) = log P(q_i | d)                                     ← current reward / relevance score
                  + Σ_{t∈qtheme} [1 − P(t | d*_{i−1})] log P(t | d)     ← increase weights for theme terms
                  + Σ_{t∈+∆q, t∉d*_{i−1}} idf(t) log P(t | d)           ← increase weights for novel added terms
                  − Σ_{t∈+∆q, t∈d*_{i−1}} P(t | d*_{i−1}) log P(t | d)  ← decrease weights for old added terms
                  − Σ_{t∈−∆q} P(t | d*_{i−1}) log P(t | d)              ← decrease weights for removed terms
    [Guan, Zhang and Yang SIGIR 2013]
  60. 60. Maximizing the Reward Function  Generate a maximum-rewarded document, denoted d*_{i−1}, from D_{i−1}  That is the document(s) most relevant to q_{i−1}  The relevance score can be calculated as:
    P(q_{i−1} | d_{i−1}) = 1 − Π_{t∈q_{i−1}} {1 − P(t | d_{i−1})},   P(t | d_{i−1}) = #(t, d_{i−1}) / |d_{i−1}|
     From several options, we choose to only use the document with top relevance: max_{D_{i−1}} P(q_{i−1} | D_{i−1})
    [Guan, Zhang and Yang SIGIR 2013]
  61. 61. Scoring the Entire Session  The overall relevance score for a session of queries is aggregated recursively (see the sketch below):
    Score_session(q_n, d) = Score(q_n, d) + γ Score_session(q_{n−1}, d)
                          = Score(q_n, d) + γ [Score(q_{n−1}, d) + γ Score_session(q_{n−2}, d)]
                          = Σ_{i=1}^{n} γ^{n−i} Score(q_i, d)
    [Guan, Zhang and Yang SIGIR 2013]
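A minimal sketch of this recursive aggregation; the discount value and the per-query scores are illustrative assumptions.

```python
def session_score(per_query_scores, gamma=0.92):
    """Recursive session-level aggregation:
    Score_session(q_n) = Score(q_n) + gamma * Score_session(q_{n-1}),
    which unrolls to sum_i gamma^(n-i) * Score(q_i)."""
    score = 0.0
    for s in per_query_scores:  # ordered oldest query first
        score = s + gamma * score
    return score

# A three-query session: the current query counts fully, earlier
# queries are discounted by gamma for every step back in time.
print(session_score([0.4, 0.7, 0.9]))
```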
  62. 62. Experiments  TREC 2011-2012 query sets, datasets  ClueWeb09 Category B
  63. 63. Search Accuracy (TREC 2012)  nDCG@10 (official metric used in TREC)
    Approach                 nDCG@10   %chg      MAP      %chg
    Lemur                    0.2474    -21.54%   0.1274   -18.28%
    TREC’12 median           0.2608    -17.29%   0.1440   -7.63%
    Our TREC’12 submission   0.3021    -4.19%    0.1490   -4.43%
    TREC’12 best             0.3221    0.00%     0.1559   0.00%
    QCM                      0.3353    4.10%†    0.1529   -1.92%
    QCM+Dup                  0.3368    4.56%†    0.1537   -1.41%
  64. 64. Search Accuracy (TREC 2011)  nDCG@10 (official metric used in TREC)
    Approach                 nDCG@10   %chg      MAP      %chg
    Lemur                    0.3378    -23.38%   0.1118   -25.86%
    TREC’11 median           0.3544    -19.62%   0.1143   -24.20%
    TREC’11 best             0.4409    0.00%     0.1508   0.00%
    QCM                      0.4728    7.24%†    0.1713   13.59%†
    QCM+Dup                  0.4821    9.34%†    0.1714   13.66%†
    Our TREC’12 submission   0.4836    9.68%†    0.1724   14.32%†
  65. 65. Search Accuracy for Different Session Types  TREC 2012 Sessions are classified into:  Product: Factual / Intellectual  Goal quality: Specific / Amorphous
    Approach    Intellectual  %chg     Amorphous  %chg     Specific  %chg     Factual  %chg
    TREC best   0.3369        0.00%    0.3495     0.00%    0.3007    0.00%    0.3138   0.00%
    Nugget      0.3305        -1.90%   0.3397     -2.80%   0.2736    -9.01%   0.2871   -8.51%
    QCM         0.3870        14.87%   0.3689     5.55%    0.3091    2.79%    0.3066   -2.29%
    QCM+DUP     0.3900        15.76%   0.3692     5.64%    0.3114    3.56%    0.3072   -2.10%
    QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process by studying changes among query transitions and modeling the dynamics.
  66. 66. POMDP Model Dynamic Information Retrieval Modeling Tutorial 201566 ……s0 s1 r0 a0 s2 r1 a1 s3 r2 a2  Hidden states  Observations  Belief 1R. D. Smallwood et. al., ‘73 o1 o2 o3
  67. 67. POMDP Definition Dynamic Information Retrieval Modeling Tutorial 201567  A tuple (S, M, A, R, γ, O, Θ, B)  S : state space  M: transition matrix  A: action space  R: reward function  γ: discount factor, 0< γ ≤1  O: observation set an observation is a symbol emitted according to a hidden state.  Θ: observation function Θ(s,a,o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s,a).  B: belief space Belief is a probability distribution over hidden states.
  68. 68. 68/33 A Markov Chain of Decision Making … A1 A2 A3 A4 S1 S2 S3 Sn “old US coins” “collecting old US coins” “selling old US coins” q1 q2 q3 “D1 is relevant and I stay to find out more about collecting…” D1 D2 D3 “D2 is relevant and I now move to the next topic…” “D3 is irrelevant; I slightly edit the query and stay here a little longer…” [Luo, Zhang and Yang SIGIR 2014]
  69. 69. 69/33 Hidden Decision Making States SRT Relevant & Exploitation SRR Relevant & Exploration SNRT Non-Relevant & Exploitation SNRR Non-Relevant & Exploration  scooter price ⟶ scooter stores  collecting old US coins⟶ selling old US coins  Philadelphia NYC travel ⟶ Philadelphia NYC train  Boston tourism ⟶ NYC tourism q0 [Luo, Zhang and Yang SIGIR 2014]
  70. 70. 70/33 Dual Agent Stochastic Game Hidden states Actions Rewards Markov ……s0 r0 a0 r1 a1 r2 a2 s1 s2 s3 Dual-agent game Cooperative game Joint optimization D2 User Agent Search Engine Agent [Luo, Zhang and Yang SIGIR 2014]
  71. 71. 71/33 Actions  User Action (Au)  add query terms (+Δq)  remove query terms (-Δq)  keep query terms (qtheme)  Search Engine Action(Ase)  Increase/ decrease/ keep term weights  Switch on or off a search technique,  e.g. to use or not to use query expansion  adjust parameters in search techniques  e.g., select the best k for the top k docs used in PRF  Message from the user(Σu)  clicked documents  SAT clicked documents  Message from search engine(Σse)  top k returned documents Messages are essentially documents that an agent thinks are relevant. [Luo, Zhang and Yang SIGIR 2014]
  72. 72. Dual-agent Stochastic Game  Documents (world)  User agent  Search engine agent  Belief Updater  Σse = D_top_returned  [Luo, Zhang and Yang SIGIR 2014]
  73. 73. Dual-agent Stochastic Game  [Same diagram]  [Luo, Zhang and Yang SIGIR 2014]
  74. 74. Dual-agent Stochastic Game  [Same diagram]  [Luo, Zhang and Yang SIGIR 2014]
  75. 75. 75/33 Observation function (O) O(st+1, at, ωt) = P(ωt|st+1, at)  Two types of observations  Relevance related  Exploration-exploitation related Probability of making observation ωt after taking action at and landing in state st+1 [Luo, Zhang and Yang SIGIR 2014]
  76. 76. Relevance-related Observation  Intuition: st is likely to be Relevant if ∃d ∈ Dt−1 and d is SAT-clicked, and Non-Relevant otherwise. It happens after the user sends out the message Σu_t (clicks).
    O(st = Rel, Σu, ωt = Rel) ≝ P(ωt = Rel | st = Rel, Σu) ∝ P(st = Rel | ωt = Rel) P(ωt = Rel | Σu)
     Similarly, O(st = NonRel, Σu, ωt = NonRel) ∝ P(st = NonRel | ωt = NonRel) P(ωt = NonRel | Σu)
     As well as O(st = NonRel, Σu, ωt = Rel) and O(st = Rel, Σu, ωt = NonRel)
    [Luo, Zhang and Yang SIGIR 2014]
  77. 77. Exploration-related Observation  It is a combined observation  It happens when updating the before-message belief state for a user action au (query change) and a search engine message Σse = Dt−1  Intuition: st is likely to be Exploration if (+Δqt ≠ ∅ and +Δqt ∉ Dt−1) or (+Δqt = ∅ and −Δqt ≠ ∅); Exploitation if (+Δqt ≠ ∅ and +Δqt ∈ Dt−1) or (+Δqt = ∅ and −Δqt = ∅)
    O(st = Exploitation, au = ∆qt, Σse = Dt−1, ωt = Exploitation) ∝ P(st = Exploitation | ωt = Exploitation) × P(ωt = Exploitation | ∆qt, Dt−1)
    O(st = Exploration, au = ∆qt, Σse = Dt−1, ωt = Exploration) ∝ P(st = Exploration | ωt = Exploration) × P(ωt = Exploration | ∆qt, Dt−1)
    [Luo, Zhang and Yang SIGIR 2014]
  78. 78. Belief Updates (B)  The belief state b is updated when a new observation is obtained (see the sketch below):
    b_{t+1}(s_j) = P(s_j | ω_t, a_t, b_t)
                 = P(ω_t | s_j, a_t, b_t) Σ_{s_i∈S} P(s_j | s_i, a_t, b_t) b_t(s_i) / P(ω_t | a_t, b_t)
                 = O(s_j, a_t, ω_t) Σ_{s_i∈S} P(s_j | s_i, a_t, b_t) b_t(s_i) / P(ω_t | a_t, b_t)
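A minimal sketch of this belief update; the transition model, observation likelihoods, and the four-state dimension (mirroring SRT/SRR/SNRT/SNRR) are illustrative assumptions.

```python
import numpy as np

def belief_update(b, T_a, obs_lik):
    """POMDP belief update: b'(s_j) is proportional to
    O(s_j, a, w) * sum_i P(s_j | s_i, a) b(s_i), normalised by P(w | a, b).
    b: current belief, T_a[i, j] = P(s_j | s_i, a),
    obs_lik[j] = O(s_j, a, w) for the observation actually made."""
    predicted = b @ T_a                 # sum_i P(s_j | s_i, a) b(s_i)
    unnorm = obs_lik * predicted
    return unnorm / unnorm.sum()        # divide by P(w | a, b)

b = np.array([0.25, 0.25, 0.25, 0.25])  # uniform belief over four states
T = np.full((4, 4), 0.25)               # toy transition model
O = np.array([0.1, 0.1, 0.3, 0.5])      # observation likelihood per state
print(belief_update(b, T, O))
```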
  79. 79. 79/33 JOINT OPTIMIZATION – WIN-WIN  The long-term reward for the search engine agent: $Q_{se}(b, a) = \sum_{s \in S} b(s) R(s, a) + \gamma \sum_{\omega \in \Omega} P(\omega \mid b, a_u, \Sigma_{se})\, P(\omega \mid b, \Sigma_u)\, \max_a Q_{se}(b', a)$  The long-term reward for the user agent: $Q_u(b, a_u) = R(s, a_u) + \gamma \sum_{a_u} T(s_t \mid s_{t-1}, D_{t-1}) \max_{s_{t-1}} Q_u(s_{t-1}, a_u) = P(q_t \mid d) + \gamma \sum_{a_u} P(q_t \mid q_{t-1}, D_{t-1}, a) \max_{D_{t-1}} P(q_{t-1} \mid D_{t-1})$  Joint optimization: $a_{se}^* = \operatorname{argmax}_a \big( Q_{se}(b, a) + Q_u(b, a_u) \big)$ [Luo, Zhang and Yang SIGIR 2014]
  80. 80. Dynamic Search Engine Demo http://dumplingproject.org Dynamic Information Retrieval Modeling Tutorial 201580
  81. 81. 81/33 EXPERIMENTS  Evaluate on the TREC 2012 and 2013 Session Tracks  The session logs contain:  session topic  user queries  previously retrieved URLs and snippets  user clicks, dwell time, etc.  Task: retrieve 2,000 documents for the last query in each session  The evaluation is based on the whole session: a document relevant to any query in the session is a good document 81  Datasets: ClueWeb09, ClueWeb12 (spam and duplicates removed)
  82. 82. 82/33 ACTIONS  increasing the weights of the added terms by a factor of x = {1.05, 1.10, 1.15, 1.20, 1.25, 1.5, 1.75 or 2}  decreasing the weights of the added terms by a factor of y = {0.5, 0.57, 0.67, 0.8, 0.83, 0.87, 0.9 or 0.95}  the Query Change Model (QCM) proposed in Guan et al. SIGIR ’13: $\mathrm{Score}(q_i, d) = P(q_i \mid d) + \gamma \sum_{a} P(q_i \mid q_{i-1}, D_{i-1}, a) \max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$  Pseudo Relevance Feedback, which assumes the top 20 retrieved documents are relevant  directly using the query in the current iteration to perform retrieval  combining all queries in a session, weighting them equally 82
  83. 83. 83/33 SEARCH ACCURACY  Search accuracy on TREC 2012 Session Track 83  Win-win outperforms most retrieval algorithms on TREC 2012.
  84. 84. 84/33 SEARCH ACCURACY  Search accuracy on TREC 2013 Session Track 84  Win-win outperforms all retrieval algorithms on TREC 2013: it is highly effective in session search.
  85. 85. 85/33 IMMEDIATE SEARCH ACCURACY 85  Original run: top returned documents provided by the TREC log data  Win-win’s immediate search accuracy is better than the Original at every iteration  Win-win’s immediate search accuracy increases as the number of search iterations increases (TREC 2012 and TREC 2013 Session Tracks)
  86. 86. 86/33 86  q1=“best US destinations” observation= NRR SRT Relevant & Exploitation 0.1784 SRR Relevant & Exploration 0.1135 SNRT Non-Relevant & Exploitation 0.2838 SNRR Non-Relevant & Exploration 0.4243 TREC’13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit? BELIEF UPDATES (B) q0
  87. 87. 87/33 87  q1=“best US destinations” observation= NRR  q2=“distance New York Boston” observation = RT SRT Relevant & Exploitation 0.0005 SRR Relevant & Exploration 0.0068 SNRT Non-Relevant & Exploitation 0.0715 SNRR Non-Relevant & Exploration 0.9212 BELIEF UPDATES (B) q0 TREC’13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?
  89. 89. 89/33 89  q1=“best US destinations” observation= NRR  q2=“distance New York Boston” observation = RT  q3=“maps.bing.com” observation = NRT SRT Relevant & Exploitation 0.0151 SRR Relevant & Exploration 0.4347 SNRT Non-Relevant & Exploitation 0.0276 SNRR Non-Relevant & Exploration 0.5226 BELIEF UPDATES (B) q0 TREC’13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?
  91. 91. 91/33 91  q1=“best US destinations” observation= NRR  q2=“distance New York Boston” observation = RT  q3=“maps.bing.com” observation = NRT SRT Relevant & Exploitation 0.0291 SRR Relevant & Exploration 0.7837 SNRT Non-Relevant & Exploitation 0.0081 SNRR Non-Relevant & Exploration 0.1790  q20=“Philadelphia NYC train” observation = NRT …… BELIEF UPDATES (B) q0 TREC’13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?
  93. 93. 93/33 93  q1=“best US destinations” observation= NRR  q2=“distance New York Boston” observation = RT  q3=“maps.bing.com” observation = NRT SRT Relevant & Exploitation 0.0304 SRR Relevant & Exploration 0.8126 SNRT Non-Relevant & Exploitation 0.0066 SNRR Non-Relevant & Exploration 0.1505  q20=“Philadelphia NYC train” observation = NRT  q21=“Philadelphia NYC bus” observation = NRT BELIEF UPDATES (B) q0 TREC’13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit? ……
  95. 95. Coffee Break Dynamic Information Retrieval Modeling Tutorial 201595
  96. 96. Apply an MDP to an IR Problem - Example Dynamic Information Retrieval Modeling Tutorial 201596  User agent in session search  States – user’s relevance judgement  Action – new query  Reward – information gained [Luo, Zhang, Yang SIGIR’14]
  97. 97. POMDP → Belief Update Dynamic Information Retrieval Modeling Tutorial 201597  The agent uses a state estimator to update its belief about the hidden states: $b' = SE(b, a, o')$  $b'(s') = P(s' \mid o', a, b) = \dfrac{P(s', o' \mid a, b)}{P(o' \mid a, b)} = \dfrac{\Theta(s', a, o') \sum_s M(s, a, s')\, b(s)}{P(o' \mid a, b)}$
  98. 98. POMDP → Bellman Equation Dynamic Information Retrieval Modeling Tutorial 201598  The Bellman equation for a POMDP: $V(b) = \max_a \big[ r(b, a) + \gamma \sum_{o'} P(o' \mid a, b) V(b') \big]$  A POMDP can be transformed into a continuous belief MDP (B, M′, A, r, γ):  B: the continuous belief space  M′: transition function $M'_a(b, b') = \sum_{o' \in O} \mathbb{1}_{a,o'}(b', b) \Pr(o' \mid a, b)$, where $\mathbb{1}_{a,o'}(b', b) = 1$ if $SE(b, a, o') = b'$ and $0$ otherwise  A: action space  r: reward function $r(b, a) = \sum_{s \in S} b(s) R(s, a)$
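As a rough illustration of how this Bellman equation can be evaluated, the sketch below performs one backup at a single belief point, approximating V(b′) by the nearest belief in a finite sample of stored beliefs (a crude stand-in for point-based POMDP solvers; all names and array layouts are illustrative):

```python
import numpy as np

def bellman_backup(b, V, beliefs, T, O, R, gamma):
    """One Bellman backup on the belief MDP.
    b: current belief (S,); V: value estimates for each row of `beliefs`;
    T[a, i, j] = P(s_j | s_i, a); O[a, j, o] = P(o | s_j, a); R[s, a]: reward."""
    best = -np.inf
    n_actions, n_obs = R.shape[1], O.shape[2]
    for a in range(n_actions):
        r_ba = b @ R[:, a]                   # r(b, a) = sum_s b(s) R(s, a)
        predicted = T[a].T @ b               # predicted next-state distribution
        future = 0.0
        for o in range(n_obs):
            unnorm = O[a, :, o] * predicted  # joint over next states and o
            p_o = unnorm.sum()               # P(o | a, b)
            if p_o > 1e-12:
                b_next = unnorm / p_o        # belief update b' = SE(b, a, o)
                # approximate V(b') by the nearest stored belief point
                idx = np.argmin(np.linalg.norm(beliefs - b_next, axis=1))
                future += p_o * V[idx]
        best = max(best, r_ba + gamma * future)
    return best
```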
  99. 99. Applying POMDP to Dynamic IR Dynamic Information Retrieval Modeling Tutorial 201599  Environment → Documents  Agents → User, Search engine  States → Queries, user’s decision-making status, relevance of documents, etc.  Actions → Provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters of a search technology  Observations → Queries, clicks, document lists, snippets, terms, etc.  Rewards → Evaluation measures (such as DCG, nDCG or MAP); clicking information  Transition matrix → Given in advance or estimated from training data  Observation function → Problem dependent; estimated from sample datasets
  100. 100. Session Search Example - States 100  SRT: Relevant & Exploitation  SRR: Relevant & Exploration  SNRT: Non-Relevant & Exploitation  SNRR: Non-Relevant & Exploration  Example query transitions: scooter price ⟶ scooter stores; Hartford visitors ⟶ Hartford Connecticut tourism; Philadelphia NYC travel ⟶ Philadelphia NYC train; distance New York Boston ⟶ maps.bing.com [J. Luo et al. ’14] Dynamic Information Retrieval Modeling Tutorial 2015
  101. 101. Session Search Example - Actions (Au, Ase) 101  User Actions (Au):  add query terms (+Δq)  remove query terms (-Δq)  keep query terms (qtheme)  clicked documents  SAT-clicked documents  Search Engine Actions (Ase):  increase/decrease/keep term weights  switch query expansion on or off  adjust the number of top documents used in PRF  etc. [J. Luo et al. ’14] Dynamic Information Retrieval Modeling Tutorial 2015
  102. 102. TREC Session Tracks (2010-2012)  Given a series of queries {q1, q2, …, qn}, the top 10 retrieval results {D1, …, Dn-1} for q1 through qn-1, and click information  The task is to retrieve a list of documents for the current/last query, qn  Relevance judgment is based on how relevant the documents are for qn, and how relevant they are for the information needs of the entire session (in the topic description)  No need to segment the sessions 102 Dynamic Information Retrieval Modeling Tutorial 2015
  103. 103. Query change is an important form of feedback  We define query change as the syntactic editing change between two adjacent queries: $\Delta q_i = q_i - q_{i-1}$  It includes $+\Delta q_i$, the added terms, and $-\Delta q_i$, the removed terms  The unchanged/shared terms are called theme terms, $q_{theme}$  Example: q1 = “bollywood legislation”, q2 = “bollywood law” ⟶ theme term = “bollywood” 103 Dynamic Information Retrieval Modeling Tutorial 2015
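A small sketch of how query change can be computed, treating queries as bags of terms (the helper name is illustrative):

```python
# Extract the query change between two adjacent queries.
def query_change(q_prev, q_curr):
    prev, curr = set(q_prev.split()), set(q_curr.split())
    added = curr - prev    # +Delta q: terms the user added
    removed = prev - curr  # -Delta q: terms the user removed
    theme = prev & curr    # q_theme: unchanged/shared terms
    return added, removed, theme

# e.g. query_change("bollywood legislation", "bollywood law")
# -> ({'law'}, {'legislation'}, {'bollywood'})
```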
  104. 104. Where do these query changes come from?  Given the TREC Session settings, we consider two sources of query change:  the previous search results that a user viewed/read/examined  the information need  Example: Kurosawa ⟶ Kurosawa wife  ‘wife’ is not in any previous results, but it is in the topic description  However, knowing the information need before the search is difficult to achieve 104 Dynamic Information Retrieval Modeling Tutorial 2015
  105. 105. Previous search results can influence query change in quite complex ways  Merck lobbyists ⟶ Merck lobbying US policy  D1 contains several mentions of ‘policy’, such as “A lobbyist who until 2004 worked as senior policy advisor to Canadian Prime Minister Stephen Harper was hired last month by Merck …”  These mentions are about Canadian policies, while the user adds US policy in q2  Our guess is that the user might be inspired by ‘policy’, but prefers a sub-concept other than ‘Canadian policy’  Therefore, among the added terms ‘US policy’, ‘US’ is the novel term and ‘policy’ is not, since it appeared in D1  The two terms should be treated differently 105 Dynamic Information Retrieval Modeling Tutorial 2015
  106. 106. 106/33 POMDP: why it fits dynamic IR  Rich interactions → actions  Hidden, evolving information needs → hidden states  A long-term goal → rewards  Temporal dependency → Markov property  POMDP (Partially Observable Markov Decision Process) and SG (Stochastic Games): multi-agent collaboration
  107. 107. Recap – Characteristics of Dynamic IR Dynamic Information Retrieval Modeling Tutorial 2015107  Rich interactions Query formulation, Document clicks, Document examination, eye movement, mouse movements, etc.  Temporal dependency  Overall goal
  108. 108. Modeling Query Change  A framework inspired by Reinforcement Learning  Reinforcement Learning for a Markov Decision Process:  models a state space S and an action space A according to a transition model T = P(si+1 | si, ai)  a policy π(s) = a indicates which action a the agent takes at state s  each state is associated with a reward function R that indicates the possible positive reward or negative loss that a state and an action may yield  Reinforcement learning offers general solutions to MDPs and seeks the best policy for an agent. 108
  109. 109. Outline Dynamic Information Retrieval Modeling Tutorial 2015109  Introduction & Theory  Session Search  Dynamic Ranking  Multi Armed Bandits  Portfolio Ranking  Multi-Page Search  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  110. 110. Dynamic Information Retrieval Modeling Tutorial 2015110  Markov Process  Hidden Markov Model  Markov Decision Process  Partially Observable Markov Decision Process  Multi-Armed Bandit Family of Markov Models
  111. 111. Multi Armed Bandits (MAB) Dynamic Information Retrieval Modeling Tutorial 2015111 …… …… Which slot machine should I select in this round? Reward
  112. 112. Multi Armed Bandits (MAB) Dynamic Information Retrieval Modeling Tutorial 2015112 I won! Is this the best slot machine? Reward
  113. 113. MAB Definition Dynamic Information Retrieval Modeling Tutorial 2015113  A tuple (S, A, R, B)  S: hidden reward distribution of each bandit  A: choose which bandit to play  R: reward for playing a bandit  B: belief space, our estimate of each bandit’s distribution
  114. 114. Comparison with Markov Models Dynamic Information Retrieval Modeling Tutorial 2015114  A single-state Markov Decision Process (no transition probability)  Similar to a POMDP in that we maintain a belief state  Action = choose a bandit; the action does not affect the state  Does not ‘plan ahead’ but intelligently adapts  Somewhere between interactive and dynamic IR
  115. 115. MAB Policy Reward Dynamic Information Retrieval Modeling Tutorial 2015115  A MAB algorithm describes a policy 𝜋 for choosing bandits  Maximise rewards from chosen bandits over all time steps  Minimize regret: $\sum_{t=1}^{T} \big[ \mathrm{Reward}(a^*) - \mathrm{Reward}(a_{\pi}(t)) \big]$  The cumulative difference between the optimal reward and the actual reward
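A one-function sketch of cumulative regret, assuming the true mean reward of each arm is known (which is only possible in simulation):

```python
# Cumulative regret of a bandit policy against the best fixed arm.
def cumulative_regret(true_means, chosen_arms):
    """true_means: mean reward of each arm (known only in simulation);
    chosen_arms: sequence of arm indices the policy played."""
    best = max(true_means)
    return sum(best - true_means[a] for a in chosen_arms)
```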
  116. 116. Exploration vs Exploitation Dynamic Information Retrieval Modeling Tutorial 2015116  Exploration:  try out bandits to find which has the highest average reward  too much exploration leads to poor performance  Exploitation:  play bandits that are known to pay out higher reward on average  MAB algorithms balance exploration and exploitation:  start by exploring more to find the best bandits  exploit more as the best bandits become known
  117. 117. MAB – Index Algorithms Dynamic Information Retrieval Modeling Tutorial 2015117  Gittins index1  Play the bandit with the highest ‘Dynamic Allocation Index’  Modelled using an MDP but suffers the ‘curse of dimensionality’  𝜖-greedy2  Play the highest-reward bandit with probability 1 − 𝜖  Play a random bandit with probability 𝜖  UCB (Upper Confidence Bound)3 1J. C. Gittins ‘89 2Nicolò Cesa-Bianchi et al. ‘98 3Peter Auer et al. ‘02
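As a concrete illustration of the 𝜖-greedy policy above, a minimal simulation on Bernoulli-reward arms (function and parameter names are illustrative):

```python
import random

def epsilon_greedy(true_probs, epsilon=0.1, steps=1000):
    """Simulate epsilon-greedy on arms with Bernoulli reward probabilities."""
    n = len(true_probs)
    counts, sums = [0] * n, [0.0] * n
    history = []
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(n)  # explore: random arm
        else:
            means = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(n)]
            arm = max(range(n), key=means.__getitem__)  # exploit: best mean
        reward = 1.0 if random.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        history.append(arm)
    return history
```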
  118. 118. Comparison of Markov Models Dynamic Information Retrieval Modeling Tutorial 2015118  Markov Process – a fully observable stochastic process  Hidden Markov Model – a partially observable stochastic process  MDP – a fully observable decision process  MAB – a decision process, either fully or partially observable  POMDP – a partially observable decision process

Model                 Actions  Rewards  States
Markov Process        No       No       Observable
Hidden Markov Model   No       No       Unobservable
MDP                   Yes      Yes      Observable
POMDP                 Yes      Yes      Unobservable
MAB                   Yes      Yes      Fixed
  119. 119. Outline Dynamic Information Retrieval Modeling Tutorial 2015119  Introduction & Theory  Session Search  Dynamic Ranking  Multi Armed Bandits  Portfolio Ranking  Multi-Page Search  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  120. 120. UCB Algorithm (slides 120–125) Dynamic Information Retrieval Modeling Tutorial 2015  Index: $\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$  Calculate for all 𝑖 and select the highest  Average reward $\bar{x}_i$  Time step 𝑡  Number of times bandit 𝑖 has been played, $T_i$  The chance of playing infrequently played bandits increases over time
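A compact sketch of UCB1 using the index above, again on simulated Bernoulli arms (illustrative, not a library implementation):

```python
import math
import random

def ucb1(true_probs, steps=1000):
    """Simulate UCB1 on arms with Bernoulli reward probabilities."""
    n = len(true_probs)
    counts, sums = [0] * n, [0.0] * n
    for t in range(1, steps + 1):
        if t <= n:
            arm = t - 1  # play each arm once to initialise its estimate
        else:
            arm = max(range(n), key=lambda i:
                      sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if random.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return sums, counts
```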
  126. 126. Iterative Expectation (slides 126–130) Dynamic Information Retrieval Modeling Tutorial 2015  UCB adapted to ranking: the bandits become documents 𝑖  The average reward becomes the average probability of relevance $\bar{r}_i$  The play count $T_i$ becomes the ‘effective’ number of impressions $\gamma_i(t) = \sum_{k=1}^{t} \alpha^{C_k} \beta^{1-C_k}$, where 𝛼 and 𝛽 reward clicks and non-clicks depending on rank  An exploration parameter 𝜆 is added, giving the index $\bar{r}_i + \lambda \sqrt{\frac{2 \ln t}{\gamma_i(t)}}$ M. Sloan and J. Wang ‘13
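A small sketch of the effective-impressions term and the resulting index; the α and β values here are illustrative constants, whereas in the model they depend on rank:

```python
import math

def effective_impressions(clicks, alpha=1.0, beta=0.5):
    """gamma_i(t): clicks is the list of 0/1 click indicators C_k for
    document i over its impressions; alpha weights clicks, beta non-clicks."""
    return sum(alpha ** c * beta ** (1 - c) for c in clicks)

def ie_index(r_bar, clicks, t, lambda_=1.0):
    """Iterative Expectation index for a document with mean relevance r_bar."""
    g = effective_impressions(clicks)
    return r_bar + lambda_ * math.sqrt(2 * math.log(t) / g)
```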
  131. 131. Portfolio Theory of IR Dynamic Information Retrieval Modeling Tutorial 2015131  Portfolio Theory maximises expected return for a given amount of risk1  Diversity of a portfolio increases the likely return  We can consider documents as ‘shares’  Documents are dependent on one another, unlike in the PRP  The Portfolio Theory of IR2 allows us to introduce diversity 1H. Markowitz ‘52 2J. Wang et al. ‘09
  132. 132. Portfolio Ranking Dynamic Information Retrieval Modeling Tutorial 2015132  Documents are dependent on each other  Co-click matrix from users and logs1  Portfolio Armed Bandit Ranking2:  Exploratively rank using Iterative Expectation  Diversify using portfolio optimisation over the co-click matrix  Update relevance and dependence with each click  Both explorative and diverse 1W. Wu et al. ‘11 2M. Sloan and J. Wang ‘12
  133. 133. Outline Dynamic Information Retrieval Modeling Tutorial 2015133  Introduction & Theory  Session Search  Dynamic Ranking  Multi Armed Bandits  Portfolio Ranking  Multi-Page Search  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  134. 134. Multi Page Search Dynamic Information Retrieval Modeling Tutorial 2015134  (diagram: ranked results shown across Page 1 and Page 2) X. Jin, M. Sloan and J. Wang ’13
  135. 135. Multi Page Search Example - States & Actions Dynamic Information Retrieval Modeling Tutorial 2015135  State: relevance of documents  Action: ranking of documents  Observation: clicks  Belief: multivariate Gaussian  Reward: DCG over 2 pages X. Jin, M. Sloan and J. Wang ’13
  136. 136. Model Dynamic Information Retrieval Modeling Tutorial 2015136
  137. 137. Model Dynamic Information Retrieval Modeling Tutorial 2015137  Prior belief: $N(\theta^1, \Sigma^1)$  $\theta^1$: prior estimate of relevance  $\Sigma^1$: prior estimate of covariance, from document similarity or topic clustering
  138. 138. Model Dynamic Information Retrieval Modeling Tutorial 2015138  Rank action for page 1
  139. 139. Model Dynamic Information Retrieval Modeling Tutorial 2015139
  140. 140. Model Dynamic Information Retrieval Modeling Tutorial 2015140  Feedback from page 1: $r^1 \sim N(\theta_{s}^1, \Sigma_{s}^1)$
  141. 141. Model Dynamic Information Retrieval Modeling Tutorial 2015141  Update estimates using the page-1 feedback $r^1$  Partition the prior over the shown documents 𝒔 and the remaining documents 𝒔′: $\theta^1 = \begin{pmatrix} \theta_{s} \\ \theta_{s'} \end{pmatrix}, \quad \Sigma^1 = \begin{pmatrix} \Sigma_{s} & \Sigma_{s s'} \\ \Sigma_{s' s} & \Sigma_{s'} \end{pmatrix}$  Posterior for page 2: $\theta^2 = \theta_{s'} + \Sigma_{s' s} \Sigma_{s}^{-1} (r^1 - \theta_{s})$, $\Sigma^2 = \Sigma_{s'} - \Sigma_{s' s} \Sigma_{s}^{-1} \Sigma_{s s'}$
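The page-2 update is standard multivariate-Gaussian conditioning; a numpy sketch, with variable names loosely following the slide (illustrative):

```python
import numpy as np

def condition_belief(theta, Sigma, shown, r1):
    """theta: prior means over all documents; Sigma: prior covariance;
    shown: indices of the page-1 documents; r1: observed relevance
    feedback for them. Returns the posterior (theta2, Sigma2) for the
    remaining documents."""
    rest = np.setdiff1d(np.arange(len(theta)), shown)
    S_ss = Sigma[np.ix_(shown, shown)]            # covariance of shown docs
    S_rs = Sigma[np.ix_(rest, shown)]             # cross-covariance
    S_rr = Sigma[np.ix_(rest, rest)]              # covariance of the rest
    gain = S_rs @ np.linalg.inv(S_ss)             # Sigma_{s's} Sigma_s^{-1}
    theta2 = theta[rest] + gain @ (r1 - theta[shown])
    Sigma2 = S_rr - gain @ S_rs.T
    return theta2, Sigma2
```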
  142. 142. Model Dynamic Information Retrieval Modeling Tutorial 2015142  Rank using PRP
  143. 143. Model Dynamic Information Retrieval Modeling Tutorial 2015143  Utility of a ranking (DCG over both pages): $\lambda \sum_{j=1}^{M} \frac{\theta_{s_j}^1}{\log_2(j+1)} + (1 - \lambda) \sum_{j=M+1}^{2M} \frac{\theta_{s_j}^2}{\log_2(j+1)}$
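A direct transcription of the two-page utility into code (a sketch; the θ values would come from the belief model above):

```python
import math

def two_page_utility(theta1, theta2, lam):
    """theta1: relevance estimates of the M page-1 documents in rank order;
    theta2: estimates of the M page-2 documents (ranks M+1..2M);
    lam: trade-off between page-1 and page-2 gain."""
    M = len(theta1)
    page1 = sum(th / math.log2(j + 2) for j, th in enumerate(theta1))
    page2 = sum(th / math.log2(M + j + 2) for j, th in enumerate(theta2))
    return lam * page1 + (1 - lam) * page2
```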
  144. 144. Model – Bellman Equation Dynamic Information Retrieval Modeling Tutorial 2015144  Optimize $s^1$ to improve $U_{s}^2$: $V(\theta^1, \Sigma^1, 1) = \max_{s^1} \big[ \lambda\, \theta_{s}^1 \cdot W^1 + (1 - \lambda)\, \mathbb{E}_{r^1}\big[ \max_{s^2} \theta_{s}^2 \cdot W^2 \big] \big]$, where the expectation over page-1 feedback is what the Monte Carlo approximation on slide 146 estimates
  145. 145. 𝜆 Dynamic Information Retrieval Modeling Tutorial 2015145  Balances exploration and exploitation in page 1  Tuned for different queries  Navigational  Informational  𝜆 = 1 for non-ambiguous search
  146. 146. Approximation Dynamic Information Retrieval Modeling Tutorial 2015146  Monte Carlo sampling: $\approx \max_{s^1} \Big[ \lambda\, \theta_{s}^1 \cdot W^1 + \max_{s^2} (1 - \lambda) \frac{1}{S} \sum_{r \in O} \theta_{s}^2 \cdot W^2\, P(r) \Big]$  Sequential Ranking Decision
  147. 147. Experiment Data Dynamic Information Retrieval Modeling Tutorial 2015147  Difficult to evaluate without access to live users  Simulated using 3 TREC collections and relevance judgements  WT10G – Explicit Ratings  TREC8 – Clickthroughs  Robust – Difficult (ambiguous) search
  148. 148. User Simulation Dynamic Information Retrieval Modeling Tutorial 2015148  Rank M documents  Simulated user clicks according to relevance judgements  Update page 2 ranking  Measure at page 1 and 2  Recall  Precision  nDCG  MRR  BM25 – prior ranking model
  149. 149. Investigating λ Dynamic Information Retrieval Modeling Tutorial 2015149
  150. 150. Baselines Dynamic Information Retrieval Modeling Tutorial 2015150  𝜆 determined experimentally  BM25  BM25 with conditional update (𝜆 = 1)  Maximum Marginal Relevance (MMR)  Diversification  MMR with conditional update  Rocchio  Relevance Feedback
  151. 151. Results Dynamic Information Retrieval Modeling Tutorial 2015 (slides 151–155: results figures on the three TREC collections; plots omitted)
  156. 156. Outline Dynamic Information Retrieval Modeling Tutorial 2015156  Introduction & Theory  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  157. 157. Cold-start problem in recommender systems
  158. 158. Interactive Recommender Systems
  159. 159. Possible Solutions Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM, 2013.
  160. 160. Objective Cold-start problemInteractive mechanism for CF Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM, 2013.
  161. 161. Proposed EE algorithms Thompson Sampling Linear-UCB General Linear-UCB Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM, 2013.
  162. 162. Cold-start users Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM 2013.
  163. 163. Ad selection problem Dynamic Information Retrieval Modeling Tutorial 2015163  How can online publishers optimally select ads to maximize their ad income over time? Sequential Selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012  Selling in multiple channels with non-fixed prices
  164. 164. Dynamic Information Retrieval Modeling Tutorial 2015164 Problem formulation Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM 2012
  165. 165. Problem formulation Dynamic Information Retrieval Modeling Tutorial 2015165 Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM 2012
  166. 166. Objective function Dynamic Information Retrieval Modeling Tutorial 2015166 Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM 2012
  167. 167. Belief update Dynamic Information Retrieval Modeling Tutorial 2015167 Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM 2012
  168. 168. Results Dynamic Information Retrieval Modeling Tutorial 2015168 Sequential selection of Correlated Ads by POMDPs Shuai Yuan, Jun Wang CIKM 2012
  169. 169. Outline Dynamic Information Retrieval Modeling Tutorial 2015169  Introduction & Theory  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  170. 170. Dynamic Information Retrieval Evaluation Guest talk at the WSDM 2015 tutorial on Dynamic Information Retrieval Modeling Charlie Clarke (with much much input from Mark Smucker) University of Waterloo, Canada
  171. 171. Moving from static ranking to dynamic domains • How to extend IR evaluation methodologies to dynamic domains? • Three key ideas: 1. Realistic models of searcher interactions 2. Measure costs to the searcher in meaningful units (e.g., time, money, …) 3. Measure benefits to the searcher in meaningful units (e.g., time, nuggets, …) Charles Clarke, University of Waterloo 171 This talk strongly reflects my opinions (not trying to be neutral). But I am the guest speaker 
  172. 172. Evaluating Information Access Systems Charles Clarke, University of Waterloo 172 searching, browsing, summarization, visualization, desktop, mobile, web, books, images, questions, etc., and combinations of these Does the system work for its users? Will this change make the system better or worse? How do we quantify performance?
  173. 173. Performance 101: Is this a good search result? Charles Clarke, University of Waterloo 173
  174. 174. How to evaluate? Study users Charles Clarke, University of Waterloo 174 Users in the wild: • A/B Testing • Result interleaving • Clicks and dwell time • Mouse movements • Other implicit feedback • … Users in the lab: • Time to task completion • Think aloud protocols • Questionnaires • Eye tracking • …
  175. 175. Unfortunately user studies are • Slow • Expensive • Conditions can never be exactly duplicated (e.g., learning to rank) Charles Clarke, University of Waterloo 175
  176. 176. Alternative: User performance prediction Can we predict the impact of a proposed change to an information access system (while respecting and reflecting differences between users)? Can we quantify performance improvements in meaningful units so that effect sizes can be considered in statistical testing? Are improvements practically significant, as well as statistically significant? Want to predict the impact of a proposed change automatically, based on existing user performance data, rather than gathering new performance data. Charles Clarke, University of Waterloo 176 The BIG goal
  177. 177. Traditional Evaluation of Rankers • Test collection: – Documents – Queries – Relevance judgments • Each ranker generates a ranked list of documents for each query • Score ranked lists using relevance judgments and standard metrics (recall, mean average precision, nDCG, ERR, RBP, ….). Charles Clarke, University of Waterloo 177
  178. 178. Charles Clarke, University of Waterloo 178 Example of a good-old-fashioned IR Metric

Rank  Document      Precision at rank N
1     Non-relevant  0.00
2     Relevant      0.50
3     Non-relevant  0.33
4     Non-relevant  0.25
5     Relevant      0.40
6     Non-relevant  0.33
7     Non-relevant  0.29
8     …             …

Precision at rank N is the fraction of documents that are relevant in the first N documents. Average Precision is the average of the precision at N for each relevant document: $AP = \frac{1}{R} \sum_{R_i} \mathrm{Prec}(R_i)$. Mean average precision (MAP) is AP averaged over the set of queries.
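For reference, a short sketch that reproduces the P@N/AP arithmetic on the list above:

```python
def average_precision(rels, R=None):
    """rels: 0/1 relevance of documents in rank order;
    R: total number of relevant documents (defaults to sum(rels))."""
    R = R or sum(rels)
    hits, ap = 0, 0.0
    for n, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            ap += hits / n  # precision at this relevant document's rank
    return ap / R if R else 0.0

# e.g. average_precision([0, 1, 0, 0, 1]) -> (0.50 + 0.40) / 2 = 0.45
```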
  179. 179. General form of effectiveness measures Nearly all standard effectiveness measures have the same basic form (including nDCG, RBP, ERR, average precision, …): a normalization applied to a sum over ranks of the gain at rank k times a discount factor, $\frac{1}{\mathcal{N}} \sum_{k} \mathrm{gain}(k) \cdot \mathrm{discount}(k)$ Charles Clarke, University of Waterloo 179
  180. 180. Implicit user model… • User works down the ranked list spending equal time on each document. Captions, navigation, etc., have no impact. • If they make it to rank i, they receive some benefit (i.e., gain). • Eventually they stop, which is reflected in the discount (i.e., they are less likely to reach lower ranks). • Normalization typically maps the score into the range [0:1]. Units may not be meaningful. Charles Clarke, University of Waterloo 180
  181. 181. Traditional Evaluation of Rankers • Many effectiveness measures: precision, recall, average precision, rank-biased precision, discounted cumulative gain, etc. • Widely used and accepted as standard practice. • But… • What does an improvement in average precision from 0.28 to 0.31 mean to users? • Does an increase in the measure really translate to an improved user experience? • How will an improvement in the performance of a single component impact overall system performance? Charles Clarke, University of Waterloo 181
  182. 182. How to better reflect user variation and system performance? Charles Clarke, University of Waterloo 182 Example: What’s the simplest possible user interface for search? 1) User issues a query 2) System returns material to read i.e., the system returns stuff to read, in order (not a list of documents; more like a newspaper article) A correspondingly simple user model has two parameters: 1) Reading speed 2) Time spent reading
  183. 183. Reading speed distribution (from users in the lab) Charles Clarke, University of Waterloo 183 Empirical distribution of reading speed during an information access task, and its fit to a log-normal distribution.
  184. 184. Stopping time distribution (from users in the wild) Charles Clarke, University of Waterloo 184 Empirical distribution of time spent searching during an information access task, and its fit to a log-normal distribution.
  185. 185. Evaluating a search result Charles Clarke, University of Waterloo 185 1) Generate a reading speed from the distribution 2) Generate a stopping time from the distribution 3) How much useful material did the user read? 4) Repeat for many (simulated) users As an example, we use passage retrieval runs from the TREC 2004 HARD Track, which essentially assume our simple user interface. We measure costs to the searcher in terms of time spent searching. We measure benefits to the searcher in terms of “time well spent”.
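A sketch of this simulation loop, drawing reading speed and stopping time from log-normal distributions; the distribution parameters here are made-up placeholders, not the calibrated values from the user studies:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_users(result_text_rel, n_users=10_000,
                   speed_mu=4.0, speed_sigma=0.5,  # placeholder log-normal params
                   stop_mu=5.0, stop_sigma=1.0):   # placeholder log-normal params
    """result_text_rel: per-character 0/1 flags marking useful material,
    in the order the system presents it. Returns each simulated user's
    count of useful characters read."""
    gains = []
    for _ in range(n_users):
        speed = rng.lognormal(speed_mu, speed_sigma)  # characters per second
        stop = rng.lognormal(stop_mu, stop_sigma)     # seconds spent searching
        n_read = min(int(speed * stop), len(result_text_rel))
        gains.append(sum(result_text_rel[:n_read]))   # useful characters read
    return gains
```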
  186. 186. Useful characters read vs. Characters read Charles Clarke, University of Waterloo 186 Performance of run york04ha1 on TREC 2004 HARD Track topic 424 (“Bollywood”) with 10,000 simulated users.
  187. 187. Useful characters read vs. Time spent reading Charles Clarke, University of Waterloo 187 Performance of run york04ha1 on TREC 2004 HARD Track topic 424 (“Bollywood”) with 10,000 simulated users.
  188. 188. Time well spent vs. Time spent reading Charles Clarke, University of Waterloo 188 Performance of run york04ha1 on TREC 2004 HARD Track topic 424 (“Bollywood”) with 10,000 simulated users.
  189. 189. Distribution of time well spent Charles Clarke, University of Waterloo 189 Performance of run york04ha1 on TREC 2004 HARD Track topic 424 (“Bollywood”) with 10,000 simulated users.
  190. 190. Temporal precision vs. Time spent Reading Charles Clarke, University of Waterloo 190 Performance of run york04ha1 on TREC 2004 HARD Track topic 424 (“Bollywood”) with 10,000 simulated users.
  191. 191. Distribution of temporal precision Charles Clarke, University of Waterloo 191 Performance of run york04ha1 on TREC 2004 HARD Track topic 424 (“Bollywood”) with 10,000 simulated users.
  192. 192. General Framework (Part I): Cumulative Gain • Consider the performance of a system in terms of a cost-benefit (cumulative gain) curve G(t). – Measure costs (e.g., in terms of time spent). – Measure benefits (e.g., in terms of time well spent). • A particular instance of G(t) represents a single user (described by a set of parameters) interacting with a system (not just a list!!!). • G(t) captures factors intrinsic to the system. We don’t know how much time the user has to invest, but for different levels of investment, G(t) indicates the benefit. Charles Clarke, University of Waterloo 192
  193. 193. General Framework (Part II): Decay • Consider the user’s willingness to invest time in terms of a decay curve D(t), which provides a survival probability. • We assume that G(t) and D(t) are independent. (System-dependent stopping probabilities are accommodated in G(t). Details on request.) • D(t) captures factors extrinsic to the system. The user only has so much time they could invest. They cannot invest more, even if they would receive substantial additional benefit from further interaction. Charles Clarke, University of Waterloo 193
  194. 194. General form of effectiveness measures (REMINDER) Nearly all standard effectiveness measures have the same basic form (including nDCG, RBP, ERR, average precision, …): $\frac{1}{\mathcal{N}} \sum_{k} \mathrm{gain}(k) \cdot \mathrm{discount}(k)$, a normalization, a sum over ranks, the gain at rank k, and a discount factor. Charles Clarke, University of Waterloo 194
  195. 195. General Framework (Part III): Time-biased gain Overall system performance may be expressed as expected cumulative gain (which also incorporates standard effectiveness measures): the sum over ranks becomes an integral over time, with the gain accrued at time t and the discount replaced by the decay factor D(t), $\frac{1}{\mathcal{N}} \int_0^{\infty} g(t)\, D(t)\, dt$ (with the normalization possibly equal to 1). Charles Clarke, University of Waterloo 195
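A numeric sketch of the expected-gain integral, with the gain rate g(t) and decay D(t) supplied as callables (illustrative; the step size and horizon are arbitrary):

```python
def time_biased_gain(g, D, horizon=3600.0, dt=1.0):
    """Riemann-sum approximation of the integral of g(t) * D(t) over time."""
    t, total = 0.0, 0.0
    while t < horizon:
        total += g(t) * D(t) * dt  # gain accrued at time t, decayed by D(t)
        t += dt
    return total

# e.g. with unit gain rate and exponential decay (requires `import math`):
# time_biased_gain(lambda t: 1.0, lambda t: math.exp(-t / 300))
```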
  196. 196. General Framework (Part IV): Multiple users • Cumulative gain may be computed by – Simulation (drawing a set of parameters from a population of users). – Measuring actual interaction on live systems. – Combinations of measurement and simulation. • Simulating and/or measuring multiple users allows us to consider performance difference across the population of users. • Simulation provides matching pairs (the same user on both systems) increasing our ability to detect differences. Charles Clarke, University of Waterloo 196
  197. 197. General Framework Most of the evaluation proposals in the references can be reformulated in terms of this general framework, including those that address issues of: – Novelty and diversity – Filtering, summarization, question answering – Session search, etc. Charles Clarke, University of Waterloo 197 One more example from our current research…
  198. 198. Session search example • Two (or more) result lists, e.g., from query reformulation, query suggestion, or switching search engines. • Modeling searcher interaction requires a switch from one result to another. • The optimal time to switch depends on the total time available to search. For example (with many details omitted…): Charles Clarke, University of Waterloo 198
  199. 199. Simulation of searchers switching between lists: A vs. B Charles Clarke, University of Waterloo 199 User starts on list A. If the user has less than five minutes to search, they should stay on list A. If the user has more than five minutes to search, they should leave list A after 90 seconds. But can we assume optimal behavior when modeling users?
  200. 200. Simulation of searchers switching between lists: A vs. B Charles Clarke, University of Waterloo 200 (figure: Average Gain in relevant documents vs. Switch Time in minutes, for session durations of 2, 4, 6, 8 and 10 minutes; Topic = 389, List A = sab05ror1, List B = uic0501) A different view of the same simulation, with thousands of simulated users. Here, benefits are measured by the number of relevant documents seen. The optimal switching time depends on the session duration.
  201. 201. Summary • Primary goal of IR evaluation: Predict how changes to an IR system will impact the user experience. • Evaluation in dynamic domains requires us to explicitly model the system interface and the user’s search behavior. Costs and benefits must be measured in meaningful units (e.g., time). • Successful IR evaluation requires measurement of users, both “in the wild” and in the lab. These measurements calibrate models, which make predictions, which improve systems. Charles Clarke, University of Waterloo 201
  202. 202. A few key papers • Leif Azzopardi. 2009. Usage based effectiveness measures: monitoring application performance in information retrieval. In Proceedings of the 18th ACM conference on Information and knowledge management (CIKM '09). • Leif Azzopardi, Diane Kelly, and Kathy Brennan. 2013. How query cost affects search behavior. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval (SIGIR '13). • Feza Baskaya, Heikki Keskustalo, and Kalervo Järvelin. 2012. Time drives interaction: simulating sessions in diverse searching environments. In Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval (SIGIR '12). • Ben Carterette. 2011. System effectiveness, user models, and user utility: a conceptual framework for investigation. In Proceedings of the 34th international ACM SIGIR conference on research and development in Information Retrieval (SIGIR '11). • Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2011. Simulating simple user behavior for system effectiveness evaluation. In Proceedings of the 20th ACM international conference on information and knowledge management (CIKM '11). • Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2012. Incorporating variability in user behavior into systems based evaluation. In Proceedings of the 21st ACM international conference on information and knowledge management (CIKM '12). Charles Clarke, University of Waterloo 202
  203. 203. A few more key papers • Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM conference on information and knowledge management (CIKM '09). • Charles L.A. Clarke, Nick Craswell, Ian Soboroff, and Azin Ashkan. 2011. A comparative analysis of cascade measures for novelty and diversity. In Proceedings of the fourth ACM international conference on web search and data mining (WSDM '11). • Charles L. A. Clarke and Mark D. Smucker. 2014. Time well spent. In Proceedings of the 5th information interaction in context symposium (IIiX '14). • Georges Dupret and Mounia Lalmas. 2013. Absence time and user engagement: evaluating ranking functions. In Proceedings of the sixth ACM international conference on web search and data mining (WSDM '13). • Kalervo Järvelin, Susan L. Price, Lois M. L. Delcambre, and Marianne Lykke Nielsen. 2008. Discounted cumulated gain based evaluation of multiple-query IR sessions. In Proceedings of the IR research, 30th European conference on Advances in information retrieval (ECIR'08). • Jiyun Luo, Christopher Wing, Hui Yang, and Marti Hearst. 2013. The water filling model and the cube test: multi-dimensional evaluation for professional search. In Proceedings of the 22nd ACM international conference on information & knowledge management (CIKM '13). Charles Clarke, University of Waterloo 203
  204. 204. And yet more key papers • Tetsuya Sakai and Zhicheng Dou. 2013. Summaries, ranked retrieval and sessions: a unified framework for information access evaluation. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval (SIGIR '13). • Mark D. Smucker and Charles L.A. Clarke. 2012. Time-based calibration of effectiveness measures. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (SIGIR '12). • Mark D. Smucker and Charles L. A. Clarke. 2012. Modeling user variance in time-biased gain. In Proceedings of the Symposium on Human-Computer Interaction and Information Retrieval (HCIR '12). • Emine Yilmaz, Milad Shokouhi, Nick Craswell, and Stephen Robertson. 2010. Expected browsing utility for web search evaluation. In Proceedings of the 19th ACM international conference on Information and knowledge management (CIKM '10). • Yiming Yang and Abhimanyu Lad. 2009. Modeling expected utility of multi-session information distillation. In Proceedings of the 2nd international conference on the theory of information retrieval (ICTIR ’09). • Plus many others (ask me). Charles Clarke, University of Waterloo 204
  205. 205. Dynamic Information Retrieval Evaluation Guest talk at the WSDM 2015 tutorial on Dynamic Information Retrieval Modeling Charlie Clarke University of Waterloo, Canada Thank you!
  206. 206. Outline Dynamic Information Retrieval Modeling Tutorial 2015206  Introduction & Theory  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel
  207. 207. Apply an MDP to an IR Problem Dynamic Information Retrieval Modeling Tutorial 2015207  We can model IR systems using a Markov Decision Process  Is there a temporal component?  States – What changes with each time step?  Actions – How does your system change the state?  Rewards – How do you measure feedback or effectiveness in your problem at each time step?  Transition Probability – Can you determine this?
  208. 208. Apply an MDP to an IR Problem - Example Dynamic Information Retrieval Modeling Tutorial 2015208  User agent in session search  States – user’s relevance judgement  Action – new query  Reward – information gained [Luo, Zhang, Yang SIGIR’14]
  209. 209. Apply an MDP to an IR Problem - Example Dynamic Information Retrieval Modeling Tutorial 2015209  Search engine’s perspective  What if we can’t directly observe the user’s relevance judgement?  Click ≠ relevance
  210. 210. Applying POMDP to Dynamic IR Dynamic Information Retrieval Modeling Tutorial 2015210  Environment → Documents  Agents → User, Search engine  States → Queries, user’s decision-making status, relevance of documents, etc.  Actions → Provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters of a search technology  Observations → Queries, clicks, document lists, snippets, terms, etc.  Rewards → Evaluation measures (such as DCG, nDCG or MAP); clicking information  Transition matrix → Given in advance or estimated from training data  Observation function → Problem dependent; estimated from sample datasets
  211. 211. Dynamic Information Retrieval Modeling Panel Discussion  WSDM Tutorial February 2nd 2015  Grace Hui Yang, Marc Sloan, Jun Wang  Guest Speaker: Charlie Clarke
  212. 212. Outline Dynamic Information Retrieval Modeling Tutorial 2015212  Introduction & Theory  Session Search  Dynamic Ranking  Recommendation and Advertising  Guest Talk: Charlie Clarke  Discussion Panel  Conclusion
  213. 213. Conclusions Dynamic Information Retrieval Modeling Tutorial 2015213  Dynamic IR describes a new class of interactive model  Incorporates rich feedback and temporal dependency, and is goal oriented  The family of Markov models and Multi Armed Bandit theory are useful in building DIR models  Applicable to a range of IR problems  Useful in applications such as session search and evaluation
  214. 214. Dynamic IR Book Dynamic Information Retrieval Modeling Tutorial 2015214  Published by Morgan & Claypool  ‘Synthesis Lectures on Information Concepts, Retrieval, and Services’  Due April / May 2015 (in time for SIGIR 2015)
  215. 215. TREC 2015 Dynamic Domain Track  Co-organized by Grace Hui Yang, John Frank, Ian Soboroff  Underexplored subsets of Web content  Limited scope and richness of indexed content, which may not include relevant components of the deep web  temporary pages,  pages behind forms, etc.  Basic search interfaces, where there is little collaboration or history beyond independent keyword search  Complex, task-based, dynamic search  Temporal dependency  Rich interactions  Complex, evolving information needs  Professional users  A wide range of search strategies 215
  216. 216. Task  An interactive search with multiple runs  Starting point: the system is given a search query  Iterate:  the system returns a ranked list of 5 documents  the API returns relevance judgments  go to the next iteration of retrieval  until done (the system decides when to stop)  The goal of the system is to find relevant information for each topic as soon as possible  One-shot ad-hoc search is included: the system may decide to stop after iteration one 216
  217. 217. Domains  Illicit goods: 30k forum posts from 5-10 forums (total ~300k posts). Which users are working together to sell illicit goods?  Ebola: one million tweets; 300k docs from in-country web sites (mostly official sites). Who is doing what and where?  Local Politics: 300k docs from local political groups in the Pacific Northwest and British Columbia. Who is campaigning for what and why? 217
  218. 218. Timeline  TREC Call for Participation: January 2015  Data Available: March  Detailed Guidelines: April/May  Topics, Tasks available: June  Systems do their thing: June-July  Evaluation: August  Results to participants: September  Conference: November 2015 218
  219. 219. TREC 2015 Total Recall Track  Co-organized by Gord Cormack, Maura Grossman, Adam Roegiest, Charlie Clarke  Explores high-recall tasks through an active learning process modeled on legal search tasks (eDiscovery, patent search)  Participating systems start with a topic and propose a relevant document  Systems get immediate feedback on relevance  They continue to propose additional documents and receive feedback until a stopping condition is reached  Shared online infrastructure and collections with Dynamic Domain: easy to participate in both if you participate in one 219
  220. 220. Acknowledgment Dynamic Information Retrieval Modeling Tutorial 2015220  We thank Prof. Charlie Clarke for his guest lecture  We sincerely thank Dr. Xuchu Dong for his help in preparing the tutorial  We also thank the following colleagues for their comments and suggestions:  Dr. Filip Radlinski  Prof. Maarten de Rijke
  221. 221. References Dynamic Information Retrieval Modeling Tutorial 2015221 Static IR  Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999.  The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. 1999.  Implicit User Modeling for Personalized Search. Xuehua Shen et al. CIKM, 2005.  A Short Introduction to Learning to Rank. Hang Li. IEICE Transactions 94-D(10): 1854-1862, 2011.  Portfolio Theory of Information Retrieval. J. Wang and J. Zhu. In SIGIR 2009.
  222. 222. References Dynamic Information Retrieval Modeling Tutorial 2015222 Interactive IR  Relevance Feedback in Information Retrieval. J. J. Rocchio. The SMART Retrieval System (pp. 313-23), 1971.  A study in interface support mechanisms for interactive information retrieval. Ryen W. White et al. JASIST, 2006.  Visualizing stages during an exploratory search session. Bill Kules et al. HCIR, 2011.  Dynamic Ranked Retrieval. Cristina Brandt et al. WSDM, 2011.  Structured Learning of Two-level Dynamic Rankings. Karthik Raman et al. CIKM, 2011.
  223. 223. References Dynamic Information Retrieval Modeling Tutorial 2015223 Dynamic IR  A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. In SIGIR ’99, pages 214-221.  Threshold setting and performance optimization in adaptive filtering. Stephen Robertson. JIR, 2002.  A large-scale study of the evolution of web pages. Dennis Fetterly et al. WWW, 2003.  Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008.  Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem. Yisong Yue et al. ICML, 2009.  Meme-tracking and the dynamics of the news cycle. Jure Leskovec, Lars Backstrom, and Jon Kleinberg. KDD, 2009.
  224. 224. References Dynamic Information Retrieval Modeling Tutorial 2015224 Dynamic IR  Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS, 2009.  A Novel Click Model and Its Applications to Online Advertising. Zeyuan Allen Zhu et al. WSDM, 2010.  A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010.  Inferring search behaviors using partially observable markov model with duration (POMD). Yin He et al. WSDM, 2011.  No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search. Jeff Huang et al. CHI, 2011.  Balancing Exploration and Exploitation in Learning to Rank Online. Katja Hofmann et al. ECIR, 2011.  Large-Scale Validation and Analysis of Interleaved Search Evaluation. Olivier Chapelle et al. TOIS, 2012.
  225. 225. References Dynamic Information Retrieval Modeling Tutorial 2015225 Dynamic IR  Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. In WWW ’12, pages 11-20.  Sequential selection of correlated ads by POMDPs. Shuai Yuan et al. CIKM, 2012.  Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. In SIGIR ’13, pages 453-462.  Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. In SIGIR 2013.  Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. In WWW ’13.  Interactive Collaborative Filtering. X. Zhao, W. Zhang, and J. Wang. In CIKM 2013.
  226. 226. References Dynamic Information Retrieval Modeling Tutorial 2015226 Dynamic IR  Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. In SIGIR ’14.  Iterative Expectation for Multi-Period Information Retrieval. M. Sloan and J. Wang. In WSCD 2013.  Dynamical Information Retrieval Modelling: A Portfolio-Armed Bandit Machine Approach. M. Sloan and J. Wang. In WWW 2012.  Jiyun Luo, Sicong Zhang, Xuchu Dong and Hui Yang. Designing States, Actions, and Rewards for Using POMDP in Session Search. In ECIR 2015.  Sicong Zhang, Jiyun Luo, Hui Yang. A POMDP Model for Content-Free Document Re-ranking. In SIGIR 2014.
  227. 227. References Dynamic Information Retrieval Modeling Tutorial 2015227 Markov Processes  A markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679–684, 1957.  Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.  Dynamic Programming and Markov Processes. R.A. Howard. MIT Press. 1960  Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960  Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Baum, Leonard E.; Petrie, Ted. The Annals of Mathematical Statistics 37, 1966
  228. 228. References Dynamic Information Retrieval Modeling Tutorial 2015228 Markov Processes  Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3. 1988  Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39: 162–175, 1991.  Q-Learning. Christopher J.C.H. Watkins, Peter Dayan. Machine Learning. 1992  Reinforcement learning with replacing eligibility traces. Singh, S. P. & Sutton, R. S. Machine Learning, 22, pages 123-158, 1996.  Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.  Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99–134, 1998.
  229. 229. References Dynamic Information Retrieval Modeling Tutorial 2015229 Markov Processes  Finding approximate POMDP solutions through belief compression. N. Roy. PhD Thesis Carnegie Mellon. 2003  VDCBPI: an approximate scalable algorithm for large scale POMDPs, P. Poupart and C. Boutilier. In NIPS-2004, pages 1081–1088.  Finding Approximate POMDP solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40,2005.  Probabilistic robotics. S. Thrun, W. Burgard, D. Fox. Cambridge. MIT Press. 2005  Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Volume 27, pages 335-380, 2006  Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. The MIT Press, 2006.
  230. 230. References Dynamic Information Retrieval Modeling Tutorial 2015230 Markov Processes  The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood, E. J. Sondik. Operations Research, 1973.  Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and M. C. Shin. Management Science 24, 1978.  An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591-600, 12 2006.  Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media, 2011.  Finite-Time Regret Bounds for the Multiarmed Bandit Problem. Nicolò Cesa-Bianchi, Paul Fischer. ICML, 100-108, 1998.  Multi-armed bandit allocation indices. J. C. Gittins. Wiley, 1989.  Finite-time Analysis of the Multiarmed Bandit Problem. Peter Auer et al. Machine Learning 47, Issue 2-3, 2002.
