Dynamic aspects of Information Retrieval (IR), including changes found in data, users and systems, are increasingly being utilized in search engines and information filtering systems. Examples include large datasets containing sequential data capturing document dynamics and modern IR systems observing user dynamics through interactivity. Existing IR techniques are limited in their ability to optimize over changes, learn with minimal computational footprint and be responsive and adaptive.
The objective of this tutorial is to provide a comprehensive and up-to-date introduction to Dynamic Information Retrieval Modeling. Dynamic IR Modeling is the statistical modeling of IR systems that can adapt to change. It is a natural follow-up to previous statistical IR modeling tutorials, with a fresh look at state-of-the-art dynamic retrieval models and their applications, including session search and online advertising. The tutorial covers techniques ranging from classic relevance feedback to the latest applications of partially observable Markov decision processes (POMDPs), and presents to fellow researchers and practitioners a handful of useful algorithms and tools for solving IR problems that incorporate dynamics.
http://www.dynamic-ir-modeling.org/
A newer version of this tutorial presented at WSDM 2015 can be found here http://www.slideshare.net/marcCsloan/dynamic-information-retrieval-tutorial-wsdm-2015
This version (SIGIR 2014) places a greater emphasis on the underlying theory and includes a guest lecture on evaluation by Dr Emine Yilmaz. The newer version (WSDM 2015) presents a wider range of applications of DIR in state-of-the-art research and includes a guest lecture on evaluation by Prof Charles Clarke.
@inproceedings{Yang:2014:DIR:2600428.2602297,
author = {Yang, Hui and Sloan, Marc and Wang, Jun},
title = {Dynamic Information Retrieval Modeling},
booktitle = {Proceedings of the 37th International ACM SIGIR Conference on Research \& Development in Information Retrieval},
series = {SIGIR '14},
year = {2014},
isbn = {978-1-4503-2257-7},
location = {Gold Coast, Queensland, Australia},
pages = {1290--1290},
numpages = {1},
url = {http://doi.acm.org/10.1145/2600428.2602297},
doi = {10.1145/2600428.2602297},
acmid = {2602297},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {dynamic information retrieval modeling, probabilistic relevance model, reinforcement learning},
}
4. Dynamic Information Retrieval
[Diagram: a user with an information need explores the document space; observed documents feed back to the user]
Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren't, and satisfy their information need.
5. Evolving IR
• Paradigm shifts in IR as new models emerge
  • e.g. VSM → BM25 → Language Model
  • Different ways of defining the relationship between query and document
• Static → Interactive → Dynamic
  • Evolution in modeling user interaction with the search engine
6. Outline
• Introduction
  • Static IR
  • Interactive IR
  • Dynamic IR
• Theory and Models
• Session Search
• Reranking
• Guest Talk: Evaluation
7. Conceptual Model – Static IR
[Diagram: Static IR, Interactive IR and Dynamic IR as nested conceptual models]
• No feedback
8. Characteristics of Static IR
• Does not learn directly from the user
• Parameters updated periodically
12. Outline
• Introduction
  • Static IR
  • Interactive IR
  • Dynamic IR
• Theory and Models
• Session Search
• Reranking
• Guest Talk: Evaluation
13. Conceptual Model – Interactive IR
[Diagram: Static IR, Interactive IR and Dynamic IR]
• Exploit feedback
15. Interactive Recommender Systems
Learn the user's taste interactively! At the same time, provide good recommendations!
16. Example – Multi Page Search
[Screenshot: an ambiguous query]
17. Example – Multi Page Search
[Screenshot: results for the topic "car"]
18. Example – Multi Page Search
[Screenshot: results for the topic "animal"]
19. Example – Interactive Search
[Screenshot: the user clicks on a "car" webpage]
20. Example – Interactive Search
[Screenshot: the user clicks on "Next Page"]
21. Example – Interactive Search
[Screenshot: page 2 results show cars]
22. Example – Interactive Search
[Screenshot: the user clicks on an "animal" webpage]
23. Example – Interactive Search
[Screenshot: page 2 results show animals]
24. Example – Dynamic Search
[Screenshot: results for the topic "guitar"]
25. Example – Dynamic Search
[Screenshot: a diversified page 1 covering the topics cars, animals and guitars]
26. Toy Example
• Multi-page search scenario
• The user image-searches for "jaguar"
• Rank two of the four results over two pages:
[Images: four candidate results with relevance probabilities p = 0.5, p = 0.51, p = 0.9, p = 0.49]
27. Toy Example – Static Ranking
• Ranked according to the PRP (Probability Ranking Principle)
Page 1: 1. p = 0.9, 2. p = 0.51
Page 2: 1. p = 0.5, 2. p = 0.49
28. Toy Example – Relevance Feedback
• Interactive search
• Improve the 2nd page based on feedback from the 1st page
• Use clicks as relevance feedback
• Rocchio¹ algorithm on the terms in the image webpages (sketched in Python below):
  q' = α q + (β / |D_r|) Σ_{d_j ∈ D_r} d_j − (γ / |D_nr|) Σ_{d_j ∈ D_nr} d_j
• The new query is closer to the relevant documents and further from the non-relevant documents
¹Rocchio, J. J., '71; Baeza-Yates & Ribeiro-Neto, '99
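A minimal Python sketch of the Rocchio update, assuming term-weight vectors stored as dicts; the parameter values and toy click data are illustrative, not from the tutorial.

from collections import defaultdict

def rocchio(query_vec, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance feedback: move the query vector toward the
    centroid of the relevant documents and away from the non-relevant ones."""
    new_q = defaultdict(float)
    for term, w in query_vec.items():
        new_q[term] += alpha * w
    for doc in relevant_docs:
        for term, w in doc.items():
            new_q[term] += beta * w / len(relevant_docs)
    for doc in nonrelevant_docs:
        for term, w in doc.items():
            new_q[term] -= gamma * w / len(nonrelevant_docs)
    # Negative weights are usually clipped to zero.
    return {t: w for t, w in new_q.items() if w > 0}

# Toy usage: a click on the "car" image page pulls the query toward car terms.
q = {"jaguar": 1.0}
clicked = [{"jaguar": 0.8, "car": 0.9, "engine": 0.4}]
skipped = [{"jaguar": 0.7, "animal": 0.9, "wildlife": 0.5}]
print(rocchio(q, clicked, skipped))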
29. Toy Example – Relevance Feedback
• Ranked according to PRP and Rocchio
[Diagram: two-page ranking, page 1 (p = 0.9, p = 0.51) and page 2 (p = 0.5, p = 0.49); * marks a click on page 1]
30. Toy Example – Relevance Feedback
• No click when searching for animals
[Diagram: page 1 ranked p = 0.9, p = 0.51; the page 2 ranking is left undetermined (?, ?)]
31. Toy Example – Value Function
• Optimize both pages using dynamic IR
• Bellman equation for the value function; simplified example:
  V_t(R_t, Σ_t) = max_{s_t} [ s_t R_t + E( V_{t+1}(R_{t+1}, Σ_{t+1}) | C_t ) ]
• R_t, Σ_t = relevance and covariance of the documents for page t
• C_t = clicks on page t
• V_t = "value" of the ranking on page t
• Maximize value over all pages based on estimating feedback
32. Toy Example – Covariance
• The covariance matrix represents the similarity between images:
  Σ = [ 1    0.8  0.1  0
        0.8  1    0.1  0
        0.1  0.1  1    0.95
        0    0    0.95 1 ]
33. Toy Example – Myopic Value
• For the myopic ranking, V_2 = 16.380
[Diagram: page 1 ranking]
34. Toy Example – Myopic Ranking
• The page 2 ranking stays the same regardless of clicks
[Diagram: page 1 and page 2 rankings]
35. Toy Example – Optimal Value
• For the optimal ranking, V_2 = 16.528
[Diagram: page 1 ranking]
36. Toy Example – Optimal Ranking
• If the car is clicked, the Jaguar logo is more relevant on the next page
[Diagram: page 1 and page 2 rankings]
37. Toy Example – Optimal Ranking
• In all other scenarios, rank the animal first on the next page
[Diagram: page 1 and page 2 rankings]
38. Interactive vs Dynamic IR
Interactive:
• Treats interactions independently
• Responds to immediate feedback
• Static IR used before feedback is received

Dynamic:
• Optimizes over the whole interaction
• Long-term gains
• Models future user feedback
• Also used at the beginning of the interaction
39. Outline
• Introduction
  • Static IR
  • Interactive IR
  • Dynamic IR
• Theory and Models
• Session Search
• Reranking
• Guest Talk: Evaluation
40. Conceptual Model – Dynamic IR
[Diagram: Static IR, Interactive IR and Dynamic IR]
• Explore and exploit feedback
41. Characteristics of Dynamic IR
• Rich interactions
  • Query formulation
  • Document clicks
  • Document examination
  • Eye movements
  • Mouse movements
  • etc.
42. Characteristics of Dynamic IR
• Temporal dependency
[Diagram: an information need I drives iterations 1 … n; at iteration i, query q_i produces ranked documents D_i and the user returns clicked documents C_i]
43. Characteristics of Dynamic IR
• Overall goal
  • Optimize over all iterations for the goal
  • IR metric or user satisfaction
  • Optimal policy
44. Dynamic IR
• Dynamic IR explores actions
• Dynamic IR learns from the user and adjusts its actions
• May hurt performance in a single stage, but improves over all stages
45. Applications to IR
• Dynamics are found in many different aspects of IR
• Dynamic users
  • Users change behaviour over time; user history
• Dynamic documents
  • Information filtering; document content change
• Dynamic queries
  • Changing query meaning, e.g. "Twitter"
• Dynamic information needs
  • Topic ontologies evolve over time
• Dynamic relevance
  • Seasonal / time-of-day changes in relevance
46. User Interactivity in DIR
• Modern IR interfaces
  • Facets
  • Verticals
• Personalization
  • Responsive to a particular user
  • Complex log data
• Mobile
  • Richer user interactions
• Ads
  • Adaptive targeting
47. Big Data
• Dataset sizes are always increasing
• Computational footprint of learning to rank
• Rich, sequential data
Example: complex user behaviour found in log data, taking into account reading, skipping and re-reading behaviours¹; uses a POMDP
¹Yin He et al., '11
48. Online Learning to Rank
• Learning to rank iteratively on sequential data
• Clicks as implicit user feedback/preference
• Often uses multi-armed bandit techniques
Examples: click models interpret clicks and a contextual bandit improves learning¹; pairwise comparison of rankings using a duelling bandits formulation²
¹Katja Hofmann et al., '11
²Yisong Yue et al., '09
49. Evaluation
• Use complex user interaction data to assess rankings
• Compare ranking techniques in online testing
• Minimise user dissatisfaction
Examples: modelled cursor activity and correlated it with eye tracking to validate good or bad abandonment¹; interleave search results from two ranking algorithms to determine which is better²
¹Jeff Huang et al., '11
²Olivier Chapelle et al., '12
50. Filtering and News
• Adaptive techniques to personalize information filtering or news recommendation
• Understand the complex dynamics of real-world events in search logs
• Capture temporal document change¹
Examples: relevance feedback adapts threshold sensitivity over time in information filtering to maximise overall utility²; detecting patterns and memes in news cycles and modelling how information spreads³
¹Dennis Fetterly et al., '03
²Stephen Robertson, '02
³Jure Leskovec et al., '09
51. Advertising
• Behavioural targeting and personalized ads
• Learn when to display new ads
• Maximise profit from the available ads
Examples: a POMDP with ad correlation finds the optimal ad to display to a user¹; a dynamic click model interprets complex user behaviour in logs and applies the results to tail queries and unseen ads²
¹Shuai Yuan et al., '12
²Zeyuan Allen Zhu et al., '10
52. Outline
• Introduction
• Theory and Models
• Session Search
• Reranking
• Guest Talk: Evaluation
53. Outline
• Introduction
• Theory and Models
  • Why not use supervised learning
  • Markov Models
• Session Search
• Reranking
• Evaluation
54. Why Not Use Supervised Learning for Dynamic IR Modeling?
• Lack of enough training data
  • Dynamic IR problems contain a sequence of dynamic interactions, e.g. a series of queries in a session
  • Repeated sequences are rare (close to zero), even in large query logs (WSCD 2013 & 2014, query logs from Yandex)
• The chance of finding repeated adjacent query pairs is also low:

Dataset   | Repeated Adjacent Query Pairs | Total Adjacent Query Pairs | Repeated Percentage
WSCD 2013 | 476,390                       | 17,784,583                 | 2.68%
WSCD 2014 | 1,959,440                     | 35,376,008                 | 5.54%
55. Our Solution
No supervised learning. Instead, try to find an optimal solution through a sequence of dynamic interactions.
Trial and error: learn from repeated, varied attempts which are continued until success.
57. Recap – Characteristics of Dynamic IR
• Rich interactions: query formulation, document clicks, document examination, eye movements, mouse movements, etc.
• Temporal dependency
• Overall goal
58. What Is a Desirable Model for Dynamic IR?
• Models interactions, which means it needs placeholders for actions
• Models the information need hidden behind user queries and other interactions
• Sets up a reward mechanism to guide the entire search algorithm in adjusting its retrieval strategies
• Represents Markov properties to handle the temporal dependency
A model for a trial-and-error setting will do. A Markov model will do!
59. Outline
• Introduction
• Theory and Models
  • Why not use supervised learning
  • Markov Models
• Session Search
• Reranking
• Evaluation
60. Markov Process
• Markov property¹ (the "memoryless" property): a system's next state depends only on its current state.
  Pr(S_{i+1} | S_i, …, S_0) = Pr(S_{i+1} | S_i)
• Markov process: a stochastic process with the Markov property, e.g.
  s_0 → s_1 → … → s_i → s_{i+1} → …
¹A. A. Markov, '06
61. Family of Markov Models
• Markov Chain
• Hidden Markov Model
• Markov Decision Process
• Partially Observable Markov Decision Process
• Multi-armed Bandit
62. Markov Chain (S, M)
• A discrete-time Markov process
• State S – a web page
• Transition probability M
• Example: Google PageRank¹ (sketched in Python below)
  PageRank(S) = (1 − α) / N + α Σ_{B ∈ Γ(S)} PageRank(B) / L(B)
  where N is the number of pages, Γ(S) is the set of pages linking to S, L(B) is the number of outlinks of B, and (1 − α) / N is the random-jump factor
• PageRank: how likely a random web surfer is to land on a page
• The stable-state distribution of such a Markov chain is PageRank
¹L. Page et al., '99
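A small power-iteration sketch of PageRank as the stationary distribution of this Markov chain; the link graph, damping factor and iteration count below are illustrative assumptions.

def pagerank(links, alpha=0.85, iters=50):
    """Power iteration for PageRank: the stationary distribution of a
    Markov chain with random jumps (total probability 1 - alpha)."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new_pr = {p: (1 - alpha) / n for p in pages}
        for p, outlinks in links.items():
            if not outlinks:            # dangling page: spread mass uniformly
                for q in pages:
                    new_pr[q] += alpha * pr[p] / n
            else:
                for q in outlinks:      # L(p) = number of outlinks of p
                    new_pr[q] += alpha * pr[p] / len(outlinks)
        pr = new_pr
    return pr

# Toy web graph (illustrative): A and B link to each other, both link to C.
links = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A"]}
print(pagerank(links))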
63. Hidden Markov Model
• A Markov chain whose states are hidden; observable symbols are emitted with some probability according to the states¹
[Diagram: hidden states s0 → s1 → s2 → … with transition probabilities p_i; each state emits an observation o_i with emission probability e_i]
• s_i – hidden state; p_i – transition probability; o_i – observation; e_i – observation (emission) probability
(S, M, O, e)
¹Leonard E. Baum et al., '66
64. An HMM example for IR
Construct an HMM for each document¹ (scored in the Python sketch below):
[Diagram: states s0, s1, s2, … emitting query terms t0, t1, t2, …]
• s_i – "Document" or "General English"
• p_i – a0 or a1
• t_i – query term
• e_i – P(t|D) or P(t|GE)
• Document-to-query relevance:
  P(D|q) ∝ ∏_{t ∈ q} ( a0 P(t|GE) + a1 P(t|D) )
¹Miller et al., '99
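A minimal sketch of this two-state mixture score in Python; the counts, the mixture weight and the default count for unseen corpus terms are illustrative assumptions.

import math

def hmm_score(query, doc_tf, doc_len, corpus_tf, corpus_len, a1=0.3):
    """Two-state HMM document score (Miller et al. '99): each query term is
    emitted either by the 'Document' state or by 'General English'.
    log P(q|D) = sum_t log( a0 * P(t|GE) + a1 * P(t|D) ), with a0 = 1 - a1."""
    a0 = 1.0 - a1
    score = 0.0
    for t in query:
        p_t_doc = doc_tf.get(t, 0) / doc_len
        p_t_ge = corpus_tf.get(t, 1) / corpus_len   # default 1 avoids log(0)
        score += math.log(a0 * p_t_ge + a1 * p_t_doc)
    return score

# Toy counts (illustrative).
print(hmm_score(["jaguar", "car"], {"jaguar": 3, "car": 2}, 100,
                {"jaguar": 50, "car": 400}, 1_000_000))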
65. Markov Decision Process (S, M, A, R, γ)
• An MDP extends a Markov chain with actions and rewards¹
[Diagram: s0 →(a0, r0)→ s1 →(a1, r1)→ s2 →(a2, r2)→ s3 → …]
• s_i – state; a_i – action; r_i – reward; p_i – transition probability
¹R. Bellman, '57
66. Definition of MDP
• A tuple (S, M, A, R, γ)
  • S: state space
  • M: transition matrix, M_a(s, s') = P(s'|s, a)
  • A: action space
  • R: reward function, R(s, a) = immediate reward for taking action a at state s
  • γ: discount factor, 0 < γ ≤ 1
• Policy π: π(s) = the action taken at state s
• The goal is to find an optimal policy π* maximizing the expected total reward.
67. Policy
• Policy: π(s) = a, according to which an action a is selected at state s
• e.g. π(s0) = move right and up; π(s1) = move right and up; π(s2) = move right
[Slide adapted from Carlos Guestrin's ML lecture]
68–70. Value of Policy
• Value V^π(s): the expected long-term reward starting from s
  V^π(s0) = E[ R(s0) + γ R(s1) + γ² R(s2) + γ³ R(s3) + γ⁴ R(s4) + … ]
• Future rewards are discounted by γ ∈ [0, 1)
[Diagram, built up over three slides: starting from s0 with reward R(s0), the policy π(s0) leads to possible next states s1, s1', s1'' with rewards R(·); from each, π leads on to s2, s2', s2'', and so on]
[Slides adapted from Carlos Guestrin's ML lecture]
71. Computing the value of a policy
V^π(s0) = E_π[ R(s0, a) + γ R(s1, a) + γ² R(s2, a) + γ³ R(s3, a) + … ]
        = E_π[ R(s0, a) + γ Σ_{t=1}^{∞} γ^{t−1} R(s_t, a) ]
        = R(s0, a) + γ E_π[ Σ_{t=1}^{∞} γ^{t−1} R(s_t, a) ]
        = R(s0, a) + γ Σ_{s'} P(s'|s0, a) V^π(s')
where s0 is the current state, s' a possible next state, and V^π the value function.
72. Optimality – Bellman Equation
• The Bellman equation¹ for an MDP is a recursive definition of the optimal state-value function V*(·):
  V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V*(s') ]
• Optimal policy:
  π*(s) = argmax_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V*(s') ]
¹R. Bellman, '57
73. Optimality – Bellman Equation
• The Bellman equation can be rewritten with the action-value function Q:
  V*(s) = max_a Q(s, a)
  Q(s, a) = R(s, a) + γ Σ_{s'} P(s'|s, a) V*(s')
• Optimal policy:
  π*(s) = argmax_a Q(s, a)
• This is the relationship between V and Q.
74. MDP algorithms
• Model-based approaches (solve the Bellman equation):
  • Value Iteration
  • Policy Iteration
  • Modified Policy Iteration
  • Prioritized Sweeping
• Model-free approaches:
  • Temporal Difference (TD) Learning
  • Q-Learning
• Both routes yield the optimal value V*(s) and the optimal policy π*(s)
[Bellman '57; Howard '60; Puterman and Shin '78; Singh & Sutton '96; Sutton & Barto '98; Richard Sutton '88; Watkins '92]
[Slide adapted from Carlos Guestrin's ML lecture]
75. Value Iteration
• Initialization: initialize V_0(s) arbitrarily
• Loop (iteration):
  V_{i+1}(s) ← max_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V_i(s') ]
  π(s) ← argmax_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V_i(s') ]
• Stopping criterion: π(s) is good enough
• (A Python sketch follows.)
¹Bellman, '57
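A compact tabular value-iteration sketch in Python; the two-state MDP, its transition table P and reward table R are illustrative assumptions, not from the tutorial.

def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """Value iteration: repeatedly apply the Bellman backup
    V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in states)
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    # Extract the greedy policy from the converged values.
    pi = {s: max(actions, key=lambda a: R[s][a] +
                 gamma * sum(P[s][a][s2] * V[s2] for s2 in states))
          for s in states}
    return V, pi

# Toy 2-state MDP (illustrative transition and reward tables).
states, actions = ["s0", "s1"], ["stay", "go"]
P = {"s0": {"stay": {"s0": 1.0, "s1": 0.0}, "go": {"s0": 0.2, "s1": 0.8}},
     "s1": {"stay": {"s0": 0.0, "s1": 1.0}, "go": {"s0": 0.8, "s1": 0.2}}}
R = {"s0": {"stay": 0.0, "go": 1.0}, "s1": {"stay": 2.0, "go": 0.0}}
print(value_iteration(states, actions, P, R))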
81. Policy Iteration
Algorithm (see the Python sketch below):
1. For each state s ∈ S: V(s) ← 0, π_0(s) ← arbitrary policy, i ← 0
2. Repeat
   2.1 (Policy evaluation) Repeat
         For each s ∈ S:
           V'(s) ← V(s)
           V(s) ← R(s, π_i(s)) + γ Σ_{s'} P(s'|s, π_i(s)) V(s')
       until max_s |V(s) − V'(s)| < ε
   2.2 (Policy improvement) For each s ∈ S:
         π_{i+1}(s) ← argmax_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V(s') ]
   2.3 i ← i + 1
   Until π_i = π_{i−1}
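A matching policy-iteration sketch, reusing the P/R table format of the value-iteration example above; again an illustrative toy, not the tutorial's own code.

def policy_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """Policy iteration: alternate policy evaluation (iterate V under the
    current policy) and greedy policy improvement, until the policy is stable."""
    pi = {s: actions[0] for s in states}      # arbitrary initial policy
    V = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: V(s) <- R(s, pi(s)) + gamma * E[V(s')]
        while True:
            delta = 0.0
            for s in states:
                v = R[s][pi[s]] + gamma * sum(P[s][pi[s]][s2] * V[s2]
                                              for s2 in states)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < eps:
                break
        # Policy improvement: act greedily with respect to V.
        stable = True
        for s in states:
            best = max(actions, key=lambda a: R[s][a] +
                       gamma * sum(P[s][a][s2] * V[s2] for s2 in states))
            if best != pi[s]:
                pi[s], stable = best, False
        if stable:
            return V, pi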
82. Modified Policy Iteration
• The "policy evaluation" step in Policy Iteration is time-consuming, especially when the state space is large.
• Modified Policy Iteration computes an approximate policy evaluation by running just a few (k) evaluation iterations.
• The spectrum: k = 1 gives greedy Value Iteration, k = ∞ gives full Policy Iteration; Modified Policy Iteration sits in between.
83. Modified Policy Iteration
Algorithm:
1. For each state s ∈ S: V(s) ← 0, π_0(s) ← arbitrary policy, i ← 0
2. Repeat
   2.1 Repeat k times
         For each s ∈ S:
           V(s) ← R(s, π_i(s)) + γ Σ_{s'} P(s'|s, π_i(s)) V(s')
   2.2 For each s ∈ S:
         π_{i+1}(s) ← argmax_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V(s') ]
   2.3 i ← i + 1
   Until π_i = π_{i−1}
84. MDP algorithms
• Model-based approaches (solve the Bellman equation):
  • Value Iteration
  • Policy Iteration
  • Modified Policy Iteration
  • Prioritized Sweeping
• Model-free approaches:
  • Temporal Difference (TD) Learning
  • Q-Learning
• Both routes yield the optimal value V*(s) and the optimal policy π*(s)
[Bellman '57; Howard '60; Puterman and Shin '78; Singh & Sutton '96; Sutton & Barto '98; Richard Sutton '88; Watkins '92]
[Slide adapted from Carlos Guestrin's ML lecture]
85. Temporal Difference Learning
• Monte Carlo sampling can be used for model-free policy evaluation: estimate V^π(s) by the average reward of trajectories starting from s
• However, parts of those trajectories can be reused, so we estimate via an expectation over the next state:
  V^π(s) ← E[ r + γ V^π(s') | s, a ]
• The simplest estimate: V^π(s) ← r + γ V^π(s')
• A smoothed version: V^π(s) ← α ( r + γ V^π(s') ) + (1 − α) V^π(s)
• TD-learning rule: V^π(s) ← V^π(s) + α ( r + γ V^π(s') − V^π(s) )
  where r is the immediate reward, α is the learning rate, and r + γ V^π(s') − V^π(s) is the temporal difference
[Richard Sutton '88; Singh & Sutton '96; Sutton & Barto '98]
86. Temporal Difference Learning
Algorithm (a Python sketch follows):
1. For each state s ∈ S: initialize V^π(s) arbitrarily
2. For each episode (state sequence):
   2.1 Initialize s
   2.2 Repeat
       2.2.1 Take action a at state s according to π
       2.2.2 Observe the immediate reward r and the next state s'
       2.2.3 V^π(s) ← V^π(s) + α ( r + γ V^π(s') − V^π(s) )
       2.2.4 s ← s'
       until s is a terminal state
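A tabular TD(0) sketch in Python; the env_step interface, the fixed policy and the toy chain environment are illustrative assumptions.

import random

def td0(episodes, policy, env_step, states, alpha=0.1, gamma=0.9):
    """Tabular TD(0) policy evaluation: after each observed transition
    (s, r, s'), nudge V(s) toward the bootstrapped target r + gamma * V(s')."""
    V = {s: 0.0 for s in states}
    for _ in range(episodes):
        s = random.choice(states)              # start state (illustrative)
        while s is not None:                   # None marks a terminal state
            a = policy(s)
            r, s_next = env_step(s, a)         # sample from the environment
            target = r + gamma * (V[s_next] if s_next is not None else 0.0)
            V[s] += alpha * (target - V[s])    # TD error = target - V(s)
            s = s_next
    return V

# Toy chain environment (illustrative): s0 -> s1 -> terminal, reward 1 at the end.
def step(s, a):
    return (0.0, "s1") if s == "s0" else (1.0, None)

print(td0(500, lambda s: "forward", step, ["s0", "s1"]))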
88. Q-Learning
Algorithm (a Python sketch follows):
1. For each s ∈ S and a ∈ A: initialize Q_0(s, a) arbitrarily
2. i ← 0
3. For each episode (state sequence):
   3.1 Initialize s
   3.2 Repeat
       3.2.1 i ← i + 1
       3.2.2 Select an action a at state s according to Q_{i−1}
       3.2.3 Take action a; observe the immediate reward r and the next state s'
       3.2.4 Q_i(s, a) ← Q_{i−1}(s, a) + α ( r + γ max_{a'} Q_{i−1}(s', a') − Q_{i−1}(s, a) )
       3.2.5 s ← s'
       until s is a terminal state
4. For each s ∈ S: π(s) ← argmax_a Q_i(s, a)
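A tabular Q-learning sketch with an epsilon-greedy behaviour policy; the env_step interface and the toy chain are illustrative assumptions.

import random
from collections import defaultdict

def q_learning(env_step, start, actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning.
    Update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = start
        while s is not None:                       # None marks a terminal state
            if random.random() < epsilon:          # explore
                a = random.choice(actions)
            else:                                  # exploit current estimates
                a = max(actions, key=lambda x: Q[(s, x)])
            r, s_next = env_step(s, a)
            best_next = max(Q[(s_next, x)] for x in actions) if s_next else 0.0
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q

# Toy chain (illustrative): "go" advances and eventually reaches the goal.
def step(s, a):
    if s == "s0":
        return (0.0, "s1" if a == "go" else "s0")
    return (1.0, None) if a == "go" else (0.0, "s0")

Q = q_learning(step, "s0", ["go", "stay"])
print(max(["go", "stay"], key=lambda a: Q[("s0", a)]))  # learned action at s0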
89. Apply an MDP to an IR Problem
• We can model IR systems using a Markov Decision Process
• Is there a temporal component?
• States – what changes with each time step?
• Actions – how does your system change the state?
• Rewards – how do you measure feedback or effectiveness in your problem at each time step?
• Transition probability – can you determine this? If not, a model-free approach is more suitable
90. Apply an MDP to an IR Problem – Example
• User agent in session search
• States – the user's relevance judgement
• Actions – new queries
• Reward – information gained
91. Apply an MDP to an IR Problem – Example
• The search engine's perspective
• What if we can't directly observe the user's relevance judgement?
• Click ≠ relevance
92. Family of Markov Models
• Markov Chain
• Hidden Markov Model
• Markov Decision Process
• Partially Observable Markov Decision Process
• Multi-armed Bandit
93. POMDP Model
[Diagram: hidden states s0 →(a0, r0)→ s1 →(a1, r1)→ s2 →(a2, r2)→ s3 → …, with observations o1, o2, o3 emitted along the way]
• Hidden states
• Observations
• Belief
¹R. D. Smallwood et al., '73
94. POMDP Definition
• A tuple (S, M, A, R, γ, O, Θ, B)
  • S: state space
  • M: transition matrix
  • A: action space
  • R: reward function
  • γ: discount factor, 0 < γ ≤ 1
  • O: observation set; an observation is a symbol emitted according to a hidden state
  • Θ: observation function; Θ(s, a, o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s, a)
  • B: belief space; a belief is a probability distribution over the hidden states
95. POMDP – Belief Update
• The agent uses a state estimator to update its belief about the hidden states (a Python sketch follows):
  b' = SE(b, a, o')
  b'(s') = P(s' | o', a, b)
         = P(s', o' | a, b) / P(o' | a, b)
         = Θ(s', a, o') Σ_s M(s, a, s') b(s) / P(o' | a, b)
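A direct transcription of this belief update into Python; the two-state click example and all probabilities are illustrative assumptions.

def belief_update(b, a, o, M, Z, states):
    """POMDP state estimator SE(b, a, o): Bayes' rule over hidden states.
    b'(s') ∝ Z(s',a,o) * sum_s M(s,a,s') * b(s); the normalizer is P(o|a,b)."""
    b_new = {}
    for s2 in states:
        b_new[s2] = Z[(s2, a, o)] * sum(M[(s, a, s2)] * b[s] for s in states)
    norm = sum(b_new.values())              # this is P(o | a, b)
    return {s: p / norm for s, p in b_new.items()}

# Toy 2-state example (illustrative): clicks are more likely when the hidden
# "relevant" state holds, so observing a click shifts belief toward it.
states = ["relevant", "nonrelevant"]
M = {(s, "rank", s2): 1.0 if s == s2 else 0.0 for s in states for s2 in states}
Z = {("relevant", "rank", "click"): 0.7, ("relevant", "rank", "skip"): 0.3,
     ("nonrelevant", "rank", "click"): 0.2, ("nonrelevant", "rank", "skip"): 0.8}
b = {"relevant": 0.5, "nonrelevant": 0.5}
print(belief_update(b, "rank", "click", M, Z, states))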
96. POMDP – Bellman Equation
• The Bellman equation for a POMDP:
  V(b) = max_a [ r(b, a) + γ Σ_{b'} P(b'|b, a) V(b') ]
• A POMDP can be transformed into a continuous belief MDP (B, M', A, r, γ)
  • B: the continuous belief space
  • M': transition function, M'_a(b, b') = Σ_{o ∈ O} 1_{b,a}(b', o) P(o|b, a),
    where 1_{b,a}(b', o) = 1 if SE(b, a, o) = b', and 0 otherwise
  • A: action space
  • r: reward function, r(b, a) = Σ_{s ∈ S} b(s) R(s, a)
97. Solving POMDPs – The Witness Algorithm¹
• The optimal policy of a POMDP = the optimal policy of its belief MDP
• A variation of the value iteration algorithm
¹L. Kaelbling et al., '98
98. Policy Tree
• A policy tree of depth i is an i-step non-stationary policy
• As if we ran value iteration until the i-th iteration
[Diagram: a root action a(h) with i steps to go; each observation o_1 … o_l branches to a subtree with i−1 steps to go, down through 2 steps to go and 1 step to go]
99. Value of a Policy Tree
• We can only determine the value of a policy tree h from some belief state b, because the agent never knows the exact state:
  V_h(b) = Σ_{s ∈ S} b(s) V_h(s)
  V_h(s) = R(s, a_h) + γ Σ_{s' ∈ S} M_{a_h}(s, s') Σ_{o_k ∈ O} Θ(s', a_h, o_k) V_{o_k(h)}(s')
  where a_h is the action at the root node of h, and o_k(h) is the (i−1)-step subtree associated with o_k under the root node of h
100. Idea of the Witness Algorithm
• For each action a, compute Γ_i^a, the set of candidate i-step policy trees with action a at their roots
• The optimal value function at the i-th step, V_i*(b), is the upper surface of the value functions of all i-step policy trees
101. Optimal value function
• Geometrically, V_i*(b) is piecewise linear and convex:
  V_i*(b) = max_{h ∈ H} V_h(b)
[Diagram: for a two-state POMDP the belief space is one-dimensional (simplex constraint b(s1) + b(s2) = 1); the value functions V_h1(b) … V_h5(b) are lines and V_i*(b) is their upper surface]
• This motivates pruning the set of policy trees
102. Outline of the Witness Algorithm
Algorithm:
1. H_1 ← {}
2. i ← 1
3. Repeat
   3.1 i ← i + 1
   3.2 For each a in A: Γ_i^a ← witness(H_{i−1}, a)   (the inner loop)
   3.3 Prune ∪_a Γ_i^a to get H_i
   until sup_b |V_i(b) − V_{i−1}(b)| < ε
103. Inner Loop of the Witness Algorithm
1. Select a belief b arbitrarily. Generate a best i-step policy tree h_i for it. Add h_i to an agenda.
2. In each iteration:
   2.1 Select a policy tree h_new from the agenda.
   2.2 Look for a witness point b using Z_a and h_new.
   2.3 If such a witness point b is found:
       2.3.1 Calculate the best policy tree h_best for b.
       2.3.2 Add h_best to Z_a.
       2.3.3 Add all the alternative trees of h_best to the agenda.
   2.4 Else remove h_new from the agenda.
3. Repeat until the agenda is empty.
105. Applying POMDP to Dynamic IR

POMDP                | Dynamic IR
---------------------|------------------------------------------------------------
Environment          | Documents
Agents               | User, search engine
States               | Queries, user's decision-making status, relevance of documents, etc.
Actions              | Provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters of a search technology
Observations         | Queries, clicks, document lists, snippets, terms, etc.
Rewards              | Evaluation measures (such as DCG, NDCG or MAP); clicking information
Transition matrix    | Given in advance or estimated from training data
Observation function | Problem dependent; estimated from sample datasets
106. Session Search Example - States
• Four states combining relevance and exploration, starting from q0: S_RT (Relevant & Exploitation), S_RR (Relevant & Exploration), S_NRT (Non-Relevant & Exploitation), S_NRR (Non-Relevant & Exploration)
• Example query transitions: "scooter price" → "scooter stores"; "Hartford visitors" → "Hartford Connecticut tourism"; "Philadelphia NYC travel" → "Philadelphia NYC train"; "distance New York Boston" → maps.bing.com
[J. Luo et al., '14]
107. Session Search Example - Actions
(A_u, A_se)
• User actions (A_u):
  • Add query terms (+Δq)
  • Remove query terms (−Δq)
  • Keep query terms (q_theme)
  • Clicked documents
  • SAT-clicked documents
• Search engine actions (A_se):
  • Increase/decrease/keep term weights
  • Switch query expansion on or off
  • Adjust the number of top documents used in PRF
  • etc.
[J. Luo et al., '14]
108. Multi Page Search Example – States & Actions
• State: relevance of documents
• Action: ranking of documents
• Observation: clicks
• Belief: multivariate Gaussian
• Reward: DCG over 2 pages
[Xiaoran Jin et al., '13]
109. Exercise
Dynamic Information Retrieval Modeling, SIGIR Tutorial, July 7th 2014
Grace Hui Yang, Marc Sloan, Jun Wang. Guest Speaker: Emine Yilmaz
110. Family of Markov Models
• Markov Chain
• Hidden Markov Model
• Markov Decision Process
• Partially Observable Markov Decision Process
• Multi-Armed Bandit
111. Multi-Armed Bandits (MAB)
[Cartoon: a gambler facing a row of slot machines asks "Which slot machine should I select in this round?" and receives a reward]
112. Multi-Armed Bandits (MAB)
[Cartoon: "I won! Is this the best slot machine?"]
113. MAB Definition
• A tuple (S, A, R, B)
  • S: the hidden reward distribution of each bandit
  • A: choose which bandit to play
  • R: reward for playing a bandit
  • B: belief space, our estimate of each bandit's distribution
114. Comparison with Markov Models
• A single-state Markov Decision Process
  • No transition probability
• Similar to a POMDP in that we maintain a belief state
• Action = choose a bandit; it does not affect the state
• Does not "plan ahead" but adapts intelligently
• Somewhere between interactive and dynamic IR
115. Markov Multi-Armed Bandits
[Cartoon: each slot machine 1 … k is itself a Markov process; "Which slot machine should I select in this round?"]
116. Markov Multi-Armed Bandits
[Cartoon: as above, with the chosen machine's Markov process advancing on each action and yielding the reward]
117. MAB Policy Reward
• An MAB algorithm describes a policy π for choosing bandits
• Maximise rewards from the chosen bandits over all time steps
• Minimize regret:
  Regret = Σ_{t=1}^{T} [ Reward(a*) − Reward(a_π(t)) ]
• The cumulative difference between the optimal reward and the actual reward
118. Exploration vs Exploitation
• Exploration
  • Try out bandits to find which has the highest average reward
• Exploitation
  • Too much exploration leads to poor performance
  • Play bandits that are known to pay out higher rewards on average
• MAB algorithms balance exploration and exploitation
  • Start by exploring more to find the best bandits
  • Exploit more as the best bandits become known
120. MAB – Index Algorithms
• Gittins index¹
  • Play the bandit with the highest "Dynamic Allocation Index"
  • Modelled using an MDP, but suffers the "curse of dimensionality"
• ε-greedy²
  • Play the highest-reward bandit with probability 1 − ε
  • Play a random bandit with probability ε
• UCB (Upper Confidence Bound)³ (sketched in Python below)
  • Play the bandit i with the highest x̄_i + sqrt(2 ln t / n_i)
  • The chance of playing infrequently played bandits increases over time
¹J. C. Gittins, '89
²Nicolò Cesa-Bianchi et al., '98
³P. Auer et al., '02
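A minimal UCB1 sketch in Python; the pull interface and the Bernoulli arms are illustrative assumptions.

import math
import random

def ucb1(pull, n_arms, rounds=1000):
    """UCB1: play the arm maximizing mean reward + sqrt(2 ln t / n_i).
    The bonus shrinks as an arm is played, balancing exploration/exploitation."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1                      # play every arm once first
        else:
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # incremental mean
    return means, counts

# Toy Bernoulli bandits (illustrative payout probabilities).
probs = [0.2, 0.5, 0.8]
means, counts = ucb1(lambda a: 1.0 if random.random() < probs[a] else 0.0,
                     len(probs))
print(counts)   # the 0.8 arm should dominate the plays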
121. MAB use in IR
• Choosing ads to display to users¹
  • Each ad is a bandit
  • The user click-through rate is the reward
• Recommending news articles²
  • Each news article is a bandit
  • Similar to the information filtering case
• Diversifying search results³
  • Each rank position is an MAB dependent on the higher ranks
  • Documents are the bandits chosen at each rank
¹Deepayan Chakrabarti et al., '09
²Lihong Li et al., '10
³Radlinski et al., '08
122. MAB Variations
• Contextual bandits¹
  • The world has some context x ∈ X (e.g. user location)
  • Learn a policy π: X → A mapping contexts to arms (online or offline)
• Duelling bandits²
  • Play two (or more) bandits at each time step
  • Observe relative rather than absolute reward
  • Learn an ordering of the bandits
• Mortal bandits³
  • The value of bandits decays over time
  • Exploitation > exploration
¹Lihong Li et al., '10
²Yisong Yue et al., '09
³Deepayan Chakrabarti et al., '09
123. Comparison of Markov Models
• MC – a fully observable stochastic process
• HMM – a partially observable stochastic process
• MDP – a fully observable decision process
• MAB – a decision process, either fully or partially observable
• POMDP – a partially observable decision process

Model | Actions | Rewards | States
MC    | No      | No      | Observable
HMM   | No      | No      | Unobservable
MDP   | Yes     | Yes     | Observable
POMDP | Yes     | Yes     | Unobservable
MAB   | Yes     | Yes     | Fixed
124. Exercise
Dynamic Information Retrieval Modeling, SIGIR Tutorial, July 7th 2014
Grace Hui Yang, Marc Sloan, Jun Wang. Guest Speaker: Emine Yilmaz
125. Outline
• Introduction
• Theory and Models
• Session Search
• Reranking
• Guest Talk: Evaluation
126. TREC Session Tracks (2010-2012)
• Given a series of queries {q1, q2, …, qn}, the top-10 retrieval results {D1, …, D_{i−1}} for q1 to q_{i−1}, and click information
• The task is to retrieve a list of documents for the current/last query, qn
  • Relevance judgments are based on how relevant the documents are for qn, and how relevant they are for the information need of the entire session (in the topic description)
• No need to segment the sessions
127. TREC 2012 Session 6
1. pocono mountains pennsylvania
2. pocono mountains pennsylvania hotels
3. pocono mountains pennsylvania things to do
4. pocono mountains pennsylvania hotels
5. pocono mountains camelbeach
6. pocono mountains camelbeach hotel
7. pocono mountains chateau resort
8. pocono mountains chateau resort attractions
9. pocono mountains chateau resort getting to
10. chateau resort getting to
11. pocono mountains chateau resort directions
Information need: You are planning a winter vacation to the Pocono Mountains region in Pennsylvania in the US. Where will you stay? What will you do while there? How will you get there?
In a session, queries change constantly.
128. Query Change Is an Important Form of Feedback
• We define query change as the syntactic editing change between two adjacent queries:
  Δq_i = q_i − q_{i−1}
• It includes:
  • +Δq_i, the added terms
  • −Δq_i, the removed terms
• The unchanged/shared terms are called theme terms, q_theme
Example:
  q1 = "bollywood legislation", q2 = "bollywood law"
  Theme term = "bollywood"; added (+Δq) = "law"; removed (−Δq) = "legislation"
129. Where Do These Query Changes Come From?
• Given the TREC Session settings, we consider two sources of query change:
  • the previous search results that a user viewed/read/examined
  • the information need
• Example: Kurosawa → Kurosawa wife
  • "wife" is not in any previous result, but is in the topic description
• However, knowing the information need before the search is difficult to achieve
130. Previous Search Results Can Influence Query Change in Quite Complex Ways
• Merck lobbyists → Merck lobbying US policy
• D1 contains several mentions of "policy", such as: "A lobbyist who until 2004 worked as senior policy advisor to Canadian Prime Minister Stephen Harper was hired last month by Merck …"
• These mentions are about Canadian policies, while the user adds US policy in q2
• Our guess is that the user might be inspired by "policy", but prefers a sub-concept different from "Canadian policy"
• Therefore, among the added terms "US policy", "US" is the novel term, and "policy" is not, since it appeared in D1
• The two terms should be treated differently
131. Applying MDP to Session Search
• We propose to model session search as a Markov decision process (MDP)
• Two agents: the user and the search engine
• Environment: search results
• States: queries
• Actions:
  • User actions: add/remove/keep query terms
  • Search engine actions: increase/decrease/keep term weights
132. Search Engine Agent's Actions

Term type | In D_{i−1}? | Action    | Example
q_theme   | Y           | increase  | "pocono mountain" in s6
q_theme   | N           | increase  | "france world cup 98 reaction" in s28: france world cup 98 reaction stock market → france world cup 98 reaction
+Δq       | Y           | decrease  | "policy" in s37: Merck lobbyists → Merck lobbyists US policy
+Δq       | N           | increase  | "US" in s37: Merck lobbyists → Merck lobbyists US policy
−Δq       | Y           | decrease  | "reaction" in s28: france world cup 98 reaction → france world cup 98
−Δq       | N           | no change | "legislation" in s32: bollywood legislation → bollywood law
133. Query Change Retrieval Model (QCM)
• The Bellman equation gives the optimal value for an MDP:
  V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V*(s') ]
• The reward function is used as the document relevance scoring function, derived backwards from the Bellman equation:
  Score(q_i, d) = P(q_i|d) + γ Σ_a P(q_i | q_{i−1}, D_{i−1}, a) max_{D_{i−1}} P(q_{i−1}|D_{i−1})
  where P(q_i|d) is the current reward (relevance score), P(q_i | q_{i−1}, D_{i−1}, a) is the query transition model, and max_{D_{i−1}} P(q_{i−1}|D_{i−1}) is the maximum past relevance
134. Calculating the Transition Model
• Expanding the transition model according to the query change and the search engine actions gives the QCM scoring function:
  Score(q_i, d) = log P(q_i|d)                                           (current reward / relevance score)
    + α Σ_{t ∈ q_theme} [1 − P(t|d*_{i−1})] log P(t|d)                    (increase weights for theme terms)
    + ε Σ_{t ∈ +Δq, t ∉ d*_{i−1}} idf(t) log P(t|d)                       (increase weights for novel added terms)
    − β Σ_{t ∈ +Δq, t ∈ d*_{i−1}} P(t|d*_{i−1}) idf(t) log P(t|d)         (decrease weights for old added terms)
    − δ Σ_{t ∈ −Δq} P(t|d*_{i−1}) log P(t|d)                              (decrease weights for removed terms)
135. Maximizing the Reward Function
• Generate a maximum-rewarded document, denoted d*_{i−1}, from D_{i−1}
  • That is, the document(s) most relevant to q_{i−1}
• The relevance score can be calculated as:
  P(q_{i−1}|d_{i−1}) = 1 − ∏_{t ∈ q_{i−1}} { 1 − P(t|d_{i−1}) }
  P(t|d_{i−1}) = #(t, d_{i−1}) / |d_{i−1}|
• From several options, we choose to use only the top-relevance document:
  max_{D_{i−1}} P(q_{i−1}|D_{i−1})
136. Scoring the Entire Session
• The overall relevance score for a session of queries is aggregated recursively (a Python sketch follows):
  Score_session(q_n, d) = Score(q_n, d) + γ Score_session(q_{n−1}, d)
                        = Score(q_n, d) + γ [ Score(q_{n−1}, d) + γ Score_session(q_{n−2}, d) ]
                        = Σ_{i=1}^{n} γ^{n−i} Score(q_i, d)
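A tiny sketch of this recursive aggregation in its closed form; the per-query scores and the discount value below are illustrative.

def session_score(query_scores, gamma=0.92):
    """Aggregated QCM session score:
    Score_session(q_n, d) = sum_i gamma^(n-i) * Score(q_i, d),
    so earlier queries in the session are discounted more heavily."""
    n = len(query_scores)
    return sum((gamma ** (n - i)) * s for i, s in enumerate(query_scores, start=1))

# Per-query scores Score(q_i, d) for one document across a 3-query session
# (illustrative values).
print(session_score([1.2, 0.8, 2.1]))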
144. Relevance Feedback
• No UI changes
• Interactivity is hidden
• Private, performed in the browser
145. Relevance Feedback
Page 1:
• Diverse ranking
• Maximise learning potential
• Exploration vs exploitation
Page 2:
• Clickthroughs or explicit ratings
• Respond to feedback from page 1
• Personalized
154. Model – Bellman Equation
• Optimize s1 to improve E[V_2]; a simplified form of the two-page objective:
  V(R1, Σ1, 1) = max_{s1} [ λ s1 R1 · γ1 + max_{s2} (1 − λ) E( s2 R2 · γ2 | s1, clicks ) ]
  where s_t is the ranking on page t, R_t the relevance, γ_t the rank discount, and λ the page-1/page-2 trade-off
155. λ
• Balances exploration and exploitation on page 1
• Tuned for different queries:
  • Navigational
  • Informational
• λ = 1 for non-ambiguous search
156. Approximation
• Monte Carlo sampling:
  ≈ max_{s1} [ λ s1 R1 · γ1 + max_{s2} (1 − λ) (1/N) Σ_{n=1}^{N} s2 R2^(n) · γ2 ]
• Sequential ranking decision
157. Experiment Data
• Difficult to evaluate without access to live users
• Simulated using 3 TREC collections and relevance judgements:
  • WT10G – explicit ratings
  • TREC-8 – clickthroughs
  • Robust – difficult (ambiguous) search
158. User Simulation
• Rank M documents
• Simulate user clicks according to the relevance judgements
• Update the page-2 ranking
• Measure at pages 1 and 2:
  • Recall
  • Precision
  • nDCG
  • MRR
• BM25 – prior ranking model
165. Results
• Similar results across datasets and metrics
• 2nd-page gains outweigh 1st-page losses
• Outperformed Maximum Marginal Relevance, using MRR to measure diversity
• BM25-U is simply the no-exploration case
• Similar results when M = 5
174. Different Approaches to Evaluation
• Online evaluation
  • Design interactive experiments
  • Use users' actions to evaluate quality
  • Inherently dynamic in nature
• Offline evaluation
  • Controlled laboratory experiments
  • The user's interaction with the engine is only simulated
  • Recent work has focused on dynamic IR evaluation
175. Online Evaluation
• Standard click metrics
  • Clickthrough rate
  • Probability a user skips over results they have considered (pSkip)
• Most recently: result interleaving
[Diagram: two rankings are interleaved; user clicks/non-clicks evaluate the rankers]
176. What is result interleaving?
• A way to compare rankers online
• Given the two rankings produced by two methods, present a combination of the rankings to users
• Team Draft Interleaving (Radlinski et al., 2008): interleaving two rankings (sketched in Python below)
  • Input: two rankings ("can be seen as teams who pick players")
  • Repeat:
    • Toss a coin to see which team (ranking) picks next
    • The winner picks its best remaining player (document)
    • The loser picks its best remaining player (document)
  • Output: one ranking (2 teams of 5)
• Credit assignment: the ranking providing more of the clicked results wins
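A sketch of Team Draft Interleaving plus click-based credit assignment in Python; the function names and toy rankings are illustrative assumptions, not the authors' code.

import random

def team_draft_interleave(ranking_a, ranking_b, k=10):
    """Team Draft Interleaving (Radlinski et al., 2008): teams A and B take
    turns picking their best remaining document; a coin toss decides who
    picks first in each round. Returns the combined ranking and team labels."""
    interleaved, teams, used = [], [], set()
    while len(interleaved) < k and (set(ranking_a) | set(ranking_b)) - used:
        first = random.choice(["A", "B"])
        for team in (first, "B" if first == "A" else "A"):
            pool = ranking_a if team == "A" else ranking_b
            pick = next((d for d in pool if d not in used), None)
            if pick is not None:
                interleaved.append(pick)
                teams.append(team)
                used.add(pick)
    return interleaved, teams

def credit(teams, clicked_ranks):
    """The ranker contributing more of the clicked results wins."""
    a = sum(1 for r in clicked_ranks if teams[r] == "A")
    b = sum(1 for r in clicked_ranks if teams[r] == "B")
    return "A" if a > b else "B" if b > a else "tie"

ranking, teams = team_draft_interleave(["d1", "d2", "d3"], ["d3", "d4", "d5"])
print(ranking, credit(teams, [0]))   # the user clicked the top result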
177. Team Draft Interleaving – Example
Ranking A:
1. Napa Valley – The authority for lodging... (www.napavalley.com)
2. Napa Valley Wineries - Plan your wine... (www.napavalley.com/wineries)
3. Napa Valley College (www.napavalley.edu/homex.asp)
4. Been There | Tips | Napa Valley (www.ivebeenthere.co.uk/tips/16681)
5. Napa Valley Wineries and Wine (www.napavintners.com)
6. Napa Country, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
Ranking B:
1. Napa Country, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
2. Napa Valley – The authority for lodging... (www.napavalley.com)
3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...)
4. Napa Valley Hotels – Bed and Breakfast... (www.napalinks.com)
5. NapaValley.org (www.napavalley.org)
6. The Napa Valley Marathon (www.napavalleymarathon.org)
Presented ranking (interleaving A and B):
1. Napa Valley – The authority for lodging... (www.napavalley.com)
2. Napa Country, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...)
4. Napa Valley Wineries – Plan your wine... (www.napavalley.com/wineries)
5. Napa Valley Hotels – Bed and Breakfast... (www.napalinks.com)
6. Napa Valley College (www.napavalley.edu/homex.asp)
7. NapaValley.org (www.napavalley.org)
178–179. Team Draft Interleaving – Example
[Slides 178 and 179 repeat the rankings of slide 177; the user's clicks fall on results contributed by Ranking B, so B wins. Repeat over many different queries!]
180. Offline Evaluation
• Controlled laboratory experiments
• The user's interaction with the engine is only simulated
  • Ask experts to judge each query result
  • Predict how users behave when they search
  • Aggregate the judgments to evaluate
181. Offline Evaluation
• Until recently, metrics assumed that the user's information need was not affected by the documents read
  • e.g. Average Precision, NDCG, …
• Users are more likely to stop searching when they see a highly relevant document
• Lately: metrics that incorporate the effect of the relevance of documents seen by the user on user behavior
  • Based on devising more realistic user models
  • EBU, ERR [Yilmaz et al. CIKM '10; Chapelle et al. CIKM '09]
182. Modeling User Behavior – Cascade-Based Models
[Screenshot: results for the query "black powder ammunition", ranks 1–10]
• The user views search results from top to bottom
• At each rank i, the user has a certain probability of being satisfied
• The probability of satisfaction is proportional to the relevance grade of the document at rank i
• Once the user is satisfied with a document, they terminate the search
185. Expected Reciprocal Rank [Chapelle et al. CIKM '09]
[Flowchart: for the query "black powder ammunition", the user views an item and asks "Relevant?"; if highly relevant, stop; if somewhat or not relevant, view the next item]
186. Expected Reciprocal Rank [Chapelle et al. CIKM '09]
• φ(r): the utility of finding "the perfect" document at rank r, with φ(r) = 1/r:
  ERR = Σ_{r=1}^{n} (1/r) P(user stops at position r)
      = Σ_{r=1}^{n} (1/r) R_r ∏_{i=1}^{r−1} (1 − R_i)
• R_i models both the probability of relevance of document i and the probability of stopping at document i:
  R_i = (2^{g_i} − 1) / 2^{g_max}
  where g_i is the relevance grade of the i-th document
189. Measuring "Goodness"
• The user steps down a ranked list of documents and observes each one of them until a decision point, and either
  a) abandons the search, or
  b) reformulates
• While stepping down or sideways, the user accumulates utility
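Before moving to session-level measures, a direct computation of the ERR formula from slide 186 under the cascade model; the grade scale and example grades are illustrative.

def err(grades, g_max=4):
    """Expected Reciprocal Rank (Chapelle et al. '09).
    R_i = (2^g_i - 1) / 2^g_max is the stop probability at document i;
    ERR = sum_r (1/r) * R_r * prod_{i<r} (1 - R_i)."""
    p_continue = 1.0
    score = 0.0
    for r, g in enumerate(grades, start=1):
        stop = (2 ** g - 1) / (2 ** g_max)
        score += p_continue * stop / r
        p_continue *= 1 - stop
    return score

# Relevance grades of the top-4 results (illustrative, 0..4 scale).
print(err([4, 0, 2, 1]))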
190. Evaluation over a single ranked list
[Diagram: a session of reformulated queries, "kenya cooking traditional swahili", "kenya cooking traditional", "kenya swahili traditional food recipes", each with its own ranked list of results]
192. Session DCG [Järvelin et al. ECIR 2008]
• Each ranked list in the session (e.g. "kenya cooking traditional swahili", "kenya cooking traditional") contributes a DCG, discounted by its position in the reformulation sequence (sketched in Python below):
  DCG(RL) = Σ_{r=1}^{k} (2^{rel(r)} − 1) / log_b(r + b − 1)
  sDCG = (1 / log_c(1 + c − 1)) DCG(RL1) + (1 / log_c(2 + c − 1)) DCG(RL2) + …
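A small sDCG sketch with the commonly used bases b = 2 and c = 4 (the discount bases are illustrative choices); a session is a list of per-query relevance-grade lists.

import math

def dcg(rels, b=2):
    """DCG(RL) = sum_r (2^rel(r) - 1) / log_b(r + b - 1)."""
    return sum((2 ** rel - 1) / math.log(r + b - 1, b)
               for r, rel in enumerate(rels, start=1))

def sdcg(session, b=2, c=4):
    """Session DCG (Järvelin et al. ECIR 2008): the DCG of the q-th ranked
    list in a session is discounted by 1 / log_c(q + c - 1)."""
    return sum(dcg(rels, b) / math.log(q + c - 1, c)
               for q, rels in enumerate(session, start=1))

# Two reformulations: relevance grades of the top-3 results of each list
# (illustrative values).
print(sdcg([[2, 1, 0], [3, 0, 1]]))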
194. Probability of a path
[Diagram: a session with queries Q1, Q2, Q3 and per-rank relevance labels (N/R)]
• Probability of a path = probability of abandoning at reformulation 2 (1) × probability of reformulating at rank 3 (2)
195. Expected Global Utility [Yang and Lad ICTIR 2009]
1. The user steps down the ranked results one by one
2. Stops browsing documents based on a stochastic process that defines a stopping-probability distribution over ranks, and reformulates
3. Gains something from relevant documents, accumulating utility
196. [Diagram: a session with queries Q1, Q2, Q3 and per-rank relevance labels (N/R)]
• Probability of abandoning the session at reformulation i (1): geometric with parameter p_reform
197. [Diagram: the same session]
• Probability of reformulating at rank j (2): geometric with parameter p_down
198. Expected Global Utility [Yang and Lad ICTIR 2009]
• The probability of a user following a path σ:
  P(σ) = P(r1, r2, ..., rK), where r_i is the stopping and reformulation point in list i
• Assumption: stopping positions in each list are independent:
  P(r1, r2, ..., rK) = P(r1) P(r2) ... P(rK)
• Use a geometric distribution (as in RBP) to model the stopping and reformulation behaviour:
  P(r_i = r) = (1 − p) p^{r−1}
199. Conclusions
• Recent focus on evaluating the dynamic nature of the search process:
  • Interleaving
  • New offline evaluation metrics: ERR, EBU
  • Session evaluation metrics
200. Outline
• Introduction
• Theory and Models
• Session Search
• Reranking
• Guest Talk: Evaluation
• Conclusion
201. Conclusions
• Dynamic IR describes a new class of interactive model
  • It incorporates rich feedback and temporal dependency, and is goal oriented
• The family of Markov models and multi-armed bandit theory are useful in building DIR models
• Applicable to a range of IR problems
• Useful in applications such as session search and evaluation
202. Dynamic IR Book
• Published by Morgan & Claypool
• "Synthesis Lectures on Information Concepts, Retrieval, and Services"
• Due March/April 2015 (in time for SIGIR 2015)
203. Acknowledgment
• We thank Dr. Emine Yilmaz for giving the guest talk.
• We sincerely thank Dr. Xuchu Dong for his help in preparing the tutorial.
• We also thank the following colleagues for their comments and suggestions:
  • Dr. Jamie Callan
  • Dr. Ophir Frieder
  • Dr. Fernando Diaz
  • Dr. Filip Radlinski
206. References
Static IR
• Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999.
• The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. 1999.
• Implicit User Modeling for Personalized Search. Xuehua Shen et al. CIKM, 2005.
• A Short Introduction to Learning to Rank. Hang Li. IEICE Transactions 94-D(10): 1854-1862, 2011.
207. References
Interactive IR
• Relevance Feedback in Information Retrieval. Rocchio, J. J. The SMART Retrieval System (pp. 313-23), 1971.
• A study in interface support mechanisms for interactive information retrieval. Ryen W. White et al. JASIST, 2006.
• Visualizing stages during an exploratory search session. Bill Kules et al. HCIR, 2011.
• Dynamic Ranked Retrieval. Cristina Brandt et al. WSDM, 2011.
• Structured Learning of Two-level Dynamic Rankings. Karthik Raman et al. CIKM, 2011.
208. References
Dynamic IR
• A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. SIGIR '99, pages 214-221.
• Threshold setting and performance optimization in adaptive filtering. Stephen Robertson. JIR, 2002.
• A large-scale study of the evolution of web pages. Dennis Fetterly et al. WWW, 2003.
• Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008.
• Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem. Yisong Yue et al. ICML, 2009.
• Meme-tracking and the dynamics of the news cycle. Jure Leskovec et al. KDD, 2009.
209. References
Dynamic IR
• Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS, 2009.
• A Novel Click Model and Its Applications to Online Advertising. Zeyuan Allen Zhu et al. WSDM, 2010.
• A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010.
• Inferring search behaviors using partially observable Markov model with duration (POMD). Yin He et al. WSDM, 2011.
• No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search. Jeff Huang et al. CHI, 2011.
• Balancing Exploration and Exploitation in Learning to Rank Online. Katja Hofmann et al. ECIR, 2011.
• Large-Scale Validation and Analysis of Interleaved Search Evaluation. Olivier Chapelle et al. TOIS, 2012.
210. References
Dynamic IR
• Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. WWW '12, pages 11-20.
• Sequential selection of correlated ads by POMDPs. Shuai Yuan et al. CIKM, 2012.
• Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. SIGIR '13, pages 453-462.
• Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. SIGIR, 2013.
• Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. WWW '13.
• Interactive Collaborative Filtering. X. Zhao, W. Zhang, J. Wang. CIKM, 2013, pages 1411-1420.
• Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. SIGIR '14.
211. References
Markov Processes
• A Markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679-684, 1957.
• Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.
• Dynamic Programming and Markov Processes. R. A. Howard. MIT Press, 1960.
• Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960.
• Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Baum, Leonard E.; Petrie, Ted. The Annals of Mathematical Statistics 37, 1966.
212. References
Markov Processes
• Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3, 1988.
• Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39: 162-175, 1991.
• Q-Learning. Christopher J. C. H. Watkins, Peter Dayan. Machine Learning, 1992.
• Reinforcement learning with replacing eligibility traces. Singh, S. P. & Sutton, R. S. Machine Learning, 22, pages 123-158, 1996.
• Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.
• Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99-134, 1998.
213. References
Markov Processes
• Finding approximate POMDP solutions through belief compression. N. Roy. PhD Thesis, Carnegie Mellon, 2003.
• VDCBPI: an approximate scalable algorithm for large scale POMDPs. P. Poupart and C. Boutilier. NIPS 2004, pages 1081-1088.
• Finding Approximate POMDP Solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40, 2005.
• Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. MIT Press, 2005.
• Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Volume 27, pages 335-380, 2006.
214. References
Markov Processes
• The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood, E. J. Sondik. Operations Research, 1973.
• Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and M. C. Shin. Management Science 24, 1978.
• An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591-600, 2006.
• Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media, 2011.
• Finite-Time Regret Bounds for the Multiarmed Bandit Problem. Nicolò Cesa-Bianchi, Paul Fischer. ICML, pages 100-108, 1998.
• Multi-armed bandit allocation indices. J. C. Gittins. Wiley, 1989.
• Finite-time Analysis of the Multiarmed Bandit Problem. Peter Auer et al. Machine Learning 47, Issue 2-3, 2002.