Traditionally, machine learning based approaches to information retrieval have taken the form of supervised learning-to-rank models. Recently, other machine learning approaches—such as adversarial learning and reinforcement learning—have started to find interesting applications in retrieval systems. At Bing, we have been exploring some of these methods in the context of web search. In this talk, I will share two of our recent papers in this area that we presented at SIGIR 2018.
Adversarial and reinforcement learning approaches to optimize information retrieval
1. ADVERSARIAL AND REINFORCEMENT LEARNING BASED APPROACHES TO INFORMATION RETRIEVAL
Bhaskar Mitra
Principal Applied Scientist, Microsoft AI & Research
Joint work with Daniel Cohen, Katja Hofmann, W. Bruce Croft,
Corby Rosset, Damien Jose, Gargi Ghosh, and Saurabh Tiwary
SIGIR 2018 | Ann Arbor, Michigan
2. Today’s topics: two SIGIR 2018 short papers
Awarded SIGIR 2018 Best Short Paper
https://arxiv.org/abs/1805.03403 https://arxiv.org/abs/1804.04410
3. Cross Domain Regularization for Neural Ranking Models Using Adversarial Learning
Daniel Cohen, Bhaskar Mitra, Katja Hofmann, W. Bruce Croft
https://arxiv.org/abs/1805.03403
4. Clever Hans was a horse claimed to have been
capable of performing arithmetic and other
intellectual tasks.
"If the eighth day of the month comes on a
Tuesday, what is the date of the following Friday?“
Hans would answer by tapping his hoof.
In fact, the horse was purported to have been
responding directly to involuntary cues in the
body language of the human trainer, who had the
faculties to solve each problem. The trainer was
entirely unaware that he was providing such cues.
(source: Wikipedia)
5. Duet model for document ranking (2017)
Latent representation learning models (e.g., duet and DSSM) "memorize" relationships between terms and entities
6. [Figure: results for the query "uk prime minister" on today's web, on recent collections, and on older (1990s) TREC data, illustrating that learned term associations differ across collections]
7. Cross domain performance is an important requirement in many IR scenarios, e.g.,
1. Bing (across markets)
2. Enterprise search (across tenants)
8. What corpus statistics do they depend on?
BM25: the inverse document frequency of terms
Duet: embeddings containing noisy co-occurrence information
10. The distributed sub-model of duet
Projects query and document to latent space for matching
Additional fully-connected layers to estimate relevance
Hidden layers may encode domain specific statistics
[Architecture diagram: query and document each pass through convolution and pooling layers, are combined via a hadamard product, and dense layers estimate the relevance 𝑦]
How do we encourage the model to only learn features that generalize across multiple domains?
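As a concrete reference for the architecture just described, here is a minimal PyTorch sketch of the distributed sub-model's shape. The layer sizes, the n-graph input dimension, and the activation choices are illustrative assumptions, not the values from the duet paper:

```python
import torch
import torch.nn as nn

class DuetDistributedSketch(nn.Module):
    """Illustrative shape of the duet distributed sub-model:
    convolution + pooling over query and document, a hadamard
    product for matching, then dense layers that estimate y."""

    def __init__(self, num_ngraphs=2000, hidden=300):
        super().__init__()
        # Separate convolutional encoders for query and document.
        self.query_conv = nn.Sequential(
            nn.Conv1d(num_ngraphs, hidden, kernel_size=3), nn.Tanh(),
            nn.AdaptiveMaxPool1d(1),  # pool to a single latent vector
        )
        self.doc_conv = nn.Sequential(
            nn.Conv1d(num_ngraphs, hidden, kernel_size=3), nn.Tanh(),
            nn.AdaptiveMaxPool1d(1),
        )
        # Fully-connected layers on top of the elementwise match.
        self.dense = nn.Sequential(
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, query, doc):
        q = self.query_conv(query).squeeze(-1)  # (batch, hidden)
        d = self.doc_conv(doc).squeeze(-1)      # (batch, hidden)
        h = q * d                               # hadamard product
        return self.dense(h), h                 # score y and hidden state
```

Returning the hidden state alongside the score is a deliberate choice here: the next slides attach an adversarial discriminator to exactly that hidden state.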
11. The distributed sub-model of duet
Train the model on multiple domains
During training, an adversarial discriminator inspects the hidden states of the model and tries to predict the source corpus of the training sample
[Architecture diagram: as on the previous slide, with an adversarial discriminator (dense layers) attached to the hidden state, emitting a domain prediction 𝑧 alongside the relevance 𝑦]
The duet model, in addition to optimizing for the ranking loss, also tries to "fool" the adversarial discriminator, and in the process learns more domain independent representations
13. Additional regularization for the ranking loss
ℒ(q, d⁺, d⁻) = ℒ_rank(q, d⁺, d⁻; θ_rank) + λ · ℒ_adv(θ_adv)
where q is the query, d⁺ the relevant document, d⁻ the non-relevant document, θ_rank the parameters of the ranking model, and θ_adv the parameters of the adversarial discriminator
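One way to realize this combined objective in PyTorch, following the DANN-style formulation sketched above; λ and the hinge margin are illustrative choices, and `model` and `discriminator` are assumed to follow the shapes from the earlier sketch:

```python
import torch.nn.functional as F

def adversarially_regularized_loss(model, discriminator, query,
                                   d_pos, d_neg, domain_label, lam=0.1):
    """Sketch of the combined objective: a pairwise hinge ranking loss,
    plus the adversarial discriminator's domain-prediction loss on the
    model's hidden states. lam and the margin 1.0 are illustrative."""
    score_pos, hidden_pos = model(query, d_pos)
    score_neg, hidden_neg = model(query, d_neg)

    # Standard pairwise hinge loss for ranking.
    rank_loss = F.relu(1.0 - score_pos + score_neg).mean()

    # The discriminator tries to predict the source corpus of the
    # sample from the hidden state. (In practice the hidden state
    # passes through a gradient reversal layer first; next slide.)
    domain_logits = discriminator(hidden_pos)
    adv_loss = F.cross_entropy(domain_logits, domain_label)

    # With gradient reversal, minimizing this total loss trains the
    # discriminator while pushing the ranker to "fool" it.
    return rank_loss + lam * adv_loss
```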
15. Gradient reversal
Reverse the gradient from the discriminator when back-propagating through the ranking model
[Architecture diagram: as on the previous slides; the gradient from the adversarial discriminator 𝑧 is reversed before it flows back into the ranking model that estimates 𝑦]
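Gradient reversal has a compact, standard construction as a PyTorch autograd function; a minimal sketch (the surrounding names are illustrative):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies the gradient by -1 on
    the backward pass, so the ranking model is updated to *increase*
    the discriminator's loss while the discriminator itself is
    trained normally."""

    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

# Usage: insert between the ranker's hidden state and the discriminator.
# domain_logits = discriminator(GradReverse.apply(hidden))
```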
20. Optimizing Query Evaluations using Reinforcement Learning for Web Search
Corby Rosset, Damien Jose, Gargi Ghosh, Bhaskar Mitra,
and Saurabh Tiwary
https://arxiv.org/abs/1804.04410
21. Large scale IR systems trade off search result quality against query response time
In Bing, we have a candidate generation stage followed by multiple rank and prune stages
Typically, we apply machine learning in the re-ranking stages
In this work, we explore reinforcement learning for effective and efficient candidate generation
22. In Bing, the index is distributed over multiple machines
For candidate generation, on each machine the documents are linearly scanned using a match plan
23. When a query comes in, it is automatically
categorized and a pre-defined match plan is
selected
A match plan consists of a sequence of
match rules, and corresponding stopping
criteria
A match rule defines the condition that
a document should satisfy to be selected as
a candidate
The stopping criteria decide when the index scan using a particular match rule should terminate, and whether the matching process should continue with the next match rule, conclude, or reset to the beginning of the index
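To make this structure concrete, here is one way the pieces could be represented in Python; the class and field names are hypothetical, not Bing's internal API:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MatchRule:
    """Condition a document must satisfy to be selected as a candidate,
    e.g. 'all query terms appear in the title or URL metastreams'."""
    condition: Callable[[dict, list], bool]  # (document, query_terms) -> bool

@dataclass
class StoppingCriteria:
    """Thresholds on the accumulators; reaching either one ends the
    scan under the current match rule (see slide 26)."""
    max_blocks_accessed: int  # threshold on u
    max_term_matches: int     # threshold on v

@dataclass
class MatchPlan:
    """A sequence of match rules with corresponding stopping criteria."""
    rules: List[MatchRule]
    criteria: List[StoppingCriteria]
```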
24. Match plans influence the
trade-off between effectiveness
and efficiency
E.g., long queries with rare
intents may require expensive
match plans that consider body
text and search deeper into the
index
In contrast, for popular
navigational queries a shallow
scan against URL and title
metastreams may be sufficient
26. During execution, two accumulators are tracked
u: the number of blocks accessed from disk
v: the cumulative number of term matches in all inspected documents
A stopping criterion sets thresholds for each; when either threshold is met, the scan using that particular match rule terminates
Matching may then continue with a new match rule, terminate, or restart from the beginning of the index
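A minimal sketch of executing a single match rule under its stopping criteria, using the hypothetical structures sketched earlier; `count_term_matches` is an assumed helper, defined trivially here:

```python
def count_term_matches(doc, query_terms):
    """Hypothetical helper: number of query term occurrences in the
    document's indexed text."""
    return sum(doc.get("text", "").count(t) for t in query_terms)

def run_match_rule(index_blocks, query_terms, rule, stop):
    """Linearly scan the index under one match rule, tracking the two
    accumulators u (blocks accessed) and v (cumulative term matches).
    The scan ends as soon as either threshold is met."""
    candidates, u, v = [], 0, 0
    for block in index_blocks:
        u += 1  # one more block accessed from disk
        for doc in block:
            v += count_term_matches(doc, query_terms)
            if rule.condition(doc, query_terms):
                candidates.append(doc)
        if u >= stop.max_blocks_accessed or v >= stop.max_term_matches:
            break  # stopping criteria met for this match rule
    return candidates, u, v
```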
27. Typically these match plans are hand-crafted and
statically assigned to different query categories
In this work, we cast match planning as a
reinforcement learning task
30. Reinforcement learning (for Bing candidate generation)
Learn a policy π_θ : S → A which maximizes the cumulative discounted reward R = Σ_t γ^t · r_t, where γ is the discount rate
[Diagram: the agent observes the accumulators (u, v) and selects a match rule to run against the index; the reward is the relevance of the selected documents discounted by the index blocks accessed]
31. Reinforcement learning (for Bing candidate generation)
We use table-based Q-learning
State space: discrete ⟨u_t, v_t⟩
Action space: the choice of which match rule to execute next
[Diagram: as on the previous slide]
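A sketch of the table-based Q-learning machinery this implies; the bin widths, learning rate, discount rate, and epsilon are illustrative hyper-parameters, not the paper's settings:

```python
import random
from collections import defaultdict

def discretize(u, v, u_bin=100, v_bin=1000):
    """Map the raw accumulators to a discrete state <u_t, v_t>.
    Bin widths here are illustrative."""
    return (u // u_bin, v // v_bin)

# Q-table over discretized accumulator states; actions index match rules.
Q = defaultdict(float)

def q_update(state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next
                                   - Q[(state, action)])

def epsilon_greedy(state, actions, eps=0.1):
    """Pick a match rule: explore with probability eps, else exploit."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```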
32. Reinforcement learning (for Bing candidate generation)
Reward function: g(d_i) is the relevance of the i-th document, estimated from the subsequent L1 ranker's score and considering only the top n documents; the per-step reward discounts this relevance by the number of index blocks accessed
[Diagram: as on the previous slide]
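The exact reward formula is not reproduced on the slide; the following sketch captures only the described shape (top-n relevance from the L1 ranker, discounted by blocks accessed) and should be read as an assumption about the form, not the paper's definition:

```python
def step_reward(new_docs, l1_score, blocks_accessed, n=10):
    """Illustrative per-step reward: summed relevance g(d_i) of the
    top-n newly selected documents (estimated via the downstream L1
    ranker score), discounted by the index blocks accessed. The exact
    discounting used in the paper may differ."""
    g = sorted((l1_score(d) for d in new_docs), reverse=True)[:n]
    return sum(g) / (1.0 + blocks_accessed)
```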
33. Reinforcement learning (for Bing candidate generation)
Final reward: if no new documents are selected, we assign a small negative reward
[Diagram: as on the previous slide]
35. Conclusions
Traditionally, ML models consume more time and resources to improve the quality of retrieved results
In this work, we argue that ML based approaches can also help improve response time
Milliseconds saved can translate into material cost savings in the query serving infrastructure, or can be re-purposed by upstream systems to provide a better end-user experience
36. THANK YOU!
Blog post: https://www.microsoft.com/en-us/research/blog/adversarial-and-reinforcement-learning-based-approaches-to-information-retrieval/
Editor's Notes
Clever Hans was a horse. It was claimed that he could do simple arithmetic. If you asked Hans a question he would respond by tapping his hoof. After a thorough investigation, it was, however, determined that what Clever Hans was really good at was reading very subtle and, in fact, unintentional cues that his trainer was giving him via his body language. Hans didn't know arithmetic at all. But he was very good at spotting body language that CORRELATED highly with the right answer.
We have just spoken about how latent matching models "sort of" memorize term relatedness or co-occurrences from the training data. So if you train such a model on, say, a recent news collection it may learn that the phrase "uk prime minister" is related to Theresa May. Now if you evaluate the same model on older TREC collections where a more meaningful association would have been with John Major, then your model performance may degrade.
This is problematic because what this means is that your model is “overfitting” to the distributions of your training data which may evolve over time or differ across collections. Phrasing it differently, your deep neural model has just very cleverly—like Hans the horse—learnt to depend on interesting correlations that do not generalize and may have ignored the more useful signals for actually modeling relevance.
This is an important problem. Think about an enterprise search solution that needs to cater to a large number of tenants. You train your model on only a few tenants—either because of privacy constraints or because most tenants are too small and you don’t have enough training data for the others. But afterwards you need to deploy the same model to all the tenants. Good cross domain performance would be key in such a setting.
How can we make these deep and large machine learning models—with all their lovely bells and whistles—as robust as a simple BM25 baseline?
A traditional IR model, such as BM25, makes very few assumptions about the target collection. You can argue that the inverse document frequencies (and a couple of the BM25 hyper-parameters) are all that you would learn from your collection. Which is why you can throw BM25 at most retrieval tasks (e.g., TREC or Web ranking in Bing) and it will give you pretty reasonable performance in most cases out-of-the-box. On the other hand, take a deep neural model and train it on the Bing Web ranking task and then evaluate it on TREC data, and I bet it falls flat on its face.
But the risk of memorizing correlations isn't limited to inferior performance. It also has strong ethical implications. Many of the real world collections we train on are naturally biased and encode a lot of our own unfortunate stereotypes. Here's an interesting paper from some of my colleagues at MSR pointing out how word embeddings may encode gender biases when trained on public collections such as the Google News dataset.