What is the impact of a recommender system? In a typical three-way interaction between users, items, and the platform, a recommender system can have differing impacts on the three stakeholders, and there can be multiple metrics based on utility, diversity, and fairness. One way to measure impact is through randomized A/B tests, but experiments are costly and can only be applied to short-term outcomes. This talk describes a unifying framework based on causality that can be used to answer such questions. Using the example of a recommender system's effect on increasing sales for a platform, I will discuss the four steps that form the basis of a causal analysis: modeling the causal mechanism, identifying the correct estimand, estimation, and finally checking the robustness of the obtained estimates. Utilizing independence assumptions common in click-log data, this process led to a new method for estimating the impact of recommendations, called the split-door causal criterion. In the latter half of the talk, I will show how the four steps can be used to address other questions about a recommender system, such as selection bias, missing data, and fairness.
1. Causal Inference in Recommender Systems
Amit Sharma
Senior Researcher, Microsoft Research India
@amt_shrma
http://www.amitsharma.in/
Invited Talk: REVEAL Workshop @ACM RecSys 2020
3. How to evaluate a recommender system?
Accuracy
• Is the predicted rating similar to a user's rating?
• Does the user click on a recommendation?
Coverage
• Does the system exclude certain items from recommendation?
Diversity
• Does the system recommend items different from each other?
Insufficient for the questions we want to answer:
• Does the recommender system increase revenue?
• Does it shape what people buy or consume?
• Does it create "echo chambers" or make people more polarized?
4. Simple example: The "Harry Potter" Problem
Suppose a recommender always recommends the next book by the same author.
High accuracy and high coverage system. Diversity can also be high if the user reads diverse genres of books.
(Book covers: Harry Potter 2 by J.K. Rowling; The Road by Cormac McCarthy.)
5. A causal view of a recommender system
Key question: What would be the outcome metric in a world without the recommendation algorithm?
Recommendation Algorithm → a Policy or Intervention: P(Rec | UserContext)
Evaluating the algorithm → the Causal effect of the intervention: E[Outcome | do(Rec)]
Causal Impact of Recommender = E[Outcome | do(Rec = 1)] − E[Outcome | do(Rec = 0)]
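To make the distinction concrete, here is a minimal, hypothetical Python simulation (all numbers invented for illustration): unobserved demand drives both recommendation exposure and purchases, so the observed difference in purchase rates overstates E[Outcome | do(Rec = 1)] − E[Outcome | do(Rec = 0)].

```python
import random

random.seed(0)

# Hypothetical simulation: unobserved demand drives both recommendation
# exposure and purchases, so the observed lift overstates the causal effect.
def simulate(n=100_000):
    obs_treated, obs_control, causal = [], [], []
    for _ in range(n):
        demand = random.random()                      # unobserved interest
        rec = random.random() < 0.2 + 0.6 * demand    # high demand -> more recs
        def buy(r):
            return random.random() < 0.1 * demand + (0.05 if r else 0.0)
        if rec:
            obs_treated.append(buy(True))
        else:
            obs_control.append(buy(False))
        # E[Outcome | do(Rec=1)] - E[Outcome | do(Rec=0)] for the same user
        causal.append(buy(True) - buy(False))
    observed = sum(obs_treated) / len(obs_treated) - sum(obs_control) / len(obs_control)
    causal_effect = sum(causal) / len(causal)
    return observed, causal_effect

observed, causal = simulate()
print(observed, causal)  # the observed lift exceeds the true effect (0.05 by construction)
```

Because high-demand users are both more likely to see a recommendation and more likely to buy anyway, the observational comparison is biased upward relative to the interventional contrast.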
6. Comparing to a counterfactual world provides new, causal metrics
Serendipity: "Recommendation helps the user find a surprisingly interesting item they might not have otherwise discovered" ---Herlocker et al. 2004 (TOIS)
But so far we have lacked the tools to measure such metrics.
(Figure: classical metrics such as Accuracy, Coverage, and Diversity, alongside causal metrics such as Increase in Clicks or Revenue, Fairness by Parity, and Fairness by Equal Opportunity.)
7. Today's talk: How to estimate causal metrics for a recommender system?
1. Case study: Estimate the impact of Amazon's recommendation engine.
Describe the four steps of causal analysis:
1. Model causal mechanisms in a system.
2. Identify the correct metric to estimate.
3. Estimate the metric.
4. Check robustness of the estimate to unobserved confounding.
2. New, causal metrics: How a causal inference view enables us to ask new questions about a recommender system.
DoWhy: A Python library for causal inference that implements the four steps. https://github.com/microsoft/dowhy
8. Causal Impact: How many additional views does a
recommender system bring?
(Figure: Accuracy and its causal counterpart, Increase in Clicks or Revenue.)
10. Step 1: Modeling the causal mechanism and
identifying the confounding factors
(Causal graph: Demand for The Road → Visits to The Road → Rec. visits to No Country for Old Men ← Demand for No Country for Old Men.)
11. Observed activity is almost surely an
overestimate of the causal effect
(Figure: all page visits observed from the recommender split into causal and convenience clicks; the activity without the recommender is unknown.)
12. Step 2: Identification--Is there a way to
recover the causal effect from observed data?
Naïve estimate: E[Y/X].
To remove convenience clicks, we need a proxy for unobserved demand.
"Backdoor criterion": E[wY/X], where the weight w = 1/P(X = 1 | UserContext) captures the demand of the user (inverse propensity weighting).
But this method depends on accurately capturing the unknown user context.
(Causal graph: Demand for Product → Visits to Product (X) → Visits to Recommended product (Y) ← Demand for Recommended product.)
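A minimal sketch of inverse propensity weighting on simulated data, under the strong simplifying assumption that the user context reduces to a single observed "demand" score (all numbers invented):

```python
import random

random.seed(1)

# Minimal IPW sketch: the user context reduces to one observed "demand"
# score, which drives both visits to X and clicks on the recommended Y.
n = 200_000
data = []
for _ in range(n):
    demand = random.random()
    p = 0.2 + 0.6 * demand                 # propensity P(X = 1 | UserContext)
    x = random.random() < p
    # Clicks on the recommended product are more likely for high-demand users
    y = x and (random.random() < 0.3 + 0.4 * demand)
    data.append((p, x, y))

naive = sum(y for _, _, y in data) / sum(x for _, x, _ in data)  # E[Y | X = 1]
ipw = sum(y / p for p, x, y in data if x) / n                    # weighted, w = 1/p

print(naive, ipw)  # the naive estimate exceeds the IPW estimate
```

Weighting each visit by 1/p down-weights the over-represented high-demand users, recovering the confounder-free click rate; the naive ratio stays biased upward.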
13. Finding a demand proxy using natural experiments:
Split outcome into recommender (primary) and direct visits
(Diagram: all visits to a recommended product split into recommender visits and direct visits; direct visits comprise search visits and direct browsing. Direct visits form the auxiliary outcome: a proxy for unobserved demand for the recommended product.)
(Causal graph: Demand for Product → Visits to Product (X) → Rec. visits to Y (YR); Demand for Recommended product → Rec. visits to Y (YR) and Direct visits to Y (YD).)
14. Example: Product X's visits change but the direct visits to the recommended product Y are constant (Accept).
16. Leads to the "split-door" criterion
Criterion: Observed visits through a recommended link are causal only when X ⊥ YD.
(Causal graph: Demand for focal product (UX) → Visits to focal product (X) → Rec. visits (YR); Demand for rec. product (UY) → Rec. visits (YR) and Direct visits (YD).)
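A toy illustration of the split-door check: accept an (X, YD) pair of time series only if they look independent. A plain correlation threshold stands in here for the paper's formal independence test; the function names and numbers are invented for this sketch.

```python
import random
import statistics

random.seed(2)

# Toy split-door check: a simple correlation threshold stands in for the
# paper's formal independence test between X and the direct visits YD.
def pearson_r(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / ((len(xs) - 1) * sx * sy)

def split_door_accepts(x_visits, direct_visits, threshold=0.2):
    return abs(pearson_r(x_visits, direct_visits)) < threshold

T = 500
# Pair 1: X's traffic spikes while direct visits to Y stay flat -> accept
x1 = [random.gauss(100, 10) + (50 if t > T // 2 else 0) for t in range(T)]
yd1 = [random.gauss(30, 3) for _ in range(T)]

# Pair 2: a common demand shock drives both series -> reject
shock = [50 if t > T // 2 else 0 for t in range(T)]
x2 = [random.gauss(100, 10) + s for s in shock]
yd2 = [random.gauss(30, 3) + s for s in shock]

print(split_door_accepts(x1, yd1), split_door_accepts(x2, yd2))
```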
17. More formally, the criterion is based on do-calculus over the causal graph
(Causal graph: Unobserved variables (UX) → Cause (X) → Outcome (YR); Unobserved variables (UY) → Outcome (YR) and Auxiliary Outcome (YD).)
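The identification argument can be sketched as follows (my paraphrase, using the node names UX, X, YR, YD, UY; see the split-door paper for the formal statement):

```latex
% Graph: U_X \to X \to Y_R, \qquad U_Y \to Y_R, \qquad U_Y \to Y_D.
% Since Y_D is a child of U_Y, observing X \perp Y_D suggests that
% X \perp U_Y over the selected period, so U_Y does not confound the
% X \to Y_R edge and the observed effect equals the causal effect:
\[
  X \perp Y_D \;\Longrightarrow\; X \perp U_Y
  \;\Longrightarrow\; P(Y_R \mid \mathrm{do}(X)) = P(Y_R \mid X).
\]
```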
18. Step 3: Estimation with Amazon.com logs
from the Bing toolbar
Of these, 20K products have at least 10 visits on any one day.
20. Estimate the metric over valid split-door pairs
of products
Using the split-door criterion, we obtained 23,000 natural experiments for over 12,000 products (roughly half of all ~20K products).
21. Step 4: Check robustness of the estimate to
unobserved confounding
What if there is an unobserved confounder that affects the recommendation click-throughs but not the direct visits?
• Select plausible values for the confounder.
• Simulate how robust the estimate is.
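One way to carry out such a sensitivity check, sketched on synthetic data (the confounder model and its strength values are invented for illustration): re-simulate with a hypothetical unobserved confounder of increasing strength and watch how far the naive click-through estimate drifts from its confounder-free value.

```python
import random

random.seed(3)

# Sensitivity sketch: vary the strength of a hypothetical unobserved
# confounder u that raises both visits and clicks, and measure the drift
# of the naive click-through estimate from its confounder-free baseline.
def estimate_ctr(strength, n=100_000):
    clicks = visits = 0
    for _ in range(n):
        u = random.random()                               # unobserved confounder
        if random.random() < 0.5 + strength * (u - 0.5):  # u raises visit rate
            visits += 1
            clicks += random.random() < 0.3 + strength * (u - 0.5)
    return clicks / visits

baseline = estimate_ctr(0.0)
for s in (0.0, 0.2, 0.4):
    est = estimate_ctr(s)
    print(f"confounder strength {s}: estimate {est:.3f} (drift {est - baseline:+.3f})")
```

If the estimate barely moves across the plausible range of confounder strengths, the conclusion is robust; a large drift means the result hinges on the no-confounding assumption.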
22. Summary: The same process of causal analysis can be applied to develop metrics for new problems
• Does a system provide the same accuracy/performance across demographics?
• Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, Emine Yilmaz (WWW 2017). Auditing Search Engines for Differential Satisfaction Across Demographics.
• How can we measure long-term outcomes of a system when they cannot be measured by randomized experiments?
• If you have a new product, which people should receive the recommendation so that the number of purchases is maximized (given a limited budget for sending recommendations)?
• Email for a copy.
23. Thank you!
• Try DoWhy, a Python library for causal inference that implements the four steps of causal analysis: https://github.com/microsoft/dowhy
• Upcoming book on Causal Inference in ML systems (w/ Emre Kiciman): https://causalinference.gitlab.io/
• Papers:
• Sharma, Amit, Jake M. Hofman, and Duncan J. Watts. "Estimating the causal impact of recommendation systems from observational data." Proc. ACM EC 2015.
• Sharma, Amit, Jake M. Hofman, and Duncan J. Watts. "Split-door criterion: Identification of causal effects through auxiliary outcomes." The Annals of Applied Statistics (2018).
Amit Sharma, Microsoft Research India
@amt_shrma www.amitsharma.in
Editor's Notes
What is the impact of a recommender system?
The truth obviously lies somewhere in the middle. Both are exaggerated.
Key question
Nothing new.
Suppose you are Amazon. While the concepts are general, they are best understood through an example.
Causal: how much activity
Suppose you want to improve recommendation. One of the metrics you want is for novel recommendation
And Ideally, we would want such an estimate for every product.
And in many cases, infeasible.
E.g. considerable effect on user experience.
Question: rec has value
Question: can randomize order. Or show random recommendations: why costly?
Answer: we can, but we need an offline metric that can be used to train a new algorithm.
But if you just think about it, obs. CTR is almost surely an overestimate.
It is helpful to think about in terms of causal and convenience. By design, a recommender system shows similar products,
In our case, it is page visits due to recommender and direct visits.
Story: yd is instrument. Not coming automatically but more validating.
Say and this actually happened..oprah invited road book.
Everything that is affecting pr outcome should affect auxiliary.
Can think of as giving us exclusion. But more broadly, serves to remove this arrow.
Observed effect is also the causal effect.
But we can actually do more general.
Improve quality of image.
All products is it method?
Baseline: A method that can generate valid instrument