Causal Inference in
Recommender Systems
Amit Sharma
Senior Researcher, Microsoft Research India
@amt_shrma
http://www.amitsharma.in/
Invited Talk: REVEAL Workshop @ACM RecSys 2020
How to evaluate a recommender system?
Accuracy
• Is the predicted rating similar to a user's
rating?
• Does the user click on a
recommendation?
Coverage
• Does the system exclude certain items
from recommendation?
Diversity
• Does the system recommend items
different from each other?
Insufficient for the questions we
want to answer.
Does the recommender system increase
revenue?
Does it shape what people buy or
consume?
Does it create "echo chambers" or
make people more polarized?
Simple example: The "Harry Potter" Problem
Suppose a recommender always recommends the next book by the same author.
This is a high-accuracy, high-coverage system. Diversity can also be high if the user reads diverse genres of books.
Harry Potter 2
By J.K. Rowling
The Road
By Cormac McCarthy
A causal view of a recommender system
Key question: What would be the outcome metric in a world without the
recommendation algorithm?
Recommendation algorithm → Policy or intervention: $P(\mathrm{Rec} \mid \mathrm{UserContext})$
Evaluating the algorithm → Causal effect of intervention: $P(\mathrm{Outcome} \mid \mathrm{do}(\mathrm{Rec}))$
$P(\mathrm{Outcome} \mid \mathrm{do}(\mathrm{Rec} = 1))$ vs. $P(\mathrm{Outcome} \mid \mathrm{do}(\mathrm{Rec} = 0))$
Causal impact of recommender = $P(\mathrm{Outcome} \mid \mathrm{do}(\mathrm{Rec} = 1)) - P(\mathrm{Outcome} \mid \mathrm{do}(\mathrm{Rec} = 0))$
Comparing to a counterfactual world provides
new, causal metrics
Serendipity: "Recommendation helps the user find a surprisingly interesting item
they might not have otherwise discovered" (Herlocker et al. 2004, TOIS)
But so far we lacked the tools to measure such metrics.
Accuracy
Coverage
Increase in Clicks
or Revenue
Fairness by Parity
Diversity
Fairness by Equal
Opportunity
Today's talk: How to estimate causal metrics
for a recommender system?
1. Case study: Estimate the impact of Amazon's recommendation engine
Describe the four steps of causal analysis:
  1. Model causal mechanisms in a system.
  2. Identify the correct metric to estimate.
  3. Estimate the metric.
  4. Check robustness of the estimate to unobserved confounding.
2. New, causal metrics: How a causal inference view enables us to ask new
questions about a recommender system
DoWhy: A Python library for causal inference that implements the four
steps. https://github.com/microsoft/dowhy
Causal Impact: How many additional views does a
recommender system bring?
Accuracy
Increase in Clicks
or Revenue
Hypothetical experiment: Randomized A/B test
Can we develop an offline metric?
Treatment (A): Observed Control (B): Counterfactual world
Step 1: Modeling the causal mechanism and
identifying the confounding factors
Demand for
The Road
Visits to The
Road
Rec. visits to
No Country
for Old Men
Demand for
No Country for
Old Men
Observed activity is almost surely an
overestimate of the causal effect
Causal
Convenience
OBSERVED ACTIVITY
FROM RECOMMENDER
All page
visits
?
ACTIVITY WITHOUT
RECOMMENDER
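The split of observed recommender activity into causal and convenience clicks can be illustrated with a toy simulation. This is only a sketch under made-up assumptions: the demand rate, persuasion rate, and link-usage rate below are hypothetical numbers chosen for illustration, not figures from the talk or from Amazon data.

```python
import random

def simulate(n_users=100_000, seed=0):
    """Toy model: compare recommendation click-throughs with the true causal lift."""
    rng = random.Random(seed)
    rec_clicks = 0          # visits to Y that arrive through the recommendation link
    visits_with_rec = 0     # visits to Y in the observed world (recommender on)
    visits_without_rec = 0  # visits to Y in the counterfactual world (recommender off)
    for _ in range(n_users):
        wants_y_anyway = rng.random() < 0.10  # hypothetical pre-existing demand for Y
        persuaded = rng.random() < 0.02       # hypothetical: visits Y only if shown the link
        uses_link = rng.random() < 0.50       # hypothetical: takes the link when it is handy
        if wants_y_anyway:
            # This user reaches Y in both worlds; the link is only a shortcut.
            visits_with_rec += 1
            visits_without_rec += 1
            if uses_link:
                rec_clicks += 1               # convenience click
        elif persuaded:
            # This user reaches Y only because the recommendation exists.
            visits_with_rec += 1
            rec_clicks += 1                   # causal click
    naive_rate = rec_clicks / n_users
    causal_rate = (visits_with_rec - visits_without_rec) / n_users
    return naive_rate, causal_rate

naive_rate, causal_rate = simulate()
print(f"observed click-through rate: {naive_rate:.4f}")
print(f"causal lift in visits:       {causal_rate:.4f}")
```

In this toy world the observed click-through rate counts both kinds of clicks, so it exceeds the causal lift whenever some users with prior demand use the recommendation link as a shortcut.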
Step 2: Identification--Is there a way to
recover the causal effect from observed data?
Naïve: $\mathbb{E}[Y/X]$
To remove convenience clicks, need
a proxy for unobserved demand.
"Backdoor criterion": $\mathbb{E}[wY/X]$,
where the weight
$w = 1/P(X = 1 \mid \mathrm{UserContext})$ captures
demand of the user (inverse propensity
weighting).
But method depends on accurately
capturing unknown user context.
Demand
for
Product
Visits to
Product
(X)
Visits to
Recommended
product (Y)
Demand for
Recommended
product
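Inverse propensity weighting, as mentioned above, can be sketched on simulated data. The sketch assumes a single binary user context whose propensities are known exactly, which is precisely the assumption the slide warns about; it also uses the standard Horvitz-Thompson form of the weights (1/p for treated, 1/(1-p) for untreated units) rather than the exact estimator from the talk. All numbers and names are hypothetical.

```python
import random

def ipw_demo(n=200_000, seed=1):
    """Compare a naive difference in means with an IPW estimate on simulated data."""
    rng = random.Random(seed)
    propensity = {0: 0.2, 1: 0.8}  # hypothetical P(X = 1 | context), assumed known
    true_effect = 0.05             # hypothetical causal effect of X on Y
    treated_sum = treated_n = control_sum = control_n = 0
    ipw_treated = ipw_control = 0.0
    for _ in range(n):
        context = 1 if rng.random() < 0.5 else 0           # e.g. low or high demand
        x = 1 if rng.random() < propensity[context] else 0
        p_outcome = 0.10 + 0.30 * context + true_effect * x
        y = 1 if rng.random() < p_outcome else 0
        if x == 1:
            treated_sum += y
            treated_n += 1
            ipw_treated += y / propensity[context]         # weight w = 1 / P(X=1 | context)
        else:
            control_sum += y
            control_n += 1
            ipw_control += y / (1 - propensity[context])   # weight 1 / P(X=0 | context)
    naive = treated_sum / treated_n - control_sum / control_n
    ipw = (ipw_treated - ipw_control) / n
    return naive, ipw, true_effect

naive, ipw, true_effect = ipw_demo()
print(f"naive: {naive:.3f}  ipw: {ipw:.3f}  truth: {true_effect:.3f}")
```

The naive difference absorbs the effect of the context on the outcome, while the weighted estimate lands close to the simulated effect. If the context were mismeasured or partly unobserved, the weights would be wrong and the bias would return.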
Finding a demand proxy using natural experiments:
Split outcome into recommender (primary) and direct visits
All visits to a
recommended product
Recommender
visits
Direct
visits
Search
visits
Direct
browsing
Auxiliary outcome: Proxy
for unobserved demand
for recommended product
Demand
for
Product
Visits to
Product
(X)
Rec. Visits
to Y ($Y_R$)
Demand for
Recommended
product
Direct Visits
to Y ($Y_D$)
Example: Product X's visits change but the direct visits
to recommended product Y are constant (Accept)
Example: Product X's visits change and direct visits to
recommended product Y also change similarly (Reject)
Leads to the "split-door" criterion
Criterion: Observed visits through a recommended link are causal only
when $X \perp\!\!\!\perp Y_D$.
Demand for
focal product
($U_X$)
Visits to focal
product ($X$)
Rec. visits
($Y_R$)
Direct visits
($Y_D$)
Demand for
rec. product
($U_Y$)
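The accept/reject logic of the split-door criterion can be sketched as an independence check on the daily series of X and Y_D, followed by a click-through estimate Y_R / X for accepted pairs. The sketch makes several assumptions not taken from the talk: it uses a permutation test on the Pearson correlation (a simplification that ignores autocorrelation in the series), the threshold `alpha` is arbitrary, and the function names and visit counts are hypothetical. Failing to reject independence is also weaker than establishing it.

```python
import random

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5

def independence_p_value(x, y_d, n_perm=2000, seed=0):
    """Permutation p-value for the null hypothesis that x and y_d are unrelated."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y_d))
    shuffled = list(y_d)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def split_door_estimate(x, y_r, y_d, alpha=0.05):
    """Return the click-through estimate sum(y_r) / sum(x), or None if the pair is rejected."""
    if independence_p_value(x, y_d) < alpha:
        return None  # direct visits move with X: shared demand is likely, so reject
    return sum(y_r) / sum(x)

# Hypothetical 15-day window with a spike in visits to the focal product X.
x = [100, 120, 90, 300, 310, 280, 110, 95, 105, 100, 98, 102, 99, 101, 97]
y_r = [10, 12, 9, 30, 31, 28, 11, 10, 10, 10, 10, 10, 10, 10, 10]

flat_direct = [50] * 15                  # direct visits to Y stay constant
moving_direct = [0.5 * v for v in x]     # direct visits to Y track X

print(split_door_estimate(x, y_r, flat_direct))
print(split_door_estimate(x, y_r, moving_direct))
```

The first pair mirrors the "Accept" example, where direct visits stay flat while X changes; the second mirrors the "Reject" example, where both series move together.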
More formally, the criterion is based on do-calculus
over the causal graph
Unobserved
variables ($U_X$)
Cause
($X$)
Outcome ($Y_R$)
Auxiliary
Outcome
($Y_D$)
Unobserved
variables ($U_Y$)
Step 3: Estimation with Amazon.com logs
from the Bing toolbar
Out of which 20 K products have at least 10 visits on any one day
Implementing the split-door criterion
$\langle X, Y_D \rangle$
$t = 15$ days
Estimate the metric over valid split-door pairs
of products
Using the split-door criterion, obtained 23,000
natural experiments for over 12,000 products.
(about half of all ~20K products)
Step 4: Check robustness of the estimate to
unobserved confounding
What if there is an
unobserved confounder
that affects the
recommendation click-throughs
but not the
direct visits?
• Select plausible values
for the confounder
• Simulate how robust
the estimate is.
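The two bullets above can be sketched as a small sweep: pick a range of plausible confounder strengths and see how far the click-through estimate drifts from its true value. Everything in this sketch is an assumption made for illustration: the linear data-generating model, the confounder `u`, the strength parameter `gamma`, and the true click-through `rho` are hypothetical and are not the procedure or numbers used in the talk.

```python
import random

def estimate_with_confounder(gamma, rho=0.10, days=5000, seed=0):
    """Click-through estimate when a hidden factor U raises both X and Y_R.

    U does not touch direct visits, so a check on X and Y_D would not detect it.
    """
    rng = random.Random(seed)
    total_x = 0.0
    total_y_r = 0.0
    for _ in range(days):
        u = rng.random()                     # unobserved confounder in [0, 1)
        x = 100 + 50 * u + rng.gauss(0, 5)   # visits to the focal product, raised by U
        y_r = rho * x + gamma * u            # rec. visits: causal part plus confounded part
        total_x += x
        total_y_r += y_r
    return total_y_r / total_x

# Sweep hypothetical confounder strengths and report the resulting bias.
for gamma in (0.0, 1.0, 2.0, 5.0):
    estimate = estimate_with_confounder(gamma)
    print(f"gamma={gamma:.1f}  estimate={estimate:.4f}  bias={estimate - 0.10:+.4f}")
```

If the estimate stays within an acceptable band across the strengths one considers plausible, the conclusion is reasonably robust; if a modest `gamma` already changes it materially, the estimate should be reported with that caveat.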
Summary: Same process of causal analysis can be
applied to develop metrics for new problems
• Does a system provide same accuracy/performance across
demographics?
• Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, Emine
Yilmaz (WWW 2017). Auditing Search Engines for Differential Satisfaction Across
Demographics.
• How to measure long-term outcomes due to a system that cannot be
measured by randomized experiments?
• If you have a new product, which people to send the
recommendation to such that number of purchases is maximized
(limited budget to send recommendations)?
• Email for a copy.
Thank you!!
• Try DoWhy, a Python library for causal inference that implements the four steps
of causal analysis
https://github.com/microsoft/dowhy
• Upcoming book on Causal Inference in ML systems (w/ Emre Kiciman):
https://causalinference.gitlab.io/
• Papers
• Sharma, Amit, Jake M. Hofman, and Duncan J. Watts. "Estimating the causal impact of
recommendation systems from observational data." Proc. ACM EC 2015.
• Sharma, Amit, Jake M. Hofman, and Duncan J. Watts. "Split-door criterion: Identification
of causal effects through auxiliary outcomes." The Annals of Applied Statistics (2018).
Amit Sharma, Microsoft Research India
@amt_shrma www.amitsharma.in


Editor's Notes

  1. What is the impact of a recommender system? The truth obviously lies somewhere in the middle. Both are exaggerated.
  2. Key question
  3. Nothing new.
  4. Suppose you are Amazon and you are While the concepts are general, they are best understood through an example. Causal: how much activity Suppose you want to improve recommendation. One of the metrics you want is for novel recommendation
  5. Ideally, we would want such an estimate for every product. And in many cases, infeasible, e.g. considerable effect on user experience. Question: rec has value. Question: can randomize order, or show random recommendations: why costly? Answer: can do, but we need an offline metric, which can be used to train a new algorithm.
  6. But if you just think about it, obs. CTR is almost surely an overestimate. It is helpful to think about in terms of causal and convenience. By design, a recommender system shows similar products,
  7. In our case, it is page visits due to recommender and direct visits.
  8. Story: Y_D is the instrument. Not coming automatically but more validating. Say, and this actually happened, Oprah invited the Road book.
  9. Everything that is affecting pr outcome should affect auxiliary. Can think of as giving us exclusion. But more broadly, serves to remove this arrow.
  10. Observed effect is also the causal effect.
  11. But we can actually do more general.
  12. Improve quality of image.
  13. All products is it method? Baseline: A method that can generate valid instrument
  14. Can discover those that we would not think of.