DoWhy: An end-to-end
library for causal inference
Amit Sharma (@amt_shrma), Emre Kiciman (@emrek)
Microsoft Research
A big thanks to Adam Kelleher, Tanmay Kulkarni and many other
open-source contributors!
https://github.com/microsoft/dowhy
Prediction vs. Causation
Prediction
Assume: $P_{train}(W, X, Y) = P_{test}(W, X, Y)$
Estimate: $\min L(\hat{y}, y)$
Evaluate: Cross-validation
Fundamental problem with causal inference
• Causal inference concerns estimation about different data
distributions than the training distribution
• What if 𝑥 is changed to a different value?
• How do the results change for a different sample of people?
• What if a particular algorithm is changed in a system?
• Often, no data is available for that distribution
• Cross-validation is not possible
• 𝛽 is not observed, unlike 𝑦.
• 𝑦 is observed for training domain, but not for a new domain.
Estimation about different data
distributions than the training distribution.
Often, no data is available for that
distribution.
1. Assumptions
2. Evaluation
1. Assumptions drive causal inference
• Causal inference methods depend on untestable assumptions.
• Even with large-scale data, the final estimate can be heavily sensitive to
those assumptions.
• Important to transparently communicate those assumptions.
[Figure: causal graph with nodes Item 1, Item 2, Demand for items, and Recommendation System]
2. Causal estimates are hard to validate
• Cannot compare two causal
estimates on the same dataset
• Q: Is algorithm B better than the
production algorithm A?
• Cannot tell without doing an A/B
test.
• Let alone compare two estimates
from two different datasets
• Everyone prefers their own
favorite methods
• Need objective metrics to
validate causal estimates
The effect of online advertising on sales is
20% (std error=5)
What assumptions went
in the analysis?
How would it change if
one of the assumptions
was incorrect?
Is it robust to seasonal
shifts in behavior?
What is the expected
error in this estimate?
We built DoWhy to put assumptions front and center
in any causal analysis.
- Transparent declaration of assumptions
- Evaluation of those assumptions, to the extent possible
An end-to-end platform for doing causal inference
[Figure: pipeline. Inputs: Input Data <cause, outcome, other variables> and Domain Knowledge. Steps: Formulate correct estimand, Estimate causal effect, Check robustness. Output: Causal effect. Existing tools shown: CausalImpact, tmle, causaleffect,…]
[Figure: the same pipeline, from Input Data <cause, outcome, other variables> and Domain Knowledge (a causal graph over Cause, Outcome, v1, v2, v3, v5, w) to Causal effect]
Formulate correct estimand
• Check the data against properties implied by the model.
Estimate causal effect
• Use a suitable method to estimate the effect.
Check robustness
• Refute the obtained estimate through multiple tests.
DoWhy
Making Assumptions Transparent
Testing those assumptions
Code demo
https://github.com/microsoft/dowhy/blob/master/docs/source/example_notebooks/dowhy_confounder_example.ipynb
DoWhy encodes the four steps of causal
reasoning
1. Modeling: Create a causal graph to encode assumptions
2. Identification: Formulate what to estimate
3. Estimation: Compute the estimate
4. Refutation: Validate the assumptions
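The four steps can be sketched as a single function. This is a minimal sketch, not the demo notebook: the function name `four_step_analysis` and its arguments are hypothetical, the graph is given only as a list of assumed common causes, and the method names are the ones I recall from the DoWhy documentation, so check them against your installed version.

```python
def four_step_analysis(df, treatment, outcome, common_causes):
    """Model, identify, estimate, refute on a pandas DataFrame `df`.

    `treatment`, `outcome` and `common_causes` are column names in `df`.
    """
    # Imported here so the sketch can be read without DoWhy installed.
    from dowhy import CausalModel

    # 1. Modeling: encode assumptions (here, the assumed common causes).
    model = CausalModel(
        data=df,
        treatment=treatment,
        outcome=outcome,
        common_causes=common_causes,
    )
    # 2. Identification: formulate what to estimate.
    estimand = model.identify_effect()
    # 3. Estimation: compute the estimate with one of the available methods.
    estimate = model.estimate_effect(
        estimand, method_name="backdoor.linear_regression"
    )
    # 4. Refutation: rerun the analysis with a placebo treatment.
    refutation = model.refute_estimate(
        estimand, estimate, method_name="placebo_treatment_refuter"
    )
    return estimand, estimate, refutation
```

Calling it as `four_step_analysis(df, "Cause", "Outcome", ["v1", "v2"])` would assume that `df` has columns with those names.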
I. Identification: Formulate correct estimand
1. Constructs causal Bayesian network from user-provided
knowledge.
• Check whether the data satisfies the Bayesian network’s
assumptions.
2. Tries out different techniques for identifying a causal effect and
checks which ones are feasible.
• Back-door criterion [Pearl 2000]
• Instrumental variable [Wright 1928, Angrist and Pischke 1991]
3. Provides “what to estimate”: a target estimand for causal
effect.
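As a rough illustration of the back-door idea: when every parent of the cause is observed, adjusting for those parents blocks all back-door paths into the cause. The sketch below uses that simple rule on a hypothetical graph. The edges are assumptions made for illustration, and this is not how DoWhy implements identification.

```python
# Hypothetical causal graph as an adjacency map: node -> list of its children.
# The edges are assumed for illustration; they are not taken from the slides.
graph = {
    "v1": ["Cause", "Outcome"],
    "v2": ["Cause", "Outcome"],
    "v3": ["Outcome"],
    "Cause": ["Outcome"],
    "Outcome": [],
}


def parents(graph, node):
    """Return the sorted list of direct causes (parents) of `node`."""
    return sorted(p for p, children in graph.items() if node in children)


# The parents of the cause form a valid back-door adjustment set
# when all of them are observed.
adjustment_set = parents(graph, "Cause")
print(adjustment_set)
```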
II. Estimation: Compute the causal effect
Uses well-known techniques for causal inference.
Based on the estimand from the identification step,
implements multiple methods, including:
• Stratification
• Propensity score matching
• Inverse propensity weighting
• Natural experiments
• Conditional treatment effect estimators from the
microsoft/EconML library
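To make one of these methods concrete, here is a minimal sketch of inverse propensity weighting in plain Python. The toy data, the single binary confounder `v`, and all function names are assumptions for illustration. DoWhy's own estimators are more general.

```python
def make_rows():
    """Toy data as (v, c, outcome) tuples; v confounds c and the outcome.

    The outcome is 2 * c + 3 * v, so the true effect of c is 2.
    """
    rows = []
    for v, n_treated, n_control in [(0, 20, 60), (1, 60, 20)]:
        rows += [(v, 1, 2 + 3 * v)] * n_treated
        rows += [(v, 0, 3 * v)] * n_control
    return rows


def propensity_by_stratum(rows):
    """Estimate P(c = 1 | v) as the treated fraction within each value of v."""
    counts = {}
    for v, c, _ in rows:
        treated, total = counts.get(v, (0, 0))
        counts[v] = (treated + c, total + 1)
    return {v: treated / total for v, (treated, total) in counts.items()}


def ipw_effect(rows):
    """Inverse propensity weighted estimate of the average treatment effect."""
    e = propensity_by_stratum(rows)
    total = 0.0
    for v, c, out in rows:
        if c == 1:
            total += out / e[v]        # treated units weighted by 1 / e(v)
        else:
            total -= out / (1 - e[v])  # control units weighted by 1 / (1 - e(v))
    return total / len(rows)


rows = make_rows()
propensity = propensity_by_stratum(rows)
effect = ipw_effect(rows)
print(propensity, round(effect, 6))
```

Each unit is weighted by the inverse of the probability of the treatment it actually received, which rebalances the treated and control groups across values of `v`.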
[Figure: causal graph with nodes Cause, Outcome, v1, v2, v3, v5, w]
I. Formulate estimand
Find variables that “d-separate”
cause and outcome.
II. Estimate causal effect
Estimate as the observed effect
conditioned on the back-door
variables.
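[Figure: the input causal graph over Cause, Outcome, v1, v2, v3, v5, w, and a second graph showing Cause, v1, v2 and Outcome]
Input: causal graph over $Cause, Out, v_1, v_2, v_3, v_4, v_5, w$
COMPUTER SCIENCE: Do-calculus (Pearl 2001)
$[\mathit{Out} \perp\!\!\!\perp C \mid v_1, v_2]_{G_{\neg(c \to \mathit{Out})}}$
$P(\mathit{Out} \mid do(c)) = \sum_{v_i} P(\mathit{Out} \mid c, v_i)\, P(v_i)$
STATISTICS: Potential Outcomes (Rubin 1984)
$E[\mathit{Out} \mid c = 1, v_i] - E[\mathit{Out} \mid c = 0, v_i]$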
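A small worked example may help here. The back-door estimate takes the treated-minus-control difference within each value of the back-door variable and averages those differences by how common each value is. The toy data below (one binary back-door variable `v`, outcome equal to 2c + 3v) is an assumption for illustration.

```python
from collections import defaultdict
from statistics import mean


def make_rows():
    """Toy data as (v, c, outcome) tuples; the true effect of c is 2."""
    rows = []
    for v, n_treated, n_control in [(0, 20, 60), (1, 60, 20)]:
        rows += [(v, 1, 2 + 3 * v)] * n_treated
        rows += [(v, 0, 3 * v)] * n_control
    return rows


def naive_effect(rows):
    """Difference in mean outcome between treated and control, ignoring v."""
    treated = [out for _, c, out in rows if c == 1]
    control = [out for _, c, out in rows if c == 0]
    return mean(treated) - mean(control)


def backdoor_effect(rows):
    """Per-stratum difference in means, weighted by the stratum share P(v)."""
    strata = defaultdict(list)
    for row in rows:
        strata[row[0]].append(row)
    effect = 0.0
    for members in strata.values():
        treated = [out for _, c, out in members if c == 1]
        control = [out for _, c, out in members if c == 0]
        effect += (mean(treated) - mean(control)) * len(members) / len(rows)
    return effect


rows = make_rows()
naive = naive_effect(rows)        # 3.5, biased by the confounder v
adjusted = backdoor_effect(rows)  # 2.0, the effect built into the toy data
print(naive, adjusted)
```

The naive comparison overstates the effect because treated units are mostly those with `v = 1`, which also raises the outcome.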
What if the user forgot to add an important variable to the
graph, or did not even know about a confounder?
III. Refutation/Validation: Test robustness of
obtained estimate
[Figure: the input data and causal graph (Cause, Outcome, v1, v2, v3, v5, w), next to a copy in which a variable U is added alongside v1, v2]
Many “automatic” validation tests: Dummy Outcome test, Placebo test,
Subsample test, Add-unobserved-confounder test.
IIIa. Adding New Confounders
Add a variable 𝑼 that causes both 𝐶𝑎𝑢𝑠𝑒 and
𝑂𝑢𝑡𝑐𝑜𝑚𝑒.
1. 𝑼 is randomly generated.
• Rerun analysis, expect no change in causal effect.
2. 𝑼 is generated to have a correlation 𝜌 with 𝐶𝑎𝑢𝑠𝑒
and 𝑂𝑢𝑡𝑐𝑜𝑚𝑒.
• Assess sensitivity: how fast does the new causal
estimate go to zero?
[Figure: causal graph with nodes Cause, Outcome, v1, v2, U, v3, v5, w]
$X = X' + U$
$Y = Y' + U$
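Case 1 (a randomly generated U) can be sketched in plain Python. The simulated data, the stratified estimator, and all names below are assumptions for illustration rather than DoWhy's implementation. Case 2, the sensitivity analysis, is not covered here.

```python
import random
from collections import defaultdict
from statistics import mean


def simulate(n, rng):
    """Toy data with one observed binary confounder v; true effect of c is 2."""
    rows = []
    for _ in range(n):
        v = int(rng.random() < 0.5)
        c = int(rng.random() < (0.75 if v else 0.25))
        out = 2 * c + 3 * v + rng.gauss(0, 1)
        rows.append({"v": v, "c": c, "out": out})
    return rows


def stratified_effect(rows, adjust_for):
    """Treated-minus-control mean per stratum, weighted by stratum size."""
    strata = defaultdict(list)
    for r in rows:
        strata[tuple(r[k] for k in adjust_for)].append(r)
    effect = 0.0
    for members in strata.values():
        treated = [r["out"] for r in members if r["c"] == 1]
        control = [r["out"] for r in members if r["c"] == 0]
        effect += (mean(treated) - mean(control)) * len(members) / len(rows)
    return effect


rng = random.Random(0)
rows = simulate(5000, rng)
original = stratified_effect(rows, ["v"])

# Refutation: add a randomly generated U that is independent of everything,
# then rerun the analysis with U as an extra adjustment variable.
for r in rows:
    r["u"] = int(rng.random() < 0.5)
with_u = stratified_effect(rows, ["v", "u"])

print(round(original, 2), round(with_u, 2))
```

If the two numbers differed by much more than sampling noise, that would point to a problem in the estimator rather than in the data.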
IIIb. Placebo (“A/A”) test
Simulate a world where 𝐶𝑎𝑢𝑠𝑒 does not affect
𝑂𝑢𝑡𝑐𝑜𝑚𝑒.
Replace 𝐶𝑎𝑢𝑠𝑒 by a randomly generated variable in
the dataset.
• Rerun analysis, expect causal effect to go
to zero.
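The placebo test can be sketched the same way: replace the cause with a coin flip and check that the estimated effect collapses toward zero. The simulated data and estimator below are assumptions for illustration.

```python
import random
from statistics import mean


def simulate(n, rng):
    """Toy (v, c, outcome) data; v confounds c; true effect of c is 2."""
    rows = []
    for _ in range(n):
        v = int(rng.random() < 0.5)
        c = int(rng.random() < (0.75 if v else 0.25))
        rows.append((v, c, 2 * c + 3 * v + rng.gauss(0, 1)))
    return rows


def stratified_effect(rows):
    """Treated-minus-control mean within each v, weighted by stratum size."""
    effect = 0.0
    for v in (0, 1):
        members = [r for r in rows if r[0] == v]
        treated = [out for _, c, out in members if c == 1]
        control = [out for _, c, out in members if c == 0]
        effect += (mean(treated) - mean(control)) * len(members) / len(rows)
    return effect


rng = random.Random(1)
rows = simulate(5000, rng)
real = stratified_effect(rows)

# Placebo: swap the real cause for a random variable unrelated to the outcome.
placebo_rows = [(v, int(rng.random() < 0.5), out) for v, _, out in rows]
placebo = stratified_effect(placebo_rows)

print(round(real, 2), round(placebo, 2))
```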
[Figure: causal graph with nodes Cause, Outcome, v1, v2, U, v3, v5, w]
IIIc. Subsampling test
Can also test statistical robustness.
E.g., Remove a random subset of the data.
• Rerun analysis, expect no change in the causal effect.
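A minimal sketch of the subsampling test, assuming a toy dataset with a randomized treatment and a simple difference-in-means estimator (both are assumptions for illustration): re-estimate on random 80% subsets and compare with the full-data estimate.

```python
import random
from statistics import mean


def simulate(n, rng):
    """Toy (c, outcome) data with randomized treatment; true effect is 2."""
    rows = []
    for _ in range(n):
        c = int(rng.random() < 0.5)
        rows.append((c, 2 * c + rng.gauss(0, 1)))
    return rows


def difference_in_means(rows):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [out for c, out in rows if c == 1]
    control = [out for c, out in rows if c == 0]
    return mean(treated) - mean(control)


rng = random.Random(2)
rows = simulate(5000, rng)
full = difference_in_means(rows)

# Refutation: drop a random 20% of the data, rerun, and repeat 20 times.
subsample_estimates = [
    difference_in_means(rng.sample(rows, k=4000)) for _ in range(20)
]
largest_shift = max(abs(e - full) for e in subsample_estimates)

print(round(full, 2), round(largest_shift, 3))
```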
[Figure: Input Data]
Summary: DoWhy, an end-to-end library for
causal inference
Test assumptions as far as possible
• Make assumptions explicit through a Bayesian network.
• Test assumptions from observed data [Sharma 2018, arXiv].
Assess sensitivity to untested assumptions
• When tests are inconclusive, assess sensitivity of causal estimate to violation
of assumptions [Sharma et al. 2018, Annals of Applied Statistics].
Unify best practices from different scientific fields
• Unify different frameworks from computer science and statistics (“graphs and
potential outcomes”) [Kiciman & Sharma 2018, KDD Tutorial].
Thank you!
Resources
• KDD 2018 tutorial on causal inference
• https://causalinference.gitlab.io/kdd-tutorial/
• Upcoming book on “Causal Reasoning: Fundamentals and ML Applications”
• https://causalinference.gitlab.io/
• DoWhy
• Code: https://github.com/microsoft/dowhy
• Docs: https://microsoft.github.io/dowhy/
Amit Sharma
Microsoft Research India
@amt_shrma

 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 

DoWhy Python library for causal inference: An End-to-End tool

  • 1. DoWhy: An end-to-end library for causal inference. Amit Sharma (@amt_shrma), Emre Kiciman (@emrek), Microsoft Research. A big thanks to Adam Kelleher, Tanmay Kulkarni and many other open-source contributors! https://github.com/microsoft/dowhy
  • 2. Prediction vs. Causation. For prediction, assume: $P_{\text{train}}(W, X, Y) = P_{\text{test}}(W, X, Y)$. Estimate: $\min L(\hat{y}, y)$. Evaluate: cross-validation.
  • 3. Fundamental problem with causal inference • Causal inference concerns estimation about different data distributions than the training distribution • What if 𝑥 is changed to a different value? • How do the results change for a different sample of people? • What if a particular algorithm is changed in a system? • Often, no data is available for that distribution • Cross-validation is not possible • 𝛽 is not observed, unlike 𝑦. • 𝑦 is observed for the training domain, but not for a new domain.
  • 4. Estimation about different data distributions than the training distribution. Often, no data is available for that distribution. 1. Assumptions 2. Evaluation
  • 5. 1. Assumptions drive causal inference • Causal inference methods depend on untestable assumptions. • Even with large-scale data, the final estimate can be heavily sensitive to those assumptions. • Important to transparently communicate those assumptions. [Figure: Recommendation System, Item 1, Item 2, Demand for items]
  • 6. 2. Causal estimates are hard to validate • Cannot compare two causal estimates on the same dataset • Q: Is algorithm B better than the production algorithm A? • Cannot tell without doing an A/B test. • Let alone compare two estimates from two different datasets • Everyone prefers their own favorite methods • Need objective metrics to validate causal estimates
  • 7. “The effect of online advertising on sales is 20% (std error=5).” What assumptions went into the analysis? How would it change if one of the assumptions was incorrect? Is it robust to seasonal shifts in behavior? What is the expected error in this estimate?
  • 8. We built DoWhy to make assumptions front-and-center of any causal analysis. - Transparent declaration of assumptions - Evaluation of those assumptions, to the extent possible. An end-to-end platform for doing causal inference.
  • 9. Input Data <cause, outcome, other variables> and Domain Knowledge → Formulate correct estimand → Estimate causal effect → Check robustness → Causal effect. CausalImpact, tmle, causaleffect, …
  • 10. Formulate correct estimand • Check data with properties implied by the model Estimate causal effect • Use a suitable method to estimate effect. Check robustness • Refute obtained estimate through multiple tests. Input Data <cause, outcome, other variables> Cause v1,v2 Outcome v3 v5 w Domain Knowledge Causal effect DoWhy
  • 13. DoWhy encodes the four steps of causal reasoning 1. Modeling: Create a causal graph to encode assumptions 2. Identification: Formulate what to estimate 3. Estimation: Compute the estimate 4. Refutation: Validate the assumptions
  • 14. I. Identification: Formulate correct estimand 1. Constructs a causal Bayesian network from user-provided knowledge. • Checks whether the data satisfies the Bayesian network’s assumptions. 2. Tries out different techniques for identifying a causal effect and checks which ones are feasible. • Back-door criterion [Pearl 2000] • Instrumental variable [Wright 1928, Angrist and Pischke 1991] 3. Provides “what to estimate”: a target estimand for the causal effect.
  • 15. II. Estimation: Compute the causal effect. Uses well-known techniques for causal inference. Based on the estimand from the formulation step, implements multiple methods, including: • Stratification • Propensity score matching • Inverse propensity weighting • Natural experiments • Conditional treatment effect estimators from the microsoft/EconML library.
  • 16. Input: causal graph and data over $(\mathit{Cause}, \mathit{Out}, v_1, v_2, v_3, v_4, v_5, w)$. I. Formulate estimand: find variables that “d-separate” cause and outcome, $(\mathit{Out} \perp\!\!\!\perp C \mid v_1, v_2)_{G_{\neg c \to \mathit{Out}}}$, which gives $P(\mathit{Out} \mid do(c)) = \sum_{v_i} P(\mathit{Out} \mid c, v_i)\, P(v_i)$ [Computer science: do-calculus (Pearl 2001)]. II. Estimate causal effect: estimate as the observed effect conditioned on the back-door variables, $E[\mathit{Out} \mid c = 1, v_i] - E[\mathit{Out} \mid c = 0, v_i]$ [Statistics: potential outcomes (Rubin 1984)]. [Figure: causal graph with nodes Cause, Outcome, v1, v2, v3, v5, w]
  • 17. What if the user forgot to add an important variable to the graph, or did not even know about a confounder?
  • 18. III. Refutation/Validation: Test robustness of obtained estimate. Many “automatic” validation tests: dummy outcome test, placebo test, subsample test, add-unobserved-confounder test. [Figure: input data and causal graph, before and after adding an unobserved confounder U]
  • 19. IIIa. Adding New Confounders. Add a variable 𝑼 that causes both 𝐶𝑎𝑢𝑠𝑒 and 𝑂𝑢𝑡𝑐𝑜𝑚𝑒. 1. 𝑼 is randomly generated. • Rerun analysis, expect no change in causal effect. 2. 𝑼 is generated to have a correlation 𝜌 with 𝐶𝑎𝑢𝑠𝑒 and 𝑂𝑢𝑡𝑐𝑜𝑚𝑒: 𝑿 = 𝑿′ + 𝑼, 𝒀 = 𝒀′ + 𝑼. • Assess sensitivity: how fast does the new causal estimate go to zero? [Figure: causal graph with added confounder U]
  • 20. IIIb. Placebo (“A/A”) test. Simulate a world where 𝐶𝑎𝑢𝑠𝑒 does not affect 𝑂𝑢𝑡𝑐𝑜𝑚𝑒. Replace 𝐶𝑎𝑢𝑠𝑒 by a randomly generated variable in the dataset. • Rerun analysis, expect causal effect to go to zero.
  • 21. IIIc. Subsampling test. Can also test statistical robustness, e.g., remove a random subset of the data. • Rerun analysis, expect no change in the causal effect.
  • 22. Summary: DoWhy, an end-to-end library for causal inference Test assumptions as far as possible • Make assumptions explicit through a Bayesian network. • Test assumptions from observed data [Sharma 2018, Arxiv]. Assess sensitivity to untested assumptions • When tests are inconclusive, assess sensitivity of causal estimate to violation of assumptions [Sharma et al. 2018, Annals of Applied Statistics]. Unify best practices from different scientific fields • Unify different frameworks from computer science and statistics (“graphs and potential outcomes”) [Kiciman & Sharma 2018, KDD Tutorial].
  • 23. Thank you! Resources • KDD 2018 tutorial on causal inference • https://causalinference.gitlab.io/kdd-tutorial/ • Upcoming book on “Causal Reasoning: Fundamentals and ML Applications” • https://causalinference.gitlab.io/ • DoWhy • Code: https://github.com/microsoft/dowhy • Docs: https://microsoft.github.io/dowhy/ Amit Sharma Microsoft Research India @amt_shrma
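
The back-door estimate on slide 16 and the placebo and subsampling checks on slides 20 and 21 can be illustrated with a small simulation. What follows is a minimal sketch in plain Python, not DoWhy's implementation: the data-generating process, the single binary confounder `v`, the simulated effect of 2.0, and every function name are assumptions made for illustration.

```python
import random
from collections import defaultdict


def simulate(n, effect=2.0, seed=0):
    """Hypothetical data: binary v confounds both cause and outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        v = 1 if rng.random() < 0.5 else 0
        # v makes treatment more likely ...
        cause = 1 if rng.random() < (0.8 if v else 0.2) else 0
        # ... and also raises the outcome directly.
        outcome = effect * cause + 3.0 * v + rng.gauss(0, 1)
        rows.append((v, cause, outcome))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def naive_effect(rows):
    """Difference in mean outcome, ignoring the confounder."""
    treated = [y for _, c, y in rows if c == 1]
    control = [y for _, c, y in rows if c == 0]
    return mean(treated) - mean(control)


def backdoor_effect(rows):
    """sum over v of (E[Out | c=1, v] - E[Out | c=0, v]) * P(v)."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for v, c, y in rows:
        strata[v][c].append(y)
    total = 0.0
    for groups in strata.values():
        weight = (len(groups[0]) + len(groups[1])) / len(rows)
        total += weight * (mean(groups[1]) - mean(groups[0]))
    return total


def placebo_rows(rows, seed=1):
    """Replace the cause with a coin flip unrelated to the outcome."""
    rng = random.Random(seed)
    return [(v, rng.randint(0, 1), y) for v, _, y in rows]


def subsample_rows(rows, fraction=0.8, seed=2):
    """Keep a random subset of the data."""
    rng = random.Random(seed)
    return rng.sample(rows, int(fraction * len(rows)))


rows = simulate(20000)
naive = naive_effect(rows)
adjusted = backdoor_effect(rows)
placebo = backdoor_effect(placebo_rows(rows))
subsampled = backdoor_effect(subsample_rows(rows))

print(f"naive difference in means: {naive:.2f}")
print(f"back-door adjusted effect: {adjusted:.2f}")
print(f"placebo-cause effect:      {placebo:.2f}")
print(f"80% subsample effect:      {subsampled:.2f}")
```

On this simulated data the naive difference is inflated by the confounder, the stratified estimate should land near the simulated effect, the placebo estimate should be close to zero, and the subsample estimate should stay close to the full-sample one.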

Editor's Notes

  1. So this tutorial is going to be about how to get better at it. Suppose a simple world. If you believe everything relevant is captured, go for prediction and you should be fine. But if not, you need to understand causal factors. And by the way, we will also learn that regression is one of the worst methods because the world is almost never linear. One of the reasons is that while we have got pretty good at processing terabytes of data, causal inference methods haven’t caught up. The example I like to think of is fundamentally the difference between prediction and causation, as I describe in a recent paper in Science. Suppose there are two variables and, for simplicity, you believe that the true model of the world is given by beta x. But we know nothing about the error eta; it may not even be independent. Prediction is what big data is often used for, so if we have a big dataset with two variables X and Y, we can simply feed it to a machine learning algorithm to get reasonably accurate predictions for y. However, often the most interesting questions are of a causal nature (does X cause Y?), and here it is not entirely clear how to use big data:
  2. So let’s first look at how someone at Office or a biomedical scientist would have run a causal analysis before DoWhy. Finally, *think of* how to check robustness of the estimate.
  3. What DoWhy does is implement the hard parts of all three steps, leading to an easy interface. The user still has to provide domain knowledge; that’s a harder problem we don’t solve yet, but we want to.
  4. Once it has done that, the second step is straightforward.
  5. As an example of the first two steps. While both of these are prior work, they are actually parts of two different frameworks that are not used together. DoWhy combines them and
  6. All this is good, but..
  7. Test the sensitivity of the estimate as causal assumptions are violated. Test it like a scientific theory.
  8. And how is DoWhy able to implement the full workflow of causal inference? DoWhy treats assumptions as first-class citizens. A user is encouraged to think more about their domain assumptions than about the methods. And at the backend, DoWhy uses our recent research to test those assumptions as far as possible. -- Causal inference methods depend critically on assumptions. Vast, contradictory literature. Assumptions are often not explicit, masked in “statistical assumptions”. What happens when the method’s assumptions fail? Causal analysis restricted to experts in causal inference or statistics.
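
The sensitivity test in note 7 (slide 19, "Adding New Confounders") can be sketched as follows. This is a hedged illustration in plain Python, not DoWhy's refuter: it assumes a continuous cause, a linear outcome with a simulated effect of 2.0, an estimate taken as an ordinary least-squares slope, and a hypothetical `strength` multiplier on the confounder standing in for the correlation ρ on the slide.

```python
import random


def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def slope(x, y):
    """OLS slope of y on x, with no adjustment."""
    return cov(x, y) / cov(x, x)


def slope_adjusted(x, y, u):
    """Coefficient of x when y is regressed on both x and u."""
    sxx, suu, sxu = cov(x, x), cov(u, u), cov(x, u)
    sxy, suy = cov(x, y), cov(u, y)
    return (sxy * suu - sxu * suy) / (sxx * suu - sxu ** 2)


rng = random.Random(0)
n = 20000
true_effect = 2.0  # assumed effect in the simulated data

# Original (unconfounded) data: X' and Y'.
x0 = [rng.gauss(0, 1) for _ in range(n)]
y0 = [true_effect * xi + rng.gauss(0, 1) for xi in x0]
original = slope(x0, y0)

# Case 1: U is random and unrelated to the data.
# Adjusting for it should leave the estimate unchanged.
u = [rng.gauss(0, 1) for _ in range(n)]
with_random_u = slope_adjusted(x0, y0, u)


# Case 2: U feeds into both variables, X = X' + s*U and Y = Y' + s*U,
# and the analysis is rerun without observing U.
def confounded_estimate(strength):
    x = [xi + strength * ui for xi, ui in zip(x0, u)]
    y = [yi + strength * ui for yi, ui in zip(y0, u)]
    return slope(x, y)


sensitivity = {s: confounded_estimate(s) for s in (0.0, 0.5, 1.0, 2.0)}

print(f"original estimate:        {original:.2f}")
print(f"adjusting for random U:   {with_random_u:.2f}")
for s, est in sensitivity.items():
    print(f"confounder strength {s}: {est:.2f}")
```

Under these assumptions (unit-variance X′ and U) the unadjusted slope tends to (β + s²)/(1 + s²), so the estimate drifts away from the simulated effect as the confounder strength s grows. How quickly it drifts is the sensitivity the slide asks about.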