SlideShare a Scribd company logo
1 of 38
Download to read offline
DoWhy: An End-to-End Library for
Causal Inference
Amit Sharma (@amt_shrma), Emre Kıcıman (@emrek)
Microsoft Research
Causal ML group @ Microsoft:
https://www.microsoft.com/en-us/research/group/causal-inference/
Code: https://github.com/microsoft/dowhy
Paper: https://arxiv.org/abs/2011.04216
From prediction to decision-making
Decision-making: Acting/intervening based on analysis
• Interventions break correlations used by conventional ML
• The feature with the highest importance score in a prediction model,
• Need not be the best feature to act on
• May not even affect the outcome at all!
For decision-making, need to find the features that cause the outcome &
estimate how the outcome would change if the features are changed.
Many data science Qs are causal Qs
• A/B experiments: If I change the algorithm, will it lead to a higher
success rate?
• Policy decisions: If we adopt this treatment/policy, will it lead to a
healthier patient/more revenue/etc.?
• Policy evaluation: knowing what I know now, did my policy help or
hurt?
• Credit attribution: are people buying because of the
recommendation algorithm? Would they have bought anyway?
In fact, causal questions form the basis of almost all scientific inquiry
Prediction Causation
Assume:
𝑃𝑡𝑟𝑎𝑖𝑛 𝑊, 𝐴, 𝑌 = 𝑃𝑡𝑒𝑠𝑡(𝑊, 𝐴, 𝑌)
Estimate: min 𝐿( 𝑦, 𝑦)
Evaluate: Cross-validation
Assume:
𝑃𝑡𝑟𝑎𝑖𝑛 𝑊, 𝐴, 𝑌 ≠ 𝑃𝑡𝑒𝑠𝑡(𝑊, 𝐴, 𝑌)
Two Fundamental Challenges for Causal Inference
Multiple causal mechanisms and estimates
can fit the same data distribution.
Estimation about different data
distributions than the training distribution
(no easy “cross-validation”).
1. Assumptions
2. Evaluation
Real World: do(A=1) Counterfactual World: do(A=0)
We built DoWhy library to make assumptions front-
and-center of any causal analysis.
- Transparent declaration of assumptions
- Evaluation of those assumptions, to the extent possible
Most popular causal library on GitHub (>2.4k stars, 300+ forks)
Taught in third-party tutorials and courses: O’Reilly, PyData, Northeastern, …
An end-to-end platform for doing causal inference
EconML,
CausalML,
CausalImpact,
tmle,…
Formulate correct
estimand based on
causal assumptions?
Estimate causal
effect
Check
robustness?
Input Data
<action, outcome,
other variables>
Domain Knowledge
Causal
Estimate
Model causal
mechanisms
•Construct a
causal
graph
based on
domain
knowledge
Identify the
target estimand
•Formulate
correct
estimand
based on
the causal
model
Estimate causal
effect
•Use a
suitable
method to
estimate
effect
Refute estimate
•Check
robustness
of estimate
to
assumption
violations
Input Data
<action, outcome,
other variables>
Domain Knowledge
Causal
effect
DoWhy
Action
w
Outcome
v3 v5
v1,v2
DoWhy provides a general API for the four
steps of causal inference
1. Modeling: Create a causal graph to encode assumptions.
2. Identification: Formulate what to estimate.
3. Estimation: Compute the estimate.
4. Refutation: Validate the assumptions.
We’ll discuss the four steps and show a code example using DoWhy.
I. Model the assumptions using a causal graph
Convert domain knowledge to a formal
model of causal assumptions
• 𝐴 → 𝐵 or 𝐵 → 𝐴?
• Causal graph implies conditional statistical
independences
• E.g., 𝐴 ⫫ 𝐶, 𝐷 ⫫ A | B, …
• Identified by d-separation rules [Pearl 2009]
• These assumptions significantly impact the
causal estimate we’ll obtain.
A
C
B D
Key intuitions about causal graphs
• Assumptions are encoded by missing edges, and direction of edges
• Relationships represent stable and independent mechanisms
• Graph cannot be learnt from data alone
• Graphs are a tool to help us reason about a specific problem
Example Graph
Assumption 1: User fatigue does not
affect user interests
Assumption 2: Past clicks do not
directly affect outcome
Assumption 3: Treatment does not
affect user fatigue.
..and so on.
User
Interests
YT
User
Fatigue
Past Clicks
Intervention is represented by a new graph
User
Interests
YT
User
Fatigue
Past Likes
YTYT
Want to answer questions about data that
will be generated by intervention graph
Observed data generated
by this graph
II. Identification: Formulate desired quantity
and check if it is estimable from given data
Trivial Example: Randomized Experiments
• Observed graph is same as intervention graph
in randomized experiment!
• Treatment 𝑇 is already generated independent of
all other features
•  𝑃 𝑌 𝑑𝑜 𝑇 = 𝑃(𝑌|𝑇)
• Intuition: Generalize by simulating randomized
experiment
• When treatment T is caused by other features, 𝑍,
adjust for their influence to simulate a randomized
experiment
YT
Adjustment Formula and Adjustment Sets
Adjustment formula
𝑝 𝑌 𝑑𝑜 𝑇 =
𝑍
𝑝 𝑌 𝑇, 𝑍 𝑝(𝑍)
Where 𝑍 must be a valid adjustment set:
- The set of all parents of 𝑇
- Features identified via backdoor criterion
- Features identified via “towards necessity” criterion
Intuitions:
- The union of all features is not necessarily a valid adjustment set
- Why not always use parents? Sometimes parent features are unobserved
Many kinds of identification methods
Graphical constraint-based
methods
• Randomized and natural
experiments
• Adjustment Sets
• Backdoor, “towards necessity”
• Front-door criterion
• Mediation formula
Identification under additional
non-graphical constraints
• Instrumental variables
• Regression discontinuity
• Difference-in-differences
Many of these methods can be used through DoWhy.
III. Estimation: Compute the causal effect
Estimation uses observed data to compute the target
probability expression from the Identification step.
For common identification strategies using adjustment sets,
𝐸[𝑌|𝑑𝑜 𝑇 = 𝑡 , 𝑊 = 𝑤]= 𝐸 𝑌 𝑇 = 𝑡, 𝑊 = 𝑤
assuming W is a valid adjustment set.
• For binary treatment,
Causal Effect = 𝐸 𝑌 𝑇 = 1, 𝑊 = 𝑤 − 𝐸 𝑌 𝑇 = 0, 𝑊 = 𝑤
Goal: Estimating conditional probability Y|T=t when all
confounders W are kept constant.
Control Treatment (Cycling)
Simple Matching: Match data points with the same
confounders and then compare their outcomes
Simple Matching: Match data points with the same
confounders and then compare their outcomes
Identify pairs of treated (𝑗) and
untreated individuals (𝑘) who are
similar or identical to each other.
Match := 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 𝑊𝑗, 𝑊𝑘 < 𝜖
• Paired individuals have almost
the same confounders.
Causal Effect =
𝑗,𝑘 ∈𝑀𝑎𝑡𝑐ℎ(𝑦𝑗 − 𝑦 𝑘)
:
:
:
:
:
:
:
:
:
Challenges of building a good estimator
• Variance: If we have a stringent matching criterion, we may obtain
very few matches and the estimate will be unreliable.
• Bias: If we relax the matching criterion, we obtain many more
matches but now the estimate does not capture the target estimand.
• Uneven treatment assignment: If very few people have treatment,
leads to both high bias and variance.
Need better methods to navigate the bias-variance tradeoff.
Machine learning methods can help find a
better match for each data point
Synthetic Control: If a good match
does not exist for a data point, can we
create it synthetically?
Learn 𝑦 = 𝑓𝑡=0 𝑤 ,
𝑦 = 𝑓𝑡=1(𝑤)
Assuming f approximates the true
relationship between 𝑌 and 𝑊,
Causal Effect =
𝑖
𝑡𝑖(𝑦𝑖 − 𝑓𝑡=0(𝑤𝑖)) + (1 − 𝑡𝑖)(𝑓𝑡=1 𝑤𝑖 − 𝑦𝑖)
Confounder (W)
Outcome(Y)
Confounder
T=1
T=0
Other ML methods generalize estimation to
continuous treatments
The standard predictor, 𝑦 = 𝑓 𝑡, 𝑤 + 𝜖
may not provide the right estimate for
𝜕𝑦
𝜕𝑡
.
Double-ML [Chernozhukov et al. 2016]:
• Stage 1: Break down conditional
estimation into two prediction sub-tasks.
𝑦 = 𝑔 𝑤 + 𝑦
𝑡 = ℎ 𝑤 + 𝑡
𝑦 and 𝑡 refer to the unconfounded variation in
𝑌 and 𝑇 respectively after conditioning on w.
• Stage 2: A final regression of 𝑦 on 𝑡 gives
the causal effect.
𝑦~ 𝛽 𝑡 + 𝜖
Outcome(Y)
Confounder (W)
Treatment(T)
Confounder (W)
Residual Treatment ( 𝑇)
ResidualOutcome(𝑌)
Depending on the dataset properties,
different estimation methods can be used
Simple Conditioning
• Matching
• Stratification
Propensity Score-Based [Rubin 1983]
• Propensity Matching
• Inverse Propensity Weighting
Synthetic Control [Abadie et al.]
Outcome-based
• Double ML [Chernozhukov et al. 2016]
• T-learner
• X-learner [Kunzel et al. 2017]
Loss-Based
• R-learner [Nie & Wager 2017]
Threshold-based
• Difference-in-differences
All these methods can be called through DoWhy.
(directly or through the Microsoft EconML library)
IV. Robustness Checks: Test robustness of
obtained estimate to violation of assumptions
Obtained estimate depends on many (untestable) assumptions.
Model:
Did we miss any unobserved variables in the assumed graph?
Did we miss any edge between two variables in the assumed graph?
Identify:
Did we make any parametric assumption for deriving the estimand?
Estimate:
Is the assumed functional form sufficient for capturing the variation in
data?
Do the estimator assumptions lead to high variance?
Best practice: Do refutation/robustness tests
for as many assumptions as possible
UNIT TESTS
Model:
• Conditional Independence Test
Identify:
• D-separation Test
Estimate:
• Bootstrap Refuter
• Data Subset Refuter
INTEGRATION TESTS
Test all steps at once.
• Placebo Treatment Refuter
• Dummy Outcome Refuter
• Random Common Cause Refuter
• Sensitivity Analysis
• Simulated Outcome Refuter
/Synth-validation [Schuler et al. 2017]
All these refutation methods are implemented in DoWhy.
Caveat: They can refute a given analysis, but cannot prove its correctness.
Example 1: Conditional Independence Refuter
Through its edges, each causal graph
implies certain conditional independence
constraints on its nodes. [d-separation, Pearl
2009]
Model refutation: Check if the observed
data satisfies the assumed model’s
independence constraints.
• Use an appropriate statistical test for
independence [Heinze-Demel et al. 2018].
• If not, the model is incorrect.
W
YT
A B
Conditional Independencies:
𝐴⫫𝐵 𝐴⫫T|W 𝐵⫫ T|W
Example 1: Placebo Treatment (“A/A”) Refuter
Q: What if we can generate a dataset where
the treatment does not cause the outcome?
Then a correct causal inference method should
return an estimate of zero.
Placebo Treatment Refuter:
Replace treatment variable T by a randomly
generated variable (e.g., Gaussian).
• Rerun the causal inference analysis.
• If the estimate is significantly away from zero,
then analysis is incorrect.
W
YT
W
YT
?
Original
Treatment
“Placebo”
Treatment
Example 2: Add Unobserved Confounder to
check sensitivity of an estimate
Q: What if there was an unobserved confounder
that was not included in the causal model?
Check how sensitive the obtained estimate is
after introducing a new confounder.
Unobserved Confounder Refuter:
• Simulate a confounder based on a given
correlation 𝜌 with both treatment and
outcome.
• Maximum Correlation 𝜌 is based on the maximum
correlation of any observed confounder.
• Re-run the analysis and check if the
sign/direction of estimate flips.
W
YT
Observed
Confounders
W
YT
U
Unobserved
Confounder
Walk-through of the 4 steps using
the DoWhy Python library
𝒙
𝒚
𝒙 𝒚 𝒘
A mystery problem of two correlated variables:
Does 𝒙 cause 𝒚?
You can try out this example on Github:
https://github.com/microsoft/dowhy/blob/master/docs/source/example_notebooks/dowhy_confounder_example.ipynb
What else is in DoWhy?
• A unified, extensible API for causal inference that allows external
implementations for the 4 steps
• Can use estimation methods from external libraries such as EconML and CausalML.
• A convenient CausalDataFrame (contributed by Adam Kelleher)
• Pandas DataFrame extension with inbuilt functions for calculating causal effect.
Summary: DoWhy, a library that focuses on
causal assumptions and their validation
Growing open-source community: > 30 contributors
• Roadmap: More powerful refutation tests, counterfactual prediction.
• Please contribute! Would love to hear your ideas on Github.
Resources
 DoWhy Library: https://github.com/microsoft/dowhy
 Arxiv paper on the four steps: https://arxiv.org/abs/2011.04216
 Upcoming book on causality and ML: http://causalinference.gitlab.io/
Goal: A unified API for causal inference problems, just like
PyTorch or Tensorflow for predictive ML.
thank you– Amit Sharma
(@amt_shrma)

More Related Content

What's hot

[PAP] 실무자를 위한 인과추론 활용 : Best Practices
[PAP] 실무자를 위한 인과추론 활용 : Best Practices[PAP] 실무자를 위한 인과추론 활용 : Best Practices
[PAP] 실무자를 위한 인과추론 활용 : Best PracticesBokyung Choi
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature EngineeringHJ van Veen
 
Why start using uplift models for more efficient marketing campaigns
Why start using uplift models for more efficient marketing campaignsWhy start using uplift models for more efficient marketing campaigns
Why start using uplift models for more efficient marketing campaignsData Con LA
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixJaya Kawale
 
Counterfactual Learning for Recommendation
Counterfactual Learning for RecommendationCounterfactual Learning for Recommendation
Counterfactual Learning for RecommendationOlivier Jeunen
 
Causal Inference : Primer (2019-06-01 잔디콘)
Causal Inference : Primer (2019-06-01 잔디콘)Causal Inference : Primer (2019-06-01 잔디콘)
Causal Inference : Primer (2019-06-01 잔디콘)Minho Lee
 
[팝콘 시즌1] 이윤희 : 다짜고짜 배워보는 인과추론
[팝콘 시즌1] 이윤희 : 다짜고짜 배워보는 인과추론[팝콘 시즌1] 이윤희 : 다짜고짜 배워보는 인과추론
[팝콘 시즌1] 이윤희 : 다짜고짜 배워보는 인과추론PAP (Product Analytics Playground)
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System ExplainedCrossing Minds
 
올바른 분석을 방해하는 함정 카드 피해가기
올바른 분석을 방해하는 함정 카드 피해가기올바른 분석을 방해하는 함정 카드 피해가기
올바른 분석을 방해하는 함정 카드 피해가기Minho Lee
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender SystemsT212
 
실리콘 밸리 데이터 사이언티스트의 하루
실리콘 밸리 데이터 사이언티스트의 하루실리콘 밸리 데이터 사이언티스트의 하루
실리콘 밸리 데이터 사이언티스트의 하루Jaimie Kwon (권재명)
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Kdd 2014 Tutorial - the recommender problem revisited
Kdd 2014 Tutorial -  the recommender problem revisitedKdd 2014 Tutorial -  the recommender problem revisited
Kdd 2014 Tutorial - the recommender problem revisitedXavier Amatriain
 
Hands on Explainable Recommender Systems with Knowledge Graphs @ RecSys22
Hands on Explainable Recommender Systems with Knowledge Graphs @ RecSys22Hands on Explainable Recommender Systems with Knowledge Graphs @ RecSys22
Hands on Explainable Recommender Systems with Knowledge Graphs @ RecSys22GiacomoBalloccu
 
프로덕트를 빠르게 개선하기 위한 베이지안 A/B 테스트
프로덕트를 빠르게 개선하기 위한 베이지안 A/B 테스트프로덕트를 빠르게 개선하기 위한 베이지안 A/B 테스트
프로덕트를 빠르게 개선하기 위한 베이지안 A/B 테스트Minho Lee
 
Session-based recommendations with recurrent neural networks
Session-based recommendations with recurrent neural networksSession-based recommendations with recurrent neural networks
Session-based recommendations with recurrent neural networksZimin Park
 
Tutorial on People Recommendations in Social Networks - ACM RecSys 2013,Hong...
Tutorial on People Recommendations in Social Networks -  ACM RecSys 2013,Hong...Tutorial on People Recommendations in Social Networks -  ACM RecSys 2013,Hong...
Tutorial on People Recommendations in Social Networks - ACM RecSys 2013,Hong...Anmol Bhasin
 
신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들
신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들
신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들Minho Lee
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering odsc
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorialAlexandros Karatzoglou
 

What's hot (20)

[PAP] 실무자를 위한 인과추론 활용 : Best Practices
[PAP] 실무자를 위한 인과추론 활용 : Best Practices[PAP] 실무자를 위한 인과추론 활용 : Best Practices
[PAP] 실무자를 위한 인과추론 활용 : Best Practices
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Why start using uplift models for more efficient marketing campaigns
Why start using uplift models for more efficient marketing campaignsWhy start using uplift models for more efficient marketing campaigns
Why start using uplift models for more efficient marketing campaigns
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
 
Counterfactual Learning for Recommendation
Counterfactual Learning for RecommendationCounterfactual Learning for Recommendation
Counterfactual Learning for Recommendation
 
Causal Inference : Primer (2019-06-01 잔디콘)
Causal Inference : Primer (2019-06-01 잔디콘)Causal Inference : Primer (2019-06-01 잔디콘)
Causal Inference : Primer (2019-06-01 잔디콘)
 
[팝콘 시즌1] 이윤희 : 다짜고짜 배워보는 인과추론
[팝콘 시즌1] 이윤희 : 다짜고짜 배워보는 인과추론[팝콘 시즌1] 이윤희 : 다짜고짜 배워보는 인과추론
[팝콘 시즌1] 이윤희 : 다짜고짜 배워보는 인과추론
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System Explained
 
올바른 분석을 방해하는 함정 카드 피해가기
올바른 분석을 방해하는 함정 카드 피해가기올바른 분석을 방해하는 함정 카드 피해가기
올바른 분석을 방해하는 함정 카드 피해가기
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
실리콘 밸리 데이터 사이언티스트의 하루
실리콘 밸리 데이터 사이언티스트의 하루실리콘 밸리 데이터 사이언티스트의 하루
실리콘 밸리 데이터 사이언티스트의 하루
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Kdd 2014 Tutorial - the recommender problem revisited
Kdd 2014 Tutorial -  the recommender problem revisitedKdd 2014 Tutorial -  the recommender problem revisited
Kdd 2014 Tutorial - the recommender problem revisited
 
Hands on Explainable Recommender Systems with Knowledge Graphs @ RecSys22
Hands on Explainable Recommender Systems with Knowledge Graphs @ RecSys22Hands on Explainable Recommender Systems with Knowledge Graphs @ RecSys22
Hands on Explainable Recommender Systems with Knowledge Graphs @ RecSys22
 
프로덕트를 빠르게 개선하기 위한 베이지안 A/B 테스트
프로덕트를 빠르게 개선하기 위한 베이지안 A/B 테스트프로덕트를 빠르게 개선하기 위한 베이지안 A/B 테스트
프로덕트를 빠르게 개선하기 위한 베이지안 A/B 테스트
 
Session-based recommendations with recurrent neural networks
Session-based recommendations with recurrent neural networksSession-based recommendations with recurrent neural networks
Session-based recommendations with recurrent neural networks
 
Tutorial on People Recommendations in Social Networks - ACM RecSys 2013,Hong...
Tutorial on People Recommendations in Social Networks -  ACM RecSys 2013,Hong...Tutorial on People Recommendations in Social Networks -  ACM RecSys 2013,Hong...
Tutorial on People Recommendations in Social Networks - ACM RecSys 2013,Hong...
 
신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들
신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들
신뢰할 수 있는 A/B 테스트를 위해 알아야 할 것들
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
 

Similar to Dowhy: An end-to-end library for causal inference

Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxrajalakshmi5921
 
Statistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptxStatistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptxnagarajan740445
 
Unit 3 – AIML.pptx
Unit 3 – AIML.pptxUnit 3 – AIML.pptx
Unit 3 – AIML.pptxhiblooms
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsSri Ambati
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining ProcessMarc Berman
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3Luis Borbon
 
Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science Frank Kienle
 
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...Sangwoo Mo
 
R - what do the numbers mean? #RStats
R - what do the numbers mean? #RStatsR - what do the numbers mean? #RStats
R - what do the numbers mean? #RStatsJen Stirrup
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learningAkshay Kanchan
 
STAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptxSTAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptxJishanAhmed24
 
'A critique of testing' UK TMF forum January 2015
'A critique of testing' UK TMF forum January 2015 'A critique of testing' UK TMF forum January 2015
'A critique of testing' UK TMF forum January 2015 Georgina Tilby
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2Roger Barga
 
Machine-Learning-Overview a statistical approach
Machine-Learning-Overview a statistical approachMachine-Learning-Overview a statistical approach
Machine-Learning-Overview a statistical approachAjit Ghodke
 
Unit-V Machine Learning.ppt
Unit-V Machine Learning.pptUnit-V Machine Learning.ppt
Unit-V Machine Learning.pptSharpmark256
 
PG STAT 531 Lecture 4 Exploratory Data Analysis
PG STAT 531 Lecture 4 Exploratory Data AnalysisPG STAT 531 Lecture 4 Exploratory Data Analysis
PG STAT 531 Lecture 4 Exploratory Data AnalysisAashish Patel
 
Causal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationCausal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationScientificRevenue
 
Spsshelp 100608163328-phpapp01
Spsshelp 100608163328-phpapp01Spsshelp 100608163328-phpapp01
Spsshelp 100608163328-phpapp01Henock Beyene
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerDatabricks
 

Similar to Dowhy: An end-to-end library for causal inference (20)

Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
 
Statistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptxStatistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptx
 
Unit 3 – AIML.pptx
Unit 3 – AIML.pptxUnit 3 – AIML.pptx
Unit 3 – AIML.pptx
 
Machine learning
Machine learningMachine learning
Machine learning
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3
 
Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science
 
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
 
R - what do the numbers mean? #RStats
R - what do the numbers mean? #RStatsR - what do the numbers mean? #RStats
R - what do the numbers mean? #RStats
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
STAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptxSTAT7440StudentIMLPresentationJishan.pptx
STAT7440StudentIMLPresentationJishan.pptx
 
'A critique of testing' UK TMF forum January 2015
'A critique of testing' UK TMF forum January 2015 'A critique of testing' UK TMF forum January 2015
'A critique of testing' UK TMF forum January 2015
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
Machine-Learning-Overview a statistical approach
Machine-Learning-Overview a statistical approachMachine-Learning-Overview a statistical approach
Machine-Learning-Overview a statistical approach
 
Unit-V Machine Learning.ppt
Unit-V Machine Learning.pptUnit-V Machine Learning.ppt
Unit-V Machine Learning.ppt
 
PG STAT 531 Lecture 4 Exploratory Data Analysis
PG STAT 531 Lecture 4 Exploratory Data AnalysisPG STAT 531 Lecture 4 Exploratory Data Analysis
PG STAT 531 Lecture 4 Exploratory Data Analysis
 
Causal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationCausal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous Optimization
 
Spsshelp 100608163328-phpapp01
Spsshelp 100608163328-phpapp01Spsshelp 100608163328-phpapp01
Spsshelp 100608163328-phpapp01
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles Baker
 

More from Amit Sharma

Alleviating Privacy Attacks Using Causal Models
Alleviating Privacy Attacks Using Causal ModelsAlleviating Privacy Attacks Using Causal Models
Alleviating Privacy Attacks Using Causal ModelsAmit Sharma
 
The Impact of Computing Systems | Causal inference in practice
The Impact of Computing Systems | Causal inference in practiceThe Impact of Computing Systems | Causal inference in practice
The Impact of Computing Systems | Causal inference in practiceAmit Sharma
 
Artificial Intelligence for Societal Impact
Artificial Intelligence for Societal ImpactArtificial Intelligence for Societal Impact
Artificial Intelligence for Societal ImpactAmit Sharma
 
Measuring effectiveness of machine learning systems
Measuring effectiveness of machine learning systemsMeasuring effectiveness of machine learning systems
Measuring effectiveness of machine learning systemsAmit Sharma
 
Causal data mining: Identifying causal effects at scale
Causal data mining: Identifying causal effects at scaleCausal data mining: Identifying causal effects at scale
Causal data mining: Identifying causal effects at scaleAmit Sharma
 
Auditing search engines for differential satisfaction across demographics
Auditing search engines for differential satisfaction across demographicsAuditing search engines for differential satisfaction across demographics
Auditing search engines for differential satisfaction across demographicsAmit Sharma
 
Causal inference in data science
Causal inference in data scienceCausal inference in data science
Causal inference in data scienceAmit Sharma
 
Causal inference in online systems: Methods, pitfalls and best practices
Causal inference in online systems: Methods, pitfalls and best practicesCausal inference in online systems: Methods, pitfalls and best practices
Causal inference in online systems: Methods, pitfalls and best practicesAmit Sharma
 
Equivalence causal frameworks: SEMs, Graphical models and Potential Outcomes
Equivalence causal frameworks: SEMs, Graphical models and Potential OutcomesEquivalence causal frameworks: SEMs, Graphical models and Potential Outcomes
Equivalence causal frameworks: SEMs, Graphical models and Potential OutcomesAmit Sharma
 
Estimating the causal impact of recommender systems
Estimating the causal impact of recommender systemsEstimating the causal impact of recommender systems
Estimating the causal impact of recommender systemsAmit Sharma
 
Predictability of popularity on online social media: Gaps between prediction ...
Predictability of popularity on online social media: Gaps between prediction ...Predictability of popularity on online social media: Gaps between prediction ...
Predictability of popularity on online social media: Gaps between prediction ...Amit Sharma
 
Data mining for causal inference: Effect of recommendations on Amazon.com
Data mining for causal inference: Effect of recommendations on Amazon.comData mining for causal inference: Effect of recommendations on Amazon.com
Data mining for causal inference: Effect of recommendations on Amazon.comAmit Sharma
 
Estimating influence of online activity feeds on people's actions
Estimating influence of online activity feeds on people's actionsEstimating influence of online activity feeds on people's actions
Estimating influence of online activity feeds on people's actionsAmit Sharma
 
From prediction to causation: Causal inference in online systems
From prediction to causation: Causal inference in online systemsFrom prediction to causation: Causal inference in online systems
From prediction to causation: Causal inference in online systemsAmit Sharma
 
Causal inference in practice
Causal inference in practiceCausal inference in practice
Causal inference in practiceAmit Sharma
 
Causal inference in practice: Here, there, causality is everywhere
Causal inference in practice: Here, there, causality is everywhereCausal inference in practice: Here, there, causality is everywhere
Causal inference in practice: Here, there, causality is everywhereAmit Sharma
 
The interplay of personal preference and social influence in sharing networks...
The interplay of personal preference and social influence in sharing networks...The interplay of personal preference and social influence in sharing networks...
The interplay of personal preference and social influence in sharing networks...Amit Sharma
 
The role of social connections in shaping our preferences
The role of social connections in shaping our preferencesThe role of social connections in shaping our preferences
The role of social connections in shaping our preferencesAmit Sharma
 
[RecSys '13]Pairwise Learning: Experiments with Community Recommendation on L...
[RecSys '13]Pairwise Learning: Experiments with Community Recommendation on L...[RecSys '13]Pairwise Learning: Experiments with Community Recommendation on L...
[RecSys '13]Pairwise Learning: Experiments with Community Recommendation on L...Amit Sharma
 
RSWEB 2013: A research platform for social recommendation
RSWEB 2013: A research platform for social recommendationRSWEB 2013: A research platform for social recommendation
RSWEB 2013: A research platform for social recommendationAmit Sharma
 

More from Amit Sharma (20)

Alleviating Privacy Attacks Using Causal Models
Alleviating Privacy Attacks Using Causal ModelsAlleviating Privacy Attacks Using Causal Models
Alleviating Privacy Attacks Using Causal Models
 
The Impact of Computing Systems | Causal inference in practice
The Impact of Computing Systems | Causal inference in practiceThe Impact of Computing Systems | Causal inference in practice
The Impact of Computing Systems | Causal inference in practice
 
Artificial Intelligence for Societal Impact
Artificial Intelligence for Societal ImpactArtificial Intelligence for Societal Impact
Artificial Intelligence for Societal Impact
 
Measuring effectiveness of machine learning systems
Measuring effectiveness of machine learning systemsMeasuring effectiveness of machine learning systems
Measuring effectiveness of machine learning systems
 
Causal data mining: Identifying causal effects at scale
Causal data mining: Identifying causal effects at scaleCausal data mining: Identifying causal effects at scale
Causal data mining: Identifying causal effects at scale
 
Auditing search engines for differential satisfaction across demographics
Auditing search engines for differential satisfaction across demographicsAuditing search engines for differential satisfaction across demographics
Auditing search engines for differential satisfaction across demographics
 
Causal inference in data science
Causal inference in data scienceCausal inference in data science
Causal inference in data science
 
Causal inference in online systems: Methods, pitfalls and best practices
Causal inference in online systems: Methods, pitfalls and best practicesCausal inference in online systems: Methods, pitfalls and best practices
Causal inference in online systems: Methods, pitfalls and best practices
 
Equivalence causal frameworks: SEMs, Graphical models and Potential Outcomes
Equivalence causal frameworks: SEMs, Graphical models and Potential OutcomesEquivalence causal frameworks: SEMs, Graphical models and Potential Outcomes
Equivalence causal frameworks: SEMs, Graphical models and Potential Outcomes
 
Estimating the causal impact of recommender systems
Estimating the causal impact of recommender systemsEstimating the causal impact of recommender systems
Estimating the causal impact of recommender systems
 
Predictability of popularity on online social media: Gaps between prediction ...
Predictability of popularity on online social media: Gaps between prediction ...Predictability of popularity on online social media: Gaps between prediction ...
Predictability of popularity on online social media: Gaps between prediction ...
 
Data mining for causal inference: Effect of recommendations on Amazon.com
Data mining for causal inference: Effect of recommendations on Amazon.comData mining for causal inference: Effect of recommendations on Amazon.com
Data mining for causal inference: Effect of recommendations on Amazon.com
 
Estimating influence of online activity feeds on people's actions
Estimating influence of online activity feeds on people's actionsEstimating influence of online activity feeds on people's actions
Estimating influence of online activity feeds on people's actions
 
From prediction to causation: Causal inference in online systems
From prediction to causation: Causal inference in online systemsFrom prediction to causation: Causal inference in online systems
From prediction to causation: Causal inference in online systems
 
Causal inference in practice
Causal inference in practiceCausal inference in practice
Causal inference in practice
 
Causal inference in practice: Here, there, causality is everywhere
Causal inference in practice: Here, there, causality is everywhereCausal inference in practice: Here, there, causality is everywhere
Causal inference in practice: Here, there, causality is everywhere
 
The interplay of personal preference and social influence in sharing networks...
The interplay of personal preference and social influence in sharing networks...The interplay of personal preference and social influence in sharing networks...
The interplay of personal preference and social influence in sharing networks...
 
The role of social connections in shaping our preferences
The role of social connections in shaping our preferencesThe role of social connections in shaping our preferences
The role of social connections in shaping our preferences
 
[RecSys '13]Pairwise Learning: Experiments with Community Recommendation on L...
[RecSys '13]Pairwise Learning: Experiments with Community Recommendation on L...[RecSys '13]Pairwise Learning: Experiments with Community Recommendation on L...
[RecSys '13]Pairwise Learning: Experiments with Community Recommendation on L...
 
RSWEB 2013: A research platform for social recommendation
RSWEB 2013: A research platform for social recommendationRSWEB 2013: A research platform for social recommendation
RSWEB 2013: A research platform for social recommendation
 

Recently uploaded

Speed Breeding in Vegetable Crops- innovative approach for present era of cro...
Speed Breeding in Vegetable Crops- innovative approach for present era of cro...Speed Breeding in Vegetable Crops- innovative approach for present era of cro...
Speed Breeding in Vegetable Crops- innovative approach for present era of cro...jana861314
 
Oxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxOxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxfarhanvvdk
 
Understanding Nutrition, 16th Edition pdf
Understanding Nutrition, 16th Edition pdfUnderstanding Nutrition, 16th Edition pdf
Understanding Nutrition, 16th Edition pdfHabibouKarbo
 
Role of Gibberellins, mode of action and external applications.pptx
Role of Gibberellins, mode of action and external applications.pptxRole of Gibberellins, mode of action and external applications.pptx
Role of Gibberellins, mode of action and external applications.pptxjana861314
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGiovaniTrinidad
 
DETECTION OF MUTATION BY CLB METHOD.pptx
DETECTION OF MUTATION BY CLB METHOD.pptxDETECTION OF MUTATION BY CLB METHOD.pptx
DETECTION OF MUTATION BY CLB METHOD.pptx201bo007
 
Food_safety_Management_pptx.pptx in microbiology
Food_safety_Management_pptx.pptx in microbiologyFood_safety_Management_pptx.pptx in microbiology
Food_safety_Management_pptx.pptx in microbiologyHemantThakare8
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxpriyankatabhane
 
HEMATOPOIESIS - formation of blood cells
HEMATOPOIESIS - formation of blood cellsHEMATOPOIESIS - formation of blood cells
HEMATOPOIESIS - formation of blood cellsSachinSuresh44
 
Loudspeaker- direct radiating type and horn type.pptx
Loudspeaker- direct radiating type and horn type.pptxLoudspeaker- direct radiating type and horn type.pptx
Loudspeaker- direct radiating type and horn type.pptxpriyankatabhane
 
Harry Coumnas Thinks That Human Teleportation May Ensure Humanity's Survival
Harry Coumnas Thinks That Human Teleportation May Ensure Humanity's SurvivalHarry Coumnas Thinks That Human Teleportation May Ensure Humanity's Survival
Harry Coumnas Thinks That Human Teleportation May Ensure Humanity's Survivalkevin8smith
 
BACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
BACTERIAL SECRETION SYSTEM by Dr. Chayanika DasBACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
BACTERIAL SECRETION SYSTEM by Dr. Chayanika DasChayanika Das
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptxpallavirawat456
 
3.-Acknowledgment-Dedication-Abstract.docx
3.-Acknowledgment-Dedication-Abstract.docx3.-Acknowledgment-Dedication-Abstract.docx
3.-Acknowledgment-Dedication-Abstract.docxUlahVanessaBasa
 
Immunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptImmunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptAmirRaziq1
 
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Sérgio Sacani
 
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological CorrelationsTimeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological CorrelationsDanielBaumann11
 
Abnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxAbnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxzeus70441
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPirithiRaju
 

Recently uploaded (20)

Speed Breeding in Vegetable Crops- innovative approach for present era of cro...
Speed Breeding in Vegetable Crops- innovative approach for present era of cro...Speed Breeding in Vegetable Crops- innovative approach for present era of cro...
Speed Breeding in Vegetable Crops- innovative approach for present era of cro...
 
Oxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxOxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptx
 
Understanding Nutrition, 16th Edition pdf
Understanding Nutrition, 16th Edition pdfUnderstanding Nutrition, 16th Edition pdf
Understanding Nutrition, 16th Edition pdf
 
Role of Gibberellins, mode of action and external applications.pptx
Role of Gibberellins, mode of action and external applications.pptxRole of Gibberellins, mode of action and external applications.pptx
Role of Gibberellins, mode of action and external applications.pptx
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptx
 
DETECTION OF MUTATION BY CLB METHOD.pptx
DETECTION OF MUTATION BY CLB METHOD.pptxDETECTION OF MUTATION BY CLB METHOD.pptx
DETECTION OF MUTATION BY CLB METHOD.pptx
 
Food_safety_Management_pptx.pptx in microbiology
Food_safety_Management_pptx.pptx in microbiologyFood_safety_Management_pptx.pptx in microbiology
Food_safety_Management_pptx.pptx in microbiology
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
 
HEMATOPOIESIS - formation of blood cells
HEMATOPOIESIS - formation of blood cellsHEMATOPOIESIS - formation of blood cells
HEMATOPOIESIS - formation of blood cells
 
Loudspeaker- direct radiating type and horn type.pptx
Loudspeaker- direct radiating type and horn type.pptxLoudspeaker- direct radiating type and horn type.pptx
Loudspeaker- direct radiating type and horn type.pptx
 
Harry Coumnas Thinks That Human Teleportation May Ensure Humanity's Survival
Harry Coumnas Thinks That Human Teleportation May Ensure Humanity's SurvivalHarry Coumnas Thinks That Human Teleportation May Ensure Humanity's Survival
Harry Coumnas Thinks That Human Teleportation May Ensure Humanity's Survival
 
BACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
BACTERIAL SECRETION SYSTEM by Dr. Chayanika DasBACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
BACTERIAL SECRETION SYSTEM by Dr. Chayanika Das
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
3.-Acknowledgment-Dedication-Abstract.docx
3.-Acknowledgment-Dedication-Abstract.docx3.-Acknowledgment-Dedication-Abstract.docx
3.-Acknowledgment-Dedication-Abstract.docx
 
Immunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptImmunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.ppt
 
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
 
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological CorrelationsTimeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
 
Abnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxAbnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptx
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPR
 
Introduction Classification Of Alkaloids
Introduction Classification Of AlkaloidsIntroduction Classification Of Alkaloids
Introduction Classification Of Alkaloids
 

Dowhy: An end-to-end library for causal inference

  • 1. DoWhy: An End-to-End Library for Causal Inference Amit Sharma (@amt_shrma), Emre Kıcıman (@emrek) Microsoft Research Causal ML group @ Microsoft: https://www.microsoft.com/en-us/research/group/causal-inference/ Code: https://github.com/microsoft/dowhy Paper: https://arxiv.org/abs/2011.04216
  • 2. From prediction to decision-making Decision-making: Acting/intervening based on analysis • Interventions break correlations used by conventional ML • The feature with the highest importance score in a prediction model, • Need not be the best feature to act on • May not even affect the outcome at all! For decision-making, need to find the features that cause the outcome & estimate how the outcome would change if the features are changed.
  • 3. Many data science Qs are causal Qs • A/B experiments: If I change the algorithm, will it lead to a higher success rate? • Policy decisions: If we adopt this treatment/policy, will it lead to a healthier patient/more revenue/etc.? • Policy evaluation: knowing what I know now, did my policy help or hurt? • Credit attribution: are people buying because of the recommendation algorithm? Would they have bought anyway? In fact, causal questions form the basis of almost all scientific inquiry
  • 4. Prediction Causation Assume: 𝑃𝑡𝑟𝑎𝑖𝑛 𝑊, 𝐴, 𝑌 = 𝑃𝑡𝑒𝑠𝑡(𝑊, 𝐴, 𝑌) Estimate: min 𝐿( 𝑦, 𝑦) Evaluate: Cross-validation Assume: 𝑃𝑡𝑟𝑎𝑖𝑛 𝑊, 𝐴, 𝑌 ≠ 𝑃𝑡𝑒𝑠𝑡(𝑊, 𝐴, 𝑌)
  • 5. Two Fundamental Challenges for Causal Inference Multiple causal mechanisms and estimates can fit the same data distribution. Estimation about different data distributions than the training distribution (no easy “cross-validation”). 1. Assumptions 2. Evaluation Real World: do(A=1) Counterfactual World: do(A=0)
  • 6. We built DoWhy library to make assumptions front- and-center of any causal analysis. - Transparent declaration of assumptions - Evaluation of those assumptions, to the extent possible Most popular causal library on GitHub (>2.4k stars, 300+ forks) Taught in third-party tutorials and courses: O’Reilly, PyData, Northeastern, … An end-to-end platform for doing causal inference
  • 7. EconML, CausalML, CausalImpact, tmle,… Formulate correct estimand based on causal assumptions? Estimate causal effect Check robustness? Input Data <action, outcome, other variables> Domain Knowledge Causal Estimate
  • 8. Model causal mechanisms •Construct a causal graph based on domain knowledge Identify the target estimand •Formulate correct estimand based on the causal model Estimate causal effect •Use a suitable method to estimate effect Refute estimate •Check robustness of estimate to assumption violations Input Data <action, outcome, other variables> Domain Knowledge Causal effect DoWhy Action w Outcome v3 v5 v1,v2
  • 9. DoWhy provides a general API for the four steps of causal inference 1. Modeling: Create a causal graph to encode assumptions. 2. Identification: Formulate what to estimate. 3. Estimation: Compute the estimate. 4. Refutation: Validate the assumptions. We’ll discuss the four steps and show a code example using DoWhy.
  • 10. I. Model the assumptions using a causal graph Convert domain knowledge to a formal model of causal assumptions • 𝐴 → 𝐵 or 𝐵 → 𝐴? • Causal graph implies conditional statistical independences • E.g., 𝐴 ⫫ 𝐶, 𝐷 ⫫ A | B, … • Identified by d-separation rules [Pearl 2009] • These assumptions significantly impact the causal estimate we’ll obtain. A C B D
  • 11. Key intuitions about causal graphs • Assumptions are encoded by missing edges, and direction of edges • Relationships represent stable and independent mechanisms • Graph cannot be learnt from data alone • Graphs are a tool to help us reason about a specific problem
  • 12. Example Graph Assumption 1: User fatigue does not affect user interests Assumption 2: Past clicks do not directly affect outcome Assumption 3: Treatment does not affect user fatigue. ..and so on. User Interests YT User Fatigue Past Clicks
  • 13. Intervention is represented by a new graph User Interests YT User Fatigue Past Likes
  • 14. YTYT Want to answer questions about data that will be generated by intervention graph Observed data generated by this graph II. Identification: Formulate desired quantity and check if it is estimable from given data
  • 15. Trivial Example: Randomized Experiments • Observed graph is same as intervention graph in randomized experiment! • Treatment 𝑇 is already generated independent of all other features •  𝑃 𝑌 𝑑𝑜 𝑇 = 𝑃(𝑌|𝑇) • Intuition: Generalize by simulating randomized experiment • When treatment T is caused by other features, 𝑍, adjust for their influence to simulate a randomized experiment YT
  • 16. Adjustment Formula and Adjustment Sets Adjustment formula 𝑝 𝑌 𝑑𝑜 𝑇 = 𝑍 𝑝 𝑌 𝑇, 𝑍 𝑝(𝑍) Where 𝑍 must be a valid adjustment set: - The set of all parents of 𝑇 - Features identified via backdoor criterion - Features identified via “towards necessity” criterion Intuitions: - The union of all features is not necessarily a valid adjustment set - Why not always use parents? Sometimes parent features are unobserved
  • 17. Many kinds of identification methods Graphical constraint-based methods • Randomized and natural experiments • Adjustment Sets • Backdoor, “towards necessity” • Front-door criterion • Mediation formula Identification under additional non-graphical constraints • Instrumental variables • Regression discontinuity • Difference-in-differences Many of these methods can be used through DoWhy.
  • 18. III. Estimation: Compute the causal effect Estimation uses observed data to compute the target probability expression from the Identification step. For common identification strategies using adjustment sets, 𝐸[𝑌|𝑑𝑜 𝑇 = 𝑡 , 𝑊 = 𝑤]= 𝐸 𝑌 𝑇 = 𝑡, 𝑊 = 𝑤 assuming W is a valid adjustment set. • For binary treatment, Causal Effect = 𝐸 𝑌 𝑇 = 1, 𝑊 = 𝑤 − 𝐸 𝑌 𝑇 = 0, 𝑊 = 𝑤 Goal: Estimating conditional probability Y|T=t when all confounders W are kept constant.
  • 19. Control Treatment (Cycling) Simple Matching: Match data points with the same confounders and then compare their outcomes
  • 20. Simple Matching: Match data points with the same confounders and then compare their outcomes Identify pairs of treated (𝑗) and untreated individuals (𝑘) who are similar or identical to each other. Match := 𝐷𝑖𝑠𝑡𝑎𝑛𝑐𝑒 𝑊𝑗, 𝑊𝑘 < 𝜖 • Paired individuals have almost the same confounders. Causal Effect = 𝑗,𝑘 ∈𝑀𝑎𝑡𝑐ℎ(𝑦𝑗 − 𝑦 𝑘) : : : : : : : : :
  • 21. Challenges of building a good estimator • Variance: If we have a stringent matching criterion, we may obtain very few matches and the estimate will be unreliable. • Bias: If we relax the matching criterion, we obtain many more matches but now the estimate does not capture the target estimand. • Uneven treatment assignment: If very few people have treatment, leads to both high bias and variance. Need better methods to navigate the bias-variance tradeoff.
  • 22. Machine learning methods can help find a better match for each data point Synthetic Control: If a good match does not exist for a data point, can we create it synthetically? Learn 𝑦 = 𝑓𝑡=0 𝑤 , 𝑦 = 𝑓𝑡=1(𝑤) Assuming f approximates the true relationship between 𝑌 and 𝑊, Causal Effect = 𝑖 𝑡𝑖(𝑦𝑖 − 𝑓𝑡=0(𝑤𝑖)) + (1 − 𝑡𝑖)(𝑓𝑡=1 𝑤𝑖 − 𝑦𝑖) Confounder (W) Outcome(Y) Confounder T=1 T=0
  • 23. Other ML methods generalize estimation to continuous treatments The standard predictor, 𝑦 = 𝑓 𝑡, 𝑤 + 𝜖 may not provide the right estimate for 𝜕𝑦 𝜕𝑡 . Double-ML [Chernozhukov et al. 2016]: • Stage 1: Break down conditional estimation into two prediction sub-tasks. 𝑦 = 𝑔 𝑤 + 𝑦 𝑡 = ℎ 𝑤 + 𝑡 𝑦 and 𝑡 refer to the unconfounded variation in 𝑌 and 𝑇 respectively after conditioning on w. • Stage 2: A final regression of 𝑦 on 𝑡 gives the causal effect. 𝑦~ 𝛽 𝑡 + 𝜖 Outcome(Y) Confounder (W) Treatment(T) Confounder (W) Residual Treatment ( 𝑇) ResidualOutcome(𝑌)
  • 24. Depending on the dataset properties, different estimation methods can be used Simple Conditioning • Matching • Stratification Propensity Score-Based [Rubin 1983] • Propensity Matching • Inverse Propensity Weighting Synthetic Control [Abadie et al.] Outcome-based • Double ML [Chernozhukov et al. 2016] • T-learner • X-learner [Kunzel et al. 2017] Loss-Based • R-learner [Nie & Wager 2017] Threshold-based • Difference-in-differences All these methods can be called through DoWhy. (directly or through the Microsoft EconML library)
  • 25. IV. Robustness Checks: Test robustness of obtained estimate to violation of assumptions Obtained estimate depends on many (untestable) assumptions. Model: Did we miss any unobserved variables in the assumed graph? Did we miss any edge between two variables in the assumed graph? Identify: Did we make any parametric assumption for deriving the estimand? Estimate: Is the assumed functional form sufficient for capturing the variation in data? Do the estimator assumptions lead to high variance?
  • 26. Best practice: Do refutation/robustness tests for as many assumptions as possible UNIT TESTS Model: • Conditional Independence Test Identify: • D-separation Test Estimate: • Bootstrap Refuter • Data Subset Refuter INTEGRATION TESTS Test all steps at once. • Placebo Treatment Refuter • Dummy Outcome Refuter • Random Common Cause Refuter • Sensitivity Analysis • Simulated Outcome Refuter /Synth-validation [Schuler et al. 2017] All these refutation methods are implemented in DoWhy. Caveat: They can refute a given analysis, but cannot prove its correctness.
  • 27. Example 1: Conditional Independence Refuter Through its edges, each causal graph implies certain conditional independence constraints on its nodes. [d-separation, Pearl 2009] Model refutation: Check if the observed data satisfies the assumed model’s independence constraints. • Use an appropriate statistical test for independence [Heinze-Demel et al. 2018]. • If not, the model is incorrect. W YT A B Conditional Independencies: 𝐴⫫𝐵 𝐴⫫T|W 𝐵⫫ T|W
  • 28. Example 1: Placebo Treatment (“A/A”) Refuter Q: What if we can generate a dataset where the treatment does not cause the outcome? Then a correct causal inference method should return an estimate of zero. Placebo Treatment Refuter: Replace treatment variable T by a randomly generated variable (e.g., Gaussian). • Rerun the causal inference analysis. • If the estimate is significantly away from zero, then analysis is incorrect. W YT W YT ? Original Treatment “Placebo” Treatment
  • 29. Example 2: Add Unobserved Confounder to check sensitivity of an estimate Q: What if there was an unobserved confounder that was not included in the causal model? Check how sensitive the obtained estimate is after introducing a new confounder. Unobserved Confounder Refuter: • Simulate a confounder based on a given correlation 𝜌 with both treatment and outcome. • Maximum Correlation 𝜌 is based on the maximum correlation of any observed confounder. • Re-run the analysis and check if the sign/direction of estimate flips. W YT Observed Confounders W YT U Unobserved Confounder
  • 30. Walk-through of the 4 steps using the DoWhy Python library
  • 31. 𝒙 𝒚 𝒙 𝒚 𝒘 A mystery problem of two correlated variables: Does 𝒙 cause 𝒚?
  • 32.
  • 33.
  • 34.
  • 35.
  • 36. You can try out this example on Github: https://github.com/microsoft/dowhy/blob/master/docs/source/example_notebooks/dowhy_confounder_example.ipynb
  • 37. What else is in DoWhy? • A unified, extensible API for causal inference that allows external implementations for the 4 steps • Can use estimation methods from external libraries such as EconML and CausalML. • A convenient CausalDataFrame (contributed by Adam Kelleher) • Pandas DataFrame extension with inbuilt functions for calculating causal effect.
  • 38. Summary: DoWhy, a library that focuses on causal assumptions and their validation Growing open-source community: > 30 contributors • Roadmap: More powerful refutation tests, counterfactual prediction. • Please contribute! Would love to hear your ideas on Github. Resources  DoWhy Library: https://github.com/microsoft/dowhy  Arxiv paper on the four steps: https://arxiv.org/abs/2011.04216  Upcoming book on causality and ML: http://causalinference.gitlab.io/ Goal: A unified API for causal inference problems, just like PyTorch or Tensorflow for predictive ML. thank you– Amit Sharma (@amt_shrma)