As computing systems intervene more frequently and more actively in societally critical domains such as healthcare, education, and governance, it becomes critical to correctly predict and understand the causal effects of these interventions. Without an A/B test, conventional machine learning methods, built on pattern recognition and correlational analysis, are insufficient for causal reasoning.
Much as machine learning libraries have done for prediction, the "DoWhy" Python library aims to spark causal thinking and analysis. DoWhy provides a unified interface for causal inference methods and automatically tests many assumptions, making inference accessible to non-experts.
For a quick introduction to causal inference, check out amit-sharma/causal-inference-tutorial. We also gave a more comprehensive tutorial at the ACM Knowledge Discovery and Data Mining (KDD 2018) conference: causalinference.gitlab.io/kdd-tutorial.
1. DoWhy: An end-to-end library for causal inference
Amit Sharma (@amt_shrma), Emre Kiciman (@emrek)
Microsoft Research
A big thanks to Adam Kelleher, Tanmay Kulkarni, and many other open-source contributors!
https://github.com/microsoft/dowhy
3. The fundamental problem of causal inference
• Causal inference concerns estimation over data distributions different from the training distribution
• What if 𝑥 is changed to a different value?
• How do the results change for a different sample of people?
• What if a particular algorithm is changed in a system?
• Often, no data is available for that distribution
• Cross-validation is not possible
• 𝛽 is not observed, unlike 𝑦.
• 𝑦 is observed for the training domain, but not for a new domain.
4. Causal inference is estimation over data distributions different from the training distribution. Often, no data is available for that distribution. This raises two challenges:
1. Assumptions
2. Evaluation
5. 1. Assumptions drive causal inference
• Causal inference methods depend on untestable assumptions.
• Even with large-scale data, the final estimate can be heavily sensitive to those assumptions.
• It is important to transparently communicate those assumptions.
[Diagram: causal graph relating "Demand for items", the "Recommendation System", "Item 1", and "Item 2"]
6. 2. Causal estimates are hard to validate
• Cannot compare two causal estimates on the same dataset
• Q: Is algorithm B better than the production algorithm A? Cannot tell without doing an A/B test.
• Let alone compare two estimates from two different datasets
• Everyone prefers their own favorite methods
• Need objective metrics to validate causal estimates
7. "The effect of online advertising on sales is 20% (std. error = 5)."
• What assumptions went into the analysis?
• How would it change if one of the assumptions were incorrect?
• Is it robust to seasonal shifts in behavior?
• What is the expected error in this estimate?
8. We built DoWhy to make assumptions front-and-center of any causal analysis.
- Transparent declaration of assumptions
- Evaluation of those assumptions, to the extent possible
An end-to-end platform for doing causal inference
10. Formulate correct estimand: check the data against properties implied by the model.
Estimate causal effect: use a suitable method to estimate the effect.
Check robustness: refute the obtained estimate through multiple tests.
Input: data ⟨cause, outcome, other variables⟩ plus domain knowledge; output: the causal effect (DoWhy).
[Diagram: causal graph with nodes Cause, Outcome, v1, v2, v3, v5, w]
13. DoWhy encodes the four steps of causal reasoning
1. Modeling: Create a causal graph to encode assumptions
2. Identification: Formulate what to estimate
3. Estimation: Compute the estimate
4. Refutation: Validate the assumptions
14. I. Identification: Formulate correct estimand
1. Constructs a causal Bayesian network from user-provided knowledge.
• Checks whether the data satisfies the Bayesian network’s assumptions.
2. Tries out different techniques for identifying a causal effect and checks which ones are feasible.
• Back-door criterion [Pearl 2000]
• Instrumental variables [Wright 1928, Angrist and Pischke 1991]
3. Provides “what to estimate”: a target estimand for the causal effect.
15. II. Estimation: Compute the causal effect
Uses well-known techniques for causal inference. Based on the estimand from the identification step, DoWhy implements multiple methods, including:
• Stratification
• Propensity score matching
• Inverse propensity weighting
• Natural experiments
• Conditional treatment effect estimators from the microsoft/EconML library
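To make one of the listed estimators concrete, here is a from-scratch sketch of inverse propensity weighting on simulated data with a single binary confounder. This is a toy illustration of the idea, not DoWhy's implementation; all variable names are made up:

```python
# Toy inverse-propensity-weighting (IPW) estimate of a causal effect.
import random
random.seed(1)

# One binary confounder v drives both the treatment c and the outcome y.
# True causal effect of c on y is 1.0.
rows = []
for _ in range(200_000):
    v = random.random() < 0.5
    c = random.random() < (0.8 if v else 0.2)
    y = (1.0 if c else 0.0) + (2.0 if v else 0.0) + random.gauss(0, 0.1)
    rows.append((v, c, y))

# Estimate the propensity score P(c=1 | v) from the data itself.
def propensity(v_val):
    sub = [c for v, c, _ in rows if v == v_val]
    return sum(sub) / len(sub)

ps = {True: propensity(True), False: propensity(False)}

# IPW estimate of E[Y | do(c=1)] - E[Y | do(c=0)].
n = len(rows)
treated = sum(y * c / ps[v] for v, c, y in rows) / n
control = sum(y * (1 - c) / (1 - ps[v]) for v, c, y in rows) / n
print(f"IPW effect estimate: {treated - control:.3f} (truth 1.0)")
```

Reweighting by the inverse propensity makes the treated and control groups comparable with respect to v, so the weighted difference approximates the causal effect.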
16. Example: the first two steps
Input: causal graph over ⟨Cause, Out, v1, v2, v3, v4, v5, w⟩
[Diagram: causal graph with nodes Cause, Outcome, v1, v2, v3, v5, w]
I. Formulate estimand: find variables that “d-separate” cause and outcome.
Back-door condition: Out ⫫ Cause | v1, v2 in G without the edge Cause → Out.
Do-calculus (Pearl 2001) [computer science]:
P(Out | do(c)) = Σ_{vᵢ} P(Out | c, vᵢ) P(vᵢ)
Potential outcomes (Rubin 1984) [statistics]:
E[Out | c = 1, vᵢ] − E[Out | c = 0, vᵢ]
II. Estimate causal effect: estimate as the observed effect conditioned on the back-door variables.
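The back-door formula can be checked numerically by simple stratification. A toy sketch (not DoWhy code) with one binary confounder v: the naive correlational difference is biased by confounding, while summing the stratum-wise differences weighted by P(v) recovers the true effect:

```python
# Compute P(Out|do(c)) = sum_v P(Out | c, v) P(v) by stratification.
import random
random.seed(0)

# Simulate: v confounds cause c and outcome y; true effect of c on y is +0.3.
data = []
for _ in range(200_000):
    v = random.random() < 0.5
    c = random.random() < (0.8 if v else 0.2)      # v makes treatment likely
    y = random.random() < (0.5 if v else 0.1) + (0.3 if c else 0.0)
    data.append((v, c, y))

def p_y_given(c_val, v_val):
    rows = [y for v, c, y in data if c == c_val and v == v_val]
    return sum(rows) / len(rows)

p_v1 = sum(v for v, _, _ in data) / len(data)

# Back-door adjusted effect: sum over v of [P(y|c=1,v) - P(y|c=0,v)] P(v).
adjusted = sum((p_y_given(1, v) - p_y_given(0, v)) * p
               for v, p in [(1, p_v1), (0, 1 - p_v1)])

# Naive (correlational) effect ignores v and is biased by confounding.
def p_y_naive(c_val):
    rows = [y for _, c, y in data if c == c_val]
    return sum(rows) / len(rows)
naive = p_y_naive(1) - p_y_naive(0)

print(f"adjusted={adjusted:.3f}  naive={naive:.3f}  truth=0.300")
```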
17. What if the user forgot to add an important variable to the graph, or did not even know about a confounder?
18. III. Refutation/Validation: Test robustness of the obtained estimate
[Diagrams: the input data and causal graph, shown alongside the same graph with an added unobserved confounder U]
Many “automatic” validation tests: dummy-outcome test, placebo test, subsample test, add-unobserved-confounder test.
19. IIIa. Adding new confounders
Add a variable U that causes both Cause and Outcome: X = X′ + U, Y = Y′ + U.
1. U is randomly generated.
• Rerun the analysis; expect no change in the causal effect.
2. U is generated to have a correlation 𝜌 with Cause and Outcome.
• Assess sensitivity: how fast does the new causal estimate go to zero?
[Diagram: causal graph with U added as a common cause of Cause and Outcome]
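The second variant can be sketched directly: inject a simulated confounder U with strength ρ into both the cause (X = X′ + U) and the outcome, and track how an unadjusted regression estimate drifts from the truth as ρ grows. A toy illustration, not DoWhy's refuter:

```python
# Sensitivity of a naive regression slope to a simulated confounder U.
import random
random.seed(4)

n = 100_000
base_cause = [random.gauss(0, 1) for _ in range(n)]   # X' before confounding
noise = [random.gauss(0, 1) for _ in range(n)]

def naive_slope(x, y):
    """Unadjusted OLS slope of y on x (cov(x, y) / var(x))."""
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var = sum((a - mx) ** 2 for a in x) / n
    return cov / var

# True structural effect of X on Y is 2.0; U enters both X and Y.
for rho in (0.0, 0.5, 1.0):
    u = [random.gauss(0, 1) for _ in range(n)]
    x = [b + rho * ui for b, ui in zip(base_cause, u)]   # X = X' + rho*U
    y = [2.0 * xi + rho * ui + e
         for xi, ui, e in zip(x, u, noise)]              # U also shifts Y
    print(f"rho={rho}: estimated effect = {naive_slope(x, y):.3f} (truth 2.0)")
```

At ρ = 0 the naive estimate is unbiased; as ρ grows, the estimate drifts, which is exactly the sensitivity this refutation step measures.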
20. IIIb. Placebo (“A/A”) test
Simulate a world where Cause does not affect Outcome: replace Cause with a randomly generated variable in the dataset.
• Rerun the analysis; expect the causal effect to go to zero.
[Diagram: causal graph with Cause replaced by a random placebo variable]
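For a simple difference-in-means estimator, the placebo test takes only a few lines. This is a toy illustration; DoWhy ships this idea as a built-in refuter:

```python
# Placebo ("A/A") test: a random fake cause should show ~zero effect.
import random
random.seed(2)

# Data with a real effect of cause on outcome (+2.0).
cause = [random.random() < 0.5 for _ in range(50_000)]
outcome = [(2.0 if c else 0.0) + random.gauss(0, 1) for c in cause]

def diff_in_means(c, y):
    t = [yi for ci, yi in zip(c, y) if ci]
    u = [yi for ci, yi in zip(c, y) if not ci]
    return sum(t) / len(t) - sum(u) / len(u)

real = diff_in_means(cause, outcome)

# Placebo: replace the cause with an independent random variable.
placebo_cause = [random.random() < 0.5 for _ in cause]
placebo = diff_in_means(placebo_cause, outcome)

print(f"real effect: {real:.3f}   placebo effect: {placebo:.3f}")
```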
21. IIIc. Subsampling test
Can also test statistical robustness, e.g., remove a random subset of the data.
• Rerun the analysis; expect no change in the causal effect.
[Diagram: input data split into random subsets]
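The subsampling test for the same kind of toy estimator: re-run the analysis on a random 80% of the rows and expect a stable estimate. An illustrative sketch, not DoWhy's refuter:

```python
# Subsampling test: the estimate should be stable on a random subset.
import random
random.seed(3)

cause = [random.random() < 0.5 for _ in range(50_000)]
outcome = [(1.5 if c else 0.0) + random.gauss(0, 1) for c in cause]

def diff_in_means(pairs):
    t = [y for c, y in pairs if c]
    u = [y for c, y in pairs if not c]
    return sum(t) / len(t) - sum(u) / len(u)

full = diff_in_means(list(zip(cause, outcome)))

# Drop a random 20% of rows and re-run the same analysis.
subset = random.sample(list(zip(cause, outcome)), k=40_000)
sub = diff_in_means(subset)

print(f"full-data estimate: {full:.3f}   80%-subset estimate: {sub:.3f}")
```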
22. Summary: DoWhy, an end-to-end library for causal inference
Test assumptions as far as possible
• Make assumptions explicit through a Bayesian network.
• Test assumptions from observed data [Sharma 2018, arXiv].
Assess sensitivity to untested assumptions
• When tests are inconclusive, assess the sensitivity of the causal estimate to violations of assumptions [Sharma et al. 2018, Annals of Applied Statistics].
Unify best practices from different scientific fields
• Unify different frameworks from computer science and statistics (“graphs and potential outcomes”) [Kiciman & Sharma 2018, KDD Tutorial].
23. Thank you!
Resources
• KDD 2018 tutorial on causal inference
• https://causalinference.gitlab.io/kdd-tutorial/
• Upcoming book on “Causal Reasoning: Fundamentals and ML Applications”
• https://causalinference.gitlab.io/
• DoWhy
• Code: https://github.com/microsoft/dowhy
• Docs: https://microsoft.github.io/dowhy/
Amit Sharma
Microsoft Research India
@amt_shrma
Editor's Notes
So this tutorial is going to be about how to get better at it. Suppose a simple world: if you believe everything relevant is captured, go for prediction and you should be fine. But if not, you need to understand the causal factors. And by the way, we will also learn that regression is one of the worst methods, because the world is almost never linear.
One of the reasons is that while we have got pretty good at processing terabytes of data, causal inference methods haven’t caught up.
The example I like to think of is fundamentally the difference between prediction and causation, as I describe in a recent paper in Science. Suppose there are two variables and, for simplicity, you believe that the true model of the world is y = βx + η. But we know nothing about the error η; it may not even be independent.
Prediction is what big data is often used for. So if we have a big dataset with two variables X and Y, we can simply feed it to a machine learning algorithm to get reasonably accurate predictions for y. However, often the most interesting questions are of a causal nature (does X cause Y?), and here it is not entirely clear how to use big data:
So let’s first look at how someone at Office or a biomedical scientist would have run a causal analysis before DoWhy.
Finally *think of* how to check robustness of estimate.
What DoWhy does is implement the hard parts of all three steps, leading to an easy interface. The user still has to provide domain knowledge; that’s a harder problem we don’t solve yet, but we want to.
Once it has done that, the second step is straightforward.
As an example of the first two steps
While both of these are prior work, they are actually parts of two different frameworks that are not usually used together. DoWhy combines them.
All this is good, but..
Test the sensitivity of the estimate as causal assumptions are violated.
Test like a scientific theory
And how is DoWhy able to implement the full workflow of causal inference? DoWhy considers assumptions first-class citizens. A user is encouraged to think more about their domain assumptions than about the methods. And at the backend, DoWhy uses our recent research to test those assumptions as far as possible.
Causal inference methods depend critically on assumptions
Vast, contradictory literature
Assumptions are often not explicit, masked in “statistical assumptions”
What happens when the method’s assumptions fail?
Causal analysis restricted to experts in causal inference or statistics