As computing systems intervene more frequently and more actively in societally critical domains such as healthcare, education, and governance, it becomes critical to correctly predict and understand the causal effects of these interventions. Without an A/B test, conventional machine learning methods, built on pattern recognition and correlational analysis, are insufficient for causal reasoning.
Much as machine learning libraries have done for prediction, the "DoWhy" Python library aims to spark causal thinking and analysis. DoWhy provides a unified interface for causal inference methods and automatically tests many assumptions, making inference accessible to non-experts.
For a quick introduction to causal inference, check out amit-sharma/causal-inference-tutorial. We also gave a more comprehensive tutorial at the ACM Knowledge Discovery and Data Mining (KDD 2018) conference: causalinference.gitlab.io/kdd-tutorial.
1. DoWhy: An end-to-end library for causal inference
Amit Sharma (@amt_shrma), Emre Kiciman (@emrek)
Microsoft Research
A big thanks to Adam Kelleher, Tanmay Kulkarni, and many other open-source contributors!
https://github.com/microsoft/dowhy
3. The fundamental problem of causal inference
• Causal inference concerns estimation over data distributions different from the training distribution
• What if 𝑥 is changed to a different value?
• How do the results change for a different sample of people?
• What if a particular algorithm is changed in a system?
• Often, no data is available for that distribution
• Cross-validation is not possible
• 𝛽 is not observed, unlike 𝑦.
• 𝑦 is observed for the training domain, but not for a new domain.
4. Causal inference is estimation over data distributions different from the training distribution. Often, no data is available for that distribution. This raises two challenges:
1. Assumptions
2. Evaluation
5. 1. Assumptions drive causal inference
• Causal inference methods depend on untestable assumptions.
• Even with large-scale data, the final estimate can be heavily sensitive to those assumptions.
• It is important to transparently communicate those assumptions.
[Diagram: causal graph relating "Demand for items", the "Recommendation System", "Item 1", and "Item 2"]
6. 2. Causal estimates are hard to validate
• Cannot compare two causal estimates on the same dataset
• Q: Is algorithm B better than the production algorithm A? Cannot tell without doing an A/B test.
• Let alone compare two estimates from two different datasets
• Everyone prefers their own favorite methods
• Need objective metrics to validate causal estimates
7. "The effect of online advertising on sales is 20% (std. error = 5)."
• What assumptions went into the analysis?
• How would it change if one of the assumptions were incorrect?
• Is it robust to seasonal shifts in behavior?
• What is the expected error in this estimate?
8. We built DoWhy to make assumptions front-and-center of any causal analysis.
- Transparent declaration of assumptions
- Evaluation of those assumptions, to the extent possible
An end-to-end platform for doing causal inference
10. Formulate correct estimand: check the data against properties implied by the model.
Estimate causal effect: use a suitable method to estimate the effect.
Check robustness: refute the obtained estimate through multiple tests.
Input: data ⟨cause, outcome, other variables⟩ plus domain knowledge; output: the causal effect (DoWhy).
[Diagram: causal graph with nodes Cause, Outcome, v1, v2, v3, v5, w]
13. DoWhy encodes the four steps of causal reasoning
1. Modeling: Create a causal graph to encode assumptions
2. Identification: Formulate what to estimate
3. Estimation: Compute the estimate
4. Refutation: Validate the assumptions
14. I. Identification: Formulate correct estimand
1. Constructs a causal Bayesian network from user-provided knowledge.
• Checks whether the data satisfies the Bayesian network’s assumptions.
2. Tries out different techniques for identifying a causal effect and checks which ones are feasible.
• Back-door criterion [Pearl 2000]
• Instrumental variables [Wright 1928, Angrist and Pischke 1991]
3. Provides “what to estimate”: a target estimand for the causal effect.
15. II. Estimation: Compute the causal effect
Uses well-known techniques for causal inference. Based on the estimand from the identification step, DoWhy implements multiple methods, including:
• Stratification
• Propensity score matching
• Inverse propensity weighting
• Natural experiments
• Conditional treatment effect estimators from the microsoft/EconML library
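To make one of the listed estimators concrete, here is a from-scratch sketch of inverse propensity weighting on simulated data with a single binary confounder. This is a toy illustration of the idea, not DoWhy's implementation; all variable names are made up:

```python
# Toy inverse-propensity-weighting (IPW) estimate of a causal effect.
import random
random.seed(1)

# One binary confounder v drives both the treatment c and the outcome y.
# True causal effect of c on y is 1.0.
rows = []
for _ in range(200_000):
    v = random.random() < 0.5
    c = random.random() < (0.8 if v else 0.2)
    y = (1.0 if c else 0.0) + (2.0 if v else 0.0) + random.gauss(0, 0.1)
    rows.append((v, c, y))

# Estimate the propensity score P(c=1 | v) from the data itself.
def propensity(v_val):
    sub = [c for v, c, _ in rows if v == v_val]
    return sum(sub) / len(sub)

ps = {True: propensity(True), False: propensity(False)}

# IPW estimate of E[Y | do(c=1)] - E[Y | do(c=0)].
n = len(rows)
treated = sum(y * c / ps[v] for v, c, y in rows) / n
control = sum(y * (1 - c) / (1 - ps[v]) for v, c, y in rows) / n
print(f"IPW effect estimate: {treated - control:.3f} (truth 1.0)")
```

Reweighting by the inverse propensity makes the treated and control groups comparable with respect to v, so the weighted difference approximates the causal effect.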
16. Example: the first two steps
Input: causal graph over ⟨Cause, Out, v1, v2, v3, v4, v5, w⟩
[Diagram: causal graph with nodes Cause, Outcome, v1, v2, v3, v5, w]
I. Formulate estimand: find variables that “d-separate” cause and outcome.
Back-door condition: Out ⫫ Cause | v1, v2 in G without the edge Cause → Out.
Do-calculus (Pearl 2001) [computer science]:
P(Out | do(c)) = Σ_{vᵢ} P(Out | c, vᵢ) P(vᵢ)
Potential outcomes (Rubin 1984) [statistics]:
E[Out | c = 1, vᵢ] − E[Out | c = 0, vᵢ]
II. Estimate causal effect: estimate as the observed effect conditioned on the back-door variables.
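The back-door formula can be checked numerically by simple stratification. A toy sketch (not DoWhy code) with one binary confounder v: the naive correlational difference is biased by confounding, while summing the stratum-wise differences weighted by P(v) recovers the true effect:

```python
# Compute P(Out|do(c)) = sum_v P(Out | c, v) P(v) by stratification.
import random
random.seed(0)

# Simulate: v confounds cause c and outcome y; true effect of c on y is +0.3.
data = []
for _ in range(200_000):
    v = random.random() < 0.5
    c = random.random() < (0.8 if v else 0.2)      # v makes treatment likely
    y = random.random() < (0.5 if v else 0.1) + (0.3 if c else 0.0)
    data.append((v, c, y))

def p_y_given(c_val, v_val):
    rows = [y for v, c, y in data if c == c_val and v == v_val]
    return sum(rows) / len(rows)

p_v1 = sum(v for v, _, _ in data) / len(data)

# Back-door adjusted effect: sum over v of [P(y|c=1,v) - P(y|c=0,v)] P(v).
adjusted = sum((p_y_given(1, v) - p_y_given(0, v)) * p
               for v, p in [(1, p_v1), (0, 1 - p_v1)])

# Naive (correlational) effect ignores v and is biased by confounding.
def p_y_naive(c_val):
    rows = [y for _, c, y in data if c == c_val]
    return sum(rows) / len(rows)
naive = p_y_naive(1) - p_y_naive(0)

print(f"adjusted={adjusted:.3f}  naive={naive:.3f}  truth=0.300")
```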
17. What if the user forgot to add an important variable to the graph, or did not even know about a confounder?
18. III. Refutation/Validation: Test robustness of the obtained estimate
[Diagrams: the input data and causal graph, shown alongside the same graph with an added unobserved confounder U]
Many “automatic” validation tests: dummy-outcome test, placebo test, subsample test, add-unobserved-confounder test.
19. IIIa. Adding new confounders
Add a variable U that causes both Cause and Outcome: X = X′ + U, Y = Y′ + U.
1. U is randomly generated.
• Rerun the analysis; expect no change in the causal effect.
2. U is generated to have a correlation 𝜌 with Cause and Outcome.
• Assess sensitivity: how fast does the new causal estimate go to zero?
[Diagram: causal graph with U added as a common cause of Cause and Outcome]
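The second variant can be sketched directly: inject a simulated confounder U with strength ρ into both the cause (X = X′ + U) and the outcome, and track how an unadjusted regression estimate drifts from the truth as ρ grows. A toy illustration, not DoWhy's refuter:

```python
# Sensitivity of a naive regression slope to a simulated confounder U.
import random
random.seed(4)

n = 100_000
base_cause = [random.gauss(0, 1) for _ in range(n)]   # X' before confounding
noise = [random.gauss(0, 1) for _ in range(n)]

def naive_slope(x, y):
    """Unadjusted OLS slope of y on x (cov(x, y) / var(x))."""
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var = sum((a - mx) ** 2 for a in x) / n
    return cov / var

# True structural effect of X on Y is 2.0; U enters both X and Y.
for rho in (0.0, 0.5, 1.0):
    u = [random.gauss(0, 1) for _ in range(n)]
    x = [b + rho * ui for b, ui in zip(base_cause, u)]   # X = X' + rho*U
    y = [2.0 * xi + rho * ui + e
         for xi, ui, e in zip(x, u, noise)]              # U also shifts Y
    print(f"rho={rho}: estimated effect = {naive_slope(x, y):.3f} (truth 2.0)")
```

At ρ = 0 the naive estimate is unbiased; as ρ grows, the estimate drifts, which is exactly the sensitivity this refutation step measures.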
20. IIIb. Placebo (“A/A”) test
Simulate a world where Cause does not affect Outcome: replace Cause with a randomly generated variable in the dataset.
• Rerun the analysis; expect the causal effect to go to zero.
[Diagram: causal graph with Cause replaced by a random placebo variable]
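For a simple difference-in-means estimator, the placebo test takes only a few lines. This is a toy illustration; DoWhy ships this idea as a built-in refuter:

```python
# Placebo ("A/A") test: a random fake cause should show ~zero effect.
import random
random.seed(2)

# Data with a real effect of cause on outcome (+2.0).
cause = [random.random() < 0.5 for _ in range(50_000)]
outcome = [(2.0 if c else 0.0) + random.gauss(0, 1) for c in cause]

def diff_in_means(c, y):
    t = [yi for ci, yi in zip(c, y) if ci]
    u = [yi for ci, yi in zip(c, y) if not ci]
    return sum(t) / len(t) - sum(u) / len(u)

real = diff_in_means(cause, outcome)

# Placebo: replace the cause with an independent random variable.
placebo_cause = [random.random() < 0.5 for _ in cause]
placebo = diff_in_means(placebo_cause, outcome)

print(f"real effect: {real:.3f}   placebo effect: {placebo:.3f}")
```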
21. IIIc. Subsampling test
Can also test statistical robustness, e.g., remove a random subset of the data.
• Rerun the analysis; expect no change in the causal effect.
[Diagram: input data split into random subsets]
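The subsampling test for the same kind of toy estimator: re-run the analysis on a random 80% of the rows and expect a stable estimate. An illustrative sketch, not DoWhy's refuter:

```python
# Subsampling test: the estimate should be stable on a random subset.
import random
random.seed(3)

cause = [random.random() < 0.5 for _ in range(50_000)]
outcome = [(1.5 if c else 0.0) + random.gauss(0, 1) for c in cause]

def diff_in_means(pairs):
    t = [y for c, y in pairs if c]
    u = [y for c, y in pairs if not c]
    return sum(t) / len(t) - sum(u) / len(u)

full = diff_in_means(list(zip(cause, outcome)))

# Drop a random 20% of rows and re-run the same analysis.
subset = random.sample(list(zip(cause, outcome)), k=40_000)
sub = diff_in_means(subset)

print(f"full-data estimate: {full:.3f}   80%-subset estimate: {sub:.3f}")
```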
22. Summary: DoWhy, an end-to-end library for causal inference
Test assumptions as far as possible
• Make assumptions explicit through a Bayesian network.
• Test assumptions from observed data [Sharma 2018, arXiv].
Assess sensitivity to untested assumptions
• When tests are inconclusive, assess the sensitivity of the causal estimate to violations of assumptions [Sharma et al. 2018, Annals of Applied Statistics].
Unify best practices from different scientific fields
• Unify different frameworks from computer science and statistics (“graphs and potential outcomes”) [Kiciman & Sharma 2018, KDD Tutorial].
23. Thank you!
Resources
• KDD 2018 tutorial on causal inference
• https://causalinference.gitlab.io/kdd-tutorial/
• Upcoming book on “Causal Reasoning: Fundamentals and ML Applications”
• https://causalinference.gitlab.io/
• DoWhy
• Code: https://github.com/microsoft/dowhy
• Docs: https://microsoft.github.io/dowhy/
Amit Sharma
Microsoft Research India
@amt_shrma
Editor's Notes
So this tutorial is going to be about how to get better at it. Suppose a simple world: if you believe everything relevant is captured, go for prediction and you should be fine. But if not, you need to understand the causal factors. And by the way, we will also learn that regression is one of the worst methods, because the world is almost never linear.
One of the reasons is that while we have got pretty good at processing terabytes of data, causal inference methods haven’t caught up.
The example I like to think of is fundamentally the difference between prediction and causation, as I describe in a recent paper in Science. Suppose there are two variables and, for simplicity, you believe that the true model of the world is y = βx + η. But we know nothing about the error η; it may not even be independent.
Prediction is what big data is often used for. So if we have a big dataset with two variables X and Y, we can simply feed it to a machine learning algorithm to get reasonably accurate predictions for y. However, often the most interesting questions are of a causal nature (does X cause Y?), and here it is not entirely clear how to use big data:
So let’s first look at how someone at Office or a biomedical scientist would have run a causal analysis before DoWhy.
Finally *think of* how to check robustness of estimate.
What DoWhy does is implement the hard parts of all three steps, leading to an easy interface. The user still has to provide domain knowledge; that’s a harder problem we don’t solve yet, but we want to.
Once it has done that, the second step is straightforward.
As an example of the first two steps
While both of these are prior work, they are actually parts of two different frameworks that are not usually used together. DoWhy combines them.
All this is good, but..
Test the sensitivity of the estimate as causal assumptions are violated.
Test like a scientific theory
And how is DoWhy able to implement the full workflow of causal inference? DoWhy considers assumptions first-class citizens. A user is encouraged to think more about their domain assumptions than about the methods. And at the backend, DoWhy uses our recent research to test those assumptions as far as possible.
Causal inference methods depend critically on assumptions
Vast, contradictory literature
Assumptions are often not explicit, masked in “statistical assumptions”
What happens when the method’s assumptions fail?
Causal analysis restricted to experts in causal inference or statistics