Identifying causal effects is an integral part of scientific inquiry, spanning a wide range of questions such as understanding behavior in online systems, the effects of social policies, or risk factors for disease. In the absence of a randomized experiment, however, traditional methods such as matching or instrumental variables fail to provide robust estimates because they depend on strong assumptions that typically go untested.
My research shows that many of these strong assumptions are testable. This leads to a data mining framework for causal inference from observational data: instead of relying on untestable assumptions, we develop tests for valid experiment-like data---a "natural" experiment---and estimate causal effects only from subsets of data that pass those tests. Two such methods are presented. The first utilizes auxiliary data from large-scale systems to automate the search for natural experiments. Applying it to estimate the additional activity caused by Amazon's recommendation system, I find over 20,000 natural experiments, an order of magnitude more than in past work. These experiments indicate that less than half of the click-throughs typically attributed to the recommendation system are causal; the rest would have happened anyway. The second method proposes a general Bayesian test that can be used for validating natural experiments in any dataset. For instance, I find that a majority of the natural experiments used in recent studies in a premier economics journal are likely invalid. More generally, the proposed framework presents a viable way of doing causal inference in large-scale datasets with minimal assumptions.
Causal data mining: Identifying causal effects at scale
1. Causal data mining: Identifying causal effects at scale
AMIT SHARMA
Postdoctoral Researcher, Microsoft Research New York
http://www.amitsharma.in
@amt_shrma
5. Distinguishing between personal preference and homophily in online activity feeds. Sharma and Cosley (2016).
Studying and modeling the effect of social explanations in recommender systems. Sharma and Cosley (2013).
Amit and Dan like this.
6. Averaging Gone Wrong: Using Time-Aware Analyses to Better Understand Behavior. Barbosa, Cosley, Sharma, Cesar (2016).
Auditing search engines for differential satisfaction across demographics. Mehrotra, Anderson, Diaz, Sharma, Wallach (2016).
51. [Diagram] All visits to a recommended product decompose into recommender visits and direct visits (search visits and direct browsing). Direct visits serve as the auxiliary outcome: a proxy for unobserved demand.
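The auxiliary-outcome idea can be sketched in code. This is a simplified stand-in, not the published method: the actual split-door test is Bayesian, while here independence between the focal product's traffic and direct visits to the recommended product is checked with a plain correlation threshold on synthetic data. The function name, the threshold of 0.1, and all series are illustrative assumptions.

```python
import numpy as np

def passes_split_door(focal_visits, direct_visits, threshold=0.1):
    """Simplified split-door check: direct visits to the recommended
    product should be (approximately) independent of the focal product's
    traffic. A correlation threshold stands in for the Bayesian test."""
    r = np.corrcoef(focal_visits, direct_visits)[0, 1]
    return abs(r) < threshold

rng = np.random.default_rng(1)
n = 1000
demand = rng.poisson(20, n).astype(float)   # shared, unobserved demand shock

# Pair A: direct visits driven by the same demand -> confounded, should fail
focal_a = demand + rng.normal(0, 1, n)
direct_a = demand + rng.normal(0, 1, n)

# Pair B: direct visits independent of focal traffic -> a natural experiment
focal_b = demand + rng.normal(0, 1, n)
direct_b = rng.poisson(20, n).astype(float)

print(passes_split_door(focal_a, direct_a))  # False: confounded pair rejected
print(passes_split_door(focal_b, direct_b))  # True: pair kept for estimation
```

Only product pairs that pass such a check would be retained, and the causal click-through effect estimated from that subset.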
63. Recreating sequence of visits: Log data

Timestamp | URL | Action
2014-01-20 09:04:10 | http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=Cormac%20McCarthy | User searches for Cormac McCarthy
2014-01-20 09:04:15 | http://www.amazon.com/dp/0812984250/ref=sr_1_2 | User clicks on the second search result
2014-01-20 09:05:01 | http://www.amazon.com/dp/1573225797/ref=pd_sim_b_1 | User clicks on the first recommendation
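Recreating visit sequences from such logs amounts to classifying each URL by its `ref` tag. A minimal sketch, assuming only the tag conventions visible in the sample above (`nb_sb`/`sr_` for search, `pd_sim` for similar-item recommendations); real Amazon logs use many more tags, and the category names are my own:

```python
import re
from urllib.parse import urlparse

def classify_visit(url):
    """Classify a logged Amazon URL by its 'ref' tag."""
    m = re.search(r"/ref=([^/?]+)", url)
    tag = m.group(1) if m else ""
    if urlparse(url).path.startswith("/s/") or tag.startswith("nb_sb"):
        return "search"                 # search results page
    if tag.startswith("sr_"):
        return "search_click"           # click on a search result
    if tag.startswith("pd_sim"):
        return "recommendation_click"   # click on a similar-item recommendation
    return "direct"                     # everything else: direct browsing

for url in [
    "http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=Cormac%20McCarthy",
    "http://www.amazon.com/dp/0812984250/ref=sr_1_2",
    "http://www.amazon.com/dp/1573225797/ref=pd_sim_b_1",
]:
    print(classify_visit(url))  # search, search_click, recommendation_click
```

Grouping classified visits by user and sorting by timestamp then yields the visit sequences shown in the table.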
93. Denominator (Invalid-IV): a closed-form solution, derived using properties of the Dirichlet and hyper-Dirichlet distributions (via the Laplace transform).
Numerator (Valid-IV): no closed-form solution exists; approximated with Monte Carlo methods (annealed importance sampling).
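The annealed importance sampling used for the numerator can be illustrated on a toy problem. This is a generic 1-D sketch of AIS for approximating an intractable normalizing constant, not the hyper-Dirichlet computation from the test itself; the Gaussian prior and likelihood (and all parameter values) are assumptions chosen so the true answer is known.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: prior p0 = N(0, 1), unnormalized likelihood
# L(x) = exp(-(x - 2)^2 / (2 S^2)). The target quantity is
# Z = integral of p0(x) * L(x) dx, which here has a closed form.
S = 0.5

def log_prior(x):
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_lik(x):
    return -0.5 * ((x - 2.0) / S) ** 2

def ais_estimate(n_particles=2000, n_temps=30, mh_steps=5, step=0.5):
    """Annealed importance sampling from p0 toward p0 * L."""
    betas = np.linspace(0.0, 1.0, n_temps)
    x = rng.normal(size=n_particles)          # exact draws from the prior
    log_w = np.zeros(n_particles)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        log_w += (b - b_prev) * log_lik(x)    # incremental importance weight
        for _ in range(mh_steps):             # Metropolis moves at temperature b
            prop = x + step * rng.normal(size=n_particles)
            log_acc = (log_prior(prop) + b * log_lik(prop)
                       - log_prior(x) - b * log_lik(x))
            x = np.where(np.log(rng.uniform(size=n_particles)) < log_acc, prop, x)
    m = log_w.max()                           # stabilized mean of the weights
    return np.exp(m) * np.mean(np.exp(log_w - m))

z_hat = ais_estimate()
z_true = S / np.sqrt(1 + S**2) * np.exp(-2.0**2 / (2 * (1 + S**2)))
print(z_hat, z_true)  # z_hat should land close to z_true (about 0.09)
```

In the actual test, the Gaussian target would be replaced by the Valid-IV marginal likelihood, and the AIS weights average to the numerator of the Bayes factor.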
95. Studies from American Economic Review | Validity Ratio
Effect of Mexican immigration on crime in United States (2015) | 0.07
Effect of subsidy manipulation on Medicare premiums (2015) | 1.02
Effect of credit supply on housing prices (2015) | 0.01
Effect of Chinese import competition on local labor markets (2013) | 0.3
Effect of rural electrification on employment in South Africa (2011) | 3.6
Expt: National Job Training Partnership Act (JTPA) Study (2002) | 3.4
101. http://www.amitsharma.in

1. Hofman, Sharma, and Watts (2017). Prediction and explanation in social systems. Science, 355.6324.
2. Sharma (2016). Necessary and probably sufficient test for finding instrumental variables. Working paper.
3. Sharma, Hofman, and Watts (2016). Split-door criterion for causal identification: An algorithm for finding natural experiments. Under review at Annals of Applied Statistics (AOAS).
4. Sharma, Hofman, and Watts (2015). Estimating the causal impact of recommendation systems from observational data. In Proceedings of the 16th ACM Conference on Economics and Computation.