causal inference:
a friendly introduction
Alex Dimakis
UT Austin
based on joint work with
Murat Kocaoglu,
Karthik Shanmugam,
Sriram Vishwanath,
Babak Hassibi
Overview
• What is causal inference
• Interventions and how to design them
• What to do if you cannot intervene
Disclaimer
• There are many frameworks of causality
• For time-series: Granger causality
• Potential Outcomes / Counterfactuals framework
(Imbens & Rubin)
• Pearl’s structural equation models
• aka Causal Graph models
• Additive models, Dawid’s decision-oriented approach, Information
Geometry, many others…
Overview
• What is causal inference
• Directed graphical models and conditional independence
• That’s not it.
• Interventions and how to design them
• What to do if you cannot intervene
Independence of random variables
S: Heavy smoker | C: Lung cancer before 60
       0        |        0
       1        |        1
       0        |        1
       1 ….     |        1 ….

Observational data

How to check if S is independent from C?
Joint Pdf and Independence

S: Heavy smoker | C: Lung cancer before 60
       0        |        0
       1        |        1
       0        |        1
       1 ….     |        1 ….

Observational data

Joint pdf:
      S=0     S=1
C=0   30/100  10/100   (marginal P(C=0) = 0.4)
C=1   20/100  40/100   (marginal P(C=1) = 0.6)
      P(S=0) = 0.5, P(S=1) = 0.5

How to check if S is independent from C?
Compare P(S,C) with P(S)P(C).
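To make the check concrete, here is a minimal numpy sketch (using the 2×2 joint pdf from this slide; the variable names are mine) comparing the joint against the product of its marginals:

```python
import numpy as np

# Joint pdf from the slide: rows are C=0/1, columns are S=0/1.
joint = np.array([[0.30, 0.10],
                  [0.20, 0.40]])  # P(C=c, S=s)

p_c = joint.sum(axis=1)   # marginal P(C) = [0.4, 0.6]
p_s = joint.sum(axis=0)   # marginal P(S) = [0.5, 0.5]

product = np.outer(p_c, p_s)          # what the joint would be under independence
print(product)                        # [[0.2 0.2]
                                      #  [0.3 0.3]]
print(np.allclose(joint, product))    # False -> S and C are dependent
```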
Directed graphical models

A → B → C

A B C
0 1 0
1 1 1
… … …

Given data on A,B,C we can estimate the joint pdf p(A,B,C)
and see if it factorizes as
P(A,B,C) = P(A) P(B|A) P(C|B),
i.e. has some conditional independencies.

A directed graphical model describes all distributions that
have a given set of conditional independencies.
This one: A ⫫ C | B
P(C|A,B) = P(C|B)
P(A,C|B) = P(A|B) P(C|B)

• Learning a directed graphical model = learning all
  conditional independencies in the data.
• Learning a causal graph is not learning a directed
  graphical model.
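As a quick sanity check of the chain A → B → C, here is a simulation sketch (the binary conditional probabilities are hypothetical, chosen only for illustration) showing that A ⫫ C | B holds in data generated this way:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Simulate the chain A -> B -> C with hypothetical binary CPDs.
A = rng.random(n) < 0.6
B = np.where(A, rng.random(n) < 0.8, rng.random(n) < 0.3)
C = np.where(B, rng.random(n) < 0.7, rng.random(n) < 0.1)

# A ⫫ C | B means P(C=1 | A, B) does not depend on A within each value of B.
for b in (0, 1):
    for a in (0, 1):
        mask = (A == a) & (B == b)
        print(f"P(C=1 | A={a}, B={b}) ~ {C[mask].mean():.3f}")
# The two estimates agree within each value of B (up to sampling noise),
# even though A and C are marginally dependent.
```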
Smoking causes cancer

S: Heavy smoker | C: Lung cancer before 60
       0        |        0
       1        |        1
       0        |        1
       1 ….     |        1 ….

Observational data

Joint pdf:
      S=0     S=1
C=0   30/100  10/100
C=1   20/100  40/100
Causality = mechanism. Universe 1: S → C

C = F(S,E), E ⫫ S

Pr(S):
S=0  0.5
S=1  0.5

Pr(C|S):
      S=0    S=1
C=0   30/50  10/50
C=1   20/50  40/50

Together: Pr(S,C) = Pr(S) Pr(C|S)
Universe 2: C → S

S = F'(C,E'), E' ⫫ C

Pr(C):
C=0  0.4
C=1  0.6

Pr(S|C):
      C=0                   C=1
S=0   30/(100·0.4) = 0.75   20/(100·0.6) = 0.33
S=1   10/(100·0.4) = 0.25   40/(100·0.6) = 0.66

Together: Pr(S,C) = Pr(C) Pr(S|C), the same joint pdf as in Universe 1.
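Both universes can be simulated directly from the tables above, and they produce (approximately) the same observed joint pdf. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# Universe 1: S -> C.  Pr(S=1)=0.5, Pr(C=1|S=0)=20/50, Pr(C=1|S=1)=40/50.
S1 = (rng.random(n) < 0.5).astype(int)
C1 = (rng.random(n) < np.where(S1 == 1, 0.8, 0.4)).astype(int)

# Universe 2: C -> S.  Pr(C=1)=0.6, Pr(S=1|C=0)=0.25, Pr(S=1|C=1)=0.66...
C2 = (rng.random(n) < 0.6).astype(int)
S2 = (rng.random(n) < np.where(C2 == 1, 2/3, 0.25)).astype(int)

def joint(S, C):
    return np.array([[np.mean((C == c) & (S == s)) for s in (0, 1)]
                     for c in (0, 1)])

print(joint(S1, C1))   # both are ~ [[0.30, 0.10],
print(joint(S2, C2))   #             [0.20, 0.40]]
```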
How to find the causal direction?

S → C:  Pr(S,C) = Pr(S) Pr(C|S),   C = F(S,E),   E ⫫ S
C → S:  Pr(S,C) = Pr(C) Pr(S|C),   S = F'(C,E'), E' ⫫ C

• It is impossible to find the true causal direction from observational data
  for two random variables (unless we make more assumptions).
• You need interventions, i.e. messing with the mechanism.
• For more than two r.v.'s there is a rich theory and some directions can
  be learned without interventions. (Spirtes et al.)
Overview
• What is causal inference
• Directed graphical models and conditional independence
• That’s not it.
• Interventions and how to design them
• What to do if you cannot intervene
Intervention: force people to smoke

• Flip a coin and force each person to smoke or not, with prob ½.
• In Universe 1 (i.e. under S → C),
  the new joint pdf stays the same as before the intervention.

Pr(S):
S=0  0.5
S=1  0.5

Pr(C|S):
      S=0    S=1
C=0   30/50  10/50
C=1   20/50  40/50
Intervention: force people to smoke

• Flip a coin and force each person to smoke or not, with prob ½.
• In Universe 2 (under C → S),
  S and C will become independent after the intervention.
• So check correlation on data after the intervention and find the true direction!

Pr(C):
C=0  0.4
C=1  0.6

Pr(S|C):
      C=0                   C=1
S=0   30/(100·0.4) = 0.75   20/(100·0.6) = 0.33
S=1   10/(100·0.4) = 0.25   40/(100·0.6) = 0.66
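A simulation sketch of the coin-flip intervention do(S), using the mechanisms of the two universes above; the surviving (or vanishing) dependence between S and C after the intervention reveals the true direction:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
forced_S = (rng.random(n) < 0.5).astype(int)   # the coin-flip intervention do(S)

# Universe 1 (S -> C): the mechanism C = F(S,E) still fires on the forced S.
C_u1 = (rng.random(n) < np.where(forced_S == 1, 0.8, 0.4)).astype(int)

# Universe 2 (C -> S): nature draws C first; forcing S cuts the C -> S arrow.
C_u2 = (rng.random(n) < 0.6).astype(int)

for name, C in (("Universe 1", C_u1), ("Universe 2", C_u2)):
    dep = abs(np.corrcoef(forced_S, C)[0, 1])
    print(f"{name}: |corr(S, C)| after do(S) ~ {dep:.3f}")
# Universe 1 stays correlated (~0.41); Universe 2 drops to ~0.
```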
Who does interventions like that?
"You're giving dying people sugar pills?"
More variables

(figure: the true causal DAG on S1,…,S7, and its skeleton)

From observational data we can learn conditional independencies
and obtain the skeleton (but we lose the directions).
PC Algorithm (Spirtes et al., Meek)

(figure: skeleton on S1,…,S7)

There are a few directions we can learn from observational data
(immoralities, Meek rules).

Spirtes, Glymour, Scheines 2001, PC Algorithm
C. Meek, 1995
Andersson, Madigan, Perlman, 1997
How interventions reveal directions

(figure: skeleton on S1,…,S7 with intervened set S = {S1,S2,S4})

We choose a subset S of the variables and intervene
(i.e. force random values).
The directions of all edges between S and Sᶜ are revealed.
Re-apply the PC algorithm + Meek rules to possibly learn a few more edges.
Learning Causal DAGs

(figure: skeleton on S1,…,S7)

Given a skeleton graph, how many interventions are needed to learn all directions?

• A-priori fixed set of interventions (non-adaptive):
  Theorem (Hauser & Buhlmann 2014): log(χ) interventions suffice
  (χ = chromatic number of the skeleton).
• Adaptive:
  (NIPS15): adaptivity does not help (in the worst case).
• Randomized adaptive:
  (Li, Vetta, NIPS14): loglog(n) interventions suffice with high
  probability for the complete skeleton.
  A good algorithm for general graphs?
Overview
• What is causal inference
• Interventions and how to design them
• What to do if you cannot intervene
• Make more assumptions
• compare on standard benchmark
Data-driven causality
• How to find causal direction without interventions
• Impossible for two variables. Possible under assumptions.
• Popular assumption Y= F(X) + E, (E ⫫ X)
(Additive models)(Shimizu et al., Hoyer et al., Peters et al. Chen et al., Mooij et al.)
• Entropic Causality: Use information theory for general data-
driven causality. Y= F(X,E), (E ⫫ X)
• (related work: Janzing, Mooij, Zhang, Lemeire: no additivity assumption, but also no noise: Y=F(X))
Conclusions
• Learning causal graphs with interventions is an on-going field of research
• Tetrad project (CMU)
• http://www.phil.cmu.edu/projects/tetrad/
• When time is present, more things can be done (Difference-in-Differences
  method, Granger, potential outcomes, etc.)
• Additive models and entropic causality can be used for data-driven
  causal inference.
Pointers
• Tuebingen Benchmark: https://webdav.tuebingen.mpg.de/cause-effect/
• http://www.phil.cmu.edu/projects/tetrad/
• https://github.com/mkocaoglu/Entropic-Causality
• P. Spirtes, C. Glymour and R. Scheines, Causation, Prediction, and Search. Bradford Books, 2001.
• Causality, J. Pearl. Cambridge University Press, 2009.
• Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, G. Imbens and D. Rubin
• CCD Summer Short Course 2016, CMU Center for Causal Discovery:
  https://www.youtube.com/watch?v=9yEYZURoE3Y
• K. Shanmugam, M. Kocaoglu, A.G. Dimakis, S. Vishwanath. Learning Causal Graphs with Small Interventions. NIPS 2015.
• Jonas Peters, Peter Buehlmann and Nicolai Meinshausen (2016). Causal inference using invariant prediction:
  identification and confidence intervals. Journal of the Royal Statistical Society, Series B.
• Frederick Eberhardt, Clark Glymour, and Richard Scheines. On the number of experiments sufficient and
  in the worst case necessary to identify all causal relations among n variables.
• Alain Hauser and Peter Buhlmann. Two optimal strategies for active learning of causal models from interventional data.
  International Journal of Approximate Reasoning, 55(4):926–939, 2014.
• Hoyer, Patrik O., et al. "Nonlinear causal discovery with additive noise models." Advances in Neural Information Processing Systems, 2009.
• Janzing, Dominik, et al. "Information-geometric approach to inferring causal directions." Artificial Intelligence 182 (2012).
• Peters, Jonas, Dominik Janzing, and Bernhard Scholkopf. "Causal inference on discrete data using additive noise models."
  IEEE Transactions on Pattern Analysis and Machine Intelligence 33.12 (2011).
fin
Learning Causal DAGs

Thm: log(χ) interventions suffice.
Proof:
1. Color the vertices (legal coloring).
2. Form a table with the binary representations of the colors:
   Red: 00, Green: 01, Blue: 10

   S1  0 0
   S2  0 1
   S3  1 0
   S4  0 1
   S5  1 0
   S6  0 1
   S7  1 0

3. Each intervention is indexed by a column of this table
   (e.g. Intervention 1 = the first bit).

For any edge, its two vertices have different colors, so their binary
representations differ in at least 1 bit. So for some intervention one
endpoint is in the intervened set and the other is not, and we learn that
edge's direction. QED.
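A minimal sketch of this construction in code: greedily color the skeleton, write each color in binary, and let intervention j be the vertices whose color has bit j set. The edge list below is illustrative only; the exact skeleton in the figure is not recoverable from the slide text:

```python
import math

def design_interventions(adj):
    """Given an undirected skeleton (dict: vertex -> set of neighbours),
    return a list of intervention sets that separates every edge.
    Sketch of the log(chi) construction from the slide."""
    # 1. Greedy legal colouring (highest-degree vertices first).
    colour = {}
    for v in sorted(adj, key=lambda v: -len(adj[v])):
        used = {colour[u] for u in adj[v] if u in colour}
        colour[v] = next(c for c in range(len(adj)) if c not in used)
    n_colours = max(colour.values()) + 1
    n_bits = max(1, math.ceil(math.log2(n_colours)))
    # 2. Intervention j = vertices whose colour has bit j equal to 1.
    return [{v for v in adj if (colour[v] >> j) & 1} for j in range(n_bits)]

# Hypothetical skeleton on S1..S7 (illustrative edges only).
edges = [("S1", "S2"), ("S1", "S3"), ("S2", "S4"), ("S3", "S4"),
         ("S4", "S5"), ("S4", "S6"), ("S6", "S7")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

for j, S in enumerate(design_interventions(adj)):
    print(f"intervention {j}: {sorted(S)}")
# Endpoints of every edge get different colours, hence differ in some bit,
# so some intervention contains exactly one endpoint of each edge.
```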
Learning Causal DAGs

(figure: skeleton on S1,…,S7)

On-going research on several problems:
• What if the size of the intervention sets is limited (NIPS 15)
• What if some variables cannot be intervened on
Major problem: Size of interventions

(figure: skeleton on S1,…,S7 with intervened set S = {S1,S2,S4})

We choose a subset S of the variables and intervene (i.e. force random values).
Question: If each intervention has size up to k, how many interventions do we need?

Eberhardt: A separating system on χ elements with weight k is sufficient
to produce a non-adaptive causal inference algorithm.
A separating system on n elements with weight k is a {0,1} matrix with n
distinct columns and each row having weight at most k.
Rényi, Katona, Wegener: bounds on the size of (n,k) separating systems.
Major problem: Size of interventions

(figure: skeleton on S1,…,S7 with intervened set S = {S1,S2,S4})

Open problem: Is a separating system necessary, or can adaptive algorithms do better?

(NIPS15): For complete graph skeletons, separating systems are necessary,
even for adaptive algorithms.
We can use lower bounds on the size of separating systems to get lower
bounds on the number of interventions.

Randomized adaptive: loglog(n) interventions.
Our result: (n/k) loglog(k) interventions suffice, each of size up to k.
Entropic Causality
• Extra slides
Entropic Causality
• Given data Xi,Yi.
• Search over explanations assuming X→Y
• Y= F(X,E) , (E ⫫ X)
• Simplest explanation: One that minimizes H(E).
• Search in the other direction, assuming Y→X
• X= F’(Y,E’) , (E’ ⫫ Y)
• If H(E') << H(E), decide Y→X
• If H(E) << H(E'), decide X→Y
• If H(E), H(E') are close, say "don't know"
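A sketch of this decision rule in code. By Theorem 2 (next slides), the minimal H(E) for the direction X→Y is the minimum-entropy coupling of the conditionals P(Y|X=x), so the sketch takes a `coupling_entropy` routine as an argument (e.g. the greedy sketch shown after Question 2); the function names are mine:

```python
import numpy as np

def H(p):
    """Shannon entropy (in bits) of a probability vector."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def direction_scores(joint, coupling_entropy):
    """joint: empirical P(X,Y) as a 2-D array (rows = x, cols = y).
    coupling_entropy: estimates the minimum joint entropy of a list of
    marginals (the min-entropy coupling of the conditionals = min H(E)).
    Returns (H(X) + H(E), H(Y) + H(E')): decide the direction with the
    smaller total-input entropy, or say 'don't know' if they are close."""
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    cond_y_x = [joint[i] / px[i] for i in range(joint.shape[0]) if px[i] > 0]
    cond_x_y = [joint[:, j] / py[j] for j in range(joint.shape[1]) if py[j] > 0]
    return H(px) + coupling_entropy(cond_y_x), H(py) + coupling_entropy(cond_x_y)
```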
Entropic Causality in pictures

S → C:  C = F(S,E), (E ⫫ S), H(E) small
C → S:  S = F'(C,E'), (E' ⫫ C), H(E') big

• You may be thinking that min H(E) is like minimizing H(C|S).
• But it is fundamentally different
  (we'll prove it's NP-hard to compute).
Question 1: Identifiability?
• If the data is generated from X→Y,
  i.e. Y = f(X,E), (E ⫫ X), and H(E) is small:
• Is it true that all possible reverse explanations
  X = f'(Y,E'), (E' ⫫ Y)
  must have H(E') big, for all f', E'?
• Theorem 1: If X,E,f are generic, then identifiability holds for H0
(support of distribution of E’ must be large).
• Conjecture 1: Same result holds for H1 (Shannon entropy).
Question 2: How to find the simplest explanation?
• Minimum entropy coupling problem: Given marginal distributions
  U1, U2, …, Un, find the joint distribution that has these as
  marginals and has minimal entropy.
• (NP-Hard, Kovacevic et al. 2012).
• Theorem 2: Finding the simplest data explanation (f, E) is
  equivalent to solving the minimum entropy coupling problem.
• How to use: We propose a greedy algorithm that empirically
performs reasonably well
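One natural greedy heuristic for the coupling problem, as a sketch in the spirit of the slides (not necessarily the exact variant proposed): repeatedly put as much probability mass as possible on a single joint outcome.

```python
import numpy as np

def greedy_min_entropy_coupling(marginals):
    """Greedy sketch for minimum-entropy coupling (the exact problem is NP-hard).
    marginals: list of 1-D numpy arrays, each summing to 1.
    Each step takes the largest remaining entry of every marginal,
    allocates the minimum of those to one joint outcome, and subtracts it."""
    residual = [np.asarray(m, dtype=float).copy() for m in marginals]
    masses = []
    while True:
        idx = [int(np.argmax(r)) for r in residual]
        m = min(r[i] for r, i in zip(residual, idx))
        if m <= 1e-12:                        # all mass allocated
            break
        masses.append(m)
        for r, i in zip(residual, idx):
            r[i] -= m
    masses = np.array(masses)
    return -np.sum(masses * np.log2(masses))  # joint Shannon entropy, in bits

# Two uniform binary marginals couple into a single fair coin (1 bit):
print(greedy_min_entropy_coupling([np.array([.5, .5]), np.array([.5, .5])]))
```

Plugging this routine into the direction-score sketch above gives a complete (heuristic) entropic-causality pipeline.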
Proof idea

• Consider Y = f(X, E), with X, Y over an alphabet of size n.
• p_{i,j} = P(Y=i | X=j) = P(f(X,E)=i | X=j) = P(f_j(E)=i), since E ⫫ X.

(figure: f_1 maps the distribution of E, (e_1, …, e_m), to the distribution
of Y conditioned on X=1, (p_{1,1}, …, p_{n,1}))

• Each conditional probability is a subset sum of the distribution of E:
  p_{i,j} = Σ_{k ∈ S_{i,j}} e_k, where S_{i,j} is the index set for p_{i,j}.
Performance on the Tübingen dataset

(figure: accuracy vs. decision rate for entropy-based causal inference,
with 68% and 95% confidence intervals)

• Decision rate: the fraction of pairs on which the algorithm makes a decision.
• A decision is made when |H(X,E) - H(Y,E')| > t
  (t determines the decision rate).
• Confidence intervals are based on the number of datapoints.
• Slightly better than ANMs.
Conclusions 2
• Introduced a new framework for data-driven causality for two variables
• Established Identifiability for generic distributions for H0 entropy. Conjectured it
holds for Shannon entropy.
• Inspired by Occam’s razor. Natural and different from prior works.
• Natural for categorical variables (Additive models do not work there)
• Proposed practical greedy algorithm using Shannon entropy.
• Empirically performs very well for artificial and real causal datasets.
Existing Theory: Additive Noise Models

• Assume Y = f(X)+E, X ⫫ E
• Identifiability:
  • If f is nonlinear, then ∄ g, N ⫫ Y such that X = g(Y)+N (almost surely)
  • If E is non-Gaussian, ∄ g, N ⫫ Y such that X = g(Y)+N
• Achieves 63% accuracy on real data*
• Drawback: additivity is a restrictive functional assumption
* Cause Effect Pairs Dataset: https://webdav.tuebingen.mpg.de/cause-effect/
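A toy sketch of the ANM recipe on synthetic data (the cubic polynomial fit and the squared-residual correlation are my choices; the latter is only a crude stand-in for a proper independence test such as HSIC):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
X = rng.uniform(-1, 1, n)
E = rng.uniform(-0.3, 0.3, n)
Y = X**3 + E            # additive noise model in the true direction X -> Y

def anm_dependence(cause, effect):
    """Fit effect = g(cause) + residual with a cubic polynomial and return a
    crude dependence score between residual and cause."""
    coeffs = np.polyfit(cause, effect, deg=3)
    res = effect - np.polyval(coeffs, cause)
    return abs(np.corrcoef(res**2, np.abs(cause))[0, 1])

print("X -> Y residual dependence:", anm_dependence(X, Y))  # ~ 0
print("Y -> X residual dependence:", anm_dependence(Y, X))  # clearly > 0
# Decide the direction with the smaller residual-dependence score.
```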
Existing Theory: Independence of Cause and Mechanism

• The function f is chosen "independently" from the distribution of X by nature
• Notion of independence: assign a variable to f, check a log-slope integral
• Boils down to: X causes Y if h(Y) < h(X) [h: differential entropy]
• Drawbacks:
  • No exogenous variable assumption (deterministic X-Y relation)
  • Continuous variables only
Our Approach

• Consider discrete variables X, Y, E.
• Use total input (Rényi) entropy as a measure of complexity
• Choose the simpler model
• Assumption: the (Rényi) entropy of the exogenous variable E is small
• Theoretical guarantee for H0 Rényi entropy (cardinality):
  the causal direction is (almost surely) identifiable if E has small cardinality
Performance of Greedy Joint Entropy Minimization

(figure: the gap H*(X_1,…,X_n) - max_i H(X_i) between the minimum joint
entropy found by the greedy algorithm and the largest marginal entropy,
versus log2(n); average, minimum and maximum gap shown)

• n marginal distributions, each with n states, are randomly generated for each n.
• The minimum joint entropy obtained by the greedy algorithm is at most
  1 bit away from the largest marginal entropy, max_i H(X_i).
Results

Shannon Entropy-based Identifiability

(figure: probability of success vs. H(E)/log(n) for n = 4, 8, 16, 32, 64, 128;
accuracy stays between 0.82 and 1, approaching 1 for large n)

• Generate distributions of X, Y by randomly selecting f, X, E.
• Probability of success is the fraction of points where H(X,E) < H(Y,N).
• Larger n drives the probability of success to 1 when H(E) < log(n),
  supporting the conjecture.
Characterization of Conditionals

• Define the conditional distributions p_j = [P(Y=1|X=j), …, P(Y=n|X=j)]^T.
• Let p = [p_1^T, p_2^T, …, p_n^T]^T. Then p = M e, where e is the
  distribution of E and M is a block partition matrix:
  each block of n rows is a partitioning of the columns.
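A tiny numeric instance of p = M e (hypothetical f and e, with n = 2 and m = 3), checking that each conditional is a subset sum of the distribution of E:

```python
import numpy as np

# f[j, k] gives Y when X = j+1 and E = e_{k+1} (hypothetical functions f_j).
f = np.array([[0, 1, 1],    # f_1: e1 -> Y=0, e2 -> Y=1, e3 -> Y=1
              [1, 0, 1]])   # f_2: e1 -> Y=1, e2 -> Y=0, e3 -> Y=1
e = np.array([0.5, 0.3, 0.2])           # distribution of E
n, m = 2, 3

# Block partition matrix M: M[j*n + i, k] = 1 iff f_j(e_k) = i,
# so each block of n rows has exactly one 1 per column (a partition).
M = np.zeros((n * n, m))
for j in range(n):
    for k in range(m):
        M[j * n + f[j, k], k] = 1.0

p = M @ e                               # stacked conditionals [p_1; p_2]
print(p.reshape(n, n))                  # row j is P(Y | X = j+1)
# e.g. P(Y=1 | X=1) = e2 + e3 = 0.5: each conditional is a subset sum of e.
```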
General Position Argument

• Suppose the conditionals Y|X = j are uniform over the simplex (not realistic, a toy example).
• Note: let x_i ~ Exp(1) i.i.d. Then (x_1, …, x_n) / Σ_i x_i is a uniform
  random vector over the simplex.
• Drop n rows of p to make it (almost) i.i.d.
• Claim: there does not exist an e with H0 < n(n-1).
• Proof: assume otherwise.
  • Then the rows of M are linearly dependent:
    ∃ a such that a^T M = 0, and hence a^T p = 0.
  • This implies a random hyperplane being orthogonal to a fixed vector,
    which has probability 0.
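The simplex-sampling fact used above, as a short sketch:

```python
import numpy as np

rng = np.random.default_rng(4)

def uniform_simplex(n, size=1):
    """Draw `size` points uniformly from the probability simplex in R^n:
    normalize i.i.d. Exp(1) variables (equivalently, Dirichlet(1,...,1))."""
    x = rng.exponential(1.0, (size, n))
    return x / x.sum(axis=1, keepdims=True)

print(uniform_simplex(4, size=2))  # each row is a random pmf on 4 outcomes
```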
Our contribution

• Nature chooses X, E, f. A joint distribution over X, Y is implied.
• Choose the distributions of X and E randomly over the simplex.
• Derive X|Y from the induced joint.
• Any N ⫫ Y for which X = g(Y, N) corresponds to a non-zero polynomial
  being zero, which has probability 0.
Formal Result

• X, Y are discrete r.v.'s with cardinality n.
• Y = f(X,E), where E ⫫ X is also discrete.
• f is generic (a technical condition to avoid edge cases, true in real data).
• The distribution vectors of X and E are uniformly randomly sampled from the simplex.
• Then with probability 1, there does not exist N ⫫ Y and g such that X = g(Y, N).
Working with Shannon Entropy

• Given Y|X, finding the E with minimum Shannon entropy such that some f
  satisfies Y = f(X,E) is equivalent to:
  given the marginal distributions of n variables X_i, find the joint
  distribution with minimum entropy.
• This is an NP-hard problem.
• We propose a greedy algorithm (that produces a local optimum).
More Related Content

Viewers also liked

Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017MLconf
 
Daniel Shank, Data Scientist, Talla at MLconf SF 2016
Daniel Shank, Data Scientist, Talla at MLconf SF 2016Daniel Shank, Data Scientist, Talla at MLconf SF 2016
Daniel Shank, Data Scientist, Talla at MLconf SF 2016MLconf
 
Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...
Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...
Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...MLconf
 
Scott Clark, CEO, SigOpt, at The AI Conference 2017
Scott Clark, CEO, SigOpt, at The AI Conference 2017Scott Clark, CEO, SigOpt, at The AI Conference 2017
Scott Clark, CEO, SigOpt, at The AI Conference 2017MLconf
 
Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016
Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016
Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016MLconf
 
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...MLconf
 
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017MLconf
 
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017MLconf
 
Serena Yeung, PHD, Stanford, at MLconf Seattle 2017
Serena Yeung, PHD, Stanford, at MLconf Seattle 2017 Serena Yeung, PHD, Stanford, at MLconf Seattle 2017
Serena Yeung, PHD, Stanford, at MLconf Seattle 2017 MLconf
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016MLconf
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016MLconf
 
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017MLconf
 
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016MLconf
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...MLconf
 
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...MLconf
 
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016MLconf
 
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017MLconf
 
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016MLconf
 
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017MLconf
 

Viewers also liked (19)

Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
 
Daniel Shank, Data Scientist, Talla at MLconf SF 2016
Daniel Shank, Data Scientist, Talla at MLconf SF 2016Daniel Shank, Data Scientist, Talla at MLconf SF 2016
Daniel Shank, Data Scientist, Talla at MLconf SF 2016
 
Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...
Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...
Hanie Sedghi, Research Scientist at Allen Institute for Artificial Intelligen...
 
Scott Clark, CEO, SigOpt, at The AI Conference 2017
Scott Clark, CEO, SigOpt, at The AI Conference 2017Scott Clark, CEO, SigOpt, at The AI Conference 2017
Scott Clark, CEO, SigOpt, at The AI Conference 2017
 
Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016
Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016
Stephanie deWet, Software Engineer, Pinterest at MLconf SF 2016
 
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
 
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
 
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
 
Serena Yeung, PHD, Stanford, at MLconf Seattle 2017
Serena Yeung, PHD, Stanford, at MLconf Seattle 2017 Serena Yeung, PHD, Stanford, at MLconf Seattle 2017
Serena Yeung, PHD, Stanford, at MLconf Seattle 2017
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
 
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
 
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
 
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
 
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
 
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
 
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016
 
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
 

Similar to Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineering, University of Texas at Austin at MLconf SF 2016

Causal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine LearningCausal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine LearningBill Liu
 
Regression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questionsRegression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questionsMaarten van Smeden
 
Homework 21. Complete Chapter 3, Problem #1 under Project.docx
Homework 21. Complete Chapter 3, Problem #1 under Project.docxHomework 21. Complete Chapter 3, Problem #1 under Project.docx
Homework 21. Complete Chapter 3, Problem #1 under Project.docxadampcarr67227
 
Network meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistencyNetwork meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistencycheweb1
 
Bayesian networks and the search for causality
Bayesian networks and the search for causalityBayesian networks and the search for causality
Bayesian networks and the search for causalityBayes Nets meetup London
 
TPCMFinalACone
TPCMFinalAConeTPCMFinalACone
TPCMFinalAConeAdam Cone
 
Networks, Deep Learning (and COVID-19)
Networks, Deep Learning (and COVID-19)Networks, Deep Learning (and COVID-19)
Networks, Deep Learning (and COVID-19)tm1966
 
Ancestral Causal Inference - NIPS 2016 poster
Ancestral Causal Inference - NIPS 2016 posterAncestral Causal Inference - NIPS 2016 poster
Ancestral Causal Inference - NIPS 2016 posterSara Magliacane
 
Multivariate Regression using Skull Structures
Multivariate Regression using Skull StructuresMultivariate Regression using Skull Structures
Multivariate Regression using Skull StructuresJustin Pierce
 
Global Bilateral Symmetry Detection Using Multiscale Mirror Histograms
Global Bilateral Symmetry Detection Using Multiscale Mirror HistogramsGlobal Bilateral Symmetry Detection Using Multiscale Mirror Histograms
Global Bilateral Symmetry Detection Using Multiscale Mirror HistogramsMohamed Elawady
 
MH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -cleanMH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -cleanMin-hyung Kim
 
Introduction geostatistic for_mineral_resources
Introduction geostatistic for_mineral_resourcesIntroduction geostatistic for_mineral_resources
Introduction geostatistic for_mineral_resourcesAdi Handarbeni
 
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhChapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhbeshahashenafe20
 
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhChapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhbeshahashenafe20
 
AstraZeneca - The promise of graphs & graph-based learning in drug discovery
AstraZeneca - The promise of graphs & graph-based learning in drug discoveryAstraZeneca - The promise of graphs & graph-based learning in drug discovery
AstraZeneca - The promise of graphs & graph-based learning in drug discoveryNeo4j
 
Meta-Analysis -- Introduction.pptx
Meta-Analysis -- Introduction.pptxMeta-Analysis -- Introduction.pptx
Meta-Analysis -- Introduction.pptxACSRM
 

Similar to Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineering, University of Texas at Austin at MLconf SF 2016 (20)

Causal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine LearningCausal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine Learning
 
Day 3.pptx
Day 3.pptxDay 3.pptx
Day 3.pptx
 
Regression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questionsRegression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questions
 
Homework 21. Complete Chapter 3, Problem #1 under Project.docx
Homework 21. Complete Chapter 3, Problem #1 under Project.docxHomework 21. Complete Chapter 3, Problem #1 under Project.docx
Homework 21. Complete Chapter 3, Problem #1 under Project.docx
 
Network meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistencyNetwork meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistency
 
Bayesian networks and the search for causality
Bayesian networks and the search for causalityBayesian networks and the search for causality
Bayesian networks and the search for causality
 
TPCMFinalACone
TPCMFinalAConeTPCMFinalACone
TPCMFinalACone
 
Networks, Deep Learning (and COVID-19)
Networks, Deep Learning (and COVID-19)Networks, Deep Learning (and COVID-19)
Networks, Deep Learning (and COVID-19)
 
Ancestral Causal Inference - NIPS 2016 poster
Ancestral Causal Inference - NIPS 2016 posterAncestral Causal Inference - NIPS 2016 poster
Ancestral Causal Inference - NIPS 2016 poster
 
Multivariate Regression using Skull Structures
Multivariate Regression using Skull StructuresMultivariate Regression using Skull Structures
Multivariate Regression using Skull Structures
 
Declarative data analysis
Declarative data analysisDeclarative data analysis
Declarative data analysis
 
Global Bilateral Symmetry Detection Using Multiscale Mirror Histograms
Global Bilateral Symmetry Detection Using Multiscale Mirror HistogramsGlobal Bilateral Symmetry Detection Using Multiscale Mirror Histograms
Global Bilateral Symmetry Detection Using Multiscale Mirror Histograms
 
MH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -cleanMH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -clean
 
Chi‑square test
Chi‑square test Chi‑square test
Chi‑square test
 
Introduction geostatistic for_mineral_resources
Introduction geostatistic for_mineral_resourcesIntroduction geostatistic for_mineral_resources
Introduction geostatistic for_mineral_resources
 
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhChapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
 
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhChapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
 
AstraZeneca - The promise of graphs & graph-based learning in drug discovery
AstraZeneca - The promise of graphs & graph-based learning in drug discoveryAstraZeneca - The promise of graphs & graph-based learning in drug discovery
AstraZeneca - The promise of graphs & graph-based learning in drug discovery
 
Meta-Analysis -- Introduction.pptx
Meta-Analysis -- Introduction.pptxMeta-Analysis -- Introduction.pptx
Meta-Analysis -- Introduction.pptx
 
Science or not media
Science or not mediaScience or not media
Science or not media
 

More from MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineering, University of Texas at Austin at MLconf SF 2016

  • 21. Overview • What is causal inference • Directed graphical models and conditional independence • That’s not it. • Interventions and how to design them • What to do if you cannot intervene
  • 22. Intervention: force people to smoke S C S=0 0.5 S=1 0.5 Pr(S) S=0 S=1 C=0 30/50 10/50 C=1 20/50 40/50 Pr(C/S) • Flip a coin and force each person to smoke or not, with prob ½. • In Universe 1 (i.e. under S→C), the new joint pdf stays the same as before the intervention.
  • 23. Intervention: force people to smoke • Flip coin and force each person to smoke or not, with prob ½. • In Universe 2 (Under C→S) • S, C will become independent after intervention. C=0 0.4 C=1 0.6 Pr(C) C=0 C=1 S=0 30/(100*0.4) = 0.75 20/(100*0.6) = 0.33 S=1 10/(100*0.4) = 0.25 40/(100*0.6) = 0.66 Pr(S/C) S C
  • 24. Intervention: force people to smoke • Flip coin and force each person to smoke or not, with prob ½. • In Universe 2 (Under C→S) • S, C will become independent after intervention. • So check correlation on data after intervention and find true direction! C=0 0.4 C=1 0.6 Pr(C) C=0 C=1 S=0 30/(100*0.4) = 0.75 20/(100*0.6) = 0.33 S=1 10/(100*0.4) = 0.25 40/(100*0.6) = 0.66 Pr(S/C) S C
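To make the intervention test concrete, here is a minimal Python simulation sketch (mine, not from the talk) of the two universes, using the slides' toy numbers: after the forced coin flip, S and C stay dependent under S→C and become independent under C→S.

import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
N = 100_000

# Forced smoking assignment: a fair coin, regardless of which universe we are in.
S = rng.integers(0, 2, size=N)

# Universe 1 (S -> C): the mechanism C = F(S, E) still runs on the forced S.
p_c1_given_s = np.array([0.4, 0.8])        # Pr(C=1 | S) from the slides' table
C_u1 = (rng.random(N) < p_c1_given_s[S]).astype(int)

# Universe 2 (C -> S): C has its own mechanism; forcing S cuts the C -> S edge.
C_u2 = (rng.random(N) < 0.6).astype(int)   # Pr(C=1) = 0.6

for name, C in [("S->C universe", C_u1), ("C->S universe", C_u2)]:
    table = np.histogram2d(S, C, bins=2)[0]
    _, pval, _, _ = chi2_contingency(table)
    print(f"{name}: independence-test p-value = {pval:.3g}")
# Expected: tiny p-value (dependence) under S->C, large p-value under C->S.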
  • 25. Who does interventions like that? - You're giving dying people sugar pills?
  • 26. More variables S2 S7 S1 S3 S4 S6 S5 True Causal DAG
  • 27. More variables S2 S7 S1 S3 S4 S6 S5 True Causal DAG From observational Data we can learn Conditional independencies. Obtain Skeleton (lose directions)
  • 28. More variables S2 S7 S1 S3 S4 S6 S5 S2 S7 S1 S3 S4 S6 S5 True Causal DAG Skeleton From observational Data we can learn Conditional independencies. Obtain Skeleton (lose directions)
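As an illustration of this step, here is a minimal skeleton-recovery sketch (mine), assuming linear-Gaussian data so that a Fisher-z partial-correlation test is a valid conditional-independence test.

import itertools
import numpy as np
from scipy.stats import norm

def ci_test(data, i, j, cond, alpha=0.01):
    # Fisher-z partial-correlation test of X_i independent of X_j given X_cond.
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(len(data) - len(cond) - 3)
    return 2 * (1 - norm.cdf(abs(z))) > alpha    # True means "looks independent"

def skeleton(data, max_cond=2):
    # Remove an edge as soon as some small conditioning set makes its endpoints independent.
    n = data.shape[1]
    edges = set(itertools.combinations(range(n), 2))
    for i, j in sorted(edges):
        others = [k for k in range(n) if k not in (i, j)]
        for size in range(max_cond + 1):
            if any(ci_test(data, i, j, c) for c in itertools.combinations(others, size)):
                edges.discard((i, j))
                break
    return edges

# Toy chain S1 -> S2 -> S3: the skeleton keeps 0-1 and 1-2, and drops 0-2.
rng = np.random.default_rng(1)
s1 = rng.normal(size=5000)
s2 = 0.8 * s1 + rng.normal(size=5000)
s3 = 0.8 * s2 + rng.normal(size=5000)
print(skeleton(np.column_stack([s1, s2, s3])))   # expect {(0, 1), (1, 2)}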
  • 29. PC Algorithm (Spirtes et al. Meek) S2 S7 S1 S3 S4 S6 S5 Skeleton There are a few directions we can learn from observational Data (Immoralities, Meek Rules) Spirtes, Glymour, Scheines 2001, PC Algorithm C. Meek , 1995. Andersson, Madigan, Perlman, 1997
  • 30. PC Algorithm (Spirtes et al. Meek) S2 S7 S1 S3 S4 S6 S5 Skeleton There are a few directions we can learn from observational Data (Immoralities, Meek Rules) Spirtes, Glymour, Scheines 2001, PC Algorithm C. Meek , 1995. Andersson, Madigan, Perlman, 1997
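The immorality rule itself fits in a few lines; a sketch (mine) that takes the skeleton and the separating sets found by the CI tests as given: for every unshielded triple i - k - j, orient i → k ← j whenever k is not in the set that separated i and j.

def orient_immoralities(skeleton_edges, sepsets):
    # skeleton_edges: set of frozensets {i, j}; sepsets: dict (i, j) -> separating set.
    adj = {}
    for e in skeleton_edges:
        i, j = tuple(e)
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    directed = set()
    for k in adj:
        for i in adj[k]:
            for j in adj[k]:
                if i < j and j not in adj.get(i, set()):    # unshielded triple i - k - j
                    if k not in sepsets.get((i, j), set()):
                        directed.add((i, k))                 # i -> k
                        directed.add((j, k))                 # j -> k
    return directed

# Toy collider A -> C <- B: skeleton A-C, B-C, with A and B separated by the empty set.
edges = {frozenset({0, 2}), frozenset({1, 2})}
print(orient_immoralities(edges, {(0, 1): set()}))   # {(0, 2), (1, 2)}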
  • 31. How interventions reveal directions S2 S7 S1 S3 S4 S6 S5 Intervened Set S ={S1,S2,S4} We choose a subset of the variables S and Intervene (i.e. force random values )
  • 32. How interventions reveal directions S2 S7 S1 S3 S4 S6 S5 Intervened Set S ={S1,S2,S4} We choose a subset of the variables S and Intervene (i.e. force random values ) Directions of edges between S and Sc are revealed to me.
  • 33. How interventions reveal directions S2 S7 S1 S3 S4 S6 S5 Intervened Set S ={S1,S2,S4} We choose a subset of the variables S and intervene (i.e. force random values). Directions of edges between S and its complement Sc are revealed. Re-apply the PC algorithm + Meek rules to possibly learn a few more edges, as in the sketch below.
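In code, the information gained is just the cut: a small sketch (mine; revealed_directions is a hypothetical helper) returning the skeleton edges whose directions the intervention reveals.

def revealed_directions(skeleton_edges, intervened):
    # An edge with exactly one endpoint in the intervened set crosses the cut,
    # so its direction is revealed by the intervention.
    return {e for e in skeleton_edges if len(set(e) & intervened) == 1}

skeleton_edges = {("S1", "S2"), ("S1", "S3"), ("S2", "S4"), ("S3", "S4")}
print(revealed_directions(skeleton_edges, {"S1", "S2", "S4"}))
# {('S1', 'S3'), ('S3', 'S4')} -- the cut edges between {S1, S2, S4} and {S3}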
  • 34. Learning Causal DAGs S2 S7 S1 S3 S4 S6 S5 Skeleton Given a skeleton graph, how many interventions are needed to learn all directions ? • A-priori fixed set of interventions (non-Adaptive)
  • 35. Learning Causal DAGs S2 S7 S1 S3 S4 S6 S5 Skeleton Given a skeleton graph, how many interventions are needed to learn all directions ? • A-priori fixed set of interventions (non-Adaptive) • Adaptive • Randomized Adaptive
  • 36. Learning Causal DAGs S2 S7 S1 S3 S4 S6 S5 Skeleton Given a skeleton graph, how many interventions are needed to learn all directions? • A-priori fixed set of interventions (non-Adaptive) Theorem (Hauser & Buhlmann 2014): Log(χ) interventions suffice (χ = chromatic number of skeleton) Adaptive? (NIPS15): Adaptive does not help (in the worst case) • Randomized Adaptive (Li, Vetta, NIPS14): loglog(n) interventions with high probability suffice for complete skeleton.
  • 37. A good algorithm for general graphs
  • 38. Overview • What is causal inference • Interventions and how to design them • What to do if you cannot intervene • Make more assumptions • compare on standard benchmark
  • 39. Data-driven causality • How to find causal direction without interventions • Impossible for two variables. Possible under assumptions. • Popular assumption: Y = F(X) + E, (E ⫫ X) (additive models) (Shimizu et al., Hoyer et al., Peters et al., Chen et al., Mooij et al.) • Entropic Causality: use information theory for general data-driven causality. Y = F(X,E), (E ⫫ X) • (related work: Janzing, Mooij, Zhang, Lemeire: no additivity assumption, but also no noise: Y = F(X))
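As a flavor of the additive-model approach (a sketch of mine, not the cited methods: the ground truth Y = X^3 + E is made up, and a crude moment correlation stands in for a proper independence test such as HSIC): fit a regression in each direction and check whether the residual looks independent of the input.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=5000)
y = x ** 3 + rng.uniform(-1, 1, size=5000)    # made-up nonlinear additive model

def dependence_score(inp, out, degree=5):
    # Regress out a polynomial fit, then measure residual-input dependence crudely.
    resid = out - np.polyval(np.polyfit(inp, out, degree), inp)
    return abs(np.corrcoef(inp ** 2, resid ** 2)[0, 1])

print("X->Y residual dependence:", dependence_score(x, y))   # expected: small
print("Y->X residual dependence:", dependence_score(y, x))   # expected: larger
# The direction with the more independent-looking residual is the ANM's guess.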
  • 40. Conclusions • Learning causal graphs with interventions is an on-going field of research • Tetrad project (CMU) • http://www.phil.cmu.edu/projects/tetrad/ • When time is present more things can be done (Difference in Differences method, Granger, potential outcomes, etc.) • Additive models and entropic causality can be used for data-driven causal inference.
  • 41. Pointers
• Tuebingen Benchmark: https://webdav.tuebingen.mpg.de/cause-effect/
• Tetrad project: http://www.phil.cmu.edu/projects/tetrad/
• Code: https://github.com/mkocaoglu/Entropic-Causality
• P. Spirtes, C. Glymour and R. Scheines. Causation, Prediction, and Search. Bradford Books, 2001.
• J. Pearl. Causality. Cambridge University Press, 2009.
• G. Imbens and D. Rubin. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction.
• CCD Summer Short Course 2016, CMU Center for Causal Discovery: https://www.youtube.com/watch?v=9yEYZURoE3Y&feature=youtu.be
• Jonas Peters, Peter Buehlmann and Nicolai Meinshausen (2016). Causal inference using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society, Series B.
• K. Shanmugam, M. Kocaoglu, A.G. Dimakis, S. Vishwanath. Learning Causal Graphs with Small Interventions (NIPS 2015).
• Frederick Eberhardt, Clark Glymour, and Richard Scheines. On the number of experiments sufficient and in the worst case necessary to identify all causal relations among n variables.
• Alain Hauser and Peter Buhlmann. Two optimal strategies for active learning of causal models from interventional data. International Journal of Approximate Reasoning, 55(4):926–939, 2014.
• Hoyer, Patrik O., et al. Nonlinear causal discovery with additive noise models. Advances in Neural Information Processing Systems, 2009.
• Janzing, Dominik, et al. Information-geometric approach to inferring causal directions. Artificial Intelligence 182 (2012).
• Peters, Jonas, Dominik Janzing, and Bernhard Scholkopf. Causal inference on discrete data using additive noise models. IEEE Transactions on Pattern Analysis and Machine Intelligence 33.12 (2011).
  • 42. fin
  • 43. Learning Causal DAGs Theorem: Log(χ) interventions suffice Proof: 1.Color the vertices. (legal coloring) S2 S7 S1 S3 S4 S6 S5 Skeleton
  • 44. Learning Causal DAGs S2 S7 S1 S3 S4 S6 S5 Skeleton Thm: Log(χ) interventions suffice Proof: 1.Color the vertices. 2. Form table with binary representations of colors Red: 0 0 Green: 0 1 Blue: 1 0 S1 0 0 S2 0 1 S3 1 0 S4 0 1 S5 1 0 S6 0 1 S7 1 0
  • 45. Learning Causal DAGs S2 S7 S1 S3 S4 S6 S5 Skeleton Thm: Log(χ) interventions suffice Proof: 1.Color the vertices. 2. Form table with binary representations of colors Red: 0 0 Green: 0 1 Blue: 1 0 3. Each intervention is indexed by a column of this table. S1 0 0 S2 0 1 S3 1 0 S4 0 1 S5 1 0 S6 0 1 S7 1 0 Intervention 1
  • 46. Learning Causal DAGs S2 S7 S1 S3 S4 S6 S5 For any edge, its two vertices have different colors, so their binary representations differ in at least 1 bit. So for some intervention, one endpoint is in the intervened set and the other is not, and I will learn the edge's direction. QED. Thm: Log(χ) interventions suffice Proof: 1. Color the vertices. 2. Form table with binary representations of colors Red: 0 0 Green: 0 1 Blue: 1 0 3. Each intervention is indexed by a column of this table. S1 0 0 S2 0 1 S3 1 0 S4 0 1 S5 1 0 S6 0 1 S7 1 0 Intervention 1
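The construction is short enough to run; here is a sketch (mine) that uses a greedy legal coloring, so the number of interventions is the log of the greedy color count rather than exactly log(χ).

from math import ceil, log2

def coloring_interventions(vertices, edges):
    # Greedy (legal, not necessarily optimal) coloring of the skeleton.
    color, adj = {}, {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    for v in vertices:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(len(vertices)) if c not in used)
    # Intervention t = vertices whose color has bit t set; adjacent vertices have
    # different colors, hence differ in some bit, hence some intervention cuts the edge.
    bits = max(1, ceil(log2(max(color.values()) + 1)))
    return [{v for v in vertices if (color[v] >> t) & 1} for t in range(bits)]

edges = [("S1","S2"),("S1","S3"),("S2","S3"),("S2","S4"),
         ("S3","S4"),("S4","S5"),("S5","S6"),("S6","S7"),("S5","S7")]
sets = coloring_interventions([f"S{i}" for i in range(1, 8)], edges)
# Every edge is cut by at least one intervention, so every direction is learned.
assert all(any(len({u, v} & s) == 1 for s in sets) for u, v in edges)
print(sets)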
  • 47. Learning Causal DAGs S2 S7 S1 S3 S4 S6 S5 Skeleton On-going Research on several problems • What if the size of the intervention sets is limited (NIPS 15) • What if some variables cannot be intervened on
  • 48. Major problem: Size of interventions S2 S7 S1 S3 S4 S6 S5 Intervened Set S ={S1,S2,S4} We choose a subset of the variables S and Intervene (i.e. force random values ) Question: If each intervention has size up to k, how many interventions do we need ? Eberhardt: A separating system on χ elements with weight k is sufficient to produce a non-adaptive causal inference algorithm A separating system on n elements with weight k is a {0,1} matrix with n distinct columns and each row having weight at most k. Rényi, Katona, Wegener: (n,k) separating systems have size [formula on slide]
  • 49. Major problem: Size of interventions S2 S7 S1 S3 S4 S6 S5 Intervened Set S ={S1,S2,S4} Open problem: Is a separating system necessary or can adaptive algorithms do better ? (NIPS15): For complete graph skeletons, separating systems are necessary. Even for adaptive algorithms. We can use lower bounds on size of separating systems to get lower bounds on the number of interventions. Randomized adaptive: loglog n interventions Our result: n/k loglog k interventions suffice, each of size up to k.
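Checking the separating-system condition is only a few lines; a small sketch (mine, with a made-up example matrix): rows are interventions of weight at most k, columns are variables, and any two columns must differ so that every potential edge is cut by some intervention.

def is_separating_system(rows, k):
    cols = list(zip(*rows))
    distinct = len(set(cols)) == len(cols)       # all column patterns differ
    bounded = all(sum(r) <= k for r in rows)     # each intervention has size <= k
    return distinct and bounded

# 3 interventions of size <= 2 separating 4 variables:
rows = [(1, 1, 0, 0),
        (1, 0, 1, 0),
        (0, 1, 1, 0)]
print(is_separating_system(rows, k=2))   # True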
  • 51. Entropic Causality • Given data Xi,Yi. • Search over explanations assuming X→Y • Y= F(X,E) , (E ⫫ X) • Simplest explanation: One that minimizes H(E). • Search in the other direction, assuming Y→X • X= F’(Y,E’) , (E’ ⫫ Y) • If H(E’) << H(E) decide Y→X • If H(E) <<H(E’) decide X→Y • If H(E), H(E’) close, say ‘don’t know’
  • 52. Entropic Causality in pictures S C S C C= F(S,E) , (E ⫫ S) H(E) small S= F’(C,E’) , (E’ ⫫ C) H(E’) big
  • 53. Entropic Causality in pictures S C S C C= F(S,E) , (E ⫫ S) H(E) small S= F’(C,E’) , (E’ ⫫ C) H(E’) big • You may be thinking that min H(E) is like minimizing H(C/S). • But it is fundamentally different • (we’ll prove it’s NP-hard to compute)
  • 54. Question 1: Identifiability? • If data is generated from X→Y , i.e. Y= f(X,E), (E ⫫ X) and H(E) is small. • Is it true that all possible reverse explanations • X= f’(Y,E’) , (E’ ⫫ Y) must have H(E’) big, for all f’,E’ ? • Theorem 1: If X,E,f are generic, then identifiability holds for H0 (support of distribution of E’ must be large). • Conjecture 1: Same result holds for H1 (Shannon entropy).
  • 55. Question 2: How to find simplest explanation? • Minimum entropy coupling problem: Given some marginal distributions U1, U2, ..., Un, find the joint distribution that has these as marginals and has minimal entropy. • (NP-Hard, Kovacevic et al. 2012). • Theorem 2: Finding the simplest data explanation f, E is equivalent to solving the minimum entropy coupling problem. • How to use: We propose a greedy algorithm that empirically performs reasonably well
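To make this concrete, here is a sketch (mine, not the talk's code; see https://github.com/mkocaoglu/Entropic-Causality from the pointers slide for the authors' implementation) of a greedy minimum-entropy-coupling heuristic: repeatedly match the largest remaining mass of every marginal. It is then applied to the smoking table's conditionals in both directions, comparing H(cause) + H(noise) as in the decision rule of slide 51. The exact greedy variant here is an assumption for illustration.

import numpy as np

def greedy_min_entropy_coupling(marginals):
    # Greedily build a low-entropy joint distribution with the given marginals;
    # each round assigns min-of-maxima mass to the current argmax states.
    marg = [list(m) for m in marginals]
    joint = []
    while True:
        tops = [max(m) for m in marg]
        mass = min(tops)
        if mass <= 1e-12:
            break
        joint.append(mass)
        for m, t in zip(marg, tops):
            m[m.index(t)] -= mass
    joint = np.array(joint)
    return float(-(joint * np.log2(joint)).sum())   # entropy in bits

# Conditionals of the slides' joint pdf P(S,C):
p_c_given_s = [[0.6, 0.4], [0.2, 0.8]]      # P(C|S=0), P(C|S=1)
p_s_given_c = [[0.75, 0.25], [1/3, 2/3]]    # P(S|C=0), P(S|C=1)
h_s = 1.0                                    # H(S), since P(S=1) = 0.5
h_c = float(-(0.4 * np.log2(0.4) + 0.6 * np.log2(0.6)))   # H(C)

print("S->C: H(S) + H(E)  =", h_s + greedy_min_entropy_coupling(p_c_given_s))
print("C->S: H(C) + H(E') =", h_c + greedy_min_entropy_coupling(p_s_given_c))
# The direction with the smaller total is taken as the simpler explanation.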
  • 56. Proof idea • Consider Y = f(X, E), with X, Y over an alphabet of size n. • p_{i,j} = P(Y=i | X=j) = P(f(X,E)=i | X=j) = P(f_j(E)=i), since E ⫫ X. [Figure: the distribution (e_1, ..., e_m) of E is mapped by f_1 to the conditional distribution (p_{1,1}, ..., p_{n,1}) of Y given X=1.] • Each conditional probability is a subset sum of the distribution of E: p_{i,j} = Σ_{t ∈ S_{i,j}} e_t, where S_{i,j} = {t : f_j(e_t) = i} is the index set for p_{i,j}.
  • 57. Performance on the Tübingen dataset [Figure: accuracy vs. decision rate for entropy-based causal inference, with 68% and 95% confidence intervals] • Decision rate: fraction of pairs on which the algorithm makes a decision. • Decision made when |H(X,E)-H(Y,E’)| > t (t determines the decision rate) • Confidence intervals based on number of datapoints • Slightly better than ANMs
  • 58. Conclusions 2 • Introduced a new framework for data-driven causality for two variables • Established Identifiability for generic distributions for H0 entropy. Conjectured it holds for Shannon entropy. • Inspired by Occam’s razor. Natural and different from prior works. • Natural for categorical variables (Additive models do not work there) • Proposed practical greedy algorithm using Shannon entropy. • Empirically performs very well for artificial and real causal datasets.
  • 59. Existing Theory: Additive Noise Models • Assume Y = f(X)+E, X⫫E • Identifiability 1: • If f nonlinear, then ∄ g, N ⫫Y such that X = g(Y)+N (almost surely) • If E non-Gaussian, ∄ g, N ⫫Y such that X = g(Y)+N • Performs 63% on real data* • Drawback: Additivity is a restrictive functional assumption * Cause Effect Pairs Dataset: https://webdav.tuebingen.mpg.de/cause-effect/
  • 60. Existing Theory: Independence of Cause and Mechanism • Function f chosen “independently” from distribution of X by nature • Notion of independence: Assign a variable to f, check log-slope integral • Boils down to: X causes Y if h(Y) < h(X) [h: differential entropy] • Drawback: • No exogenous variable assumption (deterministic X-Y relation) • Continuous variables only
  • 61. Our Approach • Consider discrete variables X, Y, E. • Use total input (Rényi) entropy as a measure of complexity • Choose the simpler model • Assumption: (Rényi) entropy of exogenous variable E is small • Theoretical guarantees for H0 Rényi entropy (cardinality): causal direction (almost surely) identifiable if E has small cardinality
  • 62. Performance of Greedy Joint Entropy Minimization [Figure: gap H*(X_1, X_2, ..., X_n) - max_i H(X_i) of the greedy algorithm's joint entropy vs. log2(n); average, minimum and maximum gap shown] • n marginal distributions, each with n states, are randomly generated for each n • The minimum joint entropy obtained by the greedy algorithm is at most 1 bit away from the largest marginal entropy max_i H(X_i)
  • 63. Results: Shannon Entropy-based Identifiability [Figure: probability of success vs. H(E)/log(n), for n = 4, 8, 16, 32, 64, 128] • Generate distributions of X, Y by randomly selecting f, X, E. • Probability of success is the fraction of points where H(X,E) < H(Y,N). • Larger n drives the probability of success to 1 when H(E) < log(n), supporting the conjecture.
  • 64. Characterization of Conditionals • Define the conditional distribution vectors p_j with entries p_{i,j} = P(Y=i | X=j). • Let p = [p_1^T, p_2^T, ..., p_n^T]^T. Then p = M e, where e is the distribution of E and M is a block partition matrix: each block of length n is a partitioning of the columns, since every value of E is mapped to exactly one value of Y.
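A numeric sketch (mine; the mechanism f, the sizes n and m, and the distribution of E are all made up) of the p = M e structure:

import numpy as np

n, m = 2, 3
f = lambda j, t: (j + t) % n            # a made-up deterministic mechanism
e = np.array([0.5, 0.3, 0.2])           # a made-up distribution of E

# Block j of M (rows j*n .. j*n + n-1) one-hot encodes t -> f(j, t): every
# value of E lands on exactly one output i, so each block partitions the columns.
M = np.zeros((n * n, m))
for j in range(n):
    for t in range(m):
        M[j * n + f(j, t), t] = 1.0

p = M @ e                 # stacked conditionals P(Y=i | X=j)
print(p.reshape(n, n))    # row j is the distribution of Y given X=j; rows sum to 1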
  • 65. General Position Argument • Suppose the Y|X = j are uniform over the simplex (not realistic, toy example) • Note: Let x_i ∼ exp(1) i.i.d. Then (x_1, ..., x_n) / Σ_i x_i is a uniform random vector over the simplex. • Drop n rows of p to make it (almost) i.i.d. • Claim: There does not exist an e with H0 < n(n-1) • Proof: Assume otherwise. • Rows of M are linearly dependent. • ∃ a such that aT M = 0 • Then aTp = 0 • Implies a random vector lying on a fixed hyperplane, which has probability 0.
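The note is a standard fact (normalized i.i.d. Exp(1) draws are uniform, i.e. Dirichlet(1,...,1), on the simplex) and easy to sanity-check numerically; a quick sketch:

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(1.0, size=(100_000, 3))
u = x / x.sum(axis=1, keepdims=True)     # uniform points on the 2-dimensional simplex
print(u.mean(axis=0))                    # ~ [1/3, 1/3, 1/3] by symmetry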
  • 66. Our contribution • Nature chooses X, E, f. A joint distribution over X, Y is implied • Choose X, E randomly over the simplex. • Derive X|Y from the induced joint • Any N ⫫ Y for which X = g(Y, N) gives a constraint that corresponds to a non-zero polynomial being zero, which has probability 0.
  • 67. Formal Result • X, Y discrete r.v.’s with cardinality n • Y = f(X,E) where E ⫫ X is also discrete • f is generic (technical condition to avoid edge cases, true in real data) • Distribution vectors of X, E uniformly randomly sampled from the simplex • Then with probability 1, there does not exist N ⫫ Y such that there exists g that satisfies X = g(Y, N)
  • 68. Working with Shannon Entropy • Given Y|X, finding E with minimum Shannon entropy such that there is f that satisfies Y = f(X,E) is equivalent to: given marginal distributions of n variables Xi, find the joint distribution with minimum entropy • NP-hard problem. • We propose a greedy algorithm (that produces a local optimum)