The ABC of ABC
Christian P. Robert
(U. Paris Dauphine PSL & Warwick U.)
JSM 2019, Denver, Colorado
Not to be confused with...
The XKCD of ABC
Christian P. Robert
(U. Paris Dauphine PSL & Warwick U.)
JSM 2019, Denver, Colorado
Thanks
Jean-Marie Cornuet, Arnaud Estoup (INRA
Montpellier)
David Frazier, Gael Martin (Monash U)
Pierre Jacob, Natesh Pillai (Harvard U)
Kerrie Mengersen (QUT)
Jean-Michel Marin (U Montpellier)
Judith Rousseau (U Oxford)
Robin Ryder, Julien Stoehr (U Paris Dauphine)
Outline
1 Motivation
2 Concept
3 Implementation
4 Convergence
5 Advanced topics
Posterior distribution
Central concept of Bayesian inference:
π(θ | xobs) ∝ π(θ) × L(θ | xobs)
[posterior of parameter θ given observation xobs ∝ prior × likelihood]
Drives
derivation of optimal decisions
assessment of uncertainty
model selection
[McElreath, 2015]
Monte Carlo representation
Exploration of the posterior π(θ|xobs) may require producing a
sample
θ1, . . . , θT
distributed from π(θ|xobs) (or asymptotically so, by Markov chain
Monte Carlo)
[McElreath, 2015]
Difficulties
MCMC = workhorse of practical Bayesian analysis (BUGS,
JAGS, Stan, &tc.), except when the product
π(θ) × L(θ | xobs)
is well-defined but numerically unavailable or too costly to
compute
Only partial solutions are available:
demarginalisation (latent variables)
exchange algorithm (auxiliary variables)
pseudo-marginal (unbiased estimator)
Example 1: Dynamic mixture
Mixture model
{1 − wµ,τ(x)}fβ,λ(x) + wµ,τ(x)gε,σ(x) x > 0
where fβ,λ Weibull density, gε,σ generalised Pareto density, and
wµ,τ Cauchy cdf
Intractable normalising constant
C(µ, τ, β, λ, ε, σ) = ∫₀^∞ {(1 − wµ,τ(x)) fβ,λ(x) + wµ,τ(x) gε,σ(x)} dx
[Frigessi, Haug & Rue, 2002]
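A minimal R sketch (not from the slides) checking numerically that the dynamic
mixture does not integrate to one; the parameter values are arbitrary and the
generalised Pareto density dgpd is hand-coded to stay self-contained.
dgpd <- function(x, eps, sigma)
  (1/sigma) * (1 + eps*x/sigma)^(-1/eps - 1)        # GPD density, shape eps > 0, x > 0
dynmix <- function(x, mu, tau, beta, lambda, eps, sigma) {
  w <- pcauchy(x, location = mu, scale = tau)       # Cauchy cdf weight w_{mu,tau}
  (1 - w) * dweibull(x, shape = beta, scale = lambda) + w * dgpd(x, eps, sigma)
}
# normalising constant C(mu, tau, beta, lambda, eps, sigma) by quadrature
integrate(dynmix, 0, Inf, mu = 1, tau = .5, beta = 2, lambda = 1,
          eps = .3, sigma = 1)$value                # differs from 1 in general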
Example 2: truncated Normal
Given set A ⊂ Rk (k large), truncated Normal model
f(x | µ, Σ, A) ∝ exp{−(x − µ)ᵀ Σ⁻¹ (x − µ)/2} I_A(x)
with intractable normalising constant
C(µ, Σ, A) = ∫_A exp{−(x − µ)ᵀ Σ⁻¹ (x − µ)/2} dx
Example 3: robust Normal statistics
Normal sample
x1, . . . , xn ∼ N(µ, σ²)
summarised into (insufficient)
^µn = med(x1, . . . , xn)
^σn = mad(x1, . . . , xn) = med |xi − med(x1, . . . , xn)|
Under a conjugate prior π(µ, σ²), the posterior is close to intractable,
but simulation of (^µn, ^σn) is straightforward
Example 5: exponential random graph
ERGM: binary random vector x indexed
by all edges on set of nodes plus graph
f(x | θ) = exp(θᵀ S(x)) / C(θ)
with S(x) vector of statistics and C(θ)
intractable normalising constant
[Grelaud & al., 2009; Everitt, 2012; Bouranis & al., 2017]
Concept
2 Concept
algorithm
summary statistics
random forests
calibration of tolerance
3 Implementation
4 Convergence
5 Advanced topics
A?B?C?
A stands for approximate [wrong
likelihood]
B stands for Bayesian [right prior]
C stands for computation [producing
a parameter sample]
A?B?C?
Rough version of the data [from dot
to ball]
Non-parametric approximation of
the likelihood [near actual
observation]
Use of non-sufficient statistics
[dimension reduction]
Monte Carlo error [and no
unbiasedness]
A?B?C?
“Marin et al. (2012) proposed
Approximate Bayesian Computation
(ABC), which is more complex than
MCMC, but outputs cruder estimates”
[Heavens et al., 2017]
A seemingly naïve representation
When likelihood f(x|θ) not in closed form, likelihood-free
rejection technique:
ABC algorithm
For an observation xobs ∼ f(x|θ), under the prior π(θ), keep
jointly simulating
θ' ∼ π(θ) ,  z ∼ f(z|θ') ,
until the auxiliary variable z is equal to the observed value,
z = xobs
[Diggle & Gratton, 1984; Rubin, 1984; Tavaré et al., 1997]
Example 3: robust Normal statistics
mu=rnorm(N<-1e6)              #prior draws for mu
sig=1/sqrt(rgamma(N,2,2))     #prior draws for sigma
obs=rnorm(1e2,3,2)            #placeholder observed sample (not in the original slide)
medobs=median(obs)            #observed summaries
madobs=mad(obs)
for(t in diz<-1:N){           #diz stores the distances
psud=rnorm(1e2)*sig[t]+mu[t]  #pseudo-data
medpsu=median(psud)-medobs
madpsu=mad(psud)-madobs
diz[t]=medpsu^2+madpsu^2}     #squared distance between summaries
#ABC sample: keep the 10% closest simulations
subz=which(diz<quantile(diz,.1))
plot(mu[subz],sig[subz])
Example 6: combinatorics
If one draws 11 socks out of m socks made of f orphans and g
pairs, with f + 2g = m, the number k of socks from the orphan
group is hypergeometric H(11, m, f) and the probability of
observing 11 orphan (unpaired) socks in total is
Σ_{k=0}^{11} [f choose k][2g choose 11−k]/[m choose 11] × 2^(11−k)[g choose 11−k]/[2g choose 11−k]
Example 6: combinatorics
Given a prior distribution on nsocks and npairs,
nsocks ∼ Neg(30, 15)    npairs | nsocks ∼ nsocks/2 · Be(15, 2)
it is possible to
1. generate new values of nsocks and npairs,
2. generate a new observation of X, the number of unique socks out of 11,
3. accept the pair (nsocks, npairs) if the realisation of X is equal to 11
(a minimal R sketch follows below)
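A hedged R sketch of this ABC scheme; reading Neg(30, 15) as a negative binomial
with mean 30 and standard deviation 15 is an assumption of the sketch.
Nsim <- 1e5
post <- matrix(NA, Nsim, 2); acc <- logical(Nsim)
for (i in 1:Nsim) {
  nsocks <- rnbinom(1, mu = 30, size = 30^2/(15^2 - 30))  # assumed prior: mean 30, sd 15
  prop <- rbeta(1, 15, 2)                                  # proportion of paired socks
  npairs <- round(floor(nsocks/2) * prop)
  norphans <- nsocks - 2*npairs
  post[i, ] <- c(nsocks, npairs)
  if (nsocks < 11) next                                    # cannot even draw 11 socks: reject
  socks <- rep(seq_len(npairs + norphans), c(rep(2, npairs), rep(1, norphans)))
  picked <- sample(socks, 11)
  acc[i] <- all(table(picked) == 1)                        # accept when the 11 socks are all unique
}
abc <- post[acc, ]                                         # ABC sample of (nsocks, npairs)
hist(abc[, 1], main = "nsocks", xlab = "ns", freq = FALSE)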
Example 6: combinatorics
[histogram of the ABC sample of nsocks]
The outcome of this simulation method returns a distribution
on the pair (nsocks, npairs) that is the conditional distribution of
the pair given the observation X = 11
Why does it work?
The mathematical proof is trivial:
f(θi) ∝ Σ_{z∈D} π(θi) f(z|θi) I_y(z)
     ∝ π(θi) f(y|θi)
     = π(θi | y) .
[Accept–Reject 101]
But very impractical when
Pθ(Z = xobs) ≈ 0
A as approximative
When y is a continuous random variable, strict equality
z = xobs is replaced with a tolerance zone
ρ(xobs, z) ≤ ε
where ρ is a distance
Output distributed from
π(θ) Pθ{ρ(xobs, z) < ε}  ∝(by definition)  π(θ | ρ(xobs, z) < ε)
[Pritchard et al., 1999]
ABC algorithm
Algorithm 1 Likelihood-free rejection sampler
for i = 1 to N do
  repeat
    generate θ' from prior π(·)
    generate z from sampling density f(·|θ')
  until ρ{η(z), η(xobs)} ≤ ε
  set θi = θ'
end for
where η(xobs) defines a (not necessarily sufficient) statistic
Custom: η(xobs) called summary statistic
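A minimal R sketch of Algorithm 1 for a generic simulator; rprior, simulate and
summaries are hypothetical user-supplied functions, not part of the slides.
abc_reject <- function(N, eps, rprior, simulate, summaries, xobs,
                       rho = function(a, b) sqrt(sum((a - b)^2))) {
  etaobs <- summaries(xobs)
  out <- vector("list", N)
  for (i in 1:N) {
    repeat {                                   # keep simulating until acceptance
      theta <- rprior()
      z <- simulate(theta)
      if (rho(summaries(z), etaobs) <= eps) break
    }
    out[[i]] <- theta
  }
  do.call(rbind, lapply(out, rbind))           # one accepted theta per row
}
# toy usage: Normal mean with known variance, summary = sample mean
xobs <- rnorm(50, 2)
post <- abc_reject(200, eps = .1,
                   rprior = function() rnorm(1, 0, 10),
                   simulate = function(theta) rnorm(50, theta),
                   summaries = mean, xobs = xobs)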
Exact ABC posterior
Algorithm samples from marginal in z of posterior
πABC_ε(θ, z | xobs) = π(θ) f(z|θ) I_{Aε,xobs}(z) / ∫_{Aε,xobs×Θ} π(θ) f(z|θ) dz dθ ,
where Aε,xobs = {z ∈ D | ρ(η(z), η(xobs)) < ε}.
Intuition that proper summary statistics coupled with small
tolerance ε = εη should provide good approximation of the
posterior distribution:
πABC_ε(θ | xobs) = ∫ πABC_ε(θ, z | xobs) dz ≈ π(θ | η(xobs))
why summaries?
reduction of dimension
improvement of signal to noise ratio
reduce tolerance ε considerably
whole data may be unavailable
Example 7: MA inference
Moving average model MA(2):
xt = εt + θ1 εt−1 + θ2 εt−2 ,   εt iid∼ N(0, 1)
Comparison of raw series:
Example 7: MA inference
Moving average model MA(2):
xt = εt + θ1 εt−1 + θ2 εt−2 ,   εt iid∼ N(0, 1)
Comparison of acf’s:
Example 7: MA inference
Summary vs. raw:
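A hedged R sketch of ABC for the MA(2) model with the first two autocorrelations
as summaries; the uniform prior box on (θ1, θ2) and the 1% tolerance quantile are
choices of the sketch, not of the slides.
set.seed(1)
xobs <- arima.sim(model = list(ma = c(.6, .2)), n = 100)        # observed series
sumstat <- function(x) acf(x, lag.max = 2, plot = FALSE)$acf[2:3]
sobs <- sumstat(xobs)
N <- 1e4
theta <- cbind(runif(N, -2, 2), runif(N, -1, 1))                # crude prior on (theta1, theta2)
dist <- numeric(N)
for (t in 1:N) {
  z <- arima.sim(model = list(ma = theta[t, ]), n = 100)        # pseudo-series
  dist[t] <- sum((sumstat(z) - sobs)^2)
}
keep <- dist < quantile(dist, .01)                              # 1% closest simulations
plot(theta[keep, ], xlab = expression(theta[1]), ylab = expression(theta[2]))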
why not summaries?
loss of sufficient information with πABC(θ|xobs) replaced
with πABC(θ|η(xobs))
arbitrariness of summaries
uncalibrated approximation
whole data may be available (at same cost as summaries)
distributions may be compared (Wasserstein distances)
[Bernton et al., 2019]
summary strategies
Fundamental difficulty of selecting summary statistics when
there is no non-trivial sufficient statistic [except when done by
experimenters from the field]
1. Starting from a large collection of summary statistics, Joyce
and Marjoram (2008) consider the sequential inclusion into the
ABC target, with a stopping rule based on a likelihood ratio
test.
2. Based on decision-theoretic principles, Fearnhead and
Prangle (2012) end up with E[θ|xobs] as the optimal summary
statistic
3. Use of indirect inference by Drovandi, Pettitt & Faddy
(2011) with estimators of parameters of an auxiliary model as
summary statistics.
4. Starting from a large collection of summary statistics, Raynal
& al. (2018, 2019) rely on random forests to build estimators
and select summaries
5. Starting from a large collection of summary statistics, Sedki &
Pudlo (2012) use the Lasso to eliminate summaries
Semi-automated ABC
Use of summary statistic η(·), importance proposal g(·), kernel
K(·) ≤ 1 with bandwidth h ↓ 0 such that
(θ, z) ∼ g(θ) f(z|θ)
is accepted with probability (hence the bound)
K[{η(z) − η(xobs)}/h]
and the corresponding importance weight defined by
π(θ) / g(θ)
Theorem Optimality of the posterior expectation E[θ|xobs] of the
parameter of interest as summary statistic η(xobs)
[Fearnhead & Prangle, 2012; Sisson et al., 2019]
Random forests
Technique that stemmed from Leo Breiman’s bagging (or
bootstrap aggregating) machine learning algorithm for both
classification [testing] and regression [estimation]
[Breiman, 1996]
Improved performance by averaging over classification schemes
of randomly generated training sets, creating a “forest” of
(CART) decision trees, inspired by Amit and Geman (1997)
ensemble learning
[Breiman, 2001]
Growing the forest
Breiman's solution for inducing random features in the trees of
the forest:
bootstrap resampling of the dataset and
random subsetting [of size √t] of the covariates driving the
classification or regression at every node of each tree
Covariate (summary) xτ that drives the node separation
xτ ≤ cτ
and the separation bound cτ chosen by minimising entropy or
Gini index
ABC with random forests
Idea: Starting with
possibly large collection of summary statistics (η1, . . . , ηp)
(from scientific theory input to available statistical
software, methodologies, to machine-learning alternatives)
ABC reference table involving model index, parameter
values and summary statistics for the associated simulated
pseudo-data
run the R package randomForest to infer M or θ from (η1i, . . . , ηpi)
at each step O(√p) indices sampled at random and most
discriminating statistic selected, by minimising Gini loss
Average of the trees is the resulting summary statistic, a highly
non-linear predictor of the model index
Potential selection of most active summaries, calibrated against
pure noise
(a toy R sketch follows below)
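A toy R sketch (assuming the CRAN package randomForest is installed) of ABC
random forests for estimating µ in the robust-Normal example; the candidate
summaries and reference-table size are arbitrary choices of the sketch.
library(randomForest)
nref <- 5e3
mu <- rnorm(nref); sig <- 1/sqrt(rgamma(nref, 2, 2))        # prior draws
eta <- t(sapply(1:nref, function(i) {
  z <- rnorm(100)*sig[i] + mu[i]                            # pseudo-data
  c(mean(z), median(z), var(z), mad(z), diff(range(z)))     # candidate summaries
}))
colnames(eta) <- paste0("eta", 1:5)
ref <- data.frame(mu = mu, eta)                             # ABC reference table
rf <- randomForest(mu ~ ., data = ref, ntree = 200)
obs <- rnorm(100, 3, 2)                                     # placeholder observed sample
etaobs <- data.frame(t(c(mean(obs), median(obs), var(obs), mad(obs),
                         diff(range(obs)))))
names(etaobs) <- paste0("eta", 1:5)
predict(rf, etaobs)                                         # RF point estimate of mu
importance(rf)                                              # which summaries are most active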
Classification of summaries by random forests
Given a large collection of summary statistics, rather than
selecting a subset and excluding the others, estimate each
parameter by random forests, a method that
handles thousands of predictors
ignores useless components
fast estimation with good local properties
automatised with few calibration steps
substitute to Fearnhead and Prangle (2012) preliminary
estimation of ^θ(yobs)
includes a natural (classification) distance measure that
avoids choice of either distance or tolerance
[Marin et al., 2016]
Calibration of tolerance
Calibration of threshold ε
from scratch [how small is small?]
from k-nearest neighbour perspective [quantile of prior
predictive]
from asymptotics [convergence speed]
related with choice of distance [automated selection by
random forests]
[Fearnhead & Prangle, 2012; Wilkinson, 2013; Li & Fearnhead, 2018]
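In practice the tolerance is rarely fixed in absolute terms; a minimal sketch of the
k-nearest-neighbour calibration, reusing the distance vector diz from the earlier
robust-Normal R code:
k <- 100                      # number of accepted simulations
eps <- sort(diz)[k]           # i.e. the (k/N)-quantile of the prior predictive distances
subz <- which(diz <= eps)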
Implementation
3 Implementation
ABC-MCMC
ABC-PMC
post-processed ABC
4 Convergence
5 Advanced topics
Software
Several abc R packages for performing parameter estimation
and model selection
[Nunes & Prangle, 2017]
abctools R package for tuning ABC analyses
abcrf R package for ABC via random forests
EasyABC R package with several algorithms for performing
efficient ABC sampling schemes, including four sequential
sampling schemes and three MCMC schemes
DIYABC non-R software for population genetics
ABC-IS
Basic ABC algorithm suggests simulating from the prior
blind [no learning]
inefficient [curse of dimension]
inapplicable to improper priors
Importance sampling version
importance density g(θ)
bounded kernel function Kh with bandwidth h
acceptance probability of
Kh{ρ[η(xobs), η(x{θ})]} π(θ) / {g(θ) maxθ Aθ}
[Fearnhead & Prangle, 2012]
ABC-MCMC
Markov chain (θ(t)) created via the transition function
θ(t+1) = θ' ∼ Kω(θ'|θ(t))   if x ∼ f(x|θ') is such that x = y
                             and u ∼ U(0, 1) ≤ π(θ')Kω(θ(t)|θ') / {π(θ(t))Kω(θ'|θ(t))} ,
          θ(t)               otherwise,
has the posterior π(θ|y) as stationary distribution
[Marjoram et al, 2003]
ABC-MCMC
Algorithm 3 Likelihood-free MCMC sampler
get (θ(0), z(0)) by Algorithm 1
for t = 1 to N do
  generate θ' from Kω(·|θ(t−1)), z' from f(·|θ'), u from U[0,1],
  if u ≤ π(θ')Kω(θ(t−1)|θ') / {π(θ(t−1))Kω(θ'|θ(t−1))} × IAε,xobs(z') then
    set (θ(t), z(t)) = (θ', z')
  else
    (θ(t), z(t)) = (θ(t−1), z(t−1)),
  end if
end for
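A hedged R sketch of Algorithm 3 for a toy Normal-mean model (summary = sample
mean); the N(0, 10) prior, random-walk proposal and ε = 0.05 are choices of the
sketch.
set.seed(2)
xobs <- rnorm(50, 2); etaobs <- mean(xobs)
eps <- .05; Niter <- 1e4
theta <- numeric(Niter); theta[1] <- rnorm(1, 0, 10)   # crude start from the prior
for (t in 2:Niter) {
  prop <- theta[t-1] + rnorm(1, 0, .5)                 # symmetric kernel K_omega
  z <- rnorm(50, prop)                                 # pseudo-data
  # with a symmetric proposal the MH ratio reduces to the prior ratio, times the ABC indicator
  if (runif(1) <= dnorm(prop, 0, 10)/dnorm(theta[t-1], 0, 10) &&
      abs(mean(z) - etaobs) <= eps) {
    theta[t] <- prop
  } else theta[t] <- theta[t-1]
}
hist(theta[-(1:1000)], main = "ABC-MCMC output", xlab = expression(theta))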
ABC-PMC
Generate a sample at iteration t by
^πt(θ(t)) ∝ Σ_{j=1}^N ωj^(t−1) Kt(θ(t) | θj^(t−1))
modulo acceptance of the associated xt, with tolerance εt ↓, and
use the importance weight associated with an accepted simulation θi^(t)
ωi^(t) ∝ π(θi^(t)) / ^πt(θi^(t))
© Still likelihood-free
[Sisson et al., 2007; Beaumont et al., 2009]
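A hedged R sketch of ABC-PMC for the same toy Normal-mean model; the tolerance
schedule, prior and kernel scale only loosely follow Beaumont et al. (2009).
set.seed(3)
xobs <- rnorm(50, 2); etaobs <- mean(xobs)
N <- 500; eps <- c(1, .5, .2, .1, .05)                   # decreasing tolerances
theta <- rnorm(N, 0, 10); w <- rep(1/N, N)               # t = 1: plain prior sample
for (t in 2:length(eps)) {
  tau <- sqrt(2*var(theta))                              # kernel scale (rough choice)
  newtheta <- neww <- numeric(N)
  for (i in 1:N) {
    repeat {
      cand <- sample(theta, 1, prob = w) + rnorm(1, 0, tau)   # move a weighted draw
      z <- rnorm(50, cand)
      if (abs(mean(z) - etaobs) <= eps[t]) break
    }
    newtheta[i] <- cand
    neww[i] <- dnorm(cand, 0, 10)/sum(w*dnorm(cand, theta, tau))  # pi(theta)/hat-pi_t(theta)
  }
  theta <- newtheta; w <- neww/sum(neww)
}
hist(theta, main = "ABC-PMC sample", xlab = expression(theta))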
ABC-SMC
Use of a kernel Kt associated with target πεt and derivation of
the backward kernel
Lt−1(z, z') = πεt(z') Kt(z', z) / πεt(z)
Update of the weights
ωi^(t) ∝ ωi^(t−1) Σ_{m=1}^M IAεt(xim^(t)) / Σ_{m=1}^M IAεt−1(xim^(t−1))
when xim^(t) ∼ Kt(xi^(t−1), ·)
[Del Moral, Doucet & Jasra, 2009]
ABC-NP
Better usage of [prior] simulations by
adjustment: instead of throwing
away θ such that ρ(η(z), η(xobs)) > ε,
replace θ's with locally regressed
transforms
θ* = θ − {η(z) − η(xobs)}ᵀ ^β   [Csilléry et al., TEE, 2010]
where ^β is obtained by [NP] weighted least square regression on
(η(z) − η(xobs)) with weights
Kδ{ρ(η(z), η(xobs))}
[Beaumont et al., 2002, Genetics]
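A self-contained R sketch (not from the slides) of this local-linear adjustment on
the robust-Normal example; the 1% tolerance and Epanechnikov-type weights are
choices of the sketch.
set.seed(4)
obs <- rnorm(1e2, 3, 2); sobs <- c(median(obs), mad(obs))   # observed summaries
N <- 1e5
mu <- rnorm(N); sig <- 1/sqrt(rgamma(N, 2, 2))              # prior draws
S <- t(sapply(1:N, function(t) {
  z <- rnorm(1e2)*sig[t] + mu[t]; c(median(z), mad(z))      # simulated summaries
}))
d2 <- rowSums(sweep(S, 2, sobs)^2)                          # squared distances
keep <- which(d2 <= quantile(d2, .01))
w <- 1 - d2[keep]/max(d2[keep])                             # Epanechnikov-type weights
D <- sweep(S[keep, ], 2, sobs)                              # eta(z) - eta(xobs)
fit <- lm(mu[keep] ~ D, weights = w)                        # weighted local-linear regression
muadj <- as.vector(mu[keep] - D %*% coef(fit)[-1])          # adjusted values theta*
hist(muadj, main = "regression-adjusted ABC", xlab = expression(mu))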
ABC-NP (regression)
Also found in the subsequent literature, e.g. in Fearnhead
& Prangle (2012): weight directly the simulation by
Kδ{ρ[η(z(θ)), η(xobs)]}
or
(1/S) Σ_{s=1}^S Kδ{ρ[η(zs(θ)), η(xobs)]}
[consistent estimate of f(η|θ)]
Curse of dimensionality: poor when d = dim(η) large
ABC-NP (density estimation)
Use of the kernel weights
Kδ{ρ[η(z(θ)), η(xobs)]}
leads to the NP estimate of the posterior expectation
EABC[θ | xobs] ≈ Σi θi Kδ{ρ[η(z(θi)), η(xobs)]} / Σi Kδ{ρ[η(z(θi)), η(xobs)]}
[Blum, JASA, 2010]
ABC-NP (density estimation)
Use of the kernel weights
Kδ{ρ[η(z(θ)), η(xobs)]}
leads to the NP estimate of the posterior conditional density
πABC[θ | xobs] ≈ Σi K̃b(θi − θ) Kδ{ρ[η(z(θi)), η(xobs)]} / Σi Kδ{ρ[η(z(θi)), η(xobs)]}
[Blum, JASA, 2010]
ABC-NP (density estimations)
Other versions incorporating regression adjustments
Σi K̃b(θ*i − θ) Kδ{ρ[η(z(θi)), η(xobs)]} / Σi Kδ{ρ[η(z(θi)), η(xobs)]}
In all cases, error
E[^g(θ|xobs)] − g(θ|xobs) = cb² + cδ² + OP(b² + δ²) + OP(1/nδ^d)
var(^g(θ|xobs)) = c/(nbδ^d) (1 + oP(1))
[standard NP calculations]
[Blum, JASA, 2010]
ABC-NCH
Incorporating non-linearities and heteroscedasticities:
θ* = ^m(η(xobs)) + [θ − ^m(η(z))] ^σ(η(xobs)) / ^σ(η(z))
where
^m(η) estimated by non-linear regression (e.g., neural network)
^σ(η) estimated by non-linear regression on residuals
log{θi − ^m(ηi)}² = log σ²(ηi) + ξi
[Blum & François, 2009]
ABC-NCH
Why neural network?
fights curse of dimensionality
selects relevant summary statistics
provides automated dimension reduction
offers a model choice capability
improves upon multinomial logistic
[Blum & François, 2009]
Convergence
4 Convergence
ABC inference consistency
ABC model choice consistency
ABC under misspecification
5 Advanced topics
Asymptotics of ABC
Since πABC(· | xobs) is an approximation of π(· | xobs) or
π(· | η(xobs)), the coherence of ABC-based inference needs to be
established on its own
[Li & Fearnhead, 2018a, 2018b; Frazier et al., 2018]
Meaning
establishing large sample (n) properties of ABC posteriors
and ABC procedures
finding sufficient conditions and checks on summary
statistics η(·)
determining proper rate of convergence of tolerance to 0 as
ε = εn
[mostly] ignoring Monte Carlo errors
Consistency of ABC posteriors
ABC algorithm Bayesian consistent at θ0 if for any δ > 0,
Π(‖θ − θ0‖ > δ | ρ{η(xobs), η(Z)} ≤ ε) → 0
as n → +∞, ε → 0
Bayesian consistency implies that sets containing θ0 have
posterior probability tending to one as n → +∞, with
implication being the existence of a specific rate of
concentration
Concentration around true value and Bayesian consistency
impose less stringent conditions on the convergence speed
of tolerance εn to zero, when compared with asymptotic
normality of ABC posterior
asymptotic normality of ABC posterior mean does not
require asymptotic normality of ABC posterior
Asymptotic setup
Assumptions:
asymptotic: xobs = xobs(n) ∼ Pθ0^n and ε = εn, n → +∞
parametric: θ ∈ Rk, k fixed
concentration of summary statistic η(zn):
∃b : θ → b(θ) such that ‖η(zn) − b(θ)‖ = oPθ(1), ∀θ
identifiability of parameter: b(θ) ≠ b(θ') when θ ≠ θ'
Consistency of ABC posteriors
Concentration of summary η(z): there exists b(θ) such that
‖η(z) − b(θ)‖ = oPθ(1)
Consistency:
Πεn(‖θ − θ0‖ ≤ δ | η(xobs)) = 1 + op(1)
Convergence rate: there exists δn = o(1) such that
Πεn(‖θ − θ0‖ ≤ δn | η(xobs)) = 1 + op(1)
Point estimator consistency
^θε = EABC[θ | η(xobs(n))],   EABC[θ | η(xobs(n))] − θ0 = op(1)
vn(EABC[θ | η(xobs(n))] − θ0) ⇒ N(0, v)
Related convergence results
Studies on the large sample properties of ABC, with focus on the
asymptotic properties of ABC point estimators
[Creel et al., 2015; Jasra, 2015; Li & Fearnhead, 2018a,b]
assumption of CLT for summary statistic plus regularity
assumptions on convergence of its density to Normal limit
convergence rate of posterior mean if εT = o(1/vT^(3/5))
acceptance probability chosen as arbitrary density of
η(xobs) − η(z)
characterisation of asymptotic behavior of ABC posterior
for all εT = o(1) with posterior concentration
asymptotic normality and unbiasedness of posterior mean
remain achievable even when limT vT εT = ∞, provided
εT = o(1/vT^(1/2))
if kη > kθ ≥ 1 and if εT = o(1/vT^(3/5)), posterior mean
asymptotically normal, unbiased, but asymptotically
inefficient
if vT εT → ∞ and vT εT² = o(1), for kη > kθ ≥ 1,
EΠε{vT(θ − θ0)} = {∇b(θ0)ᵀ ∇b(θ0)}⁻¹ ∇b(θ0)ᵀ vT{η(xobs) − b(θ0)} + op(1).
Asymptotic shape of posterior distribution
Shape of
Π(· | ρ{η(xobs), η(z)} ≤ εn)
depending on the relation between εn and the rate σn at which η(xobsn)
satisfies a CLT
Three different regimes:
1. σn = o(εn) −→ Uniform limit
2. σn ≍ εn −→ perturbed Gaussian limit
3. εn = o(σn) −→ Gaussian limit
asymptotic behaviour of posterior mean
When p = dim(η(xobs)) = d = dim(θ) and εn = o(n^(−3/10))
EABC[dT(θ − θ0) | xobs] ⇒ N(0, {(∇bo)ᵀ Σ⁻¹ ∇bo}⁻¹)
[Li & Fearnhead (2018a)]
In fact, if εn^(β+1) √n = o(1), with β the Hölder-smoothness of π,
EABC[(θ − θ0) | xobs] = (∇bo)⁻¹ Zo/√n + Σ_{j=1}^k hj(θ0) εn^(2j) + op(1),   2k = β
[Fearnhead & Prangle, 2012]
asymptotic behaviour of posterior mean
When p = dim(η(xobs)) = d = dim(θ) and εn = o(n^(−3/10))
EABC[dT(θ − θ0) | xobs] ⇒ N(0, {(∇bo)ᵀ Σ⁻¹ ∇bo}⁻¹)
[Li & Fearnhead (2018a)]
Iterating for fixed p mildly interesting: if
˜η(xobs) = EABC[θ | xobs]
then
EABC[θ | ˜η(xobs)] = θ0 + (∇bo)⁻¹ Zo/√n + {π'(θ0)/π(θ0)} εn² + o(·)
[Fearnhead & Prangle, 2012]
Curse of dimensionality
For reasonable statistical behavior, the acceptance probability
αT declines all the faster as the dimension kθ of θ grows, but is
unaffected by the dimension kη of η
Theoretical justification for dimension reduction methods
that process parameter components individually and
independently of other components
[Fearnhead & Prangle, 2012; Martin & al., 2016]
importance sampling approach of Li & Fearnhead (2018a)
yields acceptance rates αT = O(1), when εT = O(1/vT )
Monte Carlo error
Link the choice of εT to the Monte Carlo error associated with NT
draws in the ABC algorithm
Conditions (on εT) under which
^αT = αT{1 + op(1)}
where ^αT = Σ_{i=1}^{NT} 1l[d{η(y), η(z)} ≤ εT]/NT is the proportion of
accepted draws from NT simulated draws of θ
Either
(i) εT = o(vT⁻¹) and (vT εT)^(−kη) εT^(−kθ) ≲ MNT
or
(ii) εT ≳ vT⁻¹ and εT^(−kθ) ≲ MNT
for M large enough
Bayesian model choice
Several models M1, M2, . . . to be compared for dataset xobs
making model index M part of inference
Use of a prior distribution π(M = m), plus a prior distribution
on the parameter conditional on the value m of the model
index, πm(θm)
Goal to derive the posterior distribution of M, challenging
computational target when models are complex
Generic ABC for model choice
Algorithm 4 Likelihood-free model choice sampler (ABC-MC)
for t = 1 to T do
  repeat
    Generate m from the prior π(M = m)
    Generate θm from the prior πm(θm)
    Generate z from the model fm(z|θm)
  until ρ{η(z), η(xobs)} < ε
  Set m(t) = m and θ(t) = θm
end for
[Cornuet et al., DIYABC, 2009]
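A hedged R sketch of Algorithm 4 for the Gauss-versus-Laplace comparison of
Example 8 below, with (median, mad) as summaries; the N(0, 2) priors on both
location parameters are an arbitrary choice of the sketch.
set.seed(5)
rlaplace <- function(n, m, b) m + sample(c(-1, 1), n, TRUE)*rexp(n, 1/b)
xobs <- rnorm(100)                                  # data actually from model 1
sobs <- c(median(xobs), mad(xobs))
N <- 1e5
m <- sample(1:2, N, TRUE)                           # model index from its prior
th <- rnorm(N, 0, 2)                                # parameter from its prior
dist <- numeric(N)
for (t in 1:N) {
  z <- if (m[t] == 1) rnorm(100, th[t]) else rlaplace(100, th[t], 1/sqrt(2))
  dist[t] <- sum((c(median(z), mad(z)) - sobs)^2)
}
keep <- dist < quantile(dist, .001)
table(m[keep])/sum(keep)                            # ABC posterior probabilities of M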
ABC model choice consistency
Leaving approximations aside, limiting ABC procedure is Bayes
factor based on η(xobs)
B12(η(xobs))
Potential loss of information at the testing level
[Robert et al., 2010]
When is Bayes factor based on insufficient statistic η(xobs)
consistent?
[Marin et al., 2013]
Example 8: Gauss versus Laplace
Model M1: xobs ∼ N(θ1, 1)⊗n opposed to model M2:
xobs ∼ L(θ2, 1/√2)⊗n, Laplace distribution with mean θ2 and
variance one
Four possible statistics η(xobs)
1. sample mean x̄obs (sufficient for M1 if not for M2);
2. sample median med(xobs) (insufficient);
3. sample variance var(xobs) (ancillary);
4. median absolute deviation
mad(xobs) = med(|xobs − med(xobs)|);
Example 8: Gauss versus Laplace
[boxplots comparing the Gauss and Laplace models under the various summaries, n = 100]
Consistency
Summary statistics
η(xobs) = (τ1(xobs), τ2(xobs), · · · , τd(xobs)) ∈ Rd
with true distribution η ∼ Gn, true mean µ0, and distribution
Gi,n under model Mi, corresponding posteriors πi(· | ηn)
Assumptions of central limit theorem and large deviations for
η(xobs) under the true model, plus usual Bayesian asymptotics with di
effective dimension of the parameter
[Pillai et al., 2013]
Asymptotic marginals
Asymptotically
mi,n(t) = ∫_{Θi} gi,n(t|θi) πi(θi) dθi
such that
(i) if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0,
Cl √n^(d−di) ≤ mi,n(ηn) ≤ Cu √n^(d−di)
and
(ii) if inf{|µi(θi) − µ0|; θi ∈ Θi} > 0
mi,n(ηn) = oPn[√n^(d−τi) + √n^(d−αi)].
Between-model consistency
Consequence of the above is that the asymptotic behaviour of the Bayes
factor is driven by the asymptotic mean value µ(θ) of ηn under
both models. And only by this mean value!
Indeed, if
inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} = inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0
then
Cl √n^(−(d1−d2)) ≤ m1,n(ηn)/m2,n(ηn) ≤ Cu √n^(−(d1−d2)) ,
where Cl, Cu = OPn(1), irrespective of the true model.
© Only depends on the difference d1 − d2: no consistency
Else, if
inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} > inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0
then
m1,n(ηn)/m2,n(ηn) ≥ Cu min(√n^(−(d1−α2)), √n^(−(d1−τ2)))
Checking for adequate statistics
Run a practical check of the relevance (or non-relevance) of ηn:
null hypothesis that both models are compatible with the
statistic ηn
H0 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} = 0
against
H1 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} > 0
testing procedure provides estimates of mean of ηn under each
model and checks for equality
ABC under misspecification
ABC methods rely on simulations z(θ) from the model to
identify those close to xobs
What is happening when the model is wrong?
for some tolerance sequences εn ↓ ε∗, well-behaved ABC
posteriors that concentrate posterior mass on pseudo-true
value
if εn too large, asymptotic limit of ABC posterior uniform
with radius of order εn − ε∗
even if √n{εn − ε*} → 2c ∈ R, limiting distribution no
longer Gaussian
ABC credible sets invalid confidence sets
Example 9: Normal model with wrong variance
Assumed data generating process (DGP) is z ∼ N(θ, 1) but
actual DGP is xobs ∼ N(θ, σ̃²)
Use of summaries
sample mean η1(xobs) = (1/n) Σ_{i=1}^n xi
centered summary η2(xobs) = 1/(n−1) Σ_{i=1}^n (xi − η1(xobs))² − 1
Three ABC (ABC-AR sketched in R below):
ABC-AR: accept/reject approach with
Kε(d{η(z), η(xobs)}) = I(d{η(z), η(xobs)} ≤ ε) and
d{x, y} = ‖x − y‖
ABC-K: smooth rejection approach, with
Kε(d{η(z), η(xobs)}) univariate Gaussian kernel
ABC-Reg: post-processing ABC approach with weighted
linear regression adjustment
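A hedged R sketch of ABC-AR in this misspecified setting: pseudo-data are
N(θ, 1) while the observed data are N(θ0, 2²); the N(0, 10) prior and 0.1%
tolerance quantile are choices of the sketch.
set.seed(6)
n <- 100; theta0 <- 1
xobs <- rnorm(n, theta0, 2)                             # actual DGP, wrong variance
sumstat <- function(x) c(mean(x), var(x) - 1)           # (eta1, centered eta2)
sobs <- sumstat(xobs)
N <- 1e5
theta <- rnorm(N, 0, 10)
dist <- numeric(N)
for (t in 1:N) dist[t] <- sqrt(sum((sumstat(rnorm(n, theta[t])) - sobs)^2))
keep <- dist < quantile(dist, .001)
mean(theta[keep])     # concentrates on a pseudo-true value; note min(dist) stays away from 0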
Example 9: Normal model with wrong variance
[posterior means for ABC-AR, ABC-K and ABC-Reg as σ² increases
(N = 50,000 simulated data sets); αn = n^(−5/9) quantile for ABC-AR,
bandwidth of n^(−5/9) for ABC-K and ABC-Reg]
ABC misspecification
data xobs with true distribution P0 assumed issued from
model Pθ, θ ∈ Θ ⊂ Rkθ, and summary statistic
η(xobs) = (η1(xobs), ..., ηkη(xobs))
misspecification
inf_{θ∈Θ} D(P0||Pθ) = inf_{θ∈Θ} ∫ log{dP0(x)/dPθ(x)} dP0(x) > 0,
with
θ* = arg inf_{θ∈Θ} D(P0||Pθ)
[Müller, 2013]
ABC misspecification:
for b0 (resp. b(θ)) limit of η(xobs) (resp. η(z))
inf_{θ∈Θ} d{b0, b(θ)} > 0
ABC pseudo-true value:
θ* = arg inf_{θ∈Θ} d{b0, b(θ)}.
Minimum tolerance
Approximate nature of ABC means that D(P0||Pθ) yields no
meaningful notion of model misspecification associated with the
output of ABC posterior distributions
Under identification conditions on b(·) ∈ Rkη, there exists ε*
such that
ε* = inf_{θ∈Θ} d{b0, b(θ)} > 0
Once εn < ε*, no draw of θ is selected and the posterior
Πε[A|η(xobs)] is ill-conditioned
But an appropriately chosen tolerance sequence (εn)n allows the
ABC-based posterior to concentrate on the distance-dependent
pseudo-true value θ*
ABC concentration under misspecification
Assumptions
(A0) Existence of a unique b0 such that d(η(xobs), b0) = oP0(1)
and of a sequence v0,n → +∞ such that
lim inf_{n→+∞} P0[d(η(xobs_n), b0) ≤ v0,n⁻¹] = 1.
ABC concentration under misspecification
Assumptions
(A1) Existence of an injective map b : Θ → B ⊂ Rkη and a function
ρn with ρn(·) ↓ 0 as n → +∞, and ρn(·) non-increasing,
such that
Pθ[d(η(Z), b(θ)) > u] ≤ c(θ)ρn(u),   ∫_Θ c(θ)dΠ(θ) < ∞
and assume either
(i) Polynomial deviations: existence of vn ↑ +∞ and
u0, κ > 0 such that ρn(u) = vn^(−κ) u^(−κ), for u ≤ u0
(ii) Exponential deviations: existence of hθ(·) > 0 such
that Pθ[d(η(z), b(θ)) > u] ≤ c(θ)e^(−hθ(uvn)) and
existence of m, C > 0 such that
∫_Θ c(θ)e^(−hθ(uvn)) dΠ(θ) ≤ Ce^(−m·(uvn)^τ), for u ≤ u0.
ABC concentration under misspecification
Assumptions
(A2) existence of D > 0 and M0, δ0 > 0 such that, for all
δ0 ≥ δ > 0 and M ≥ M0, existence of
Sδ ⊂ {θ ∈ Θ : d(b(θ), b0) − ε* ≤ δ} for which
(i) In case (i), D < κ and ∫_{Sδ} (1 − c(θ)/M) dΠ(θ) ≳ δ^D.
(ii) In case (ii), ∫_{Sδ} (1 − c(θ)e^(−hθ(M))) dΠ(θ) ≳ δ^D.
Consistency
Assume (A0), (A1) and (A2), with εn ↓ ε* and
εn ≥ ε* + Mvn⁻¹ + v0,n⁻¹,
for M large enough. Let Mn ↑ ∞ and δn ≥ Mn(εn − ε*), then
Πε[d(b(θ), b0) ≥ ε* + δn | η(xobs)] = oP0(1),
1. if δn ≥ Mn vn⁻¹ un^(−D/κ) = o(1) in case (i)
2. if δn ≥ Mn vn⁻¹ |log(un)|^(1/τ) = o(1) in case (ii)
with un = εn − (ε* + Mvn⁻¹ + v0,n⁻¹) ≥ 0 .
[Bernton et al., 2017; Frazier et al., 2019]
Regression adjustment under misspecification
Original accepted value θ' artificially related to η(xobs) and η(z)
through the local linear regression model
θ' = µ + βᵀ{η(xobs) − η(z)} + ν,
where ν is the model residual
[Beaumont et al., 2003]
Asymptotic behavior of the ABC-Reg posterior
˜Πε[·|η(xobs)]
determined by the behavior of
Πε[·|η(xobs)], ^β, and {η(xobs) − η(z)}
Regression adjustment under misspecification
ABC-Reg takes draws of asymptotically optimal θ and
perturbs them in a manner that need not preserve
optimality of the original draws
for β0 large, pseudo-true value ˜θ* possibly outside Θ
extends to nonlinear regression adjustments (Blum
& François, 2010)
potential correction of the adjustment (Frazier et al., 2019)
all local regression adjustments have much smaller
posterior variability than ABC-AR but give a false sense of the
procedure's precision
Example 11: misspecified g-and-k
Quantile function of the g-and-k model:
F⁻¹ : q ∈ (0, 1) → a + b [1 + 0.8 (1 − exp(−g z(q)))/(1 + exp(−g z(q)))] (1 + z(q)²)^k z(q),
where z(q) is the q-th N(0, 1) quantile
Data generated from a mixture distribution with minor
bi-modality
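A short R sketch simulating from the g-and-k model through the quantile function
above; the parameter values are arbitrary.
rgk <- function(n, a, b, g, k) {
  z <- rnorm(n)                                   # z(q) for q ~ U(0, 1)
  a + b*(1 + .8*(1 - exp(-g*z))/(1 + exp(-g*z)))*(1 + z^2)^k*z
}
x <- rgk(1e4, a = 3, b = 1, g = 2, k = .5)
hist(x, breaks = 100, main = "g-and-k sample")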
Example 11: misspecified g-and-k
Advanced topics
5 Advanced topics
big data issues
big parameter issues
computational bottleneck
Time per iteration increases with the sample size n of the data:
the cost of sampling associated with a reasonable acceptance
probability makes ABC infeasible for large datasets
surrogate models to get samples (e.g., using copulas)
direct sampling of summary statistics (e.g., synthetic
likelihood)
borrow from proposals for scalable MCMC (e.g., divide
& conquer)
Approximate ABC [AABC]
Idea: approximations on both parameter and model spaces by
resorting to bootstrap techniques.
[Buzbas & Rosenberg, 2015]
Procedure scheme
1. Sample (θi, xi), i = 1, . . . , m, from prior predictive
2. Simulate θ∗ ∼ π(·) and assign weight wi to dataset x(i)
simulated under k-closest θi to θ∗ by Epanechnikov kernel
3. Generate dataset x∗ as bootstrap weighted sample from
(x(1), . . . , x(k))
Drawbacks
If m too small, prior predictive sample may miss
informative parameters
large error and misleading representation of true posterior
only well-suited for models with very few parameters
Divide-and-conquer perspectives
1. divide the large dataset into smaller batches
2. sample from the batch posterior
3. combine the result to get a sample from the targeted
posterior
Alternative via ABC-EP
[Barthelmé & Chopin, 2014]
Geometric combination: WASP
Subset posteriors: given a partition xobs_[1], . . . , xobs_[B] of the
observed data xobs, define
π(θ | xobs_[b]) ∝ π(θ) ∏_{j∈[b]} f(xobs_j | θ)^B .
[Srivastava et al., 2015]
Subset posteriors are combined using the Wasserstein
barycenter
[Cuturi, 2014]
Drawback: requires sampling from f(· | θ)^B by ABC means.
Should be feasible for latent variable (z) representations when
f(x | z, θ) is available in closed form
[Doucet & Robert, 2001]
Alternative: backfeed subset posteriors as priors to other
subsets
Consensus ABC
Naïve scheme
For each data batch b = 1, . . . , B
1. Sample (θ1^[b], . . . , θn^[b]) from the diffused prior
˜π(·) ∝ π(·)^(1/B)
2. Run ABC to sample from the batch posterior
^π(· | d(S(xobs_[b]), S(x[b])) < ε)
3. Compute the sample posterior variance Σb
Combine the batch posterior approximations
θj = (Σ_{b=1}^B Σb)⁻¹ Σ_{b=1}^B Σb θj^[b] .
Remark: the diffuse prior is quite non-informative, which calls for
ABC-MCMC to avoid sampling from ˜π(·).
Big parameter issues
Curse of dimensionality: as dim(Θ) increases
exploration of the parameter space gets harder
summary statistic η forced to increase in dimension, since at least of
dimension dim(Θ)
Some solutions
adopt more local algorithms like ABC-MCMC or
ABC-SMC
aim at posterior marginals and approximate the joint posterior
by a copula
[Li et al., 2016]
run ABC-Gibbs
[Clarté et al., 2016]
Example 9: Hierarchical MA(2)
α = (α1, α2, α3), with prior E(1)⊗3
σ = (σ1, σ2) with prior C⁺(1)⊗2
µi = (βi,1 − βi,2, 2(βi,1 + βi,2) − 1) with
(βi,1, βi,2, βi,3) iid∼ Dir(α1, α2, α3)
σi iid∼ IG(σ1, σ2)
xi iid∼ MA2(µi, σi)
[graphical model: hyperparameters α and σ above µ1, . . . , µn and σ1, . . . , σn, with x1, . . . , xn below]
Example 9: Hierarchical MA(2)
© Based on 10⁶ prior and 10³ posterior simulations, 3n
summary statistics, and series of length 100, the ABC posterior is
hardly distinguishable from the prior!
ABC-Gibbs
When the parameter is decomposed into θ = (θ1, . . . , θn)
Algorithm 5 ABC-Gibbs sampler
starting point θ(0) = (θ1^(0), . . . , θn^(0)), observation xobs
for i = 1, . . . , N do
  for j = 1, . . . , n do
    θj^(i) ∼ πεj(· | xobs, sj, θ1^(i), . . . , θj−1^(i), θj+1^(i−1), . . . , θn^(i−1))
  end for
end for
Divide & conquer:
one tolerance εj for each parameter θj
one statistic sj for each parameter θj
[Clarté et al., 2016]
Compatibility
When using ABC-Gibbs conditionals with different acceptance
events, e.g., different statistics
π(α)π(sα(µ) | α) and π(µ)f(sµ(xobs) | α, µ),
conditionals are incompatible
ABC-Gibbs does not necessarily converge (even for
tolerance equal to zero)
potential limiting distribution
not a genuine posterior (double use of data)
unknown [except for a specific version]
possibly far from genuine posterior(s)
[Clarté et al., 2016]
Convergence
In the hierarchical case n = 2,
Theorem
If there exists 0 < κ < 1/2 such that
sup_{θ1,˜θ1} ‖πε2(· | xobs, s2, θ1) − πε2(· | xobs, s2, ˜θ1)‖TV = κ
then the ABC-Gibbs Markov chain geometrically converges in total
variation to a stationary distribution νε, with geometric rate
1 − 2κ.
Example 9: Hierarchical MA(2)
Separation from the prior for identical number of simulations
ABC sessions at JSM [and further]
#107 - The ABC of Making an Impact, Monday
8:30-10:20am
#???
[A]BayesComp, Jan 7-10 2020
ABC in Grenoble, March 18-19 2020
ABC in Longyearbyen, April 8-9 2021
ABC postdoc positions
2 post-doc positions with the ABSint research grant:
Focus on approximate Bayesian techniques like ABC,
variational Bayes, PAC-Bayes, Bayesian non-parametrics,
scalable MCMC, and related topics. A potential direction
of research would be the derivation of new Bayesian tools
for model checking in such complex environments.
Terms: up to 24 months, no teaching duty attached,
primarily located at Université Paris-Dauphine, with
supported periods in Oxford (J. Rousseau) and visits to
Montpellier (J.-M. Marin). No hard deadline.
If interested, send application to
bayesianstatistics@gmail.com
More Related Content

What's hot

Inference in generative models using the Wasserstein distance [[INI]
Inference in generative models using the Wasserstein distance [[INI]Inference in generative models using the Wasserstein distance [[INI]
Inference in generative models using the Wasserstein distance [[INI]Christian Robert
 
ABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified modelsABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified modelsChristian Robert
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerChristian Robert
 
ABC short course: introduction chapters
ABC short course: introduction chaptersABC short course: introduction chapters
ABC short course: introduction chaptersChristian Robert
 
Convergence of ABC methods
Convergence of ABC methodsConvergence of ABC methods
Convergence of ABC methodsChristian Robert
 
Poster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferencePoster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferenceChristian Robert
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)Christian Robert
 
Approximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsApproximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsChristian Robert
 
random forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationrandom forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationChristian Robert
 
ABC short course: survey chapter
ABC short course: survey chapterABC short course: survey chapter
ABC short course: survey chapterChristian Robert
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussionChristian Robert
 
accurate ABC Oliver Ratmann
accurate ABC Oliver Ratmannaccurate ABC Oliver Ratmann
accurate ABC Oliver Ratmannolli0601
 
ABC short course: final chapters
ABC short course: final chaptersABC short course: final chapters
ABC short course: final chaptersChristian Robert
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testingChristian Robert
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?Christian Robert
 
ABC short course: model choice chapter
ABC short course: model choice chapterABC short course: model choice chapter
ABC short course: model choice chapterChristian Robert
 
Bayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsBayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsCaleb (Shiqiang) Jin
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsStefano Cabras
 

What's hot (20)

Inference in generative models using the Wasserstein distance [[INI]
Inference in generative models using the Wasserstein distance [[INI]Inference in generative models using the Wasserstein distance [[INI]
Inference in generative models using the Wasserstein distance [[INI]
 
ABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified modelsABC convergence under well- and mis-specified models
ABC convergence under well- and mis-specified models
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
ABC short course: introduction chapters
ABC short course: introduction chaptersABC short course: introduction chapters
ABC short course: introduction chapters
 
Convergence of ABC methods
Convergence of ABC methodsConvergence of ABC methods
Convergence of ABC methods
 
Poster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferencePoster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conference
 
ABC workshop: 17w5025
ABC workshop: 17w5025ABC workshop: 17w5025
ABC workshop: 17w5025
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
Approximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsApproximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forests
 
random forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationrandom forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimation
 
ABC short course: survey chapter
ABC short course: survey chapterABC short course: survey chapter
ABC short course: survey chapter
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 
accurate ABC Oliver Ratmann
accurate ABC Oliver Ratmannaccurate ABC Oliver Ratmann
accurate ABC Oliver Ratmann
 
ABC short course: final chapters
ABC short course: final chaptersABC short course: final chapters
ABC short course: final chapters
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testing
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
 
ABC short course: model choice chapter
ABC short course: model choice chapterABC short course: model choice chapter
ABC short course: model choice chapter
 
Bayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear modelsBayesian hybrid variable selection under generalized linear models
Bayesian hybrid variable selection under generalized linear models
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-Likelihoods
 

Similar to the ABC of ABC

Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinChristian Robert
 
Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
the ABC of ABC

  • 1. The ABC of ABC Christian P. Robert (U. Paris Dauphine PSL & Warwick U.) JSM 2019, Denver, Colorado
  • 2. Not to be confused with... The XKCD of ABC Christian P. Robert (U. Paris Dauphine PSL & Warwick U.) JSM 2019, Denver, Colorado
  • 3. Thanks Jean-Marie Cornuet, Arnaud Estoup (INRA Montpellier) David Frazier, Gael Martin (Monash U) Pierre Jacob, Natesh Pillai (Harvard U) Kerrie Mengersen (QUT) Jean-Michel Marin (U Montpellier) Judith Rousseau (U Oxford) Robin Ryder, Julien Stoehr (U Paris Dauphine)
  • 4. Outline 1 Motivation 2 Concept 3 Implementation 4 Convergence 5 Advanced topics
  • 5. Posterior distribution Central concept of Bayesian inference: π( θ parameter | xobs observation ) ∝ π(θ) prior × L(θ | xobs ) likelihood Drives derivation of optimal decisions assessment of uncertainty model selection [McElreath, 2015]
  • 6. Monte Carlo representation Exploration of posterior π(θ|xobs) may require to produce sample θ1, . . . , θT distributed from π(θ|xobs) (or asymptotically by Markov chain Monte Carlo) [McElreath, 2015]
  • 7. Difficulties MCMC = workhorse of practical Bayesian analysis (BUGS, JAGS, Stan, &tc.), except when product π(θ) × L(θ | xobs ) well-defined but numerically unavailable or too costly to compute Only partial solutions are available: demarginalisation (latent variables) exchange algorithm (auxiliary variables) pseudo-marginal (unbiased estimator)
  • 8. Difficulties MCMC = workhorse of practical Bayesian analysis (BUGS, JAGS, Stan, &tc.), except when product π(θ) × L(θ | xobs ) well-defined but numerically unavailable or too costly to compute Only partial solutions are available: demarginalisation (latent variables) exchange algorithm (auxiliary variables) pseudo-marginal (unbiased estimator)
  • 9. Example 1: Dynamic mixture Mixture model {1 − wµ,τ(x)}fβ,λ(x) + wµ,τ(x)gε,σ(x) x > 0 where fβ,λ Weibull density, gε,σ generalised Pareto density, and wµ,τ Cauchy cdf Intractable normalising constant C(µ, τ, β, λ, ε, σ) = ∞ 0 {(1 − wµ,τ(x))fβ,λ(x) + wµ,τ(x)gε,σ(x)} dx [Frigessi, Haug & Rue, 2002]
  • 10. Example 2: truncated Normal Given set A ⊂ Rk (k large), truncated Normal model f(x | µ, Σ, A) ∝ exp{−(x − µ)T Σ−1 (x − µ)/2} IA(x) with intractable normalising constant C(µ, Σ, A) = A exp{−(x − µ)T Σ−1 (x − µ)/2}dx
  • 11. Example 3: robust Normal statistics Normal sample x1, . . . , xn ∼ N(µ, σ2 ) summarised into (insufficient) ^µn = med(x1, . . . , xn) ^σn = mad(x1, . . . , xn) = med |xi − med(x1, . . . , xn)| Under a conjugate prior π(µ, σ2), posterior close to intractable. but simulation of (^µn, ^σn) straightforward
  • 12. Example 3: robust Normal statistics Normal sample x1, . . . , xn ∼ N(µ, σ2 ) summarised into (insufficient) ^µn = med(x1, . . . , xn) ^σn = mad(x1, . . . , xn) = med |xi − med(x1, . . . , xn)| Under a conjugate prior π(µ, σ2), posterior close to intractable. but simulation of (^µn, ^σn) straightforward
  • 13. Example 5: exponential random graph ERGM: binary random vector x indexed by all edges on set of nodes plus graph f(x | θ) = 1 C(θ) exp(θT S(x)) with S(x) vector of statistics and C(θ) intractable normalising constant [Grelaud & al., 2009; Everitt, 2012; Bouranis & al., 2017]
  • 14. Concept 2 Concept algorithm summary statistics random forests calibration of tolerance 3 Implementation 4 Convergence 5 Advanced topics
  • 15. A?B?C? A stands for approximate [wrong likelihood] B stands for Bayesian [right prior] C stands for computation [producing a parameter sample]
  • 16. A?B?C? Rough version of the data [from dot to ball] Non-parametric approximation of the likelihood [near actual observation] Use of non-sufficient statistics [dimension reduction] Monte Carlo error [and no unbiasedness]
  • 17. A?B?C? “Marin et al. (2012) proposed Approximate Bayesian Computation (ABC), which is more complex than MCMC, but outputs cruder estimates” [Heavens et al., 2017]
  • 18. A seemingly naïve representation When likelihood f(x|θ) not in closed form, likelihood-free rejection technique: ABC algorithm For an observation xobs ∼ f(x|θ), under the prior π(θ), keep jointly simulating θ′ ∼ π(θ), z ∼ f(z|θ′), until the auxiliary variable z is equal to the observed value, z = xobs [Diggle & Gratton, 1984; Rubin, 1984; Tavaré et al., 1997]
  • 19. A seemingly naïve representation When likelihood f(x|θ) not in closed form, likelihood-free rejection technique: ABC algorithm For an observation xobs ∼ f(x|θ), under the prior π(θ), keep jointly simulating θ′ ∼ π(θ), z ∼ f(z|θ′), until the auxiliary variable z is equal to the observed value, z = xobs [Diggle & Gratton, 1984; Rubin, 1984; Tavaré et al., 1997]
  • 20. Example 3: robust Normal statistics (R code of the rejection sampler; obs is the observed Normal sample)
    mu=rnorm(N<-1e6) #prior
    sig=1/sqrt(rgamma(N,2,2))
    medobs=median(obs) #summary
    madobs=mad(obs)
    for(t in diz<-1:N){
      psud=rnorm(1e2)*sig[t]+mu[t]
      medpsu=median(psud)-medobs
      madpsu=mad(psud)-madobs
      diz[t]=medpsu^2+madpsu^2}
    #ABC sample
    subz=which(diz<quantile(diz,.1))
    plot(mu[subz],sig[subz])
  • 21. Example 6: combinatorics If one draws 11 socks out of m socks made of f orphans and g pairs, with f + 2g = m, the number k of socks from the orphan group is hypergeometric H(11, m, f) and the probability to observe 11 orphan socks in total is Σ_{k=0}^{11} [C(f, k) C(2g, 11−k) / C(m, 11)] × [2^{11−k} C(g, 11−k) / C(2g, 11−k)], with C(a, b) denoting the binomial coefficient
  • 22. Example 6: combinatorics Given a prior distribution on nsocks and npairs, nsocks ∼ Neg(30, 15) npairs | nsocks ∼ ⌊nsocks/2⌋ × Be(15, 2) possible to 1. generate new values of nsocks and npairs, 2. generate a new observation of X, number of unique socks out of 11. 3. accept the pair (nsocks, npairs) if the realisation of X is equal to 11
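A minimal R sketch of this exact-matching rejection scheme; the negative binomial parameterisation through its mean and standard deviation, and the rounding of the pair proportion, are assumptions made for illustration:

    # rejection ABC for the sock example: keep (nsocks, npairs) when all 11 drawn socks are distinct
    N <- 1e5
    pm <- 30; psd <- 15                              # prior mean and sd of nsocks (assumed)
    nsocks <- rnbinom(N, size = pm^2/(psd^2 - pm), mu = pm)
    prop <- rbeta(N, 15, 2)                          # proportion of socks belonging to a pair
    npairs <- round(prop * floor(nsocks/2))
    nodd <- nsocks - 2*npairs                        # orphan socks
    unik <- sapply(1:N, function(i){
      socks <- c(rep(seq_len(npairs[i]), 2), npairs[i] + seq_len(nodd[i]))  # socks labelled by pair
      if (length(socks) < 11) return(0)              # cannot even draw 11 socks
      sum(!duplicated(sample(socks, 11)))            # number of distinct socks among the 11 drawn
    })
    keep <- (unik == 11)                             # exact matching of the observation X = 11
    hist(nsocks[keep], main = "ABC output on nsocks")

The retained pairs (nsocks[keep], npairs[keep]) are draws from the conditional distribution given X = 11 described on the next slide.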
  • 23. Example 6: combinatorics [histogram of the simulated values of nsocks] The outcome of this simulation method returns a distribution on the pair (nsocks, npairs) that is the conditional distribution of the pair given the observation X = 11
  • 24. Why does it work? The mathematical proof is trivial: f(θi) ∝ Σ_{z∈D} π(θi)f(z|θi) Iy(z) ∝ π(θi)f(y|θi) = π(θi|y). [Accept–Reject 101] But very impractical when Pθ(Z = xobs) ≈ 0
  • 25. Why does it work? The mathematical proof is trivial: f(θi) ∝ Σ_{z∈D} π(θi)f(z|θi) Iy(z) ∝ π(θi)f(y|θi) = π(θi|y). [Accept–Reject 101] But very impractical when Pθ(Z = xobs) ≈ 0
  • 26. A as approximative When y is a continuous random variable, strict equality z = xobs is replaced with a tolerance zone ρ(xobs, z) ≤ ε where ρ is a distance Output distributed from π(θ) Pθ{ρ(xobs, z) < ε} def ∝ π(θ | ρ(xobs, z) < ε) [Pritchard et al., 1999]
  • 27. A as approximative When y is a continuous random variable, strict equality z = xobs is replaced with a tolerance zone ρ(xobs, z) ≤ ε where ρ is a distance Output distributed from π(θ) Pθ{ρ(xobs, z) < ε} def ∝ π(θ | ρ(xobs, z) < ε) [Pritchard et al., 1999]
  • 28. ABC algorithm Algorithm 1 Likelihood-free rejection sampler for i = 1 to N do repeat generate θ′ from prior π(·) generate z from sampling density f(·|θ′) until ρ{η(z), η(xobs)} ≤ ε set θi = θ′ end for where η(xobs) defines a (not necessarily sufficient) statistic Custom: η(xobs) called the summary statistic
  • 29. ABC algorithm Algorithm 2 Likelihood-free rejection sampler for i = 1 to N do repeat generate θ′ from prior π(·) generate z from sampling density f(·|θ′) until ρ{η(z), η(xobs)} ≤ ε set θi = θ′ end for where η(xobs) defines a (not necessarily sufficient) statistic Custom: η(xobs) called the summary statistic
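A plain R transcription of this rejection sampler; the function and argument names below are illustrative, not taken from any package:

    # generic likelihood-free rejection sampler: rprior() draws from the prior,
    # simulate() generates pseudo-data given theta, eta() computes the summary,
    # rho() the distance between summaries, eps the tolerance
    abc_reject <- function(N, rprior, simulate, eta, rho, xobs, eps){
      sobs <- eta(xobs)
      out <- vector("list", N)
      for (i in 1:N){
        repeat{
          theta <- rprior()
          if (rho(eta(simulate(theta)), sobs) <= eps) break
        }
        out[[i]] <- theta
      }
      do.call(rbind, out)      # one accepted parameter per row
    }

For instance, the robust Normal example of slide 20 corresponds to rprior = function() c(rnorm(1), 1/sqrt(rgamma(1, 2, 2))), simulate = function(th) rnorm(100, th[1], th[2]), eta = function(x) c(median(x), mad(x)) and rho = function(a, b) sqrt(sum((a - b)^2)).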
  • 30. Exact ABC posterior Algorithm samples from marginal in z of posterior πABCε(θ, z|xobs) = π(θ)f(z|θ) IAε,xobs(z) / ∫Aε,xobs×Θ π(θ)f(z|θ) dz dθ, where Aε,xobs = {z ∈ D | ρ(η(z), η(xobs)) < ε}. Intuition that proper summary statistics coupled with small tolerance ε = εη should provide good approximation of the posterior distribution: πABCε(θ|xobs) = ∫ πABCε(θ, z|xobs) dz ≈ π(θ|η(xobs))
  • 31. Exact ABC posterior Algorithm samples from marginal in z of posterior πABCε(θ, z|xobs) = π(θ)f(z|θ) IAε,xobs(z) / ∫Aε,xobs×Θ π(θ)f(z|θ) dz dθ, where Aε,xobs = {z ∈ D | ρ(η(z), η(xobs)) < ε}. Intuition that proper summary statistics coupled with small tolerance ε = εη should provide good approximation of the posterior distribution: πABCε(θ|xobs) = ∫ πABCε(θ, z|xobs) dz ≈ π(θ|η(xobs))
  • 32. why summaries? reduction of dimension improvement of signal to noise ratio reduce tolerance ε considerably whole data may be unavailable
  • 33. Example 7: MA inference Moving average model MA(2): xt = εt + θ1εt−1 + θ2εt−2 εt iid ∼ N(0, 1) Comparison of raw series:
  • 34. Example 7: MA inference Moving average model MA(2): xt = εt + θ1εt−1 + θ2εt−2 εt iid ∼ N(0, 1) Comparison of acf’s:
  • 35. Example 7: MA inference Summary vs. raw:
  • 36. why not summaries? loss of sufficient information with πABC(θ|xobs) replaced with πABC(θ|η(xobs)) arbitrariness of summaries uncalibrated approximation whole data may be available (at same cost as summaries) distributions may be compared (Wasserstein distances) [Bernton et al., 2019]
  • 37. summary strategies Fundamental difficulty of selecting summary statistics when there is no non-trivial sufficient statistic [except when done by experimenters from the field]
  • 38. summary strategies Fundamental difficulty of selecting summary statistics when there is no non-trivial sufficient statistic [except when done by experimenters from the field] 1. Starting from large collection of summary statistics, Joyce and Marjoram (2008) consider the sequential inclusion into the ABC target, with a stopping rule based on a likelihood ratio test.
  • 39. summary strategies Fundamental difficulty of selecting summary statistics when there is no non-trivial sufficient statistic [except when done by experimenters from the field] 2. Based on decision-theoretic principles, Fearnhead and Prangle (2012) end up with E[θ|xobs] as the optimal summary statistic
  • 40. summary strategies Fundamental difficulty of selecting summary statistics when there is no non-trivial sufficient statistic [except when done by experimenters from the field] 3. Use of indirect inference by Drovandi, Pettitt & Faddy (2011) with estimators of parameters of auxiliary model as summary statistics.
  • 41. summary strategies Fundamental difficulty of selecting summary statistics when there is no non-trivial sufficient statistic [except when done by experimenters from the field] 4. Starting from large collection of summary statistics, Raynal & al. (2018, 2019) rely on random forests to build estimators and select summaries
  • 42. summary strategies Fundamental difficulty of selecting summary statistics when there is no non-trivial sufficient statistic [except when done by experimenters from the field] 5. Starting from large collection of summary statistics, Sedki & Pudlo (2012) use the Lasso to eliminate summaries
  • 43. Semi-automated ABC Use of summary statistic η(·), importance proposal g(·), kernel K(·) 1 with bandwidth h ↓ 0 such that (θ, z) ∼ g(θ)f(z|θ) accepted with probability (hence the bound) K[{η(z) − η(xobs )}/h] and the corresponding importance weight defined by π(θ) g(θ) Theorem Optimality of posterior expectation E[θ|xobs] of parameter of interest as summary statistics η(xobs) [Fearnhead & Prangle, 2012; Sisson et al., 2019]
  • 44. Semi-automated ABC Use of summary statistic η(·), importance proposal g(·), kernel K(·) 1 with bandwidth h ↓ 0 such that (θ, z) ∼ g(θ)f(z|θ) accepted with probability (hence the bound) K[{η(z) − η(xobs )}/h] and the corresponding importance weight defined by π(θ) g(θ) Theorem Optimality of posterior expectation E[θ|xobs] of parameter of interest as summary statistics η(xobs) [Fearnhead & Prangle, 2012; Sisson et al., 2019]
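A minimal R sketch of this construction on a toy Normal location model with a uniform pilot prior (both assumptions for illustration): simulate a pilot reference table, regress the parameter on a vector of candidate summaries, and use the fitted linear combination as the one-dimensional summary η(·):

    # semi-automatic summary a la Fearnhead & Prangle: estimate E[theta | x] by linear regression
    pilot_theta <- runif(1e4, -5, 5)                       # pilot draws (assumed uniform prior)
    S <- t(sapply(pilot_theta, function(th){
      z <- rnorm(50, mean = th)                            # pseudo-data under theta
      c(mean(z), median(z), var(z))                        # candidate summaries
    }))
    fit <- lm(pilot_theta ~ S)                             # linear approximation of E[theta | x]
    eta_hat <- function(x){                                # resulting scalar summary statistic
      s <- c(mean(x), median(x), var(x))
      sum(coef(fit) * c(1, s))
    }

eta_hat(xobs) then replaces the original summary vector in the rejection sampler sketched above.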
  • 45. Random forests Technique that stemmed from Leo Breiman’s bagging (or bootstrap aggregating) machine learning algorithm for both classification [testing] and regression [estimation] [Breiman, 1996] Improved performances by averaging over classification schemes of randomly generated training sets, creating a “forest” of (CART) decision trees, inspired by Amit and Geman (1997) ensemble learning [Breiman, 2001]
  • 46. Growing the forest Breiman’s solution for inducing random features in the trees of the forest: bootstrap resampling of the dataset and random subsetting [of size √t] of the covariates driving the classification or regression at every node of each tree Covariate (summary) xτ that drives the node separation xτ ≤ cτ and the separation bound cτ chosen by minimising entropy or Gini index
  • 47. ABC with random forests Idea: Starting with possibly large collection of summary statistics (η1, . . . , ηp) (from scientific theory input to available statistical softwares, methodologies, to machine-learning alternatives) ABC reference table involving model index, parameter values and summary statistics for the associated simulated pseudo-data run R randomforest to infer M or θ from (η1i, . . . , ηpi)
  • 48. ABC with random forests Idea: Starting with possibly large collection of summary statistics (η1, . . . , ηp) (from scientific theory input to available statistical softwares, methodologies, to machine-learning alternatives) ABC reference table involving model index, parameter values and summary statistics for the associated simulated pseudo-data run R randomforest to infer M or θ from (η1i, . . . , ηpi) at each step O( √ p) indices sampled at random and most discriminating statistic selected, by minimising Gini loss
  • 49. ABC with random forests Idea: Starting with possibly large collection of summary statistics (η1, . . . , ηp) (from scientific theory input to available statistical softwares, methodologies, to machine-learning alternatives) ABC reference table involving model index, parameter values and summary statistics for the associated simulated pseudo-data run R randomforest to infer M or θ from (η1i, . . . , ηpi) Average of the trees is resulting summary statistics, highly non-linear predictor of the model index
  • 50. ABC with random forests Idea: Starting with possibly large collection of summary statistics (η1, . . . , ηp) (from scientific theory input to available statistical softwares, methodologies, to machine-learning alternatives) ABC reference table involving model index, parameter values and summary statistics for the associated simulated pseudo-data run R randomforest to infer M or θ from (η1i, . . . , ηpi) Potential selection of most active summaries, calibrated against pure noise
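A sketch of the idea using the generic randomForest package (rather than the dedicated abcrf package); the two toy models, the candidate summaries and the priors below are illustrative assumptions:

    # random-forest ABC model choice from a simulated reference table
    library(randomForest)
    N <- 5e3; n <- 50
    m <- sample(1:2, N, replace = TRUE)                    # model index from a uniform prior
    theta <- rnorm(N, 0, 2)                                # common location parameter
    z <- matrix(rnorm(N * n), N, n)                        # model 1: Normal noise
    z[m == 2, ] <- matrix(rt(sum(m == 2) * n, df = 3), sum(m == 2), n)   # model 2: t3 noise
    z <- z + theta                                         # add the location, row by row
    eta <- t(apply(z, 1, function(x) c(mean(x), median(x), var(x), mad(x))))
    colnames(eta) <- paste0("eta", 1:4)
    rf <- randomForest(x = eta, y = factor(m), ntree = 500)
    xobs <- rnorm(n, 1)                                    # pretend observed dataset
    eta_obs <- setNames(c(mean(xobs), median(xobs), var(xobs), mad(xobs)), colnames(eta))
    predict(rf, newdata = rbind(eta_obs))                  # forest vote for the model index

The importance() output of the same forest points at the summaries that actually drive the classification.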
  • 51. Classification of summaries by random forests Given large collection of summary statistics, rather than selecting a subset and excluding the others, estimate each parameter random forests handles thousands of predictors ignores useless components fast estimation with good local properties automatised with few calibration steps substitute to Fearnhead and Prangle (2012) preliminary estimation of ^θ(yobs) includes a natural (classification) distance measure that avoids choice of either distance or tolerance [Marin et al., 2016]
  • 52. Calibration of tolerance Calibration of threshold ε from scratch [how small is small?] from k-nearest neighbour perspective [quantile of prior predictive] from asymptotics [convergence speed] related with choice of distance [automated selection by random forests] [Fearnhead & Prangle, 2012; Wilkinson, 2013; Liu & Fearnhead, 2018]
  • 54. Software Several abc R packages for performing parameter estimation and model selection [Nunes & Prangle, 2017]
  • 55. Software Several abc R packages for performing parameter estimation and model selection [Nunes & Prangle, 2017] abctools R package tuning ABC analyses
  • 56. Software Several abc R packages for performing parameter estimation and model selection [Nunes & Prangle, 2017] abcrf R package ABC via random forests
  • 57. Software Several abc R packages for performing parameter estimation and model selection [Nunes & Prangle, 2017] EasyABC R package several algorithms for performing efficient ABC sampling schemes, including four sequential sampling schemes and three MCMC schemes
  • 58. Software Several abc R packages for performing parameter estimation and model selection [Nunes & Prangle, 2017] DIYABC non R software for population genetics
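As an illustration of the first of these, a hedged sketch of a call to the abc package on the robust Normal example; the reference table construction and the tolerance are assumptions, and the abc() arguments (target, param, sumstat, tol, method) follow the package's standard interface:

    # ABC with the abc package on the robust Normal example (sketch)
    library(abc)
    obs <- rnorm(100, 2, 3)                                # stand-in observed sample
    theta <- cbind(mu = rnorm(1e5, 0, 10), sig = sqrt(1/rgamma(1e5, 2, 2)))
    sumstat <- t(apply(theta, 1, function(p){
      z <- rnorm(100, p[1], p[2]); c(med = median(z), mad = mad(z)) }))
    sobs <- c(med = median(obs), mad = mad(obs))
    out <- abc(target = sobs, param = theta, sumstat = sumstat, tol = .01, method = "loclinear")
    summary(out)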
  • 59. ABC-IS Basic ABC algorithm suggests simulating from the prior blind [no learning] inefficient [curse of dimension] inapplicable to improper priors Importance sampling version importance density g(θ) bounded kernel function Kh with bandwidth h acceptance probability of Kh{ρ[η(xobs ), η(x{θ})]} π(θ) g(θ) max θ Aθ [Fearnhead & Prangle, 2012]
  • 60. ABC-IS Basic ABC algorithm suggests simulating from the prior blind [no learning] inefficient [curse of dimension] inapplicable to improper priors Importance sampling version importance density g(θ) bounded kernel function Kh with bandwidth h acceptance probability of Kh{ρ[η(xobs ), η(x{θ})]} π(θ) g(θ) max θ Aθ [Fearnhead & Prangle, 2012]
  • 61. ABC-MCMC Markov chain (θ(t)) created via the transition function θ(t+1) = θ′ ∼ Kω(θ′|θ(t)) if x ∼ f(x|θ′) is such that x = y and u ∼ U(0, 1) satisfies u ≤ π(θ′)Kω(θ(t)|θ′) / π(θ(t))Kω(θ′|θ(t)), and θ(t+1) = θ(t) otherwise; it has the posterior π(θ|y) as stationary distribution [Marjoram et al., 2003]
  • 62. ABC-MCMC Algorithm 3 Likelihood-free MCMC sampler get (θ(0), z(0)) by Algorithm 1 for t = 1 to N do generate θ′ from Kω(·|θ(t−1)), z′ from f(·|θ′), u from U[0,1], if u ≤ [π(θ′)Kω(θ(t−1)|θ′) / π(θ(t−1))Kω(θ′|θ(t−1))] IAε,xobs(z′) then set (θ(t), z(t)) = (θ′, z′) else (θ(t), z(t)) = (θ(t−1), z(t−1)) end if end for
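A compact R sketch of this sampler on the robust Normal example; the uniform prior box and the random-walk scale are assumptions chosen so that the Metropolis–Hastings ratio reduces to the prior indicator:

    # ABC-MCMC (Marjoram et al., 2003) with a symmetric Gaussian random walk K_omega
    abc_mcmc <- function(T, xobs, eps, omega = 0.25){
      sobs <- c(median(xobs), mad(xobs))
      th <- matrix(NA, T, 2)
      th[1, ] <- sobs                                  # crude but sensible starting value
      for (t in 2:T){
        prop <- th[t - 1, ] + omega * rnorm(2)         # proposal theta' ~ K_omega(.|theta)
        th[t, ] <- th[t - 1, ]                         # default: stay put
        if (abs(prop[1]) < 10 && prop[2] > 0 && prop[2] < 10){   # uniform prior support
          z <- rnorm(length(xobs), prop[1], prop[2])   # pseudo-data under the proposal
          if (sqrt(sum((c(median(z), mad(z)) - sobs)^2)) <= eps)
            th[t, ] <- prop                            # accept: prior ratio = 1, K symmetric
        }
      }
      th
    }
    chain <- abc_mcmc(1e4, xobs = rnorm(100, 2, 3), eps = 0.5)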
  • 63. ABC-PMC Generate a sample at iteration t by ^πt(θ(t)) ∝ Σ_{j=1}^N ω(t−1)_j Kt(θ(t)|θ(t−1)_j) modulo acceptance of the associated xt, with tolerance εt ↓, and use importance weight associated with accepted simulation θ(t)_i: ω(t)_i ∝ π(θ(t)_i) / ^πt(θ(t)_i) c Still likelihood free [Sisson et al., 2007; Beaumont et al., 2009]
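A scalar-parameter R sketch of ABC-PMC under a N(0, 10) prior and a Normal location model, both deliberately simple assumptions; each iteration resamples from the previous weighted cloud, jitters with a Gaussian kernel, and reweights by prior over mixture:

    # ABC-PMC (Beaumont et al., 2009) for theta in z ~ N(theta, 1), summary = median
    abc_pmc <- function(xobs, eps_seq, N = 1e3){
      sobs <- median(xobs); n <- length(xobs)
      theta <- numeric(N)
      for (i in 1:N) repeat{                       # iteration 1: rejection from the prior
        th <- rnorm(1, 0, 10)
        if (abs(median(rnorm(n, th)) - sobs) <= eps_seq[1]){ theta[i] <- th; break } }
      w <- rep(1/N, N)
      for (t in 2:length(eps_seq)){
        tau <- sqrt(2 * var(theta))                # kernel scale from the previous sample
        newt <- numeric(N)
        for (i in 1:N) repeat{
          th <- rnorm(1, sample(theta, 1, prob = w), tau)
          if (abs(median(rnorm(n, th)) - sobs) <= eps_seq[t]){ newt[i] <- th; break } }
        neww <- dnorm(newt, 0, 10) /               # prior over kernel mixture
          sapply(newt, function(x) sum(w * dnorm(x, theta, tau)))
        w <- neww / sum(neww); theta <- newt
      }
      list(theta = theta, w = w)
    }
    out <- abc_pmc(rnorm(100, 2), eps_seq = c(1, .5, .25, .1))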
  • 64. ABC-SMC Use of a kernel Kt associated with target πεt and derivation of the backward kernel Lt−1(z, z ) = πεt (z )Kt(z , z) πεt (z) Update of the weights ω (t) i ∝ ω (t−1) i M m=1 IAεt (x (t) im ) M m=1 IAεt−1 (x (t−1) im ) when x (t) im ∼ Kt(x (t−1) i , ·) [Del Moral, Doucet & Jasra, 2009]
  • 65. ABC-NP Better usage of [prior] simulations by adjustment: instead of throwing away θ such that ρ(η(z), η(xobs)) > ε, replace θ’s with locally regressed transforms θ∗ = θ − {η(z) − η(xobs )}T ^β [Csilléry et al., TEE, 2010] where ^β is obtained by [NP] weighted least squares regression on (η(z) − η(xobs)) with weights Kδ ρ(η(z), η(xobs )) [Beaumont et al., 2002, Genetics]
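In R, the adjustment amounts to a weighted least-squares fit on the accepted simulations; a sketch, with theta the vector of simulated parameters, S the matrix of their summaries, sobs the observed summaries and delta the tolerance (all names illustrative):

    # local linear regression adjustment (Beaumont et al., 2002)
    adjust <- function(theta, S, sobs, delta){
      d <- sqrt(rowSums(sweep(S, 2, sobs)^2))      # distances to the observed summaries
      keep <- d <= delta
      w <- 1 - (d[keep]/delta)^2                   # Epanechnikov weights K_delta
      X <- sweep(S[keep, , drop = FALSE], 2, sobs) # eta(z) - eta(xobs)
      fit <- lm(theta[keep] ~ X, weights = w)
      drop(theta[keep] - X %*% coef(fit)[-1])      # theta* = theta - {eta(z)-eta(xobs)}' beta-hat
    }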
  • 66. ABC-NP (regression) Also found in the subsequent literature, e.g. in Fearnhead & Prangle (2012): weight directly simulation by Kδ ρ[η(z(θ)), η(xobs )] or 1 S S s=1 Kδ ρ[η(zs (θ)), η(xobs )] [consistent estimate of f(η|θ)] Curse of dimensionality: poor when d = dim(η) large
  • 67. ABC-NP (regression) Also found in the subsequent literature, e.g. in Fearnhead & Prangle (2012): weight directly simulation by Kδ ρ[η(z(θ)), η(xobs )] or 1 S S s=1 Kδ ρ[η(zs (θ)), η(xobs )] [consistent estimate of f(η|θ)] Curse of dimensionality: poor when d = dim(η) large
  • 68. ABC-NP (density estimation) Use of the kernel weights Kδ ρ[η(z(θ)), η(xobs )] leads to the NP estimate of the posterior expectation EABC [θ | xobs ] ≈ i θiKδ ρ[η(z(θi)), η(xobs)] i Kδ {ρ[η(z(θi)), η(xobs)]} [Blum, JASA, 2010]
  • 69. ABC-NP (density estimation) Use of the kernel weights Kδ ρ[η(z(θ)), η(xobs )] leads to the NP estimate of the posterior conditional density πABC [θ | xobs ] ≈ i ˜Kb(θi − θ)Kδ ρ[η(z(θi)), η(xobs)] i Kδ {ρ[η(z(θi)), η(xobs)]} [Blum, JASA, 2010]
  • 70. ABC-NP (density estimations) Other versions incorporating regression adjustments i ˜Kb(θ∗ i − θ)Kδ ρ[η(z(θi)), η(xobs)] i Kδ {ρ[η(z(θi)), η(xobs)]} In all cases, error E[^g(θ|xobs )] − g(θ|xobs ) = cb2 + cδ2 + OP(b2 + δ2 ) + OP(1/nδd ) var(^g(θ|xobs )) = c nbδd (1 + oP(1))
  • 71. ABC-NP (density estimations) Other versions incorporating regression adjustments i ˜Kb(θ∗ i − θ)Kδ ρ[η(z(θi)), η(xobs)] i Kδ {ρ[η(z(θi)), η(xobs)]} In all cases, error E[^g(θ|xobs )] − g(θ|xobs ) = cb2 + cδ2 + OP(b2 + δ2 ) + OP(1/nδd ) var(^g(θ|xobs )) = c nbδd (1 + oP(1)) [Blum, JASA, 2010]
  • 72. ABC-NP (density estimations) Other versions incorporating regression adjustments i ˜Kb(θ∗ i − θ)Kδ ρ[η(z(θi)), η(xobs)] i Kδ {ρ[η(z(θi)), η(xobs)]} In all cases, error E[^g(θ|xobs )] − g(θ|xobs ) = cb2 + cδ2 + OP(b2 + δ2 ) + OP(1/nδd ) var(^g(θ|xobs )) = c nbδd (1 + oP(1)) [standard NP calculations]
  • 73. ABC-NCH Incorporating non-linearities and heteroscedasticities: θ∗ = ^m(η(xobs)) + [θ − ^m(η(z))] ^σ(η(xobs)) / ^σ(η(z)) where ^m(η) estimated by non-linear regression (e.g., neural network) ^σ(η) estimated by non-linear regression on residuals log{θi − ^m(ηi)}² = log σ²(ηi) + ξi [Blum & François, 2009]
  • 74. ABC-NCH Incorporating non-linearities and heteroscedasticities: θ∗ = ^m(η(xobs)) + [θ − ^m(η(z))] ^σ(η(xobs)) / ^σ(η(z)) where ^m(η) estimated by non-linear regression (e.g., neural network) ^σ(η) estimated by non-linear regression on residuals log{θi − ^m(ηi)}² = log σ²(ηi) + ξi [Blum & François, 2009]
  • 75. ABC-NCH Why neural network? fights curse of dimensionality selects relevant summary statistics provides automated dimension reduction offers a model choice capability improves upon multinomial logistic [Blum & François, 2009]
  • 76. Convergence 4 Convergence ABC inference consistency ABC model choice consistency ABC under misspecification 5 Advanced topics
  • 77. Asymptotics of ABC Since πABC(· | xobs) is an approximation of π(· | xobs) or π(· | η(xobs)) coherence of ABC-based inference needs to be established on its own [Li & Fearnhead, 2018a, 2018b; Frazier et al., 2018] Meaning establishing large sample (n) properties of ABC posteriors and ABC procedures finding sufficient conditions and checks on summary statistics η(·) determining proper rate of convergence of tolerance to 0 as ε = εn [mostly] ignoring Monte Carlo errors
  • 78. Asymptotics of ABC Since πABC(· | xobs) is an approximation of π(· | xobs) or π(· | η(xobs)) coherence of ABC-based inference needs to be established on its own [Li & Fearnhead, 2018a, 2018b; Frazier et al., 2018] Meaning establishing large sample (n) properties of ABC posteriors and ABC procedures finding sufficient conditions and checks on summary statistics η(·) determining proper rate of convergence of tolerance to 0 as ε = εn [mostly] ignoring Monte Carlo errors
  • 79. Consistency of ABC posteriors ABC algorithm Bayesian consistent at θ0 if for any δ > 0, Π( ||θ − θ0|| > δ | ρ{η(xobs), η(Z)} ≤ ε ) → 0 as n → +∞, ε → 0 Bayesian consistency implies that sets containing θ0 have posterior probability tending to one as n → +∞, with implication being the existence of a specific rate of concentration
  • 80. Consistency of ABC posteriors ABC algorithm Bayesian consistent at θ0 if for any δ > 0, Π( ||θ − θ0|| > δ | ρ{η(xobs), η(Z)} ≤ ε ) → 0 as n → +∞, ε → 0 Concentration around true value and Bayesian consistency impose less stringent conditions on the convergence speed of tolerance εn to zero, when compared with asymptotic normality of ABC posterior asymptotic normality of ABC posterior mean does not require asymptotic normality of ABC posterior
  • 81. Asymptotic setup Assumptions: asymptotic: xobs = xobs(n) ∼ Pn θ0 and ε = εn, n → +∞ parametric: θ ∈ Rk, k fixed concentration of summary statistic η(zn): ∃b : θ → b(θ) η(zn ) − b(θ) = oPθ (1), ∀θ identifiability of parameter b(θ) = b(θ ) when θ = θ
  • 82. Consistency of ABC posteriors Concentration of summary η(z): there exists b(θ) such that η(z) − b(θ) = oPθ (1) Consistency: Πεn θ − θ0 δ|η(xobs ) = 1 + op(1) Convergence rate: there exists δn = o(1) such that Πεn θ − θ0 δn|η(xobs ) = 1 + op(1)
  • 83. Consistency of ABC posteriors Consistency: Πεn θ − θ0 δ|η(xobs ) = 1 + op(1) Convergence rate: there exists δn = o(1) such that Πεn θ − θ0 δn|η(xobs ) = 1 + op(1) Point estimator consistency ^θε = EABC[θ|η(xobs(n) )], EABC[θ|η(xobs(n) )] − θ0 = op(1) vn(EABC[θ|η(xobs(n) )] − θ0) ⇒ N(0, v)
  • 84. Related convergence results Studies on the large sample properties of ABC, with focus the asymptotic properties of ABC point estimators [Creel et al., 2015; Jasra, 2015; Li & Fearnhead, 2018a,b] assumption of CLT for summary statistic plus regularity assumptions on convergence of its density to Normal limit convergence rate of posterior mean if εT = o(1/v 3/5 T ) acceptance probability chosen as arbitrary density η(xobs) − η(z)
  • 85. Related convergence results Studies on the large sample properties of ABC, with focus the asymptotic properties of ABC point estimators [Creel et al., 2015; Jasra, 2015; Li & Fearnhead, 2018a,b] characterisation of asymptotic behavior of ABC posterior for all εT = o(1) with posterior concentration asymptotic normality and unbiasedness of posterior mean remain achievable even when limT vT εT = ∞, provided εT = o(1/v 1/2 T )
  • 86. Related convergence results Studies on the large sample properties of ABC, with focus the asymptotic properties of ABC point estimators [Creel et al., 2015; Jasra, 2015; Li & Fearnhead, 2018a,b] if kη > kθ 1 and if εT = o(1/v 3/5 T ), posterior mean asymptotically normal, unbiased, but asymptotically inefficient if vT εT → ∞ and vT ε2 T = o(1), for kη > kθ 1, EΠε {vT (θ−θ0)} = { θb(θ0) θb(θ0)}−1 θb(θ0) vT {η(xobs )−b(θ0)}+op(1).
  • 87. Asymptotic shape of posterior distribution Shape of Π · | η(xobs ), η(z) εn depending on relation between εn and rate σn at which η(xobsn ) satisfy CLT Three different regimes: 1. σn = o(εn) −→ Uniform limit 2. σn εn −→ perturbated Gaussian limit 3. σn εn −→ Gaussian limit
  • 88. asymptotic behaviour of posterior mean When p = dim(η(xobs)) = d = dim(θ) and εn = o(n−3/10) EABC[dT (θ − θ0)|xobs ] ⇒ N(0, ( bo )T Σ−1 bo −1 [Li & Fearnhead (2018a)] In fact, if εβ+1 n √ n = o(1), with β H¨older-smoothness of π EABC[(θ−θ0)|xobs ] = ( bo)−1Zo √ n + k j=1 hj(θ0)ε2j n +op(1), 2k = β [Fearnhead & Prangle, 2012]
  • 89. asymptotic behaviour of posterior mean When p = dim(η(xobs)) = d = dim(θ) and εn = o(n−3/10) EABC[dT (θ − θ0)|xobs ] ⇒ N(0, ( bo )T Σ−1 bo −1 [Li & Fearnhead (2018a)] Iterating for fixed p mildly interesting: if ˜η(xobs) = EABC[θ|xobs ] then EABC[θ|˜η(xobs )] = θ0 + ( bo)−1Zo √ n + π (θ0) π(θ0) ε2 n + o() [Fearnhead & Prangle, 2012]
  • 90. Curse of dimensionality For reasonable statistical behavior, decline of acceptance αT the faster the larger the dimension of θ, kθ, but unaffected by dimension of η, kη Theoretical justification for dimension reduction methods that process parameter components individually and independently of other components [Fearnhead & Prangle, 2012; Martin & al., 2016] importance sampling approach of Li & Fearnhead (2018a) yields acceptance rates αT = O(1), when εT = O(1/vT )
  • 91. Curse of dimensionality For reasonable statistical behavior, decline of acceptance αT the faster the larger the dimension of θ, kθ, but unaffected by dimension of η, kη Theoretical justification for dimension reduction methods that process parameter components individually and independently of other components [Fearnhead & Prangle, 2012; Martin & al., 2016] importance sampling approach of Li & Fearnhead (2018a) yields acceptance rates αT = O(1), when εT = O(1/vT )
  • 92. Monte Carlo error Link the choice of εT to Monte Carlo error associated with NT draws in ABC Algorithm Conditions (on εT ) under which ^αT = αT {1 + op(1)} where ^αT = NT i=1 1l [d{η(y), η(z)} εT ] /NT proportion of accepted draws from NT simulated draws of θ Either (i) εT = o(v−1 T ) and (vT εT )−kη ε−kθ T MNT or (ii) εT v−1 T and ε−kθ T MNT for M large enough
  • 93. Bayesian model choice Several models M1, M2, . . . to be compared for dataset xobs making model index M part of inference Use of a prior distribution π(M = m), plus a prior distribution on the parameter conditional on the value m of the model index, πm(θm) Goal to derive the posterior distribution of M, challenging computational target when models are complex
  • 94. Generic ABC for model choice Algorithm 4 Likelihood-free model choice sampler (ABC-MC) for t = 1 to T do repeat Generate m from the prior π(M = m) Generate θm from the prior πm(θm) Generate z from the model fm(z|θm) until ρ{η(z), η(xobs)} < ε Set m(t) = m and θ(t) = θm end for [Cornuet et al., DIYABC, 2009]
  • 95. ABC model choice consistency Leaving approximations aside, limiting ABC procedure is Bayes factor based on η(xobs) B12(η(xobs )) Potential loss of information at the testing level [Robert et al., 2010] When is Bayes factor based on insufficient statistic η(xobs) consistent? [Marin et al., 2013]
  • 96. ABC model choice consistency Leaving approximations aside, limiting ABC procedure is Bayes factor based on η(xobs) B12(η(xobs )) Potential loss of information at the testing level [Robert et al., 2010] When is Bayes factor based on insufficient statistic η(xobs) consistent? [Marin et al., 2013]
  • 97. Example 8: Gauss versus Laplace Model M1: xobs ∼ N(θ1, 1)⊗n opposed to model M2: xobs ∼ L(θ2, 1/ √ 2)⊗n, Laplace distribution with mean θ2 and variance one Four possible statistics η(xobs) 1. sample mean xobs (sufficient for M1 if not M); 2. sample median med(xobs) (insufficient); 3. sample variance var(xobs) (ancillary); 4. median absolute deviation mad(xobs) = med(|xobs − med(xobs)|);
  • 98. Example 8: Gauss versus Laplace Model M1: xobs ∼ N(θ1, 1)⊗n opposed to model M2: xobs ∼ L(θ2, 1/√2)⊗n, Laplace distribution with mean θ2 and variance one [boxplots comparing the Gauss and Laplace models under the competing summary statistics, n = 100]
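A self-contained R sketch of this comparison (the prior on the location and the 0.1% acceptance quantile are illustrative choices): simulate the reference table under both models, compute a chosen summary, and read off the ABC approximation of the posterior model probabilities:

    # ABC model choice between Normal and Laplace data, based on one summary statistic
    n <- 100; N <- 1e5
    xobs <- rnorm(n)                                  # data generated from model 1
    m <- sample(1:2, N, replace = TRUE)               # model index from a uniform prior
    theta <- rnorm(N, 0, 2)                           # common prior on the location
    z <- matrix(rnorm(N * n), N, n)                   # model 1: Normal(theta, 1) noise
    lap <- which(m == 2)                              # model 2: Laplace(theta, 1/sqrt(2)), variance one
    z[lap, ] <- matrix(rexp(length(lap) * n, sqrt(2)) - rexp(length(lap) * n, sqrt(2)),
                       length(lap), n)
    z <- z + theta
    eta <- apply(z, 1, mad)                           # e.g. the mad summary
    d <- abs(eta - mad(xobs))
    keep <- d <= quantile(d, .001)                    # tolerance as a small quantile of distances
    table(m[keep]) / sum(keep)                        # ABC posterior probabilities of M1 and M2

Swapping mad for the sample mean or the sample variance shows how strongly the answer depends on the chosen summary.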
  • 99. Consistency Summary statistics η(xobs ) = (τ1(xobs ), τ2(xobs ), · · · , τd(xobs )) ∈ Rd with true distribution η ∼ Gn, true mean µ0, and distribution Gi,n under model Mi, corresponding posteriors pi2(· | ηn) Assumptions of central limit theorem and large deviations for η(xobs) under the true model, plus usual Bayesian asymptotics (with di the effective dimension of the parameter) [Pillai et al., 2013]
  • 100. Asymptotic marginals Asymptotically mi,n(t) = ∫Θi gi,n(t|θi) πi(θi) dθi such that (i) if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0, Cl(√n)^{d−di} ≤ mi,n(ηn) ≤ Cu(√n)^{d−di} and (ii) if inf{|µi(θi) − µ0|; θi ∈ Θi} > 0, mi,n(ηn) = oPn[(√n)^{d−τi} + (√n)^{d−αi}].
  • 101. Between-model consistency Consequence of above is that asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value µ(θ) of ηn under both models. And only by this mean value!
  • 102. Between-model consistency Consequence of above is that asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value µ(θ) of ηn under both models. And only by this mean value! Indeed, if inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} = inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0 then Cl(√n)^{−(d1−d2)} ≤ m1,n(ηn)/m2,n(ηn) ≤ Cu(√n)^{−(d1−d2)}, where Cl, Cu = OPn(1), irrespective of the true model. c Only depends on the difference d1 − d2: no consistency
  • 103. Between-model consistency Consequence of above is that asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value µ(θ) of ηn under both models. And only by this mean value! Else, if inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} > inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0 then m1,n(ηn) m2,n(ηn) Cu min √ n −(d1−α2) , √ n −(d1−τ2)
  • 104. Checking for adequate statistics Run a practical check of the relevance (or non-relevance) of ηn null hypothesis that both models are compatible with the statistic ηn H0 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} = 0 against H1 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} > 0 testing procedure provides estimates of mean of ηn under each model and checks for equality
  • 105. ABC under misspecification ABC methods rely on simulations z(θ) from the model to identify those close to xobs What is happening when the model is wrong? for some tolerance sequences εn ↓ ε∗, well-behaved ABC posteriors that concentrate posterior mass on pseudo-true value if εn too large, asymptotic limit of ABC posterior uniform with radius of order εn − ε∗ even if √ n{εn − ε∗} → 2c ∈ R, limiting distribution no longer Gaussian ABC credible sets invalid confidence sets
  • 106. ABC under misspecification ABC methods rely on simulations z(θ) from the model to identify those close to xobs What is happening when the model is wrong? for some tolerance sequences εn ↓ ε∗, well-behaved ABC posteriors that concentrate posterior mass on pseudo-true value if εn too large, asymptotic limit of ABC posterior uniform with radius of order εn − ε∗ even if √ n{εn − ε∗} → 2c ∈ R, limiting distribution no longer Gaussian ABC credible sets invalid confidence sets
  • 107. Example 9: Normal model with wrong variance Assumed data generating process (DGP) is z ∼ N(θ, 1) but actual DGP is xobs ∼ N(θ, ˜σ2) Use of summaries: sample mean η1(xobs) = (1/n) Σ_{i=1}^n xi and centered summary η2(xobs) = (1/(n−1)) Σ_{i=1}^n (xi − η1(xobs))² − 1 Three ABC: ABC-AR: accept/reject approach with Kε(d{η(z), η(xobs)}) = I{d{η(z), η(xobs)} ≤ ε} and d{x, y} = ||x − y|| ABC-K: smooth rejection approach, with Kε(d{η(z), η(xobs)}) univariate Gaussian kernel ABC-Reg: post-processing ABC approach with weighted linear regression adjustment
  • 108. Example 9: Normal model with wrong variance posterior means for ABC-AR, ABC-K and ABC-Reg as σ2 increases (N = 50, 000 simulated data sets) αn = n−5/9 quantile for ABC-AR ABC-K and ABC-Reg bandwidth of n−5/9
  • 109. ABC misspecification data xobs with true distribution P0 assumed issued from model Pθ θ ∈ Θ ⊂ Rkθ and summary statistic η(xobs) = (η1(xobs), ..., ηkη (xobs)) misspecification inf_{θ∈Θ} D(P0||Pθ) = inf_{θ∈Θ} ∫ log[dP0(x)/dPθ(x)] dP0(x) > 0, with θ∗ = arg inf_{θ∈Θ} D(P0||Pθ) [Müller, 2013] ABC misspecification: for b0 (resp. b(θ)) limit of η(xobs) (resp. η(z)) inf_{θ∈Θ} d{b0, b(θ)} > 0
  • 110. ABC misspecification data xobs with true distribution P0 assumed issued from model Pθ θ ∈ Θ ⊂ Rkθ and summary statistic η(xobs) = (η1(xobs), ..., ηkη (xobs)) ABC misspecification: for b0 (resp. b(θ)) limit of η(xobs) (resp. η(z)) inf θ∈Θ d{b0, b(θ)} > 0 ABC pseudo-true value: θ∗ = arg inf θ∈Θ d{b0, b(θ)}.
  • 111. Minimum tolerance Approximate nature of ABC means that D(P0||Pθ) yields no meaningful notion of model misspecification associated with the output of ABC posterior distributions Under identification conditions on b(·) ∈ Rkη , there exists ε∗ such that ε∗ = inf θ∈Θ d{b0, b(θ)} > 0 Once εn < ε∗ no draw of θ to be selected and posterior Πε[A|η(xobs)] ill-conditioned But appropriately chosen tolerance sequence (εn)n allows ABC-based posterior to concentrate on distance-dependent pseudo-true value θ∗
  • 112. Minimum tolerance Approximate nature of ABC means that D(P0||Pθ) yields no meaningful notion of model misspecification associated with the output of ABC posterior distributions Under identification conditions on b(·) ∈ Rkη , there exists ε∗ such that ε∗ = inf θ∈Θ d{b0, b(θ)} > 0 Once εn < ε∗ no draw of θ to be selected and posterior Πε[A|η(xobs)] ill-conditioned But appropriately chosen tolerance sequence (εn)n allows ABC-based posterior to concentrate on distance-dependent pseudo-true value θ∗
  • 113. Minimum tolerance Approximate nature of ABC means that D(P0||Pθ) yields no meaningful notion of model misspecification associated with the output of ABC posterior distributions Under identification conditions on b(·) ∈ Rkη , there exists ε∗ such that ε∗ = inf θ∈Θ d{b0, b(θ)} > 0 Once εn < ε∗ no draw of θ to be selected and posterior Πε[A|η(xobs)] ill-conditioned But appropriately chosen tolerance sequence (εn)n allows ABC-based posterior to concentrate on distance-dependent pseudo-true value θ∗
  • 114. Minimum tolerance Approximate nature of ABC means that D(P0||Pθ) yields no meaningful notion of model misspecification associated with the output of ABC posterior distributions Under identification conditions on b(·) ∈ Rkη , there exists ε∗ such that ε∗ = inf θ∈Θ d{b0, b(θ)} > 0 Once εn < ε∗ no draw of θ to be selected and posterior Πε[A|η(xobs)] ill-conditioned But appropriately chosen tolerance sequence (εn)n allows ABC-based posterior to concentrate on distance-dependent pseudo-true value θ∗
  • 115. ABC concentration under misspecification Assumptions (A0) Existence of unique b0 such that d(η(xobs), b0) = oP0 (1) and of sequence v0,n → +∞ such that lim inf n→+∞ P0 d(η(xobs n ), b0) v−1 0,n = 1.
  • 116. ABC concentration under misspecification Assumptions (A1) Existence of injective map b : Θ → B ⊂ Rkη and function ρn with ρn(·) ↓ 0 as n → +∞, and ρn(·) non-increasing, such that Pθ [d(η(Z), b(θ)) > u] c(θ)ρn(u), Θ c(θ)dΠ(θ) < ∞ and assume either (i) Polynomial deviations: existence of vn ↑ +∞ and u0, κ > 0 such that ρn(u) = v−κ n u−κ, for u u0 (ii) Exponential deviations:
  • 117. ABC concentration under misspecification Assumptions (A1) Existence of injective map b : Θ → B ⊂ Rkη and function ρn with ρn(·) ↓ 0 as n → +∞, and ρn(·) non-increasing, such that Pθ [d(η(Z), b(θ)) > u] c(θ)ρn(u), Θ c(θ)dΠ(θ) < ∞ and assume either (i) Polynomial deviations: (ii) Exponential deviations: existence of hθ(·) > 0 such that Pθ[d(η(z), b(θ)) > u] c(θ)e−hθ(uvn) and existence of m, C > 0 such that Θ c(θ)e−hθ(uvn) dΠ(θ) Ce−m·(uvn)τ , for u u0.
  • 118. ABC concentration under misspecification Assumptions (A2) existence of D > 0 and M0, δ0 > 0 such that, for all δ0 δ > 0 and M M0, existence of Sδ ⊂ {θ ∈ Θ : d(b(θ), b0) − ε∗ δ} for which (i) In case (i), D < κ and Sδ 1 − c(θ) M dΠ(θ) δD. (ii) In case (ii), Sδ 1 − c(θ)e−hθ(M) dΠ(θ) δD.
  • 119. Consistency Assume (A0), [A1] and [A2], with εn ↓ ε∗ with εn ε∗ + Mv−1 n + v−1 0,n, for M large enough. Let Mn ↑ ∞ and δn Mn(εn − ε∗), then Πε d(b(θ), b0) ε∗ + δn|η(xobs ) = oP0 (1), 1. if δn Mnv−1 n u −D/κ n = o(1) in case (i) 2. if δn Mnv−1 n | log(un)|1/τ = o(1) in case (ii) with un = εn − (ε∗ + Mv−1 n + v−1 0,n) 0 . [Bernton et al., 2017; Frazier et al., 2019]
  • 120. Regression adjustment under misspecification Original accepted value θ artificially related to η(xobs) and η(z) through local linear regression model θ = µ + β {η(xobs) − η(z)} + ν, where ν is the model residual [Beaumont et al., 2003] Asymptotic behavior of ABC-Reg posterior ˜Πε[·|η(xobs)] determined by behavior of Πε[·|η(xobs)], ^β, and {η(xobs) − η(z)}
  • 121. Regression adjustment under misspecification Original accepted value θ artificially related to η(xobs) and η(z) through local linear regression model θ = µ + β {η(xobs) − η(z)} + ν, where ν is the model residual [Beaumont et al., 2003] Asymptotic behavior of ABC-Reg posterior ˜Πε[·|η(xobs)] determined by behavior of Πε[·|η(xobs)], ^β, and {η(xobs) − η(z)}
  • 122. Regression adjustment under misspecification ABC-Reg takes draws of asymptotically optimal θ and perturbs them in a manner that need not preserve optimality of the original draws for β0 large, pseudo-true value ˜θ∗ possibly outside Θ extends to nonlinear regression adjustments (Blum & François, 2010) potential correction of the adjustment (Frazier et al., 2019) all local regression adjustments have much smaller posterior variability than ABC-AR but a false sense of the procedure’s precision
  • 123. Example 11: misspecified g-and-k Quantile function of g-and-k model: F−1(q), q ∈ (0, 1), given by F−1(q) = a + b [1 + 0.8 (1 − exp(−g z(q))) / (1 + exp(−g z(q)))] (1 + z(q)²)^k z(q), where z(q) is the q-th N(0, 1) quantile Data generated from a mixture distribution with minor bi-modality
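The quantile form makes simulation by inversion immediate; a short R sketch (the parameter values at the end are purely illustrative):

    # g-and-k quantile function and inverse-cdf simulation
    qgk <- function(q, a, b, g, k){
      z <- qnorm(q)
      a + b * (1 + 0.8 * (1 - exp(-g * z)) / (1 + exp(-g * z))) * (1 + z^2)^k * z
    }
    rgk <- function(n, a, b, g, k) qgk(runif(n), a, b, g, k)
    zsim <- rgk(1e3, a = 3, b = 1, g = 2, k = 0.5)    # one pseudo-dataset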
  • 125. Advanced topics 5 Advanced topics big data issues big parameter issues
  • 126. computational bottleneck Time per iteration increases with sample size n of the data: cost of sampling associated with a reasonable acceptance probability makes ABC infeasible for large datasets surrogate models to get samples (e.g., using copulas) direct sampling of summary statistics (e.g., synthetic likelihood) borrow from proposals for scalable MCMC (e.g., divide & conquer)
  • 127. Approximate ABC [AABC] Idea approximations on both parameter and model spaces by resorting to bootstrap techniques. [Buzbas & Rosenberg, 2015] Procedure scheme 1. Sample (θi, xi), i = 1, . . . , m, from prior predictive 2. Simulate θ∗ ∼ π(·) and assign weight wi to dataset x(i) simulated under k-closest θi to θ∗ by Epanechnikov kernel 3. Generate dataset x∗ as bootstrap weighted sample from (x(1), . . . , x(k)) Drawbacks If m too small, prior predictive sample may miss informative parameters large error and misleading representation of true posterior only well-suited for models with very few parameters
  • 128. Approximate ABC [AABC] Procedure scheme 1. Sample (θi, xi), i = 1, . . . , m, from prior predictive 2. Simulate θ∗ ∼ π(·) and assign weight wi to dataset x(i) simulated under k-closest θi to θ∗ by Epanechnikov kernel 3. Generate dataset x∗ as bootstrap weighted sample from (x(1), . . . , x(k)) Drawbacks If m too small, prior predictive sample may miss informative parameters large error and misleading representation of true posterior only well-suited for models with very few parameters
  • 129. Divide-and-conquer perspectives 1. divide the large dataset into smaller batches 2. sample from the batch posterior 3. combine the result to get a sample from the targeted posterior Alternative via ABC-EP [Barthelmé & Chopin, 2014]
  • 130. Divide-and-conquer perspectives 1. divide the large dataset into smaller batches 2. sample from the batch posterior 3. combine the result to get a sample from the targeted posterior Alternative via ABC-EP [Barthelmé & Chopin, 2014]
  • 131. Geometric combination: WASP Subset posteriors given partition xobs [1] , . . . , xobs [B] of the observed data xobs, let define π(θ | xobs [b] ) ∝ π(θ) j∈[b] f(xobs j | θ)B . [Srivastava et al., 2015] Subset posteriors are combined using the Wasserstein barycenter [Cuturi, 2014] Drawback require sampling from f(· | θ)B by ABC means. Should be feasible for latent variable (z) representations when f(x | z, θ) available in closed form [Doucet & Robert, 2001]
  • 132. Geometric combination: WASP Subset posteriors given partition xobs [1] , . . . , xobs [B] of the observed data xobs, let define π(θ | xobs [b] ) ∝ π(θ) j∈[b] f(xobs j | θ)B . [Srivastava et al., 2015] Subset posteriors are combined using the Wasserstein barycenter [Cuturi, 2014] Alternative backfeed subset posteriors as priors to other subsets
  • 133. Consensus ABC Na¨ıve scheme For each data batch b = 1, . . . , B 1. Sample (θ [b] 1 , . . . , θ [b] n ) from diffused prior ˜π(·) ∝ π(·)1/B 2. Run ABC to sample from batch posterior ^π(· | d(S(xobs [b] ), S(x[b])) < ε) 3. Compute sample posterior variance Σb Combine batch posterior approximations θj = B b=1 Σb −1 B b=1 Σbθ [b] j . Remark Diffuse prior quite non informative which call for ABC-MCMC to avoid sampling from ˜π(·).
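A minimal sketch of the combination step, following the displayed formula literally (weights Σ_b); consensus Monte Carlo schemes more commonly weight by the inverse covariances Σ_b^{-1}, so this is one possible convention rather than the definitive one:

```python
import numpy as np

def combine_consensus(thetas, sigmas):
    """Combine batch ABC draws theta_j^[b] into consensus draws
    theta_j = (sum_b Sigma_b)^{-1} sum_b Sigma_b theta_j^[b].
    thetas: array (B, N, d) of batch draws; sigmas: list of B (d, d) covariances."""
    B, N, d = thetas.shape
    total = np.sum(sigmas, axis=0)                 # sum_b Sigma_b
    combined = np.empty((N, d))
    for j in range(N):
        acc = np.zeros(d)
        for b in range(B):
            acc += sigmas[b] @ thetas[b, j]        # Sigma_b theta_j^[b]
        combined[j] = np.linalg.solve(total, acc)
    return combined
```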
• 135. Big parameter issues
Curse of dimensionality: as dim(Θ) increases
exploration of the parameter space gets harder
the summary statistic η is forced to grow, since it must be at least of dimension dim(Θ)
Some solutions
adopt more local algorithms like ABC-MCMC or ABC-SMC
aim at posterior marginals and approximate the joint posterior by a copula (see the sketch below) [Li et al., 2016]
run ABC-Gibbs [Clarté et al., 2016]
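As an illustration of the copula route, here is a rough sketch of a Gaussian-copula recombination of marginal ABC draws; it assumes the draws are paired across margins so a correlation matrix can be estimated directly, which is a simplification of the actual scheme in Li et al. (2016):

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_joint(marginal_draws, n_out=10_000, rng=None):
    """Recombine d marginal ABC posteriors into an approximate joint:
    fit a Gaussian copula on normal scores, sample it, then map back
    through the empirical marginal quantiles.
    marginal_draws: array (N, d) of ABC draws, one column per parameter."""
    rng = np.random.default_rng(rng)
    N, d = marginal_draws.shape
    ranks = marginal_draws.argsort(axis=0).argsort(axis=0) + 1
    z = norm.ppf(ranks / (N + 1))                  # normal scores per margin
    R = np.corrcoef(z, rowvar=False)               # copula correlation matrix
    g = rng.multivariate_normal(np.zeros(d), R, size=n_out)
    u = norm.cdf(g)
    return np.column_stack([np.quantile(marginal_draws[:, j], u[:, j])
                            for j in range(d)])
```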
• 136. Example 9: Hierarchical MA(2)
α = (α1, α2, α3) with prior E(1)^{⊗3}, σ = (σ1, σ2) with prior C^+(1)^{⊗2}
(βi,1, βi,2, βi,3) iid∼ Dir(α1, α2, α3), µi = (βi,1 − βi,2, 2(βi,1 + βi,2) − 1)
σi iid∼ IG(σ1, σ2), xi ∼ MA2(µi, σi)
[Graph: hyperparameters α and σ on top, individual (µi, σi), i = 1, . . . , n, in the middle, series x1, . . . , xn at the bottom]
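A sketch of the prior-predictive simulator for this hierarchical MA(2) model, with the hyperparameters (α, σ) taken as fixed inputs; the inverse-gamma convention (shape σ1, scale σ2, acting on the noise variance) is an assumption where the slide leaves it implicit:

```python
import numpy as np

def simulate_hierarchical_ma2(n_series, T, alpha, sigma_hyper, rng=None):
    """Draw n_series MA(2) series of length T from the hierarchical model:
    Dirichlet -> MA(2) coefficients mu_i, inverse gamma -> noise variances."""
    rng = np.random.default_rng(rng)
    beta = rng.dirichlet(alpha, size=n_series)                  # (beta_i1, beta_i2, beta_i3)
    mu = np.column_stack([beta[:, 0] - beta[:, 1],
                          2 * (beta[:, 0] + beta[:, 1]) - 1])   # reparametrised coefficients
    var = 1 / rng.gamma(sigma_hyper[0], 1 / sigma_hyper[1], size=n_series)  # IG(sigma1, sigma2)
    x = np.empty((n_series, T))
    for i in range(n_series):
        eps = rng.normal(0.0, np.sqrt(var[i]), T + 2)
        x[i] = eps[2:] + mu[i, 0] * eps[1:-1] + mu[i, 1] * eps[:-2]   # MA(2) recursion
    return x, mu, var
```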
• 137. Example 9: Hierarchical MA(2)
Based on 10^6 prior and 10^3 posterior simulations, 3n summary statistics, and series of length 100, the ABC posterior is hardly distinguishable from the prior!
• 138. ABC-Gibbs
When the parameter decomposes into θ = (θ1, . . . , θn) (see the sketch below)
Algorithm 5 ABC-Gibbs sampler
starting point θ^{(0)} = (θ^{(0)}_1, . . . , θ^{(0)}_n), observation x^{obs}
for i = 1, . . . , N do
  for j = 1, . . . , n do
    θ^{(i)}_j ∼ π_{ε_j}(· | x^{obs}, s_j, θ^{(i)}_1, . . . , θ^{(i)}_{j−1}, θ^{(i−1)}_{j+1}, . . . , θ^{(i−1)}_n)
  end for
end for
Divide & conquer:
one tolerance ε_j for each parameter θ_j
one statistic s_j for each parameter θ_j
[Clarté et al., 2016]
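A minimal sketch of the sampler above; the helpers sample_prior_cond (draw θ_j from its prior conditional given the other components) and simulate_model are hypothetical placeholders for the model at hand, and each component update is a plain rejection-ABC step with its own summary s_j and tolerance ε_j:

```python
import numpy as np

def abc_gibbs(x_obs, theta0, summaries, tolerances,
              sample_prior_cond, simulate_model, n_iter=1000, rng=None):
    """Cycle over components j = 1..n; update theta_j by rejection ABC
    conditional on the current values of the other components."""
    rng = np.random.default_rng(rng)
    theta = np.array(theta0, dtype=float)
    n = len(theta)
    chain = np.empty((n_iter, n))
    for i in range(n_iter):
        for j in range(n):
            s_obs = summaries[j](x_obs)
            while True:                                    # rejection ABC for component j
                prop = sample_prior_cond(j, theta, rng)    # theta_j | theta_{-j} from the prior
                cand = theta.copy()
                cand[j] = prop
                x_sim = simulate_model(cand, rng)
                if np.linalg.norm(summaries[j](x_sim) - s_obs) < tolerances[j]:
                    theta[j] = prop
                    break
        chain[i] = theta
    return chain
```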
• 140. Compatibility
When ABC-Gibbs conditionals are based on different acceptance events, e.g., different statistics
π(α) π(s_α(µ) | α) and π(µ) f(s_µ(x^{obs}) | α, µ),
the conditionals are incompatible:
ABC-Gibbs does not necessarily converge (even for a tolerance equal to zero)
the potential limiting distribution
  is not a genuine posterior (double use of the data)
  is unknown [except for a specific version]
  is possibly far from the genuine posterior(s)
[Clarté et al., 2016]
• 142. Convergence
In the hierarchical case n = 2,
Theorem. If there exists 0 < κ < 1/2 such that
\sup_{\theta_1, \tilde\theta_1} \big\| \pi_{\varepsilon_2}(\cdot \mid x^{\text{obs}}, s_2, \theta_1) - \pi_{\varepsilon_2}(\cdot \mid x^{\text{obs}}, s_2, \tilde\theta_1) \big\|_{TV} = \kappa,
then the ABC-Gibbs Markov chain converges geometrically in total variation to a stationary distribution ν_ε, with geometric rate 1 − 2κ.
• 143. Example 9: Hierarchical MA(2)
Separation from the prior for an identical number of simulations
• 144. ABC sessions at JSM [and further]
#107 - The ABC of Making an Impact, Monday 8:30-10:20am
#??? [A]BayesComp, Jan 7-10 2020
ABC in Grenoble, March 18-19 2020
ABC in Longyearbyen, April 8-9 2021
• 146. ABC postdoc positions
2 post-doc positions with the ABSint research grant:
Focus on approximate Bayesian techniques like ABC, variational Bayes, PAC-Bayes, Bayesian non-parametrics, scalable MCMC, and related topics. A potential direction of research would be the derivation of new Bayesian tools for model checking in such complex environments.
Terms: up to 24 months, no teaching duty attached, primarily located in Université Paris-Dauphine, with supported periods in Oxford (J. Rousseau) and visits to Montpellier (J.-M. Marin). No hard deadline.
If interested, send application to bayesianstatistics@gmail.com