ABC-Gibbs
Christian P. Robert
U. Paris Dauphine PSL & Warwick U. & CREST
ENSAE
New(eb)castle, 13 Nov. 2019
Outline
1 Motivation
2 Principle and calibration
algorithm
summary statistics
random forests
calibration of tolerance
3 Further approximations
big data issues
big parameter issues
Posterior distribution
Central concept of Bayesian inference:
π( θ | xobs ) ∝ π(θ) × L(θ | xobs)
with θ the parameter, xobs the observation, π(θ) the prior, and L(θ | xobs) the likelihood
Drives
derivation of optimal decisions
assessment of uncertainty
model selection
prediction
[McElreath, 2015]
Monte Carlo representation
Exploration of the Bayesian posterior π(θ|xobs) may (!) require producing a sample
θ1, . . . , θT
distributed from π(θ|xobs) (or asymptotically by Markov chain
Monte Carlo aka MCMC)
[McElreath, 2015]
Difficulties
MCMC = workhorse of practical Bayesian analysis (BUGS,
JAGS, Stan, &tc.), except when the product
π(θ) × L(θ | xobs)
is well-defined but numerically unavailable or too costly to
compute
Only partial solutions are available:
demarginalisation (latent variables)
exchange algorithm (auxiliary variables)
pseudo-marginal (unbiased estimator)
Example 1: Dynamic mixture
Mixture model
{1 − wµ,τ(x)} fβ,λ(x) + wµ,τ(x) gε,σ(x) , x > 0
where
fβ,λ Weibull density,
gε,σ generalised Pareto density, and
wµ,τ Cauchy (arctan) cdf
Intractable normalising constant
C(µ, τ, β, λ, ε, σ) = ∫_0^∞ {(1 − wµ,τ(x)) fβ,λ(x) + wµ,τ(x) gε,σ(x)} dx
[Frigessi, Haug & Rue, 2002]
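A minimal R sketch of this constant by one-dimensional quadrature; the generalised Pareto density below (threshold 0, shape written xi to avoid a clash with the tolerance ε), the helper names and the parameter values are all illustrative assumptions:

dgpd=function(x,xi,sigma) # generalised Pareto density, threshold 0, shape xi>0
  (1/sigma)*(1+xi*x/sigma)^(-1/xi-1)
dynmix=function(x,mu,tau,beta,lambda,xi,sigma){
  w=pcauchy(x,location=mu,scale=tau) # arctan (Cauchy cdf) weight
  (1-w)*dweibull(x,shape=beta,scale=lambda)+w*dgpd(x,xi,sigma)}
# normalising constant at illustrative parameter values
integrate(dynmix,0,Inf,mu=1,tau=.5,beta=2,lambda=1,xi=.3,sigma=1)$value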
Example 2: truncated Normal
Given set A ⊂ Rk (k large), truncated Normal model
f(x | µ, Σ, A) ∝ exp{−(x − µ)ᵀΣ⁻¹(x − µ)/2} I_A(x)
with intractable normalising constant
C(µ, Σ, A) = ∫_A exp{−(x − µ)ᵀΣ⁻¹(x − µ)/2} dx
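A minimal Monte Carlo sketch of this constant, assuming A is supplied as an indicator function and k is moderate enough for plain Monte Carlo to hit A; the function name and the example region are illustrative:

truncnormconst=function(mu,Sigma,inA,M=1e5){
  k=length(mu)
  L=t(chol(Sigma)) # Sigma = L %*% t(L)
  x=mu+L%*%matrix(rnorm(M*k),k,M) # M draws from N(mu,Sigma), one per column
  phat=mean(apply(x,2,inA)) # estimate of P(X in A)
  (2*pi)^(k/2)*sqrt(det(Sigma))*phat} # C(mu,Sigma,A)
truncnormconst(rep(0,3),diag(3),function(x) all(x>0)) # A = positive orthant, k = 3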
Example 3: robust Normal statistics
Normal sample
x1, . . . , xn ∼ N(µ, σ²)
summarised into (insufficient)
^µn = med(x1, . . . , xn)
and
^σn = mad(x1, . . . , xn) = med |xi − ^µn|
Under a conjugate prior π(µ, σ²), the posterior is close to intractable,
but simulation of (^µn, ^σn) is straightforward
Realistic[er] applications
Kingman’s coalescent in population genetics
[Tavaré et al., 1997; Beaumont et al., 2003]
α-stable distributions
[Peters et al., 2012]
complex networks
[Dutta et al., 2018]
astrostatistics & cosmostatistics
[Cameron & Pettitt, 2012; Ishida et al., 2015]
Principle and calibration
2 Principle and calibration
algorithm
summary statistics
random forests
calibration of tolerance
3 Further approximations
A?B?C?
A stands for approximate [wrong
likelihood]
B stands for Bayesian [right prior]
C stands for computation [producing
a parameter sample]
A?B?C?
Rough version of the data [from dot
to ball]
Non-parametric approximation of
the likelihood [near actual
observation]
Use of non-sufficient statistics
[dimension reduction]
Monte Carlo error [and no
unbiasedness]
A seemingly naïve representation
When likelihood f(x|θ) not in closed form, likelihood-free
rejection technique:
ABC-AR algorithm
For an observation xobs ∼ f(x|θ), under the prior π(θ), keep
jointly simulating
θ′ ∼ π(θ) , z ∼ f(z | θ′) ,
until the auxiliary variable z is equal to the observed value,
z = xobs
[Diggle & Gratton, 1984; Rubin, 1984; Tavaré et al., 1997]
A as approximative
When the data is a continuous random variable, strict equality
z = xobs
is replaced with a tolerance zone
ρ(xobs, z) ≤ ε
where ρ is a distance
Output distributed from
π(θ) Pθ{ρ(xobs, z) < ε} ∝ π(θ | ρ(xobs, z) < ε) [by definition]
[Pritchard et al., 1999]
ABC algorithm
Algorithm 1 Likelihood-free rejection sampler
for i = 1 to N do
repeat
generate θ′ from prior π(·)
generate z from sampling density f(·|θ′)
until ρ{η(z), η(xobs)} ≤ ε
set θi = θ′
end for
where η(xobs) defines a (not necessarily sufficient) statistic
Custom: η(xobs) called summary statistic
Example 3: robust Normal statistics
# obs: the observed Normal sample (assumed available)
mu=rnorm(N<-1e6) #prior draws of mu
sig=sqrt(rgamma(N,2,2)) #root precision, i.e. sd=1/sig
medobs=median(obs)
madobs=mad(obs) #observed summaries
for(t in diz<-1:N){
psud=rnorm(1e2)/sig[t]+mu[t] #pseudo-sample of size 100
medpsu=median(psud)-medobs
madpsu=mad(psud)-madobs
diz[t]=medpsu^2+madpsu^2} #squared distance between summaries
#ABC subsample: keep the 10% closest simulations
subz=which(diz<quantile(diz,.1))
Why summaries?
reduction of dimension
improvement of signal to noise ratio
reduce tolerance ε considerably
whole data may be unavailable (as in Example 3)
medobs=median(obs)
madobs=mad(obs) #summary
Why not summaries?
loss of sufficient information when πABC(θ|xobs) replaced
with πABC(θ|η(xobs))
arbitrariness of summaries
uncalibrated approximation
whole data may be available (at same cost as summaries)
(empirical) distributions may be compared (Wasserstein
distances)
[Bernton et al., 2019]
Summary selection strategies
Fundamental difficulty of selecting summary statistics when
there is no non-trivial sufficient statistic [except when done by
experimenters from the field]
1. Starting from large collection of summary statistics, Joyce
and Marjoram (2008) consider the sequential inclusion into the
ABC target, with a stopping rule based on a likelihood ratio test
2. Based on decision-theoretic principles, Fearnhead and
Prangle (2012) end up with E[θ|xobs] as the optimal summary
statistic
3. Use of indirect inference by Drovandi, Pettitt & Faddy
(2011) with estimators of parameters of auxiliary model as
summary statistics
4. Starting from large collection of summary statistics, Raynal
et al. (2018, 2019) rely on random forests to build estimators
and select summaries
5. Starting from large collection of summary statistics, Sedki &
Pudlo (2012) use the Lasso to eliminate summaries
ABC with random forests
Idea: Starting with
possibly large collection of summary statistics (η1, . . . , ηp)
(from scientific theory input to available statistical
software and methodologies, to machine-learning alternatives)
ABC reference table involving model index, parameter
values and summary statistics for the associated simulated
pseudo-data
run R randomforest to infer M or θ from (η1i, . . . , ηpi)
Average of the trees is resulting summary statistics, highly
non-linear predictor of the model index
Potential selection of most active summaries, calibrated against
pure noise
Classification of summaries by random forests
Given large collection of summary statistics, rather than
selecting a subset and excluding the others, estimate each
parameter by random forests
handles thousands of predictors
ignores useless components
fast estimation with good local properties
automatised with few calibration steps
substitute to Fearnhead and Prangle (2012) preliminary
estimation of ^θ(yobs)
includes a natural (classification) distance measure that
avoids choice of either distance or tolerance
[Marin et al., 2016, 2018]
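A minimal sketch of the regression version on a toy reference table, using the CRAN randomForest package rather than the authors' dedicated abcrf package; the toy Normal model, the collection of summaries and all numerical settings are illustrative assumptions:

library(randomForest)
M=1e4; n=50
theta=rnorm(M) # prior draws
xsim=matrix(rnorm(M*n,mean=theta),M,n) # associated pseudo-datasets
summaries=function(x) c(mean(x),median(x),sd(x),mad(x),quantile(x,c(.1,.9)))
eta=t(apply(xsim,1,summaries)) # reference table of summaries
colnames(eta)=paste0("eta",1:ncol(eta))
ref=data.frame(theta=theta,eta)
rf=randomForest(theta~.,data=ref,ntree=500)
xobs=rnorm(n,mean=1.5) # pseudo-observed sample
etaobs=data.frame(t(summaries(xobs)))
colnames(etaobs)=colnames(eta)
predict(rf,newdata=etaobs) # RF estimate of theta at the observation
importance(rf) # relative influence of each summary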
Calibration of tolerance
Calibration of threshold ε
from scratch [how small is small?]
from k-nearest neighbour perspective [quantile of prior
predictive]
subz=which(diz<quantile(diz,.1))
from asymptotics [convergence speed]
related with choice of distance [automated selection by
random forests]
[Fearnhead & Prangle, 2012; Biau et al., 2013; Liu & Fearnhead 2018]
Advanced topics
3 Further approximations
big data issues
big parameter issues
Computational bottleneck
Time per iteration increases with sample size n of the data:
cost of sampling O(n^(1+??)) associated with a reasonable
acceptance probability makes ABC infeasible for large datasets
surrogate models to get samples (e.g., using copulas)
[Li et al., 1996]
direct sampling of summary statistics (e.g., synthetic
likelihood)
[Wood, 2010]
borrow from proposals for scalable MCMC (e.g., divide
& conquer)
Approximate ABC [AABC]
Idea: approximations on both parameter and model spaces by
resorting to bootstrap techniques.
[Buzbas & Rosenberg, 2015]
Procedure scheme
1. Sample (θi, xi), i = 1, . . . , m, from prior predictive
2. Simulate θ∗ ∼ π(·) and assign weight wi to dataset x(i)
simulated under k-closest θi to θ∗
3. Generate dataset x∗ as bootstrap weighted sample from
(x(1), . . . , x(k))
Drawbacks
If m too small, prior predictive sample may miss
informative parameters
large error and misleading representation of true posterior
only suited for models with very few parameters
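A minimal sketch of this scheme for a toy one-parameter model (θ ∼ N(0, 1) prior, x | θ ∼ N(θ, 1), n = 50 observations); the model, the rank-based decreasing weights and the position-wise bootstrap are all illustrative assumptions:

m=1e4; n=50; k=20
theta=rnorm(m) # 1. prior draws...
xsim=matrix(rnorm(m*n,mean=theta),m,n) # ...and prior predictive datasets
thetastar=rnorm(1) # 2. new parameter value
near=order(abs(theta-thetastar))[1:k] # indices of the k closest theta_i
w=k:1; w=w/sum(w) # weights decreasing with distance rank
pick=sample(near,n,replace=TRUE,prob=w) # 3. weighted bootstrap over the k datasets
xstar=xsim[cbind(pick,1:n)] # pseudo-dataset x* attached to theta*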
Divide-and-conquer perspectives
1. divide the large dataset into smaller batches
2. sample from the batch posterior
3. combine the result to get a sample from the targeted
posterior
Alternative via ABC-EP
[Barthelmé & Chopin, 2014]
Geometric combination: WASP
Subset posteriors: given partition xobs_[1], . . . , xobs_[B] of observed
data xobs, define
π(θ | xobs_[b]) ∝ π(θ) ∏_{j∈[b]} f(xobs_j | θ)^B
[Srivastava et al., 2015]
Subset posteriors are combined via Wasserstein barycenter
[Cuturi, 2014]
Drawback: requires sampling from f(· | θ)^B by ABC means.
Should be feasible for latent variable (z) representations when
f(x | z, θ) available in closed form
[Doucet & Robert, 2001]
Alternative: backfeed subset posteriors as priors to other
subsets, partitioning summaries
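In one dimension, the 2-Wasserstein barycenter has quantile function equal to the average of the subset quantile functions, so barycenter draws are obtained by averaging order statistics of equal-sized subset samples; a minimal sketch (the function name and the equal-size assumption are mine):

waspcombine1d=function(draws) # draws: list of B equal-length vectors of subset posterior draws
  rowMeans(sapply(draws,sort)) # average of order statistics = barycenter draws
# toy check with three artificial subset posteriors
waspcombine1d(list(rnorm(1e4,1),rnorm(1e4,1.2),rnorm(1e4,.8)))[1:5]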
Consensus ABC
Naïve scheme
For each data batch b = 1, . . . , B
1. Sample (θ1^[b], . . . , θn^[b]) from diffused prior π̃(·) ∝ π(·)^(1/B)
2. Run ABC to sample from batch posterior
π̂(· | d(S(xobs_[b]), S(x_[b])) < ε)
3. Compute sample posterior variance Σb⁻¹
Combine batch posterior approximations
θj = ( ∑_{b=1}^B Σb )⁻¹ ∑_{b=1}^B Σb θj^[b]
Remark: the diffuse prior π̃(·) being non-informative calls for
ABC-MCMC steps
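A minimal sketch of the combination step above, assuming draws is a list of B matrices of identical dimensions (one row per ABC draw, one column per component of θ) and taking Σb as the inverse of the sample covariance of batch b; the function name is mine:

consensuscombine=function(draws){
  W=lapply(draws,function(th) solve(cov(th))) # batch precisions Sigma_b
  Winv=solve(Reduce(`+`,W)) # (sum_b Sigma_b)^{-1}
  out=matrix(0,nrow(draws[[1]]),ncol(draws[[1]]))
  for (j in 1:nrow(out)) # combine the j-th draw across batches
    out[j,]=Winv%*%Reduce(`+`,Map(function(Wb,th) Wb%*%th[j,],W,draws))
  out}
# usage: combined=consensuscombine(list(batch1,batch2,batch3))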
Big parameter issues
Curse of dimension: as dim(Θ) = kθ increases
exploration of parameter space gets harder
summary statistic η forced to increase in dimension, since
kη ≥ dim(Θ) at the very least
Some solutions
adopt more local algorithms like ABC-MCMC or
ABC-SMC
aim at posterior marginals and approximate joint posterior
by copula
[Li et al., 2016]
run ABC-Gibbs
[Clarté et al., 2020]
Component-wise approximation
Curse of dimensionality for vanilla ABC sampler
[Li & Fearnhead, 2018]
Gibbs-sampling as early solution to high dimension challenges
[Geman & Geman, 1984]
ABC version that allows for dramatic reduction in summary
dimension
Example 4: Hierarchical MA(2)
xi ∼iid MA2(µi, σi)
µi = (βi,1 − βi,2, 2(βi,1 + βi,2) − 1) with (βi,1, βi,2, βi,3) ∼iid Dir(α1, α2, α3)
σi ∼iid IG(σ1, σ2)
α = (α1, α2, α3), with prior E(1)⊗3
σ = (σ1, σ2), with prior C+(1)⊗2
[Graphical model: hyperparameters α and σ above local parameters µ1, . . . , µn and σ1, . . . , σn, which generate the series x1, . . . , xn]
Example 4: Hierarchical MA(2)
Based on 10⁶ prior and 10³ posterior simulations, 3n
summary statistics, and series of length 100, the ABC-Rej posterior
is hardly distinguishable from the prior!
ABC-Gibbs
When parameter decomposed into θ = (θ1, . . . , θn)
Algorithm 3 ABC-Gibbs sampler
starting point θ(0) = (θ1(0), . . . , θn(0)), observation xobs
for i = 1, . . . , N do
for j = 1, . . . , n do
θj(i) ∼ πεj(· | xobs, sj, θ1(i), . . . , θj−1(i), θj+1(i−1), . . . , θn(i−1))
end for
end for
Divide & conquer:
one different tolerance εj for each parameter θj
one different statistic sj for each parameter θj
[Clarté et al., 2020]
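A minimal sketch of this sampler for the hierarchical Normal–Normal model of the numerical comparison below; the statistics sα and sµ (group means), the tolerance, the keep-the-closest-of-M variant for the µ updates and every numerical setting are illustrative assumptions:

set.seed(1)
n=10; K=20; sig=1 # groups, replicates, known sd at both levels
alpha0=1.7; mu0=rnorm(n,alpha0,sig)
xobs=matrix(rnorm(n*K,rep(mu0,K),sig),n,K)
salpha=function(mu) mean(mu) # statistic attached to alpha
smu=function(x) mean(x) # statistic attached to each mu_j
epsa=.05; M=30; N=1e3
alpha=0; mu=rep(0,n); out=matrix(0,N,n+1)
for (i in 1:N){
  repeat{ # alpha | mu: ABC rejection on s_alpha, hyperprior U[-4,4]
    a=runif(1,-4,4); mutilde=rnorm(n,a,sig)
    if (abs(salpha(mutilde)-salpha(mu))<epsa) break}
  alpha=a
  for (j in 1:n){ # mu_j | alpha, x_j: keep closest of M proposals
    best=Inf
    for (m in 1:M){
      mj=rnorm(1,alpha,sig); xt=rnorm(K,mj,sig)
      d=abs(smu(xt)-smu(xobs[j,]))
      if (d<best){ best=d; mu[j]=mj }}}
  out[i,]=c(alpha,mu)}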
Compatibility
When using ABC-Gibbs conditionals with different acceptance
events, e.g., different statistics
π(α) π(sα(µ) | α) and π(µ) f(sµ(xobs) | α, µ).
conditionals are incompatible
ABC-Gibbs does not necessarily converge (even for
tolerance equal to zero)
potential limiting distribution
not a genuine posterior (double use of data)
unknown [except for a specific version]
possibly far from genuine posterior(s)
requires a small reference table at each sub-iteration
[Clarté et al., 2020]
Compatibility
Assuming existence of two measurable functions u(α) and v(µ)
such that
π(α) π{sα(µ) | α} / [ π(µ | α) f{sµ(x) | µ} ] = u(α) v(µ)
Proposition Existence of limiting distribution, plus
convergence, as both εµ and εα decrease to 0, to the stationary
distribution of the Gibbs sampler with conditionals
π(α)π{sα(µ) | α} and π(µ | α)f{sµ(x) | µ}
If sα sufficient for conditional π(µ|α), limiting distribution
equals π(α, µ)π{sµ(x) | µ}
Proposition If π(θ1, θ2) = π(θ1)π(θ2) and sθ1 = sθ2, as ε goes
to zero, ABC-Gibbs and ABC share the same limiting distribution
Convergence
In hierarchical case n = 2,
Theorem If there exists 0 < κ < 1/2 such that
sup_{θ1,θ̃1} ‖ πε2(· | xobs, s2, θ1) − πε2(· | xobs, s2, θ̃1) ‖TV = κ
then the ABC-Gibbs Markov chain converges geometrically in total
variation to a stationary distribution νε, with geometric rate 1 − 2κ.
Example 4: Hierarchical MA(2)
Separation from the prior for identical number of simulations
A numerical comparison
Comparison of ABC-Gibbs with vanilla ABC and SMC-ABC
[Del Moral & Doucet, 2012]
Normal–Normal model
µj ∼iid N(α, σ²) , xj,k ∼ind N(µj, σ²) , j = 1, . . . , n, k = 1, . . . , K
with both variances known, and hyperprior α ∼ U[−4, 4]
A numerical comparison
[Figure: density panels for α, µ1, µ2, µ3, with rows indexed 5, 10, 30, 50, 100]
Comparison of posterior densities of hyperparameter and first
three parameter components obtained with ABC (dashed line)
and ABC-Gibbs (full line) with identical computational cost
A numerical comparison
[Figure: density panels for µ1 and α under ABC-Gibbs, simple ABC, and SMC-ABC]
Full lines: posterior densities for 10 replicas of the algorithms.
Dashed line: exact posterior density
Example 5: hierarchical G & K distribution
G & K distribution benchmark of intractable distribution
defined by quantile function
F⁻¹(x; µ, B, g, k, c) = µ + B [1 + c (1 − e^{−g z(x)}) / (1 + e^{−g z(x)})] (1 + z(x)²)^k z(x)
[Prangle, 2017]
Hierarchical extension
µi ∼ N(α, 1) , xi ∼ gk(µi, B, g, k) , i = 1, . . . , n
with hyperprior α ∼ U[−10, 10]
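A minimal R sketch of this quantile function and of simulation from the hierarchical extension by inverse transform; taking z(x) = Φ⁻¹(x) and c = 0.8, as is customary for g-and-k models, together with all numerical settings, are assumptions:

qgk=function(u,mu,B,g,k,c=.8){
  z=qnorm(u) # z(x) taken as the standard Normal quantile
  mu+B*(1+c*(1-exp(-g*z))/(1+exp(-g*z)))*(1+z^2)^k*z}
rgk=function(n,mu,B,g,k) qgk(runif(n),mu,B,g,k) # inverse-transform sampling
# hierarchical extension: alpha ~ U[-10,10], mu_i ~ N(alpha,1), x_i ~ gk(mu_i,B,g,k)
alpha=runif(1,-10,10)
mu=rnorm(5,alpha,1)
x=sapply(mu,function(m) rgk(100,m,B=1,g=2,k=.5)) # 100 observations per group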
Example 5: hierarchical G & K distribution
[Figure: posterior density panels for α, µ1, µ2, µ3, µ4]
Plain, dotted and dashed lines represent posterior
approximations from ABC-Gibbs, ABC-SMC and vanilla ABC,
resp.
Explicit limiting distribution
For model
xj | µj ∼ π(xj | µj) , µj | α ∼iid π(µj | α) , α ∼ π(α)
alternative ABC based on
π̃(α, µ | xobs) ∝ π(α) q(µ) ∫ π(µ̃ | α) 1{η(sα(µ), sα(µ̃)) < εα} dµ̃
× ∫ f(x̃ | µ) π(xobs | µ) 1{η(sµ(xobs), sµ(x̃)) < εµ} dx̃
with q an arbitrary distribution on µ, used to generate a new µ
It induces the full conditionals
π̃(α | µ) ∝ π(α) ∫ π(µ̃ | α) 1{η(sα(µ), sα(µ̃)) < εα} dµ̃
and
π̃(µ | α, xobs) ∝ q(µ) ∫ π(µ̃ | α) 1{η(sα(µ), sα(µ̃)) < εα} dµ̃
× ∫ f(x̃ | µ) π(xobs | µ) 1{η(sµ(xobs), sµ(x̃)) < εµ} dx̃
which are now compatible with the new artificial joint
In simulation terms: α ∼ π(α) and µ̃ ∼ π(µ̃ | α) are drawn until
η(sα(µ), sα(µ̃)) < εα; then µ is drawn from the instrumental q(µ), together with
auxiliary variables µ̃ and x̃, until both constraints are satisfied
Resulting Gibbs sampler stationary for the posterior proportional to
π(α, µ) q(sα(µ)) f(sµ(xobs) | µ)
where both q(sα(µ)) and f(sµ(xobs) | µ) act as projections, that is,
for the likelihood associated with sµ(xobs) and a prior
distribution proportional to π(α, µ) q(sα(µ)) [exact!]
Incoming ABC workshops
ABC in Grenoble, France, March 18-19 2020
ISBA(BC), Kunming, China, June 26-30 2020
ABC in Longyearbyen, Svalbard, April 11-13 2021