1. likelihood-free design @ SimStat 2019:
a discussion
Christian P. Robert
Université Paris-Dauphine, Paris & University of Warwick, Coventry
& CREST, Paris
bayesianstatistics@gmail.com
2. Bayesian design
Natural decision-theoretic framework for constructing a design:
$\arg\min_d \, \mathbb{E}[L(Y, \theta, d)]$
Main message: optimisation program opposed to full Bayesian
inference, meaning potentially faster tools than Monte Carlo
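As a toy illustration of this optimisation program (mine, not from the talk): assume a conjugate model with prior $\theta \sim N(0,1)$, observation $y \mid \theta, d \sim N(d\theta, 1)$, and squared-error loss of the posterior mean, so the expected loss is the pre-posterior variance $1/(1+d^2)$, minimised at the largest $|d|$. A plain Monte Carlo sweep over a design grid recovers this without any posterior sampling:

```python
import numpy as np

# toy Bayesian design: theta ~ N(0,1), y | theta, d ~ N(d*theta, 1),
# squared-error loss of the posterior mean; expected loss = 1/(1 + d^2)
rng = np.random.default_rng(1)
designs = np.linspace(0.0, 1.0, 5)
n = 5000

def expected_loss(d):
    theta = rng.normal(0.0, 1.0, n)          # prior draws
    y = rng.normal(d * theta, 1.0)           # prior predictive draws
    post_mean = d * y / (1.0 + d**2)         # conjugate posterior mean
    return np.mean((post_mean - theta)**2)   # Monte Carlo expected loss

losses = [expected_loss(d) for d in designs]
best = designs[int(np.argmin(losses))]
print(best)                                   # smallest loss at d = 1.0
```

Note that only prior predictive simulations are used, one expectation per candidate design, which is the point of the slide: no full posterior approximation is needed.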
3. intractable likelihood
Complex models often lead to intractable likelihoods, even
though they are well-defined and identifiable. Particularly for
dynamic models like ODEs (Lotka-Volterra benchmark)
Optimal design thus intractable as well if aiming at optimising
expected utility function under equally intractable posterior
Potentially aggravated by choice of utility function related
with the likelihood, like the Hyvärinen score
$2\Delta \log p(x) + \|\nabla \log p(x)\|^2$
[Hyvärinen, 2005; Dawid & Musio, 2015]
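A quick sanity check (my own, not from the slides) of why the Hyvärinen score sidesteps the normalising constant: for $p = N(\mu, \sigma^2)$, $\nabla \log p(x) = -(x-\mu)/\sigma^2$ and $\Delta \log p(x) = -1/\sigma^2$, so the score reduces to $-2/\sigma^2 + (x-\mu)^2/\sigma^4$, computable from the unnormalised log density alone:

```python
import numpy as np

# closed-form Hyvarinen score for N(mu, sigma^2); no normalising constant
def score_gaussian(x, mu=0.0, sigma=1.0):
    return -2.0 / sigma**2 + (x - mu)**2 / sigma**4

# same score from the *unnormalised* log density via finite differences
def log_ptilde(x, mu=0.0, sigma=1.0):
    return -0.5 * (x - mu)**2 / sigma**2

h, x = 1e-4, 1.0
grad = (log_ptilde(x + h) - log_ptilde(x - h)) / (2 * h)
lap = (log_ptilde(x + h) - 2 * log_ptilde(x) + log_ptilde(x - h)) / h**2
s_exact = score_gaussian(x)
s_fd = 2 * lap + grad**2
print(s_exact, s_fd)                       # both equal -1.0 at x = 1
```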
4. [in]tractable target
Minimisation program implies
no observed data
entire posterior distribution of limited relevance
only one expectation (in the parameter θ) used to define
function of the design d
expectation being in terms of the prior predictive
no clear need for ABC, synthetic likelihood and similar heavy-duty
approximations
5. Bayesian tools
Potential for making program more Bayesian by
Bayesian non-parametrics (e.g., LFIRE)
[Thomas et al., 2018]
annealing algorithms like SAME (State-Augmentation for
Marginal Estimation) turning design d into another
component of the simulation
[Doucet, Godsill, X, 2002]
posterior on design d
Bayesian Lasso
[Park & Casella, 2008]
6. experimental Bayesian design
[Kleinegesse & Gutmann]
Link with earlier papers of Michael Gutmann and co-authors,
e.g. through default call to Gaussian processes (GP)
Gaussian processes as black-box mechanisms whose accuracy is
hard to predict
Potential dimension curse, although achieving 192 dimensions
quite impressive
Difficult to assess impact of approximation except via
(costly?) repeated experiments
7. LFIRE
Requires an approximation to
$\log \dfrac{p(\theta \mid y^{(i)}, d)}{p(\theta)}$
Likelihood-Free Inference by Ratio Estimation (LFIRE) resolution
interesting expansion on Geyer’s 1994 estimate of the
marginal likelihood (and Gutmann & Hyvärinen's 2012 noise
contrastive improvement)
unavailability of both likelihood and marginal adds to
complexity
sounds fairly costly especially when embedded within design
optimisation
8. logistic estimation of constants
When p(x) is known only up to a constant c, i.e.
$p(x) = c\,\tilde p(x)$,
samples
$x_1, \ldots, x_n \sim p(x)$ and $y_1, \ldots, y_n \sim q(x)$
allow for estimating c, since
$$P(X \sim p(x)) = \frac{p(x)}{p(x) + q(x)} = \frac{1}{1 + \exp\{\log q(x)/p(x)\}} = \frac{1}{1 + \exp\{-\log c + \log q(x)/\tilde p(x)\}}$$
making log c the intercept in a logistic regression
[Geyer, 1994]
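A minimal numeric sketch of Geyer's trick (my own toy case): take $\tilde p(x) = \exp(-x^2/2)$, so the true constant is $c = 1/\sqrt{2\pi}$, i.e. $\log c \approx -0.919$, and $q = \mathrm{Uniform}(-5, 5)$. Fitting only the intercept of a logistic regression whose logit carries the fixed offset $\log \tilde p(x) - \log q(x)$ recovers $\log c$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(0.0, 1.0, n)                  # draws from p = N(0,1)
y = rng.uniform(-5.0, 5.0, n)                # draws from q = U(-5,5)
z = np.concatenate([x, y])
labels = np.concatenate([np.ones(n), np.zeros(n)])

offset = -0.5 * z**2 - np.log(0.1)           # log ptilde(z) - log q(z), fixed slope
beta = 0.0                                   # intercept, estimates log c
for _ in range(500):                          # gradient ascent on logistic likelihood
    p = 1.0 / (1.0 + np.exp(-(beta + offset)))
    beta += 0.5 * np.mean(labels - p)

print(beta)                                   # close to log(1/sqrt(2*pi)) ~ -0.919
```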
9. LFIRE extension (Thomas et al., 2018)
estimating likelihood (and posterior) when intractable
likelihood-free but not ABC, since ratio likelihood to marginal
estimated in a non- or semi-parametric (and biased) way
probabilistic logistic classification and (arbitrary) exponential
family representation of the ratio based on (arbitrary)
summary statistics
simulated data from the density conditional on parameter θ and
data from the marginal, assuming both can be readily simulated
estimating exponential family parameters β(θ) by minimizing
classification error with sparsity regularisation (Bayesian
Lasso?)
selection of summaries, but for each value of parameter θ
comparison with more standard density estimation methods
(convergence speed)
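The classification step above can be sketched on a toy case where the ratio is known in closed form (my illustration, not the paper's setup): simulate from $y \mid \theta \sim N(\theta, 1)$ at a fixed $\theta = 1$ versus from the marginal $y \sim N(0, 2)$ (prior $\theta \sim N(0,1)$), and fit a logistic classifier on the summaries $(1, y, y^2)$; the fitted logit then estimates $\log p(y\mid\theta)/p(y)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
y = np.concatenate([rng.normal(1.0, 1.0, n),            # simulated at theta = 1
                    rng.normal(0.0, np.sqrt(2.0), n)])  # draws from the marginal
labels = np.concatenate([np.ones(n), np.zeros(n)])
X = np.column_stack([np.ones_like(y), y, y**2])         # summaries (1, y, y^2)

beta = np.zeros(3)                        # exponential-family parameters beta(theta)
for _ in range(25):                        # Newton steps for the logistic fit
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    H = (X * (p * (1 - p))[:, None]).T @ X
    beta += np.linalg.solve(H, X.T @ (labels - p))

log_ratio = float(beta @ np.array([1.0, 1.0, 1.0]))     # estimate at y = 1
print(log_ratio)                          # truth is about 0.60 here
```

Here the true log ratio at $y = 1$ is $\log N(1; 1, 1) - \log N(1; 0, 2) \approx 0.60$, so the fit can be checked directly; in a genuinely intractable model no such check is available, which is the concern raised on the previous slide.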
10. Gaussian processes
default modelling for complex unknown functions
still requires some calibration (e.g. Matérn covariance)
impact of prior modelling separable from generic imprecision
of inference?
limitations of this modelling as representation of reality?
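To make the calibration point concrete, here is a bare-bones Matérn-3/2 GP fit on hypothetical toy data (my sketch, not from the talk): the lengthscale and variance below are exactly the quantities that "still require calibration", and the noise-free posterior mean interpolates the training points by construction.

```python
import numpy as np

# Matern nu = 3/2 covariance, a common GP default; length and var need tuning
def matern32(a, b, length=1.0, var=1.0):
    r = np.abs(a[:, None] - b[None, :]) / length
    return var * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

X = np.array([0.0, 1.0, 2.0])
y = np.sin(X)                                 # toy observations
K = matern32(X, X) + 1e-8 * np.eye(len(X))    # jitter for numerical stability
alpha = np.linalg.solve(K, y)

Xs = np.array([0.5, 1.5])
mean_test = matern32(Xs, X) @ alpha           # GP posterior mean at test points
fit = matern32(X, X) @ alpha                  # reproduces y at training points
print(mean_test, fit)
```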
11. Bayesian design
[Overstall, Woods and Parker]
when design loss is pre-posterior variance trace, Monte Carlo
simulation could be considered, with potential simulated
annealing additions
Gaussian process again default approximation, required at
every iteration
Nice to see probabilistic numerics come into play, but seems
required as well at every iteration and adds uncertainty and
imprecision
can loss of efficiency due to GP approximation be assessed as
such?
unclear [complexity of the] link between G(·, ·, ·) and u(·)
12. Optimal Bayesian model discrimination design
[Hainy et al.]
Surprising use of design across models rather than within models,
given the clear impact of prior weights on the models
Unsurprising that classification approach does better than ABC as
no need for rejection
Recall ABC not that great at separating between models and even
less at assessing posterior model probabilities
[X, Cornuet, Marin & Pillai, 2011]
With random forests additional ABC step unnecessary as it lowers
efficiency and requires more simulations
[Pudlo & al., 2016]
Random forests do allow for evaluating model misspecification
error and posterior variance
[Pudlo & al., 2016, 2018]
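A toy stand-in for this classification view of model choice (mine; the papers use random forests, here replaced by a plain logistic classifier): simulate summaries under $M_1: N(0,1)$ and $M_2: N(1,1)$, train the classifier on the sample mean of 5 draws, and read off $p(M_2 \mid \text{summary})$ directly, with no ABC rejection step:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10000
s = np.concatenate([rng.normal(0.0, 1.0, (n, 5)).mean(axis=1),   # under M1
                    rng.normal(1.0, 1.0, (n, 5)).mean(axis=1)])  # under M2
m = np.concatenate([np.zeros(n), np.ones(n)])
X = np.column_stack([np.ones_like(s), s])

beta = np.zeros(2)
for _ in range(25):                        # Newton steps for the logistic fit
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    H = (X * (p * (1 - p))[:, None]).T @ X
    beta += np.linalg.solve(H, X.T @ (m - p))

# at summary 0.5 the two models are equally likely under equal priors
prob = 1.0 / (1.0 + np.exp(-(beta[0] + 0.5 * beta[1])))
print(prob)                                # close to the true value 0.5
```

Every simulation contributes to the fit, which is why no rejection step is needed; the flip side, as the next slide notes, is the lack of theory connecting such classifier outputs to the marginal likelihood.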
13. Optimal Bayesian model discrimination design
if need be, what about using more traditional ways to
approximate the marginal p(m|y, d) like Geyer’s (1994)?
harder when utility itself is defined in terms of the likelihood: any
strong motivation for this choice (since all models are wrong)
even when comparing models?
log marginal unavailable, thus unclear why deviance would
be available, unless logistic approximation à la Geyer called
why CART rather than BART?!
[Chipman & al., 2010]
as for random forests, lack of theory to connect with the marginal
and establish consistency of the outcome