Uncertainty Estimation in Deep Learning
1. Uncertainty in Deep Learning - Christian S. Perone (2019)
Uncertainties Bayesian Inference Deep Learning Variational Inference Ensembles Q&A
Uncertainty Estimation
in Deep Learning
A brief introduction
Christian S. Perone
christian.perone@gmail.com
http://blog.christianperone.com
Agenda
Uncertainties
Knowing what you don’t know
The problem
Different Uncertainties
Importance of Uncertainty
Bayesian Inference
The frequentist way
The Bayesian way
MCMC Sampling
Deep Learning
Short intro
Bayesian Neural Networks
Variational Inference
Introduction
Posterior Approximation
Training a BNN
Dropout
Ensembles
Introduction
Deep Ensembles
Randomized Prior Functions
Final Remarks
Q&A
Who Am I
Christian S. Perone
BSc in Computer Science in Brazil (UPF),
MSc in Biomedical Eng. in Montreal
(Polytechnique/UdeM)
Machine Learning / Data Science
Working at Jungle
Blog at
blog.christianperone.com
Open-source projects
https://github.com/perone
Twitter @tarantulae
Section I
Uncertainties
Knowing what you don’t know
It is correct, somebody might say, that
(...) Socrates did not know anything; and
it was indeed wisdom that they
recognized their own lack of knowledge,
(...).
—Karl R. Popper, The World of Parmenides
What does this have to do with statistical learning?
The problem
Let’s say you trained a model to classify an image as having a lesion or not;
Different MRI contrasts (T2/T1). Source: http://www.msdiscovery.org. 2019.
Later, you run predictions on volumes with different acquisition parameters, anatomy, etc.;
The problem: you can still get a prediction with high probability, even if your sample is out-of-distribution.
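Here is a minimal sketch that makes the problem concrete. It uses plain NumPy and a synthetic 1-D dataset invented for illustration, not the MRI setting above. A logistic regression is trained only on inputs in [-2, 2], yet it still outputs a near-certain probability for an input far outside that range:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D training data (illustrative): class 1 if x > 0, inputs only in [-2, 2].
x_train = rng.uniform(-2.0, 2.0, size=200)
y_train = (x_train > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit w, b by plain gradient descent on the mean binary cross-entropy.
w, b = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    p = sigmoid(w * x_train + b)
    w -= lr * np.mean((p - y_train) * x_train)
    b -= lr * np.mean(p - y_train)

# An in-distribution input near the boundary vs. an input far outside the training range.
p_near = sigmoid(w * 0.05 + b)
p_ood = sigmoid(w * 50.0 + b)
print(f"p(y=1 | x=0.05) = {p_near:.3f}")
print(f"p(y=1 | x=50)   = {p_ood:.6f}")
```

Nothing in the second probability signals that x = 50 lies far from any training example. The softmax/sigmoid output alone cannot tell "confident" apart from "extrapolating".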
The problem
A simple regression problem.
Source: Yarin Gal. Uncertainty in Deep Learning. PhD Thesis. 2016.
Source: Ian Osband et al. Using Randomized Prior Functions for Deep Reinforcement Learning. NIPS
2018. Image from: http://blog.christianperone.com
Different Uncertainties
Two main types of uncertainty, often confused by practitioners, but very
different quantities:
Aleatoric Uncertainty
Uncertainty inherent in the data (e.g., noise) that the model cannot explain, also called data uncertainty or irreducible uncertainty. More data does not reduce it;
Ex: increasing measurement precision can reduce it.
Epistemic Uncertainty
Uncertainty in the model itself, also called model uncertainty or reducible uncertainty;
Ex: it can be explained away by increasing the training set size.
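One common way to separate the two quantities in practice is the law of total variance. It applies, for example, with an ensemble in which each member predicts both a mean and a variance. The average of the predicted variances estimates the aleatoric part, and the variance of the predicted means estimates the epistemic part. The sketch below assumes such per-member predictions are already available; the numbers are hypothetical:

```python
import numpy as np

def decompose_uncertainty(means, variances):
    """Split predictive variance via the law of total variance.

    means, variances: arrays of shape (n_members, n_points) holding each
    ensemble member's (or MC sample's) predicted mean and variance.
    """
    aleatoric = variances.mean(axis=0)  # E[sigma^2]: noise the members expect in the data
    epistemic = means.var(axis=0)       # Var[mu]: disagreement between members
    return aleatoric, epistemic, aleatoric + epistemic

# Hypothetical predictions of 5 members at 2 inputs:
# at the first input the members agree, at the second they disagree.
means = np.array([[1.0, -2.0],
                  [1.0,  0.0],
                  [1.0,  2.0],
                  [1.0,  1.0],
                  [1.0, -1.0]])
variances = np.full_like(means, 0.25)

aleatoric, epistemic, total = decompose_uncertainty(means, variances)
print(aleatoric)  # [0.25 0.25]
print(epistemic)  # [0. 2.]
```

Both inputs have the same aleatoric part. Only the second input, where the members disagree, carries epistemic uncertainty.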
Importance of Uncertainty
Medical imaging (classification, segmentation);
Autonomous vehicles (what is the uncertainty that this object is a tree?);
Active Learning (which sample should be labeled?);
Explore/exploit dilemma in reinforcement learning;
Out-of-distribution detection;
Model understanding/dataset understanding;
Nearly all applications!
Example in Reinforcement Learning
The explore/exploit dilemma:
Work by Maxime Wabartha et al.:
[Screenshot of the paper. Its Figure 1 compares the empirical (over 20 sample functions) posterior predictive distribution for dropout (rate 0.2), input bootstrapping, anchoring and the proposed repulsive constraint, showing the ground truth, sample functions, standard deviations, the training set and a reference function. The accompanying text reports three findings. Dropout and anchoring alone fail to produce diverse functions. Input bootstrapping spans the outputs better but disregards parts of the training set where uncertainty should be low. In Section 3.2, the method is applied to sampling diverse reward functions in a model-based reinforcement learning problem requiring exploration.]
Source: Maxime Wabartha et al. Sampling diverse neural networks for exploration in reinforcement
learning. NIPS 2018.
Section II
Bayesian Inference
A simple frequentist regression
In a frequentist linear regression, we have a point estimate for the
parameters of our model.
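First, we define our model (in vector notation, with $x_0 = 1$ so that $\theta_0$ is the intercept):
$$f(\mathbf{x}) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots = \mathbf{x}^\top \boldsymbol{\theta}$$
Next, we define a loss such as the MSE (mean squared error):
$$\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} \left(f(\mathbf{x}_i) - y_i\right)^2$$
Finally, we optimize it:
$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \mathcal{L}(f(\mathbf{x}), y)$$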
For a maximum likelihood derivation, take a look at
http://blog.christianperone.com/2019/01/mle/.
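These three steps can be sketched in a few lines of NumPy. The data are synthetic, and the true coefficients 1 and 2 are an arbitrary choice for illustration. The sketch minimizes the MSE by gradient descent and compares the result with the closed-form least-squares solution. Either way, the output is a single point estimate $\hat{\theta}$, with no notion of how uncertain it is:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: y = 1 + 2 x + Gaussian noise (coefficients chosen for illustration).
n = 100
x = rng.uniform(0.0, 1.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=n)

# Design matrix with a column of ones (x_0 = 1), so theta[0] is the intercept.
X = np.column_stack([np.ones(n), x])

def mse(theta):
    return np.mean((X @ theta - y) ** 2)

# Gradient descent on the MSE: grad = (2/n) X^T (X theta - y).
theta = np.zeros(2)
lr = 0.5
for _ in range(5000):
    theta -= lr * 2.0 / n * X.T @ (X @ theta - y)

# Closed-form least-squares solution for comparison.
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print("gradient descent:", theta, "MSE:", mse(theta))
print("least squares:   ", theta_ls)
```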
A simple frequentist regression
[Figure: "Frequentist regression": sample data and the fitted regression line, y against x for x in [0, 1].]
The Bayesian way
Bayesian approaches represent the uncertainty using a distribution over parameters. Instead of a point estimate, we have an entire posterior.
To formulate our Bayesian regression, we first select a likelihood;
After that, we select priors over the parameters;
Then we compute the posterior over the parameters given the data, or approximate it (e.g., by sampling).
Prior, likelihood and posterior
[Figure: three panels, Prior, Data and Posterior, each showing credibility over the parameter values 1, 2 and 3.]
Prior, likelihood and posterior
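$$\underbrace{p(\theta \mid X)}_{\text{posterior}} \propto \underbrace{p(X \mid \theta)}_{\text{likelihood}} \; \underbrace{\pi(\theta)}_{\text{prior}}$$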
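For a single parameter, the three terms can be evaluated on a grid and normalized, which is exactly the step the proportionality sign hides. The sketch below assumes a coin-flip (Bernoulli) model with a uniform prior over the probability of heads. The data (7 heads out of 10) are made up for illustration:

```python
import numpy as np

# Grid over the parameter theta = probability of heads.
theta = np.linspace(0.0, 1.0, 1001)
dtheta = theta[1] - theta[0]

# Prior pi(theta): uniform on [0, 1].
prior = np.ones_like(theta)

# Likelihood p(X | theta) for 7 heads and 3 tails (Bernoulli observations).
heads, tails = 7, 3
likelihood = theta**heads * (1.0 - theta) ** tails

# Unnormalized posterior = likelihood * prior. Dividing by its integral
# (a numerical estimate of the evidence p(X)) turns "proportional to" into "equal to".
unnormalized = likelihood * prior
evidence = unnormalized.sum() * dtheta
posterior = unnormalized / evidence

posterior_mean = (theta * posterior).sum() * dtheta
print(f"posterior mean: {posterior_mean:.3f}")  # exact posterior is Beta(8, 4), mean 8/12
```

This brute-force normalization only works because there is one parameter. With many parameters, the grid grows exponentially.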
Bayesian regression
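Let’s reformulate our regression:
We will use a simple Gaussian distribution for our observations, defined as:
$$Y \sim \mathcal{N}(\mu, \sigma^2)$$
We plug our linear model into $\mu$:
$$Y \sim \mathcal{N}(\underbrace{\alpha + \beta x}_{\text{linear model}}, \sigma^2)$$
And define the priors:
$$\alpha \sim \mathcal{N}(0, 20), \quad \beta \sim \mathcal{N}(0, 20), \quad \sigma \sim \mathcal{U}(0, 5)$$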
Bayesian Regression in Plate Notation
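You can represent the same model with plate notation:
$$Y \sim \mathcal{N}(\alpha + \beta x, \sigma^2), \quad \alpha \sim \mathcal{N}(0, 20), \quad \beta \sim \mathcal{N}(0, 20), \quad \sigma \sim \mathcal{U}(0, 5)$$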
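Reading the model top-down gives a generative procedure: draw the parameters from their priors, then draw each observation from the likelihood. The sketch below performs this forward (prior predictive) sampling with NumPy. It assumes the second argument of $\mathcal{N}(0, 20)$ is a standard deviation, which the slide does not state:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prior_predictive(x, n_draws, rng):
    """Draw (alpha, beta, sigma) from the priors, then Y ~ N(alpha + beta x, sigma^2)."""
    alpha = rng.normal(0.0, 20.0, size=n_draws)  # assumes 20 is a standard deviation
    beta = rng.normal(0.0, 20.0, size=n_draws)
    sigma = rng.uniform(0.0, 5.0, size=n_draws)
    # Broadcast to shape (n_draws, len(x)): one simulated dataset per parameter draw.
    mu = alpha[:, None] + beta[:, None] * x[None, :]
    y = rng.normal(mu, sigma[:, None])
    return alpha, beta, sigma, y

x = np.linspace(0.0, 1.0, 50)
alpha, beta, sigma, y_sim = sample_prior_predictive(x, n_draws=1000, rng=rng)
print(y_sim.shape)  # (1000, 50)
```

Plotting a few rows of `y_sim` is a quick check on whether the priors produce plausible datasets before any inference is run.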
MCMC Sampling
Let’s see a demo of a Markov Chain Monte Carlo (MCMC) sampler:
Source: MCMC Demos, by Chi Feng
MCMC Sampling
[Figure: MCMC trace plot. For each of Intercept, x and sigma, it shows a histogram of the posterior samples (Frequency) and the sample values across about 4000 iterations (Sample value). The histograms span roughly 0.7 to 1.2 (Intercept), 1.6 to 2.4 (x) and 0.45 to 0.60 (sigma).]
Trace plot generated using PyMC3; you can also use ArviZ.
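Tools like PyMC3 use more sophisticated samplers (e.g., NUTS), but the core idea fits in a few lines. Below is a minimal random-walk Metropolis sketch in NumPy for the Bayesian regression defined earlier. The synthetic data (intercept 1, slope 2, noise 0.5), the step size and the chain length are assumptions. The sketch also treats the 20 in $\mathcal{N}(0, 20)$ as a standard deviation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (values chosen for illustration).
n = 100
x = rng.uniform(0.0, 1.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=n)

def log_posterior(params):
    """Unnormalized log p(alpha, beta, sigma | data) for the regression model."""
    alpha, beta, sigma = params
    if not 0.0 < sigma < 5.0:          # sigma ~ U(0, 5): zero density outside
        return -np.inf
    log_prior = -0.5 * (alpha / 20.0) ** 2 - 0.5 * (beta / 20.0) ** 2
    resid = y - (alpha + beta * x)
    log_lik = -n * np.log(sigma) - 0.5 * np.sum(resid**2) / sigma**2
    return log_prior + log_lik

def metropolis(log_prob, init, n_steps, step_size, rng):
    samples = np.empty((n_steps, len(init)))
    current = np.asarray(init, dtype=float)
    current_lp = log_prob(current)
    for t in range(n_steps):
        # Symmetric Gaussian proposal: the acceptance ratio reduces to the
        # ratio of unnormalized posterior densities (p(X) cancels out).
        proposal = current + step_size * rng.normal(size=current.shape)
        proposal_lp = log_prob(proposal)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[t] = current
    return samples

samples = metropolis(log_posterior, init=[0.0, 0.0, 1.0],
                     n_steps=20000, step_size=0.05, rng=rng)
posterior = samples[5000:]             # discard burn-in
print("posterior means (alpha, beta, sigma):", posterior.mean(axis=0))
```

The sampler only ever evaluates the posterior up to a constant, which is why MCMC sidesteps the intractable evidence $p(X)$.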
Bayesian regression
[Figure: "Posterior predictive regression lines": sample data with regression lines drawn from the posterior, y against x for x in [0, 1].]
Bayesian methods
Bayesian methods can give us a full posterior to reason about;
Explicit priors;
Uncertainty;
They’re on the side of algorithms, not models [1];
However,
Intractable posterior for many practical cases and large datasets, because the evidence $p(X) = \int p(X \mid \theta)\,\pi(\theta)\,d\theta$ requires integrating over all parameters:
$$p(\theta \mid X) = \frac{p(X \mid \theta)\,\pi(\theta)}{p(X)}$$
Tuning and using MCMC algorithms can be tricky.
[1] Zoubin Ghahramani, History of Bayesian Neural Networks, NIPS 2016
Section III
Deep Learning
Deep Learning
It’s no secret that Deep Learning has reached an important milestone in Machine Learning:
Non-linear function approximators;
They can scale to large datasets (thanks to stochastic approximation);
They are state-of-the-art for NLP, computer vision, speech, etc;
Very expressive and flexible;
Representation learning;
One-slide Intro to Deep Learning
[Figure: MLP with an input layer $x_0, \dots, x_D$; hidden layers $1$ to $L$, where layer $l$ has units $y^{(l)}_0, \dots, y^{(l)}_{m^{(l)}}$; and an output layer $y^{(L+1)}_1, \dots, y^{(L+1)}_C$.]
A multi-layer perceptron (MLP) network overview. Source: David Stutz, 2018, BSD 3-Clause License.
Parametrized models with composition of functions;
Trained using backpropagation and SGD;
Usually learned by maximizing the log-likelihood;
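To ground the three bullets, here is a minimal sketch of a one-hidden-layer MLP trained with hand-written backpropagation and minibatch SGD on a toy regression problem. It uses NumPy only, and the architecture, data and hyperparameters are arbitrary choices for illustration. Minimizing the MSE here corresponds to maximizing a Gaussian log-likelihood with a fixed variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(3x) + noise.
x = rng.uniform(-1.0, 1.0, size=(256, 1))
y = np.sin(3.0 * x) + rng.normal(0.0, 0.1, size=x.shape)

# Parameters of a 1 -> 32 -> 1 network with tanh hidden units.
hidden = 32
W1 = rng.normal(0.0, 1.0, size=(1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)          # hidden layer (composition of functions)
    return h, h @ W2 + b2             # linear output layer

def mse_loss(x, y):
    return np.mean((forward(x)[1] - y) ** 2)

initial_loss = mse_loss(x, y)
lr, batch_size = 0.05, 32
for epoch in range(500):
    perm = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = x[idx], y[idx]
        h, pred = forward(xb)
        # Backpropagation of the MSE through both layers.
        d_pred = 2.0 * (pred - yb) / len(xb)
        dW2 = h.T @ d_pred
        db2 = d_pred.sum(axis=0)
        d_h = (d_pred @ W2.T) * (1.0 - h**2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = xb.T @ d_h
        db1 = d_h.sum(axis=0)
        # Plain SGD update (in place on the module-level parameters).
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

final_loss = mse_loss(x, y)
print(f"MSE before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The result is again a single set of weights, that is, a point estimate, just like the frequentist regression.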
Bayesian Neural Networks
A Bayesian Neural Network (BNN) is a Neural Network with distributions over its parameters [2].
Source: Weight Uncertainty in Neural Networks. Charles Blundell et al. 2015.
[2] Neal, Radford M. (2012). Bayesian learning for neural networks.
Bayesian Neural Networks
In modern Deep Neural Networks, however, we have some challenges:
A lot of data;
High-dimensionality in data;
Millions of parameters;
Highly non-convex loss surfaces;
This makes exact Bayesian inference very difficult for these models, so an approximation is required:
Variational Inference
(Variational Bayes)
Section IV
Variational Inference
Variational Inference
Variational Inference (VI) is often used as an alternative to MCMC;
Can be used to approximate the posterior of Bayesian models;
Faster than MCMC for complex models and larger datasets;
Shift from sampling to optimization;
Fewer guarantees than MCMC: instead of (asymptotically) exact samples, we get a density close to the target;
For an in-depth review
For a modern in-depth review, please refer to: Variational Inference: A Review for Statisticians. Blei, D. M. et al. (2018).
Variational Inference
We have a very complex posterior distribution p(w | D) that we
want to approximate (w are the parameters, and D is the data);
We do this approximation by using an "easier" distribution q(w | θ)
(also called the variational distribution, where θ are the variational
parameters);
Variational approximation (green). Source: Eric Jang, 2016. https://blog.evjang.com
Posterior approximation
If we want to approximate p(w | D) with q(w | θ), we need a
measure of "closeness";
We use the Kullback-Leibler (KL) divergence:
Source: Flawnson Tong, https://towardsdatascience.com
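As a concrete instance of this measure, the KL divergence between two univariate Gaussians has a closed form. It can also be estimated by Monte Carlo as $\mathbb{E}_{q}[\log q(w) - \log p(w)]$, which is the form VI relies on when no closed form exists. The sketch below uses NumPy only, with two arbitrary Gaussians. It compares both estimates and shows that the KL divergence is not symmetric:

```python
import numpy as np

def kl_gauss(m_q, s_q, m_p, s_p):
    """Closed-form KL[N(m_q, s_q^2) || N(m_p, s_p^2)]."""
    return np.log(s_p / s_q) + (s_q**2 + (m_q - m_p) ** 2) / (2.0 * s_p**2) - 0.5

def log_normal(w, m, s):
    return -0.5 * np.log(2.0 * np.pi * s**2) - 0.5 * ((w - m) / s) ** 2

def kl_monte_carlo(m_q, s_q, m_p, s_p, n_samples, rng):
    """Estimate E_q[log q(w) - log p(w)] with samples w ~ q."""
    w = rng.normal(m_q, s_q, size=n_samples)
    return np.mean(log_normal(w, m_q, s_q) - log_normal(w, m_p, s_p))

rng = np.random.default_rng(0)
kl_qp = kl_gauss(0.0, 1.0, 1.0, 2.0)                        # KL[q || p], closed form
kl_mc = kl_monte_carlo(0.0, 1.0, 1.0, 2.0, 200_000, rng)    # same quantity, Monte Carlo
kl_pq = kl_gauss(1.0, 2.0, 0.0, 1.0)                        # KL[p || q]: a different value
print(kl_qp, kl_mc, kl_pq)
```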
Posterior approximation
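We use the Kullback-Leibler (KL) divergence:
$$\theta^* = \arg\min_{\theta} \mathrm{KL}\left[q(w \mid \theta) \,\|\, p(w \mid D)\right]$$
$$\theta^* = \arg\min_{\theta} \mathbb{E}_{q(w \mid \theta)}\Big[\underbrace{\log q(w \mid \theta)}_{\text{variational posterior}} - \underbrace{\log p(w)}_{\text{prior}} - \underbrace{\log p(D \mid w)}_{\text{log likelihood}}\Big]$$
Why KL divergence?
Because it allows us to derive a cost that is tractable to optimize. The KL equals the
expectation above plus log p(D). The intractable evidence log p(D) does not depend
on θ, so it drops out of the arg min.
Not without paying a price, though.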
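In practice, the expectation in the cost above is estimated by Monte Carlo: the bracketed term is averaged over samples w ∼ q(w | θ). Below is a minimal NumPy sketch under a few assumptions:

- The model is a hypothetical one-weight linear regression.
- The prior is N(0, 1) and the Gaussian noise has standard deviation 0.5.
- The variational distribution is Gaussian, q(w | θ) = N(µ, σ²).

Because this toy model is conjugate, the exact posterior is available in closed form for comparison.

```python
import numpy as np

# Hypothetical toy model: y = w * x + noise, with a single weight w.
# Prior p(w) = N(0, PRIOR_STD^2); Gaussian likelihood with known NOISE_STD.
rng = np.random.default_rng(0)
PRIOR_STD, NOISE_STD = 1.0, 0.5
x = rng.uniform(-1.0, 1.0, size=30)
y = 2.0 * x + NOISE_STD * rng.normal(size=30)

def log_normal(v, mean, std):
    return -0.5 * np.log(2 * np.pi * std**2) - (v - mean) ** 2 / (2 * std**2)

def per_sample_cost(mu, sigma, n_samples=10_000):
    """log q(w|theta) - log p(w) - log p(D|w), for samples w ~ q(w|theta) = N(mu, sigma^2)."""
    w = mu + sigma * rng.normal(size=n_samples)
    log_q = log_normal(w, mu, sigma)
    log_prior = log_normal(w, 0.0, PRIOR_STD)
    # Gaussian log likelihood summed over all data points, one row per sampled w.
    log_lik = log_normal(y[None, :], w[:, None] * x[None, :], NOISE_STD).sum(axis=1)
    return log_q - log_prior - log_lik

# Exact posterior, available only because this toy model is linear-Gaussian.
post_prec = 1 / PRIOR_STD**2 + np.sum(x**2) / NOISE_STD**2
post_mean = np.sum(x * y) / NOISE_STD**2 / post_prec
post_std = np.sqrt(1 / post_prec)

exact = per_sample_cost(post_mean, post_std)   # q equal to the true posterior
poor = per_sample_cost(0.0, 1.0)               # q equal to the prior
print("cost with q = exact posterior:", exact.mean(), "spread:", exact.std())
print("cost with q = prior:          ", poor.mean(), "spread:", poor.std())
```

When q equals the exact posterior, every per-sample cost is the same constant, −log p(D), so the spread is zero up to rounding. Any other q has a larger expected cost, because the expected cost is the KL plus the θ-independent term −log p(D).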
Forward and Reverse KL
Forms of the KL-divergence. Source: Pattern Recognition and Machine Learning. Christopher M.
Bishop. 2006. (a) forward KL-divergence, (b) and (c) reverse KL-divergence.
Forward KL
Source: Colin Raffel, https://colinraffel.com
Forward KL (misspecification)
Source: Colin Raffel, https://colinraffel.com
Reverse KL
Source: Colin Raffel, https://colinraffel.com
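The mass-covering versus mode-seeking behaviour shown in these figures can be reproduced numerically. This is an illustration under assumptions of our own:

- The target is a hypothetical bimodal mixture, 0.5 N(−3, 1) + 0.5 N(3, 1).
- The approximation q is a single Gaussian.
- Integrals are approximated by sums on a grid.
- The parameters (µ, σ) are found by brute-force search rather than gradient-based optimization.

```python
import numpy as np

# Hypothetical bimodal target p(x) = 0.5 * N(-3, 1) + 0.5 * N(3, 1), evaluated on a dense grid.
x = np.linspace(-15.0, 15.0, 6001)
dx = x[1] - x[0]

def log_normal(v, mean, std):
    return -0.5 * np.log(2 * np.pi * std**2) - (v - mean) ** 2 / (2 * std**2)

log_p = np.logaddexp(np.log(0.5) + log_normal(x, -3.0, 1.0),
                     np.log(0.5) + log_normal(x, 3.0, 1.0))
p = np.exp(log_p)

def forward_kl(mu, sigma):
    """KL[p || q]: expectation under the target p (mass-covering)."""
    log_q = log_normal(x, mu, sigma)
    return np.sum(p * (log_p - log_q)) * dx

def reverse_kl(mu, sigma):
    """KL[q || p]: expectation under the approximation q (mode-seeking)."""
    log_q = log_normal(x, mu, sigma)
    return np.sum(np.exp(log_q) * (log_q - log_p)) * dx

# Brute-force search over the parameters of a single Gaussian q = N(mu, sigma^2).
grid = [(mu, sigma) for mu in np.linspace(-5.0, 5.0, 101)
        for sigma in np.linspace(0.5, 5.0, 46)]
best_fwd = min(grid, key=lambda ms: forward_kl(*ms))
best_rev = min(grid, key=lambda ms: reverse_kl(*ms))
print("forward KL fit (mu, sigma):", best_fwd)
print("reverse KL fit (mu, sigma):", best_rev)
```

The forward KL should pick a wide Gaussian centred between the modes; moment matching gives µ = 0 and σ² = 10. The reverse KL should instead settle on one of the two modes, with µ ≈ ±3 and σ ≈ 1.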
Quality of the uncertainty estimation
Mean-field variational Bayes (MFVB) approximation. Source: Variational Bayes and beyond: Bayesian inference for big data.
Tamara Broderick. ICML 2018.
Can severely underestimate the variance;
Compared to MCMC, the means are usually fine, but the variances
can be far off;
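A small sketch of why mean-field approximations tend to underestimate variance. It assumes a hypothetical correlated 2D Gaussian as the "true" posterior, with precision matrix Λ. For a Gaussian target, the reverse-KL-optimal fully factorized Gaussian recovers the exact means but has variances 1 / Λ_ii (see Bishop, PRML, Chapter 10). Here the factor means are found by coordinate ascent (CAVI), starting from a deliberately poor initialization.

```python
import numpy as np

# Hypothetical target posterior: a strongly correlated 2D Gaussian N(m, Sigma).
m = np.array([1.0, -1.0])
rho = 0.9
Sigma = np.array([[1.0, rho], [rho, 1.0]])
Lam = np.linalg.inv(Sigma)  # precision matrix

# Mean-field q(w1) q(w2) fitted with the reverse KL by coordinate ascent.
# For a Gaussian target each factor is Gaussian with variance 1 / Lam[i, i],
# so only the factor means need to be iterated.
q_mean = np.array([5.0, 5.0])  # deliberately poor initialization
for _ in range(200):
    q_mean[0] = m[0] - Lam[0, 1] / Lam[0, 0] * (q_mean[1] - m[1])
    q_mean[1] = m[1] - Lam[1, 0] / Lam[1, 1] * (q_mean[0] - m[0])
q_var = 1.0 / np.diag(Lam)

print("true means:", m, " MFVB means:", q_mean)
print("true marginal variances:", np.diag(Sigma), " MFVB variances:", q_var)
```

With correlation ρ = 0.9, the factorized variances are 1 − ρ² = 0.19 instead of 1. The means are recovered exactly, but the variance is too small by more than a factor of five.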
Training a Bayesian Neural Network
The training loop for a Bayesian Neural Network (BNN) using
Variational Inference is as follows:
Sample the network parameters from q(w | θ). With a Gaussian q, each
weight has two variational parameters: µ and σ;
Parametrize the network with the sampled parameters, often using
the reparametrization trick (w = µ + σ · ε, with ε ∼ N(0, 1));
Forward pass with the data batch;
Calculate the combined loss: variational posterior, prior and log
likelihood terms;
Compute gradients by backpropagation and optimize with SGD;
Repeat;
Prediction: average over multiple forward passes, each with freshly sampled weights.
This method is also called Bayes by Backprop.
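The loop above can be sketched end to end in NumPy. This is a minimal illustration, not the reference Bayes by Backprop implementation, and it makes these assumptions:

- The model is a hypothetical one-weight linear regression.
- The Gaussian q is parametrized by µ and log σ; Blundell et al. use a softplus parametrization instead.
- The whole dataset is used as a single batch.
- Gradients are derived by hand, where a deep learning framework would use backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: y = w * x + noise with a single weight w,
# prior p(w) = N(0, PRIOR_STD^2) and a Gaussian likelihood with known NOISE_STD.
PRIOR_STD, NOISE_STD = 1.0, 0.5
x = rng.uniform(-1.0, 1.0, size=50)
y = 2.0 * x + NOISE_STD * rng.normal(size=50)

# Variational parameters theta = (mu, log_sigma) of q(w | theta) = N(mu, sigma^2).
mu, log_sigma = 0.0, 0.0
lr, n_samples = 0.005, 64

for _ in range(3000):
    sigma = np.exp(log_sigma)
    eps = rng.normal(size=n_samples)
    w = mu + sigma * eps                               # sample w ~ q via reparametrization
    # Gradient of -log p(w) - log p(D | w) with respect to each sampled w
    # (hand-derived here; a framework would obtain it by backpropagation).
    resid = y[None, :] - w[:, None] * x[None, :]
    dcost_dw = w / PRIOR_STD**2 - (resid * x[None, :]).sum(axis=1) / NOISE_STD**2
    # Under w = mu + sigma * eps, log q(w | theta) = -log_sigma - eps^2 / 2 + const,
    # so it adds 0 to the mu-gradient and -1 to the log_sigma-gradient.
    grad_mu = dcost_dw.mean()
    grad_log_sigma = (dcost_dw * sigma * eps).mean() - 1.0
    mu -= lr * grad_mu                                 # plain SGD step
    log_sigma -= lr * grad_log_sigma

# Prediction: many stochastic forward passes with freshly sampled weights.
x_new = np.array([0.5, 3.0])
w_samples = mu + np.exp(log_sigma) * rng.normal(size=1000)
preds = w_samples[:, None] * x_new[None, :]
pred_mean, pred_std = preds.mean(axis=0), preds.std(axis=0)

# Exact conjugate posterior, available only because the toy model is linear-Gaussian.
post_prec = 1 / PRIOR_STD**2 + np.sum(x**2) / NOISE_STD**2
post_mean = np.sum(x * y) / NOISE_STD**2 / post_prec
print("VI:    mu =", mu, " sigma =", np.exp(log_sigma))
print("exact: mean =", post_mean, " std =", post_prec**-0.5)
```

A Gaussian q contains the exact posterior of this conjugate toy model. The learned µ and σ should therefore land close to the closed-form posterior mean and standard deviation. The predictive spread grows with |x|, as expected for a linear model.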
Quality of the uncertainty estimation
HMC vs VI. Source: Bayesian Inference with Anchored Ensembles of Neural Networks, and Application
to Exploration in Reinforcement Learning. Tim Pearce. 2018.
For more information
For more information about the variational approach, please refer to: Weight
Uncertainty in Neural Networks. C. Blundell, et al. 2015.
Dropout as a Bayesian Approximation
Dropout. Source: Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Nitish
Srivastava, et al. 2014.
Dropout as a Bayesian Approximation
In 2015, the work Dropout as a Bayesian Approximation: Insights and
Applications (Yarin Gal et al.) found a relationship between
dropout and approximate Bayesian inference;
It turns out that Bernoulli approximate variational inference in
Bayesian NNs amounts to applying dropout during training and
keeping it active at prediction time as well;
Quite appealing due to its simplicity, and it also provides an
interesting interpretation of dropout;
This technique is called "MC Dropout" or "Monte Carlo Dropout".
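A minimal NumPy sketch of MC Dropout prediction. It assumes a hypothetical one-hidden-layer regression network whose weights stand in for trained ones; they are random here only to keep the example runnable. Dropout stays active at prediction time, and the spread of T stochastic forward passes serves as the uncertainty estimate. The dropout rate p_drop = 0.1 and T = 100 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights of a 1-hidden-layer regression MLP. In practice these would come
# from a network trained WITH dropout; random values keep this sketch self-contained.
W1, b1 = rng.normal(size=(1, 64)), rng.normal(size=64)
W2, b2 = rng.normal(size=(64, 1)) / 8.0, np.zeros(1)

def stochastic_forward(x, p_drop, rng):
    """One forward pass with (inverted) dropout left ON, as MC Dropout requires."""
    h = np.maximum(0.0, x @ W1 + b1)
    keep = rng.random(h.shape) >= p_drop        # keep each hidden unit with prob. 1 - p_drop
    h = h * keep / (1.0 - p_drop)               # rescale so the expected activation is unchanged
    return h @ W2 + b2

def mc_dropout_predict(x, p_drop=0.1, n_passes=100, rng=rng):
    """Mean and standard deviation over n_passes stochastic forward passes."""
    preds = np.stack([stochastic_forward(x, p_drop, rng) for _ in range(n_passes)])
    return preds.mean(axis=0), preds.std(axis=0)

x = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
mean, std = mc_dropout_predict(x)
print(np.hstack([x, mean, std]))
```

With p_drop = 0, every pass is identical and the estimated uncertainty collapses to zero, which is a quick sanity check of the sketch.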
MC Dropout on a Regression Setting
Some results of MC Dropout in a regression setting:
MC Dropout. Source: Dropout as a Bayesian Approximation: Insights and Applications. Yarin Gal et al.
ICML 2015.
MC Dropout on a Classification Setting
Some results of MC Dropout in a classification setting:
MC Dropout. Source: Dropout as a Bayesian Approximation: Insights and Applications. Yarin Gal et al.
ICML 2015.
Criticism of MC Dropout
Some results of MC Dropout in a regression setting:
MC Dropout with a varying number of data points. Gray regions show 1 std. dev. above and below the mean. Source:
Randomized Prior Functions for Deep Reinforcement Learning. Ian Osband et al. 2018.
It was shown that MC Dropout fails a simple sanity check in a linear
setting: its uncertainty does not concentrate as more data is observed.
Section V
Ensembles
Ensembles
Uses multiple hypotheses to learn a better one;
Dropout can be seen as an ensemble, but with shared weights;
The ensemble variance can be interpreted as uncertainty;
Simple intuition for why it works.
[Diagram: the Input Data is fed to Model #1, Model #2 and Model #3, and their predictions are combined.]
Deep Ensembles
In Simple and Scalable Predictive Uncertainty Estimation using
Deep Ensembles (Lakshminarayanan B., et al. NIPS 2017), the authors proposed a
very simple method to compute uncertainty with ensembles:
Setting
You have M models, with independent parameters θ1, θ2, . . . , θM.
1) Initialize the parameters θ1, θ2, . . . , θM randomly;
2) Train each network m ∈ {1, . . . , M} with weights θm individually;
3) Optionally, add adversarial training;
4) Combine the predictions with:
$$p(y \mid x) = \underbrace{M^{-1} \sum_{m=1}^{M}}_{\text{average}} \underbrace{p_{\theta_m}(y \mid x, \theta_m)}_{\text{prediction from each network}}$$
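For classification, step 4 can be sketched as follows. The sketch assumes each of the M networks outputs logits and that each network's predictive distribution is the softmax of those logits. The logits are hypothetical values for M = 3 members and 3 classes. The predictive entropy is used as the uncertainty summary, as in the entropy histograms that follow.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(member_logits):
    """p(y|x) = (1/M) * sum_m p_theta_m(y|x), from logits of shape (M, n_classes)."""
    probs = softmax(np.asarray(member_logits, dtype=float))
    return probs.mean(axis=0)

def entropy(p):
    """Predictive entropy in nats."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

# Hypothetical logits from M = 3 ensemble members for a single input.
agree = ensemble_predict([[4.0, 0.0, 0.0], [4.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
disagree = ensemble_predict([[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]])
print("members agree:   ", agree, "entropy:", entropy(agree))
print("members disagree:", disagree, "entropy:", entropy(disagree))
```

When the members agree, the averaged distribution stays peaked. When they disagree, averaging flattens it (here to exactly uniform), and the entropy rises to its maximum, log 3.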
Evaluating Entropy on Classification
Plot of the binary entropy function H(p), a measure of uncertainty.
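For reference, the binary entropy function is H(p) = −p log₂ p − (1 − p) log₂(1 − p). It can be computed as below, using the convention 0 log 0 = 0. H is 0 for a certain prediction and peaks at 1 bit for p = 0.5.

```python
import numpy as np

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2(1 - p), in bits, with 0 log 0 taken as 0."""
    p = np.asarray(p, dtype=float)
    q = 1.0 - p
    # log2(0) and 0 * -inf would warn; np.where then replaces those terms with 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -(np.where(p > 0, p * np.log2(p), 0.0) + np.where(q > 0, q * np.log2(q), 0.0))
    return h

print(binary_entropy([0.0, 0.1, 0.5, 0.9, 1.0]))
```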
Evaluating Entropy on Classification
[Figure: histograms of entropy values in two panels, "Known classes" and "Unknown classes", with legend entries 1 to 5.]
Model trained only on the dog classes of ImageNet. Histogram of the predictive entropy on test examples from known classes
(dogs) and unknown classes (non-dogs) with varying ensemble size. Source: Lakshminarayanan B., et al.
NIPS 2017.
Evaluating Entropy on Classification
[Figure: histograms of entropy values in two rows of panels, "Ensemble", "Ensemble + R", "Ensemble + AT" and "MC dropout", with legend entries 1, 5 and 10.]
Histogram of the predictive entropy on test examples from known classes from SVHN (top row) and
unknown classes from CIFAR-10 (bottom row). Source: Lakshminarayanan B., et al. NIPS 2017.
Randomized Priors
In Randomized Prior Functions for Deep Reinforcement Learning (Ian
Osband et al. 2018):
A very simple and elegant modification of the ensemble method for
uncertainty;
Developed in the Reinforcement Learning context;
Overcomes the difficulty of injecting a prior into ensemble-based
approaches to uncertainty;
In a simple linear setting, it is equivalent to exact Bayesian inference
for a linear Gaussian model.
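The linear-Gaussian equivalence can be checked numerically. The sketch follows the linear case as we understand it, with two stated simplifications:

- The prior function is a random linear map x ↦ xᵀθ̃, with θ̃ drawn from the prior.
- The data are randomized by adding Gaussian noise to the targets, rather than by bootstrap resampling.

Writing the trainable part of the model as θ − θ̃ turns the usual weight penalty into ‖θ − θ̃‖². Each ensemble member is then a regularized least-squares solve. The model, data and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear-Gaussian model: y = X @ theta + noise, prior theta ~ N(0, PRIOR_STD^2 I).
n, d = 20, 2
PRIOR_STD, NOISE_STD = 1.0, 0.3
X = rng.normal(size=(n, d))
y = X @ np.array([0.5, -1.0]) + NOISE_STD * rng.normal(size=n)

# Exact posterior N(post_mean, post_cov).
A = X.T @ X / NOISE_STD**2 + np.eye(d) / PRIOR_STD**2
post_cov = np.linalg.inv(A)
post_mean = post_cov @ X.T @ y / NOISE_STD**2

def randomized_prior_member():
    theta_prior = PRIOR_STD * rng.normal(size=d)      # random (but fixed) prior parameters
    y_noisy = y + NOISE_STD * rng.normal(size=n)      # randomized targets
    # argmin_theta ||y_noisy - X theta||^2 / NOISE_STD^2 + ||theta - theta_prior||^2 / PRIOR_STD^2
    b = X.T @ y_noisy / NOISE_STD**2 + theta_prior / PRIOR_STD**2
    return np.linalg.solve(A, b)

samples = np.array([randomized_prior_member() for _ in range(20_000)])
print("ensemble mean:", samples.mean(axis=0), " exact:", post_mean)
print("ensemble cov:\n", np.cov(samples.T), "\nexact:\n", post_cov)
```

With enough members, the empirical mean and covariance of the ensemble should match the closed-form posterior. That is what "equivalent to exact Bayesian inference" means in this setting.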
Bootstrap
[Diagram: Sample #1, Sample #2 and Sample #3 are drawn from the Population; a Statistic is computed on each sample (q1, q2, q3), and together these values form the Bootstrap Statistic Distribution.]
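The bootstrap procedure in the diagram can be sketched in three steps:

1. Resample the observed data with replacement.
2. Recompute the statistic on each resample.
3. Treat the resulting values q1, q2, ... as draws from the statistic's distribution.

The exponential data and the choice of the mean as the statistic are hypothetical. They were picked because the standard error of the mean has a simple textbook estimate to compare against.

```python
import numpy as np

def bootstrap(data, statistic, n_resamples, rng):
    """Return the statistic computed on n_resamples resamples (with replacement) of data."""
    n = len(data)
    idx = rng.integers(0, n, size=(n_resamples, n))   # each row indexes one bootstrap sample
    return np.array([statistic(data[row]) for row in idx])

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=200)              # hypothetical observed sample
q = bootstrap(data, np.mean, n_resamples=5000, rng=rng)  # q1, q2, ..., one per resample

print("bootstrap standard error of the mean:", q.std())
print("textbook estimate s / sqrt(n):       ", data.std(ddof=1) / np.sqrt(len(data)))
```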
Randomized Prior Functions
The key insight is to add a randomized (but fixed) prior function and
bootstrapped data:
for k = 1, . . . , K do:
  Initialize θk ∼ random;
  Form Dk with bootstrap;
  Sample prior function pk ∼ P;
  Optimize L(fθ + λpk; Dk) with respect to θ, starting from θk;
return posterior ensemble {fθk + λpk} for k = 1, . . . , K
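Below is a runnable sketch of the algorithm above, under simplifying assumptions of our own:

- The trainable model fθ is a linear readout on fixed random ReLU features. It stands in for a neural network, so each "Optimize" step becomes a closed-form ridge-regression solve and the random initialization of θk has no effect.
- The prior functions pk are hypothetical random one-hidden-layer ReLU networks.
- λ = 1.
- The 1D data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D training data on [-1, 1].
x_train = rng.uniform(-1.0, 1.0, size=40)
y_train = np.sin(3.0 * x_train) + 0.1 * rng.normal(size=40)

# Trainable model f_theta: linear readout on FIXED random ReLU features (a stand-in for a
# neural network, so that "Optimize" becomes a closed-form ridge regression solve).
W_FEAT, B_FEAT = rng.normal(size=100), rng.normal(size=100)

def features(x):
    return np.maximum(0.0, np.outer(x, W_FEAT) + B_FEAT)

def sample_prior_function(rng, width=50):
    """Hypothetical prior distribution P: random, untrained 1-hidden-layer ReLU networks."""
    w1, b1 = rng.normal(size=width), rng.normal(size=width)
    w2 = rng.normal(size=width) / np.sqrt(width)
    return lambda x: np.maximum(0.0, np.outer(x, w1) + b1) @ w2

def train_member(rng, lam=1.0, ridge=1e-2):
    idx = rng.integers(0, len(x_train), size=len(x_train))
    xb, yb = x_train[idx], y_train[idx]                  # form D_k with bootstrap
    prior = sample_prior_function(rng)                   # sample p_k ~ P (kept fixed)
    # Minimize ||f_theta(xb) + lam * p_k(xb) - yb||^2 + ridge * ||theta||^2 over theta only.
    phi = features(xb)
    target = yb - lam * prior(xb)
    theta = np.linalg.solve(phi.T @ phi + ridge * np.eye(phi.shape[1]), phi.T @ target)
    return lambda x: features(x) @ theta + lam * prior(x)  # member f_theta_k + lam * p_k

K = 20
ensemble = [train_member(rng) for _ in range(K)]
x_test = np.linspace(-3.0, 3.0, 61)
preds = np.stack([member(x_test) for member in ensemble])  # shape (K, len(x_test))
mean, std = preds.mean(axis=0), preds.std(axis=0)
print("ensemble std near the data (x = 0):", std[30])
print("ensemble std far from the data (x = 3):", std[-1])
```

The spread of the members' predictions (std) serves as the uncertainty estimate. Each member's prediction includes the fixed term λ pk(x), which the trained part fθk can only learn to compensate for where there is (bootstrapped) data.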
Qualitative Inspection
Some pathological cases:
Posterior predictive distributions for 1D regression with a (20, 20)-MLP and ReLUs. Source:
Randomized Prior Functions for Deep Reinforcement Learning. Ian Osband et al. 2018.
“(...) If an agent has only ever observed zero reward, then no amount of
bootstrapping or ensembling will cause it to simulate positive rewards.
(...)”
– Randomized Prior Functions for Deep Reinforcement Learning. Ian Osband et al. 2018.
Predictive Uncertainty
Source: Ian Osband et al. Randomized Prior Functions for Deep Reinforcement Learning. NIPS
2018. Image from: http://blog.christianperone.com
Posterior Samples
Source: Ian Osband et al. Randomized Prior Functions for Deep Reinforcement Learning. NIPS
2018. Image from: http://blog.christianperone.com
Prior Samples
Source: Ian Osband et al. Randomized Prior Functions for Deep Reinforcement Learning. NIPS
2018. Image from: http://blog.christianperone.com
Final Remarks
Many methods, no standardized evaluation, and no ground truth for
model uncertainty;
A performance (CPU/GPU resources) penalty for basically all
methods;
No scalable solution for MCMC (yet);
The choice depends on the application;
Always take the trade-offs in guarantees into consideration;
Significant evolution of methods, frameworks and hardware.
Learning More - I
Statistical Rethinking (excellent book and course), by Richard
McElreath.
https://xcelab.net/rm/statistical-rethinking/
Variational Inference: A Review, by David M. Blei, et al.
https://arxiv.org/abs/1601.00670
Scalable Bayesian Inference, by David Dunson. NIPS 2018 Talk.
https://www.youtube.com/watch?v=0HXpnG_WnlI
Variational Bayes and Beyond, by Tamara Broderick. ICML 2018
Tutorial.
https://www.youtube.com/watch?v=Moo4-KR5qNg
History of Bayesian Neural Networks, by Zoubin Ghahramani.
NIPS 2016 Keynote talk.
https://www.youtube.com/watch?v=FD8l2vPU5FY
Learning More - II
Uncertainty in Deep Learning, Slides, by Roberto Silveira.
http://tiny.cc/c77n9y
A Beginner’s Guide to Variational Methods, by Eric Jang.
https://blog.evjang.com/2016/08/variational-bayes.html
Uncertainty in Deep Learning, Thesis, by Yarin Gal.
http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf
PyMC3, Framework, by PyMC3 developers.
https://docs.pymc.io/
Pyro, Framework, by Pyro developers.
http://pyro.ai/
Tensorflow Probability, Framework, by TensorFlow developers.
https://www.tensorflow.org/probability
Section VI
Q&A
Q&A
Hope you liked it! Questions?