GAN with BNN
• Generative Models
• GAN –Foundation
• BNN
• GAN for BNN
What are Generative Models?
What are they not?
Discriminative models
• We study the conditional distribution P(Y=c|X=x)
c – class, x – feature vector
• These models are trained for prediction tasks
• Most of the DL renaissance occurs in such models
Generative Model in Supervised Framework
Generative models – Supervised
• We train P(X=x|Y=c)
• By Bayes' formula (and the prior on Y) we obtain the joint distribution P(Y,X)
We learn the statistical behavior of a single class!!
We acquire the ability to generate samples from a given class
A common tool is Naïve Bayes
Generative Models (Cont)
Unsupervised
1. We don't have a target that guides us how to partition the data
2. We learn a generating deterministic function
x = f(z, θ)
f – deterministic function, z – hidden (latent) variable, θ – parameters
We aim to maximize the likelihood.
Before GAN
• Most of the generative models used sampling tools (M.H., Gibbs)
• Typically they need inference for the next sampling step (HMM, LDA, RBM)
• They suffer from several failures:
1. They don't handle high dimensions well
2. Sampling converges slowly (they are "expensive")
3. They prefer high-density domains, hence don't map the entire space
(M.H.)
4. Mini-batches and gradient steps are not always plausible.
Then came GAN
What was Adversarial?
Adversarial examples are simply perturbed inputs that may cause a NN to
misclassify the data
1. They are often generated intentionally
2. They are located outside the data manifold (kind of noise)
Goodfellow – Explaining & Harnessing Adversarial Examples:
he aimed to train DNNs by introducing adversarial examples.
What is Adversarial now?
Nowadays
• Adversarial refers to training on worst-case-scenario
examples
• One can think of it as a game between an agent and herself
Example: Samuel and his checkers program (1950s)
• GAN – the worst-case scenario is created by a network too
Goodfellow’s Network
(pylearn2 code at https://github.com/goodfeli/adversarial)
Discriminator
A common neural net (DNN/CNN)
Input: a sample of real data
Output: The probability that the data is real data and not
“fake”
Labels: Simply 1 for real data and 0 for fake
GAN
Generator
A common neural net (a DNN in Goodfellow's work)
Input: A generic distribution (Gaussian/Uniform)
Output: A data sample from the "real data" space, such as fake images
Loss
$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim P_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim P(z)}\big[\log\big(1 - D(G(z))\big)\big]$$
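To make the minimax game concrete, here is a minimal, hypothetical PyTorch-style sketch of one GAN training step; the networks G and D, the optimizers, and the real batch are assumed placeholders, not the original pylearn2 code, and D is assumed to end with a sigmoid.

```python
# Hypothetical sketch of one GAN training step (not the original pylearn2 code).
# Assumes G, D are torch.nn.Module instances, D outputs a probability (sigmoid),
# real_batch is a tensor of real data, and opt_g, opt_d are their optimizers.
import torch
import torch.nn.functional as F

def gan_step(G, D, real_batch, opt_g, opt_d, z_dim=100):
    batch_size = real_batch.size(0)

    # --- Discriminator step: maximize log D(x) + log(1 - D(G(z))) ---
    z = torch.randn(batch_size, z_dim)
    fake_batch = G(z).detach()                      # stop gradients into G
    d_real = D(real_batch)
    d_fake = D(fake_batch)
    d_loss = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- Generator step: in practice maximize log D(G(z)) (non-saturating loss) ---
    z = torch.randn(batch_size, z_dim)
    d_fake = D(G(z))
    g_loss = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```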
GAN –Advanced Architectures
(with available torch code)
• DCGAN – Both generator and discriminator are CNNs:
batch normalization is used, there are no max-pooling layers, and in the
discriminator fully connected layers are replaced by average pooling
• CGAN – Supervised data, where the inputs of both the discriminator
& the generator contain the target
• ACGAN – Similar to CGAN, but a score is given for the class
as well
Wasserstein Distance
A distance between probability measures:
$$W_p(\xi, \pi) = \min_{\gamma \in \Gamma(\xi,\pi)} \mathbb{E}_{(x,y) \sim \gamma}\big[d(x, y)^p\big]$$
ξ and π are the marginals of X and Y respectively
We discuss only $W_1$, the Earth Mover Distance
Earth Mover Distance
1. Very intuitive –The work performed to move from dist P to dist Q
2. Weak Convergence (e.g. in comparison to KL )
3. Analytical continuity is guaranteed!!
Kantorovich-Rubinstein Duality:
$$W_1(\xi, \pi) = \max_{\|f\|_L \le 1} \mathbb{E}_{x \sim \xi}\big[f(x)\big] - \mathbb{E}_{y \sim \pi}\big[f(y)\big]$$
• We can now train f using a NN (with some weight clipping to mimic the Lipschitz property)
• Arjovsky – Wasserstein GAN
WGAN - GP
• As said, the Lipschitz property is not fully achieved by weight clipping
Gulrajani, Arjovsky – Improved WGAN, "WGAN-GP"
Rather than weight clipping, we add a gradient penalty:
$$L = \mathbb{E}_{x \sim \xi}\big[D(x)\big] - \mathbb{E}_{y \sim \pi}\big[D(y)\big] + \lambda\, \mathbb{E}_{z}\big[(\|\nabla_z D(z)\|_2 - 1)^2\big]$$
z = εx + (1 − ε)y, where ξ, π are the two distributions and ε ~ U[0,1]
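A minimal sketch of this gradient penalty term, assuming a PyTorch critic D that outputs a raw score and tensors `real`, `fake` of matching shape (all names are illustrative placeholders):

```python
# Hypothetical sketch of the WGAN-GP critic loss (critic D outputs a raw score).
import torch

def wgan_gp_critic_loss(D, real, fake, lam=10.0):
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)))     # eps ~ U[0,1] per sample
    z = (eps * real + (1.0 - eps) * fake).detach().requires_grad_(True)  # interpolation point
    grad = torch.autograd.grad(D(z).sum(), z, create_graph=True)[0]
    penalty = ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
    # The critic maximizes E[D(real)] - E[D(fake)], i.e. minimizes the negation plus penalty
    return D(fake).mean() - D(real).mean() + lam * penalty
```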
GAN Summary
• Generator – A deterministic function that maps distribution Q to a vector in
the space X of "real data"
• Discriminator – Receives vectors from space X and estimates whether
they are from distribution Q or distribution P
• Loss- Function that measures the distance between P & Q
1. We don’t need Markov chains
2. Work well with mini batches and have nice gradients
3. No inference during training
4. Handle "difficult" distributions (MCMC needs convenient distributions)
Uncertainty
• Statistical prediction tools such as Bayesian inference output
a confidence estimate in addition to the prediction score.
What about DL and confidence…?
Not too much…
Uncertainty Types
Uncertainty Types:
1. Epistemic -Uncertainty due to lack of knowledge
Episteme= Knowledge
2. Aleatoric -Uncertainty due to noisy data :
We need better data not more data
Aleatory=dice player
The notions “reducible” & “irreducible” are used too
Uncertainty Estimation Methods
1. Conditional entropy:
$$H\big(P(y|x)\big) = -\sum_{y \in Y} P(y|x)\, \log P(y|x)$$
Entropy can't differentiate between epistemic & aleatoric uncertainty
2. Information gain (info gain over parameter values upon an input prediction):
$$I(w, y \mid x, D) = H\big[p(y|x, D)\big] - \mathbb{E}_{p(w|D)}\, H\big[p(y|x, w)\big]$$
It measures epistemic uncertainty well, because little information gain implies that
the parameters are already well known
3. Variation ratio (VR):
$$VR(x) = 1 - \frac{\sum_{t} \mathbf{1}[y_t = c^*]}{T}$$
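As a sketch of how these three measures could be computed from T stochastic forward passes (e.g. posterior weight samples), assuming `probs` is a hypothetical T×C array of predicted class probabilities for a single input:

```python
# Hypothetical sketch: uncertainty measures from T Monte Carlo predictions.
# probs has shape (T, C): T sampled predictive distributions over C classes.
import numpy as np

def uncertainty_measures(probs, eps=1e-12):
    mean_p = probs.mean(axis=0)                                  # p(y|x, D) approximated by the sample mean
    predictive_entropy = -np.sum(mean_p * np.log(mean_p + eps))  # H[p(y|x, D)]
    expected_entropy = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    info_gain = predictive_entropy - expected_entropy            # BALD / mutual information
    votes = probs.argmax(axis=1)                                 # per-sample predicted class y_t
    mode_count = np.bincount(votes).max()                        # how often the modal class c* wins
    variation_ratio = 1.0 - mode_count / probs.shape[0]
    return predictive_entropy, info_gain, variation_ratio
```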
DL & Uncertainty
• Deep Learning does not handle confidence:
The network is trained to extract features and returns probabilities or
numbers, but says nothing about how certain the output is.
DL is about training deterministic functions upon data!!
Is uncertainty important ?
Images of dogs and cats are a nice anecdote, but… what about MRI?
Melanoma?
Uncertainty (Cont.)
So DL does not provide uncertainty measures
Still…
DL is a class of tools that strongly rely on probabilistic
mechanisms
Which steps can we take in order to measure uncertainty?
It appears that we simply have to add a distribution over the
weights!!! We can do this with a
Bayesian Neural Network (BNN)
DL Vs. BNN
DL
1. Loss is related to prediction probability P(Y|X,W)
2. Learn the weights W point-wise with MLE
Bayesian NN
1. Loss is related to the posterior probability P(W|X,Y)
2. Learn the weights' distribution (a prior assumption is given)
Framework –Bayesian Inference
The inputs:
1. Observed data D of length n, {(x, y)} (numbers, categories, vectors,
images). It is also known as the Evidence
2. An assumption about the probabilistic structure that generates the
sample –Hypothesis
3. Prior distribution - a pre-assumption about the hypothesis distribution
Objective :
• Gain/update information about the Hypothesis using the Evidence
• We assume the Prior and learn the Posterior P(H|D)
• Bayes' Formula
BNN
Training Process -Inference
We assume prior knowledge on the weights distribution π
As in any NN we get an input x’ and aim to predict y’ :
$$P(y'|x') = \int P(y'|x', w)\, P(w|D)\, dw$$
This can be rewritten as:
$$P(y'|x') = \mathbb{E}_{P(w|D)}\big[P(y'|x', w)\big]$$
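In practice the integral is approximated with samples from the posterior. A minimal sketch, assuming hypothetical helpers `sample_posterior_weights` (returns weight samples) and `forward` (evaluates the network under a given weight vector):

```python
# Hypothetical sketch: Monte Carlo approximation of the BNN predictive distribution.
import numpy as np

def predictive(x, sample_posterior_weights, forward, n_samples=100):
    # P(y|x) is approximated by (1/T) * sum_t P(y|x, w_t),  with w_t ~ P(w|D)
    probs = [forward(x, w) for w in sample_posterior_weights(n_samples)]
    return np.mean(probs, axis=0)
```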
Common tools to solve the integral
1. MCMC –Sampling (Metropolis –Hastings, Gibbs)
2. Variational Inference
3. HMC
4. SGLD
Variational Inference
We wish to estimate the posterior distribution P(Θ|D)
• Rather than sampling methods, we can construct an analytical solution:
1. Choose a class of distributions Q (e.g. Gaussians)
2. Find the q that optimizes:
$$\min_{q \in Q} KL\big(q(\Theta)\,\|\,P(\Theta|D)\big)$$
(Jordan ,1999 , Blei 2003, Graves 2011)
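A minimal sketch of this idea for a mean-field Gaussian family, minimizing the KL via the negative ELBO with the reparameterization trick; all names here are illustrative assumptions, not code from the cited papers:

```python
# Hypothetical sketch: mean-field Gaussian VI for one weight vector via the ELBO.
# q(theta) = N(mu, softplus(rho)^2); minimizing KL(q || posterior) is equivalent
# to maximizing the ELBO: E_q[log p(D|theta)] - KL(q || prior).
import torch

def negative_elbo(mu, rho, log_likelihood, prior_std=1.0, n_samples=5):
    std = torch.nn.functional.softplus(rho)
    loss = 0.0
    for _ in range(n_samples):
        theta = mu + std * torch.randn_like(std)        # reparameterized sample from q
        loss = loss - log_likelihood(theta)             # MC estimate of -E_q[log p(D|theta)]
    loss = loss / n_samples
    # Closed-form KL between N(mu, std^2) and N(0, prior_std^2), summed over dimensions
    kl = (torch.log(prior_std / std) + (std**2 + mu**2) / (2 * prior_std**2) - 0.5).sum()
    return loss + kl
```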
What is Hamiltonian?
• An operator that measures the total energy of a system
Two sets of coordinates:
q – state coordinates (generalized coordinates)
p – momentum
$$H(p, q) = U(q) + K(p)$$
$$U(q) = -\log\big[\pi(q)\, L(q|D)\big] \qquad K(p) = \frac{p^2}{2m}$$
U – potential energy, K – kinetic energy
$$\frac{\partial H}{\partial p} = \dot{q}, \qquad \frac{\partial H}{\partial q} = -\dot{p}$$
Hamiltonian Monte Carlo
Hamiltonians satisfy the following properties
1. The dynamics are volume preserving (Liouville's Theorem)
2. Time invariant
3. Time reversible
4. Hamiltonians offer a deterministic vector field (with trajectories….)
We can therefore use it for sampling, if we take a distribution
that depends solely on the Hamiltonian!!
$$P(x, y) \propto e^{-H(x, y)}$$
Hybrid - MC
• We have the “state space” x
• We can add “momentum” and use Hamiltonian mechanism
Leapfrog Algorithm
We set a time interval δ. For each step i:
$$1.\quad p_i(t + \tfrac{\delta}{2}) = p_i(t) - \tfrac{\delta}{2}\,\frac{\partial U}{\partial q_i}(t)$$
$$2.\quad q_i(t + \delta) = q_i(t) + \delta\,\frac{\partial K}{\partial p_i}(t + \tfrac{\delta}{2})$$
$$3.\quad p_i(t + \delta) = p_i(t + \tfrac{\delta}{2}) - \tfrac{\delta}{2}\,\frac{\partial U}{\partial q_i}(t + \delta)$$
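A minimal sketch of the leapfrog integrator, assuming a hypothetical `grad_U` returning ∂U/∂q and unit mass so that ∂K/∂p = p:

```python
# Hypothetical sketch of L leapfrog steps with step size delta and unit mass (dK/dp = p).
import numpy as np

def leapfrog(q, p, grad_U, delta, L):
    q, p = np.copy(q), np.copy(p)
    p -= 0.5 * delta * grad_U(q)          # half step for momentum
    for _ in range(L - 1):
        q += delta * p                    # full step for position (dK/dp = p)
        p -= delta * grad_U(q)            # full step for momentum
    q += delta * p
    p -= 0.5 * delta * grad_U(q)          # final half step for momentum
    return q, p
```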
HMC
Algorithm (Neal 1995, 2012, Duane 1987)
1. Draw $x_0$ from our prior;
draw $p_0$ from a standard normal distribution
2. Perform L steps of leapfrog
3. Accept the proposal via a M.H. step:
$$\min\big[1,\ \exp(-U(q^*) + U(q) - K(p^*) + K(p))\big]$$
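Putting the pieces together, here is a sketch of one HMC transition using the `leapfrog` sketch above; `U` and `grad_U` are the assumed potential energy and its gradient, with unit mass:

```python
# Hypothetical sketch of one HMC transition (unit mass), using leapfrog() from above.
import numpy as np

def hmc_step(q, U, grad_U, delta, L):
    p = np.random.randn(*np.shape(q))                     # resample momentum ~ N(0, I)
    q_new, p_new = leapfrog(q, p, grad_U, delta, L)
    # Metropolis-Hastings acceptance on the total energy H = U + K
    current_H = U(q) + 0.5 * np.sum(p ** 2)
    proposed_H = U(q_new) + 0.5 * np.sum(p_new ** 2)
    if np.random.rand() < np.exp(current_H - proposed_H):
        return q_new                                      # accept
    return q                                              # reject, stay at current q
```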
HMC –Pros & Cons
Pros
• It takes points from wider domains, therefore we can describe the
distribution better and converge faster
• It may take points with lower density
• Faster than random-walk MCMC (e.g. M.H.)
• Ergodicity
Cons
• It may suffer from low energy barrier
• No minibatch
• It has to calculate gradients for the entire data!!! Bad
What do we need then?
• A tool that allows sub-sampling
• Fewer Gradients
• Keen knowledge about extrema and ways to escape them
Langevin Equation
Langevin Equation describes the motion of a pollen grain in water:
$$F - \gamma v_t + \xi_t = 0, \qquad \xi_t \sim N(0, I)$$
$\xi_t$ is a Brownian force – the collisions with the water molecules
We have: $F = \nabla E$, $\ v_t = \frac{dX}{dt}$
$$\Rightarrow\ x_{t+1} = x_t + \frac{dt}{\gamma}\,\nabla E + \frac{dt}{\gamma}\,\xi_t$$
(looks familiar, doesn't it?)
SGLD Welling & Teh 2011
1. Let's do a single leapfrog step at each iteration
2. We add to the gradient a zero-mean Gaussian sample.
Variance? Wait!
3. Robbins & Monro (1951), stochastic optimization / stochastic approximation
method
The learning rate decays in time:
$$\sum_{t=1}^{\infty} \varepsilon_t = \infty, \qquad \sum_{t=1}^{\infty} \varepsilon_t^2 < \infty$$
$$\Rightarrow\ \Delta\theta_t = \frac{\varepsilon_t}{2}\Big(\nabla \log p(\theta_t) + \frac{N}{n}\sum_{i=1}^{n} \nabla \log p(x_i \mid \theta_t)\Big) + \eta_t, \qquad \eta_t \sim N(0, \varepsilon_t)$$
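A minimal sketch of this SGLD update for one mini-batch, assuming hypothetical callables `grad_log_prior` and `grad_log_lik` for ∇ log p(θ) and ∇ log p(x_i|θ):

```python
# Hypothetical sketch of one SGLD update (Welling & Teh style): mini-batch of size n
# drawn from a dataset of size N; eps_t is the (decaying) step size at time t.
import numpy as np

def sgld_step(theta, batch, N, eps_t, grad_log_prior, grad_log_lik):
    n = len(batch)
    grad = grad_log_prior(theta) + (N / n) * sum(grad_log_lik(x, theta) for x in batch)
    noise = np.random.normal(0.0, np.sqrt(eps_t), size=np.shape(theta))  # eta_t ~ N(0, eps_t)
    return theta + 0.5 * eps_t * grad + noise
```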
What did we learn?
• GAN – a generative tool that knows how to approximate distributions
• BNN – a cool NN tool for uncertainty estimation
Can they together construct a deep girl power?!
GAN meets BNN
Adversarial Distillation of Bayesian Neural Network Posteriors
Basic Idea
• Train a GAN to generate samples from the posterior distribution of a BNN
• We use WGAN-GP as the loss function:
$$L = \mathbb{E}_{\theta \sim P_{\theta}}\big[D(\theta)\big] - \mathbb{E}_{\xi \sim P_r}\big[D(\xi)\big] + \lambda\, \mathbb{E}_{\theta \sim P_{\theta}}\big[(\|\nabla D(\theta)\|_2 - 1)^2\big]$$
Two-Step Training
1. Create samples from the posterior using the SGLD mechanism
2. Train the WGAN-GP to sample from this posterior
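A sketch of this two-step idea, reusing the `sgld_step` and `wgan_gp_critic_loss` sketches from above; G and D here are a generator and critic over flattened weight vectors, and all names are hypothetical placeholders rather than the paper's code:

```python
# Hypothetical sketch of adversarial posterior distillation (offline flavor):
# step 1 collects SGLD samples of BNN weights, step 2 fits a WGAN-GP to them.
import torch

def distill_posterior(theta_samples, G, D, opt_g, opt_d, z_dim, n_iters, batch_size=64):
    real = torch.as_tensor(theta_samples, dtype=torch.float32)   # SGLD samples = "real data"
    for it in range(n_iters):
        idx = torch.randint(0, real.size(0), (batch_size,))
        real_batch = real[idx]
        fake_batch = G(torch.randn(batch_size, z_dim)).detach()
        d_loss = wgan_gp_critic_loss(D, real_batch, fake_batch)  # critic update (WGAN-GP)
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        if it % 5 == 0:                                          # generator updated less often
            g_loss = -D(G(torch.randn(batch_size, z_dim))).mean()
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```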
Adversarial Posterior Distillation (APD)
• A generative model that distills the posterior distribution P(θ|X)
Algorithmic advantages
1. Sampling can be performed in parallel (MCMC is sequential)
2. A relatively small storage is required for the generator’s parameters
APD –Offline
1. Sample a series of weights $\{\theta_t\}_{t=1}^{T}$
2. Optimize G using WGAN-GP, where $\{\theta_t\}_{t=1}^{T}$ is the
"real data"
Remark: they used a different version of SGLD –
Bayesian Dark Knowledge (2014, Murphy & Welling)
APD -Online
1. Draw the θ 𝑡 using the Generator
2. Loop until convergence
• Draw θ_t using an MCMC method (Gibbs, M.H.) for several iterations
• Put the samples in a buffer
• Use the buffer to optimize G where θ 𝑡 is the “real data”
Post Training
• GAN is a generative tool so we can simply generate….
• Rather than using the posterior samples, we use the samples that the GAN
generates
How should we measure the uncertainty?
We predict by the following:
$$P(y \mid x, D) \approx \frac{1}{T}\sum_{t=1}^{T} P\big(y \mid x, G(z_t)\big), \qquad z_t \sim N(0, I)$$
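A sketch of this generator-based prediction, assuming G maps noise to a flattened weight vector and a hypothetical `forward` evaluates the BNN under given weights:

```python
# Hypothetical sketch: predictive distribution using generator samples instead of
# stored posterior samples. G maps z ~ N(0, I) to a BNN weight vector.
import torch

def predict_with_generator(x, G, forward, T=100, z_dim=100):
    probs = []
    with torch.no_grad():
        for _ in range(T):
            w = G(torch.randn(1, z_dim)).squeeze(0)   # z_t ~ N(0, I), w_t = G(z_t)
            probs.append(forward(x, w))               # P(y | x, G(z_t))
    return torch.stack(probs).mean(dim=0)             # average over the T samples
```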
Uncertainty
There are several methods for measuring uncertainty:
1. Simply calculate the entropy H(y|x, D)
2. Information gain (here it goes by the name
Bayesian Active Learning by Disagreement (BALD), Houlsby 2011)
3. We also have the variation ratio (VR):
$$VR(x) = 1 - \frac{\sum_{t} \mathbf{1}[y_t = c^*]}{T}$$
Some outcomes
• APD can retain SGLD features:
1. Anomaly detection
2. Defense
3. Active Learning
• APD reduces the storage cost of SGLD (or any other MCMC)
• WGAN-GP works better than plain WGAN (weight clipping) or the original GAN
That’s All
THANKS!!!
• https://henripal.github.io/blog/langevin – PyTorch code
• http://www.quretec.com/u/vilo/edu/2003-04/DM_seminar_2003_II/Bayes/lampinen01bayesian.pdf
• http://bayesiandeeplearning.org/2016/slides/nips16bayesdeep.pdf
• https://pdfs.semanticscholar.org/b0f2/433c088591d265891231f1c22424047f1bc1.pdf
• https://arxiv.org/pdf/1505.05424.pdf
• https://www.coursera.org/lecture/bayesian-methods-in-machine-learning/bayesian-neural-networks-HI8ta
• https://pdfs.semanticscholar.org/49c6/c08709d3cbf4b58477375d7c04bcd4da4520.pdf
• https://pdfs.semanticscholar.org/579d/308b610da58266dbfa3574ba9c234ff1da13.pdf
• https://arxiv.org/pdf/1701.07875.pdf
• https://arxiv.org/pdf/1704.00028.pdf
• https://arxiv.org/pdf/1806.10317.pdf
• https://arxiv.org/abs/1112.5745
• https://www.ics.uci.edu/~welling/publications/papers/stoclangevin_v6.pdf
• https://www.cs.ox.ac.uk/people/yarin.gal/website/thesis/thesis.pdf
• http://arogozhnikov.github.io/2016/12/19/markov_chain_monte_carlo.html
• https://theclevermachine.wordpress.com/2012/11/18/mcmc-hamiltonian-monte-carlo-a-k-a-hybrid-monte-carlo/
• https://arxiv.org/pdf/1206.1901.pdf
• http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.446.9306&rep=rep1&type=pdf
• https://papers.nips.cc/paper/4329-practical-variational-inference-for-neural-networks.pdf
• https://www.cs.toronto.edu/~graves/nips_2011.pdf
• https://danieltakeshi.github.io/2017/11/26/basics-of-bayesian-neural-networks/
• https://www.math.wustl.edu/~sawyer/hmhandouts/MetropHastingsEtc.pdf
• http://edwardlib.org/tutorials/bayesian-neural-network – Python code
• http://physics.gu.se/~frtbm/joomla/media/mydocs/LennartSjogren/kap6.pdf
• https://pdfs.semanticscholar.org/34dd/d8865569c2c32dec9bf7ffc817ff42faaa01.pdf