Variational Inference
Presenter: Shuai Zhang, CSE, UNSW
Content
1 • Brief Introduction
2 • Core Idea of VI
  • Optimization
3 • Example: Bayesian Mixture of Gaussians
What is Variational Inference?
Variational Bayesian methods are a family of techniques for
approximating intractable integrals arising in Bayesian inference
and machine learning [Wiki].
It is widely used to approximate posterior densities for Bayesian
models as an alternative to Markov chain Monte Carlo (MCMC), and
it tends to be faster and easier to scale to large data.
It has been applied to problems such as document analysis,
computational neuroscience and computer vision.
Core Idea
Consider a general problem of Bayesian inference. Let the latent
variables in our problem be z and the observed data be x.
Inference in a Bayesian model amounts to conditioning on the data
and computing the posterior p(z|x).
Approximate Inference
The inference problem is to compute the conditional given by the
equation below:
$$p(z \mid x) = \frac{p(x, z)}{p(x)}$$
The denominator is the marginal distribution of the data (the
evidence), obtained by marginalizing all the latent variables out of
the joint distribution p(x,z):
$$p(x) = \int p(x, z)\, dz$$
For many models, this evidence integral is unavailable in closed
form or requires exponential time to compute. The evidence is
what we need to compute the conditional from the joint; this is
why inference in such models is hard.
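To make the cost of the evidence concrete, here is a minimal sketch under stated assumptions: a hypothetical toy model (not from the slides) with m binary latent variables, a Bernoulli(0.5) prior on each, and a unit-variance Gaussian likelihood centered on their sum. The function names `log_joint` and `exact_posterior` are illustrative. Exact inference enumerates all 2^m latent configurations, which is only feasible for very small m.

```python
import itertools
import math

def log_joint(z, x):
    """log p(x, z) for a toy model chosen purely for illustration."""
    # Prior: each z_j ~ Bernoulli(0.5), independently.
    log_prior = len(z) * math.log(0.5)
    # Likelihood: x ~ Normal(sum(z), 1), which couples the z_j through x.
    mean = sum(z)
    log_lik = -0.5 * math.log(2 * math.pi) - 0.5 * (x - mean) ** 2
    return log_prior + log_lik

def exact_posterior(x, m):
    """Enumerate all 2**m configurations to obtain p(x) and p(z | x)."""
    configs = list(itertools.product([0, 1], repeat=m))
    joint = [math.exp(log_joint(z, x)) for z in configs]
    evidence = sum(joint)  # p(x) = sum over z of p(x, z)
    posterior = {z: p / evidence for z, p in zip(configs, joint)}
    return evidence, posterior

evidence, posterior = exact_posterior(x=2.0, m=3)
```

With m = 3 the sum has 8 terms; each additional latent variable doubles it, which is the exponential cost mentioned above.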
MCMC
In MCMC, we first construct an ergodic Markov chain on z whose
stationary distribution is the posterior p(z|x).
Then, we sample from the chain to collect samples from the
stationary distribution.
Finally, we approximate the posterior with an empirical estimate
constructed from the collected samples.
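The three steps above can be sketched with a random-walk Metropolis sampler. This is a minimal illustration, not the presenter's code: it assumes a hypothetical one-dimensional model with prior z ~ N(0, 1) and likelihood x ~ N(z, 1), observed at x = 2, for which the exact posterior is N(1, 1/2). The step size, burn-in length, and seed are arbitrary choices.

```python
import math
import random

def log_target(z, x=2.0):
    """Unnormalized log posterior: prior z ~ N(0, 1), likelihood x ~ N(z, 1)."""
    return -0.5 * z ** 2 - 0.5 * (x - z) ** 2

def metropolis(n_samples, step=1.0, burn_in=1000, seed=0):
    """Random-walk Metropolis: the chain's stationary distribution is the target."""
    rng = random.Random(seed)
    z = 0.0
    samples = []
    for t in range(n_samples + burn_in):
        proposal = z + rng.gauss(0.0, step)  # symmetric proposal
        log_accept = log_target(proposal) - log_target(z)
        # Accept with probability min(1, ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < log_accept:
            z = proposal
        if t >= burn_in:
            samples.append(z)  # collect samples after burn-in
    return samples

samples = metropolis(20000)
# Empirical estimate of the posterior mean (exact value for this model is 1).
posterior_mean = sum(samples) / len(samples)
```

Only the unnormalized posterior is needed, because the evidence cancels in the acceptance ratio.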
VI vs. MCMC
MCMC | VI
More computationally intensive | Less intensive
Guarantees producing asymptotically exact samples from the target distribution | No such guarantees
Slower | Faster, especially for large data sets and complex distributions
Best for precise inference | Useful to explore many scenarios quickly or for large data sets
Core Idea
Rather than use sampling, the main idea behind variational
inference is to use optimization.
We restrict ourselves to a family of approximate distributions D over
the latent variables. We then try to find the member of that
family that minimizes the Kullback-Leibler (KL) divergence to the exact
posterior. This reduces inference to solving an optimization problem:
$$q^*(z) = \arg\min_{q(z) \in D} \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big)$$
The goal is to approximate p(z|x) with the resulting q*(z).
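As a small illustration of picking the KL-closest member of a family, the sketch below uses a discrete latent variable with three states. The posterior values and the three candidate distributions are made-up numbers, and the names `kl`, `family`, and `best_name` are illustrative. In a real problem the posterior is unknown, which is why this objective cannot be evaluated directly.

```python
import math

def kl(q, p):
    """KL(q || p) for discrete distributions given as lists of probabilities."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Stand-in for the exact posterior p(z | x); made-up numbers.
posterior = [0.7, 0.2, 0.1]

# A tiny hypothetical family D of candidate approximations q(z).
family = {
    "uniform": [1 / 3, 1 / 3, 1 / 3],
    "peaked": [0.8, 0.1, 0.1],
    "flat-ish": [0.5, 0.3, 0.2],
}

# The variational solution is the family member with the smallest KL divergence.
best_name = min(family, key=lambda name: kl(family[name], posterior))
```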
Core Idea
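The objective is not computable, because it depends on the evidence log p(x):
$$\mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big) = \mathbb{E}_q[\log q(z)] - \mathbb{E}_q[\log p(z, x)] + \log p(x)$$
Because we cannot compute the KL, we optimize an alternative
objective that is equivalent to the negative KL up to an added constant.
We know from our discussion of EM that
$$\log p(x) = \mathbb{E}_q[\log p(z, x)] - \mathbb{E}_q[\log q(z)] + \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big)$$
where only the last term involves the intractable posterior.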
Core Idea
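Thus, we have the objective function, the evidence lower bound (ELBO):
$$\mathrm{ELBO}(q) = \mathbb{E}_q[\log p(z, x)] - \mathbb{E}_q[\log q(z)]$$
Because log p(x) does not depend on q, maximizing the ELBO is
equivalent to minimizing the KL divergence.
Intuition: We rewrite the ELBO as the expected log
likelihood of the data minus the KL divergence between q(z) and the
prior p(z):
$$\mathrm{ELBO}(q) = \mathbb{E}_q[\log p(x \mid z)] - \mathrm{KL}\big(q(z) \,\|\, p(z)\big)$$
The first term rewards a q that explains the data; the second keeps q
close to the prior.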
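Both statements about the ELBO can be checked numerically on a model small enough to normalize exactly. The sketch below is only an illustration: the prior, likelihood, and q values are made-up numbers for a latent variable with three states, and the helper names `expect` and `kl` are not from the slides.

```python
import math

prior = [0.5, 0.3, 0.2]   # p(z), made-up numbers
lik = [0.9, 0.4, 0.1]     # p(x | z) evaluated at the observed x
joint = [pz * px for pz, px in zip(prior, lik)]   # p(x, z)
evidence = sum(joint)                             # p(x)
posterior = [j / evidence for j in joint]         # p(z | x)
q = [0.6, 0.3, 0.1]       # an arbitrary approximation q(z)

def expect(dist, values):
    """Expectation of `values` under the discrete distribution `dist`."""
    return sum(d * v for d, v in zip(dist, values))

def kl(a, b):
    """KL(a || b) for discrete distributions."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

# ELBO(q) = E_q[log p(z, x)] - E_q[log q(z)]
elbo = expect(q, [math.log(j) for j in joint]) - expect(q, [math.log(v) for v in q])

# Same quantity as expected log likelihood minus KL(q || prior).
elbo_alt = expect(q, [math.log(v) for v in lik]) - kl(q, prior)

# The gap between log evidence and ELBO is KL(q || posterior).
gap = math.log(evidence) - elbo
```

Because the gap is a KL divergence, it is never negative, so the ELBO never exceeds log p(x).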
Mean field approximation
Having specified the variational objective function as the ELBO, we
now need to specify the variational family of distributions from
which we pick the approximate variational distribution.
A common choice is the mean-field variational family. Here, the
latent variables z_1, ..., z_m are mutually independent and each is
governed by a distinct factor in the variational distribution:
$$q(z) = \prod_{j=1}^{m} q_j(z_j)$$
Coordinate ascent mean-field VI
Having specified our objective function and the variational family
of distributions from which to pick the approximation, we now
work to optimize.
Coordinate ascent variational inference (CAVI) maximizes the ELBO by
iteratively optimizing each variational factor of the mean-field
variational distribution while holding the others fixed. It does not,
however, guarantee finding the global optimum; it converges to a
local optimum of the ELBO.
Coordinate ascent mean-field VI
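Given that we fix all other variational factors q_l(z_l) (l not equal
to j), the optimal q_j(z_j) is proportional to the exponentiated
expected log of the complete conditional:
$$q_j^*(z_j) \propto \exp\big\{\mathbb{E}_{-j}[\log p(z_j \mid z_{-j}, x)]\big\}$$
where the expectation is taken over all the other factors. This is
equivalent to being proportional to the exponentiated expected log of
the joint,
$$q_j^*(z_j) \propto \exp\big\{\mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)]\big\}$$
because log p(z_j | z_{-j}, x) and log p(z_j, z_{-j}, x) differ only by
log p(z_{-j}, x), which is constant in z_j, and because the mean-field
family assumes that all the latent variables are independent, so the
expectation does not involve the j-th factor.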
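The update rule can be sketched on a model with two binary latent variables, where every expectation is a short sum. This is a minimal illustration under stated assumptions: the 2x2 table of joint probabilities p(z1, z2, x) for the observed x contains made-up numbers, and the names `normalize_exp`, `elbo`, and `history` are illustrative.

```python
import math

# p(z1, z2, x) at the observed x, indexed as joint[z1][z2]; made-up numbers.
joint = [[0.30, 0.05],
         [0.10, 0.15]]
log_joint = [[math.log(v) for v in row] for row in joint]

def normalize_exp(logits):
    """Return exp(logits) normalized to sum to one, computed stably."""
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def elbo(q1, q2):
    """ELBO = E_q[log p(z, x)] - E_q[log q(z)] for q(z) = q1(z1) q2(z2)."""
    energy = sum(q1[a] * q2[b] * log_joint[a][b]
                 for a in range(2) for b in range(2))
    entropy = -sum(p * math.log(p) for q in (q1, q2) for p in q if p > 0)
    return energy + entropy

q1, q2 = [0.5, 0.5], [0.5, 0.5]
history = [elbo(q1, q2)]
for _ in range(20):
    # q1(z1) is proportional to exp(E_{q2}[log p(z1, z2, x)]), with q2 fixed.
    q1 = normalize_exp([sum(q2[b] * log_joint[a][b] for b in range(2))
                        for a in range(2)])
    # q2(z2) is proportional to exp(E_{q1}[log p(z1, z2, x)]), with q1 fixed.
    q2 = normalize_exp([sum(q1[a] * log_joint[a][b] for a in range(2))
                        for b in range(2)])
    history.append(elbo(q1, q2))
```

Each sweep cannot decrease the ELBO, and the ELBO stays below the log evidence, here log(0.6).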
Coordinate ascent mean-field VI
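Below, we rewrite the first term using iterated expectation and,
for the second term, retain only the term that depends on q_j(z_j):
$$\mathrm{ELBO}(q_j) = \mathbb{E}_j\big[\mathbb{E}_{-j}[\log p(z_j, z_{-j}, x)]\big] - \mathbb{E}_j[\log q_j(z_j)] + \text{const}$$
Writing A = E_{-j}[log p(z_j, z_{-j}, x)], the RHS of this final
equation is, up to an added constant, equal to the negative KL
divergence between q_j and the normalized exp(A). Thus, maximizing
this expression is the same as minimizing the KL divergence between
q_j and exp(A).
This occurs when q_j is proportional to exp(A).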
Coordinate ascent mean-field VI
Bayesian Mixture of Gaussians
The full hierarchical model of the Bayesian mixture of Gaussians,
and its joint distribution.
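One common way to write this model, assuming the unit-variance mixture used in the review paper listed as reference 4 (arXiv:1601.00670): K component means with a Gaussian prior of variance σ², one-hot cluster assignments c_i with a uniform prior, and n observations. The hyperparameter σ² and the one-hot encoding of c_i are taken from that paper, not from the slide text.

```latex
\begin{align*}
\mu_k &\sim \mathcal{N}(0, \sigma^2), & k &= 1, \dots, K \\
c_i &\sim \mathrm{Categorical}(1/K, \dots, 1/K), & i &= 1, \dots, n \\
x_i \mid c_i, \mu &\sim \mathcal{N}(c_i^\top \mu, 1), & i &= 1, \dots, n \\[4pt]
p(\mu, c, x) &= p(\mu) \prod_{i=1}^{n} p(c_i)\, p(x_i \mid c_i, \mu)
\end{align*}
```

The latent variables are z = {μ, c}, so the evidence requires integrating over μ and summing over all K^n assignments.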
Bayesian Mixture of Gaussians
The mean field variational family contains approximate posterior
densities of the form
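A sketch of that form, assuming the factors used in the review paper listed as reference 4: a Gaussian factor for each component mean and a categorical factor for each assignment. The variational parameters m_k, s_k² and φ_i are named as in that paper, not in the slide text.

```latex
\[
q(\mu, c) = \prod_{k=1}^{K} q(\mu_k; m_k, s_k^2) \prod_{i=1}^{n} q(c_i; \varphi_i)
\]
```

Here q(μ_k; m_k, s_k²) is a Gaussian with mean m_k and variance s_k², and φ_i is a K-vector of assignment probabilities for observation i.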
Bayesian Mixture of Gaussians
Derive the ELBO as a function of the variational parameters, then
maximize it.
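Under the same assumptions (the unit-variance mixture and the Gaussian and categorical factors from reference 4), the ELBO expands term by term as follows. The parameter names m, s² and φ follow that paper.

```latex
\begin{align*}
\mathrm{ELBO}(m, s^2, \varphi)
  ={}& \sum_{k=1}^{K} \mathbb{E}\big[\log p(\mu_k);\, m_k, s_k^2\big] \\
  &+ \sum_{i=1}^{n} \Big( \mathbb{E}\big[\log p(c_i);\, \varphi_i\big]
     + \mathbb{E}\big[\log p(x_i \mid c_i, \mu);\, \varphi_i, m, s^2\big] \Big) \\
  &- \sum_{i=1}^{n} \mathbb{E}\big[\log q(c_i; \varphi_i)\big]
     - \sum_{k=1}^{K} \mathbb{E}\big[\log q(\mu_k; m_k, s_k^2)\big]
\end{align*}
```

The first two lines are the expected log joint and the last line is the entropy of q, matching the general definition of the ELBO.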
Bayesian Mixture of Gaussians
Next, we derive the CAVI update for the variational factors.
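A minimal sketch of the resulting coordinate updates, assuming the unit-variance mixture and the closed-form updates given in reference 4: φ_ik is proportional to exp(E[μ_k] x_i - E[μ_k²]/2), and each q(μ_k) is Gaussian with variance s_k² = 1/(1/σ² + Σ_i φ_ik) and mean m_k = s_k² Σ_i φ_ik x_i. The function name `cavi_gmm`, the toy data, the prior variance, and the initial means are illustrative choices, and the loop runs for a fixed number of sweeps instead of monitoring the ELBO.

```python
import math

def cavi_gmm(x, K, prior_var=10.0, n_iter=50, init_means=None):
    """CAVI for a 1-D Bayesian mixture of unit-variance Gaussians."""
    n = len(x)
    m = list(init_means) if init_means is not None else [float(k) for k in range(K)]
    s2 = [1.0] * K
    phi = [[1.0 / K] * K for _ in range(n)]
    for _ in range(n_iter):
        # Update q(c_i): phi_ik proportional to exp(E[mu_k] x_i - E[mu_k^2] / 2),
        # where E[mu_k] = m_k and E[mu_k^2] = s2_k + m_k^2.
        for i in range(n):
            logits = [m[k] * x[i] - 0.5 * (s2[k] + m[k] ** 2) for k in range(K)]
            top = max(logits)
            weights = [math.exp(v - top) for v in logits]
            total = sum(weights)
            phi[i] = [w / total for w in weights]
        # Update q(mu_k): Gaussian with variance s2_k and mean m_k.
        for k in range(K):
            resp = sum(phi[i][k] for i in range(n))            # soft count
            weighted = sum(phi[i][k] * x[i] for i in range(n))  # soft sum
            s2[k] = 1.0 / (1.0 / prior_var + resp)
            m[k] = weighted * s2[k]
    return m, s2, phi

# Toy data: two well-separated groups around -5 and 5 (made-up numbers).
x = [-5.4, -5.1, -4.9, -4.6, -5.0, 4.6, 4.9, 5.1, 5.4, 5.0]
m, s2, phi = cavi_gmm(x, K=2, init_means=[-1.0, 1.0])
```

The fitted means sit slightly closer to zero than the group averages because the zero-mean prior shrinks them. As with CAVI in general, a different initialization can lead to a different local optimum.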
References
1. https://am207.github.io/2017/wiki/VI.html
2. https://www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf
3. https://www.cs.cmu.edu/~epxing/Class/10708-17/notes-17/10708-scribe-lecture13.pdf
4. https://arxiv.org/pdf/1601.00670.pdf
Week Report
• Last week
• Metric Factorization model
• Learning Group
• This week
• Submit the ICDE paper

