SlideShare a Scribd company logo
1 of 34
Lecture 18: Gaussian Mixture Models and Expectation Maximization Machine Learning April 13, 2010
Last Time Review of Supervised Learning Clustering K-means Soft K-means
Today A brief look at Homework 2 Gaussian Mixture Models Expectation Maximization
The Problem You have data that you believe is drawn from npopulations You want to identify parameters for each population You don’t know anything about the populations a priori Except you believe that they’re gaussian…
Gaussian Mixture Models Rather than identifying clusters by “nearest” centroids Fit a Set of k Gaussians to the data  Maximum Likelihood over a mixture model
GMM example
Mixture Models Formally a Mixture Model is the weighted sum of a number of pdfs where the weights are determined by a distribution,
Gaussian Mixture Models GMM: the weighted sum of a number of Gaussians where the weights are determined by a distribution,
Graphical Modelswith unobserved variables What if you have variables in a Graphical model that are never observed? Latent Variables Training latent variable models is an unsupervised learning application uncomfortable amused laughing sweating
Latent Variable HMMs We can cluster sequences using an HMM with unobserved state variables  We will train latent variable models using Expectation Maximization
Expectation Maximization Both the training of GMMs and Graphical Models with latent variables can be accomplished using Expectation Maximization Step 1: Expectation (E-step) Evaluate the “responsibilities” of each cluster with the current parameters Step 2: Maximization (M-step) Re-estimate parameters using the existing “responsibilities” Similar to k-means training.
Latent Variable Representation We can represent a GMM involving a latent variable What does this give us? TODO: plate notation
GMM data and Latent variables
One last bit We have representations of the joint p(x,z) and the marginal, p(x)… The conditional of p(z|x) can be derived using Bayes rule. The responsibility that a mixture component takes for explaining an observation x.
Maximum Likelihood over a GMM As usual: Identify a likelihood function And set partials to zero…
Maximum Likelihood of a GMM Optimization of means.
Maximum Likelihood of a GMM Optimization of covariance Note the similarity to the regular MLE without responsibility terms.
Maximum Likelihood of a GMM Optimization of mixing term
MLE of a GMM
EM for GMMs Initialize the parameters Evaluate the log likelihood Expectation-step: Evaluate the responsibilities Maximization-step: Re-estimate Parameters Evaluate the log likelihood Check for convergence
EM for GMMs E-step: Evaluate the Responsibilities
EM for GMMs M-Step: Re-estimate Parameters
Visual example of EM
Potential Problems Incorrect number of Mixture Components Singularities
Incorrect Number of Gaussians
Incorrect Number of Gaussians
Singularities A minority of the data can have a disproportionate effect on the model likelihood. For example…
GMM example
Singularities When a mixture component collapses on a given point, the mean becomes the point, and the variance goes to zero. Consider the likelihood function as the covariance goes to zero. The likelihood approaches infinity.
Relationship to K-means K-means makes hard decisions.  Each data point gets assigned to a single cluster. GMM/EM makes soft decisions. Each data point can yield a posterior p(z|x) Soft K-means is a special case of EM.
Soft means as GMM/EM Assume equal covariance matrices for every mixture component: Likelihood: Responsibilities: As epsilon approaches zero, the responsibility approaches unity.
Soft K-Means as GMM/EM Overall Log likelihood as epsilon approaches zero: The expectation of soft k-means is the intercluster variability Note: only the means are reestimated in Soft K-means.   The covariance matrices are all tied.
General form of EM Given a joint distribution over observed and latent variables:  Want to maximize: Initialize parameters E Step: Evaluate: M-Step: Re-estimate parameters (based on expectation of complete-data log likelihood) Check for convergence of params or likelihood
Next Time Homework 4 due… Proof of Expectation Maximization in GMMs Generalized EM – Hidden Markov Models

More Related Content

What's hot

Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine LearningKuppusamy P
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic RegressionKnoldus Inc.
 
Ensemble methods in machine learning
Ensemble methods in machine learningEnsemble methods in machine learning
Ensemble methods in machine learningSANTHOSH RAJA M G
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)Abhimanyu Dwivedi
 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksFrancesco Collova'
 
2.5 backpropagation
2.5 backpropagation2.5 backpropagation
2.5 backpropagationKrish_ver2
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2butest
 
Bias and variance trade off
Bias and variance trade offBias and variance trade off
Bias and variance trade offVARUN KUMAR
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptronomaraldabash
 
Principal component analysis and lda
Principal component analysis and ldaPrincipal component analysis and lda
Principal component analysis and ldaSuresh Pokharel
 
Computational Learning Theory
Computational Learning TheoryComputational Learning Theory
Computational Learning Theorybutest
 
Introduction to Statistical Machine Learning
Introduction to Statistical Machine LearningIntroduction to Statistical Machine Learning
Introduction to Statistical Machine Learningmahutte
 
Intro to Feature Selection
Intro to Feature SelectionIntro to Feature Selection
Intro to Feature Selectionchenhm
 
Feature selection
Feature selectionFeature selection
Feature selectiondkpawar
 

What's hot (20)

Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine Learning
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic Regression
 
Ensemble methods in machine learning
Ensemble methods in machine learningEnsemble methods in machine learning
Ensemble methods in machine learning
 
Bayes Belief Networks
Bayes Belief NetworksBayes Belief Networks
Bayes Belief Networks
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)
 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural Networks
 
2.5 backpropagation
2.5 backpropagation2.5 backpropagation
2.5 backpropagation
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2
 
Bias and variance trade off
Bias and variance trade offBias and variance trade off
Bias and variance trade off
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptron
 
Principal component analysis and lda
Principal component analysis and ldaPrincipal component analysis and lda
Principal component analysis and lda
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
 
Computational Learning Theory
Computational Learning TheoryComputational Learning Theory
Computational Learning Theory
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Introduction to Statistical Machine Learning
Introduction to Statistical Machine LearningIntroduction to Statistical Machine Learning
Introduction to Statistical Machine Learning
 
Intro to Feature Selection
Intro to Feature SelectionIntro to Feature Selection
Intro to Feature Selection
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
Principal Component Analysis
Principal Component AnalysisPrincipal Component Analysis
Principal Component Analysis
 
Feature selection
Feature selectionFeature selection
Feature selection
 

Similar to Lecture 18: Gaussian Mixture Models and Expectation Maximization

PRML Chapter 9
PRML Chapter 9PRML Chapter 9
PRML Chapter 9Sunwoo Kim
 
Part 2: Unsupervised Learning Machine Learning Techniques
Part 2: Unsupervised Learning Machine Learning Techniques Part 2: Unsupervised Learning Machine Learning Techniques
Part 2: Unsupervised Learning Machine Learning Techniques butest
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdfgadissaassefa
 
Linear Regression
Linear RegressionLinear Regression
Linear Regressionmailund
 
Machine Learning Algorithms Review(Part 2)
Machine Learning Algorithms Review(Part 2)Machine Learning Algorithms Review(Part 2)
Machine Learning Algorithms Review(Part 2)Zihui Li
 
Applications Of Evt In Financial Markets
Applications Of Evt In Financial MarketsApplications Of Evt In Financial Markets
Applications Of Evt In Financial Marketsmisterval
 
Reweighting and Boosting to uniforimty in HEP
Reweighting and Boosting to uniforimty in HEPReweighting and Boosting to uniforimty in HEP
Reweighting and Boosting to uniforimty in HEParogozhnikov
 
Forecasting Default Probabilities in Emerging Markets and Dynamical Regula...
Forecasting Default Probabilities  in Emerging Markets and   Dynamical Regula...Forecasting Default Probabilities  in Emerging Markets and   Dynamical Regula...
Forecasting Default Probabilities in Emerging Markets and Dynamical Regula...SSA KPI
 
Lecture 17: Supervised Learning Recap
Lecture 17: Supervised Learning RecapLecture 17: Supervised Learning Recap
Lecture 17: Supervised Learning Recapbutest
 
A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:
A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:
A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:Sean Golliher
 
Cristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsCristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsbutest
 
Cristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsCristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsbutest
 
Cristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsCristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsbutest
 
Cristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsCristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsbutest
 
Cristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsCristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsbutest
 
Machine Learning: An Introduction Fu Chang
Machine Learning: An Introduction Fu ChangMachine Learning: An Introduction Fu Chang
Machine Learning: An Introduction Fu Changbutest
 
Lecture 6
Lecture 6Lecture 6
Lecture 6hunglq
 
chap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptchap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptShayanChowdary
 

Similar to Lecture 18: Gaussian Mixture Models and Expectation Maximization (20)

PRML Chapter 9
PRML Chapter 9PRML Chapter 9
PRML Chapter 9
 
Part 2: Unsupervised Learning Machine Learning Techniques
Part 2: Unsupervised Learning Machine Learning Techniques Part 2: Unsupervised Learning Machine Learning Techniques
Part 2: Unsupervised Learning Machine Learning Techniques
 
2008 spie gmm
2008 spie gmm2008 spie gmm
2008 spie gmm
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdf
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Machine Learning Algorithms Review(Part 2)
Machine Learning Algorithms Review(Part 2)Machine Learning Algorithms Review(Part 2)
Machine Learning Algorithms Review(Part 2)
 
Applications Of Evt In Financial Markets
Applications Of Evt In Financial MarketsApplications Of Evt In Financial Markets
Applications Of Evt In Financial Markets
 
Reweighting and Boosting to uniforimty in HEP
Reweighting and Boosting to uniforimty in HEPReweighting and Boosting to uniforimty in HEP
Reweighting and Boosting to uniforimty in HEP
 
Forecasting Default Probabilities in Emerging Markets and Dynamical Regula...
Forecasting Default Probabilities  in Emerging Markets and   Dynamical Regula...Forecasting Default Probabilities  in Emerging Markets and   Dynamical Regula...
Forecasting Default Probabilities in Emerging Markets and Dynamical Regula...
 
Lecture 17: Supervised Learning Recap
Lecture 17: Supervised Learning RecapLecture 17: Supervised Learning Recap
Lecture 17: Supervised Learning Recap
 
A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:
A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:
A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction:
 
Cristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsCristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical models
 
Cristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsCristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical models
 
Cristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsCristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical models
 
Cristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsCristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical models
 
Cristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical modelsCristopher M. Bishop's tutorial on graphical models
Cristopher M. Bishop's tutorial on graphical models
 
Machine Learning: An Introduction Fu Chang
Machine Learning: An Introduction Fu ChangMachine Learning: An Introduction Fu Chang
Machine Learning: An Introduction Fu Chang
 
Lecture 6
Lecture 6Lecture 6
Lecture 6
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
chap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptchap4_Parametric_Methods.ppt
chap4_Parametric_Methods.ppt
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Lecture 18: Gaussian Mixture Models and Expectation Maximization

  • 1. Lecture 18: Gaussian Mixture Models and Expectation Maximization Machine Learning April 13, 2010
  • 2. Last Time Review of Supervised Learning Clustering K-means Soft K-means
  • 3. Today A brief look at Homework 2 Gaussian Mixture Models Expectation Maximization
  • 4. The Problem You have data that you believe is drawn from npopulations You want to identify parameters for each population You don’t know anything about the populations a priori Except you believe that they’re gaussian…
  • 5. Gaussian Mixture Models Rather than identifying clusters by “nearest” centroids Fit a Set of k Gaussians to the data Maximum Likelihood over a mixture model
  • 7. Mixture Models Formally a Mixture Model is the weighted sum of a number of pdfs where the weights are determined by a distribution,
  • 8. Gaussian Mixture Models GMM: the weighted sum of a number of Gaussians where the weights are determined by a distribution,
  • 9. Graphical Modelswith unobserved variables What if you have variables in a Graphical model that are never observed? Latent Variables Training latent variable models is an unsupervised learning application uncomfortable amused laughing sweating
  • 10. Latent Variable HMMs We can cluster sequences using an HMM with unobserved state variables We will train latent variable models using Expectation Maximization
  • 11. Expectation Maximization Both the training of GMMs and Graphical Models with latent variables can be accomplished using Expectation Maximization Step 1: Expectation (E-step) Evaluate the “responsibilities” of each cluster with the current parameters Step 2: Maximization (M-step) Re-estimate parameters using the existing “responsibilities” Similar to k-means training.
  • 12. Latent Variable Representation We can represent a GMM involving a latent variable What does this give us? TODO: plate notation
  • 13. GMM data and Latent variables
  • 14. One last bit We have representations of the joint p(x,z) and the marginal, p(x)… The conditional of p(z|x) can be derived using Bayes rule. The responsibility that a mixture component takes for explaining an observation x.
  • 15. Maximum Likelihood over a GMM As usual: Identify a likelihood function And set partials to zero…
  • 16. Maximum Likelihood of a GMM Optimization of means.
  • 17. Maximum Likelihood of a GMM Optimization of covariance Note the similarity to the regular MLE without responsibility terms.
  • 18. Maximum Likelihood of a GMM Optimization of mixing term
  • 19. MLE of a GMM
  • 20. EM for GMMs Initialize the parameters Evaluate the log likelihood Expectation-step: Evaluate the responsibilities Maximization-step: Re-estimate Parameters Evaluate the log likelihood Check for convergence
  • 21. EM for GMMs E-step: Evaluate the Responsibilities
  • 22. EM for GMMs M-Step: Re-estimate Parameters
  • 24. Potential Problems Incorrect number of Mixture Components Singularities
  • 25. Incorrect Number of Gaussians
  • 26. Incorrect Number of Gaussians
  • 27. Singularities A minority of the data can have a disproportionate effect on the model likelihood. For example…
  • 29. Singularities When a mixture component collapses on a given point, the mean becomes the point, and the variance goes to zero. Consider the likelihood function as the covariance goes to zero. The likelihood approaches infinity.
  • 30. Relationship to K-means K-means makes hard decisions. Each data point gets assigned to a single cluster. GMM/EM makes soft decisions. Each data point can yield a posterior p(z|x) Soft K-means is a special case of EM.
  • 31. Soft means as GMM/EM Assume equal covariance matrices for every mixture component: Likelihood: Responsibilities: As epsilon approaches zero, the responsibility approaches unity.
  • 32. Soft K-Means as GMM/EM Overall Log likelihood as epsilon approaches zero: The expectation of soft k-means is the intercluster variability Note: only the means are reestimated in Soft K-means. The covariance matrices are all tied.
  • 33. General form of EM Given a joint distribution over observed and latent variables: Want to maximize: Initialize parameters E Step: Evaluate: M-Step: Re-estimate parameters (based on expectation of complete-data log likelihood) Check for convergence of params or likelihood
  • 34. Next Time Homework 4 due… Proof of Expectation Maximization in GMMs Generalized EM – Hidden Markov Models

Editor's Notes

  1. p(x) = pi_0f_0(x) + pi_1f_1(x) + pi_2f_2(x) + ldots + pi_kf_k(x)
  2. frac{partial lnp(x|pi, mu,Sigma) + lambdaleft(sum_{k=1}^K pi_k -1 ight)}{partial pi_k}&=&
  3. p(x|mu_k,Sigma_k) = frac{1}{(2piepsilon)^{M/2}}expleft{-frac{1}{2epsilon}lVert x-mu_k Vert^2 ight}