Spectral Clustering

      Zitao Liu
Agenda
•   Brief Clustering Review
•   Similarity Graph
•   Graph Laplacian
•   Spectral Clustering Algorithm
•   Graph Cut Point of View
•   Random Walk Point of View
•   Perturbation Theory Point of View
•   Practical Details
CLUSTERING REVIEW
K-MEANS CLUSTERING
• Description
   Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional
   real vector, k-means clustering aims to partition the n observations into k sets
   (k ≤ n) S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS):

       $\underset{S}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$

   where μi is the mean of points in Si.
• Standard Algorithm
   1) k initial "means" (in this case k = 3) are randomly selected from the data set.
   2) k clusters are created by associating every observation with the nearest mean.
   3) The centroid of each of the k clusters becomes the new mean.
   4) Steps 2 and 3 are repeated until convergence has been reached.
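
As a supplementary illustration (not part of the original slides), a minimal numpy sketch of the Lloyd iteration just described; the function name `kmeans` and its defaults are assumptions of this sketch:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm on data X of shape (n, d): returns labels, means."""
    rng = np.random.default_rng(seed)
    # 1) k initial "means" are randomly selected from the data set
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2) create k clusters by assigning every observation to the nearest mean
        labels = np.argmin(((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1), axis=1)
        # 3) the centroid of each of the k clusters becomes the new mean
        new_mu = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                           else mu[j] for j in range(k)])
        # 4) repeat steps 2 and 3 until convergence
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return labels, mu
```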
GENERAL

  (Figure: a point set and its similarity graph. Disconnected graph components
  correspond to groups of points: weak connections between components, strong
  connections within components.)
GRAPH NOTATION
  (The notation slides are figures; reconstructed from the cited tutorial.) Similarity graph $G = (V, E)$ with weighted adjacency matrix $W = (w_{ij})_{i,j=1,\dots,n}$; degree $d_i = \sum_j w_{ij}$ and degree matrix $D = \mathrm{diag}(d_1, \dots, d_n)$; for $A \subseteq V$, $|A|$ is its number of vertices, and $\mathrm{vol}(A) = \sum_{i \in A} d_i$ measures the size of A by summing over the weights of all edges attached to vertices in A.
SIMILARITY GRAPH

  (Figures: the ε-neighborhood graph, the k-nearest-neighbor graph, and the
  fully connected graph with Gaussian similarity, built from the same point set.)

 All the above graphs are regularly used in spectral clustering!
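
The deck shows these constructions only as pictures; here is a hedged numpy/scipy sketch of how they might be built, with the parameter names `eps`, `k`, and `sigma` chosen for illustration:

```python
import numpy as np
from scipy.spatial.distance import cdist

def similarity_graphs(X, eps=0.5, k=10, sigma=1.0):
    """Three common similarity graphs on a point set X of shape (n, d)."""
    dist = cdist(X, X)                          # pairwise Euclidean distances
    # epsilon-neighborhood graph: connect points closer than eps (unweighted)
    W_eps = (dist < eps).astype(float)
    np.fill_diagonal(W_eps, 0.0)
    # k-nearest-neighbor graph, symmetrized: connect i and j if either one
    # is among the other's k nearest neighbors
    knn = np.argsort(dist, axis=1)[:, 1:k + 1]  # skip self at position 0
    W_knn = np.zeros_like(dist)
    rows = np.repeat(np.arange(len(X)), k)
    W_knn[rows, knn.ravel()] = 1.0
    W_knn = np.maximum(W_knn, W_knn.T)
    # fully connected graph with Gaussian similarity
    W_full = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W_full, 0.0)
    return W_eps, W_knn, W_full
```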
GRAPH LAPLACIANS
• Unnormalized Graph Laplacian

                         L = D - W

  Key properties (Proposition 1, reconstructed from the cited tutorial): for every $f \in \mathbb{R}^n$,
      $f^T L f = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (f_i - f_j)^2$;
  L is symmetric and positive semi-definite; its smallest eigenvalue is 0, with the constant one vector $\mathbb{1}$ as eigenvector; and L has n non-negative, real-valued eigenvalues $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$.

  Proposition 2: the multiplicity k of the eigenvalue 0 of L equals the number of connected components $A_1, \dots, A_k$ of the graph, and the eigenspace of 0 is spanned by the indicator vectors $\mathbb{1}_{A_1}, \dots, \mathbb{1}_{A_k}$.

Proof: each connected component contributes one zero eigenvalue, with the constant one vector on that component as eigenvector (see the editor's notes).
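
As a quick numeric check of these properties (my own toy example, not from the slides): a graph with two connected components should give L an eigenvalue 0 of multiplicity 2, and the quadratic-form identity should hold for any f.

```python
import numpy as np

# Toy graph: a triangle {0, 1, 2} plus a disconnected edge {3, 4}.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W

eigvals, eigvecs = np.linalg.eigh(L)          # L is symmetric PSD
print(np.round(eigvals, 6))                   # two (near-)zero eigenvalues
# f^T L f equals 0.5 * sum_ij w_ij (f_i - f_j)^2 for any f
f = np.random.default_rng(0).normal(size=5)
lhs = f @ L @ f
rhs = 0.5 * sum(W[i, j] * (f[i] - f[j]) ** 2 for i in range(5) for j in range(5))
print(np.isclose(lhs, rhs))                   # True
```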
GRAPH LAPLACIANS
• Normalized Graph Laplacian

  There are two normalized graph Laplacians (reconstructed from the cited tutorial):
      $L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$
      $L_{rw} = D^{-1} L = I - D^{-1} W$

  Proposition 4: the multiplicity k of the eigenvalue 0 of both $L_{rw}$ and $L_{sym}$ equals the number of connected components $A_1, \dots, A_k$ of the graph.

  Proof. The proof is analogous to the one of Proposition 2, using Proposition 3.
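
A minimal sketch of both normalized Laplacians, assuming every vertex has positive degree:

```python
import numpy as np

def normalized_laplacians(W):
    """Both normalized Laplacians from a weighted adjacency matrix W."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    D_inv = np.diag(1.0 / d)
    L = np.diag(d) - W
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt   # symmetric normalization
    L_rw = D_inv @ L                      # random-walk normalization
    return L_sym, L_rw
```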
ALGORITHM

• Unnormalized Spectral Clustering (built on L = D - W; steps reconstructed from the cited tutorial)
  1. Construct a similarity graph; let W be its weighted adjacency matrix.
  2. Compute the unnormalized Laplacian L = D - W.
  3. Compute the first k eigenvectors $u_1, \dots, u_k$ of L and stack them as columns of $U \in \mathbb{R}^{n \times k}$.
  4. Cluster the rows of U with k-means into clusters $C_1, \dots, C_k$.

ALGORITHM
• Normalized Graph Laplacian variants (reconstructed from the cited references)
  - Shi and Malik (2000): in step 3, use the first k generalized eigenvectors of $Lu = \lambda Du$ (equivalently, eigenvectors of $L_{rw}$).
  - Ng, Jordan, and Weiss (2001): use the first k eigenvectors of $L_{sym}$ and normalize the rows of U to unit length before k-means.

ALGORITHM
(Figures: clustering results on example data sets.)

        It really works!!!
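
Putting the pieces together, a compact sketch of the Shi-Malik style pipeline; the function name, the Gaussian similarity choice, and the parameters are assumptions of this sketch, not the slides' own code:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2
from scipy.spatial.distance import cdist

def spectral_clustering(X, k, sigma=1.0, normalized=True):
    """Gaussian similarity graph -> Laplacian -> first k eigenvectors ->
    k-means on the rows of the eigenvector matrix U."""
    W = np.exp(-cdist(X, X) ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    L = np.diag(d) - W
    if normalized:
        # Shi-Malik: generalized eigenproblem L u = lambda D u
        _, U = eigh(L, np.diag(d))
    else:
        _, U = eigh(L)
    U = U[:, :k]                        # first k eigenvectors as columns
    _, labels = kmeans2(U, k, minit='++')
    return labels
```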
GRAPH CUT

  (Figure, as before: disconnected graph components correspond to groups of
  points, with weak connections between components and strong connections
  within components.)

GRAPH CUT
  For a partition of V into sets $A_1, \dots, A_k$, define
      $\mathrm{cut}(A_1, \dots, A_k) = \frac{1}{2} \sum_{i=1}^{k} W(A_i, \bar{A_i})$, where $W(A, B) = \sum_{i \in A, j \in B} w_{ij}$.
  (The factor 1/2 is introduced for notational consistency, so that each edge is not counted twice; see the editor's notes.) The mincut approach simply chooses the partition minimizing this objective.
GRAPH CUT
Problems!!!
• Sensitive to outliers: the mincut criterion often separates just one individual vertex from the rest of the graph.

  (Figures: what we get vs. what we want.)
GRAPH CUT
Solutions
• RatioCut (Hagen and Kahng, 1992):
      $\mathrm{RatioCut}(A_1, \dots, A_k) = \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, \bar{A_i})}{|A_i|}$

• Ncut (Shi and Malik, 2000):
      $\mathrm{Ncut}(A_1, \dots, A_k) = \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, \bar{A_i})}{\mathrm{vol}(A_i)}$

  Both objectives balance the cut by the size of the clusters: |A| counts vertices, vol(A) sums their degrees.
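
A hedged sketch of these three objectives on a weighted adjacency matrix; the helper names are mine:

```python
import numpy as np

def cut_value(W, A):
    """cut(A, A-bar): total weight of edges leaving the index set A."""
    mask = np.zeros(len(W), dtype=bool)
    mask[list(A)] = True
    return W[mask][:, ~mask].sum()

def ratio_cut(W, partition):
    # sum_i cut(A_i, A_i-bar) / |A_i|
    return sum(cut_value(W, A) / len(A) for A in partition)

def ncut(W, partition):
    # sum_i cut(A_i, A_i-bar) / vol(A_i)
    d = W.sum(axis=1)
    return sum(cut_value(W, A) / d[list(A)].sum() for A in partition)
```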
GRAPH CUT
Problem!!!
• NP hard

Solution!!!
• Approximation: spectral clustering solves relaxed versions of these problems (see the editor's notes):

      Ncut      --relaxing-->  Normalized Spectral Clustering
      RatioCut  --relaxing-->  Unnormalized Spectral Clustering
GRAPH CUT
• Approximation RatioCut for k=2
    Our goal is to solve the optimization problem:
        $\min_{A \subset V} \mathrm{RatioCut}(A, \bar{A})$

    Rewrite the problem in a more convenient form: given $A \subset V$, define $f = (f_1, \dots, f_n)^T \in \mathbb{R}^n$ by
        $f_i = \sqrt{|\bar{A}|/|A|}$ if $v_i \in A$, and $f_i = -\sqrt{|A|/|\bar{A}|}$ if $v_i \in \bar{A}$.

                     Magic happens!!!
GRAPH CUT
• Approximation RatioCut for k=2
    With this choice of f, a direct computation gives
        $f^T L f = \frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2 = |V| \cdot \mathrm{RatioCut}(A, \bar{A})$.
GRAPH CUT
• Approximation RatioCut for k=2
    Additionally, we have
        $\sum_{i} f_i = 0$, i.e. $f \perp \mathbb{1}$,

    and f satisfies
        $\lVert f \rVert^2 = \sum_i f_i^2 = n$.
GRAPH CUT
• Approximation RatioCut for k=2
    The problem of minimizing RatioCut is therefore equivalent to
        $\min_{A \subset V} f^T L f$ subject to $f \perp \mathbb{1}$, $\lVert f \rVert = \sqrt{n}$, with $f_i$ restricted to the two discrete values above.

                             Relaxation !!!

    Drop the discreteness condition and allow arbitrary $f \in \mathbb{R}^n$:
        $\min_{f \in \mathbb{R}^n} f^T L f$ subject to $f \perp \mathbb{1}$, $\lVert f \rVert = \sqrt{n}$.

                              Rayleigh-Ritz Theorem
GRAPH CUT
• Approximation RatioCut for k=2

                     By the Rayleigh-Ritz theorem, the solution f of the relaxed
                     problem is the eigenvector corresponding to the second
                     smallest eigenvalue of L.

    To re-convert the real-valued f into a discrete partition, use the sign as indicator function:
        $v_i \in A$ if $f_i \ge 0$, and $v_i \in \bar{A}$ if $f_i < 0$.

    This simple sign heuristic only works for k = 2; clustering the coordinates $f_i$ with k-means is more general and works for any k.
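
A sketch of the k = 2 recipe just described (threshold the second-smallest eigenvector of L at zero):

```python
import numpy as np

def ratiocut_bipartition(W):
    """Relaxed RatioCut for k = 2: threshold the second eigenvector of L."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    _, vecs = np.linalg.eigh(L)   # columns sorted by ascending eigenvalue
    f = vecs[:, 1]                # eigenvector of the second smallest eigenvalue
    return f >= 0                 # sign used as the cluster indicator function
```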
GRAPH CUT
• Approximation RatioCut for arbitrary k
    Encode a partition $A_1, \dots, A_k$ by the indicator matrix $H \in \mathbb{R}^{n \times k}$ with
        $h_{ij} = 1/\sqrt{|A_j|}$ if $v_i \in A_j$, and $h_{ij} = 0$ otherwise        (i=1,…,n; j=1,…,k).

    The columns of H are orthonormal to each other, i.e. $H^T H = I$.
GRAPH CUT
• Approximation RatioCut for arbitrary k
    Problem reformulation: one can check that $h_j^T L h_j = \mathrm{cut}(A_j, \bar{A_j}) / |A_j|$, so $\mathrm{RatioCut}(A_1, \dots, A_k) = \mathrm{Tr}(H^T L H)$ and the problem becomes
        $\min_{A_1, \dots, A_k} \mathrm{Tr}(H^T L H)$ subject to $H^T H = I$, H of the discrete form above.

                                             Relaxation !!!

    Allow H to be an arbitrary matrix with orthonormal columns:
        $\min_{H \in \mathbb{R}^{n \times k}} \mathrm{Tr}(H^T L H)$ subject to $H^T H = I$.

                                           Rayleigh-Ritz Theorem

                 The optimal H has the first k eigenvectors of L as
                 columns. As before, the real-valued solution is re-converted to a
                 discrete partition, typically by running k-means on the rows of H.
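
A small numeric check (my own toy example) that $\mathrm{Tr}(H^T L H)$ equals RatioCut for a discrete indicator matrix H:

```python
import numpy as np

# Toy graph: a triangle {0, 1, 2} plus an edge {3, 4}.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W
parts = [[0, 1], [2, 3, 4]]                   # a deliberately imperfect split

H = np.zeros((5, 2))
for j, A in enumerate(parts):                 # h_ij = 1/sqrt(|A_j|) on A_j
    H[A, j] = 1.0 / np.sqrt(len(A))

mask = np.zeros(5, dtype=bool)
mask[parts[0]] = True
cut = W[np.ix_(mask, ~mask)].sum()            # for k = 2 both parts share this cut
ratiocut = cut / len(parts[0]) + cut / len(parts[1])
print(np.allclose(H.T @ H, np.eye(2)))                    # H^T H = I
print(np.isclose(np.trace(H.T @ L @ H), ratiocut))        # True
```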
GRAPH CUT
• Approximation Ncut for k=2
    Our goal is to solve the optimization problem:
        $\min_{A \subset V} \mathrm{Ncut}(A, \bar{A})$

    Rewrite the problem in a more convenient form: define $f \in \mathbb{R}^n$ by
        $f_i = \sqrt{\mathrm{vol}(\bar{A})/\mathrm{vol}(A)}$ if $v_i \in A$, and $f_i = -\sqrt{\mathrm{vol}(A)/\mathrm{vol}(\bar{A})}$ if $v_i \in \bar{A}$.

    Similar to above, one can check that:
        $(Df)^T \mathbb{1} = 0$, $\quad f^T D f = \mathrm{vol}(V)$, $\quad f^T L f = \mathrm{vol}(V) \cdot \mathrm{Ncut}(A, \bar{A})$.
GRAPH CUT
• Approximation Ncut for k=2
    Thus the problem is equivalent to
        $\min_{A \subset V} f^T L f$ subject to $(Df) \perp \mathbb{1}$, $f^T D f = \mathrm{vol}(V)$, with f discrete as above.    (6)

                               Relaxation !!!

    Drop the discreteness condition and substitute $g := D^{1/2} f$:
        $\min_{g \in \mathbb{R}^n} g^T L_{sym} g$ subject to $g \perp D^{1/2} \mathbb{1}$, $\lVert g \rVert^2 = \mathrm{vol}(V)$.

                                                Rayleigh-Ritz Theorem!!!

    The solution g is the second eigenvector of $L_{sym}$; equivalently, $f = D^{-1/2} g$ is the second eigenvector of $L_{rw}$, i.e. the second generalized eigenvector of $Lu = \lambda Du$.
GRAPH CUT
• Approximation Ncut for arbitrary k
    Problem reformulation: define $H \in \mathbb{R}^{n \times k}$ by $h_{ij} = 1/\sqrt{\mathrm{vol}(A_j)}$ if $v_i \in A_j$ and 0 otherwise; then $H^T D H = I$ and $\mathrm{Ncut}(A_1, \dots, A_k) = \mathrm{Tr}(H^T L H)$, so we solve
        $\min_{A_1, \dots, A_k} \mathrm{Tr}(H^T L H)$ subject to $H^T D H = I$, H of the discrete form above.

                                  Relaxation !!!

    Substitute $T := D^{1/2} H$ and allow T to be an arbitrary matrix:
        $\min_{T \in \mathbb{R}^{n \times k}} \mathrm{Tr}(T^T L_{sym} T)$ subject to $T^T T = I$.

                                 Rayleigh-Ritz Theorem

    The optimal T has the first k eigenvectors of $L_{sym}$ as columns; re-substituting $H = D^{-1/2} T$, H consists of the first k generalized eigenvectors of $Lu = \lambda Du$. Again the rows of the solution are re-converted to a discrete partition with k-means, which yields the normalized spectral clustering algorithm of Shi and Malik (2000).
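
A sketch of the relaxed Ncut solution via the generalized eigenproblem $Lu = \lambda Du$, on the same toy graph:

```python
import numpy as np
from scipy.linalg import eigh

# Toy graph: a triangle {0, 1, 2} plus an edge {3, 4}.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
d = W.sum(axis=1)
vals, H = eigh(np.diag(d) - W, np.diag(d))  # ascending generalized eigenvalues
print(np.round(vals, 6))                    # two zeros: two connected components
H_k = H[:, :2]                              # rows of H_k feed the k-means step
```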
RANDOM WALK

• A random walk on a graph is a stochastic process which
  randomly jumps from vertex to vertex.

• A random walk stays long within the same cluster and only
  seldom jumps between clusters.

• A balanced partition with a low cut will also have the property
  that the random walk does not have many opportunities to
  jump between clusters.
RANDOM WALK
  The transition matrix of the random walk is $P = D^{-1} W$: the probability of jumping from $v_i$ to $v_j$ is $p_{ij} = w_{ij}/d_i$.

RANDOM WALK
  Note that $L_{rw} = I - P$, so $\lambda$ is an eigenvalue of $L_{rw}$ with eigenvector u if and only if $1 - \lambda$ is an eigenvalue of P with eigenvector u. For a connected, non-bipartite graph the walk has a unique stationary distribution $\pi$ with $\pi_i = d_i / \mathrm{vol}(V)$.
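
A short sketch of P and its link to $L_{rw}$ on the toy graph used in the earlier sketches:

```python
import numpy as np

# Toy graph: a triangle {0, 1, 2} plus an edge {3, 4}.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
d = W.sum(axis=1)
P = W / d[:, None]                       # p_ij = w_ij / d_i
L_rw = np.eye(5) - P                     # L_rw = I - P
print(np.allclose(P.sum(axis=1), 1.0))   # each row of P sums to 1
```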
RANDOM WALK
• Random walks and Ncut
    Proposition 5 (reconstructed): let the random walk $(X_t)$ start in its stationary distribution $\pi$. Then
        $\mathrm{Ncut}(A, \bar{A}) = P(X_1 \in \bar{A} \mid X_0 \in A) + P(X_1 \in A \mid X_0 \in \bar{A})$.
RANDOM WALK
• Random walks and Ncut

Proof. First of all observe that
    $P(X_0 \in A, X_1 \in B) = \sum_{i \in A, j \in B} \pi_i p_{ij} = \frac{1}{\mathrm{vol}(V)} \sum_{i \in A, j \in B} w_{ij}$.

Using this we obtain
    $P(X_1 \in B \mid X_0 \in A) = \frac{P(X_0 \in A, X_1 \in B)}{P(X_0 \in A)} = \frac{\sum_{i \in A, j \in B} w_{ij}}{\mathrm{vol}(A)}$.

 Now the proposition follows directly with the definition of Ncut.
RANDOM WALK
• Random walks and Ncut: when minimizing Ncut, we look for a cut through the graph such that the random walk seldom transitions from A to $\bar{A}$ and vice versa.
RANDOM WALK
• What is commute distance
    The hitting time $h_{ij}$ is the expected time it takes the random walk to travel from vertex $v_i$ to vertex $v_j$; the commute distance (commute time) is $c_{ij} = h_{ij} + h_{ji}$.

  Points which are connected by a short path in the graph and lie in the same
  high-density region of the graph are considered closer to each other than
  points which are connected by a short path but lie in different high-density
  regions of the graph.

                    Well-suited for clustering!
RANDOM WALK

• How to calculate commute distance
  The key tool is the generalized inverse (also called pseudo-inverse or Moore-Penrose inverse) $L^{\dagger}$ of the graph Laplacian L.
RANDOM WALK

• How to calculate commute distance
    Proposition 6 (reconstructed):
        $c_{ij} = \mathrm{vol}(V) \, (l^{\dagger}_{ii} - 2 l^{\dagger}_{ij} + l^{\dagger}_{jj}) = \mathrm{vol}(V) \, (e_i - e_j)^T L^{\dagger} (e_i - e_j)$.

   This result was published by Klein and Randić (1993), where it was
   proved by methods of electrical network theory.
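
A sketch of Proposition 6 in numpy, on a small connected toy graph of my own (the formula is only meaningful on a connected graph):

```python
import numpy as np

# Connected toy graph: a triangle {0, 1, 2} with vertex 3 attached to 2.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = W.sum(axis=1)
L = np.diag(d) - W
L_pinv = np.linalg.pinv(L)               # Moore-Penrose pseudo-inverse of L
vol = d.sum()
e = np.eye(len(W))
commute = lambda i, j: vol * (e[i] - e[j]) @ L_pinv @ (e[i] - e[j])
print(commute(0, 1), commute(0, 3))      # vertex 3 is farther in commute time
```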
RANDOM WALK
• Proposition 6's consequence
    The commute distance can be seen as a Euclidean distance: mapping vertex $v_i$ to the point $z_i = (L^{\dagger})^{1/2} e_i$ constructs an embedding with $\lVert z_i - z_j \rVert^2 = c_{ij} / \mathrm{vol}(V)$.
RANDOM WALK
• There is a loose relation between spectral clustering and commute distance: the embedding used in unnormalized spectral clustering is related to the commute time embedding, but not identical (see the editor's notes).
PERTURBATION THEORY
• Perturbation theory studies the question of how eigenvalues
  and eigenvectors of a matrix A change if we add a small
  perturbation H.
PERTURBATION THEORY

  (The slide states the Davis-Kahan theorem; roughly, the distance between the
  subspace spanned by the first k eigenvectors of L and that of a perturbed
  matrix $\tilde{L} = L + H$ is bounded by $\lVert H \rVert / \delta$, where
  $\delta = |\lambda_k - \lambda_{k+1}|$ is the eigengap.)

Below we will see that the size of the eigengap can also be used in a different context
as a quality criterion for spectral clustering, namely when choosing the number k of
clusters to construct.
PRACTICAL DETAILS
• Constructing the similarity graph: the choice of similarity function, the type of graph (ε-neighborhood, k-nearest neighbor, fully connected), and its parameters (ε, k, σ) all matter in practice.
PRACTICAL DETAILS

• Computing Eigenvectors
     1.   How to compute the first eigenvectors efficiently for a large L: exploit sparsity and use iterative eigensolvers.
     2.   Numerical eigensolvers only converge to some orthonormal basis of the
          eigenspace, so with repeated eigenvalues the individual eigenvectors are not unique.

• Number of Clusters: a common tool is the eigengap heuristic, illustrated below.
PRACTICAL DETAILS

  (Figures: eigenvalue plots for three data sets whose clusters are well
  separated, more blurry, and overlap so much; the eigengap after the k-th
  eigenvalue shrinks as the clusters blur.)

The eigengap heuristic usually works well if the data contains very well pronounced
clusters, but in ambiguous cases it also returns ambiguous results.
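
A sketch of the eigengap heuristic (the function name is mine): choose k where the gap between consecutive eigenvalues of the Laplacian is largest.

```python
import numpy as np

def choose_k_by_eigengap(L, k_max=10):
    """Pick k such that lambda_1, ..., lambda_k are small and the gap
    lambda_{k+1} - lambda_k is comparatively large."""
    vals = np.linalg.eigvalsh(L)[:k_max]   # smallest k_max eigenvalues, ascending
    gaps = np.diff(vals)
    return int(np.argmax(gaps)) + 1
```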
PRACTICAL DETAILS
• The k-means step
      It is not essential; people also use other techniques to turn the
      real-valued eigenvector representation into a discrete partition.

• Which graph Laplacian should be used?
PRACTICAL DETAILS
• Which graph Laplacian should be used?
       Why is normalized better than unnormalized spectral clustering?
   Objective 1: minimize the between-cluster similarity, i.e. minimize
       $\mathrm{cut}(A_1, \dots, A_k)$.

   Both RatioCut and Ncut directly implement this objective.

   Objective 2: maximize the within-cluster similarity, i.e. maximize
       $W(A_i, A_i) = \mathrm{vol}(A_i) - \mathrm{cut}(A_i, \bar{A_i})$.

   Only Ncut implements this objective, through its vol-based balancing term.

 Normalized spectral clustering implements both clustering objectives mentioned above,
 while unnormalized spectral clustering only implements the first objective.
PRACTICAL DETAILS

• Which graph Laplacian should be used? Between the two normalized Laplacians, $L_{rw}$ is preferable (reconstructed from the cited tutorial): its eigenvectors are cluster indicator vectors directly, while the eigenvectors of $L_{sym}$ are additionally multiplied by $D^{1/2}$, which can lead to undesired artifacts.
REFERENCES
•   Ulrike von Luxburg. A Tutorial on Spectral Clustering. Max Planck Institute for
    Biological Cybernetics, Technical Report No. TR-149.
•   Marina Meila and Jianbo Shi. A random walks view of spectral segmentation. In
    Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS),
    2001.
•   A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an
    algorithm. Advances in Neural Information Processing Systems (NIPS) 14, 2001.
•   A. Azran and Z. Ghahramani. A New Approach to Data Driven Clustering. In
    International Conference on Machine Learning (ICML), 2006.
•   A. Azran and Z. Ghahramani. Spectral Methods for Automatic Multiscale Data
    Clustering. In IEEE Conference on Computer Vision and Pattern Recognition
    (CVPR), 2006.
•   Arik Azran. A Tutorial on Spectral Clustering.
    http://videolectures.net/mlcued08_azran_mcl/
SPECTRAL CLUSTERING

            Thank you


Editor's Notes

  1. In this presentation, I will mainly talk about how to find a good partitioning of a given graph using spectral properties of that graph. In mathematics, the spectrum of a (finite-dimensional) matrix is the set of its eigenvalues.
  2. In virtually every scientific field dealing with empirical data, people attempt to get a first impression on their data by trying to identify groups of “similar behavior” in their data. Connectivity models: hierarchical clustering; Centroid models: k-means; Distribution models: Mixture of Gaussian.
  3. In this presentation, I will mainly talk about how to find a good partitioning of a given graph using spectral properties of that graph. In mathematics, the spectrum of a (finite-dimensional) matrix is the set of its eigenvalues.
  4. vol(A) measures the size of A by summing over the weights of all edges attached to vertices in A.
  5. 𝜀-neighborhood graph: the distances between all connected points are roughly of the same scale.
  6. As each Li is a graph Laplacian of a connected graph, we know that every Li has eigenvalue 0 with multiplicity 1, and the corresponding eigenvector is the constant one vector on the i-th connected component. Thus, the matrix L has as many eigenvalues 0 as there are connected components, and the corresponding eigenvectors are the indicator vectors of the connected components.
  7. Due to the properties of the graph Laplacians that this change of representation is useful. We will see in the next sections that this change of representation enhances the cluster-properties in the data, so that clusters can be trivially detected in the new representation.
  8. 10-nearest neighbor graph: We can see that the first four eigenvalues are 0, and the corresponding eigenvectors are cluster indicator vectors. The reason is that the clusters form disconnected parts in the 10-nearest neighbor graph, in which case the eigenvectors are given as in Propositions 2 and 4. Fully connected graph: this graph only consists of one connected component. Thus, eigenvalue 0 has multiplicity 1, and the first eigenvector is the constant vector; threshold the second eigenvector at 0. The first four eigenvectors carry all the information about the four clusters.
  9. The results are surprisingly good: Even for clusters that do not form convex regions or that are not cleanly separated, the algorithm reliably finds clusterings consistent with what a human would have chosen.
  10. This problem can be restated as follows: we want to find a partition of the graph such that the edges between different groups have a very low weight (which means that points in different clusters are dissimilar from each other) and the edges within a group have high weight (which means that points within the same cluster are similar to each other).
  11. we introduce the factor 1/2 for notational consistency, otherwise we would count each edge twice in the cut.
  12. Spectral clustering is a way to solve relaxed versions of those problems.
  13. Magic happens!!!
  14. Again we need to re-convert the real valued solution matrix to a discrete partition. As above, the standard way is to use the k-means algorithms on the rows of U.
  15. It is well known that many properties of a graph can be expressed in terms of the corresponding random walk transition matrix P
  16. The embedding used in unnormalized spectral clustering is related to the commute time embedding, but not identical.