SlideShare a Scribd company logo
1 of 31
www.edureka.in/data-science
Slide 1
Clustering
www.edureka.in/data-science
Slide 2
Clustering: Scenarios
The following scenarios implement Clustering:
 A telephone company needs to establish its network by putting its towers in a particular region
it has acquired. The location of putting these towers can be found by clustering algorithm so
that all its users receive maximum signal strength.
 Cisco wants to open its new office in California. The management wants to be cordial to its
employees and want their office in a location so that its employees’ commutation is reduced to
minimum.
 The Miami DEA wants to make its law enforcement more stringent and hence have decided to
make their patrol vans stationed across the area so that the areas of high crime rates are in
vicinity to the patrol vans.
 A Hospital Care chain wants to open a series of Emergency-Care wards, keeping in mind the
factor of maximum accident prone areas in a region.
www.edureka.in/data-science
Slide 3
What is Clustering?
Slide 3
Organizing data into clusters such that there is:
 High intra-cluster similarity
 Low inter-cluster similarity
 Informally, finding natural groupings among objects
Why Clustering?
www.edureka.in/data-science
Slide 4
Why Clustering?
Slide 4
 Organizing data into clusters shows internal structure of the data
Ex. Clusty and clustering genes
 Sometimes the partitioning is the goal
Ex. Market segmentation
 Prepare for other AI techniques
Ex. Summarize news (cluster and then find centroid)
 Discovery in data
Ex. Underlying rules, reoccurring patterns, topics, etc.
www.edureka.in/data-science
Slide 5
Clustering algorithms may be classified as:
Exclusive Clustering:
Data is grouped in an exclusive way, so that if a certain
datum belongs to a definite cluster then it could not be
included in another cluster.
E.g. K-means
Overlapping Clustering:
The overlapping clustering, uses fuzzy sets to cluster data,
so that each point may belong to two or more clusters with
different degrees of membership.
E.g. Fuzzy C-means
www.edureka.in/data-science
Slide 6
Hierarchical Clustering:
It is based on the union between the two
nearest clusters. The beginning condition is
realized by setting every datum as a cluster.
There are certain properties which one cluster
receives in hierarchy from another cluster.
Clustering algorithms may be classified as:
www.edureka.in/data-science
Slide 7
“Clustering is in the eye of the beholder."
The most appropriate clustering algorithm for a particular problem often needs to be chosen
experimentally, unless there is a mathematical reason to prefer one cluster model over
another.
It should be noted that an algorithm that is designed for one kind of model has no chance on a
data set that contains a radically different kind of model.
For example, k-means cannot find non-convex clusters.
www.edureka.in/data-science
Similarity/Dissimilarity Measurement
Slide 8
To achieve Clustering, a similarity/dissimilarity
measure must be determined so as to cluster the
data points based either on :
1. Similarity in the data or
2. Dissimilarity in the data
The measure reflects the degree of closeness or
separation of the target objects and should
correspond to the characteristics that are
believed to distinguish the clusters embedded in
the data.
Measurement
Similarity Dissimilarity
www.edureka.in/data-science
Slide 9
Similarity Measurement
Similarity measures the degree to which a pair of
objects are alike.
Concerning structural patterns represented as strings
or sequences of symbols, the concept of pattern
resemblance has typically been viewed from three
main perspectives:
 Similarity as matching, according to which
patterns are seen as different viewpoints,
possible instantiations or noisy versions of the
same object;
 Structural resemblance, based on the similarity of
their composition rules and primitives;
 Content-based similarity.
www.edureka.in/data-science
Dissimilarity Measurement: Distance Measures
Slide 10
Similarity can also be measured in
terms of the placing of data points.
By finding the distance between the
data points , the distance/difference of
the point to the cluster can be found.
Distance
Measures
Euclidean Distance Measure
Manhattan Distance Measure
Cosine Distance Measure
Tanimoto Distance Measure
Squared Euclidean Distance
Measure
www.edureka.in/data-science
Slide 11
Difference between Euclidean and Manhattan
From this image we can say that, The Euclidean distance measure gives 5.65 as the distance
between (2, 2) and (6, 6) whereas the Manhattan distance is 8.0
Slide 11
Mathematically, Euclidean distance between two
n-dimensional vectors
(a1, a2, ... , an) and (b1,b2,...,bn) is:
d = |a1 – b1| + |a2 – b2| + ... + |an – bn|
Manhattan distance between two n-
dimensional vectors
www.edureka.in/data-science
Cosine Distance Measure
The formula for the cosine distance between n-dimensional vectors
(a1, a2, ... , an) and (b1, b2, ...,bn) is
Slide 12
www.edureka.in/data-science
Slide 13
Slide 13
K-Means Clustering
www.edureka.in/data-science
Slide 14
Slide 14
K-Means Clustering
The process by which objects are classified into
a number of groups so that they are as much
dissimilar as possible from one group to another
group, but as much similar as possible within
each group.
The objects in group 1 should be as similar as
possible.
But there should be much difference between an
object in group 1 and group 2.
The attributes of the objects are allowed to
determine which objects should be grouped
together.
Total population
Group 1
Group 2 Group 3
Group 4
www.edureka.in/data-science
Slide 15
Slide 15
Current
Balance
High
High
Medium
Medium
Low
Low
Gross Monthly Income
Example Cluster 1
High Balance
Low Income
Example Cluster 2
High Income
Low Balance
 Cluster 1 and Cluster 2 are being differentiated by Income and Current Balance.
 The objects in Cluster 1 have similar characteristics (High Income and Low balance), on the other hand
the objects in Cluster 2 have the same characteristic (High Balance and Low Income).
 But there are much differences between an object in Cluster 1 and an object in Cluster 2.
Basic concepts of Cluster Analysis using two variables
K-Means Clustering
www.edureka.in/data-science
Slide 16
Process Flow of K-means
Iterate until stable (cluster centers converge):
1. Determine the centroid coordinate.
2. Determine the distance of each object to the
centroids.
3. Group the object based on minimum
distance (find the closest centroid)
Start
Number of
Cluster K
Centroid
Distance objects to
centroids
Grouping based on
minimum distance
End
No object
move
group?
+
www.edureka.in/data-science
Slide 17
K-Means Clustering Use-Case:
Problem Statement:
The newly appointed Governor has finally decided to do something for the society and wants to open
a chain of schools across a particular region, keeping in mind the distance travelled by children is
minimum, so that the percentage turnout is more.
Poor fella cannot decide himself and has asked its Data Science team to come up with the solution.
Bet, these guys have the solution to almost everything!!
www.edureka.in/data-science
Slide 18
Slide 18
K-Means Clustering Steps
1. If k=4, we select 4 random points in
the 2d space and assume them to be
cluster centers for the clusters to be
created.
www.edureka.in/data-science
Slide 19
Slide 19
2. We take up a random data point from
the space and find out its distance from
all the 4 clusters centers.
If the data point is closest to the pink
cluster center, it is colored pink.
K-Means Clustering Steps
www.edureka.in/data-science
Slide 20
Slide 20
3. Now we calculate the centroid of all
the pink points and assign that point
as the cluster center for that cluster.
Similarly, we calculate centroids for all
the 4 colored(clustered) points and
assign the new centroids as the
cluster centers.
K-Means Clustering Steps
www.edureka.in/data-science
Slide 21
Slide 21
4. Step-2 and step-3 are run iteratively, unless the cluster centers converge at a point and no
longer move.
Iteration-1 Iteration-2
K-Means Clustering Steps
www.edureka.in/data-science
Slide 22
Iteration-3 Iteration-4
5. We can see that the cluster centers are still not converged so we go ahead and iterate it more.
K-Means Clustering Steps
www.edureka.in/data-science
Slide 23
Finally, after multiple iterations, we reach a
stage where the cluster centers coverge and
the clusters look like as:
Here we have performed:
Iterations: 5
K-Means Clustering Steps
Slide 24 www.edureka.in/data-science
Q1. In cluster analysis objects are classified into a number of groups so that
1. They are as much dissimilar as possible from one group to another group,
but as much similar as possible within each group.
2. They are as much similar as possible from one group to another group, but
as much dissimilar as possible within each group.
Annie’s Question
Slide 25 www.edureka.in/data-science
Correct Answer.
Option 1: They are as much dissimilar as possible from one group to another
group, but as much similar as possible within each group.
Annie’s Answer
Slide 26 www.edureka.in/data-science
K-Means Mathematical Formulation
Distortion = =
(within cluster sum of squares)



m
i
i
i c
x
1
2
)
(  
 

k
j OwnedBy
i
j
i
j
x
1 )
(
2
)
(


Owned By(.): set of records that belong to the specified cluster center
D={x1,x2,…,xi,…,xm}  data set of m records
xi=(xi1,xi2,…,xin)  each record is an n-dimensional vector
ci =
cluster(xi)=
Slide 27 www.edureka.in/data-science
Goal: Find cluster centers that minimize Distortion
Solution can be found by setting the partial derivative of Distortion w.r.t. each cluster center to zero








)
(
2
)
(
Distortion
j
OwnedBy
i
j
i
j
j
x









)
(
)
(
2
j
OwnedBy
i
j
i
x

 minimum)
(for
0





)
(
|
)
(
|
1
j
OwnedBy
i
i
j
j x
OwnedBy 


K-Means Mathematical Formulation
Slide 28 www.edureka.in/data-science
Will we find the Optimal Solution?
Not necessarily!
Try to come up with a converged solution, but does not have minimum distortion:
We might get stuck in local minimum, and not a global minimum
Slide 29 www.edureka.in/data-science
 Choose first center at random
 Choose second center that is far away from the first center
 … Choose jth center as far away as possible from the closest of centers 1 through
(j-1)
Idea 1: careful about where we start
Idea 2: Do many runs of K-means, each with different random starting point
How to find Optimal Solution?
Slide 30 www.edureka.in/data-science
Choosing the Number of Clusters
Elbow method
Objective
Function
Value
i.e.,
Distortion
K means Clustering

More Related Content

What's hot

K means clustering
K means clusteringK means clustering
K means clusteringkeshav goyal
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clusteringArshad Farhad
 
Introduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersIntroduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersFunctional Imperative
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means ClusteringAnna Fensel
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data MiningValerii Klymchuk
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Simplilearn
 
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...Simplilearn
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmPalin analytics
 
K means clustering
K means clusteringK means clustering
K means clusteringAhmedasbasb
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithmhadifar
 
3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clusteringKrish_ver2
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data MiningValerii Klymchuk
 
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...Simplilearn
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithmparry prabhu
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony
 
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...Edureka!
 

What's hot (20)

K means clustering
K means clusteringK means clustering
K means clustering
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
Introduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersIntroduction to Machine Learning Classifiers
Introduction to Machine Learning Classifiers
 
K-means Clustering
K-means ClusteringK-means Clustering
K-means Clustering
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
 
Clustering
ClusteringClustering
Clustering
 
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning Algorithm
 
K means clustering
K means clusteringK means clustering
K means clustering
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithm
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Hierachical clustering
Hierachical clusteringHierachical clustering
Hierachical clustering
 
3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clustering
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data Mining
 
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 
Kmeans
KmeansKmeans
Kmeans
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
 

Viewers also liked

Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysisguru_prasadg
 
Association Analysis
Association AnalysisAssociation Analysis
Association Analysisguest0edcaf
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationAdnan Masood
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysisDataminingTools Inc
 
phase rule & phase diagram
phase rule & phase diagramphase rule & phase diagram
phase rule & phase diagramYog's Malani
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionAdnan Masood
 
The phase rule
The phase ruleThe phase rule
The phase ruleJatin Garg
 
Bayesian Belief Networks for dummies
Bayesian Belief Networks for dummiesBayesian Belief Networks for dummies
Bayesian Belief Networks for dummiesGilad Barkan
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsPrashanth Guntal
 
Coacervation Phase Separation Techniques
Coacervation Phase Separation TechniquesCoacervation Phase Separation Techniques
Coacervation Phase Separation TechniquesGargi Nanda
 
Clustering training
Clustering trainingClustering training
Clustering trainingGabor Veress
 
Phase Diagrams and Phase Rule
Phase Diagrams and Phase RulePhase Diagrams and Phase Rule
Phase Diagrams and Phase RuleRuchi Pandey
 
K means clustering algorithm
K means clustering algorithmK means clustering algorithm
K means clustering algorithmDarshak Mehta
 

Viewers also liked (18)

Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysis
 
Association Analysis
Association AnalysisAssociation Analysis
Association Analysis
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian Classification
 
Clustering: A Survey
Clustering: A SurveyClustering: A Survey
Clustering: A Survey
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
 
phase rule & phase diagram
phase rule & phase diagramphase rule & phase diagram
phase rule & phase diagram
 
Phase rule
Phase rulePhase rule
Phase rule
 
MOLECULAR DOCKING
MOLECULAR DOCKINGMOLECULAR DOCKING
MOLECULAR DOCKING
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief Introduction
 
The phase rule
The phase ruleThe phase rule
The phase rule
 
Bayesian Belief Networks for dummies
Bayesian Belief Networks for dummiesBayesian Belief Networks for dummies
Bayesian Belief Networks for dummies
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithms
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Coacervation Phase Separation Techniques
Coacervation Phase Separation TechniquesCoacervation Phase Separation Techniques
Coacervation Phase Separation Techniques
 
Clustering training
Clustering trainingClustering training
Clustering training
 
Phase Diagrams and Phase Rule
Phase Diagrams and Phase RulePhase Diagrams and Phase Rule
Phase Diagrams and Phase Rule
 
K means clustering algorithm
K means clustering algorithmK means clustering algorithm
K means clustering algorithm
 

Similar to K means Clustering

International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
Survey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in DataminingSurvey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in DataminingIOSR Journals
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptionsrefedey275
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithmijsrd.com
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithmLaura Petrosanu
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmEditor IJCATR
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringIJCSIS Research Publications
 
Predicting Students Performance using K-Median Clustering
Predicting Students Performance using  K-Median ClusteringPredicting Students Performance using  K-Median Clustering
Predicting Students Performance using K-Median ClusteringIIRindia
 
Mat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports DataMat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports DataKathleneNgo
 
Dynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c meansDynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c meansWrishin Bhattacharya
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmIJERA Editor
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningNatasha Grant
 
Comparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisComparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisIOSR Journals
 
For iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptxFor iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptxSureshPolisetty2
 
Exploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image SegmentationExploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image SegmentationChristopher Peter Makris
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringIJERD Editor
 

Similar to K means Clustering (20)

International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Survey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in DataminingSurvey on Unsupervised Learning in Datamining
Survey on Unsupervised Learning in Datamining
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptions
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithm
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm
 
Premeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means ClusteringPremeditated Initial Points for K-Means Clustering
Premeditated Initial Points for K-Means Clustering
 
Predicting Students Performance using K-Median Clustering
Predicting Students Performance using  K-Median ClusteringPredicting Students Performance using  K-Median Clustering
Predicting Students Performance using K-Median Clustering
 
Visualization of Crisp and Rough Clustering using MATLAB
Visualization of Crisp and Rough Clustering using MATLABVisualization of Crisp and Rough Clustering using MATLAB
Visualization of Crisp and Rough Clustering using MATLAB
 
Mat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports DataMat189: Cluster Analysis with NBA Sports Data
Mat189: Cluster Analysis with NBA Sports Data
 
Chapter 5.pdf
Chapter 5.pdfChapter 5.pdf
Chapter 5.pdf
 
47 292-298
47 292-29847 292-298
47 292-298
 
Dynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c meansDynamic clustering algorithm using fuzzy c means
Dynamic clustering algorithm using fuzzy c means
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering Algorithm
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data Mining
 
Comparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisComparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data Analysis
 
For iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptxFor iiii year students of cse ML-UNIT-V.pptx
For iiii year students of cse ML-UNIT-V.pptx
 
Exploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image SegmentationExploration of Imputation Methods for Missingness in Image Segmentation
Exploration of Imputation Methods for Missingness in Image Segmentation
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes Clustering
 

More from Edureka!

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaEdureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaEdureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaEdureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaEdureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaEdureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaEdureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaEdureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaEdureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaEdureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaEdureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | EdurekaEdureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEdureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEdureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaEdureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaEdureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaEdureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaEdureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaEdureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | EdurekaEdureka!
 

More from Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Recently uploaded

4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationdeepaannamalai16
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptxJonalynLegaspi2
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsPooky Knightsmith
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1GloryAnnCastre1
 

Recently uploaded (20)

4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptx
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 

K means Clustering

  • 2. www.edureka.in/data-science Slide 2 Clustering: Scenarios The following scenarios implement Clustering:  A telephone company needs to establish its network by putting its towers in a particular region it has acquired. The location of putting these towers can be found by clustering algorithm so that all its users receive maximum signal strength.  Cisco wants to open its new office in California. The management wants to be cordial to its employees and want their office in a location so that its employees’ commutation is reduced to minimum.  The Miami DEA wants to make its law enforcement more stringent and hence have decided to make their patrol vans stationed across the area so that the areas of high crime rates are in vicinity to the patrol vans.  A Hospital Care chain wants to open a series of Emergency-Care wards, keeping in mind the factor of maximum accident prone areas in a region.
  • 3. www.edureka.in/data-science Slide 3 What is Clustering? Slide 3 Organizing data into clusters such that there is:  High intra-cluster similarity  Low inter-cluster similarity  Informally, finding natural groupings among objects Why Clustering?
  • 4. www.edureka.in/data-science Slide 4 Why Clustering? Slide 4  Organizing data into clusters shows internal structure of the data Ex. Clusty and clustering genes  Sometimes the partitioning is the goal Ex. Market segmentation  Prepare for other AI techniques Ex. Summarize news (cluster and then find centroid)  Discovery in data Ex. Underlying rules, reoccurring patterns, topics, etc.
  • 5. www.edureka.in/data-science Slide 5 Clustering algorithms may be classified as: Exclusive Clustering: Data is grouped in an exclusive way, so that if a certain datum belongs to a definite cluster then it could not be included in another cluster. E.g. K-means Overlapping Clustering: The overlapping clustering, uses fuzzy sets to cluster data, so that each point may belong to two or more clusters with different degrees of membership. E.g. Fuzzy C-means
  • 6. www.edureka.in/data-science Slide 6 Hierarchical Clustering: It is based on the union between the two nearest clusters. The beginning condition is realized by setting every datum as a cluster. There are certain properties which one cluster receives in hierarchy from another cluster. Clustering algorithms may be classified as:
  • 7. www.edureka.in/data-science Slide 7 “Clustering is in the eye of the beholder." The most appropriate clustering algorithm for a particular problem often needs to be chosen experimentally, unless there is a mathematical reason to prefer one cluster model over another. It should be noted that an algorithm that is designed for one kind of model has no chance on a data set that contains a radically different kind of model. For example, k-means cannot find non-convex clusters.
  • 8. www.edureka.in/data-science Similarity/Dissimilarity Measurement Slide 8 To achieve Clustering, a similarity/dissimilarity measure must be determined so as to cluster the data points based either on : 1. Similarity in the data or 2. Dissimilarity in the data The measure reflects the degree of closeness or separation of the target objects and should correspond to the characteristics that are believed to distinguish the clusters embedded in the data. Measurement Similarity Dissimilarity
  • 9. www.edureka.in/data-science Slide 9 Similarity Measurement Similarity measures the degree to which a pair of objects are alike. Concerning structural patterns represented as strings or sequences of symbols, the concept of pattern resemblance has typically been viewed from three main perspectives:  Similarity as matching, according to which patterns are seen as different viewpoints, possible instantiations or noisy versions of the same object;  Structural resemblance, based on the similarity of their composition rules and primitives;  Content-based similarity.
  • 10. www.edureka.in/data-science Dissimilarity Measurement: Distance Measures Slide 10 Similarity can also be measured in terms of the placing of data points. By finding the distance between the data points , the distance/difference of the point to the cluster can be found. Distance Measures Euclidean Distance Measure Manhattan Distance Measure Cosine Distance Measure Tanimoto Distance Measure Squared Euclidean Distance Measure
  • 11. www.edureka.in/data-science Slide 11 Difference between Euclidean and Manhattan From this image we can say that, The Euclidean distance measure gives 5.65 as the distance between (2, 2) and (6, 6) whereas the Manhattan distance is 8.0 Slide 11 Mathematically, Euclidean distance between two n-dimensional vectors (a1, a2, ... , an) and (b1,b2,...,bn) is: d = |a1 – b1| + |a2 – b2| + ... + |an – bn| Manhattan distance between two n- dimensional vectors
  • 12. www.edureka.in/data-science Cosine Distance Measure The formula for the cosine distance between n-dimensional vectors (a1, a2, ... , an) and (b1, b2, ...,bn) is Slide 12
  • 14. www.edureka.in/data-science Slide 14 Slide 14 K-Means Clustering The process by which objects are classified into a number of groups so that they are as much dissimilar as possible from one group to another group, but as much similar as possible within each group. The objects in group 1 should be as similar as possible. But there should be much difference between an object in group 1 and group 2. The attributes of the objects are allowed to determine which objects should be grouped together. Total population Group 1 Group 2 Group 3 Group 4
  • 15. www.edureka.in/data-science Slide 15 Slide 15 Current Balance High High Medium Medium Low Low Gross Monthly Income Example Cluster 1 High Balance Low Income Example Cluster 2 High Income Low Balance  Cluster 1 and Cluster 2 are being differentiated by Income and Current Balance.  The objects in Cluster 1 have similar characteristics (High Income and Low balance), on the other hand the objects in Cluster 2 have the same characteristic (High Balance and Low Income).  But there are much differences between an object in Cluster 1 and an object in Cluster 2. Basic concepts of Cluster Analysis using two variables K-Means Clustering
  • 16. www.edureka.in/data-science Slide 16 Process Flow of K-means Iterate until stable (cluster centers converge): 1. Determine the centroid coordinate. 2. Determine the distance of each object to the centroids. 3. Group the object based on minimum distance (find the closest centroid) Start Number of Cluster K Centroid Distance objects to centroids Grouping based on minimum distance End No object move group? +
  • 17. www.edureka.in/data-science Slide 17 K-Means Clustering Use-Case: Problem Statement: The newly appointed Governor has finally decided to do something for the society and wants to open a chain of schools across a particular region, keeping in mind the distance travelled by children is minimum, so that the percentage turnout is more. Poor fella cannot decide himself and has asked its Data Science team to come up with the solution. Bet, these guys have the solution to almost everything!!
  • 18. www.edureka.in/data-science Slide 18 Slide 18 K-Means Clustering Steps 1. If k=4, we select 4 random points in the 2d space and assume them to be cluster centers for the clusters to be created.
  • 19. www.edureka.in/data-science Slide 19 Slide 19 2. We take up a random data point from the space and find out its distance from all the 4 clusters centers. If the data point is closest to the pink cluster center, it is colored pink. K-Means Clustering Steps
  • 20. www.edureka.in/data-science Slide 20 Slide 20 3. Now we calculate the centroid of all the pink points and assign that point as the cluster center for that cluster. Similarly, we calculate centroids for all the 4 colored(clustered) points and assign the new centroids as the cluster centers. K-Means Clustering Steps
  • 21. www.edureka.in/data-science Slide 21 Slide 21 4. Step-2 and step-3 are run iteratively, unless the cluster centers converge at a point and no longer move. Iteration-1 Iteration-2 K-Means Clustering Steps
  • 22. www.edureka.in/data-science Slide 22 Iteration-3 Iteration-4 5. We can see that the cluster centers are still not converged so we go ahead and iterate it more. K-Means Clustering Steps
  • 23. www.edureka.in/data-science Slide 23 Finally, after multiple iterations, we reach a stage where the cluster centers coverge and the clusters look like as: Here we have performed: Iterations: 5 K-Means Clustering Steps
  • 24. Slide 24 www.edureka.in/data-science Q1. In cluster analysis objects are classified into a number of groups so that 1. They are as much dissimilar as possible from one group to another group, but as much similar as possible within each group. 2. They are as much similar as possible from one group to another group, but as much dissimilar as possible within each group. Annie’s Question
  • 25. Slide 25 www.edureka.in/data-science Correct Answer. Option 1: They are as much dissimilar as possible from one group to another group, but as much similar as possible within each group. Annie’s Answer
  • 26. Slide 26 www.edureka.in/data-science K-Means Mathematical Formulation Distortion = = (within cluster sum of squares)    m i i i c x 1 2 ) (      k j OwnedBy i j i j x 1 ) ( 2 ) (   Owned By(.): set of records that belong to the specified cluster center D={x1,x2,…,xi,…,xm}  data set of m records xi=(xi1,xi2,…,xin)  each record is an n-dimensional vector ci = cluster(xi)=
  • 27. Slide 27 www.edureka.in/data-science Goal: Find cluster centers that minimize Distortion Solution can be found by setting the partial derivative of Distortion w.r.t. each cluster center to zero         ) ( 2 ) ( Distortion j OwnedBy i j i j j x          ) ( ) ( 2 j OwnedBy i j i x   minimum) (for 0      ) ( | ) ( | 1 j OwnedBy i i j j x OwnedBy    K-Means Mathematical Formulation
  • 28. Slide 28 www.edureka.in/data-science Will we find the Optimal Solution? Not necessarily! Try to come up with a converged solution, but does not have minimum distortion: We might get stuck in local minimum, and not a global minimum
  • 29. Slide 29 www.edureka.in/data-science  Choose first center at random  Choose second center that is far away from the first center  … Choose jth center as far away as possible from the closest of centers 1 through (j-1) Idea 1: careful about where we start Idea 2: Do many runs of K-means, each with different random starting point How to find Optimal Solution?
  • 30. Slide 30 www.edureka.in/data-science Choosing the Number of Clusters Elbow method Objective Function Value i.e., Distortion