IOSR Journal of Computer Engineering (IOSRJCE)
ISSN: 2278-0661 Volume 4, Issue 6 (Sep-Oct. 2012), PP 23-30
www.iosrjournals.org
www.iosrjournals.org 23 | Page
A Comprehensive Overview of Clustering Algorithms in Pattern Recognition
Namratha M¹, Prajwala T R²
¹,²(Dept. of Information Science, PESIT / Visvesvaraya Technological University, India)
Abstract: Machine learning is a branch of artificial intelligence that recognizes complex patterns in order to make intelligent decisions based on input data. In machine learning, pattern recognition assigns input values to a given set of data labels. Based on the learning method used to generate the output, we distinguish supervised and unsupervised learning. Unsupervised learning involves clustering and blind signal separation; supervised learning is also known as classification.
This paper focuses on clustering techniques: K-means clustering and hierarchical clustering, which in turn comprises the agglomerative and divisive techniques. It introduces machine learning, pattern recognition and clustering, and presents the steps involved in each clustering technique along with an example and the necessary formulae. The paper discusses the advantages and disadvantages of each technique and compares K-means with hierarchical clustering. Based on these comparisons we suggest the clustering technique best suited to a given application.
Keywords: Agglomerative, Clustering, Divisive, K-means, Machine learning, Pattern recognition
I. Introduction
Machine learning is the field of research devoted to the study of learning systems. Machine learning refers to changes in systems that perform tasks associated with artificial intelligence, such as recognition, diagnosis, prediction and so on.
In machine learning[1], pattern recognition is the assignment of a label to a given input value.
A pattern is an entity, such as a fingerprint image, a handwritten word or a human face, that could be given a name. Recognition is the act of associating a classification label with such an entity.
Pattern recognition[2] is the science of making inferences based on data. Its objective is to assign an object or event to one of a number of categories based on features derived to emphasize commonalities.
Figure 1: Design cycle for pattern recognition
Pattern recognition involves three types of learning:
1. Unsupervised learning
2. Supervised learning
3. Semisupervised learning
In unsupervised learning[2], also known as cluster analysis, the basic task is to develop
classification labels: the goal is to arrive at some grouping of the data. The training set consists of unlabeled data.
Two types of unsupervised learning are:
a. Clustering
b. Blind signal separation
In supervised learning[2], the classes are predetermined and are seen as a finite set of data. A
certain segment of the data is labeled with these classifications. The task is to search for patterns and
construct mathematical models. The training set consists of labeled data.
Two types of supervised learning are:
a. Classification
b. Ensemble learning
Semisupervised learning deals with methods for exploiting unlabeled data and labeled data
automatically to improve learning performance without human intervention.
Four types of semisupervised learning are:
a. Deep learning
b. Low density separation
c. Graph based methods
d. Heuristic approach
Clustering is a form of unsupervised learning which involves the task of finding groups of objects
which are similar to one another and different from the objects in another group. The goal is to minimize
intracluster distances and maximize intercluster distances[3].
Figure 2: Graphical representation of clustering
II. K-Means Clustering
K-means is one of the simplest unsupervised learning algorithms; it generates a specified
number of disjoint, non-hierarchical clusters based on attributes[4]. It is a numerical, non-deterministic, iterative method for finding the clusters. The purpose is to classify the data.
Steps in K-means clustering:
Step 1: Choose K initial centroids m1, …, mK, represented in the space in which the objects are being clustered.
Step 2: Assign each object to the group with the closest centroid[5].
Step 3: Recalculate the positions of the K centroids after all objects have been assigned; C(i) denotes the cluster number of the ith observation[5].
Step 4: Reiterate steps 2 and 3 until the centroids no longer move. The result is K clusters whose
intracluster distance is minimized and intercluster distance is maximized[5].
The assignment step, the centroid update and the within-cluster objective are:

C(i) = arg min_{1 ≤ k ≤ K} ‖x_i − m_k‖²,  i = 1, …, N

m_k = (1 / N_k) Σ_{i : C(i) = k} x_i,  k = 1, …, K

W(C) = (1/2) Σ_{k=1}^{K} (1 / N_k) Σ_{C(i) = k} Σ_{C(j) = k} ‖x_i − x_j‖² = Σ_{k=1}^{K} Σ_{C(i) = k} ‖x_i − m_k‖²

where m_k is the mean vector of the kth cluster and N_k is the number of observations in the kth cluster.
The choice of the initial centroids can greatly affect the final clusters in terms of intracluster
distance, intercluster distance and cohesion.
In the final clustering, the sum of squared distances between each object and its corresponding cluster centroid is minimized.
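The steps above can be sketched in a few lines of Python. This is a minimal illustration (not code from the paper), using squared Euclidean distance for the assignment step; `kmeans` is a hypothetical helper name.

```python
# Minimal K-means sketch: repeat the assignment and update steps
# until the centroids stop moving (Step 4).
def kmeans(points, centroids):
    while True:
        # Step 2: assign each point to the group with the closest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            k = min(range(len(centroids)),
                    key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            clusters[k].append(p)
        # Step 3: recompute each centroid m_k as the mean of its cluster
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[k]
            for k, cl in enumerate(clusters)]
        # Step 4: stop when no centroid changed
        if new_centroids == centroids:
            return centroids, clusters
        centroids = new_centroids
```

For instance, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` converges to centroids (0, 0.5) and (10, 10.5).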
Figure 3: Flowchart to represent steps in K-means clustering
Advantages:
1. K-means is computationally fast.
2. It is a simple and understandable unsupervised learning algorithm.
Disadvantages:
1. The initial clusters are difficult to identify.
2. The value of K is difficult to predict because the number of clusters is fixed at the beginning.
3. The final cluster pattern depends on the initial centroids.
Example:
Problem: Cluster the 5 points (2,3), (4,6), (7,3), (1,2), (8,6) with K = 2.
Solution:
The initial centroids are (4,6) and (2,3).
Iteration 1:

Point   d to (4,6)   d to (2,3)   Cluster
(2,3)       5            0           2
(1,2)       7            2           2
(4,6)       0            5           1
(8,6)       4            9           1
(7,3)       6            5           2

The cluster column assigns each point to its nearest centroid; note that the distances shown are city-block (Manhattan) distances[6]. The new centroids are (6,6) and (10/3, 8/3).
Iteration 2:
Repeat the assignment and update steps with the new centroids. Every point stays in its current cluster, so the values converge and we do not proceed to the next iteration. The final clusters are:
Cluster 1: (4,6) and (8,6); Cluster 2: (2,3), (1,2) and (7,3).
Applications[7]:
 Used in segmentation and retrieval of grey-level images[6].
 Applied to spatial and temporal datasets in the field of geostatistics.
 Used to analyze the listed enterprises in financial organizations[4].
 Also used in the fields of astronomy and agriculture.
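The iteration of the worked example above can be checked with a short script; since the distances in its table are city-block (Manhattan) distances, that metric is assumed here rather than the squared Euclidean distance of the earlier formulas.

```python
# Re-running iteration 1 of the worked example with city-block distance.
points = [(2, 3), (4, 6), (7, 3), (1, 2), (8, 6)]
centroids = [(4, 6), (2, 3)]

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

# Assignment step: index 0 = cluster 1 of the table, index 1 = cluster 2
labels = [min((0, 1), key=lambda k: manhattan(p, centroids[k])) for p in points]

# Update step: the new centroids come out as (6, 6) and (10/3, 8/3)
for k in (0, 1):
    members = [p for p, lab in zip(points, labels) if lab == k]
    cx = sum(p[0] for p in members) / len(members)
    cy = sum(p[1] for p in members) / len(members)
    print(k, (cx, cy))
```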
III. Hierarchical Clustering
It is an unsupervised learning technique that outputs a hierarchical structure and does not require the number of clusters to be prespecified. It is a deterministic algorithm[3].
There are two kinds of hierarchical clustering:
1. Agglomerative clustering
2. Divisive clustering
Agglomerative clustering:
It is a bottom-up approach that starts with n singleton clusters and merges them successively, so that each cluster contains subclusters, which in turn contain subclusters, and so on[9].
Steps in agglomerative clustering:
Step 1: Assign each data point to its own singleton cluster.
Step 2: Iteratively merge the two closest clusters, using the Euclidean distance[8]
d(a, b) = sqrt((x2 − x1)² + (y2 − y1)²)
where a(x1, y1) and b(x2, y2) are the coordinates of the clusters. The mean distance between clusters Di and Dj is dmean(Di, Dj) = ‖x̄i − x̄j‖, where x̄i and x̄j are the means of clusters i and j respectively.
Step 3: Repeat until a single cluster is obtained.
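The steps above can be sketched as follows (an illustrative helper, not the paper's code); here a cluster's centroid is the mean of all its member points, which can differ slightly from the midpoint-of-centroids convention a hand calculation might use.

```python
import math

# Centroid-linkage agglomerative clustering: start from singletons and
# repeatedly merge the two clusters whose centroids are closest (Euclidean),
# until a single cluster remains. Returns the sequence of merges.
def agglomerate(points):
    clusters = [[p] for p in points]          # Step 1: singleton clusters
    merges = []
    while len(clusters) > 1:                  # Step 3: until one cluster
        def centroid(c):
            return (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
        # Step 2: find the closest pair of cluster centroids
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], round(d, 2)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

On the five points of the example below, the first two merges are {A, D} (distance ≈ 1.41) and {C, E} (distance ≈ 3.16).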
Figure 4: Flowchart for steps in agglomerative clustering
Advantages:
1. It ranks the objects, which makes the data easier to display.
2. Small clusters are obtained, which are easier to analyze and understand.
3. The number of clusters is not fixed at the beginning, so the user has the flexibility of choosing clusters
dynamically.
Disadvantages:
1. If objects are grouped incorrectly at the initial stages, they cannot be relocated at later stages.
2. The results vary based on the distance metrics used.
Example:
Problem: Cluster the 5 points A(2,3), B(4,6), C(7,3), D(1,2), E(8,6).
Solution:
Iteration 1:
Calculate the Euclidean distances between the points. The distances from A are:
A(2,3) to B(4,6) = sqrt(13) ≈ 3.61
A(2,3) to C(7,3) = sqrt(25) = 5
A(2,3) to D(1,2) = sqrt(2) ≈ 1.41
A(2,3) to E(8,6) = sqrt(45) ≈ 6.71
The two closest clusters are A(2,3) and D(1,2). Merge these two clusters; the new centroid is F(1.5, 2.5).
Iteration 2:
Repeat the above step and merge the closest remaining clusters. These are C(7,3) and E(8,6), at distance sqrt(10) ≈ 3.16. Merge these two clusters; the new centroid is G(7.5, 4.5).
Iteration 3:
Repeat the above step. The two closest clusters are now B(4,6) and G(7.5, 4.5). Merge these two clusters; the new centroid is H(5.75, 5.25).
Iteration 4:
Merge the two remaining clusters, F(1.5, 2.5) and H(5.75, 5.25). Finally we get the resultant
single cluster R.
Figure 5: Diagrammatic representation of agglomerative clustering for the above example
Applications:
1. Used in search engine query logs for knowledge discovery.
2. Used in image classification systems to merge logically adjacent pixel values.
3. Used in automatic document classification.
4. Used in web document categorization.
Divisive clustering:
It is a top-down clustering method which works in a similar way to agglomerative clustering
but in the opposite direction. This method starts with a single cluster containing all objects and then
successively splits resulting clusters until only clusters of individual objects remain[10].
Steps in divisive clustering:
Step 1: Start with a single cluster containing all the objects.
Step 2: Iteratively divide a cluster into two smaller clusters, separating the objects that are farthest apart in Euclidean distance[8].
The Euclidean distance is d(a, b) = sqrt((x2 − x1)² + (y2 − y1)²), where a(x1, y1) and b(x2, y2) are the coordinates of the objects.
Step 3: Repeat the process until the desired number of clusters is obtained (ultimately singleton clusters) and the Euclidean distances remain constant, yielding the final dendrogram.
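The text does not fix a particular splitting rule. One common way to realize Step 2 is to seed each split with the two most distant members of the cluster and assign the remaining points to the nearest seed; the sketch below assumes that rule (`divisive` is a hypothetical helper, not from the paper).

```python
import math

# Divisive sketch: start from one cluster holding every point; at each step
# split a cluster using its two most distant members as seeds, assigning the
# other points to the nearest seed, until only singletons remain.
def divisive(points):
    splits = []
    work = [list(points)]
    while work:
        cluster = work.pop()
        if len(cluster) < 2:
            continue
        # seeds = the most distant pair within this cluster
        seeds = max(((a, b) for i, a in enumerate(cluster) for b in cluster[i + 1:]),
                    key=lambda ab: math.dist(*ab))
        left = [p for p in cluster
                if math.dist(p, seeds[0]) <= math.dist(p, seeds[1])]
        right = [p for p in cluster if p not in left]
        splits.append((left, right))
        work += [left, right]
    return splits
```

On the five points of the example below, the first split separates {A, D} from {B, C, E}, matching the dendrogram read from the top down.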
Figure 6: Flowchart for steps in divisive clustering
Advantages:
1. It focuses on the upper levels of the dendrogram.
2. The split decisions have access to all the data, hence the best possible solution can be obtained.
Disadvantages:
1. Computational difficulties arise while splitting the clusters.
2. The results vary based on the distance metrics used.
Example:
Problem: Cluster the 5 points A(2,3), B(4,6), C(7,3), D(1,2), E(8,6).
Solution:
Iteration 1: Calculate the Euclidean distances between every pair of points:

            B(4,6)     C(7,3)     D(1,2)     E(8,6)
A(2,3)      sqrt(13)   sqrt(25)   sqrt(2)    sqrt(45)
B(4,6)                 sqrt(18)   sqrt(25)   sqrt(16)
C(7,3)                            sqrt(37)   sqrt(10)
D(1,2)                                       sqrt(65)

Since sqrt(2) is the least Euclidean distance, the points A(2,3) and D(1,2) are grouped together; the new centroid is F(1.5, 2.5).
Iteration 2:
Repeat the above step with the least remaining Euclidean distance. The closest clusters are C(7,3) and E(8,6); the new centroid is G(7.5, 4.5).
Iteration 3:
Repeat the above step. The closest clusters are B(4,6) and G(7.5, 4.5); grouping them completes the structure. Read from the top down, the dendrogram first splits {A, D} from {B, C, E}, then splits off B, and finally separates C from E.
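The distance table above can be recomputed directly; this short check (not from the paper) prints the squared distances whose square roots appear in the table.

```python
import math

# Squared Euclidean distances between the five labelled points.
pts = {'A': (2, 3), 'B': (4, 6), 'C': (7, 3), 'D': (1, 2), 'E': (8, 6)}
sq = {}
for a in pts:
    for b in pts:
        if a < b:
            dx, dy = pts[a][0] - pts[b][0], pts[a][1] - pts[b][1]
            sq[a + b] = dx * dx + dy * dy
            print(f"{a}-{b}: sqrt({sq[a + b]}) = {math.sqrt(sq[a + b]):.2f}")
# the smallest entry is A-D (sqrt(2)), so A and D are grouped first
```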
Figure 7: The resulting dendrogram for the above example[11].
Applications:
1. Used in medical imaging for PET scans.
2. Used on the World Wide Web in social network analysis and slippy map optimization.
3. Used in market research for grouping shopping items.
4. Used in crime analysis to find hot spots where crime has occurred.
5. Also used in mathematical chemistry and petroleum geology.
Agglomerative versus divisive clustering:
Table 1: Comparison of hierarchical clustering techniques

Agglomerative                                      Divisive
1. Bottom-up approach                              Top-down approach
2. Faster to compute                               Slower to compute
3. More blind to the global structure of the data  Less blind to the global structure of the data
4. The best possible merge is made at each step    The best possible split is made at each step
5. Has access only to individual objects           Has access to all the data
IV. K-Means versus Hierarchical Clustering
Table 2: Comparison of clustering techniques

Hierarchical                                               K-means
Sequential partitioning process                            Iterative partitioning process
Results in a nested cluster structure                      Results in a flat, mutually exclusive structure
Membership of an object in a cluster is fixed              Membership of an object in a cluster can change constantly
Prior knowledge of the number of clusters is not needed    Prior knowledge of the number of clusters is needed in advance
Generic clustering technique irrespective of data types    Data are summarized by representative entities
Run time is slower                                         Run time is faster than hierarchical
Requires only a similarity measure                         Requires stronger assumptions, such as the number of clusters and the initial centers
V. Conclusion
This paper discussed clustering techniques along with illustrative examples. By comparing
the advantages and disadvantages of each technique, we listed the applications where each could be used. When sequential partitioning is required and time is not a constraint, hierarchical clustering can be used. Conversely, when prior knowledge of the number of clusters is available and a mutually exclusive structure is acceptable as training data, K-means clustering is preferred. Each of the techniques described in
this paper has its own advantages and disadvantages; to overcome these disadvantages, optimization
techniques can be used for better performance.
References
[1] Tom Mitchell, Machine Learning, McGraw Hill, 1997.
[2] http://www.springer.com/computer/image+processing/book/978-0-387-31073-2
[3] A. K. Jain, M. N. Murty and P. J. Flynn, "Data Clustering: A Review", Michigan State University, Indian Institute of Science and The Ohio State University.
[4] http://books.ithunder.org/NLP/%E6%90%9C%E7%B4%A2%E8%B5%84%E6%96%99/%E6%96%87%E6%9C%AC%E8%81%9A%E7%B1%BB/k-means/kmeans11.pdf
[5] http://gecco.org.chemie.uni-frankfurt.de/hkmeans/H-k-means.pdf
[6] http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html#macqueen
[7] http://delivery.acm.org/10.1145/2350000/2345414/p106-mishra.pdf
[8] http://www.cs.princeton.edu/courses/archive/spr07/cos424/papers/bishop-regression.pdf
[9] http://delivery.acm.org/10.1145/2010000/2003657/p34-spiegel.pdf
[10] http://www.frontiersinai.com/ecai/ecai2004/ecai04/pdf/p0435.pdf
[11] http://delivery.acm.org/10.1145/950000/944973/3-1265-dhillon.pdf
 
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATION
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATIONMAGNETIC RESONANCE BRAIN IMAGE SEGMENTATION
MAGNETIC RESONANCE BRAIN IMAGE SEGMENTATION
 
Chapter 10.1,2,3.pptx
Chapter 10.1,2,3.pptxChapter 10.1,2,3.pptx
Chapter 10.1,2,3.pptx
 
47 292-298
47 292-29847 292-298
47 292-298
 
Performance Evaluation of Basic Segmented Algorithms for Brain Tumor Detection
Performance Evaluation of Basic Segmented Algorithms for Brain Tumor DetectionPerformance Evaluation of Basic Segmented Algorithms for Brain Tumor Detection
Performance Evaluation of Basic Segmented Algorithms for Brain Tumor Detection
 
Performance Evaluation of Basic Segmented Algorithms for Brain Tumor Detection
Performance Evaluation of Basic Segmented Algorithms for Brain Tumor DetectionPerformance Evaluation of Basic Segmented Algorithms for Brain Tumor Detection
Performance Evaluation of Basic Segmented Algorithms for Brain Tumor Detection
 
Data Clusterng
Data ClusterngData Clusterng
Data Clusterng
 
A0310112
A0310112A0310112
A0310112
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clustering
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clustering
 
Predicting Students Performance using K-Median Clustering
Predicting Students Performance using  K-Median ClusteringPredicting Students Performance using  K-Median Clustering
Predicting Students Performance using K-Median Clustering
 
Unsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and AssumptionsUnsupervised learning Algorithms and Assumptions
Unsupervised learning Algorithms and Assumptions
 
Unsupervised Learning.pptx
Unsupervised Learning.pptxUnsupervised Learning.pptx
Unsupervised Learning.pptx
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering Algorithm
 
B colouring
B colouringB colouring
B colouring
 

More from IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Recently uploaded (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

A pattern is an entity, such as a fingerprint image, a handwritten word, or a human face, that can be given a name. Recognition is the act of associating a classification with a label. Pattern recognition[2] is the science of making inferences based on data. Its objective is to assign an object or event to one of a number of categories, based on features derived to emphasize commonalities.

Figure 1: Design cycle for pattern recognition

Pattern recognition involves three types of learning:
1. Unsupervised learning
2. Supervised learning
3. Semisupervised learning

In unsupervised learning[2], also known as cluster analysis, the basic task is to develop classification labels, i.e., to arrive at some grouping of the data. The training set consists of unlabeled data. Two types of unsupervised learning are:
a. Clustering
b. Blind signal separation

In supervised learning[2], the classes are predetermined and are seen as a finite set of data. A certain segment of the data is labeled with these classifications. The task is to search for patterns and construct mathematical models. The training set consists of labeled data.
Two types of supervised learning are:
a. Classification
b. Ensemble learning

Semisupervised learning deals with methods for exploiting unlabeled data together with labeled data to improve learning performance automatically, without human intervention. Four types of semisupervised learning are:
a. Deep learning
b. Low density separation
c. Graph based methods
d. Heuristic approach

Clustering is a form of unsupervised learning which involves the task of finding groups of objects which are similar to one another and different from the objects in other groups. The goal is to minimize intracluster distances and maximize intercluster distances[3].

Figure 2: Graphical representation of clustering

II. K-Means Clustering
K-means is one of the simplest unsupervised learning algorithms. It generates a specified number of disjoint, non-hierarchical clusters based on attributes[4]. It is a numerical, non-deterministic and iterative method of finding clusters. Its purpose is to classify data.

Steps in K-means clustering:
Step 1: Choose K initial centroids m_1, ..., m_K, represented in the same space as the objects x_1, ..., x_N being clustered.
Step 2: Assign each object to the group with the closest centroid[5]:
C(i) = argmin_{1 <= k <= K} ||x_i - m_k||^2,  i = 1, ..., N
where C(i) denotes the cluster number of the ith observation.
Step 3: Recalculate the positions of the K centroids after all objects have been assigned:
m_k = (1/N_k) * sum_{i : C(i) = k} x_i,  k = 1, ..., K
where m_k is the mean vector of the kth cluster.
Step 4: Repeat steps 2 and 3 until no centroid moves. The result is K clusters in which the intracluster distance is minimized and the intercluster distance is maximized[5]; the within-cluster scatter being minimized is
W(C) = (1/2) sum_{k=1}^{K} sum_{C(i)=k} sum_{C(j)=k} ||x_i - x_j||^2 = sum_{k=1}^{K} N_k sum_{C(i)=k} ||x_i - m_k||^2
N_k is the number of observations in the kth cluster. The choice of initial clusters can greatly affect the final clusters in terms of intracluster distance, intercluster distance and cohesion. In the final clustering, the sum of squared distances between each object and its cluster centroid is minimized.

Figure 3: Flowchart to represent steps in K-means clustering

Advantages:
1. K-means is computationally fast.
2. It is a simple and understandable unsupervised learning algorithm.

Disadvantages:
1. It is difficult to identify the initial clusters.
2. Predicting the value of K is difficult because the number of clusters is fixed at the beginning.
3. The final cluster pattern depends on the initial pattern.

Example:
Problem: Find the clusters of the 5 points (2,3), (4,6), (7,3), (1,2), (8,6).
Solution: The initial centroids are (4,6) and (2,3).
Iteration 1:

Point   Distance to (4,6)   Distance to (2,3)   Cluster
(2,3)          5                   0                2
(1,2)          7                   2                2
(4,6)          0                   5                1
(8,6)          4                   9                1
(7,3)          6                   5                2

The cluster column is obtained by assigning each point to its nearest centroid (the tabulated values are city-block distances)[6]. The new centroids are (6,6) and (10/3, 8/3).
Iteration 2: Repeat the above steps with the new centroids. Since the values converge, we do not proceed to the next iteration. Hence the final clusters are:
Cluster 1: (4,6) and (8,6)
Cluster 2: (2,3), (1,2) and (7,3)

Applications[7]:
- Used in segmentation and retrieval of grey-level images[6].
- Applied to spatial and temporal datasets in the field of geostatistics.
- Used to analyze listed enterprises in financial organizations[4].
- Also used in the fields of astronomy and agriculture.
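The four K-means steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; it uses the standard squared Euclidean objective, so assignments may differ slightly from the city-block distances tabulated in the worked example.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's K-means: assign each point to the nearest centroid
    (squared Euclidean distance), then recompute centroids, until
    the centroids stop moving."""
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            # squared Euclidean distance to each current centroid
            d = [(p[0] - c[0])**2 + (p[1] - c[1])**2 for c in centroids]
            clusters[d.index(min(d))].append(p)
        # recompute each centroid as the mean of its assigned points
        new = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centroids[k]
            for k, cl in enumerate(clusters)
        ]
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return centroids, clusters

# The paper's 5-point example with initial centroids (4,6) and (2,3)
points = [(2, 3), (4, 6), (7, 3), (1, 2), (8, 6)]
centroids, clusters = kmeans(points, [(4, 6), (2, 3)])
print(centroids, clusters)
```

Note that with the Euclidean objective, (7,3) lands in the first cluster, whereas the city-block distances of the worked table place it in the second; the choice of metric changes the result.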
III. Hierarchical Clustering
Hierarchical clustering is an unsupervised learning technique that outputs a hierarchical structure and does not require the number of clusters to be prespecified. It is a deterministic algorithm[3]. There are two kinds of hierarchical clustering:
1. Agglomerative clustering
2. Divisive clustering

Agglomerative clustering:
It is a bottom-up approach that starts with n singleton clusters; each resulting cluster has subclusters, which in turn have subclusters, and so on[9].

Steps in agglomerative clustering:
Step 1: Assign each data point to its own singleton group.
Step 2: Merge the two closest groups, and repeat this step iteratively. The Euclidean distance is calculated as[8]
d(a,b) = sqrt((x1 - x2)^2 + (y1 - y2)^2)
where a(x1,y1) and b(x2,y2) are the coordinates of the clusters. The mean distance is
dmean(Di,Dj) = ||xi - xj||
where Di and Dj are clusters i and j, and xi and xj are their means.
Step 3: Repeat until a single cluster is obtained.

Figure 4: Flowchart for steps in agglomerative clustering

Advantages:
1. It ranks the objects for easier data display.
2. Small clusters are obtained, which are easier to analyze and understand.
3. The number of clusters is not fixed at the beginning, so the user has the flexibility of choosing clusters dynamically.

Disadvantages:
1. If objects are grouped incorrectly at an early stage, they cannot be relocated at later stages.
2. The results vary based on the distance metric used.

Example:
Problem: Find the clusters of the 5 points A(2,3), B(4,6), C(7,3), D(1,2), E(8,6).
Solution:
Iteration 1: Calculate the Euclidean distance between every pair of points, for example:
A(2,3) and D(1,2) = sqrt(2) ≈ 1.41
A(2,3) and B(4,6) = sqrt(13) ≈ 3.61
A(2,3) and C(7,3) = sqrt(25) = 5
A(2,3) and E(8,6) = sqrt(45) ≈ 6.71
The two closest clusters are A(2,3) and D(1,2). Merge these two clusters; the new centroid is F(1.5,2.5).
Iteration 2: Repeat the step and merge the closest pair, now C(7,3) and E(8,6) at distance sqrt(10) ≈ 3.16. The new centroid is G(7.5,4.5).
Iteration 3: The closest pair is now B(4,6) and G(7.5,4.5) at distance sqrt(14.5) ≈ 3.81. Merge these two clusters; the new centroid is H(5.75,5.25).
Iteration 4: Merge the two remaining clusters F(1.5,2.5) and H(5.75,5.25). Finally we get the resultant single cluster R.

Figure 5: Diagrammatic representation of agglomerative clustering for the above example

Applications:
1. Used in search engine query logs for knowledge discovery.
2. Used in image classification systems to merge logically adjacent pixel values.
3. Used in automatic document classification.
4. Used in web document categorization.

Divisive clustering:
It is a top-down clustering method which works similarly to agglomerative clustering but in the opposite direction. It starts with a single cluster containing all objects and successively splits the resulting clusters until only clusters of individual objects remain[10].

Steps in divisive clustering:
Step 1: Initially consider a single cluster containing all objects.
Step 2: Iteratively divide the clusters into smaller clusters based on the Euclidean distance.
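The agglomerative procedure can be sketched as follows. This is a minimal illustration under two assumptions: clusters are represented by their centroids, and merging two clusters replaces them by the midpoint of their centroids, as in the worked example.

```python
import math

def agglomerate(points):
    """Centroid-linkage agglomerative clustering: repeatedly merge the
    two closest clusters, replacing them by the midpoint of their
    centroids, until a single cluster remains. Returns the merge history
    as (cluster_a, cluster_b, merged_centroid) tuples."""
    clusters = list(points)
    merges = []
    while len(clusters) > 1:
        # find the pair of clusters at the smallest Euclidean distance
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: math.dist(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        merged = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append((a, b, merged))
    return merges

# The paper's 5-point example
history = agglomerate([(2, 3), (4, 6), (7, 3), (1, 2), (8, 6)])
for a, b, m in history:
    print(a, b, "->", m)
```

With n initial points the loop performs n − 1 merges, so the history doubles as a simple dendrogram description.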
Objects with the least Euclidean distance are grouped into a single cluster[8].
The Euclidean distance is d(a,b) = sqrt((x1 - x2)^2 + (y1 - y2)^2), where a(x1,y1) and b(x2,y2) are the coordinates of the clusters.
Step 3: Repeat the process until the desired number of clusters is obtained and the Euclidean distance remains constant, yielding the final dendrogram.

Figure 6: Flowchart for steps in divisive clustering

Advantages:
1. It focuses on the upper levels of the dendrogram.
2. All the data is available at each split, so the best possible solution is obtained.

Disadvantages:
1. Computational difficulties arise while splitting the clusters.
2. The results vary based on the distance metric used.

Example:
Problem: Find the clusters of the 5 points A(2,3), B(4,6), C(7,3), D(1,2), E(8,6).
Solution:
Iteration 1: Calculate the Euclidean distance between each pair of points:

         A(2,3)    B(4,6)    C(7,3)    D(1,2)    E(8,6)
A(2,3)     -       sqrt(13)  sqrt(25)  sqrt(2)   sqrt(45)
B(4,6)               -       sqrt(18)  sqrt(25)  sqrt(16)
C(7,3)                         -       sqrt(37)  sqrt(10)
D(1,2)                                   -       sqrt(65)
E(8,6)                                             -

Since sqrt(2) is the least Euclidean distance, group the points A(2,3) and D(1,2). The new centroid is F(1.5,2.5).
Iteration 2: Repeat the step and group the pair with the least Euclidean distance, here C(7,3) and E(8,6). The new centroid is G(7.5,4.5).
Iteration 3: Repeat the step again; the closest pair is B(4,6) and G(7.5,4.5). Group these two clusters.

Figure 7: The resulting dendrogram for the above example[11].

Applications:
1. Used in medical imaging for PET scans.
2. Used on the World Wide Web in social network analysis and slippy map optimization.
3. Used in market research for grouping shopping items.
4. Used in crime analysis to find hot spots where crime has occurred.
5. Also used in mathematical chemistry and petroleum geology.

Agglomerative versus divisive clustering:
Table 1: Comparison of hierarchical clustering techniques

   Agglomerative                                     Divisive
1. Bottom-up approach                                Top-down approach
2. Faster to compute                                 Slower to compute
3. More blind to the global structure of the data    Less blind to the global structure of the data
4. The best possible merge is chosen at each step    The best possible split is chosen at each step
5. Works upward from individual objects              Starts with access to all the data

IV.
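The pairwise distance table above can be reproduced with a short script. This is a sketch: it prints the squared distances that appear under the square roots and picks out the closest pair, which drives the first grouping step.

```python
import itertools
import math

points = {'A': (2, 3), 'B': (4, 6), 'C': (7, 3), 'D': (1, 2), 'E': (8, 6)}

# squared Euclidean distance between every pair of labelled points
sq = {
    (p, q): (points[p][0] - points[q][0])**2 + (points[p][1] - points[q][1])**2
    for p, q in itertools.combinations(points, 2)
}
for (p, q), d in sq.items():
    print(f"{p}-{q}: sqrt({d}) = {math.sqrt(d):.2f}")

# the pair at minimum distance is grouped first
closest = min(sq, key=sq.get)
print("closest pair:", closest)
```

Running this confirms the table's entries (e.g. A-B is sqrt(13), D-E is sqrt(65)) and that A and D are grouped first.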
K-Means versus Hierarchical Clustering
Table 2: Comparison of clustering techniques

Hierarchical                                            K-means
Sequential partitioning process                         Iterative partitioning process
Results in a nested cluster structure                   Results in a flat, mutually exclusive structure
Membership of an object in a cluster is fixed           Membership of an object in a cluster can change between iterations
Prior knowledge of the number of clusters not needed    The number of clusters must be known in advance
Generic technique irrespective of the data types        Data are summarized by representative entities (centroids)
Slower run time                                         Faster run time than hierarchical
Requires only a similarity measure                      Requires stronger assumptions, such as the number of clusters and the initial centers
V. Conclusion
This paper discussed clustering techniques along with illustrative examples. By comparing the advantages and disadvantages of each technique, we listed the applications where each could be used. Whenever sequential partitioning is required and time is not a constraint, hierarchical clustering can be used. Conversely, when prior knowledge of the number of clusters is available and a mutually exclusive structure is used as training data, we use K-means clustering. Each of the techniques described in this paper has its own advantages and disadvantages; to overcome these disadvantages, optimization techniques can be used for better performance.

References
[1] Tom Mitchell, Machine Learning, McGraw Hill, 1997.
[2] http://www.springer.com/computer/image+processing/book/978-0-387-31073-2
[3] A. K. Jain (Michigan State University), M. N. Murty (Indian Institute of Science) and P. J. Flynn (The Ohio State University), Data Clustering: A Review.
[4] http://books.ithunder.org/NLP/%E6%90%9C%E7%B4%A2%E8%B5%84%E6%96%99/%E6%96%87%E6%9C%AC%E8%81%9A%E7%B1%BB/k-means/kmeans11.pdf
[5] http://gecco.org.chemie.uni-frankfurt.de/hkmeans/H-k-means.pdf
[6] http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html#macqueen
[7] http://delivery.acm.org/10.1145/2350000/2345414/p106-mishra.pdf
[8] http://www.cs.princeton.edu/courses/archive/spr07/cos424/papers/bishop-regression.pdf
[9] http://delivery.acm.org/10.1145/2010000/2003657/p34-spiegel.pdf
[10] http://www.frontiersinai.com/ecai/ecai2004/ecai04/pdf/p0435.pdf
[11] http://delivery.acm.org/10.1145/950000/944973/3-1265-dhillon.pdf