Prof. Pier Luca Lanzi
Clustering: Hierarchical
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
What is Hierarchical Clustering?
•  Suppose we have five items, a, b, c, d, and e.
•  Initially, we consider one cluster for each item
•  Then, at each step we merge together the most similar clusters,
until we generate one cluster
[Figure: agglomerative merging over five steps. Step 0: a, b, c, d, e; Step 1: {a,b}; Step 2: {d,e}; Step 3: {c,d,e}; Step 4: {a,b,c,d,e}]
What is Hierarchical Clustering?
•  Alternatively, we start from one cluster containing the five
elements
•  Then, at each step we split one cluster to improve intracluster
similarity, until each element is in a cluster of its own
[Figure: the same hierarchy read in reverse, from {a,b,c,d,e} at Step 0 down to the single items a, b, c, d, e at Step 4, passing through {a,b}, {c,d,e}, and {d,e}]
What is Hierarchical Clustering?
•  By far, it is the most common clustering technique
•  Produces a hierarchy of nested clusters
•  The hierarchy can be visualized as a dendrogram: a tree-like diagram
that records the sequence of merges or splits
[Figure: dendrogram over a, b, c, d, e with internal nodes {a,b}, {d,e}, {c,d,e}, and {a,b,c,d,e}]
What Approaches?
•  Agglomerative
§ Start with each point as its own cluster; at each step, merge the closest pair of clusters
until only one cluster (or k clusters) is left
•  Divisive
§ Start with one all-inclusive cluster; at each step, split a cluster until each cluster
contains a single point (or there are k clusters)
[Figure: the hierarchy over a, b, c, d, e; the agglomerative approach builds it bottom-up, the divisive approach top-down]
Strengths of Hierarchical Clustering
•  No need to assume any particular number of clusters
•  Any desired number of clusters can be obtained by ‘cutting’ the
dendrogram at the proper level
•  The clusters may correspond to meaningful taxonomies
•  Examples in the biological sciences (e.g., animal kingdom, phylogeny
reconstruction, …)
•  Traditional hierarchical algorithms use a similarity 
or distance matrix to merge or split one cluster at a time
Agglomerative Clustering Algorithm
•  The more popular hierarchical clustering technique
•  Compute the proximity matrix
•  Let each data point be a cluster
•  Repeat
§ Merge the two closest clusters
§ Update the proximity matrix
•  Until only a single cluster remains
•  Key operation is the computation of the proximity of two clusters
•  Different approaches to defining the distance between clusters
distinguish the different algorithms
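The loop above can be sketched in a few lines of R. This is only an illustration, not the implementation behind `hclust`: the helper name `naive_agglomerative` is made up for this sketch, single link is assumed as the cluster distance, and the five points are arbitrary.

```r
# Naive agglomerative clustering (single link) on a distance matrix.
# `naive_agglomerative` is a hypothetical helper written for this sketch.
naive_agglomerative <- function(d) {
  D <- as.matrix(d)
  diag(D) <- Inf                          # never merge a cluster with itself
  clusters <- as.list(seq_len(nrow(D)))   # let each data point be a cluster
  merges <- list()
  while (length(clusters) > 1) {
    # merge the two closest clusters
    pos <- which(D == min(D), arr.ind = TRUE)[1, ]
    i <- min(pos); j <- max(pos)
    merged <- sort(c(clusters[[i]], clusters[[j]]))
    merges[[length(merges) + 1]] <- list(members = merged, height = D[i, j])
    # update the proximity matrix (single link keeps the smaller distance)
    updated <- pmin(D[i, ], D[j, ])
    D[i, ] <- updated
    D[, i] <- updated
    D[i, i] <- Inf
    D <- D[-j, -j, drop = FALSE]
    clusters[[i]] <- merged
    clusters[[j]] <- NULL
  }
  merges
}

x <- c(0, 1, 5, 7, 20)                    # five arbitrary points on a line
merges <- naive_agglomerative(dist(x))
heights <- sapply(merges, function(m) m$height)
print(heights)
print(lapply(merges, function(m) m$members))
```

The full matrix is rescanned at every step, which is where the cubic running time comes from. Replacing `pmin` with `pmax` in the update would give complete link instead.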
Hierarchical Clustering: 
Time and Space Requirements
•  O(N²) space since it uses the proximity matrix.
§ N is the number of points.
•  O(N³) time in many cases
§ There are N steps, and at each step the proximity matrix,
of size N², must be updated and searched
§ Complexity can be reduced to O(N² log N)
time for some approaches
Efficient Implementation
•  Compute the distance between all pairs of points [O(N²)]
•  Insert the pairs and their distances into a priority queue to find the minimum in one
step [O(N²)]
•  When two clusters are merged, remove all entries in the priority queue
involving one of these two clusters [O(N log N)]
•  Compute all the distances between the new cluster and the remaining
clusters [O(N log N)]
•  Since the last two steps are executed at most N times, the complexity of the
whole algorithm is O(N² log N)
Hierarchical Clustering in R
# init the seed to be able to repeat the experiment
set.seed(1234)
par(mar=c(0,0,0,0))
# randomly generates the data
x <- rnorm(12, mean=rep(1:3,each=4), sd=0.2)
y <- rnorm(12, mean=rep(c(1,2,1),each=4), sd=0.2)
plot(x,y,pch=19,cex=2,col="blue")
# distance matrix
d <- data.frame(x,y)
dm <- dist(d)
# generate the hierarchical clustering (hclust defaults to complete linkage)
cl <- hclust(dm)
plot(cl)
# other ways to plot dendrograms
# http://rstudio-pubs-static.s3.amazonaws.com/1876_df0bf890dd54461f98719b461d987c3d.html
Hierarchical clustering generates
a set of N possible partitions.
Which one should I choose?
From the previous lecture we know that, ideally,
a good clustering should partition points so that:
•  Data points in the same cluster are at
a small distance from one another
•  Data points in different clusters are at
a large distance from one another
Within/Between Clusters Sum of Squares
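•  Within-cluster sum of squares

$$ WSS = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 $$

where μi is the centroid of cluster Ci (in case of Euclidean spaces)
•  Between-cluster sum of squares

$$ BSS = \sum_{i=1}^{K} |C_i| \, \lVert \mu_i - \mu \rVert^2 $$

where μ is the centroid of the whole dataset, K is the number of clusters, and |Ci| is the number of points in Ci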
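A small worked example may help fix the two quantities. The sketch below assumes the usual Euclidean definitions, with the between-cluster term weighted by cluster size. The helper names `wss` and `bss` are made up for the illustration, and the four one-dimensional points are arbitrary.

```r
# wss() and bss() are hypothetical helper names used only for this sketch.
wss <- function(X, cl) {
  X <- as.matrix(X)
  sum(sapply(split(seq_len(nrow(X)), cl), function(idx) {
    Xi <- X[idx, , drop = FALSE]
    sum(sweep(Xi, 2, colMeans(Xi))^2)   # squared distances to the cluster centroid
  }))
}

bss <- function(X, cl) {
  X <- as.matrix(X)
  mu <- colMeans(X)                     # centroid of the whole dataset
  sum(sapply(split(seq_len(nrow(X)), cl), function(idx) {
    mui <- colMeans(X[idx, , drop = FALSE])
    length(idx) * sum((mui - mu)^2)     # cluster size times squared centroid distance
  }))
}

X  <- matrix(c(1, 2, 8, 9), ncol = 1)   # four points on a line
cl <- c(1, 1, 2, 2)                     # two clusters: {1, 2} and {8, 9}

w   <- wss(X, cl)
b   <- bss(X, cl)
tss <- sum(sweep(X, 2, colMeans(X))^2)  # total sum of squares
print(c(WSS = w, BSS = b, TSS = tss))
```

Here the centroids are 1.5 and 8.5 and the global centroid is 5, so WSS = 4 × 0.25 = 1 and BSS = 2 × 3.5² + 2 × 3.5² = 49. Together they add up to the total sum of squares, 50, which is why a drop in WSS always shows up as an equal rise in BSS.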
Evaluation of Hierarchical Clustering using Knee Analysis
Plot the WSS and BSS for every clustering and look
for a knee in the plot, that is, a point where the evaluation
metrics change significantly
Evaluation of Clustering in R
library(GMD)
###
### checking the quality of the previous cluster
###
# init two vectors that will contain the evaluation
# in terms of within and between sum of squares
plot_wss = rep(0,12)
plot_bss = rep(0,12)
# evaluate every clustering
for(i in 1:12)
{
  clusters <- cutree(cl,i)
  eval <- css(dm,clusters)
  plot_wss[i] <- eval$totwss
  plot_bss[i] <- eval$totbss
}
Evaluation of Clustering in R
# plot the results
x = 1:12
plot(x, y=plot_bss, main="Between Cluster Sum-of-square",
     cex=2, pch=18, col="blue", xlab="Number of Clusters",
     ylab="Evaluation")
lines(x, plot_bss, col="blue")
par(new=TRUE)
plot(x, y=plot_wss, cex=2, pch=19, col="red", ylab="", xlab="")
lines(x, plot_wss, col="red")
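If the extra package is not available, the same curves can be computed with base R alone. The following is a sketch under assumptions, not the method used on the slides: it regenerates synthetic data of the same shape, uses a made-up helper `within_ss`, and obtains BSS as the total sum of squares minus WSS.

```r
set.seed(1234)
x <- rnorm(12, mean = rep(1:3, each = 4), sd = 0.2)
y <- rnorm(12, mean = rep(c(1, 2, 1), each = 4), sd = 0.2)
X <- cbind(x, y)
cl <- hclust(dist(X))

# total sum of squares around the global centroid
tss <- sum(sweep(X, 2, colMeans(X))^2)

# within-cluster sum of squares for a label vector (hypothetical helper)
within_ss <- function(X, labels) {
  sum(sapply(split(as.data.frame(X), labels), function(Xi) {
    Xi <- as.matrix(Xi)
    sum(sweep(Xi, 2, colMeans(Xi))^2)
  }))
}

ks  <- 1:12
wss <- sapply(ks, function(k) within_ss(X, cutree(cl, k)))
bss <- tss - wss                        # BSS follows from TSS = WSS + BSS

print(round(cbind(k = ks, wss = wss, bss = bss), 3))

# draw both curves on a common axis when run interactively
if (interactive()) {
  matplot(ks, cbind(wss, bss), type = "b", pch = c(19, 18),
          col = c("red", "blue"), xlab = "Number of Clusters",
          ylab = "Evaluation")
}
```

Drawing both curves on one axis with `matplot` keeps their scales comparable, whereas overlaying two separate `plot` calls with `par(new=TRUE)` rescales each curve independently.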
Knee/Elbow Analysis of Clustering
[Figure: WSS and BSS plotted against the number of clusters]
Hierarchical Clustering in R – Iris2D
library(foreign)
iris = read.arff("iris.2D.arff")
with(iris, plot(petallength,petalwidth, col="blue", pch=19, cex=2))
dm <- dist(iris[,1:2])
# single linkage
cl <- hclust(dm, method="single")
# cl <- hclust(dist(iris[,1:2],method="manhattan"), method="single")
plot(cl)
# average linkage
cl_average <- hclust(dm, method="average")
plot(cl_average)
cutree(cl_average,2)
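For readers without the ARFF file, a similar experiment can be run on the `iris` data frame that ships with R. This is an assumption-laden stand-in: it supposes that the two columns of the 2D file correspond to petal length and petal width, which the slide does not confirm.

```r
# Built-in iris data, restricted to the two petal measurements
# (assumed to match the columns of the 2D file used on the slide).
data(iris)
petals <- iris[, c("Petal.Length", "Petal.Width")]
dm <- dist(petals)

cl_single  <- hclust(dm, method = "single")
cl_average <- hclust(dm, method = "average")

# cut both dendrograms into two clusters
labels_single  <- cutree(cl_single, k = 2)
labels_average <- cutree(cl_average, k = 2)

# compare each partition with the known species
print(table(labels_single, iris$Species))
print(table(labels_average, iris$Species))
```

The contingency tables show how the two linkage criteria group the flowers relative to the species labels, which the clustering itself never sees.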
Knee/Elbow Analysis of Clustering for iris2D
[Figure: knee/elbow plot for the iris2D dataset]
Knee/Elbow Analysis of Clustering for iris
[Figure: knee/elbow plot for the iris dataset]
http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
Distance between clusters
Initial Configuration
•  Start with clusters of individual points and a proximity matrix
[Figure: the points p1, p2, p3, ..., p12 and their proximity matrix, with one row and one column per point]
Intermediate Situation
•  After some merging steps, we have some clusters
[Figure: the points grouped into clusters C1, ..., C5 and the proximity matrix with one row and one column per cluster]
Intermediate Situation
•  We want to merge the two closest clusters (C2 and C5) and
update the proximity matrix.
[Figure: clusters C1, ..., C5 and their proximity matrix; C2 and C5 are the pair to merge]
After Merging
•  The question is “How do we update the proximity matrix?”
[Figure: clusters C1, C3, C4, and C2 ∪ C5; the row and column of C2 ∪ C5 in the proximity matrix are marked with question marks]
Typical Alternatives to Calculate the Distance
Between Clusters
•  Single link (or MIN)
§ smallest distance between an element in one cluster and an
element in the other, i.e., d(Ci, Cj) = min d(ti,p, tj,q) over all ti,p in Ci and tj,q in Cj
•  Complete link (or MAX)
§ largest distance between an element in one cluster and
an element in the other, i.e., d(Ci, Cj) = max d(ti,p, tj,q)
•  Average (or group average)
§ average distance between an element in one cluster and an
element in the other, i.e., d(Ci, Cj) = avg d(ti,p, tj,q)
•  Centroid
§ distance between the centroids of two clusters, i.e.,
d(Ci, Cj) = d(μi, μj) where μi and μj are the centroids
•  …
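The four definitions can be compared on a toy case. The sketch below assumes Euclidean distance and two made-up clusters `A` and `B` of two points each; none of these values come from the slides.

```r
# Two small clusters in the plane (illustrative values)
A <- rbind(c(0, 0), c(0, 2))
B <- rbind(c(3, 0), c(3, 4))

# all pairwise Euclidean distances between a point of A and a point of B
all_d <- as.matrix(dist(rbind(A, B)))
cross <- all_d[seq_len(nrow(A)), nrow(A) + seq_len(nrow(B))]

single   <- min(cross)     # single link (MIN)
complete <- max(cross)     # complete link (MAX)
average  <- mean(cross)    # group average
centroid <- sqrt(sum((colMeans(A) - colMeans(B))^2))  # distance between centroids

print(c(single = single, complete = complete,
        average = average, centroid = centroid))
```

The cross distances are 3, 5, √13, and √13, so single link gives 3, complete link gives 5, the group average is (8 + 2√13)/4, and the centroids (0, 1) and (3, 2) are √10 apart. The same pair of clusters can therefore look closer or farther depending on the criterion.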
Similarity?
[Figures: Single Linkage or MIN; Complete Linkage or MAX; Average or Group Average; Distance between Centroids]
Example
•  Suppose we have five items, a, b, c, d, and e.
•  We want to perform hierarchical clustering on
five instances following an agglomerative approach
•  First: we compute the distance or similarity matrix
•  Dij is the distance between instances “i” and “j”
$$
D = \begin{pmatrix}
0.0 & & & & \\
2.0 & 0.0 & & & \\
6.0 & 5.0 & 0.0 & & \\
10.0 & 9.0 & 4.0 & 0.0 & \\
9.0 & 8.0 & 5.0 & 3.0 & 0.0
\end{pmatrix}
$$
Example
•  Group the two instances that are closest
•  In this case, a and b are the closest items (D2,1=2)
•  Recompute the distance matrix and repeat
•  Suppose we apply single linkage (MIN); we need to compute the
distance between the new cluster {a,b} (items 1 and 2) and the others
§ d(12)3 = min[d13,d23] = d23 = 5.0
§ d(12)4 = min[d14,d24] = d24 = 9.0
§ d(12)5 = min[d15,d25] = d25 = 8.0
Example
•  The new distance matrix is,
$$
D = \begin{pmatrix}
0.0 & & & \\
5.0 & 0.0 & & \\
9.0 & 4.0 & 0.0 & \\
8.0 & 5.0 & 3.0 & 0.0
\end{pmatrix}
$$
with rows and columns ordered as (12), 3, 4, 5, i.e., {a,b}, c, d, e
•  At the end, we obtain the following dendrogram
[Figure: dendrogram of the five items a, b, c, d, e]
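The whole example can be replayed in R as a check. The sketch assumes the five-item distance matrix of this example, entered as a full symmetric matrix with items named a to e, and single linkage throughout.

```r
# Distance matrix of the example, items a..e (symmetric form)
D <- matrix(c( 0,  2,  6, 10,  9,
               2,  0,  5,  9,  8,
               6,  5,  0,  4,  5,
              10,  9,  4,  0,  3,
               9,  8,  5,  3,  0),
            nrow = 5, byrow = TRUE,
            dimnames = list(letters[1:5], letters[1:5]))

# manual single-link update after merging a and b:
# keep the smaller of the two distances to each remaining item
d_ab <- pmin(D["a", c("c", "d", "e")], D["b", c("c", "d", "e")])
print(d_ab)

# let hclust carry out all the merges
hc <- hclust(as.dist(D), method = "single")
print(hc$height)            # distance at which each merge happens
groups <- cutree(hc, k = 2) # cut the dendrogram into two clusters
print(groups)

if (interactive()) plot(hc)
```

The manual update gives 5, 9, and 8 for c, d, and e. Under single linkage the merges happen at distances 2, 3, 4, and 5: first {a,b}, then {d,e}, then {c,d,e}, and finally all five items. Cutting at two clusters separates {a,b} from {c,d,e}.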
Hierarchical Clustering: 
Problems and Limitations
•  Once a decision is made to combine two clusters,
it cannot be undone
•  No objective function is directly minimized
•  Different schemes have problems with one 
or more of the following:
§ Sensitivity to noise and outliers
§ Difficulty handling different sized clusters 
and convex shapes
§ Breaking large clusters
•  Major weakness of agglomerative clustering methods
§ do not scale well: time complexity of at least O(n²), where n
is the total number of objects
§ can never undo what was done previously
how can we represent clusters?
Euclidean vs Non-Euclidean Spaces
•  Euclidean Spaces
§ We can identify a cluster using for instance its centroid 
(e.g. computed as the average among all its data points)
§ Alternatively, we can use its convex hull
•  Non-Euclidean Spaces
§ We can define a distance (Jaccard, cosine, edit)
§ We cannot compute a centroid, so we introduce the concept of
clustroid
•  Clustroid
§ An existing data point that we take as a cluster representative
§ It can be the point that minimizes the sum of the distances to the other
points in the cluster
§ Or, the one minimizing the maximum distance to another point
§ Or, the one minimizing the sum of the squares of the distances to the other
points in the cluster
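As an illustration of the three criteria, the sketch below picks a clustroid for a handful of strings under edit distance, a space where no centroid exists. The word list is invented for the example, and base R's `adist` supplies the pairwise Levenshtein distances.

```r
# A cluster of strings; no average string exists, so we pick a clustroid
words <- c("cat", "cart", "card", "care", "bat")
D <- adist(words)                 # pairwise edit (Levenshtein) distances

# three alternative criteria for choosing the representative
clustroid_sum   <- words[which.min(rowSums(D))]        # smallest sum of distances
clustroid_max   <- words[which.min(apply(D, 1, max))]  # smallest maximum distance
clustroid_sumsq <- words[which.min(rowSums(D^2))]      # smallest sum of squared distances

print(c(sum = clustroid_sum, max = clustroid_max, sumsq = clustroid_sumsq))
```

The criteria need not agree. Here "cart" has the smallest sum and the smallest sum of squares, while "cat" and "cart" tie on the maximum distance, and `which.min` simply returns the first of the tied candidates.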
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di apertura
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparation
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data exploration
 
DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rules
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensembles
 
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethods
 
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rules
 
DMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesDMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision trees
 
DMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationDMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluation
 
DMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationDMTM Lecture 05 Data representation
DMTM Lecture 05 Data representation
 
DMTM Lecture 03 Regression
DMTM Lecture 03 RegressionDMTM Lecture 03 Regression
DMTM Lecture 03 Regression
 
DMTM Lecture 01 Introduction
DMTM Lecture 01 IntroductionDMTM Lecture 01 Introduction
DMTM Lecture 01 Introduction
 
DMTM Lecture 02 Data mining
DMTM Lecture 02 Data miningDMTM Lecture 02 Data mining
DMTM Lecture 02 Data mining
 
VDP2016 - Lecture 16 Rendering pipeline
VDP2016 - Lecture 16 Rendering pipelineVDP2016 - Lecture 16 Rendering pipeline
VDP2016 - Lecture 16 Rendering pipeline
 

Recently uploaded

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 

Recently uploaded (20)

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 

DMTM 2015 - 07 Hierarchical Clustering

  • 1. Prof. Pier Luca Lanzi. Clustering: Hierarchical. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
  • 11. What is Hierarchical Clustering?
      •  Suppose we have five items, a, b, c, d, and e.
      •  Initially, we consider one cluster for each item
      •  Then, at each step we merge together the most similar clusters, until we generate one cluster
      [Figure: items a, b, c, d, e are merged into {a,b}, {d,e}, {c,d,e}, and finally {a,b,c,d,e} over Steps 0 to 4]
  • 12. What is Hierarchical Clustering?
      •  Alternatively, we start from one cluster containing the five elements
      •  Then, at each step we split one cluster to improve intracluster similarity, until each element is in its own cluster
      [Figure: splits from {a,b,c,d,e} through {a,b}, {c,d,e}, and {d,e} down to the single items, over Steps 0 to 4]
  • 13. What is Hierarchical Clustering?
      •  One of the most common clustering techniques
      •  Produces a hierarchy of nested clusters
      •  The hierarchy can be visualized as a dendrogram: a tree-like diagram that records the sequence of merges or splits
      [Figure: dendrogram over a, b, c, d, e with nodes {a,b}, {d,e}, {c,d,e}, {a,b,c,d,e}]
  • 14. What Approaches?
      •  Agglomerative
         § Start with each point as an individual cluster; at each step, merge the closest pair of clusters until only one cluster (or k clusters) is left
      •  Divisive
         § Start with one all-inclusive cluster; at each step, split a cluster until each cluster contains a single point (or there are k clusters)
      [Figure: the same dendrogram over a, b, c, d, e, read in one direction for agglomerative and in the opposite direction for divisive]
  • 15. Strengths of Hierarchical Clustering
      •  No need to assume any particular number of clusters
      •  Any desired number of clusters can be obtained by ‘cutting’ the dendrogram at the proper level
      •  The clusters may correspond to meaningful taxonomies
      •  Examples in biological sciences (e.g., animal kingdom, phylogeny reconstruction, …)
      •  Traditional hierarchical algorithms use a similarity or distance matrix to merge or split one cluster at a time
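As a small illustration of "cutting" the dendrogram, the sketch below uses base R's `hclust` and `cutree`. The five points and the choice of single linkage are assumptions made only for this example; they do not come from the slides.

```r
# Five points on a line (illustrative data, not from the lecture).
# With single linkage they merge at heights 1, 2, 4, 5.
pts <- c(0, 1, 3, 7, 12)
cl <- hclust(dist(pts), method = "single")

# Cut the tree so that exactly k = 2 clusters remain
labels_k <- cutree(cl, k = 2)

# Cut the tree at height h = 3: only merges at or below 3 are kept
labels_h <- cutree(cl, h = 3)

print(labels_k)
print(labels_h)
```

Cutting by `k` fixes the number of clusters, while cutting by `h` fixes the largest merge distance that is still accepted; which one is more convenient depends on the application.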
  • 16. Agglomerative Clustering Algorithm
      •  The more popular hierarchical clustering technique
      •  Compute the proximity matrix
      •  Let each data point be a cluster
      •  Repeat
         § Merge the two closest clusters
         § Update the proximity matrix
      •  Until only a single cluster remains
      •  The key operation is the computation of the proximity of two clusters
      •  Different approaches to defining the distance between clusters distinguish the different algorithms
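The loop above can be sketched in a few lines of R. This is a naive illustration, not the lecture's implementation: the function name `naive_single_link` is hypothetical, and single linkage is assumed as the update rule, so the distance from a merged cluster to any other cluster is the minimum of the two old distances.

```r
# Naive agglomerative clustering with single linkage (illustrative sketch).
# D: a "dist" object or a symmetric distance matrix.
# Returns the merge heights in the order in which the merges happen.
naive_single_link <- function(D) {
  P <- as.matrix(D)                      # proximity matrix between current clusters
  diag(P) <- Inf                         # ignore self-distances when searching the minimum
  members <- as.list(seq_len(nrow(P)))   # each data point starts as its own cluster
  heights <- numeric(0)
  while (length(members) > 1) {
    # find the two closest clusters
    idx <- which(P == min(P), arr.ind = TRUE)[1, ]
    i <- min(idx)
    j <- max(idx)
    heights <- c(heights, P[i, j])
    # update the proximity matrix: single linkage keeps the smaller distance
    new_row <- pmin(P[i, ], P[j, ])
    P[i, ] <- new_row
    P[, i] <- new_row
    P[i, i] <- Inf
    P <- P[-j, -j, drop = FALSE]
    # record the merge
    members[[i]] <- c(members[[i]], members[[j]])
    members[[j]] <- NULL
  }
  heights
}

# Usage: the heights should agree with hclust(..., method = "single")$height
h <- naive_single_link(dist(c(0, 1, 3, 7, 12)))
print(h)
```

Searching the whole matrix at every step is what makes this version cubic in the number of points.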
  • 17. Hierarchical Clustering: Time and Space Requirements
      •  $O(N^2)$ space, since it uses the proximity matrix
         § N is the number of points
      •  $O(N^3)$ time in many cases
         § There are N steps, and at each step the proximity matrix, of size $N^2$, must be updated and searched
         § Complexity can be reduced to $O(N^2 \log N)$ time for some approaches
  • 18. Efficient Implementation
      •  Compute the distance between all pairs of points [$O(N^2)$]
      •  Insert the pairs and their distances into a priority queue to find the minimum in one step [$O(N^2)$]
      •  When two clusters are merged, we remove all entries in the priority queue involving one of these two clusters [$O(N \log N)$]
      •  Compute all the distances between the new cluster and the remaining clusters [$O(N \log N)$]
      •  Since the last two steps are executed at most N times, the complexity of the whole algorithm is $O(N^2 \log N)$
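The overall bound follows from adding up the per-step costs quoted on the slide. The sum below only restates those figures, with the two initialization steps counted once and the two merge steps repeated at most N times.

```latex
T(N) = \underbrace{O(N^2) + O(N^2)}_{\text{distances and queue construction}}
     + \underbrace{N \cdot \bigl( O(N \log N) + O(N \log N) \bigr)}_{\text{at most } N \text{ merges}}
     = O(N^2 \log N)
```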
  • 19. Hierarchical Clustering in R
      # init the seed to be able to repeat the experiment
      set.seed(1234)
      par(mar=c(0,0,0,0))
      # randomly generate the data
      x <- rnorm(12, mean=rep(1:3,each=4), sd=0.2)
      y <- rnorm(12, mean=rep(c(1,2,1),each=4), sd=0.2)
      plot(x, y, pch=19, cex=2, col="blue")
      # distance matrix
      d <- data.frame(x,y)
      dm <- dist(d)
      # generate the hierarchical clustering
      cl <- hclust(dm)
      plot(cl)
      # other ways to plot dendrograms
      # http://rstudio-pubs-static.s3.amazonaws.com/1876_df0bf890dd54461f98719b461d987c3d.html
  • 20. Hierarchical clustering generates a set of N possible partitions. Which one should I choose?
  • 21. From the previous lecture we know that, ideally, a good clustering should partition the points so that:
      •  Data points in the same cluster are at a small distance from one another
      •  Data points in different clusters are at a large distance from one another
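  • 22. Within/Between Clusters Sum of Squares
      •  Within-cluster sum of squares, for K clusters: $WSS = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$, where $\mu_i$ is the centroid of cluster $C_i$ (in case of Euclidean spaces)
      •  Between-cluster sum of squares: $BSS = \sum_{i=1}^{K} |C_i| \, \lVert \mu_i - \mu \rVert^2$, where $\mu$ is the centroid of the whole dataset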
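A minimal base-R sketch of the two quantities follows. It assumes the usual Euclidean definitions: WSS sums the squared distances of the points to their own cluster centroid, and BSS sums the squared distances of the cluster centroids to the overall centroid, weighted by cluster size. The function name `wss_bss` and the toy data are hypothetical and not part of the lecture code.

```r
# Within- and between-cluster sum of squares for a given labelling (sketch).
# X: numeric matrix or data frame, one row per point.
# clusters: vector of cluster labels, one per row of X.
wss_bss <- function(X, clusters) {
  X <- as.matrix(X)
  mu <- colMeans(X)                       # centroid of the whole dataset
  wss <- 0
  bss <- 0
  for (k in unique(clusters)) {
    Xk <- X[clusters == k, , drop = FALSE]
    mu_k <- colMeans(Xk)                  # centroid of cluster k
    wss <- wss + sum(sweep(Xk, 2, mu_k)^2)
    bss <- bss + nrow(Xk) * sum((mu_k - mu)^2)
  }
  c(wss = wss, bss = bss)
}

# Usage on a toy one-dimensional dataset with two obvious groups
X <- matrix(c(0, 2, 10, 12), ncol = 1)
res <- wss_bss(X, c(1, 1, 2, 2))
print(res)
```

Under these definitions WSS and BSS add up to the total sum of squares around the overall centroid, so for a fixed dataset a lower WSS always comes with a higher BSS.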
  • 24. Evaluation of Hierarchical Clustering using Knee Analysis: plot the WSS and BSS for every clustering and look for a knee in the plot, that is, a point that shows a significant change in the evaluation metrics
  • 25. Evaluation of Clustering in R
      library(GMD)
      ###
      ### checking the quality of the previous cluster
      ###
      # init two vectors that will contain the evaluation
      # in terms of within and between sum of squares
      plot_wss <- rep(0,12)
      plot_bss <- rep(0,12)
      # evaluate every clustering
      for(i in 1:12) {
        clusters <- cutree(cl,i)
        eval <- css(dm,clusters)
        plot_wss[i] <- eval$totwss
        plot_bss[i] <- eval$totbss
      }
  • 26. Evaluation of Clustering in R
      # plot the results
      x <- 1:12
      plot(x, y=plot_bss, main="Between Cluster Sum-of-square", cex=2, pch=18, col="blue", xlab="Number of Clusters", ylab="Evaluation")
      lines(x, plot_bss, col="blue")
      # overlay the within-cluster curve on the same plot
      par(new=TRUE)
      plot(x, y=plot_wss, cex=2, pch=19, col="red", ylab="", xlab="")
      lines(x, plot_wss, col="red")
  • 27. Knee/Elbow Analysis of Clustering
  • 28. Hierarchical Clustering in R – Iris2D
      library(foreign)
      iris <- read.arff("iris.2D.arff")
      with(iris, plot(petallength, petalwidth, col="blue", pch=19, cex=2))
      iris_dist <- dist(iris[,1:2])
      cl <- hclust(iris_dist, method="single")
      # cl <- hclust(dist(iris[,1:2], method="manhattan"), method="single")
      plot(cl)
      cl_average <- hclust(iris_dist, method="average")
      plot(cl_average)
      cutree(cl_average, 2)
  • 29. Knee/Elbow Analysis of Clustering for iris2D
  • 30. Knee/Elbow Analysis of Clustering for iris
  • 31. http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
  • 32. Distance between clusters
  • 33. Initial Configuration
      •  Start with clusters of individual points and a proximity matrix
      [Figure: points p1, p2, ..., p12 and the proximity matrix with rows and columns p1, p2, p3, p4, p5, ...]
  • 34. Intermediate Situation
      •  After some merging steps, we have some clusters
      [Figure: clusters C1 to C5 and the proximity matrix with rows and columns C1, C2, C3, C4, C5]
  • 35. Intermediate Situation
      •  We want to merge the two closest clusters (C2 and C5) and update the proximity matrix.
      [Figure: clusters C1 to C5 and the proximity matrix with rows and columns C1, C2, C3, C4, C5]
  • 36. After Merging
      •  The question is “How do we update the proximity matrix?”
      [Figure: clusters C1, C3, C4, and C2 U C5; the proximity matrix entries in the row and column of C2 U C5 are marked “?”]
  • 37. Typical Alternatives to Calculate the Distance Between Clusters
      (here $t_{i,p}$ is an element of cluster $C_i$ and $t_{j,q}$ an element of cluster $C_j$)
      •  Single link (or MIN)
         § smallest distance between an element in one cluster and an element in the other, i.e., $d(C_i, C_j) = \min_{p,q} d(t_{i,p}, t_{j,q})$
      •  Complete link (or MAX)
         § largest distance between an element in one cluster and an element in the other, i.e., $d(C_i, C_j) = \max_{p,q} d(t_{i,p}, t_{j,q})$
      •  Average (or group average)
         § average distance between an element in one cluster and an element in the other, i.e., $d(C_i, C_j) = \mathrm{avg}_{p,q}\, d(t_{i,p}, t_{j,q})$
      •  Centroid
         § distance between the centroids of two clusters, i.e., $d(C_i, C_j) = d(\mu_i, \mu_j)$, where $\mu_i$ and $\mu_j$ are the centroids
      •  …
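The four alternatives can be compared on two tiny clusters. The points in `A` and `B` below are made up for illustration, and Euclidean distance is assumed as the underlying point-to-point distance.

```r
# Two small clusters in the plane (illustrative data, one row per point)
A <- rbind(c(0, 0), c(1, 0))
B <- rbind(c(4, 0), c(4, 3))

# All pairwise Euclidean distances between a point of A and a point of B
pair <- as.matrix(dist(rbind(A, B)))[1:nrow(A), nrow(A) + 1:nrow(B)]

linkage <- c(
  single   = min(pair),    # closest pair of points
  complete = max(pair),    # farthest pair of points
  average  = mean(pair),   # mean over all pairs
  centroid = sqrt(sum((colMeans(A) - colMeans(B))^2))  # distance between centroids
)
print(linkage)
```

On the same pair of clusters the four definitions give different values, which is why the choice of linkage changes the order of the merges and therefore the dendrogram.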
  • 38. Similarity?
  • 39. Single Linkage or MIN
  • 40. Complete Linkage or MAX
  • 41. Average or Group Average
  • 42. Distance between Centroids
  • 43. Example
      •  Suppose we have five items, a, b, c, d, and e.
      •  We want to perform hierarchical clustering on the five instances following an agglomerative approach
      •  First: we compute the distance or similarity matrix
      •  $D_{ij}$ is the distance between instances “i” and “j”
      $$
      D = \begin{pmatrix}
      0.0 & & & & \\
      2.0 & 0.0 & & & \\
      6.0 & 5.0 & 0.0 & & \\
      10.0 & 9.0 & 4.0 & 0.0 & \\
      9.0 & 8.0 & 5.0 & 3.0 & 0.0
      \end{pmatrix}
      $$
  • 44. Example
      •  Group the two instances that are closest
      •  In this case, a and b are the closest items ($D_{2,1} = 2.0$)
      •  Compute the distance matrix again, and repeat.
      •  Suppose we apply single linkage (MIN): we need to compute the distance between the new cluster {1,2} (items a and b) and the others
         § $d_{(12)3} = \min[d_{13}, d_{23}] = d_{23} = 5.0$
         § $d_{(12)4} = \min[d_{14}, d_{24}] = d_{24} = 9.0$
         § $d_{(12)5} = \min[d_{15}, d_{25}] = d_{25} = 8.0$
  • 45. Example
      •  The new distance matrix (rows and columns ordered as (12), 3, 4, 5) is
      $$
      D = \begin{pmatrix}
      0.0 & & & \\
      5.0 & 0.0 & & \\
      9.0 & 4.0 & 0.0 & \\
      8.0 & 5.0 & 3.0 & 0.0
      \end{pmatrix}
      $$
      •  At the end, we obtain the following dendrogram
      [Figure: dendrogram]
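The example can be checked with base R by feeding the 5×5 distance matrix from the example directly to `hclust` with single linkage. The labels a to e follow the slides; the variable names are only illustrative.

```r
# Distance matrix of the example, written out as a full symmetric matrix
D <- matrix(c( 0,  2,  6, 10,  9,
               2,  0,  5,  9,  8,
               6,  5,  0,  4,  5,
              10,  9,  4,  0,  3,
               9,  8,  5,  3,  0),
            nrow = 5, byrow = TRUE,
            dimnames = list(letters[1:5], letters[1:5]))

# Single-linkage agglomerative clustering on the precomputed distances
cl <- hclust(as.dist(D), method = "single")

print(cl$height)       # distance at which each merge happens
print(cutree(cl, 2))   # membership when the tree is cut into two clusters
plot(cl)               # dendrogram
```

With single linkage the merges happen at distances 2, 3, 4, and 5: first {a,b}, then {d,e}, then c joins {d,e}, and finally the two remaining clusters are joined.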
  • 46. Hierarchical Clustering: Problems and Limitations
      •  Once a decision is made to combine two clusters, it cannot be undone
      •  No objective function is directly minimized
      •  Different schemes have problems with one or more of the following:
         § Sensitivity to noise and outliers
         § Difficulty handling different sized clusters and convex shapes
         § Breaking large clusters
      •  Major weaknesses of agglomerative clustering methods
         § they do not scale well: time complexity of at least $O(n^2)$, where n is the total number of objects
         § they can never undo what was done previously
  • 47. How can we represent clusters?
  • 48. Euclidean vs Non-Euclidean Spaces
      •  Euclidean Spaces
         § We can identify a cluster using, for instance, its centroid (e.g., computed as the average of all its data points)
         § Alternatively, we can use its convex hull
      •  Non-Euclidean Spaces
         § We can define a distance (Jaccard, cosine, edit)
         § We cannot compute a centroid, so we introduce the concept of clustroid
      •  Clustroid
         § An existing data point that we take as a cluster representative
         § It can be the point that minimizes the sum of the distances to the other points in the cluster
         § Or, the one minimizing the maximum distance to another point
         § Or, the one minimizing the sum of the squares of the distances to the other points in the cluster
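The three clustroid criteria only need pairwise distances, so they can be sketched on top of any distance matrix. The function name `clustroid` and its `criterion` argument are hypothetical; the usage example assumes edit distance between strings, computed with base R's `adist`, on made-up words.

```r
# Index of the clustroid of one cluster, given its pairwise distance matrix.
# criterion: "sum"   minimizes the sum of distances to the other points,
#            "max"   minimizes the maximum distance to another point,
#            "sumsq" minimizes the sum of squared distances.
clustroid <- function(D, criterion = c("sum", "max", "sumsq")) {
  criterion <- match.arg(criterion)
  D <- as.matrix(D)
  score <- switch(criterion,
                  sum   = rowSums(D),
                  max   = apply(D, 1, max),
                  sumsq = rowSums(D^2))
  unname(which.min(score))   # ties are broken in favour of the first point
}

# Usage in a non-Euclidean space: strings compared by edit distance
words <- c("cluster", "clusters", "cluter", "custard", "clustroid")
D_edit <- adist(words)
print(words[clustroid(D_edit, "sum")])
print(words[clustroid(D_edit, "max")])
```

The criteria can select different points on the same cluster, so the choice should match how the representative is used later, for example when measuring distances between clusters.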