SlideShare a Scribd company logo
1 of 33
Download to read offline
Parallel K-Means
ncku Tien-Yang Wu
outline
Introduction
K-Means Algorithm
Parallel K-Means Based on MapReduce
Experimental Results
K-Means on spark
Introduction
They assume that all objects can reside in main
memory at the same time.
Their parallel systems have provided restricted
programming models.
Introduction
They assume that all objects can reside in main
memory at the same time.
Their parallel systems have provided restricted
programming models.
dataset oriented parallel clustering algorithms should be
developed.
K-Means Algorithm
K-Means Algorithm
Firstly, it randomly selects k objects from the whole objects
which represent initial cluster centers.
K-Means Algorithm
Each remaining object is assigned to the cluster to which it
is the most similar, based on the distance between the
object and the cluster center.
K-Means Algorithm
The new mean for each cluster is then calculated. This
process iterates until the criterion function converges.
Parallel K-Means Based
on MapReduce
most intensive calculation to occur is the calculation of
distances.
each iteration require nk distance
Parallel K-Means Based
on MapReduce
the distance computations between one object with the
centers is irrelevant to the distance computations
between other objects with the corresponding centers.
distance computations between different objects with
centers can be parallel executed.
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
data target
1,1
2,2
3,3
11,11
12,12
13,13
1 class
2 class
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
random two centroid
c1:(1,1)
c2:(11,11)
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
store two nodes
c1:(1,1)
c2:(11,11)
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
1,1
12,12
3,3
11,11
2,2
13,13
node1
node2
c1:(1,1)
c2:(11,11)
Parallel K-Means Based
on MapReduce
1,1
2,2
3,3
11,11
12,12
13,13
1,1
12,12
3,3
11,11
2,2
13,13
node1
node2
c1:(1,1)
c2:(11,11)
map
map
combine
combine
reduce
Parallel K-Means Based
on MapReduce
1,1
12,12
3,3
node1
map
3,3
c1:(1,1)
c2:(11,11)
assign to c1(1,1)
(1,1) , {(3,3),(3,3)}
key value
output<key,value>
Parallel K-Means Based
on MapReduce
(1,1) , {(3,3),(3,3)}
key value
centroid
temporary to calculate new centroid, the object
(1,1)
{(3,3),(3,3)}
output<key,value>
Parallel K-Means Based
on MapReduce
1,1
12,12
3,3
node1
map
c1:(1,1)
c2:(11,11)
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
Parallel K-Means Based
on MapReduce
1,1
12,12
3,3
11,11
2,2
13,13
node1
node2
c1:(1,1)
c2:(11,11)
map
map
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
(11,11) , {(11,11),(11,11)}
(1,1) , {(2,2),(2,2)}
key value
(11,11) , {(13,13),(13,13)}
Parallel K-Means Based
on MapReduce
1,1
12,12
3,3
node1 c1:(1,1)
c2:(11,11)
map
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
combine
Parallel K-Means Based
on MapReduce
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
combine
(1,1) , {(4,4),{(1,1),(3,3),2}
(11,11) , {(12,12),(12,12),1}
key value
same key combine
Parallel K-Means Based
on MapReduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(11,11) , {(12,12),(12,12),1}
key value
output<key,value>
centroid
temporary to calculate new centroid, the objects
,number of objects
(1,1)
{(4,4),{(1,1),(3,3)},2}
Parallel K-Means Based
on MapReduce
combine
(1,1) , {(1,1),(1,1)}
(11,11) , {(12,12),(12,12)}
(1,1) , {(3,3),(3,3)}
key value
(11,11) , {(11,11),(11,11)}
(1,1) , {(2,2),(2,2)}
key value
(11,11) , {(13,13),(13,13)}
combine
(1,1) , {(4,4),{(1,1),(3,3)},2}
(11,11) , {(12,12),(12,12),1}
key value
(1,1) , {(2,2),(2,2),1}
(11,11) , {(24,24),{(11,11),(13,13)},2}
key value
Parallel K-Means Based
on MapReduce
reduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(11,11) , {(12,12),(12,12),1}
key value
(1,1) , {(2,2),(2,2),1}
(11,11) , {(24,24),{(11,11),(13,13)},2}
key value
same key reduce
Parallel K-Means Based
on MapReduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(1,1) , {(2,2),(2,2),1}
reduce
same key reduce
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
Parallel K-Means Based
on MapReduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(1,1) , {(2,2),(2,2),1}
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
(4+2)/(2+1) ,(4+2)/(2+1) = 2,2
2,2 = new centroid
1,1
2,2
3,3
centroid is 2,2
Parallel K-Means Based
on MapReduce
(1,1) , {(4,4),{(1,1),(3,3)},2}
(1,1) , {(2,2),(2,2),1}
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
centroid
new centroid, the objects
,new cluster
Parallel K-Means Based
on MapReduce
reduce
(1,1) , {(2,2),{(1,1),(2,2),(3,3)}
(11,11) , {(12,12),{(11,11),(12,12),(13,13)}
update new centroid and next iteration
until converge or arrive to iteration number
Experimental Results
two 2.8 GHz cores and 4GB of memory
Experimental Results
Speedup
non linear,communication cost
Experimental Results
Scale up
performance
datasets 1GB 2GB 3GB 4GB
K-Means on spark
Reference
Parallel K-Means Clustering Based on MapReduce
Weizhong Zhao1,2, Huifang Ma1,2, and Qing He1
2009
K means algorithm

More Related Content

What's hot

daa-unit-3-greedy method
daa-unit-3-greedy methoddaa-unit-3-greedy method
daa-unit-3-greedy method
hodcsencet
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Simplilearn
 
DESIGN AND ANALYSIS OF ALGORITHMS
DESIGN AND ANALYSIS OF ALGORITHMSDESIGN AND ANALYSIS OF ALGORITHMS
DESIGN AND ANALYSIS OF ALGORITHMS
Gayathri Gaayu
 
01 knapsack using backtracking
01 knapsack using backtracking01 knapsack using backtracking
01 knapsack using backtracking
mandlapure
 
Daa:Dynamic Programing
Daa:Dynamic ProgramingDaa:Dynamic Programing
Daa:Dynamic Programing
rupali_2bonde
 

What's hot (20)

Calculus in Machine Learning
Calculus in Machine Learning Calculus in Machine Learning
Calculus in Machine Learning
 
A Gentle Introduction to the EM Algorithm
A Gentle Introduction to the EM AlgorithmA Gentle Introduction to the EM Algorithm
A Gentle Introduction to the EM Algorithm
 
Binomial Heaps and Fibonacci Heaps
Binomial Heaps and Fibonacci HeapsBinomial Heaps and Fibonacci Heaps
Binomial Heaps and Fibonacci Heaps
 
Tsp is NP-Complete
Tsp is NP-CompleteTsp is NP-Complete
Tsp is NP-Complete
 
daa-unit-3-greedy method
daa-unit-3-greedy methoddaa-unit-3-greedy method
daa-unit-3-greedy method
 
Radial Basis Function
Radial Basis FunctionRadial Basis Function
Radial Basis Function
 
email transaction network analysis
email transaction network analysisemail transaction network analysis
email transaction network analysis
 
Text classification presentation
Text classification presentationText classification presentation
Text classification presentation
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
 
Dimensionality reduction with UMAP
Dimensionality reduction with UMAPDimensionality reduction with UMAP
Dimensionality reduction with UMAP
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examples
 
Ch 7 Knowledge Representation.pdf
Ch 7 Knowledge Representation.pdfCh 7 Knowledge Representation.pdf
Ch 7 Knowledge Representation.pdf
 
U-Net (1).pptx
U-Net (1).pptxU-Net (1).pptx
U-Net (1).pptx
 
Backpropagation: Understanding How to Update ANNs Weights Step-by-Step
Backpropagation: Understanding How to Update ANNs Weights Step-by-StepBackpropagation: Understanding How to Update ANNs Weights Step-by-Step
Backpropagation: Understanding How to Update ANNs Weights Step-by-Step
 
DESIGN AND ANALYSIS OF ALGORITHMS
DESIGN AND ANALYSIS OF ALGORITHMSDESIGN AND ANALYSIS OF ALGORITHMS
DESIGN AND ANALYSIS OF ALGORITHMS
 
01 knapsack using backtracking
01 knapsack using backtracking01 knapsack using backtracking
01 knapsack using backtracking
 
Computer Vision: Feature matching with RANSAC Algorithm
Computer Vision: Feature matching with RANSAC AlgorithmComputer Vision: Feature matching with RANSAC Algorithm
Computer Vision: Feature matching with RANSAC Algorithm
 
Pandas Dataframe reading data Kirti final.pptx
Pandas Dataframe reading data  Kirti final.pptxPandas Dataframe reading data  Kirti final.pptx
Pandas Dataframe reading data Kirti final.pptx
 
Daa:Dynamic Programing
Daa:Dynamic ProgramingDaa:Dynamic Programing
Daa:Dynamic Programing
 
Naive bayesian classification
Naive bayesian classificationNaive bayesian classification
Naive bayesian classification
 

Viewers also liked

Kmeans in-hadoop
Kmeans in-hadoopKmeans in-hadoop
Kmeans in-hadoop
Tianwei Liu
 
Hadoop Design and k -Means Clustering
Hadoop Design and k -Means ClusteringHadoop Design and k -Means Clustering
Hadoop Design and k -Means Clustering
George Ang
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
Edureka!
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
gothicane
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
Subhas Kumar Ghosh
 
Spark Bi-Clustering - OW2 Big Data Initiative, altic
Spark Bi-Clustering - OW2 Big Data Initiative, alticSpark Bi-Clustering - OW2 Big Data Initiative, altic
Spark Bi-Clustering - OW2 Big Data Initiative, altic
ALTIC Altic
 
Hadoop introduction 2
Hadoop introduction 2Hadoop introduction 2
Hadoop introduction 2
Tianwei Liu
 

Viewers also liked (20)

Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduce
 
Kmeans in-hadoop
Kmeans in-hadoopKmeans in-hadoop
Kmeans in-hadoop
 
Hadoop Design and k -Means Clustering
Hadoop Design and k -Means ClusteringHadoop Design and k -Means Clustering
Hadoop Design and k -Means Clustering
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
 
Lec4 Clustering
Lec4 ClusteringLec4 Clustering
Lec4 Clustering
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Big data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesBig data Clustering Algorithms And Strategies
Big data Clustering Algorithms And Strategies
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce Algorithms
 
K means Clustering Algorithm
K means Clustering AlgorithmK means Clustering Algorithm
K means Clustering Algorithm
 
Optimization for iterative queries on Mapreduce
Optimization for iterative queries on MapreduceOptimization for iterative queries on Mapreduce
Optimization for iterative queries on Mapreduce
 
Seeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringSeeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text Clustering
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
 
MachineLearning_MPI_vs_Spark
MachineLearning_MPI_vs_SparkMachineLearning_MPI_vs_Spark
MachineLearning_MPI_vs_Spark
 
Collaborative Filtering Recommendation Algorithm based on Hadoop
Collaborative Filtering Recommendation Algorithm based on HadoopCollaborative Filtering Recommendation Algorithm based on Hadoop
Collaborative Filtering Recommendation Algorithm based on Hadoop
 
Spark Bi-Clustering - OW2 Big Data Initiative, altic
Spark Bi-Clustering - OW2 Big Data Initiative, alticSpark Bi-Clustering - OW2 Big Data Initiative, altic
Spark Bi-Clustering - OW2 Big Data Initiative, altic
 
K means
K meansK means
K means
 
Hadoop introduction 2
Hadoop introduction 2Hadoop introduction 2
Hadoop introduction 2
 
Hidden markov model
Hidden markov modelHidden markov model
Hidden markov model
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
 

Similar to Parallel-kmeans

Parallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-JoinsParallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-Joins
Jonny Daenen
 
Skiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracingSkiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracing
zukun
 

Similar to Parallel-kmeans (20)

The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Enhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetEnhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial Dataset
 
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
Graph R-CNN for Scene Graph Generation
Graph R-CNN for Scene Graph GenerationGraph R-CNN for Scene Graph Generation
Graph R-CNN for Scene Graph Generation
 
Determining the k in k-means with MapReduce
Determining the k in k-means with MapReduceDetermining the k in k-means with MapReduce
Determining the k in k-means with MapReduce
 
Fa18_P2.pptx
Fa18_P2.pptxFa18_P2.pptx
Fa18_P2.pptx
 
Parallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-JoinsParallel Evaluation of Multi-Semi-Joins
Parallel Evaluation of Multi-Semi-Joins
 
Theories and Engineering Technics of 2D-to-3D Back-Projection Problem
Theories and Engineering Technics of 2D-to-3D Back-Projection ProblemTheories and Engineering Technics of 2D-to-3D Back-Projection Problem
Theories and Engineering Technics of 2D-to-3D Back-Projection Problem
 
Opensource gis development - part 4
Opensource gis development - part 4Opensource gis development - part 4
Opensource gis development - part 4
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.ppt
 
Knn solution
Knn solutionKnn solution
Knn solution
 
M0174491101
M0174491101M0174491101
M0174491101
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
Skiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracingSkiena algorithm 2007 lecture15 backtracing
Skiena algorithm 2007 lecture15 backtracing
 
An analysis between exact and approximate algorithms for the k-center proble...
An analysis between exact and approximate algorithms for the  k-center proble...An analysis between exact and approximate algorithms for the  k-center proble...
An analysis between exact and approximate algorithms for the k-center proble...
 
Visualization of general defined space data
Visualization of general defined space dataVisualization of general defined space data
Visualization of general defined space data
 
Scalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceScalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduce
 
Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...
 
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...Understanding the Differences between the erfc(x) and the Q(z) functions: A S...
Understanding the Differences between the erfc(x) and the Q(z) functions: A S...
 

More from Tien-Yang (Aiden) Wu

More from Tien-Yang (Aiden) Wu (11)

Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learning
 
沒有想像中簡單的簡單分類器 Knn
沒有想像中簡單的簡單分類器 Knn沒有想像中簡單的簡單分類器 Knn
沒有想像中簡單的簡單分類器 Knn
 
Scalable sentiment classification for big data analysis using naive bayes cla...
Scalable sentiment classification for big data analysis using naive bayes cla...Scalable sentiment classification for big data analysis using naive bayes cla...
Scalable sentiment classification for big data analysis using naive bayes cla...
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
RDD
RDDRDD
RDD
 
Semantic ui教學
Semantic ui教學Semantic ui教學
Semantic ui教學
 
響應式網頁教學
響應式網頁教學響應式網頁教學
響應式網頁教學
 
NoSQL & JSON
NoSQL & JSONNoSQL & JSON
NoSQL & JSON
 
Weebly上手教學
Weebly上手教學Weebly上手教學
Weebly上手教學
 
簡易爬蟲製作和Pttcrawler
簡易爬蟲製作和Pttcrawler簡易爬蟲製作和Pttcrawler
簡易爬蟲製作和Pttcrawler
 
Python簡介和多版本虛擬環境架設
Python簡介和多版本虛擬環境架設Python簡介和多版本虛擬環境架設
Python簡介和多版本虛擬環境架設
 

Recently uploaded

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 

Parallel-kmeans

  • 2. outline Introduction K-Means Algorithm Parallel K-Means Based on MapReduce Experimental Results K-Means on spark
  • 3. Introduction They assume that all objects can reside in main memory at the same time. Their parallel systems have provided restricted programming models.
  • 4. Introduction They assume that all objects can reside in main memory at the same time. Their parallel systems have provided restricted programming models. dataset oriented parallel clustering algorithms should be developed.
  • 6. K-Means Algorithm Firstly, it randomly selects k objects from the whole objects which represent initial cluster centers.
  • 7. K-Means Algorithm Each remaining object is assigned to the cluster to which it is the most similar, based on the distance between the object and the cluster center.
  • 8. K-Means Algorithm The new mean for each cluster is then calculated. This process iterates until the criterion function converges.
  • 9. Parallel K-Means Based on MapReduce most intensive calculation to occur is the calculation of distances. each iteration require nk distance
  • 10. Parallel K-Means Based on MapReduce the distance computations between one object with the centers is irrelevant to the distance computations between other objects with the corresponding centers. distance computations between different objects with centers can be parallel executed.
  • 11. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 data target 1,1 2,2 3,3 11,11 12,12 13,13 1 class 2 class
  • 12. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 random two centroid c1:(1,1) c2:(11,11)
  • 13. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 store two nodes c1:(1,1) c2:(11,11)
  • 14. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 1,1 12,12 3,3 11,11 2,2 13,13 node1 node2 c1:(1,1) c2:(11,11)
  • 15. Parallel K-Means Based on MapReduce 1,1 2,2 3,3 11,11 12,12 13,13 1,1 12,12 3,3 11,11 2,2 13,13 node1 node2 c1:(1,1) c2:(11,11) map map combine combine reduce
  • 16. Parallel K-Means Based on MapReduce 1,1 12,12 3,3 node1 map 3,3 c1:(1,1) c2:(11,11) assign to c1(1,1) (1,1) , {(3,3),(3,3)} key value output<key,value>
  • 17. Parallel K-Means Based on MapReduce (1,1) , {(3,3),(3,3)} key value centroid temporary to calculate new centroid, the object (1,1) {(3,3),(3,3)} output<key,value>
  • 18. Parallel K-Means Based on MapReduce 1,1 12,12 3,3 node1 map c1:(1,1) c2:(11,11) (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value
  • 19. Parallel K-Means Based on MapReduce 1,1 12,12 3,3 11,11 2,2 13,13 node1 node2 c1:(1,1) c2:(11,11) map map (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value (11,11) , {(11,11),(11,11)} (1,1) , {(2,2),(2,2)} key value (11,11) , {(13,13),(13,13)}
  • 20. Parallel K-Means Based on MapReduce 1,1 12,12 3,3 node1 c1:(1,1) c2:(11,11) map (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value combine
  • 21. Parallel K-Means Based on MapReduce (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value combine (1,1) , {(4,4),{(1,1),(3,3),2} (11,11) , {(12,12),(12,12),1} key value same key combine
  • 22. Parallel K-Means Based on MapReduce (1,1) , {(4,4),{(1,1),(3,3)},2} (11,11) , {(12,12),(12,12),1} key value output<key,value> centroid temporary to calculate new centroid, the objects ,number of objects (1,1) {(4,4),{(1,1),(3,3)},2}
  • 23. Parallel K-Means Based on MapReduce combine (1,1) , {(1,1),(1,1)} (11,11) , {(12,12),(12,12)} (1,1) , {(3,3),(3,3)} key value (11,11) , {(11,11),(11,11)} (1,1) , {(2,2),(2,2)} key value (11,11) , {(13,13),(13,13)} combine (1,1) , {(4,4),{(1,1),(3,3)},2} (11,11) , {(12,12),(12,12),1} key value (1,1) , {(2,2),(2,2),1} (11,11) , {(24,24),{(11,11),(13,13)},2} key value
  • 24. Parallel K-Means Based on MapReduce reduce (1,1) , {(4,4),{(1,1),(3,3)},2} (11,11) , {(12,12),(12,12),1} key value (1,1) , {(2,2),(2,2),1} (11,11) , {(24,24),{(11,11),(13,13)},2} key value same key reduce
  • 25. Parallel K-Means Based on MapReduce (1,1) , {(4,4),{(1,1),(3,3)},2} (1,1) , {(2,2),(2,2),1} reduce same key reduce (1,1) , {(2,2),{(1,1),(2,2),(3,3)}
  • 26. Parallel K-Means Based on MapReduce (1,1) , {(4,4),{(1,1),(3,3)},2} (1,1) , {(2,2),(2,2),1} (1,1) , {(2,2),{(1,1),(2,2),(3,3)} (4+2)/(2+1) ,(4+2)/(2+1) = 2,2 2,2 = new centroid 1,1 2,2 3,3 centroid is 2,2
  • 27. Parallel K-Means Based on MapReduce (1,1) , {(4,4),{(1,1),(3,3)},2} (1,1) , {(2,2),(2,2),1} (1,1) , {(2,2),{(1,1),(2,2),(3,3)} (1,1) , {(2,2),{(1,1),(2,2),(3,3)} centroid new centroid, the objects ,new cluster
  • 28. Parallel K-Means Based on MapReduce reduce (1,1) , {(2,2),{(1,1),(2,2),(3,3)} (11,11) , {(12,12),{(11,11),(12,12),(13,13)} update new centroid and next iteration until converge or arrive to iteration number
  • 29. Experimental Results two 2.8 GHz cores and 4GB of memory
  • 33. Reference Parallel K-Means Clustering Based on MapReduce Weizhong Zhao1,2, Huifang Ma1,2, and Qing He1 2009 K means algorithm