Guided By: Prof. Prashant G. Ahire
Presented By: Miss Poonam Kshirsagar (Roll No. 204)
A SURVEY OF CLUSTERING TECHNIQUES FOR BIG DATA ANALYSIS
Agenda
• Problem Definition
• Objective
• Literature Survey
• Big Data and Its Analytics Challenges
• Cluster
• Criteria to Benchmark Clustering Methods
• Proposed System
• ELM
• ELM Feature Mapping Process
• ELM K-means Algorithm
• Advantages
• Disadvantages
• Conclusion
Problem Definition:
• Among the challenges in analyzing big data, a major one is designing and developing new clustering techniques.
• Cloud computing can be used for big data analysis, but many traditional algorithms cannot be applied directly in a cloud environment. Traditional algorithms also face issues of scalability, delay in producing results, and accuracy of the results.
• These issues can be addressed by clustering techniques.
Objectives:
The objectives of the thesis are as follows:
• To study the existing clustering techniques for analyzing big data.
• To propose and design an efficient clustering technique for big data analysis.
Literature Survey:
1. Topic Name: A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis
Keywords: Clustering algorithms, unsupervised learning, big data
Abstract: Highlights the set of clustering algorithms that perform best for big data.
Author Name: Adil Fahad, Najlaa Alshatri, Zahir Tari (Member, IEEE)
2. Topic Name: Clustering in extreme learning machine feature space
Keywords: ELM means
Abstract: Given the good properties of the ELM feature mapping, this paper studies the clustering problem using ELM feature mapping techniques.
Author Name: Qing He, Xin Jin, Changying Du, Fuzhen Zhuang
3. Topic Name: A Hybrid Approach for Efficient Clustering of Big Data
Keywords: Big data, basic K-means algorithm using MapReduce, basic DBSCAN algorithm using MapReduce
Abstract: Presents a theoretical overview of some of the current clustering techniques used for analyzing big data.
Author Name: Saurabh Arora, Department of Computer Science and Engineering, Thapar University, Patiala, India
4. Topic Name: A Survey of Clustering Techniques for Big Data Analysis
Keywords: Big data, clustering techniques, data mining
Abstract: Discusses some of the current big data mining clustering techniques.
Author Name: Saurabh Arora, Inderveer, Dept. of CS
What is Big Data?
Big data means data that is too big, too fast, or too hard for existing tools to process.
• Too big: petabyte-scale collections of data.
• Too fast: data that must be processed quickly.
• Too hard: a catch-all for data that does not fit neatly into existing processing tools.
Fig.: Evolution of big data
Fig.: Significant growth of big data
Big Data Analytics Challenges:
The main challenges for big data analytics are listed below:
• The volume of data is large and also varies, so the challenge is how to deal with it.
• Whether all of the data needs to be analyzed.
• Whether all of the data needs to be stored.
• Which data points are important, and how to find them.
• How the data can be used in the best way.
What is a Cluster?
• Clustering is the division of data into groups of similar objects. Each group is called a cluster.
• A cluster consists of objects that are similar to one another and dissimilar to objects of other groups.
• It is one of the major techniques used in data mining.
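The definition above can be made concrete with a small worked example. This is only a sketch under stated assumptions: the two point groups are made-up illustration data, and Euclidean distance is assumed as the (dis)similarity measure, which the slides do not specify.

```python
import math
from itertools import combinations, product


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_intra_distance(cluster):
    """Average distance between all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)


def mean_inter_distance(cluster_a, cluster_b):
    """Average distance between points taken from two different clusters."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)


# Hypothetical toy data: two visibly separated groups of 2-D points.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)]
cluster_b = [(8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

intra_a = mean_intra_distance(cluster_a)
intra_b = mean_intra_distance(cluster_b)
inter_ab = mean_inter_distance(cluster_a, cluster_b)
```

For a good grouping, the within-cluster averages (`intra_a`, `intra_b`) should be clearly smaller than the between-cluster average (`inter_ab`).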
Criteria to Benchmark Clustering Methods:
Volume: refers to the large amount of data. Criteria:
(i) Size of the dataset
(ii) Handling high dimensionality
(iii) Handling outliers/noisy data
Velocity: refers to the speed of processing data. Criteria:
(i) Complexity of the algorithm
(ii) Run-time performance
Variety: refers to the ability to handle different types of data. Criteria:
(i) Type of dataset
(ii) Cluster shape
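The velocity criterion (run-time performance) can be checked empirically by timing a clustering routine on datasets of increasing size. The harness below is a minimal sketch: `time_clustering` and the placeholder routine `nearest_of_two` are hypothetical names introduced for illustration, and the data is uniformly random rather than taken from any benchmark in the surveyed papers.

```python
import random
import time


def time_clustering(cluster_fn, sizes, dims=2, seed=0):
    """Run cluster_fn on random datasets of each size and record wall-clock time."""
    rng = random.Random(seed)
    timings = {}
    for m in sizes:
        data = [[rng.random() for _ in range(dims)] for _ in range(m)]
        start = time.perf_counter()
        cluster_fn(data)
        timings[m] = time.perf_counter() - start
    return timings


def nearest_of_two(data):
    """Placeholder 'algorithm': label each point by the nearer of two fixed centres."""
    dims = len(data[0])
    centres = ([0.25] * dims, [0.75] * dims)
    return [
        min((0, 1), key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centres[c])))
        for x in data
    ]


# Time the placeholder on two dataset sizes; real use would pass an actual
# clustering function and larger sizes.
timings = time_clustering(nearest_of_two, sizes=[100, 1000])
```

Plotting `timings` against dataset size gives a rough empirical view of how an algorithm scales, which complements the theoretical complexity.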
Comparative Analysis of Current Clustering Techniques
• Partitioning Clustering Techniques
1. K-means and variant partitioning techniques
Example: K-MCI algorithm
2. Other partitioning techniques
Example: Cuckoo search
• Hierarchical Clustering Techniques
Examples: ACA-DTRS, FACA-DTRS
• Density-Based Clustering Techniques
Examples: DMM clustering algorithm, DBCURE algorithm
• Generic Clustering Techniques
Example: BIRCH algorithm
Proposed System:
• Among partitioning clustering techniques, K-means has been in use for many years.
• Nowadays, ELM K-means or ELM FCM is the best suited among these methods, as it finds better-quality clusters in less computation time.
• ELM feature mapping is easy to implement and works well for big datasets.
• Fast learning speed.
• Ease of implementation.
• Minimal human intervention.
• ELM tends to have better scalability.
Extreme Learning Machine
ELM Feature Mapping Process
$$h(x) = [h_1(x), \ldots, h_i(x), \ldots, h_L(x)]^T = [G(a_1, b_1, x), \ldots, G(a_i, b_i, x), \ldots, G(a_L, b_L, x)]^T$$
Where,
1. G(a_i, b_i, x) is the output of the i-th hidden node.
2. a_i is a d-dimensional weight vector between the d input nodes and the i-th hidden node.
3. b_i is the bias of the i-th hidden node.
• ELM maps the data into the L-dimensional ELM feature space H, where L is the number of hidden nodes used in the feature mapping process.
Fig.: ELM Feature Mapping Process
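The feature mapping can be sketched in a few lines of Python. This is an illustration under assumptions, not the presenter's implementation: the slides do not say which function G is used, so a sigmoid of the weighted sum a_i · x + b_i is assumed here, and the hidden-node parameters are drawn uniformly at random from [-1, 1]. The function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_hidden_nodes(d, L, seed=0):
    """Draw random weights a_i (d values each) and biases b_i for L hidden nodes.

    Assumption: uniform sampling in [-1, 1]; the slides do not fix a distribution.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(L)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(L)]
    return weights, biases


def elm_feature_map(x, weights, biases):
    """Return h(x): one value G(a_i, b_i, x) per hidden node.

    Assumption: G is a sigmoid applied to a_i . x + b_i.
    """
    return [
        sigmoid(sum(a_ij * x_j for a_ij, x_j in zip(a_i, x)) + b_i)
        for a_i, b_i in zip(weights, biases)
    ]


# Map one 3-dimensional observation into a 5-dimensional ELM feature space.
weights, biases = make_hidden_nodes(d=3, L=5)
h = elm_feature_map([0.5, -1.2, 3.0], weights, biases)
```

The length of `h` equals L, the number of hidden nodes, regardless of the input dimension d.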
• The K-means clustering problem can be described as follows:
• Given a set of observations (x_1, x_2, …, x_m), where each observation is a d-dimensional real vector,
• K-means clustering aims to partition the m observations into k sets S = {S_1, S_2, …, S_k}
• so as to minimize the within-cluster sum of squares (WCSS):
$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$
Where,
μ_i is the mean of the points in S_i.
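As a worked example of the objective, the WCSS of a given partition can be computed directly from its definition. The sketch below uses made-up 2-D points; the function names are hypothetical.

```python
def centroid(points):
    """Mean of a list of equal-length points (mu_i for one set S_i)."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def wcss(clusters):
    """Within-cluster sum of squares: squared distance of each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        mu = centroid(points)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, mu)) for p in points)
    return total


# Hypothetical toy data: a natural partition and a deliberately poor one
# of the same four points.
good_partition = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0), (5.0, 7.0)]]
poor_partition = [[(0.0, 0.0), (5.0, 5.0)], [(2.0, 0.0), (5.0, 7.0)]]

good_wcss = wcss(good_partition)  # 4.0: every point is at squared distance 1 from its mean
poor_wcss = wcss(poor_partition)  # larger, because distant points share a cluster
```

K-means searches for the partition with the smallest such value, so it would prefer `good_partition` over `poor_partition` here.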
ELM K-means algorithm
Input: k: the number of clusters,
L: the number of hidden-layer nodes,
D: a data set containing m objects.
Output: A set of k clusters.
Method:
1: Map the original data objects in D into the ELM feature space H using h(x) = [h_1(x), …, h_i(x), …, h_L(x)]^T;
2: Arbitrarily choose k objects from H as the initial cluster centres;
3: repeat
4: (Re)assign each object to the cluster to which it is most similar, based on the mean value of the objects in the cluster;
5: Update the cluster means, i.e., calculate the mean value of the objects in each cluster;
6: until the cluster centres no longer change or the maximum number of iterations is reached.
7: return a set of k clusters.
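The steps above can be sketched in plain Python as follows. Several details are assumptions rather than part of the slides: a sigmoid is used as the hidden-node function, hidden-node parameters are sampled uniformly from [-1, 1], similarity is measured by squared Euclidean distance in the feature space, and an empty cluster simply keeps its previous centre. The function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function (assumed hidden-node activation)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def elm_map(data, L, rng):
    """Step 1: map every object in D into the L-dimensional ELM feature space H."""
    d = len(data[0])
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(L)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(L)]
    return [
        [sigmoid(sum(a * v for a, v in zip(a_i, x)) + b_i)
         for a_i, b_i in zip(weights, biases)]
        for x in data
    ]


def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def elm_kmeans(data, k, L, max_iter=100, seed=0):
    """Return one cluster label (0..k-1) per object in data."""
    rng = random.Random(seed)
    H = elm_map(data, L, rng)                         # step 1
    centres = [list(h) for h in rng.sample(H, k)]     # step 2
    labels = None
    for _ in range(max_iter):                         # steps 3 and 6
        # Step 4: assign each object to its nearest centre in feature space.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(h, centres[c])) for h in H
        ]
        # Unchanged assignments imply unchanged centres, so stop here.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 5: recompute each cluster mean; an empty cluster keeps its centre.
        for c in range(k):
            members = [h for h, lab in zip(H, labels) if lab == c]
            if members:
                centres[c] = mean(members)
    return labels                                     # step 7


# Hypothetical toy data: six 2-D points.
data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
labels = elm_kmeans(data, k=2, L=10)
```

Like ordinary K-means, the result depends on the random initial centres and on the random hidden-node parameters, so fixing `seed` makes a run repeatable but does not guarantee the best partition.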
Advantages:
• ELM features are easy to implement, and ELM K-means produces better results than Mercer kernel-based methods.
• The mapping is very intuitive and straightforward.
Disadvantages:
• The number of hidden nodes should be greater than 300; otherwise performance is not optimal.
• After studying these techniques, it is observed that new methodologies are still required for analyzing big data, as these techniques are not efficient enough for analyzing real-time and online streaming data.
Conclusion:
We have studied various clustering techniques that are currently used for analyzing big data. These recent techniques are compared on the basis of execution time and cluster quality, and their merits and demerits are provided.
Current clustering techniques

More Related Content

What's hot

Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithmhadifar
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methodsKrish_ver2
 
K-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsK-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsVarad Meru
 
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...ijscmc
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysisguru_prasadg
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmEditor IJCATR
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysisguest0edcaf
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewVahid Mirjalili
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysisDataminingTools Inc
 
Document clustering and classification
Document clustering and classification Document clustering and classification
Document clustering and classification Mahmoud Alfarra
 
3.6 constraint based cluster analysis
3.6 constraint based cluster analysis3.6 constraint based cluster analysis
3.6 constraint based cluster analysisKrish_ver2
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduceVarad Meru
 

What's hot (20)

Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithm
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Machine learning clustering
Machine learning clusteringMachine learning clustering
Machine learning clustering
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
K-Means, its Variants and its Applications
K-Means, its Variants and its ApplicationsK-Means, its Variants and its Applications
K-Means, its Variants and its Applications
 
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysis
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm
 
Lect4
Lect4Lect4
Lect4
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
Dataa miining
Dataa miiningDataa miining
Dataa miining
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overview
 
Data Mining: clustering and analysis
Data Mining: clustering and analysisData Mining: clustering and analysis
Data Mining: clustering and analysis
 
Data clustering
Data clustering Data clustering
Data clustering
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
 
Document clustering and classification
Document clustering and classification Document clustering and classification
Document clustering and classification
 
3.6 constraint based cluster analysis
3.6 constraint based cluster analysis3.6 constraint based cluster analysis
3.6 constraint based cluster analysis
 
Clusters techniques
Clusters techniquesClusters techniques
Clusters techniques
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduce
 

Viewers also liked

DATA MINING:Clustering Types
DATA MINING:Clustering TypesDATA MINING:Clustering Types
DATA MINING:Clustering TypesAshwin Shenoy M
 
Clustering: Large Databases in data mining
Clustering: Large Databases in data miningClustering: Large Databases in data mining
Clustering: Large Databases in data miningZHAO Sam
 
Towards modeling M&A in high tech industries
Towards modeling M&A in high tech industriesTowards modeling M&A in high tech industries
Towards modeling M&A in high tech industriesGene Moo Lee
 
A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...ijcses
 
Survey on load balancing and data skew mitigation in mapreduce applications
Survey on load balancing and data skew mitigation in mapreduce applicationsSurvey on load balancing and data skew mitigation in mapreduce applications
Survey on load balancing and data skew mitigation in mapreduce applicationsIAEME Publication
 
Hybrid recommender systems
Hybrid recommender systemsHybrid recommender systems
Hybrid recommender systemsrenataghisloti
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithmijsrd.com
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsAntonio Severien
 
Machine Learning and Data Mining: 08 Clustering: Hierarchical
Machine Learning and Data Mining: 08 Clustering: Hierarchical Machine Learning and Data Mining: 08 Clustering: Hierarchical
Machine Learning and Data Mining: 08 Clustering: Hierarchical Pier Luca Lanzi
 
Linear regression on 1 terabytes of data? Some crazy observations and actions
Linear regression on 1 terabytes of data? Some crazy observations and actionsLinear regression on 1 terabytes of data? Some crazy observations and actions
Linear regression on 1 terabytes of data? Some crazy observations and actionsHesen Peng
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceMahantesh Angadi
 
REVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining TechniquesREVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining TechniquesEditor IJMTER
 
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKMACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKAbhi Jit
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionDataWorks Summit/Hadoop Summit
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RDatabricks
 
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungScalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungSpark Summit
 
Association rule mining
Association rule miningAssociation rule mining
Association rule miningAcad
 

Viewers also liked (20)

DATA MINING:Clustering Types
DATA MINING:Clustering TypesDATA MINING:Clustering Types
DATA MINING:Clustering Types
 
Clustering: Large Databases in data mining
Clustering: Large Databases in data miningClustering: Large Databases in data mining
Clustering: Large Databases in data mining
 
Distributed Deep Learning on Hadoop Clusters
Distributed Deep Learning on Hadoop ClustersDistributed Deep Learning on Hadoop Clusters
Distributed Deep Learning on Hadoop Clusters
 
Towards modeling M&A in high tech industries
Towards modeling M&A in high tech industriesTowards modeling M&A in high tech industries
Towards modeling M&A in high tech industries
 
A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...A comparative survey based on processing network traffic data using hadoop pi...
A comparative survey based on processing network traffic data using hadoop pi...
 
Survey on load balancing and data skew mitigation in mapreduce applications
Survey on load balancing and data skew mitigation in mapreduce applicationsSurvey on load balancing and data skew mitigation in mapreduce applications
Survey on load balancing and data skew mitigation in mapreduce applications
 
Hybrid recommender systems
Hybrid recommender systemsHybrid recommender systems
Hybrid recommender systems
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithm
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data Streams
 
Machine Learning and Data Mining: 08 Clustering: Hierarchical
Machine Learning and Data Mining: 08 Clustering: Hierarchical Machine Learning and Data Mining: 08 Clustering: Hierarchical
Machine Learning and Data Mining: 08 Clustering: Hierarchical
 
Linear regression on 1 terabytes of data? Some crazy observations and actions
Linear regression on 1 terabytes of data? Some crazy observations and actionsLinear regression on 1 terabytes of data? Some crazy observations and actions
Linear regression on 1 terabytes of data? Some crazy observations and actions
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
 
REVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining TechniquesREVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining Techniques
 
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKMACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Clustering: A Survey
Clustering: A SurveyClustering: A Survey
Clustering: A Survey
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungScalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 

Similar to Current clustering techniques

Review of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering AlgorithmReview of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering AlgorithmIRJET Journal
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
 
Document clustering for forensic analysis an approach for improving compute...
Document clustering for forensic   analysis an approach for improving compute...Document clustering for forensic   analysis an approach for improving compute...
Document clustering for forensic analysis an approach for improving compute...Madan Golla
 
Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesSơn Còm Nhom
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...Paolo Missier
 
chalenges and apportunity of deep learning for big data analysis f
 chalenges and apportunity of deep learning for big data analysis f chalenges and apportunity of deep learning for big data analysis f
chalenges and apportunity of deep learning for big data analysis fmaru kindeneh
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
 
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...KamleshKumar394
 
DSA 1- Introduction.pdf
DSA 1- Introduction.pdfDSA 1- Introduction.pdf
DSA 1- Introduction.pdfAliyanAbbas1
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETScsandit
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Experfy
 
Implementation of Improved Apriori Algorithm on Large Dataset using Hadoop
Implementation of Improved Apriori Algorithm on Large Dataset using HadoopImplementation of Improved Apriori Algorithm on Large Dataset using Hadoop
Implementation of Improved Apriori Algorithm on Large Dataset using HadoopBRNSSPublicationHubI
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxVenkateswaraBabuRavi
 
84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1bPRAWEEN KUMAR
 
EE-232-LEC-01 Data_structures.pptx
EE-232-LEC-01 Data_structures.pptxEE-232-LEC-01 Data_structures.pptx
EE-232-LEC-01 Data_structures.pptxiamultapromax
 
An Iterative Improved k-means Clustering
An Iterative Improved k-means ClusteringAn Iterative Improved k-means Clustering
An Iterative Improved k-means ClusteringIDES Editor
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 

Similar to Current clustering techniques (20)

Review of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering AlgorithmReview of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering Algorithm
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithm
 
Document clustering for forensic analysis an approach for improving compute...
Document clustering for forensic   analysis an approach for improving compute...Document clustering for forensic   analysis an approach for improving compute...
Document clustering for forensic analysis an approach for improving compute...
 
Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and Techniques
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
 
chalenges and apportunity of deep learning for big data analysis f
 chalenges and apportunity of deep learning for big data analysis f chalenges and apportunity of deep learning for big data analysis f
chalenges and apportunity of deep learning for big data analysis f
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithm
 
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
 
DSA 1- Introduction.pdf
DSA 1- Introduction.pdfDSA 1- Introduction.pdf
DSA 1- Introduction.pdf
 
Chapter 5.pdf
Chapter 5.pdfChapter 5.pdf
Chapter 5.pdf
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
 
5th sem
5th sem5th sem
5th sem
 
5th sem
5th sem5th sem
5th sem
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
 
Implementation of Improved Apriori Algorithm on Large Dataset using Hadoop
Implementation of Improved Apriori Algorithm on Large Dataset using HadoopImplementation of Improved Apriori Algorithm on Large Dataset using Hadoop
Implementation of Improved Apriori Algorithm on Large Dataset using Hadoop
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptx
 
84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b84cc04ff77007e457df6aa2b814d2346bf1b
84cc04ff77007e457df6aa2b814d2346bf1b
 
EE-232-LEC-01 Data_structures.pptx
EE-232-LEC-01 Data_structures.pptxEE-232-LEC-01 Data_structures.pptx
EE-232-LEC-01 Data_structures.pptx
 
An Iterative Improved k-means Clustering
An Iterative Improved k-means ClusteringAn Iterative Improved k-means Clustering
An Iterative Improved k-means Clustering
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 

Recently uploaded

Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
Current clustering techniques

  • 1. A Survey of Clustering Techniques for Big Data Analysis. Guided by: Prof. Prashant G. Ahire. Presented by: Miss Poonam Kshirsagar, Roll No. 204.
  • 2. Agenda: Problem Definition; Objectives; Literature Survey; Big Data and Its Analytics Challenges; Cluster; Criteria to Benchmark Clustering Methods; Proposed System; ELM; ELM Feature Mapping Process; ELM K-means Algorithm; Advantages; Disadvantages; Conclusion.
  • 3. Problem Definition: Among the various challenges in analyzing big data, a major issue is designing and developing new techniques for clustering. Cloud computing can be used for big data analysis, but analyzing data in a cloud environment is problematic because many traditional algorithms cannot be applied there directly. Traditional algorithms also raise issues of scalability, delay in producing results, and accuracy of the results produced. These issues can be addressed by clustering techniques.
  • 4. Objectives: The objectives of the thesis are as follows: (1) to study the existing clustering techniques for analyzing big data; (2) to propose and design an efficient clustering technique for big data analysis.
  • 5. Literature Survey:
    Topic: A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis
      Keywords: Clustering algorithms, unsupervised learning, big data
      Abstract: We highlighted the set of clustering algorithms that are the best performing for big data.
      Authors: Adil Fahad, Najlaa Alshatri, Zahir Tari (Member, IEEE)
    Topic: Clustering in extreme learning machine feature space
      Keywords: ELM means
      Abstract: Given the good properties of the ELM feature mapping, the clustering problem using ELM feature mapping techniques is studied in this paper.
      Authors: Qing He, Xin Jin, Changying Du, Fuzhen Zhuang
    Topic: A Hybrid Approach for Efficient Clustering of Big Data
      Keywords: Big data, basic K-Means algorithm using MapReduce, basic DBSCAN algorithm using MapReduce
      Abstract: This presents a theoretical overview of some of the current clustering techniques used for analyzing big data.
      Authors: Saurabh Arora, Department of Computer Science and Engineering, Thapar University, Patiala, India
    Topic: A Survey of Clustering Techniques for Big Data Analysis
      Keywords: Big data, clustering techniques, data mining
      Abstract: In this paper we have discussed some of the current big data mining clustering techniques.
      Authors: Saurabh Arora, Inderveer, Dept. of CS
  • 6. What is big data? Big data means data that is too big, too fast, or too hard for existing tools to process. Too big: petabyte-scale collections of data. Too fast: the data must be processed quickly. Too hard: a catch-all for data that does not fit neatly into existing processing tools.
  • 7. Fig.: Evolution of big data (continued)
  • 8. Fig.: Significant growth of big data (continued)
  • 9. Big Data Analytics Challenges: The main challenges for big data analytics are listed below: (1) The volume of data is large and also varies, so the challenge is how to deal with it. (2) Whether analysis of all the data is required. (3) Whether all the data needs to be stored. (4) Which data points are important, and how to find them. (5) How the data can be used in the best way.
  • 10. What is a cluster? Clustering is the division of data into groups of similar objects. Each group is called a cluster. A cluster consists of objects that are similar to one another and dissimilar to objects of other groups. Clustering is one of the major techniques used in data mining.
  • 11. Criteria to Benchmark Clustering Methods: Volume: refers to the large amount of data. Criteria: (i) size of the dataset, (ii) handling high dimensionality, (iii) handling outliers/noisy data. Velocity: refers to the speed of processing data. Criteria: (i) complexity of the algorithm, (ii) run-time performance. Variety: refers to the ability to handle different types of data. Criteria: (i) type of dataset, (ii) cluster shape.
  • 12. Comparative Analysis of Current Clustering Techniques. Partition clustering techniques: (1) K-means and variant partitioning techniques, example: K-MCI algorithm; (2) other partitioning techniques, example: Cuckoo search. Hierarchical clustering techniques, examples: ACA-DTRS, FACA-DTRS.
  • 13. Density-based clustering techniques, examples: DMM clustering algorithm, DBCURE algorithm. Generic clustering techniques, example: BIRCH algorithm.
  • 14. Proposed System: Among the partitioning clustering techniques, K-means has been in use for many years. Nowadays, however, ELM K-means or ELM FCM is best suited among all methods, as it finds the best-quality clusters in less computation time. The ELM feature mapping is easy to implement and works well for big datasets.
  • 15. Extreme Learning Machine: Fast learning speed. Ease of implementation. Minimal human intervention. ELM tends to have better scalability.
  • 16. ELM Feature Mapping Process: $h(x) = [h_1(x), \ldots, h_i(x), \ldots, h_L(x)]^T = [G(a_1, b_1, x), \ldots, G(a_i, b_i, x), \ldots, G(a_L, b_L, x)]^T$, where: 1. $G(a_i, b_i, x)$ is the output of the i-th hidden node; 2. $a_i$ is a d-dimensional weight vector between the d input nodes and the i-th hidden node; 3. $b_i$ is the bias of the i-th hidden node. ELM maps the data into the L-dimensional ELM feature space H, where L is the number of hidden nodes used in the feature mapping process.
  • 17. Fig.: ELM feature mapping process (continued)
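The feature mapping on slide 16 can be illustrated with a small sketch. The slide does not say which activation G is used or how a_i and b_i are drawn, so the sigmoid activation, the uniform range [-1, 1], and the helper name `make_elm_mapping` are all assumptions made here for illustration only.

```python
import math
import random


def make_elm_mapping(d, L, seed=0):
    """Return a random feature map h: R^d -> R^L (hypothetical helper).

    Assumptions not stated on the slide: sigmoid hidden nodes, with weights
    and biases drawn uniformly from [-1, 1] and then kept fixed.
    """
    rng = random.Random(seed)
    # a[i] is the d-dimensional weight vector of hidden node i; b[i] is its bias.
    a = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(L)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(L)]

    def h(x):
        # G(a_i, b_i, x) = sigmoid(a_i . x + b_i), one output per hidden node.
        out = []
        for i in range(L):
            z = sum(w * xj for w, xj in zip(a[i], x)) + b[i]
            out.append(1.0 / (1.0 + math.exp(-z)))
        return out

    return h


# Map one 3-dimensional point into a 5-dimensional ELM feature space.
h = make_elm_mapping(d=3, L=5)
features = h([0.2, -1.0, 0.5])
```

The hidden-node parameters are random and never trained, which is what makes the mapping cheap to compute; very large input values could overflow `math.exp` in this simple version.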
  • 18. The K-means clustering problem can be described as follows: given a set of observations $(x_1, x_2, \ldots, x_m)$, where each observation is a d-dimensional real vector, K-means clustering aims to partition the m observations into k sets $S = \{S_1, S_2, \ldots, S_k\}$ so as to minimize the within-cluster sum of squares (WCSS): $\arg\min_{S} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$, where $\mu_i$ is the mean of the points in $S_i$. (continued)
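To make the within-cluster sum of squares concrete, the following sketch evaluates it for two candidate partitions of the same four points. The function names and the toy data are illustrative assumptions, not taken from the slides.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def wcss(clusters):
    """Sum over clusters of squared Euclidean distances to the cluster mean."""
    total = 0.0
    for S in clusters:
        mu = mean(S)
        for x in S:
            total += sum((xj - mj) ** 2 for xj, mj in zip(x, mu))
    return total


# The same four points, grouped two different ways.
good = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
bad = [[[0.0, 0.0], [10.0, 0.0]], [[0.0, 2.0], [10.0, 2.0]]]

print(wcss(good))  # 4.0
print(wcss(bad))   # 100.0
```

K-means prefers the first grouping because its WCSS is lower: each point there lies at squared distance 1 from its cluster mean, against 25 in the second grouping.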
  • 19. ELM k-Means algorithm
    Input: k: the number of clusters; L: the number of hidden-layer nodes; D: a data set containing m objects.
    Output: A set of k clusters.
    Method:
      1: Map the original data objects in D into the ELM feature space H using h(x) = [h_1(x), …, h_i(x), …, h_L(x)]^T;
      2: Arbitrarily choose k objects from H as the initial cluster centres;
      3: repeat
      4:   (Re)assign each object to the cluster to which the object is the most similar, based on the mean value of the objects in the cluster;
      5:   Update the cluster means, i.e., calculate the mean value of the objects for each cluster;
      6: until no change in the cluster centres or the maximal iteration number limit is reached;
      7: return a set of k clusters.
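The steps of the ELM k-Means algorithm can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid activation, the uniform [-1, 1] range for hidden-node parameters, squared Euclidean distance as the similarity measure, and the names `elm_kmeans` and `sq_dist` are choices made here.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def elm_kmeans(D, k, L, max_iter=100, seed=0):
    """Cluster the objects in D into k groups in an L-dimensional ELM space.

    Returns one cluster label (0..k-1) per object in D.
    """
    rng = random.Random(seed)
    d = len(D[0])
    # Step 1: map every object into the ELM feature space H
    # (assumed: sigmoid hidden nodes with random, fixed weights and biases).
    a = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(L)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(L)]

    def h(x):
        return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(a[i], x)) + b[i])))
                for i in range(L)]

    H = [h(x) for x in D]
    # Step 2: arbitrarily choose k mapped objects as the initial centres.
    centres = [list(c) for c in rng.sample(H, k)]
    labels = [0] * len(H)
    for _ in range(max_iter):  # Steps 3 and 6: repeat up to the iteration limit
        # Step 4: (re)assign each object to its nearest cluster mean.
        labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in H]
        # Step 5: recompute each cluster mean; keep the old centre if a cluster is empty.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(H, labels) if lab == c]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[c])
        if new_centres == centres:  # Step 6: stop when the centres no longer change
            break
        centres = new_centres
    return labels  # Step 7: cluster membership of every object


# Toy data set: six 2-dimensional objects, two of them repeated.
D = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.0], [5.0, 5.0], [5.1, 5.0], [5.0, 5.0]]
labels = elm_kmeans(D, k=2, L=20)
```

The only difference from ordinary K-means is step 1: distances and means are computed on the mapped vectors in H rather than on the original objects in D.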
  • 20. Advantages: ELM features are easy to implement, and ELM K-means produces better results than Mercer kernel-based methods. The mapping is very intuitive and straightforward.
  • 21. Disadvantages: The number of nodes should be greater than 300; otherwise performance is not optimal. After studying these techniques, it is observed that new methodologies are still required for analyzing big data, as these techniques are not so efficient for analyzing real-time and online streaming data.
  • 22. Conclusion: We have studied various clustering techniques that are currently used for analyzing big data. All these recent techniques are compared on the basis of execution time and cluster quality, and their merits and demerits are provided.