Successfully reported this slideshow.
Upcoming SlideShare
×

of

34

Share

# Data Mining: clustering and analysis

Data Mining: clustering and analysis

See all

See all

### Data Mining: clustering and analysis

1. 1. Clustering and Analysis in Data Mining<br />
2. 2. What is Clustering?<br />The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering.<br />
3. 3. Why Clustering?<br />Scalability<br />Ability to deal with different types of attributes<br />Discovery of clusters with arbitrary shape<br />Minimal requirements for domain knowledge to determine input parameters<br />Ability to deal with noisy data<br />Incremental clustering and insensitivity to the order of input records:<br />High dimensionality<br />Constraint-based clustering<br />Interpretability and usability<br />
4. 4.  Data types in Cluster Analysis<br />Data matrix (or object-by-variable structure)<br />Interval-Scaled Variables<br />Binary Variables<br />A categorical variable<br />A discrete ordinal variable<br />A ratio-scaled variable<br />
5. 5. Methods used in clustering:<br />Partitioning method.<br />Hierarchical method.<br />Data Density based method.<br />Grid based method.<br />Model Based method.<br />
6. 6. Hierarchical methods in clustering<br /> There are two types of hierarchical clustering methods:<br />Agglomerative hierarchical clustering<br />Divisive hierarchical clustering<br />
7. 7. Agglomerative hierarchical clustering<br />This bottom-up strategy starts by placing each object in its own cluster and then merges these atomic clusters into larger and larger clusters, until all of the objects are in a single cluster or until certain termination conditions are satisfied.<br />
8. 8. Divisive hierarchical clustering<br />This top-down strategy does the reverse of agglomerative hierarchical clustering by starting with all objects in one cluster. It subdivides the cluster into smaller and smaller pieces, until each object forms a cluster on its own or until it satisfies certain termination conditions, such as a desired number of clusters is obtained or the diameter of each cluster is within a certain threshold.<br />
9. 9. Density-Based methods in clustering<br />DBSCAN: A Density-Based Clustering Method Based on Connected Regions withSufficiently High Density<br />OPTICS: Ordering Points to Identify the Clustering Structure<br />DENCLUE: Clustering Based on Density Distribution Functions<br />
10. 10. Grid-Based methods in clustering<br />STING: Statistical information gridSTING is a grid-based multi resolution clustering technique in which the spatial area is divided into rectangular cells.<br />Wave Cluster: Clustering Using Wavelet TransformationWave Cluster is a multi resolution clustering algorithm that first summarizes the data by imposing a multidimensional grid structure onto the data space. It then uses a wavelet transformation to transform the original feature space, finding dense regions in the transformed space<br />
11. 11. Model-Based Clustering Methods<br />Expectation-Maximization<br />Conceptual Clustering<br />Neural Network Approach<br />
12. 12. Methods of Clustering High-Dimensional Data<br />CLIQUE: A Dimension-Growth Subspace Clustering MethodCLIQUE (CLustering In QUEst) was the first algorithm proposed for dimension-growth subspace clustering in high-dimensional space.<br />PROCLUS: A Dimension-Reduction Subspace Clustering MethodPROCLUS (PROjected CLUStering) is a typical dimension-reduction subspace clustering method. That is, instead of starting from single-dimensional spaces, it starts by finding an initial approximation of the clusters in the high-dimensional attribute space. Each dimension is then assigned a weight for each cluster, and the updated weights are used in the next iteration to regenerate the clusters.<br />
13. 13. Constraint-Based Cluster Analysis<br /> Constraint-based clustering finds clusters that satisfy user-specified preferences or constraints, few categories of constraints are :<br />Constraints on individual objects<br />Constraints on the selection of clustering parameters<br />Constraints on distance or similarity functions<br />User-specified constraints on the properties of individual clusters<br />Semi-supervised clustering based on “partial” supervision<br />
14. 14. Visit more self help tutorials<br />Pick a tutorial of your choice and browse through it at your own pace.<br />The tutorials section is free, self-guiding and will not involve any additional support.<br />Visit us at www.dataminingtools.net<br />
• #### PoojaPatil341

Apr. 29, 2020
• #### ajaysrivastava002

Apr. 14, 2020
• #### imtiazashraf29

Jan. 24, 2020
• #### ShaikAlthaf16

Dec. 16, 2019
• #### Sirwans12345

Nov. 18, 2019

Nov. 9, 2019

Apr. 8, 2019
• #### ShaikFardhinvali

Feb. 26, 2019

Jun. 5, 2018
• #### nommanshaik

Apr. 22, 2018

Apr. 3, 2018
• #### GayatriAkula1

Dec. 24, 2017
• #### usharay1

Dec. 20, 2017

Dec. 8, 2017
• #### pavithrachandrasekar3

Nov. 24, 2017
• #### SamiullahMosafar

Nov. 23, 2017
• #### KurellaManikanta

Sep. 20, 2017
• #### zendu1

Aug. 23, 2017

Jun. 8, 2017
• #### EashanDeshmukh

Apr. 30, 2017

Data Mining: clustering and analysis

Total views

24,155

On Slideshare

0

From embeds

0

Number of embeds

125

0

Shares

0