Db Scan
Data Mining
Presented By: Sunawar Khan
Reg No: 813-MSCS-F14
Clustering
• Clustering is a process of partitioning a set of
data (objects) into a set of meaningful subclasses,
called clusters.
• A cluster is a collection of objects that are
similar to each other.
• Unsupervised classification (no predefined
classes).
Example
Clustering Algorithms
• Clustering algorithms are attractive for the task of class identification.
1. Partitioning Methods
2. Hierarchical Methods
3. Density Based Methods
4. Grid Based Methods
5. Model Based Methods
Density Based Methods
• Based on the notion of density
• A density-based clustering algorithm grows
regions with sufficiently high density into clusters.
• The idea is to keep growing a given cluster as
long as the density (number of data points) in its
neighborhood exceeds some threshold; that is, the
neighborhood of a given radius has to contain at
least a minimum number of objects.
• Discovers clusters of arbitrary shape
• Handles noise
Density Based Methods
• Clustering based on density (local cluster criterion), such as
density-connected points
• Major features:
– Discover clusters of arbitrary shape
– Handle noise
– One scan
– Need density parameters as termination condition
• Several interesting studies:
– DBSCAN: Ester et al. (KDD’96)
– OPTICS: Ankerst et al. (SIGMOD’99)
– DENCLUE: Hinneburg & Keim (KDD’98)
– CLIQUE: Agrawal et al. (SIGMOD’98) (more grid-based)
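Density-Based Notion of Clusters
• Def:1 (Eps-neighborhood of a point)
• The Eps-neighborhood of a point p, denoted by
$N_{Eps}(p)$, is defined as:
$N_{Eps}(p) = \{q \in D \mid \mathrm{dist}(p,q) \le Eps\}$
• A naive approach could require, for each point in a
cluster, that there be at least a minimum number
(MinPts) of points in the Eps-neighborhood of that
point. This fails in general, because a cluster contains
two kinds of points: core points inside the cluster and
border points on its border, and the Eps-neighborhood
of a border point usually contains significantly fewer
points than that of a core point.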
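As an illustration, Def:1 and the naive MinPts test can be written directly in Python. This is a minimal sketch, assuming points are tuples of floats and dist is the Euclidean distance (`math.dist`); the helper names `region_query` and `is_core` are hypothetical, chosen to echo the regionQuery function mentioned later, and are not taken from the original implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def region_query(points: Sequence[Point], p: int, eps: float) -> List[int]:
    """Indices of all points q with dist(p, q) <= eps, i.e. N_Eps(p).

    p itself is included, since dist(p, p) = 0.
    """
    return [q for q, other in enumerate(points)
            if math.dist(points[p], other) <= eps]


def is_core(points: Sequence[Point], p: int, eps: float, min_pts: int) -> bool:
    """Naive density test: N_Eps(p) must hold at least MinPts points."""
    return len(region_query(points, p, eps)) >= min_pts
```

With `pts = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (3.0, 3.0)]` and Eps = 1.0, `region_query(pts, 0, 1.0)` returns `[0, 1, 2]`; point 0 passes the test for MinPts = 3, while the isolated point 3 does not.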
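Def:2 (directly density-reachable)
• A point p is directly density-reachable from a point q
wrt. Eps and MinPts if
• 1) $p \in N_{Eps}(q)$ and
• 2) $|N_{Eps}(q)| \ge MinPts$ (core point condition).
Def:3 (density-reachable)
• A point p is density-reachable from a point q wrt. Eps
and MinPts if there is a chain of points $p_1, \ldots, p_n$
with $p_1 = q$ and $p_n = p$ such that $p_{i+1}$ is directly
density-reachable from $p_i$.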
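The two conditions of Def:2 can be checked as follows. This is an illustrative sketch, assuming Euclidean distance over tuples of floats; the function name `directly_density_reachable` is hypothetical. It also shows that direct density-reachability is not symmetric when one of the two points is a border point.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def directly_density_reachable(points: Sequence[Point], p: int, q: int,
                               eps: float, min_pts: int) -> bool:
    """True if point p is directly density-reachable from point q (Def:2)."""
    # Condition 1: p lies in the Eps-neighborhood of q.
    in_neighborhood = math.dist(points[p], points[q]) <= eps
    # Condition 2 (core point condition): |N_Eps(q)| >= MinPts.
    neighborhood_size = sum(1 for o in points if math.dist(points[q], o) <= eps)
    return in_neighborhood and neighborhood_size >= min_pts
```

For `pts = [(0.0, 0.0), (0.4, 0.0), (-0.4, 0.0), (0.0, 0.4), (0.9, 0.0)]` with Eps = 1.0 and MinPts = 5, point 0 is a core point (5 neighbors) and point 4 is a border point (4 neighbors), so point 4 is directly density-reachable from point 0 but not the other way round.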
• Def:4 (density-connected)
• A point p is density-connected to a point q wrt. Eps and
MinPts if there is a point o such that both p and q are
density-reachable from o wrt. Eps and MinPts.
Density-connectivity is a symmetric relation. Now we are able
to define our density-based notion of a cluster: a cluster is
defined as a set of density-connected points that is
maximal wrt. density-reachability. Noise is simply the set
of points in D not belonging to any of its clusters.
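Density-reachability (a chain of directly density-reachable steps, as defined by Ester et al.) and the density-connectivity of Def:4 can be computed with a breadth-first search that only expands core points. This is a minimal, deliberately naive sketch, assuming Euclidean distance; the function names are hypothetical and every neighborhood is found by a linear scan.

```python
import math
from collections import deque
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def neighbors(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """N_Eps(point i) by linear scan."""
    return [j for j, o in enumerate(points) if math.dist(points[i], o) <= eps]


def density_reachable_from(points: Sequence[Point], q: int,
                           eps: float, min_pts: int) -> Set[int]:
    """All points density-reachable from q (q itself included).

    A chain can only be extended from a core point, because direct
    density-reachability requires the core point condition.
    """
    reached = {q}
    frontier = deque([q])
    while frontier:
        o = frontier.popleft()
        nbrs = neighbors(points, o, eps)
        if len(nbrs) < min_pts:  # o is not a core point: the chain stops here
            continue
        for n in nbrs:
            if n not in reached:
                reached.add(n)
                frontier.append(n)
    return reached


def density_connected(points: Sequence[Point], p: int, q: int,
                      eps: float, min_pts: int) -> bool:
    """True if some point o has both p and q density-reachable from it."""
    for o in range(len(points)):
        reachable = density_reachable_from(points, o, eps, min_pts)
        if p in reachable and q in reachable:
            return True
    return False
```

For five collinear points at x = 0, 1, 2, 3, 4 (plus an isolated point at x = 10) with Eps = 1.0 and MinPts = 3, the two end points are border points: point 4 is density-reachable from point 1 but not vice versa, yet points 0 and 4 are density-connected through point 1, illustrating why Def:4 is needed to obtain a symmetric relation.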
Def:5 (Cluster)
Let D be a database of points. A cluster C wrt. Eps and MinPts is a
non-empty subset of D satisfying the following conditions:
1) $\forall p, q$: if $p \in C$ and q is density-reachable from p wrt. Eps and
MinPts, then $q \in C$. (Maximality)
2) $\forall p, q \in C$: p is density-connected to q wrt. Eps and MinPts.
(Connectivity)
Def:6 (Noise)
Let $C_1, \ldots, C_k$ be the clusters of the database D wrt. parameters $Eps_i$
and $MinPts_i$, $i = 1, \ldots, k$. Then we define the noise as the set of points
in the database D not belonging to any cluster $C_i$, i.e.
$noise = \{p \in D \mid \forall i: p \notin C_i\}$
Lemmas for validating the correctness of
our clustering algorithm
Lemma 1: Let p be a point in D and $|N_{Eps}(p)| \ge MinPts$. Then
the set $O = \{o \mid o \in D \text{ and } o \text{ is density-reachable from } p \text{ wrt. } Eps \text{ and } MinPts\}$
is a cluster wrt. Eps and MinPts.
• It is not obvious that a cluster C wrt. Eps and MinPts is
uniquely determined by any of its core points. However,
each point in C is density-reachable from any of the core
points of C and, therefore, a cluster C contains exactly the
points which are density-reachable from an arbitrary
core point of C.
Lemmas for validating the correctness of our
clustering algorithm
Lemma 2:
• Let C be a cluster wrt. Eps and MinPts and let p be
any point in C with $|N_{Eps}(p)| \ge MinPts$.
• Then C equals the set
$O = \{o \mid o \text{ is density-reachable from } p \text{ wrt. } Eps \text{ and } MinPts\}$.
Algorithm
• Arbitrarily select a point p
• Retrieve all points density-reachable from p w.r.t. Eps and
MinPts
• If p is a core point, a cluster is formed
• If p is a border point, no points are density-reachable from p,
and DBSCAN visits the next point of the database
• Continue the process until all of the points have been
processed
• If a spatial index is used, the computational complexity of
DBSCAN is O(n log n), where n is the number of database
objects. Otherwise, the complexity is O(n²)
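The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' original implementation: it assumes Euclidean distance, answers every region query with a linear scan (so it runs in O(n²) time), and uses hypothetical names (`dbscan`, `region_query`, `NOISE`, `UNVISITED`). A point that fails the core point condition is first marked as noise and is relabelled as a border point if a later cluster expansion reaches it.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1      # label for points that belong to no cluster
UNVISITED = -2  # label for points not yet processed


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """N_Eps(point i), computed by a linear scan (no spatial index)."""
    return [j for j, o in enumerate(points) if math.dist(points[i], o) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id 0, 1, 2, ... or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for p in range(len(points)):
        if labels[p] != UNVISITED:
            continue
        seeds = region_query(points, p, eps)
        if len(seeds) < min_pts:
            # Not a core point: tentatively noise, may later become a border point.
            labels[p] = NOISE
            continue
        # p is a core point: start a new cluster and expand it.
        labels[p] = cluster_id
        queue = deque(seeds)
        while queue:
            q = queue.popleft()
            if labels[q] == NOISE:
                labels[q] = cluster_id  # former noise becomes a border point
                continue
            if labels[q] != UNVISITED:
                continue  # already assigned to a cluster
            labels[q] = cluster_id
            q_neighbors = region_query(points, q, eps)
            if len(q_neighbors) >= min_pts:  # q is also core: keep growing
                queue.extend(q_neighbors)
        cluster_id += 1
    return labels
```

For two unit squares at (0, 0) and (10, 10) plus an outlier at (50, 50), `dbscan(pts, 1.5, 3)` yields `[0, 0, 0, 0, 1, 1, 1, 1, -1]`: the number of clusters is not given in advance, and the outlier is reported as noise.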
Comparisons (DBSCAN vs. CLARANS)
• In the DBSCAN paper, the algorithm is compared with another
clustering algorithm, CLARANS (Clustering Large Applications
based on RANdomized Search).
• CLARANS is an improvement over k-medoid algorithms.
• Compared with k-medoid algorithms, CLARANS works
efficiently for databases with about a thousand objects. When
the database grows larger, CLARANS falls behind because the
algorithm temporarily stores all the objects in main memory,
so its run time increases.
Complexity
• DBSCAN visits each point of the database, possibly multiple
times. For practical considerations, time complexity is mostly
governed by the number of regionQuery invocations. DBSCAN
executes exactly one such query for each point, and if
an indexing structure is used that executes such a
neighborhood query in O(log n), an overall runtime
complexity of O(n log n) is obtained.
• Without the use of an accelerating index structure, the run
time complexity is O(n²). Often the distance matrix of size
(n² − n)/2 is materialized to avoid distance recomputations. This,
however, also needs O(n²) memory, whereas a non-matrix-based
implementation only needs O(n) memory.
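To show how an index reduces the work of regionQuery, here is a sketch of a simple uniform grid with cell side Eps for 2-D points. This is a stand-in chosen for brevity, not the R*-tree used in the original paper: any point within distance Eps of p must lie in p's cell or one of the eight adjacent cells, so a query scans only those nine cells. The class name `GridIndex` is hypothetical, and the speed-up depends on the data: if most points fall into a few cells, a query still degrades toward a linear scan, so this does not guarantee O(log n) queries.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


class GridIndex:
    """Uniform grid over 2-D points with cell side eps (hypothetical helper)."""

    def __init__(self, points: Sequence[Point], eps: float):
        self.points = points
        self.eps = eps
        self.cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
        for i, p in enumerate(points):
            self.cells[self._cell(p)].append(i)

    def _cell(self, p: Point) -> Tuple[int, int]:
        # Integer grid coordinates of the cell containing p.
        return (math.floor(p[0] / self.eps), math.floor(p[1] / self.eps))

    def region_query(self, i: int) -> List[int]:
        """N_Eps(point i), scanning only the 3 x 3 block of cells around it."""
        cx, cy = self._cell(self.points[i])
        result = []
        for dx, dy in product((-1, 0, 1), repeat=2):
            # .get avoids creating empty entries in the defaultdict.
            for j in self.cells.get((cx + dx, cy + dy), []):
                if math.dist(self.points[i], self.points[j]) <= self.eps:
                    result.append(j)
        return sorted(result)
```

The grid needs only O(n) memory, in contrast to the (n² − n)/2 entries of a materialized distance matrix, and returns the same neighborhoods as a brute-force scan.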
Advantages
• DBSCAN does not require one to specify the number of
clusters in the data a priori, as opposed to k-means.
• DBSCAN can find arbitrarily shaped clusters.
• DBSCAN requires just two parameters and is mostly
insensitive to the ordering of the points in the database.
• DBSCAN has a notion of noise and is robust to outliers.
• DBSCAN is designed for use with databases that can
accelerate region queries, e.g. using an R*-tree.
Disadvantages
• DBSCAN is not entirely deterministic: border points that are
reachable from more than one cluster can be part of either
cluster, depending on the order in which the data are processed.
Fortunately, this situation does not arise often and has little
impact on the clustering result: on both core points and noise
points, DBSCAN is deterministic.
• The quality of DBSCAN depends on the distance measure used
in the function regionQuery(P,ε). The most common distance
metric used is Euclidean distance. Especially for high-dimensional
data, this metric can become almost useless because of the
so-called curse of dimensionality, making it difficult to find an
appropriate value for ε. This effect, however, is also present in
any other algorithm based on Euclidean distance.
• DBSCAN cannot cluster data sets well when they have large
differences in densities, since a single MinPts-ε combination
cannot then be chosen appropriately for all clusters.
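The first disadvantage can be made concrete: core points always end up in a well-defined cluster, but a border point may lie within Eps of core points from two different clusters. The sketch below, assuming Euclidean distance and using a hypothetical function name, groups core points into connected components (a result that does not depend on processing order) and reports every non-core point adjacent to more than one component. A DBSCAN run assigns such a point to whichever of those clusters reaches it first.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def ambiguous_border_points(points: Sequence[Point], eps: float,
                            min_pts: int) -> Dict[int, Set[int]]:
    """Map each ambiguous border point to the core clusters that can claim it."""
    n = len(points)
    nbrs: List[List[int]] = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in nbrs]

    # Cluster the core points: cores within eps of each other are linked.
    comp = [-1] * n
    next_id = 0
    for i in range(n):
        if core[i] and comp[i] == -1:
            comp[i] = next_id
            stack = [i]
            while stack:
                c = stack.pop()
                for j in nbrs[c]:
                    if core[j] and comp[j] == -1:
                        comp[j] = next_id
                        stack.append(j)
            next_id += 1

    # A non-core point next to cores of several clusters is ambiguous.
    result: Dict[int, Set[int]] = {}
    for i in range(n):
        if not core[i]:
            clusters = {comp[j] for j in nbrs[i] if core[j]}
            if len(clusters) > 1:
                result[i] = clusters
    return result
```

For two small plus-shaped groups centred at (0, 0) and (2, 0) and a single point at (1, 0), with Eps = 1.0 and MinPts = 4, the point at (1, 0) is a border point of both clusters and is the only ambiguous one.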
Extensions
• Generalized DBSCAN (GDBSCAN) is a generalization by the
same authors to arbitrary "neighborhood" and "dense"
predicates.
• Various extensions to the DBSCAN algorithm have been proposed,
including methods for parallelization, parameter estimation, and
support for uncertain data. The basic idea has been extended to
hierarchical clustering by the OPTICS algorithm.
• HDBSCAN is a hierarchical version of DBSCAN which is also
faster than OPTICS; a flat partition consisting of the
most prominent clusters can be extracted from its hierarchy.
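For parameter estimation, one simple heuristic described in the original DBSCAN paper is the sorted k-dist graph: compute each point's distance to its k-th nearest neighbor, sort these values in descending order, and choose Eps near the first pronounced "valley" (knee) of the curve, with MinPts set to k. The sketch below is a naive O(n² log n) version, assuming Euclidean distance and 1 <= k < n; the function name is hypothetical. On data with very different densities the curve tends to have no single clear knee, which is another way to see why one global Eps is hard to choose.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sorted_k_distances(points: Sequence[Point], k: int) -> List[float]:
    """k-dist of every point (distance to its k-th nearest other point),
    sorted in descending order as in a k-dist graph. Assumes 1 <= k < n."""
    kdist = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kdist.append(dists[k - 1])
    return sorted(kdist, reverse=True)
```

For points at x = 0, 1, 2, 3 and an outlier at x = 10, `sorted_k_distances(pts, 1)` gives `[7.0, 1.0, 1.0, 1.0, 1.0]`: the outlier stands out at the left of the graph, and a threshold of about 1.0 separates it from the dense points.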
