Outlier Analysis
- Outlier: a data object that is grossly different from, or inconsistent with, the remaining set of data
- Causes
  - Measurement or execution errors
  - Inherent data variability
- Outliers may themselves be valuable patterns
  - Fraud detection
  - Customized marketing
  - Medical analysis

Outlier Mining
- Given n data points and k, the expected number of outliers, find the top k objects that are most dissimilar from the rest
- Define which data are inconsistent
  - Example: residuals in regression
  - Difficulties: multi-dimensional data, non-numeric data
- Mine the outliers
  - Visualization-based methods
    - Not applicable to cyclic plots, high-dimensional data, or categorical data
- Approaches
  - Statistical approach
  - Distance-based approach
  - Density-based outlier approach
  - Deviation-based approach

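The "residuals in regression" idea can be made concrete with a small sketch. It assumes a single numeric predictor, an ordinary least-squares line, and ranking by absolute residual; the function names `fit_line` and `top_k_by_residual` are illustrative and do not come from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def top_k_by_residual(xs, ys, k):
    """Indices of the k points with the largest absolute residual."""
    a, b = fit_line(xs, ys)
    residuals = [abs(y - (a + b * x)) for x, y in zip(xs, ys)]
    order = sorted(range(len(xs)), key=lambda i: residuals[i], reverse=True)
    return order[:k]


xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.0, 20.0, 10.1, 12.0]  # the fourth value breaks the trend
print(top_k_by_residual(xs, ys, 1))
```

Note that a gross outlier also pulls the fitted line toward itself, so in practice a robust fit may be preferable; the sketch ignores this.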
Statistical Distribution-based Outlier Detection
- Assumes the data follow a probability distribution F and identifies outliers with a discordancy test
- Discordancy testing
  - Working hypothesis H: o_i ∈ F, i = 1, 2, ..., n
  - The test verifies whether an object o_i is significantly different from what F would produce
  - Significance probability: SP(v_i) = Prob(T > v_i), where v_i is the value of the test statistic T for o_i
  - If SP(v_i) is sufficiently small, o_i is discordant: the working hypothesis is rejected and the alternative hypothesis, that o_i comes from another distribution model G, is adopted

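A minimal sketch of a discordancy test, under assumptions the slides do not state: F is taken to be a normal distribution with known mean and standard deviation, the statistic T is the absolute standardized value, and the cut-off `alpha` is an arbitrary illustrative choice.

```python
import math


def significance_probability(value, mean, std):
    """SP(v) = Prob(T > v) for T = |Z| with Z standard normal (two-sided)."""
    v = abs(value - mean) / std
    return math.erfc(v / math.sqrt(2))


def is_discordant(value, mean, std, alpha=0.01):
    """Reject the working hypothesis when SP is below the chosen cut-off."""
    return significance_probability(value, mean, std) < alpha


print(is_discordant(100.0, mean=50.0, std=10.0))  # five standard deviations out
print(is_discordant(55.0, mean=50.0, std=10.0))   # half a standard deviation out
```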
Statistical Distribution-based Outlier Detection
- Alternative distributions
  - Inherent alternative distribution
    - Alternative hypothesis: all objects arise from another distribution G
  - Mixture alternative distribution
    - Discordant values are not outliers but contaminants from G
    - H': o_i ∈ (1 − λ)F + λG, i = 1, 2, ..., n
  - Slippage alternative distribution
    - Some objects are independent observations from a modified version of F (with different parameters)

Statistical Distribution-based Outlier Detection
- Procedures for detecting outliers
  - Block procedures
    - Either all suspect objects are treated as outliers or all are accepted as consistent
  - Consecutive (sequential) procedures
    - Inside-out procedure: the object least likely to be an outlier is tested first
    - If it is found to be an outlier, all more extreme values are also considered outliers
- Disadvantages of the statistical approach
  - Most tests are for single attributes
  - The data distribution may not be known

Distance-based Outlier Detection
- Distance-based outlier
  - A DB(p, D)-outlier is an object o in a data set T such that at least a fraction p of the objects in T lie at a distance greater than D from o
  - In other words, the object does not have enough neighbours
  - Avoids the excessive computation associated with fitting statistical models
  - If an object o is an outlier according to a discordancy test, then o is also a DB(p, D)-outlier for some p and D

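The DB(p, D) definition can be applied directly. The sketch below assumes Euclidean distance and points given as tuples of numbers; `db_outliers` is an illustrative name, and the quadratic scan is the naive approach rather than one of the optimized algorithms.

```python
import math


def db_outliers(points, p, D):
    """Indices of DB(p, D)-outliers, found by applying the definition directly."""
    n = len(points)
    outliers = []
    for i, o in enumerate(points):
        # Count the objects of T that lie farther than D from o.
        far = sum(1 for q in points if math.dist(o, q) > D)
        if far >= p * n:
            outliers.append(i)
    return outliers


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
print(db_outliers(points, p=0.8, D=3.0))
```

Only the last point has at least 80% of the data farther than 3 units away, so it is the only index reported.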
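Distance-based Outlier Detection
- Index-based algorithm
  - Uses multi-dimensional indexing structures such as k-d trees and R-trees to search for the neighbours of each object o within radius dmin (the distance threshold, D in the DB(p, D) definition)
  - M: maximum number of objects within the dmin-neighbourhood of an outlier
  - Once M + 1 neighbours are found, o is not an outlier
  - Worst-case complexity O(n^2 k), where n is the number of objects and k the dimensionality, apart from index construction
- Nested-loop algorithm
  - Avoids index construction
  - Tries to minimize the number of I/Os
  - Divides the memory buffer space into two halves and the data set into several logical blocks
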
Distance-based Outlier Detection
- Cell-based algorithm
  - Complexity: O(c^k + n), where c is a constant depending on the number of cells and k is the dimensionality
  - The data space is partitioned into cells with side length dmin / (2√k)
  - Two layers surround each cell
    - First layer: one cell thick
    - Second layer: ⌈2√k − 1⌉ cells thick
  - The algorithm processes cells instead of individual objects
  - Maintains three counts per cell: cell_count, cell_+_1_layer_count, cell_+_2_layers_count
  - An object in a cell can be an outlier only if cell_+_1_layer_count <= M; otherwise no object in the cell is an outlier
  - If cell_+_2_layers_count <= M, all objects in the cell are outliers
  - If cell_+_2_layers_count > M, some objects in the cell may be outliers
    - Object-by-object processing has to be done for these cells

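The cell-level pruning rules can be sketched as follows. This is an illustrative in-memory version, not the original algorithm: cells are stored in a dictionary keyed by integer grid coordinates, layer counts are recomputed on demand rather than maintained incrementally, and the neighbourhood count is assumed to include the object itself. The name `cell_based_outliers` is hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def cell_based_outliers(points, dmin, M):
    """Indices of objects with at most M objects within distance dmin."""
    k = len(points[0])
    side = dmin / (2 * math.sqrt(k))           # cell side length
    layer2 = math.ceil(2 * math.sqrt(k) - 1)   # thickness of the second layer

    cells = defaultdict(list)
    for idx, pt in enumerate(points):
        cells[tuple(math.floor(c / side) for c in pt)].append(idx)

    def count_within(cell, radius):
        """Objects in all cells at most `radius` cells away from `cell`."""
        total = 0
        for offset in product(range(-radius, radius + 1), repeat=k):
            neighbour = tuple(c + d for c, d in zip(cell, offset))
            total += len(cells.get(neighbour, ()))
        return total

    outliers = []
    for cell, members in cells.items():
        if count_within(cell, 1) > M:
            continue  # cell_+_1_layer_count > M: no outliers in this cell
        if count_within(cell, 1 + layer2) <= M:
            outliers.extend(members)  # cell_+_2_layers_count <= M: all outliers
            continue
        for i in members:  # undecided cell: check object by object
            close = sum(1 for q in points if math.dist(points[i], q) <= dmin)
            if close <= M:
                outliers.append(i)
    return sorted(outliers)


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
print(cell_based_outliers(points, dmin=3.0, M=1))
```

The pruning is sound because any two objects in a cell and its first layer are less than dmin apart, while any object outside both layers is more than dmin away.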
Density-based Outlier Detection
- The previous methods implicitly assume that the data are uniformly distributed
- Real data may contain regions of different density
- This makes it difficult to choose a single dmin for the whole data set

Density-based Outlier Detection
- Local outlier: an object that is outlying relative to its local neighbourhood, particularly with respect to the density of that neighbourhood
  - In the example figure, o2 is a local outlier with respect to cluster C2; o1 is also an outlier; none of the objects in C1 are treated as outliers
- Considers the degree to which an object is an outlier
  - Local outlier factor: the degree depends on how isolated the object is with respect to its surroundings

Density-based Outlier Detection
- The k-distance of an object p is the maximal distance from p to its k nearest neighbours: the distance d(p, o) between p and an object o ∈ D such that
  - for at least k objects o' ∈ D, d(p, o') <= d(p, o), that is, at least k objects are as close as or closer to p than o
  - for at most k − 1 objects o'' ∈ D, d(p, o'') < d(p, o), that is, at most k − 1 objects are closer to p than o
- k-distance neighbourhood of p
  - Contains every object whose distance from p is not greater than the MinPts-distance of p (with k = MinPts)
- The reachability distance of an object p with respect to an object o is defined as reach_dist_MinPts(p, o) = max { MinPts-distance(o), d(p, o) }

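These three definitions translate almost line for line into code. The sketch assumes Euclidean distance, points stored as tuples, and objects referred to by their index in the list; the function names are illustrative.

```python
import math


def k_distance(points, p, k):
    """Distance from points[p] to its k-th nearest other object."""
    dists = sorted(math.dist(points[p], q) for j, q in enumerate(points) if j != p)
    return dists[k - 1]


def k_neighbourhood(points, p, k):
    """Indices of all other objects within the k-distance of p (more than k on ties)."""
    kd = k_distance(points, p, k)
    return [j for j, q in enumerate(points)
            if j != p and math.dist(points[p], q) <= kd]


def reach_dist(points, p, o, k):
    """reach_dist_k(p, o) = max(k-distance(o), d(p, o))."""
    return max(k_distance(points, o, k), math.dist(points[p], points[o]))


points = [(0,), (1,), (2,), (3,), (10,)]
print(k_distance(points, 4, 2))       # 8.0: second-nearest object to 10 is 2
print(k_neighbourhood(points, 1, 2))  # [0, 2]
print(reach_dist(points, 1, 0, 2))    # 2.0: the k-distance of object 0 dominates
```

The last call shows the smoothing effect of the reachability distance: objects 1 and 0 are only 1 apart, but the value is raised to the 2-distance of object 0.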
OPTICS
- Complexity: O(n log n) when a spatial index is used

Density-based Outlier Detection
- The local reachability density of p is the inverse of the average reachability distance from p to its MinPts-nearest neighbours
- The local outlier factor (LOF) of p captures the degree to which we call p an outlier
  - It is the average, over p's MinPts-nearest neighbours o, of the ratio of the local reachability density of o to that of p
  - LOF is close to 1 for objects deep inside a cluster and higher for outliers

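Putting the pieces together, a brute-force LOF computation might look like the sketch below. It assumes Euclidean distance, a data set small enough for a full distance matrix, and no group of more than MinPts duplicate points (which would make a reachability density infinite). `lof_scores` is an illustrative name, not a library function.

```python
import math


def lof_scores(points, min_pts):
    """Local outlier factor of every point, computed by brute force."""
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]

    k_dist, neigh = [], []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        kd = others[min_pts - 1]  # MinPts-distance of point i
        k_dist.append(kd)
        neigh.append([j for j in range(n) if j != i and dist[i][j] <= kd])

    def reach_dist(p, o):
        return max(k_dist[o], dist[p][o])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for p in range(n):
        mean_reach = sum(reach_dist(p, o) for o in neigh[p]) / len(neigh[p])
        lrd.append(1.0 / mean_reach)

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[o] for o in neigh[p]) / (len(neigh[p]) * lrd[p])
            for p in range(n)]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
for pt, score in zip(points, lof_scores(points, min_pts=2)):
    print(pt, round(score, 2))
```

The four corner points of the unit square get a score of 1, while the isolated point gets a score several times larger.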
Deviation-based Outlier Detection
- Identifies outliers by examining the main characteristics of the objects in a group
- Objects that "deviate" from this description are considered outliers
- Sequential exception technique
  - Simulates the way in which humans can distinguish unusual objects from among a series of supposedly like objects
  - Given a data set D, a sequence of subsets {D1, D2, ..., Dm} is built such that D_{j-1} ⊆ D_j; dissimilarities are assessed between subsets in the sequence
  - Exception set: the smallest subset of objects whose removal results in the greatest reduction of dissimilarity
  - Dissimilarity function: for example the variance, $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $\bar{x}$ is the mean of the n values
  - Smoothing factor: assesses how much the dissimilarity can be reduced by removing the subset from the original set of objects
  - The process can be repeated with different orderings to reduce the influence of input order

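The exception-set idea can be illustrated on a tiny numeric data set. The sketch makes several assumptions that go beyond the slides: the dissimilarity is the variance, the smoothing factor is taken to be the size of the remaining set times the reduction in dissimilarity, and candidate subsets are enumerated exhaustively up to `max_size` instead of being built as a sequence. All names are illustrative.

```python
from itertools import combinations


def variance(values):
    """Dissimilarity of a set of numbers: (1/n) * sum((x_i - mean)^2)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def smoothing_factor(data, subset_idx):
    """Assumed form: |remaining set| * (reduction in dissimilarity)."""
    rest = [x for i, x in enumerate(data) if i not in subset_idx]
    return len(rest) * (variance(data) - variance(rest))


def exception_set(data, max_size=2):
    """Smallest subset (up to max_size) with the largest smoothing factor."""
    best, best_sf = (), 0.0
    for size in range(1, max_size + 1):
        for idx in combinations(range(len(data)), size):
            sf = smoothing_factor(data, idx)
            if sf > best_sf:  # strict comparison keeps the smaller set on ties
                best, best_sf = idx, sf
    return [data[i] for i in best]


print(exception_set([1, 4, 4, 4]))
```

Removing the value 1 brings the variance of the remaining set to zero while keeping three objects, so it is chosen over any larger subset.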
Deviation-based Outlier Detection
- OLAP data cube technique
  - Uses data cubes to identify regions of anomalies
  - A cell value in a cube is an exception if it differs significantly from its expected value
  - Visualization effects guide the user to the exceptions
  - The user may drill down into the flagged regions

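A rough sketch of flagging exceptional cells in a two-dimensional slice of a cube. The slides do not say how the expected value is obtained; here it is assumed to come from a simple additive model (row mean + column mean − grand mean), and a cell is flagged when its residual exceeds `threshold` times the root-mean-square residual. Both the model and the name `cube_exceptions` are assumptions for illustration.

```python
import math


def cube_exceptions(table, threshold=2.0):
    """(row, column) positions whose value deviates strongly from the expected one."""
    rows, cols = len(table), len(table[0])
    grand = sum(map(sum, table)) / (rows * cols)
    row_mean = [sum(r) / cols for r in table]
    col_mean = [sum(table[i][j] for i in range(rows)) / rows for j in range(cols)]

    # Residual = actual value minus the value expected under the additive model.
    resid = [[table[i][j] - (row_mean[i] + col_mean[j] - grand)
              for j in range(cols)] for i in range(rows)]
    sigma = math.sqrt(sum(r * r for row in resid for r in row) / (rows * cols))
    if sigma == 0:
        return []
    return [(i, j) for i in range(rows) for j in range(cols)
            if abs(resid[i][j]) > threshold * sigma]


sales = [[10] * 4 for _ in range(4)]
sales[1][2] = 50  # one cell far above the rest
print(cube_exceptions(sales))
```

A user interface would then highlight the returned cells and let the user drill down into them.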

3.7 outlier analysis

  • 2. Outlier Analysis  Outlier – data objects that are grossly different from or inconsistent with the remaining set of data  Causes  Measurement / Execution errors  Inherent data variability  Outliers – maybe valuable patterns  Fraud detection  Customized marketing  Medical Analysis 2
  • 3. Outlier Mining
      • Given n data points and k, the expected number of outliers, find the top k objects that are most dissimilar from the rest
      • Define which data are inconsistent
          • Example: residuals in regression
          • Difficulties: multi-dimensional data, non-numeric data
      • Mine the outliers
          • Visualization-based methods: not applicable to cyclic plots, high-dimensional data, or categorical data
      • Approaches
          • Statistical approach
          • Distance-based approach
          • Density-based outlier approach
          • Deviation-based approach
  • 4. Statistical Distribution-based Outlier Detection
      • Assumes the data follow a probability distribution F and applies a discordancy test
      • Discordancy testing
          • Working hypothesis $H: o_i \in F$, $i = 1, 2, \ldots, n$
          • The test verifies whether an object $o_i$ is significantly different from what F predicts
          • Significance probability: $SP(v_i) = \mathrm{Prob}(T > v_i)$, where T is the chosen test statistic and $v_i$ is its value for $o_i$
          • If SP is sufficiently small, $o_i$ is discordant: the working hypothesis is rejected and the alternative hypothesis, that $o_i$ comes from another distribution model G, is adopted
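The discordancy test above can be illustrated with a minimal sketch. It assumes, purely for illustration, that F is a normal distribution with known mean and standard deviation, that the test statistic T is the observation itself, and that only the upper tail is tested. The function names and the threshold `alpha` are hypothetical and do not come from the slides.

```python
import math

def significance_probability(v, mean, std):
    """SP(v) = Prob(T > v) for T ~ Normal(mean, std) (assumed form of F)."""
    z = (v - mean) / std
    # Upper-tail probability of the standard normal via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

def discordant(values, mean, std, alpha=0.01):
    """Return the values whose significance probability falls below alpha."""
    return [v for v in values if significance_probability(v, mean, std) < alpha]

data = [9.8, 10.1, 10.0, 9.9, 10.2, 14.0]
flagged = discordant(data, mean=10.0, std=0.2)
print(flagged)
```

A two-sided test, or a statistic built from several observations, would follow the same pattern with a different tail computation.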
  • 5. Statistical Distribution-based Outlier Detection
      • Alternative distributions
          • Inherent alternative distribution: the alternative hypothesis is that all objects arise from another distribution G
          • Mixture alternative distribution: discordant values are not outliers in F but contaminants from G; $H': o_i \in (1 - \lambda) F + \lambda G$, $i = 1, 2, \ldots, n$
          • Slippage alternative distribution: some objects are independent observations from a modified version of F with different parameters
  • 6. Statistical Distribution-based Outlier Detection
      • Procedures for detecting outliers
          • Block procedures: either all suspect objects are treated as outliers or all are accepted as consistent
          • Consecutive procedures
              • Inside-out procedure: the object least likely to be an outlier is tested first
              • If it is found to be an outlier, all more extreme values are also considered outliers; otherwise the next most extreme object is tested
      • Disadvantages of the statistical approach
          • Tests are for single attributes
          • The data distribution may not be known
  • 7. Distance-based Outlier Detection
      • Distance-based outlier: a DB(p, D)-outlier is an object o in a dataset T such that at least a fraction p of the objects in T lie at a distance greater than D from o
      • Intuitively, the object does not have enough neighbours
      • Avoids the excessive computation associated with fitting statistical models
      • For many discordancy tests, if an object o is an outlier according to the test, then o is also a DB(p, D)-outlier for some suitably chosen p and D
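The DB(p, D) definition above translates almost directly into a brute-force check. The sketch below is a minimal illustration, assuming Euclidean distance and points stored as tuples; the function names are hypothetical.

```python
import math

def is_db_outlier(o, dataset, p, D):
    """True if at least a fraction p of the objects in dataset lie farther than D from o."""
    far = sum(1 for x in dataset if math.dist(o, x) > D)
    return far >= p * len(dataset)

def db_outliers(dataset, p, D):
    """Return every DB(p, D)-outlier by testing each object against all others."""
    return [o for o in dataset if is_db_outlier(o, dataset, p, D)]

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
result = db_outliers(points, p=0.75, D=1.0)
print(result)
```

This version compares every pair of objects, which is the cost the index-based, nested-loop, and cell-based algorithms try to reduce.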
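  • 8. Distance-based Outlier Detection
      • Index-based algorithm
          • Uses multi-dimensional indexing structures such as k-d trees and R-trees to search for the neighbours of each object o within radius dmin (the distance threshold, written D on the previous slide)
          • M: the maximum number of objects allowed within the dmin-neighbourhood of an outlier
          • Once M + 1 neighbours are found, o is not an outlier
          • Worst-case complexity $O(n^2 k)$, where n is the number of objects and k the dimensionality, apart from index construction
      • Nested-loop algorithm
          • Avoids index construction
          • Tries to minimize the number of I/Os
          • Divides the memory buffer space into two halves and the data set into several logical blocks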
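The stopping rule "once M + 1 neighbours are found, o is not an outlier" can be sketched in memory as follows. This is only an illustration of the pruning idea: it does not build an index, and it does not model the block-wise buffering that the nested-loop algorithm uses to reduce I/O. Excluding the object itself from its own neighbour count is an assumption of this sketch.

```python
import math

def outliers_with_early_stop(dataset, dmin, M):
    """Flag objects having at most M neighbours within dmin, stopping each scan early."""
    outliers = []
    for i, o in enumerate(dataset):
        neighbours = 0
        for j, x in enumerate(dataset):
            if i != j and math.dist(o, x) <= dmin:
                neighbours += 1
                if neighbours > M:
                    # M + 1 neighbours found: o cannot be an outlier, stop scanning.
                    break
        else:
            # The inner loop finished without a break: at most M neighbours.
            outliers.append(o)
    return outliers

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
found = outliers_with_early_stop(points, dmin=1.0, M=1)
print(found)
```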
  • 9. Distance-based Outlier Detection
      • Cell-based algorithm
          • Complexity: $O(c^k + n)$, where c is a constant that depends on the number of cells and k is the dimensionality
          • The data space is partitioned into cells with side length $\text{dmin} / (2\sqrt{k})$
          • Two layers surround each cell
              • First layer: one cell thick
              • Second layer: $\lceil 2\sqrt{k} - 1 \rceil$ cells thick
          • The algorithm processes cells instead of individual objects
          • Maintains three counts per cell: cell_count, cell_+_1_layer_count, cell_+_2_layers_count
          • An object in a cell can be an outlier only if cell_+_1_layer_count <= M; if the count exceeds M, no object in the cell is an outlier
          • If cell_+_2_layers_count <= M, all objects in the cell are outliers
          • If cell_+_2_layers_count > M, some objects in the cell may be outliers, and object-by-object processing has to be done
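The cell-level rules above can be sketched as a small labelling routine. It is a simplified illustration, assuming points are equal-length tuples, that counts include the objects of the cell itself, and that enumerating neighbouring cells exhaustively is acceptable (which is only practical for small k). The label strings and function name are hypothetical.

```python
import math
from collections import Counter
from itertools import product

def cell_based_labels(points, dmin, M):
    """Label each non-empty cell as 'inliers', 'outliers', or 'check'."""
    k = len(points[0])
    side = dmin / (2 * math.sqrt(k))                 # cell side length
    l2_thickness = math.ceil(2 * math.sqrt(k) - 1)   # thickness of the second layer
    # cell_count for every non-empty cell, keyed by integer cell coordinates.
    counts = Counter(tuple(math.floor(c / side) for c in p) for p in points)

    def count_within(cell, radius):
        """Objects in all cells at most `radius` cells away along every axis."""
        total = 0
        for offset in product(range(-radius, radius + 1), repeat=k):
            total += counts.get(tuple(a + b for a, b in zip(cell, offset)), 0)
        return total

    labels = {}
    for cell in counts:
        plus_1_layer = count_within(cell, 1)
        plus_2_layers = count_within(cell, 1 + l2_thickness)
        if plus_1_layer > M:
            labels[cell] = "inliers"     # no object in the cell can be an outlier
        elif plus_2_layers <= M:
            labels[cell] = "outliers"    # every object in the cell is an outlier
        else:
            labels[cell] = "check"       # object-by-object processing needed
    return labels

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
labels = cell_based_labels(points, dmin=1.0, M=1)
print(labels)
```

The pruning works because two objects in the same cell or in first-layer cells are at most dmin apart, while objects beyond the second layer are more than dmin away.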
  • 10. Density-based Outlier Detection
      • The previous methods take a global view and implicitly assume the data are uniformly distributed
      • Real data may contain regions with different density distributions
      • This makes it difficult to choose a single dmin
  • 11. Density-based Outlier Detection
      • Local outlier: an object is a local outlier if it is outlying relative to its local neighbourhood, particularly with respect to the density of that neighbourhood
      • Example: o2 is a local outlier with respect to cluster C2; o1 is also an outlier; none of the objects in cluster C1 are treated as outliers
      • Instead of a yes/no label, the approach considers the degree to which an object is an outlier
      • Local outlier factor (LOF): this degree depends on how isolated the object is with respect to its surroundings
  • 12. Density-based Outlier Detection
      • The k-distance of an object p, k-distance(p), is the distance $d(p, o)$ between p and an object $o \in D$ such that
          • for at least k objects $o' \in D$, $d(p, o') \le d(p, o)$: at least k objects are as close as or closer to p than o
          • for at most k − 1 objects $o'' \in D$, $d(p, o'') < d(p, o)$: at most k − 1 objects are closer to p than o
      • In other words, it is the distance from p to its k-th nearest neighbour
      • The k-distance neighbourhood of p contains every object whose distance from p is not greater than k-distance(p); with k = MinPts this is the MinPts-distance neighbourhood
      • The reachability distance of an object p with respect to an object o is $reach\_dist_{MinPts}(p, o) = \max\{ MinPts\text{-}distance(o),\ d(p, o) \}$
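The three definitions above can be sketched in a few lines. The code assumes Euclidean distance, points stored as tuples, and no duplicate points (an object is excluded from its own neighbour list by equality); the function names are hypothetical.

```python
import math

def k_distance(p, data, k):
    """Distance from p to its k-th nearest neighbour (p itself excluded)."""
    dists = sorted(math.dist(p, o) for o in data if o != p)
    return dists[k - 1]

def k_neighbourhood(p, data, k):
    """All objects whose distance from p does not exceed k-distance(p)."""
    kd = k_distance(p, data, k)
    return [o for o in data if o != p and math.dist(p, o) <= kd]

def reach_dist(p, o, data, k):
    """reach_dist_k(p, o) = max(k-distance(o), d(p, o))."""
    return max(k_distance(o, data, k), math.dist(p, o))

data = [(0.0,), (1.0,), (2.0,), (10.0,)]
print(k_distance((0.0,), data, 2))
print(k_neighbourhood((0.0,), data, 2))
print(reach_dist((1.0,), (0.0,), data, 2))
```

Note that the reachability distance is not symmetric: it is smoothed by the k-distance of the second argument, o, not of p.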
  • 13. OPTICS
      • Complexity: $O(n \log n)$
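  • 14. Density-based Outlier Detection
      • The local reachability density of p, $lrd_{MinPts}(p)$, is the inverse of the average reachability distance from p to its MinPts-nearest neighbours: $lrd_{MinPts}(p) = |N_{MinPts}(p)| \,/\, \sum_{o \in N_{MinPts}(p)} reach\_dist_{MinPts}(p, o)$
      • The local outlier factor (LOF) of p captures the degree to which we call p an outlier
      • It is the average, over p's MinPts-nearest neighbours o, of the ratio of the local reachability density of o to that of p: $LOF_{MinPts}(p) = \frac{1}{|N_{MinPts}(p)|} \sum_{o \in N_{MinPts}(p)} \frac{lrd_{MinPts}(o)}{lrd_{MinPts}(p)}$
      • LOF is higher for outliers: the lower p's own density and the higher the density of its neighbours, the larger the LOF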
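A compact sketch of the LOF computation described above follows. It assumes Euclidean distance and distinct points (duplicates would make a reachability sum zero), and it recomputes all pairwise distances, so it is meant for small examples only. The function name `lof_scores` is hypothetical.

```python
import math

def lof_scores(data, min_pts):
    """Return the local outlier factor of every object in data."""
    n = len(data)
    dist = [[math.dist(a, b) for b in data] for a in data]

    k_dist, neigh = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[min_pts - 1][0]                  # MinPts-distance of object i
        k_dist.append(kd)
        neigh.append([j for d, j in others if d <= kd])

    def lrd(i):
        # Inverse of the average reachability distance from i to its neighbours.
        total = sum(max(k_dist[j], dist[i][j]) for j in neigh[i])
        return len(neigh[i]) / total

    lrds = [lrd(i) for i in range(n)]
    # Average ratio of each neighbour's density to the object's own density.
    return [sum(lrds[j] for j in neigh[i]) / (lrds[i] * len(neigh[i]))
            for i in range(n)]

data = [(0.0,), (1.0,), (2.0,), (10.0,)]
scores = lof_scores(data, min_pts=2)
print(scores)
```

Objects inside a uniformly dense region score close to 1, while the isolated object receives a clearly larger value.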
  • 15. Deviation-based Outlier Detection
      • Identifies outliers by examining the main characteristics of the objects in a group
      • Objects that "deviate" from this description are considered outliers
      • Sequential exception technique
          • Simulates the way in which humans can distinguish unusual objects from among a series of supposedly like objects
  • 16. Deviation-based Outlier Detection
      • Sequential exception technique
          • Given a data set D, a sequence of subsets $\{D_1, D_2, \ldots, D_m\}$ is built such that each subset $D_{j-1}$ is contained in its successor $D_j$; dissimilarities are assessed between subsets in the sequence
          • Exception set: the smallest subset of objects whose removal results in the greatest reduction of dissimilarity in the remaining set
          • Dissimilarity function, for example the variance: $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$, where $\bar{x}$ is the mean of the n values
          • Smoothing factor: assesses how much the dissimilarity can be reduced by removing the subset from the original set of objects
          • The process can be repeated with different orderings of the subsets to reduce the influence of input order
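The interplay of dissimilarity function, smoothing factor, and exception set can be illustrated with a brute-force sketch. It is not the sequential algorithm itself: instead of scanning a sequence of growing subsets, it tries every candidate subset up to a small size. The smoothing factor is assumed here to be the reduction in variance scaled by the number of remaining objects; that scaling, the parameter `max_size`, and the function names are assumptions of this example.

```python
from itertools import combinations

def dissimilarity(values):
    """Variance (1/n) * sum((x_i - mean)^2); defined as 0 for an empty set."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def exception_set(values, max_size=2):
    """Return (exception values, smoothing factor) for the best candidate subset."""
    base = dissimilarity(values)
    best, best_sf = (), 0.0
    for size in range(1, max_size + 1):
        for idx in combinations(range(len(values)), size):
            rest = [v for i, v in enumerate(values) if i not in idx]
            # Assumed smoothing factor: variance reduction scaled by remaining cardinality.
            sf = len(rest) * (base - dissimilarity(rest))
            if sf > best_sf:  # strict comparison keeps the smaller subset on ties
                best, best_sf = idx, sf
    return [values[i] for i in best], best_sf

values = [10, 11, 10, 12, 11, 95]
exceptions, factor = exception_set(values)
print(exceptions, factor)
```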
  • 17. Deviation-based Outlier Detection
      • OLAP data cube technique
          • Uses data cubes to identify regions of anomalies
          • A cell value in a cube is an exception if it differs significantly from an expected value
          • Visualization effects guide the user to the exceptions
          • The user may drill down for further detail
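The idea of comparing each cell with an expected value can be sketched on a two-dimensional slice of a cube. The slides do not say how the expected value is computed; the sketch below assumes a simple additive model (row mean + column mean − grand mean) and flags cells whose residual exceeds a multiple of the residual standard deviation. Both the model and the `threshold` parameter are illustrative assumptions.

```python
def cube_exceptions(table, threshold=2.0):
    """Return (row, col) indices of cells that deviate strongly from an additive model."""
    rows, cols = len(table), len(table[0])
    grand = sum(map(sum, table)) / (rows * cols)
    row_mean = [sum(r) / cols for r in table]
    col_mean = [sum(table[i][j] for i in range(rows)) / rows for j in range(cols)]

    # Residual = observed value minus the expected value under the assumed model.
    resid = [[table[i][j] - (row_mean[i] + col_mean[j] - grand)
              for j in range(cols)] for i in range(rows)]
    flat = [r for row in resid for r in row]
    std = (sum(r * r for r in flat) / len(flat)) ** 0.5
    if std == 0:
        return []
    return [(i, j) for i in range(rows) for j in range(cols)
            if abs(resid[i][j]) > threshold * std]

sales = [
    [10, 10, 10, 10],
    [10, 10, 50, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
cells = cube_exceptions(sales)
print(cells)
```

In an interactive OLAP tool the flagged cells would typically be highlighted so that the user can decide where to drill down.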