Improving SVM classification on imbalanced time series data sets with ghost points
Presenter: Shang-Tse Chen
Authors: Suzan Köknar-Tezel, Longin Jan Latecki
Introduction
● Imbalanced datasets are a challenge for data mining
○ a classifier that always predicts the majority class can still achieve high accuracy
○ yet the rare (minority-class) events are often the more interesting ones
● Common techniques:
○ Up-sampling / down-sampling
○ SMOTE (adds synthetic points in feature space)
● This paper
○ adds synthetic points in distance space
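A quick calculation shows why accuracy alone is misleading on imbalanced data. The 95/5 class split below is hypothetical, chosen only for illustration:

```python
# Hypothetical class counts, chosen only to illustrate the imbalance problem.
n_majority, n_minority = 95, 5
y_true = [0] * n_majority + [1] * n_minority  # 1 marks the rare (minority) class

# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# Minority-class recall: the fraction of rare events that are actually detected.
minority_recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / n_minority

print(f"accuracy = {accuracy:.2f}")               # accuracy = 0.95
print(f"minority recall = {minority_recall:.2f}")  # minority recall = 0.00
```

High accuracy combined with zero minority recall is exactly the failure that over-sampling the minority class tries to address.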
Research Question
● For time series data
○ sequences are not intuitive to represent as feature vectors in ℝⁿ
○ distance measures between two sequences (e.g., DTW) are often non-metric
○ so SMOTE, which interpolates feature vectors, cannot be used
● In many applications, pairwise distances are more relevant
○ many classifiers need only pairwise distances,
■ e.g., SVM, k-NN
○ many good algorithms exist for computing distances between time series, e.g., DTW, OSB, …
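Both points above, that a classifier such as k-NN can run on pairwise distances alone and that DTW is non-metric, can be checked with a small sketch. It assumes the classic DTW recurrence with absolute-difference local cost and no warping window, which may differ from the settings used in the paper; `dtw`, `pairwise`, and `one_nn_predict` are hypothetical helper names, and the toy sequences are made up.

```python
import numpy as np

def dtw(s, t):
    """Dynamic time warping distance between two 1-D sequences.

    Uses |s_i - t_j| as the local cost and no warping window; both are
    illustrative choices, not necessarily the paper's settings.
    """
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def pairwise(dist, A, B):
    """Distance matrix whose entry (i, j) is dist(A[i], B[j])."""
    return np.array([[dist(a, b) for b in B] for a in A])

def one_nn_predict(dist_test_train, train_labels):
    """1-NN prediction that needs nothing but a precomputed distance matrix."""
    nearest = np.argmin(dist_test_train, axis=1)
    return np.asarray(train_labels)[nearest]

# DTW is non-metric: here it violates the triangle inequality.
x, y, z = [0], [1, 1], [1]
d_xy, d_xz, d_zy = dtw(x, y), dtw(x, z), dtw(z, y)
print(d_xy, d_xz + d_zy)  # 2.0 1.0  -> d(x, y) > d(x, z) + d(z, y)

# Toy 1-NN classification from DTW distances alone (made-up sequences).
train = [[0, 0, 1, 1, 0], [2, 2, 3, 3, 2]]
labels = [0, 1]
test = [[0, 1, 1, 0]]
print(one_nn_predict(pairwise(dtw, test, train), labels))  # [0]
```

An SVM can consume the same kind of matrix through a kernel built from the distances (for example scikit-learn's `SVC(kernel="precomputed")`), although a kernel derived from a non-metric distance is not guaranteed to be positive semi-definite.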
Research Question
● Can we add synthetic data in distance space?
● Does it improve the performance?
Methodology
● Given any two points a, b in a distance space X, we can define a
ghost point e = μ(a,b).
● For every x ∈ X, the distance $d(x, e) = d(x,\mu(a,b))$ is defined case by case:
○ case 1: if $d$ restricted to $\{x, a, b\}$ is a metric (all three triangle inequalities hold), then
■ $d(x,\mu(a,b))^2 = \tfrac{1}{2}\,d(x,a)^2 + \tfrac{1}{2}\,d(x,b)^2 - \tfrac{1}{4}\,d(a,b)^2$
○ case 2: if $d(a,b) > d(x,a) + d(x,b)$, then
■ $d(x,\mu(a,b)) = \tfrac{1}{2}\,d(a,b) - d(x,b)$
○ case 3a: if $d(x,a) > d(x,b) + d(a,b)$, then
■ $d(x,\mu(a,b))^2 = d(x,b)^2 + \tfrac{1}{4}\,d(a,b)^2$
○ case 3b: if $d(x,b) > d(x,a) + d(a,b)$, then
■ $d(x,\mu(a,b))^2 = d(x,a)^2 + \tfrac{1}{4}\,d(a,b)^2$
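The four cases translate into a function of the three pairwise distances among x, a, and b. This is a minimal sketch, not the authors' code: `ghost_distance` and `add_ghost` are hypothetical names. The slide writes case 2 with d(x, b); the sketch assumes b denotes whichever endpoint is closer to x, which keeps the result non-negative, and that reading is ours. `add_ghost` also treats earlier ghost points as ordinary members of X, which is another assumption. In Euclidean space, case 1 gives the distance to the true midpoint of a and b, i.e., the point SMOTE would create with interpolation factor ½.

```python
import math
import numpy as np

def ghost_distance(d_xa, d_xb, d_ab):
    """Distance from x to the ghost point mu(a, b), following the four cases
    on the Methodology slide. For nonnegative inputs at most one triangle
    inequality can fail, so the cases below do not overlap."""
    if d_ab > d_xa + d_xb:
        # Case 2. The slide writes 1/2 d(a,b) - d(x,b); we ASSUME b is the
        # endpoint closer to x, which keeps the value strictly positive.
        return 0.5 * d_ab - min(d_xa, d_xb)
    if d_xa > d_xb + d_ab:
        # Case 3a: d(x,a) is too large for a triangle with the other two sides.
        return math.sqrt(d_xb ** 2 + 0.25 * d_ab ** 2)
    if d_xb > d_xa + d_ab:
        # Case 3b: symmetric to 3a with the roles of a and b swapped.
        return math.sqrt(d_xa ** 2 + 0.25 * d_ab ** 2)
    # Case 1: all triangle inequalities hold, so {x, a, b} embeds in the
    # plane and this is the Euclidean median-length formula.
    sq = 0.5 * d_xa ** 2 + 0.5 * d_xb ** 2 - 0.25 * d_ab ** 2
    return math.sqrt(max(sq, 0.0))  # guard against tiny negative round-off

def add_ghost(D, i, j):
    """Return a copy of the square distance matrix D extended by the ghost
    point mu(i, j); one distance is computed per existing point."""
    n = D.shape[0]
    row = np.array([ghost_distance(D[k, i], D[k, j], D[i, j]) for k in range(n)])
    out = np.zeros((n + 1, n + 1))
    out[:n, :n] = D
    out[n, :n] = row
    out[:n, n] = row
    return out

# Euclidean check: x=(0,0), a=(2,0), b=(0,2); the true midpoint of a, b is (1,1).
pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
G = add_ghost(D, 1, 2)
print(np.round(G[3, :3], 4))  # [1.4142 1.4142 1.4142]
```

Because every ghost point needs one distance to each existing point, adding many ghosts to a large data set multiplies the distance computations accordingly.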
Data Collection and Processing
● UCR time series datasets
○ 17 datasets from various domains
○ the number of classes ranges from 2 to 50
● MPEG-7
○ 1400 binary images of 70 object classes
○ 20 shapes within each class
○ each shape is represented by 100 equidistant sample points on its contour
○ these points are converted into a sequence by computing the curvature at each point with respect to its five neighbors on each side
○ this yields 1400 sequences, each of length 100
○ this transformation is invariant to rotation and scale
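The contour-to-sequence step can be sketched as follows. The slide does not give the exact curvature formula, so this assumes curvature is the signed turning angle between the vectors to the k-th neighbor on each side, averaged over k = 1 to 5, after resampling the closed contour to 100 points equally spaced by arc length; `resample_closed` and `curvature_sequence` are hypothetical names. Turning angles depend only on relative directions, which is why such a sequence does not change when the shape is rotated or uniformly scaled (as long as the starting point stays the same).

```python
import numpy as np

def resample_closed(contour, n_points=100):
    """Resample a closed 2-D contour to n_points points equally spaced by arc length."""
    pts = np.asarray(contour, dtype=float)
    closed = np.vstack([pts, pts[:1]])               # repeat first point to close the loop
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])      # cumulative arc length
    targets = np.linspace(0.0, s[-1], n_points, endpoint=False)
    x = np.interp(targets, s, closed[:, 0])
    y = np.interp(targets, s, closed[:, 1])
    return np.column_stack([x, y])

def curvature_sequence(points, k_max=5):
    """Signed turning angle at each point, averaged over neighbors 1..k_max
    on each side (an ASSUMED curvature definition)."""
    n = len(points)
    seq = np.zeros(n)
    for i in range(n):
        angles = []
        for k in range(1, k_max + 1):
            back = points[i] - points[(i - k) % n]   # incoming direction
            fwd = points[(i + k) % n] - points[i]    # outgoing direction
            cross = back[0] * fwd[1] - back[1] * fwd[0]
            dot = back @ fwd
            angles.append(np.arctan2(cross, dot))    # signed angle in (-pi, pi]
        seq[i] = np.mean(angles)
    return seq

# Usage: a unit square becomes a length-100 curvature sequence.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
seq = curvature_sequence(resample_closed(square))
print(seq.shape)  # (100,)
```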
Key Results
● UCR Data Sets and OSB
● Shaded results indicate the best performers
● the darker the shade, the larger the difference
Key Results
● UCR Data Sets and DTW
Key Results
● MPEG-7 dataset
Summary
● Proposed a new approach for over-sampling the minority class of imbalanced data
● Unlike feature-based methods such as SMOTE, ghost points are added in distance space
● Ghost points can be added to non-metric distance spaces
○ so they can be used with DTW, OSB, and many other distance measures
● Empirical results on the UCR and MPEG-7 data show significant improvement
Critique of work
● For large-scale data, over-sampling is time-consuming, since every ghost point needs its distance to all existing points
● Introduces another parameter, i.e., the number of ghost points to add
● May not perform well on highly noisy data
