Md2k 0219 shang

BBKuhn

Paper presentation

Science

Improving SVM classification on imbalanced
time series data sets with ghost points
Presenter: Shang-Tse Chen
Authors: Suzan Köknar-Tezel, Longin Jan Latecki

Introduction
● Imbalanced data sets are a challenge for data mining
○ always predicting the majority class still yields high accuracy
○ yet the rare events are often the more interesting ones
● Common techniques:
○ Up-/down-sampling
○ SMOTE (adds synthetic points in feature space)
● This paper
○ adds synthetic points in distance space
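
The "always predict majority class" point can be made concrete with a small sketch. The 95/5 class split and the helper name `accuracy_and_recall` are illustrative assumptions, not figures from the paper.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall on the positive/minority class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    n_pos = sum(t == positive for t in y_true)
    accuracy = correct / len(y_true)
    recall = true_pos / n_pos if n_pos else 0.0
    return accuracy, recall


# Hypothetical data set: 95 majority samples (label 0), 5 minority samples (label 1).
y_true = [0] * 95 + [1] * 5
# A "classifier" that ignores its input and always predicts the majority class.
y_pred = [0] * 100

acc, rec = accuracy_and_recall(y_true, y_pred)
print(acc, rec)  # 0.95 0.0
```

Accuracy looks good while not a single rare event is detected, which is why over-sampling the minority class is attractive.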

Research Question
● For time series data
○ it is not intuitive to represent sequences as feature vectors in ℝⁿ
○ the distance between two sequences can be non-metric
○ SMOTE cannot be used
● In many applications, pairwise distances are more relevant
○ many classifiers need only pairwise distances,
■ e.g. SVM, k-NN
○ there are many good algorithms for computing distances between
time series, e.g. DTW, OSB, …
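
As a minimal sketch of why such distances are non-metric, the following implements textbook dynamic time warping (DTW) and shows three short sequences that violate the triangle inequality. It assumes unconstrained DTW with absolute difference as the local cost; the paper's exact DTW configuration may differ, and the sequences are made up for illustration.

```python
def dtw(s, t):
    """Classic O(len(s) * len(t)) dynamic-programming DTW distance.

    Local cost is |s[i] - t[j]|; no warping window is applied.
    """
    inf = float("inf")
    n, m = len(s), len(t)
    # D[i][j] = cost of the best warping path aligning s[:i] with t[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # repeat t[j-1]
                                 D[i][j - 1],      # repeat s[i-1]
                                 D[i - 1][j - 1])  # advance both
    return D[n][m]


x, y, z = [0], [1, 2], [2, 3, 3]
d_xy, d_yz, d_xz = dtw(x, y), dtw(y, z), dtw(x, z)
print(d_xy, d_yz, d_xz)  # 3.0 3.0 8.0
# Going "directly" from x to z costs more than the detour through y.
print(d_xz > d_xy + d_yz)  # True
```

Because the triangle inequality fails, the sequences cannot be treated as points in a metric space, which rules out interpolating between them the way SMOTE does in feature space.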

Research Question
● Can we add synthetic data in distance space?
● Does it improve the performance?

Methodology
● Given any two points a, b in a distance space X, we can define a
ghost point e = μ(a, b).
● For every x ∈ X, the distance from x to e, d(x, μ(a, b)), is defined as follows:
○ case 1: If the triangle inequality holds on {x, a, b} (d is a metric on these three points), then
■ d(x, μ(a, b))² = ½ d(x, a)² + ½ d(x, b)² - ¼ d(a, b)²
○ case 2: If d(a, b) > d(x, a) + d(x, b), then
■ d(x, μ(a, b)) = ½ d(a, b) - d(x, b)
○ case 3a: If d(x, a) > d(x, b) + d(a, b), then
■ d(x, μ(a, b))² = d(x, b)² + ¼ d(a, b)²
○ case 3b: If d(x, b) > d(x, a) + d(a, b), then
■ d(x, μ(a, b))² = d(x, a)² + ¼ d(a, b)²
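
The four cases can be sketched as a single function of the three pairwise distances. This is a direct transcription of the formulas on this slide, not the authors' code; the name `ghost_distance` and the clamping of tiny negative values in case 1 are assumptions added here.

```python
import math


def ghost_distance(d_xa, d_xb, d_ab):
    """Distance from x to the ghost point mu(a, b), given d(x,a), d(x,b), d(a,b)."""
    if d_ab > d_xa + d_xb:
        # case 2: used exactly as written on the slide (the result is signed)
        return 0.5 * d_ab - d_xb
    if d_xa > d_xb + d_ab:
        # case 3a
        return math.sqrt(d_xb ** 2 + 0.25 * d_ab ** 2)
    if d_xb > d_xa + d_ab:
        # case 3b
        return math.sqrt(d_xa ** 2 + 0.25 * d_ab ** 2)
    # case 1: the triangle inequality holds on {x, a, b}
    sq = 0.5 * d_xa ** 2 + 0.5 * d_xb ** 2 - 0.25 * d_ab ** 2
    # Assumption: clamp at 0 to absorb floating-point rounding.
    return math.sqrt(max(sq, 0.0))


# Sanity check in the Euclidean plane: case 1 reproduces the distance
# from x to the ordinary midpoint of a and b.
a, b, x = (0.0, 0.0), (2.0, 0.0), (1.0, 2.0)
midpoint = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
g = ghost_distance(math.dist(x, a), math.dist(x, b), math.dist(a, b))
print(math.isclose(g, math.dist(x, midpoint)))  # True

# A non-metric triple (d(a,b) = 8 > 3 + 3) falls into case 2.
print(ghost_distance(3.0, 3.0, 8.0))  # 1.0
```

For non-negative distances the three non-metric conditions are mutually exclusive, so the order of the checks does not matter. Note that the case-2 expression, taken literally, becomes negative whenever d(x, b) > ½ d(a, b); the slide does not say how that situation is handled.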

Data Collection, Processing
● UCR time series data sets
○ 17 data sets from various domains are used
○ the number of classes ranges from 2 to 50
● MPEG-7
○ 1400 binary images consisting of 70 object classes
○ within each class there are 20 shapes
○ each shape is represented by 100 equidistant sample points on the contour
○ these points are converted into sequences by calculating the curvature at each point with
respect to its five neighbors on each side
○ this yields 1400 sequences, each of length 100
○ this transformation is invariant to rotation and scale
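
The slide does not give the curvature formula, so the following is only one possible reading: for each contour point, take the signed turning angle between the chord arriving from the neighbor five positions back and the chord leaving to the neighbor five positions ahead. The function names and the circle test shape are hypothetical.

```python
import math


def turning_angle_sequence(points, k=5):
    """Map a closed contour (list of (x, y)) to one signed turning angle per point.

    Assumption: only the neighbors at offset -k and +k are used.
    """
    n = len(points)
    seq = []
    for i in range(n):
        px, py = points[(i - k) % n]   # neighbor k steps back (wraps around)
        cx, cy = points[i]
        nx, ny = points[(i + k) % n]   # neighbor k steps ahead (wraps around)
        ux, uy = cx - px, cy - py      # incoming chord
        vx, vy = nx - cx, ny - cy      # outgoing chord
        cross = ux * vy - uy * vx
        dot = ux * vx + uy * vy
        seq.append(math.atan2(cross, dot))
    return seq


def circle(radius, n=100, phase=0.0):
    """n equidistant points on a counter-clockwise circle (toy contour)."""
    return [(radius * math.cos(phase + 2 * math.pi * i / n),
             radius * math.sin(phase + 2 * math.pi * i / n))
            for i in range(n)]


small = turning_angle_sequence(circle(1.0))
big_rotated = turning_angle_sequence(circle(7.0, phase=0.3))
print(len(small))  # 100
```

Because angles between chords do not change when the shape is scaled or rotated, `small` and `big_rotated` agree up to rounding, which matches the invariance claimed on the slide. For a general shape, a rotation may also change which contour point comes first, so the sequence can be cyclically shifted.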

Key Results
● UCR data sets with OSB
○ shaded results indicate the best performers
○ the darker the shade, the larger the difference
● UCR data sets with DTW
● MPEG-7 data set

Summary
● Proposes a new approach for over-sampling the minority
class of imbalanced data
● Unlike feature-based methods, the ghost points
are added in distance space
● Ghost points can be added to non-metric distance spaces
○ can be used with DTW, OSB, and many more
● Empirical results show significant improvement

Critique of work
● For large-scale data, over-sampling is time-consuming
● Introduces another parameter, i.e. the number of
ghost points to add
● May not perform well on highly noisy data
