SlideShare a Scribd company logo
1 of 40
이상 감지
(Anomaly Detection)
고등 지능 기술 연구회
(Advanced Intelligence Technology Research Society)
김철(ki4420@gmail.com)
2016-07-09
이상감지란?
데이터의 메인 스트림에서 벗어난 샘플
데이터 마이닝에서 이상감지는 예상 패턴 또는 정상 범
주를 준수하지 않는 아이템, 이벤트, 관찰들의 식별을 의
미.
outlier
이상감지란?(cont.)
Min:Max ≠ Outlier
1.5xIQR rule
IQR(Interquartile Range) = Q3 – Q1
Max
Min
이상감지란?(cont.)
이상 값은 전형적으로 문제의 한 증상으로 해석
일반적인 통계 정의에 따르지 않는 드문 현상
이상감지란?(cont.)
클러스터 알고리즘으로 이상 패턴에 의해 형성된
마이크로 클러스터를 검출
역사
Anomaly detection was proposed for intrusion
detection systems (IDS) by Dorothy Denning in 1986.
초기에는 정상 임계치, 통계량의 전처리, 소프트 컴퓨팅
그리고, 귀납적 학습
역사(cont.)
응용기술
사이버 침입 탐지, 신용카드 사기, 고장 감지, 시스템 건
전성 모니터링, IoT, etc.
생태계 교란을 감지
데이터에서 이상 값을 제거하는 데 자주 사용
3가지 분류
1. 비지도 이상 감지(Unsupervised anomaly detection)
- 레이블 없는 데이터에서 이상 감지
- K-means 클러스터 알고리즘으로 이상검출
2. 지도 이상 감지(Supervised anomaly detection)
- 정상(Normal), 비정상(Abnormal) 레이블이 존재
- 분류 모델 이용(SVM, Random forests, Logistic, Robust,
KNN, etc.)
3가지 분류(cont.)
3. 준지도 이상 감지(Semi-supervised anomaly detection)
- 정상(Normal) 레이블만 존재하고, 정상 모델에 의해 생성한
likelihood를 비교해서 이상 값을 추출
- NKIA’s LRSTSD based Anomaly Detection
- Twitter’s Seasonal Hybrid ESD (S-H-ESD) based Anomaly
Detection
NKIA’s Anomaly Detection Twitter’s Anomaly Detection
입력 데이터
단변량(Univariate) 다변량(Multivariate)
입력 데이터(cont.)
자료구조
- Binary
- Categorical
- Continuous
- Hybrid
이상값의 종류
Point Anomalies
- 데이터 셋의 뭉치에서 벗어나는 값
이상값의 종류(cont.)
Contextual Anomalies
- 컨텍스트에 동떨어진 값
- 컨텍스트의 개념이 필요
- 조건부 이상치의 참조(Rules)
이상값의 종류(cont.)
Collective Anomalies
- 수집 문제로 발생한 이상값
Output of Anomaly Detection
Label
- Label of normal or anomaly
- 분류문제 접근법에서 true|false or class
Score
- Rank
- 0:1
- Threshold parameter가 필요
이상감지의 평가
F-Measure
- 지도학습, 분류문제 평가
- Formula:
Recall(R) = TP / (TP + FN)
Precision(P) = TP / (TP + FP)
F-measure = 2*R*P/(R+P)
The Area Under an ROC Curve
- AUC(Area Under the Curve)
- Detection Rate(TP), False Alarm Rate(TN)
- 0:1
- Equation:
Confusion Actual class
Normal Anomaly
Predicted
class
Normal TP FP
Anomaly FN TN
이원교차표(Crosstable)
Score Label
.90 ~ 1 Excellent(A)
.80 ~ .90 Good(B)
.70 ~ .80 Fair(C)
.60 ~ .70 Poor(D)
.50 ~ .60 Fail(F)
평가표 ROC(Receiver Operating
Characteristic) Curves
m = # of TP, n = # of TN, 𝑝𝑖 = 𝑇𝑃 𝑅𝑎𝑡𝑒(Detection Rate), 𝑝𝑗 = 𝑇𝑁 𝑅𝑎𝑡𝑒(𝐹𝑎𝑙𝑠𝑒 𝐴𝑙𝑎𝑟𝑚 𝑅𝑎𝑡𝑒)
Taxonomy*
유명한 이상감지 기법들
Twitter’s Anomaly Detection R pack.
Twitter open-sourced their R package for anomaly
detection.
They call their algorithm Seasonal Hybrid ESD (S-H-
ESD), which is built on Generalized ESD.
Sometimes anomalies can mess up your modeling.
Twitter’s Anomaly Detection R pack.(cont.)
install.packages("devtools")
devtools::install_github("twitter/AnomalyDetection")
library(AnomalyDetection)
install.packages("gtable")
install.packages("scales")
data(raw_data)
res = AnomalyDetectionTs(raw_data, max_anoms=0.02,
direction='both', plot=TRUE)
res$plota
Twitter’s Anomaly Detection R pack.(cont.)
v <- read.csv("D:/r/tsd_paper/cpu_5m_02.csv")
res2 = AnomalyDetectionVec(v, max_anoms=0.02, period=72,
direction='both', plot=TRUE)
res2$plot
Twitter’s Anomaly Detection R pack.(cont.)
Usage
AnomalyDetectionTs(x, max_anoms = 0.1, direction = "pos", alpha = 0.05, only_last = NULL, threshold = "None", e_value =
FALSE, longterm = FALSE, piecewise_median_period_weeks = 2, plot = FALSE, y_log = FALSE, xlabel = "", ylabel = "count", title
= NULL, verbose = FALSE)
Arguments
X : Time series as a two column data frame where the first column consists of the timestamps and the second column consists
of the observations.
max_anoms : Maximum number of anomalies that S-H-ESD will detect as a percentage of the data.
direction : Directionality of the anomalies to be detected. Options are: 'pos' | 'neg' | 'both'.
alpha : The level of statistical significance with which to accept or reject anomalies.
only_last : Find and report anomalies only within the last day or hr in the time series. NULL | 'day' | 'hr'.
threshold : Only report positive going anoms above the threshold specified. Options are: 'None' | 'med_max' | 'p95' | 'p99'.
e_value : Add an additional column to the anoms output containing the expected value.
longterm : Increase anom detection efficacy for time series that are greater than a month. See Details below.
piecewise_median_period_weeks : The piecewise median time window as described in Vallis, Hochenbaum, and Kejariwal (2014).
Defaults to 2.
Twitter’s Anomaly Detection R pack.(cont.)
Usage
AnomalyDetectionTs(x, max_anoms = 0.1, direction = "pos", alpha = 0.05, only_last = NULL, threshold = "None", e_value =
FALSE, longterm = FALSE, piecewise_median_period_weeks = 2, plot = FALSE, y_log = FALSE, xlabel = "", ylabel = "count", title
= NULL, verbose = FALSE)
Arguments(cont.)
plot : A flag indicating if a plot with both the time series and the estimated anoms, indicated by circles, should also be returned.
y_log : Apply log scaling to the y-axis. This helps with viewing plots that have extremely large positive anomalies relative to the
rest of the data.
xlabel : X-axis label to be added to the output plot.
ylabel : Y-axis label to be added to the output plot.
title : Title for the output plot.
verbose : Enable debug messages
Twitter’s Anomaly Detection R pack.(cont.)
To understand how twitter’s algorithm works, you need
to know.
- Student t-distribution
- Extreme Studentized Deviate (ESD) test
- Generalized ESD
- Linear regression
- LOESS
- STL(Seasonal Trend LOESS)
Twitter’s Anomaly Detection R pack.(cont.)
Student t-distribution
정규 분포의 평균을 측정할 때 주로 사용되는 분포
PDF
t
Twitter’s Anomaly Detection R pack.(cont.)
Extreme Studentized Deviate (ESD) test
Twitter’s Anomaly Detection R pack.(cont.)
Generalized ESD
Twitter’s Anomaly Detection R pack.(cont.)
Seasonality(linear regression, LOESS, STL)
The generalized ESD works when you have a set of points from a normal distribution,
but real data has some seasonality. This is where STL comes in. It decomposes the data
into a season part, a trend and whatever’s left over using local regression (LOESS), which
fits a low order polynomial to a subset of the data and stitches them together by
weighting them. Since you can remove the trend and seasonal part with loess, you
should be left with something that is more or less normally distributed. You can apply
generalized ESD on what’s left over to detect anomalies.
#STL: “Seasonal and Trend decomposition using Loess”
Seasonality Local regression(LOESS) Polynomial regression
Twitter: Introducing practical and robust
anomaly detection in a time series
Global/Local
At Twitter, we observe distinct seasonal patterns in most of the time series.
Global: global anomalies typically extend above or below expected seasonality and are
therefore not subject to seasonality and underlying trend
Local: anomalies which occur inside seasonal patterns, are masked and thus are much
more difficult to detect in a robust fashion.
Positive/Negative
Positive: 슈퍼볼 경기 동안의 트윗 폭증 등(이벤트에 대한 용량 산정을 위해 사용)
Negative: 초당 쿼리수(QPS[Queries Per Second])의 증가 등 잠재적인 하드웨어나 데이터
수집 이슈를 발견
Subspace- and correlation-based outlier
detection for high-dimensional data.
주성분 분석(PCA), 요인 분석(Dimension reduction)을 이용하여
차원 축소
부분공간(Subspace)의 대비(Contrast)를 계산하여 이상을 감지
Subspace- and correlation-based outlier
detection for high-dimensional data.(cont.)
HiCS: High Contrast Subspaces for Density-Based Outlier Ranking
RNN(Replicator neural networks)
에러를 최소화해서 입력 패턴을 재생하는 방법
정상 모델을 생성하여 이상값을 추출
A schematic view of a fully connected
Replicator Neural Network.
𝑂𝐹𝑖 = i번째 요소의 Anomaly Factor 스코어
𝑛 = # of features
𝑥𝑖𝑗 = i번째 요소의 j컬럼 관측값
𝑜𝑖𝑗 = i번째 요소의 j컬럼 RNN으로 재생한 정규값
LOF(Local Outlier Factor)
Density-based anomaly detection by KNN
Score를 제공하여 해석이 용이하나 delay time이 좀 있음.
Unsupervised anomaly detection
Basic idea of LOF: comparing the local density of a point with the densities of its neighbors. A has a much lower
density than its neighbors
LOF(Local Outlier Factor)(cont.)
Formula:
Illustration of the
reachability distance.
Objects B and C have the
same reachability distance
(k=3), while D is not a k
nearest neighbor
LOF(Local Outlier Factor)(cont.)
LOF scores as visualized by ELKI. While the upper right cluster has a
comparable density to the outliers close to the bottom left cluster, they
are detected correctly.
LOF(Local Outlier Factor)(cont.)
LOF scores of cpu util. vs. Time by Rlof
LRSTSD(Log regression seasonality based
approach of time series decomposition)
Anomaly score formula:
Anomaly score
1일 네트워크 트래픽Tx 7일 네트워크 트래픽Tx
𝐸𝑖 = i번째 에러
𝐴𝑖 = i번째 관측값
𝑈𝑖 = i번째 예측 상한 값
𝐿𝑖 = i번째 예측 하한 값
𝑃 = 전체 값(Parameter)
결론
이상감지는 예측 모델 생성 시 Noise를 제거할 수 있는 기술
 예측률 향상 기대
데이터의 오탐/수집 실패를 감지
 Resampling, 보정 등 적절한 대처가 가능
관측된 이상 값과 문제와의 연관성 분석
 문제에 대한 사전 감지 기술로 활용
 고장 예측
참고문헌
• https://en.wikipedia.org/wiki/Anomaly_detection
• http://datascience.stackexchange.com/questions/2313/mach
ine-learning-where-is-the-difference-between-one-class-
binary-class-and-m
• https://en.wikipedia.org/wiki/Outlier#Detection
• https://www.semanticscholar.org/paper/Outlier-Detection-
Using-Replicator-Neural-Networks-Hawkins-
He/87a09c777dcecab4883e328669ef2af1ba8dd7be
• http://neuro.bstu.by/ai/To-dom/My_research/Papers-0/For-
research/D-mining/Anomaly-D/KDD-cup-
99/NN/dawak02.pdf
• http://slideplayer.com/slide/4194183/
• http://link.springer.com/chapter/10.1007%2F978-981-10-
0281-6_118#page-1
• https://cran.r-project.org/web/packages/Rlof/index.html
• https://warrenmar.wordpress.com/tag/seasonal-hybrid-esd/
• https://ko.wikipedia.org/wiki/%EC%8A%A4%ED%8A%9C%EB
%8D%98%ED%8A%B8_t_%EB%B6%84%ED%8F%AC
• https://en.wikipedia.org/wiki/Soft_computing
• https://www.google.com/trends/explore#q=anomaly%2C%20%2Fm%
2F02vnd10%2C%20%2Fm%2F0bs2j8q&cmpt=q&tz=Etc%2FGMT-9
• http://www.slideserve.com/sidonie/data-mining-for-anomaly-
detection
• http://www.physics.csbsju.edu/stats/box2.html
• http://study.com/academy/lesson/maximums-minimums-outliers-in-
a-data-set-lesson-quiz.html
• http://www.sfu.ca/~jackd/Stat203/Wk02_1_Full.pdf
• http://slideplayer.com/slide/6321088/
• http://gim.unmc.edu/dxtests/roc3.htm
• http://www.cs.ru.nl/~tomh/onderwijs/dm/dm_files/roc_auc.pdf
• http://togaware.com/papers/dawak02.pdf
• https://en.wikipedia.org/wiki/Grubbs%27_test_for_outliers
• https://github.com/twitter/AnomalyDetection
• https://blog.twitter.com/2015/introducing-practical-and-robust-
anomaly-detection-in-a-time-series

More Related Content

What's hot

Anomaly Detection Technique
Anomaly Detection TechniqueAnomaly Detection Technique
Anomaly Detection TechniqueChakrit Phain
 
Introduction to Machine Learning
Introduction to Machine Learning   Introduction to Machine Learning
Introduction to Machine Learning snehal_152
 
Trends in DM.pptx
Trends in DM.pptxTrends in DM.pptx
Trends in DM.pptxImXaib
 
Neural Networks: Multilayer Perceptron
Neural Networks: Multilayer PerceptronNeural Networks: Multilayer Perceptron
Neural Networks: Multilayer PerceptronMostafa G. M. Mostafa
 
Anomaly Detection in Seasonal Time Series
Anomaly Detection in Seasonal Time SeriesAnomaly Detection in Seasonal Time Series
Anomaly Detection in Seasonal Time SeriesHumberto Marchezi
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNNAshray Bhandare
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptronomaraldabash
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것NAVER Engineering
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature EngineeringHJ van Veen
 
Needle in the Haystack—User Behavior Anomaly Detection for Information Securi...
Needle in the Haystack—User Behavior Anomaly Detection for Information Securi...Needle in the Haystack—User Behavior Anomaly Detection for Information Securi...
Needle in the Haystack—User Behavior Anomaly Detection for Information Securi...Databricks
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Marina Santini
 
Unsupervised Data Augmentation for Consistency Training
Unsupervised Data Augmentation for Consistency TrainingUnsupervised Data Augmentation for Consistency Training
Unsupervised Data Augmentation for Consistency TrainingSungchul Kim
 
220206 transformer interpretability beyond attention visualization
220206 transformer interpretability beyond attention visualization220206 transformer interpretability beyond attention visualization
220206 transformer interpretability beyond attention visualizationtaeseon ryu
 
내가 이해하는 SVM(왜, 어떻게를 중심으로)
내가 이해하는 SVM(왜, 어떻게를 중심으로)내가 이해하는 SVM(왜, 어떻게를 중심으로)
내가 이해하는 SVM(왜, 어떻게를 중심으로)SANG WON PARK
 
Active learning lecture
Active learning lectureActive learning lecture
Active learning lectureazuring
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networksYunjey Choi
 
Artificial Neural Networks - ANN
Artificial Neural Networks - ANNArtificial Neural Networks - ANN
Artificial Neural Networks - ANNMohamed Talaat
 
Anomaly/Novelty detection with scikit-learn
Anomaly/Novelty detection with scikit-learnAnomaly/Novelty detection with scikit-learn
Anomaly/Novelty detection with scikit-learnagramfort
 

What's hot (20)

Anomaly Detection Technique
Anomaly Detection TechniqueAnomaly Detection Technique
Anomaly Detection Technique
 
Introduction to Machine Learning
Introduction to Machine Learning   Introduction to Machine Learning
Introduction to Machine Learning
 
Trends in DM.pptx
Trends in DM.pptxTrends in DM.pptx
Trends in DM.pptx
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
Neural Networks: Multilayer Perceptron
Neural Networks: Multilayer PerceptronNeural Networks: Multilayer Perceptron
Neural Networks: Multilayer Perceptron
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
Anomaly Detection in Seasonal Time Series
Anomaly Detection in Seasonal Time SeriesAnomaly Detection in Seasonal Time Series
Anomaly Detection in Seasonal Time Series
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNN
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptron
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Needle in the Haystack—User Behavior Anomaly Detection for Information Securi...
Needle in the Haystack—User Behavior Anomaly Detection for Information Securi...Needle in the Haystack—User Behavior Anomaly Detection for Information Securi...
Needle in the Haystack—User Behavior Anomaly Detection for Information Securi...
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 
Unsupervised Data Augmentation for Consistency Training
Unsupervised Data Augmentation for Consistency TrainingUnsupervised Data Augmentation for Consistency Training
Unsupervised Data Augmentation for Consistency Training
 
220206 transformer interpretability beyond attention visualization
220206 transformer interpretability beyond attention visualization220206 transformer interpretability beyond attention visualization
220206 transformer interpretability beyond attention visualization
 
내가 이해하는 SVM(왜, 어떻게를 중심으로)
내가 이해하는 SVM(왜, 어떻게를 중심으로)내가 이해하는 SVM(왜, 어떻게를 중심으로)
내가 이해하는 SVM(왜, 어떻게를 중심으로)
 
Active learning lecture
Active learning lectureActive learning lecture
Active learning lecture
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Artificial Neural Networks - ANN
Artificial Neural Networks - ANNArtificial Neural Networks - ANN
Artificial Neural Networks - ANN
 
Anomaly/Novelty detection with scikit-learn
Anomaly/Novelty detection with scikit-learnAnomaly/Novelty detection with scikit-learn
Anomaly/Novelty detection with scikit-learn
 

Viewers also liked

Statistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ TwitterStatistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ TwitterArun Kejariwal
 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache SparkCloudera, Inc.
 
Welcher Test gewinnt?
Welcher Test gewinnt?Welcher Test gewinnt?
Welcher Test gewinnt?Silke Berz
 
Anomaly Detection with BigML
Anomaly Detection with BigMLAnomaly Detection with BigML
Anomaly Detection with BigMLDavid Gerster
 
What is jubatus? How it works for you?
What is jubatus? How it works for you?What is jubatus? How it works for you?
What is jubatus? How it works for you?Kumazaki Hiroki
 
Ansibleを使ってローカル開発環境を作ろう ( #PyLadiesTokyo Meetup )
Ansibleを使ってローカル開発環境を作ろう ( #PyLadiesTokyo Meetup ) Ansibleを使ってローカル開発環境を作ろう ( #PyLadiesTokyo Meetup )
Ansibleを使ってローカル開発環境を作ろう ( #PyLadiesTokyo Meetup ) Ai Makabi
 
Vector space - subspace By Jatin Dhola
Vector space - subspace By Jatin DholaVector space - subspace By Jatin Dhola
Vector space - subspace By Jatin DholaJatin Dhola
 
Time series Analysis & fpp package
Time series Analysis & fpp packageTime series Analysis & fpp package
Time series Analysis & fpp packageDr. Fiona McGroarty
 
Anomaly Detection for Real-World Systems
Anomaly Detection for Real-World SystemsAnomaly Detection for Real-World Systems
Anomaly Detection for Real-World SystemsManojit Nandi
 
Network_Intrusion_Detection_System_Team1
Network_Intrusion_Detection_System_Team1Network_Intrusion_Detection_System_Team1
Network_Intrusion_Detection_System_Team1Saksham Agrawal
 
Real time analytics @ netflix
Real time analytics @ netflixReal time analytics @ netflix
Real time analytics @ netflixCody Rioux
 
単純ベイズ法による異常検知 #ml-professional
単純ベイズ法による異常検知  #ml-professional単純ベイズ法による異常検知  #ml-professional
単純ベイズ法による異常検知 #ml-professionalAi Makabi
 
Chapter 01 #ml-professional
Chapter 01 #ml-professionalChapter 01 #ml-professional
Chapter 01 #ml-professionalAi Makabi
 
Anomaly detection Meetup Slides
Anomaly detection Meetup SlidesAnomaly detection Meetup Slides
Anomaly detection Meetup SlidesQuantUniversity
 
Anomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) EnglishAnomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) EnglishAdam Gibson
 
[devil's camp] - 알고리즘 대회와 STL (박인서)
[devil's camp] - 알고리즘 대회와 STL (박인서)[devil's camp] - 알고리즘 대회와 STL (박인서)
[devil's camp] - 알고리즘 대회와 STL (박인서)NAVER D2
 
Chapter 02 #ml-professional
Chapter 02  #ml-professionalChapter 02  #ml-professional
Chapter 02 #ml-professionalAi Makabi
 
Anomaly detection, part 1
Anomaly detection, part 1Anomaly detection, part 1
Anomaly detection, part 1David Khosid
 
Chapter 10 Anomaly Detection
Chapter 10 Anomaly DetectionChapter 10 Anomaly Detection
Chapter 10 Anomaly DetectionKhalid Elshafie
 

Viewers also liked (20)

Statistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ TwitterStatistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ Twitter
 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
 
Welcher Test gewinnt?
Welcher Test gewinnt?Welcher Test gewinnt?
Welcher Test gewinnt?
 
Anomaly Detection with BigML
Anomaly Detection with BigMLAnomaly Detection with BigML
Anomaly Detection with BigML
 
What is jubatus? How it works for you?
What is jubatus? How it works for you?What is jubatus? How it works for you?
What is jubatus? How it works for you?
 
Ansibleを使ってローカル開発環境を作ろう ( #PyLadiesTokyo Meetup )
Ansibleを使ってローカル開発環境を作ろう ( #PyLadiesTokyo Meetup ) Ansibleを使ってローカル開発環境を作ろう ( #PyLadiesTokyo Meetup )
Ansibleを使ってローカル開発環境を作ろう ( #PyLadiesTokyo Meetup )
 
Vector space - subspace By Jatin Dhola
Vector space - subspace By Jatin DholaVector space - subspace By Jatin Dhola
Vector space - subspace By Jatin Dhola
 
Time series Analysis & fpp package
Time series Analysis & fpp packageTime series Analysis & fpp package
Time series Analysis & fpp package
 
Anomaly Detection for Real-World Systems
Anomaly Detection for Real-World SystemsAnomaly Detection for Real-World Systems
Anomaly Detection for Real-World Systems
 
PyGotham 2016
PyGotham 2016PyGotham 2016
PyGotham 2016
 
Network_Intrusion_Detection_System_Team1
Network_Intrusion_Detection_System_Team1Network_Intrusion_Detection_System_Team1
Network_Intrusion_Detection_System_Team1
 
Real time analytics @ netflix
Real time analytics @ netflixReal time analytics @ netflix
Real time analytics @ netflix
 
単純ベイズ法による異常検知 #ml-professional
単純ベイズ法による異常検知  #ml-professional単純ベイズ法による異常検知  #ml-professional
単純ベイズ法による異常検知 #ml-professional
 
Chapter 01 #ml-professional
Chapter 01 #ml-professionalChapter 01 #ml-professional
Chapter 01 #ml-professional
 
Anomaly detection Meetup Slides
Anomaly detection Meetup SlidesAnomaly detection Meetup Slides
Anomaly detection Meetup Slides
 
Anomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) EnglishAnomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) English
 
[devil's camp] - 알고리즘 대회와 STL (박인서)
[devil's camp] - 알고리즘 대회와 STL (박인서)[devil's camp] - 알고리즘 대회와 STL (박인서)
[devil's camp] - 알고리즘 대회와 STL (박인서)
 
Chapter 02 #ml-professional
Chapter 02  #ml-professionalChapter 02  #ml-professional
Chapter 02 #ml-professional
 
Anomaly detection, part 1
Anomaly detection, part 1Anomaly detection, part 1
Anomaly detection, part 1
 
Chapter 10 Anomaly Detection
Chapter 10 Anomaly DetectionChapter 10 Anomaly Detection
Chapter 10 Anomaly Detection
 

Similar to Anomaly detection

Anomaly Detection in Sequences of Short Text Using Iterative Language Models
Anomaly Detection in Sequences of Short Text Using Iterative Language ModelsAnomaly Detection in Sequences of Short Text Using Iterative Language Models
Anomaly Detection in Sequences of Short Text Using Iterative Language ModelsCynthia Freeman
 
Encoder for (7,3) cyclic code using matlab
Encoder for (7,3) cyclic code using matlabEncoder for (7,3) cyclic code using matlab
Encoder for (7,3) cyclic code using matlabSneheshDutta
 
Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzerbutest
 
Deep learning MindMap
Deep learning MindMapDeep learning MindMap
Deep learning MindMapAshish Patel
 
Nural network ER. Abhishek k. upadhyay
Nural network ER. Abhishek  k. upadhyayNural network ER. Abhishek  k. upadhyay
Nural network ER. Abhishek k. upadhyayabhishek upadhyay
 
Introduction
IntroductionIntroduction
Introductionbutest
 
Advanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxAdvanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxakashayosha
 
Adaptive equalization
Adaptive equalizationAdaptive equalization
Adaptive equalizationKamal Bhatt
 
Lecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksLecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksSang Jun Lee
 
CBM Fault Detection by Carl Byington
CBM Fault Detection by Carl ByingtonCBM Fault Detection by Carl Byington
CBM Fault Detection by Carl ByingtonCarl Byington
 
Hierarchical Temporal Memory for Real-time Anomaly Detection
Hierarchical Temporal Memory for Real-time Anomaly DetectionHierarchical Temporal Memory for Real-time Anomaly Detection
Hierarchical Temporal Memory for Real-time Anomaly DetectionIhor Bobak
 
EBDSS Max Research Report - Final
EBDSS  Max  Research Report - FinalEBDSS  Max  Research Report - Final
EBDSS Max Research Report - FinalMax Robertson
 
Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)Oswald Campesato
 
Cheat sheets for AI
Cheat sheets for AICheat sheets for AI
Cheat sheets for AINcib Lotfi
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learningRADO7900
 
Identification of Outliersin Time Series Data via Simulation Study
Identification of Outliersin Time Series Data via Simulation StudyIdentification of Outliersin Time Series Data via Simulation Study
Identification of Outliersin Time Series Data via Simulation Studyiosrjce
 

Similar to Anomaly detection (20)

Anomaly Detection in Sequences of Short Text Using Iterative Language Models
Anomaly Detection in Sequences of Short Text Using Iterative Language ModelsAnomaly Detection in Sequences of Short Text Using Iterative Language Models
Anomaly Detection in Sequences of Short Text Using Iterative Language Models
 
tutorial.ppt
tutorial.ppttutorial.ppt
tutorial.ppt
 
Encoder for (7,3) cyclic code using matlab
Encoder for (7,3) cyclic code using matlabEncoder for (7,3) cyclic code using matlab
Encoder for (7,3) cyclic code using matlab
 
Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzer
 
Java and Deep Learning
Java and Deep LearningJava and Deep Learning
Java and Deep Learning
 
TamingStatistics
TamingStatisticsTamingStatistics
TamingStatistics
 
Deep learning MindMap
Deep learning MindMapDeep learning MindMap
Deep learning MindMap
 
Nural network ER. Abhishek k. upadhyay
Nural network ER. Abhishek  k. upadhyayNural network ER. Abhishek  k. upadhyay
Nural network ER. Abhishek k. upadhyay
 
Introduction
IntroductionIntroduction
Introduction
 
Advanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptxAdvanced Econometrics L5-6.pptx
Advanced Econometrics L5-6.pptx
 
Adaptive equalization
Adaptive equalizationAdaptive equalization
Adaptive equalization
 
Lecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksLecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural Networks
 
CBM Fault Detection by Carl Byington
CBM Fault Detection by Carl ByingtonCBM Fault Detection by Carl Byington
CBM Fault Detection by Carl Byington
 
PhysicsSIG2008-01-Seneviratne
PhysicsSIG2008-01-SeneviratnePhysicsSIG2008-01-Seneviratne
PhysicsSIG2008-01-Seneviratne
 
Hierarchical Temporal Memory for Real-time Anomaly Detection
Hierarchical Temporal Memory for Real-time Anomaly DetectionHierarchical Temporal Memory for Real-time Anomaly Detection
Hierarchical Temporal Memory for Real-time Anomaly Detection
 
EBDSS Max Research Report - Final
EBDSS  Max  Research Report - FinalEBDSS  Max  Research Report - Final
EBDSS Max Research Report - Final
 
Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)
 
Cheat sheets for AI
Cheat sheets for AICheat sheets for AI
Cheat sheets for AI
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learning
 
Identification of Outliersin Time Series Data via Simulation Study
Identification of Outliersin Time Series Data via Simulation StudyIdentification of Outliersin Time Series Data via Simulation Study
Identification of Outliersin Time Series Data via Simulation Study
 

Recently uploaded

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 

Recently uploaded (20)

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 

Anomaly detection

  • 1. 이상 감지 (Anomaly Detection) 고등 지능 기술 연구회 (Advanced Intelligence Technology Research Society) 김철(ki4420@gmail.com) 2016-07-09
  • 2. 이상감지란? 데이터의 메인 스트림에서 벗어난 샘플 데이터 마이닝에서 이상감지는 예상 패턴 또는 정상 범 주를 준수하지 않는 아이템, 이벤트, 관찰들의 식별을 의 미. outlier
  • 3. 이상감지란?(cont.) Min:Max ≠ Outlier 1.5xIQR rule IQR(Interquartile Range) = Q3 – Q1 Max Min
  • 4. 이상감지란?(cont.) 이상 값은 전형적으로 문제의 한 증상으로 해석 일반적인 통계 정의에 따르지 않는 드문 현상
  • 5. 이상감지란?(cont.) 클러스터 알고리즘으로 이상 패턴에 의해 형성된 마이크로 클러스터를 검출
  • 6. 역사 Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986. 초기에는 정상 임계치, 통계량의 전처리, 소프트 컴퓨팅 그리고, 귀납적 학습
  • 8. 응용기술 사이버 침입 탐지, 신용카드 사기, 고장 감지, 시스템 건 전성 모니터링, IoT, etc. 생태계 교란을 감지 데이터에서 이상 값을 제거하는 데 자주 사용
  • 9. 3가지 분류 1. 비지도 이상 감지(Unsupervised anomaly detection) - 레이블 없는 데이터에서 이상 감지 - K-means 클러스터 알고리즘으로 이상검출 2. 지도 이상 감지(Supervised anomaly detection) - 정상(Normal), 비정상(Abnormal) 레이블이 존재 - 분류 모델 이용(SVM, Random forests, Logistic, Robust, KNN, etc.)
  • 10. 3가지 분류(cont.) 3. 준지도 이상 감지(Semi-supervised anomaly detection) - 정상(Normal) 레이블만 존재하고, 정상 모델에 의해 생성한 likelihood를 비교해서 이상 값을 추출 - NKIA’s LRSTSD based Anomaly Detection - Twitter’s Seasonal Hybrid ESD (S-H-ESD) based Anomaly Detection NKIA’s Anomaly Detection Twitter’s Anomaly Detection
  • 12. 입력 데이터(cont.) 자료구조 - Binary - Categorical - Continuous - Hybrid
  • 13. 이상값의 종류 Point Anomalies - 데이터 셋의 뭉치에서 벗어나는 값
  • 14. 이상값의 종류(cont.) Contextual Anomalies - 컨텍스트에 동떨어진 값 - 컨텍스트의 개념이 필요 - 조건부 이상치의 참조(Rules)
  • 15. 이상값의 종류(cont.) Collective Anomalies - 수집 문제로 발생한 이상값
  • 16. Output of Anomaly Detection Label - Label of normal or anomaly - 분류문제 접근법에서 true|false or class Score - Rank - 0:1 - Threshold parameter가 필요
  • 17. 이상감지의 평가 F-Measure - 지도학습, 분류문제 평가 - Formula: Recall(R) = TP / (TP + FN) Precision(P) = TP / (TP + FP) F-measure = 2*R*P/(R+P) The Area Under an ROC Curve - AUC(Area Under the Curve) - Detection Rate(TP), False Alarm Rate(TN) - 0:1 - Equation: Confusion Actual class Normal Anomaly Predicted class Normal TP FP Anomaly FN TN 이원교차표(Crosstable) Score Label .90 ~ 1 Excellent(A) .80 ~ .90 Good(B) .70 ~ .80 Fair(C) .60 ~ .70 Poor(D) .50 ~ .60 Fail(F) 평가표 ROC(Receiver Operating Characteristic) Curves m = # of TP, n = # of TN, 𝑝𝑖 = 𝑇𝑃 𝑅𝑎𝑡𝑒(Detection Rate), 𝑝𝑗 = 𝑇𝑁 𝑅𝑎𝑡𝑒(𝐹𝑎𝑙𝑠𝑒 𝐴𝑙𝑎𝑟𝑚 𝑅𝑎𝑡𝑒)
  • 20. Twitter’s Anomaly Detection R pack. Twitter open-sourced their R package for anomaly detection. They call their algorithm Seasonal Hybrid ESD (S-H- ESD), which is built on Generalized ESD. Sometimes anomalies can mess up your modeling.
  • 21. Twitter’s Anomaly Detection R pack.(cont.) install.packages("devtools") devtools::install_github("twitter/AnomalyDetection") library(AnomalyDetection) install.packages("gtable") install.packages("scales") data(raw_data) res = AnomalyDetectionTs(raw_data, max_anoms=0.02, direction='both', plot=TRUE) res$plota
  • 22. Twitter’s Anomaly Detection R pack.(cont.) v <- read.csv("D:/r/tsd_paper/cpu_5m_02.csv") res2 = AnomalyDetectionVec(v, max_anoms=0.02, period=72, direction='both', plot=TRUE) res2$plot
  • 23. Twitter’s Anomaly Detection R pack.(cont.) Usage AnomalyDetectionTs(x, max_anoms = 0.1, direction = "pos", alpha = 0.05, only_last = NULL, threshold = "None", e_value = FALSE, longterm = FALSE, piecewise_median_period_weeks = 2, plot = FALSE, y_log = FALSE, xlabel = "", ylabel = "count", title = NULL, verbose = FALSE) Arguments X : Time series as a two column data frame where the first column consists of the timestamps and the second column consists of the observations. max_anoms : Maximum number of anomalies that S-H-ESD will detect as a percentage of the data. direction : Directionality of the anomalies to be detected. Options are: 'pos' | 'neg' | 'both'. alpha : The level of statistical significance with which to accept or reject anomalies. only_last : Find and report anomalies only within the last day or hr in the time series. NULL | 'day' | 'hr'. threshold : Only report positive going anoms above the threshold specified. Options are: 'None' | 'med_max' | 'p95' | 'p99'. e_value : Add an additional column to the anoms output containing the expected value. longterm : Increase anom detection efficacy for time series that are greater than a month. See Details below. piecewise_median_period_weeks : The piecewise median time window as described in Vallis, Hochenbaum, and Kejariwal (2014). Defaults to 2.
  • 24. Twitter’s Anomaly Detection R pack.(cont.) Usage AnomalyDetectionTs(x, max_anoms = 0.1, direction = "pos", alpha = 0.05, only_last = NULL, threshold = "None", e_value = FALSE, longterm = FALSE, piecewise_median_period_weeks = 2, plot = FALSE, y_log = FALSE, xlabel = "", ylabel = "count", title = NULL, verbose = FALSE) Arguments(cont.) plot : A flag indicating if a plot with both the time series and the estimated anoms, indicated by circles, should also be returned. y_log : Apply log scaling to the y-axis. This helps with viewing plots that have extremely large positive anomalies relative to the rest of the data. xlabel : X-axis label to be added to the output plot. ylabel : Y-axis label to be added to the output plot. title : Title for the output plot. verbose : Enable debug messages
  • 25. Twitter’s Anomaly Detection R pack.(cont.) To understand how twitter’s algorithm works, you need to know. - Student t-distribution - Extreme Studentized Deviate (ESD) test - Generalized ESD - Linear regression - LOESS - STL(Seasonal Trend LOESS)
  • 26. Twitter’s Anomaly Detection R pack.(cont.) Student t-distribution 정규 분포의 평균을 측정할 때 주로 사용되는 분포 PDF t
  • 27. Twitter’s Anomaly Detection R pack.(cont.) Extreme Studentized Deviate (ESD) test
  • 28. Twitter’s Anomaly Detection R pack.(cont.) Generalized ESD
  • 29. Twitter’s Anomaly Detection R pack.(cont.) Seasonality(linear regression, LOESS, STL) The generalized ESD works when you have a set of points from a normal distribution, but real data has some seasonality. This is where STL comes in. It decomposes the data into a season part, a trend and whatever’s left over using local regression (LOESS), which fits a low order polynomial to a subset of the data and stitches them together by weighting them. Since you can remove the trend and seasonal part with loess, you should be left with something that is more or less normally distributed. You can apply generalized ESD on what’s left over to detect anomalies. #STL: “Seasonal and Trend decomposition using Loess” Seasonality Local regression(LOESS) Polynomial regression
  • 30. Twitter: Introducing practical and robust anomaly detection in a time series Global/Local At Twitter, we observe distinct seasonal patterns in most of the time series. Global: global anomalies typically extend above or below expected seasonality and are therefore not subject to seasonality and underlying trend Local: anomalies which occur inside seasonal patterns, are masked and thus are much more difficult to detect in a robust fashion. Positive/Negative Positive: 슈퍼볼 경기 동안의 트윗 폭증 등(이벤트에 대한 용량 산정을 위해 사용) Negative: 초당 쿼리수(QPS[Queries Per Second])의 증가 등 잠재적인 하드웨어나 데이터 수집 이슈를 발견
  • 31. Subspace- and correlation-based outlier detection for high-dimensional data. 주성분 분석(PCA), 요인 분석(Dimension reduction)을 이용하여 차원 축소 부분공간(Subspace)의 대비(Contrast)를 계산하여 이상을 감지
  • 32. Subspace- and correlation-based outlier detection for high-dimensional data.(cont.) HiCS: High Contrast Subspaces for Density-Based Outlier Ranking
  • 33. RNN(Replicator neural networks) 에러를 최소화해서 입력 패턴을 재생하는 방법 정상 모델을 생성하여 이상값을 추출 A schematic view of a fully connected Replicator Neural Network. 𝑂𝐹𝑖 = i번째 요소의 Anomaly Factor 스코어 𝑛 = # of features 𝑥𝑖𝑗 = i번째 요소의 j컬럼 관측값 𝑜𝑖𝑗 = i번째 요소의 j컬럼 RNN으로 재생한 정규값
  • 34. LOF(Local Outlier Factor) Density-based anomaly detection by KNN Score를 제공하여 해석이 용이하나 delay time이 좀 있음. Unsupervised anomaly detection Basic idea of LOF: comparing the local density of a point with the densities of its neighbors. A has a much lower density than its neighbors
  • 35. LOF(Local Outlier Factor)(cont.) Formula: Illustration of the reachability distance. Objects B and C have the same reachability distance (k=3), while D is not a k nearest neighbor
  • 36. LOF(Local Outlier Factor)(cont.) LOF scores as visualized by ELKI. While the upper right cluster has a comparable density to the outliers close to the bottom left cluster, they are detected correctly.
  • 37. LOF(Local Outlier Factor)(cont.) LOF scores of cpu util. vs. Time by Rlof
  • 38. LRSTSD(Log regression seasonality based approach of time series decomposition) Anomaly score formula: Anomaly score 1일 네트워크 트래픽Tx 7일 네트워크 트래픽Tx 𝐸𝑖 = i번째 에러 𝐴𝑖 = i번째 관측값 𝑈𝑖 = i번째 예측 상한 값 𝐿𝑖 = i번째 예측 하한 값 𝑃 = 전체 값(Parameter)
  • 39. 결론 이상감지는 예측 모델 생성 시 Noise를 제거할 수 있는 기술  예측률 향상 기대 데이터의 오탐/수집 실패를 감지  Resampling, 보정 등 적절한 대처가 가능 관측된 이상 값과 문제와의 연관성 분석  문제에 대한 사전 감지 기술로 활용  고장 예측
  • 40. 참고문헌 • https://en.wikipedia.org/wiki/Anomaly_detection • http://datascience.stackexchange.com/questions/2313/mach ine-learning-where-is-the-difference-between-one-class- binary-class-and-m • https://en.wikipedia.org/wiki/Outlier#Detection • https://www.semanticscholar.org/paper/Outlier-Detection- Using-Replicator-Neural-Networks-Hawkins- He/87a09c777dcecab4883e328669ef2af1ba8dd7be • http://neuro.bstu.by/ai/To-dom/My_research/Papers-0/For- research/D-mining/Anomaly-D/KDD-cup- 99/NN/dawak02.pdf • http://slideplayer.com/slide/4194183/ • http://link.springer.com/chapter/10.1007%2F978-981-10- 0281-6_118#page-1 • https://cran.r-project.org/web/packages/Rlof/index.html • https://warrenmar.wordpress.com/tag/seasonal-hybrid-esd/ • https://ko.wikipedia.org/wiki/%EC%8A%A4%ED%8A%9C%EB %8D%98%ED%8A%B8_t_%EB%B6%84%ED%8F%AC • https://en.wikipedia.org/wiki/Soft_computing • https://www.google.com/trends/explore#q=anomaly%2C%20%2Fm% 2F02vnd10%2C%20%2Fm%2F0bs2j8q&cmpt=q&tz=Etc%2FGMT-9 • http://www.slideserve.com/sidonie/data-mining-for-anomaly- detection • http://www.physics.csbsju.edu/stats/box2.html • http://study.com/academy/lesson/maximums-minimums-outliers-in- a-data-set-lesson-quiz.html • http://www.sfu.ca/~jackd/Stat203/Wk02_1_Full.pdf • http://slideplayer.com/slide/6321088/ • http://gim.unmc.edu/dxtests/roc3.htm • http://www.cs.ru.nl/~tomh/onderwijs/dm/dm_files/roc_auc.pdf • http://togaware.com/papers/dawak02.pdf • https://en.wikipedia.org/wiki/Grubbs%27_test_for_outliers • https://github.com/twitter/AnomalyDetection • https://blog.twitter.com/2015/introducing-practical-and-robust- anomaly-detection-in-a-time-series

Editor's Notes

  1. oO