SlideShare a Scribd company logo
1 of 56
Download to read offline
®

Anomaly Detection

© MapR Technologies, confidential

®
Agenda
• 
• 
• 
• 
• 

What is anomaly detection?
Some examples
Some generalization
More interesting examples
Sample implementation methods

© MapR Technologies, confidential

®
Who I am
•  Ted Dunning, Chief Application Architect, MapR
tdunning@mapr.com
tdunning@apache.org
@ted_dunning

•  Committer, mentor, champion, PMC member on
several Apache projects
•  Mahout, Drill, Zookeeper others

© MapR Technologies, confidential

®
Who we are
•  MapR makes the technology leading distribution
including Hadoop
•  MapR integrates real-time data semantics
directly into a system that also runs Hadoop
programs seamlessly
•  The biggest and best choose MapR
–  Google, Amazon
–  Largest credit card, retailer, health insurance, telco
–  Ping me for info
© MapR Technologies, confidential

®
What is Anomaly Detection?
•  What just happened that shouldn’t?
–  but I don’t know what failure looks like (yet)

•  Find the problem before other people see it
–  especially customers and CEO’s

•  But don’t wake me up if it isn’t really broken

© MapR Technologies, confidential

®
© MapR Technologies, confidential

®
Looks pretty
anomalous
to me

© MapR Technologies, confidential

®
Will the real anomaly
please stand up?

© MapR Technologies, confidential

®
What Are We Really Doing
•  We want action when something breaks
(dies/falls over/otherwise gets in trouble)

•  But action is expensive
•  So we don’t want false alarms
•  And we don’t want false negatives
•  We need to trade off costs

© MapR Technologies, confidential

®
A Second Look

© MapR Technologies, confidential

®
A Second Look

99.9%-ile

© MapR Technologies, confidential

®
With Spikes

© MapR Technologies, confidential

®
With Spikes

99.9%-ile including spikes

© MapR Technologies, confidential

®
How Hard Can it Be?

x

x>t?

Alarm !

t
Online
Summarizer

99.9%-ile

© MapR Technologies, confidential

®
On-line Percentile Estimates
•  Apache Mahout has on-line percentile estimator
–  very high accuracy for extreme tails
–  new in version 0.9 !!

•  What’s the big deal with anomaly detection?
•  This looks like a solved problem

© MapR Technologies, confidential

®
Spot the Anomaly

Anomaly?

© MapR Technologies, confidential

®
Maybe not!

© MapR Technologies, confidential

®
Where’s Waldo?

This is the real
anomaly

© MapR Technologies, confidential

®
Normal Isn’t Just Normal
•  What we want is a model of what is normal
•  What doesn’t fit the model is the anomaly
•  For simple signals, the model can be simple …

x ~ N(0, ε )
•  The real world is rarely so accommodating
© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
We Do Windows

© MapR Technologies, confidential

®
Windows on the World
•  The set of windowed signals is a nice model of
our original signal
•  Clustering can find the prototypes
–  Fancier techniques available using sparse coding

•  The result is a dictionary of shapes
•  New signals can be encoded by shifting, scaling
and adding shapes from the dictionary

© MapR Technologies, confidential

®
Most Common Shapes (for EKG)

© MapR Technologies, confidential

®
Reconstructed signal
Original
signal

Reconstructed
signal
Reconstruction
error

© MapR Technologies, confidential

®
An Anomaly

Original technique for
finding 1-d anomaly works
against reconstruction error

© MapR Technologies, confidential

®
Close-up of anomaly

Not what you want your
heart to do.
And not what the model
expects it to do.

© MapR Technologies, confidential

®
A Different Kind of Anomaly

© MapR Technologies, confidential

®
Model Delta Anomaly Detection

δ

+

δ>t?

Alarm !

t
Model

Online
Summarizer

99.9%-ile

© MapR Technologies, confidential

®
The Real Inside Scoop
•  The model-delta anomaly detector is really just a
mixture distribution
–  the model we know about already
–  and a normally distributed error

•  The output (delta) is (roughly) the log probability
of the mixture distribution
•  Thinking about probability distributions is good
© MapR Technologies, confidential

®
Example: Event Stream (timing)
•  Events of various types arrive at irregular
intervals
–  we can assume Poisson distribution

•  The key question is whether frequency has
changed relative to expected values
•  Want alert as soon as possible

© MapR Technologies, confidential

®
Poisson Distribution
•  Time between events is exponentially distributed

Δt ~ λ e

− λt

•  This means that long delays are exponentially
rare
− λT

P(Δt > T ) = e
− log P(Δt > T ) = λT

•  If we know λ we can select a good threshold
–  or we can pick a threshold empirically
© MapR Technologies, confidential

®
Converting Event Times to Anomaly
99.9%-ile
99.99%-ile

© MapR Technologies, confidential

®
Example: Event Stream (sequence)
•  The scenario:
–  Web visitors are being subjected to phishing attacks
–  The hook directs the visitors to a mocked login
–  When they enter their credentials, the attacker uses
them to log in
–  We assume captcha’s or similar are part of the
authentication so the attacker has to show the images
on the login page to the visitor

•  The problem:
–  We don’t really know how the attack works
© MapR Technologies, confidential

®
Normal Event Flow
Image-1

Image-2
login
Image-3

Image-4

© MapR Technologies, confidential

®
Phishing Flow
Real login
Image1
Image1
login
Image1

Image1

Image1

Image1
login
Image1
Image1

Fake login

© MapR Technologies, confidential

®
Key Observations
•  Regardless of exact details, there are patterns
•  Event stream per user shows these patterns
•  Phishing will have different patterns at much
lower rate
•  Measuring statistical surprise gives a good
anomaly (fraud or malfunction) indicator

© MapR Technologies, confidential

®
Recap (out of order)
•  Anomaly detection is best done with a probability
model
•  Deep learning is a neat way to build this model
–  converting to symbolic dynamics simplifies life

•  -log p is a good way to convert to anomaly
measure
•  Adaptive quantile estimation works for autosetting thresholds

© MapR Technologies, confidential

®
Recap
•  Different systems require different models
•  Continuous time-series
–  sparse coding or deep learning to build signal model

•  Events in time
–  rate model base on variable rate Poisson
–  segregated rate model

•  Events with labels
–  language modeling
–  hidden Markov models
© MapR Technologies, confidential

®
How Do I Build Such a System
•  The key is to combine real-time and long-time
–  real-time evaluates data stream against model
–  long-time is how we build the model

•  Extended Lambda architecture is my favorite
•  See my other talks on slideshare.net for info
•  Ping me directly
© MapR Technologies, confidential

®
Hadoop is Not Very Real-time
Unprocessed
Data

now

t
Fully
processed

Latest
full
period

Hadoop
job takes
this long
for this
data

© MapR Technologies, confidential

®
Real-time and Long-time together
Blended
View
view

now

t
Hadoop works
great back here

Storm
works
here

© MapR Technologies, confidential

®
Who I am
•  Ted Dunning, Chief Application Architect, MapR
tdunning@mapr.com
tdunning@apache.org
@ted_dunning

•  Committer, mentor, champion, PMC member on
several Apache projects
•  Mahout, Drill, Zookeeper others

© MapR Technologies, confidential

®
© MapR Technologies, confidential

® ®

More Related Content

What's hot

Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoopTed Dunning
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesTed Dunning
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation TechnTed Dunning
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache MahoutTed Dunning
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteTed Dunning
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really MatterTed Dunning
 
Building multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesBuilding multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesTed Dunning
 
Using Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationUsing Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationTed Dunning
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real DataTed Dunning
 
Buzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningBuzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningTed Dunning
 
Polyvalent recommendations
Polyvalent recommendationsPolyvalent recommendations
Polyvalent recommendationsTed Dunning
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to NewMapR Technologies
 
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San JoseR + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San JoseAllen Day, PhD
 
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...tboubez
 
Mahout and Recommendations
Mahout and RecommendationsMahout and Recommendations
Mahout and RecommendationsTed Dunning
 
How to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detectionHow to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detectionDataWorks Summit
 
Goto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedGoto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedTed Dunning
 

What's hot (20)

Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approaches
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation Techn
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache Mahout
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC Keynote
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really Matter
 
Building multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesBuilding multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search engines
 
Using Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationUsing Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for Recommendation
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real Data
 
Buzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningBuzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learning
 
Polyvalent recommendations
Polyvalent recommendationsPolyvalent recommendations
Polyvalent recommendations
 
Deep Learning for Fraud Detection
Deep Learning for Fraud DetectionDeep Learning for Fraud Detection
Deep Learning for Fraud Detection
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to New
 
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San JoseR + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose
 
T digest-update
T digest-updateT digest-update
T digest-update
 
Dunning ml-conf-2014
Dunning ml-conf-2014Dunning ml-conf-2014
Dunning ml-conf-2014
 
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...
 
Mahout and Recommendations
Mahout and RecommendationsMahout and Recommendations
Mahout and Recommendations
 
How to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detectionHow to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detection
 
Goto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedGoto amsterdam-2013-skinned
Goto amsterdam-2013-skinned
 

Viewers also liked

Anomaly detection, part 1
Anomaly detection, part 1Anomaly detection, part 1
Anomaly detection, part 1David Khosid
 
Chapter 10 Anomaly Detection
Chapter 10 Anomaly DetectionChapter 10 Anomaly Detection
Chapter 10 Anomaly DetectionKhalid Elshafie
 
Anomaly detection in deep learning
Anomaly detection in deep learningAnomaly detection in deep learning
Anomaly detection in deep learningAdam Gibson
 
Let Spark Fly: Advantages and Use Cases for Spark on Hadoop
 Let Spark Fly: Advantages and Use Cases for Spark on Hadoop Let Spark Fly: Advantages and Use Cases for Spark on Hadoop
Let Spark Fly: Advantages and Use Cases for Spark on HadoopMapR Technologies
 
Anomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) EnglishAnomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) EnglishAdam Gibson
 
Ariu - Workshop on Artificial Intelligence and Security - 2011
Ariu - Workshop on Artificial Intelligence and Security - 2011Ariu - Workshop on Artificial Intelligence and Security - 2011
Ariu - Workshop on Artificial Intelligence and Security - 2011Pluribus One
 
Anomaly Detection by Mean and Standard Deviation (LT at AQ)
Anomaly Detection by Mean and Standard Deviation (LT at AQ)Anomaly Detection by Mean and Standard Deviation (LT at AQ)
Anomaly Detection by Mean and Standard Deviation (LT at AQ)Yoshihiro Iwanaga
 
Network anomaly detection based on statistical
Network anomaly detection based on statistical Network anomaly detection based on statistical
Network anomaly detection based on statistical jimmy9090909
 
Mr201306 machine learning for computer security
Mr201306 machine learning for computer securityMr201306 machine learning for computer security
Mr201306 machine learning for computer securityFFRI, Inc.
 
Machine learning approach to anomaly detection in cyber security
Machine learning approach to anomaly detection in cyber securityMachine learning approach to anomaly detection in cyber security
Machine learning approach to anomaly detection in cyber securityIAEME Publication
 
Tugas analisis struk tur teks anekdot
Tugas analisis struk tur teks anekdotTugas analisis struk tur teks anekdot
Tugas analisis struk tur teks anekdotNuril anwar
 
Portfolio of scriptwriter Johan de Rooij
Portfolio of scriptwriter Johan de RooijPortfolio of scriptwriter Johan de Rooij
Portfolio of scriptwriter Johan de Rooijjderooij
 
Managed analytics overview
Managed analytics overviewManaged analytics overview
Managed analytics overviewstuartdalton
 
Anomaly Detection Via PCA
Anomaly Detection Via PCAAnomaly Detection Via PCA
Anomaly Detection Via PCADeepak Kumar
 
Jim Geovedi - Machine Learning for Cybersecurity
Jim Geovedi - Machine Learning for CybersecurityJim Geovedi - Machine Learning for Cybersecurity
Jim Geovedi - Machine Learning for Cybersecurityidsecconf
 
Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...
Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...
Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...Alex Pinto
 
플랫폼비즈니스와 혁신의 만남
플랫폼비즈니스와 혁신의 만남플랫폼비즈니스와 혁신의 만남
플랫폼비즈니스와 혁신의 만남The Innovation Lab
 

Viewers also liked (20)

Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
 
Anomaly detection, part 1
Anomaly detection, part 1Anomaly detection, part 1
Anomaly detection, part 1
 
Chapter 10 Anomaly Detection
Chapter 10 Anomaly DetectionChapter 10 Anomaly Detection
Chapter 10 Anomaly Detection
 
Anomaly detection in deep learning
Anomaly detection in deep learningAnomaly detection in deep learning
Anomaly detection in deep learning
 
Let Spark Fly: Advantages and Use Cases for Spark on Hadoop
 Let Spark Fly: Advantages and Use Cases for Spark on Hadoop Let Spark Fly: Advantages and Use Cases for Spark on Hadoop
Let Spark Fly: Advantages and Use Cases for Spark on Hadoop
 
Anomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) EnglishAnomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) English
 
Ariu - Workshop on Artificial Intelligence and Security - 2011
Ariu - Workshop on Artificial Intelligence and Security - 2011Ariu - Workshop on Artificial Intelligence and Security - 2011
Ariu - Workshop on Artificial Intelligence and Security - 2011
 
Anomaly Detection by Mean and Standard Deviation (LT at AQ)
Anomaly Detection by Mean and Standard Deviation (LT at AQ)Anomaly Detection by Mean and Standard Deviation (LT at AQ)
Anomaly Detection by Mean and Standard Deviation (LT at AQ)
 
Network anomaly detection based on statistical
Network anomaly detection based on statistical Network anomaly detection based on statistical
Network anomaly detection based on statistical
 
Mr201306 machine learning for computer security
Mr201306 machine learning for computer securityMr201306 machine learning for computer security
Mr201306 machine learning for computer security
 
Machine learning approach to anomaly detection in cyber security
Machine learning approach to anomaly detection in cyber securityMachine learning approach to anomaly detection in cyber security
Machine learning approach to anomaly detection in cyber security
 
Velocity 2015-final
Velocity 2015-finalVelocity 2015-final
Velocity 2015-final
 
Tugas analisis struk tur teks anekdot
Tugas analisis struk tur teks anekdotTugas analisis struk tur teks anekdot
Tugas analisis struk tur teks anekdot
 
Portfolio of scriptwriter Johan de Rooij
Portfolio of scriptwriter Johan de RooijPortfolio of scriptwriter Johan de Rooij
Portfolio of scriptwriter Johan de Rooij
 
Managed analytics overview
Managed analytics overviewManaged analytics overview
Managed analytics overview
 
Anomaly Detection Via PCA
Anomaly Detection Via PCAAnomaly Detection Via PCA
Anomaly Detection Via PCA
 
Jim Geovedi - Machine Learning for Cybersecurity
Jim Geovedi - Machine Learning for CybersecurityJim Geovedi - Machine Learning for Cybersecurity
Jim Geovedi - Machine Learning for Cybersecurity
 
Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...
Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...
Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...
 
플랫폼비즈니스와 혁신의 만남
플랫폼비즈니스와 혁신의 만남플랫폼비즈니스와 혁신의 만남
플랫폼비즈니스와 혁신의 만남
 

Similar to Strata 2014 Anomaly Detection

Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01
Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01
Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01MapR Technologies
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forTed Dunning
 
Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15
Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15
Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15MLconf
 
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Observability -  The good, the bad and the ugly Xp Days 2019 Kiev Ukraine Observability -  The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine Aleksandr Tavgen
 
Hadoop and R Go to the Movies
Hadoop and R Go to the MoviesHadoop and R Go to the Movies
Hadoop and R Go to the MoviesDataWorks Summit
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with HadoopDataWorks Summit
 
Using Time Series for Full Observability of a SaaS Platform
Using Time Series for Full Observability of a SaaS PlatformUsing Time Series for Full Observability of a SaaS Platform
Using Time Series for Full Observability of a SaaS PlatformDevOps.com
 
201411203 goto night on graphs for fraud detection
201411203 goto night on graphs for fraud detection201411203 goto night on graphs for fraud detection
201411203 goto night on graphs for fraud detectionRik Van Bruggen
 
Practical Computing with Chaos
Practical Computing with ChaosPractical Computing with Chaos
Practical Computing with ChaosMapR Technologies
 
Practical Computing With Chaos
Practical Computing With ChaosPractical Computing With Chaos
Practical Computing With ChaosDataWorks Summit
 
Quantum Computers new Generation of Computers part 7 by prof lili saghafi Qua...
Quantum Computers new Generation of Computers part 7 by prof lili saghafi Qua...Quantum Computers new Generation of Computers part 7 by prof lili saghafi Qua...
Quantum Computers new Generation of Computers part 7 by prof lili saghafi Qua...Professor Lili Saghafi
 
Machine Learning Introduction introducing basics of Machine Learning
Machine Learning Introduction introducing basics of Machine LearningMachine Learning Introduction introducing basics of Machine Learning
Machine Learning Introduction introducing basics of Machine Learningvipulkondekar
 
InfoSecurity Europe 2017 - On The Hunt for Advanced Attacks? C&C Channels are...
InfoSecurity Europe 2017 - On The Hunt for Advanced Attacks? C&C Channels are...InfoSecurity Europe 2017 - On The Hunt for Advanced Attacks? C&C Channels are...
InfoSecurity Europe 2017 - On The Hunt for Advanced Attacks? C&C Channels are...Moshe Zioni
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningMapR Technologies
 
Buzzwords 2014 / Overview / part2
Buzzwords 2014 / Overview / part2Buzzwords 2014 / Overview / part2
Buzzwords 2014 / Overview / part2Andrii Gakhov
 
How to Determine which Algorithms Really Matter
How to Determine which Algorithms Really MatterHow to Determine which Algorithms Really Matter
How to Determine which Algorithms Really MatterDataWorks Summit
 
Observability - the good, the bad, and the ugly
Observability - the good, the bad, and the uglyObservability - the good, the bad, and the ugly
Observability - the good, the bad, and the uglyAleksandr Tavgen
 
Short Data Rules for Observability.pdf
Short Data Rules for Observability.pdfShort Data Rules for Observability.pdf
Short Data Rules for Observability.pdfDave McAllister
 

Similar to Strata 2014 Anomaly Detection (20)

Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01
Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01
Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
 
Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15
Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15
Ted Dunning, Chief Application Architect, MapR at MLconf ATL - 9/18/15
 
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Observability -  The good, the bad and the ugly Xp Days 2019 Kiev Ukraine Observability -  The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
 
Hadoop and R Go to the Movies
Hadoop and R Go to the MoviesHadoop and R Go to the Movies
Hadoop and R Go to the Movies
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with Hadoop
 
Using Time Series for Full Observability of a SaaS Platform
Using Time Series for Full Observability of a SaaS PlatformUsing Time Series for Full Observability of a SaaS Platform
Using Time Series for Full Observability of a SaaS Platform
 
201411203 goto night on graphs for fraud detection
201411203 goto night on graphs for fraud detection201411203 goto night on graphs for fraud detection
201411203 goto night on graphs for fraud detection
 
Practical Computing with Chaos
Practical Computing with ChaosPractical Computing with Chaos
Practical Computing with Chaos
 
Practical Computing With Chaos
Practical Computing With ChaosPractical Computing With Chaos
Practical Computing With Chaos
 
GoTo Amsterdam 2013 Skinned
GoTo Amsterdam 2013 SkinnedGoTo Amsterdam 2013 Skinned
GoTo Amsterdam 2013 Skinned
 
Quantum Computers new Generation of Computers part 7 by prof lili saghafi Qua...
Quantum Computers new Generation of Computers part 7 by prof lili saghafi Qua...Quantum Computers new Generation of Computers part 7 by prof lili saghafi Qua...
Quantum Computers new Generation of Computers part 7 by prof lili saghafi Qua...
 
Machine Learning Introduction introducing basics of Machine Learning
Machine Learning Introduction introducing basics of Machine LearningMachine Learning Introduction introducing basics of Machine Learning
Machine Learning Introduction introducing basics of Machine Learning
 
Devoxx Real-Time Learning
Devoxx Real-Time LearningDevoxx Real-Time Learning
Devoxx Real-Time Learning
 
InfoSecurity Europe 2017 - On The Hunt for Advanced Attacks? C&C Channels are...
InfoSecurity Europe 2017 - On The Hunt for Advanced Attacks? C&C Channels are...InfoSecurity Europe 2017 - On The Hunt for Advanced Attacks? C&C Channels are...
InfoSecurity Europe 2017 - On The Hunt for Advanced Attacks? C&C Channels are...
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 
Buzzwords 2014 / Overview / part2
Buzzwords 2014 / Overview / part2Buzzwords 2014 / Overview / part2
Buzzwords 2014 / Overview / part2
 
How to Determine which Algorithms Really Matter
How to Determine which Algorithms Really MatterHow to Determine which Algorithms Really Matter
How to Determine which Algorithms Really Matter
 
Observability - the good, the bad, and the ugly
Observability - the good, the bad, and the uglyObservability - the good, the bad, and the ugly
Observability - the good, the bad, and the ugly
 
Short Data Rules for Observability.pdf
Short Data Rules for Observability.pdfShort Data Rules for Observability.pdf
Short Data Rules for Observability.pdf
 

More from Ted Dunning

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxTed Dunning
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with KubernetesTed Dunning
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in KubernetesTed Dunning
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningTed Dunning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning LogisticsTed Dunning
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTed Dunning
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logisticsTed Dunning
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopTed Dunning
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Ted Dunning
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7Ted Dunning
 

More from Ted Dunning (11)

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptx
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with Kubernetes
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in Kubernetes
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning Logistics
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworks
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logistics
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on Hadoop
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 

Recently uploaded

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Recently uploaded (20)

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Strata 2014 Anomaly Detection

  • 1. ® Anomaly Detection © MapR Technologies, confidential ®
  • 2. Agenda •  •  •  •  •  What is anomaly detection? Some examples Some generalization More interesting examples Sample implementation methods © MapR Technologies, confidential ®
  • 3. Who I am •  Ted Dunning, Chief Application Architect, MapR tdunning@mapr.com tdunning@apache.org @ted_dunning •  Committer, mentor, champion, PMC member on several Apache projects •  Mahout, Drill, Zookeeper others © MapR Technologies, confidential ®
  • 4. Who we are •  MapR makes the technology leading distribution including Hadoop •  MapR integrates real-time data semantics directly into a system that also runs Hadoop programs seamlessly •  The biggest and best choose MapR –  Google, Amazon –  Largest credit card, retailer, health insurance, telco –  Ping me for info © MapR Technologies, confidential ®
  • 5. What is Anomaly Detection? •  What just happened that shouldn’t? –  but I don’t know what failure looks like (yet) •  Find the problem before other people see it –  especially customers and CEO’s •  But don’t wake me up if it isn’t really broken © MapR Technologies, confidential ®
  • 6. © MapR Technologies, confidential ®
  • 7. Looks pretty anomalous to me © MapR Technologies, confidential ®
  • 8. Will the real anomaly please stand up? © MapR Technologies, confidential ®
  • 9. What Are We Really Doing •  We want action when something breaks (dies/falls over/otherwise gets in trouble) •  But action is expensive •  So we don’t want false alarms •  And we don’t want false negatives •  We need to trade off costs © MapR Technologies, confidential ®
  • 10. A Second Look © MapR Technologies, confidential ®
  • 11. A Second Look 99.9%-ile © MapR Technologies, confidential ®
  • 12. With Spikes © MapR Technologies, confidential ®
  • 13. With Spikes 99.9%-ile including spikes © MapR Technologies, confidential ®
  • 14. How Hard Can it Be? x x>t? Alarm ! t Online Summarizer 99.9%-ile © MapR Technologies, confidential ®
  • 15. On-line Percentile Estimates •  Apache Mahout has on-line percentile estimator –  very high accuracy for extreme tails –  new in version 0.9 !! •  What’s the big deal with anomaly detection? •  This looks like a solved problem © MapR Technologies, confidential ®
  • 16. Spot the Anomaly Anomaly? © MapR Technologies, confidential ®
  • 17. Maybe not! © MapR Technologies, confidential ®
  • 18. Where’s Waldo? This is the real anomaly © MapR Technologies, confidential ®
  • 19. Normal Isn’t Just Normal •  What we want is a model of what is normal •  What doesn’t fit the model is the anomaly •  For simple signals, the model can be simple … x ~ N(0, ε ) •  The real world is rarely so accommodating © MapR Technologies, confidential ®
  • 20. We Do Windows © MapR Technologies, confidential ®
  • 21. We Do Windows © MapR Technologies, confidential ®
  • 22. We Do Windows © MapR Technologies, confidential ®
  • 23. We Do Windows © MapR Technologies, confidential ®
  • 24. We Do Windows © MapR Technologies, confidential ®
  • 25. We Do Windows © MapR Technologies, confidential ®
  • 26. We Do Windows © MapR Technologies, confidential ®
  • 27. We Do Windows © MapR Technologies, confidential ®
  • 28. We Do Windows © MapR Technologies, confidential ®
  • 29. We Do Windows © MapR Technologies, confidential ®
  • 30. We Do Windows © MapR Technologies, confidential ®
  • 31. We Do Windows © MapR Technologies, confidential ®
  • 32. We Do Windows © MapR Technologies, confidential ®
  • 33. We Do Windows © MapR Technologies, confidential ®
  • 34. We Do Windows © MapR Technologies, confidential ®
  • 35. Windows on the World •  The set of windowed signals is a nice model of our original signal •  Clustering can find the prototypes –  Fancier techniques available using sparse coding •  The result is a dictionary of shapes •  New signals can be encoded by shifting, scaling and adding shapes from the dictionary © MapR Technologies, confidential ®
  • 36. Most Common Shapes (for EKG) © MapR Technologies, confidential ®
  • 38. An Anomaly Original technique for finding 1-d anomaly works against reconstruction error © MapR Technologies, confidential ®
  • 39. Close-up of anomaly Not what you want your heart to do. And not what the model expects it to do. © MapR Technologies, confidential ®
  • 40. A Different Kind of Anomaly © MapR Technologies, confidential ®
  • 41. Model Delta Anomaly Detection δ + δ>t? Alarm ! t Model Online Summarizer 99.9%-ile © MapR Technologies, confidential ®
  • 42. The Real Inside Scoop •  The model-delta anomaly detector is really just a mixture distribution –  the model we know about already –  and a normally distributed error •  The output (delta) is (roughly) the log probability of the mixture distribution •  Thinking about probability distributions is good © MapR Technologies, confidential ®
  • 43. Example: Event Stream (timing) •  Events of various types arrive at irregular intervals –  we can assume Poisson distribution •  The key question is whether frequency has changed relative to expected values •  Want alert as soon as possible © MapR Technologies, confidential ®
  • 44. Poisson Distribution •  Time between events is exponentially distributed Δt ~ λ e − λt •  This means that long delays are exponentially rare − λT P(Δt > T ) = e − log P(Δt > T ) = λT •  If we know λ we can select a good threshold –  or we can pick a threshold empirically © MapR Technologies, confidential ®
  • 45. Converting Event Times to Anomaly 99.9%-ile 99.99%-ile © MapR Technologies, confidential ®
  • 46. Example: Event Stream (sequence) •  The scenario: –  Web visitors are being subjected to phishing attacks –  The hook directs the visitors to a mocked login –  When they enter their credentials, the attacker uses them to log in –  We assume captcha’s or similar are part of the authentication so the attacker has to show the images on the login page to the visitor •  The problem: –  We don’t really know how the attack works © MapR Technologies, confidential ®
  • 49. Key Observations •  Regardless of exact details, there are patterns •  Event stream per user shows these patterns •  Phishing will have different patterns at much lower rate •  Measuring statistical surprise gives a good anomaly (fraud or malfunction) indicator © MapR Technologies, confidential ®
  • 50. Recap (out of order) •  Anomaly detection is best done with a probability model •  Deep learning is a neat way to build this model –  converting to symbolic dynamics simplifies life •  -log p is a good way to convert to anomaly measure •  Adaptive quantile estimation works for autosetting thresholds © MapR Technologies, confidential ®
  • 51. Recap •  Different systems require different models •  Continuous time-series –  sparse coding or deep learning to build signal model •  Events in time –  rate model base on variable rate Poisson –  segregated rate model •  Events with labels –  language modeling –  hidden Markov models © MapR Technologies, confidential ®
  • 52. How Do I Build Such a System •  The key is to combine real-time and long-time –  real-time evaluates data stream against model –  long-time is how we build the model •  Extended Lambda architecture is my favorite •  See my other talks on slideshare.net for info •  Ping me directly © MapR Technologies, confidential ®
  • 53. Hadoop is Not Very Real-time Unprocessed Data now t Fully processed Latest full period Hadoop job takes this long for this data © MapR Technologies, confidential ®
  • 54. Real-time and Long-time together Blended View view now t Hadoop works great back here Storm works here © MapR Technologies, confidential ®
  • 55. Who I am •  Ted Dunning, Chief Application Architect, MapR tdunning@mapr.com tdunning@apache.org @ted_dunning •  Committer, mentor, champion, PMC member on several Apache projects •  Mahout, Drill, Zookeeper others © MapR Technologies, confidential ®
  • 56. © MapR Technologies, confidential ® ®