Anomaly Detection: How to find what you didn’t know to look for

A description of a range of basic anomaly detection methods that can be applied to practical problems.

  1. © 2016 MapR Technologies
  2. Anomaly Detection: How To Find What You Didn’t Know to Look For
     Ted Dunning, Chief Applications Architect, MapR Technologies
     Email: tdunning@mapr.com, tdunning@apache.org; Twitter: @Ted_Dunning
     Ellen Friedman, Consultant and Commentator
     Email: ellenf@apache.org; Twitter: @Ellen_Friedman
  3. e-book available courtesy of MapR: http://bit.ly/1jQ9QuL
     A New Look at Anomaly Detection, by Ted Dunning and Ellen Friedman (© June 2014, published by O’Reilly)
  4. Practical Machine Learning series (O’Reilly)
     • Machine learning is becoming mainstream
     • We need pragmatic approaches that take real-world business settings into account:
       – Time to value
       – Limited resources
       – Availability of data
       – Expertise and cost of the team needed to develop and maintain the system
     • Look for approaches with big benefits for the effort expended
  5. Anomaly Detection
  6. Who Needs Anomaly Detection? Utility providers using smart meters
  7. Who Needs Anomaly Detection? Feedback from manufacturing assembly lines
  8. Who Needs Anomaly Detection? Monitoring data traffic on communication networks
  9. What Is Anomaly Detection?
     • The goal is to discover rare events
       – especially those that shouldn’t have happened
     • Find a problem before other people see it
       – especially before it causes a problem for customers
     • Why is this a challenge?
       – I don’t know what an anomaly looks like (yet)
  10. Spot the Anomaly
  11. Spot the Anomaly: Looks pretty anomalous to me
  12. Spot the Anomaly: Will the real anomaly please stand up?
  13. Basic idea: Find “normal” first
  14. Steps in Anomaly Detection
     • Build a model: collect and process data for training it
     • Use the machine learning model to determine the normal pattern
     • Decide how far from this normal pattern a point must be to count as anomalous
     • Use the AD model to detect anomalies in new data
       – Methods such as clustering for discovery can be helpful
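A minimal sketch of these four steps, under simplifying assumptions of our own rather than anything in the deck: “normal” is a Gaussian fitted to training data, the anomaly score is the distance from the mean in standard deviations, and the threshold is the 99.9th percentile of the training scores. All function names are hypothetical.

```python
import statistics

def fit_normal_model(training):
    """Steps 1-2: learn what 'normal' looks like (here, just a mean and a spread)."""
    return statistics.fmean(training), statistics.pstdev(training)

def score(model, x):
    """How far x is from normal, in standard deviations."""
    mean, std = model
    return abs(x - mean) / std

def choose_threshold(model, training, quantile=0.999):
    """Step 3: anything scoring beyond this quantile of training scores is anomalous."""
    scores = sorted(score(model, x) for x in training)
    return scores[min(len(scores) - 1, int(quantile * len(scores)))]

def detect(model, threshold, new_data):
    """Step 4: return the points in new data that score above the threshold."""
    return [x for x in new_data if score(model, x) > threshold]
```

Real data rarely fits a single Gaussian, which is why the later slides replace this toy model with richer ones while keeping the same four steps.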
  15. How Hard Is It to Set an Alert for Anomalies? Grey data is from normal events; x’s are anomalies. Where would you set the threshold?
  16. Basic idea: Set adaptive thresholds
  17. What Are We Really Doing?
     • We want action when something breaks (dies, falls over, or otherwise gets in trouble)
     • But action is expensive
     • So we don’t want too many false alarms
     • And we don’t want too many false negatives
     • What’s the right threshold to set for alerts?
       – We need to trade off costs
  18. A Second Look
  19. A Second Look: 99.9%-ile
  20. Cool algorithm: t-digest
  21. How Hard Can It Be?
     [Diagram: each value x feeds an online summarizer that estimates the 99.9%-ile t; if x > t, raise an alarm]
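The diagram can be sketched as follows. As an assumption for illustration, the “online summarizer” here is an exact percentile over a bounded window of recent values, kept sorted with the standard `bisect` module; a t-digest would instead estimate the percentile from a compact summary of the whole stream. The class name and the `window` and `warmup` parameters are hypothetical.

```python
import bisect
from collections import deque

class WindowedPercentileAlarm:
    """Alarm when a value exceeds a high percentile of recent history.

    Stand-in for the online summarizer: it keeps the last `window` values
    exactly, while a t-digest would summarize the stream in bounded memory.
    """

    def __init__(self, percentile=99.9, window=10_000, warmup=1_000):
        self.q = percentile / 100.0
        self.window = window
        self.warmup = max(1, warmup)  # need at least one value before thresholding
        self.recent = deque()         # values in arrival order, for eviction
        self.ranked = []              # the same values, kept sorted

    def threshold(self):
        """Current estimate of the chosen percentile, t."""
        return self.ranked[min(len(self.ranked) - 1, int(self.q * len(self.ranked)))]

    def observe(self, x):
        """Return True if x > t (alarm), then add x to the history."""
        alarm = len(self.ranked) >= self.warmup and x > self.threshold()
        self.recent.append(x)
        bisect.insort(self.ranked, x)
        if len(self.recent) > self.window:
            oldest = self.recent.popleft()
            del self.ranked[bisect.bisect_left(self.ranked, oldest)]
        return alarm
```

Raising `percentile` trades more missed problems for fewer false alarms, which is the cost trade-off described on slide 17.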
  22. Using t-Digest
     • The t-digest is an online percentile estimator
       – very high accuracy for extreme tails
     • t-digest is also available everywhere
       – in Elasticsearch and in Solr
       – in streamlib (open source library on GitHub)
       – in Mahout Math (open source library on GitHub)
       – standalone (GitHub and Maven Central)
     • Very handy for general distributions; few assumptions
     • For latency, exponential binning may be useful
       – See, for instance, HdrHistogram
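To illustrate the exponential-binning idea mentioned for latency, here is a toy histogram whose bins grow geometrically, so any reported quantile is within a fixed relative error. It shows the idea only; it is not HdrHistogram’s actual data structure or API, and `min_value` and `growth` are assumed parameters.

```python
import math

class LogBinnedHistogram:
    """Histogram whose bin edges grow geometrically (exponential binning).

    Bin i covers [min_value * growth**i, min_value * growth**(i + 1)), so a
    reported quantile is too high by at most a factor of `growth`.
    """

    def __init__(self, min_value=1e-3, growth=1.05):
        self.min_value = min_value
        self.growth = growth
        self.log_growth = math.log(growth)
        self.counts = {}  # bin index -> count; only occupied bins are stored
        self.total = 0

    def add(self, x):
        x = max(x, self.min_value)  # clamp tiny values into the first bin
        i = int(math.log(x / self.min_value) / self.log_growth)
        self.counts[i] = self.counts.get(i, 0) + 1
        self.total += 1

    def quantile(self, q):
        """Upper edge of the bin that contains the q-th quantile (0 <= q <= 1)."""
        if self.total == 0:
            raise ValueError("histogram is empty")
        target = q * self.total
        seen = 0
        for i in sorted(self.counts):
            seen += self.counts[i]
            if seen >= target:
                return self.min_value * self.growth ** (i + 1)
        return self.min_value * self.growth ** (max(self.counts) + 1)
```

Memory grows with the logarithm of the value range rather than with the number of samples, which is what makes this attractive for latencies spanning many orders of magnitude.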
  23. So Are We All Done?
  24. What About This?
     [Plot: offset + noise + pulse1 + pulse2, with points A and B marked]
  25. Model-Delta Anomaly Detection
     [Diagram: a model predicts the signal and the prediction is subtracted from the observation to give δ; an online summarizer estimates the 99.9%-ile t of δ; if δ > t, raise an alarm]
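A small sketch of the model-delta pipeline, under assumptions of our own: the “model” is a moving average of the previous `window` samples, δ is the absolute difference between the observation and that prediction, and the summarizer is the exact percentile of all past δ values (a t-digest would estimate it in bounded memory). The function name and parameters are hypothetical.

```python
import bisect
from collections import deque

def model_delta_alarms(signal, window=20, percentile=99.9, warmup=200):
    """Return indices where |observation - prediction| exceeds the percentile of past deltas."""
    history = deque(maxlen=window)  # recent samples the model sees
    ranked = []                     # all past deltas, kept sorted
    alarms = []
    for i, x in enumerate(signal):
        if len(history) == window:
            prediction = sum(history) / window  # the (deliberately simple) model
            delta = abs(x - prediction)
            if len(ranked) >= warmup:
                t = ranked[min(len(ranked) - 1, int(percentile / 100 * len(ranked)))]
                if delta > t:
                    alarms.append(i)
            bisect.insort(ranked, delta)
        history.append(x)
    return alarms
```

Any better predictor can replace the moving average while the thresholding step stays the same; that separation is the point of the design.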
  26. Spot the Anomaly: Anomaly?
  27. Maybe not!
  28. Where’s Waldo? This is the real anomaly
  29. Normal Isn’t Just Normal
     • What we want is a model of what is normal
     • What doesn’t fit the model is the anomaly
     • For simple signals, the model can be simple: $x \sim m(t) + \mathcal{N}(0, \epsilon)$
     • The real world is rarely so accommodating
  30–44. We Do Windows
  45. Windows on the World
     • The set of windowed signals is a nice model of our original signal
     • Clustering can find the prototypes
       – Fancier techniques are available using sparse coding
     • The result is a dictionary of shapes
     • New signals can be encoded by shifting, scaling, and adding shapes from the dictionary
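The deck shows this technique only as figures. Below is a minimal sketch under our own assumptions: a sin² window of length n with a hop of n/2, so that overlapping windows add back up to the original signal; plain Lloyd’s k-means with a deterministic farthest-point start to build the dictionary of shapes; and an encoder that replaces each window by its nearest centroid, scaled by least squares (no shifting beyond the fixed hop). It is not a port of the author’s k-means-auto-encoder demo, and all names are hypothetical.

```python
import math

def sin2_window(n):
    """sin^2 window: copies offset by n // 2 sum to exactly 1 (sin^2 + cos^2 = 1)."""
    return [math.sin(math.pi * k / n) ** 2 for k in range(n)]

def windowed_segments(signal, n):
    """Cut the signal into windows of length n with hop n // 2, each multiplied by the window."""
    hop, w = n // 2, sin2_window(n)
    return [[signal[start + k] * w[k] for k in range(n)]
            for start in range(0, len(signal) - n + 1, hop)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if no point was assigned
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def reconstruct(signal, centroids, n):
    """Replace each window by its nearest centroid, scaled by least squares, then overlap-add."""
    hop = n // 2
    out = [0.0] * len(signal)
    for idx, seg in enumerate(windowed_segments(signal, n)):
        best = min(centroids, key=lambda c: sq_dist(seg, c))
        norm = sum(c * c for c in best)
        scale = sum(s * c for s, c in zip(seg, best)) / norm if norm > 0 else 0.0
        for k in range(n):
            out[idx * hop + k] += scale * best[k]
    return out
```

Only samples covered by two overlapping windows (all but the first and last n/2) are reconstructed fully. On familiar shapes the reconstruction error stays near zero; an unfamiliar shape produces a jump in error, which is what the model-delta detector thresholds.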
  46. Most Common Shapes (for EKG)
  47. Reconstructed Signal
     [Plot: original signal and reconstructed signal]
     Reconstruction error < 1 bit/sample
  48. An Anomaly: the original technique for finding 1-d anomalies works against the reconstruction error
  49. Close-up of Anomaly: Not what you want your heart to do. And not what the model expects it to do.
  50. A Different Kind of Anomaly
  51. Model-Delta Anomaly Detection
     [Diagram: a model predicts the signal and the prediction is subtracted from the observation to give δ; an online summarizer estimates the 99.9%-ile t of δ; if δ > t, raise an alarm]
  52. The Real Inside Scoop
     • The model-delta anomaly detector is really just a sum of random variables
       – the model we already know about
       – and a normally distributed error
     • The output (δ) is, roughly, the log probability of the sum distribution (really δ²)
     • Thinking about probability distributions is good
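To see why δ² is the natural score, take the simple signal model x ~ m(t) + N(0, ε) and treat ε as the standard deviation of the error (our reading of the slide’s notation). The negative log probability of an observation is then:

```latex
\delta = x - m(t), \qquad
p(x) = \frac{1}{\epsilon\sqrt{2\pi}}\, e^{-\delta^2 / (2\epsilon^2)}

-\log p(x) = \frac{\delta^2}{2\epsilon^2} + \log\left(\epsilon\sqrt{2\pi}\right)
```

The second term is a constant, so thresholding −log p is equivalent to thresholding δ² (or |δ|).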
  53. Some k-means Caveats
     • But Eamonn Keogh says that k-means can’t work on time series (http://www.cs.ucr.edu/~eamonn/meaningless.pdf)
     • That is silly … and kind of correct: k-means does have limits
       – Other kinds of auto-encoders are much more powerful
     • More fun and code demos at
       – https://github.com/tdunning/k-means-auto-encoder
  54. The Limits of Clustering as Auto-encoder
     • Clustering is like trying to tile your sample distribution
     • It can be used to approximate a signal
     • Filling a d-dimensional region with k clusters should give an error of $\epsilon \approx 1/\sqrt[d]{k}$
     • If d is large, this is no good
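Inverting the tiling estimate ε ≈ k^(−1/d) shows how quickly the required number of clusters explodes with dimension. This is back-of-the-envelope arithmetic from the formula, not a result reported in the deck:

```latex
\epsilon \approx k^{-1/d} \;\Longrightarrow\; k \approx \epsilon^{-d}

\text{Halving the error: } \left(\tfrac{\epsilon}{2}\right)^{-d} = 2^{d}\,\epsilon^{-d}
\quad\Longrightarrow\quad k \to 2^{d}\,k

d = 3:\; 2^{3} = 8\times \text{ more clusters}; \qquad
d = 32:\; 2^{32} \approx 4.3 \times 10^{9}\times \text{ more clusters}
```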
  55. [Plot: time-series training data (first 2000 samples) against time, with panels for test data, reconstruction, and error]
  56. [Plot: reconstruction error for time-series data, MAV error vs. number of centroids, for training data and held-out data]
  57. Another Example
     • Take points randomly in …, and project them non-linearly into …
     • Approximation using clustering should give …
  58. [Plot: reconstruction error for random points vs. number of centroids, for training data and held-out data]
  59. Error Is Approximately Proportional to $k^{-1/3}$ (one over the cube root of k)
     [Plot: actual error and the cube-root model vs. k]
  60. Moral for Auto-encoders
     • The simplest auto-encoders can be good models
     • For more complex spaces or signals, more elaborate models may be required
     • Consider deep learning, recurrent networks, denoising
  61. Anomalies among sporadic events
  62. Sporadic Web Traffic to an e-Business Site: It’s important to know if traffic is stopped or delayed because of a problem… But visits to the site normally come at varying intervals. How long after the last event should you begin to worry?
  63. Sporadic Web Traffic to an e-Business Site: It’s important to know if traffic is stopped or delayed because of a problem… But visits to the site normally come at varying intervals. And how do you let your CEO sleep through the night?
  64. Basic idea: The time interval between events converts sporadic events into something useful you can measure
  65. Sporadic Events: Finding Normal and Anomalous Patterns
     • The time between events (the interval) is much more usable than absolute times
     • Counts don’t link as directly to probability models
     • The time interval, scaled by the event rate, is $-\log p$
     • This is a big deal
  66. Event Stream (Timing)
     • Events of various types arrive at irregular intervals
       – we can assume a Poisson distribution
     • The key question is whether the frequency has changed relative to expected values
       – This shows up as a change in interval
     • We want an alert as soon as possible
  67. Converting Event Times to Anomaly
     [Plot: event intervals with 99.9%-ile and 99.99%-ile threshold lines]
  68. But in the real world, event rates often change
  69–70. Time Intervals Are Key to Modeling Sporadic Events
     [Plot: interval dt (min) vs. t (days)]
  71. Poisson Distribution
     • Time between events is exponentially distributed
     • This means that long delays are exponentially rare
     • If we know λ, we can select a good threshold
       – or we can pick a threshold empirically
     $\Delta t \sim \lambda e^{-\lambda t}$, $\quad P(\Delta t > T) = e^{-\lambda T}$, $\quad -\log P(\Delta t > T) = \lambda T$
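When λ is known, the threshold follows directly from inverting P(Δt > T) = e^(−λT). A minimal sketch; the function names and the example rate and false-alarm probability are hypothetical:

```python
import math

def interval_threshold(rate, false_alarm_prob):
    """Gap length T with P(dt > T) = false_alarm_prob for a Poisson process.

    Inverts P(dt > T) = exp(-rate * T), giving T = -log(p) / rate.
    """
    return -math.log(false_alarm_prob) / rate

def surprise(rate, elapsed):
    """-log P(dt > elapsed) = rate * elapsed, in nats: the anomaly score for a gap."""
    return rate * elapsed

# Hypothetical example: 2 events per minute, one false alarm per 1000 gaps.
T = interval_threshold(rate=2.0, false_alarm_prob=0.001)
print(f"alarm if no event for {T:.2f} minutes")
```

This prints `alarm if no event for 3.45 minutes`, since ln(1000)/2 ≈ 3.45. Because the score is just λ times the elapsed time, it can be evaluated while a gap is still open, before the next event arrives.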
  72. After Rate Correction
     [Plot: dt/rate vs. t (days), with 99.9%-ile and 99.99%-ile threshold lines]
  73. Model-Scaled Intervals Solve the Problem
  74. Model-Delta Anomaly Detection
     [Diagram: a model’s prediction is subtracted from the observation to give δ, here expressed as log p; an online summarizer estimates the 99.9%-ile t of δ; if δ > t, raise an alarm]
  75–76. Detecting Anomalies in Sporadic Events
     [Diagram: for each incoming event at time $t_i$, a rate predictor built from the rate history supplies λ; the scaled interval over the last n events is $\delta = \lambda (t_i - t_{i-n})$; a t-digest tracks the 99.97%-ile t of δ; if δ > t, raise an alarm]
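A sketch of this pipeline, under stated assumptions: the rate predictor is any caller-supplied function `rate_at(t)` (hypothetical), λ is evaluated at the start of each span (our choice, so the same score could also be computed for a gap that is still open), and an exact percentile over past scores stands in for the t-digest.

```python
import bisect

def scaled_gaps(event_times, rate_at, n=1):
    """delta_i = lambda * (t_i - t_{i-n}), with lambda = rate_at(t_{i-n}).

    `rate_at` is a caller-supplied rate predictor (events per unit time).
    Multiplying by the predicted rate makes gaps in busy and quiet periods
    comparable: for a Poisson process each delta averages about n.
    """
    return [rate_at(event_times[i - n]) * (event_times[i] - event_times[i - n])
            for i in range(n, len(event_times))]

def gap_alarms(event_times, rate_at, n=1, percentile=99.97, warmup=1000):
    """Indices of events that close an unusually long (rate-scaled) gap."""
    ranked = []  # past deltas, kept sorted; a t-digest would summarize these
    alarms = []
    for i, delta in enumerate(scaled_gaps(event_times, rate_at, n)):
        if len(ranked) >= warmup:
            t = ranked[min(len(ranked) - 1, int(percentile / 100 * len(ranked)))]
            if delta > t:
                alarms.append(i + n)  # index of the event that closed the gap
        bisect.insort(ranked, delta)
    return alarms
```

With a good rate predictor, a gap of six expected intervals during a busy period scores the same as one during a quiet period, so a single threshold works around the clock.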
  77. Slipped Week: Simple Rate Predictor
     [Plot: Main Page Traffic, hits (×1000) by date from Nov 02 to Dec 02, with regions A, B, C, D marked]
  78. Seasonality Poses a Challenge
     [Plot: Christmas Traffic, hits/1000 by date from Nov 17 to Dec 27]
  79. Something more is needed …
     [Plot: Christmas Traffic, hits/1000 by date from Nov 17 to Dec 27]
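  80. We need a better rate predictor…
     [Diagram: for each incoming event at time $t_i$, a rate predictor built from the rate history supplies λ; the scaled interval over the last n events is $\delta = \lambda (t_i - t_{i-n})$; a t-digest tracks the 99.97%-ile t of δ; if δ > t, raise an alarm]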
  81. Idea: Predict log(rate) from lagged log(rate)
     • Predict log because
       – Peak-to-valley ratio
       – Traffic grew by 30%
       – All rates are positive
  82. Idea: Predict log(rate) from lagged log(rate)
     • Predict log because
       – Peak-to-valley ratio
       – Traffic grew by 30%
       – All rates are positive
       – Just because I said so
  83. Idea: Predict log(rate) from lagged log(rate)
     • Predict log because
       – Peak-to-valley ratio
       – Traffic grew by 30%
       – All rates are positive
       – Just because I said so
     • Let the model see many lagged values
     • Use an L1-regularized linear model to pick important historical values
       – We would have moved to something fancier if this hadn’t worked
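A minimal sketch of the idea, assuming a pure-Python lasso solved by cyclic coordinate descent. The deck says only “L1 regularized linear model”; the solver, the lag set, and `alpha` are our choices, and the function names are hypothetical. Each row predicts log(rate) at time t from log(rate) at the chosen lags, and the L1 penalty drives the weights of unhelpful lags to zero.

```python
def lagged_dataset(log_rates, lags):
    """Rows of lagged values (features) and the value to predict (target)."""
    start = max(lags)
    X = [[log_rates[t - lag] for lag in lags] for t in range(start, len(log_rates))]
    y = [log_rates[t] for t in range(start, len(log_rates))]
    return X, y

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within gamma of zero become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def fit_lasso(X, y, alpha=0.01, sweeps=200):
    """Minimize (1/2n)||y - b - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centering handles b
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    w = [0.0] * p
    residual = [v - y_mean for v in y]  # centered y minus Xc @ w, with w = 0
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual, adding back its own contribution.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            step = new_wj - w[j]
            if step != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * step
                w[j] = new_wj
    intercept = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return intercept, w

def predict(intercept, w, features):
    return intercept + sum(wj * xj for wj, xj in zip(w, features))
```

In practice the lags would typically span at least one full seasonal cycle (for hourly data, a week or more), so the model can find periodic structure itself; the predicted log(rate) is exponentiated to give the λ used for scaling intervals.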
  84. A New Rate Predictor for Sporadic Events
  85. Improved Prediction with Adaptive Modeling
     [Plot: Christmas Prediction, hits (×1000) by date from Dec 17 to Dec 29]
  86. Some days the magic works. Some days … we use slightly different magic.
  87. Streaming Micro-Service Scenario
     [Diagram: a file upload web service stores files and publishes to an “uploads” stream; thumbnail extraction and transcoding services consume it, store their output files, and publish to “thumbs” and “recodes”]
  88. Let’s Assume a Good Micro-Architecture
     [Diagram: the thumbnail extraction service reads input from “uploads” and writes output to “thumbs”; it also emits “metrics”, “exceptions”, and “checkpoints” streams used for monitoring and restart]
  89. How Can We Monitor This?
     • We want to detect more than just cascading total failure
     • Arrival time is good for detecting complete loss upstream
     • What about misbehavior of a black box?
  90. [Plot: delta time vs. end time]
  91. [Plot: start and end times]
  92. [Plot: elapsed time vs. end time]
  93. Some Models Must Model Internal Operations
     • Computational architecture can help modeling
     • Models need to be built with knowledge of intent and structure
     • A pure black box is rarely sufficient … you need some intuitions
  94. Anomaly Detection + Classification: A Useful Pair
     • Use the AD model to detect anomalies in new data
       – Methods such as clustering for discovery can be helpful
     • Once you have well-defined models in your system, you may also want to use classification to tag those
     • Continue to use the AD model to find new anomalies
  95. Recap (out of order)
     • Anomaly detection is best done with a probability model
     • −log p is a good way to convert to an anomaly measure
     • −log p takes different forms in different systems
     • Simplistic distributions are insufficient in practice
       – Need mixture distributions
       – Resampled live data
     • Adaptive quantile estimation (t-digest) works for auto-setting thresholds
  96. Recap
     • Different systems require different models
     • Continuous time series
       – sparse coding to build a signal model
     • Events in time
       – rate model based on variable-rate Poisson
       – segregated rate model
     • Events with labels
       – language modeling
       – hidden Markov models
  97. Why Use Anomaly Detection?
  98. Keep in mind…
     • Model normal, then find anomalies
     • t-digest for adaptive thresholds
     • Probabilistic models for complex patterns
     [Plot: offset + noise + pulse1 + pulse2, with points A and B marked]
  99. Keep in mind…
     • Time intervals are key for sporadic events
     • Complex time shift to predict rate with seasonality
     • Sequence of events reveals phishing attack
     [Plot: Christmas Prediction, hits (×1000) by date from Dec 17 to Dec 29]
  100. e-book available courtesy of MapR: http://bit.ly/1jQ9QuL
     A New Look at Anomaly Detection, by Ted Dunning and Ellen Friedman (© June 2014, published by O’Reilly)
  101. 6 Free ebooks: Streaming Architecture (Ted Dunning & Ellen Friedman) and MapR Streams
     Read online: mapr.com/6ebooks-read
     Download PDFs: mapr.com/6ebooks-pdf
  102. Thank you for coming today!
  103. Find my slides and other materials related to this talk here: bit.ly/sdaml-june2016, or search:
  104. …helping you put data technology to work
     ● Find answers
     ● Ask technical questions
     ● Join on-demand training course discussions
     ● Follow release announcements
     ● Share and vote on product ideas
     ● Find Meetup and event listings
     Connect with fellow Apache Hadoop and Spark professionals: community.mapr.com
