3. Data Fidelity
• Data-driven decision making
q Evolving product landscape
• Data partners
q Nielsen
q Dataminr
• Operational
q Performance and Availability
AK
A/B Testing
4. Data Fidelity: Challenges
• Anomalies
q Exogenous factors
§ User behavior
§ Events
§ Data center
q Endogenous factors
§ Agile development
o Fail fast
§ Data collection
• Millions of time series [1,2]
q Scalability
[1] http://strata.oreilly.com/2013/09/how-twitter-monitors-millions-of-time-series.html
[2] http://strataconf.com/strata2014/public/schedule/detail/32431
5. Anomaly Detection
• Visual
q Prone to errors
q Not scalable
§ Machine-generated data: 11% of the digital universe in 2005, projected to exceed 40% by 2020 [1]
§ Cloud infrastructure: 2013-2017 CAGR ~50% [2]
• Algorithmic approach
q Automate!
[1] http://www.emc.com/about/news/press/2012/20121211-01.htm
[2] http://www.forbes.com/sites/gilpress/2013/12/12/16-1-billion-big-data-market-2014-predictions-from-idc-and-iia/
6. Anomaly Detection: Background
• Over 50 years of research [1]
q Statistics
§ Extreme Value Theory
§ Robust Statistics, Grubbs' test, ESD
q Econometrics
q Finance
§ Value at Risk (VaR)
q Signal Processing
q Music Information Retrieval
q Networking
q E-Commerce
q Performance Regression
[1] "Anomaly Detection" by Chandola et al. ACM Computing Surveys, 2009.
7. Anomaly Detection
• Characterization
q Magnitude
q Width
q Frequency
q Direction
8. Anomaly Detection (contd.)
• Two flavors
q Global
§ Max Value
q Local
§ Intra-day
[Figure: example of a global anomaly vs. a local (intra-day) anomaly]
9. Anomaly Detection (contd.)
• Traditional Approaches
q Metrics
§ Mean μ
§ Standard deviation σ
q Rule of thumb
§ μ + 3*σ
q Which time series?
§ Raw
§ Moving Averages
o SMA, EWMA, PEWMA
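The rule of thumb above can be sketched in a few lines. This is a minimal illustration with invented data, not the talk's code; the function names are mine, and the EWMA stands in for the SMA/PEWMA variants mentioned:

```python
from statistics import mean, stdev

def three_sigma_anomalies(series):
    """Flag points whose distance from the mean exceeds 3 standard deviations."""
    mu, sigma = mean(series), stdev(series)
    return [i for i, x in enumerate(series) if abs(x - mu) > 3 * sigma]

def ewma(series, alpha=0.3):
    """Exponentially weighted moving average of the raw series."""
    smoothed, s = [], series[0]
    for x in series:
        s = alpha * x + (1 - alpha) * s
        smoothed.append(s)
    return smoothed

data = [10, 11, 9, 10, 12, 10, 11, 10, 9, 10, 50, 10, 11, 10, 9]
print(three_sigma_anomalies(data))   # [10] -- the spike at index 10
```

The rule can be applied either to the raw series or to a smoothed one such as `ewma(data)`, which is the "which time series?" choice on this slide.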
10. Anomaly Detection (contd.)
• Impact of a multi-modal distribution
q μ shifts by only ~0.2%
q σ is inflated by ~4.5%
§ Miss quite a few anomalies
q What do the multiple modes correspond to?
§ Seasonality
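A toy illustration of the effect (the numbers are invented for this sketch, not the talk's data): when a seasonal series has two modes, the gap between the modes dominates σ, and a spike that is clearly anomalous within one mode slips under the μ + 3σ bar:

```python
from statistics import mean, stdev

night = [10, 11, 9, 10, 11, 9, 10, 11, 9, 10]           # one mode
day = [100, 101, 99, 100, 101, 99, 100, 101, 99, 150]   # other mode, with a spike
series = night + day

mu, sigma = mean(series), stdev(series)
print(sigma, stdev(night))    # ~49.9 vs ~0.8: the mode gap dominates sigma
print(mu + 3 * sigma)         # threshold ~207, so the 150 spike goes undetected
```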
11. Anomaly Detection (contd.)
• Robust Statistics
q MAD (Median Absolute Deviation)
§ Robust breakdown point
o Median 50% vs. Mean 0%
q σ_MAD = k * MAD
§ k = 1.4826 for normally distributed data
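The σ_MAD estimate above can be sketched as follows (a minimal illustration with invented data):

```python
from statistics import median, stdev

def sigma_mad(series, k=1.4826):
    """Robust spread estimate: sigma_MAD = k * median(|x - median(x)|)."""
    med = median(series)
    return k * median(abs(x - med) for x in series)

data = [10, 11, 9, 10, 12, 10, 11, 10, 9, 10, 500]   # one huge outlier
print(sigma_mad(data))   # 1.4826 -- barely affected by the 500
print(stdev(data))       # ~148  -- badly inflated by the same point
```

This is the breakdown-point contrast in practice: a single extreme value drags the classical σ far off, while the median-based estimate barely moves.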
12. Anomaly Detection (contd.)
• Grubbs' test
q Critical value is derived from the data using a statistical significance level (α)
• Generalized ESD (Extreme Studentized Deviate) test [1]
q Critical value (λ_i) is re-calculated at every iteration
q The largest i such that R_i > λ_i determines the number of anomalies
q An upper bound on the number of anomalies is an input parameter
[1] Rosner, Bernard. "Percentage Points for a Generalized ESD Many-Outlier Procedure." Technometrics 25, no. 2 (1983): 165-172.
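A sketch of the generalized ESD procedure after Rosner (1983). This assumes SciPy is available for the t-distribution quantile; `max_anoms` is the upper-bound input parameter mentioned above, and the function name is illustrative:

```python
from statistics import mean, stdev
from scipy.stats import t as t_dist

def generalized_esd(series, max_anoms, alpha=0.05):
    """Return indices of the points the generalized ESD test declares anomalous."""
    x = list(series)
    n = len(x)
    remaining = list(range(n))   # indices still in play
    removed = []                 # candidate outliers, in removal order
    num_anoms = 0
    for i in range(1, max_anoms + 1):
        vals = [x[j] for j in remaining]
        mu, sd = mean(vals), stdev(vals)
        if sd == 0:
            break
        # R_i: most extreme studentized deviation among the remaining points
        R_i, worst = max((abs(x[j] - mu) / sd, j) for j in remaining)
        remaining.remove(worst)
        removed.append(worst)
        # lambda_i: critical value, re-computed every iteration
        p = 1 - alpha / (2 * (n - i + 1))
        tq = t_dist.ppf(p, df=n - i - 1)
        lam = (n - i) * tq / (((n - i - 1 + tq**2) * (n - i + 1)) ** 0.5)
        if R_i > lam:
            num_anoms = i        # largest i with R_i > lambda_i
    return sorted(removed[:num_anoms])
```

Unlike a one-shot Grubbs' test, the iterative removal lets the test detect several outliers even when they mask one another.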
15. Anomaly Detection (contd.)
• Impact of removing the seasonal and trend components
q Transforms our multi-modal data into unimodal data
§ Amenable to ESD/MAD!
[Figure: the decomposed residual becomes "uni-modal," which significantly shrinks σ; the original "multi-modal" raw data has a much wider σ, leading ESD to miss a lot of the outliers.]
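A much-simplified stand-in for the decomposition (the system described here uses a proper seasonal/trend decomposition; this per-phase-median version only illustrates the effect on σ and the threshold):

```python
from statistics import mean, median, stdev

def residual(series, period):
    """Crude decomposition: subtract a per-phase median (seasonal component)
    and the overall median of what remains (level/trend)."""
    seasonal = [median(series[p::period]) for p in range(period)]
    deseason = [x - seasonal[i % period] for i, x in enumerate(series)]
    level = median(deseason)
    return [x - level for x in deseason]

raw = [10, 50] * 5 + [10, 90]   # bimodal: period-2 seasonality plus one spike
res = residual(raw, 2)
print(stdev(raw), stdev(res))                  # sigma shrinks sharply
print(max(raw) > mean(raw) + 3 * stdev(raw))   # False: spike hidden in raw data
print(max(res) > mean(res) + 3 * stdev(res))   # True: spike obvious in residual
```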
20. Open Source
• Standalone R package
q https://github.com/twitter/AnomalyDetection
q Key features
§ Filter
o Last day, Last hour
o Direction: positive, negative, both
§ Expected values
§ Long term
o Piecewise approximation (HotCloud’14 research paper)
q Widely used
• Blog
q https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series
21. Anomaly Detection (contd.)
• Pluggable design
q Data source
§ Currently supports different data sources
q Detector
• Usage
q Library:
§ Mesos job
q Service
§ RESTful API
• Status
q Used by 10+ internal customers
22. Anomaly Detection (contd.)
• E-mail notification
• JIRA integration
q Ticket auto-created if an anomaly is detected
23. Anomaly Detection (contd.)
• Granularities
q Daily
§ Seasonal adjustment based on day of the week
o Keep things simple
q Minutely
§ S-H-ESD (Seasonal Hybrid ESD)
24. Real-time Anomaly Detection
• Lessons learned in the wild
q Summingbird [1] - Lambda architecture
q Real time: data integrity issues - lag between the real-time and batch layers
§ Periodic update to cache
§ Higher threshold
[1] "Summingbird: a framework for integrating batch and online MapReduce computations", by O. Boykin, S. Ritchie, I. O'Connell, and J. Lin. Proceedings of the VLDB Endowment, 7:13, pp. 1441-1451, August 2014.
25. Real-time Anomaly Detection (contd.)
• Lessons learned in the wild
q JVM-R bridges
§ High latency
§ Exception handling missing
q Looping future model
§ Finagle
q Few historical anomalies
26. Real-time Anomaly Detection (contd.)
• Future work
q Streaming algorithms
§ Key for sub-minute data granularity
q Making the job more robust
§ Minimizing false positives
§ Real-time topology uptime
q More use cases
§ Multiple time series (correlation)
§ Core metrics
27. Join the Flock
• We are hiring!!
q https://twitter.com/JoinTheFlock
q https://twitter.com/jobs
q Contact us: @arun_kejariwal
Like problem solving? Like challenges? Be at the cutting edge. Make an impact.