Current monitoring tools are clearly reaching the limits of their capabilities. These tools are built on fundamental assumptions that no longer hold, such as the assumption that the underlying system being monitored is relatively static, or that the behavioral limits of these systems can be defined by static rules and thresholds. Interest in applying analytics and machine learning to detect anomalies in dynamic web environments is gaining steam. However, understanding which algorithms can accurately identify and predict anomalies within all the data we generate is not so easy.
This talk builds on an Open Space discussion that was started at DevOps Days Austin. We will begin with a brief definition of the types of anomalies commonly found in dynamic data center environments, and then discuss some of the key elements to consider when thinking about anomaly detection, such as:
Understanding your data and the two main approaches for analyzing operations data: parametric and non-parametric methods
The importance of context
Simple data transformations that can give you powerful results
Beyond the Pretty Charts: Analytics for the rest of us. Toufic Boubez, DevOps Days Silicon Valley, 2013-06-22
1. Beyond The Pretty Charts
Analytics for the rest of us
Toufic Boubez, Ph.D.
Co-Founder, CTO
Metafor Software
2. Toufic intro – who I am
• Co-Founder/CTO Metafor Software
• Co-Founder/CTO Layer 7 Technologies
– API Management
– Acquired by Computer Associates in 2013
• I escaped
• Building large scale software systems for 20
years (I’m older than I look, I know!)
3. Why this talk?
• DevOps Days Austin: Open Space talk
– Blog: http://metaforsoftware.com/beyond-the-pretty-charts-a-report-from-devopsdays-in-austin/
• Five major discussion points/lessons learned
• Note: no labels on charts – on purpose!!
• Note: real data
5. We’ve moved beyond static thresholds
• Most current monitoring tools assume that
the underlying system is relatively static so we
can surround it with static thresholds and
rules. BUT:
– So what if my unicorn usage is at 91%, and has
been stable at 91% for a while?
– I’d much rather know if it’s at 60% and has been
rapidly increasing over the last few hours.
6. Need more better analytics
• Thresholds won’t help you in this case
• Need some more dynamic analytics
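A minimal sketch of the difference, with invented metric values, window sizes, and thresholds: a static threshold fires continuously on a metric that is stable but high, and stays quiet on one that is lower but climbing fast; a simple rate-of-change check behaves the other way around.

```python
# Hypothetical sketch; all function names, samples, and thresholds are
# illustrative, not from any real monitoring tool.

def static_threshold_alert(samples, threshold=90.0):
    """Alert only when the latest sample crosses a fixed threshold."""
    return samples[-1] > threshold

def rate_of_change_alert(samples, window=4, max_rise_per_step=5.0):
    """Alert when the metric has risen faster than max_rise_per_step,
    on average, over the last `window` samples."""
    if len(samples) < window + 1:
        return False
    rise = samples[-1] - samples[-1 - window]
    return rise / window > max_rise_per_step

stable_high = [91.0] * 10                # stable at 91% for a while
rising_fast = [20, 28, 36, 45, 53, 60]   # only at 60%, but climbing rapidly

print(static_threshold_alert(stable_high))   # True  (alert we don't want)
print(static_threshold_alert(rising_fast))   # False (misses the trend)
print(rate_of_change_alert(stable_high))     # False
print(rate_of_change_alert(rising_fast))     # True  (catches the climb)
```

Real dynamic analytics would model the metric's history rather than hard-code a slope, but even this crude check inverts which of the two situations gets the alert.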
7. Context is really important
– Do I really want to be alerted when I know someone is
performing maintenance or backups?
– Is there an event that caused the change in behaviour (e.g. new
deploy)?
– Correlate your event line with your monitoring
Down for maintenance?
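One way to correlate the event line with monitoring is to suppress alerts whose timestamps fall inside a known maintenance or backup window. A minimal sketch (window times and function names are invented for illustration):

```python
# Hypothetical sketch: drop alerts that coincide with known events.
from datetime import datetime

# Event line: known maintenance/backup windows (invented example times)
maintenance_windows = [
    (datetime(2013, 6, 22, 2, 0), datetime(2013, 6, 22, 4, 0)),  # nightly backup
]

def should_alert(anomaly_time, windows=maintenance_windows):
    """Alert only if the anomaly is NOT explained by a known event."""
    return not any(start <= anomaly_time <= end for start, end in windows)

print(should_alert(datetime(2013, 6, 22, 3, 0)))   # False: it's just the backup
print(should_alert(datetime(2013, 6, 22, 12, 0)))  # True: unexplained change
```

The same idea extends to deploy events: an anomaly right after a deploy is context, not necessarily a problem.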
8. Know your data!!
– You need to understand the statistical properties
of your data, and where it comes from, in order to
determine what kind of analytics to use.
• For example, it’s important to know if your data is
normally distributed.
• http://codeascraft.com/2013/06/11/introducing-kale/
• https://github.com/etsy/skyline/blob/master/src/analyzer/algorithms.py
– Three-sigma, Grubbs’, and other such algorithms assume a normal
distribution
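A minimal three-sigma sketch, stdlib only (the data and function name are invented for illustration). Like the stddev-based algorithms in Skyline, it implicitly assumes roughly normal data; on skewed or multimodal data it will over- or under-alert.

```python
# Three-sigma rule: flag points more than 3 standard deviations from the mean.
# Only meaningful if the data is approximately normally distributed.
from statistics import mean, pstdev

def three_sigma_outliers(series):
    """Return the points lying more than 3 population stddevs from the mean."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return []
    return [x for x in series if abs(x - mu) > 3 * sigma]

normal_days = [100, 102, 98, 101, 99] * 6        # steady latency samples
print(three_sigma_outliers(normal_days + [500]))  # -> [500]
print(three_sigma_outliers(normal_days))          # -> []
```

Note that three-sigma has pitfalls beyond the normality assumption: in a short series, a lone extreme point inflates the standard deviation enough to mask itself, which is why tests like Grubbs' exist.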
12. Is all data important to collect?
– Two camps:
• Data is data, let’s collect and analyze everything and
figure out the trends.
• Not all data is important, so let’s figure out what’s
important first and understand the underlying model so
we don’t waste resources on the rest.
– Similar to the very public bun fight between Noam
Chomsky and Peter Norvig
• http://norvig.com/chomsky.html
– Unresolved as far as I know
14. We all want to automate
• Having humans in the way of detecting and
solving DevOps issues doesn’t scale.
• At some point, we need systems that can
detect anomalies before problems become
critical, and take appropriate action.
15. Open Loop Control System: Heating your house – the wrong way!
• Steps:
– Tweak heater input
– Get to ideal temperature
– Lock gas valve
– Hope nothing changes
[Block diagram: Controller (gas valve) → System (heater) → Sensor (thermometer)]
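The open-loop steps above can be contrasted with a closed-loop thermostat in a toy simulation. All dynamics and constants here are invented for illustration; the point is only that the open-loop controller drifts when conditions change, while the closed-loop one reads the sensor and recovers.

```python
# Toy house-heating model (assumed dynamics, not a real physics model):
# the heater adds heat, the walls leak it to a 5 °C outside; halfway
# through, a window opens and the heat loss doubles.

def simulate(closed_loop, steps=60):
    temp, target = 15.0, 21.0
    valve = 0.4  # open-loop setting: tuned once for the ORIGINAL heat loss
    for t in range(steps):
        loss = 0.05 if t < steps // 2 else 0.10   # window opens at midpoint
        if closed_loop:
            # Closed loop: thermostat reads the sensor every step
            valve = 1.0 if temp < target else 0.0
        temp += 2.0 * valve - loss * (temp - 5.0)
    return temp

print(round(simulate(closed_loop=True), 1))   # hovers near the 21 °C target
print(round(simulate(closed_loop=False), 1))  # drifts well below target
```

"Tweak, lock the valve, and hope nothing changes" is exactly the open-loop branch: once the heat loss doubles, nothing corrects for it.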
18. How much data do we need?
• Trend towards higher and higher sampling
rates in data collection
• Reminds me of Jorge Luis Borges’ story about
Funes the Memorious
– Perfect recollection of the slightest details of every
instant of his life, but he lost the capacity for
abstraction
• Our brain works on abstraction
– We notice patterns BECAUSE we can abstract
20. So, how much data DO you need?
– You don’t need more resolution than twice your
highest frequency (Nyquist–Shannon sampling
theorem)
– Most of the algorithms for analytics will smooth,
average, filter, and pre-process the data.
– Watch out for correlated metrics (e.g. used vs.
available memory)
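Both points can be sketched in a few lines (the signal, sampling rates, and memory figures are all invented for illustration):

```python
# (1) Nyquist-Shannon: sampling a slow-moving signal far faster than twice
#     its highest frequency adds volume, not information.
# (2) Correlated metrics: used + available memory = total, so collecting
#     both stores the same signal twice.
import math

# A metric that cycles once per hour, sampled once per minute (60x per cycle)
per_minute = [50 + 10 * math.sin(2 * math.pi * t / 60) for t in range(240)]

# Keeping every 10th sample (6x per cycle) still exceeds the Nyquist rate
downsampled = per_minute[::10]
print(len(per_minute), len(downsampled))  # 240 24

# Perfectly (negatively) correlated metrics: one fully determines the other
total_mb = 4096.0
used_mb = [1200.0, 1500.0, 2100.0, 1800.0]
available_mb = [total_mb - u for u in used_mb]  # redundant to collect both
```

The caveat, as the slide notes, is that smoothing, averaging, and filtering in the analytics pipeline already discard resolution, so paying to collect it in the first place buys little.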
21. More?
• I want to talk more about analytics, in more
depth, but time’s up!!
– (Actually John won’t let me)
• Come talk to me during the breaks!
• Thank you!