- This is a story of where we've been
- And what you can learn from the past.
- Where we're going
- Humans as monitoring
- single/small numbers of machines, ops teams.
- Monitoring was limited: heavily checklist-based
- move to client/server
- more moving pieces
- move to IT being mission critical
Outages – more of them, more time to recover.
Check-based monitoring.
- periodic checks
Limited granularity, and doesn’t scale
Built after the implementation
Focused on host and availability metrics
Checks are one source of data – check-based monitoring generally ignores the others: traces, logs, metrics
Focused on IT assets, not business metrics
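
A minimal sketch of the granularity problem with periodic checks. It is a simulation only, not any particular tool's behavior. The function name, the outage window, and the polling intervals are all hypothetical values chosen for illustration.

```python
def check_samples(outage_start, outage_end, interval, horizon):
    """Return (timestamp, is_up) pairs as a periodic check would observe them.

    The service is considered down for outage_start <= t < outage_end.
    All times are in seconds; every value here is illustrative.
    """
    samples = []
    t = 0
    while t <= horizon:
        is_up = not (outage_start <= t < outage_end)
        samples.append((t, is_up))
        t += interval
    return samples


# Hypothetical 40-second outage (t=70 to t=110) over a 5-minute window.
coarse = check_samples(70, 110, interval=60, horizon=300)  # check every 60s
fine = check_samples(70, 110, interval=10, horizon=300)    # check every 10s

# The 60-second check polls at t=60 and t=120, so it never sees the outage.
missed_by_coarse = all(is_up for _, is_up in coarse)

# The 10-second check lands inside the outage window several times.
seen_by_fine = any(not is_up for _, is_up in fine)
```

Tightening the interval catches the outage, but it also multiplies the number of checks per host, which is where the scaling problem comes from.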
Thankfully things have gotten simpler…
Oh wait.
Complexity plus IT becoming strategic.
Despite all this, the monitoring world has not evolved as rapidly.
80/20 bubble.
We need a new start.
First, re-evaluate what data you're collecting, what you're watching, and what you're alerting on.
Second, it’s about new requirements, not about patching old systems.
You need to build a new paradigm.
A new paradigm that we can grow into.
A meta narrative. A superset of monitoring.
Embraces more than monitoring, but doesn’t mean giving monitoring up.
Not a perfect definition.
Aspirational
You are not the (only) customer of your monitoring.
Observe systems.
Individual components are interchangeable; observing each one is not worth the effort.
Observe end-to-end – user experience matters.
Checks
Metrics
Logs
Traces
Also add context – you know a lot about your systems, so why not surface that so you don’t have to go fishing?
Et al
Good for debugging purposes. You can’t predict failure in complex systems, but you know it’s going to happen. Arming yourself with evidence and diagnostic information leaves you best prepared.
Tools and their data give you information.
Data with context gives you answers.
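
One way to picture "data with context" is an event that carries what you already know about the system along with it. This is a rough sketch, not a prescribed schema: the `with_context` helper and every field name and value (service, version, region, owner, trace_id) are hypothetical.

```python
import json


def with_context(event, context):
    """Merge known system context into an event.

    Fields on the event itself win if a key appears in both.
    """
    return {**context, **event}


# Hypothetical context: things you already know about the system.
context = {
    "service": "checkout",
    "version": "1.4.2",
    "region": "eu-west-1",
    "owner": "payments-team",
}

# Hypothetical raw event, as a tool might emit it on its own.
event = {
    "type": "log",
    "level": "error",
    "message": "payment gateway timeout",
    "trace_id": "abc123",
}

enriched = with_context(event, context)

# Emit as one structured line so it can be searched and correlated later.
line = json.dumps(enriched, sort_keys=True)
```

The raw event says that something timed out. The enriched one also says which service, which version, where, and who owns it, so nobody has to go fishing for that during an incident.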
Report the overall health of systems.
Highly granular insights into the behavior of systems along with rich context.