5. Why is monitoring a DevOps topic?
● Check functionality / performance
● Analyse behavior
● Insight into how software works
● Trend analytics / resources
You build it, you run it!
Follow @goettl
7. Well known monitoring tools
● Nagios, Check_MK
● OpenTSDB, Graphite
● InfluxDB + Kapacitor (similar to Prometheus)
● Elasticsearch + Logstash + Kibana + ...
● ...
Hard to use in a DevOps stack
8. Rule #1
"Spend more time working on code that analyzes the meaning
of metrics, than code that collects, moves, stores and displays
metrics", Adrian Cockcroft
11. Demo: exporter / endpoint (Gitblit)
...
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
jvm_memory_pool_bytes_max{pool="Metaspace",} -1.0
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 1.073741824E9
jvm_memory_pool_bytes_max{pool="PS Eden Space",} 1.320157184E9
jvm_memory_pool_bytes_max{pool="PS Survivor Space",} 3.670016E7
jvm_memory_pool_bytes_max{pool="PS Old Gen",} 2.793406464E9
# HELP log4j_appender_total Log4j log statements at various log levels
# TYPE log4j_appender_total counter
log4j_appender_total{level="debug",} 0.0
log4j_appender_total{level="warn",} 4.0
log4j_appender_total{level="trace",} 0.0
log4j_appender_total{level="error",} 1034.0
log4j_appender_total{level="fatal",} 0.0
log4j_appender_total{level="info",} 6049.0
...
12. Demo: Prometheus out of the box functionality
● Scrape raw metrics
● Persist metrics
● Navigate data / PromQL
● Visualisation
13. Demo: Prometheus advanced vis + navigation
● Grafana dashboards
● Navigation with labels
14. Demo: monitoring as part of development
● Monitoring for verification of load tests
● Tests should trigger similar load to production
● DevOps is the best way to get high quality data
● Alertmanager as Assert.that
15. Demo: the admin part of Prometheus
● Prometheus time series database
● Integration with existing monitoring solutions
● How to scale Prometheus
● 11 integrations with container orchestrators (Kubernetes, Marathon, DNS, ...)
22. Custom Metrics
... that's it ...
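// requires: import io.prometheus.client.Counter;
// static: a metric name can be registered only once per registry
private static final Counter garbageCollectsTotal = Counter.build()
    .name("GIT_GARBAGE_COLLECTS_TOTAL")
    .help("Number of git garbage collects issued by Gitblit for a repository")
    .register();
...
// count one more garbage collect
garbageCollectsTotal.inc();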
23. What did we see?
Whitebox monitoring won't work without developers!
25. ... should I?
Don't use deep learning and data science when a straightforward
15-minute rule-based system does well.
Data science can help you to detect patterns and facts in your
metrics that you can't see.
26. What is already available
● Great architecture to get high quality data
● Numerical data
● Apply mathematical functions on it
● Easy and fast to navigate (PromQL)
● Alert / rule model
● Chart / histogram vis with Grafana
27. When do I start?
Start with alerts / dashboards that already work and that you want to improve
28. Two ways to get data out of Prometheus
● HTTP API (poll): exploratory data analysis
● Remote API (push): streaming analysis
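The poll variant can be sketched in a few lines of Java. This is only a sketch under assumptions: the class name PrometheusPoll, the server address http://localhost:9090 and the chosen query are illustrative and not taken from the talk. The /api/v1/query_range endpoint with its query, start, end and step parameters is part of the Prometheus HTTP API.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for polling range data from the Prometheus HTTP API. */
class PrometheusPoll {

    /** Builds a query_range URL; start and end are Unix seconds, step is in seconds. */
    static String rangeQueryUrl(String baseUrl, String promql,
                                long startEpochSec, long endEpochSec, long stepSec) {
        // The PromQL expression has to be URL-encoded, it contains (, ), [ and ].
        String encoded = URLEncoder.encode(promql, StandardCharsets.UTF_8);
        return baseUrl + "/api/v1/query_range?query=" + encoded
                + "&start=" + startEpochSec
                + "&end=" + endEpochSec
                + "&step=" + stepSec;
    }

    /** Fetches the JSON response body as a string (needs a reachable server). */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        long end = System.currentTimeMillis() / 1000;
        // Assumed address of a local Prometheus server; adjust to your setup.
        String url = rangeQueryUrl("http://localhost:9090",
                "rate(log4j_appender_total[5m])", end - 3600, end, 15);
        System.out.println(url);
        // Call fetch(url) here once a Prometheus server is running.
    }
}
```

The JSON that comes back can then be loaded into a notebook or any other analysis tool for exploratory work.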
30. Normalize Prometheus data types
● Gauges and histograms are OK as they are
● Counters have to be processed first
● Raw counter values only ever grow and never repeat, so they have no statistical value on their own
● Use e.g. a derivative function to convert a counter to a gauge equivalent
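A minimal sketch of such a conversion, assuming the counter samples have already been exported as plain arrays. The class CounterRates and its method are hypothetical names for this illustration. A drop in the value is treated as a counter reset (for example a process restart), which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns raw counter samples into per-second rates. */
class CounterRates {

    /**
     * timestamps are in seconds, values are the raw counter readings.
     * Returns one rate for each pair of consecutive samples.
     */
    static List<Double> perSecond(double[] timestamps, double[] values) {
        if (timestamps.length != values.length) {
            throw new IllegalArgumentException("timestamps and values differ in length");
        }
        List<Double> rates = new ArrayList<>();
        for (int i = 1; i < values.length; i++) {
            double dt = timestamps[i] - timestamps[i - 1];
            if (dt <= 0) {
                continue; // skip duplicate or out-of-order samples
            }
            double delta = values[i] - values[i - 1];
            if (delta < 0) {
                // The counter went backwards: assume it restarted from zero,
                // so the increase since the reset is the new value itself.
                delta = values[i];
            }
            rates.add(delta / dt);
        }
        return rates;
    }

    public static void main(String[] args) {
        double[] t = {0, 15, 30, 45};
        double[] v = {100, 130, 10, 40}; // the third sample follows a reset
        System.out.println(perSecond(t, v));
    }
}
```

The resulting series behaves like a gauge: it goes up and down with the load, so means, quantiles and similar statistics are meaningful on it.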
31. Example 1
I can predict the latency of HTTP requests
● Can I use the Prometheus function predict_linear?
● Are other predictions possible?
↡↡ R Notebook predict_linear↡↡
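To make the idea concrete outside of Prometheus: predict_linear is based on simple linear regression. The sketch below fits a least-squares line through (time, value) samples and extrapolates it. It illustrates the technique and is not the Prometheus implementation; the class name LinearPrediction and the sample data are hypothetical.

```java
/** Hypothetical helper: least-squares line fit and extrapolation. */
class LinearPrediction {

    /** Fits value = intercept + slope * time and evaluates the line at atTime. */
    static double predict(double[] times, double[] values, double atTime) {
        int n = times.length;
        if (n < 2 || n != values.length) {
            throw new IllegalArgumentException("need at least two (time, value) pairs");
        }
        double meanT = 0;
        double meanV = 0;
        for (int i = 0; i < n; i++) {
            meanT += times[i];
            meanV += values[i];
        }
        meanT /= n;
        meanV /= n;

        double covariance = 0;
        double variance = 0;
        for (int i = 0; i < n; i++) {
            double dt = times[i] - meanT;
            covariance += dt * (values[i] - meanV);
            variance += dt * dt;
        }
        if (variance == 0) {
            throw new IllegalArgumentException("all samples share one timestamp");
        }
        double slope = covariance / variance;
        double intercept = meanV - slope * meanT;
        return intercept + slope * atTime;
    }

    public static void main(String[] args) {
        // Made-up latency samples in seconds, taken one minute apart.
        double[] t = {0, 60, 120, 180};
        double[] latency = {0.10, 0.12, 0.14, 0.16};
        // Extrapolate five minutes past the last sample.
        System.out.println(predict(t, latency, 480));
    }
}
```

A straight line is only one possible model. Seasonal or non-linear behavior needs other predictors, which is where the notebook analysis comes in.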
32. Histograms: monitoring for the long tail
histogram_quantile(0.99,
  sum(
    rate(
      http_request_duration_seconds_bucket{method="GET"}[1m]
    )
  ) by (le)
)
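How a quantile can be estimated from cumulative buckets can be sketched as follows. This is a simplified illustration of linear interpolation inside a bucket, not the Prometheus source. The class name HistogramQuantile and the bucket layout are hypothetical, and the sketch assumes positive bucket bounds, so the first bucket starts at 0.

```java
/** Hypothetical helper: quantile estimate from cumulative histogram buckets. */
class HistogramQuantile {

    /**
     * upperBounds are the ascending "le" bounds, the last one must be +Inf.
     * cumulativeCounts[i] is the number of observations <= upperBounds[i].
     */
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        if (n < 2 || n != cumulativeCounts.length
                || upperBounds[n - 1] != Double.POSITIVE_INFINITY) {
            throw new IllegalArgumentException("need at least two buckets ending in +Inf");
        }
        if (q < 0 || q > 1) {
            throw new IllegalArgumentException("q must be within [0, 1]");
        }
        double total = cumulativeCounts[n - 1];
        if (total == 0) {
            return Double.NaN; // no observations at all
        }
        double rank = q * total;

        // Find the first bucket whose cumulative count reaches the rank.
        int b = 0;
        while (b < n - 1 && cumulativeCounts[b] < rank) {
            b++;
        }
        if (b == n - 1) {
            // The quantile lies in the +Inf bucket: report the highest finite bound.
            return upperBounds[n - 2];
        }
        double lower = (b == 0) ? 0.0 : upperBounds[b - 1];
        double countBelow = (b == 0) ? 0.0 : cumulativeCounts[b - 1];
        double inBucket = cumulativeCounts[b] - countBelow;
        if (inBucket <= 0) {
            return lower; // empty bucket, nothing to interpolate
        }
        // Assume observations are spread evenly inside the bucket.
        return lower + (upperBounds[b] - lower) * (rank - countBelow) / inBucket;
    }

    public static void main(String[] args) {
        double[] le = {0.1, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {50, 90, 99, 100};
        System.out.println(quantile(0.99, le, counts));
    }
}
```

The estimate can only be as precise as the bucket boundaries, so for long-tail monitoring the buckets should be dense around the latencies that matter.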