Dynamic Infrastructure and Container Monitoring with Prometheus

Georg Öttl
Infracoders Meetup, 2017, Graz
Follow @goettl

About me
● Enterprise Software dev
● Data Science Services
● Dev / DevOps / Ops
● Developer who likes Math
Twitter: @goettl

Overview
● Monitoring
● Prometheus by example
● DevOps demo, scaling Gitblit
● Analyze Prometheus metrics like a data scientist

Why is monitoring a DevOps topic?
● Check functionality / performance
● Analyse behavior
● Insight into how the software works
● Trend analytics / resources
You build it, you run it!

Metrics, tracing, logging?
Blog: Peter Bourgon, "Metrics, tracing, and logging"

Well known monitoring tools
● Nagios, Check_MK
● OpenTSDB, Graphite
● InfluxDB + Kapacitor (similar to Prometheus)
● Elasticsearch + Logstash + Kibana + ...
● ...
Hard to use in a DevOps stack

Rule #1
"Spend more time working on code that analyzes the meaning
of metrics, than code that collects, moves, stores and displays
metrics", Adrian Cockcroft

Prometheus by example

Demo: app scenario scaling Gitblit

Demo: exporter / endpoint (Gitblit)
...
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
jvm_memory_pool_bytes_max{pool="Metaspace",} -1.0
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 1.073741824E9
jvm_memory_pool_bytes_max{pool="PS Eden Space",} 1.320157184E9
jvm_memory_pool_bytes_max{pool="PS Survivor Space",} 3.670016E7
jvm_memory_pool_bytes_max{pool="PS Old Gen",} 2.793406464E9
# HELP log4j_appender_total Log4j log statements at various log levels
# TYPE log4j_appender_total counter
log4j_appender_total{level="debug",} 0.0
log4j_appender_total{level="warn",} 4.0
log4j_appender_total{level="trace",} 0.0
log4j_appender_total{level="error",} 1034.0
log4j_appender_total{level="fatal",} 0.0
log4j_appender_total{level="info",} 6049.0
...
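The text format served by the endpoint is easy to post-process. A minimal Python sketch, assuming only the simple `name{labels} value` line shape shown on this slide (no timestamps, no escaped quotes inside label values); `parse_metrics` is a hypothetical helper name, not part of any Prometheus library:

```python
import re

# Matches lines such as: log4j_appender_total{level="warn",} 4.0
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

def parse_metrics(text):
    """Return a list of (name, labels, value) tuples; skips comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # blank lines and "# HELP" / "# TYPE" comments
        match = LINE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ''))
        samples.append((name, labels, float(value)))
    return samples

# A few lines copied from the endpoint output above.
sample = '''# TYPE log4j_appender_total counter
log4j_appender_total{level="warn",} 4.0
log4j_appender_total{level="error",} 1034.0
jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
'''
parsed = parse_metrics(sample)
```

In practice Prometheus does this scraping and parsing for you; the sketch only illustrates how little structure the format needs.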

Demo: Prometheus out of the box functionality
● Scrape raw metrics
● Persist metrics
● Navigate data / PromQL
● Visualisation

Demo: Prometheus advanced vis + navigation
● Grafana dashboards
● Navigation with labels

Demo: monitoring as part of development
● Monitoring for verification of load tests
● Tests should trigger similar load to production
● DevOps is the best way to get high quality data
● Alertmanager as Assert.that

Demo: the admin part of Prometheus
● Prometheus time series database
● Integration to existing monitoring solutions
● How to scale Prometheus
● 11 integrations to container orchestrators (k8s, marathon, dns, ... )

Whitebox instrumentation in Java

How to do whitebox monitoring so far
● JSON / CSV / SQL view, ...
● JMX
● Libraries with push hooks (e.g. Datadog, ...)

Prometheus client instrumentation, example Gitblit
● Client instrumentation
● Default metrics for Log4j
● Default metrics for the JDK
● Custom metrics for git garbage collection, LDAP sync

Prometheus client Metrics HTTP / Servlet
Configure the Gitblit servlet / Guice WebModule
bind(MetricsServlet.class).in(Scopes.SINGLETON);
serve("/Prometheus").with(MetricsServlet.class);
... that's it ...

Prometheus client Metrics JDK
Register default JDK Metrics
DefaultExports.initialize();
... that's it ...

Prometheus client Metrics Log4j
Instrument the logger / Log4j
log4j.rootCategory=INFO, S, METRICS
...
log4j.appender.METRICS = io.prometheus.client.log4j.InstrumentedAppender
log4j.appender.METRICS.Append = false
... that's it ...

Custom Metrics
import io.prometheus.client.Counter;
// Registered once with the default registry
private final Counter garbageCollectsTotal = Counter.build()
    .name("GIT_GARBAGE_COLLECTS_TOTAL")
    .help("Number of git garbage collects issued by Gitblit for a repository")
    .register();
...
// Increment wherever a garbage collection is issued
garbageCollectsTotal.inc();
... that's it ...

What did we see?
Whitebox monitoring won't work without Developers!

Analyze Prometheus Metrics Like a Data Scientist

... should I?
Don't use deep learning and data science when a straightforward
15-minute rule-based system does well.
Data science can help you detect patterns and facts in your
metrics that you can't see otherwise.

What is already available
● Great architecture to get high quality data
● Numerical data
● Apply mathematical functions on it
● Easy and fast to navigate (PromQL)
● Alert / rule model
● Chart / histogram vis with Grafana

When do I start?
When you already have working alerts / dashboards that you want to improve

Two ways to get data out of Prometheus
● HTTP API (poll): exploratory data analysis
● Remote API (push): streaming analysis

HTTP API - /api/v1/query_range
import requests
# Fetch one sample per minute for every series, summed by metric name and instance
response = requests.get(
    url='http://127.0.0.1:9090/api/v1/query_range',
    params={
        'query': 'sum({__name__=~".+"}) by (__name__,instance)',
        'start': '1502809554',
        'end': '1502839554',
        'step': '1m',
    })
# response.json() is shaped like:
{"data": {..., "resultType": "matrix",
  "result": [{
    "metric": {"method": "GET",...},
    "values": [[1500008340,"3"], ... ]},...]
}}
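For exploratory analysis, the nested matrix result is usually easier to handle as flat rows. A minimal sketch, assuming the response shape shown above; `flatten_matrix` is a hypothetical helper and the payload values are made up for illustration:

```python
from datetime import datetime, timezone

def flatten_matrix(payload):
    """Turn a query_range 'matrix' result into (labels, time, value) rows."""
    data = payload['data']
    if data['resultType'] != 'matrix':
        raise ValueError('expected a matrix result')
    rows = []
    for series in data['result']:
        labels = series['metric']
        for timestamp, value in series['values']:
            when = datetime.fromtimestamp(float(timestamp), tz=timezone.utc)
            rows.append((labels, when, float(value)))  # values arrive as strings
    return rows

# Illustrative payload in the shape of the response above (values are made up).
example = {
    'status': 'success',
    'data': {
        'resultType': 'matrix',
        'result': [{
            'metric': {'method': 'GET'},
            'values': [[1500008340, '3'], [1500008400, '5']],
        }],
    },
}
rows = flatten_matrix(example)
```

Rows in this form can be written to CSV or loaded into a data frame for further analysis.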

Normalize Prometheus data types
● Gauges and histograms can be used as they are
● Counters have to be processed first
● A raw counter only ever increases, so its values never repeat and carry no statistical value on their own
● Use e.g. a derivative function to convert a counter into a gauge equivalent
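The counter-to-gauge conversion can be sketched as a per-second difference between neighbouring samples. This is a simplified illustration, assuming strictly increasing timestamps and treating any drop in the counter as a reset; `counter_to_rate` is a hypothetical helper and the samples are made up:

```python
def counter_to_rate(samples):
    """Convert (timestamp, counter_value) pairs into per-second rates.

    A drop in the counter is treated as a reset (e.g. a process restart),
    in which case the new value itself is taken as the increase.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        increase = v1 - v0 if v1 >= v0 else v1
        rates.append((t1, increase / (t1 - t0)))
    return rates

# Made-up samples, 60 s apart, with a counter reset before the last one.
samples = [(0, 100.0), (60, 160.0), (120, 220.0), (180, 30.0)]
rates = counter_to_rate(samples)
```

The resulting series fluctuates like a gauge, so the usual statistics (mean, variance, quantiles) become meaningful.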

Example 1
I can predict the latency of HTTP requests
● Can I use the Prometheus function predict_linear?
● Are other predictions possible?
↡↡ R Notebook predict_linear ↡↡
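The idea behind predict_linear is a simple linear regression over recent samples, extrapolated into the future. A minimal sketch of that idea, not the Prometheus implementation: it assumes at least two samples with distinct timestamps and extrapolates from the last sample's timestamp; `predict_linear_sketch` is a hypothetical name and the latencies are made up:

```python
def predict_linear_sketch(samples, seconds_ahead):
    """Fit a least-squares line through (timestamp, value) pairs and
    extrapolate it seconds_ahead past the last sample's timestamp."""
    n = len(samples)
    if n < 2:
        raise ValueError('need at least two samples')
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + seconds_ahead)

# Made-up latency samples (seconds), one per minute, growing linearly.
latency = [(0, 0.10), (60, 0.12), (120, 0.14), (180, 0.16)]
forecast = predict_linear_sketch(latency, 600)  # ten minutes ahead
```

A straight line is only a reasonable model for steadily growing values; periodic or bursty latency calls for other prediction methods.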

Histograms: monitoring for the long tail
histogram_quantile(0.99,
  sum(
    rate(
      http_request_duration_seconds_bucket{method="GET"}[1m]
    )
  ) by (le))
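A quantile estimated from a histogram is an interpolation inside the bucket that contains the requested rank. The sketch below is a simplified illustration of that idea, not the exact Prometheus implementation: it assumes already aggregated cumulative buckets and a lowest bucket that starts at 0; `estimate_quantile` is a hypothetical name and the bucket counts are made up:

```python
import math

def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    ending with the open-ended bucket whose upper bound is math.inf.
    """
    if not 0 < q <= 1:
        raise ValueError('q must be in (0, 1]')
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls into the open-ended bucket: report the
                # highest finite bound instead of interpolating to infinity.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound

# Made-up request-duration buckets: (le in seconds, cumulative count).
buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
p75 = estimate_quantile(0.75, buckets)
```

Because values are interpolated within a bucket, the accuracy of the long-tail estimate depends on how finely the bucket boundaries cover the tail.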

Outlier detection algorithms
https://github.com/twitter/AnomalyDetection
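The linked package is an R library. As a much simpler stand-in, and explicitly not the algorithm that package uses, the sketch below flags outliers with a modified z-score based on the median and the median absolute deviation (MAD). `mad_outliers` is a hypothetical helper, the threshold of 3.5 is a commonly used default, and the latencies are made up:

```python
import statistics

def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 scales the MAD so the score is comparable to a z-score
    # for normally distributed data.
    return [i for i, d in enumerate(deviations)
            if 0.6745 * d / mad > threshold]

# Made-up request latencies in seconds with one obvious spike.
latencies = [0.11, 0.12, 0.10, 0.13, 0.11, 0.95, 0.12, 0.10]
outliers = mad_outliers(latencies)
```

Median-based scores are less distorted by the outliers themselves than mean and standard deviation, but unlike seasonal methods they ignore daily or weekly patterns in the metric.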

Demo export from grafana
● Demo API
● Export to CSV

Thanks for having me at the Infracoders Meetup 2017!
Questions?
Georg Öttl
Twitter: @goettl
