This talk gives an introduction to the Prometheus data model and PromQL, building up a simple PromQL expression and explaining what's happening under the hood at each stage. The architecture of Prometheus is shown as well as its limitations, and a distributed version of Prometheus, Weave Cortex, is proposed. Finally, an explanation of how Kubernetes and Prometheus have sympathetic designs is given.
1. Monitoring your App in Kubernetes with Prometheus
Luke Marsden, Developer Experience
@lmarsden
2. What does Weave do?
Weave helps devops
iterate faster with:
• observability &
monitoring
• continuous delivery
• container networks &
firewalls
Use Prometheus to
power our Monitoring
solution
4. Agenda
1. Prometheus concepts: data model & PromQL
2. Prometheus architecture & pull model
3. Why Prometheus & Kubernetes are a good fit
4. What is Cortex?
5. Kubernetes recap
6. Training on real app
7. What’s next?
6. Data model & PromQL
• Prometheus is a labelled time-series database
• Labels are key-value pairs
• A time-series is [(timestamp, value), …]
• lists of timestamp, value tuples
• values are just floats, but you can use them to represent counters,
gauges, histograms, etc.; PromQL lets you make sense of them
• So the data type of Prometheus is
• {key1=A, key2=B} -> [(t0, v0), (t1, v1), …]
• …
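The data type above can be sketched in a few lines of Python. This is only an illustration of the shape of the data, not how Prometheus stores it; the names `labels`, `store` and `append` are made up for this example.

```python
from typing import Dict, FrozenSet, List, Tuple

Labels = FrozenSet[Tuple[str, str]]  # hashable set of key=value pairs
Sample = Tuple[float, float]         # (timestamp, value)

def labels(**kv: str) -> Labels:
    """Build a label set; the order of the keys does not matter."""
    return frozenset(kv.items())

# Hypothetical in-memory store: label set -> list of samples.
store: Dict[Labels, List[Sample]] = {}

def append(series: Labels, timestamp: float, value: float) -> None:
    """Append one (timestamp, value) sample; values are always floats."""
    store.setdefault(series, []).append((timestamp, float(value)))

key = labels(key1="A", key2="B")
append(key, 0, 1)
append(key, 1, 2)
```

Because the key is a set of pairs, `labels(key2="B", key1="A")` identifies the same series as `labels(key1="A", key2="B")`.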
7. Data model & PromQL
• __name__ is a magic label that holds the metric name,
so you can shorten the query syntax from
{__name__="requests"}
to:
requests
8. Data model & PromQL
• Example: counter requests over a spike in traffic:
• 1, 2, 3, 13, 23, 33, 34, 35, 36
[Plot: requests (y-axis) against time (x-axis)]
t1 t2 t3 t4 t5 t6 t7 t8 t9
1 2 3 13 23 33 34 35 36
9. Data model & PromQL
• What Prom is storing
• {__name__="requests"} ->
[(t1, 1), (t2, 2), (t3, 3), (t4, 13),
(t5, 23), (t6, 33), (t7, 34), (t8, 35),
(t9, 36), (t10, 37)]
or
t1 t2 t3 t4 t5 t6 t7 t8 t9
1 2 3 13 23 33 34 35 36
10. Data model & PromQL
• the [P] (period) syntax after a selector turns
an instant vector into a range vector
• for each sample, collect all the values before
and including that sample over the last period P
• Example P: 5s, 1m, 2h…
13. Data model & PromQL
• Recall our time-series requests
• What is requests[3s]? Vector query:
t1 t2 t3 t4 t5 t6 t7 t8 t9
1 2 3 13 23 33 34 35 36
Windows t1-3, t2-4, t3-5, t4-6, t5-7, t6-8, t7-9; the first three are:
t1-3: 1, 2, 3
t2-4: 2, 3, 13
t3-5: 3, 13, 23
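The windowing can be sketched in Python. The sketch assumes one sample per second at t1..t9 and a window that is open on the left, (t - P, t], chosen so that the output matches the three-sample windows on the slide; `range_select` is a made-up name, not a Prometheus API.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)

# The counter from the slides, assuming one sample per second.
requests: List[Sample] = [(1, 1), (2, 2), (3, 3), (4, 13), (5, 23),
                          (6, 33), (7, 34), (8, 35), (9, 36)]

def range_select(series: List[Sample], end: int, period: int) -> List[Sample]:
    """Samples with end - period < timestamp <= end (assumed boundary rule)."""
    return [(t, v) for (t, v) in series if end - period < t <= end]

# requests[3s] evaluated at t3..t9: one window of values per evaluation time.
windows: Dict[int, List[float]] = {
    end: [v for _, v in range_select(requests, end, 3)]
    for end in range(3, 10)
}
```

Here `windows[3]` is the t1-3 window, `windows[4]` is t2-4, and so on up to `windows[9]` for t7-9.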
15. Data model & PromQL
• rate() finds the per-second rate of
change over a vector query
• for each vector, rate() essentially does
(last_value - first_value) / (last_time - first_time)
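Applied to the requests counter, the formula works out as in the sketch below. It assumes one sample per second and three-sample windows as on the earlier slides; `simple_rate` is a made-up name. The real rate() also handles counter resets and extrapolates to the window edges, which this sketch leaves out.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)

# The counter from the slides, assuming one sample per second.
requests: List[Sample] = [(1, 1), (2, 2), (3, 3), (4, 13), (5, 23),
                          (6, 33), (7, 34), (8, 35), (9, 36)]

def simple_rate(window: List[Sample]) -> float:
    """(last_value - first_value) / (last_time - first_time) for one window."""
    (first_t, first_v), (last_t, last_v) = window[0], window[-1]
    return (last_v - first_v) / (last_t - first_t)

# One rate per three-sample window: t1-3, t2-4, ..., t7-9.
rates = [simple_rate(requests[i:i + 3]) for i in range(len(requests) - 2)]
# rates == [1.0, 5.5, 10.0, 10.0, 5.5, 1.0, 1.0]
```

The rate climbs to 10 requests per second during the spike and falls back to 1 afterwards, which is the shape you would want to see on a graph of traffic.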
38. Labels
• Recall that requests is just shorthand for
{__name__="requests"}
• We can have more labels:
{__name__="requests", job="frontend"}
• Shortens to requests{job="frontend"}
• And so we could query
rate(requests{job="frontend"}[1m])
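Selecting by labels can be thought of as filtering label sets. The Python sketch below only covers equality matchers and uses made-up series and a made-up `select` helper; it is not how Prometheus implements selectors.

```python
from typing import Dict, List, Optional

LabelSet = Dict[str, str]

# Hypothetical label sets, one per stored time-series.
series: List[LabelSet] = [
    {"__name__": "requests", "job": "frontend"},
    {"__name__": "requests", "job": "backend"},
    {"__name__": "errors", "job": "frontend"},
]

def select(all_series: List[LabelSet], name: Optional[str] = None,
           **matchers: str) -> List[LabelSet]:
    """Return the label sets that equal every requested label value.

    Passing name="requests" is the shorthand for __name__="requests".
    """
    wanted = dict(matchers)
    if name is not None:
        wanted["__name__"] = name
    return [ls for ls in all_series
            if all(ls.get(k) == v for k, v in wanted.items())]

frontend_requests = select(series, "requests", job="frontend")
```

`select(series, "requests")` returns both requests series, while adding `job="frontend"` narrows it to one.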
41. Alerts
• You can define PromQL queries that trigger alerts when
the result of a query meets a condition. Example:
# Alert for any instance that has a median request latency >1s.
ALERT APIHighRequestLatency
  IF api_http_request_latencies_second{quantile="0.5"} > 1
  FOR 1m
  ANNOTATIONS {
    summary = "High request latency on {{ $labels.instance }}",
    description = "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)",
  }
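The effect of the FOR clause can be sketched in Python: the condition has to hold continuously for the given duration before the alert fires. This is a simplified model with a made-up `alert_state` helper and made-up sample data; Prometheus itself evaluates rules on its own evaluation interval.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)

def alert_state(samples: List[Sample], threshold: float,
                for_seconds: float) -> str:
    """Return 'inactive', 'pending' or 'firing' as of the last sample."""
    breach_start: Optional[float] = None
    for t, v in samples:
        if v > threshold:
            if breach_start is None:
                breach_start = t  # condition just became true
        else:
            breach_start = None   # condition broken, start over
    if breach_start is None:
        return "inactive"
    last_t = samples[-1][0]
    return "firing" if last_t - breach_start >= for_seconds else "pending"

# Hypothetical median latencies, sampled every 30 seconds.
latencies = [(0, 0.4), (30, 1.2), (60, 1.3), (90, 1.5)]
state = alert_state(latencies, threshold=1, for_seconds=60)
```

With these samples the latency has been above 1s from t=30 to t=90, so `state` is "firing"; without the last sample it would still be "pending".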
42. Cortex
• Distributed, multi-tenant version of
Prometheus
• Prometheus architecture is single-server
• We wanted to build something scalable
44. Cortex
• We run it for you
• Long term storage for your metrics
• We open sourced it
• https://github.com/weaveworks/cortex
45. Recap: all you need to know (Kube)
[Diagram: containers, Pods, Deployments, Services]
• Container Image: Docker container image, contains your application code in an
isolated environment.
• Pod: A set of containers, sharing network namespace and local volumes,
co-scheduled on one machine. Mortal. Has pod IP. Has labels.
• Deployment: Specify how many replicas of a pod should run in a cluster. Then
ensures that many are running across the cluster. Has labels.
• Service: Names things in DNS. Gets virtual IP. Two types: ClusterIP for internal
services, NodePort for publishing to outside. Routes based on labels.
47. Why Kubernetes <3 Prometheus
• Prom discovers what to scrape by asking Kube
• Prom’s pull model matches Kube dynamic
scheduling
• Allows Prom to identify the thing it’s pulling from
• Prom label/value pairs mirror Kube labels
• Pods were made for exporters
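The first and fourth points can be sketched together in Python: ask the cluster which pods exist, then turn each running pod into a scrape target that carries the pod's labels. Everything here is hypothetical (the pod data, the `scrape_targets` helper, the port 8080); in a real setup Prometheus does this itself through its Kubernetes service discovery and relabelling configuration.

```python
from typing import Dict, List

# Hypothetical answer from Kube to "which pods are there right now?"
pods: List[dict] = [
    {"ip": "10.0.0.5", "phase": "Running", "labels": {"app": "frontend"}},
    {"ip": "10.0.0.6", "phase": "Pending", "labels": {"app": "frontend"}},
]

def scrape_targets(pod_list: List[dict], port: int = 8080) -> List[Dict[str, str]]:
    """Build one label set per running pod; the port is an assumed default."""
    targets = []
    for pod in pod_list:
        if pod["phase"] != "Running":
            continue  # nothing to pull from yet
        target_labels = dict(pod["labels"])                # Kube labels become Prom labels
        target_labels["instance"] = f'{pod["ip"]}:{port}'  # identifies what is being pulled
        targets.append(target_labels)
    return targets

targets = scrape_targets(pods)
```

Because the list is rebuilt from what Kube reports, pods that are rescheduled or scaled simply appear in or drop out of the next round of scrapes.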
49. Join the Weave user group!
meetup.com/pro/Weave/
weave.works/help
50. Other topics
• Kubernetes 101
• Continuous delivery: hooking up my CI/CD
pipeline to Kubernetes
• Network policy for security
We have talks on all these topics in the Weave
user group!
51. Thanks! Questions?
We are hiring!
DX in San Francisco
Engineers in London & SF
weave.works/weave-company/hiring