Prometheus’s simple and reliable operational model is one of its major selling points. Beyond a certain scale, however, we have identified a few shortcomings it imposes. We are proud to present Thanos, an open source project by Improbable that bundles a set of components which seamlessly transform existing Prometheus deployments into a unified, global-scale monitoring system.
Authors: Fabian Reinartz, Bartlomiej Plotka
Slides from January London Prometheus Meetup 2018.
Thanos: https://github.com/improbable-eng/thanos
7. Improbable case
● Multiple isolated Kubernetes clusters
● Single Prometheus server per cluster
● Dashboards & Alertmanager in separate cluster
[Diagram: Prometheus servers 1, 2, …, n, n+1, one per cluster; Grafana and Alertmanager run in a separate cluster]
What is missing?
12. Global View
● How to aggregate data from different clusters?
○ Use hierarchical federation?
[Diagram: a global federating Prometheus scraping Prometheus servers 1, 2, …, n, n+1, with Grafana and Alertmanager attached]
Concerns with federation:
○ Single point of failure
○ Maintenance
○ Which data should be federated?
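For reference, hierarchical federation means the global Prometheus scrapes each cluster’s server via its /federate endpoint. A minimal sketch of the global server’s scrape config (target names and the match[] selector are illustrative):

```yaml
scrape_configs:
  - job_name: 'federate'
    honor_labels: true          # keep the original labels of federated series
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'  # which series to pull; illustrative selector
    static_configs:
      - targets:
          - 'prometheus-cluster-1:9090'  # hypothetical per-cluster servers
          - 'prometheus-cluster-2:9090'
```

This is exactly the pattern the slide questions: every cluster’s data flows through one global server, which creates the single point of failure and the “which data to federate” problem listed above.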
48. Store
● A series is made up of one or more “chunks”
● A chunk contains ~120 samples
● Chunks can be retrieved through HTTP byte range queries
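An HTTP byte range query fetches only a slice of an object rather than the whole file. A minimal sketch of such a request, assuming a hypothetical object-store URL and chunk offset:

```python
import urllib.request

# Hypothetical block file in object storage; the Range header asks the
# server for just the bytes covering one chunk (offsets are illustrative).
req = urllib.request.Request(
    "https://storage.example.com/bucket/block-01/chunks/000001",
    headers={"Range": "bytes=1024-2047"},
)
# A server that supports range requests replies "206 Partial Content"
# with only the requested bytes:
# with urllib.request.urlopen(req) as resp:
#     chunk_bytes = resp.read()
```

This is what lets the Store component read individual chunks out of large block files without downloading them entirely.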
50. Store
Example:
● 1000 series @ 30s scrape interval
● Query 1 year
→ 8.7 million chunks, i.e. 8.7 million range queries
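The slide’s figure follows directly from the chunk parameters above; a quick back-of-the-envelope check:

```python
# Assumptions from the slides: 1000 series, 30s scrape interval,
# ~120 samples per chunk, one year of data.
series = 1000
scrape_interval_s = 30
samples_per_chunk = 120
seconds_per_year = 365 * 24 * 3600

samples_per_series = seconds_per_year // scrape_interval_s   # 1,051,200
chunks_per_series = samples_per_series // samples_per_chunk  # 8,760
total_chunks = series * chunks_per_series

print(f"{total_chunks:,} chunks")  # → 8,760,000 chunks
```

So a naive implementation would issue roughly one range request per chunk: ~8.7 million requests for a single one-year query.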
53. Store
Leverage Prometheus’ TSDB file layout
● Chunks of the same series are aligned
● Similar series are aligned, e.g. due to the same metric name
54. Store
Consolidating ranges in close proximity reduces the request count by 4-6 orders of magnitude: 8.7 million requests turn into O(20) requests.
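The consolidation idea can be sketched as merging sorted byte ranges whose gaps are small, since aligned chunks sit next to each other in the block files. This is a simplified illustration, not Thanos’ actual implementation:

```python
def consolidate(ranges, max_gap):
    """Merge (start, end) byte ranges whose gap is at most max_gap bytes."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it instead of
            # issuing a separate request (reading a few gap bytes is
            # cheaper than an extra round trip).
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Three adjacent chunk reads collapse into one range request;
# a far-away chunk stays separate:
print(consolidate([(0, 500), (520, 900), (950, 1400), (10_000, 10_200)],
                  max_gap=100))
# → [(0, 1400), (10000, 10200)]
```

Because chunks of the same (and of similar) series are aligned, the millions of tiny per-chunk reads cluster into a handful of large contiguous ranges.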
55. Store
Index lookups profit from a similar approach.
66. Downsampling
Decompressing one sample takes 10-40 nanoseconds
● Times 1000 series @ 30s scrape interval
● Times 1 year
● Over 1 billion samples, i.e. 10-40s for decoding alone
● Plus your actual computation over all those samples, e.g. rate()
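Checking the slide’s arithmetic, using its 10-40ns per-sample decode cost:

```python
# Same workload as before: 1000 series at a 30s scrape interval, 1 year.
series = 1000
samples_per_series = 365 * 24 * 3600 // 30        # 1,051,200
total_samples = series * samples_per_series        # ~1.05 billion

for ns_per_sample in (10, 40):
    seconds = total_samples * ns_per_sample / 1e9
    print(f"{ns_per_sample}ns/sample -> {seconds:.1f}s decode time")
# → 10ns/sample -> 10.5s decode time
# → 40ns/sample -> 42.0s decode time
```

Downsampling attacks exactly this: with far fewer (pre-aggregated) samples per queried range, both the decode time and the subsequent computation, e.g. rate(), shrink proportionally.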
81. Cost
● Store + Query nodes cost roughly what you save on the Prometheus side (net ±0)
● Less SSD space needed on the Prometheus side (savings)
● Basically: just your data stored in S3/GCS/HDFS + requests
83. Cost
Example:
● 20 Prometheus servers each ingesting 100k samples/sec, 500GB of local disk
● 20 x 250GB of new data per month + ~20% overhead for downsampling
● $1440/month for storage after 1 year (72TB of queryable data)
● $100/month for sustained 100 query/sec against object storage
● $1530/month savings in local SSDs
Thanos Cost: $1540/month
Prometheus Savings: $1530/month
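The storage figure can be reproduced assuming object-storage pricing of roughly $0.02/GB/month (a ballpark for the S3/GCS standard class at the time; actual prices vary by region and class):

```python
# Assumptions from the slide, plus an assumed $0.02/GB/month storage price.
servers = 20
new_data_gb_per_server_month = 250
downsampling_overhead = 0.20   # ~20% extra for downsampled copies
months = 12
price_per_gb_month = 0.02      # USD, assumed

total_gb = (servers * new_data_gb_per_server_month * months
            * (1 + downsampling_overhead))
monthly_cost = total_gb * price_per_gb_month
print(f"{total_gb / 1000:.0f}TB queryable, ${monthly_cost:,.0f}/month")
# → 72TB queryable, $1,440/month
```

Adding the ~$100/month request cost for sustained queries gives the $1540/month total, which is almost exactly offset by the $1530/month saved on local SSDs.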