Slides for my talk at Monitorama PDX 2019. Histograms have the potential to give us tools to meet SLOs/SLAs, quantile measurements, and very rich heatmap displays for debugging. However, TSDB backends have not fulfilled that promise. This talk covers the concept of histograms as first-class citizens in storage. What does accuracy mean for histograms? How can we store and compress rich histograms for evaluation and querying at massive scale? How can we fix some of the issues with histograms in Prometheus, such as proper aggregation, bucketing, and avoiding clipping?
3. What do we do with Histograms?
4. The Evolution of Histograms
• Pre-aggregated percentiles
• Histograms with buckets
• Prometheus histograms
• HDRHistogram
• T-Digests
[Timeline labels: Statsd, Graphite, OpenTSDB, Prometheus, InfluxDB, ???]
5. Overlaid Latency Quantiles
6. Now an incident happens…
7. Heatmaps: Rich Visuals
8. Grafana Heatmaps
• Buckets scale to much more input data, but need TSDB support for histogram buckets
• Time series: flexible, but Grafana needs to read ALL the raw data
9. Useful Histograms
• Should be aggregatable
• Should support quantiles, distributions, and other f(x)
• Heatmaps - histograms over time
• Should be accurate
• Should scale and be efficient
10. Buckets and Accuracy
• Max quantile error = bucket width / lowerBound (worked check after the table)
• Exponential buckets = consistent max quantile errors (Good!)
• Linear almost never makes sense
• Your custom Prom histogram buckets likely have >100% error

Example: (1000, 6E10) value range
Histogram Type | Max Error % | # Buckets
Linear         | 100%        | 60,000,000
Exponential    | 99.1%       | 26
Linear         | 10%         | 600,000,000
Exponential    | 10.0%       | 188
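A quick worked check of the table, using the bucket-count formula from the next slide (# buckets = log(max/min) / log(1 + max_error)) over the (1000, 6E10) range:
• Exponential, 10% max error: log(6E10 / 1000) / log(1.10) ≈ 188 buckets, matching the table
• Exponential, 99.1% max error: log(6E10 / 1000) / log(1.991) ≈ 26 buckets
• Linear, 10% max error: the first bucket can be at most 1000 * 0.10 = 100 wide, so covering 6E10 takes 6E10 / 100 = 600,000,000 buckets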
11. Configuring your Histograms
• Start with the range of values you need: (min, max)
• Pick the desired max quantile error %
• Think about trading off publish frequency for accuracy
• # buckets = log(max/min) / log(1 + max_error)
• Example: max error = 50%, range 1000 to 6E10:
  numBuckets = Math.log(6E10 / 1000) / Math.log(1 + 0.50)
  exponentialBuckets(1000, 1 + 0.50, numBuckets)
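The slide's example, expanded into a minimal self-contained Scala sketch. The exponentialBuckets helper below is an illustrative stand-in for the equivalent client-library helper (e.g. the Prometheus Java client's), not FiloDB or Prometheus code:

  object BucketConfig {
    // Number of exponential buckets needed to cover [min, max] when each bucket
    // grows by a factor of (1 + maxError), per the formula on this slide.
    // Rounded to the nearest integer (44 and 188 for the slide's examples).
    def numBuckets(min: Double, max: Double, maxError: Double): Int =
      math.round(math.log(max / min) / math.log(1 + maxError)).toInt

    // Illustrative stand-in: `count` bucket upper bounds starting at `start`,
    // each `factor` times the previous one.
    def exponentialBuckets(start: Double, factor: Double, count: Int): Seq[Double] =
      (0 until count).map(i => start * math.pow(factor, i))

    def main(args: Array[String]): Unit = {
      // Example from this slide: max error = 50%, value range 1000 to 6E10
      val n = numBuckets(1000, 6e10, 0.50)                  // ~44 buckets
      println(exponentialBuckets(1000, 1 + 0.50, n).mkString(", "))
    }
  }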
12. Histograms at Scale
13. Histograms as First-Class Citizens
• Modeling, transporting, and storing histograms holistically offers many benefits
• Scalability: much better storage, network, query speed
• Proper aggregations
• Better accuracy and features
• Adaptable to better histogram designs in the future
• Almost nobody is doing this yet
14. Prometheus Histogram Schema
5 buckets, sum, count per histogram = 7 series; values shown at four successive sample times:

Series  | __name__      | le  | t1 | t2 | t3 | t4
Series1 | metric_sum    |     | 44 | 35 | 50 | 60
Series2 | metric_count  |     |  5 |  6 | 10 | 11
Series3 | metric_bucket | 0.5 |  0 |  1 |  1 |  2
Series4 | metric_bucket | 2.0 |  2 |  4 |  5 |  6
Series5 | metric_bucket | 5.0 |  3 |  6 |  8 | 10
Series6 | metric_bucket | 10  |  5 |  6 |  9 | 11
Series7 | metric_bucket | 25  |  5 |  6 | 10 | 11
15. The Scale Problem with Histograms
• My app: 100 metrics, 20 histograms
• Assume a range of (1000, 6E10)
• Notice how histograms dominate the time series!

Max error % | Num buckets | Histogram Series | Other Series | Total Series
50%         | 44          | 882              | 80           | 962
10%         | 188         | 3762             | 80           | 3842
2%          | 905         | 18102            | 80           | 18182
16. Mama, we got a problem
• Actual system: hundreds of millions of metrics, each one with a histogram of 64 buckets
• Using Prometheus would lead to tens of billions of series
17. Prometheus: Raw Data
One histogram sample arrives as 7 separate records:

__name__      | le  | Zone    | value
metric_sum    |     | us-west | 44
metric_count  |     | us-west | 5
metric_bucket | 0.5 | us-west | 0
metric_bucket | 2.0 | us-west | 2
metric_bucket | 5.0 | us-west | 3
metric_bucket | 10  | us-west | 5
metric_bucket | 25  | us-west | 5
18. Atomicity Issues
• Prometheus export and scrape do not guarantee grouping of histogram buckets
• Easy to get only part of a histogram
• FiloDB is a distributed database: 7 records might end up on 7 different nodes!
• Calculating histogram_quantile: talk to 7 nodes for every query!
19. Single Histogram Schema
5 buckets, sum, count per histogram, stored as a single series (Series1); the same four sample times as before:

Sample | Sum | Count | Hist (le = 0.5, 2.0, 5.0, 10, 25)
t1     | 44  |  5    | [0, 2, 3, 5, 5]
t2     | 35  |  6    | [1, 4, 6, 6, 6]
t3     | 50  | 10    | [1, 5, 8, 9, 10]
t4     | 60  | 11    | [2, 6, 10, 11, 11]
20. Single Histogram Raw Data
__name__ = Metric, Zone = us-west | Sum = 44 | Count = 5 | Hist (0.5, 2, 5, 10, 25) = [0, 2, 3, 5, 5]
• One record, not (n + 2). No distribution problem!
• Labels only appear once
• Savings proportional to the number of histogram buckets
• 50x savings for 64 histogram buckets
21. Much smaller network and disk usage
• One time series vs 66 -> 50x network I/O reduction
• The single-histogram schema in FiloDB uses < 0.2 bytes per histogram bucket
[Chart: Network I/O, bytes per histogram (0-14,000), series-per-bucket vs series-per-histogram]
[Chart: Storage cost, bytes per bucket (0-1.6), series-per-bucket vs series-per-histogram]
22. Optimizing Histograms: Compression
• Delta encoding of increasing bucket values (sketch below):
  [0, 2, 3, 5, 5] -> [0, 2, 1, 2, 0]
  [1, 4, 6, 6, 6] -> [1, 3, 2, 0, 0]
• Compressed size about 4x-10x better than one time series per bucket (64 buckets; FiloDB)
• 0.18 bytes per histogram bucket (range: 0.16 - 0.61)

FiloDB SingleHistogram | 0.18 bytes/bucket
Prometheus             | 1.5 bytes/bucket
Raw data               | 8 bytes/bucket
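A minimal Scala sketch of just the delta step above: cumulative bucket counts become per-bucket deltas, which are far more compressible. Names are illustrative, and FiloDB's real encoder presumably adds further bit-level packing to reach 0.18 bytes/bucket; this only shows the transform.

  object BucketDelta {
    // Cumulative bucket counts -> per-bucket deltas, e.g. [0, 2, 3, 5, 5] -> [0, 2, 1, 2, 0]
    def deltaEncode(cumulative: Seq[Long]): Seq[Long] =
      cumulative.zip(0L +: cumulative).map { case (cur, prev) => cur - prev }

    // Inverse transform: per-bucket deltas back to cumulative counts
    def deltaDecode(deltas: Seq[Long]): Seq[Long] =
      deltas.scanLeft(0L)(_ + _).tail

    def main(args: Array[String]): Unit = {
      println(deltaEncode(Seq(0L, 2L, 3L, 5L, 5L)))  // List(0, 2, 1, 2, 0)
      println(deltaEncode(Seq(1L, 4L, 6L, 6L, 6L)))  // List(1, 3, 2, 0, 0)
    }
  }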
23. Optimizing Histograms: Querying (64 Buckets)
• histogram_quantile() is more than 100x faster than series-per-bucket
• No need for group-by
• Localized computation vs needing to jump across 64 bucket time series
[Chart: histogram_quantile() QPS (0-30,000), series-per-bucket vs series-per-histogram]
24. Rich Histograms: Usability and Correctness
25. Changing buckets… sum()
• sum(rate(http_req_latency{…}[5m])) by (le)
• Different buckets lead to incorrect sums: a series that has already switched to the new scheme contributes to le=25 and le=100 while older series do not, so the summed buckets no longer form a consistent cumulative distribution
[Diagram: buckets le = 2.5, 5, 10, 50, +Inf, with le = 25 and 100 added by a bucket change]
26. Holistic Histograms: Correct Sums
• Adding histograms holistically allows us to track bucket changes and correctly sum them (sketch below)
[Diagram: buckets le = 2.5, 5, 10, 50, +Inf, with le = 25 and 100 added by a bucket change]
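One way such a sum can be made correct, sketched here under the assumption that the newer scheme only adds buckets (as in the diagram): project both histograms onto the bucket boundaries they share, then add. The counts below are illustrative, and this is only the idea, not FiloDB's actual merge code.

  object HistogramSum {
    // Add two cumulative-bucket histograms with different bucket schemes by
    // projecting both onto the boundaries they have in common. Cumulative counts
    // at a shared `le` boundary mean the same thing in both schemes, so the
    // result is a correct (if coarser) sum.
    def add(aLes: Seq[Double], aCounts: Seq[Long],
            bLes: Seq[Double], bCounts: Seq[Long]): (Seq[Double], Seq[Long]) = {
      val shared = aLes.toSet.intersect(bLes.toSet).toSeq.sorted
      val a = aLes.zip(aCounts).toMap
      val b = bLes.zip(bCounts).toMap
      (shared, shared.map(le => a(le) + b(le)))
    }

    def main(args: Array[String]): Unit = {
      val inf = Double.PositiveInfinity
      // Old scheme le = 2.5, 5, 10, 50, +Inf; new scheme adds le = 25 and 100.
      // Counts are illustrative cumulative values.
      val (les, sums) = add(
        Seq(2.5, 5.0, 10.0, 50.0, inf),              Seq(1L, 3L, 4L, 7L, 8L),
        Seq(2.5, 5.0, 10.0, 25.0, 50.0, 100.0, inf), Seq(2L, 2L, 5L, 6L, 9L, 9L, 10L))
      println(les.zip(sums))  // summed over the shared boundaries 2.5, 5, 10, 50, +Inf
    }
  }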
27. histogram_quantile clipping
• At 20:00, the quantile is clipped at the second-to-last bucket boundary of 10.0
28. histogram_max_quantile
• Client sends a max value at each time interval
29. histogram_max_quantile
• Having a known max allows us to interpolate in the last bucket (sketch below)
• Cannot interpolate to +Inf
• https://github.com/filodb/FiloDB/pull/361
[Diagram: buckets le = 2.5, 5, 10, 25, +Inf; known max = 40; the 0.9 quantile interpolated within the last bucket]
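A Scala sketch of the idea, assuming cumulative bucket counts and the usual linear interpolation inside a bucket; the known max stands in for +Inf as the upper bound of the last bucket instead of clipping at the last finite le. This illustrates the approach only, not the actual code in the PR above.

  object MaxQuantile {
    // Estimate quantile `q` from cumulative bucket counts. `les` are bucket upper
    // bounds with the last one +Inf; a known `max` lets us interpolate inside the
    // +Inf bucket instead of clipping at the last finite bound.
    def quantile(q: Double, les: Array[Double], counts: Array[Long], max: Double): Double = {
      val rank  = q * counts.last                      // target cumulative count
      val b     = counts.indexWhere(_ >= rank)         // first bucket reaching that rank
      val lower = if (b == 0) 0.0 else les(b - 1)
      val upper = if (les(b).isPosInfinity) max else les(b)
      val below = if (b == 0) 0L else counts(b - 1)
      val inBkt = counts(b) - below
      if (inBkt == 0) lower
      else lower + (rank - below) / inBkt * (upper - lower)
    }

    def main(args: Array[String]): Unit = {
      // Buckets from the diagram: le = 2.5, 5, 10, 25, +Inf, with a known max of 40
      val les    = Array(2.5, 5.0, 10.0, 25.0, Double.PositiveInfinity)
      val counts = Array(2L, 4L, 6L, 8L, 10L)          // illustrative cumulative counts
      // The 0.9 quantile falls in the +Inf bucket: interpolated to 32.5 instead of clipped at 25
      println(quantile(0.9, les, counts, max = 40.0))
    }
  }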
30. Ad-Hoc Histograms
• Just the quantile, min, and max from gauges is not that useful
• Get a heat map for CPU use across k8s containers:
  histogram(2, 8, container_cpu_usage_seconds_total{…})
• Aggregate a histogram across gauges using the new histogram() function
• Yes, Grafana can do heat maps from raw series, but you can only read so many raw time series :)
31. Summary: Rich Histograms at Scale
• Treating histograms as a first-class citizen
• Massive savings in storage and network I/O
• Solve aggregation and other correctness issues
• Move towards T-Digests and future formats
32. Thank you very much!
Please reach out to help make useful histograms at scale a reality!
@evanfchan
http://github.com/filodb/FiloDB
Monitorama slack: #talk-evan-chan
33. Example 2: Write Size
34. Heatmap 2: Write Size
35. Histogram aggregation: Prometheus
• Group by is needed for summing histogram buckets due to the data model - a leaky abstraction
• What if a dev changes the histogram scheme (# of buckets, etc.)?
• Not possible to resolve scheme differences in Prometheus, since aggregation knows nothing about histograms
sum(rate(histogram_bucket{app="foo"}[5m])) by (le)
36. Histogram aggregation: FiloDB
• No need for _bucket, but you need to select the histogram column
• No need for group by: histograms are natively understood and correct aggregations happen
sum(rate(histogram{app="foo", __col__="h"}[5m]))