Rich Histograms at Scale:
A New Hope
Evan Chan
@evanfchan
http://github.com/filodb/FiloDB
This is not a contribution
What do we do with
Histograms?
The Evolution of Histograms
• Pre-aggregated percentiles
• Histogram with buckets
• Prometheus histograms
• HDRHistogram
• T-Digests
(Timeline figure labels: Statsd, Graphite, OpenTSDB, Prometheus, InfluxDB, ???)
Overlaid Latency Quantiles
Now an incident happens…
Heatmaps: Rich Visuals
Grafana Heatmaps
• Buckets scale to much more input data but need TSDB support for histogram buckets
• Time series: flexible, but Grafana needs to read ALL the raw data
Useful Histograms
• Should be aggregatable
• Supports quantiles, distributions, other f(x)
• Heatmaps - histograms over time
• Should be accurate
• Should scale and be efficient
Buckets and Accuracy
• Max quantile error = bucket width / lowerBound
• Exponential buckets = consistent max quantile errors (Good!)
• Linear almost never makes sense
• Your custom Prom histogram buckets likely have >100% error
Example: (1000, 6E10) value range
Histogram Type   Max Error %   # Buckets
Linear           100%          60,000,000
Exponential      99.1%         26
Linear           10%           600,000,000
Exponential      10.0%         188
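To make the "bucket width / lowerBound" rule concrete, here is a minimal Python sketch. The helper names are made up for illustration. It applies the rule to the default bucket bounds shipped by the Prometheus client libraries (in seconds, +Inf bucket left out) and to a set of exponential bounds with factor 1.1.

```python
def max_relative_error(bounds):
    """Largest (upper - lower) / lower over consecutive finite bucket bounds."""
    return max((hi - lo) / lo for lo, hi in zip(bounds, bounds[1:]))


def exponential_bounds(start, factor, count):
    """Bounds start, start*factor, start*factor^2, ... (count values)."""
    return [start * factor ** i for i in range(count)]


# Default bounds of the Prometheus client libraries, without the +Inf bucket.
prom_defaults = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]

# Hypothetical exponential scheme: every bucket is 10% wider than its lower bound.
expo = exponential_bounds(1000, 1.1, 20)

prom_error = max_relative_error(prom_defaults)  # about 1.5, i.e. 150%
expo_error = max_relative_error(expo)           # about 0.1, i.e. 10%
```

The jumps from 0.01 to 0.025 and from 1 to 2.5 are what push the default scheme to roughly 150%, while the exponential scheme has the same error in every bucket.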
Configuring your Histograms
• Start with the range of values you need: (min, max)
• Pick the desired max quantile error %
• Think about trading off publish freq for accuracy
• # buckets = log(max/min) / log(1 + max_error)
• Example: Max error=50%, (1000 to 6E10):
numBuckets = Math.log(6E10/1000) / Math.log(1 + 0.50)

exponentialBuckets(1000, 1 + 0.50, numBuckets)
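The same recipe as a small Python sketch. The function names are illustrative, with `exponential_buckets` mirroring the (start, factor, count) argument order used on the slide. Rounding the bucket count to the nearest integer is an assumption; rounding up is the conservative choice if the whole range must be covered.

```python
import math


def num_buckets(min_value, max_value, max_error):
    """# buckets = log(max/min) / log(1 + max_error), as a float."""
    return math.log(max_value / min_value) / math.log(1 + max_error)


def exponential_buckets(start, factor, count):
    """Upper bounds start, start*factor, ..., start*factor^(count-1)."""
    return [start * factor ** i for i in range(count)]


n_50 = num_buckets(1000, 6e10, 0.50)  # roughly 44.2
n_10 = num_buckets(1000, 6e10, 0.10)  # roughly 187.9
bounds = exponential_buckets(1000, 1 + 0.50, round(n_50))
```

With 44 buckets the last finite bound is 1000 * 1.5^43, which is below 6E10, so the largest values would land in the +Inf bucket.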
Histograms at Scale
Histograms as First-Class
Citizen
• Modeling, transporting, and storing histograms holistically
offers many benefits
• Scalability — much better storage, network, query speed
• Proper aggregations
• Better accuracy and features
• Adaptable to better histogram designs in the future
• Almost nobody is doing this yet
Prometheus Histogram Schema
5 buckets, sum, count per histogram
Series1  __name__ metric_sum               44  35  50  60
Series2  __name__ metric_count             5   6   10  11
Series3  __name__ metric_bucket  le 0.5    0   1   1   2
Series4  __name__ metric_bucket  le 2.0    2   4   5   6
Series5  __name__ metric_bucket  le 5.0    3   6   8   10
Series6  __name__ metric_bucket  le 10.    5   6   9   11
Series7  __name__ metric_bucket  le 25.    5   6   10  11
The Scale Problem with
Histograms
• My app: 100 metrics, 20 histograms
• Assume range of (1000, 6E10).
• Notice how histograms dominate the time series!
Max error %   Num buckets   Histogram Series   Other Series   Total Series
50%           44            882                80             962
10%           188           3762               80             3842
2%            905           18102              80             18182
Mama we got a problem
• Actual system: hundreds of
millions of metrics, each one
has histogram with 64
buckets
• Using Prometheus would
lead to tens of billions of
series
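A back-of-the-envelope Python sketch of the series arithmetic, assuming each Prometheus histogram costs one series per bucket plus sum and count (n + 2). The figure of 300 million histograms is an assumed stand-in for "hundreds of millions". The table on the previous slide lists slightly lower histogram-series counts (882 rather than 920 for 44 buckets), so the totals here differ a little from it.

```python
def series_per_histogram(num_buckets):
    """One series per bucket, plus _sum and _count."""
    return num_buckets + 2


def total_series(num_histograms, num_buckets, other_series=0):
    return num_histograms * series_per_histogram(num_buckets) + other_series


# Small app: 20 histograms with 44 buckets each, plus 80 non-histogram series.
small_app = total_series(20, 44, other_series=80)

# Large system: an assumed 300 million histograms with 64 buckets each.
large_system = total_series(300_000_000, 64)
```

For the large system this comes to 19.8 billion series, which is where "tens of billions" comes from.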
Prometheus: Raw Data
__name__ metric_sum               Zone Us-west   44
__name__ metric_count             Zone Us-west   5
__name__ metric_bucket  le 0.5    Zone Us-west   0
__name__ metric_bucket  le 2.0    Zone Us-west   2
__name__ metric_bucket  le 5.0    Zone Us-west   3
__name__ metric_bucket  le 10.    Zone Us-west   5
__name__ metric_bucket  le 25.    Zone Us-west   5
Atomicity Issues
• Prometheus export and scrape do not guarantee that a histogram's buckets stay grouped together.
• Easy to only get part of a histogram
• FiloDB is a distributed database. 7 records might
end up in 7 different nodes!
• Calculating histogram_quantile: talk to 7 nodes
for every query!
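A toy Python sketch of why this happens. It assumes a purely hypothetical placement rule, hashing each record's full label set to pick a shard, which is not FiloDB's actual sharding scheme. Under that assumption the 7 per-bucket records hash independently, while the single-record form always lands in exactly one place.

```python
import hashlib


def shard_for(labels, num_shards):
    """Hypothetical placement: hash the sorted label set, take it modulo num_shards."""
    key = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


base = {"zone": "us-west"}

# One record per series: sum, count, and five buckets.
per_bucket_series = (
    [dict(base, __name__="metric_sum"), dict(base, __name__="metric_count")]
    + [dict(base, __name__="metric_bucket", le=le)
       for le in ("0.5", "2.0", "5.0", "10.", "25.")]
)

# Single-record form: the whole histogram shares one label set.
single_record = dict(base, __name__="metric")

shards_touched = {shard_for(s, 16) for s in per_bucket_series}
single_shard = shard_for(single_record, 16)
```

`shards_touched` can contain up to 7 different shards, each of which a quantile query would have to contact.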
Single Histogram Schema
5 buckets, sum, count per histogram
Series1  __name__ metric
Sum   Count   Hist (0.5, 2.0, 5.0, 10., 25.)
44    5       0 2 3 5 5
35    6       1 4 6 6 6
50    10      1 5 8 9 10
60    11      2 6 10 11 11
Single Histogram Raw Data
__name__ Metric   Zone Us-west
Sum   Count   Hist (0.5, 2, 5, 10, 25)
44    5       0 2 3 5 5
• One record, not (n + 2). No distribution problem!
• Labels only appear once
• Savings proportional to # of histogram buckets
• 50x savings for 64 histogram buckets
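One way to picture the single-record form is the Python sketch below. The `HistogramRecord` type and the `from_prom_samples` helper are hypothetical names for illustration, not FiloDB's schema or API. The helper folds the (n + 2) Prometheus-style samples for one timestamp into one record whose labels appear once.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HistogramRecord:
    labels: Tuple[Tuple[str, str], ...]  # shared labels, stored once
    total: float                          # the "Sum" column
    count: int                            # the "Count" column
    bounds: Tuple[float, ...]             # bucket upper bounds (le)
    buckets: Tuple[int, ...]              # cumulative counts per bound


def from_prom_samples(samples):
    """Fold (labels, value) samples of one histogram at one timestamp."""
    total, count, by_bound, shared = None, None, {}, {}
    for labels, value in samples:
        name = labels["__name__"]
        shared = {k: v for k, v in labels.items() if k not in ("__name__", "le")}
        if name.endswith("_sum"):
            total = float(value)
        elif name.endswith("_count"):
            count = int(value)
        elif name.endswith("_bucket"):
            by_bound[float(labels["le"])] = int(value)
    bounds = tuple(sorted(by_bound))
    return HistogramRecord(tuple(sorted(shared.items())), total, count,
                           bounds, tuple(by_bound[b] for b in bounds))


zone = {"zone": "us-west"}
samples = [
    (dict(zone, __name__="metric_sum"), 44),
    (dict(zone, __name__="metric_count"), 5),
    (dict(zone, __name__="metric_bucket", le="0.5"), 0),
    (dict(zone, __name__="metric_bucket", le="2.0"), 2),
    (dict(zone, __name__="metric_bucket", le="5.0"), 3),
    (dict(zone, __name__="metric_bucket", le="10."), 5),
    (dict(zone, __name__="metric_bucket", le="25."), 5),
]
record = from_prom_samples(samples)
```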
Much smaller network and
disk usage
• One time series vs 66 -> 50x network I/O reduction
• Single histogram schema in FiloDB uses < 0.2 bytes
per histogram bucket
(Charts: "Network I/O", bytes per histogram, and "Storage cost", bytes per bucket; each compares Series/bucket with Series/histo)
Optimizing Histograms:
Compression
• Delta encoding of increasing bucket values
0 2 3 5 5 -> 0 2 1 2 0
1 4 6 6 6 -> 1 3 2 0 0
• Compressed size about 4x-10x better than 1
time series per bucket (64 buckets; FiloDB)
• 0.18 bytes/histogram bucket (range: 0.16 - 0.61)
FiloDB SingleHistogram   0.18 bytes/bucket
Prometheus               1.5 bytes/bucket
Raw data                 8 bytes/bucket
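The delta step on its own can be sketched in a few lines of Python. This shows only the bucket-to-bucket differencing from the example rows above; whatever packing FiloDB applies to the resulting small numbers is not described on the slide and is not modelled here.

```python
from itertools import accumulate


def delta_encode(cumulative):
    """Replace each cumulative bucket count by its increase over the previous bucket."""
    deltas, previous = [], 0
    for value in cumulative:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Inverse of delta_encode: running sum of the deltas."""
    return list(accumulate(deltas))


row1 = delta_encode([0, 2, 3, 5, 5])
row2 = delta_encode([1, 4, 6, 6, 6])
```

Because cumulative counts never decrease from one bucket to the next, the deltas are small non-negative integers, which is what makes them cheap to store.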
Optimizing Histograms:
Querying (64 Buckets)
• histogram_quantile()
is more than 100x faster
than series-per-bucket
• No need for group-by
• Localized computation vs
needing to jump across 64
bucket time series
(Chart: histogram_quantile() QPS, Series/Bucket vs Series/Histo)
Rich Histograms
Usability and Correctness
Changing buckets…. sum()
• sum(rate(http_req_latency{…..}[5m])) by (le)
• Different buckets lead to incorrect sums
(Figure: bucket bounds le= 2.5, 5, 10, 50, +Inf, with 25 and 100 shown separately)
Holistic Histograms: Correct Sums
• Adding histograms holistically allows us to track
bucket changes and correctly sum them
(Figure: bucket bounds le= 2.5, 5, 10, 50, +Inf, with 25 and 100 shown separately)
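A Python sketch of the difference, using made-up counts and two bucket schemes (one without the 25 and 100 bounds). The "correct" variant here sums only at bounds that every histogram shares, which is exact for cumulative counts. This is one possible approach for illustration, not necessarily how FiloDB resolves bucket changes.

```python
import math
from collections import defaultdict


def naive_sum_by_le(histograms):
    """What summing per-le series does: add whatever happens to exist for each le."""
    out = defaultdict(int)
    for bounds, counts in histograms:
        for bound, count in zip(bounds, counts):
            out[bound] += count
    return dict(sorted(out.items()))


def sum_on_shared_bounds(histograms):
    """Sum cumulative counts only at bounds present in every histogram."""
    shared = set(histograms[0][0])
    for bounds, _ in histograms[1:]:
        shared &= set(bounds)
    shared = sorted(shared)
    lookups = [dict(zip(bounds, counts)) for bounds, counts in histograms]
    return shared, [sum(lookup[b] for lookup in lookups) for b in shared]


# Hypothetical cumulative counts for an old and a new bucket scheme.
old = ([2.5, 5, 10, 50, math.inf], [1, 3, 6, 9, 10])
new = ([2.5, 5, 10, 25, 50, 100, math.inf], [2, 4, 5, 7, 8, 9, 10])

naive = naive_sum_by_le([old, new])
shared_bounds, correct = sum_on_shared_bounds([old, new])
```

In the naive result the count at le=25 (7) is lower than at le=10 (11), so the summed "cumulative" histogram is no longer monotonic. The shared-bounds sum stays monotonic at the cost of dropping the 25 and 100 buckets.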
histogram_quantile clipping
• At 20:00, quantile is clipped at 2nd-last bucket of
10.0
histogram_max_quantile
• Client sends a max value at each time interval
histogram_max_quantile
• Having a known max allows us to interpolate in last bucket
• Cannot interpolate to +Inf
• https://github.com/filodb/FiloDB/pull/361
(Figure: bucket bounds le= 2.5, 5, 10, 25, +Inf; marked values 40 and 0.9)
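The clipping and the effect of a known max can be reproduced with a small Python sketch of bucket interpolation. It follows the documented behaviour of Prometheus' histogram_quantile (linear interpolation inside a bucket, and the second-highest bound returned when the quantile falls in the +Inf bucket). The `max_value` parameter is an assumption about how a client-sent max could be used, not FiloDB's actual code, and the counts are made up.

```python
import math


def histogram_quantile(q, bounds, cumulative, max_value=None):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative bucket counts.

    bounds:     ascending upper bounds (le); the last one is +Inf
    cumulative: cumulative counts, same length as bounds
    max_value:  optional observed maximum for the interval
    """
    total = cumulative[-1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    i = next(idx for idx, c in enumerate(cumulative) if c >= rank)
    lower = bounds[i - 1] if i > 0 else 0.0
    below = cumulative[i - 1] if i > 0 else 0
    upper = bounds[i]
    if math.isinf(upper):
        if max_value is None:
            return lower  # clipped at the highest finite bound
        upper = max_value  # interpolate up to the known max instead
    in_bucket = cumulative[i] - below
    if in_bucket == 0:
        return lower
    return lower + (upper - lower) * (rank - below) / in_bucket


bounds = [2.5, 5, 10, 25, math.inf]
cumulative = [10, 30, 60, 80, 100]  # hypothetical counts

clipped = histogram_quantile(0.9, bounds, cumulative)
with_max = histogram_quantile(0.9, bounds, cumulative, max_value=40)
```

With these counts the 0.9 quantile falls in the +Inf bucket: without a max the result is stuck at 25, while a max of 40 gives 32.5.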
Ad-Hoc Histograms
• Just the quantile, min, max from gauges is not that useful
• Get heat map for CPU use across k8s containers
• histogram(2, 8, container_cpu_usage_seconds_total{….})
• Aggregate histogram across gauges using new
histogram() function
• Yes Grafana can do heat maps from raw series - but you
can only read so many raw time series. :)
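The idea behind an ad-hoc histogram can be sketched in Python: take the current value of many gauge series at one timestamp and count them into buckets, so only the bucket counts need to reach the dashboard. The slide does not spell out what the arguments 2 and 8 of histogram() mean, so this sketch takes explicit bucket bounds instead; the function name and the sample values are hypothetical.

```python
import bisect


def adhoc_histogram(values, bounds):
    """Cumulative counts of values <= each bound, plus a final +Inf bucket."""
    counts = [0] * (len(bounds) + 1)
    for value in values:
        # bisect_left gives the first bound >= value, matching "le" semantics.
        counts[bisect.bisect_left(bounds, value)] += 1
    cumulative, running = [], 0
    for count in counts:
        running += count
        cumulative.append(running)
    return cumulative


# Hypothetical CPU readings from six containers at one timestamp.
cpu_values = [0.5, 1.5, 2.0, 3.0, 7.0, 20.0]
heatmap_column = adhoc_histogram(cpu_values, bounds=[2, 4, 8, 16])
```

Repeating this per timestamp yields one heat map column each time, regardless of how many containers contributed.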
Summary: Rich Histograms
at Scale
• Treating histograms as a first class citizen
• Massive savings in storage and network I/O
• Solve aggregation and other correctness issues
• Move towards T-Digests and future formats
Thank you very much!
Please reach out to help make useful histograms at scale a reality!
@evanfchan
http://github.com/filodb/FiloDB
Monitorama slack: #talk-evan-chan
Example 2: Write size
Heatmap 2: Write Size
Histogram aggregation:
Prometheus
• Group by is needed for summing histogram buckets because of the data model, which is a leaky abstraction
• What if dev changes the histogram scheme? (# of
buckets, etc.)
• Not possible to resolve scheme differences in Prom,
since aggregation knows nothing about histograms
sum(rate(histogram_bucket{app="foo"}[5m])) by (le)
Histogram aggregation:
FiloDB
• No need for _bucket, but need to select histogram
column
• No need for group by. Histograms are natively
understood and correct aggregations happen
sum(rate(histogram{app="foo",__col__="h"}[5m]))

More Related Content

What's hot

ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevAltinity Ltd
 
My first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfMy first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfAlkin Tezuysal
 
Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureDatabricks
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudDatabricks
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache CalciteJordan Halterman
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Amazon Web Services
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...HostedbyConfluent
 
Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips confluent
 
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZEMySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZENorvald Ryeng
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...HostedbyConfluent
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
How to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issuesHow to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issuesCloudera, Inc.
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta LakeDatabricks
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
MariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsMariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsFederico Razzoli
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafkaconfluent
 

What's hot (20)

ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
My first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfMy first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdf
 
Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data Capture
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
 
Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips
 
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZEMySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZE
 
Envoy and Kafka
Envoy and KafkaEnvoy and Kafka
Envoy and Kafka
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
How to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issuesHow to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issues
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
MariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsMariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAs
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafka
 

Similar to Histograms at scale - Monitorama 2019

FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleFiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleEvan Chan
 
A Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataA Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataDatabricks
 
Online statistical analysis using transducers and sketch algorithms
Online statistical analysis using transducers and sketch algorithmsOnline statistical analysis using transducers and sketch algorithms
Online statistical analysis using transducers and sketch algorithmsSimon Belak
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathChester Chen
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatternsgrepalex
 
2013 06-03 berlin buzzwords
2013 06-03 berlin buzzwords2013 06-03 berlin buzzwords
2013 06-03 berlin buzzwordsNitay Joffe
 
2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users Group2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users GroupNitay Joffe
 
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache GiraphAvery Ching
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Ontico
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Alexey Zinoviev
 
Sketch algorithms
Sketch algorithmsSketch algorithms
Sketch algorithmsSimon Belak
 
Faster Faster Faster! Datamarts with Hive at Yahoo
Faster Faster Faster! Datamarts with Hive at YahooFaster Faster Faster! Datamarts with Hive at Yahoo
Faster Faster Faster! Datamarts with Hive at YahooMithun Radhakrishnan
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveDataWorks Summit/Hadoop Summit
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...SignalFx
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterLucidworks
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterAttila Szegedi
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at AlibabaMichael Stack
 

Similar to Histograms at scale - Monitorama 2019 (20)

FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleFiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
 
PraveenBOUT++
PraveenBOUT++PraveenBOUT++
PraveenBOUT++
 
A Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataA Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big Data
 
Online statistical analysis using transducers and sketch algorithms
Online statistical analysis using transducers and sketch algorithmsOnline statistical analysis using transducers and sketch algorithms
Online statistical analysis using transducers and sketch algorithms
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreath
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatterns
 
2013 06-03 berlin buzzwords
2013 06-03 berlin buzzwords2013 06-03 berlin buzzwords
2013 06-03 berlin buzzwords
 
2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users Group2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users Group
 
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environment
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
 
Sketch algorithms
Sketch algorithmsSketch algorithms
Sketch algorithms
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
 
Faster Faster Faster! Datamarts with Hive at Yahoo
Faster Faster Faster! Datamarts with Hive at YahooFaster Faster Faster! Datamarts with Hive at Yahoo
Faster Faster Faster! Datamarts with Hive at Yahoo
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, Twitter
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @Twitter
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
 

More from Evan Chan

Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustEvan Chan
 
Designing Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesDesigning Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesEvan Chan
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkEvan Chan
 
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web ServiceEvan Chan
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleEvan Chan
 
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkEvan Chan
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerEvan Chan
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Evan Chan
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureEvan Chan
 
OLAP with Cassandra and Spark
OLAP with Cassandra and SparkOLAP with Cassandra and Spark
OLAP with Cassandra and SparkEvan Chan
 
Spark Summit 2014: Spark Job Server Talk
Spark Summit 2014:  Spark Job Server TalkSpark Summit 2014:  Spark Job Server Talk
Spark Summit 2014: Spark Job Server TalkEvan Chan
 
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Evan Chan
 
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkCassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkEvan Chan
 
Real-time Analytics with Cassandra, Spark, and Shark
Real-time Analytics with Cassandra, Spark, and SharkReal-time Analytics with Cassandra, Spark, and Shark
Real-time Analytics with Cassandra, Spark, and SharkEvan Chan
 

More from Evan Chan (15)

Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to Rust
 
Designing Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesDesigning Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and Kubernetes
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
 
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle


Histograms at scale - Monitorama 2019

  • 1. Rich Histograms at Scale: A New Hope Evan Chan @evanfchan http://github.com/filodb/FiloDB
  • 2. This is not a contribution
  • 3. What do we do with Histograms?
  • 4. The Evolution of Histograms
      • Pre-aggregated percentiles
      • Histogram with buckets
      • Prometheus histograms
      • HDRHistogram
      • T-Digests
      [Figure labels: Prometheus, InfluxDB, ???, Statsd, Graphite, OpenTSDB]
  • 5. Overlaid Latency Quantiles
  • 6. Now an incident happens…
  • 7. Heatmaps: Rich Visuals
  • 8. Grafana Heatmaps
      • Buckets: scale to much more input data, but need TSDB support for histogram buckets
      • Time series: flexible, but Grafana needs to read ALL the raw data
  • 9. Useful Histograms
      • Should be aggregatable
      • Should support quantiles, distributions, other f(x)
      • Heatmaps - histograms over time
      • Should be accurate
      • Should scale and be efficient
  • 10. Buckets and Accuracy
      • Max quantile error = bucket width / lowerBound
      • Exponential buckets = consistent max quantile errors (Good!)
      • Linear almost never makes sense
      • Your custom Prom histogram buckets likely have >100% error
      Example: (1000, 6E10) value range
        Histogram Type   Max Error %   # Buckets
        Linear           100%          60,000,000
        Exponential      99.1%         26
        Linear           10%           600,000,000
        Exponential      10.0%         188
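The error rule on this slide can be checked numerically. The sketch below is only an illustration: the helper names (`max_quantile_error`, `linear_bounds`, `exponential_bounds`) are made up for this example, and the custom bucket list is borrowed from the `le` values used later in the deck. It applies "bucket width / lowerBound" to a list of bucket boundaries and compares layouts over the (1000, 6E10) range.

```python
def max_quantile_error(bounds):
    """Largest (bucket width / lower bound) over consecutive boundaries."""
    return max((hi - lo) / lo for lo, hi in zip(bounds, bounds[1:]))

def linear_bounds(lo, hi, count):
    """count equal-width buckets spanning [lo, hi] (hypothetical helper)."""
    width = (hi - lo) / count
    return [lo + i * width for i in range(count + 1)]

def exponential_bounds(lo, hi, count):
    """count buckets spanning [lo, hi], each a constant factor wider than the last."""
    factor = (hi / lo) ** (1.0 / count)
    return [lo * factor ** i for i in range(count + 1)]

# 26 exponential buckets over (1000, 6E10): roughly 99.1% max error.
exp_26 = max_quantile_error(exponential_bounds(1000, 6e10, 26))

# 188 exponential buckets over the same range: roughly 10% max error.
exp_188 = max_quantile_error(exponential_bounds(1000, 6e10, 188))

# The same 26 buckets laid out linearly: the first bucket is millions of
# times wider than its lower bound, so the relative error is enormous.
lin_26 = max_quantile_error(linear_bounds(1000, 6e10, 26))

# A hand-picked bucket list (0.5, 2, 5, 10, 25): the 0.5 -> 2 bucket alone
# has width 1.5 on a lower bound of 0.5, i.e. 300% error.
custom = max_quantile_error([0.5, 2.0, 5.0, 10.0, 25.0])

print(f"exponential/26: {exp_26:.1%}, exponential/188: {exp_188:.1%}")
print(f"linear/26: {lin_26:.3g}, custom: {custom:.0%}")
```

Because every exponential bucket has the same width-to-lower-bound ratio, the error is identical across the whole range, which is the "consistent max quantile errors" point above.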
  • 11. Configuring your Histograms
      • Start with the range of values you need: (min, max)
      • Pick the desired max quantile error %
      • Think about trading off publish freq for accuracy
      • # buckets = log(max/min) / log(1 + max_error)
      • Example: Max error=50%, (1000 to 6E10):
        numBuckets = Math.log(6E10/1000) / Math.log(1 + 0.50)
        exponentialBuckets(1000, 1 + 0.50, numBuckets)
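The recipe above can be written out as a short Python sketch. Two things here are assumptions rather than anything stated on the slide: `exponential_buckets` is a local helper that imitates the usual (start, factor, count) convention of Prometheus client helpers, and the fractional bucket count is rounded up so that an integer can be passed as the count.

```python
import math

def num_buckets(min_value, max_value, max_error):
    """Bucket count from the slide: log(max/min) / log(1 + max_error)."""
    return math.log(max_value / min_value) / math.log(1 + max_error)

def exponential_buckets(start, factor, count):
    """count upper bounds: start, start*factor, start*factor^2, ..."""
    return [start * factor ** i for i in range(count)]

# Max error = 50% over the range (1000, 6E10).
n = num_buckets(1000, 6e10, 0.50)   # fractional, a little over 44
count = math.ceil(n)                # rounding up is a choice made for this sketch
bounds = exponential_buckets(1000, 1 + 0.50, count)

print(f"numBuckets = {n:.2f}, using {count} buckets")
print("first bounds:", bounds[:4])
```

Halving the allowed error roughly doubles the bucket count, which is why the error target and the publish frequency have to be weighed against each other.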
  • 12. Histograms at Scale
  • 13. Histograms as First-Class Citizen
      • Modeling, transporting, and storing histograms holistically offers many benefits
      • Scalability: much better storage, network, query speed
      • Proper aggregations
      • Better accuracy and features
      • Adaptable to better histogram designs in the future
      • Almost nobody is doing this yet
  • 14. Prometheus Histogram Schema
      5 buckets, sum, count per histogram
        Series     Series1      Series2        Series3         Series4         Series5         Series6         Series7
        __name__   metric_sum   metric_count   metric_bucket   metric_bucket   metric_bucket   metric_bucket   metric_bucket
        le                                     0.5             2.0             5.0             10.             25.
        samples    44           5              0               2               3               5               5
                   35           6              1               4               6               6               6
                   50           10             1               5               8               9               10
                   60           11             2               6               10              11              11
  • 15. The Scale Problem with Histograms
      • My app: 100 metrics, 20 histograms
      • Assume range of (1000, 6E10).
      • Notice how histograms dominate the time series!
        Max error %   Num buckets   Histogram Series   Other Series   Total Series
        50%           44            882                80             962
        10%           188           3762               80             3842
        2%            905           18102              80             18182
  • 16. Mama we got a problem
      • Actual system: hundreds of millions of metrics, each one has histogram with 64 buckets
      • Using Prometheus would lead to tens of billions of series
  • 17. Prometheus: Raw Data
        __name__        le     Zone      value
        metric_sum             Us-west   44
        metric_count           Us-west   5
        metric_bucket   0.5    Us-west   0
        metric_bucket   2.0    Us-west   2
        metric_bucket   5.0    Us-west   3
        metric_bucket   10.    Us-west   5
        metric_bucket   25.    Us-west   5
  • 18. Atomicity Issues
      • Prometheus export and scrape do not guarantee that the buckets of a histogram stay grouped together.
      • Easy to only get part of a histogram
      • FiloDB is a distributed database. 7 records might end up in 7 different nodes!
      • Calculating histogram_quantile: talk to 7 nodes for every query!
  • 19. Single Histogram Schema
      5 buckets, sum, count per histogram
        Series1, __name__ = metric
        Sum   Count   Hist: 0.5   2.0   5.0   10.   25.
        44    5             0     2     3     5     5
        35    6             1     4     6     6     6
        50    10            1     5     8     9     10
        60    11            2     6     10    11    11
  • 20. Single Histogram Raw Data
        __name__   Zone      Sum   Count   Hist (0.5, 2, 5, 10, 25)
        Metric     Us-west   44    5       0 2 3 5 5
      • One record, not (n + 2). No distribution problem!
      • Labels only appear once
      • Savings proportional to # of histogram buckets
      • 50x savings for 64 histogram buckets
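To make the "one record, not (n + 2)" point concrete, here is a small sketch of both layouts using the sample values from these slides. The record shapes (a label dict paired with a value) and the function names are assumptions for illustration; they are not FiloDB's or Prometheus's actual wire formats.

```python
def to_series_per_bucket(name, labels, bounds, cumulative, total, count):
    """Prometheus-style layout: one labelled record per bucket, plus sum and count."""
    records = [
        ({"__name__": name + "_sum", **labels}, total),
        ({"__name__": name + "_count", **labels}, count),
    ]
    for le, value in zip(bounds, cumulative):
        records.append(({"__name__": name + "_bucket", "le": str(le), **labels}, value))
    return records

def to_single_histogram(name, labels, bounds, cumulative, total, count):
    """Single-histogram layout: the labels appear once, the buckets travel together."""
    payload = {"sum": total, "count": count,
               "bounds": list(bounds), "hist": list(cumulative)}
    return ({"__name__": name, **labels}, payload)

labels = {"zone": "us-west"}
bounds = [0.5, 2.0, 5.0, 10.0, 25.0]
sample = [0, 2, 3, 5, 5]          # cumulative bucket counts from the slide

per_bucket = to_series_per_bucket("metric", labels, bounds, sample, 44, 5)
single = to_single_histogram("metric", labels, bounds, sample, 44, 5)

print(len(per_bucket), "records vs 1 record")   # 7 records vs 1 record
```

With 64 buckets the first layout produces 66 records, each repeating the full label set, while the second still produces one.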
  • 21. Much smaller network and disk usage
      • One time series vs 66 -> 50x network I/O reduction
      • Single histogram schema in FiloDB uses < 0.2 bytes per histogram bucket
      [Charts: Network I/O (bytes per histogram) and Storage cost (bytes per bucket), each comparing Series/bucket with Series/histo]
  • 22. Optimizing Histograms: Compression
      • Delta encoding of increasing bucket values
        0 2 3 5 5  ->  0 2 1 2 0
        1 4 6 6 6  ->  1 3 2 0 0
      • Compressed size about 4x-10x better than 1 time series per bucket (64 buckets; FiloDB)
      • 0.18 bytes/histogram bucket (range: 0.16 - 0.61)
        FiloDB SingleHistogram   0.18 bytes/bucket
        Prometheus               1.5 bytes/bucket
        Raw data                 8 bytes/bucket
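The delta step on this slide is easy to reproduce. The sketch below covers only that step, under the assumption that each bucket is stored as its difference from the previous bucket in the same histogram; whatever bit-packing FiloDB applies afterwards is not shown here.

```python
from itertools import accumulate

def delta_encode(cumulative):
    """Replace each cumulative bucket count by its difference from the previous bucket."""
    deltas, prev = [], 0
    for value in cumulative:
        deltas.append(value - prev)
        prev = value
    return deltas

def delta_decode(deltas):
    """Inverse of delta_encode: a running sum restores the cumulative counts."""
    return list(accumulate(deltas))

print(delta_encode([0, 2, 3, 5, 5]))   # [0, 2, 1, 2, 0]
print(delta_encode([1, 4, 6, 6, 6]))   # [1, 3, 2, 0, 0]
```

Cumulative bucket counts never decrease, so the deltas are small non-negative numbers with many zeros, which is what makes them cheap to pack.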
  • 23. Optimizing Histograms: Querying (64 Buckets)
      • histogram_quantile() is more than 100x faster than series-per-bucket
      • No need for group-by
      • Localized computation vs needing to jump across 64 bucket time series
      [Chart: histogram_quantile() QPS, Series/Bucket vs Series/Histo]
  • 24. Rich Histograms: Usability and Correctness
  • 25. Changing buckets…. sum()
      • sum(rate(http_req_latency{…..}[5m])) by (le)
      • Different buckets lead to incorrect sums
      [Figure: histogram bucket boundaries (le): 2.5, 5, 10, 50, +Inf and 25, 100]
  • 26. Holistic Histograms: Correct Sums
      • Adding histograms holistically allows us to track bucket changes and correctly sum them
      [Figure: histogram bucket boundaries (le): 2.5, 5, 10, 50, +Inf and 25, 100]
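The failure mode, and one possible way around it, can be shown with two made-up histograms whose bucket schemes differ. The slides do not say how FiloDB reconciles schemes, so the `sum_shared_bounds` approach below (keep only boundaries that both histograms define, where cumulative counts are exact) is purely an assumption for illustration, as are the counts.

```python
INF = float("inf")

def sum_by_le(a, b):
    """Naive per-le sum, as a group-by on le would do: each boundary is summed independently."""
    return {le: a.get(le, 0) + b.get(le, 0) for le in sorted(set(a) | set(b))}

def sum_shared_bounds(a, b):
    """Sum cumulative counts only at boundaries both histograms define."""
    return {le: a[le] + b[le] for le in sorted(set(a) & set(b))}

# Cumulative counts keyed by upper bound (le); the values are hypothetical.
old_scheme = {2.5: 1, 5: 3, 10: 6, 25: 8, INF: 10}
new_scheme = {2.5: 2, 5: 4, 10: 5, 50: 9, 100: 10, INF: 10}

naive = sum_by_le(old_scheme, new_scheme)
merged = sum_shared_bounds(old_scheme, new_scheme)

print(naive)    # le=25 holds 8, less than le=10's 11: no longer cumulative
print(merged)   # {2.5: 3, 5: 7, 10: 11, inf: 20}
```

The naive result is not a valid cumulative histogram, because the le=25 bucket only sees one of the two inputs. The shared-boundary sum stays exact but loses resolution, and a scheme-aware aggregation needs the whole histogram in hand to make that kind of decision.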
  • 27. histogram_quantile clipping
      • At 20:00, quantile is clipped at 2nd-last bucket of 10.0
  • 28. histogram_max_quantile
      • Client sends a max value at each time interval
  • 29. histogram_max_quantile
      • Having a known max allows us to interpolate in last bucket
      • Cannot interpolate to +Inf
      • https://github.com/filodb/FiloDB/pull/361
      [Figure: histogram bucket boundaries (le): 2.5, 5, 10, 25, +Inf, annotated with 40 and 0.9]
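The clipping behaviour and the effect of a known max can be sketched as follows. This is a simplified quantile estimate over cumulative buckets with linear interpolation, written for illustration only: it is not the code from the pull request above, the bucket counts are invented, and the optional `max_value` parameter is a hypothetical stand-in for the max the client sends.

```python
import math

def histogram_quantile(q, bounds, cumulative, max_value=None):
    """Estimate quantile q from cumulative bucket counts.

    bounds are upper bounds (le), the last one being +Inf. Without
    max_value, a quantile landing in the +Inf bucket is clipped to the
    last finite bound; with it, we interpolate up to the known max.
    """
    total = cumulative[-1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    i = next(k for k, c in enumerate(cumulative) if c >= rank)
    lower = bounds[i - 1] if i > 0 else 0.0
    upper = bounds[i]
    if math.isinf(upper):
        if max_value is None:
            return lower          # clipped: cannot interpolate to +Inf
        upper = max_value
    below = cumulative[i - 1] if i > 0 else 0
    in_bucket = cumulative[i] - below
    if in_bucket == 0:
        return lower
    return lower + (upper - lower) * (rank - below) / in_bucket

bounds = [2.5, 5.0, 10.0, 25.0, math.inf]
counts = [10, 30, 60, 80, 100]    # hypothetical cumulative counts

clipped = histogram_quantile(0.9, bounds, counts)
with_max = histogram_quantile(0.9, bounds, counts, max_value=40.0)
print(clipped, with_max)
```

Here the 0.9 quantile falls in the +Inf bucket, so the plain estimate stops at 25 while the max-aware estimate lands between 25 and 40.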
  • 30. Ad-Hoc Histograms
      • Just the quantile, min, max from gauges is not that useful
      • Get heat map for CPU use across k8s containers
      • histogram(2, 8, container_cpu_usage_seconds_total{….})
      • Aggregate histogram across gauges using new histogram() function
      • Yes Grafana can do heat maps from raw series - but you can only read so many raw time series. :)
  • 31. Summary: Rich Histograms at Scale
      • Treating histograms as a first class citizen
      • Massive savings in storage and network I/O
      • Solve aggregation and other correctness issues
      • Move towards T-Digests and future formats
  • 32. Thank you very much! Please reach out to help make useful histograms at scale a reality! @evanfchan http://github.com/filodb/FiloDB Monitorama slack: #talk-evan-chan
  • 33. Example 2: Write size
  • 34. Heatmap 2: Write Size
  • 35. Histogram aggregation: Prometheus
      • Group by is needed for summing histogram buckets due to the data model - a leaky abstraction
      • What if a dev changes the histogram scheme? (# of buckets, etc.)
      • Not possible to resolve scheme differences in Prom, since aggregation knows nothing about histograms
        sum(rate(histogram_bucket{app="foo"}[5m])) by (le)
  • 36. Histogram aggregation: FiloDB
      • No need for _bucket, but need to select histogram column
      • No need for group by. Histograms are natively understood and correct aggregations happen
        sum(rate(histogram{app="foo",__col__="h"}[5m]))