SlideShare a Scribd company logo
1 of 42
Download to read offline
Scalable Online Analytics for Monitoring
LISA15, Nov. 13, 2015, Washington, D.C.
Heinrich Hartmann, PhD, Chief Data Scientist, Circonus
I’m Heinrich
· From Mainz, Germany
· Studied Math. in Mainz, Bonn, Oxford
· PhD in Algebraic Geometry
· Sensor Data Mining at Univ. Koblenz
· Freelance IT consultant
· Chief Data Scientist at Circonus
Heinrich.Hartmann@Circonus.com
@HeinrichHartman(n)
Circonus is ...
@HeinrichHartman(n)
· monitoring and telemetry analysis platform
· scalable to millions of incoming metrics
· available as public and private SaaS
· built-in histograms, forecasting, anomaly-detection, ...
@HeinrichHartman(n)
This talk is about...
I. The Future of Monitoring
II. Patterns of Telemetry Analysis
III. Design of the Online Analytics Engine ‘Beaker’
Part I - The Future of Monitoring
Monitor this:
· 100 containers in one fleet
· 200 metrics per-container
· 50 code pushes a day
· For each push all containers are recreated
The “cloud monitoring challenge” 2015
@HeinrichHartman(n)
Line-plots work well for small numbers of nodes
@HeinrichHartman(n)
Node 1
Node 2
Node 3
Node 4
Node 5
Node 6
CPU utilization for a DB cluster. Source: Internal.
... but can get polluted easily
@HeinrichHartman(n)
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
CPU utilization for a DB cluster. Source: Internal.
“Information is not a scarce resource. Attention is.”
Herbert A. Simon
@HeinrichHartman(n) Source: http://www.circonus.com/the-future-of-monitoring-qa-with-john-allspaw/
Store Histograms and Alert on Percentiles
@HeinrichHartman(n)
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
N
90% percentile
CPU Utilization of a db service with a variable number of nodes.
@HeinrichHartman(n)
Anomaly Detection for Surfacing relevant data
Mockup of an metric overview page. Source: Circonus
Part II - Patterns of Telemetry Analysis
@HeinrichHartman(n)
Telemetry analysis comes in two forms
Online Analytics
· Anomaly detection
· Percentile alerting
· Smart alerting rules
· Smart dashboards
Offline Analytics
· Post mortem analysis
· Assisted thresholding
Stream
Processing
‘Big Data’
@HeinrichHartman(n)
New Components in Circonus
BunsenBeaker
@HeinrichHartman(n)
Beaker & Bunsen in the Circonus Architecture
metrics Ernie
Alerting Service
Snowth
Metric Store
alerts
Web-UI
Q
Beaker
Online AE
Bunsen
Offline AE
@HeinrichHartman(n)
Pattern 1: Windowing
Examples
· Supervised Machine Learning
· Fourier Transformation
· Anomaly Detection (etsy)
Remarks
· Tradeoff: window size
rich features vs. memory
· Tradeoff: overlap
latency vs. CPU
windows = window(y_stream)
def results():
for w in windows:
yield z = method(w)
@HeinrichHartman(n)
Pattern 3: Processing Units
processing_unit = {
state = ...
update = function(self, y) ... end
}
Example
· Exponential Smoothing
· Holt Winters forecasting
· Anomaly Detection (Circonus)
Remarks
· Fast updates
· Fully general
· Cost: maintains state
Circonus Anomaly Detection
@HeinrichHartman(n)
local q = 0.9
exponential_smoothing = {
s = 0,
update = function(self, y)
self.s = (y * q) + (self.s * (1 - q))
return self.s
end
}
For
A Processing Unit for Exponential Smoothing
Exponential smoothing applied to a dns-duration metric
More examples on http://heinrichhartmann.com/.../Generative-Models-for-Time-Series
Processing units are convenient
· Primitive transformation are readily implemented:
Arithmetic, Smoothing, Forecasting, ...
· Fully general. Allow window-based processing as well
· Composable. Compose several PUs to get a new one!
@HeinrichHartman(n)
Circonus Analytics Query Language
· Create your own customized processing unit from primitives
· UNIX-inspired syntax with pipes ‘|’
· Native support for histogram metrics
@HeinrichHartman(n)
@HeinrichHartman(n)
CAQL: Example 1 - Low frequency AD
Pre-process a metric before feeding into anomaly detection
metric:average(<>) | rolling:mean(30m) | anomaly_detection()
@HeinrichHartman(n)
CAQL: Example 2 - Histogram Aggregation
Histograms are first-class citizens
metric:average(<uuid>) | window:histogram(1h) | histogram:percentile(95)
Part III - The Design of the Online Analytics
Engine ‘Beaker’
1. Read messages from a message queue
2. Execute a processing units over incoming metrics
3. Publish computed values to message queue
Beaker v. 0.1: Simple stream processing
Beaker Service
output
Q
input
Q
Beaker: Basic Data Flow
Challenge 1: Rollup metrics by the minute
@HeinrichHartman(n)
· Metrics arrive asynchronously on the input queue
· PUs expect exactly one sample per minute
· Rollup logic needs to allow for
· late arrival, e.g. when broker is behind
· out of order arrival
· errors in system time (time-zones, drifts)
Consequence: No real-time processing in Beaker
· Rolling up data in 1m periods causes an average delay
of 30sec for data processing.
· Real-time threshold-based alerting still available.
· Approach: Avoid rolling up data for stateless PUs.
@HeinrichHartman(n)
Challenge 2: Multiple input slots
input metric
input metric
input metric
input metric
input metric
processing unit
processing unit
processing unit output metric
output metric
output metric
Beaker: Logical Data Flow
@HeinrichHartman(n)
Challenge 3:
Synchronize roll-ups for multiple inputs is tricky
@HeinrichHartman(n)
Challenge 4: Fault Tolerance
Definition (Birman): A service is fault tolerant if it
tolerates the failure of nodes without producing wrong
output messages.
Failed nodes must be able to recover from a crash and
rejoin the cluster.
The time to recovery should be as low as possible.
@HeinrichHartman(n) Source: K. Birman - Reliable Distributed Systems, Springer, 2005
Beaker v0.5:
· Automated restarts
a. Service Management Facilities (svcadm) in OmniOS
b. systemd or watchdogs in Linux
· But, recovery can take a long time
State has to be rebuilt from input stream.
· Need a way to recover faster from errors:
a. Persist processing unit state (software updates!)
b. Access persisted metric data
@HeinrichHartman(n)
Beaker v. 0.5: Use db to rebuild state on startup
@HeinrichHartman(n)
Beaker Service
output
Q
input
Q
Snowth db
Challenge 5: High Availability
Definition (Birman): A service is highly available it
continues to publish valid messages during a node failure
after a small reconfiguration period.
For Beaker we require a reconfiguration period of less
than 1 minute. In this time messages may be delayed (e.g.
30sec) and duplicated messages may be published.
@HeinrichHartman(n) Source: K. Birman - Reliable Distributed Systems, Springer, 2005
Beaker v. 0.5 -- HA Cluster
Beaker HA Cluster
Beaker Master
output
Q
input
Q
Beaker
Slave
Beaker
Slave On
Failover:
Master
election
@HeinrichHartman(n)
Snowth db
@HeinrichHartman(n)
Challenge 6: Scalability
Beaker needs to scale in the following dimensions
· Number of processing units up to an unlimited amount.
· In the number of incoming metrics up to ~100M
metrics.
Beaker v.0.6 -- Multiple - HA Clusters
Beaker HA Cluster II (3 slaves)
Beaker HA Cluster I (5 slaves)
..
.
..
Beaker HA Cluster III (1 slave)
BI-M BI-S1 BI-S5
BII-M BII-S1 BII-S3
BIII-M BIII-S1
Meta Service
Msg
Broker
Msg
Broker
BIII-S2
@HeinrichHartman(n)
Snowth db
..
@HeinrichHartman(n)
Done! Great. This works...
Can we do better?
@HeinrichHartman(n)
· Divide Beaker into multiple services
· Avoid master election in processing service, by allowing duplicates
· Upside: Only the processing service needs to scale out
Beaker Service
Beaker v. 1.0 -- Divide and Conquer
@HeinrichHartman(n)
Q Q
Processing
Service
Dedup
Service
Rollup
Service
Scaling the Processing Service is simplified
· No rollup logic in workers
· All workers publish messages
· No failover logic necessary
· Replicate each PUs on multiple
workers
· No crosstalk between nodes
( = 0 in USL).
@HeinrichHartman(n)
Q Q
...
snowth db
Worker
Worker
Worker
Meta Service
Processing Service:
Data Flow
Benchmark results on prototype
· PU type anomaly_detection
PU count 100
Throughput 15 kHz
· PU type anomaly_detection
PU count 10k
Throughput 4.2 kHz
Machine: 6-core Xeon E5 2630 v2 at 2.6 GHz
@HeinrichHartman(n)
Conclusion
· Stateful processing units allow implementation of next
generation of monitoring analytics
· Use CAQL to build your own processing units
· Service orientation facilitates scaling
· Beaker will be out soon
@HeinrichHartman(n)
Credits
Joint work with:
· Jonas Kunze
· Theo Schlossnagle
Image Credits:
Example of Kummer Surface, Claudio Rocchini, CC-BY-SA, https://en.wikipedia.org/wiki/File:Kummer_surface.png
Indian truck, by strudelt, CC-BY, https://commons.wikimedia.org/wiki/File:Truck_in_India_-_overloaded.jpg
Kafka, Public Domain, https://en.wikipedia.org/wiki/File:Kafka_portrait.jpg
Thinker, Public Domain, https://www.flickr.com/photos/mustangjoe/5966894496
@HeinrichHartman(n)

More Related Content

What's hot

Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingKostas Tzoumas
 
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...Jonas Traub
 
Monitoring Flink with Prometheus
Monitoring Flink with PrometheusMonitoring Flink with Prometheus
Monitoring Flink with PrometheusMaximilian Bode
 
Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete...
Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete...Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete...
Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete...Flink Forward
 
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...Flink Forward
 
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...Ververica
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Flink Forward
 
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...Ververica
 
Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisParallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisShah Zaib
 
Using Kafka to integrate DWH and Cloud Based big data systems
Using Kafka to integrate DWH and Cloud Based big data systemsUsing Kafka to integrate DWH and Cloud Based big data systems
Using Kafka to integrate DWH and Cloud Based big data systemsconfluent
 
EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?confluent
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...Ververica
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkVerverica
 
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...Ververica
 
Worldsensing: A Real World Use Case for Flux by Albert Zaragoza, CTO & Head o...
Worldsensing: A Real World Use Case for Flux by Albert Zaragoza, CTO & Head o...Worldsensing: A Real World Use Case for Flux by Albert Zaragoza, CTO & Head o...
Worldsensing: A Real World Use Case for Flux by Albert Zaragoza, CTO & Head o...InfluxData
 
Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...
Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...
Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...Flink Forward
 
Continuously Updating Query Results over Real-Time Linked Data
Continuously Updating Query Results over Real-Time Linked DataContinuously Updating Query Results over Real-Time Linked Data
Continuously Updating Query Results over Real-Time Linked DataRuben Taelman
 
Flink Forward Berlin 2018: Tobias Lindener - "Approximate standing queries on...
Flink Forward Berlin 2018: Tobias Lindener - "Approximate standing queries on...Flink Forward Berlin 2018: Tobias Lindener - "Approximate standing queries on...
Flink Forward Berlin 2018: Tobias Lindener - "Approximate standing queries on...Flink Forward
 

What's hot (20)

Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...
I²: Interactive Real-Time Visualization for Streaming Data with Apache Flink ...
 
Monitoring Flink with Prometheus
Monitoring Flink with PrometheusMonitoring Flink with Prometheus
Monitoring Flink with Prometheus
 
Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete...
Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete...Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete...
Flink Forward Berlin 2018: Xingcan Cui - "Stream Join in Flink: from Discrete...
 
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
 
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
 
Stream Processing
Stream Processing Stream Processing
Stream Processing
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
 
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
 
Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisParallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
 
Using Kafka to integrate DWH and Cloud Based big data systems
Using Kafka to integrate DWH and Cloud Based big data systemsUsing Kafka to integrate DWH and Cloud Based big data systems
Using Kafka to integrate DWH and Cloud Based big data systems
 
EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
 
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
 
Vlsi 2015 2016 ieee project list-(m)
Vlsi 2015 2016 ieee project list-(m)Vlsi 2015 2016 ieee project list-(m)
Vlsi 2015 2016 ieee project list-(m)
 
Worldsensing: A Real World Use Case for Flux by Albert Zaragoza, CTO & Head o...
Worldsensing: A Real World Use Case for Flux by Albert Zaragoza, CTO & Head o...Worldsensing: A Real World Use Case for Flux by Albert Zaragoza, CTO & Head o...
Worldsensing: A Real World Use Case for Flux by Albert Zaragoza, CTO & Head o...
 
Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...
Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...
Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...
 
Continuously Updating Query Results over Real-Time Linked Data
Continuously Updating Query Results over Real-Time Linked DataContinuously Updating Query Results over Real-Time Linked Data
Continuously Updating Query Results over Real-Time Linked Data
 
Flink Forward Berlin 2018: Tobias Lindener - "Approximate standing queries on...
Flink Forward Berlin 2018: Tobias Lindener - "Approximate standing queries on...Flink Forward Berlin 2018: Tobias Lindener - "Approximate standing queries on...
Flink Forward Berlin 2018: Tobias Lindener - "Approximate standing queries on...
 

Similar to Scalable Online Analytics for Monitoring

OSMC 2016 | Friends and foes in API Monitoring by Heinrich Hartmann
OSMC 2016 | Friends and foes in API Monitoring by Heinrich HartmannOSMC 2016 | Friends and foes in API Monitoring by Heinrich Hartmann
OSMC 2016 | Friends and foes in API Monitoring by Heinrich HartmannNETWAYS
 
OSMC 2016 - Friends and foes by Heinrich Hartmann
OSMC 2016 - Friends and foes by Heinrich HartmannOSMC 2016 - Friends and foes by Heinrich Hartmann
OSMC 2016 - Friends and foes by Heinrich HartmannNETWAYS
 
S4x16 europe krotofil_granular_dataflowsics
S4x16 europe krotofil_granular_dataflowsicsS4x16 europe krotofil_granular_dataflowsics
S4x16 europe krotofil_granular_dataflowsicsMarina Krotofil
 
ODSC 2019: Sessionisation via stochastic periods for root event identification
ODSC 2019: Sessionisation via stochastic periods for root event identificationODSC 2019: Sessionisation via stochastic periods for root event identification
ODSC 2019: Sessionisation via stochastic periods for root event identificationKuldeep Jiwani
 
Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in StreamsJamie Grier
 
Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Kostas Tzoumas
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemC4Media
 
Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017
Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017
Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017Justin Hayward
 
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...Florian Lautenschlager
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Flink Forward
 
Post quantum cryptography in vault (hashi talks 2020)
Post quantum cryptography in vault (hashi talks 2020)Post quantum cryptography in vault (hashi talks 2020)
Post quantum cryptography in vault (hashi talks 2020)Mitchell Pronschinske
 
Next generation alerting and fault detection, SRECon Europe 2016
Next generation alerting and fault detection, SRECon Europe 2016Next generation alerting and fault detection, SRECon Europe 2016
Next generation alerting and fault detection, SRECon Europe 2016Dieter Plaetinck
 
HTM & Apache Flink (2016-06-27)
HTM & Apache Flink (2016-06-27)HTM & Apache Flink (2016-06-27)
HTM & Apache Flink (2016-06-27)Eron Wright
 
DEF CON 27 - ANDREAS BAUMHOF - are quantum computers really a threat to crypt...
DEF CON 27 - ANDREAS BAUMHOF - are quantum computers really a threat to crypt...DEF CON 27 - ANDREAS BAUMHOF - are quantum computers really a threat to crypt...
DEF CON 27 - ANDREAS BAUMHOF - are quantum computers really a threat to crypt...Felipe Prado
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of dataconfluent
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsDebasish Ghosh
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Data Con LA
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataJames Sirota
 

Similar to Scalable Online Analytics for Monitoring (20)

OSMC 2016 | Friends and foes in API Monitoring by Heinrich Hartmann
OSMC 2016 | Friends and foes in API Monitoring by Heinrich HartmannOSMC 2016 | Friends and foes in API Monitoring by Heinrich Hartmann
OSMC 2016 | Friends and foes in API Monitoring by Heinrich Hartmann
 
OSMC 2016 - Friends and foes by Heinrich Hartmann
OSMC 2016 - Friends and foes by Heinrich HartmannOSMC 2016 - Friends and foes by Heinrich Hartmann
OSMC 2016 - Friends and foes by Heinrich Hartmann
 
S4x16 europe krotofil_granular_dataflowsics
S4x16 europe krotofil_granular_dataflowsicsS4x16 europe krotofil_granular_dataflowsics
S4x16 europe krotofil_granular_dataflowsics
 
ODSC 2019: Sessionisation via stochastic periods for root event identification
ODSC 2019: Sessionisation via stochastic periods for root event identificationODSC 2019: Sessionisation via stochastic periods for root event identification
ODSC 2019: Sessionisation via stochastic periods for root event identification
 
Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in Streams
 
S4x16_Europe_Krotofil
S4x16_Europe_KrotofilS4x16_Europe_Krotofil
S4x16_Europe_Krotofil
 
Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing System
 
Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017
Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017
Global C4IR-1 Masterclass Adryan - Zuehlke Engineering 2017
 
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in ...
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 
Post quantum cryptography in vault (hashi talks 2020)
Post quantum cryptography in vault (hashi talks 2020)Post quantum cryptography in vault (hashi talks 2020)
Post quantum cryptography in vault (hashi talks 2020)
 
Next generation alerting and fault detection, SRECon Europe 2016
Next generation alerting and fault detection, SRECon Europe 2016Next generation alerting and fault detection, SRECon Europe 2016
Next generation alerting and fault detection, SRECon Europe 2016
 
HTM & Apache Flink (2016-06-27)
HTM & Apache Flink (2016-06-27)HTM & Apache Flink (2016-06-27)
HTM & Apache Flink (2016-06-27)
 
DEF CON 27 - ANDREAS BAUMHOF - are quantum computers really a threat to crypt...
DEF CON 27 - ANDREAS BAUMHOF - are quantum computers really a threat to crypt...DEF CON 27 - ANDREAS BAUMHOF - are quantum computers really a threat to crypt...
DEF CON 27 - ANDREAS BAUMHOF - are quantum computers really a threat to crypt...
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of data
 
Nexmark with beam
Nexmark with beamNexmark with beam
Nexmark with beam
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
 

More from Heinrich Hartmann

Linux System Monitoring with eBPF
Linux System Monitoring with eBPFLinux System Monitoring with eBPF
Linux System Monitoring with eBPFHeinrich Hartmann
 
Seminar on Motivic Hall Algebras
Seminar on Motivic Hall AlgebrasSeminar on Motivic Hall Algebras
Seminar on Motivic Hall AlgebrasHeinrich Hartmann
 
GROUPOIDS, LOCAL SYSTEMS AND DIFFERENTIAL EQUATIONS
GROUPOIDS, LOCAL SYSTEMS AND DIFFERENTIAL EQUATIONSGROUPOIDS, LOCAL SYSTEMS AND DIFFERENTIAL EQUATIONS
GROUPOIDS, LOCAL SYSTEMS AND DIFFERENTIAL EQUATIONSHeinrich Hartmann
 
Related-Work.net at WeST Oberseminar
Related-Work.net at WeST OberseminarRelated-Work.net at WeST Oberseminar
Related-Work.net at WeST OberseminarHeinrich Hartmann
 
Pushforward of Differential Forms
Pushforward of Differential FormsPushforward of Differential Forms
Pushforward of Differential FormsHeinrich Hartmann
 
Dimensionstheorie Noetherscher Ringe
Dimensionstheorie Noetherscher RingeDimensionstheorie Noetherscher Ringe
Dimensionstheorie Noetherscher RingeHeinrich Hartmann
 
Hecke Curves and Moduli spcaes of Vector Bundles
Hecke Curves and Moduli spcaes of Vector BundlesHecke Curves and Moduli spcaes of Vector Bundles
Hecke Curves and Moduli spcaes of Vector BundlesHeinrich Hartmann
 
Dimension und Multiplizität von D-Moduln
Dimension und Multiplizität von D-ModulnDimension und Multiplizität von D-Moduln
Dimension und Multiplizität von D-ModulnHeinrich Hartmann
 
Nodale kurven und Hilbertschemata
Nodale kurven und HilbertschemataNodale kurven und Hilbertschemata
Nodale kurven und HilbertschemataHeinrich Hartmann
 
Local morphisms are given by composition
Local morphisms are given by compositionLocal morphisms are given by composition
Local morphisms are given by compositionHeinrich Hartmann
 
Notes on Intersection theory
Notes on Intersection theoryNotes on Intersection theory
Notes on Intersection theoryHeinrich Hartmann
 
Cusps of the Kähler moduli space and stability conditions on K3 surfaces
Cusps of the Kähler moduli space and stability conditions on K3 surfacesCusps of the Kähler moduli space and stability conditions on K3 surfaces
Cusps of the Kähler moduli space and stability conditions on K3 surfacesHeinrich Hartmann
 

More from Heinrich Hartmann (19)

Linux System Monitoring with eBPF
Linux System Monitoring with eBPFLinux System Monitoring with eBPF
Linux System Monitoring with eBPF
 
Geometric Aspects of LSA
Geometric Aspects of LSAGeometric Aspects of LSA
Geometric Aspects of LSA
 
Geometric Aspects of LSA
Geometric Aspects of LSAGeometric Aspects of LSA
Geometric Aspects of LSA
 
Seminar on Complex Geometry
Seminar on Complex GeometrySeminar on Complex Geometry
Seminar on Complex Geometry
 
Seminar on Motivic Hall Algebras
Seminar on Motivic Hall AlgebrasSeminar on Motivic Hall Algebras
Seminar on Motivic Hall Algebras
 
GROUPOIDS, LOCAL SYSTEMS AND DIFFERENTIAL EQUATIONS
GROUPOIDS, LOCAL SYSTEMS AND DIFFERENTIAL EQUATIONSGROUPOIDS, LOCAL SYSTEMS AND DIFFERENTIAL EQUATIONS
GROUPOIDS, LOCAL SYSTEMS AND DIFFERENTIAL EQUATIONS
 
Topics in Category Theory
Topics in Category TheoryTopics in Category Theory
Topics in Category Theory
 
Related-Work.net at WeST Oberseminar
Related-Work.net at WeST OberseminarRelated-Work.net at WeST Oberseminar
Related-Work.net at WeST Oberseminar
 
Komplexe Zahlen
Komplexe ZahlenKomplexe Zahlen
Komplexe Zahlen
 
Pushforward of Differential Forms
Pushforward of Differential FormsPushforward of Differential Forms
Pushforward of Differential Forms
 
Dimensionstheorie Noetherscher Ringe
Dimensionstheorie Noetherscher RingeDimensionstheorie Noetherscher Ringe
Dimensionstheorie Noetherscher Ringe
 
Polynomproblem
PolynomproblemPolynomproblem
Polynomproblem
 
Hecke Curves and Moduli spcaes of Vector Bundles
Hecke Curves and Moduli spcaes of Vector BundlesHecke Curves and Moduli spcaes of Vector Bundles
Hecke Curves and Moduli spcaes of Vector Bundles
 
Dimension und Multiplizität von D-Moduln
Dimension und Multiplizität von D-ModulnDimension und Multiplizität von D-Moduln
Dimension und Multiplizität von D-Moduln
 
Nodale kurven und Hilbertschemata
Nodale kurven und HilbertschemataNodale kurven und Hilbertschemata
Nodale kurven und Hilbertschemata
 
Local morphisms are given by composition
Local morphisms are given by compositionLocal morphisms are given by composition
Local morphisms are given by composition
 
Notes on Intersection theory
Notes on Intersection theoryNotes on Intersection theory
Notes on Intersection theory
 
Cusps of the Kähler moduli space and stability conditions on K3 surfaces
Cusps of the Kähler moduli space and stability conditions on K3 surfacesCusps of the Kähler moduli space and stability conditions on K3 surfaces
Cusps of the Kähler moduli space and stability conditions on K3 surfaces
 
Log Files
Log FilesLog Files
Log Files
 

Recently uploaded

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Recently uploaded (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Scalable Online Analytics for Monitoring

  • 1. Scalable Online Analytics for Monitoring LISA15, Nov. 13, 2015, Washington, D.C. Heinrich Hartmann, PhD, Chief Data Scientist, Circonus
  • 2. I’m Heinrich · From Mainz, Germany · Studied Math. in Mainz, Bonn, Oxford · PhD in Algebraic Geometry · Sensor Data Mining at Univ. Koblenz · Freelance IT consultant · Chief Data Scientist at Circonus Heinrich.Hartmann@Circonus.com @HeinrichHartman(n)
  • 3. Circonus is ... @HeinrichHartman(n) · monitoring and telemetry analysis platform · scalable to millions of incoming metrics · available as public and private SaaS · built-in histograms, forecasting, anomaly-detection, ...
  • 4. @HeinrichHartman(n) This talk is about... I. The Future of Monitoring II. Patterns of Telemetry Analysis III. Design of the Online Analytics Engine ‘Beaker’
  • 5. Part I - The Future of Monitoring
  • 6. Monitor this: · 100 containers in one fleet · 200 metrics per-container · 50 code pushes a day · For each push all containers are recreated The “cloud monitoring challenge” 2015 @HeinrichHartman(n)
  • 7. Line-plots work well for small numbers of nodes @HeinrichHartman(n) Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 CPU utilization for a DB cluster. Source: Internal.
  • 8. ... but can get polluted easily @HeinrichHartman(n) N N N N N N N N N N N N N N N N N N N N CPU utilization for a DB cluster. Source: Internal.
  • 9. “Information is not a scarce resource. Attention is.” Herbert A. Simon @HeinrichHartman(n) Source: http://www.circonus.com/the-future-of-monitoring-qa-with-john-allspaw/
  • 10. Store Histograms and Alert on Percentiles @HeinrichHartman(n) N N N N N N N N N N N N N N N N N N N N N N N 90% percentile CPU Utilization of a db service with a variable number of nodes.
  • 11. @HeinrichHartman(n) Anomaly Detection for Surfacing relevant data Mockup of an metric overview page. Source: Circonus
  • 12. Part II - Patterns of Telemetry Analysis
  • 13. @HeinrichHartman(n) Telemetry analysis comes in two forms Online Analytics · Anomaly detection · Percentile alerting · Smart alerting rules · Smart dashboards Offline Analytics · Post mortem analysis · Assisted thresholding Stream Processing ‘Big Data’
  • 14. @HeinrichHartman(n) New Components in Circonus BunsenBeaker
  • 15. @HeinrichHartman(n) Beaker & Bunsen in the Circonus Architecture metrics Ernie Alerting Service Snowth Metric Store alerts Web-UI Q Beaker Online AE Bunsen Offline AE
  • 16. @HeinrichHartman(n) Pattern 1: Windowing Examples · Supervised Machine Learning · Fourier Transformation · Anomaly Detection (etsy) Remarks · Tradeoff: window size rich features vs. memory · Tradeoff: overlap latency vs. CPU windows = window(y_stream) def results(): for w in windows: yield z = method(w)
  • 17. @HeinrichHartman(n) Pattern 3: Processing Units processing_unit = { state = ... update = function(self, y) ... end } Example · Exponential Smoothing · Holt Winters forecasting · Anomaly Detection (Circonus) Remarks · Fast updates · Fully general · Cost: maintains state Circonus Anomaly Detection
  • 18. @HeinrichHartman(n) local q = 0.9 exponential_smoothing = { s = 0, update = function(self, y) self.s = (y * q) + (self.s * (1 - q)) return self.s end } For A Processing Unit for Exponential Smoothing Exponential smoothing applied to a dns-duration metric More examples on http://heinrichhartmann.com/.../Generative-Models-for-Time-Series
  • 19. Processing units are convenient · Primitive transformation are readily implemented: Arithmetic, Smoothing, Forecasting, ... · Fully general. Allow window-based processing as well · Composable. Compose several PUs to get a new one! @HeinrichHartman(n)
  • 20. Circonus Analytics Query Language · Create your own customized processing unit from primitives · UNIX-inspired syntax with pipes ‘|’ · Native support for histogram metrics @HeinrichHartman(n)
  • 21. @HeinrichHartman(n) CAQL: Example 1 - Low frequency AD Pre-process a metric before feeding into anomaly detection metric:average(<>) | rolling:mean(30m) | anomaly_detection()
  • 22. @HeinrichHartman(n) CAQL: Example 2 - Histogram Aggregation Histograms are first-class citizens metric:average(<uuid>) | window:histogram(1h) | histogram:percentile(95)
  • 23. Part III - The Design of the Online Analytics Engine ‘Beaker’
  • 24. 1. Read messages from a message queue 2. Execute a processing units over incoming metrics 3. Publish computed values to message queue Beaker v. 0.1: Simple stream processing Beaker Service output Q input Q Beaker: Basic Data Flow
  • 25. Challenge 1: Rollup metrics by the minute @HeinrichHartman(n) · Metrics arrive asynchronously on the input queue · PUs expect exactly one sample per minute · Rollup logic needs to allow for · late arrival, e.g. when broker is behind · out of order arrival · errors in system time (time-zones, drifts)
  • 26. Consequence: No real-time processing in Beaker · Rolling up data in 1m periods causes an average delay of 30sec for data processing. · Real-time threshold-based alerting still available. · Approach: Avoid rolling up data for stateless PUs. @HeinrichHartman(n)
  • 27. Challenge 2: Multiple input slots input metric input metric input metric input metric input metric processing unit processing unit processing unit output metric output metric output metric Beaker: Logical Data Flow @HeinrichHartman(n)
  • 28. Challenge 3: Synchronize roll-ups for multiple inputs is tricky @HeinrichHartman(n)
  • 29. Challenge 4: Fault Tolerance Definition (Birman): A service is fault tolerant if it tolerates the failure of nodes without producing wrong output messages. Failed nodes must be able to recover from a crash and rejoin the cluster. The time to recovery should be as low as possible. @HeinrichHartman(n) Source: K. Birman - Reliable Distributed Systems, Springer, 2005
  • 30. Beaker v0.5: · Automated restarts a. Service Management Facilities (svcadm) in OmniOS b. systemd or watchdogs in Linux · But, recovery can take a long time State has to be rebuilt from input stream. · Need a way to recover faster from errors: a. Persist processing unit state (software updates!) b. Access persisted metric data @HeinrichHartman(n)
  • 31. Beaker v. 0.5: Use db to rebuild state on startup @HeinrichHartman(n) Beaker Service output Q input Q Snowth db
  • 32. Challenge 5: High Availability Definition (Birman): A service is highly available it continues to publish valid messages during a node failure after a small reconfiguration period. For Beaker we require a reconfiguration period of less than 1 minute. In this time messages may be delayed (e.g. 30sec) and duplicated messages may be published. @HeinrichHartman(n) Source: K. Birman - Reliable Distributed Systems, Springer, 2005
  • 33. Beaker v. 0.5 -- HA Cluster Beaker HA Cluster Beaker Master output Q input Q Beaker Slave Beaker Slave On Failover: Master election @HeinrichHartman(n) Snowth db
  • 34. @HeinrichHartman(n) Challenge 6: Scalability Beaker needs to scale in the following dimensions · Number of processing units up to an unlimited amount. · In the number of incoming metrics up to ~100M metrics.
  • 35. Beaker v.0.6 -- Multiple - HA Clusters Beaker HA Cluster II (3 slaves) Beaker HA Cluster I (5 slaves) .. . .. Beaker HA Cluster III (1 slave) BI-M BI-S1 BI-S5 BII-M BII-S1 BII-S3 BIII-M BIII-S1 Meta Service Msg Broker Msg Broker BIII-S2 @HeinrichHartman(n) Snowth db ..
  • 37. Can we do better? @HeinrichHartman(n)
  • 38. · Divide Beaker into multiple services · Avoid master election in processing service, by allowing duplicates · Upside: Only the processing service needs to scale out Beaker Service Beaker v. 1.0 -- Divide and Conquer @HeinrichHartman(n) Q Q Processing Service Dedup Service Rollup Service
  • 39. Scaling the Processing Service is simplified · No rollup logic in workers · All workers publish messages · No failover logic necessary · Replicate each PUs on multiple workers · No crosstalk between nodes ( = 0 in USL). @HeinrichHartman(n) Q Q ... snowth db Worker Worker Worker Meta Service Processing Service: Data Flow
  • 40. Benchmark results on prototype · PU type anomaly_detection PU count 100 Throughput 15 kHz · PU type anomaly_detection PU count 10k Throughput 4.2 kHz Machine: 6-core Xeon E5 2630 v2 at 2.6 GHz @HeinrichHartman(n)
  • 41. Conclusion · Stateful processing units allow implementation of next generation of monitoring analytics · Use CAQL to build your own processing units · Service orientation facilitates scaling · Beaker will be out soon @HeinrichHartman(n)
  • 42. Credits Joint work with: · Jonas Kunze · Theo Schlossnagle Image Credits: Example of Kummer Surface, Claudio Rocchini, CC-BY-SA, https://en.wikipedia.org/wiki/File:Kummer_surface.png Indian truck, by strudelt, CC-BY, https://commons.wikimedia.org/wiki/File:Truck_in_India_-_overloaded.jpg Kafka, Public Domain, https://en.wikipedia.org/wiki/File:Kafka_portrait.jpg Thinker, Public Domain, https://www.flickr.com/photos/mustangjoe/5966894496 @HeinrichHartman(n)