Infrastructure Monitoring
(with Postgres, obviously)
Steve Simpson
StackHPC
steve@stackhpc.com
www.stackhpc.com
Overview
1) Background
2) Monitoring
Postgres for Metrics
3) Requirements
4) Data & Queries
5) Optimisation
Postgres for ...
6) Log Searching
7) Log Parsing
8) Queueing
Background
Background
Systems Software Engineer
C, C++, Python
Background
Based in Bristol, UK
Thriving Tech Industry
Background
● Gnodal
● 10GbE Ethernet
● ASIC Verification
● Embedded Firmware
● JustOne Database
● Agile “Big Data” RDBMS
● Based on PostgreSQL
● Storage Team Lead
Background
Consultancy for HPC on OpenStack
Multi-tenant massively parallel workloads
Monitoring complex infrastructure
StackHPC
Background
Cloud orchestration platform
IaaS through API and dashboard
Multi-tenancy throughout
Network, Compute, Storage
Background
Operational visibility is critical
OpenStack is a complex, distributed application
…to run your complex, distributed applications
Monitoring
Monitoring Requirements
Gain visibility into the operation of the
hardware and software
e.g. web site, database, cluster, disk drive
Monitoring Requirements
Fault finding and alerting
Notify me when a server or service is
unavailable, a disk needs replacing, ...
Fault post-mortem, pre-emption
Why did the outage occur and what can we
do to prevent it next time
Monitoring Requirements
Utilisation and efficiency analysis
Is all the hardware we own being used?
Is it being used efficiently?
Performance monitoring and profiling
How long do my web/database requests take?
Monitoring Requirements
Auditing (security, billing)
Tracking users' use of the system
Auditing access to systems or resources
Decision making, future planning
What is expected growth in data, or users?
What of the current system is most used?
Monitoring
Existing Tools
Existing Tools
Checking and Alerting
Agents check on machines or services
Report centrally, notify users via dashboard
Store history of events in database
Existing Tools
Nagios / Icinga
ping -c 1 $host || mail -s "Help!" $me
Existing Tools
Kibana (+Elasticsearch/Logstash)
Existing Tools
Metrics
Periodically collect metrics, e.g. CPU%
Store in central database for visualization
Some systems allow checking on top
Existing Tools
Ganglia
Collector (gmond) + Aggregator (gmetad)
Existing Tools
https://ganglia.wikimedia.org/
Existing Tools
Grafana - visualization only
Existing Tools
Metrics Databases
● Ganglia (RRDtool)
● Graphite (Whisper)
● OpenTSDB (HBase)
● KairosDB (Cassandra)
● InfluxDB
● Prometheus
● Gnocchi
● Atlas
● Heroic
● Hawkular (Cassandra)
● MetricTank (Cassandra)
● Riak TS (Riak)
● Blueflood (Cassandra)
● DalmatinerDB
● Druid
● BTrDB
● Warp 10 (HBase)
● Tgres (PostgreSQL!)
Existing Tools
Metrics Databases
● Ganglia [Berkeley]
● Graphite [Orbitz]
● OpenTSDB [StumbleUpon]
● KairosDB
● InfluxDB
● Prometheus [SoundCloud]
● Gnocchi [OpenStack]
● Atlas [Netflix]
● Heroic [Spotify]
● Hawkular [Red Hat]
● MetricTank [Raintank]
● Riak TS [Basho]
● Blueflood [Rackspace]
● DalmatinerDB
● Druid
● BTrDB
● Warp 10
● Tgres
Existing Tools
[Timeline: 2000 → 2010 → 2013–2015 — successive waves of the tools listed above]
Existing Tools
Monasca
[Diagram, built up across several slides: infrastructure (Software, Network, Storage, Servers) emits Metrics and Logs into Monasca — a Metric API backed by InfluxDB, a Log API feeding Logstash and Elasticsearch, Alerting backed by MySQL, Grafana (with SQLite) and Kibana as dashboards, and Kafka plus Zookeeper as the transport backbone]
Commendable “right tool for the job” attitude, but…
How about Postgres?
Fewer points of failure
Fewer places to back up
Fewer redundancy protocols
One set of consistent data semantics
Re-use existing operational knowledge
Monasca
Existing Tools
[Diagram sequence: the same architecture with components removed one by one — MySQL, SQLite, InfluxDB, Elasticsearch (Kibana pencilled in as “Grafana?”), Kafka, Zookeeper, Logstash — until only the Log API, Metric API, Alerting and Grafana remain]
Postgres for Metrics
Requirements
Postgres for Metrics
Requirements
● ~45M values/day
(80x196 per 30s)
● 6 month history
● <1TB disk footprint
● <100ms queries
Postgres for Metrics
Combine Series
average over all
for {series=cpu}
[time range/interval]
Read Series
for each {type}
for {series=cpu}
[time range/interval]
Postgres for Metrics
List Metric Names

"metrics": [
    "cpu.percent",
    "cpu.user_perc",
    "net.out_bytes_sec",
    "net.out_errors_sec",
    "net.in_bytes_sec",
    "net.in_errors_sec"
    …
]

List Dimension Names

"dimensions": [
    "device",
    "hostname",
    "instance",
    "mount_point",
    "process_name",
    "process_user"
    …
]

List Dimension Values

"hostname": [
    "dev-01",
    "dev-02",
    "staging-01",
    "staging-02",
    "prod-01",
    "prod-02"
    …
]
Postgres for Metrics
Data & Queries
Postgres for Metrics
"metric": {
"timestamp": 1232141412,
"name": "cpu.percent",
"value": 42,
"dimensions": { "hostname": "dev-01" },
"value_meta": { … }
}
JSON Ingest Format
Known, well-defined structure
Varying set of dimension key/value pairs
Postgres for Metrics
CREATE TABLE measurements (
timestamp TIMESTAMPTZ,
name VARCHAR,
value FLOAT8,
dimensions JSONB,
value_meta JSON
);
Basic Denormalised Schema
Straightforward mapping onto input data
Data model for all schemas
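Illustrative only: one datapoint from the ingest format above mapped onto this table (TO_TIMESTAMP from epoch seconds is an assumption about the ingest path, not the deck's actual loader):

INSERT INTO measurements (timestamp, name, value, dimensions, value_meta)
VALUES (
    TO_TIMESTAMP(1232141412),              -- epoch seconds from the JSON
    'cpu.percent',
    42,
    '{"hostname": "dev-01"}'::JSONB,
    NULL
);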
Postgres for Metrics
SELECT
TIME_ROUND(timestamp, 60) AS timestamp,
AVG(value) AS avg
FROM
measurements
WHERE
timestamp BETWEEN '2015-01-01Z00:00:00'
AND '2015-01-01Z01:00:00'
AND name = 'cpu.percent'
AND dimensions @> '{"hostname": "dev-01"}'::JSONB
GROUP BY
timestamp
Single Series Query
One hour window | Single hostname
Measurements every 60 second interval
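TIME_ROUND is not a Postgres built-in; a minimal sketch of the helper these queries assume, rounding a timestamp down to a fixed bucket of a given number of seconds:

CREATE FUNCTION time_round (ts TIMESTAMPTZ, secs INT)
RETURNS TIMESTAMPTZ LANGUAGE SQL IMMUTABLE AS $_$
    -- bucket by truncating the epoch to a multiple of secs
    SELECT TO_TIMESTAMP(FLOOR(EXTRACT(EPOCH FROM ts) / secs) * secs);
$_$;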
Postgres for Metrics
SELECT
TIME_ROUND(timestamp, 60) AS timestamp,
AVG(value) AS avg,
dimensions ->> 'hostname' AS hostname
FROM
measurements
WHERE
timestamp BETWEEN '2015-01-01Z00:00:00'
AND '2015-01-01Z01:00:00'
AND name = 'cpu.percent'
GROUP BY
timestamp, hostname
Group Multi-Series Query
One hour window | Every hostname
Measurements every 60 second interval
Postgres for Metrics
SELECT
TIME_ROUND(timestamp, 60) AS timestamp,
AVG(value) AS avg
FROM
measurements
WHERE
timestamp BETWEEN '2015-01-01Z00:00:00'
AND '2015-01-01Z01:00:00'
AND name = 'cpu.percent'
GROUP BY
timestamp
All Multi-Series Query
One hour window | Every hostname
Measurements every 60 second interval
Postgres for Metrics
SELECT DISTINCT
name
FROM
measurements
Metric Name List Query
:)
Postgres for Metrics
SELECT DISTINCT
    JSONB_OBJECT_KEYS(dimensions)
        AS d_name
FROM
    measurements
WHERE
    name = 'cpu.percent'
Dimension Name List Query
(for specific metric)
Postgres for Metrics
SELECT DISTINCT
    dimensions ->> 'hostname'
        AS d_value
FROM
    measurements
WHERE
    name = 'cpu.percent'
    AND dimensions ? 'hostname'
Dimension Value List Query
(for specific metric and dimension)
Postgres for Metrics
Optimisation
Postgres for Metrics
CREATE TABLE measurements (
timestamp TIMESTAMPTZ,
name VARCHAR,
value FLOAT8,
dimensions JSONB,
value_meta JSON
);
CREATE INDEX ON measurements
(name, timestamp);
CREATE INDEX ON measurements USING GIN
(dimensions);
Indexes
Covers all necessary query terms
Using a single GIN index saves space, but is slower
Postgres for Metrics
● Series Queries
● All, Group, Specific
● Varying Time Window/Interval
5m|15s, 1h|15s, 1h|300s, 6h|300s, 24h|300s
● Listing Queries
● Metric Names, Dimension Names & Values
● All, Partial
Postgres for Metrics
[Chart, two slides: "Denormalised" series query durations (ms) — Single, Group, All — over windows 5m (15s), 1h (15s), 1h (300s), 6h (300s), 24h (300s); first at full scale (0–12,000 ms), then zoomed (0–2,500 ms)]
Postgres for Metrics
[Chart, two slides: "Denormalised" listing query durations (ms) — Metric Names, Dimension Names, Dimension Values — All vs Partial; first at full scale (0–60,000 ms), then zoomed (0–8,000 ms)]
Postgres for Metrics
CREATE TABLE measurement_values (
timestamp TIMESTAMPTZ,
metric_id INT,
value FLOAT8,
value_meta JSON
);
CREATE TABLE metrics (
    id SERIAL,
    name VARCHAR,
    dimensions JSONB
);
Normalised Schema
Reduces duplication of data
Pre-built set of distinct metric definitions
Postgres for Metrics
CREATE FUNCTION get_metric_id (in_name VARCHAR, in_dims JSONB)
RETURNS INT LANGUAGE plpgsql AS $_$
DECLARE
out_id INT;
BEGIN
SELECT id INTO out_id FROM metrics AS m
WHERE m.name = in_name AND m.dimensions = in_dims;
IF NOT FOUND THEN
INSERT INTO metrics ("name", "dimensions") VALUES
(in_name, in_dims) RETURNING id INTO out_id;
END IF;
RETURN out_id;
END; $_$;
Normalised Schema
Function to use at insert time
Finds existing metric_id or allocates new
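Hypothetical insert-time usage of the function (literal values for illustration only):

INSERT INTO measurement_values (timestamp, metric_id, value, value_meta)
VALUES (
    NOW(),
    get_metric_id('cpu.percent', '{"hostname": "dev-01"}'::JSONB),
    42,
    NULL
);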
Postgres for Metrics
CREATE VIEW measurements AS
SELECT *
FROM measurement_values
INNER JOIN
metrics ON (metric_id = id);
CREATE INDEX metrics_idx ON
metrics (name, dimensions);
CREATE INDEX measurements_idx ON
measurement_values (metric_id, timestamp);
Normalised Schema
Same queries, use view to join
Extra index to help normalisation step
Postgres for Metrics
[Chart, two slides: "Normalised" series query durations (ms), same query set; full scale 0–2,500 ms, then zoomed 0–1,000 ms]
Postgres for Metrics
[Chart: "Normalised" listing query durations (ms) — All vs Partial; 0–1,000 ms]
Postgres for Metrics
● As the time window grows, less
detail is necessary, e.g.
● 30s intervals at 1 hour
● 300s intervals at 6 hours
Postgres for Metrics
Raw measurements:                  Summarised:

Timestamp  Metric  Value           Timestamp  Metric  Value
10:00:00   1       10              10:00:00   1       40
10:00:00   2       2               10:00:00   2       10
10:00:30   1       10              10:02:00   1       30
10:00:30   2       4               10:02:00   2       8
10:01:30   1       20
10:01:30   2       4
10:02:00   1       15
10:02:00   2       2
10:02:30   1       5
10:02:30   2       2
10:03:00   1       10
10:03:00   2       6
Postgres for Metrics
CREATE TABLE summary_values_5m (
timestamp TIMESTAMPTZ,
metric_id INT,
value_sum FLOAT8,
value_count FLOAT8,
value_min FLOAT8,
value_max FLOAT8,
UNIQUE (metric_id, timestamp)
);
Summarised Schema
Pre-compute every 5m (300s) interval
Aggregate functions to apply must be known up front
Postgres for Metrics
CREATE FUNCTION update_summarise () RETURNS TRIGGER
LANGUAGE plpgsql AS $_$
BEGIN
INSERT INTO summary_values_5m VALUES (
TIME_ROUND(NEW.timestamp, 300), NEW.metric_id,
NEW.value, 1, NEW.value, NEW.value)
ON CONFLICT (metric_id, timestamp)
DO UPDATE SET
value_sum = value_sum + EXCLUDED.value_sum,
value_count = value_count + EXCLUDED.value_count,
value_min = LEAST (value_min, EXCLUDED.value_min),
value_max = GREATEST(value_max, EXCLUDED.value_max);
RETURN NULL;
END; $_$;
Summarised Schema
Entry for each metric/rounded time period
Update existing entries by aggregating
Postgres for Metrics
CREATE TRIGGER update_summarise_trigger
AFTER INSERT ON measurement_values
FOR EACH ROW
EXECUTE PROCEDURE update_summarise ();
CREATE VIEW summary_5m AS
SELECT *
FROM
summary_values_5m INNER JOIN metrics
ON (metric_id = id);
Summarised Schema
Trigger applies row to summary table
View mainly for convenience when querying
Postgres for Metrics
SELECT
TIME_ROUND(timestamp, 300) AS timestamp,
AVG(value) AS avg
FROM
measurements
WHERE
timestamp BETWEEN '2015-01-01Z00:00:00'
AND '2015-01-01Z06:00:00'
AND name = 'cpu.percent'
GROUP BY
timestamp
Combined Series Query
Six hour window | Every hostname
Measurements every 300 second interval
Postgres for Metrics
SELECT
TIME_ROUND(timestamp, 300) AS timestamp,
SUM(value_sum) / SUM(value_count) AS avg
FROM
summary_5m
WHERE
timestamp BETWEEN '2015-01-01Z00:00:00'
AND '2015-01-01Z06:00:00'
AND name = 'cpu.percent'
GROUP BY
timestamp
Combined Series Query
Use pre-aggregated summary table
Mostly the same; extra fiddling for AVG
Postgres for Metrics
[Chart: "Summarised" series query durations (ms), same query set; 0–1,000 ms]
Postgres for Metrics
[Chart: "Summarised" listing query durations (ms) — All vs Partial; 0–1,000 ms]
Postgres for Metrics
[Chart, two slides: Ingest Time in seconds (1 day / 45M rows) — Summarised, Normalised, Denormalised; full scale 0–90,000 s, then zoomed 0–4,000 s]
Postgres for Metrics
[Chart: Disk Usage in MB (1 day / 45M rows) — Summarised, Normalised, Denormalised; 0–10,000 MB]
Postgres for Metrics
● Need coarser summaries for wider
queries (e.g. 30m summaries)
● Need to partition data by day (sketch below) to:
● Retain ingest rate (keeps indexes small)
● Optimise dropping old data
● Much better ways to produce summaries
to optimise ingest, specifically:
● Process rows in batches of interval size
● Process asynchronously to the ingest transaction
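A minimal sketch of the by-day partitioning suggested above, using declarative partitioning (PostgreSQL 10+; on older versions the same layout is built with inheritance and triggers). The partition name is illustrative:

CREATE TABLE measurement_values (
    timestamp TIMESTAMPTZ,
    metric_id INT,
    value FLOAT8,
    value_meta JSON
) PARTITION BY RANGE (timestamp);

CREATE TABLE measurement_values_20150101
    PARTITION OF measurement_values
    FOR VALUES FROM ('2015-01-01') TO ('2015-01-02');

-- dropping a day of history becomes a cheap metadata operation
DROP TABLE measurement_values_20150101;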
Postgres for…
Log Searching
Postgres for Log Searching
Requirements
● Central log storage
● Trivially searchable
● Time bounded
● Filter ‘dimensions’
● Interactive query
times (<100ms)
Postgres for Log Searching
"log": {
"timestamp": 1232141412,
"message":
"Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)",
"dimensions": {
"severity": 6,
"facility": 16,
"pid": "39762",
"program": "haproxy"
"hostname": "dev-controller-0"
},
}
Log Ingest Format
Typically sourced from rsyslog
Varying set of dimensions key/values
Postgres for Log Searching
CREATE TABLE logs (
timestamp TIMESTAMPTZ,
message VARCHAR,
dimensions JSONB
);
Basic Schema
Straightforward mapping of source data
Allow for maximum dimension flexibility
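Illustrative only: mapping the ingest format onto this table, reusing the earlier example values:

INSERT INTO logs (timestamp, message, dimensions)
VALUES (
    TO_TIMESTAMP(1232141412),
    'Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)',
    '{"severity": 6, "facility": 16, "pid": "39762",
      "program": "haproxy", "hostname": "dev-controller-0"}'::JSONB
);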
Postgres for Log Searching
connection AND program:haproxy
Query Example
Kibana/Elastic style using PG-FTS
SELECT *
FROM logs
WHERE
TO_TSVECTOR('english', message)
@@ TO_TSQUERY('connection')
AND dimensions @> '{"program":"haproxy"}';
Postgres for Log Searching
CREATE INDEX ON logs
USING GIN
(TO_TSVECTOR('english', message));
CREATE INDEX ON logs
USING GIN
(dimensions);
Indexes
Enables fast text search on ‘message’
& Fast filtering based on ‘dimensions’
Postgres for …
Log Parsing
Postgres for Log Parsing
"log": {
"timestamp": 1232141412,
"message":
"Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)",
"dimensions": {
"severity": 6,
"facility": 16,
"pid": "39762",
"program": "haproxy",
"hostname": "dev-controller-0"
},
}
Postgres for Log Parsing
"log": {
"timestamp": 1232141412,
"message":
"Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)",
"dimensions": {
"severity": 6,
"facility": 16,
"pid": "39762",
"program": "haproxy",
"hostname": "dev-controller-0",
"tags": [ "connect" ]
},
}
Postgres for Log Parsing
"log": {
"timestamp": 1232141412,
"message":
"Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)",
"dimensions": {
"severity": 6,
"facility": 16,
"pid": "39762",
"program": "haproxy",
"hostname": "dev-controller-0",
"tags": [ "connect" ],
"src_ip": "172.16.8.1",
"src_port": "52690",
"dest_ip": "172.16.8.10",
"dest_port": "5000",
"service_name": "keystone",
"protocol": "HTTP"
},
}
Postgres for Log Parsing
…regex!
# SELECT REGEXP_MATCHES(
    'Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)',
    'Connect from '
    || '(\d+\.\d+\.\d+\.\d+):(\d+) to (\d+\.\d+\.\d+\.\d+):(\d+)'
    || ' \((\w+)/(\w+)\)'
);
                  regexp_matches
---------------------------------------------------
 {172.16.8.1,52690,172.16.8.10,5000,keystone,HTTP}
(1 row)
Postgres for Log Parsing
Garnish with JSONB
# SELECT JSONB_PRETTY(JSONB_OBJECT(
'{src_ip,src_port,dest_ip,dest_port,service,protocol}',
'{172.16.8.1,52690,172.16.8.10,5000,keystone,HTTP}'
));
jsonb_pretty
-------------------------------
{ +
"src_ip": "172.16.8.1", +
"dest_ip": "172.16.8.10",+
"service": "keystone", +
"protocol": "HTTP", +
"src_port": "52690", +
"dest_port": "5000" +
}
(1 row)
Postgres for Log Parsing
CREATE TABLE logs (
timestamp TIMESTAMPTZ,
message VARCHAR,
dimensions JSONB
);
Log Schema – Goals:
Parse message against set of patterns
Add extracted information as dimensions
Postgres for Log Parsing
Patterns Table
Store pattern to match and field names
CREATE TABLE patterns (
regex VARCHAR,
field_names VARCHAR[]
);
INSERT INTO patterns (regex, field_names) VALUES (
    'Connect from '
    || '(\d+\.\d+\.\d+\.\d+):(\d+) to (\d+\.\d+\.\d+\.\d+):(\d+)'
    || ' \((\w+)/(\w+)\)',
    '{src_ip,src_port,dest_ip,dest_port,service,protocol}'
);
Postgres for Log Parsing
Log Processing
Apply all configured patterns to new rows
CREATE FUNCTION process_log () RETURNS TRIGGER
LANGUAGE plpgsql AS $_$
DECLARE
    m JSONB; p RECORD;
BEGIN
    FOR p IN SELECT * FROM patterns LOOP
        m := JSONB_OBJECT(p.field_names,
                 REGEXP_MATCHES(NEW.message, p.regex));
        IF m IS NOT NULL THEN
            NEW.dimensions := NEW.dimensions || m;
        END IF;
    END LOOP;
    RETURN NEW;
END; $_$;
Postgres for Log Parsing
CREATE TRIGGER process_log_trigger
BEFORE INSERT ON logs
FOR EACH ROW
EXECUTE PROCEDURE process_log ();
Log Processing Trigger
Apply patterns to messages and extend
dimensions as rows are inserted into the logs table
Postgres for Log Parsing
# INSERT INTO logs (timestamp, message, dimensions) VALUES (
'2017-01-03T06:29:09.043Z',
'Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)',
'{"hostname": "dev-controller-0", "program": "haproxy"}');
# SELECT timestamp, message, JSONB_PRETTY(dimensions) FROM logs;
-[ RECORD 1 ]+------------------------------------------------------------------
timestamp | 2017-01-03 06:29:09.043+00
message | Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)
jsonb_pretty | { +
| "src_ip": "172.16.8.1", +
| "dest_ip": "172.16.8.10", +
| "program": "haproxy", +
| "service": "keystone", +
| "hostname": "dev-controller-0", +
| "protocol": "HTTP", +
| "src_port": "52690", +
| "dest_port": "5000" +
| }
Postgres for …
Queueing
Requirements
● Offload data burden
from producers
● Persist as soon as
possible to avoid loss
● Handle high velocity
burst loads
● Data does not need
to be queryable
Postgres for Queueing
[Chart, built up over five slides: Ingest Rate in K-row/sec (1 day / 45M rows) — WITH BINARY, VARCHAR, JSON, JSONB vs Denormalised, Normalised, Summarised; 0–400 K-row/sec]
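The ingest paths compared in the chart are presumably COPY-based (the “WITH BINARY” series is a COPY option); a hedged sketch of the idea, with an illustrative single-column queue table:

CREATE TABLE queue (payload JSONB);

-- from psql: stream rows in via COPY, terminated by \.
COPY queue (payload) FROM STDIN;
{"timestamp": 1232141412, "name": "cpu.percent", "value": 42}
\.

-- the binary variant trades readability for parsing speed:
-- COPY queue FROM STDIN WITH BINARY;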
Conclusion…?
● I view Postgres as a very flexible
“data persistence toolbox”
● ...which happens to use SQL
● Batteries not always included
● That doesn’t mean it’s hard
● Operational advantages of using
general purpose tools can be huge
● Use & deploy what you know & trust
More Related Content

What's hot

Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Gruter
 
Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesCorley S.r.l.
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaDataWorks Summit
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaCloudera, Inc.
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingGreat Wide Open
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best PracticesCloudera, Inc.
 
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopBig Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopGruter
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...DataWorks Summit/Hadoop Summit
 
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in PythonThe Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in PythonMiklos Christine
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideDouglas Bernardini
 
Procella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeProcella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeDataWorks Summit
 
Efficient in situ processing of various storage types on apache tajo
Efficient in situ processing of various storage types on apache tajoEfficient in situ processing of various storage types on apache tajo
Efficient in situ processing of various storage types on apache tajoHyunsik Choi
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Chris Nauroth
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015N Masahiro
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Databricks
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure DataTaro L. Saito
 
Hoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopHoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopPrasanna Rajaperumal
 

What's hot (20)

Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
 
Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting Languages
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopBig Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
 
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in PythonThe Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
 
Procella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeProcella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at Youtube
 
Efficient in situ processing of various storage types on apache tajo
Efficient in situ processing of various storage types on apache tajoEfficient in situ processing of various storage types on apache tajo
Efficient in situ processing of various storage types on apache tajo
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
 
Apache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real TimeApache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real Time
 
Giraph+Gora in ApacheCon14
Giraph+Gora in ApacheCon14Giraph+Gora in ApacheCon14
Giraph+Gora in ApacheCon14
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
Hoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopHoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoop
 

Viewers also liked

PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya KosmodemianskyPostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya KosmodemianskyPostgreSQL-Consulting
 
Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016PostgreSQL-Consulting
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015PostgreSQL-Consulting
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseFederico Campoli
 
PostgreSQL Hooks for Fun and Profit
PostgreSQL Hooks for Fun and ProfitPostgreSQL Hooks for Fun and Profit
PostgreSQL Hooks for Fun and ProfitDavid Fetter
 
Using PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataUsing PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataJimmy Angelakos
 
Dissertation Defense Dec. 11, Participatory Infrastructure Monitoring
Dissertation Defense Dec. 11, Participatory Infrastructure MonitoringDissertation Defense Dec. 11, Participatory Infrastructure Monitoring
Dissertation Defense Dec. 11, Participatory Infrastructure MonitoringDietmar Offenhuber
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseJimmy Angelakos
 
IT Executive Survey: Strategies for Monitoring IT Infrastructure & Services
IT Executive Survey: Strategies for Monitoring IT  Infrastructure & ServicesIT Executive Survey: Strategies for Monitoring IT  Infrastructure & Services
IT Executive Survey: Strategies for Monitoring IT Infrastructure & ServicesCA Technologies
 
Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)
Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)
Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)Ontico
 
Как PostgreSQL работает с диском
Как PostgreSQL работает с дискомКак PostgreSQL работает с диском
Как PostgreSQL работает с дискомPostgreSQL-Consulting
 
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...Grier Johnson
 
Microsoft Infrastructure Monitoring using OpManager
Microsoft Infrastructure Monitoring using OpManagerMicrosoft Infrastructure Monitoring using OpManager
Microsoft Infrastructure Monitoring using OpManagerManageEngine
 
Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)
Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)
Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)Badoo Development
 
"Выбраться из спама - как повысить CTR рассылки без потери активности". Андре...
"Выбраться из спама - как повысить CTR рассылки без потери активности". Андре..."Выбраться из спама - как повысить CTR рассылки без потери активности". Андре...
"Выбраться из спама - как повысить CTR рассылки без потери активности". Андре...Badoo Development
 
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL-Consulting
 
Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)
Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)
Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)Badoo Development
 
Open Source Monitoring Tools Shootout
Open Source Monitoring Tools ShootoutOpen Source Monitoring Tools Shootout
Open Source Monitoring Tools Shootouttomdc
 

Viewers also liked (20)

PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya KosmodemianskyPostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
 
Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in Transwerwise
 
PostgreSQL Hooks for Fun and Profit
PostgreSQL Hooks for Fun and ProfitPostgreSQL Hooks for Fun and Profit
PostgreSQL Hooks for Fun and Profit
 
Backups
BackupsBackups
Backups
 
Using PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataUsing PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic Data
 
Dissertation Defense Dec. 11, Participatory Infrastructure Monitoring
Dissertation Defense Dec. 11, Participatory Infrastructure MonitoringDissertation Defense Dec. 11, Participatory Infrastructure Monitoring
Dissertation Defense Dec. 11, Participatory Infrastructure Monitoring
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
 
IT Executive Survey: Strategies for Monitoring IT Infrastructure & Services
IT Executive Survey: Strategies for Monitoring IT  Infrastructure & ServicesIT Executive Survey: Strategies for Monitoring IT  Infrastructure & Services
IT Executive Survey: Strategies for Monitoring IT Infrastructure & Services
 
Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)
Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)
Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)
 
Как PostgreSQL работает с диском
Как PostgreSQL работает с дискомКак PostgreSQL работает с диском
Как PostgreSQL работает с диском
 
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
 
Microsoft Infrastructure Monitoring using OpManager
Microsoft Infrastructure Monitoring using OpManagerMicrosoft Infrastructure Monitoring using OpManager
Microsoft Infrastructure Monitoring using OpManager
 
Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)
Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)
Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)
 
"Выбраться из спама - как повысить CTR рассылки без потери активности". Андре...
"Выбраться из спама - как повысить CTR рассылки без потери активности". Андре..."Выбраться из спама - как повысить CTR рассылки без потери активности". Андре...
"Выбраться из спама - как повысить CTR рассылки без потери активности". Андре...
 
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQ
 
Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)
Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)
Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)
 
Open Source Monitoring Tools Shootout
Open Source Monitoring Tools ShootoutOpen Source Monitoring Tools Shootout
Open Source Monitoring Tools Shootout
 

Similar to Infrastructure Monitoring with Postgres

How Cloudflare analyzes -1m dns queries per second @ Percona E17
How Cloudflare analyzes -1m dns queries per second @ Percona E17How Cloudflare analyzes -1m dns queries per second @ Percona E17
How Cloudflare analyzes -1m dns queries per second @ Percona E17Tom Arnfeld
 
Riga dev day: Lambda architecture at AWS
Riga dev day: Lambda architecture at AWSRiga dev day: Lambda architecture at AWS
Riga dev day: Lambda architecture at AWSAntons Kranga
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScyllaDB
 
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...HostedbyConfluent
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudMongoDB
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoNathaniel Braun
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseAll Things Open
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformEva Tse
 
Solr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World OverSolr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World OverAlex Pinkin
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data PlatformAmazon Web Services
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase HBaseCon
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analyticsXiang Fu
 
Fom io t_to_bigdata_step_by_step-final
Fom io t_to_bigdata_step_by_step-finalFom io t_to_bigdata_step_by_step-final
Fom io t_to_bigdata_step_by_step-finalLuis Filipe Silva
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...Altinity Ltd
 
MySQL performance monitoring using Statsd and Graphite
MySQL performance monitoring using Statsd and GraphiteMySQL performance monitoring using Statsd and Graphite
MySQL performance monitoring using Statsd and GraphiteDB-Art
 
Transforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big DataTransforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big Dataplumbee
 
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppet
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightScyllaDB
 

Similar to Infrastructure Monitoring with Postgres (20)

How Cloudflare analyzes -1m dns queries per second @ Percona E17
How Cloudflare analyzes -1m dns queries per second @ Percona E17How Cloudflare analyzes -1m dns queries per second @ Percona E17
How Cloudflare analyzes -1m dns queries per second @ Percona E17
 
Riga dev day: Lambda architecture at AWS
Riga dev day: Lambda architecture at AWSRiga dev day: Lambda architecture at AWS
Riga dev day: Lambda architecture at AWS
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal Cloud
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ Criteo
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series Database
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data Platform
 
Solr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World OverSolr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World Over
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
 
Fom io t_to_bigdata_step_by_step-final
Fom io t_to_bigdata_step_by_step-finalFom io t_to_bigdata_step_by_step-final
Fom io t_to_bigdata_step_by_step-final
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
 
MySQL performance monitoring using Statsd and Graphite
MySQL performance monitoring using Statsd and GraphiteMySQL performance monitoring using Statsd and Graphite
MySQL performance monitoring using Statsd and Graphite
 
Transforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big DataTransforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big Data
 
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
 

Recently uploaded

Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Infrastructure Monitoring with Postgres

  • 33. Monasca Existing Tools Software Network Storage Log API InfluxDB Metric API Alerting MySQL Servers Metrics Logs
  • 34. Monasca Existing Tools Software Network Storage Log API InfluxDB Metric API Alerting Grafana MySQL SQLite Servers Metrics Logs
  • 35. Monasca Existing Tools Software Network Storage Log API Logstash Elastic InfluxDB Metric API Alerting Grafana MySQL SQLite Servers Metrics Logs
  • 36. Monasca Existing Tools Software Network Storage Log API Logstash Elastic InfluxDB Metric API Alerting Grafana Kibana MySQL SQLite Servers Metrics Logs
  • 37. Monasca Existing Tools Software Network Storage Log API Kafka Logstash Elastic InfluxDB Metric API Alerting Grafana Kibana MySQL SQLite Servers Metrics Logs
  • 38. Monasca Existing Tools Software Network Storage Log API Kafka Logstash Elastic InfluxDB Metric API Alerting Grafana Kibana MySQL SQLite Servers Metrics Logs Zookeeper
  • 39. Existing Tools Commendable “right tool for the job” attitude, but… How about Postgres? Fewer points of failure Fewer places to back up Fewer redundancy protocols One set of consistent data semantics Re-use existing operational knowledge
  • 40. Monasca Existing Tools Software Network Storage Log API Kafka Logstash Elastic InfluxDB Metric API Alerting Grafana Kibana MySQL SQLite Servers Metrics Logs Zookeeper
  • 41. Monasca Existing Tools Software Network Storage Log API Kafka Logstash Elastic InfluxDB Metric API Alerting Grafana Kibana SQLite Servers Metrics Logs Zookeeper
  • 42. Monasca Existing Tools Software Network Storage Log API Kafka Logstash Elastic InfluxDB Metric API Alerting Grafana Kibana Servers Metrics Logs Zookeeper
  • 43. Monasca Existing Tools Software Network Storage Log API Kafka Logstash Elastic Metric API Alerting Grafana Kibana Servers Metrics Logs Zookeeper
  • 44. Monasca Existing Tools Software Network Storage Log API Kafka Logstash Metric API Alerting Grafana Grafana? Servers Metrics Logs Zookeeper
  • 45. Monasca Existing Tools Software Network Storage Log API Logstash Metric API Alerting Grafana Grafana? Servers Metrics Logs Zookeeper
  • 46. Monasca Existing Tools Software Network Storage Log API Logstash Metric API Alerting Grafana Grafana? Servers Metrics Logs
  • 47. Monasca Existing Tools Software Network Storage Log API Metric API Alerting Grafana Grafana? Servers Metrics Logs
  • 48. Monasca Existing Tools Software Network Storage Log API Metric API Alerting Grafana Servers Metrics Logs
  • 50. Postgres for Metrics Requirements ● ~45M values/day (80x196 per 30s) ● 6 month history ● <1TB disk footprint ● <100ms queries
  • 51. Postgres for Metrics Combine Series average over all for {series=cpu} [time range/interval] Read Series for each {type} for {series=cpu} [time range/interval]
  • 52. Postgres for Metrics: List Metric Names | List Dimension Names | List Dimension Values
    "metrics": [ "cpu.percent", "cpu.user_perc", "net.out_bytes_sec", "net.out_errors_sec", "net.in_bytes_sec", "net.in_errors_sec" … ]
    "dimensions": [ "device", "hostname", "instance", "mount_point", "process_name", "process_user" … ]
    "hostname": [ "dev-01", "dev-02", "staging-01", "staging-02", "prod-01", "prod-02" … ]
  • 54. Postgres for Metrics: JSON Ingest Format
    Known, well-defined structure; varying set of dimension key/values
    "metric": {
        "timestamp": 1232141412,
        "name": "cpu.percent",
        "value": 42,
        "dimensions": { "hostname": "dev-01" },
        "value_meta": { … }
    }
  • 55. Postgres for Metrics: Basic Denormalised Schema
    Straightforward mapping onto the input data; data model for all schemas
    CREATE TABLE measurements (
        timestamp  TIMESTAMPTZ,
        name       VARCHAR,
        value      FLOAT8,
        dimensions JSONB,
        value_meta JSON
    );
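    For concreteness, a minimal sketch (not shown in the talk) of how one ingest record in the JSON format above might land in this table; the epoch-seconds timestamp converts with TO_TIMESTAMP, and the values are taken from the example record:
      -- Illustrative only; literals from the slide-54 example record
      INSERT INTO measurements (timestamp, name, value, dimensions, value_meta)
      VALUES (TO_TIMESTAMP(1232141412),
              'cpu.percent',
              42,
              '{"hostname": "dev-01"}'::JSONB,
              NULL);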
  • 56. Postgres for Metrics: Single Series Query
    One hour window | Single hostname | Measurements every 60 second interval
    SELECT TIME_ROUND(timestamp, 60) AS timestamp,
           AVG(value) AS avg
    FROM measurements
    WHERE timestamp BETWEEN '2015-01-01Z00:00:00'
                        AND '2015-01-01Z01:00:00'
      AND name = 'cpu.percent'
      AND dimensions @> '{"hostname": "dev-01"}'::JSONB
    GROUP BY timestamp
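    TIME_ROUND is not a built-in PostgreSQL function, so these queries assume a helper the slides never define. A minimal sketch of one plausible definition, consistent with how the queries use it (floor a timestamp to a step in seconds):
      -- Assumed helper; name from the slides, body my own
      CREATE FUNCTION TIME_ROUND(ts TIMESTAMPTZ, step INT)
      RETURNS TIMESTAMPTZ LANGUAGE SQL IMMUTABLE AS $$
          SELECT TO_TIMESTAMP(FLOOR(EXTRACT(EPOCH FROM ts) / step) * step);
      $$;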
  • 57. Postgres for Metrics: Group Multi-Series Query
    One hour window | Every hostname | Measurements every 60 second interval
    SELECT TIME_ROUND(timestamp, 60) AS timestamp,
           AVG(value) AS avg,
           dimensions ->> 'hostname' AS hostname
    FROM measurements
    WHERE timestamp BETWEEN '2015-01-01Z00:00:00'
                        AND '2015-01-01Z01:00:00'
      AND name = 'cpu.percent'
    GROUP BY timestamp, hostname
  • 58. Postgres for Metrics: All Multi-Series Query
    One hour window | Averaged over every hostname | Measurements every 60 second interval
    SELECT TIME_ROUND(timestamp, 60) AS timestamp,
           AVG(value) AS avg
    FROM measurements
    WHERE timestamp BETWEEN '2015-01-01Z00:00:00'
                        AND '2015-01-01Z01:00:00'
      AND name = 'cpu.percent'
    GROUP BY timestamp
  • 59. Postgres for Metrics SELECT DISTINCT name FROM measurements Metric Name List Query :)
  • 60. Postgres for Metrics: Dimension Name List Query (for a specific metric)
    SELECT DISTINCT JSONB_OBJECT_KEYS(dimensions) AS d_name
    FROM measurements
    WHERE name = 'cpu.percent'
  • 61. Postgres for Metrics: Dimension Value List Query (for a specific metric and dimension)
    SELECT DISTINCT dimensions ->> 'hostname' AS d_value
    FROM measurements
    WHERE name = 'cpu.percent'
      AND dimensions ? 'hostname'
  • 63. Postgres for Metrics: Indexes
    Covers all necessary query terms; using a single GIN saves space, but is slower
    CREATE TABLE measurements (
        timestamp  TIMESTAMPTZ,
        name       VARCHAR,
        value      FLOAT8,
        dimensions JSONB,
        value_meta JSON
    );
    CREATE INDEX ON measurements (name, timestamp);
    CREATE INDEX ON measurements USING GIN (dimensions);
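    One way to check that these indexes actually serve the query shapes above (my sketch; the talk doesn't show query plans) is EXPLAIN with timing and buffer counts:
      EXPLAIN (ANALYZE, BUFFERS)
      SELECT AVG(value)
      FROM measurements
      WHERE name = 'cpu.percent'
        AND timestamp BETWEEN '2015-01-01Z00:00:00'
                          AND '2015-01-01Z01:00:00';
    An Index Scan or Bitmap Index Scan on the (name, timestamp) index in the plan confirms it is being used.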
  • 64. Postgres for Metrics ● Series Queries ● All, Group, Specific ● Varying Time Window/Interval 5m|15s, 1h|15s, 1h|300s, 6h|300s, 24h|300s ● Listing Queries ● Metric Names, Dimension Names & Values ● All, Partial
  • 65-68. Postgres for Metrics: charts, “Denormalised” query durations (ms)
    Series queries (Single / Group / All) at windows 5m (15s), 1h (15s), 1h (300s), 6h (300s), 24h (300s); first at full scale (up to ~12,000 ms), then zoomed (up to ~2,500 ms).
    Listing queries (Dimension Values / Dimension Names / Metric Names), All vs Partial; first at full scale (up to ~60,000 ms), then zoomed (up to ~8,000 ms).
  • 69. Postgres for Metrics: Normalised Schema
    Reduces duplication of data; pre-built set of distinct metric definitions
    CREATE TABLE measurement_values (
        timestamp  TIMESTAMPTZ,
        metric_id  INT,
        value      FLOAT8,
        value_meta JSON
    );
    CREATE TABLE metrics (
        id         SERIAL,
        name       VARCHAR,
        dimensions JSONB
    );
  • 70. Postgres for Metrics: Normalised Schema
    Function to use at insert time; finds an existing metric_id or allocates a new one
    CREATE FUNCTION get_metric_id (in_name VARCHAR, in_dims JSONB)
    RETURNS INT LANGUAGE plpgsql AS $_$
    DECLARE
        out_id INT;
    BEGIN
        SELECT id INTO out_id FROM metrics AS m
        WHERE m.name = in_name AND m.dimensions = in_dims;
        IF NOT FOUND THEN
            INSERT INTO metrics ("name", "dimensions")
            VALUES (in_name, in_dims)
            RETURNING id INTO out_id;
        END IF;
        RETURN out_id;
    END;
    $_$;
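    A hypothetical insert path using the helper (not shown on the slides): resolve (name, dimensions) to an id once, then write only the narrow measurement row:
      INSERT INTO measurement_values (timestamp, metric_id, value, value_meta)
      VALUES (NOW(),
              get_metric_id('cpu.percent', '{"hostname": "dev-01"}'::JSONB),
              42, NULL);
    Note that a look-up-or-insert function like this can race under concurrent writers; an ON CONFLICT upsert against a unique index on (name, dimensions) would be the usual hardening.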
  • 71. Postgres for Metrics: Normalised Schema
    Same queries, using a view to join; extra index to help the normalisation step
    CREATE VIEW measurements AS
        SELECT * FROM measurement_values
        INNER JOIN metrics ON (metric_id = id);
    CREATE INDEX metrics_idx ON metrics (name, dimensions);
    CREATE INDEX measurements_idx ON measurement_values (metric_id, timestamp);
  • 72-74. Postgres for Metrics: charts, “Normalised” query durations (ms)
    Series queries (Single / Group / All) at windows 5m (15s), 1h (15s), 1h (300s), 6h (300s), 24h (300s); full scale up to ~2,500 ms, zoomed up to ~1,000 ms.
    Listing queries (Dimension Values / Dimension Names / Metric Names), All vs Partial; up to ~1,000 ms.
  • 75. Postgres for Metrics ● As the time window grows, less detail is necessary, e.g. ● 30s interval at 1 hour ● 300s interval at 6 hours
  • 76. Postgres for Metrics: raw measurements (left) summarised into coarser intervals (right)
    Timestamp  Metric  Value          Timestamp  Metric  Value
    10:00:00   1       10             10:00:00   1       40
    10:00:00   2       2              10:00:00   2       10
    10:00:30   1       10             10:02:00   1       30
    10:00:30   2       4              10:02:00   2       8
    10:01:30   1       20
    10:01:30   2       4
    10:02:00   1       15
    10:02:00   2       2
    10:02:30   1       5
    10:02:30   2       2
    10:03:00   1       10
    10:03:00   2       6
  • 77. Postgres for Metrics: Summarised Schema
    Pre-compute every 5m (300s) interval; the functions to be applied must be known in advance
    CREATE TABLE summary_values_5m (
        timestamp   TIMESTAMPTZ,
        metric_id   INT,
        value_sum   FLOAT8,
        value_count FLOAT8,
        value_min   FLOAT8,
        value_max   FLOAT8,
        UNIQUE (metric_id, timestamp)
    );
  • 78. Postgres for Metrics: Summarised Schema
    One entry per metric/rounded time period; existing entries updated by aggregating
    CREATE FUNCTION update_summarise () RETURNS TRIGGER LANGUAGE plpgsql AS $_$
    BEGIN
        INSERT INTO summary_values_5m VALUES (
            TIME_ROUND(NEW.timestamp, 300),
            NEW.metric_id,
            NEW.value, 1, NEW.value, NEW.value)
        ON CONFLICT (metric_id, timestamp) DO UPDATE SET
            value_sum   = value_sum + EXCLUDED.value_sum,
            value_count = value_count + EXCLUDED.value_count,
            value_min   = LEAST (value_min, EXCLUDED.value_min),
            value_max   = GREATEST(value_max, EXCLUDED.value_max);
        RETURN NULL;
    END;
    $_$;
  • 79. Postgres for Metrics: Summarised Schema
    Trigger applies each row to the summary table; view mainly for convenience when querying
    CREATE TRIGGER update_summarise_trigger
        AFTER INSERT ON measurement_values
        FOR EACH ROW EXECUTE PROCEDURE update_summarise ();
    CREATE VIEW summary_5m AS
        SELECT * FROM summary_values_5m
        INNER JOIN metrics ON (metric_id = id);
  • 80. Postgres for Metrics: Combined Series Query
    Six hour window | Every hostname | Measurements every 300 second interval
    SELECT TIME_ROUND(timestamp, 300) AS timestamp,
           AVG(value) AS avg
    FROM measurements
    WHERE timestamp BETWEEN '2015-01-01Z00:00:00'
                        AND '2015-01-01Z06:00:00'
      AND name = 'cpu.percent'
    GROUP BY timestamp
  • 81. Postgres for Metrics: Combined Series Query
    Using the pre-aggregated summary table; mostly the same, with extra fiddling for AVG
    SELECT TIME_ROUND(timestamp, 300) AS timestamp,
           SUM(value_sum) / SUM(value_count) AS avg
    FROM summary_5m
    WHERE timestamp BETWEEN '2015-01-01Z00:00:00'
                        AND '2015-01-01Z06:00:00'
      AND name = 'cpu.percent'
    GROUP BY timestamp
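    The same summary table answers MIN and MAX directly, with no extra fiddling; a sketch (not shown in the talk) using the column names from slide 77:
      SELECT TIME_ROUND(timestamp, 300) AS timestamp,
             MIN(value_min) AS min,
             MAX(value_max) AS max
      FROM summary_5m
      WHERE timestamp BETWEEN '2015-01-01Z00:00:00'
                          AND '2015-01-01Z06:00:00'
        AND name = 'cpu.percent'
      GROUP BY timestamp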
  • 82-83. Postgres for Metrics: charts, “Summarised” query durations (ms)
    Series queries (Single / Group / All) at windows 5m (15s), 1h (15s), 1h (300s), 6h (300s), 24h (300s), and listing queries (All vs Partial); both up to ~1,000 ms.
  • 84-86. Postgres for Metrics: charts, ingest time and disk usage (1 day / 45M rows)
    Ingest time for Summarised / Normalised / Denormalised; full scale up to ~90,000 s, zoomed up to ~4,000 s. Disk usage up to ~10,000 MB.
  • 87. Postgres for Metrics ● Need coarser summaries for wider queries (e.g. 30m summaries) ● Need to partition data by day to: ● Retain ingest rate due to indexes ● Optimise dropping old data ● Much better ways to produce summaries to optimise ingest, specifically: ● Process rows in batches of interval size ● Process asynchronous to ingest transaction
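    The deck predates declarative partitioning, but on PostgreSQL 10+ the per-day partitioning suggested above might look like this sketch (table and partition names illustrative):
      CREATE TABLE measurement_values (
          timestamp  TIMESTAMPTZ,
          metric_id  INT,
          value      FLOAT8,
          value_meta JSON
      ) PARTITION BY RANGE (timestamp);
      CREATE TABLE measurement_values_20150101
          PARTITION OF measurement_values
          FOR VALUES FROM ('2015-01-01') TO ('2015-01-02');
      -- Dropping old data becomes a cheap metadata operation:
      DROP TABLE measurement_values_20150101;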
  • 90. Postgres for Log Searching Requirements ● Central log storage ● Trivially searchable ● Time bounded ● Filter ‘dimensions’ ● Interactive query times (<100ms)
  • 91. Postgres for Log Searching: Log Ingest Format
    Typically sourced from rsyslog; varying set of dimension key/values
    "log": {
        "timestamp": 1232141412,
        "message": "Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)",
        "dimensions": {
            "severity": 6,
            "facility": 16,
            "pid": "39762",
            "program": "haproxy",
            "hostname": "dev-controller-0"
        }
    }
  • 92. Postgres for Log Searching: Basic Schema
    Straightforward mapping of the source data; allows maximum dimension flexibility
    CREATE TABLE logs (
        timestamp  TIMESTAMPTZ,
        message    VARCHAR,
        dimensions JSONB
    );
  • 93. Postgres for Log Searching: Query Example
    Kibana/Elastic style (connection AND program:haproxy) using PG FTS
    SELECT * FROM logs
    WHERE TO_TSVECTOR('english', message) @@ TO_TSQUERY('connection')
      AND dimensions @> '{"program":"haproxy"}';
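    The requirements slide also asked for time-bounded search; a sketch combining the same predicates with a timestamp range (an index on timestamp, not shown in the talk, would be assumed):
      SELECT * FROM logs
      WHERE timestamp BETWEEN '2017-01-03Z00:00:00'
                          AND '2017-01-03Z23:59:59'
        AND TO_TSVECTOR('english', message) @@ TO_TSQUERY('connection')
        AND dimensions @> '{"program":"haproxy"}';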
  • 94. Postgres for Log Searching: Indexes
    Enable fast text search on ‘message’ and fast filtering on ‘dimensions’
    CREATE INDEX ON logs USING GIN (TO_TSVECTOR('english', message));
    CREATE INDEX ON logs USING GIN (dimensions);
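    An alternative worth noting (my suggestion, not from the talk): on PostgreSQL 12+ a stored generated column keeps the tsvector expression in one place, so queries and the index definition cannot drift apart:
      ALTER TABLE logs ADD COLUMN message_tsv TSVECTOR
          GENERATED ALWAYS AS (TO_TSVECTOR('english', message)) STORED;
      CREATE INDEX ON logs USING GIN (message_tsv);
      -- queries then filter on: message_tsv @@ TO_TSQUERY('connection')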
  • 96-100. Postgres for Log Parsing: the same log record, progressively enriched
    Starting point (as ingested):
    "log": {
        "timestamp": 1232141412,
        "message": "Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)",
        "dimensions": {
            "severity": 6,
            "facility": 16,
            "pid": "39762",
            "program": "haproxy",
            "hostname": "dev-controller-0"
        }
    }
    Slides 98-99 add a tag derived from the message: "tags": [ "connect" ]
    Slide 100 adds fields parsed out of the message:
        "src_ip": "172.16.8.1", "src_port": "52690",
        "dest_ip": "172.16.8.10", "dest_port": "5000",
        "service_name": "keystone", "protocol": "HTTP"
  • 101. Postgres for Log Parsing ….regex!
    # SELECT REGEXP_MATCHES(
        'Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)',
        'Connect from ' ||
        '(\d+\.\d+\.\d+\.\d+):(\d+) to (\d+\.\d+\.\d+\.\d+):(\d+)' ||
        ' \((\w+)/(\w+)\)'
    );
                         regexp_matches
    ---------------------------------------------------
     {172.16.8.1,52690,172.16.8.10,5000,keystone,HTTP}
    (1 row)
  • 102. Postgres for Log Parsing: garnish with JSONB
    # SELECT JSONB_PRETTY(JSONB_OBJECT(
        '{src_ip,src_port,dest_ip,dest_port,service,protocol}',
        '{172.16.8.1,52690,172.16.8.10,5000,keystone,HTTP}'
    ));
            jsonb_pretty
    -------------------------------
     {                            +
         "src_ip": "172.16.8.1",  +
         "dest_ip": "172.16.8.10",+
         "service": "keystone",   +
         "protocol": "HTTP",      +
         "src_port": "52690",     +
         "dest_port": "5000"      +
     }
    (1 row)
  • 103. Postgres for Log Parsing: Log Schema, goals
    Parse each message against a set of patterns; add extracted information as dimensions
    CREATE TABLE logs (
        timestamp  TIMESTAMPTZ,
        message    VARCHAR,
        dimensions JSONB
    );
  • 104. Postgres for Log Parsing: Patterns Table
    Stores the pattern to match and the field names it yields
    CREATE TABLE patterns (
        regex       VARCHAR,
        field_names VARCHAR[]
    );
    INSERT INTO patterns (regex, field_names) VALUES (
        'Connect from ' ||
        '(\d+\.\d+\.\d+\.\d+):(\d+) to (\d+\.\d+\.\d+\.\d+):(\d+)' ||
        ' \((\w+)/(\w+)\)',
        '{src_ip,src_port,dest_ip,dest_port,service,protocol}'
    );
  • 105. Postgres for Log Parsing: Log Processing
    Apply all configured patterns to new rows
    CREATE FUNCTION process_log () RETURNS TRIGGER LANGUAGE plpgsql AS $_$
    DECLARE
        m JSONB;
        p RECORD;
    BEGIN
        FOR p IN SELECT * FROM patterns LOOP
            m := JSONB_OBJECT(p.field_names, REGEXP_MATCHES(NEW.message, p.regex));
            IF m IS NOT NULL THEN
                NEW.dimensions := NEW.dimensions || m;
            END IF;
        END LOOP;
        RETURN NEW;
    END;
    $_$;
  • 106. Postgres for Log Parsing: Log Processing Trigger
    Applies the patterns and extends dimensions as messages are inserted into the logs table
    CREATE TRIGGER process_log_trigger
        BEFORE INSERT ON logs
        FOR EACH ROW EXECUTE PROCEDURE process_log ();
  • 107. Postgres for Log Parsing
    # INSERT INTO logs (timestamp, message, dimensions) VALUES (
        '2017-01-03T06:29:09.043Z',
        'Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)',
        '{"hostname": "dev-controller-0", "program": "haproxy"}');
    # SELECT timestamp, message, JSONB_PRETTY(dimensions) FROM logs;
    -[ RECORD 1 ]+------------------------------------------------------------------
    timestamp    | 2017-01-03 06:29:09.043+00
    message      | Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)
    jsonb_pretty | {                                   +
                 |     "src_ip": "172.16.8.1",         +
                 |     "dest_ip": "172.16.8.10",       +
                 |     "program": "haproxy",           +
                 |     "service": "keystone",          +
                 |     "hostname": "dev-controller-0", +
                 |     "protocol": "HTTP",             +
                 |     "src_port": "52690",            +
                 |     "dest_port": "5000"             +
                 | }
  • 109. Requirements ● Offload data burden from producers ● Persist as soon as possible to avoid loss ● Handle high velocity burst loads ● Data does not need to be queryable Postgres for Queueing
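    The following slides benchmark raw ingest formats rather than a dequeue mechanism. For completeness, a common Postgres pattern that fits these requirements (my sketch, not from the talk) is a plain table drained in batches with FOR UPDATE SKIP LOCKED (PostgreSQL 9.5+), which lets many consumers work in parallel without blocking one another:
      CREATE TABLE queue (
          id      BIGSERIAL PRIMARY KEY,
          payload JSONB
      );
      -- Each consumer atomically claims and removes a batch:
      DELETE FROM queue
      WHERE id IN (
          SELECT id FROM queue
          ORDER BY id
          LIMIT 100
          FOR UPDATE SKIP LOCKED
      )
      RETURNING payload;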
  • 110-114. Postgres for Queueing: chart (built up over five slides), ingest rate (1d / 45M rows), K-row/sec, up to ~400
    Compared: WITH BINARY, VARCHAR, JSON and JSONB ingest formats against the Denormalised, Normalised and Summarised schemas.
  • 116. Conclusion… ? ● I view Postgres as a very flexible “data persistence toolbox” ● ...which happens to use SQL ● Batteries not always included ● That doesn’t mean it’s hard ● Operational advantages of using general purpose tools can be huge ● Use & deploy what you know & trust