Let's Build a Complex, Real-Time
Data Management Application
JONATHAN S. KATZ
PGCONF.EU 2018
OCTOBER 25, 2018
...before the session ends!
About Crunchy Data
2
Market Leading Data Security
• Crunchy Certified PostgreSQL is open source and Common Criteria EAL 2+ Certified, with essential security enhancements for enterprise deployment
• Author of the DISA Secure Technology Implementation Guide for PostgreSQL and co-author of the CIS PostgreSQL Benchmark. Move ATO from weeks to days!
Cloud Ready Data Management
• Open source, Kubernetes-based solutions proven to scale to 1000s of database instances
• Cloud-agnostic technology provides flexibility on how to deploy databases to public clouds, private clouds, or on-premises infrastructure
Leader in Open Source Enterprise PostgreSQL
• Developer of essential open source tools for high availability, disaster recovery, and monitoring for PostgreSQL
• Leading contributor and sponsor of features that enhance stability, security, and performance of PostgreSQL
• Director of Communications, Crunchy Data
• Previously: Engineering leadership in startups
• Longtime PostgreSQL community contributor
• Advocacy & various committees for PGDG
• @postgresql + .org content
• Director, PgUS
• Co-Organizer, NYCPUG
• Conference organization + speaking
• @jkatz05
About Me
3
• This talk introduces many different tools and techniques available in PostgreSQL for building applications
• Introduces different features and where to find out more information
• We have a lot of material to cover in a short time - the slides and demonstrations will be made available
How to Approach This Talk
4
• Imagine we are managing the rooms at the Marriott Lisbon Hotel
• We have a set of operating hours in which the rooms can be booked
• Only one booking can occur in the room at a given time
The Problem
5
For example...
6
• We need to know...
• All the rooms that are available to book
• When the rooms are available to be booked (operating hours)
• When the rooms have been booked
• And...
• The system needs to be able to CRUD fast
• (Create, Read, Update, Delete. Fast).
Application Requirements
7
8
🤔
First, let's talk about how we can find
availability
• Availability can be thought about in three ways:
• Closed
• Available
• Unavailable (or "booked")
• Our ultimate "calendar tuple" is (room, status, range)
Managing Availability
10
• PostgreSQL 9.2 introduced "range types" that included the ability to store and efficiently search over ranges of data
• Built-in:
• Date, Timestamps
• Integer, Numeric
• Lookups (e.g. overlaps) can be sped up using GiST and SP-GiST indexes
PostgreSQL Range Types
11
SELECT
tstzrange('2018-10-26 09:30'::timestamptz, '2018-10-26 10:30'::timestamptz);
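The range operators are what make this model workable; a quick sketch (the values are made up) of the two we will lean on most, overlap (&&) and containment (@>):

-- does a proposed booking overlap an existing one?
SELECT tstzrange('2018-10-26 09:30', '2018-10-26 10:30') &&
       tstzrange('2018-10-26 10:00', '2018-10-26 11:00'); -- true

-- does an availability window fully contain a proposed booking?
SELECT tstzrange('2018-10-26 08:00', '2018-10-26 20:00') @>
       tstzrange('2018-10-26 09:30', '2018-10-26 10:30'); -- true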
Availability
12
Availability
13
SELECT *
FROM (
VALUES
('closed', tstzrange('2018-10-26 0:00', '2018-10-26 8:00')),
('available', tstzrange('2018-10-26 08:00', '2018-10-26 09:30')),
('unavailable', tstzrange('2018-10-26 09:30', '2018-10-26 10:30')),
('available', tstzrange('2018-10-26 10:30', '2018-10-26 16:30')),
('unavailable', tstzrange('2018-10-26 16:30', '2018-10-26 18:30')),
('available', tstzrange('2018-10-26 18:30', '2018-10-26 20:00')),
('closed', tstzrange('2018-10-26 20:00', '2018-10-27 0:00'))
) x(status, calendar_range)
ORDER BY lower(x.calendar_range);
Easy, right?
• Inserting new ranges means dividing existing ones up
• PostgreSQL does not work well with discontiguous ranges (...yet)
• Availability
• Just for one day - what about other days?
• What happens with data in the past?
• What happens with data in the future?
• Unavailability
• Ensure no double-bookings (one way to enforce this is sketched below)
• Overlapping Events?
• Just one space
But...
15
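One way to enforce the no-double-booking requirement, not covered in this talk, is an exclusion constraint; a minimal sketch against the unavailability table from Appendix A, assuming the btree_gist extension is available:

CREATE EXTENSION IF NOT EXISTS btree_gist;

ALTER TABLE unavailability
  ADD CONSTRAINT unavailability_no_double_booking
  EXCLUDE USING gist (
    room_id WITH =,            -- for the same room...
    unavailable_range WITH &&  -- ...ranges may not overlap
  );

With this in place, any INSERT or UPDATE that would create two overlapping bookings for the same room fails with a constraint violation.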
availability_rule
• id <serial> PRIMARY KEY
• room_id <int> REFERENCES (room)
• days_of_week <int[]>
• start_time <time>
• end_time <time>
• generate_weeks_into_future <int> DEFAULT 52

room
• id <serial> PRIMARY KEY
• name <text>

availability
• id <serial> PRIMARY KEY
• room_id <int> REFERENCES (room)
• availability_rule_id <int> REFERENCES (availability_rule)
• available_date <date>
• available_range <tstzrange>

unavailability
• id <serial> PRIMARY KEY
• room_id <int> REFERENCES (room)
• unavailable_date <date>
• unavailable_range <tstzrange>

calendar
• id <serial> PRIMARY KEY
• room_id <int> REFERENCES (room)
• status <text> DOMAIN: {available, unavailable, closed}
• calendar_date <date>
• calendar_range <tstzrange>
Managing Availability
16
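With this model in place, the core lookup becomes a single query; a hypothetical example (room ids and times made up) of finding rooms free for a given slot:

SELECT room.name
FROM calendar
JOIN room ON room.id = calendar.room_id
WHERE
  calendar.status = 'available' AND
  calendar.calendar_range @> tstzrange('2018-10-26 09:30', '2018-10-26 10:30');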
• We can now store data, but what about:
• Generating the initial calendar?
• Generating availability based on rules?
• Generating unavailability?
• Sounds like we need to build an application
Managing Availability
17
• To build our application, there are a few topics we will need to explore first:
• generate_series
• Recursive queries
• SQL functions
• Set returning functions
• PL/pgSQL
• Triggers
Managing Availability
18
• generate_series is a "set returning" function, i.e. a function that can return multiple rows of data
• generate_series can return:
• A set of numbers (int, bigint, numeric), either incremented by 1 or by some other step
• A set of timestamps incremented by a time interval(!!)
generate_series: More than just generating test data
19
SELECT x::date
FROM generate_series(
'2018-01-01'::date, '2018-12-31'::date, '1 day'::interval
) x;
• PostgreSQL 8.4 introduced the "WITH" syntax and with it the ability to perform recursive queries
• WITH RECURSIVE ... AS ()
• Base case vs. recursive case
• UNION vs. UNION ALL
• CAN HIT INFINITE LOOPS
Recursion in my SQL?
20
Recursion in my SQL?
21
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
)
SELECT fac.n, fac.i
FROM fac;
Nope
Recursion in my SQL?
22
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
WHERE i + 1 <= 100
)
SELECT fac.n, fac.i
FROM fac;
Better
• PostgreSQL provides the ability to write functions to help encapsulate repeated behavior
• PostgreSQL 11 introduces stored procedures, which enable you to embed transactions!
• SQL functions have many properties, including:
• Input / output
• Volatility (IMMUTABLE, STABLE, VOLATILE) (default VOLATILE)
• Parallel safety (default PARALLEL UNSAFE)
• LEAKPROOF; SECURITY DEFINER
• Execution Cost
• Language type (more on this later)
Functions
23
Functions
24
CREATE OR REPLACE FUNCTION pgconfeu_fac(n int)
RETURNS numeric
AS $$
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
WHERE i + 1 <= $1
)
SELECT max(fac.n)
FROM fac;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
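Calling it is then just a SELECT; for example:

SELECT pgconfeu_fac(5); -- returns 120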
Functions
25
CREATE OR REPLACE FUNCTION pgconfeu_fac_set(n int)
RETURNS SETOF numeric
AS $$
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
WHERE i + 1 <= $1
)
SELECT fac.n
FROM fac
ORDER BY fac.n;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
Functions
26
CREATE OR REPLACE FUNCTION pgopen_fac_table(n int)
RETURNS TABLE(n numeric)
AS $$
WITH RECURSIVE fac AS (
SELECT
1::numeric AS n,
1::numeric AS i
UNION
SELECT
fac.n * (fac.i + 1),
fac.i + 1 AS i
FROM fac
WHERE i + 1 <= $1
)
SELECT fac.n
FROM fac
ORDER BY fac.n;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
• PostgreSQL has the ability to load in procedural languages and execute code in them beyond SQL
• "PL"
• Built-in: pgSQL, Python, Perl, Tcl
• Others: JavaScript, R, Java, C, JVM, Container, LOLCODE, Ruby, PHP, Lua, pgPSM, Scheme
Procedural Languages
27
PL/pgSQL
28
CREATE EXTENSION IF NOT EXISTS plpgsql;
CREATE OR REPLACE FUNCTION pgopen_fac_plpgsql(n int)
RETURNS numeric
AS $$
DECLARE
fac numeric;
i int;
BEGIN
fac := 1;
FOR i IN 1..n LOOP
fac := fac * i;
END LOOP;
RETURN fac;
END;
$$ LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE;
• Triggers are functions that can be called before/after/instead of an operation or event
• Data changes (INSERT/UPDATE/DELETE)
• Events (DDL, DCL, etc. changes)
• Atomic
• Must return "trigger" or "event_trigger"
• (Return "NULL" in a trigger if you want to skip the operation)
• (Gotcha: OLD is not defined on INSERT and NEW is not defined on DELETE, so RETURN OLD in an INSERT trigger or RETURN NEW in a DELETE trigger returns NULL)
• Execute once per modified row or once per SQL statement
• Multiple triggers on the same event will execute in alphabetical order
• Writeable in any PL language that defines a trigger interface (a minimal example is sketched after this slide)
Triggers
29
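To make the mechanics concrete, here is a minimal trigger sketch (the audit table and names are invented for illustration, not part of the talk's schema):

CREATE TABLE room_audit (
  room_id int,
  changed_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION room_audit_trigger()
RETURNS trigger
AS $$
BEGIN
  -- NEW is defined for INSERT/UPDATE; a DELETE trigger would use OLD
  INSERT INTO room_audit (room_id) VALUES (NEW.id);
  RETURN NEW; -- the return value is ignored in an AFTER trigger
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER room_audit
AFTER INSERT OR UPDATE ON room
FOR EACH ROW
EXECUTE PROCEDURE room_audit_trigger();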
Building a Synchronized System
We will scan through the application code.
It will be available for download later ;-)
The Test
• [Test your live demos before running them, and you will have much success!]
• availability_rule inserts took some time, > 500ms
• availability: INSERT 52
• calendar: INSERT 52 from a nontrivial function
• Updates on individual availability / unavailability are not too painful
• Lookups are faaaaaaaast
Lessons of The Test
33
How about at (web) scale?
• Even with only 100 more rooms, each with a few rules, rule generation time increased significantly
• Lookups are still lightning fast!
Web Scale :(
35
• Added in PostgreSQL 9.4
• Replays all logical changes made to the database
• Create a logical replication slot in your database
• Only one receiver can consume changes from a slot at a time
• The slot keeps track of the last change that was read by a receiver
• If the receiver disconnects, the slot ensures the database holds the changes until the receiver reconnects
• Only changes from tables with primary keys are relayed
• As of PostgreSQL 10, you can set a "REPLICA IDENTITY" on a UNIQUE, NOT NULL, non-deferrable, non-partial column(s)
• Basis for Logical Replication
Logical Decoding
36
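If a table lacks a suitable primary key (or you want full old-row images relayed), the replica identity can be set explicitly; a sketch using the room table:

ALTER TABLE room REPLICA IDENTITY FULL; -- logs the entire old row, at the cost of extra WAL

-- or point it at a qualifying unique index (room_pkey is the default PK index name):
-- ALTER TABLE room REPLICA IDENTITY USING INDEX room_pkey;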
• A logical replication slot has a name and an output plugin
• PostgreSQL comes with the "test" output plugin
• You have to write a custom parser to read changes from the test output plugin
• Several output plugins and libraries are available:
• wal2json: https://github.com/eulerto/wal2json
• jsoncdc: https://github.com/posix4e/jsoncdc
• Debezium: http://debezium.io/
• (Test: https://www.postgresql.org/docs/11/static/test-decoding.html)
• Every change in the database is streamed
• You need to be aware of the logical decoding format
Logical Decoding Out of the Box
37
• C: libpq
• pg_recvlogical
• PostgreSQL functions
• Python: psycopg2 - version 2.7
• JDBC: version 42
• Go: go-pgx
• JavaScript: node-postgres (pg-logical-replication)
Driver Support
38
Using Logical Decoding
39
wal_level = logical
max_wal_senders = 2
max_replication_slots = 2
postgresql.conf
local replication jkatz trust
pg_hba.conf
# DEVELOPMENT ONLY
SELECT *
FROM pg_create_logical_replication_slot('schedule', 'wal2json');
In the database:
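Once the slot exists, you can sanity-check it straight from SQL; peeking returns pending changes without consuming them, while getting also advances the slot:

-- inspect pending changes without consuming them
SELECT * FROM pg_logical_slot_peek_changes('schedule', NULL, NULL);

-- consume (and acknowledge) pending changes
SELECT * FROM pg_logical_slot_get_changes('schedule', NULL, NULL);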
• We know it takes time to regenerate the calendar
• We want to ensure changes always propagate, while all users (managers, calendar searchers) still have a good experience
Thoughts
40
🤔
• We will use the same data model as before, as well as the same helper functions, but without the triggers
• (That's a lie: we will keep one set of DELETE triggers, because "DELETE" in the wal2json output plugin currently does not provide enough information)
Replacing Triggers
41
Replacing Triggers
42
/**
* Helper function: substitute the data within the `calendar`; this can be used
* for all updates that occur on `availability` and `unavailability`
*/
CREATE OR REPLACE FUNCTION calendar_manage(room_id int, calendar_date date)
RETURNS void
AS $$
WITH delete_calendar AS (
DELETE FROM calendar
WHERE
room_id = $1 AND
calendar_date = $2
)
INSERT INTO calendar (room_id, status, calendar_date, calendar_range)
SELECT $1, c.status, $2, c.calendar_range
FROM calendar_generate_calendar($1, tstzrange($2, $2 + 1)) c
$$ LANGUAGE SQL;
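Regenerating a single day for a room is then a one-liner; for example:

SELECT calendar_manage(1, '2018-10-26');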
Replacing Triggers
43
/** Now, the trigger functions for availability and unavailability; needs this for DELETE */
CREATE OR REPLACE FUNCTION availability_manage()
RETURNS trigger
AS $trigger$
BEGIN
IF TG_OP = 'DELETE' THEN
PERFORM calendar_manage(OLD.room_id, OLD.available_date);
RETURN OLD;
END IF;
END;
$trigger$
LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION unavailability_manage()
RETURNS trigger
AS $trigger$
BEGIN
IF TG_OP = 'DELETE' THEN
PERFORM calendar_manage(OLD.room_id, OLD.unavailable_date);
RETURN OLD;
END IF;
END;
$trigger$
LANGUAGE plpgsql;
/** And the triggers, applied to everything */
CREATE TRIGGER availability_manage
AFTER DELETE ON availability
FOR EACH ROW
EXECUTE PROCEDURE availability_manage();
CREATE TRIGGER unavailability_manage
AFTER DELETE ON unavailability
FOR EACH ROW
EXECUTE PROCEDURE unavailability_manage();
• We will have a Python script that reads from a logical replication slot and, if it detects a relevant change, takes an action
• Similar to what we did with triggers, but this moves the work to OUTSIDE the transaction
• BUT...we can confirm whether or not the work is completed, so if the program fails, we can restart from the last acknowledged transaction ID
Replacing Triggers
44
Reading the Changes
45
import json
import sys
import psycopg2
import psycopg2.extras
SQL = {
'availability': {
'insert': """SELECT calendar_manage(%(room_id)s, %(available_date)s)""",
'update': """SELECT calendar_manage(%(room_id)s, %(available_date)s)""",
},
'availability_rule': {
'insert': True,
'update': True,
},
'room': {
'insert': """
INSERT INTO calendar (room_id, status, calendar_date, calendar_range)
SELECT
%(id)s, 'closed', calendar_date, tstzrange(calendar_date, calendar_date + '1 day'::interval)
FROM generate_series(
date_trunc('week', CURRENT_DATE),
date_trunc('week', CURRENT_DATE + '52 weeks'::interval),
'1 day'::interval
) calendar_date;
""",
},
'unavailability': {
'insert': """SELECT calendar_manage(%(room_id)s, %(unavailable_date)s)""",
'update': """SELECT calendar_manage(%(room_id)s, %(unavailable_date)s)""",
},
}
Reading the Changes
46
class StreamReader(object):
def _consume_change(self, payload):
connection = psycopg2.connect("dbname=realtime")
cursor = connection.cursor()
for data in payload['change']:
sql = SQL.get(data.get('table'), {}).get(data.get('kind'))
if not sql:
return
params = dict(zip(data['columnnames'], data['columnvalues']))
if data['table'] == 'availability_rule':
self._perform_availability_rule(cursor, data['kind'], params)
else:
cursor.execute(sql, params)
connection.commit()
cursor.close()
connection.close()
def _perform_availability_rule(self, cursor, kind, params):
if kind == 'update':
cursor.execute("""DELETE FROM availability WHERE availability_rule_id = %(id)s""", params)
if kind in ['insert', 'update']:
days_of_week = params['days_of_week'].replace('{', '').replace('}', '').split(',')
for day_of_week in days_of_week:
params['day_of_week'] = day_of_week
cursor.execute(
"""
SELECT availability_rule_bulk_insert(ar, %(day_of_week)s)
FROM availability_rule ar
WHERE ar.id = %(id)s
""", params)
Reading the Changes
47
def __init__(self):
self.connection = psycopg2.connect("dbname=schedule",
connection_factory=psycopg2.extras.LogicalReplicationConnection,
)
def __call__(self, msg):
payload = json.loads(msg.payload, strict=False)
print(payload)
self._consume_change(payload)
msg.cursor.send_feedback(flush_lsn=msg.data_start)
Reading the Changes
48
reader = StreamReader()
cursor = reader.connection.cursor()
cursor.start_replication(slot_name='schedule', decode=True)
try:
cursor.consume_stream(reader)
except KeyboardInterrupt:
print("Stopping reader...")
finally:
cursor.close()
reader.connection.close()
print("Exiting reader")
• A consumer of the logical stream can only read one change at a time
• If our processing of a change takes a lot of time, it will create a backlog of changes
• A backlog means the PostgreSQL server needs to retain more WAL logs
• Retaining too many WAL logs can lead to running out of disk space
• Running out of disk space can lead to...rough times.
The Consumer Bottleneck
49
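On PostgreSQL 10 and later you can watch this backlog build up from SQL; a monitoring sketch that reports how much WAL each slot is holding back:

SELECT
  slot_name,
  pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS retained_bytes
FROM pg_replication_slots;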
🌩
🌤
🌥
☁
Can we move any processing to a
separate part of the application?
• We can utilize a durable message queueing system to store any WAL changes that require post-processing
• Ensure the changes are worked on in order
• "Divide-and-conquer" workload - have multiple workers acting on different "topics"
• Remove WAL bloat
Shifting the Workload
51
• Durable message processing and distribution system
• Streams
• Supports parallelization of consumers
• Multiple consumers, partitions
• Highly-available, distributed architecture
• Acknowledgement of receiving, processing messages; can replay (sounds like WAL?)
Apache Kafka
52
Architecture
53
WAL Consumer
54
import json, sys
from kafka import KafkaProducer
from kafka.errors import KafkaError
import psycopg2
import psycopg2.extras
TABLES = set([
'availability', 'availability_rule', 'room', 'unavailability',
])
reader = WALConsumer()
cursor = reader.connection.cursor()
cursor.start_replication(slot_name='schedule', decode=True)
try:
cursor.consume_stream(reader)
except KeyboardInterrupt:
print("Stopping reader...")
finally:
cursor.close()
reader.connection.close()
print("Exiting reader")
WAL Consumer
55
class WALConsumer(object):
def __init__(self):
self.connection = psycopg2.connect("dbname=realtime",
connection_factory=psycopg2.extras.LogicalReplicationConnection,
)
self.producer = producer = KafkaProducer(
bootstrap_servers=['localhost:9092'],
value_serializer=lambda m: json.dumps(m).encode('ascii'),
)
def __call__(self, msg):
payload = json.loads(msg.payload, strict=False)
print(payload)
# determine if the payload should be passed on to a consumer listening
# to the Kafka queue
for data in payload['change']:
if data.get('table') in TABLES:
self.producer.send(data.get('table'), data)
# ensure everything is sent; call flush at this point
self.producer.flush()
# acknowledge that the change has been read - tells PostgreSQL to stop
# holding onto this log file
msg.cursor.send_feedback(flush_lsn=msg.data_start)
Kafka Consumer
56
import json
from kafka import KafkaConsumer
from kafka.structs import OffsetAndMetadata, TopicPartition
import psycopg2
class Worker(object):
"""Base class to work perform any post processing on changes"""
OPERATIONS = set([]) # override with "insert", "update", "delete"
def __init__(self, topic):
# connect to the PostgreSQL database
self.connection = psycopg2.connect("dbname=realtime")
# connect to Kafka
self.consumer = KafkaConsumer(
bootstrap_servers=['localhost:9092'],
value_deserializer=lambda m: json.loads(m.decode('utf8')),
auto_offset_reset="earliest",
group_id='1')
# subscribe to the topic(s)
self.consumer.subscribe(topic if isinstance(topic, list) else [topic])
Kafka Consumer
57
def run(self):
"""Function that runs ad-infinitum"""
# loop through the payloads from the consumer
# determine if there are any follow-up actions based on the kind of
# operation, and if so, act upon it
# always commit when done.
for msg in self.consumer:
print(msg)
# load the data from the message
data = msg.value
# determine if there are any follow-up operations to perform
if data['kind'] in self.OPERATIONS:
# open up a cursor for interacting with PostgreSQL
cursor = self.connection.cursor()
# put the parameters in an easy to digest format
params = dict(zip(data['columnnames'], data['columnvalues']))
# call the handler for this kind of operation
getattr(self, data['kind'])(cursor, params)
# commit any work that has been done, and close the cursor
self.connection.commit()
cursor.close()
# acknowledge the message has been handled
tp = TopicPartition(msg.topic, msg.partition)
offsets = {tp: OffsetAndMetadata(msg.offset, None)}
self.consumer.commit(offsets=offsets)
Kafka Consumer
58
# override with the appropriate post-processing code
def insert(self, cursor, params):
"""Override with any post-processing to be done on an ``INSERT``"""
raise NotImplementedError()
def update(self, cursor, params):
"""Override with any post-processing to be done on an ``UPDATE``"""
raise NotImplementedError()
def delete(self, cursor, params):
"""Override with any post-processing to be done on a ``DELETE``"""
raise NotImplementedError()
Testing Our Application
• Logical decoding allows the bulk inserts to occur significantly faster from a transactional view
• DELETEs are tricky if you need to do anything other than using the PRIMARY KEY
• Can bucket changes by topic
• Potential bottleneck for long-running execution, but bottlenecks are isolated to specific queues
Lessons
60
Conclusion
61
• PostgreSQL is robust
• Triggers will keep your data in sync but can have significant performance overhead
• Utilizing a logical replication slot can eliminate trigger overhead and transfer the computational load elsewhere
• Not a panacea: still need to use good architectural patterns!
jonathan.katz@crunchydata.com
@jkatz05
Thank You! Questions?
Appendix
Appendix A: Schema for Example
Managing Availability
65
CREATE TABLE room (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE availability_rule (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE,
days_of_week int[] NOT NULL,
start_time time NOT NULL,
end_time time NOT NULL,
generate_weeks_into_future int NOT NULL DEFAULT 52
);
Managing Availability
66
CREATE TABLE availability (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE,
availability_rule_id int NOT NULL
REFERENCES availability_rule (id) ON DELETE CASCADE,
available_date date NOT NULL,
available_range tstzrange NOT NULL
);
CREATE INDEX availability_available_range_gist_idx
ON availability
USING gist(available_range);
Managing Availability
67
CREATE TABLE unavailability (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE,
unavailable_date date NOT NULL,
unavailable_range tstzrange NOT NULL
);
CREATE INDEX unavailability_unavailable_range_gist_idx
ON unavailability
USING gist(unavailable_range);
Managing Availability
68
CREATE TABLE calendar (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE,
status text NOT NULL,
calendar_date date NOT NULL,
calendar_range tstzrange NOT NULL
);
CREATE INDEX calendar_room_id_calendar_date_idx
ON calendar (room_id, calendar_date);
Appendix B:
Finding Availability for a Room
70
/** AVAILABILITY, UNAVAILABILITY, and CALENDAR */
/** We need some lengthy functions to help generate the calendar */
71
/** Helper function: generate the available chunks of time within a block of time for a day within a calendar */
CREATE OR REPLACE FUNCTION calendar_generate_available(room_id int, calendar_range tstzrange)
RETURNS TABLE(status text, calendar_range tstzrange)
AS $$
WITH RECURSIVE availables AS (
SELECT
'closed' AS left_status,
CASE
WHEN availability.id IS NULL THEN tstzrange(calendar_date, calendar_date + '1 day'::interval)
ELSE
tstzrange(
calendar_date,
lower(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval))
)
END AS left_range,
CASE isempty(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval))
WHEN TRUE THEN 'closed'
ELSE 'available'
END AS center_status,
availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval) AS center_range,
'closed' AS right_status,
CASE
WHEN availability.id IS NULL THEN tstzrange(calendar_date, calendar_date + '1 day'::interval)
ELSE
tstzrange(
upper(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval)),
calendar_date + '1 day'::interval
)
END AS right_range
FROM generate_series(lower($2), upper($2), '1 day'::interval) AS calendar_date
LEFT OUTER JOIN availability ON
availability.room_id = $1 AND
availability.available_range && $2
UNION
SELECT
'closed' AS left_status,
CASE
WHEN availability.available_range && availables.left_range THEN
tstzrange(
lower(availables.left_range),
lower(availables.left_range * availability.available_range)
)
ELSE
tstzrange(
lower(availables.right_range),
lower(availables.right_range * availability.available_range)
)
END AS left_range,
CASE
WHEN
availability.available_range && availables.left_range OR
availability.available_range && availables.right_range
THEN 'available'
ELSE 'closed'
END AS center_status,
CASE
WHEN availability.available_range && availables.left_range THEN
availability.available_range * availables.left_range
ELSE
availability.available_range * availables.right_range
END AS center_range,
'closed' AS right_status,
CASE
WHEN availability.available_range && availables.left_range THEN
tstzrange(
upper(availables.left_range * availability.available_range),
upper(availables.left_range)
)
ELSE
tstzrange(
upper(availables.right_range * availability.available_range),
upper(availables.right_range)
)
END AS right_range
FROM availables
JOIN availability ON
availability.room_id = $1 AND
availability.available_range && $2 AND
availability.available_range <> availables.center_range AND (
availability.available_range && availables.left_range OR
availability.available_range && availables.right_range
)
)
SELECT *
FROM (
SELECT
x.left_status AS status,
x.left_range AS calendar_range
FROM availables x
LEFT OUTER JOIN availables y ON
x.left_range <> y.left_range AND
x.left_range @> y.left_range
GROUP BY 1, 2
HAVING NOT bool_or(COALESCE(x.left_range @> y.left_range, FALSE))
UNION
SELECT DISTINCT
x.center_status AS status,
x.center_range AS calendar_range
FROM availables x
UNION
SELECT
x.right_status AS status,
x.right_range AS calendar_range
FROM availables x
LEFT OUTER JOIN availables y ON
x.right_range <> y.right_range AND
x.right_range @> y.right_range
GROUP BY 1, 2
HAVING NOT bool_or(COALESCE(x.right_range @> y.right_range, FALSE))
) x
WHERE
NOT isempty(x.calendar_range) AND
NOT lower_inf(x.calendar_range) AND
NOT upper_inf(x.calendar_range) AND
x.calendar_range <@ $2
$$ LANGUAGE SQL STABLE;
This is the first of two
helper functions...
• We will have two availability rules:
• Open every day 8am - 8pm
• Open every day 9pm - 10:30pm
For this experiment
72
73
INSERT INTO room (name) VALUES ('Test Room');
INSERT INTO availability_rule
(room_id, days_of_week, start_time, end_time)
VALUES
(1, ARRAY[1,2,3,4,5,6,7], '08:00', '20:00'),
(1, ARRAY[1,2,3,4,5,6,7], '21:00', '22:30');
74
/** Helper function: generate the available chunks of time within a
block of time for a day within a calendar */
CREATE OR REPLACE FUNCTION calendar_generate_available(room_id int,
calendar_range tstzrange)
RETURNS TABLE(status text, calendar_range tstzrange)
AS $$
75
WITH RECURSIVE availables AS (
SELECT
'closed' AS left_status,
CASE
WHEN availability.id IS NULL THEN tstzrange(calendar_date, calendar_date + '1
day'::interval)
ELSE
tstzrange(
calendar_date,
lower(availability.available_range * tstzrange(calendar_date, calendar_date +
'1 day'::interval))
)
END AS left_range,
CASE isempty(availability.available_range * tstzrange(calendar_date, calendar_date + '1
day'::interval))
WHEN TRUE THEN 'closed'
ELSE 'available'
END AS center_status,
availability.available_range * tstzrange(calendar_date, calendar_date + '1
day'::interval) AS center_range,
'closed' AS right_status,
CASE
WHEN availability.id IS NULL THEN tstzrange(calendar_date, calendar_date + '1
day'::interval)
ELSE
tstzrange(
upper(availability.available_range * tstzrange(calendar_date, calendar_date +
'1 day'::interval)),
calendar_date + '1 day'::interval
)
END AS right_range
FROM generate_series(lower($2), upper($2), '1 day'::interval) AS calendar_date
LEFT OUTER JOIN availability ON
availability.room_id = $1 AND
availability.available_range && $2
76
77
UNION
SELECT
'closed' AS left_status,
CASE
WHEN availability.available_range && availables.left_range THEN
tstzrange(
lower(availables.left_range),
lower(availables.left_range * availability.available_range)
)
ELSE
tstzrange(
lower(availables.right_range),
lower(availables.right_range * availability.available_range)
)
END AS left_range,
CASE
WHEN
availability.available_range && availables.left_range OR
availability.available_range && availables.right_range
THEN 'available'
ELSE 'closed'
END AS center_status,
CASE
WHEN availability.available_range && availables.left_range THEN
availability.available_range * availables.left_range
ELSE
availability.available_range * availables.right_range
END AS center_range,
'closed' AS right_status,
CASE
WHEN availability.available_range && availables.left_range THEN
tstzrange(
upper(availables.left_range * availability.available_range),
upper(availables.left_range)
)
ELSE
tstzrange(
upper(availables.right_range * availability.available_range),
upper(availables.right_range)
)
END AS right_range
FROM availables
JOIN availability ON
availability.room_id = $1 AND
availability.available_range && $2 AND
availability.available_range <> availables.center_range AND (
availability.available_range && availables.left_range OR
availability.available_range && availables.right_range
))
78
UNION
SELECT
...
FROM availables
JOIN availability ON
availability.room_id = $1 AND
availability.available_range && $2 AND
availability.available_range <> availables.center_range AND (
availability.available_range && availables.left_range OR
availability.available_range && availables.right_range
)
79
80
'closed' AS left_status,
CASE
WHEN availability.available_range && availables.left_range THEN
tstzrange(
lower(availables.left_range),
lower(availables.left_range * availability.available_range)
)
ELSE
tstzrange(
lower(availables.right_range),
lower(availables.right_range * availability.available_range)
)
END AS left_range,
CASE
WHEN
availability.available_range && availables.left_range OR
availability.available_range && availables.right_range
THEN 'available'
ELSE 'closed'
END AS center_status,
CASE
WHEN availability.available_range && availables.left_range THEN
availability.available_range * availables.left_range
ELSE
availability.available_range * availables.right_range
END AS center_range,
'closed' AS right_status,
CASE
WHEN availability.available_range && availables.left_range THEN
tstzrange(
upper(availables.left_range * availability.available_range),
upper(availables.left_range)
)
ELSE
tstzrange(
upper(availables.right_range * availability.available_range),
upper(availables.right_range)
)
END AS right_range
81
82
83
SELECT *
FROM (
SELECT
x.left_status AS status,
x.left_range AS calendar_range
FROM availables x
LEFT OUTER JOIN availables y ON
x.left_range <> y.left_range AND
x.left_range @> y.left_range
GROUP BY 1, 2
HAVING NOT bool_or(COALESCE(x.left_range @> y.left_range, FALSE))
UNION
SELECT DISTINCT
x.center_status AS status,
x.center_range AS calendar_range
FROM availables x
UNION
SELECT
x.right_status AS status,
x.right_range AS calendar_range
FROM availables x
LEFT OUTER JOIN availables y ON
x.right_range <> y.right_range AND
x.right_range @> y.right_range
GROUP BY 1, 2
HAVING NOT bool_or(COALESCE(x.right_range @> y.right_range, FALSE))
) x
WHERE
NOT isempty(x.calendar_range) AND
NOT lower_inf(x.calendar_range) AND
NOT upper_inf(x.calendar_range) AND
x.calendar_range <@ $2
$$ LANGUAGE SQL STABLE;
84
85

More Related Content

What's hot

Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15Jonathan Katz
 
Triton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON ChinaTriton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON ChinaWei-Bo Chen
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Rick Branson
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesDatabricks
 
M|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleM|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleMariaDB plc
 
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Jaime Crespo
 
Version Control & Git
Version Control & GitVersion Control & Git
Version Control & GitJason Byrne
 
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best PracticesAmazon Web Services
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
지금이라도 알게되어 다행인, 새해 계획 잘 세우는 법
지금이라도 알게되어 다행인, 새해 계획 잘 세우는 법지금이라도 알게되어 다행인, 새해 계획 잘 세우는 법
지금이라도 알게되어 다행인, 새해 계획 잘 세우는 법Tony Kim
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...Felix Gessert
 
Trace memory leak with gdb (GDB로 메모리 누수 찾기)
Trace memory  leak with gdb (GDB로 메모리 누수 찾기)Trace memory  leak with gdb (GDB로 메모리 누수 찾기)
Trace memory leak with gdb (GDB로 메모리 누수 찾기)Jun Hong Kim
 
Gitlab Training with GIT and SourceTree
Gitlab Training with GIT and SourceTreeGitlab Training with GIT and SourceTree
Gitlab Training with GIT and SourceTreeTeerapat Khunpech
 
Distributed Ruby and Rails
Distributed Ruby and RailsDistributed Ruby and Rails
Distributed Ruby and RailsWen-Tien Chang
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaPrajal Kulkarni
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
 
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBayReal-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBayAltinity Ltd
 

What's hot (20)

Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15
 
Triton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON ChinaTriton and Symbolic execution on GDB@DEF CON China
Triton and Symbolic execution on GDB@DEF CON China
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using Kubernetes
 
Introduction to Robot Framework
Introduction to Robot FrameworkIntroduction to Robot Framework
Introduction to Robot Framework
 
M|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleM|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScale
 
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
 
Version Control & Git
Version Control & GitVersion Control & Git
Version Control & Git
 
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Git 101
Git 101Git 101
Git 101
 
지금이라도 알게되어 다행인, 새해 계획 잘 세우는 법
지금이라도 알게되어 다행인, 새해 계획 잘 세우는 법지금이라도 알게되어 다행인, 새해 계획 잘 세우는 법
지금이라도 알게되어 다행인, 새해 계획 잘 세우는 법
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
 
Trace memory leak with gdb (GDB로 메모리 누수 찾기)
Trace memory  leak with gdb (GDB로 메모리 누수 찾기)Trace memory  leak with gdb (GDB로 메모리 누수 찾기)
Trace memory leak with gdb (GDB로 메모리 누수 찾기)
 
Gitlab Training with GIT and SourceTree
Gitlab Training with GIT and SourceTreeGitlab Training with GIT and SourceTree
Gitlab Training with GIT and SourceTree
 
Distributed Ruby and Rails
Distributed Ruby and RailsDistributed Ruby and Rails
Distributed Ruby and Rails
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and Kibana
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBayReal-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay
 
Introduction to Git
Introduction to GitIntroduction to Git
Introduction to Git
 

Similar to Build a Complex Data App in PostgreSQL

Scheduling in Linux and Web Servers
Scheduling in Linux and Web ServersScheduling in Linux and Web Servers
Scheduling in Linux and Web ServersDavid Evans
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at ParseTravis Redman
 
Advanced Benchmarking at Parse
Advanced Benchmarking at ParseAdvanced Benchmarking at Parse
Advanced Benchmarking at ParseMongoDB
 
Performance Tuning and Optimization
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and OptimizationMongoDB
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
What's new in JBoss ON 3.2
What's new in JBoss ON 3.2What's new in JBoss ON 3.2
What's new in JBoss ON 3.2Thomas Segismont
 
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsOracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsZohar Elkayam
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevAltinity Ltd
 
PostgreSQL Performance Problems: Monitoring and Alerting
PostgreSQL Performance Problems: Monitoring and AlertingPostgreSQL Performance Problems: Monitoring and Alerting
PostgreSQL Performance Problems: Monitoring and AlertingGrant Fritchey
 
ElasticSearch as (only) datastore
ElasticSearch as (only) datastoreElasticSearch as (only) datastore
ElasticSearch as (only) datastoreTomas Sirny
 
From Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterDatabricks
 

Similar to Build a Complex Data App in PostgreSQL (20)

Scheduling in Linux and Web Servers
Scheduling in Linux and Web ServersScheduling in Linux and Web Servers
Scheduling in Linux and Web Servers
 
Spark etl
Spark etlSpark etl
Spark etl
 
Intro_2.ppt
Intro_2.pptIntro_2.ppt
Intro_2.ppt
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
Data herding
Data herdingData herding
Data herding
 
Data herding
Data herdingData herding
Data herding
 
Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at Parse
 
Advanced Benchmarking at Parse
Advanced Benchmarking at ParseAdvanced Benchmarking at Parse
Advanced Benchmarking at Parse
 
Performance Tuning and Optimization
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and Optimization
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
What's new in JBoss ON 3.2
What's new in JBoss ON 3.2What's new in JBoss ON 3.2
What's new in JBoss ON 3.2
 
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsOracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
PostgreSQL Performance Problems: Monitoring and Alerting
PostgreSQL Performance Problems: Monitoring and AlertingPostgreSQL Performance Problems: Monitoring and Alerting
PostgreSQL Performance Problems: Monitoring and Alerting
 
ElasticSearch as (only) datastore
ElasticSearch as (only) datastoreElasticSearch as (only) datastore
ElasticSearch as (only) datastore
 
From Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim Hunter
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 

More from Jonathan Katz

Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)Jonathan Katz
 
Vectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLVectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLJonathan Katz
 
High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!Jonathan Katz
 
Get Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMGet Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMJonathan Katz
 
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMSafely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMJonathan Katz
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesJonathan Katz
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Jonathan Katz
 
An Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesJonathan Katz
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWJonathan Katz
 
On Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesOn Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesJonathan Katz
 
Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)Jonathan Katz
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesJonathan Katz
 
Indexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data TypesIndexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data TypesJonathan Katz
 

More from Jonathan Katz (13)

Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)
 
Vectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLVectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQL
 
High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!
 
Get Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMGet Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAM
 
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMSafely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with Kubernetes
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018
 
An Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & Kubernetes
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
 
On Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesOn Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data Types
 
Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
 
Indexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data TypesIndexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data Types
 

Recently uploaded

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
Build a Complex Data App in PostgreSQL

  • 15. • Inserting new ranges means dividing existing ones up • PostgreSQL does not work well with discontiguous ranges (...yet) • Availability • Just for one day - what about other days? • What happens with data in the past? • What happens with data in the future? • Unavailability • Ensure no double-bookings • Overlapping events? • Just one space But... 15
  • 16. Managing Availability 16
room: id <serial> PRIMARY KEY; name <text>
availability_rule: id <serial> PRIMARY KEY; room_id <int> REFERENCES (room); days_of_week <int[]>; start_time <time>; end_time <time>; generate_weeks_into_future <int> DEFAULT 52
availability: id <serial> PRIMARY KEY; room_id <int> REFERENCES (room); availability_rule_id <int> REFERENCES (availability_rule); available_date <date>; available_range <tstzrange>
unavailability: id <serial> PRIMARY KEY; room_id <int> REFERENCES (room); unavailable_date <date>; unavailable_range <tstzrange>
calendar: id <serial> PRIMARY KEY; room_id <int> REFERENCES (room); status <text> DOMAIN: {available, unavailable, closed}; calendar_date <date>; calendar_range <tstzrange>
  • 17. • We can now store data, but what about: • Generating initial calendar? • Generating availability based on rules? • Generating unavailability? • Sounds like we need to build an application Managing Availability 17
  • 18. • To build our application, there are a few topics we will need to explore first: • generate_series • Recursive queries • SQL Functions • Set returning functions • PL/pgsql • Triggers Managing Availability 18
  • 19. • Generate series is a "set returning" function, i.e. a function that can return multiple rows of data • Generate series can return: • A set of numbers (int, bigint, numeric) either incremented by 1 or some other integer interval • A set of timestamps incremented by a time interval(!!) generate_series: More than just generating test data 19 SELECT x::date FROM generate_series( '2018-01-01'::date, '2018-12-31'::date, '1 day'::interval ) x;
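Because generate_series can step through timestamps by an interval, it can also carve a day into bookable slots directly. A minimal sketch (not from the deck) that splits one day of operating hours into half-hour slots:

SELECT tstzrange(slot, slot + '30 minutes'::interval) AS slot_range
FROM generate_series(
    '2018-10-26 08:00'::timestamptz,
    '2018-10-26 19:30'::timestamptz,
    '30 minutes'::interval
) AS slot;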
  • 20. • PostgreSQL 8.4 introduced the "WITH" syntax and with it also introduced the ability to perform recursive queries • WITH RECURSIVE ... AS () • Base case vs. recursive case • UNION vs. UNION ALL • CAN HIT INFINITE LOOPS Recursion in my SQL? 20
  • 21. Recursion in my SQL? 21 WITH RECURSIVE fac AS ( SELECT 1::numeric AS n, 1::numeric AS i UNION SELECT fac.n * (fac.i + 1), fac.i + 1 AS i FROM fac ) SELECT fac.n, fac.i FROM fac; Nope
  • 22. Recursion in my SQL? 22 WITH RECURSIVE fac AS ( SELECT 1::numeric AS n, 1::numeric AS i UNION SELECT fac.n * (fac.i + 1), fac.i + 1 AS i FROM fac WHERE i + 1 <= 100 ) SELECT fac.n, fac.i FROM fac; Better
  • 23. • PostgreSQL provides the ability to write functions to help encapsulate repeated behavior • PostgreSQL 11 introduces stored procedures, which enable you to embed transactions! • SQL functions have many properties, including: • Input / output • Volatility (IMMUTABLE, STABLE, VOLATILE) (default VOLATILE) • Parallel safety (default PARALLEL UNSAFE) • LEAKPROOF; SECURITY DEFINER • Execution cost • Language type (more on this later) Functions 23
  • 24. Functions 24
CREATE OR REPLACE FUNCTION pgconfeu_fac(n int)
RETURNS numeric
AS $$
    WITH RECURSIVE fac AS (
        SELECT 1::numeric AS n, 1::numeric AS i
        UNION
        SELECT fac.n * (fac.i + 1), fac.i + 1 AS i
        FROM fac
        WHERE i + 1 <= $1
    )
    SELECT max(fac.n)
    FROM fac;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
  • 25. Functions 25
CREATE OR REPLACE FUNCTION pgconfeu_fac_set(n int)
RETURNS SETOF numeric
AS $$
    WITH RECURSIVE fac AS (
        SELECT 1::numeric AS n, 1::numeric AS i
        UNION
        SELECT fac.n * (fac.i + 1), fac.i + 1 AS i
        FROM fac
        WHERE i + 1 <= $1
    )
    SELECT fac.n
    FROM fac
    ORDER BY fac.n;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
  • 26. Functions 26
CREATE OR REPLACE FUNCTION pgopen_fac_table(num int)
RETURNS TABLE(n numeric)
AS $$
    WITH RECURSIVE fac AS (
        SELECT 1::numeric AS n, 1::numeric AS i
        UNION
        SELECT fac.n * (fac.i + 1), fac.i + 1 AS i
        FROM fac
        WHERE i + 1 <= $1
    )
    SELECT fac.n
    FROM fac
    ORDER BY fac.n;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
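The three variants above are invoked differently: a scalar function in the SELECT list, and set-returning functions in the FROM clause. A quick usage sketch, assuming the functions were created as shown:

SELECT pgconfeu_fac(5);            -- one numeric value: 120
SELECT * FROM pgconfeu_fac_set(5); -- five rows: 1, 2, 6, 24, 120
SELECT * FROM pgopen_fac_table(5); -- same five rows, with the column named "n"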
  • 27. • PostgreSQL has the ability to load in procedural languages and execute code in them beyond SQL • "PL" • Built-in: pgSQL, Python, Perl, Tcl • Others: Javascript, R, Java, C, JVM, Container, LOLCODE, Ruby, PHP, Lua, pgPSM, Scheme Procedural Languages 27
  • 28. PL/pgSQL 28
CREATE EXTENSION IF NOT EXISTS plpgsql;

CREATE OR REPLACE FUNCTION pgopen_fac_plpgsql(n int)
RETURNS numeric
AS $$
DECLARE
    fac numeric;
    i int;
BEGIN
    fac := 1;
    FOR i IN 1..n LOOP
        fac := fac * i;
    END LOOP;
    RETURN fac;
END;
$$ LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE;
  • 29. • Triggers are functions that can be called before/after/instead of an operation or event • Data changes (INSERT/UPDATE/DELETE) • Events (DDL, DCL, etc. changes) • Atomic • Must return "trigger" or "event_trigger" • (Return "NULL" in a trigger if you want to skip the operation) • (Gotcha: in a row trigger, OLD is NULL for INSERT and NEW is NULL for DELETE, so returning the wrong one silently skips the operation) • Execute once per modified row or once per SQL statement • Multiple triggers on the same event will execute in alphabetical order • Writeable in any PL language that defines the trigger interface Triggers 29
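As a minimal illustration of the row-trigger pattern (a sketch, not part of the talk's schema; the updated_at column here is hypothetical):

CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now(); -- modify the row before it is written
    RETURN NEW;              -- returning NULL here would skip the UPDATE
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER room_set_updated_at
    BEFORE UPDATE ON room
    FOR EACH ROW EXECUTE PROCEDURE set_updated_at();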
  • 31. We will scan through the application code. It will be available for download later ;-)
  • 33. • [Test your live demos before running them, and you will have much success!] • availability_rule inserts took some time, > 500ms • availability: INSERT 52 • calendar: INSERT 52 from nontrivial function • Updates on individual availability / unavailability are not too painful • Lookups are faaaaaaaast Lessons of The Test 33
  • 34. How about at (web) scale?
  • 35. • Even with only 100 more rooms with a few set of rules, rule generation time increased significantly • Lookups are still lightning fast! Web Scale :( 35
  • 36. • Added in PostgreSQL 9.4 • Replays all logical changes made to the database • Create a logical replication slot in your database • Only one receiver can consume changes from one slot at a time • Slot keeps track of last change that was read by a receiver • If receiver disconnects, slot will ensure database holds changes until receiver reconnects • Only changes from tables with primary keys are relayed • As of PostgreSQL 10, you can set a "REPLICA IDENTITY" on a UNIQUE, NOT NULL, non-deferrable, non-partial column(s) • Basis for Logical Replication Logical Decoding 36
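A sketch of setting a replica identity (the talk's tables already have primary keys, so this is only needed for tables without one; the index name below is hypothetical):

-- stream the entire old row on UPDATE/DELETE
ALTER TABLE calendar REPLICA IDENTITY FULL;

-- or point the identity at a UNIQUE, NOT NULL, non-deferrable, non-partial index
ALTER TABLE room REPLICA IDENTITY USING INDEX room_name_key;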
  • 37. • A logical replication slot has a name and an output plugin • PostgreSQL comes with the "test" output plugin • Have to write a custom parser to read changes from test output plugin • Several output plugins and libraries available • wal2json: https://github.com/eulerto/wal2json • jsoncdc: https://github.com/posix4e/jsoncdc • Debezium: http://debezium.io/ • (Test: https://www.postgresql.org/docs/11/static/test-decoding.html) • Every change in the database is streamed • Need to be aware of the logical decoding format Logical Decoding Out of the Box 37
  • 38. • C: libpq • pg_recvlogical • PostgreSQL functions • Python: psycopg2 - version 2.7 • JDBC: version 42 • Go: go-pgx • JavaScript: node-postgres (pg-logical-replication) Driver Support 38
  • 39. Using Logical Decoding 39
postgresql.conf:
wal_level = logical
max_wal_senders = 2
max_replication_slots = 2

pg_hba.conf:
local replication jkatz trust # DEVELOPMENT ONLY

In the database:
SELECT * FROM pg_create_logical_replication_slot('schedule', 'wal2json');
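Once the slot exists, changes can be inspected straight from SQL before wiring up a client. A sketch, assuming the wal2json slot above: pg_logical_slot_peek_changes leaves the changes in the slot, while pg_logical_slot_get_changes consumes them.

INSERT INTO room (name) VALUES ('Demo Room');

SELECT lsn, xid, data
FROM pg_logical_slot_peek_changes('schedule', NULL, NULL);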
  • 40. • We know it takes time to regenerate calendar • Want to ensure changes always propagate but want to ensure all users (managers, calendar searchers) have good experience Thoughts 40 🤔
  • 41. • Will use the same data model as before as well as the same helper functions, but without the triggers • (That's a lie, we will have one set of DELETE triggers as "DELETE" in the wal2json output plugin currently does not provide enough information) Replacing Triggers 41
  • 42. Replacing Triggers 42
/**
 * Helper function: substitute the data within the `calendar`; this can be used
 * for all updates that occur on `availability` and `unavailability`
 */
CREATE OR REPLACE FUNCTION calendar_manage(room_id int, calendar_date date)
RETURNS void
AS $$
    WITH delete_calendar AS (
        DELETE FROM calendar
        WHERE room_id = $1 AND calendar_date = $2
    )
    INSERT INTO calendar (room_id, status, calendar_date, calendar_range)
    SELECT $1, c.status, $2, c.calendar_range
    FROM calendar_generate_calendar($1, tstzrange($2, $2 + 1)) c
$$ LANGUAGE SQL;
  • 43. Replacing Triggers 43
/** Now, the trigger functions for availability and unavailability; needs this for DELETE */
CREATE OR REPLACE FUNCTION availability_manage() RETURNS trigger AS $trigger$
BEGIN
    IF TG_OP = 'DELETE' THEN
        PERFORM calendar_manage(OLD.room_id, OLD.available_date);
        RETURN OLD;
    END IF;
END;
$trigger$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION unavailability_manage() RETURNS trigger AS $trigger$
BEGIN
    IF TG_OP = 'DELETE' THEN
        PERFORM calendar_manage(OLD.room_id, OLD.unavailable_date);
        RETURN OLD;
    END IF;
END;
$trigger$ LANGUAGE plpgsql;

/** And the triggers, applied to everything */
CREATE TRIGGER availability_manage AFTER DELETE ON availability
    FOR EACH ROW EXECUTE PROCEDURE availability_manage();
CREATE TRIGGER unavailability_manage AFTER DELETE ON unavailability
    FOR EACH ROW EXECUTE PROCEDURE unavailability_manage();
  • 44. • We will have a Python script that reads from a logical replication slot and if it detects a relevant change, take an action • Similar to what we did with triggers, but this moves the work to OUTSIDE the transaction • BUT...we can confirm whether or not the work is completed, thus if the program fails, we can restart from last acknowledged transaction ID Replacing Triggers 44
  • 45. Reading the Changes 45
import json
import sys

import psycopg2
import psycopg2.extras

SQL = {
    'availability': {
        'insert': """SELECT calendar_manage(%(room_id)s, %(available_date)s)""",
        'update': """SELECT calendar_manage(%(room_id)s, %(available_date)s)""",
    },
    'availability_rule': {
        'insert': True,
        'update': True,
    },
    'room': {
        'insert': """
            INSERT INTO calendar (room_id, status, calendar_date, calendar_range)
            SELECT %(id)s, 'closed', calendar_date,
                tstzrange(calendar_date, calendar_date + '1 day'::interval)
            FROM generate_series(
                date_trunc('week', CURRENT_DATE),
                date_trunc('week', CURRENT_DATE + '52 weeks'::interval),
                '1 day'::interval
            ) calendar_date;
        """,
    },
    'unavailability': {
        'insert': """SELECT calendar_manage(%(room_id)s, %(unavailable_date)s)""",
        'update': """SELECT calendar_manage(%(room_id)s, %(unavailable_date)s)""",
    },
}
  • 46. Reading the Changes 46
class StreamReader(object):
    def _consume_change(self, payload):
        connection = psycopg2.connect("dbname=realtime")
        cursor = connection.cursor()
        for data in payload['change']:
            sql = SQL.get(data.get('table'), {}).get(data.get('kind'))
            if not sql:
                continue  # skip tables/operations we have no SQL for
            params = dict(zip(data['columnnames'], data['columnvalues']))
            if data['table'] == 'availability_rule':
                self._perform_availability_rule(cursor, data['kind'], params)
            else:
                cursor.execute(sql, params)
        connection.commit()
        cursor.close()
        connection.close()

    def _perform_availability_rule(self, cursor, kind, params):
        if kind == 'update':
            cursor.execute("""DELETE FROM availability WHERE availability_rule_id = %(id)s""", params)
        if kind in ['insert', 'update']:
            days_of_week = params['days_of_week'].replace('{', '').replace('}', '').split(',')
            for day_of_week in days_of_week:
                params['day_of_week'] = day_of_week
                cursor.execute(
                    """
                    SELECT availability_rule_bulk_insert(ar, %(day_of_week)s)
                    FROM availability_rule ar
                    WHERE ar.id = %(id)s
                    """, params)
  • 47. Reading the Changes 47
    def __init__(self):
        self.connection = psycopg2.connect("dbname=schedule",
            connection_factory=psycopg2.extras.LogicalReplicationConnection,
        )

    def __call__(self, msg):
        payload = json.loads(msg.payload, strict=False)
        print(payload)
        self._consume_change(payload)
        msg.cursor.send_feedback(flush_lsn=msg.data_start)
  • 48. Reading the Changes 48
reader = StreamReader()
cursor = reader.connection.cursor()
cursor.start_replication(slot_name='schedule', decode=True)
try:
    cursor.consume_stream(reader)
except KeyboardInterrupt:
    print("Stopping reader...")
finally:
    cursor.close()
    reader.connection.close()
    print("Exiting reader")
  • 49. • A consumer of the logical stream can only read one change at a time • If our processing of a change takes a lot of time, it will create a backlog of changes • Backlog means the PostgreSQL server needs to retain more WAL logs • Retaining too many WAL logs can lead to running out of disk space • Running out of disk space can lead to...rough times. The Consumer Bottleneck 49 🌩 🌤 🌥 ☁
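One way to see that backlog forming is to measure how far each slot's confirmed position trails the current WAL position (a monitoring sketch using the PostgreSQL 10+ function names):

SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS bytes_behind
FROM pg_replication_slots;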
  • 50. Can we move any processing to a separate part of the application?
  • 51. • Can utilize a durable message queueing system to store any WAL changes that are necessary to perform post-processing on • Ensure the changes are worked on in order • "Divide-and-conquer" workload - have multiple workers acting on different "topics" • Remove WAL bloat Shifting the Workload 51
  • 52. • Durable message processing and distribution system • Streams • Supports parallelization of consumers • Multiple consumers, partitions • Highly-available, distributed architecture • Acknowledgement of receiving, processing messages; can replay (sounds like WAL?) Apache Kafka 52
  • 54. WAL Consumer 54
import json
import sys

from kafka import KafkaProducer
from kafka.errors import KafkaError
import psycopg2
import psycopg2.extras

TABLES = set([
    'availability',
    'availability_rule',
    'room',
    'unavailability',
])

reader = WALConsumer()
cursor = reader.connection.cursor()
cursor.start_replication(slot_name='schedule', decode=True)
try:
    cursor.consume_stream(reader)
except KeyboardInterrupt:
    print("Stopping reader...")
finally:
    cursor.close()
    reader.connection.close()
    print("Exiting reader")
  • 55. WAL Consumer 55
class WALConsumer(object):
    def __init__(self):
        self.connection = psycopg2.connect("dbname=realtime",
            connection_factory=psycopg2.extras.LogicalReplicationConnection,
        )
        self.producer = KafkaProducer(
            bootstrap_servers=['localhost:9092'],
            value_serializer=lambda m: json.dumps(m).encode('ascii'),
        )

    def __call__(self, msg):
        payload = json.loads(msg.payload, strict=False)
        print(payload)
        # determine if the payload should be passed on to a consumer listening
        # to the Kafka queue
        for data in payload['change']:
            if data.get('table') in TABLES:
                self.producer.send(data.get('table'), data)
        # ensure everything is sent; call flush at this point
        self.producer.flush()
        # acknowledge that the change has been read - tells PostgreSQL to stop
        # holding onto this log file
        msg.cursor.send_feedback(flush_lsn=msg.data_start)
  • 56. Kafka Consumer 56
import json

from kafka import KafkaConsumer
from kafka.structs import OffsetAndMetadata, TopicPartition
import psycopg2

class Worker(object):
    """Base class to perform any post-processing on changes"""
    OPERATIONS = set([])  # override with "insert", "update", "delete"

    def __init__(self, topic):
        # connect to the PostgreSQL database
        self.connection = psycopg2.connect("dbname=realtime")
        # connect to Kafka
        self.consumer = KafkaConsumer(
            bootstrap_servers=['localhost:9092'],
            value_deserializer=lambda m: json.loads(m.decode('utf8')),
            auto_offset_reset="earliest",
            group_id='1')
        # subscribe to the topic(s)
        self.consumer.subscribe(topic if isinstance(topic, list) else [topic])
  • 57. Kafka Consumer 57
    def run(self):
        """Function that runs ad infinitum"""
        # loop through the payloads from the consumer;
        # determine if there are any follow-up actions based on the kind of
        # operation, and if so, act upon it;
        # always commit when done
        for msg in self.consumer:
            print(msg)
            # load the data from the message
            data = msg.value
            # determine if there are any follow-up operations to perform
            if data['kind'] in self.OPERATIONS:
                # open up a cursor for interacting with PostgreSQL
                cursor = self.connection.cursor()
                # put the parameters in an easy-to-digest format
                params = dict(zip(data['columnnames'], data['columnvalues']))
                # call the function
                getattr(self, data['kind'])(cursor, params)
                # commit any work that has been done, and close the cursor
                self.connection.commit()
                cursor.close()
            # acknowledge the message has been handled
            tp = TopicPartition(msg.topic, msg.partition)
            offsets = {tp: OffsetAndMetadata(msg.offset, None)}
            self.consumer.commit(offsets=offsets)
  • 58. Kafka Consumer 58
    # override with the appropriate post-processing code
    def insert(self, cursor, params):
        """Override with any post-processing to be done on an ``INSERT``"""
        raise NotImplementedError()

    def update(self, cursor, params):
        """Override with any post-processing to be done on an ``UPDATE``"""
        raise NotImplementedError()

    def delete(self, cursor, params):
        """Override with any post-processing to be done on a ``DELETE``"""
        raise NotImplementedError()
  • 60. • Logical decoding allows the bulk inserts to occur significantly faster from a transactional view • DELETEs are tricky if you need to do anything other than using the PRIMARY KEY • Can bucket changes by topic • Potential bottleneck for long running execution, but bottlenecks are isolated to specific queues Lessons 60
  • 61. Conclusion 61 • PostgreSQL is robust • Triggers will keep your data in sync but can have significant performance overhead • Utilizing a logical replication slot can eliminate trigger overhead and transfer the computational load elsewhere • Not a panacea: still need to use good architectural patterns!
  • 64. Appendix A: Schema for Example
  • 65. Managing Availability 65
CREATE TABLE room (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE availability_rule (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE,
    days_of_week int[] NOT NULL,
    start_time time NOT NULL,
    end_time time NOT NULL,
    generate_weeks_into_future int NOT NULL DEFAULT 52
);
  • 66. Managing Availability 66
CREATE TABLE availability (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE,
    availability_rule_id int NOT NULL REFERENCES availability_rule (id) ON DELETE CASCADE,
    available_date date NOT NULL,
    available_range tstzrange NOT NULL
);
CREATE INDEX availability_available_range_gist_idx ON availability USING gist(available_range);
  • 67. Managing Availability 67
CREATE TABLE unavailability (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE,
    unavailable_date date NOT NULL,
    unavailable_range tstzrange NOT NULL
);
CREATE INDEX unavailability_unavailable_range_gist_idx ON unavailability USING gist(unavailable_range);
  • 68. Managing Availability 68
CREATE TABLE calendar (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    room_id int NOT NULL REFERENCES room (id) ON DELETE CASCADE,
    status text NOT NULL,
    calendar_date date NOT NULL,
    calendar_range tstzrange NOT NULL
);
CREATE INDEX calendar_room_id_calendar_date_idx ON calendar (room_id, calendar_date);
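With this schema in place, the fast lookups measured earlier are plain range-overlap queries. A sketch of finding a room's available blocks for one morning:

SELECT calendar_range
FROM calendar
WHERE room_id = 1
  AND status = 'available'
  AND calendar_range && tstzrange('2018-10-26 08:00', '2018-10-26 12:00');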
  • 70. 70 /** AVAILABILITY, UNAVAILABILITY, and CALENDAR */ /** We need some lengthy functions to help generate the calendar */
  • 71. 71
/** Helper function: generate the available chunks of time within a block of time
    for a day within a calendar */
CREATE OR REPLACE FUNCTION calendar_generate_available(room_id int, calendar_range tstzrange)
RETURNS TABLE(status text, calendar_range tstzrange)
AS $$
WITH RECURSIVE availables AS (
    SELECT
        'closed' AS left_status,
        CASE WHEN availability.id IS NULL
            THEN tstzrange(calendar_date, calendar_date + '1 day'::interval)
            ELSE tstzrange(
                calendar_date,
                lower(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval))
            )
        END AS left_range,
        CASE isempty(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval))
            WHEN TRUE THEN 'closed'
            ELSE 'available'
        END AS center_status,
        availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval) AS center_range,
        'closed' AS right_status,
        CASE WHEN availability.id IS NULL
            THEN tstzrange(calendar_date, calendar_date + '1 day'::interval)
            ELSE tstzrange(
                upper(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval)),
                calendar_date + '1 day'::interval
            )
        END AS right_range
    FROM generate_series(lower($2), upper($2), '1 day'::interval) AS calendar_date
    LEFT OUTER JOIN availability ON
        availability.room_id = $1 AND
        availability.available_range && $2
    UNION
    SELECT
        'closed' AS left_status,
        CASE WHEN availability.available_range && availables.left_range
            THEN tstzrange(
                lower(availables.left_range),
                lower(availables.left_range * availability.available_range)
            )
            ELSE tstzrange(
                lower(availables.right_range),
                lower(availables.right_range * availability.available_range)
            )
        END AS left_range,
        CASE WHEN availability.available_range && availables.left_range OR
                  availability.available_range && availables.right_range
            THEN 'available'
            ELSE 'closed'
        END AS center_status,
        CASE WHEN availability.available_range && availables.left_range
            THEN availability.available_range * availables.left_range
            ELSE availability.available_range * availables.right_range
        END AS center_range,
        'closed' AS right_status,
        CASE WHEN availability.available_range && availables.left_range
            THEN tstzrange(
                upper(availables.left_range * availability.available_range),
                upper(availables.left_range)
            )
            ELSE tstzrange(
                upper(availables.right_range * availability.available_range),
                upper(availables.right_range)
            )
        END AS right_range
    FROM availables
    JOIN availability ON
        availability.room_id = $1 AND
        availability.available_range && $2 AND
        availability.available_range <> availables.center_range AND (
            availability.available_range && availables.left_range OR
            availability.available_range && availables.right_range
        )
)
SELECT *
FROM (
    SELECT x.left_status AS status, x.left_range AS calendar_range
    FROM availables x
    LEFT OUTER JOIN availables y ON
        x.left_range <> y.left_range AND
        x.left_range @> y.left_range
    GROUP BY 1, 2
    HAVING NOT bool_or(COALESCE(x.left_range @> y.left_range, FALSE))
    UNION
    SELECT DISTINCT x.center_status AS status, x.center_range AS calendar_range
    FROM availables x
    UNION
    SELECT x.right_status AS status, x.right_range AS calendar_range
    FROM availables x
    LEFT OUTER JOIN availables y ON
        x.right_range <> y.right_range AND
        x.right_range @> y.right_range
    GROUP BY 1, 2
    HAVING NOT bool_or(COALESCE(x.right_range @> y.right_range, FALSE))
) x
WHERE
    NOT isempty(x.calendar_range) AND
    NOT lower_inf(x.calendar_range) AND
    NOT upper_inf(x.calendar_range) AND
    x.calendar_range <@ $2
$$ LANGUAGE SQL STABLE;
This is the first of two helper functions...
  • 72. • We will have two availability rules: • Open every day 8am - 8pm • Open every day 9pm - 10:30pm For this experiment 72
  • 73. 73
INSERT INTO room (name) VALUES ('Test Room');

INSERT INTO availability_rule (room_id, days_of_week, start_time, end_time) VALUES
    (1, ARRAY[1,2,3,4,5,6,7], '08:00', '20:00'),
    (1, ARRAY[1,2,3,4,5,6,7], '21:00', '22:30');
  • 74. 74
/** Helper function: generate the available chunks of time within a block of time
    for a day within a calendar */
CREATE OR REPLACE FUNCTION calendar_generate_available(room_id int, calendar_range tstzrange)
RETURNS TABLE(status text, calendar_range tstzrange)
AS $$
  • 75. 75
WITH RECURSIVE availables AS (
    SELECT
        'closed' AS left_status,
        CASE WHEN availability.id IS NULL
            THEN tstzrange(calendar_date, calendar_date + '1 day'::interval)
            ELSE tstzrange(
                calendar_date,
                lower(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval))
            )
        END AS left_range,
        CASE isempty(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval))
            WHEN TRUE THEN 'closed'
            ELSE 'available'
        END AS center_status,
        availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval) AS center_range,
        'closed' AS right_status,
        CASE WHEN availability.id IS NULL
            THEN tstzrange(calendar_date, calendar_date + '1 day'::interval)
            ELSE tstzrange(
                upper(availability.available_range * tstzrange(calendar_date, calendar_date + '1 day'::interval)),
                calendar_date + '1 day'::interval
            )
        END AS right_range
    FROM generate_series(lower($2), upper($2), '1 day'::interval) AS calendar_date
    LEFT OUTER JOIN availability ON
        availability.room_id = $1 AND
        availability.available_range && $2
  • 77. 77
    UNION
    SELECT
        'closed' AS left_status,
        CASE WHEN availability.available_range && availables.left_range
            THEN tstzrange(
                lower(availables.left_range),
                lower(availables.left_range * availability.available_range)
            )
            ELSE tstzrange(
                lower(availables.right_range),
                lower(availables.right_range * availability.available_range)
            )
        END AS left_range,
        CASE WHEN availability.available_range && availables.left_range OR
                  availability.available_range && availables.right_range
            THEN 'available'
            ELSE 'closed'
        END AS center_status,
        CASE WHEN availability.available_range && availables.left_range
            THEN availability.available_range * availables.left_range
            ELSE availability.available_range * availables.right_range
        END AS center_range,
        'closed' AS right_status,
        CASE WHEN availability.available_range && availables.left_range
            THEN tstzrange(
                upper(availables.left_range * availability.available_range),
                upper(availables.left_range)
            )
            ELSE tstzrange(
                upper(availables.right_range * availability.available_range),
                upper(availables.right_range)
            )
        END AS right_range
    FROM availables
    JOIN availability ON
        availability.room_id = $1 AND
        availability.available_range && $2 AND
        availability.available_range <> availables.center_range AND (
            availability.available_range && availables.left_range OR
            availability.available_range && availables.right_range
        )
)
  • 78. 78
    UNION
    SELECT ...
    FROM availables
    JOIN availability ON
        availability.room_id = $1 AND
        availability.available_range && $2 AND
        availability.available_range <> availables.center_range AND (
            availability.available_range && availables.left_range OR
            availability.available_range && availables.right_range
        )
  • 80. 80
'closed' AS left_status,
CASE WHEN availability.available_range && availables.left_range
    THEN tstzrange(
        lower(availables.left_range),
        lower(availables.left_range * availability.available_range)
    )
    ELSE tstzrange(
        lower(availables.right_range),
        lower(availables.right_range * availability.available_range)
    )
END AS left_range,
CASE WHEN availability.available_range && availables.left_range OR
          availability.available_range && availables.right_range
    THEN 'available'
    ELSE 'closed'
END AS center_status,
CASE WHEN availability.available_range && availables.left_range
    THEN availability.available_range * availables.left_range
    ELSE availability.available_range * availables.right_range
END AS center_range,
'closed' AS right_status,
CASE WHEN availability.available_range && availables.left_range
    THEN tstzrange(
        upper(availables.left_range * availability.available_range),
        upper(availables.left_range)
    )
    ELSE tstzrange(
        upper(availables.right_range * availability.available_range),
        upper(availables.right_range)
    )
END AS right_range
  • 83. 83
SELECT *
FROM (
    SELECT x.left_status AS status, x.left_range AS calendar_range
    FROM availables x
    LEFT OUTER JOIN availables y ON
        x.left_range <> y.left_range AND
        x.left_range @> y.left_range
    GROUP BY 1, 2
    HAVING NOT bool_or(COALESCE(x.left_range @> y.left_range, FALSE))
    UNION
    SELECT DISTINCT x.center_status AS status, x.center_range AS calendar_range
    FROM availables x
    UNION
    SELECT x.right_status AS status, x.right_range AS calendar_range
    FROM availables x
    LEFT OUTER JOIN availables y ON
        x.right_range <> y.right_range AND
        x.right_range @> y.right_range
    GROUP BY 1, 2
    HAVING NOT bool_or(COALESCE(x.right_range @> y.right_range, FALSE))
) x
WHERE
    NOT isempty(x.calendar_range) AND
    NOT lower_inf(x.calendar_range) AND
    NOT upper_inf(x.calendar_range) AND
    x.calendar_range <@ $2
$$ LANGUAGE SQL STABLE;