Cybertec Training: Data Analysis
Hans-Jürgen Schönig
www.postgresql-support.de
Introduction
Scope of this training
Importing data
Simple aggregations
Windowing and analytics
Analyzing time series
Managing incomplete data
Writing custom aggregates
Importing data
Loading data
Things to consider when importing data
There are many ways to import data
Avoid mini-transactions for performance reasons
In case of large data sets speed is a major issue
There is a life after importing
Importing a simple data set
A simple data structure . . .
test=# CREATE TABLE t_test (a int, b int);
CREATE TABLE
Let us add 10,000 rows now . . .
INSERT INTO t_test VALUES (1, 2);
...
INSERT INTO t_test VALUES (1, 2);
Using one transaction to import things
BEGIN;
INSERT INTO t_test VALUES (1, 2);
...
INSERT INTO t_test VALUES (1, 2);
COMMIT;
Observations
Performance can vary depending on hardware
Longer transactions can be much faster
PostgreSQL has to flush every transaction to disk
Most of the time is burned by flushing
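A minimal sketch of the same idea, assuming the t_test table from above: a multi-row VALUES list or an INSERT ... SELECT packs many rows into one statement inside one transaction, so PostgreSQL only has to flush once at COMMIT.

```sql
BEGIN;
-- several rows per statement instead of one INSERT per row
INSERT INTO t_test VALUES (1, 2), (1, 2), (1, 2);
-- or generate the 10,000 test rows in a single statement
INSERT INTO t_test SELECT 1, 2 FROM generate_series(1, 10000);
COMMIT;
```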
Changing durability requirements
Performance will skyrocket . . .
SET synchronous_commit TO off;
INSERT INTO t_test VALUES (1, 2);
...
INSERT INTO t_test VALUES (1, 2);
The reason is that PostgreSQL does not have to flush every
transaction anymore.
Trading “durability” for performance
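If only one import should give up durability, the setting can be limited to a single transaction. A sketch, assuming the t_test table from above; SET LOCAL reverts automatically at the end of the transaction:

```sql
BEGIN;
-- affects only this transaction
SET LOCAL synchronous_commit TO off;
INSERT INTO t_test VALUES (1, 2);
COMMIT;
-- the session-level value is in effect again
SHOW synchronous_commit;
```

A crash may lose the most recently committed transactions, but it does not leave the data files inconsistent.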
Use bulk loads
Loading single rows is usually a bad idea
Use COPY to do bulk loading
COPY can load data A LOT faster than INSERT due to
significantly smaller overhead
A simple COPY
This time 10 million lines are imported (10,000 rows are not
enough):
COPY t_test FROM stdin;
1 2
1 2
...
\.
Note the performance difference (rows per second). There is no
need to check column lists, the existence of the table, etc. for
every single row anymore, which results in higher throughput.
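A sketch of loading from a file instead of stdin. The path /tmp/file.txt is an assumption; the file has to be readable by the database server process and contain tab-separated values (the default COPY text format).

```sql
-- server-side bulk load with an explicit column list
COPY t_test (a, b) FROM '/tmp/file.txt' WITH (FORMAT text);

-- verify how many rows arrived
SELECT count(*) FROM t_test;
```

From psql, the \copy meta-command does the same thing but reads the file on the client side.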
COPY: Observations to be made
In the default configuration (checkpoint_segments = 3; PostgreSQL 9.5
and later use max_wal_size instead) you
will see a steady up and down of I/O speed
This is caused by checkpoints happening in the background
Data has to go to the transaction log to “repair” data files in
case of a crash
Performance is limited by writing data twice
COPY: Using a transaction log bypass
Writing to the transaction log can be avoided in some cases:
BEGIN;
TRUNCATE t_test;
COPY t_test FROM stdin;
...
COMMIT;
COPY: Why the bypass works
TRUNCATE will schedule the removal of the data file on
COMMIT
COPY will start writing to a new data file
Concurrency is not an issue because TRUNCATE locks the
table
On COMMIT PostgreSQL keeps the new data file, on ROLLBACK it keeps
the old one
There is no need to actually repair a data file anymore
COPY: More on WAL-bypassing
BEGIN;
CREATE TABLE ...
WAL bypassing only works with wal_level = minimal, i.e. if you are
not using WAL archiving or streaming replication
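The same bypass applies to a table that is created inside the loading transaction. A minimal sketch, assuming a hypothetical table t_test_new and a hypothetical input file /tmp/file.txt (neither is part of the training data):

```sql
-- the bypass requires 'minimal'
SHOW wal_level;

BEGIN;
CREATE TABLE t_test_new (a int, b int);
-- the table did not exist before this transaction,
-- so no other session can see or touch it yet
COPY t_test_new FROM '/tmp/file.txt';
COMMIT;
```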
Freezing rows
Before you import data, run . . .
ALTER TABLE t_test
SET (autovacuum_enabled = off);
After the import, compare the timing of the first
SELECT count(*) FROM t_test;
with the second one.
Observations
The first run is a lot slower
During the first run, writes will happen
No more writes from the second run on
PostgreSQL sets bits in the background
The purpose of hint bits
On the first read PostgreSQL checks whether a row can be seen by
everybody.
A hint bit is then set to make sure that PostgreSQL does not have to
go through expensive visibility checks next time.
This is an issue for big data sets
Fixing after-import performance
To set hint bits straight away do a
test=# COPY t_test FROM '/tmp/file.txt' FREEZE;
Be careful. It only works if the table was created or truncated in the
current transaction. Otherwise . . .
ERROR: cannot perform FREEZE because the table
was not created or truncated in the
current subtransaction
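A sketch of a sequence that satisfies this requirement, assuming the t_test table and the /tmp/file.txt path used above. TRUNCATE and COPY run in the same transaction, so FREEZE is accepted:

```sql
BEGIN;
-- truncating in this transaction makes FREEZE legal
TRUNCATE t_test;
COPY t_test FROM '/tmp/file.txt' WITH (FREEZE);
COMMIT;
```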
VACUUM and hint bits
Bits can be set on an entire block as well (not just on rows)
VACUUM will set those hint bits
However, block-level bits usually do not speed things up as much
as row-level bits
For a heavily used read-only database system, vacuuming data
can actually make sense (to set these bits, not to reclaim space)
Importing some test data:
Here is some test data:
CREATE TABLE t_oil (country text,
year int,
production int);
COPY t_oil FROM PROGRAM
'curl www.cybertec.at/secret/oil.txt';
Simple aggregations
Basic aggregation
test=# SELECT country, avg(production)
FROM t_oil
GROUP BY 1;
    country    |          avg
---------------+-----------------------
 USA           | 9141.3478260869565217
 Saudi Arabien | 7641.8260869565217391
(2 rows)
GROUP BY is needed for aggregates
A GROUP BY clause is needed because otherwise groups
cannot be built:
test=# SELECT country, avg(production) FROM t_oil;
ERROR: column "t_oil.country" must appear in the
GROUP BY clause or be used in an
aggregate function
LINE 1: SELECT country, avg(production) FROM t_oil;
HAVING: Filtering on aggregated data
test=# SELECT country, avg(production)
FROM t_oil
GROUP BY 1
HAVING avg(production) > 8000;
 country |          avg
---------+-----------------------
 USA     | 9141.3478260869565217
(1 row)
NOTE that an alias is not allowed in a HAVING clause
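If the alias should be reusable, one option is to move the aggregation into a subselect and filter outside of it. A sketch; the alias avg_production is an illustrative name, not part of the original slides:

```sql
SELECT country, avg_production
FROM (SELECT country, avg(production) AS avg_production
      FROM t_oil
      GROUP BY country) AS x
-- the alias is an ordinary column here, so WHERE can use it
WHERE avg_production > 8000;
```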
Windowing and analytics
The purpose of windowing
An analogy:
Your car is not a valuable car because it is good
It is valuable because it is better than the ones driven by your
friends
This is what windowing does: it puts the current row in relation to
all rows in the reference group
Windowing vs GROUP BY
GROUP BY has been designed to reduce the amount of data
and turn it into aggregated values
Windowing is used to compare values and put them into
relation.
Windowing is used along with aggregate functions (e.g. sum,
count, avg, min, max, . . . )
A simple aggregate: Average values
SELECT *, avg(production) OVER () FROM t_oil ;
    country    | year | production |      avg
---------------+------+------------+---------------
 USA           | 1965 |       9014 | 8391.58695652
 USA           | 1966 |       9579 | 8391.58695652
 USA           | 1967 |      10219 | 8391.58695652
 USA           | 1968 |      10600 | 8391.58695652
 USA           | 1969 |      10828 | 8391.58695652
...
What does the result mean?
‘Give me all rows and the average “over” all rows in the table’
Logically it is the same as . . .
SELECT *, (SELECT avg(production) FROM t_oil) AS avg
FROM t_oil;
However, subselects can be very nasty if the task is a more
complex one
OVER()-clauses can define order
Calculate max production up to a certain point
SELECT *, max(production) OVER (ORDER BY year)
FROM t_oil
WHERE country = 'Saudi Arabien';
Saudi Arabia is a so-called ‘swing producer’.
Note that max stays up even if production declines
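The reason max never drops is the default frame: with an ORDER BY and no explicit frame clause, the window runs from the first row up to the current row. A sketch that spells this default out; it should return the same result as the query above:

```sql
SELECT *, max(production) OVER (ORDER BY year
            RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM t_oil
WHERE country = 'Saudi Arabien';
```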
OVER()-clauses can form groups
Averages for each country
SELECT *, avg(production)
OVER (PARTITION BY country)
FROM t_oil;
    country    | year | production |      avg
---------------+------+------------+---------------
 Saudi Arabien | 1965 |       2219 | 7641.82608695
 Saudi Arabien | 1966 |       2615 | 7641.82608695
...
 USA           | 1965 |       9014 | 9141.34782608
 USA           | 1966 |       9579 | 9141.34782608
...
Forming groups
Data is split into groups
Each row shows the average of all rows in its group
Note that we got one group (= window) per country
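Because the group average is available on every row, it can be used directly in an expression. A sketch that shows how far each year deviates from its country's average; the alias diff_from_avg is an illustrative name:

```sql
SELECT country, year, production,
       round(production - avg(production)
                OVER (PARTITION BY country), 2) AS diff_from_avg
FROM t_oil
ORDER BY country, year;
```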
OVER() can contain order and groups
SELECT *, max(production)
OVER (PARTITION BY country ORDER BY year)
FROM t_oil;
In this case we get the maximum up to a given point
This is done for each country
Abstracting window-clauses
SELECT *,
min(production) OVER (w),
max(production) OVER (w),
count(production) OVER (w)
FROM t_oil
WINDOW w AS (PARTITION BY country ORDER BY year);
The same clause can be used for many columns
Many window-clauses may exist (w, w2, w3, etc.)
rank() and dense_rank()
Data can be ranked according to some order
In case of duplicates
rank() gives 1, 2, 2, 2, 5
dense_rank() gives 1, 2, 2, 2, 3
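A self-contained sketch that reproduces the two sequences above. The VALUES list is made-up sample data with three duplicates:

```sql
SELECT x,
       rank()       OVER (ORDER BY x) AS rank,
       dense_rank() OVER (ORDER BY x) AS dense_rank
FROM (VALUES (10), (20), (20), (20), (30)) AS t(x)
ORDER BY x;
-- rank:       1, 2, 2, 2, 5  (leaves a gap after the duplicates)
-- dense_rank: 1, 2, 2, 2, 3  (no gap)
```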
Moving rows: lag
ORDER BY defines into which direction to “move” the row
The number defines the offset
SELECT *, lag(production, 1) OVER (ORDER BY year)
FROM t_oil WHERE country = 'USA';
 country | year | production |  lag
---------+------+------------+-------
 USA     | 1965 |       9014 |
 USA     | 1966 |       9579 |  9014
 USA     | 1967 |      10219 |  9579
 USA     | 1968 |      10600 | 10219
Calculating the change in production
Very easy thing to do now
SELECT *, production - lag(production, 1)
OVER (ORDER BY year)
FROM t_oil WHERE country = 'USA';
 country | year | production | ?column?
---------+------+------------+----------
 USA     | 1965 |       9014 |
 USA     | 1966 |       9579 |      565
 USA     | 1967 |      10219 |      640
 USA     | 1968 |      10600 |      381
lead is the opposite of lag
lag is the same as ‘lead(. . . , -1)’
lag pushes elements down
lead pushes elements up
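A sketch that puts both functions side by side, so the previous and the next year's value appear next to the current one. The aliases previous_year and next_year are illustrative names:

```sql
SELECT year, production,
       lag(production, 1)  OVER w AS previous_year,
       lead(production, 1) OVER w AS next_year
FROM t_oil
WHERE country = 'USA'
WINDOW w AS (ORDER BY year)
ORDER BY year;
```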
Moving entire rows
SELECT *, lag(t_oil, 1) OVER (ORDER BY year)
FROM t_oil WHERE country = 'USA';
 country | year | production |       lag
---------+------+------------+------------------
 USA     | 1965 |       9014 |
 USA     | 1966 |       9579 | (USA,1965,9014)
 USA     | 1967 |      10219 | (USA,1966,9579)
 USA     | 1968 |      10600 | (USA,1967,10219)
The composite type can then be dissected using a subselect around
the current query
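A sketch of how the composite value might be taken apart again. The aliases prev, prev_year and prev_production are illustrative; the parentheses around prev are needed so that PostgreSQL reads it as a column and not as a table name:

```sql
SELECT country, year, production,
       (prev).year       AS prev_year,
       (prev).production AS prev_production
FROM (SELECT *, lag(t_oil, 1) OVER (ORDER BY year) AS prev
      FROM t_oil
      WHERE country = 'USA') AS x
ORDER BY year;
```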
Works for more than just one column
SELECT *, lag((year, production), 1)
OVER (ORDER BY year)
FROM t_oil WHERE country = 'USA';
 country | year | production |     lag
---------+------+------------+--------------
 USA     | 1965 |       9014 |
 USA     | 1966 |       9579 | (1965,9014)
 USA     | 1967 |      10219 | (1966,9579)
 USA     | 1968 |      10600 | (1967,10219)
This is the perfect foundation to build custom aggregates to
solve complex problems
Splitting data into equal parts
ntile can split your data into n equally sized blocks
ntile(4) will therefore give you a nice quantile distribution
order is needed to achieve that
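When the number of rows is not divisible by n, the blocks cannot be exactly equal; the earlier blocks then get one row more. A self-contained sketch using 10 generated rows instead of the oil data:

```sql
SELECT ntile, count(*)
FROM (SELECT ntile(4) OVER (ORDER BY i)
      FROM generate_series(1, 10) AS i) AS x
GROUP BY ntile
ORDER BY ntile;
-- block sizes: 3, 3, 2, 2
```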
Here is how it works . . .
SELECT year, production, ntile(4)
OVER (ORDER BY production)
FROM t_oil WHERE country = 'USA' ORDER BY 3, 2 DESC;
 year | production | ntile
------+------------+-------
 2000 |       7733 |     1
 1999 |       7731 |     1
...
 1966 |       9579 |     2
 1989 |       9159 |     2
...
 1972 |      11185 |     4
Work can proceed from there
SELECT ntile, min(production), max(production)
FROM ( SELECT year, production, ntile(4)
OVER (ORDER BY production)
FROM t_oil WHERE country = 'USA') AS x
GROUP BY 1 ORDER BY 1;
The query returns nice quantiles
 ntile |  min  |  max
-------+-------+-------
     1 |  6734 |  7733
     2 |  8011 |  9579
     3 |  9736 | 10231
     4 | 10247 | 11297
(4 rows)
Moving averages
More sophisticated frame-clauses are needed
The average is done for 2 years = current + previous one
SELECT *, avg(production) OVER (ORDER BY year ROWS
BETWEEN 1 PRECEDING AND 0 FOLLOWING)
FROM t_oil WHERE country = 'Saudi Arabien';
    country    | year | production |    avg
---------------+------+------------+------------
 Saudi Arabien | 1965 |       2219 |  2219.0000
 Saudi Arabien | 1966 |       2615 |  2417.0000
 Saudi Arabien | 1967 |       2825 |  2720.0000
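The frame can extend in both directions. A sketch of a centered three-year average (previous, current and next year); the alias centered_avg is an illustrative name. At the edges of the data the frame simply contains fewer rows:

```sql
SELECT *, avg(production) OVER (ORDER BY year
            ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS centered_avg
FROM t_oil
WHERE country = 'Saudi Arabien';
```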
Combining joins, aggregates, and windowing
Combining data
To combine data we need some more import data
CREATE TABLE t_president
(name text,
start_year int,
end_year int,
party text);
Some input data
A list of American presidents and their terms in office
test=# COPY t_president FROM PROGRAM
'curl www.cybertec.at/secret/president.txt';
COPY 9
The format is not too nice for analysis
Input data: American presidents
SELECT * FROM t_president ;
       name        | start_year | end_year |   party
-------------------+------------+----------+------------
 Lyndon B. Johnson |       1963 |     1969 | Democrat
 Richard M. Nixon  |       1969 |     1974 | Republican
 Gerald Ford       |       1974 |     1977 | Republican
 Jimmy Carter      |       1977 |     1981 | Democrat
 Ronald W. Reagan  |       1981 |     1989 | Republican
 George H. W. Bush |       1989 |     1993 | Republican
 Bill Clinton      |       1993 |     2001 | Democrat
 George W. Bush    |       2001 |     2009 | Republican
 Barack Obama      |       2009 |     2017 | Democrat
The challenge: Adjust the format
LATERAL can come to the rescue
SELECT name, party, year
FROM t_president AS x,
LATERAL (SELECT * FROM
generate_series(x.start_year, x.end_year - 1)
AS year) AS y
LIMIT 10;
The output is:
       name        |   party    | year
-------------------+------------+------
 Lyndon B. Johnson | Democrat   | 1963
 Lyndon B. Johnson | Democrat   | 1964
 Lyndon B. Johnson | Democrat   | 1965
 Lyndon B. Johnson | Democrat   | 1966
 Lyndon B. Johnson | Democrat   | 1967
 Lyndon B. Johnson | Democrat   | 1968
 Richard M. Nixon  | Republican | 1969
 Richard M. Nixon  | Republican | 1970
 Richard M. Nixon  | Republican | 1971
 Richard M. Nixon  | Republican | 1972
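For set-returning functions in the FROM clause the LATERAL keyword is optional, so the query can be written more compactly. A sketch that should return the same rows; the ORDER BY is added here only to make the LIMIT deterministic:

```sql
SELECT name, party, year
FROM t_president AS x,
     -- the function may refer to columns of x directly
     generate_series(x.start_year, x.end_year - 1) AS year
ORDER BY year
LIMIT 10;
```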
Which party is better for oil?
The following way to solve the problem is definitely not the
only one.
There might be other factors than the party of the president
when it comes to this kind of data.
Keep in mind: It is just an SQL exercise
Putting things together (1)
CREATE VIEW v AS
WITH b AS (
SELECT name, party, year
FROM t_president AS x,
LATERAL (SELECT * FROM generate_series(
x.start_year,
x.end_year - 1) AS year) AS y)
SELECT a.*, party,
production - lag(production, 1)
OVER (ORDER BY a.year) AS lag
FROM t_oil AS a, b
WHERE a.year = b.year AND country = 'USA';
What we got so far
SELECT * FROM v;
 country | year | production |   party    | lag
---------+------+------------+------------+------
 USA     | 1965 |       9014 | Democrat   |
 USA     | 1966 |       9579 | Democrat   |  565
 USA     | 1967 |      10219 | Democrat   |  640
 USA     | 1968 |      10600 | Democrat   |  381
 USA     | 1969 |      10828 | Republican |  228
 USA     | 1970 |      11297 | Republican |  469
Making use of NULL
Remember: NULL is ignored by most aggregate functions (count(expression), sum, avg, . . . )
We can use that to do ‘partial counts’
SELECT party, lag,
CASE WHEN lag > 0 THEN 1 END AS up,
CASE WHEN lag < 0 THEN 1 END AS down
FROM v
ORDER BY year;
Which gives us . . .
   party    | lag  | up | down
------------+------+----+------
 Democrat   |      |    |
 Democrat   |  565 |  1 |
 Democrat   |  640 |  1 |
 Democrat   |  381 |  1 |
 Republican |  228 |  1 |
 Republican |  469 |  1 |
 Republican | -141 |    |    1
 Republican |   29 |  1 |
 Republican | -239 |    |    1
 Republican | -485 |    |    1
We can move on from there easily
SELECT party,
count(CASE WHEN lag > 0 THEN 1 END) AS up,
count(CASE WHEN lag < 0 THEN 1 END) AS down
FROM v
GROUP BY party;
   party    | up | down
------------+----+------
 Democrat   |  9 |    8
 Republican | 10 |   18
(2 rows)
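On PostgreSQL 9.4 or later, the same partial counts can also be written with the FILTER clause instead of CASE. A sketch using the view v from above; it should produce the same numbers:

```sql
SELECT party,
       count(*) FILTER (WHERE lag > 0) AS up,
       count(*) FILTER (WHERE lag < 0) AS down
FROM v
GROUP BY party;
```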
Handling missing data
Preparing our sample data
test=# UPDATE t_oil
SET production = NULL
WHERE year IN (1998, 1999)
AND country = 'USA' RETURNING *;
 country | year | production
---------+------+------------
 USA     | 1998 |
 USA     | 1999 |
(2 rows)
Challenges ahead
How can we make lead and lag work again?
How can we fill the gaps?
How can we control our behavior in a more efficient way?
Turning to frame-clauses once again
One idea is to just use the average of some previous values
However, you might also want to turn to interpolation or
outright guess work
A custom aggregate might help
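One more option, shown here only as a sketch and not taken from the slides, is to carry the last known value forward. It relies on count(production) ignoring NULL: the running count only increases on rows that have a value, so every gap row lands in the same group as the last known value before it. The aliases grp and filled are illustrative names.

```sql
SELECT year, production,
       -- each group contains exactly one non-NULL value
       max(production) OVER (PARTITION BY grp) AS filled
FROM (SELECT year, production,
             count(production) OVER (ORDER BY year) AS grp
      FROM t_oil
      WHERE country = 'USA') AS x
ORDER BY year;
```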
A ‘lazy’ idea
Creating an array with some historic values
Applying a function on this array
SELECT year, production, array_agg(production)
OVER (ORDER BY year ROWS BETWEEN 3 PRECEDING
AND 0 FOLLOWING)
FROM t_oil
WHERE country = 'USA';
Which gives us . . .
... snip ...
 1995 |       8322 | {8868,8583,8389,8322}
 1996 |       8295 | {8583,8389,8322,8295}
 1997 |       8269 | {8389,8322,8295,8269}
 1998 |            | {8322,8295,8269,NULL}
 1999 |            | {8295,8269,NULL,NULL}
 2000 |       7733 | {8269,NULL,NULL,7733}
 2001 |       7669 | {NULL,NULL,7733,7669}
 2002 |       7626 | {NULL,7733,7669,7626}
 2003 |       7400 | {7733,7669,7626,7400}
... snip ...
Applying a function
A simple function could look like this:
SELECT avg(x)
FROM unnest('{8295,8269,NULL,NULL}'::int4[]) AS x;
          avg
-----------------------
 8282.0000000000000000
(1 row)
A query could therefore look like this
SELECT *, (SELECT avg(x) FROM unnest(array_agg) AS x)
FROM (SELECT year, production, array_agg(production)
OVER (ORDER BY year ROWS BETWEEN 3 PRECEDING
AND 0 FOLLOWING)
FROM t_oil WHERE country = 'USA') AS y
ORDER BY year
OFFSET 32 LIMIT 4;
 year | production |       array_agg       |     avg
------+------------+-----------------------+-------------
 1997 |       8269 | {8389,8322,8295,8269} | 8318.750000
 1998 |            | {8322,8295,8269,NULL} | 8295.333333
 1999 |            | {8295,8269,NULL,NULL} | 8282.000000
 2000 |       7733 | {8269,NULL,NULL,7733} | 8001.000000
Defining an aggregate
Defining an aggregate is really the more desirable way
It is much cleaner
CREATE AGGREGATE is your friend
A simple example
The aggregate can be created like this:
CREATE FUNCTION my_final(int[]) RETURNS numeric AS
$$
SELECT avg(x) FROM unnest($1) AS x;
$$ LANGUAGE sql;
CREATE AGGREGATE artificial_avg(int) (
SFUNC = array_append,
STYPE = int[],
INITCOND = '{}',
FINALFUNC = my_final
);
Using our new aggregate
SELECT year, production, artificial_avg(production)
OVER (ORDER BY year
ROWS BETWEEN 3 PRECEDING AND 0 FOLLOWING)
FROM t_oil WHERE country = 'USA';
The aggregate can be used just like any other aggregate in the
system
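Since my_final simply averages the collected values, the result should match the built-in avg, which also ignores NULL. A sketch of a quick sanity check before swapping in a more elaborate final function; the aliases custom and builtin are illustrative names:

```sql
SELECT year, production,
       artificial_avg(production) OVER w AS custom,
       avg(production)            OVER w AS builtin
FROM t_oil
WHERE country = 'USA'
WINDOW w AS (ORDER BY year
             ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
ORDER BY year;
```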
Finally
Thank you for your attention
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
www.postgresql-support.de
Hans-J¨urgen Sch¨onig
www.postgresql-support.de

More Related Content

What's hot

Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Alexey Lesovsky
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationAlexey Lesovsky
 
How to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollHow to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollPGConf APAC
 
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Altinity Ltd
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovAltinity Ltd
 
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL-Consulting
 
Common Performance Pitfalls in Odoo apps
Common Performance Pitfalls in Odoo appsCommon Performance Pitfalls in Odoo apps
Common Performance Pitfalls in Odoo appsOdoo
 
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...Altinity Ltd
 
SCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
SCALE 15x Minimizing PostgreSQL Major Version Upgrade DowntimeSCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
SCALE 15x Minimizing PostgreSQL Major Version Upgrade DowntimeJeff Frost
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOAltinity Ltd
 
collectd & PostgreSQL
collectd & PostgreSQLcollectd & PostgreSQL
collectd & PostgreSQLMark Wong
 
12c SQL Plan Directives
12c SQL Plan Directives12c SQL Plan Directives
12c SQL Plan DirectivesFranck Pachot
 
PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6Tomas Vondra
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015PostgreSQL-Consulting
 
Understanding Autovacuum
Understanding AutovacuumUnderstanding Autovacuum
Understanding AutovacuumDan Robinson
 
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres OpenJohn Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres OpenPostgresOpen
 
PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개PgDay.Seoul
 

What's hot (20)

Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
Strategic autovacuum
Strategic autovacuumStrategic autovacuum
Strategic autovacuum
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming Replication
 
How to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollHow to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'roll
 
Pgcenter overview
Pgcenter overviewPgcenter overview
Pgcenter overview
 
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei Milovidov
 
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQ
 
Common Performance Pitfalls in Odoo apps
Common Performance Pitfalls in Odoo appsCommon Performance Pitfalls in Odoo apps
Common Performance Pitfalls in Odoo apps
 
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
 
SCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
SCALE 15x Minimizing PostgreSQL Major Version Upgrade DowntimeSCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
SCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
 
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEOClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
 
collectd & PostgreSQL
collectd & PostgreSQLcollectd & PostgreSQL
collectd & PostgreSQL
 
12c SQL Plan Directives
12c SQL Plan Directives12c SQL Plan Directives
12c SQL Plan Directives
 
PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
 
Understanding Autovacuum
Understanding AutovacuumUnderstanding Autovacuum
Understanding Autovacuum
 
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres OpenJohn Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
 
PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개
 
PostgreSQL and RAM usage
PostgreSQL and RAM usagePostgreSQL and RAM usage
PostgreSQL and RAM usage
 

Viewers also liked

PostgreSQL: Eigene Aggregate schreiben
PostgreSQL: Eigene Aggregate schreibenPostgreSQL: Eigene Aggregate schreiben
PostgreSQL: Eigene Aggregate schreibenHans-Jürgen Schönig
 
PostgreSQL instance encryption: More database security
PostgreSQL instance encryption: More database securityPostgreSQL instance encryption: More database security
PostgreSQL instance encryption: More database securityHans-Jürgen Schönig
 
Role of MySQL in Data Analytics, Warehousing
Role of MySQL in Data Analytics, WarehousingRole of MySQL in Data Analytics, Warehousing
Role of MySQL in Data Analytics, WarehousingVenu Anuganti
 
Not Your Father's Database by Vida Ha
Not Your Father's Database by Vida HaNot Your Father's Database by Vida Ha
Not Your Father's Database by Vida HaSpark Summit
 
Infrastructure Monitoring with Postgres
Infrastructure Monitoring with PostgresInfrastructure Monitoring with Postgres
Infrastructure Monitoring with PostgresSteven Simpson
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Spark Summit
 
R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk ...
R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk ...R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk ...
R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk ...Spark Summit
 
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: S...Spark Summit
 
Percona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationPercona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationmysqlops
 
It's time to change the basics of Cyber Security
It's time to change the basics of Cyber SecurityIt's time to change the basics of Cyber Security
It's time to change the basics of Cyber SecurityJiří Napravnik
 
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...Amazon Web Services
 
AWS re:Invent 2016: Deep Dive on Amazon EC2 Instances, Featuring Performance ...
AWS re:Invent 2016: Deep Dive on Amazon EC2 Instances, Featuring Performance ...AWS re:Invent 2016: Deep Dive on Amazon EC2 Instances, Featuring Performance ...
PostgreSQL: Data analysis and analytics

  • 1. Cybertec Training: Data Analysis
    Hans-Jürgen Schönig
    www.postgresql-support.de
  • 3. Scope of this training
    Importing data
    Simple aggregations
    Windowing and analytics
    Analyzing time series
    Managing incomplete data
    Writing custom aggregates
  • 5. Loading data
    Things to consider when importing data
    There are many ways to import data
    Avoid mini-transactions for performance reasons
    In case of large data sets speed is a major issue
    There is a life after importing
  • 6. Importing a simple data set
    A simple data structure . . .
    test=# CREATE TABLE t_test (a int, b int);
    CREATE TABLE
    Let us add 10,000 rows now . . .
    INSERT INTO t_test VALUES (1, 2);
    ...
    INSERT INTO t_test VALUES (1, 2);
  • 7. Using one transaction to import things
    BEGIN;
    INSERT INTO t_test VALUES (1, 2);
    ...
    INSERT INTO t_test VALUES (1, 2);
    COMMIT;
  • 8. Observations
    Performance can vary depending on hardware
    Longer transactions can be way faster
    PostgreSQL has to flush every transaction to disk
    Most of the time is burned by flushing
  • 9. Changing durability requirements
    Performance will skyrocket . . .
    SET synchronous_commit TO off;
    INSERT INTO t_test VALUES (1, 2);
    ...
    INSERT INTO t_test VALUES (1, 2);
    The reason is that PostgreSQL does not have to flush every transaction anymore.
    Trading “durability” for performance
  • 10. Use bulk loads
    Loading single rows is usually a bad idea
    Use COPY to do bulk loading
    COPY can load data a lot faster than INSERT due to significantly smaller overhead
  • 11. A simple COPY
    This time 10 million lines are imported (10,000 rows are not enough):
    COPY t_test FROM stdin;
    1 2
    1 2
    ...
    \.
    Note the performance difference (rows per second).
    There is no need to check column lists, existence of table, etc. anymore -> higher throughput
  • 12. COPY: Observations to be made
    In the default configuration (checkpoint_segments = 3) you will see a steady up and down of I/O speed
    This is caused by checkpoints happening in the background
    Data has to go to the transaction log to “repair” data files in case of a crash
    Performance is limited by writing data twice
  • 13. COPY: Using a transaction log bypass
    Writing to the transaction log can be avoided in some cases:
    BEGIN;
    TRUNCATE t_test;
    COPY t_test FROM stdin;
    ...
    COMMIT;
  • 14. COPY: Why the bypass works
    TRUNCATE will schedule the removal of the data file on COMMIT
    COPY will start writing to a new data file
    Concurrency is not an issue because TRUNCATE locks the table
    PostgreSQL can take the old or the new data file on COMMIT or ROLLBACK
    No need to actually repair a data file anymore
  • 15. COPY: More on WAL-bypassing
    BEGIN;
    CREATE TABLE ...
    WAL bypassing only works if you are not using streaming replication
  • 16. Freezing rows
    Before you import data, run . . .
    ALTER TABLE t_test SET (autovacuum_enabled = off);
    Compare the timing of the first
    SELECT count(*) FROM t_test;
    with the second one.
  • 17. Observations
    The first run is a lot slower
    During the first run, writes will happen
    No more writes from the second run on
    PostgreSQL sets bits in the background
  • 18. The purpose of hint bits
    On first write PostgreSQL checks if a row can be seen by everybody.
    This bit is set to make sure that PostgreSQL does not have to go through expensive visibility checks next time.
    This is an issue for big data sets
  • 19. Fixing after-import performance
    To set hint bits straight away do a
    test=# COPY t_test FROM '/tmp/file.txt' FREEZE;
    Be careful. It only works in case . . .
    ERROR: cannot perform FREEZE because the table was not created or truncated in the current subtransaction
  • 20. VACUUM and hint bits
    Bits can be set on an entire block as well (not just on rows)
    VACUUM will set those hint bits
    However, block-level bits usually will not speed things up as much as row-level bits
    For a heavily used read-only database system vacuuming data can actually make sense (not to reclaim space)
  • 21. Importing some test data
    Here is some test data:
    CREATE TABLE t_oil (country text, year int, production int);
    COPY t_oil FROM PROGRAM 'curl www.cybertec.at/secret/oil.txt';
  • 23. Basic aggregation
    test=# SELECT country, avg(production) FROM t_oil GROUP BY 1;
        country    |          avg
    ---------------+-----------------------
     USA           | 9141.3478260869565217
     Saudi Arabien | 7641.8260869565217391
    (2 rows)
  • 24. GROUP BY is needed for aggregates
    A GROUP BY clause is needed because otherwise groups cannot be built:
    test=# SELECT country, avg(production) FROM t_oil;
    ERROR: column "t_oil.country" must appear in the GROUP BY clause or be used in an aggregate function
    LINE 1: SELECT country, avg(production) FROM t_oil;
  • 25. HAVING: Filtering on aggregated data
    test=# SELECT country, avg(production) FROM t_oil GROUP BY 1 HAVING avg(production) > 8000;
     country |          avg
    ---------+-----------------------
     USA     | 9141.3478260869565217
    (1 row)
    NOTE that an alias is not allowed in a HAVING clause
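The GROUP BY and HAVING behaviour from slides 23 to 25 can be tried out without a PostgreSQL server. The following is only a sketch: it assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL and loads four rows taken from the slide output rather than the full t_oil data set, so the averages differ from the ones shown above.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; only four sample rows are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_oil (country text, year int, production int)")
conn.executemany(
    "INSERT INTO t_oil VALUES (?, ?, ?)",
    [
        ("USA", 1965, 9014),
        ("USA", 1966, 9579),
        ("Saudi Arabien", 1965, 2219),
        ("Saudi Arabien", 1966, 2615),
    ],
)

# GROUP BY builds one group per country, avg() is computed per group.
grouped = conn.execute(
    "SELECT country, avg(production) FROM t_oil "
    "GROUP BY country ORDER BY country"
).fetchall()

# HAVING filters on the aggregated value after the groups have been built.
filtered = conn.execute(
    "SELECT country, avg(production) FROM t_oil "
    "GROUP BY country HAVING avg(production) > 8000"
).fetchall()

print(grouped)
print(filtered)
```

The HAVING condition repeats the aggregate expression instead of referring to an alias, which matches the note on slide 25.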
  • 26. Windowing and analytics
  • 27. The purpose of windowing
    An analogy:
    Your car is not a valuable car because it is good
    It is valuable because it is better than the ones driven by your friends
    This is what windowing does: The current row in relation to all rows in the reference group
  • 28. Windowing vs GROUP BY
    GROUP BY has been designed to reduce the amount of data and turn it into aggregated values
    Windowing is used to compare values and put them into relation.
    Windowing is used along with aggregate functions (e.g. sum, count, avg, min, max, . . . )
  • 29. A simple aggregate: Average values
    SELECT *, avg(production) OVER () FROM t_oil;
        country    | year | production |      avg
    ---------------+------+------------+---------------
     USA           | 1965 |       9014 | 8391.58695652
     USA           | 1966 |       9579 | 8391.58695652
     USA           | 1967 |      10219 | 8391.58695652
     USA           | 1968 |      10600 | 8391.58695652
     USA           | 1969 |      10828 | 8391.58695652
    ...
  • 30. What does the result mean?
    ‘Give me all rows and the average “over” all rows in the table’
    Logically it is the same as . . .
    SELECT *, (SELECT avg(production) FROM t_oil) AS avg
    FROM t_oil;
    However, subselects can be very nasty if the task is a more complex one
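The equivalence claimed on slide 30 can be checked directly. This sketch assumes Python's sqlite3 module (with an SQLite build that supports window functions, 3.25 or later) as a stand-in for PostgreSQL, and uses three USA rows from the slide output as sample data.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for PostgreSQL; three sample rows only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_oil (country text, year int, production int)")
conn.executemany(
    "INSERT INTO t_oil VALUES (?, ?, ?)",
    [("USA", 1965, 9014), ("USA", 1966, 9579), ("USA", 1967, 10219)],
)

# Window version: every row is kept, the average over all rows is attached.
window_rows = conn.execute(
    "SELECT year, production, avg(production) OVER () "
    "FROM t_oil ORDER BY year"
).fetchall()

# Subselect version: same result, computed by a scalar subquery.
subselect_rows = conn.execute(
    "SELECT year, production, (SELECT avg(production) FROM t_oil) "
    "FROM t_oil ORDER BY year"
).fetchall()

print(window_rows)
print(window_rows == subselect_rows)
```

Both queries return one output row per input row, which is the key difference from GROUP BY.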
  • 31. OVER()-clauses can define order
    Calculate max production up to a certain point
    SELECT *, max(production) OVER (ORDER BY year)
    FROM t_oil
    WHERE country = 'Saudi Arabien';
    Saudi Arabia is a so-called ‘swing producer’.
    Note that max stays up even if production declines
  • 32. OVER()-clauses can form groups
    Averages for each country
    SELECT *, avg(production) OVER (PARTITION BY country) FROM t_oil;
        country    | year | production |      avg
    ---------------+------+------------+---------------
     Saudi Arabien | 1965 |       2219 | 7641.82608695
     Saudi Arabien | 1966 |       2615 | 7641.82608695
    ...
     USA           | 1965 |       9014 | 9141.34782608
     USA           | 1966 |       9579 | 9141.34782608
    ...
  • 33. Forming groups
    Data is split into groups
    Each row shows the average of all rows in its group
    Note that we got one group (= window) per country
  • 34. OVER() can contain order and groups
    SELECT *, max(production) OVER (PARTITION BY country ORDER BY year)
    FROM t_oil;
    In this case we get the maximum up to a given point
    This is done for each country
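A running maximum per country, as described on slides 31 and 34, might look as follows. The sketch assumes Python's sqlite3 module (SQLite 3.25 or later) in place of PostgreSQL; the seven rows are values that appear elsewhere in the slide output, not the complete data set.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for PostgreSQL; sample rows only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_oil (country text, year int, production int)")
conn.executemany(
    "INSERT INTO t_oil VALUES (?, ?, ?)",
    [
        ("Saudi Arabien", 1965, 2219),
        ("Saudi Arabien", 1966, 2615),
        ("Saudi Arabien", 1967, 2825),
        ("USA", 1972, 11185),
        ("USA", 1989, 9159),
        ("USA", 1999, 7731),
        ("USA", 2000, 7733),
    ],
)

# PARTITION BY restarts the window for each country,
# ORDER BY makes max() cover only the rows up to the current year.
rows = conn.execute(
    "SELECT country, year, production, "
    "       max(production) OVER (PARTITION BY country ORDER BY year) "
    "FROM t_oil ORDER BY country, year"
).fetchall()

running_max = [row[3] for row in rows]
print(running_max)
```

For the USA rows the maximum stays at the 1972 value although production declines afterwards.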
  • 35. Abstracting window-clauses
    SELECT *, min(production) OVER (w),
           max(production) OVER (w),
           count(production) OVER (w)
    FROM t_oil
    WINDOW w AS (PARTITION BY country ORDER BY year)
    The same clause can be used for many columns
    Many window-clauses may exist (w, w2, w3, etc.)
  • 36. rank() and dense_rank()
    Data can be ranked according to some order
    In case of duplicates
    rank gives 1, 2, 2, 2, 5
    dense_rank gives 1, 2, 2, 2, 3
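The two numbering schemes from slide 36 can be reproduced with a tiny example. This is a sketch under assumptions: Python's sqlite3 module (SQLite 3.25 or later) replaces PostgreSQL, and the table t_score with its five values is hypothetical, chosen so that one value occurs three times.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for PostgreSQL.
# t_score and its values are hypothetical and not part of the training data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_score (v int)")
conn.executemany("INSERT INTO t_score VALUES (?)", [(10,), (20,), (20,), (20,), (30,)])

rows = conn.execute(
    "SELECT v, rank() OVER (ORDER BY v), dense_rank() OVER (ORDER BY v) "
    "FROM t_score ORDER BY v"
).fetchall()

ranks = [row[1] for row in rows]        # leaves gaps after duplicates
dense_ranks = [row[2] for row in rows]  # continues without gaps
print(ranks)
print(dense_ranks)
```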
  • 37. Moving rows: lag
    ORDER BY defines into which direction to “move” the row
    The number defines the offset
    SELECT *, lag(production, 1) OVER (ORDER BY year)
    FROM t_oil
    WHERE country = 'USA';
     country | year | production |  lag
    ---------+------+------------+-------
     USA     | 1965 |       9014 |
     USA     | 1966 |       9579 |  9014
     USA     | 1967 |      10219 |  9579
     USA     | 1968 |      10600 | 10219
  • 38. Calculating the change in production
    Very easy thing to do now
    SELECT *, production - lag(production, 1) OVER (ORDER BY year)
    FROM t_oil
    WHERE country = 'USA';
     country | year | production | ?column?
    ---------+------+------------+----------
     USA     | 1965 |       9014 |
     USA     | 1966 |       9579 |      565
     USA     | 1967 |      10219 |      640
     USA     | 1968 |      10600 |      381
  • 39. lead is the opposite of lag
    lag is the same as ‘lead(. . . , -1)’
    lag pushes elements down
    lead pushes elements up
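The direction in which lag and lead move values can be seen side by side in a small run. The sketch below assumes Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for PostgreSQL and uses the four USA rows shown on slide 37. Negative offsets are deliberately not used, since support for them is not assumed for SQLite.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for PostgreSQL; four rows from slide 37.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_oil (country text, year int, production int)")
conn.executemany(
    "INSERT INTO t_oil VALUES (?, ?, ?)",
    [("USA", 1965, 9014), ("USA", 1966, 9579),
     ("USA", 1967, 10219), ("USA", 1968, 10600)],
)

rows = conn.execute(
    "SELECT year, "
    "       lag(production, 1)  OVER (ORDER BY year), "
    "       lead(production, 1) OVER (ORDER BY year), "
    "       production - lag(production, 1) OVER (ORDER BY year) "
    "FROM t_oil WHERE country = 'USA' ORDER BY year"
).fetchall()

lagged = [row[1] for row in rows]   # value of the previous year
led = [row[2] for row in rows]      # value of the following year
change = [row[3] for row in rows]   # year-over-year difference
print(lagged, led, change)
```

The first lag value and the last lead value are NULL (None in Python) because there is no neighbouring row to take the value from.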
  • 40. Moving entire rows
    SELECT *, lag(t_oil, 1) OVER (ORDER BY year)
    FROM t_oil
    WHERE country = 'USA';
     country | year | production |       lag
    ---------+------+------------+------------------
     USA     | 1965 |       9014 |
     USA     | 1966 |       9579 | (USA,1965,9014)
     USA     | 1967 |      10219 | (USA,1966,9579)
     USA     | 1968 |      10600 | (USA,1967,10219)
    The composite type can then be dissected using a subselect for the current query
  • 41. Works for more than just one column
    SELECT *, lag((year, production), 1) OVER (ORDER BY year)
    FROM t_oil
    WHERE country = 'USA';
     country | year | production |     lag
    ---------+------+------------+--------------
     USA     | 1965 |       9014 |
     USA     | 1966 |       9579 | (1965,9014)
     USA     | 1967 |      10219 | (1966,9579)
     USA     | 1968 |      10600 | (1967,10219)
    This is the perfect foundation to build custom aggregates to solve complex problems
  • 42. Splitting data into equal parts
    ntile can split your data into n equally sized blocks
    ntile(4) will therefore give you a nice quantile distribution
    Order is needed to achieve that
  • 43. Here is how it works . . .
    SELECT year, production, ntile(4) OVER (ORDER BY production)
    FROM t_oil
    WHERE country = 'USA'
    ORDER BY 3, 2 DESC;
     year | production | ntile
    ------+------------+-------
     2000 |       7733 |     1
     1999 |       7731 |     1
    ...
     1966 |       9579 |     2
     1989 |       9159 |     2
    ...
     1972 |      11185 |     4
  • 44. Work can proceed from there
    SELECT ntile, min(production), max(production)
    FROM ( SELECT year, production, ntile(4) OVER (ORDER BY production)
           FROM t_oil
           WHERE country = 'USA') AS x
    GROUP BY 1
    ORDER BY 1
  • 45. The query returns nice quantiles
     ntile |  min  |  max
    -------+-------+-------
         1 |  6734 |  7733
         2 |  8011 |  9579
         3 |  9736 | 10231
         4 | 10247 | 11297
    (4 rows)
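The two-step pattern from slides 43 to 45 (assign a tile, then aggregate per tile) can be sketched on a reduced data set. Assumptions: Python's sqlite3 module (SQLite 3.25 or later) replaces PostgreSQL, and the hypothetical table t_sample holds only the eight boundary values from slide 45, so each tile contains exactly two rows.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for PostgreSQL.
# t_sample is hypothetical; it holds only the min/max values from slide 45.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_sample (production int)")
conn.executemany(
    "INSERT INTO t_sample VALUES (?)",
    [(6734,), (7733,), (8011,), (9579,), (9736,), (10231,), (10247,), (11297,)],
)

# Inner query: ntile(4) numbers the rows 1..4 in order of production.
# Outer query: min and max per tile give the quantile boundaries.
quantiles = conn.execute(
    "SELECT tile, min(production), max(production) "
    "FROM (SELECT production, "
    "             ntile(4) OVER (ORDER BY production) AS tile "
    "      FROM t_sample) AS x "
    "GROUP BY tile ORDER BY tile"
).fetchall()

print(quantiles)
```

The window function has to live in a subquery here because its result is used as a grouping key in the outer query.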
  • 46. Moving averages
    More sophisticated frame-clauses are needed
    The average is done for 2 years = current + previous one
    SELECT *, avg(production) OVER (ORDER BY year
                  ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING)
    FROM t_oil
    WHERE country = 'Saudi Arabien';
        country    | year | production |    avg
    ---------------+------+------------+------------
     Saudi Arabien | 1965 |       2219 |  2219.0000
     Saudi Arabien | 1966 |       2615 |  2417.0000
     Saudi Arabien | 1967 |       2825 |  2720.0000
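The two-year moving average from slide 46 can be reproduced on the three rows shown there. The sketch assumes Python's sqlite3 module (SQLite 3.25 or later) in place of PostgreSQL. It writes the frame end as CURRENT ROW, which in ROWS mode selects the same rows as 0 FOLLOWING.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for PostgreSQL; three rows from slide 46.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_oil (country text, year int, production int)")
conn.executemany(
    "INSERT INTO t_oil VALUES (?, ?, ?)",
    [("Saudi Arabien", 1965, 2219),
     ("Saudi Arabien", 1966, 2615),
     ("Saudi Arabien", 1967, 2825)],
)

# The frame covers the previous row and the current row only.
rows = conn.execute(
    "SELECT year, avg(production) OVER ("
    "           ORDER BY year ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) "
    "FROM t_oil WHERE country = 'Saudi Arabien' ORDER BY year"
).fetchall()

moving_avg = [row[1] for row in rows]
print(moving_avg)
```

For the first year there is no preceding row, so the frame contains a single row and the average equals the production value itself.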
  • 47. Combining joins, aggregates, and windowing Hans-J¨urgen Sch¨onig www.postgresql-support.de
  • 48. Combining data To combine data we need some more import data CREATE TABLE t_president (name text, start_year int, end_year int, party text); Hans-J¨urgen Sch¨onig www.postgresql-support.de
  • 49. Some input data A list of all American presidents and their presidency test=# COPY t_president FROM PROGRAM ’curl www.cybertec.at/secret/president.txt’; COPY 9 The format is not too nice for analysis Hans-J¨urgen Sch¨onig www.postgresql-support.de
  • 50. Input data: American presidents
      SELECT * FROM t_president;
             name        | start_year | end_year |   party
      -------------------+------------+----------+------------
       Lyndon B. Johnson |       1963 |     1969 | Democrat
       Richard M. Nixon  |       1969 |     1974 | Republican
       Gerald Ford       |       1974 |     1977 | Republican
       Jimmy Carter      |       1977 |     1981 | Democrat
       Ronald W. Reagan  |       1981 |     1989 | Republican
       George H. W. Bush |       1989 |     1993 | Republican
       Bill Clinton      |       1993 |     2001 | Democrat
       George W. Bush    |       2001 |     2009 | Republican
       Barack Obama      |       2009 |     2017 | Democrat
  • 51. The challenge: Adjust the format
      LATERAL can come to the rescue
      SELECT name, party, year
      FROM t_president AS x,
           LATERAL (SELECT *
                    FROM generate_series(x.start_year,
                                         x.end_year - 1) AS year) AS y
      LIMIT 10;
  • 52. The output is:
             name        |   party    | year
      -------------------+------------+------
       Lyndon B. Johnson | Democrat   | 1963
       Lyndon B. Johnson | Democrat   | 1964
       Lyndon B. Johnson | Democrat   | 1965
       Lyndon B. Johnson | Democrat   | 1966
       Lyndon B. Johnson | Democrat   | 1967
       Lyndon B. Johnson | Democrat   | 1968
       Richard M. Nixon  | Republican | 1969
       Richard M. Nixon  | Republican | 1970
       Richard M. Nixon  | Republican | 1971
       Richard M. Nixon  | Republican | 1972
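Because a function call in the FROM clause may refer to tables listed before it, the same expansion can probably be written without the explicit LATERAL subquery. The sketch below assumes PostgreSQL 9.3 or later and uses an inline two-row stand-in for t_president so that it runs on its own.

```sql
-- Inline stand-in for t_president with two of the rows from the slides.
WITH t_president(name, start_year, end_year, party) AS (
    VALUES ('Lyndon B. Johnson', 1963, 1969, 'Democrat'),
           ('Richard M. Nixon',  1969, 1974, 'Republican')
)
SELECT name, party, year
FROM t_president AS x,
     -- a function in FROM can reference x directly (implicit LATERAL)
     generate_series(x.start_year, x.end_year - 1) AS year
ORDER BY year;
```

Each president contributes one row per year from start_year up to, but not including, end_year. That way the handover year is counted only once, for the incoming president.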
  • 53. Which party is better for oil?
      The following approach is certainly not the only way to solve the problem.
      Factors other than the president's party may well influence this kind of data.
      Keep in mind: It is just an SQL exercise
  • 54. Putting things together (1)
      CREATE VIEW v AS
      WITH b AS (
          SELECT name, party, year
          FROM t_president AS x,
               LATERAL (SELECT *
                        FROM generate_series(x.start_year,
                                             x.end_year - 1) AS year) AS y)
      SELECT a.*, party,
             production - lag(production, 1) OVER (ORDER BY a.year) AS lag
      FROM t_oil AS a, b
      WHERE a.year = b.year
            AND country = 'USA';
  • 55. What we got so far
      SELECT * FROM v;
       country | year | production |   party    | lag
      ---------+------+------------+------------+------
       USA     | 1965 |       9014 | Democrat   |
       USA     | 1966 |       9579 | Democrat   |  565
       USA     | 1967 |      10219 | Democrat   |  640
       USA     | 1968 |      10600 | Democrat   |  381
       USA     | 1969 |      10828 | Republican |  228
       USA     | 1970 |      11297 | Republican |  469
  • 56. Making use of NULL
      Remember: count(expression), like most aggregate functions, ignores NULL values
      We can use that to do 'partial counts'
      SELECT party, lag,
             CASE WHEN lag > 0 THEN 1 END AS up,
             CASE WHEN lag < 0 THEN 1 END AS down
      FROM v
      ORDER BY year;
  • 57. Which gives us . . .
         party    | lag  | up | down
      ------------+------+----+------
       Democrat   |      |    |
       Democrat   |  565 |  1 |
       Democrat   |  640 |  1 |
       Democrat   |  381 |  1 |
       Republican |  228 |  1 |
       Republican |  469 |  1 |
       Republican | -141 |    |    1
       Republican |   29 |  1 |
       Republican | -239 |    |    1
       Republican | -485 |    |    1
  • 58. We can move on from there easily
      SELECT party,
             count(CASE WHEN lag > 0 THEN 1 END) AS up,
             count(CASE WHEN lag < 0 THEN 1 END) AS down
      FROM v
      GROUP BY party;
         party    | up | down
      ------------+----+------
       Democrat   |  9 |    8
       Republican | 10 |   18
      (2 rows)
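On PostgreSQL 9.4 or later the same partial counts can also be expressed with the FILTER clause instead of CASE. This is an alternative notation, not the approach used on the slides. The sketch uses a small inline stand-in for the view v with a handful of the rows shown earlier.

```sql
-- Inline stand-in for the view v, holding four of the rows shown above.
WITH v(party, lag) AS (
    VALUES ('Democrat',   NULL::int),
           ('Democrat',   565),
           ('Republican', 228),
           ('Republican', -141)
)
SELECT party,
       count(*) FILTER (WHERE lag > 0) AS up,    -- years with rising production
       count(*) FILTER (WHERE lag < 0) AS down   -- years with falling production
FROM v
GROUP BY party
ORDER BY party;
```

Rows where lag is NULL satisfy neither condition, so they are left out of both counts, just as with the CASE variant.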
  • 59. Handling missing data
  • 60. Preparing our sample data
      test=# UPDATE t_oil SET production = NULL
             WHERE year IN (1998, 1999)
                   AND country = 'USA'
             RETURNING *;
       country | year | production
      ---------+------+------------
       USA     | 1998 |
       USA     | 1999 |
      (2 rows)
  • 61. Challenges ahead
      How can we make lead and lag work again?
      How can we fill the gaps?
      How can we control our behavior in a more efficient way?
  • 62. Turning to frame clauses once again
      One idea is to just use the average of some previous values
      However, you might also want to turn to interpolation or outright guesswork
      A custom aggregate might help
  • 63. A 'lazy' idea
      Creating an array with some historic values
      Applying a function on this array
      SELECT year, production,
             array_agg(production) OVER (ORDER BY year
                 ROWS BETWEEN 3 PRECEDING AND 0 FOLLOWING)
      FROM t_oil
      WHERE country = 'USA';
  • 64. Which gives us . . .
      ... snip ...
       1995 |       8322 | {8868,8583,8389,8322}
       1996 |       8295 | {8583,8389,8322,8295}
       1997 |       8269 | {8389,8322,8295,8269}
       1998 |            | {8322,8295,8269,NULL}
       1999 |            | {8295,8269,NULL,NULL}
       2000 |       7733 | {8269,NULL,NULL,7733}
       2001 |       7669 | {NULL,NULL,7733,7669}
       2002 |       7626 | {NULL,7733,7669,7626}
       2003 |       7400 | {7733,7669,7626,7400}
      ... snip ...
  • 65. Applying a function
      A simple function could look like this:
      SELECT avg(x)
      FROM unnest('{8295,8269,NULL,NULL}'::int4[]) AS x;
                avg
      -----------------------
       8282.0000000000000000
      (1 row)
  • 66. A query could therefore look like this
      SELECT *, (SELECT avg(x) FROM unnest(array_agg) AS x)
      FROM (SELECT year, production,
                   array_agg(production) OVER (ORDER BY year
                       ROWS BETWEEN 3 PRECEDING AND 0 FOLLOWING)
            FROM t_oil
            WHERE country = 'USA') AS y
      OFFSET 32 LIMIT 4;
       year | production |       array_agg       |     avg
      ------+------------+-----------------------+-------------
       1997 |       8269 | {8389,8322,8295,8269} | 8318.750000
       1998 |            | {8322,8295,8269,NULL} | 8295.333333
       1999 |            | {8295,8269,NULL,NULL} | 8282.000000
       2000 |       7733 | {8269,NULL,NULL,7733} | 8001.000000
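For the plain-average case the array detour is not strictly required, because avg itself skips NULL values and can be used directly as a window function. The sketch below is one possible way to fill the gaps along those lines. It uses an inline stand-in for the USA rows around 1998/1999 (values taken from the arrays shown above); the names t, w, avg_4 and filled are made up for illustration.

```sql
-- Inline stand-in for the USA rows of t_oil around the gap.
WITH t(year, production) AS (
    VALUES (1994, 8389), (1995, 8322), (1996, 8295), (1997, 8269),
           (1998, NULL), (1999, NULL), (2000, 7733)
)
SELECT year, production,
       -- avg ignores the NULLs inside the frame
       avg(production) OVER w AS avg_4,
       -- keep real values, fall back to the rounded moving average otherwise
       coalesce(production, round(avg(production) OVER w)) AS filled
FROM t
WINDOW w AS (ORDER BY year ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
ORDER BY year;
```

The array-based variant remains useful when the function applied to the historic values is something other than a built-in aggregate.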
  • 67. Defining an aggregate
      Defining an aggregate is really the more desirable way
      It is much cleaner
      CREATE AGGREGATE is your friend
  • 68. A simple example
      The aggregate can be created like this:
      CREATE FUNCTION my_final(int[]) RETURNS numeric AS
      $$
          SELECT avg(x) FROM unnest($1) AS x;
      $$ LANGUAGE sql;
      CREATE AGGREGATE artificial_avg(int) (
          SFUNC = array_append,
          STYPE = int[],
          INITCOND = '{}',
          FINALFUNC = my_final
      );
  • 69. Using our new aggregate
      SELECT year, production,
             artificial_avg(production) OVER (ORDER BY year
                 ROWS BETWEEN 3 PRECEDING AND 0 FOLLOWING)
      FROM t_oil
      WHERE country = 'USA';
      The aggregate can be used just like any other aggregate in the system
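The benefit of a custom aggregate is that the final function can encode whatever gap-handling policy is wanted. As a purely illustrative sketch, the variant below returns NULL whenever fewer than two real values are present in the frame instead of averaging a single leftover value. The names cautious_final and cautious_avg and the inline sample rows are hypothetical and do not come from the training material.

```sql
-- Hypothetical final function: average the non-NULL values, but return
-- NULL when fewer than two real values are available in the frame.
CREATE FUNCTION cautious_final(int[]) RETURNS numeric AS
$$
    SELECT CASE WHEN count(x) >= 2 THEN avg(x) END
    FROM unnest($1) AS x;
$$ LANGUAGE sql;

-- Same construction as artificial_avg, only the final function differs.
CREATE AGGREGATE cautious_avg(int) (
    SFUNC = array_append,
    STYPE = int[],
    INITCOND = '{}',
    FINALFUNC = cautious_final
);

-- Inline sample rows around the gap; frame = current row + previous row.
WITH t(year, production) AS (
    VALUES (1995, 8322), (1996, 8295), (1997, 8269),
           (1998, NULL), (1999, NULL), (2000, 7733)
)
SELECT year, production,
       cautious_avg(production) OVER (ORDER BY year
           ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS cautious
FROM t
ORDER BY year;
```

With this policy only frames holding two real values (here 1996 and 1997) produce an average; every frame that touches the gap, or has no predecessor, yields NULL.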
  • 71. Thank you for your attention
      Cybertec Schönig & Schönig GmbH
      Gröhrmühlgasse 26
      A-2700 Wiener Neustadt
      www.postgresql-support.de