cstore_fdw – Columnar store
for analytic workloads
Hadi Moshayedi &
Ozgun Erdogan
What is CitusDB?
• CitusDB is a scalable analytics database that
extends PostgreSQL
– Citus shards your data and automatically parallelizes
your queries
– Citus isn’t a fork of Postgres. Rather, it hooks onto the
planner and executor for distributed query execution.
– Always rebased to newest Postgres version
– Natively supports new data types and extensions
[Diagram: a master node (extended PostgreSQL) holds shard and shard placement metadata. Worker nodes #1, #2, #3, ... (extended PostgreSQL) hold the shards. 1 shard = 1 Postgres table.]
Talk Overview
1. Why customers want columnar stores
2. Live demo
3. Optimized Row Columnar (ORC) format
4. PostgreSQL benefits
5. New benchmark numbers
[Figure: example table with 30M rows and 700 columns (Id, Sz, Ln, Ht, ...).]
Example SQL query
SELECT
id, AVG(price), MAX(price)
FROM
items
WHERE
quantity > 100 AND
last_stock_date < '2013-10-01'
GROUP BY
weight;
Id … price … … quant … … last_stm … … … … … weight
1 … 3.90 … … 31 … … 2013-… … … … … … 0.6
2 … 13 … … 70 … … 2010-… … … … … … 0.8
3 … 4.25 … … 432 … … 2013-… … … … … … 1
4 … 4 … … 45 … … 2013-… … … … … … 6
…
4… … 95 … … 37 … … 2013-… … … … … … 0.6
4… … 59 … … 90 … … 2012-… … … … … … 1.5
Row-oriented store
Cost of row storage
• Read 700 columns instead of 5
• >39 GB of unnecessary I/O
Input type   Estimated input rate   Cost to query performance
Memory       10 GB/s                3.9 seconds
SSD          600 MB/s               >60 seconds
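The figures in this table follow from simple division. A minimal sketch, assuming the 39 GB lower bound is the amount of extra data scanned and that units are decimal (600 MB/s = 0.6 GB/s); the function and variable names are illustrative only:

```python
def scan_seconds(data_gb: float, rate_gb_per_s: float) -> float:
    """Time needed to stream data_gb gigabytes at a sustained input rate."""
    return data_gb / rate_gb_per_s


UNNECESSARY_IO_GB = 39.0  # lower bound quoted on the slide

memory_seconds = scan_seconds(UNNECESSARY_IO_GB, 10.0)  # memory at 10 GB/s
ssd_seconds = scan_seconds(UNNECESSARY_IO_GB, 0.6)      # SSD at 600 MB/s

print(f"memory: {memory_seconds:.1f} s")
print(f"SSD:    {ssd_seconds:.1f} s")
```

Under these assumptions the memory case matches the 3.9 seconds in the table, and the SSD case comes out above 60 seconds.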
Id sz price … … quant … … last_stm … … … … … weight
1 4 3.90 … … 31 … … 2013-… … … … … … 0.6
2 3 13 … … 70 … … 2010-… … … … … … 0.8
3 2 4.25 … … 432 … … 2013-… … … … … … 1
4 4 4 … … 45 … … 2013-… … … … … … 6
…
4… 19 95 … … 37 … … 2013-… … … … … … 0.6
4… 2 59 … … 90 … … 2012-… … … … … … 1.5
Column-oriented store
Columnar Store Motivation
• Read subset of columns to reduce I/O
• Better compression
– Less disk usage
– Less disk I/O
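The I/O argument can be illustrated with a toy model. This is only a sketch, not how cstore_fdw stores data: it treats each stored value as one unit of I/O, uses three rows from the example table, and all names are hypothetical.

```python
from typing import Dict, List

# Three rows taken from the example table on the earlier slides.
rows = [
    {"id": 1, "price": 3.90, "quantity": 31, "weight": 0.6},
    {"id": 2, "price": 13.0, "quantity": 70, "weight": 0.8},
    {"id": 3, "price": 4.25, "quantity": 432, "weight": 1.0},
]


def to_columns(rows: List[dict]) -> Dict[str, list]:
    """Pivot row-oriented records into one list of values per column."""
    columns: Dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def values_read_row_store(rows: List[dict]) -> int:
    """A row store reads every value of every row, whatever the query needs."""
    return sum(len(row) for row in rows)


def values_read_column_store(columns: Dict[str, list], wanted: List[str]) -> int:
    """A column store reads only the columns the query references."""
    return sum(len(columns[name]) for name in wanted)


columns = to_columns(rows)
row_cost = values_read_row_store(rows)
column_cost = values_read_column_store(columns, ["price", "quantity"])
print(row_cost, column_cost)
```

With 700 columns and a query touching 5 of them, the same ratio is what produces the large savings described earlier.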
State of the Columnar Store
1. Fork a popular database, swap in your
storage engine, and never look back
2. Develop an open columnar store format for
the Hadoop Distributed Filesystem (HDFS)
3. Use PostgreSQL extension machinery for
in-memory stores / external databases
Columnar Store Specs
• Record Columnar File (RCFile)
– Facebook, OSU, and Chinese Academy of Sciences
– First horizontally-partition, then vertically-partition
• ORC (Optimized RCFile)
– Second generation. Developed by Hortonworks and
Facebook
– Lightweight indexes stored within the file
– Different compression methods within the same file
ORC File Layout benefits
1. Columnar layout – reads columns only
related to the query
2. Compression – groups column values
(10K) together and compresses them
3. Skip indexes – applies predicate filtering
to skip over unrelated values
[Diagram: file layout showing Block 1 through Block 7. Rows are grouped 150K at a time (configurable), and each block holds 10K column values (configurable).]
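The row grouping in the layout can be sketched as follows. This assumes the 150K-row groups are the "stripes" named on the summary slide and that each column within a stripe is cut into 10K-value blocks; the function name and the returned structure are illustrative, not the actual cstore_fdw file format.

```python
from typing import List, Tuple


def split_into_stripes(
    num_rows: int,
    stripe_rows: int = 150_000,
    block_rows: int = 10_000,
) -> List[List[Tuple[int, int]]]:
    """Return, for each stripe, the [start, end) row ranges of its blocks."""
    layout = []
    for stripe_start in range(0, num_rows, stripe_rows):
        stripe_end = min(stripe_start + stripe_rows, num_rows)
        blocks = [
            (block_start, min(block_start + block_rows, stripe_end))
            for block_start in range(stripe_start, stripe_end, block_rows)
        ]
        layout.append(blocks)
    return layout


layout = split_into_stripes(320_000)
for number, blocks in enumerate(layout, start=1):
    print(f"stripe {number}: {len(blocks)} blocks per column")
```

Both sizes are parameters here because the slide marks them as configurable.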
Compression
• Current compression method is PG_LZ
from PostgreSQL core
• Easy to add new compression methods
depending on the CPU / disk trade-off
• cstore_fdw enables using different
compression methods at the column block
level
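Block-level compression with a selectable codec can be sketched like this. PG_LZ is not available in the Python standard library, so `zlib` stands in for it here purely as an assumption; the codec names and helper functions are hypothetical and do not reflect the cstore_fdw on-disk format.

```python
import struct
import zlib
from typing import List


def compress_block(values: List[int], codec: str) -> bytes:
    """Serialize one column block as 64-bit integers and compress it."""
    raw = struct.pack(f"<{len(values)}q", *values)
    if codec == "none":
        return raw
    if codec == "zlib":  # stand-in for PG_LZ in this sketch
        return zlib.compress(raw)
    raise ValueError(f"unknown codec: {codec}")


def decompress_block(data: bytes, count: int, codec: str) -> List[int]:
    """Reverse compress_block for a block holding count values."""
    if codec == "none":
        raw = data
    elif codec == "zlib":
        raw = zlib.decompress(data)
    else:
        raise ValueError(f"unknown codec: {codec}")
    return list(struct.unpack(f"<{count}q", raw))


# A low-cardinality block of 10K values compresses well.
block = [i % 5 for i in range(10_000)]
plain = compress_block(block, "none")
packed = compress_block(block, "zlib")
print(len(plain), len(packed))
```

Because the codec is chosen per block, a block that does not compress well could be stored with `"none"` while its neighbours use a heavier codec.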
Table sizes normalized to 1.0
Skip Indexes
• For each column block (10K), cstore_fdw
also records min/max values in a skip
index.
• When the user runs a query, we extract all
filter clauses from the query.
• For example, the query specifies quantity
> 100 AND last_stock_date < '2013-10-01'.
Skip Indexes
• We then use Postgres’ constraint exclusion
mechanism to decide whether to skip over 10K
rows.
• For each filter clause, we create and apply a
constraint. The awesome thing about using
PostgreSQL is that we don’t need to write any code.
• If input data has an inherent time dimension, that
helps. Sorting input data also helps with skip
indexes.
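The skip-index idea can be sketched in a few lines. Note the swap: cstore_fdw relies on Postgres' constraint exclusion to make the decision, whereas this sketch hand-codes the comparison for a single `column > threshold` filter. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockStats:
    """Min/max recorded for one column block."""
    minimum: int
    maximum: int


def build_skip_index(values: List[int], block_rows: int = 10_000) -> List[BlockStats]:
    """Record min/max for each consecutive block of block_rows values."""
    index = []
    for start in range(0, len(values), block_rows):
        block = values[start:start + block_rows]
        index.append(BlockStats(min(block), max(block)))
    return index


def blocks_to_read_greater_than(index: List[BlockStats], threshold: int) -> List[int]:
    """Block numbers that may hold a value > threshold; the rest are skipped."""
    return [number for number, stats in enumerate(index) if stats.maximum > threshold]


# Sorted input: block 0 holds 0..9999, block 1 holds 10000..19999, and so on.
quantity = list(range(30_000))
index = build_skip_index(quantity)
needed = blocks_to_read_greater_than(index, 15_000)
print(needed)
```

The sorted input is what makes skipping effective here: with shuffled values, every block's min/max range would likely span the threshold and nothing could be skipped.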
Drawbacks to ORC
• Support for only eight data types. Each
data type further needs to have a separate
code path for min/max value collection and
constraint exclusion.
• Gathering statistics from the data and
table JOINs are an afterthought.
1. Simply use PostgreSQL
data types’ datum
representation.
2. Avoid deserialization
overhead.
3. Support user-defined
types as well.
Statistics Collection
• FDWs provide an API to collect random samples
from data. Users need to run ANALYZE manually.
• Postgres then constructs histograms, most
common value frequencies, and other stats.
• cstore_fdw estimates query costs for different
access paths based on these statistics. *
• Informed resource usage. Better join order and
join method selection.
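How a random sample turns into a row estimate can be sketched as below. This is a simplified illustration of sampling and most-common-value frequencies, not Postgres' actual ANALYZE algorithm; the sample size, seed, and function names are assumptions.

```python
import random
from collections import Counter
from typing import Dict, List


def sample_column(values: List[float], sample_size: int, seed: int = 0) -> List[float]:
    """Draw a random sample without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(values, min(sample_size, len(values)))


def most_common_frequencies(sample: List[float], top_n: int = 3) -> Dict[float, float]:
    """Fraction of the sample taken by each of the top_n most common values."""
    counts = Counter(sample)
    return {value: count / len(sample) for value, count in counts.most_common(top_n)}


def estimate_matching_rows(frequencies: Dict[float, float], value: float, total_rows: int) -> float:
    """Estimate how many rows equal value, scaling sample frequency to the table."""
    return frequencies.get(value, 0.0) * total_rows


# Hypothetical weight column: 90% of rows weigh 0.6, 10% weigh 1.5.
weights = [0.6] * 9_000 + [1.5] * 1_000
sample = sample_column(weights, 1_000)
frequencies = most_common_frequencies(sample)
estimate = estimate_matching_rows(frequencies, 0.6, len(weights))
print(frequencies, estimate)
```

A planner can compare estimates like this one across candidate plans, which is what makes better join order and join method selection possible.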
Recent Benchmark Results
• TPC-H is a standard benchmark
• Performed in-memory, SSD, and HDD
tests on 10 GB of data
• Used m2.2xlarge and m3.2xlarge on EC2
• Compared vanilla PostgreSQL, CStore,
CStore with compression
10GB of uncached data on m2.2xlarge
10GB of uncached data on m3.2xlarge
Total issued disk I/O measured with iotop
10GB of cached data on m2/m3.2xlarge
Future Work
• CStore is an open source project actively in
development: github.com/citusdata/cstore_fdw
– Improve memory usage
– Automatically determine paths for data files
– Native Delete / Insert / Update support
– Improve read query performance (vectorized execution)
– Different compression codecs
– Many more; contribute to the discussion on GitHub!
Summary
• CStore: Open source columnar store fdw for Postgres
• Data layout is based on ORC
1 Columnar data layout per stripe
2 Supports different compression codecs
3 Skip indexes enable predicate filtering
• Uses foreign data wrapper APIs
1 Supports all PostgreSQL data types
2 Statistics collection for better query plans
3 Load extension. Create Table. Copy
cstore_fdw – Columnar Store
for Analytic Workloads
Hadi Moshayedi – hadi@citusdata.com
Ozgun Erdogan – ozgun@citusdata.com

SF PostgreSQL User Group cstore presentation


Editor's Notes

  1. Columnar store for PostgreSQL Ozgun .. founder at Citus Data SF and Istanbul <short bio> Hadi did bulk of the work on the columnar store Have about 30 slides and a demo. I’ll put things into context with 2 slides on Citus Technical talk. If you have questions, please feel free to interrupt Speak slowly.
  2. When I say "extends", I mean we didn’t take a particular version of Postgres and fork from there. Instead we went from 8.4 to 9.0, etc. We used the existing API and integration points: query planner and executor hooks are an example.
  3. Let’s take an example distributed table, and see how it’s spread across the worker nodes. The yellow boxes here are shards that make up the distributed table. Worker node extensions Master node extensions 1 shard = 1 postgres table = 1 cstore table
  4. Relative ease of use: PostgreSQL config could be much simpler HDFS: NameNode / DataNode, Hadoop: JobTracker / TaskTracker, Hive: metadata server (MySQL), etc. Uses the copy hook for loading in the data
  5. TPC-H is an ad-hoc, decision support benchmark. Each table has between 10-20 columns. So not the best benchmark to demonstrate column store performance. Talk about what graphs are going to show m3.2xlarge (2 x 80G SSD, 30G ram, 4x3.25 ECU - 10G tests) m2.2xlarge (1 x 850G HDD, 34.2G ram, 4x3.25 ECU - 10G tests)
  6. Representative queries Q6: 68s -> 25s (Q3: 85s -> 44s) 1/ Reduce disk bottlenecks 2/ If you’re deploying PB scale clusters, reduces number of machines
  7. Q6: 26s -> 14s (Q3: 37s -> 26s) 1/ Reduces SSD storage costs 2/ Query performance starts increasing with CitusDB (use of multiple cores)
  8. * Q6: 9GB -> 1.8GB -> 0.8GB
  9. cstore is slightly faster. cstore with compression is slightly slower due to the compression’s CPU cost. Effective memory size increases: 1/ Compression (instead of fitting 1GB, users can now fit 2-3GB) 2/ If queries always select a subset of the columns, then only those columns occupy the working set 3/ Ideally, skip indexes are always kept in memory (they get referenced on each query)
  10. Bug fixes! Better cost estimates for join operations!
  11. Questions?