SF PostgreSQL User Group cstore presentation

cstore_fdw – Columnar store for analytic workloads
Hadi Moshayedi & Ozgun Erdogan

What is CitusDB?
• CitusDB is a scalable analytics database that extends PostgreSQL
– Citus shards your data and automatically parallelizes your queries
– Citus isn’t a fork of Postgres. Rather, it hooks onto the planner and executor for distributed query execution.
– Always rebased to newest Postgres version
– Natively supports new data types and extensions

[Figure: CitusDB architecture. A master node holds shard and shard placement metadata. Worker nodes #1, #2, #3, ... (extended PostgreSQL) hold the shards (A, C, D, G, ...). 1 shard = 1 Postgres table.]

Talk Overview
1. Why customers want columnar stores
2. Live demo
3. Optimized Row Columnar (ORC) format
4. PostgreSQL benefits
5. New benchmark numbers

[Figure: example table with columns Id, Sz, Ln, Ht, ...; 30M rows by 700 columns.]

Example SQL query
SELECT
id, AVG(price), MAX(price)
FROM
items
WHERE
quantity > 100 AND
last_stock_date < '2013-10-01'
GROUP BY
weight;

Row-oriented store
Id  …  price  …  …  quant  …  …  last_stm  …  …  …  …  …  weight
1   …  3.90   …  …  31     …  …  2013-…    …  …  …  …  …  0.6
2   …  13     …  …  70     …  …  2010-…    …  …  …  …  …  0.8
3   …  4.25   …  …  432    …  …  2013-…    …  …  …  …  …  1
4   …  4      …  …  45     …  …  2013-…    …  …  …  …  …  6
…
4…  …  95     …  …  37     …  …  2013-…    …  …  …  …  …  0.6
4…  …  59     …  …  90     …  …  2012-…    …  …  …  …  …  1.5

Cost of row storage
• Read 700 columns instead of 5
• >39 GB of unnecessary I/O
Input type   Estimated input rate   Cost to query performance
Memory       10 GB/s                3.9 seconds
SSD          600 MB/s               >60 seconds
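The figures in this table follow from dividing the unnecessary I/O by the input rate. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and taking the slide's 39 GB as the amount read; the function name is illustrative only:

```python
# Back-of-the-envelope scan cost, using the figures from the slide.
# Assumption: 1 GB = 10**9 bytes and 1 MB = 10**6 bytes.
UNNECESSARY_IO_BYTES = 39 * 10**9  # ">39 GB of unnecessary I/O"


def scan_seconds(num_bytes: int, bytes_per_second: float) -> float:
    """Time to read num_bytes at a sustained input rate."""
    return num_bytes / bytes_per_second


memory_seconds = scan_seconds(UNNECESSARY_IO_BYTES, 10 * 10**9)  # 10 GB/s
ssd_seconds = scan_seconds(UNNECESSARY_IO_BYTES, 600 * 10**6)    # 600 MB/s

print(f"memory: {memory_seconds:.1f} s, SSD: {ssd_seconds:.1f} s")
```

Since the slide gives the I/O as a lower bound (>39 GB), both times are lower bounds as well.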

Column-oriented store
Id  sz  price  …  …  quant  …  …  last_stm  …  …  …  …  …  weight
1   4   3.90   …  …  31     …  …  2013-…    …  …  …  …  …  0.6
2   3   13     …  …  70     …  …  2010-…    …  …  …  …  …  0.8
3   2   4.25   …  …  432    …  …  2013-…    …  …  …  …  …  1
4   4   4      …  …  45     …  …  2013-…    …  …  …  …  …  6
…
4…  19  95     …  …  37     …  …  2013-…    …  …  …  …  …  0.6
4…  2   59     …  …  90     …  …  2012-…    …  …  …  …  …  1.5

Columnar Store Motivation
• Read subset of columns to reduce I/O
• Better compression
– Less disk usage
– Less disk I/O

State of the Columnar Store
1. Fork a popular database, swap in your storage engine, and never look back
2. Develop an open columnar store format for the Hadoop Distributed Filesystem (HDFS)
3. Use PostgreSQL extension machinery for in-memory stores / external databases

Columnar Store Specs
• Record Columnar File (RCFile)
– Facebook, OSU, and Chinese Academy of Sciences
– Partitions data horizontally first, then vertically
• ORC (Optimized Row Columnar)
– Second generation, developed by Hortonworks and Facebook
– Lightweight indexes stored within the file
– Different compression methods within the same file

ORC File Layout benefits
1. Columnar layout – reads only the columns related to the query
2. Compression – groups column values (10K) together and compresses them
3. Skip indexes – applies predicate filtering to skip over unrelated values

[Figure: file layout. Rows are grouped 150K at a time (configurable); each column's values are stored in blocks (Block 1 to Block 7 shown) of 10K column values (configurable) per block.]
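The row grouping in this layout can be sketched as follows. This is an illustration only, not cstore_fdw code: it assumes the 150K-row groups are the stripes mentioned in the summary, and `block_ranges` is a hypothetical helper.

```python
from typing import Iterator, Tuple

STRIPE_ROWS = 150_000  # rows per stripe (configurable, per the slide)
BLOCK_ROWS = 10_000    # column values per block (configurable, per the slide)


def block_ranges(total_rows: int,
                 stripe_rows: int = STRIPE_ROWS,
                 block_rows: int = BLOCK_ROWS) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (stripe_index, block_index, first_row, last_row_exclusive)."""
    stripe_starts = range(0, total_rows, stripe_rows)
    for stripe_index, stripe_start in enumerate(stripe_starts):
        stripe_end = min(stripe_start + stripe_rows, total_rows)
        block_starts = range(stripe_start, stripe_end, block_rows)
        for block_index, block_start in enumerate(block_starts):
            block_end = min(block_start + block_rows, stripe_end)
            yield (stripe_index, block_index, block_start, block_end)


# A hypothetical 320,000-row table: two full stripes and one partial stripe.
layout = list(block_ranges(320_000))
print(len(layout), layout[0], layout[-1])
```

Every column in a stripe is cut at the same row boundaries, so block N of one column lines up with block N of every other column in that stripe.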

Compression
• Current compression method is PG_LZ
from PostgreSQL core
• Easy to add new compression methods
depending on the CPU / disk trade-off
• cstore_fdw enables using different
compression methods at the column block
level

Skip Indexes
• For each column block (10K values), cstore_fdw also records min/max values in a skip index.
• When the user runs a query, we extract all filter clauses from the query.
• For example, the query specifies quantity > 100 AND last_stock_date < '2013-10-01'.

Skip Indexes
• We then use Postgres’ constraint exclusion
mechanism to decide whether to skip over 10K
rows.
• For each filter clause, we create and apply a
constraint. The awesome thing about using
PostgreSQL is that we don’t need to write any code.
• If input data has an inherent time dimension, that
helps. Sorting input data also helps with skip
indexes.
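The idea behind skip indexes can be sketched in a few lines. cstore_fdw relies on Postgres' constraint exclusion for this; the sketch below replaces that with a single hand-written "greater than" check, and the class, the function names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BlockStats:
    """Min/max of one column block, as a skip index entry would record."""
    minimum: int
    maximum: int


def build_skip_index(values: Sequence[int], block_size: int) -> List[BlockStats]:
    """Record min/max for each consecutive block of block_size values."""
    index = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        index.append(BlockStats(min(block), max(block)))
    return index


def blocks_to_read_greater_than(index: List[BlockStats], threshold: int) -> List[int]:
    """Blocks that may contain a value > threshold; all others are skipped."""
    return [n for n, stats in enumerate(index) if stats.maximum > threshold]


# Hypothetical sorted quantity column, with a tiny block size for readability.
quantity = [3, 8, 15, 31, 45, 70, 90, 120, 432, 500]
index = build_skip_index(quantity, block_size=4)
selected = blocks_to_read_greater_than(index, 100)  # quantity > 100
print(selected)
```

Because the sample column is sorted, the first block's maximum (31) rules it out entirely. Shuffled input would widen every block's min/max range and fewer blocks could be skipped, which is why sorting input data helps.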

Drawbacks to ORC
• Support for only eight data types. Each
data type further needs to have a separate
code path for min/max value collection and
constraint exclusion.
• Gathering statistics from the data and
table JOINs are an afterthought.

1. Simply use PostgreSQL
data types’ datum
representation.
2. Avoid deserialization
overhead.
3. Support user-defined
types as well.

Statistics Collection
• FDWs provide an API to collect random samples from data. Users need to run ANALYZE manually.
• Postgres then constructs histograms, most common value frequencies, and other stats.
• cstore_fdw estimates query costs for different access paths based on these statistics.
• The result is informed resource usage, with better join order and join method selection.
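To make the kind of statistics concrete, here is a simplified sketch of deriving most common value frequencies and histogram bounds from a random sample. It is a stand-in for illustration, not Postgres' actual ANALYZE algorithm; all function names and the sample column are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def sample_rows(values: Sequence[int], sample_size: int, seed: int = 0) -> List[int]:
    """Uniform random sample without replacement (whole column if it is small)."""
    if sample_size >= len(values):
        return list(values)
    return random.Random(seed).sample(list(values), sample_size)


def most_common_values(sample: Sequence[int], top_n: int) -> List[Tuple[int, float]]:
    """(value, frequency) pairs for the top_n most frequent sampled values."""
    counts = Counter(sample)
    return [(value, count / len(sample)) for value, count in counts.most_common(top_n)]


def equi_depth_bounds(sample: Sequence[int], buckets: int) -> List[int]:
    """Boundaries that split the sorted sample into roughly equal-sized buckets."""
    ordered = sorted(sample)
    last = len(ordered) - 1
    return [ordered[(i * last) // buckets] for i in range(buckets + 1)]


weights = [1] * 60 + [2] * 30 + [6] * 10  # hypothetical column values
sample = sample_rows(weights, sample_size=1000)
mcv = most_common_values(sample, top_n=2)
bounds = equi_depth_bounds(sample, buckets=4)
print(mcv, bounds)
```

A planner can use frequencies like these to estimate how many rows a filter such as weight = 1 will return, which in turn informs join order and join method.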

Recent Benchmark Results
• TPC-H is a standard benchmark
• Performed in-memory, SSD, and HDD
tests on 10 GB of data
• Used m2.2xlarge and m3.2xlarge on EC2
• Compared vanilla PostgreSQL, CStore,
CStore with compression

10GB of uncached data on m2.2xlarge

10GB of uncached data on m3.2xlarge

Total issued disk I/O, measured with iotop

10GB of cached data on m2/m3.2xlarge

Future Work
• CStore is an open source project in active development: github.com/citusdata/cstore_fdw
– Improve memory usage
– Automatically determine paths for data files
– Native Delete / Insert / Update support
– Improve read query performance (vectorized execution)
– Different compression codecs
– Many more; contribute to the discussion on GitHub!

Summary
• CStore: open source columnar store foreign data wrapper (FDW) for Postgres
• Data layout is based on ORC
1. Columnar data layout per stripe
2. Supports different compression codecs
3. Skip indexes enable predicate filtering
• Uses foreign data wrapper APIs
1. Supports all PostgreSQL data types
2. Statistics collection for better query plans
3. Simple setup: load the extension, create a table, COPY the data

cstore_fdw – Columnar Store for Analytic Workloads
Hadi Moshayedi – hadi@citusdata.com
Ozgun Erdogan – ozgun@citusdata.com
