The slides used at the San Francisco PostgreSQL User Group meetup (http://www.meetup.com/postgresql-1/events/178687982/). Learn how we at Citus Data implemented a columnar store for PostgreSQL using foreign data wrappers, with a discussion of the architecture and benchmark results.
2. What is CitusDB?
• CitusDB is a scalable analytics database that extends PostgreSQL
– Citus shards your data and automatically parallelizes your queries
– Citus isn’t a fork of Postgres. Rather, it hooks into the planner and executor for distributed query execution.
– Always rebased onto the newest Postgres version
– Natively supports new data types and extensions
3. [Architecture diagram: a master node (extended PostgreSQL) holding shard and shard placement metadata, connected to worker node #1, worker node #2, worker node #3, … (each an extended PostgreSQL); 1 shard = 1 Postgres table]
4. Talk Overview
1. Why customers want columnar stores
2. Live demo
3. Optimized Row Columnar (ORC) format
4. PostgreSQL benefits
5. New benchmark numbers
16. Columnar Store Motivation
• Read subset of columns to reduce I/O
• Better compression
– Less disk usage
– Less disk I/O
17. State of the Columnar Store
1. Fork a popular database, swap in your storage engine, and never look back
2. Develop an open columnar store format for the Hadoop Distributed Filesystem (HDFS)
3. Use PostgreSQL extension machinery for in-memory stores / external databases
18. Columnar Store Specs
• Record Columnar File (RCFile)
– Facebook, OSU, and Chinese Academy of Sciences
– Data is first horizontally partitioned, then vertically partitioned
• ORC (Optimized RCFile)
– Second generation. Developed by Hortonworks and Facebook
– Lightweight indexes stored within the file
– Different compression methods within the same file
19. ORC File Layout benefits
1. Columnar layout – reads only the columns related to the query
2. Compression – groups column values (10K) together and compresses them
3. Skip indexes – applies predicate filtering to skip over unrelated values
21. Compression
• Current compression method is PG_LZ from PostgreSQL core
• Easy to add new compression methods depending on the CPU / disk trade-off
• cstore_fdw enables using different compression methods at the column block level
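The benefit of grouping column values together before compressing them can be illustrated with a small Python sketch. It uses zlib as a stand-in for PG_LZ, and the table contents are made up; the point is only that a column-oriented layout puts similar values next to each other, which compresses better than a row-oriented layout.

```python
import random
import zlib

# Hypothetical table: 10,000 rows of (quantity, status).
# Quantities are high-entropy; statuses come from a tiny domain.
random.seed(0)
rows = [
    (random.randint(0, 999), random.choice(["SHIPPED", "PENDING", "RETURNED"]))
    for _ in range(10_000)
]

# Row-oriented layout: values of different columns interleaved.
row_bytes = "".join(f"{q},{s};" for q, s in rows).encode()

# Column-oriented layout: each column's values grouped together,
# as in an ORC stripe, before compression is applied.
col_bytes = (
    ",".join(str(q) for q, _ in rows).encode()
    + b"|"
    + ",".join(s for _, s in rows).encode()
)

row_compressed = zlib.compress(row_bytes)
col_compressed = zlib.compress(col_bytes)

# Grouping the repetitive status column by itself lets the
# compressor exploit its redundancy much more effectively.
print(len(row_compressed), len(col_compressed))
```

This is also why per-column-block compression methods make sense: the status column above would benefit from a dictionary-style codec, while the quantity column would not.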
23. Skip Indexes
• For each column block (10K rows), cstore_fdw also records min/max values in a skip index.
• When the user runs a query, we extract all filter clauses from the query.
• For example, the query specifies quantity > 100 AND last_stock_date < ‘2013-10-01’.
24. Skip Indexes
• We then use Postgres’ constraint exclusion mechanism to decide whether to skip over 10K rows.
• For each filter clause, we create and apply a constraint. The awesome thing about using PostgreSQL is that we don’t need to write any code.
• If input data has an inherent time dimension, that helps. Sorting input data also helps with skip indexes.
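The idea in the two slides above can be sketched in Python. The names and structure here are illustrative, not cstore_fdw internals — cstore_fdw reuses Postgres’ constraint exclusion machinery instead of custom comparison code like this — but the min/max bookkeeping and block skipping work the same way.

```python
BLOCK_ROW_COUNT = 10_000  # cstore_fdw records min/max per 10K-value block

def build_skip_index(values, block_size=BLOCK_ROW_COUNT):
    """Record (min, max) for each block of a column."""
    index = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        index.append((min(block), max(block)))
    return index

def blocks_to_read(skip_index, lo=None, hi=None):
    """Return ids of blocks that may contain a value with
    value > lo and value < hi; all other blocks are skipped."""
    keep = []
    for block_id, (bmin, bmax) in enumerate(skip_index):
        if lo is not None and bmax <= lo:
            continue  # every value in the block fails value > lo
        if hi is not None and bmin >= hi:
            continue  # every value in the block fails value < hi
        keep.append(block_id)
    return keep

# Example: a 'quantity' column loaded in sorted order
# (sorted input makes block min/max ranges narrow and disjoint).
quantity = list(range(40_000))            # 4 blocks of 10K values
idx = build_skip_index(quantity)
print(blocks_to_read(idx, lo=25_000))     # → [2, 3]: blocks 0 and 1 skipped
```

The last comment on the slide falls out of this sketch: if the data arrives unsorted, each block’s [min, max] range is wide and overlaps the filter, so few blocks can be skipped.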
25. Drawbacks to ORC
• Support for only eight data types. Each data type further needs a separate code path for min/max value collection and constraint exclusion.
• Gathering statistics from the data, and table JOINs, are an afterthought.
26. 1. Simply use PostgreSQL data types’ datum representation.
2. Avoid deserialization overhead.
3. Support user-defined types as well.
27. Statistics Collection
• FDWs provide an API to collect random samples from the data. Users need to manually run ANALYZE.
• Postgres then constructs histograms, most common value frequencies, and other stats.
• cstore_fdw estimates query costs for different access paths based on these statistics. *
• Informed resource usage. Better join order and join method selection.
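For example (the table and column names here are hypothetical), a user collects statistics and inspects the resulting plan with standard PostgreSQL commands — no cstore-specific syntax is needed:

```sql
-- Sample rows through the FDW analyze API and build statistics
ANALYZE customer_reviews;

-- The planner now has histograms and MCV lists to cost this scan
EXPLAIN SELECT count(*)
FROM customer_reviews
WHERE review_rating > 4;
```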
28. Recent Benchmark Results
• TPC-H is a standard benchmark
• Performed in-memory, SSD, and HDD tests on 10 GB of data
• Used m2.2xlarge and m3.2xlarge instances on EC2
• Compared vanilla PostgreSQL, CStore, and CStore with compression
33. Future Work
• CStore is an open source project actively in development: github.com/citusdata/cstore_fdw
– Improve memory usage
– Automatically determine paths for data files
– Native Delete / Insert / Update support
– Improve read query performance (vectorized execution)
– Different compression codecs
– Many more; contribute to the discussion on GitHub!
34. Summary
• CStore: Open source columnar store fdw for Postgres
• Data layout is based on ORC
1 Columnar data layout per stripe
2 Supports different compression codecs
3 Skip indexes enable predicate filtering
• Uses foreign data wrapper APIs
1 Supports all PostgreSQL data types
2 Statistics collection for better query plans
3 Load extension. Create Table. Copy
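The three-step workflow above looks roughly like this in SQL. The table definition, file path, and option values are illustrative; see the cstore_fdw README on GitHub for the full set of options.

```sql
-- 1. Load the extension and define a cstore server
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- 2. Create a columnar foreign table
CREATE FOREIGN TABLE customer_reviews (
    customer_id    TEXT,
    review_date    DATE,
    review_rating  INTEGER
)
SERVER cstore_server
OPTIONS (compression 'pglz');   -- default is 'none'

-- 3. Copy data in
COPY customer_reviews FROM '/tmp/customer_reviews.csv' WITH CSV;
```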
35. cstore_fdw – Columnar Store
for Analytic Workloads
Hadi Moshayedi – hadi@citusdata.com
Ozgun Erdogan – ozgun@citusdata.com
Editor's Notes
Columnar store for PostgreSQL
Ozgun .. founder at Citus Data
SF and Istanbul <short bio>
Hadi did bulk of the work on the columnar store
Have about 30 slides and a demo. I’ll put things into context with 2 slides on Citus
Technical talk. If you have questions, please feel free to interrupt
Speak slowly.
When I say extends: we didn’t take a particular version of Postgres and fork from there. Instead we went from 8.4 to 9.0, etc.
We used the existing API and integration points: query planner and executor hooks are an example.
Let’s take an example distributed table, and see how it’s spread across the worker nodes.
The yellow boxes here are shards that make up the distributed table.
Worker node extensions
Master node extensions
1 shard = 1 postgres table = 1 cstore table
Relative ease of use: PostgreSQL config could be much simpler
HDFS: NameNode / DataNode, Hadoop: JobTracker / TaskTracker, Hive: metadata server (MySQL), etc.
Uses the copy hook for loading in the data
TPC-H is an ad-hoc, decision support benchmark.
Each table has between 10 and 20 columns, so it’s not the best benchmark to demonstrate column store performance.
Talk about what graphs are going to show
m3.2xlarge (2 x 80G SSD, 30G ram, 4x3.25 ECU - 10G tests)
m2.2xlarge (1 x 850G HDD, 34.2G ram, 4x3.25 ECU - 10G tests)
Representative queries
Q6: 68s -> 25s (Q3: 85s -> 44s)
1/ Reduce disk bottlenecks
2/ If you’re deploying PB scale clusters, reduces number of machines
cstore is slightly faster. cstore with compression is slightly slower due to the compression’s CPU cost.
Effective memory size increases
1/ Compression (Instead of fitting 1GB, users can now fit in 2-3GB)
2/ If queries always select a subset of the columns, then only those columns occupy the working set
3/ Ideally, skip indexes are always kept in memory (they get referenced on each query)
Bug fixes!
Better cost estimates for join operations!