This presentation discusses how Postgres-XC can be used as a PostgreSQL-based key-value store using features like hstore and JSON. It also compares its performance with MongoDB for insert and read workloads.
2. Who Am I?
Mason Sharp
● Original architect of Stado / GridSQL
● One of the original architects of Postgres-XC
● Former architect at EnterpriseDB
● Co-organizer of NYC PostgreSQL User Group
● Co-founder and CTO of
3. Agenda
● Why use a key-value store?
● PostgreSQL features
  – XML
  – hstore
  – JSON
● Postgres-XC Overview
● Measurements: MongoDB versus Postgres-XC
5. Why Use a Key-Value Store?
● Document oriented vs. row oriented
● Unstructured data
● Semi-structured data
● Self-describing / schema-less
● Uses tags
● Dynamic attributes for different objects
● Dwight Merriman, CEO of 10gen (paraphrasing):
  – “Some customers use MongoDB just for the schema-less features. They don't need the scalability and run on one single server” (!)
  – “Easier for developers” (...)
6. Why Use a Key-Value Store? (2)
● Key-value makes for an easy distributed store
  – Multiple servers
  – In-memory
● No complicated schema changes
  – But PostgreSQL's ALTER TABLE exclusive lock may be held only briefly
● Need to be “web-scale”
  – Perception that it scales better
● What if it no longer fits in memory?
● A series of unfortunate anecdotes
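To make the ALTER TABLE point concrete, here is a minimal sketch using a hypothetical table kv_demo. It assumes PostgreSQL 9.3 or later for lock_timeout, which the slide does not mention. Adding a nullable column with no default only changes the catalog, so the exclusive lock is held briefly, and lock_timeout keeps the statement from queueing indefinitely behind long-running transactions.

```sql
-- Hypothetical table used only for this illustration
CREATE TABLE kv_demo (id int PRIMARY KEY, name text);

-- Give up instead of waiting behind long-running transactions
-- (lock_timeout is available from PostgreSQL 9.3)
SET lock_timeout = '2s';

-- Nullable column with no default: no table rewrite,
-- so the ACCESS EXCLUSIVE lock is held only briefly
ALTER TABLE kv_demo ADD COLUMN department text;

RESET lock_timeout;
```

If the lock cannot be acquired within the timeout, the ALTER fails and can simply be retried later.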
8. XML
● Build with --with-libxml (configure option)
● Native data type
  CREATE TABLE foo (myid int, data xml);
● Validation
  INSERT INTO foo VALUES (2, '<aaa');
  ERROR:  invalid XML content
  DETAIL:  line 1: Couldn't find end of Start Tag aaa line 1
● XPath
● Mapping & export functions
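A short sketch of the XPath and export bullets, assuming a server built with --with-libxml and a hypothetical table xml_demo (named to avoid clashing with foo). xpath() returns an array of xml values, and table_to_xml() is one of the mapping functions that export a whole table as an XML document.

```sql
-- Hypothetical table; requires a server built with --with-libxml
CREATE TABLE xml_demo (myid int, data xml);

INSERT INTO xml_demo VALUES
  (1, '<person><name>fred</name><department>sales</department></person>');

-- XPath: xpath() returns xml[]; take the first match and cast it to text
SELECT myid, (xpath('/person/name/text()', data))[1]::text AS name
FROM xml_demo;

-- Export: table_to_xml(table, nulls, tableforest, targetns)
SELECT table_to_xml('xml_demo', true, false, '');
```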
11. hstore
SELECT hdata->'name' FROM foo WHERE id = 10;
 ?column?
----------
 fred
(1 row)

-- Extract all department values where the key is present
SELECT hdata->'department'
FROM foo
WHERE hdata ? 'department';
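The queries on this slide assume a table foo with an integer id and an hstore column hdata, which is not shown. A minimal setup sketch under that assumption, with hypothetical sample rows; hstore is a contrib module enabled with CREATE EXTENSION (PostgreSQL 9.1 and later).

```sql
-- hstore is a contrib module; CREATE EXTENSION needs PostgreSQL 9.1+
CREATE EXTENSION IF NOT EXISTS hstore;

-- Assumed table shape (drops any earlier demo table named foo)
DROP TABLE IF EXISTS foo;
CREATE TABLE foo (id int PRIMARY KEY, hdata hstore);

-- Hypothetical rows: each row carries its own set of keys
INSERT INTO foo VALUES
  (10, 'name => fred, department => sales'),
  (11, 'name => wilma, city => bedrock'),
  (12, 'name => barney, department => support');

-- Returns sales and support; row 11 has no department key
SELECT hdata->'department'
FROM foo
WHERE hdata ? 'department';
```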
13. hstore
-- Get a list of unique keys
SELECT DISTINCT (each(hdata)).key
FROM foo;
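each(hdata) expands every pair into (key, value) rows. When only the keys matter, skeys() returns them as a set of text. A sketch that counts how many rows carry each key, using hypothetical inline rows in place of foo. Joining to a set-returning function that references an earlier FROM item relies on the implicit LATERAL behavior added in PostgreSQL 9.3.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- Inline hypothetical rows stand in for foo.hdata
SELECT k AS key, count(*) AS rows_with_key
FROM (VALUES ('name => fred, department => sales'::hstore),
             ('name => wilma, city => bedrock'::hstore)) AS t(hdata),
     skeys(t.hdata) AS k
GROUP BY k
ORDER BY k;
-- city 1, department 1, name 2
```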
14. hstore - Indexes
● B-tree index only helps with '='
● GIN and GiST indexes help with the operators:
  – @> left operand contains right
  – ? contains key
  – ?& contains all keys in array
  – ?| contains at least one key in array
● Can create an index on a custom function (expression index)
  – e.g. to extract a particular key value
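A sketch of the two kinds of index above, assuming the same foo(id, hdata hstore) shape with hypothetical rows and index names. It builds a GIN index on the whole column for the containment and key-existence operators, and an expression (B-tree) index on one extracted key. On a table this small the planner will likely still choose a sequential scan; the point is the DDL and the query shapes.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- Start clean: drops any earlier demo table named foo
DROP TABLE IF EXISTS foo;
CREATE TABLE foo (id int PRIMARY KEY, hdata hstore);

INSERT INTO foo VALUES
  (10, 'name => fred, department => sales'),
  (11, 'name => wilma, city => bedrock');

-- GIN index on the whole hstore: serves @>, ?, ?& and ?|
CREATE INDEX foo_hdata_gin ON foo USING gin (hdata);

-- Expression index (plain B-tree) on one extracted key:
-- serves WHERE hdata->'name' = '...'
CREATE INDEX foo_name_idx ON foo ((hdata->'name'));

-- Query shapes these indexes can serve
SELECT id FROM foo WHERE hdata @> 'department => sales';
SELECT id FROM foo WHERE hdata ?| ARRAY['city', 'phone'];
SELECT id FROM foo WHERE hdata->'name' = 'fred';
```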
16. JSON – looking ahead to PostgreSQL 9.3
● New in PostgreSQL 9.3:
  – json_agg
  – hstore_to_json
  – hstore_to_json_loose
  – … and much more
● http://www.postgresql.org/docs/devel/static/functions-json.html
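A quick sketch of the three functions named above, using hypothetical literal values; it needs PostgreSQL 9.3 or later and the hstore extension. hstore_to_json emits every value as a JSON string, hstore_to_json_loose converts values that look like numbers or booleans, and json_agg collects rows into one JSON array.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- hstore_to_json: every value becomes a JSON string ("42")
SELECT hstore_to_json('name => fred, age => 42'::hstore);

-- hstore_to_json_loose: numeric/boolean-looking values become JSON numbers/booleans
SELECT hstore_to_json_loose('name => fred, age => 42'::hstore);

-- json_agg: aggregate rows into a single JSON array of objects
SELECT json_agg(t)
FROM (VALUES (1, 'fred'), (2, 'wilma')) AS t(id, name);
```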
17. Composite Type
CREATE TYPE address AS (
    street TEXT,
    city   TEXT,
    state  TEXT,
    zip    CHAR(10));
CREATE TABLE customer (
    full_name    TEXT,
    mail_address address);
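To use the type, build values with ROW(...) and read fields with (column).field; the parentheses keep PostgreSQL from parsing mail_address as a table name. A sketch with a hypothetical customer row, recreating the slide's type and table so it runs on its own:

```sql
-- Recreate the slide's type and table (table first, since it depends on the type)
DROP TABLE IF EXISTS customer;
DROP TYPE IF EXISTS address;
CREATE TYPE address AS (street TEXT, city TEXT, state TEXT, zip CHAR(10));
CREATE TABLE customer (full_name TEXT, mail_address address);

-- ROW(...) builds a composite value; the data is hypothetical
INSERT INTO customer
VALUES ('Fred Smith', ROW('1 Main St', 'New York', 'NY', '10001'));

-- Parentheses make (mail_address).city a field reference
SELECT full_name, (mail_address).city
FROM customer
WHERE (mail_address).state = 'NY';
```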
19. Postgres-XC
● PostgreSQL-based database cluster
  – Same API to apps as PostgreSQL
    • Same drivers
● Symmetric multi-headed cluster
  – No master, no slave
    • Not just PostgreSQL replication
    • Application can read/write to any coordinator server
  – Consistent database view to all transactions
    • Complete ACID properties for all transactions in the cluster
● Scales for both writes and reads
21. Sep 20, 2012 Postgres-XC 21
Postgres-XC Cluster (diagram)
● Each PG-XC server runs a Coordinator and a Data Node
● PG-XC servers communicate among themselves; add PG-XC servers as needed
● A Global Transaction Manager (GTM) serves all PG-XC servers
● Application can connect to any server to have the same database view and service
22. Coordinator Overview
● Based on PostgreSQL
● Accepts connections from clients
● Parses and plans requests
● Interacts with the Global Transaction Manager
● Uses a pooler for Data Node connections
● Sends XIDs and snapshots down to Data Nodes
● Collects results and returns them to the client
● Uses two-phase commit if necessary
23. Data Node Overview
● Based on PostgreSQL
● Where user-created data is actually stored
● Coordinators (not clients) connect to Data Nodes
● Accepts XIDs and snapshots from the Coordinator
● The rest is fairly similar to vanilla PostgreSQL
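Which Data Node stores a given row is fixed by the table's distribution clause, and the Coordinator routes each statement accordingly. A minimal sketch, assuming a Postgres-XC cluster with the hstore extension already installed and using hypothetical table names kv_store and departments: DISTRIBUTE BY HASH spreads rows across the Data Nodes, while DISTRIBUTE BY REPLICATION keeps a full copy on each one. This is Postgres-XC syntax; vanilla PostgreSQL rejects it.

```sql
-- Postgres-XC only: run on any Coordinator.
-- Assumes hstore is already installed on every node.

-- Hash-distribute rows across the Data Nodes by id
CREATE TABLE kv_store (id int PRIMARY KEY, hdata hstore)
DISTRIBUTE BY HASH (id);

-- Small lookup table copied in full to every Data Node
CREATE TABLE departments (code text PRIMARY KEY, label text)
DISTRIBUTE BY REPLICATION;

INSERT INTO kv_store VALUES (10, 'name => fred'), (11, 'name => wilma');

-- Equality on the distribution column lets the Coordinator target one Data Node
SELECT hdata->'name' FROM kv_store WHERE id = 10;
```

With an equality condition on the distribution column, the Coordinator can usually ship the query to just one Data Node; other filters are sent to all of them.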
26. GTM Proxy
● Runs on the other nodes (the PG-XC servers), not on the GTM host
● Groups requests together
● Reduces the number of connections to the GTM
● Reduces traffic to the GTM
27. Summary
● Coordinator
  – Visible to apps
  – SQL analysis, planning, execution
  – Connection pooling
● Datanode (or simply “NODE”)
  – Actual database store
  – Local SQL execution
● GTM (Global Transaction Manager)
  – Provides consistent database view to transactions
    – GXID (Global Transaction ID)
    – Snapshot (List of active transactions)
    – Other global values such as SEQUENCE
● GTM Proxy, integrates server-local transaction requirements for performance
● Coordinator and Datanode: the Postgres-XC core, based upon vanilla PostgreSQL; they share the same binary, and you may want to colocate them
● GTM and GTM Proxy: different binaries
28. MongoDB vs Postgres-XC Performance Comparison
● Three data nodes (16GB RAM each)
● Postgres-XC also used a coordinator
  – Adds latency
● Out-of-the-box default configuration
● No replicas
29. Insert Comparison – single thread
● 0 – 1M rows
  – MongoDB: 7m 06s
  – Postgres-XC: 131m 1s
  – Postgres-XC COPY: 43s
● 10M – 20M rows
  – MongoDB: 64m 48s
  – Postgres-XC: 354m 56s
● The GTM in XC adds a lot of latency, hurting single-threaded performance
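The COPY figure suggests how to load XC. In autocommit mode every single-row INSERT is its own transaction, and in XC each transaction also needs a GTM round trip. A minimal psql sketch with a hypothetical table kv_load contrasts that pattern with batching many rows into one statement or transaction, and with streaming rows through COPY.

```sql
-- Hypothetical table for the load test
DROP TABLE IF EXISTS kv_load;
CREATE TABLE kv_load (id int PRIMARY KEY, val text);

-- Slow pattern: one autocommitted INSERT per row means one
-- transaction (and, in XC, one GTM round trip) per row
INSERT INTO kv_load VALUES (1, 'a');

-- Better: many rows per statement, inside one transaction
BEGIN;
INSERT INTO kv_load VALUES (2, 'b'), (3, 'c'), (4, 'd');
COMMIT;

-- Best for bulk loads: COPY streams all rows in one command.
-- Inline FROM STDIN data works in psql; end it with \.
COPY kv_load (id, val) FROM STDIN WITH (FORMAT csv);
5,e
6,f
\.
```

Running several such loaders in parallel is the other option the summary mentions.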
33. Possible Future Tests
● Insert/select concurrency test (important)
● Mixed workload
● Measure both in-memory and not-in-memory cases
● Impact of replicas for availability
  – MongoDB replicas
  – Postgres-XC streaming replication
    – Have seen about a 15% performance drop with two synchronous slaves
● MongoDB write-concern durability settings (try journaled)
● hstore
34. Other PostgreSQL Results?
● Christophe Pettus:
  wiki.postgresql.org/images/b/b4/Pg-as-nosql-pgday-fosdem-2013.pdf
● Single laptop-based tests, but interesting
35. Summary
● PostgreSQL has schema-less functionality built in and can act as a key-value store
● Postgres-XC can scale this out horizontally to multiple servers
● MongoDB performs much better for inserts at low concurrency
  – In XC, use COPY or multiple threads to populate
● Postgres-XC performs better for non-partitioned indexed access
● Postgres-XC can perform about the same as MongoDB for reads
36. Summary (2)
If Postgres-XC generally performs similarly to MongoDB, why not use XC and:
● Stick with ACID
● Feel secure with PostgreSQL's maturity
● Leverage PostgreSQL features and community