PostgreSQL as NoSQL

H I M A N C H A L I
T E C H L E A D D B E @ I N M O B I
PostgreSQL as NoSQL

What’s on the plate today ?
 What is the requirement for Schemaless database ?
 Data type and volume that has to be handled.
 How can we use PostgreSQL to store Schemaless
data ?
 Performance of PostgreSQL Schemaless option.

What is Schemaless data ?
 It doesn’t mean unstructured, but it’s structured over
each document
 Each Document is a pair of key-values in a
hierarchical structure of arrays
 The application needs to be aware about this
structure and handle it if it’s not getting as expected
 Modern Schemaless DBs don’t use SQL and hence
are termed NoSQL DB. (confusing term)

Requirement for Schemaless data
 Introduction of new reporting data for merchant
where dimension is not fixed
 Currently there are 15 known fields, where data
varies from 5-15 for different merchants
 In future , chances of new dimension addition
 Data size to start with is 1 million per hour

Issue with current schema DB
 Fixed schema definition
 Too many null values
 Adding new column will need schema changes
 May be updates in old rows will be required
So, should we move to NoSQL ??

If PostgreSQL can solve it !!
 3 different ways provided by PostgreSQL for
Document types
 XML
 hstore
 JSON

hstore : Smart contrib module
 A hierarchical storage type for PostgreSQL
 Key Value store with ACID compliance
 Maps String Keys to String Values or other hstore values
 Rich in its own functions like..
 h -> “a”
 h?->”a”
 h@>”a->2”
 Indexing available : GiST and GIN
 Indexes whole hierarchy , not just a key
 Expression indexes are also supported

hstore continues….
CREATE TABLE my_store
(
id character varying(1024) NOT NULL,
doc hstore,
CONSTRAINT my_store_pkey PRIMARY KEY (id)
);
CREATE INDEX my_store_doc_idx_gist
ON my_store
USING gist(doc) ;
SELECT doc -> ‘text’ as merchant, doc -> ‘created_at’ as created_at
FROM my_store
WHERE doc @> ‘created_at=>23/12/2013’;
FROM my_store
WHERE doc @> ‘is_active=>:t’ AND doc ? ‘has_address’
ORDER BY doc -> ‘created_at’ DESC;
FROM my_store
WHERE doc @> ‘is_active=>:t’ AND doc ?| ARRAY[‘has_address’, ‘has_payoption’] ;

JSON
 Storage type of famous NoSQL DBs like MongoDB ,
CouchDB
 PostgreSQL introduced JSON type in 9.2; Stored as
text but with validation for JSON (--To be fixed in
9.4)
 Function like row_to_json , array_to_json
 But JSON Production and Processing absent
 9.3 Came with many new features
 to_json(any)
 Json_agg(record)
 Many Extraction Functions and operators

JSON continues…
 JSONB being introduced in 9.4
 JSONB Indexing
 Tables can be unlogged for further performance
increase at the cost of reliability

hstore and json
 hstore_to_json(hstore)
 hstore_to_json_loose(hstore)

Some numbers..
 Data used : id number, other fields text
 Record 2 Million , average 64bytes each
 Read CSV File, parse into appropriate format , Insert into DB
 Write Speed in Records/Sec
 hstore : 4600 r/s
 hstore(GiST) : 3000 r/s
 hstore (GIN) : 1700 r/s
 JSON : 4600 r/s
 MongoDB : 4000 r/s
• DB Size (in MB table/indexes)
 hstore : 300/0
 hstore(GiST) : 300/71
 hstore(GIN) : 300/700
 JSON : 300/60
 MongoDB : 1800/200

Write Speed
0
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
Records/Second
Records/Second

Data Size
0
200
400
600
800
1000
1200
1400
1600
1800
2000
Index(MB)
Table(MB)

 Select query on primary key
 Fetch Time in Milliseconds
 hstore : 320
 hstore(GiST) : 190
 hstore(GIN) : 180
 JSON : 20
 MongoDB : 40
 Select query on Name (filter for 100 names)
 Fetch Time in Milliseconds
 hstore : 350
 hstore(GiST) : 150
 Hstore(GIN) : 140
 JSON : 10000
 MongoDB : 450

Select Query Fetch on PK
0
50
100
150
200
250
300
350
MilliSeconds

Select Query Fetch on Text
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
MilliSeconds
MilliSeconds

Some conclusions..
 PostgreSQL can be used as schemaless DB
 PostgreSQL’s relational data storage is very efﬁcient.
 Build indexes using expressions on common used fields
 In hstore GiST index is much more efficient than GIN in our
case
 GiST/GIN accelerates every field not just the Primary Key
 GIN indexes are best for static data because lookups are
faster.
 For dynamic data, GiST indexes are faster to update.
 Specifically, GiST indexes are very good for dynamic data and
fast if the number of unique words (lexemes) is under
100,000.

So what you got in PostgreSQL
 Use the features of relational DB in your Schemaless
world !
 Use Constraints
 Use Transactions
 Use Indexing
 Do Joins on keys
 And with all these get NoSQL(Schemaless data) requirement
fulfilled as well.
 Save the new DB migration cost and time
 ---
 As these are PostgreSQL specific features, migration to
other RDBMS not possible.

Questions???
No offence to NoSQL DBs 
Thank You !!
himamahi09@gmail.com

PostgreSQL as NoSQL

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to PostgreSQL as NoSQL

Similar to PostgreSQL as NoSQL (20)

Recently uploaded

Recently uploaded (20)

PostgreSQL as NoSQL