DSPy a system for AI to Write Prompts and Do Fine Tuning
PostgreSQL as NoSQL
1. H I M A N C H A L I
T E C H L E A D D B E @ I N M O B I
PostgreSQL as NoSQL
2. What’s on the plate today ?
What is the requirement for Schemaless database ?
Data type and volume that has to be handled.
How can we use PostgreSQL to store Schemaless
data ?
Performance of PostgreSQL Schemaless option.
3. What is Schemaless data ?
It doesn’t mean unstructured, but it’s structured over
each document
Each Document is a pair of key-values in a
hierarchical structure of arrays
The application needs to be aware about this
structure and handle it if it’s not getting as expected
Modern Schemaless DBs don’t use SQL and hence
are termed NoSQL DB. (confusing term)
4. Requirement for Schemaless data
Introduction of new reporting data for merchant
where dimension is not fixed
Currently there are 15 known fields, where data
varies from 5-15 for different merchants
In future , chances of new dimension addition
Data size to start with is 1 million per hour
5. Issue with current schema DB
Fixed schema definition
Too many null values
Adding new column will need schema changes
May be updates in old rows will be required
So, should we move to NoSQL ??
6. If PostgreSQL can solve it !!
3 different ways provided by PostgreSQL for
Document types
XML
hstore
JSON
7. hstore : Smart contrib module
A hierarchical storage type for PostgreSQL
Key Value store with ACID compliance
Maps String Keys to String Values or other hstore values
Rich in its own functions like..
h -> “a”
h?->”a”
h@>”a->2”
Indexing available : GiST and GIN
Indexes whole hierarchy , not just a key
Expression indexes are also supported
8. hstore continues….
CREATE TABLE my_store
(
id character varying(1024) NOT NULL,
doc hstore,
CONSTRAINT my_store_pkey PRIMARY KEY (id)
);
CREATE INDEX my_store_doc_idx_gist
ON my_store
USING gist(doc) ;
SELECT doc -> ‘text’ as merchant, doc -> ‘created_at’ as created_at
FROM my_store
WHERE doc @> ‘created_at=>23/12/2013’;
SELECT doc -> ‘text’ as merchant, doc -> ‘created_at’ as created_at
FROM my_store
WHERE doc @> ‘is_active=>:t’ AND doc ? ‘has_address’
ORDER BY doc -> ‘created_at’ DESC;
SELECT doc -> ‘text’ as merchant, doc -> ‘created_at’ as created_at
FROM my_store
WHERE doc @> ‘is_active=>:t’ AND doc ?| ARRAY[‘has_address’, ‘has_payoption’] ;
9. JSON
Storage type of famous NoSQL DBs like MongoDB ,
CouchDB
PostgreSQL introduced JSON type in 9.2; Stored as
text but with validation for JSON (--To be fixed in
9.4)
Function like row_to_json , array_to_json
But JSON Production and Processing absent
9.3 Came with many new features
to_json(any)
Json_agg(record)
Many Extraction Functions and operators
10. JSON continues…
JSONB being introduced in 9.4
JSONB Indexing
Tables can be unlogged for further performance
increase at the cost of reliability
11. hstore and json
hstore_to_json(hstore)
hstore_to_json_loose(hstore)
12. Some numbers..
Data used : id number, other fields text
Record 2 Million , average 64bytes each
Read CSV File, parse into appropriate format , Insert into DB
Write Speed in Records/Sec
hstore : 4600 r/s
hstore(GiST) : 3000 r/s
hstore (GIN) : 1700 r/s
JSON : 4600 r/s
MongoDB : 4000 r/s
• DB Size (in MB table/indexes)
hstore : 300/0
hstore(GiST) : 300/71
hstore(GIN) : 300/700
JSON : 300/60
MongoDB : 1800/200
17. Select Query Fetch on Text
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
MilliSeconds
MilliSeconds
18. Some conclusions..
PostgreSQL can be used as schemaless DB
PostgreSQL’s relational data storage is very efficient.
Build indexes using expressions on common used fields
In hstore GiST index is much more efficient than GIN in our
case
GiST/GIN accelerates every field not just the Primary Key
GIN indexes are best for static data because lookups are
faster.
For dynamic data, GiST indexes are faster to update.
Specifically, GiST indexes are very good for dynamic data and
fast if the number of unique words (lexemes) is under
100,000.
19. So what you got in PostgreSQL
Use the features of relational DB in your Schemaless
world !
Use Constraints
Use Transactions
Use Indexing
Do Joins on keys
And with all these get NoSQL(Schemaless data) requirement
fulfilled as well.
Save the new DB migration cost and time
---
As these are PostgreSQL specific features, migration to
other RDBMS not possible.