Slides from my talk on building tiered data stores using Aesop to bridge SQL and NoSQL data stores. Aesop is a pub-sub like change data capture and propagation system.
1. Building tiered data stores
using Aesop to bridge
SQL and NoSQL systems
Regunath B
twitter.com/RegunathB
github.com/regunathb
Joint work with : Jagadeesh Huliyar, Shoury Bharadwaj, Arya Ketan, Nikhil Bafna, Pratyay Banerjee, Sneha Shukla, Yogesh Dahiya, Kartik Ukhalkar, Milap Wadhwa, Ravindra Yadav, Santosh Patil, Rishab Dua & others at Flipkart
2. What Data store?
• Often heard : I need to scale and therefore will use a NoSQL database - Really?
• By Type, Access : Relational, KV, Document, Columnar
• Scaling limits of single node databases, product claims & benchmarks of distributed data stores
• Rise of memory-based, flash-optimised data stores (Log structured)
• CAP tradeoffs - Consistency vs. Availability
• Other considerations : Durability of data, disk-to-memory ratios, size of single cluster etc.
4. Scaling the Data store
Source: Wayback Machine (ca. 2007)
[Diagrams: MySQL replication topologies]
• MySQL Master (Website Writes) -> Replication -> MySQL Slave (Website Reads, Analytics Reads)
• MySQL Master (Website Writes) -> Replication -> MySQL Slave 1 (Website Reads); Replication -> MySQL Slave 2 (Analytics Reads)
• MySQL Master (Writes) -> Replication -> MySQL Slave (Reads)
Source: http://www.slideshare.net/slashn/slash-n-tech-talk-track-2-website-architecturemistakes-learnings-siddhartha-reddy
and meanwhile… traded Consistency for Availability
5. Scaling the data store (polyglot persistence)
Data             | Store
User session     | Memcached, HBase
Product          | HBase, Elasticsearch, Redis
Cart             | MySQL, MongoDB
Notifications    | HBase
Search           | Solr, Neo4j
Recommendations  | Hadoop MR, Redis/Aerospike
Promotions (WIP) | HBase, Redis
Pattern : Serving Layer + Source of truth data store
9. Caching/Serving Layer challenges
"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton
• (Low Cache TTLs + Lazy caching + High Request concurrency) results in:
• Thundering herds
• Exposes Availability of primary data store
• (Cache size + distribution + no. of replicas) results in:
• Hard to implement write-through mechanisms to maintain consistency
Serving Layer is Eventually Consistent, at best
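The first failure mode above can be illustrated with a small deterministic simulation. This is only a sketch: the function name `simulate`, its parameters and all the numbers (request rate, TTL, load latency) are made up for illustration and do not come from any caching library or from the talk.

```python
def simulate(arrivals_ms, ttl_ms, load_latency_ms):
    """Count primary-store loads for lazily cached reads of a single key.

    A request that finds no fresh value in the cache loads it from the
    primary store. The loaded value only becomes visible in the cache
    load_latency_ms later, so every request arriving in that gap also
    misses and also goes to the primary (the "herd").
    Simplification: freshness is timed from the first load of each herd.
    """
    fresh_from = fresh_until = float("-inf")  # window in which the cache serves hits
    primary_loads = 0
    for t in arrivals_ms:
        if fresh_from <= t < fresh_until:
            continue                          # cache hit
        primary_loads += 1                    # cache miss: read the primary store
        load_in_flight = t < fresh_from
        if not load_in_flight:                # first miss after expiry starts a new window
            fresh_from = t + load_latency_ms
            fresh_until = fresh_from + ttl_ms
    return primary_loads


# Assumed workload: one request every 10 ms for 3 seconds (300 requests).
arrivals = range(0, 3000, 10)
low_ttl = simulate(arrivals, ttl_ms=1000, load_latency_ms=200)
high_ttl = simulate(arrivals, ttl_ms=100_000, load_latency_ms=200)
```

Under these assumed numbers, the 1-second TTL sends 60 of the 300 requests to the primary store in three bursts of 20, while the long TTL sends only the initial burst of 20. The bursts are why the serving layer ends up exposing the primary store's availability.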
10. Eventual Consistency
• Replicas converge over time
• Pros
• Scale reads through multiple replicas
• Higher overall data availability
• Cons
• Reads may return stale data before replicas converge
• Achieving Eventual Consistency is not easy
• At a minimum, requires an at-least-once delivery guarantee of updates to all replicas
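At-least-once delivery implies that a replica can receive the same update more than once, so replicas converge only if they apply updates idempotently. A minimal sketch, assuming every update carries a monotonically increasing sequence number; `Replica` and `deliver_at_least_once` are hypothetical names used for illustration, not Aesop classes.

```python
import random


class Replica:
    """Applies ordered updates idempotently by remembering the last applied sequence number."""

    def __init__(self):
        self.data = {}
        self.last_applied = 0

    def apply(self, seq, key, value):
        if seq <= self.last_applied:
            return False              # duplicate caused by a redelivery: ignore it
        self.data[key] = value
        self.last_applied = seq
        return True


def deliver_at_least_once(updates, replica, rng, ack_loss_rate=0.3):
    """Resend each update until its acknowledgement arrives.

    A lost acknowledgement (simulated with rng) makes the sender retry an
    update the replica has already applied, which is where duplicates come from.
    Returns the total number of send attempts.
    """
    attempts = 0
    for seq, key, value in updates:
        while True:
            attempts += 1
            replica.apply(seq, key, value)
            if rng.random() >= ack_loss_rate:   # acknowledgement reached the sender
                break
    return attempts


updates = [(1, "a", 1), (2, "b", 2), (3, "a", 3)]
r1, r2 = Replica(), Replica()
attempts_1 = deliver_at_least_once(updates, r1, random.Random(1))
attempts_2 = deliver_at_least_once(updates, r2, random.Random(2))
```

Both replicas end with the same contents regardless of how many retries each one saw, because a redelivered update with an already-applied sequence number is dropped.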
13. Log Mining
(Old wine in new bottle?)
• "Durability is typically implemented via logging and recovery." [1] Architecture of a Database System - Michael Stonebraker et al. (2007)
• "The contents of the DB are a cache of the latest records in the log. The truth is the log. The database is a cache of a subset of the log." - Jay Kreps (creator of Kafka, 2015)
• WAL (write ahead log) ensures:
• Each modification is flushed to disk
• Log records are in order
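The two WAL guarantees above can be sketched in a few lines. This is an illustrative toy, not how MySQL or any real database lays out its log: the JSON-lines record format, the `WriteAheadLog` class and the `replay` function are assumptions, and the sketch always starts from a new, empty log file.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Append-only log: every record is forced to disk and gets the next sequence number."""

    def __init__(self, path):
        self.path = path
        self.next_lsn = 1                       # assumes a new, empty log file
        self.file = open(path, "a", encoding="utf-8")

    def append(self, key, value):
        record = {"lsn": self.next_lsn, "key": key, "value": value}
        self.file.write(json.dumps(record) + "\n")
        self.file.flush()                       # push Python's buffer to the OS
        os.fsync(self.file.fileno())            # ask the OS to write it to disk
        self.next_lsn += 1
        return record["lsn"]

    def close(self):
        self.file.close()


def replay(path):
    """Rebuild the latest state by reading the log in order."""
    state, last_lsn = {}, 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["lsn"] != last_lsn + 1:
                raise ValueError("log records out of order")
            state[record["key"]] = record["value"]
            last_lsn = record["lsn"]
    return state


path = os.path.join(tempfile.mkdtemp(), "wal.log")
wal = WriteAheadLog(path)
wal.append("price", 100)
wal.append("stock", 7)
wal.append("price", 120)
wal.close()
recovered = replay(path)
```

Replaying the log yields the latest value per key, which is the sense in which the database is "a cache of a subset of the log". A log miner reads the same ordered records to propagate changes instead of rebuilding local state.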
14. Data Consistency & Aesop
Consistency Model-map source: https://aphyr.com/
• Sequential Consistency:
• Says nothing about time - there is no reference to the “most recent” write operation
• Mechanisms for Sequential Consistency : Primary-based replication protocols (used by Aesop)
• Strict consistency (e.g. Linearizability) : too hard & requires coordination across replicas, relies on absolute global time
• Less strict : Replicas can disagree forever
15. Reliable delivery
[Diagram: components of the delivery pipeline]
• Data store -> Transaction Logs (w4 w5 w6 w7 w8 w9 w10 w11)
• Producer : Tailer (Slave/Replica)
• Relay : Event Ring Buffer
• Subscribers (three shown, each at w8) -> Application Data stores
• SCN (checkpoints) : Zookeeper/Filesystem
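The flow in this diagram can be sketched as a relay that buffers recent change events by SCN, with each subscriber pulling from its own checkpoint. This is a simplified illustration and not Aesop's API: `Relay`, `Subscriber`, `events_after` and `poll` are hypothetical names, a plain dict stands in for Zookeeper or the filesystem, and SCNs are assumed to be contiguous integers.

```python
from collections import deque


class Relay:
    """Bounded in-memory buffer of (scn, event) pairs; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def publish(self, scn, event):
        self.buffer.append((scn, event))

    def events_after(self, scn, limit):
        if self.buffer and scn + 1 < self.buffer[0][0]:
            raise LookupError("checkpoint is older than the oldest buffered event")
        return [(s, e) for s, e in self.buffer if s > scn][:limit]


class Subscriber:
    def __init__(self, name, checkpoints, start_scn):
        self.name = name
        self.checkpoints = checkpoints            # dict standing in for Zookeeper/filesystem
        self.checkpoints.setdefault(name, start_scn)
        self.applied = []                         # stands in for the destination data store

    def poll(self, relay, limit):
        last_scn = self.checkpoints[self.name]
        for scn, event in relay.events_after(last_scn, limit):
            self.applied.append(event)            # apply to the destination first
            self.checkpoints[self.name] = scn     # then advance the checkpoint


relay = Relay(capacity=16)
for scn in range(4, 12):                          # w4 .. w11, as in the diagram
    relay.publish(scn, f"w{scn}")

checkpoints = {}
fast = Subscriber("fast", checkpoints, start_scn=3)
slow = Subscriber("slow", checkpoints, start_scn=3)
fast.poll(relay, limit=8)                         # consumes w4 .. w11
slow.poll(relay, limit=5)                         # consumes w4 .. w8 only
after_first_round = dict(checkpoints)
slow.poll(relay, limit=5)                         # catches up with w9 .. w11
```

Because the checkpoint moves only after an event has been applied, a subscriber that crashes between the two steps sees that event again on restart: delivery is at least once and in order, and each subscriber progresses at its own speed.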
18. Summary
Aesop - pub-sub like Change Capture, propagation system
What it is :
• Supports multiple data stores
• Delivers updates reliably - at least once, in-order
• Supports varying consumer speeds
What it isn't :
• Not exactly-once delivery
• Not a storage system
• No global ordering across different data-stores (consumers)
Performance :
• Relay : 1 XL VM (8 core, 32GB)
• Consumers : 4 XL VM, 200 partitions
• Throughput : 30K Inserts per sec (MySQL to HBase)
• Data size : 800 GB
19. More details..
Project :
• Open Source : https://github.com/Flipkart/aesop
• Support : aesop-users@googlegroups.com
• Multiple production deployments at Flipkart
Related Work :
• LinkedIn Databus
• [2] Facebook Wormhole
References :
• [1] Architecture of a Database System : http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf
• [2] Wormhole Paper: https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-sharma.pdf