NoSQL and ACID
david.rosenthal@foundationdb.com
Twitter: @foundationdb
NoSQL's Motivation
Make it easy to build and deploy
applications.
✓ Ease of scaling and operation
✓ Fault tolerance
✓ Many data models
✓ Good price/performance
✗ ACID transactions
What if we had ACID?
Good for financial applications?
Big performance hit?
Sacrifice availability?
Nope… When NoSQL has ACID, it
opens up a very different path.
The case for ACID in NoSQL
Bugs don't appear under concurrency
• ACID means isolation.
• Reason locally rather than globally.
– If every transaction maintains an
invariant, then multiple clients running
any combination of concurrent
transactions also maintain that invariant.
• The impact of each client is isolated.
Isolation means strong abstractions
• Example interface:
– storeUser(name, SSN)
– getName(SSN)
– getSSN(name)
• Invariant: N == getName(getSSN(N))
– Always works with single client.
– Without ACID: Fails with concurrent clients.
– With ACID: Works with concurrent clients.
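The interface above can be sketched as follows. This is a minimal illustration, assuming an in-memory store in which a single lock stands in for transaction isolation; the class name `UserDirectory` and the helper `name_round_trip` are hypothetical and not part of the talk.

```python
import threading


class UserDirectory:
    """Toy store; one lock stands in for an ACID transaction (an assumption)."""

    def __init__(self):
        self._name_by_ssn = {}
        self._ssn_by_name = {}
        self._lock = threading.Lock()

    def store_user(self, name, ssn):
        # Both maps change together, so no client sees only one of them updated.
        with self._lock:
            old_name = self._name_by_ssn.pop(ssn, None)
            if old_name is not None:
                self._ssn_by_name.pop(old_name, None)
            old_ssn = self._ssn_by_name.pop(name, None)
            if old_ssn is not None:
                self._name_by_ssn.pop(old_ssn, None)
            self._name_by_ssn[ssn] = name
            self._ssn_by_name[name] = ssn

    def get_name(self, ssn):
        with self._lock:
            return self._name_by_ssn.get(ssn)

    def get_ssn(self, name):
        with self._lock:
            return self._ssn_by_name.get(name)

    def name_round_trip(self, name):
        # getName(getSSN(N)) evaluated inside one "transaction".
        with self._lock:
            ssn = self._ssn_by_name.get(name)
            return self._name_by_ssn.get(ssn)
```

Without the lock, a concurrent `store_user` could be observed between the two map updates, which is exactly the window in which the invariant fails.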
Abstractions
Abstractions built on a scalable, fault
tolerant, transactional foundation
inherit those properties.
And are easy to build…
Examples of "easy"
• SQL database in one day
• Indexed table layer (3 days * 1 intern)
• Fractal spatial index in 200 lines:
Remove/decouple features from the DB
With strong abstractions, features can be
moved from the DB to more flexible code.
Examples:
– Indexing
– More efficient data structures (e.g. using
pointers/indirection)
– Query language
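As an illustration of moving indexing out of the database, here is a minimal sketch of an index maintained by application code on top of a transactional key-value store. `ToyKV`, `put_user` and `users_in_city` are hypothetical names, and the copy-then-publish transaction is an assumption chosen for brevity, not how any particular database works.

```python
import threading


class ToyKV:
    """Hypothetical in-memory key-value store with all-or-nothing transactions."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def transact(self, fn):
        # Run fn against a private copy; publish the copy only if fn succeeds.
        with self._lock:
            working = dict(self._data)
            result = fn(working)
            self._data = working
            return result

    def snapshot(self):
        with self._lock:
            return dict(self._data)


def put_user(kv, user_id, city):
    """Index layer: the record and its index entry change in one transaction."""
    def txn(data):
        old_city = data.get(("user", user_id))
        if old_city is not None:
            del data[("by_city", old_city, user_id)]
        data[("user", user_id)] = city
        data[("by_city", city, user_id)] = b""
    kv.transact(txn)


def users_in_city(kv, city):
    snap = kv.snapshot()
    return sorted(k[2] for k in snap if k[0] == "by_city" and k[1] == city)
```

Because the record and the index entry are written atomically, the index never points at a stale record, and the database itself needs no notion of an index.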
Remove/decouple data models
• A NoSQL database with ACID can
provide polyglot data models and APIs.
– Key-value, graph, column-oriented,
document, relational, publish-subscribe,
spatial, blobs, ORMs, analytics, etc…
• Without requiring separate physical
databases. This is a huge ops win.
So, why don't we have ACID?
• It's hard.
• History.
History
Historical Perspective: 2008
In 2008, NoSQL doesn’t really exist yet.
2008
Databases in 2008
NoSQL emerges to replace scalable
sharding/caching solutions that had already
thrown out consistency.
• BigTable
• Dynamo
• Voldemort
• Cassandra
The CAP2008 theorem
"Pick 2 out of 3"
- Eric Brewer
The CAP2008 theorem
"Data inconsistency in large-scale
reliable distributed systems has to be
tolerated … [for performance and to
handle faults]"
- Werner Vogels (CTO Amazon.com)
The CAP2008 theorem
"The availability property means that
the system is 'online' and the client of
the system can expect to receive a
response for its request."
- Wrong descriptions all over the
web
CAP2008 Conclusions?
• Scaling requires distributed design
• Distributed requires high availability
• Availability requires no C
So, if we want scalability we have to
give up C, a cornerstone of ACID,
right?
Thinking about CAP2008
CAP availability != High availability
Fast forward to CAP2013
"Why '2 out of 3' is misleading"
"CAP prohibits… perfect
availability"
- Eric Brewer
Fast forward to CAP2013
"Achieving strict consistency can come
at a cost in update or read latency,
and may result in lower throughput…"
- Werner Vogels (Amazon CTO)
Fast forward to CAP2013
"…it is better to have application
programmers deal with performance
problems due to overuse of transactions
as bottlenecks arise, rather than always
coding around the lack of transactions."
- Google (Spanner)
The ACID NoSQL plan
• Maintain both scalability and fault tolerance
• Leverage CAP2013 and deliver a CP system
with true global ACID transactions
• Enable abstractions and many data models
• Deliver high per-node performance
Approaches
NoSQL
TRANSACTIONS / LOCKING
Bolt-on approach
Bolt transactions on top of a
database without transactions.
• Upside: Elegance.
• Downsides:
– Nerd trap
– Performance. "…integrating multiple layers has
its advantages: integrating concurrency control
with replication reduces the cost of commit wait
in Spanner, for example" -Google
NoSQL
TRANSACTIONS /
LOCKING
Transactional building block approach
Use non-scalable transactional DBs
as components of a cluster.
• Upside: Local transactions are fast
• Downside: Distributed transactions
across machines are hard to make fast,
and are messy (timeouts required)
Decomposition approach
Decompose the processing pipeline of a
traditional ACID DB into individual
stages.
• Stages:
– Accept client transactions
– Apply concurrency control
– Write to transaction logs
– Update persistent data representation
• Upside: Performance
• Downside: "Ugly" and complex architecture
needs to solve tough problems for each stage
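The four stages can be sketched as separate methods. This is a toy, single-process illustration that assumes optimistic concurrency control based on version numbers; the class `Pipeline` and its method names are hypothetical, and nothing here should be read as FoundationDB's actual design.

```python
class Pipeline:
    """Toy transaction pipeline with one method per stage (all names hypothetical)."""

    def __init__(self):
        self.version = 0      # last committed version
        self.last_write = {}  # key -> version of its last committed write
        self.log = []         # stage 3: append-only transaction log
        self.storage = {}     # stage 4: persistent data representation

    def accept(self, read_version, reads, writes):
        """Stage 1: accept a client transaction."""
        return {"read_version": read_version,
                "reads": set(reads),
                "writes": dict(writes)}

    def check_conflicts(self, txn):
        """Stage 2: reject if a key that was read has been overwritten since."""
        return all(self.last_write.get(key, 0) <= txn["read_version"]
                   for key in txn["reads"])

    def append_log(self, txn):
        """Stage 3: assign a commit version and log the writes."""
        self.version += 1
        self.log.append((self.version, txn["writes"]))
        for key in txn["writes"]:
            self.last_write[key] = self.version
        return self.version

    def apply(self, version):
        """Stage 4: update the stored data from the log entry."""
        _, writes = self.log[version - 1]
        self.storage.update(writes)

    def commit(self, read_version, reads, writes):
        txn = self.accept(read_version, reads, writes)
        if not self.check_conflicts(txn):
            return None  # conflict: the client must retry
        version = self.append_log(txn)
        self.apply(version)
        return version
```

In a decomposed system each method would be its own scalable service; keeping them as separate steps here only shows where the boundaries fall.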
Challenges with ACID
Disconnected operation challenge
• Offline sync is a real application need
Solution:
• Doing it in the DB layer is terrible
• Can (and should) be solved by the app,
e.g. by buffering mutations and syncing
when connected
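A minimal sketch of app-level buffering, assuming a plain dict stands in for the remote database and that replaying writes in their original order is an acceptable merge policy. `OfflineClient` is a hypothetical name; real applications usually need their own conflict-resolution rules.

```python
class OfflineClient:
    """Buffers writes while offline and replays them on reconnect (hypothetical)."""

    def __init__(self, server):
        self.server = server  # dict standing in for the remote database
        self.connected = False
        self.pending = []     # mutations made while disconnected

    def set(self, key, value):
        if self.connected:
            self.server[key] = value
        else:
            self.pending.append((key, value))

    def reconnect(self):
        self.connected = True
        # Replay buffered mutations in the order they were made.
        for key, value in self.pending:
            self.server[key] = value
        flushed = len(self.pending)
        self.pending.clear()
        return flushed
```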
Split brain challenge
• Any consistent database needs a fault-tolerant
source of "ground truth"
• Must prevent database from splitting into two
independent parts
Solution:
• Using thoughtfully chosen Paxos nodes can yield
high availability, even for drastic failure scenarios
• Paxos is not required for each transaction
Latency challenge
• Durability costs latency
• Causal consistency costs latency
Solution:
• Bundling ops reduces overhead
• ACID costs only needed for ACID
guarantees
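The bundling idea can be sketched as a group commit: many operations share one durable sync. `GroupCommitLog` is a hypothetical name, and the `sync_count` counter merely stands in for an expensive disk or network round trip.

```python
class GroupCommitLog:
    """Bundles operations so that one sync covers a whole batch (hypothetical)."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []    # accepted but not yet durable
        self.durable = []    # made durable by a sync
        self.sync_count = 0  # number of expensive syncs performed

    def append(self, op):
        self.pending.append(op)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        self.durable.extend(self.pending)
        self.pending.clear()
        self.sync_count += 1  # one sync pays for the whole batch
```

With a batch size of 10, 25 operations cost three syncs instead of 25; the price is that an individual operation may wait for its batch to fill.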
Correctness challenge
• MaybeDB:
– Set(key, value) – Might set key to value
– Get(key) – Get a value that key was set to
Solution:
• The much stronger ACID contract
requires vastly more powerful tools for
testing
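To make the contrast concrete, here is a sketch of MaybeDB and of the only check its contract allows. The seeded random generator and the 50% write probability are assumptions made for illustration.

```python
import random


class MaybeDB:
    """Toy database with the weakest contract from the slide above."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._history = {}

    def set(self, key, value):
        # Might set key to value; the 50% chance is an arbitrary choice.
        if self._rng.random() < 0.5:
            self._history.setdefault(key, []).append(value)

    def get(self, key):
        # Returns some value that key was set to, or None if there is none.
        values = self._history.get(key)
        return self._rng.choice(values) if values else None


def satisfies_contract(db, key, attempted_values):
    """The entire test suite: the result was written at some point, or is None."""
    result = db.get(key)
    return result is None or result in attempted_values
```

A contract this weak is satisfied by almost any implementation, so one membership check covers it. An ACID contract has to hold across crashes, partitions and concurrent clients, which is why it calls for far stronger tools.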
Implementation language challenge
We need new tools!
Goal | Language
Many asynchronous communicating processes | Erlang?
Engineering for reliability and fault tolerance of large clusters while maintaining correctness | Simulation
Fast algorithms; efficient I/O | C++
Tools for achieving ACID
Flow
• A new programming language
• Adds actor-model concurrency to C++11
• New keywords: ACTOR, future, promise,
wait, choose, when, streams
• Transcompilation:
– Flow code -> C++11 code -> native
Seriously?
Flow allows…
• Easier ACTOR-model coding
• Testability by enabling simulation
• Performance by compiling to native
Flow eases development
Flow output
Flow performance
"Write a ring benchmark. Create N processes in a
ring. Send a message round the ring M times so that
a total of N * M messages get sent. Time how long
this takes for different values of N and M. Write a
similar program in some other programming language
you are familiar with. Compare the results. Write a
blog, and publish the results on the internet!"
- Joe Armstrong (author of "Programming Erlang")
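For readers who want to try the exercise, a ring benchmark can be sketched with Python's asyncio, using coroutines in place of processes and queues in place of mailboxes. The function names are hypothetical, and this is not the code behind any figure quoted in this talk.

```python
import asyncio
import time


async def ring_node(inbox, outbox, stats, done):
    """Forward the token to the next node until its hop budget is used up."""
    while True:
        remaining = await inbox.get()
        if remaining == 0:
            done.set()
            return
        stats["sent"] += 1
        await outbox.put(remaining - 1)


async def run_ring(n, m):
    queues = [asyncio.Queue() for _ in range(n)]
    stats = {"sent": 0}
    done = asyncio.Event()
    tasks = [
        asyncio.create_task(
            ring_node(queues[i], queues[(i + 1) % n], stats, done))
        for i in range(n)
    ]
    start = time.perf_counter()
    await queues[0].put(n * m)  # inject a token with a budget of N * M hops
    await done.wait()
    elapsed = time.perf_counter() - start
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return stats["sent"], elapsed


def ring_benchmark(n, m):
    """Return (messages sent, elapsed seconds) for N nodes and M laps."""
    return asyncio.run(run_ring(n, m))
```

Calling `ring_benchmark(1000, 1000)` matches the N and M used in the comparison; timings depend heavily on the machine and the runtime.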
Flow performance
(N=1000, M=1000)
• Ruby (using threads): 1990 seconds
• Ruby (queues): 360 seconds
• Objective C (using threads): 26 seconds
• Java (threads): 12 seconds
• Stackless Python: 1.68 seconds
• Erlang: 1.09 seconds
• Google Go: 0.87 seconds
• Flow: 0.075 seconds
Flow enables testability
• "Lithium" testing framework
• Simulate all physical interfaces
• Simulate failure modes
• Deterministic (!) simulation of entire
system
Simulation is the key for correctness.
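The core of deterministic simulation can be shown in a few lines: if every source of randomness, including injected failures, is derived from one seed, any failing run can be replayed exactly. The sketch below assumes a single sender retrying over a lossy link; `simulate` and its parameters are hypothetical and unrelated to the Lithium framework's actual interface.

```python
import random


def simulate(seed, n_messages=20, drop_rate=0.3):
    """Send messages over a simulated lossy link, retrying until delivered."""
    rng = random.Random(seed)  # the only source of randomness in the run
    received, trace = [], []
    for msg in range(n_messages):
        attempts = 0
        while True:
            attempts += 1
            if rng.random() < drop_rate:
                trace.append(("drop", msg, attempts))  # injected failure
                continue
            received.append(msg)
            trace.append(("deliver", msg, attempts))
            break
    return received, trace
```

Running `simulate(42)` twice yields the same trace, so a bug found under seed 42 can be reproduced and debugged at will.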
Testability: Quicksand
FoundationDB is
NoSQL with ACID
FoundationDB
Database software:
• Scalable
• Fault tolerant
• Transactional
• Ordered key-value API
• Layers
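What "ordered" adds to a key-value API is the range read. A minimal in-memory sketch, assuming string keys and a sorted list kept with the standard `bisect` module, follows; `OrderedKV` and `get_range` are hypothetical names, not the FoundationDB client API.

```python
import bisect


class OrderedKV:
    """Toy ordered key-value store supporting range reads (hypothetical)."""

    def __init__(self):
        self._keys = []    # kept sorted
        self._values = {}

    def set(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def get_range(self, begin, end):
        """Return (key, value) pairs with begin <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, begin)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]
```

Layers rely on this ordering: giving related records a common key prefix turns "fetch all of them" into one range read.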
Layers
Key-value API
Layers
• An open-source ecosystem
• Common NoSQL data models
• Graph database (implements BluePrints
2.4 standard)
• Zookeeper-like coordination layer
• Celery (distributed task queue) layer
• Many others…
SQL Layer
• A full SQL database in a layer!
• Akiban acquisition
• Unique "table group" concept can
physically store related tables in an
efficient "object structure"
• Architecture: stateless, local server
Performance results
• Reads of cacheable data are ½ the
speed of memcached—with full
consistency!
• Random uncacheable reads of 4k ranges
saturate network bandwidth
• A 24-machine cluster processing 100%
cross-node transactions saturates its
SSDs at 890,000 op/s
The big performance result
• Vogels: “Achieving strict consistency can
come at a cost in update or read
latency, and may result in lower
throughput…”
• Ok, so, how much?
– Only ~10%!
– Transaction isolation, the "intuitive
bottleneck", is accomplished in less than
one core.
A vision for NoSQL
• The next generation should maintain
– Scalability and fault tolerance
– High performance
• While adding
– ACID transactions
– Data model flexibility
Thank you
david.rosenthal@foundationdb.com
Twitter: @foundationdb

 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

NoSQL and ACID

  • 2. NoSQL's Motivation Make it easy to build and deploy applications. ✓ Ease of scaling and operation ✓ Fault tolerance ✓ Many data models ✓ Good price/performance X ACID transactions
  • 3. What if we had ACID? Good for financial applications? Big performance hit? Sacrifice availability? Nope… When NoSQL has ACID, it opens up a very different path.
  • 4. The case for ACID in NoSQL
  • 5. Bugs don't appear under concurrency • ACID means isolation. • Reason locally rather than globally. – If every transaction maintains an invariant, then multiple clients running any combination of concurrent transactions also maintain that invariant. • The impact of each client is isolated.
  • 6. Isolation means strong abstractions • Example interface: – storeUser(name, SSN) – getName(SSN) – getSSN(name) • Invariant: N == getName(getSSN(N)) – Always works with single client. – Without ACID: Fails with concurrent clients. – With ACID: Works with concurrent clients.
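The invariant on this slide can be made concrete with a minimal sketch. It assumes a hypothetical in-memory store (`TinyTransactionalStore`) in which a single global lock stands in for real ACID isolation; none of these names come from the talk or from any real database API. The point is only that both index entries change inside one transaction, so no client can observe one without the other.

```python
import threading


class TinyTransactionalStore:
    """Hypothetical in-memory store; one global lock stands in for ACID isolation."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def transact(self, fn):
        # Run fn on a private copy and publish it only if fn returns normally,
        # so a failed transaction leaves no partial writes behind.
        with self._lock:
            working = dict(self._data)
            result = fn(working)
            self._data = working
            return result


def store_user(db, name, ssn):
    def txn(kv):
        # Remove stale entries so both indexes stay mutually consistent.
        old_name = kv.get(("by_ssn", ssn))
        if old_name is not None:
            kv.pop(("by_name", old_name), None)
        old_ssn = kv.get(("by_name", name))
        if old_ssn is not None:
            kv.pop(("by_ssn", old_ssn), None)
        kv[("by_ssn", ssn)] = name
        kv[("by_name", name)] = ssn

    db.transact(txn)


def get_name(db, ssn):
    return db.transact(lambda kv: kv.get(("by_ssn", ssn)))


def get_ssn(db, name):
    return db.transact(lambda kv: kv.get(("by_name", name)))


def invariant_holds(db):
    # Checks N == getName(getSSN(N)) for every stored name N.
    def txn(kv):
        return all(
            kv.get(("by_ssn", ssn)) == key[1]
            for key, ssn in kv.items()
            if key[0] == "by_name"
        )

    return db.transact(txn)
```

If the two writes in `store_user` were issued as separate, unisolated operations, a concurrent client could interleave between them and leave a name pointing at an SSN that maps to someone else.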
  • 7. Abstractions Abstractions built on a scalable, fault tolerant, transactional foundation inherit those properties. And are easy to build…
  • 8. Examples of "easy" ✓ SQL database in one day ✓ Indexed table layer (3 days * 1 intern) ✓ Fractal spatial index in 200 lines
  • 9. Remove/decouple features from the DB With strong abstractions, features can be moved from the DB to more flexible code. Examples: – Indexing – More efficient data structures (e.g. using pointers/indirection) – Query language
  • 10. Remove/decouple data models • A NoSQL database with ACID can provide polyglot data models and APIs. – Key-value, graph, column-oriented, document, relational, publish-subscribe, spatial, blobs, ORMs, analytics, etc… • Without requiring separate physical databases. This is a huge ops win.
  • 11. So, why don't we have ACID? • It's hard. • History.
  • 13. Historical Perspective: 2008 In 2008, NoSQL doesn't really exist yet.
  • 14. Databases in 2008 NoSQL emerges to replace scalable sharding/caching solutions that had already thrown out consistency. • BigTable • Dynamo • Voldemort • Cassandra
  • 15. The CAP2008 theorem "Pick 2 out of 3" - Eric Brewer
  • 16. The CAP2008 theorem "Data inconsistency in large-scale reliable distributed systems has to be tolerated … [for performance and to handle faults]" - Werner Vogels (CTO Amazon.com)
  • 17. The CAP2008 theorem "The availability property means that the system is 'online' and the client of the system can expect to receive a response for its request." - Wrong descriptions all over the web
  • 18. CAP2008 Conclusions? • Scaling requires distributed design • Distributed requires high availability • Availability requires no C So, if we want scalability we have to give up C, a cornerstone of ACID, right?
  • 19. Thinking about CAP2008 CAP availability != High availability
  • 20. Fast forward to CAP2013 "Why '2 out of 3' is misleading" "CAP prohibits… perfect availability" - Eric Brewer
  • 21. Fast forward to CAP2013 "Achieving strict consistency can come at a cost in update or read latency, and may result in lower throughput…" - Werner Vogels (Amazon CTO)
  • 22. Fast forward to CAP2013 "…it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions." - Google (Spanner)
  • 23. The ACID NoSQL plan • Maintain both scalability and fault tolerance • Leverage CAP2013 and deliver a CP system with true global ACID transactions • Enable abstractions and many data models • Deliver high per-node performance
  • 25. Bolt-on approach Bolt transactions on top of a database without transactions. [Diagram labels: NoSQL, Transactions / Locking]
  • 26. Bolt-on approach Bolt transactions on top of a database without transactions. • Upside: Elegance. • Downsides: – Nerd trap – Performance. "…integrating multiple layers has its advantages: integrating concurrency control with replication reduces the cost of commit wait in Spanner, for example" - Google [Diagram labels: NoSQL, Transactions / Locking]
  • 27. Transactional building block approach Use non-scalable transactional DBs as components of a cluster.
  • 28. Transactional building block approach Use non-scalable transactional DBs as components of a cluster. • Upside: Local transactions are fast • Downside: Distributed transactions across machines are hard to make fast, and are messy (timeouts required)
  • 29. Decomposition approach Decompose the processing pipeline of a traditional ACID DB into individual stages.
  • 30. Decomposition approach Decompose the processing pipeline of a traditional ACID DB into individual stages. • Stages: – Accept client transactions – Apply concurrency control – Write to transaction logs – Update persistent data representation • Upside: Performance • Downside: "Ugly" and complex architecture needs to solve tough problems for each stage
  • 32. Disconnected operation challenge • Offline sync is a real application need Solution: • Doing it in the DB layer is terrible • Can (and should) be solved by the app, e.g. by buffering mutations and syncing when connected
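One way an application might buffer mutations while offline and replay them on reconnect is sketched below. This is an illustration under stated assumptions, not a design from the talk: `OfflineBuffer` and `apply_batch` are hypothetical names, and `apply_batch` stands for whatever callable commits a list of mutations to the database in a single transaction.

```python
class OfflineBuffer:
    """Hypothetical app-side buffer that queues writes until a connection exists."""

    def __init__(self, apply_batch):
        # apply_batch: assumed callable that commits a list of mutations atomically.
        self._apply_batch = apply_batch
        self._pending = []
        self.connected = False

    def set(self, key, value):
        self._pending.append(("set", key, value))
        self.flush_if_connected()

    def clear(self, key):
        self._pending.append(("clear", key, None))
        self.flush_if_connected()

    def flush_if_connected(self):
        if not self.connected or not self._pending:
            return
        batch, self._pending = self._pending, []
        try:
            self._apply_batch(batch)
        except Exception:
            # Keep the mutations, in order, so a later flush can retry them.
            self._pending = batch + self._pending
            raise


def make_fake_server():
    """Stand-in for the database: a dict plus a batch-apply function."""
    state = {}

    def apply_batch(batch):
        for op, key, value in batch:
            if op == "set":
                state[key] = value
            else:
                state.pop(key, None)

    return state, apply_batch
```

Typical use: call `set` and `clear` freely while `connected` is `False`, then set `connected = True` and call `flush_if_connected()` once the link is back. Conflict handling between devices is deliberately left out of this sketch.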
  • 33. Split brain challenge • Any consistent database needs a fault-tolerant source of "ground truth" • Must prevent database from splitting into two independent parts Solution: • Using thoughtfully chosen Paxos nodes can yield high availability, even for drastic failure scenarios • Paxos is not required for each transaction
  • 34. Latency challenge • Durability costs latency • Causal consistency costs latency Solution: • Bundling ops reduces overhead • ACID costs only needed for ACID guarantees
  • 35. Correctness challenge • MaybeDB: – Set(key, value) – Might set key to value – Get(key) – Get a value that key was set to Solution: • The much stronger ACID contract requires vastly more powerful tools for testing
  • 36. Implementation language challenge We need new tools! Goal -> Language: • Many asynchronous communicating processes -> Erlang? • Engineering for reliability and fault tolerance of large clusters while maintaining correctness -> Simulation • Fast algorithms; efficient I/O -> C++
  • 38. Flow • A new programming language • Adds actor-model concurrency to C++11 • New keywords: ACTOR, future, promise, wait, choose, when, streams • Transcompilation: – Flow code -> C++11 code -> native Seriously?
  • 39. Flow allows… • Easier ACTOR-model coding • Testability by enabling simulation • Performance by compiling to native
  • 43. Flow performance "Write a ring benchmark. Create N processes in a ring. Send a message round the ring M times so that a total of N * M messages get sent. Time how long this takes for different values of N and M. Write a similar program in some other programming language you are familiar with. Compare the results. Write a blog, and publish the results on the internet!" - Joe Armstrong (author of "Programming Erlang")
  • 44. Flow performance (N=1000, M=1000) • Ruby (using threads): 1990 seconds • Ruby (queues): 360 seconds • Objective C (using threads): 26 seconds • Java (threads): 12 seconds • Stackless Python: 1.68 seconds • Erlang: 1.09 seconds • Google Go: 0.87 seconds • Flow: 0.075 seconds
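For readers who want to see the shape of the ring benchmark, here is a minimal sketch using Python's `asyncio`. It is not Flow and not the code behind the numbers above, so its timings are not comparable with them; `run_ring` is a name chosen for this illustration. Each of the N coroutines owns an inbox queue and forwards a hop counter to its neighbour until N * M messages have been sent.

```python
import asyncio
import time


async def run_ring(n, m):
    """Pass a token around a ring of n coroutines for m laps; return (sent, seconds)."""
    queues = [asyncio.Queue() for _ in range(n)]
    done = asyncio.Event()
    sent = 0

    async def node(i):
        nonlocal sent
        inbox, outbox = queues[i], queues[(i + 1) % n]
        while True:
            hops_left = await inbox.get()
            if hops_left == 0:
                done.set()  # the token has completed all m laps
                return
            sent += 1
            outbox.put_nowait(hops_left - 1)

    tasks = [asyncio.create_task(node(i)) for i in range(n)]
    start = time.perf_counter()
    queues[0].put_nowait(n * m)  # inject the token; this is not counted as a send
    await done.wait()
    elapsed = time.perf_counter() - start

    # Stop the nodes that are still waiting on their inboxes.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return sent, elapsed


if __name__ == "__main__":
    sent, elapsed = asyncio.run(run_ring(1000, 1000))
    print(f"{sent} messages in {elapsed:.3f} s")
```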
  • 45. Flow enables testability • "Lithium" testing framework • Simulate all physical interfaces • Simulate failure modes • Deterministic (!) simulation of entire system Simulation is the key for correctness.
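The core idea behind deterministic simulation can be shown in a few lines. The sketch below is not the Lithium framework and makes no claim about how it works internally; `simulate` is a hypothetical toy in which every source of nondeterminism (delay, reordering, message loss) is drawn from one seeded random generator, so a failing run can be replayed exactly from its seed.

```python
import random


def simulate(seed, steps=100, drop_rate=0.1):
    """Toy simulated network: returns the ordered trace of deliver/drop events."""
    rng = random.Random(seed)  # the only source of randomness in the whole run
    in_flight = []
    trace = []
    for step in range(steps):
        in_flight.append(step)  # a client sends message number `step`
        if rng.random() < 0.5:
            continue  # simulated delay: nothing arrives during this step
        # Simulated reordering: any in-flight message may arrive next.
        msg = in_flight.pop(rng.randrange(len(in_flight)))
        event = "drop" if rng.random() < drop_rate else "deliver"
        trace.append((event, msg))
    return trace
```

Calling `simulate(42)` twice yields the same trace, which is what makes a bug found under simulated failures reproducible.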
  • 48. FoundationDB Database software: • Scalable • Fault tolerant • Transactional • Ordered key-value API • Layers [Diagram: Layers on top of the key-value API]
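To illustrate how a layer can sit on an ordered key-value API, here is a minimal sketch. It does not use the FoundationDB client; `OrderedKV` and `UserTableLayer` are hypothetical stand-ins, keys are plain Python tuples, and user IDs are assumed to be integers. In a real transactional store the two writes in `insert` would be committed in one transaction so the index can never disagree with the row.

```python
import bisect


class OrderedKV:
    """Hypothetical in-memory ordered key-value store with range reads."""

    def __init__(self):
        self._keys = []  # kept sorted
        self._values = {}

    def set(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def get_range(self, begin, end):
        # Returns all pairs with begin <= key < end, in key order.
        lo = bisect.bisect_left(self._keys, begin)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


class UserTableLayer:
    """Hypothetical layer: a user table plus a secondary index on city."""

    def __init__(self, kv):
        self.kv = kv

    def insert(self, user_id, city):
        self.kv.set(("user", user_id), city)             # primary row
        self.kv.set(("by_city", city, user_id), "")      # index entry

    def users_in_city(self, city):
        # Key ordering turns the index lookup into a single range read.
        rows = self.kv.get_range(("by_city", city), ("by_city", city + "\x00"))
        return [key[2] for key, _ in rows]
```

The index query needs no support from the store beyond ordered keys, which is the sense in which indexing can be moved out of the database and into a layer.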
  • 49. Layers • An open-source ecosystem • Common NoSQL data models • Graph database (implements BluePrints 2.4 standard) • Zookeeper-like coordination layer • Celery (distributed task queue) layer • Many others…
  • 50. SQL Layer • A full SQL database in a layer! • Akiban acquisition • Unique "table group" concept can physically store related tables in an efficient "object structure" • Architecture: stateless, local server
  • 51. Performance results • Reads of cacheable data are ½ the speed of memcached—with full consistency! • Random uncacheable reads of 4k ranges saturate network bandwidth • A 24-machine cluster processing 100% cross-node transactions saturates its SSDs at 890,000 op/s
  • 52. The big performance result • Vogels: "Achieving strict consistency can come at a cost in update or read latency, and may result in lower throughput…" • OK, so how much? – Only ~10%! – Transaction isolation, the "intuitive bottleneck", is accomplished in less than one core.
  • 53. A vision for NoSQL • The next generation should maintain – Scalability and fault tolerance – High performance • While adding – ACID transactions – Data model flexibility