The Power of Determinism in 
Database Systems 
Daniel J. Abadi 
Yale University 
(Joint work with Jose Faleiro, Kun Ren, and Alex 
Thomson)
Database Systems Are Great 
• Protects a dataset from corruption or 
deletion in the face of media, system, or 
program crashes 
• Allows programs to change state of data in 
arbitrary ways 
• Allows 1000s of such programs to run 
concurrently 
– Guarantees atomicity and isolation of such 
programs 
• Has served as blueprint for many 
concurrent, highly complex systems
But … 
• Design is incredibly complex 
– Takes $17 million to build a new one 
• Components are horribly monolithic 
• Corner case bugs nearly impossible to 
reproduce 
• Does not scale horizontally 
• Does not scale horizontally (seriously) 
Should the DBMS architecture really be a 
blueprint for concurrent system design?
Nondeterminism is the problem 
• Building on top of: 
– OSes that enable threads to be scheduled 
arbitrarily 
– Networks that deliver messages with arbitrary 
delays (and sometimes in arbitrary orders) 
– Hardware that can fail arbitrarily 
• Only natural to allow the state of the 
database to be dependent on these 
nondeterministic events
Nondeterminism is the problem 
• OS non-deterministic thread scheduling leads 
to: 
– Arbitrary transaction interleaving 
– Deadlocks 
– Difficult-to-reproduce bugs 
– Tight interactions between lock manager, 
recovery manager, access manager, and 
transaction manager. 
• Hardware failures and message delivery 
delays result in transaction aborts 
– Need complicated recovery manager to handle 
half-completed transactions 
– Need commit-protocol for distributed transactions
How to eliminate nondeterminism? 
• There exist proposals for: 
– Deterministic operating systems 
– (Somewhat) deterministic networking layers 
– Highly redundant and reliable hardware 
• Maybe one day those proposals will come 
with fewer disadvantages 
• In the meantime, we have to create 
determinism from nondeterministic 
components 
– Pick and choose what we make deterministic
Possible determinism levels 
• Given an input and initial state of the database 
system, to get to one and only one possible final 
state: 
– Level 1: System always runs the same sequence 
of instructions 
– Level 2: System always proceeds through the 
same sequence of states of the database 
– Level 3: Database is allowed to proceed through 
states in any order as long as the final state of all 
external and internal data structures is 
determined by the input 
– Level 4: Database is allowed to proceed through 
states in any order as long as the final state of all 
external structures is determined by the input
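The difference between the stricter levels and Levels 3 and 4 can be illustrated with a small sketch. It assumes a toy key-value state and two non-conflicting transactions; the names `run`, `initial`, and `txns` are hypothetical and not from the talk. Executing the transactions in either order passes through different intermediate states (so Level 2 does not hold) but ends in the same final state (as Levels 3 and 4 require).

```python
from itertools import permutations


def run(initial, txns):
    """Apply (key, delta) transactions in the given order.

    Returns the final state and the list of intermediate states.
    """
    state = dict(initial)
    trace = []
    for key, delta in txns:
        state[key] = state.get(key, 0) + delta
        trace.append(dict(state))
    return state, trace


initial = {"x": 0, "y": 0}
txns = [("x", 1), ("y", 2)]  # the two transactions touch different keys

# Try both execution orders.
results = [run(initial, order) for order in permutations(txns)]
final_states = [final for final, _ in results]
first_steps = [trace[0] for _, trace in results]

print(final_states)  # same final state for both orders
print(first_steps)   # different intermediate states
```

The toy only covers transactions that do not conflict; for conflicting transactions the system still has to fix an order, which is what the input log is for.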
Database Systems Problems 
• Design is incredibly complex 
– Takes $17 million to build a new one 
• Components are horribly monolithic 
• Corner case bugs nearly impossible to 
reproduce 
• Does not scale horizontally 
• Does not scale horizontally
Database Systems Problems 
• Design is incredibly complex 
– Takes $17 million to build a new one 
• Components are horribly monolithic 
• Corner case bugs nearly impossible to 
reproduce 
• Does not scale horizontally 
• Does not scale horizontally 
LEVEL 4 DETERMINISM HELPS WITH ALL OF THESE
Recovery 
• Brain-dead version: 
– Log all input to the system 
– Upon a failure, trash the entire database, replay input 
log from the beginning 
• Less brain-dead version: 
– Create checkpoints of database state as of some 
point in the input log 
– Upon a failure, trash the entire database, load 
checkpoint, replay input log from point where 
checkpoint was taken 
• Note that logging can happen entirely externally to 
the DBMS 
• Same is true for checkpointing, although may want 
to perform it inside the DBMS for performance 
– Even in this case, it needs very little knowledge about 
other components
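The "less brain-dead" recovery scheme above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `LoggedStore`, the helper `apply_txn`, and the `(key, delta)` transaction format are all hypothetical, and the log and checkpoint are kept in memory rather than on stable storage.

```python
import copy


def apply_txn(state, txn):
    """Deterministically apply one logged input; txn is (key, delta)."""
    key, delta = txn
    state[key] = state.get(key, 0) + delta


class LoggedStore:
    def __init__(self):
        self.state = {}
        self.input_log = []         # every input, in arrival order
        self.checkpoint = ({}, 0)   # (state snapshot, log position)

    def submit(self, txn):
        self.input_log.append(txn)  # log the input before executing it
        apply_txn(self.state, txn)

    def take_checkpoint(self):
        self.checkpoint = (copy.deepcopy(self.state), len(self.input_log))

    def recover(self):
        # Discard the current state, load the checkpoint, then replay
        # the inputs logged after the checkpoint was taken.
        snapshot, position = self.checkpoint
        self.state = copy.deepcopy(snapshot)
        for txn in self.input_log[position:]:
            apply_txn(self.state, txn)


store = LoggedStore()
store.submit(("a", 5))
store.submit(("b", 2))
store.take_checkpoint()
store.submit(("a", -3))

store.state = {"garbage": 1}  # simulate losing the database in a crash
store.recover()
print(store.state)
```

Because execution is a pure function of the input log, `recover` needs no undo or redo records and no knowledge of what any transaction was doing at the time of the crash.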
Replication 
• Send the same input log to replica DBMS 
– User-visible state in replicas will not diverge 
– Can happen entirely externally to the DBMS
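A small sketch of why replicas fed the same input log do not diverge. It assumes a toy bank-balance state and a deterministic abort rule (a withdrawal that would overdraw the account is rejected); `run_replica` and the sample log are hypothetical names chosen for illustration.

```python
def run_replica(input_log):
    """Execute the input log in order and return the resulting balances."""
    balances = {}
    for account, delta in input_log:
        new_balance = balances.get(account, 0) + delta
        # Deterministic abort: the decision depends only on the state
        # and the input, never on timing or failures.
        if new_balance >= 0:
            balances[account] = new_balance
    return balances


input_log = [("alice", 100), ("bob", 50), ("alice", -130), ("bob", -20)]

primary = run_replica(input_log)
replica = run_replica(list(input_log))  # same log, shipped to another node

print(primary == replica)
```

The replication step here is nothing more than copying the log, which is why it can live outside the DBMS.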
Horizontal Scalability 
• Active distributed xacts not aborted upon 
node failure 
– Greatly reduces (or eliminates) cost of 
distributed commit 
• Don’t have to worry about nodes failing during 
commit protocol 
• Don’t have to worry about effects of transaction 
making it to disk before promising to commit 
transaction 
• Just need one message from any node that 
potentially can deterministically abort the xact 
– This message can be sent in the middle of the xact, as 
soon as it knows it will commit
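The simplified commit step above can be sketched roughly as follows. This is an assumption-laden toy, not the protocol from the talk: each partition is a plain dictionary, the only deterministic abort condition is "no value may go negative", and the names `local_vote` and `execute` are hypothetical. The point it illustrates is that each involved node contributes a single one-way vote, and node failure is never a reason to abort.

```python
def local_vote(partition, txn):
    """Deterministic local check: no key stored here may go negative."""
    return all(partition[key] + delta >= 0
               for key, delta in txn.items() if key in partition)


def execute(partitions, txn):
    """Run a distributed transaction given as {key: delta}."""
    involved = [p for p in partitions if any(key in p for key in txn)]
    # One one-way message per involved node; there is no second round.
    votes = [local_vote(p, txn) for p in involved]
    if not all(votes):
        return "abort"
    for p in involved:
        for key, delta in txn.items():
            if key in p:
                p[key] += delta
    return "commit"


partitions = [{"x": 10}, {"y": 0}]
first = execute(partitions, {"x": -5, "y": 5})     # both checks pass
second = execute(partitions, {"x": -10, "y": 10})  # x would go negative

print(first, second, partitions)
```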
One Way to Implement Determinism 
• Use a preprocessor to handle client communications, 
and create a log of submitted xacts 
• Send log in batches to DBMS 
• Every xact immediately requests all locks it will need 
(in order of log) 
• If it doesn’t know what it will need 
– Run enough of the xact to find out, but do not change the 
database state 
– Reissue xact to the preprocessor with lock requirements 
included as parameter 
– Run enough of the new xact to find out if it locked the 
correct items (database state might have changed in the 
meantime) 
• If so, then xact can proceed as normal 
• If not, reissue again to the preprocessor and repeat as 
necessary 
• Trivial to prove this is deterministic and deadlock-free
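The lock-ordering part of the scheme above can be sketched as follows. This is a simplified simulation under stated assumptions: lock sets are known up front (the reissue step for transactions that must first discover their lock sets is not shown), all locks are exclusive, and the function `schedule` and the sample batch are hypothetical. Each transaction enqueues for all of its locks in log order, and a transaction runs once it is at the head of every queue it is in.

```python
from collections import defaultdict, deque


def schedule(batch):
    """batch: list of (txn_id, lock_set) in log order, with unique ids.

    Returns a list of rounds; transactions in one round hold disjoint
    locks and could run concurrently.
    """
    queues = defaultdict(deque)
    for txn_id, keys in batch:       # request every lock, in log order
        for key in keys:
            queues[key].append(txn_id)

    lock_sets = dict(batch)
    pending = [txn_id for txn_id, _ in batch]
    rounds = []
    while pending:
        # The earliest pending transaction heads all of its queues,
        # so this list is never empty: no deadlock is possible.
        runnable = [t for t in pending
                    if all(queues[k][0] == t for k in lock_sets[t])]
        for t in runnable:           # run, then release all locks
            for k in lock_sets[t]:
                queues[k].popleft()
        rounds.append(runnable)
        pending = [t for t in pending if t not in runnable]
    return rounds


batch = [
    ("T1", {"a", "b"}),
    ("T2", {"b", "c"}),
    ("T3", {"d"}),
    ("T4", {"a", "c"}),
]
rounds = schedule(batch)
print(rounds)
```

The schedule depends only on the order of the batch, so every replica that receives the same log produces the same rounds.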
What’s the Downside? 
• Increased latency to log input transactions 
and send to the DBMS in batches 
• No flexibility for the system to abort 
transactions on a whim 
• Can’t reorder transaction execution if one 
xact stalls mid-transaction 
• Need to determine what will be locked in 
advance
Additional Upside 
• Our implementation eliminates deadlocks 
– Distributed deadlock is a major problem for 
distributed DBMSs 
• Lock manager totally separate from the 
rest of DBMS 
– Increases modularity of the system
Experimental Evaluation 
• Experiments conducted on Amazon EC2 
using m3.2xlarge (Double Extra Large) instances 
• Cluster of 8 nodes 
• TPC-C 
• Microbenchmark: 
– 10 RMW (read-modify-write) actions 
– 10 RMW actions + CPU computation
TPC-C
Microbenchmark Experiments (Long xacts) 
[Figure: transactions per second per node (0 to 250,000) vs. % distributed transactions (0% to 100%). Series: Deterministic, high contention; Nondeterministic, high contention; Deterministic, low contention; Nondeterministic, low contention; Nondeterministic w/o 2PC, low contention; Nondeterministic w/o 2PC, high contention.]
Microbenchmark Experiments (Short xacts) 
[Figure: transactions per second per node (0 to 600,000) vs. % distributed transactions (0% to 100%). Series: Deterministic, high contention; Nondeterministic, high contention; Nondeterministic, low contention; Deterministic, low contention; Deterministic w/ VLL, low contention; Deterministic w/ VLL, high contention.]
Resource Constraints Experiments 
[Figure: throughput (txns/sec, 0 to 250,000) vs. time (0 to 45 seconds). Series: Deterministic, 5% distributed; Deterministic, 100% distributed; Nondeterministic, 5% distributed; Nondeterministic, 100% distributed; Nondeterministic w/ throttling, 5% distributed.]
Dependent Transactions Experiments 
[Figure, two panels: (a) 0% distributed transactions; (b) 100% distributed transactions. Each panel plots throughput (txns/sec) vs. index entry volatility (0.01 to 100). Series in each panel: Deterministic and Nondeterministic, each at 0%, 20%, 50%, and 100% dependent transactions.]
Latency CDF
More information 
• The Case for Determinism in Database Systems 
Alexander Thomson and Daniel J. Abadi. In PVLDB, 3(1), 2010. 
• Calvin: Fast Distributed Transactions for Partitioned Database Systems 
Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi. In Proceedings of SIGMOD, 2012. 
• An Evaluation of the Advantages and Disadvantages of Deterministic Database Systems 
Kun Ren, Alexander Thomson, and Daniel J. Abadi. In PVLDB, 7(10), 2014. 
• Modularity and Scalability in Calvin 
Alexander Thomson and Daniel J. Abadi. In IEEE Data Eng. Bull., 36(2): 48-55, 2013. 
• Lightweight Locking for Main Memory Database Systems 
Kun Ren, Alexander Thomson, and Daniel J. Abadi. In PVLDB, 6(2): 145-156, 2012.
Conclusions 
• Determinism not a good fit for latency-sensitive 
applications 
• Fewer options to deal with node overload 
(true only for lock-based implementation) 
• Much improved throughput for distributed 
transactions 
• Much simpler design. Recovery manager and 
lock manager are totally separate from the rest of 
the DBMS 
• Replication is trivial


Editor's Notes

  1. Hi everyone, I’m Jose Faleiro, and I’m here to talk about Lazy Evaluation of Transactions in Database Systems. This is joint work with Alexander Thomson and Daniel Abadi.