Slides for Daniel Abadi talk at UC Berkeley on 10/22/2014. Discusses the problems with traditional database systems, especially around modularity and horizontal scalability, and shows how deterministic database systems can help.
1. The Power of Determinism in
Database Systems
Daniel J. Abadi
Yale University
(Joint work with Jose Faleiro, Kun Ren, and Alex
Thomson)
2. Database Systems Are Great
• Protects a dataset from corruption or
deletion in the face of media, system, or
program crashes
• Allows programs to change state of data in
arbitrary ways
• Allows 1000s of such programs to run
concurrently
– Guarantees atomicity and isolation of such
programs
• Has served as blueprint for many
concurrent, highly complex systems
3. But …
• Design is incredibly complex
– Takes $17 million to build a new one
• Components are horribly monolithic
• Corner case bugs nearly impossible to
reproduce
• Does not scale horizontally
• Does not scale horizontally (seriously)
Should the DBMS architecture really be a
blueprint for concurrent system design?
4. Nondeterminism is the problem
• Building on top of:
– OSes that enable threads to be scheduled
arbitrarily
– Networks that deliver messages with arbitrary
delays (and sometimes in arbitrary orders)
– Hardware that can fail arbitrarily
• Only natural to allow the state of the
database to be dependent on these
nondeterministic events
5. Nondeterminism is the problem
• OS non-deterministic thread scheduling leads
to:
– Arbitrary transaction interleaving
– Deadlocks
– Difficult to reproduce bugs
– Tight interactions between lock manager,
recovery manager, access manager, and
transaction manager.
• Hardware failures and message delivery
delays result in transaction aborts
– Need complicated recovery manager to handle
half-completed transactions
– Need commit-protocol for distributed transactions
6. How to eliminate nondeterminism?
• There exist proposals for:
– Deterministic operating systems
– (Somewhat) deterministic networking layers
– Highly redundant and reliable hardware
• Maybe one day those proposals will come
with fewer disadvantages
• In the meantime, we have to create
determinism from nondeterministic
components
– Pick and choose what we make deterministic
7. Possible determinism levels
• Given an input and initial state of the database
system, to get to one and only one possible final
state:
– Level 1: System always runs the same sequence
of instructions
– Level 2: System always proceeds through the
same sequence of states of the database
– Level 3: Database is allowed to proceed through
states in any order as long as the final state of all
external and internal data structures is
determined by the input
– Level 4: Database is allowed to proceed through
states in any order as long as the final state of all
external structures is determined by the input
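The Level 4 idea can be illustrated with a small sketch (not from the talk): two runs may pass through different intermediate states because independent operations execute in different internal orders, yet the final externally visible state is fully determined by the input alone.

```python
# Sketch of Level 4 determinism: the internal order in which
# independent (non-conflicting) operations execute may differ between
# runs, but the final state depends only on the input.

def run(input_ops, internal_order):
    state = {"x": 0, "y": 0}
    # internal_order permutes independent ops only, so the final
    # state is determined by input_ops alone.
    for i in internal_order:
        key, delta = input_ops[i]
        state[key] += delta
    return state

ops = [("x", 1), ("y", 2)]  # independent: they touch different keys
# Two different internal orders, one final state:
assert run(ops, [0, 1]) == run(ops, [1, 0]) == {"x": 1, "y": 2}
```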
8. Database Systems Problems
• Design is incredibly complex
– Takes $17 million to build a new one
• Components are horribly monolithic
• Corner case bugs nearly impossible to
reproduce
• Does not scale horizontally
• Does not scale horizontally
9. Database Systems Problems
• Design is incredibly complex
– Takes $17 million to build a new one
• Components are horribly monolithic
• Corner case bugs nearly impossible to
reproduce
• Does not scale horizontally
• Does not scale horizontally
LEVEL 4 DETERMINISM HELPS WITH ALL OF THESE
10. Recovery
• Brain-dead version:
– Log all input to the system
– Upon a failure, trash the entire database, reply input
log from the beginning
• Less brain-dead version:
– Create checkpoints of database state as of some
point in the input log
– Upon a failure, trash the entire database, load
checkpoint, replay input log from point where
checkpoint was taken
• Note that logging can happen entirely externally to
the DBMS
• Same is true for checkpointing, although may want
to perform it inside the DBMS for performance
– Even in this case, it needs very little knowledge about
other components
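The checkpoint-and-replay scheme above can be sketched as follows (hypothetical names, not the talk's implementation): because execution is deterministic, replaying the input log from a checkpoint always reproduces the same final state, and the log itself can live entirely outside the DBMS.

```python
# Sketch of input logging plus checkpoint-based recovery for a
# deterministic store: recovery = load checkpoint, replay log tail.

class DeterministicStore:
    def __init__(self):
        self.state = {}
        self.input_log = []        # durable log of transaction inputs
        self.checkpoint = ({}, 0)  # (state snapshot, log position)

    def apply(self, txn):
        """txn must be a deterministic function of the current state."""
        self.input_log.append(txn)
        txn(self.state)

    def take_checkpoint(self):
        self.checkpoint = (dict(self.state), len(self.input_log))

    def recover(self):
        # Trash the live state, load the checkpoint, replay the tail.
        snapshot, pos = self.checkpoint
        self.state = dict(snapshot)
        for txn in self.input_log[pos:]:
            txn(self.state)

def deposit(amount):
    def txn(state):
        state["balance"] = state.get("balance", 0) + amount
    return txn

db = DeterministicStore()
db.apply(deposit(100))
db.take_checkpoint()
db.apply(deposit(50))
saved = dict(db.state)
db.recover()              # replay from checkpoint
assert db.state == saved  # determinism: replay reproduces the state
```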
11. Replication
• Send the same input log to replica DBMS
– User-visible state in replicas will not diverge
– Can happen entirely externally to the DBMS
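A minimal sketch of log-shipping replication, under the same assumption that transactions are deterministic: feeding the identical input log to two replicas leaves them in identical states, with no state transfer between them.

```python
# Sketch of replication by input-log shipping: same log in,
# same state out, on every replica.

def run_replica(input_log):
    state = {}
    for txn in input_log:
        txn(state)  # deterministic: same log -> same final state
    return state

def transfer(src, dst, amount):
    def txn(state):
        state[src] = state.get(src, 0) - amount
        state[dst] = state.get(dst, 0) + amount
    return txn

log = [transfer("a", "b", 10), transfer("b", "c", 5)]
primary = run_replica(log)
replica = run_replica(log)  # the same log, shipped externally
assert primary == replica   # user-visible state cannot diverge
```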
12. Horizontal Scalability
• Active distributed xacts not aborted upon
node failure
– Greatly reduces (or eliminates) cost of
distributed commit
• Don’t have to worry about nodes failing during
commit protocol
• Don’t have to worry about effects of a transaction
making it to disk before promising to commit the
transaction
• Just need one message from any node that
potentially can deterministically abort the xact
– This message can be sent in the middle of the xact, as
soon as it knows it will commit
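The single-message idea might be sketched like this (a simplification, not the talk's actual protocol): each participant sends exactly one message once it knows whether it could deterministically abort, and there is no separate prepare/vote round.

```python
# Sketch of single-message deterministic commit: one message per
# node that could deterministically abort; commit iff none does.

def participant_check(node_data, txn):
    # A deterministic abort condition, e.g. an integrity check such
    # as sufficient balance (hypothetical schema).
    return node_data.get(txn["key"], 0) >= txn["amount"]

def deterministic_commit(nodes, txn):
    # Each node contributes exactly one message, which it can send
    # mid-transaction, as soon as the check is decided.
    messages = [participant_check(data, txn) for data in nodes]
    return all(messages)

nodes = [{"x": 100}, {"x": 30}]
assert deterministic_commit(nodes, {"key": "x", "amount": 50}) is False
assert deterministic_commit(nodes, {"key": "x", "amount": 20}) is True
```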
13. One Way to Implement Determinism
• Use a preprocessor to handle client communications,
and create a log of submitted xacts
• Send log in batches to DBMS
• Every xact immediately requests all locks it will need
(in order of log)
• If it doesn’t know what it will need
– Run enough of the xact to find out, but do not change the
database state
– Reissue xact to the preprocessor with lock requirements
included as parameter
– Run enough of the new xact to find out if it locked the
correct items (database state might have changed in the
meantime)
• If so, then xact can proceed as normal
• If not, reissue again to the preprocessor and repeat as
necessary
• Trivial to prove this is deterministic and deadlock-free
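The ordered lock-acquisition step can be sketched as below (hypothetical names, a simplification of the scheme described above): a single thread grants each transaction's full lock set in log order, so a transaction only ever waits for transactions earlier in the log, and deadlock is impossible.

```python
# Sketch of lock acquisition in log order: requests are enqueued
# per key; a txn runs only when it is at the front of every queue
# it needs, so waits-for edges always point to earlier txns.

from collections import deque

class OrderedLockManager:
    def __init__(self):
        self.waiters = {}  # key -> FIFO queue of txn ids

    def request_all(self, txn_id, keys):
        """Called once per txn, in log order; enqueues the full lock set."""
        granted = True
        for key in sorted(keys):
            q = self.waiters.setdefault(key, deque())
            q.append(txn_id)
            if q[0] != txn_id:
                granted = False  # must wait, but only for earlier txns
        return granted

    def release_all(self, txn_id, keys):
        for key in keys:
            self.waiters[key].popleft()  # txn_id was at the front

lm = OrderedLockManager()
assert lm.request_all(1, {"a", "b"}) is True   # first in log: runs now
assert lm.request_all(2, {"b", "c"}) is False  # waits for txn 1 on "b"
lm.release_all(1, {"a", "b"})                  # now txn 2 is unblocked
```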
14. What’s the Downside?
• Increased latency to log input transactions
and send to the DBMS in batches
• No flexibility for the system to abort
transactions on a whim
• Can’t reorder transaction execution if one
xact stalls mid-transaction
• Need to determine what will be locked in
advance
15. Additional Upside
• Our implementation eliminates deadlocks
– Distributed deadlock is a major problem for
distributed DBMSs
• Lock manager totally separate from the
rest of DBMS
– Increases modularity of the system
16. Experimental Evaluation
• Experiments conducted on Amazon EC2
using m3.2xlarge (double extra large) instances
• Cluster of 8 nodes
• TPC-C
• Microbenchmark:
– 10 read-modify-write (RMW) actions
– 10 RMW actions + CPU computation
23. More information
• The Case for Determinism in Database Systems
Alexander Thomson and Daniel J. Abadi. In PVLDB, 3(1),
2010. (pdf)
• Calvin: Fast Distributed Transactions for Partitioned
Database Systems
Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng,
Kun Ren, Philip Shao, and Daniel J. Abadi. In Proceedings of
SIGMOD, 2012. (pdf)
• An Evaluation of the Advantages and Disadvantages of
Deterministic Database Systems
Kun Ren, Alexander Thomson and Daniel J. Abadi. In PVLDB,
7(10), 2014. (pdf)
• Modularity and Scalability in Calvin
Alexander Thomson and Daniel J. Abadi. In IEEE Data Eng.
Bull., 36(2): 48-55, 2013. (pdf)
• Lightweight Locking for Main Memory Database Systems
Kun Ren, Alexander Thomson, and Daniel J. Abadi. In PVLDB
6(2): 145-156, 2012. (pdf)
24. Conclusions
• Determinism not a good fit for latency-sensitive
applications
• Fewer options to deal with node overload
(true only for lock-based implementation)
• Much improved throughput for distributed
transactions
• Much simpler design: recovery manager and
lock manager are totally separate from the
rest of the DBMS
• Replication is trivial