PelotonDB is a self-driving database that uses a hybrid storage layout to support both OLTP and OLAP workloads. It uses logical tiles to decouple storage management from query execution. Tuples can be stored in the N-ary storage model (NSM), the decomposition storage model (DSM), or the flexible storage model (FSM), depending on how "hot" the data is. The system continuously monitors queries and reorganizes the physical data layout in the background, using k-means clustering of query samples to adapt to the workload over time. It employs MVCC for concurrency control and keeps overhead low during data reorganization because concurrent DML operations modify only versioning metadata. An evaluation using the ADAPT benchmark shows that PelotonDB can adapt the storage layout to improve performance.
5. How to be Self-driving?
• Understand the workload – OLTP or OLAP
• Forecast resource utilization trends
• Identify potential actions that tune and optimize the database
8. Problems
• Many existing systems: OLTP + OLAP
• Takes minutes or even hours to propagate changes
• Administrative overhead
• Developer needs to write query for multiple systems
[Figure: transactional database (e.g. MySQL) → ETL → analytical database (e.g. HiStore)]
9. HTAP
• Classic solution – 2 separated engines
• OLTP engine with row-oriented data
• OLAP engine with column-oriented data
• Use a synchronization method (e.g. 2PC) to combine the results
• Well, this looks better but still too complex
• Limited types of queries
• Performance overhead
10. Hybrid Storage in Peloton
• A unified architecture for 'hot' and 'cold' tuples, based on a logical
abstraction over these different layouts
• A novel online reorganization technique that continuously enhances
the physical design
11. Storage Models
• NSM is good for OLTP
• DSM is good for OLAP
• FSM: adapts the layout as data gets colder
• NSM and DSM are special cases of FSM
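
The relationship between the three models can be illustrated with a small sketch. It assumes a layout is just a list of attribute groups (one group per vertical tile); the function name `partition` and the sample rows are hypothetical and are not Peloton's actual storage code.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def partition(rows: Sequence[Row], layout: Sequence[Sequence[int]]) -> List[List[Row]]:
    """Split rows into vertical tiles; each tile stores the attribute indexes in one group."""
    return [[tuple(row[a] for a in attrs) for row in rows] for attrs in layout]


rows = [(1, 10, 100, 1000), (2, 20, 200, 2000)]

nsm_layout = [[0, 1, 2, 3]]        # NSM: a single tile holding every attribute
dsm_layout = [[0], [1], [2], [3]]  # DSM: one tile per attribute
fsm_layout = [[0, 1], [2, 3]]      # FSM: any grouping in between (hypothetical choice)

nsm = partition(rows, nsm_layout)
dsm = partition(rows, dsm_layout)
fsm = partition(rows, fsm_layout)
```

Under this view NSM and DSM are simply the two extreme groupings that FSM allows.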
19. MVCC
• HTAP workloads are comprised of short-duration transactions
alongside long-running analytical queries.
• Every transaction holds
• A unique transaction Id
• A unique commit timestamp (assigned on committing)
• Timestamp of last committed transaction
20. Versioning Metadata
• For each tuple
• TxnId: The transaction id that currently holds a latch
• BeginCTS: The commit timestamp from which it becomes visible
• EndCTS: The commit timestamp after which it ceases to be visible
• PreV: Reference to the previous version
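
A minimal sketch of how these four fields could drive a visibility check. The half-open interval `[BeginCTS, EndCTS)`, the use of infinity for an unset `EndCTS`, and the names `TupleVersion`, `is_visible`, and `read` are assumptions made for illustration; the slides do not specify the boundary handling, and latching via `TxnId` is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

INFINITY = float("inf")  # assumption: stands in for "no end timestamp yet"


@dataclass
class TupleVersion:
    value: str
    txn_id: Optional[int]                  # TxnId: transaction holding the latch, if any
    begin_cts: float                       # BeginCTS: visible from this commit timestamp
    end_cts: float = INFINITY              # EndCTS: no longer visible from this timestamp on
    prev: Optional["TupleVersion"] = None  # PreV: previous version in the chain


def is_visible(version: TupleVersion, last_committed_ts: float) -> bool:
    """Assumed rule: the reader's last-committed timestamp falls in [BeginCTS, EndCTS)."""
    return version.begin_cts <= last_committed_ts < version.end_cts


def read(newest: TupleVersion, last_committed_ts: float) -> Optional[str]:
    """Walk the PreV chain from the newest version until a visible one is found."""
    version: Optional[TupleVersion] = newest
    while version is not None:
        if is_visible(version, last_committed_ts):
            return version.value
        version = version.prev
    return None


old = TupleVersion("v1", txn_id=None, begin_cts=1000, end_cts=1002)
new = TupleVersion("v2", txn_id=None, begin_cts=1002, prev=old)
```

With this chain, a reader whose last-committed timestamp is 1001 sees `v1`, a reader at 1002 sees `v2`, and a reader at 999 sees nothing.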
25. Find Optimized Partition
• Naive algorithm takes . Infeasible!
• Heuristic approach
1. Clustering similar queries by k-means
• distance(q, p) = (number of attributes accessed by exactly one of q and p) / (total number of attributes)
• Prioritizes each query based on its plan cost to avoid being partial to TP queries
• Prioritizes the older samples with a weight w
2. Generate a layout greedily
• Iterates over the clusters in the weight-descending order
• For each cluster, groups the attributes accessed by its representative
query together into a tile
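
The two heuristic steps can be sketched as follows. Several details are assumptions rather than the authors' method: queries are encoded as 0/1 vectors over the table's attributes, the centroid update `r + w * (s - r)` is a common online k-means form, plan-cost weighting is omitted, and attributes that no representative query touches are placed in one trailing tile. All function names are hypothetical.

```python
from typing import List, Sequence, Set


def distance(q: Sequence[float], p: Sequence[float]) -> float:
    """For 0/1 vectors: attributes accessed by exactly one side, divided by all attributes."""
    return sum(abs(a - b) for a, b in zip(q, p)) / len(q)


def update_clusters(centroids: List[List[float]], sample: Sequence[float], w: float) -> int:
    """Move the nearest centroid toward the sample by weight w; return its index."""
    j = min(range(len(centroids)), key=lambda i: distance(centroids[i], sample))
    centroids[j] = [r + w * (s - r) for r, s in zip(centroids[j], sample)]
    return j


def greedy_layout(representatives: Sequence[Set[int]],
                  weights: Sequence[float],
                  num_attrs: int) -> List[List[int]]:
    """Visit clusters by descending weight; each contributes one tile of unassigned attributes."""
    assigned: Set[int] = set()
    layout: List[List[int]] = []
    order = sorted(range(len(representatives)), key=lambda i: weights[i], reverse=True)
    for i in order:
        tile = [a for a in sorted(representatives[i]) if a not in assigned]
        if tile:
            layout.append(tile)
            assigned.update(tile)
    rest = [a for a in range(num_attrs) if a not in assigned]
    if rest:
        layout.append(rest)  # assumption: leftover attributes share one tile
    return layout


d = distance([1, 1, 0, 0], [0, 1, 1, 0])
centroids = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
nearest = update_clusters(centroids, [1, 0, 0, 0], w=0.5)
layout = greedy_layout([{0, 1}, {1, 2}], weights=[0.2, 0.8], num_attrs=4)
```

In the usage lines, the heavier cluster claims attributes 1 and 2 first, so the lighter cluster only contributes attribute 0, and attribute 3 falls into the trailing tile.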
27. Data Layout Reorganization
• Copies the data over to the new layout, then atomically swaps it in
• Concurrent DML operation only modifies the versioning metadata
• Old data is reclaimed if not referenced by any logical tile
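
A simplified sketch of the copy-then-swap idea, under stated assumptions: the class `TileGroup` and its methods are hypothetical, the swap is a single reference assignment guarded by a lock, reclamation is left to Python's garbage collector, and concurrent DML with versioning metadata is not modeled.

```python
import threading
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


class TileGroup:
    def __init__(self, rows: Sequence[Row], layout: Sequence[Sequence[int]]):
        self._lock = threading.Lock()
        self._physical = self._build(rows, layout)  # (layout, tiles) pair

    @staticmethod
    def _build(rows, layout):
        tiles = [[tuple(r[a] for a in attrs) for r in rows] for attrs in layout]
        return (tuple(tuple(attrs) for attrs in layout), tiles)

    def snapshot(self):
        """What a reader holds on to; stays valid after a later swap."""
        return self._physical

    def rows(self) -> List[Row]:
        """Reassemble full rows from the current tiles."""
        layout, tiles = self._physical
        n_attrs = sum(len(attrs) for attrs in layout)
        out = []
        for i in range(len(tiles[0])):
            row = [None] * n_attrs
            for attrs, tile in zip(layout, tiles):
                for a, v in zip(attrs, tile[i]):
                    row[a] = v
            out.append(tuple(row))
        return out

    def reorganize(self, new_layout: Sequence[Sequence[int]]) -> None:
        new_physical = self._build(self.rows(), new_layout)  # copy into the new layout
        with self._lock:
            self._physical = new_physical  # swap in with one reference assignment


tg = TileGroup([(1, 10, 100), (2, 20, 200)], [[0, 1, 2]])
old = tg.snapshot()
tg.reorganize([[0], [1, 2]])
```

After the swap, `old` still refers to the row-oriented copy, so a reader that took the snapshot earlier is unaffected; once no reference to it remains, the old copy can be reclaimed.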
Then, over time, they become colder and thus are less likely to be updated again. For instance, more than half of the content that Facebook users access and interact with was shared by their friends in the past two days, and content popularity then declines rapidly over the following days.
(Q1) an insert query that adds a single tuple into the table
(Q2) a scan query that projects a subset of attributes of the tuples that satisfy a predicate
(Q3) an aggregate query that computes the maximum value for a set of attributes over the selected tuples
(Q4) an arithmetic query that sums up a subset of attributes of the selected tuples
(Q5) a join query that combines the tuples from two tables based on a predicate defined over the attributes in the tables
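
The five query types can be illustrated with a small SQLite session. The tables `r` and `s`, their attribute names, the sample data, and the predicate `a0 < 3` are all hypothetical; this only shows the shape of each query, not the benchmark's actual schema or parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (a0 INTEGER PRIMARY KEY, a1 INTEGER, a2 INTEGER, a3 INTEGER);
    CREATE TABLE s (b0 INTEGER PRIMARY KEY, b1 INTEGER);
    INSERT INTO r VALUES (1, 10, 100, 1000), (2, 20, 200, 2000);
    INSERT INTO s VALUES (1, 7), (2, 9);
""")

# Q1: insert a single tuple
conn.execute("INSERT INTO r VALUES (?, ?, ?, ?)", (3, 30, 300, 3000))

# Q2: scan that projects a subset of attributes for tuples matching a predicate
q2 = conn.execute("SELECT a1, a2 FROM r WHERE a0 < ? ORDER BY a0", (3,)).fetchall()

# Q3: aggregate computing the maximum of a set of attributes over the selected tuples
q3 = conn.execute("SELECT MAX(a1), MAX(a2) FROM r WHERE a0 < ?", (3,)).fetchall()

# Q4: arithmetic query summing a subset of attributes of each selected tuple
q4 = conn.execute("SELECT a1 + a2 FROM r WHERE a0 < ? ORDER BY a0", (3,)).fetchall()

# Q5: join of two tables on a predicate over their attributes
q5 = conn.execute(
    "SELECT r.a1, s.b1 FROM r JOIN s ON r.a0 = s.b0 WHERE r.a0 < ? ORDER BY r.a0",
    (3,),
).fetchall()
```

Q1 touches every attribute of one tuple, which favors a row-oriented tile, while Q2 through Q4 read a few attributes of many tuples, which favors narrower tiles.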