SlideShare a Scribd company logo
1 of 32
PelotonDB
A self-driving database for hybrid workloads
2
Agenda
• Overview of Peloton
• Hybrid Storage Layout
• Concurrency Control
• Layout Reorganization
• Evaluation
3
Overview of Peloton
4
How to be Self-driving?
• Understand the workload – OLTP or OLAP
• Forecast resource utilization trends
• Identify potential actions that tune and optimize the database
5
Architecture
6
Hybrid Storage Layout
7
Problems
• Many existing systems: OLTP + OLAP
• Takes minutes or even hours to propagate changes
• Administrative overhead
• Developer needs to write query for multiple systems
Transactional
Database
(e.g. MySQL)
Analytical
Database
(e.g. HiStore)
ETL
8
HTAP
• Classic solution – 2 separated engines
• OLTP engine with row-oriented data
• OLAP engine with column-oriented data
• Use a synchronization method (e.g. 2PC) to combine the results
• Well, this looks better but still too complex
• Limited types of queries
• Performance overhead
9
Hybrid Storage in Peloton
• A unified architecture for 'hot' and 'cold' tuples, based on a logical
abstraction over these different layouts
• A novel online reorganization technique that continuously enhances
the physical design
10
Storage Models
• NSM is good for OLTP
• DSM is good for OLAP
• FSM: adaptive as data get cooler
• NSM/DSM is special case of FSM
11
12
Physical Tiles
• A tuple can be stored in
different layouts over time.
• Stores new tuples in NSM
• Reorganizes as they become
colder
13
Logical Tiles
Composed of
• Underlying physical tiles
• Mapping of attributes
• Offsets of values
14
Logical Tile Algebra
• Decoupling storage managing and execution engine
• Intermediate results can be represented as passthrough logical tile
15
Bridge Operator
Pipeline Breaker
Metadata Operator
Mutator
16
Benefits of Logical Tiles
• Layout Transparency
• Vectorized Processing
• Flexible Materialization
• Caching Behavior
17
Concurrency Control
18
MVCC
• HTAP workloads are comprised of short-duration transactions
alongside long-running analytical queries.
• Every transaction holds
• A unique transaction Id
• A unique commit timestamp (assigned on committing)
• Timestamp of last committed transaction
19
Versioning Metadata
• For each tuple
• TxnId: The transaction id that currently holds a latch
• BeginCTS: The commit timestamp from which it becomes visible
• EndCTS: The commit timestamp after which it ceases to be visible
• PreV: Reference to the previous version
20
Operations under MVCC
• Insert / Delete / Update
• TableScan / IndexScan
21
Layout Reorganization
22
Approach
• Track the recent query workloads
• Periodically compute a workload-optimized storage layout in the
background
23
Query Monitoring
• Collects information about the attributes in queries
24
Find Optimized Partition
• Naive algorithm takes . Infeasible!
• Heuristic approach
1. Clustering similar queries by k-means
• distance(q, p) = #attributes appears only in one side / #attributes
• Prioritizes each query based on its plan cost to avoid partial to TP queries
• Prioritizes the older samples with a weight w
2. Generate a layout in greedy way
• Iterates over the clusters in the weight-descending order
• For each cluster, groups the attributes accessed by that its representative
query together into a tile
25
K-means
26
Data Layout Reorganization
• Copies over the data to the new layout then atomically swaps in
• Concurrent DML operation only modifies the versioning metadata
• Old data is reclaimed if not referenced by any logical tile
27
Evaluation
28
ADAPT benchmark
Simulating enterprise workloads
29
# Attributes Size of tuple # Tuples PK
Narrow table 50 200B
10m a0
Wide table 500 2KB
30
Performance Impact of Storage Models
31Workload-Aware Adaptation
Thanks!
Q&A
32

More Related Content

What's hot

HNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge Meinhard
HNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge MeinhardHNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge Meinhard
HNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge MeinhardHelix Nebula The Science Cloud
 
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...Flink Forward
 
Designing for operability and managability
Designing for operability and managabilityDesigning for operability and managability
Designing for operability and managabilityGaurav Bahrani
 
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen LiTowards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen LiBowen Li
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017Apache Apex
 
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward
 
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019Bowen Li
 
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...Bowen Li
 
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward
 
Flink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksFlink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksEron Wright
 
OSGi Community Event 2010 - Modular Applications on a Data Grid - A Case Stud...
OSGi Community Event 2010 - Modular Applications on a Data Grid - A Case Stud...OSGi Community Event 2010 - Modular Applications on a Data Grid - A Case Stud...
OSGi Community Event 2010 - Modular Applications on a Data Grid - A Case Stud...mfrancis
 
Scaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkScaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkTill Rohrmann
 
"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018Globus
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Flink Forward
 
Updating the Globus Connect Architecture - ARCC Workshop at PEARC17
Updating the Globus Connect Architecture - ARCC Workshop at PEARC17Updating the Globus Connect Architecture - ARCC Workshop at PEARC17
Updating the Globus Connect Architecture - ARCC Workshop at PEARC17Mary Bass
 
Centralised logging with ELK stack
Centralised logging with ELK stackCentralised logging with ELK stack
Centralised logging with ELK stackSimon Hanmer
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uberconfluent
 

What's hot (19)

HNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge Meinhard
HNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge MeinhardHNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge Meinhard
HNSciCloud Info Day, 7 Sept 2016, Functional Requirements by Helge Meinhard
 
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...
 
Designing for operability and managability
Designing for operability and managabilityDesigning for operability and managability
Designing for operability and managability
 
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen LiTowards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017
 
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
Flink Forward San Francisco 2019: Massive Scale Data Processing at Netflix us...
 
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
Integrating Flink with Hive, Seattle Flink Meetup, Feb 2019
 
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...
AthenaX - Unified Stream & Batch Processing using SQL at Uber, Zhenqiu Huang,...
 
Concourse CI
Concourse CIConcourse CI
Concourse CI
 
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt...
 
Flink Connector Development Tips & Tricks
Flink Connector Development Tips & TricksFlink Connector Development Tips & Tricks
Flink Connector Development Tips & Tricks
 
OSGi Community Event 2010 - Modular Applications on a Data Grid - A Case Stud...
OSGi Community Event 2010 - Modular Applications on a Data Grid - A Case Stud...OSGi Community Event 2010 - Modular Applications on a Data Grid - A Case Stud...
OSGi Community Event 2010 - Modular Applications on a Data Grid - A Case Stud...
 
Scaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkScaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache Flink
 
"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
 
Updating the Globus Connect Architecture - ARCC Workshop at PEARC17
Updating the Globus Connect Architecture - ARCC Workshop at PEARC17Updating the Globus Connect Architecture - ARCC Workshop at PEARC17
Updating the Globus Connect Architecture - ARCC Workshop at PEARC17
 
Centralised logging with ELK stack
Centralised logging with ELK stackCentralised logging with ELK stack
Centralised logging with ELK stack
 
Grafana 7.0
Grafana 7.0Grafana 7.0
Grafana 7.0
 
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at UberDisaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
Disaster Recovery for Multi-Region Apache Kafka Ecosystems at Uber
 

Similar to Self-driving PelotonDB automates hybrid workloads

Kubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical ViewKubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical ViewLei (Harry) Zhang
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Jonathan Katz
 
LISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground RunningLISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground RunningChris McEniry
 
SQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTPSQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTPTony Rogerson
 
Membase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San FranciscoMembase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San FranciscoMembase
 
Linux kernel development ch4
Linux kernel development   ch4Linux kernel development   ch4
Linux kernel development ch4huangachou
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Lucidworks
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache KuduAndriy Zabavskyy
 
What's new in JBoss ON 3.2
What's new in JBoss ON 3.2What's new in JBoss ON 3.2
What's new in JBoss ON 3.2Thomas Segismont
 
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)Amazon Web Services
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveXu Jiang
 
Work with hundred of hot terabytes in JVMs
Work with hundred of hot terabytes in JVMsWork with hundred of hot terabytes in JVMs
Work with hundred of hot terabytes in JVMsMalin Weiss
 
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Martin Traverso
 
What you need to know for postgresql operation
What you need to know for postgresql operationWhat you need to know for postgresql operation
What you need to know for postgresql operationAnton Bushmelev
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Julian Hyde
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...
Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...Edgar Alejandro Villegas
 
Swift at Scale: The IBM SoftLayer Story
Swift at Scale: The IBM SoftLayer StorySwift at Scale: The IBM SoftLayer Story
Swift at Scale: The IBM SoftLayer StoryBrian Cline
 

Similar to Self-driving PelotonDB automates hybrid workloads (20)

Operational-Analytics
Operational-AnalyticsOperational-Analytics
Operational-Analytics
 
Preso-v0.1
Preso-v0.1Preso-v0.1
Preso-v0.1
 
Kubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical ViewKubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical View
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018
 
LISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground RunningLISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground Running
 
SQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTPSQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTP
 
Membase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San FranciscoMembase Intro from Membase Meetup San Francisco
Membase Intro from Membase Meetup San Francisco
 
Linux kernel development ch4
Linux kernel development   ch4Linux kernel development   ch4
Linux kernel development ch4
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
 
What's new in JBoss ON 3.2
What's new in JBoss ON 3.2What's new in JBoss ON 3.2
What's new in JBoss ON 3.2
 
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
273CC03851E778670A (1).ppt
273CC03851E778670A (1).ppt273CC03851E778670A (1).ppt
273CC03851E778670A (1).ppt
 
Work with hundred of hot terabytes in JVMs
Work with hundred of hot terabytes in JVMsWork with hundred of hot terabytes in JVMs
Work with hundred of hot terabytes in JVMs
 
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
 
What you need to know for postgresql operation
What you need to know for postgresql operationWhat you need to know for postgresql operation
What you need to know for postgresql operation
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...
Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...
 
Swift at Scale: The IBM SoftLayer Story
Swift at Scale: The IBM SoftLayer StorySwift at Scale: The IBM SoftLayer Story
Swift at Scale: The IBM SoftLayer Story
 

More from 宇 傅

Parallel Query Execution
Parallel Query ExecutionParallel Query Execution
Parallel Query Execution宇 傅
 
The Evolution of Data Systems
The Evolution of Data SystemsThe Evolution of Data Systems
The Evolution of Data Systems宇 傅
 
The Volcano/Cascades Optimizer
The Volcano/Cascades OptimizerThe Volcano/Cascades Optimizer
The Volcano/Cascades Optimizer宇 傅
 
Immutable Data Structures
Immutable Data StructuresImmutable Data Structures
Immutable Data Structures宇 傅
 
The Case for Learned Index Structures
The Case for Learned Index StructuresThe Case for Learned Index Structures
The Case for Learned Index Structures宇 傅
 
Spark and Spark Streaming
Spark and Spark StreamingSpark and Spark Streaming
Spark and Spark Streaming宇 傅
 
Functional Programming in Java 8
Functional Programming in Java 8Functional Programming in Java 8
Functional Programming in Java 8宇 傅
 
第三届阿里中间件性能挑战赛冠军队伍答辩
第三届阿里中间件性能挑战赛冠军队伍答辩第三届阿里中间件性能挑战赛冠军队伍答辩
第三届阿里中间件性能挑战赛冠军队伍答辩宇 傅
 
Data Streaming Algorithms
Data Streaming AlgorithmsData Streaming Algorithms
Data Streaming Algorithms宇 傅
 
Golang 101
Golang 101Golang 101
Golang 101宇 傅
 
Docker Container: isolation and security
Docker Container: isolation and securityDocker Container: isolation and security
Docker Container: isolation and security宇 傅
 
Paxos and Raft Distributed Consensus Algorithm
Paxos and Raft Distributed Consensus AlgorithmPaxos and Raft Distributed Consensus Algorithm
Paxos and Raft Distributed Consensus Algorithm宇 傅
 

More from 宇 傅 (12)

Parallel Query Execution
Parallel Query ExecutionParallel Query Execution
Parallel Query Execution
 
The Evolution of Data Systems
The Evolution of Data SystemsThe Evolution of Data Systems
The Evolution of Data Systems
 
The Volcano/Cascades Optimizer
The Volcano/Cascades OptimizerThe Volcano/Cascades Optimizer
The Volcano/Cascades Optimizer
 
Immutable Data Structures
Immutable Data StructuresImmutable Data Structures
Immutable Data Structures
 
The Case for Learned Index Structures
The Case for Learned Index StructuresThe Case for Learned Index Structures
The Case for Learned Index Structures
 
Spark and Spark Streaming
Spark and Spark StreamingSpark and Spark Streaming
Spark and Spark Streaming
 
Functional Programming in Java 8
Functional Programming in Java 8Functional Programming in Java 8
Functional Programming in Java 8
 
第三届阿里中间件性能挑战赛冠军队伍答辩
第三届阿里中间件性能挑战赛冠军队伍答辩第三届阿里中间件性能挑战赛冠军队伍答辩
第三届阿里中间件性能挑战赛冠军队伍答辩
 
Data Streaming Algorithms
Data Streaming AlgorithmsData Streaming Algorithms
Data Streaming Algorithms
 
Golang 101
Golang 101Golang 101
Golang 101
 
Docker Container: isolation and security
Docker Container: isolation and securityDocker Container: isolation and security
Docker Container: isolation and security
 
Paxos and Raft Distributed Consensus Algorithm
Paxos and Raft Distributed Consensus AlgorithmPaxos and Raft Distributed Consensus Algorithm
Paxos and Raft Distributed Consensus Algorithm
 

Recently uploaded

Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 

Recently uploaded (20)

Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 

Self-driving PelotonDB automates hybrid workloads

  • 1. PelotonDB A self-driving database for hybrid workloads
  • 2. 2
  • 3. Agenda • Overview of Peloton • Hybrid Storage Layout • Concurrency Control • Layout Reorganization • Evaluation 3
  • 5. How to be Self-driving? • Understand the workload – OLTP or OLAP • Forecast resource utilization trends • Identify potential actions that tune and optimize the database 5
  • 8. Problems • Many existing systems: OLTP + OLAP • Takes minutes or even hours to propagate changes • Administrative overhead • Developer needs to write query for multiple systems Transactional Database (e.g. MySQL) Analytical Database (e.g. HiStore) ETL 8
  • 9. HTAP • Classic solution – 2 separated engines • OLTP engine with row-oriented data • OLAP engine with column-oriented data • Use a synchronization method (e.g. 2PC) to combine the results • Well, this looks better but still too complex • Limited types of queries • Performance overhead 9
  • 10. Hybrid Storage in Peloton • A unified architecture for 'hot' and 'cold' tuples, based on a logical abstraction over these different layouts • A novel online reorganization technique that continuously enhances the physical design 10
  • 11. Storage Models • NSM is good for OLTP • DSM is good for OLAP • FSM: adaptive as data get cooler • NSM/DSM is special case of FSM 11
  • 12. 12
  • 13. Physical Tiles • A tuple can be stored in different layouts over time. • Stores new tuples in NSM • Reorganizes as they become colder 13
  • 14. Logical Tiles Composed of • Underlying physical tiles • Mapping of attributes • Offsets of values 14
  • 15. Logical Tile Algebra • Decoupling storage managing and execution engine • Intermediate results can be represented as passthrough logical tile 15
  • 17. Benefits of Logical Tiles • Layout Transparency • Vectorized Processing • Flexible Materialization • Caching Behavior 17
  • 19. MVCC • HTAP workloads are comprised of short-duration transactions alongside long-running analytical queries. • Every transaction holds • A unique transaction Id • A unique commit timestamp (assigned on committing) • Timestamp of last committed transaction 19
  • 20. Versioning Metadata • For each tuple • TxnId: The transaction id that currently holds a latch • BeginCTS: The commit timestamp from which it becomes visible • EndCTS: The commit timestamp after which it ceases to be visible • PreV: Reference to the previous version 20
  • 21. Operations under MVCC • Insert / Delete / Update • TableScan / IndexScan 21
  • 23. Approach • Track the recent query workloads • Periodically compute a workload-optimized storage layout in the background 23
  • 24. Query Monitoring • Collects information about the attributes in queries 24
  • 25. Find Optimized Partition • Naive algorithm takes . Infeasible! • Heuristic approach 1. Clustering similar queries by k-means • distance(q, p) = #attributes appears only in one side / #attributes • Prioritizes each query based on its plan cost to avoid partial to TP queries • Prioritizes the older samples with a weight w 2. Generate a layout in greedy way • Iterates over the clusters in the weight-descending order • For each cluster, groups the attributes accessed by that its representative query together into a tile 25
  • 27. Data Layout Reorganization • Copies over the data to the new layout then atomically swaps in • Concurrent DML operation only modifies the versioning metadata • Old data is reclaimed if not referenced by any logical tile 27
  • 29. ADAPT benchmark Simulating enterprise workloads 29 # Attributes Size of tuple # Tuples PK Narrow table 50 200B 10m a0 Wide table 500 2KB
  • 30. 30 Performance Impact of Storage Models

Editor's Notes

  1. Then, over time they become colder and thus are less likely to be updated again. For instance, more than half of the content that Facebook users access and interact with are shared by their friends in the past two days, and then there is a rapid decline in content popularity over the following days
  2. (Q1) an insert query that adds a single tuple into the table (Q2) a scan query that projects a subset of attributes of the tuples that satisfy a predicate (Q3) an aggregate query that computes the maximum value for a set of attributes over the selected tuples (Q4) an arithmetic query that sums up a subset of attributes of the selected tuples (Q5) a join query that combines the tuples from two tables based on a predicate defined over the attributes in the tables