The Volcano/Cascades Optimizer
Eric Fu
2018-11-14
Outline
● Background
● Dynamic Programming
● Components
● Search Engine
● Summary
Life of SQL
SQL → Parser → Syntax Tree → Logical Plan → Optimizer (+ statistics) → Physical Plan → Executor → data
● Parser
● Optimizer
● Executor
Query Optimization Strategies
● Choice #1: Heuristics
○ INGRES, Oracle (until mid 1990s)
● Choice #2: Heuristics + Cost-based Join Search
○ System R, early IBM DB2, most open-source DBMSs
● Choice #3: Randomized Search
○ Academics in the 1980s, current Postgres
● Choice #4: Stratified Search
○ IBM’s STARBURST (late 1980s), now IBM DB2 + Oracle
● Choice #5: Unified Search
○ Volcano/Cascades in 1990s, now MSSQL + Greenplum
Problem
● Why is query optimization such a hard problem?
● It's not difficult to find a feasible solution
● It's almost impossible to find the optimal solution
Why So Many Choices?
● Equivalence Rules
● Various Implementations
[Figure: three example join trees over A, B, C, D, one of them the left-deep tree ((A⨝B)⨝C)⨝D]
All 4! = 24 orders of A, B, C, D:
ABCD, ABDC, ACBD, ACDB, ADBC, ADCB,
BACD, BADC, BCAD, BCDA, BDAC, BDCA,
CABD, CADB, CBAD, CBDA, CDAB, CDBA,
DABC, DACB, DBAC, DBCA, DCAB, DCBA
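Each join can be implemented as HashJoin, NestedLoopJoin or SortMergeJoin, and each table can be read with IndexScan or TableScan.
[Figure: a join tree over A, B, C, D annotated with these implementation choices]
In Total: 24 * 3 * 2^4 * 3^3 = 31104 !!!
(24 join orders × 3 tree shapes × 2 scan methods for each of the 4 tables × 3 join algorithms for each of the 3 joins)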
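The slide's arithmetic can be reproduced in a few lines. This sketch just follows the slide's own factors: the 3 tree shapes are taken from the three trees drawn on the slide (enumerating every binary tree shape over four leaves would give 5 rather than 3, so the real space is even larger), and the operator names are the ones listed above.

```python
from itertools import permutations

relations = ["A", "B", "C", "D"]
join_algorithms = ["HashJoin", "NestedLoopJoin", "SortMergeJoin"]
scan_methods = ["IndexScan", "TableScan"]
tree_shapes = 3  # the three trees drawn on the slide (assumption of this sketch)

join_orders = len(list(permutations(relations)))  # 4! = 24
num_joins = len(relations) - 1                    # 3 binary joins for 4 tables
total = (join_orders * tree_shapes
         * len(scan_methods) ** len(relations)    # 2^4 access-path choices
         * len(join_algorithms) ** num_joins)     # 3^3 join-algorithm choices
print(join_orders, total)  # 24 31104
```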
Which one is better?
● Given a physical plan, we can estimate its total cost
● The cost of an operator depends on the number of its input rows
● Selectivity factors estimate what fraction of rows each predicate keeps
SELECT *
FROM Reviews
WHERE 7/1< date < 7/31 AND
rating > 9
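A minimal sketch of how selectivity factors turn into a row estimate, assuming the two predicates are independent (the classic System R simplification, often wrong for correlated columns). The table size and both selectivities are hypothetical numbers chosen for illustration, not statistics from the slides.

```python
from typing import Sequence


def estimate_rows(table_rows: int, selectivities: Sequence[float]) -> float:
    """Estimate the output rows of a conjunctive filter, assuming independent predicates."""
    rows = float(table_rows)
    for s in selectivities:
        rows *= s  # each predicate keeps roughly a fraction s of its input
    return rows


# Hypothetical statistics for the Reviews example:
reviews_rows = 1_000_000
sel_date = 31 / 365  # assume dates spread uniformly over one year
sel_rating = 0.1     # assume integer ratings 1..10 spread uniformly, so rating > 9 keeps 1/10
print(estimate_rows(reviews_rows, [sel_date, sel_rating]))
```

The estimated row count then feeds the cost function of whatever operator consumes the filtered rows.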
Summary of Background
Good News
● We know how to construct the search space
Bad News
● It’s almost impossible to exhaust the search space
● We need an elegant & smart way to do the search
Dynamic Programming in Algorithms
Dynamic Programming
● You are climbing a staircase. It takes n steps to reach the top.
● Each time you can climb either 1 or 2 steps.
● In how many distinct ways can you climb to the top?
n (steps):  0  1  2  3  4  5  6   7   8   9   10
ways(n):    1  1  2  3  5  8  13  21  34  55  89
The table is filled from left to right: ways(n) = ways(n−1) + ways(n−2).
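The table above is the bottom-up direction of DP. A minimal Python sketch that fills it from left to right:

```python
def climb_ways(n: int) -> int:
    """Number of distinct ways to climb n steps taking 1 or 2 steps at a time.

    Bottom-up DP: fill the table ways[0..n] from left to right.
    """
    ways = [0] * (n + 1)
    ways[0] = 1  # one way to be at the bottom: take no steps
    for i in range(1, n + 1):
        ways[i] = ways[i - 1]       # the last move was a 1-step
        if i >= 2:
            ways[i] += ways[i - 2]  # the last move was a 2-step
    return ways[n]


print([climb_ways(i) for i in range(11)])
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```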
It's fine to go in reverse, too: start from step 10, recursively ask for the answers at steps 9 and 8, and remember every answer once computed. The entries are then filled on demand and end up with the same values.
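The reverse (top-down) direction can be sketched as a recursive function whose results are memoized; here Python's standard `functools.lru_cache` plays the role of the memo table:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def climb_ways_memo(n: int) -> int:
    """Top-down DP (DFS with memo): ways(n) = ways(n - 1) + ways(n - 2)."""
    if n <= 1:
        return 1  # ways(0) = ways(1) = 1
    return climb_ways_memo(n - 1) + climb_ways_memo(n - 2)


print(climb_ways_memo(10))  # 89
```

Without the cache, the same recursion would recompute the same sub-problems exponentially many times.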
Defining Dynamic Programming (DP)
● DP solves a problem by solving its sub-problems
● DP is only applicable to problems with optimal substructure
○ The optimal solution of the current problem can be computed from the optimal solutions of its sub-problems
● DP can be done in both directions
○ Bottom-up: filling a table
○ Top-down: DFS with memoization
DP in Searching
● Find the minimum path sum from top to bottom
● At each step you may move to an adjacent number on the row below
   2
  3 4
 6 5 7
4 1 8 3
Filling the table from the bottom row upward, each entry becomes its own value plus the smaller of its two neighbors below:
● Row 3: 6+min(4,1) = 7, 5+min(1,8) = 6, 7+min(8,3) = 10
● Row 2: 3+min(7,6) = 9, 4+min(6,10) = 10
● Row 1: 2+min(9,10) = 11
The minimum path sum is 11 (2 → 3 → 5 → 1).
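The bottom-up computation above fits in a few lines; this sketch keeps only one row of the DP table at a time:

```python
from typing import List


def min_path_sum(triangle: List[List[int]]) -> int:
    """Bottom-up DP: best[j] is the minimum path sum from column j of the current row down."""
    best = list(triangle[-1])            # the bottom row is its own best path
    for row in reversed(triangle[:-1]):  # walk upward one row at a time
        best = [value + min(best[j], best[j + 1]) for j, value in enumerate(row)]
    return best[0]


triangle = [[2], [3, 4], [6, 5, 7], [4, 1, 8, 3]]
print(min_path_sum(triangle))  # 11
```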
Dynamic Programming
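Apply DP in Optimization?
SELECT * FROM A, B WHERE A.bid = B.bid ORDER BY A.bid
● Logical plan: Sort over Join(A, B)
● Physical plan 1: Sort over HashJoin(Scan A, Scan B)
● Physical plan 2: SortMergeJoin over Sort(Scan A) and Scan B. Scan A comes out ordered by aid and is sorted by bid; Scan B is already ordered by bid; the join output is ordered by bid, so no final Sort is needed. Optimal Plan!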
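Now look at the sub-problem [AB] on its own: its optimal plan (marked "Optimal Plan of [AB]") is not part of the overall optimal plan, because the cheapest way to join A and B does not deliver the order by bid that the ORDER BY needs.
You cannot just apply DP straightforwardly.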
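A tiny numeric sketch of why naive DP fails here. The costs are hypothetical (the slides give none): suppose the HashJoin plan for [AB] costs 100 but leaves rows unordered, the SortMergeJoin plan costs 130 and delivers order by bid, and a final Sort costs 50. Keeping only the cheapest plan of [AB] gives 150, while keeping one best plan per order, as System R does with interesting orders, finds 130:

```python
# Hypothetical candidate plans for the sub-problem [AB]: (name, cost, output order)
AB_PLANS = [
    ("HashJoin(Scan A, Scan B)", 100, None),
    ("SortMergeJoin(Sort(Scan A), Scan B)", 130, "bid"),
]
SORT_COST = 50  # assumed cost of sorting the join result on bid


def total_cost(ab_cost, ab_order, required="bid"):
    # Put a Sort enforcer on top only if the sub-plan does not already deliver the order.
    return ab_cost + (0 if ab_order == required else SORT_COST)


# Naive DP: keep only the single cheapest plan for [AB], then finish it.
_, cost, order = min(AB_PLANS, key=lambda p: p[1])
naive = total_cost(cost, order)

# System R style: keep the best plan per (relation set, order), then finish each one.
best_per_order = {}
for name, cost, order in AB_PLANS:
    if order not in best_per_order or cost < best_per_order[order][0]:
        best_per_order[order] = (cost, name)
with_orders = min(total_cost(c, o) for o, (c, _) in best_per_order.items())

print(naive, with_orders)  # 150 130
```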
System-R Optimizer
● Dynamic Programming
● Interesting Orders
The main contribution: it defines an optimal substructure, so that DP becomes feasible.
RelSet[ABCD] covers all 24 join orders:
ABCD, ABDC, ACBD, ACDB,
ADBC, ADCB, BACD, BADC,
BCAD, BCDA, BDAC, BDCA,
CABD, CADB, CBAD, CBDA,
CDAB, CDBA, DABC, DACB,
DBAC, DBCA, DCAB, DCBA
Access Path Selection in a Relational Database Management System (SIGMOD 1979)
Within RelSet[ABCD], plans are further grouped by interesting order (a physical property): SortBy[A] ASC, SortBy[A] DESC, SortBy[B] ASC, …
Optimal Substructure
● Based on the assumption that the cost function is polynomial
● Stores the best plan for each pair of (relation set, physical properties)
● Instead of O(n!) plans, only O(n·2^(n−1)) plans need to be enumerated
[Figure: the goal RelSet[ABCD] and its sub-problems such as RelSet[ABC] and RelSet[BCD], each holding one best plan per order (Order1, Order2, Order3)]
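A sketch of System R style bottom-up enumeration under simplifying assumptions: left-deep trees only, made-up cardinalities, and a cost equal to the sum of intermediate result sizes. Interesting orders are left out to keep it short; supporting them means keying `best` by (relation set, order) rather than by relation set alone. The `evaluated` counter illustrates the O(n·2^(n−1)) bound, since every relation set S is built from each S − {r}: for 4 relations that is 32 steps, slightly more than the 24 orders, but for 10 relations it is 5,120 against 3,628,800.

```python
from itertools import combinations

# Hypothetical statistics (not from the slides).
ROWS = {"A": 1000, "B": 100, "C": 10, "D": 500}
SELECTIVITY = 0.01  # assumed selectivity of every join predicate


def card(rels):
    """Estimated row count of joining every relation in rels."""
    n = 1.0
    for r in rels:
        n *= ROWS[r]
    return n * SELECTIVITY ** (len(rels) - 1)


def system_r_left_deep(relations):
    """Bottom-up DP over relation sets, left-deep join trees only.

    best[S] holds the cheapest (cost, plan) joining set S, where cost is the sum of
    intermediate result sizes (an assumed cost model).
    Also returns how many (set, last relation) pairs were evaluated.
    """
    best, evaluated = {}, 0
    for r in relations:  # sets of size 1: a plain scan, no join cost
        best[frozenset([r])] = (0.0, r)
        evaluated += 1
    for size in range(2, len(relations) + 1):
        for subset in combinations(relations, size):
            s = frozenset(subset)
            for last in subset:  # extend the best plan of S - {last} by joining `last`
                evaluated += 1
                prev_cost, prev_plan = best[s - {last}]
                cost = prev_cost + card(s)
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, f"({prev_plan} ⨝ {last})")
    return best[frozenset(relations)], evaluated


(cost, plan), evaluated = system_r_left_deep(["A", "B", "C", "D"])
print(plan, cost)
print(evaluated)  # 32 = 4 * 2**3
```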
Volcano/Cascades Optimizer (1993)
● Implemented as a code generator (operators, rules, etc.) plus a dynamic-link library (the search engine)
● Top-down search (directed search)
○ Start with the final outcome that you want
○ The search path can be guided by heuristics
● By contrast, System R's approach is bottom-up
Goetz Graefe
● Volcano - An Extensible and Parallel Query Evaluation System (1990)
● The Volcano Optimizer Generator: Extensibility and Efficient Search (1991)
● The Cascades Framework for Query Optimization (1995)
Components
Operators
● logical operators
● algorithms
● enforcers
Rules
● transformation rules
● implementation rules
Properties
● logical properties
● physical properties
Interfaces of Operators
● property function
● applicability function (physical-only)
● cost function (physical-only)
Operators
● logical operators
○ e.g. Join, Scan
● algorithms
○ e.g. HashJoin, SortMergeJoin, FileScan, IndexScan
● enforcers
○ e.g. Sort, Shuffle
Rules
● transformation rules
○ The algebraic rules of expression equivalence
○ e.g. associativity rule, commutativity rule
● implementation rules
○ Rules mapping logical operators to algorithms
○ It is possible to map multiple logical operators to a single physical operator
● Specify how to match rules to the plan tree
○ Simple pattern matching
○ Additional condition code is also allowed
Properties
● logical properties
○ Can be derived from the logical algebra expression
○ Attached to the logical equivalence set: [LogExpr]
○ e.g. schema, expected size
● physical properties
○ Depend on algorithms
○ Attached to the physical equivalence set: [LogExpr, PhyProp], where PhyProp is a physical property vector
○ e.g. sort order, partitioning
Interfaces of Operators
● applicability function (only applicable to algorithms & enforcers)
○ Which physical property vectors it can deliver
○ Which physical property vectors its inputs must satisfy
● cost function (only applicable to algorithms & enforcers)
○ Estimates its cost
○ Cost is an abstract data type in Volcano, e.g. (CPU cost, IO cost)
● property function
○ Determines logical properties, e.g. schema, row count
■ selectivity estimate
○ Determines physical properties, e.g. sort order
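To make these interfaces concrete, here is a sketch in Python. The class and method names (`applicability`, `cost`, `physical_property`) are hypothetical, not Volcano's actual generated interface; a physical property vector is reduced to a single optional sort column, and the cost formulas and CPU/IO weighting are made up.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

# Physical property vector, reduced here to one optional sort column (an assumption).
SortOrder = Optional[str]


@dataclass(frozen=True)
class Cost:
    """Cost as an abstract data type: a (CPU, IO) pair."""
    cpu: float
    io: float

    def __add__(self, other: "Cost") -> "Cost":
        return Cost(self.cpu + other.cpu, self.io + other.io)

    def __lt__(self, other: "Cost") -> bool:
        # Hypothetical weighting: one IO unit counts as much as 10 CPU units.
        return self.cpu + 10 * self.io < other.cpu + 10 * other.io


class SortMergeJoin:
    """Algorithm: needs both inputs sorted on the join key and delivers that order."""

    def __init__(self, key: str):
        self.key = key

    def applicability(self, required: SortOrder) -> Optional[Tuple[SortOrder, SortOrder]]:
        """Input orders needed to deliver `required`, or None if it cannot deliver it."""
        if required is None or required == self.key:
            return (self.key, self.key)
        return None

    def cost(self, left_rows: float, right_rows: float) -> Cost:
        """One merge pass over both inputs (assumed cost model, no extra IO)."""
        return Cost(cpu=left_rows + right_rows, io=0.0)

    def physical_property(self) -> SortOrder:
        """Property function (physical part): the output is sorted on the key."""
        return self.key


class Sort:
    """Enforcer: turns an unsorted input into any requested order."""

    def applicability(self, required: SortOrder) -> Optional[Tuple[SortOrder]]:
        # Only worth applying when an order is requested; its input may be unsorted.
        return (None,) if required is not None else None

    def cost(self, input_rows: float) -> Cost:
        n = max(input_rows, 2.0)
        return Cost(cpu=n * math.log2(n), io=0.0)


smj = SortMergeJoin("bid")
print(smj.applicability("bid"))     # ('bid', 'bid')
print(smj.applicability("aid"))     # None
print(Sort().applicability("aid"))  # (None,)
```

Refusing to apply Sort when no order is required keeps an enforcer from calling the search on its own goal again.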
Search Engine
Define the goal as [LogExpr, PhysProp]
Logically, the search procedure can be divided into two stages:
1. Explore: apply transformation rules to explore the expression space
2. Build: apply implementation rules to build physical plans and find the best one
Explore
● Apply transformation rules to explore the expression space
● e.g. [ABC] = { (A⨝B)⨝C, (B⨝A)⨝C, (A⨝C)⨝B, … }
[Figure: starting from Goal.LogExpr = (A⨝B)⨝C, the generated logical plans include (B⨝A)⨝C and other join trees over A, B, C ····]
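The exploration stage can be illustrated by a brute-force fixpoint: fire join commutativity and associativity at every node until no new expression appears. This is a hypothetical sketch, not how Cascades stores the space (it uses a memo of groups so that shared sub-expressions are explored only once); it only shows that, starting from (A⨝B)⨝C, these two rules generate all 12 ordered join trees over A, B, C.

```python
def commute(expr):
    """Join commutativity: X ⨝ Y  =>  Y ⨝ X."""
    if isinstance(expr, tuple):
        left, right = expr
        yield (right, left)


def associate(expr):
    """Join associativity: (X ⨝ Y) ⨝ Z  =>  X ⨝ (Y ⨝ Z)."""
    if isinstance(expr, tuple) and isinstance(expr[0], tuple):
        (x, y), z = expr
        yield (x, (y, z))


RULES = [commute, associate]


def rewrites(expr):
    """Every expression obtained by firing one rule at any node of expr."""
    for rule in RULES:
        yield from rule(expr)
    if isinstance(expr, tuple):
        left, right = expr
        for new_left in rewrites(left):
            yield (new_left, right)
        for new_right in rewrites(right):
            yield (left, new_right)


def explore(goal):
    """Fire transformation rules until no new logical expression appears."""
    seen = {goal}
    worklist = [goal]
    while worklist:
        for new_expr in rewrites(worklist.pop()):
            if new_expr not in seen:
                seen.add(new_expr)
                worklist.append(new_expr)
    return seen


def show(expr):
    return expr if isinstance(expr, str) else f"({show(expr[0])}⨝{show(expr[1])})"


space = explore((("A", "B"), "C"))  # Goal.LogExpr = (A⨝B)⨝C
print(len(space))                   # 12
for e in sorted(space, key=show):
    print(show(e))
```

The `seen` set is what stops A⨝B ⇒ B⨝A ⇒ A⨝B from looping forever.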
Build
● Apply implementation rules to build physical plans
● For every [LogExpr, PhyProp], record the best physical plan in the Memo table
● e.g. [AB]⨝C ➡ SortMergeJoin vs. HashJoin
Memo Table:
LogExpr | PhyProp | BestPlan
[ABC]   | -       |
[ABC]   | x⬆      |
[ABC]   | x⬇      |
[AB]    | -       |
…       | …       |
[Figure: candidate physical plans for [ABC], each with Total Cost = ?: HashJoin over [AB] and Scan(C); SMJ over Scan(C) and [AB] plus a Sort; SMJ over Scan(C) and [AB] required as x⬆]
Some Facts
● Volcano does Explore, then Build
● Cascades has only one stage, interleaving the two
In practice, exploring still mostly happens before building, even in Cascades. Why?
Example
Logical Expression Space:
[ABC]
[AB], [AC], [BC]
[A], [B], [C]
Our Mission:
FindBestPlan((A⨝B)⨝C, A.x, 500)
(arguments: logical expression, required order, cost limit)
FindBestPlan(LogExpr, PhysProp)
If Memo[LogExpr, PhysProp] is not empty:
● return BestPlan or Failure
Possible moves =
● applicable transformations
● algorithms that deliver the required PhysProp
● enforcers for the required PhysProp
ForEach Move (pop the most promising move first)
● is transformation: Cost = FindBestPlan(transformed LogExpr, PhysProp)
● is algorithm: Cost = Cost_self + Sum(Cost_input), each input searched via FindBestPlan with the PhysProp it requires
● is enforcer: Cost = Cost_self + Cost_input, the input searched via FindBestPlan with the relaxed PhysProp
Memo[LogExpr, PhysProp] = Best Plan
return Best Plan
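A compact, runnable sketch of FindBestPlan under heavy assumptions: the statistics, the selectivity, the cost formulas and the operator set (TableScan, IndexScan, HashJoin, NestedLoopJoin, Sort) are all hypothetical; instead of firing transformation rules it enumerates every (outer, inner) split of a relation set directly, which covers the same join space as commutativity plus associativity; and a physical property is reduced to one optional sort column. It does show the memo keyed by (LogExpr, PhysProp), an enforcer, remembered failures, and branch-and-bound pruning with a cost limit.

```python
import math
from itertools import combinations

# Hypothetical statistics for the running example [ABC] (not from the slides).
ROWS = {"A": 1000, "B": 100, "C": 10}
SELECTIVITY = 0.01          # assumed selectivity of every join predicate
INDEX_ORDER = {"A": "A.x"}  # assume an index scan of A delivers order on A.x

# Memo: (relation set, required order) -> ("ok", cost, plan) or ("fail", limit)
memo = {}


def rows(rels):
    """Logical property: estimated row count, independent of the chosen plan."""
    n = 1.0
    for r in rels:
        n *= ROWS[r]
    return n * SELECTIVITY ** (len(rels) - 1)


def moves(rels, order):
    """Yield (local cost, [(input rels, input order), ...], operator name)."""
    if order is not None and order.split(".")[0] not in rels:
        return  # this group cannot be sorted on a column it does not contain
    if len(rels) == 1:
        (r,) = rels
        if order is None:
            yield ROWS[r], [], f"TableScan({r})"
        if r in INDEX_ORDER and order in (None, INDEX_ORDER[r]):
            yield 2 * ROWS[r], [], f"IndexScan({r})"
    else:
        items = sorted(rels)
        for k in range(1, len(items)):
            for outer in map(frozenset, combinations(items, k)):
                inner = rels - outer
                if order is None:
                    yield (rows(outer) + rows(inner) + rows(rels),
                           [(outer, None), (inner, None)], "HashJoin")
                # A nested-loop join keeps the order of its outer input.
                yield rows(outer) * rows(inner), [(outer, order), (inner, None)], "NestedLoopJoin"
    if order is not None:  # enforcer; never offered for "no order", which would loop
        n = rows(rels)
        yield n * math.log2(max(n, 2.0)), [(rels, None)], f"Sort[{order}]"


def find_best(rels, order=None, limit=math.inf):
    """Cheapest (cost, plan) for (rels, order) with cost < limit, or None (a failure)."""
    entry = memo.get((rels, order))
    if entry is not None:
        if entry[0] == "ok":
            return (entry[1], entry[2]) if entry[1] < limit else None
        if entry[1] >= limit:
            return None  # already failed under an equal or looser limit
    best, bound = None, limit
    for local, inputs, name in moves(rels, order):
        total, children = local, []
        for child_rels, child_order in inputs:
            if total >= bound:
                break  # branch-and-bound: this move can no longer win
            sub = find_best(child_rels, child_order, bound - total)
            if sub is None:
                total = math.inf
                break
            total += sub[0]
            children.append(sub[1])
        if total < bound:
            best, bound = (total, (name, children)), total
    memo[(rels, order)] = ("ok", *best) if best else ("fail", limit)
    return best


cost, plan = find_best(frozenset("ABC"), "A.x")
print(round(cost, 1), plan[0])  # 3004.4 Sort[A.x]
```

With these made-up numbers, sorting the cheapest unordered plan beats every order-preserving nested-loop plan, and the slide's limit of 500 yields a failure, since no plan costs less. Because every input here is either a strictly smaller relation set or the same set with no order required, the sketch needs no in-progress marking; a rule-driven engine that rewrites A⨝B into B⨝A inside one group would.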
Some Details
● Use a cost limit for branch-and-bound pruning
○ Unlimited by default
● Mark (LogExpr, PhysProp) as in progress to prevent infinite loops
○ e.g. A JOIN B <=> B JOIN A
● Use a priority queue for heuristic ordering of moves
○ Calcite prioritizes RelSets with smaller depth and higher cost
Summary
Volcano/Cascades Optimizer …
● uses rules to build all logical and physical plans
● uses cost to evaluate a physical plan
● uses dynamic programming to search for the optimal physical plan
Compared with RBO
Here are my personal opinions …
● Cost-based: can find better physical plans
● Rule-independent: provides an elegant interface for DB implementors
● Still heuristic: may perform badly in some corner cases
Thanks!
Q&A

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 

Recently uploaded (20)

Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 

The Volcano/Cascades Optimizer

  • 2. Outline ● Background ● Dynamic Programming ● Components ● Search Engine ● Summary
  • 3. Life of SQL (diagram: SQL → Parser → Syntax Tree → Optimizer, which turns it into a Logical Plan and, using statistics, into a Physical Plan → Executor → data) ● Parser ● Optimizer ● Executor
  • 4. Query Optimization Strategies ● Choice #1: Heuristics ○ INGRES, Oracle (until the mid-1990s) ● Choice #2: Heuristics + Cost-based Join Search ○ System R, early IBM DB2, most open-source DBMSs ● Choice #3: Randomized Search ○ Academics in the 1980s, current Postgres ● Choice #4: Stratified Search ○ IBM’s STARBURST (late 1980s), now IBM DB2 + Oracle ● Choice #5: Unified Search ○ Volcano/Cascades in the 1990s, now MSSQL + Greenplum
  • 5. Problem ● Why is query optimization such a hard problem? ● It’s not difficult to find a feasible solution ● It’s almost impossible to find an optimal solution
  • 6. Why So Many Choices? ● Equivalence rules ● Various implementations (diagram: three differently shaped join trees over A, B, C and D, such as the left-deep ((A⨝B)⨝C)⨝D) ● All 24 join orders: ABCD, ABDC, ACBD, ACDB, ADBC, ADCB, BACD, BADC, BCAD, BCDA, BDAC, BDCA, CABD, CADB, CBAD, CBDA, CDAB, CDBA, DABC, DACB, DBAC, DBCA, DCAB, DCBA
  • 7. Why So Many Choices? ● Equivalence rules ● Various implementations ○ Joins: HashJoin, NestedLoopJoin, SortMergeJoin ○ Scans: IndexScan, TableScan (diagram: one join tree over A, B, C and D) ● In total: 24 × 3 × 2^4 × 3^3 = 31104!
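The total can be reproduced factor by factor. The sketch below reads the factors as 4! join orders, 3 tree shapes (the ones drawn on the previous slide), a choice of 2 scan methods for each of the 4 tables, and a choice of 3 join algorithms for each of the 3 joins; that reading is an interpretation of the slide rather than something it states.

```python
from math import factorial

join_orders = factorial(4)   # permutations of A, B, C, D
tree_shapes = 3              # shapes drawn on the previous slide (interpretation)
scan_choices = 2 ** 4        # TableScan or IndexScan for each of the 4 tables
join_choices = 3 ** 3        # HashJoin, NestedLoopJoin or SortMergeJoin for each of the 3 joins

total = join_orders * tree_shapes * scan_choices * join_choices
print(total)  # 31104
```

Every extra table multiplies all four factors at once, which is why exhaustive enumeration stops being an option so quickly.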
  • 8. Which One Is Better? ● Given a physical plan, we can estimate its total cost ● The cost of an operator depends on the number of rows it receives as input ● Selectivity factors estimate the fraction of rows each predicate keeps, e.g. for SELECT * FROM Reviews WHERE date > 7/1 AND date < 7/31 AND rating > 9
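One common textbook way to turn selectivity factors into a row estimate, assumed here rather than taken from the slides, is to treat values as uniformly distributed and predicates as independent, so that the selectivities multiply. The table size, the one-year date span and the 1 to 10 rating scale below are made-up numbers.

```python
# Hypothetical statistics for the Reviews table (made-up numbers for illustration).
total_rows = 1_000_000
days_covered = 365              # assume reviews are spread uniformly over one year
rating_values = range(1, 11)    # assume integer ratings 1..10, uniformly distributed

# 7/1 < date < 7/31 keeps the 29 days from 7/2 through 7/30.
sel_date = 29 / days_covered
# rating > 9 keeps only rating 10.
sel_rating = sum(1 for r in rating_values if r > 9) / len(rating_values)

# Independence assumption: the combined selectivity is the product.
estimated_rows = total_rows * sel_date * sel_rating
print(round(estimated_rows))  # 7945
```

The estimate feeds the cost of every operator above the filter, so an error here (for example, correlated date and rating) propagates through the whole plan.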
  • 9. Summary of Background ● Good news: we know how to construct the search space ● Bad news: it’s almost impossible to exhaust the search space, so we need an elegant and smart way to do the search
  • 11. Dynamic Programming ● You are climbing a staircase. It takes n steps to reach the top. ● Each time you can climb either 1 or 2 steps. ● In how many distinct ways can you climb to the top?
  • 12-15. Dynamic Programming (bottom-up) ● Fill a table for steps 0 to 10, starting from ways(0) = ways(1) = 1; every later entry is the sum of the two before it, because the last move was either 1 or 2 steps ● Steps: 0 1 2 3 4 5 6 7 8 9 10 ● Ways: 1 1 2 3 5 8 13 21 34 55 89
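The table-filling direction can be sketched in a few lines; the function name `climb_ways` is only illustrative.

```python
def climb_ways(n):
    """Ways to climb 0..n steps taking 1 or 2 steps at a time (bottom-up table)."""
    ways = [1] * (n + 1)                       # ways[0] = ways[1] = 1
    for i in range(2, n + 1):
        ways[i] = ways[i - 1] + ways[i - 2]    # last move was 1 step or 2 steps
    return ways

print(climb_ways(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

Each entry is computed once from entries that are already final, which is what makes a single left-to-right pass enough.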
  • 16-23. Dynamic Programming (top-down) ● It’s fine to go in reverse: start from the unknown ways(10) and recursively ask for ways(9), ways(8), … down to the known base cases ways(0) = ways(1) = 1 ● On the way back, the entries fill in as 2, 3, 5, 8, 13, 21, 34, 55 and finally ways(10) = 89
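The reverse direction is a depth-first recursion plus a memo. In this sketch Python's `functools.lru_cache` plays the role of the memo, so each `ways(i)` is computed only once even though the recursion asks for it twice.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def ways(i):
    """Ways to climb i steps, computed top-down; lru_cache acts as the memo table."""
    if i <= 1:
        return 1                     # base cases: ways(0) = ways(1) = 1
    return ways(i - 1) + ways(i - 2)

print(ways(10))  # 89
```

Without the cache the same recursion takes exponential time; with it, the work matches the bottom-up table.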
  • 24. Defining Dynamic Programming (DP) ● DP solves a problem by combining the solutions of its sub-problems ● DP is only applicable to problems with optimal substructure ○ The optimal solution of the current problem can be computed from the optimal solutions of its sub-problems ● DP can be done in either direction ○ Bottom-up: filling a table ○ Top-down: DFS with a memo
  • 25-28. DP in Searching ● Find the minimum path sum from top to bottom of the triangle 2 / 3 4 / 6 5 7 / 4 1 8 3 ● Each step you may move to an adjacent number on the row below ● Working bottom-up, the bottom row 4 1 8 3 is the base case; the row above becomes 7 6 10 (e.g. 6 + min(4, 1) = 7), then 9 10, and finally 11 at the apex, so the minimum path sum is 11 (2 → 3 → 5 → 1)
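The bottom-up pass above can be sketched as follows; the triangle is the one from the slides and `min_path_sum` is an illustrative name.

```python
def min_path_sum(triangle):
    """Minimum top-to-bottom path sum, filling rows from the bottom up."""
    best = list(triangle[-1])                  # base case: the bottom row itself
    for row in reversed(triangle[:-1]):
        # each cell adds the cheaper of its two adjacent cells in the row below
        best = [v + min(best[j], best[j + 1]) for j, v in enumerate(row)]
    return best[0]

triangle = [[2], [3, 4], [6, 5, 7], [4, 1, 8, 3]]
print(min_path_sum(triangle))  # 11
```

The intermediate `best` rows are exactly the 7 6 10 and 9 10 values on the slides; each cell keeps only the cheapest continuation below it, which is the optimal-substructure step.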
  • 29. DP in Searching (top-down) ● The same search can start from the apex, whose minimum path sum is unknown (?) until the recursion reaches the bottom row 4 1 8 3
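  • 31. Apply DP in Optimization? ● Query: SELECT * FROM A, B WHERE A.bid = B.bid ORDER BY A.bid ● Logical plan: Sort(Join(A, B)) ● Candidate physical plans: Sort(HashJoin(Scan A, Scan B)), or SortMergeJoin(Sort(Scan A), Scan B) ● Scan A is ordered by aid, Scan B by bid, and the merge join’s output by bid ● The merge-join plan is marked as the optimal plan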
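  • 32. Apply DP in Optimization? ● Same candidates: Sort(HashJoin(Scan A, Scan B)) and SortMergeJoin(Sort(Scan A), Scan B) ● The plan marked “Optimal Plan of [AB]” is the cheapest way to join A and B on its own, but it ignores the sort order the parent needs, so it need not be part of the overall optimal plan ● You cannot just apply DP straightforwardly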
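The failure can be reproduced with made-up costs. If the memo keeps only one best plan per relation set, it keeps the cheaper HashJoin for [AB] and then pays for a Sort on top; keeping one best plan per (relation set, sort order), in the spirit of System R’s interesting orders, preserves the merge-join plan. All costs below are hypothetical.

```python
# Hypothetical costs for the two ways to join A and B.
HASH_JOIN_COST = 100   # output unordered
MERGE_JOIN_COST = 120  # includes sorting Scan A; output ordered by bid
SORT_COST = 50         # cost of sorting the join result by bid

candidates = [
    ("HashJoin(Scan A, Scan B)", HASH_JOIN_COST, None),
    ("SortMergeJoin(Sort(Scan A), Scan B)", MERGE_JOIN_COST, "bid"),
]

def finish(plan, cost, order):
    """Add a Sort on bid unless the plan already delivers that order."""
    if order == "bid":
        return plan, cost
    return f"Sort({plan})", cost + SORT_COST

# Memo keyed on the relation set only: keep the single cheapest plan for [AB].
plan, cost, order = min(candidates, key=lambda c: c[1])
naive = finish(plan, cost, order)

# Memo keyed on (relation set, order): keep the cheapest plan per order.
memo = {}
for plan, cost, order in candidates:
    if order not in memo or cost < memo[order][1]:
        memo[order] = (plan, cost)
best = min((finish(p, c, o) for o, (p, c) in memo.items()), key=lambda pc: pc[1])

print(naive)  # ('Sort(HashJoin(Scan A, Scan B))', 150)
print(best)   # ('SortMergeJoin(Sort(Scan A), Scan B)', 120)
```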
  • 33. System-R Optimizer ● Dynamic Programming ● Interesting Orders ● The main contribution: an optimal substructure is defined so that DP becomes feasible (diagram: all 24 join orders of A, B, C and D, ABCD through DCBA, belong to one equivalence set RelSet[ABCD]) ● Access Path Selection in a Relational Database Management System (SIGMOD 1979)
  • 34. System-R Optimizer ● Dynamic Programming ● Interesting Orders ● The main contribution: an optimal substructure is defined so that DP becomes feasible (diagram: RelSet[ABCD] is further divided by sort order, e.g. SortBy[A] ASC, SortBy[A] DESC, SortBy[B] ASC, …)
  • 35. Optimal Substructures ● Based on the assumption that the cost function is polynomial ● Stores the best plan for each pair of (relation set, physical properties) ● Instead of O(n!) plans, only O(n·2^(n-1)) plans need to be enumerated (diagram: the goal RelSet[ABCD], with one entry per order Order1, Order2, Order3, is built from RelSet[ABC] and RelSet[BCD], each likewise keyed by Order1, Order2, Order3)
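A minimal sketch of the bottom-up, System-R-style DP over relation sets, restricted to left-deep trees and ignoring physical properties for brevity. The cardinalities, the single join selectivity and the cost model (sum of intermediate result sizes) are all hypothetical. The `steps` counter tracks the (set, last relation) pairs examined, which is n·2^(n-1): 32 for four relations. The saving only shows as n grows; for n = 10 it is 5,120 pairs versus 3,628,800 orders.

```python
from itertools import combinations
from math import prod

# Hypothetical statistics: base-table row counts and one selectivity for every join.
CARD = {"A": 1000, "B": 200, "C": 5000, "D": 50}
SEL = 0.01

def size(rels):
    """Estimated rows produced by joining a set of relations (toy independence model)."""
    return prod(CARD[r] for r in rels) * SEL ** (len(rels) - 1)

def best_left_deep(relations):
    """Cheapest left-deep join order via DP over subsets; returns ((cost, plan), steps)."""
    best = {}    # frozenset of relations -> (cost, plan string)
    steps = 0    # (set, last relation) pairs examined
    for r in relations:
        best[frozenset([r])] = (0.0, r)   # a single scan: no join cost in this toy model
        steps += 1
    for k in range(2, len(relations) + 1):
        for combo in combinations(relations, k):
            s = frozenset(combo)
            for last in combo:                            # which relation is joined last
                steps += 1
                sub_cost, sub_plan = best[s - {last}]     # optimal substructure
                cost = sub_cost + size(s)                 # toy cost: sum of intermediate sizes
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, f"({sub_plan} ⨝ {last})")
    return best[frozenset(relations)], steps

(cost, plan), steps = best_left_deep(["A", "B", "C", "D"])
print(plan, cost, steps)
```

Because every set of size k-1 is finished before any set of size k is examined, `best[s - {last}]` is always already optimal when it is read.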
  • 36. Volcano/Cascades Optimizer (1993) ● Implemented as a code generator (operators, rules, etc.) plus a dynamic-link library (the search engine) ● Top-down search (directed search) ○ Start with the final outcome that you want ○ The search path can be guided by heuristics ● By contrast, System-R’s approach is bottom-up
  • 37. Goetz Graefe ● Volcano: An Extensible and Parallel Query Evaluation System (1990) ● The Volcano Optimizer Generator: Extensibility and Efficient Search (1991) ● The Cascades Framework for Query Optimization (1995)
  • 38. Components ● Operators: logical operators, algorithms, enforcers ● Rules: transformation rules, implementation rules ● Properties: logical properties, physical properties ● Interfaces of operators: property function, applicability function (physical-only), cost function (physical-only)
  • 39. Operators ● logical operators ○ e.g. Join, Scan ● algorithms ○ e.g. HashJoin, SortMergeJoin, FileScan, IndexScan ● enforcers ○ e.g. Sort, Shuffle
  • 40. Rules ● transformation rules ○ The algebraic rules of expression equivalence ○ e.g. the associativity rule, the commutativity rule ● implementation rules ○ Rules that map logical operators to algorithms ○ It is possible to map multiple logical operators to a single physical operator ● Each rule specifies how it matches the plan tree ○ Simple pattern matching ○ Arbitrary condition code is also allowed
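A minimal sketch of both kinds of rules as pattern-matching functions over a tiny plan tree. The `Scan`/`Join` classes and the rule function names are hypothetical and are not Volcano’s or Calcite’s API: each transformation rule returns equivalent logical expressions, while the implementation rule maps one logical Join to several algorithms.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Join:
    left: "Expr"
    right: "Expr"

Expr = Union[Scan, Join]

def join_commute(e):
    """Transformation rule: A ⨝ B  =>  B ⨝ A."""
    if isinstance(e, Join):
        return [Join(e.right, e.left)]
    return []

def join_associate(e):
    """Transformation rule: (A ⨝ B) ⨝ C  =>  A ⨝ (B ⨝ C); the pattern needs a Join on the left."""
    if isinstance(e, Join) and isinstance(e.left, Join):
        return [Join(e.left.left, Join(e.left.right, e.right))]
    return []

def implement_join(e):
    """Implementation rule: one logical Join maps to several algorithms."""
    if isinstance(e, Join):
        return [(alg, e.left, e.right) for alg in ("HashJoin", "NestedLoopJoin", "SortMergeJoin")]
    return []

expr = Join(Join(Scan("A"), Scan("B")), Scan("C"))
print(join_associate(expr))
print([alg for alg, _, _ in implement_join(expr)])
```

The `isinstance` checks are the "simple pattern matching" part; a rule with extra conditions would add arbitrary code before returning its rewrites.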
  • 41. Properties ● logical properties ○ Can be derived from the logical algebra expression ○ Attached to the logical equivalence set: [LogExpr] ○ e.g. schema, expected size ● physical properties ○ Depend on the algorithms ○ Attached to the physical equivalence set: [LogExpr, PhyProp] ○ e.g. sort order, partitioning ○ Described by a physical property vector
  • 42. Interfaces of Operators ● applicability function (algorithms and enforcers only) ○ The physical property vectors it can deliver ○ The physical property vectors its inputs must satisfy ● cost function (algorithms and enforcers only) ○ Estimates its cost ○ Cost is an abstract data type in Volcano, e.g. (CPU cost, IO cost) ● property function ○ Determines logical properties, e.g. schema and row count (via selectivity estimates) ○ Determines physical properties, e.g. sort order
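A sketch of one algorithm’s interface, assuming cost is a (CPU, IO) pair as in the slide’s example, compared by a weighted total, and assuming a physical property is just a sort column or None. The class and method names (`Cost`, `SortMergeJoin.applicable`, `cost`, `delivered`) and the IO weight are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Cost:
    """Cost as an abstract data type: a (CPU, IO) pair with addition and comparison."""
    cpu: float
    io: float

    def __add__(self, other: "Cost") -> "Cost":
        return Cost(self.cpu + other.cpu, self.io + other.io)

    def total(self, io_weight: float = 4.0) -> float:   # hypothetical IO weighting
        return self.cpu + io_weight * self.io

    def __lt__(self, other: "Cost") -> bool:
        return self.total() < other.total()

class SortMergeJoin:
    """Hypothetical algorithm exposing applicability, cost and property functions."""

    def applicable(self, required_order: Optional[str], join_key: str) -> Optional[Tuple[str, str]]:
        """Orders the two inputs must satisfy, or None if required_order cannot be delivered."""
        if required_order not in (None, join_key):
            return None
        return (join_key, join_key)            # both inputs sorted on the join key

    def cost(self, left_rows: int, right_rows: int) -> Cost:
        return Cost(cpu=left_rows + right_rows, io=0)   # one merge pass over each input

    def delivered(self, join_key: str) -> Optional[str]:
        return join_key                        # output stays sorted on the join key

smj = SortMergeJoin()
print(smj.applicable("bid", "bid"), smj.applicable("aid", "bid"))
print(smj.cost(1000, 200) + Cost(0, 10))
```

Keeping `Cost` opaque behind `+` and `<` is what lets the search engine add and compare costs without knowing what they measure.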
  • 44. Search Engine ● Define the goal as [LogExpr, PhysProp] ● Logically, the search procedure can be divided into 2 stages: 1. Explore: apply transformation rules to explore the expression space 2. Build: apply implementation rules to build physical plans and find the best one
  • 45. Explore ● Apply transformation rules to explore the expression space ● e.g. [ABC] = { (A⨝B)⨝C, (B⨝A)⨝C, (A⨝C)⨝B, … } (diagram: starting from Goal.LogExpr = (A⨝B)⨝C, the generated logical plans include (B⨝A)⨝C, A⨝(C⨝B), C⨝(A⨝B), …)
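A naive sketch of the Explore stage, assuming only the two rules named earlier (commutativity and associativity) applied at every node, with expressions as nested tuples. Unlike a real Volcano/Cascades memo, which stores groups of equivalent sub-expressions rather than whole trees, this enumerates every complete tree, which is why the memo matters: three relations already give 12 trees and four give 120.

```python
def rewrites(e):
    """All expressions reachable from e by one rule application at any node.

    e is a relation name (str) or a join written as a 2-tuple (left, right).
    """
    if isinstance(e, str):
        return set()
    left, right = e
    out = {(right, left)}                          # commutativity at this node
    if isinstance(left, tuple):                    # associativity: (X⨝Y)⨝Z => X⨝(Y⨝Z)
        out.add((left[0], (left[1], right)))
    out |= {(l, right) for l in rewrites(left)}    # apply a rule inside the left input
    out |= {(left, r) for r in rewrites(right)}    # apply a rule inside the right input
    return out

def explore(goal):
    """Naive exploration: closure of the goal under the transformation rules."""
    seen, frontier = {goal}, [goal]
    while frontier:
        for nxt in rewrites(frontier.pop()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

abc = explore((("A", "B"), "C"))
print(len(abc))  # 12
```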
  • 46. Build ● Apply implementation rules to build physical plans ● For every [LogExpr, PhyProp], record the best physical plan in the memo table ● e.g. [AB]⨝C ➡ SortMergeJoin vs. HashJoin (diagram: a memo table with columns LogExpr, PhyProp, BestPlan and rows ([ABC], -), ([ABC], x⬆), ([ABC], x⬇), ([AB], -), …; candidate plans HashJoin([AB], Scan(C)), SMJ(Scan(C), [AB]) and Sort x⬆ over SMJ(Scan(C), [AB]), each with Total Cost = ?)
  • 47. Some Facts ● Volcano does Explore, then Build ● Cascades has only one stage ● Yet even in Cascades, exploring almost always happens before building. Why?
  • 48. Example ● Logical expression space: [ABC]; [AB], [AC], [BC]; [A], [B], [C] ● Our mission: FindBestPlan((A⨝B)⨝C, A.x, 500), where the arguments are the logical expression, the required order and the cost limit
  • 57. FindBestPlan(LogExpr, PhysProp)
      If Memo[LogExpr, PhysProp] is not empty:
          return its BestPlan (or the recorded failure)
      Possible moves =
          applicable transformations
          + algorithms that deliver the required PhysProp
          + enforcers for the required PhysProp
      While moves remain, pop the most promising move:
          transformation: Cost = FindBestPlan(transformed LogExpr, PhysProp)
          algorithm: Cost = Cost_self + Sum(Cost_input), each input costed by FindBestPlan on that input and the PhysProp it must satisfy
          enforcer: Cost = Cost_self + Cost_input, the input being FindBestPlan(LogExpr, PhysProp without the enforced property)
      Memo[LogExpr, PhysProp] = BestPlan
      return BestPlan
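A runnable sketch of FindBestPlan under heavy simplifications, all of them hypothetical: a logical expression is the set of relations it joins (so every join order of that set lands in the same memo entry), the only physical property is "sorted on a shared join column x" or nothing, the algorithms are Scan, HashJoin and SortMergeJoin, the only enforcer is Sort, and enumerating the two-way splits of a set stands in for applying transformation rules. Row counts and costs are invented, and the cost limit and move ordering are left out.

```python
import math
from itertools import combinations

# Hypothetical catalog: row counts, tables already sorted on x, one join selectivity.
ROWS = {"A": 1000, "B": 200, "C": 5000}
SORTED_ON_X = {"C"}
SEL = 0.01

memo = {}   # (frozenset of relations, required order) -> (cost, plan)

def rows(group):
    return math.prod(ROWS[r] for r in group) * SEL ** (len(group) - 1)

def sort_cost(n):
    return n * math.log2(max(n, 2))

def splits(group):
    """Stand-in for transformation rules: every way to split the group into two join inputs."""
    members = sorted(group)
    for k in range(1, len(members)):
        for left in combinations(members, k):
            yield frozenset(left), group - frozenset(left)

def find_best_plan(group, order=None):
    """Return (cost, plan) for the logical group delivering `order` ('x' or None)."""
    key = (group, order)
    if key in memo:                                     # memo hit: reuse the best plan
        return memo[key]
    best = (math.inf, None)
    cheaper = lambda a, b: a if a[0] <= b[0] else b     # compare on cost only
    if len(group) == 1:
        (table,) = group
        if order is None or table in SORTED_ON_X:       # algorithm: Scan
            best = cheaper(best, (float(ROWS[table]), f"Scan({table})"))
    else:
        out = rows(group)
        for left, right in splits(group):
            if order is None:                           # algorithm: HashJoin, unordered output
                lc, lp = find_best_plan(left)
                rc, rp = find_best_plan(right)
                best = cheaper(best, (lc + rc + rows(left) + rows(right) + out,
                                      f"HashJoin({lp}, {rp})"))
            # algorithm: SortMergeJoin, needs both inputs sorted on x and delivers x
            lc, lp = find_best_plan(left, "x")
            rc, rp = find_best_plan(right, "x")
            best = cheaper(best, (lc + rc + rows(left) + rows(right) + out,
                                  f"SortMergeJoin({lp}, {rp})"))
    if order is not None:                               # enforcer: Sort over the unordered best plan
        ic, ip = find_best_plan(group, None)
        best = cheaper(best, (ic + sort_cost(rows(group)), f"Sort[{order}]({ip})"))
    memo[key] = best
    return best

print(find_best_plan(frozenset("ABC"), "x"))
```

Every recursive call either shrinks the relation set or drops the required order, so this toy search cannot loop; a rule set with true transformations (such as commutativity on the same group) is where the in-progress marker becomes necessary.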
  • 58. Some Details ● Use a cost limit to do branch-and-bound pruning ○ By default it is set to unlimited ● Mark (LogExpr, PhysProp) as in-progress to prevent infinite loops ○ e.g. A JOIN B <=> B JOIN A ● Use a priority queue to order moves heuristically ○ Calcite prioritizes RelSets with lower depth and higher cost
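The cost limit and the priority queue can be sketched together: moves sit in a heap ordered by a lower bound on their cost, the most promising move is popped first, and the search stops as soon as the next lower bound cannot beat the best cost found so far (initially the caller’s limit). The move names, lower bounds and exact costs are hypothetical, and the sketch assumes every lower bound really is no greater than the exact cost.

```python
import heapq
import math

def search(moves, limit=math.inf):
    """Branch-and-bound over candidate moves.

    moves: list of (lower_bound, name, exact_cost_fn), where exact_cost_fn is the
    expensive part (e.g. recursively optimizing the inputs).
    Returns (best_cost, best_name, names_actually_costed); best_name is None on failure.
    """
    heap = list(moves)
    heapq.heapify(heap)                     # most promising (lowest bound) first
    best_cost, best_name, costed = limit, None, []
    while heap:
        lb, name, fn = heapq.heappop(heap)
        if lb >= best_cost:                 # no remaining move can beat the bound
            break
        costed.append(name)
        cost = fn()
        if cost < best_cost:                # tighten the bound for later moves
            best_cost, best_name = cost, name
    return best_cost, best_name, costed

moves = [
    (120, "SortMergeJoin", lambda: 130),
    (90, "HashJoin", lambda: 150),
    (200, "NestedLoopJoin", lambda: 400),
]
print(search(moves, limit=500))  # (130, 'SortMergeJoin', ['HashJoin', 'SortMergeJoin'])
```

NestedLoopJoin is never costed: its lower bound of 200 already exceeds the 130 found by then. With `limit=100`, no move fits and `best_name` comes back as None, which a caller would record in the memo as a failure.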
  • 59. Summary ● The Volcano/Cascades optimizer … ● uses rules to build all logical and physical plans ● uses cost to evaluate a physical plan ● uses dynamic programming to search for the optimal physical plan
  • 60. Compared with a Rule-Based Optimizer (RBO) ● Here are my personal opinions … ● Cost-based: can find better physical plans ● Rule-independent: provides an elegant interface for DB implementors ● Still heuristic: may perform badly in some corner cases