Optimizer features in recent releases of other databases

[some] Optimizer features in recent releases of other databases
Sergei Petrunia
Barcelona ACM
October 2019

Optimizer/Executor in MySQL 8.0.x

MySQL 8.0
MySQL 8.0.n
- Iterator-based Executor (introduced gradually)
MySQL 8.0.13 (2018-10-22, GA)
- Skip Scan (contribution from Facebook)
MySQL 8.0.14 (2019-01-21, GA)
- LATERAL support
MySQL 8.0.17 (2019-07-22, GA)
- Multi-valued indexes for JSON
- NOT IN/EXISTS/.. -> anti-join
- EXISTS -> Semi-join conversion
- [Anti-]semi-join conversion IS TRUE/IS NULL
- Semi-join optimization for subqueries in the ON expressions
MySQL 8.0.18 (2019-10-14, GA)
- Hash join
- EXPLAIN ANALYZE

Iterator-based executor
● The idea: switch server internals to the init()/get_next() interface
– Unifies all access methods (quick selects, full scans, NL-joins, etc)
– A textbook-like approach, similar to PostgreSQL
– Pay-offs?
    ● Cursors
    ● Flexibility allows further development
    ● New EXPLAIN form, EXPLAIN FORMAT=TREE
● Doesn’t handle all SQL features yet
– Handles semi-join subqueries
– Doesn’t handle outer joins.
– Doesn’t handle certain kinds of non-indexed joins?
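
The init()/get_next() idea can be illustrated with a minimal sketch. This is not MySQL's code: the class names (TableScan, Filter, NestedLoopJoin), the run() helper and the sample rows are hypothetical, chosen only to show that every operator exposes the same two methods and can therefore be stacked freely.

```python
class TableScan:
    """Leaf iterator: returns rows of an in-memory 'table' one at a time."""
    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def init(self):
        self.pos = 0  # (re)start the scan

    def get_next(self):
        if self.pos >= len(self.rows):
            return None  # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Filter:
    """Passes through only the child rows that satisfy the predicate."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def init(self):
        self.child.init()

    def get_next(self):
        while True:
            row = self.child.get_next()
            if row is None or self.predicate(row):
                return row


class NestedLoopJoin:
    """For each outer row, re-initializes the inner iterator and scans it."""
    def __init__(self, outer, inner, condition):
        self.outer = outer
        self.inner = inner
        self.condition = condition
        self.outer_row = None

    def init(self):
        self.outer.init()
        self.outer_row = None

    def get_next(self):
        while True:
            if self.outer_row is None:
                self.outer_row = self.outer.get_next()
                if self.outer_row is None:
                    return None  # outer side exhausted
                self.inner.init()
            inner_row = self.inner.get_next()
            if inner_row is None:
                self.outer_row = None  # advance to the next outer row
                continue
            if self.condition(self.outer_row, inner_row):
                return {**self.outer_row, **inner_row}


def run(plan):
    """Drive any iterator tree to completion."""
    plan.init()
    out = []
    while True:
        row = plan.get_next()
        if row is None:
            return out
        out.append(row)


# Hypothetical data: ten has 10 rows, t1 has two rows per key1 value 0..4.
ten = [{"ten_a": i, "col1": i} for i in range(10)]
t1 = [{"key1": i % 5, "t1_a": i} for i in range(10)]

plan = NestedLoopJoin(
    Filter(TableScan(ten), lambda r: r["col1"] < 7),
    TableScan(t1),
    lambda o, i: i["key1"] == o["ten_a"],
)
result = run(plan)
```

Because the join only sees its children through init() and get_next(), the inner TableScan could be swapped for an index lookup or another join without touching the join code.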

EXPLAIN ANALYZE
● Overlapping functionality with MariaDB’s ANALYZE (since 10.1)
● Only works with Iterator-based executor
● Doesn’t [yet] work with
– SELECT queries not using the iterator-based executor
    ● LEFT JOINs
    ● ...
– UPDATE/DELETE/etc

ANALYZE in MariaDB
select *
from ten, t1, t2
where
ten.col1 < 7 and
t1.key1=ten.a and
t2.key1=t1.a
+------+-------------+-------+------+---------------+------+---------+-----------+------+--------+----------+------------+-----------
| id   | select_type | table | type | possible_keys | key  | key_len | ref       | rows | r_rows | filtered | r_filtered | Extra
+------+-------------+-------+------+---------------+------+---------+-----------+------+--------+----------+------------+-----------
|    1 | SIMPLE      | ten   | ALL  | PRIMARY       | NULL | NULL    | NULL      | 10   | 10.00  |   100.00 |      70.00 | Using wher
|    1 | SIMPLE      | t1    | ref  | key1          | key1 | 5       | j20.ten.a | 1    | 2.86   |   100.00 |     100.00 | Using wher
|    1 | SIMPLE      | t2    | ref  | key1          | key1 | 5       | j20.t1.a  | 1    | 10.00  |   100.00 |     100.00 |
+------+-------------+-------+------+---------------+------+---------+-----------+------+--------+----------+------------+-----------

EXPLAIN ANALYZE
select *
from ten, t1, t2
where
ten.col1 < 7 and
t1.key1=ten.a and
t2.key1=t1.a
-> Nested loop inner join (cost=3.58 rows=3) (actual time=0.882..18.963 rows=200 loops=1)
-> Filter: (ten.col1 < 7) (cost=1.25 rows=3) (actual time=0.303..0.433 rows=7 loops=1)
-> Table scan on ten (cost=1.25 rows=10) (actual time=0.296..0.394 rows=10 loops=1)
-> Filter: (t1.a is not null) (cost=0.28 rows=1) (actual time=0.178..0.504 rows=3 loops=7)
-> Index lookup on t1 using key1 (key1=ten.a) (cost=0.28 rows=1) (actual time=0.176..0.497 rows=3 loops=7)
-> Index lookup on t2 using key1 (key1=t1.a) (cost=0.28 rows=1) (actual time=0.155..0.735 rows=10 loops=20)
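
Each node in this output carries the optimizer's estimate (cost, rows) next to the measured values (actual time for first row..all rows, rows, loops). A small sketch of pulling these apart to compare estimated and actual row counts follows; the regular expression and the parse_node() helper are hypothetical, written against the lines shown above rather than any official grammar. The rows figure is taken as a per-loop value, which matches the numbers above (10 rows x 20 loops on t2 gives the 200 rows reported by the top-level join).

```python
import re

# Hypothetical pattern for one line of EXPLAIN ANALYZE tree output.
NODE_RE = re.compile(
    r"-> (?P<op>.*?) "
    r"\(cost=(?P<cost>\d+(?:\.\d+)?) rows=(?P<est_rows>\d+)\) "
    r"\(actual time=(?P<first>\d+(?:\.\d+)?)\.\.(?P<last>\d+(?:\.\d+)?) "
    r"rows=(?P<rows>\d+) loops=(?P<loops>\d+)\)"
)


def parse_node(line):
    """Return estimates and measurements for one plan node, or None."""
    m = NODE_RE.search(line)
    if m is None:
        return None
    rows = int(m.group("rows"))
    loops = int(m.group("loops"))
    return {
        "op": m.group("op"),
        "cost": float(m.group("cost")),
        "est_rows": int(m.group("est_rows")),
        "first_row_ms": float(m.group("first")),
        "all_rows_ms": float(m.group("last")),
        "rows_per_loop": rows,
        "loops": loops,
        "total_rows": rows * loops,  # rows is reported per loop
    }


t1_node = parse_node(
    "-> Index lookup on t1 using key1 (key1=ten.a) (cost=0.28 rows=1) "
    "(actual time=0.176..0.497 rows=3 loops=7)"
)
t2_node = parse_node(
    "-> Index lookup on t2 using key1 (key1=t1.a) (cost=0.28 rows=1) "
    "(actual time=0.155..0.735 rows=10 loops=20)"
)
```

For the t2 lookup the estimate is 1 row per loop against a measured 10, the same mismatch that MariaDB's ANALYZE shows as rows=1 versus r_rows=10.00.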

Convert NOT IN/EXISTS to anti-join
● Before: subquery predicate attached as early as possible
– Subquery execution short-cuts as soon as match is found
● Now: anti-join operation
– Participates in join optimization (possibility for better plans?)
– Potentially, could benefit from join algorithms (BKA, hash join, etc)
    ● Not sure if it actually does
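
The difference between the two strategies can be sketched as follows. This is an illustration only: the function names and the sample tables are hypothetical, NULL handling (which matters for NOT IN) is ignored, and the hash-based variant shows what an anti-join could do in principle, not what MySQL is confirmed to do.

```python
def not_exists_subquery(outer_rows, inner_rows, key):
    """'Before': run the subquery per outer row, stop at the first match."""
    result = []
    for o in outer_rows:
        found = False
        for i in inner_rows:
            if i[key] == o[key]:
                found = True
                break  # short-cut as soon as a match is found
        if not found:
            result.append(o)
    return result


def hash_anti_join(outer_rows, inner_rows, key):
    """'Now': anti-join as a join operation, here using a hash lookup."""
    inner_keys = {i[key] for i in inner_rows}
    return [o for o in outer_rows if o[key] not in inner_keys]


# Hypothetical data: customers 1..5, orders exist only for customers 1 and 3.
customers = [{"cust": c} for c in range(1, 6)]
orders = [{"cust": 1}, {"cust": 3}, {"cust": 3}]

via_subquery = not_exists_subquery(customers, orders, "cust")
via_anti_join = hash_anti_join(customers, orders, "cust")
```

Both return the customers without orders; the point of the conversion is that the second form is an ordinary join operand, so the optimizer is free to pick its position and algorithm.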

Convert EXISTS(...) to semi-join
● MariaDB: similar feature in 10.0, “EXISTS->IN conversion”
– Trivial correlation detection/removal works in both
– But MariaDB’s EXISTS->IN works for non-semi-joins, too.

Hash Join
● It’s a “Grace Hash Join” (correction by Igor: not quite!)
– Theoretically should be better than “BNL-H” we have
    ● One read pass, hash table spills over to disk
● It uses the “iterator interface”
– Doesn’t support certain SQL features
    ● Left join, etc..
– Supports semi-join
● Need to study this further
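
For reference, the textbook Grace hash join mentioned above can be sketched as below. This is the classic algorithm, not MySQL's implementation (which, per the correction above, differs from it); the function names and sample data are hypothetical, and the partitions are kept in memory where a real system would write them to disk.

```python
from collections import defaultdict


def partition(rows, key, n_parts):
    """Split rows into n_parts buckets by hash of the join key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def build_and_probe(build_rows, probe_rows, build_key, probe_key):
    """Classic in-memory hash join of one pair of partitions."""
    table = defaultdict(list)
    for b in build_rows:
        table[b[build_key]].append(b)
    for p in probe_rows:
        for b in table.get(p[probe_key], []):
            yield {**b, **p}


def grace_hash_join(build_rows, probe_rows, build_key, probe_key, n_parts=4):
    """Partition both inputs the same way, then join matching partitions."""
    build_parts = partition(build_rows, build_key, n_parts)
    probe_parts = partition(probe_rows, probe_key, n_parts)
    out = []
    for bp, pp in zip(build_parts, probe_parts):
        out.extend(build_and_probe(bp, pp, build_key, probe_key))
    return out


# Hypothetical data: t1.a = 0..19, t2.key1 cycles through 0..4.
t1 = [{"a": i} for i in range(20)]
t2 = [{"key1": i % 5, "v": i} for i in range(50)]

joined = grace_hash_join(t1, t2, "a", "key1")
```

Rows with equal keys always land in the same partition number on both sides, so each partition pair can be joined independently with a hash table that only needs to hold one partition.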

PostgreSQL, relevant features in recent versions
PostgreSQL 9.6, 2016-09-29
● Parallel query support
PostgreSQL 10, 2017-10-05
● Logical replication
● Improved query parallelism
● Multi-column statistics: correlation ratio, #distinct_values
PostgreSQL 11, 2018-10-18
● Improved partitioning
● Improved query parallelism
● Parallel index creation
● JIT compilation for expressions
PostgreSQL 12, 2019-10-03
● Non-recursive CTEs are now inlined
● MCV statistics for multiple columns
● JSONPath (but not indexing)
● LLVM JIT is enabled by default if compiled-in

Inlining non-recursive CTEs
● Before PostgreSQL 12: CTE is materialized and is an “optimization barrier”

Quoting my slide from M|18:
CTE Optimizations summary

                 CTE Merge   Condition pushdown   CTE reuse
MariaDB 10.2     ✔           ✔                    ✘
MS SQL Server    ✔           ✔                    ✘
PostgreSQL       ✘           ✘                    ✔
MySQL 8.0.0      ✔           ✘                    ✔

● Merge and condition pushdown are most important
● MariaDB supports them, like MS SQL.
● PostgreSQL’s approach is *weird*
● “CTEs are optimization barriers”
● MySQL: “try merging, otherwise reuse”

Fixed in PostgreSQL 12

MCV statistics for multiple columns
● Issue: correlated conditions in WHERE
select * from cars_for_sale where maker='Honda' and model='Civic'
select * from users where city='Moscow' and country='Russia|USA'
● Assuming independence and multiplying selectivities is bad
– Bruce Momjian(?): “90% of complaints about the optimizer are about this”
● Solution: Most-Common-Values statistics for multiple columns:
CREATE STATISTICS s2 (mcv) ON col1, col2 FROM t2;
ANALYZE t2;
● Note: once the statistics object (s2) is present, it is [auto-]recomputed
– There’s DROP STATISTICS to remove it
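
A toy calculation shows why multiplying selectivities goes wrong on correlated columns. The table contents below are made up for illustration (100 hypothetical cars_for_sale rows), and the "MCV" here is simply an exact count of value pairs, a stand-in for what a multi-column most-common-values list would provide.

```python
from collections import Counter

# Hypothetical data: model fully determines maker, so the columns correlate.
rows = (
    [("Honda", "Civic")] * 30
    + [("Honda", "Accord")] * 20
    + [("Ford", "Focus")] * 30
    + [("Ford", "Fiesta")] * 20
)
n = len(rows)

# Per-column statistics, as a single-column histogram/MCV would give.
maker_freq = Counter(maker for maker, _ in rows)
model_freq = Counter(model for _, model in rows)

# Multi-column statistics: frequency of each (maker, model) pair.
pair_freq = Counter(rows)

sel_maker = maker_freq["Honda"] / n  # selectivity of maker='Honda'
sel_model = model_freq["Civic"] / n  # selectivity of model='Civic'

# Independence assumption: multiply the two selectivities.
independent_estimate = sel_maker * sel_model * n

# Multi-column MCV: look the pair up directly.
mcv_estimate = pair_freq[("Honda", "Civic")]

actual = sum(1 for r in rows if r == ("Honda", "Civic"))
```

With this data the independence estimate is about half of the real row count, while the pair lookup is exact; the gap grows with more columns and stronger correlation.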

Parallel index creation (PostgreSQL 11)
● Use multiple parallel workers when creating an index
– A typical objection: “does this help for an IO-bound workload?”
● Experiment results: it depends
– col1 INT, key(col1): less speedup (~1.5x)
– 3-keypart key: up to 3.33x speedup
workers   speedup
1         1.00
2         2.32
4         2.87
8         3.33
● Machine
– Intel i9, 8 cores/16 threads
– Intel Optane SSD 900P (~1K EUR)

Parallel query execution
● Supports joins and aggregation
– Parallel table scan
– Aggregate functions
– Parallel join algorithms
● Many constructs are “parallel stoppers”
– CTEs
    ● This is why my TPC-DS benchmark run didn’t benefit from parallel query
– ...
– TPC-H will show more speedups
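
The general shape of a parallel scan with aggregation (each worker aggregates its own slice, the leader combines the partial results) can be sketched as below. This is only a shape sketch under stated assumptions: the function names are hypothetical, the "table" is a Python list, and threads are used for brevity even though they give no real CPU parallelism in CPython, whereas PostgreSQL uses separate worker processes.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(chunk):
    """Worker side: scan one slice and return a partial (sum, count)."""
    total = 0
    count = 0
    for value in chunk:
        total += value
        count += 1
    return total, count


def parallel_avg(values, n_workers=4):
    """Leader side: split the scan, gather partials, finalize AVG()."""
    chunk_size = max(1, (len(values) + n_workers - 1) // n_workers)
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_aggregate, chunks))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count if count else None


avg = parallel_avg(list(range(1, 101)))
```

AVG cannot be combined from per-worker averages directly, which is why each worker returns a (sum, count) pair; any construct that cannot be split and recombined this way ends up as a "parallel stopper".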
