3. MySQL 8.0
MySQL 8.0.n
- Iterator-based executor (introduced gradually)
MySQL 8.0.13 (2018-10-22, GA)
- Skip Scan (contribution from Facebook)
MySQL 8.0.14 (2019-01-21, GA)
- LATERAL support
MySQL 8.0.17 (2019-07-22, GA)
- Multi-valued indexes for JSON
- NOT IN/EXISTS/.. -> anti-join
- EXISTS -> Semi-join conversion
- [Anti-]semi-join conversion IS TRUE/IS NULL
- Semi-join optimization for subqueries in the ON expressions
MySQL 8.0.18 (2019-10-14, GA)
- Hash join
- EXPLAIN ANALYZE
4. Iterator-based executor
● The idea: switch server internals to the init()/get_next() interface
– Unifies all access methods (quick selects, full scans, NL-joins, etc.)
– A textbook-like approach, similar to PostgreSQL
– Pay-offs?
  ● Cursors
  ● Flexibility allows further development
  ● New EXPLAIN form, EXPLAIN FORMAT=TREE
● Doesn’t handle all SQL features yet
– Handles semi-join subqueries
– Doesn’t handle outer joins.
– Doesn’t handle certain kinds of non-indexed joins?
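To make the init()/get_next() idea concrete, here is a minimal Python sketch of a pull-based executor with a table scan, a filter, and a nested-loop join, all behind one interface. The class names (RowIterator, TableScan, Filter, NestedLoopJoin) and the in-memory "tables" are hypothetical illustrations, not MySQL's actual C++ classes. Because the caller pulls one row at a time, a cursor can simply stop calling get_next() whenever it likes.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


class RowIterator:
    """Hypothetical base class: init() (re)starts, get_next() returns a row or None at EOF."""

    def init(self) -> None:
        raise NotImplementedError

    def get_next(self) -> Optional[Row]:
        raise NotImplementedError


class TableScan(RowIterator):
    """Full scan over an in-memory list standing in for a table."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.pos = 0

    def init(self) -> None:
        self.pos = 0  # restart from the first row

    def get_next(self) -> Optional[Row]:
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Filter(RowIterator):
    """Pulls rows from its child and passes on only those matching pred."""

    def __init__(self, child: RowIterator, pred: Callable[[Row], bool]) -> None:
        self.child = child
        self.pred = pred

    def init(self) -> None:
        self.child.init()

    def get_next(self) -> Optional[Row]:
        while (row := self.child.get_next()) is not None:
            if self.pred(row):
                return row
        return None


class NestedLoopJoin(RowIterator):
    """For each outer row, re-initializes and scans the inner iterator."""

    def __init__(self, outer: RowIterator, inner: RowIterator,
                 cond: Callable[[Row, Row], bool]) -> None:
        self.outer, self.inner, self.cond = outer, inner, cond
        self.outer_row: Optional[Row] = None

    def init(self) -> None:
        self.outer.init()
        self.outer_row = None

    def get_next(self) -> Optional[Row]:
        while True:
            if self.outer_row is None:
                self.outer_row = self.outer.get_next()
                if self.outer_row is None:
                    return None  # outer side exhausted: the join is done
                self.inner.init()  # rescan the inner side for this outer row
            inner_row = self.inner.get_next()
            if inner_row is None:
                self.outer_row = None  # inner exhausted: advance the outer side
            elif self.cond(self.outer_row, inner_row):
                return {**self.outer_row, **inner_row}


def run(plan: RowIterator) -> List[Row]:
    """Drives a plan like a client cursor: init() once, then get_next() until EOF."""
    plan.init()
    out = []
    while (row := plan.get_next()) is not None:
        out.append(row)
    return out


ten = TableScan([{"ten.a": i, "ten.col1": i} for i in range(10)])
t1 = TableScan([{"t1.key1": i % 5, "t1.a": i} for i in range(10)])
plan = NestedLoopJoin(Filter(ten, lambda r: r["ten.col1"] < 3), t1,
                      lambda o, i: o["ten.a"] == i["t1.key1"])
rows = run(plan)
```

Every operator only knows its children's init()/get_next(), which is what lets new access methods or join algorithms be plugged in without touching the rest of the plan.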
5. EXPLAIN ANALYZE
● Overlapping functionality with MariaDB’s ANALYZE (since 10.1)
● Only works with the iterator-based executor
● Doesn’t [yet] work with
– SELECT queries not using the iterator-based executor
  ● LEFT JOINs
  ● ...
– UPDATE/DELETE/etc.
7. EXPLAIN ANALYZE
select *
from ten, t1, t2
where
  ten.col1 < 7 and
  t1.key1=ten.a and
  t2.key1=t1.a
-> Nested loop inner join (cost=3.58 rows=3) (actual time=0.882..18.963 rows=200 loops=1)
    -> Nested loop inner join (cost=2.42 rows=3) (actual time=0.648..3.992 rows=20 loops=1)
        -> Filter: (ten.col1 < 7) (cost=1.25 rows=3) (actual time=0.303..0.433 rows=7 loops=1)
            -> Table scan on ten (cost=1.25 rows=10) (actual time=0.296..0.394 rows=10 loops=1)
        -> Filter: (t1.a is not null) (cost=0.28 rows=1) (actual time=0.178..0.504 rows=3 loops=7)
            -> Index lookup on t1 using key1 (key1=ten.a) (cost=0.28 rows=1) (actual time=0.176..0.497 rows=3 loops=7)
    -> Index lookup on t2 using key1 (key1=t1.a) (cost=0.28 rows=1) (actual time=0.155..0.735 rows=10 loops=20)
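Note the per-loop figures, e.g. (actual time=0.176..0.497 rows=3 loops=7) for the t1 lookup: it ran 7 times with about 3 rows per loop, which adds up to roughly the rows=20 reported by the join above it.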
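Because the figures are reported per loop, a node's total work is roughly the per-loop figure times loops. The sketch below assumes that "actual time=X..Y" means milliseconds to the first and to the last row and that both times and rows are averaged over loops; this reading is consistent with the tree above (10 rows times 20 loops gives the 200 rows at the root). The regex and the helper name node_totals are hypothetical, not part of MySQL.

```python
import re

# Runtime part of one EXPLAIN ANALYZE node, e.g. "(actual time=0.176..0.497 rows=3 loops=7)"
ACTUAL_RE = re.compile(
    r"\(actual time=(?P<first>\d+(?:\.\d+)?)\.\.(?P<last>\d+(?:\.\d+)?)"
    r" rows=(?P<rows>\d+) loops=(?P<loops>\d+)\)"
)


def node_totals(line: str) -> dict:
    """Scales per-loop figures by the loop count (assumes they are per-loop averages)."""
    m = ACTUAL_RE.search(line)
    if m is None:
        raise ValueError("no '(actual time=...)' block in line")
    rows, loops = int(m["rows"]), int(m["loops"])
    return {
        "first_row_ms": float(m["first"]),
        "last_row_ms": float(m["last"]),
        "approx_total_rows": rows * loops,
        "approx_total_ms": float(m["last"]) * loops,
    }


t2_line = ("-> Index lookup on t2 using key1 (key1=t1.a) (cost=0.28 rows=1) "
           "(actual time=0.155..0.735 rows=10 loops=20)")
t2 = node_totals(t2_line)
```

Under this reading, the t2 lookups account for about 14.7 ms of the 18.963 ms spent in the top-level join.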
10. Convert NOT IN/EXISTS to anti-join
● Before: subquery predicate attached as early as possible
– Subquery execution short-cuts as soon as match is found
● Now: anti-join operation
– Participates in join optimization (possibility of better plans?)
– Potentially, could benefit from join algorithms (BKA, hash join, etc.)
  ● Not sure if it actually does
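A minimal Python sketch of the difference, modeling `SELECT * FROM outer_t WHERE NOT EXISTS (SELECT 1 FROM inner_t WHERE inner_t.b = outer_t.a)`. The "before" version evaluates the subquery per outer row and stops at the first match; the anti-join version is written here as a hash anti-join, one possible join algorithm, not necessarily what MySQL picks. Table and column names are hypothetical. NOT IN has different NULL semantics (if the subquery returns a NULL, `x NOT IN (...)` is never TRUE), which this sketch does not model.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]


def sql_eq(a: Optional[int], b: Optional[int]) -> bool:
    """SQL '=': comparing with NULL (None) is never TRUE."""
    return a is not None and b is not None and a == b


def not_exists_per_row(outer: List[Row], inner: List[Row], okey: str, ikey: str) -> List[Row]:
    # "Before": subquery re-evaluated for every outer row; any() stops at the first match
    return [o for o in outer if not any(sql_eq(o[okey], i[ikey]) for i in inner)]


def hash_anti_join(outer: List[Row], inner: List[Row], okey: str, ikey: str) -> List[Row]:
    # Anti-join: build a hash set of inner keys once (NULL keys can never match) ...
    inner_keys = {i[ikey] for i in inner if i[ikey] is not None}
    # ... then keep the outer rows that find no partner
    return [o for o in outer if o[okey] is None or o[okey] not in inner_keys]


outer = [{"a": 1}, {"a": 2}, {"a": None}, {"a": 4}]
inner = [{"b": 2}, {"b": None}, {"b": 4}, {"b": 4}]
result = hash_anti_join(outer, inner, "a", "b")
```

Both functions return the same rows; the point of the conversion is that the second form is an ordinary join operator the optimizer can reorder and cost like any other.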
11. Convert EXISTS(...) to semi-join
● MariaDB: similar feature in 10.0, “EXISTS->IN conversion”
– Trivial correlation detection/removal works in both
– But MariaDB’s EXISTS->IN works for non-semi-joins, too.
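Why EXISTS maps onto a semi-join rather than a plain join: for `SELECT * FROM outer_t WHERE EXISTS (SELECT 1 FROM inner_t WHERE inner_t.b = outer_t.a)`, the trivial correlation `inner_t.b = outer_t.a` becomes the join condition, but each outer row must still be returned at most once. A minimal Python sketch with hypothetical table and column names:

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]


def inner_join(outer: List[Row], inner: List[Row], okey: str, ikey: str) -> List[Row]:
    # Plain join: an outer row is repeated once per matching inner row
    return [o for o in outer for i in inner
            if o[okey] is not None and o[okey] == i[ikey]]


def hash_semi_join(outer: List[Row], inner: List[Row], okey: str, ikey: str) -> List[Row]:
    # Semi-join: each outer row is returned at most once, however many inner rows match
    inner_keys = {i[ikey] for i in inner if i[ikey] is not None}
    return [o for o in outer if o[okey] is not None and o[okey] in inner_keys]


outer = [{"a": 1}, {"a": 2}, {"a": 3}]
inner = [{"b": 2}, {"b": 2}, {"b": 3}]
joined = inner_join(outer, inner, "a", "b")
semi = hash_semi_join(outer, inner, "a", "b")
```

Here the plain join returns the a=2 row twice, while the semi-join matches what EXISTS means.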
12. Hash Join
● It’s a “Grace Hash Join” (correction by Igor: not quite!)
– Theoretically should be better than the “BNL-H” (hashed Block Nested Loop) join we have in MariaDB
  ● One read pass, hash table spills over to disk
● It uses the “iterator interface”
– Doesn’t support certain SQL features
  ● Left join, etc.
– Supports semi-join
● Need to study this further
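A minimal Python sketch of a build/probe hash join that, when the build side exceeds a memory budget, partitions both inputs by the hash of the join key and joins matching partitions pairwise, in the spirit of Grace hash join. This is a textbook illustration, not MySQL's actual algorithm (which, per the note above, is "not quite" Grace). Python lists stand in for on-disk partition files, and max_build_rows and num_partitions are hypothetical parameters.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def hash_join(build: List[Row], probe: List[Row], bkey: str, pkey: str) -> List[Row]:
    """In-memory hash join: build a hash table on one input, probe it with the other."""
    table: Dict[object, List[Row]] = defaultdict(list)
    for b in build:
        if b[bkey] is not None:  # NULL never joins
            table[b[bkey]].append(b)
    out = []
    for p in probe:
        for b in table.get(p[pkey], []):
            out.append({**b, **p})
    return out


def spilling_hash_join(build: List[Row], probe: List[Row], bkey: str, pkey: str,
                       max_build_rows: int, num_partitions: int = 4) -> List[Row]:
    """Falls back to hash partitioning when the build side does not fit the budget."""
    if len(build) <= max_build_rows:
        return hash_join(build, probe, bkey, pkey)
    build_parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    probe_parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    # Equal keys hash to the same partition, so partition i of one input
    # can only match partition i of the other
    for b in build:
        if b[bkey] is not None:
            build_parts[hash(b[bkey]) % num_partitions].append(b)
    for p in probe:
        if p[pkey] is not None:
            probe_parts[hash(p[pkey]) % num_partitions].append(p)
    out = []
    for bp, pp in zip(build_parts, probe_parts):
        # Simplification: a real engine would re-partition a chunk that still does not fit
        out.extend(hash_join(bp, pp, bkey, pkey))
    return out


build = [{"id": i, "k": i % 7} for i in range(50)]
probe = [{"pid": j, "pk": j % 10} for j in range(30)]
rows = spilling_hash_join(build, probe, "k", "pk", max_build_rows=5)
```

Each input is read once and written out once when spilling, which is the property that makes this family of algorithms attractive compared to rescanning the inner table per block.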
16. Quoting my slide from M|18:
CTE Optimizations summary
                 CTE Merge   Condition pushdown   CTE reuse
MariaDB 10.2     ✔           ✔                    ✘
MS SQL Server    ✔           ✔                    ✘
PostgreSQL       ✘           ✘                    ✔
MySQL 8.0.0      ✔           ✘                    ✔
● Merge and condition pushdown are most important
– MariaDB supports them, like MS SQL.
● PostgreSQL’s approach is *weird*
– “CTEs are optimization barriers”
– Fixed in PostgreSQL 12
● MySQL: “try merging, otherwise reuse”
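Why condition pushdown matters when a CTE is materialized: a minimal Python sketch modeling `WITH cte AS (SELECT x FROM base WHERE x % 2 = 0) SELECT * FROM cte WHERE x < 10` over a hypothetical 1000-row table. Pushing the outer condition into the CTE body is safe here because the CTE is a plain filter; with aggregation or LIMIT in the CTE, only some conditions (or none) could be pushed.

```python
from typing import Callable, List, Tuple

Pred = Callable[[int], bool]


def materialize_then_filter(base: List[int], cte_pred: Pred,
                            outer_pred: Pred) -> Tuple[List[int], int]:
    """CTE as an optimization barrier: materialize it fully, then apply the outer WHERE."""
    cte = [x for x in base if cte_pred(x)]
    return [x for x in cte if outer_pred(x)], len(cte)


def push_condition_down(base: List[int], cte_pred: Pred,
                        outer_pred: Pred) -> Tuple[List[int], int]:
    """Outer WHERE pushed into the CTE body, so far fewer rows are materialized."""
    cte = [x for x in base if cte_pred(x) and outer_pred(x)]
    return cte, len(cte)


def is_even(x: int) -> bool:  # condition inside the CTE
    return x % 2 == 0


def is_small(x: int) -> bool:  # condition in the outer query
    return x < 10


base = list(range(1000))
barrier = materialize_then_filter(base, is_even, is_small)
pushed = push_condition_down(base, is_even, is_small)
```

Both return the same five rows, but the barrier version materializes 500 rows and the pushed version only 5.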
17. MCV statistics for multiple columns
select * from cars_for_sale where maker='Honda' and model='Civic'
select * from users where city='Moscow' and country='Russia|USA'
● Issue: correlated conditions in WHERE
● Assuming independence and multiplying selectivities is bad
– Bruce Momjian(?): “90% of complaints about the optimizer are about this”
● Solution: Most-Common-Values statistics for multiple columns:
CREATE STATISTICS s2 (mcv) ON col1, col2 FROM t2;
ANALYZE t2;
● Note: the statistics object (s2) is persistent and is [auto-]recomputed
– There’s DROP STATISTICS to remove it
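A minimal Python sketch of why independence goes wrong for the cars example and what a multi-column MCV list fixes. The 100-row dataset is made up for illustration, and build_mcv is a simplified stand-in: PostgreSQL's real MCV statistics also have to estimate combinations that are not in the list, which is not modeled here.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, str]

# Hypothetical data: 'model' fully determines 'maker', so the columns are correlated
rows: List[Row] = (
    [{"maker": "Honda", "model": "Civic"}] * 30
    + [{"maker": "Honda", "model": "Accord"}] * 20
    + [{"maker": "Toyota", "model": "Corolla"}] * 50
)


def column_selectivity(rows: List[Row], col: str, value: str) -> float:
    return sum(r[col] == value for r in rows) / len(rows)


def independence_estimate(rows: List[Row], conds: Dict[str, str]) -> float:
    """Per-column selectivities multiplied together (assumes independent columns)."""
    sel = 1.0
    for col, value in conds.items():
        sel *= column_selectivity(rows, col, value)
    return sel


def build_mcv(rows: List[Row], cols: Tuple[str, ...], k: int) -> Dict[tuple, float]:
    """Frequencies of the k most common value combinations across cols."""
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    return {combo: n / len(rows) for combo, n in counts.most_common(k)}


mcv = build_mcv(rows, ("maker", "model"), k=10)
naive = independence_estimate(rows, {"maker": "Honda", "model": "Civic"})
with_mcv = mcv[("Honda", "Civic")]
```

Independence predicts 0.5 x 0.3 = 0.15 (about 15 rows), while the true fraction, which the MCV list records directly, is 0.3 (30 rows): the estimate is off by 2x even in this tiny example.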
18. Parallel index creation (PostgreSQL 11)
● Use multiple threads when creating an index
– A typical objection: “does this help for an IO-bound workload?”
workers   speedup
1         1.00
2         2.32
4         2.87
8         3.33
● Experiment results: it depends
– col1 INT, key(col1): less speedup (~1.5x)
– 3-keypart key: up to 3.33x speedup
● Machine
– Intel i9, 8 cores/16 threads
– Intel Optane SSD 900P (~1K eur)
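The speedup table is easier to read as parallel efficiency (speedup divided by workers, where 1.0 is perfect linear scaling). A short Python calculation over the numbers above:

```python
# Speedups from the workers/speedup table above
speedup = {1: 1.00, 2: 2.32, 4: 2.87, 8: 3.33}

# Parallel efficiency = speedup / workers (1.0 would be perfect linear scaling)
efficiency = {w: s / w for w, s in speedup.items()}

for w, e in efficiency.items():
    print(f"{w} workers: speedup {speedup[w]:.2f}x, efficiency {e:.2f}")
```

Two workers come out above 1.0 (1.16, i.e. superlinear), while at 8 workers efficiency drops to about 0.42.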
19. Parallel query execution
● Supports joins and aggregation
– Parallel table scan
– Aggregate functions
– Parallel join algorithms
● Many constructs are “parallel stoppers”
– CTEs
  ● This is why my TPC-DS benchmark run didn’t benefit from parallel query
– ...
– TPC-H will show more speedups
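How parallel aggregation can work: each worker aggregates its share of the rows into a partial state, and a leader combines the states. For AVG the partial state must be (sum, count), because averaging per-worker averages is wrong when shares differ in size. A minimal Python sketch; threads here only model the data flow (CPython threads give little CPU speedup), whereas PostgreSQL uses separate worker processes and plan nodes such as Partial Aggregate, Gather and Finalize Aggregate. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple


def partial_aggregate(chunk: List[float]) -> Tuple[float, int]:
    """Per-worker step: aggregate only this worker's share of the rows."""
    return sum(chunk), len(chunk)


def parallel_avg(values: List[float], workers: int = 4) -> Optional[float]:
    # "Parallel scan": each worker gets a disjoint slice of the rows
    chunks = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, chunks))
    # Leader combines the partial (sum, count) states into the final AVG
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else None  # AVG over zero rows is NULL


print(parallel_avg(list(range(1, 101))))
```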