Advanced query optimizer tuning and analysis
Sergei Petrunia 
Timour Katchaounov 
Monty Program Ab 
MySQL Conference And Expo 2013
● Introduction 
– What is an optimizer problem 
– How to catch it 
● old and new tools
● Single-table selects 
– brief recap from 2012 
● JOINs 
– ref access 
● index statistics 
– join condition pushdown 
– join plan efficiency 
– query plan vs reality 
● Big I/O bound JOINs 
– Batched Key Access 
● Aggregate functions 
● ORDER BY ... LIMIT 
● GROUP BY 
● Subqueries
Is there a problem with the query optimizer?
• Database performance is affected by many factors
• One of them is the query optimizer
• Is my performance problem caused by the optimizer?
Signs that there is a query optimizer problem
• Some (not all) queries are slow
• A query seems to run longer than it ought to
– And examines more records than it ought to
• Usually, the query remains slow regardless of other activity on the server
Catching slow queries, the old ways 
● Watch the Slow query log 
– Percona Server/MariaDB: 
--log_slow_verbosity=query_plan 
# Thread_id: 1 Schema: dbt3sf10 QC_hit: No 
# Query_time: 2.452373 Lock_time: 0.000113 Rows_sent: 0 Rows_examined: 1500000 
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No 
# Filesort: No Filesort_on_disk: No Merge_passes: 0 
SET timestamp=1333385770; 
select * from customer where c_acctbal < -1000; 
– Run pt-query-digest on the log 
• Run SHOW PROCESSLIST periodically
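As a rough illustration of reading the slow query log entry above, the sketch below compares Rows_examined with Rows_sent. The function name `parse_slow_log_stats`, the hard-coded sample and the 1000x threshold are assumptions made for this example; for real logs, pt-query-digest is the better tool.

```python
import re

# Sample header lines copied from the slow-log entry shown above.
SAMPLE = """\
# Query_time: 2.452373 Lock_time: 0.000113 Rows_sent: 0 Rows_examined: 1500000
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
"""

def parse_slow_log_stats(text):
    """Collect 'Name: value' pairs from '#'-prefixed slow-log header lines."""
    stats = {}
    for line in text.splitlines():
        if line.startswith("#"):
            for name, value in re.findall(r"(\w+): (\S+)", line):
                stats[name] = value
    return stats

stats = parse_slow_log_stats(SAMPLE)
rows_examined = int(stats["Rows_examined"])
rows_sent = int(stats["Rows_sent"])
# Many rows examined for few rows sent hints at an inefficient plan.
# The factor 1000 is an arbitrary threshold chosen for this sketch.
suspicious = stats["Full_scan"] == "Yes" and rows_examined > 1000 * max(rows_sent, 1)
print(rows_examined, rows_sent, suspicious)
```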
The new way: SHOW PROCESSLIST + SHOW EXPLAIN 
• Available in MariaDB 10.0+ 
• Displays EXPLAIN of a running statement 
MariaDB> show processlist; 
+--+----+---------+-------+-------+----+------------+-------------------------... 
|Id|User|Host |db |Command|Time|State |Info 
+--+----+---------+-------+-------+----+------------+-------------------------... 
| 1|root|localhost|dbt3sf1|Query | 10|Sending data|select max(o_totalprice) ... 
| 2|root|localhost|dbt3sf1|Query | 0|init |show processlist 
+--+----+---------+-------+-------+----+------------+-------------------------... 
MariaDB> show explain for 1; 
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+ 
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | 
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+ 
|1 |SIMPLE |orders|ALL |NULL |NULL|NULL |NULL|1498194|Using where| 
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+ 
MariaDB [dbt3sf1]> show warnings; 
+-----+----+-----------------------------------------------------------------+ 
|Level|Code|Message | 
+-----+----+-----------------------------------------------------------------+ 
|Note |1003|select max(o_totalprice) from orders where year(o_orderDATE)=1995| 
+-----+----+-----------------------------------------------------------------+ 
SHOW EXPLAIN usage 
● Intended usage 
– SHOW PROCESSLIST ... 
– SHOW EXPLAIN FOR ... 
● Why not just run EXPLAIN again 
– Difficult to replicate setups 
● Temporary tables 
● Optimizer settings 
● Storage engine's index statistics 
● ... 
– No uncertainty about whether you're looking at 
the same query plan or not.
Catching slow queries (NEW) 
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] 
● use performance_schema 
● Many ways to analyze via queries 
– events_statements_summary_by_digest 
● count_star, sum_timer_wait, 
min_timer_wait, avg_timer_wait, max_timer_wait 
● digest_text, digest 
● sum_rows_examined, sum_created_tmp_disk_tables, 
sum_select_full_join 
– events_statements_history 
● sql_text, digest_text, digest 
● timer_start, timer_end, timer_wait 
● rows_examined, created_tmp_disk_tables, 
select_full_join 
Catching slow queries (NEW) 
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] 
• Modified Q18 from DBT3
select c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where
o_totalprice > ?
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey,
o_orderdate, o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 10;
• App executes Q18 many times with
? = 550000, 500000, 400000, ...
Catching slow queries (NEW) 
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] 
● Find candidate slow queries 
● Simple tests: select_full_join > 0, 
created_tmp_disk_tables > 0, etc 
● Complex conditions: 
max execution time > X sec OR 
min/max time vary a lot: 
select max_timer_wait/avg_timer_wait as max_ratio,
avg_timer_wait/min_timer_wait as min_ratio
from events_statements_summary_by_digest
where max_timer_wait > 1000000000000
or max_timer_wait / avg_timer_wait > 2
or avg_timer_wait / min_timer_wait > 2\G
Catching slow queries (NEW) 
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] 
*************************** 5. row *************************** 
DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b 
DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , 
`o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem` WHERE 
`o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey` GROUP BY 
`c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` ORDER BY `o_totalprice` 
DESC , `o_orderdate` LIMIT ? 
COUNT_STAR: 3 
SUM_TIMER_WAIT: 3251758347000 
MIN_TIMER_WAIT: 3914209000 → 0.0039 sec 
AVG_TIMER_WAIT: 1083919449000 
MAX_TIMER_WAIT: 3204044053000 → 3.2 sec 
SUM_LOCK_TIME: 555000000 
SUM_ROWS_SENT: 25 
SUM_ROWS_EXAMINED: 0 
SUM_CREATED_TMP_DISK_TABLES: 0 
SUM_CREATED_TMP_TABLES: 3 
SUM_SELECT_FULL_JOIN: 0 
SUM_SELECT_RANGE: 3 
SUM_SELECT_SCAN: 0 
SUM_SORT_RANGE: 0 
SUM_SORT_ROWS: 25 
SUM_SORT_SCAN: 3 
SUM_NO_INDEX_USED: 0 
SUM_NO_GOOD_INDEX_USED: 0 
FIRST_SEEN: 1970-01-01 03:38:27 
LAST_SEEN: 1970-01-01 03:38:43 
max_ratio: 2.9560 
min_ratio: 276.9192 
High variance of 
execution time
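The ratios in the row above can be re-derived by hand. The sketch below is only a check of the arithmetic: it copies the timer values from the output and relies on the fact that performance_schema statement timers are reported in picoseconds. The helper name `to_seconds` is made up for this example.

```python
PICOSECONDS_PER_SECOND = 10**12

# Values copied from the digest row above.
count_star = 3
sum_timer_wait = 3251758347000
min_timer_wait = 3914209000
avg_timer_wait = 1083919449000
max_timer_wait = 3204044053000

def to_seconds(picoseconds):
    """Convert a performance_schema timer value to seconds."""
    return picoseconds / PICOSECONDS_PER_SECOND

# Same expressions as in the query against events_statements_summary_by_digest.
max_ratio = max_timer_wait / avg_timer_wait
min_ratio = avg_timer_wait / min_timer_wait
high_variance = max_ratio > 2 or min_ratio > 2

print(f"min={to_seconds(min_timer_wait):.4f}s max={to_seconds(max_timer_wait):.4f}s")
print(f"max_ratio={max_ratio:.4f} min_ratio={min_ratio:.4f} high_variance={high_variance}")
```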
Catching slow queries (NEW) 
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] 
● Check the actual queries and constants 
● The events_statements_history table 
select timer_wait/1000000000000 as exec_time, sql_text 
from events_statements_history 
where digest in 
(select digest from events_statements_summary_by_digest 
where max_timer_wait > 1000000000000 
or max_timer_wait / avg_timer_wait > 2 
or avg_timer_wait / min_timer_wait > 2) 
order by timer_wait;
Catching slow queries (NEW) 
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] 
+-----------+-----------------------------------------------------------------------------------+ 
| exec_time | sql_text | 
+-----------+-----------------------------------------------------------------------------------+ 
| 0.0039 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) 
from customer, orders, lineitem 
where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10 | 
| 0.0438 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) 
from customer, orders, lineitem 
where o_totalprice > 500000 and c_custkey = o_custkey ... LIMIT 10 | 
| 3.2040 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) 
from customer, orders, lineitem 
where o_totalprice > 400000 and c_custkey = o_custkey ... LIMIT 10 | 
+-----------+-----------------------------------------------------------------------------------+ 
Observation: 
orders.o_totalprice > ? is less and less selective 
Actions after finding the slow query 
• Bad query plan
– Rewrite the query 
– Force a good query plan 
• Bad optimizer settings 
– Do tuning 
• Query is inherently complex 
– Don't waste time with it 
– Look for other solutions. 
Consider a simple select
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
● Run the query:
19 rows in set (7.65 sec)
● Check the query plan:
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 15084733 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
• 15M rows were scanned, 19 rows in output
• Query plan seems inefficient
– (note: this logic doesn't directly apply to group/order by queries).
Query plan analysis
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 15084733 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
• Entire table is scanned
• WHERE condition checked after records are read
– Not used to limit #examined rows.
Let's add an index
alter table orders add key i_o_orderdate (o_orderdate);
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key          |key_len|ref |rows  |Extra      |
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|1 |SIMPLE     |orders|range|i_o_orderdate|i_o_orderdate|4      |NULL|306322|Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
● Query time:
19 rows in set (0.76 sec)
• Outcome
– Down to reading 300K rows
– Still, 300K >> 19 rows.
Finding out which indexes to add
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
Check selectivity of conditions that will use the index
● index (o_orderdate)
select count(*) from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06';
306322 rows
● index (o_clerk)
select count(*) from orders where o_clerk='Clerk#000009506'
1507 rows.
Try adding composite indexes
● index (o_clerk, o_orderdate)
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
|id|select_type|table |type |possible_keys|key           |key_len|ref |rows|Extra      |
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
|1 |SIMPLE     |orders|range|i_o_clerk_...|i_o_clerk_date|20     |NULL|19  |Using where|
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
Bingo! 100% efficiency
● index (o_orderdate, o_clerk)
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key           |key_len|ref |rows  |Extra      |
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
|1 |SIMPLE     |orders|range|i_o_date_c...|i_o_date_clerk|20     |NULL|360354|Using where|
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
Much worse!
• If a condition uses multiple columns, a composite index will be most efficient
• Order of columns matters
– Explaining why is outside the scope of this tutorial; it is covered in last year's tutorial
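To give a feel for why column order matters, here is a toy model, not how InnoDB actually stores or scans indexes. It treats an index as a sorted list of tuples and counts the entries a range scan would touch. The data is synthetic, and `entries_scanned` and the chosen clerk and day values are made up for illustration.

```python
from bisect import bisect_left, bisect_right
import random

random.seed(1)
# Synthetic stand-in for the orders table: (clerk, order_day) pairs.
rows = [(f"Clerk#{random.randrange(100):03d}", random.randrange(365))
        for _ in range(20000)]
clerk, day_lo, day_hi = "Clerk#042", 150, 180

def entries_scanned(index, lo, hi):
    """Entries a range scan reads between the lo and hi bounds of a sorted index."""
    return bisect_right(index, hi) - bisect_left(index, lo)

# index (clerk, day): equality on the first part, range on the second part.
idx_clerk_day = sorted(rows)
scan_a = entries_scanned(idx_clerk_day, (clerk, day_lo), (clerk, day_hi))

# index (day, clerk): only the range on the first part bounds the scan; the
# clerk equality has to be checked on every entry inside that range.
idx_day_clerk = sorted((d, c) for c, d in rows)
scan_b = entries_scanned(idx_day_clerk, (day_lo,), (day_hi + 1,))

matching = sum(1 for c, d in rows if c == clerk and day_lo <= d <= day_hi)
print(matching, scan_a, scan_b)
```

In this model the (clerk, day) index touches exactly the matching entries, while the (day, clerk) index touches every entry in the date range.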
Conditions must be in SARGable form
• Condition must represent a range
• It must have a form that is recognized by the optimizer
o_orderDate BETWEEN '1992-06-01' and '1992-06-30' (usable)
year(o_orderDate)=1992 and month(o_orderdate)=6 (not usable)
TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06') (not usable)
o_clerk='Clerk#000009506' (usable)
o_clerk LIKE 'Clerk#000009506' (usable)
o_clerk LIKE '%Clerk#000009506%' (not usable)
column IN (1,10,15,21, ...) (usable)
(col1, col2) IN ( (1,1), (2,2), (3,3), …)
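One common way to make a "year and month" condition SARGable is to compute the month's first and last day in the application and compare the bare column against that range. A minimal sketch, assuming the helper name `month_to_range` (not from the slides):

```python
import calendar
from datetime import date

def month_to_range(year, month):
    """Return the first and last day of a month as an inclusive date range."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)

lo, hi = month_to_range(1992, 6)
# Builds a range condition on the bare column, which an index can serve.
condition = f"o_orderDate BETWEEN '{lo.isoformat()}' AND '{hi.isoformat()}'"
print(condition)
```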
New in MySQL-5.6: optimizer_trace
● Lets you see the ranges
set optimizer_trace=1;
explain select * from orders
where o_orderDATE between '1992-06-01' and '1992-07-03' and
o_orderdate not in ('1992-01-01', '1992-06-12','1992-07-04');
select * from information_schema.optimizer_trace\G
● Will print a big JSON struct
● Search for range_scan_alternatives.
New in MySQL-5.6: optimizer_trace
...
"range_scan_alternatives": [
  {
    "index": "i_o_orderdate",
    "ranges": [
      "1992-06-01 <= o_orderDATE < 1992-06-12",
      "1992-06-12 < o_orderDATE <= 1992-07-03"
    ],
    "index_dives_for_eq_ranges": true,
    "rowid_ordered": false,
    "using_mrr": false,
    "index_only": false,
    "rows": 319082,
    "cost": 382900,
    "chosen": true
  },
  {
    "index": "i_o_date_clerk",
    "ranges": [
      "1992-06-01 <= o_orderDATE < 1992-06-12",
      "1992-06-12 < o_orderDATE <= 1992-07-03"
    ],
    "index_dives_for_eq_ranges": true,
    "rowid_ordered": false,
    "using_mrr": false,
    "index_only": false,
    "rows": 406336,
    "cost": 487605,
    "chosen": false,
    "cause": "cost"
  }
],
...
● Considered ranges are shown in the range_scan_alternatives section
● This is actually the original use case of optimizer_trace
● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug)
● Still, very useful.
Source of #rows estimates for range
select * from orders
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key          |key_len|ref |rows  |Extra      |
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|1 |SIMPLE     |orders|range|i_o_orderdate|i_o_orderdate|4      |NULL|306322|Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
• “records_in_range” estimate
• Done by diving into index
• Usually is fairly accurate
• Not affected by ANALYZE TABLE.
Simple selects: conclusions 
• Efficiency == “#rows_scanned is close to #rows_returned” 
• Indexes and WHERE conditions reduce #rows scanned 
• Index estimates are usually accurate 
• Multi-column indexes 
– “handle” conditions on multiple columns 
– Order of columns in the index matters 
• optimizer_trace lets you view the ranges
– But misrepresents ranges over multi-column indexes.
Now, we will skip some topics
One can also speed up simple selects with
● index_merge access method
● index access method
● Index Condition Pushdown
We don't have time for these now; check out last year's tutorial.
A simple join
select * from customer, orders where c_custkey=o_custkey
• “Customers with their orders”
Execution: Nested Loops join
select * from customer, orders where c_custkey=o_custkey
for each customer C {
  for each order O {
    if (C.c_custkey == O.o_custkey)
      produce record(C, O);
  }
}
• Complexity:
– Scans table customer
– For each record in customer, scans table orders
• Is this ok?
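The pseudocode above translates almost line for line into a runnable sketch. The tiny in-memory tables and the comparison counter are made up for illustration; the point is that the number of comparisons is the product of the two table sizes.

```python
def nested_loop_join(customers, orders):
    """Join two lists of dicts on c_custkey = o_custkey, counting comparisons."""
    result, comparisons = [], 0
    for c in customers:              # scan table customer
        for o in orders:             # for each customer, scan table orders
            comparisons += 1
            if c["c_custkey"] == o["o_custkey"]:
                result.append((c, o))
    return result, comparisons

# Three customers, six orders; every order belongs to exactly one customer.
customers = [{"c_custkey": k} for k in range(1, 4)]
orders = [{"o_orderkey": n, "o_custkey": 1 + n % 3} for n in range(6)]

joined, comparisons = nested_loop_join(customers, orders)
print(len(joined), comparisons)
```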
Execution: Nested loops join (2)
select * from customer, orders where c_custkey=o_custkey
• EXPLAIN:
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table   |type|possible_keys|key |key_len|ref |rows   |Extra      |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE     |customer|ALL |NULL         |NULL|NULL   |NULL|148749 |           |
|1 |SIMPLE     |orders  |ALL |NULL         |NULL|NULL   |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
Execution: Nested loops join (3)
select * from customer, orders where c_custkey=o_custkey
• EXPLAIN:
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table   |type|possible_keys|key |key_len|ref |rows   |Extra      |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE     |customer|ALL |NULL         |NULL|NULL   |NULL|148749 |           |
|1 |SIMPLE     |orders  |ALL |NULL         |NULL|NULL   |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
– customer, rows=148749: rows to read from customer
– orders, rows=1493631: rows to read from orders
– orders, “Using where”: checks c_custkey=o_custkey
Execution: Nested loops join (4)
select * from customer, orders where c_custkey=o_custkey
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table   |type|possible_keys|key |key_len|ref |rows   |Extra      |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE     |customer|ALL |NULL         |NULL|NULL   |NULL|148749 |           |
|1 |SIMPLE     |orders  |ALL |NULL         |NULL|NULL   |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
• Scan a 1,493,631-row table 148,749 times
– Consider 1,493,631 * 148,749 row combinations
• Is this query inherently complex?
– We know each customer has his own orders
– size(customer x orders) = size(orders)
– Lower bound is 1,493,631 + 148,749 + costs to match customer<->order.
Using index for join: ref access
alter table orders add index i_o_custkey(o_custkey);
select * from customer, orders where c_custkey=o_custkey
ref access - analysis
select * from customer, orders where c_custkey=o_custkey
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ 
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra| 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ 
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |148749| | 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | | 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ 
● One ref lookup scans 7 rows. 
● In total: 7 * 148,749=1,041,243 rows 
– `orders` has 1.4M rows 
– no redundant reads from `orders` 
● The whole query plan 
– Reads all customers 
– Reads 1M orders (of 1.4M) 
● Efficient!
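The effect of ref access can be imitated with a dictionary that plays the role of index i_o_custkey. This is a sketch under the same toy-data assumptions as a plain nested loop over in-memory lists; the helper names `build_index` and `ref_join` are made up. The last line redoes the slide's estimate of total rows read from `orders`.

```python
from collections import defaultdict

def build_index(orders):
    """Map o_custkey -> list of orders, standing in for index i_o_custkey."""
    index = defaultdict(list)
    for o in orders:
        index[o["o_custkey"]].append(o)
    return index

def ref_join(customers, index):
    """For each customer, read only the matching orders through the index."""
    result, order_rows_read = [], 0
    for c in customers:
        matches = index.get(c["c_custkey"], [])
        order_rows_read += len(matches)
        result.extend((c, o) for o in matches)
    return result, order_rows_read

customers = [{"c_custkey": k} for k in range(1, 4)]
orders = [{"o_orderkey": n, "o_custkey": 1 + n % 3} for n in range(6)]
joined, order_rows_read = ref_join(customers, build_index(orders))
print(len(joined), order_rows_read)

# The slide's estimate: 7 rows per lookup, 148,749 lookups.
estimated_order_rows = 7 * 148_749
print(estimated_order_rows)
```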
Conditions that can be used for ref access
● Can use equalities
– tbl.key=other_table.col
– tbl.key=const
– tbl.key IS NULL
● For multipart keys, will use largest prefix
– keypart1=... AND keypart2=... AND keypartK=...
Conditions that can't be used for ref access
● Doesn't work for non-equalities
t1.key BETWEEN t2.col1 AND t2.col2
● Doesn't work for OR-ed equalities
t1.key=t2.col1 OR t1.key=t2.col2
– Except for ref_or_null
t1.key=... OR t1.key IS NULL
● Doesn't “combine” ref and range access
– t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col
– t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col
Is ref always efficient?
● Efficient, if column has many different values (good)
– Best case – unique index (eq_ref)
● A few different values – not useful (bad)
● Skewed distribution: depends on which part the join touches
ref access estimates - index statistics 
• How many rows will match 
tbl.key_column = $value 
for an arbitrary $value? 
• Index statistics 
show keys from orders where key_name='i_o_custkey' 
*************************** 1. row *************** 
Table: orders 
Non_unique: 1 
Key_name: i_o_custkey 
Seq_in_index: 1 
Column_name: o_custkey 
Collation: A 
Cardinality: 214462 
Sub_part: NULL 
Packed: NULL 
Null: YES 
Index_type: BTREE 
show table status like 'orders' 
*************************** 1. row **** 
Name: orders 
Engine: InnoDB 
Version: 10 
Row_format: Compact 
Rows: 1495152 
Avg_row_length: 133 
Data_length: 199966720 
Max_data_length: 0 
Index_length: 122421248 
Data_free: 6291456 
... 
average = Rows / Cardinality = 1495152 / 214462 = 6.97
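The same average can be computed directly from the two numbers shown in SHOW KEYS and SHOW TABLE STATUS. A minimal sketch; the function name `rows_per_key` is an assumption, not a server API.

```python
def rows_per_key(table_rows, cardinality):
    """Table-wide average number of rows per distinct key value."""
    return table_rows / cardinality

# Rows from SHOW TABLE STATUS, Cardinality from SHOW KEYS (values from the slide).
estimate = rows_per_key(1_495_152, 214_462)
print(round(estimate, 2))
```

This rounded-up average is what shows up as rows=7 in the EXPLAIN output for ref access on i_o_custkey.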
ref access – conclusions 
● Based on t.key=... equality conditions 
● Can make joins very efficient 
● Relies on index statistics for estimates.
Optimizer statistics
● MySQL/Percona Server
– Index statistics
– Persistent/transient InnoDB stats
● MariaDB
– Index statistics, persistent/transient
● Same as Percona Server (via XtraDB)
– Persistent, engine-independent, index-independent statistics.
Index statistics
● Cardinality lets you calculate a table-wide average #rows-per-key-prefix
● It is a statistical value (inexact)
● Exact collection procedure depends on the storage engine
– InnoDB – random sampling
– MyISAM – index scan
– Engine-independent – index scan.
Index statistics in MySQL 5.6 
● Sample [8] random index leaf pages 
● Table statistics (stored) 
– rows - estimated number of rows in a table 
– Other stats not used by optimizer 
● Index statistics (stored) 
– fields - #fields in the index 
– rows_per_key - rows per 1 key value, per prefix fields 
([1 column value], [2 columns value], [3 columns value], …) 
– Other stats not used by optimizer.
Index statistics updates
● Statistics updated when:
– ANALYZE TABLE tbl_name [, tbl_name] …
– SHOW TABLE STATUS, SHOW INDEX
– Access to INFORMATION_SCHEMA.[TABLES|STATISTICS]
– A table is opened for the first time (after server restart)
– A table has changed >10%
– When InnoDB Monitor is turned ON.
Displaying optimizer statistics
● MySQL 5.5, MariaDB 5.3, and older
– Issue SQL statements to count rows/keys
– Indirectly, look at EXPLAIN for simple queries
● MariaDB 5.5, Percona Server 5.5 (using XtraDB)
– information_schema.[innodb_index_stats, innodb_table_stats]
– Read-only, always visible
● MySQL 5.6
– mysql.[innodb_index_stats, innodb_table_stats]
– User updateable
– Only available if innodb_stats_persistent=ON (named innodb_analyze_is_persistent in early 5.6 releases)
● MariaDB 10.0
– Persistent updateable tables mysql.[index_stats, column_stats, table_stats]
– User updateable
– + current XtraDB mechanisms.
Plan [in]stability 
● Statistics may vary a lot (orders) 
MariaDB [dbt3]> select * from information_schema.innodb_index_stats; 
+------------+-----------------+--------------+ +---------------+ 
| table_name | index_name | rows_per_key | | rows_per_key | error (actual) 
+------------+-----------------+--------------+ +---------------+ 
| partsupp | PRIMARY | 3, 1 | | 4, 1 | 25% 
| partsupp | i_ps_partkey | 3, 0 | => | 4, 1 | 25% (4) 
| partsupp | i_ps_suppkey | 64, 0 | | 91, 1 | 30% (80) 
| orders | i_o_orderdate | 9597, 1 | | 1660956, 0 | 99% (6234) 
| orders | i_o_custkey | 15, 1 | | 15, 0 | 0% (15) 
| lineitem | i_l_receiptdate | 7425, 1, 1 | | 6665850, 1, 1 | 99.9% (23477) 
+------------+-----------------+--------------+ +---------------+ 
MariaDB [dbt3]> select * from information_schema.innodb_table_stats; 
+-----------------+----------+ +----------+ 
| table_name | rows | | rows | 
+-----------------+----------+ +----------+ 
| partsupp | 6524766 | | 9101065 | 28% (8000000) 
| orders | 15039855 | ==> | 14948612 | 0.6% (15000000) 
| lineitem | 60062904 | | 59992655 | 0.1% (59986052) 
+-----------------+----------+ +----------+
Controlling statistics (MySQL 5.6)
● Persistent and user-updateable InnoDB statistics
– innodb_stats_persistent = ON (named innodb_analyze_is_persistent in early 5.6 releases),
– updated manually by ANALYZE TABLE or
– automatically by innodb_stats_auto_recalc = ON
● Control the precision of sampling [default 8]
– innodb_stats_persistent_sample_pages,
– innodb_stats_transient_sample_pages
● No new statistics compared to older versions.
Controlling statistics (MariaDB 10.0)
Current XtraDB index statistics
+
● Engine-independent, persistent, user-updateable statistics
● Precise
● Additional statistics per column (even when there is no index):
– min_value, max_value: minimum/maximum value per column
– nulls_ratio: fraction of null values in a column
– avg_length: average size of values in a column
– avg_frequency: average number of rows with the same value.
Join condition pushdown
select * 
from 
customer, orders 
where 
c_custkey=o_custkey and c_acctbal < -500 and 
o_orderpriority='1-URGENT'; 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ 
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ 
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
● Conjunctive (ANDed) conditions are split into parts 
● Each part is attached as early as possible 
– Either as “Using where” 
– Or as table access method.
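The "attach as early as possible" rule can be sketched as follows. This is a simplification for illustration, not the server's implementation: each ANDed condition is tagged by hand with the tables it refers to, and `attach_conditions` (a made-up name) places it at the first table in the join order where all of those tables have been read.

```python
def attach_conditions(join_order, conditions):
    """Attach each ANDed condition to the first table in join_order at which
    every table the condition refers to has already been read."""
    attached = {table: [] for table in join_order}
    for text, tables_used in conditions:
        position = max(join_order.index(t) for t in tables_used)
        attached[join_order[position]].append(text)
    return attached

# The WHERE clause of the example query, split on AND by hand.
conditions = [
    ("c_custkey=o_custkey", {"customer", "orders"}),
    ("c_acctbal < -500", {"customer"}),
    ("o_orderpriority='1-URGENT'", {"orders"}),
]
plan = attach_conditions(["customer", "orders"], conditions)
for table, conds in plan.items():
    print(table, conds)
```

With join order customer, orders, only c_acctbal < -500 is checked at `customer`; the other two parts are attached to `orders`, where c_custkey=o_custkey can also serve as the ref access condition.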
Observing join condition pushdown
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {
        "table": {
          "table_name": "orders",
          "access_type": "ALL",
          "possible_keys": [
            "i_o_custkey"
          ],
          "rows": 1499715,
          "filtered": 100,
          "attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))"
        }
      },
      {
        "table": {
          "table_name": "customer",
          "access_type": "eq_ref",
          "possible_keys": [
            "PRIMARY"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "c_custkey"
          ],
          "key_length": "4",
          "ref": [
            "dbt3sf1.orders.o_custkey"
          ],
          "rows": 1,
          "filtered": 100,
          "attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` < <cache>(-(500)))"
        }
● Before mysql-5.6: EXPLAIN shows only “Using where”
– The condition itself only visible in debug trace
● Starting from 5.6: EXPLAIN FORMAT=JSON shows attached conditions.
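Because the output is JSON, the attached conditions can also be pulled out programmatically. The sketch below works on a trimmed-down, hand-closed copy of the output above with simplified condition strings; the constant `EXPLAIN_JSON` and the function `attached_conditions` are names made up for this example.

```python
import json

# Trimmed-down copy of the EXPLAIN FORMAT=JSON output above
# (closing brackets added, condition strings simplified).
EXPLAIN_JSON = """
{"query_block": {"select_id": 1, "nested_loop": [
  {"table": {"table_name": "orders", "access_type": "ALL",
             "attached_condition": "(o_orderpriority = '1-URGENT')"}},
  {"table": {"table_name": "customer", "access_type": "eq_ref",
             "attached_condition": "(c_acctbal < -500)"}}
]}}
"""

def attached_conditions(explain_text):
    """Return (table_name, access_type, attached_condition) per joined table."""
    plan = json.loads(explain_text)
    rows = []
    for step in plan["query_block"]["nested_loop"]:
        table = step["table"]
        rows.append((table["table_name"], table["access_type"],
                     table.get("attached_condition")))
    return rows

for row in attached_conditions(EXPLAIN_JSON):
    print(row)
```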
Reasoning about join plan efficiency 
select * 
from 
customer, orders 
where 
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ 
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ 
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ 
First table, “customer” 
● type=ALL, 150 K rows 
● select count(*) from customer where c_acctbal < -500 gives 6804. 
● alter table customer add index (c_acctbal)
Reasoning about join plan efficiency
● After alter table customer add index (c_acctbal):
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
|1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
Now, access to 'customer' is efficient.
Reasoning about join plan efficiency 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition| 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where | 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
select * 
from 
customer, orders 
where 
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; 
Second table, “orders” 
● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT' 
● ref access uses only c_custkey=o_custkey 
● What about o_orderpriority='1-URGENT'? 
Selectivity of o_orderpriority='1-URGENT' 
● select count(*) from orders: 1.5M rows 
● select count(*) from orders where o_orderpriority='1-URGENT': 300K rows 
● Selectivity: 300K / 1.5M = 0.2
Reasoning about join plan efficiency 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition| 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where | 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
select * 
from 
customer, orders 
where 
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; 
Second table, “orders” 
● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT' 
● ref access uses only c_custkey=o_custkey 
● What about o_orderpriority='1-URGENT'? Selectivity= 0.2 
– Can examine 7*0.2=1.4 rows, 6802 times if we add an index: 
alter table orders add index (o_custkey, o_orderpriority) 
or 
alter table orders add index (o_orderpriority, o_custkey)
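The arithmetic on this slide can be sketched in a few lines of Python. This is only a back-of-the-envelope model built from the EXPLAIN estimates and count(*) results shown above. The function name estimated_row_reads is made up for illustration, and the model assumes '1-URGENT' orders are spread evenly across customers.

```python
def estimated_row_reads(outer_rows, rows_per_lookup, selectivity=1.0):
    """Rough count of rows read from the inner table of a nested-loop join.

    outer_rows      -- rows coming from the previous table (customer)
    rows_per_lookup -- EXPLAIN `rows` estimate for one ref lookup (orders)
    selectivity     -- fraction of those rows a wider index narrows the lookup to
    """
    return outer_rows * rows_per_lookup * selectivity


# Numbers taken from the slides.
customer_rows = 6802                      # range access on c_acctbal
orders_per_customer = 7                   # ref access on i_o_custkey
urgent_selectivity = 300_000 / 1_500_000  # = 0.2

current_plan = estimated_row_reads(customer_rows, orders_per_customer)
with_two_column_index = estimated_row_reads(
    customer_rows, orders_per_customer, urgent_selectivity
)

print(f"ref on o_custkey only:            {current_plan:,.0f} rows read from orders")
print(f"ref on o_custkey+o_orderpriority: {with_two_column_index:,.0f} rows read from orders")
```

Under these assumptions the two-column index cuts reads from `orders` to about a fifth. The real gain depends on how the data is actually distributed.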
Reasoning about join plan efficiency - summary 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition| 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where | 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
Basic* approach to evaluation of join plan efficiency: 
for each table $T in the join order { 
Look at conditions attached to table $T (condition must 
use table $T, may also use previous tables) 
Does access method used with $T make a good use 
of attached conditions? 
} 
* some other details may also affect join performance 
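The loop above can be written out as a small Python sketch. The PlanStep class and review_plan function are hypothetical names, and the sets of conditions are filled in by hand from the example query. Nothing here is produced by the server.

```python
from dataclasses import dataclass, field


@dataclass
class PlanStep:
    table: str
    access_type: str                                   # EXPLAIN `type`: "ALL", "ref", "range", ...
    attached: set = field(default_factory=set)         # conditions attached to this table
    used_by_access: set = field(default_factory=set)   # conditions the access method uses


def review_plan(steps):
    """For each table in join order, list the attached conditions that the
    access method does not use (they are checked only after a row is read)."""
    report = []
    for step in steps:
        unused = step.attached - step.used_by_access
        report.append((step.table, step.access_type, sorted(unused)))
    return report


# The plan from the slides, before adding a second index on orders.
plan = [
    PlanStep("customer", "range", {"c_acctbal < -500"}, {"c_acctbal < -500"}),
    PlanStep(
        "orders",
        "ref",
        {"c_custkey=o_custkey", "o_orderpriority='1-URGENT'"},
        {"c_custkey=o_custkey"},
    ),
]

for table, access_type, unused in review_plan(plan):
    status = "all attached conditions used" if not unused else f"not used for access: {unused}"
    print(f"{table} ({access_type}): {status}")
```

Any condition reported as unused is a candidate for a new or wider index, as with o_orderpriority in the example.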
Attached conditions 
● Ideally, should be used for table access 
● Not all conditions can be used [at the same time] 
– Unused ones are still useful 
– They reduce number of scans for subsequent tables 
select * 
from 
customer, orders 
where 
c_custkey=o_custkey and c_acctbal < -500 and 
o_orderpriority='1-URGENT'; 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ 
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ 
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
Informing optimizer about attached conditions 
Currently: a range access that's too expensive to use 
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+ 
|id|select_type|table |type|possible_keys |key |key_len|ref |rows |filtered|Extra | 
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+ 
|1 |SIMPLE |customer|ALL |PRIMARY,c_acctbal|NULL |NULL |NULL |150081| 36.22 |Using where| 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | 100.00 |Using where| 
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+ 
explain extended 
select * 
from 
customer, orders 
where 
c_custkey=o_custkey and c_acctbal > 8000 and 
o_orderpriority='1-URGENT'; 
● `orders` will be scanned 150081 * 36.22%= 54359 times 
● This reduces the cost of join 
– Has an effect when comparing potential join plans 
● => Index c_acctbal is not used for access, but it may still help the optimizer.
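The 150081 * 36.22% figure can be reproduced with a short Python sketch. The function name expected_lookups is invented for illustration; the inputs are the `rows` and `filtered` values from the EXPLAIN EXTENDED output above.

```python
def expected_lookups(table_rows, filtered_percent):
    """Rows that pass the attached condition and trigger a lookup in the next table."""
    return table_rows * filtered_percent / 100


customer_rows = 150081      # EXPLAIN `rows` for customer
filtered = 36.22            # EXPLAIN EXTENDED `filtered` for customer

without_estimate = expected_lookups(customer_rows, 100.0)
with_estimate = expected_lookups(customer_rows, filtered)

print(f"no selectivity info: {without_estimate:,.0f} lookups in orders")
print(f"filtered = 36.22%:   {with_estimate:,.0f} lookups in orders")
```

The lower lookup count is what makes this join order look cheaper when the optimizer compares candidate plans.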
Attached condition selectivity 
● Unused indexes provide info about selectivity 
– Works, but very expensive 
● MariaDB 10.0 has engine-independent statistics 
– Index statistics 
– Non-indexed Column statistics 
● Histograms 
– Further info: 
Tomorrow, 2:20 pm @ Ballroom D 
Igor Babaev 
Engine-independent persistent statistics with histograms 
in MariaDB.
How to check if the query plan matches reality 
Check if query plan is realistic 
● EXPLAIN shows what the optimizer expects. It may be wrong 
– Out-of-date index statistics 
– Non-uniform data distribution 
● Other DBMS: EXPLAIN ANALYZE 
● MySQL: no equivalent. Instead, have 
– Handler counters 
– “User statistics” (Percona, MariaDB) 
– PERFORMANCE_SCHEMA
Join analysis: example query (Q18, DBT3) 
<reset counters> 
select c_name, c_custkey, o_orderkey, o_orderdate, 
o_totalprice, sum(l_quantity) 
from customer, orders, lineitem 
where 
o_totalprice > 500000 
and c_custkey = o_custkey 
and o_orderkey = l_orderkey 
group by c_name, c_custkey, o_orderkey, o_orderdate, 
o_totalprice 
order by o_totalprice desc, o_orderdate 
LIMIT 10; 
<collect statistics> 
Join analysis: handler counters (old) 
FLUSH STATUS; 
=> RUN QUERY 
SHOW STATUS LIKE "Handler%"; 
+----------------------------+-------+ 
| Handler_mrr_key_refills | 0 | 
| Handler_mrr_rowid_refills | 0 | 
| Handler_read_first | 0 | 
| Handler_read_key | 1646 | 
| Handler_read_last | 0 | 
| Handler_read_next | 1462 | 
| Handler_read_prev | 0 | 
| Handler_read_rnd | 10 | 
| Handler_read_rnd_deleted | 0 | 
| Handler_read_rnd_next | 184 | 
| Handler_tmp_update | 1096 | 
| Handler_tmp_write | 183 | 
| Handler_update | 0 | 
| Handler_write | 0 |
Join analysis: USERSTAT by Facebook 
MariaDB, Percona Server 
SET GLOBAL USERSTAT=1; 
FLUSH TABLE_STATISTICS; 
FLUSH INDEX_STATISTICS; 
=> RUN QUERY 
SHOW TABLE_STATISTICS; 
+--------------+------------+-----------+--------------+-------------------------+ 
| Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes | 
+--------------+------------+-----------+--------------+-------------------------+ 
| dbt3 | orders | 183 | 0 | 0 | 
| dbt3 | lineitem | 1279 | 0 | 0 | 
| dbt3 | customer | 183 | 0 | 0 | 
+--------------+------------+-----------+--------------+-------------------------+ 
SHOW INDEX_STATISTICS; 
+--------------+------------+-----------------------+-----------+ 
| Table_schema | Table_name | Index_name | Rows_read | 
+--------------+------------+-----------------------+-----------+ 
| dbt3 | customer | PRIMARY | 183 | 
| dbt3 | lineitem | i_l_orderkey_quantity | 1279 | 
| dbt3 | orders | i_o_totalprice | 183 | 
+--------------+------------+-----------------------+-----------+ 
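One way to read these counters is as rows read per row of the driving table. The Python sketch below does that division with the Rows_read values above. It assumes `orders` is the first table in the join order, which the use of i_o_totalprice suggests but the slide does not state, and the helper name rows_per_outer_row is made up.

```python
# Rows_read values copied from the SHOW TABLE_STATISTICS output on the slide.
rows_read = {"orders": 183, "lineitem": 1279, "customer": 183}


def rows_per_outer_row(inner, outer, stats=rows_read):
    """Average number of rows read from `inner` per row read from `outer`."""
    return stats[inner] / stats[outer]


print(f"customer rows per order: {rows_per_outer_row('customer', 'orders'):.2f}")
print(f"lineitem rows per order: {rows_per_outer_row('lineitem', 'orders'):.2f}")
```

These ratios are the measured counterparts of the `rows` column in EXPLAIN for the second and third tables. A large gap between the two is a sign that the plan does not match reality.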
Join analysis: PERFORMANCE SCHEMA 
[MySQL 5.6, MariaDB 10.0] 
● Summary tables with read/write statistics 
– table_io_waits_summary_by_table 
– table_io_waits_summary_by_index_usage 
● Superset of the userstat tables 
● More overhead 
● Not possible to associate statistics with a query 
=> truncate stats tables before running a query 
● Possible bug 
– performance schema not ignored 
– Disable by 
UPDATE setup_consumers SET ENABLED = 'NO' 
where name = 'global_instrumentation';
Analyze joins via PERFORMANCE SCHEMA: 
SHOW TABLE_STATISTICS analogue 
select object_schema, object_name, count_read, count_write, 
sum_timer_read, sum_timer_write, ... 
from table_io_waits_summary_by_table 
where object_schema = 'dbt3' and count_star > 0; 
+---------------+-------------+------------+-------------+----------------+-----------------+ 
| object_schema | object_name | count_read | count_write | sum_timer_read | sum_timer_write | ... 
+---------------+-------------+------------+-------------+----------------+-----------------+ 
| dbt3 | customer | 183 | 0 | 8326528406 | 0 | 
| dbt3 | lineitem | 1462 | 0 | 12117332778 | 0 | 
| dbt3 | orders | 184 | 0 | 7946312812 | 0 | 
+---------------+-------------+------------+-------------+----------------+-----------------+
Analyze joins via PERFORMANCE SCHEMA: 
SHOW INDEX_STATISTICS analogue 
select object_schema, object_name, index_name, count_read, 
sum_timer_read, sum_timer_write, ... 
from table_io_waits_summary_by_index_usage 
where object_schema = 'dbt3' and count_star > 0 
and index_name is not null; 
+---------------+-------------+-----------------------+------------+----------------+-----------------+ 
| object_schema | object_name | index_name | count_read | sum_timer_read | sum_timer_write | ... 
+---------------+-------------+-----------------------+------------+----------------+-----------------+ 
| dbt3 | customer | PRIMARY | 183 | 8326528406 | 0 | 
| dbt3 | lineitem | i_l_orderkey_quantity | 1462 | 12117332778 | 0 | 
| dbt3 | orders | i_o_totalprice | 184 | 7946312812 | 0 | 
+---------------+-------------+-----------------------+------------+----------------+-----------------+
Batched joins 
● Optimization for analytical queries 
● Analytic queries shovel through lots of data 
– e.g. “average size of order in the last month” 
– or “pairs of goods purchased together” 
● Indexes, etc. won't help when you really need to look at all the data 
● More data means greater chance of being io-bound 
● Solution: batched joins
Batched Key Access Idea 
● Non-BKA join hits data at random 
● Caches are not used efficiently 
● Prefetching is not useful
Batched Key Access Idea 
● BKA implementation accesses data in order 
● Takes advantage of caches and prefetching
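The difference between the two access patterns can be illustrated with a toy Python model. It is not how the server implements BKA. It only collects lookup keys into batches, sorts each batch, and counts how often consecutive lookups land on a different page. ROWS_PER_PAGE, BATCH_SIZE and the key range are made-up illustrative values.

```python
import random

ROWS_PER_PAGE = 100          # assumed rows per disk page (illustrative)
BATCH_SIZE = 1000            # assumed number of keys that fit in the join buffer


def page_switches(keys):
    """Count how often consecutive lookups land on a different page."""
    switches = 0
    previous_page = None
    for key in keys:
        page = key // ROWS_PER_PAGE
        if page != previous_page:
            switches += 1
            previous_page = page
    return switches


def batched(keys, batch_size):
    """Collect keys into batches and sort each batch before doing the lookups."""
    ordered = []
    for start in range(0, len(keys), batch_size):
        ordered.extend(sorted(keys[start:start + batch_size]))
    return ordered


rng = random.Random(42)
lookup_keys = [rng.randrange(100_000) for _ in range(10_000)]

regular = page_switches(lookup_keys)
bka_like = page_switches(batched(lookup_keys, BATCH_SIZE))
one_big_batch = page_switches(batched(lookup_keys, len(lookup_keys)))

print(f"regular join:           {regular} page switches")
print(f"batched, 1000 keys:     {bka_like} page switches")
print(f"batched, single buffer: {one_big_batch} page switches")
```

In this model a larger batch means fewer page switches, which is the intuition behind the importance of the buffer size.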
Batched Key access effect 
set join_cache_level=6; 
select max(l_extendedprice) 
from orders, lineitem 
where 
l_orderkey=o_orderkey and 
o_orderdate between $DATE1 and $DATE2 
The benchmark was run with 
● Various BKA buffer sizes 
● Various sizes of the $DATE1...$DATE2 range
Batched Key Access Performance 
[Chart: BKA join performance depending on buffer size. X axis: buffer size, bytes. Y axis: query time, sec. Series: query_size=1, 2, 3, each as regular and BKA join. Callouts: performance without BKA; performance with BKA, given sufficient buffer size.] 
● 4x-10x speedup 
● The more the data, the bigger the speedup 
● Buffer size setting is very important.
Batched Key Access settings 
● Needs to be turned on 
set join_buffer_size= 32*1024*1024; 
set join_cache_level=6; -- MariaDB 
set optimizer_switch='batched_key_access=on'; -- MySQL 5.6 
set optimizer_switch='mrr=on'; 
set optimizer_switch='mrr_sort_keys=on'; -- MariaDB only 
● Further join_buffer_size tuning: watch 
– Query performance 
– Handler_mrr_init counter 
and increase join_buffer_size until either saturates.
Batched Key Access - conclusions 
● Targeted at big joins 
● Needs to be enabled manually 
● @@join_buffer_size is the most important 
setting 
● MariaDB's implementation is a superset of 
MySQL's.
ORDER BY 
aggregates 
GROUP BY
Aggregate functions, no GROUP BY 
● COUNT, SUM, AVG, etc. need to examine all rows 
select SUM(column) from tbl needs to examine the whole tbl. 
● MIN and MAX can use an index for lookup 
With index (o_orderdate): 
select max(o_orderdate) from orders 
select min(o_orderdate) from orders where o_orderdate > '1995-05-01' 
With index (o_orderpriority, o_orderdate): 
select max(o_orderdate) from orders where o_orderpriority='1-URGENT' 
EXPLAIN then shows: 
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+ 
|id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra | 
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+ 
|1 |SIMPLE |NULL |NULL|NULL |NULL|NULL |NULL|NULL|Select tables optimized away| 
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
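Why MIN and MAX need only a lookup can be shown with a sorted Python list standing in for index (o_orderdate). This is an analogy, not the server's code. The dates are invented, and max_value and min_greater_than are hypothetical helper names.

```python
from bisect import bisect_right

# A sorted list stands in for index (o_orderdate); the dates are made up.
o_orderdate_index = sorted([
    "1994-11-03", "1995-02-17", "1995-05-01", "1995-05-02",
    "1996-01-09", "1997-08-30", "1998-08-02",
])


def max_value(index):
    """select max(col): read the last entry of the index."""
    return index[-1]


def min_greater_than(index, bound):
    """select min(col) where col > bound: one lookup just past `bound`."""
    pos = bisect_right(index, bound)
    return index[pos] if pos < len(index) else None


print(max_value(o_orderdate_index))
print(min_greater_than(o_orderdate_index, "1995-05-01"))
```

Neither function walks the whole list, whereas SUM or AVG over the same column would have to visit every entry.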
ORDER BY … LIMIT 
Three algorithms 
● Use an index to read in order 
● Read one table, sort, join - “Using filesort” 
● Execute join into temporary table and then 
sort - “Using temporary; Using filesort” 
Using index to read data in order 
● No special indication in EXPLAIN output 
● LIMIT n: as soon as we read n records, we can stop!
A problem with LIMIT N optimization 
`orders` has 1.5 M rows 
explain select * from orders order by o_orderdate desc limit 10; 
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+ 
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra| 
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+ 
|1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 | | 
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+ 
explain select * from orders where o_orderpriority='1-URGENT' order by o_orderdate desc limit 10; 
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+ 
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | 
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+ 
|1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 |Using where| 
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+ 
● A problem: 
– 1.5M rows, 300K of them 'URGENT' 
– Scanning by date, when will we find 10 'URGENT' rows? 
– No good solution so far.
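How many rows the date-ordered scan reads before it finds 10 'URGENT' rows depends entirely on where those rows sit in the index. The Python sketch below bounds the answer using the row counts from this slide. The two function names are made up, and the average case assumes matching rows are spread evenly by date.

```python
def expected_rows_scanned(limit, selectivity):
    """Average rows read before `limit` matches are found, assuming matching
    rows are spread evenly through the index order."""
    return limit / selectivity


def worst_case_rows_scanned(limit, total_rows, matching_rows):
    """Rows read if every non-matching row happens to come first in the scan."""
    return (total_rows - matching_rows) + limit


total_rows = 1_500_000
urgent_rows = 300_000
limit = 10

average = expected_rows_scanned(limit, urgent_rows / total_rows)
worst = worst_case_rows_scanned(limit, total_rows, urgent_rows)

print(f"evenly spread: about {average:,.0f} rows scanned")
print(f"worst case:    {worst:,} rows scanned")
```

The `rows` value of 10 in EXPLAIN reflects neither figure, which is why this plan is hard to cost.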
Using filesort strategy 
● Have to read the entire first table 
● For the remaining tables, can apply LIMIT n 
● ORDER BY can only use columns of the first table (tbl1).
Using temporary; Using filesort 
● ORDER BY clause can use columns of any table 
● LIMIT is applied only after executing the entire join and sorting.
ORDER BY - conclusions 
● Resolving ORDER BY with index allows very 
efficient handling for LIMIT 
– Optimization for 
WHERE unused_condition ORDER BY … LIMIT n 
is challenging. 
● Use sql_big_result, IGNORE INDEX FOR ORDER BY 
● Using filesort 
– Needs all ORDER BY columns in the first table 
– Take advantage of LIMIT when doing join to non-first tables 
● Using temporary; Using filesort is least efficient.
GROUP BY strategies 
There are three strategies 
● Ordered index scan 
● Loose Index Scan (LooseScan) 
● Groups table 
(Using temporary; [Using filesort]).
Ordered index scan 
● Groups are enumerated one after another 
● Can compute aggregates on the fly 
● Loose index scan is also able to jump to next group.
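The contrast between reading groups in index order and collecting them in a temporary table can be sketched in Python. This is an analogy under simple assumptions, not the server's algorithm. The rows and both function names are invented for illustration.

```python
from itertools import groupby
from operator import itemgetter

# (o_orderpriority, o_totalprice) rows; values are made up for illustration.
rows = [
    ("1-URGENT", 100.0), ("2-HIGH", 50.0), ("1-URGENT", 25.0),
    ("3-MEDIUM", 10.0), ("2-HIGH", 70.0),
]


def group_by_ordered_scan(sorted_rows):
    """Input arrives in group order (as from an index): each group is finished
    before the next one starts, so only one running total is kept at a time."""
    return [
        (key, sum(price for _, price in group))
        for key, group in groupby(sorted_rows, key=itemgetter(0))
    ]


def group_by_temp_table(unsorted_rows):
    """Input arrives in any order: a lookup table keyed by group holds one
    running total per group until all rows have been read."""
    totals = {}
    for key, price in unsorted_rows:
        totals[key] = totals.get(key, 0.0) + price
    return sorted(totals.items())


index_order = sorted(rows, key=itemgetter(0))
print(group_by_ordered_scan(index_order))
print(group_by_temp_table(rows))
```

Both produce the same totals. The ordered scan can emit each group as soon as it ends, while the temporary-table variant must read every row first and then sort its output.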
Execution of GROUP BY with temptable 
Subqueries 
Subquery optimizations 
● Before MariaDB 5.3/MySQL 5.6 - “don't use subqueries” 
● Queries that caused most of the pain 
– SELECT … FROM tbl WHERE col IN (SELECT …) - semi-joins 
– SELECT … FROM (SELECT …) - derived tables 
● MariaDB 5.3 and MySQL 5.6 
– Share a common ancestor, MySQL 6.0 alpha 
– Huge (100x, 1000x) speedups for painful areas 
– Other kinds of subqueries received a speedup, too 
– MariaDB 5.3/5.5 has a superset of MySQL 5.6's optimizations 
● 5.6 handles some un-handled edge cases, too
Tuning for subqueries 
● “Before”: one execution strategy 
– No tuning possible 
● “After”: similar to joins 
– Reasonable execution strategies supported 
– Need indexes 
– Need selective conditions 
– Support batching in most important cases 
● Should be better 9x% of the time.
What if it still picks a poor query plan? 
For both MariaDB and MySQL: 
● Check EXPLAIN [EXTENDED], find a keyword around a subquery table 
● Google “site:kb.askmonty.org $subquery_keyword” 
or https://kb.askmonty.org/en/subquery-optimizations-map/ 
● Find which optimization it was 
● set optimizer_switch='$subquery_optimization=off'
Thanks! 
Q & A

More Related Content

What's hot

ANALYZE for executable statements - a new way to do optimizer troubleshooting...
ANALYZE for executable statements - a new way to do optimizer troubleshooting...ANALYZE for executable statements - a new way to do optimizer troubleshooting...
ANALYZE for executable statements - a new way to do optimizer troubleshooting...Sergey Petrunya
 
Percona live-2012-optimizer-tuning
Percona live-2012-optimizer-tuningPercona live-2012-optimizer-tuning
Percona live-2012-optimizer-tuningSergey Petrunya
 
New features in Performance Schema 5.7 in action
New features in Performance Schema 5.7 in actionNew features in Performance Schema 5.7 in action
New features in Performance Schema 5.7 in actionSveta Smirnova
 
MySQL Query tuning 101
MySQL Query tuning 101MySQL Query tuning 101
MySQL Query tuning 101Sveta Smirnova
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingSveta Smirnova
 
Introducing new SQL syntax and improving performance with preparse Query Rewr...
Introducing new SQL syntax and improving performance with preparse Query Rewr...Introducing new SQL syntax and improving performance with preparse Query Rewr...
Introducing new SQL syntax and improving performance with preparse Query Rewr...Sveta Smirnova
 
Introduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sIntroduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sSveta Smirnova
 
Troubleshooting MySQL Performance
Troubleshooting MySQL PerformanceTroubleshooting MySQL Performance
Troubleshooting MySQL PerformanceSveta Smirnova
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingSveta Smirnova
 
Optimizing Queries with Explain
Optimizing Queries with ExplainOptimizing Queries with Explain
Optimizing Queries with ExplainMYXPLAIN
 
MariaDB 10.0 Query Optimizer
MariaDB 10.0 Query OptimizerMariaDB 10.0 Query Optimizer
MariaDB 10.0 Query OptimizerSergey Petrunya
 
Advanced MySQL Query and Schema Tuning
Advanced MySQL Query and Schema TuningAdvanced MySQL Query and Schema Tuning
Advanced MySQL Query and Schema TuningMYXPLAIN
 
Performance Schema for MySQL Troubleshooting
 Performance Schema for MySQL Troubleshooting Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingSveta Smirnova
 
Applied Partitioning And Scaling Your Database System Presentation
Applied Partitioning And Scaling Your Database System PresentationApplied Partitioning And Scaling Your Database System Presentation
Applied Partitioning And Scaling Your Database System PresentationRichard Crowley
 
Character Encoding - MySQL DevRoom - FOSDEM 2015
Character Encoding - MySQL DevRoom - FOSDEM 2015Character Encoding - MySQL DevRoom - FOSDEM 2015
Character Encoding - MySQL DevRoom - FOSDEM 2015mushupl
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace WalkthroughSergey Petrunya
 
Moving to the NoSQL side: MySQL JSON functions
 Moving to the NoSQL side: MySQL JSON functions Moving to the NoSQL side: MySQL JSON functions
Moving to the NoSQL side: MySQL JSON functionsSveta Smirnova
 
Introduction into MySQL Query Tuning
Introduction into MySQL Query TuningIntroduction into MySQL Query Tuning
Introduction into MySQL Query TuningSveta Smirnova
 
Efficient Pagination Using MySQL
Efficient Pagination Using MySQLEfficient Pagination Using MySQL
Efficient Pagination Using MySQLEvan Weaver
 

What's hot (20)

ANALYZE for executable statements - a new way to do optimizer troubleshooting...
ANALYZE for executable statements - a new way to do optimizer troubleshooting...ANALYZE for executable statements - a new way to do optimizer troubleshooting...
ANALYZE for executable statements - a new way to do optimizer troubleshooting...
 
Percona live-2012-optimizer-tuning
Percona live-2012-optimizer-tuningPercona live-2012-optimizer-tuning
Percona live-2012-optimizer-tuning
 
New features in Performance Schema 5.7 in action
New features in Performance Schema 5.7 in actionNew features in Performance Schema 5.7 in action
New features in Performance Schema 5.7 in action
 
Explain
ExplainExplain
Explain
 
MySQL Query tuning 101
MySQL Query tuning 101MySQL Query tuning 101
MySQL Query tuning 101
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL Troubleshooting
 
Introducing new SQL syntax and improving performance with preparse Query Rewr...
Introducing new SQL syntax and improving performance with preparse Query Rewr...Introducing new SQL syntax and improving performance with preparse Query Rewr...
Introducing new SQL syntax and improving performance with preparse Query Rewr...
 
Introduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sIntroduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]s
 
Troubleshooting MySQL Performance
Troubleshooting MySQL PerformanceTroubleshooting MySQL Performance
Troubleshooting MySQL Performance
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL Troubleshooting
 
Optimizing Queries with Explain
Optimizing Queries with ExplainOptimizing Queries with Explain
Optimizing Queries with Explain
 
MariaDB 10.0 Query Optimizer
MariaDB 10.0 Query OptimizerMariaDB 10.0 Query Optimizer
MariaDB 10.0 Query Optimizer
 
Advanced MySQL Query and Schema Tuning
Advanced MySQL Query and Schema TuningAdvanced MySQL Query and Schema Tuning
Advanced MySQL Query and Schema Tuning
 
Performance Schema for MySQL Troubleshooting
 Performance Schema for MySQL Troubleshooting Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL Troubleshooting
 
Applied Partitioning And Scaling Your Database System Presentation
Applied Partitioning And Scaling Your Database System PresentationApplied Partitioning And Scaling Your Database System Presentation
Applied Partitioning And Scaling Your Database System Presentation
 
Character Encoding - MySQL DevRoom - FOSDEM 2015
Character Encoding - MySQL DevRoom - FOSDEM 2015Character Encoding - MySQL DevRoom - FOSDEM 2015
Character Encoding - MySQL DevRoom - FOSDEM 2015
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace Walkthrough
 
Moving to the NoSQL side: MySQL JSON functions
 Moving to the NoSQL side: MySQL JSON functions Moving to the NoSQL side: MySQL JSON functions
Moving to the NoSQL side: MySQL JSON functions
 
Introduction into MySQL Query Tuning
Introduction into MySQL Query TuningIntroduction into MySQL Query Tuning
Introduction into MySQL Query Tuning
 
Efficient Pagination Using MySQL
Efficient Pagination Using MySQLEfficient Pagination Using MySQL
Efficient Pagination Using MySQL
 

Similar to Advanced Query Optimizer Tuning and Analysis

Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012Roland Bouman
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012Roland Bouman
 
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)Valeriy Kravchuk
 
Highload Perf Tuning
Highload Perf TuningHighload Perf Tuning
Highload Perf TuningHighLoad2009
 
How to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with GaleraHow to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with GaleraSveta Smirnova
 
What is new in PostgreSQL 14?
What is new in PostgreSQL 14?What is new in PostgreSQL 14?
What is new in PostgreSQL 14?Mydbops
 
MySQL Performance Schema in 20 Minutes
 MySQL Performance Schema in 20 Minutes MySQL Performance Schema in 20 Minutes
MySQL Performance Schema in 20 MinutesSveta Smirnova
 
OSMC 2008 | Monitoring MySQL by Geert Vanderkelen
OSMC 2008 | Monitoring MySQL by Geert VanderkelenOSMC 2008 | Monitoring MySQL by Geert Vanderkelen
OSMC 2008 | Monitoring MySQL by Geert VanderkelenNETWAYS
 
MariaDB 10.4 New Features
MariaDB 10.4 New FeaturesMariaDB 10.4 New Features
MariaDB 10.4 New FeaturesFromDual GmbH
 
IT Tage 2019 MariaDB 10.4 New Features
IT Tage 2019 MariaDB 10.4 New FeaturesIT Tage 2019 MariaDB 10.4 New Features
IT Tage 2019 MariaDB 10.4 New FeaturesFromDual GmbH
 
Need for Speed: MySQL Indexing
Need for Speed: MySQL IndexingNeed for Speed: MySQL Indexing
Need for Speed: MySQL IndexingMYXPLAIN
 
LVOUG meetup #4 - Case Study 10g to 11g
LVOUG meetup #4 - Case Study 10g to 11gLVOUG meetup #4 - Case Study 10g to 11g
LVOUG meetup #4 - Case Study 10g to 11gMaris Elsins
 
DB Floripa - ProxySQL para MySQL
DB Floripa - ProxySQL para MySQLDB Floripa - ProxySQL para MySQL
DB Floripa - ProxySQL para MySQLMarcelo Altmann
 
Adaptive Query Optimization
Adaptive Query OptimizationAdaptive Query Optimization
Adaptive Query OptimizationAnju Garg
 
MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015Dave Stokes
 
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015Dave Stokes
 

Similar to Advanced Query Optimizer Tuning and Analysis (20)

Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
 
Perf Tuning Short
Perf Tuning ShortPerf Tuning Short
Perf Tuning Short
 
Highload Perf Tuning
Highload Perf TuningHighload Perf Tuning
Highload Perf Tuning
 
How to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with GaleraHow to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with Galera
 
What is new in PostgreSQL 14?
What is new in PostgreSQL 14?What is new in PostgreSQL 14?
What is new in PostgreSQL 14?
 
MySQL Performance Schema in 20 Minutes
 MySQL Performance Schema in 20 Minutes MySQL Performance Schema in 20 Minutes
MySQL Performance Schema in 20 Minutes
 
OSMC 2008 | Monitoring MySQL by Geert Vanderkelen
OSMC 2008 | Monitoring MySQL by Geert VanderkelenOSMC 2008 | Monitoring MySQL by Geert Vanderkelen
OSMC 2008 | Monitoring MySQL by Geert Vanderkelen
 
MariaDB 10.4 New Features
MariaDB 10.4 New FeaturesMariaDB 10.4 New Features
MariaDB 10.4 New Features
 
IT Tage 2019 MariaDB 10.4 New Features
IT Tage 2019 MariaDB 10.4 New FeaturesIT Tage 2019 MariaDB 10.4 New Features
IT Tage 2019 MariaDB 10.4 New Features
 
MySQLinsanity
MySQLinsanityMySQLinsanity
MySQLinsanity
 
Need for Speed: MySQL Indexing
Need for Speed: MySQL IndexingNeed for Speed: MySQL Indexing
Need for Speed: MySQL Indexing
 
LVOUG meetup #4 - Case Study 10g to 11g
LVOUG meetup #4 - Case Study 10g to 11gLVOUG meetup #4 - Case Study 10g to 11g
LVOUG meetup #4 - Case Study 10g to 11g
 
DB Floripa - ProxySQL para MySQL
DB Floripa - ProxySQL para MySQLDB Floripa - ProxySQL para MySQL
DB Floripa - ProxySQL para MySQL
 
Adaptive Query Optimization
Adaptive Query OptimizationAdaptive Query Optimization
Adaptive Query Optimization
 
Mysql tracing
Mysql tracingMysql tracing
Mysql tracing
 
Mysql tracing
Mysql tracingMysql tracing
Mysql tracing
 
MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015
 
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
 

Advanced Query Optimizer Tuning and Analysis

  • 1. Advanced query optimizer tuning and analysis Sergei Petrunia Timour Katchaounov Monty Program Ab MySQL Conference And Expo 2013
  • 2. ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
  • 3. Is there a problem with the query optimizer? • Database performance is affected by many factors • One of them is the query optimizer • Is my performance problem caused by the optimizer?
  • 4. Signs that there is a query optimizer problem • Some (not all) queries are slow • A query seems to run longer than it ought to – And examines more records than it ought to • Usually, the query remains slow regardless of other activity on the server
  • 5. Catching slow queries, the old ways ● Watch the Slow query log – Percona Server/MariaDB: --log_slow_verbosity=query_plan # Thread_id: 1 Schema: dbt3sf10 QC_hit: No # Query_time: 2.452373 Lock_time: 0.000113 Rows_sent: 0 Rows_examined: 1500000 # Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No # Filesort: No Filesort_on_disk: No Merge_passes: 0 SET timestamp=1333385770; select * from customer where c_acctbal < -1000; – Run pt-query-digest on the log • Run SHOW PROCESSLIST periodically
  • 6. The new way: SHOW PROCESSLIST + SHOW EXPLAIN • Available in MariaDB 10.0+ • Displays EXPLAIN of a running statement MariaDB> show processlist; +--+----+---------+-------+-------+----+------------+-------------------------... |Id|User|Host |db |Command|Time|State |Info +--+----+---------+-------+-------+----+------------+-------------------------... | 1|root|localhost|dbt3sf1|Query | 10|Sending data|select max(o_totalprice) ... | 2|root|localhost|dbt3sf1|Query | 0|init |show processlist +--+----+---------+-------+-------+----+------------+-------------------------... MariaDB> show explain for 1; +--+-----------+------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |orders|ALL |NULL |NULL|NULL |NULL|1498194|Using where| +--+-----------+------+----+-------------+----+-------+----+-------+-----------+ MariaDB [dbt3sf1]> show warnings; +-----+----+-----------------------------------------------------------------+ |Level|Code|Message | +-----+----+-----------------------------------------------------------------+ |Note |1003|select max(o_totalprice) from orders where year(o_orderDATE)=1995| +-----+----+-----------------------------------------------------------------+ 6 07:48:08 AM
  • 7. SHOW EXPLAIN usage ● Intended usage – SHOW PROCESSLIST ... – SHOW EXPLAIN FOR ... ● Why not just run EXPLAIN again? – Difficult to replicate setups ● Temporary tables ● Optimizer settings ● Storage engine's index statistics ● ... – With SHOW EXPLAIN there is no uncertainty about whether you're looking at the same query plan or not.
  • 8. Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● use performance_schema ● Many ways to analyze via queries – events_statements_summary_by_digest ● count_star, sum_timer_wait, min_timer_wait, avg_timer_wait, max_timer_wait ● digest_text, digest ● sum_rows_examined, sum_created_tmp_disk_tables, sum_select_full_join – events_statements_history ● sql_text, digest_text, digest ● timer_start, timer_end, timer_wait ● rows_examined, created_tmp_disk_tables, select_full_join
  • 9. Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] • Modified Q18 from DBT3 select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > ? and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate LIMIT 10; • App executes Q18 many times with ? = 550000, 500000, 400000, ...
  • 10. Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● Find candidate slow queries ● Simple tests: select_full_join > 0, created_tmp_disk_tables > 0, etc. ● Complex conditions: max execution time > X sec OR min/max time vary a lot: select max_timer_wait/avg_timer_wait as max_ratio, avg_timer_wait/min_timer_wait as min_ratio from events_statements_summary_by_digest where max_timer_wait > 1000000000000 or max_timer_wait / avg_timer_wait > 2 or avg_timer_wait / min_timer_wait > 2\G
  • 11. Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] *************************** 5. row *************************** DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem` WHERE `o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey` GROUP BY `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` ORDER BY `o_totalprice` DESC , `o_orderdate` LIMIT ? COUNT_STAR: 3 SUM_TIMER_WAIT: 3251758347000 MIN_TIMER_WAIT: 3914209000 → 0.0039 sec AVG_TIMER_WAIT: 1083919449000 MAX_TIMER_WAIT: 3204044053000 → 3.2 sec SUM_LOCK_TIME: 555000000 SUM_ROWS_SENT: 25 SUM_ROWS_EXAMINED: 0 SUM_CREATED_TMP_DISK_TABLES: 0 SUM_CREATED_TMP_TABLES: 3 SUM_SELECT_FULL_JOIN: 0 SUM_SELECT_RANGE: 3 SUM_SELECT_SCAN: 0 SUM_SORT_RANGE: 0 SUM_SORT_ROWS: 25 SUM_SORT_SCAN: 3 SUM_NO_INDEX_USED: 0 SUM_NO_GOOD_INDEX_USED: 0 FIRST_SEEN: 1970-01-01 03:38:27 LAST_SEEN: 1970-01-01 03:38:43 max_ratio: 2.9560 min_ratio: 276.9192 • High variance of execution time
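The timer columns in the digest row above are in picoseconds, which is why the slide annotates them with seconds. As a rough sketch (the dictionary and helper names are illustrative, not part of Performance Schema), the conversion and the two ratios used by the filter query can be reproduced like this:

```python
# Values copied from the digest row on the slide.
# Performance Schema timer columns are expressed in picoseconds.
PICOSECONDS_PER_SECOND = 10**12

row = {
    "COUNT_STAR": 3,
    "SUM_TIMER_WAIT": 3251758347000,
    "MIN_TIMER_WAIT": 3914209000,
    "AVG_TIMER_WAIT": 1083919449000,
    "MAX_TIMER_WAIT": 3204044053000,
}


def to_seconds(picoseconds):
    """Convert a picosecond timer value to seconds."""
    return picoseconds / PICOSECONDS_PER_SECOND


min_sec = to_seconds(row["MIN_TIMER_WAIT"])
max_sec = to_seconds(row["MAX_TIMER_WAIT"])

# Same expressions as in the filter query on the previous slide.
max_ratio = row["MAX_TIMER_WAIT"] / row["AVG_TIMER_WAIT"]
min_ratio = row["AVG_TIMER_WAIT"] / row["MIN_TIMER_WAIT"]

# A digest is a candidate if it is slow in absolute terms (> 1 sec)
# or if its execution time varies a lot between runs.
is_candidate = (
    row["MAX_TIMER_WAIT"] > PICOSECONDS_PER_SECOND
    or max_ratio > 2
    or min_ratio > 2
)

print(f"min={min_sec:.4f}s max={max_sec:.4f}s "
      f"max_ratio={max_ratio:.4f} min_ratio={min_ratio:.4f} "
      f"candidate={is_candidate}")
```

Here the large min_ratio is the interesting signal: the same statement digest runs in milliseconds for some constants and in seconds for others.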
  • 12. Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● Check the actual queries and constants ● The events_statements_history table select timer_wait/1000000000000 as exec_time, sql_text from events_statements_history where digest in (select digest from events_statements_summary_by_digest where max_timer_wait > 1000000000000 or max_timer_wait / avg_timer_wait > 2 or avg_timer_wait / min_timer_wait > 2) order by timer_wait;
  • 13. Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] +-----------+-----------------------------------------------------------------------------------+ | exec_time | sql_text | +-----------+-----------------------------------------------------------------------------------+ | 0.0039 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10 | | 0.0438 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 500000 and c_custkey = o_custkey ... LIMIT 10 | | 3.2040 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 400000 and c_custkey = o_custkey ... LIMIT 10 | +-----------+-----------------------------------------------------------------------------------+ Observation: orders.o_totalprice > ? is less and less selective 13 07:48:08 AM
  • 14. Actions after finding the slow query • Bad query plan – Rewrite the query – Force a good query plan • Bad optimizer settings – Do tuning • Query is inherently complex – Don't waste time with it – Look for other solutions.
  • 15. ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
  • 16. Consider a simple select select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' ● Run the query: 19 rows in set (7.65 sec) ● Check the query plan: +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ • 15M rows were scanned, 19 rows in output • Query plan seems inefficient – (note: this logic doesn't directly apply to group/order by queries).
  • 17. Query plan analysis select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ • Entire table is scanned • WHERE condition checked after records are read – Not used to limit #examined rows.
  • 18. Let's add an index alter table orders add key i_o_orderdate (o_orderdate); select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where| +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ ● Query time: 19 rows in set (0.76 sec) • Outcome – Down to reading 300K rows – Still, 300K >> 19 rows.
  • 19. Finding out which indexes to add select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' Check selectivity of conditions that will use the index ● index (o_orderdate) select count(*) from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'; 306322 rows ● index (o_clerk) select count(*) from orders where o_clerk='Clerk#000009506' 1507 rows.
  • 20. Try adding composite indexes ● index (o_clerk, o_orderdate) +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ |1 |SIMPLE |orders|range|i_o_clerk_...|i_o_clerk_date|20 |NULL|19 |Using where| +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ Bingo! 100% efficiency ● index (o_orderdate, o_clerk) +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_date_c...|i_o_date_clerk|20 |NULL|360354|Using where| +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ Much worse! • If a condition uses multiple columns, a composite index will be most efficient • Order of columns matters – Explanation why is outside the scope of this tutorial. Covered in last year's tutorial
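To see why the column order matters, a sorted list of tuples can stand in for a composite B-tree index. This is only a toy model with made-up data (50 clerks, one order per clerk per day of 1992), not how the server stores indexes: with the equality column first, one contiguous slice holds exactly the matching entries; with the range column first, the whole date range has to be walked and filtered.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta

# Hypothetical data set: every clerk has exactly one order on every day of 1992.
clerks = [f"Clerk#{n:03d}" for n in range(50)]
days = [date(1992, 1, 1) + timedelta(days=d) for d in range(366)]

index_clerk_date = sorted((c, d) for c in clerks for d in days)
index_date_clerk = sorted((d, c) for c in clerks for d in days)

clerk = "Clerk#007"
low, high = date(1992, 6, 6), date(1992, 7, 6)


def scan_clerk_date(index, clerk, low, high):
    """Equality column first: both bounds use the full key, one tight slice."""
    start = bisect_left(index, (clerk, low))
    end = bisect_right(index, (clerk, high))
    return index[start:end]


def scan_date_clerk(index, clerk, low, high):
    """Range column first: only the date bounds limit the scan,
    the clerk condition is checked on every scanned entry."""
    start = bisect_left(index, (low,))
    end = bisect_left(index, (high + timedelta(days=1),))
    scanned = index[start:end]
    matches = [entry for entry in scanned if entry[1] == clerk]
    return scanned, matches


scanned_cd = scan_clerk_date(index_clerk_date, clerk, low, high)
scanned_dc, matches_dc = scan_date_clerk(index_date_clerk, clerk, low, high)

print("index (clerk, date): scanned", len(scanned_cd), "entries")
print("index (date, clerk): scanned", len(scanned_dc),
      "entries for", len(matches_dc), "matches")
```

Both variants return the same rows; the difference is only in how many index entries are touched, which is what the `rows` column in the two EXPLAIN outputs reflects.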
  • 21. Conditions must be in SARGable form • Condition must represent a range • It must have a form that is recognized by the optimizer • Usable: o_orderDate BETWEEN '1992-06-01' and '1992-06-30' • Not usable: year(o_orderDate)=1992 and month(o_orderdate)=6 • Not usable: TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06') • Usable: o_clerk='Clerk#000009506' • Usable: o_clerk LIKE 'Clerk#000009506' • Not usable: o_clerk LIKE '%Clerk#000009506%' • Usable: column IN (1,10,15,21, ...) • Not usable (as of MySQL 5.6): (col1, col2) IN ( (1,1), (2,2), (3,3), …).
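A condition that wraps the column in date functions to select June 1992 can usually be rewritten by hand into a plain range on the column. The helper below is a hypothetical illustration of that rewrite (the function names are assumptions made for this sketch); it only builds the text of an equivalent BETWEEN condition.

```python
import calendar
from datetime import date


def month_range(year, month):
    """Return the first and last day of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def sargable_month_condition(column, year, month):
    """Build a range condition equivalent to filtering `column`
    by year and month, but without applying functions to the column."""
    first, last = month_range(year, month)
    return f"{column} BETWEEN '{first.isoformat()}' AND '{last.isoformat()}'"


condition = sargable_month_condition("o_orderDate", 1992, 6)
print(condition)
```

The rewritten condition has the same form as the first example on the slide, so an index on the date column can be used for range access.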
  • 22. New in MySQL-5.6: optimizer_trace ● Lets you see the ranges set optimizer_trace=1; explain select * from orders where o_orderDATE between '1992-06-01' and '1992-07-03' and o_orderdate not in ('1992-01-01', '1992-06-12','1992-07-04'); select * from information_schema.optimizer_trace\G ● Will print a big JSON struct ● Search for range_scan_alternatives.
  • 23. New in MySQL-5.6: optimizer_trace ... "range_scan_alternatives": [ { "index": "i_o_orderdate", "ranges": [ "1992-06-01 <= o_orderDATE < 1992-06-12", "1992-06-12 < o_orderDATE <= 1992-07-03" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 319082, "cost": 382900, "chosen": true }, { "index": "i_o_date_clerk", "ranges": [ "1992-06-01 <= o_orderDATE < 1992-06-12", "1992-06-12 < o_orderDATE <= 1992-07-03" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 406336, "cost": 487605, "chosen": false, "cause": "cost" } ], ... ● Considered ranges are shown in the range_scan_alternatives section ● This is actually the original use case of optimizer_trace ● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug) ● Still, very useful.
  • 24. Source of #rows estimates for range select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where| +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ • Where does rows=306322 come from? • “records_in_range” estimate • Done by diving into the index • Usually fairly accurate • Not affected by ANALYZE TABLE.
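The idea behind a records_in_range estimate can be sketched with two binary searches over a sorted list of keys: locate both ends of the range and take the distance between them. This is a simplified model on hypothetical data; a real storage engine dives into B-tree pages and may return an approximation rather than an exact count.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta


def records_in_range(index_keys, low, high):
    """Count keys with low <= key <= high in a sorted list
    by locating both range endpoints (two 'dives')."""
    return bisect_right(index_keys, high) - bisect_left(index_keys, low)


# Hypothetical index on an order date column: three orders per day in 1992.
first_day = date(1992, 1, 1)
index_keys = sorted(
    first_day + timedelta(days=d) for d in range(366) for _ in range(3)
)

estimate = records_in_range(index_keys, date(1992, 6, 6), date(1992, 7, 6))
print("rows in range:", estimate)
```

Because the answer comes from the index itself at optimization time, it does not depend on stored statistics, which matches the slide's note that ANALYZE TABLE does not affect it.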
  • 25. Simple selects: conclusions • Efficiency == “#rows_scanned is close to #rows_returned” • Indexes and WHERE conditions reduce #rows scanned • Index estimates are usually accurate • Multi-column indexes – “handle” conditions on multiple columns – Order of columns in the index matters • optimizer_trace lets you view the ranges – But misrepresents ranges over multi-column indexes.
  • 26. Now, we will skip some topics. One can also speed up simple selects with ● index_merge access method ● index access method ● Index Condition Pushdown We don't have time for these now; check out last year's tutorial.
  • 27. ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
  • 28. A simple join select * from customer, orders where c_custkey=o_custkey • “Customers with their orders”
  • 29. Execution: Nested Loops join select * from customer, orders where c_custkey=o_custkey for each customer C { for each order O { if (C.c_custkey == O.o_custkey) produce record(C, O); } } • Complexity: – Scans table customer – For each record in customer, scans table orders • Is this ok?
  • 30. Execution: Nested loops join (2) select * from customer, orders where c_custkey=o_custkey for each customer C { for each order O { if (C.c_custkey == O.o_custkey) produce record(C, O); } } • EXPLAIN: +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | | |1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where| +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
  • 31. Execution: Nested loops join (3) select * from customer, orders where c_custkey=o_custkey for each customer C { for each order O { if (C.c_custkey == O.o_custkey) produce record(C, O); } } • EXPLAIN: +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | | |1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where| +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ • rows on the customer line: rows to read from customer • rows on the orders line: rows to read from orders • “Using where” on the orders line: c_custkey=o_custkey
  • 32. Execution: Nested loops join (4) select * from customer, orders where c_custkey=o_custkey +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | | |1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where| +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ • Scan a 1,493,631-row table 148,749 times – Consider 1,493,631 * 148,749 row combinations • Is this query inherently complex? – We know each customer has his own orders – size(customer x orders)= size(orders) – Lower bound is 1,493,631 + 148,749 + costs to match customer<->order.
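The nested loops pseudocode from these slides can be written out as a small runnable sketch that also counts the row combinations it examines. The tables here are tiny made-up lists of dictionaries; only the row counts at the end come from the EXPLAIN output.

```python
def nested_loops_join(customers, orders):
    """Join without any index: every customer row is compared
    with every order row."""
    result = []
    comparisons = 0
    for c in customers:
        for o in orders:
            comparisons += 1
            if c["c_custkey"] == o["o_custkey"]:
                result.append((c, o))
    return result, comparisons


# Toy tables: 4 customers, 12 orders spread evenly over them.
customers = [{"c_custkey": k} for k in range(1, 5)]
orders = [{"o_orderkey": i, "o_custkey": 1 + i % 4} for i in range(12)]

joined, comparisons = nested_loops_join(customers, orders)
print(len(joined), "result rows from", comparisons, "row combinations")

# The same arithmetic with the row estimates from EXPLAIN.
customer_rows, orders_rows = 148_749, 1_493_631
print("combinations at full scale:", customer_rows * orders_rows)
print("lower bound on rows read:  ", customer_rows + orders_rows)
```

The output size equals the size of the orders table, yet the number of combinations examined grows with the product of both table sizes, which is the gap the following slides close with an index.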
  • 33. Using index for join: ref access alter table orders add index i_o_custkey(o_custkey) select * from customer, orders where c_custkey=o_custkey
  • 34. ref access - analysis select * from customer, orders where c_custkey=o_custkey +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra| +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ |1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |148749| | |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | | +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ ● One ref lookup scans 7 rows. ● In total: 7 * 148,749=1,041,243 rows – `orders` has 1.4M rows – no redundant reads from `orders` ● The whole query plan – Reads all customers – Reads 1M orders (of 1.4M) ● Efficient!
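A ref lookup can be modelled with a dictionary that maps each key value to the rows that carry it, standing in for the index on o_custkey. This is a sketch on made-up toy tables, not the server's implementation: for every customer only the matching orders are read.

```python
from collections import defaultdict


def build_index(orders, column):
    """Map each key value to the list of rows that carry it."""
    index = defaultdict(list)
    for row in orders:
        index[row[column]].append(row)
    return index


def ref_join(customers, orders_index):
    """For each customer, look up only the matching orders."""
    result = []
    order_rows_read = 0
    for c in customers:
        matches = orders_index.get(c["c_custkey"], [])
        order_rows_read += len(matches)
        result.extend((c, o) for o in matches)
    return result, order_rows_read


# Toy tables: 4 customers, 12 orders spread evenly over them.
customers = [{"c_custkey": k} for k in range(1, 5)]
orders = [{"o_orderkey": i, "o_custkey": 1 + i % 4} for i in range(12)]

i_o_custkey = build_index(orders, "o_custkey")
joined, order_rows_read = ref_join(customers, i_o_custkey)
print(len(joined), "result rows,", order_rows_read, "order rows read")

# Estimate from the EXPLAIN above: 7 rows per lookup, one lookup per customer.
print("estimated order rows read:", 7 * 148_749)
```

On the toy data each order row is read exactly once, compared with once per customer in the plain nested loops plan.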
  • 35. Conditions that can be used for ref access ● Can use equalities – tbl.key=other_table.col – tbl.key=const – tbl.key IS NULL ● For multipart keys, will use largest prefix – keypart1=... AND keypart2= … AND keypartK=...
  • 36. Conditions that can't be used for ref access ● Doesn't work for non-equalities t1.key BETWEEN t2.col1 AND t2.col2 ● Doesn't work for OR-ed equalities t1.key=t2.col1 OR t1.key=t2.col2 – Except for ref_or_null t1.key=... OR t1.key IS NULL ● Doesn't “combine” ref and range access – t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col – t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col
  • 37. Is ref always efficient? ● Efficient, if column has many different values (good) – Best case – unique index (eq_ref) ● A few different values – not useful (bad) ● Skewed distribution: depends on which part the join touches (depends)
  • 38. ref access estimates - index statistics • How many rows will match tbl.key_column = $value for an arbitrary $value? • Index statistics show keys from orders where key_name='i_o_custkey' *************************** 1. row *************** Table: orders Non_unique: 1 Key_name: i_o_custkey Seq_in_index: 1 Column_name: o_custkey Collation: A Cardinality: 214462 Sub_part: NULL Packed: NULL Null: YES Index_type: BTREE show table status like 'orders' *************************** 1. row **** Name: orders Engine: InnoDB Version: 10 Row_format: Compact Rows: 1495152 Avg_row_length: 133 Data_length: 199966720 Max_data_length: 0 Index_length: 122421248 Data_free: 6291456 ... • average = Rows / Cardinality = 1495152 / 214462 = 6.97.
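The estimate is a single table-wide average, which can be reproduced from the two numbers above. The second half of this sketch uses a made-up skewed column to illustrate why such an average can be far from the truth for an individual key value; the function name is an assumption for illustration.

```python
from collections import Counter


def avg_rows_per_key(table_rows, cardinality):
    """Table-wide average number of rows per distinct key value."""
    return table_rows / cardinality


# Numbers from SHOW TABLE STATUS and SHOW KEYS on the slide.
estimate = avg_rows_per_key(1495152, 214462)
print(f"rows per o_custkey value: {estimate:.2f}")

# Hypothetical skewed column: one value covers 90 of 100 rows.
skewed = [1] * 90 + list(range(2, 12))
counts = Counter(skewed)
skewed_estimate = avg_rows_per_key(len(skewed), len(counts))
print(f"average: {skewed_estimate:.2f}, "
      f"actual for key 1: {counts[1]}, actual for key 2: {counts[2]}")
```

For the skewed column the average is about 9 rows per key, while a lookup actually returns either 90 rows or 1 row depending on which value the join touches.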
  • 39. ref access – conclusions ● Based on t.key=... equality conditions ● Can make joins very efficient ● Relies on index statistics for estimates.
  • 40. Optimizer statistics ● MySQL/Percona Server – Index statistics – Persistent/transient InnoDB stats ● MariaDB – Index statistics, persistent/transient ● Same as Percona Server (via XtraDB) – Persistent, engine-independent, index-independent statistics.
  • 41. Index statistics ● Cardinality lets the optimizer calculate a table-wide average #rows-per-key-prefix ● It is a statistical value (inexact) ● Exact collection procedure depends on the storage engine – InnoDB – random sampling – MyISAM – index scan – Engine-independent – index scan.
  • 42. Index statistics in MySQL 5.6 ● Sample [8] random index leaf pages ● Table statistics (stored) – rows - estimated number of rows in a table – Other stats not used by optimizer ● Index statistics (stored) – fields - #fields in the index – rows_per_key - rows per 1 key value, per prefix fields ([1 column value], [2 columns value], [3 columns value], …) – Other stats not used by optimizer.
  • 43. Index statistics updates ● Statistics updated when: – ANALYZE TABLE tbl_name [, tbl_name] … – SHOW TABLE STATUS, SHOW INDEX – Access to INFORMATION_SCHEMA.[TABLES|STATISTICS] – A table is opened for the first time (after server restart) – A table has changed >10% – When InnoDB Monitor is turned ON.
  • 44. Displaying optimizer statistics ● MySQL 5.5, MariaDB 5.3, and older – Issue SQL statements to count rows/keys – Indirectly, look at EXPLAIN for simple queries ● MariaDB 5.5, Percona Server 5.5 (using XtraDB) – information_schema.[innodb_index_stats, innodb_table_stats] – Read-only, always visible ● MySQL 5.6 – mysql.[innodb_index_stats, innodb_table_stats] – User updateable – Only available if innodb_analyze_is_persistent=ON (replaced by innodb_stats_persistent as of MySQL 5.6.6) ● MariaDB 10.0 – Persistent updateable tables mysql.[index_stats, column_stats, table_stats] – User updateable – + current XtraDB mechanisms.
  • 45. Plan [in]stability
    ● Statistics may vary a lot (orders)
    MariaDB [dbt3]> select * from information_schema.innodb_index_stats;
    +------------+-----------------+--------------+      +---------------+
    | table_name | index_name      | rows_per_key |      | rows_per_key  |  error (actual)
    +------------+-----------------+--------------+      +---------------+
    | partsupp   | PRIMARY         | 3, 1         |      | 4, 1          |  25%
    | partsupp   | i_ps_partkey    | 3, 0         |  =>  | 4, 1          |  25% (4)
    | partsupp   | i_ps_suppkey    | 64, 0        |      | 91, 1         |  30% (80)
    | orders     | i_o_orderdate   | 9597, 1      |      | 1660956, 0    |  99% (6234)
    | orders     | i_o_custkey     | 15, 1        |      | 15, 0         |  0% (15)
    | lineitem   | i_l_receiptdate | 7425, 1, 1   |      | 6665850, 1, 1 |  99.9% (23477)
    +------------+-----------------+--------------+      +---------------+
    MariaDB [dbt3]> select * from information_schema.innodb_table_stats;
    +-----------------+----------+       +----------+
    | table_name      | rows     |       | rows     |
    +-----------------+----------+       +----------+
    | partsupp        |  6524766 |       |  9101065 |  28% (8000000)
    | orders          | 15039855 |  ==>  | 14948612 |  0.6% (15000000)
    | lineitem        | 60062904 |       | 59992655 |  0.1% (59986052)
    +-----------------+----------+       +----------+
  • 46. Controlling statistics (MySQL 5.6) ● Persistent and user-updateable InnoDB statistics – innodb_analyze_is_persistent = ON (replaced by innodb_stats_persistent as of MySQL 5.6.6), – updated manually by ANALYZE TABLE or – automatically by innodb_stats_auto_recalc = ON ● Control the precision of sampling [default 8] – innodb_stats_persistent_sample_pages, – innodb_stats_transient_sample_pages ● No new statistics compared to older versions.
  • 47. Controlling statistics (MariaDB 10.0) Current XtraDB index statistics + ● Engine-independent, persistent, user-updateable statistics ● Precise ● Additional statistics per column (even when there is no index): – min_value, max_value: minimum/maximum value per column – nulls_ratio: fraction of null values in a column – avg_length: average size of values in a column – avg_frequency: average number of rows with the same value.
  • 48. Join condition pushdown
  • 49. Join condition pushdown
    select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
  • 52. Join condition pushdown
    select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    ● Conjunctive (ANDed) conditions are split into parts
    ● Each part is attached as early as possible
      – Either as “Using where”
      – Or as table access method.
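The "attach as early as possible" rule can be modelled in a few lines. This is a simplified sketch, not the optimizer's implementation: each AND-ed part is tagged by hand with the tables it references, and `attach_conditions` is a made-up helper name.

```python
def attach_conditions(join_order, conditions):
    """Attach each AND-ed condition to the earliest table in the join
    order at which every table it references has already been read.

    conditions: list of (condition_text, set_of_referenced_tables).
    """
    attached = {table: [] for table in join_order}
    for text, tables in conditions:
        # The condition becomes checkable once the last of its tables is read.
        position = max(join_order.index(t) for t in tables)
        attached[join_order[position]].append(text)
    return attached

join_order = ["customer", "orders"]
conditions = [
    ("c_custkey=o_custkey", {"customer", "orders"}),
    ("c_acctbal < -500", {"customer"}),
    ("o_orderpriority='1-URGENT'", {"orders"}),
]

for table, parts in attach_conditions(join_order, conditions).items():
    print(table, "<-", " and ".join(parts))
```

With the join order customer, orders, the c_acctbal condition lands on `customer`, and the other two land on `orders`, where the equality is then used as the ref access method.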
  • 53. Observing join condition pushdown
    EXPLAIN: {
      "query_block": {
        "select_id": 1,
        "nested_loop": [
          {
            "table": {
              "table_name": "orders",
              "access_type": "ALL",
              "possible_keys": [ "i_o_custkey" ],
              "rows": 1499715,
              "filtered": 100,
              "attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))"
            }
          },
          {
            "table": {
              "table_name": "customer",
              "access_type": "eq_ref",
              "possible_keys": [ "PRIMARY" ],
              "key": "PRIMARY",
              "used_key_parts": [ "c_custkey" ],
              "key_length": "4",
              "ref": [ "dbt3sf1.orders.o_custkey" ],
              "rows": 1,
              "filtered": 100,
              "attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` < <cache>(-(500)))"
            }
          }
        ]
      }
    }
    ● Before MySQL 5.6: EXPLAIN shows only “Using where”
      – The condition itself is only visible in the debug trace
    ● Starting from 5.6: EXPLAIN FORMAT=JSON shows attached conditions.
  • 54. Reasoning about join plan efficiency
    select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    First table, “customer”
    ● type=ALL, 150K rows
    ● select count(*) from customer where c_acctbal < -500 gives 6804.
    ● alter table customer add index (c_acctbal)
  • 55. Reasoning about join plan efficiency
    select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    First table, “customer”
    ● type=ALL, 150K rows
    ● select count(*) from customer where c_acctbal < -500 gives 6804.
    ● alter table customer add index (c_acctbal)
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
    |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    Now, access to 'customer' is efficient.
  • 56. Reasoning about join plan efficiency
    select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
    |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    Second table, “orders”
    ● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
    ● ref access uses only c_custkey=o_custkey
    ● What about o_orderpriority='1-URGENT'?
  • 57. o_orderpriority='1-URGENT' ● select count(*) from orders: 1.5M rows ● select count(*) from orders where o_orderpriority='1-URGENT': 300K rows ● Selectivity: 300K / 1.5M = 0.2
  • 58. Reasoning about join plan efficiency
    select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
    |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    Second table, “orders”
    ● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
    ● ref access uses only c_custkey=o_custkey
    ● What about o_orderpriority='1-URGENT'? Selectivity = 0.2
      – Can examine 7*0.2=1.4 rows, 6802 times, if we add an index:
        alter table orders add index (o_custkey, o_orderpriority)
        or
        alter table orders add index (o_orderpriority, o_custkey)
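The selectivity reasoning on these slides amounts to the following arithmetic. The sketch only restates the slide's numbers; it assumes, as the slide does, that the priority condition is independent of the customer key.

```python
total_orders = 1_500_000    # select count(*) from orders
urgent_orders = 300_000     # ... where o_orderpriority='1-URGENT'
lookups = 6802              # rows produced by the range scan on customer
rows_per_lookup = 7         # ref estimate for i_o_custkey alone

selectivity = urgent_orders / total_orders
rows_with_composite_index = rows_per_lookup * selectivity

# Rows examined in orders with the current index vs. a composite index
# that also covers o_orderpriority.
print(lookups * rows_per_lookup)
print(round(lookups * rows_with_composite_index))
```

The composite index would cut the rows examined in `orders` to roughly a fifth, which is exactly the selectivity of the extra condition.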
  • 59. Reasoning about join plan efficiency - summary
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
    |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    Basic* approach to evaluation of join plan efficiency:
    for each table $T in the join order {
      Look at conditions attached to table $T
        (condition must use table $T, may also use previous tables)
      Does access method used with $T make a good use of attached conditions?
    }
    * some other details may also affect join performance
  • 60. Attached conditions
  • 61. Attached conditions
    ● Ideally, should be used for table access
    ● Not all conditions can be used [at the same time]
      – Unused ones are still useful
      – They reduce number of scans for subsequent tables
    select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
  • 62. Informing optimizer about attached conditions
    Currently: a range access that's too expensive to use
    explain extended select * from customer, orders where c_custkey=o_custkey and c_acctbal > 8000 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
    |id|select_type|table   |type|possible_keys    |key        |key_len|ref               |rows  |filtered|Extra      |
    +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY,c_acctbal|NULL       |NULL   |NULL              |150081|   36.22|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey      |i_o_custkey|5      |customer.c_custkey|7     |  100.00|Using where|
    +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
    ● `orders` will be scanned 150081 * 36.22% = 54359 times
    ● This reduces the cost of join
      – Has an effect when comparing potential join plans
    ● => Index c_acctbal is not used for access, but its range estimate still helps the optimizer.
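The effect of the `filtered` column on the number of inner lookups is a single multiplication. A small sketch with the EXPLAIN EXTENDED figures (variable names are illustrative only):

```python
customer_rows = 150_081     # rows estimate for the full scan of customer
filtered_percent = 36.22    # share expected to pass c_acctbal > 8000
rows_per_lookup = 7         # ref estimate for one lookup in orders

# Rows of customer that survive the attached condition; each one triggers
# one ref lookup into orders.
lookups = customer_rows * filtered_percent / 100

print(round(lookups))
print(round(lookups) * rows_per_lookup)  # estimated rows read from orders
```

Without the selectivity estimate the optimizer would have to assume all 150081 rows trigger a lookup, which makes this join order look almost three times more expensive than it is.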
  • 63. Attached condition selectivity ● Unused indexes provide info about selectivity – Works, but very expensive ● MariaDB 10.0 has engine-independent statistics – Index statistics – Non-indexed Column statistics ● Histograms – Further info: Tomorrow, 2:20 pm @ Ballroom D, Igor Babaev, “Engine-independent persistent statistics with histograms in MariaDB”.
  • 64. How to check if the query plan matches the reality
  • 65. Check if query plan is realistic ● EXPLAIN shows what optimizer expects. It may be wrong – Out-of-date index statistics – Non-uniform data distribution ● Other DBMS: EXPLAIN ANALYZE ● MySQL: no equivalent. Instead, have – Handler counters – “User statistics” (Percona, MariaDB) – PERFORMANCE_SCHEMA
  • 66. Join analysis: example query (Q18, DBT3)
    <reset counters>
    select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
    from customer, orders, lineitem
    where o_totalprice > 500000
      and c_custkey = o_custkey
      and o_orderkey = l_orderkey
    group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
    order by o_totalprice desc, o_orderdate
    LIMIT 10;
    <collect statistics>
  • 67. Join analysis: handler counters (old)
    FLUSH STATUS;
    => RUN QUERY
    SHOW STATUS LIKE "Handler%";
    +----------------------------+-------+
    | Handler_mrr_key_refills    | 0     |
    | Handler_mrr_rowid_refills  | 0     |
    | Handler_read_first         | 0     |
    | Handler_read_key           | 1646  |
    | Handler_read_last          | 0     |
    | Handler_read_next          | 1462  |
    | Handler_read_prev          | 0     |
    | Handler_read_rnd           | 10    |
    | Handler_read_rnd_deleted   | 0     |
    | Handler_read_rnd_next      | 184   |
    | Handler_tmp_update         | 1096  |
    | Handler_tmp_write          | 183   |
    | Handler_update             | 0     |
    | Handler_write              | 0     |
    +----------------------------+-------+
  • 68. Join analysis: USERSTAT by Facebook
    MariaDB, Percona Server
    SET GLOBAL USERSTAT=1;
    FLUSH TABLE_STATISTICS;
    FLUSH INDEX_STATISTICS;
    => RUN QUERY
    SHOW TABLE_STATISTICS;
    +--------------+------------+-----------+--------------+-------------------------+
    | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
    +--------------+------------+-----------+--------------+-------------------------+
    | dbt3         | orders     |       183 |            0 |                       0 |
    | dbt3         | lineitem   |      1279 |            0 |                       0 |
    | dbt3         | customer   |       183 |            0 |                       0 |
    +--------------+------------+-----------+--------------+-------------------------+
    SHOW INDEX_STATISTICS;
    +--------------+------------+-----------------------+-----------+
    | Table_schema | Table_name | Index_name            | Rows_read |
    +--------------+------------+-----------------------+-----------+
    | dbt3         | customer   | PRIMARY               |       183 |
    | dbt3         | lineitem   | i_l_orderkey_quantity |      1279 |
    | dbt3         | orders     | i_o_totalprice        |       183 |
    +--------------+------------+-----------------------+-----------+
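Once per-table row counts are available, comparing them with the optimizer's estimates is mechanical. The sketch below is one possible way to flag mismatches: the `actual` figures are the Rows_read values shown above, while the `estimated` figures and the 10x threshold are hypothetical, since the slides do not show the EXPLAIN output for this query.

```python
def estimate_ratio(estimated_rows: int, actual_rows: int) -> float:
    """How far apart estimate and reality are (1.0 means they agree)."""
    low, high = sorted((max(estimated_rows, 1), max(actual_rows, 1)))
    return high / low

# Rows_read per table from SHOW TABLE_STATISTICS.
actual = {"orders": 183, "lineitem": 1279, "customer": 183}

# HYPOTHETICAL optimizer estimates, for illustration only.
estimated = {"orders": 200, "lineitem": 100_000, "customer": 200}

THRESHOLD = 10  # assumed cut-off for "estimate is badly off"
suspicious = [table for table in actual
              if estimate_ratio(estimated[table], actual[table]) >= THRESHOLD]
print(suspicious)
```

Tables that end up in `suspicious` are the ones whose statistics or data distribution deserve a closer look.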
  • 69. Join analysis: PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● Summary tables with read/write statistics – table_io_waits_summary_by_table – table_io_waits_summary_by_index_usage ● Superset of the userstat tables ● More overhead ● Not possible to associate statistics with a query => truncate stats tables before running a query ● Possible bug – performance schema not ignored – Disable by UPDATE setup_consumers SET ENABLED = 'NO' where name = 'global_instrumentation';
  • 70. Analyze joins via PERFORMANCE SCHEMA: SHOW TABLE_STATISTICS analogue
    select object_schema, object_name, count_read, count_write,
           sum_timer_read, sum_timer_write, ...
    from table_io_waits_summary_by_table
    where object_schema = 'dbt3' and count_star > 0;
    +---------------+-------------+------------+-------------+
    | object_schema | object_name | count_read | count_write |
    +---------------+-------------+------------+-------------+
    | dbt3          | customer    |        183 |           0 |
    | dbt3          | lineitem    |       1462 |           0 |
    | dbt3          | orders      |        184 |           0 |
    +---------------+-------------+------------+-------------+
    +----------------+-----------------+
    | sum_timer_read | sum_timer_write | ...
    +----------------+-----------------+
    |     8326528406 |               0 |
    |    12117332778 |               0 |
    |     7946312812 |               0 |
    +----------------+-----------------+
  • 71. Analyze joins via PERFORMANCE SCHEMA: SHOW INDEX_STATISTICS analogue
    select object_schema, object_name, index_name, count_read,
           sum_timer_read, sum_timer_write, ...
    from table_io_waits_summary_by_index_usage
    where object_schema = 'dbt3' and count_star > 0 and index_name is not null;
    +---------------+-------------+-----------------------+------------+
    | object_schema | object_name | index_name            | count_read |
    +---------------+-------------+-----------------------+------------+
    | dbt3          | customer    | PRIMARY               |        183 |
    | dbt3          | lineitem    | i_l_orderkey_quantity |       1462 |
    | dbt3          | orders      | i_o_totalprice        |        184 |
    +---------------+-------------+-----------------------+------------+
    +----------------+-----------------+
    | sum_timer_read | sum_timer_write | ...
    +----------------+-----------------+
    |     8326528406 |               0 |
    |    12117332778 |               0 |
    |     7946312812 |               0 |
    +----------------+-----------------+
  • 73. Batched joins ● Optimization for analytical queries ● Analytic queries shovel through lots of data – e.g. “average size of order in the last month” – or “pairs of goods purchased together” ● Indexes, etc. won't help when you really need to look at all data ● More data means greater chance of being I/O-bound ● Solution: batched joins
  • 74. Batched Key Access Idea
  • 80. Batched Key Access Idea ● Non-BKA join hits data at random ● Caches are not used efficiently ● Prefetching is not useful
  • 81. Batched Key Access Idea ● BKA implementation accesses data in order ● Takes advantage of caches and prefetching
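A toy model makes the difference between random and batched access visible. The sketch below is not how the server implements BKA: it assumes rows are laid out in key order, 100 rows per page, and simply counts how often consecutive lookups land on a different page. All names and sizes are made up for illustration.

```python
import random

def page_switches(keys, rows_per_page=100):
    """Count how often consecutive lookups land on a different page."""
    pages = [key // rows_per_page for key in keys]
    return sum(1 for prev, cur in zip(pages, pages[1:]) if prev != cur)

def batched(keys, batch_size):
    """Sort lookup keys within each batch, mimicking a key buffer of
    limited size that is filled, sorted, and then used for lookups."""
    ordered = []
    for start in range(0, len(keys), batch_size):
        ordered.extend(sorted(keys[start:start + batch_size]))
    return ordered

rng = random.Random(42)
lookup_keys = [rng.randrange(100_000) for _ in range(10_000)]

print("no batching:   ", page_switches(lookup_keys))
print("batch of 1000: ", page_switches(batched(lookup_keys, 1_000)))
print("batch of 10000:", page_switches(batched(lookup_keys, 10_000)))
```

The larger the batch, the closer the access pattern gets to one sequential pass, which is why the buffer size matters so much for BKA.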
  • 82. Batched Key Access effect
    set join_cache_level=6;
    select max(l_extendedprice)
    from orders, lineitem
    where l_orderkey=o_orderkey
      and o_orderdate between $DATE1 and $DATE2
    The benchmark was run with
    ● Various BKA buffer sizes
    ● Various sizes of the $DATE1...$DATE2 range
  • 83. Batched Key Access Performance [Chart: BKA join performance depending on buffer size. X axis: buffer size, bytes. Y axis: query time, sec. Series: query_size=1, 2, 3, each for regular and BKA joins. Annotations: performance without BKA; performance with BKA, given sufficient buffer size.] ● 4x-10x speedup ● The more the data, the bigger the speedup ● Buffer size setting is very important.
  • 84. Batched Key Access settings
    ● Needs to be turned on
    set join_buffer_size= 32*1024*1024;
    set join_cache_level=6; -- MariaDB
    set optimizer_switch='batched_key_access=on'; -- MySQL 5.6
    set optimizer_switch='mrr=on';
    set optimizer_switch='mrr_sort_keys=on'; -- MariaDB only
    ● Further join_buffer_size tuning: watch
      – Query performance
      – Handler_mrr_init counter
      and increase join_buffer_size until either saturates.
  • 85. Batched Key Access - conclusions ● Targeted at big joins ● Needs to be enabled manually ● @@join_buffer_size is the most important setting ● MariaDB's implementation is a superset of MySQL's.
  • 87. ORDER BY, aggregates, GROUP BY
  • 88. Aggregate functions, no GROUP BY
    ● COUNT, SUM, AVG, etc. need to examine all rows
      select SUM(column) from tbl needs to examine the whole tbl.
    ● MIN and MAX can use index for lookup
      index (o_orderdate):
        select max(o_orderdate) from orders
        select min(o_orderdate) from orders where o_orderdate > '1995-05-01'
      index (o_orderpriority, o_orderdate):
        select max(o_orderdate) from orders where o_orderpriority='1-URGENT'
    +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
    |id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra                       |
    +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
    |1 |SIMPLE     |NULL |NULL|NULL         |NULL|NULL   |NULL|NULL|Select tables optimized away|
    +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
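Why MIN and MAX need only a lookup can be illustrated with a sorted list standing in for an ordered index on o_orderdate. This is an analogy, not the server's code; the dates and helper names are made up.

```python
import bisect

def index_max(index):
    """MAX(col): the last entry of an ordered index."""
    return index[-1] if index else None

def index_min_greater_than(index, bound):
    """MIN(col) WHERE col > bound: one binary search, then one read."""
    position = bisect.bisect_right(index, bound)
    return index[position] if position < len(index) else None

# Hypothetical o_orderdate values; ISO dates sort correctly as strings.
o_orderdate_index = sorted(["1995-05-02", "1995-04-30",
                            "1996-01-01", "1995-05-01"])

print(index_max(o_orderdate_index))
print(index_min_greater_than(o_orderdate_index, "1995-05-01"))
```

Neither call looks at more than a handful of entries, regardless of table size, which is what “Select tables optimized away” reflects.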
  • 89. ORDER BY … LIMIT Three algorithms ● Use an index to read in order ● Read one table, sort, join - “Using filesort” ● Execute join into temporary table and then sort - “Using temporary; Using filesort”
  • 90. Using index to read data in order ● No special indication in EXPLAIN output ● LIMIT n: as soon as we read n records, we can stop!
  • 91. A problem with LIMIT N optimization
    `orders` has 1.5M rows
    explain select * from orders order by o_orderdate desc limit 10;
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
    |id|select_type|table |type |possible_keys|key          |key_len|ref |rows|Extra|
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
    |1 |SIMPLE     |orders|index|NULL         |i_o_orderdate|4      |NULL|10  |     |
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
    select * from orders where o_orderpriority='1-URGENT' order by o_orderdate desc limit 10;
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
    |id|select_type|table |type |possible_keys|key          |key_len|ref |rows|Extra      |
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
    |1 |SIMPLE     |orders|index|NULL         |i_o_orderdate|4      |NULL|10  |Using where|
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
    ● A problem:
      – 1.5M rows, 300K of them 'URGENT'
      – Scanning by date, when will we find 10 'URGENT' rows?
      – No good solution so far.
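How many rows an ordered scan reads before LIMIT is satisfied depends entirely on where the matching rows sit in the scan order. The sketch below uses two made-up layouts with the same 20% selectivity; `rows_scanned_until` is a hypothetical helper.

```python
def rows_scanned_until(rows, predicate, limit):
    """Rows read by an ordered scan before `limit` matches are found."""
    matches = 0
    for scanned, row in enumerate(rows, start=1):
        if predicate(row):
            matches += 1
            if matches == limit:
                return scanned
    return len(rows)  # fewer than `limit` matches: whole input was read

def is_urgent(row):
    return row

# 1000 rows in scan order, 200 of them urgent in both layouts.
evenly_spread = [i % 5 == 0 for i in range(1000)]
clustered_at_end = [False] * 800 + [True] * 200

print(rows_scanned_until(evenly_spread, is_urgent, 10))
print(rows_scanned_until(clustered_at_end, is_urgent, 10))
```

The rows=10 estimate in EXPLAIN matches neither case, which is the problem the slide describes.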
  • 92. Using filesort strategy ● Have to read the entire first table ● For remaining, can apply LIMIT n ● ORDER BY can only use columns of tbl1.
  • 93. Using temporary; Using filesort ● ORDER BY clause can use columns of any table ● LIMIT is applied only after executing the entire join and sorting.
  • 94. ORDER BY - conclusions ● Resolving ORDER BY with index allows very efficient handling for LIMIT – Optimization for WHERE unused_condition ORDER BY … LIMIT n is challenging. ● Use sql_big_result, IGNORE INDEX FOR ORDER BY ● Using filesort – Needs all ORDER BY columns in the first table – Takes advantage of LIMIT when doing join to non-first tables ● Using temporary; Using filesort is least efficient.
  • 95. GROUP BY strategies There are three strategies ● Ordered index scan ● Loose Index Scan (LooseScan) ● Groups table (Using temporary; [Using filesort]).
  • 96. Ordered index scan ● Groups are enumerated one after another ● Can compute aggregates on the fly ● Loose index scan is also able to jump to next group.
  • 97. Execution of GROUP BY with temptable
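The difference between the ordered-scan and temporary-table strategies can be sketched in Python. This is only an analogy with made-up data: the first function relies on its input arriving in group order, the second keeps one entry per group in a lookup table.

```python
from itertools import groupby

def group_sum_ordered(rows):
    """Rows arrive sorted by group key (as from an ordered index scan):
    each group is finished as soon as the key changes, so only one
    running aggregate is held at a time."""
    return [(key, sum(value for _, value in group))
            for key, group in groupby(rows, key=lambda row: row[0])]

def group_sum_temptable(rows):
    """Rows arrive in any order: keep one aggregate per group in a
    lookup table, as the temporary-table strategy does."""
    groups = {}
    for key, value in rows:
        groups[key] = groups.get(key, 0) + value
    return groups

print(group_sum_ordered([("a", 1), ("a", 2), ("b", 5)]))
print(group_sum_temptable([("b", 5), ("a", 1), ("a", 2)]))
```

The ordered variant can emit each group immediately, while the table variant must see all input before any group is known to be complete.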
  • 99. Subquery optimizations ● Before MariaDB 5.3/MySQL 5.6 - “don't use subqueries” ● Queries that caused most of the pain – SELECT … FROM tbl WHERE col IN (SELECT …) - semi-joins – SELECT … FROM (SELECT …) - derived tables ● MariaDB 5.3 and MySQL 5.6 – Share a common ancestor, MySQL 6.0 alpha – Huge (100x, 1000x) speedups for painful areas – Other kinds of subqueries received a speedup, too – MariaDB 5.3/5.5 has a superset of MySQL 5.6's optimizations ● 5.6 handles some previously unhandled edge cases, too
  • 100. Tuning for subqueries ● “Before”: one execution strategy – No tuning possible ● “After”: similar to joins – Reasonable execution strategies supported – Need indexes – Need selective conditions – Support batching in most important cases ● Should be better 9x% of the time.
  • 101. What if it still picks a poor query plan? For both MariaDB and MySQL: ● Check EXPLAIN [EXTENDED], find a keyword around a subquery table ● Google “site:kb.askmonty.org $subquery_keyword” or https://kb.askmonty.org/en/subquery-optimizations-map/ ● Find which optimization it was ● set optimizer_switch='$subquery_optimization=off'
  • 102. Thanks! Q & A