Advanced Query Optimizer Tuning and Analysis
1. Advanced query optimizer tuning and analysis
Sergei Petrunia
Timour Katchaounov
Monty Program Ab
MySQL Conference And Expo 2013
2. 2 07:48:08 AM
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
3. Is there a problem with the query optimizer?
• Database performance is affected by many factors
• One of them is the query optimizer
• Is my performance problem caused by the optimizer?
4. Signs that there is a query optimizer problem
• Some (not all) queries are slow
• A query seems to run longer than it ought to
– And examines more records than it ought to
• Usually, the query remains slow regardless of other activity on the server
5. Catching slow queries, the old ways
● Watch the Slow query log
– Percona Server/MariaDB:
--log_slow_verbosity=query_plan
# Thread_id: 1 Schema: dbt3sf10 QC_hit: No
# Query_time: 2.452373 Lock_time: 0.000113 Rows_sent: 0 Rows_examined: 1500000
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0
SET timestamp=1333385770;
select * from customer where c_acctbal < -1000;
– Run pt-query-digest on the log
• Run SHOW PROCESSLIST periodically
6. The new way: SHOW PROCESSLIST + SHOW EXPLAIN
• Available in MariaDB 10.0+
• Displays EXPLAIN of a running statement
MariaDB> show processlist;
+--+----+---------+-------+-------+----+------------+-------------------------...
|Id|User|Host |db |Command|Time|State |Info
+--+----+---------+-------+-------+----+------------+-------------------------...
| 1|root|localhost|dbt3sf1|Query | 10|Sending data|select max(o_totalprice) ...
| 2|root|localhost|dbt3sf1|Query | 0|init |show processlist
+--+----+---------+-------+-------+----+------------+-------------------------...
MariaDB> show explain for 1;
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |orders|ALL |NULL |NULL|NULL |NULL|1498194|Using where|
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
MariaDB [dbt3sf1]> show warnings;
+-----+----+-----------------------------------------------------------------+
|Level|Code|Message |
+-----+----+-----------------------------------------------------------------+
|Note |1003|select max(o_totalprice) from orders where year(o_orderDATE)=1995|
+-----+----+-----------------------------------------------------------------+
7. SHOW EXPLAIN usage
● Intended usage
– SHOW PROCESSLIST ...
– SHOW EXPLAIN FOR ...
● Why not just run EXPLAIN again
– Difficult to replicate setups
● Temporary tables
● Optimizer settings
● Storage engine's index statistics
● ...
– No uncertainty about whether you're looking at
the same query plan or not.
9. Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
• Modified Q18 from DBT3
select c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where
o_totalprice > ?
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey,
o_orderdate, o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 10;
• App executes Q18 many times with
? = 550000, 500000, 400000, ...
10. Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
● Find candidate slow queries
● Simple tests: select_full_join > 0,
created_tmp_disk_tables > 0, etc
● Complex conditions:
max execution time > X sec OR
min/max time vary a lot:
select max_timer_wait/avg_timer_wait as max_ratio,
avg_timer_wait/min_timer_wait as min_ratio
from events_statements_summary_by_digest
where max_timer_wait > 1000000000000
or max_timer_wait / avg_timer_wait > 2
or avg_timer_wait / min_timer_wait > 2\G
11. Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
*************************** 5. row ***************************
DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b
DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` ,
`o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem` WHERE
`o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey` GROUP BY
`c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` ORDER BY `o_totalprice`
DESC , `o_orderdate` LIMIT ?
COUNT_STAR: 3
SUM_TIMER_WAIT: 3251758347000
MIN_TIMER_WAIT: 3914209000 → 0.0039 sec
AVG_TIMER_WAIT: 1083919449000
MAX_TIMER_WAIT: 3204044053000 → 3.2 sec
SUM_LOCK_TIME: 555000000
SUM_ROWS_SENT: 25
SUM_ROWS_EXAMINED: 0
SUM_CREATED_TMP_DISK_TABLES: 0
SUM_CREATED_TMP_TABLES: 3
SUM_SELECT_FULL_JOIN: 0
SUM_SELECT_RANGE: 3
SUM_SELECT_SCAN: 0
SUM_SORT_RANGE: 0
SUM_SORT_ROWS: 25
SUM_SORT_SCAN: 3
SUM_NO_INDEX_USED: 0
SUM_NO_GOOD_INDEX_USED: 0
FIRST_SEEN: 1970-01-01 03:38:27
LAST_SEEN: 1970-01-01 03:38:43
max_ratio: 2.9560
min_ratio: 276.9192
High variance of execution time
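The two ratios on this slide can be checked by hand. The following is a minimal Python sketch, not part of the original deck; it assumes the Performance Schema timer columns are in picoseconds, which agrees with the 0.0039 sec and 3.2 sec annotations above. The variable names simply mirror the column names.

```python
# Timer values copied from the digest row above (assumed to be picoseconds).
PICOSECONDS_PER_SECOND = 10**12

count_star = 3
sum_timer_wait = 3251758347000
min_timer_wait = 3914209000
avg_timer_wait = 1083919449000
max_timer_wait = 3204044053000


def to_seconds(picoseconds):
    """Convert a Performance Schema timer value to seconds."""
    return picoseconds / PICOSECONDS_PER_SECOND


max_ratio = max_timer_wait / avg_timer_wait
min_ratio = avg_timer_wait / min_timer_wait

# Same test as the WHERE clause of the digest query: slow in absolute terms,
# or far away from the average in either direction.
is_candidate = (
    max_timer_wait > 1000000000000
    or max_ratio > 2
    or min_ratio > 2
)

print(f"min {to_seconds(min_timer_wait):.4f} s, max {to_seconds(max_timer_wait):.4f} s")
print(f"max_ratio {max_ratio:.4f}, min_ratio {min_ratio:.4f}, candidate: {is_candidate}")
```

A large min_ratio combined with a modest max_ratio suggests that most executions were slow and only a few were fast, which is what the per-statement history on the following slides shows.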
12. Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
● Check the actual queries and constants
● The events_statements_history table
select timer_wait/1000000000000 as exec_time, sql_text
from events_statements_history
where digest in
(select digest from events_statements_summary_by_digest
where max_timer_wait > 1000000000000
or max_timer_wait / avg_timer_wait > 2
or avg_timer_wait / min_timer_wait > 2)
order by timer_wait;
13. Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
+-----------+-----------------------------------------------------------------------------------+
| exec_time | sql_text |
+-----------+-----------------------------------------------------------------------------------+
| 0.0039 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10 |
| 0.0438 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 500000 and c_custkey = o_custkey ... LIMIT 10 |
| 3.2040 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 400000 and c_custkey = o_custkey ... LIMIT 10 |
+-----------+-----------------------------------------------------------------------------------+
Observation:
orders.o_totalprice > ? is less and less selective
14. Actions after finding the slow query
• Bad query plan
– Rewrite the query
– Force a good query plan
• Bad optimizer settings
– Do tuning
• Query is inherently complex
– Don't waste time with it
– Look for other solutions.
16. Consider a simple select
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
● Run the query:
19 rows in set (7.65 sec)
● Check the query plan:
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
• 15M rows were scanned, 19 rows in output
• Query plan seems inefficient
– (note: this logic doesn't directly apply to group/order by queries).
17. Query plan analysis
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
• Entire table is scanned
• WHERE condition checked after records are read
– Not used to limit #examined rows.
18. Let's add an index
alter table orders add key i_o_orderdate (o_orderdate);
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
● Query time:
19 rows in set (0.76 sec)
• Outcome
– Down to reading 300K rows
– Still, 300K >> 19 rows.
19. Finding out which indexes to add
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
Check selectivity of conditions that will use the index
● index (o_orderdate)
select count(*) from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06';
306322 rows
● index (o_clerk)
select count(*) from orders where o_clerk='Clerk#000009506'
1507 rows.
20. Try adding composite indexes
● index (o_clerk, o_orderdate)
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
|1 |SIMPLE |orders|range|i_o_clerk_...|i_o_clerk_date|20 |NULL|19 |Using where|
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
Bingo! 100% efficiency
● index (o_orderdate, o_clerk)
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_date_c...|i_o_date_clerk|20 |NULL|360354|Using where|
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
Much worse!
• If the condition uses multiple columns, a composite index will be most efficient
• Order of columns matters
– Explaining why is outside the scope of this tutorial; it was covered in last year's tutorial
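The effect of column order can be illustrated outside the database. The sketch below is a rough Python model, not the server's actual algorithm: a sorted list of tuples stands in for a B-tree index, `bisect` stands in for an index dive, and the table contents are synthetic (all names and sizes here are made up for illustration).

```python
import bisect
import random

random.seed(0)  # deterministic synthetic data

# Hypothetical miniature of `orders`: (o_orderdate, o_clerk) pairs.
dates = [f"1992-{m:02d}-{d:02d}" for m in range(1, 13) for d in range(1, 29)]
clerks = [f"Clerk#{n:09d}" for n in range(200)]
orders = [(random.choice(dates), random.choice(clerks)) for _ in range(20000)]

lo, hi, clerk = "1992-06-06", "1992-07-06", clerks[42]

# Index (o_clerk, o_orderdate): sorted by clerk first, then by date.
# The equality on clerk plus the date range form one contiguous slice.
idx_clerk_date = sorted((c, d) for d, c in orders)
start = bisect.bisect_left(idx_clerk_date, (clerk, lo))
end = bisect.bisect_right(idx_clerk_date, (clerk, hi))
scanned_clerk_date = end - start

# Index (o_orderdate, o_clerk): the date range comes first, so every clerk
# inside the date range is visited and filtered afterwards.
idx_date_clerk = sorted(orders)
start = bisect.bisect_left(idx_date_clerk, (lo,))
end = bisect.bisect_right(idx_date_clerk, (hi, chr(0x10FFFF)))
scanned_date_clerk = end - start

returned = sum(1 for d, c in orders if lo <= d <= hi and c == clerk)

print(f"rows returned:                   {returned}")
print(f"scanned via (o_clerk, o_date):   {scanned_clerk_date}")
print(f"scanned via (o_date, o_clerk):   {scanned_date_clerk}")
```

With the equality column first, the number of index entries visited equals the number of rows returned; with the range column first, the whole date range is visited regardless of the clerk.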
21. Conditions must be in SARGable form
• Condition must represent a range
• It must have form that is recognized by the optimizer
o_orderDate BETWEEN '1992-06-01' and '1992-06-30'
year(o_orderDate)=1992 and month(o_orderdate)=6
TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and
TO_DAYS('1992-07-06')
o_clerk='Clerk#000009506'
o_clerk LIKE 'Clerk#000009506'
o_clerk LIKE '%Clerk#000009506%'
column IN (1,10,15,21, ...)
(col1, col2) IN ( (1,1), (2,2), (3,3), …).
22. New in MySQL-5.6: optimizer_trace
● Lets you see the ranges
set optimizer_trace=1;
explain select * from orders
where o_orderDATE between '1992-06-01' and '1992-07-03' and
o_orderdate not in ('1992-01-01', '1992-06-12','1992-07-04');
select * from information_schema.optimizer_trace\G
● Will print a big JSON struct
● Search for range_scan_alternatives.
23. New in MySQL-5.6: optimizer_trace
...
"range_scan_alternatives": [
{
"index": "i_o_orderdate",
"ranges": [
"1992-06-01 <= o_orderDATE < 1992-06-12",
"1992-06-12 < o_orderDATE <= 1992-07-03"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 319082,
"cost": 382900,
"chosen": true
},
{
"index": "i_o_date_clerk",
"ranges": [
"1992-06-01 <= o_orderDATE < 1992-06-12",
"1992-06-12 < o_orderDATE <= 1992-07-03"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 406336,
"cost": 487605,
"chosen": false,
"cause": "cost"
}
],
...
● Considered ranges are shown in the range_scan_alternatives section
● This is actually the original use case of optimizer_trace
● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug)
● Still, very useful.
24. Source of #rows estimates for range
select * from orders
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
• “records_in_range” estimate
• Done by diving into index
• Usually is fairly accurate
• Not affected by ANALYZE TABLE.
25. Simple selects: conclusions
• Efficiency == “#rows_scanned is close to #rows_returned”
• Indexes and WHERE conditions reduce #rows scanned
• Index estimates are usually accurate
• Multi-column indexes
– “handle” conditions on multiple columns
– Order of columns in the index matters
• optimizer_trace lets you view the ranges
– But misrepresents ranges over multi-column indexes.
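The efficiency criterion above can be applied to the plans seen so far. This is a small Python sketch using the row counts from the earlier slides; the `efficiency` helper is a name introduced here for illustration, and the scanned-row figures are EXPLAIN estimates rather than measured counts.

```python
# (rows examined according to EXPLAIN, rows returned) for each plan on the slides.
plans = {
    "full table scan": (15084733, 19),
    "index (o_orderdate)": (306322, 19),
    "index (o_orderdate, o_clerk)": (360354, 19),
    "index (o_clerk, o_orderdate)": (19, 19),
}


def efficiency(rows_scanned, rows_returned):
    """Fraction of examined rows that end up in the result (1.0 is ideal)."""
    return rows_returned / rows_scanned


for name, (scanned, returned) in plans.items():
    print(f"{name:30s} scanned={scanned:>9d} efficiency={efficiency(scanned, returned):.6f}")
```

As noted earlier, this measure does not directly apply to GROUP BY or ORDER BY queries, where many rows may legitimately be read to produce few.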
26. Now, will skip some topics
One can also speed up simple selects with
● index_merge access method
● index access method
● Index Condition Pushdown
We don't have time for these now; check out last year's tutorial.
28. A simple join
• “Customers with their orders”
select * from customer, orders where c_custkey=o_custkey
29. Execution: Nested Loops join
select * from customer, orders where c_custkey=o_custkey
for each customer C {
for each order O {
if (C.c_custkey == O.o_custkey)
produce record(C, O);
}
}
• Complexity:
– Scans table customer
– For each record in customer, scans table orders
• Is this ok?
30. Execution: Nested loops join (2)
select * from customer, orders where c_custkey=o_custkey
for each customer C {
for each order O {
if (C.c_custkey == O.o_custkey)
produce record(C, O);
}
}
• EXPLAIN:
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | |
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
31. Execution: Nested loops join (3)
select * from customer, orders where c_custkey=o_custkey
for each customer C {
for each order O {
if (C.c_custkey == O.o_custkey)
produce record(C, O);
}
}
• EXPLAIN:
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | |
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
– customer, rows=148749: rows to read from customer
– orders, rows=1493631: rows to read from orders
– orders, Using where: c_custkey=o_custkey
32. Execution: Nested loops join (4)
select * from customer, orders where c_custkey=o_custkey
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | |
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
• Scan a 1,493,631-row table 148,749 times
– Consider 1,493,631 * 148,749 row combinations
• Is this query inherently complex?
– We know each customer has their own orders
– size(customer x orders) = size(orders)
– Lower bound is
1,493,631 + 148,749 + costs to match customer<->order.
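The nested-loops pseudocode from the previous slides translates almost line for line into Python. The sketch below runs it on tiny made-up tables and counts comparisons, then repeats the arithmetic with the row counts reported by EXPLAIN; the function and table contents are illustrative only.

```python
def nested_loop_join(customers, orders):
    """Join without any index: every order is compared against every customer."""
    comparisons = 0
    result = []
    for c_custkey, c_name in customers:
        for o_orderkey, o_custkey in orders:
            comparisons += 1
            if c_custkey == o_custkey:
                result.append((c_custkey, c_name, o_orderkey))
    return result, comparisons


# Tiny made-up tables (the real ones have 148,749 and about 1.49M rows).
customers = [(k, f"Customer#{k}") for k in range(1, 6)]
orders = [(100 + i, 1 + i % 5) for i in range(20)]

rows, comparisons = nested_loop_join(customers, orders)
print(f"joined rows: {len(rows)}, comparisons: {comparisons}")

# Same arithmetic at the scale reported by EXPLAIN.
customer_rows, order_rows = 148749, 1493631
print(f"row combinations considered: {customer_rows * order_rows:,}")
print(f"one pass over each table:    {customer_rows + order_rows:,}")
```

Even in the toy case the join produces as many rows as there are orders while performing customers x orders comparisons, which is the gap the next slides close with an index.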
33. Using index for join: ref access
alter table orders add index i_o_custkey(o_custkey)
select * from customer, orders where c_custkey=o_custkey
34. ref access - analysis
select * from customer, orders where c_custkey=o_custkey
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |148749| |
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
● One ref lookup scans 7 rows.
● In total: 7 * 148,749=1,041,243 rows
– `orders` has 1.4M rows
– no redundant reads from `orders`
● The whole query plan
– Reads all customers
– Reads 1M orders (of 1.4M)
● Efficient!
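A rough model of ref access, for comparison with the plain nested loop: in this Python sketch a dictionary stands in for the index i_o_custkey (a real B-tree index behaves differently in detail), and the tables are made-up miniatures. Only the final multiplication uses figures from the EXPLAIN output above.

```python
from collections import defaultdict


def build_index(orders):
    """Stand-in for i_o_custkey: maps o_custkey to the matching order rows."""
    index = defaultdict(list)
    for order in orders:
        index[order[1]].append(order)
    return index


def ref_join(customers, orders_index):
    """For each customer, look up only the orders with a matching key."""
    rows_read = 0
    result = []
    for c_custkey, c_name in customers:
        for o_orderkey, _ in orders_index.get(c_custkey, []):
            rows_read += 1
            result.append((c_custkey, c_name, o_orderkey))
    return result, rows_read


customers = [(k, f"Customer#{k}") for k in range(1, 6)]
orders = [(100 + i, 1 + i % 7) for i in range(21)]  # keys 6 and 7 match no customer

rows, rows_read = ref_join(customers, build_index(orders))
print(f"joined rows: {len(rows)}, order rows read: {rows_read}")

# Estimate from the EXPLAIN above: 7 rows per lookup, one lookup per customer.
print(f"estimated order rows read: {7 * 148749:,}")
```

Every order row that is read ends up in the result, which is what "no redundant reads" means on this slide.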
35. Conditions that can be used for ref access
● Can use equalities
– tbl.key=other_table.col
– tbl.key=const
– tbl.key IS NULL
● For multipart keys, will use largest prefix
– keypart1=... AND keypart2=... AND keypartK=...
36. Conditions that can't be used for ref access
● Doesn't work for non-equalities
t1.key BETWEEN t2.col1 AND t2.col2
● Doesn't work for OR-ed equalities
t1.key=t2.col1 OR t1.key=t2.col2
– Except for ref_or_null
t1.key=... OR t1.key IS NULL
● Doesn't “combine” ref and range access
– t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col
– t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col
37. Is ref always efficient?
● Efficient, if column has many different values (good)
– Best case – unique index (eq_ref)
● A few different values – not useful (bad)
● Skewed distribution: depends on which part the join touches (depends)
38. ref access estimates - index statistics
• How many rows will match
tbl.key_column = $value
for an arbitrary $value?
• Index statistics
show keys from orders where key_name='i_o_custkey'
*************************** 1. row ***************
Table: orders
Non_unique: 1
Key_name: i_o_custkey
Seq_in_index: 1
Column_name: o_custkey
Collation: A
Cardinality: 214462
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
show table status like 'orders'
*************************** 1. row ****
Name: orders
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 1495152
Avg_row_length: 133
Data_length: 199966720
Max_data_length: 0
Index_length: 122421248
Data_free: 6291456
...
average = Rows /Cardinality = 1495152 / 214462 = 6.97.
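The same division as a minimal Python sketch, using the Rows and Cardinality values shown above. The helper name is introduced here for illustration; how the server rounds the value internally is not covered by the slides.

```python
def rows_per_key(table_rows, cardinality):
    """Average number of rows sharing one key value, derived from index statistics."""
    return table_rows / cardinality


estimate = rows_per_key(1495152, 214462)
print(f"estimated rows per o_custkey value: {estimate:.2f}")
```

The result is consistent with the rows=7 that EXPLAIN reported for the ref lookup on i_o_custkey.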
39. ref access – conclusions
● Based on t.key=... equality conditions
● Can make joins very efficient
● Relies on index statistics for estimates.
40. Optimizer statistics
● MySQL/Percona Server
– Index statistics
– Persistent/transient InnoDB stats
● MariaDB
– Index statistics, persistent/transient
● Same as Percona Server (via XtraDB)
– Persistent, engine-independent, index-independent statistics.
41. Index statistics
● Cardinality lets the optimizer calculate a table-wide average #rows-per-key-prefix
● It is a statistical value (inexact)
● Exact collection procedure depends on the
storage engine
– InnoDB – random sampling
– MyISAM – index scan
– Engine-independent – index scan.
42. Index statistics in MySQL 5.6
● Sample [8] random index leaf pages
● Table statistics (stored)
– rows - estimated number of rows in a table
– Other stats not used by optimizer
● Index statistics (stored)
– fields - #fields in the index
– rows_per_key - rows per 1 key value, per prefix fields
([1 column value], [2 columns value], [3 columns value], …)
– Other stats not used by optimizer.
43. Index statistics updates
● Statistics updated when:
– ANALYZE TABLE tbl_name [, tbl_name] …
– SHOW TABLE STATUS, SHOW INDEX
– Access to INFORMATION_SCHEMA.[TABLES|STATISTICS]
– A table is opened for the first time (after server restart)
– A table has changed >10%
– When InnoDB Monitor is turned ON.
44. Displaying optimizer statistics
● MySQL 5.5, MariaDB 5.3, and older
– Issue SQL statements to count rows/keys
– Indirectly, look at EXPLAIN for simple queries
● MariaDB 5.5, Percona Server 5.5 (using XtraDB)
– information_schema.[innodb_index_stats, innodb_table_stats]
– Read-only, always visible
● MySQL 5.6
– mysql.[innodb_index_stats, innodb_table_stats]
– User updateable
– Only available if innodb_analyze_is_persistent=ON (renamed to innodb_stats_persistent in MySQL 5.6.6)
● MariaDB 10.0
– Persistent updateable tables mysql.[index_stats, column_stats, table_stats]
– User updateable
– + current XtraDB mechanisms.
46. Controlling statistics (MySQL 5.6)
● Persistent and user-updateable InnoDB statistics
– innodb_analyze_is_persistent = ON,
– updated manually by ANALYZE TABLE or
– automatically by innodb_stats_auto_recalc = ON
● Control the precision of sampling [default 8]
– innodb_stats_persistent_sample_pages,
– innodb_stats_transient_sample_pages
● No new statistics compared to older versions.
47. Controlling statistics (MariaDB 10.0)
Current XtraDB index statistics
+
● Engine-independent, persistent, user-updateable statistics
● Precise
● Additional statistics per column (even when there is no index):
– min_value, max_value: minimum/maximum value per column
– nulls_ratio: fraction of null values in a column
– avg_length: average size of values in a column
– avg_frequency: average number of rows with the same value.
49. Join condition pushdown
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and
o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
● Conjunctive (ANDed) conditions are split into parts
● Each part is attached as early as possible
– Either as “Using where”
– Or as table access method.
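The attachment rule can be sketched in a few lines. This is a simplified Python model, not the server's implementation: each ANDed condition is described by hand as a pair of its text and the tables it refers to, and it is attached to the first table in the join order at which all of those tables have been read.

```python
def attach_conditions(join_order, conditions):
    """Attach each ANDed condition to the earliest table where it can be checked."""
    attached = {table: [] for table in join_order}
    for text, tables_used in conditions:
        # Position of the last table (in join order) that the condition refers to.
        last = max(join_order.index(t) for t in tables_used)
        attached[join_order[last]].append(text)
    return attached


# The WHERE clause of the example query, split on AND by hand.
conditions = [
    ("c_custkey=o_custkey", {"customer", "orders"}),
    ("c_acctbal < -500", {"customer"}),
    ("o_orderpriority='1-URGENT'", {"orders"}),
]

attached = attach_conditions(["customer", "orders"], conditions)
for table, conds in attached.items():
    print(f"{table}: {' AND '.join(conds)}")
```

With the join order customer, orders, the balance check lands on customer and the other two conditions land on orders; reversing the join order moves the join equality to customer instead.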
53. Observing join condition pushdown
EXPLAIN: {
"query_block": {
"select_id": 1,
"nested_loop": [
{
"table": {
"table_name": "orders",
"access_type": "ALL",
"possible_keys": [
"i_o_custkey"
],
"rows": 1499715,
"filtered": 100,
"attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` =
'1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))"
}
},
{
"table": {
"table_name": "customer",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"c_custkey"
],
"key_length": "4",
"ref": [
"dbt3sf1.orders.o_custkey"
],
"rows": 1,
"filtered": 100,
"attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` <
<cache>(-(500)))"
}
● Before mysql-5.6: EXPLAIN shows only “Using where”
– The condition itself is only visible in the debug trace
● Starting from 5.6: EXPLAIN FORMAT=JSON shows attached conditions.
54. Reasoning about join plan efficiency
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
First table, “customer”
● type=ALL, 150 K rows
● select count(*) from customer where c_acctbal < -500 gives 6804.
● alter table customer add index (c_acctbal).
55. Reasoning about join plan efficiency
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
First table, “customer”
● type=ALL, 150 K rows
● select count(*) from customer where c_acctbal < -500 gives 6804.
● alter table customer add index (c_acctbal)
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
Now, access to 'customer' is efficient.
56. Reasoning about join plan efficiency
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
Second table, “orders”
● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
● ref access uses only c_custkey=o_custkey
● What about o_orderpriority='1-URGENT'?
57. o_orderpriority='1-URGENT'
● select count(*) from orders – 1.5M rows
● select count(*) from orders where o_orderpriority='1-URGENT' – 300K rows
● 300K / 1.5M = 0.2
58. Reasoning about join plan efficiency
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
Second table, “orders”
● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
● ref access uses only c_custkey=o_custkey
● What about o_orderpriority='1-URGENT'? Selectivity= 0.2
– Can examine 7*0.2=1.4 rows, 6802 times if we add an index:
alter table orders add index (o_custkey, o_orderpriority)
or
alter table orders add index (o_orderpriority, o_custkey)
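The arithmetic behind the 1.4 rows figure, as a minimal Python sketch. The counts are the rounded figures from the slides, and the calculation assumes (as the slide implicitly does) that o_orderpriority is distributed independently of o_custkey.

```python
# Figures from the slides; counts are rounded, so results are approximate.
total_orders = 1_500_000      # select count(*) from orders
urgent_orders = 300_000       # ... where o_orderpriority='1-URGENT'
rows_per_ref_lookup = 7       # EXPLAIN estimate for i_o_custkey
customers_scanned = 6802      # EXPLAIN estimate for the range scan on c_acctbal

selectivity = urgent_orders / total_orders
rows_per_lookup_with_index = rows_per_ref_lookup * selectivity

print(f"selectivity of o_orderpriority='1-URGENT': {selectivity:.1f}")
print(f"orders rows read with i_o_custkey:      {customers_scanned * rows_per_ref_lookup}")
print(f"orders rows read with a two-column key: {customers_scanned * rows_per_lookup_with_index:.0f}")
```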
59. Reasoning about join plan efficiency - summary
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
Basic* approach to evaluation of join plan efficiency:
for each table $T in the join order {
Look at conditions attached to table $T (condition must
use table $T, may also use previous tables)
Does the access method used with $T make good use
of the attached conditions?
}
* some other details may also affect join performance
61. Attached conditions
● Ideally, should be used for table access
● Not all conditions can be used [at the same time]
– Unused ones are still useful
– They reduce number of scans for subsequent tables
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and
o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
62. Informing optimizer about attached conditions
Currently: a range access that's too expensive to use
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
|id|select_type|table |type|possible_keys |key |key_len|ref |rows |filtered|Extra |
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY,c_acctbal|NULL |NULL |NULL |150081| 36.22 |Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | 100.00 |Using where|
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
explain extended
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal > 8000 and
o_orderpriority='1-URGENT';
● `orders` will be scanned 150081 * 36.22% = 54359 times
● This reduces the estimated cost of the join
– Has an effect when comparing potential join plans
● => Index c_acctbal is not used for table access, but it still helps the optimizer.
63. Attached condition selectivity
● Unused indexes provide info about selectivity
– Works, but very expensive
● MariaDB 10.0 has engine-independent statistics
– Index statistics
– Non-indexed Column statistics
● Histograms
– Further info:
Igor Babaev, “Engine-independent persistent statistics with histograms in MariaDB”
Tomorrow, 2:20 pm @ Ballroom D
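As a rough illustration of how engine-independent statistics are switched on, the sketch below uses the variable names documented for MariaDB 10.0 (`use_stat_tables`, `histogram_size`, `optimizer_use_condition_selectivity`). Treat the exact names and values as assumptions to verify against your server version; they are not taken from the slides.

```sql
-- Assumption: MariaDB 10.0 variable names and values; verify against your version.
set use_stat_tables='preferably';           -- prefer engine-independent statistics
set histogram_size=100;                     -- 0 disables histogram collection

-- Collect persistent statistics (table, column and index level) for `customer`.
analyze table customer persistent for all;

-- Allow the optimizer to use column histograms when estimating condition selectivity.
set optimizer_use_condition_selectivity=4;

-- The `filtered` column should now reflect column statistics even without an index on c_acctbal.
explain extended
select *
from customer, orders
where c_custkey=o_custkey and c_acctbal > 8000 and o_orderpriority='1-URGENT';
```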
64. How to check if the query plan matches the reality
65. Check if query plan is realistic
● EXPLAIN shows what the optimizer expects. It may be wrong
– Out-of-date index statistics
– Non-uniform data distribution
● Other DBMS: EXPLAIN ANALYZE
● MySQL: no equivalent. Instead, have
– Handler counters
– “User statistics” (Percona, MariaDB)
– PERFORMANCE_SCHEMA
66. Join analysis: example query (Q18, DBT3)
<reset counters>
select c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where
o_totalprice > 500000
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 10;
<collect statistics>
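With Handler counters, the reset and collect steps around the query can be sketched as follows. `FLUSH STATUS` zeroes the session status counters, and the `Handler_read%` counters then show how many rows were actually read and by which kind of access.

```sql
-- <reset counters>: zero the session status counters
flush status;

-- The query under analysis (Q18 from the slide)
select c_name, c_custkey, o_orderkey, o_orderdate,
       o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where
  o_totalprice > 500000
  and c_custkey = o_custkey
  and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
order by o_totalprice desc, o_orderdate
limit 10;

-- <collect statistics>: row reads performed by this session since the reset
show session status like 'Handler_read%';
```

Handler_read_key counts index lookups, Handler_read_next counts rows read sequentially along an index, and Handler_read_rnd_next counts rows read during table scans. Comparing these totals with the `rows` estimates in EXPLAIN shows where the plan and reality diverge. The counters are per session, not per table.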
69. Join analysis: PERFORMANCE SCHEMA
[MySQL 5.6, MariaDB 10.0]
● summary tables with read/write statistics
– table_io_waits_summary_by_table
– table_io_waits_summary_by_index_usage
● Superset of the userstat tables
● More overhead
● Not possible to associate statistics with a query
=> truncate stats tables before running a query
● Possible bug
– performance schema not ignored
– Disable by
UPDATE setup_consumers SET ENABLED = 'NO'
where name = 'global_instrumentation';
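The truncate-then-query workflow can be sketched like this. The schema name `dbt3sf10` is an assumption (it appears in the slow-log example earlier in the deck); substitute your own. The figures are only attributable to one query if nothing else touches these tables in the meantime.

```sql
-- Reset the summary tables (TRUNCATE zeroes the counters, it does not remove rows).
truncate table performance_schema.table_io_waits_summary_by_table;
truncate table performance_schema.table_io_waits_summary_by_index_usage;

-- ... run the query under analysis here ...

-- Rows read per table
select object_name, count_read
from performance_schema.table_io_waits_summary_by_table
where object_schema = 'dbt3sf10'          -- assumed schema name
order by count_read desc;

-- Rows read per index (INDEX_NAME is NULL for reads that did not use an index)
select object_name, index_name, count_read
from performance_schema.table_io_waits_summary_by_index_usage
where object_schema = 'dbt3sf10'          -- assumed schema name
  and count_read > 0
order by count_read desc;
```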
73. Batched joins
● Optimization for analytical queries
● Analytic queries shovel through lots of data
– e.g. “average size of order in the last month”
– or “pairs of goods purchased together”
● Indexes, etc. won't help when you really need to look at all data
● More data means a greater chance of being I/O-bound
● Solution: batched joins
80. Batched Key Access Idea
● Non-BKA join hits data at random
● Caches are not used efficiently
● Prefetching is not useful
81. Batched Key Access Idea
● BKA implementation accesses data in order
● Takes advantage of caches and prefetching
82. Batched Key Access effect
set join_cache_level=6;
select max(l_extendedprice)
from orders, lineitem
where
l_orderkey=o_orderkey and
o_orderdate between $DATE1 and $DATE2
The benchmark was run with
● Various BKA buffer sizes
● Various sizes of the $DATE1...$DATE2 range
83. Batched Key Access Performance
[Chart: BKA join performance depending on buffer size. X axis: buffer size, bytes. Y axis: query time, sec. Series: query_size=1, 2, 3, each with regular join and with BKA.]
● Performance with BKA, given sufficient buffer size: 4x-10x speedup over performance without BKA
● The more the data, the bigger the speedup
● Buffer size setting is very important.
84. Batched Key Access settings
● Needs to be turned on
set join_buffer_size= 32*1024*1024;
set join_cache_level=6; -- MariaDB
set optimizer_switch='batched_key_access=on'; -- MySQL 5.6
set optimizer_switch='mrr=on';
set optimizer_switch='mrr_sort_keys=on'; -- MariaDB only
● Further join_buffer_size tuning: watch
– Query performance
– Handler_mrr_init counter
and increase join_buffer_size until either one stops improving.
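One tuning iteration might look like the sketch below, assuming MariaDB and the DBT3 tables. The date range and the starting buffer size are arbitrary illustration values, not figures from the benchmark.

```sql
-- Enable BKA (MariaDB settings from the slide)
set join_cache_level=6;
set optimizer_switch='mrr=on';
set optimizer_switch='mrr_sort_keys=on';

-- Pick a buffer size to try (assumed starting point; repeat with larger values)
set join_buffer_size = 4*1024*1024;

flush status;                                    -- zero the session counters

select max(l_extendedprice)
from orders, lineitem
where l_orderkey=o_orderkey
  and o_orderdate between '1995-01-01' and '1995-06-30';   -- example range only

-- Fewer MRR initializations for the same query means fewer, larger batches.
show session status like 'Handler_mrr%';
```

Repeat with a larger join_buffer_size and stop when neither the query time nor the counter changes noticeably.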
85. Batched Key Access - conclusions
● Targeted at big joins
● Needs to be enabled manually
● @@join_buffer_size is the most important setting
● MariaDB's implementation is a superset of MySQL's.
88. Aggregate functions, no GROUP BY
● COUNT, SUM, AVG, etc. need to examine all rows
select SUM(column) from tbl needs to examine the whole tbl.
● MIN and MAX can use an index for lookup
– with index (o_orderdate):
select max(o_orderdate) from orders
select min(o_orderdate) from orders where o_orderdate > '1995-05-01'
– with index (o_orderpriority, o_orderdate):
select max(o_orderdate) from orders where o_orderpriority='1-URGENT'
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
|1 |SIMPLE |NULL |NULL|NULL |NULL|NULL |NULL|NULL|Select tables optimized away|
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
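To check that a MIN/MAX query is resolved by an index lookup, create the matching index and look for “Select tables optimized away” in EXPLAIN. A minimal sketch on the DBT3 `orders` table; the index name `i_o_prio_date` is hypothetical.

```sql
-- Composite index: equality column first, then the column being aggregated.
alter table orders add index i_o_prio_date (o_orderpriority, o_orderdate);

-- If the lookup optimization applies, Extra shows "Select tables optimized away".
explain
select max(o_orderdate) from orders where o_orderpriority='1-URGENT';

-- For contrast: SUM has to read every matching row, so a regular access method is shown.
explain
select sum(o_totalprice) from orders where o_orderpriority='1-URGENT';
```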
89. ORDER BY … LIMIT
Three algorithms
● Use an index to read in order
● Read one table, sort, join - “Using filesort”
● Execute join into temporary table and then sort - “Using temporary; Using filesort”
90. Using index to read data in order
● No special indication in EXPLAIN output
● LIMIT n: as soon as we read n records, we can stop!
91. A problem with LIMIT N optimization
`orders` has 1.5 M rows
explain select * from orders order by o_orderdate desc limit 10;
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra|
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
|1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 | |
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
select * from orders where o_orderpriority='1-URGENT' order by o_orderdate desc limit 10;
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
|1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 |Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
● A problem:
– 1.5M rows, 300K of them 'URGENT'
– Scanning by date, when will we find 10 'URGENT' rows?
– No good solution so far.
92. Using filesort strategy
● Have to read the entire first table
● For the remaining tables, can apply LIMIT n
● ORDER BY can only use columns of the first table.
93. Using temporary; Using filesort
● ORDER BY clause can use columns of any table
● LIMIT is applied only after executing the entire join and sorting.
94. ORDER BY - conclusions
● Resolving ORDER BY with index allows very efficient handling for LIMIT
– Optimization for WHERE unused_condition ORDER BY … LIMIT n is challenging.
● Use sql_big_result, IGNORE INDEX FOR ORDER BY
● Using filesort
– Needs all ORDER BY columns in the first table
– Take advantage of LIMIT when doing join to non-first tables
● Using temporary; Using filesort is least efficient.
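When the index-ordered scan turns out to be the wrong choice for a WHERE ... ORDER BY ... LIMIT query, an index hint keeps the optimizer from using that index for ordering. A minimal sketch, using the `i_o_orderdate` index from the EXPLAIN output earlier in the deck:

```sql
-- Default plan: may scan i_o_orderdate backwards until 10 matching rows are found.
explain
select * from orders
where o_orderpriority='1-URGENT'
order by o_orderdate desc limit 10;

-- Forbid i_o_orderdate for ORDER BY only; it remains available for other purposes.
explain
select * from orders ignore index for order by (i_o_orderdate)
where o_orderpriority='1-URGENT'
order by o_orderdate desc limit 10;
```

With the hint, the plan is expected to fall back to a filesort. Whether that is faster depends on how many rows match the WHERE condition, so both variants should be timed.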
95. GROUP BY strategies
There are three strategies
● Ordered index scan
● Loose Index Scan (LooseScan)
● Groups table (Using temporary; [Using filesort]).
96. Ordered index scan
● Groups are enumerated one after another
● Can compute aggregates on the fly
● Loose index scan is also able to jump to next group.
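The difference between the two index-based strategies can be seen in EXPLAIN. The sketch below assumes an index on (o_orderpriority, o_orderdate) exists on `orders`; the commented-out statement shows one way to create it, under a hypothetical name.

```sql
-- Assumed index; create it first if it is missing:
-- alter table orders add index i_o_prio_date (o_orderpriority, o_orderdate);

-- MIN/MAX over the column that follows the grouping column in the index:
-- a candidate for loose index scan, reported as "Using index for group-by" in Extra.
explain
select o_orderpriority, max(o_orderdate)
from orders
group by o_orderpriority;

-- COUNT(*) needs every row of each group, so the server cannot jump between groups;
-- here an ordered index scan (or a temporary table) is expected instead.
explain
select o_orderpriority, count(*)
from orders
group by o_orderpriority;
```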
99. Subquery optimizations
● Before MariaDB 5.3/MySQL 5.6 - “don't use subqueries”
● Queries that caused most of the pain
– SELECT … FROM tbl WHERE col IN (SELECT …) - semi-joins
– SELECT … FROM (SELECT …) - derived tables
● MariaDB 5.3 and MySQL 5.6
– Share a common ancestor, MySQL 6.0 alpha
– Huge (100x, 1000x) speedups for painful areas
– Other kinds of subqueries received a speedup, too
– MariaDB 5.3/5.5 has a superset of MySQL 5.6's optimizations
● 5.6 handles some un-handled edge cases, too
100. Tuning for subqueries
● “Before”: one execution strategy
– No tuning possible
● “After”: similar to joins
– Reasonable execution strategies supported
– Need indexes
– Need selective conditions
– Support batching in most important cases
● Should be better 9x% of the time.
101. What if it still picks a poor query plan?
For both MariaDB and MySQL:
● Check EXPLAIN [EXTENDED], find a keyword around a subquery table
● Google “site:kb.askmonty.org $subquery_keyword”
or https://kb.askmonty.org/en/subquery-optimizations-map/
● Find which optimization it was
● set optimizer_switch='$subquery_optimization=off'
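The switch-off step might look like this in practice. The query is an illustrative IN subquery on the DBT3 tables, and `semijoin` is used as the example flag. It exists in both MariaDB 5.3+ and MySQL 5.6, but the flag to turn off in a real case is whichever optimization the EXPLAIN keyword points to.

```sql
-- Remember the current settings so they can be restored afterwards.
set @saved_switch = @@optimizer_switch;

-- Plan with all subquery optimizations available
explain
select * from customer
where c_custkey in (select o_custkey from orders where o_totalprice > 500000);

-- Turn off one optimization and compare the plan (and the run time).
set optimizer_switch='semijoin=off';

explain
select * from customer
where c_custkey in (select o_custkey from orders where o_totalprice > 500000);

-- Restore the original settings.
set optimizer_switch = @saved_switch;
```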