Using histograms
to get better performance
Sergei Petrunia
Varun Gupta
Database performance
● Performance is a product of many factors
● One of them is the query optimizer
● It produces query plans
– A “good” query plan only reads rows that contribute to the query result
– A “bad” query plan means unnecessary work is done
Do my queries use bad query plans?
● Queries take a long time
● Some are just inherently hard to
compute
● Some look good but turn out bad
due to factors that were not
accounted for
Query plan cost depends on data statistics
select *
from
lineitem, orders
where
o_orderkey=l_orderkey and
o_orderdate between '1990-01-01' and '1998-12-06' and
l_extendedprice > 1000000
● Join order: orders->lineitem vs lineitem->orders
● The better choice depends on condition selectivity
Another choice optimizer has to make
select *
from
orders
where
o_orderstatus='F'
order by
o_orderdate
limit 10
● Use index(o_orderdate)
– Stop as soon as we find 10 matches
● Find rows with o_orderstatus='F'
– Sort by o_orderdate picking first 10
● Again, it depends on condition
selectivity.
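The trade-off between the two strategies can be sketched with a rough row-count model. This is only an illustration, not the optimizer's actual cost formula: it assumes matching rows are spread evenly along the ORDER BY index and that the filtering access path reads only matching rows. The function names are hypothetical.

```python
def rows_via_order_by_index(table_rows, selectivity, limit):
    """Rows scanned in index order until `limit` matching rows are found."""
    if selectivity <= 0:
        return float(table_rows)  # no matches: the whole index is scanned
    return min(float(table_rows), limit / selectivity)


def rows_via_filter_then_sort(table_rows, selectivity):
    """Rows fetched by the filtering access path, all of which are then sorted."""
    return table_rows * selectivity


def cheaper_plan(table_rows, selectivity, limit):
    """Pick the strategy that reads fewer rows under this simplified model."""
    by_index = rows_via_order_by_index(table_rows, selectivity, limit)
    by_filter = rows_via_filter_then_sort(table_rows, selectivity)
    return "order-by index" if by_index <= by_filter else "filter then sort"


# A non-selective condition favors stopping early on the ordered index;
# a very selective one favors filtering first.
print(cheaper_plan(1_000_000, 0.5, 10))
print(cheaper_plan(1_000_000, 0.0001, 10))
```

With a wrong selectivity guess the model flips to the other strategy, which is why the estimate matters.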
Data statistics in MariaDB
● Table: #rows in the table
● Index
– cardinality: AVG(#lineitems per order)
– “range estimates” - #rows(t.key BETWEEN const1 and const2)
● Non-index column? Histogram
Histogram
● Partition the value space into buckets
– Store bucket bounds and #values in the bucket
– Imprecise
– Very compact
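As an illustration of the bucket idea, here is a minimal sketch of a generic equal-height histogram and a selectivity estimate for "col <= x". It is not MariaDB's storage format or estimation code; the function names and the linear interpolation inside a bucket are assumptions made for the example.

```python
from bisect import bisect_right


def build_bounds(values, n_buckets):
    """Upper bound of each bucket; every bucket covers about the same number of rows."""
    data = sorted(values)
    n = len(data)
    return [data[(i * n + n_buckets - 1) // n_buckets - 1]
            for i in range(1, n_buckets + 1)]


def estimate_le(bounds, min_value, x):
    """Estimated fraction of rows with value <= x."""
    if x < min_value:
        return 0.0
    full = bisect_right(bounds, x)  # buckets lying entirely at or below x
    if full == len(bounds):
        return 1.0
    lo = bounds[full - 1] if full > 0 else min_value
    hi = bounds[full]
    # Assume values are spread evenly inside the partially covered bucket.
    part = (x - lo) / (hi - lo) if hi > lo else 0.0
    return (full + part) / len(bounds)


uniform = build_bounds(range(1, 1001), 10)
print(estimate_le(uniform, 1, 250))

# Skewed data: 90% of the rows hold the value 1.
skewed = build_bounds([1] * 900 + list(range(2, 102)), 10)
print(estimate_le(skewed, 1, 1))
```

Ten bounds are enough to describe a thousand rows here, which shows both the compactness and the imprecision mentioned above.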
Summary so far
● Good database performance requires good query plans
● To pick those, optimizer needs statistics about the data
– Condition selectivity is important
● Certain kinds of statistics are always available
– Indexes
– For non-indexed columns, histograms may be needed.
Do my query plans suffer
from bad statistics?
Will my queries benefit?
● Very complex question
● No definite answer
● Suggestions
– ANALYZE for statements, r_filtered.
– Slow query log
ANALYZE for statements and r_filtered
● filtered – % of rows left after applying condition (expectation)
– r_filtered - ... - the reality
● r_filtered << filtered – the optimizer didn’t know the condition is selective
– Happens on a non-first table? We are filtering out late!
● Add a histogram on the column (check the condition in FORMAT=JSON output)
analyze select *
from lineitem, orders
where o_orderkey=l_orderkey and
o_orderdate between '1990-01-01' and '1998-12-06' and
l_extendedprice > 1000000
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |r_rows |filtered|r_filtered|Extra |
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
|1 |SIMPLE |orders |ALL |PRIMARY,i_...|NULL |NULL |NULL |1504278|1500000| 50.00 | 100.00 |Using where|
|1 |SIMPLE |lineitem|ref |PRIMARY,i_...|PRIMARY|4 |orders.o_orderkey|2 |4.00 | 100.00 | 0.00 |Using where|
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
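The filtered versus r_filtered check on the output above can be sketched as a small helper. The row dictionaries are a hypothetical hand-made representation of the ANALYZE output, and the 10x threshold is an arbitrary assumption.

```python
def late_filtering_suspects(plan_rows, ratio=10.0):
    """Tables after the first one where r_filtered is far below filtered."""
    suspects = []
    for position, row in enumerate(plan_rows):
        expected, actual = row["filtered"], row["r_filtered"]
        # Only non-first tables: that is where rows are being filtered out late.
        if position > 0 and expected > actual * ratio:
            suspects.append(row["table"])
    return suspects


# Values copied from the ANALYZE output above.
plan = [
    {"table": "orders", "filtered": 50.00, "r_filtered": 100.00},
    {"table": "lineitem", "filtered": 100.00, "r_filtered": 0.00},
]
print(late_filtering_suspects(plan))
```

Here lineitem is flagged: the optimizer expected every row to pass the condition, while almost none did.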
Slow Query Log
● Enable in my.cnf:
slow-query-log
long-query-time=...
log-slow-verbosity=query_plan,explain
● Example entry from hostname-slow.log:
# Query_time: 1.961549 Lock_time: 0.011164 Rows_sent: 1 Rows_examined: 11745000
# Rows_affected: 0 Bytes_sent: 73
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
#
# explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
# explain: 1 SIMPLE inventory ALL NULL NULL NULL NULL 11837024 11745000.00 100.00 0.00 Using where
#
SET timestamp=1551155484;
select count(inv_date_sk) from inventory where inv_quantity_on_hand>10000;
● Rows_examined >> Rows_sent? Grouping, or a poor query plan
● log_slow_verbosity=explain will show ANALYZE output
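The Rows_examined versus Rows_sent check can be scripted. The sketch below assumes the header layout seen in the slow log entry in this deck (Rows_sent immediately followed by Rows_examined on one line); the function name is hypothetical.

```python
import re

# Matches the "Rows_sent: N Rows_examined: M" part of a slow log header line.
ROWS = re.compile(r"Rows_sent:\s*(\d+)\s+Rows_examined:\s*(\d+)")


def examined_per_sent(log_line):
    """Rows examined per row sent, or None if the line has no such counters."""
    match = ROWS.search(log_line)
    if match is None:
        return None
    sent, examined = int(match.group(1)), int(match.group(2))
    return examined / max(sent, 1)  # avoid division by zero for empty results


line = "# Query_time: 1.961549 Lock_time: 0.011164 Rows_sent: 1 Rows_examined: 11745000"
print(examined_per_sent(line))
```

A large ratio is only a hint: as noted above, grouping queries also examine many rows per row sent.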
Histograms in MariaDB
● Available since MariaDB 10.0 (Yes)
● Used by advanced users
● Not enabled by default
● Have limitations, not user-friendly
● MariaDB 10.4
– Fixes some of the limitations
– Makes histograms easier to use
Collecting histograms
Configuration for collecting histograms
● MariaDB before 10.4: change the default histogram size
– Default:
histogram_size=0
histogram_type=SINGLE_PREC_HB
– Suggested:
histogram_size=254
histogram_type=DOUBLE_PREC_HB
● MariaDB 10.4: enable automatic sampling
– Default:
histogram_size=254
histogram_type=DOUBLE_PREC_HB
analyze_sample_percentage=100
– Suggested:
analyze_sample_percentage=0
Histograms are [still] not collected by default
● “ANALYZE TABLE” will not collect a histogram
MariaDB> analyze table t1;
+---------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status | OK |
+---------+---------+----------+----------+
● This will collect only
– Total #rows in table
– Index cardinalities (#different values)
ANALYZE ... PERSISTENT collects histograms
– Collect statistics for selected columns and indexes:
analyze table t1 persistent
for columns (col1,...) indexes (idx1,...);
– Collect statistics for everything:
analyze table t1 persistent for all;
+---------+---------+----------+-----------------------------------------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+-----------------------------------------+
| test.t1 | analyze | status | Engine-independent statistics collected |
| test.t1 | analyze | status | OK |
+---------+---------+----------+-----------------------------------------+
Can make histogram collection automatic
set use_stat_tables='preferably';
analyze table t1;
+---------+---------+----------+-----------------------------------------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+-----------------------------------------+
| test.t1 | analyze | status | Engine-independent statistics collected |
| test.t1 | analyze | status | OK |
+---------+---------+----------+-----------------------------------------+
● Beware: this may be *much* slower than the ANALYZE TABLE you’re used to
● Great for migrations
Histogram collection performance
● MariaDB 10.0: uses all data in the table to build histogram
– Precise, but expensive
– Particularly so for VARCHARs
● A test on a real table:
– Real table, 740M rows, 90GB
– CHECKSUM TABLE: 5 min
– ANALYZE TABLE ... PERSISTENT FOR ALL – 30 min
MariaDB 10.4: Bernoulli sampling
● Default: analyze_sample_percentage=100
– Uses the entire table, slow
● Suggested: analyze_sample_percentage=0
– “Roll the dice” sampling, size picked automatically
analyze table t1 persistent for columns (...) indexes();
– does a full table scan
analyze table t1 persistent for all;
– full table and secondary index scans
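Bernoulli sampling itself is simple to sketch: every row is visited, and each one is kept independently with a fixed probability. The code below is a generic illustration, not MariaDB's implementation. In particular it does not model analyze_sample_percentage=0, where the server picks the sample size automatically; here the percentage is always given explicitly, and the function name is hypothetical.

```python
import random


def bernoulli_sample(rows, percentage, seed=None):
    """Keep each row independently with probability percentage/100."""
    rng = random.Random(seed)
    p = percentage / 100.0
    # Every row is still read once; only the kept rows feed the histogram.
    return [row for row in rows if rng.random() < p]


rows = range(100_000)
sample = bernoulli_sample(rows, 10, seed=42)
print(len(sample))  # close to 10,000, but not exactly: the size is random
```

Because each row gets its own "dice roll", the sample size varies from run to run, and the table is still scanned in full, which matches the notes above.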
Further plans: genuine sampling
● Work on avoiding full table scans is in progress
● Will allow ANALYZE TABLE to collect all histograms
Making the optimizer
use histograms
Make the optimizer use histograms
● MariaDB before 10.4: does not use histograms
– Default:
@@use_stat_tables=NEVER
@@optimizer_use_condition_selectivity=1
– To enable:
@@use_stat_tables=PREFERABLY // also affects ANALYZE!
@@optimizer_use_condition_selectivity=4
● MariaDB 10.4: uses histograms if they are collected
– Default:
@@use_stat_tables=PREFERABLY_FOR_QUERIES
@@optimizer_use_condition_selectivity=4
– remember to re-collect!
Conclusions: how to start using histograms
● MariaDB before 10.4
histogram_size=254 # No risk
histogram_type=DOUBLE_PREC_HB #
use_stat_tables=PREFERABLY # Changes optimizer
optimizer_use_condition_selectivity=4 # behavior
● MariaDB 10.4
analyze_sample_percentage=0
● Both: ANALYZE TABLE ... PERSISTENT FOR ...
Can I just have histograms
for all columns?
A stored procedure to analyze every table
DELIMITER |
CREATE PROCEDURE analyze_persistent_for_all(db_name VARCHAR(64))
BEGIN
  -- Runs ANALYZE TABLE ... PERSISTENT FOR ALL on every base table in db_name
  DECLARE done INT DEFAULT FALSE;
  DECLARE x VARCHAR(64);
  DECLARE cur1 CURSOR FOR
    SELECT TABLE_NAME
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA=db_name;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
  OPEN cur1;
  read_loop: LOOP
    FETCH cur1 INTO x;
    IF done THEN
      LEAVE read_loop;
    END IF;
    -- Qualify and quote the table name so the call works from any current database
    SET @sql = CONCAT('analyze table `', db_name, '`.`', x, '` persistent for all');
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
  END LOOP;
  CLOSE cur1;
END|
DELIMITER ;
Should I ANALYZE ... PERSISTENT every table?
● New application
– Worth giving it a try
– Provision for periodic ANALYZE
– Column correlations?
● Existing application
– Performance fixes on a case-by-case basis.
Tests and benchmarks
TPC-DS benchmark
● scale=1
● The same dataset
– without histograms: ~20 min
– after "call analyze_persistent_for_all('tpcds')" (the stored procedure shown earlier): 5 min.
TPC-DS benchmark run
A customer case with ORDER BY ... LIMIT
● table/column names replaced
CREATE TABLE cars (
type varchar(10),
company varchar(20),
model varchar(20),
quantity int,
KEY quantity (quantity),
KEY type (type)
);
select * from cars
where
type='electric' and
company='audi'
order by
quantity
limit 3;
● The index on quantity matches the ORDER BY, but rows read through it still have to be checked against the condition
● The index on type is restrictive
A customer case with ORDER BY ... LIMIT
● Uses ORDER-BY compatible index by default
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: cars
type: index
possible_keys: type
key: quantity
key_len: 5
ref: const
rows: 994266
r_rows: 700706.00
filtered: 0.20
r_filtered: 0.00
Extra: Using where
1 row in set (2.098 sec)
select * from cars
where
type='electric' and
company='audi'
order by
quantity
limit 3;
A customer case with ORDER BY ... LIMIT
● Providing the optimizer with histogram
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: cars
type: ref
possible_keys: type
key: type
key_len: 13
ref: const
rows: 2022
r_rows: 3.00
filtered: 100.00
r_filtered: 100.00
Extra: Using index condition; Using where; Using filesort
1 row in set (0.010 sec)
analyze table cars persistent for all;
select * from cars
where
type='electric' and
company='audi'
order by
quantity
limit 3;
Operations
Histograms are stored in a table
CREATE TABLE mysql.column_stats (
db_name varchar(64) NOT NULL,
table_name varchar(64) NOT NULL,
column_name varchar(64) NOT NULL,
min_value varbinary(255) DEFAULT NULL,
max_value varbinary(255) DEFAULT NULL,
nulls_ratio decimal(12,4) DEFAULT NULL,
avg_length decimal(12,4) DEFAULT NULL,
avg_frequency decimal(12,4) DEFAULT NULL,
hist_size tinyint unsigned,
hist_type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB'),
histogram varbinary(255),
PRIMARY KEY (db_name,table_name,column_name)
);
TPC-DS benchmark
● Can save/restore histograms
● Can set @@optimizer_use_condition_selectivity to disable
histogram use per-thread
Caveat: correlations
Problem with correlated conditions
● Possible selectivities
– MIN(1/n, 1/m)
– (1/n) * (1/m)
– 0
select ...
from order_items
where shipdate='2015-12-15' AND item_name='christmas light'
-- compare with: item_name='swimsuit'
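The three candidate selectivities can be worked through with numbers. The reading of n and m is an assumption: n distinct ship dates and m distinct item names, so each equality alone matches 1/n and 1/m of the rows. The values 365 and 1000 and the function name are made up for illustration.

```python
def combined_selectivity_candidates(sel_a, sel_b):
    """Possible selectivities of (cond_a AND cond_b), depending on correlation."""
    return {
        # One condition implies the other: the stricter one decides.
        "fully_correlated": min(sel_a, sel_b),
        # No relationship between the columns: probabilities multiply.
        "independent": sel_a * sel_b,
        # The conditions never hold together.
        "mutually_exclusive": 0.0,
    }


n, m = 365, 1000          # hypothetical distinct-value counts
table_rows = 10_000_000   # hypothetical table size
for name, sel in combined_selectivity_candidates(1 / n, 1 / m).items():
    print(name, round(table_rows * sel))
```

Per-column histograms alone cannot tell these cases apart, so the row estimate for the same pair of conditions can range from zero to thousands of rows.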
Problem with correlated conditions
● PostgreSQL: Multi-variate statistics
– Detects functional dependencies, col1=F(col2)
– Only used for equality predicates
– Also #DISTINCT(a,b)
● MariaDB: MDEV-11107: Use table check constraints in optimizer
– In development
select ...
from order_items
where shipdate='2015-12-15' AND item_name='christmas light'
-- compare with: item_name='swimsuit'
Thanks!

How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 

Recently uploaded (20)

Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 

Using histograms to optimize database queries

  • 1. Using histograms to get better performance Sergei Petrunia Varun Gupta
  • 2. Database performance ● Performance is a product of many factors ● One of them is Query optimizer ● It produces query plans – A “good” query plan only reads rows that contribute to the query result – A “bad” query plan means unnecessary work is done
  • 3. Do my queries use bad query plans? ● Queries take a long time ● Some are just inherently hard to compute ● Some look good but turn out bad due to factors that were not accounted for
  • 4. Query plan cost depends on data statistics select * from lineitem, orders where o_orderkey=l_orderkey and o_orderdate between '1990-01-01' and '1998-12-06' and l_extendedprice > 1000000 ● orders->lineitem vs lineitem->orders ● Depends on condition selectivity
  • 5. Another choice the optimizer has to make select * from orders where o_orderstatus='F' order by o_orderdate limit 10 ● Use index(o_orderdate) – Stop as soon as we find 10 matches ● Find rows with o_orderstatus='F' – Sort by o_orderdate picking first 10 ● Again, it depends on condition selectivity.
  • 6. Data statistics in MariaDB ● Table: #rows in the table ● Index – cardinality: AVG(#lineitems per order) – “range estimates” - #rows(t.key BETWEEN const1 and const2) ● Non-index column? Histogram
  • 7. Histogram ● Partition the value space into buckets – Store bucket bounds and #values in the bucket – Imprecise – Very compact
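The bucket idea on the histogram slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses equal-width buckets and assumes values are spread uniformly inside each bucket, which is not necessarily how MariaDB stores or reads its histograms. The function names are hypothetical.

```python
def build_histogram(values, n_buckets):
    """Partition the value range into equal-width buckets and count values per bucket."""
    lo, hi = min(values), max(values)
    # Guard against a zero width when all values are identical.
    width = (hi - lo) / n_buckets or 1.0
    counts = [0] * n_buckets
    for v in values:
        # The maximum value falls on the upper edge, so clamp it into the last bucket.
        i = min(int((v - lo) / width), n_buckets - 1)
        counts[i] += 1
    return lo, width, counts


def estimate_fraction_greater(hist, total_rows, c):
    """Estimate the fraction of rows with value > c (assumes uniformity inside a bucket)."""
    lo, width, counts = hist
    est = 0.0
    for i, count in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        if c <= b_lo:
            est += count                          # whole bucket qualifies
        elif c < b_hi:
            est += count * (b_hi - c) / width     # partial bucket, interpolated
    return est / total_rows


values = list(range(1000))
hist = build_histogram(values, 10)
print(estimate_fraction_greater(hist, len(values), 900))
```

The estimate is imprecise (the interpolation inside the last bucket is a guess) but the histogram itself is only ten counters, which is the trade-off the slide describes.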
  • 8. Summary so far ● Good database performance requires good query plans ● To pick those, optimizer needs statistics about the data – Condition selectivity is important ● Certain kinds of statistics are always available – Indexes – For non-indexed columns, histograms may be needed.
  • 9. Do my query plans suffer from bad statistics?
  • 10. Will my queries benefit? ● Very complex question ● No definite answer ● Suggestions – ANALYZE for statements, r_filtered. – Slow query log
  • 11. ANALYZE for statements and r_filtered
      ● filtered – % of rows left after applying condition (expectation)
        – r_filtered - ... - the reality
      ● r_filtered << filtered – the optimizer didn’t know the condition is selective
        – Happens on a non-first table? We are filtering out late!
      ● Add histogram on the column (Check the cond in FORMAT=JSON)

      analyze select * from lineitem, orders
      where o_orderkey=l_orderkey and
            o_orderdate between '1990-01-01' and '1998-12-06' and
            l_extendedprice > 1000000

      +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
      |id|select_type|table   |type|possible_keys|key    |key_len|ref              |rows   |r_rows |filtered|r_filtered|Extra      |
      +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
      |1 |SIMPLE     |orders  |ALL |PRIMARY,i_...|NULL   |NULL   |NULL             |1504278|1500000|  50.00 |   100.00 |Using where|
      |1 |SIMPLE     |lineitem|ref |PRIMARY,i_...|PRIMARY|4      |orders.o_orderkey|2      |4.00   | 100.00 |     0.00 |Using where|
      +--+-----------+--------+----+-------------+-------+-------+-----------------+-------+-------+--------+----------+-----------+
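The "r_filtered << filtered" check from this slide can be automated once the two columns have been extracted from ANALYZE output. The sketch below is hypothetical: the function name, the tuple layout and the ratio threshold of 10 are assumptions chosen for illustration, not anything MariaDB provides.

```python
def find_late_filtering(plan_rows, ratio=10.0):
    """Return tables whose observed r_filtered is far below the optimizer's filtered estimate.

    plan_rows: iterable of (table, filtered, r_filtered) taken from ANALYZE output.
    ratio: hypothetical threshold; a table is flagged when filtered > ratio * r_filtered.
    """
    flagged = []
    for table, filtered, r_filtered in plan_rows:
        if r_filtered * ratio < filtered:
            flagged.append(table)
    return flagged


# filtered / r_filtered values as they appear in the ANALYZE output on the slide
plan = [("orders", 50.0, 100.0), ("lineitem", 100.0, 0.0)]
print(find_late_filtering(plan))
```

With the slide's numbers only lineitem is flagged, which points at the l_extendedprice condition as the candidate for a histogram.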
  • 12. Slow Query Log
      my.cnf:
        slow-query-log
        long-query-time=...
        log-slow-verbosity=query_plan,explain

      hostname-slow.log:
        # Query_time: 1.961549 Lock_time: 0.011164 Rows_sent: 1 Rows_examined: 11745000
        # Rows_affected: 0 Bytes_sent: 73
        # Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
        # Filesort: No Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
        #
        # explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
        # explain: 1 SIMPLE inventory ALL NULL NULL NULL NULL 11837024 11745000.00 100.00 0.00 Using where
        #
        SET timestamp=1551155484;
        select count(inv_date_sk) from inventory where inv_quantity_on_hand>10000;

      ● Rows_examined >> Rows_sent? Grouping, or a poor query plan
      ● log_slow_verbosity=explain will show ANALYZE output
  • 14. Histograms in MariaDB ● Available since MariaDB 10.0 (Yes) ● Used by advanced users ● Not enabled by default ● Have limitations, not user-friendly ● MariaDB 10.4 – Fixes some of the limitations – Makes histograms easier to use
  • 16. Configuration for collecting histograms
      ● MariaDB before 10.4: change the default histogram size
          default:    histogram_size=0    histogram_type=SINGLE_PREC_HB
          change to:  histogram_size=254  histogram_type=DOUBLE_PREC_HB
      ● MariaDB 10.4: enable automatic sampling
          default:    histogram_size=254  histogram_type=DOUBLE_PREC_HB  analyze_sample_percentage=100
          change to:  analyze_sample_percentage=0
  • 17. Histograms are [still] not collected by default
      ● “ANALYZE TABLE” will not collect a histogram

      MariaDB> analyze table t1;
      +---------+---------+----------+----------+
      | Table   | Op      | Msg_type | Msg_text |
      +---------+---------+----------+----------+
      | test.t1 | analyze | status   | OK       |
      +---------+---------+----------+----------+

      ● This will collect only
        – Total #rows in table
        – Index cardinalities (#different values)
  • 18. ANALYZE ... PERSISTENT collects histograms
      analyze table t1 persistent for columns (col1,...) indexes (idx1,...);
      +---------+---------+----------+-----------------------------------------+
      | Table   | Op      | Msg_type | Msg_text                                |
      +---------+---------+----------+-----------------------------------------+
      | test.t1 | analyze | status   | Engine-independent statistics collected |
      | test.t1 | analyze | status   | OK                                      |
      +---------+---------+----------+-----------------------------------------+

      – Collect statistics for everything:
      analyze table t1 persistent for all;
  • 19. Can make histogram collection automatic
      set use_stat_tables='preferably';
      analyze table t1;
      +---------+---------+----------+-----------------------------------------+
      | Table   | Op      | Msg_type | Msg_text                                |
      +---------+---------+----------+-----------------------------------------+
      | test.t1 | analyze | status   | Engine-independent statistics collected |
      | test.t1 | analyze | status   | OK                                      |
      +---------+---------+----------+-----------------------------------------+

      ● Beware: this may be *much* slower than the ANALYZE TABLE you’re used to
      ● Great for migrations
  • 20. Histogram collection performance ● MariaDB 10.0: uses all data in the table to build histogram – Precise, but expensive – Particularly so for VARCHARs ● A test on a real table: – Real table, 740M rows, 90GB – CHECKSUM TABLE: 5 min – ANALYZE TABLE ... PERSISTENT FOR ALL – 30 min
  • 21. MariaDB 10.4: Bernoulli sampling
      ● Default: analyze_sample_percentage=100
        – Uses the entire table, slow
      ● Suggested: analyze_sample_percentage=0
        – “Roll the dice” sampling, size picked automatically

      analyze table t1 persistent for columns (...) indexes();
        – does a full table scan
      analyze table t1 persistent for all;
        – full table and secondary index scans
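"Roll the dice" (Bernoulli) sampling means each row is kept or skipped independently with a fixed probability. A minimal Python sketch follows; the function name and the seed parameter are hypothetical, and it does not model the automatic size selection that analyze_sample_percentage=0 requests from the server.

```python
import random


def bernoulli_sample(rows, percentage, seed=None):
    """Keep each row independently with probability percentage/100.

    Note that every row is still visited: the sample is smaller,
    but the scan over the input is a full one.
    """
    rng = random.Random(seed)
    p = percentage / 100.0
    return [row for row in rows if rng.random() < p]


sample = bernoulli_sample(range(100_000), 10, seed=1)
print(len(sample))  # roughly 10% of the input; the exact size varies with the seed
```

Because the decision is made per row while reading it, this kind of sampling reduces the work of building the histogram but not the cost of reading the table.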
  • 22. Further plans: genuine sampling ● Work on avoiding full table scans is in progress ● Will make it possible for ANALYZE TABLE to collect all histograms
  • 24. Make the optimizer use histograms
      ● MariaDB before 10.4: does not use histograms
          default:    @@use_stat_tables=NEVER       @@optimizer_use_condition_selectivity=1
          change to:  @@use_stat_tables=PREFERABLY  // also affects ANALYZE!
                      @@optimizer_use_condition_selectivity=4
      ● MariaDB 10.4: uses histograms if they are collected
          default:    @@use_stat_tables=PREFERABLY_FOR_QUERIES  @@optimizer_use_condition_selectivity=4
      – remember to re-collect!
  • 25. Conclusions: how to start using histograms
      ● MariaDB before 10.4
          histogram_size=254                       # No risk
          histogram_type=DOUBLE_PREC_HB            #
          use_stat_tables=PREFERABLY               # Changes optimizer
          optimizer_use_condition_selectivity=4    # behavior
      ● MariaDB 10.4
          analyze_sample_percentage=0
      ● Both: ANALYZE TABLE ... PERSISTENT FOR ...
  • 26. Can I just have histograms for all columns?
  • 27. A stored procedure to analyze every table
      CREATE PROCEDURE analyze_persistent_for_all(db_name VARCHAR(64))
      BEGIN
        DECLARE done INT DEFAULT FALSE;
        DECLARE x VARCHAR(64);
        -- All base tables (not views) in the given schema
        DECLARE cur1 CURSOR FOR
          SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
          WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA = db_name;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
        OPEN cur1;
        read_loop: LOOP
          FETCH cur1 INTO x;
          IF done THEN
            LEAVE read_loop;
          END IF;
          -- Qualify and quote the name so the procedure also works when
          -- db_name is not the current database
          SET @sql = CONCAT('analyze table `', db_name, '`.`', x, '` persistent for all');
          PREPARE stmt FROM @sql;
          EXECUTE stmt;
          DEALLOCATE PREPARE stmt;
        END LOOP;
        CLOSE cur1;
      END|
  • 28. Should I ANALYZE ... PERSISTENT every table? ● New application – Worth giving it a try – Provision for periodic ANALYZE – Column correlations? ● Existing application – Performance fixes on a case-by-case basis.
  • 30. TPC-DS benchmark ● scale=1 ● The same dataset – without histograms: ~20 min – after call analyze_persistent_for_all('tpcds') (the procedure from slide 27): 5 min.
  • 32. A customer case with ORDER BY ... LIMIT
      ● table/column names replaced

      CREATE TABLE cars (
        type varchar(10),
        company varchar(20),
        model varchar(20),
        quantity int,
        KEY quantity (quantity),
        KEY type (type)
      );

      select * from cars where type='electric' and company='audi' order by quantity limit 3;

      ● The quantity index matches the ORDER BY, but rows read through it still have to be checked against the condition
      ● type is a restrictive index
  • 33. A customer case with ORDER BY ... LIMIT
      ● Uses ORDER-BY compatible index by default

      select * from cars where type='electric' and company='audi' order by quantity limit 3;

      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: cars
               type: index
      possible_keys: type
                key: quantity
            key_len: 5
                ref: const
               rows: 994266
             r_rows: 700706.00
           filtered: 0.20
         r_filtered: 0.00
              Extra: Using where
      1 row in set (2.098 sec)
  • 34. A customer case with ORDER BY ... LIMIT
      ● Providing the optimizer with histogram

      analyze table cars persistent for all;
      select * from cars where type='electric' and company='audi' order by quantity limit 3;

      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: cars
               type: ref
      possible_keys: type
                key: type
            key_len: 13
                ref: const
               rows: 2022
             r_rows: 3.00
           filtered: 100.00
         r_filtered: 100.00
              Extra: Using index condition; Using where; Using filesort
      1 row in set (0.010 sec)
  • 36. Histograms are stored in a table
      CREATE TABLE mysql.column_stats (
        db_name varchar(64) NOT NULL,
        table_name varchar(64) NOT NULL,
        column_name varchar(64) NOT NULL,
        min_value varbinary(255) DEFAULT NULL,
        max_value varbinary(255) DEFAULT NULL,
        nulls_ratio decimal(12,4) DEFAULT NULL,
        avg_length decimal(12,4) DEFAULT NULL,
        avg_frequency decimal(12,4) DEFAULT NULL,
        hist_size tinyint unsigned,
        hist_type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB'),
        histogram varbinary(255),
        PRIMARY KEY (db_name,table_name,column_name)
      );
  • 37. TPC-DS benchmark ● Can save/restore histograms ● Can set @@optimizer_use_condition_selectivity to disable histogram use per-thread
  • 39. Problem with correlated conditions
      select ... from order_items
      where shipdate='2015-12-15' AND item_name='christmas light'
      (the slide also shows 'swimsuit' as an alternative item_name value)

      ● Possible selectivities
        – MIN(1/n, 1/m)
        – (1/n) * (1/m)
        – 0
  • 40. Problem with correlated conditions
      select ... from order_items
      where shipdate='2015-12-15' AND item_name='christmas light'
      (the slide also shows 'swimsuit' as an alternative item_name value)

      ● PostgreSQL: Multi-variate statistics
        – Detects functional dependencies, col1=F(col2)
        – Only used for equality predicates
        – Also #DISTINCT(a,b)
      ● MariaDB: MDEV-11107: Use table check constraints in optimizer
        – In development
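The three "possible selectivities" from slide 39 can be made concrete with a small calculation. The sketch assumes that n and m stand for the number of distinct values of the two columns (so 1/n and 1/m are the per-condition selectivities); the slides do not define them, and the function name and the sample numbers are hypothetical.

```python
def combined_selectivity_cases(n, m):
    """Selectivity of (cond_a AND cond_b) under three assumptions about correlation.

    n, m: assumed number of distinct values in the two columns, so each
    equality condition alone has selectivity 1/n and 1/m respectively.
    """
    sel_a = 1.0 / n
    sel_b = 1.0 / m
    return {
        # One condition implies the other: the more selective one decides.
        "fully_correlated": min(sel_a, sel_b),
        # Independent columns: selectivities multiply.
        "independent": sel_a * sel_b,
        # The two values never occur together (e.g. swimsuits shipped in December).
        "mutually_exclusive": 0.0,
    }


for case, sel in combined_selectivity_cases(n=365, m=1000).items():
    print(f"{case}: {sel:.8f}")
```

Per-column histograms give only 1/n and 1/m, so the optimizer cannot tell which of the three cases applies; that gap is what multi-variate statistics are meant to close.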