MySQL Performance Tuning and Optimization

Who Am I?
• Jonathan
• MySQL Consultant
• Working with MySQL since 2007
• Specialize in SQL, Indexing and
Reporting (Big Data)

Who is this for?

* Smilies indicate ability to control area

What Will I Cover?

[Figure: problems ranked by occurrences against domain knowledge (0% to 100%); the solutions this tutorial will cover sit in the first 20%. Source: serenesimplycomplicated.blogspot.co.uk/2012/07/developing-direction.html]

What Will I Cover?

• The top 20% of the strategies to resolve 80% of
your performance problems
• The strategies that are within reach of
developers
• Strategies that are more common and more
established
• From my experience.
• To reduce risk

Table of Contents

Part One
• Indexes
• Finding Bottlenecks
Part Two
• Partitioning
• Intensive Table Optimization
Part Three
• Read Cache
• Scaling Reads
• Reporting
• Write Buffers
• Scaling Writes
• Sharding

Indexes
• Advantages
• Speed – Use the right path
• Now used in NoSQL stores
• “A properly indexed database will give
you very few problems” – me
• My blog – “Indexing and Caching”

B-Tree Indexes

(http://20bits.com/article/interview-questions-database-indexes)

(http://www.youtube.com/watch?v=coRJrcIYbF4)

Choosing the best Index

• 3 stars – Prevent reading from disk
• 2 stars – Prevent extra processing
• 1 star – Prevent table scans

Indexes
• 1 Star – EXPLAIN
• Type
• const - where id=1
• ref - where location='london'
• eq_ref - where t1.id = t2.id
• Extra
• using where
• Limitation – type
• range - where id in (1,2,3,4,5)

Indexes
• Extra
• using where
• Using index
• And Not
• Using filesort
• Using temporary
• Limitation
• Using temporary - query contains different GROUP
BY and ORDER BY columns

Indexes
• Type
• index
• Extra
• Using index
• Using index for group-by

Regular Usage

SELECT , FROM ₸
WHERE = 121;

(PRIMARY KEY )

Range Scan

SELECT ▲, ■ FROM ₸
WHERE = 121
AND BETWEEN 1 AND 100

KEY ( , )

Range Scan

WHERE = 121
AND IN (1,100,30,7)

KEY ( , )

Covering Index

SELECT , FROM ₸
WHERE = 121

KEY ( , , )
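
A covering index holds every column the query needs, so the rows themselves are never read. The column symbols on the slide did not survive, so this sketch uses hypothetical columns a, b, c, d, and it runs on SQLite (Python's built-in sqlite3) purely as a stand-in. MySQL shows the same idea as "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

def plan(conn, sql):
    """Return the detail strings of EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Hypothetical table: a is the filter column, b and c are selected, d is not indexed.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b INT, c INT, d TEXT)")
conn.execute("CREATE INDEX idx_a_b_c ON t (a, b, c)")

# Every selected column lives in the index: answered from the index alone.
covered = plan(conn, "SELECT b, c FROM t WHERE a = 121")
# d is not in the index: the index finds the rows, then the table is read.
not_covered = plan(conn, "SELECT d FROM t WHERE a = 121")

print(covered)
print(not_covered)
```

The first plan mentions a covering index, the second uses the same index but still has to visit the table for d.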

Not Optimal

SELECT FROM ₸
WHERE = 121
AND ♪ IN (1,100,30,7)

KEY ( , )

Broken Range

WHERE = 121
AND IN (1,100,30,7);

KEY ( , )

Sub Queries

WHERE IN (SELECT ♪ FROM ♠)

KEY ( )

Indexes for Sorting

SELECT ☼, ☺ FROM ₸
WHERE = 121
GROUP BY
ORDER BY

KEY ( )

Indexes for Sorting

SELECT ☼, ☺ FROM ₸
WHERE = 121
GROUP BY
ORDER BY

KEY ( , )
or KEY ( , )

Indexes for Joins

SELECT . , . FROM
INNER JOIN ON . = .
WHERE . = 232

KEY ( )
or KEY( )

WHERE ₸.Ω = 232; INNER JOIN ♫

FULL SCAN ♫ INNER JOIN ₸

FILTER Ω = 232

SELECT ▲, (SELECT .. FROM WHERE )
FROM ₸ WHERE ♪ IN
(SELECT ♪ FROM ♠ WHERE..);

“I need help optimizing the my.cnf”

Clustered PK and
Secondary Indexes

KEY ( , , )

PRIMARY KEY ( )
KEY ( , )

(Can be used in GROUP / ORDER BY
SELECT variables to make Covering index)
http://www.dbasquare.com/2012/05/17/can-mysql-use-primary-key-values-from-a-secondary-index/

Index Merge

SELECT ☺ FROM ₸
WHERE =1 OR =2;

KEY ( )
KEY ( )

Index Merge

SELECT ☺ FROM ₸ WHERE =1
UNION (SELECT ☺ FROM ₸ WHERE =2)

KEY ( )
KEY ( )
* 5.6

Gathering Data

What you will need:
1. MySQL Slow log: MySQL >= 5.1 or
Percona/MariaDB microslow patch
2. Set long_query_time = 0 - for 6 to 24 hours
for decent sized slow log. (Make sure host
has enough space)

Log Processing
Worst Response Queries
1. echo '' > slow.log
2. mysql> set global long_query_time=0; set long_query_time = 0; flush logs;
3. Wait X hours and return original value.
4. pt-query-digest slow.log > slow.txt
* Processing should be done on another host
Bulky Queries
--filter '($event->{Rows_examined} > 1000)'
Write Queries
--filter '($event->{Rows_affected} > 0)'

Log Processing
• MySQL 5.6
• Statement Digest
• No need for log processing to get Digest.

pt-query-digest slow.log > slow.txt

Rank  Response time     Calls    R/Call  Item
1     480.9273  16.3%   600      0.8015  SELECT dp_node dp_usernode dp_buddylist dp_users dp_node_access
2     322.4220   4.3%   129258   0.0025  ADMIN INIT DB
3     314.8719   4.2%   30220    0.0104  UPDATE dp_users
4     287.7109   3.8%   51606    0.0056  SET
5     269.3434   3.6%   600      0.4489  SELECT dp_node dp_usernode dp_buddylist dp_users dp_node_access
6     238.8571   3.1%   2902141  0.0001  SELECT dp_url_alias
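
How to read the report: R/Call is the total response time divided by the number of calls. A small sketch that recomputes it from a few rows of the report above (the names `rows` and `r_per_call` are only for illustration):

```python
# (rank, total response time in seconds, calls, item) copied from the report
rows = [
    (1, 480.9273, 600, "SELECT dp_node dp_usernode dp_buddylist dp_users dp_node_access"),
    (2, 322.4220, 129258, "ADMIN INIT DB"),
    (3, 314.8719, 30220, "UPDATE dp_users"),
    (6, 238.8571, 2902141, "SELECT dp_url_alias"),
]

def r_per_call(total_seconds, calls):
    """Average response time per call, rounded like the report."""
    return round(total_seconds / calls, 4)

for rank, total, calls, item in rows:
    print(rank, r_per_call(total, calls), item)
```

Rank 1 is slow per call but runs only 600 times; rank 6 is fast per call but runs 2.9 million times. Both end up near the top because the ranking is by total time.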

mysql tables in use 4, locked 2
5289 lock struct(s), heap size 620984, 273785 row lock(s), undo log entries
363312
MySQL thread id 467, OS thread handle 0x7fceab7df700, query id 88914423
SELECT r.exchange_rate INTO destination_exchange_rate
FROM exchange_rate AS r WHERE r.currency_id =
NAME_CONST('destination_currency_id',6) AND r.date =
NAME_CONST('day',_latin1'2012-06-30' COLLATE
'latin1_swedish_ci')
Trx read view will not see trx with id >= 10ECC92B0, sees < 10ECC916F
TABLE LOCK table `currency` trx id 10ECC90CC lock mode IS
RECORD LOCKS space id 0 page no 261 n bits 80 index `fk_currency_status1` of
table `currency` trx id 10ECC90CC lock mode S
TABLE LOCK table `daily_summary` trx id 10ECC90CC lock mode IS
RECORD LOCKS space id 0 page no 34829580 n bits 200 index `PRIMARY` of table
`daily_summary` trx id 10ECC90CC lock mode S
TABLE LOCK table `exchange_rate` trx id 10ECC90CC lock mode IS
TOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS

*** (1) TRANSACTION:
TRANSACTION 13DCDF4D9, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1248, 2 row lock(s)
MySQL thread id 2438176, OS thread handle 0x7f9a37408700, query id
118341815748
UPDATE sys_doctrine_lock_tracking
SET timestamp_obtained = '1341839053'
WHERE object_key = '1146' AND user_ident = '158' AND c_type = '137'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 48627 n bits 280 index `PRIMARY` of table
`sys_doctrine_lock_tracking` trx id 13DCDF4D9 lock_mode X locks rec but not
gap waiting
Record lock, heap no 207 PHYSICAL RECORD: n_fields 6; compact format;
0: len 8; hex 43616d706169676e;
1: len 4; hex 31313436; asc 1146;;
2: len 6; hex 00013dc12d7a; asc = -z;;
…

Bottlenecks
• Locking Queries
• Try to make them complete as fast as possible
• JOINs vs sub query
• Function wrapped around index
• UDF with SQL inside
• Long Transactions
• Looping with short queries

Misbehaving Optimizer
• Optimizer Hints:
• USE INDEX
• FORCE INDEX
• IGNORE INDEX
• STRAIGHT_JOIN
• Joins
• LEFT sometimes faster than INNER

* Too many indexes confuse the optimizer

Bottlenecks
• Virtualization
• Increase (obscenely) innodb_log_file_size
• Needs restart + deleting old log files
• EXT3
• General I/O improvements
• innodb-flush-log-at-trx-commit
• sync_binlog
* Group Commit
• XA support

Bottlenecks
• General I/O improvements
• Percona server
• Better flushing to disk
• Less mutexes
• Upgrade MySQL
• Same reasons as above
• Innodb-io-capacity
http://www.wmarow.com/strcalc/

Database Upgrades
• My Secret Sauce for smooth MySQL
Migrations
1. Upgrade the dev/staging DBs with desired
version
2. Wait 1-2 months till silky-smooth
• All features have been tested
• All the query issues have been fixed
3. Upgrade servers
• Down to Up – Slaves first

Bottlenecks
• Mutexes
• Query Cache
• “Freeing items” in processlist
• Network
• Skip-name-resolve
• net_write_timeout / read_timeout (batch processes)
• thread_cache_size (lots of connections)

Homework
• Haven’t talked fully about:
SQL and EXPLAIN
• Webinars
• Indexes - percona.tv/percona-webinars/tools-and-techniques-for-index-design
• Explain - percona.tv/percona-webinars/explain-demystified
• Websites
• http://www.myxplain.net

[Figure: a big table (columns by rows) passed through a partitioning algorithm that splits its rows into several partitions]

[Figure: partitioning mind map. Use cases: reduce data, parallelize data. Benefits: short scans, B-tree levels. Issues: algorithm (select, inserting), keys (unique key, foreign keys, primary key), overhead. Automatic: ID, time, hash. Manual: time, archive, shards. Other labels: 80-90%, DB usage.]

Partitioning
• Use Cases
• Reducing Data – Only get the partition/table that you need
• Parallelizing Data - Get an equal amount of data from each partition in parallel
• Benefits
• Shorter table scans
• Fewer levels for index scans

Issues
• Algorithm
• INSERTs
• SELECTs
• Keys
• Foreign Key
• Unique Key
• Primary Key
• Overhead
• Increases Table size
• Can change indexes

Partition Types
• Range
• List
• Hash
• Key
• Columns
• Sub-partitioning

Partitioning by Usage
vs Partitioning by Maintenance

Rank  Response time     Calls   Item
1     480.9273  26.3%   129258  SELECT address FROM WHERE BETWEEN '2012-11-10' and '2012-11-17'
2     322.4220  14.3%   600     SELECT total FROM WHERE BETWEEN '2012-11-01' and '2012-11-30'
3     34.8719    4.2%   30220   UPDATE SET active=1 WHERE = 17635376
4     28.7109    3.8%   51606   SELECT dispatch_time FROM WHERE = 7387612

CREATE TABLE orders (
id int unsigned not null auto_increment,
`date` date not null,
…
PRIMARY KEY (date, id),
KEY id (id)
..) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY RANGE ( (date))
(PARTITION VALUES LESS THAN ( ('2012-01-01')),
PARTITION VALUES LESS THAN ( ('2012-06-01')),
PARTITION VALUES LESS THAN )
* Can also do: PRIMARY KEY (id, date), KEY date (date)
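
Why range partitioning reduces data: a date filter only has to touch the partitions whose ranges overlap it. A minimal sketch of that lookup, using the two boundaries from the table above; the partition names p0, p1 and pmax are hypothetical, and this only models the idea, not how MySQL implements pruning.

```python
from bisect import bisect_right
from datetime import date

# Exclusive upper bounds, as in VALUES LESS THAN (...).
BOUNDS = [date(2012, 1, 1), date(2012, 6, 1)]
NAMES = ["p0", "p1", "pmax"]  # hypothetical names; the last one catches the rest

def partition_for(d):
    """Name of the partition that stores a row with this date."""
    return NAMES[bisect_right(BOUNDS, d)]

def partitions_for_range(start, end):
    """Partitions a `date BETWEEN start AND end` query has to read."""
    first = bisect_right(BOUNDS, start)
    last = bisect_right(BOUNDS, end)
    return NAMES[first:last + 1]

print(partition_for(date(2011, 12, 31)))
print(partitions_for_range(date(2012, 11, 10), date(2012, 11, 17)))
```

The one-week query from the digest above lands in a single partition, so the other partitions are never scanned.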

Manual Partitioning
• Archive - Main table & Archive Table
• Time – Create table per year, per month..
• Shards – Table per country

* Foreign keys

Intensive Table Optimization

Once upon a time, I was researching
ways to make a database working set fit as
much as possible to memory…

mysqlperformanceblog.com/2010/04/08/fast-ssd-or-more-memory/

1. People are usually very liberal with data type
sizes
2. There were (usually) so many indexes that:
1. They multiplied the table size
2. Were not efficient compared to how the table is used
3. Confused the optimizer
3. Discovered partitions were not used or misused
1. Discovered sub partitions
2. Primary Key alignment
4. Discovered InnoDB compression


There are tools to help with this, but…


[Figure: gathering data (slowest queries, bottleneck tables, query logs; optional: user statistics, table sizes) feeding the optimizations (data types, partitions, indexes, foreign keys)]

Gathering Data

What you will need:
1. MySQL Slow log: MySQL >= 5.1 or
Percona/MariaDB microslow patch
2. Set long_query_time = 0 - for 6 to 24 hours
for decent sized slow log. (Make sure host
has enough space)

Gathering Data

Helpful (Optional):
1. Percona/Mariadb user_statistics patch
2. Get list of most read/written tables
3. Get list of used and un-used indexes
4. List of largest tables

pt-query-digest slow.log > slow.txt


1 8589.9513 27.5% 231051 UPDATE dp_users
2 4752.6688 15.2% 257235 SELECT dp_cache_menu
3 1606.4946 5.1% 183542 SELECT community_chats
4 1418.9034 4.5% 259939 SELECT dp_cache
5 564.3305 1.8% 7970165 SELECT dp_url_alias
6 495.0092 1.6% 44940 SELECT dp_event dp_node

Table Statistics

SELECT table_name, rows_read
FROM information_schema.table_statistics
ORDER BY rows_read DESC LIMIT 5;
* Can also order by ROWS_CHANGED or ROWS_CHANGED_X_INDEXES

table_name        rows_read
dp_users          2302477894
dp_node           1231318439
dp_comments       1071462211
dp_userpoints     1033073070
dp_search_index   260154684

Table Statistics

Worst Response Tables
--group-by tables

1 7975.4487 6.5% 124384 advertisement
2 5554.1435 4.5% 1834 info
3 4915.4816 4.0% 208 placement
4 4902.7644 4.0% 158 advert_summary

Table Sizes

Table_Name            Rows   Data  Idx   Total_size  Idxfrac
total_daily_summary   610M   77G   88G   165G        1.15
advert_summary        478M   57G   45G   102G        0.78
log_messages          92M    47G   10G   57G         0.21

SELECT CONCAT(TABLE_SCHEMA, '.', TABLE_NAME) AS TABLE_NAME,
CONCAT(ROUND(TABLE_ROWS / 1000000, 2), 'M') ROWS,
CONCAT(ROUND(DATA_LENGTH / ( 1024 * 1024 * 1024 ), 2), 'G') DATA,
CONCAT(ROUND(INDEX_LENGTH / ( 1024 * 1024 * 1024 ), 2), 'G') IDX,
CONCAT(ROUND(( DATA_LENGTH + INDEX_LENGTH ) / ( 1024 * 1024 * 1024 ), 2), 'G') TOTAL_SIZE,
ROUND(INDEX_LENGTH / DATA_LENGTH, 2) IDXFRAC
FROM INFORMATION_SCHEMA.TABLES
ORDER BY DATA_LENGTH + INDEX_LENGTH DESC
LIMIT 10;

http://www.mysqlperformanceblog.com/2008/03/17/researching-your-mysql-table-sizes/

Which table needs
your attention?

Table Size

• Table Targeting
• The most “worthy” table to focus your
attention on
• Biggest bang for your buck
• If you know which table is the most
troublesome
• Ignore most of the investigations
• Apart from slow log
• Investigations help understand DB usage

[Figure: optimizations. Data types, compression, indexes, partitions, sub partitioning, foreign keys (don't need), query logs]

• Datatypes
• SELECT * FROM table\G
• Example: Tinyint instead of Bigint:
(7 bytes row + 7 bytes index) * 350 million rows = 4.9 GB
• Enum instead of Varchar
• Remove NULLs when not needed
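
The arithmetic behind the Tinyint example, as a small sketch. TINYINT is 1 byte and BIGINT is 8 bytes in MySQL; the function name and the assumption that the column sits in exactly one index are only for illustration.

```python
BYTES = {"TINYINT": 1, "BIGINT": 8}  # MySQL integer storage sizes

def bytes_saved(rows, old="BIGINT", new="TINYINT", indexed_copies=1):
    """Bytes saved by shrinking one column that is stored in the row
    and again in `indexed_copies` indexes."""
    per_value = BYTES[old] - BYTES[new]          # 7 bytes per stored value
    return per_value * (1 + indexed_copies) * rows

saved = bytes_saved(350_000_000)
print(saved / 10**9, "GB")
```

Every extra index that contains the column adds another 7 bytes per row to the saving.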

• Compression
• Best for tables with a lot of varchar/text
• Compress table by x2, x4, x8..
• Need to experiment with innodb_strict_mode = on;
• On my tests (5.5) – Very very slow
• Alter tables
• INSERTS/UPDATES/DELETES

[Figure: optimization cycle. 1. Get Data, 2. Make Assumptions, 3. Test Assumptions, 4. Deploy. Labels along the cycle: slow log filtered by target table, query digest, EXPLAIN, index-usage, new results]

Target Table Processing
Filter Log:
pt-query-digest slow.log
--filter '$event->{arg} =~ m/dp_users /'
--no-report --print >dp_users.log

Worst Queries from new log:
pt-query-digest dp_users.log --limit 100%
>tbl_dp_users.txt

Rank  Response time     Calls    Item
1     209.2863  10.7%   88850    UPDATE dp_users SET access = 133******3 WHERE = 23****01\G
3     162.2711   8.3%   1309010  SELECT access FROM dp_users WHERE = 21***4\G
4     139.9009   7.1%   197      SELECT uid, name FROM dp_users WHERE =1 ORDER BY DESC\G
5     133.8691   6.8%   327      SELECT * FROM dp_users u WHERE = 's******s'\G
6     109.6903   5.6%   29152    SELECT name, created, picture FROM dp_users WHERE picture !='' AND = '1' AND BETWEEN '133*****0' AND '133*****60'\G
7     92.9095    4.7%   360642   SELECT dp_node dp_users using ( ) dp_node_revisions
8     74.2426    3.8%   106      SELECT * FROM dp_users u WHERE = hoa****rio' AND = '3837********5f9b' AND = 1\G

Partitioning by Usage


Rank  Response time     Calls   Item
1     480.9273  26.3%   129258  SELECT address FROM orders WHERE date BETWEEN '2012-11-10' and '2012-11-17'
2     322.4220  14.3%   600     SELECT total FROM orders WHERE date BETWEEN '2012-11-01' and '2012-11-30'
3     34.8719    4.2%   30220   UPDATE orders SET active=1 WHERE id = 17635376
4     28.7109    3.8%   51606   SELECT dispatch_time FROM orders WHERE id = 7387612

Testing Assumptions

SELECT uid, name FROM WHERE = 1 ORDER BY DESC\G
1.30secs

SELECT uid, name FROM WHERE = 1 ORDER BY DESC\G
0.56secs

Test Environment
• Hardware environment similar to live
• Data size similar to live environment:
• Replicating slave
• Cannot change datatypes on MIXED/ROW
replication
• Create table2 and run queries against it
• Xtrabackup – full replica
• Script with Mysqldump + WHERE
• mysqldump --databases main --tables table1 table2 --where "date > now() - interval 30 day" > dump.sql
• mysqldump --all-databases --ignore-table=main.table1 --ignore-table=main.table2 >> dump.sql

Final Tweaking

(Remember the table log file – dp_users.log ?)
pt-index-usage
• pt-index-usage dp_users.log --host 127.0.0.1 --tables dp_users > idx_dp_users.txt
• Go over recommendations

Deploy Strategies
1. Rolling Servers
2. pt-online-schema-change
3. Two-part move
a. Create new table – table2
b. Insert table rows that will not change – INSERT INTO table2 SELECT * FROM table1 WHERE date <= curdate() - interval 30 day;
c. Short downtime
d. Rename table1 to table3; rename table2 to table1;
e. INSERT IGNORE INTO table1 SELECT * FROM table3 WHERE date >= curdate() - interval 30 day;

4. Alter table – long downtime (pre 5.6, maybe)
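
The two-part move (strategy 3) can be sketched end to end. This runs on SQLite through Python's sqlite3 only so that it is self-contained; it is a stand-in, not MySQL. Assumptions: the columns (id, date, total) are hypothetical, a fixed `cutoff` date replaces `curdate() - interval 30 day`, and SQLite's `INSERT OR IGNORE` plays the role of MySQL's `INSERT IGNORE`.

```python
import sqlite3

def two_part_move(conn, cutoff):
    """Rebuild table1 through table2 with only a short pause for the swap."""
    # a. new table with the desired structure (here simply the same columns)
    conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, date TEXT, total INT)")
    # b. bulk-copy the old rows that will no longer change
    conn.execute("INSERT INTO table2 SELECT * FROM table1 WHERE date <= ?", (cutoff,))
    # c + d. short downtime: swap the tables
    conn.execute("ALTER TABLE table1 RENAME TO table3")
    conn.execute("ALTER TABLE table2 RENAME TO table1")
    # e. copy the recent rows; rows already copied in step b are skipped
    #    because their primary key already exists
    conn.execute("INSERT OR IGNORE INTO table1 SELECT * FROM table3 WHERE date >= ?", (cutoff,))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, date TEXT, total INT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "2012-09-01", 10), (2, "2012-10-15", 20), (3, "2012-11-20", 30)],
)
two_part_move(conn, cutoff="2012-10-15")
print(conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0])
```

The row dated exactly on the cutoff matches both copies, which is why step e needs IGNORE. On MySQL the swap in step d can be done in one atomic statement: RENAME TABLE table1 TO table3, table2 TO table1.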

[Figure: optimization cycle. Get Data, Make Assumptions, Test Assumptions, Deploy. Labels along the cycle: slow log filtered by target table, query digest, EXPLAIN, index-usage, new results]

[Figure: read cache mind map. Page cache: browser, reverse proxy, file. Query cache: volatile, streaming, shield. Column cache: denormalize, subtotal, conditional, attributes. Summary tables. Other labels: 2nd level, 3rd level, data warehouse.]

Read Cache
• Outside the database
• Page Cache
• Query Cache

• Inside the database
• Column Cache
• Summary Table

* Complexity

Page Cache
• Browser Cache
• Etag, Expires, Last-modified
• Reverse Proxy
• Squid, Varnish, Nginx, Apache, Proprietary.
• File/Full page cache
• mod_file_cache, Zend_Cache_Backend
• W3 Total Cache, sfSuperCache
* Stale

memegenerator.net/instance/23247230

Query Cache

• Volatile
• Memcached, Redis, Hibernate Cache, Arrays..
• On-Request, Time-to-Live, Stale and Cache
Stampede
• Streaming
• Interval / Async, Stale, Common Queries
• Shield – Mongo Shield
• Script/Tool Replication, Dependency
• Aggregation
• Complexity / Layers
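
On-request caching with a time-to-live, plus one common guard against a cache stampede: only one caller per key is allowed to run the slow query while the others wait for its result. This is a minimal in-process sketch standing in for Memcached or Redis; the class and parameter names are hypothetical.

```python
import threading
import time

class TTLCache:
    """Tiny in-process stand-in for a volatile cache with time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}                 # key -> (expires_at, value)
        self._locks = {}                # key -> lock guarding the reload
        self._guard = threading.Lock()  # protects the _locks dict

    def get(self, key, loader):
        """Return the cached value, calling loader() on a miss (on-request)."""
        entry = self._data.get(key)
        if entry and entry[0] > self.clock():
            return entry[1]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:  # only one caller per key runs the slow query
            entry = self._data.get(key)
            if entry and entry[0] > self.clock():
                return entry[1]  # refilled by another caller while we waited
            value = loader()
            self._data[key] = (self.clock() + self.ttl, value)
            return value
```

Usage would look like `cache.get("top_products", run_report_query)`, where `run_report_query` is whatever function executes the SQL. Until the TTL expires, readers get a possibly stale value, which is the trade-off the slide names.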

Mongo
Shield

147cm

img.photobucket.com/albums/v158/keris_hanuman/Afbeelding1455.jpg

[Figure: sticky sessions and where to keep sessions and the cart: Memcached, MySQL]

Manipulating Time

* Error Handling

Column Cache
• Denormalize
• Additional Column(s) to prevent JOINs
• Maintenance, Space on disk
• Example: CustomerID, OrderID, OrderItemID
• Sub Total
• Prevent additional slow GROUP BY queries
• Maintenance, Generation, Space on disk
• Example: totalPurchases, moneyOwed
* Space vs Speed

Column Cache

• Conditional
• Store conditional (True/False) logic
• Prevents recalculating result – another query
• Can prevent rewriting code
• Example: isDone, hasReview, aboveAvg
• Attributes
• ENUM datatype
• SET datatype - ARRAY of options
• Prevents JOINs
• May save space

Summary Tables

An additional table which consists
of an aggregation of another table or
several JOIN’d tables.


SELECT ...
FROM main_table t1
INNER JOIN table2 t2 on t1.orderid = t2.id
INNER JOIN table3 t3 on t1.customerid = t3.id
INNER JOIN table4 t4 on t1.addressid = t4.id
INNER JOIN table5 t5 on t2.supplierid = t5.id
INNER JOIN table6 t6 on t2.warehouse = t6.id
INNER JOIN table8 t8 on t1.productid = t8.id
INNER JOIN table9 t9 on t1.buyerid = t9.id
INNER JOIN table10 t10 on t1.officeid = t10.id
WHERE
t1.date between '2012-11-01' and '2012-11-30'
GROUP BY t1.date

Summary Tables

Processed 1.2million rows
Returned 30 rows
Time 17.52 minutes

Summary Tables
CREATE TABLE summary_table
(primary key (date,addressid,productid)) as
SELECT ...
FROM main_table t1
INNER JOIN table2 t2 on t1.orderid = t2.id
INNER JOIN table3 t3 on t1.customerid =
t3.id
INNER JOIN table6 t6 on t2.warehouse = t6.id
GROUP BY t1.date, t1.addressid, t1.productid

Summary Tables

SELECT ...
FROM summary_table t1
INNER JOIN table5 t5 on t1.supplierid = t5.id
INNER JOIN table8 t8 on t1.productid = t8.id
INNER JOIN table9 t9 on t1.buyerid = t9.id
INNER JOIN table10 t10 on t1.officeid = t10.id
WHERE
t1.date between '2012-11-01' and '2012-11-30'
GROUP BY t1.date

Summary Tables

Processed 35000 rows
Returned 30 rows
Time 0.75 seconds
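
The summary-table idea in miniature: aggregate once, then run the report against the smaller table. This sketch runs on SQLite through Python's sqlite3 as a stand-in for MySQL, with a hypothetical four-row main_table and hypothetical columns (productid, amount). Unlike the MySQL statement above, SQLite's CREATE TABLE ... AS cannot declare a primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE main_table (id INTEGER PRIMARY KEY, date TEXT, productid INT, amount INT)"
)
conn.executemany(
    "INSERT INTO main_table (date, productid, amount) VALUES (?, ?, ?)",
    [("2012-11-01", 1, 5), ("2012-11-01", 1, 7), ("2012-11-01", 2, 3), ("2012-11-02", 1, 4)],
)

# Pre-aggregate once, for example from an off-peak job.
conn.execute(
    """CREATE TABLE summary_table AS
       SELECT date, productid, COUNT(*) AS orders, SUM(amount) AS total
       FROM main_table
       GROUP BY date, productid"""
)

# The same report, once from the raw rows and once from the summary.
report_sql = "SELECT date, SUM({col}) FROM {tbl} GROUP BY date ORDER BY date"
from_raw = conn.execute(report_sql.format(col="amount", tbl="main_table")).fetchall()
from_summary = conn.execute(report_sql.format(col="total", tbl="summary_table")).fetchall()
print(from_raw == from_summary, from_summary)
```

Both queries return the same totals; the summary version reads fewer rows, which is where the drop from 1.2 million processed rows to 35000 comes from.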

Summary Tables
as an
Analytics Sub-System

Database Design Comparison

                      Operational System                               Analytic System
Purpose               Execution of a business process                  Measurement of a business process
Primary Interaction   Insert, Update, Query, Delete                    Query
Design Optimization   Update concurrency                               High-performance query
Design Principle      Entity-relationship (ER), 3rd Normal form (3NF)  Dimensional design (Star schema or cube)

amazon.co.uk/Schema-Complete-Reference-Christopher-Adamson/

Data Warehouses

Fact Tables
• Measurement
• Narrow
• Long
• Most of the data

Dimension Tables
• Context
• Wide
• Short
• Filters and descriptive data

Operational Design
Customers

Orders

Order
Items

Addresses
Products

Star Schema
Date dim

Address dim

OrderItems
Fact

Products dim
Customers dim

Maintenance
• Hourly/Daily/Weekly/Monthly Aggregations
• Intervals
• Off Peak
• On-Insert

[Figure: scaling reads mind map. Read cache, indexes, InnoDB buffer pool and a lot more settings, intensive table optimization, partitioning and sub partitioning, better hardware (IO, memory), read slaves, read/write splitting, another master, Galera, sharding]

Scaling Reads
• InnoDB Buffer Pool
• Cache Warming
• Read buffer
• Sort Buffer
• Join Buffer
• Temp Table size / on disk
* A lot more settings

Scaling Reads
• Better Hardware
• Disk I/O
• Memory


Scaling Reads
• Read Slaves
• Read/Write Splitting
• Master/Master
• Galera

Server Architecture

MySQL

[Figure: reporting mind map. Reporting slaves (different indexes), summary tables, sharding (cross shard joins), Hadoop, OLAP cubes, columnar store (£££); also InnoDB buffer pool and a lot more settings, indexes, intensive table optimization, partitioning and sub partitioning, better hardware (IO, memory)]

Reporting
• Reporting Slaves
• Different Indexes
• No Foreign Keys
• Partitioning
• If ROW-replication:
• Must have same data types

Reporting
• Sharding
• Cross-shard JOINs
• Go Fish
• Aggregations Hadoop

Reporting
• Summary tables - aggregations
• Scripts
• Hadoop
• Ready-made reports
• OLAP Cubes
• Columnar Store
• £££

* RDBMS very fast at GROUP BY

[Figure: write buffers mind map. RAID (battery, write-back), InnoDB log file size and a lot more settings, MySQL settings, local server storage, queue, summarized writes, ETL (hand code, Hadoop, CRUSH)]

Write Buffers
• RAID card
• BBU
• Write-back vs write-through
• Battery Learning/Drain

Write Buffers
• Innodb log file size
• Buffer pool * dirty read (%) * io capacity

* Virtual Environment

Write Buffers
• innodb-flush-log-at-trx-commit
• Sync binlog
• Support xa


Write Buffers
• ActiveMQ, RabbitMQ, ZeroMQ, Gearman
• Not ACID, may need redundancy
• Summarized Writes
• Memcached counters + interval writes
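
Summarized writes in a few lines: count events in memory and write one total per key on an interval, instead of one UPDATE per event. A plain Counter stands in for the Memcached counters here, and `write_batch` is a hypothetical callback that would run the real UPDATE statements.

```python
from collections import Counter

class SummarizedWrites:
    """Collect increments in memory and write them out once per interval."""

    def __init__(self, write_batch):
        self._pending = Counter()
        self._write_batch = write_batch  # hypothetical: performs the real UPDATEs

    def incr(self, key, amount=1):
        """Record one event without touching the database."""
        self._pending[key] += amount

    def flush(self):
        """Call from a timer or cron job; one write per key, not per event."""
        batch, self._pending = dict(self._pending), Counter()
        if batch:
            self._write_batch(batch)
        return len(batch)
```

Anything still pending is lost if the process dies before flush(), which is the "not ACID, may need redundancy" caveat above.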

Write Buffers
• Local Storage (web/app servers)
• Memcached
• SQLite (disk/in-memory)
• Log file <- Most popular
• MySQL
• Independent, isolated ETL
• Need to fetch data
• Prevent missed data and duplicates

Write Buffers
• Fetching Data
• Hand code
• ETL tool – Pentaho/Talend
• Flume
• Aggregation/Processing
• Hadoop <- Very Popular
• Google CRUSH

[Figure: scaling writes mind map. Write buffers (InnoDB log file size), indexes, intensive table optimization, partitioning and sub partitioning, remove bottlenecks, bypass SQL layer, MySQL settings, OS settings, hardware, sharding]

Scaling Writes
• Less Indexes
• Less writes
• Partitioning
• Less table maintenance
• Less B-tree levels
• Less need to organize blocks
• Algorithm overhead
• Mutex/Locks

Scaling Writes
• Bypass SQL Layer
• Innodb/Memcached
• HandlerSocket

[Figure: MySQL layers. SQL parser and optimizer are bypassed; requests go straight to the storage engine]

* Riak

Scaling Writes
• IO Scheduler
• File System (+ nobarrier, noatime, nodiratime)
• EXT 3
• EXT 4
• XFS
• ZFS
• Block Sizes

Scaling Writes
• Faster I/O
• Faster Disks
• SAS
• SSD
• RAID + cache

• PCIe SSD
• FusionIO
• Virident

Scaling Writes
• Master/Master
• Does not scale writes
• Writes still need to replicate
• Sharding
• Does scale writes

[Figure: sharding mind map. Partitioned data: global data, child data, by schema, shared nothing. Key: ID, area, hash. Lookup: app function, DB, proxy. Functional partitioning. Reporting: go fish, cross shard, Hadoop]

Sharding
• Partitioned Data
• Splitting the Data
• Vertically
• Main Tables to partition
• Child Tables
• Global Tables
• By Schema
• Shared Nothing

Sharding
• Partitioning by Key
• ID – CustomerID, ProductID, App
• Area – Country, City, Continent
• Hash – Random for equal spread

Sharding
• Which shard has the data?
• Store it in a DB
• Flexible but slower
• Some Function in your App
• Faster, less flexible
• Proxy config file
• Faster, less flexible
• Needs some app coding
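
The "some function in your app" option, combined with the hash key from the previous slide, can be as small as this. It is a sketch under assumptions: the shard names are hypothetical, and SHA-256 is just one stable hash that works (Python's built-in hash() is not stable between processes for strings, so it is avoided).

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # hypothetical server names

def shard_for(customer_id, shards=SHARDS):
    """Map a key to a shard with a stable hash, for a roughly equal spread."""
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

print(shard_for(42), shard_for("customer-1001"))
```

This is fast because no lookup query is needed, and less flexible because changing the number of shards moves most keys to a different shard.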

Sharding
• Maintenance
• Backups
• Slaves
• Uptime
• Loosely coupled system

Sharding
• Functional Partitioning
• Different Apps
• Share some tables


Sharding
• Reporting
• Go Fish – One server / Shared nothing
• Cross Shard – Many servers
• Hadoop – Aggregate to one reporting server

The End
• Questions & Answers
• Email: contact@jonathanlevin.co.uk
• Don’t forget to rate this tutorial

If we have time
• MySQL 5.6
• NoSQL
• ORM
• Beyond Hadoop
• Bring-Your-Own-Problems
