Upgrading MySQL databases do not come without risk. There is no guarantee that no problems will happen if you move to a new major MySQL version.
Should we just upgrade and rollback immediately if problems occur? But what if these problems only happen a few days after migrating to this new version?
You might have a database environment that is risk-adverse, where you really have to be sure that this new MySQL version will handle the workload properly.
Examples:
- Both MySQL 5.6 and 5.7 have a lot of changes in the MySQL Optimizer. It is expected that this improves performance of my queries, but is it really the case? What if there is a performance regression? How will this affect my database performance?
- Also, there are a lot of incompatible changes which are documented in the release notes, how do I know if I'm affected by this in my workload? It's a lot to read..
- Can I go immediately from MySQL 5.5 to 5.7 and skip MySQL 5.6 even though the MySQL documentation states that this is not supported?
- Many companies have staging environments, but is there a QA team and do they really test all functionality, under a similar workload?
This presentation will show you a process, using open source tools, of these types of migrations with a focus on assessing risk and fixing any problems you might run into prior to the migration.
This process can then be used for various changes:
- MySQL upgrades for major version upgrades
- Switching storage engines
- Changing hardware architecture
Additionally, we will describe ways to do the actual migration and rollback with the least amount of downtime.
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Reducing Risk When Upgrading MySQL
1. Reducing Risk When Upgrading
Your MySQL Environment
Kenny Gryp
MySQL Practice Manager
2. My Experience as MySQL
Consultant On Upgrading MySQL
it's quite complex...
Kenny Gryp
MySQL Practice Manager
3. Table of Contents
The O cial Documentation
Make Your Own Documentation
Potential Risks
Establish Upgrade Method For A Single Server
Rollback Scenario Testing
Test Writes
Test Individual Reads
Workload Testing
Establish (& Test) Migration Process
Migration In Production
(Rollback)
Post-Migration Assessment
3 / 77
5. Oracle's Recommended Process
Backup your data
Read all release notes and assess
https://dev.mysql.com/doc/relnotes/mysql/5.7/en/
Read Changes Affecting Upgrades to MySQL 5.7
https://dev.mysql.com/doc/refman/5.7/en/upgrading-from-
previous-series.html
5 / 77
7. Oracle's Recommended Process
Upgrade Slaves First
In-Place Upgrade:
Clean shutdown (innodb_fast_shutdown=0)
Run mysql_upgrade
Logical Upgrade:
mysqldump data
Import data again
Run mysql_upgrade to x mysql schema
http://dev.mysql.com/doc/refman/5.7/en/upgrading.html
7 / 77
8. Oracle's Recommended Process (cont.)
A Lot of Risk:
No guarantee queries will execute the same
No guarantee queries will be same speed or faster
No guarantee all your queries will still work (new default
stricter sql_mode)
There is no o cial support to upgrade from <5.6 to 5.7
but we might actually be able to do that
8 / 77
10. Documenting The Process
PEBKAC: Human errors happen and create issues
import data using wrong character set
setting up replica using wrong binlog le/pos
...
Document every step, we need to repeat it multiple times
10 / 77
12. Optimizer Changes
Example: index_merge_intersection
Often seen during migrations to MySQL 5.6
Affects environments with sub-optimal indexing
Queries with c1='a' AND c2='b' when composite index
(c1,c2) is missing
Is often slower when selectivity with 1 of the 2 columns is
bad (and it happens frequently)
Result: a lot of queries were slower in new environment
Need SELECT performance tests between versions
https://www.percona.com/blog/2012/12/14/the-optimization-that-often-isnt-index-merge-intersection/
12 / 77
13. New Defaults In MySQL 5.7
The new defaults in MySQL 5.7 make a lot of sense:
More use of available features and performance
enhancements out of the box
More strictness with data/query validation
New Reserved words
Applications might not be ready for it.
Drupal 7 - https://www.drupal.org/node/2545480
They will/might break the application more easily:
sql_mode=ONLY_FULL_GROUP_BY,
STRICT_TRANS_TABLES, NO_ZERO_IN_DATE,
NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO,
NO_AUTO_CREATE_USER, NO_ENGINE_SUBSTITUTION
innodb_strict_mode=1
Needs SELECT & DML query validity tests between versions 13 / 77
14. Other Changes in MySQL 5.7
Passwords that use the older pre-4.1 password hashing format
is removed.
14 / 77
15. MySQL 5.0.37
+-------+
| 0 |
+-------+
MySQL 5.0.45
+-------+
| 1 |
+-------+
Minor Versions Also At Risk
CREATE TABLE date (d DATE);
INSERT INTO date VALUES ('2017-04-19');
SELECT COUNT(*) FROM date
WHERE d < NOW()-INTERVAL 1 DAY;
Seen with DELETE FROM date WHERE d < NOW()-
INTERVAL 1 DAY in binlog_format=STATEMENT
environments.
Needs SELECT & DML query result tests between versions
15 / 77
16. Workload
SYNC_BINLOG=1 in MySQL 5.7
Can impact certain environments, might not be noticed when
looking at a single query
InnoDB LRU Flushing changes require tuning for heavy
workloads in 5.6
(innodb_lru_scan_depth)
When switching to MySQL 8.0 with the new data dictionary
...
Need to do Workload Testing between versions
http://mysqlentomologist.blogspot.com/2015/10/fun-with-bugs-38-regression-bugs-in.html
http://lefred.be/content/sync_binlog-1-in-5-7/
16 / 77
20. Upgrade Method For A Single Server
Follow MySQL documentation:
http://dev.mysql.com/doc/refman/5.7/en/upgrading.html
Ensure to document every command
Restore from backup
Or take a replica you can miss
20 / 77
24. Writes - Replication Consistency
pt-table-checksum:
validate consistency in a replication topology
Identify problems caused by PEBKAC
Ensure events replicate properly
(binlog_format=STATEMENT)
Upgrade a replica or add a replica which is using the modi ed
version.
Do it on production, will have no result in test/staging
https://www.percona.com/doc/percona-toolkit/3.0/pt-table-checksum.html
24 / 77
27. Rollback Scenario Testing
Possibility to fall back in case something went wrong during
migration
Can be done using replication, but has to be tested!
27 / 77
29. Rollback Scenario Testing
You might need to change some settings to your new my.cnf to be
able to support replicating back.
Example:
binlog_checksum = NONE
binlog_row_image = FULL
binlog_rows_query_log_events = OFF
log_bin_use_v1_row_events = 1
gtid_mode = OFF
log_slave_updates=1
skip-slave-start
29 / 77
34. Where To Run pt-table-checksum?
GTID:
pt-table-checksum can only be run on Master
(Errant Transactions)
Or scratch the pt-table-checksum host after tests
non-GTID:
pt-table-checksum can be run on intermediate master
binlog_format=ROW:
only 1 tier below can be checksummed
run on every tier that has a replica (for rollback)
pt-table-checksum can bring prod overhead when run on
active master
Let replication run for a while before checksumming
34 / 77
35. pt-table-checksum results
On every replica (including rollback):
SELECT db, tbl, SUM(this_cnt) AS total_rows,
COUNT(*) AS chunks
FROM percona.checksum
WHERE (master_cnt <> this_cnt
OR master_crc <> this_crc
OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;
+----+-----------------+------------+--------+
| db | tbl | total_rows | chunks |
+----+-----------------+------------+--------+
| db | telephone_debit | 44342 | 1 |
| db | orderline | 21451 | 3 |
| db | orders | 25125215 | 12 |
+----+-----------------+------------+--------+ 35 / 77
39. pt-table-checksum - Analysis
SELECT *
INTO outfile '/tmp/telephone_debit_mysql56'
FROM db.telephone_debit
WHERE id BETWEEN 5014733 AND 5059074;
SELECT *
INTO outfile '/tmp/telephone_debit_mysql57'
FROM db.telephone_debit
WHERE id BETWEEN 5014733 AND 5059074;
# diff -u /tmp/telephone_debit_mysql5{6,7}
39 / 77
40. pt-table-checksum - Analysis
SELECT *
INTO outfile '/tmp/telephone_debit_mysql56'
FROM db.telephone_debit
WHERE id BETWEEN 5014733 AND 5059074;
SELECT *
INTO outfile '/tmp/telephone_debit_mysql57'
FROM db.telephone_debit
WHERE id BETWEEN 5014733 AND 5059074;
# diff -u /tmp/telephone_debit_mysql5{6,7}
Use twindb_table_compare!
https://github.com/twindb/twindb_table_compare
40 / 77
41. pt-table-checksum - Analysis
Wrong upgrade method
backups
wrong replication le/pos
...
binlog_format=STATEMENT using (UUID()...)
Common Seen Issues replicating older versions:
Floating point differences: Storing currencies in a DOUBLE
Temporal data types
Invalid dates converted to zero dates
Trailing spaces in CHAR elds
41 / 77
42. Testing Writes
Consistency Checks Process:
Checksum
Check for differences
On new environment
On rollback environment
For each inconsistency
Analyze diff
Find root cause
Fix problem
Document problem & solution
Repeat checksum again
42 / 77
45. Testing Reads - Collect Queries
Collection Techniques:
Slow Query Log
long_query_time=0
Careful when ~+10000 QPS
Percona Server: log_slow_rate_limit
tcpdump
'packets lost' in libpcap
Application/Load Balancer queries
Ensure:
Get the full workload (long enough)
Get data from Master & Replicas
Collect batchjob queries running at night
https://www.percona.com/doc/percona-server/5.7/diagnostics/slow_extended.html
45 / 77
47. Testing Reads - Setup 2 Environments
Need 2 Test Servers:
Reuse servers from checksum + rollback
Ensure they have the same data
(break replication at same time)
Same HW speci cations
Similar Con gurations on buffer pool, flatc...
Fast enough to more or less resemble production
Optionally can be done using 1 machine
(pt-upgrade --save-results)
47 / 77
49. Testing Reads - pt-upgrade
pt-upgrade:
runs one query at a time on both test environments
compares differences:
warnings/errors
resultset (even different order)
query response time
Run pt-upgrade on third host with similar network latency
Run twice to warm up buffer pool rst (need to be equal)
Can also compare writes for execution time & warnings
Filter slowlog initially to limit similar queries
pt-query-digest
--no-report --output slowlog --samples 20
https://www.percona.com/doc/percona-toolkit/3.0/pt-upgrade.html
49 / 77
50. Testing Reads - pt-upgrade
Reporting class because there are 1000 row diffs.
Total queries 10
Unique queries 10
Discarded queries 0
select ... from ...
##
## Row diffs: 10
##
-- 1.
@ row 2
< 13178,"dim0",37,2,21,,,0,0,0,1,NULL,NULL
> 13178,"dimø",37,2,21,,,0,0,0,1,NULL,NULL
...
50 / 77
51. Testing Reads - pt-upgrade
Reporting class because it has diffs,
but hasn't been reported yet.
SELECT * FROM `database`.table
WHERE treeid = '' AND productid='0'
## Warning diffs: 2
Code: 1366
Level: Warning
Message: Incorrect integer value: ''
for column 'treeid' at row 1
vs.
No warning 1366
51 / 77
52. Testing Reads - pt-upgrade
SELECT *
FROM `database`.client_orders
WHERE client=?
AND blacklist=? LIMIT ?
## Query time diffs: 1
-- 1.
0.000513 vs. 0.036395 seconds (70.9x increase)
SELECT *
FROM `database`.client_orders
WHERE client=57450
AND blacklist=1 LIMIT 1
52 / 77
53. Testing Reads Process
Collect queries
Run pt-upgrade (twice)
For each entry in report
Figure out why it is reported
Deploy x in Prod Application
Make schema changes
Document analysis
Run pt-upgrade again
53 / 77
54. one of the most challenging is
Testing Workload
54 / 77
56. Workload Testing - Query Playback
Uses slowlog to replay queries
Needs long_query_time=0 - challenging on busy servers
Enough data during peak workload
Tries to execute workload as realistically as possible
same connections, same transactions, same delays between
queries
Run against both environments, compare speed
Think about preloading buffer on both the same way
Active development by Marius Wachtler (ex)-DropBox! Thank
you!
(uno cal product of Percona, no support)
56 / 77
60. Migration Process
Create Migration Plan
Different for every environment/application
Upgrade a replica rst for a couple of days/weeks?
How to switch masters?
How is failover being handled nowadays?
MHA, Orchestrator, Manual, GTID/msyqlrpladmin...?
Test in staging!
60 / 77
68. Rollback
What went wrong?
I did not follow the full process! (or I forgot to document it)
Do consistency checks again!
68 / 77
69. after all that testing, it's ok to spend time doing
Post-Migration Assessment
69 / 77
70. Post-Migration
Check trending for different behavior
more cpu load?
more disk IO?
higher amount of innodb_rows_* and handler_*
threads_running stability?
do some query optimization
If all looks good, scratch the 5.6 rollback & make it 5.7
Remove the rollback speci c con guration options
70 / 77
73. Multi-Use
(Minor MySQL version upgrades)
Major MySQL version upgrades
Switching Hardware from Intel -> AMD archicture
Using a new kernel/libc/memory allocator
Switching storage engines
MariaDB/Percona Server/MySQL
...
73 / 77
74. Do I really have to go through this?
Many success stories:
Have done several MySQL upgrades from 4.1 -> 5.5
without intermediate slaves
Upgraded environments with major schema changes in the mix
(mssql-style environments using stored procedures only)
Found numerous application bugs using this process
Optimized many customers schemas/queries in the meantime
As long as you follow this process completely,
the risk of running into problems is quite small.
74 / 77
75. Do I really have to go through this?
It Depends:
Your business might be risk-averse:
every change has to be thoroughly tested
Other companies just upgrade a replica in production and see
how it goes
My suggestion to do this at least for:
Major MySQL version upgrades
Switching storage engines
75 / 77
76. Summary
Test Step Skip?
Document Upgrade Single Server Really? Why?
Rollback Scenarios Not Recommended
Consistency Checks Required, No Debate!
Read Tests Strongly Suggested
Workload Tests Possible (Early Adopter Alert)
Migration Tests Not Recommended To Skip
76 / 77
77. Reducing Risk When Upgrading
Your MySQL Environment
Q&A!
Kenny Gryp
MySQL Practice Manager