This talk will help you troubleshoot the most common MySQL replication issues, with a simple comparison of how to solve them under the two replication methods (classic binlog-position replication vs. GTID).
2. About Me
Mughees Ahmed
4 years of experience as an Oracle and MySQL DBA
Currently working with Etisalcom Bahrain
Certified Oracle and MySQL Professional
Created the course "The Ultimate MySQL Replication Crash Course from Zero to Hero"
https://mughees.gumroad.com/l/yGqVw
The course is coming to Udemy soon.
My YouTube Channel https://www.youtube.com/user/mughees52
Blogs https://ittutorial.org/category/mysql/
LinkedIn, Twitter: @mughees52
3. Agenda
Types of problems you can face
SQL_SLAVE_SKIP_COUNTER
How it works with an ACID-compliant table (InnoDB)
How it works with a non-ACID-compliant table (MyISAM)
pt-slave-restart
TROUBLESHOOTING GTID
Solving duplicate key error etc.
Errant GTID
Confirm if there is any Errant GTID
Finding the exact Errant GTID
Solving the Errant GTID
Insert empty transactions
Remove from Binlog
If you want to learn more, you can follow me on
https://app.gumroad.com/signup?referrer=mughees
4. Data Drift
Data drift can happen when:
• A statement is executed on a primary with SET SESSION sql_log_bin = OFF
• A statement was executed directly on the replica
  - Can happen if the replica was not in super_read_only and a user with the SUPER privilege executed it
  - Can happen if the replica was not in read_only
• A statement was executed on a replica and the replica was later promoted to a primary without GTID in place
• A primary server is not configured for full ACID compliance and it crashed
• At some point, the primary was not configured for row-based replication (even briefly)
• More exotic cases can involve bugs, engine differences, or version differences
A few things to prevent and fix this:
• All replicas run in super_read_only mode
• Verify ACID compliance and, if it is not in use, checksum after any crash or failover event
• Checksum regularly
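To illustrate what "checksum regularly" means, here is a minimal Python sketch that compares two row sets with an order-independent checksum. The table_checksum helper and the sample rows are hypothetical; in practice a dedicated tool such as pt-table-checksum would normally do this work chunk by chunk against live servers.

```python
import hashlib

def table_checksum(rows):
    """Return an order-independent checksum for an iterable of row tuples."""
    # Hash each row separately, then sort so that row order does not matter.
    row_hashes = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()
        for row in rows
    )
    return hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()

# Hypothetical rows as they might be fetched from the same table on each server.
primary_rows = [(1, "a"), (2, "b"), (3, "c")]
replica_rows = [(1, "a"), (3, "c")]

drifted = table_checksum(primary_rows) != table_checksum(replica_rows)
print("drift detected" if drifted else "in sync")
```

Sorting the per-row hashes keeps the result independent of the order in which each server returns its rows.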
5. How to Repair MySQL Replication
If you have set up MySQL replication, you probably know this problem: sometimes
an invalid query on the slave stops replication from working.
Identifying the Problem
• MySQL error log
• SHOW SLAVE STATUS\G
Last_Errno: 1146
Last_Error: Error 'Table 'mydb.taggregate_temp_1212047760'
doesn't exist' on query. Default database: 'mydb'.
Query: 'UPDATE thread AS thread,taggregate_temp_1212047760 AS aggregate
SET thread.views = thread.views + aggregate.views
WHERE thread.threadid = aggregate.threadid'
Repair the MySQL Replication
6. SAMPLE ERROR MESSAGES (FROM SHOW SLAVE STATUS OUTPUT):
Last_SQL_Error: Could not execute Write_rows event on table
test.t1; Duplicate entry '4' for key 'PRIMARY', Error_code:
1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master
log mysql-bin.000304, end_log_pos 285
Last_SQL_Error: Could not execute Update_rows event on table
test.t1; Can't find record in 't1', Error_code: 1032; handler
error HA_ERR_KEY_NOT_FOUND; the event's master log
mysql-bin.000304, end_log_pos 492
Last_SQL_Error: Could not execute Delete_rows event on table
test.t1; Can't find record in 't1', Error_code: 1032; handler
error HA_ERR_KEY_NOT_FOUND; the event's master log
mysql-bin.000304, end_log_pos 688
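When many replicas have to be triaged, it can help to pull the key fields out of Last_SQL_Error programmatically. The following Python sketch is only an illustration: the parse_last_sql_error function and its regular expression are assumptions derived from the shape of the sample messages above, and other MySQL versions may word these errors differently.

```python
import re

# Pattern derived from the sample messages above; treat it as an assumption.
ERROR_RE = re.compile(
    r"Could not execute (?P<event>\w+) event on table (?P<table>[\w.]+); "
    r".*?Error_code: (?P<code>\d+); "
    r"handler error (?P<handler>\w+); "
    r"the event's master log (?P<log>[\w.\-]+), end_log_pos (?P<pos>\d+)"
)

def parse_last_sql_error(message):
    """Return the key fields of a row-event replication error, or None."""
    # Collapse any line wrapping picked up when the message was copied.
    flat = " ".join(message.split())
    match = ERROR_RE.search(flat)
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])
    fields["pos"] = int(fields["pos"])
    return fields

sample = (
    "Could not execute Write_rows event on table test.t1; "
    "Duplicate entry '4' for key 'PRIMARY', Error_code: 1062; "
    "handler error HA_ERR_FOUND_DUPP_KEY; the event's master "
    "log mysql-bin.000304, end_log_pos 285"
)
info = parse_last_sql_error(sample)
print(info["event"], info["table"], info["code"], info["log"], info["pos"])
```

The error code (1062 or 1032 here) together with the binlog file and end_log_pos tells you which event to inspect on the master.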
7. First Option
SQL_SLAVE_SKIP_COUNTER
SET GLOBAL sql_slave_skip_counter = N
This statement skips the next N events from the master. This is useful for recovering
from replication stops caused by a statement.
If you read the documentation closely, its last paragraph also says:
When you use SET GLOBAL sql_slave_skip_counter to skip events and the result is
in the middle of a group, the slave continues to skip events until it reaches the end of
the group. Execution then starts with the next event group.
8. Master Slave
Master        Slave
Table (Z)     Table (Z)
+----+        +----+
| a  |        | a  |
+----+        +----+
|  1 |        |  1 |
|  2 |        |  3 |
|  3 |        +----+
+----+
BEGIN;
INSERT INTO z SELECT 4;
DELETE FROM z WHERE a = 2;
INSERT INTO z SELECT 5;
COMMIT;
The slave will obviously stop with error 1032, because record 2
is not found. At this point, many DBAs will choose to execute
SET GLOBAL sql_slave_skip_counter=1.
However, doing so also causes the INSERT of record 5 to not be
executed: after the DELETE of record 2 is skipped, the
transaction is not over yet, so the following events continue to
be skipped.
This is what the documentation says: the slave continues to skip
events until it reaches the end of the group. Interested readers
can test this themselves to see the final result.
What should I do if I just want to skip a single event? Should
we just set the parameter slave_exec_mode to IDEMPOTENT?
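The group rule can be illustrated with a toy model. The Python sketch below is not MySQL code: the apply_with_skip helper, the event names, and the second transaction (trx2) are hypothetical, and it only models "skip N events, then keep skipping until the current group ends".

```python
def apply_with_skip(events, skip_counter):
    """Toy model: return the events a replica applies after skipping.

    events is a list of (group_id, name) pairs in binlog order, starting at
    the position where the SQL thread stopped.
    """
    applied = []
    skipping_group = None  # group that is currently being skipped
    for group_id, name in events:
        if skip_counter > 0:
            skip_counter -= 1
            skipping_group = group_id
            continue
        if group_id == skipping_group:
            # Counter ran out mid-group: keep skipping to the end of the group.
            continue
        skipping_group = None
        applied.append(name)
    return applied

# The transaction from this slide is one group; trx2 is a made-up follow-up.
events = [
    ("trx1", "BEGIN"),
    ("trx1", "INSERT 4"),
    ("trx1", "DELETE 2"),
    ("trx1", "INSERT 5"),
    ("trx1", "COMMIT"),
    ("trx2", "BEGIN"),
    ("trx2", "INSERT 6"),
    ("trx2", "COMMIT"),
]
print(apply_with_skip(events, 1))
```

With a counter of 1, nothing from trx1 is applied in this model, and execution resumes with the next group.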
9. Test Data
On Master:
create table repl_innodb(id int primary key, name1 char(10), name2 char(10)) engine=innodb;
create table repl_myisam(id int primary key, name1 char(10), name2 char(10)) engine=myisam;
On Slave:
# Add rows to the test tables directly on the slave, without recording them in the binlog.
set sql_log_bin = 0;
insert into repl_innodb(id,name1,name2) values (1, ' s1062-1 ', 's1062-1 ');
insert into repl_myisam(id,name1,name2) values (1, ' s1062-1 ', 's1062-1 ');
set sql_log_bin = 1;
10. Current Data
On Replica:
mysql> select * from repl_innodb;
+----+----------+---------+
| id | name1    | name2   |
+----+----------+---------+
|  1 |  s1062-1 | s1062-1 |
+----+----------+---------+
1 row in set (0.00 sec)
mysql> select * from repl_myisam;
+----+----------+---------+
| id | name1    | name2   |
+----+----------+---------+
|  1 |  s1062-1 | s1062-1 |
+----+----------+---------+
1 row in set (0.00 sec)
On Master:
mysql> select * from repl_innodb;
Empty set (0.00 sec)
mysql> select * from repl_myisam;
Empty set (0.00 sec)
11. Transactional tables
On Master:
begin;
insert into repl_innodb(id,name1,name2) values (1, ' m1062-1 ', ' m1062-1 ');
insert into repl_innodb(id,name1,name2) values (2, ' m1062-2 ', ' m1062-2 ');
commit;
mysql> select * from repl_innodb;
+----+----------+----------+
| id | name1    | name2    |
+----+----------+----------+
|  1 |  m1062-1 |  m1062-1 |
|  2 |  m1062-2 |  m1062-2 |
+----+----------+----------+
2 rows in set (0.00 sec)
12. Transactional tables
On Replica:
mysql> show slave status\G
Master_Host: 192.168.70.10
Master_Log_File: binlog.000014
Read_Master_Log_Pos: 593
Slave_IO_Running: Yes
Slave_SQL_Running: No
Exec_Master_Log_Pos: 156
Last_Errno: 1062
Last_Error: Could not execute Write_rows event
on table test.repl_innodb; Duplicate entry '1' for key
'repl_innodb.PRIMARY', Error_code: 1062; handler error
HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000014,
end_log_pos 436
The replica still holds only its own row; nothing from the master's transaction was applied:
mysql> select * from repl_innodb;
+----+----------+---------+
| id | name1    | name2   |
+----+----------+---------+
|  1 |  s1062-1 | s1062-1 |
+----+----------+---------+
1 row in set (0.00 sec)
13. Transactional tables
On Slave:
mysql> show binary logs;
+---------------+-----------+-----------+
| Log_name      | File_size | Encrypted |
+---------------+-----------+-----------+
| binlog.000013 |       156 | No        |
| binlog.000014 |       179 | No        |
| binlog.000015 |      1861 | No        |
+---------------+-----------+-----------+
mysqlbinlog -v --base64-output=decode-rows /var/lib/mysql/binlog.000015
Looking through the slave's binary log, you will not find any entry for the master's transaction.
14. Transactional tables
set global sql_slave_skip_counter = 1 ;
start slave sql_thread;
mysql> select * from repl_innodb;
+----+----------+---------+
| id | name1    | name2   |
+----+----------+---------+
|  1 |  s1062-1 | s1062-1 |
+----+----------+---------+
1 row in set (0.00 sec)
15. Non-transactional tables
The master adds data to the non-transactional table:
begin;
insert into repl_myisam(id,name1,name2) values (1, ' m1062-1 ', ' m1062-1 ');
insert into repl_myisam(id,name1,name2) values (2, ' m1062-2 ', ' m1062-2 ');
commit;
mysql> select * from repl_myisam;
+----+----------+----------+
| id | name1    | name2    |
+----+----------+----------+
|  1 |  m1062-1 |  m1062-1 |
|  2 |  m1062-2 |  m1062-2 |
+----+----------+----------+
2 rows in set (0.00 sec)
16. Non-transactional tables
mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_SQL_Errno: 1062
Last_SQL_Error: Could not execute Write_rows event on table
test.repl_myisam; Duplicate entry '1' for key 'repl_myisam.PRIMARY',
Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's
master log binlog.000014, end_log_pos 2113
On Slave:
mysql> select * from test.repl_myisam;
+----+----------+---------+
| id | name1    | name2   |
+----+----------+---------+
|  1 |  s1062-1 | s1062-1 |
+----+----------+---------+
1 row in set (0.00 sec)
17. Non-transactional tables
Let's solve the error by skipping the event:
set global sql_slave_skip_counter = 1;
start slave sql_thread;
mysql> select * from repl_myisam;
+----+----------+----------+
| id | name1    | name2    |
+----+----------+----------+
|  1 |  s1062-1 | s1062-1  |
|  2 |  m1062-2 |  m1062-2 |
+----+----------+----------+
2 rows in set (0.00 sec)
Notice that the second record (id 2) was applied on the slave. Why? MyISAM is not
transactional, so the master writes each INSERT to the binlog as its own
BEGIN ... COMMIT group, and skipping one group skips only the first INSERT.
18. Non-transactional tables (Master binlog)
BEGIN
/*!*/;
# at 1987
#210706 17:44:15 server id 1 end_log_pos 2055 CRC32 0xd5896c4c Table_map:
`test`.`repl_myisam` mapped to number 111
# at 2055
#210706 17:44:15 server id 1 end_log_pos 2113 CRC32 0xadd8d9bd
Write_rows: table id 111 flags: STMT_END_F
### INSERT INTO `test`.`repl_myisam`
### SET
### @1=1
### @2=' m1062-1'
### @3=' m1062-1'
# at 2113
#210706 17:44:15 server id 1 end_log_pos 2189 CRC32 0x31c1deaa Query
thread_id=47 exec_time=0 error_code=0
SET TIMESTAMP=1625593455/*!*/;
COMMIT
19. Non-transactional tables (Master binlog)
SET TIMESTAMP=1625593455/*!*/;
BEGIN
/*!*/;
# at 2343
#210706 17:44:15 server id 1 end_log_pos 2411 CRC32 0x54db30b4 Table_map:
`test`.`repl_myisam` mapped to number 111
# at 2411
#210706 17:44:15 server id 1 end_log_pos 2469 CRC32 0xe02b1e17 Write_rows: table id 111
flags: STMT_END_F
### INSERT INTO `test`.`repl_myisam`
### SET
### @1=2
### @2=' m1062-2'
### @3=' m1062-2'
# at 2469
#210706 17:44:15 server id 1 end_log_pos 2545 CRC32 0x168f221d Query thread_id=47
exec_time=0 error_code=0
SET TIMESTAMP=1625593455/*!*/;
COMMIT
20. Non-transactional tables (Slave binlog)
# at 1490
#210706 17:44:15 server id 1 end_log_pos 1560 CRC32 0xb789547b Query thread_id=47 exec_time=308
error_code=0
SET TIMESTAMP=1625593455/*!*/;
BEGIN
/*!*/;
# at 1560
#210706 17:44:15 server id 1 end_log_pos 1628 CRC32 0x3d6a2f01 Table_map: `test`.`repl_myisam` mapped to
number 109
# at 1628
#210706 17:44:15 server id 1 end_log_pos 1686 CRC32 0x2872bae4 Write_rows: table id 109 flags: STMT_END_F
### INSERT INTO `test`.`repl_myisam`
### SET
### @1=2
### @2=' m1062-2'
### @3=' m1062-2'
# at 1686
#210706 17:44:15 server id 1 end_log_pos 1753 CRC32 0x83d5d026 Query thread_id=47 exec_time=308
error_code=0
SET TIMESTAMP=1625593455/*!*/;
SET @@session.sql_mode=1168113696/*!*/;
COMMIT
21. Option 2 pt-slave-restart
Last_SQL_Errno: 1062
Last_SQL_Error: Could not execute Write_rows event on table
test1.repl_innodb; Duplicate entry '1' for key 'repl_innodb.PRIMARY', Error_code:
1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log
binlog.000007, end_log_pos 1007
[root@mysql-gtid2 ~]# pt-slave-restart
2021-07-11T19:09:16 mysql-gtid2-relay-bin.000004 934 1062
pt-slave-restart watches one or more MySQL replication slaves and tries to skip
statements that cause errors. It polls slaves intelligently with an exponentially
varying sleep time.
With GTID, an event is skipped by creating an empty transaction for it. If
writes come from different nodes higher up in the replication tree, the tool
cannot know which UUID the event to skip belongs to. In a chain such as
master1 -> slave1 -> slave2
use the --master-uuid option to name the server whose transaction should be skipped:
pt-slave-restart --master-uuid <server UUID>
22. Option 3
I am not going to demonstrate this one because it takes a long time.
It is the last resort: restore (reseed) the replica from a backup of the
master.
The procedure is almost the same for both replication methods; you only need to reset the slave.
In the case of GTID you also need to run RESET MASTER on the replica being restored to clear
the value of gtid_executed.
23. TROUBLESHOOTING GTID
mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 1062
Last_Error: Could not execute Write_rows event on
table test1.repl_innodb; Duplicate entry '1' for key
'repl_innodb.PRIMARY', Error_code: 1062; handler error
HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000007, end_log_pos 1624
Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188
Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-187
On Master:
mysql> show master status;
+---------------+----------+--------------------------------------------+
| File          | Position | Executed_Gtid_Set                          |
+---------------+----------+--------------------------------------------+
| binlog.000007 |     1782 | 02992584-de8e-11eb-98ad-080027b81a94:1-188 |
+---------------+----------+--------------------------------------------+
24. TROUBLESHOOTING GTID
mysql> SET gtid_next='02992584-de8e-11eb-98ad-080027b81a94:188';
mysql> BEGIN;
mysql> COMMIT;
mysql> SET GTID_NEXT="AUTOMATIC";
mysql> start slave;
mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Last_SQL_Errno: 0
Last_SQL_Error:
Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188
Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188
25. How to Detect and Solve Errant Transactions
An errant transaction is an un-replicated transaction that exists only on a replica
Data is not the same on all nodes
Cluster is no longer in a consistent state
Errant GTID detection:
Compare executed GTID sets between primary node and replica nodes
Replica has more GTIDs than primary => errant GTID
26. Let's Find Errant GTID
We need to use two functions:
GTID_SUBSET:
Checks whether the replica's GTID set is a subset of the master's.
SELECT GTID_SUBSET('<gtid_executed_replica>', '<gtid_executed_primary>');
GTID_SUBTRACT:
Returns the GTIDs that exist on the replica but not on the primary, i.e. the exact errant GTIDs.
SELECT GTID_SUBTRACT('<gtid_executed_replica>', '<gtid_executed_primary>');
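To make the set semantics concrete, here is a pure-Python sketch that mimics GTID_SUBSET and GTID_SUBTRACT on GTID set strings. All helper names are hypothetical, and the sketch expands intervals into Python sets, so it is only meant for small examples; on a real server, use the built-in SQL functions shown above.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:1' into {uuid: set of transaction numbers}."""
    result = {}
    # Remove whitespace and line breaks that status output adds between entries.
    for entry in "".join(text.split()).split(","):
        if not entry:
            continue
        uuid, *intervals = entry.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            numbers.update(range(int(start), int(end or start) + 1))
    return result

def format_gtid_set(parsed):
    """Render {uuid: numbers} back into 'uuid:1-5:7' notation."""
    entries = []
    for uuid in sorted(parsed):
        numbers = sorted(parsed[uuid])
        if not numbers:
            continue
        parts, start, prev = [], numbers[0], numbers[0]
        for n in numbers[1:] + [None]:
            if n is not None and n == prev + 1:
                prev = n
                continue
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = n
        entries.append(uuid + ":" + ":".join(parts))
    return ",".join(entries)

def gtid_subset(set1, set2):
    """True if every GTID in set1 is also in set2 (mimics GTID_SUBSET)."""
    a, b = parse_gtid_set(set1), parse_gtid_set(set2)
    return all(nums <= b.get(uuid, set()) for uuid, nums in a.items())

def gtid_subtract(set1, set2):
    """GTIDs in set1 that are not in set2 (mimics GTID_SUBTRACT)."""
    a, b = parse_gtid_set(set1), parse_gtid_set(set2)
    return format_gtid_set({u: nums - b.get(u, set()) for u, nums in a.items()})

primary = "02992584-de8e-11eb-98ad-080027b81a94:1-188"
replica = primary + ",9ad0db84-e1b1-11eb-99b5-080027b81a94:1"
print(gtid_subset(replica, primary))    # False: the replica has an extra GTID
print(gtid_subtract(replica, primary))  # 9ad0db84-e1b1-11eb-99b5-080027b81a94:1
```

A result of False from the subset check is the signal that an errant GTID exists; the subtraction then names it.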
27. Current situation
mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Master_UUID: 02992584-de8e-11eb-98ad-080027b81a94
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188
Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1
28. GTID subset
First, an example where the replica GTID set is a subset of the primary GTID set (shown only for comparison; it is not our case):
mysql> SELECT GTID_SUBSET('02992584-de8e-11eb-98ad-080027b81a94:1-170',
'02992584-de8e-11eb-98ad-080027b81a94:1-188') AS is_subset;
+-----------+
| is_subset |
+-----------+
|         1 |
+-----------+
Replica GTID set is a subset of primary GTID set => OK
Now check whether our replica GTID set is a subset of the primary GTID set:
mysql> SELECT GTID_SUBSET('02992584-de8e-11eb-98ad-080027b81a94:1-188,9ad0db84-e1b1-11eb-99b5-080027b81a94:1',
'02992584-de8e-11eb-98ad-080027b81a94:1-188') AS is_subset;
+-----------+
| is_subset |
+-----------+
|         0 |
+-----------+
Replica GTID set is NOT a subset of primary GTID set => Errant GTID on replica
30. [root@mysql-gtid2 ~]# mysqlbinlog --base64-output=DECODE-ROWS --verbose /var/lib/mysql/binlog.000001 | grep 9ad0db84-e1b1-11eb-99b5-080027b81a94:1 -A100
SET @@SESSION.GTID_NEXT= '9ad0db84-e1b1-11eb-99b5-080027b81a94:1'/*!*/;
# at 235
#210711 20:06:33 server id 2 end_log_pos 311 CRC32 0x45140e33 Query thread_id=14 exec_time=0
error_code=0
SET TIMESTAMP=1626033993/*!*/;
.
.
BEGIN
/*!*/;
# at 311
#210711 20:06:33 server id 2 end_log_pos 380 CRC32 0xb7b533ca Table_map: `test1`.`repl_innodb`
mapped to number 97
# at 380
#210711 20:06:33 server id 2 end_log_pos 438 CRC32 0x5ef8bb59 Write_rows: table id 97 flags:
STMT_END_F
### INSERT INTO `test1`.`repl_innodb`
### SET
### @1=2
### @2='m1062-2'
### @3='Ernt-tran'
# at 438
#210711 20:06:33 server id 2 end_log_pos 469 CRC32 0x696f6881 Xid = 135
COMMIT/*!*/;
31. Fix errant GTIDs
Possible fixes:
• Insert empty transactions on other nodes (including primary)
• Remove GTIDs from replica bin-log
• Restore data from primary/backup
32. Insert empty transactions
On all nodes (or only on the primary if replication still works):
In our case replication is working fine, so we insert the empty transaction on
the master only:
mysql> SET gtid_next='9ad0db84-e1b1-11eb-99b5-080027b81a94:1';
mysql> BEGIN;
mysql> COMMIT;
mysql> SET gtid_next=automatic;
SHOW SLAVE STATUS on the replica then reports:
Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1
Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1
If you don't set gtid_next back to automatic before running the next statement:
ERROR 1837 (HY000): When @@SESSION.GTID_NEXT is set to a GTID, you must
explicitly set it to a different value after a COMMIT or ROLLBACK.
33. Insert empty transactions
What if the master is down? Then insert the empty transaction on all slaves,
promote the most up-to-date slave to master, and point the remaining slaves to
the new master.
STOP SLAVE;
SET gtid_next='9ad0db84-e1b1-11eb-99b5-080027b81a94:1';
BEGIN;
COMMIT;
SET gtid_next=automatic;
START SLAVE;
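When an errant set contains more than one GTID, typing the empty transactions by hand gets tedious. The Python sketch below generates the statements shown above for every GTID in a set; the empty_transaction_sql helper is hypothetical, and its output should be reviewed before it is run on any server.

```python
def empty_transaction_sql(errant_gtid_set):
    """Yield the statements that inject one empty transaction per GTID."""
    # Remove whitespace and line breaks, then split into per-UUID entries.
    for entry in "".join(errant_gtid_set.split()).split(","):
        if not entry:
            continue
        uuid, *intervals = entry.split(":")
        for interval in intervals:
            start, _, end = interval.partition("-")
            for number in range(int(start), int(end or start) + 1):
                yield f"SET gtid_next='{uuid}:{number}';"
                yield "BEGIN;"
                yield "COMMIT;"
    # Always hand control back to the server, otherwise ERROR 1837 follows.
    yield "SET gtid_next='AUTOMATIC';"

statements = list(empty_transaction_sql("9ad0db84-e1b1-11eb-99b5-080027b81a94:1-2"))
for statement in statements:
    print(statement)
```

The input here is an example errant set covering two transactions; the errant set for a real replica comes from GTID_SUBTRACT.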
34. Errant GTID : Remove from binlog:
Insert a new row directly on the slave to make it inconsistent and create an errant GTID:
insert into repl_innodb(id,name1,name2) values (6, ' m1062-2 ', 'ReGTIDbin');
mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Master_UUID: 02992584-de8e-11eb-98ad-080027b81a94
Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1
Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1-2
35. Errant GTID : Remove from binlog:
On the primary:
mysql> SELECT @@GLOBAL.gtid_executed;
+--------------------------------------------------------------------+
| @@GLOBAL.gtid_executed                                             |
+--------------------------------------------------------------------+
| 02992584-de8e-11eb-98ad-080027b81a94:1-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1                               |
+--------------------------------------------------------------------+
36. Errant GTID : Remove from binlog:
On Replica:
mysql> STOP SLAVE;
mysql> RESET MASTER;
# RESET MASTER purges the binlogs on the replica and resets gtid_executed to ''
mysql> SELECT @@GLOBAL.gtid_executed;
+------------------------+
| @@GLOBAL.gtid_executed |
+------------------------+
| |
+------------------------+
mysql> SET GLOBAL GTID_PURGED="02992584-de8e-11eb-98ad-080027b81a94:1-188,9ad0db84-e1b1-11eb-99b5-080027b81a94:1";
mysql> START SLAVE;
mysql> show slave status\G
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Retrieved_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:182-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1
Executed_Gtid_Set: 02992584-de8e-11eb-98ad-080027b81a94:1-188,
9ad0db84-e1b1-11eb-99b5-080027b81a94:1