Advanced Percona XtraDB Cluster
in a nutshell... la suite
Hands on tutorial for advanced users!
1
Kenny 'kenji' Gryp
kenny.gryp@percona.com
Frédéric 'lefred' Descamps
lefred@percona.com
2
Setting Up
Environment
Bootstrapping
Certification Errors
Replication Failures
Galera Cache
Flow Control
Replication
Throughput
WAN Replication
Consistent Reads
Backups
Load Balancers
Challenges
Table of Contents
3
Are you ready?
Setting Up The Environment
4
Setting Up The Environment
Fetch a USB Stick, Install VirtualBox & Copy the 3 images
5
Testing The Environment
Start all 3 VirtualBox images
ssh/putty to:
pxc1: ssh root@localhost -p 8821
pxc2: ssh root@localhost -p 8822
pxc3: ssh root@localhost -p 8823
root password is vagrant
HAProxy is running on pxc1
(http://localhost:8881/)
Verify ssh between nodes
Open 2 ssh sessions to every node
6
Attention - Hands On!
When you see the icon in the bottom right,
there is an exercise that you should do!
7
Easy, no?
Bootstrap A Cluster
8
Bootstrapping PXC
You all should know this already...
# service mysql bootstrap-pxc
# /etc/init.d/mysql bootstrap-pxc
# /etc/init.d/mysql start --wsrep-new-cluster
or with systemd environments like CentOS 7:
# systemctl start mysql@bootstrap
Today we are using 32-bit CentOS 6.
For Emily, just to be complete....
pxc1# service mysql bootstrap-pxc
pxc2# service mysql start
pxc3# service mysql start
9
Bootstrapping PXC
Bootstrapping a node gives it permission to form a new cluster
Bootstrapping should NOT happen automatically without a system
with split-brain protection that can coordinate it. Usually this is
done manually
The bootstrapped node is the source of truth for all nodes going
forward
10
Bootstrapping PXC
Recap On IST/SST
IST: Incremental State Transfer
Only transfer missing transactions
SST: State Snapshot Transfer
Snapshot the whole database and transfer, using:
Percona XtraBackup
rsync
mysqldump
One node of the cluster is DONOR
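While an SST is running, the donor is easy to spot; a minimal check (output values as reported by Galera):
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
-- 'Donor/Desynced' on the donor, 'Synced' on a normal node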
11
Bootstrapping PXC
With Stop/Start Of MySQL
When you need to start a new cluster from scratch, you decide which
node to start with and you bootstrap it
# /etc/init.d/mysql start --wsrep-new-cluster
That node becomes the cluster source of truth (SSTs for all new nodes)
12
Bootstrapping PXC
Without Restarting MySQL
When a cluster is already partitioned and you want to bring it up again.
1 or more nodes need to be in Non-Primary state.
Choose the most up-to-date node that can be made available (to serve
the application)
To bootstrap online:
mysql> set global wsrep_provider_options="pc.bootstrap=true";
be sure there is NO OTHER PRIMARY partition or there will be a
split brain!!
13
Bootstrapping PXC
Without Restarting MySQL
Use Case:
Only 1 of the 3 nodes is available and the other 2 nodes crashed,
causing node 1 to go Non-Primary.
In Multi Datacenter environments:
DC1 has 2 nodes, DC2 has 1 node,
If DC1 dies, the single node in DC2 will go Non-Primary. To
activate secondary DC, a bootstrap is necessary
14
Recover Cleanly Shutdown Cluster
Run the application (run_app.sh haproxy-all) on pxc1
One by one, stop mysql on all 3 nodes
How can you know which node to bootstrap?
15
Recover Cleanly Shutdown Cluster
Run the application (run_app.sh haproxy-all) on pxc1
One by one, stop mysql on all 3 nodes
How can you know which node to bootstrap?
Solution
# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 3759f5c0-56f6-11e5-ad87-afbd92f4dcd2
seqno: 1933471
cert_index:
Bootstrap node with highest seqno and start other nodes.
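A quick way to compare, assuming root ssh between the nodes as set up in this environment (a sketch, not part of the original exercise):
# compare seqnos across all three nodes
for n in pxc1 pxc2 pxc3; do
  echo -n "$n: "
  ssh root@$n "grep seqno: /var/lib/mysql/grastate.dat"
done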
16
Recover Unclean Stopped Cluster
Run the application (run_app.sh haproxy-all) on pxc1
On all nodes at the same time run:
# killall -9 mysqld mysqld_safe
How can you know which node has the latest commit?
17
Recover Unclean Stopped Cluster
Run the application (run_app.sh haproxy-all) on pxc1
On all nodes at the same time run:
# killall -9 mysqld mysqld_safe
How can you know which node has the latest commit?
Solution
# mysqld_safe --wsrep-recover
Logging to '/var/lib/mysql/error.log'.
Starting mysqld daemon with databases from /var/lib/mysql
WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.Ln
WSREP: Recovered position 44e54b4b-5c69-11e5-83a3-8fc879cb495e:1719976
mysqld from pid file /var/lib/mysql/pxc1.pid ended
18
Recover Unclean Stopped Cluster
What methods can we use to bring back the cluster?
19
Recover Unclean Stopped Cluster
What methods can we use to bring back the cluster?
Solutions
Bootstrap the most up-to-date node
Since PXC 5.6.19-25.6 we have pc.recovery (enabled by default)
that uses the information stored in gvwstate.dat. We can then
just start MySQL on all 3 nodes at the same time
# service mysql restart
20
Avoiding SST
When MySQL cannot start due to an error, such as a
configuration error, an SST is always performed.
21
Avoiding SST
When MySQL cannot start due to an error, such as a
configuration error, an SST is always performed.
Let's try, stop MySQL on pxc2, modify my.cnf and add foobar under
[mysqld] section.
Then start MySQL, does it fail? Check /var/lib/mysql/error.log.
# /etc/init.d/mysql stop
# cat >> /etc/my.cnf << EOF
[mysqld]
foobar
EOF
# /etc/init.d/mysql start
22
Avoiding SST
When MySQL cannot start due to an error, such as a
configuration error, an SST is always performed.
Fix the error (remove the foobar configuration) and restart MySQL.
Does it perform SST?
Check /var/lib/mysql/error.log. Why?
23
Avoiding SST
When MySQL cannot start due to an error, such as a
configuration error, an SST is always performed.
Fix the error (remove the foobar configuration) and restart MySQL.
Does it perform SST?
Check /var/lib/mysql/error.log. Why?
SST is done, as we can see in the error.log:
[Warning] WSREP: Failed to prepare for incremental state transfer:
Local state UUID (00000000-0000-0000-0000-000000000000)
does not match group state UUID
(93a81eed-57b2-11e5-8f5e-82e53aab8d35): 1 (Operation not permitted)
...
WSREP_SST: [INFO] Proceeding with SST (20150916 19:13:50.990)
24
Avoiding SST
When MySQL cannot start due to an error, such as a
configuration error, an SST is always performed.
So how can we avoid SST?
It's easy, you need to hack /var/lib/mysql/grastate.dat.
Create the error again:
Bring node back in cluster
Add foobar to the configuration again
Start MySQL
25
Avoiding SST
When MySQL cannot start due to an error, such as a
configuration error, an SST is always performed.
# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 93a81eed-57b2-11e5-8f5e-82e53aab8d35
seqno: 1300762
cert_index:
26
Avoiding SST
When MySQL cannot start due to an error, such as a
configuration error, an SST is always performed.
# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 93a81eed-57b2-11e5-8f5e-82e53aab8d35
seqno: 1300762
cert_index:
When it fails due to an error, grastate.dat is reset to:
# GALERA saved state
version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
cert_index:
27
Avoiding SST
When MySQL cannot start due to an error, such as a
configuration error, an SST is always performed.
You need then to set the right uuid and seqno in grastate.dat
Run mysqld_safe --wsrep-recover to find the values to set:
[root@pxc2 mysql]# mysqld_safe --wsrep-recover
...
2015-09-16 19:26:14 6133 [Note] WSREP:
Recovered position: 93a81eed-57b2-11e5-8f5e-82e53aab8d35:1300762
28
Avoiding SST
When MySQL cannot start due to an error, such as a
configuration error, an SST is always performed.
Create grastate.dat with info from wsrep-recover:
# GALERA saved state
version: 2.1
uuid: 93a81eed-57b2-11e5-8f5e-82e53aab8d35
seqno: 1300762
cert_index:
Start MySQL again and check /var/lib/mysql/error.log:
[root@pxc2 mysql]# /etc/init.d/mysql start
...
150916 19:27:53 mysqld_safe
Assigning 93a81eed-57b2-11e5-8f5e-82e53aab8d35:1300762
to wsrep_start_position
...
WSREP_SST: [INFO] xtrabackup_ist received from donor:
Running IST (20150916 19:34:05.545
29
When putting in production unprepared...
Certification Errors
30
Certification
What it does:
Determine if writeset can be applied.
Based on unapplied earlier transactions on master
Such conflicts must come from other nodes
Happens on every node, individually
Deterministic
Results are not reported to other nodes in the cluster, as every node
performs certification itself and the process is deterministic.
Pass: enter apply queue (commit success on master)
Fail: drop transaction (or return deadlock on master)
Serialized by GTID
Cost based on # of keys or # of rows
31
Certification
(slides 32-38: diagram sequence illustrating the certification process)
38
Conflict Detection
Local Certification Failure (lcf)
Transaction fails certification
Post-replication
Deadlock/Transaction Rollback
Status Counter: wsrep_local_cert_failures
Brute Force Abort (bfa)
(Most Common)
Deadlock/Transaction rolled back by applier threads
Pre-commit
Transaction Rollback
Status Counter: wsrep_local_bf_aborts
39
Conflict Deadlock/Rollback
note: a transaction rollback can occur on any statement, including
SELECT and COMMIT
Example:
pxc1 mysql> commit;
ERROR 1213 (40001): Deadlock found when trying to get lock;
try restarting transaction
40
Multi-writer Conflict Types
Brute Force Abort (bfa)
Transaction rolled back by applier threads
Pre-commit
Transaction rollback can occur on any statement, including
SELECT and COMMIT
Status Counter: wsrep_local_bf_aborts
41
Brute Force Abort (bfa)
(slides 42-48: diagram sequence illustrating a brute force abort)
48
Multi-writer Conflict Types
Local Certification Failure (lcf)
Transaction fails certification
Post-replication
Deadlock on commit
Status Counter: wsrep_local_cert_failures
49
Local Certification Failure (lcf)
(slides 50-61: diagram sequence illustrating a local certification failure)
61
Conflict Detection
Exercises!
62
Reproducing Conflicts - 1
On pxc1, create test table:
pxc1 mysql> CREATE TABLE test.deadlocks (
i INT UNSIGNED NOT NULL PRIMARY KEY,
j varchar(32),
t datetime
);
pxc1 mysql> INSERT INTO test.deadlocks VALUES (1, NULL, NULL);
Run myq_status on pxc1:
# myq_status wsrep
mycluster / pxc1 (idx: 1) / Galera 3.11(ra0189ab)
Cluster Node Repl Queue Ops Bytes Conflct Gcache Window
cnf # Stat Laten Up Dn Up Dn Up Dn lcf bfa ist idx dst appl comm
12 3 Sync N/A 0 0 0 0 0.0 0.0 0 0 1.8m 0 0 0
12 3 Sync N/A 0 0 0 0 0.0 0.0 0 0 1.8m 0 0 0
12 3 Sync N/A 0 0 0 0 0.0 0.0 0 0 1.8m 0 0 0
63
Reproducing Conflicts - 1
On pxc1:
pxc1 mysql> BEGIN;
pxc1 mysql> UPDATE test.deadlocks SET j='pxc1', t=now() WHERE i=1;
Before commit, go to pxc3:
pxc3 mysql> BEGIN;
pxc3 mysql> UPDATE test.deadlocks SET j='pxc3', t=now() WHERE i=1;
pxc3 mysql> COMMIT;
Now commit the transaction on pxc1:
pxc1 mysql> COMMIT;
pxc1 mysql> SELECT * FROM test.deadlocks;
64
Reproducing Conflicts - 1
It fails:
pxc1 mysql> commit;
ERROR 1213 (40001): Deadlock found when trying to get lock;
try restarting transaction
65
Reproducing Conflicts - 1
Which commit succeeded?
Is this a lcf or a bfa?
How would you diagnose this error?
66
Reproducing Conflicts - 1
Which commit succeeded? pxc3's, the first one that entered the cluster.
Is this a lcf or a bfa? BFA
How would you diagnose this error?
show global status like 'wsrep_local_bf%';
show global status like 'wsrep_local_cert%';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| wsrep_local_bf_aborts | 1 |
| wsrep_local_cert_failures | 0 |
+---------------------------+-------+
# myq_status wsrep
mycluster / pxc1 (idx: 1) / Galera 3.11(ra0189ab)
Wsrep Cluster Node Repl Queue Ops Bytes Conflct Gcache Win
time P cnf # Stat Laten Up Dn Up Dn Up Dn lcf bfa ist idx dst
10:49:43 P 12 3 Sync 1.1ms 0 0 0 0 0.0 0.0 0 0 3 3
10:49:44 P 12 3 Sync 1.1ms 0 0 0 0 0.0 0.0 0 0 3 3
10:49:45 P 12 3 Sync 1.1ms 0 0 0 0 0.0 0.0 0 0 3 3
10:49:46 P 12 3 Sync 1.1ms 0 0 0 1 0.0 0.3K 0 1 4 3
67
Reproducing Conflicts - 1
Log Conflicts
pxc1 mysql> set global wsrep_log_conflicts=on;
*** Priority TRANSACTION:
TRANSACTION 7743569, ACTIVE 0 sec starting index read
MySQL thread id 2, OS thread handle 0x93e78b70, query id 1395484 System lock
*** Victim TRANSACTION:
TRANSACTION 7743568, ACTIVE 9 sec
MySQL thread id 89984, OS thread handle 0x82bb1b70, query id 1395461 localhost root
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 80 page no 3 n bits 72 index PRIMARY of table test.deadlocks
trx id 7743568 lock_mode X locks rec but not gap
2015-09-19 12:36:17 4285 [Note] WSREP: cluster conflict due to high priority
abort for threads:
2015-09-19 12:36:17 4285 [Note] WSREP: Winning thread:
THD: 2, mode: applier, state: executing, conflict: no conflict,
seqno: 1824234
SQL: (null)
2015-09-19 12:36:17 4285 [Note] WSREP: Victim thread:
THD: 89984, mode: local, state: idle, conflict: no conflict,
seqno: -1
SQL: (null)
68
Reproducing Conflicts - 1
Log Conflicts - Debug
pxc1 mysql> set global wsrep_debug=on;
[Note] WSREP: BF kill (1, seqno: 1824243), victim: (90473) trx: 7743601
[Note] WSREP: Aborting query: void
[Note] WSREP: kill IDLE for 7743601
[Note] WSREP: enqueuing trx abort for (90473)
[Note] WSREP: signaling aborter
[Note] WSREP: WSREP rollback thread wakes for signal
[Note] WSREP: client rollback due to BF abort for (90473), query: (null)
[Note] WSREP: WSREP rollbacker aborted thd: (90473 2649955184)
[Note] WSREP: Deadlock error for: (null)
69
Reproducing Conflicts - 2
ROLLBACK; all open transactions on all mysql clients
ensure SET GLOBAL wsrep_log_conflicts=on; is set on all nodes
run myq_status wsrep on pxc1 and pxc2
run run_app.sh lcf on pxc1 to reproduce a LCF
check:
output of run_app.sh lcf
myq_status
/var/lib/mysql/error.log
70
Reproducing Conflicts - 2
[root@pxc2 ~]# myq_status wsrep
mycluster / pxc2 (idx: 0) / Galera 3.12(r9921e73)
Wsrep Cluster Node Repl Queue Ops Bytes Conflct Gcache Win
time P cnf # Stat Laten Up Dn Up Dn Up Dn lcf bfa ist idx dst
13:28:15 P 47 3 Sync 1.1ms 0 0 0 0 0.0 0.0 0 0 7433 101
13:28:16 P 47 3 Sync 1.1ms 0 409 0 4 0.0 1.1K 0 0 7436 5
13:28:17 P 47 3 Sync 1.1ms 0 947 0 0 0.0 0.0 0 0 7436 5
13:28:18 P 47 3 Sync 1.1ms 0 1470 0 0 0.0 0.0 0 0 7436 5
13:28:19 P 47 3 Sync 1.1ms 0 1892 0 0 0.0 0.0 0 0 7436 5
13:28:20 P 47 3 Sync 1.1ms 0 2555 0 0 0.0 0.0 0 0 7436 5
13:28:21 P 47 3 Sync 1.1ms 0 3274 0 0 0.0 0.0 0 0 7436 5
13:28:22 P 47 3 Sync 1.1ms 0 3945 0 0 0.0 0.0 0 0 7436 5
13:28:23 P 47 3 Sync 1.1ms 0 4663 0 0 0.0 0.0 0 0 7436 5
13:28:24 P 47 3 Sync 1.1ms 0 5400 0 0 0.0 0.0 0 0 7436 5
13:28:25 P 47 3 Sync 1.1ms 0 6096 0 0 0.0 0.0 0 0 7436 5
13:28:26 P 47 3 Sync 1.1ms 0 6839 0 0 0.0 0.0 0 0 7436 5
13:28:27 P 47 3 Sync 1.1ms 0 6872 0 0 0.0 0.0 0 0 7436 5
13:28:28 P 47 3 Sync 1.1ms 0 6872 0 0 0.0 0.0 0 0 7436 5
13:28:29 P 47 3 Sync 1.1ms 0 6872 0 0 0.0 0.0 0 0 7436 5
13:28:30 P 47 3 Sync 1.1ms 0 5778 1 1102 0.3K 0.3M 0 0 8537 5
13:28:31 P 47 3 Sync 1.1ms 0 978 0 4838 0.0 1.4M 0 0 13k 5
13:28:32 P 47 3 Sync 1.1ms 0 0 1 985 0.3K 0.3M 2 0 14k 5
13:28:33 P 47 3 Sync N/A 0 0 0 0 0.0 0.0 0 0 14k 5
13:28:34 P 47 3 Sync N/A 0 0 0 0 0.0 0.0 0 0 14k 5
13:28:35 P 47 3 Sync N/A 0 0 0 0 0.0 0.0 0 0 14k 5
71
Reproducing Conflicts - 2
*** Priority TRANSACTION:
TRANSACTION 7787747, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
1 lock struct(s), heap size 312, 0 row lock(s)
MySQL thread id 1, OS thread handle 0x93e78b70, query id 301870 System lock
*** Victim TRANSACTION:
TRANSACTION 7787746, ACTIVE 0 sec
mysql tables in use 1, locked 1
2 lock struct(s), heap size 312, 1 row lock(s), undo log entries 1
MySQL thread id 2575, OS thread handle 0x82369b70, query id 286919 pxc1 192.168
update test.test set sec_col = 0 where id = 1
[Note] WSREP: Winning thread:
THD: 1, mode: applier, state: executing, conflict: no conflict,
seqno: 1846028, SQL: (null)
[Note] WSREP: Victim thread:
THD: 2575, mode: local, state: committing, conflict: no conflict,
seqno: -1, SQL: update test.test set sec_col = 0 where id = 1
[Note] WSREP: BF kill (1, seqno: 1846028), victim: (2575) trx: 7787746
[Note] WSREP: Aborting query: update test.test set sec_col = 0 where id = 1
[Note] WSREP: kill trx QUERY_COMMITTING for 7787746
[Note] WSREP: trx conflict for key (1,FLAT8)258634b1 d0506abd:
source: 5cb369ab-5eca-11e5-8151-7afe8943c31a version: 3 local: 1
state: MUST_ABORT flags: 1 conn_id: 2575 trx_id: 7787746
seqnos (l: 21977, g: 1846030, s: 1846027, d: 1838545, ts: 161299135504641)
<--X-->
source: 9b376860-5e09-11e5-ac17-e6e46a2459ee version: 3 local: 0
state: APPLYING flags: 1 conn_id: 95747 trx_id: 7787229
72
Reproducing Conflicts
Summary
Conflicts are a concern when determining whether PXC is a fit for the
application:
Long running transactions increase chance of conflicts
Heavy write workload on multiple nodes
Large transactions increase chance of conflicts
Mark Callaghan's law: a given row can't be modified more often
than 1/RTT times per second
These issues can usually be resolved by writing to 1 node only.
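For example, with the ~200 ms WAN round trip simulated later in this tutorial, a single row can be committed at most about 1/0.2 = 5 times per second, no matter how fast the individual nodes are.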
73
Reproducing Conflicts - Summary
(slides 74-76: summary diagrams)
76
What the ...?
Replication Failures
77
Replication Failure
When Do They Happen?
When a Total Order Isolation (TOI) error happened
DDL error: CREATE TABLE, ALTER TABLE...
GRANT failed
When there was a node inconsistency
Bug in Galera replication
Human Error, for example skipping binary log
(SQL_LOG_BIN=0) when doing writes
78
Replication Failure
What Happens?
At every error:
A GRA_*.log file is created in the MySQL datadir
[root@pxc1 ~]# ls -alsh1 /var/lib/mysql/GRA_*
-rw-rw----. 1 mysql 89 Sep 15 10:26 /var/lib/mysql/GRA_1_127792.log
-rw-rw----. 1 mysql 83 Sep 10 12:00 /var/lib/mysql/GRA_2_5.log
A message is written to the errorlog
/var/lib/mysql/error.log
It's possible to decode them, they are binary logs
They can be safely removed
79
Replication Failure
Reading GRA Contents
Run Application only on pxc1 using only pxc1 as writer:
# run_app.sh pxc1
pxc1 mysql> create table test.nocolumns;
What do you get?
80
Replication Failure
Reading GRA Contents
Run Application only on pxc1 using only pxc1 as writer:
# run_app.sh pxc1
pxc1 mysql> create table test.nocolumns;
What do you get?
ERROR 1113 (42000): A table must have at least 1 column
81
Replication Failure
Reading GRA Contents
Run Application only on pxc1 using only pxc1 as writer:
# run_app.sh pxc1
pxc1 mysql> create table test.nocolumns;
What do you get?
ERROR 1113 (42000): A table must have at least 1 column
Error Log Other Nodes:
[ERROR] Slave SQL: Error 'A table must have at least 1 column' on query.
Default database: ''. Query: 'create table test.nocolumns', Error_code: 1113
[Warning] WSREP: RBR event 1 Query apply warning: 1, 1881065
[Warning] WSREP: Ignoring error for TO isolated action:
source: 9b376860-5e09-11e5-ac17-e6e46a2459ee version: 3 local: 0
state: APPLYING flags: 65 conn_id: 106500 trx_id: -1
seqnos (l: 57292, g: 1881065, s: 1881064, d: 1881064, ts: 76501555632582)
82
Replication Failure
Making GRA Header File
note: Binary log headers only differ between versions, they can be
reused
Get a binary log without checksums:
pxc2 mysql> set global binlog_checksum=0;
pxc2 mysql> flush binary logs;
pxc2 mysql> show master status;
+-----------------+----------+--------------+------------------+-------------------
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set
+-----------------+----------+--------------+------------------+-------------------
| pxc2-bin.000018 | 790333 | | |
+-----------------+----------+--------------+------------------+-------------------
Create the GRA_header file from the new binary log:
dd if=/var/lib/mysql/pxc2-bin.000018 bs=120 count=1 \
of=/root/GRA_header
83
Replication Failure
Reading GRA Contents
Join the header and one GRA*.log file (note the seqno of 1881065)
cat /root/GRA_header /var/lib/mysql/GRA_1_1881065.log \
>> /root/GRA_1_1881065-bin.log
View the content with mysqlbinlog
mysqlbinlog -vvv /root/GRA_1_1881065-bin.log
84
Replication Failure
Node Consistency Compromised
Delete some data while skipping binary log completely:
pxc2 mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)
pxc2 mysql> delete from sbtest.sbtest1 limit 100;
Query OK, 100 rows affected (0.00 sec)
Repeat the DELETE until pxc2 crashes...
85
Replication Failure
Node Consistency Compromised
Error:
[ERROR] Slave SQL: Could not execute Update_rows event
on table sbtest.sbtest1;
Can't find record in 'sbtest1', Error_code: 1032;
handler error HA_ERR_KEY_NOT_FOUND;
the event's master log FIRST, end_log_pos 540, Error_code: 1032
[Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 1890959
[Warning] WSREP: Failed to apply app buffer: seqno: 1890959, status: 1
at galera/src/trx_handle.cpp:apply():351
Retrying 2th time ...
Retrying 4th time ...
[ERROR] WSREP: Failed to apply trx:
source: 9b376860-5e09-11e5-ac17-e6e46a2459ee version: 3 local: 0
state: APPLYING flags: 1 conn_id: 117611 trx_id: 7877213
seqnos (l: 67341, g: 1890959, s: 1890958, d: 1890841, ts: 78933399926835)
[ERROR] WSREP: Failed to apply trx 1890959 4 times
[ERROR] WSREP: Node consistency compromized, aborting...
...
[Note] WSREP: /usr/sbin/mysqld: Terminated.
86
Replication Failure
Node Consistency Compromised
[root@pxc2 ~]# cat /root/GRA_header /var/lib/mysql/GRA_1_1890959.log | 
mysqlbinlog -vvv -
BINLOG '
...MDQwNjUyMjMwMTQ=...
'/*!*/;
### UPDATE sbtest.sbtest1
### WHERE
### @1=3528 /* INT meta=0 nullable=0 is_null=0 */
### @2=4395 /* INT meta=0 nullable=0 is_null=0 */
### @3='01945529982-83991536409-94055999891-11150850160-46682230772-19159811582-7
### @4='92814455222-06024456935-25380449439-64345775537-04065223014' /* STRING(60
### SET
### @1=3528 /* INT meta=0 nullable=0 is_null=0 */
### @2=4396 /* INT meta=0 nullable=0 is_null=0 */
### @3='01945529982-83991536409-94055999891-11150850160-46682230772-19159811582-7
### @4='92814455222-06024456935-25380449439-64345775537-04065223014' /* STRING(60
# at 660
#150919 15:56:36 server id 1 end_log_pos 595 Table_map: sbtest.sbtest1 mapped to
# at 715
#150919 15:56:36 server id 1 end_log_pos 1005 Update_rows: table id 70 flags: STM
...
87
Gcache... what's that ?
Galera Cache
88
Galera Cache
All nodes contain a Cache of recent writesets, used to perform IST
used to store the writesets in circular buffer style
89
Galera Cache
All nodes contain a Cache of recent writesets, used to perform IST
used to store the writesets in circular buffer style
preallocated file with a specific size, configurable:
wsrep_provider_options = "gcache.size=1G"
default size is 128M
90
Galera Cache
All nodes contain a Cache of recent writesets, used to perform IST
used to store the writesets in circular buffer style
preallocated file with a specific size, configurable:
wsrep_provider_options = "gcache.size=1G"
default size is 128M
Galera Cache is mmaped (I/O buffered to memory)
So OS might swap (set vm.swappiness to 10)
use fincore-linux or dbsake fincore to see how much of the
file is cached in memory
91
Galera Cache
All nodes contain a Cache of recent writesets, used to perform IST
used to store the writesets in circular buffer style
preallocated file with a specific size, configurable:
wsrep_provider_options = "gcache.size=1G"
default size is 128M
Galera Cache is mmaped (I/O buffered to memory)
So OS might swap (set vm.swappiness to 10)
use fincore-linux or dbsake fincore to see how much of the
file is cached in memory
the status counter wsrep_local_cached_downto shows the lowest
seqno still present in the gcache
wsrep_gcache_pool_size shows the size of the page pool
and/or dynamic memory allocated for gcache (since PXC 5.6.24)
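A minimal way to use it (sketch): check on the prospective donor; IST is possible for a joiner whose grastate.dat seqno is at least this value.
pxc1 mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_cached_downto';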
92
Galera Cache
Calculating Optimal Size
It would be great if we could hold 1 hour of changes in the Galera
cache for IST.
How large does the Galera cache need to be?
93
Galera Cache
Calculating Optimal Size
It would be great if we could hold 1 hour of changes in the Galera
cache for IST.
How large does the Galera cache need to be?
We can calculate how much writes happen over time
wsrep_replicated_bytes: writesets sent to other nodes
wsrep_received_bytes: writesets received from other nodes
94
Galera Cache
Calculating Optimal Size
It would be great if we could hold 1 hour of changes in the Galera
cache for IST.
How large does the Galera cache need to be?
We can calculate how much writes happen over time
wsrep_replicated_bytes: writesets sent to other nodes
wsrep_received_bytes: writesets received from other nodes
SHOW GLOBAL STATUS LIKE 'wsrep_%d_bytes';
SELECT SLEEP(60);
SHOW GLOBAL STATUS LIKE 'wsrep_%d_bytes';
At each measurement, sum up both the replicated and the received
bytes, then subtract the first total from the second.
95
Galera Cache
Calculating Optimal Size
Easier to do is:
SELECT ROUND(SUM(bytes)/1024/1024*60) AS megabytes_per_hour FROM
(SELECT SUM(VARIABLE_VALUE) * -1 AS bytes
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME IN ('wsrep_received_bytes',
'wsrep_replicated_bytes')
UNION ALL
SELECT sleep(60) AS bytes
UNION ALL
SELECT SUM(VARIABLE_VALUE) AS bytes
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME IN ('wsrep_received_bytes',
'wsrep_replicated_bytes')
) AS COUNTED
+--------------------+
| megabytes_per_hour |
+--------------------+
| 302 |
+--------------------+
1 row in set (1 min 0.00 sec)
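With roughly 302 MB replicated per hour as measured here, setting wsrep_provider_options="gcache.size=512M" would comfortably keep more than one hour of writesets available for IST.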
96
Galera Cache
How Much Filesystem Cache Used?
Check galera cache's memory usage using dbsake:
dbsake fincore /var/lib/mysql/galera.cache
/var/lib/mysql/galera.cache: total_pages=32769 cached=448 percent=1.37
97
Hey! Wait for me!
Flow Control
98
Flow Control
Avoids nodes drifting behind (the Galera equivalent of slave lag)
Any node in the cluster can ask the other nodes to pause writes if it
lags behind too much.
Caused by wsrep_local_recv_queue exceeding a node’s
gcs.fc_limit
Can cause all writes on all nodes in the entire cluster to be
stalled.
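To watch the receive queue and see the current limit on a node, a minimal sketch:
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue%';
mysql> SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'; -- contains gcs.fc_limit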
99
Flow Control
(slides 100-119: diagram sequence illustrating flow control)
119
Flow Control
Status Counters
wsrep_flow_control_paused_ns:
nanoseconds the cluster was paused by flow control since the node started
wsrep_flow_control_recv:
number of flow control messages received from other nodes
wsrep_flow_control_sent:
number of flow control messages sent to other nodes
(wsrep_flow_control_paused: only useful in Galera 2;
% of the time the cluster was stalled since the last SHOW GLOBAL
STATUS)
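All of these counters can be checked at once (sketch):
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%';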
120
Flow Control
Observing Flow Control
Run the application (run_app.sh)
Run myq_status wsrep on all nodes.
Take a read lock on pxc3 and observe its effect on the cluster.
FLUSH TABLES WITH READ LOCK
121
pxc1
run_app.sh
all nodes:
myq_status wsrep
Flow Control
Observing Flow Control
Run the application (run_app.sh)
Run myq_status wsrep on all nodes.
Take a read lock on pxc3 and observe its effect on the cluster.
FLUSH TABLES WITH READ LOCK
pxc3 mysql> flush tables with read lock;
wait until flow control kicks in...
pxc3 mysql> unlock tables;
122
Flow Control
Increase the limit
Increase the flow control limit on pxc3 to 20000 and perform the same
exercise as previously.
123
pxc1
run_app.sh
all nodes:
myq_status wsrep
Flow Control
Increase the limit
Increase the flow control limit on pxc3 to 20000 and perform the same
exercise as previously.
pxc3 mysql> set global wsrep_provider_options="gcs.fc_limit=20000";
pxc3 mysql> flush tables with read lock;
wait until flow control kicks in...
pxc3 mysql> unlock tables;
124
Flow Control
Increase the limit
What do you see?
A node can lag behind more before sending flow control messages.
This can be controlled per node.
Is there another alternative?
125
Flow Control
DESYNC mode
It's possible to let a node fall behind beyond the flow control limit.
This can be done by setting wsrep_desync=ON
Try the same exercises but enable DESYNC on pxc3.
126
pxc1
run_app.sh
all nodes:
myq_status wsrep
Flow Control
DESYNC mode
It's possible to let a node fall behind beyond the flow control limit.
This can be done by setting wsrep_desync=ON
Try the same exercises but enable DESYNC on pxc3.
pxc3 mysql> set global wsrep_provider_options="gcs.fc_limit=16";
pxc3 mysql> set global wsrep_desync=on;
Don't forget when done:
pxc3 mysql> unlock tables;
127
How much more can we handle?
Max Replication Throughput
128
Max Replication Throughput
We can measure the write throughput of a node/cluster:
Put a node in wsrep_desync=on to avoid flow control messages
being sent
Lock the replication with FLUSH TABLES WITH READ LOCK
Wait and build up a queue for a certain amount of time
Unlock replication again with UNLOCK TABLES
Measure how fast it syncs up again
Compare with normal workload
129
Max Replication Throughput
Measure
On pxc2 run
show global status like 'wsrep_last_committed';
select sleep(60);
show global status like 'wsrep_last_committed';
One Liner:
SELECT ROUND(SUM(trx)/60) AS transactions_per_second FROM
(SELECT VARIABLE_VALUE * -1 AS trx
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_last_committed'
UNION ALL SELECT sleep(60) AS trx
UNION ALL SELECT VARIABLE_VALUE AS trx
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_last_committed') AS COUNTED;
+-------------------------+
| transactions_per_second |
+-------------------------+
| 185 |
+-------------------------+
130
Max Replication Throughput
Measure
The following stored function is already installed:
USE test; DROP FUNCTION IF EXISTS galeraWaitUntilEmptyRecvQueue;
DELIMITER $$
CREATE
DEFINER=root@localhost FUNCTION galeraWaitUntilEmptyRecvQueue()
RETURNS INT UNSIGNED READS SQL DATA
BEGIN
DECLARE queue INT UNSIGNED;
DECLARE starttime TIMESTAMP;
DECLARE blackhole INT UNSIGNED;
SET starttime = SYSDATE();
SELECT VARIABLE_VALUE AS trx INTO queue
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_local_recv_queue';
WHILE queue > 1 DO /* we allow the queue to be 1 */
SELECT VARIABLE_VALUE AS trx INTO queue
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_local_recv_queue';
SELECT SLEEP(1) into blackhole;
END WHILE;
RETURN SYSDATE() - starttime;
END$$
131
Max Replication Throughput
Measure
SET GLOBAL wsrep_desync=on;
FLUSH TABLES WITH READ LOCK;
...wait until the queue rises to be quite high, about 20,000
UNLOCK TABLES; use test;
SELECT sum(trx) as transactions, sum(duration) as time,
IF(sum(duration) < 5, 'DID NOT TAKE LONG ENOUGH TO BE ACCURATE',
ROUND(SUM(trx)/SUM(duration)))
AS transactions_per_second
FROM
(SELECT VARIABLE_VALUE * -1 AS trx, null as duration
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_last_committed'
UNION ALL
SELECT null as trx, galeraWaitUntilEmptyRecvQueue() AS duration
UNION ALL
SELECT VARIABLE_VALUE AS trx, null as duration
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_last_committed'
) AS COUNTED;
+--------------+------+-------------------------+
| transactions | time | transactions_per_second |
+--------------+------+-------------------------+
| 17764 | 11 | 1615 |
132
Max Replication Throughput
Measure
Normal Workload: 185 tps
During Catchup: 1615 tps
Capacity used: 185/1615 = 11.5%
133
A wired world
Networking
134
Networking
With Synchronous Replication, It Matters
Network issues cause cluster issues much faster than with
asynchronous replication.
Network Partitioning
Nodes joining/leaving
These can cause a cluster to go Non-Primary,
no longer accepting any reads or writes.
Latency has an impact on response time:
at COMMIT of a transaction
depending on wsrep_sync_wait setting for other statements
too.
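Whether a node is still part of a Primary component can be checked with (sketch):
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
-- 'Primary' on a healthy node, 'non-Primary' after losing quorum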
135
Networking
Status Variables
pxc2 mysql> show global status like 'wsrep_evs_repl_latency';
+------------------------+-------------------------------------------------+
| Variable_name | Value |
+------------------------+-------------------------------------------------+
| wsrep_evs_repl_latency | 0.000745194/0.00175792/0.00832816/0.00184453/16 |
+------------------------+-------------------------------------------------+
Reset Interval with evs.stats_report_period=1min
# myq_status wsrep_latency
mycluster / pxc2 (idx: 2) / Galera 3.12(r9921e73)
Wsrep Cluster Node Ops Latencies
time P cnf # Stat Up Dn Size Min Avg Max Dev
22:55:48 P 53 3 Sync 0 65 9 681µs 1307µs 4192µs 1032µs
22:55:49 P 53 3 Sync 0 52 10 681µs 1274µs 4192µs 984µs
22:55:50 P 53 3 Sync 0 47 10 681µs 1274µs 4192µs 984µs
22:55:51 P 53 3 Sync 0 61 11 681µs 1234µs 4192µs 947µs
136
Networking
Latency
On pxc1, start the 'application':
# myq_status wsrep_latency
mycluster / pxc2 (idx: 2) / Galera 3.12(r9921e73)
Wsrep Cluster Node Ops Latencies
time P cnf # Stat Up Dn Size Min Avg Max Dev
23:02:44 P 53 3 Sync 0 48 7 777µs 1236µs 2126µs 434µs
23:02:45 P 53 3 Sync 0 47 7 777µs 1236µs 2126µs 434µs
23:02:46 P 53 3 Sync 0 58 7 777µs 1236µs 2126µs 434µs
run_app.sh pxc1
[1125s] tps: 51.05, reads: 687.72, writes: 204.21, response time: 10.54ms (95%), er
[1126s] tps: 33.98, reads: 475.77, writes: 135.94, response time: 15.07ms (95%), er
[1127s] tps: 42.01, reads: 588.19, writes: 168.05, response time: 12.79ms (95%), er
# myq_status wsrep
Wsrep Cluster Node Repl Queue Ops Bytes Conflct Gcache Wind
time P cnf # Stat Laten Up Dn Up Dn Up Dn lcf bfa ist idx dst
23:02:44 P 53 3 Sync 1.2ms 0 0 0 49 0.0 80K 0 0 77k 178
23:02:45 P 53 3 Sync 1.2ms 0 0 0 43 0.0 70K 0 0 77k 176
23:02:46 P 53 3 Sync 1.2ms 0 0 0 55 0.0 90K 0 0 77k 164
137
Networking
WAN Impact on Latency
Change from a LAN setup into a cluster across 2 datacenters
last_node_to_dc2.sh enable
138
Networking
WAN Impact on Latency
last_node_to_dc2.sh enable
What can we observe in the cluster after running this command?
139
Networking
WAN Impact on Latency
last_node_to_dc2.sh enable
What can we observe in the cluster after running this command?
myq_status wsrep_latency is up 200ms
run_app.sh throughput is a lot lower
run_app.sh response time is a lot higher
mycluster / pxc3 (idx: 0) / Galera 3.12(r9921e73)
Wsrep Cluster Node Ops Latencies
time P cnf # Stat Up Dn Size Min Avg Max Dev
23:23:34 P 53 3 Sync 0 14 6 201ms 202ms 206ms 2073µs
23:23:35 P 53 3 Sync 0 16 6 201ms 202ms 206ms 2073µs
23:23:36 P 53 3 Sync 0 14 6 201ms 202ms 206ms 2073µs
23:23:37 P 53 3 Sync 0 14 6 201ms 202ms 206ms 2073µs
23:23:38 P 53 3 Sync 0 14 6 201ms 202ms 206ms 2073µs
140
Networking
WAN Impact on Latency
Why is that?
141
Networking
WAN Impact on Latency
Why is that?
Don't forget this is synchronous replication: the writeset is replicated
synchronously.
It is delivered to all nodes in the cluster at transaction commit,
and all nodes acknowledge the writeset
Generates a GLOBAL ORDER for that transaction (GTID)
Cost is ~roundtrip latency for COMMIT to furthest node
GTID serialized, but many writesets can be replicating in parallel
Remember Mark Callaghan's law: a given row can't be modified
more often than 1/RTT times per second
142
Networking
WAN Configuration
Don't forget to use higher timeouts and larger send windows over WAN:
evs.user_send_window=2 ~> 256
evs.send_window=4 ~> 512
evs.keepalive_period=PT1S ~> PT1S
evs.suspect_timeout=PT5S ~> PT15S
evs.inactive_timeout=PT15S ~> PT45S
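In my.cnf these end up combined in a single provider options string; a sketch using the WAN values above:
[mysqld]
wsrep_provider_options="evs.user_send_window=256;evs.send_window=512;evs.suspect_timeout=PT15S;evs.inactive_timeout=PT45S"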
Don't forget to disable the WAN:
last_node_to_dc2.sh disable
143
Networking
WAN Configuration - Bandwidth
How to reduce the bandwidth used between datacenters?
144
Networking
WAN Configuration - Bandwidth
How to reduce the bandwidth used between datacenters?
Use segments (gmcast.segment) to reduce network traffic
between datacenters
Use minimal binlog_row_image to reduce binary log size
repl.key_format = FLAT8 which by default is already smallest
145
Networking
Replication Without Segments
Here we have a cluster spread across 2 datacenters
146
Networking
Replication Without Segments
A transaction executed on node1
147
Networking
Replication Without Segments
A transaction executed on node1
will be sent to all other nodes
148
Networking
Replication Without Segments
As writes are accepted everywhere, every node therefore communicates
with all other nodes, including arbitrator nodes
149
Networking
Replication With Segments
Galera 3.0 introduced the segment concept
Replication traffic is minimized between segments
Donor selection is preferred on local segment
150
Networking
Replication With Segments
Transactions are only sent once to other segments
151
Networking
Replication With Segments
They do not always go through the same nodes; they all still need to be
able to connect to each other.
152
Networking
Replication With Segments
Run the run_app.sh on pxc1
On another terminal on pxc1, run speedometer
speedometer -r eth1 -t eth1 -l -m 524288
Change the segment on pxc2 and pxc3 in /etc/my.cnf and restart
MySQL (this is not dynamic)
wsrep_provider_options='gmcast.segment=2'
Check the bandwidth usage again.
How do you explain this?
153
Networking
Replication With Segments
Transmit bandwidth usage drops a lot (roughly halved)
154
Networking
Binlog Row Image Format
On pxc1 run speedometer again:
speedometer -r eth1 -t eth1 -l -m 262144
On pxc1, set the binlog_row_image=minimal:
pxc1 mysql> SET GLOBAL binlog_row_image=minimal;
Check the bandwidth usage
155
Networking
Binlog Row Image Format
On pxc1 run speedometer again:
speedometer -r eth1 -t eth1 -l -m 262144
On pxc1, set the binlog_row_image=minimal:
pxc1 mysql> SET GLOBAL binlog_row_image=minimal;
Check the bandwidth usage
156
Networking
Not Completely Synchronous
Applying transactions is asynchronous
By default, reads on different nodes might show stale data.
Practically, flow control prevents this from lagging too much
behind, reducing stale data.
Read consistency can be configured: we can enforce that a read sees
the latest committed data, cluster wide.
What if we absolutely need consistency?
157
Networking
Not Completely Synchronous
What if we absolutely need consistency?
Since PXC 5.6.20-27.7:
SET <session|global> wsrep_sync_wait=[1|2|4];
1 Indicates check on READ statements, including SELECT,
SHOW, BEGIN/START TRANSACTION.
2 Indicates check on UPDATE and DELETE statements.
4 Indicates check on INSERT and REPLACE statements
Before: <session|global> wsrep_causal_reads=[1|0];
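The values form a bitmask, so they can be added together; for example (sketch):
SET SESSION wsrep_sync_wait = 3; -- 1 (reads) + 2 (UPDATE/DELETE)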
158
Networking
Consistent Reads & Latency
How does enabling wsrep_sync_wait (consistent reads) affect WAN
environments?
Stop the application run_app.sh
Move the last node to DC2:
last_node_to_dc2.sh enable
On pxc1, run:
pxc1 mysql> select * from sbtest.sbtest1 where id = 4;
...
1 row in set (0.00 sec)
159
Networking
Consistent Reads & Latency
Now change the causality check to ensure that READ statements are in
sync, and perform the same SELECT:
pxc1 mysql> SET SESSION wsrep_sync_wait=1;
pxc1 mysql> select * from sbtest.sbtest1 where id = 4;
What do you see?
160
Networking
Consistent Reads & Latency
Now change the causality check to ensure that READ statements are in
sync, and perform the same SELECT:
pxc1 mysql> SET SESSION wsrep_sync_wait=1;
pxc1 mysql> select * from sbtest.sbtest1 where id = 4;
What do you see?
...
1 row in set (0.20 sec)
161
Networking
Consistent Reads & Latency
Now change the causality check to ensure that READ statements are in
sync, and perform the same SELECT:
pxc1 mysql> SET SESSION wsrep_sync_wait=1;
pxc1 mysql> select * from sbtest.sbtest1 where id = 4;
What do you see?
...
1 row in set (0.20 sec)
Put back pxc3 on dc1:
last_node_to_dc2.sh disable
162
Save My Data
Backups
163
Backups
Full: Percona XtraBackup
Feature-rich online physical Backups
Since PXC 5.6.21-25.8, there is LOCK TABLES FOR BACKUP
No FLUSH TABLES WITH READ LOCK anymore
Locks only DDL and MyISAM, leaves InnoDB fully unlocked
No more need to set the backup node in DESYNC to avoid
Flow Control
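Roughly what XtraBackup issues during the non-InnoDB copy phase (a simplified sketch):
mysql> LOCK TABLES FOR BACKUP; -- blocks DDL and non-InnoDB writes only
... copy the data files ...
mysql> UNLOCK TABLES;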
164
Backups
Full: Percona XtraBackup
Feature-rich online physical Backups
Since PXC 5.6.21-25.8, there is LOCK TABLES FOR BACKUP
No FLUSH TABLES WITH READ LOCK anymore
Locks only DDL and MyISAM, leaves InnoDB fully unlocked
No more need to set the backup node in DESYNC to avoid
Flow Control
Point In Time Recovery: Binary Logs
It's also recommended to save the binary logs to perform point-
in-time recovery
With mysqlbinlog 5.6, it's possible to stream them to another
'backup' host.
165
Backups
Full Backup
On pxc1, run the application:
run_app.sh pxc1
On pxc3, take a full backup with Percona Xtrabackup
166
Backups
Full Backup
On pxc1, run the application:
run_app.sh pxc1
On pxc3, take a full backup with Percona Xtrabackup
# innobackupex --galera-info --no-timestamp /root/backups/
xtrabackup version 2.2.12 based on MySQL server 5.6.24 Linux (i686) (revision id:
[01] Copying ./ibdata1 to /root/backups/ibdata1
[01] ...done
[01] Copying ./sbtest/sbtest1.ibd to /root/backups/sbtest/sbtest1.ibd
...
150920 08:31:01 innobackupex: Executing LOCK TABLES FOR BACKUP...
...
150920 08:31:01 innobackupex: Executing LOCK BINLOG FOR BACKUP...
...
150920 08:31:01 innobackupex: All tables unlocked
innobackupex: MySQL binlog position: filename 'pxc3-bin.000001',
position 3133515
150920 08:31:01 innobackupex: completed OK!
167
Backups
Full Backup
Apply the logs and get the seqno:
# innobackupex --apply-log /root/backups/
# cat /root/backups/xtrabackup_galera_info
b55685a3-5f70-11e5-87f8-2f86c54ca425:1945
# cat /root/backups/xtrabackup_binlog_info
pxc3-bin.000001 3133515
We now have a full backup ready to be used.
168
Backups
Stream Binary Logs
Now setup mysqlbinlog to stream the binlogs in /root/binlogs.
As a requirement, ensure the following is configured:
log_slave_updates
server-id=__ID__
169
Backups
Stream Binary Logs
Now setup mysqlbinlog to stream the binlogs in /root/binlogs.
As a requirement, ensure the following is configured:
log_slave_updates
server-id=__ID__
Get mysqlbinlog running:
# mkdir /root/binlogs
# mysql -BN -e "show binary logs" | head -n1 | cut -f1
pxc3-bin.000001
# mysqlbinlog --read-from-remote-server --host=127.0.0.1 \
--raw --stop-never --result-file=/root/binlogs/ pxc3-bin.000001 &
170
Backups
Point-in-Time Recovery
On pxc2 we update a record:
pxc2 mysql> update sbtest.sbtest1 set pad = "PLAM2015" where id = 999;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
171
Backups
Point-in-Time Recovery
On pxc2 we update a record:
pxc2 mysql> update sbtest.sbtest1 set pad = "PLAM2015" where id = 999;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
And now it's time to break things, on pxc2, TRUNCATE the sbtest1
table.
pxc2 mysql> truncate table sbtest.sbtest1;
Query OK, 0 rows affected (0.06 sec)
172
Backups
Point-in-Time Recovery
BROKEN! What now?
173
Backups
Point-in-Time Recovery
BROKEN! What now?
Let's stop MySQL on all nodes and restore from backup.
service mysql stop
Restore the backup on pxc3:
[root@pxc3 ~]# rm -rf /var/lib/mysql/*
[root@pxc3 ~]# innobackupex --copy-back /root/backups/
[root@pxc3 ~]# chown mysql. -R /var/lib/mysql/
174
Backups
Point-in-Time Recovery
BROKEN! What now?
Let's stop MySQL on all nodes and restore from backup.
service mysql stop
Restore the backup on pxc3:
[root@pxc3 ~]# rm -rf /var/lib/mysql/*
[root@pxc3 ~]# innobackupex --copy-back /root/backups/
[root@pxc3 ~]# chown mysql. -R /var/lib/mysql/
On pxc3, we bootstrap a completely new cluster:
[root@pxc3 ~]# service mysql bootstrap-pxc
175
Backups
Point-in-Time Recovery
The full backup is restored; now we need to do point-in-time recovery.
Find the position of the "event" that caused the problems
176
Backups
Point-in-Time Recovery
The full backup is restored; now we need to do point-in-time recovery.
Find the position of the "event" that caused the problems
We know the sbtest.sbtest1 table got truncated. Let's find that
statement:
[root@pxc3 ~]# mysqlbinlog /root/binlogs/pxc3-bin.* | grep -i truncate -B10
OC05MzAyOTU2MDQzOC0xNzU5MDQyMTM1NS02MDYyOTQ1OTk1MC0wODY4ODc0NTg2NTCjjIc=
'/*!*/;
# at 13961536
#150920 8:33:15 server id 1 end_log_pos 13961567 CRC32 0xc97eb41f Xid = 8667
COMMIT/*!*/;
# at 13961567
#150920 8:33:15 server id 2 end_log_pos 13961659 CRC32 0x491d7ff8 Query thread_
SET TIMESTAMP=1442737995/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
truncate table sbtest.sbtest1
177
Backups
Point-in-Time Recovery
We need to recover up to TRUNCATE TABLE, which was position
13961567
We can replay the binary log(s) from the last position we backed up
178
Backups
Point-in-Time Recovery
We need to recover up to TRUNCATE TABLE, which was position
13961567
We can replay the binary log(s) from the last position we backed up
# cat /var/lib/mysql/xtrabackup_info | grep binlog
binlog_pos = filename 'pxc3-bin.000001', position 3133515
179
Backups
Point-in-Time Recovery
We need to recover up to TRUNCATE TABLE, which was position
13961567
We can replay the binary log(s) from the last position we backed up
# cat /var/lib/mysql/xtrabackup_info | grep binlog
binlog_pos = filename 'pxc3-bin.000001', position 3133515
Note that if we didn't stream the binary logs from the backup
server (it can happen), you need to find the position from the Xid,
which is the Galera seqno:
#150920 8:33:15 server id 1 end_log_pos 13961567 CRC32 0xc97eb41f Xid = 8667
COMMIT/*!*/;
# at 13961567
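A minimal sketch, using seqno 1945 from the xtrabackup_galera_info shown earlier:
# find the binlog position matching the backup's Galera seqno (Xid)
mysqlbinlog /root/binlogs/pxc3-bin.000001 | grep -B1 'Xid = 1945'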
180
Backups
Point-in-Time Recovery
Let's replay it now:
# mysqlbinlog /root/binlogs/pxc3-bin.000001 \
--start-position=3133515 --stop-position=13961567 | mysql
181
Backups
Point-in-Time Recovery
Let's replay it now:
# mysqlbinlog /root/binlogs/pxc3-bin.000001 \
--start-position=3133515 --stop-position=13961567 | mysql
Let's Verify:
pxc3 mysql> select id, pad from sbtest.sbtest1 where id =999;
+-----+----------+
| id | pad |
+-----+----------+
| 999 | PLAM2015 |
+-----+----------+
1 row in set (0.00 sec)
182
Backups
Point-in-Time Recovery
Let's replay it now:
# mysqlbinlog /root/binlogs/pxc3-bin.000001 \
--start-position=3133515 --stop-position=13961567 | mysql
Let's Verify:
pxc3 mysql> select id, pad from sbtest.sbtest1 where id =999;
+-----+----------+
| id | pad |
+-----+----------+
| 999 | PLAM2015 |
+-----+----------+
1 row in set (0.00 sec)
You can now restart the other nodes and they will perform
SST.
183
Spread the load
Load Balancers
184
Load Balancers
With PXC a Load Balancer is commonly used:
Layer 4
Lots of choice
Usually HAProxy (most common)
Layer 7:
MariaDB MaxScale
ScaleArc (proprietary)
ProxySQL
mysql-proxy (beta)
185
Load Balancers
Usually with Galera, people use a load balancer to route the MySQL
requests from the application to a node
Redirect writes to another node when problems happen
Mostly 1 node for writes, others for reads
Layer 4: 1 TCP port writes, 1 TCP port reads
Layer 7: Automatic (challenging)
186
Load Balancers
HAProxy
On pxc1, we have HAProxy configured like this when listening on port
3308:
## active-passive
listen 3308-active-passive-writes 0.0.0.0:3308
mode tcp
balance leastconn
option httpchk
server pxc1 pxc1:3306 check port 8000 inter 1000 rise 3 fall 3
server pxc2 pxc2:3306 check port 8000 inter 1000 rise 3 fall 3 backup
server pxc3 pxc3:3306 check port 8000 inter 1000 rise 3 fall 3 backup
187
Load Balancers
HAProxy
On pxc2 and pxc3, we connect to the loadbalancer and run a SELECT:
mysql -h pxc1 -P 3308 -utest -ptest -e "select @@wsrep_node_name, sleep(10)"
And on pxc1 while the previous command is running we check the
processlist:
pxc1 mysql> SELECT PROCESSLIST_ID AS id, PROCESSLIST_USER AS user,
PROCESSLIST_HOST AS host, PROCESSLIST_INFO
FROM performance_schema.threads
WHERE PROCESSLIST_INFO LIKE 'select @% sleep%';
+------+------+------+-------------------------------------+
| id | user | host | PROCESSLIST_INFO |
+------+------+------+-------------------------------------+
| 294 | test | pxc1 | select @@wsrep_node_name, sleep(10) |
| 297 | test | pxc1 | select @@wsrep_node_name, sleep(10) |
+------+------+------+-------------------------------------+
188
Load Balancers
HAProxy
On pxc2 and pxc3, we connect to the loadbalancer and run a SELECT:
mysql -h pxc1 -P 3308 -utest -ptest -e "select @@wsrep_node_name, sleep(10)"
And on pxc1 while the previous command is running we check the
processlist:
pxc1 mysql> SELECT PROCESSLIST_ID AS id, PROCESSLIST_USER AS user,
PROCESSLIST_HOST AS host, PROCESSLIST_INFO
FROM performance_schema.threads
WHERE PROCESSLIST_INFO LIKE 'select @% sleep%';
+------+------+------+-------------------------------------+
| id | user | host | PROCESSLIST_INFO |
+------+------+------+-------------------------------------+
| 294 | test | pxc1 | select @@wsrep_node_name, sleep(10) |
| 297 | test | pxc1 | select @@wsrep_node_name, sleep(10) |
+------+------+------+-------------------------------------+
What do you notice?
189
Load Balancers
HA Proxy & Proxy Protocol
Since Percona XtraDB Cluster 5.6.25-73.1 we support proxy
protocol! (Almost released)
190
Load Balancers
HA Proxy & Proxy Protocol
Since Percona XtraDB Cluster 5.6.25-73.1 we support proxy
protocol! (Almost released)
Let's enable this in my.cnf on all 3 nodes:
[mysqld]
...
proxy_protocol_networks=*
...
191
Load Balancers
HA Proxy & Proxy Protocol
Since Percona XtraDB Cluster 5.6.25-73.1 we support proxy
protocol! (Almost released)
Let's enable this in my.cnf on all 3 nodes:
[mysqld]
...
proxy_protocol_networks=*
...
Restart them one by one:
[root@pxc1 ~]# /etc/init.d/mysql restart
...
[root@pxc2 ~]# /etc/init.d/mysql restart
...
[root@pxc3 ~]# /etc/init.d/mysql restart
192
Load Balancers
HA Proxy & Proxy Protocol
On pxc1, we have HAProxy configured like this when listening on port
3310 to support proxy protocol:
listen 3310-active-passive-writes 0.0.0.0:3310
mode tcp
balance roundrobin
option httpchk
server pxc1 pxc1:3306 send-proxy-v2 check port 8000 inter 1000 rise 3 fall 3
server pxc2 pxc2:3306 send-proxy-v2 check port 8000 inter 1000 backup
server pxc3 pxc3:3306 send-proxy-v2 check port 8000 inter 1000 backup
And restart HAProxy:
service haproxy restart
193
Load Balancers
HA Proxy & Proxy Protocol
On pxc2 and pxc3, we connect to the loadbalancer (using the new
port) and run a SELECT:
mysql -h pxc1 -P 3310 -utest -ptest -e "select @@wsrep_node_name, sleep(10)"
And on pxc1 while the previous command is running we check the
processlist:
pxc1 mysql> SELECT PROCESSLIST_ID AS id, PROCESSLIST_USER AS user,
PROCESSLIST_HOST AS host, PROCESSLIST_INFO
FROM performance_schema.threads
WHERE PROCESSLIST_INFO LIKE 'select @% sleep%';
+------+------+------+-------------------------------------+
| id | user | host | PROCESSLIST_INFO |
+------+------+------+-------------------------------------+
| 75 | test | pxc2 | select @@wsrep_node_name, sleep(10) |
| 76 | test | pxc3 | select @@wsrep_node_name, sleep(10) |
+------+------+------+-------------------------------------+
194
Load Balancers
HA Proxy & Proxy Protocol
Try to connect from pxc1 to pxc1, not using a load balancer:
pxc1 # mysql -h pxc1 -P 3306
What happens?
195
Load Balancers
HA Proxy & Proxy Protocol
Try to connect from pxc1 to pxc1, not using a load balancer:
pxc1 # mysql -h pxc1 -P 3306
What happens?
You can't connect to mysql anymore.
When proxy_protocol_networks is enabled, the server won't accept the
connection if the client doesn't send the proxy protocol header!
196
Load Balancers
HA Proxy & Proxy Protocol
Try to connect from pxc1 to pxc1, not using a load balancer:
pxc1 # mysql -h pxc1 -P 3306
What happens?
You can't connect to mysql anymore.
When proxy_protocol_networks is enabled, the server won't accept the
connection if the client doesn't send the proxy protocol header!
Let's clean up the proxy_protocol_networks setting and restart all nodes
before continuing.
197

MySQL High Availability and Disaster Recovery with Continuent, a VMware companyContinuent
 
MySQL Best Practices - OTN LAD Tour
MySQL Best Practices - OTN LAD TourMySQL Best Practices - OTN LAD Tour
MySQL Best Practices - OTN LAD TourRonald Bradford
 
MySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDBMySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDBMario Beck
 
MySQL High Availability with Group Replication
MySQL High Availability with Group ReplicationMySQL High Availability with Group Replication
MySQL High Availability with Group ReplicationNuno Carvalho
 
MySQL InnoDB 源码实现分析(一)
MySQL InnoDB 源码实现分析(一)MySQL InnoDB 源码实现分析(一)
MySQL InnoDB 源码实现分析(一)frogd
 
MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability SolutionsLenz Grimmer
 
MySQL Group Replication - HandsOn Tutorial
MySQL Group Replication - HandsOn TutorialMySQL Group Replication - HandsOn Tutorial
MySQL Group Replication - HandsOn TutorialKenny Gryp
 

Viewers also liked (20)

Mysql For Developers
Mysql For DevelopersMysql For Developers
Mysql For Developers
 
Java MySQL Connector & Connection Pool Features & Optimization
Java MySQL Connector & Connection Pool Features & OptimizationJava MySQL Connector & Connection Pool Features & Optimization
Java MySQL Connector & Connection Pool Features & Optimization
 
淘宝数据库架构演进历程
淘宝数据库架构演进历程淘宝数据库架构演进历程
淘宝数据库架构演进历程
 
Extensible Data Modeling
Extensible Data ModelingExtensible Data Modeling
Extensible Data Modeling
 
MySQL InnoDB Cluster - A complete High Availability solution for MySQL
MySQL InnoDB Cluster - A complete High Availability solution for MySQLMySQL InnoDB Cluster - A complete High Availability solution for MySQL
MySQL InnoDB Cluster - A complete High Availability solution for MySQL
 
MySQL Replication Performance Tuning for Fun and Profit!
MySQL Replication Performance Tuning for Fun and Profit!MySQL Replication Performance Tuning for Fun and Profit!
MySQL Replication Performance Tuning for Fun and Profit!
 
SQL Outer Joins for Fun and Profit
SQL Outer Joins for Fun and ProfitSQL Outer Joins for Fun and Profit
SQL Outer Joins for Fun and Profit
 
Group Replication: A Journey to the Group Communication Core
Group Replication: A Journey to the Group Communication CoreGroup Replication: A Journey to the Group Communication Core
Group Replication: A Journey to the Group Communication Core
 
Mysql参数-GDB
Mysql参数-GDBMysql参数-GDB
Mysql参数-GDB
 
Load Data Fast!
Load Data Fast!Load Data Fast!
Load Data Fast!
 
Using Apache Spark and MySQL for Data Analysis
Using Apache Spark and MySQL for Data AnalysisUsing Apache Spark and MySQL for Data Analysis
Using Apache Spark and MySQL for Data Analysis
 
Mastering InnoDB Diagnostics
Mastering InnoDB DiagnosticsMastering InnoDB Diagnostics
Mastering InnoDB Diagnostics
 
MySQL innodb cluster and Group Replication in a nutshell - hands-on tutorial ...
MySQL innodb cluster and Group Replication in a nutshell - hands-on tutorial ...MySQL innodb cluster and Group Replication in a nutshell - hands-on tutorial ...
MySQL innodb cluster and Group Replication in a nutshell - hands-on tutorial ...
 
MySQL High Availability and Disaster Recovery with Continuent, a VMware company
MySQL High Availability and Disaster Recovery with Continuent, a VMware companyMySQL High Availability and Disaster Recovery with Continuent, a VMware company
MySQL High Availability and Disaster Recovery with Continuent, a VMware company
 
MySQL Best Practices - OTN LAD Tour
MySQL Best Practices - OTN LAD TourMySQL Best Practices - OTN LAD Tour
MySQL Best Practices - OTN LAD Tour
 
MySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDBMySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDB
 
MySQL High Availability with Group Replication
MySQL High Availability with Group ReplicationMySQL High Availability with Group Replication
MySQL High Availability with Group Replication
 
MySQL InnoDB 源码实现分析(一)
MySQL InnoDB 源码实现分析(一)MySQL InnoDB 源码实现分析(一)
MySQL InnoDB 源码实现分析(一)
 
MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability Solutions
 
MySQL Group Replication - HandsOn Tutorial
MySQL Group Replication - HandsOn TutorialMySQL Group Replication - HandsOn Tutorial
MySQL Group Replication - HandsOn Tutorial
 

Similar to Advanced Percona XtraDB Cluster in a nutshell... la suite

MySQL Galera 集群
MySQL Galera 集群MySQL Galera 集群
MySQL Galera 集群YUCHENG HU
 
MySQL Group Replication
MySQL Group ReplicationMySQL Group Replication
MySQL Group ReplicationKenny Gryp
 
RAC-Installing your First Cluster and Database
RAC-Installing your First Cluster and DatabaseRAC-Installing your First Cluster and Database
RAC-Installing your First Cluster and DatabaseNikhil Kumar
 
Mysql wp cluster_quickstart_windows
Mysql wp cluster_quickstart_windowsMysql wp cluster_quickstart_windows
Mysql wp cluster_quickstart_windowsRogério Rocha
 
Highly Available Load Balanced Galera MySql Cluster
Highly Available Load Balanced  Galera MySql ClusterHighly Available Load Balanced  Galera MySql Cluster
Highly Available Load Balanced Galera MySql ClusterAmr Fawzy
 
Kernel Recipes 2015 - Kernel dump analysis
Kernel Recipes 2015 - Kernel dump analysisKernel Recipes 2015 - Kernel dump analysis
Kernel Recipes 2015 - Kernel dump analysisAnne Nicolas
 
Student exercise guide_training_cmode_8.2
Student exercise guide_training_cmode_8.2Student exercise guide_training_cmode_8.2
Student exercise guide_training_cmode_8.2Mohan Kumaresan
 
Nagios Conference 2014 - Troy Lea - Monitoring VMware Virtualization Using vMA
Nagios Conference 2014 - Troy Lea - Monitoring VMware Virtualization Using vMANagios Conference 2014 - Troy Lea - Monitoring VMware Virtualization Using vMA
Nagios Conference 2014 - Troy Lea - Monitoring VMware Virtualization Using vMANagios
 
How to debug ocfs2 hang problem
How to debug ocfs2 hang problemHow to debug ocfs2 hang problem
How to debug ocfs2 hang problemGang He
 
Mater,slave on mysql
Mater,slave on mysqlMater,slave on mysql
Mater,slave on mysqlVasudeva Rao
 
Armitage – The Ultimate Attack Platform for Metasploit
Armitage – The  Ultimate Attack  Platform for Metasploit Armitage – The  Ultimate Attack  Platform for Metasploit
Armitage – The Ultimate Attack Platform for Metasploit Ishan Girdhar
 
Percona XtraDB 集群文档
Percona XtraDB 集群文档Percona XtraDB 集群文档
Percona XtraDB 集群文档YUCHENG HU
 
Network Automation Tools
Network Automation ToolsNetwork Automation Tools
Network Automation ToolsEdwin Beekman
 
CloudStack hands-on workshop @ DevOpsDays Amsterdam 2015
CloudStack hands-on workshop @ DevOpsDays Amsterdam 2015CloudStack hands-on workshop @ DevOpsDays Amsterdam 2015
CloudStack hands-on workshop @ DevOpsDays Amsterdam 2015Remi Bergsma
 
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best Practices
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best PracticesMySQL InnoDB Cluster - New Features in 8.0 Releases - Best Practices
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best PracticesKenny Gryp
 
Get mysql clusterrunning-windows
Get mysql clusterrunning-windowsGet mysql clusterrunning-windows
Get mysql clusterrunning-windowsJoeSg
 
Introduction to MySQL InnoDB Cluster
Introduction to MySQL InnoDB ClusterIntroduction to MySQL InnoDB Cluster
Introduction to MySQL InnoDB ClusterI Goo Lee
 

Similar to Advanced Percona XtraDB Cluster in a nutshell... la suite (20)

MySQL Galera 集群
MySQL Galera 集群MySQL Galera 集群
MySQL Galera 集群
 
MySQL Group Replication
MySQL Group ReplicationMySQL Group Replication
MySQL Group Replication
 
RAC-Installing your First Cluster and Database
RAC-Installing your First Cluster and DatabaseRAC-Installing your First Cluster and Database
RAC-Installing your First Cluster and Database
 
Mysql wp cluster_quickstart_windows
Mysql wp cluster_quickstart_windowsMysql wp cluster_quickstart_windows
Mysql wp cluster_quickstart_windows
 
Mysql
Mysql Mysql
Mysql
 
Highly Available Load Balanced Galera MySql Cluster
Highly Available Load Balanced  Galera MySql ClusterHighly Available Load Balanced  Galera MySql Cluster
Highly Available Load Balanced Galera MySql Cluster
 
Kernel Recipes 2015 - Kernel dump analysis
Kernel Recipes 2015 - Kernel dump analysisKernel Recipes 2015 - Kernel dump analysis
Kernel Recipes 2015 - Kernel dump analysis
 
Student exercise guide_training_cmode_8.2
Student exercise guide_training_cmode_8.2Student exercise guide_training_cmode_8.2
Student exercise guide_training_cmode_8.2
 
Vt6655 linux user_guide
Vt6655 linux user_guideVt6655 linux user_guide
Vt6655 linux user_guide
 
Nagios Conference 2014 - Troy Lea - Monitoring VMware Virtualization Using vMA
Nagios Conference 2014 - Troy Lea - Monitoring VMware Virtualization Using vMANagios Conference 2014 - Troy Lea - Monitoring VMware Virtualization Using vMA
Nagios Conference 2014 - Troy Lea - Monitoring VMware Virtualization Using vMA
 
How to debug ocfs2 hang problem
How to debug ocfs2 hang problemHow to debug ocfs2 hang problem
How to debug ocfs2 hang problem
 
Mater,slave on mysql
Mater,slave on mysqlMater,slave on mysql
Mater,slave on mysql
 
Armitage – The Ultimate Attack Platform for Metasploit
Armitage – The  Ultimate Attack  Platform for Metasploit Armitage – The  Ultimate Attack  Platform for Metasploit
Armitage – The Ultimate Attack Platform for Metasploit
 
Percona XtraDB 集群文档
Percona XtraDB 集群文档Percona XtraDB 集群文档
Percona XtraDB 集群文档
 
Network Automation Tools
Network Automation ToolsNetwork Automation Tools
Network Automation Tools
 
CloudStack hands-on workshop @ DevOpsDays Amsterdam 2015
CloudStack hands-on workshop @ DevOpsDays Amsterdam 2015CloudStack hands-on workshop @ DevOpsDays Amsterdam 2015
CloudStack hands-on workshop @ DevOpsDays Amsterdam 2015
 
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best Practices
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best PracticesMySQL InnoDB Cluster - New Features in 8.0 Releases - Best Practices
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best Practices
 
Get mysql clusterrunning-windows
Get mysql clusterrunning-windowsGet mysql clusterrunning-windows
Get mysql clusterrunning-windows
 
Database Replication
Database ReplicationDatabase Replication
Database Replication
 
Introduction to MySQL InnoDB Cluster
Introduction to MySQL InnoDB ClusterIntroduction to MySQL InnoDB Cluster
Introduction to MySQL InnoDB Cluster
 

More from Kenny Gryp

MySQL Database Architectures - 2022-08
MySQL Database Architectures - 2022-08MySQL Database Architectures - 2022-08
MySQL Database Architectures - 2022-08Kenny Gryp
 
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11Kenny Gryp
 
MySQL Operator for Kubernetes
MySQL Operator for KubernetesMySQL Operator for Kubernetes
MySQL Operator for KubernetesKenny Gryp
 
MySQL Database Architectures - 2020-10
MySQL Database Architectures -  2020-10MySQL Database Architectures -  2020-10
MySQL Database Architectures - 2020-10Kenny Gryp
 
MySQL InnoDB Cluster / ReplicaSet - Tutorial
MySQL InnoDB Cluster / ReplicaSet - TutorialMySQL InnoDB Cluster / ReplicaSet - Tutorial
MySQL InnoDB Cluster / ReplicaSet - TutorialKenny Gryp
 
MySQL Connectors 8.0.19 & DNS SRV
MySQL Connectors 8.0.19 & DNS SRVMySQL Connectors 8.0.19 & DNS SRV
MySQL Connectors 8.0.19 & DNS SRVKenny Gryp
 
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterMySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterKenny Gryp
 
MySQL Group Replication - Ready For Production? (2018-04)
MySQL Group Replication - Ready For Production? (2018-04)MySQL Group Replication - Ready For Production? (2018-04)
MySQL Group Replication - Ready For Production? (2018-04)Kenny Gryp
 
Reducing Risk When Upgrading MySQL
Reducing Risk When Upgrading MySQLReducing Risk When Upgrading MySQL
Reducing Risk When Upgrading MySQLKenny Gryp
 
Multi Source Replication With MySQL 5.7 @ Verisure
Multi Source Replication With MySQL 5.7 @ VerisureMulti Source Replication With MySQL 5.7 @ Verisure
Multi Source Replication With MySQL 5.7 @ VerisureKenny Gryp
 
Online MySQL Backups with Percona XtraBackup
Online MySQL Backups with Percona XtraBackupOnline MySQL Backups with Percona XtraBackup
Online MySQL Backups with Percona XtraBackupKenny Gryp
 

More from Kenny Gryp (11)

MySQL Database Architectures - 2022-08
MySQL Database Architectures - 2022-08MySQL Database Architectures - 2022-08
MySQL Database Architectures - 2022-08
 
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
 
MySQL Operator for Kubernetes
MySQL Operator for KubernetesMySQL Operator for Kubernetes
MySQL Operator for Kubernetes
 
MySQL Database Architectures - 2020-10
MySQL Database Architectures -  2020-10MySQL Database Architectures -  2020-10
MySQL Database Architectures - 2020-10
 
MySQL InnoDB Cluster / ReplicaSet - Tutorial
MySQL InnoDB Cluster / ReplicaSet - TutorialMySQL InnoDB Cluster / ReplicaSet - Tutorial
MySQL InnoDB Cluster / ReplicaSet - Tutorial
 
MySQL Connectors 8.0.19 & DNS SRV
MySQL Connectors 8.0.19 & DNS SRVMySQL Connectors 8.0.19 & DNS SRV
MySQL Connectors 8.0.19 & DNS SRV
 
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterMySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
 
MySQL Group Replication - Ready For Production? (2018-04)
MySQL Group Replication - Ready For Production? (2018-04)MySQL Group Replication - Ready For Production? (2018-04)
MySQL Group Replication - Ready For Production? (2018-04)
 
Reducing Risk When Upgrading MySQL
Reducing Risk When Upgrading MySQLReducing Risk When Upgrading MySQL
Reducing Risk When Upgrading MySQL
 
Multi Source Replication With MySQL 5.7 @ Verisure
Multi Source Replication With MySQL 5.7 @ VerisureMulti Source Replication With MySQL 5.7 @ Verisure
Multi Source Replication With MySQL 5.7 @ Verisure
 
Online MySQL Backups with Percona XtraBackup
Online MySQL Backups with Percona XtraBackupOnline MySQL Backups with Percona XtraBackup
Online MySQL Backups with Percona XtraBackup
 

Recently uploaded

What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 

Recently uploaded (20)

What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 

Advanced Percona XtraDB Cluster in a nutshell... la suite

  • 17. Recover Unclean Stopped Cluster
Run the application (run_app.sh haproxy-all) on pxc1
On all nodes at the same time run:
# killall -9 mysqld mysqld_safe
How can you know which node has the latest commit?
  • 18. Recover Unclean Stopped Cluster
Run the application (run_app.sh haproxy-all) on pxc1
On all nodes at the same time run:
# killall -9 mysqld mysqld_safe
How can you know which node has the latest commit?
Solution
# mysqld_safe --wsrep-recover
Logging to '/var/lib/mysql/error.log'.
Starting mysqld daemon with databases from /var/lib/mysql
WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.Ln
WSREP: Recovered position 44e54b4b-5c69-11e5-83a3-8fc879cb495e:1719976
mysqld from pid file /var/lib/mysql/pxc1.pid ended
  • 19. Recover Unclean Stopped Cluster
What methods can we use to bring back the cluster?
  • 20. Recover Unclean Stopped Cluster
What methods can we use to bring back the cluster?
Solutions
Bootstrap the most accurate server
Since PXC 5.6.19-25.6 we have pc.recovery (enabled by default), which uses the information stored in gvwstate.dat. We can then just start MySQL on all 3 nodes at the same time:
# service mysql restart
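pc.recovery works because each node persists the last primary component view to disk in gvwstate.dat. You can peek at what was saved; a sketch from this tutorial's datadir (the uuids below are placeholders and the exact fields vary by Galera version):
# cat /var/lib/mysql/gvwstate.dat
my_uuid: d3124bc8-...
#vwbeg
view_id: 3 d3124bc8-... 5
bootstrap: 0
member: d3124bc8-... 0
member: e2096fe8-... 0
#vwend
If all the members listed between #vwbeg and #vwend come back, the nodes re-form the primary component on their own, without a manual bootstrap.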
  • 21. Avoiding SST
When MySQL cannot start due to an error, such as a configuration error, an SST is always performed.
  • 22. Avoiding SST
When MySQL cannot start due to an error, such as a configuration error, an SST is always performed.
Let's try: stop MySQL on pxc2, modify my.cnf and add foobar under the [mysqld] section. Then start MySQL. Does it fail? Check /var/lib/mysql/error.log.
# /etc/init.d/mysql stop
# cat >> /etc/my.cnf << EOF
[mysqld]
foobar
EOF
# /etc/init.d/mysql start
  • 23. Avoiding SST
When MySQL cannot start due to an error, such as a configuration error, an SST is always performed.
Fix the error (remove the foobar configuration) and restart MySQL.
Does it perform an SST? Check /var/lib/mysql/error.log. Why?
  • 24. Avoiding SST
When MySQL cannot start due to an error, such as a configuration error, an SST is always performed.
Fix the error (remove the foobar configuration) and restart MySQL.
Does it perform an SST? Check /var/lib/mysql/error.log. Why?
An SST is done, as we can see in the error.log:
[Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (93a81eed-57b2-11e5-8f5e-82e53aab8d35): 1 (Operation not permitted)
...
WSREP_SST: [INFO] Proceeding with SST (20150916 19:13:50.990)
  • 25. Avoiding SST
When MySQL cannot start due to an error, such as a configuration error, an SST is always performed.
So how can we avoid the SST? It's easy: you need to hack /var/lib/mysql/grastate.dat.
Create the error again:
Bring the node back in the cluster
Add foobar to the configuration again
Start MySQL
  • 26. Avoiding SST
When MySQL cannot start due to an error, such as a configuration error, an SST is always performed.
# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 93a81eed-57b2-11e5-8f5e-82e53aab8d35
seqno: 1300762
cert_index:
  • 27. Avoiding SST
When MySQL cannot start due to an error, such as a configuration error, an SST is always performed.
# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 93a81eed-57b2-11e5-8f5e-82e53aab8d35
seqno: 1300762
cert_index:
When it fails due to an error, grastate.dat is reset to:
# GALERA saved state
version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
cert_index:
  • 28. Avoiding SST
When MySQL cannot start due to an error, such as a configuration error, an SST is always performed.
You then need to set the right uuid and seqno in grastate.dat.
Run mysqld_safe --wsrep-recover to find the values to set:
[root@pxc2 mysql]# mysqld_safe --wsrep-recover
...
2015-09-16 19:26:14 6133 [Note] WSREP: Recovered position: 93a81eed-57b2-11e5-8f5e-82e53aab8d35:1300762
  • 29. Avoiding SST
When MySQL cannot start due to an error, such as a configuration error, an SST is always performed.
Create grastate.dat with the info from wsrep-recover:
# GALERA saved state
version: 2.1
uuid: 93a81eed-57b2-11e5-8f5e-82e53aab8d35
seqno: 1300762
cert_index:
Start MySQL again and check /var/lib/mysql/error.log:
[root@pxc2 mysql]# /etc/init.d/mysql start
...
150916 19:27:53 mysqld_safe Assigning 93a81eed-57b2-11e5-8f5e-82e53aab8d35:1300762 to wsrep_start_position
...
WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20150916 19:34:05.545
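This grastate.dat repair lends itself to a small script. A minimal sketch, assuming the tutorial's datadir and the 'Recovered position' log line format shown above (verify both on your own systems before relying on it):
# run recovery; the recovered position also lands in the error log
mysqld_safe --wsrep-recover
POS=$(grep 'Recovered position' /var/lib/mysql/error.log | tail -1 | awk '{print $NF}')
UUID=${POS%:*}      # part before the last colon
SEQNO=${POS#*:}     # part after the first colon
# rewrite grastate.dat so the node can join with IST instead of SST
cat > /var/lib/mysql/grastate.dat <<EOF
# GALERA saved state
version: 2.1
uuid: $UUID
seqno: $SEQNO
cert_index:
EOF
chown mysql:mysql /var/lib/mysql/grastate.dat
/etc/init.d/mysql start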
  • 30. When putting in production unprepared... Certification Errors 30
  • 31. Certification
What it does: determine if a writeset can be applied, based on unapplied earlier transactions on the master. Such conflicts must come from other nodes.
Happens on every node, individually
Deterministic
Results are not reported to other nodes in the cluster: every node does certification itself, and it is a deterministic process.
Pass: enter apply queue (commit success on master)
Fail: drop transaction (or return deadlock on master)
Serialized by GTID
Cost based on # of keys or # of rows
  • 39. Conflict Detection
Local Certification Failure (lcf)
Transaction fails certification
Post-replication
Deadlock/Transaction Rollback
Status Counter: wsrep_local_cert_failures
Brute Force Abort (bfa) (Most Common)
Deadlock/Transaction rolled back by applier threads
Pre-commit Transaction Rollback
Status Counter: wsrep_local_bf_aborts
  • 40. Conflict Deadlock/Rollback
note: a Transaction Rollback can be returned on any statement, including SELECT and COMMIT
Example:
pxc1 mysql> commit;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
  • 41. Multi-writer Conflict Types
Brute Force Abort (bfa)
Transaction rolled back by applier threads
Pre-commit
A Transaction Rollback can be returned on any statement, including SELECT and COMMIT
Status Counter: wsrep_local_bf_aborts
  • 42-48. Brute Force Abort (bfa) - diagram sequence (no text on these slides)
  • 49. Multi-writer Conflict Types
Local Certification Failure (lcf)
Transaction fails certification
Post-replication
Deadlock on commit
Status Counter: wsrep_local_cert_failures
  • 63. Reproducing Conflicts - 1
On pxc1, create a test table:
pxc1 mysql> CREATE TABLE test.deadlocks (
  i INT UNSIGNED NOT NULL PRIMARY KEY,
  j varchar(32),
  t datetime
);
pxc1 mysql> INSERT INTO test.deadlocks VALUES (1, NULL, NULL);
Run myq_status on pxc1:
# myq_status wsrep
mycluster / pxc1 (idx: 1) / Galera 3.11(ra0189ab)
Cluster Node Repl Queue Ops Bytes Conflct Gcache Window
cnf # Stat Laten Up Dn Up Dn Up Dn lcf bfa ist idx dst appl comm
12 3 Sync N/A 0 0 0 0 0.0 0.0 0 0 1.8m 0 0 0
12 3 Sync N/A 0 0 0 0 0.0 0.0 0 0 1.8m 0 0 0
12 3 Sync N/A 0 0 0 0 0.0 0.0 0 0 1.8m 0 0 0
  • 64. Reproducing Conflicts - 1
On pxc1:
pxc1 mysql> BEGIN;
pxc1 mysql> UPDATE test.deadlocks SET j='pxc1', t=now() WHERE i=1;
Before commit, go to pxc3:
pxc3 mysql> BEGIN;
pxc3 mysql> UPDATE test.deadlocks SET j='pxc3', t=now() WHERE i=1;
pxc3 mysql> COMMIT;
Now commit the transaction on pxc1:
pxc1 mysql> COMMIT;
pxc1 mysql> SELECT * FROM test.deadlocks;
  • 65. Reproducing Conflicts - 1
It fails:
pxc1 mysql> commit;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
  • 66. Reproducing Conflicts - 1
Which commit succeeded?
Is this a lcf or a bfa?
How would you diagnose this error?
  • 67. Reproducing Conflicts - 1
Which commit succeeded? The one on pxc3, the first that got into the cluster.
Is this a lcf or a bfa? A bfa.
How would you diagnose this error?
show global status like 'wsrep_local_bf%';
show global status like 'wsrep_local_cert%';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| wsrep_local_bf_aborts | 1 |
| wsrep_local_cert_failures | 0 |
+---------------------------+-------+
# myq_status wsrep
mycluster / pxc1 (idx: 1) / Galera 3.11(ra0189ab)
Wsrep Cluster Node Repl Queue Ops Bytes Conflct Gcache Win
time P cnf # Stat Laten Up Dn Up Dn Up Dn lcf bfa ist idx dst
10:49:43 P 12 3 Sync 1.1ms 0 0 0 0 0.0 0.0 0 0 3 3
10:49:44 P 12 3 Sync 1.1ms 0 0 0 0 0.0 0.0 0 0 3 3
10:49:45 P 12 3 Sync 1.1ms 0 0 0 0 0.0 0.0 0 0 3 3
10:49:46 P 12 3 Sync 1.1ms 0 0 0 1 0.0 0.3K 0 1 4 3
  • 68. Reproducing Conflicts - 1
Log Conflicts
pxc1 mysql> set global wsrep_log_conflicts=on;
*** Priority TRANSACTION:
TRANSACTION 7743569, ACTIVE 0 sec starting index read
MySQL thread id 2, OS thread handle 0x93e78b70, query id 1395484 System lock
*** Victim TRANSACTION:
TRANSACTION 7743568, ACTIVE 9 sec
MySQL thread id 89984, OS thread handle 0x82bb1b70, query id 1395461 localhost root
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 80 page no 3 n bits 72 index PRIMARY of table test.deadlocks trx id 7743568 lock_mode X locks rec but not gap
2015-09-19 12:36:17 4285 [Note] WSREP: cluster conflict due to high priority abort for threads:
2015-09-19 12:36:17 4285 [Note] WSREP: Winning thread: THD: 2, mode: applier, state: executing, conflict: no conflict, seqno: 1824234 SQL: (null)
2015-09-19 12:36:17 4285 [Note] WSREP: Victim thread: THD: 89984, mode: local, state: idle, conflict: no conflict, seqno: -1 SQL: (null)
  • 69. Reproducing Conflicts - 1
Log Conflicts - Debug
pxc1 mysql> set global wsrep_debug=on;
[Note] WSREP: BF kill (1, seqno: 1824243), victim: (90473) trx: 7743601
[Note] WSREP: Aborting query: void
[Note] WSREP: kill IDLE for 7743601
[Note] WSREP: enqueuing trx abort for (90473)
[Note] WSREP: signaling aborter
[Note] WSREP: WSREP rollback thread wakes for signal
[Note] WSREP: client rollback due to BF abort for (90473), query: (null)
[Note] WSREP: WSREP rollbacker aborted thd: (90473 2649955184)
[Note] WSREP: Deadlock error for: (null)
  • 70. Reproducing Conflicts - 2
rollback; all transactions on all mysql clients
ensure SET GLOBAL wsrep_log_conflicts=on; on all nodes
run myq_status wsrep on pxc1 and pxc2
run run_app.sh lcf on pxc1 to reproduce a LCF
check:
output of run_app.sh lcf
myq_status
/var/lib/mysql/error.log
  • 71. Reproducing Conflicts - 2
[root@pxc2 ~]# myq_status wsrep
mycluster / pxc2 (idx: 0) / Galera 3.12(r9921e73)
Wsrep Cluster Node Repl Queue Ops Bytes Conflct Gcache Win
time P cnf # Stat Laten Up Dn Up Dn Up Dn lcf bfa ist idx dst
13:28:15 P 47 3 Sync 1.1ms 0 0 0 0 0.0 0.0 0 0 7433 101
13:28:16 P 47 3 Sync 1.1ms 0 409 0 4 0.0 1.1K 0 0 7436 5
13:28:17 P 47 3 Sync 1.1ms 0 947 0 0 0.0 0.0 0 0 7436 5
13:28:18 P 47 3 Sync 1.1ms 0 1470 0 0 0.0 0.0 0 0 7436 5
13:28:19 P 47 3 Sync 1.1ms 0 1892 0 0 0.0 0.0 0 0 7436 5
13:28:20 P 47 3 Sync 1.1ms 0 2555 0 0 0.0 0.0 0 0 7436 5
13:28:21 P 47 3 Sync 1.1ms 0 3274 0 0 0.0 0.0 0 0 7436 5
13:28:22 P 47 3 Sync 1.1ms 0 3945 0 0 0.0 0.0 0 0 7436 5
13:28:23 P 47 3 Sync 1.1ms 0 4663 0 0 0.0 0.0 0 0 7436 5
13:28:24 P 47 3 Sync 1.1ms 0 5400 0 0 0.0 0.0 0 0 7436 5
13:28:25 P 47 3 Sync 1.1ms 0 6096 0 0 0.0 0.0 0 0 7436 5
13:28:26 P 47 3 Sync 1.1ms 0 6839 0 0 0.0 0.0 0 0 7436 5
13:28:27 P 47 3 Sync 1.1ms 0 6872 0 0 0.0 0.0 0 0 7436 5
13:28:28 P 47 3 Sync 1.1ms 0 6872 0 0 0.0 0.0 0 0 7436 5
13:28:29 P 47 3 Sync 1.1ms 0 6872 0 0 0.0 0.0 0 0 7436 5
13:28:30 P 47 3 Sync 1.1ms 0 5778 1 1102 0.3K 0.3M 0 0 8537 5
13:28:31 P 47 3 Sync 1.1ms 0 978 0 4838 0.0 1.4M 0 0 13k 5
13:28:32 P 47 3 Sync 1.1ms 0 0 1 985 0.3K 0.3M 2 0 14k 5
13:28:33 P 47 3 Sync N/A 0 0 0 0 0.0 0.0 0 0 14k 5
13:28:34 P 47 3 Sync N/A 0 0 0 0 0.0 0.0 0 0 14k 5
13:28:35 P 47 3 Sync N/A 0 0 0 0 0.0 0.0 0 0 14k 5
  • 72. Reproducing Conflicts - 2
*** Priority TRANSACTION:
TRANSACTION 7787747, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
1 lock struct(s), heap size 312, 0 row lock(s)
MySQL thread id 1, OS thread handle 0x93e78b70, query id 301870 System lock
*** Victim TRANSACTION:
TRANSACTION 7787746, ACTIVE 0 sec
mysql tables in use 1, locked 1
2 lock struct(s), heap size 312, 1 row lock(s), undo log entries 1
MySQL thread id 2575, OS thread handle 0x82369b70, query id 286919 pxc1 192.168
update test.test set sec_col = 0 where id = 1
[Note] WSREP: Winning thread: THD: 1, mode: applier, state: executing, conflict: no conflict, seqno: 1846028, SQL: (null)
[Note] WSREP: Victim thread: THD: 2575, mode: local, state: committing, conflict: no conflict, seqno: -1, SQL: update test.test set sec_col = 0 where id = 1
[Note] WSREP: BF kill (1, seqno: 1846028), victim: (2575) trx: 7787746
[Note] WSREP: Aborting query: update test.test set sec_col = 0 where id = 1
[Note] WSREP: kill trx QUERY_COMMITTING for 7787746
[Note] WSREP: trx conflict for key (1,FLAT8)258634b1 d0506abd: source: 5cb369ab-5eca-11e5-8151-7afe8943c31a version: 3 local: 1 state: MUST_ABORT flags: 1 conn_id: 2575 trx_id: 7787746 seqnos (l: 21977, g: 1846030, s: 1846027, d: 1838545, ts: 161299135504641) <--X--> source: 9b376860-5e09-11e5-ac17-e6e46a2459ee version: 3 local: 0 state: APPLYING flags: 1 conn_id: 95747 trx_id: 7787229
  • 73. Reproducing Conflicts Summary
Conflicts are a concern when determining whether PXC is a fit for the application:
Long-running transactions increase the chance of conflicts
Heavy write workload on multiple nodes
Large transactions increase the chance of conflicts
Mark Callaghan's law: a given row can't be modified more often than 1/RTT times per second
These issues can usually be resolved by writing to 1 node only.
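Whichever node you write to, the application should treat ERROR 1213 as retryable. A minimal sketch of client-side retry logic, assuming the mysql CLI and a writable node on the local host (host, credentials, and statement are placeholders):
#!/bin/bash
# retry a single-statement transaction up to 3 times on deadlock/BF abort
for attempt in 1 2 3; do
  if mysql -h 127.0.0.1 -u root test \
       -e "UPDATE test.deadlocks SET j='app', t=NOW() WHERE i=1;"; then
    break   # success, stop retrying
  fi
  echo "deadlock/BF abort, retrying (attempt $attempt)..."
  sleep 0.2
done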
  • 78. Replication Failure
When Do They Happen?
When a Total Order Isolation (TOI) error happened:
DDL error: CREATE TABLE, ALTER TABLE...
GRANT failed
When there was a node inconsistency:
Bug in Galera replication
Human error, for example skipping the binary log (SQL_LOG_BIN=0) when doing writes
  • 79. Replication Failure
What Happens?
At every error:
A GRA_*.log file is created in the MySQL datadir
[root@pxc1 ~]# ls -alsh1 /var/lib/mysql/GRA_*
-rw-rw----. 1 mysql 89 Sep 15 10:26 /var/lib/mysql/GRA_1_127792.log
-rw-rw----. 1 mysql 83 Sep 10 12:00 /var/lib/mysql/GRA_2_5.log
A message is written to the error log /var/lib/mysql/error.log
It's possible to decode them, they are binary logs
They can be safely removed
  • 80. Replication Failure
Reading GRA Contents
Run the application only on pxc1, using only pxc1 as writer:
# run_app.sh pxc1
pxc1 mysql> create table test.nocolumns;
What do you get?
  • 81. Replication Failure
Reading GRA Contents
Run the application only on pxc1, using only pxc1 as writer:
# run_app.sh pxc1
pxc1 mysql> create table test.nocolumns;
What do you get?
ERROR 1113 (42000): A table must have at least 1 column
  • 82. Replication Failure
Reading GRA Contents
Run the application only on pxc1, using only pxc1 as writer:
# run_app.sh pxc1
pxc1 mysql> create table test.nocolumns;
What do you get?
ERROR 1113 (42000): A table must have at least 1 column
Error Log on the Other Nodes:
[ERROR] Slave SQL: Error 'A table must have at least 1 column' on query. Default database: ''. Query: 'create table test.nocolumns', Error_code: 1113
[Warning] WSREP: RBR event 1 Query apply warning: 1, 1881065
[Warning] WSREP: Ignoring error for TO isolated action: source: 9b376860-5e09-11e5-ac17-e6e46a2459ee version: 3 local: 0 state: APPLYING flags: 65 conn_id: 106500 trx_id: -1 seqnos (l: 57292, g: 1881065, s: 1881064, d: 1881064, ts: 76501555632582)
  • 83. Replication Failure
Making a GRA Header File
note: binary log headers only differ between versions, they can be reused
Get a binary log without checksums:
pxc2 mysql> set global binlog_checksum=0;
pxc2 mysql> flush binary logs;
pxc2 mysql> show master status;
+-----------------+----------+--------------+------------------+-------------------
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set
+-----------------+----------+--------------+------------------+-------------------
| pxc2-bin.000018 | 790333 | | |
+-----------------+----------+--------------+------------------+-------------------
Create the GRA_header file with the new binary log:
dd if=/var/lib/mysql/pxc2-bin.000018 bs=120 count=1 of=/root/GRA_header
  • 84. Replication Failure
Reading GRA Contents
Join the header and one GRA_*.log file (note the seqno of 1881065):
cat /root/GRA_header /var/lib/mysql/GRA_1_1881065.log >> /root/GRA_1_1881065-bin.log
View the content with mysqlbinlog:
mysqlbinlog -vvv /root/GRA_1_1881065-bin.log
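If several GRA files pile up, decoding them one by one gets tedious. A small sketch that reuses the GRA_header built above for every file in the datadir (paths as used in this tutorial; adjust for your own layout):
#!/bin/bash
# prepend the saved binlog header to each GRA file and decode it
for f in /var/lib/mysql/GRA_*.log; do
  out=/tmp/$(basename "$f" .log)
  cat /root/GRA_header "$f" > "$out-bin.log"
  mysqlbinlog -vvv "$out-bin.log" > "$out.txt"
  echo "decoded $f -> $out.txt"
done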
  • 85. Replication Failure
Node Consistency Compromised
Delete some data while skipping the binary log completely:
pxc2 mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)
pxc2 mysql> delete from sbtest.sbtest1 limit 100;
Query OK, 100 rows affected (0.00 sec)
Repeat the DELETE until pxc2 crashes...
  • 86. Replication Failure
Node Consistency Compromised
Error:
[ERROR] Slave SQL: Could not execute Update_rows event on table sbtest.sbtest1; Can't find record in 'sbtest1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 540, Error_code: 1032
[Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 1890959
[Warning] WSREP: Failed to apply app buffer: seqno: 1890959, status: 1 at galera/src/trx_handle.cpp:apply():351
Retrying 2th time ... Retrying 4th time ...
[ERROR] WSREP: Failed to apply trx: source: 9b376860-5e09-11e5-ac17-e6e46a2459ee version: 3 local: 0 state: APPLYING flags: 1 conn_id: 117611 trx_id: 7877213 seqnos (l: 67341, g: 1890959, s: 1890958, d: 1890841, ts: 78933399926835)
[ERROR] WSREP: Failed to apply trx 1890959 4 times
[ERROR] WSREP: Node consistency compromized, aborting...
...
[Note] WSREP: /usr/sbin/mysqld: Terminated.
  • 87. Replication Failure
Node Consistency Compromised
[root@pxc2 ~]# cat /root/GRA_header /var/lib/mysql/GRA_1_1890959.log | mysqlbinlog -vvv -
BINLOG '
...MDQwNjUyMjMwMTQ=...
'/*!*/;
### UPDATE sbtest.sbtest1
### WHERE
### @1=3528 /* INT meta=0 nullable=0 is_null=0 */
### @2=4395 /* INT meta=0 nullable=0 is_null=0 */
### @3='01945529982-83991536409-94055999891-11150850160-46682230772-19159811582-7
### @4='92814455222-06024456935-25380449439-64345775537-04065223014' /* STRING(60
### SET
### @1=3528 /* INT meta=0 nullable=0 is_null=0 */
### @2=4396 /* INT meta=0 nullable=0 is_null=0 */
### @3='01945529982-83991536409-94055999891-11150850160-46682230772-19159811582-7
### @4='92814455222-06024456935-25380449439-64345775537-04065223014' /* STRING(60
# at 660
#150919 15:56:36 server id 1 end_log_pos 595 Table_map: sbtest.sbtest1 mapped to
# at 715
#150919 15:56:36 server id 1 end_log_pos 1005 Update_rows: table id 70 flags: STM
...
  • 88. Gcache... what's that?
Galera Cache
  • 89. Galera Cache
All nodes contain a cache of recent writesets:
used to perform IST
used to store the writesets in circular buffer style
  • 90. Galera Cache
All nodes contain a cache of recent writesets:
used to perform IST
used to store the writesets in circular buffer style
preallocated file with a specific size, configurable:
wsrep_provider_options = "gcache.size=1G"
default size is 128M
  • 91. Galera Cache
All nodes contain a cache of recent writesets:
used to perform IST
used to store the writesets in circular buffer style
preallocated file with a specific size, configurable:
wsrep_provider_options = "gcache.size=1G"
default size is 128M
The Galera Cache is mmapped (I/O buffered to memory)
So the OS might swap (set vm.swappiness to 10)
use fincore-linux or dbsake fincore to see how much of the file is cached in memory
  • 92. Galera Cache
All nodes contain a cache of recent writesets:
used to perform IST
used to store the writesets in circular buffer style
preallocated file with a specific size, configurable:
wsrep_provider_options = "gcache.size=1G"
default size is 128M
The Galera Cache is mmapped (I/O buffered to memory)
So the OS might swap (set vm.swappiness to 10)
use fincore-linux or dbsake fincore to see how much of the file is cached in memory
status counter wsrep_local_cached_downto to find the last seqno in the gcache
wsrep_gcache_pool_size shows the size of the page pool and/or dynamic memory allocated for the gcache (since PXC 5.6.24)
  • 93. Galera Cache
Calculating Optimal Size
It would be great if we could handle 1 hour of changes in the Galera cache for IST.
How large does the Galera cache need to be?
  • 94. Galera Cache
Calculating Optimal Size
It would be great if we could handle 1 hour of changes in the Galera cache for IST.
How large does the Galera cache need to be?
We can calculate how many writes happen over time:
wsrep_replicated_bytes: writesets sent to other nodes
wsrep_received_bytes: writesets received from other nodes
  • 95. Galera Cache
Calculating Optimal Size
It would be great if we could handle 1 hour of changes in the Galera cache for IST.
How large does the Galera cache need to be?
We can calculate how many writes happen over time:
wsrep_replicated_bytes: writesets sent to other nodes
wsrep_received_bytes: writesets received from other nodes
SHOW GLOBAL STATUS LIKE 'wsrep_%d_bytes';
SELECT SLEEP(60);
SHOW GLOBAL STATUS LIKE 'wsrep_%d_bytes';
Sum up both replicated and received for each sample and subtract.
  • 96. Galera Cache
Calculating Optimal Size
Easier to do is:
SELECT ROUND(SUM(bytes)/1024/1024*60) AS megabytes_per_hour
FROM (SELECT SUM(VARIABLE_VALUE) * -1 AS bytes
        FROM information_schema.GLOBAL_STATUS
       WHERE VARIABLE_NAME IN ('wsrep_received_bytes', 'wsrep_replicated_bytes')
      UNION ALL
      SELECT sleep(60) AS bytes
      UNION ALL
      SELECT SUM(VARIABLE_VALUE) AS bytes
        FROM information_schema.GLOBAL_STATUS
       WHERE VARIABLE_NAME IN ('wsrep_received_bytes', 'wsrep_replicated_bytes')
     ) AS COUNTED;
+--------------------+
| megabytes_per_hour |
+--------------------+
| 302 |
+--------------------+
1 row in set (1 min 0.00 sec)
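To turn that figure into a setting, multiply by the IST window you want to survive. A sketch (the 302 MB/h value comes from the sample run above; measure your own workload first):
#!/bin/bash
MB_PER_HOUR=302   # result of the query above
HOURS=1           # desired IST window
echo 'add to my.cnf:'
echo "wsrep_provider_options = \"gcache.size=$(( MB_PER_HOUR * HOURS ))M\""
Remember the gcache is a circular buffer: once it wraps past wsrep_local_cached_downto, a joiner that is further behind needs a full SST.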
  • 97. Galera Cache
How Much Filesystem Cache Used?
Check the Galera cache's memory usage using dbsake:
# dbsake fincore /var/lib/mysql/galera.cache
/var/lib/mysql/galera.cache: total_pages=32769 cached=448 percent=1.37
  • 98. Hey! Wait for me! Flow Control 98
  • 99. Flow Control
Avoids nodes drifting too far behind (slave lag?):
Any node in the cluster can ask the other nodes to pause writes if it lags behind too much.
Caused by wsrep_local_recv_queue exceeding a node's gcs.fc_limit
Can cause all writes on all nodes in the entire cluster to stall.
  • 120. Flow Control
Status Counters
wsrep_flow_control_paused_ns: total time (in nanoseconds) the cluster was stalled since the node started
wsrep_flow_control_recv: number of flow control messages received from other nodes
wsrep_flow_control_sent: number of flow control messages sent to other nodes
(wsrep_flow_control_paused: only use in Galera 2; % of the time the cluster was stalled since the last SHOW GLOBAL STATUS)
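A quick way to watch these counters move during the next exercises; a sketch using the mysql CLI (the counters are cumulative, so look at the deltas between iterations):
#!/bin/bash
# poll flow control counters once per second
while true; do
  mysql -N -e "SHOW GLOBAL STATUS WHERE Variable_name IN
    ('wsrep_flow_control_sent','wsrep_flow_control_recv','wsrep_local_recv_queue');"
  echo '---'
  sleep 1
done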
  • 121. Flow Control
Observing Flow Control
Run the application (run_app.sh)
Run myq_status wsrep on all nodes.
Take a read lock on pxc3 and observe its effect on the cluster:
FLUSH TABLES WITH READ LOCK
  • 122. Flow Control
Observing Flow Control
pxc1: run_app.sh
all nodes: myq_status wsrep
Take a read lock on pxc3 and observe its effect on the cluster:
pxc3 mysql> flush tables with read lock;
wait until flow control kicks in...
pxc3 mysql> unlock tables;
  • 123. Flow Control
Increase the Limit
Increase the flow control limit on pxc3 to 20000 and perform the same exercise as before.
  • 124. Flow Control
Increase the Limit
pxc1: run_app.sh
all nodes: myq_status wsrep
pxc3 mysql> set global wsrep_provider_options="gcs.fc_limit=20000";
pxc3 mysql> flush tables with read lock;
wait until flow control kicks in...
pxc3 mysql> unlock tables;
  • 125. Flow Control
Increase the Limit
What do you see?
A node can lag further behind before sending flow control messages. This can be controlled per node.
Is there another alternative?
  • 126. Flow Control
DESYNC mode
It's possible to let a node fall behind the flow control limit by setting wsrep_desync=ON.
Try the same exercises but enable DESYNC on pxc3.
  • 127. Flow Control
DESYNC mode
pxc1: run_app.sh
all nodes: myq_status wsrep
pxc3 mysql> set global wsrep_provider_options="gcs.fc_limit=16";
pxc3 mysql> set global wsrep_desync=on;
Don't forget when done:
pxc3 mysql> unlock tables;
  • 128. How much more can we handle? Max Replication Throughput 128
  • 129. Max Replication Throughput We can measure the write throughput of a node/cluster: Put a node in wsrep_desync=on to avoid flow control messages being sent Lock the replication with FLUSH TABLES WITH READ LOCK Wait and build up a queue for a certain amount of time Unlock replication again with UNLOCK TABLES Measure how fast it syncs up again Compare with normal workload 129
  • 130. Max Replication Throughput Measure On pxc2 run show global status like 'wsrep_last_committed'; select sleep(60); show global status like 'wsrep_last_committed'; One Liner: SELECT ROUND(SUM(trx)/60) AS transactions_per_second FROM (SELECT VARIABLE_VALUE * -1 AS trx FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_last_committed' UNION ALL SELECT sleep(60) AS trx UNION ALL SELECT VARIABLE_VALUE AS trx FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_last_committed') AS COUNTED; +-------------------------+ | transactions_per_second | +-------------------------+ | 185 | +-------------------------+ 130
  • 131. Max Replication Throughput Measure The following stored function is already installed:
USE test;
DROP FUNCTION IF EXISTS galeraWaitUntilEmptyRecvQueue;
DELIMITER $$
CREATE DEFINER=root@localhost FUNCTION galeraWaitUntilEmptyRecvQueue()
RETURNS INT UNSIGNED
READS SQL DATA
BEGIN
  DECLARE queue INT UNSIGNED;
  DECLARE starttime TIMESTAMP;
  DECLARE blackhole INT UNSIGNED;
  SET starttime = SYSDATE();
  SELECT VARIABLE_VALUE AS trx INTO queue
    FROM information_schema.GLOBAL_STATUS
   WHERE VARIABLE_NAME = 'wsrep_local_recv_queue';
  WHILE queue > 1 DO /* we allow the queue to be 1 */
    SELECT VARIABLE_VALUE AS trx INTO queue
      FROM information_schema.GLOBAL_STATUS
     WHERE VARIABLE_NAME = 'wsrep_local_recv_queue';
    SELECT SLEEP(1) INTO blackhole;
  END WHILE;
  RETURN SYSDATE() - starttime;
END$$
131
  • 132. Max Replication Throughput Measure
SET GLOBAL wsrep_desync=on;
FLUSH TABLES WITH READ LOCK;
...wait until the queue rises to be quite high, about 20,000
UNLOCK TABLES;
use test;
SELECT sum(trx) AS transactions, sum(duration) AS time,
       IF(sum(duration) < 5,
          'DID NOT TAKE LONG ENOUGH TO BE ACCURATE',
          ROUND(SUM(trx)/SUM(duration))) AS transactions_per_second
FROM (SELECT VARIABLE_VALUE * -1 AS trx, null AS duration
        FROM information_schema.GLOBAL_STATUS
       WHERE VARIABLE_NAME = 'wsrep_last_committed'
      UNION ALL
      SELECT null AS trx, galeraWaitUntilEmptyRecvQueue() AS duration
      UNION ALL
      SELECT VARIABLE_VALUE AS trx, null AS duration
        FROM information_schema.GLOBAL_STATUS
       WHERE VARIABLE_NAME = 'wsrep_last_committed') AS COUNTED;
+--------------+------+-------------------------+
| transactions | time | transactions_per_second |
+--------------+------+-------------------------+
|        17764 |   11 |                    1615 |
+--------------+------+-------------------------+
132
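To know when the queue is high enough, a simple watch loop from the shell works (a convenience sketch, assuming the mysql client can log in non-interactively on the node):
# refreshes every second; interrupt with Ctrl-C once the queue reaches ~20,000
watch -n1 "mysql -BN -e \"SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue'\""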
  • 133. Max Replication Throughput Measure Normal workload: 185 tps During catch-up: 1615 tps So the normal workload uses 185/1615 ≈ 11.5% of the node's replication capacity 133
  • 135. Networking With Synchronous Replication, It Matters Network issues cause cluster issues much faster than with asynchronous replication: network partitioning and nodes joining/leaving can make a cluster go Non-Primary, no longer accepting any reads or writes. Latency has an impact on response time: at COMMIT of a transaction, and, depending on the wsrep_sync_wait setting, for other statements too. 135
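Whether a node currently belongs to a Primary component can be checked at any time:
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
It reports Primary in a healthy (majority) partition and non-Primary otherwise.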
  • 136. Networking Status Variables
pxc2 mysql> show global status like 'wsrep_evs_repl_latency';
+------------------------+-------------------------------------------------+
| Variable_name          | Value                                           |
+------------------------+-------------------------------------------------+
| wsrep_evs_repl_latency | 0.000745194/0.00175792/0.00832816/0.00184453/16 |
+------------------------+-------------------------------------------------+
The value reads min/avg/max/stddev/sample size, in seconds.
Reset interval: evs.stats_report_period=1min
# myq_status wsrep_latency:
mycluster / pxc2 (idx: 2) / Galera 3.12(r9921e73)
Wsrep Cluster Node Ops Latencies
time     P cnf # Stat Up Dn Size Min   Avg    Max    Dev
22:55:48 P  53 3 Sync  0 65    9 681µs 1307µs 4192µs 1032µs
22:55:49 P  53 3 Sync  0 52   10 681µs 1274µs 4192µs  984µs
22:55:50 P  53 3 Sync  0 47   10 681µs 1274µs 4192µs  984µs
22:55:51 P  53 3 Sync  0 61   11 681µs 1234µs 4192µs  947µs
136
  • 137. Networking Latency On pxc1, start the 'application':
# myq_status wsrep_latency
mycluster / pxc2 (idx: 2) / Galera 3.12(r9921e73)
Wsrep Cluster Node Ops Latencies
time     P cnf # Stat Up Dn Size Min   Avg    Max    Dev
23:02:44 P  53 3 Sync  0 48    7 777µs 1236µs 2126µs 434µs
23:02:45 P  53 3 Sync  0 47    7 777µs 1236µs 2126µs 434µs
23:02:46 P  53 3 Sync  0 58    7 777µs 1236µs 2126µs 434µs
run_app.sh pxc1
[1125s] tps: 51.05, reads: 687.72, writes: 204.21, response time: 10.54ms (95%), er
[1126s] tps: 33.98, reads: 475.77, writes: 135.94, response time: 15.07ms (95%), er
[1127s] tps: 42.01, reads: 588.19, writes: 168.05, response time: 12.79ms (95%), er
# myq_status wsrep
Wsrep Cluster Node Repl Queue Ops Bytes Conflct Gcache Wind
time     P cnf # Stat Laten Up Dn Up Dn  Up  Dn lcf bfa ist idx dst
23:02:44 P  53 3 Sync 1.2ms  0  0  0 49 0.0 80K   0   0 77k     178
23:02:45 P  53 3 Sync 1.2ms  0  0  0 43 0.0 70K   0   0 77k     176
23:02:46 P  53 3 Sync 1.2ms  0  0  0 55 0.0 90K   0   0 77k     164
137
  • 138. Networking WAN Impact on Latency Change from a LAN setup into a cluster across 2 datacenters last_node_to_dc2.sh enable 138
  • 139. Networking WAN Impact on Latency last_node_to_dc2.sh enable What can we observe in the cluster after running this command? 139
  • 140. Networking WAN Impact on Latency last_node_to_dc2.sh enable What can we observe in the cluster after running this command? myq_status wsrep_latency shows latency up to ~200ms run_app.sh throughput is a lot lower run_app.sh response time is a lot higher
mycluster / pxc3 (idx: 0) / Galera 3.12(r9921e73)
Wsrep Cluster Node Ops Latencies
time     P cnf # Stat Up Dn Size Min   Avg   Max   Dev
23:23:34 P  53 3 Sync  0 14    6 201ms 202ms 206ms 2073µs
23:23:35 P  53 3 Sync  0 16    6 201ms 202ms 206ms 2073µs
23:23:36 P  53 3 Sync  0 14    6 201ms 202ms 206ms 2073µs
23:23:37 P  53 3 Sync  0 14    6 201ms 202ms 206ms 2073µs
23:23:38 P  53 3 Sync  0 14    6 201ms 202ms 206ms 2073µs
140
  • 141. Networking WAN Impact on Latency Why Is that? 141
  • 142. Networking WAN Impact on Latency Why is that? Don't forget this is synchronous replication: the writeset is delivered to all nodes in the cluster at transaction commit, all nodes acknowledge it, and a GLOBAL ORDER is generated for that transaction (GTID). The cost is roughly one roundtrip latency to the furthest node at COMMIT. GTIDs are serialized, but many writesets can be replicating in parallel. Remember Mark Callaghan's Law: a given row can't be modified more often than 1/RTT times per second. 142
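To make that concrete with the ~200 ms RTT simulated above: a single hot row can be updated at most 1 / 0.2 s = 5 times per second, no matter how fast the individual nodes are.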
  • 143. Networking WAN Configuration Don't forget in WAN to use higher timeouts and send windows (default → suggested):
evs.user_send_window=2 → 256
evs.send_window=4 → 512
evs.keepalive_period=PT1S → PT1S
evs.suspect_timeout=PT5S → PT15S
evs.inactive_timeout=PT15S → PT45S
Don't forget to disable the WAN: last_node_to_dc2.sh disable 143
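One way to put these together in my.cnf (a sketch using the values above, not tuned for any specific network; remember that wsrep_provider_options in my.cnf replaces the whole option string, so keep any other options you rely on, such as gcache.size, on the same line):
[mysqld]
wsrep_provider_options="evs.user_send_window=256;evs.send_window=512;evs.keepalive_period=PT1S;evs.suspect_timeout=PT15S;evs.inactive_timeout=PT45S"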
  • 144. Networking WAN Configuration - Bandwidth How to reduce bandwidth used between datacenters? 144
  • 145. Networking WAN Configuration - Bandwidth How to reduce bandwidth used between datacenters? Use segments (gmcast.segment) to reduce network traffic between datacenters Use binlog_row_image=minimal to reduce binary log (writeset) size repl.key_format=FLAT8, which is already the default and the smallest 145
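Applied to this tutorial's layout, the my.cnf of the nodes in the second datacenter could carry something like the following (a sketch; the segment number is an arbitrary integer label that only has to differ between datacenters, and the binlog_row_image line persists the SET GLOBAL used in the exercise below):
[mysqld]
binlog_row_image=minimal
wsrep_provider_options="gmcast.segment=2"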
  • 146. Networking Replication Without Segments Here we have a cluster spread across 2 datacenters 146
  • 147. Networking Replication Without Segments A transaction executed on node1 147
  • 148. Networking Replication Without Segments A transaction executed on node1 will be sent to all other nodes 148
  • 149. Networking Replication Without Segments As writes are accepted everywhere, every node therefore communicates with all other nodes, including arbitrator nodes 149
  • 150. Networking Replication With Segments Galera 3.0 introduced the segment concept Replication traffic is minimized between segments Donor selection prefers nodes in the local segment 150
  • 151. Networking Replication With Segments Transactions are only sent once to other segments 151
  • 152. Networking Replication With Segments Relayed transactions do not always go through the same nodes, so all nodes still need to be able to connect to each other. 152
  • 153. Networking Replication With Segments Run the run_app.sh on pxc1 On another terminal on pxc1, run speedometer speedometer -r eth1 -t eth1 -l -m 524288 Change the segment on pxc2 and pxc3 in /etc/my.cnf and restart MySQL (this is not dynamic) wsrep_provider_options='gmcast.segment=2' Check the bandwidth usage again. How do you explain this? 153
  • 154. Networking Replication With Segments Transmit bandwidth usage drops a lot (roughly halved) 154
  • 155. Networking Binlog Row Image Format On pxc1, run speedometer again: speedometer -r eth1 -t eth1 -l -m 262144 On pxc1, set binlog_row_image=minimal: pxc1 mysql> SET GLOBAL binlog_row_image=minimal; Check the bandwidth usage 155
  • 157. Networking Not Completely Synchronous Applying transactions is asynchronous By default, reads on different nodes might return stale data. In practice, flow control prevents nodes from lagging too far behind, which limits stale reads. Read consistency can be configured: we can enforce that a read sees the latest committed data, cluster-wide. What if we absolutely need consistency? 157
  • 158. Networking Not Completely Synchronous What if we absolutely need consistency? Since PXC 5.6.20-27.7: SET <session|global> wsrep_sync_wait=[1|2|4]; 1 enables the causality check for READ statements, including SELECT, SHOW, BEGIN/START TRANSACTION. 2 enables the check for UPDATE and DELETE statements. 4 enables the check for INSERT and REPLACE statements. Before: SET <session|global> wsrep_causal_reads=[1|0]; 158
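The values form a bitmask and can be combined. For example, to enable the causality check for both reads and UPDATE/DELETE statements in the current session:
pxc1 mysql> SET SESSION wsrep_sync_wait=3; -- 3 = 1 (reads) + 2 (UPDATE/DELETE)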
  • 159. Networking Consistent Reads & Latency How does enabling consistent reads (wsrep_sync_wait) affect WAN environments? Stop the application (run_app.sh) Move the last node to DC2: last_node_to_dc2.sh enable On pxc1, run: pxc1 mysql> select * from sbtest.sbtest1 where id = 4; ... 1 row in set (0.00 sec) 159
  • 160. Networking Consistent Reads & Latency Now change the causality check to ensure that READ statements are in sync, and perform the same SELECT: pxc1 mysql> SET SESSION wsrep_sync_wait=1; pxc1 mysql> select * from sbtest.sbtest1 where id = 4; What do you see ? 160
  • 161. Networking Consistent Reads & Latency Now change the causality check to ensure that READ statements are in sync, and perform the same SELECT: pxc1 mysql> SET SESSION wsrep_sync_wait=1; pxc1 mysql> select * from sbtest.sbtest1 where id = 4; What do you see ? ... 1 row in set (0.20 sec) 161
  • 162. Networking Consistent Reads & Latency Now change the causality check to ensure that READ statements are in sync, and perform the same SELECT: pxc1 mysql> SET SESSION wsrep_sync_wait=1; pxc1 mysql> select * from sbtest.sbtest1 where id = 4; What do you see? ... 1 row in set (0.20 sec) Put pxc3 back in DC1: last_node_to_dc2.sh disable 162
  • 164. Backups Full: Percona XtraBackup Feature-rich online physical backups Since PXC 5.6.21-25.8, there is LOCK TABLES FOR BACKUP No FLUSH TABLES WITH READ LOCK anymore Locks only DDL and MyISAM, leaves InnoDB fully unlocked No more need to put the backup node in DESYNC to avoid flow control 164
  • 165. Backups Full: Percona XtraBackup Feature-rich online physical backups Since PXC 5.6.21-25.8, there is LOCK TABLES FOR BACKUP No FLUSH TABLES WITH READ LOCK anymore Locks only DDL and MyISAM, leaves InnoDB fully unlocked No more need to put the backup node in DESYNC to avoid flow control Point In Time Recovery: Binary Logs It's also recommended to save the binary logs to perform point-in-time recovery With mysqlbinlog 5.6, it's possible to stream them to another 'backup' host. 165
  • 166. Backups Full Backup On pxc1, run the application: run_app.sh pxc1 On pxc3, take a full backup with Percona XtraBackup 166
  • 167. Backups Full Backup On pxc1, run the application: run_app.sh pxc1 On pxc3, take a full backup with Percona XtraBackup
# innobackupex --galera-info --no-timestamp /root/backups/
xtrabackup version 2.2.12 based on MySQL server 5.6.24 Linux (i686) (revision id:
[01] Copying ./ibdata1 to /root/backups/ibdata1
[01] ...done
[01] Copying ./sbtest/sbtest1.ibd to /root/backups/sbtest/sbtest1.ibd
...
150920 08:31:01 innobackupex: Executing LOCK TABLES FOR BACKUP...
...
150920 08:31:01 innobackupex: Executing LOCK BINLOG FOR BACKUP...
...
150920 08:31:01 innobackupex: All tables unlocked
innobackupex: MySQL binlog position: filename 'pxc3-bin.000001', position 3133515
150920 08:31:01 innobackupex: completed OK!
167
  • 168. Backups Full Backup Apply the logs and get the seqno:
# innobackupex --apply-log /root/backups/
# cat /root/backups/xtrabackup_galera_info
b55685a3-5f70-11e5-87f8-2f86c54ca425:1945
# cat /root/backups/xtrabackup_binlog_info
pxc3-bin.000001 3133515
We now have a full backup ready to be used. 168
  • 169. Backups Stream Binary Logs Now set up mysqlbinlog to stream the binlogs into /root/binlogs. As a requirement, ensure the following is configured: log_slave_updates server-id=__ID__ 169
  • 170. Backups Stream Binary Logs Now set up mysqlbinlog to stream the binlogs into /root/binlogs. As a requirement, ensure the following is configured: log_slave_updates server-id=__ID__ Get mysqlbinlog running:
# mkdir /root/binlogs
# mysql -BN -e "show binary logs" | head -n1 | cut -f1
pxc3-bin.000001
# mysqlbinlog --read-from-remote-server --host=127.0.0.1 --raw --stop-never --result-file=/root/binlogs/ pxc3-bin.000001 &
170
  • 171. Backups Point-in-Time Recovery On pxc2 we update a record: pxc2 mysql> update sbtest.sbtest1 set pad = "PLAM2015" where id = 999; Query OK, 1 row affected (0.01 sec) Rows matched: 1 Changed: 1 Warnings: 0 171
  • 172. Backups Point-in-Time Recovery On pxc2 we update a record: pxc2 mysql> update sbtest.sbtest1 set pad = "PLAM2015" where id = 999; Query OK, 1 row affected (0.01 sec) Rows matched: 1 Changed: 1 Warnings: 0 And now it's time to break things, on pxc2, TRUNCATE the sbtest1 table. pxc2 mysql> truncate table sbtest.sbtest1; Query OK, 0 rows affected (0.06 sec) 172
  • 174. Backups Point-in-Time Recovery BROKEN! What now? Let's stop MySQL on all nodes and restore from backup. service mysql stop Restore the backup on pxc3: [root@pxc3 ~]# rm -rf /var/lib/mysql/* [root@pxc3 ~]# innobackupex --copy-back /root/backups/ [root@pxc3 ~]# chown mysql. -R /var/lib/mysql/ 174
  • 175. Backups Point-in-Time Recovery BROKEN! What now? Let's stop MySQL on all nodes and restore from backup. service mysql stop Restore the backup on pxc3: [root@pxc3 ~]# rm -rf /var/lib/mysql/* [root@pxc3 ~]# innobackupex --copy-back /root/backups/ [root@pxc3 ~]# chown mysql. -R /var/lib/mysql/ On pxc3, we bootstrap a completely new cluster: [root@pxc3 ~]# service mysql bootstrap-pxc 175
  • 176. Backups Point-in-Time Recovery The full backup is restored, now we need to do point in time recovery.... Find the position of the "event" that caused the problems 176
  • 177. Backups Point-in-Time Recovery The full backup is restored, now we need to do point-in-time recovery... Find the position of the "event" that caused the problems We know the sbtest.sbtest1 table got truncated. Let's find that statement:
[root@pxc3 ~]# mysqlbinlog /root/binlogs/pxc3-bin.* | grep -i truncate -B10
OC05MzAyOTU2MDQzOC0xNzU5MDQyMTM1NS02MDYyOTQ1OTk1MC0wODY4ODc0NTg2NTCjjIc=
'/*!*/;
# at 13961536
#150920 8:33:15 server id 1 end_log_pos 13961567 CRC32 0xc97eb41f Xid = 8667
COMMIT/*!*/;
# at 13961567
#150920 8:33:15 server id 2 end_log_pos 13961659 CRC32 0x491d7ff8 Query thread_
SET TIMESTAMP=1442737995/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
truncate table sbtest.sbtest1
177
  • 178. Backups Point-in-Time Recovery We need to recover up to the TRUNCATE TABLE event, which starts at position 13961567 We can replay the binary log(s) from the last position we backed up 178
  • 179. Backups Point-in-Time Recovery We need to recover up to the TRUNCATE TABLE event, which starts at position 13961567 We can replay the binary log(s) from the last position we backed up # cat /var/lib/mysql/xtrabackup_info | grep binlog binlog_pos = filename 'pxc3-bin.000001', position 3133515 179
  • 180. Backups Point-in-Time Recovery We need to recover up to the TRUNCATE TABLE event, which starts at position 13961567 We can replay the binary log(s) from the last position we backed up # cat /var/lib/mysql/xtrabackup_info | grep binlog binlog_pos = filename 'pxc3-bin.000001', position 3133515 Note that if we hadn't streamed the binary logs from the backed-up server (it can happen), we would need to find the position from the Xid, which matches the Galera seqno: #150920 8:33:15 server id 1 end_log_pos 13961567 CRC32 0xc97eb41f Xid = 8667 COMMIT/*!*/; # at 13961567 180
  • 181. Backups Point-in-Time Recovery Let's replay it now: # mysqlbinlog /root/binlogs/pxc3-bin.000001 --start-position=3133515 --stop-position=13961567 | mysql 181
  • 182. Backups Point-in-Time Recovery Let's replay it now: # mysqlbinlog /root/binlogs/pxc3-bin.000001 --start-position=3133515 --stop-position=13961567 | mysql Let's Verify: pxc3 mysql> select id, pad from sbtest.sbtest1 where id =999; +-----+----------+ | id | pad | +-----+----------+ | 999 | PLAM2015 | +-----+----------+ 1 row in set (0.00 sec) 182
  • 183. Backups Point-in-Time Recovery Let's replay it now: # mysqlbinlog /root/binlogs/pxc3-bin.000001 --start-position=3133515 --stop-position=13961567 | mysql Let's Verify: pxc3 mysql> select id, pad from sbtest.sbtest1 where id =999; +-----+----------+ | id | pad | +-----+----------+ | 999 | PLAM2015 | +-----+----------+ 1 row in set (0.00 sec) You can now restart the other nodes and they will perform SST. 183
  • 184. Spread the load Load Balancers 184
  • 185. Load Balancers With PXC a Load Balancer is commonly used: Layer 4 Lots of choice Usually HAProxy (most common) Layer 7: MariaDB MaxScale ScaleArc (proprietary) ProxySQL mysql-proxy (beta) 185
  • 186. Load Balancers Usually with Galera, people use a load balancer to route the MySQL requests from the application to a node Redirect writes to another node when problems happen Mostly 1 node for writes, others for reads Layer 4: 1 TCP port for writes, 1 TCP port for reads Layer 7: automatic (challenging) 186
  • 187. Load Balancers HAProxy On pxc1, we have HAProxy configured like this when listening on port 3308:
## active-passive
listen 3308-active-passive-writes 0.0.0.0:3308
  mode tcp
  balance leastconn
  option httpchk
  server pxc1 pxc1:3306 check port 8000 inter 1000 rise 3 fall 3
  server pxc2 pxc2:3306 check port 8000 inter 1000 rise 3 fall 3 backup
  server pxc3 pxc3:3306 check port 8000 inter 1000 rise 3 fall 3 backup
187
  • 188. Load Balancers HAProxy On pxc2 and pxc3, we connect to the load balancer and run a SELECT:
mysql -h pxc1 -P 3308 -utest -ptest -e "select @@wsrep_node_name, sleep(10)"
And on pxc1, while the previous command is running, we check the processlist:
pxc1 mysql> SELECT PROCESSLIST_ID AS id, PROCESSLIST_USER AS user, PROCESSLIST_HOST AS host, PROCESSLIST_INFO FROM performance_schema.threads WHERE PROCESSLIST_INFO LIKE 'select @% sleep%';
+------+------+------+-------------------------------------+
| id   | user | host | PROCESSLIST_INFO                    |
+------+------+------+-------------------------------------+
|  294 | test | pxc1 | select @@wsrep_node_name, sleep(10) |
|  297 | test | pxc1 | select @@wsrep_node_name, sleep(10) |
+------+------+------+-------------------------------------+
188
  • 189. Load Balancers HAProxy On pxc2 and pxc3, we connect to the load balancer and run a SELECT:
mysql -h pxc1 -P 3308 -utest -ptest -e "select @@wsrep_node_name, sleep(10)"
And on pxc1, while the previous command is running, we check the processlist:
pxc1 mysql> SELECT PROCESSLIST_ID AS id, PROCESSLIST_USER AS user, PROCESSLIST_HOST AS host, PROCESSLIST_INFO FROM performance_schema.threads WHERE PROCESSLIST_INFO LIKE 'select @% sleep%';
+------+------+------+-------------------------------------+
| id   | user | host | PROCESSLIST_INFO                    |
+------+------+------+-------------------------------------+
|  294 | test | pxc1 | select @@wsrep_node_name, sleep(10) |
|  297 | test | pxc1 | select @@wsrep_node_name, sleep(10) |
+------+------+------+-------------------------------------+
What do you notice? 189
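What you should notice: both sessions report host pxc1, the HAProxy machine, rather than the real clients pxc2 and pxc3. MySQL only ever sees the load balancer's address, which breaks host-based grants and auditing; this is the problem the proxy protocol solves below.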
  • 190. Load Balancers HA Proxy & Proxy Protocol Since Percona XtraDB Cluster 5.6.25-73.1 we support proxy protocol! (Almost released) 190
  • 191. Load Balancers HA Proxy & Proxy Protocol Since Percona XtraDB Cluster 5.6.25-73.1 we support proxy protocol! (Almost released) Let's enable this in my.cnf on all 3 nodes: [mysqld] ... proxy_protocol_networks=* ... 191
  • 192. Load Balancers HA Proxy & Proxy Protocol Since Percona XtraDB Cluster 5.6.25-73.1 we support proxy protocol! (Almost released) Let's enable this in my.cnf on all 3 nodes: [mysqld] ... proxy_protocol_networks=* ... Restart them one by one: [root@pxc1 ~]# /etc/init.d/mysql restart ... [root@pxc2 ~]# /etc/init.d/mysql restart ... [root@pxc3 ~]# /etc/init.d/mysql restart 192
  • 193. Load Balancers HA Proxy & Proxy Protocol On pxc1, we have HAProxy configured like this when listening on port 3310 to support proxy protocol:
listen 3310-active-passive-writes 0.0.0.0:3310
  mode tcp
  balance roundrobin
  option httpchk
  server pxc1 pxc1:3306 send-proxy-v2 check port 8000 inter 1000 rise 3 fall 3
  server pxc2 pxc2:3306 send-proxy-v2 check port 8000 inter 1000 backup
  server pxc3 pxc3:3306 send-proxy-v2 check port 8000 inter 1000 backup
And restart HAProxy: service haproxy restart 193
  • 194. Load Balancers HA Proxy & Proxy Protocol On pxc2 and pxc3, we connect to the load balancer (using the new port) and run a SELECT:
mysql -h pxc1 -P 3310 -utest -ptest -e "select @@wsrep_node_name, sleep(10)"
And on pxc1, while the previous command is running, we check the processlist:
pxc1 mysql> SELECT PROCESSLIST_ID AS id, PROCESSLIST_USER AS user, PROCESSLIST_HOST AS host, PROCESSLIST_INFO FROM performance_schema.threads WHERE PROCESSLIST_INFO LIKE 'select @% sleep%';
+------+------+------+-------------------------------------+
| id   | user | host | PROCESSLIST_INFO                    |
+------+------+------+-------------------------------------+
|   75 | test | pxc2 | select @@wsrep_node_name, sleep(10) |
|   76 | test | pxc3 | select @@wsrep_node_name, sleep(10) |
+------+------+------+-------------------------------------+
194
  • 195. Load Balancers HA Proxy & Proxy Protocol Try to connect from pxc1 to pxc1, not using a load balancer: pxc1 # mysql -h pxc1 -P 3306 What happens? 195
  • 196. Load Balancers HA Proxy & Proxy Protocol Try to connect from pxc1 to pxc1, not using a load balancer: pxc1 # mysql -h pxc1 -P 3306 What happens? You can't connect to MySQL anymore. When proxy_protocol_networks is enabled, the server refuses connections that don't send the proxy protocol header! 196
  • 197. Load Balancers HA Proxy & Proxy Protocol Try to connect from pxc1 to pxc1, not using a load balancer: pxc1 # mysql -h pxc1 -P 3306 What happens? You can't connect to MySQL anymore. When proxy_protocol_networks is enabled, the server refuses connections that don't send the proxy protocol header! Let's remove proxy_protocol_networks from my.cnf and restart all nodes before continuing. 197