Galera Cluster: Synchronous Multi-Master Replication for MySQL HA
1. Galera Cluster
Synchronous Multi-Master Replication for MySQL HA
TechEvent
April 2013
Ludovico CALDARA
LS-IMS
27.04.2013
2012 © Trivadis
BASEL, BERN, LAUSANNE, ZÜRICH, DÜSSELDORF, FRANKFURT A.M., FREIBURG I.BR., HAMBURG, MÜNCHEN, STUTTGART, WIEN
2. MySQL forks: which one is better?
MySQL and its forks: Oracle MySQL, Percona Server, MariaDB, Drizzle
• New forks
• Many new features
• Improved instrumentation
• New solutions for DEVs and DBAs
• Fast-paced competition between forks’ developers
• Recent evolutions in HA and scalability have made MySQL enterprise ready
3. There is no recipe that can satisfy all tastes
Feature                    Percona Server     MariaDB            MySQL
Multi source replication   NO                 YES (rel. 10)      NO
NoSQL integration          YES (cassandra)    YES (cassandra)    YES (memcached)
Virtual Columns            NO                 YES                NO
Improved diagnostics       YES                NO                 NO
Online DDL                 NO                 YES                YES
Galera Cluster             YES                YES                YES (codership patch)
Many many others           YES/NO             YES/NO             YES/NO
4. Your real requirements will let you choose… Need HA?
• How will your customers react to a major loss of service?
5. Old-school solutions have weaknesses
Native MySQL Replication
• Doesn’t scale writes
• Complex to promote slaves
MySQL Multi-Master Replication
• Complex and not reliable
• Concurrent writes lead to logical corruption
DRBD Replication
• Standby is offline, doesn’t scale at all
• Poor performance
MySQL Cluster
• Very complex
• It’s not InnoDB!
6. New school solutions: 3rd parties are playing a decisive role
Continuent Tungsten Replicator
• Similar to Golden Gate
• Heterogeneous databases
• Provides complex topologies
• Asynchronous
• Conflicts are complex to resolve
• Complex to maintain
• Not free
Galera Cluster Replication
• Transparent multi-master, easy to maintain
• (Virtually) Synchronous
• It’s InnoDB (only InnoDB)
• Great and easy scalability
• Optimistic locking (side effects)
• At least 3 nodes for good HA
7. Multi-Master and virtually synchronous: it’s transparent
[Diagram: a cluster in which every node accepts both reads and writes (R/W)]
8. Cluster implementation - Ingredients
• One or more standalone servers (either physical or virtual)
• Linux (other operating systems are not yet available)
• “Permissive” Firewall between nodes
• Codership’s Galera Library package
• A package of your choice:
  • Percona XtraDB Cluster
  • MariaDB Galera Cluster
  • MySQL with wsrep patch (patched by Codership)
9. Cluster implementation - Variables
• Each server’s my.cnf must contain:
• wsrep_cluster_address=gcomm://192.168.1.100,…,192.168.1.10x
• wsrep_provider=/usr/lib64/libgalera_smm.so
• binlog_format=ROW
• default_storage_engine=InnoDB
• innodb_autoinc_lock_mode=2
• innodb_locks_unsafe_for_binlog=1 #disables gap locking
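A quick way to catch a missing variable before starting a node is to parse the file and compare it with the list above. The following is only a sketch: the function name, the sample file content and the two node addresses are made up for illustration, and Python's configparser does not understand every my.cnf feature (for example !include directives placed outside a section, or options spelled with dashes instead of underscores).

```python
import configparser

# Values taken from the slide. The provider path and the node addresses are
# site-specific, so for those two only their presence is checked.
REQUIRED_VALUES = {
    "binlog_format": "ROW",
    "default_storage_engine": "InnoDB",
    "innodb_autoinc_lock_mode": "2",
    "innodb_locks_unsafe_for_binlog": "1",
}
REQUIRED_KEYS = ("wsrep_cluster_address", "wsrep_provider")


def check_galera_settings(cnf_text, section="mysqld"):
    """Return a list of problems found in the given section of a my.cnf text."""
    parser = configparser.ConfigParser(
        allow_no_value=True,             # my.cnf allows bare flags
        strict=False,                    # tolerate repeated options
        interpolation=None,              # '%' has no special meaning here
        inline_comment_prefixes=("#",),  # e.g. "=1 #disables gap locking"
    )
    parser.read_string(cnf_text)
    if not parser.has_section(section):
        return [f"missing [{section}] section"]
    options = parser[section]
    problems = []
    for key in REQUIRED_KEYS:
        if not options.get(key):
            problems.append(f"{key} is not set")
    for key, expected in REQUIRED_VALUES.items():
        actual = options.get(key)
        if actual is None or actual.lower() != expected.lower():
            problems.append(f"{key} should be {expected}, found {actual}")
    return problems


# Hypothetical file content: the last variable of the slide is left out on purpose.
SAMPLE = """
[mysqld]
wsrep_cluster_address=gcomm://192.168.1.100,192.168.1.101
wsrep_provider=/usr/lib64/libgalera_smm.so
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
"""

if __name__ == "__main__":
    for problem in check_galera_settings(SAMPLE):
        print(problem)
```

Run against the sample, the check reports the one variable that was left out; an empty list means all the variables from the slide are present.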
10. Cluster implementation – Start the cluster
mysqld_safe --wsrep_cluster_address=gcomm:// &
[…]
130220 17:56:46 [Note] WSREP: Starting new group from scratch:
[…]
An empty gcomm:// address bootstraps the node as the first member of a new cluster.
NEVER USE IT TO JOIN AN EXISTING CLUSTER
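If the start command is generated by a script, the rule above can be enforced in code. This is a minimal sketch, not part of Galera: cluster_address and start_command are hypothetical helper names, and host1/host2 are placeholder host names.

```python
def cluster_address(peers, bootstrap=False):
    """Build the gcomm:// URL passed to wsrep_cluster_address.

    bootstrap=True returns the empty address that starts a new cluster;
    otherwise at least one existing member must be listed.
    """
    if bootstrap:
        if peers:
            raise ValueError("a bootstrap node must not list peers")
        return "gcomm://"
    if not peers:
        raise ValueError(
            "refusing to join with an empty gcomm:// address: "
            "it would start a second, separate cluster"
        )
    return "gcomm://" + ",".join(peers)


def start_command(peers, bootstrap=False):
    """Return the argument list for mysqld_safe, as used on the slides."""
    return [
        "mysqld_safe",
        "--wsrep_cluster_address=" + cluster_address(peers, bootstrap),
    ]


if __name__ == "__main__":
    print(" ".join(start_command([], bootstrap=True)))   # first node only
    print(" ".join(start_command(["host1", "host2"])))   # every other node
```

The point of the design is that the dangerous case (joining with no peers) fails loudly instead of silently creating a new cluster.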
11. Cluster implementation – Adding nodes to the cluster
mysqld_safe --wsrep_cluster_address=gcomm://host1,host2… &
[…]
130220 18:01:56 [Note] WSREP: Shifting OPEN -> PRIMARY (TO:…)
130220 18:01:56 [Note] WSREP: State transfer required:
[…]
The address should already be present in my.cnf!
16. Server State Transfer
• The joiner asks for an SST
• The cluster chooses a donor; the donor is taken offline
• The donor is backed up
• The donor comes online again and the joiner is loaded from the backup
• The joiner replays the missing transactions and joins the cluster
• The cluster can also do Incremental State Transfers (IST)
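The steps above can be walked through with a toy model. This is not how Galera is implemented: Node and state_transfer are hypothetical names, transactions are plain integers, and picking the first node as donor is an arbitrary stand-in for the cluster's real choice.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    applied: list = field(default_factory=list)  # transactions applied, in order
    online: bool = True


def state_transfer(cluster, joiner, incoming):
    """Toy SST: `incoming` are transactions committed while the transfer runs."""
    # The joiner asks for an SST; a donor is chosen and taken offline.
    donor = cluster[0]
    donor.online = False
    # The donor is backed up (here: a copy of what it has applied so far).
    snapshot = list(donor.applied)
    # Meanwhile the nodes that are still online keep committing.
    for node in cluster:
        if node.online:
            node.applied.extend(incoming)
    # The donor catches up and comes online again; the joiner loads the backup.
    donor.applied.extend(incoming)
    donor.online = True
    joiner.applied = list(snapshot)
    # The joiner replays what it missed during the transfer and joins.
    missing = donor.applied[len(snapshot):]
    joiner.applied.extend(missing)
    cluster.append(joiner)
    return donor.name


if __name__ == "__main__":
    nodes = [Node(f"node{i}", [1, 2, 3]) for i in (1, 2, 3)]
    new_node = Node("node4")
    state_transfer(nodes, new_node, incoming=[4, 5])
    print(new_node.applied)
```

In the same model, an IST would skip the backup and send only the missing transactions to a joiner that already holds most of the data.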
17. Split-Brain
• The majority of nodes wins
• Complete loss of network: all nodes go offline
• The offline nodes will respond:
mysql> select * from emp;
ERROR 1047 (08S01): Unknown command
• Galera arbitrator (garbd) can join the cluster and count as a member in split-brain resolution
• NEW: Galera 2.4 introduces weighted quorum
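The majority rule can be written down in a few lines. This is a simplified sketch: has_quorum and the node names are made up, and a partition is compared against the full configured membership, while Galera compares against the last known primary component and accounts for nodes that left gracefully.

```python
def has_quorum(partition, weights):
    """True if `partition` holds strictly more than half of the total weight."""
    total = sum(weights.values())
    present = sum(weights[name] for name in partition)
    return 2 * present > total


if __name__ == "__main__":
    # Two data nodes plus an arbitrator, all with the same weight.
    with_arbitrator = {"node1": 1, "node2": 1, "garbd": 1}
    print(has_quorum({"node1", "garbd"}, with_arbitrator))
    print(has_quorum({"node2"}, with_arbitrator))

    # Two data nodes and no arbitrator: after a split neither side has a majority.
    two_nodes = {"node1": 1, "node2": 1}
    print(has_quorum({"node1"}, two_nodes))

    # Weighted quorum: giving node1 more weight lets it survive alone.
    weighted = {"node1": 2, "node2": 1}
    print(has_quorum({"node1"}, weighted))
```

The strict inequality matters: an exact half is not a majority, which is why an even split takes every node offline unless an arbitrator or a weight breaks the tie.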
18. Example 1: Arbitrator in Trivadis Swiss
… sorry for German/Austrian attendees ☺
[Diagram: the Basel, Zurich, Bern and Lausanne sites connected over a WAN, with an arbitrator]
• If the WAN connection is lost, Zurich survives
• If the Zurich site is lost, the cluster will be taken offline
19. Example 2: Arbitrator in Trivadis Swiss
… sorry for German/Austrian attendees ☺
[Diagram: the Basel, Zurich, Bern and Lausanne sites connected over a WAN, with an arbitrator]
• If the Zurich site is lost, the other sites survive
• If the WAN connection is lost, the cluster will be taken offline
24. What does “Virtually synchronous” mean? In brief:
• Writes are as fast as if they were local
• Commits take just the time of a network roundtrip: if acceptable then the cluster can be spread geographically
[Diagram labels: Write, Commit, WS (write set) sent to the other nodes, Commit OK]
28. Optimistic locking leads to side effects
(session on node A)
mysql> update emp set salary='peanuts' where name='Caldara';
Query OK, 1 row affected (0.03 sec)
Rows matched: 1 Changed: 1 Warnings: 0
(session on node B)
mysql> update emp set salary='one billion' where name='Caldara';
Query OK, 1 row affected (0.03 sec)
Rows matched: 1 Changed: 1 Warnings: 0
(session on node B)
mysql> commit;
Query OK, 0 rows affected (0.01 sec)
(session on node A)
mysql> commit;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
mysql> select salary from emp where name='Caldara';
+-------------+
| salary      |
+-------------+
| one billion |
+-------------+
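Why the second commit fails can be illustrated with a toy first-committer-wins check. This is a deliberately simplified model and not Galera's certification code: the Certifier class, its methods and the row key are hypothetical.

```python
class Certifier:
    """Toy first-committer-wins check, keyed by row identifier."""

    def __init__(self):
        self.seqno = 0         # sequence number of the last committed write set
        self.last_writer = {}  # row key -> seqno of the last commit touching it

    def begin(self):
        """A transaction remembers the cluster state it started from."""
        return self.seqno

    def commit(self, start_seqno, keys):
        """Return True if the write set is accepted, False if it must roll back."""
        for key in keys:
            if self.last_writer.get(key, 0) > start_seqno:
                return False   # the row was committed by someone else after we started
        self.seqno += 1
        for key in keys:
            self.last_writer[key] = self.seqno
        return True


if __name__ == "__main__":
    cert = Certifier()
    row = ("emp", "Caldara")
    node_a = cert.begin()   # update ... salary='peanuts'
    node_b = cert.begin()   # update ... salary='one billion'
    print("node B commit accepted:", cert.commit(node_b, {row}))
    print("node A commit accepted:", cert.commit(node_a, {row}))
```

Both updates succeed locally because no lock is shared between nodes; the conflict only becomes visible when the second write set reaches the commit step.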
29. Conclusions on optimistic locking…
• Locally, the first that acquires the lock wins (it’s InnoDB…)
• Cluster-wide, the first that broadcasts its commit wins (it’s Galera…)
• The application should not have hotspots...
• … or it should retry the transaction after the deadlock occurs…
• … or, for each database, you can elect one node as the master
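The "retry the transaction" option can be sketched as a small wrapper. Everything here is illustrative: DeadlockError stands in for whatever exception your database driver raises for MySQL error 1213, and make_flaky simulates a transaction that loses a few times before it commits.

```python
import time


class DeadlockError(Exception):
    """Stand-in for the driver error that carries MySQL error code 1213."""


def run_with_retry(transaction, attempts=3, delay=0.0):
    """Call `transaction()` until it succeeds or `attempts` is exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == attempts:
                raise                     # give up and let the caller decide
            time.sleep(delay * attempt)   # simple linear back-off


def make_flaky(failures):
    """Hypothetical transaction that deadlocks `failures` times, then commits."""
    state = {"calls": 0}

    def transaction():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise DeadlockError("ERROR 1213 (40001)")
        return state["calls"]             # number of calls needed to commit

    return transaction


if __name__ == "__main__":
    print(run_with_retry(make_flaky(2)))
```

The whole transaction has to be re-executed inside the callable, not only the commit, because the rolled-back statements are lost.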
30. About performance
• Commit performance loss is between 5% and 10%, plus the network RTT
• Write workloads scale up to 8 nodes
• >8 nodes: it scales reads, not writes
• Many benchmarks show that Galera outperforms NDB with few nodes
• NDB scales out more with many nodes thanks to data sharding
• Benchmarks on the internet are not always reliable… test the performance of YOUR application
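As a back-of-the-envelope aid, the first bullet can be turned into a formula. This is one possible reading of the slide (local commit time increased by the overhead, plus one round trip), the function names are made up, and the input figures in the example are invented, not measurements.

```python
def commit_latency_ms(local_commit_ms, rtt_ms, overhead=0.10):
    """Estimated commit time: local commit plus overhead, plus one round trip."""
    return local_commit_ms * (1.0 + overhead) + rtt_ms


def max_commits_per_second(latency_ms):
    """Upper bound for a single connection committing one transaction at a time."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Invented numbers: 2 ms local commit, LAN (0.5 ms) versus WAN (20 ms) round trip.
    for label, rtt in (("LAN", 0.5), ("WAN", 20.0)):
        low = commit_latency_ms(2.0, rtt, overhead=0.05)
        high = commit_latency_ms(2.0, rtt, overhead=0.10)
        print(f"{label}: {low:.2f} to {high:.2f} ms per commit, "
              f"at most {max_commits_per_second(high):.0f} commits/s per connection")
```

The estimate only bounds a single sequential connection; many connections committing in parallel are not limited in the same way, which is one more reason to test with your own workload.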
31. How to migrate
• Convert all your tables to InnoDB
• Double-check that all tables have primary keys
• Think about potential problems caused by triggers (if you have any)
• Create a new empty Galera Cluster
• Set up MySQL native replication between the old database and the Galera cluster
• Once everything is aligned, direct your clients to the new cluster
• Set up the old node to join the cluster
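The first two checks lend themselves to automation. The sketch below is an assumption about how one might do it: PRECHECK_SQL is a query against information_schema that you would run with your own client or driver, and migration_blockers (a hypothetical helper) only post-processes its rows, so the example works without a database connection.

```python
# Query to run on the old server; one row per base table.
PRECHECK_SQL = """
SELECT t.table_schema, t.table_name, t.engine,
       (c.constraint_name IS NOT NULL) AS has_pk
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
"""


def migration_blockers(rows):
    """rows: iterable of (schema, table, engine, has_pk) as returned by PRECHECK_SQL."""
    blockers = []
    for schema, table, engine, has_pk in rows:
        name = f"{schema}.{table}"
        if (engine or "").lower() != "innodb":
            blockers.append((name, f"engine is {engine}, convert to InnoDB"))
        if not has_pk:
            blockers.append((name, "no primary key"))
    return blockers


if __name__ == "__main__":
    # Invented sample rows standing in for the query result.
    sample = [
        ("hr", "emp", "InnoDB", 1),
        ("hr", "audit_log", "MyISAM", 1),
        ("hr", "staging", "InnoDB", 0),
    ]
    for table, reason in migration_blockers(sample):
        print(table, "->", reason)
```

Triggers still need a manual review; the query above does not look at them.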
32. Load balancing
• HAProxy is the most widely used solution so far
• Codership is actively developing its own load balancer: Galera Load Balancer (glbd)
  • Several balancing modes: round robin, custom, least connected, …
  • Automatically drains disconnected nodes
  • New nodes can be added with a single TCP call
  • Release 1.0 (now rc1) will support watchdog and automatic discovery of the nodes composing the cluster
• Other methods are possible (e.g. Java connector properties, HW load balancer)
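To make the balancing modes concrete, here is a toy balancer with round robin, least connected, draining and node addition. It is an illustration of the ideas only, not glbd's or HAProxy's implementation; the Balancer class and the node names are hypothetical.

```python
class Balancer:
    """Toy connection balancer over a set of cluster nodes."""

    def __init__(self, nodes):
        self.connections = {node: 0 for node in nodes}  # open connections per node
        self.connected = set(nodes)                     # nodes currently usable
        self._cursor = 0                                # round-robin position

    def add_node(self, node):
        self.connections.setdefault(node, 0)
        self.connected.add(node)

    def mark_disconnected(self, node):
        """Stop sending new connections to a node that left the cluster."""
        self.connected.discard(node)

    def _live(self):
        live = [n for n in self.connections if n in self.connected]
        if not live:
            raise RuntimeError("no node available")
        return live

    def pick_round_robin(self):
        live = self._live()
        node = live[self._cursor % len(live)]
        self._cursor += 1
        self.connections[node] += 1
        return node

    def pick_least_connected(self):
        node = min(self._live(), key=lambda n: self.connections[n])
        self.connections[node] += 1
        return node


if __name__ == "__main__":
    lb = Balancer(["node1", "node2", "node3"])
    print([lb.pick_round_robin() for _ in range(3)])
    lb.mark_disconnected("node2")
    print([lb.pick_round_robin() for _ in range(4)])
    lb.add_node("node4")
    print(lb.pick_least_connected())
```

Least connected naturally favours a freshly added node, which helps spread the load after a node joins.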
33. Conclusions on Galera Cluster
Strengths:
• Multi-master
• Shared-nothing
• Great performance and scalability
• «Virtually» synchronous
• It uses InnoDB!!
• Conflict prevention
• Split-brain handling (no inconsistencies)
• Easy to add/remove nodes
Limitations:
• At least 3 nodes to have good HA
• Optimistic locking (side effects)
• Explicit locking doesn’t work
• Only InnoDB is replicated
• Primary keys are mandatory
• Not yet available for MySQL 5.6
• Linux only
35. Little demo?
37. Trivadis SA
THANK YOU.
Ludovico Caldara
Senior Consultant
Ludovico.caldara@trivadis.com
www.trivadis.com