3.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purpose only, and may not be
incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied up in
making purchasing decisions. The development, release and timing of any features or functionality described for Oracle´s product
remains at the sole discretion of Oracle.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
3 / 191
4. about.me/lefred
Who am I ?
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
4 / 191
6. Group Replication: heart of MySQL InnoDB Cluster
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
6 / 191
7. Group Replication: heart of MySQL InnoDB Cluster
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
7 / 191
8. MySQL Group Replication
but what is it ?!?
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
8 / 191
9. GR is a plugin for MySQL, made by MySQL and packaged with
MySQL
GR is an implementation of Replicated Database State
Machine theory
Paxos based protocol
GR allows to write on all Group Members (cluster nodes)
simultaneously while retaining consistency
GR implements conflict detection and resolution
GR allows automatic distributed recovery
Automatic failover
Automatic membership configuration
Adding/removing members
Network partitions, failures
Prevents data loss
Supported on all MySQL platforms !!
Linux, Windows, Solaris, OSX, FreeBSD
Completely Open Source - GPL ! No license required to have
High Availability
MySQL Group Replication
but what is it ?!?
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
9 / 191
10. Group Replication is nice, but how does it work ?
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
10 / 191
11. Group Replication is nice, but how does it work ?
it´s just ...
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
11 / 191
12. Group Replication is nice, but how does it work ?
it´s just ...
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
12 / 191
13. Group Replication is nice, but how does it work ?
it´s just ...
... no, in fact the writesets replication is synchronous and then certification and apply of the changes are local to each nodes and
asynchronous.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
13 / 191
14. Group Replication is nice, but how does it work ?
it´s just ...
... no, in fact the writesets replication is synchronous and then certification and apply of the changes are local to each nodes and
asynchronous.
not that easy to understand... right ? As a picture is worth a 1000 words, let´s illustrate this...
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
14 / 191
15. Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
15 / 191
28. MySQL Group Replication (full transaction)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
28 / 191
29. MySQL Group Replication (full transaction)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
29 / 191
30. MySQL Group Replication (full transaction)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
30 / 191
31. MySQL Group Replication (full transaction)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
31 / 191
32. MySQL Group Replication (full transaction)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
32 / 191
33. MySQL Group Replication (full transaction)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
33 / 191
34. MySQL Group Replication (full transaction)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
34 / 191
35. MySQL Group Replication (full transaction)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
35 / 191
36. MySQL Group Replication (full transaction)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
36 / 191
37. MySQL Group Communication System (GCS)
MySQL Xcom protocol
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
37 / 191
38. MySQL Group Communication System (GCS)
MySQL Xcom protocol
Replicated Database State Machine
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
38 / 191
39. MySQL Group Communication System (GCS)
MySQL Xcom protocol
Replicated Database State Machine
Paxos based protocol (a variant of Mencius)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
39 / 191
40. MySQL Group Communication System (GCS)
MySQL Xcom protocol
Replicated Database State Machine
Paxos based protocol (a variant of Mencius)
its task: deliver messages across the distributed system:
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
40 / 191
41. MySQL Group Communication System (GCS)
MySQL Xcom protocol
Replicated Database State Machine
Paxos based protocol (a variant of Mencius)
its task: deliver messages across the distributed system:
atomically
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
41 / 191
42. MySQL Group Communication System (GCS)
MySQL Xcom protocol
Replicated Database State Machine
Paxos based protocol (a variant of Mencius)
its task: deliver messages across the distributed system:
atomically
in Total Order (Writes)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
42 / 191
43. MySQL Group Communication System (GCS)
MySQL Xcom protocol
Replicated Database State Machine
Paxos based protocol (a variant of Mencius)
its task: deliver messages across the distributed system:
atomically
in Total Order (Writes)
MySQL Group Replication receives the Ordered 'tickets' from this GCS subsystem.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
43 / 191
44. Architecture, Stack, Core, ...
MySQL Group Replication
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
44 / 191
45. MySQL Group Replication - Architecture
Node Types:
R: Traffic routers/proxies: MySQL Router, HAProxy, ProxySQL...
M: mysqld nodes participating in MySQL Group Replication
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
45 / 191
46. MySQL Group Replication - Stack
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
46 / 191
47. Server calls into the plugin through a generic interface
Most server internals are hidden from the plugin
Plugin interacts with the server through a generic interface
Replication plugin determines the fate of the commit
operation through a well defined server interface
The plugin makes use of the relay log infrastructure to
inject changes in the receiving server
MySQL Group Replication - Core
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
47 / 191
48. Maintaining distributed execution context
Detecting and Resolving conflicts
Handling distributed recovery
Detect membership changes
Donate state if needed
Collect state if needed
Proposing transactions to other members
Receiving and handling transactions from other members
Deciding the ultimate fate of transactions
commit or rollback
MySQL Group Replication - GR Plugin
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
48 / 191
49. The communication API (and binding) is responsible for:
Abstracting the underlying group communication system implementation from
the plugin itself
Mapping the interface to a specific group communication system implementation
The Group Communication System engine:
Paxos implementation
(Similar to Paxos Mencius)
Building block to provide distributed agreement between servers
MySQL Group Replication - GCS
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
49 / 191
51. MySQL Group Replication uses MySQL replication framework by
design:
binary logs
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
51 / 191
52. MySQL Group Replication uses MySQL replication framework by
design:
binary logs
relay logs
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
52 / 191
53. MySQL Group Replication uses MySQL replication framework by
design:
binary logs
relay logs
GTIDs: Global Transaction IDs
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
53 / 191
54. How does Group Replication handle GTIDs ?
There are two ways of generating GTIDs:
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
54 / 191
55. How does Group Replication handle GTIDs ?
There are two ways of generating GTIDs:
AUTOMATIC: the transaction is assigned with an automatically generated id during commit. Where regular replication uses the
source server UUID, on Group Replication, the group name is used.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
55 / 191
56. How does Group Replication handle GTIDs ?
There are two ways of generating GTIDs:
AUTOMATIC: the transaction is assigned with an automatically generated id during commit. Where regular replication uses the
source server UUID, on Group Replication, the group name is used.
ASSIGNED: the user assigns manually a GTID through SET GTID_NEXT to the transaction. This is common to any replication
format and the id is assigned before the transaction starts.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
56 / 191
57. Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
57 / 191
58. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
58 / 191
59. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
59 / 191
60. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
60 / 191
61. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
61 / 191
62. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
62 / 191
63. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
63 / 191
64. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
64 / 191
65. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
65 / 191
66. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
66 / 191
67. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
67 / 191
68. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
68 / 191
69. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
69 / 191
70. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
70 / 191
71. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
71 / 191
72. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
72 / 191
73. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
73 / 191
74. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
74 / 191
75. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
75 / 191
76. Group Replication : Total Order Delivery - GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
76 / 191
77. Group Replication : GTID
The previous example was not totally in sync with reality. In fact, a writer allocates a block of GTID and when we have multiple
writes (multi-primary mode) all writers will use GTID sequence numbers in their allocated block.
The size of the block is defined by group_replication_gtid_assignment_block_size (default to 1M)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
77 / 191
78. Group Replication : GTID
Example:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
78 / 191
79. Group Replication : GTID
Example:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355
New write on an other node:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355,1000354
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
79 / 191
80. Group Replication : GTID
Example:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355
New write on an other node:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355,1000354
Let's write on the third node:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355:1000354:2000354
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
80 / 191
81. Group Replication : GTID
Example:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355
New write on an other node:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355,1000354
Let's write on the third node:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355:1000354:2000354
And writing back on the first one:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-356:1000354:2000354
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
81 / 191
82. Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
82 / 191
83. done !
Return from Commit
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
83 / 191
84. Group Replication: return from commit
Asynchronous Replication:
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
84 / 191
85. Group Replication: return from commit (2)
Semi-Sync Replication:
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
85 / 191
86. (*): eventual
Group Replication: return from commit (3)
Group Replication (*):
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
86 / 191
87. (*): before
Group Replication: return from commit (4)
Group Replication (*):
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
87 / 191
88. (*): after
Group Replication: return from commit (5)
Group Replication (*):
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
88 / 191
89. Does this mean we can have a distant node and always let it ack later
?
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
89 / 191
90. Does this mean we can have a distant node and always let it ack later
?
NO!
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
90 / 191
91. Does this mean we can have a distant node and always let it ack later
?
NO!
Because the system has to wait for the noop (single skip message) from the “distant” node where latency is higher
The size of the GCS consensus messages window can be get and set from UDF functions:
group_replication_get_write_concurrency(), group_replication_set_write_concurrency()
mysql> select group_replication_get_write_concurrency();
+-------------------------------------------+
| group_replication_get_write_concurrency() |
+-------------------------------------------+
| 10 |
+-------------------------------------------+
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
91 / 191
92. Event Horizon
GCS Write Consensus Concurrency
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
92 / 191
93. Event Horizon
GCS Write Consensus Concurrency
group replication write concurrency
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
93 / 191
94. Event Horizon
GCS Write Consensus Concurrency
group replication write concurrency
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
94 / 191
95. Event Horizon
GCS Write Consensus Concurrency
group replication write concurrency
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
95 / 191
96. Event Horizon
GCS Write Consensus Concurrency
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
96 / 191
97. Event Horizon
GCS Write Consensus Concurrency
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
97 / 191
98. Event Horizon
GCS Write Consensus Concurrency
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
98 / 191
99. Event Horizon
GCS Write Consensus Concurrency
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
99 / 191
100. Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
100 / 191
102. Group Replication : Optimistic Locking
Group Replication uses optimistic locking
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
102 / 191
103. Group Replication : Optimistic Locking
Group Replication uses optimistic locking
during a transaction, local (InnoDB) locking happens
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
103 / 191
104. Group Replication : Optimistic Locking
Group Replication uses optimistic locking
during a transaction, local (InnoDB) locking happens
optimistically assumes there will be no conflicts across nodes
(no communication between nodes necessary)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
104 / 191
105. Group Replication : Optimistic Locking
Group Replication uses optimistic locking
during a transaction, local (InnoDB) locking happens
optimistically assumes there will be no conflicts across nodes
(no communication between nodes necessary)
cluster-wide conflict resolution happens only at COMMIT, during certification
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
105 / 191
106. Group Replication : Optimistic Locking
Group Replication uses optimistic locking
during a transaction, local (InnoDB) locking happens
optimistically assumes there will be no conflicts across nodes
(no communication between nodes necessary)
cluster-wide conflict resolution happens only at COMMIT, during certification
Let´s first have a look at the traditional locking to compare.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
106 / 191
119. Optimistic Locking
The system returns error 149 as certification failed:
ERROR 1180 (HY000): Got error 149 during COMMIT
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
119 / 191
120. Optimistic Locking (2)
The conflict detection can happen at two different stage:
Only one of the two transactions was sent to the Group and the other one is still running (local).
If both transaction were sent to the Group at almost the same time and both reached GCS/XCOM.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
120 / 191
121. Optimistic Locking (3)
Only one transaction was sent to GCS and the other conflicting one is still local:
In this case, it's the high priority transaction mechanism of InnoDB that kills the local one:
If you try any statement in the transaction's session you will see:
ERROR: 1213: Deadlock found when trying to get lock; try restarting transaction
If you try to commit the transaction you will see:
ERROR: 1180: Got error 149 - 'Lock deadlock; Retry transaction' during COMMIT
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
121 / 191
122. Optimistic Locking (4)
Both transactions were sent to GCS:
The second transaction (the conflicting one) will return:
ERROR 3101 (40000): Plugin instructed the server to rollback the current transaction.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
122 / 191
123. Such conflicts happen only when using multi-primary group !
not totally true in MySQL < 8.0.13 when failover happens
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
123 / 191
124. Drawbacks of optimistic locking
having a first-committer-wins system means conflicts will more likely happen when writing on multiple members with:
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
124 / 191
125. Drawbacks of optimistic locking
having a first-committer-wins system means conflicts will more likely happen when writing on multiple members with:
large transactions
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
125 / 191
126. Drawbacks of optimistic locking
having a first-committer-wins system means conflicts will more likely happen when writing on multiple members with:
large transactions
long running transactions
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
126 / 191
127. Drawbacks of optimistic locking
having a first-committer-wins system means conflicts will more likely happen when writing on multiple members with:
large transactions
long running transactions
hotspot records
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
127 / 191
128. Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
128 / 191
130. mysql> show variables like
'group_replication_consistency';
+-------------------------------+----------+
| Variable_name | Value |
+-------------------------------+----------+
| group_replication_consistency | EVENTUAL |
+-------------------------------+----------+
Consistency: EVENTUAL (default)
By default, there is no synchronization point for the transactions, when you perform a write on a node, if you immediately read the
same data on another node, it is eventually there.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
130 / 191
131. mysql> show variables like
'group_replication_consistency';
+-------------------------------+----------+
| Variable_name | Value |
+-------------------------------+----------+
| group_replication_consistency | EVENTUAL |
+-------------------------------+----------+
Consistency: EVENTUAL (default)
By default, there is no synchronization point for the transactions, when you perform a write on a node, if you immediately read the
same data on another node, it is eventually there.
Since MySQL 8.0.16, we have to possibility to set the
synchronization point at read or at write or both (globally or for a
session).
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
131 / 191
135. Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
135 / 191
136. can the transaction be committed ?
Certification
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
136 / 191
137. Certification
Certification is the process that only needs to answer the following unique question:
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
137 / 191
138. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
138 / 191
139. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
139 / 191
140. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
such conflicts must come for other members/nodes
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
140 / 191
141. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
such conflicts must come for other members/nodes
happens on every member/node and is deterministic
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
141 / 191
142. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
such conflicts must come for other members/nodes
happens on every member/node and is deterministic
results are not reported to the group (does not require a new communication step)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
142 / 191
143. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
such conflicts must come for other members/nodes
happens on every member/node and is deterministic
results are not reported to the group (does not require a new communication step)
pass: commit/queue to appy
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
143 / 191
144. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
such conflicts must come for other members/nodes
happens on every member/node and is deterministic
results are not reported to the group (does not require a new communication step)
pass: commit/queue to appy
fail: rollback/drop the transaction
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
144 / 191
145. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
such conflicts must come for other members/nodes
happens on every member/node and is deterministic
results are not reported to the group (does not require a new communication step)
pass: commit/queue to appy
fail: rollback/drop the transaction
serialized by the total order in GCS/XCOM + GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
145 / 191
146. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
such conflicts must come for other members/nodes
happens on every member/node and is deterministic
results are not reported to the group (does not require a new communication step)
pass: commit/queue to appy
fail: rollback/drop the transaction
serialized by the total order in GCS/XCOM + GTID
cost is based on trx size (# rows & # keys)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
146 / 191
148. Let's illustrate a table:
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
148 / 191
149. Now let's make a change
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
149 / 191
150. and at commit time:
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
150 / 191
151. Writesets
Contain the hash for the rows PKs that are changed and in some cases the hashes of foreign keys or others dependencies that
need to be captured (e.g. non NULL UKs). Writesets are gathered during transaction execution.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
151 / 191
152. Writesets
Contain the hash for the rows PKs that are changed and in some cases the hashes of foreign keys or others dependencies that
need to be captured (e.g. non NULL UKs). Writesets are gathered during transaction execution.
Writes
Called also write values, refers to the actual changes. Write values are also gathered during transaction execution.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
152 / 191
153. Writeset - examples
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
153 / 191
154. Writeset - examples
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t2 values (1,2);
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
154 / 191
155. Writeset - examples
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t2 values (1,2);
pke: PRIMARY | test | t2 | 1 | 1 hash: 11853456929268668462
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
155 / 191
156. Writeset - examples
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t2 values (1,2);
pke: PRIMARY | test | t2 | 1 | 1 hash: 11853456929268668462
mysql> update t2 set name=3 where id=1;
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
156 / 191
157. Writeset - examples
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t2 values (1,2);
pke: PRIMARY | test | t2 | 1 | 1 hash: 11853456929268668462
mysql> update t2 set name=3 where id=1;
pke: PRIMARY | test | t2 | 1 | 1 hash: 10002085147685770725
pke: PRIMARY | test | t2 | 1 | 1 hash: 10002085147685770725
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
157 / 191
158. Writeset - examples (2)
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | UNI | NULL | |
| name2 | binary(1) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
158 / 191
159. Writeset - examples (2)
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | UNI | NULL | |
| name2 | binary(1) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t3 values (1,2,3);
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
159 / 191
160. Writeset - examples (2)
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | UNI | NULL | |
| name2 | binary(1) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t3 values (1,2,3);
pke: PRIMARY | test |t3 | 1 | 1 hash: 79134815725924853
pke: name | test |t3 | 2 hash: 11034644986657565827
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
160 / 191
161. Writeset - examples (2)
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | UNI | NULL | |
| name2 | binary(1) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t3 values (1,2,3);
pke: PRIMARY | test |t3 | 1 | 1 hash: 79134815725924853
pke: name | test |t3 | 2 hash: 11034644986657565827
mysql> update t3 set name=3 where id=1;
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
161 / 191
162. Writeset - examples (2)
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | UNI | NULL | |
| name2 | binary(1) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t3 values (1,2,3);
pke: PRIMARY | test |t3 | 1 | 1 hash: 79134815725924853
pke: name | test |t3 | 2 hash: 11034644986657565827
mysql> update t3 set name=3 where id=1;
pke: PRIMARY | test | t3 | 1 | 1 hash: 79134815725924853
pke: name | test | t3 | 3 hash: 18082071075512932388
pke: PRIMARY | test | t3 | 1 | 1 hash: 79134815725924853
pke: name | test | t3 | 2 hash: 11034644986657565827
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
162 / 191
163. Writeset - examples (2)
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | UNI | NULL | |
| name2 | binary(1) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t3 values (1,2,3);
pke: PRIMARY | test |t3 | 1 | 1 hash: 79134815725924853
pke: name | test |t3 | 2 hash: 11034644986657565827
mysql> update t3 set name=3 where id=1;
pke: PRIMARY | test | t3 | 1 | 1 hash: 79134815725924853
pke: name | test | t3 | 3 hash: 18082071075512932388
pke: PRIMARY | test | t3 | 1 | 1 hash: 79134815725924853
pke: name | test | t3 | 2 hash: 11034644986657565827
[after image]
[before image]
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
163 / 191
165. Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
165 / 191
166. Enhanced support for large transactions
Message Fragmentation
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
166 / 191
167. Fragmenting an outgoing message
(1). If the message size exceeds the maximum size that the user allows
(group_replication_communication_max_message_size), the member fragments the message into chunks
that do not exceed the maximum size.
(2). The member broadcasts each chunk to the group, i.e. forwards each chunk individually to XCom.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
167 / 191
168. Reassembling an incoming message - first fragment
(2). The members conclude that the incoming message is actually a fragment of a bigger message.
(3). The members buffer the incoming fragment because they conclude the fragment is a chunk of a still-incomplete message.
(Fragments contain the necessary metadata to reach this conclusion.)
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
168 / 191
169. Reassembling an incoming message - second fragment
(5). Same as step 3.
(6). Same as step 4.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
169 / 191
170. Reassembling an incoming message - last fragment
1. The members conclude that the incoming message is actually a fragment of a bigger message.
2. The members conclude that the incoming fragment is the last chunk missing, reassemble the original, whole message, and
process it.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
170 / 191
171. Houston we have a problem !
Flow Control
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
171 / 191
172. Flow Control
In Group Replication, every member send statistics about its queues (applier queue and certification queue) to the other members.
Then every node decide to slow down or not if they realize that one node reached the threshold for one of the queue.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
172 / 191
173. Flow Control
In Group Replication, every member send statistics about its queues (applier queue and certification queue) to the other members.
Then every node decide to slow down or not if they realize that one node reached the threshold for one of the queue.
So when group_replication_ ow_control_mode is set to QUOTA on the node seeing that one of the other
members of the cluster is lagging behind (threshold reached), it will throttle the write operations to the a quota that is calculated
based on the number of transactions applied in the last second, and then it is reduced below that by subtracting the “over the
quota” messages from the last period.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
173 / 191
174. Flow Control
In Group Replication, every member send statistics about its queues (applier queue and certification queue) to the other members.
Then every node decide to slow down or not if they realize that one node reached the threshold for one of the queue.
So when group_replication_ ow_control_mode is set to QUOTA on the node seeing that one of the other
members of the cluster is lagging behind (threshold reached), it will throttle the write operations to the a quota that is calculated
based on the number of transactions applied in the last second, and then it is reduced below that by subtracting the “over the
quota” messages from the last period.
This mean that the threshold is NOT decided on the node being slow, but the node writing a transaction checks its threshold flow
control values and compare them to the statistics from the other nodes to decide to throttle or not.
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
174 / 191
175. Flow Control - on writer
>quota
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
175 / 191
176. Flow Control - on all members
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
176 / 191
180. begin;
update table1
set c = 999
where id =2;
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
180 / 191
181. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
181 / 191
182. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
clientblocksoncommit...
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
182 / 191
183. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
clientblocksoncommit...
writesets
+ gtid_event
+ write values
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
183 / 191
184. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
clientblocksoncommit...
writesets
+ gtid_event
+ write values
certify
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
184 / 191
185. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
clientblocksoncommit...
writesets
+ gtid_event
+ write values
certify
certify
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
185 / 191
186. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
clientblocksoncommit...
writesets
+ gtid_event
+ write values
certify
certify
certify
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
186 / 191
187. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
commit finalized
writesets
+ gtid_event
+ write values
certify
certify
certify
+ GTID
bin
log
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
187 / 191
188. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
commit finalized
writesets
+ gtid_event
+ write values
certify
certify
certify
+ GTID
bin
log
+ GTID
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
188 / 191
189. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
commit finalized
writesets
+ gtid_event
+ write values
certify
certify
certify
+ GTID
bin
log
+ GTID
+ GTIDrelay
log
relay
log
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
189 / 191
190. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
commit finalized
writesets
+ gtid_event
+ write values
certify
certify
certify
+ GTID
bin
log
+ GTID
+ GTIDrelay
log
relay
log
bin
log
bin
log
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
190 / 191
191. Thank you !
Any Questions ?
Copyright @ 2019 Oracle and/or its affiliates. All rights reserved.
191 / 191