3.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purpose only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied up in
making purchasing decisions. The development, release and timing of any features or
functionality described for Oracle´s product remains at the sole discretion of Oracle.
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
3 / 168
4. about.me/lefred
Who am I ?
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
4 / 168
6. Group Replication: heart of MySQL InnoDB
Cluster
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
6 / 168
7. Group Replication: heart of MySQL InnoDB
Cluster
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
7 / 168
8. MySQL Group Replication
but what is it ?!?
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
8 / 168
9. MySQL Group Replication
but what is it ?!?
GR is a plugin for MySQL, made by MySQL and packaged with MySQL
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
9 / 168
10. MySQL Group Replication
but what is it ?!?
GR is a plugin for MySQL, made by MySQL and packaged with MySQL
GR is an implementation of Replicated Database State Machine theory
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
10 / 168
11. MySQL Group Replication
but what is it ?!?
GR is a plugin for MySQL, made by MySQL and packaged with MySQL
GR is an implementation of Replicated Database State Machine theory
Paxos based protocol (our implementation is close to Mencius)
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
11 / 168
12. MySQL Group Replication
but what is it ?!?
GR is a plugin for MySQL, made by MySQL and packaged with MySQL
GR is an implementation of Replicated Database State Machine theory
Paxos based protocol (our implementation is close to Mencius)
GR allows to write on all Group Members (cluster nodes) simultaneously while
retaining consistency
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
12 / 168
13. MySQL Group Replication
but what is it ?!?
GR is a plugin for MySQL, made by MySQL and packaged with MySQL
GR is an implementation of Replicated Database State Machine theory
Paxos based protocol (our implementation is close to Mencius)
GR allows to write on all Group Members (cluster nodes) simultaneously while
retaining consistency
GR implements conflict detection and resolution
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
13 / 168
14. MySQL Group Replication
but what is it ?!?
GR is a plugin for MySQL, made by MySQL and packaged with MySQL
GR is an implementation of Replicated Database State Machine theory
Paxos based protocol (our implementation is close to Mencius)
GR allows to write on all Group Members (cluster nodes) simultaneously while
retaining consistency
GR implements conflict detection and resolution
GR allows automatic distributed recovery
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
14 / 168
15. MySQL Group Replication
but what is it ?!?
GR is a plugin for MySQL, made by MySQL and packaged with MySQL
GR is an implementation of Replicated Database State Machine theory
Paxos based protocol (our implementation is close to Mencius)
GR allows to write on all Group Members (cluster nodes) simultaneously while
retaining consistency
GR implements conflict detection and resolution
GR allows automatic distributed recovery
Supported on all MySQL platforms !!
Linux, Windows, Solaris, OSX, FreeBSD
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
15 / 168
17. Let's illustrate a table:
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
17 / 168
18. Now let's make a change
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
18 / 168
19. and at commit time:
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
19 / 168
20. Writesets
Contain the hash for the rows PKs that are changed and in some cases the hashes of
foreign keys or others dependencies that need to be captured (e.g. non NULL UKs).
Writesets are gathered during transaction execution.
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
20 / 168
21. Writesets
Contain the hash for the rows PKs that are changed and in some cases the hashes of
foreign keys or others dependencies that need to be captured (e.g. non NULL UKs).
Writesets are gathered during transaction execution.
Writes
Called also write values, refers to the actual changes. Write values are also gathered
during transaction execution.
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
21 / 168
22. Writeset - examples
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
22 / 168
23. Writeset - examples
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t2 values (1,2);
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
23 / 168
24. Writeset - examples
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t2 values (1,2);
pke: PRIMARY | test | t2 | 1 | 1 hash: 11853456929268668462
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
24 / 168
25. Writeset - examples
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t2 values (1,2);
pke: PRIMARY | test | t2 | 1 | 1 hash: 11853456929268668462
mysql> update t2 set name=3 where id=1;
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
25 / 168
26. Writeset - examples
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t2 values (1,2);
pke: PRIMARY | test | t2 | 1 | 1 hash: 11853456929268668462
mysql> update t2 set name=3 where id=1;
pke: PRIMARY | test | t2 | 1 | 1 hash: 10002085147685770725
pke: PRIMARY | test | t2 | 1 | 1 hash: 10002085147685770725
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
26 / 168
27. Writeset - examples (2)
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | UNI | NULL | |
| name2 | binary(1) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
27 / 168
28. Writeset - examples (2)
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | UNI | NULL | |
| name2 | binary(1) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t3 values (1,2,3);
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
28 / 168
29. Writeset - examples (2)
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | UNI | NULL | |
| name2 | binary(1) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t3 values (1,2,3);
pke: PRIMARY | test |t3 | 1 | 1 hash: 79134815725924853
pke: name | test |t3 | 2 hash: 11034644986657565827
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
29 / 168
30. Writeset - examples (2)
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | UNI | NULL | |
| name2 | binary(1) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t3 values (1,2,3);
pke: PRIMARY | test |t3 | 1 | 1 hash: 79134815725924853
pke: name | test |t3 | 2 hash: 11034644986657565827
mysql> update t3 set name=3 where id=1;
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
30 / 168
31. Writeset - examples (2)
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | UNI | NULL | |
| name2 | binary(1) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t3 values (1,2,3);
pke: PRIMARY | test |t3 | 1 | 1 hash: 79134815725924853
pke: name | test |t3 | 2 hash: 11034644986657565827
mysql> update t3 set name=3 where id=1;
pke: PRIMARY | test | t3 | 1 | 1 hash: 79134815725924853
pke: name | test | t3 | 3 hash: 18082071075512932388
pke: PRIMARY | test | t3 | 1 | 1 hash: 79134815725924853
pke: name | test | t3 | 2 hash: 11034644986657565827
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
31 / 168
32. Writeset - examples (2)
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | binary(1) | NO | PRI | NULL | |
| name | binary(2) | YES | UNI | NULL | |
| name2 | binary(1) | YES | | NULL | |
+-------+-----------+------+-----+---------+-------+
mysql> insert into t3 values (1,2,3);
pke: PRIMARY | test |t3 | 1 | 1 hash: 79134815725924853
pke: name | test |t3 | 2 hash: 11034644986657565827
mysql> update t3 set name=3 where id=1;
pke: PRIMARY | test | t3 | 1 | 1 hash: 79134815725924853
pke: name | test | t3 | 3 hash: 18082071075512932388
pke: PRIMARY | test | t3 | 1 | 1 hash: 79134815725924853
pke: name | test | t3 | 2 hash: 11034644986657565827
[after image]
[before image]
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
32 / 168
33. GR is nice, but how does it work ?
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
33 / 168
34. GR is nice, but how does it work ?
it´s just ...
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
34 / 168
35. GR is nice, but how does it work ?
it´s just ...
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
35 / 168
36. GR is nice, but how does it work ?
it´s just ...
... no, in fact the writesets replication is synchronous and then certification and apply of
the changes are local to each nodes and asynchronous.
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
36 / 168
37. GR is nice, but how does it work ?
it´s just ...
... no, in fact the writesets replication is synchronous and then certification and apply of
the changes are local to each nodes and asynchronous.
not that easy to understand... right ? As a picture is worth a 1000 words, let´s illustrate
this...
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
37 / 168
38. Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
38 / 168
51. MySQL Group Replication (full transaction)
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
51 / 168
52. MySQL Group Replication (full transaction)
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
52 / 168
53. MySQL Group Replication (full transaction)
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
53 / 168
54. MySQL Group Replication (full transaction)
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
54 / 168
55. MySQL Group Replication (full transaction)
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
55 / 168
56. MySQL Group Replication (full transaction)
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
56 / 168
57. MySQL Group Replication (full transaction)
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
57 / 168
58. MySQL Group Replication (full transaction)
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
58 / 168
59. MySQL Group Replication (full transaction)
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
59 / 168
60. MySQL Group Communication System (GCS)
MySQL Xcom protocol
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
60 / 168
61. MySQL Group Communication System (GCS)
MySQL Xcom protocol
Replicated Database State Machine
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
61 / 168
62. MySQL Group Communication System (GCS)
MySQL Xcom protocol
Replicated Database State Machine
Paxos based protocol
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
62 / 168
63. MySQL Group Communication System (GCS)
MySQL Xcom protocol
Replicated Database State Machine
Paxos based protocol
its task: deliver messages across the distributed system:
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
63 / 168
64. MySQL Group Communication System (GCS)
MySQL Xcom protocol
Replicated Database State Machine
Paxos based protocol
its task: deliver messages across the distributed system:
atomically
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
64 / 168
65. MySQL Group Communication System (GCS)
MySQL Xcom protocol
Replicated Database State Machine
Paxos based protocol
its task: deliver messages across the distributed system:
atomically
in Total Order
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
65 / 168
66. MySQL Group Communication System (GCS)
MySQL Xcom protocol
Replicated Database State Machine
Paxos based protocol
its task: deliver messages across the distributed system:
atomically
in Total Order
MySQL Group Replication receives the Ordered 'tickets' from this GCS subsystem.
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
66 / 168
68. How does Group Replication handle GTIDs ?
There are two ways of generating GTIDs:
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
68 / 168
69. How does Group Replication handle GTIDs ?
There are two ways of generating GTIDs:
AUTOMATIC: the transaction is assigned with an automatically generated id during
commit. Where regular replication uses the source server UUID, on Group Replication,
the group name is used.
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
69 / 168
70. How does Group Replication handle GTIDs ?
There are two ways of generating GTIDs:
AUTOMATIC: the transaction is assigned with an automatically generated id during
commit. Where regular replication uses the source server UUID, on Group Replication,
the group name is used.
ASSIGNED: the user assigns manually a GTID through SET GTID_NEXT to the
transaction. This is common to any replication format and the id is assigned before
the transaction starts.
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
70 / 168
71. Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
71 / 168
72. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
72 / 168
73. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
73 / 168
74. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
74 / 168
75. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
75 / 168
76. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
76 / 168
77. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
77 / 168
78. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
78 / 168
79. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
79 / 168
80. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
80 / 168
81. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
81 / 168
82. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
82 / 168
83. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
83 / 168
84. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
84 / 168
85. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
85 / 168
86. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
86 / 168
87. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
87 / 168
88. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
88 / 168
89. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
89 / 168
90. Group Replication : Total Order Delivery - GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
90 / 168
91. Group Replication : GTID
The previous example was not totally in sync with reality. In fact, a writer allocates a
block of GTID and when we have multiple writes (multi-primary mode) all writers will use
GTID sequence numbers in their allocated block.
The size of the block is defined by
group_replication_gtid_assignment_block_size (default to 1M)
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
91 / 168
92. Group Replication : GTID
Example:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
92 / 168
93. Group Replication : GTID
Example:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355
New write on an other node:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355,1000354
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
93 / 168
94. Group Replication : GTID
Example:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355
New write on an other node:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355,1000354
Let's write on the third node:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355:1000354:2000354
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
94 / 168
95. Group Replication : GTID
Example:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355
New write on an other node:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355,1000354
Let's write on the third node:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-355:1000354:2000354
And writing back on the first one:
Executed_Gtid_Set: 0b5c746d-d552-11e8-bef0-08002718d305:1-356:1000354:2000354
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
95 / 168
96. done !
Return from Commit
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
96 / 168
97. Group Replication: return from commit
Asynchronous Replication:
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
97 / 168
98. Group Replication: return from commit (2)
Semi-Sync Replication:
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
98 / 168
99. Group Replication: return from commit (3)
Group Replication:
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
99 / 168
100. Does this mean we can have a distant node and
always let it ack later ?
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
100 / 168
101. Does this mean we can have a distant node and
always let it ack later ?
NO!
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
101 / 168
102. Does this mean we can have a distant node and
always let it ack later ?
NO!
Because the system has to wait for the noop (single skip message) from the “distant”
node where latency is higher
The size of the GCS consensus messages window can be get and set from UDF functions:
group_replication_get_write_concurrency(), group_replication_set_write_concurrency()
mysql> select group_replication_get_write_concurrency();
+-------------------------------------------+
| group_replication_get_write_concurrency() |
+-------------------------------------------+
| 10 |
+-------------------------------------------+
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
102 / 168
103. Event Horizon
GCS Write Consensus Concurrency
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
103 / 168
104. Event Horizon
GCS Write Consensus Concurrency
group replication write concurrency
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
104 / 168
105. Event Horizon
GCS Write Consensus Concurrency
group replication write concurrency
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
105 / 168
106. Event Horizon
GCS Write Consensus Concurrency
group replication write concurrency
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
106 / 168
107. Event Horizon
GCS Write Consensus Concurrency
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
107 / 168
108. Event Horizon
GCS Write Consensus Concurrency
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
108 / 168
109. Event Horizon
GCS Write Consensus Concurrency
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
109 / 168
110. Event Horizon
GCS Write Consensus Concurrency
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
110 / 168
112. Group Replication : Optimistic Locking
Group Replication uses optimistic locking
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
112 / 168
113. Group Replication : Optimistic Locking
Group Replication uses optimistic locking
during a transaction, local (InnoDB) locking happens
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
113 / 168
114. Group Replication : Optimistic Locking
Group Replication uses optimistic locking
during a transaction, local (InnoDB) locking happens
optimistically assumes there will be no conflicts across nodes
(no communication between nodes necessary)
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
114 / 168
115. Group Replication : Optimistic Locking
Group Replication uses optimistic locking
during a transaction, local (InnoDB) locking happens
optimistically assumes there will be no conflicts across nodes
(no communication between nodes necessary)
cluster-wide conflict resolution happens only at COMMIT, during certification
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
115 / 168
116. Group Replication : Optimistic Locking
Group Replication uses optimistic locking
during a transaction, local (InnoDB) locking happens
optimistically assumes there will be no conflicts across nodes
(no communication between nodes necessary)
cluster-wide conflict resolution happens only at COMMIT, during certification
Let´s first have a look at the traditional locking to compare.
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
116 / 168
129. Optimistic Locking
The system returns error 149 as certification failed:
ERROR 1180 (HY000): Got error 149 during COMMIT
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
129 / 168
130. Such conflicts happen only when using multi-
primary group !
not totally true in MySQL < 8.0.13 when failover happens
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
130 / 168
131. Drawbacks of optimistic locking
having a first-committer-wins system means conflicts will more likely happen when
writing on multiple members with:
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
131 / 168
132. Drawbacks of optimistic locking
having a first-committer-wins system means conflicts will more likely happen when
writing on multiple members with:
large transactions
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
132 / 168
133. Drawbacks of optimistic locking
having a first-committer-wins system means conflicts will more likely happen when
writing on multiple members with:
large transactions
long running transactions
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
133 / 168
134. Drawbacks of optimistic locking
having a first-committer-wins system means conflicts will more likely happen when
writing on multiple members with:
large transactions
long running transactions
hotspot records
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
134 / 168
135. can the transaction be committed ?
Certification
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
135 / 168
136. Certification
Certification is the process that only needs to answer the following unique question:
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
136 / 168
137. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
137 / 168
138. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
138 / 168
139. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
such conflicts must come for other members/nodes
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
139 / 168
140. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
such conflicts must come for other members/nodes
happens on every member/node and is deterministic
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
140 / 168
141. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
such conflicts must come for other members/nodes
happens on every member/node and is deterministic
results are not reported to the group (does not require a new communication step)
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
141 / 168
142. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
such conflicts must come for other members/nodes
happens on every member/node and is deterministic
results are not reported to the group (does not require a new communication step)
pass: commit/queue to appy
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
142 / 168
143. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
such conflicts must come for other members/nodes
happens on every member/node and is deterministic
results are not reported to the group (does not require a new communication step)
pass: commit/queue to appy
fail: rollback/drop the transaction
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
143 / 168
144. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
such conflicts must come for other members/nodes
happens on every member/node and is deterministic
results are not reported to the group (does not require a new communication step)
pass: commit/queue to appy
fail: rollback/drop the transaction
serialized by the total order in GCS/XCOM + GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
144 / 168
145. Certification
Certification is the process that only needs to answer the following unique question:
can the write (transaction) be committed ?
based on yet to be applied transactions
such conflicts must come for other members/nodes
happens on every member/node and is deterministic
results are not reported to the group (does not require a new communication step)
pass: commit/queue to appy
fail: rollback/drop the transaction
serialized by the total order in GCS/XCOM + GTID
cost is based on trx size (# rows & # keys)
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
145 / 168
147. Houston we have a problem !
Flow Control
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
147 / 168
148. Flow Control
In Group Replication, every member send statistics about its queues (applier queue and
certification queue) to the other members. Then every node decide to slow down or not
if they realize that one node reached the threshold for one of the queue.
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
148 / 168
149. Flow Control
In Group Replication, every member send statistics about its queues (applier queue and
certification queue) to the other members. Then every node decide to slow down or not
if they realize that one node reached the threshold for one of the queue.
So when group_replication_ ow_control_mode is set to QUOTA on the
node seeing that one of the other members of the cluster is lagging behind (threshold
reached), it will throttle the write operations to the a quota that is calculated based on
the number of transactions applied in the last second, and then it is reduced below that
by subtracting the “over the quota” messages from the last period.
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
149 / 168
150. Flow Control
In Group Replication, every member send statistics about its queues (applier queue and
certification queue) to the other members. Then every node decide to slow down or not
if they realize that one node reached the threshold for one of the queue.
So when group_replication_ ow_control_mode is set to QUOTA on the
node seeing that one of the other members of the cluster is lagging behind (threshold
reached), it will throttle the write operations to the a quota that is calculated based on
the number of transactions applied in the last second, and then it is reduced below that
by subtracting the “over the quota” messages from the last period.
This mean that the threshold is NOT decided on the node being slow, but the node
writing a transaction checks its threshold flow control values and compare them to the
statistics from the other nodes to decide to throttle or not.
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
150 / 168
151. Flow Control - on writer
>quota
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
151 / 168
152. Flow Control - on all members
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
152 / 168
157. begin;
update table1
set c = 999
where id =2;
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
157 / 168
158. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
158 / 168
159. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
clientblocksoncommit...
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
159 / 168
160. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
clientblocksoncommit...
writesets
+ gtid_event
+ write values
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
160 / 168
161. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
clientblocksoncommit...
writesets
+ gtid_event
+ write values
certify
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
161 / 168
162. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
clientblocksoncommit...
writesets
+ gtid_event
+ write values
certify
certify
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
162 / 168
163. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
clientblocksoncommit...
writesets
+ gtid_event
+ write values
certify
certify
certify
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
163 / 168
164. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
commit finalized
writesets
+ gtid_event
+ write values
certify
certify
certify
+ GTID
bin
log
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
164 / 168
165. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
commit finalized
writesets
+ gtid_event
+ write values
certify
certify
certify
+ GTID
bin
log
+ GTID
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
165 / 168
166. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
commit finalized
writesets
+ gtid_event
+ write values
certify
certify
certify
+ GTID
bin
log
+ GTID
+ GTIDrelay
log
relay
log
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
166 / 168
167. begin;
update table1
set c = 999
where id =2;
update table1
set b = "eee"
where id = 3;
commit;
commit finalized
writesets
+ gtid_event
+ write values
certify
certify
certify
+ GTID
bin
log
+ GTID
+ GTIDrelay
log
relay
log
bin
log
bin
log
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
167 / 168
168. Thank you !
Any Questions ?
Copyright @ 2018 Oracle and/or its affiliates. All rights reserved.
168 / 168