Architecture & Pitfalls of Logical Replication
- 1. Copyright©2018 NTT Corp. All Rights Reserved.
Architecture & Pitfalls
of Logical Replication
NTT OSS Center
Atsushi Torikoshi
PGConf.US 2018
- 2. Who am I
➢Atsushi Torikoshi
➢@atorik_shi
➢torikoshi_atsushi_z2@lab.ntt.co.jp
➢NTT Open Source Software Center
➢PostgreSQL technical support
➢PostgreSQL performance verification
- 3. About NTT
• Who we are
– NTT (Nippon Telegraph and Telephone Corporation)
– Japanese telecommunications company
• What NTT OSS Center does
– Promotes the adoption of OSS by the group companies
• Total support
– support desk, introduction support, product maintenance
• R&D
– developing OSS and related tools with the communities
• Deals with about 60 OSS products
- 4. INDEX
• Background of Logical Replication
• Architecture and Behavior
• Pitfalls
• Summary
- 6. Physical Replication
PostgreSQL has had built-in Physical Replication since 2010.
It replicates a whole DB by sending WAL.
Suitable for load balancing and high availability.
(Diagram: the upstream sends its WAL to the downstream, which replays it into its tables.)
- 7. Logical Replication
Physical Replication cannot do things like:
• partial replication
• replication between different major versions of PostgreSQL
Logical Replication has added flexibility to built-in replication and made these things possible!
(Diagram: the upstream decodes its WAL and sends the changes; the downstream applies and writes them.)
- 8. Comparison between Logical and Physical Replication
• Way of replication
– Physical: sending and replaying all WAL
– Logical: decoding WAL and extracting changes
• Downstream DB
– Physical: copy of the upstream DB
– Logical: not necessarily the same as the upstream DB; upstream and downstream DBs can be different PostgreSQL versions
• Manipulations for the downstream DB
– Physical: SELECT only
– Logical: no restriction, but some manipulations may lead to conflict
• What is replicated
– Physical: ALL
– Logical: views, partition root tables, large objects and some manipulations including DDL are NOT replicated
- 9. Expected use cases of Logical Replication
Logical Replication enables flexible data replication.
1. Replicating partial data for analytical purposes
2. Consolidating multiple DBs into a single one
3. Online version upgrade
- 11. Basics of the architecture
• ‘walsender’ and ‘apply worker’ do most of the work for Logical Replication.
• ‘sync worker’ and its corresponding ‘walsender’ run only during initial table sync.
(Diagram: Publisher (upstream) with backend processes writing WAL and ‘walsender’ processes reading and decoding it; Subscriber (downstream) with a launcher, an ‘apply worker’, a ‘sync worker’ and a backend process, connected by “launch” arrows.)
- 12. Basics of the architecture ~replication
• ‘walsender’ reads WAL, decodes it, and sends the changes to the subscriber.
• ‘apply worker’ applies those changes.
(Diagram: on the publisher a backend process writes WAL, and ‘walsender’ reads it, decodes it and sends the changes; on the subscriber the ‘apply worker’ writes them to the tables.)
- 16. Basics of the architecture ~replication
• ‘walsender’ regroups the decoded changes by transaction.
• When a WAL record is an INSERT, UPDATE or DELETE, ‘walsender’ keeps the change in memory.
(Diagram: ‘walsender’ on the publisher 1. reads WAL, 2. decodes it, and 3. reassembles the INSERT, UPDATE and DELETE changes by transaction; nothing has been sent to the ‘apply worker’ on the subscriber yet.)
17
• When WAL is COMMIT, ‘walsender’ sends all
the changes for that transaction to
subscriber.
Basics of the architecture ~replication
:transaction
WAL
apply
worker
walsender
COMMIT
INSERT
UPDATE
UPDATE
DELETE
UPDATE
1. read WAL
2. decode
4. send
Publisher Subscriber
3. reassemble
by transaction
COMMIT
- 18. Basics of the architecture ~replication
• When a WAL record is a ROLLBACK, ‘walsender’ just throws away the changes for that transaction.
(Diagram: 1. read WAL, 2. decode, 3. reassemble by transaction, 4. clean up the rolled-back transaction.)
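The read, decode, reassemble, send-or-discard flow described on these slides can be sketched as a toy model. This is only an illustration, not PostgreSQL code: the record layout `(xid, action, payload)` and the function name `replay` are assumptions made for the example.

```python
from collections import defaultdict


def replay(wal_records):
    """Toy walsender: buffer changes per transaction and emit them at COMMIT."""
    in_memory = defaultdict(list)  # xid -> changes decoded so far
    sent = []                      # transactions handed to the apply worker
    for xid, action, payload in wal_records:
        if action in ("INSERT", "UPDATE", "DELETE"):
            # data changes are only buffered, nothing is sent yet
            in_memory[xid].append((action, payload))
        elif action == "COMMIT":
            # send all changes of the committed transaction at once
            sent.append((xid, in_memory.pop(xid, [])))
        elif action == "ROLLBACK":
            # throw away everything buffered for this transaction
            in_memory.pop(xid, None)
    return sent


# Two interleaved transactions: xid 1 commits, xid 2 rolls back.
wal = [
    (1, "INSERT", "a"),
    (2, "UPDATE", "b"),
    (1, "UPDATE", "c"),
    (2, "ROLLBACK", None),
    (1, "COMMIT", None),
]
sent = replay(wal)
```

Only the changes of xid 1 reach the subscriber, and only once its COMMIT record has been read.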
- 19. Initial table sync
• At initial table sync, COPY runs.
• COPY is done by a dedicated ‘walsender’ and a ‘sync worker’. These processes exit after COPY is done.
(Diagram: next to the regular ‘walsender’ and ‘apply worker’ pair, a second ‘walsender’ on the publisher feeds a ‘sync worker’ on the subscriber, which writes the tables with COPY.)
- 22. (Not) Conflict
• PostgreSQL doesn’t have merge agents for conflict resolution. If the same data is changed several times at about the same time, the last change applied is the one that remains.
(Diagram: both tables start as (1, ‘A’), (2, ‘B’).
1. Publisher: UPDATE table SET name = ‘X’ WHERE id = 2
2. Subscriber: UPDATE table SET name = ‘Y’ WHERE id = 2
3. The publisher's change is replicated, and both tables end as (1, ‘A’), (2, ‘X’).)
- 25. Conflict
• If applying replicated data causes an error on the subscriber side, replication stops.
(Diagram: INSERT INTO table VALUES (2); is run on both the publisher and the subscriber, whose tables then both hold ids 1 and 2. When the publisher's INSERT is replicated (3), it conflicts with the row that already exists on the subscriber (4).)
- 26. Conflict
• Users must resolve the conflict manually.
• After the conflict is resolved, replication resumes.
(Diagram: the same duplicate INSERT of id 2 on the publisher and the subscriber, followed by 3. replicate and 4. conflict.)
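The two cases above, a replicated UPDATE that silently overwrites local data and a replicated INSERT that stops replication, can be contrasted in a small toy model. The class `ToySubscriber` and its methods are hypothetical names for this sketch; it assumes a table keyed by `id`, and only mimics the behavior described on the slides.

```python
class ToySubscriber:
    """Toy apply worker for a table with primary key 'id' and a 'name' column."""

    def __init__(self, rows):
        self.rows = dict(rows)  # id -> name
        self.stopped = False

    def apply(self, op, row_id, name=None):
        """Apply one replicated change; return False if replication is stopped."""
        if self.stopped:
            return False
        if op == "INSERT":
            if row_id in self.rows:
                self.stopped = True  # duplicate key: error, replication stops
                return False
            self.rows[row_id] = name
        elif op == "UPDATE" and row_id in self.rows:
            self.rows[row_id] = name  # last applied change wins
        return True


# (Not) conflict: the replicated UPDATE simply overwrites the local one.
sub = ToySubscriber({1: "A", 2: "B"})
sub.rows[2] = "Y"              # local UPDATE on the subscriber
sub.apply("UPDATE", 2, "X")    # replicated UPDATE from the publisher
final_name = sub.rows[2]

# Conflict: the replicated INSERT hits an existing key and replication stops.
sub2 = ToySubscriber({1: None, 2: None})
first_try = sub2.apply("INSERT", 2)
# Manual resolution: remove the conflicting row, then let replication resume.
del sub2.rows[2]
sub2.stopped = False
second_try = sub2.apply("INSERT", 2)
```

In the second case the replicated INSERT is applied only after the conflicting row has been removed by hand, which mirrors the manual resolution the slide asks for.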
- 28. Q1. How does ‘walsender’ deal with WAL records which are NOT targets of replication?
- 30. 1. ‘walsender’ decodes most of the WAL
• behavior: ‘walsender’ decodes *all* of the changes to the target database, NOT just the changes to subscribed tables.
- 31. 1. ‘walsender’ decodes most of the WAL
• pitfall: Even changes to non-subscribed tables consume resources such as CPU and memory.
(Figure: perf visualization of ‘walsender’ while only non-subscribed tables are updated; DecodeDelete, DecodeInsert and DecodeCommit appear in the profile.)
- 32. 1. ‘walsender’ decodes most of the WAL
• lesson: ‘walsender’ consumes resources in proportion to the total amount of changes on the publisher database, NOT only the amount of changes on subscribed tables.
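A rough way to picture this lesson: decoding work is paid for every change in the database, and the subscription filter only decides what gets sent afterwards. The function `decode_and_filter` and its counters below are made up for illustration and do not correspond to PostgreSQL internals.

```python
def decode_and_filter(changes, published_tables):
    """Toy model: every change is decoded, only published tables are sent."""
    decoded = 0
    to_send = []
    for table, action in changes:
        decoded += 1  # decoding cost is paid for every change
        if table in published_tables:
            to_send.append((table, action))
    return decoded, to_send


# Only t1 is published, but the workload mostly touches t2.
changes = [("t1", "INSERT"), ("t2", "INSERT"), ("t2", "DELETE")]
decoded, to_send = decode_and_filter(changes, {"t1"})
```

Here three changes are decoded although a single one is sent, which is the shape of the overhead shown in the perf profile.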
- 35. 2. ‘walsender’ may consume a lot of memory
• behavior: ‘walsender’ keeps each change of a transaction in memory until COMMIT or ROLLBACK.
- 36. 2. ‘walsender’ may consume a lot of memory
• pitfall: It may cause ‘walsender’ to consume a lot of memory.
Type of manipulation, and the measure that limits memory use:
– many changes in one transaction: ‘walsender’ spills changes out to disk when the number of changes in one transaction exceeds 4096.
– changes which modify much data, many transactions, many savepoints: there is no feature to avoid using memory.
※ Patches changing this behavior are under discussion.
- 37. 2. ‘walsender’ may consume a lot of memory
• lesson: If possible, avoid the manipulations for which there is no measure to limit memory use. Monitoring memory usage on the publisher may be a good idea.
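The following toy buffer illustrates why a threshold based on the number of changes helps with many small changes but not with a few very large ones. The 4096 figure is the one quoted on the slide; the class `ToyTxnBuffer`, the byte sizes, and the exact comparison used are assumptions for this sketch.

```python
MAX_CHANGES_IN_MEMORY = 4096  # per-transaction threshold quoted on the slide


class ToyTxnBuffer:
    """Toy per-transaction buffer that spills to disk by change count."""

    def __init__(self):
        self.sizes = []           # byte size of each change still in memory
        self.spilled_changes = 0  # changes written out to disk so far

    def add(self, size_bytes):
        self.sizes.append(size_bytes)
        # Only the *number* of changes is checked, never their total size.
        if len(self.sizes) > MAX_CHANGES_IN_MEMORY:
            self.spilled_changes += len(self.sizes)
            self.sizes.clear()

    @property
    def memory_bytes(self):
        return sum(self.sizes)


# Many small changes: the buffer spills once the count exceeds the threshold.
many_small = ToyTxnBuffer()
for _ in range(5000):
    many_small.add(100)

# A few huge changes: the count stays low, so everything stays in memory.
few_large = ToyTxnBuffer()
for _ in range(10):
    few_large.add(100_000_000)
```

In this model the ten large changes hold about 1 GB in memory without ever triggering a spill, while the 5000 small ones are mostly written out.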
- 40. 3. The response time may be quite long
• behavior: Under synchronous replication, the publisher waits for the COMMIT responses from all the subscribers before replying to the client.
(Diagram: (1) the client sends BEGIN; INSERT INTO Table1 VALUES (‘a’); COMMIT; to the publisher, (2) the transaction is sent to the subscriber, (3) the subscriber responds, (4) the publisher replies to the client.)
- 41. 3. The response time may be quite long
• pitfall: Under synchronous replication, the publisher waits for COMMIT responses from all the subscribers, even those that receive no changes.
(Diagram: the client inserts into Table1 only. Subscriber1, which has table 1, receives the whole transaction. Subscriber2, which has only table 2, is sent only BEGIN and COMMIT, yet the publisher still waits for its response (3) before replying to the client (4).)
- 42. 3. The response time may be quite long
• lesson: The response time to clients depends on the slowest subscriber.
• Also, as we saw in Q2, ‘walsender’ sends changes to the ‘apply worker’ only after COMMIT, which also tends to make the response time longer.
• It may also be worth confirming that you really need synchronous replication.
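As a back-of-the-envelope model of this lesson, the client-visible commit time can be approximated as the local commit time plus the acknowledgement time of the slowest subscriber. The function name, the subscriber names and the millisecond figures are invented for the example, and the additive model is a simplification.

```python
def commit_response_ms(local_commit_ms, subscriber_ack_ms):
    """Toy model: the publisher replies only after every subscriber has acked."""
    slowest = max(subscriber_ack_ms.values(), default=0)
    return local_commit_ms + slowest


# subscriber2 receives only BEGIN and COMMIT, but it is slow to respond.
acks = {"subscriber1": 15, "subscriber2": 80}
total = commit_response_ms(2, acks)
without_sync = commit_response_ms(2, {})
```

Under these assumed numbers the subscriber that receives no data still dominates the response time.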
- 43. Q4. Is the way to monitor the status of replication the same as for Physical Replication?
- 45. 4. pg_stat_replication might not be enough
• behavior: Initial table sync is done by dedicated processes: a ‘sync worker’ and a ‘walsender’.
(Diagram: same as the “Initial table sync” slide; a dedicated ‘walsender’ on the publisher feeds the ‘sync worker’ on the subscriber, which writes the tables with COPY.)
- 46. 4. pg_stat_replication might not be enough
• pitfall: Even if the ‘sync worker’ failed to start and nothing has been replicated yet, pg_stat_replication.state is ‘streaming’.
- 47. 4. pg_stat_replication might not be enough
• lesson: We should also monitor pg_subscription_rel and check that ‘srsubstate’ is ‘r’, meaning ready.
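A monitoring check along these lines could look like the sketch below. The query string reads the `pg_subscription_rel` catalog on the subscriber; how it is executed (driver, connection) is left out on purpose, and the helper `tables_not_ready` with its sample rows is hypothetical. The state letters follow the PostgreSQL 10 documentation; later versions may define additional states.

```python
# Query to run on the subscriber; execution is left to the monitoring tool.
NOT_READY_QUERY = """
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
WHERE srsubstate <> 'r'
"""

# Meaning of srsubstate as documented for PostgreSQL 10.
STATE_LABELS = {
    "i": "initialize",
    "d": "data is being copied",
    "s": "synchronized",
    "r": "ready",
}


def tables_not_ready(rows):
    """Return (table, label) for every table whose state is not 'r'."""
    return [
        (table, STATE_LABELS.get(state, "unknown"))
        for table, state in rows
        if state != "r"
    ]


# Sample rows as (table_name, srsubstate); the values are made up.
sample = [("t1", "r"), ("t2", "d"), ("t3", "i")]
alerts = tables_not_ready(sample)
```

An empty result means every subscribed table has finished its initial sync, which pg_stat_replication alone cannot tell us.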
- 49. A5. We can use pg_replication_origin_advance(), but it may skip some data.
- 52. 5. pg_replication_origin_advance() may skip data
• behavior: pg_replication_origin_advance() lets us set the LSN up to which data is considered to have been replicated.
(Diagram: a remote LSN axis marked 10 and 20, a “Conflict point”, and the call pg_replication_origin_advance(‘node_name’, 20) moving the current position, marked “Here”.)
- 53. 5. pg_replication_origin_advance() may skip data
• pitfall: If there are changes on the publisher after the conflict, calling pg_replication_origin_advance() with the publisher's current WAL LSN skips applying those changes too.
(Diagram: on the remote LSN axis marked 10 and 20, an INSERT and an UPDATE follow the conflict point; pg_replication_origin_advance(‘node_name’, 20) jumps past them.)
- 54. 5. pg_replication_origin_advance() may skip data
• lesson: Changing the conflicting data on the subscriber is usually the better choice.
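The data-loss risk can be illustrated with small integer LSNs like those on the slides. This is a toy model only: `advance_origin` is a hypothetical helper, real LSNs are not small integers, and the exact boundary handling in PostgreSQL may differ from the `<=` used here.

```python
def advance_origin(pending_changes, new_lsn):
    """Toy model: changes at or before new_lsn are treated as already applied."""
    skipped = [c for c in pending_changes if c[0] <= new_lsn]
    remaining = [c for c in pending_changes if c[0] > new_lsn]
    return skipped, remaining


# Pending changes as (lsn, description); the conflict is the first one.
pending = [(12, "INSERT (conflicts)"), (15, "INSERT"), (18, "UPDATE")]

# Advancing to the publisher's current LSN (20) drops the later changes too.
skipped_to_20, left_after_20 = advance_origin(pending, 20)

# Advancing just past the conflicting change keeps the others.
skipped_to_12, left_after_12 = advance_origin(pending, 12)
```

The difference between the two calls is exactly the INSERT and UPDATE that the slide warns about.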
- 56. A6. Backing up a DB under Logical Replication may need additional procedures.
- 57. 6. Logical Replication may need additional procedure
• behavior: pg_dump doesn't back up pg_subscription_rel, which keeps the state of initial table sync.
- 61. 6. Logical Replication may need additional procedure
• pitfall: Restoring data backed up by pg_dump on a subscriber triggers initial table synchronization again. This usually stops replication with a duplicate key error.
(Diagram: (1) the pg_dump backup is restored on the subscriber, (2) replication copies the same tables from the publisher, (3) conflict.)
- 62. 6. Logical Replication may need additional procedure
• lesson: We can avoid this resync by refreshing the subscription with ‘copy_data = false’.
But if a subscription has tables which have not completed the initial sync, we need more work.
It's better to consider carefully what data is really necessary and how to prevent data loss.
In some cases it may be better to start replication from scratch.
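The effect of ‘copy_data = false’ can be sketched with a toy model of the refresh step. The subscription name `mysub`, the function `refresh_publication` and the dictionary standing in for pg_subscription_rel are assumptions for illustration; the model only mirrors the documented rule that newly added tables are copied unless copy_data is false.

```python
# Statement to run on the subscriber; 'mysub' is a placeholder name.
REFRESH_SQL = "ALTER SUBSCRIPTION mysub REFRESH PUBLICATION WITH (copy_data = false);"


def refresh_publication(published_tables, subscription_rel, copy_data):
    """Toy refresh: register unknown tables and return those to be copied."""
    to_copy = []
    for table in published_tables:
        if table in subscription_rel:
            continue  # sync state already known, nothing to do
        if copy_data:
            subscription_rel[table] = "i"  # initial sync starts over
            to_copy.append(table)
        else:
            subscription_rel[table] = "r"  # marked ready without copying
    return to_copy


# After a restore, the sync state is gone even though the rows are present.
state_default = {}
copied_default = refresh_publication(["t1", "t2"], state_default, copy_data=True)

state_no_copy = {}
copied_no_copy = refresh_publication(["t1", "t2"], state_no_copy, copy_data=False)
```

With copy_data left at its default, both restored tables would be copied again on top of the existing rows, which is where the duplicate key errors come from. Tables that had not finished their initial sync before the backup are not handled by this sketch and need the extra work mentioned above.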
- 64. How should we manage Logical Replication?
Design
Take into account some counterintuitive behaviors which affect performance.
• ‘walsender’ keeps changes in memory.
• In synchronous replication, the publisher waits for COMMIT from all the subscribers, even those which have no changes.
• Changes on non-subscribed tables are also decoded.
- 65. How should we manage Logical Replication?
Monitoring
• Monitor memory usage on the publisher.
• Monitor not only pg_stat_replication but also pg_subscription_rel.
- 66. How should we manage Logical Replication?
Operation
• pg_replication_origin_advance() may skip some data.
• Backup and restore need some extra procedures. It's better to consider carefully what data is really necessary and how to prevent data loss.