PostgreSQL is one of the most advanced relational databases, and it offers superb replication capabilities. The most important features are streaming replication, Point-In-Time-Recovery, and advanced monitoring.
3. What you will learn
How PostgreSQL writes data
What the transaction log does
How to set up streaming replication
How to handle Point-In-Time-Recovery
Managing conflicts
Monitoring replication
More advanced techniques
Hans-Jürgen Schönig
www.postgresql-support.de
5. Writing a row of data
Understanding how PostgreSQL writes data is key to
understanding replication
Vital to understand PITR
A lot of potential to tune the system
6. Write the log first (1)
It is not possible to send data to a data table directly.
What if the system crashes during a write?
A data file could end up with broken data at potentially
unknown positions
Corruption is not an option
7. Write the log first (2)
Data goes to the xlog (= WAL) first
WAL is short for “Write Ahead Log”
IMPORTANT: The xlog DOES NOT contain SQL
It contains BINARY changes
8. The xlog
The xlog consists of a set of 16 MB files
The xlog consists of various types of records (heap changes,
btree changes, etc.)
It has to be flushed to disk on commit to achieve durability
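To see where the server currently is in the xlog, the write position and the segment file it falls into can be queried. This is a minimal sketch, assuming the 9.x-era function names that match the rest of this deck (PostgreSQL 10 renamed them to pg_current_wal_lsn() and pg_walfile_name()); the column aliases are only illustrative.

```sql
-- Current write position in the xlog and the 16 MB segment file containing it
SELECT pg_current_xlog_location() AS xlog_position,
       pg_xlogfile_name(pg_current_xlog_location()) AS segment_file;
```

Running the query twice around a large write should show the position advancing, and eventually a new segment file name.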
9. Expert tip: Debugging the xlog
Change WAL_DEBUG in src/include/pg_config_manual.h
Recompile PostgreSQL
NOTE: This is not for normal use but just for training purposes
10. Enabling wal_debug
test=# SET wal_debug TO on;
SET
test=# SET client_min_messages TO debug;
SET
11. Observing changes
Every change will go to the screen now
It helps to understand how PostgreSQL works
Apart from debugging: The practical purpose is limited
12. Making changes
Data goes to the xlog first
Then data is put into shared buffers
At some later point data is written to the data files
This does not happen instantly leaving a lot of room for
optimization and tuning
13. A consistent view of the data
Data is not sent to those data files instantly.
Still: End users will have a consistent view of the data
When a query comes in, it checks the I/O cache (=
shared buffers) and asks the data files only in case of a cache
miss.
xlog is about the physical not about the logical level
14. Sustaining writes
We cannot write to the xlog forever without recycling it.
The xlog is recycled during a so called “checkpoint”.
Before the xlog can be recycled, data must be stored safely in
those data files
Checkpoints have a huge impact on performance
16. Checkpointing too frequently
Checkpointing is expensive
PostgreSQL warns about too frequent checkpoints
This is what checkpoint_warning is good for
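As a rough illustration, the warning threshold lives in postgresql.conf. The fragment below is a sketch with illustrative values, not a tuning recommendation; log_checkpoints is an extra setting not mentioned on the slide, added here only because it makes checkpoint frequency visible in the log.

```conf
# postgresql.conf
checkpoint_warning = 30s   # warn if xlog-triggered checkpoints happen closer together than this
log_checkpoints = on       # log every checkpoint, including what triggered it
```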
17. min_wal_size and max_wal_size (1)
This is a replacement for checkpoint_segments
Now the xlog size is auto-tuned
The new configuration was introduced in PostgreSQL 9.5
18. min_wal_size and max_wal_size (2)
Instead of having a single knob (checkpoint_segments) that both
triggers checkpoints and determines how many segments to
recycle, they are now separate concerns. There is still an internal
variable called CheckpointSegments, which triggers checkpoints.
But it no longer determines how many segments to recycle at a
checkpoint. That is now auto-tuned by keeping a moving average
of the distance between checkpoints (in bytes), and trying to keep
that many segments in reserve.
19. min_wal_size and max_wal_size (3)
The advantage of this is that you can set max_wal_size very high,
but the system won’t actually consume that much space if there
isn’t any need for it. The min_wal_size sets a floor for that; you
can effectively disable the auto-tuning behavior by setting
min_wal_size equal to max_wal_size.
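A minimal sketch of the two settings in postgresql.conf, assuming PostgreSQL 9.5 or later. The values are illustrative only and should be sized to the write load and the available disk space.

```conf
# postgresql.conf (PostgreSQL 9.5 or later)
min_wal_size = 80MB   # floor: keep at least this much xlog around for recycling
max_wal_size = 1GB    # soft limit: checkpoints are triggered as the xlog approaches this size
```

Setting both parameters to the same value would pin the amount of recycled xlog, which is the "disable auto-tuning" case described above.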
20. How does it impact replication
The xlog has all the changes needed and can therefore be
used for replication.
Copying data files is not enough to achieve a consistent view
of the data
It has some implications related to base backups
21. Setting up streaming replication
22. The basic process
S: Install PostgreSQL on the slave (no initdb)
M: Adapt postgresql.conf
M: Adapt pg_hba.conf
M: Restart PostgreSQL
S: Pull a base backup
S: Start the slave
23. Changing postgresql.conf
wal_level: Ensures that the master generates enough xlog
(recovering a server needs more xlog than simple crash
safety does)
max_wal_senders: A streaming slave connects to the master and
fetches xlog through a WAL sender. A base backup will also
need one or two WAL senders
hot_standby: Not needed on the master, where it is ignored,
but setting it now saves some work on the slave later on
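The three settings above could look like this on the master. This is a sketch for the 9.x releases this deck targets; the number of WAL senders is an illustrative assumption.

```conf
# postgresql.conf on the master
wal_level = hot_standby   # PostgreSQL 9.6 and later call this level "replica"
max_wal_senders = 5       # illustrative: one per streaming slave plus headroom for base backups
hot_standby = on          # ignored on the master; takes effect on the slave
```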
24. Changing pg_hba.conf
Rules for replication have to be added.
Note that “all” databases does not include replication
A separate rule has to be added, which explicitly states
“replication” in the second column
Replication rules work just like any other pg_hba.conf rule
Remember: The first matching line wins
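A replication rule might look like the following sketch. The user name repuser, the network range, and the md5 method are hypothetical placeholders, not values taken from this deck.

```conf
# pg_hba.conf on the master
# TYPE  DATABASE     USER     ADDRESS          METHOD
host    replication  repuser  192.168.0.0/24   md5
```

Because the first matching line wins, a rule like this has to appear before any broader rule that would reject the same connection.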
25. Restarting PostgreSQL
To activate those settings in postgresql.conf the master has to
be restarted.
If only pg_hba.conf is changed, a simple SIGHUP (pg_ctl
reload) is enough.
26. Using pg_basebackup (1)
pg_basebackup will fetch a copy of the data from the master
While pg_basebackup is running, the master is fully
operational (no downtime needed)
pg_basebackup connects through a database connection and
copies all data files as they are
In most cases this does not create a consistent backup
The xlog is needed to “repair” the base backup (this is exactly
what happens during xlog replay anyway)
28. xlog-method: Self-contained backups
By default a base backup is not self-contained.
The database does not start up without additional xlog.
This is fine for Point-In-Time-Recovery because there is an
archive around.
For streaming it can be a problem.
--xlog-method=stream opens a second connection to fetch
xlog during the base backup
29. checkpoint=fast: Instant backups
By default pg_basebackup starts as soon as the master
checkpoints.
This can take a while.
--checkpoint=fast makes the master checkpoint immediately.
In case of a small backup an immediate checkpoint speeds things
up.
30. -R: Generating a config file
For a simple streaming setup all PostgreSQL has to know is
already passed to pg_basebackup (host, port, etc.).
-R automatically generates a recovery.conf file, which is quite
ok in most cases.
31. Backup throttling
--max-rate=RATE: maximum rate at which the data directory is
transferred (in kB/s, or use suffix “k” or “M”)
If your master is weak, a pg_basebackup running at full speed
can lead to high response times and disk wait.
Slowing down the backup can help to make sure the master
stays responsive.
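Putting the pg_basebackup options from the preceding slides together, a complete call could look like the sketch below. The host name, the user repuser, the target directory, and the rate are hypothetical placeholders; the flag spelling is the 9.x one used in this deck (PostgreSQL 10 renamed --xlog-method to --wal-method).

```shell
# Run on the slave, into an empty data directory.
#   --xlog-method=stream  fetch xlog over a second connection (self-contained backup)
#   --checkpoint=fast     do not wait for the next regular checkpoint
#   --max-rate=20M        throttle the transfer to protect a weak master
#   -R                    write a basic recovery.conf
pg_basebackup -h master.example.com -U repuser \
    -D /var/lib/pgsql/data \
    --xlog-method=stream \
    --checkpoint=fast \
    --max-rate=20M \
    -R
```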
32. Adjusting recovery.conf
A basic setup needs:
primary_conninfo: A connect string pointing to the master
server
standby_mode = on: Tells the system to run as a standby and keep fetching xlog
Additional configuration parameters are available
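A basic recovery.conf along these lines might read as follows. The host, port, and user are hypothetical; the file applies to the 9.x releases covered here (PostgreSQL 12 removed recovery.conf and moved these settings elsewhere).

```conf
# recovery.conf on the slave
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=repuser'
```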
33. Starting up the slave
Make sure the slave has connected to the master
Make sure it has reached a consistent state
Check for wal sender and wal receiver processes
34. Promoting a slave to master
Promoting a slave to a master is easy:
pg_ctl -D ... promote
After promotion recovery.conf will be renamed to
recovery.done
35. One word about security
So far replication has been done as superuser
This is not necessary
Creating a user, which can do just replication makes sense
CREATE ROLE foo ... REPLICATION ... NOSUPERUSER;
37. Simple checks
The most basic and most simplistic check is to check for
wal sender (on the master)
wal receiver (on the slave)
Without those processes the party is over
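One simple way to run this check is to look at the process list. The sketch below assumes the 9.x process titles ("wal sender process", "wal receiver process"); later releases spell them differently.

```shell
# On the master: expect one line per connected slave.
# The [w] keeps grep from matching its own command line.
ps ax | grep "[w]al sender"

# On the slave: expect a single wal receiver line.
ps ax | grep "[w]al receiver"
```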
38. More detailed analysis
pg_stat_replication contains a lot of information
Make sure an entry for each slave is there
Check for replication lag
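A query along these lines shows one row per connected slave. It is a sketch using the 9.x column names (the *_location columns were renamed to *_lsn in PostgreSQL 10).

```sql
-- Run on the master: one row per WAL sender, i.e. per connected slave
SELECT application_name, client_addr, state, sync_state,
       sent_location, write_location, flush_location, replay_location
FROM pg_stat_replication;
```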
39. Checking for replication lag
A sustained lag is not a good idea.
The distance between the sender and the receiver can be
measured in bytes
SELECT client_addr,
       pg_xlog_location_diff(pg_current_xlog_location(),
                             sent_location)
FROM pg_stat_replication;
In asynchronous replication the replication lag can vary
dramatically (for example during CREATE INDEX, etc.)
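The same idea can be applied to the replay position, which tells how far behind the slave's visible data is, and lag can also be estimated as time on the slave itself. Both queries are sketches using 9.x function names; the column aliases are illustrative.

```sql
-- On the master: bytes generated but not yet replayed on each slave
SELECT client_addr,
       pg_xlog_location_diff(pg_current_xlog_location(),
                             replay_location) AS replay_lag_bytes
FROM pg_stat_replication;

-- On the slave: time since the last replayed transaction was committed
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;
```

Note that the time-based value keeps growing while the master is idle, so it overstates lag when nothing is being written.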
41. Handling more than 2 nodes
A simple 2 node cluster is easy.
In case of more than 2 servers, life is a bit harder.
If you have two slaves and the master fails: Who is going to
be the new master?
Unless you want to resync all your data, you should
promote the server that has already received the most xlog
Comparing xlog positions is necessary
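One possible way to compare positions is to ask every surviving slave how far it has come and then diff the results. This is a sketch with 9.x function names; the two positions in the second query are hypothetical values standing in for what two slaves reported.

```sql
-- Run on every surviving slave
SELECT pg_last_xlog_receive_location() AS received,
       pg_last_xlog_replay_location()  AS replayed;

-- Compare two reported positions; a positive result means the first one is ahead
SELECT pg_xlog_location_diff('0/3000060', '0/3000000');
```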
42. Timeline issues
When a slave is promoted the timeline ID is incremented
Master and slave have to be in the same timeline
With two surviving slaves it is important to connect one slave to
the other first and do the promotion AFTERWARDS.
This ensures that the timeline switch is already replicated
from the new master to the surviving slave.
43. Cascading slaves
Slaves can be connected to slaves
Cascading can make sense to reduce bandwidth requirements
Cascading can take load from the master
Use pg_basebackup to fetch data from a slave as if it were a
master
45. How conflicts happen
During replication conflicts can happen
Example: The master might want to remove a row still visible
to a reading transaction on the slave
46. What happens during a conflict
PostgreSQL will cancel the conflicting query, or terminate its
connection, after some time
max_standby_archive_delay = 30s
max_standby_streaming_delay = 30s
Those settings define the maximum time the slave delays
replay before conflicting queries are cancelled and replay resumes.
In rare cases a connection might be aborted quite soon.
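To find out how often queries on a slave have actually been cancelled, the conflict statistics can be inspected. This is a sketch; the view is only meaningful when queried on the slave.

```sql
-- Run on the slave: number of cancelled queries per database, by conflict type
SELECT datname, confl_snapshot, confl_lock, confl_bufferpin,
       confl_tablespace, confl_deadlock
FROM pg_stat_database_conflicts;
```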
47. Reducing conflicts
Conflicts can be reduced nicely by setting
hot_standby_feedback.
The slave will send its oldest transaction ID to tell the master
that cleanup has to be deferred.
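The setting belongs on the slave, as in this small sketch:

```conf
# postgresql.conf on the slave
hot_standby_feedback = on
```

The trade-off is that long-running queries on the slave can now delay cleanup on the master, so table bloat there is worth watching.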
49. What happens if a slave reboots?
If a slave is gone for too long, the master might recycle its
transaction log
The slave needs a full history of the xlog
Setting wal_keep_segments on the master helps to prevent the
master from recycling transaction log too early
I recommend always setting wal_keep_segments to make sure
that a slave can be started after a pg_basebackup
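A sketch of the setting, with an illustrative value that should be sized to how long a slave may be away and how much xlog the master produces in that time:

```conf
# postgresql.conf on the master
wal_keep_segments = 64   # illustrative: 64 segments * 16 MB = 1 GB of xlog kept
```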
50. Making use of replication slots
Replication slots have been added in PostgreSQL 9.4
There are two types of replication slots:
Physical replication slots (for streaming)
Logical replication slots (for logical decoding)
51. Configuring replication slots
Change max_replication_slots and restart the master
Run . . .
test=# SELECT *
FROM pg_create_physical_replication_slot('some_name');
slot_name | xlog_position
-----------+---------------
some_name |
(1 row)
52. Tweaking the slave
Add this replication slot to primary_slot_name on the slave:
primary_slot_name = 'some_name'
The master will ensure that xlog is only recycled when it has
been consumed by the slave.
53. A word of caution
If a slave is removed make sure the replication slot is dropped.
Otherwise the master might run out of disk space.
NEVER use replication slots without monitoring the size of
the xlog on the sender.
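Monitoring can be as simple as checking how much xlog each slot holds back. The sketch below uses 9.x function names and an illustrative column alias; the slot name some_name is the example name used in this deck.

```sql
-- On the master: how much xlog each slot is forcing the server to keep
SELECT slot_name, active,
       pg_xlog_location_diff(pg_current_xlog_location(),
                             restart_lsn) AS retained_bytes
FROM pg_replication_slots;

-- Drop the slot of a slave that has been removed for good
SELECT pg_drop_replication_slot('some_name');
```

An inactive slot with a steadily growing retained_bytes value is the typical sign of a forgotten slave.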
54. Key advantages of replication slots
The slave can fall arbitrarily far behind the master.
During bulk load or CREATE INDEX this can be essential.
It can help to overcome the problems caused by slow networks.
It can help to avoid resyncs.
56. Synchronous vs. asynchronous
Asynchronous replication: Commits on the slave can happen
long after the commit on the master.
Synchronous replication: A transaction has to be written to a
second server.
Synchronous replication potentially adds some network
latency to every commit
57. The application_name
During normal operations the application_name setting can be
used to assign a name to a database connection.
In case of synchronous replication this variable is used to
determine synchronous candidates.
58. Configuring synchronous replication:
Master:
add names to synchronous_standby_names
Slave:
add an application_name to your connect string in
primary_conninfo
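The two sides have to agree on the name. In the sketch below the name slave1, the host, and the user are hypothetical placeholders.

```conf
# postgresql.conf on the master
synchronous_standby_names = 'slave1'

# recovery.conf on the slave
primary_conninfo = 'host=master.example.com user=repuser application_name=slave1'
```

Whether a slave is really treated as synchronous can be verified in the sync_state column of pg_stat_replication on the master.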
59. Fail safety
Synchronous replication needs 2 active servers
If fewer than two servers are left, commits will wait until a second
server is available.
Use AT LEAST 3 servers for synchronous replication to avoid
risk.
61. What it does
PITR can be used to reach (almost) any point after a base
backup.
It is more of a backup strategy than a replication thing.
Replication and PITR can be combined.
62. Configuring for PITR
S: create an archive (ideally this is not on the master)
M: Change postgresql.conf
set wal_level
set max_wal_senders (if pg_basebackup is desired)
set archive_mode to on
set a proper archive_command to archive xlog
M: adapt pg_hba.conf (if pg_basebackup is desired)
M: restart the master
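The master-side settings for archiving could look like the sketch below. The /archive directory is a hypothetical path, ideally a mount that physically lives on another machine; the cp-based command is only one simple possibility.

```conf
# postgresql.conf on the master
wal_level = archive       # or hot_standby; PostgreSQL 9.6 and later use "replica"
max_wal_senders = 3       # only needed if pg_basebackup is used
archive_mode = on
# %p = path of the xlog file to archive, %f = its file name.
# Refuse to overwrite a file that already exists in the archive.
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
```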
63. pg_basebackup, etc.
Run pg_basebackup as before
--xlog-method=stream and -R are not needed
In the archive a .backup file will be available after
pg_basebackup
You can delete all xlog files older than the oldest base backup
you want to keep.
The .backup file will guide you
64. Restoring from a crash
Take a base backup (one made before the point you want to recover to).
Write a recovery.conf file:
restore_command: Tell PostgreSQL where to find xlog
recovery_target_time (optional): Use a timestamp to tell the
system how far to recover
Start the server
Make sure the system has reached consistent state
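A recovery.conf for this procedure might look like the following sketch. The archive path and the timestamp are hypothetical placeholders.

```conf
# recovery.conf in the restored data directory
# %f = name of the xlog file requested, %p = where PostgreSQL wants it copied
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2016-03-01 12:00:00'
```

Without recovery_target_time, PostgreSQL replays all xlog it can find in the archive.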
66. recovery_min_apply_delay: Delayed replay
This setting allows you to tell the slave that a certain delay is
desired.
Example: A stock broker might want to provide you with 15
minute old data
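For the 15-minute example, the slave's recovery.conf could contain the line below. This is a sketch and assumes PostgreSQL 9.4 or later, where the parameter is available.

```conf
# recovery.conf on the slave
recovery_min_apply_delay = '15min'
```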
67. pause_at_recovery_target
Make sure that recovery does not end as soon as the specified point
in time is reached.
Make PostgreSQL wait when a certain point is reached.
This is essential in case you do not know precisely how far to
recover
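A sketch of how this could be configured, with a hypothetical timestamp. The parameter name is the one used on this slide; in PostgreSQL 9.5 and later the equivalent setting is recovery_target_action = 'pause'.

```conf
# recovery.conf
recovery_target_time = '2016-03-01 12:00:00'   # hypothetical target
pause_at_recovery_target = true                # wait at the target instead of ending recovery
```

While recovery is paused, the data can be inspected with read-only queries (this requires hot_standby to be on). Calling pg_xlog_replay_resume() afterwards lets recovery finish.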
68. recovery_target_name
Sometimes you want to recover to a certain point in time,
which has been specified before.
To specify a point in time run . . .
SELECT pg_create_restore_point('some_name');
Use this name in recovery.conf to recover to this very specific
point
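In recovery.conf the restore point is referenced by name, as in this sketch. The archive path is a hypothetical placeholder; some_name is the example name from the statement above.

```conf
# recovery.conf
restore_command = 'cp /archive/%f %p'
recovery_target_name = 'some_name'
```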
70. Contact us . . .
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt Austria
More than 15 years of PostgreSQL experience:
Training
Consulting
24x7 support