WalB is an open-source backup system that consists of block devices, called WalB devices, and userland utilities, called WalB tools. A WalB device records write I/Os. WalB tools extract them to create restorable snapshots incrementally.
Compared with dm-snap and dm-thin, WalB is designed to achieve low I/O latency overhead and short backup time. We conducted an experiment that took an incremental backup of a volume under a random write workload. The results confirm these advantages of WalB.
The Cybozu cloud platform, which has 500 TB of volumes and processes 25 TB of write I/Os per day, is required to achieve (1) stable workload performance without I/O spikes, which may affect the application user experience, and (2) the short backup interval specified in our service level objective. WalB satisfies these requirements, whereas dm-snap does not and dm-thin is not expected to.
3. About Cybozu
▌A large cloud service vendor in Japan.
▌Largest market share in the field of collaborative software.
▌We serve web applications on our own cloud platform.
kintone: a low-code business app platform
and more
5. Service Level Objective
▌24/7 nonstop service
▌99.99% availability (4 min / month)
▌Daily backup (retention period is 14 days)
▌Disaster recovery: copy data to a remote site once a day
6. Architecture of our platform
[Figure: architecture diagram. Labels: L7LB; Application Server; Database Server; Blob Server; two Storage Servers (dm-snap) in RAID 1; Backup Server (Diff); Remote Site (Diff); "The scope of this talk".]
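7. Snapshot Management with dm-snap
[Figure: logical structure: the latest image holds A’ and B’ after “Write A’” and “Write B’”, while the snapshot image still holds A and B. Physical structure: blocks 0 to 4 of the original volume area plus a snapshot area with mapping info; (1) CoW copies the old block to the snapshot area, then (2) Write overwrites the block in the original volume area.]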
8. Backup using dm-snap
(1) Full-scan an old snapshot
(2) Full-scan a new snapshot
(3) Generate a diff image by comparing two snapshots
[Figure: logical structure of Snapshot0 (A, B) and Snapshot1 (A’, B’).]
11. We have no more “nights”
▌Until now:
Full scan is allowed only when access rate is low, i.e., at night.
▌From now on:
We have to handle accesses from multiple timezones.
▌We must be able to back up at any time without UX degradation.
12. New Solution
▌We need a new solution with:
No IO spikes
Short backup time
▌We compared dm-thin with WalB
13. What is dm-thin?
▌dm-thin provides thin-provisioning volume management to
share same data among volumes
reduce disk usage using snapshots
▌In the mainline Linux kernel
16. Snapshot Management with dm-thin
[Figure: logical structure: the snapshot holds A and the latest image holds A’ after “Write A’”. Physical structure: a snapshot tree and a latest tree, with steps labelled (1) CoW, (2) Write and (2) Update.]
17. Backup using dm-thin
Generate a diff image using dm-thin metadata
[Figure: logical structure: Snapshot0 holds A and B, Snapshot1 holds A’ and B’. Physical structure: the Snapshot0 and Snapshot1 trees.]
18. What is WalB?
▌A real-time and incremental backup system
developed at Cybozu Labs
▌Can back up block devices without I/O spikes
[Figure: comparison labelled “dm-snap: full scanning” and “WalB: no spikes”.]
19. Special Block Devices for WalB
[Figure: any application (file system, DBMS, etc.) issues reads and writes to a WalB device, which sits on a data device (linear mapped) and a log device (ring buffer).]
20. Write IO Logging and Backup with WalB
[Figure: a data device with blocks 0 to 4 holding A and B, a log device, and a time axis labelled “Time series of write I/Os”.]
21. Write IO Logging and Backup with WalB
[Figure: “Write A’” is applied to the data device and appended to the log device as a record for block 1. Caption: scan the log device and generate a diff image.]
22. Write IO Logging and Backup with WalB
[Figure: “Write A’” and “Write B’” are applied to the data device and appended to the log device as records for blocks 1 and 4. Caption: scan the log device and generate a diff image.]
23. Performance test
▌Compared dm-snap, dm-thin, and WalB
▌Executed a workload during a backup
The workload & the backup will affect each other
▌Measured the following metrics:
Latencies of the workload
Backup time
24. Environment & Settings
▌Test environment:
CPU: 2.40 GHz x 12 cores
MEM: 192 GiB
HDD: 4 TB HDD, RAID 6 (8D2P)
NIC: 10 Gbps x 2
Kernel: 4.11 (latest upstream)
▌Test settings:
100 GiB volumes
Workload: 4 KiB Random writes for a 5 GiB range
25. Measuring the Backup Time (dm-snap, dm-thin)
▌dm-snap: take a snapshot & scan full image
▌dm-thin: get the structure of snapshot trees & find modified blocks & read these blocks
[Figure: 100 GiB volume; 4 KiB random writes to a 5 GiB range, remaining 95 GiB unchanged. dm-snap: scan full image. dm-thin: scan changed chunks (tree traversal).]
26. Measuring the Backup Time (WalB)
▌WalB: scan logs from a log device & send them to a backup server continuously
[Figure: 100 GiB volume; 4 KiB random writes to a 5 GiB range, remaining 95 GiB unchanged. WalB: scan logs. Write I/O logs flow from the log device of the WalB device over the network to the backup server, where they become diffs.]
29. Conclusion
▌dm-snap & dm-thin
High I/O latency during a backup
Long backup time
▌WalB
Stable and low I/O latency (no spikes)
Short backup time
WalB satisfies our requirements for production use.
30. Try WalB!
▌Project page
https://walb-linux.github.io/
▌Tutorial
https://github.com/walb-linux/walb-tools/tree/master/misc/vagrant/
Vagrantfile for Ubuntu 16.04 and CentOS 7
31. Incremental backup
▌Daily backup (retention period is 14 days)
▌A WalB worker daemon selects diff files older than 14 days and applies them to a base image.
[Figure: Backup Host and Remote Host. Labels: Volume; Diff files for 14 days; Base; “Apply everyday”.]
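The retention rule on this slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: an image and a diff are both modelled as dictionaries from block address to data, and `merge_expired_diffs` and `apply_diff` are hypothetical names, not part of WalB tools.

```python
from datetime import date, timedelta

RETENTION_DAYS = 14  # retention period stated on the slide


def apply_diff(image, diff):
    """Overwrite the blocks of image with the blocks recorded in diff."""
    image.update(diff)


def merge_expired_diffs(base, diffs, today):
    """Apply diffs older than the retention period to base, oldest first.

    diffs is a list of (day, {block_address: data}) pairs.
    Returns the diffs that are still inside the retention period.
    """
    cutoff = today - timedelta(days=RETENTION_DAYS)
    kept = []
    for day, diff in sorted(diffs, key=lambda pair: pair[0]):
        if day < cutoff:
            apply_diff(base, diff)   # the diff is folded into the base image
        else:
            kept.append((day, diff))
    return kept


base = {1: "A", 3: "B"}
diffs = [(date(2024, 1, 5), {1: "A'"}), (date(2024, 1, 10), {3: "B'"})]
kept = merge_expired_diffs(base, diffs, today=date(2024, 1, 20))
```

After the call, the base image contains the 15-day-old write and only the more recent diff remains as a separate file.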
32. Restoring a volume
▌To restore the latest state of a volume:
take a snapshot of a base image, and
apply all diff files to it.
[Figure: Remote Host. Labels: Base; Base' (writable snapshot); Diff files; “Apply all diffs”.]
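The two restore steps can be sketched as follows. As an assumption for illustration, the base image and each diff are dictionaries from block address to data, and copying the dictionary stands in for taking a writable snapshot; `restore_latest` is a hypothetical helper name.

```python
def restore_latest(base, diffs):
    """Return the latest image; the base image itself is left untouched."""
    restored = dict(base)      # stands in for a writable snapshot of the base
    for diff in diffs:         # diffs must be ordered from oldest to newest
        restored.update(diff)  # later diffs overwrite earlier ones
    return restored


base = {0: "x", 1: "A"}
diffs = [{1: "A'"}, {1: "A''", 3: "B'"}]
latest = restore_latest(base, diffs)
```

Because the diffs are applied to a snapshot rather than to the base, the base stays usable for later restores.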
33. Make restoration faster 1/2
▌Fast restoration
by preparing read-only snapshots for each day
[Figure: Remote Host. Labels: Base; dm-thin snapshots for each day; Diff files.]
34. Make restoration faster 2/2
▌Apply some diffs to the appropriate snapshot.
▌At most 24 hours of diffs need to be applied.
Faster!
[Figure: Remote Host. Labels: Base; daily snapshots; Diff files.]
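The idea of starting from the nearest daily snapshot can be sketched as below. The sketch assumes times are plain numbers (hours since an arbitrary epoch), images and diffs are dictionaries from block address to data, and `restore_at` is a hypothetical name; it raises `ValueError` if no snapshot is old enough.

```python
def restore_at(snapshots, diffs, target):
    """Restore the image at time target from the nearest earlier snapshot.

    snapshots: {time: image}; diffs: list of (time, diff) sorted by time.
    Returns the restored image and the number of diffs that were applied.
    """
    start = max(t for t in snapshots if t <= target)  # nearest snapshot
    image = dict(snapshots[start])
    applied = 0
    for t, diff in diffs:
        if start < t <= target:   # only diffs newer than the snapshot
            image.update(diff)
            applied += 1
    return image, applied


snapshots = {0: {1: "A"}, 24: {1: "A'"}}              # one snapshot per day
diffs = [(12, {1: "A'"}), (30, {3: "B'"}), (40, {1: "A''"})]
image, applied = restore_at(snapshots, diffs, target=36)
```

Here only one diff is applied instead of two, because the restore starts from the snapshot taken at hour 24 rather than from the base.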
35. Worldline: restoring a whole environment
▌"Worldline" means a parallel world.
▌We back up configurations in addition to user data.
Configurations:
definitions for each customer (ID, FQDN, Apps, …),
application version definition,
host definition, etc.
▌It is important to use applications whose versions are
consistent with the user data backed up before.
36. Worldline: restoring a whole environment
▌A daily script takes a snapshot of a whole environment.
▌A weekly script restores the latest backup, so we can use it
for investigation of failures or development of our services.
[Figure: labels: User data; Diff; Snapshot; Config DB; Config DB'; Backup; Restore; Worldline; Spare hosts.]
Thank you for attending this presentation.
I’ll talk about WalB, a fast and low-latency backup system for block devices.
OK, let’s start.
My name is Kota Uchida.
I’m a site reliability engineer at Cybozu, Inc.
I’m a WalB developer.
I deployed the WalB backup system in our production environment.
Do you know Cybozu?
Cybozu is a large cloud service vendor in Japan.
We have the largest market share in the field of “groupware” or “collaborative software”, such as online calendars, workflow, bulletin board systems, and so on.
Our mission is to enhance teamwork all over the world.
We serve web applications on our own cloud platform, not a public cloud like AWS.
One of our applications is “kintone”.
“kintone” is a low-code business application platform.
You can create business applications with little or no code.
Over nineteen thousand companies are using our services.
We handle one hundred ninety million accesses per day.
About twenty-five tebibytes of data are written to storage every day.
Let me explain our service level objective.
Our services are twenty-four seven, nonstop services.
We target, and almost achieve, four-nine availability.
We back up user data every day and keep it for fourteen days.
We also send the data to a remote site once a day for disaster recovery.
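The “4 min / month” figure on the slide can be checked with a line of arithmetic. The sketch below assumes a 30-day month, which is my assumption and not stated in the talk; `downtime_budget_minutes` is a hypothetical helper name.

```python
def downtime_budget_minutes(availability, days=30):
    """Allowed downtime in minutes for the given availability target."""
    return (1.0 - availability) * days * 24 * 60


budget = downtime_budget_minutes(0.9999)
print(f"{budget:.2f} minutes per 30-day month")
```

Four-nines availability therefore leaves a little over four minutes of downtime per month, which matches the slide.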
This diagram shows the architecture of our cloud platform.
It is a basic architecture.
A user request goes through several components, such as layer seven load balancers, application servers, a database server and a blob server.
User data will be written to two storage servers.
They replicate data to each other using software RAID 1.
Data written to the storage servers will be backed up to a backup server, and copied to a remote site once a day.
In a storage server, we use dm-snap to create snapshots and back them up.
I’ll show how we do that in the next slide.
This is the detailed architecture of dm-snap.
The upper figures, labelled logical structure, show what dm-snap volumes and snapshots look like from the point of view of users and applications.
Let’s consider a disk that consists of 5 blocks, numbered zero to four.
Assume we took a snapshot when the content of block 1 was A and that of block 3 was B.
Then write requests came to blocks 1 and 3.
As a result, these blocks were overwritten, while the snapshot image was not changed.
The lower figures show the physical structure of a volume with a snapshot.
When you create a snapshot, dm-snap prepares a snapshot area with initialized mapping information.
When you submit a write I/O request to block 1 with data A-dash, dm-snap first copies block 1 to the snapshot area, and then overwrites the block with A-dash.
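The copy-on-write behaviour just described can be modelled with a small Python class. This is a toy model under my own simplifications (one snapshot, whole blocks, a dictionary as the mapping information), not the dm-snap implementation; `CowSnapshotVolume` and its methods are hypothetical names.

```python
class CowSnapshotVolume:
    """Toy model of a volume with one copy-on-write snapshot."""

    def __init__(self, blocks):
        self.origin = list(blocks)   # original volume area
        self.snapshot_area = {}      # mapping info: block number -> old data
        self.has_snapshot = False

    def take_snapshot(self):
        self.snapshot_area = {}      # initialized mapping information
        self.has_snapshot = True

    def write(self, block, data):
        if self.has_snapshot and block not in self.snapshot_area:
            # (1) CoW: preserve the old block before its first overwrite
            self.snapshot_area[block] = self.origin[block]
        self.origin[block] = data    # (2) Write

    def read_latest(self, block):
        return self.origin[block]

    def read_snapshot(self, block):
        # Blocks never overwritten are still read from the original area
        return self.snapshot_area.get(block, self.origin[block])


vol = CowSnapshotVolume([None, "A", None, "B", None])
vol.take_snapshot()
vol.write(1, "A'")
vol.write(3, "B'")
```

Each first write to a block after the snapshot costs an extra read and an extra write, which is where the copy-on-write overhead comes from.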
In the next slide, I’ll show you how to back up data using dm-snap.
There are two snapshots, 0 and 1.
We assume snapshot 1 is newer than snapshot 0.
To take an incremental backup with dm-snap, two snapshots are required.
We scan all blocks of the two snapshots and compare them block by block to get a diff image.
An incremental backup with dm-snap therefore needs a massive amount of reads.
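A block-by-block comparison of two snapshots can be sketched as follows. The snapshots are modelled as Python lists of block contents and `diff_by_full_scan` is a hypothetical name; a real backup would read block devices rather than lists.

```python
def diff_by_full_scan(old_snapshot, new_snapshot):
    """Compare two snapshots block by block and return {block: new_data}."""
    if len(old_snapshot) != len(new_snapshot):
        raise ValueError("snapshots must have the same number of blocks")
    diff = {}
    for block, (old, new) in enumerate(zip(old_snapshot, new_snapshot)):
        if old != new:           # every block of both snapshots is read
            diff[block] = new
    return diff


snapshot0 = ["", "A", "", "B", ""]
snapshot1 = ["", "A'", "", "B'", ""]
diff = diff_by_full_scan(snapshot0, snapshot1)
```

The number of blocks read is twice the volume size regardless of how few blocks changed, which is why this approach generates so much read I/O.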
Q: As with dm-thin, is it possible to extract diff images from the mapping indexes, without a full scan, with dm-snap?
A: Yes, it is possible, but it would be very slow because it requires keeping two snapshots at the same time.
The copy-on-write overhead is much larger than with dm-thin.
So we back up user data at night, when accesses decrease.
The upper graph shows the read I/O throughput of one volume.
There is a large spike, about seven hundred MB per second, at midnight.
The lower graph shows the number of user requests on our platform.
Few people use our services at night.
# Graph data: the read throughput is the sum of tyss-221 and tyss-222; the response data covers all of serviceset:ty13.
# Q. Why are there almost no reads in the daytime?
# A. Because most of the data is in cache.
There are almost no reads in the daytime because almost all of the data is in cache memory.
This graph shows a time series of user response time in milliseconds.
The data consist of several update operations on a storage server in production.
During the full scan, the user experience seems to be worse than usual,
because the ninetieth percentile of response time exceeds 1 second.
# In the end, this graph covers only kintone add.json on ty22, at 10-minute intervals, 8:00 JST to 19:00 JST.
And now, we have no more nights.
Because we are trying to provide our services to customers worldwide, we have to handle accesses from multiple time zones.
So we must be able to back up user data at any time without affecting user experience.
This cannot be achieved with dm-snap.
We need a new solution.
We researched other backup solutions that satisfy two requirements: no I/O spikes and short backup time.
There are two candidates, dm-thin and WalB.
We compared them.
dm-thin provides thin-provisioning volume management to share the same data among volumes and to reduce disk usage using snapshots.
dm-thin is included in the mainline kernel.
In the following slides, I will explain how dm-thin implements its snapshot feature and how to use it for backup.
These figures show how dm-thin provides its snapshot management feature.
Please look at the upper figure.
From the user’s point of view, a dm-thin snapshot can be considered a normal volume.
Next, look at the lower figure.
This is a little bit complicated.
Since dm-thin is not the essential part of this talk, you may ignore this diagram.
At first, there is one tree representing the latest image.
Intermediate nodes hold only metadata, and leaf nodes hold user data.
When you take a snapshot of a volume, dm-thin copies the root node, giving one root for the latest tree and one for the snapshot tree.
At this point, the “latest tree” root refers to the original intermediate nodes.
When a write request comes, dm-thin copies the corresponding leaf and its ancestor nodes, and modifies the link of the root node.
The latest tree has been updated while the snapshot tree remains unchanged.
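The tree copying described here is a form of path copying, which can be sketched with a small binary tree. This is a simplified model, not dm-thin's on-disk B-tree: the `Node` class and the `build`, `write`, and `read` functions are hypothetical, the number of blocks must be a power of two, and taking a snapshot is modelled as simply keeping the old root.

```python
class Node:
    def __init__(self, children=None, data=None):
        self.children = children  # [left, right] for intermediate nodes
        self.data = data          # user data for leaf nodes


def build(blocks):
    """Build a binary tree whose leaves hold the given blocks."""
    if len(blocks) == 1:
        return Node(data=blocks[0])
    mid = len(blocks) // 2
    return Node(children=[build(blocks[:mid]), build(blocks[mid:])])


def write(root, size, block, data):
    """Return a new root; only the path from the root to the leaf is copied."""
    if root.children is None:
        return Node(data=data)
    mid = size // 2
    left, right = root.children
    if block < mid:
        return Node(children=[write(left, mid, block, data), right])
    return Node(children=[left, write(right, size - mid, block - mid, data)])


def read(root, size, block):
    while root.children is not None:
        mid = size // 2
        if block < mid:
            root, size = root.children[0], mid
        else:
            root, block, size = root.children[1], block - mid, size - mid
    return root.data


latest = build(["A", "B", "C", "D"])
snapshot = latest                     # the snapshot keeps the old root
latest = write(latest, 4, 0, "A'")    # copies the leaf and its ancestors
```

After the write, the two trees still share the untouched right subtree, so a snapshot costs space only for the paths that were modified.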
In this slide, I’ll explain how to back up a volume using dm-thin.
In the upper figure, there are two snapshots.
We assume snapshot 1 is newer than snapshot 0.
The difference between snapshots 0 and 1 is two blocks, A and B.
Their physical structure is pictured in the lower figure.
There are two trees representing the two snapshots.
You can get a diff image for incremental backup by comparing the structure of the two snapshot trees.
In this example, the diff image consists of A’ and B’.
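Comparing two snapshot trees can be sketched as a recursive walk that skips subtrees shared by both snapshots. In this illustrative model, which is my own simplification rather than dm-thin's metadata format, intermediate nodes are tuples, leaves are physical block numbers, and `pool` maps physical block numbers to data; `tree_delta` is a hypothetical name.

```python
def tree_delta(old, new, base, size):
    """Return {logical_block: new_physical_block} for mappings that differ."""
    if old is new:                 # shared subtree: nothing below it changed
        return {}
    if isinstance(new, int):       # leaf: compare physical block numbers
        return {} if old == new else {base: new}
    mid = size // 2
    delta = tree_delta(old[0], new[0], base, mid)
    delta.update(tree_delta(old[1], new[1], base + mid, size - mid))
    return delta


# Physical blocks that matter for this example (other contents are omitted)
pool = {10: "A", 13: "B", 20: "A'", 21: "B'"}

untouched = ((14, 15), (16, 17))                 # subtree shared by both trees
snapshot0 = (((10, 11), (12, 13)), untouched)
snapshot1 = (((20, 11), (12, 21)), untouched)

delta = tree_delta(snapshot0, snapshot1, base=0, size=8)
diff_image = {block: pool[phys] for block, phys in delta.items()}
```

Only the modified data blocks are read, but the walk itself follows metadata pointers, which is consistent with the random-read behaviour mentioned later in the talk.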
So far I have explained dm-thin.
From here, let me introduce WalB.
WalB is a real-time and incremental backup system.
It has been developed at Cybozu Labs.
With WalB, there appear to be no I/O spikes.
This slide explains the architecture of a WalB device.
The WalB backup system uses special block devices, called WalB devices.
A WalB device, shown in this picture, is a virtual block device that consists of two ordinary block devices, a data device and a log device.
The data device stores user data. Its block addresses are mapped linearly to the WalB device.
The log device stores write I/O logs. This area is used as a ring buffer.
Write I/O requests are handled at both the data and log devices.
Read I/O requests are handled at the data device only.
The WalB device driver preserves consistency appropriately.
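The division of work between the two devices can be modelled in a few lines of Python. This is a toy model, not the WalB driver: `WalbDeviceSketch` and its methods are hypothetical, log records are `(sequence, block, payload)` tuples, and raising an error on log overflow is my own choice, since the talk does not say how the real driver handles it.

```python
class WalbDeviceSketch:
    """Toy model: writes go to a log ring buffer and a data device."""

    def __init__(self, num_blocks, log_capacity):
        self.data = [None] * num_blocks   # data device: linear mapping
        self.log = [None] * log_capacity  # log device: ring buffer
        self.written = 0                  # log records written so far
        self.extracted = 0                # log records already extracted

    def write(self, block, payload):
        if self.written - self.extracted >= len(self.log):
            # Assumption of this sketch: refuse to overwrite unextracted logs
            raise RuntimeError("log device overflow")
        self.log[self.written % len(self.log)] = (self.written, block, payload)
        self.written += 1
        self.data[block] = payload

    def read(self, block):
        return self.data[block]           # reads never touch the log device

    def extract_logs(self):
        """Return unextracted log records in write order and free their slots."""
        records = [self.log[i % len(self.log)]
                   for i in range(self.extracted, self.written)]
        self.extracted = self.written
        return records


dev = WalbDeviceSketch(num_blocks=5, log_capacity=2)
dev.write(1, "A'")
dev.write(4, "B'")
first_batch = dev.extract_logs()
dev.write(1, "A''")                       # reuses a freed ring buffer slot
second_batch = dev.extract_logs()
```

Because extraction frees log space, the log device only needs to hold the writes that have not yet been picked up by the backup tools.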
In this slide, I will explain how WalB takes a backup continuously.
Imagine there is a block device with 5 blocks.
Block 1 has data A, and block 4 has data B.
When a write request comes, a WalB device writes it to a data device and a log device.
WalB tools read the log device and generate a diff image.
Another write request will be treated in the same way.
Diff images generated by WalB tools will be sent to a backup server.
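Turning extracted logs into a diff image can be sketched as a replay in write order. The record layout `(sequence, block, payload)` and the rule that the newest write to a block wins are assumptions of this sketch, and `logs_to_diff` is a hypothetical name rather than a WalB tools function.

```python
def logs_to_diff(records):
    """Replay write logs in order; the newest write to each block wins."""
    diff = {}
    for _seq, block, payload in sorted(records, key=lambda r: r[0]):
        diff[block] = payload
    return diff


records = [(0, 1, "A'"), (1, 4, "B'"), (2, 1, "A''")]
diff = logs_to_diff(records)

backup_image = {1: "A", 4: "B"}
backup_image.update(diff)      # applying the diff on the backup side
```

The work is proportional to the number of logged writes, not to the size of the volume, since no unchanged block is ever read.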
We conducted an experiment to compare the performance of dm-snap, dm-thin, and WalB.
In the experiment, we executed a workload during a backup.
The workload and the backup affect each other, so we ran them concurrently.
We measured two metrics: the I/O latencies of the workload and the backup time.
We used server machines with the same specifications as the ones in our production environment.
The CPUs have twelve cores in total.
One hundred ninety-two gibibytes of memory.
RAID 6 storage.
Ten-gigabit Ethernet.
The latest upstream kernel.
We created a one-hundred-gibibyte volume for each backup solution: dm-snap, dm-thin, and WalB.
And we executed four-kibibyte random writes on the volumes.
While the workload was running, we also executed an incremental backup.
Now let’s check again how to back up with dm-snap, dm-thin, and WalB.
With dm-snap, we have to scan the full area of the snapshot to create a diff image for incremental backup.
Those read I/Os are nearly sequential.
With dm-thin, first we get information about the structure of the snapshot trees;
then we calculate which blocks are modified;
finally we read the modified blocks to create a diff image.
Tree traversal tends to cause random reads.
In fact, we emulated the backup for dm-snap and dm-thin.
A real backup has to read a volume, calculate differences, and send them to a backup server.
In this experiment, for dm-snap, we just scanned the whole of the latest snapshot.
For dm-thin, we just read the modified blocks using metadata information.
These reads are the dominant part of the backup time, so we treated them as the backup time.
For WalB, we employed the real backup system that is used in our production environment.
It’s not an emulation.
WalB tools extract write I/O logs from the log device and send them to the backup server.
The extracted logs are converted to diff images at the backup server.
Then the corresponding snapshot becomes restorable.
Because WalB tools continuously extract and send I/O logs, written data is backed up at almost the same time as it is written to a volume.
Backup time is defined as the latency from when a snapshot is set at the WalB device, as a mark on the latest I/O log, to when the snapshot becomes restorable at the backup server.
Our 10-gigabit datacenter network does not become a bottleneck, so the backup time with WalB will be a few seconds or less.
This is the result of our experiment.
This graph shows the I/O latencies of the workload, 4 KiB random writes, during a backup.
The baseline is a normal logical volume without backup, labelled no-backup.
Its latency is stable at 5 milliseconds.
The latency of dm-snap is thirty milliseconds, about six times that of no-backup.
The existence of a snapshot and a lot of read I/Os affect write I/O latency.
The latency of dm-thin is much worse than that of dm-snap!
In the first forty seconds, we can see a large spike for dm-thin.
We can see an overhead for WalB, but it is very small: about zero point five milliseconds.
WalB does not generate spikes.
These characteristics are important for a 24/7 cloud service.
Now let’s look at another aspect of this experiment.
This graph shows the backup times for the three solutions.
Backups are executed under the random write workload.
dm-snap takes about one thousand seconds.
dm-snap is the solution we were using before WalB.
dm-thin takes twice as long as dm-snap.
If our user data grows larger, dm-thin may not be able to back it up within one day.
That cannot be tolerated because our objective declares daily backup.
WalB takes only 1 point 2 seconds.
This is short enough to satisfy the declaration of daily backup.
In addition, this duration, 1 point 2 seconds, is independent of the amount of data.
It depends on the amount of write I/Os.
Our service will grow much more, but backup time should no longer be a problem.
Let me conclude my talk.
Taking a backup with dm-snap takes a long time and makes the user experience worse.
So I compared two solutions: dm-thin and WalB.
dm-thin also takes a long time to back up once and causes spikes in I/O latency,
while WalB takes a few seconds and has no spikes.
So we chose WalB for our backup system.
WalB is, of course, open-source software; it is developed on GitHub.
There are Vagrantfiles for Ubuntu and CentOS.
Detailed tutorial documentation is linked from the README of the Vagrantfile.
Please try WalB.
Thank you for coming to this presentation.
Questions are now welcome.