WalB: Real-time and Incremental Backup System for Block Devices

WalB: A Fast and Low Latency
Backup System for Block Devices
Cybozu Meetup #8 SRE WalB
Kota Uchida
September 25, 2017

About me
▌Kota Uchida
▌SRE team at Cybozu, Inc.
▌A WalB developer

About Cybozu
▌A large cloud service vendor in Japan.
▌Holds the largest market share in the field of collaborative software.
▌We serve web applications on our own cloud platform.
 kintone: a low-code business app platform
 and more

#customer companies: 20,000+
#accesses / day: 210 million
write IOs / day: 24.5 TiB

Service Level Objective
▌24/7 nonstop service
▌99.99% availability (4 min / month)
▌Daily backup (retention period is 14 days)
▌Disaster recovery: copy data to a remote site once a day
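
As a quick check of the availability figure above, the monthly downtime budget can be computed directly. This is a minimal sketch; the 30-day month and the function name are assumptions made for illustration, not something stated on the slide.

```python
# Downtime budget implied by an availability target.
# Assumption: a 30-day month (the slide does not say which month length it uses).

def downtime_budget_minutes(availability: float, days: int = 30) -> float:
    """Return the allowed downtime in minutes over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1.0 - availability)

budget = downtime_budget_minutes(0.9999)
print(f"{budget:.2f} minutes per 30-day month")
```

The result is a little over four minutes, which matches the rounded "4 min / month" on the slide.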

Architecture of our platform
[Diagram: L7LB, Application Server, Database Server, Blob Server, two Storage Servers (dm-snap, labeled RAID 1), Backup Server, and Remote Site, with Diff files shown alongside the backup path. A frame marks the scope of this talk.]

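Snapshot Management
with dm-snap
[Diagram: the logical structure shows the latest image and the snapshot image; the physical structure shows the original volume area, the snapshot area, and the mapping info, over blocks 0 to 4. On Write A' and Write B', (1) the old blocks A and B are copied to the snapshot area (CoW), then (2) A' and B' are written to the original volume area.]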

Backup using dm-snap
(1) Full-scan an old snapshot
(2) Full-scan a new snapshot
(3) Generate a diff image by comparing two snapshots
[Diagram: logical structure. Snapshot0 holds A and B, Snapshot1 holds A' and B', and the diff image contains A' and B'.]
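
The three steps above can be sketched as a block-by-block comparison. This is a toy illustration, not the tooling used at Cybozu: the 4 KiB block size, the in-memory images, and the names `full_scan_diff` and `make_image` are all assumptions.

```python
import io
from typing import BinaryIO, Dict, List

BLOCK_SIZE = 4096  # assumed block size for this sketch


def full_scan_diff(old: BinaryIO, new: BinaryIO,
                   block_size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Read both images in full and return {block index: new data} for changed blocks."""
    diff: Dict[int, bytes] = {}
    index = 0
    while True:
        old_block = old.read(block_size)
        new_block = new.read(block_size)
        if not old_block and not new_block:
            break  # both images are exhausted
        if old_block != new_block:
            diff[index] = new_block
        index += 1
    return diff


def make_image(blocks: List[bytes]) -> io.BytesIO:
    """Build a toy image by padding each entry to one block (hypothetical helper)."""
    return io.BytesIO(b"".join(b.ljust(BLOCK_SIZE, b"\0") for b in blocks))


# Five blocks; blocks 1 and 4 change between the two snapshots.
snapshot0 = make_image([b"", b"A", b"", b"", b"B"])
snapshot1 = make_image([b"", b"A'", b"", b"", b"B'"])
diff = full_scan_diff(snapshot0, snapshot1)
print(sorted(diff))
```

Note that every block of both images is read even though only two blocks changed, which is why this approach generates heavy read IO on a large volume.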

Full-scan at night
[Chart: backup processing time by hour of day (o'clock), with the daytime period marked.]

UX degradation
during a full-scan
[Chart with one period labeled "Full-scanning".]

We have no more “nights”
▌Until now:
Full scan is allowed only when access rate is low, i.e., at night.
▌From now on:
We have to handle accesses from multiple timezones.
▌We must be able to back up at any time without UX degradation.

New Solution
▌We need a new solution with:
 No IO spikes
 Short backup time
▌We compared dm-thin with WalB

What is dm-thin?
▌dm-thin provides thin-provisioning volume management to
 share the same data among volumes
 reduce disk usage using snapshots
▌In the mainline Linux kernel

Snapshot Management
with dm-thin
[Diagram: in the logical structure, the latest image contains block A. In the physical structure, the latest tree points to block A.]

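Snapshot Management
with dm-thin
[Diagram: after a snapshot is taken, the logical structure shows a snapshot and the latest image, both containing A. In the physical structure, the snapshot tree and the latest tree both point to the same block A.]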

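Snapshot Management
with dm-thin
[Diagram: on Write A', (1) CoW is performed on both the data block and the tree, then (2) A' is written and the latest tree is updated to point to it. The snapshot tree still points to A, so the snapshot keeps A while the latest image shows A'.]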

Backup using dm-thin
Generate a diff image using dm-thin metadata
[Diagram: in the logical structure, Snapshot0 holds A and B, and Snapshot1 holds A' and B'. In the physical structure, A, B, A', and B' are separate blocks; the diff image contains A' and B'.]
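
The idea of diffing by metadata can be sketched as a comparison of two block mappings. This is a simplified model under stated assumptions: each snapshot's mapping tree is flattened into a plain dictionary, and the names `Mapping` and `changed_blocks` are hypothetical, not part of dm-thin.

```python
from typing import Dict, List

# {virtual block: physical block}; a toy stand-in for a dm-thin mapping tree.
Mapping = Dict[int, int]


def changed_blocks(old: Mapping, new: Mapping) -> List[int]:
    """Return the virtual blocks whose physical location differs between two snapshots."""
    changed = []
    for vblock in sorted(set(old) | set(new)):
        if old.get(vblock) != new.get(vblock):
            changed.append(vblock)
    return changed


# Blocks 1 and 4 were rewritten, so CoW placed them in new physical blocks.
# Block 2 is untouched and still shared by both snapshots.
snapshot0 = {1: 100, 2: 105, 4: 101}
snapshot1 = {1: 102, 2: 105, 4: 103}
print(changed_blocks(snapshot0, snapshot1))
```

Only the blocks reported as changed need to be read from the new snapshot, so the data of unchanged blocks is never scanned.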

What is WalB?
▌A real-time and incremental backup system
 developed at Cybozu Labs
▌Can back up block devices without IO spikes
[Chart: dm-snap shows spikes during full scanning; WalB shows no spikes.]

Special Block Devices for WalB
[Diagram: any application (file system, DBMS, etc.) reads from and writes to a WalB device. The WalB device is built on a data device (linear mapped) and a log device (ring buffer).]

Write IO Logging and Backup
with WalB
[Diagram: the data device holds A and B within blocks 0 to 4. The log device records the time series of write I/Os along a time axis.]

Write IO Logging and Backup
with WalB
[Diagram: Write A' replaces A on the data device and appends a log record (address 1, A') to the log device.]
Scan the log device and
generate a diff image

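Write IO Logging and Backup
with WalB
[Diagram: Write A' and Write B' update the data device and append log records (address 1, A') and (address 4, B') to the log device in time order. The diff image contains A' and B'.]
Scan the log device and
generate a diff image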
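
The logging scheme in these slides can be modeled in a few lines. This is a toy sketch, not WalB's actual implementation: the class `ToyWalbDevice`, its methods, and the "raise when the log is full" policy are assumptions chosen to keep the example small.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class ToyWalbDevice:
    """Toy model: every write is appended to a log and applied to the data device."""

    def __init__(self, num_blocks: int, log_capacity: int) -> None:
        self.data: List[bytes] = [b""] * num_blocks
        self.log: Deque[Tuple[int, bytes]] = deque()
        self.log_capacity = log_capacity  # stands in for the ring buffer size

    def write(self, address: int, block: bytes) -> None:
        if len(self.log) >= self.log_capacity:
            # Assumption for this sketch: refuse the write instead of losing logs.
            raise RuntimeError("log device is full; extract logs first")
        self.log.append((address, block))
        self.data[address] = block

    def read(self, address: int) -> bytes:
        # Reads are served by the data device only.
        return self.data[address]

    def extract_diff(self) -> Dict[int, bytes]:
        """Drain the log in write order; a later write to the same address wins."""
        diff: Dict[int, bytes] = {}
        while self.log:
            address, block = self.log.popleft()
            diff[address] = block
        return diff


dev = ToyWalbDevice(num_blocks=5, log_capacity=8)
dev.write(1, b"A")
dev.write(4, b"B")
dev.extract_diff()        # an earlier backup has already consumed these logs
dev.write(1, b"A'")
dev.write(4, b"B'")
diff = dev.extract_diff()
print(diff)
```

Generating the diff only touches the log, so its cost follows the amount of written data rather than the size of the volume.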

Performance test
▌Compared dm-snap, dm-thin, and WalB
▌Executed a workload during a backup
 The workload & the backup will affect each other
▌Measured the following metrics:
 Latencies of the workload
 Backup time

Environment & Settings
▌Test environment:
 CPU：2.40 GHz x 12 cores
 MEM：192 GiB
 HDD：4 TB HDD, RAID 6 (8D2P)
 NIC：10 Gbps x 2
 Kernel：4.11 (latest upstream)
▌Test settings:
 100 GiB volumes
 Workload: 4 KiB random writes over a 5 GiB range
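
A workload of this shape can be approximated with a short script. The slides do not name the benchmark tool that was used, so this is only a sketch under assumptions: the function name, the use of `O_DSYNC` where the platform offers it, and the tiny demo sizes are all illustrative choices. It relies on `os.pwrite`, which is available on Unix-like systems.

```python
import os
import random
import tempfile
import time
from typing import List

IO_SIZE = 4096  # 4 KiB per write


def random_write_latencies(path: str, range_bytes: int, count: int,
                           seed: int = 0) -> List[float]:
    """Issue `count` 4 KiB writes at random aligned offsets; return latencies in seconds."""
    rng = random.Random(seed)
    payload = os.urandom(IO_SIZE)
    slots = range_bytes // IO_SIZE
    latencies: List[float] = []
    # O_DSYNC (when available) makes each write wait for the data to reach the device.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | getattr(os, "O_DSYNC", 0))
    try:
        for _ in range(count):
            offset = rng.randrange(slots) * IO_SIZE
            start = time.perf_counter()
            os.pwrite(fd, payload, offset)
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


# Small demo against a temporary file; the talk used a 5 GiB range on a 100 GiB volume.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "volume.img")
    lat = random_write_latencies(target, range_bytes=1024 * IO_SIZE, count=100)
    print(f"max latency: {max(lat) * 1e3:.3f} ms")
```

Recording every per-write latency, rather than only an average, is what makes spikes during a backup visible.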

Measuring the Backup Time
(dm-snap, dm-thin)
▌dm-snap: take a snapshot and scan the full image
▌dm-thin: get the structure of the snapshot trees, find the modified blocks, and read those blocks
[Diagram: the 4 KiB random writes hit a 5 GiB range; the remaining 95 GiB is unchanged. dm-snap scans the full image; dm-thin scans only the changed chunks (tree traversal).]

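Measuring the Backup Time
(WalB)
▌WalB: scan logs from a log device and send them to a backup server continuously
[Diagram: the 4 KiB random writes hit a 5 GiB range; the remaining 95 GiB is unchanged. Write IO logs from the WalB device are stored on the log device; WalB scans the logs and sends them over the network to the backup server, which holds them as diffs.]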

Write I/O latency
[Chart: write I/O latency for dm-thin, dm-snap, WalB, and no-backup. Annotations on the chart: "IO spikes due to CoW, worse than dm-snap!", "large due to CoW", and "Small overhead".]

Backup time
dm-snap: 1146
dm-thin: 2260 (slower than dm-snap)
WalB: 1.2 (so fast!)

Conclusion
▌dm-snap & dm-thin
 High I/O latency during a backup
 Long backup time
▌WalB
 Stable and low I/O latency (no spikes)
 Short backup time
WalB satisfies our requirements for production use.

Try WalB!
▌Project page
 https://walb-linux.github.io/
▌Tutorial
 https://github.com/walb-linux/walb-tools/tree/master/misc/vagrant/
 Vagrantfile for Ubuntu 16.04 and CentOS 7

Incremental backup
▌Daily backup (retention period is 14 days)
▌A WalB worker daemon selects diff files older than 14 days and applies them to a base image.
[Diagram labels: Volume, Backup Host, Remote Host, Base, diff files for 14 days, "Apply every day".]
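
The retention rule described above can be sketched as follows. This is an illustrative model rather than the WalB worker's real logic: images and diffs are plain dictionaries of blocks, and the names `DiffFile` and `consolidate` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Tuple

RETENTION_DAYS = 14


@dataclass
class DiffFile:
    day: date
    blocks: Dict[int, bytes]  # {block index: data written that day}


def consolidate(base: Dict[int, bytes], diffs: List[DiffFile],
                today: date) -> Tuple[Dict[int, bytes], List[DiffFile]]:
    """Apply diffs older than the retention period to the base, oldest first."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    new_base = dict(base)
    kept: List[DiffFile] = []
    for diff in sorted(diffs, key=lambda d: d.day):
        if diff.day < cutoff:
            new_base.update(diff.blocks)  # fold the old diff into the base image
        else:
            kept.append(diff)             # still inside the retention window
    return new_base, kept


today = date(2017, 9, 25)
base = {1: b"A", 4: b"B"}
diffs = [
    DiffFile(today - timedelta(days=20), {1: b"A'"}),
    DiffFile(today - timedelta(days=15), {1: b"A''"}),
    DiffFile(today - timedelta(days=3), {4: b"B'"}),
]
new_base, kept = consolidate(base, diffs, today)
print(new_base, len(kept))
```

Applying the old diffs in date order matters: when the same block appears in several diffs, the newest of the expired versions must end up in the base.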

Restoring a volume
▌To restore the latest state of a volume:
 take a snapshot of a base image, and
 apply all diff files to it.
[Diagram (Remote Host): a writable snapshot Base' is taken from Base, and all diffs are applied to it.]

Make restoration faster 1/2
▌Fast restoration by preparing read-only snapshots for each day
[Diagram (Remote Host): Base and the diff files, with dm-thin snapshots for each day, numbered 14, 2, and 1.]

Make restoration faster 2/2
▌Apply some diffs to the appropriate snapshot.
▌At most 24 hours of diffs need to be applied.
Faster!
[Diagram (Remote Host): the remaining diffs are applied on top of the nearest daily snapshot instead of on Base.]
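
The faster restoration path can be sketched like this. It is a simplified model under assumptions: time is measured in hours, snapshots and diffs are dictionaries of blocks, and the function `restore` is a hypothetical name, not a WalB command.

```python
from typing import Dict, List, Tuple

Image = Dict[int, bytes]
Diff = Tuple[float, Dict[int, bytes]]  # (time in hours, changed blocks)


def restore(snapshots: Dict[float, Image], diffs: List[Diff],
            target_time: float) -> Image:
    """Start from the newest snapshot at or before target_time, then apply later diffs."""
    # Raises ValueError if no snapshot is old enough.
    start = max(t for t in snapshots if t <= target_time)
    image = dict(snapshots[start])  # the copy plays the role of a writable snapshot
    for t, blocks in sorted(diffs, key=lambda d: d[0]):
        if start < t <= target_time:
            image.update(blocks)
    return image


# Daily snapshots at hour 0 and hour 24; diffs arrive in between.
snapshots = {
    0.0: {1: b"A", 4: b"B"},
    24.0: {1: b"A'", 4: b"B"},
}
diffs = [
    (10.0, {1: b"A'"}),
    (30.0, {4: b"B'"}),
    (40.0, {1: b"A''"}),
]
restored = restore(snapshots, diffs, target_time=35.0)
print(restored)
```

With one snapshot per day, the loop never has to apply more than the diffs produced since the most recent daily snapshot.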

Worldline: restoring a whole
environment
▌"Worldline" means a parallel world.
▌We back up configurations in addition to user data.
 Configurations:
definitions for each customer (ID, FQDN, Apps, …),
application version definition,
host definition, etc.
▌It is important to use applications whose versions are
consistent with user data backed up before.

Worldline: restoring a whole
environment
▌A daily script takes a snapshot of a whole environment.
▌A weekly script restores the latest backup, so we can use it for investigating failures or developing our services.
[Diagram: user data (snapshot and diffs) and the Config DB are backed up, then restored onto spare hosts as a Worldline, with the restored copy of the Config DB shown as Config DB'.]

Q&A
email: kota-uchida@cybozu.co.jp
twitter: @uchan_nos
