In this talk we report on our experience with Redis-on-Flash (RoF)—a recently introduced product that uses SSDs as a RAM extension to dramatically increase the effective dataset capacity that can be stored on a single server. This talk provides the first in-depth RoF system performance characterization: we consider different use cases (varying both RAM-to-disk access ratio and object size), and compare SATA-based RoF, NVMe-based RoF, and all-RAM Redis deployments. We show that the superior performance of NVMe drives in terms of both latency and peak bandwidth makes them a particularly good fit for RoF use cases. Specifically, we show that backing RoF with NVMe drives can deliver more than 2 million operations per second with sub-millisecond latency on a single server.
Redis on NVMe SSD - Zvika Guz, Samsung
1. Zvika Guz and Vijay Balakrishnan
Memory Solutions Lab, Samsung Semiconductor Inc.
Redis on NVMe SSD
2. 2
Redis-on-Flash
Closed-source (RLEC Flash), 100% compatible with open-source Redis
Uses Flash as RAM extension to increase effective node capacity
Tiering memory into “fast” and “slow”:
RAM saves keys and hot values
Flash saves cold values
Dynamic configuration of RAM/Flash usage
Uses RocksDB as the storage engine to optimize access to block storage
Multi-threaded and asynchronous Redis used to access Flash
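The tiering scheme above can be sketched in a few lines. This is a minimal illustration of the idea (keys and hot values in fast memory, cold values demoted to a slower tier, promotion on access), not the RoF implementation; the `TieredStore` class and its two plain dicts standing in for RAM and Flash are invented for this example.

```python
# Minimal sketch of RAM/Flash tiering (illustrative only, not RoF code):
# an LRU-ordered dict stands in for RAM, a plain dict stands in for Flash.
from collections import OrderedDict

class TieredStore:
    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()   # keys + hot values, in LRU order
        self.flash = {}            # cold values only

    def set(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)  # mark as most recently used
        self._evict()

    def get(self, key):
        if key in self.ram:                  # RAM hit
            self.ram.move_to_end(key)
            return self.ram[key]
        value = self.flash.pop(key)          # RAM miss: promote from Flash
        self.set(key, value)
        return value

    def _evict(self):
        # Demote least-recently-used values to Flash once RAM is full.
        while len(self.ram) > self.ram_capacity:
            cold_key, cold_value = self.ram.popitem(last=False)
            self.flash[cold_key] = cold_value

store = TieredStore(ram_capacity=2)
for k in "abc":
    store.set(k, k.upper())
assert "a" in store.flash          # "a" was demoted to the Flash tier
assert store.get("a") == "A"       # access promotes it back to RAM
```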
3. 3
Why Redis-on-Flash?
Optimize price-to-performance for a given workload
DRAM is more performant than flash, but $/GB is higher
Limited DRAM capacity per server
Tiering dramatically reduces $/GB, while preserving good performance ($/ops)
Enables orders-of-magnitude more capacity per server
RoF is particularly suitable for large datasets with skewed access distribution
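Why skewed access matters can be shown with back-of-the-envelope arithmetic. Assuming a Zipf(1) popularity distribution over keys (an assumption for this illustration, not a number from the talk), keeping only a small fraction of keys in RAM still serves the large majority of requests from memory:

```python
# Illustrative only: under a Zipf(1) popularity distribution, a small
# fraction of keys receives the bulk of all accesses, which is exactly
# the regime where RAM/Flash tiering preserves performance.

def zipf_hit_fraction(num_keys, ram_fraction):
    """Fraction of accesses that land on the top `ram_fraction` of keys."""
    weights = [1.0 / rank for rank in range(1, num_keys + 1)]
    hot = int(num_keys * ram_fraction)
    return sum(weights[:hot]) / sum(weights)

# With 1M keys, caching just 10% of them captures ~84% of accesses.
print(round(zipf_hit_fraction(1_000_000, 0.10), 2))  # -> 0.84
```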
4. 4
Workload
Models real-world Redis Labs customer workloads
Benchmark: memtier_benchmark (open source)
GET/SET requests, varying:
1. Object size
2. Write-to-read ratio
3. Redis RAM hit ratio
Performance target:
Maximize operations per second on a single server, while maintaining sub-millisecond latency
Compared three system configurations:
1. All-RAM: In-memory RLEC
2. Redis-on-NVMe: 4xSamsung PM1725 NVMe SSDs
3. Redis-on-SATA: 16xSamsung 850 Pro SATA SSDs
https://github.com/RedisLabs/memtier_benchmark
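An invocation along these lines drives one point of the sweep. The host name and the thread/connection/duration values below are examples, not the exact parameters from the talk; `--ratio` is the SET:GET mix and `-d` the object size in bytes, while the RAM hit ratio is controlled by dataset sizing on the RLEC side rather than by memtier.

```shell
# Illustrative memtier_benchmark run for use case #1
# (100B objects, write-to-read ratio 1:1); values are examples.
memtier_benchmark -s redis-server.example -p 6379 \
    --ratio=1:1 -d 100 \
    -t 8 -c 50 --test-time=300
```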
5. 5
Consistent sub-millisecond latencies favor NVMe
NVMe SSDs are designed for consistent high performance at ultra-low latency
Modest incremental cost over SATA, with much better performance
Samsung PM1725 is the fastest NVMe drive on the market
Redis-on-NVMe
Samsung PM1725 Specification*
Form Factor: 2.5”
Host Interface: PCIe Gen3 x4
Capacities: 800GB, 1.6TB, 3.2TB
Sequential Read: 3300 MB/s
Sequential Write: 1900 MB/s
Random Read: 840 KIOPS
Random Write: 130 KIOPS
Read Latency: 95 usec
Write Latency: 60 usec
>6X over SATA
>8.5X over SATA
*PM1725 HHHL version (PCIe Gen3 x8) provides ~double the performance and capacity, but we did not use it here
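The ">6X" and ">8.5X over SATA" callouts can be sanity-checked against published 850 Pro figures. The SATA numbers below (~550 MB/s sequential read, ~100K random-read IOPS) are taken from public datasheets, not from the slides, so treat this as an approximate cross-check:

```python
# Rough cross-check of the "over SATA" callouts; the 850 Pro figures are
# datasheet assumptions, not numbers from the talk.
pm1725_seq_read_mbps = 3300
pm1725_rand_read_kiops = 840
sata_seq_read_mbps = 550       # assumed 850 Pro sequential read
sata_rand_read_kiops = 100     # assumed 850 Pro random read

print(round(pm1725_seq_read_mbps / sata_seq_read_mbps, 1))    # -> 6.0
print(round(pm1725_rand_read_kiops / sata_rand_read_kiops, 1)) # -> 8.4
```

These ratios line up with the sequential-read and random-read advantages the slide highlights.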
6. 7
System Configuration
Single client, single server
Industry-standard components, all available today
Server: Dell PowerEdge R730xd, dual-socket
Processor: 2 x Xeon E5-2690 v3 @ 2.6GHz (12 cores / 24 logical processors per CPU; 24 cores / 48 logical processors total)
Memory: 256GB ECC DDR4
Network: 10GbE
Storage: 4 x Samsung PM1725 NVMe, or 16 x Samsung 850 Pro SATA SSD
memtier_benchmark: 1.2.6
RLEC version: 4.3.0
Operating System: Ubuntu 14.04
Linux Kernel: 3.19.8
7. 8
Use case #1: Small Objects
100B objects, write-to-read ratio: 1:1
50% RAM-to-Flash hit ratio: Perf = 750 KOPS, Latency = 0.75 msec, Disk BW = 1.7 GB/s
85% RAM-to-Flash hit ratio: Perf = 1.8 MOPS, Latency = 0.9 msec, Disk BW = 602 MB/s
100% of requests served with <1 msec latency
8. 9
Disk Bandwidth Spike
Spikes in disk bandwidth align with RocksDB compaction phase
Can reach 2-3x the average BW
Drives must be able to sustain these spikes, otherwise tail latency suffers
Object Size=100B, write-to-read ratio=1:1, RAM-to-Flash hit ratio=85%
Disk BW=602 MB/s
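The headroom the drives must provide for these spikes follows directly from the numbers above: with a 602 MB/s average and spikes of 2-3x the average, the storage must absorb roughly 1.2-1.8 GB/s without latency degradation. A quick calculation:

```python
# Compaction-spike headroom estimate from the slide's numbers.
avg_bw_mbps = 602
spike_low = 2 * avg_bw_mbps    # 1204 MB/s
spike_high = 3 * avg_bw_mbps   # 1806 MB/s
print(spike_low, spike_high)

# The 4 x PM1725 configuration (1900 MB/s sequential write each) offers
# an aggregate 7600 MB/s, comfortably above the spike range.
aggregate_write_mbps = 4 * 1900
print(aggregate_write_mbps > spike_high)  # -> True
```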
9. 10
Use case #2: Large Objects
1KB objects, write-to-read ratio: 1:4
100% of requests served with <1 msec latency
50% RAM-to-Flash hit ratio: Perf = 270 KOPS, Latency = 0.75 msec, Disk BW = 4.3 GB/s
85% RAM-to-Flash hit ratio: Perf = 816 KOPS, Latency = 0.78 msec, Disk BW = 3.9 GB/s
11. 12
The Problem with SATA
Need 4X the drives to reach ~half the performance of NVMe
Performance is much noisier:
99th-percentile latency > 1 msec
These latency spikes are very difficult to eliminate and appear in almost all our SATA runs
Perf = 132 KOPS, Latency = 0.65 msec
Object Size = 1000B, write-to-read ratio = 1:4, RAM-to-Flash hit ratio = 50%
12. 13
DRAM or Flash?
Optimize performance/$ for each use case
Affected by dataset size, access pattern, and access locality
Redis in Memory
Redis-on-NVMe
Redis-on-SATA
$/GB DRAM:NVMe:SATA = 15:2.5:1
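The price ratio above translates into a blended $/GB for a tiered deployment. The 20%/80% DRAM/Flash capacity split below is a hypothetical example (RoF lets you tune this split per workload), not a number from the talk:

```python
# Blended $/GB sketch using the slide's DRAM:NVMe:SATA = 15:2.5:1 ratio.
# The 20/80 capacity split is an assumed example, not from the talk.
price = {"dram": 15.0, "nvme": 2.5, "sata": 1.0}   # relative $/GB
dram_frac, flash_frac = 0.20, 0.80

all_ram = price["dram"]
rof_nvme = dram_frac * price["dram"] + flash_frac * price["nvme"]  # 5.0
print(round(all_ram / rof_nvme, 1))  # -> 3.0 (RoF-on-NVMe ~3x cheaper per GB)
```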
13. 14
Summary
Redis-on-Flash enables:
Order-of-magnitude more capacity per node
High performance at significantly lower cost
Samsung PM1725 NVMe:
Enables breakthrough performance @ sub-millisecond latency
Consistent performance reduces tail latency
Industry-standard components, available today
Thank You!
zvika.guz@samsung.com