RedisConf18 - Redis at LINE - 25 Billion Messages Per Day
1. Redis at LINE
25 Billion Messages Per Day
Jongyeol Choi
LINE+ Corporation
2. Speaker
• Jongyeol Choi
• Software engineer
• Lives in South Korea
• Works on Redis team at LINE
• Previously worked at Samsung Electronics
• Contributed to Netty (netty-codec-redis), Lettuce, etc.
3. Agenda
• LINE
• Storage systems for LINE Messaging System
• In-house Redis Cluster
• Scalable monitoring system
• Experiences with the official Redis Cluster
• Asynchronous Redis client
• Current challenges and future work
4. LINE
• Messaging service
• 168 million active users in Japan,
Taiwan, Thailand, and Indonesia.
• 25 billion messages per day
• 420,000 messages sent per second at
peak
• Many family services
• News, Music, LIVE (video
streaming), Games, more
5. LINE Messaging System
• Messaging server
• Most messaging features
• Java 8, Spring, Thrift, Tomcat, and Armeria
• Asynchronous task processor systems
• New system backed by Kafka clusters
• An older system backed by a Redis queue process per messaging-server machine
• Other related components
[Diagram: clients → API Gateway → Messaging Server and Asynchronous Task Processor → Redis, HBase, Kafka]
6. Storage Systems for LINE Messaging
• Redis
• Cache or Primary Storage
• HBase
• Backup Storage or Primary Storage
• Kafka
• For asynchronous processing
• Previous presentations about HBase and Kafka
• "HBase at LINE 2017" at LINE Developer Day 2017
• "Kafka at LINE" at Kafka Summit San Francisco 2017
[Diagram: clients → API Gateway → Messaging Server and Asynchronous Task Processor → Redis, HBase, Kafka]
7. Redis usage for LINE Messaging
• Redis versions: 2.6, 2.8, 3.0, 3.2
• 60+ Redis clusters (In-house Redis clusters + Official Redis clusters)
• 1,000+ physical machines (8-12 Redis nodes per machine)
• Each machine: 10–20 cores (20–40 threads) / 192–256 GB memory
• 10,000+ Redis nodes (Max operations per second per node < 100,000)
• 370+ billion Redis keys and 120+ TB data in our Redis clusters
• Some clusters have 1,000–3,000 nodes each, including slave nodes
8. In-house Redis Cluster
• Client-side sharding without a proxy (sketched below)
• Sharding rules
• Fixed size ring or consistent hashing
• In-house facility implementations
• Cluster Manager Server + UI (Redhand)
• LINE Redis Client (w/ Jedis, Lettuce and Java)
• Redis Cluster Monitor (Scala, Akka)
[Diagram: the Cluster Manager Server health-checks each shard (master/slave) and updates ZooKeeper; Java applications using the LINE Redis Client sync cluster state from ZooKeeper; the Redis Cluster Monitor gathers statistics]
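A minimal sketch of the fixed-size-ring flavor of client-side sharding, assuming one Jedis connection per shard; the ZooKeeper sync, health checks, and Lettuce path of the real LINE Redis Client are omitted, and all names here are hypothetical.

```java
import java.util.List;

import redis.clients.jedis.Jedis;

// Hypothetical sketch of fixed-size-ring client sharding.
// The real LINE Redis Client resolves the shard list from ZooKeeper
// and supports Jedis and Lettuce; here the shards are passed in directly.
public class FixedRingShardedClient {

    private final List<Jedis> shards; // index in this list = ring position

    public FixedRingShardedClient(List<Jedis> shards) {
        this.shards = shards;
    }

    // The ring size is fixed, so the key-to-shard mapping never moves;
    // resizing means migrating to a new cluster in the background.
    // A production client would use a stable hash (e.g. CRC),
    // not String.hashCode().
    private Jedis shardFor(String key) {
        return shards.get(Math.floorMod(key.hashCode(), shards.size()));
    }

    public String get(String key) {
        return shardFor(key).get(key);
    }

    public String set(String key, String value) {
        return shardFor(key).set(key, value);
    }
}
```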
9. Pros/Cons of Proxy-less (Client sharding)
• Pros
• Short latency
• Average response time is 100–200 μs
(Messaging needs many storage I/Os in an API call)
• Cost efficiency
• Don’t need thousands of proxy servers
• Cons
• The client implementation is language-dependent
• Fat client: hard to maintain and release the client across all dependent server systems
[Diagram: applications connecting to Redis shards directly vs. through proxy servers]
10. Failover for In-house Redis Cluster
• Cluster types and data types
• Cache (master only) or storage (master/slave)
• Immutable or Mutable
• The Cluster Manager Server sends PING to all Redis nodes every 2 seconds (sketched below)
• When a master doesn’t respond
• Cache: Failure state → Use origin storage
• Storage: Slave becomes the new master
• Applications will “eventually” get updated
cluster information from ZooKeeper
[Diagram: the Cluster Manager Server PINGs shard-1 and shard-2, updates ZooKeeper on failure, and applications sync the change]
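A minimal sketch of the 2-second PING loop, assuming Jedis and a hypothetical onMasterDown callback; the real Cluster Manager Server additionally promotes a slave (storage clusters) or marks a failure state (cache clusters), then updates ZooKeeper.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import redis.clients.jedis.Jedis;

// Hypothetical sketch of the Cluster Manager Server's health check.
public class HealthChecker {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void watch(String host, int port, Runnable onMasterDown) {
        // PING the node every 2 seconds, as described above.
        scheduler.scheduleAtFixedRate(() -> {
            try (Jedis jedis = new Jedis(host, port)) {
                jedis.ping();
            } catch (Exception e) {
                // Cache cluster: mark failed and fall back to origin storage.
                // Storage cluster: promote the slave, then update ZooKeeper
                // so applications eventually see the new topology.
                onMasterDown.run();
            }
        }, 0, 2, TimeUnit.SECONDS);
    }
}
```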
11. Failover for Mutable data at In-house Redis Cluster
• Failover with Mutable data
• Recovery Mode
• A client-side solution
• Delays all Redis commands to the target shard for a few seconds (sketched below)
• Redis server nodes don't know about each other (so redirection isn't possible)
[Diagram: applications in Recovery Mode toward shard-1 while shard-2 operates normally]
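A minimal sketch of a client-side Recovery Mode for one shard; the scheduling API and field names are hypothetical stand-ins, not the LINE client's actual interface.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of Recovery Mode: while a shard fails over,
// commands to it are delayed for a few seconds instead of sent.
public class RecoveryMode {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private volatile long recoveryEndsAtMillis;

    // Called when the client learns the shard is failing over.
    public void enterRecovery(long durationMillis) {
        recoveryEndsAtMillis = System.currentTimeMillis() + durationMillis;
    }

    // Run the Redis command now, or after the recovery window closes.
    public <T> CompletableFuture<T> execute(Callable<T> redisCommand) {
        CompletableFuture<T> result = new CompletableFuture<>();
        Runnable task = () -> {
            try {
                result.complete(redisCommand.call());
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
        };
        long delay = recoveryEndsAtMillis - System.currentTimeMillis();
        if (delay <= 0) {
            task.run();
        } else {
            scheduler.schedule(task, delay, TimeUnit.MILLISECONDS);
        }
        return result;
    }
}
```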
12. Resizing at In-house Redis Cluster
• Sharding rule: Consistent hashing
• Dynamic resizing, Immutable & cached data only
• Hard to balance data distribution
• Sharding rule: Fixed size ring
• No dynamic resizing, migration only
• Easy to implement & easy to balance
• Migrate data to new cluster online and in the
background
• Migration takes several days when data is large
[Diagram: the application switches to the new cluster while a background migration copies data from the old cluster]
13. Reliability of Redis as Primary Storage
• What if?
• What if both master and slave of a shard are down at the same time?
• What if all Redis servers reboot because a data center loses power?
• RDB or AOF?
• Either one affects Redis's average response time
• Persistent storage?
• Adopted HBase
• Read/write important data from/to both Redis and HBase
14. Dual Write and Read HA (High Availability)
• Dual write
1. Write data to Redis first
2. Write data to HBase in the background
• With another thread or using Kafka
• Read HA (sketched below)
1. Send a read request to Redis first
2. Wait for response for a few hundred microseconds.
If no Redis response is received (rare case),
3. Send the read request to HBase concurrently
4. Use the response returned first, regardless of the
sender (usually the Redis response comes first).
[Diagram: dual write (application → Redis, then → HBase via the Asynchronous Task Processor) and Read HA (application reads from Redis with HBase fallback)]
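A minimal sketch of the Read-HA steps above, assuming hypothetical Supplier-based readers for Redis and HBase; CompletableFuture.complete() is first-wins, which implements step 4 directly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of Read HA: Redis first, HBase after a short
// grace period, and whichever response arrives first wins.
public class ReadHa {

    private static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor();

    public static <T> CompletableFuture<T> read(Supplier<T> readFromRedis,
                                                Supplier<T> readFromHBase,
                                                long graceMicros) {
        CompletableFuture<T> result = new CompletableFuture<>();

        // 1. Send the read request to Redis first.
        CompletableFuture.supplyAsync(readFromRedis).thenAccept(result::complete);

        // 2. If Redis hasn't answered within a few hundred microseconds
        //    (a rare case), also send the read to HBase concurrently.
        SCHEDULER.schedule(() -> {
            if (!result.isDone()) {
                CompletableFuture.supplyAsync(readFromHBase).thenAccept(result::complete);
            }
        }, graceMicros, TimeUnit.MICROSECONDS);

        // 3-4. complete() only succeeds once, so the first response
        //      (usually Redis) is the one the caller sees.
        return result;
    }
}
```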
15. Hot key
• Hot key results in:
• Command bursting, connection bursting
• Slowlogs
• Slower response time for applications
• How to avoid it?
• Write-intensive: redesign the key (one splitting scheme is sketched below)
• Read-intensive: use multiple slaves or multiple clusters
[Chart: a command-bursting case, count of commands per second per process]
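One possible redesign for a write-hot key, sketched under the assumption that the key is split into N sub-keys so writes spread across shards; the talk does not prescribe a specific scheme, so this fan-out approach and its naming are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: split one hot key into N sub-keys.
public class HotKeySplitter {

    private final int fanout;

    public HotKeySplitter(int fanout) {
        this.fanout = fanout;
    }

    // Each write goes to a random sub-key, e.g. "counter:42#3",
    // so no single shard absorbs the whole write burst.
    public String writeKey(String baseKey) {
        int bucket = ThreadLocalRandom.current().nextInt(fanout);
        return baseKey + "#" + bucket;
    }

    // Reads must aggregate all sub-keys (e.g. sum N counters),
    // which is the cost of spreading the writes.
    public String[] readKeys(String baseKey) {
        String[] keys = new String[fanout];
        for (int i = 0; i < fanout; i++) {
            keys[i] = baseKey + "#" + i;
        }
        return keys;
    }
}
```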
16. Replicated cluster
• To increase read scalability
• Used for specific purposes only
• Cache cluster
• Immutable data
• Long-lived data
• The client chooses a cluster at random (sketched below)
• Uses the origin storage for failover
• Warming up a cluster takes several days
[Diagram: the application picks one of Cluster-1 through Cluster-N at random, with fallback to the origin storage]
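A minimal sketch of random cluster selection with origin-storage fallback, assuming hypothetical Function-based readers for each cache cluster and the origin storage.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

// Hypothetical sketch of reading from a replicated cache cluster.
public class ReplicatedClusterReader<T> {

    private final List<Function<String, T>> clusters; // one reader per cluster
    private final Function<String, T> originStorage;

    public ReplicatedClusterReader(List<Function<String, T>> clusters,
                                   Function<String, T> originStorage) {
        this.clusters = clusters;
        this.originStorage = originStorage;
    }

    public T read(String key) {
        // Pick one of the replicated cache clusters at random.
        int i = ThreadLocalRandom.current().nextInt(clusters.size());
        try {
            T value = clusters.get(i).apply(key);
            if (value != null) {
                return value;
            }
        } catch (RuntimeException ignored) {
            // Cluster failed: fall through to the origin storage.
        }
        return originStorage.apply(key);
    }
}
```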
17. Timeout
• Sometimes, a Redis shard or machine slows down or
crashes
• A single bad Redis command touching a big collection
• Command bursting caused by hot keys
• Various hardware failures (e.g. ECC memory failure)
• Waiting and timeout
• Waiting for millions of Redis responses can trigger an
outage (busy threads or a full request queue)
• A short timeout is important
• Is timeout enough?
[Diagram: applications saying “I’m waiting!” while a Redis shard says “I’m busy!!”]
18. Circuit breaker for fast failure
• Adopted a circuit breaker named Anticipator for “important” clusters (sketched below)
• Aims to predict failures and not bother Redis
servers when they are busy
• When response time increases, it temporarily
marks the target shard as failed
• Applications fail such requests immediately, without sending them to the shard
• The shard returns to the normal state after a short period and successful test requests
[Diagram: the Anticipator tells a busy shard: “You seem busy. We are not sending any Redis request to you for now.”]
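A minimal sketch of an Anticipator-style breaker per shard; the latency threshold and time-based reopening are hypothetical stand-ins for the in-house logic, and the test-request (half-open) step is omitted.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a per-shard circuit breaker.
public class ShardCircuitBreaker {

    private final long latencyThresholdMicros;
    private final long openDurationMillis;
    private final AtomicLong openedUntil = new AtomicLong(0);

    public ShardCircuitBreaker(long latencyThresholdMicros, long openDurationMillis) {
        this.latencyThresholdMicros = latencyThresholdMicros;
        this.openDurationMillis = openDurationMillis;
    }

    // Called before sending a command: fail fast while the breaker is open,
    // so waiters don't pile up against a busy shard.
    public boolean allowRequest() {
        return System.currentTimeMillis() >= openedUntil.get();
    }

    // Called after each response: a slow response temporarily marks the
    // shard as failed instead of letting latency spread to applications.
    public void record(long responseTimeMicros) {
        if (responseTimeMicros > latencyThresholdMicros) {
            openedUntil.set(System.currentTimeMillis() + openDurationMillis);
        }
    }
}
```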
19. Redis Cluster Monitor: Scalable Monitoring System
• Redis Cluster Monitor, an in-house monitoring system
• Gathers metrics with second precision
• Scala, Akka, Elasticsearch, Kibana, Grafana
• A resilient and scalable Akka cluster
• Monitors 10,000+ Redis instances
• Sends INFO to all nodes every second (poll scheduling sketched below)
• To smooth network bandwidth, the INFO request timing is staggered per node so responses spread across each one-second window
• View aggregated information on Grafana
[Diagram: INFO polls to each node staggered within every 1-second window]
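A minimal sketch of staggering INFO polls so 10,000+ nodes are not all queried at the same instant; the real monitor is built on Scala/Akka, so this Java form is illustrative only, and the metric shipping is omitted.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import redis.clients.jedis.Jedis;

// Hypothetical sketch: give each node its own offset inside
// the 1-second polling window to flatten network bandwidth.
public class InfoPoller {

    private final ScheduledExecutorService scheduler =
            Executors.newScheduledThreadPool(4);

    public void start(List<Jedis> nodes) {
        long stepMicros = 1_000_000L / Math.max(1, nodes.size());
        for (int i = 0; i < nodes.size(); i++) {
            Jedis node = nodes.get(i);
            // Poll every second, offset by this node's slot in the window.
            scheduler.scheduleAtFixedRate(
                    () -> collect(node.info()),
                    i * stepMicros, 1_000_000L, TimeUnit.MICROSECONDS);
        }
    }

    private void collect(String info) {
        // Parse metrics and ship them to Elasticsearch (omitted).
    }
}
```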
20. Automatic bursting detection
• To find command/connection bursting
• When “Redis Cluster Monitor” finds bursting
patterns
• Captures associated Redis commands
• Stores the commands into Elasticsearch
• Command, key, parameter, host:port,
client’s IP, count, and more
• Developers view the information on Kibana,
find problematic commands, and fix the cause
[Screenshot: detected bursting results in Kibana]
21. Experiences with official Redis Cluster 3.2
• Used Redis Cluster 3.2 (not 4.x)
• Why “official” Redis Cluster?
• Dynamic resizing with mutable data via server-side clustering
• Community standard (right?)
• So,
• We applied it to some clusters
• We ran into some issues
22. Redis Cluster 3.2: Replication and operation
• What happens when replacing a “master” machine?
(e.g. memory ECC warning, disk failure)
• Killing a master → client failures continue for 20 seconds to 1 minute
• Manual failover via the CLUSTER FAILOVER command
• PSYNC v1 → full sync → some client failures
• Workaround
• Move slots to other masters → FAILOVER → move the slots back
• Takes a long time and involves a lot of rehashing
[Diagram: CLUSTER FAILOVER promoting slave B while master A becomes the slave]
23. Redis Cluster 3.2: More used memory
• Needs more memory than standalone Redis
• The key→slot mapping is kept in a ZSET
• https://github.com/antirez/redis/issues/3800
• 2x memory → requires 2x machines?
• 10,000+ shards → 20,000+ shards?
• 4.x uses a radix tree (rax) instead, but still uses more memory than standalone
Example of a Redis server process with 56M keys (jemalloc-4.0.3), max memory 16 GB:

Type                    Version   Used Memory (GB)
In-house (standalone)   3.2.11     5.91
Redis Cluster           3.2.11    11.69
In-house (standalone)   4.0.8      5.91
Redis Cluster           4.0.8      9.43
24. Redis Cluster 3.2: Max nodes size
• “High performance and linear scalability up to 1,000 nodes.”
• The recommended maximum number of nodes is <= 1,000
• Some large clusters have 1,000–3,000 nodes (shards) now
• Size > 1,000? → Gossip traffic eats a lot of network bandwidth
• Solutions
• Separating data to another cluster
• Client-side cluster sharding
25. Pros/Cons of each system

System                  Strength                       Weakness
Proxy                   Light client                   Longer latency; more servers for proxies
Client-Sharding         Short latency; cost efficient  Resizing is hard; fat client
Official Redis Cluster  Easy dynamic resizing          More memory and servers; limited max nodes
26. Asynchronous Redis Client
• Some LINE developers need a Redis client for our in-house Redis clusters
in their asynchronous application servers with RxJava and Armeria
• But Jedis 2.x doesn’t support asynchronous I/O
• Contributed netty-codec-redis (codec only, not a client) to the Netty repo
• Adopted Lettuce for the LINE Redis client (async usage sketched below)
• https://github.com/lettuce-io/lettuce-core
• Added implementations for in-house requirements:
client sharding, per-request timeout checks, monitoring metrics, the Anticipator,
replicated clusters, external reconnection, etc.
• Contributed fixes and improvements to the Lettuce repo
[Diagram: synchronous I/O blocks the calling thread while waiting; asynchronous I/O leaves it free]
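A minimal sketch of Lettuce's asynchronous API; package names follow lettuce-core 5.x (the 4.4.x line used in the measurements below lives under com.lambdaworks.redis instead).

```java
import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.async.RedisAsyncCommands;

// Sketch of issuing a Redis command without blocking the caller.
public class LettuceAsyncExample {

    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://localhost:6379");
        StatefulRedisConnection<String, String> connection = client.connect();
        RedisAsyncCommands<String, String> async = connection.async();

        // The calling thread is free as soon as the command is flushed;
        // Netty's event loop completes the future when the reply arrives.
        async.get("message:1")
             .thenAccept(value -> System.out.println("value: " + value))
             .toCompletableFuture()
             .join(); // block only for this demo; async servers would not

        connection.close();
        client.shutdown();
    }
}
```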
27. Latency of asynchronous Redis Client (1)
• Redis server processes are fast
• Client-side latency can be a big part of the whole latency
• Average latency for a single request on an empty Linux machine
• Jedis: 16 μs
• Lettuce (Netty+epoll): 31 μs
• Differences?
• Jedis 2.9: sendto(), poll(), recvfrom() in the same thread
• Lettuce 4.4.x: write(), epoll_wait(), read(), plus more futex() calls
• ByteBuf and Netty’s buffer management, JNI (epoll), …
[Chart: per-request latency in nanoseconds for Jedis 2.9 vs. Lettuce 4.4.x (netty-codec-redis)]
28. Latency of asynchronous Redis Client (2)
• How about production?
• Differences depend on JVM state, threads, GC, and other factors
• Longer average response time
• More response time peaks
[Charts: average response time and latency peaks on the client side (μs), Jedis vs. Lettuce]
29. Throughput of asynchronous Redis Client
• Good throughput
• Pros/Cons
• Sync: short latency, but thread blocking
• Async: high throughput, but longer latency
• Adopting Lettuce for asynchronous server modules
[Chart: throughput (requests per second), Jedis vs. Lettuce, x-axis from 1 to 12]
30. Current challenges and future work
• Migrating more in-house clusters to official Redis Cluster 3.2
• Improving the cluster management system for both in-house and official clusters
• Adopting Lettuce-based clients more widely for asynchronous systems
• Testing and trying Redis Cluster 4.x
• And more: reducing hot keys, automating operations, reducing the number of storage clusters, reducing mutable data
31. We're hiring
• https://career.linecorp.com/linecorp/career/list
• https://linecorp.com/ja/career/ja/all
• http://recruit.linepluscorp.com/lineplus/career/list