Yuto Kawamura
LINE / Z Part Team
At LINE we operate Apache Kafka as the company-wide shared data pipeline that services use for storing and distributing data.
Kafka underlies many of our services in some way, not only the messaging service but also AD, Blockchain, Pay, Timeline, cryptocurrency trading, and more.
Many services feed data into our cluster, leading to over 250 billion daily messages and 3.5GB of incoming bytes per second, which is one of the largest scales in the world.
At the same time, it must be stable and performant at all times because many important services use it as a backend.
In this talk I will give an overview of Kafka usage at LINE and how we operate it.
I will also talk about some of the engineering we did to maximize its performance and to solve problems caused particularly by hosting huge data from many services, leveraging advanced techniques like kernel-level dynamic tracing.
2. SPEAKER
● Yuto Kawamura
● Senior Software Engineer
● Lead of the team providing the company-wide Kafka platform
● Active in Apache Kafka Community
● Code contribution
● Speaking at Kafka Summit
6. KAFKA PLATFORM
● Large-scale Kafka clusters provided for any system/service inside LINE
● Started internally by the messaging server development team
● Expanded to a company-wide platform
7. USAGE
● Two kinds of usage
● Distributed task queue for buffering and processing business logic asynchronously
● e.g., queueing heavy tasks from a web app server to a background task processor
● "Data Hub" for distributing data to other services
9. SINGLE CLUSTER SHARED BY MANY SYSTEMS
● Concept of “Data Hub”
● Easy to find and access data
● Operational and management efficiency
● Operational cost does not grow in proportion to the number of users
● Concentrate engineering resources on maximizing reliability/performance
11. SCALE
● 250 billion daily messages
● 5 million / second
● 210TB daily inflow
● 4GB / second
● 50+ systems
12. MULTITENANCY REQUIREMENTS
● The cluster can protect itself against abusive workloads
● An accidental workload doesn't propagate to other users
● We can track which client is sending which requests
● Find the source of strange requests
● A certain level of isolation among client workloads
● A slow response for one client doesn't appear to another client
13. PROTECT THE CLUSTER AGAINST ABUSE
● It is more important to manage the number of requests than the incoming/outgoing byte rate
● Kafka is amazingly robust against large data volumes
● Good design leveraging system functions
● Page cache for caching data
● sendfile(2) for zero-copy data transfer
● Native batching
● The typical danger is clients sending too many requests
14. REQUEST QUOTA
● Use request quotas
● Limit the “time of broker threads that can be used by each client group”
● Set a default quota (a configuration sketch follows below)
● Prevent a single client from accidentally consuming all broker resources
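As an illustration (not from the original slides): with recent Kafka versions, a default request quota can also be set programmatically through the AdminClient. A minimal Java sketch, assuming a hypothetical broker address and an illustrative 50% limit:

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.common.quota.ClientQuotaAlteration;
    import org.apache.kafka.common.quota.ClientQuotaEntity;

    public class DefaultRequestQuota {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092"); // hypothetical address

            try (Admin admin = Admin.create(props)) {
                // A null value marks the default entity: the quota applies to
                // every client-id without a more specific quota of its own.
                ClientQuotaEntity defaultClients = new ClientQuotaEntity(
                        Collections.singletonMap(ClientQuotaEntity.CLIENT_ID, null));

                // request_percentage limits the share of broker thread time
                // (network + request handler) a client group may consume.
                ClientQuotaAlteration alteration = new ClientQuotaAlteration(
                        defaultClients,
                        Collections.singletonList(
                                new ClientQuotaAlteration.Op("request_percentage", 50.0)));

                admin.alterClientQuotas(Collections.singletonList(alteration)).all().get();
            }
        }
    }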
15. ISOLATION AMONG CLIENT WORKLOADS
● When can performance isolation among different clients be violated?
● Let’s look at an example from actual troubleshooting.
16. DETECTION
● 50x ~ 100x slower response time in 99th percentile Produce response time
● Normal: ~20ms
● Observed: 50ms ~ 200ms
23. NETWORK THREAD RUNS EVENT LOOP
● Multiplexes the client sockets assigned to it and processes them sequentially (see the sketch below)
● It never blocks awaiting IO completion
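To make the single-threaded nature concrete, here is a minimal Java NIO sketch in the spirit of the broker's network thread (illustrative, not Kafka's actual Processor code; handle() is a hypothetical callback):

    import java.io.IOException;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.util.Iterator;

    abstract class NetworkThread implements Runnable {
        private final Selector selector;

        NetworkThread(Selector selector) {
            this.selector = selector;
        }

        @Override
        public void run() {
            try {
                while (true) {
                    selector.select(300); // waits for readiness, never for I/O completion
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        // Each ready socket is handled in turn on this single
                        // thread; any blocking call inside handle() stalls
                        // every socket the thread serves.
                        handle(key);
                    }
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        abstract void handle(SelectionKey key) throws IOException;
    }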
24. WHEN NETWORK THREADS GET BUSY...
● It means one of the following:
● 1. Really busy doing lots of work: many requests/responses to read/write
● 2. Blocked by some operation (which should not happen in an event loop in general)
25. RESPONSE HANDLING - NORMAL REQUESTS
● When the response is in the queue, all data to be transferred is in memory
26. RESPONSE HANDLING - FETCH RESPONSE
● When the response is in the queue, topic data segments are not in userspace memory
● => They are copied to the client socket directly inside the kernel using the sendfile(2) system call (sketch below)
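In the JVM this zero-copy path is FileChannel#transferTo, which maps to sendfile(2) on Linux. A simplified sketch of the idea (not Kafka's actual FileRecords code):

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;

    final class SegmentSender {
        // Transfer a slice of a log segment straight to the client socket.
        // The bytes move from the page cache to the socket inside the kernel;
        // if the pages are NOT cached, this call blocks on disk I/O -- which
        // is exactly what must never happen on the network (event loop) thread.
        static long send(FileChannel segment, long position, long count,
                         SocketChannel client) throws IOException {
            return segment.transferTo(position, count, client);
        }
    }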
27. IF TARGET DATA DOES NOT EXIST IN PAGE CACHE
● Target data in page cache:
● => Just a memory copy. Very fast: ~100us
● Target data NOT in page cache:
● => Needs to load data from disk into the page cache first. Can be slow: ~50ms (or even slower)
● => What if this happens in the event loop…?
28. SUSPECTING BLOCKING IN SENDFILE(2)
● Inspect the duration of sendfile system calls issued by the broker process
● How?
29. SYSTEMTAP
● A kernel-layer dynamic tracing tool and scripting language
● Safe to run in production because of its low overhead
● Alternatives: DTrace, eBPF, etc.
32. SOLUTION
● Make sure the data is ready in memory before the response is passed to the network thread
● => The event loop never blocks
33. WARMUP PAGE CACHE
● Move the blocking part to the request handler threads (= single queue and pool of threads)
34. WARMUP PAGE CACHE
● When the network thread calls sendfile(2) to transfer log data, the data is always in the page cache
35. WARMUP IMPLEMENTATION
● Easiest way: do a synchronous read(2) on the target data
● => Large overhead from copying memory from kernel to userland
● Why does Kafka use sendfile(2) for transferring topic data?
● => To avoid expensive large memory copies
● How can we achieve the warmup while keeping this property?
36. TRICK - ZERO COPY PAGE LOAD
● Call sendfile(2) for the target data with destination /dev/null (see the sketch below)
● The /dev/null driver does not actually copy data anywhere
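A minimal Java sketch of the trick (illustrative, not the actual patch): on Linux, FileChannel#transferTo maps to sendfile(2), so transferring the range into /dev/null faults the pages into the page cache without copying anything to userland:

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    final class PageCacheWarmer {
        // Fault the [position, position + count) range of a log segment into
        // the page cache. /dev/null discards the data without copying it, so
        // the only real cost is the disk read that fills the cache.
        static void warmup(FileChannel segment, long position, long count)
                throws IOException {
            try (FileChannel devNull = FileChannel.open(
                    Paths.get("/dev/null"), StandardOpenOption.WRITE)) {
                long done = 0;
                while (done < count) {
                    long n = segment.transferTo(position + done, count - done, devNull);
                    if (n <= 0) break; // nothing more to transfer
                    done += n;
                }
            }
        }
    }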
37. WHY DOES IT HAVE ALMOST NO OVERHEAD?
● The Linux kernel internally uses `splice` to implement sendfile(2)
● The `splice` implementation of /dev/null returns without iterating over the target data
39. IT WORKED
● No response time degradation even when Fetch requests read from disk
40. WHY NOT CONTRIBUTE IT BACK?
● KAFKA-7504 Broker performance degradation caused by call of sendfile reading disk in network thread - x50 ~ x100 response time reduction
● KAFKA-4614 Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex - x100 ~ response time reduction
● KAFKA-6501 ReplicaFetcherThread should close the ReplicaFetcherBlockingSend earlier on shutdown - eliminated significant latency during broker restart
41. CONCLUSION
● Introduced our engineering for operating the company-wide Kafka platform
● Quotas, SystemTap, and patches, backed by a deep understanding of the system
● After fixing some issues, our hosting policy is working well and efficiently, keeping:
● the concept of a single "Data Hub", and
● operational cost not proportional to the number of users/usages
● We are contributing to the world through OSS
42. … AND NEXT
● Kafka platform evolution
● Client standardization and management
● Higher availability
● SRE team
● Planning to roll out a new team for reliability engineering
● Sharing knowledge/tools that are independent of the middleware
● Come and ask me more!