Yuto Kawamura
LINE / Z Part Team
At LINE we operate Apache Kafka as the company-wide shared data pipeline that services use for storing and distributing data.
Kafka underlies many of our services in some way, not only the messaging service but also AD, Blockchain, Pay, Timeline, cryptocurrency trading, and more.
Many services feed data into our cluster, leading to over 250 billion daily messages and 3.5GB of incoming bytes per second, which is one of the largest scales in the world.
At the same time, it must be stable and performant at all times because many important services use it as a backend.
In this talk I will give an overview of Kafka usage at LINE and how we operate it.
I will also talk about some of the engineering we did to maximize its performance and to solve problems caused particularly by hosting huge data from many services, leveraging advanced techniques like kernel-level dynamic tracing.
2. SPEAKER
● Yuto Kawamura
● Senior Software Engineer
● Lead of the team providing the company-wide Kafka platform
● Active in Apache Kafka Community
● Code contribution
● Speaking at Kafka Summit
6. KAFKA PLATFORM
● Large-scale Kafka clusters provided for any system/service inside LINE
● Started internally by the messaging server development team
● Expanded to a company-wide platform
7. USAGE
● Two kinds of usage
● Distributed task queue for buffering and processing business logic asynchronously
● e.g., queueing heavy tasks from a web app server to a background task processor
● "Data Hub" for distributing data to other services
9. SINGLE CLUSTER SHARED BY MANY SYSTEMS
● Concept of “Data Hub”
● Easy to find and access data
● Operational and management efficiency
● Operational cost does not grow in proportion to the number of users
● Concentrate engineering resources on maximizing reliability/performance
11. SCALE
● 250 billion daily messages
● 5 million / second
● 210TB daily inflow
● 4GB / second
● 50+ systems
12. MULTITENANCY REQUIREMENTS
● The cluster can protect itself against abusive workloads
● An accidental workload doesn't propagate to other users
● We can track which client is sending which requests
● Find the source of strange requests
● A certain level of isolation among client workloads
● A slow response for one client doesn't appear to another client
13. PROTECT THE CLUSTER AGAINST ABUSE
● It is more important to manage the number of requests than the incoming/outgoing byte rate
● Kafka is amazingly robust against large data volumes
● Good design leveraging system functions
● Page cache for caching data
● sendfile(2) for zero-copy data transfer
● Native batching
● The typical danger is clients sending too many requests
14. REQUEST QUOTA
● Use request quotas
● Limit the “time of broker threads that can be used by each client group”
● Set a default quota (a configuration sketch follows below)
● Prevent a single client from accidentally consuming all broker resources
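As an illustration (not from the original slides): with recent Kafka versions, a default request quota can also be set programmatically through the AdminClient. A minimal Java sketch, assuming a hypothetical broker address and an illustrative 50% limit:

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.common.quota.ClientQuotaAlteration;
    import org.apache.kafka.common.quota.ClientQuotaEntity;

    public class DefaultRequestQuota {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092"); // hypothetical address

            try (Admin admin = Admin.create(props)) {
                // A null value marks the default entity: the quota applies to
                // every client-id without a more specific quota of its own.
                ClientQuotaEntity defaultClients = new ClientQuotaEntity(
                        Collections.singletonMap(ClientQuotaEntity.CLIENT_ID, null));

                // request_percentage limits the share of broker thread time
                // (network + request handler) a client group may consume.
                ClientQuotaAlteration alteration = new ClientQuotaAlteration(
                        defaultClients,
                        Collections.singletonList(
                                new ClientQuotaAlteration.Op("request_percentage", 50.0)));

                admin.alterClientQuotas(Collections.singletonList(alteration)).all().get();
            }
        }
    }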
15. ISOLATION AMONG CLIENT WORKLOADS
● When can performance isolation among different clients be violated?
● Let’s look at an example from actual troubleshooting.
16. DETECTION
● 50x ~ 100x slower response time in 99th percentile Produce response time
● Normal: ~20ms
● Observed: 50ms ~ 200ms
23. NETWORK THREAD RUNS EVENT LOOP
● Multiplexes the client sockets assigned to it and processes them sequentially (see the sketch below)
● It never blocks awaiting IO completion
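To make the single-threaded nature concrete, here is a minimal Java NIO sketch in the spirit of the broker's network thread (illustrative, not Kafka's actual Processor code; handle() is a hypothetical callback):

    import java.io.IOException;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.util.Iterator;

    abstract class NetworkThread implements Runnable {
        private final Selector selector;

        NetworkThread(Selector selector) {
            this.selector = selector;
        }

        @Override
        public void run() {
            try {
                while (true) {
                    selector.select(300); // waits for readiness, never for I/O completion
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        // Each ready socket is handled in turn on this single
                        // thread; any blocking call inside handle() stalls
                        // every socket the thread serves.
                        handle(key);
                    }
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        abstract void handle(SelectionKey key) throws IOException;
    }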
24. WHEN NETWORK THREADS GET BUSY...
● It means one of the following:
● 1. Really busy doing lots of work: many requests/responses to read/write
● 2. Blocked by some operation (which should not happen in an event loop in general)
25. RESPONSE HANDLING - NORMAL REQUESTS
● When the response is in the queue, all data to be transferred is in memory
26. RESPONSE HANDLING - FETCH RESPONSE
● When the response is in the queue, topic data segments are not in userspace memory
● => They are copied to the client socket directly inside the kernel using the sendfile(2) system call (sketch below)
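In the JVM this zero-copy path is FileChannel#transferTo, which maps to sendfile(2) on Linux. A simplified sketch of the idea (not Kafka's actual FileRecords code):

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;

    final class SegmentSender {
        // Transfer a slice of a log segment straight to the client socket.
        // The bytes move from the page cache to the socket inside the kernel;
        // if the pages are NOT cached, this call blocks on disk I/O -- which
        // is exactly what must never happen on the network (event loop) thread.
        static long send(FileChannel segment, long position, long count,
                         SocketChannel client) throws IOException {
            return segment.transferTo(position, count, client);
        }
    }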
27. IF TARGET DATA DOES NOT EXIST IN PAGE CACHE
● Target data in page cache:
● => Just a memory copy. Very fast: ~100us
● Target data NOT in page cache:
● => Needs to load data from disk into the page cache first. Can be slow: ~50ms (or even slower)
● => What if this happens in the event loop…?
28. SUSPECTING BLOCKING IN SENDFILE(2)
● Inspect the duration of sendfile system calls issued by the broker process
● How?
29. SYSTEMTAP
● A kernel-layer dynamic tracing tool and scripting language
● Safe to run in production because of its low overhead
● Alternatives: DTrace, eBPF, etc.
32. SOLUTION
● Make sure the data is ready in memory before the response is passed to the network thread
● => The event loop never blocks
33. WARMUP PAGE CACHE
● Move the blocking part to the request handler threads (= single queue and pool of threads)
34. WARMUP PAGE CACHE
● When the network thread calls sendfile(2) to transfer log data, the data is always in the page cache
35. WARMUP IMPLEMENTATION
● Easiest way: do a synchronous read(2) on the target data
● => Large overhead from copying memory from kernel to userland
● Why does Kafka use sendfile(2) for transferring topic data?
● => To avoid expensive large memory copies
● How can we achieve the warmup while keeping this property?
36. TRICK - ZERO COPY PAGE LOAD
● Call sendfile(2) for the target data with destination /dev/null (see the sketch below)
● The /dev/null driver does not actually copy data anywhere
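A minimal Java sketch of the trick (illustrative, not the actual patch): on Linux, FileChannel#transferTo maps to sendfile(2), so transferring the range into /dev/null faults the pages into the page cache without copying anything to userland:

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    final class PageCacheWarmer {
        // Fault the [position, position + count) range of a log segment into
        // the page cache. /dev/null discards the data without copying it, so
        // the only real cost is the disk read that fills the cache.
        static void warmup(FileChannel segment, long position, long count)
                throws IOException {
            try (FileChannel devNull = FileChannel.open(
                    Paths.get("/dev/null"), StandardOpenOption.WRITE)) {
                long done = 0;
                while (done < count) {
                    long n = segment.transferTo(position + done, count - done, devNull);
                    if (n <= 0) break; // nothing more to transfer
                    done += n;
                }
            }
        }
    }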
37. WHY DOES IT HAVE ALMOST NO OVERHEAD?
● The Linux kernel internally uses `splice` to implement sendfile(2)
● The `splice` implementation of /dev/null returns without iterating over the target data
39. IT WORKED
● No response time degradation even when Fetch requests read from disk
40. WHY NOT CONTRIBUTE IT BACK?
● KAFKA-7504 Broker performance degradation caused by call of sendfile reading disk in network thread - x50 ~ x100 response time reduction
● KAFKA-4614 Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex - x100 ~ response time reduction
● KAFKA-6501 ReplicaFetcherThread should close the ReplicaFetcherBlockingSend earlier on shutdown - eliminated significant latency during broker restart
41. CONCLUSION
● Introduced our engineering for operating the company-wide Kafka platform
● Quotas, SystemTap, and patches, backed by a deep understanding of the system
● After fixing some issues, our hosting policy is working well and efficiently, keeping:
● the concept of a single "Data Hub", and
● operational cost not proportional to the number of users/usages
● We are contributing to the world through OSS
42. … AND NEXT
● Kafka platform evolution
● Client standardization and management
● Higher availability
● SRE team
● Planning to roll out a new team for reliability engineering
● Sharing knowledge/tools that are independent of the middleware
● Come and ask me more!