Kafka is a high-throughput, fault-tolerant, scalable platform for building high-volume, near-real-time data pipelines. This presentation is about tuning Kafka pipelines for high performance.
It discusses select configuration parameters and deployment topologies that are essential for achieving higher throughput and lower latency across the pipeline, along with lessons learned in troubleshooting and optimizing a truly global data pipeline that replicates 100 GB of data in under 25 minutes.
3. Tuning Truly Global Production Kafka Pipelines
(Diagram: Data Source (Hadoop) → Kafka Venice Feed cluster (East Coast) → four Mirror-Maker clusters (to west-coast, to Asia, to east-coast, to gulf-coast) → a Kafka Venice cluster with Venice Consumers in each of Gulf Coast, West Coast, Asia, and East Coast.)
4. But first, some basics…
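• Kafka: Distributed Messaging System rethought as a distributed commit log
(Diagram: Producer 1 and Producer 2 write to Topic T on a Kafka cluster of two brokers, Broker 1 and Broker 2. The brokers hold the partition logs P0 and P1 and their replicas P0’ and P1’, kept in sync by replication. Consumer Group A (A1, A2) and Consumer Group B (B1) read from the topic.)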
Topic T has 2 partitions P0 and P1.
P0’ and P1’ are replicas of P0 and P1.
5. Moving Data Is Critical in Internet Companies
(Image Credit: Kafka Online Documentation)
6. Kafka Pipeline
• Why Kafka-based Pipelines
• Producer/Consumer Throughput and Time Decoupling
• Large, Reliable, Durable buffer
• Data replication for high availability of data
(Diagram: Producer → Source Kafka Cluster (log) → Kafka Mirror-Maker Cluster → Destination Kafka Cluster (log) → Consumer)
The main value Kafka provides to data pipelines is its ability to serve as a very
large, reliable buffer between various stages in the pipeline, effectively
decoupling producers and consumers of data within the pipeline.
7. Anatomy of a Kafka Pipeline
(Image Credit: Kafka Definitive Guide, O’Reilly)
8. Aspects of Kafka Pipelines
• Reliability and Availability
• Replication Topologies (Structure)
• Time Decoupling
• Durability
• Throughput
• Latency
• Data Integration and Schemas
• Transformations
• Fair Load Distribution
• Migration/Upgrades
• Topic Lifecycle Management
• DDoS Prevention and Quotas
• Auditing
9. Reliability and Availability
• Must avoid single points of failure
• Allow fast and automatic recovery
• Most systems need an at-least-once delivery guarantee
• Do not lose data
• But, be ready for duplicates
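Being "ready for duplicates" usually means making the consuming side idempotent. The following is a minimal Python sketch of one way to do that, not something taken from the talk. `Record` and `apply_once` are hypothetical names (not Kafka APIs), and the sketch assumes every record carries its partition and offset and that offsets arrive in increasing order within a partition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a consumed Kafka record."""
    partition: int
    offset: int
    value: str


def apply_once(records: Iterable[Record],
               handler: Callable[[Record], None],
               high_water: Dict[int, int]) -> int:
    """Apply handler to each record at most once per (partition, offset).

    high_water maps partition -> highest offset already applied. To survive
    restarts it has to be persisted together with the handler's side effects.
    Returns the number of redelivered records that were skipped.
    """
    skipped = 0
    for rec in records:
        if rec.offset <= high_water.get(rec.partition, -1):
            skipped += 1  # redelivered record: already applied, skip it
            continue
        handler(rec)
        high_water[rec.partition] = rec.offset
    return skipped
```

This only covers redelivery to the consumer (for example after a restart or rebalance). A duplicate written by a retrying producer or Mirror-Maker gets a new offset, so catching it would need a unique key inside the message itself.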
10. Replication Topologies
Hub and Spoke Architecture (Image Credit: Kafka Definitive Guide, O’Reilly)
Crossbar Architecture (LinkedIn)
(Diagram: five Kafka clusters, each serving its local apps, connected by arrows. Each arrow is a Mirror-Maker cluster.)
There are many more replication topologies
11. Kafka Pipelines in Industrial IoT
Coditation
[link]
telemetry
(Dotted lines and shaded shapes mean passive replication)
12. Durability (no-loss data pipeline)
• Durability interacts with throughput and latency
• Durability levels change depending upon producer configurations
Producer Configurations | Throughput | Latency | Durability | Ordered
acks=0 | High | Low | No guarantee | Yes
acks=1 | Medium | Medium | Leader | Yes
acks=all (-1) | Low | High | In Sync Replicas | Yes
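As a small illustration, the three rows of the table can be written down as producer overrides. This is a sketch only: `acks` is the real producer setting, while the level names ("none", "leader", "isr") and both helper functions are made up here for readability.

```python
# Durability level -> value of the producer `acks` setting (from the table).
# The level names on the left are illustrative, not Kafka terms.
ACKS_BY_DURABILITY = {
    "none": "0",    # no guarantee: do not wait for any acknowledgement
    "leader": "1",  # wait for the partition leader only
    "isr": "all",   # wait for all in-sync replicas (same as -1)
}


def producer_overrides(durability: str) -> dict:
    """Return the producer config override for a durability level."""
    if durability not in ACKS_BY_DURABILITY:
        raise ValueError(f"unknown durability level: {durability!r}")
    return {"acks": ACKS_BY_DURABILITY[durability]}


def to_properties(config: dict) -> str:
    """Render a config dict in key=value form, one setting per line."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


print(to_properties(producer_overrides("isr")))  # prints: acks=all
```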
13. Throughput
• Producer and consumer throughputs are decoupled
• Add/Remove producers and consumers independently
• Throughput scales with cluster size
• Increase parallelization by increasing partitions
• Throughput also depends on co-location
• Remote consume throughput is much greater than remote produce throughput
• A fetch response to a consumer can batch much more data than a producer's produce request
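(Diagram: a Source Kafka Cluster in Datacenter 1 and a Destination Kafka Cluster in Datacenter 2, with two Mirror-Maker placements: remote produce, where the Mirror-Maker cluster sits with the source and produces across datacenters, and remote consume, where it sits with the destination and consumes across datacenters.)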
14. Configurations For Tuning Throughput [link]
(Diagram: Producer → Source Kafka Cluster (log) → Kafka Mirror-Maker Cluster → Destination Kafka Cluster (log) → Consumer)
Producer Configurations | Kafka Broker Configurations | KMM Configurations | Consumer Configurations
batch.size | num.replica.fetchers | All producer and consumer configs are applicable | Increase # of topic partitions
linger.ms | replica.fetch.max.bytes | Consumer to producer ratio | fetch.message.max.bytes
compression.type | Disable inter-broker SSL | | fetch.min.bytes
acks | socket.receive.buffer.bytes | |
max.in.flight.requests.per.connection | | |
send.buffer.bytes (also TCP buffers) | | |
15. Latency
• Typical latency is a few hundred milliseconds
• Latency SLA depends on availability SLA
• One 60-minute downtime in a week is 99.4% availability (assuming a weekly report)
• One 1-minute downtime in a week is 99.99% availability (assuming a weekly report)
• But SLA can be fragile
• Large Mirror-Maker clusters could take minutes to rebalance
• Maintenance of Mirror-Maker clusters could take several minutes
• Bounce Mirror-Maker cluster with 100% concurrency (to avoid repetitive rebalances)
• Configurations that affect pipeline latency
• Producer linger.ms and acks
• Topic replication factor
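The availability figures on this slide follow from simple arithmetic over a one-week reporting window. A short Python check (the function name is illustrative):

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes in the weekly reporting window


def weekly_availability(downtime_minutes: float) -> float:
    """Percentage of the week during which the pipeline was available."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_WEEK)


print(f"{weekly_availability(60):.2f}%")  # prints 99.40%
print(f"{weekly_availability(1):.2f}%")   # prints 99.99%
```

A single Mirror-Maker rebalance or maintenance window of a few minutes is therefore enough to drop below 99.99% for the week.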
16. Data Integration and Schemas
• Kafka is schema agnostic
• But applications must be protected from backwards-incompatible changes to schema
• Schema-registry
• Data Integration should support schema evolution
• Only backwards compatible schema evolution
• But bend the rules if/when needed
• Single topic with multiple schemas
• Propagate schema changes automatically through the pipeline
18. Fair Load Distribution
• Ideal: Each Kafka Mirror Maker should share the burden equally
• But
• When brokers go up/down, partition imbalance can happen because Preferred Leader Election is not run
• Imbalance in partitions and changes in partition leadership may cause KMM to exceed quotas
• Remedy: Move partitions manually
19. Migration/Upgrades
• Upgrading hardware for brokers
• More cores
• More memory
• Faster NIC
• If you reduce # of brokers
• Must increase quotas
• Increase num.replica.fetchers
• Increase replica.fetch.response.max.bytes
20. Topic Lifecycle Management
• Topic creation
• Topic should be created in the destination cluster first
• If not, Mirror-Maker will start replicating the topic and may fail to produce (or a topic with default configs gets created)
• Topic deletion
• Topic should be deleted in the source cluster first
• But only when no one is producing or consuming
• If a topic is deleted in the source cluster, Mirror-Maker will cause it to be recreated with default configs due to metadata refresh
21. DDoS Prevention and Quotas
• Hadoop to Kafka pipeline gets DDoSed easily
• 800+ mappers in some cases
• Should use reducers instead
• Quotas on incoming byte rate
• Byte rate may be low but request-rate also matters
• Request-rate throttling is available in Kafka 0.11.
• Mirror-Makers batch very well so request-rate throttling is not necessarily needed
23. Global PROD Kafka Pipelines for Venice
(Diagram: Data Source (Hadoop) → Kafka Venice Feed cluster (East Coast) → four Kafka MM clusters (to west-coast, to Asia, to east-coast, to gulf-coast) → a Kafka Venice cluster with Venice Consumers in each of Gulf Coast, West Coast, Asia, and East Coast. Two of the replication links are marked "Low throughput".)
24. The Slow Throughput Problem (One Topic Experiment)
22 min
38 min
Replication to West Coast = 54 mins
Replication to Asia = 180 min
25. CPU Utilization On Slow Mirror-Makers
(Charts: CPU utilization of the Mirror-Makers to Asia (this one was the slowest) and to West Coast (slower).)
Mirror-Maker | Average CPU Util (aggregate) | Max CPU Util (aggregate)
To Gulf Coast | 96% | 165%
To East Coast | 104% | 165%
To West Coast | 40% | 90%
To Asia | 16% | 60%
27. Setup
• Producer Setup
• 100 GB data in each push from Hadoop
• 840 mappers producing data
• Kafka Broker Setup
• 4 large brokers, 32 cores each, 256 GB RAM each
• Broker replication over SSL
• Topic Replication Factor=3
• Producer ACK = -1 (all)
• Partitions = 200
• Mirror Maker Setup
• 4 independent groups
• 10 processes in each cluster
• 8 consumers in each process
• 80 consumers in each pipeline
• It’s CPU bound (due to decompression)
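With 200 partitions and 80 consumers per pipeline, each consumer ends up owning only two or three partitions. The sketch below shows the arithmetic, assuming the partitions are spread as evenly as possible across the consumers. The real split depends on the configured assignment strategy, and the function name is illustrative.

```python
from collections import Counter
from typing import List


def partitions_per_consumer(num_partitions: int, num_consumers: int) -> List[int]:
    """Spread partitions as evenly as possible; return the count per consumer."""
    base, extra = divmod(num_partitions, num_consumers)
    # The first `extra` consumers each take one partition more than the rest.
    return [base + 1 if i < extra else base for i in range(num_consumers)]


counts = partitions_per_consumer(200, 80)
print(Counter(counts))  # 40 consumers own 3 partitions, 40 own 2
```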
28. High Ping Latency
• From East Coast
East Coast | Gulf Coast | West Coast | Asia
0.025 ms | 29 ms | 67 ms | 236 ms
29. Text Book Solution
• Don’t remote produce. Prefer remote consume and local produce
• Increase max.in.flight.requests.per.connection above 1
(Diagram: the same pipeline rearranged for remote consume and local produce: Data Source (Hadoop) → Kafka Venice Feed → Kafka MM clusters (to east-coast, to gulf-coast, to west-coast, to Asia) → a Kafka Venice cluster with Venice Consumers in each of Gulf Coast, West Coast, Asia, and East Coast.)
30. Text Book Solution Was Not Practical (at the moment)
• Must guarantee order (max.in.flight.requests.per.connection must be 1)
• Must open ACLs (firewall ports) for incoming remote connections, which takes time
• Must have hardware capacity in the destination datacenter
31. Key Observations and Remedies
• High Ping Latency
• From East-coast
• Four Source brokers
• 150+ Under Replicated Partitions (URP)
• 840 mappers (producers) is simply way too many → Replaced by reducers
• SSL has overhead → Disable inter-broker SSL
• Imbalanced response time
• Unequal workload on the brokers → Should do manual replica movement to spread load evenly
• Kafka Mirror Maker
• Under-provisioned machines, 4 cores only → Must change to 8 cores
• 200 partitions and 80 consumers → 2 or 3 partitions per consumer → Each consumer talks to at most 3 brokers → Inefficient fetch → Must increase # of partitions
• Producer batch.size=100K → Must increase batch size (1 MB max is allowed)
• Producer send.buffer.bytes=128K → Must increase send.buffer.bytes (10 MB)
• Just 1 producer per process, at most one request in flight at a time → Can’t change that because order must be preserved
Ping latency from East Coast:
East Coast | Gulf Coast | West Coast | Asia
0.025 ms | 29 ms | 67 ms | 236 ms
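The interaction between ping latency, batch size, and a single in-flight request can be made concrete with a rough ceiling: a producer limited to one request in flight can send at most one batch per round trip. The Python sketch below uses the ping times above as the round-trip time and reads 100K and 1 MB as 100 KiB and 1 MiB. Both are assumptions, and the model ignores broker processing, acks=all replication wait, and TCP window effects, so it is an upper bound rather than a prediction.

```python
# Ping round-trip times from East Coast, in milliseconds (from the slide).
PING_RTT_MS = {"Gulf Coast": 29.0, "West Coast": 67.0, "Asia": 236.0}


def ceiling_mb_per_s(batch_bytes: int, rtt_ms: float) -> float:
    """Upper bound in MB/s for one producer with one request in flight."""
    batches_per_second = 1000.0 / rtt_ms  # at most one batch per round trip
    return batch_bytes * batches_per_second / 1_000_000


for region, rtt in PING_RTT_MS.items():
    before = ceiling_mb_per_s(100 * 1024, rtt)   # batch.size = 100K
    after = ceiling_mb_per_s(1024 * 1024, rtt)   # batch.size = 1 MB
    print(f"{region}: {before:.2f} MB/s -> {after:.2f} MB/s per producer")
```

Under this model the ceiling scales linearly with batch size and inversely with round-trip time, which is why larger batches matter most on the Asia link.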
32. The Solution That Saved The Day Week
• Remote produce
• Max-in-flight = 1
• Increased batch.size to 1 MB and send.buffer.bytes to 10 MB
• But there was a bug. Producer estimated batch sizes incorrectly.
• Sent larger than 1MB batches to the broker.
• Sporadic REQUEST_TO_LARGE exceptions. Shuts down KMM.
• Disabled compression estimation
• Pack a batch up to 1 MB, compress, and send.
• Resulting compressed batch size up to 650K (30% unutilized)
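The "pack a batch up to 1 MB, compress, and send" step can be sketched as follows. This is an illustration of the idea, not the actual producer change: `zlib` stands in for whatever codec the pipeline uses, record framing is omitted, and `pack_then_compress` is a hypothetical name. The limit is applied to the uncompressed size, so no estimate of the compression ratio is needed.

```python
import zlib
from typing import Iterable, Iterator, List

MAX_BATCH_BYTES = 1024 * 1024  # 1 MB limit, applied to the uncompressed batch


def pack_then_compress(records: Iterable[bytes],
                       max_batch_bytes: int = MAX_BATCH_BYTES) -> Iterator[bytes]:
    """Fill each batch by uncompressed size, then compress it as a whole."""
    batch: List[bytes] = []
    size = 0
    for rec in records:
        if len(rec) > max_batch_bytes:
            raise ValueError("single record exceeds the batch limit")
        if batch and size + len(rec) > max_batch_bytes:
            # The batch is full by uncompressed size: compress and emit it.
            yield zlib.compress(b"".join(batch))
            batch, size = [], 0
        batch.append(rec)
        size += len(rec)
    if batch:
        yield zlib.compress(b"".join(batch))
```

For compressible data the emitted batch lands well under the limit, which is the trade-off the slide describes: no oversized requests, at the cost of leaving part of each request unused.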
The main value Kafka provides to data pipelines is its ability to serve as a very large, reliable buffer between various stages in the pipeline, effectively decoupling producers and consumers of data within the pipeline. This decoupling, combined with reliability, security, and efficiency, makes Kafka a good fit for most data pipelines.
A fetch response sent to a consumer can batch much more data than a produce request can.
Performance of compression types differs a lot.
KMM: high messageBatchSize value (200K). 1 consumer and 4 producers per process. Small linger, because the batches fill fast due to CPU optimization.
Another way to increase throughput without increasing the number of partitions is to bump up fetch.min.bytes to something like 20 MB; this allows more data to be fetched from a single partition. The downside is that there might be long GC pauses due to such big memory allocations.
When end-to-end latency requirements are in seconds, even availability % starts to matter