Kafka is an increasingly popular choice for enabling fast data and streaming. It offers a wide landscape of configuration options for tuning its performance profile, and understanding Kafka's internals is critical to picking your ideal configuration: depending on your use case and data needs, different settings will perform very differently. Let's walk through the performance essentials of Kafka. Let's talk about how your producer configuration can speed up or slow down the flow of messages to brokers. Let's talk about message keys, their implications, and their impact on partition performance. Let's talk about how to figure out how many partitions and how many brokers you should have. Let's discuss consumers and what affects their performance. How do you combine all of these choices into the best strategy going forward? How do you test the performance of Kafka? I will attempt a live demo with the help of Zeppelin to show in real time how to tune for performance.
8. Agenda
• Performance tuning - Just some quick points
• What you can change
• Simple changes
• Kafka Configuration Changes
• Brief Canned Demo
• Beware: Kafka settings are not exciting for everyone
• Architectural changes
10. Performance tuning
There is no magic bullet.
Guesses are just guesses.
Empirical fact requires testing.
Testing requires hardware, SMEs, time, and effort.
Performance testing is non-trivial.
14. Performance tuning
The better your load tests are the better your tuning will be.
Garbage in, Garbage out.
Every client is different
Has a unique signature of data/hardware/topics
Tune for bottlenecks found through testing.
Yes, there is always some low-hanging fruit.
22. The basics
● File descriptor limits
○ Per broker: partitions * segments + overhead
■ Watch this when you upgrade to 0.10
● Set vm.swappiness = 0
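The per-broker formula above can be sketched as a quick back-of-envelope calculation. All numbers here are hypothetical, and each open segment actually holds a data file plus index file(s), so treat the result as a floor, not a ceiling:

```python
# Back-of-envelope sketch of the slide's formula; all numbers are hypothetical.

def fd_estimate(partitions_per_broker: int, segments_per_partition: int,
                overhead: int = 1000) -> int:
    """partitions * segments + overhead (sockets, misc open files)."""
    return partitions_per_broker * segments_per_partition + overhead

# e.g. 1000 partitions x 10 segments + ~1000 for sockets and misc FDs
print(fd_estimate(1000, 10))  # 11000 -> set the broker's ulimit -n well above this
```

Compare the result against `ulimit -n` on each broker before going to production.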
23. The basics
● Kafka data should be on its own disks
● If you encounter read/write issues, add more disks
● Each data folder you add to the config will be written to in round-robin
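The round-robin behavior above comes from listing multiple data directories in the broker config; a minimal sketch (the paths are hypothetical):

```properties
# server.properties -- one directory per physical disk (paths illustrative)
# Partitions are spread across these directories as they are created.
log.dirs=/mnt/disk1/kafka-logs,/mnt/disk2/kafka-logs,/mnt/disk3/kafka-logs
```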
24. Latest is the Greatest
● Have you upgraded to 0.10?
● Adds an 8-byte timestamp to each message
○ Not great for small messages
● The broker no longer decompresses messages
○ Better performance when you use compression
● File descriptor limits
○ Segment indexing changed
26. Defaults are your friends
The default when you drive is to put on your seatbelt.
If you are going to change that default and not wear a seatbelt, I
hope you have thought through your choice.
Kafka's defaults are set up to help keep you safe.
If you are going to change a default to something else, I hope
you have thought through your choice.
28. Default Example
Acks:
Setting   | Description                                                  | Risk of data loss | Performance
acks=0    | No acknowledgment from the server at all. (Set it and forget it.) | Highest      | Highest
acks=1    | The leader completes the write of the data.                  | Medium            | Medium
acks=all  | The leader and all followers have written the data.          | Lowest            | Lowest
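The acks trade-off above is set on the producer; a minimal, illustrative producer.properties fragment (values are examples, not recommendations):

```properties
# producer.properties (illustrative values)
# acks=all maximizes durability at the cost of latency
acks=all
# retries pairs naturally with stronger acks
retries=3
```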
30. Definitions:
Latency: the length of time for one message to be processed.
Throughput: the number of messages processed per unit of time.
Batch:
• “Message 1” - Time 1 ← worst latency (waits longest for the batch to be sent)
• “Message 2” - Time 2
• “Message 3” - Time 3 ← best latency (the batch is sent soon after it arrives)
32. Batch Management
batch.size
- The maximum size of a batch, in bytes (not a message count).
linger.ms
- The maximum amount of time to wait before sending a batch.
Other send triggers:
- Another batch to the same broker is being sent (piggyback)
- flush() or close() is called
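The two main triggers above can be sketched as a toy model. This is not the actual producer internals, just an illustration of "send when the batch is full OR when linger.ms has elapsed since the first record":

```python
class BatchSketch:
    """Toy model of producer batching: a batch is ready to send when it
    reaches batch_size bytes OR linger_ms has elapsed since its first record."""

    def __init__(self, batch_size: int, linger_ms: int):
        self.batch_size = batch_size
        self.linger_ms = linger_ms
        self.bytes = 0
        self.first_at = None  # timestamp of the first record in the batch

    def append(self, record: bytes, now_ms: int) -> None:
        if self.first_at is None:
            self.first_at = now_ms
        self.bytes += len(record)

    def ready(self, now_ms: int) -> bool:
        if self.first_at is None:
            return False  # nothing buffered
        full = self.bytes >= self.batch_size
        expired = (now_ms - self.first_at) >= self.linger_ms
        return full or expired

b = BatchSketch(batch_size=100, linger_ms=5)
b.append(b"x" * 40, now_ms=0)
print(b.ready(now_ms=1))   # False: 40 < 100 bytes and only 1 ms elapsed
print(b.ready(now_ms=6))   # True: linger.ms expired before the batch filled
b.append(b"y" * 60, now_ms=2)
print(b.ready(now_ms=2))   # True: 100 bytes reached -> size trigger fires
```

Raising batch.size/linger.ms favors throughput; lowering them favors latency.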
37. Batch Management
[Diagram: a producer keeps one batch per topic-partition (Partition 1 - TopicA, Partition 1 - TopicB) and sends each to the owning broker's partition segment. Two effects are shown: linger.ms triggering a send before a batch is full, and bigger messages filling a batch sooner.]
38. Batch Management
Tune your batch.size/linger.ms:
batch.size + linger.ms = latency + throughput
Once tuned, do not forget to size your buffer.memory.
39. Compression
compression.type = none (the default)
Compression can improve performance by transferring less
data over the network, at the cost of additional CPU.
Generalization:
Use snappy ***
*** You should do real performance tests.
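The CPU-for-network trade can be illustrated with the standard library. This sketch uses zlib as a stand-in for Kafka's snappy/lz4 codecs, which make the same trade; the payload is made-up:

```python
import time
import zlib

# Illustrative only: zlib stands in for Kafka's snappy/lz4/gzip codecs.
payload = b'{"sensor":"t-1001","temp":21.5}' * 1000  # repetitive JSON-like data

t0 = time.perf_counter()
packed = zlib.compress(payload)
cpu_seconds = time.perf_counter() - t0

print(f"original={len(payload)}B compressed={len(packed)}B "
      f"ratio={len(packed) / len(payload):.2%} cpu={cpu_seconds:.6f}s")
# Repetitive data compresses very well; always measure on YOUR data.
```

Highly repetitive payloads shrink dramatically; random or already-compressed data may not, which is exactly why the slide insists on real performance tests.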
41. Did we stick with the Defaults?
Custom classes written for performance?
● Partitioner
○ Create a custom key based on data to help prevent skew
● Serializer
○ Pluggable
● Interceptors
○ Allow inspection and manipulation of records going into Kafka
○ Are they being used? Should they be? How are they written?
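The "custom key to prevent skew" idea above can be sketched as follows. This is hypothetical illustration, not Kafka's built-in partitioner (which uses murmur2 on the key bytes); md5 and all names/values here are stand-ins:

```python
import hashlib

# Hypothetical sketch: salt a skewed "hot" key across several sub-keys so
# that one partition is not overloaded by a single popular key.

def partition_for(key: bytes, num_partitions: int) -> int:
    # Stable hash -> partition index (md5 used only for illustration)
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

def salted_key(hot_key: str, record_id: int, buckets: int = 4) -> bytes:
    # Spread one hot key over `buckets` distinct sub-keys
    return f"{hot_key}#{record_id % buckets}".encode()

keys = {salted_key("NYC", i) for i in range(100)}
print(len(keys))  # 4 distinct sub-keys -> the hot key can land on up to 4 partitions
```

The cost of salting is that records for the same logical key are no longer guaranteed to be in order on a single partition, so only do this where ordering per key does not matter.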
42. Tuning
To tune performance you need to experiment with different
settings.
Data and throughput are different with every project.
There is no one size fits all.
Luckily there is a tool to help test configurations.
43. kafka-run-class.sh
bin/kafka-run-class.sh
org.apache.kafka.clients.tools.ProducerPerformance
test 50000000 100 -1 acks=1
bootstrap.servers=esv4-hcl198.yoppworks.rules.com:9092
buffer.memory=67108864 batch.size=8196
Or use the short cut:
bin/kafka-producer-perf-test.sh
test 50000000 100 -1 acks=1
bootstrap.servers=esv4-hcl198.yoppworks.rules.com:9092
buffer.memory=67108864 batch.size=8196
There is also one for the consumer:
bin/kafka-consumer-perf-test.sh
45. Monitoring
OpsClarity
- Now owned by Lightbend; the Cadillac of monitoring.
Burrow
- A little resource heavy (a Kafka client per partition)
- Health monitor has some false positives
Yahoo Kafka-manager
Confluent Control Center
- Part of the Confluent distro
Roll your own with Kafka JMX & MBeans
46. Where did they get the name Kafka?
My Guess
Putting Apache Kafka to Use for Event Streams,
https://www.youtube.com/watch?v=el-SqcZLZlI
~ Jay Kreps
50. Where did they get the name Kafka?
“I thought that since Kafka was a system optimized for
writing using a writer's name would make sense. I had
taken a lot of lit classes in college and liked Franz Kafka.
Plus the name sounded cool for an open source project.”
~ Jay Kreps
https://www.quora.com/What-is-the-relation-between-Kafka-the-writer-and-Apache-Kafka-the-distributed-messaging-system
53. Broker Disk Usage
● What is your rate of growth, and when
will you need to expand?
● Try to make sure the number of
partitions you select covers that growth
54. Broker Disk Usage
● log.retention.bytes
■ Default is unlimited (-1)
● log.retention.[time interval]
■ Default is 7 days (168 hours)
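A minimal sketch of the retention settings above in server.properties (values are illustrative; note that log.retention.bytes applies per partition, not per topic):

```properties
# server.properties (illustrative values)
log.retention.bytes=1073741824
log.retention.hours=168
```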
63. Beyond Tuning
A greater # of partitions means:
> a greater level of parallelism
> more files open
(partitions * segment count * replication) / brokers ~= # of open files per machine
Tens of thousands of open files is manageable on appropriate hardware.
> more memory usage (broker and ZooKeeper)
> longer leader failover time (can be mitigated by an increased # of brokers)
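Plugging hypothetical numbers into the slide's open-files formula:

```python
# (partitions * segment count * replication) / brokers ~= open files per machine
# All inputs below are made-up examples; substitute your own topology.
partitions, segments, replication, brokers = 2000, 10, 3, 6
open_files = partitions * segments * replication // brokers
print(open_files)  # 10000 -- tens of thousands is manageable on good hardware
```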
64. Beyond Tuning
How do I calculate the number of partitions to have?
What's the rule of thumb to start testing at?
[# partitions in the cluster] = c x [# brokers] x [replication factor]
c ~ your machines' awesomeness
c ~ your appetite for risk
c ~ 100 is a good, safe starting point
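The rule of thumb above, worked with example numbers (the broker and replication counts are hypothetical):

```python
# c x brokers x replication factor -> starting point for partition count
c, brokers, replication = 100, 3, 3
print(c * brokers * replication)  # 900 partitions as a starting point to test from
```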
65. Beyond Tuning
Can I move an existing partition around? I just added a new broker, and it’s not sharing the load.
Use: bin/kafka-reassign-partitions.sh
1) Create a JSON file (topics.json) listing the topics you want to redistribute.
2) Use kafka-reassign-partitions.sh … --generate to suggest a partition reassignment.
3) Copy the proposed assignment to a JSON file.
4) Use kafka-reassign-partitions.sh … --execute to start the redistribution process.
a) Can take several hours, depending on data.
5) Use kafka-reassign-partitions.sh … --verify to check progress of the redistribution process.
Link to documentation from conference sponsor.
topics.json:
{"version": 1,
 "topics": [{"topic": "weather"},
            {"topic": "sensors"}]
}
66. Thanks!
Matt Andruff - Hortonworks Practice lead @ Yoppworks
@MattAndruff
I’m not an expert; I just sound like one.