This document summarizes the key changes to Kafka's message format across versions 0.7 through 0.10. It discusses how the format changes addressed bottlenecks as they shifted from predominantly network-bound to increasingly CPU-bound. It also describes how LinkedIn migrated its mirroring pipelines so that they could keep using the efficient 0.7-style pass-through approach with newer Kafka versions.
9. Kafka 0.7 message format
Internal messages of compressed message sets are not addressable via a scalar offset
10. Kafka 0.7 message format
Consumer checkpoints offset M for this message
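The addressing problem on these two slides can be illustrated with a toy model. This is a minimal sketch, not Kafka's wire format: it assumes the offset of an entry is its byte position in the log, and the framing (`frame`, `unframe`, the one-byte compression flag) is hypothetical. It shows that every message inside a compressed wrapper is reported at the wrapper's own offset, so a consumer can only checkpoint at wrapper boundaries.

```python
import zlib


def frame(payloads):
    """Length-prefix each payload and concatenate (toy framing, an assumption)."""
    return b"".join(len(p).to_bytes(4, "big") + p for p in payloads)


def unframe(blob):
    """Inverse of frame(): split a blob back into its payloads."""
    messages, pos = [], 0
    while pos < len(blob):
        size = int.from_bytes(blob[pos:pos + 4], "big")
        messages.append(blob[pos + 4:pos + 4 + size])
        pos += 4 + size
    return messages


def append(log, payload, compressed=False):
    """Append one top-level entry to the log and return its byte offset."""
    offset = len(log)
    log.extend(bytes([1 if compressed else 0]))
    log.extend(len(payload).to_bytes(4, "big"))
    log.extend(payload)
    return offset


def deep_iterate(log):
    """Yield (offset, message); inner messages inherit the wrapper's offset."""
    pos = 0
    while pos < len(log):
        flag = log[pos]
        size = int.from_bytes(log[pos + 1:pos + 5], "big")
        payload = bytes(log[pos + 5:pos + 5 + size])
        if flag == 1:
            # The wrapper is the only addressable unit in this model.
            for inner in unframe(zlib.decompress(payload)):
                yield pos, inner
        else:
            yield pos, payload
        pos += 5 + size


log = bytearray()
append(log, b"m0")
append(log, zlib.compress(frame([b"m1", b"m2", b"m3"])), compressed=True)
append(log, b"m4")
for offset, message in deep_iterate(log):
    print(offset, message)
```

In this model m1, m2, and m3 all print with the same offset, which is why checkpointing or rewinding to a position inside the compressed set has no scalar representation.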
11. Drawbacks of 0.7 message format
• Tricky to checkpoint within compressed message set
• Hard to rewind by N messages
• Unsuitable for features such as log compaction
12. BUT!!!
Very efficient (broker did not need to modify messages)
26. Kafka 0.9
SSL
• Forgo zero-copy optimization
• CPU overhead to decrypt/encrypt
• Minor impact (used only on our mirroring pipelines at the time)
31. Shifting bottlenecks over time
Predominantly network in 0.7
Storage and network in 0.8, 0.9 (1Gbps NICs)
Increasingly CPU in 0.9 (10Gbps NICs)
51. Migrate clients before switching to 0.10 message format
                            Ideal   Less ideal   Worse   Worst
Majority producer version   0.10    0.9          0.10    0.9
Majority consumer version   0.10    0.10         0.9     0.9
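The table can be read as a lookup keyed by the majority producer and consumer versions. The following is a small sketch under that reading; the function names and the shape of the version counts are hypothetical and would come from whatever client inventory is available.

```python
# (majority producer version, majority consumer version) -> rating from the table
SCENARIOS = {
    ("0.10", "0.10"): "ideal",
    ("0.9", "0.10"): "less ideal",
    ("0.10", "0.9"): "worse",
    ("0.9", "0.9"): "worst",
}


def majority(version_counts):
    """Return the version with the highest client count."""
    return max(version_counts, key=version_counts.get)


def migration_readiness(producer_counts, consumer_counts):
    """Rate a cluster before switching it to the 0.10 message format."""
    key = (majority(producer_counts), majority(consumer_counts))
    return SCENARIOS[key]


print(migration_readiness({"0.9": 40, "0.10": 5}, {"0.9": 3, "0.10": 60}))
```

Note that the consumer row dominates the rating: both cases with a 0.9 consumer majority rank below both cases with a 0.10 consumer majority.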
66. USE CAUTION!!
Severe performance degradation with older clients, and there is no rollback after switching
67. So know your clients!
• Useful to have a shepherding system in your service infra
• EOL older libraries
• Check API versions in public access logs
• Add API version metrics to the Kafka broker
PRODUCERS CONSUMERS
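As a rough illustration of checking API versions in access logs, the sketch below counts requests per API and version and lists clients still below a minimum version. The log line format (`client_id=... api=... api_version=...`) is entirely hypothetical; a real broker's request log would need its own parser.

```python
import re
from collections import Counter

# Hypothetical access-log line: "client_id=<id> api=<name> api_version=<n>"
LINE = re.compile(
    r"client_id=(?P<client>\S+) api=(?P<api>\S+) api_version=(?P<version>\d+)"
)


def count_api_versions(lines):
    """Count requests per (api, version) pair; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match:
            counts[(match["api"], int(match["version"]))] += 1
    return counts


def clients_below(lines, api, min_version):
    """Return the sorted client ids that used `api` below `min_version`."""
    old = set()
    for line in lines:
        match = LINE.search(line)
        if match and match["api"] == api and int(match["version"]) < min_version:
            old.add(match["client"])
    return sorted(old)


sample = [
    "client_id=billing api=Fetch api_version=1",
    "client_id=search api=Fetch api_version=2",
    "client_id=billing api=Produce api_version=1",
]
print(count_api_versions(sample))
print(clients_below(sample, "Fetch", 2))
```

The same counts could be exported as broker-side metrics instead of being derived from logs; the slide lists both options.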
84. 0.8+ mirror maker
• Needs to preserve order of keyed messages
• 0.8+ consumers do not support shallow iteration (KAFKA-732)
• 0.8+ producers do not support pass-through mode
87. Handling keyed messages in pass-through mode
• Need to preserve order of keyed messages… but pass-through mirror maker cannot repartition
• Workaround is to require identical partition counts across all clusters and do identity partitioning
• i.e., P_input = P_output
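Identity partitioning can be sketched as follows. This is a toy model in which a cluster is a list of partitions and each partition is a list of (key, value) pairs; `mirror_identity` is a hypothetical name, not a mirror maker API. Because every message lands in the partition index it came from, the relative order of messages sharing a key is unchanged.

```python
def mirror_identity(source_partitions, target_partition_count):
    """Copy each source partition to the same partition index in the target."""
    if len(source_partitions) != target_partition_count:
        # Pass-through mirroring cannot repartition, so the counts must match.
        raise ValueError("partition counts differ; identity partitioning impossible")
    target = [[] for _ in range(target_partition_count)]
    for index, messages in enumerate(source_partitions):
        target[index].extend(messages)  # P_output = P_input
    return target


source = [
    [("user-1", "login"), ("user-1", "click")],
    [("user-2", "login")],
]
print(mirror_identity(source, 2))
```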
89. 0.10 pass-through mirroring how-to
• Restore shallow iteration in consumer (KAFKA-1895)
• “Todd’s trick” – introduce an identity compression codec in producer
• Uniform partition counts across clusters
• … and a few more subtleties (future talk)
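The first three ingredients can be put together in a toy model. This is only a sketch of the idea as the slide states it, not the actual consumer or producer code: `shallow_iterate` and `identity_codec` are hypothetical stand-ins, and compressed message sets are modelled as zlib-compressed, newline-joined payloads. The point is that the mirror never decompresses or recompresses a wrapper.

```python
import zlib


def make_wrapper(messages):
    """Original producer: compress several messages into one wrapper entry."""
    return zlib.compress(b"\n".join(messages))


def shallow_iterate(partition):
    """Yield wrapper entries exactly as stored, without decompressing them."""
    yield from partition


def identity_codec(wrapper):
    """Stand-in identity codec: the payload is already compressed, so pass it on."""
    return wrapper


def pass_through_mirror(source_cluster):
    """Mirror every partition to the same index, forwarding wrappers untouched."""
    target_cluster = [[] for _ in source_cluster]  # uniform partition counts
    for index, partition in enumerate(source_cluster):
        for wrapper in shallow_iterate(partition):
            target_cluster[index].append(identity_codec(wrapper))
    return target_cluster


source = [[make_wrapper([b"m1", b"m2"])], [make_wrapper([b"m3"])]]
target = pass_through_mirror(source)
print(target == source)
print(zlib.decompress(target[0][0]).split(b"\n"))
```

Only the final consumer in the target cluster pays the decompression cost; the mirror itself handles opaque bytes.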
90. Acknowledgments
• Jiangjie Qin (KIP-3[1,2,3])
• Todd Palino (pass-through mirroring)
• Kafka open source community