15. 15
Why improve?
• Stream processing is becoming an ever bigger part of the
data landscape.
• Apache Kafka is the heart of the streams platform.
• Strengthening Kafka’s semantics expands the universe of
streaming applications.
33. 33
TL;DR
• Sequence numbers and producer IDs:
  • enable de-duplication
  • are stored in the log.
• Hence de-duplication works transparently across leader changes.
• Application-level resends are not de-duplicated.
• Works transparently – no API changes.
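To make the points above concrete, here is a minimal sketch of de-duplication keyed on producer ID and sequence number. It is a toy model, not Kafka's implementation: the names `Record`, `PartitionLeader`, and `from_log` are hypothetical, the model covers a single partition, and it ignores details such as rejecting out-of-order sequence numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    producer_id: int
    sequence: int
    value: str


@dataclass
class PartitionLeader:
    """Toy leader for one partition; `log` stands in for the replicated log."""
    log: list = field(default_factory=list)
    last_seq: dict = field(default_factory=dict)  # producer_id -> highest sequence appended

    @classmethod
    def from_log(cls, log):
        # A new leader rebuilds its de-dup state purely from the log contents.
        leader = cls(log=list(log))
        for rec in leader.log:
            leader.last_seq[rec.producer_id] = rec.sequence
        return leader

    def append(self, rec):
        # Drop a record whose sequence number was already appended for this producer.
        if rec.sequence <= self.last_seq.get(rec.producer_id, -1):
            return False
        self.log.append(rec)
        self.last_seq[rec.producer_id] = rec.sequence
        return True


leader = PartitionLeader()
first = Record(producer_id=7, sequence=0, value="payment-42")
accepted_first = leader.append(first)

# Leader change: the ack was lost, so the producer retries against the new leader.
new_leader = PartitionLeader.from_log(leader.log)
accepted_retry = new_leader.append(first)

# An application-level resend is a new send, so it carries a fresh sequence number.
accepted_resend = new_leader.append(
    Record(producer_id=7, sequence=1, value="payment-42")
)
```

Because the producer ID and sequence number travel with each record in the log, the new leader can reject the retry without any extra coordination. The application-level resend is accepted because, from the log's point of view, it is a different record.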
50. 50
Some notes on consuming transactions
• Two isolation levels: read_committed and read_uncommitted.
• Messages read in offset order.
• read_committed consumers read only up to the first offset that belongs
to a still-open transaction.
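The difference between the two isolation levels can be sketched with a toy log. This is an illustration under stated assumptions, not the consumer's real code: `Message`, `read_committed`, `read_uncommitted`, and the transaction labels are hypothetical, and the model assumes each message is tagged with the transaction it belongs to.

```python
from dataclasses import dataclass


@dataclass
class Message:
    offset: int
    txn: str      # hypothetical transaction label
    value: str


def read_committed(log, txn_state):
    """Values visible to a read_committed consumer in this toy model.

    txn_state maps a transaction label to "committed", "aborted" or "open".
    """
    visible = []
    for msg in sorted(log, key=lambda m: m.offset):  # offset order
        state = txn_state[msg.txn]
        if state == "open":
            break  # stop at the first message of a still-open transaction
        if state == "committed":
            visible.append(msg.value)
        # messages from aborted transactions are skipped
    return visible


def read_uncommitted(log):
    """Every message in offset order, regardless of transaction state."""
    return [m.value for m in sorted(log, key=lambda m: m.offset)]


log = [
    Message(0, "A", "a0"),
    Message(1, "B", "b1"),
    Message(2, "A", "a2"),
    Message(3, "C", "c3"),
]

while_b_open = read_committed(log, {"A": "committed", "B": "open", "C": "committed"})
after_b_commits = read_committed(log, {"A": "committed", "B": "committed", "C": "committed"})
after_b_aborts = read_committed(log, {"A": "committed", "B": "aborted", "C": "committed"})
everything = read_uncommitted(log)
```

Note that while transaction B is open, the committed messages at offsets 2 and 3 are also held back. That is the price of delivering messages in offset order.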
51. 51
TL;DR
• Transaction coordinator and transaction log maintain
transaction state.
• Use the new producer APIs for transactions.
• Consumers can read only committed messages.
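As a rough illustration of the first point, the sketch below keeps transaction state in an append-only log and lets a replacement coordinator recover that state by reading the log. Everything here is hypothetical: the class `ToyTransactionCoordinator`, its method names, and the state labels are chosen for the example and are not Kafka's internal API.

```python
class ToyTransactionCoordinator:
    """Hypothetical stand-in for a transaction coordinator (not Kafka's code)."""

    def __init__(self, txn_log=None):
        # Append-only list of (transactional_id, state) entries.
        self.txn_log = list(txn_log) if txn_log else []

    def state(self, transactional_id):
        # The current state is the most recent log entry for this id.
        for tid, state in reversed(self.txn_log):
            if tid == transactional_id:
                return state
        return None

    def begin(self, transactional_id):
        if self.state(transactional_id) == "ongoing":
            raise RuntimeError("transaction already open")
        self.txn_log.append((transactional_id, "ongoing"))

    def _finish(self, transactional_id, outcome):
        if self.state(transactional_id) != "ongoing":
            raise RuntimeError("no open transaction")
        self.txn_log.append((transactional_id, outcome))

    def commit(self, transactional_id):
        self._finish(transactional_id, "committed")

    def abort(self, transactional_id):
        self._finish(transactional_id, "aborted")


coordinator = ToyTransactionCoordinator()
coordinator.begin("orders-app")
coordinator.commit("orders-app")
coordinator.begin("billing-app")

# A replacement coordinator recovers every transaction's state from the log alone.
recovered = ToyTransactionCoordinator(coordinator.txn_log)
```

The design point is that the coordinator itself holds no state that cannot be rebuilt: whatever it knows is in the transaction log, so a failover does not lose track of open or completed transactions.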
53. 53
What’s new, part 3: Performance boost!
• Up to 20% higher producer throughput
• Up to 50% higher consumer throughput
• Up to 20% lower disk utilization
• Savings start when you batch
• Details: https://bit.ly/kafka-eos-perf
61. 61
TL;DR
• With a batch size of 2, the new format starts saving
space.
• Savings are maximal for large batches of small
messages.
• Hence higher throughput when IO bound.
• Works as soon as you upgrade to the new format.
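The arithmetic behind these points can be sketched as follows. The byte counts are assumptions chosen for illustration (a fixed per-message overhead in the old format, versus a larger per-batch header plus a small per-record overhead in the new one); they are not taken from this deck, so check the message format documentation for the real figures.

```python
# Assumed overheads in bytes, for illustration only.
OLD_PER_MESSAGE = 34    # metadata repeated on every message in the old format
NEW_BATCH_HEADER = 53   # metadata stored once per batch in the new format
NEW_PER_RECORD = 7      # metadata remaining on each record in the new format


def old_overhead(n):
    """Total metadata bytes for n messages in the old format."""
    return OLD_PER_MESSAGE * n


def new_overhead(n):
    """Total metadata bytes for a batch of n records in the new format."""
    return NEW_BATCH_HEADER + NEW_PER_RECORD * n


def break_even_batch_size(limit=1000):
    """Smallest batch size at which the new format uses fewer bytes."""
    for n in range(1, limit + 1):
        if new_overhead(n) < old_overhead(n):
            return n
    return None


def savings_fraction(n, payload_bytes):
    """Fraction of total bytes saved for n messages of the given payload size."""
    old_total = n * (payload_bytes + OLD_PER_MESSAGE)
    new_total = NEW_BATCH_HEADER + n * (payload_bytes + NEW_PER_RECORD)
    return 1 - new_total / old_total


first_saving_batch = break_even_batch_size()
large_batch_small_msgs = savings_fraction(1000, 10)
large_batch_large_msgs = savings_fraction(1000, 1000)
small_batch_small_msgs = savings_fraction(2, 10)
```

Under these assumed numbers a single message is more expensive in the new format, a batch of two is slightly cheaper, and the relative saving grows with batch size and shrinks as payloads get larger, since the payload then dominates the total.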
66. 66
Putting it together
• We reviewed Kafka’s existing delivery semantics
• We saw why we want to improve them
• We learned how they have been strengthened
• We learned how the new semantics work
67. 67
When is it available?
Available to try in Kafka 0.11, June 2017.