erhwenkuo@gmail.com
4. Agenda
• Why exactly-once?
• An overview of messaging semantics
• Why are duplicates introduced?
• What is exactly-once semantics?
• Exactly-once semantics in Kafka
6. An overview of messaging semantics
Kafka message delivery semantics
• At most once: offsets are committed as soon as the message is received. If the processing goes wrong, the message is lost (it won't be read again).
• At least once: offsets are committed after the message is processed. If the processing goes wrong, the message is read again. This can result in duplicate processing of messages, so make sure your processing is idempotent (i.e., processing the same message again won't impact your systems).
• Exactly once: very difficult to achieve and requires strong engineering. Kafka has provided "exactly once" since version 0.11.
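A toy model makes the first two semantics concrete. It involves no Kafka client; `consume` and `simulate` are hypothetical names. The only difference between the two runs is whether the offset is committed before or after processing, and in both runs the consumer crashes between those two steps on the second message, then restarts from the committed offset.

```python
from typing import List


def consume(log: List[str], committed: int, commit_first: bool,
            crash_at: int, processed: List[str]) -> int:
    """Toy consumer loop; returns the committed offset when it stops.

    commit_first=True  models at-most-once  (commit, then process).
    commit_first=False models at-least-once (process, then commit).
    The consumer crashes between the two steps for offset `crash_at`
    (use -1 for a run without a crash).
    """
    for offset in range(committed, len(log)):
        if commit_first:
            committed = offset + 1          # step 1: commit the offset
            if offset == crash_at:
                return committed            # crash before processing
            processed.append(log[offset])   # step 2: process
        else:
            processed.append(log[offset])   # step 1: process
            if offset == crash_at:
                return committed            # crash before committing
            committed = offset + 1          # step 2: commit the offset
    return committed


def simulate(commit_first: bool) -> List[str]:
    log = ["m0", "m1", "m2"]
    processed: List[str] = []
    committed = consume(log, 0, commit_first, crash_at=1, processed=processed)
    consume(log, committed, commit_first, crash_at=-1, processed=processed)  # restart
    return processed


print(simulate(commit_first=True))   # ['m0', 'm2']  (m1 is lost)
print(simulate(commit_first=False))  # ['m0', 'm1', 'm1', 'm2']  (m1 is processed twice)
```

The at-least-once duplicate is harmless only if processing `m1` twice has the same effect as processing it once, which is why the slide asks for idempotent processing.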
7. Why exactly-once?
• Stream processing is becoming the norm; it's more natural.
• Apache Kafka is the most popular streaming platform.
• Mission-critical applications require stronger guarantees.
20. Why are duplicates introduced?
Various failures must be handled correctly:
• Broker can fail
• Producer-to-Broker RPC can fail
• Network between Producer & Broker can fail
• Producer client can fail
• Producer client can become a zombie
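The "Producer-to-Broker RPC can fail" case is enough to create a duplicate. The sketch below is a minimal simulation with hypothetical classes (no Kafka client): the broker writes the message, the acknowledgement is lost on the way back, the producer cannot tell that the write happened, and its retry is written a second time.

```python
from typing import List


class NaiveLeader:
    """Hypothetical partition leader without idempotence: every request is appended."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def append(self, value: str) -> None:
        self.log.append(value)


def send_with_retries(leader: NaiveLeader, value: str, lost_acks: int) -> int:
    """Send `value`, retrying until an ack arrives; returns the number of attempts.

    The first `lost_acks` attempts are written by the broker, but their ack
    never reaches the producer (e.g. a network failure on the response path).
    """
    attempts = 0
    while True:
        attempts += 1
        leader.append(value)           # the broker-side write succeeds every time
        if attempts > lost_acks:       # this ack is finally delivered
            return attempts
        # ack lost: from the producer's point of view the send failed, so it retries


leader = NaiveLeader()
print(send_with_retries(leader, "payment-42", lost_acks=1))  # 2
print(leader.log)  # ['payment-42', 'payment-42']
```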
21. Semantic Weaknesses of At-least-once
• Producer retries are not safe.
• Processed data is not written atomically with the corresponding offsets.
• No protection from evil zombies (producer instances presumed dead that keep writing).
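The second weakness shows up in a read-process-write task. In the hypothetical model below, `state["output"]` stands for the output topic and `state["committed"]` for the committed input offset. In the non-atomic mode the output write is durable on its own, so a crash before the offset commit leads to a second write after restart. In the atomic mode the output and the offset become visible together, which is the idea behind Kafka committing output records and consumer offsets in the same transaction. This is a sketch of the principle, not Kafka's implementation.

```python
from typing import List


def read_process_write(inputs: List[int], state: dict, crash_at: int,
                       atomic: bool) -> None:
    """Toy read-process-write loop over durable `state` (hypothetical model).

    The task crashes right after processing offset `crash_at` (-1 means no crash).
    """
    for offset in range(state["committed"], len(inputs)):
        result = inputs[offset] * 10
        if not atomic:
            state["output"].append(result)   # output is durable on its own
        if offset == crash_at:
            return                           # crash before the offset commit
        if atomic:
            state["output"].append(result)   # output and offset become visible...
        state["committed"] = offset + 1      # ...together in the atomic case


def replay(atomic: bool) -> List[int]:
    inputs = [1, 2, 3]
    state = {"output": [], "committed": 0}
    read_process_write(inputs, state, crash_at=1, atomic=atomic)
    read_process_write(inputs, state, crash_at=-1, atomic=atomic)  # restart
    return state["output"]


print(replay(atomic=False))  # [10, 20, 20, 30]
print(replay(atomic=True))   # [10, 20, 30]
```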
22. How did Kafka add exactly-once semantics?
Kafka version >= 0.11
23. Exactly-once semantics in Kafka, explained
Apache Kafka's guarantees are stronger in 3 ways:
• Idempotent producer
  • Exactly-once, in-order delivery per partition.
• Transactions
  • Atomic writes across multiple topics/partitions.
• Exactly-once stream processing (Kafka Streams and KSQL)
  • Across read-process-write tasks.
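"Atomic writes across multiple topics/partitions" means a reader sees either all records of a transaction or none of them. The class below is a hypothetical toy model of that visibility rule. It simplifies the real mechanism: Kafka writes transactional records to the log immediately and later appends commit/abort markers, and consumers configured with isolation.level=read_committed skip aborted data. Here that is collapsed into a staging buffer.

```python
from typing import Dict, List, Tuple


class ToyTransactionalLog:
    """Hypothetical model of atomic writes across partitions.

    Records sent inside a transaction are staged; commit() makes all of them
    visible at once and abort() discards all of them, so a read_committed
    reader never observes a partial transaction.
    """

    def __init__(self, partitions: List[str]) -> None:
        self.visible: Dict[str, List[str]] = {p: [] for p in partitions}
        self.pending: List[Tuple[str, str]] = []

    def send(self, partition: str, value: str) -> None:
        self.pending.append((partition, value))      # staged, not yet visible

    def commit(self) -> None:
        for partition, value in self.pending:        # publish every staged record
            self.visible[partition].append(value)
        self.pending = []

    def abort(self) -> None:
        self.pending = []                            # drop every staged record

    def read_committed(self, partition: str) -> List[str]:
        return list(self.visible[partition])


txn = ToyTransactionalLog(["orders-0", "payments-0"])
txn.send("orders-0", "order-1")
txn.send("payments-0", "charge-1")
txn.abort()                                   # e.g. the task failed midway
txn.send("orders-0", "order-1")
txn.send("payments-0", "charge-1")
print(txn.read_committed("orders-0"))         # []  (still in flight)
txn.commit()
print(txn.read_committed("orders-0"))         # ['order-1']
print(txn.read_committed("payments-0"))       # ['charge-1']
```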
25. Idempotent Producer Semantics
• Idempotence is closely tied to exactly-once: to avoid processing a message multiple times, the message must be persisted to the Kafka topic only once.
• A single successful producer.send() results in exactly one copy of the message in the log, in all circumstances.
• Idempotent delivery ensures that messages are delivered exactly once to a particular topic partition during the lifetime of a single producer.
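The phrase "during the lifetime of a single producer" matters. The sketch below assumes a hypothetical leader that deduplicates on (pid, seq) and a producer that receives a fresh pid per instance (the counter starting at 100 only mirrors the later slides). Retries inside one producer instance are collapsed, but the same payload resent by a new instance, for example after an application restart, carries a new pid and is appended again. Extending the guarantee across producer sessions is part of what transactions add.

```python
import itertools
from typing import List, Set, Tuple


class DedupLeader:
    """Hypothetical partition leader that ignores an already-appended (pid, seq)."""

    def __init__(self) -> None:
        self.log: List[Tuple[int, int, str]] = []
        self.seen: Set[Tuple[int, int]] = set()

    def append(self, pid: int, seq: int, value: str) -> bool:
        if (pid, seq) in self.seen:
            return False                      # duplicate: not appended again
        self.seen.add((pid, seq))
        self.log.append((pid, seq, value))
        return True


_next_pid = itertools.count(100)              # stand-in for broker-assigned producer IDs


class Producer:
    def __init__(self) -> None:
        self.pid = next(_next_pid)            # each new instance gets a new pid
        self.seq = 0

    def send(self, leader: DedupLeader, value: str, retries: int = 0) -> None:
        for _ in range(retries + 1):          # every retry reuses the same (pid, seq)
            leader.append(self.pid, self.seq, value)
        self.seq += 1


leader = DedupLeader()
p1 = Producer()
p1.send(leader, "order-1", retries=2)         # retried within one producer: deduplicated
p2 = Producer()                               # e.g. the application restarted
p2.send(leader, "order-1")                    # same payload, new pid: not deduplicated
print([v for _, _, v in leader.log])          # ['order-1', 'order-1']
```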
26. How does the idempotent producer work?
Key Design Principle
Idempotent producer
• Exactly-once, in-order delivery per partition.
• Avoids data duplication.
• Works transparently: only one config change (enable.idempotence=true).
• Resilient to broker failures, producer retries, etc.
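The "one config change" comes with constraints: the Kafka producer documentation states that idempotence requires acks=all, retries greater than 0, and max.in.flight.requests.per.connection of at most 5. The helper below is a hypothetical checker for a producer config dictionary, not part of any Kafka client. Absent keys are assumed to take a compatible client default and are not flagged, except enable.idempotence itself, which must be set.

```python
from typing import Dict, List, Union

ConfigValue = Union[str, int, bool]


def idempotence_problems(config: Dict[str, ConfigValue]) -> List[str]:
    """Return the settings that conflict with enable.idempotence=true (hypothetical helper)."""
    problems: List[str] = []
    # Accept both a Python bool and the string form used in .properties files.
    if str(config.get("enable.idempotence", False)).lower() != "true":
        problems.append("enable.idempotence must be true")
    acks = str(config.get("acks", "all"))            # absent: assume a compatible default
    if acks not in ("all", "-1"):
        problems.append(f"acks must be 'all' (got {acks!r})")
    retries = int(config.get("retries", 1))           # absent: assume a positive default
    if retries <= 0:
        problems.append("retries must be greater than 0")
    in_flight = int(config.get("max.in.flight.requests.per.connection", 5))
    if in_flight > 5:
        problems.append("max.in.flight.requests.per.connection must be <= 5")
    return problems


ok = {"bootstrap.servers": "localhost:9092", "enable.idempotence": True, "acks": "all"}
bad = {"enable.idempotence": True, "acks": "1", "max.in.flight.requests.per.connection": 10}
print(idempotence_problems(ok))   # []
print(idempotence_problems(bad))
```

The localhost:9092 address is only a placeholder.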
27. How does the idempotent producer work?
Message Binary Format Change
Idempotent producer
• Change the log message binary format
  • Add a "ProducerId"
  • Add a "Sequence" number
Message format: key | value | timestamp | headers | producerId | sequence
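The message format above can be sketched as a plain data structure. The class and method names are hypothetical. The slide simplifies one detail: in Kafka's v2 message format, the producer ID, producer epoch and base sequence are stored once in the record-batch header rather than on every record. The value -1 is used here, as in Kafka, to mean "no producer ID" or "no sequence".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class IdempotentRecord:
    """Simplified view of the fields listed on the slide (hypothetical class)."""
    key: Optional[bytes]
    value: Optional[bytes]
    timestamp: int
    headers: Dict[str, bytes] = field(default_factory=dict)
    producer_id: int = -1      # -1: written by a non-idempotent producer
    sequence: int = -1         # -1: no sequence number

    def dedup_key(self) -> Tuple[int, int]:
        # The pair the broker compares to detect a retried write.
        return (self.producer_id, self.sequence)


r = IdempotentRecord(key=b"x", value=b"y", timestamp=0, producer_id=100, sequence=0)
print(r.dedup_key())  # (100, 0)
```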
29. The idempotent producer
[Diagram: the producer (pid = 100, seq = 0) calls Send(x, y). The record (key = x, value = y, pid = 100, seq = 0) is sent to the partition leader for topic xxx on the Kafka brokers.]
30. The idempotent producer
[Diagram: the partition leader calls append(x, y), writing the record (key = x, value = y, pid = 100, seq = 0) to the log and recording pid = 100.]
31. The idempotent producer
[Diagram: the leader now tracks pid = 100, seq = 0 and sends an ack back to the producer (pid = 100, seq = 0).]
32. The idempotent producer
[Diagram: after the ack, the producer advances to seq = 1 and calls Send(a, b). The record (key = a, value = b, pid = 100, seq = 1) travels to the leader, which still tracks pid = 100, seq = 0. The log holds (x, y, 100, 0).]
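On the producer side, the slides show a sequence number that moves from 0 to 1 once the first record is acknowledged. The hypothetical class below models that bookkeeping with one counter per partition (the partition names like "xxx-0" are made up). It follows the slides' one-request-at-a-time picture; the real client allows up to five in-flight batches per partition and assigns sequence numbers as batches are sent, so treat this as an illustration, not the client's algorithm.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class SequencingProducer:
    """Hypothetical producer-side bookkeeping for the walkthrough.

    Each record is tagged with the partition's current counter. The counter
    advances only on ack, so a retry reuses the same (pid, seq).
    """

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.next_seq: DefaultDict[str, int] = defaultdict(int)  # per-partition counters

    def tag(self, partition: str, key: str, value: str) -> Tuple[str, str, int, int]:
        return (key, value, self.pid, self.next_seq[partition])

    def on_ack(self, partition: str) -> None:
        self.next_seq[partition] += 1


p = SequencingProducer(pid=100)
first = p.tag("xxx-0", "x", "y")    # ('x', 'y', 100, 0)
p.on_ack("xxx-0")
second = p.tag("xxx-0", "a", "b")   # ('a', 'b', 100, 1)
retry = p.tag("xxx-0", "a", "b")    # no ack yet: same ('a', 'b', 100, 1)
other = p.tag("xxx-1", "c", "d")    # another partition has its own counter: seq 0
```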
33. The idempotent producer
[Diagram: the leader calls append(a, b), writing (key = a, value = b, pid = 100, seq = 1) after (x, y, 100, 0), and now tracks pid = 100, seq = 1.]
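On the broker side, the append in this step is accepted because seq = 1 is exactly one more than the last sequence stored for pid = 100. The hypothetical leader below sketches that check: an old sequence is answered as a duplicate without appending, and a gap raises an out-of-order error. It remembers only the last sequence per pid, whereas the real broker keeps metadata for the producer's most recent batches, so this is a simplification.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, int, int]  # (key, value, pid, seq)


class OutOfOrderSequence(Exception):
    """Raised when a sequence number skips ahead (a gap)."""


class SequenceCheckingLeader:
    """Hypothetical partition leader that validates (pid, seq) before appending."""

    def __init__(self) -> None:
        self.log: List[Record] = []
        self.last_seq: Dict[int, int] = {}

    def append(self, record: Record) -> str:
        _, _, pid, seq = record
        last = self.last_seq.get(pid, -1)     # -1: nothing appended yet for this pid
        if seq <= last:
            return "ack - duplicate"          # already in the log: do not append again
        if seq != last + 1:
            raise OutOfOrderSequence(f"pid {pid}: expected {last + 1}, got {seq}")
        self.log.append(record)
        self.last_seq[pid] = seq
        return "ack"


leader = SequenceCheckingLeader()
print(leader.append(("x", "y", 100, 0)))   # ack
print(leader.append(("a", "b", 100, 1)))   # ack
print(leader.append(("a", "b", 100, 1)))   # ack - duplicate
try:
    leader.append(("c", "d", 100, 3))      # skips seq 2
except OutOfOrderSequence as err:
    print(err)                             # pid 100: expected 2, got 3
```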
34. The idempotent producer
[Diagram: the leader sends the ack for (a, b), but it fails on the way back to the producer, which is still at pid = 100, seq = 1. The log holds (x, y, 100, 0) and (a, b, 100, 1).]
35. The idempotent producer
[Diagram: having received no ack, the producer retries Send(a, b) with the same pid = 100, seq = 1. The leader tracks pid = 100, seq = 1; the log is unchanged.]
36. The idempotent producer
Broker found duplicate (pid + seq)!
[Diagram: the leader sees that pid = 100, seq = 1 is already in the log, so it does not append (a, b) again and replies "ack - duplicate". The log still holds exactly one copy of each record: (x, y, 100, 0) and (a, b, 100, 1).]
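The whole walkthrough on slides 29 to 36 can be replayed in a few lines. This is a hypothetical model, not a Kafka client: the leader keeps the last sequence per pid, the ack for (a, b) is dropped, and the retry with the same (pid, seq) is answered as a duplicate, so the log ends with one copy of each record.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, int, int]  # (key, value, pid, seq)


def walkthrough() -> Tuple[List[Record], List[str]]:
    """Replay the slides and return (final log, responses the producer received)."""
    log: List[Record] = []
    last_seq: Dict[int, int] = {}            # leader: last appended seq per pid
    received: List[str] = []                 # acks that actually reach the producer

    def leader_append(key: str, value: str, pid: int, seq: int) -> str:
        if seq <= last_seq.get(pid, -1):
            return "ack - duplicate"         # (pid, seq) already in the log
        log.append((key, value, pid, seq))
        last_seq[pid] = seq
        return "ack"

    pid, seq = 100, 0
    received.append(leader_append("x", "y", pid, seq))   # slides 29-31: append, ack
    seq += 1                                              # ack received: advance to seq 1
    leader_append("a", "b", pid, seq)                     # slides 32-34: appended, ack lost
    received.append(leader_append("a", "b", pid, seq))   # slides 35-36: retry, same seq
    return log, received


log, received = walkthrough()
print(log)       # [('x', 'y', 100, 0), ('a', 'b', 100, 1)]
print(received)  # ['ack', 'ack - duplicate']
```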