Kafka (Exactly-once)
erhwenkuo@gmail.com
Agenda
• Why exactly-once?
• An overview of messaging semantics
• Why are duplicates introduced?
• What is exactly-once semantics?
• Exactly-once semantics in Kafka
Kafka Exactly-once
An overview of messaging semantics
Kafka message delivery semantics
• At most once: offsets are committed as soon as the message is received. If
the processing goes wrong, the message is lost (it won’t be read again).
• At least once: offsets are committed after the message is processed. If the
processing goes wrong, the message will be read again. This can result in
duplicate processing of messages, so make sure your processing is idempotent
(i.e. processing the message again won’t impact your systems).
• Exactly once: very difficult to achieve; needs strong engineering. (Kafka has
provided “exactly once” since v0.11.)
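The difference between the first two semantics comes down to when the offset is committed relative to processing. A minimal sketch, assuming a single consumer that crashes once and restarts from its last committed offset; the function names and the `crash_on` parameter are hypothetical and do not come from any Kafka client:

```python
def at_most_once(messages, crash_on):
    """Commit the offset as soon as the message is received."""
    committed, crashed, effects = 0, False, []
    while committed < len(messages):
        msg = messages[committed]
        committed += 1                      # commit first
        if msg == crash_on and not crashed:
            crashed = True                  # crash before processing, then restart
            continue
        effects.append(msg)                 # processing side effect
    return effects


def at_least_once(messages, crash_on):
    """Commit the offset only after the message is processed."""
    committed, crashed, effects = 0, False, []
    while committed < len(messages):
        msg = messages[committed]
        effects.append(msg)                 # process first
        if msg == crash_on and not crashed:
            crashed = True                  # crash before the commit, then restart
            continue
        committed += 1                      # commit after success
    return effects


def idempotent_view(effects):
    """Idempotent processing: applying a message twice changes nothing."""
    return list(dict.fromkeys(effects))     # keeps first occurrence, preserves order
```

With `["a", "b", "c"]` and a crash on `"b"`, the first function loses `"b"`, the second processes it twice, and `idempotent_view` collapses the repeat.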
Why exactly-once?
• Stream processing is becoming the norm; it’s more natural.
• Apache Kafka is the most popular streaming platform.
• Mission-critical applications require stronger guarantees.
Apache Kafka’s existing semantics: at-least-once
[Diagram sequence: a producer writing to the leader partition of topic xxx on the Kafka brokers]
• Producer calls Send(x, y); the broker appends (x, y) to the log and returns an ack.
• Producer calls Send(a, b); the broker appends (a, b) to the log.
• The ack for (a, b) does not reach the producer.
• Producer retries Send(a, b); the broker appends (a, b) a second time and returns an ack.
• The log now holds (x, y), (a, b), (a, b): at-least-once delivery has produced a duplicate.
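The retry scenario in the at-least-once walkthrough can be reproduced with a toy model. This is a sketch under stated assumptions: `Broker`, `send_with_retry`, and the `drop_first_ack` flag are hypothetical names used only for illustration, and the "network" is just a flag that discards the first ack.

```python
class Broker:
    """A toy partition leader that appends every request it receives."""

    def __init__(self):
        self.log = []

    def append(self, key, value):
        self.log.append((key, value))
        return "ack"


def send_with_retry(broker, key, value, drop_first_ack=False, max_retries=3):
    """Send until an ack arrives; return the number of attempts made."""
    for attempt in range(max_retries + 1):
        ack = broker.append(key, value)
        if drop_first_ack and attempt == 0:
            ack = None          # the append succeeded, but the ack was lost
        if ack == "ack":
            return attempt + 1
    raise RuntimeError("no ack received")


broker = Broker()
send_with_retry(broker, "x", "y")
attempts = send_with_retry(broker, "a", "b", drop_first_ack=True)
```

The producer cannot tell a lost request from a lost ack, so retrying is the only safe choice, and the broker in this model has no way to recognize the second copy.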
Why are duplicates introduced?
Various failures must be handled correctly
• Broker can fail
• Producer-to-Broker RPC can fail
• Network between Producer & Broker can fail
• Producer client can fail
• Producer client can become zombie
Semantic Weaknesses
At-least-once
• Producer retries are not safe: a retry after a lost ack can append the same message twice
• Processed data is not written atomically with corresponding offsets
• No protection from evil zombies
How did Kafka add exactly-once semantics?
version >= 0.11
Exactly-once semantics in Kafka, explained
Apache Kafka’s guarantees are stronger in 3 ways:
• Idempotent producer
  • Exactly-once, in-order delivery per partition.
• Transactions
  • Atomic writes across multiple topics/partitions.
• Exactly-once stream processing (Kafka Streams & KSQL)
  • Across read-process-write tasks.
Idempotent Producer
Exactly-once, in-order delivery per partition
Idempotent Producer Semantics
• Idempotence is the basis of exactly-once. To stop a message from being
processed multiple times, it must be persisted to the Kafka topic only once.
• A single successful producer.send() will result in exactly one copy of
the message in the log, regardless of retries.
• Idempotent delivery ensures that messages are delivered exactly
once to a particular topic partition during the lifetime of a single
producer.
How does the idempotent producer work?
Key Design Principle
• Exactly-once, in-order delivery per partition.
• Avoids data duplication.
• Works transparently: only one config change.
• Resilient to broker failures, producer retries, etc.
How does the idempotent producer work?
Message Binary Format Change
• The log message binary format changes:
  • Add “ProducerId”
  • Add “Sequence” number
Message format fields: key, value, timestamp, headers, producerid, sequence
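The extended message format can be pictured as a record that carries two extra fields. The sketch below is illustrative only: `Record` and `SequenceStamper` are hypothetical names, the `-1` defaults are an assumed "not set" marker, and the sketch attaches the fields to each record for simplicity, which need not match how Kafka lays them out on disk.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Record:
    key: bytes
    value: bytes
    timestamp: int
    headers: Dict[str, bytes] = field(default_factory=dict)
    producer_id: int = -1     # assumed marker for "no producer id"
    sequence: int = -1        # assumed marker for "no sequence"


class SequenceStamper:
    """Assigns a monotonically increasing sequence per partition."""

    def __init__(self, producer_id):
        self.producer_id = producer_id
        self._next_seq = {}   # partition -> next sequence number

    def stamp(self, partition, key, value, timestamp):
        seq = self._next_seq.get(partition, 0)
        self._next_seq[partition] = seq + 1
        return Record(key, value, timestamp,
                      producer_id=self.producer_id, sequence=seq)
```

Keeping one counter per partition matches the slide's guarantee of in-order delivery per partition: each partition sees its own gap-free sequence from a given producer.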
The idempotent producer
[Diagram sequence: the same producer and leader partition, now tracking a producer ID (pid) and a sequence number (seq)]
• Producer starts with pid = 100, seq = 0.
• Producer calls Send(x, y) tagged with pid = 100, seq = 0; the broker appends (x, y), records pid = 100, seq = 0, and returns an ack.
• Producer advances to seq = 1 and calls Send(a, b) tagged with pid = 100, seq = 1; the broker appends (a, b) and records seq = 1.
• The ack for (a, b) does not reach the producer, so it re-sends Send(a, b) with the same pid = 100, seq = 1.
• Broker found duplicate (pid + seq)! It skips the append and returns “ack - duplicate”; the log holds (x, y) and (a, b) exactly once.
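The duplicate check in the idempotent-producer walkthrough can be sketched as a broker that remembers the last sequence it appended for each producer ID. This is a simplified model, not Kafka's actual broker logic: `IdempotentPartition` and its return strings are hypothetical, and treating any sequence at or below the last one as a duplicate is an assumption made to keep the example short.

```python
class IdempotentPartition:
    """A toy partition leader that deduplicates on (pid, seq)."""

    def __init__(self):
        self.log = []
        self.last_seq = {}    # pid -> last sequence appended to the log

    def append(self, pid, seq, key, value):
        last = self.last_seq.get(pid, -1)
        if seq <= last:
            return "ack-duplicate"          # already in the log, skip the append
        if seq != last + 1:
            raise ValueError("out-of-order sequence: gap detected")
        self.log.append((key, value))
        self.last_seq[pid] = seq
        return "ack"


partition = IdempotentPartition()
first = partition.append(100, 0, "x", "y")
second = partition.append(100, 1, "a", "b")
retry = partition.append(100, 1, "a", "b")   # producer retry after a lost ack
```

Because the retry carries the same pid and seq as the original request, the broker can acknowledge it without writing anything, which is what makes producer retries safe.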
Producer Configs
• enable.idempotence=true
• retries=Integer.MAX_VALUE (effectively infinite)
• acks=all
• max.in.flight.requests.per.connection=1 ??
https://issues.apache.org/jira/browse/KAFKA-5494
Producer Configs (Revised)
• enable.idempotence=true
• retries=Integer.MAX_VALUE (effectively infinite)
• acks=all
• max.in.flight.requests.per.connection=3 (or whatever, up to 5)
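The constraints behind these settings can be written down as a small check. This is a sketch, assuming the configuration is held in a plain dictionary keyed by the Java client's property names; `check_idempotent_config` is a hypothetical helper, not part of any Kafka client, and the rules encoded are the ones the Java producer documents for idempotence (acks=all, retries greater than 0, at most 5 in-flight requests per connection).

```python
MAX_INT = 2**31 - 1   # Java's Integer.MAX_VALUE, i.e. "retry effectively forever"

revised_config = {
    "enable.idempotence": True,
    "retries": MAX_INT,
    "acks": "all",
    "max.in.flight.requests.per.connection": 3,
}


def check_idempotent_config(cfg):
    """Return a list of problems; an empty list means the config is consistent."""
    problems = []
    if cfg.get("enable.idempotence") is not True:
        problems.append("enable.idempotence must be true")
    if str(cfg.get("acks")) not in ("all", "-1"):
        problems.append("acks must be all")
    if int(cfg.get("retries", 0)) <= 0:
        problems.append("retries must be greater than 0")
    if int(cfg.get("max.in.flight.requests.per.connection", 5)) > 5:
        problems.append("max.in.flight.requests.per.connection must be <= 5")
    return problems
```

Allowing more than one in-flight request keeps throughput up, while the sequence numbers let the broker reject anything that arrives out of order.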

TDEA 2018 Kafka EOS (Exactly-once)
