No Data Loss Pipeline with Apache Kafka
Jiangjie (Becket) Qin @ LinkedIn
Data loss and message reordering
● Data loss
o producer.send(record) is called, but the record never reaches the consumer as expected
● Message reordering
o send(record1) is called before send(record2)
o yet record2 shows up on the broker before record1 does
o this matters in use cases such as DB replication
Kafka-based data pipeline
[Diagram: Producer → Kafka Cluster (Colo 1) → Mirror Maker → Kafka Cluster (Colo 2) → Consumer]
Today’s Agenda:
● No data loss
● No message reordering
● Mirror maker enhancement
○ Customized consumer rebalance listener
○ Message handler
Synchronous send is safe but slow...
producer.send(record).get()
Asynchronous send with a callback can be tricky
producer.send(record, callback)
The producer can lose data when:
● block.on.buffer.full=false
● retries are exhausted
● messages are sent without acks=all
Is this good enough?
producer.send(record, callback)
● block.on.buffer.full=true
● retries=Integer.MAX_VALUE
● acks=all
● resend in the callback when a send fails
Message reordering might happen if:
● max.in.flight.requests.per.connection > 1, AND
● retries are enabled
[Timeline, Producer → Kafka Broker: message 0 and message 1 are both in flight; message 0 fails and is retried, so message 1 is written to the broker before message 0.]
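To make this failure mode concrete, here is a toy, single-threaded model of a producer that keeps up to `maxInFlight` requests outstanding and puts a failed request back at the front of its queue for retry. It is not Kafka's actual sender logic, and the class and method names are hypothetical; `maxInFlight` plays the role of `max.in.flight.requests.per.connection`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of request pipelining plus retries; not Kafka's real sender. */
final class ReorderingDemo {

    /**
     * Sends messages 0..count-1 with at most maxInFlight outstanding requests.
     * Each message in failOnce fails on its first attempt and is retried.
     * Returns the order in which the broker appended messages to its log.
     */
    static List<Integer> brokerLog(int count, int maxInFlight, Set<Integer> failOnce) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int i = 0; i < count; i++) {
            queue.addLast(i);
        }
        Set<Integer> alreadyFailed = new HashSet<>();
        List<Integer> log = new ArrayList<>();
        while (!queue.isEmpty()) {
            // Put up to maxInFlight requests on the wire before any response comes back.
            List<Integer> inFlight = new ArrayList<>();
            while (!queue.isEmpty() && inFlight.size() < maxInFlight) {
                inFlight.add(queue.pollFirst());
            }
            // The broker handles the in-flight requests in the order they were sent.
            List<Integer> retries = new ArrayList<>();
            for (int msg : inFlight) {
                if (failOnce.contains(msg) && alreadyFailed.add(msg)) {
                    retries.add(msg);   // first attempt fails
                } else {
                    log.add(msg);       // appended to the partition log
                }
            }
            // Failed requests go back to the front of the queue, keeping their relative order.
            for (int i = retries.size() - 1; i >= 0; i--) {
                queue.addFirst(retries.get(i));
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(brokerLog(3, 1, Set.of(0))); // [0, 1, 2]
        System.out.println(brokerLog(3, 2, Set.of(0))); // [1, 0, 2]
    }
}
```

With one request in flight the retry of message 0 happens before message 1 is ever sent; with two in flight, message 1 is already on the wire when message 0 fails, so the broker log ends up reordered.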
Message reordering might also happen if:
● the producer is closed carelessly
o the producer is closed in the user thread, or
o the producer is closed without using close(0)
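[Timeline: Producer (Record Accumulator, Sender Thread) → Kafka Broker. 1. msg 0 is sent; 2. the sender thread invokes callback(msg 0) with an ack exception and notifies the user thread, which closes the producer; 3. msg 1, already buffered, is still sent.]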
● Close the producer in the callback on error
● Close the producer with close(0) to prevent further sends after a previous send has failed
[Timeline: Producer (Record Accumulator, Sender Thread) → Kafka Broker. 1. msg 0 is sent; 2. callback(msg 0) receives an ack exception, calls close(0) and notifies the user thread; no further message is sent.]
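The difference between the two close strategies can be sketched as a toy model. It is a hypothetical simplification, not the KafkaProducer internals: the buffer is drained in order, one message gets an ack exception, and the only question is whether anything after it still reaches the broker. With a real producer, "close(0)" means closing with a zero timeout (close(0, TimeUnit.MILLISECONDS) in 0.9-era clients; newer clients take a Duration).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of the two close strategies; not the real KafkaProducer internals. */
final class CloseOnErrorDemo {

    /**
     * Drains buffered messages in order. The message equal to failing gets an ack
     * exception. Returns the messages that reached the broker.
     */
    static List<String> deliveredToBroker(List<String> buffered, String failing, boolean closeZeroInCallback) {
        Deque<String> accumulator = new ArrayDeque<>(buffered);
        List<String> delivered = new ArrayList<>();
        while (!accumulator.isEmpty()) {
            String msg = accumulator.pollFirst();
            if (!msg.equals(failing)) {
                delivered.add(msg);
                continue;
            }
            // callback(msg) observes the ack exception.
            if (closeZeroInCallback) {
                // close(0) from the callback: stop at once and abort everything still buffered.
                accumulator.clear();
            }
            // Otherwise the callback only notifies the user thread; a graceful close()
            // from there still flushes what is already buffered, so later messages go out.
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<String> buffered = List.of("msg 0", "msg 1", "msg 2");
        System.out.println(deliveredToBroker(buffered, "msg 0", false)); // [msg 1, msg 2]
        System.out.println(deliveredToBroker(buffered, "msg 0", true));  // []
    }
}
```

In the first case msg 1 and msg 2 land on the broker ahead of any resend of msg 0; in the second nothing after the failure is sent, so the application can restart and resend from msg 0 in order.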
To prevent data loss:
● block.on.buffer.full=true
● retries=Integer.MAX_VALUE (for some use cases)
● acks=all
To prevent reordering:
● max.in.flight.requests.per.connection=1
● close the producer in the callback with close(0) on send failure
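The producer settings above map onto configuration keys. The sketch below only builds a java.util.Properties object (no Kafka dependency, so it runs standalone); the broker address passed in and the String serializers are placeholders chosen for illustration. `retries` is an int-typed setting, hence Integer.MAX_VALUE. `block.on.buffer.full` is the 0.8.2/0.9-era key used in these slides; later clients deprecate it in favour of `max.block.ms`, so check the documentation for the client version you run.

```java
import java.util.Properties;

/** Builds producer properties matching the no-data-loss / no-reordering slide. */
final class NoDataLossProducerConfig {

    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Placeholder serializers; use whatever matches your record types.
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // No data loss: wait for all in-sync replicas and keep retrying.
        props.setProperty("acks", "all");
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        props.setProperty("block.on.buffer.full", "true"); // 0.8.2/0.9-era key
        // No reordering: a retried request can never overtake a later one.
        props.setProperty("max.in.flight.requests.per.connection", "1");
        return props;
    }
}
```

The resulting object would be passed to the KafkaProducer constructor; closing with close(0) in the callback on failure still has to be done in application code.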
Not a perfect solution:
● The producer has to be closed to guarantee message order. In mirror maker, for example, a send failure on one topic should not take down the whole pipeline.
● When the producer goes down, messages still in its buffer are lost
Correct producer settings are not enough
● acks=all can still lose data when an unclean leader election happens.
● At least two replicas are needed at all times to guarantee data persistence.
● replication factor >= 3
● min.insync.replicas (min.isr) = 2
● replication factor > min.isr
o If replication factor = min.isr, the partition stops accepting acks=all writes as soon as one replica is down
Settings we use:
● replication factor = 3
● min.insync.replicas = 2
● unclean leader election disabled (unclean.leader.election.enable=false)
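The broker-side rules reduce to simple arithmetic. With acks=all the leader accepts a write only while the in-sync replica set has at least min.insync.replicas members, so a partition that starts fully in sync can lose (replication factor − min.isr) replicas and keep accepting writes. With 3 and 2 that is one failure, while every acknowledged message sits on at least two replicas. The helper below is a hypothetical validation sketch of the slide's rules, not a Kafka tool.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check of topic durability settings against the rules on the slides. */
final class TopicDurabilityCheck {

    /** Replica failures a fully in-sync partition absorbs before acks=all writes are rejected. */
    static int failuresToleratedForWrites(int replicationFactor, int minInsyncReplicas) {
        return Math.max(0, replicationFactor - minInsyncReplicas);
    }

    static List<String> problems(int replicationFactor, int minInsyncReplicas, boolean uncleanLeaderElection) {
        List<String> problems = new ArrayList<>();
        if (replicationFactor < 3) {
            problems.add("replication factor should be at least 3");
        }
        if (minInsyncReplicas < 2) {
            problems.add("min.insync.replicas below 2 lets an acked write live on a single replica");
        }
        if (replicationFactor <= minInsyncReplicas) {
            problems.add("replication factor must exceed min.insync.replicas, or one replica failure blocks acks=all writes");
        }
        if (uncleanLeaderElection) {
            problems.add("unclean leader election can drop acknowledged messages");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(failuresToleratedForWrites(3, 2)); // 1
        System.out.println(problems(3, 2, false));            // []
        System.out.println(problems(2, 2, false).size());     // 2
    }
}
```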
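● The consumer might lose messages when offsets are committed carelessly, e.g. offsets are committed before the messages are completely processed
o Disable automatic offset commit (enable.auto.commit=false in the new Java consumer, auto.commit.enable=false in the old consumer)
o Commit offsets only after the messages are processed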
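Why commit order matters can be shown with a toy crash-and-restart model (hypothetical names, not the KafkaConsumer API). A committed offset is the next offset to read after a restart. Committing before processing and then crashing skips the record being processed; committing afterwards means the record is read again, which gives duplicates instead of loss. With the Java KafkaConsumer the safe variant corresponds to disabling auto-commit and calling commitSync() only after the polled records are processed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy crash-and-restart model of offset commits; not the KafkaConsumer API. */
final class OffsetCommitDemo {

    /**
     * Reads offsets 0..count-1, crashes while processing crashAt, then restarts from the
     * last committed offset. Returns every offset whose processing completed (duplicates possible).
     */
    static List<Integer> processedAcrossRestart(int count, int crashAt, boolean commitBeforeProcessing) {
        List<Integer> processed = new ArrayList<>();
        int committed = 0; // next offset to read after a restart
        for (int offset = 0; offset < count; offset++) {
            if (commitBeforeProcessing) {
                committed = offset + 1;   // careless: commit first
            }
            if (offset == crashAt) {
                break;                    // crash before processing completes
            }
            processed.add(offset);
            if (!commitBeforeProcessing) {
                committed = offset + 1;   // safe: commit only after processing
            }
        }
        // After the restart, consumption resumes at the committed offset.
        for (int offset = committed; offset < count; offset++) {
            processed.add(offset);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(processedAcrossRestart(4, 2, true));  // [0, 1, 3]: offset 2 is lost
        System.out.println(processedAcrossRestart(4, 2, false)); // [0, 1, 2, 3]
    }
}
```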
Mirror Maker Enhancement
● Consume-then-produce pattern
● Only commit consumer offsets of messages acked by the target cluster
● Default to no-data-loss and no-reordering settings
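One way to implement "only commit offsets of messages acked by the target cluster" is to track, per source partition, the offsets that were consumed but not yet acknowledged by the target, and to commit the smallest unacknowledged offset (or the next offset to consume once everything is acked). The sketch below does this for a single partition; the class and method names are hypothetical and this is not Mirror Maker's actual code.

```java
import java.util.TreeSet;

/** Hypothetical per-partition tracker of offsets sent to the target but not yet acked. */
final class UnackedOffsetTracker {
    private final TreeSet<Long> unacked = new TreeSet<>();
    private long nextOffset = -1; // one past the highest consumed offset; -1 = nothing consumed yet

    /** Consumer thread: a record was consumed from the source and handed to the producer. */
    synchronized void onSend(long offset) {
        unacked.add(offset);
        nextOffset = Math.max(nextOffset, offset + 1);
    }

    /** Producer callback (sender thread): the target cluster acked the record. */
    synchronized void onAck(long offset) {
        unacked.remove(offset);
    }

    /** Offset that is safe to commit to the source cluster, or -1 if nothing can be committed yet. */
    synchronized long committableOffset() {
        if (!unacked.isEmpty()) {
            return unacked.first(); // resume here: this record is not yet safe in the target
        }
        return nextOffset;
    }
}
```

Methods are synchronized because, in the consume-then-produce pattern, records are registered on the consumer thread while acks arrive on the producer's callback thread. Committing the lowest unacked offset may cause some already-mirrored records to be copied again after a restart, which trades duplicates for no loss.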
Mirror Maker Enhancement
● Customized consumer rebalance listener
o Can be used to propagate topic changes from the source cluster to the target cluster, e.g. partition count changes or new topic creation
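A custom rebalance listener could compute what has to change in the target cluster whenever it sees the source topic layout. The sketch below only produces a plan from two maps of topic name to partition count; the class and method names are hypothetical, and actually applying the plan (for example through an admin client) is left out. Kafka cannot reduce a topic's partition count, so a target with more partitions is left alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical planner for propagating topic changes from source to target. */
final class TopicChangePropagation {

    static List<String> plan(Map<String, Integer> sourcePartitions, Map<String, Integer> targetPartitions) {
        List<String> actions = new ArrayList<>();
        // TreeMap gives a deterministic, alphabetical order of actions.
        for (Map.Entry<String, Integer> e : new TreeMap<>(sourcePartitions).entrySet()) {
            String topic = e.getKey();
            int wanted = e.getValue();
            Integer existing = targetPartitions.get(topic);
            if (existing == null) {
                actions.add("create " + topic + " with " + wanted + " partitions");
            } else if (existing < wanted) {
                actions.add("expand " + topic + " from " + existing + " to " + wanted + " partitions");
            }
            // Partition counts cannot shrink, so a larger target count is left unchanged.
        }
        return actions;
    }
}
```

Keeping source and target partition counts equal also matters for the partition-to-partition mirroring mentioned on the next slide.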
Mirror Maker Enhancement
● Customized message handler, useful for
o partition-to-partition mirroring
o filtering out messages
o message format conversion
o other simple message processing
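A message handler of this kind maps one consumed record to zero or more records to produce. The sketch below uses a hypothetical record type and interface (not Mirror Maker's real handler API) to show two of the listed uses at once: filtering (records without a value are dropped) and partition-to-partition mirroring (the produced record keeps the source partition number instead of being re-partitioned by key).

```java
import java.util.List;

/** Hypothetical stand-in for a consumed record; not a Kafka class. */
final class MirrorRecord {
    final String topic;
    final int partition;
    final String key;
    final String value;

    MirrorRecord(String topic, int partition, String key, String value) {
        this.topic = topic;
        this.partition = partition;
        this.key = key;
        this.value = value;
    }
}

/** Hypothetical handler hook: one consumed record in, zero or more records to produce out. */
interface MessageHandler {
    List<MirrorRecord> handle(MirrorRecord consumed);
}

/** Drops records without a value and keeps the source partition number for the rest. */
final class PartitionPreservingFilter implements MessageHandler {
    @Override
    public List<MirrorRecord> handle(MirrorRecord consumed) {
        if (consumed.value == null) {
            return List.of(); // filtered out: nothing is produced to the target
        }
        // Same topic and partition number, so target partition N mirrors source partition N.
        return List.of(new MirrorRecord(consumed.topic, consumed.partition, consumed.key, consumed.value));
    }
}
```

Returning a list rather than a single record lets the same hook express filtering (empty list) and fan-out or format conversion (one or more new records).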
Mirror Maker Enhancement
● Startup/shutdown acceleration
o Parallelized startup and shutdown
o A 26-node cluster with 4 consumers each takes about 1 minute to start up and shut down
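Parallelizing shutdown makes the total time roughly that of the slowest component rather than the sum over all of them. A minimal sketch with the JDK's ExecutorService, where each Runnable stands in for closing one (hypothetical) consumer or producer; it does not describe how Mirror Maker itself structures this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Runs independent shutdown tasks concurrently and waits for all of them. */
final class ParallelShutdown {

    static void shutdownAll(List<Runnable> shutdownTasks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shutdownTasks.size()));
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Runnable task : shutdownTasks) {
                pending.add(pool.submit(task)); // all closes start right away
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // wait, and surface a failure from any individual shutdown
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted during shutdown", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a component failed to shut down", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}
```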
Q&A

Editor's Notes

  1. What if we just stop producing on send failure? No retry.