Common issues with Apache
Kafka® Producer
Badai Aqrandista, Senior Technical Support Engineer
Introduction
• My name is BADAI AQRANDISTA
• I started as a web developer, building websites with Perl
and PHP in 2005.
• Experience supporting applications in Linux/UNIX
environments, including a hotel booking engine, a
telecommunication billing system, and a mining equipment
monitoring system.
• Currently working for Confluent as Senior Technical
Support Engineer.
Kafka in a nutshell
• Kafka is a Pub/Sub system
• Kafka Producer sends records to the Kafka
broker
• Kafka Consumer fetches records from the
Kafka broker
• Kafka broker persists any data it receives
until retention period expires
[Diagram: PRODUCER and CONSUMER]
Kafka Producer Internals
• KafkaProducer API:
• public Future<RecordMetadata> send(ProducerRecord<K,V> record)
• public Future<RecordMetadata> send(ProducerRecord<K,V> record, Callback callback)
• KafkaProducer#send method is asynchronous.
• It does not immediately send the record to Kafka broker.
• It appends the record to an internal queue, and an internal thread sends the queued records to the
broker in batches.
[Diagram: a Batch containing several Records, each with a Key and a Value]
Kafka Producer Internals
• Each Kafka Producer batch corresponds to one partition.
• Kafka Producer determines which batch to append a record to based on the record key.
• If the record key is “null”, Kafka Producer chooses the partition itself (round-robin in older clients,
sticky partitioning since Kafka 2.4), so no particular partition is guaranteed.
• If the record key is not “null”, Kafka Producer uses the hash of the record key to determine
the partition number.
• One or more batches are sent to the Kafka broker in a PRODUCE request.
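The key-based selection described above can be sketched roughly as follows. This is a simplified illustration, not the client's actual code: the real default partitioner hashes the serialized key with murmur2, whereas this sketch substitutes `java.util.Arrays.hashCode`. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of key-based partition selection (not Kafka's real partitioner).
public class KeyPartitionSketch {

    // Maps a serialized, non-null key to a partition in [0, numPartitions).
    // Assumption: Arrays.hashCode stands in for the murmur2 hash used by the real client.
    public static int partitionFor(byte[] keyBytes, int numPartitions) {
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "order-42".getBytes(StandardCharsets.UTF_8);
        // The same key always maps to the same partition, which preserves per-key ordering.
        System.out.println(partitionFor(key, 6) == partitionFor(key, 6));
    }
}
```

Because the result depends on the partition count, adding partitions to a topic changes where existing keys land.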
Kafka Producer Internals
• Kafka Producer internal thread sends a batch to Kafka broker based on these
configurations:
• “batch.size” – defaults to 16 kB
• “linger.ms” – defaults to 0
• So, Kafka Producer internal thread sends a batch to Kafka broker when:
• The total size of records in the batch exceeds “batch.size”, or
• The time since batch creation exceeds “linger.ms”, or
• Kafka Producer “flush()” method is called (directly or indirectly via “close()”).
• Kafka Producer only creates one connection to each broker.
• In the end, every batch for a Kafka broker must be sent sequentially through this one
connection.
• The maximum number of unacknowledged requests in flight to each broker at any one time is controlled by
“max.in.flight.requests.per.connection”, which defaults to 5.
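The three send conditions listed above can be expressed as a single predicate. The following is a minimal sketch under stated assumptions, not the producer's real implementation: the class name, method name, and parameters are hypothetical, and the boundary handling (`>=` rather than a strict comparison) is an assumption.

```java
// Hypothetical sketch of the batch send conditions (not the client's real code).
public class BatchSendSketch {

    // Returns true when a batch would be handed to the internal sender thread.
    public static boolean readyToSend(long batchBytes, long batchSizeLimit,
                                      long batchAgeMs, long lingerMs,
                                      boolean flushRequested) {
        boolean full = batchBytes >= batchSizeLimit; // "batch.size" reached
        boolean lingered = batchAgeMs >= lingerMs;   // "linger.ms" elapsed
        return full || lingered || flushRequested;   // flush() or close() forces a send
    }

    public static void main(String[] args) {
        // With the default linger.ms=0, even a tiny batch is ready immediately.
        System.out.println(readyToSend(100, 16384, 0, 0, false));  // true
        // With linger.ms=20, a small batch that is 5 ms old waits for more records.
        System.out.println(readyToSend(100, 16384, 5, 20, false)); // false
    }
}
```

This shows why raising “linger.ms” trades a little latency for fuller batches.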
Kafka Producer Issues
1. Failure to connect to Kafka broker
2. Record is too large
3. Batch expires before sending
4. Not enough replicas error
Failure to connect to Kafka broker
• This error is easy to overlook because it is logged only as a warning, but it means the producer failed to connect to the Kafka broker.
• The error message looks like this:
• [2021-08-02 12:57:44,097] WARN [Producer clientId=producer-1] Connection to node -1
(kafka1/172.20.0.6:9093) could not be established. Broker may not be available.
(org.apache.kafka.clients.NetworkClient)
• How to fix this:
• Check the broker configuration to confirm the listener port and security protocol
• Check the hostname or the IP address of the broker
• Confirm that Kafka Producer’s “bootstrap.servers” configuration is correct
• Confirm that connectivity exists between Kafka Producer’s host and Kafka broker hosts with commands
such as:
• ping {BROKER_HOST}
• nc {BROKER_HOST} {BROKER_PORT}
• openssl s_client -connect {BROKER_HOST}:{BROKER_PORT}
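If the command-line tools above are not available on the producer host, a similar reachability check can be sketched in plain Java. This is only an illustration: the class and method names are hypothetical, and it tests a bare TCP connection, so it says nothing about TLS or authentication on the listener.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: a plain TCP reachability check, similar in spirit to "nc".
public class BrokerConnectCheck {

    // Returns true if a TCP connection to host:port opens within timeoutMs.
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the real broker host and listener port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

Run it from the same host as the producer, so that it follows the same network path.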
Record is too large
• This error occurs because the record size is greater than the “max.request.size” configuration, which
defaults to 1048576 bytes (1 MB).
• The error message is like this:
• org.apache.kafka.common.errors.RecordTooLargeException: The message is 1600088 bytes when
serialized which is larger than 1048576, which is the value of the max.request.size configuration.
• How to fix it:
• Reduce the record size. This requires a change in the application that generates the record.
• If you cannot reduce the record size, you can increase the producer configuration “max.request.size”. If you
do this, you also need to increase the topic configuration “max.message.bytes”.
• Note: “max.request.size” is the maximum request size AFTER serialization but BEFORE
compression. So, setting compression will not fix this.
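An application can guard against this error by checking the serialized size before calling send(). The sketch below is an approximation under stated assumptions: it counts only the key and value bytes, whereas the real client also accounts for per-record overhead, and the class and method names are hypothetical.

```java
// Hypothetical pre-send guard; the real client also counts record overhead.
public class RecordSizeCheck {

    // Default value of "max.request.size" (1 MB).
    public static final int DEFAULT_MAX_REQUEST_SIZE = 1048576;

    // Returns true if the serialized key and value together fit within maxRequestSize.
    public static boolean fits(byte[] key, byte[] value, int maxRequestSize) {
        int keyLen = (key == null) ? 0 : key.length;
        int valueLen = (value == null) ? 0 : value.length;
        // Sizes are measured after serialization and before compression.
        return (long) keyLen + valueLen <= maxRequestSize;
    }

    public static void main(String[] args) {
        // A value as large as the record in the error message above does not fit.
        byte[] largeValue = new byte[1600088];
        System.out.println(fits(null, largeValue, DEFAULT_MAX_REQUEST_SIZE)); // false
    }
}
```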
Batch expires before sending
• This error is a symptom of slow transfer time (on network) or slow processing (on Kafka
broker).
• The error looks like this:
• org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test1-0:1500 ms has
passed since batch creation
• Sanity checks:
• Is the topic partition online? A topic partition is online if at least one of the Kafka brokers hosting its replicas
is online.
• Use “kafka-topics --bootstrap-server {BROKER HOST:PORT} --describe --topic {TOPIC NAME}”
• “delivery.timeout.ms” – An upper bound on the time to report success or failure after a call to send()
returns.
• The default value is 120000 ms (2 minutes).
• If “delivery.timeout.ms” is set to a very low value, batches can expire too early.
• “batch.size” – The maximum size of a record batch.
• The default value is 16384 bytes (16 kB).
• If the records are large, this configuration may need to be increased to allow more records per
batch. More records per batch means higher throughput and lower latency per record.
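One way to catch a “delivery.timeout.ms” that is too low is to compare it with the other timing settings before starting the producer. The sketch below assumes the rule from the Kafka producer documentation that “delivery.timeout.ms” should be at least “linger.ms” plus “request.timeout.ms” (default 30000 ms). “request.timeout.ms” is not mentioned in the slides above, and the class and method names are hypothetical.

```java
import java.util.Properties;

// Hypothetical sanity check on producer timing settings.
public class DeliveryTimeoutCheck {

    // Returns true if delivery.timeout.ms >= linger.ms + request.timeout.ms.
    // Missing keys fall back to the documented producer defaults.
    public static boolean isConsistent(Properties props) {
        long delivery = Long.parseLong(props.getProperty("delivery.timeout.ms", "120000"));
        long linger = Long.parseLong(props.getProperty("linger.ms", "0"));
        long request = Long.parseLong(props.getProperty("request.timeout.ms", "30000"));
        return delivery >= linger + request;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // A very low value, comparable to the 1500 ms in the error message above.
        props.setProperty("delivery.timeout.ms", "1500");
        System.out.println(isConsistent(props)); // false
    }
}
```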
Batch expires before sending
• How to investigate this issue (cont’d):
• First, we need to identify whether this is caused by slow transfer time or slow processing.
• To check if it is slow transfer time, execute “ping {BROKER HOST}” from the producer host. The round trip time
(RTT) should be reasonable. For example, if both the producer and the Kafka brokers are in the same data center, the
RTT should be less than 10 ms, and mostly under 1 ms.
• If the “ping” result is good (i.e. consistently under 10 ms with 0% packet loss), then network latency is unlikely
to be the cause.
• To check if it is slow processing, check the following on Kafka brokers:
• Number of connections on the Kafka broker with “netstat -n | grep 9092 | wc -l”. More than 1000
connections is usually too high and can cause slow processing or connectivity issues.
• Number of topic partitions per broker. More than 1500 partitions per broker is usually too high and can
cause slow processing. Check it with “kafka-topics --bootstrap-server {BROKER HOST:PORT} --describe | awk '{print $5, $6}' | sort | uniq -c”.
• If the Kafka broker host has enough CPU and memory, you can increase “num.replica.fetchers” to 2 or 3 to allow
more partitions per broker.
• Inter-broker “ping” latency. If the brokers are running in multiple data centers (e.g. multiple Availability
Zones), then this may be a significant contributor to produce latency.
• CPU usage of Kafka brokers. The following JMX metrics also show how idle the internal threads are:
• kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent – if this is low (< 0.5), the broker
needs a higher “num.io.threads”, if CPU allows.
• kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent – if this is low (< 0.5), the broker
needs a higher “num.network.threads”, if CPU allows.
Not enough replicas error
• This means the number of replicas in ISR is less than “min.insync.replicas” configuration.
• The error looks like this:
• [2021-08-03 01:34:05,077] WARN [Producer clientId=producer-1] Got error produce response with
correlation id 3 on topic-partition test2-0, retrying (2147483646 attempts left). Error:
NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender)
• This error occurs in a setup like the following, when fewer than 2 replicas remain in sync:
• Topic replication factor is 3.
• Topic configuration includes “min.insync.replicas=2”.
• Producer uses “acks=all” configuration.
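The condition behind this error can be written as a small predicate. This is a minimal sketch of the rule described above, not broker code; the class and method names are hypothetical. It relies on the fact that “acks=-1” is an alias for “acks=all”.

```java
// Hypothetical sketch of when a produce request fails with NOT_ENOUGH_REPLICAS.
public class MinIsrCheck {

    // Returns true if the write would be rejected for the given settings.
    public static boolean rejectsWrite(String acks, int isrSize, int minInsyncReplicas) {
        // Only acks=all (alias -1) makes the broker enforce min.insync.replicas.
        boolean acksAll = "all".equals(acks) || "-1".equals(acks);
        return acksAll && isrSize < minInsyncReplicas;
    }

    public static void main(String[] args) {
        // Replication factor 3, min.insync.replicas=2, but only 1 replica left in sync.
        System.out.println(rejectsWrite("all", 1, 2)); // true
        // With 2 replicas in sync, the same producer succeeds.
        System.out.println(rejectsWrite("all", 2, 2)); // false
    }
}
```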
Not enough replicas error
• What is ISR? Short for “In-Sync Replicas”. These are the follower replicas that are in sync
with the leader. In other words, the follower replicas that have all the records that the leader
replica has.
• How can a replica become out of sync? Because the broker is offline, replication
has failed, or replication is slow.
• How to fix this error:
• If it is out of sync because the Kafka broker is offline, start the broker hosting the offline replicas.
• If it is out of sync because of replication failure, fix the failure. This is a separate discussion, but the most
common cause is disk failure. If the disk storing the replica data is full, the Kafka broker will stop replicating all
replicas on that disk.
• If it is out of sync because of slow replication, fix the slow replication. This is also a separate discussion,
but the most common cause is inter-broker latency or too many topic partitions per broker.
Thank you. Any questions?

 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Common issues with Apache Kafka® Producer

  • 1. Common issues with Apache Kafka® Producer Badai Aqrandista, Senior Technical Support Engineer
  • 2. Introduction
      • My name is Badai Aqrandista.
      • I started as a web developer in 2005, building websites with Perl and PHP.
      • I have experience supporting applications in Linux/UNIX environments, including a hotel booking engine, a telecommunication billing system, and a mining equipment monitoring system.
      • I currently work for Confluent as a Senior Technical Support Engineer.
  • 3. Kafka in a nutshell
      • Kafka is a Pub/Sub system.
      • A Kafka Producer sends records to a Kafka broker.
      • A Kafka Consumer fetches records from a Kafka broker.
      • A Kafka broker persists any data it receives until the retention period expires.
      • (Diagram labels: PRODUCER, CONSUMER.)
  • 5. Kafka Producer Internals
      • KafkaProducer API:
          • public Future<RecordMetadata> send(ProducerRecord<K,V> record)
          • public Future<RecordMetadata> send(ProducerRecord<K,V> record, Callback callback)
      • The KafkaProducer#send method is asynchronous.
          • It does not immediately send the record to the Kafka broker.
          • It puts the record in an internal queue, and an internal thread sends multiple queued records together as a batch.
      • (Diagram: a batch containing several records, each with a key and a value.)
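The asynchronous behaviour of send() can be illustrated with a toy model. This is a sketch only, not the Kafka client: the class name AsyncSendSketch, the String records, and the Integer result (standing in for RecordMetadata) are all assumptions made for illustration, and the explicit flush() call stands in for the internal sender thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

// Toy model only: this is NOT the Kafka client. All names here are hypothetical.
class AsyncSendSketch {
    // Records accepted by send() but not yet "transmitted".
    private final List<String> queued = new ArrayList<>();
    // One pending result per queued record.
    private final List<CompletableFuture<Integer>> pending = new ArrayList<>();

    // Returns immediately; the returned future completes only when the batch is flushed.
    Future<Integer> send(String record) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        queued.add(record);
        pending.add(result);
        return result;
    }

    // Stands in for the internal sender thread: "sends" everything queued as one batch
    // and returns the number of records that were in it.
    int flush() {
        int recordsInBatch = queued.size();
        for (int i = 0; i < pending.size(); i++) {
            // The position in the batch stands in for the real RecordMetadata.
            pending.get(i).complete(i);
        }
        queued.clear();
        pending.clear();
        return recordsInBatch;
    }

    public static void main(String[] args) throws Exception {
        AsyncSendSketch producer = new AsyncSendSketch();
        Future<Integer> first = producer.send("a");
        Future<Integer> second = producer.send("b");
        System.out.println("first done before flush: " + first.isDone());
        System.out.println("records in batch: " + producer.flush());
        System.out.println("second done after flush: " + second.isDone()
                + ", position " + second.get());
    }
}
```

The point of the model is that a returned Future says nothing about delivery until it completes, which is why the send-with-callback overload or Future#get is needed to observe errors.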
  • 6. Kafka Producer Internals
      • Each Kafka Producer batch corresponds to one partition.
      • Kafka Producer determines which batch to append a record to based on the record key.
          • If the record key is “null”, Kafka Producer will choose the batch randomly.
          • If the record key is not “null”, Kafka Producer will use the hash of the record key to determine the partition number.
      • One or more batches are sent to the Kafka broker in a PRODUCE request.
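A minimal sketch of the "hash of the key picks the partition" idea follows. The hash used here (Arrays.hashCode over the UTF-8 key bytes) is a stand-in chosen for illustration; the Kafka Java client is understood to use a murmur2 hash of the serialized key, so the partition numbers below will not match a real producer. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: the hash function is an assumption, not the one Kafka uses.
class KeyPartitionSketch {
    // Maps a non-null key to a partition in the range [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 6;
        for (String key : new String[] {"order-1", "order-2", "order-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

Whatever hash is used, the property that matters is that the same key always maps to the same partition as long as the partition count does not change, so records with the same key stay in order relative to each other.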
  • 7. Kafka Producer Internals
      • The Kafka Producer internal thread sends a batch to the Kafka broker based on these configurations:
          • “batch.size”: defaults to 16 kB
          • “linger.ms”: defaults to 0
      • So, the Kafka Producer internal thread sends a batch to the Kafka broker when:
          • The total size of records in the batch exceeds “batch.size”, or
          • The time since batch creation exceeds “linger.ms”, or
          • The Kafka Producer “flush()” method is called (directly or indirectly via “close()”).
      • Kafka Producer only creates one connection to each broker.
      • In the end, every batch for a Kafka broker must be sent sequentially through this one connection.
      • The maximum number of batches sent to each broker at any one time is controlled by “max.in.flight.requests.per.connection”, which defaults to 5.
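The three send conditions above can be written as a single predicate. This is a simplified sketch: the class and method names are hypothetical, the comparisons use "greater than or equal" as an assumption about the boundary cases, and the real client takes further factors into account (for example retry backoff and memory pressure).

```java
// Simplified model of "when is a batch ready to send"; names are hypothetical.
class BatchReadySketch {
    static boolean readyToSend(int batchBytes, int batchSizeBytes,
                               long ageMs, long lingerMs, boolean flushRequested) {
        boolean full = batchBytes >= batchSizeBytes;   // batch.size reached
        boolean lingered = ageMs >= lingerMs;          // linger.ms elapsed
        return full || lingered || flushRequested;     // flush() or close() called
    }

    public static void main(String[] args) {
        // With the default linger.ms=0, even a tiny batch is ready at once.
        System.out.println(readyToSend(100, 16384, 0, 0, false));
        // With linger.ms=5, a small batch that is 2 ms old waits.
        System.out.println(readyToSend(100, 16384, 2, 5, false));
        // A full batch does not wait for linger.ms.
        System.out.println(readyToSend(16384, 16384, 2, 5, false));
    }
}
```

With linger.ms at its default of 0, the time condition is met immediately, so batching only happens when records arrive faster than the sender thread can drain them.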
  • 9. Kafka Producer Issues
      1. Failure to connect to Kafka broker
      2. Record is too large
      3. Batch expires before sending
      4. Not enough replicas error
  • 10. Failure to connect to Kafka broker
      • The message is not obvious, but it means the producer failed to connect to the Kafka broker.
      • The error message looks like this:
          • [2021-08-02 12:57:44,097] WARN [Producer clientId=producer-1] Connection to node -1 (kafka1/172.20.0.6:9093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
      • How to fix this:
          • Check the broker configuration to confirm the listener port and security protocol.
          • Check the hostname or the IP address of the broker.
          • Confirm that the Kafka Producer’s “bootstrap.servers” configuration is correct.
          • Confirm that connectivity exists between the Kafka Producer’s host and the Kafka broker hosts with commands such as:
              • ping {BROKER_HOST}
              • nc {BROKER_HOST} {BROKER_PORT}
              • openssl s_client -connect {BROKER_HOST}:{BROKER_PORT}
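When the command-line tools above are not available on the producer host, a plain TCP connect from Java gives roughly the same information as the nc check. This is a sketch using only java.net; the class name, the default host and port, and the 3-second timeout are assumptions. It only proves that the port accepts TCP connections, and says nothing about TLS or SASL settings.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Plain TCP reachability check; it does not speak the Kafka protocol.
class BrokerPortCheck {
    // Returns true if a TCP connection to host:port opens within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hostnames.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass the broker host and listener port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

Run it from the same host (and, if applicable, the same container or network namespace) as the producer, since that is the path the producer's connection takes.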
  • 11. Record is too large
      • This error occurs because the record size is greater than the “max.request.size” configuration, which defaults to 1048576 (1 MB).
      • The error message looks like this:
          • org.apache.kafka.common.errors.RecordTooLargeException: The message is 1600088 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.
      • How to fix it:
          • Reduce the record size. This requires a change in the application that generates the record.
          • If you cannot reduce the record size, you can increase the producer configuration “max.request.size”. If you do this, you also need to increase the topic configuration “max.message.bytes”.
      • Note: “max.request.size” is the maximum request size AFTER serialization but BEFORE compression. So, enabling compression will not fix this.
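An application can catch oversized records before calling send() with a pre-flight size check. The sketch below is an approximation: it counts only the UTF-8 key and value bytes and ignores headers and per-record overhead, so the real serialized size will be somewhat larger. The class and method names are hypothetical; the numbers in main come from the error message above.

```java
import java.nio.charset.StandardCharsets;

// Approximate pre-flight check; names are hypothetical.
class RecordSizeCheck {
    // Default value of max.request.size quoted in the slide (1 MB).
    static final int DEFAULT_MAX_REQUEST_SIZE = 1_048_576;

    // Lower bound on the serialized size: key bytes plus value bytes only.
    static int approxSerializedSize(String key, String value) {
        int keyBytes = key == null ? 0 : key.getBytes(StandardCharsets.UTF_8).length;
        int valueBytes = value == null ? 0 : value.getBytes(StandardCharsets.UTF_8).length;
        return keyBytes + valueBytes;
    }

    // True when a record of this size would not exceed the limit.
    static boolean fits(int serializedBytes, int maxRequestSize) {
        return serializedBytes <= maxRequestSize;
    }

    public static void main(String[] args) {
        // The size reported in the example error message.
        System.out.println("1600088 bytes fits: " + fits(1_600_088, DEFAULT_MAX_REQUEST_SIZE));
        System.out.println("small record fits: "
                + fits(approxSerializedSize("k", "hello"), DEFAULT_MAX_REQUEST_SIZE));
    }
}
```

Because the check runs on uncompressed bytes, it matches the note above: compression does not change the outcome.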
  • 12. Batch expires before sending
      • This error is a symptom of slow transfer time (on the network) or slow processing (on the Kafka broker).
      • The error looks like this:
          • org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test1-0:1500 ms has passed since batch creation
      • Sanity checks:
          • Is the topic partition online? A topic partition is online if one or more Kafka brokers hosting its replicas are online.
              • Use “kafka-topics --bootstrap-server {BROKER HOST:PORT} --describe --topic {TOPIC NAME}”
          • “delivery.timeout.ms”: an upper bound on the time to report success or failure after a call to send() returns.
              • The default value is 120000 ms (2 minutes).
              • If “delivery.timeout.ms” is set to a very low value, it can cause batches to expire too early.
          • “batch.size”: the maximum size of a record batch.
              • The default value is 16384 bytes (16 kB).
              • If the message size is large, this configuration may need to be increased to allow more records per batch. More records per batch means higher throughput and lower latency per record.
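The timeout settings discussed above can be collected in a java.util.Properties object and sanity-checked before a producer is created. The sketch relies on one rule from the Kafka producer documentation that is not on the slide: “delivery.timeout.ms” should be at least “linger.ms” plus “request.timeout.ms” (the latter is understood to default to 30000 ms). The class and method names are hypothetical, and the 1500 ms value is only a guess at what produced the example error.

```java
import java.util.Properties;

// Builds and checks timeout-related producer settings; names are hypothetical.
class ProducerTimeoutSketch {
    static Properties timeoutSettings(int deliveryTimeoutMs, int lingerMs,
                                      int requestTimeoutMs, int batchSizeBytes) {
        Properties props = new Properties();
        props.setProperty("delivery.timeout.ms", Integer.toString(deliveryTimeoutMs));
        props.setProperty("linger.ms", Integer.toString(lingerMs));
        props.setProperty("request.timeout.ms", Integer.toString(requestTimeoutMs));
        props.setProperty("batch.size", Integer.toString(batchSizeBytes));
        return props;
    }

    // True when delivery.timeout.ms >= linger.ms + request.timeout.ms.
    static boolean deliveryTimeoutIsConsistent(Properties props) {
        long delivery = Long.parseLong(props.getProperty("delivery.timeout.ms"));
        long linger = Long.parseLong(props.getProperty("linger.ms"));
        long request = Long.parseLong(props.getProperty("request.timeout.ms"));
        return delivery >= linger + request;
    }

    public static void main(String[] args) {
        Properties defaults = timeoutSettings(120000, 0, 30000, 16384);
        Properties tooLow = timeoutSettings(1500, 0, 30000, 16384);
        System.out.println("defaults consistent: " + deliveryTimeoutIsConsistent(defaults));
        System.out.println("1500 ms consistent: " + deliveryTimeoutIsConsistent(tooLow));
    }
}
```

The same Properties object can be passed to a producer constructor once the serializer and bootstrap settings are added.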
  • 13. Batch expires before sending
      • How to investigate this issue (cont’d):
          • First, we need to identify whether this is caused by slow transfer time or slow processing.
          • To check for slow transfer time, execute “ping {BROKER HOST}” from the producer host. The round trip time (RTT) should be reasonable. For example, if both the producer and the Kafka brokers are in the same data center, the RTT should be less than 10 ms, and mostly under 1 ms.
          • If the “ping” result is good (i.e. consistently under 10 ms with 0% packet loss), then network latency is unlikely to be the cause.
          • To check for slow processing, check the following on the Kafka brokers:
              • Number of connections on the Kafka broker, with “netstat -n | grep 9092 | wc -l”. More than 1000 connections is usually too high and can cause slow processing or connectivity issues.
              • Number of topic partitions per broker. More than 1500 partitions per broker is usually too high and can cause slow processing. Check it with “kafka-topics --bootstrap-server {BROKER HOST:PORT} --describe | awk '{print $5, $6}' | sort | uniq -c”.
                  • If the Kafka broker host has enough CPU and memory, you can increase “num.replica.fetchers” to 2 or 3 to allow more partitions per broker.
              • Inter-broker “ping” latency. If the brokers are running in multiple data centers (e.g. multiple Availability Zones), this may be a significant contributor to produce latency.
              • CPU usage of the Kafka brokers. The following JMX metrics also show how idle the internal threads are:
                  • kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent: if this is low (< 0.5), the broker needs a higher “num.io.threads”, if CPU allows.
                  • kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent: if this is low (< 0.5), the broker needs a higher “num.network.threads”, if CPU allows.
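The broker-side rules of thumb above can be turned into a small checklist function. This is a sketch with hypothetical names; the thresholds (1000 connections, 1500 partitions, idle ratio below 0.5) are taken directly from the slide, and the input values are assumed to have been collected separately with netstat, kafka-topics, and JMX.

```java
import java.util.ArrayList;
import java.util.List;

// Applies the slide's rules of thumb to already-collected numbers; names are hypothetical.
class BrokerLoadSketch {
    static List<String> findings(int connections, int partitions,
                                 double requestHandlerIdle, double networkProcessorIdle) {
        List<String> out = new ArrayList<>();
        if (connections > 1000) {
            out.add("More than 1000 connections on this broker");
        }
        if (partitions > 1500) {
            out.add("More than 1500 partitions on this broker");
        }
        if (requestHandlerIdle < 0.5) {
            out.add("RequestHandlerAvgIdlePercent is low: consider raising num.io.threads if CPU allows");
        }
        if (networkProcessorIdle < 0.5) {
            out.add("NetworkProcessorAvgIdlePercent is low: consider raising num.network.threads if CPU allows");
        }
        return out;
    }

    public static void main(String[] args) {
        // Example numbers only; they are not measurements from a real cluster.
        for (String finding : findings(1200, 800, 0.3, 0.9)) {
            System.out.println(finding);
        }
    }
}
```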
  • 14. Not enough replicas error
      • This means the number of replicas in the ISR is less than the “min.insync.replicas” configuration.
      • The error looks like this:
          • [2021-08-03 01:34:05,077] WARN [Producer clientId=producer-1] Got error produce response with correlation id 3 on topic-partition test2-0, retrying (2147483646 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender)
      • This error occurs when, for example:
          • Topic replication factor is 3.
          • Topic configuration includes “min.insync.replicas=2”.
          • Producer uses “acks=all” configuration.
          • Fewer than 2 of the 3 replicas are currently in sync.
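The condition behind this error can be stated as a one-line predicate. The sketch below is a model of the rule, not broker code; the class and method names are hypothetical. It treats “acks=-1” as equivalent to “acks=all”.

```java
// Models when a produce request is rejected with NOT_ENOUGH_REPLICAS; names are hypothetical.
class MinIsrSketch {
    static boolean rejectsWithNotEnoughReplicas(String acks, int isrSize, int minInsyncReplicas) {
        // min.insync.replicas is only enforced for acks=all (also written as acks=-1).
        boolean requiresIsr = "all".equals(acks) || "-1".equals(acks);
        return requiresIsr && isrSize < minInsyncReplicas;
    }

    public static void main(String[] args) {
        // Replication factor 3, min.insync.replicas=2, two brokers down: ISR size is 1.
        System.out.println(rejectsWithNotEnoughReplicas("all", 1, 2));
        // One broker down: ISR size is 2, which still satisfies min.insync.replicas=2.
        System.out.println(rejectsWithNotEnoughReplicas("all", 2, 2));
        // With acks=1 the check does not apply.
        System.out.println(rejectsWithNotEnoughReplicas("1", 1, 2));
    }
}
```

With replication factor 3 and “min.insync.replicas=2”, the topic tolerates one out-of-sync replica; the error appears only once a second replica falls out of the ISR.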
  • 15. Not enough replicas error
      • What is ISR? It is short for “In Sync Replicas”: the follower replicas that are in sync with the leader. In other words, the follower replicas that have all the records that the leader replica has.
      • How can a replica become out of sync? Because the broker is offline, because replication has failed, or because replication is slow.
      • How to fix this error:
          • If the replica is out of sync because a Kafka broker is offline, start the broker hosting the offline replicas.
          • If it is out of sync because of replication failure, fix the failure. This is a separate discussion, but the most common cause is disk failure. If the disk storing the replica data is full, the Kafka broker will stop replicating all replicas on that disk.
          • If it is out of sync because of slow replication, fix the slow replication. This is also a separate discussion, but the most common cause is inter-broker latency or too many topic partitions per broker.
  • 16. Thank you. Any questions?