Apache Kafka is a distributed messaging system used to build real-time data pipelines & streaming applications. Since applications rely heavily on efficient data transfer, message passing platforms like Kafka cannot afford a breakdown or poor performance.
But how do we ensure that Kafka is running well and successfully streaming messages at low latency? This is where Kafka monitoring steps in.
Here’s the agenda of the webinar:
> Why Kafka monitoring?
> Top 10 Kafka metrics to focus on
> How to change Kafka topic configuration at runtime?
2. About Knoldus
Knoldus is a technology consulting firm with a focus on modernizing digital systems
at the pace your business demands.
Functional. Reactive. Cloud Native
DevOps
3. Our Agenda
01 Introduction to Kafka monitoring
02 Why monitor Kafka
03 Important metrics to focus on first
04 Kafka topic introduction
05 Default Kafka topic configuration
06 Modify topic configuration, with demo
4. Introduction to Kafka Monitoring
Apache Kafka deals with transferring large amounts
of real-time data (we can call it data in motion).
Monitoring assures the stream end to end: that every
message is delivered from producer to consumer, and
how long messages take to be delivered. It also helps
determine the source of an issue in your cluster.
We can monitor Kafka with the help of metrics. While
monitoring Kafka, it’s important to also monitor
ZooKeeper, as Kafka depends on it.
5. Why Monitor Kafka
Kafka monitoring is important for ensuring
timely data delivery and overall application
performance, knowing when to scale up,
detecting connectivity issues, and ensuring data is not
lost, since we deal with streaming data. The volume of
data is large, and a Kafka cluster involves different
components: producers, consumers, and brokers.
Monitoring ensures every component is working fine.
6. Important Metrics to Focus on
01 Network Request Rate
Since the goal of Kafka brokers is to gather and move data for processing,
they can also be sources of high network traffic. Monitor and
compare the network throughput per server, if possible by tracking
the number of network requests per second.
kafka.network:type=RequestMetrics,name=RequestsPerSec
02 Network Error Rate
Cross-referencing network throughput with related network error
rates can help diagnose the reasons for latency. Error conditions
include dropped network packets, error rates in responses per
request type, and the types of errors occurring.
kafka.network:type=RequestMetrics,name=ErrorsPerSec
03 Under-Replicated Partitions
To ensure data durability and that brokers are always available to deliver
data, you can set a replication factor per topic as applicable. This
metric alerts you to cases where partitions have fewer in-sync replicas
than the replication factor calls for, for example when a broker is down.
kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
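As a rough illustration of cross-referencing these three metrics, the sketch below takes values that have already been collected (how they are scraped from JMX is out of scope here) and flags a broker whose error ratio or under-replicated count looks unhealthy. The function name, the sample readings, and the 1% threshold are assumptions chosen for illustration, not Kafka defaults.

```python
def broker_network_alerts(requests_per_sec, errors_per_sec,
                          under_replicated_partitions,
                          max_error_ratio=0.01):
    """Return a list of human-readable alerts for one broker.

    The inputs are plain numbers assumed to come from the RequestsPerSec,
    ErrorsPerSec and UnderReplicatedPartitions metrics.
    max_error_ratio is a hypothetical threshold, not a Kafka setting.
    """
    alerts = []
    if requests_per_sec > 0:
        error_ratio = errors_per_sec / requests_per_sec
        if error_ratio > max_error_ratio:
            alerts.append(
                f"error ratio {error_ratio:.1%} exceeds {max_error_ratio:.1%}"
            )
    # Any value above zero means at least one partition is missing replicas.
    if under_replicated_partitions > 0:
        alerts.append(
            f"{under_replicated_partitions} under-replicated partition(s)"
        )
    return alerts


# Hypothetical readings for a healthy broker and an unhealthy one.
print(broker_network_alerts(1200.0, 3.0, 0))
print(broker_network_alerts(1200.0, 60.0, 2))
```

Comparing errors against requests as a ratio, rather than alerting on the raw error count, keeps the check meaningful as traffic grows.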
7. Important Metrics to Focus on
04 Total Broker Partitions
Simply knowing how many partitions a broker is managing can help
you avoid errors and know when it’s time to scale out. The goal should
be to keep the count balanced across brokers.
kafka.server:type=ReplicaManager,name=PartitionCount
Number of partitions on the broker.
05 Log Flush Latency
Kafka stores data by appending to existing log files. Cache-based
writes are flushed to physical storage asynchronously. Your monitoring
strategy should track both data replication and the latency of the
asynchronous disk log flush.
kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs
06 Consumer Message Rate
Set baselines for expected consumer message throughput and measure
fluctuations in the rate to detect latency and the need to scale the
number of consumers up or down accordingly.
kafka.consumer:type=ConsumerTopicMetrics,name=MessagesPerSec,clientId=([-.\w]+)
Messages consumed per second.
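One simple way to turn "set a baseline and measure fluctuations" into a check is sketched below: treat recent consumer message rates as the baseline and flag a reading that falls far from their mean. The function name, the sample rates, and the tolerance of three standard deviations are illustrative assumptions; a real deployment would pick its own window and threshold.

```python
from statistics import mean, stdev


def rate_deviates(baseline_rates, current_rate, tolerance=3.0):
    """Return True if current_rate is more than `tolerance` standard
    deviations away from the mean of baseline_rates.

    baseline_rates needs at least two samples (messages per second).
    """
    center = mean(baseline_rates)
    spread = stdev(baseline_rates)
    if spread == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return current_rate != center
    return abs(current_rate - center) > tolerance * spread


# Hypothetical messages-per-second samples taken during normal operation.
baseline = [980.0, 1010.0, 1005.0, 995.0, 1010.0]
print(rate_deviates(baseline, 1002.0))  # close to the baseline
print(rate_deviates(baseline, 400.0))   # a sharp drop in consumption
```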
8. Important Metrics to Focus on
07 Consumer Max Lag
Even with consumers fetching messages at a high rate, producers
can still outpace them. This metric works at the level of consumer
and partition, meaning each partition in each topic has its own lag for
a given consumer.
kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+)
Number of messages by which the consumer lags behind the producer.
08 Fetcher Lag
This metric indicates the lag in the number of messages per follower
replica; a growing lag suggests that replication has potentially stopped
or has been interrupted. The replica.lag.time.max.ms configuration
parameter bounds how long a follower may go without fetching new data
from the leader before it is considered out of sync.
kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)
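To make the per-partition nature of lag concrete, here is a minimal sketch that computes lag as the latest offset in each partition minus the offset the consumer has committed, then picks the worst partition. The offsets are made-up sample values, the function name is hypothetical, and treating a partition with no committed offset as starting from 0 is an assumption of this sketch.

```python
def partition_lags(log_end_offsets, committed_offsets):
    """Return {(topic, partition): lag} for one consumer.

    Lag is the log end offset minus the committed offset. A partition
    with no committed offset is assumed to start from offset 0.
    """
    lags = {}
    for topic_partition, end_offset in log_end_offsets.items():
        committed = committed_offsets.get(topic_partition, 0)
        lags[topic_partition] = max(end_offset - committed, 0)
    return lags


# Hypothetical offsets for a three-partition topic.
log_end = {("sendInvitation", 0): 1500,
           ("sendInvitation", 1): 1520,
           ("sendInvitation", 2): 900}
committed = {("sendInvitation", 0): 1500,
             ("sendInvitation", 1): 1200,
             ("sendInvitation", 2): 890}

lags = partition_lags(log_end, committed)
worst = max(lags, key=lags.get)
print(lags)
print("max lag:", lags[worst], "on", worst)
```

The maximum over all partitions is what a max-lag style metric summarizes; the per-partition values show where the consumer is actually falling behind.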
9. Important Metrics to Focus on
09 Offline Partition Count
Offline partitions represent data stores unavailable to your
application due to a server failure or restart. In a Kafka cluster, one of
the brokers acts as the controller, managing the states of
partitions and replicas and reassigning partitions when needed.
kafka.controller:type=KafkaController,name=OfflinePartitionsCount
Number of partitions without an active leader.
10 Free Memory and Swap Space Usage
Kafka performance is best when swapping is kept to a minimum. To do
this, set the JVM max heap size large enough to avoid frequent
garbage collection activity, but small enough to leave space for
filesystem caching. Additionally, if you have swap enabled, watch
for increases in server swapping activity, as
this can lead to Kafka operations timing out.
In many cases it’s best to turn off swap entirely; if you do, adjust your
monitoring accordingly.
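For the swap part, a small sketch of how swap usage could be derived on a Linux broker host is shown below. It assumes the /proc/meminfo text format (lines such as "SwapTotal: 2097148 kB"); the function name and the sample text are made up for illustration, and on a real host the text would be read from /proc/meminfo instead.

```python
def swap_used_kib(meminfo_text):
    """Return swap in use (KiB) from /proc/meminfo-style text.

    Computed as SwapTotal minus SwapFree.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]


# Made-up sample; on a Linux host use open("/proc/meminfo").read() instead.
sample = """MemTotal:       16316412 kB
MemFree:         1204820 kB
SwapTotal:       2097148 kB
SwapFree:        2000000 kB
"""
print(swap_used_kib(sample), "KiB of swap in use")
```

A value that keeps growing between samples is the signal to watch for; with swap turned off, both fields read 0 and the check can be dropped.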
10. Introduction to Kafka Topic
We can say that a Kafka topic is a
similar concept to a table in a
database. But it’s definitely not a table,
and Kafka isn’t a database.
A topic is where data (messages) gets
published by the producer and pulled
by a consumer.
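The publish-and-pull idea can be pictured with a toy, in-memory model. This is not the Kafka client API, only a sketch of the concept: the class and method names are invented, and it ignores partitions, replication, and persistence.

```python
class ToyTopic:
    """A toy stand-in for a topic: an append-only list of messages."""

    def __init__(self):
        self._log = []

    def publish(self, message):
        """Append a message and return the offset it was stored at."""
        self._log.append(message)
        return len(self._log) - 1

    def pull(self, offset, max_messages=10):
        """Return up to max_messages starting at offset.

        Reading does not remove anything from the log.
        """
        return self._log[offset:offset + max_messages]


topic = ToyTopic()
for invitation in ["invite-1", "invite-2", "invite-3"]:
    topic.publish(invitation)

# The consumer tracks its own position and pulls from there.
consumer_offset = 0
batch = topic.pull(consumer_offset, max_messages=2)
consumer_offset += len(batch)
print(batch, "next offset:", consumer_offset)
```

Unlike rows in a table, messages here are only ever appended, and each consumer decides where to read from by keeping its own offset.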
12. Kafka Topic Configuration
Modify Kafka topic configuration at runtime.
By default, a Kafka topic is created with a replication factor of 1 and 1 partition.
1) To change the number of partitions of a topic, use --alter (the partition count can only be increased):
./kafka-topics.sh --zookeeper localhost:2181 --alter --topic sendInvitation --partitions 3
2) To change the replication factor of a topic, add a
JSON script with the content provided on the next slide.
13. Kafka Topic Configuration
Modify Kafka topic configuration at runtime.
2) To change the replication factor of a topic, add a JSON script with the
content provided below.
Assume the script name is increase-replication-factor.json.
{"version":1,
"partitions":[
{"topic":"sendInvitation","partition":0,"replicas":[0,1,2]},
{"topic":"sendInvitation","partition":1,"replicas":[0,1,2]},
{"topic":"sendInvitation","partition":2,"replicas":[0,1,2]},
{"topic":"xyz","partition":0,"replicas":[0,1,2]},
{"topic":"xyz","partition":1,"replicas":[0,1,2]}
]}
Then execute the following command to run and apply this script:
./kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication-factor.json --execute
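Writing the reassignment file by hand gets error-prone as topics and partitions multiply, so it can help to generate it. The sketch below builds the same structure as the JSON above and serializes it with a JSON library, which guarantees valid syntax. The function name is hypothetical, and the partition counts and broker ids are simply the ones used in this example.

```python
import json


def build_reassignment(partition_counts, replicas):
    """Build a reassignment plan with the same replica list for every
    partition of every topic.

    partition_counts maps topic name -> number of partitions.
    replicas is the list of broker ids that should hold each partition.
    """
    partitions = [
        {"topic": topic, "partition": partition, "replicas": list(replicas)}
        for topic, count in partition_counts.items()
        for partition in range(count)
    ]
    return {"version": 1, "partitions": partitions}


plan = build_reassignment({"sendInvitation": 3, "xyz": 2}, [0, 1, 2])

# Write the plan where the reassignment command expects to find it.
with open("increase-replication-factor.json", "w") as handle:
    json.dump(plan, handle, indent=1)

print(json.dumps(plan, indent=1))
```

Like the slide's JSON, this lists the replicas in the same order for every partition. Since the first broker in each list is the preferred leader, you may want to vary the order per partition to spread leadership across brokers.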