2. Architecture
(Diagram: brokers BK1–BK5 registered with a three-node ZooKeeper ensemble ZK1–ZK3)
Producer
- Serialize data
- Identify partition
- Send data (sync or async)
- Wait for ack when sending synchronously
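The "identify partition" step can be sketched as follows. Kafka's real default partitioner hashes the serialized key with murmur2 (and uses sticky partitioning for null keys); this sketch uses a stand-in hash just to show the "hash the key, mod the partition count" idea.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of the producer's partition-selection step. The real default
// partitioner uses a murmur2 hash of the serialized key; Arrays.hashCode
// is a stand-in to illustrate the idea.
public class PartitionSketch {
    static int partitionFor(byte[] serializedKey, int numPartitions) {
        // Mask off the sign bit so the result is a valid partition index.
        return (Arrays.hashCode(serializedKey) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "order-42".getBytes(StandardCharsets.UTF_8);
        int p1 = partitionFor(key, 6);
        int p2 = partitionFor(key, 6);
        // The same key always lands on the same partition.
        System.out.println(p1 == p2 && p1 >= 0 && p1 < 6); // prints "true"
    }
}
```

Because partitioning is a pure function of the key, all messages with the same key preserve their order within one partition.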
Consumer
- Assigned a partition
- Polls for messages
- Sends heartbeats to the group coordinator
- Partition rebalance on membership change
- First consumer becomes the consumer group leader
- New partitions are assigned to the mount point with the fewest partitions
- A partition is restricted to one disk / mount point: log.dirs
- Max message size: message.max.bytes
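The log.dirs note above can be sketched: when a broker creates a new partition, it places it in the data directory that currently holds the fewest partitions. The directory names below are illustrative.

```java
import java.util.Map;

// Sketch of how a broker picks a log.dirs entry for a new partition:
// choose the directory that currently holds the fewest partitions.
public class LogDirChooser {
    static String pickDir(Map<String, Integer> partitionsPerDir) {
        String best = null;
        for (Map.Entry<String, Integer> e : partitionsPerDir.entrySet()) {
            if (best == null || e.getValue() < partitionsPerDir.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical mount points with their current partition counts.
        Map<String, Integer> counts =
                Map.of("/data/kafka1", 12, "/data/kafka2", 7, "/data/kafka3", 9);
        System.out.println(pickDir(counts)); // prints "/data/kafka2"
    }
}
```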
Brokers
- All brokers register with ZooKeeper
- The first broker to start becomes the controller
- The controller is responsible for partition leader election
- Controller failure and re-election use ZooKeeper watch notifications
- The broker that creates the /controller znode becomes the controller
- A controller epoch, maintained via a ZooKeeper conditional increment operation, fences out stale controllers
- First consumer in a group becomes the group leader
- One broker is assigned as the group coordinator
- A change in the number of consumers triggers a partition rebalance
- All reads go through the partition leader
- The leader keeps track of the ISR (in-sync replicas)
/brokers/ids – ephemeral nodes – watched by Kafka components
/controller – ZooKeeper watch
- All writes go through the partition leader
- Writes block if in-sync replicas fall below min.insync.replicas
- On broker failure, the next in-sync replica is elected partition leader
(Diagram: follower replicas issuing fetch requests to the partition leader)
9. Fetch Request
• Topic, partition, offset
• Limits the data returned
– By size or number of messages
• Uses zero-copy for performance
– Data moves from the file system cache straight to the network buffer
• A minimum fetch size can be set to reduce network traffic (fetch.min.bytes)
– A maximum wait time in ms can also be set (fetch.max.wait.ms)
• Consumers only see data that has been replicated – the high water mark
• replica.lag.time.max.ms – a follower lagging longer than this is dropped from the ISR
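The high-water-mark rule above can be sketched: the HWM is the minimum log end offset across the in-sync replicas, and a fetch only returns records below it.

```java
import java.util.List;

// Sketch of high-water-mark visibility: consumers may only read offsets below
// the minimum log-end-offset of the in-sync replicas.
public class HighWaterMark {
    static long highWaterMark(List<Long> isrLogEndOffsets) {
        return isrLogEndOffsets.stream().mapToLong(Long::longValue).min().orElse(0L);
    }

    static List<String> visible(List<String> log, long hwm) {
        // Offsets here are just list indices; records at offset >= hwm are hidden.
        return log.subList(0, (int) Math.min(hwm, log.size()));
    }

    public static void main(String[] args) {
        List<String> log = List.of("m0", "m1", "m2", "m3"); // leader has 4 records
        long hwm = highWaterMark(List.of(4L, 2L, 3L));      // slowest ISR member at offset 2
        System.out.println(visible(log, hwm));              // prints "[m0, m1]"
    }
}
```

This is why a slow (but still in-sync) follower delays consumers: records past the slowest ISR member's offset stay invisible until replication catches up.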
10. Partition Allocation
• Rack awareness
• Equal distribution
• Leader at node A, followers at A+1, A+2
• Each partition is assigned to the directory with the fewest partitions
• Partitions are divided into segments
– A segment rolls at 1 GB or 1 week's worth of data, whichever comes first
• File handles stay open for all segments in all partitions
– The OS ulimit needs to be raised to allow enough open file handles
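The placement rule "leader at node A, followers at A+1, A+2" can be sketched as modular assignment over the broker list; rack awareness is omitted here for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of round-robin replica placement: partition i gets its leader on
// broker i mod N and its followers on the next brokers in sequence.
// Rack-aware placement is omitted.
public class ReplicaPlacement {
    static List<Integer> replicasFor(int partition, int numBrokers, int replicationFactor) {
        List<Integer> replicas = new ArrayList<>();
        for (int r = 0; r < replicationFactor; r++) {
            replicas.add((partition + r) % numBrokers); // leader first, then followers
        }
        return replicas;
    }

    public static void main(String[] args) {
        // 5 brokers, replication factor 3.
        System.out.println(replicasFor(0, 5, 3)); // prints "[0, 1, 2]"
        System.out.println(replicasFor(4, 5, 3)); // prints "[4, 0, 1]"
    }
}
```

Spreading leader and followers over consecutive brokers keeps the replica count per broker roughly equal, which is the "equal distribution" goal above.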
11. Files
• Indexes
– Map offsets in segment files -> positions within segments
– Each index corresponds to a data segment
– Indexes are purged along with their data
• Compaction
– Retention policy: delete|compact
– log.cleaner.enabled
– To delete a message, produce a message with the same key and a NULL value (a tombstone)
– The active segment is never compacted
– Compaction runs on topics where at least 50% of the log is dirty
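The compaction rules above can be sketched: keep only the latest value per key, and drop a key entirely when its latest record is a NULL tombstone. (A real cleaner also excludes the active segment; this toy compacts the whole list.)

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of log compaction: retain only the newest value per key; a record
// with a NULL value (a tombstone) deletes the key entirely.
public class CompactionSketch {
    record Rec(String key, String value) {} // value == null means tombstone

    static Map<String, String> compact(List<Rec> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (Rec r : log) {
            if (r.value() == null) {
                latest.remove(r.key());      // tombstone removes the key
            } else {
                latest.put(r.key(), r.value()); // newer value replaces older
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Rec> log = List.of(
                new Rec("user1", "a"), new Rec("user2", "b"),
                new Rec("user1", "c"), new Rec("user2", null)); // tombstone for user2
        System.out.println(compact(log)); // prints "{user1=c}"
    }
}
```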
Message format (data file): Offset | Magic / Checksum | Compression Codec | Timestamp | Key Size | Key | Value Size | Value
12. Consumer
• Consumer Group
– Consumers -> partitions
• More consumers than partitions -> idle consumers
• Adding/dropping consumers -> partition rebalancing
– While rebalancing, the group can't consume messages
• Group membership is maintained by heartbeats to the group coordinator
– Heartbeats are sent during poll()
• A commit records the offset of the last message consumed
• A consumer crash leaves its assigned partition unprocessed until rebalance
– session.timeout.ms / max.poll.interval.ms
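The two liveness settings above can be set on the consumer as plain properties. The values shown are examples, not recommendations, and the group name is hypothetical.

```java
import java.util.Properties;

// Illustrative consumer liveness configuration. session.timeout.ms governs
// how long the coordinator waits for heartbeats before declaring the consumer
// dead; max.poll.interval.ms bounds the gap between poll() calls.
public class ConsumerLivenessConfig {
    static Properties livenessProps() {
        Properties props = new Properties();
        props.setProperty("group.id", "orders-service");     // hypothetical group name
        props.setProperty("session.timeout.ms", "10000");    // heartbeat deadline
        props.setProperty("max.poll.interval.ms", "300000"); // max time between poll() calls
        return props;
    }

    public static void main(String[] args) {
        System.out.println(livenessProps().getProperty("session.timeout.ms")); // prints "10000"
    }
}
```

Exceeding max.poll.interval.ms (for example, slow processing of a large batch) also triggers a rebalance, even if heartbeats are still arriving.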
13. Consumer Group
• One broker acts as the Group Coordinator
• Consumers make a JoinGroup request
• The first consumer becomes the group leader
• The leader receives details about all consumers
• It assigns partitions to consumers using a "PartitionAssignor"
• It sends the assignments to the Group Coordinator
• The Group Coordinator sends each consumer its relevant information, such as its assigned partitions
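The leader's PartitionAssignor step can be sketched with a simple round-robin assignment. Kafka ships range, round-robin, sticky, and cooperative-sticky assignors; this toy shows only the core idea of spreading partitions over members.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy round-robin partition assignor: the group leader spreads partitions
// evenly over the current members. Real Kafka assignors are more involved.
public class AssignorSketch {
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new HashMap<>();
        for (String c : consumers) assignment.put(c, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 2 consumers, 5 partitions: the first consumer gets the extra partition.
        Map<String, List<Integer>> a = assign(List.of("c1", "c2"), 5);
        System.out.println(a.get("c1")); // prints "[0, 2, 4]"
        System.out.println(a.get("c2")); // prints "[1, 3]"
    }
}
```

With more consumers than partitions, some members end up with empty lists, which is the "idle consumers" case in section 12.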
16. Commits & Offsets
• The __consumer_offsets topic stores committed offsets
– enable.auto.commit = true
– auto.commit.interval.ms = 5000 (5 s, the default)
• With auto commit, data can be processed twice or missed during rebalancing
• Disable auto commit – enable.auto.commit = false
– poll() and process all the records
– Then commit:
• commitSync()
• commitAsync()
• commitAsync(new OffsetCommitCallback())
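The manual-commit pattern above (poll, process, then commitSync) can be simulated in-memory: since the offset only advances after processing, a crash between processing and commit causes reprocessing rather than loss (at-least-once delivery). The broker-side offset store is faked with a field here.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory simulation of manual offset commits: process the batch first,
// then commit. A crash between processing and commit means the batch is
// re-read on restart (at-least-once), never lost.
public class ManualCommitSketch {
    long committedOffset = 0;                 // stand-in for __consumer_offsets
    final List<String> processed = new ArrayList<>();

    void pollAndProcess(List<String> log, boolean crashBeforeCommit) {
        List<String> batch = log.subList((int) committedOffset, log.size());
        processed.addAll(batch);              // process all records from poll()
        if (crashBeforeCommit) return;        // crash: offset never committed
        committedOffset = log.size();         // commitSync() analogue
    }

    public static void main(String[] args) {
        ManualCommitSketch c = new ManualCommitSketch();
        List<String> log = List.of("m0", "m1", "m2");
        c.pollAndProcess(log, true);           // crash before commit
        c.pollAndProcess(log, false);          // restart: same batch processed again
        System.out.println(c.processed);       // prints "[m0, m1, m2, m0, m1, m2]"
        System.out.println(c.committedOffset); // prints "3"
    }
}
```

The duplicate processing in the output is exactly the rebalancing hazard noted above; committing before processing would instead risk losing the batch (at-most-once).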