Getting Started with Kafka on k8s

© Copyright 2019 Pivotal Software, Inc. All rights Reserved.
Rohini Rajaram
Sr. Platform Architect, Pivotal
rrajaram@pivotal.io
July 2019

Agenda
■ Kafka Fundamentals - Pub/Sub Done Right
■ Kafka On K8s
■ Building Event Driven Systems
■ Demo
○ Provision a Kafka Cluster On PKS

Data Infrastructure - Point To Point

Data Infrastructure - Centralized Data Pipeline

Messaging Systems
Why not use traditional messaging systems for the centralized pipeline?
● Transient vs. Durable Messages
● Delivery To Consumers - Push vs. Pull Based Mechanism
● Offset Tracking - Replay Messages On Consumer Failures
● Horizontal Scalability
● Distributed - Partitioning & Replication

Key Ideas
Key Idea 1: Data parallelism leads to scale out
● Randomly distribute clients across partitions
Key Idea 2: Disks are fast when used sequentially
● Store messages as a write-ahead log
Key Idea 3: Batching makes best use of the network
● Batched transfer, compression, no JVM caching (low memory footprint) & zero copy

Architecture Overview
[Figure: Scale Out Architecture. Labels: Producers, Consumers, Kafka Broker Cluster, Topic Partitions.]

Architecture Overview
[Figure: Distributed Commit Log. Labels: Producer, Consumer, Broker 0 through Broker n, Storage.]

Message
Field         Size
Offset        8 bytes
Msg. Length   4 bytes
CRC           4 bytes
Magic         1 byte
Attr          1 byte
Timestamp     8 bytes
Key Len       4 bytes
Key           Varying
Value Len     4 bytes
Value         Varying
Attr bits 0-2 (compression codec):
0 – No Compression
1 – gzip
2 – Snappy
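The field layout on this slide can be tried out with a few lines of Python. This is a minimal sketch that packs and unpacks the fields in the order and sizes listed above; it is not a wire-compatible Kafka encoder. The function names are hypothetical, and having the CRC cover every byte after the CRC field is an assumption of this sketch.

```python
import struct
import zlib

# Fixed-size fields in slide order: offset (8), message length (4), CRC (4),
# magic (1), attributes (1), timestamp (8), key length (4). Big-endian, no padding.
HEADER = struct.Struct(">qiIbbqi")
VALUE_LEN = struct.Struct(">i")
COMPRESSION_MASK = 0x07  # attribute bits 0-2: 0 = none, 1 = gzip, 2 = Snappy


def encode_message(offset, timestamp_ms, key, value, codec=0, magic=1):
    """Pack one message using the slide's field order (illustrative only)."""
    body = struct.pack(">bbqi", magic, codec & COMPRESSION_MASK, timestamp_ms, len(key))
    body += key + VALUE_LEN.pack(len(value)) + value
    crc = zlib.crc32(body)   # assumption: CRC covers everything after the CRC field
    msg_len = 4 + len(body)  # CRC field plus the bytes that follow it
    return struct.pack(">qiI", offset, msg_len, crc) + body


def decode_message(buf):
    """Unpack a message produced by encode_message and verify its CRC."""
    offset, msg_len, crc, magic, attrs, timestamp_ms, key_len = HEADER.unpack_from(buf, 0)
    pos = HEADER.size
    key = buf[pos:pos + key_len]
    pos += key_len
    (value_len,) = VALUE_LEN.unpack_from(buf, pos)
    pos += VALUE_LEN.size
    value = buf[pos:pos + value_len]
    if zlib.crc32(buf[16:pos + value_len]) != crc:
        raise ValueError("CRC mismatch")
    return {
        "offset": offset,
        "msg_len": msg_len,
        "magic": magic,
        "codec": attrs & COMPRESSION_MASK,
        "timestamp_ms": timestamp_ms,
        "key": key,
        "value": value,
    }


record = encode_message(42, 1562000000000, b"key", b"hello")
decoded = decode_message(record)
```

The fixed-size fields add up to 30 bytes, so this record with a 3-byte key and a 5-byte value occupies 42 bytes in total.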

Topics
[Figure: writes are appended to partitions P0, P1 and P2 of a topic. The topic logs for P0, P1 and P2 are spread across Broker 1 (Node 1), Broker 2 (Node 2) and Broker 3 (Node 3).]

Distributed Commit Log
[Figure: a producer writes to partitions P0, P1 and P2; Consumer A and Consumer B each read from the log.]
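The commit log idea, one appended log read independently by several consumers, can be sketched in a few lines. This is a toy in-memory model under stated assumptions, not Kafka's implementation; the `PartitionLog` class and the consumer names are hypothetical.

```python
class PartitionLog:
    """Append-only log for one partition; an offset is a position in the log."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append a message and return the offset assigned to it."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=10):
        """Return up to max_messages starting at offset; nothing is removed."""
        return self._messages[offset:offset + max_messages]


log = PartitionLog()
for text in ("m0", "m1", "m2"):
    log.append(text)

# Each consumer tracks its own position, so reads do not interfere.
positions = {"consumer-a": 0, "consumer-b": 2}
batch_a = log.read(positions["consumer-a"])
batch_b = log.read(positions["consumer-b"])
positions["consumer-a"] += len(batch_a)

# Replay after a failure: reset the position and read again.
replayed = log.read(0)
```

Because reading never deletes anything, a consumer that fails can rewind its offset and replay messages, which is the behavior the earlier "Offset Tracking" point relies on.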

Partitions & Segments
Kafka
|
partition-0
|
segment0.log
segment0.index
segment5.log
segment5.index
Segment3065011416.log
Starting offset: 3065011416
offset: 3065011416 position: 0 isvalid: true payloadsize: 2020 magic: 1 compresscodec: NoCompressionCodec crc: 811055132 payload: {"name": "Smith", msg: "Hello world"}
offset: 3065011417 position: 1779 isvalid: true payloadsize: 2244 magic: 1 compresscodec: NoCompressionCodec crc: 151590202 payload: {"name": "James", msg: "Hello to all of you!"}

Segment3065011416.index
Offset (rel. to Base)   Position (on the log)
0                       0
1                       1779

[Figure: offsets 0 through 9 in a partition; writes go to the Active Segment.]
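The index file maps an offset relative to the segment's base offset to a byte position in the log file. A lookup can be sketched as a binary search for the closest entry at or below the target offset. The function name is hypothetical and the index values are copied from the slide; a real reader would scan forward in the log file from the returned position.

```python
import bisect

BASE_OFFSET = 3065011416       # from the segment file name
INDEX = [(0, 0), (1, 1779)]    # (offset relative to base, position in the log)


def locate(target_offset, base_offset=BASE_OFFSET, index=INDEX):
    """Return the byte position from which to start reading for target_offset."""
    relative = target_offset - base_offset
    if relative < 0:
        raise ValueError("offset is below this segment's base offset")
    relative_offsets = [rel for rel, _ in index]
    # Rightmost index entry whose relative offset is <= the target.
    i = bisect.bisect_right(relative_offsets, relative) - 1
    return index[i][1]


first = locate(3065011416)
second = locate(3065011417)
```

Storing relative offsets keeps index entries small, and the binary search keeps lookups cheap even when a segment holds many messages.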

Why File System & Not Memory?
With sequential access, the speed difference between the file system & memory is small
Kafka runs on the JVM
● Heavy object overhead for data stored in memory
● Increased GC time

Zero Copy
[Figure: Zero Copy via OS sendfile. Kernel context: Page Cache, Socket Buffer, NIC Buffer. Application context: User Space Buffer.]
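Python exposes the same operating system facility through `socket.sendfile`, which uses `os.sendfile` where the platform provides it and falls back to ordinary `send` calls otherwise. The sketch below is an illustration of the call pattern only, not Kafka's code; the file name and payload are made up, and a local socket pair stands in for a broker-to-consumer connection.

```python
import os
import socket
import tempfile

PAYLOAD = b"message-0\nmessage-1\n"  # stand-in for a log segment's bytes


def send_segment(path, sock):
    """Send a whole file over a socket without reading it into Python first."""
    with open(path, "rb") as segment:
        return sock.sendfile(segment)  # returns the number of bytes sent


with tempfile.TemporaryDirectory() as tmp_dir:
    segment_path = os.path.join(tmp_dir, "segment0.log")
    with open(segment_path, "wb") as f:
        f.write(PAYLOAD)

    broker_side, consumer_side = socket.socketpair()
    with broker_side, consumer_side:
        sent = send_segment(segment_path, broker_side)
        received = b""
        while len(received) < sent:
            received += consumer_side.recv(4096)
```

The application never touches the payload bytes on the sending side, which is the point of the technique: the kernel moves data from the page cache toward the socket directly.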

Brokers
● Cluster aware
● Receive messages from producers, assign offsets & write to disk
● Serve fetch requests from consumers reading partitions, responding with committed messages
● One broker is elected as Controller - admin, assigns partitions to brokers & monitoring
● Topic retention - time or size based
[Figure: Kafka Cluster with Broker 0 (Controller) and Broker 1. Each broker holds Topic A Partition 0 and Topic A Partition 1, labeled Leader or Replica. A producer sends messages for A/0 and A/1; a consumer receives messages from A/0 and A/1.]

Producers
● Producers accept a ProducerRecord
● ProducerRecord keys & values are serialized into byte arrays by a Serializer
● Partitioner - chooses the partition by key if none is specified & adds the record to a batch for that partition
● A separate thread handles sending batches to the brokers
● Three send methods: 1. Fire & forget 2. Synchronous 3. Asynchronous
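The producer flow above (serialize, pick a partition, hand off to a sender thread, then choose how to wait for the result) can be modeled with the standard library. This is a toy sketch, not a Kafka client: `ToyProducer` and `choose_partition` are hypothetical names, CRC32 stands in for the client's real key hash, and batching is left out to keep it short.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 3


def choose_partition(key_bytes, explicit=None):
    """Use the explicit partition if given, otherwise hash the key."""
    if explicit is not None:
        return explicit
    return zlib.crc32(key_bytes) % NUM_PARTITIONS  # stand-in hash, an assumption


class ToyProducer:
    def __init__(self):
        self.logs = {p: [] for p in range(NUM_PARTITIONS)}
        # A single worker plays the role of the separate sender thread.
        self._sender = ThreadPoolExecutor(max_workers=1)

    def send(self, key, value, partition=None):
        key_bytes, value_bytes = key.encode(), value.encode()  # serializer step
        target = choose_partition(key_bytes, partition)
        return self._sender.submit(self._deliver, target, key_bytes, value_bytes)

    def _deliver(self, partition, key_bytes, value_bytes):
        log = self.logs[partition]
        log.append((key_bytes, value_bytes))
        return partition, len(log) - 1  # (partition, offset)

    def close(self):
        self._sender.shutdown(wait=True)


producer = ToyProducer()

# 1. Fire & forget: ignore the returned future.
producer.send("user-1", "login")

# 2. Synchronous: block until the result is available.
sync_result = producer.send("user-1", "click").result()

# 3. Asynchronous: register a callback and keep going.
acks = []
producer.send("user-2", "login").add_done_callback(lambda f: acks.append(f.result()))

producer.close()
```

Records with the same key always land in the same partition here, which is what preserves per-key ordering.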

Consumers
● Consumer groups for consumption scaling
● Topic partitions are distributed among the consumers in a group
● Partitions are rebalanced when consumers are added or crash (a rebalance causes consumer unavailability & loss of consumer cache)
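How partitions are spread over a group, and how that changes when membership changes, can be illustrated with a simple round-robin assignment. This is a sketch of the idea only; the function name is hypothetical and Kafka's actual assignment strategies are configurable and more involved.

```python
def assign_partitions(partitions, consumers):
    """Distribute partitions round-robin over the sorted group members."""
    members = sorted(consumers)
    if not members:
        return {}
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

initial = assign_partitions(partitions, ["c1", "c2"])
after_join = assign_partitions(partitions, ["c1", "c2", "c3"])   # consumer added
after_crash = assign_partitions(partitions, ["c1"])              # consumer lost
```

Each membership change recomputes the whole mapping, which hints at why rebalances are disruptive: a consumer may lose partitions it had been reading and the state it had built up for them.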

Replication
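[Figure: Topic A and Topic B each have partitions P1, P2 and P3, replicated across Broker 1, Broker 2 and Broker 3. Each partition has one leader replica (e.g. P1 Leader) and follower replicas (P1 Followers) that form the ISR (in-sync replicas). The producer writes to Topic A.]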
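The relationship between leader, followers and committed messages can be sketched as follows. A message counts as committed once every in-sync replica has it, so the committed prefix ends at the smallest log end offset in the ISR (often called the high watermark). The class and method names are hypothetical, and this toy model ignores ISR shrinking, leader election and acknowledgement settings.

```python
class Replica:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []

    @property
    def log_end_offset(self):
        return len(self.log)


class Partition:
    def __init__(self, leader, followers):
        self.leader = leader
        self.isr = list(followers)  # followers currently in sync

    def produce(self, message):
        """Producers write to the leader only."""
        self.leader.log.append(message)

    def fetch_from_leader(self, follower):
        """A follower copies whatever it is missing from the leader."""
        follower.log.extend(self.leader.log[follower.log_end_offset:])

    def high_watermark(self):
        return min(r.log_end_offset for r in [self.leader, *self.isr])

    def committed(self):
        """Messages that consumers are allowed to see."""
        return self.leader.log[: self.high_watermark()]


leader, follower_a, follower_b = Replica(1), Replica(2), Replica(3)
p1 = Partition(leader, [follower_a, follower_b])
p1.produce("m0")
p1.produce("m1")

p1.fetch_from_leader(follower_a)
before = p1.committed()   # follower_b has not caught up yet

p1.fetch_from_leader(follower_b)
after = p1.committed()
```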

StorageClass
● Dynamic provisioning of persistent volumes
● Allows admins to define different classes of storage to offer
○ aws-ebs
○ azure-disk
○ gce-pd
○ vsphere-volume
○ portworx-volume
[Figure: a Pod's Container uses a Persistent Volume Claim, which references a StorageClassName (provisioner=...).]
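To make the StorageClass and claim relationship concrete, the sketch below builds the two manifests as Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts alongside YAML. The object names `kafka-fast` and `kafka-data`, the `gp2` volume type and the 10Gi size are assumptions chosen for illustration; `kubernetes.io/aws-ebs` is the provisioner behind the aws-ebs option listed above.

```python
import json

# Hypothetical names and sizes, chosen only for this example.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "kafka-fast"},
    "provisioner": "kubernetes.io/aws-ebs",
    "parameters": {"type": "gp2"},
}

persistent_volume_claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "kafka-data"},
    "spec": {
        # The claim names the class; the provisioner then creates the volume.
        "storageClassName": storage_class["metadata"]["name"],
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

manifest_json = json.dumps([storage_class, persistent_volume_claim], indent=2)
print(manifest_json)
```

A pod then mounts the claim by name, so the pod spec never needs to know which storage backend sits behind the class.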

StatefulSet
● Stable, unique network identifiers
● Stable, persistent storage
● Ordered, graceful deployment and scaling
● Ordered, automated rolling updates
[Figure: a StatefulSet managing Pod-0 through Pod-N.]
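The stable identities a StatefulSet provides follow a fixed naming scheme: pods are named `<statefulset>-<ordinal>`, each pod gets a DNS name under the governing headless service, and each volume claim is named `<claim template>-<pod>`. The sketch below derives those names; the StatefulSet name `kafka`, the service name `kafka-headless`, the claim template `data` and the default `cluster.local` domain are assumptions for illustration.

```python
def statefulset_identities(name, service, namespace, replicas,
                           claim_template="data", cluster_domain="cluster.local"):
    """Return the per-pod names a StatefulSet would produce, in creation order."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"
        identities.append({
            "pod": pod,
            "dns": f"{pod}.{service}.{namespace}.svc.{cluster_domain}",
            "pvc": f"{claim_template}-{pod}",
        })
    return identities


brokers = statefulset_identities("kafka", "kafka-headless", "default", replicas=3)
startup_order = [b["pod"] for b in brokers]
scale_down_order = list(reversed(startup_order))
```

Because the names are derived from the ordinal, a restarted `kafka-1` comes back with the same DNS name and the same volume claim, which is what lets a broker keep its identity and its log data.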

Custom Controller + Custom Resource
[Figure: built-in controllers (Deployment Controller, ReplicaSet Controller, StatefulSet Controller, ...) alongside their resources (Deployment, ReplicaSet, StatefulSet, ...). An Operator adds a Custom Controller paired with a Custom Resource.]

Operator Pattern
● Kubernetes native
○ Custom Resource + Custom Controller
● Embedded with operational knowledge of both the data software and Kubernetes
○ Backup/restore
○ Scale up/down
○ Rebalance data
[Figure: control loop - Observe, Analyze, Act.]
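The Observe, Analyze, Act loop can be sketched as a reconcile function that compares the desired state declared in a custom resource with what is actually running. Everything here is a toy stand-in: the dictionaries play the role of the custom resource and the cluster, and the function names and the `kafka-<n>` pod naming are hypothetical. A real operator would talk to the Kubernetes API instead.

```python
def observe(cluster):
    """Observe: read the current state."""
    return {"brokers": len(cluster["pods"])}


def analyze(desired, observed):
    """Analyze: work out which actions close the gap."""
    diff = desired["brokers"] - observed["brokers"]
    if diff > 0:
        return [("scale_up", diff)]
    if diff < 0:
        return [("scale_down", -diff)]
    return []


def act(cluster, actions):
    """Act: apply the actions to the cluster."""
    for kind, count in actions:
        for _ in range(count):
            if kind == "scale_up":
                cluster["pods"].append(f"kafka-{len(cluster['pods'])}")
            else:
                cluster["pods"].pop()  # remove the highest ordinal first


def reconcile(desired, cluster):
    actions = analyze(desired, observe(cluster))
    act(cluster, actions)
    return actions


cluster = {"pods": ["kafka-0"]}
custom_resource = {"brokers": 3}

first_pass = reconcile(custom_resource, cluster)
second_pass = reconcile(custom_resource, cluster)  # nothing left to do
```

Running the loop repeatedly is safe because each pass only acts on the remaining difference, so a converged cluster produces no actions.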
