SlideShare a Scribd company logo
1 of 25
© Copyright 2019 Pivotal Software, Inc. All rights Reserved.
Getting Started w/ Kafka On K8s
Rohini Rajaram
Sr. Platform Architect, Pivotal
rrajaram@pivotal.io
July 2019
Cover w/ Image
Agenda
■ Kafka Fundamentals - Pub/Sub Done
Right
■ Kafka On K8s
■ Building Event Driven Systems
■ Demo
○ Provision a Kafka Cluster On PKS
Data Infrastructure - Point To Point
Data Infrastructure - Centralized Data Pipeline
Messaging
Systems
Why not traditional messaging
systems for the centralized
pipeline?
Transient Vs Durable Messages
Consumer Publish - Push vs Pull Based Mechanism
Offset Tracking - Replay Messages On Consumer
Failures
Horizontal Scalability
Distributed - Partitioning & Replication
Key Ideas
Key Idea 1: Data parallelism leads to scale out
Randomly distribute clients across
partitions
Key Idea 2: Disks are fast when used sequentially
Store messages as a write ahead log
Key Idea 3: Batching makes best use of the network
Batched transfer, compression, no JVM
caching (low memory footprint) & Zero Copy
Architecture Overview
Scale Out Architecture
Producer Producer Producer Producer
Consumer Consumer Consumer Consumer
Kafka Broker Cluster Topic Partitions
Producer Consumer
Broker 0 Broker 1 Broker 2 Broker 3 Broker 4 Broker n
Storage
Distributed Commit Log
Architecture Overview
Message
Offset Msg.
Length
CRC Magic Attr Timestamp Key Len Key Value
Len
Value
8 bytes 4 bytes 4 bytes 1 byte 1 byte 8 bytes 4 bytes Varying 4 bytes Varying
Bit 0-2
0 – No Compression
1 – gzip
2 – Snappy
Topics
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6 7
P0
P1
P2
Writes
Broker 1 Broker 2 Broker 3
Node 1
P0 P1 P2
Topic Logs
P0 P0P1
P2 P2 P1
Node 2 Node 3
Distributed Commit Log
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6 7
P0
P1
P2
Producer
writes
Consumer A Consumer B
readsreads
Partitions & Segments
Kafka
|
partition-0
|
segment0.log
segment0.index
segment5.log
segment5.index
Segment3065011416.log
Starting offset: 3065011416
offset: 3065011416 position: 0 isvalid: true payloadsize: 2020 magic: 1
compresscodec: NoCompressionCodec crc: 811055132 payload: {"name":
”Smith", msg: "Hello world"}
offset: 3065011417 position: 1779 isvalid: true payloadsize: 2244 magic: 1
compresscodec: NoCompressionCodec crc: 151590202 payload: {"name":
”James", msg: ”Hello to all of you!"}
Segment3065011416.index
Offset (rel. to Base). Position (on the
log)
0 0
1 1779
0 1 2 3 4 5 6 7 8 9
writes
Active Segment
Why File System
& Not Memory?
Lean differences with sequential access b/w file
system & memory speeds
Kafka runs on JVM
● Heavy object overheads for data stored in
memory
● Increased GC Time
Zero Copy
Page Cache
Socket Buffer NIC Buffer
Application Context
Kernel Context
User Space Buffer
OS send-file
Brokers
Cluster Aware
Receives messages from Producers,
Assigns Offset & Writes To Disk
Fetches Messages for consumers
reading partitions & responding
with committed messages.
One elected as Controller - Admin,
assigns partitions to brokers &
Monitoring
Topic Retention - Time or Size
Based
Topic A Partition 0 Topic A Partition 1
Topic A Partition 0 Topic A Partition 1
Broker 0 (Controller)
Broker 1
Leader
LeaderReplica
Replica
Kafka Cluster
Producer Consumer
Messages for A/0
Messages for A/1
Messages from A/0
Messages for A/1
Producers
Producers accept a ProducerRecord
ProducerRecord Key & Values are serialized
into byte array by Serializer
Partitioner - Chooses partition by key if not
specified & adds record to a specific batch for
the partition
Separate threads handles sending batches to
the brokers
Three Methods: 1. Fire & Forget 2.
Synchronous 3. Asynchronous
Consumers
Consumer Groups For Consumption Scaling
Topic Partitions distributed among consumers
in a group
Partitions are rebalanced on consumer
additions or crashes (consumer unavailability
& loss of consumer cache)
Replication
Broker 1 Broker 2 Broker 3
P1 P2 P3
P1 P1
P3 P2P1
P2
Producer
Topic A
P1 Leader P1 Followers
P1 P2 P3
P1 P1
P3 P2P1
P2
Topic B
Leader
Followers/ISR
Basics
Kubernetes for Data
StorageClass
● Dynamic provisioning persistent
volumes
● Allows admins to define different
class of storage to offer
○ aws-ebs
○ azure-disk
○ gce-pd
○ vsphere-volume
○ portworx-volume
StorageClassName (provisioner=...)
Pod
Persistent Volume Claim
Container
Pod-0
StatefulSet
...
StatefulSet
● Stable, unique network identifiers
● Stable, persistent storage
● Ordered, graceful deployment
and scaling
● Ordered, automated rolling
update
Pod-N
Custom Controller + Custom
Resource
Operator StatefulSet
Custom
Resource
... ...Deployment... ReplicaSet
StatefulSet
Controller
Custom
Controller
... ...
Deployment
Controller
...
ReplicaSet
Controller
Operator Pattern
● Kubernetes Native
○ Custom Resource + Custom
Controller
● Embedded with operational
knowledge of both data software
and Kubernetes
○ Backup/restore
○ Scale up/down
○ Rebalance data
Observe
Analyze
Act
Basics
Demo
Transforming How The World Builds Software
© Copyright 2019 Pivotal Software, Inc. All rights Reserved.

More Related Content

What's hot

Migrating from oracle soa suite to microservices on kubernetes
Migrating from oracle soa suite to microservices on kubernetesMigrating from oracle soa suite to microservices on kubernetes
Migrating from oracle soa suite to microservices on kubernetesKonveyor Community
 
Deploying Kong with Mesosphere DC/OS
Deploying Kong with Mesosphere DC/OSDeploying Kong with Mesosphere DC/OS
Deploying Kong with Mesosphere DC/OSMesosphere Inc.
 
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s worldDávid Kőszeghy
 
Introduction to container mangement
Introduction to container mangementIntroduction to container mangement
Introduction to container mangementMartin Marcher
 
Introduction To Flink
Introduction To FlinkIntroduction To Flink
Introduction To FlinkKnoldus Inc.
 
Building Developer Pipelines with PKS, Harbor, Clair, and Concourse
Building Developer Pipelines with PKS, Harbor, Clair, and ConcourseBuilding Developer Pipelines with PKS, Harbor, Clair, and Concourse
Building Developer Pipelines with PKS, Harbor, Clair, and ConcourseVMware Tanzu
 
Intro to GKE and app deployment with Kubernetes
Intro to GKE and app deployment with KubernetesIntro to GKE and app deployment with Kubernetes
Intro to GKE and app deployment with KubernetesGDG Cloud Bengaluru
 
GitOps is the best modern practice for CD with Kubernetes
GitOps is the best modern practice for CD with KubernetesGitOps is the best modern practice for CD with Kubernetes
GitOps is the best modern practice for CD with KubernetesVolodymyr Shynkar
 
Cloud Native Java Development Patterns
Cloud Native Java Development PatternsCloud Native Java Development Patterns
Cloud Native Java Development PatternsBilgin Ibryam
 
Deploying Anything as a Service (XaaS) Using Operators on Kubernetes
Deploying Anything as a Service (XaaS) Using Operators on KubernetesDeploying Anything as a Service (XaaS) Using Operators on Kubernetes
Deploying Anything as a Service (XaaS) Using Operators on KubernetesAll Things Open
 
Akri cncf-jobs-webinar-final
Akri cncf-jobs-webinar-finalAkri cncf-jobs-webinar-final
Akri cncf-jobs-webinar-finalLibbySchulze1
 
Mc git ops_incorpbackups_kanister
Mc git ops_incorpbackups_kanisterMc git ops_incorpbackups_kanister
Mc git ops_incorpbackups_kanisterLibbySchulze
 
Cloud-Native Modernization or Death? A false dichotomy. | DevNation Tech Talk
Cloud-Native Modernization or Death? A false dichotomy. | DevNation Tech TalkCloud-Native Modernization or Death? A false dichotomy. | DevNation Tech Talk
Cloud-Native Modernization or Death? A false dichotomy. | DevNation Tech TalkRed Hat Developers
 
Zero-downtime deployment of Micro-services with Kubernetes
Zero-downtime deployment of Micro-services with KubernetesZero-downtime deployment of Micro-services with Kubernetes
Zero-downtime deployment of Micro-services with KubernetesWojciech Barczyński
 
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...Josef Adersberger
 
Tectonic Summit 2016: Brandon Philips, CTO of CoreOS, Keynote
Tectonic Summit 2016: Brandon Philips, CTO of CoreOS, KeynoteTectonic Summit 2016: Brandon Philips, CTO of CoreOS, Keynote
Tectonic Summit 2016: Brandon Philips, CTO of CoreOS, KeynoteCoreOS
 

What's hot (20)

Migrating from oracle soa suite to microservices on kubernetes
Migrating from oracle soa suite to microservices on kubernetesMigrating from oracle soa suite to microservices on kubernetes
Migrating from oracle soa suite to microservices on kubernetes
 
Deploying Kong with Mesosphere DC/OS
Deploying Kong with Mesosphere DC/OSDeploying Kong with Mesosphere DC/OS
Deploying Kong with Mesosphere DC/OS
 
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world
 
Introduction to container mangement
Introduction to container mangementIntroduction to container mangement
Introduction to container mangement
 
Introduction to helm
Introduction to helmIntroduction to helm
Introduction to helm
 
Introduction To Flink
Introduction To FlinkIntroduction To Flink
Introduction To Flink
 
Building Developer Pipelines with PKS, Harbor, Clair, and Concourse
Building Developer Pipelines with PKS, Harbor, Clair, and ConcourseBuilding Developer Pipelines with PKS, Harbor, Clair, and Concourse
Building Developer Pipelines with PKS, Harbor, Clair, and Concourse
 
K8s vs Cloud Foundry
K8s vs Cloud FoundryK8s vs Cloud Foundry
K8s vs Cloud Foundry
 
Cloudify 4.5 Webinar
Cloudify 4.5 WebinarCloudify 4.5 Webinar
Cloudify 4.5 Webinar
 
Intro to GKE and app deployment with Kubernetes
Intro to GKE and app deployment with KubernetesIntro to GKE and app deployment with Kubernetes
Intro to GKE and app deployment with Kubernetes
 
GitOps is the best modern practice for CD with Kubernetes
GitOps is the best modern practice for CD with KubernetesGitOps is the best modern practice for CD with Kubernetes
GitOps is the best modern practice for CD with Kubernetes
 
Cloud Native Java Development Patterns
Cloud Native Java Development PatternsCloud Native Java Development Patterns
Cloud Native Java Development Patterns
 
Deploying Anything as a Service (XaaS) Using Operators on Kubernetes
Deploying Anything as a Service (XaaS) Using Operators on KubernetesDeploying Anything as a Service (XaaS) Using Operators on Kubernetes
Deploying Anything as a Service (XaaS) Using Operators on Kubernetes
 
Akri cncf-jobs-webinar-final
Akri cncf-jobs-webinar-finalAkri cncf-jobs-webinar-final
Akri cncf-jobs-webinar-final
 
Mc git ops_incorpbackups_kanister
Mc git ops_incorpbackups_kanisterMc git ops_incorpbackups_kanister
Mc git ops_incorpbackups_kanister
 
Cloud-Native Modernization or Death? A false dichotomy. | DevNation Tech Talk
Cloud-Native Modernization or Death? A false dichotomy. | DevNation Tech TalkCloud-Native Modernization or Death? A false dichotomy. | DevNation Tech Talk
Cloud-Native Modernization or Death? A false dichotomy. | DevNation Tech Talk
 
Zero-downtime deployment of Micro-services with Kubernetes
Zero-downtime deployment of Micro-services with KubernetesZero-downtime deployment of Micro-services with Kubernetes
Zero-downtime deployment of Micro-services with Kubernetes
 
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 
Deploy prometheus on kubernetes
Deploy prometheus on kubernetesDeploy prometheus on kubernetes
Deploy prometheus on kubernetes
 
Tectonic Summit 2016: Brandon Philips, CTO of CoreOS, Keynote
Tectonic Summit 2016: Brandon Philips, CTO of CoreOS, KeynoteTectonic Summit 2016: Brandon Philips, CTO of CoreOS, Keynote
Tectonic Summit 2016: Brandon Philips, CTO of CoreOS, Keynote
 

Similar to Getting Started with Kafka on k8s

Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedApache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedGuozhang Wang
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningGuido Schmutz
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaRicardo Bravo
 
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ... A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...HostedbyConfluent
 
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpJosé Román Martín Gil
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafkaSamuel Kerrien
 
How Confluent Completes the Event Streaming Platform (Addison Huddy & Dan Ros...
How Confluent Completes the Event Streaming Platform (Addison Huddy & Dan Ros...How Confluent Completes the Event Streaming Platform (Addison Huddy & Dan Ros...
How Confluent Completes the Event Streaming Platform (Addison Huddy & Dan Ros...HostedbyConfluent
 
Cloud Messaging Service: Technical Overview
Cloud Messaging Service: Technical OverviewCloud Messaging Service: Technical Overview
Cloud Messaging Service: Technical OverviewMessaging Meetup
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupSnehal Nagmote
 
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LMESet your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LMEconfluent
 
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...HostedbyConfluent
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...Athens Big Data
 
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...HostedbyConfluent
 
Concepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaConcepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaQAware GmbH
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLEdunomica
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processingconfluent
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaJoe Stein
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Monal Daxini
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to KafkaDucas Francis
 

Similar to Getting Started with Kafka on k8s (20)

Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedApache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ... A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
How Confluent Completes the Event Streaming Platform (Addison Huddy & Dan Ros...
How Confluent Completes the Event Streaming Platform (Addison Huddy & Dan Ros...How Confluent Completes the Event Streaming Platform (Addison Huddy & Dan Ros...
How Confluent Completes the Event Streaming Platform (Addison Huddy & Dan Ros...
 
Cloud Messaging Service: Technical Overview
Cloud Messaging Service: Technical OverviewCloud Messaging Service: Technical Overview
Cloud Messaging Service: Technical Overview
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
 
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LMESet your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
 
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
Zero Down Time Move From Apache Kafka to Confluent With Justin Dempsey | Curr...
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
 
Concepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with KafkaConcepts and Patterns for Streaming Services with Kafka
Concepts and Patterns for Streaming Services with Kafka
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for ML
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to Kafka
 

More from VMware Tanzu

What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItVMware Tanzu
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023VMware Tanzu
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleVMware Tanzu
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023VMware Tanzu
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductVMware Tanzu
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready AppsVMware Tanzu
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And BeyondVMware Tanzu
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfVMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023VMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023VMware Tanzu
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptxVMware Tanzu
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchVMware Tanzu
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishVMware Tanzu
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVMware Tanzu
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - FrenchVMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023VMware Tanzu
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootVMware Tanzu
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerVMware Tanzu
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeVMware Tanzu
 
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsSpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsVMware Tanzu
 

More from VMware Tanzu (20)

What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About It
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at Scale
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a Product
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready Apps
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And Beyond
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptx
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - French
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - English
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - English
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - French
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software Engineer
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs Practice
 
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsSpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
 

Recently uploaded

%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durbanmasabamasaba
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 

Recently uploaded (20)

%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 

Getting Started with Kafka on k8s

  • 1. © Copyright 2019 Pivotal Software, Inc. All rights Reserved. Getting Started w/ Kafka On K8s Rohini Rajaram Sr. Platform Architect, Pivotal rrajaram@pivotal.io July 2019
  • 2. Cover w/ Image Agenda ■ Kafka Fundamentals - Pub/Sub Done Right ■ Kafka On K8s ■ Building Event Driven Systems ■ Demo ○ Provision a Kafka Cluster On PKS
  • 3. Data Infrastructure - Point To Point
  • 4. Data Infrastructure - Centralized Data Pipeline
  • 5. Messaging Systems Why not traditional messaging systems for the centralized pipeline? Transient Vs Durable Messages Consumer Publish - Push vs Pull Based Mechanism Offset Tracking - Replay Messages On Consumer Failures Horizontal Scalability Distributed - Partitioning & Replication
  • 6. Key Ideas Key Idea 1: Data parallelism leads to scale out Randomly distribute clients across partitions Key Idea 2: Disks are fast when used sequentially Store messages as a write ahead log Key Idea 3: Batching makes best use of the network Batched transfer, compression, no JVM caching (low memory footprint) & Zero Copy
  • 7. Architecture Overview Scale Out Architecture Producer Producer Producer Producer Consumer Consumer Consumer Consumer Kafka Broker Cluster Topic Partitions
  • 8. Producer Consumer Broker 0 Broker 1 Broker 2 Broker 3 Broker 4 Broker n Storage Distributed Commit Log Architecture Overview
  • 9. Message Offset Msg. Length CRC Magic Attr Timestamp Key Len Key Value Len Value 8 bytes 4 bytes 4 bytes 1 byte 1 byte 8 bytes 4 bytes Varying 4 bytes Varying Bit 0-2 0 – No Compression 1 – gzip 2 – Snappy
  • 10. Topics 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 P0 P1 P2 Writes Broker 1 Broker 2 Broker 3 Node 1 P0 P1 P2 Topic Logs P0 P0P1 P2 P2 P1 Node 2 Node 3
  • 11. Distributed Commit Log 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 P0 P1 P2 Producer writes Consumer A Consumer B readsreads
  • 12. Partitions & Segments Kafka | partition-0 | segment0.log segment0.index segment5.log segment5.index Segment3065011416.log Starting offset: 3065011416 offset: 3065011416 position: 0 isvalid: true payloadsize: 2020 magic: 1 compresscodec: NoCompressionCodec crc: 811055132 payload: {"name": ”Smith", msg: "Hello world"} offset: 3065011417 position: 1779 isvalid: true payloadsize: 2244 magic: 1 compresscodec: NoCompressionCodec crc: 151590202 payload: {"name": ”James", msg: ”Hello to all of you!"} Segment3065011416.index Offset (rel. to Base). Position (on the log) 0 0 1 1779 0 1 2 3 4 5 6 7 8 9 writes Active Segment
  • 13. Why File System & Not Memory? Lean differences with sequential access b/w file system & memory speeds Kafka runs on JVM ● Heavy object overheads for data stored in memory ● Increased GC Time
  • 14. Zero Copy Page Cache Socket Buffer NIC Buffer Application Context Kernel Context User Space Buffer OS send-file
  • 15. Brokers Cluster Aware Receives messages from Producers, Assigns Offset & Writes To Disk Fetches Messages for consumers reading partitions & responding with committed messages. One elected as Controller - Admin, assigns partitions to brokers & Monitoring Topic Retention - Time or Size Based Topic A Partition 0 Topic A Partition 1 Topic A Partition 0 Topic A Partition 1 Broker 0 (Controller) Broker 1 Leader LeaderReplica Replica Kafka Cluster Producer Consumer Messages for A/0 Messages for A/1 Messages from A/0 Messages for A/1
  • 16. Producers Producers accept a ProducerRecord ProducerRecord Key & Values are serialized into byte array by Serializer Partitioner - Chooses partition by key if not specified & adds record to a specific batch for the partition Separate threads handles sending batches to the brokers Three Methods: 1. Fire & Forget 2. Synchronous 3. Asynchronous
  • 17. Consumers Consumer Groups For Consumption Scaling Topic Partitions distributed among consumers in a group Partitions are rebalanced on consumer additions or crashes (consumer unavailability & loss of consumer cache)
  • 18. Replication Broker 1 Broker 2 Broker 3 P1 P2 P3 P1 P1 P3 P2P1 P2 Producer Topic A P1 Leader P1 Followers P1 P2 P3 P1 P1 P3 P2P1 P2 Topic B Leader Followers/ISR
  • 20. StorageClass ● Dynamic provisioning persistent volumes ● Allows admins to define different class of storage to offer ○ aws-ebs ○ azure-disk ○ gce-pd ○ vsphere-volume ○ portworx-volume StorageClassName (provisioner=...) Pod Persistent Volume Claim Container
  • 21. Pod-0 StatefulSet ... StatefulSet ● Stable, unique network identifiers ● Stable, persistent storage ● Ordered, graceful deployment and scaling ● Ordered, automated rolling update Pod-N
  • 22. Custom Controller + Custom Resource Operator StatefulSet Custom Resource ... ...Deployment... ReplicaSet StatefulSet Controller Custom Controller ... ... Deployment Controller ... ReplicaSet Controller
  • 23. Operator Pattern ● Kubernetes Native ○ Custom Resource + Custom Controller ● Embedded with operational knowledge of both data software and Kubernetes ○ Backup/restore ○ Scale up/down ○ Rebalance data Observe Analyze Act
  • 25. Transforming How The World Builds Software © Copyright 2019 Pivotal Software, Inc. All rights Reserved.

Editor's Notes

  1. The story of Kafka began at LinkedIn, where their engineering team was challenged with the task of redefining their infrastructure. Breaking down monolithic applications into microservices allowed LinkedIn to scale their search, profile, communications, etc efficiently. However there was a need to share to data among these different services. Data sources: 1. User Activity - Page views, Ad Impressions, etc 2. Server Log Metrics & Monitoring Data 3. Computationally derived data from downstream systems. Data driven products: 1. Recommendation Engine - Connections, Endorsements 2. Profile Stats - How many searches did you appear in this week?, Who viewed your profile? 3. Visualizations - Graphs showing increase or dip in profile views 4. Infrastructure Monitoring - Restarts, Upgrades, Utilizations etc. Challenges: Varied Systems/Applications, Volume & Durability
  2. What did this unified log/data pipeline model bring? Decouple producers & consumers Simplified addition of new producers or consumers. Single source of truth for real time and batch processing applications. Distributed scaling for varying volume demands
  3. Traditional messaging systems follows a push based mechanism, pushing messages to consumers. If the speed of the push is faster than the consumer processing speed, then consumers may become overwhelmed (back pressure protocol). It does not offer a centralized data pipeline for addressing real time and batch processing consumers unanimously. RabbitMQ uses a push model and prevents overwhelming consumers via the consumer configured prefetch limit. This is great for low latency messaging and works well for RabbitMQ's queue based architecture. Kafka on the other hand uses a pull model where consumers request batches of messages from a given offset. To avoid tight loops when no messages exist beyond the current offset Kafka allows for long-polling. A pull model makes sense for Kafka due to its partitions. As Kafka guarantees message order in a partition with no competing consumers, we can leverage the batching of messages for a more efficient message delivery that gives us higher throughput. Messages are transient at an exchange/routing which assumes availability of consumers as well as removed post consumption by the consumer. Distributed: A distributed system in its most simplest definition is a group of computers working together as to appear as a single computer to the end-user.These machines have a shared state, operate concurrently and can fail independently without affecting the whole system’s uptime.
  4. To achieve high throughput, Kafka implements a three tiered architecture, and can scale out any of the tiers. In the middle, we have the Kafka cluster which runs one or more brokers, the producer tier that publishes data into the cluster and the consumer tier consumes data from the cluster. Topics in Kafka are logical category to which messages/records are published and consumed from. The little blue boxes in the Kafka brokers represent partitions of a topic. A topic can be partitioned into multiple topic partitions and a topic partition is the unit that is distributed to the brokers.
  5. A message is the unit of data within Kafka aka record. A message is simply an array of bytes as far as Kafka is concerned, so the data contained within it does not have a specific format or meaning to Kafka. A message can have an optional key which provides the option of controlled distribution to partitions. The hash of the key is computed and the hash modulo based on the number of partitions for the topic. Example: Assuming there are 5 partitions and the hash of keys compute as: 0%5=0 2%5=2 4%5=4 7%5=2 9%5=4 10%5=0
  6. A topic is a named stream of records/messages. Topics are stored in commit logs as partition segments. Topic partitions are units of parallelism. Record orders are guaranteed only within a partition. Records in a partition are appended in a sequential order and are assigned a sequential id called Offset. Offsets are ever growing numbers. The offset identifies each record location within a partition.
  7. A commit log is not a new concept. Its been used long enough in the database market for writing out information about the records they will be modifying, before applying the changes to all the various data structures it maintains - transaction logs. The log is the record of what happened, and each table or index is a projection of this history into some useful data structure or index. Since the log is immediately persisted it is used as the authoritative source in restoring all other persistent structures in the event of a crash. Eventually logs were used for replicating data between databases. Many databases allow transmitting portions of log to replica databases.
  8. To handle retention, Kafka often needs to find messages that need to be purged. With a single long partition, this is going to be slow. A partition is therefore segmented into multiple segments. On a disk a partition is directory and each segment is a commit log. When Kafka writes to a partition, it writes to a segment — the active segment. If the segment’s size limit is reached, a new segment is opened and that becomes the new active segment.Segments are named by their base offset. The base offset of a segment is an offset greater than offsets in previous segments and less than or equal to offsets in that segment. Each message is its value, offset, timestamp, key, message size, compression codec, checksum, and version of the message format. The data format on disk is exactly the same as what the broker receives from the producer over the network and sends to its consumers. This allows Kafka to efficiently transfer data with zero copy. Segments are two files: its log and index The segment index maps offsets to their message positions in the segment. The index file is memory mapped, and the offset look up uses binary search to find the nearest offset less than or equal to the target offset.The index file is made up of 8 byte entries, 4 bytes to store the offset relative to the base offset and 4 bytes to store the position. The offset is relative to the base offset so that only 4 bytes is needed to store the offset. For example: let’s say the base offset is 10000000000000000000, rather than having to store subsequent offsets 10000000000000000001 and 10000000000000000002 they are just 1 and 2.
  9. Kafka relies on the filesystem for the storage and caching. The problem is disks are slower than RAM. This is because the seek-time through a disk is large compared to the time required for actually reading the data.But if you can avoid seeking, then you can achieve latencies as low as RAM in some cases. This is done by Kafka through Sequential I/O.One advantage of Sequential I/O is you get a cache without writing any logic in your application for it. Modern operating systems allocate most of their free memory to disk-caching. So, if you are reading in an ordered fashion, the OS can always read-ahead and store data in a cache on each disk read. This is much better than maintaining a cache in a JVM application. This is because JVM objects are “heavy” and can lead to high garbage collection, which becomes worse as data size increases.
  10. One of the major inefficiencies of data processing system is the serialization and deserialization of data into formats suitable for storing & transmitting (JSON). How does Kafka avoid this? By using the standardized binary data format between producers, consumers and brokers. Zero Copy Keeping data in the same format as it would be sent over the network helps copying directly from page cache to socket buffer removing the application context
  11. The controller is responsible for administrative operations, including assigning partitions to brokers and monitoring for broker failures. A partition is owned by a single broker in the cluster, and that broker is called the leader of the partition. A partition may be assigned to multiple brokers, which will result in the partition being replicated. This provides redundancy of messages in the partition, such that another broker can take over leadership if there is a broker failure. However, all consumers and producers operating on that partition must connect to the leader. Kafka brokers are configured with a default retention setting for topics, either retaining messages for some period of time (e.g., 7 days) or until the topic reaches a certain size in bytes (e.g., 1 GB). Once these limits are reached, messages are expired and deleted so that the retention configuration is a minimum amount of data available at any time. Individual topics can also be configured with their own retention settings so that messages are stored for only as long as they are useful.
  12. Producers balance load as described above. Fire-and-forget We send a message to the server and don’t really care if it arrives succesfully or not. Most of the time, it will arrive successfully, since Kafka is highly available and the producer will retry sending messages automatically. However, some messages will get lost using this method. Synchronous send We send a message, the send() method returns a Future object, and we use get() to wait on the future and see if the send() was successful or not. Asynchronous send We call the send() method with a callback function, which gets triggered when it receives a response from the Kafka broker. A producer object can be used by multiple threads to send messages. Its typical to start with one producer and one thread. If better throughput is needed, more threads that use the same producer can be added. Once this ceases to increase throughput, more producers to the application to achieve even higher throughput can be added.
  13. If we add more consumers to a single group with a single topic than we have partitions, some of the consumers will be idle and get no messages at all. The main way we scale data consumption from a Kafka topic is by adding more consumers to a consumer group. It is common for Kafka consumers to do high-latency operations such as write to a database or a time-consuming computation on the data. In these cases, a single consumer can’t possibly keep up with the rate data flows into a topic, and adding more consumers that share the load by having each consumer own just a subset of the partitions and messages is our main method of scaling. This is a good reason to create topics with a large number of partitions—it allows adding more consumers when the load increases. In addition to having multiple consumers within a group, we may also have multiple consumer groups to the same topic. Kafka scales to large number of consumer groups and consumers without impacting performance.