SITE RELIABILITY ENGINEERING©2014 LinkedIn Corporation. All Rights Reserved.
Enterprise Kafka
Why Am I Here?
• You want to find out what this “Kafka” thing is
• You’re running Kafka, but you want to go big
• You’re looking for some neat whizbangs
Clark Haskins
Todd Palino
Who Are We?
• Kafka SRE at LinkedIn
• Site Reliability Engineering
– Administrators
– Architects
– Developers
• Keep the site running, always
Kafka Overview
What Is Kafka?
[Diagram: Producer, Broker (partitions A/P0, A/P1, A/P0), Consumer, Zookeeper]
Attributes of a Kafka Cluster
• Disk Based
• Durable
• Scalable
• Low Latency
• Finite Retention
• NOT Idempotent (yet)
Kafka At LinkedIn
• Multiple Datacenters, Multiple Clusters
• Mirroring between clusters
• Message Types
– Metrics
– Tracking
– Queuing
• Data transport from applications to Hadoop, and back
Kafka At LinkedIn
• 300+ Kafka brokers
• Over 18,000 topics
• 140,000+ Partitions
• 220 Billion messages per day
• 40 Terabytes In
• 160 Terabytes Out
• Peak Load
– 3.25 Million messages per second
– 5.5 Gigabits/sec Inbound
– 18 Gigabits/sec Outbound
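As a rough sanity check, the daily totals above can be converted into average rates and compared with the stated peaks. This is only back-of-the-envelope arithmetic: it assumes decimal units (1 TB = 10^12 bytes) and an even 86,400-second day, neither of which the slide states.

```python
# Back-of-the-envelope rates derived from the slide's daily totals.
# Assumptions (not from the slides): decimal units, 86,400 seconds per day.
SECONDS_PER_DAY = 86_400

messages_per_day = 220e9        # 220 billion messages per day
bytes_in_per_day = 40e12        # 40 terabytes in
bytes_out_per_day = 160e12      # 160 terabytes out
peak_messages_per_sec = 3.25e6  # stated peak load

avg_messages_per_sec = messages_per_day / SECONDS_PER_DAY
avg_inbound_gbps = bytes_in_per_day * 8 / SECONDS_PER_DAY / 1e9
fan_out = bytes_out_per_day / bytes_in_per_day
peak_to_average = peak_messages_per_sec / avg_messages_per_sec

print(f"average messages/sec:   {avg_messages_per_sec:,.0f}")
print(f"average inbound Gbit/s: {avg_inbound_gbps:.2f}")
print(f"outbound/inbound ratio: {fan_out:.1f}")
print(f"peak/average messages:  {peak_to_average:.2f}")
```

The outbound-to-inbound ratio is consistent with each byte being read several times on average, although the slide does not break down who the readers are.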
Challenges We Have Overcome
Solutions
• Kafka is young, so we influenced its development
• Operations wizardry…
Hyper Growth
• Need to expand clusters to keep up with site traffic, and then balance them.
Adding brokers
[Diagram: Producers, Brokers, and Consumers; the brokers hold partitions P0 to P7 of topics A and B, and partitions P0 to P3 of topic C]
Adding a broker (with broker leveling)
[Diagram: Producers, Brokers, and Consumers; the brokers hold partitions P0 to P7 of topics A and B, and partitions P0 to P3 of topic C]
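Broker leveling, as named in the slide title above, can be pictured as moving partitions from the fullest broker to the emptiest one until the counts are even. The sketch below is a toy model under loud assumptions: one replica per partition, partition count as the only measure of load, and a hypothetical `level_brokers` helper that is not part of Kafka. Every broker that appears in `assignment` must also be listed in `brokers`.

```python
def level_brokers(assignment, brokers):
    """Return a new assignment whose per-broker partition counts differ by at most one.

    assignment: dict mapping (topic, partition) -> broker id (single replica, for simplicity).
    brokers: list of all broker ids, including newly added, still empty ones.
    """
    result = dict(assignment)  # leave the caller's dict untouched
    by_broker = {b: [] for b in brokers}
    for topic_partition, broker in sorted(result.items()):
        by_broker[broker].append(topic_partition)

    while True:
        most = max(brokers, key=lambda b: len(by_broker[b]))
        least = min(brokers, key=lambda b: len(by_broker[b]))
        if len(by_broker[most]) - len(by_broker[least]) <= 1:
            return result
        # Move one partition from the fullest broker to the emptiest one.
        moved = by_broker[most].pop()
        by_broker[least].append(moved)
        result[moved] = least


# Topics A and B with 8 partitions each on brokers 1 and 2; broker 3 is new and empty.
before = {(topic, p): p % 2 + 1 for topic in ("A", "B") for p in range(8)}
after = level_brokers(before, brokers=[1, 2, 3])
counts = {b: sum(1 for owner in after.values() if owner == b) for b in (1, 2, 3)}
print(counts)
```

In a real cluster the resulting plan would still have to be applied with Kafka's partition reassignment tooling, and replica placement would need to be taken into account.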
Logs vs. Metrics
• Logging data killed the metrics cluster
Quality of Service with Kafka
[Diagram: Producers, Brokers, and Consumers; the brokers hold partitions P0 to P7 of topics A and B, and partitions P0 to P3 of topic C]
Deployment Nightmares
• Parallel deployment wasn’t possible so…
• Babysitting sequential deployments
Easy deployments
• Kafka 0.8.1 makes sure the cluster is in a good state before shutting down
– If any broker in the cluster has under-replicated partitions, Kafka will not shut down
– Kafka ensures that only one broker is in its shutdown sequence at a time
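The two rules above amount to a gate in front of each broker restart. The sketch below is not Kafka's implementation; it only illustrates the logic that previously had to be babysat by hand. `under_replicated_count` and `restart` are hypothetical hooks an operator would supply.

```python
def rolling_restart(brokers, under_replicated_count, restart):
    """Restart brokers strictly one at a time, stopping as soon as the cluster is unhealthy.

    under_replicated_count: callable returning the cluster-wide number of
        under-replicated partitions (hypothetical hook).
    restart: callable that restarts one broker and returns when it is back (hypothetical hook).
    Returns the list of brokers that were actually restarted.
    """
    restarted = []
    for broker in brokers:
        # Rule 1: never take a broker down while any partition is under-replicated.
        if under_replicated_count() > 0:
            break
        # Rule 2: the loop is sequential, so only one broker is ever in shutdown.
        restart(broker)
        restarted.append(broker)
    return restarted
```

A caller that gets back fewer brokers than it passed in knows the deployment stopped early and can retry once replication has caught up.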
Killing Zookeeper
• Consumer offset management done within Zookeeper
• Every consumer committing offsets every minute for every partition makes ZK very unhappy.
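To see why this load adds up, the commit pattern described above can be turned into a write rate. The consumer and partition counts in the example are hypothetical, chosen only to illustrate the multiplication; they are not LinkedIn's numbers.

```python
def offset_commits_per_second(consumers, partitions_per_consumer, commit_interval_s=60):
    """Average Zookeeper writes per second when every consumer commits one
    offset per partition once per commit interval."""
    return consumers * partitions_per_consumer / commit_interval_s


# Hypothetical fleet: 1,000 consumers, each reading 500 partitions, committing every minute.
rate = offset_commits_per_second(consumers=1_000, partitions_per_consumer=500)
print(f"{rate:,.0f} offset writes per second")
```

Each of those commits is a Zookeeper write, which has to be agreed on by the ensemble and persisted to its transaction log, so the rate lands directly on Zookeeper's disks.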
Zookeeper on SSD
Monitoring
Kafka Is Broken!
• Everything is Kafka’s fault first
• What is lag?
• Consumer Problems
– Application problems
– Kafka client problems
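For the "What is lag?" question above, a common working definition is the distance between the newest offset in a partition and the offset the consumer has committed. The sketch below computes that per partition; the function name and the choice to treat a missing commit as offset 0 are assumptions made for illustration.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus the consumer's committed offset.

    Both arguments map partition id -> offset.
    Simplification (an assumption): a partition with no committed offset is
    treated as committed at 0, which ignores retention on the broker.
    """
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end_offset - committed, 0)
    return lag


lag = consumer_lag({0: 100, 1: 50, 2: 7}, {0: 90, 1: 50})
print(lag, "total:", sum(lag.values()))
```

Lag that keeps growing on a healthy cluster points at the consumer side, which matches the two consumer problem categories listed above.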
How Do We Sleep At Night?
• Educating Users
– Why lag is their fault
• Monitoring the Ecosystem
– Kafka Brokers
– Zookeeper
– Mirror Makers
– Audit
– REST Interfaces
• Week Over Week
Cluster Health and Utilization
• Under-replicated partitions
• Offline partitions
• Broker partition count
• Data size on disk
• Leader partition count
• Network utilization
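A few of the signals listed above lend themselves to simple rules. The sketch below is one possible shape for such checks, not the monitoring described in the talk: the dictionary keys, the `cluster_alerts` name, and the 20% leader-skew threshold are all hypothetical.

```python
def cluster_alerts(brokers, max_leader_skew=0.2):
    """Return human-readable alerts for a snapshot of per-broker metrics.

    brokers: dict mapping broker id -> dict with the (hypothetical) keys
        'under_replicated', 'offline', and 'leaders'.
    max_leader_skew: allowed relative deviation of a broker's leader count
        from the cluster mean (illustrative threshold).
    """
    if not brokers:
        return []
    alerts = []
    for broker_id, metrics in sorted(brokers.items()):
        if metrics["offline"] > 0:
            alerts.append(f"broker {broker_id}: {metrics['offline']} offline partitions")
        if metrics["under_replicated"] > 0:
            alerts.append(f"broker {broker_id}: {metrics['under_replicated']} under-replicated partitions")

    mean_leaders = sum(m["leaders"] for m in brokers.values()) / len(brokers)
    for broker_id, metrics in sorted(brokers.items()):
        if mean_leaders and abs(metrics["leaders"] - mean_leaders) / mean_leaders > max_leader_skew:
            alerts.append(f"broker {broker_id}: leader count {metrics['leaders']} far from mean {mean_leaders:.1f}")
    return alerts
```

Offline and under-replicated partitions are treated as alerts at any non-zero value, while leader count is only compared against the cluster average, since an absolute threshold would depend on cluster size.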
Zookeeper
• Ensemble availability
• Latency
• Outstanding requests
Mirror Maker and Audit
• Mirror Maker
– Lag
– Dropped Messages
• Audit Consumer
– Lag
– Completeness check
• Audit UI
[Diagram: audit pipeline with Producer, Cluster, Mirror Maker (MM), Cluster, Audit Consumers, Audit State, and Audit UI; flow labels: Messages, Message Counts, All Messages]
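The slides do not spell out how the completeness check works. One plausible reading of the "Message Counts" and "All Messages" labels is that producers report how many messages they sent per topic and time bucket, and an audit consumer counts what actually arrived. The sketch below follows that assumption; the function names, the bucket keys, and the 99.9% threshold are hypothetical.

```python
def completeness(produced_counts, consumed_counts):
    """Fraction of produced messages that were seen downstream.

    Both arguments map (topic, time_bucket) -> message count.
    A bucket with nothing produced is reported as fully complete.
    """
    report = {}
    for key, produced in produced_counts.items():
        consumed = consumed_counts.get(key, 0)
        report[key] = consumed / produced if produced else 1.0
    return report


def incomplete_buckets(report, threshold=0.999):
    """Keys whose completeness falls below the (illustrative) threshold."""
    return sorted(key for key, fraction in report.items() if fraction < threshold)


produced = {("A", 0): 100, ("A", 1): 200}
consumed = {("A", 0): 100, ("A", 1): 150}
report = completeness(produced, consumed)
print(incomplete_buckets(report))
```

Running the same comparison against each cluster in the mirroring chain would show at which tier messages went missing.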
Audit UI
Tuning
Hardware and OS
• Kernel Tuning
– Swapping is Death
– Allow more dirty pages
– Allow less dirty cache
• Disk throughput
– More spindles
– Longer commit interval
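The slide names no specific kernel parameters. One plausible mapping of the three kernel bullets above onto Linux sysctls is a low `vm.swappiness`, a higher `vm.dirty_ratio`, and a lower `vm.dirty_background_ratio`. Both the mapping and the values below are assumptions for illustration, not the speakers' settings; the snippet only renders them as a sysctl fragment.

```python
# Illustrative values only (assumptions, not taken from the slides).
KERNEL_SETTINGS = {
    "vm.swappiness": 1,              # "Swapping is Death": strongly discourage swapping
    "vm.dirty_ratio": 60,            # "Allow more dirty pages" before writers are blocked
    "vm.dirty_background_ratio": 5,  # "Allow less dirty cache" before background flushing starts
}


def render_sysctl(settings):
    """Render settings as lines in sysctl.conf style, sorted by key."""
    return "\n".join(f"{key} = {value}" for key, value in sorted(settings.items()))


print(render_sysctl(KERNEL_SETTINGS))
```

Any real values would need to be tested against the broker's memory size and disk throughput before use.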
Java Virtual Machine
Garbage Collection
• Java 7, update 51
• Garbage First (G1) Collector
– Set the heap size
– Specify a target GC pause time
– Don’t set the New size
• GC Times
– Less than 15ms per second in GC
– Steady 20-22ms GC intervals
– Almost no full GC cycles (and only 200-400ms when they do occur)
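The three G1 recommendations above correspond to a small set of standard HotSpot options. The sketch below assembles them; the heap size and pause target passed in the example are hypothetical, since the slide gives no values, and `g1_flags` is an illustrative helper rather than anything shipped with Kafka.

```python
def g1_flags(heap_gb, pause_target_ms):
    """JVM options for a fixed-size heap with the G1 collector and a pause-time goal."""
    return [
        f"-Xms{heap_gb}g",  # set the heap size: initial ...
        f"-Xmx{heap_gb}g",  # ... and maximum, kept equal
        "-XX:+UseG1GC",     # Garbage First collector
        f"-XX:MaxGCPauseMillis={pause_target_ms}",  # target GC pause time
        # Deliberately no -Xmn or -XX:NewSize: G1 sizes the young generation
        # itself in order to meet the pause-time goal.
    ]


def gc_overhead(gc_ms_per_second):
    """Fraction of wall-clock time spent in GC."""
    return gc_ms_per_second / 1000.0


# Hypothetical example values: a 4 GB heap and a 20 ms pause target.
print(" ".join(g1_flags(heap_gb=4, pause_target_ms=20)))
print(f"GC overhead at 15 ms/s: {gc_overhead(15):.1%}")
```

Read this way, the "less than 15ms per second" figure from the slide works out to under 1.5% of wall-clock time spent in garbage collection.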
Closing
What’s Coming in 0.8.2
• Consumer offsets in the broker
• Delete topic
• Further down the road
– New producer
– Improved producer API
Upcoming Operational Work
• Learning to share
• Shrinking a cluster
• Cluster comparison
• Advanced monitoring
How Can You Get Involved?
• http://kafka.apache.org
• Join the mailing lists
– users@kafka.apache.org
• irc.freenode.net - #apache-kafka
• Contribute tools
Talk To Us
• Kafka SREs at LinkedIn
– Clark Haskins
◦ https://www.linkedin.com/in/clarkhaskins
◦ chaskins@linkedin.com
– Todd Palino
◦ https://www.linkedin.com/in/toddpalino
◦ tpalino@linkedin.com
Questions
Enterprise Kafka: Kafka as a Service

Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 

Recently uploaded (20)

IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 

Enterprise Kafka: Kafka as a Service

  • 1. Enterprise Kafka. SITE RELIABILITY ENGINEERING ©2014 LinkedIn Corporation. All Rights Reserved.
  • 2. Why Am I Here?
      - You want to find out what this “Kafka” thing is
      - You’re running Kafka, but you want to go big
      - You’re looking for some neat whizbangs
  • 3. Clark Haskins, Todd Palino
  • 4. Who Are We?
      - Kafka SRE at LinkedIn
      - Site Reliability Engineering
          - Administrators
          - Architects
          - Developers
      - Keep the site running, always
  • 5. Kafka Overview
  • 6. What Is Kafka?
  • 7. What Is Kafka? [Diagram: Producer, Broker (boxes labeled A P0, A P1, A P0), Consumer, Zookeeper]
  • 8. Attributes of a Kafka Cluster
      - Disk Based
      - Durable
      - Scalable
      - Low Latency
      - Finite Retention
      - NOT Idempotent (yet)
  • 9. Kafka At LinkedIn
      - Multiple Datacenters, Multiple Clusters
      - Mirroring between clusters
      - Message Types
          - Metrics
          - Tracking
          - Queuing
      - Data transport from applications to Hadoop, and back
  • 10. Kafka At LinkedIn
  • 11. Kafka At LinkedIn
      - 300+ Kafka brokers
      - Over 18,000 topics
      - 140,000+ Partitions
      - 220 Billion messages per day
      - 40 Terabytes In
      - 160 Terabytes Out
      - Peak Load
          - 3.25 Million messages per second
          - 5.5 Gigabits/sec Inbound
          - 18 Gigabits/sec Outbound
  • 12. Challenges We Have Overcome
  • 13. Solutions
      - Kafka is young… we influenced development
      - Operations wizardry…
  • 14. Hyper Growth
      - Need to expand clusters to keep up with site traffic, and then balance them.
  • 15. Adding brokers [Diagram: Producers, Brokers, Consumers; boxes labeled A P0-P7, B P0-P7, and C P0-P3]
  • 16. Adding a broker (with broker leveling) [Diagram: Producers, Brokers, Consumers; boxes labeled A P0-P7, B P0-P7, and C P0-P3]
  • 17. Logs vs. Metrics
      - Logging data killed the metrics cluster
  • 18. Quality of Service with Kafka [Diagram: Producers, Brokers, Consumers; boxes labeled A P0-P7, B P0-P7, and C P0-P3]
  • 19. Deployment Nightmares
      - Parallel deployment wasn’t possible so…
      - Babysitting sequential deployments
  • 20. Easy deployments
      - Kafka 0.8.1 makes sure the cluster is in a good state before shutting down
          - If any brokers in the cluster have under-replicated partitions, Kafka will not shut down
          - Kafka ensures that only one broker is in its shutdown sequence at a time.
  • 21. Killing Zookeeper
      - Consumer offset management is done within Zookeeper
      - Every consumer committing offsets every minute for every partition makes ZK very unhappy.
  • 22. Zookeeper on SSD
  • 23. Monitoring
  • 24. Kafka Is Broken!
  • 25. Kafka Is Broken!
      - Everything is Kafka’s fault first
      - What is lag?
      - Consumer Problems
          - Application problems
          - Kafka client problems
  • 26. How Do We Sleep At Night?
      - Educating Users
          - Why lag is their fault
      - Monitoring the Ecosystem
          - Kafka Brokers
          - Zookeeper
          - Mirror Makers
          - Audit
          - REST Interfaces
      - Week Over Week
  • 27. Cluster Health and Utilization
      - Under replicated partitions
      - Offline partitions
      - Broker partition count
      - Data size on disk
      - Leader partition count
      - Network utilization
  • 28. Zookeeper
      - Ensemble availability
      - Latency
      - Outstanding requests
  • 29. Mirror Maker and Audit
      - Mirror Maker
          - Lag
          - Dropped Messages
      - Audit Consumer
          - Lag
          - Completeness check
      - Audit UI
      [Diagram: Producer, Cluster, MM, Cluster; Messages; Message Counts; All Messages; Audit Consumer; Audit State; Audit UI]
  • 30. Audit UI
  • 31. Audit UI
  • 32. Tuning
  • 33. Hardware and OS
      - Kernel Tuning
          - Swapping is Death
          - Allow more dirty pages
          - Allow less dirty cache
      - Disk throughput
          - More spindles
          - Longer commit interval
  • 34. Java Virtual Machine
  • 35. Garbage Collection
  • 36. Garbage Collection
      - Java 7, update 51
      - Garbage First (G1) Collector
          - Set the heap size
          - Specify a target GC pause time
          - Don’t set the New size
      - GC Times
          - Less than 15ms per second in GC
          - Steady 20-22ms GC intervals
          - Almost no full GC cycles (and only 200-400ms when they do occur)
  • 37. Closing
  • 38. What’s Coming in 0.8.2
      - Consumer offsets in the broker
      - Delete topic
      - Further down the road
          - New producer
          - Improved producer API
  • 39. Upcoming Operational Work
      - Learning to share
      - Shrinking a cluster
      - Cluster comparison
      - Advanced monitoring
  • 40. How Can You Get Involved?
      - http://kafka.apache.org
      - Join the mailing lists
          - users@kafka.apache.org
      - irc.freenode.net - #apache-kafka
      - Contribute tools
  • 41. Talk To Us
      - Kafka SREs at LinkedIn
          - Clark Haskins
              - https://www.linkedin.com/in/clarkhaskins
              - chaskins@linkedin.com
          - Todd Palino
              - https://www.linkedin.com/in/toddpalino
              - tpalino@linkedin.com
  • 42. Questions
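
Slide 20 states two conditions that gate a broker shutdown: no broker in the cluster may have under-replicated partitions, and only one broker may be in its shutdown sequence at a time. The Python sketch below models just that rule as a pure function. It is not Kafka's implementation; `ClusterState`, `may_shut_down`, and the broker ids are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ClusterState:
    # Hypothetical model: broker id -> count of under-replicated partitions it reports.
    under_replicated: Dict[int, int]
    # Id of the broker currently in its shutdown sequence, if any.
    shutting_down: Optional[int] = None


def may_shut_down(state: ClusterState, broker_id: int) -> bool:
    """Apply the two conditions from slide 20 to a shutdown request."""
    # Condition 1: at most one broker may be shutting down at a time.
    if state.shutting_down is not None and state.shutting_down != broker_id:
        return False
    # Condition 2: no broker may report under-replicated partitions.
    return all(count == 0 for count in state.under_replicated.values())


healthy = ClusterState(under_replicated={1: 0, 2: 0, 3: 0})
degraded = ClusterState(under_replicated={1: 0, 2: 4, 3: 0})
busy = ClusterState(under_replicated={1: 0, 2: 0, 3: 0}, shutting_down=2)
```

With these sample states, a request from broker 1 is allowed only against `healthy`; it is refused against `degraded` (broker 2 is still catching up) and against `busy` (broker 2 is already shutting down).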
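
Slide 25 asks "What is lag?" but the transcript does not record the answer. A common working definition, assumed here rather than taken from the slides, is that a consumer's lag on a partition is the broker's log end offset minus the consumer's last committed offset. The sketch below computes that from two plain dictionaries; the topic name and offsets are made up.

```python
from typing import Dict, Tuple

# (topic, partition) -> offset. All names and numbers here are illustrative.
TopicPartition = Tuple[str, int]


def partition_lag(log_end: Dict[TopicPartition, int],
                  committed: Dict[TopicPartition, int]) -> Dict[TopicPartition, int]:
    """Return lag per partition: log end offset minus committed offset.

    Assumption: a partition with no committed offset counts as lagging
    from offset 0. Negative values are clamped to 0.
    """
    return {
        tp: max(end - committed.get(tp, 0), 0)
        for tp, end in log_end.items()
    }


def total_lag(lag: Dict[TopicPartition, int]) -> int:
    """Sum lag across all partitions of interest."""
    return sum(lag.values())


log_end = {("tracking", 0): 1500, ("tracking", 1): 900}
committed = {("tracking", 0): 1450}
lag = partition_lag(log_end, committed)
```

Tracking this number over time matters more than any single reading: lag that keeps growing points at a consumer that cannot keep up, which is the "consumer problems" case on the same slide.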
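
Slide 29 lists a "completeness check" for the audit consumer, and its diagram shows message counts from the producer alongside the messages read from the clusters. The slides do not describe the mechanism, so the sketch below is one assumed reading: count messages per topic and time window on both sides and report the ratio. The window length, function names, and sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Bucket = Tuple[str, int]  # (topic, window start in epoch seconds)
WINDOW_SECONDS = 600      # hypothetical 10-minute audit window


def bucket_for(topic: str, timestamp: int) -> Bucket:
    """Map a message to the audit window containing its timestamp."""
    return (topic, timestamp - timestamp % WINDOW_SECONDS)


def count_messages(messages: Iterable[Tuple[str, int]]) -> Counter:
    """Count (topic, timestamp) messages per audit window."""
    counts: Counter = Counter()
    for topic, timestamp in messages:
        counts[bucket_for(topic, timestamp)] += 1
    return counts


def completeness(produced: Counter, consumed: Counter) -> Dict[Bucket, float]:
    """Fraction of produced messages seen downstream, per window."""
    return {
        bucket: consumed.get(bucket, 0) / count
        for bucket, count in produced.items()
        if count > 0
    }


produced = count_messages([("metrics", 0), ("metrics", 30),
                           ("metrics", 610), ("metrics", 620)])
consumed = count_messages([("metrics", 0), ("metrics", 30), ("metrics", 610)])
report = completeness(produced, consumed)
```

In this sample the first window is complete (ratio 1.0) and the second is missing one of two messages (ratio 0.5), which is the kind of gap a check like this is meant to surface.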