SlideShare a Scribd company logo
1 of 29
Best Practices for
Running Kafka on
Docker Containers
Nanda Vijaydev, BlueData
Kafka Summit San Francisco
August 28, 2017
Agenda
• What is Docker?
• Deploying services on Docker
• Messaging systems (Kafka) on Docker: Challenges
• How We Did it: Lessons Learned
• Key Takeaways for Running Kafka on Docker
• Q & A
What is a Docker Container?
• Lightweight, stand-alone, executable
package of software to run specific
services
• Runs on all major Linux distributions
• On any infrastructure including VMs,
bare-metal, and in the cloud
• Package includes code, runtime,
system libraries, configurations, etc.
• Run as an isolated process in user
space
Docker Containers vs. Virtual Machines
• Unlike VMs, containers virtualize OS
and not hardware
• More portable and efficient
• Abstraction at the app layer that
packages app and dependencies
• Multiple containers share the base
kernel
• Take up less space and start almost
immediately
Kafka, Producers, and Consumers
• Independent services that
send/receive messages over
Kafka
• Can be written in many
languages
• Purpose-built for specific actions
• Mostly operate on high
frequency events and data
• Availability and scalability are
important
Considerations for Kafka Deployment
• Multiple services; each with its
own requirements
• Single QOS for related containers
and services (CPU & Memory)
• Storage – Local persistence &
External Volumes
• Service monitoring and
dependency management
How We Did It: Design Decisions I
• Run Kafka (e.g. Confluent distribution) and related
services and tools / applications unmodified
– Deploy all services that run on a single bare-metal
host in a single container
• Multi-tenancy support is key
– Network and storage security
• Clusters of containers span physical hosts
How We Did It: Sample Dockerfile
# Confluent Kafka 3.2.1 docker image
FROM bluedata/centos7:latest
#Install java 1.8
RUN yum -y install java-1.8.0-openjdk-devel
#Download and extract Kafka installation tar file
RUN mkdir /usr/lib/kafka;curl -s http://packages.confluent.io/archive/3.2/confluent-3.2.1-2.11.tar.gz | tar xz -C
/usr/lib/kafka/
##Create necessary directories for Kafka and Zookeeper to run
RUN mkdir /var/lib/zookeeper
…....
How We Did It: Design Decisions II
• Images built to “auto-configure” themselves at time
of instantiation
– Not all instances of a single image run the same set
of services when instantiated
• Zookeeper vs. Broker cluster nodes
– Ability to scale on demand
How We Did It: Deployment Configuration
#!/usr/bin/env bdwb
##############################################################
#
# Sample workbench instructions for building a BlueData catalog entry.
#
##############################################################
#
# YOUR_ORGANIZATION_NAME must be replaced with a valid organization name. Please
# refer to 'help builder organization' for details.
# builder organization --name YOUR_ORGANIZATION_NAME
builder organization --name BlueData
## Begin a new catalog entry
catalog new --distroid confluent-kafka --name "Confluent Kafka 3.2.1" 
--desc "The free, open-source streaming platform (Enterprise edition) based on Apache 
Kafka. Confluent Platform is the best way to get started 
with real-time data streams." 
--categories Kafka --version 4.0
## Define all node roles for the virtual
cluster.
role add broker 1+
role add zookeeper 1+
role add schemareg 1+
role add gateway 0+
## Define all services that are available in the virtual cluster.
service add --srvcid kafka-broker --name "Kafka Broker service" --port 9092
service add --srvcid zookeeper --name "Zookeeper service" --port 2181
service add --srvcid schema-registry --name "Schema-registry service" --port 8081
service add --srvcid control-center --name "Control center service" --port 9021
## Dev Configuration. Multiple services are placed on same container
clusterconfig new --configid default
clusterconfig assign --configid default –role gateway –srvcids gateway control-center
clusterconfig assign --configid default --role broker –srvcids kafka-broker schema-registry
clusterconfig assign --configid default --role zookeeper --srvcids kafka-broker zookeeper
## Prod Configuration. Services run on dedicated nodes with special
attributes
clusterconfig new --configid production
clusterconfig assign --configid production --role broker --srvcids kafka-broker
clusterconfig assign --configid production --role zookeeper --srvcids zookeeper
clusterconfig assign --configid production --role schemareg --srvcids schemareg
clusterconfig assign --configid production --role gateway --srvcids control-center
How We Did It: Deployment Configuration
#Configure your docker nodes with appropriate run time values
appconfig autogen --replace /tmp/zookeeper/myid –pattern @@ID@@ --macro UNIQUE_SELF_NODE_INT
appconfig autogen --replace /usr/lib/kafka/etc/kafka/server.properties –pattern @@HOST@@ --macro GET_NODE_FQDN
appconfig autogen --replace /usr/lib/kafka/etc/kafka/server.properties –pattern @@zookeeper.connet@@ --macro
ZOOKEEPER_SERVICE_STRING
#Start services in the order specified
REGISTER_START_SERVICE_SYSV zookeeper
REGISTER_START_SERVICE_SYSV kafka-broker –wait zookeeper
REGISTER_START_SERVICE_SYSV schema-registry –wait zookeeper
How We Did It: Resource Allocation
• Users to choose “flavors”
while launching containers
• Storage heavy containers
can have more disk space
• vCPUs * n = cpu-shares
• No over-provisioning of
memory
① Get started with Kafka (e.g.
Confluent community edition)
② Evaluate features/configurations
simultaneously on smaller
hardware footprint
③ Prototype multiple data pipelines
quickly with dockerized producers
and consumers
① Spin up dev/test clusters with
replica image of production
② QA/UAT using production
configuration without re-
inventing the wheel
③ Offload specific users and
workloads from production
① LOB multi-tenancy with strict
resource allocations
② Bare-metal performance for
business critical workloads
③ Share data hub / data lake with
strict access controls
Kafka On Docker Use Cases
Prototyping Departmental Enterprise
Exploring
the Value of Kafka
Initial Departmental
Deployments
Enterprise-Wide,
Mission-Critical Deployments
Multi-Tenant Deployment
5.10 3.3
2.1
ComputeIsolation
ComputeIsolation
Team 1 Team 2 Team 3
Build Components End to End Testing Prod Environment
Team 1 Team 2 Team3
Multiple teams or business groups
Evaluate different Kafka use cases
(e.g. producers, consumers,
pipelines)
Use different services & tools
(e.g. Broker, Zookeeper, Schema
Registry, API Gateway)
Use different distributions of
standalone Kafka and/or Hadoop
BlueData EPIC software platform
Shared server infrastructure with
node labels
Shared data sets for HDFS access
Multiple distributions, services, tools on shared, cost-effective infrastructure
Shared Servers
Dev/QA Hardware
Shared, Centrally Managed Server Infrastructure
Confluent 3.2
Prod Hardware
Apache Kafka 0.9Apache Kafka 0.8
Multi-Host Kafka Deployment
4 containers
On 3 different hosts
using 1 VLAN and 4 persistent IPs
How We Did It: Security Considerations
• Security is essential since containers and host share one kernel
– Non-privileged containers
• Achieved through layered set of capabilities
• Different capabilities provide different levels of isolation and protection
• Add “capabilities” to a container based on what operations are permitted
How We Did It: Network Architecture
• Connect containers across
hosts
• Persistence of IP address
across container restart
• DHCP/DNS service required
for IP allocation and hostname
resolution
• Deploy VLANs and VxLAN
tunnels for tenant-level traffic
isolation
Storage – Internal To Host File System
Data Volume
• A directory on host FS
• Data not deleted when container is deleted
Device Mapper Storage Driver
• Default – OverlayFS
• We use direct-lvm thinpool with devicemapper
• Data is deleted with container
Storage - External Volumes
• Storage is external to host FS, accessed over the
network
• Separates container from storage
• Cloud providers have storage services such as S3,
EBS
• You can also connect to HDFS, NFS, Gluster
• Services such as REX-Ray provide external volume
support
App Store for Kafka, Spark, & More
Pre-built
images, or
author your
own Docker
app images
with our App
Workbench
Docker image + app config scripts +
metadata (e.g. name, logo)
BlueData Application Image (.bin file)
Application bin file
Docker
image
CentOS
Dockerfile
RHEL
Dockerfile
appconfig
conf Init.d startscript
<app>
logo file
<app>.wb
bdwb
command
clusterconfig, image, role,
appconfig, catalog, service, ..
Sources
Docker file,
logo .PNG,
Init.d
RuntimeSoftware Bits
OR
Development
(e.g. extract .bin and modify to
create new bin)
Different Services in Each Container
Broker + Zookeeper + Schema Registry
Broker + Zookeeper
Broker
Container Storage On Host
Container Storage
Host Storage
Container Hosts
Kafka Cluster Containers
Multi-Tenant Resource Quotas
Aggregate Docker
container storage,
memory and cores
(CPU shares) for all
containers in
tenant “Team 1”
Aggregate compute, memory, & storage quotas for Docker containers
Monitoring Containers
Resource monitoring
Several open source and
commercial monitoring
options available
We use Elasticsearch with
Metricbeat plugin
Containers = the Future of Apps
Infrastructure
• Agility and elasticity
• Standardized environments
(dev, test, prod)
• Portability
(on-premises and cloud)
• Higher resource utilization
Applications
• Fool-proof packaging
(configs, libraries, driver
versions, etc.)
• Repeatable builds and
orchestration
• Faster app dev cycles
Kafka on Docker: Key Takeaways
• Enterprise deployment requirements:
– Docker base image includes all needed services (Kafka,
Zookeeper, Schema registry, etc.), libraries, jar files
– Container orchestration, including networking and storage,
depends on standards enforced by enterprises
– Resource-aware runtime configuration, including CPU and
RAM
– Sequence-aware app deployment needs more thought
Kafka on Docker: Key Takeaways
• Enterprise deployment challenges:
– Access to container secured with ssh keypair or PAM
module (LDAP/AD)
– Access to Kafka from Data Science applications
– Management agents in Docker images
– Runtime injection of resource and configuration
information
• Consider a turnkey software solution (e.g. BlueData)
to accelerate time to value and avoid DIY pitfalls
Nanda Vijaydev
@NandaVijaydev
www.bluedata.com

More Related Content

What's hot

Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...StreamNative
 
Docker at Shopify: From This-Looks-Fun to Production by Simon Eskildsen (Shop...
Docker at Shopify: From This-Looks-Fun to Production by Simon Eskildsen (Shop...Docker at Shopify: From This-Looks-Fun to Production by Simon Eskildsen (Shop...
Docker at Shopify: From This-Looks-Fun to Production by Simon Eskildsen (Shop...Docker, Inc.
 
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...Lucas Jellema
 
How you can contribute to Apache Cassandra
How you can contribute to Apache CassandraHow you can contribute to Apache Cassandra
How you can contribute to Apache CassandraYuki Morishita
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache KafkaJoe Stein
 
Clocker - The Docker Cloud Maker
Clocker - The Docker Cloud MakerClocker - The Docker Cloud Maker
Clocker - The Docker Cloud MakerAndrew Kennedy
 
A Journey through the JDKs (Java 9 to Java 11)
A Journey through the JDKs (Java 9 to Java 11)A Journey through the JDKs (Java 9 to Java 11)
A Journey through the JDKs (Java 9 to Java 11)Markus Günther
 
Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...
Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...
Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...DataStax
 
Play Support in Cloud Foundry
Play Support in Cloud FoundryPlay Support in Cloud Foundry
Play Support in Cloud Foundryrajdeep
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Christopher Curtin
 
Lessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On DockerLessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On DockerSpark Summit
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Erik Onnen
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Amazon Web Services
 
AWS re:Invent 2016: Amazon ECR Deep Dive on Image Optimization (CON401)
AWS re:Invent 2016: Amazon ECR Deep Dive on Image Optimization (CON401)AWS re:Invent 2016: Amazon ECR Deep Dive on Image Optimization (CON401)
AWS re:Invent 2016: Amazon ECR Deep Dive on Image Optimization (CON401)Amazon Web Services
 
NFS and CIFS Options for AWS (STG401) | AWS re:Invent 2013
NFS and CIFS Options for AWS (STG401) | AWS re:Invent 2013NFS and CIFS Options for AWS (STG401) | AWS re:Invent 2013
NFS and CIFS Options for AWS (STG401) | AWS re:Invent 2013Amazon Web Services
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and dockerFabio Fumarola
 
Developing with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaDeveloping with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaJoe Stein
 

What's hot (20)

Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
 
Docker at Shopify: From This-Looks-Fun to Production by Simon Eskildsen (Shop...
Docker at Shopify: From This-Looks-Fun to Production by Simon Eskildsen (Shop...Docker at Shopify: From This-Looks-Fun to Production by Simon Eskildsen (Shop...
Docker at Shopify: From This-Looks-Fun to Production by Simon Eskildsen (Shop...
 
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
 
How you can contribute to Apache Cassandra
How you can contribute to Apache CassandraHow you can contribute to Apache Cassandra
How you can contribute to Apache Cassandra
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
 
Kafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internalsKafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internals
 
Clocker - The Docker Cloud Maker
Clocker - The Docker Cloud MakerClocker - The Docker Cloud Maker
Clocker - The Docker Cloud Maker
 
A Journey through the JDKs (Java 9 to Java 11)
A Journey through the JDKs (Java 9 to Java 11)A Journey through the JDKs (Java 9 to Java 11)
A Journey through the JDKs (Java 9 to Java 11)
 
Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...
Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...
Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...
 
Play Support in Cloud Foundry
Play Support in Cloud FoundryPlay Support in Cloud Foundry
Play Support in Cloud Foundry
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
 
Lessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On DockerLessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On Docker
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 
AWS re:Invent 2016: Amazon ECR Deep Dive on Image Optimization (CON401)
AWS re:Invent 2016: Amazon ECR Deep Dive on Image Optimization (CON401)AWS re:Invent 2016: Amazon ECR Deep Dive on Image Optimization (CON401)
AWS re:Invent 2016: Amazon ECR Deep Dive on Image Optimization (CON401)
 
NFS and CIFS Options for AWS (STG401) | AWS re:Invent 2013
NFS and CIFS Options for AWS (STG401) | AWS re:Invent 2013NFS and CIFS Options for AWS (STG401) | AWS re:Invent 2013
NFS and CIFS Options for AWS (STG401) | AWS re:Invent 2013
 
TIAD : Automating the aplication lifecycle
TIAD : Automating the aplication lifecycleTIAD : Automating the aplication lifecycle
TIAD : Automating the aplication lifecycle
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and docker
 
Developing with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaDeveloping with the Go client for Apache Kafka
Developing with the Go client for Apache Kafka
 

Similar to Best Practices for Running Kafka on Docker Containers

Docker - Portable Deployment
Docker - Portable DeploymentDocker - Portable Deployment
Docker - Portable Deploymentjavaonfly
 
Docker and kubernetes
Docker and kubernetesDocker and kubernetes
Docker and kubernetesDongwon Kim
 
DockerCon EU 2015 Barcelona
DockerCon EU 2015 BarcelonaDockerCon EU 2015 Barcelona
DockerCon EU 2015 BarcelonaRoman Dembitsky
 
Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containers
Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker ContainersKafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containers
Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containersconfluent
 
ContainerDayVietnam2016: Dockerize a small business
ContainerDayVietnam2016: Dockerize a small businessContainerDayVietnam2016: Dockerize a small business
ContainerDayVietnam2016: Dockerize a small businessDocker-Hanoi
 
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013dotCloud
 
IBM Bluemix Paris Meetup #14 - Le Village by CA - 20160413 - Introduction à D...
IBM Bluemix Paris Meetup #14 - Le Village by CA - 20160413 - Introduction à D...IBM Bluemix Paris Meetup #14 - Le Village by CA - 20160413 - Introduction à D...
IBM Bluemix Paris Meetup #14 - Le Village by CA - 20160413 - Introduction à D...IBM France Lab
 
Docker - Demo on PHP Application deployment
Docker - Demo on PHP Application deployment Docker - Demo on PHP Application deployment
Docker - Demo on PHP Application deployment Arun prasath
 
Docker intro
Docker introDocker intro
Docker introspiddy
 
Killer Docker Workflows for Development
Killer Docker Workflows for DevelopmentKiller Docker Workflows for Development
Killer Docker Workflows for DevelopmentChris Tankersley
 
IBM WebSphere Application Server traditional and Docker
IBM WebSphere Application Server traditional and DockerIBM WebSphere Application Server traditional and Docker
IBM WebSphere Application Server traditional and DockerDavid Currie
 
Introduction to Docker
Introduction to DockerIntroduction to Docker
Introduction to DockerAditya Konarde
 
Dockerization of Azure Platform
Dockerization of Azure PlatformDockerization of Azure Platform
Dockerization of Azure Platformnirajrules
 
Everything you need to know about Docker
Everything you need to know about DockerEverything you need to know about Docker
Everything you need to know about DockerAlican Akkuş
 
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OSPutting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OSLightbend
 
Intro Docker october 2013
Intro Docker october 2013Intro Docker october 2013
Intro Docker october 2013dotCloud
 

Similar to Best Practices for Running Kafka on Docker Containers (20)

Docker - Portable Deployment
Docker - Portable DeploymentDocker - Portable Deployment
Docker - Portable Deployment
 
Docker and kubernetes
Docker and kubernetesDocker and kubernetes
Docker and kubernetes
 
DockerCon EU 2015 Barcelona
DockerCon EU 2015 BarcelonaDockerCon EU 2015 Barcelona
DockerCon EU 2015 Barcelona
 
OpenStack Summit
OpenStack SummitOpenStack Summit
OpenStack Summit
 
Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containers
Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker ContainersKafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containers
Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containers
 
Docker-Intro
Docker-IntroDocker-Intro
Docker-Intro
 
ContainerDayVietnam2016: Dockerize a small business
ContainerDayVietnam2016: Dockerize a small businessContainerDayVietnam2016: Dockerize a small business
ContainerDayVietnam2016: Dockerize a small business
 
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
Write Once and REALLY Run Anywhere | OpenStack Summit HK 2013
 
IBM Bluemix Paris Meetup #14 - Le Village by CA - 20160413 - Introduction à D...
IBM Bluemix Paris Meetup #14 - Le Village by CA - 20160413 - Introduction à D...IBM Bluemix Paris Meetup #14 - Le Village by CA - 20160413 - Introduction à D...
IBM Bluemix Paris Meetup #14 - Le Village by CA - 20160413 - Introduction à D...
 
Docker - Demo on PHP Application deployment
Docker - Demo on PHP Application deployment Docker - Demo on PHP Application deployment
Docker - Demo on PHP Application deployment
 
Docker intro
Docker introDocker intro
Docker intro
 
Killer Docker Workflows for Development
Killer Docker Workflows for DevelopmentKiller Docker Workflows for Development
Killer Docker Workflows for Development
 
IBM WebSphere Application Server traditional and Docker
IBM WebSphere Application Server traditional and DockerIBM WebSphere Application Server traditional and Docker
IBM WebSphere Application Server traditional and Docker
 
Introduction to Docker
Introduction to DockerIntroduction to Docker
Introduction to Docker
 
Docker Introduction
Docker IntroductionDocker Introduction
Docker Introduction
 
Dockerization of Azure Platform
Dockerization of Azure PlatformDockerization of Azure Platform
Dockerization of Azure Platform
 
Docker slides
Docker slidesDocker slides
Docker slides
 
Everything you need to know about Docker
Everything you need to know about DockerEverything you need to know about Docker
Everything you need to know about Docker
 
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OSPutting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
 
Intro Docker october 2013
Intro Docker october 2013Intro Docker october 2013
Intro Docker october 2013
 

More from BlueData, Inc.

Introduction to KubeDirector - SF Kubernetes Meetup
Introduction to KubeDirector - SF Kubernetes MeetupIntroduction to KubeDirector - SF Kubernetes Meetup
Introduction to KubeDirector - SF Kubernetes MeetupBlueData, Inc.
 
Dell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataDell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataBlueData, Inc.
 
BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)BlueData, Inc.
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentBlueData, Inc.
 
BlueData EPIC datasheet (en Français)
BlueData EPIC datasheet (en Français)BlueData EPIC datasheet (en Français)
BlueData EPIC datasheet (en Français)BlueData, Inc.
 
Bare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containersBare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containersBlueData, Inc.
 
Lessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsLessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsBlueData, Inc.
 
BlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec SheetBlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec SheetBlueData, Inc.
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersBlueData, Inc.
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceBlueData, Inc.
 
Solution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorSolution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorBlueData, Inc.
 
Hadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White PaperHadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White PaperBlueData, Inc.
 
Solution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorSolution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorBlueData, Inc.
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentBlueData, Inc.
 
BlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData, Inc.
 
Big Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 TelcoBig Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 TelcoBlueData, Inc.
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData, Inc.
 
Spark Infrastructure Made Easy
Spark Infrastructure Made EasySpark Infrastructure Made Easy
Spark Infrastructure Made EasyBlueData, Inc.
 
BlueData Integration with Cloudera Manager
BlueData Integration with Cloudera ManagerBlueData Integration with Cloudera Manager
BlueData Integration with Cloudera ManagerBlueData, Inc.
 

More from BlueData, Inc. (19)

Introduction to KubeDirector - SF Kubernetes Meetup
Introduction to KubeDirector - SF Kubernetes MeetupIntroduction to KubeDirector - SF Kubernetes Meetup
Introduction to KubeDirector - SF Kubernetes Meetup
 
Dell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataDell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big Data
 
BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized Environment
 
BlueData EPIC datasheet (en Français)
BlueData EPIC datasheet (en Français)BlueData EPIC datasheet (en Français)
BlueData EPIC datasheet (en Français)
 
Bare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containersBare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containers
 
Lessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsLessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark Workloads
 
BlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec SheetBlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec Sheet
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-Service
 
Solution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorSolution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline Accelerator
 
Hadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White PaperHadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White Paper
 
Solution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorSolution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab Accelerator
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environment
 
BlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData EPIC 2.0 Overview
BlueData EPIC 2.0 Overview
 
Big Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 TelcoBig Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 Telco
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for Hadoop
 
Spark Infrastructure Made Easy
Spark Infrastructure Made EasySpark Infrastructure Made Easy
Spark Infrastructure Made Easy
 
BlueData Integration with Cloudera Manager
BlueData Integration with Cloudera ManagerBlueData Integration with Cloudera Manager
BlueData Integration with Cloudera Manager
 

Recently uploaded

SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Anthony Dahanne
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profileakrivarotava
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 

Recently uploaded (20)

SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profile
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 

Best Practices for Running Kafka on Docker Containers

  • 1. Best Practices for Running Kafka on Docker Containers Nanda Vijaydev, BlueData Kafka Summit San Francisco August 28, 2017
  • 2. Agenda • What is Docker? • Deploying services on Docker • Messaging systems (Kafka) on Docker: Challenges • How We Did it: Lessons Learned • Key Takeaways for Running Kafka on Docker • Q & A
  • 3. What is a Docker Container? • Lightweight, stand-alone, executable package of software to run specific services • Runs on all major Linux distributions • On any infrastructure including VMs, bare-metal, and in the cloud • Package includes code, runtime, system libraries, configurations, etc. • Run as an isolated process in user space
  • 4. Docker Containers vs. Virtual Machines • Unlike VMs, containers virtualize OS and not hardware • More portable and efficient • Abstraction at the app layer that packages app and dependencies • Multiple containers share the base kernel • Take up less space and start almost immediately
  • 5. Kafka, Producers, and Consumers • Independent services that send/receive messages over Kafka • Can be written in many languages • Purpose-built for specific actions • Mostly operate on high frequency events and data • Availability and scalability are important
  • 6. Considerations for Kafka Deployment • Multiple services; each with its own requirements • Single QOS for related containers and services (CPU & Memory) • Storage – Local persistence & External Volumes • Service monitoring and dependency management
  • 7. How We Did It: Design Decisions I • Run Kafka (e.g. Confluent distribution) and related services and tools / applications unmodified – Deploy all services that run on a single bare-metal host in a single container • Multi-tenancy support is key – Network and storage security • Clusters of containers span physical hosts
  • 8. How We Did It: Sample Dockerfile # Confluent Kafka 3.2.1 docker image FROM bluedata/centos7:latest #Install java 1.8 RUN yum -y install java-1.8.0-openjdk-devel #Download and extract Kafka installation tar file RUN mkdir /usr/lib/kafka;curl -s http://packages.confluent.io/archive/3.2/confluent-3.2.1-2.11.tar.gz | tar xz -C /usr/lib/kafka/ ##Create necessary directories for Kafka and Zookeeper to run RUN mkdir /var/lib/zookeeper …....
  • 9. How We Did It: Design Decisions II • Images built to “auto-configure” themselves at time of instantiation – Not all instances of a single image run the same set of services when instantiated • Zookeeper vs. Broker cluster nodes – Ability to scale on demand
  • 10. How We Did It: Deployment Configuration #!/usr/bin/env bdwb ############################################################## # # Sample workbench instructions for building a BlueData catalog entry. # ############################################################## # # YOUR_ORGANIZATION_NAME must be replaced with a valid organization name. Please # refer to 'help builder organization' for details. # builder organization --name YOUR_ORGANIZATION_NAME builder organization --name BlueData ## Begin a new catalog entry catalog new --distroid confluent-kafka --name "Confluent Kafka 3.2.1" --desc "The free, open-source streaming platform (Enterprise edition) based on Apache Kafka. Confluent Platform is the best way to get started with real-time data streams." --categories Kafka --version 4.0 ## Define all node roles for the virtual cluster. role add broker 1+ role add zookeeper 1+ role add schemareg 1+ role add gateway 0+ ## Define all services that are available in the virtual cluster. service add --srvcid kafka-broker --name "Kafka Broker service" --port 9092 service add --srvcid zookeeper --name "Zookeeper service" --port 2181 service add --srvcid schema-registry --name "Schema-registry service" --port 8081 service add --srvcid control-center --name "Control center service" --port 9021 ## Dev Configuration. Multiple services are placed on same container clusterconfig new --configid default clusterconfig assign --configid default –role gateway –srvcids gateway control-center clusterconfig assign --configid default --role broker –srvcids kafka-broker schema-registry clusterconfig assign --configid default --role zookeeper --srvcids kafka-broker zookeeper ## Prod Configuration. Services run on dedicated nodes with special attributes clusterconfig new --configid production clusterconfig assign --configid production --role broker --srvcids kafka-broker clusterconfig assign --configid production --role zookeeper --srvcids zookeeper clusterconfig assign --configid production --role schemareg --srvcids schemareg clusterconfig assign --configid production --role gateway --srvcids control-center
  • 11. How We Did It: Deployment Configuration #Configure your docker nodes with appropriate run time values appconfig autogen --replace /tmp/zookeeper/myid –pattern @@ID@@ --macro UNIQUE_SELF_NODE_INT appconfig autogen --replace /usr/lib/kafka/etc/kafka/server.properties –pattern @@HOST@@ --macro GET_NODE_FQDN appconfig autogen --replace /usr/lib/kafka/etc/kafka/server.properties –pattern @@zookeeper.connet@@ --macro ZOOKEEPER_SERVICE_STRING #Start services in the order specified REGISTER_START_SERVICE_SYSV zookeeper REGISTER_START_SERVICE_SYSV kafka-broker –wait zookeeper REGISTER_START_SERVICE_SYSV schema-registry –wait zookeeper
  • 12. How We Did It: Resource Allocation • Users to choose “flavors” while launching containers • Storage heavy containers can have more disk space • vCPUs * n = cpu-shares • No over-provisioning of memory
  • 13. ① Get started with Kafka (e.g. Confluent community edition) ② Evaluate features/configurations simultaneously on smaller hardware footprint ③ Prototype multiple data pipelines quickly with dockerized producers and consumers ① Spin up dev/test clusters with replica image of production ② QA/UAT using production configuration without re- inventing the wheel ③ Offload specific users and workloads from production ① LOB multi-tenancy with strict resource allocations ② Bare-metal performance for business critical workloads ③ Share data hub / data lake with strict access controls Kafka On Docker Use Cases Prototyping Departmental Enterprise Exploring the Value of Kafka Initial Departmental Deployments Enterprise-Wide, Mission-Critical Deployments
  • 14. Multi-Tenant Deployment 5.10 3.3 2.1 ComputeIsolation ComputeIsolation Team 1 Team 2 Team 3 Build Components End to End Testing Prod Environment Team 1 Team 2 Team3 Multiple teams or business groups Evaluate different Kafka use cases (e.g. producers, consumers, pipelines) Use different services & tools (e.g. Broker, Zookeeper, Schema Registry, API Gateway) Use different distributions of standalone Kafka and/or Hadoop BlueData EPIC software platform Shared server infrastructure with node labels Shared data sets for HDFS access Multiple distributions, services, tools on shared, cost-effective infrastructure Shared Servers Dev/QA Hardware Shared, Centrally Managed Server Infrastructure Confluent 3.2 Prod Hardware Apache Kafka 0.9Apache Kafka 0.8
  • 15. Multi-Host Kafka Deployment 4 containers On 3 different hosts using 1 VLAN and 4 persistent IPs
  • 16. How We Did It: Security Considerations • Security is essential since containers and host share one kernel – Non-privileged containers • Achieved through layered set of capabilities • Different capabilities provide different levels of isolation and protection • Add “capabilities” to a container based on what operations are permitted
  • 17. How We Did It: Network Architecture • Connect containers across hosts • Persistence of IP address across container restart • DHCP/DNS service required for IP allocation and hostname resolution • Deploy VLANs and VxLAN tunnels for tenant-level traffic isolation
  • 18. Storage – Internal To Host File System Data Volume • A directory on host FS • Data not deleted when container is deleted Device Mapper Storage Driver • Default – OverlayFS • We use direct-lvm thinpool with devicemapper • Data is deleted with container
  • 19. Storage - External Volumes • Storage is external to host FS, accessed over the network • Separates container from storage • Cloud providers have storage services such as S3, EBS • You can also connect to HDFS, NFS, Gluster • Services such as REX-Ray provide external volume support
  • 20. App Store for Kafka, Spark, & More Pre-built images, or author your own Docker app images with our App Workbench Docker image + app config scripts + metadata (e.g. name, logo)
  • 21. BlueData Application Image (.bin file) Application bin file Docker image CentOS Dockerfile RHEL Dockerfile appconfig conf Init.d startscript <app> logo file <app>.wb bdwb command clusterconfig, image, role, appconfig, catalog, service, .. Sources Docker file, logo .PNG, Init.d RuntimeSoftware Bits OR Development (e.g. extract .bin and modify to create new bin)
  • 22. Different Services in Each Container Broker + Zookeeper + Schema Registry Broker + Zookeeper Broker
  • 23. Container Storage On Host Container Storage Host Storage Container Hosts Kafka Cluster Containers
  • 24. Multi-Tenant Resource Quotas Aggregate Docker container storage, memory and cores (CPU shares) for all containers in tenant “Team 1” Aggregate compute, memory, & storage quotas for Docker containers
  • 25. Monitoring Containers Resource monitoring Several open source and commercial monitoring options available We use Elasticsearch with Metricbeat plugin
  • 26. Containers = the Future of Apps Infrastructure • Agility and elasticity • Standardized environments (dev, test, prod) • Portability (on-premises and cloud) • Higher resource utilization Applications • Fool-proof packaging (configs, libraries, driver versions, etc.) • Repeatable builds and orchestration • Faster app dev cycles
  • 27. Kafka on Docker: Key Takeaways • Enterprise deployment requirements: – Docker base image includes all needed services (Kafka, Zookeeper, Schema registry, etc.), libraries, jar files – Container orchestration, including networking and storage, depends on standards enforced by enterprises – Resource-aware runtime configuration, including CPU and RAM – Sequence-aware app deployment needs more thought
  • 28. Kafka on Docker: Key Takeaways • Enterprise deployment challenges: – Access to container secured with ssh keypair or PAM module (LDAP/AD) – Access to Kafka from Data Science applications – Management agents in Docker images – Runtime injection of resource and configuration information • Consider a turnkey software solution (e.g. BlueData) to accelerate time to value and avoid DIY pitfalls

Editor's Notes

  1. In the evalution phase, the key to success in addition is the ability to evaluate multiple use cases using different distributions and versions of Kafka. For example, using two different distributions could mean 4-6 months with twice the hardware. In the implementation phase, where you have a use case deployed into production, the challenges continue. You have to maintain multiple environments for like Dev, Test and get a broader team of infrastructure, security and data management experts. When you figure out this formula and expand to more use cases from newer business groups, this translates to newer workloads, newer analytic tools that might have dependencies on specific distributions and versions. Finally, when you do get to this broader environment with more production applications, the typical IT lens gets applied to optimize the infrastructure for service levels metrics such as TCO and the associated return on investment
  2. {allowed_docker_caps, ["SETPCAP", "SYS_PACCT", "SYS_RESOURCE", "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER", "DAC_READ_SEARCH", "FSETID", "KILL", "SETGID", "SETUID", "NET_BIND_SERVICE", "NET_BROADCAST", "SYS_CHROOT", "SYS_PTRACE", "SETFCAP", "NET_RAW"]},
  3. Local persistent storage is “local” to the node within the cluster and is usually the storage resident within the machine – think “internal disks”. These disks can be partitioned for specific services and will typically provide the best in terms of performance and data isolation. The downside to local persistent storage is that it binds the service or container to a specific node. Distributed services like nosql databases work well in this model.