Modern Big Data & Machine Learning in the era of cloud, Docker and Kubernetes
Slim Baltagi
Minneapolis, Minnesota
June 5th, 2018
Agenda
1. Key takeaways
2. What is Docker?
3. What is Kubernetes?
4. Why Big Data on Kubernetes?
5. Why Machine Learning on Kubernetes?
6. How to get started?
1. Key takeaways
• There is a major shift in web and mobile application
architecture from the ‘old-school’ one to a modern
‘microservices’ architecture based on containers. Kubernetes
has been quite successful at managing those containers and
running them in distributed computing environments.
• Now enabling Big Data and Machine Learning on Kubernetes
will allow IT organizations to standardize on the same
Kubernetes infrastructure. This will propel adoption and
reduce costs.
• Kubeflow is an open source framework dedicated to making
it easy to use the machine learning tool of your choice and
deploy your ML applications at scale on Kubernetes.
Kubeflow is becoming an industry standard as well!
• Both Kubernetes and Kubeflow will enable IT organizations
to focus more effort on applications rather than
infrastructure.
2. What is Docker?
The Docker logo is sort of a whale / boat hybrid, filled with
shipping containers. The analogy is taken from freight
transport where goods are shipped in containers.
Docker is an open source technology, released in 2013, for
developing and deploying applications in containers. It
packages an application’s code, libraries, configuration and
software dependencies together into container images.
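As a rough illustration of what "packaging together" means, the sketch below renders the text of a Dockerfile for a small Python application. The function name, the base image tag, and the file names (requirements.txt, src/, main.py) are assumptions made for this example; they do not come from the talk.

```python
import json


def render_dockerfile(base_image, requirements_file, app_dir, command):
    """Return Dockerfile text that bundles libraries, code and a start command."""
    lines = [
        f"FROM {base_image}",                    # base OS + language runtime
        "WORKDIR /app",
        f"COPY {requirements_file} .",           # dependency list
        f"RUN pip install --no-cache-dir -r {requirements_file}",  # libraries
        f"COPY {app_dir} .",                     # application code and config
        "CMD " + json.dumps(command),            # exec-form start command
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical project layout, for illustration only.
    print(render_dockerfile("python:3.11-slim", "requirements.txt",
                            "src/", ["python", "main.py"]))
```

Saving the printed text as a file named Dockerfile and running `docker build` in that directory would produce an image containing everything the application needs to run.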
A container is a runnable instance of an image. Container
images can be pulled from a registry (such as Docker Hub at
hub.docker.com, Azure Container Registry, …) and deployed
anywhere a container runtime is installed: your laptop, servers
on-premises, or in the cloud.
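The name used to pull an image encodes which registry it comes from. The following is a simplified sketch of how such a reference breaks down into registry, repository and tag. It assumes the usual Docker defaults (registry docker.io, tag latest, a library/ prefix for official images) and deliberately ignores digest references; the registry host in the usage note is hypothetical.

```python
def parse_image_reference(ref):
    """Split an image reference into (registry, repository, tag).

    Simplified sketch: digests (name@sha256:...) are not handled.
    """
    name, tag = ref, "latest"
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    # A colon after the last slash separates the tag; an earlier colon
    # belongs to a registry port such as localhost:5000.
    if last_colon > last_slash:
        name, tag = ref[:last_colon], ref[last_colon + 1:]

    first, _, rest = name.partition("/")
    # The first path component is a registry host only if it looks like one.
    if rest and ("." in first or ":" in first or first == "localhost"):
        return first, rest, tag

    repository = name if "/" in name else "library/" + name
    return "docker.io", repository, tag
```

For example, `parse_image_reference("nginx")` resolves to Docker Hub's official nginx image, while a reference such as `myregistry.azurecr.io/team/app:1.2` (a made-up Azure Container Registry host) keeps its own registry.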
• Some of the advantages that Docker offers:
• Identical environments: Deploy and run the same way
in development, testing and production, so an application
that works in one environment is going to work in
another.
• Isolated environments for your individual applications
• Version control: Instead of “patching”, new functionality is
added to a micro-service by replacing existing containers with
ones that incorporate new functionality.
• Portability: Easily move workloads between different versions
of Linux, for example
• Developer Productivity
• Application Agility: How quickly you can evolve an
application
• Operational Efficiencies: containerized applications are
easier to deploy.
• Scale out (not up): simply start more containers
3. What is Kubernetes?
• The Kubernetes logo is literally a boat’s steering wheel.
• It should be an admiral’s hat because, as we will see,
Kubernetes helps you manage a fleet of Docker ‘boats’, not
just one!
• Kubernetes (numeronym K8s) is an open source platform for
automating deployment, scaling and management of
containerized applications, both in the cloud and on premises.
• It was initially released by Google in 2014 and it is now
managed by the Cloud Native Computing Foundation (CNCF).
• Kubernetes has already been adopted by the largest public
cloud vendors and technology providers.
• Some of the companies providing managed Kubernetes
services: Google Cloud Platform (GCP) – GKE; Microsoft
Azure – AKS; Amazon AWS – EKS; Oracle – OKE; IBM
Cloud Container Service; Red Hat – OpenShift; Pivotal –
PKS; Alibaba Cloud Container Service for Kubernetes, …
• Kubernetes is being embraced by even more software
vendors and enterprises.
• A Kubernetes cluster consists of at least one master
node, which manages the cluster, and multiple worker
nodes, where containerized applications run in Pods.
• A Pod is a logical grouping of one or more containers. The
containers in a Pod run together on the same host machine and
share resources such as storage, networking, and container
runtime information.
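To make the Pod idea concrete, here is a minimal sketch that builds a Pod manifest with two containers sharing one volume. The Pod name, image tags and the sidecar's shell loop are illustrative assumptions, not taken from the talk; the field names (apiVersion, kind, spec.containers, volumeMounts, emptyDir) are those of the core Kubernetes Pod API.

```python
import json


def make_pod(name, app_image, sidecar_image):
    """Build a Pod manifest whose two containers share an emptyDir volume."""
    mount = {"name": "shared-data", "mountPath": "/data"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # One volume, declared once at Pod level and mounted by both containers.
            "volumes": [{"name": "shared-data", "emptyDir": {}}],
            "containers": [
                {"name": "app", "image": app_image, "volumeMounts": [mount]},
                {
                    "name": "sidecar",
                    "image": sidecar_image,
                    # Hypothetical helper loop that writes into the shared volume.
                    "command": ["sh", "-c",
                                "while true; do date > /data/now.txt; sleep 5; done"],
                    "volumeMounts": [mount],
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(make_pod("web", "nginx:1.25", "busybox:1.36"), indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and submitted with `kubectl apply -f`. Both containers also share the Pod's network namespace, so they can reach each other on localhost.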
Some of the advantages that Kubernetes offers:
• Kubernetes makes containers manageable
• Portability between cloud and on-premises
• Kubernetes’ cloud-agnostic design lets containerized
applications run on any platform without any changes to
the application code.
• Kubernetes provides two types of auto-scaling:
• pod auto-scaling where more pods are automatically
created in a cluster based on scaling rules, and
• cluster auto-scaling where more nodes are added to a
cluster based on flexible rules.
• Monitoring: Rather than having to rely on ad hoc
monitoring approaches, you get system monitoring built into
Kubernetes, which supports a wide range of features:
replicas, rolling updates, auto-scaling, etc.
• Better cluster resource utilization
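The pod auto-scaling rule in the list above can be sketched in a few lines. This follows the proportional formula described in the Kubernetes documentation for the Horizontal Pod Autoscaler (desired = ceil(current × currentMetric / targetMetric)); the function name, the default bounds and the 10% tolerance used here are assumptions for illustration, and the real controller has additional safeguards that are left out.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Proportional scaling rule: scale replicas by observed/target metric."""
    ratio = current_metric / target_metric
    # Close enough to the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For instance, 4 pods averaging 90% CPU against a 60% target would be scaled to 6 pods, while the same 4 pods at 20% CPU would be scaled down to 2.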
4. Why Big Data on Kubernetes?
• Big Data on Kubernetes is now a reality thanks to:
• The Kubernetes community’s Special Interest Group on
Big Data and the many companies collaborating on the
related effort.
• Newer Kubernetes features such as StatefulSets, custom
schedulers, custom resources, custom controllers,
container storage interface, …
• More persistent storage options to run stateful
applications on Kubernetes, depending on data type, such
as object storage, file systems, software defined storage, …
• More and more Big Data/Fast Data Tools running on
Kubernetes such as: Apache Spark, Apache Kafka,
Apache Flink, Apache Cassandra, Apache Zookeeper, …
Example: Apache Spark on Kubernetes
• Video: Submitting Spark jobs using Kubernetes scheduler on
AKS. March 16, 2018 https://www.youtube.com/watch?v=T7pAZplLiCk
• Article: Running Apache Spark jobs on AKS. March 15, 2018
https://docs.microsoft.com/en-us/azure/aks/spark-job
• Blog: Apache Spark 2.3 with Native Kubernetes Support.
March 15, 2018
• Docs: Running Spark on Kubernetes
https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html
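As a hedged sketch of what submitting a Spark job to the Kubernetes scheduler looks like, the helper below assembles a spark-submit command line using the options documented for Spark 2.3's native Kubernetes support (`--master k8s://…`, `--deploy-mode cluster`, `spark.executor.instances`, `spark.kubernetes.container.image`). The helper name and every concrete value in the usage example (API server address, image name, jar path) are placeholders, not values from the talk.

```python
import shlex


def spark_submit_args(api_server, image, app_jar, main_class,
                      executors=2, name="spark-pi"):
    """Assemble a spark-submit argument list targeting a Kubernetes cluster."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",   # Kubernetes API server as the master
        "--deploy-mode", "cluster",          # the driver runs in a pod as well
        "--name", name,
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_jar,
    ]


if __name__ == "__main__":
    args = spark_submit_args(
        api_server="https://example.invalid:6443",         # placeholder address
        image="example.invalid/spark:2.3.0",               # placeholder image
        app_jar="local:///opt/spark/examples/jars/spark-examples.jar",  # placeholder path
        main_class="org.apache.spark.examples.SparkPi",
    )
    print(" ".join(shlex.quote(a) for a in args))
```

The `local://` scheme indicates a jar that is already present inside the container image, which is why the image and the jar path have to agree.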
• There are many ways to run Big Data applications such as
Spark. For example:
• Standalone mode using dedicated resources
• YARN cluster co-resident with Hadoop
• Mesos cluster alongside other Mesos applications
• So, why would you run Big Data applications on Kubernetes?
• In addition to all the advantages that Kubernetes offers, the
following ones are particularly relevant to Big Data
applications:
• A single container orchestrator for all your applications
• Increased server utilization
• Isolation between workloads
• Reduction in operational overhead
• Language-agnostic distributed computing clusters
• A single container orchestrator for all your
applications: For example, Kubernetes can manage a
broad range of workloads; no need to deal with
YARN/HDFS for data processing and a separate container
orchestrator for your other applications. This solves the
problem of running Big Data applications in silos in their
own clusters.
• Increased server utilization: For example, Spark can share
nodes with other applications, such as a streaming
application that feeds a Spark streaming pipeline or an
nginx pod that serves web traffic, without the need to
statically partition nodes.
• Isolation between workloads: For example, Kubernetes
allows you to safely co-schedule batch workloads like
Spark on the same nodes as latency-sensitive servers.
• Reduction in operational overhead. For example: Static
clusters require greater operational know-how to do
common tasks with Kafka, such as applying broker
configuration updates, upgrading to a new version, and
adding or decommissioning brokers. By using Kafka on
Kubernetes, you can reduce the overhead for a number of
common operational tasks with standard cluster resource
manager features.
• Containers and Kubernetes make great language-
agnostic distributed computing clusters.
5. Why Machine Learning on Kubernetes?
Machine Learning on Kubernetes is now a reality thanks to:
• Developments in Kubernetes such as support for stateful
applications, extension points, …
• Hardware acceleration for Kubernetes from NVIDIA (GPU),
Google (TPU: Tensor Processing Unit), …
• Machine Learning tools running on Kubernetes such as:
Kubeflow, Paddle, Seldon, RiseML, Anaconda, H2O, …
• Emergence of Kubeflow, an open source framework
dedicated to making it easy to use the machine learning
tool of your choice and deploy your ML applications in
distributed mode on Kubernetes. Kubeflow is becoming the
industry standard as well!
• Services such as the one from Microsoft to train and
serve TensorFlow Models at scale with Kubernetes and
Kubeflow on Azure Kubernetes Service (AKS)
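The GPU hardware acceleration mentioned above surfaces in Kubernetes as a schedulable resource. The sketch below builds a Pod manifest that requests GPUs through the `nvidia.com/gpu` resource name exposed by the NVIDIA device plugin. The function name, Pod name and image are illustrative assumptions, and the cluster is assumed to have GPU nodes with the device plugin installed.

```python
import json


def gpu_training_pod(name, image, gpus=1):
    """Build a Pod manifest for a one-off training run that requests GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # run to completion, do not restart
            "containers": [{
                "name": "trainer",
                "image": image,
                # GPUs are requested under limits; the scheduler then places
                # the Pod only on a node with enough free GPUs.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


if __name__ == "__main__":
    # Example image name only; any GPU-enabled training image would do.
    print(json.dumps(gpu_training_pod("tf-train", "tensorflow/tensorflow:latest-gpu"),
                     indent=2))
```

CPU-only serving Pods simply omit the GPU limit, which is how one cluster can mix GPU and CPU nodes for training and serving.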
• You've created a machine learning model, using a tool of
choice such as TensorFlow, PyTorch, or scikit-learn… Now
what?
• How can you ensure that the model is deployed to
production and can scale as needed on incoming
data?
• How can you seamlessly migrate a model from your
local laptop / virtual machine to your cloud platform of
choice?
• Kubeflow includes:
• the JupyterHub platform for creating and managing
Jupyter notebook servers that are used by data science
and research groups
• a TensorFlow Custom Resource for managing
compute resources and adjusting them to a specific cluster size
• a TensorFlow Serving container to house the machine
learning application
• Distributed training instead of sequential: huge time saver
for large trainings
• Enabling Machine Learning at large scale
• Mix of GPU and CPU nodes to serve both as a training and
serving platform
• IT can better support data science and machine learning
applications with Kubernetes as the common
orchestration layer for all (containerized) applications
• Ability for IT to create self-service environments for data
scientists and other data users.
• Single scheduling solution for multiple environments, on
premises or in multiple clouds
• Better resource utilization through centralized scheduling
of data science and other containerized applications
6. How to get started?
• Learn from some free tutorials in your browser!
• Docker & Containers https://www.katacoda.com/courses/docker
• Kubernetes https://www.katacoda.com/courses/kubernetes
• Kubeflow https://www.katacoda.com/kubeflow
• Watch some demos
• Sentiment Analysis using Kubernetes and Kubeflow, Google,
May 31st 2018 https://www.youtube.com/watch?v=-ZlIuQXyD1A
• OSS Unboxing – Kubeflow, Lachlan Evenson, Microsoft, May
11th 2018 https://www.youtube.com/watch?v=uL_pqP_HgcY
• Do some labs
• Labs for Training and Serving TensorFlow Models with
Kubernetes and Kubeflow on Azure Container Service (AKS)
https://github.com/Azure/kubeflow-labs
• Introduction to Kubeflow on Google Kubernetes Engine (GKE)
https://codelabs.developers.google.com/codelabs/kubeflow-introduction/index.html?index=..%2F..%2Fio2018#0
Thank you!
Let’s keep in touch!
@SlimBaltagi
https://www.linkedin.com/in/slimbaltagi
sbaltagi@gmail.com
26

More Related Content

What's hot

Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...
Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...
Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...Edureka!
 
Kubernetes: A Short Introduction (2019)
Kubernetes: A Short Introduction (2019)Kubernetes: A Short Introduction (2019)
Kubernetes: A Short Introduction (2019)Megan O'Keefe
 
Containerized Applications Overview
Containerized Applications OverviewContainerized Applications Overview
Containerized Applications OverviewApoorv Anand
 
Introduction of Kubernetes - Trang Nguyen
Introduction of Kubernetes - Trang NguyenIntroduction of Kubernetes - Trang Nguyen
Introduction of Kubernetes - Trang NguyenTrang Nguyen
 
Azure kubernetes service (aks)
Azure kubernetes service (aks)Azure kubernetes service (aks)
Azure kubernetes service (aks)Akash Agrawal
 
Introduction to Kubernetes Workshop
Introduction to Kubernetes WorkshopIntroduction to Kubernetes Workshop
Introduction to Kubernetes WorkshopBob Killen
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes IntroductionPeng Xiao
 
Kubernetes architecture
Kubernetes architectureKubernetes architecture
Kubernetes architectureJanakiram MSV
 
Docker and kubernetes_introduction
Docker and kubernetes_introductionDocker and kubernetes_introduction
Docker and kubernetes_introductionJason Hu
 
Understanding Kubernetes
Understanding KubernetesUnderstanding Kubernetes
Understanding KubernetesTu Pham
 
Open shift 4 infra deep dive
Open shift 4    infra deep diveOpen shift 4    infra deep dive
Open shift 4 infra deep diveWinton Winton
 
An Introduction to Kubernetes
An Introduction to KubernetesAn Introduction to Kubernetes
An Introduction to KubernetesImesh Gunaratne
 
Getting Started with Kubernetes
Getting Started with Kubernetes Getting Started with Kubernetes
Getting Started with Kubernetes VMware Tanzu
 
Kubernetes
KubernetesKubernetes
KubernetesHenry He
 
DevJam 2019 - Introduction to Kubernetes
DevJam 2019 - Introduction to KubernetesDevJam 2019 - Introduction to Kubernetes
DevJam 2019 - Introduction to KubernetesRonny Trommer
 

What's hot (20)

Introduction to Kubernetes
Introduction to KubernetesIntroduction to Kubernetes
Introduction to Kubernetes
 
Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...
Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...
Kubernetes Architecture | Understanding Kubernetes Components | Kubernetes Tu...
 
Kubernetes: A Short Introduction (2019)
Kubernetes: A Short Introduction (2019)Kubernetes: A Short Introduction (2019)
Kubernetes: A Short Introduction (2019)
 
Containerized Applications Overview
Containerized Applications OverviewContainerized Applications Overview
Containerized Applications Overview
 
Why Kubernetes on Azure
Why Kubernetes on AzureWhy Kubernetes on Azure
Why Kubernetes on Azure
 
Introduction of Kubernetes - Trang Nguyen
Introduction of Kubernetes - Trang NguyenIntroduction of Kubernetes - Trang Nguyen
Introduction of Kubernetes - Trang Nguyen
 
Azure kubernetes service (aks)
Azure kubernetes service (aks)Azure kubernetes service (aks)
Azure kubernetes service (aks)
 
Introduction to Kubernetes Workshop
Introduction to Kubernetes WorkshopIntroduction to Kubernetes Workshop
Introduction to Kubernetes Workshop
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes Introduction
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes Introduction
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes Introduction
 
DevOps with Kubernetes
DevOps with KubernetesDevOps with Kubernetes
DevOps with Kubernetes
 
Kubernetes architecture
Kubernetes architectureKubernetes architecture
Kubernetes architecture
 
Docker and kubernetes_introduction
Docker and kubernetes_introductionDocker and kubernetes_introduction
Docker and kubernetes_introduction
 
Understanding Kubernetes
Understanding KubernetesUnderstanding Kubernetes
Understanding Kubernetes
 
Open shift 4 infra deep dive
Open shift 4    infra deep diveOpen shift 4    infra deep dive
Open shift 4 infra deep dive
 
An Introduction to Kubernetes
An Introduction to KubernetesAn Introduction to Kubernetes
An Introduction to Kubernetes
 
Getting Started with Kubernetes
Getting Started with Kubernetes Getting Started with Kubernetes
Getting Started with Kubernetes
 
Kubernetes
KubernetesKubernetes
Kubernetes
 
DevJam 2019 - Introduction to Kubernetes
DevJam 2019 - Introduction to KubernetesDevJam 2019 - Introduction to Kubernetes
DevJam 2019 - Introduction to Kubernetes
 

Similar to Modern big data and machine learning in the era of cloud, docker and kubernetes

Why is Kubernetes considered the next generation application platform
Why is Kubernetes considered the next generation application platformWhy is Kubernetes considered the next generation application platform
Why is Kubernetes considered the next generation application platformCalidad Infotech
 
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and KubelessBuilding Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and KubelessBitnami
 
Best online kubernetes course in H2KInfosys.pdf
Best online kubernetes course in H2KInfosys.pdfBest online kubernetes course in H2KInfosys.pdf
Best online kubernetes course in H2KInfosys.pdfabhayah2k
 
6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdf
6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdf6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdf
6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdfMars Devs
 
Kubernetes - An introduction
Kubernetes - An introductionKubernetes - An introduction
Kubernetes - An introductionLoves Cloud
 
Kubernetes solutions
Kubernetes solutionsKubernetes solutions
Kubernetes solutionsEric Cattoir
 
Kubernetes: https://youtu.be/KnjnQj-FvfQ
Kubernetes: https://youtu.be/KnjnQj-FvfQKubernetes: https://youtu.be/KnjnQj-FvfQ
Kubernetes: https://youtu.be/KnjnQj-FvfQRahul Malhotra
 
Global Azure Bootcamp: Container, Docker & Kubernetes Basics
Global Azure Bootcamp: Container, Docker & Kubernetes BasicsGlobal Azure Bootcamp: Container, Docker & Kubernetes Basics
Global Azure Bootcamp: Container, Docker & Kubernetes BasicsNico Meisenzahl
 
Kubernetes is all you need
Kubernetes is all you needKubernetes is all you need
Kubernetes is all you needVishwas N
 
Nugwc k8s session-16-march-2021
Nugwc k8s session-16-march-2021Nugwc k8s session-16-march-2021
Nugwc k8s session-16-march-2021Avanti Patil
 
Running and Managing Kubernetes on OpenStack
Running and Managing Kubernetes on OpenStackRunning and Managing Kubernetes on OpenStack
Running and Managing Kubernetes on OpenStackVictor Palma
 
Why kubernetes matters
Why kubernetes mattersWhy kubernetes matters
Why kubernetes mattersPlatform9
 
Cloud technology with practical knowledge
Cloud technology with practical knowledgeCloud technology with practical knowledge
Cloud technology with practical knowledgeAnshikaNigam8
 
Simplify Your Way To Expert Kubernetes Management
Simplify Your Way To Expert Kubernetes ManagementSimplify Your Way To Expert Kubernetes Management
Simplify Your Way To Expert Kubernetes ManagementDevOps.com
 
Serverless brewbox
Serverless   brewboxServerless   brewbox
Serverless brewboxLino Telera
 
How docker & kubernetes can optimize the cost of hosting
How docker & kubernetes can optimize the cost of hostingHow docker & kubernetes can optimize the cost of hosting
How docker & kubernetes can optimize the cost of hosting9 series
 

Similar to Modern big data and machine learning in the era of cloud, docker and kubernetes (20)

Why is Kubernetes considered the next generation application platform
Why is Kubernetes considered the next generation application platformWhy is Kubernetes considered the next generation application platform
Why is Kubernetes considered the next generation application platform
 
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and KubelessBuilding Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
 
Best online kubernetes course in H2KInfosys.pdf
Best online kubernetes course in H2KInfosys.pdfBest online kubernetes course in H2KInfosys.pdf
Best online kubernetes course in H2KInfosys.pdf
 
6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdf
6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdf6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdf
6 Steps Functionality Hacks To Kubernetes - 2023 Update.pdf
 
Kubernetes for All
Kubernetes for AllKubernetes for All
Kubernetes for All
 
Kubernetes - An introduction
Kubernetes - An introductionKubernetes - An introduction
Kubernetes - An introduction
 
Kubernetes solutions
Kubernetes solutionsKubernetes solutions
Kubernetes solutions
 
Kubernetes: https://youtu.be/KnjnQj-FvfQ
Kubernetes: https://youtu.be/KnjnQj-FvfQKubernetes: https://youtu.be/KnjnQj-FvfQ
Kubernetes: https://youtu.be/KnjnQj-FvfQ
 
Global Azure Bootcamp: Container, Docker & Kubernetes Basics
Global Azure Bootcamp: Container, Docker & Kubernetes BasicsGlobal Azure Bootcamp: Container, Docker & Kubernetes Basics
Global Azure Bootcamp: Container, Docker & Kubernetes Basics
 
Kubernetes is all you need
Kubernetes is all you needKubernetes is all you need
Kubernetes is all you need
 
Why to Cloud Native
Why to Cloud NativeWhy to Cloud Native
Why to Cloud Native
 
Nugwc k8s session-16-march-2021
Nugwc k8s session-16-march-2021Nugwc k8s session-16-march-2021
Nugwc k8s session-16-march-2021
 
Running and Managing Kubernetes on OpenStack
Running and Managing Kubernetes on OpenStackRunning and Managing Kubernetes on OpenStack
Running and Managing Kubernetes on OpenStack
 
Why kubernetes matters
Why kubernetes mattersWhy kubernetes matters
Why kubernetes matters
 
Cloud technology with practical knowledge
Cloud technology with practical knowledgeCloud technology with practical knowledge
Cloud technology with practical knowledge
 
Data harmonycloudpowerpointclientfacing
Data harmonycloudpowerpointclientfacingData harmonycloudpowerpointclientfacing
Data harmonycloudpowerpointclientfacing
 
Simplify Your Way To Expert Kubernetes Management
Simplify Your Way To Expert Kubernetes ManagementSimplify Your Way To Expert Kubernetes Management
Simplify Your Way To Expert Kubernetes Management
 
Docker & kubernetes
Docker & kubernetesDocker & kubernetes
Docker & kubernetes
 
Serverless brewbox
Serverless   brewboxServerless   brewbox
Serverless brewbox
 
How docker & kubernetes can optimize the cost of hosting
How docker & kubernetes can optimize the cost of hostingHow docker & kubernetes can optimize the cost of hosting
How docker & kubernetes can optimize the cost of hosting
 

More from Slim Baltagi

How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?Slim Baltagi
 
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-BaltagiModern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-BaltagiSlim Baltagi
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaSlim Baltagi
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsSlim Baltagi
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeSlim Baltagi
 
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitSlim Baltagi
 
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
Apache Fink 1.0: A New Era  for Real-World Streaming AnalyticsApache Fink 1.0: A New Era  for Real-World Streaming Analytics
Apache Fink 1.0: A New Era for Real-World Streaming AnalyticsSlim Baltagi
 
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksOverview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksSlim Baltagi
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
 
Apache Flink community Update for March 2016 - Slim Baltagi
Apache Flink community Update for March 2016 - Slim BaltagiApache Flink community Update for March 2016 - Slim Baltagi
Apache Flink community Update for March 2016 - Slim BaltagiSlim Baltagi
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiSlim Baltagi
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Slim Baltagi
 
Unified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkUnified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkSlim Baltagi
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 
Apache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini PalthepuApache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini PalthepuSlim Baltagi
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkSlim Baltagi
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiSlim Baltagi
 
Big Data at CME Group: Challenges and Opportunities
Big Data at CME Group: Challenges and Opportunities Big Data at CME Group: Challenges and Opportunities
Big Data at CME Group: Challenges and Opportunities Slim Baltagi
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopSlim Baltagi
 

More from Slim Baltagi (20)

How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?
 
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-BaltagiModern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
 
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
 
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
Apache Fink 1.0: A New Era  for Real-World Streaming AnalyticsApache Fink 1.0: A New Era  for Real-World Streaming Analytics
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
 
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksOverview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
Modern big data and machine learning in the era of cloud, Docker and Kubernetes

  • 3. 1. Key takeaways • There is a major shift in web and mobile application architecture from the ‘old-school’ one to a modern ‘microservices’ architecture based on containers. Kubernetes has been quite successful in managing those containers and running them in distributed computing environments. • Now enabling Big Data and Machine Learning on Kubernetes will allow IT organizations to standardize on the same Kubernetes infrastructure. This will propel adoption and reduce costs. • Kubeflow is an open source framework dedicated to making it easy to use the machine learning tool of your choice and deploy your ML applications at scale on Kubernetes. Kubeflow is becoming an industry standard as well! • Both Kubernetes and Kubeflow will enable IT organizations to focus more effort on applications rather than infrastructure.
  • 5. 2. What is Docker? The Docker logo is sort of a whale / boat hybrid, filled with shipping containers. The analogy is taken from freight transport where goods are shipped in containers.
  • 6. 2. What is Docker? Docker is an open source technology, released back in 2013, for development and deployment of applications in containers that package together an application’s code, libraries, configurations and software dependencies into container images.
  • 7. 2. What is Docker? A container is a runnable instance of an image. Container images can be pulled from a registry (such as Docker Hub at hub.docker.com, Azure Container Registry, …) and deployed anywhere the container runtime is installed: your laptop, servers on-premises, or in the cloud.
  • 8. 2. What is Docker? • Some of the advantages that Docker offers: • Identical environments: Deploy and run the same way whether in development, testing or production, so the application that you deploy to one environment is going to work in another. • Isolated environments for your individual applications • Version control: Instead of “patching”, new functionality is added to a microservice by replacing existing containers with ones that incorporate new functionality. • Portability: Easily move workloads between different versions of Linux, for example • Developer Productivity • Application Agility: How quickly you can evolve an application • Operational Efficiencies: containerized applications are easier to deploy. • Scale out (not up): simply start more containers
  • 10. 3. What is Kubernetes? • The Kubernetes logo is literally a boat’s steering wheel. • It should be an admiral’s hat because, as we will see, Kubernetes helps you manage a fleet of Docker ‘boats’, not just one!
  • 11. 3. What is Kubernetes? • Kubernetes (numeronym K8s) is an open source platform for automating deployment, scaling and management of containerized applications both in the cloud and on premises. • It was initially released by Google in 2014 and it is now managed by the Cloud Native Computing Foundation (CNCF). • Kubernetes has already been adopted by the largest public cloud vendors and technology providers. • Some of the companies providing Kubernetes Managed Services: Google Cloud Platform (GCP) – GKE; Microsoft Azure – AKS; Amazon AWS – EKS; Oracle – OKE; IBM Cloud Container Service; Red Hat – OpenShift; Pivotal – PKS; Alibaba Cloud Container Service for Kubernetes, … • Kubernetes is being embraced by even more software vendors and enterprises.
  • 12. 3. What is Kubernetes? • A Kubernetes cluster consists of at least one master node, which manages the cluster, and multiple worker nodes, where containerized applications run using Pods. • A Pod is a logical grouping of one or more containers. Pods enable multiple containers to run on a host machine and share resources such as storage, networking, and container runtime information.
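To make the Pod idea concrete, here is a minimal sketch in Python that builds a Pod manifest in which two containers share one volume. The `apiVersion`, `kind`, `volumes`, `emptyDir` and `volumeMounts` fields follow the core Kubernetes Pod schema; the function name, pod name, image names and mount path are hypothetical and chosen only for illustration.

```python
import json


def make_pod_manifest(name, containers, shared_volume="shared-data"):
    """Build a Pod manifest whose containers all mount one shared emptyDir volume.

    `containers` is a list of (container_name, image) pairs.
    All names used here are illustrative assumptions, not part of the slides.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # One volume declared at Pod level, visible to every container below.
            "volumes": [{"name": shared_volume, "emptyDir": {}}],
            "containers": [
                {
                    "name": container_name,
                    "image": image,
                    "volumeMounts": [
                        {"name": shared_volume, "mountPath": "/data"}
                    ],
                }
                for container_name, image in containers
            ],
        },
    }


pod = make_pod_manifest(
    "web-with-sidecar",
    [("web", "nginx:1.25"), ("log-shipper", "example/log-shipper:1.0")],
)
print(json.dumps(pod, indent=2))
```

The printed JSON could be saved to a file and submitted with `kubectl apply -f`, since the Kubernetes API accepts JSON as well as YAML. Containers in the same Pod also share a network namespace, so they can reach each other on `localhost`.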
  • 13. 3. What is Kubernetes? Some of the advantages that Kubernetes offers: • Kubernetes makes containers manageable • Portability between cloud and on-premises • Kubernetes’ cloud-agnostic design lets containerized applications run on any platform without any changes to the application code. • Kubernetes provides two types of auto-scaling: • pod auto-scaling, where more pods are automatically created in a cluster based on scaling rules, and • cluster auto-scaling, where more nodes are added to a cluster based on flexible rules. • Monitoring: Rather than having to rely on ad hoc monitoring approaches, system monitoring is built into Kubernetes and provides for a wide range of features: replicas, rolling updates, auto-scaling, etc. • Better cluster resource utilization
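The pod auto-scaling rule can be illustrated with a short sketch. The Horizontal Pod Autoscaler documentation gives the core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), and skips scaling when the ratio is within a tolerance of 1.0 (0.1 by default). The function below is a simplified illustration of that rule only; it ignores min/max replica bounds, stabilization windows and pods that are not ready, and its name and parameters are assumptions.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Simplified sketch of the Horizontal Pod Autoscaler replica calculation."""
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


scale_up = desired_replicas(4, current_metric=90, target_metric=60)
scale_down = desired_replicas(4, current_metric=30, target_metric=60)
no_change = desired_replicas(4, current_metric=62, target_metric=60)
print(scale_up, scale_down, no_change)  # prints: 6 2 4
```

With 4 pods averaging 90% CPU against a 60% target, the rule asks for 6 pods; at 30% it asks for 2; at 62% the ratio is inside the tolerance band, so nothing changes.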
  • 15. 4. Why Big Data on Kubernetes? • Big Data on Kubernetes is now a reality thanks to: • The Special Interest Group on Big Data in the Kubernetes community and the many companies collaborating on the related effort. • Newer Kubernetes features such as StatefulSets, custom schedulers, custom resources, custom controllers, container storage interface, … • More persistent storage options to run stateful applications on Kubernetes, depending on data type, such as object storage, file systems, software defined storage, … • More and more Big Data/Fast Data tools running on Kubernetes such as: Apache Spark, Apache Kafka, Apache Flink, Apache Cassandra, Apache Zookeeper, …
  • 16. 4. Why Big Data on Kubernetes? Example: Apache Spark on Kubernetes • Video: Submitting Spark jobs using Kubernetes scheduler on AKS. March 16, 2018 https://www.youtube.com/watch?v=T7pAZplLiCk • Article: Running Apache Spark jobs on AKS. March 15, 2018 https://docs.microsoft.com/en-us/azure/aks/spark-job • Blog: Apache Spark 2.3 with Native Kubernetes Support. March 15, 2018 • Docs: Running Spark on Kubernetes https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html
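As a rough illustration of what native Kubernetes support looks like from the command line, the sketch below assembles a `spark-submit` invocation in Python. The `k8s://` master prefix, `--deploy-mode cluster`, `spark.executor.instances` and `spark.kubernetes.container.image` come from the Spark "Running on Kubernetes" documentation; the API server address, image name and jar path are placeholders, and the helper function itself is hypothetical.

```python
import shlex


def build_spark_submit(api_server, image, app_jar, main_class,
                       name="spark-pi", executors=2):
    """Return the argv list for submitting a Spark job to a Kubernetes cluster."""
    return [
        "bin/spark-submit",
        "--master", f"k8s://{api_server}",   # Kubernetes API server as the Spark master
        "--deploy-mode", "cluster",          # the driver itself runs in a pod
        "--name", name,
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_jar,
    ]


cmd = build_spark_submit(
    api_server="https://example.invalid:6443",             # placeholder address
    image="example/spark:2.3.0",                           # placeholder image
    app_jar="local:///opt/spark/examples/jars/example.jar",  # placeholder path inside the image
    main_class="org.apache.spark.examples.SparkPi",
)
print(shlex.join(cmd))
```

The `local://` scheme tells Spark that the jar is already present inside the container image rather than on the submitting machine. Passing `cmd` to `subprocess.run` would launch the job, provided Spark is installed locally and the cluster is reachable.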
  • 17. 4. Why Big Data on Kubernetes? • There are many ways to run Big Data applications such as Spark. For example: • Standalone mode using dedicated resources • YARN cluster co-resident with Hadoop • Mesos cluster alongside other Mesos applications • So, why would you run Big Data applications on Kubernetes? • In addition to all the advantages that Kubernetes offers, the following ones are particularly relevant to Big Data applications: • A single container orchestrator for all your applications • Increased server utilization • Isolation between workloads • Reduction in operational overhead • Language-agnostic distributed computing clusters
  • 18. 4. Why Big Data on Kubernetes? • A single container orchestrator for all your applications: For example, Kubernetes can manage a broad range of workloads; no need to deal with YARN/HDFS for data processing and a separate container orchestrator for your other applications. This solves the problem of running Big Data applications in silos in their own clusters. • Increased server utilization: For example, share nodes between Spark and other applications by having a streaming application running to feed a streaming Spark pipeline, or an nginx pod to serve web traffic, without the need to statically partition nodes.
  • 19. 4. Why Big Data on Kubernetes? • Isolation between workloads: For example, Kubernetes allows you to safely co-schedule batch workloads like Spark on the same nodes as latency-sensitive servers. • Reduction in operational overhead. For example: Static clusters require greater operational know-how to do common tasks with Kafka, such as applying broker configuration updates, upgrading to a new version, and adding or decommissioning brokers. By using Kafka on Kubernetes, you can reduce the overhead for a number of common operational tasks with standard cluster resource manager features. • Containers and Kubernetes make great language-agnostic distributed computing clusters.
  • 21. 5. Why Machine Learning on Kubernetes? Machine Learning on Kubernetes is now a reality thanks to: • Developments in Kubernetes such as support for stateful applications, extension points, … • Hardware acceleration for Kubernetes from Nvidia (GPU), Google (TPU: Tensor Processing Unit), … • Machine Learning tools running on Kubernetes such as: Kubeflow, Paddle, Seldon, RiseML, Anaconda, H2O, … • Emergence of Kubeflow, an open source framework dedicated to making it easy to use the machine learning tool of your choice and deploy your ML applications in distributed mode on Kubernetes. Kubeflow is becoming the industry standard as well! • Services such as the one from Microsoft to train and serve TensorFlow models at scale with Kubernetes and Kubeflow on Azure Kubernetes Service (AKS)
  • 22. 5. Why Machine Learning on Kubernetes? • You've created a machine learning model, using a tool of choice such as TensorFlow, PyTorch, or scikit-learn… Now what? • How can you ensure that the model is deployed to production and can scale as needed on incoming data? • How can you seamlessly migrate a model from your local laptop / virtual machine to your cloud platform of choice? • Kubeflow includes: • the JupyterHub platform for creating and managing Jupyter notebook servers that are used by data science and research groups • a TensorFlow Custom Resource for managing compute resources to a specific cluster size • a TensorFlow Serving container to house the machine learning application
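The TensorFlow custom resource mentioned above is exposed by Kubeflow as a `TFJob` object. The sketch below builds such a manifest in Python to show how the cluster size becomes a single setting. It assumes the `kubeflow.org/v1` API version and the `tfReplicaSpecs` layout used by recent Kubeflow training operator releases; releases from the time of this talk used earlier alpha versions with a different layout, so treat the field names as an assumption to check against the installed version. The job name, image and function name are hypothetical.

```python
import json


def make_tfjob(name, image, workers, api_version="kubeflow.org/v1"):
    """Sketch of a TFJob manifest with a configurable number of worker replicas.

    The schema is an assumption based on recent Kubeflow releases; verify it
    against the version actually installed in the cluster.
    """
    return {
        "apiVersion": api_version,
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": workers,  # the single setting that sizes the training cluster
                    "restartPolicy": "OnFailure",
                    "template": {
                        "spec": {
                            "containers": [
                                # The training operator expects this container name.
                                {"name": "tensorflow", "image": image}
                            ]
                        }
                    },
                }
            }
        },
    }


job = make_tfjob("mnist-train", "example/tf-mnist:latest", workers=3)
print(json.dumps(job, indent=2))
```

Changing `workers` and re-submitting the manifest is all it takes to resize the distributed training job; Kubernetes schedules the worker pods onto whatever CPU or GPU nodes are available.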
  • 23. 5. Why Machine Learning on Kubernetes? • Distributed training instead of sequential: a huge time saver for large training jobs • Enabling Machine Learning at large scale • Mix of GPU and CPU nodes to serve both as a training and serving platform • IT can better support data science and machine learning applications with Kubernetes as the common orchestration layer for all (containerized) applications • Ability for IT to create self-service environments for data scientists and other data users. • Single scheduling solution for multiple environments, on premises or in multiple clouds • Better resource utilization through centralized scheduling of data science and other containerized applications
  • 25. 6. How to get started? • Learn from some free tutorials in your browser! • Docker & Containers https://www.katacoda.com/courses/docker • Kubernetes https://www.katacoda.com/courses/kubernetes • Kubeflow https://www.katacoda.com/kubeflow • Watch some demos • Sentiment Analysis using Kubernetes and Kubeflow, Google, May 31st 2018 https://www.youtube.com/watch?v=-ZlIuQXyD1A • OSS Unboxing – Kubeflow, Lachlan Evenson, Microsoft, May 11th 2018 https://www.youtube.com/watch?v=uL_pqP_HgcY • Do some labs • Labs for Training and Serving TensorFlow Models with Kubernetes and Kubeflow on Azure Container Service (AKS) https://github.com/Azure/kubeflow-labs • Introduction to Kubeflow on Google Kubernetes Engine (GKE) https://codelabs.developers.google.com/codelabs/kubeflow-introduction/index.html?index=..%2F..%2Fio2018#0
  • 26. Thank you! Let’s keep in touch! @SlimBaltagi https://www.linkedin.com/in/slimbaltagi sbaltagi@gmail.com