Containerized architectures for deep learning
Antje Barth @anbarth
Me
Data Enthusiast
Technical Evangelist
AI / ML / Deep Learning
Container / Kubernetes
Big Data
#CodeLikeAGirl
Agenda
• Motivation
• ML pipeline tools and platforms
• Machine Learning on Kubernetes
• Deep Learning Demo
• Conclusion
ML – Helicopter view
How good are your predictions?
• Accuracy
• Optimization
ML – The (enterprise) reality
• Wrangle large datasets
• Unify disparate systems
• Composability
• Manage pipeline complexity
• Improve training/serving consistency
• Improve portability
• Improve model quality
• Manage versions
[Diagram: the enterprise ML workflow, with stages spread across separate systems]
Stages: Building a model, Data ingestion, Data analysis, Data transform, Data validation, Data splitting, Ad-hoc training, Model validation, Training at scale, Distributed training, HP tuning, Experiment tracking, Data versioning, Feature store, Roll-out, Serving, Monitoring, Logging
Systems: SYSTEM 1, SYSTEM 1.5, SYSTEM 2, SYSTEM 3, SYSTEM 3.5, SYSTEM 4, SYSTEM 5, SYSTEM 6
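To make the workflow stages concrete, here is a minimal sketch of a few of them (ingestion, validation, splitting, training, model validation) chained as plain Python functions. The toy dataset, the threshold "model", and all function names are assumptions made for illustration; they are not taken from the talk.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, int]  # (feature, label); hypothetical toy schema


def ingest() -> List[Row]:
    """Data ingestion stand-in: ten rows, label is 1 when the feature is >= 0.5."""
    return [(x / 10.0, int(x >= 5)) for x in range(10)]


def validate(rows: Sequence[Row]) -> List[Row]:
    """Data validation: reject rows with out-of-range features or labels."""
    for feature, label in rows:
        if not 0.0 <= feature <= 1.0 or label not in (0, 1):
            raise ValueError(f"invalid row: {(feature, label)}")
    return list(rows)


def split(rows: Sequence[Row]) -> Tuple[List[Row], List[Row]]:
    """Data splitting: hold out every row whose index is 2 modulo 5."""
    train_rows = [r for i, r in enumerate(rows) if i % 5 != 2]
    test_rows = [r for i, r in enumerate(rows) if i % 5 == 2]
    return train_rows, test_rows


def train(rows: Sequence[Row]) -> float:
    """Training: place a threshold midway between the two classes."""
    negatives = [f for f, y in rows if y == 0]
    positives = [f for f, y in rows if y == 1]
    return (max(negatives) + min(positives)) / 2.0


def evaluate(threshold: float, rows: Sequence[Row]) -> float:
    """Model validation: accuracy of the threshold rule on held-out rows."""
    correct = sum(int(f >= threshold) == y for f, y in rows)
    return correct / len(rows)


def run_pipeline() -> Tuple[float, float]:
    """Chain the stages; each one consumes only the previous stage's output."""
    rows = validate(ingest())
    train_rows, test_rows = split(rows)
    threshold = train(train_rows)
    return threshold, evaluate(threshold, test_rows)
```

In the enterprise setting described above, each of these functions would typically live in a different system, which is where the consistency and versioning problems come from.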
The rise of ML pipeline tools & platforms
Agenda
• Motivation
• ML pipeline tools and platforms
• Machine Learning on Kubernetes
• Deep Learning Demo
• Conclusion
Quick comparison
• Apache Airflow is a platform to programmatically author, schedule and monitor workflows. https://airflow.apache.org/
• The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. https://www.kubeflow.org/
• TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines. https://www.tensorflow.org/tfx
• MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment. https://mlflow.org/
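What these tools have in common is that a workflow is described as tasks plus dependencies, and the platform decides the execution order. The sketch below illustrates that idea with only the Python standard library (`graphlib`, Python 3.9+). It is not the API of Airflow, Kubeflow, TFX, or MLflow; the task names and the `run_workflow` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_workflow(
    tasks: Dict[str, Callable[[], None]],
    upstream: Dict[str, Set[str]],
) -> List[str]:
    """Run every task after all of its upstream tasks; return the order used."""
    order = list(TopologicalSorter(upstream).static_order())
    for name in order:
        tasks[name]()  # a real scheduler would also retry, log, and monitor
    return order


# Hypothetical four-step workflow: each task depends on the previous one.
executed: List[str] = []
tasks = {
    name: (lambda n=name: executed.append(n))
    for name in ("ingest", "transform", "train", "deploy")
}
upstream = {
    "transform": {"ingest"},
    "train": {"transform"},
    "deploy": {"train"},
}
order = run_workflow(tasks, upstream)
```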
How to scale to production?
Composability
Portability
Scalability
Wait a minute…
Containers?
Virtual Machines are Computers in a Box
Containers are Applications in a Box
Kubernetes?
Kubernetes is an API and agents
The Kubernetes API provides containers with scheduling, configuration, network, and storage
The Kubernetes runtime manages the containers
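The "API and agents" split can be pictured as a reconciliation loop: the API stores the desired state, and agents compare it with what is actually running and act on the difference. The following is a toy sketch of that comparison under simplifying assumptions (state reduced to replica counts per workload name); the `reconcile` function and the workload names are hypothetical and do not correspond to Kubernetes source code.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], observed: Dict[str, int]) -> List[str]:
    """Return the actions that move observed replica counts to the desired ones."""
    actions: List[str] = []
    for name in sorted(set(desired) | set(observed)):
        want = desired.get(name, 0)
        have = observed.get(name, 0)
        if want > have:
            actions.append(f"start {want - have} x {name}")
        elif want < have:
            actions.append(f"stop {have - want} x {name}")
    return actions


# Desired state as declared through the API vs. what an agent observes.
actions = reconcile(
    desired={"trainer": 6, "server": 2},
    observed={"trainer": 4, "old-job": 1},
)
```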
Agenda
• Motivation
• ML pipeline tools and platforms
• Machine Learning on Kubernetes
• Deep Learning Demo
• Conclusion
Machine Learning on Kubernetes
• Kubernetes-native
  • Run wherever k8s runs
  • Move between local – dev – test – prod – cloud
• Use k8s to manage ML tasks
  • CRDs (UDTs) for distributed training
• Adopt k8s patterns
  • Microservices
  • Manage infrastructure declaratively
• Support for multiple ML frameworks
  • TensorFlow, PyTorch, scikit-learn, XGBoost, etc.
Kubernetes ML/DL
Landscape
Source: https://twimlai.com/kubernetes-ebook/
https://landscape.lfai.foundation/
https://landscape.cncf.io/
Introducing Kubeflow
Make it easy for everyone to develop,
deploy, and manage portable, scalable
ML everywhere.
Kubeflow components
Composability
• Build and deploy re-usable, portable, scalable machine learning workflows based on Docker containers.
• Use the libraries/frameworks of your choice
Example: The Kubeflow "deployer" component lets you deploy as a plain TF Serving model server:
https://github.com/kubeflow/pipelines/tree/master/components/kubeflow/deployer
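One way to picture container-based composability: each step is just an image plus arguments, so steps can be swapped or reused without touching the others. The sketch below models that with a small dataclass and renders the equivalent `docker run` command line. The `Component` type, the image names under `example.com`, and the arguments are hypothetical; this is not how Kubeflow Pipelines defines components.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Component:
    """A reusable pipeline step: a container image plus its arguments."""
    name: str
    image: str
    args: List[str] = field(default_factory=list)


def docker_run_command(component: Component) -> List[str]:
    """Render the argv that would run this component as a one-off container."""
    return [
        "docker", "run", "--rm",
        "--name", component.name,
        component.image,
        *component.args,
    ]


# A hypothetical two-step pipeline; either step can be replaced independently.
pipeline = [
    Component("train", "example.com/ml/train:1.0", ["--epochs", "5"]),
    Component("deploy", "example.com/ml/deployer:1.0", ["--model-dir", "/models"]),
]
commands = [docker_run_command(c) for c in pipeline]
```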
Back to our ML enterprise workflow!
[Diagram: the enterprise ML workflow stages shown earlier, with the labels METADATA and SERVING]
Portability
Containers for Deep Learning
[Diagram: container stack]
• Container image: TensorFlow (CPU: MKL; GPU: cuDNN, cuBLAS, NCCL, CUDA toolkit), Keras, horovod, numpy, scipy, scikit-learn, pandas, openmpi, Python, others…
• Host: container runtime, NVIDIA drivers, host OS, infrastructure
ML environments that are:
[Diagram: the same container image is pushed from the Development System to a container registry and pulled onto the Training Cluster; each side provides its own container runtime, NVIDIA drivers, and host OS]
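The portability argument rests on the image carrying the framework and its libraries while the host only supplies drivers and a runtime. As a rough illustration, the sketch below renders a Dockerfile for such an image from a package list. The base image tag `tensorflow/tensorflow:latest-gpu`, the `train.py` entry point, and the `render_dockerfile` helper are assumptions for the example, not details from the talk.

```python
from typing import List


def render_dockerfile(base_image: str, pip_packages: List[str]) -> str:
    """Render a minimal Dockerfile: base image, extra packages, training script."""
    lines = [f"FROM {base_image}"]
    if pip_packages:
        # Sort for a stable, reviewable Dockerfile regardless of input order.
        packages = " ".join(sorted(pip_packages))
        lines.append(f"RUN pip install --no-cache-dir {packages}")
    lines.append("COPY train.py /app/train.py")  # hypothetical training script
    lines.append('ENTRYPOINT ["python", "/app/train.py"]')
    return "\n".join(lines) + "\n"


dockerfile = render_dockerfile(
    "tensorflow/tensorflow:latest-gpu",  # assumed base image
    ["scikit-learn", "pandas", "horovod"],
)
```

The same rendered file would be built once on the development system, pushed to a registry, and pulled unchanged onto the training cluster.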
Scalability
• Kubernetes - Autoscaling Jobs
• Describe the job, let Kubernetes take care of the rest
• CPU, RAM, Accelerators
• TF Jobs delete themselves when finished, node pool will auto scale back down
Data Scientist to IT Ops: "Model works great! But I need six nodes."
Credit: @aronchick
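# Simplified TFJob sketch from the slide; a real manifest nests the image and resources inside a pod template.
apiVersion: "kubeflow.org/v1alpha1"
kind: "TFJob"
spec:
  replicaSpecs:
    replicas: 6
    CPU: 1
    GPU: 1
    containers: gcr.io/myco/myjob:1.0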
Credit: @aronchick
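For readers who want something closer to an appliable manifest, here is a hedged sketch that builds a TFJob description as a Python dictionary and serializes it to JSON. The field layout follows the later `kubeflow.org/v1` TFJob schema as I understand it (`tfReplicaSpecs` with a `Worker` entry), not the `v1alpha1` version named on the slide, so check it against the operator version you actually run. The job name and the `tfjob_manifest` helper are hypothetical.

```python
import json
from typing import Any, Dict


def tfjob_manifest(
    name: str, image: str, replicas: int, gpus_per_replica: int
) -> Dict[str, Any]:
    """Build a TFJob manifest (assumed kubeflow.org/v1 layout) as a dict."""
    container = {
        "name": "tensorflow",
        "image": image,
        "resources": {
            "limits": {"cpu": "1", "nvidia.com/gpu": gpus_per_replica}
        },
    }
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": replicas,
                    "restartPolicy": "OnFailure",
                    "template": {"spec": {"containers": [container]}},
                }
            }
        },
    }


# Six workers with one GPU each, using the image name from the slide.
manifest = tfjob_manifest("myjob", "gcr.io/myco/myjob:1.0", 6, 1)
manifest_json = json.dumps(manifest, indent=2)
```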
[Diagram: Data Scientist and IT Ops, with six GPUs provisioned]
Job’s done!
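The scale-up and scale-down behaviour can be reasoned about with simple arithmetic: how many nodes does a job need while it runs, and how many once it has finished? The sketch below is a simplified model that considers GPUs only; a real cluster autoscaler also weighs CPU, memory, and other pods. The `nodes_needed` helper and its parameters are hypothetical.

```python
import math


def nodes_needed(replicas: int, gpus_per_replica: int, gpus_per_node: int) -> int:
    """Nodes required when each replica must fit entirely on a single node."""
    if replicas == 0:
        return 0  # job finished: the node pool can scale back down
    if gpus_per_replica < 1:
        raise ValueError("this model only covers replicas that request GPUs")
    if gpus_per_replica > gpus_per_node:
        raise ValueError("a replica does not fit on one node")
    replicas_per_node = gpus_per_node // gpus_per_replica
    return math.ceil(replicas / replicas_per_node)


# The six-replica job from the slide, on single-GPU and on four-GPU nodes.
single_gpu_nodes = nodes_needed(6, 1, 1)
four_gpu_nodes = nodes_needed(6, 1, 4)
after_job = nodes_needed(0, 1, 4)
```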
Agenda
• Motivation
• ML pipeline tools and platforms
• Container > Kubernetes > Kubeflow
• Deep Learning Demo
• Conclusion
DEMO “Doppelganger App”
Implementing Image Similarity search
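The slides do not say how the demo computes similarity. A common approach, assumed here, is to map each image to an embedding vector and rank gallery images by cosine similarity to the query. The sketch uses tiny hand-written vectors in place of real embeddings, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def most_similar(
    query: Sequence[float], gallery: Dict[str, Sequence[float]], top_k: int = 1
) -> List[str]:
    """Names of the top_k gallery entries, most similar to the query first."""
    ranked = sorted(
        gallery,
        key=lambda name: cosine_similarity(query, gallery[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy two-dimensional "embeddings"; a real model would produce hundreds of dims.
gallery = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
matches = most_similar([0.9, 0.1], gallery, top_k=2)
```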
Recap:
The “Kube”flow
• Deploy Kubernetes & Kubeflow
• Experiment in Jupyter
• Build Docker Image
• Train at Scale
• Build Model Server
• Deploy Model
• Integrate Model into App
• Operate
[Architecture diagram]
• Model Training: the Data Scientist works in a Jupyter Notebook; a Dockerfile defines the Training Job, which runs as "Train Model" pods
• Model Serving: a Dockerfile defines the Inference Service; Seldon Core Engine pods serve the Doppelganger Model
• Istio Gateway (Traffic Routing) exposes a {REST API}, called with curl…
• Pods run on Kubernetes Worker Nodes #1, #2, #3
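As an illustration of the REST call at the end of that path, the sketch below builds (but does not send) a prediction request in the shape Seldon Core's v1 REST protocol is commonly documented to accept: a POST to `/seldon/<namespace>/<deployment>/api/v1.0/predictions` with a `{"data": {"ndarray": ...}}` body. Treat the path, the payload shape, and every name here (gateway URL, namespace, deployment) as assumptions to verify against your own deployment.

```python
import json
from typing import Sequence
from urllib.request import Request


def build_prediction_request(
    gateway_url: str, namespace: str, deployment: str, features: Sequence[float]
) -> Request:
    """Build a POST request for one feature vector; nothing is sent here."""
    url = (
        f"{gateway_url.rstrip('/')}/seldon/{namespace}/{deployment}"
        "/api/v1.0/predictions"
    )
    body = json.dumps({"data": {"ndarray": [list(features)]}}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical gateway address and deployment name.
request = build_prediction_request(
    "http://gateway.example.com/", "kubeflow", "doppelganger", [0.1, 0.2]
)
```

Passing the request to `urllib.request.urlopen` would perform the call once a gateway is actually reachable.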
Agenda
• Motivation
• ML pipeline tools and platforms
• Machine Learning on Kubernetes
• Deep Learning Demo
• Conclusion
Conclusion & Take-aways
• Platform matters
• Composability – Portability – Scalability
• Containerized architectures
• Kubernetes + Machine Learning = Kubeflow
• Start building!
https://github.com/antje/doppelganger
More information
• Kubeflow
https://www.kubeflow.org/
https://github.com/kubeflow/kubeflow
• TensorFlow Extended (TFX)
https://www.tensorflow.org/tfx
• The Definitive Guide to Machine Learning Platforms
https://twimlai.com/mlplatforms-ebook/
• Amazon Elastic Kubernetes Service (Amazon EKS)
https://eksworkshop.com
https://github.com/aws-samples/machine-learning-using-k8s
Thank you!
antje.official
antje@anbarth
Antje Barth

Containerized architectures for deep learning

  • 1. Containerized architectures for deep learning Antje Barth @anbarth
  • 2. Me Data Enthusiast Technical Evangelist AI / ML / Deep Learning Container / Kubernetes Big Data #CodeLikeAGirl
  • 3. Agenda • Motivation • ML pipeline tools and platforms • Machine Learning on Kubernetes • Deep Learning Demo • Conclusion
  • 4. Agenda • Motivation • ML pipeline tools and platforms • Machine Learning on Kubernetes • Deep Learning Demo • Conclusion
  • 5.
  • 6. ML – Helicopter view How good are your predictions?• Accuracy • Optimization
  • 7.
  • 8. ML – The (enterprise) reality • Wrangle large datasets • Unify disparate systems • Composability • Manage pipeline complexity • Improve training/serving consistency • Improve portability • Improve model quality • Manage versions Building a model Data ingestion Data analysis Data transform Data validation Data splitting Ad-hoc Training Model validation Logging Roll-out Serving Monitoring Distributed Training Training at scale Data Versioning HP Tuning Experiment Tracking Feature Store SYSTEM 1 SYSTEM 2 SYSTEM 3 SYSTEM 4 SYSTEM 5 SYSTEM 6 SYSTEM 3.5 SYSTEM 1.5
  • 9.
  • 10. The rise of ML pipeline tools & platforms
  • 11. Agenda • Motivation • ML pipeline tools and platforms • Machine Learning on Kubernetes • Deep Learning Demo • Conclusion
  • 12. Quick comparison Apache Airflow is a platform to programmatically author, schedule and monitor workflows (https://airflow.apache.org/). The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable (https://www.kubeflow.org/). TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines (https://www.tensorflow.org/tfx). MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment (https://mlflow.org/).
  • 13. How to scale to production? Composability Portability Scalability
  • 16. Containers? Virtual Machines are Computers in a Box. Containers are Applications in a Box.
  • 18. Kubernetes is an API and agents. The Kubernetes API provides containers with scheduling, configuration, networking, and storage. The Kubernetes runtime manages the containers.
  • 19. Agenda • Motivation • ML pipeline tools and platforms • Machine Learning on Kubernetes • Deep Learning Demo • Conclusion
  • 20. Machine Learning on Kubernetes • Kubernetes-native • Run wherever k8s runs • Move between local – dev – test – prod – cloud • Use k8s to manage ML tasks • CRDs (UDTs) for distributed training • Adopt k8s patterns • Microservices • Manage infrastructure declaratively • Support for multiple ML frameworks • TensorFlow, PyTorch, scikit-learn, XGBoost, etc.
  • 22. Introducing Kubeflow Make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere.
  • 24. Composability • Build and deploy re-usable, portable, scalable machine learning workflows based on Docker containers. • Use the libraries/frameworks of your choice. Example: the Kubeflow "deployer" component lets you deploy as a plain TF Serving model server: https://github.com/kubeflow/pipelines/tree/master/components/kubeflow/deployer
  • 25. METADATA SERVING Back to our ML enterprise workflow! Building a model Data ingestion Data analysis Data transform Data validation Data splitting Ad-hoc Training Model validation Logging Roll-out Serving Monitoring Distributed Training Training at scale Data Versioning HP Tuning Experiment Tracking Feature Store
  • 26. Portability Containers for Deep Learning ML environments that are: [Diagram: TensorFlow Container Image (Packages: Python, TensorFlow, Keras, horovod, numpy, scipy, scikit-learn, pandas, openmpi, others…; CPU: mkl; GPU: cudnn, cublas, NCCL, CUDA toolkit) on top of Container runtime, NVIDIA drivers, Host OS, Infrastructure]
  • 27. [Diagram: the same TensorFlow Container Image (Python, TensorFlow, Keras, horovod, numpy, scipy, scikit-learn, pandas, openmpi, others…; CPU: mkl; GPU: cudnn, cublas, NCCL, CUDA toolkit) is pushed from the Development System (Container runtime, NVIDIA drivers, Host OS) to a Container registry and pulled onto the Training Cluster (Container runtime, NVIDIA drivers, Host OS)]
  • 28. Scalability • Kubernetes - Autoscaling Jobs • Describe the job, let Kubernetes take care of the rest • CPU, RAM, Accelerators • TF Jobs delete themselves when finished, node pool will auto scale back down Model works great! But I need six nodes. Data Scientist IT Ops Credit: @aronchick
  • 29. Scalability • Kubernetes - Autoscaling Jobs • Describe the job, let Kubernetes take care of the rest • CPU, RAM, Accelerators • TF Jobs delete themselves when finished, node pool will auto scale back down Data Scientist IT Ops apiVersion: "kubeflow.org/v1alpha1" kind: "TFJob" spec: replicaSpecs: replicas: 6 CPU: 1 GPU: 1 containers: gcr.io/myco/myjob:1.0 Credit: @aronchick
  • 30. Scalability • Kubernetes - Autoscaling Jobs • Describe the job, let Kubernetes take care of the rest • CPU, RAM, Accelerators • TF Jobs delete themselves when finished, node pool will auto scale back down Data Scientist IT Ops GPU GPU GPU GPU GPU GPU Credit: @aronchick
  • 31. Scalability • Kubernetes - Autoscaling Jobs • Describe the job, let Kubernetes take care of the rest • CPU, RAM, Accelerators • TF Jobs delete themselves when finished, node pool will auto scale back down Job’s done! Data Scientist IT Ops Credit: @aronchick
  • 32. Agenda • Motivation • ML pipeline tools and platforms • Container > Kubernetes > Kubeflow • Deep Learning Demo • Conclusion
  • 35. Recap: The “Kube”flow • Deploy Kubernetes & Kubeflow • Experiment in Jupyter • Build Docker Image • Train at Scale • Build Model Server • Deploy Model • Integrate Model into App • Operate Model [Diagram: Data Scientist, Jupyter Notebook, Dockerfile (Training Job), Dockerfile (Inference Service); Training: pods running Train Model; Model Serving: pods running the Seldon Core Engine with the Doppelganger Model; Kubernetes Worker Nodes #1 #2 #3; Istio Gateway (Traffic Routing); {REST API} curl…]
  • 36. Agenda • Motivation • ML pipeline tools and platforms • Machine Learning on Kubernetes • Deep Learning Demo • Conclusion
  • 37. Conclusion & Take-aways • Platform matters • Composability – Portability – Scalability • Containerized architectures • Kubernetes + Machine Learning = Kubeflow • Start building! https://github.com/antje/doppelganger
  • 38. More information • Kubeflow https://www.kubeflow.org/ https://github.com/kubeflow/kubeflow • TensorFlow Extended (TFX) https://www.tensorflow.org/tfx • The Definitive Guide to Machine Learning Platforms https://twimlai.com/mlplatforms-ebook/ • Amazon Elastic Kubernetes Service (Amazon EKS) https://eksworkshop.com https://github.com/aws-samples/machine-learning-using-k8s
  • 39. Session page on conference website O’Reilly Events App Rate today’s session
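
The TFJob spec on slide 29 is abbreviated to fit the slide. As a rough sketch, the same request (six replicas with one CPU and one GPU each, image gcr.io/myco/myjob:1.0) could be written out as a full manifest along the following lines. The field layout assumes the later `kubeflow.org/v1` TFJob schema rather than the `v1alpha1` version shown on the slide, and the job name `myjob` and the helper `build_tfjob` are hypothetical names used only for illustration.

```python
import json


def build_tfjob(name, image, replicas, cpus, gpus):
    """Return a TFJob manifest as a plain dict (hypothetical helper).

    Assumes the kubeflow.org/v1 layout: spec.tfReplicaSpecs.<role>.
    """
    container = {
        # Assumption: the TFJob operator looks for a container named "tensorflow".
        "name": "tensorflow",
        "image": image,
        # Per-replica resource limits, matching "CPU: 1 GPU: 1" on the slide.
        "resources": {"limits": {"cpu": cpus, "nvidia.com/gpu": gpus}},
    }
    worker = {
        "replicas": replicas,
        "restartPolicy": "OnFailure",
        "template": {"spec": {"containers": [container]}},
    }
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {"tfReplicaSpecs": {"Worker": worker}},
    }


manifest = build_tfjob("myjob", "gcr.io/myco/myjob:1.0", replicas=6, cpus=1, gpus=1)

# Print the manifest as JSON so it can be reviewed or saved to a file.
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be saved to a file and submitted with `kubectl apply -f`, assuming a cluster with the Kubeflow training operator installed. Building the manifest in code keeps the replica count and resource limits as ordinary parameters, which mirrors the "describe the job, let Kubernetes take care of the rest" idea from the Scalability slides.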