Containerized architectures for deep learning
Antje Barth @anbarth


O'Reilly AI London, October 2019


  1. Containerized architectures for deep learning. Antje Barth @anbarth
  2. Me: Data Enthusiast, Technical Evangelist. AI / ML / Deep Learning, Container / Kubernetes, Big Data. #CodeLikeAGirl
  3. Agenda • Motivation • ML pipeline tools and platforms • Machine Learning on Kubernetes • Deep Learning Demo • Conclusion
  4. Agenda • Motivation • ML pipeline tools and platforms • Machine Learning on Kubernetes • Deep Learning Demo • Conclusion
  5. ML – Helicopter view. How good are your predictions? • Accuracy • Optimization
  6. ML – The (enterprise) reality • Wrangle large datasets • Unify disparate systems • Composability • Manage pipeline complexity • Improve training/serving consistency • Improve portability • Improve model quality • Manage versions [Diagram: workflow steps Building a model, Data ingestion, Data analysis, Data transform, Data validation, Data splitting, Ad-hoc Training, Model validation, Logging, Roll-out, Serving, Monitoring, Distributed Training, Training at scale, Data Versioning, HP Tuning, Experiment Tracking, Feature Store, spread across SYSTEM 1, SYSTEM 2, SYSTEM 3, SYSTEM 4, SYSTEM 5, SYSTEM 6, SYSTEM 3.5, SYSTEM 1.5]
  7. The rise of ML pipeline tools & platforms
  8. Agenda • Motivation • ML pipeline tools and platforms • Machine Learning on Kubernetes • Deep Learning Demo • Conclusion
  9. Quick comparison • Apache Airflow is a platform to programmatically author, schedule and monitor workflows (https://airflow.apache.org/). • The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable (https://www.kubeflow.org/). • TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines (https://www.tensorflow.org/tfx). • MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment (https://mlflow.org/).
  10. How to scale to production? • Composability • Portability • Scalability
  11. Wait a minute…
  12. Containers? Virtual Machines are Computers in a Box. Containers are Applications in a Box.
  13. Kubernetes? {api}
  14. Kubernetes is an API and agents. The Kubernetes API provides containers with scheduling, configuration, network, and storage. The Kubernetes runtime manages the containers.
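
To make the four concerns on the Kubernetes slide concrete, here is a minimal sketch of a Pod manifest built as a plain Python dictionary. It is not taken from the talk: the Pod name, the image `example/trainer:1.0`, the `EPOCHS` variable, and the `/data` mount path are all hypothetical. The field names are standard Pod spec fields; the comments mark which field relates to which concern.

```python
import json

# Hypothetical Pod for a training container. Names, image, and values are
# placeholders chosen for illustration only.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "trainer"},
    "spec": {
        "containers": [
            {
                "name": "trainer",
                "image": "example/trainer:1.0",
                # Scheduling: the scheduler picks a node that can satisfy these requests.
                "resources": {"requests": {"cpu": "1", "memory": "2Gi"}},
                # Configuration: environment variables injected into the container.
                "env": [{"name": "EPOCHS", "value": "10"}],
                # Network: the port the container listens on.
                "ports": [{"containerPort": 8080}],
                # Storage: mount the volume declared under spec.volumes.
                "volumeMounts": [{"name": "scratch", "mountPath": "/data"}],
            }
        ],
        "volumes": [{"name": "scratch", "emptyDir": {}}],
    },
}

# kubectl accepts JSON manifests as well as YAML, so this output could be
# saved to a file and applied with `kubectl apply -f`.
print(json.dumps(pod, indent=2))
```

The point of the sketch is that everything the container needs is described as data and handed to the API, rather than configured on a specific machine.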
  15. Agenda • Motivation • ML pipeline tools and platforms • Machine Learning on Kubernetes • Deep Learning Demo • Conclusion
  16. Machine Learning on Kubernetes • Kubernetes-native: run wherever k8s runs; move between local – dev – test – prod – cloud • Use k8s to manage ML tasks: CRDs (UDTs) for distributed training • Adopt k8s patterns: microservices; manage infrastructure declaratively • Support for multiple ML frameworks: TensorFlow, PyTorch, scikit-learn, XGBoost, etc.
  17. Kubernetes ML/DL Landscape. Source: https://twimlai.com/kubernetes-ebook/ https://landscape.lfai.foundation/ https://landscape.cncf.io/
  18. Introducing Kubeflow: Make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere.
  19. Kubeflow components [diagram]
  20. Composability • Build and deploy re-usable, portable, scalable machine learning workflows based on Docker containers. • Use the libraries/frameworks of your choice. Example: the Kubeflow "deployer" component lets you deploy as a plain TF Serving model server: https://github.com/kubeflow/pipelines/tree/master/components/kubeflow/deployer
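
The composability idea (a workflow assembled from reusable, containerized steps) can be sketched without any pipeline framework. The example below is an illustration only and does not use the Kubeflow Pipelines API: the `Step` class, the step names, and the `example/...` image names are hypothetical. Each step names the container image that would run it and the steps it depends on, and the standard-library `graphlib` module (Python 3.9+) derives a valid execution order.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Step:
    """One reusable workflow step, backed by a container image (hypothetical)."""
    name: str
    image: str
    after: tuple = ()  # names of steps that must finish first


def execution_order(steps):
    """Return step names in an order that respects every dependency."""
    graph = {step.name: set(step.after) for step in steps}
    return list(TopologicalSorter(graph).static_order())


# Steps are deliberately listed out of order; the dependencies decide the order.
steps = [
    Step("serve", "example/serve:1.0", after=("validate-model",)),
    Step("train", "example/train:1.0", after=("transform",)),
    Step("ingest", "example/ingest:1.0"),
    Step("transform", "example/transform:1.0", after=("ingest",)),
    Step("validate-model", "example/validate:1.0", after=("train",)),
]

print(execution_order(steps))
# ['ingest', 'transform', 'train', 'validate-model', 'serve']
```

Because each step only declares an image and its dependencies, a step such as `train` can be swapped for a different framework's image without touching the rest of the workflow.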
  21. Back to our ML enterprise workflow! [Diagram: METADATA and SERVING over the workflow steps Building a model, Data ingestion, Data analysis, Data transform, Data validation, Data splitting, Ad-hoc Training, Model validation, Logging, Roll-out, Serving, Monitoring, Distributed Training, Training at scale, Data Versioning, HP Tuning, Experiment Tracking, Feature Store]
  22. Portability: Containers for Deep Learning. [Diagram: Infrastructure, Host OS, NVIDIA drivers, Container runtime, TensorFlow Container Image. Packages (CPU and GPU): TensorFlow, MKL, cuDNN, cuBLAS, NCCL, CUDA toolkit, Keras, Horovod, NumPy, SciPy, scikit-learn, pandas, Open MPI, Python, others…] ML environments that are:
  23. [Diagram: the TensorFlow Container Image (TensorFlow, MKL, cuDNN, cuBLAS, NCCL, CUDA toolkit, Keras, Horovod, NumPy, SciPy, scikit-learn, pandas, Open MPI, Python, others…) is pushed from the Development System to a Container registry and pulled onto the Training Cluster. Both environments run a Container runtime on NVIDIA drivers and a Host OS.]
  24. Scalability • Kubernetes - Autoscaling Jobs • Describe the job, let Kubernetes take care of the rest • CPU, RAM, Accelerators • TF Jobs delete themselves when finished, node pool will auto scale back down. "Model works great! But I need six nodes." (Data Scientist, IT Ops) Credit: @aronchick
  25. Scalability • Kubernetes - Autoscaling Jobs • Describe the job, let Kubernetes take care of the rest • CPU, RAM, Accelerators • TF Jobs delete themselves when finished, node pool will auto scale back down. (Data Scientist, IT Ops) Manifest shown on the slide: apiVersion: "kubeflow.org/v1alpha1" kind: "TFJob" spec: replicaSpecs: replicas: 6 CPU: 1 GPU: 1 containers: gcr.io/myco/myjob:1.0 Credit: @aronchick
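
The manifest on the slide is abbreviated and uses the early `v1alpha1` layout. As a hedged sketch of what a fuller job description might look like, the helper below builds a TFJob manifest as a Python dictionary. The assumptions: the layout follows the later `kubeflow.org/v1` TFJob API as I understand it (`tfReplicaSpecs`, a `Worker` replica type, a container named `tensorflow`), GPUs are requested through the `nvidia.com/gpu` resource, and the job name `myjob` and the `build_tfjob` helper are hypothetical. Only the image, the replica count, and the CPU/GPU counts come from the slide. Check the field names against the TFJob version installed on your cluster before relying on them.

```python
import json


def build_tfjob(name, image, replicas, cpus=1, gpus=1):
    """Build a TFJob manifest as a dict (assumed kubeflow.org/v1 layout)."""
    container = {
        "name": "tensorflow",  # assumed: the operator expects this container name
        "image": image,
        "resources": {
            "limits": {"cpu": str(cpus), "nvidia.com/gpu": str(gpus)}
        },
    }
    worker = {
        "replicas": replicas,
        "restartPolicy": "OnFailure",
        "template": {"spec": {"containers": [container]}},
    }
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {"tfReplicaSpecs": {"Worker": worker}},
    }


# Values from the slide: six replicas, 1 CPU and 1 GPU each, image gcr.io/myco/myjob:1.0.
manifest = build_tfjob("myjob", "gcr.io/myco/myjob:1.0", replicas=6)
print(json.dumps(manifest, indent=2))
```

The data scientist describes only what the job needs; scheduling the six workers onto GPU nodes is left to Kubernetes and the cluster's autoscaling setup.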
  26. Scalability • Kubernetes - Autoscaling Jobs • Describe the job, let Kubernetes take care of the rest • CPU, RAM, Accelerators • TF Jobs delete themselves when finished, node pool will auto scale back down. (Data Scientist, IT Ops) [Diagram: six GPU nodes] Credit: @aronchick
  27. Scalability • Kubernetes - Autoscaling Jobs • Describe the job, let Kubernetes take care of the rest • CPU, RAM, Accelerators • TF Jobs delete themselves when finished, node pool will auto scale back down. "Job’s done!" (Data Scientist, IT Ops) Credit: @aronchick
  28. Agenda • Motivation • ML pipeline tools and platforms • Container > Kubernetes > Kubeflow • Deep Learning Demo • Conclusion
  29. DEMO “Doppelganger App”
  30. Implementing Image Similarity search
  31. Recap: The “Kube”flow • Deploy Kubernetes & Kubeflow • Experiment in Jupyter • Build Docker Image • Train at Scale • Build Model Server • Deploy Model • Integrate Model into App • Operate [Diagram: on Kubernetes Worker Nodes #1, #2, #3, the Data Scientist works in a Jupyter Notebook Pod; for Model Training, a Dockerfile defines the Training Job, which runs in Train Model Pods; for Model Serving, a Dockerfile defines the Inference Service, which runs as Seldon Core Engine Pods hosting the Doppelganger Model; an Istio Gateway (Traffic Routing) exposes a {REST API}, called with curl…]
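
The recap diagram ends with a REST call (`curl…`) through the Istio gateway to the model served by Seldon Core. The exact request is elided on the slide, so the following is only a sketch of what such a call might look like from Python. It assumes the Seldon Core v1 REST convention of a `/seldon/<namespace>/<deployment>/api/v1.0/predictions` path and a `{"data": {"ndarray": ...}}` body; the host `localhost:8080`, the namespace `default`, the deployment name `doppelganger`, and the three-number payload are hypothetical placeholders, not values from the demo.

```python
import json
import urllib.request


def build_prediction_request(host, namespace, deployment, rows):
    """Build (but do not send) a POST request in the assumed Seldon v1 REST format."""
    url = f"http://{host}/seldon/{namespace}/{deployment}/api/v1.0/predictions"
    body = json.dumps({"data": {"ndarray": rows}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder input; a real image-similarity model would expect image data
# or an embedding in whatever shape the model server defines.
req = build_prediction_request("localhost:8080", "default", "doppelganger", [[0.1, 0.2, 0.3]])
print(req.get_method(), req.full_url)

# To actually send it against a running gateway:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it keeps the example runnable without a cluster; the commented lines show where the network call would go.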
  32. Agenda • Motivation • ML pipeline tools and platforms • Machine Learning on Kubernetes • Deep Learning Demo • Conclusion
  33. Conclusion & Take-aways • Platform matters • Composability – Portability – Scalability • Containerized architectures • Kubernetes + Machine Learning = Kubeflow • Start building! https://github.com/antje/doppelganger
  34. More information • Kubeflow: https://www.kubeflow.org/ https://github.com/kubeflow/kubeflow • TensorFlow Extended (TFX): https://www.tensorflow.org/tfx • The Definitive Guide to Machine Learning Platforms: https://twimlai.com/mlplatforms-ebook/ • Amazon Elastic Kubernetes Service (Amazon EKS): https://eksworkshop.com https://github.com/aws-samples/machine-learning-using-k8s
  35. Rate today’s session • Session page on conference website • O’Reilly Events App
  36. Thank you! antje.official antje@anbarth Antje Barth
