© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Machine Learning using Kubeflow and Kubernetes
Arun Gupta, @arungupta
https://dilbert.com/strip/2013-02-02
Machine Learning is Hard!
Supervised/unsupervised/reinforcement learning …
Data sourcing, cleanup, tagging & classification
Linear/logistic regression, Random forest, Decision tree, …
Linear algebra, Statistics, Probability
TensorFlow, PyTorch, MXNet, Caffe2, Keras, scikit-learn, …
Python, Julia, R, …
Training and evaluating models
Distributed training
IntelliJ, VSCode, PyCharm, Jupyter notebook
Hyperparameter tuning
GPU or CPU
MLOps
https://happykaty.com/2018/05/15/drinking-from-a-fire-hose/
Machine Learning 101
The AWS ML Stack
Broadest and deepest set of capabilities
AI Services
• Vision: Rekognition Image, Rekognition Video, Textract
• Speech: Polly, Transcribe
• Language: Translate, Comprehend & Comprehend Medical
• Chatbots: Lex
• Forecasting: Forecast
• Recommendations: Personalize
ML Services
• Amazon SageMaker: Ground Truth, Notebooks, Algorithms + Marketplace, Reinforcement Learning, Training, Optimization, Deployment, Hosting
ML Frameworks + Infrastructure
• Frameworks, Interfaces, Infrastructure
• EC2 P3 & P3dn, EC2 G4, EC2 C5, FPGAs, Greengrass, Elastic Inference, Inferentia, DL Containers & AMIs, Elastic Kubernetes Service, Elastic Container Service
Storage and Analytics for Machine Learning
Machine learning
Analytics
• Amazon Redshift + Redshift Spectrum
• Amazon QuickSight
• Amazon EMR (Hadoop, Spark, Presto, Pig, Hive… 19 total)
• Amazon Athena
• Amazon Kinesis
• Amazon Elasticsearch Service
• AWS Glue
Storage
• Amazon S3 Standard
• Amazon S3 Standard-IA
• Amazon S3 One Zone-IA
• Amazon S3 Intelligent-Tiering (new)
• Amazon Glacier
• Amazon S3 Glacier Deep Archive (new)
• Amazon EBS
Why Machine Learning on Kubernetes?
Composability, Portability, Scalability
On-premises, Cloud
http://www.shutterstock.com/gallery-635827p1.html
Machine Learning on K8s: Without Kubeflow
@aronchik
Machine Learning on K8s: With Kubeflow
@aronchik
What’s in Kubeflow?
Amazon EKS: run Kubernetes in the cloud
Managed Kubernetes control plane, attach data plane
Native upstream Kubernetes experience
Platform for enterprises to run production-grade workloads
Integrates with additional AWS services
Getting started with Amazon EKS
eksctl CLI—create Amazon EKS clusters (eksctl.io)
Creates all resources needed for the cluster
Set up K8s for ML: Option 1
[Diagram, steps 1-4: Data, Train, Trained model, Inference]
Set up K8s for ML: Option 2a
Train & inference
[Diagram, steps 1-4: Data, Trained model; nodes labeled role: train (x3) and role: inference (x2)]
Set up K8s for ML: Option 2b
Train, inference, & applications
[Diagram: nodes labeled role: train (x3), role: inference (x2), and role: apps (x2)]
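The role labels in Options 2a and 2b suggest pinning each workload to a matching group of nodes. The following is a minimal sketch of that idea, assuming the nodes carry a `role` label with the values shown on the slides and that workloads are scheduled with the standard Kubernetes `nodeSelector` field. The helper name, pod name, and image are hypothetical and are not taken from the talk.

```python
import json

# Role values as they appear on the slides (Options 2a and 2b).
ALLOWED_ROLES = {"train", "inference", "apps"}


def pod_manifest(name: str, image: str, role: str) -> dict:
    """Build a Pod manifest that only schedules onto nodes labeled role=<role>."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unknown role {role!r}; expected one of {sorted(ALLOWED_ROLES)}")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"role": role}},
        "spec": {
            # nodeSelector restricts scheduling to nodes carrying this label.
            "nodeSelector": {"role": role},
            "containers": [{"name": name, "image": image}],
            "restartPolicy": "Never",
        },
    }


if __name__ == "__main__":
    # Hypothetical training pod; the image name is a placeholder.
    manifest = pod_manifest("mnist-train", "example.com/mnist:latest", "train")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, provided the cluster's nodes have actually been labeled with the corresponding `role` values.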
Scaling the cluster
Kubeflow Requirements
4 CPU, 12 GB memory, 50 GB storage
kubeflow.org/docs/started/k8s/overview/
Kubeflow on Desktop
MiniKF: Local Kubeflow deployment using VirtualBox and Vagrant
• Minikube -> Kubernetes
• MiniKF -> Kubeflow (includes minikube)
Runs on macOS, Linux, and Windows
Does not require k8s-specific knowledge
Kubeflow on Cloud
Major cloud providers supported
Choices on Amazon Web Services
• Self-managed k8s on EC2: Kops, CloudFormation, Terraform
• Amazon EKS
Getting Started with Kubeflow on Amazon EKS
Jupyter Notebook
Web application to build, deploy, and train ML models
Create and share documents that contain live code, equations, visualizations, and narrative text
40+ programming languages
Use cases
• Data cleaning and transformation
• Numerical simulation
• Data visualization
• Machine learning
Training using Jupyter Notebook
Kubeflow Fairing
Python SDK to build, train, and deploy ML models remotely
Goals:
• Easily package ML training jobs
• Train ML models in the cloud
• Streamline the process of deploying a trained model
https://github.com/aws-samples/eks-kubeflow-workshop/blob/master/notebooks/02_Fairing/02_06_fairing_e2e.ipynb
Setup Kubeflow Fairing for training and prediction
Train an XGBoost model remotely on Kubeflow
Deploy the trained model to Kubeflow for prediction
Call the prediction endpoint
Hyperparameter Tuning using Katib
Hyperparameters are parameters external to the model that control the training
e.g. learning rate, batch size, epochs
Tuning finds a set of hyperparameters that optimizes an objective function
e.g. find the optimal batch size and learning rate to maximize prediction accuracy
Katib enables hyperparameter tuning in Kubeflow
Credits @Richard Liu @Johnu (Kubeflow slack)
Katib Concepts
Extensible
Framework agnostic: TensorFlow, PyTorch, MXNet, …
Customizable algorithm backend
Experiment: “optimization loop” for some specific problem
Suggestion: a proposed solution to the problem
Trial: one iteration of the loop
Job: evaluate a trial and calculate objective value
Katib System Architecture
[Diagram components: Create Experiment, Experiment Controller, Suggestion Controller, Trial Controller, Trials, Hyperparameters, Metrics, Metric DB, Experiment CR, Suggestion CR, Trial CR]
Experiment = CreateExperimentCR()
while not Experiment.Objective reached:
    Suggestion = CreateSuggestionCR()
    HyperParameters = Suggestion.Assignments
    Metrics = CreateTrialCR(HyperParameters)
    ReportMetrics(Metrics)
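The experiment loop on this slide can be made concrete with a small, self-contained Python sketch. It is not Katib's API: the function names, the toy objective, and the search-space bounds are all assumptions chosen for illustration. Random search stands in for the pluggable suggestion algorithm, and an in-memory list plays the role of the metric DB.

```python
import random


def suggest(space, rng):
    """Suggestion: sample one value per hyperparameter (random search)."""
    return {name: rng.uniform(low, high) for name, (low, high) in space.items()}


def run_trial(assignments):
    """Trial/Job stand-in: return an objective value for the assignments.

    Toy objective (an assumption): it peaks at 1.0 when lr == 0.03 and
    momentum == 0.9, so higher is better.
    """
    return 1.0 - (assignments["lr"] - 0.03) ** 2 - (assignments["momentum"] - 0.9) ** 2


def run_experiment(space, goal, max_trials, seed=0):
    """Experiment: loop until the objective goal is reached or trials run out."""
    rng = random.Random(seed)
    history = []  # (assignments, metric) pairs, standing in for the metric DB
    best = None
    for _ in range(max_trials):
        assignments = suggest(space, rng)
        metric = run_trial(assignments)
        history.append((assignments, metric))
        if best is None or metric > best[1]:
            best = (assignments, metric)
        if best[1] >= goal:
            break
    return best, history


if __name__ == "__main__":
    space = {"lr": (0.01, 0.05), "momentum": (0.5, 0.99)}
    best, history = run_experiment(space, goal=0.99, max_trials=50)
    print(f"trials run: {len(history)}, best objective: {best[1]:.4f}")
    print(f"best assignments: {best[0]}")
```

In Katib the same roles are filled by custom resources and controllers, and each trial runs as a Kubernetes job rather than a local function call.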
Experiment CR Hyperparameters
https://github.com/aws-samples/eks-kubeflow-workshop/blob/master/notebooks/08_Hyperparameter_Tuning/random-search-example.yaml
Trial template
Kubeflow KFServing
Simple and pluggable platform for ML inference
Intuitive and consistent experience
Serving models on arbitrary frameworks
e.g. TensorFlow, XGBoost, scikit-learn
Encapsulates GPU auto-scaling, canary rollouts
Credits @ellis-bigelow (Kubeflow slack)
KFServing Custom Resource
S3 secret attached to Service Account
Trained model
https://github.com/kubeflow/kfserving/blob/master/docs/samples/s3/tensorflow_s3.yaml
Pluggable Interface
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  default:
    sklearn:
      storageUri: "gs://kfserving-samples/models/sklearn/iris"
---
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "InferenceService"
metadata:
  name: "flowers-sample"
spec:
  default:
    tensorflow:
      storageUri: "gs://kfserving-samples/models/tensorflow/flowers"
---
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "pytorch-cifar10"
spec:
  default:
    pytorch:
      storageUri: "gs://kfserving-samples/models/pytorch/cifar10"
      modelClassName: "Net"
KFServing Interface – Scikit Learn
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "KFService"
metadata:
  name: "sklearn-iris"
spec:
  default:
    sklearn:
      storageUri: "gs://kfserving-samples/models/sklearn/iris"
      serviceAccount: inferencing-robot
      minReplicas: 3
      maxReplicas: 10
      resources:
        requests:
          cpu: 2
          gpu: 1
          memory: 10Gi
  canaryTrafficPercent: 25
  canary:
    sklearn:
      storageUri: "gs://kfserving-samples/models/sklearn/iris-v2"
      serviceAccount: inferencing-robot
      minReplicas: 3
      maxReplicas: 10
      resources:
        requests:
          cpu: 2
          gpu: 1
          memory: 10Gi
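To illustrate what `canaryTrafficPercent: 25` means for incoming requests, here is a small simulation in plain Python. It is only a sketch of a weighted traffic split and makes no claim about how KFServing or its networking layer implements routing. The function names are hypothetical.

```python
import random
from collections import Counter


def pick_revision(canary_traffic_percent: int, rng: random.Random) -> str:
    """Route one request: 'canary' with the given percentage, else 'default'."""
    if not 0 <= canary_traffic_percent <= 100:
        raise ValueError("canary_traffic_percent must be between 0 and 100")
    # rng.random() is in [0, 1), so 0 never selects canary and 100 always does.
    return "canary" if rng.random() * 100 < canary_traffic_percent else "default"


def simulate(requests: int, canary_traffic_percent: int, seed: int = 0) -> Counter:
    """Count how many of `requests` simulated calls land on each revision."""
    rng = random.Random(seed)
    return Counter(pick_revision(canary_traffic_percent, rng) for _ in range(requests))


if __name__ == "__main__":
    counts = simulate(10_000, 25)
    share = counts["canary"] / 10_000
    print(f"default: {counts['default']}, canary: {counts['canary']} ({share:.1%})")
```

With a split like this, roughly a quarter of requests exercise the `iris-v2` model while the rest continue to hit the default model, which limits the impact of a bad rollout.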
Distributed Training using Horovod
Created by Uber, hosted by LF AI Foundation
Distributed training framework for TensorFlow, Keras, PyTorch, and MXNet
Compared to distributed TensorFlow
• Far fewer code changes
• ~2x faster
Examples at Uber: self-driving vehicles, fraud detection, and trip forecasting
Named after a traditional Russian dance
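Horovod keeps data-parallel workers in sync by averaging their gradients with an allreduce before each weight update. The sketch below shows only that arithmetic in plain Python. It does not use Horovod's API, the function names are hypothetical, and a real allreduce exchanges data between processes rather than reading a list held in one process.

```python
from typing import List


def allreduce_mean(worker_grads: List[List[float]]) -> List[float]:
    """Return the element-wise mean of the gradients computed by each worker."""
    if not worker_grads:
        raise ValueError("need at least one worker")
    length = len(worker_grads[0])
    if any(len(g) != length for g in worker_grads):
        raise ValueError("all workers must report gradients of the same shape")
    n = len(worker_grads)
    return [sum(g[i] for g in worker_grads) / n for i in range(length)]


def sgd_step(weights: List[float], grad: List[float], lr: float) -> List[float]:
    """Apply one plain SGD update using the averaged gradient."""
    return [w - lr * g for w, g in zip(weights, grad)]


if __name__ == "__main__":
    # Two workers, each with a gradient computed on its own shard of the batch.
    grads = [[1.0, 2.0], [3.0, 4.0]]
    avg = allreduce_mean(grads)
    print("averaged gradient:", avg)
    print("updated weights:", sgd_step([1.0, 1.0], avg, lr=0.5))
```

Because every worker applies the same averaged gradient, all replicas hold identical weights after each step.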
Kubeflow Pipelines
Compose, deploy, and manage end-to-end ML workflows
• End-to-end orchestration
• Easy, rapid, and reliable experimentation
• Easy re-use
Built using Pipelines SDK
• kfp.compiler, kfp.components, kfp.Client
Uses Argo under the hood to orchestrate resources
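The core idea of a multi-step ML workflow, where each step runs only after the steps it depends on, can be sketched with the Python standard library. This is not the Pipelines SDK and does not involve Argo. The step names, the `run_pipeline` helper, and the toy step bodies are assumptions made for illustration. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, deps):
    """Run steps in dependency order.

    steps: mapping of step name -> callable taking the shared context dict.
    deps:  mapping of step name -> set of upstream step names.
    Returns the execution order and the context holding each step's output.
    """
    graph = {name: deps.get(name, set()) for name in steps}
    order = list(TopologicalSorter(graph).static_order())
    context = {}
    for name in order:
        context[name] = steps[name](context)
    return order, context


if __name__ == "__main__":
    # Toy four-step workflow; each step reads upstream outputs from the context.
    steps = {
        "preprocess": lambda ctx: [1.0, 2.0, 3.0],
        "train": lambda ctx: sum(ctx["preprocess"]) / len(ctx["preprocess"]),
        "evaluate": lambda ctx: abs(ctx["train"] - 2.0) < 1e-9,
        "deploy": lambda ctx: "deployed" if ctx["evaluate"] else "skipped",
    }
    deps = {"train": {"preprocess"}, "evaluate": {"train"}, "deploy": {"evaluate"}}
    order, context = run_pipeline(steps, deps)
    print(order)
    print(context["deploy"])
```

In Kubeflow Pipelines each step would instead be a containerized component, and the engine would schedule those containers on the cluster rather than calling local functions.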
Kubeflow Pipelines Platform
UI for managing and tracking experiments, jobs, and runs
Engine for scheduling multi-step ML workflows
SDK for defining and manipulating pipelines and components
• kubeflow-pipelines.readthedocs.io/en/latest/
Notebooks for interacting with the system using the SDK
Creating Kubeflow Pipeline Components
Consumer Loan Acceptance Scoring
Objective
• Putting hundreds of data products live
• Single development -> deployment -> delivery environment
• First go batch, then real-time
Analytics environment
• AWS
• High security and compliance with regulation
Typical modeling context
• Structured data
• Supervised learning
• Internalizing interpretable models and hybrid pipelines
www.credo.be
Requirements
Data Scientists
• Hybrid, integrated, cloud-based dev env
• Python
• PySpark (locally + remotely on Spark cluster)
• R
• SQL
• Version control (scripts & artifacts)
ML DevOps
• Seamless deployment of hybrid pipelines
• Trigger-based scheduling & orchestration of runs
• Monitoring & dashboard
• Version control (runs & pipelines)
www.credo.be
Architecture
AWS
Infrastructure, connections, security
S3, Spark cluster, VMs, …
Amazon EKS Amazon ECR
Kubeflow 0.6
Notebook dev environment
Pipelines for dev & delivery
ElasticStack
Dashboarding
Custom notebook servers
www.credo.be
Storage: FSx for Lustre
High-performance file system for processing Amazon S3 or on-premises data
Low latency and high throughput
Works natively with Amazon S3
Container Storage Interface (CSI) driver
Best Practices for Optimizing Distributed Deep Learning Performance on Amazon EKS
https://aws.amazon.com/blogs/opensource/optimizing-distributed-deep-learning-performance-amazon-eks/
Advantages of Kubeflow on AWS
EKS cluster provision with
External traffic with
to manage Lustre file system
Centralized and unified K8s logs in
TLS and Auth with and
for your K8s API server endpoint
Detect GPU instance and install
kubeflow.org/docs/aws
Kanban Board
github.com/orgs/kubeflow/projects/25
Application Requirements 1.0
https://github.com/kubeflow/community/blob/master/guidelines/application_requirements.md
Machine Learning pipeline
• Collect & prepare training data
• Choose and optimize your ML algorithm
• Set up and manage environments for training
• Train and tune model (trial and error)
• Deploy model in production
• Scale & manage environment in production
Machine Learning pipeline for Kubernetes on AWS
• Collect & prepare training data: EMR, Redshift, S3
• Choose and optimize your ML algorithm: linear regression, decision tree, BYOA
• Set up and manage environments for training: GPU- and CPU-based clusters, *operators (TensorFlow, MXNet, …)
• Train and tune model (trial and error): TensorFlow, Horovod, MXNet, PyTorch, Keras, …
• Deploy model in production: TensorFlow Serving, MXNet Model Server, Seldon, …
• Scale & manage environment in production: EKS or self-managed K8s
References
Workshop: eksworkshop.com/kubeflow
Jupyter notebooks: github.com/aws-samples/eks-kubeflow-workshop/
Optimizing Machine Learning performance: aws.amazon.com/blogs/opensource/optimizing-distributed-deep-learning-performance-amazon-eks/

 
Amazon SageMaker sviluppa, addestra e distribuisci modelli di Machine Learnin...
Amazon SageMaker sviluppa, addestra e distribuisci modelli di Machine Learnin...Amazon SageMaker sviluppa, addestra e distribuisci modelli di Machine Learnin...
Amazon SageMaker sviluppa, addestra e distribuisci modelli di Machine Learnin...
 
re:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalise
re:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalisere:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalise
re:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalise
 


  • 1. Machine Learning using Kubeflow and Kubernetes. Arun Gupta, @arungupta. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. https://dilbert.com/strip/2013-02-02
  • 3. Machine Learning is Hard! Supervised/unsupervised/reinforcement learning …; data sourcing, cleanup, tagging & classification; linear/logistic regression, random forest, decision tree, …; linear algebra, statistics, probability; TensorFlow, PyTorch, MXNet, Caffe2, Keras, SciKit-Learn, …; Python, Julia, R, …; training and evaluating models; distributed training; IntelliJ, VSCode, PyCharm, Jupyter notebook; hyperparameter tuning; GPU or CPU; MLOps. Image: https://happykaty.com/2018/05/15/drinking-from-a-fire-hose/
  • 4. Machine Learning 101
  • 5. The AWS ML Stack: broadest and deepest set of capabilities. AI Services (Vision, Speech, Language, Chatbots, Forecasting, Recommendations): Polly, Transcribe, Translate, Comprehend & Comprehend Medical, Lex, Forecast, Rekognition Image, Rekognition Video, Textract, Personalize. ML Services: Amazon SageMaker (Ground Truth, Notebooks, Algorithms + Marketplace, Reinforcement Learning, Training, Optimization, Deployment, Hosting). ML Frameworks + Infrastructure (Frameworks, Interfaces, Infrastructure): FPGAs, EC2 P3 & P3dn, EC2 G4, EC2 C5, Inferentia, Greengrass, Elastic Inference, DL Containers & AMIs, Elastic Kubernetes Service, Elastic Container Service.
  • 9. Storage and Analytics for Machine Learning. Storage: Amazon S3 Standard, Amazon S3 Standard-IA, Amazon S3 One Zone-IA, Amazon S3 Intelligent-Tiering (new), Amazon Glacier, Amazon S3 Glacier Deep Archive (new), Amazon EBS. Analytics: Amazon Redshift + Redshift Spectrum, Amazon QuickSight, Amazon EMR (Hadoop, Spark, Presto, Pig, Hive… 19 total), Amazon Athena, Amazon Kinesis, Amazon Elasticsearch Service, AWS Glue.
  • 10. Why Machine Learning on Kubernetes? Composability; Portability; Scalability. On-premises; cloud. Image: http://www.shutterstock.com/gallery-635827p1.html
  • 11. Machine Learning on K8s: Without Kubeflow. Credit: @aronchik
  • 12. Machine Learning on K8s: With Kubeflow. Credit: @aronchik
  • 14. What’s in Kubeflow?
  • 15. Amazon EKS: run Kubernetes in the cloud. Managed Kubernetes control plane, attach data plane. Native upstream Kubernetes experience. Platform for enterprises to run production-grade workloads. Integrates with additional AWS services.
  • 16. Getting started with Amazon EKS. eksctl CLI: create Amazon EKS clusters (eksctl.io). Creates all resources needed for the cluster.
  • 17. Set up K8s for ML: Option 1. Train; Inference. Diagram labels: Data, Trained model; steps 1 to 4.
  • 18. Set up K8s for ML: Option 2a. Train & inference. Labels shown: role: train (x3), role: inference (x2). Diagram labels: Data, Trained model; steps 1 to 4.
  • 19. Set up K8s for ML: Option 2b. Train, inference, & applications. Labels shown: role: train (x3), role: inference (x2), role: apps (x2).
  • 20. Scaling the cluster
  • 21. Kubeflow Requirements: 4 CPU, 12 GB memory, 50 GB storage. kubeflow.org/docs/started/k8s/overview/
  • 22. Kubeflow on Desktop. MiniKF: local Kubeflow deployment using VirtualBox and Vagrant. Minikube -> Kubernetes; MiniKF -> Kubeflow (includes minikube). Runs on macOS, Linux, and Windows. Does not require k8s-specific knowledge.
  • 23. Kubeflow on Cloud. Major cloud providers supported. Choices on Amazon Web Services: self-managed k8s on EC2 (Kops, CloudFormation, Terraform); Amazon EKS.
  • 24. Getting Started with Kubeflow on Amazon EKS
  • 25. Jupyter Notebook. Web application to build, deploy, and train ML models. Create and share documents that contain live code, equations, visualizations, and narrative text. 40+ programming languages. Use cases: data cleaning and transformation, numerical simulation, data visualization, machine learning.
  • 26. Training using Jupyter Notebook
  • 27. Kubeflow Fairing. Python SDK to build, train, and deploy ML models remotely. Goals: easily package ML training jobs; train ML models in the cloud; streamline the process of deploying a trained model.
  • 28. Set up Kubeflow Fairing for training and prediction. https://github.com/aws-samples/eks-kubeflow-workshop/blob/master/notebooks/02_Fairing/02_06_fairing_e2e.ipynb
  • 29. Train an XGBoost model remotely on Kubeflow. Deploy the trained model to Kubeflow for prediction.
  • 30. Call the prediction endpoint
  • 31. Hyperparameter Tuning using Katib. Hyperparameters are parameters external to the model that control training, e.g. learning rate, batch size, epochs. Tuning finds a set of hyperparameters that optimizes an objective function, e.g. find the batch size and learning rate that maximize prediction accuracy. Katib enables hyperparameter tuning in Kubeflow. Credits: @Richard Liu, @Johnu (Kubeflow Slack)
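The tuning idea on this slide can be sketched in plain Python. This is a minimal random-search sketch under stated assumptions, not Katib itself: the objective `toy_accuracy`, the search ranges, and the trial count are all hypothetical stand-ins for a real training job and a real search space.

```python
import math
import random


def toy_accuracy(learning_rate: float, batch_size: int) -> float:
    """Hypothetical objective standing in for 'train a model, return accuracy'.

    It peaks at learning_rate=0.01 and batch_size=64; a real objective would
    run an actual training job and report a validation metric.
    """
    lr_penalty = (math.log10(learning_rate) + 2.0) ** 2   # zero at lr = 1e-2
    bs_penalty = (math.log2(batch_size) - 6.0) ** 2       # zero at batch size 64
    return 1.0 / (1.0 + lr_penalty + 0.1 * bs_penalty)


def random_search(num_trials: int, seed: int = 0):
    """Sample hyperparameters at random and keep the best-scoring set."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        params = {
            # Sample the learning rate log-uniformly between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "batch_size": rng.choice([16, 32, 64, 128, 256]),
        }
        score = toy_accuracy(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    params, score = random_search(num_trials=20)
    print(params, round(score, 3))
```

Fixing the seed makes a run repeatable, which helps when comparing search strategies on the same objective.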
  • 32. Katib Concepts. Extensible. Framework agnostic: TensorFlow, PyTorch, MXNet, … Customizable algorithm backend. Experiment: “optimization loop” for some specific problem. Suggestion: a proposed solution to the problem. Trial: one iteration of the loop. Job: evaluates a trial and calculates the objective value.
  • 33. Katib System Architecture. Components: Experiment Controller, Suggestion Controller, Trial Controller, Metric DB. Custom resources: Experiment CR, Suggestion CR, Trial CR. Flow: Create Experiment; Hyperparameters; Trials; Metrics. Pseudocode:
      Experiment = CreateExperimentCR()
      while not Experiment.Objective reached:
          Suggestion = CreateSuggestionCR()
          HyperParameters = Suggestion.Assignments
          Metrics = CreateTrialCR(HyperParameters)
          ReportMetrics(Metrics)
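The control loop in the pseudocode on this slide can be made runnable as a small simulation. This is only a sketch: the classes, the fixed grid in `suggestions`, the made-up metric in `run_job`, and the `max_trials` bound are hypothetical stand-ins, and none of this is the Katib API or its controller code.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, Iterator, List


@dataclass
class Trial:
    """One iteration of the loop: a set of hyperparameters and its metric."""
    hyperparameters: Dict[str, float]
    metric: float


@dataclass
class Experiment:
    """The optimization loop for one problem: a goal plus the trials run so far."""
    goal: float          # stop once a trial reaches this objective value
    max_trials: int      # assumed safety bound so the loop always terminates
    trials: List[Trial] = field(default_factory=list)

    def objective_reached(self) -> bool:
        return any(t.metric >= self.goal for t in self.trials)


def suggestions() -> Iterator[Dict[str, float]]:
    """Stand-in for the suggestion step: walks a small fixed grid."""
    for lr, batch_size in product([0.1, 0.01, 0.001], [32, 64]):
        yield {"learning_rate": lr, "batch_size": batch_size}


def run_job(hp: Dict[str, float]) -> float:
    """Stand-in for the job that evaluates a trial; returns a made-up accuracy."""
    return 0.9 if hp["learning_rate"] == 0.01 else 0.6


def run_experiment(goal: float, max_trials: int) -> Experiment:
    experiment = Experiment(goal=goal, max_trials=max_trials)
    for hp in suggestions():
        # Stop as soon as the goal is met or the trial budget is spent.
        if experiment.objective_reached() or len(experiment.trials) >= max_trials:
            break
        # Run the trial and record (report) its metric on the experiment.
        experiment.trials.append(Trial(hp, run_job(hp)))
    return experiment
```

Calling `run_experiment(goal=0.85, max_trials=10)` stops at the first trial whose metric meets the goal rather than exhausting the grid.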
  • 34. Experiment CR: Hyperparameters. https://github.com/aws-samples/eks-kubeflow-workshop/blob/master/notebooks/08_Hyperparameter_Tuning/random-search-example.yaml
  • 35. Trial template
  • 37. Kubeflow KFServing. Simple and pluggable platform for ML inference. Intuitive and consistent experience. Serving models on arbitrary frameworks, e.g. TensorFlow, XGBoost, scikit-learn. Encapsulates GPU auto-scaling, canary rollouts. Credits: @ellis-bigelow (Kubeflow Slack)
  • 39. KFServing Custom Resource: S3 secret attached to Service Account; trained model. https://github.com/kubeflow/kfserving/blob/master/docs/samples/s3/tensorflow_s3.yaml
  • 40. Pluggable Interface
      apiVersion: "serving.kubeflow.org/v1alpha1"
      kind: "InferenceService"
      metadata:
        name: "sklearn-iris"
      spec:
        default:
          sklearn:
            storageUri: "gs://kfserving-samples/models/sklearn/iris"
      ---
      apiVersion: "serving.kubeflow.org/v1alpha1"
      kind: "InferenceService"
      metadata:
        name: "flowers-sample"
      spec:
        default:
          tensorflow:
            storageUri: "gs://kfserving-samples/models/tensorflow/flowers"
      ---
      apiVersion: "serving.kubeflow.org/v1alpha1"
      kind: "KFService"
      metadata:
        name: "pytorch-cifar10"
      spec:
        default:
          pytorch:
            storageUri: "gs://kfserving-samples/models/pytorch/cifar10"
            modelClassName: "Net"
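The three manifests on this slide differ only in the name, the framework key, and the storage location. A minimal sketch of that pattern follows; `build_manifest` is a hypothetical helper written for this illustration (it is not part of KFServing), and `kind` is left as a parameter because the slide uses both "InferenceService" and "KFService".

```python
import json


def build_manifest(name, framework, storage_uri, kind="InferenceService", **framework_options):
    """Build a manifest dict shaped like the ones on the slide.

    Hypothetical helper: only the framework key and its options change
    between the sklearn, tensorflow, and pytorch examples.
    """
    framework_spec = {"storageUri": storage_uri, **framework_options}
    return {
        "apiVersion": "serving.kubeflow.org/v1alpha1",
        "kind": kind,
        "metadata": {"name": name},
        "spec": {"default": {framework: framework_spec}},
    }


sklearn_manifest = build_manifest(
    "sklearn-iris", "sklearn", "gs://kfserving-samples/models/sklearn/iris"
)
pytorch_manifest = build_manifest(
    "pytorch-cifar10",
    "pytorch",
    "gs://kfserving-samples/models/pytorch/cifar10",
    kind="KFService",
    modelClassName="Net",
)
print(json.dumps(pytorch_manifest, indent=2))
```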
  • 41. KFServing Interface – Scikit Learn apiVersion: "serving.kubeflow.org/v1alpha1" kind: "KFService" metadata: name: "sklearn-iris" spec: default: sklearn: storageUri: "gs://kfserving-samples/models/sklearn/iris" serviceAccount: inferencing-robot minReplicas: 3 maxReplicas: 10 resources: requests: cpu: 2 gpu: 1 memory: 10Gi canaryTrafficPercent: 25 canary: sklearn: storageUri: "gs://kfserving-samples/models/sklearn/iris-v2" serviceAccount: inferencing-robot minReplicas: 3 maxReplicas: 10 resources: requests: cpu: 2 gpu: 1 memory: 10Gi
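The effect of `canaryTrafficPercent: 25` can be illustrated with a small simulation. This is only a sketch of weighted traffic splitting under the assumption that each request is routed independently; `pick_revision` is a hypothetical function, and in a real deployment the split is performed by the serving layer rather than by user code.

```python
import random
from collections import Counter


def pick_revision(canary_traffic_percent, rng):
    """Route one request: 'canary' with the given percentage, else 'default'."""
    return "canary" if rng.random() * 100 < canary_traffic_percent else "default"


rng = random.Random(42)          # fixed seed so the run is repeatable
total_requests = 10_000
counts = Counter(pick_revision(25, rng) for _ in range(total_requests))
canary_share = counts["canary"] / total_requests
print(f"canary share: {canary_share:.3f}")  # close to 0.25
```

Setting the percentage to 0 sends every request to the default model, and 100 sends every request to the canary.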
  • 42. Distributed Training using Horovod: Created by Uber, hosted by the LF AI Foundation. Distributed training framework for TensorFlow, Keras, PyTorch, and MXNet. Compared to distributed TensorFlow • Far fewer code changes • ~2x faster. Examples at Uber: self-driving vehicles, fraud detection, and trip forecasting. Named after a traditional Russian dance.
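Horovod's data-parallel approach averages gradients across workers so that every worker applies the same update. The sketch below simulates that idea in a single process with plain Python; it does not use the Horovod API, and `gradient`, `allreduce_mean`, the toy dataset, and the learning rate are assumptions made for illustration.

```python
def gradient(w, shard):
    """Gradient with respect to w of mean((w*x - y)**2) over one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for an allreduce: every worker ends up with the average."""
    return sum(values) / len(values)


# Toy dataset for y = 3x, split evenly across 4 simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for step in range(200):
    local_grads = [gradient(w, shard) for shard in shards]  # one per worker
    w -= 0.01 * allreduce_mean(local_grads)                 # identical update everywhere

print(round(w, 6))
```

Because the shards are the same size, the averaged gradient equals the gradient over the full dataset, so the simulated workers follow the same trajectory as single-process training.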
  • 43. Kubeflow Pipelines: Compose, deploy, and manage end-to-end ML workflows • End-to-end orchestration • Easy, rapid, and reliable experimentation • Easy re-use. Built using the Pipelines SDK • kfp.compiler, kfp.components, kfp.Client. Uses Argo under the hood to orchestrate resources.
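A pipeline is a graph of steps in which each step runs once its inputs are available. The sketch below shows that idea with the standard library only; it is not the `kfp` SDK or Argo, and the step functions, the `steps` table, and `run_pipeline` are hypothetical names chosen for this example.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def preprocess():
    """Toy 'data preparation' step."""
    return [1.0, 2.0, 3.0, 4.0]


def train(rows):
    """Toy 'training' step: the model is just the mean of the data."""
    return sum(rows) / len(rows)


def evaluate(model, rows):
    """Toy 'evaluation' step: worst-case absolute error of the model."""
    return max(abs(r - model) for r in rows)


# Each step maps to (function, names of the steps whose outputs it consumes).
steps = {
    "preprocess": (preprocess, []),
    "train": (train, ["preprocess"]),
    "evaluate": (evaluate, ["train", "preprocess"]),
}


def run_pipeline(steps):
    """Run every step after its dependencies and collect the outputs."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in steps.items()})
    outputs, executed = {}, []
    for name in sorter.static_order():
        func, deps = steps[name]
        outputs[name] = func(*(outputs[d] for d in deps))
        executed.append(name)
    return executed, outputs


executed, outputs = run_pipeline(steps)
print(executed, outputs["train"], outputs["evaluate"])
```

In Kubeflow Pipelines each step would run as a container and the ordering would be handled by the workflow engine; here the steps are ordinary function calls in one process.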
  • 44. Kubeflow Pipelines Platform: UI for managing and tracking experiments, jobs, and runs. Engine for scheduling multi-step ML workflows. SDK for defining and manipulating pipelines and components • kubeflow-pipelines.readthedocs.io/en/latest/ Notebooks for interacting with the system using the SDK.
  • 45. Creating Kubeflow Pipeline Components
  • 46. Consumer Loan Acceptance Scoring Objective • Putting hundreds of data products live • Single development -> deployment -> delivery environment • First go batch, then real-time Analytics environment • AWS • High security and compliance with regulation Typical modeling context • Structured data • Supervised learning • Internalizing interpretable models and hybrid pipelines www.credo.be
  • 47. Requirements Data Scientists • Hybrid, integrated, cloud-based dev env • Python • PySpark (locally + remotely on Spark cluster) • R • SQL • Version control (scripts & artifacts) ML DevOps • Seamless deployment of hybrid pipelines • Trigger-based scheduling & orchestration of runs • Monitoring & dashboard • Version control (runs & pipelines) www.credo.be
  • 48. Architecture [diagram]: AWS: infrastructure, connections, security; S3, Spark cluster, VMs, …; Amazon EKS; Amazon ECR; Kubeflow 0.6: notebook dev environment, pipelines for dev & delivery; Elastic Stack: dashboarding; custom notebook servers. www.credo.be
  • 49. Storage: FSx for Lustre. High-performance file system for processing Amazon S3 or on-premises data. Low latency and high throughput. Works natively with Amazon S3. Container Storage Interface (CSI) driver.
  • 50. Best Practices for Optimizing Distributed Deep Learning Performance on Amazon EKS https://aws.amazon.com/blogs/opensource/optimizing-distributed-deep-learning-performance-amazon-eks/
  • 51. Advantages of Kubeflow on AWS: EKS cluster provision with […]; external traffic with […]; […] to manage Lustre file system; centralized and unified K8s logs in […]; TLS and Auth with […] and […] for your K8s API server endpoint; detect GPU instance and install […]. kubeflow.org/docs/aws
  • 53. Kanban Board github.com/orgs/kubeflow/projects/25
  • 54. Application Requirements 1.0 https://github.com/kubeflow/community/blob/master/guidelines/application_requirements.md
  • 55. Machine Learning pipeline: Collect & prepare training data; choose and optimize your ML algorithm; set up and manage environments for training; train and tune model (trial and error); deploy model in production; scale & manage environment in production.
  • 56. Machine Learning pipeline for Kubernetes on AWS: Collect & prepare training data: EMR, Redshift, S3. Choose and optimize your ML algorithm: linear regression, decision tree, BYOA. Set up and manage environments for training: GPU- and CPU-based clusters, *operators (TensorFlow, MXNet, …). Train and tune model (trial and error): TensorFlow, Horovod, MXNet, PyTorch, Keras, … Deploy model in production: TensorFlow Serving, MXNet Model Server, Seldon, … Scale & manage environment in production: EKS or self-managed K8s.
  • 57. References. Workshop: eksworkshop.com/kubeflow Jupyter notebooks: github.com/aws-samples/eks-kubeflow-workshop/ Optimizing Machine Learning performance: aws.amazon.com/blogs/opensource/optimizing-distributed-deep-learning-performance-amazon-eks/