SlideShare a Scribd company logo
1 of 25
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
My Nguyen – Solutions Architect – Amazon Web Services Vietnam
AWS’s philosophy on
designing
MLOps platform
Dec 2020
© 2019, Amazon Web Services, Inc. or its Affiliates.
Agenda
• What is MLOps?
• DevOps vs MLOps
• DevOps practices inheritance
• Machine learning development lifecycle
• Unique driving factors to MLOps
• Personas
• Unique challenges faced by ML workload
• MLOps practices on Amazon SageMaker
• Complete separation of steps (and their environments)
• Versioning & tracking
• Pipeline automation
• Continuous improvement
• Demo
• QnA
2
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
What is MLOps?
Operationalizing machine learning workloads
© 2019, Amazon Web Services, Inc. or its Affiliates.
DevOps vs MLOps 4
© 2019, Amazon Web Services, Inc. or its Affiliates.
Notes: Technology is just a piece of the overall picture 5
© 2019, Amazon Web Services, Inc. or its Affiliates.
DevOps practices inheritance
• Communication & collaboration
• Continuous integration
• Continuous delivery/deployment
• Microservices design
• Infrastructure-as-code & configuration-as-code
• Continuous monitoring & logging
6
© 2019, Amazon Web Services, Inc. or its Affiliates.
Machine learning development lifecycle 7
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
Unique driving factors to MLOps
© 2019, Amazon Web Services, Inc. or its Affiliates.
Personas
• Business stakeholder
• Data scientist
• Domain expert
• Data engineer
• Security engineer
• Machine learning/DevOps engineer
• Software engineer
All with different skillsets & priorities
9
© 2019, Amazon Web Services, Inc. or its Affiliates.
Unique challenges
• Data:
• The need to utilize production data in development activities
• Dependencies on data pipelines
• Longer experiment lifecycles
• Output of model artifacts:
• Independent lifecycles between model and integrated applications/systems
• Monitoring & tracking of experiments and models
• Unique metrics for performance evaluation
10
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
MLOps practices on Amazon SageMaker
© 2019, Amazon Web Services, Inc. or its Affiliates.
Complete separation of steps
101011010
010101010
000011110
Data processing Explore
& Build
Train
&Validate
Deploy Monitor
12
© 2019, Amazon Web Services, Inc. or its Affiliates.
Versioning & tracking of every steps 13
© 2019, Amazon Web Services, Inc. or its Affiliates.
Pipeline automation
Metaflow Apache Airflow AWS Step FunctionsKubeflowFlyte
14
© 2019, Amazon Web Services, Inc. or its Affiliates.
SageMaker workflow
The notebook: An entry-point / studio / IDE
Notebook: Explore and Interact
Data Scientists
SageMaker Container
Runtime
Elastic Container
Registry (ECR)
Simple Storage
Service (S3)
15
© 2019, Amazon Web Services, Inc. or its Affiliates.
SageMaker Container
Runtime
Elastic Container
Registry (ECR)
Simple Storage
Service (S3)
SageMaker workflow
Prepare data and script; find or build container image(s)
Notebook: Explore and Interact
Training Data
Custom Code
Training Image
Framework Code
Data Scientists
16
© 2019, Amazon Web Services, Inc. or its Affiliates.
SageMaker Container
Runtime
Elastic Container
Registry (ECR)
Simple Storage
Service (S3)
SageMaker workflow
Run a training job to create a model artifact
Notebook: Explore and Interact
Training Job
Custom
model.tar.gz
Training Data
Custom Code Training Image
Framework CodeFrameworkData
Data Scientists
17
© 2019, Amazon Web Services, Inc. or its Affiliates.
SageMaker Container
Runtime
Elastic Container
Registry (ECR)
Simple Storage
Service (S3)
SageMaker workflow
Deploy the model to a real-time inference endpoint
Notebook: Explore and Interact
Inference Endpoint
Custom
Inference Image
model.tar.gz
Training Data
Framework Code
Training Image
Framework Code
FrameworkModel
Data Scientists
Inference Requests
Custom Code
18
© 2019, Amazon Web Services, Inc. or its Affiliates.
SageMaker Container
Runtime
Elastic Container
Registry (ECR)
Simple Storage
Service (S3)
SageMaker workflow
(…Or run a batch transform job)
Notebook: Explore and Interact
Transform Job
Custom
Inference Image
model.tar.gz Framework Code
Training Image
Framework Code
FrameworkModel
Data Scientists
Input Data
Custom Code
Results
19
© 2019, Amazon Web Services, Inc. or its Affiliates.
SageMaker Container
Runtime
Elastic Container
Registry (ECR)
Simple Storage
Service (S3)
SageMaker workflow
Notebook: Explore and Interact
Training Job
Endpoint /Transformer
Custom
Custom
Inference Image
model.tar.gz
Training Data
Custom Code
Framework Code
Training Image
Framework Code
FrameworkModel
FrameworkData
Data Scientists
Inference Requests
20
© 2019, Amazon Web Services, Inc. or its Affiliates.
Continuous improvement
SageMaker
Hosting
Services
SageMaker
Batch
Transform
SageMaker
Notebooks
SageMaker
Autopilot
SageMaker
Experiments
SageMaker
GroundTruth
SageMaker
Processing
SageMaker
Model
Monitor
Amazon
Augmented
AI
SageMaker
Training
SageMaker
Debugger
SageMaker
Hyperparameter
Tuning
SageMaker Studio, the First Fully Integrated Development
Environment For Machine Learning
21
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
Demo
Transformation from local notebook to SageMaker workflow
© 2019, Amazon Web Services, Inc. or its Affiliates.
The bigger picture 23
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
QnA
References:
https://d1.awsstatic.com/whitepapers/architecture/wellarchitected-Machine-Learning-Lens.pdf
https://github.com/aws-samples/aws-stepfunctions-byoc-mlops-using-data-science-sdk
https://github.com/apac-ml-tfc/sagemaker-workshop-101
© 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates.
Thank you!
My Nguyen - https://www.linkedin.com/in/mynguyen6512/

More Related Content

What's hot

MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionProvectus
 
(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big DataAmazon Web Services
 
Apply MLOps at Scale by H&M
Apply MLOps at Scale by H&MApply MLOps at Scale by H&M
Apply MLOps at Scale by H&MDatabricks
 
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...Amazon Web Services
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsWeaveworks
 
Implementing your landing zone - FND210 - AWS re:Inforce 2019
Implementing your landing zone - FND210 - AWS re:Inforce 2019 Implementing your landing zone - FND210 - AWS re:Inforce 2019
Implementing your landing zone - FND210 - AWS re:Inforce 2019 Amazon Web Services
 
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowDatabricks
 
Amazon & AWS의 MSA와 DevOps, 그리고 지속적 혁신
Amazon & AWS의 MSA와 DevOps, 그리고 지속적 혁신Amazon & AWS의 MSA와 DevOps, 그리고 지속적 혁신
Amazon & AWS의 MSA와 DevOps, 그리고 지속적 혁신AgileKoreaConference Alliance
 
Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...
Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...
Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLJordan Birdsell
 
[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...
[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...
[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...Amazon Web Services
 
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...Edge AI and Vision Alliance
 
AWS re:Invent 2016: Workshop: Secure Your Web Application with AWS WAF and Am...
AWS re:Invent 2016: Workshop: Secure Your Web Application with AWS WAF and Am...AWS re:Invent 2016: Workshop: Secure Your Web Application with AWS WAF and Am...
AWS re:Invent 2016: Workshop: Secure Your Web Application with AWS WAF and Am...Amazon Web Services
 
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureServerless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureKai Wähner
 
Deep dive on amazon rekognition architectures for image analysis - MCL318 - ...
Deep dive on amazon rekognition architectures for image analysis - MCL318  - ...Deep dive on amazon rekognition architectures for image analysis - MCL318  - ...
Deep dive on amazon rekognition architectures for image analysis - MCL318 - ...Amazon Web Services
 

What's hot (20)

MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data
 
Apply MLOps at Scale by H&M
Apply MLOps at Scale by H&MApply MLOps at Scale by H&M
Apply MLOps at Scale by H&M
 
Building-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSBuilding-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWS
 
Deep Dive on AWS Lambda
Deep Dive on AWS LambdaDeep Dive on AWS Lambda
Deep Dive on AWS Lambda
 
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
 
Implementing your landing zone - FND210 - AWS re:Inforce 2019
Implementing your landing zone - FND210 - AWS re:Inforce 2019 Implementing your landing zone - FND210 - AWS re:Inforce 2019
Implementing your landing zone - FND210 - AWS re:Inforce 2019
 
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflow
 
Amazon & AWS의 MSA와 DevOps, 그리고 지속적 혁신
Amazon & AWS의 MSA와 DevOps, 그리고 지속적 혁신Amazon & AWS의 MSA와 DevOps, 그리고 지속적 혁신
Amazon & AWS의 MSA와 DevOps, 그리고 지속적 혁신
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...
Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...
Leveraging the AWS Sales Methodology and Partner Best Practices aws-partner-s...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of ML
 
[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...
[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...
[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...
 
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
 
AWS re:Invent 2016: Workshop: Secure Your Web Application with AWS WAF and Am...
AWS re:Invent 2016: Workshop: Secure Your Web Application with AWS WAF and Am...AWS re:Invent 2016: Workshop: Secure Your Web Application with AWS WAF and Am...
AWS re:Invent 2016: Workshop: Secure Your Web Application with AWS WAF and Am...
 
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureServerless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
 
Ml ops on AWS
Ml ops on AWSMl ops on AWS
Ml ops on AWS
 
Deep dive on amazon rekognition architectures for image analysis - MCL318 - ...
Deep dive on amazon rekognition architectures for image analysis - MCL318  - ...Deep dive on amazon rekognition architectures for image analysis - MCL318  - ...
Deep dive on amazon rekognition architectures for image analysis - MCL318 - ...
 

Similar to Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform

AWS DevDay Cologne - CI/CD for modern applications
AWS DevDay Cologne - CI/CD for modern applicationsAWS DevDay Cologne - CI/CD for modern applications
AWS DevDay Cologne - CI/CD for modern applicationsCobus Bernard
 
Become a Machine Learning Developer with AWS Services
Become a Machine Learning Developer with AWS ServicesBecome a Machine Learning Developer with AWS Services
Become a Machine Learning Developer with AWS ServicesAmazon Web Services
 
Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Julien SIMON
 
Amazon SageMaker workshop
Amazon SageMaker workshopAmazon SageMaker workshop
Amazon SageMaker workshopJulien SIMON
 
WhereML a Serverless ML Powered Location Guessing Twitter Bot
WhereML a Serverless ML Powered Location Guessing Twitter BotWhereML a Serverless ML Powered Location Guessing Twitter Bot
WhereML a Serverless ML Powered Location Guessing Twitter BotRandall Hunt
 
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018Amazon Web Services
 
Integrate Machine Learning into Your Spring Application in Less than an Hour
Integrate Machine Learning into Your Spring Application in Less than an HourIntegrate Machine Learning into Your Spring Application in Less than an Hour
Integrate Machine Learning into Your Spring Application in Less than an HourVMware Tanzu
 
Modern Applications Development on AWS
Modern Applications Development on AWSModern Applications Development on AWS
Modern Applications Development on AWSBoaz Ziniman
 
Supercharge your Machine Learning Solutions with Amazon SageMaker
Supercharge your Machine Learning Solutions with Amazon SageMakerSupercharge your Machine Learning Solutions with Amazon SageMaker
Supercharge your Machine Learning Solutions with Amazon SageMakerAmazon Web Services
 
Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...
Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...
Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...Amazon Web Services
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Julien SIMON
 
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...Jonathan Dion
 
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018Amazon Web Services
 
CICDforModernApplications-Oslo.pdf
CICDforModernApplications-Oslo.pdfCICDforModernApplications-Oslo.pdf
CICDforModernApplications-Oslo.pdfAmazon Web Services
 
Mainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best PracticesMainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best PracticesAmazon Web Services
 
Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018
Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018
Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018Amazon Web Services
 
MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)Julien SIMON
 
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...Amazon Web Services Korea
 
How_to_build_your_cloud_enablement_engine_with_the_people_you_already_have
How_to_build_your_cloud_enablement_engine_with_the_people_you_already_haveHow_to_build_your_cloud_enablement_engine_with_the_people_you_already_have
How_to_build_your_cloud_enablement_engine_with_the_people_you_already_haveAmazon Web Services
 

Similar to Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform (20)

AWS DevDay Cologne - CI/CD for modern applications
AWS DevDay Cologne - CI/CD for modern applicationsAWS DevDay Cologne - CI/CD for modern applications
AWS DevDay Cologne - CI/CD for modern applications
 
Become a Machine Learning Developer with AWS Services
Become a Machine Learning Developer with AWS ServicesBecome a Machine Learning Developer with AWS Services
Become a Machine Learning Developer with AWS Services
 
Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)
 
Amazon SageMaker workshop
Amazon SageMaker workshopAmazon SageMaker workshop
Amazon SageMaker workshop
 
WhereML a Serverless ML Powered Location Guessing Twitter Bot
WhereML a Serverless ML Powered Location Guessing Twitter BotWhereML a Serverless ML Powered Location Guessing Twitter Bot
WhereML a Serverless ML Powered Location Guessing Twitter Bot
 
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
Train & Deploy ML Models with Amazon Sagemaker: Collision 2018
 
Integrate Machine Learning into Your Spring Application in Less than an Hour
Integrate Machine Learning into Your Spring Application in Less than an HourIntegrate Machine Learning into Your Spring Application in Less than an Hour
Integrate Machine Learning into Your Spring Application in Less than an Hour
 
Modern Applications Development on AWS
Modern Applications Development on AWSModern Applications Development on AWS
Modern Applications Development on AWS
 
Supercharge your Machine Learning Solutions with Amazon SageMaker
Supercharge your Machine Learning Solutions with Amazon SageMakerSupercharge your Machine Learning Solutions with Amazon SageMaker
Supercharge your Machine Learning Solutions with Amazon SageMaker
 
Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...
Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...
Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW...
 
Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)Build, train and deploy ML models with SageMaker (October 2019)
Build, train and deploy ML models with SageMaker (October 2019)
 
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
AWS Toronto Summit 2019 - AIM302 - Build, train, and deploy ML models with Am...
 
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
 
CICDforModernApplications-Oslo.pdf
CICDforModernApplications-Oslo.pdfCICDforModernApplications-Oslo.pdf
CICDforModernApplications-Oslo.pdf
 
Mainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best PracticesMainframe Modernization with AWS: Patterns and Best Practices
Mainframe Modernization with AWS: Patterns and Best Practices
 
Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018
Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018
Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018
 
CI/CD for Modern Applications
CI/CD for Modern ApplicationsCI/CD for Modern Applications
CI/CD for Modern Applications
 
MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)
 
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...
[AWS Innovate 온라인 컨퍼런스] Kubernetes와 SageMaker를 활용하여 Machine Learning 워크로드 관리하...
 
How_to_build_your_cloud_enablement_engine_with_the_people_you_already_have
How_to_build_your_cloud_enablement_engine_with_the_people_you_already_haveHow_to_build_your_cloud_enablement_engine_with_the_people_you_already_have
How_to_build_your_cloud_enablement_engine_with_the_people_you_already_have
 

More from Grokking VN

Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking VN
 
Grokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking VN
 
Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking VN
 
Grokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking VN
 
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking VN
 
Grokking Techtalk #39: Gossip protocol and applications
Grokking Techtalk #39: Gossip protocol and applicationsGrokking Techtalk #39: Gossip protocol and applications
Grokking Techtalk #39: Gossip protocol and applicationsGrokking VN
 
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
 Grokking Techtalk #39: How to build an event driven architecture with Kafka ... Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...Grokking VN
 
Grokking Techtalk #38: Escape Analysis in Go compiler
 Grokking Techtalk #38: Escape Analysis in Go compiler Grokking Techtalk #38: Escape Analysis in Go compiler
Grokking Techtalk #38: Escape Analysis in Go compilerGrokking VN
 
Grokking Techtalk #37: Data intensive problem
 Grokking Techtalk #37: Data intensive problem Grokking Techtalk #37: Data intensive problem
Grokking Techtalk #37: Data intensive problemGrokking VN
 
Grokking Techtalk #37: Software design and refactoring
 Grokking Techtalk #37: Software design and refactoring Grokking Techtalk #37: Software design and refactoring
Grokking Techtalk #37: Software design and refactoringGrokking VN
 
Grokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellcheckingGrokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellcheckingGrokking VN
 
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
 Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer... Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...Grokking VN
 
Grokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKIGrokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKIGrokking VN
 
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking VN
 
SOLID & Design Patterns
SOLID & Design PatternsSOLID & Design Patterns
SOLID & Design PatternsGrokking VN
 
Grokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking VN
 
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at ScaleGrokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at ScaleGrokking VN
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking VN
 
Grokking TechTalk #27: Optimal Binary Search Tree
Grokking TechTalk #27: Optimal Binary Search TreeGrokking TechTalk #27: Optimal Binary Search Tree
Grokking TechTalk #27: Optimal Binary Search TreeGrokking VN
 
Grokking TechTalk #26: Kotlin, Understand the Magic
Grokking TechTalk #26: Kotlin, Understand the MagicGrokking TechTalk #26: Kotlin, Understand the Magic
Grokking TechTalk #26: Kotlin, Understand the MagicGrokking VN
 

More from Grokking VN (20)

Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banksGrokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks
 
Grokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles ThinkingGrokking Techtalk #45: First Principles Thinking
Grokking Techtalk #45: First Principles Thinking
 
Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...
 
Grokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystifiedGrokking Techtalk #43: Payment gateway demystified
Grokking Techtalk #43: Payment gateway demystified
 
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
 
Grokking Techtalk #39: Gossip protocol and applications
Grokking Techtalk #39: Gossip protocol and applicationsGrokking Techtalk #39: Gossip protocol and applications
Grokking Techtalk #39: Gossip protocol and applications
 
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
 Grokking Techtalk #39: How to build an event driven architecture with Kafka ... Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
 
Grokking Techtalk #38: Escape Analysis in Go compiler
 Grokking Techtalk #38: Escape Analysis in Go compiler Grokking Techtalk #38: Escape Analysis in Go compiler
Grokking Techtalk #38: Escape Analysis in Go compiler
 
Grokking Techtalk #37: Data intensive problem
 Grokking Techtalk #37: Data intensive problem Grokking Techtalk #37: Data intensive problem
Grokking Techtalk #37: Data intensive problem
 
Grokking Techtalk #37: Software design and refactoring
 Grokking Techtalk #37: Software design and refactoring Grokking Techtalk #37: Software design and refactoring
Grokking Techtalk #37: Software design and refactoring
 
Grokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellcheckingGrokking TechTalk #35: Efficient spellchecking
Grokking TechTalk #35: Efficient spellchecking
 
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
 Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer... Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
Grokking Techtalk #34: K8S On-premise: Incident & Lesson Learned ZaloPay Mer...
 
Grokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKIGrokking TechTalk #33: High Concurrency Architecture at TIKI
Grokking TechTalk #33: High Concurrency Architecture at TIKI
 
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
 
SOLID & Design Patterns
SOLID & Design PatternsSOLID & Design Patterns
SOLID & Design Patterns
 
Grokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous CommunicationsGrokking TechTalk #31: Asynchronous Communications
Grokking TechTalk #31: Asynchronous Communications
 
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at ScaleGrokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale
 
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedInGrokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn
 
Grokking TechTalk #27: Optimal Binary Search Tree
Grokking TechTalk #27: Optimal Binary Search TreeGrokking TechTalk #27: Optimal Binary Search Tree
Grokking TechTalk #27: Optimal Binary Search Tree
 
Grokking TechTalk #26: Kotlin, Understand the Magic
Grokking TechTalk #26: Kotlin, Understand the MagicGrokking TechTalk #26: Kotlin, Understand the Magic
Grokking TechTalk #26: Kotlin, Understand the Magic
 

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform

  • 1. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. My Nguyen – Solutions Architect – Amazon Web Services Vietnam AWS’s philosophy on designing MLOps platform Dec 2020
  • 2. © 2019, Amazon Web Services, Inc. or its Affiliates. Agenda • What is MLOps? • DevOps vs MLOps • DevOps practices inheritance • Machine learning development lifecycle • Unique driving factors to MLOps • Personas • Unique challenges faced by ML workload • MLOps practices on Amazon SageMaker • Complete separation of steps (and their environments) • Versioning & tracking • Pipeline automation • Continuous improvement • Demo • QnA 2
  • 3. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. What is MLOps? Operationalizing machine learning workloads
  • 4. © 2019, Amazon Web Services, Inc. or its Affiliates. DevOps vs MLOps 4
  • 5. © 2019, Amazon Web Services, Inc. or its Affiliates. Notes: Technology is just a piece of the overall picture 5
  • 6. © 2019, Amazon Web Services, Inc. or its Affiliates. DevOps practices inheritance • Communication & collaboration • Continuous integration • Continuous delivery/deployment • Microservices design • Infrastructure-as-code & configuration-as-code • Continuous monitoring & logging 6
  • 7. © 2019, Amazon Web Services, Inc. or its Affiliates. Machine learning development lifecycle 7
  • 8. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. Unique driving factors to MLOps
  • 9. © 2019, Amazon Web Services, Inc. or its Affiliates. Personas • Business stakeholder • Data scientist • Domain expert • Data engineer • Security engineer • Machine learning/DevOps engineer • Software engineer All with different skillsets & priorities 9
  • 10. © 2019, Amazon Web Services, Inc. or its Affiliates. Unique challenges • Data: • The need to utilize production data in development activities • Dependencies on data pipelines • Longer experiment lifecycles • Output of model artifacts: • Independent lifecycles between model and integrated applications/systems • Monitoring & tracking of experiments and models • Unique metrics for performance evaluation 10
  • 11. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. MLOps practices on Amazon SageMaker
  • 12. © 2019, Amazon Web Services, Inc. or its Affiliates. Complete separation of steps 101011010 010101010 000011110 Data processing Explore & Build Train &Validate Deploy Monitor 12
  • 13. © 2019, Amazon Web Services, Inc. or its Affiliates. Versioning & tracking of every steps 13
  • 14. © 2019, Amazon Web Services, Inc. or its Affiliates. Pipeline automation Metaflow Apache Airflow AWS Step FunctionsKubeflowFlyte 14
  • 15. © 2019, Amazon Web Services, Inc. or its Affiliates. SageMaker workflow The notebook: An entry-point / studio / IDE Notebook: Explore and Interact Data Scientists SageMaker Container Runtime Elastic Container Registry (ECR) Simple Storage Service (S3) 15
  • 16. © 2019, Amazon Web Services, Inc. or its Affiliates. SageMaker Container Runtime Elastic Container Registry (ECR) Simple Storage Service (S3) SageMaker workflow Prepare data and script; find or build container image(s) Notebook: Explore and Interact Training Data Custom Code Training Image Framework Code Data Scientists 16
  • 17. © 2019, Amazon Web Services, Inc. or its Affiliates. SageMaker Container Runtime Elastic Container Registry (ECR) Simple Storage Service (S3) SageMaker workflow Run a training job to create a model artifact Notebook: Explore and Interact Training Job Custom model.tar.gz Training Data Custom Code Training Image Framework CodeFrameworkData Data Scientists 17
  • 18. © 2019, Amazon Web Services, Inc. or its Affiliates. SageMaker Container Runtime Elastic Container Registry (ECR) Simple Storage Service (S3) SageMaker workflow Deploy the model to a real-time inference endpoint Notebook: Explore and Interact Inference Endpoint Custom Inference Image model.tar.gz Training Data Framework Code Training Image Framework Code FrameworkModel Data Scientists Inference Requests Custom Code 18
  • 19. © 2019, Amazon Web Services, Inc. or its Affiliates. SageMaker Container Runtime Elastic Container Registry (ECR) Simple Storage Service (S3) SageMaker workflow (…Or run a batch transform job) Notebook: Explore and Interact Transform Job Custom Inference Image model.tar.gz Framework Code Training Image Framework Code FrameworkModel Data Scientists Input Data Custom Code Results 19
  • 20. © 2019, Amazon Web Services, Inc. or its Affiliates. SageMaker Container Runtime Elastic Container Registry (ECR) Simple Storage Service (S3) SageMaker workflow Notebook: Explore and Interact Training Job Endpoint /Transformer Custom Custom Inference Image model.tar.gz Training Data Custom Code Framework Code Training Image Framework Code FrameworkModel FrameworkData Data Scientists Inference Requests 20
  • 21. © 2019, Amazon Web Services, Inc. or its Affiliates. Continuous improvement SageMaker Hosting Services SageMaker Batch Transform SageMaker Notebooks SageMaker Autopilot SageMaker Experiments SageMaker GroundTruth SageMaker Processing SageMaker Model Monitor Amazon Augmented AI SageMaker Training SageMaker Debugger SageMaker Hyperparameter Tuning SageMaker Studio, the First Fully Integrated Development Environment For Machine Learning 21
  • 22. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. Demo Transformation from local notebook to SageMaker workflow
  • 23. © 2019, Amazon Web Services, Inc. or its Affiliates. The bigger picture 23
  • 24. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. QnA References: https://d1.awsstatic.com/whitepapers/architecture/wellarchitected-Machine-Learning-Lens.pdf https://github.com/aws-samples/aws-stepfunctions-byoc-mlops-using-data-science-sdk https://github.com/apac-ml-tfc/sagemaker-workshop-101
  • 25. © 2019, Amazon Web Services, Inc. or its Affiliates.© 2019, Amazon Web Services, Inc. or its Affiliates. Thank you! My Nguyen - https://www.linkedin.com/in/mynguyen6512/

Editor's Notes

  1. Build trên nền Non trẻ hơn
  2. Also pipeline-as-code & policy-as-code
  3. Different skillset & priorities
  4. Also pipeline-as-code & policy-as-code
  5. Code versioning controls Shared environments, IDE – Jupyter Note/Lab Infrastructure as code Self-service environment SaaS
  6. Most importantly: training & processing Separation of source, environments, etc. Security Experiment lifecycles Pricing Efficiency
  7. Reproduceability is hard End-to-end tracability Dashboard ->
  8. Netflix built metaflow Lyft build Flyte Kubeflow Apache Airflow Important factor: skill set & enforce Metaflow Netflix built metaflow Netflix is a huge customer of AWS In production since 2018 Made open source by Netflix & AWS in 2019 What is it? Basic concepts of metaflow Deploying to AWS is easy Flyte A K8s native distributed workflow orchestrator used at Lyft for: Data science Pricing Fraud detection Locations ETA and more Enables highly concurrent, scalable workflows for ML and data processing Core concepts of Flyte – task, DAG, workflows, control flow specification. Actual task can be in any language – tasks executed as containers. Provisions necessary resources dynamically, executes tasks as docker containers, and de-provisions resources when tasks are complete to control costs. Supports execution across 100s of machines e.g. production model training Kubeflow, Airflow are fairly popular Airflow Amazon SageMaker with Apache Airflow 1.10.1. If you use Airflow, you can use SageMaker Workflow in Apache Airflow More details from https://sagemaker.readthedocs.io/en/stable/using_workflow.html Many customers want to use the fully managed capabilities of Amazon SageMaker for machine learning, but also want platform and infrastructure teams to continue using Kubernetes for orchestration and managing pipelines. SageMaker addresses this requirement by letting Kubernetes users train and deploy models in SageMaker using SageMaker-Kubeflow operations and pipelines. With operators and pipelines, Kubernetes users can access fully managed SageMaker ML tools and engines, natively from Kubeflow. This eliminates the need to manually manage and optimize ML infrastructure in Kubernetes while still preserving control of overall orchestration through Kubernetes. Using SageMaker operators and pipelines for Kubernetes, you can get the benefits of a fully managed service for machine learning in Kubernetes, without migrating workloads. If you use Kubernetes, you can use SageMaker Operators for Kubernetes You can install the Sagemaker Operator for Kubernetes using the provided Helm Chart Once you have this operator installed, K8s users can natively invoke SageMaker features like model training, Hyperparameter Tuning and Batch Transform jobs They can also setup model serving using SageMaker Model Hosting Services https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_operators_for_kubernetes.html#what-is-an-operator https://eksworkshop.com/advanced/420_kubeflow/pipelines/ We see customers build serverless ML workflows using AWS Step Functions Open source - Step Functions Data Science SDK for SageMaker Create workflows to pre-process data, train/deploy models using SageMaker Data pre-processing can be done using AWS Glue SageMaker functionality like model training, HPO and end point creation is accessible Use the SDK to create and visualize the workflows Scale workflows without having to worry about infrastructure https://aws.amazon.com/about-aws/whats-new/2019/11/introducing-aws-step-functions-data-science-sdk-amazon-sagemaker/ Many good tools exist. You can run any of the tools we saw earlier on AWS. Remember - Tools are meant to make your life easier Don’t get fixated on the tools. Work backwards from the problem you are trying to solve. So think about your existing s/w engg workflows and tools Ask yourself, which tools will best augment what you already have Ask yourself, which tools are your people most comfortable with AWS approach is use the tools that work for you
  9. Easy to think of SageMaker as Notebook. The key thing to remember is that the notebook UI we see a lot in the demos is just a part of the SageMaker platform – and an optional part at that! The notebook is the front-end environment in which we’ll experiment with our data and code. Keep that instance low-cost resource. Value of separation… When we’re ready to try and train or deploy a model, we’ll be spinning up separate, dedicated infrastructure in the SageMaker container runtime – which means we have lots of flexibility to choose resources cost-effectively and only pay for what we need. All managed The orchestration that SageMaker gives us to make this happen is closely integrated to these other two services: The images defining our containers will need to be stored in Amazon ECR (there’s not currently an integration for external registries like DockerHub – but if you have a particular technology in mind our service team would appreciate the feedback! …And the preferred storage platform for not just our input data but also model artifacts and other stuff generated in the workflow will be Amazon S3. Why? <The generic S3 pitch – it’s got everything you need for a data lake> Most integrated service, arguably most mature, tiers, security models, high durability Recaping: 4 things …So let’s look at how that end-to-end process works.
  10. To start with I have: The data that I want to train on (prepared and loaded to S3) – pre-processed already, in Notebook, but also option for other services like Glue or Processing Jobs to … The training script I’d like to run (e.g. defining neural network shape and fitting routine – on the notebook instance where I’m working) minimum code One of the pre-prepared SageMaker framework container images somewhere in Amazon ECR – maybe TensorFlow, PyTorch, or MXNet repeatable, controlled, re-producable
  11. So what’s happening when we start a training job by calling “estimator.fit()” in those examples from before? We’re gonna start seeing a lot of arrows here, so the cool thing to remember is that all of the arrows are things *SageMaker is doing for you* - not things you need to do yourself! First, assuming you provide a custom code script (or folder of code), the SageMaker SDK is going to zip that up and upload it to a new location in S3. So you can’t forget to check your working version in to git, and you won’t lose track of that version that worked well in the middle of your experiments: The results are going to be traceable to the code that created them. Next, SageMaker is going to spin up whatever infrastructure you asked for in the fit() request, and pull down the docker image to run on it SageMaker will also start downloading your source data from S3 into the container – no messing about with S3 API calls in your script – your code can read it from folder, just as if you were running locally. Env params… As the container fires up, that framework application does a load of helpful prep but one particularly important thing: It installs any additional inline dependencies specified for your custom code, then starts it up and passes in the parameters of the training job. Your code runs, prints status to the console, and saves the trained model to disk just like you normally would… But SageMaker takes care of zipping and uploading that final model to S3 – and also other output mechanisms like sending the logs to CloudWatch and collecting metrics. Pay only for … So the benefit we’ve gained here is that our custom code can be quite simple: Load a CSV from file, make a random forest, save it to file, etc. We can even add specify additional dependencies via a requirements.txt file… and SageMaker plus the framework container will orchestrate these overhead tasks to give us this nice lineage-traceable workflow with all of the cool features we talked about earlier – with no extra code complexity required on our part.
  12. When it’s time to deploy that model to an inference endpoint, we simply reference: Our model artifact tarball from S3 An inference container (which might be the same one as for training, or might be a different image because the dependencies could be differently optimized for run-time) And maybe some custom code again: This time just defining some helper functions that we might want to customize from the built-in inference flow, such as how to de/serialize requests and responses, or how the model file(s) need to be loaded from disk into memory if the process is different from standard. How it’s optimized As in training, SageMaker will handle the creation of infrastructure and loading of these components for us. If we used the ‘estimator’ pattern from the high-level SageMaker SDK, all we need to call is a single estimator.deploy(…) function to make it happen. Again here the intent is that any custom code needed can be small: Just providing a few optional functions for serialization, model loading, etc… Rather than writing and having to maintain a model server, integrations with TorchServe or TensorFlow Serving, etc. Custom input format (JSON)…
  13. Not today, but… In SageMaker, batch transform jobs function pretty much identically to real time inference endpoints from a user code point of view: The batch transform engine handles reading your source data from S3, feeding it through your model, storing the results back to S3, and shutting down the resources again as soon as the job is done. Pay only for…
  14. Mechanism: how easiest for different personas? Skillset dependency – learning curve …So that’s our overview picture for framework containers: You write pretty minimal code just as you usually would for experimenting in your notebook. But instead of running that code locally, which can make things like infrastructure optimization, experiment tracking, and inference deployment tricky… SageMaker provides some nice streamlined, high-level APIs to trigger containerized training and inference jobs (or deploy endpoints) on separate infrastructure. At the fundamental level, the system is super flexible because you can make fully custom container images and model artifact tarballs… But the framework container images together with the SageMaker SDK library (for your notebook) enable this higher-level, container-plus-custom-code workflow. Same as the morning, just diff drawing Solve problems on experimenting, tracking, etc.
  15. Also lession learnt & best practices
  16. The Repeatable stage is generally focused on applying automation as the number of machine learning workloads running in production increases. In general, at this stage many of the activities in building, training and deploying machine learning models is automated. The introduction of automation reduces manual hand-offs between teams and reduces the operational overhead of previously manual/ad-hoc tasks. The ability to orchestrate machine learning workflows into automated machine learning also depends on having a data strategy and automated data processing tasks. Queue Management: Ability to manage, schedule, and prioritize tasks Resource Management: Access to horizontally scalable compute that can scale based on workflow task requirements Workflow Operators: Error handling, retry and conditional logic functions Workflow Logs: Centralized logs and configuration parameters for execution and task level logs The Reliable stage builds on the automation from the Repeatable stage but aims to ensure automation is balanced with practices aimed to increase quality, enable end-to-end traceability, increase reliability through automatic rollbacks, increase visibility into development and operational health, and ensure repeatability. In general, at this stage MLOps practices of Infrastructure-as-Code/Configuration-as-Code, Continuous Integration, Continuous Delivery/Deployment, and Continuous Monitoring are introduced.