2. Jayesh Bapu Ahire
➢ Organizer, Twilio India Community, AWS UG Pune, Elasticsearch UG Pune, Alexa UG Nashik
➢ Research Assistant, Stanford AI Lab
➢ Research Associate, Tsinghua AI Lab & ETH Research
➢ Author, Blogger, Speaker, Student, Poet
7. Data Preprocessing → Select Algo & Framework → Train & Tune Model → Integrate & Deploy
8. Machine Learning in Cloud
● The cloud's pay-per-use model makes it easy for enterprises to experiment, scale, and go into production.
● Intelligent capabilities are accessible without requiring advanced skills in AI.
● You don't need deep knowledge of AI or machine learning theory, or a team of data scientists.
9. AI & ML capabilities of AWS
● ML Frameworks + Infrastructure: frameworks, interfaces, and infrastructure
● ML Services: Amazon SageMaker (Build + Train + Deploy)
● AI Services: Personalize, Forecast, Rekognition, Comprehend, Textract, Polly, Lex, Translate, Transcribe
11. Reduce Complexity
● Fully managed
● Quick Test
● Pre-optimized Algorithms
● Bring Your Own Algorithm
● Distributed Training
12. Build / Train / Deploy
Build
● Collect & prepare training data: data labelling & pre-built notebooks for common problems
● Choose & optimize your ML algorithm: built-in, high-performance algorithms and hundreds of ready-to-use algorithms in AWS Marketplace
Train
● Set up & manage environments for training: one-click training using Amazon EC2 On-Demand or Spot instances
● Train & tune model: train once, run anywhere & model optimization
Deploy
● Deploy model in production: one-click deployment
● Scale & manage the production environment: fully managed with auto-scaling for 75% less cost
13. Machine Learning end-to-end pipeline using Amazon SageMaker
01 Build
1. Pre-built algorithms & notebooks
2. Data labeling: Ground Truth
3. AWS Marketplace for ML
02 Train
1. One-click model training and tuning
2. SageMaker Neo
3. SageMaker RL
03 Deploy
1. One-click deployment and hosting
19. Amazon SageMaker: Open Source Containers
● Customize them
● Run them locally for development and testing
● Run them on SageMaker for training and prediction at scale
https://github.com/aws/sagemaker-tensorflow-containers
https://github.com/aws/sagemaker-mxnet-containers
20. Amazon SageMaker: Bring Your Own Container
● Prepare the training code in a Docker container
● Upload the container image to Amazon Elastic Container Registry (ECR)
● Upload the training dataset to Amazon S3/FSx/EFS
● Invoke the CreateTrainingJob API to execute a SageMaker training job
The SageMaker training job pulls the container image from Amazon ECR, reads the training data from the data source, configures the training job with hyperparameter inputs, trains a model, and saves the model to model_dir so that it can be deployed for inference later.
https://github.com/aws/sagemaker-container-support
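A minimal sketch of this flow with the SageMaker Python SDK (v1 parameter names, as used elsewhere in this deck), assuming the container image has already been pushed to ECR and the dataset uploaded to S3. The image URI, IAM role, and bucket below are hypothetical placeholders.

import sagemaker
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_name="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",  # hypothetical ECR image
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical IAM role
    train_instance_count=1,
    train_instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output",   # model artifacts are saved here
    hyperparameters={"epochs": "10"},      # passed into the container
    sagemaker_session=sagemaker.Session(),
)

# fit() invokes the CreateTrainingJob API under the hood
estimator.fit({"training": "s3://my-bucket/train"})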
21. Distributed Training At Scale on Amazon SageMaker
● Training on Amazon SageMaker can automatically distribute processing across a number of nodes, including P3 instances
● You can choose from two data distribution types for training ML models:
○ FullyReplicated: passes every file in the input to every machine
○ ShardedByS3Key: separates and distributes the files in the input across the training nodes
● Overall, sharding can run faster, but it depends on the algorithm (see the sketch below)
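A minimal sketch of choosing a data distribution type with the SageMaker Python SDK (v1 naming). The image, role, and bucket are hypothetical placeholders.

from sagemaker.estimator import Estimator
from sagemaker.session import s3_input

estimator = Estimator(
    image_name="my-algo-image:latest",     # hypothetical training image
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    train_instance_count=4,                # 4 nodes to distribute across
    train_instance_type="ml.p3.2xlarge",
    output_path="s3://my-bucket/output",
)

# ShardedByS3Key splits the S3 objects across the 4 nodes;
# the default, FullyReplicated, copies every object to every node.
train_channel = s3_input("s3://my-bucket/train", distribution="ShardedByS3Key")

estimator.fit({"train": train_channel})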
22. Amazon SageMaker: Local Mode Training
Enabling experimentation speed
● Train with local notebooks
● Train on notebook instances
● Iterate faster on a small sample of the dataset locally; no waiting for a new training cluster to be built each time
● Emulate CPU (single and multi-instance) and GPU (single instance) in local mode
● Go distributed with a single line of code
23. Automatic Model Tuning on Amazon SageMaker
Hyperparameter Optimizer
● Amazon SageMaker automatic model tuning predicts the hyperparameter values that are most likely to improve model fit.
● Automatic model tuning can be used with the Amazon SageMaker
○ Built-in algorithms,
○ Pre-built deep learning frameworks, and
○ Bring-your-own-algorithm containers
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/hyperparameter_tuning
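A minimal sketch of automatic model tuning with the SageMaker Python SDK (v1 naming), here tuning the built-in Linear Learner. The ranges, job counts, bucket, and role are illustrative assumptions, not recommendations.

import boto3
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

container = get_image_uri(boto3.Session().region_name, "linear-learner")

estimator = Estimator(
    image_name=container,
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical
    train_instance_count=1,
    train_instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output",
)
estimator.set_hyperparameters(feature_dim=784, predictor_type="multiclass_classifier", num_classes=10)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:objective_loss",  # emitted by the built-in algorithm
    objective_type="Minimize",
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(0.0001, 0.1),
        "mini_batch_size": IntegerParameter(100, 1000),
    },
    max_jobs=20,           # total training jobs the tuner may launch
    max_parallel_jobs=2,   # jobs run concurrently
)

tuner.fit({"train": "s3://my-bucket/train", "validation": "s3://my-bucket/validation"})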
24. Amazon SageMaker: Accelerating ML Training
Faster start times and training job execution time
● Two modes: File Mode and Pipe Mode
○ input_mode parameter in sagemaker.estimator.Estimator
● File Mode: S3 data source or file system data source
○ When using S3 as the data source, the training data set is downloaded to EBS volumes
○ Use a file system data source (Amazon EFS or Amazon FSx for Lustre) for faster training startup and execution time
○ Different data formats supported: CSV, protobuf, JSON, libsvm (check the algo docs!)
● Pipe Mode streams the data set to training instances
○ This allows you to process large data sets, and training starts faster
○ Dataset must be in recordio-encoded protobuf or CSV format
25. Amazon SageMaker: Fully-Managed Spot Training
Reduce training costs at scale
● Use Managed Spot Training on SageMaker to reduce training costs by up to 90%
● Managed Spot Training is available in all training configurations:
○ All instance types supported by Amazon SageMaker
○ All models: built-in algorithms, built-in frameworks, and custom models
○ All configurations: single instance training, distributed training, and
automatic model tuning.
● Setting it up is extremely simple:
○ If you're using the console, just switch the feature on.
○ If you're working with the Amazon SageMaker SDK, just set train_use_spot_instances to True in the Estimator constructor (see the sketch below).
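A minimal sketch of Managed Spot Training with the v1 parameter names the slide mentions (later SDK versions renamed these to use_spot_instances / max_run / max_wait). The image, role, and bucket are hypothetical placeholders.

from sagemaker.estimator import Estimator

estimator = Estimator(
    image_name="my-algo-image:latest",   # hypothetical training image
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    train_instance_count=1,
    train_instance_type="ml.p3.2xlarge",
    output_path="s3://my-bucket/output",
    train_use_spot_instances=True,   # request Spot capacity
    train_max_run=3600,              # max training time, in seconds
    train_max_wait=7200,             # max wait for Spot capacity (must be >= train_max_run)
)

estimator.fit({"train": "s3://my-bucket/train"})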
26. Amazon SageMaker: Secure Machine Learning
● No retention of customer data
● SageMaker provides encryption in transit
● Encryption at rest everywhere
● Compute isolation: instances allocated for computation are never shared with others
● Network isolation: all compute instances run inside private service-managed VPCs
● Secure, fully managed infrastructure: Amazon SageMaker takes care of patching and keeping instances up to date
● Notebook security: Jupyter notebooks can be operated without internet access and bound to secure customer VPCs
27. Amazon SageMaker Training: Getting Started
To train a model in Amazon SageMaker, you will need the following:
● A dataset. Here we will use the MNIST (Modified National Institute of Standards and Technology database) dataset. This dataset provides a training set of 50,000 example images of handwritten single-digit numbers, a validation set of 10,000 images, and a test set of 10,000 images.
● An algorithm. Here we will use the Linear Learner algorithm provided by Amazon SageMaker.
● An Amazon Simple Storage Service (Amazon S3) bucket to store the training data and the model artifacts.
● An Amazon SageMaker notebook instance to prepare and process data and to train and deploy a machine learning model.
● A Jupyter notebook to use with the notebook instance.
● For model training, deployment, and validation, I will use the high-level Amazon SageMaker Python SDK.
28. Amazon SageMaker Training: Getting Started
● Create the S3 bucket.
● Create an Amazon SageMaker notebook instance by going here: https://console.aws.amazon.com/sagemaker/
● Choose Notebook instances, then choose Create notebook instance.
● On the Create notebook instance page, provide the notebook instance name and choose ml.t2.medium for the instance type (the least expensive instance). For IAM role, choose Create a new role, then choose Create role.
● Choose Create notebook instance.
In a few minutes, Amazon SageMaker launches an ML compute instance and attaches an ML storage volume to it. The notebook instance has a preconfigured Jupyter notebook server and a set of Anaconda libraries.
29. How To Train a Model With Amazon SageMaker
To train a model in Amazon SageMaker, you create a training job. The training job includes the following information (see the sketch below):
● The URL of the Amazon Simple Storage Service (Amazon S3) bucket, or the file system ID of the file system, where you've stored the training data.
● The compute resources that you want Amazon SageMaker to use for model training. Compute resources are ML compute instances that are managed by Amazon SageMaker.
● The URL of the S3 bucket where you want to store the output of the job.
● The Amazon Elastic Container Registry path where the training code is stored.
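A minimal sketch of supplying those four pieces of information through the low-level AWS SDK for Python (Boto 3). All names, ARNs, and S3 paths are hypothetical placeholders.

import boto3

sm = boto3.client("sagemaker")

sm.create_training_job(
    TrainingJobName="my-training-job",
    # Where the training code (container image) is stored: an ECR path
    AlgorithmSpecification={
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
        "TrainingInputMode": "File",
    },
    RoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
    # Where the training data is stored
    InputDataConfig=[{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/train",
            "S3DataDistributionType": "FullyReplicated",
        }},
    }],
    # Where the output of the job should be stored
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/output"},
    # The compute resources SageMaker should use
    ResourceConfig={
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 10,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)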
30. Linear Learner with MNIST dataset example
● Provide the S3 bucket and prefix that you want to use for training and model artifacts. This should be within the same region as the notebook instance, training, and hosting.
● Provide the IAM role ARN used to give training and hosting access to your data.
● Download the MNIST dataset.
● The Amazon SageMaker implementation of Linear Learner takes recordio-wrapped protobuf, whereas the data we have is a pickled NumPy array on disk.
● This data conversion will be handled by the Amazon SageMaker Python SDK, imported as sagemaker (see the sketch below).
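A minimal sketch of the conversion step, using the helper that ships with the SageMaker Python SDK (v1). The bucket and prefix are hypothetical, and train_set is assumed to be the unpickled MNIST tuple of (features, labels) NumPy arrays.

import io
import boto3
import sagemaker.amazon.common as smac

features = train_set[0].astype("float32")   # 50,000 x 784 pixel vectors
labels = train_set[1].astype("float32")     # digit labels 0-9

# Serialize the arrays to recordio-wrapped protobuf in memory
buf = io.BytesIO()
smac.write_numpy_to_dense_tensor(buf, features, labels)
buf.seek(0)

# Upload the converted dataset to the S3 training location
bucket, prefix = "my-bucket", "sagemaker/linear-mnist"
boto3.resource("s3").Bucket(bucket).Object(f"{prefix}/train/recordio-pb-data").upload_fileobj(buf)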
31. Train the model
Create and Run a Training Job with the Amazon SageMaker Python SDK
● To train a model in Amazon SageMaker, you can use
○ the Amazon SageMaker Python SDK, or
○ the AWS SDK for Python (Boto 3), or
○ the AWS console
● For this exercise, I will use the notebook instance and the Python SDK.
● The Amazon SageMaker Python SDK includes the sagemaker.estimator.Estimator estimator, which can be used with any algorithm.
● To run a model training job, import the Amazon SageMaker Python SDK and get the Linear Learner container, as in the sketch below.
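A minimal sketch of the training job for this Linear Learner example (v1 naming). The role is hypothetical, and the bucket/prefix match the conversion sketch above.

import boto3
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.estimator import Estimator

sess = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # hypothetical IAM role
bucket, prefix = "my-bucket", "sagemaker/linear-mnist"

# Look up the Linear Learner container image for the current region
container = get_image_uri(boto3.Session().region_name, "linear-learner")

linear = Estimator(
    image_name=container,
    role=role,
    train_instance_count=1,
    train_instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket}/{prefix}/output",
    sagemaker_session=sess,
)
linear.set_hyperparameters(
    feature_dim=784,                       # 28x28 pixel images
    predictor_type="multiclass_classifier",
    num_classes=10,
    mini_batch_size=200,
)

# Launches the managed training job against the converted dataset
linear.fit({"train": f"s3://{bucket}/{prefix}/train"})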