SlideShare a Scribd company logo
1 of 21
Download to read offline
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building Machine Learning inference
pipelines at scale
Julien Simon
Global Evangelist, AI & Machine Learning
@julsimon
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Problem statement
• Real-life Machine Learning applications require more than a single model.
• Data may need pre-processing: normalization, feature engineering,
dimensionality reduction, etc.
• Predictions may need post-processing: filtering, sorting, combining, etc.
Our goal:
build scalable ML pipelines with open source (Spark, Scikit-learn, XGBoost) and
managed services (Amazon EMR, AWS Glue, Amazon SageMaker)
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
ApacheSpark
• Open-source, distributed processing system
• In-memory caching and optimized execution for fast performance (typically
100x faster than Hadoop)
• Batch processing, streaming analytics, machine learning, graph databases
and ad hoc queries
• API for Java, Scala, Python, R, and SQL
• Available in Amazon EMR and AWS Glue
https://spark.apache.org/
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
MLlib – Machinelearning library
• Algorithms: classification, regression, clustering, collaborative filtering.
• Featurization: feature extraction, transformation, dimensionality reduction.
• Tools for constructing, evaluating and tuning pipelines
• Transformer – a transform function that maps a DataFrame into a new one
• Adding a column, changing the rows of a specific column, etc.
• Predicting the label based on the feature vector
• Estimator – an algorithm that trains on data
• Consists of a fit() function that maps a DataFrame into a Model
https://spark.apache.org/docs/latest/ml-guide.html
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Example: binary classificationfor textsamples
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/PipelineExample.scala
// Prepare training documents from a list of (id, text, label) tuples.
val training = <LOAD_TRAINING_DATA>
// Configure an ML pipeline with three stages: tokenizer, hashingTF, and lr.
val tokenizer = new Tokenizer().setInputCol("text”).setOutputCol("words")
val hashingTF = new HashingTF()
.setNumFeatures(1000)
.setInputCol(tokenizer.getOutputCol)
.setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
// Fit the pipeline to training documents.
val model = pipeline.fit(training)
// Prepare test documents, which are unlabeled (id, text) tuples.
val test = <LOAD_TEST_DATA>
// Make predictions on test documents.
model.transform(test)
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Thisisanaïveexample.Whataboutreal-lifechallenges?
1. Exporting models to other applications
2. Decoupling pipeline infrastructure
3. Using any ML algorithm in any language
4. Getting low latency predictions
… while avoiding complex and time-consuming infrastructure work
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
• Save the model and load it in another Spark application
• Export the model to PMML, and use it with Java, R, etc.
• Export the model to MLeap
• http://mleap-docs.combust.ml/
• Lightweight runtime independent from Spark
• Interoperability between SparkML,
TensorFlow and scikit-learn
#1 – Exporting models
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
#2– Decoupling pipelineinfrastructure
• Different steps require different hardware configurations
• Say, R5 for ETL, P3 for training and C5 for prediction?
• If you need GPUs for training, does it make sense to run ETL on GPUs?
• Do you want to build and manage a specific cluster for each step?
• Size and scale each step independently
• Avoid oversizing your Spark cluster because one step requires it
• Avoid time-consuming resizing operations on Amazon EMR
• We often run ETL once, and train many models in parallel
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
#3–Using any MLalgorithm
• MLlib is great, but you may need something else
• Other ML algorithms
• Deep Learning:TensorFlow, Apache MXNet, PyTorch, etc.
• Your own custom code in any language
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
#4–Getting low latencypredictions
• Run ML predictions without using Spark.
• Save the overhead of the Spark framework
• Save loading your data in a DataFrame
• Deploy MLeap models
• Improve latency for small-batch predictions.
• It can be difficult to achieve low-latency predictions with Spark
• Use the optimal instance type for prediction
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AmazonSageMaker:
Build,Train,and Deploy MLModels atScale
Collect and prepare
training data
Choose and optimize
your
ML algorithm
Train and
Tune ML Models
Set up and
manage
environments
for training
Deploy models
in production
Scale and manage
the production
environment
1
2
3
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Model options
Training code
Factorization Machines
Linear Learner
Principal Component Analysis
K-Means Clustering
XGBoost
And more
Built-inAlgorithms (17)
No ML coding required
Distributed training
Pipe mode
BringYour Own Container
Full control, run anything!
R, C++, etc.
Built-in Frameworks
Bring your own code: script mode
Open source containers
Distributed training
Pipe mode
No infrastructure work required
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AmazonSageMakerSDKs
• SageMakerAPI
• AWS SDKs (boto3, etc.), CLI
• Nice for low-level management and automation
• Python SDK (aka ‘SageMaker SDK’)
• Algorithm selection, training, deploying, automatic model tuning, etc.
• Nice for notebooks and experimentation
• PySpark/Scala SDK for Apache Spark 2.1.1 and 2.2
• Pre-installed onAmazon EMR 5.11 and later
• Train, deploy and predict with SageMaker directly from your Spark application
• Supports standalone models and MLlib pipelines
• DataFrames in, DataFrames out: no data conversion needed
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/sparkml_serving_emr_mleap_abalone
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Inference Pipelines
• Sequence of 2-5 containers processing inference requests
• Train a model for each step, deploy pipeline as a single unit
• Real-time prediction endpoint (HTTPS)
• Batch transform
• Use any model
• Built-in algorithms,
• Built-in frameworks (including MLeap)
• Custom containers
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/scikit_learn_inference_pipeline
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Getting started
https://ml.aws
https://aws.amazon.com/sagemaker
https://github.com/awslabs/amazon-sagemaker-examples
https://github.com/aws/sagemaker-spark
https://github.com/aws/sagemaker-sparkml-serving-container
https://medium.com/@julsimon
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Thank you!
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Julien Simon
Global Evangelist, AI & Machine Learning
@julsimon

More Related Content

What's hot

Become a Machine Learning developer with AWS services (May 2019)
Become a Machine Learning developer with AWS services (May 2019)Become a Machine Learning developer with AWS services (May 2019)
Become a Machine Learning developer with AWS services (May 2019)Julien SIMON
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Julien SIMON
 
A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)Julien SIMON
 
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...Julien SIMON
 
Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Julien SIMON
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Julien SIMON
 
Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)
Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)
Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)Julien SIMON
 
Machine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerMachine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerAmazon Web Services
 
Using Amazon SageMaker to build, train, & deploy your ML Models
Using Amazon SageMaker to build, train, & deploy your ML ModelsUsing Amazon SageMaker to build, train, & deploy your ML Models
Using Amazon SageMaker to build, train, & deploy your ML ModelsAmazon Web Services
 
Speed up your Machine Learning workflows with build-in algorithms
Speed up your Machine Learning workflows with build-in algorithmsSpeed up your Machine Learning workflows with build-in algorithms
Speed up your Machine Learning workflows with build-in algorithmsJulien SIMON
 
AWS re:Invent 2018 - AIM401 - Deep Learning using Tensorflow
AWS re:Invent 2018 - AIM401 - Deep Learning using TensorflowAWS re:Invent 2018 - AIM401 - Deep Learning using Tensorflow
AWS re:Invent 2018 - AIM401 - Deep Learning using TensorflowJulien SIMON
 
Integrating Deep Learning into your Enterprise
Integrating Deep Learning into your EnterpriseIntegrating Deep Learning into your Enterprise
Integrating Deep Learning into your EnterpriseAmazon Web Services
 
Optimize your ML workloads_converted.pdf
Optimize your ML workloads_converted.pdfOptimize your ML workloads_converted.pdf
Optimize your ML workloads_converted.pdfAmazon Web Services
 
An Introduction to Amazon SageMaker (October 2018)
An Introduction to Amazon SageMaker (October 2018)An Introduction to Amazon SageMaker (October 2018)
An Introduction to Amazon SageMaker (October 2018)Julien SIMON
 
Build Deep Learning Applications with TensorFlow & SageMaker
Build Deep Learning Applications with TensorFlow & SageMakerBuild Deep Learning Applications with TensorFlow & SageMaker
Build Deep Learning Applications with TensorFlow & SageMakerAmazon Web Services
 
AWS re:Invent 2018 - Machine Learning recap (December 2018)
AWS re:Invent 2018 - Machine Learning recap (December 2018)AWS re:Invent 2018 - Machine Learning recap (December 2018)
AWS re:Invent 2018 - Machine Learning recap (December 2018)Julien SIMON
 
An Introduction to Reinforcement Learning (December 2018)
An Introduction to Reinforcement Learning (December 2018)An Introduction to Reinforcement Learning (December 2018)
An Introduction to Reinforcement Learning (December 2018)Julien SIMON
 
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)Julien SIMON
 
Working with Amazon SageMaker Algorithms for Faster Model Training
Working with Amazon SageMaker Algorithms for Faster Model TrainingWorking with Amazon SageMaker Algorithms for Faster Model Training
Working with Amazon SageMaker Algorithms for Faster Model TrainingAmazon Web Services
 
AWS re:Invent 2018 - AIM401-R2 - Deep Learning Applications with Tensorflow
AWS re:Invent 2018 - AIM401-R2 - Deep Learning Applications with TensorflowAWS re:Invent 2018 - AIM401-R2 - Deep Learning Applications with Tensorflow
AWS re:Invent 2018 - AIM401-R2 - Deep Learning Applications with TensorflowJulien SIMON
 

What's hot (20)

Become a Machine Learning developer with AWS services (May 2019)
Become a Machine Learning developer with AWS services (May 2019)Become a Machine Learning developer with AWS services (May 2019)
Become a Machine Learning developer with AWS services (May 2019)
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)
 
A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)A pragmatic introduction to natural language processing models (October 2019)
A pragmatic introduction to natural language processing models (October 2019)
 
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
 
Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)
 
Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)
Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)
Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)
 
Machine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerMachine Learning & Amazon SageMaker
Machine Learning & Amazon SageMaker
 
Using Amazon SageMaker to build, train, & deploy your ML Models
Using Amazon SageMaker to build, train, & deploy your ML ModelsUsing Amazon SageMaker to build, train, & deploy your ML Models
Using Amazon SageMaker to build, train, & deploy your ML Models
 
Speed up your Machine Learning workflows with build-in algorithms
Speed up your Machine Learning workflows with build-in algorithmsSpeed up your Machine Learning workflows with build-in algorithms
Speed up your Machine Learning workflows with build-in algorithms
 
AWS re:Invent 2018 - AIM401 - Deep Learning using Tensorflow
AWS re:Invent 2018 - AIM401 - Deep Learning using TensorflowAWS re:Invent 2018 - AIM401 - Deep Learning using Tensorflow
AWS re:Invent 2018 - AIM401 - Deep Learning using Tensorflow
 
Integrating Deep Learning into your Enterprise
Integrating Deep Learning into your EnterpriseIntegrating Deep Learning into your Enterprise
Integrating Deep Learning into your Enterprise
 
Optimize your ML workloads_converted.pdf
Optimize your ML workloads_converted.pdfOptimize your ML workloads_converted.pdf
Optimize your ML workloads_converted.pdf
 
An Introduction to Amazon SageMaker (October 2018)
An Introduction to Amazon SageMaker (October 2018)An Introduction to Amazon SageMaker (October 2018)
An Introduction to Amazon SageMaker (October 2018)
 
Build Deep Learning Applications with TensorFlow & SageMaker
Build Deep Learning Applications with TensorFlow & SageMakerBuild Deep Learning Applications with TensorFlow & SageMaker
Build Deep Learning Applications with TensorFlow & SageMaker
 
AWS re:Invent 2018 - Machine Learning recap (December 2018)
AWS re:Invent 2018 - Machine Learning recap (December 2018)AWS re:Invent 2018 - Machine Learning recap (December 2018)
AWS re:Invent 2018 - Machine Learning recap (December 2018)
 
An Introduction to Reinforcement Learning (December 2018)
An Introduction to Reinforcement Learning (December 2018)An Introduction to Reinforcement Learning (December 2018)
An Introduction to Reinforcement Learning (December 2018)
 
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)
 
Working with Amazon SageMaker Algorithms for Faster Model Training
Working with Amazon SageMaker Algorithms for Faster Model TrainingWorking with Amazon SageMaker Algorithms for Faster Model Training
Working with Amazon SageMaker Algorithms for Faster Model Training
 
AWS re:Invent 2018 - AIM401-R2 - Deep Learning Applications with Tensorflow
AWS re:Invent 2018 - AIM401-R2 - Deep Learning Applications with TensorflowAWS re:Invent 2018 - AIM401-R2 - Deep Learning Applications with Tensorflow
AWS re:Invent 2018 - AIM401-R2 - Deep Learning Applications with Tensorflow
 

Similar to Building Machine Learning Inference Pipelines at Scale (July 2019)

Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Amazon Web Services
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...AWS Summits
 
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...Amazon Web Services
 
Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...
Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...
Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...Amazon Web Services
 
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...Amazon Web Services
 
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker WorkshopAWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker WorkshopJulien SIMON
 
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...Amazon Web Services
 
Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...
Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...
Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...Amazon Web Services
 
End to End Model Development to Deployment using SageMaker
End to End Model Development to Deployment using SageMakerEnd to End Model Development to Deployment using SageMaker
End to End Model Development to Deployment using SageMakerAmazon Web Services
 
MXNet Paris Workshop - Intro To MXNet
MXNet Paris Workshop - Intro To MXNetMXNet Paris Workshop - Intro To MXNet
MXNet Paris Workshop - Intro To MXNetApache MXNet
 
Data meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow IndiaData meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow IndiaSandesh Rao
 
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...Amazon Web Services
 
Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019Arun Gupta
 
Setting up custom machine learning environments on AWS - AIM309 - New York AW...
Setting up custom machine learning environments on AWS - AIM309 - New York AW...Setting up custom machine learning environments on AWS - AIM309 - New York AW...
Setting up custom machine learning environments on AWS - AIM309 - New York AW...Amazon Web Services
 
Advanced Machine Learning with Amazon SageMaker
Advanced Machine Learning with Amazon SageMakerAdvanced Machine Learning with Amazon SageMaker
Advanced Machine Learning with Amazon SageMakerJulien SIMON
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueAmazon Web Services
 
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...Amazon Web Services
 
From Notebook to production with Amazon SageMaker
From Notebook to production with Amazon SageMakerFrom Notebook to production with Amazon SageMaker
From Notebook to production with Amazon SageMakerAmazon Web Services
 
AWS Machine Learning Week SF: End to End Model Development Using SageMaker
AWS Machine Learning Week SF: End to End Model Development Using SageMakerAWS Machine Learning Week SF: End to End Model Development Using SageMaker
AWS Machine Learning Week SF: End to End Model Development Using SageMakerAmazon Web Services
 
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...Amazon Web Services
 

Similar to Building Machine Learning Inference Pipelines at Scale (July 2019) (20)

Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
 
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
 
Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...
Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...
Big Data Meets Machine Learning: Architecting Spark Environment for Data Scie...
 
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
 
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker WorkshopAWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
 
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
 
Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...
Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...
Build, Train, and Deploy Machine Learning for the Enterprise with Amazon Sage...
 
End to End Model Development to Deployment using SageMaker
End to End Model Development to Deployment using SageMakerEnd to End Model Development to Deployment using SageMaker
End to End Model Development to Deployment using SageMaker
 
MXNet Paris Workshop - Intro To MXNet
MXNet Paris Workshop - Intro To MXNetMXNet Paris Workshop - Intro To MXNet
MXNet Paris Workshop - Intro To MXNet
 
Data meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow IndiaData meets AI - ATP Roadshow India
Data meets AI - ATP Roadshow India
 
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
ML Best Practices: Prepare Data, Build Models, and Manage Lifecycle (AIM396-S...
 
Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019
 
Setting up custom machine learning environments on AWS - AIM309 - New York AW...
Setting up custom machine learning environments on AWS - AIM309 - New York AW...Setting up custom machine learning environments on AWS - AIM309 - New York AW...
Setting up custom machine learning environments on AWS - AIM309 - New York AW...
 
Advanced Machine Learning with Amazon SageMaker
Advanced Machine Learning with Amazon SageMakerAdvanced Machine Learning with Amazon SageMaker
Advanced Machine Learning with Amazon SageMaker
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS Glue
 
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
Building Deep Learning Applications with TensorFlow and SageMaker on AWS - Te...
 
From Notebook to production with Amazon SageMaker
From Notebook to production with Amazon SageMakerFrom Notebook to production with Amazon SageMaker
From Notebook to production with Amazon SageMaker
 
AWS Machine Learning Week SF: End to End Model Development Using SageMaker
AWS Machine Learning Week SF: End to End Model Development Using SageMakerAWS Machine Learning Week SF: End to End Model Development Using SageMaker
AWS Machine Learning Week SF: End to End Model Development Using SageMaker
 
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
Build, Train, and Deploy ML Models with Amazon SageMaker (AIM410-R2) - AWS re...
 

More from Julien SIMON

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceJulien SIMON
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersJulien SIMON
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with TransformersJulien SIMON
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Julien SIMON
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)Julien SIMON
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)Julien SIMON
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Julien SIMON
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)Julien SIMON
 
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Julien SIMON
 
Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Julien SIMON
 
Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...Julien SIMON
 
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)Julien SIMON
 
Optimize your Machine Learning workloads (April 2019)
Optimize your Machine Learning workloads (April 2019)Optimize your Machine Learning workloads (April 2019)
Optimize your Machine Learning workloads (April 2019)Julien SIMON
 
Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)Julien SIMON
 
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)Julien SIMON
 
Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)Julien SIMON
 

More from Julien SIMON (16)

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging Face
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face Transformers
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)
 
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
 
Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)
 
Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...
 
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
 
Optimize your Machine Learning workloads (April 2019)
Optimize your Machine Learning workloads (April 2019)Optimize your Machine Learning workloads (April 2019)
Optimize your Machine Learning workloads (April 2019)
 
Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)
 
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
 
Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)
 

Recently uploaded

UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?SANGHEE SHIN
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataSafe Software
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.francesco barbera
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum ComputingGDSC PJATK
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdfJamie (Taka) Wang
 

Recently uploaded (20)

UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum Computing
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
 

Building Machine Learning Inference Pipelines at Scale (July 2019)

  • 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Building Machine Learning inference pipelines at scale Julien Simon Global Evangelist, AI & Machine Learning @julsimon
  • 2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Problem statement • Real-life Machine Learning applications require more than a single model. • Data may need pre-processing: normalization, feature engineering, dimensionality reduction, etc. • Predictions may need post-processing: filtering, sorting, combining, etc. Our goal: build scalable ML pipelines with open source (Spark, Scikit-learn, XGBoost) and managed services (Amazon EMR, AWS Glue, Amazon SageMaker)
  • 3. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 4. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. ApacheSpark • Open-source, distributed processing system • In-memory caching and optimized execution for fast performance (typically 100x faster than Hadoop) • Batch processing, streaming analytics, machine learning, graph databases and ad hoc queries • API for Java, Scala, Python, R, and SQL • Available in Amazon EMR and AWS Glue https://spark.apache.org/
  • 5. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. MLlib – Machinelearning library • Algorithms: classification, regression, clustering, collaborative filtering. • Featurization: feature extraction, transformation, dimensionality reduction. • Tools for constructing, evaluating and tuning pipelines • Transformer – a transform function that maps a DataFrame into a new one • Adding a column, changing the rows of a specific column, etc. • Predicting the label based on the feature vector • Estimator – an algorithm that trains on data • Consists of a fit() function that maps a DataFrame into a Model https://spark.apache.org/docs/latest/ml-guide.html
  • 6. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Example: binary classificationfor textsamples https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/PipelineExample.scala // Prepare training documents from a list of (id, text, label) tuples. val training = <LOAD_TRAINING_DATA> // Configure an ML pipeline with three stages: tokenizer, hashingTF, and lr. val tokenizer = new Tokenizer().setInputCol("text”).setOutputCol("words") val hashingTF = new HashingTF() .setNumFeatures(1000) .setInputCol(tokenizer.getOutputCol) .setOutputCol("features") val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001) val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr)) // Fit the pipeline to training documents. val model = pipeline.fit(training) // Prepare test documents, which are unlabeled (id, text) tuples. val test = <LOAD_TEST_DATA> // Make predictions on test documents. model.transform(test)
  • 7. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Thisisanaïveexample.Whataboutreal-lifechallenges? 1. Exporting models to other applications 2. Decoupling pipeline infrastructure 3. Using any ML algorithm in any language 4. Getting low latency predictions … while avoiding complex and time-consuming infrastructure work
  • 8. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. • Save the model and load it in another Spark application • Export the model to PMML, and use it with Java, R, etc. • Export the model to MLeap • http://mleap-docs.combust.ml/ • Lightweight runtime independent from Spark • Interoperability between SparkML, TensorFlow and scikit-learn #1 – Exporting models
  • 9. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. #2– Decoupling pipelineinfrastructure • Different steps require different hardware configurations • Say, R5 for ETL, P3 for training and C5 for prediction? • If you need GPUs for training, does it make sense to run ETL on GPUs? • Do you want to build and manage a specific cluster for each step? • Size and scale each step independently • Avoid oversizing your Spark cluster because one step requires it • Avoid time-consuming resizing operations on Amazon EMR • We often run ETL once, and train many models in parallel
  • 10. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. #3–Using any MLalgorithm • MLlib is great, but you may need something else • Other ML algorithms • Deep Learning:TensorFlow, Apache MXNet, PyTorch, etc. • Your own custom code in any language
  • 11. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. #4–Getting low latencypredictions • Run ML predictions without using Spark. • Save the overhead of the Spark framework • Save loading your data in a DataFrame • Deploy MLeap models • Improve latency for small-batch predictions. • It can be difficult to achieve low-latency predictions with Spark • Use the optimal instance type for prediction
  • 12. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 13. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. AmazonSageMaker: Build,Train,and Deploy MLModels atScale Collect and prepare training data Choose and optimize your ML algorithm Train and Tune ML Models Set up and manage environments for training Deploy models in production Scale and manage the production environment 1 2 3
  • 14. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Model options Training code Factorization Machines Linear Learner Principal Component Analysis K-Means Clustering XGBoost And more Built-inAlgorithms (17) No ML coding required Distributed training Pipe mode BringYour Own Container Full control, run anything! R, C++, etc. Built-in Frameworks Bring your own code: script mode Open source containers Distributed training Pipe mode No infrastructure work required
  • 15. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. AmazonSageMakerSDKs • SageMakerAPI • AWS SDKs (boto3, etc.), CLI • Nice for low-level management and automation • Python SDK (aka ‘SageMaker SDK’) • Algorithm selection, training, deploying, automatic model tuning, etc. • Nice for notebooks and experimentation • PySpark/Scala SDK for Apache Spark 2.1.1 and 2.2 • Pre-installed onAmazon EMR 5.11 and later • Train, deploy and predict with SageMaker directly from your Spark application • Supports standalone models and MLlib pipelines • DataFrames in, DataFrames out: no data conversion needed
  • 16. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/sparkml_serving_emr_mleap_abalone
  • 17. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 18. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Inference Pipelines • Sequence of 2-5 containers processing inference requests • Train a model for each step, deploy pipeline as a single unit • Real-time prediction endpoint (HTTPS) • Batch transform • Use any model • Built-in algorithms, • Built-in frameworks (including MLeap) • Custom containers
  • 19. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/scikit_learn_inference_pipeline
  • 20. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Getting started https://ml.aws https://aws.amazon.com/sagemaker https://github.com/awslabs/amazon-sagemaker-examples https://github.com/aws/sagemaker-spark https://github.com/aws/sagemaker-sparkml-serving-container https://medium.com/@julsimon
  • 21. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Thank you! © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Julien Simon Global Evangelist, AI & Machine Learning @julsimon

Editor's Notes

  1. TODO