Model Serving for Deep Learning

Slides from my talk at the Data Innovations Summit on MXNet Model Server. https://www.datainnovationsummit.com/ Apache MXNet Model Server (MMS) is a flexible and easy to use tool for serving deep learning models exported from MXNet or the Open Neural Network Exchange (ONNX). https://github.com/awslabs/mxnet-model-server

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Model Serving for Deep Learning
©2018 Amazon Web Services, Inc. or its affiliates, All rights reserved
Adrian Hornsby, Technical Evangelist
@adhorn

What are we talking about?
AI
Machine Learning
Deep Learning

What is a Neural Net?

Predicting the price of a house with humans
Price
City
ZipCode Life Quality
Parking
Size
# Room
Accessibility
Family Friendly

Predicting the price of a house with a neural network
Price
City
ZipCode Life Quality
Parking
Size
# Room
Accessibility
Family Friendly
Input Output
Discovered by the neural network

Deep Learning – Neural Networks
Input Layer
Hidden Layers (many more…)
Output Layer

Deep Learning is a Big Deal
It can outperform other ML techniques and humans

https://github.com/precedenceguo/mx-rcnn https://github.com/zhreshold/mxnet-yolo
CNN: Object Detection

https://github.com/tornadomeet/mxnet-face
CNN: Face Detection

PredNet: Prediction Networks
What comes next
https://coxlab.github.io/prednet/

CapsNet: Capsule Networks
Spatial Memory
https://arxiv.org/pdf/1710.09829v1.pdf

Long Short Term Memory Networks (LSTM)
https://github.com/awslabs/sockeye

Generative Adversarial Networks (GAN)
The future at work (already) today
Generating new ”celebrity” faces
https://github.com/tkarras/progressive_growing_of_gans

Deep Learning at Amazon
Personalization
Logistics
Voice
Autonomous Vehicles

How do people ”build” Neural Nets?

Model Zoos & Transfer Learning
• Full implementations of many state-of-the-art models reported in the academic literature.
• Complete models, with scripts, pre-trained weights and instructions on how to build and fine-tune these models.

Expedia
Ranking hotel images using deep learning
• Over 10 million images from 300,000 hotels
• Fine-tuned a pre-trained Convolutional Neural Network using 100,000 images
• Hotel descriptions now automatically feature the best available images
https://news.developer.nvidia.com/expedia-ranking-hotel-images-with-deep-learning/
https://www.youtube.com/watch?v=qGotULKg8e0

So what does a deployed model look like?

Clients (Mobile, Desktop, IoT) → Internet → Model Server → Model

The Undifferentiated Heavy Lifting of Model Serving
Performance
Availability
Networking
Monitoring
Model Decoupling
Cross Framework
Cross Platform

Model Serving Systems for Deep Learning
TensorFlow Serving
Model Server for MXNet
UC Berkeley Clipper

Model Archive
REST and OpenAPI
Containerized
ONNX Support
Operational Metrics

Model Archive
Trained Network + Model Signature + Custom Code + Auxiliary Assets
→ Model Export CLI → Model Archive

REST and OpenAPI
REST-like endpoint: <model-name>/predict
Endpoint auto-generated from the model’s signature.json
JSON encoding by default
Binary input via request payload
OpenAPI support – client code-gen and tooling

Operational Metrics
Metrics: Requests, Latencies, Resources
Dimensions: Model Name, Host Name
Target: Log / CSV, AWS CloudWatch

Containerization
Lightweight virtualization, isolation, runs anywhere
MMS Dockerfile → Build → Push → Launch
Container Cluster: MMS Container, MMS Container, MMS Container
Inside each MMS Container: NGINX, MXNet Model Server, MXNet

ONNX Support
(initiative driven by AWS, Facebook and Microsoft)
Many Frameworks: MXNet, Caffe2, PyTorch, TF, CNTK
Many Platforms: CoreML, TensorRT, NGraph, SNPE
O(n²) Pairs
ONNX: Common IR
Supported in MMS v0.2

It’s Demo Time!

Open source – try it out and file issues
github.com/awslabs/mxnet-model-server
adhorn@amazon.com

Editor's Notes

Hi everyone! My name is Adrian Hornsby. I’m a technical evangelist at AWS, and one of my focus areas is AI, especially Deep Learning. Today I’m going to talk about model serving. It’s a super interesting domain within Deep Learning, and I hope you will enjoy learning more about it. If you want to chat more, I’ll be here after the talk, so feel free to drop by!
With a show of hands: how many of you know what Deep Learning is? How many have ever implemented a neural network? How many have deployed one to production? OK, so we have fair knowledge of DL. In this talk we will not dive into the details of DNNs, since that is not the topic of this talk, nor do we have the time. But I will briefly discuss it to set the right context.
So Deep Learning is a field within Machine Learning, which is itself a field within AI.
AI is the set of techniques that enable computers to mimic, and surpass, human intelligence.
ML is a subset of AI: the set of mostly statistical techniques that enable computers to improve with experience, hence “learning”.
DL is a subset of ML: a technique inspired by the human brain, or neurons to be more exact, that uses interconnected artificial neurons to learn from samples.
So at the base of Deep Learning we have the Neural Network. Let’s briefly see what these networks look like.
A neural network in its simplest form is composed of layers, each consisting of a set of neurons, that are interconnected across layers with weighted edges. The term “deep learning” was coined because these networks have many hidden layers, which makes them “deep”.
The network takes the input vector, matrix, or more generally tensor, and feeds every element of the input into a unit in the input layer. From there the computation cascades across the units and layers, until we get an output in the output layer.
Neural networks are non-linear functions and can learn non-linear features, because the activation function in each neuron is non-linear.
They enable learning features in a hierarchical way, with each layer learning features that build on the features learned in the previous layer.
And very importantly: it is a scalable architecture that can be given more learning capacity by enlarging the network and/or modifying the operators in the neurons.
It is also typically very heavy computationally. A modern network such as ResNet-152, which has 152 layers, requires about 11 GFLOPs (billions of floating-point operations) for a single forward pass.
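To make the "computation cascades across layers" description concrete, here is a minimal sketch of a forward pass in plain Python. The layer sizes, weights and the choice of tanh as the non-linear activation are illustrative assumptions, not taken from the talk.

```python
import math

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ x + b), in plain Python."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, layers):
    """Cascade the input through each (weights, biases, activation) layer."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x

def identity(z):
    return z

# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden units (tanh) -> 1 output.
LAYERS = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0], math.tanh),
    ([[1.0, 2.0]], [0.5], identity),
]

prediction = forward([1.0, 1.0], LAYERS)
```

A real network differs mainly in scale: the same cascade is repeated over many more layers and far larger weight matrices, which is where the heavy computational cost comes from.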
Beyond the growing usage of DL in applications and devices around us, there is another interesting aspect to deep learning: how well it does compared to the dominant species on this planet, us! One of the first areas where Deep Learning demonstrated state-of-the-art results was Machine Vision. A classical problem in that domain is Object Classification: given an image, identify the most prominent object in that image out of a set of pre-defined classes. A DNN presented in 2012 by Alex Krizhevsky was able to leapfrog the best known algorithm to date by over 30%. That was really a major leap, and since then, every year, the best algorithms for Object Classification and many other Vision tasks have been based on Deep Learning, with results that keep getting better. A 2017 research paper by Geirhos et al. shows that DNNs already outperform humans in Object Classification, a task we humans have been programmed to specialize in by evolution. The paper also shows that human vision actually holds up better than DNNs when noise is introduced. It may make you feel better; it worked for me.
AlexNet paper: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Humans vs DNNs paper: https://arxiv.org/pdf/1706.06969.pdf
PredNet is a deep convolutional recurrent neural network inspired by the principles of predictive coding from the neuroscience literature.
A bit about why Deep Learning is a big deal: you can see Deep Learning applied in more and more domains, with a growing impact on our lives. If you look at the breadth of AI applied within Amazon alone, you can see DL in the retail website within personalization and recommendations, you can see it optimizing Amazon’s logistics, you have probably noticed the boom in voice-enabled personal assistants, and you may have heard that Amazon drones also rely on deep learning, just as other autonomous vehicle technology relies on it. And of course the list goes on.
OK, so hopefully by now you are convinced that Deep Learning is awesome, and the next thing you want to do is use it in your production system. So, how do you actually use a deep learning model in your production environment? Let’s start with the outcome we’re trying to achieve. In fact, it is pretty straightforward, and not very different from deploying any other service.
We have a trained model that we want to use for inference.
We have a bunch of clients: mobile, desktop, IoT, cloud, or any combination of those.
We want to have a server of sorts, hosting the trained model and exposing an inference API which, when called, runs a feed-forward pass through the network, doing the deep learning “magic” Naveen explained earlier.
That’s a very simple schema of a model serving setup.
As we saw in the previous slide, in many ways serving deep learning models is similar to other, more traditional, serving frameworks out there, such as Apache Tomcat. And indeed in many ways, Model Serving is undifferentiated heavy lifting (UHL). That is a term we use and focus on at AWS a lot. It means all of the work that is necessary to get the job done but does not differentiate the business or help it win against the competition. Setting up servers, networks, etc. is all UHL. Let’s quickly go over the main concerns a Model Serving system needs to address:
- Performance: providing a scalable architecture that is able to meet the target TPS (transactions per second), makes efficient use of the available compute resources, and strikes the right balance between throughput and latency. It is especially important for Deep Learning, since the computational load of running a single inference is typically significant. As a reference, a model such as ResNet-152 requires billions of FLOPs for a single forward pass.
- Availability: to keep your application working properly all the time, you want to minimize downtime and avoid going offline when load is high or when you are busy deploying a new model.
- Networking: making your model consumable means you need to expose a network endpoint that clients can call to get predictions. This endpoint needs to support standard interfaces such as HTTP, error codes, security and more.
- Monitoring: having any service in production means you need the ability to look into your operational metrics in near real time: things like resource utilization on the host, inference latencies, requests and errors.
- Model Decoupling: when you are serving models, you want the server to be able to use trained models without knowing anything about their inner workings. The model may be identifying cats in images or doing sentiment analysis; no change should be needed on the server beyond deploying a different model.
- Cross Framework: there are many different Neural Network frameworks: MXNet, TensorFlow, PyTorch, Caffe, and more. “Same Same, But Different”: all similar, but different in style and implementation details. We want a model server that just works, regardless of the framework used to build and train the model.
- Cross Platform: similar to how there are many frameworks, there are also many platforms you can run your server on, from the OS (Linux, Windows) to the actual compute processor, which can be a CPU, a GPU or a TPU.
And beyond all of that, there is one important meta-concern, Ease of Use: all of the concerns just mentioned need to be addressed in a way that is easy to use, quick to learn, and just works!
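The Performance concern can be made concrete with some back-of-the-envelope arithmetic. The sketch below takes the roughly 11 GFLOPs per forward pass quoted for ResNet-152 in these notes; the traffic target and the utilization factor are hypothetical numbers chosen only for illustration.

```python
def required_flops_per_second(flops_per_inference, target_tps, utilization=1.0):
    """Peak compute (FLOP/s) needed to sustain target_tps inferences per second."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return flops_per_inference * target_tps / utilization

RESNET152_FLOPS = 11e9   # ~11 GFLOPs per forward pass, as quoted in the notes
TARGET_TPS = 100         # hypothetical traffic target
UTILIZATION = 0.5        # hypothetical: hardware sustains half of its peak rate

needed = required_flops_per_second(RESNET152_FLOPS, TARGET_TPS, UTILIZATION)
print(f"{needed / 1e12:.1f} TFLOP/s of peak compute")
```

Even at this modest request rate the estimate lands in the teraflop range, which is why efficient use of compute and the throughput/latency balance matter so much for a model server.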
Are there systems that handle that for us? The answer is: yes! Deep Learning serving is pretty nascent, but there are a few systems. Let’s go over a few:
- TF Serving was open sourced in Feb 2016 and went 1.0 in Aug 2017. It is designed to serve TF models over gRPC, and is used extensively within Google.
- Clipper is an ongoing project by RISELab at UC Berkeley. Open sourced in 2017, currently in v0.2. It is a machine learning serving system with support for various backend engines, including Caffe, TF and recently also MXNet.
- MXNet Model Server, or MMS for short, is actively developed by my team and was open sourced in Dec 2017. It is built on top of Apache MXNet, which is AWS’s DL framework of choice. MMS is almost at v0.2, in active development, and in this talk we will dive deeper into how it’s designed and some of the exciting engineering challenges we have in front of us as we keep developing the system.
Image sources:
- TF: https://commons.wikimedia.org/wiki/File:Tensorflow_logo.svg
- Clipper: https://github.com/ucbrise/clipper/blob/develop/images/clipper-logo.png (Apache 2.0 license)
Now that you have seen MMS in action on a simple use case, I’d like to dive into some technical details on how MMS is engineered and used.
I'll start with the Model Archive.
Now let's talk about MMS' network interface.
Let's see how MMS uses containers.
Metrics.
And lastly, I'd like to chat about how we're leveraging ONNX to achieve cross platform support.
To decouple the actual model from the serving framework, we designed the “Model Archive”. A Model Archive is a file that encapsulates all of the model-specific logic. It is the one and only resource MMS needs in order to set up serving for the model. In many ways it is similar to Java’s JAR file, and indeed we took a similar implementation approach.
Let’s take a look at what is needed to generate a model archive: a trained neural network and a signature file defining input and output types and shapes, which tells MMS what endpoints to set up and how to transform the inputs and outputs. Then there’s the option to include custom code, which allows users to add feature extraction logic, or any other init/pre/post-processing logic they may want to build into the model. Additionally, users can package whatever other files their model will need at runtime. Class labels are an example use case for aux files.
Users use the MMS export CLI to package up all of these assets into a Model Archive package, which is then used by MMS to initialize and serve requests, as we’ve seen earlier.
This decoupling enables a clean separation of responsibilities between model creation and model serving:
1. The ML Engineer or Data Scientist builds and trains the model, writes feature extraction code, and then packages it all up into the archive.
2. The Software Engineer or DevOps Engineer sets up MMS on a prod cluster and configures MMS to point to the archive, either on the local FS or at a remote URL.
Let’s quickly jump to the console to see how this looks.
(DEMO)
- Show a pre-prepared folder with model, signature, code and aux files
- Open the signature and show
- Open the code and show
- Show how the export utility is used
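The idea of a single self-describing archive can be sketched in a few lines of Python. This is not the MMS export CLI or its real file format: the zip container, the file names and the signature fields below are all hypothetical, chosen only to illustrate bundling a network, a signature, custom code and auxiliary assets into one file that a server can open.

```python
import io
import json
import zipfile

def build_archive(files):
    """Bundle {name: bytes} into one in-memory zip, loosely like a JAR."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()

def read_signature(archive_bytes):
    """What a server might do first: open the archive and parse the signature."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return json.loads(archive.read("signature.json"))

# Hypothetical signature layout: the field names are illustrative only.
signature = {
    "inputs": [{"name": "data", "shape": [1, 3, 224, 224]}],
    "input_type": "image/jpeg",
    "outputs": [{"name": "softmax", "shape": [1, 1000]}],
    "output_type": "application/json",
}

archive_bytes = build_archive({
    "signature.json": json.dumps(signature).encode("utf-8"),
    "model-0000.params": b"\x00" * 16,   # stand-in for trained weights
    "custom_service.py": b"# pre/post-processing hooks would go here\n",
    "synset.txt": b"cat\ndog\n",         # auxiliary asset: class labels
})

loaded = read_signature(archive_bytes)
```

Because everything the server needs travels in one file, swapping models becomes a matter of pointing the server at a different archive, which is the decoupling described above.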
One of our major design decisions when planning MMS was to focus on ease of use, while not introducing any one-way doors that would prevent improvements in the future. With that in mind, we decided to:
- Expose REST-like endpoints over HTTP: arguably the easiest kind of endpoint to integrate with, and quite different from TFS's approach, which supports only gRPC for performance reasons. All of these endpoints are automatically generated based on the model archive's signature.json.
- Use JSON as the default encoding format for endpoints, to make it easy for clients to integrate with.
- Handle binary inputs: MMS has out-of-the-box support for binary inputs such as JPEG. With this support, clients can include a JPEG image as part of the request payload, and MMS will automatically translate it into an input tensor and resize it so it fits the model’s expected input tensor.
- Support the OpenAPI specification: this enables hooking up tooling to automate tasks, such as auto-generating client code across many popular programming languages.
Let’s see how this looks.
Demo 3: curl the api-description endpoint and go over the response.
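From the client side, calling the REST-like `<model-name>/predict` endpoint could look roughly like the sketch below. The host, port, model name and the choice to send the image as a raw request body with an `image/jpeg` content type are assumptions for illustration; the actual server may expect a different payload encoding, so check its API description before relying on this.

```python
import urllib.request

def build_predict_request(base_url, model_name, image_bytes):
    """Build (but do not send) a POST to the REST-like <model-name>/predict endpoint."""
    url = f"{base_url.rstrip('/')}/{model_name}/predict"
    return urllib.request.Request(
        url,
        data=image_bytes,                        # binary input in the request payload
        headers={"Content-Type": "image/jpeg"},  # assumed content type
        method="POST",
    )

# Hypothetical host, port and model name; the bytes stand in for a real JPEG file.
request = build_predict_request("http://127.0.0.1:8080", "squeezenet", b"\xff\xd8\xff\xe0")

# Sending it would be urllib.request.urlopen(request).read(); per the notes,
# the response is JSON-encoded by default.
```

Keeping the request construction separate from the network call makes the client easy to test without a running server.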
Anyone who has ever owned a service in production knows how critical it is to have a reliable and extensive set of operational metrics. You want them reported at a relatively short interval, say every minute or so, and you want operational data that lets the service owner know the important stuff, like errors, traffic, latencies, etc. We took care to design MMS with built-in Ops Metrics reporting, so MMS supports out of the box: (1) Requests (2) Latencies (3) Resources. We report all metrics across model and hostname dimensions, so users can set up their monitoring and alarming across an entire cluster, or for a specific model, etc. And MMS integrates directly with AWS CloudWatch, so users can use CloudWatch’s console and integrations to have full visibility and control over their prod setup.
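As a rough illustration of reporting metrics across model and hostname dimensions, the sketch below collects request latencies per (model, host) pair and flushes them as CSV once per interval. The class, its method names and the CSV columns are hypothetical and are not MMS's actual implementation.

```python
import csv
import io
from collections import defaultdict

class MetricsRecorder:
    """Collects request latencies keyed by the (model, host) dimensions."""

    def __init__(self):
        self._latencies_ms = defaultdict(list)

    def record(self, model, host, latency_ms):
        self._latencies_ms[(model, host)].append(latency_ms)

    def flush_csv(self):
        """Emit one CSV row per dimension pair, then reset for the next interval."""
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(["model", "host", "requests", "avg_latency_ms", "max_latency_ms"])
        for (model, host), values in sorted(self._latencies_ms.items()):
            writer.writerow([model, host, len(values), sum(values) / len(values), max(values)])
        self._latencies_ms.clear()
        return out.getvalue()

recorder = MetricsRecorder()
recorder.record("squeezenet", "host-a", 12.0)
recorder.record("squeezenet", "host-a", 18.0)
recorder.record("resnet", "host-b", 40.0)
report = recorder.flush_csv()
```

In a real deployment `flush_csv` would be called on a timer (for example once a minute), and the same aggregates could be pushed to a monitoring service instead of a log file.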
As I demoed, you can easily run MMS on your Mac. While this works well for prototyping or testing, it is not a scalable setup for high-load production traffic. For production deployments we recommend using containers: they are lightweight, provide isolation and have wide platform support. The MMS repo includes Docker images that are pre-configured with the required software components and configuration for optimal execution. Users can use these images with their container orchestration tool of choice, and there are plenty of good options out there, such as ECS, Docker and Kubernetes. Users can build the pre-configured image MMS provides, push it to a registry, and then orchestrate it with a platform such as ECS. ECS manages the cluster, including scaling, load balancing, networking, instrumenting and more. The MMS image itself includes an NGINX reverse proxy, integrated with MMS. To learn more about the MMS container setup, visit the GitHub repo, where we have details and instructions. We’re also planning to publish a blog post about this specific use case soon!
One of the Model Serving concerns we talked about earlier was Cross Framework, and indeed there are many awesome DL frameworks to choose from. In an ideal world, you would build your DL model with whatever framework you fancy, and then just deploy it, and it would just work. Think about how the JVM works: as long as the language compiles to bytecode, it will run regardless of the language you used! A good model server will enable the same flexibility. Another concern we talked about was cross platform support. Intel, Nvidia, Apple’s CoreML: many different and important runtime platforms. The problem here, as you may have observed, is that to support all of them in a naïve way we need on the order of N^2 translations/conversions, which is pretty hard to build and maintain. This is where ONNX comes into play. ONNX is an initiative driven by AWS, Facebook and Microsoft, with the goal of defining an open definition of neural network models and their operators. You can check it out on onnx.ai. Support includes quite a few frameworks and platforms, and the list is gradually expanding! Model Server will introduce ONNX support in the coming release that is going out next week. With ONNX support, users will be able to package up models built with frameworks that support ONNX, such as PyTorch, Caffe2 and CNTK. In the future, we may also leverage ONNX to help MMS run on more platforms as they add support for ONNX.
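The "order of N^2" argument is easy to check with a small count. The sketch below uses the frameworks and platforms named on the ONNX slide; grouping them into these two lists is my reading of the slide, and the functions are purely illustrative.

```python
def pairwise_converters(num_frameworks, num_platforms):
    """Direct approach: one converter per (framework, platform) pair."""
    return num_frameworks * num_platforms

def common_ir_converters(num_frameworks, num_platforms):
    """Common-IR approach: one exporter per framework plus one importer per platform."""
    return num_frameworks + num_platforms

FRAMEWORKS = ["MXNet", "Caffe2", "PyTorch", "TF", "CNTK"]
PLATFORMS = ["CoreML", "TensorRT", "NGraph", "SNPE"]

direct = pairwise_converters(len(FRAMEWORKS), len(PLATFORMS))
via_ir = common_ir_converters(len(FRAMEWORKS), len(PLATFORMS))
print(f"direct: {direct} converters, via common IR: {via_ir}")
```

With five frameworks and four platforms, the direct approach needs 20 converters while a common intermediate representation needs 9, and the gap widens as either list grows.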
OK, without further ado, let’s see how MMS looks in practice. We’ll start with the basic use case: installing MMS, loading a model, serving it, and doing a prediction. Ready?
Demo:
- Install MMS
- Show the model zoo, copy a model link
- Download an image
- Use cURL to do inference
- Examine the results
Thank you for listening. I hope you learned about deep learning systems and serving, and had a good time. MXNet and Model Server are open source, so feel free to try them out and file issues. We’re also hiring aggressively, so if you have talented friends who want to be part of the DL revolution, feel free to refer them and talk to us! Thank you!