SlideShare a Scribd company logo
1 of 17
Download to read offline
Deploying Data Science with
Docker and AWS
Audience: Cambridge AWS Meetup Group
Presenter: Matt McDonnell, Data Scientist at Metail
Date: 9th June 2016
Context
Lots of event stream data
Many AWS components
Outputs:
- Business Intelligence
- Bespoke Analysis
- Productionised Science
What?
Goal: Moving laptop analyses onto a server
Turn :
<types>run_analysis.sh<presses enter>
… analysis script retrieves data from DB, Looker, web, etc. …
… runs analysis …
… outputs results as csv, png, etc. to local hard disk …
<gets back command prompt>
Into :
Automated process running on a server
Why?
• Production scheduled task e.g. Firm Wide Metrics daily processing
• Make use of more powerful Amazon Web Services (AWS) cloud resources
for large scale analysis
• Ease of deployment for Data Science analysts
• Build consistent development environment
How?
• Containerize applications and runtime using Docker to produce images
• Store images on AWS Elastic Container Registry (ECR)
• Run images either locally, or Amazon Elastic Container Service (ECS)
• Use AWS Lambda functions to trigger scheduled tasks (or react to events)
What is Docker?
“Docker containers wrap up a piece of software in a complete
filesystem that contains everything it needs to run: code, runtime,
system tools, system libraries – anything you can install on a server. This
guarantees that it will always run the same, regardless of the
environment it is running in.” -- https://www.docker.com/what-docker
Public code: store Dockerfile on GitHub, use Travis to automatically
build image on DockerHub
Private code: private Dockerfile, build locally, push image to AWS Elastic
Container Registry
Example application: retrieve market data
PyAnalysis
Application code built on PCR image
https://github.com/mattmcd/PyAnalysis
PCR: Python Component Runtime
Base Docker image
https://github.com/mattmcd/PCR
Where? Amazon Web Services Cloud
• Elastic Container Service (ECS)
• Defines the task that runs the container
• Runs tasks on a cluster of EC2 nodes
• EC2 instance set up to act as node
• Needs to be an AWS ECS optimized AMI
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/launch_container_instance.html
• Needs an IAM Role that has:
• AmazonEC2ContainerServiceforEC2Role policy attached
• Policies to allow access to any AWS resources needed e.g. S3
• Lambda function to trigger ECS task
• cron equivalent by using CloudWatch scheduled events
EC2 Instance Security Group
EC2 instance used by ECS can be locked down – no need to SSH in to it so no inbound ports needed
EC2 Instance AMI
Use latest available Amazon ECS Optimized AMI – it has Docker and ECS Container Agent already installed
EC2 Instance Details
Enable Auto-assign Public IP so ECS can connect and assign a custom IAM Role as a hook for access permissions
EC2 Instance IAM Role
Attach AmazonEC2ContainerServiceForEC2Role Policy and any extra access Policies for containers on the instance
ECS Task
ECS task retrieves image and runs it
Lambda function
Use the lambda-canary blueprint as a basis for cron job equivalents
Lambda function
cron job equivalent via CloudWatch scheduled event
Lambda Function
Simple Lambda function to run task on ECS
Lambda function IAM role
AWS will create default IAM Roles for Lambda function – need to add ecs:RunTask to run container
Demo / Q&A
Blog posts
• ‘Scheduled Downloads using AWS EC2 and Docker — Medium’ http://bit.ly/1TO9a1h (me)
• ‘Better Together: Amazon ECS and AWS Lambda’ http://amzn.to/1UkitEF (not me)
Code samples
• https://github.com/mattmcd/PyAnalysis
• https://github.com/mattmcd/PCR
Docker images
• mattmcd/pyanalysis
• mattmcd/pcr
Me
• Twitter @mattmcd
• Email matt@metail.com or matt@matt-mcdonnell.com

More Related Content

What's hot

Long running aws lambda - Joel Schuweiler, Minneapolis
Long running aws lambda -  Joel Schuweiler, MinneapolisLong running aws lambda -  Joel Schuweiler, Minneapolis
Long running aws lambda - Joel Schuweiler, MinneapolisAWS Chicago
 
Kubernetes in Azure
Kubernetes in AzureKubernetes in Azure
Kubernetes in AzureKarl Ots
 
Doing Azure With PowerShell
Doing Azure With PowerShellDoing Azure With PowerShell
Doing Azure With PowerShellThomas Lee
 
Kube London May 2018
Kube London May 2018Kube London May 2018
Kube London May 2018Justin Davies
 
Nested Beanstalk Deployment - Brett Sutter, Minneapolis
 Nested Beanstalk Deployment - Brett Sutter, Minneapolis Nested Beanstalk Deployment - Brett Sutter, Minneapolis
Nested Beanstalk Deployment - Brett Sutter, MinneapolisAWS Chicago
 
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...Amazon Web Services
 
SoCal NodeJS Meetup 20170215_aws_lambda
SoCal NodeJS Meetup 20170215_aws_lambdaSoCal NodeJS Meetup 20170215_aws_lambda
SoCal NodeJS Meetup 20170215_aws_lambdaStefan Deusch
 
Azure kubernetes service (aks) part 3
Azure kubernetes service (aks)   part 3Azure kubernetes service (aks)   part 3
Azure kubernetes service (aks) part 3Nilesh Gule
 
Walk-through: Amazon ECS
Walk-through: Amazon ECSWalk-through: Amazon ECS
Walk-through: Amazon ECSKnoldus Inc.
 
AWS Elastic Container Service (ECS) with a CI Pipeline Overview
AWS Elastic Container Service (ECS) with a CI Pipeline OverviewAWS Elastic Container Service (ECS) with a CI Pipeline Overview
AWS Elastic Container Service (ECS) with a CI Pipeline OverviewWyn B. Van Devanter
 
Major Container Platform Comparison
Major Container Platform ComparisonMajor Container Platform Comparison
Major Container Platform Comparisonindu Yadav
 
Serverless Apps with Open Whisk
Serverless Apps with Open Whisk Serverless Apps with Open Whisk
Serverless Apps with Open Whisk Dev_Events
 
Binary Studio Academy 2016. MS Azure. Cloud hosting.
Binary Studio Academy 2016. MS Azure. Cloud hosting.Binary Studio Academy 2016. MS Azure. Cloud hosting.
Binary Studio Academy 2016. MS Azure. Cloud hosting.Binary Studio
 
TugaIT 2016 - Docker and the world of “containerized" environments​
TugaIT 2016 - Docker and the world of “containerized" environments​TugaIT 2016 - Docker and the world of “containerized" environments​
TugaIT 2016 - Docker and the world of “containerized" environments​Pedro Sousa
 
Deploy Elasticsearch Cluster on Kubernetes
Deploy Elasticsearch Cluster on KubernetesDeploy Elasticsearch Cluster on Kubernetes
Deploy Elasticsearch Cluster on KubernetesIsmaeel Enjreny
 

What's hot (20)

Long running aws lambda - Joel Schuweiler, Minneapolis
Long running aws lambda -  Joel Schuweiler, MinneapolisLong running aws lambda -  Joel Schuweiler, Minneapolis
Long running aws lambda - Joel Schuweiler, Minneapolis
 
AKS
AKSAKS
AKS
 
Azure AKS
Azure AKSAzure AKS
Azure AKS
 
Kubernetes in Azure
Kubernetes in AzureKubernetes in Azure
Kubernetes in Azure
 
Doing Azure With PowerShell
Doing Azure With PowerShellDoing Azure With PowerShell
Doing Azure With PowerShell
 
Kube London May 2018
Kube London May 2018Kube London May 2018
Kube London May 2018
 
Nested Beanstalk Deployment - Brett Sutter, Minneapolis
 Nested Beanstalk Deployment - Brett Sutter, Minneapolis Nested Beanstalk Deployment - Brett Sutter, Minneapolis
Nested Beanstalk Deployment - Brett Sutter, Minneapolis
 
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
 
AWS Kinesis
AWS KinesisAWS Kinesis
AWS Kinesis
 
SoCal NodeJS Meetup 20170215_aws_lambda
SoCal NodeJS Meetup 20170215_aws_lambdaSoCal NodeJS Meetup 20170215_aws_lambda
SoCal NodeJS Meetup 20170215_aws_lambda
 
Azure kubernetes service (aks) part 3
Azure kubernetes service (aks)   part 3Azure kubernetes service (aks)   part 3
Azure kubernetes service (aks) part 3
 
Apache JClouds
Apache JCloudsApache JClouds
Apache JClouds
 
Walk-through: Amazon ECS
Walk-through: Amazon ECSWalk-through: Amazon ECS
Walk-through: Amazon ECS
 
AWS Elastic Container Service (ECS) with a CI Pipeline Overview
AWS Elastic Container Service (ECS) with a CI Pipeline OverviewAWS Elastic Container Service (ECS) with a CI Pipeline Overview
AWS Elastic Container Service (ECS) with a CI Pipeline Overview
 
Major Container Platform Comparison
Major Container Platform ComparisonMajor Container Platform Comparison
Major Container Platform Comparison
 
Serverless Apps with Open Whisk
Serverless Apps with Open Whisk Serverless Apps with Open Whisk
Serverless Apps with Open Whisk
 
Amazon Web Services
Amazon Web ServicesAmazon Web Services
Amazon Web Services
 
Binary Studio Academy 2016. MS Azure. Cloud hosting.
Binary Studio Academy 2016. MS Azure. Cloud hosting.Binary Studio Academy 2016. MS Azure. Cloud hosting.
Binary Studio Academy 2016. MS Azure. Cloud hosting.
 
TugaIT 2016 - Docker and the world of “containerized" environments​
TugaIT 2016 - Docker and the world of “containerized" environments​TugaIT 2016 - Docker and the world of “containerized" environments​
TugaIT 2016 - Docker and the world of “containerized" environments​
 
Deploy Elasticsearch Cluster on Kubernetes
Deploy Elasticsearch Cluster on KubernetesDeploy Elasticsearch Cluster on Kubernetes
Deploy Elasticsearch Cluster on Kubernetes
 

Viewers also liked

Docker @ Data Science Meetup
Docker @ Data Science MeetupDocker @ Data Science Meetup
Docker @ Data Science MeetupDaniel Nüst
 
Agile deployment predictive analytics on hadoop
Agile deployment predictive analytics on hadoopAgile deployment predictive analytics on hadoop
Agile deployment predictive analytics on hadoopDataWorks Summit
 
PMML Execution of R Built Predictive Solutions
PMML Execution of R Built Predictive SolutionsPMML Execution of R Built Predictive Solutions
PMML Execution of R Built Predictive Solutionsaguazzel
 
Pattern: PMML for Cascading and Hadoop
Pattern: PMML for Cascading and HadoopPattern: PMML for Cascading and Hadoop
Pattern: PMML for Cascading and HadoopPaco Nathan
 
Using Docker Containers to Improve Reproducibility in Software and Web Engine...
Using Docker Containers to Improve Reproducibility in Software and Web Engine...Using Docker Containers to Improve Reproducibility in Software and Web Engine...
Using Docker Containers to Improve Reproducibility in Software and Web Engine...Vincenzo Ferme
 
Using python and docker for data science
Using python and docker for data scienceUsing python and docker for data science
Using python and docker for data scienceCalvin Giles
 
PMML - Predictive Model Markup Language
PMML - Predictive Model Markup LanguagePMML - Predictive Model Markup Language
PMML - Predictive Model Markup Languageaguazzel
 
Docker for data science
Docker for data scienceDocker for data science
Docker for data scienceCalvin Giles
 
Reproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and AndurilReproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and AndurilChristian Frech
 

Viewers also liked (9)

Docker @ Data Science Meetup
Docker @ Data Science MeetupDocker @ Data Science Meetup
Docker @ Data Science Meetup
 
Agile deployment predictive analytics on hadoop
Agile deployment predictive analytics on hadoopAgile deployment predictive analytics on hadoop
Agile deployment predictive analytics on hadoop
 
PMML Execution of R Built Predictive Solutions
PMML Execution of R Built Predictive SolutionsPMML Execution of R Built Predictive Solutions
PMML Execution of R Built Predictive Solutions
 
Pattern: PMML for Cascading and Hadoop
Pattern: PMML for Cascading and HadoopPattern: PMML for Cascading and Hadoop
Pattern: PMML for Cascading and Hadoop
 
Using Docker Containers to Improve Reproducibility in Software and Web Engine...
Using Docker Containers to Improve Reproducibility in Software and Web Engine...Using Docker Containers to Improve Reproducibility in Software and Web Engine...
Using Docker Containers to Improve Reproducibility in Software and Web Engine...
 
Using python and docker for data science
Using python and docker for data scienceUsing python and docker for data science
Using python and docker for data science
 
PMML - Predictive Model Markup Language
PMML - Predictive Model Markup LanguagePMML - Predictive Model Markup Language
PMML - Predictive Model Markup Language
 
Docker for data science
Docker for data scienceDocker for data science
Docker for data science
 
Reproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and AndurilReproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and Anduril
 

Similar to Deploying Data Science with Docker and AWS

Re:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS IntegrationRe:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS Integrationaspyker
 
Continuous Delivery to Amazon EC2 Container Service
Continuous Delivery to Amazon EC2 Container ServiceContinuous Delivery to Amazon EC2 Container Service
Continuous Delivery to Amazon EC2 Container ServiceAmazon Web Services
 
AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...
AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...
AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...Amazon Web Services
 
AWS Summit Sydney 2014 | Running your First Application on AWS
AWS Summit Sydney 2014 | Running your First Application on AWSAWS Summit Sydney 2014 | Running your First Application on AWS
AWS Summit Sydney 2014 | Running your First Application on AWSAmazon Web Services
 
Deep Dive on Microservices and Docker
Deep Dive on Microservices and DockerDeep Dive on Microservices and Docker
Deep Dive on Microservices and DockerKristana Kane
 
AWS Workshop 102
AWS Workshop 102AWS Workshop 102
AWS Workshop 102lynn80827
 
SRV409 Deep Dive on Microservices and Docker
SRV409 Deep Dive on Microservices and DockerSRV409 Deep Dive on Microservices and Docker
SRV409 Deep Dive on Microservices and DockerAmazon Web Services
 
Running your First Application on AWS
Running your First Application on AWS Running your First Application on AWS
Running your First Application on AWS Amazon Web Services
 
Running your First Application on AWS
Running your First Application on AWSRunning your First Application on AWS
Running your First Application on AWSAmazon Web Services
 
AWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the CloudAWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the CloudAmazon Web Services
 
Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2
Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2
Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2Amazon Web Services
 
Docker on Amazon ECS
Docker on Amazon ECSDocker on Amazon ECS
Docker on Amazon ECSDeepak Kumar
 
Building a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for AnalystsBuilding a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for AnalystsAvere Systems
 
Continuous Delivery to Amazon EC2 Container Service
Continuous Delivery to Amazon EC2 Container ServiceContinuous Delivery to Amazon EC2 Container Service
Continuous Delivery to Amazon EC2 Container ServiceAmazon Web Services
 
ECS and Docker at Okta
ECS and Docker at OktaECS and Docker at Okta
ECS and Docker at OktaJon Todd
 
Architecting for the Cloud: Best Practices
Architecting for the Cloud: Best PracticesArchitecting for the Cloud: Best Practices
Architecting for the Cloud: Best PracticesAmazon Web Services
 

Similar to Deploying Data Science with Docker and AWS (20)

Containers on AWS
Containers on AWSContainers on AWS
Containers on AWS
 
Re:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS IntegrationRe:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS Integration
 
Continuous Delivery to Amazon EC2 Container Service
Continuous Delivery to Amazon EC2 Container ServiceContinuous Delivery to Amazon EC2 Container Service
Continuous Delivery to Amazon EC2 Container Service
 
AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...
AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...
AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...
 
AWS Summit Sydney 2014 | Running your First Application on AWS
AWS Summit Sydney 2014 | Running your First Application on AWSAWS Summit Sydney 2014 | Running your First Application on AWS
AWS Summit Sydney 2014 | Running your First Application on AWS
 
Deep Dive on Microservices and Docker
Deep Dive on Microservices and DockerDeep Dive on Microservices and Docker
Deep Dive on Microservices and Docker
 
Amazon ECS Deep Dive
Amazon ECS Deep DiveAmazon ECS Deep Dive
Amazon ECS Deep Dive
 
AWS Workshop 102
AWS Workshop 102AWS Workshop 102
AWS Workshop 102
 
Getting Started on AWS
Getting Started on AWSGetting Started on AWS
Getting Started on AWS
 
SRV409 Deep Dive on Microservices and Docker
SRV409 Deep Dive on Microservices and DockerSRV409 Deep Dive on Microservices and Docker
SRV409 Deep Dive on Microservices and Docker
 
Running your First Application on AWS
Running your First Application on AWS Running your First Application on AWS
Running your First Application on AWS
 
Running your First Application on AWS
Running your First Application on AWSRunning your First Application on AWS
Running your First Application on AWS
 
Introduction to DevOps on AWS
Introduction to DevOps on AWSIntroduction to DevOps on AWS
Introduction to DevOps on AWS
 
AWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the CloudAWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the Cloud
 
Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2
Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2
Amazon EC2 Container Service: Manage Docker-Enabled Apps in EC2
 
Docker on Amazon ECS
Docker on Amazon ECSDocker on Amazon ECS
Docker on Amazon ECS
 
Building a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for AnalystsBuilding a Just-in-Time Application Stack for Analysts
Building a Just-in-Time Application Stack for Analysts
 
Continuous Delivery to Amazon EC2 Container Service
Continuous Delivery to Amazon EC2 Container ServiceContinuous Delivery to Amazon EC2 Container Service
Continuous Delivery to Amazon EC2 Container Service
 
ECS and Docker at Okta
ECS and Docker at OktaECS and Docker at Okta
ECS and Docker at Okta
 
Architecting for the Cloud: Best Practices
Architecting for the Cloud: Best PracticesArchitecting for the Cloud: Best Practices
Architecting for the Cloud: Best Practices
 

Recently uploaded

Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 

Recently uploaded (20)

Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 

Deploying Data Science with Docker and AWS

  • 1. Deploying Data Science with Docker and AWS Audience: Cambridge AWS Meetup Group Presenter: Matt McDonnell, Data Scientist at Metail Date: 9th June 2016
  • 2. Context Lots of event stream data Many AWS components Outputs: - Business Intelligence - Bespoke Analysis - Productionised Science
  • 3. What? Goal: Moving laptop analyses onto a server Turn : <types>run_analysis.sh<presses enter> … analysis script retrieves data from DB, Looker, web, etc. … … runs analysis … … outputs results as csv, png, etc. to local hard disk … <gets back command prompt> Into : Automated process running on a server
  • 4. Why? • Production scheduled task e.g. Firm Wide Metrics daily processing • Make use of more powerful Amazon Web Services (AWS) cloud resources for large scale analysis • Ease of deployment for Data Science analysts • Build consistent development environment How? • Containerize applications and runtime using Docker to produce images • Store images on AWS Elastic Container Registry (ECR) • Run images either locally, or Amazon Elastic Container Service (ECS) • Use AWS Lambda functions to trigger scheduled tasks (or react to events)
  • 5. What is Docker? “Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in.” -- https://www.docker.com/what-docker Public code: store Dockerfile on GitHub, use Travis to automatically build image on DockerHub Private code: private Dockerfile, build locally, push image to AWS Elastic Container Registry
  • 6. Example application: retrieve market data PyAnalysis Application code built on PCR image https://github.com/mattmcd/PyAnalysis PCR: Python Component Runtime Base Docker image https://github.com/mattmcd/PCR
  • 7. Where? Amazon Web Services Cloud • Elastic Container Service (ECS) • Defines the task that runs the container • Runs tasks on a cluster of EC2 nodes • EC2 instance set up to act as node • Needs to be an AWS ECS optimized AMI https://docs.aws.amazon.com/AmazonECS/latest/developerguide/launch_container_instance.html • Needs an IAM Role that has: • AmazonEC2ContainerServiceforEC2Role policy attached • Policies to allow access to any AWS resources needed e.g. S3 • Lambda function to trigger ECS task • cron equivalent by using CloudWatch scheduled events
  • 8. EC2 Instance Security Group EC2 instance used by ECS can be locked down – no need to SSH in to it so no inbound ports needed
  • 9. EC2 Instance AMI Use latest available Amazon ECS Optimized AMI – it has Docker and ECS Container Agent already installed
  • 10. EC2 Instance Details Enable Auto-assign Public IP so ECS can connect and assign a custom IAM Role as a hook for access permissions
  • 11. EC2 Instance IAM Role Attach AmazonEC2ContainerServiceForEC2Role Policy and any extra access Policies for containers on the instance
  • 12. ECS Task ECS task retrieves image and runs it
  • 13. Lambda function Use the lambda-canary blueprint as a basis for cron job equivalents
  • 14. Lambda function cron job equivalent via CloudWatch scheduled event
  • 15. Lambda Function Simple Lambda function to run task on ECS
  • 16. Lambda function IAM role AWS will create default IAM Roles for Lambda function – need to add ecs:RunTask to run container
  • 17. Demo / Q&A Blog posts • ‘Scheduled Downloads using AWS EC2 and Docker — Medium’ http://bit.ly/1TO9a1h (me) • ‘Better Together: Amazon ECS and AWS Lambda’ http://amzn.to/1UkitEF (not me) Code samples • https://github.com/mattmcd/PyAnalysis • https://github.com/mattmcd/PCR Docker images • mattmcd/pyanalysis • mattmcd/pcr Me • Twitter @mattmcd • Email matt@metail.com or matt@matt-mcdonnell.com