Deploying Data Science with Docker and AWS
1. Deploying Data Science with Docker and AWS
Audience: Cambridge AWS Meetup Group
Presenter: Matt McDonnell, Data Scientist at Metail
Date: 9th June 2016
2. Context
Lots of event stream data
Many AWS components
Outputs:
- Business Intelligence
- Bespoke Analysis
- Productionised Science
3. What?
Goal: Moving laptop analyses onto a server
Turn:
<types>run_analysis.sh<presses enter>
… analysis script retrieves data from DB, Looker, web, etc. …
… runs analysis …
… outputs results as csv, png, etc. to local hard disk …
<gets back command prompt>
Into:
Automated process running on a server
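On a server nobody looks at the local hard disk afterwards, so the outputs usually need to go somewhere durable. One option is S3. The following is a minimal sketch that assumes a hypothetical bucket and key prefix. The `upload_outputs` helper is illustrative and not part of PyAnalysis. It relies only on the boto3 S3 client's `upload_file(Filename, Bucket, Key)`.

```python
"""Sketch: copy analysis outputs (csv, png, ...) from a local directory to S3.

Bucket name and key prefix are hypothetical placeholders.
"""
from pathlib import Path


def upload_outputs(s3_client, output_dir, bucket, prefix):
    """Upload every file under output_dir to s3://bucket/prefix/<relative path>.

    s3_client is expected to be a boto3 S3 client, or any object with an
    upload_file(Filename, Bucket, Key) method. Returns the keys written.
    """
    keys = []
    root = Path(output_dir)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories; their files are visited separately
        # S3 keys use forward slashes regardless of the local OS
        key = f"{prefix.rstrip('/')}/{path.relative_to(root).as_posix()}"
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys
```

Inside the container, this might run as the last step of the analysis, e.g. `upload_outputs(boto3.client("s3"), "output", "my-results-bucket", "daily/2016-06-09")`. The bucket and prefix in that call are placeholders.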
4. Why?
• Production scheduled task e.g. Firm Wide Metrics daily processing
• Make use of more powerful Amazon Web Services (AWS) cloud resources for large-scale analysis
• Ease of deployment for Data Science analysts
• Build a consistent development environment
How?
• Containerize applications and runtime using Docker to produce images
• Store images on AWS Elastic Container Registry (ECR)
• Run images either locally or on Amazon Elastic Container Service (ECS)
• Use AWS Lambda functions to trigger scheduled tasks (or react to events)
5. What is Docker?
“Docker containers wrap up a piece of software in a complete
filesystem that contains everything it needs to run: code, runtime,
system tools, system libraries – anything you can install on a server. This
guarantees that it will always run the same, regardless of the
environment it is running in.” -- https://www.docker.com/what-docker
Public code: store the Dockerfile on GitHub and use Travis to build the image automatically on Docker Hub
Private code: keep the Dockerfile private, build the image locally, and push it to AWS Elastic Container Registry (ECR)
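The private-code route can be written out as a short sequence of commands. The following is a sketch with placeholder account ID, region, and repository names. It assumes the ECR repository already exists (for example, created with `aws ecr create-repository --repository-name pyanalysis`). The login step uses the current AWS CLI v2 `get-login-password` form. CLI versions from around 2016 used `aws ecr get-login` instead.

```python
"""Sketch: build, tag and push a privately built image to ECR (placeholder names)."""


def ecr_registry(account_id: str, region: str) -> str:
    """Host name of the private ECR registry for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def push_commands(local_name: str, repo: str, tag: str,
                  account_id: str, region: str) -> list:
    """Return the shell commands that build the image and push it to ECR.

    The commands are returned rather than executed so they can be reviewed,
    or run one by one with subprocess.run(cmd, shell=True, check=True).
    """
    registry = ecr_registry(account_id, region)
    remote = f"{registry}/{repo}:{tag}"
    return [
        # Authenticate the local Docker daemon against ECR (AWS CLI v2 syntax)
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # Build from the private Dockerfile in the current directory
        f"docker build -t {local_name}:{tag} .",
        # Give the local image its ECR name, then upload it
        f"docker tag {local_name}:{tag} {remote}",
        f"docker push {remote}",
    ]
```

The full ECR image name (`<registry>/<repo>:<tag>`) is what an ECS task definition later refers to.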
6. Example application: retrieve market data
PyAnalysis
Application code built on PCR image
https://github.com/mattmcd/PyAnalysis
PCR: Python Component Runtime
Base Docker image
https://github.com/mattmcd/PCR
7. Where? Amazon Web Services Cloud
• Elastic Container Service (ECS)
  - Defines the task that runs the container
  - Runs tasks on a cluster of EC2 nodes
• EC2 instance set up to act as a node
  - Needs to be an Amazon ECS-optimized AMI
    https://docs.aws.amazon.com/AmazonECS/latest/developerguide/launch_container_instance.html
  - Needs an IAM Role that has:
    - AmazonEC2ContainerServiceforEC2Role policy attached
    - Policies to allow access to any AWS resources needed, e.g. S3
• Lambda function to trigger the ECS task
  - cron equivalent using CloudWatch scheduled events
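The ECS and Lambda pieces above could be wired together roughly as follows. This is a sketch using real boto3 calls (`register_task_definition`, `run_task`, `put_rule`, `put_targets`, `add_permission`). The cluster, task family, rule name, memory size, and 06:00 UTC schedule are all hypothetical choices. The `launchType` parameter was added to ECS after this talk, and CloudWatch Events is now branded EventBridge, although the `events` client is unchanged.

```python
"""Sketch: ECS task definition, a Lambda that starts the task, and a daily
CloudWatch Events schedule. All names below are hypothetical."""
import os

CLUSTER = os.environ.get("ECS_CLUSTER", "data-science")            # hypothetical
TASK_DEFINITION = os.environ.get("TASK_DEFINITION", "pyanalysis")  # hypothetical


def register_task(ecs_client, image_uri, family="pyanalysis"):
    """Register a task definition that runs one container from image_uri."""
    response = ecs_client.register_task_definition(
        family=family,
        containerDefinitions=[{
            "name": family,
            "image": image_uri,   # e.g. the ECR image pushed earlier
            "memory": 512,        # hard memory limit for the container, in MiB
            "essential": True,
        }],
    )
    return response["taskDefinition"]["taskDefinitionArn"]


def lambda_handler(event, context, ecs_client=None):
    """Lambda entry point: start one copy of the task on the EC2-backed cluster."""
    if ecs_client is None:
        import boto3  # boto3 ships with the AWS Lambda Python runtime
        ecs_client = boto3.client("ecs")
    response = ecs_client.run_task(
        cluster=CLUSTER,
        taskDefinition=TASK_DEFINITION,
        count=1,
        launchType="EC2",
    )
    # run_task reports placement problems in "failures" instead of raising
    if response.get("failures"):
        raise RuntimeError(f"ECS could not start task: {response['failures']}")
    return [task["taskArn"] for task in response["tasks"]]


def schedule_daily(events_client, lambda_client, function_arn,
                   rule_name="daily-analysis"):
    """Create a scheduled rule that invokes the Lambda every day at 06:00 UTC."""
    rule_arn = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression="cron(0 6 * * ? *)",
        State="ENABLED",
    )["RuleArn"]
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "run-analysis", "Arn": function_arn}],
    )
    # Grant CloudWatch Events permission to invoke the function
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

Passing clients in as arguments keeps the functions testable without AWS credentials. In Lambda, the handler creates its own ECS client.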
8. EC2 Instance Security Group
The EC2 instance used by ECS can be locked down: there is no need to SSH into it, so no inbound ports are needed
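In a VPC, a newly created security group has no inbound rules and allows all outbound traffic by default. Outbound access is still needed for the ECS agent and for image pulls. The following is a sketch with a hypothetical group name. The check reads the `IpPermissions` (inbound) field returned by `describe_security_groups`.

```python
"""Sketch: create a no-inbound security group and audit that it stays that way."""


def create_locked_down_group(ec2_client, vpc_id, name="ecs-node-no-inbound"):
    """Create a group with no ingress rules (outbound is allowed by default in a VPC)."""
    return ec2_client.create_security_group(
        GroupName=name,  # hypothetical name
        Description="ECS container instance: no inbound access",
        VpcId=vpc_id,
    )["GroupId"]


def audit_no_inbound(ec2_client, group_ids):
    """Return the IDs of groups that still have inbound rules (ideally empty)."""
    response = ec2_client.describe_security_groups(GroupIds=list(group_ids))
    return [g["GroupId"] for g in response["SecurityGroups"]
            if g.get("IpPermissions")]
```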
9. EC2 Instance AMI
Use the latest available Amazon ECS-optimized AMI: it has Docker and the ECS container agent already installed
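Rather than copying the AMI ID by hand, it can be looked up programmatically. AWS now publishes the recommended ECS-optimized AMI ID as a public SSM parameter, a mechanism introduced after this talk. The following sketch assumes the Amazon Linux 2 variant. Other variants, such as Amazon Linux 2023, use different parameter paths.

```python
"""Sketch: look up the recommended ECS-optimized AMI ID via SSM public parameters."""

# Amazon Linux 2 variant; other OS variants have their own parameter paths
ECS_AMI_PARAMETER = "/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id"


def latest_ecs_ami(ssm_client, parameter=ECS_AMI_PARAMETER) -> str:
    """Return the recommended ECS-optimized AMI ID for the client's region."""
    return ssm_client.get_parameter(Name=parameter)["Parameter"]["Value"]
```

The result is region-specific, so the SSM client should be created in the region where the instance will run, e.g. `boto3.client("ssm", region_name="eu-west-1")`.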
10. EC2 Instance Details
Enable Auto-assign Public IP so that the instance can connect to ECS, and assign a custom IAM Role as a hook for access permissions
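The same instance settings can be expressed as arguments to boto3's `ec2.run_instances`. The following sketch uses placeholder IDs, a hypothetical cluster name, and an arbitrary instance type. It also writes `ECS_CLUSTER` into `/etc/ecs/ecs.config` through user data. Without that setting, the ECS agent joins the cluster named `default`.

```python
"""Sketch: run_instances parameters for an ECS container instance (placeholder values)."""


def ecs_instance_params(ami_id, subnet_id, security_group_id,
                        instance_profile, cluster, instance_type="t2.medium"):
    """Build keyword arguments for boto3's ec2.run_instances()."""
    # The ECS agent reads /etc/ecs/ecs.config to decide which cluster to join
    user_data = (
        "#!/bin/bash\n"
        f"echo ECS_CLUSTER={cluster} >> /etc/ecs/ecs.config\n"
    )
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        # The custom IAM Role reaches the instance through an instance profile
        "IamInstanceProfile": {"Name": instance_profile},
        "UserData": user_data,  # boto3 base64-encodes this automatically
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "SubnetId": subnet_id,
            "AssociatePublicIpAddress": True,  # the "Auto-assign Public IP" setting
            "Groups": [security_group_id],     # with NetworkInterfaces, groups go here
        }],
    }
```

Usage would be `boto3.client("ec2").run_instances(**ecs_instance_params(...))`.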
11. EC2 Instance IAM Role
Attach the AmazonEC2ContainerServiceforEC2Role policy and any extra access policies needed by containers on the instance
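Outside the console, the role also needs an EC2 trust policy and an instance profile to wrap it. The console creates the instance profile behind the scenes. The following is a sketch with a hypothetical role name. The extra policy shown in the usage note is just one example of an S3 access policy.

```python
"""Sketch: IAM role and instance profile for ECS container instances (hypothetical names)."""
import json

ECS_INSTANCE_POLICY_ARN = (
    "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
)

# Trust policy allowing EC2 instances to assume the role
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def create_ecs_instance_role(iam_client, name, extra_policy_arns=()):
    """Create the role, attach the ECS policy plus any extras, and wrap it in a profile."""
    iam_client.create_role(
        RoleName=name,
        AssumeRolePolicyDocument=json.dumps(EC2_TRUST_POLICY),
    )
    for arn in (ECS_INSTANCE_POLICY_ARN, *extra_policy_arns):
        iam_client.attach_role_policy(RoleName=name, PolicyArn=arn)
    # EC2 receives the role via an instance profile (given the same name here)
    iam_client.create_instance_profile(InstanceProfileName=name)
    iam_client.add_role_to_instance_profile(InstanceProfileName=name, RoleName=name)
    return name
```

For containers that read from S3, the call might be `create_ecs_instance_role(boto3.client("iam"), "ecs-node", ["arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"])`. A narrower custom policy is usually preferable.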