© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
High-Throughput Genomics on AWS
Aaron Friedman
Angel Pizarro
LFS309
November 27, 2017
Agenda
• Presentation - Introduction and AWS Batch Deep Dive (20 minutes)
• Hands-on Lab - Packaging applications as Docker containers and
integrating into AWS Batch to align genome sequences (1 hour)
• Presentation - AWS Lambda and AWS Step Functions (20 minutes)
• Hands-on Lab - Defining an end-to-end genomic data analysis workflow
using Step Functions, Lambda, and Batch (40 minutes)
Prerequisites and materials
amzn.to/reinvent17-lfs309
The problem
Genomics data processing
Typical workflow in genomics analysis
• Serial steps
• Parallel steps
• Retry logic
A reference architecture for genomics
• Job Layer: Amazon ECR (applications), Amazon S3 (data)
• Batch Layer: AWS Batch (job execution)
• Workflow Layer: AWS Step Functions, AWS Lambda (Lambda functions, orchestration)
The job layer: Application
packaging using Docker
Bioinformatics application stacks
* Image courtesy of The Broad Institute - https://www.broadinstitute.org/gatk/img/BP_workflow.png
Virtual machines vs. containers
Virtual machines
Pros:
• Easy application publishing
• Clean dependency bundling
Cons:
• Large OS images
• Duplication of basic services
• Long start time
Containers
Pros:
• Easy application publishing
• Clean dependency bundling
• Shared dependencies
• Shared OS services
• Small images
Cons:
• Some cross-container networking issues
[Figure: stacks of Application, Bins/Libs, and OS]
Dockerfile and the build process
FROM ubuntu:16.04
# Refresh the package index; the base image ships without one
RUN apt-get update && apt-get install -y python-pip python-dev
RUN pip install PIL

FROM python:2.7
RUN pip install numpy pandas
[Figure: image layers 961f9d3583, c6d01316e4, a408d3cfe23; image tags python27 and ubuntu:precise]
Docker container sources
Community containers
• https://dockstore.org/
• http://biocontainers.pro/
• http://bioshadock.genouest.org/
Custom developed
• Support for S3 download and checkpointing
• Scratch space management
• Container metadata management
• Full control on the software stack
• Licensing
• Monitoring
• Security and compliance adherence
The batch layer: AWS Batch
Introducing AWS Batch
Fully Managed
Task Execution
No software to install or
servers to manage. AWS
Batch provisions and
scales your infrastructure
Integrated with AWS
AWS Batch jobs can easily
and securely interact with
services such as Amazon S3,
DynamoDB, and Rekognition
Cost-Efficient
AWS Batch launches compute
resources tailored to your jobs
and can provision Amazon EC2
and EC2 Spot instances
AWS Batch concepts
• Jobs
• Job definitions
• Job queue
• Compute environments
• Scheduler
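As a rough sketch of how these concepts relate, the Python below builds the request bodies one might pass to the boto3 Batch client: a job definition acts as a template, and a job references that definition and a queue. The names `bwa-align` and `genomics-queue`, the image URI, the command, and the resource sizes are hypothetical placeholders, not values from this session. Compute environments and the scheduler are managed by AWS Batch and are not shown.

```python
import json


def job_definition(name, image, vcpus, memory_mib, command):
    """Build keyword arguments for batch.register_job_definition."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,        # Docker image, for example one stored in Amazon ECR
            "vcpus": vcpus,
            "memory": memory_mib,  # MiB
            "command": list(command),
        },
    }


def job(name, queue, definition_name):
    """Build keyword arguments for batch.submit_job."""
    return {"jobName": name, "jobQueue": queue, "jobDefinition": definition_name}


# Hypothetical values, for illustration only.
definition = job_definition(
    name="bwa-align",
    image="123456789012.dkr.ecr.us-east-1.amazonaws.com/bwa:latest",
    vcpus=4,
    memory_mib=8192,
    command=["bwa", "mem", "reference.fa", "reads.fastq"],
)
submission = job("align-sample-1", "genomics-queue", definition["jobDefinitionName"])

print(json.dumps(definition, indent=2))
print(json.dumps(submission, indent=2))
```

With credentials configured, the two dictionaries could be sent as `batch.register_job_definition(**definition)` and `batch.submit_job(**submission)`, where `batch = boto3.client("batch")`.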
Example: AWS Batch job architecture
• Amazon S3: input files
• S3 events trigger a Lambda function, which submits the Batch job
• Job definition: job resource requirements and other parameters
• Queue of runnable jobs
• AWS Batch scheduler
• AWS Batch compute environments
• AWS Batch execution: application image
• IAM role for the Batch job
• AWS Batch job output
A Visual Representation of AWS Batch
Executing Job(s)
Specify Docker run parameters as container overrides
Specify Job Queue
Submit Dependencies
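import boto3

batch_client = boto3.client('batch')

# `event` is the input passed to the submitting Lambda function
response = batch_client.submit_job(
    dependsOn=event['dependsOn'],
    containerOverrides=event['containerOverrides'],
    jobDefinition=event['jobDefinition'],
    jobName=event['jobName'],
    jobQueue=event['jobQueue'],
)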
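The event consumed by this call could be assembled along the following lines. This is a minimal sketch: `build_submit_event` and all argument values are hypothetical, and only the key names (`dependsOn`, `containerOverrides`, `jobDefinition`, `jobName`, `jobQueue`) come from the slide. It assumes the AWS Batch request shapes in which dependencies are listed as `jobId` entries and environment overrides as name/value pairs.

```python
def build_submit_event(job_name, job_queue, job_definition, command,
                       depends_on_job_ids=(), environment=None):
    """Assemble an event with the keys the submit_job call reads."""
    overrides = {"command": list(command)}
    if environment:
        # AWS Batch expects environment variables as name/value pairs.
        overrides["environment"] = [
            {"name": name, "value": value}
            for name, value in sorted(environment.items())
        ]
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": overrides,
        # Each entry names a job this one waits for before it can start.
        "dependsOn": [{"jobId": job_id} for job_id in depends_on_job_ids],
    }


# Hypothetical example: a variant-calling job that waits for an alignment job.
event = build_submit_event(
    job_name="variant-calling-sample-1",
    job_queue="genomics-queue",
    job_definition="variant-caller",
    command=["call_variants.sh", "sample-1.bam"],
    depends_on_job_ids=["alignment-job-id"],
    environment={"SAMPLE_ID": "sample-1"},
)
print(event)
```

Passing an empty `depends_on_job_ids` yields `dependsOn=[]`, which submits a job with no dependencies.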
Considerations for Batch Layer for genomics
• Data staging
  • Use Amazon S3 to store reference data, input data, and results
• Multi-tenancy
  • Have processes work with temporary directories
• Storage cost/efficiency
  • Each job cleans up after itself before returning
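These three considerations can be combined in a small job wrapper. The sketch below is an assumption about how such a wrapper might look, not the lab's implementation: `run_job` and the `fake_*` helpers are hypothetical, and the S3 transfers are replaced by local stand-ins so the example runs without AWS access.

```python
import os
import tempfile


def run_job(stage_in, process, stage_out):
    """Run one job inside a private scratch directory that is always removed.

    stage_in(workdir)        -> list of local input paths (e.g. fetched from S3)
    process(inputs, workdir) -> list of local result paths
    stage_out(results)       -> None (e.g. uploads results to S3)
    Returns the scratch path so callers can confirm it no longer exists.
    """
    # A unique directory per job keeps jobs that share an instance from
    # overwriting each other's files.
    with tempfile.TemporaryDirectory(prefix="job-") as workdir:
        inputs = stage_in(workdir)
        results = process(inputs, workdir)
        stage_out(results)
    # Leaving the `with` block deletes the directory, even if a step raised.
    return workdir


# Local stand-ins for the S3 transfers and the analysis tool.
uploaded = []


def fake_stage_in(workdir):
    path = os.path.join(workdir, "reads.txt")
    with open(path, "w") as handle:
        handle.write("ACGT\n")
    return [path]


def fake_process(inputs, workdir):
    out_path = os.path.join(workdir, "result.txt")
    with open(inputs[0]) as src, open(out_path, "w") as dst:
        dst.write(src.read().lower())
    return [out_path]


def fake_stage_out(results):
    for path in results:
        with open(path) as handle:
            uploaded.append(handle.read())


scratch = run_job(fake_stage_in, fake_process, fake_stage_out)
print(uploaded, os.path.exists(scratch))
```

In a real job, `stage_in` and `stage_out` would typically wrap S3 downloads and uploads, and `process` would invoke the containerized tool.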
Lab 1: Creating the job and batch
layers
The workflow layer: AWS Lambda
and AWS Step Functions
AWS Lambda
Owning servers means dealing with...
Scaling
Availability and fault tolerance
Operations and management
Provisioning and utilization
Serverless compute: AWS Lambda
COMPUTE SERVICE
Run arbitrary code without managing servers
EVENT-DRIVEN
Code only runs when it needs to run
AWS Lambda: Run code in response to
events
Lambda functions: Stateless, trigger-based code execution
Triggered by events:
• Direct sync and async API calls
• AWS service integrations
• Third-party triggers
• Many more…
Makes it easy to:
• Perform data-driven auditing, analysis, and notification
• Build back-end services that perform at scale
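As one illustration of an AWS service integration, a function triggered by Amazon S3 receives a notification event that lists the affected objects. The sketch below assumes the standard S3 notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`); the bucket name, the key, and the helper `objects_from_s3_event` are hypothetical.

```python
from urllib.parse import unquote_plus


def objects_from_s3_event(event):
    """Return (bucket, key) pairs for every record in an S3 notification event."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def lambda_handler(event, context):
    """Entry point Lambda invokes; this sketch only reports what arrived."""
    return [f"s3://{bucket}/{key}" for bucket, key in objects_from_s3_event(event)]


# Hypothetical event, trimmed to the fields the handler reads.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-genomics-bucket"},
                "object": {"key": "runs/sample%201/reads.fastq.gz"},
            }
        }
    ]
}
print(lambda_handler(sample_event, None))
```

The handler keeps no state between invocations, which matches the stateless, trigger-based model described above.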
AWS Step Functions
AWS Step Functions…
…makes it easy to
coordinate the components
of distributed applications
using visual workflows
Application lifecycle in AWS Step Functions
• Define in JSON
• Visualize in the console
• Monitor executions
Seven state types
Task       A single unit of work
Choice     Adds branching logic
Parallel   Fork and join the data across tasks
Wait       Delay for a specified time
Fail       Stops an execution and marks it as a failure
Succeed    Stops an execution successfully
Pass       Passes its input to its output
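Several of these state types can be combined to submit a Batch job and poll it until it finishes. The following is a minimal sketch written as a Python dictionary that serializes to Amazon States Language JSON. It assumes one Lambda function returns a job ID and another returns a status string such as `SUCCEEDED` or `FAILED`; the function name `batchGetJobStatus`, the state names other than `SubmitJob` and `GetJobStatus`, and the retry and wait intervals are hypothetical.

```python
import json

# REGION and ACCOUNT are left unfilled, as on the slides.
SUBMIT_ARN = "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob1"
STATUS_ARN = "arn:aws:lambda:REGION:ACCOUNT:function:batchGetJobStatus"  # hypothetical

definition = {
    "Comment": "Submit one AWS Batch job and poll until it finishes",
    "StartAt": "SubmitJob",
    "States": {
        "SubmitJob": {
            "Type": "Task",
            "Resource": SUBMIT_ARN,
            "ResultPath": "$.jobId",
            # Retry the submission a few times with exponential backoff.
            "Retry": [{
                "ErrorEquals": ["States.ALL"],
                "IntervalSeconds": 30,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "Next": "WaitForJob",
        },
        "WaitForJob": {"Type": "Wait", "Seconds": 60, "Next": "GetJobStatus"},
        "GetJobStatus": {
            "Type": "Task",
            "Resource": STATUS_ARN,
            "InputPath": "$.jobId",
            "ResultPath": "$.status",
            "Next": "CheckJobStatus",
        },
        "CheckJobStatus": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.status", "StringEquals": "SUCCEEDED", "Next": "JobSucceeded"},
                {"Variable": "$.status", "StringEquals": "FAILED", "Next": "JobFailed"},
            ],
            "Default": "WaitForJob",  # any other status: keep polling
        },
        "JobSucceeded": {"Type": "Succeed"},
        "JobFailed": {
            "Type": "Fail",
            "Error": "BatchJobFailed",
            "Cause": "AWS Batch reported FAILED",
        },
    },
}


def undefined_targets(machine):
    """Return transition targets that do not name a state in the machine."""
    states = machine["States"]
    targets = {machine["StartAt"]}
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                targets.add(state[key])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return sorted(targets - set(states))


print(json.dumps(definition, indent=2))
print("undefined targets:", undefined_targets(definition))
```

The `undefined_targets` helper is only a local sanity check on transitions; it does not replace validation by Step Functions itself.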
Build Visual Workflows Using State Types
[Figure: example workflow built from Task, Choice, Fail, and Parallel states, with branches labeled Mountains, People, and Snow, using Amazon Rekognition]
Benefits of incorporating
AWS Step Functions
Deployment with AWS Step Functions
A flexible workflow deployment model
• Decouple batch engine and workflow orchestration
• Workflow creation now done as JSON
• Easier to deploy
• Easier to automate
• Easier to test
• Can integrate non-batch applications as well
Change one line to change workflow
{
  ...
  "SubmitJob": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob1",
    "Next": "GetJobStatus"
  },
  ...
}

{
  ...
  "SubmitJob": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob2",
    "Next": "GetJobStatus"
  },
  ...
}
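The same one-line change can be applied programmatically. In this sketch, `with_resource` is a hypothetical helper, and the `GetJobStatus` state is a stand-in added only so the fragment is complete, since the slides elide the remaining states.

```python
import copy
import json


def with_resource(definition, state_name, resource_arn):
    """Return a copy of a state machine definition with one Task's Resource replaced."""
    updated = copy.deepcopy(definition)
    updated["States"][state_name]["Resource"] = resource_arn
    return updated


workflow_v1 = {
    "StartAt": "SubmitJob",
    "States": {
        "SubmitJob": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob1",
            "Next": "GetJobStatus",
        },
        # Stand-in; the real workflow defines this state elsewhere.
        "GetJobStatus": {"Type": "Succeed"},
    },
}

workflow_v2 = with_resource(
    workflow_v1,
    "SubmitJob",
    "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob2",
)

# Compare the serialized definitions line by line.
v1_lines = json.dumps(workflow_v1, indent=2, sort_keys=True).splitlines()
v2_lines = json.dumps(workflow_v2, indent=2, sort_keys=True).splitlines()
changed = [(old, new) for old, new in zip(v1_lines, v2_lines) if old != new]
for old, new in changed:
    print("-", old.strip())
    print("+", new.strip())
```

Because the workflow is plain JSON, a change like this can be reviewed, tested, and deployed without touching the batch engine.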
A Genomics Workflow
• Alignment
• Variant Calling
• Annotation
• QC
Put it together
$ aws stepfunctions start-execution \
    --state-machine-arn <your-state-machine-arn> \
    --input file://input.states.json
[Figure: AWS Command Line Interface, AWS Batch console, Step Functions console, and S3 object listing]
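The same execution could also be started from Python. The sketch below assumes a boto3 Step Functions client, whose `start_execution` call takes `stateMachineArn` and a JSON string `input` and returns an `executionArn`. `start_workflow`, `RecordingClient`, and the input fields are hypothetical; the stand-in client exists only so the example runs without AWS access.

```python
import json


def start_workflow(sfn_client, state_machine_arn, workflow_input):
    """Start one execution, mirroring `aws stepfunctions start-execution`."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(workflow_input),  # the API expects a JSON string
    )
    return response["executionArn"]


class RecordingClient:
    """Stand-in for boto3.client("stepfunctions") that records calls locally."""

    def __init__(self):
        self.calls = []

    def start_execution(self, **kwargs):
        self.calls.append(kwargs)
        return {"executionArn": kwargs["stateMachineArn"] + ":demo-execution"}


client = RecordingClient()
execution_arn = start_workflow(
    client,
    "<your-state-machine-arn>",
    {"sample": "sample-1"},  # hypothetical contents of input.states.json
)
print(execution_arn)
```

With credentials configured, replacing `RecordingClient()` with `boto3.client("stepfunctions")` would send the request to the service.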
Lab 2: Creating the workflow layer
Thank you!

Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

High-Throughput Genomics on AWS: A Reference Architecture

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:INVENT. High-Throughput Genomics on AWS. Aaron Friedman, Angel Pizarro. LFS309. November 27, 2017
  • 2. Agenda
      • Presentation - Introduction and AWS Batch Deep Dive (20 minutes)
      • Hands-on Lab - Packaging applications as Docker containers and integrating into AWS Batch to align genome sequences (1 hour)
      • Presentation - AWS Lambda and AWS Step Functions (20 minutes)
      • Hands-on Lab - Defining an end-to-end genomic data analysis workflow using Step Functions, Lambda, and Batch (40 minutes)
      Prerequisites and materials: amzn.to/reinvent17-lfs309
  • 3. The problem
  • 4. Genomics data processing: Typical workflow in genomics analysis
  • 5. Genomics data processing: Typical workflow in genomics analysis. Serial steps
  • 6. Genomics data processing: Typical workflow in genomics analysis. Parallel steps
  • 7. Genomics data processing: Typical workflow in genomics analysis. Retry logic
  • 8. A reference architecture for genomics. Job Layer: Amazon ECR (applications), Amazon S3 (data)
  • 9. A reference architecture for genomics. Job Layer: Amazon ECR, Amazon S3. Batch Layer: AWS Batch (job execution)
  • 10. A reference architecture for genomics. Job Layer: Amazon ECR, Amazon S3. Batch Layer: AWS Batch. Workflow Layer: AWS Lambda functions, AWS Step Functions (orchestration)
  • 11. A reference architecture for genomics. Job Layer: Amazon ECR, Amazon S3. Batch Layer: AWS Batch. Workflow Layer: AWS Lambda functions, AWS Step Functions
  • 12. The job layer: Application packaging using Docker
  • 13. The reference architecture. Job Layer: Amazon ECR, Amazon S3. Batch Layer: AWS Batch. Workflow Layer: AWS Lambda functions, AWS Step Functions
  • 14. Bioinformatics application stacks. Image courtesy of The Broad Institute - https://www.broadinstitute.org/gatk/img/BP_workflow.png
  • 15. Bioinformatics application stacks. Image courtesy of The Broad Institute - https://www.broadinstitute.org/gatk/img/BP_workflow.png
  • 16. Virtual machines vs. containers
      Virtual machines (each with its own Application, Bins/Libs, and OS)
      Pros: • Easy application publishing • Clean dependency bundling
      Cons: • Large OS images • Duplication of basic services • Long start time
      Containers
      Pros: • Easy application publishing • Clean dependency bundling • Shared dependencies • Shared OS services • Small images
      Cons: • Some cross-container networking issues
  • 17. Docker: Dockerfile and the build process
      Dockerfile 1:
        FROM ubuntu:16.04
        RUN apt-get install -y python-pip python-dev
        RUN pip install PIL
      Dockerfile 2:
        FROM python:2.7
        RUN pip install numpy pandas
      Image layers (diagram labels): 961f9d3583, c6d01316e4, a408d3cfe23; base images: python27, ubuntu:precise
  • 18. Docker container sources
      Community containers: https://dockstore.org/ http://biocontainers.pro/ http://bioshadock.genouest.org/
      Custom developed: • Support for S3 download and checkpointing • Scratch space management • Container metadata management • Full control on the software stack • Licensing • Monitoring • Security and compliance adherence
  • 19. The batch layer: AWS Batch
  • 20. The reference architecture. Job Layer: Amazon ECR, Amazon S3. Batch Layer: AWS Batch. Workflow Layer: AWS Lambda functions, AWS Step Functions
  • 21. Introducing AWS Batch
      Fully Managed Task Execution: No software to install or servers to manage. AWS Batch provisions and scales your infrastructure.
      Integrated with AWS: AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Rekognition.
      Cost-Efficient: AWS Batch launches compute resources tailored to your jobs and can provision Amazon EC2 and EC2 Spot instances.
  • 22. AWS Batch concepts • Jobs • Job definitions • Job queue • Compute environments • Scheduler
  • 23. Example: AWS Batch job architecture. Diagram labels: IAM Role for Batch Job; Amazon S3 Input Files; S3 Events Trigger Lambda Function, Submits Batch Job; Queue of Runnable Jobs; AWS Batch Scheduler; AWS Batch Compute Environments; AWS Batch Execution; Job Definition (Job Resource Requirements and other parameters); Application Image; AWS Batch Job Output
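The "S3 event triggers a Lambda function that submits a Batch job" path on slide 23 could be sketched roughly as follows. This is a sketch under assumptions, not the lab's code: the function name `batch_events_from_s3`, the `align-` job name prefix, and the `INPUT_S3_URI` environment variable are all hypothetical. The record layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`) and the URL-encoding of object keys follow the S3 event notification format.

```python
import posixpath
import re
from urllib.parse import unquote_plus


def batch_events_from_s3(event, job_queue, job_definition):
    """Turn an S3 event notification into one Batch submission payload per object."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Batch job names allow letters, numbers, hyphens, and underscores only.
        safe = re.sub(r"[^A-Za-z0-9_-]", "-", posixpath.basename(key))
        jobs.append({
            "jobName": ("align-" + safe)[:128],  # hypothetical naming scheme
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
            "containerOverrides": {
                # Hypothetical variable the container would read to find its input.
                "environment": [
                    {"name": "INPUT_S3_URI", "value": "s3://%s/%s" % (bucket, key)}
                ]
            },
        })
    return jobs
```

Each returned dictionary has the same keys that the `submit_job` call on slide 25 reads from its `event`.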
  • 24. A Visual Representation of AWS Batch
  • 25. Executing Job(s)
      Specify Docker run parameters as container overrides. Specify Job Queue. Submit Dependencies.
        import boto3
        batch_client = boto3.client('batch')
        # event is the Lambda event carrying the job parameters
        response = batch_client.submit_job(
            dependsOn=event['dependsOn'],
            containerOverrides=event['containerOverrides'],
            jobDefinition=event['jobDefinition'],
            jobName=event['jobName'],
            jobQueue=event['jobQueue'],
        )
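A minimal sketch of how the submission call on slide 25 might sit inside a Lambda handler. The helper `build_submit_args`, the extra `batch_client` parameter (used so the handler can be exercised without AWS access), and the returned dictionary shape are assumptions for illustration. `submit_job` and its `jobName`, `jobQueue`, `jobDefinition`, `dependsOn`, and `containerOverrides` parameters belong to the boto3 Batch client.

```python
def build_submit_args(event):
    """Collect the submit_job keyword arguments from the Lambda event."""
    args = {
        "jobName": event["jobName"],
        "jobQueue": event["jobQueue"],
        "jobDefinition": event["jobDefinition"],
    }
    # dependsOn and containerOverrides are optional in the Batch API,
    # so they are passed only when the event supplies them.
    for optional in ("dependsOn", "containerOverrides"):
        if optional in event:
            args[optional] = event[optional]
    return args


def lambda_handler(event, context, batch_client=None):
    if batch_client is None:
        import boto3  # imported lazily so the helper above works offline
        batch_client = boto3.client("batch")
    response = batch_client.submit_job(**build_submit_args(event))
    # submit_job returns the new job's ID, which a later step can poll.
    return {"jobId": response["jobId"]}
```

Returning only the job ID keeps the state passed between workflow steps small; a status-polling function can look up everything else from that ID.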
  • 26. Considerations for Batch Layer for genomics
      • Data staging: Use Amazon S3 to store reference and input data and to store results
      • Multi-tenancy: Have processes work with temporary directories
      • Storage cost/efficiency: Each job cleans up after itself before returning
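The multi-tenancy and cleanup points on slide 26 can be illustrated with a small wrapper. This is a sketch only: `run_in_scratch` and `scratch_root` are hypothetical names, and the S3 download and upload steps are left to the caller-supplied `work` function.

```python
import os
import tempfile


def run_in_scratch(work, scratch_root=None):
    """Run work(path) inside a private temporary directory, then delete it."""
    # A unique directory per job keeps jobs that share an instance from
    # colliding. The directory is removed when the with-block exits, even if
    # work() raises, so the job cleans up after itself before returning.
    with tempfile.TemporaryDirectory(prefix="job-", dir=scratch_root) as path:
        # Staging inputs from Amazon S3 and uploading results would happen
        # inside `work`; both are omitted from this sketch.
        return work(path)
```

Results have to be copied out (for example, uploaded to S3) before `work` returns, because nothing under the scratch path survives the call.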
  • 27. Lab 1: Creating the job and batch layers
  • 28. The workflow layer: AWS Lambda and AWS Step Functions
  • 29. The reference architecture. Job Layer: Amazon ECR, Amazon S3. Batch Layer: AWS Batch. Workflow Layer: AWS Lambda functions, AWS Step Functions
  • 30. AWS Lambda
  • 31. Owning servers means dealing with... • Scaling • Availability and fault tolerance • Operations and management • Provisioning and utilization
  • 32. Serverless compute: AWS Lambda
      Compute service: Run arbitrary code without managing servers
      Event-driven: Code only runs when it needs to run
  • 33. AWS Lambda: Run code in response to events
      Lambda functions: Stateless, trigger-based code execution
      Triggered by events: • Direct sync and async API calls • AWS service integrations • Third-party triggers • Many more…
      Makes it easy to: • Perform data-driven auditing, analysis, and notification • Build back-end services that perform at scale
  • 34. AWS Step Functions
  • 35. AWS Step Functions makes it easy to coordinate the components of distributed applications using visual workflows
  • 36. Application lifecycle in AWS Step Functions: Visualize in the console; Define in JSON; Monitor executions
  • 37. Seven state types
      • Task: A single unit of work
      • Choice: Adds branching logic
      • Parallel: Fork and join the data across tasks
      • Wait: Delay for a specified time
      • Fail: Stops an execution and marks it as a failure
      • Succeed: Stops an execution successfully
      • Pass: Passes its input to its output
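As an illustration of how several of these state types combine, the sketch below builds a submit-and-poll state machine as a Python dictionary that can be serialized to Amazon States Language JSON. It is not the lab's definition: the state names other than `SubmitJob` and `GetJobStatus`, the function name `batchGetJobStatus`, and the assumption that the polling function writes the job status to `$.status` are all hypothetical. `SUCCEEDED` and `FAILED` are AWS Batch job statuses.

```python
import json

# Placeholder ARN prefix, in the same style the slides use.
LAMBDA_ARN = "arn:aws:lambda:REGION:ACCOUNT:function:"


def build_definition(poll_seconds=30):
    """Task -> Wait -> Task -> Choice loop ending in Succeed or Fail."""
    return {
        "StartAt": "SubmitJob",
        "States": {
            "SubmitJob": {"Type": "Task",
                          "Resource": LAMBDA_ARN + "batchSubmitJob1",
                          "Next": "WaitForJob"},
            "WaitForJob": {"Type": "Wait", "Seconds": poll_seconds,
                           "Next": "GetJobStatus"},
            "GetJobStatus": {"Type": "Task",
                             "Resource": LAMBDA_ARN + "batchGetJobStatus",
                             "Next": "CheckStatus"},
            "CheckStatus": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "SUCCEEDED", "Next": "Done"},
                    {"Variable": "$.status", "StringEquals": "FAILED", "Next": "JobFailed"},
                ],
                "Default": "WaitForJob",  # still queued or running: poll again
            },
            "Done": {"Type": "Succeed"},
            "JobFailed": {"Type": "Fail", "Error": "BatchJobFailed",
                          "Cause": "The AWS Batch job reported FAILED"},
        },
    }


def dangling_targets(definition):
    """Return transition targets that are not defined as states."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        targets.update(state[k] for k in ("Next", "Default") if k in state)
        targets.update(choice["Next"] for choice in state.get("Choices", []))
    return targets - set(states)


if __name__ == "__main__":
    print(json.dumps(build_definition(), indent=2))
```

Checking for dangling targets before deploying catches misspelled state names locally rather than at state machine creation time.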
  • 38. Build Visual Workflows Using State Types. Diagram labels: Task, Choice, Fail, Parallel; Mountains, People, Snow; Amazon Rekognition
  • 39. Benefits of incorporating AWS Step Functions
  • 40. Deployment with AWS Step Functions
  • 41. A flexible workflow deployment model
      • Decouple batch engine and workflow orchestration
      • Workflow creation now done as JSON
      • Easier to deploy
      • Easier to automate
      • Easier to test
      • Can integrate non-batch applications as well
  • 42. Change one line to change workflow
      Before: { ... "SubmitJob": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob1", "Next": "GetJobStatus" }, ... }
      After: { ... "SubmitJob": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob2", "Next": "GetJobStatus" }, ... }
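The one-line change on slide 42 can also be applied programmatically, which fits the "easier to automate" point on slide 41. The helper below is a sketch; `with_task_resource` is a hypothetical name, and the sample definition contains only the `SubmitJob` state shown on the slide, with the elided parts left out.

```python
import copy


def with_task_resource(definition, state_name, resource_arn):
    """Return a copy of the definition with one Task state's Resource replaced."""
    state = definition["States"][state_name]
    if state.get("Type") != "Task":
        raise ValueError("%s is not a Task state" % state_name)
    updated = copy.deepcopy(definition)  # leave the caller's definition untouched
    updated["States"][state_name]["Resource"] = resource_arn
    return updated


# Only the state visible on the slide; the rest of the workflow is elided there.
workflow_v1 = {
    "States": {
        "SubmitJob": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob1",
            "Next": "GetJobStatus",
        }
    }
}

workflow_v2 = with_task_resource(
    workflow_v1, "SubmitJob",
    "arn:aws:lambda:REGION:ACCOUNT:function:batchSubmitJob2",
)
```

Working on a copy keeps both versions available, so the two definitions can be compared or deployed side by side.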
  • 43. Deployment with AWS Step Functions
  • 44. A Genomics Workflow: Alignment, Variant Calling, Annotation, QC
  • 45. Put it together
      $ aws stepfunctions start-execution --state-machine-arn <your-state-machine-arn> --input file://input.states.json
      Screenshots: AWS Command Line Interface, AWS Batch console, Step Functions console, S3 object listing
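For readers who prefer to start the workflow from Python rather than the CLI, a roughly equivalent sketch using the boto3 Step Functions client is shown below. The function name `start_workflow` and the injectable `sfn_client` parameter are assumptions for illustration; `start_execution` with `stateMachineArn` and `input` is the boto3 call that corresponds to `aws stepfunctions start-execution`.

```python
import json


def start_workflow(state_machine_arn, input_path, sfn_client=None):
    """Start one execution, using a JSON file as the execution input."""
    with open(input_path) as handle:
        payload = json.load(handle)  # fail early if the input is not valid JSON
    if sfn_client is None:
        import boto3  # imported lazily so the function can be tested offline
        sfn_client = boto3.client("stepfunctions")
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),  # the API expects the input as a JSON string
    )
    return response["executionArn"]
```

The returned execution ARN identifies the run in the Step Functions console, where its progress can be monitored.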
  • 46. Lab 2: Creating the workflow layer
  • 47. Thank you!