Batch computing is a common way for developers, scientists, and engineers to run a series of jobs on a large pool of shared compute resources, such as servers, virtual machines, and containers. Amazon ECS makes it easy to run and manage Docker-enabled applications across a cluster of Amazon EC2 instances. In this session, we will show you how to run batch jobs using Amazon ECS together with other AWS services, such as AWS Lambda and Amazon SQS. We will see how you can leverage Amazon EC2 Spot Instances to power your ECS cluster and easily scale your batch workloads. You'll hear from Mapbox on how they use ECS to power their entire batch processing architecture, which collects and processes over 100 million miles of sensor data per day to power their maps. Mapbox will also discuss how they optimize their batch processing framework on ECS using Spot Instances, and demo their open source framework that will help you get up and running with ECS in minutes.
2. What to Expect from the Session
• Understand the challenges of running batch processes
• Why Amazon ECS for Batch?
• Architectural Design Patterns
• Best Practices
• Mapbox and Amazon ECS
3. Challenges of Running Batch Workloads
• Typically resource intensive
• Time constraint for completion
• Potential impact to concurrent batch jobs
• Scaling infrastructure resources
• Ensuring effective resource utilization and cost savings
• Fragile and unreliable
4. What Batch Workloads Need
• Reliable
• Easy Development
• Easy Deployment
• High Efficiency
• Low Ops Load
• Cost Effective
9. Designed for Use with Other AWS Services
Elastic Load Balancing
Amazon Elastic Block Store
Amazon Virtual Private Cloud
AWS Identity and Access Management
AWS CloudTrail
10. Security
Tasks run on your own EC2 instances in a VPC, with all of its security features, to provide a high level of isolation.
17. Container Definition
Names and identifies your image
Includes default runtime attributes for your container
• Environment Variables
• Port Mappings
• Container entry point and commands
• Resource constraints
• Etc.
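As a rough illustration of the attributes listed above, a container definition can be written as a plain Python dict using the field names of the ECS task definition format. Every value here (image, names, bucket, sizes) is a hypothetical placeholder, not something taken from the session.

```python
import json

# Hypothetical values throughout: the image, names, bucket and sizes are
# placeholders chosen for illustration only.
container_definition = {
    "name": "batch-worker",
    "image": "example/batch-worker:latest",
    "cpu": 256,        # CPU units reserved for the container (1024 = 1 vCPU)
    "memory": 512,     # hard memory limit in MiB
    "essential": True,
    "environment": [{"name": "OUTPUT_BUCKET", "value": "example-target-bucket"}],
    "portMappings": [{"containerPort": 8080, "hostPort": 8080}],
    "entryPoint": ["python"],
    "command": ["process.py"],
}

# A task definition groups one or more container definitions under a family name.
task_definition = {
    "family": "batch-job",
    "containerDefinitions": [container_definition],
}

if __name__ == "__main__":
    # With boto3 installed and credentials configured, this dict could be passed
    # as boto3.client("ecs").register_task_definition(**task_definition).
    print(json.dumps(task_definition, indent=2))
```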
22. Container Instance
• EC2 instance on which Tasks are scheduled
• We provide the ECS-optimized AMI, or you can download the lightweight ECS Agent
• Registers into the cluster upon launch
• Different EC2 instance types for variety in the resource pool
24. Trigger Batch Processing with Lambda
[Diagram: an Amazon S3 bucket (source) and an S3 bucket object; AWS Lambda calling ecs:RunTask; Amazon ECS running Task A on container instances in an Auto Scaling group across two Availability Zones; an Amazon S3 bucket (target); Amazon CloudWatch; AWS CloudTrail.]
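The Lambda-triggered pattern on this slide might look roughly like the following. This is a minimal sketch, assuming the function is subscribed to S3 object-created notifications and passes the object location to the task through environment variable overrides. The cluster, task definition, and container names, and the `SOURCE_BUCKET` / `SOURCE_KEY` variables, are hypothetical.

```python
import os
import urllib.parse


def build_run_task_request(event, cluster, task_definition, container_name):
    """Translate an S3 event notification into keyword arguments for ecs.run_task."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "count": 1,
        "overrides": {
            "containerOverrides": [
                {
                    "name": container_name,
                    # Hypothetical variable names read by the batch container.
                    "environment": [
                        {"name": "SOURCE_BUCKET", "value": bucket},
                        {"name": "SOURCE_KEY", "value": key},
                    ],
                }
            ]
        },
    }


def handler(event, context):
    """Lambda entry point: start one ECS task for the uploaded object."""
    import boto3  # imported lazily so the request builder can be tested without AWS

    request = build_run_task_request(
        event,
        cluster=os.environ.get("ECS_CLUSTER", "batch-cluster"),
        task_definition=os.environ.get("TASK_DEFINITION", "batch-job"),
        container_name=os.environ.get("CONTAINER_NAME", "batch-worker"),
    )
    response = boto3.client("ecs").run_task(**request)
    return [task["taskArn"] for task in response["tasks"]]
```

The Lambda execution role would need permission for the `ecs:RunTask` action shown in the diagram.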
25. Fleet of Workers with ECS and SQS
[Diagram: Amazon S3, DynamoDB, and Amazon Kinesis; AWS Lambda calling ecs:RunTask; an SQS queue; Amazon ECS running Task A on container instances in an Auto Scaling group across two Availability Zones; Amazon CloudWatch; AWS CloudTrail.]
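The slide does not show worker code. As a minimal sketch, assuming each worker task polls the SQS queue and deletes a message only after processing succeeds, the core loop could look like this. The function name `drain_once` and the `process` callback are hypothetical; `receive_message` and `delete_message` are the standard boto3 SQS client calls.

```python
def drain_once(sqs, queue_url, process, max_messages=10, wait_seconds=20):
    """Receive one batch of messages, process each, and delete those that succeed.

    `sqs` is an SQS client (for example boto3.client("sqs")). Returns the number
    of messages that were processed and deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=max_messages,
        WaitTimeSeconds=wait_seconds,  # long polling reduces empty responses
    )
    done = 0
    for message in response.get("Messages", []):
        try:
            process(message["Body"])
        except Exception:
            # Leave the message in the queue: it becomes visible again after the
            # visibility timeout, so another worker can retry it.
            continue
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        done += 1
    return done
```

Inside a container, a worker would typically create the client once and call `drain_once` in a loop. Because failed messages are simply left on the queue, processing should be safe to repeat.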
26. Long-running Batch Jobs
• Utilize Spot Instances
• EC2 Spot Blocks for Defined-Duration Workloads
• ECS event stream for CloudWatch Events
• Service Scaling and Monitoring
[Diagram: Amazon ECS running Tasks A, B, and C on container instances in an Auto Scaling group across two Availability Zones, with Amazon CloudWatch and AWS CloudTrail.]
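One way to use the ECS event stream for monitoring is to match task state change events and pick out containers that exited with a non-zero code. The sketch below assumes the event shape that ECS publishes to CloudWatch Events (`source`, `detail-type`, and a `detail` object with `lastStatus` and `containers`). The helper name `failed_containers` is hypothetical.

```python
# Event pattern for a CloudWatch Events rule that matches ECS task state changes.
EVENT_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
}


def failed_containers(event):
    """Return (task ARN, container name, exit code) for containers that exited non-zero.

    Events for tasks that have not stopped yet yield an empty list.
    """
    detail = event.get("detail", {})
    if detail.get("lastStatus") != "STOPPED":
        return []
    return [
        (detail.get("taskArn"), container.get("name"), container["exitCode"])
        for container in detail.get("containers", [])
        if container.get("exitCode") not in (None, 0)
    ]
```

A rule target such as a Lambda function could call `failed_containers(event)` and alert or resubmit work when the list is non-empty.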
27. Best Practices
• Store state, inputs, and outputs in S3 or another datastore
• Minimize dependencies between task definitions (they should be independent of each other)
• Use Spot Instances and Spot fleets for long-running batch jobs
• Monitor cluster state with ECS APIs
• Share pools of resources
• Auto Scaling, VPC, IAM, scheduled Reserved Instances
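For the "monitor cluster state with ECS APIs" practice, a small sketch using the `DescribeClusters` call is shown below. The function name `summarize_cluster` is hypothetical; the response fields are the counters that `describe_clusters` returns for each cluster.

```python
def summarize_cluster(ecs, cluster_name):
    """Return instance and task counts for one cluster.

    `ecs` is an ECS client (for example boto3.client("ecs")).
    """
    response = ecs.describe_clusters(clusters=[cluster_name])
    if not response.get("clusters"):
        # Unknown clusters are reported under "failures" instead of "clusters".
        raise LookupError(f"cluster not found: {cluster_name}")
    cluster = response["clusters"][0]
    return {
        "instances": cluster["registeredContainerInstancesCount"],
        "running": cluster["runningTasksCount"],
        "pending": cluster["pendingTasksCount"],
    }
```

A steadily growing `pending` count is one possible signal that the cluster needs more container instances.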
50. What is watchbot?
A library to help run a highly-scalable AWS service that
performs data processing tasks in response to external
events.
You provide the messages and the logic to process them, while Watchbot handles making sure that your processing task is run at least once for each message.
53. Your task can do anything you want!
• Your task can be anything that works in Docker
• Use any language
• Environment variables as input
• bash exit codes to indicate success/failure/retry
• Do any I/O
• Save outputs to S3 or DynamoDB
54. Environment Variables
Name                              Description
Subject                           the message's subject
Message                           the message's body
MessageId                         the message's ID defined by SQS
SentTimestamp                     the time the message was sent
ApproximateFirstReceiveTimestamp  the time the message was first received
ApproximateReceiveCount           the number of times the message has been received
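A task written in Python could collect these variables into one object at startup. This is only a sketch: the `WatchbotMessage` class and `read_message` helper are hypothetical names, and only the environment variable names come from the slide.

```python
import os
from dataclasses import dataclass


@dataclass
class WatchbotMessage:
    subject: str
    body: str
    message_id: str
    sent_timestamp: str
    first_receive_timestamp: str
    receive_count: int


def read_message(env=None):
    """Build a WatchbotMessage from the environment variables listed on the slide."""
    env = os.environ if env is None else env
    return WatchbotMessage(
        subject=env.get("Subject", ""),
        body=env.get("Message", ""),
        message_id=env.get("MessageId", ""),
        sent_timestamp=env.get("SentTimestamp", ""),
        first_receive_timestamp=env.get("ApproximateFirstReceiveTimestamp", ""),
        receive_count=int(env.get("ApproximateReceiveCount", "0")),
    )
```

If the message body is JSON, the task can call `json.loads(message.body)` after reading it. A `receive_count` above 1 indicates that the message has been delivered before, which can be useful when deciding whether to retry.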
55. Messages
• Use any format as long as your task is equipped to handle it
• JSON can capture more complex
56. Exit Codes
Exit code  Description             Outcome
0          completed successfully  message is removed from the queue without notification
3          rejected the message    message is removed from the queue and a notification is sent
4          no-op                   message is returned to the queue without notification
other      failure                 message is returned to the queue and a notification is sent
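The exit-code contract in the table can be wrapped in a small helper so that task logic raises exceptions and a single place translates them into exit codes. This is a sketch under that assumption: the exception classes and the `run` function are hypothetical, and only the codes 0, 3, and 4 come from the table.

```python
import sys

EXIT_SUCCESS = 0  # message removed from the queue, no notification
EXIT_REJECT = 3   # message removed from the queue, notification sent
EXIT_NOOP = 4     # message returned to the queue, no notification
EXIT_FAILURE = 1  # any other code: message returned to the queue, notification sent


class RejectMessage(Exception):
    """Raised by task logic when a message is invalid and should not be retried."""


class NotReady(Exception):
    """Raised by task logic when the message should be retried later."""


def run(task, message_body):
    """Run `task` on the message body and return the matching exit code."""
    try:
        task(message_body)
    except RejectMessage:
        return EXIT_REJECT
    except NotReady:
        return EXIT_NOOP
    except Exception:
        return EXIT_FAILURE
    return EXIT_SUCCESS


# In a real task the entry point would end with something like:
#   sys.exit(run(my_task, os.environ.get("Message", "")))
```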
57. More features!
• Logging - write logs to CloudWatch LogGroup
• Send alarms to SNS
• Reduce mode - tracks progress of distributed tasks and
runs a reduce task when everything finishes
58. Why not Lambda?
Watchbot is similar in many regards to AWS Lambda, but is
more configurable, more focused on data processing, and
not subject to several of Lambda's limitations.
• Full control over execution environment allows you to install anything you
want
• No limits on execution time
• No memory limits
• No concurrency limits or account-wide throttling
• Trade-off: Watchbot has no DynamoDB Streams or Kinesis support
59. Gotcha: EBS Boot
• ECS-optimized instances are only available as EBS boot AMIs, so consider rolling your own instance store AMI
• EBS is more expensive, especially if you are running many instances on Spot
• EBS is slower than ephemeral disks