Many customers choose AWS because they need a highly reliable, scalable, and low-cost platform on which to run their applications. Low “pay only for what you use” pricing and frequent price decreases are just the beginning of how AWS can help you optimize your usage and achieve lower costs. In this session, you will learn about a few simple tools for monitoring and managing your AWS resource usage that you can start using right away, as well as some innovative features that can help you operate at lower costs programmatically. Cost allocation reporting, detailed usage reports, billing alerts, EC2 Auto Scaling, Spot and Reserved Instances, and idle resource detection are just a few of the tools and features we will cover.
1. Optimizing Your AWS Apps and Usage to Reduce Costs
Ianni Vamvadelis
Manager, Solution Architecture
2. Agenda
• Objective
- Review the spectrum of ways to save money on your AWS application
• Tenet: Fit the cloud to your product and business model
- Use Only What You Need (and pay only for what you use!)
- Measure and Manage
- Scale Opportunistically
• Customer Spotlight
- National Rail Enquiries
3. Use Only What You Need
And pay only for what you use!
5. Background
• Private company created in 1996 owned by the TOCs
• From the busiest phone number in the UK to the #1 website in travel
• Over 1 million visits every day across web & mobile
• Achieved over 99% migration to self-service
• Customer complaints 1.3 per 100,000 contacts
• Over £800m of sales leads provided to TOCs and 3rd parties p.a.
• Over 500 services provided to 150 clients
• Annual growth of 50%
6. The Challenge
• Volatility of up to 10x peak demand
• Large deployed computer estate across 6 data centres
• Ageing computer estate
• Rapid growth in B2C and B2B business
• Ever increasing rich functionality in channels
• Multiple service desks
• Suppliers experts in application development not hosting
7. Why Cloud?
• Agility and elasticity – use what we need, when needed
• High performance – availability & resilience
• Market knowledge – solution provided by hosting & SIAM experts
• Low cost – pay for use, savings of 30%
• Commodity culture – ready and easy to use
• Flexibility and freedom – keep up to date & not locked in
8. Scale on demand
[Figure: two capacity-vs-time charts. Rigid On-Premise Resources: capacity is provisioned against predicted demand, leaving waste when actual demand is lower and customer dissatisfaction when it is higher. Elastic Cloud Resources: resources are scaled to actual demand.]
9. Use only what you need: AWS cost savings opportunities
Right-size your cloud resources
- Use resources that suit your needs (instance types, storage options, etc.)
- Improve performance: reduce churn, underutilization, bottlenecks
- Lower costs: maximize your output per dollar, don’t pay for performance you don’t require
Fit your payment model to your business model
- Do you value flexibility or predictability?
- Use a portfolio of payment models
Measure and manage your application and cloud resources
- Monitor your applications to identify new savings opportunities
10. Right-size your cloud resources
• An instance size for every purpose
• Assess your memory & CPU requirements
- Fit your application to the resource
- Fit the resource to your application
• Only use a larger instance when needed
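A minimal sketch of the right-sizing idea above: pick the cheapest instance size that still meets the assessed CPU and memory requirements. The catalog names, sizes, and prices are hypothetical placeholders, not real EC2 instance types or rates.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    name: str          # hypothetical label, not a real EC2 instance type
    vcpus: int
    memory_gib: float
    hourly_usd: float  # hypothetical price


# Hypothetical catalog, for illustration only.
CATALOG = (
    InstanceType("small", 1, 2.0, 0.05),
    InstanceType("medium", 2, 4.0, 0.10),
    InstanceType("large", 4, 16.0, 0.25),
    InstanceType("xlarge", 8, 32.0, 0.50),
)


def right_size(vcpus_needed: int, memory_gib_needed: float,
               catalog: Sequence[InstanceType] = CATALOG) -> Optional[InstanceType]:
    """Return the cheapest type meeting both requirements, or None if none fits."""
    fits = [t for t in catalog
            if t.vcpus >= vcpus_needed and t.memory_gib >= memory_gib_needed]
    return min(fits, key=lambda t: t.hourly_usd) if fits else None
```

A memory-bound application (2 vCPUs, 8 GiB) lands on a larger type than a CPU-matched one (2 vCPUs, 3 GiB), which is the case for fitting the application to the resource as well as the reverse.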
11. Optimize your storage choice too: S3 & Glacier
S3 and Glacier are both:
- Secure
- Flexible
- Low-cost
- Scalable: over 2 trillion objects
- Durable: 99.999999999% (11 “9”s)
12. Choosing between S3 and Glacier
Amazon Simple Storage Service (S3)
- Designed to serve static content at high volumes, low latency, frequent access
- Low cost: as low as 5.5¢ per GB-month (or 3.7¢ for reduced redundancy)
Amazon Glacier
- Designed for long-term cold storage: infrequent access, long retrieval times (3-5 hrs)
- Extremely low-cost: 1¢ per GB-month
Tips:
- Optimize access: Reduce payload size, # of accesses (e.g., consolidated logs)
- Monitor for unexpected access/growth patterns: e.g., misconfigured log archiving
- Set Lifecycle Policies: object expiration dates; auto-move S3 files to Glacier
Illumina, the leading provider of DNA sequencing instruments, uses Glacier to store large blocks of genomic data all over the world
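To make the S3 vs. Glacier trade-off concrete, here is a rough sketch of the monthly storage arithmetic using the per-GB prices quoted on this slide. It assumes a flat rate for every gigabyte and ignores request, retrieval, and data-transfer charges, so treat the result as an upper bound on the storage-only saving. The function names are illustrative.

```python
from decimal import Decimal

# Prices in cents per GB-month, copied from the slide above.
PRICE_CENTS_PER_GB_MONTH = {
    "s3": Decimal("5.5"),
    "s3_reduced_redundancy": Decimal("3.7"),
    "glacier": Decimal("1"),
}


def monthly_storage_cost_usd(gigabytes: int, tier: str) -> Decimal:
    """Storage-only monthly cost in USD at a flat per-GB rate (an assumption)."""
    return Decimal(gigabytes) * PRICE_CENTS_PER_GB_MONTH[tier] / 100


def archive_savings_usd(gigabytes: int, from_tier: str = "s3",
                        to_tier: str = "glacier") -> Decimal:
    """Monthly saving from moving data between tiers, e.g. via a lifecycle policy."""
    return (monthly_storage_cost_usd(gigabytes, from_tier)
            - monthly_storage_cost_usd(gigabytes, to_tier))
```

At these rates, moving 10,000 GB of cold logs from S3 to Glacier cuts the storage line from $550 to $100 per month.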
13. Fit your payment model to your business model: EC2 pricing plans
On-Demand Instances
- Pay as you go for computing power
- Flat hourly rate, no up-front commitments
Reserved Instances
- Pay an up-front fee for a capacity reservation and a lower hourly rate (up to 72% savings)
- 1-year or 3-year terms
- RI Marketplace: sell RIs you no longer need; buy RIs at a discount
Spot Instances
- Pay what you want for spare EC2 capacity: your instances run if your bid exceeds the Spot price
- Potential for large scale at low cost: when they’re available, take advantage of 1,000s of Spot Instances at up to 90% savings
14. Use a spectrum of payment models
For example:
- Frontend applications on On-Demand/Reserved Instances
- Backend applications (e.g., batch video transcoding) on Spot Instances
15. Reserved Instance Marketplace: Buy and Sell
• Benefits for Buyers:
• Same underlying EC2 hardware
• Buy RIs at a discount from AWS price
• Increased selection of term lengths & prices
• Benefits for Sellers: sell RIs you no longer need, for example when
• Moving to a new AWS region
• Changing your instance type
• Switching operating systems
• A project ends and its capacity is no longer required
17. Overview of AWS Monitoring and Management Services
AWS provides detailed cloud monitoring and management:
- Consolidated Billing (see “Account Activity” navigation panel)
- CloudWatch (see AWS Management Console)
- Billing Alerts (see “Account Activity” navigation panel)
- Trusted Advisor (see “Support Center”)
- Other APIs: tags, programmatic access, etc.
Third-party services are also available
18. Consolidated Billing: Single payer for a group of accounts
• One Bill for multiple accounts
• Easy Tracking of account charges (e.g., download CSV of cost data)
• Group Activities by Paying Account (e.g., Dev, Stage, Test, Prod)
• Volume Discounts can be reached faster with combined usage
• Reserved Instances are shared across accounts (including RDS Reserved DBs)
• AWS Credits are combined to minimize your bill
20. Consolidated Billing Demo (2/3)
From your payment account login, view details of each linked account in one place
21. Consolidated Billing Demo (3/3)
• Drill down into details of each account
• Download a CSV file for line item details, then analyze via spreadsheet, pivot tables, etc.
22. Amazon CloudWatch
• Overview
- Monitoring for AWS cloud resources and applications
• AWS Resources: EC2, RDS, EBS, ELB, SQS, SNS, DynamoDB, EMR, Auto Scaling, …
• Custom metrics from your application (use the PutMetricData API call)
- Gain insight, set alarms and notifications, react immediately
- Start using within minutes, auto-scale with your application
• Sophisticated Automation
- Use CloudWatch metrics with Auto Scaling to dynamically scale EC2 instances
23. Use CloudWatch to monitor & manage resource usage
• Monitor your resource utilization
- Are you using the right instance type?
- Have you left instances idle?
- Is your instance usage level or bursty?
• Manage your resource utilization
- Move bursty workloads to other instances
- Rebalance your worker nodes
- Scale nodes automatically with Auto Scaling
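The monitoring questions above (idle, level, or bursty?) can be sketched as a simple classifier over CPU utilization samples such as those CloudWatch reports. The thresholds (`idle_below`, `bursty_ratio`) are assumptions chosen for illustration, not AWS guidance; tune them to your workload.

```python
from statistics import mean
from typing import Sequence


def classify_cpu(samples: Sequence[float],
                 idle_below: float = 5.0,
                 bursty_ratio: float = 3.0) -> str:
    """Label a series of CPU utilization percentages as idle, bursty, or level."""
    if not samples:
        raise ValueError("need at least one sample")
    avg = mean(samples)
    peak = max(samples)
    if peak < idle_below:
        # Never rose above the idle threshold: candidate for shutdown.
        return "idle"
    if peak / avg >= bursty_ratio:
        # Peak far above the average: candidate for Auto Scaling or a move.
        return "bursty"
    return "level"
```

An "idle" result suggests stopping or downsizing the instance, while "bursty" suggests moving the workload or letting Auto Scaling absorb the peaks.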
24. Use CloudWatch to create Billing Alerts
• Billing Alerts notify you when estimated charges reach a given threshold
• Use Billing Alerts to track an individual developer, or your whole business
• Easily set up your billing alarm and actions
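As a sketch of what a billing alarm looks like programmatically, the helper below builds the parameters for CloudWatch's PutMetricAlarm call against the `EstimatedCharges` metric in the `AWS/Billing` namespace. The alarm name, evaluation period, and SNS topic ARN are hypothetical placeholders; the code only builds a dictionary and makes no AWS call.

```python
from typing import Any, Dict


def billing_alarm_params(threshold_usd: float, sns_topic_arn: str,
                         name: str = "monthly-bill-alert") -> Dict[str, Any]:
    """Build PutMetricAlarm parameters for an estimated-charges threshold."""
    return {
        "AlarmName": name,                       # hypothetical name
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,                         # 6 hours, an assumed choice
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],         # who gets notified
    }


# Placeholder ARN for illustration; substitute your own SNS topic.
params = billing_alarm_params(
    500, "arn:aws:sns:us-east-1:123456789012:billing-alerts")

# With boto3 installed and credentials configured, this could be submitted as:
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```

Creating one alarm per developer account and one for the paying account covers both uses named above.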
25. Trusted Advisor: Enterprise Strength Monitoring/Optimization
• Monitors and recommends optimizations for:
- Cost
- Security
- Fault Tolerance
- Performance
• Available to customers with Business and Enterprise-level support
http://aws.amazon.com/premiumsupport/trustedadvisor/
30. Time-to-Result Case 1: Value of result quickly diminishes
Example: Engineering simulation
Delay: Loss of productivity, project slips
31. Time-to-Result Case 2: Result is valuable…until it’s not
Example: Weekend regression tests
Delay: Minimal impact until 8:00AM Monday
32. Consider Spot Instances for greater savings and scale
• Spot in a nutshell
- Spot instances run when Your Bid ≥ Spot Price
- Spot instances = Spare EC2 instances
- Spot instances might be interrupted at any time
• Benefits
- Savings: Up to 90% off On-Demand
- Scale: Access up to 1,000s of EC2 instances
• To use Spot
- Decide on a bid price
- Launch via Console, API, Auto Scaling
- Monitor Bid Statuses via Console/API
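The bid rule above (instances run when Your Bid ≥ Spot Price) can be sketched as a small simulation over a price history. The prices in the usage note are hypothetical, and the sketch ignores launch delays and partial-hour billing.

```python
from typing import List, Sequence


def running_intervals(bid: float, spot_prices: Sequence[float]) -> List[bool]:
    """One flag per interval: True when the bid is at or above the Spot price."""
    return [bid >= price for price in spot_prices]


def uptime_fraction(bid: float, spot_prices: Sequence[float]) -> float:
    """Share of intervals in which the instance would have been running."""
    flags = running_intervals(bid, spot_prices)
    return sum(flags) / len(flags) if flags else 0.0
```

For example, a bid of 0.10 against hypothetical prices `[0.08, 0.10, 0.12, 0.09]` runs in three of four intervals and is interrupted in the third, which is why Spot workloads need to tolerate termination.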
33. What applications work on Spot?
• Good Spot applications are:
- Delayable: to balance SLA/cost
- Scalable: “embarrassingly parallel”
- Fault-tolerant: can be terminated without losing all work
- Portable across regions, AZs, instance types
• Examples:
- MapReduce (Hadoop, Amazon EMR)
- Scientific Computing (Monte Carlo simulations)
- Batch Processing (video transcoding)
- Financial Computing (high-frequency trading algorithm backtesting)
- and many others…
Lucky Oyster crawled 3.4B Web Pages, building a 400M entry index in around 14 hours for $100 (>85% savings)!
34. Use Auto Scaling to dynamically scale your app
• Auto Scaling auto-sizes your fleet based on preset alarms and schedules
• Integrates with CloudWatch metrics
• Use Auto Scaling to
- Improve customer experience, application performance
- Maximize CPU/IO/Memory utilization
- Optimize other metrics
Scale with Real-Time Demand
36. Follow the Money vs. Follow the Customer
• Optimize utilization
- Auto Scale on utilization metrics: CPU, memory, requests, connections, …
• Optimize price paid
- Scale with Spot instances when Spot prices are low
- e.g., Run batch processes off-peak (nights, weekends) when Spot prices are lower
37. Follow the Money vs. Follow the Customer
• Optimize customer experience with Auto Scaling
• Example 1: Scale resources to meet customer demand
- Video service Auto Scales instances to respond to customer web service requests
• Example 2: Scale resources to ensure fresh results
- A scientific paper search engine Auto Scales on queue depth (# of new docs to crawl)
- 10 instances steady state and up to 5,000+ to ensure minimum throughput time
• Example 3: Scale resources preemptively before large demand
- A TV show marketing site scales up before the show and back down after
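Example 2 (scaling on queue depth) might be sketched as follows. The floor of 10 instances comes from the slide; the per-instance throughput and the hard cap of 5,000 are assumptions for illustration (the slide says "5,000+").

```python
import math


def desired_instances(queue_depth: int, docs_per_instance: int,
                      min_instances: int = 10,
                      max_instances: int = 5000) -> int:
    """Instances needed to drain the queue in one pass, clamped to fleet limits."""
    if docs_per_instance <= 0:
        raise ValueError("docs_per_instance must be positive")
    needed = math.ceil(queue_depth / docs_per_instance)
    return max(min_instances, min(max_instances, needed))
```

In practice the same calculation would be expressed as a CloudWatch alarm on queue depth driving an Auto Scaling policy, rather than computed by hand.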
38. Cost-Saving Examples
Achieve potentially large savings by profiling your application and paying only for what you need
Base Case
You run 10 m3.2xlarge instances On-Demand 24x7:
10 instances
× $1.00/instance-hour
× 24 hours/day
× ~30.5 days/month
= $7,320/month
Savings Examples
If you need to run 100% of the time, indefinitely:
10x 3-yr Heavy RIs @ 100% Utilization
= $2,731/month (63% savings)
If you can layer RIs and On-Demand to meet demand:
4x 3-yr Heavy RIs @ 100% Utilization
4x 3-yr Light RIs @ 15% Utilization
2x On-Demand @ 5% Utilization
= $1,843/month (75% savings)
If you Auto Scale from 2 to 10 instances around primetime TV (6-11pm, Mon-Fri):
2x 3-yr Heavy RIs @ 100% Utilization
8x 3-yr Light RIs @ 15% Utilization
= $1,683/month (77% savings)
If you can use 40x Spot Instances at 25% up-time:
= $840/month (89% savings)
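The percentages on this slide can be checked with a few lines of arithmetic. The sketch below recomputes the base case from the slide's inputs and derives each savings figure from the quoted monthly totals; it does not model the underlying RI or Spot pricing, which the slide does not give.

```python
# Base case from the slide: 10 instances x $1.00/instance-hour x 24 h x ~30.5 days.
BASE_MONTHLY_USD = 10 * 1.00 * 24 * 30.5

# Monthly totals quoted on the slide for each scenario.
SCENARIOS_USD = {
    "10x 3-yr Heavy RIs": 2731,
    "Layered RIs + On-Demand": 1843,
    "Auto Scaled around primetime": 1683,
    "40x Spot at 25% up-time": 840,
}


def savings_pct(monthly_usd: float, base_usd: float = BASE_MONTHLY_USD) -> int:
    """Savings relative to the base case, rounded to a whole percent."""
    return round((1 - monthly_usd / base_usd) * 100)


for label, cost in SCENARIOS_USD.items():
    print(f"{label}: ${cost:,}/month ({savings_pct(cost)}% savings)")
```

The rounded results match the 63%, 75%, 77%, and 89% figures shown above.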
39. Conclusion (Part I): Fit the cloud to your product and business model
• Use Only What You Need (and pay only for what you use!)
• Measure and Manage
• Scale Opportunistically
40. An example putting it all together: Saving on Batch Processing
1. Pay Only for What You Use: Right-size your cloud resources
2. Monitor and Manage your system with CloudWatch, Billing Alerts, Trusted Advisor
3. Scale Opportunistically: Auto Scale worker nodes based on size of input queue
http://aws.amazon.com/architecture/
41. Conclusion (Part II): Use the cloud to create new products & business models
On-Premises
• Failure is expensive
• Experiment infrequently
• Less Innovation
Optimized Cloud
• Failure is inexpensive
• Experiment early and often
• More Innovation
44. Other simple optimization tips
• Don’t forget to…
- Disassociate unused EIPs
- Delete unassociated Amazon EBS volumes
- Delete older Amazon EBS snapshots
- Leverage Amazon S3 Object Expiration
- Defer batch activity (e.g., Hadoop) to periods when your RIs are regularly underutilized
(With Enterprise-level support, Trusted Advisor can help with some of these.)
• Netflix’s Janitor Monkey automates clean-up
- Reduces “unintentional” resource usage
- Reduces cost and clutter
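A minimal sketch of the clean-up checks above, operating on records shaped like the EC2 DescribeVolumes and DescribeSnapshots responses (`VolumeId`, `State`, `SnapshotId`, `StartTime`). The 90-day cutoff is an assumed policy, and the functions only report candidates; they delete nothing.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Sequence


def unattached_volumes(volumes: Sequence[Dict[str, Any]]) -> List[str]:
    """IDs of EBS volumes in the 'available' state, i.e. attached to nothing."""
    return [v["VolumeId"] for v in volumes if v.get("State") == "available"]


def old_snapshots(snapshots: Sequence[Dict[str, Any]], now: datetime,
                  max_age_days: int = 90) -> List[str]:
    """IDs of snapshots started before the cutoff (assumed 90-day policy)."""
    cutoff = now - timedelta(days=max_age_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]
```

Review the reported IDs before deleting anything, since an unattached volume may still hold data someone needs.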
45. Other Spot Instance Use Cases
• Batch Processing: Generic batch processing (scale out computing)
• Hadoop: MapReduce processing (e.g., Search, Big Data)
• Scientific Computing: Scientific trials, simulations, analysis
• Video/Image Processing: Encoding, transcoding, rendering
• Testing: Continuous testing, load testing websites, etc.
• Web/Data Crawling: Analyzing data and processing it
• Financial: Hedge fund analytics, energy trading, etc.
• HPC/HTC: Embarrassingly parallel jobs
• Cheap Compute: Backend servers for Facebook games, Minecraft
46. Application Usage Patterns
• Steady State (Example: Corporate Website)
• Spiky Predictable (Example: Marketing Promotions Website)
• Uncertain Unpredictable (Example: Social game or Mobile Website)
47. Amazon EMR (Hadoop): Run Task Nodes on Spot
[Figure: Amazon Elastic MapReduce Hadoop cluster. Labeled components: data source and code/scripts (mapper, reducer; HiveQL, Pig Latin, Cascading); Amazon S3 for input data ("Upload large datasets or log files directly") and output data; Amazon Elastic MapReduce service running multiple JobFlow steps; name node, core nodes (HDFS), and task nodes; metadata in Amazon SimpleDB; BI apps querying via HiveQL, Pig Latin, and JDBC/ODBC.]
48. Paying as you go on AWS lowers your Total Cost of Ownership
• By paying only for what you use, you can save on:
- Servers
- Storage
- Network
- Environment
- Administration
• Example: 82% TCO savings for Thomson Reuters
• Learn more: aws.amazon.com/economics