1. AWS Summit 2013 Tel Aviv
Oct 16 – Tel Aviv, Israel
Cost Optimization, TCO and ROI
Steffen Krause
Technology Evangelist
@AWS_Aktuell
skrause@amazon.de
2. Agenda
1. TCO comparison between cloud and traditional IT
2. How to save money on your AWS bill
3. Customer Spotlight: Time to Know
4. AWS lets you pay for only the infrastructure you need…
…and only when you need it
On-Premises (or “Private Cloud”)
• Capital Expense Model
• High upfront capital cost, high cost of ongoing support
• Inflexible
AWS
• Metered, Pay As You Go Model
• Use only what you need, using On-Demand, Reserved, or Spot Instances
• Flexible
6. In your TCO analysis: DOs
• 3- or 5-Year Amortization
• Use 3-Year Heavy Utilization RIs
• Use Volume RI Discounts
• Understand Usage Patterns
• Ratios (VM:Physical, Servers:Racks, People:Servers)
• Consider Tiered Pricing (less expensive at every tier)
• Cost Benefits of Automation (Auto Scaling, APIs, Trusted Advisor, Optimization)
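A minimal sketch of the amortization and ratio items above, assuming straight-line amortization of server capital cost. The server price, monthly operating cost and VM:Physical ratio are hypothetical placeholders, not vendor or AWS figures.

```python
def monthly_cost_per_vm(server_capex, years, vms_per_server, monthly_opex_per_server):
    """Straight-line amortization: spread capex evenly over the period,
    add per-server operating cost, then divide by VM density."""
    months = years * 12
    per_server_month = server_capex / months + monthly_opex_per_server
    return per_server_month / vms_per_server


# Hypothetical inputs: $12,000 server, $150/month to run it, 10 VMs per host.
for years in (3, 5):
    cost = monthly_cost_per_vm(12_000, years, 10, 150)
    print(f"{years}-year amortization: ${cost:.2f} per VM-month")
```

A 5-year schedule lowers the monthly figure, but only if the hardware actually stays in service that long. Running both periods makes that sensitivity visible in the comparison.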
7. In your TCO analysis: DON’Ts
• Forget Power/Cooling (compute, storage, shared network)
• Forget Administration Costs (procurement, design, build, operations, network, security personnel)
• Forget Rent/Real Estate (building depreciation, taxes, shared services staff)
• Forget Virtualization Licensing and Software Maintenance Costs
• Forget the Cost of “Redundancy” (Multi-AZ Facility)
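To see how much the omissions above can matter, here is a toy tally. Only the cost categories come from the slide; every amount is a hypothetical placeholder.

```python
# Hypothetical annual on-premises costs in USD. The flag marks categories the
# slide warns are often left out of TCO comparisons.
COST_LINES = {
    "server hardware (amortized)": (40_000, False),
    "power/cooling": (9_000, True),
    "administration staff": (30_000, True),
    "rent/real estate": (6_000, True),
    "virtualization licensing and software maintenance": (8_000, True),
    "redundancy (second facility)": (20_000, True),
}


def annual_tco(lines, include_forgotten=True):
    """Sum the cost lines, optionally skipping the commonly forgotten ones."""
    return sum(cost for cost, forgotten in lines.values()
               if include_forgotten or not forgotten)


full = annual_tco(COST_LINES)
naive = annual_tco(COST_LINES, include_forgotten=False)
print(f"naive ${naive:,} vs full ${full:,}: understated by {1 - naive / full:.0%}")
```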
8. In your TCO analysis: BONUS
• Time from ordering to procurement (releasing early = increased revenue)
• Cost of “capacity on shelf” (top of step)
• Incremental cost of adding an on-premises server when physical space is maxed out
• Real cost of resource shortfalls (bottom of step)
• Cost of disappointed or lost customers when unable to scale fast enough
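The “top of step” and “bottom of step” costs can be made concrete with a toy model. It assumes capacity is bought in fixed-size steps, that each order is sized to the demand observed one procurement cycle earlier, and that capacity is never retired. Demand, step size and lead time are hypothetical.

```python
import math


def step_capacity(demand, step_size, lead_time):
    """Capacity bought in whole steps. Each period's order is sized to the demand
    seen `lead_time` periods earlier, and capacity is never retired."""
    current = math.ceil(demand[0] / step_size) * step_size  # initial build-out
    capacity = []
    for t in range(len(demand)):
        if t >= lead_time:
            ordered = math.ceil(demand[t - lead_time] / step_size) * step_size
            current = max(current, ordered)
        capacity.append(current)
    return capacity


def idle_and_shortfall(demand, capacity):
    """Top of step: paid-for but unused capacity. Bottom of step: unmet demand."""
    idle = sum(max(c - d, 0) for d, c in zip(demand, capacity))
    shortfall = sum(max(d - c, 0) for d, c in zip(demand, capacity))
    return idle, shortfall


demand = [30, 45, 60, 80, 95, 120]  # hypothetical servers needed per quarter
capacity = step_capacity(demand, step_size=50, lead_time=1)
print(capacity, idle_and_shortfall(demand, capacity))
```

Both terms are waste: idle steps are capital spent early, and shortfalls are the lost-customer cost from the last bullet. Capacity that can track demand each period would ideally push both toward zero.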
10. USE ONLY WHAT YOU NEED
And pay only for what you use!
11. When you turn off your cloud resources, you
actually stop paying for them
12. Use only what you need
AWS cost savings opportunities
• Right-size
– Select appropriate resources
– Scale up and down as appropriate
– Turn off unused resources
• Payment models
– Flexibility vs. predictability
– Mixing payment models
• Measure and manage
– Monitor for saving options
13. Right-size: broad EC2 selection
• Micro: For Starters, Low Throughput, Websites
• Standard: Most Apps, Low-cost, App Server / Web Server
• High-CPU: Scale-out Compute, Batch Processing
• High-Memory: Databases…
• High I/O: NoSQL, Best for Random IOPS
• High Storage: OLAP, Hadoop, File Systems
• Cluster Compute: Compute + Network Throughput
• Cluster GPU: Parallel Processing
• High Cluster Memory: In-memory Apps and DBs, Best $/RAM
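Right-sizing amounts to choosing the cheapest type that still meets the workload’s requirements. The sketch below assumes a hypothetical four-entry catalog. The type names and hourly prices are placeholders, not real EC2 instance types or rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    hourly_usd: float


# Hypothetical catalog: names and prices are placeholders, not EC2 offerings.
CATALOG = [
    InstanceType("small-general", 1, 1.7, 0.06),
    InstanceType("large-general", 2, 7.5, 0.24),
    InstanceType("xlarge-highcpu", 8, 7.0, 0.58),
    InstanceType("xlarge-highmem", 2, 17.1, 0.41),
]


def cheapest_fit(catalog, min_vcpus, min_memory_gib):
    """Return the lowest-priced type meeting both requirements, or None."""
    fits = [t for t in catalog
            if t.vcpus >= min_vcpus and t.memory_gib >= min_memory_gib]
    return min(fits, key=lambda t: t.hourly_usd, default=None)


print(cheapest_fit(CATALOG, 2, 16).name)  # memory-bound workload
print(cheapest_fit(CATALOG, 4, 4).name)   # CPU-bound workload
```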
14. Optimize your storage choice too
S3 & Glacier
• S3 and Glacier are both:
– Secure
– Flexible
– Low-cost
– Scalable: over 2 trillion customer objects
– Durable: 99.999999999% (11 “9”s)
15. Choosing between S3 and Glacier
• Amazon Simple Storage Service (S3)
– Designed to serve static content
• high volumes, low latency, frequent access
– From 5.5¢/GB/Month: 11 9’s Durability
– From 3.7¢/GB/Month: 4 9’s Durability (reduced redundancy)
• Amazon Glacier
– Designed for long-term cold storage/archiving
• infrequent access, long retrieval times (3-5 hrs)
– From 1¢/GB/Month
• But retrieving data is slower and more expensive than on S3
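Using the floor prices quoted above, a rough monthly storage comparison for a hypothetical 10,000 GB data set looks like this. These are the lowest-tier prices; actual rates vary by region, volume and date. The sketch covers storage only: request charges and Glacier retrieval charges, which the slide notes are higher, are not modeled.

```python
# Floor ("from") prices per GB-month as quoted on the slide; real rates depend
# on region, volume tier and date.
PRICE_PER_GB_MONTH = {
    "S3 standard": 0.055,
    "S3 reduced redundancy": 0.037,
    "Glacier": 0.01,
}


def monthly_storage_cost(gb, storage_class):
    """Storage charge only: requests and retrievals are not included."""
    return gb * PRICE_PER_GB_MONTH[storage_class]


for storage_class in PRICE_PER_GB_MONTH:
    cost = monthly_storage_cost(10_000, storage_class)
    print(f"{storage_class}: ${cost:,.2f} per month for 10,000 GB")
```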
16. S3 and Glacier tips
• Optimize access
– Reduce payload size
– Reduce the number of accesses (e.g., consolidate logs)
• Monitor for unexpected access/growth patterns
– Misconfigured log archiving
• Set Lifecycle Policies
– Object expiration dates
– Auto-move S3 files to Glacier
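A lifecycle rule that combines both tips can be expressed as the `LifecycleConfiguration` structure accepted by the S3 API (for example, boto3’s `put_bucket_lifecycle_configuration`). The `logs/` prefix, the 30- and 365-day periods, the helper name and the bucket name are hypothetical. The code only builds the configuration and leaves the API call commented out.

```python
def archive_then_expire_rule(prefix, glacier_after_days, expire_after_days):
    """One S3 lifecycle rule: transition objects under `prefix` to Glacier,
    then expire (delete) them later."""
    # Sanity check (our assumption): expiring before the transition would make
    # the Glacier step pointless.
    if expire_after_days <= glacier_after_days:
        raise ValueError("expiration must come after the Glacier transition")
    return {
        "ID": "archive-" + prefix.rstrip("/"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": expire_after_days},
    }


lifecycle = {"Rules": [archive_then_expire_rule("logs/", 30, 365)]}

# Applying it needs boto3 and AWS credentials; the bucket name is hypothetical.
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-log-bucket", LifecycleConfiguration=lifecycle)
```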
18. EC2 pricing plans
On-Demand Instances
• Pay as you go for computing power
• Flat hourly rate, no up-front commitments
Reserved Instances
• Pay an up-front fee for a capacity reservation and a lower hourly rate (up to 72% savings)
• 1-year or 3-year terms
• RI Marketplace: sell RIs you no longer need; buy RIs at a discount
Spot Instances
• Pay what you want for spare EC2 capacity: your instances run while your bid meets or exceeds the Spot price
• Potential for large scale at low cost: when they’re available, take advantage of 1,000s of Spot Instances at up to 90% savings
19. Use a spectrum of payment models
Frontend Applications on On-Demand/Reserved Instances
+
Backend Applications* on Spot Instances
* e.g., batch video transcoding
20. The breakeven for RIs is surprisingly quick
• 1-year and 3-year RIs don’t mean that you must keep them for 1 or 3 years
• In many cases, you save money well before the end of the term
[Chart: Sample Cash Flow Summary from RI Analysis, Aggregate of Light, Medium & Heavy RIs]
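The breakeven point follows from one comparison. For `h` hours of use, an RI costs `upfront + h × ri_rate` and On-Demand costs `h × od_rate`, so the RI pays for itself once `h > upfront / (od_rate − ri_rate)`. A minimal sketch with hypothetical prices (not AWS list prices):

```python
import math


def breakeven_hours(upfront, ri_hourly, on_demand_hourly):
    """Hours of use after which upfront + hours * ri_hourly drops below
    hours * on_demand_hourly."""
    saving_per_hour = on_demand_hourly - ri_hourly
    if saving_per_hour <= 0:
        return math.inf  # the RI never pays for itself
    return upfront / saving_per_hour


# Hypothetical prices for one instance type, not AWS list prices.
hours = breakeven_hours(upfront=300.0, ri_hourly=0.05, on_demand_hourly=0.12)
# 730 is roughly the number of hours in a month (8,760 / 12).
print(f"breakeven after ~{hours:,.0f} hours (~{hours / 730:.1f} months running 24x7)")
```

With these placeholder numbers the RI breaks even after roughly 4,300 hours, about six months of continuous use, well inside a 1-year (8,760-hour) term.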
21. Other simple optimization tips
• Don’t forget to…
– Release unused Elastic IP addresses (EIPs)
– Delete unattached Amazon EBS volumes
– Delete older Amazon EBS snapshots
– Leverage Amazon S3 Object Expiration
– Defer batch activity (e.g., Hadoop) to periods when your RIs are regularly underutilized
– (For customers with Business- or Enterprise-level support, Trusted Advisor can help with some of these.)
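The EBS checks can be scripted against the EC2 API. This sketch works on response dictionaries shaped like boto3’s `describe_volumes()` and `describe_snapshots(OwnerIds=["self"])` output, so it runs without credentials. The sample IDs, dates and 90-day cutoff are hypothetical. A volume in the `available` state is not attached to any instance.

```python
from datetime import datetime, timedelta, timezone


def unattached_volume_ids(describe_volumes_response):
    """EBS volumes in the 'available' state are not attached to any instance."""
    return [v["VolumeId"] for v in describe_volumes_response["Volumes"]
            if v["State"] == "available"]


def snapshot_ids_older_than(describe_snapshots_response, days, now):
    """Snapshots whose StartTime is more than `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return [s["SnapshotId"] for s in describe_snapshots_response["Snapshots"]
            if s["StartTime"] < cutoff]


# Sample responses (hypothetical IDs and dates). With boto3 these would come from
# ec2.describe_volumes() and ec2.describe_snapshots(OwnerIds=["self"]).
volumes = {"Volumes": [
    {"VolumeId": "vol-1111", "State": "in-use"},
    {"VolumeId": "vol-2222", "State": "available"},
]}
snapshots = {"Snapshots": [
    {"SnapshotId": "snap-aaaa", "StartTime": datetime(2013, 1, 5, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-bbbb", "StartTime": datetime(2013, 10, 1, tzinfo=timezone.utc)},
]}
now = datetime(2013, 10, 16, tzinfo=timezone.utc)
print(unattached_volume_ids(volumes), snapshot_ids_older_than(snapshots, 90, now))
```

Review the lists before deleting anything, because an old snapshot may still be the only backup of a volume.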
22. “If you cannot measure it, you cannot improve it.”
- Lord Kelvin
MEASURE AND MANAGE
23. AWS Monitoring and Management Services
• Detailed cloud monitoring and management
– Consolidated Billing (in “Account Activity”)
– CloudWatch (in AWS Management Console)
– Billing Alerts (in “Account Activity”)
– Trusted Advisor (in “Support Center”)
– Other APIs: tags, programmatic access, etc.
• Third-party services are also available
24. Consolidated Billing
• One Bill for multiple accounts
• Easy Tracking of account charges (e.g., download CSV of cost data)
• Group Activities by Linked Account under one Paying Account (e.g., Dev, Stage, Test, Prod)
• Volume Discounts can be reached faster with combined usage
• Reserved Instances are shared across accounts (including RDS Reserved DBs)
• AWS Credits are combined to minimize your bill
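Volume discounts work because each slice of usage is billed at its own tier’s rate, so combined usage moves into cheaper tiers sooner. The tier boundaries, prices and per-account usage below are hypothetical.

```python
# Hypothetical volume tiers: (GB in this tier, USD per GB); None = no upper bound.
TIERS = [(1_000, 0.10), (9_000, 0.08), (None, 0.06)]


def tiered_cost(usage_gb, tiers=TIERS):
    """Bill each slice of usage at the price of the tier it falls into."""
    cost, remaining = 0.0, usage_gb
    for size, price in tiers:
        slice_gb = remaining if size is None else min(remaining, size)
        cost += slice_gb * price
        remaining -= slice_gb
        if remaining <= 0:
            break
    return cost


usage = {"dev": 800, "test": 800, "prod": 2_000}  # hypothetical GB per month
separate = sum(tiered_cost(gb) for gb in usage.values())
combined = tiered_cost(sum(usage.values()))
print(f"billed separately: ${separate:,.2f}, consolidated: ${combined:,.2f}")
```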
25. CloudWatch to monitor & manage usage
• Monitor your resource utilization
– Are you using the right instance type?
– Have you left instances idle?
– Is your instance usage level or bursty?
• Manage your resource utilization
– Move bursty workloads to other instances
– Rebalance your worker nodes
– Scale nodes automatically with Auto
Scaling
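A first pass at the monitoring questions above can be automated by classifying CPU samples, for example hourly `Average` values of the `CPUUtilization` metric fetched with CloudWatch `GetMetricStatistics`. The idle and burstiness thresholds here are illustrative assumptions, not AWS guidance.

```python
from statistics import mean, pstdev


def classify_usage(cpu_percent_samples, idle_below=5.0, bursty_cv=0.5):
    """Label an instance 'idle', 'bursty' or 'level' from CPU % samples.
    Thresholds are illustrative assumptions, not AWS guidance."""
    avg = mean(cpu_percent_samples)
    if avg < idle_below:
        return "idle"
    # Coefficient of variation: spread relative to the average load.
    cv = pstdev(cpu_percent_samples) / avg
    return "bursty" if cv > bursty_cv else "level"


print(classify_usage([1, 2, 1, 3]))            # candidate to turn off
print(classify_usage([40, 42, 38, 41]))        # steady: candidate for an RI
print(classify_usage([5, 5, 90, 5, 5, 95]))    # candidate for Auto Scaling
```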
26. Use CloudWatch to create Billing Alerts
• Alert when estimated charges reach threshold
• Track an individual developer, or your whole business
• Set up your billing alarm and actions
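A billing alarm is an ordinary CloudWatch alarm on the `EstimatedCharges` metric in the `AWS/Billing` namespace. That metric is published in the US East (N. Virginia) region once billing alerts are enabled for the account. The sketch only builds the `PutMetricAlarm` arguments; the threshold, alarm name and SNS topic ARN are hypothetical.

```python
def billing_alarm_kwargs(threshold_usd, sns_topic_arn):
    """Arguments for CloudWatch PutMetricAlarm on the EstimatedCharges metric."""
    return {
        "AlarmName": f"estimated-charges-over-{threshold_usd}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # 6 hours; estimated charges are updated several times a day
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # e.g. an SNS topic that emails the team
    }


# Hypothetical threshold and topic ARN (123456789012 is a placeholder account ID).
kwargs = billing_alarm_kwargs(500, "arn:aws:sns:us-east-1:123456789012:billing-alerts")

# Applying it needs boto3 and credentials; billing metrics live in us-east-1.
# import boto3
# boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**kwargs)
```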
27. Trusted Advisor
Enterprise-Strength Monitoring/Optimization
• Monitors and recommends optimizations for:
– Cost
– Security
– Fault Tolerance
– Performance
• Available to customers with Business- and Enterprise-level support
http://aws.amazon.com/premiumsupport/trustedadvisor/
30. Time To Know
The T2K Digital Teaching Platform is designed to promote student acquisition of 21st-century skills.
The platform enables smart and effective generation of localized content using T2K’s Content Generation Studio.
Time To Know reports results showing higher student achievement levels and stronger motivation for learning.
32. Challenges
High hosting costs with traditional computing.
Decentralized management of our environments.
Difficulty predicting business growth and adjusting infrastructure and hardware accordingly.
Reducing costs during off hours.
Reducing hosting locations to a minimum.
33. Using AWS at T2K
[Chart: T2K cost ($K) by year, 2010 to 2013, with data labels 90K, 60K, 35K and 19K. Annotated milestones include: joining AWS; no formal workflow with AWS; inefficient use of instances and storage; consolidated billing; multiple zones for fewer clusters; reducing instances and forming work procedures; internal tool development using the C# SDK for elasticity; cloud operations optimization (Newvem); Reserved Instances.]
34. Operational Benefits
Elasticity: using servers only when we need them, saving more than 40% in costs.
Reserved Instances: reducing our production costs by 38% compared to On-Demand.
Reduced investment in, and space needed for, infrastructure and hardware.
Central management for all our environments.
A variety of APIs that let us develop internal tools for operational use.
Easy scale-up.
Multiple regions for a better user experience.
Auto Scaling, using the Auto Scaling APIs.
35. Best Practices
Use separate accounts and Consolidated Billing to track your bills easily.
Use CloudWatch and SNS for billing monitoring.
Use Virtual Private Cloud for elasticity and easy control of your environment.
Use regions to cover all our potential customers.
Use tools for optimizing operations, such as Newvem.
Use the Amazon SDKs and APIs to develop internal tools for optimizing operations.
38. Time-to-Result Case 1:
Value of result quickly diminishes
Example: Engineering simulation
Delay: Loss of productivity, project slips
39. Time-to-Result Case 2:
Result is valuable…until it’s not
Example: Weekend regression tests
Delay: Minimal impact until 8:00 AM Monday
40. Spot Instances for greater savings and scale
• Spot in a nutshell
– Spot instances run when Your Bid ≥ Spot Price
– Spot instances = Spare EC2 instances
– Spot instances might be interrupted at any time
• Benefits
– Savings: Up to 90% off On-Demand
– Scale: Access up to 1,000s of EC2 instances
• To use Spot
– Decide on a bid price
– Launch via Console, API, Auto Scaling
– Monitor Bid Statuses via Console/API
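The bid rule above can be simulated over an hourly price history. This sketch assumes that each hour in which the bid meets or exceeds the Spot price is billed at that hour’s Spot price (not at the bid), and that every other hour is lost to interruption. The price series, bid and On-Demand rate are hypothetical.

```python
def simulate_spot(bid, hourly_spot_prices, on_demand_price):
    """Hour-by-hour Spot simulation: the instance runs whenever bid >= Spot price."""
    running = [bid >= price for price in hourly_spot_prices]
    hours = sum(running)
    # Assumption: each running hour is billed at that hour's Spot price.
    cost = sum(price for price, up in zip(hourly_spot_prices, running) if up)
    # An interruption is a running hour followed by a non-running hour.
    interruptions = sum(1 for a, b in zip(running, running[1:]) if a and not b)
    savings = 1 - cost / (hours * on_demand_price) if hours else 0.0
    return {"hours": hours, "cost": cost,
            "interruptions": interruptions, "savings": savings}


# Hypothetical hourly Spot prices (USD), bid and On-Demand rate.
prices = [0.03, 0.03, 0.04, 0.20, 0.05, 0.03]
print(simulate_spot(bid=0.10, hourly_spot_prices=prices, on_demand_price=0.24))
```

Raising the bid trades fewer interruptions for a higher worst-case price, which is the SLA/cost balance described for delayable workloads.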
41. What applications work on Spot?
• Good Spot applications are:
– Delayable: to balance SLA/cost
– Scalable: “embarrassingly parallel”
– Fault-tolerant: can be terminated without losing all work
– Portable across regions, AZs, instance types
• Examples:
– MapReduce (Hadoop, Amazon EMR)
– Scientific Computing (Monte Carlo simulations)
– Batch Processing (video transcoding)
– Financial Computing (high-frequency trading algorithm backtesting)
– and many others…
42. Use Auto Scaling to dynamically scale your app
• Auto Scaling auto-sizes your cluster
– Based on preset triggers and schedules
• Integrates with CloudWatch metrics
• Use Auto Scaling to
– Improve customer experience,
application performance
– Maximize CPU/IO/Memory utilization
– Optimize other metrics
Scale with Real-Time Demand
44. Follow the Money vs. Follow the Customer
• Optimize utilization
– Auto Scale on utilization metrics: CPU, memory, requests,
connections, …
• Optimize price paid
– Scale with Spot instances when Spot prices are low
– e.g., Run batch processes off-peak (nights, weekends)
when Spot prices are lower
45. Follow the Money vs. Follow the Customer
• Optimize customer experience with Auto Scaling
• Example 1: Scale resources to meet customer demand
– Video service Auto Scales instances to respond to customer web service
requests
• Example 2: Scale resources to ensure fresh results
– A scientific paper search engine Auto Scales on queue depth (number of new documents to crawl)
– 10 instances at steady state, scaling up to 5,000+ to minimize throughput time
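Example 2 boils down to a clamped ratio: target enough workers to drain the queue, but stay within the fleet limits. The 10 and 5,000 bounds come from the example above; the per-worker backlog target is a hypothetical tuning parameter. In practice the queue depth might come from an Amazon SQS attribute such as `ApproximateNumberOfMessagesVisible`, and the result could feed an Auto Scaling group’s desired capacity (e.g., `SetDesiredCapacity`).

```python
import math


def desired_workers(queue_depth, docs_per_worker, min_workers=10, max_workers=5_000):
    """Enough workers to hold each one's backlog near `docs_per_worker`,
    clamped to the fleet's minimum and maximum size."""
    wanted = math.ceil(queue_depth / docs_per_worker)
    return max(min_workers, min(max_workers, wanted))


for depth in (0, 50_000, 5_000_000):
    print(depth, desired_workers(depth, docs_per_worker=200))
```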
• Example 3: Scale resources preemptively before large demand
– A TV show marketing site scales up before the show and back down after
46. Conclusion (Part I):
Fit the cloud to your product and business model
• Use Only What You Need (and pay only for what you
use!)
• Measure and Manage
• Scale Opportunistically
47. 1. Pay Only for What You Use: Rightsize your cloud resources
2. Monitor and Manage your system with CloudWatch, Billing Alerts, Trusted Advisor
3. Scale Opportunistically: Auto Scale worker nodes based on size of input queue
http://aws.amazon.com/architecture/
48. AWS Resources
Whitepapers available at
http://aws.amazon.com/whitepapers
TCO Online Calculator
http://aws.amazon.com/tco-calculator
AWS Simple Calculator
http://aws.amazon.com/calculator
49. Conclusion (Part II):
Use the cloud to create new products & business models
On-Premises
• Failure is expensive
• Experiment infrequently
• Less Innovation
Optimized Cloud
• Failure is inexpensive
• Experiment early and often
• More Innovation