1. Introduction to Amazon Web Services
Dayanand Shanmugham
http://www.linkedin.com/in/dayanandshanmugham
14 September 2012
2. Agenda
1. What is Amazon Web Services?
2. IT & Architecture Perspective:
a. Amazon Product Stack
b. S3 (Simple Storage Service) Basic Concepts
c. Demo of how to use S3
d. Potential Use Cases
3. Cloud IT Eco-System / Business Perspective:
a. Cloud Storage Costs
b. Cloud Service Providers (CSPs)
c. Comparison Study Report
3. Amazon Web Services
http://aws.amazon.com/
1. Amazon Web Services offers a complete set of infrastructure and
application services that enable us to run virtually everything in the cloud
– from enterprise applications and big data projects to social games and
mobile apps.
2. In 2006, Amazon Web Services (AWS) began offering IT infrastructure
services to businesses in the form of web services – now commonly
known as cloud computing.
3. The key benefit is the opportunity to replace up-front capital infrastructure
expenses with low variable costs that scale with our business:
a. Low Cost - Pay-as-you-go pricing
b. Agility and Instant Elasticity - Massive global cloud infrastructure
c. Open and Flexible - Language and operating system agnostic platform
d. Secure - PCI DSS Level 1, ISO 27001, FISMA Moderate, HIPAA & SAS 70 Type II
7. Amazon S3 (Simple Storage Service)
What is S3? Features? Concepts? Controls? Demo
Quote “…Subscriptions to Cloud Storage Services to Reach Half-Billion Level This Year…
jump to 625 million next year… projected to hit 1.3 billion in 2017…”
http://www.bloomberg.com/article/2012-09-06/aUXSunqkHUP0.html
Large organizations such as Amazon, NASA, Netflix, Google, NBC and Zynga depend on cloud storage
8. Amazon S3 (Simple Storage Service) http://aws.amazon.com/s3/
http://[BucketName].s3.amazonaws.com/[Filename]
Amazon Simple Storage Service (Amazon S3) is a web service that enables us to
store data in the cloud. We can then download the data or use it with other
AWS services, such as Amazon Elastic Compute Cloud (EC2).
Features:
• Storage as a Service: through Web Service API
• Highly Scalable (Unlimited Storage Space)
• High Data Durability (99.999999999%)
• Highly Available (99.99%)
• Server Side Encryption (Data at Rest)
• Versioning
S3 Basic Concepts:
• Region - Geographical location where the Amazon S3 service is available
• Buckets (Storage Container)
• Objects
1. Fundamental Entity in Buckets
2. Each Object has a unique Key
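The URL pattern http://[BucketName].s3.amazonaws.com/[Filename] shown on this slide combines the two concepts: the bucket name becomes part of the host name and the object key becomes the path. A minimal sketch, assuming a hypothetical bucket called example-bucket and a made-up key; object_url is an illustrative helper, not part of any AWS SDK:

```python
from urllib.parse import quote


def object_url(bucket: str, key: str) -> str:
    """Build http://[BucketName].s3.amazonaws.com/[Filename] for one object."""
    # Percent-encode the key, but keep "/" so path-like keys stay readable.
    return f"http://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


# Hypothetical bucket name and key, used only for illustration.
print(object_url("example-bucket", "reports/2012/q3 summary.pdf"))
# prints http://example-bucket.s3.amazonaws.com/reports/2012/q3%20summary.pdf
```

Because each object has a unique key within its bucket, the bucket name plus the key is enough to address exactly one object.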
9. Access Controls
1. Amazon S3 enables us to manage access to objects and buckets using the
following mechanisms, which we can use independently or together:
– Access control lists (ACLs)
– Bucket policies
– IAM policies
2. ACLs only grant permissions; they do not deny them. ACLs can contain the
following grantee types:
– Specific AWS accounts
– All AWS accounts
– Any anonymous request
3. Bucket policies are collections of JSON statements that provide access
control management at the bucket level for:
– Bucket
– Objects
4. AWS Identity and Access Management (IAM) enables us to create multiple users
within our AWS account and manage their permissions via IAM policies:
– Bucket policies are attached to a bucket
– IAM policies are attached to individual users in your account
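Since bucket policies are collections of JSON statements, a small example may help. The sketch below builds one such document in Python and serializes it; the bucket name, account ID and Sid are placeholders chosen for illustration, and the single statement (allowing s3:GetObject) is an assumed example rather than a policy from this presentation:

```python
import json

# Placeholder values, not taken from the slides.
BUCKET = "example-bucket"
ACCOUNT_ID = "111122223333"

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAccountToReadObjects",
            "Effect": "Allow",
            # Who the statement applies to.
            "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT_ID}:root"},
            # What they may do, and on which resources.
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}

policy_json = json.dumps(bucket_policy, indent=2)
print(policy_json)
```

An IAM policy uses the same statement structure but is attached to a user, so it normally omits the Principal element.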
10. Access Controls
http://docs.amazonwebservices.com/AmazonS3/latest/dev/UsingIAMPolicies.html
Example#1: IAM policy and bucket policy give Bob & Susan permission to use PutObject on bucket_xyz
Example#2: IAM policy gives Bob permission to use PutObject on bucket_xyz & bucket policy gives permission to use ListBucket
Example#3: Explicit deny always overrides an allow
Example#4: Bucket policy denies upload object (s3:PutObject) permission to everyone if the request does not include the x-amz-server-side-encryption header requesting server-side encryption
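Example#3 and Example#4 can be illustrated with a toy evaluator. This is a simplified sketch, not the real AWS policy engine: the statement fields (effect, actions, if_header_missing) are hypothetical and much cruder than the actual policy grammar. It only models three rules: requests are denied by default, an Allow grants access, and an explicit Deny always wins.

```python
SSE_HEADER = "x-amz-server-side-encryption"


def is_allowed(statements, action, headers=None):
    """Toy evaluation: default deny, Allow grants, explicit Deny overrides."""
    headers = headers or {}
    allowed = False
    for stmt in statements:
        if action not in stmt["actions"]:
            continue  # statement does not cover this action
        # Optional condition: the statement applies only if this header is absent.
        missing = stmt.get("if_header_missing")
        if missing is not None and missing in headers:
            continue
        if stmt["effect"] == "Deny":
            return False  # explicit deny overrides any allow (Example#3)
        allowed = True
    return allowed


# Hypothetical combined statements for bucket_xyz.
statements = [
    {"effect": "Allow", "actions": {"s3:PutObject", "s3:ListBucket"}},
    # Example#4: deny uploads that do not request server-side encryption.
    {"effect": "Deny", "actions": {"s3:PutObject"}, "if_header_missing": SSE_HEADER},
]

print(is_allowed(statements, "s3:PutObject"))                          # False
print(is_allowed(statements, "s3:PutObject", {SSE_HEADER: "AES256"}))  # True
print(is_allowed(statements, "s3:GetObject"))                          # False
```

The last call returns False because no statement mentions s3:GetObject, so the default deny applies.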
11. Basic Operations
https://console.aws.amazon.com
http://[BucketName].s3.amazonaws.com/[Filename]
[Diagram: Our App issues Store/Write, Read, Delete and List requests to S3, which is backed by data centers DC1, DC2 and DC3]
Demo of S3 using Amazon Web Service Console
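The four basic operations can be pictured with a small in-memory model. This is only a toy stand-in for a bucket, with hypothetical class and method names; it does not call AWS and ignores regions, permissions and replication:

```python
class ToyBucket:
    """In-memory stand-in for a bucket: maps unique keys to object bytes."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, data: bytes):
        """Store / Write: create or overwrite the object stored under key."""
        self._objects[key] = data

    def get(self, key) -> bytes:
        """Read: return the object stored under key."""
        return self._objects[key]

    def delete(self, key):
        """Delete: remove the object; deleting a missing key is a no-op here."""
        self._objects.pop(key, None)

    def list(self, prefix=""):
        """List: return the keys that start with prefix, in sorted order."""
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ToyBucket("example-bucket")
bucket.put("photos/dog.jpg", b"...")
bucket.put("photos/cat.jpg", b"...")
bucket.put("notes.txt", b"hello")
print(bucket.list("photos/"))   # ['photos/cat.jpg', 'photos/dog.jpg']
bucket.delete("photos/dog.jpg")
print(bucket.list())            # ['notes.txt', 'photos/cat.jpg']
```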
13. Distributed Storage Systems Strategy
Option#1
[Diagram: Our App sends a Store/Write request; replication across DC1, DC2 and DC3 is allowed to complete before Response (OK) is returned. A List operation is also shown.]
14. Distributed Storage Systems Strategy
Option#2
[Diagram: Our App sends a Store/Write request; Response (OK) is returned once the data is stored in DC1, and replication to DC2 and DC3 happens afterwards. A List operation is also shown.]
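The difference between Option#1 and Option#2 can be sketched with a toy simulation. The Cluster class and its method names are hypothetical, and this says nothing about how S3 is actually implemented. Under Option#1 every data center holds the data before OK is returned; under Option#2 a read from another data center can miss the object until replication catches up.

```python
class Cluster:
    """Toy model of three data centers that each hold a copy of the data."""

    def __init__(self, names=("DC1", "DC2", "DC3")):
        self.stores = {name: {} for name in names}
        self.pending = []  # queued (dc, key, value) replications

    def write_sync(self, key, value):
        """Option#1: copy to every data center, then respond OK."""
        for store in self.stores.values():
            store[key] = value
        return "OK"

    def write_async(self, key, value, primary="DC1"):
        """Option#2: store in the primary, respond OK, replicate later."""
        self.stores[primary][key] = value
        self.pending += [(dc, key, value) for dc in self.stores if dc != primary]
        return "OK"

    def replicate(self):
        """Apply queued replications (a background task in a real system)."""
        for dc, key, value in self.pending:
            self.stores[dc][key] = value
        self.pending.clear()

    def read(self, dc, key):
        return self.stores[dc].get(key)


cluster = Cluster()
cluster.write_async("photo.jpg", b"v1")
print(cluster.read("DC3", "photo.jpg"))  # None: DC3 has not received the copy yet
cluster.replicate()
print(cluster.read("DC3", "photo.jpg"))  # b'v1'
```

The trade-off is latency against consistency: Option#1 makes the writer wait for all copies, while Option#2 answers sooner but allows briefly stale reads and listings.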
15. Behaviour of S3 in Concurrent Read(s) & Write(s)
http://[BucketName].s3.amazonaws.com/[Filename]
16. Potential Use Cases / Case Studies
S3 in Overall IT Application Architecture – When to use? Need?
17. Potential Use Cases / Case Studies
Need:
1. This enterprise manages two disparate sets of
information. Table-oriented data is maintained in
an on-premise Oracle database, while a SAN is
used as a repository for file-based information.
2. For further safeguarding of these vital assets,
tapes are used for backup and disaster recovery
purposes.
3. Approximately 20 GB of new information is
generated each day.
4. Unfortunately, the backup and archive
management processes are cumbersome and
expensive, while restoring archived
information can take days to complete
Solution:
1. Architects decide to use AWS cloud
2. For file-based artifacts, access to Amazon S3
will be via the AWS SDK for Java and
AWS Toolkit for Eclipse
3. Storage architects will create one or more uniquely-
identified buckets in Amazon S3, each of which can
hold an unlimited amount of backup data
4. For relational data, Oracle’s Secure Backup Cloud
Module will be able to take advantage of existing
RMAN scripts to back up information directly
from the Oracle database into Amazon S3
5. Third-party storage management solution can be
used to manage the entire process, including
encryption and other security details
18. Potential Use Cases / Case Studies
Need:
1. This company provides a tremendous number
of multimedia files (audio files, videos, and
images), which are stored on internally hosted
servers and made available over the Internet.
In addition to the multimedia files themselves,
each file requires a significant amount of
metadata (such as title, author, keywords, size,
and so on).
2. On the plus side, their web site is a hit
3. Unfortunately, internal servers are failing to
keep up with this demand — the sheer
amount of data will soon overwhelm available
disk storage, and the amount of necessary
metadata indexing is outstripping processing
capacity
Solution:
1. Architects decide to publish all
content to the AWS cloud-based
storage, thereby eliminating the
need to purchase and maintain
internal servers.
2. Users will directly fetch content
from AWS-based storage
19. Potential Use Cases / Case Studies
Need:
1. A large corporation is maintaining a data
warehouse on a High-CPU Extra Large
Amazon EC2 instance, with ten 800 GB
EBS volumes holding the information itself
2. While this architecture successfully meets
business needs, some new requirements
will mandate an extension to their storage
composition. Specifically, a change to
organizational policy now requires
frequent data snapshots, taken
approximately every one to two hours,
to archive supplementary data.
Solution:
1. Architects decide to add Amazon S3 to
the mixture
2. Amazon EC2 instance and supporting
EBS volumes will continue in their current
roles
3. Developers will write a script or small
application that uses Amazon EBS API
to create incremental snapshots stored
in Amazon S3
4. The new snapshot application will be run
every 120 minutes using a Linux cron job
or a Windows scheduled task
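A minimal sketch of what such a snapshot script could look like. Everything here is an assumption for illustration: the volume IDs are made up, the script path in the cron comment is hypothetical, and fake_create_snapshot is a placeholder where a real script would call the AWS API through an SDK.

```python
# Possible crontab entry to run the script every 120 minutes (path is hypothetical):
#   0 */2 * * * /usr/bin/python3 /opt/scripts/snapshot_volumes.py
from datetime import datetime, timezone

# Made-up volume IDs standing in for the ten EBS volumes.
VOLUME_IDS = [f"vol-{i:08x}" for i in range(1, 11)]


def snapshot_all(volume_ids, create_snapshot, now=None):
    """Request one snapshot per volume and return the resulting snapshot IDs."""
    now = now or datetime.now(timezone.utc)
    label = now.strftime("%Y-%m-%dT%H:%MZ")
    return [
        create_snapshot(vol, f"scheduled backup of {vol} at {label}")
        for vol in volume_ids
    ]


def fake_create_snapshot(volume_id, description):
    """Placeholder for the real AWS call; only echoes what it would snapshot."""
    return f"snap-for-{volume_id}"


if __name__ == "__main__":
    for snapshot_id in snapshot_all(VOLUME_IDS, fake_create_snapshot):
        print(snapshot_id)
```

Passing the snapshot function in as a parameter keeps the scheduling logic testable without AWS credentials.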
20. Potential Use Cases / Case Studies
Need:
1. A new SaaS application is in the planning stages
2. The solution will need to manage a variety of information, including
traditional relational data, frequently changing status feeds, and
large amounts of multimedia.
3. The logic behind the application will be complex and processing-
intensive, needing numerous joins of relational data to produce the
required results.
4. Finally, as is the case with virtually all of these types of solutions,
users will demand high availability and fault tolerance.
5. Rather than being forced to hand-code all of these availability-
oriented capabilities, the designers hope to leverage the cloud
for features such as replication, scalability, and automated
backups, freeing their time for developing the core application
logic.
Solution:
1. The Architects of the new SaaS package opt for a collection of AWS
storage offerings, each of which will handle a specific responsibility.
2. RDS will serve as the repository for all information that requires full
relational database infrastructure
3. To process complex cross-table joins, RDS will deliver high availability
via its well-proven data replication architecture
4. Read operations will be served by the slaves, with writes being
processed on the master—all of which will be automatically
managed by RDS
5. RDS will provide database administration as a service, with one-click
vertical scaling, elastic storage, and automated backups
6. Multimedia objects will reside in Amazon S3, while SimpleDB will
be tasked with maintaining the continually updated status feeds
21. Potential Use Cases / Case Studies
Need:
1. This web-based social application has seen
a massive spike in traffic - likely cause for
this momentum is the release of a new client
application designed for Smartphones
2. Server-side information repository was
originally deployed using an on-premise
MySQL database
3. No longer possible to support the number
of clients or the amount of managed
information with that database alone
Solution:
1. Architects want to avoid forcing a costly and
time-consuming rewrite of their primary
application, so most of the database-focused
logic must remain unchanged
23. “File Storage Costs Less In The Cloud Than In-House” By Forrester
http://media.amazonwebservices.com/Forrester_File_Storage_Costs_Less_In_The_Cloud.pdf
http://aws.amazon.com/s3/#pricing
31. Comparative Study (Done by Nasuni)
http://www.nasuni.com/downloads/resources/87/the_state_of_cloud_storage.pdf
Background:
1. In April 2009, Nasuni began an ongoing and unprecedented evaluation of
Cloud Service Providers (CSPs) based on a wide array of factors
2. The technical tests looked at how CSPs perform in a specific use case, namely, how
they perform for organizations that want to take advantage of the cloud for
primary storage, data protection and disaster recovery. Nasuni did not test for
every possible use case.
3. The tests focus on three key areas of importance for organizations that want to take
advantage of cloud storage:
– Performance: The cloud needs to respond quickly to queries and not slow when
stressed. If performance is poor, organizations will spend too much time waiting on the
cloud and productivity will suffer.
– Stability/Availability: If organizations are to trust critical data to the cloud, it must be
available at all times.
– Scalability: One of the primary advantages of cloud storage is unlimited capacity.
Without this property, the cloud is much less valuable to organizations.
32. Comparative Study Results
http://www.nasuni.com/downloads/resources/87/the_state_of_cloud_storage.pdf
Results:
1. Ultimately, only 6 of 16 providers passed Nasuni’s testing:
– Amazon S3
– AT&T Synaptic Storage as a Service
– Microsoft Azure
– Nirvanix
– Peer1 Hosting
– Rackspace Cloud
2. Only two Cloud Service Providers emerged as top performers in the
Nasuni study:
– Amazon S3
– Microsoft Azure
(Note: Amazon S3 was the standout across all evaluation areas)
33. Comparative Study Methodology
http://www.nasuni.com/downloads/resources/87/the_state_of_cloud_storage.pdf
Methodology:
1. API Integration – To ensure that it is possible to test the service at all
2. Unit Testing – Larger software components are broken down into
their building blocks (units) and then tested for inputs, outputs and
error cases
3. Performance Testing – To measure response time (how quickly one
can interact with the cloud), throughput (how fast data can move
back and forth to and from the cloud), and the impact of a higher
level of stress.
4. Stability Testing – To assess the long-term reliability of each CSP
5. Scalability Testing – To understand how well each CSP handles high
object counts
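The two performance metrics named above, response time and throughput, can be measured with a simple timing harness. This is a generic sketch and not Nasuni's test code; measure and fake_upload are hypothetical names, and fake_upload only sleeps to stand in for a real call to a storage provider.

```python
import time


def measure(operation, payload: bytes, repeats: int = 5):
    """Return (mean response time in seconds, throughput in bytes per second)."""
    start = time.perf_counter()
    for _ in range(repeats):
        operation(payload)
    elapsed = time.perf_counter() - start
    mean_latency = elapsed / repeats
    # Total bytes moved divided by total time taken.
    throughput = (len(payload) * repeats) / elapsed if elapsed > 0 else float("inf")
    return mean_latency, throughput


def fake_upload(data: bytes):
    """Hypothetical stand-in for an upload to a cloud storage provider."""
    time.sleep(0.001)


latency, throughput = measure(fake_upload, b"x" * 1024, repeats=3)
print(f"mean response time: {latency:.4f} s, throughput: {throughput:.0f} B/s")
```

Running the same harness with more parallel callers or larger payloads is one way to observe the impact of higher stress.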
34. Comparative Study Reports
http://www.nasuni.com/downloads/resources/87/the_state_of_cloud_storage.pdf