Join our webinar to learn how to build a cost-effective archive application using Amazon Glacier, an extremely low-cost, secure, highly durable, and easy-to-use storage service in the AWS cloud.
We will explain how Amazon Glacier works and walk through some best practices for getting the most out of the service.
We will also highlight how to choose between Amazon Glacier and Amazon S3’s Glacier storage option.
Learn more: http://aws.amazon.com/glacier/
4. With Amazon Glacier, You Can:
Achieve extremely low storage costs for archive data
Pay only for what you use
No longer maintain your own physical storage infrastructure
Increase durability and geographic redundancy
Secure your data
Access on-demand computing with Amazon EC2
5. What is Archival Data?
Most data stored is infrequently accessed (Cold Data)
Often older data that is still important for future reference
Typically long-lived (months or years)
Business and regulatory reasons to retain data
6. What is Amazon Glacier?
Extremely low cost archive storage service
Allows you to retrieve any amount of data within 3-5 hours
Provides high-durability storage
Makes it easy to retain data safely and securely for months, years, or decades
7. Benefits with Amazon Glacier
Low cost
As little as $0.01/GB/month with no up-front capital commitments.
Durable
Designed to provide an average annual durability of 99.999999999% per archive.
Flexible
Store any amount of data on-demand. Eliminate the need for capacity planning.
Secure
Leverage AWS’ robust security platform. Control access to your data.
Simple
Eliminate your operational overhead. Focus your resources on your core business.
Use multiple services
Easily leverage other AWS services once your data is in the AWS cloud.
8. Customer Data Archiving Examples
Enterprise Archives
Enterprise information archiving includes archiving email, business documents, and other unstructured content. It is driven by business needs, compliance requirements, and the need to reduce primary storage costs.
Media Archives
Media companies’ core assets (books, movies, music, TV, etc.) can grow to hundreds of petabytes. Amazon Glacier reduces the cost of storing these assets while simultaneously increasing the durability, ease of use, and accessibility of the content.
Scientific Archives
Research and scientific organizations, such as pharmaceutical and biotech companies, as well as universities, store many large but rarely accessed data sets.
10. High-level Amazon Glacier Architecture
Archive Application (search, policy-based data management, eDiscovery)
Send + Receive Data: HTTP / REST APIs / AWS Import/Export
Amazon Glacier
Amazon IAM: control access to your data
Index: index of your archived data
11. Amazon Glacier Concepts
Archives
An archive is a durably stored block of information. You store your data in Amazon Glacier as archives. You may upload a single file as an archive, but your request costs will be lower if you aggregate your data. TAR and ZIP are common formats that customers use to aggregate multiple files into a single file before uploading to Amazon Glacier.
Vaults
You use vaults to organize the data you store in Amazon Glacier. Each archive is stored in a vault of your choice. You may control access to your data by setting vault-level access policies.
12. Uploading Data to Amazon Glacier
Step 1: Create Vault
Step 2: Upload Archives
Step 3: Configure Access Policies (Optional) via Amazon Identity and Access Management
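The upload steps on this slide could be sketched as follows. This is a minimal illustration, not the webinar's own code. It assumes a `client` object that exposes the `create_vault` and `upload_archive` methods of the boto3 Glacier client (for example `boto3.client("glacier")`). The helper name `archive_file` and the dict-based `index` are hypothetical, and the sketch assumes that creating a vault that already exists is harmless.

```python
def archive_file(client, vault_name, path, index):
    """Upload one file as a Glacier archive and record its ID in a local index.

    `client` is assumed to behave like boto3.client("glacier").
    `index` is a hypothetical dict mapping local paths to archive IDs;
    a real archive application would persist it in a database.
    """
    # Step 1: create the vault (assumed safe to repeat for an existing vault).
    client.create_vault(vaultName=vault_name)

    # Step 2: upload the file contents as a single archive.
    with open(path, "rb") as f:
        response = client.upload_archive(
            vaultName=vault_name,
            archiveDescription=path,
            body=f,
        )

    # Archives are addressed by ID, so the application keeps the mapping.
    index[path] = response["archiveId"]
    return response["archiveId"]
```

Keeping the returned archive ID in your own index is what later lets the application find an archive without waiting for a vault inventory.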
13. Retrieving Data from Amazon Glacier
Retrieve Archives: Initiate Job → Track Job → Download Job Output
Archives are retrieved 3-5 hours after being requested.
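The retrieval flow (initiate a job, track it, download its output) might look like the sketch below. It assumes a `client` that behaves like the boto3 Glacier client, with its `initiate_job`, `describe_job`, and `get_job_output` methods. The function name `retrieve_archive`, the polling interval, and the injectable `sleep` parameter are illustrative assumptions.

```python
import time


def retrieve_archive(client, vault_name, archive_id,
                     poll_seconds=900, sleep=time.sleep):
    """Retrieve one archive: initiate a job, track it, download its output.

    `client` is assumed to behave like boto3.client("glacier").
    """
    # Initiate Job: ask Glacier to stage the archive for download.
    job = client.initiate_job(
        vaultName=vault_name,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )
    job_id = job["jobId"]

    # Track Job: poll until the job finishes (typically hours later).
    while True:
        status = client.describe_job(vaultName=vault_name, jobId=job_id)
        if status["Completed"]:
            break
        sleep(poll_seconds)

    if status["StatusCode"] != "Succeeded":
        raise RuntimeError(f"retrieval job {job_id} did not succeed")

    # Download Job Output: read the staged bytes.
    output = client.get_job_output(vaultName=vault_name, jobId=job_id)
    return output["body"].read()
```

Polling is the simplest way to track a job. An application can instead wait for a notification, which avoids repeated status calls during the multi-hour wait.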
14. Sending / Retrieving Data
• Glacier REST-based APIs to send and retrieve data
• Direct Connect
• Amazon S3 lifecycle archival to Amazon Glacier
15. Additional Amazon Glacier / AWS Concepts
Vault Inventory
For a real-time view of the contents of your vaults, refer to your index. For disaster recovery purposes, in case you lose or corrupt your index, Amazon Glacier maintains an inventory of all your archives in a vault. The vault inventory is updated approximately once a day.
Amazon Simple Notification Service (Amazon SNS)
Amazon SNS is a web service that makes it easy to set up, operate, and send notifications from the cloud.
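As a sketch of how a vault inventory could be used to check or rebuild an index, the function below compares a local index against the JSON output of an inventory job. It assumes the inventory document contains an `ArchiveList` whose entries carry an `ArchiveId` field. The function name `reconcile` and the dict-based `index` are hypothetical.

```python
import json


def reconcile(index, inventory_json):
    """Compare a local index with a vault inventory document.

    `index` is a hypothetical dict mapping names to archive IDs.
    `inventory_json` is assumed to be the JSON text of an inventory job
    output, with an "ArchiveList" of entries that each have "ArchiveId".
    """
    inventory = json.loads(inventory_json)
    in_vault = {entry["ArchiveId"] for entry in inventory["ArchiveList"]}
    in_index = set(index.values())
    return {
        # In the vault but unknown to the index: candidates for re-indexing.
        "missing_from_index": sorted(in_vault - in_index),
        # In the index but not in the inventory: possibly just uploaded.
        "missing_from_vault": sorted(in_index - in_vault),
    }
```

Because the inventory is updated only about once a day, an archive listed under `missing_from_vault` may simply be newer than the last inventory rather than lost.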
16. Amazon Glacier Key Concepts
AWS Management Console Operations (also accessible via Amazon Glacier APIs or SDKs)
Step 1: Create Vault
Step 2: Configure Access Policies (Optional) via Amazon Identity and Access Management; Configure Notification Policies (Optional) via Amazon Simple Notification Service
Amazon Glacier API Operations (also accessible via Amazon Glacier SDKs)
Step 3: Upload Archives, Download Archives
Retrieve Archives: Initiate Job → Track Job → Download Job Output
Archives are retrieved 3-5 hours after being requested.
Notifications are sent via Amazon SNS to your application.
18. Aggregate Large Number of Smaller Files
Reduce overhead costs
Reduce request costs
Find ideal archive size for your use case
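One way to aggregate many small files into a single archive is to bundle them into a TAR file before uploading. The sketch below uses Python's standard `tarfile` module and builds the bundle in memory. The function name `bundle_files` and the name-to-bytes input format are illustrative assumptions.

```python
import io
import tarfile


def bundle_files(files):
    """Pack many small files into one TAR payload.

    `files` is a dict mapping member names to their contents as bytes.
    Returns the TAR bytes, ready to upload as a single archive.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

One upload request then covers every bundled file. The trade-off is that getting a single file back means retrieving the bundle (or a range of it), which is part of finding the ideal archive size for a use case.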
19. Uploading Large Files – Multipart Upload
Internet weather
Distance between your application and Amazon Glacier
Cost of retrying failed transmissions
Improve upload throughput
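Multipart upload addresses these points by sending an archive as independent parts, so a failed part can be retried alone and parts can be sent in parallel. The sketch below plans the parts for an upload. The limits it uses are assumptions taken from memory of the Amazon Glacier documentation and should be verified: part sizes are a power-of-two number of MiB between 1 MiB and 4 GiB, with at most 10,000 parts per upload. The function names are hypothetical.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000           # assumed limit on parts per multipart upload
MAX_PART_SIZE = 4096 * MIB   # assumed upper bound on a single part (4 GiB)


def choose_part_size(archive_size, min_part_size=MIB):
    """Pick the smallest allowed part size that fits within MAX_PARTS."""
    units = min_part_size // MIB
    if min_part_size % MIB or units < 1 or units & (units - 1):
        raise ValueError("part size must be a power-of-two number of MiB")
    part_size = min_part_size
    while part_size * MAX_PARTS < archive_size:
        part_size *= 2
    if part_size > MAX_PART_SIZE:
        raise ValueError("archive too large for a single multipart upload")
    return part_size


def part_ranges(archive_size, part_size):
    """Return inclusive (start, end) byte ranges, one per part."""
    return [
        (start, min(start + part_size, archive_size) - 1)
        for start in range(0, archive_size, part_size)
    ]
```

Smaller parts make each retry cheaper on an unreliable connection, while larger parts mean fewer requests. The right balance depends on the network between your application and Amazon Glacier.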
21. Optimize Data Retrieval and Download
Retrieval vs. Download
Ranged Retrieval
• Reduce cost, control retrieval rate
• Retrieve only what you need
Ranged Download (Get)
• Improve download speed
• Be aware of your download speed as data is only staged for 24 hours
22. Ranged Retrieval Example
Example: 12 GB archive
Retrieved using a single 4-hour job = 3 GB/hour peak retrieval
Retrieved over 24 hours using 6 consecutive jobs = 0.5 GB/hour peak retrieval
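The arithmetic in this example can be reproduced with a short sketch. It assumes that each retrieval job takes about 4 hours, that jobs run one after another without overlapping, and that range boundaries should fall on 1 MiB multiples (except the final byte of the archive). The function names are hypothetical, and sizes are counted in binary gigabytes.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def split_retrieval(archive_size, jobs):
    """Split an archive into up to `jobs` MiB-aligned inclusive byte ranges."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    total_mib = -(-archive_size // MIB)   # ceiling division
    per_job_mib = -(-total_mib // jobs)
    ranges = []
    start = 0
    while start < archive_size:
        end = min(start + per_job_mib * MIB, archive_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def peak_rate_gb_per_hour(ranges, job_hours=4):
    """Peak hourly retrieval rate, assuming consecutive jobs of `job_hours`."""
    largest = max(end - start + 1 for start, end in ranges)
    return largest / GIB / job_hours


single = split_retrieval(12 * GIB, 1)
spread = split_retrieval(12 * GIB, 6)
print(peak_rate_gb_per_hour(single))  # 3.0
print(peak_rate_gb_per_hour(spread))  # 0.5
```

Six ranges of 2 GB each, retrieved back to back, lower the peak from 3 GB/hour to 0.5 GB/hour at the cost of waiting about 24 hours for the full archive.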