Amazon Glacier is not just for archiving or compliance use cases: it also accommodates customers who simply want to replace their on-premises long-term storage with a cost-efficient, durable cloud option from which they can easily and quickly access their data when they need to. This session will introduce newly launched features for Amazon Glacier, review the current service feature set, and share the global data center shutdown and storage strategy for Sony DADC New Media Solutions (NMS). NMS is Sony’s digital servicing division, providing global digital distribution, linear playout, and white-label OTT/commerce solutions for clients such as BBC Worldwide, NBCUniversal, Sony PlayStation, and Funimation Entertainment.
Hear from Andy Shenkler, NMS’s Chief Technology and Solutions Officer, as he talks about the key factors that drove the organization’s decision to move away from tape, toward the cloud, and out of the infrastructure business overall. Learn more about the impact and operational practices inside a world-class digital supply chain as NMS moved over 20 petabytes of data, more than 1M hours of video, to the cloud and never looked back.
2. Audio archives – SoundCloud
• World’s leading social sound platform
• Audio files transcoded and stored in multiple formats
• Stores petabytes of data
• Transcoded files served from Amazon S3
• Originals moved to Amazon Glacier for long-term retention
3. Patient data – Philips Healthcare
• HealthSuite digital platform powered by AWS
• 15 petabytes of patient data
• Securely stored for decades (beyond the lifetime of patients)
• Uses HIPAA-eligible AWS services
4. Tape replacement – King County
• Most populous county in Washington State
• Replaced tape solution for backups from 17 agencies
• Meets compliance requirements
• Saved $1MM in first year, no more tape refresh or management churn
6. Batches and Streams
• Data transfer: Direct Connect; Snowball, Snowball Edge, Snowmobile; 3rd Party Connectors; Transfer Acceleration; Storage Gateway; Kinesis Firehose
• File: Amazon EFS
• Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
• Object: Amazon S3, Amazon Glacier
7. Data Storage Demand
• Media assets, 4K, 8K
• Healthcare/life sciences
• Financial services
• Regulated industries
• Oil and gas/geospatial
• Digital preservation
• Long-term backups
• Logs
Archive:
• Secure and durable
• Low cost
• Flexible data access
• Compliant
8. Amazon Glacier
• Extremely low-cost archive storage service, starting at $0.004 per GB per month
• New! Three retrieval options ranging from minutes to hours (more later)
• 99.999999999% durability (5-6 orders of magnitude higher than 2 copies of tape)
• All data is encrypted at rest
• Features: compliance, data management, cost management, audit logging
9. Amazon Glacier
• Metered usage: pay as you go
• No capital investment
• No commitment
• No risky capacity planning
• Avoid risks of physical media handling
• Control your geographic locality for performance and compliance
10. Key Terms and Concepts
• Vaults – container for archives, up to 1,000 vaults per account
• Archives – basic unit, write-once, 40 TB max, unlimited archives
• Inventory – cold index of archives refreshed every 24 hours
• Access – three ways to access Amazon Glacier
• Uploads – multipart, lifecycle, cost optimizations, AWS Snowball
• Data management – Vault Lock, tagging, audit logs
• Retrievals – retrieval policies, range retrievals, new retrieval features
11. Accessing Amazon Glacier
1. Direct Amazon Glacier API/SDK
2. Amazon S3 lifecycle integration
3. Third-party tools and gateways (for example, FastGlacier)
12. Uploading data: Internet or sneaker-net
• Internet: transfer data in a secure SSL tunnel over the public Internet
• AWS Direct Connect: dedicated bandwidth between your site and AWS
• AWS Import/Export, AWS Snowball: physical transfer of media into and out of AWS
13. Uploading data: archive descriptions
• Use archive description field for metadata
• If local index is corrupted or destroyed, use archive description to reconstruct critical mappings
• For example, create index entry, add primary key to archive description on upload
Local Index Entry:
  Primary key: 12345
  Description: 2014Audit
  Dept: FinanceDept
  ArchiveID: 9FG23…..
  …..
UploadArchive(data, ArchiveDescription="12345,2014Audit,FinanceDept") -> Archive ID = 9FG23…..
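The index-plus-description pattern above could be sketched in Python roughly as follows. This is a minimal illustration, not the presenters' implementation: `IndexEntry`, `to_archive_description`, `from_archive_description`, and `upload_with_index` are hypothetical names, the comma-separated layout simply mirrors the slide's example, and `client` is assumed to be a boto3 Glacier client (or any object exposing a compatible `upload_archive` method).

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """One row of the local index (hypothetical layout mirroring the slide)."""
    primary_key: str
    description: str
    dept: str
    archive_id: str = ""


def to_archive_description(entry: IndexEntry) -> str:
    """Pack the index fields into a comma-separated archive description."""
    fields = [entry.primary_key, entry.description, entry.dept]
    if any("," in field for field in fields):
        raise ValueError("fields must not contain commas in this simple format")
    text = ",".join(fields)
    # Glacier limits archive descriptions to 1,024 printable ASCII characters.
    if len(text) > 1024:
        raise ValueError("archive description too long")
    return text


def from_archive_description(text: str, archive_id: str) -> IndexEntry:
    """Rebuild an index entry from a description, e.g. read from a vault inventory."""
    primary_key, description, dept = text.split(",")
    return IndexEntry(primary_key, description, dept, archive_id)


def upload_with_index(client, vault_name: str, data: bytes, entry: IndexEntry) -> IndexEntry:
    """Upload the data and record the returned archive ID in the index entry."""
    response = client.upload_archive(
        vaultName=vault_name,
        archiveDescription=to_archive_description(entry),
        body=data,
    )
    entry.archive_id = response["archiveId"]
    return entry
```

If the local index is lost, running `from_archive_description` over each description and archive ID in the vault inventory would rebuild the mapping, under the assumption that every archive was uploaded with this format.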
14. Uploading data: optimizing costs
• Every archive has 32 KB of associated overhead and some operations are charged per request
• For an archive size of 3.2 MB, overhead is ~1% of cost
• For a 1 KB archive, 97% of cost would go to overhead
• Solution is aggregation – recommend a minimum archive size on the order of MBs
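The overhead percentages above follow directly from the 32 KB figure. A small sketch of the arithmetic (the function names are illustrative, and 3.2 MB is treated as 3,200 KB):

```python
OVERHEAD_KB = 32  # per-archive overhead stated on the slide


def overhead_fraction(archive_kb: float) -> float:
    """Fraction of billed storage that is overhead for one archive."""
    return OVERHEAD_KB / (archive_kb + OVERHEAD_KB)


def min_archive_kb(target_fraction: float) -> float:
    """Smallest archive size (KB) that keeps overhead at or below the target."""
    return OVERHEAD_KB * (1 - target_fraction) / target_fraction


print(f"1 KB archive:   {overhead_fraction(1):.1%} overhead")
print(f"3.2 MB archive: {overhead_fraction(3200):.1%} overhead")
print(f"Size needed for 1% overhead: {min_archive_kb(0.01):.0f} KB")
```

Aggregating many small files into one archive (for example with tar or zip) before upload moves each archive toward the low-overhead end of this curve.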
16. Best practices: multipart uploads
Improve throughput, reliability, and get idempotency
1. InitiateMultipartUpload(partSize) → uploadId
2. UploadPart(uploadId, data)
3. CompleteMultipartUpload(uploadId) → archiveId
(Diagram: an archive is split into parts that are uploaded in parallel)
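Before calling the three operations above, a client has to decide on a part size and the byte range of each part. The sketch below covers only that planning step, assuming Glacier's documented constraints (part size is 1 MiB multiplied by a power of two, up to 4 GiB, with at most 10,000 parts); `choose_part_size` and `part_ranges` are hypothetical helper names. The actual upload calls also require SHA-256 tree-hash checksums, which are omitted here.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000
MAX_PART_SIZE = 4096 * MIB  # 4 GiB


def choose_part_size(archive_size: int) -> int:
    """Smallest power-of-two multiple of 1 MiB that fits the archive in 10,000 parts."""
    if archive_size <= 0:
        raise ValueError("archive_size must be positive")
    size = MIB
    while size * MAX_PARTS < archive_size:
        size *= 2
        if size > MAX_PART_SIZE:
            raise ValueError("archive too large for a single multipart upload")
    return size


def part_ranges(archive_size: int, part_size: int) -> list:
    """Content-Range strings for each part, in the form Glacier expects."""
    ranges = []
    start = 0
    while start < archive_size:
        end = min(start + part_size, archive_size) - 1  # inclusive end offset
        ranges.append(f"bytes {start}-{end}/*")
        start = end + 1
    return ranges
```

Each range can then be uploaded independently, and in parallel, with the same upload ID; re-sending a part with the same range overwrites the earlier attempt, which is what makes retries safe.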
17. Amazon Glacier: Amazon S3 lifecycle policies
• Seamlessly move data from Amazon S3 to Amazon Glacier
• Automated lifecycle rules
• Transition based on object age
18. Amazon Glacier: Amazon S3 lifecycle policies
• Object-level tagging for S3 objects
• Apply lifecycle rules based on object tags
• Example: transition objects to Amazon Glacier when they are 1 year old and have the object tags ‘Project=Delta’ and ‘Data type=HPI’.
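The tag-based example above might be expressed as a lifecycle configuration like the following. This is a sketch only: the rule ID and the helper name `glacier_transition_rule` are made up for illustration, and the resulting dictionary is in the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument.

```python
def glacier_transition_rule(rule_id: str, days: int, tags: dict) -> dict:
    """Build one S3 lifecycle rule that moves tagged objects to Glacier."""
    if not tags:
        raise ValueError("at least one tag is required for this sketch")
    tag_list = [{"Key": key, "Value": value} for key, value in tags.items()]
    # A single tag uses "Tag"; several tags must be combined under "And".
    rule_filter = {"Tag": tag_list[0]} if len(tag_list) == 1 else {"And": {"Tags": tag_list}}
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": rule_filter,
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


# The slide's example: 1 year old, tagged Project=Delta and Data type=HPI.
lifecycle_configuration = {
    "Rules": [
        glacier_transition_rule(
            "delta-hpi-to-glacier",  # hypothetical rule name
            365,
            {"Project": "Delta", "Data type": "HPI"},
        )
    ]
}

# Applying it would look like this (bucket name is a placeholder):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle_configuration)
```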
20. Management features: audit logging via AWS CloudTrail
• Enable AWS CloudTrail in console
• Control plane events: vault activities
• Data plane events: archive activities
21. Management features: vault access policies
• Manage access to a vault in a single location – single AWS Identity and Access Management (IAM) policy
– Grant/revoke access to internal business units/teams
– “Marketing_Vault” has an access policy that is distinct from “DevOps_Vault”
• Easily manage cross-account access for your business partner
– Simply add a section for your business partner in the same policy
22. Management features: Vault Lock
• Non-overwrite, non-erasable records
• Time-based retention with “ArchiveAgeInDays” control
• Policy lockdown (strong governance)
• Legal hold with vault-level tags
• Configure optional designated third-party access and grant temporary access
23. Vault Lock: two-step locking
• InitiateVaultLock
– Attaches a retention policy for testing (in-progress state)
– Returns a unique lock ID (expires after 24 hours)
• AbortVaultLock
– Deletes an in-progress policy
– Ability to modify a policy before locking it down
• CompleteVaultLock
– Locks down the vault with the appropriate lock ID
– A Vault Lock policy cannot be aborted once locked
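The initiate, test, then complete-or-abort sequence could be driven from code along these lines. This is a hedged sketch: `retention_policy` and `lock_vault` are hypothetical helpers, `approve` stands in for whatever testing is done while the lock is in progress, and `client` is assumed to be a boto3 Glacier client (the method and parameter names follow boto3's Glacier API).

```python
import json


def retention_policy(vault_arn: str, min_age_days: int) -> dict:
    """Vault Lock policy denying deletes until an archive reaches the given age."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "deny-delete-before-retention",
            "Principal": "*",
            "Effect": "Deny",
            "Action": "glacier:DeleteArchive",
            "Resource": vault_arn,
            "Condition": {
                "NumericLessThan": {"glacier:ArchiveAgeInDays": str(min_age_days)}
            },
        }],
    }


def lock_vault(client, vault_name: str, policy: dict, approve) -> bool:
    """Run the two-step lock; returns True if the vault ends up locked."""
    # Step 1: attach the policy in the in-progress state and get a lock ID.
    lock_id = client.initiate_vault_lock(
        accountId="-",
        vaultName=vault_name,
        policy={"Policy": json.dumps(policy)},
    )["lockId"]
    # The lock ID expires after 24 hours, so testing must finish within that window.
    if approve(lock_id):
        # Step 2: make the policy permanent. This cannot be undone.
        client.complete_vault_lock(accountId="-", vaultName=vault_name, lockId=lock_id)
        return True
    # Testing failed: drop the in-progress policy so it can be revised and retried.
    client.abort_vault_lock(accountId="-", vaultName=vault_name)
    return False
```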
24. Vault Lock: legal hold with vault-level tags
• Set up a legal hold tag
– Configure a vault-level tag “LegalHold”
– Set initial value to “False”
• Add compliance control for legal hold in a vault lock policy
– Deny delete archive operation
– From anybody (root, administrators, users, business partners)
– When LegalHold tag = “True”
• Place or lift legal hold by updating the tag value
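A legal hold control of this kind might look roughly like the following. Treat it as an assumption-laden sketch: `legal_hold_statement` and `set_legal_hold` are hypothetical helpers, the condition uses the `glacier:ResourceTag/` condition key with an exact, case-sensitive match on “True”, and `client` is assumed to be a boto3 Glacier client.

```python
def legal_hold_statement(vault_arn: str) -> dict:
    """Policy statement denying archive deletion while LegalHold is 'True'."""
    return {
        "Sid": "deny-delete-during-legal-hold",
        "Principal": "*",  # applies to everyone, including the account root user
        "Effect": "Deny",
        "Action": "glacier:DeleteArchive",
        "Resource": vault_arn,
        "Condition": {
            # StringEquals is case-sensitive, so the tag value must be exactly "True".
            "StringEquals": {"glacier:ResourceTag/LegalHold": "True"}
        },
    }


def set_legal_hold(client, vault_name: str, on: bool) -> None:
    """Place or lift the hold by updating the vault-level tag value."""
    client.add_tags_to_vault(
        accountId="-",
        vaultName=vault_name,
        Tags={"LegalHold": "True" if on else "False"},
    )
```

Because the lock policy itself never changes, the hold can be placed and lifted without touching the locked policy; only the tag value moves.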
26. Vault Lock: best practices
• Map one vault to a single retention range
– Group regulatory data by retention: 1-year vault, 6-year vault, etc.
• Create a new vault and lock it before storing production data
– Enforce the full ArchiveAgeInDays on all new archives
– Leave no “gap” on existing archives
• Thoroughly test a vault lock policy before locking it down (Abort/Initiate)
• Implement only the most restrictive controls with Vault Lock
– Leave the flexible controls to vault access policy
27. Vault Lock: third-party assessment
• Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c)
28. Data retrievals: basic concepts
1. Initiate job (ArchiveId: AE99F…, Vault: Films) -> Job ID
2. 3-5 hours for job completion
3. Job completion notification
4. Download output
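The four retrieval steps could be sketched as below. Note the substitution: the slide describes a completion notification, whereas this example simply polls the job status, which is easier to show in a few lines. `retrieve_archive` is a hypothetical helper, and `client` is assumed to be a boto3 Glacier client.

```python
import time


def retrieve_archive(client, vault_name, archive_id, tier="Standard",
                     poll_seconds=900, sleep=time.sleep):
    """Initiate a retrieval job, wait for it to complete, and return the bytes."""
    # Step 1: initiate the job and keep the job ID.
    job_id = client.initiate_job(
        accountId="-",
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,  # "Expedited", "Standard", or "Bulk"
        },
    )["jobId"]

    # Steps 2-3: wait for completion (polling here instead of a notification).
    while not client.describe_job(
        accountId="-", vaultName=vault_name, jobId=job_id
    )["Completed"]:
        sleep(poll_seconds)

    # Step 4: download the job output.
    output = client.get_job_output(accountId="-", vaultName=vault_name, jobId=job_id)
    return output["body"].read()
```

For large archives, reading the whole body into memory is unlikely to be appropriate; streaming the body to disk in chunks would be the more realistic choice.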
31. Data retrievals: data retrieval policies
• Provides transparency and cost control for data retrievals
• Governs all retrieval activities for an account in a region
• Synchronously accepts or rejects each retrieval request
• Accounts for in-flight retrieval operations
33. Data retrievals: expedited and bulk retrievals
                     Expedited            Standard                   Bulk
Data Access Time     1 - 5 minutes        3 - 5 hours                5 - 12 hours
Data Retrievals      $0.03 per GB         $0.01 per GB               $0.0025 per GB
Retrieval Requests   $0.01 per request    $0.05 per 1,000 requests   $0.025 per 1,000 requests
• Expedited: designed for occasional urgent access to a small number of archives
• Standard: low-cost option for retrieving data in just a few hours
• Bulk: lowest cost option optimized for large retrievals, up to petabytes of data in 12 hours
• Three flexible and powerful retrieval options to access any of your Amazon Glacier data
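To compare the three tiers for a given workload, the prices in the table can be plugged into a short calculation. The figures below are the ones shown on the slide and may differ from current or regional pricing; `retrieval_cost` is an illustrative helper.

```python
# Prices as listed on the slide (USD).
TIERS = {
    "Expedited": {"per_gb": 0.03, "per_request": 0.01},
    "Standard": {"per_gb": 0.01, "per_request": 0.05 / 1000},
    "Bulk": {"per_gb": 0.0025, "per_request": 0.025 / 1000},
}


def retrieval_cost(tier: str, gigabytes: float, requests: int) -> float:
    """Data charge plus request charge for one retrieval workload."""
    price = TIERS[tier]
    return gigabytes * price["per_gb"] + requests * price["per_request"]


# Example workload: 1,000 GB spread over 1,000 archives.
for tier in TIERS:
    print(f"{tier:<10} ${retrieval_cost(tier, 1000, 1000):.3f}")
```

For this workload the request charge dominates only in the Expedited tier, which is consistent with the slide's advice to reserve it for a small number of urgent archives.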
36. “If physical deliveries can happen within one hour based on unpredictable requests, surely we are able to exceed such expectations digitally”
@SonyDADCNMS
37. Our migration
The Challenge:
• Seamlessly migrate a platform that enables content delivery across all devices and more than 1,200 distribution points worldwide
• Store 20 petabytes of motion picture and television content
• Equating to 1,000,000+ hours of content
• At a growth curve of ~1 petabyte every quarter
Desired Goals:
• One-hour delivery turnaround time
• Agile, scalable, predictable cost model and infrastructure
• Investing in innovation vs. hardware