Compared to storing long-term datasets on-premises, archiving in the cloud is a smart alternative, whether you’re looking for an active archive solution, a tape replacement, or a way to fulfill a compliance requirement. Learn how AWS customers are simplifying their archiving strategy and meeting compliance needs using Amazon Glacier. Hear how customers have evolved their backup and disaster recovery architectures and replaced tape solutions by turning to AWS for a more cost-efficient, durable, and agile solution. We will showcase Sony DADC's active archive deployment on Glacier and demo how some of our financial services customers have set up compliant archives to meet their regulatory objectives.
2. Cloud Data Migration
• Direct Connect
• Snow* data transport family
• 3rd Party Connectors
• Transfer Acceleration
• Storage Gateway
• Kinesis Firehose
The AWS Storage Portfolio
• Object: Amazon S3, Amazon Glacier
• Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
• File: Amazon EFS
3. Satellite Image Archive
• DigitalGlobe takes satellite imagery of the Earth
• 100PB image library = 6 billion square kilometers
• 1PB of new imagery every year
• Images to be archived and retained for decades
4. Patient data–Philips Healthcare
• HealthSuite digital platform powered by AWS
• 15 petabytes of patient data
• Archived for decades (beyond the lifetime of patients)
• Uses AWS HIPAA-eligible services in the BAA
5. Public sector–King County
• Most populous county in Washington state
• Replaced tape solution for backup from 17 agencies
• Meets compliance requirement
• Saved $1MM in first year; no more tape refresh or management churn
6. Archive: data retained for the long term, for compliance or potential future reference
7. Data archiving needs are growing everywhere
• Media assets, 4K, 8K
• Health care/life sciences
• Financial services
• Regulated industries
• Oil and gas/geospatial
• Digital preservation
• Long-term backups
• Logs
8. Traditional archiving approaches
• Tape libraries, robots, drives, media
• Onsite (online and offline)
• Offsite tape out/vaulting
• Specialized software and personnel
• Tape refresh every 3-5 years
9. How can AWS help with your archival?
• Metered usage: pay as you go
• No capital investment
• No commitment
• No risky capacity planning
• Avoid risks of physical media handling
• Control your geographic locality for performance and compliance
10. Storage pricing - pay only for what you use
• 1 PB raw storage
• 800 TB usable storage
• 600 TB allocated storage
• 400 TB application data
AWS Cloud Storage
• Amazon Glacier starts at $0.004/GB/month
• Price dropped by 43% on 11/21/2016
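The gap between raw capacity and application data can be put into numbers. The sketch below uses only the figures on this slide; the decimal unit conversion (1 PB = 1,000 TB, 1 TB = 1,000 GB) and the function name are assumptions made for illustration, and request or retrieval fees are not included.

```python
from decimal import Decimal

# Figures from the slide. Decimal units are an assumption for this sketch.
GB_PER_TB = 1_000
GLACIER_PRICE_PER_GB_MONTH = Decimal("0.004")

RAW_TB = 1_000          # 1 PB of raw storage bought up front on-premises
APPLICATION_TB = 400    # data the applications actually store


def monthly_cost(terabytes: int, price_per_gb: Decimal) -> Decimal:
    """Monthly storage charge for `terabytes` at a flat per-GB price."""
    return terabytes * GB_PER_TB * price_per_gb


# Share of the purchased raw capacity that holds application data.
utilization = Decimal(APPLICATION_TB) / Decimal(RAW_TB)

# With metered pricing, only the stored 400 TB is billed.
glacier_bill = monthly_cost(APPLICATION_TB, GLACIER_PRICE_PER_GB_MONTH)

print(f"Raw capacity actually used by application data: {utilization:.0%}")
print(f"Glacier storage charge for 400 TB: ${glacier_bill:,.2f}/month")
```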
14. Accessing Amazon Glacier
1. Direct Amazon Glacier API/SDK
2. Amazon S3 lifecycle integration
3. Third-party tools and gateways
FastGlacier
15. Amazon Glacier – Direct access/APIs
Data Upload: Create Vault → Configure Access → Upload Archives → Register Archive ID
Data Retrieval: Initiate Retrieval → Async Retrieval Completion → Completion Notification → Download Data
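The upload and retrieval flow can be sketched in Python as follows. This is a minimal illustration, not the presenters' code: the functions take a `client` argument that is assumed to behave like the boto3 Glacier client (`boto3.client("glacier")`), the function names are hypothetical, the access-configuration step is omitted, and polling stands in for the completion notification.

```python
import time


def upload_archive(client, vault_name: str, data: bytes, description: str) -> str:
    """Create the vault, upload one archive, and return its archive ID."""
    client.create_vault(accountId="-", vaultName=vault_name)
    response = client.upload_archive(
        accountId="-",
        vaultName=vault_name,
        archiveDescription=description,
        body=data,
    )
    # "Register Archive ID": the caller is expected to store this ID in its
    # own index or database, since it is needed to retrieve the archive later.
    return response["archiveId"]


def start_retrieval(client, vault_name: str, archive_id: str, tier: str = "Standard") -> str:
    """Initiate an asynchronous retrieval job and return the job ID."""
    response = client.initiate_job(
        accountId="-",
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,
        },
    )
    return response["jobId"]


def wait_and_download(client, vault_name: str, job_id: str, poll_seconds: float = 900.0) -> bytes:
    """Poll until the job completes, then download the archive bytes."""
    while True:
        job = client.describe_job(accountId="-", vaultName=vault_name, jobId=job_id)
        if job["Completed"]:
            break
        time.sleep(poll_seconds)
    output = client.get_job_output(accountId="-", vaultName=vault_name, jobId=job_id)
    return output["body"].read()
```

In a real deployment the polling loop would more likely be replaced by a notification on job completion, which is the "Completion Notification" step in the flow.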
16. Use Glacier via S3 Object Lifecycle
• S3 Standard: active data, synchronous access, $0.023/GB/mo.
• S3 - Infrequent Access: infrequently accessed data, synchronous access, $0.0125/GB/mo.
• Amazon Glacier: archive data, async access, $0.004/GB/mo.
17. Data lifecycle management
• Transition Standard to Standard-IA
• Transition Standard-IA to Amazon Glacier
• Transition based on object tags
• Expiration and versioning
[Figure: Data access frequency over time, from T to T + 365 days]
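A lifecycle rule covering these transitions might look like the sketch below. It builds a plain dictionary in the shape that the S3 lifecycle configuration API accepts (for example as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`). The tag key and value, the day counts, and the function name are all hypothetical values chosen for illustration, not recommendations from the talk.

```python
import json


def archive_lifecycle_rule(
    tag_key: str,
    tag_value: str,
    ia_after_days: int = 30,
    glacier_after_days: int = 90,
    expire_after_days: int = 3650,
    noncurrent_expire_days: int = 365,
) -> dict:
    """Build one lifecycle rule: Standard -> Standard-IA -> Glacier -> expire."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("day counts must increase: IA < Glacier < expiration")
    return {
        "ID": f"archive-{tag_key}-{tag_value}",
        "Status": "Enabled",
        # Apply the rule only to objects carrying this tag.
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        # Expiration of current versions and clean-up of old versions.
        "Expiration": {"Days": expire_after_days},
        "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_expire_days},
    }


lifecycle_configuration = {"Rules": [archive_lifecycle_rule("class", "archive")]}
print(json.dumps(lifecycle_configuration, indent=2))
```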
19. Save money on storage
• S3 Standard-IA: 45% saving over S3 Standard
• Amazon Glacier: 68% saving over S3 Standard-IA
* Assumes the highest public pricing tier
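The two percentages can be checked against the per-GB prices quoted on slide 16. Read as Standard-IA versus Standard and Glacier versus Standard-IA, the short calculation below gives roughly the 45% and 68% shown here; the dictionary and function names are illustrative.

```python
from decimal import Decimal

# Per-GB monthly prices quoted on slide 16 (highest public pricing tier).
PRICE_PER_GB_MONTH = {
    "S3 Standard": Decimal("0.023"),
    "S3 Standard-IA": Decimal("0.0125"),
    "Amazon Glacier": Decimal("0.004"),
}


def saving_percent(cheaper: str, baseline: str) -> Decimal:
    """Percentage saved by storing in `cheaper` instead of `baseline`."""
    ratio = PRICE_PER_GB_MONTH[cheaper] / PRICE_PER_GB_MONTH[baseline]
    return (1 - ratio) * 100


ia_vs_standard = saving_percent("S3 Standard-IA", "S3 Standard")
glacier_vs_ia = saving_percent("Amazon Glacier", "S3 Standard-IA")

print(f"Standard-IA vs. Standard: {ia_vs_standard:.1f}% saving")
print(f"Glacier vs. Standard-IA:  {glacier_vs_ia:.1f}% saving")
```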
20. Amazon Glacier – Third-party tools and gateways
• Consumer grade: less than $50
• Example: CloudBerry, FastGlacier, Arq (Haystack Software)
• Small / medium business: $500 - $1,000
• Example: Synology, Veeam, QNAP
• Enterprise gateway and data management software
• Example: NetApp AltaVault, CommVault, StorNext, StoreReduce, Vidispine
21. Which option should I choose?
• Use S3 lifecycle managed Amazon Glacier if the S3
object keys are sufficient for index/search capability
• Use Amazon Glacier directly if you already plan to store
more metadata/indices in a database
• Use 3rd party tools to minimize coding
22. Amazon Glacier – Data Retrieval Tiers
Standard Retrieval: $0.01/GB
• Current model
• 3-5 hours
• Disaster Recovery
Bulk Retrieval: $0.0025/GB
• Batch/Bulk access
• 5-12 hours
• PB scale re-transcoding or video/image analysis
Expedited Retrieval: $0.03/GB
• Emergency access
• 1-5 minutes
• Last minute play-out schedule swap
On-site tape replacement Off-site tape replacement
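To compare the tiers for a given restore, the per-GB retrieval prices can be applied to a data volume. The sketch below assumes the three prices on this slide map to Expedited ($0.03), Standard ($0.01), and Bulk ($0.0025); it ignores per-request fees and data transfer charges, and the 1,000 GB volume is a made-up example.

```python
from decimal import Decimal

# Per-GB retrieval prices and typical wait times from the slide.
RETRIEVAL_TIERS = {
    "Expedited": {"price_per_gb": Decimal("0.03"), "typical_wait": "1-5 minutes"},
    "Standard": {"price_per_gb": Decimal("0.01"), "typical_wait": "3-5 hours"},
    "Bulk": {"price_per_gb": Decimal("0.0025"), "typical_wait": "5-12 hours"},
}


def retrieval_cost(gigabytes: int, tier: str) -> Decimal:
    """Per-GB retrieval charge for `gigabytes` in the given tier."""
    return gigabytes * RETRIEVAL_TIERS[tier]["price_per_gb"]


# Hypothetical restore of 1,000 GB, priced in each tier.
for tier, info in RETRIEVAL_TIERS.items():
    cost = retrieval_cost(1_000, tier)
    print(f"{tier:<10} {info['typical_wait']:<12} ${cost:,.2f}")
```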
23. Video archives
• Media distribution backbone (Ve.nue platform)
• Over-The-Top (OTT) broadcast service
• 20PBs of media assets, 1MM+ hours of high-res content
• Assets to be archived and retained for decades
25. “If physical deliveries can happen within one hour based on unpredictable requests, surely we are able to exceed such expectations digitally”
@SonyDADCNMS
26. Our migration
The Challenge
• Seamlessly migrate a platform that enables content delivery across all devices and more than 1,200 distribution points worldwide
• Store 20 petabytes of motion picture and television content
• Equating to 1,000,000+ hours of content
• At a growth curve of ~1 petabyte every quarter
Desired Goals:
• One-hour delivery turnaround time
• Agile, scalable, predictable cost model and infrastructure
• Investing in innovation vs. hardware
31. Compliance storage with Vault Lock
Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy
• Time-based retention
• MFA authentication
• Controls govern all records in a vault
• Immutable policy
• Two-step locking
32. Vault Lock for compliance storage
• Non-overwrite, non-erasable records
• Time-based retention with “ArchiveAgeInDays” control
• Policy lockdown (strong governance)
• Legal hold with vault-level tags
• Configure optional designated third-party access and grant temporary access
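A Vault Lock policy combining time-based retention and a tag-based legal hold could be built as in the sketch below. The statement structure follows the policy examples in the Amazon Glacier documentation; the vault ARN, the seven-year retention period, the `LegalHold` tag name, and the function name are placeholders chosen for illustration.

```python
import json


def vault_lock_policy(vault_arn: str, retention_days: int) -> str:
    """Return a Vault Lock policy document (JSON string) for one vault."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Time-based retention: no deletes until the archive is old enough.
                "Sid": "deny-delete-before-retention-ends",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
                },
            },
            {
                # Legal hold: no deletes while the vault carries the hold tag.
                "Sid": "deny-delete-while-on-legal-hold",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "StringLike": {"glacier:ResourceTag/LegalHold": ["true", ""]}
                },
            },
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder ARN and a hypothetical retention of 7 x 365 days.
EXAMPLE_ARN = "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault"
policy_document = vault_lock_policy(EXAMPLE_ARN, retention_days=2555)
print(policy_document)
```

The two-step locking mentioned above means a policy like this is first attached in a test state and only becomes immutable once the lock is explicitly completed.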
33. Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC Rule 17a-4(f) and CFTC 1.31(b)-(c).
34. Proofpoint
• Cloud-based security and compliance for the enterprise: threat research, email, mobile, social, digital risk
• Founded 2002, public in 2012
• $350M annual revenue, $3B market cap
• Big AWS user
35. Proofpoint SocialPatrol
Policy controls and enforcement for social
• Combats fraudulent brand impersonation
• Moderates content at scale
• Ensures compliance in publishing
• Integrates with social APIs
• 150+ classifiers using NLP and ML
• Text, links, images, metadata
• Ingesting >1M social posts per day
• Built in AWS
36. Proofpoint SocialPatrol Archive with Glacier
SEC Rule 17a-4(f)-compliant archive, purpose-built for social, enabled by Amazon Glacier and Vault Lock
Architecture (PFPT in AWS): Social, Policy engine, MySQL/C*/Solr, Amazon Glacier & Vault Lock
40. Proofpoint SocialPatrol Archive
As social content flows in, we record its purge date and surface that to the user. Each piece of social content is an archive in the vault.
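Recording a purge date at ingest time amounts to adding the retention period to the archive date. The sketch below is a hypothetical illustration of that bookkeeping, not Proofpoint's implementation; the function names and the 365-day retention are assumptions.

```python
from datetime import date, timedelta


def purge_date(archived_on: date, retention_days: int) -> date:
    """First day on which the archive may be deleted."""
    return archived_on + timedelta(days=retention_days)


def is_purgeable(archived_on: date, retention_days: int, today: date) -> bool:
    """True once the retention period has fully elapsed."""
    return today >= purge_date(archived_on, retention_days)


# Hypothetical post archived on 2017-03-01 with a 365-day retention period.
archived = date(2017, 3, 1)
print("Purge date:", purge_date(archived, 365).isoformat())
print("Purgeable on 2017-12-31?", is_purgeable(archived, 365, date(2017, 12, 31)))
```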
42. Managing Legacy Tape Data with AWS
Migrate Long Term Retention Data from Backup Tapes to AWS
43. Introducing Index Engines
▪ New Amazon partner – announced January 31st
▪ Software company delivering enterprise indexing technology
▪ Direct indexing, reporting and access to backup data
▪ Supports data backed up by IBM, Dell EMC, Veritas, HP, etc.
▪ Cost effective migration from legacy tape to AWS S3
▪ Index Engines Overview
▪ Partners include: Amazon, Dell EMC, EY, FTI
▪ Clients include: JPMC, Citi, DB, Barclays, TIAA-CREF, Rabo AgriFinance
▪ Patented technology
Copyright Index Engines Inc. 2017 All rights reserved. 43
44. Product Offering
▪ Native S3 support
▪ Currently supports S3
▪ Development in process to support S3-IA and Glacier
▪ Index Engines
▪ Index, search and report on tape data
▪ Determine data of value, or unique data set for migration
▪ Migrates and archives data in AWS
45. Transforming Clients with IE + AWS
Data Center
Reduce Tape Infrastructure
Eliminate Offsite Storage
Reclaim Resources
Business Users
Faster Time to Data
More Intelligence
Leverage IP
Governance
Manage Risk
Support eDiscovery
Proactive Insights
46. True Cost Associated with Tape
Hardware
• Servers (NDMP)
• Libraries
• Floor space
Resources
• Manpower
• Data center costs
Backup Software
• Maintenance
• Infrastructure
• Management
SLAs & Restores
• Time to restore data
• 3rd party restore services
Tape Storage
• Offsite storage costs
• Tape management
• Tape purchases
Risk & Liability
• eDiscovery
• Regulatory
• Long-term risk
47. Sample Environment
▪ 50,000 legacy tapes at Iron Mountain
▪ Veritas NetBackup generated tapes
▪ ~1TB per tape (highly redundant)
▪ ~50PB total
▪ Unique data set: 17PB files, email and databases
▪ Data for migration: 10PB all files/email and 50% of databases
▪ Data for migration: 2.5PB all files/email
Annual Cost for 2.5PB
▪ S3 $717,672
▪ S3-IA $424,332
▪ Glacier $138,228
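The relative difference between the three annual figures can be worked out directly from the slide's numbers. The sketch below only compares the quoted totals; it does not try to reproduce how they were calculated, and the names are illustrative.

```python
# Annual cost for 2.5PB as quoted on the slide, in US dollars.
ANNUAL_COST_USD = {"S3": 717_672, "S3-IA": 424_332, "Glacier": 138_228}


def saving_vs(baseline: str, option: str) -> float:
    """Fraction of the baseline's annual cost saved by choosing `option`."""
    return 1 - ANNUAL_COST_USD[option] / ANNUAL_COST_USD[baseline]


for option in ("S3-IA", "Glacier"):
    print(f"{option}: {saving_vs('S3', option):.1%} lower annual cost than S3")
```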
48. Next Steps
▪ Learn more:
▪ www.indexengines.com/aws
▪ info@indexengines.com