SlideShare a Scribd company logo
1 of 56
Download to read offline
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Carl Summers, Software Development Engineer
Omair Gillani, Sr. Product Manager
4/19/2016
Amazon S3
Deep Dive
AWS storage maturity
Amazon EFS
File
Amazon EBS
Amazon EC2
instance store
Block
Amazon S3 Amazon Glacier
Object
Data transfer
AWS Direct
Connect
Snowball ISV connectors Amazon
Kinesis
Firehose
Transfer
Acceleration
AWS Storage
Gateway
Our customer promise
Durable
11 9s
Available
Designed for 99.99%
Scalable
Gigabytes -> Exabytes
Cross-region
replication
- Amazon CloudWatch
metrics for Amazon S3
- AWS CloudTrail support
VPC endpoint
for Amazon S3
Amazon S3 bucket
limit increase
Event notifications
Read-after-write
consistency in all regions
Innovation for Amazon S3
Amazon S3
Standard-IA
Expired object delete
marker
Incomplete multipart
upload expiration
Lifecycle policy
Transfer
Acceleration
Innovation for Amazon S3, continued…
Choice of storage classes on Amazon S3
Standard
Active data Archive dataInfrequently accessed data
Standard - Infrequent Access Amazon Glacier
File sync and share
+
consumer file
storage
Backup and archive +
disaster recovery
Long retained
data
Some use cases have different requirements
11 9s of durability
Standard-Infrequent Access storage
Designed for
99.9% availability
Durable Available
Same throughput as
Amazon S3 Standard storage
High performance
• Server-side encryption
• Use your encryption keys
• KMS-managed encryption keys
Secure
• Lifecycle management
• Versioning
• Event notifications
• Metrics
Integrated
• No impact on user
experience
• Simple REST API
• Single bucket
Easy to use
Understand your cloud storage
Understand your cloud storage
Amazon S3
S3 Server
Access Logs
Amazon S3
Hive on Amazon EMR
Amazon S3
Aggregation Aggregation result storage Aggregation result analysis
Persist prepared
datasets
Load prepared data
Pre-processed data storage
Amazon Redshift
Amazon S3
S3 Server
Access Logs
Amazon S3
Hive on Amazon EMR
Amazon S3
Aggregation Aggregation result storage
Persist prepared
datasets
Load prepared data
Load data
Pre-processed data storage
1. Enable Access
Logs
2. Create EMR Cluster
3. Spark code to
aggregate logs
4. Submit code to EMR
5. Persist
interim
results on S3
7. Visualize Data
Aggregation result analysis
Amazon Redshift
6. Persist
final results
on S3
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
DEMO
Understanding your cloud storage
• Main Spark app
• Persist pre-processed data
in S3
• Prefix aggregation
• Persist result in S3
Amazon S3 as your persistent data store
Separate compute and storage
Resize and shut down Amazon
EMR clusters with no data loss
Point multiple Amazon EMR clusters
at the same data in Amazon S3
EMR
EMR
Amazon
S3
EMRFS makes it easier to use Amazon S3
• Read-after-write consistency
• Very fast list operations
• Error handling options
• Support for Amazon S3 encryption
• Transparent to applications: s3://
Management policies
Lifecycle policies
• Automatic tiering and cost controls
• Includes two possible actions:
• Transition: archives to Standard-IA or Amazon
Glacier after specified time
• Expiration: deletes objects after specified time
• Allows for actions to be combined
• Set policies at the prefix level
Lifecycle policies
- Transition Standard to Standard-IA
- Transition Standard-IA to Amazon Glacier
storage
- Expiration lifecycle policy
- Versioning support
- Directly PUT to Standard-IA
Standard-Infrequent Access storage
Integrated: Lifecycle management
Standard - Infrequent Access
Lifecycle policy
Standard Storage -> Standard-IA
<LifecycleConfiguration>
<Rule>
<ID>sample-rule</ID>
<Prefix>documents/</Prefix>
<Status>Enabled</Status>
<Transition>
<Days>30</Days>
<StorageClass>STANDARD-IA</StorageClass>
</Transition>
<Transition>
<Days>365</Days>
<StorageClass>GLACIER</StorageClass>
</Transition>
</Rule>
</LifecycleConfiguration>
Standard-Infrequent Access storage
Lifecycle Policy
Standard Storage -> Standard-IA
<LifecycleConfiguration>
<Rule>
<ID>sample-rule</ID>
<Prefix>documents/</Prefix>
<Status>Enabled</Status>
<Transition>
<Days>30</Days>
<StorageClass>STANDARD-IA</StorageClass>
</Transition>
<Transition>
<Days>365</Days>
<StorageClass>GLACIER</StorageClass>
</Transition>
</Rule>
</LifecycleConfiguration>
Standard-IA Storage -> Amazon Glacier
Standard-Infrequent Access storage
Versioning S3 buckets
• Protects from accidental overwrites and
deletes
• New version with every upload
• Easy retrieval of deleted objects and roll
back
• Three states of an Amazon S3 bucket
• Default – Unversioned
• Versioning-enabled
• Versioning-suspended
Versioning
Best Practice
Versioning + lifecycle policies
Expired object delete marker policy
• Deleting a versioned object makes a
delete marker the current version of the
object
• No storage charge for delete marker
• Removing delete marker can improve
list performance
• Lifecycle policy to automatically remove
the current version delete marker when
previous versions of the object no
longer exist
Expired object delete
marker
Example lifecycle policy to remove current versions
<LifecycleConfiguration>
<Rule>
...
<Expiration>
<Days>60</Days>
</Expiration>
<NoncurrentVersionExpiration>
<NoncurrentDays>30</NoncurrentDays>
</NoncurrentVersionExpiration>
</Rule>
</LifecycleConfiguration>
Expired object delete marker policy
Leverage lifecycle to expire current
and non-current versions
S3 Lifecycle will automatically remove any
expired object delete markers
Example lifecycle policy for non-current version expiration
Lifecycle configuration with
NoncurrentVersionExpiration action
removes all the noncurrent versions,
<LifecycleConfiguration>
<Rule>
...
<Expiration>
<ExpiredObjectDeleteMarker>true</ExpiredObjectDeleteMarker>
</Expiration>
<NoncurrentVersionExpiration>
<NoncurrentDays>30</NoncurrentDays>
</NoncurrentVersionExpiration>
</Rule>
</LifecycleConfiguration>
Expired object delete marker policy
ExpiredObjectDeleteMarker element
removes expired object delete markers.
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
DEMO
Expired object delete marker policy
Tip: Restricting deletes
• Bucket policies can restrict deletes
• For additional security, enable MFA (multi-factor
authentication) delete, which requires additional
authentication to:
• Change the versioning state of your bucket
• Permanently delete an object version
• MFA delete requires both your security credentials and a
code from an approved authentication device
Best Practice
Performance optimization for S3
Parallelizing PUTs with multipart uploads
• Increase aggregate throughput by
parallelizing PUTs on high-bandwidth
networks
• Move the bottleneck to the network
where it belongs
• Increase resiliency to network errors;
fewer large restarts on error-prone
networks
Best Practice
Multipart upload provides parallelism
• Allows faster, more flexible uploads
• Allows you to upload a single object as a set of parts
• Upon upload, Amazon S3 then presents all parts as
a single object
• Enables parallel uploads, pausing and resuming
an object upload and starting uploads before
you know the total object size
Incomplete multipart upload expiration policy
• Multipart upload feature improves
PUT performance
• Partial upload does not appear in
bucket list
• Partial upload does incur storage
charges
• Set a lifecycle policy to automatically
expire incomplete multipart uploads
after a predefined number of days
Incomplete multipart
upload expiration
Example lifecycle policy
Abort incomplete multipart uploads
seven days after initiation
<LifecycleConfiguration>
<Rule>
<ID>sample-rule</ID>
<Prefix>SomeKeyPrefix/</Prefix>
<Status>rule-status</Status>
<AbortIncompleteMultipartUpload>
<DaysAfterInitiation>7</DaysAfterInitiation>
</AbortIncompleteMultipartUpload>
</Rule>
</LifecycleConfiguration>
Incomplete multipart upload expiration policy
Parallelize your GETs
• Use range-based GETs to get
multithreaded performance when
downloading objects
• Compensates for unreliable networks
• Benefits of multithreaded parallelism
• Align your ranges with your parts!Best Practice
Parallelizing LIST
Parallelize LIST when you need a
sequential list of your keys
Secondary index to get a faster
alternative to LIST
• Sorting by metadata
• Search ability
• Objects by timestamp
Best Practice
SSL best practices to optimize performance
• Use the SDKs!!
• EC2 instance types
• AES-NI hardware acceleration (cat /proc/cpuinfo)
• Threads can work against you (finite network
capacity)
• Timeouts
• Connection pooling
• Perform keep-alives to avoid handshake
Best Practice
<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-164533126.jpg
<my_bucket>/2013_11_13-164533127.jpg
<my_bucket>/2013_11_13-164533128.jpg
<my_bucket>/2013_11_12-164533129.jpg
<my_bucket>/2013_11_12-164533130.jpg
<my_bucket>/2013_11_12-164533131.jpg
<my_bucket>/2013_11_12-164533132.jpg
<my_bucket>/2013_11_11-164533133.jpg
<my_bucket>/2013_11_11-164533134.jpg
<my_bucket>/2013_11_11-164533135.jpg
<my_bucket>/2013_11_11-164533136.jpg
Use a key-naming scheme with randomness at the beginning for high
TPS
• Most important if you regularly exceed 100 TPS on a bucket
• Avoid starting with a date
• Consider adding a hash or reversed timestamp (ssmmhhddmmyy)
Don’t do this…
Distributing key names
Distributing key names
…because this is going to happen
1 2 N
1 2 N
Partition Partition Partition Partition
Distributing key names
Add randomness to the beginning of the key name…
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg
Distributing key names
…so your transactions can be distributed across the partitions
1 2 N
1 2 N
Partition Partition Partition Partition
Best Practice
Other techniques for distributing key names
Store objects as a hash of their name
• add the original name as metadata
• “deadmau5_mix.mp3” 
0aa316fb000eae52921aab1b4697424958a53ad9
• prepend key name with short hash
• 0aa3-deadmau5_mix.mp3
(reverse)
• 5321354831-deadmau5_mix.mp3
Best Practice
Recap
• Amazon S3 Standard-Infrequent Access
• Using big data on Amazon S3 for analysis
• Amazon S3 management policies
• Versioning for Amazon S3
• Best practices and performance optimization for
Amazon S3
Deep Dive on Amazon S3

More Related Content

What's hot

What's hot (20)

Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity ...
Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity ...Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity ...
Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity ...
 
AWS Storage - S3 Fundamentals
AWS Storage - S3 FundamentalsAWS Storage - S3 Fundamentals
AWS Storage - S3 Fundamentals
 
Using AWS Key Management Service for Secure Workloads
Using AWS Key Management Service for Secure WorkloadsUsing AWS Key Management Service for Secure Workloads
Using AWS Key Management Service for Secure Workloads
 
Deep Dive - Amazon Virtual Private Cloud (VPC)
Deep Dive - Amazon Virtual Private Cloud (VPC)Deep Dive - Amazon Virtual Private Cloud (VPC)
Deep Dive - Amazon Virtual Private Cloud (VPC)
 
AWS re:Invent 2016: Become an AWS IAM Policy Ninja in 60 Minutes or Less (SAC...
AWS re:Invent 2016: Become an AWS IAM Policy Ninja in 60 Minutes or Less (SAC...AWS re:Invent 2016: Become an AWS IAM Policy Ninja in 60 Minutes or Less (SAC...
AWS re:Invent 2016: Become an AWS IAM Policy Ninja in 60 Minutes or Less (SAC...
 
Amazon S3 Masterclass
Amazon S3 MasterclassAmazon S3 Masterclass
Amazon S3 Masterclass
 
Deep Dive on Amazon S3 - AWS Online Tech Talks
Deep Dive on Amazon S3 - AWS Online Tech TalksDeep Dive on Amazon S3 - AWS Online Tech Talks
Deep Dive on Amazon S3 - AWS Online Tech Talks
 
Introduction to EC2
Introduction to EC2Introduction to EC2
Introduction to EC2
 
[AWS Builders] AWS 네트워크 서비스 소개 및 사용 방법 - 김기현, AWS 솔루션즈 아키텍트
[AWS Builders] AWS 네트워크 서비스 소개 및 사용 방법 - 김기현, AWS 솔루션즈 아키텍트[AWS Builders] AWS 네트워크 서비스 소개 및 사용 방법 - 김기현, AWS 솔루션즈 아키텍트
[AWS Builders] AWS 네트워크 서비스 소개 및 사용 방법 - 김기현, AWS 솔루션즈 아키텍트
 
AWS Security Week: AWS Secrets Manager
AWS Security Week: AWS Secrets ManagerAWS Security Week: AWS Secrets Manager
AWS Security Week: AWS Secrets Manager
 
Become an IAM Policy Master in 60 Minutes or Less (SEC316-R1) - AWS reInvent ...
Become an IAM Policy Master in 60 Minutes or Less (SEC316-R1) - AWS reInvent ...Become an IAM Policy Master in 60 Minutes or Less (SEC316-R1) - AWS reInvent ...
Become an IAM Policy Master in 60 Minutes or Less (SEC316-R1) - AWS reInvent ...
 
Object Storage: Amazon S3 and Amazon Glacier
Object Storage: Amazon S3 and Amazon GlacierObject Storage: Amazon S3 and Amazon Glacier
Object Storage: Amazon S3 and Amazon Glacier
 
Introduction to AWS VPC, Guidelines, and Best Practices
Introduction to AWS VPC, Guidelines, and Best PracticesIntroduction to AWS VPC, Guidelines, and Best Practices
Introduction to AWS VPC, Guidelines, and Best Practices
 
AWS S3 Tutorial For Beginners | Edureka
AWS S3 Tutorial For Beginners | EdurekaAWS S3 Tutorial For Beginners | Edureka
AWS S3 Tutorial For Beginners | Edureka
 
AWS IAM
AWS IAMAWS IAM
AWS IAM
 
AWS EC2
AWS EC2AWS EC2
AWS EC2
 
Introduction to AWS Storage Services
Introduction to AWS Storage ServicesIntroduction to AWS Storage Services
Introduction to AWS Storage Services
 
Intro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute ServicesIntro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute Services
 
Intro to AWS: Storage Services
Intro to AWS: Storage ServicesIntro to AWS: Storage Services
Intro to AWS: Storage Services
 
Deep Dive on Amazon EC2 Systems Manager
Deep Dive on Amazon EC2 Systems ManagerDeep Dive on Amazon EC2 Systems Manager
Deep Dive on Amazon EC2 Systems Manager
 

Viewers also liked

S3 Overview Presentation
S3 Overview PresentationS3 Overview Presentation
S3 Overview Presentation
bcburchn
 
Amazon Elastic Load Balancing
Amazon Elastic Load BalancingAmazon Elastic Load Balancing
Amazon Elastic Load Balancing
Duy Tan Geek
 

Viewers also liked (20)

(STG401) Amazon S3 Deep Dive & Best Practices
(STG401) Amazon S3 Deep Dive & Best Practices(STG401) Amazon S3 Deep Dive & Best Practices
(STG401) Amazon S3 Deep Dive & Best Practices
 
Amazon's Simple Storage Service (S3)
Amazon's Simple Storage Service (S3)Amazon's Simple Storage Service (S3)
Amazon's Simple Storage Service (S3)
 
Amazon S3 Overview
Amazon S3 OverviewAmazon S3 Overview
Amazon S3 Overview
 
Amazon EC2 Masterclass
Amazon EC2 MasterclassAmazon EC2 Masterclass
Amazon EC2 Masterclass
 
S3 Overview Presentation
S3 Overview PresentationS3 Overview Presentation
S3 Overview Presentation
 
Amazon S3 Deep Dive
Amazon S3 Deep DiveAmazon S3 Deep Dive
Amazon S3 Deep Dive
 
Deep Dive on Amazon S3
Deep Dive on Amazon S3Deep Dive on Amazon S3
Deep Dive on Amazon S3
 
AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...
AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...
AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...
 
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
 
Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013
Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013
Maximizing Amazon S3 Performance (STG304) | AWS re:Invent 2013
 
(SDD413) Amazon S3 Deep Dive and Best Practices | AWS re:Invent 2014
(SDD413) Amazon S3 Deep Dive and Best Practices | AWS re:Invent 2014(SDD413) Amazon S3 Deep Dive and Best Practices | AWS re:Invent 2014
(SDD413) Amazon S3 Deep Dive and Best Practices | AWS re:Invent 2014
 
(PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014
(PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014(PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014
(PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014
 
Introduction to Amazon EC2
Introduction to Amazon EC2Introduction to Amazon EC2
Introduction to Amazon EC2
 
Introduction to Amazon Web Services
Introduction to Amazon Web ServicesIntroduction to Amazon Web Services
Introduction to Amazon Web Services
 
Amazon S3 and EC2
Amazon S3 and EC2Amazon S3 and EC2
Amazon S3 and EC2
 
Digital Media Ingest and Storage Options on AWS
Digital Media Ingest and Storage Options on AWSDigital Media Ingest and Storage Options on AWS
Digital Media Ingest and Storage Options on AWS
 
Amazon Elastic Load Balancing
Amazon Elastic Load BalancingAmazon Elastic Load Balancing
Amazon Elastic Load Balancing
 
AWS Deployment Best Practices - AWS Symposium 2014 - Washington D.C.
AWS Deployment Best Practices - AWS Symposium 2014 - Washington D.C. AWS Deployment Best Practices - AWS Symposium 2014 - Washington D.C.
AWS Deployment Best Practices - AWS Symposium 2014 - Washington D.C.
 
Understanding The Benefits Of Amazon EC2
Understanding The Benefits Of Amazon EC2Understanding The Benefits Of Amazon EC2
Understanding The Benefits Of Amazon EC2
 
Availability & Scalability with Elastic Load Balancing & Route 53 (CPN204) | ...
Availability & Scalability with Elastic Load Balancing & Route 53 (CPN204) | ...Availability & Scalability with Elastic Load Balancing & Route 53 (CPN204) | ...
Availability & Scalability with Elastic Load Balancing & Route 53 (CPN204) | ...
 

Similar to Deep Dive on Amazon S3

Similar to Deep Dive on Amazon S3 (20)

AWS April 2016 Webinar Series - S3 Best Practices - A Decade of Field Experience
AWS April 2016 Webinar Series - S3 Best Practices - A Decade of Field ExperienceAWS April 2016 Webinar Series - S3 Best Practices - A Decade of Field Experience
AWS April 2016 Webinar Series - S3 Best Practices - A Decade of Field Experience
 
Deep Dive on Amazon S3
Deep Dive on Amazon S3Deep Dive on Amazon S3
Deep Dive on Amazon S3
 
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon GlacierSRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
 
Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Deep Dive on Amazon S3 - March 2017 AWS Online Tech TalksDeep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
 
Deep Dive on Amazon S3 (May 2016)
Deep Dive on Amazon S3 (May 2016)Deep Dive on Amazon S3 (May 2016)
Deep Dive on Amazon S3 (May 2016)
 
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...
 
Builders' Day - Best Practises for S3 - BL
Builders' Day - Best Practises for S3 - BLBuilders' Day - Best Practises for S3 - BL
Builders' Day - Best Practises for S3 - BL
 
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
2016 Utah Cloud Summit: AWS S3
2016 Utah Cloud Summit: AWS S32016 Utah Cloud Summit: AWS S3
2016 Utah Cloud Summit: AWS S3
 
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon GlacierSRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
 
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon GlacierSRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
 
Protect & Manage Amazon S3 & Amazon Glacier Objects at Scale (STG316-R1) - AW...
Protect & Manage Amazon S3 & Amazon Glacier Objects at Scale (STG316-R1) - AW...Protect & Manage Amazon S3 & Amazon Glacier Objects at Scale (STG316-R1) - AW...
Protect & Manage Amazon S3 & Amazon Glacier Objects at Scale (STG316-R1) - AW...
 
Data Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveData Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and Archive
 
February 2016 Webinar Series - Use AWS Cloud Storage as the Foundation for Hy...
February 2016 Webinar Series - Use AWS Cloud Storage as the Foundation for Hy...February 2016 Webinar Series - Use AWS Cloud Storage as the Foundation for Hy...
February 2016 Webinar Series - Use AWS Cloud Storage as the Foundation for Hy...
 
Storage Data Management: Tools and Templates to Seamlessly Automate and Optim...
Storage Data Management: Tools and Templates to Seamlessly Automate and Optim...Storage Data Management: Tools and Templates to Seamlessly Automate and Optim...
Storage Data Management: Tools and Templates to Seamlessly Automate and Optim...
 
AWS Data Lifecycle and Storage Management Demo
AWS Data Lifecycle and Storage Management DemoAWS Data Lifecycle and Storage Management Demo
AWS Data Lifecycle and Storage Management Demo
 
Visualizing Amazon S3 Storage Management with QuickSight - AWS Online Tech Talks
Visualizing Amazon S3 Storage Management with QuickSight - AWS Online Tech TalksVisualizing Amazon S3 Storage Management with QuickSight - AWS Online Tech Talks
Visualizing Amazon S3 Storage Management with QuickSight - AWS Online Tech Talks
 
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
 
Storage & Content Delivery
Storage & Content Delivery Storage & Content Delivery
Storage & Content Delivery
 

More from Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Deep Dive on Amazon S3

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Carl Summers, Software Development Engineer Omair Gillani, Sr. Product Manager 4/19/2016 Amazon S3 Deep Dive
  • 2. AWS storage maturity Amazon EFS File Amazon EBS Amazon EC2 instance store Block Amazon S3 Amazon Glacier Object Data transfer AWS Direct Connect Snowball ISV connectors Amazon Kinesis Firehose Transfer Acceleration AWS Storage Gateway
  • 3. Our customer promise Durable 11 9s Available Designed for 99.99% Scalable Gigabytes -> Exabytes
  • 4. Cross-region replication - Amazon CloudWatch metrics for Amazon S3 - AWS CloudTrail support VPC endpoint for Amazon S3 Amazon S3 bucket limit increase Event notifications Read-after-write consistency in all regions Innovation for Amazon S3
  • 5. Amazon S3 Standard-IA Expired object delete marker Incomplete multipart upload expiration Lifecycle policy Transfer Acceleration Innovation for Amazon S3, continued…
  • 6. Choice of storage classes on Amazon S3 Standard Active data Archive dataInfrequently accessed data Standard - Infrequent Access Amazon Glacier
  • 7. File sync and share + consumer file storage Backup and archive + disaster recovery Long retained data Some use cases have different requirements
  • 8. 11 9s of durability Standard-Infrequent Access storage Designed for 99.9% availability Durable Available Same throughput as Amazon S3 Standard storage High performance • Server-side encryption • Use your encryption keys • KMS-managed encryption keys Secure • Lifecycle management • Versioning • Event notifications • Metrics Integrated • No impact on user experience • Simple REST API • Single bucket Easy to use
  • 10. Understand your cloud storage Amazon S3 S3 Server Access Logs Amazon S3 Hive on Amazon EMR Amazon S3 Aggregation Aggregation result storage Aggregation result analysis Persist prepared datasets Load prepared data Pre-processed data storage Amazon Redshift
  • 11. Amazon S3 S3 Server Access Logs Amazon S3 Hive on Amazon EMR Amazon S3 Aggregation Aggregation result storage Persist prepared datasets Load prepared data Load data Pre-processed data storage 1. Enable Access Logs 2. Create EMR Cluster 3. Spark code to aggregate logs 4. Submit code to EMR 5. Persist interim results on S3 7. Visualize Data Aggregation result analysis Amazon Redshift 6. Persist final results on S3
  • 12. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. DEMO Understanding your cloud storage
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21. • Main Spark app • Persist pre-processed data in S3
  • 22. • Prefix aggregation • Persist result in S3
  • 23. Amazon S3 as your persistent data store Separate compute and storage Resize and shut down Amazon EMR clusters with no data loss Point multiple Amazon EMR clusters at the same data in Amazon S3 EMR EMR Amazon S3
  • 24. EMRFS makes it easier to use Amazon S3 • Read-after-write consistency • Very fast list operations • Error handling options • Support for Amazon S3 encryption • Transparent to applications: s3://
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 31. Lifecycle policies • Automatic tiering and cost controls • Includes two possible actions: • Transition: archives to Standard-IA or Amazon Glacier after specified time • Expiration: deletes objects after specified time • Allows for actions to be combined • Set policies at the prefix level Lifecycle policies
  • 32. - Transition Standard to Standard-IA - Transition Standard-IA to Amazon Glacier storage - Expiration lifecycle policy - Versioning support - Directly PUT to Standard-IA Standard-Infrequent Access storage Integrated: Lifecycle management Standard - Infrequent Access
  • 33. Lifecycle policy Standard Storage -> Standard-IA <LifecycleConfiguration> <Rule> <ID>sample-rule</ID> <Prefix>documents/</Prefix> <Status>Enabled</Status> <Transition> <Days>30</Days> <StorageClass>STANDARD-IA</StorageClass> </Transition> <Transition> <Days>365</Days> <StorageClass>GLACIER</StorageClass> </Transition> </Rule> </LifecycleConfiguration> Standard-Infrequent Access storage
  • 34. Lifecycle Policy Standard Storage -> Standard-IA <LifecycleConfiguration> <Rule> <ID>sample-rule</ID> <Prefix>documents/</Prefix> <Status>Enabled</Status> <Transition> <Days>30</Days> <StorageClass>STANDARD-IA</StorageClass> </Transition> <Transition> <Days>365</Days> <StorageClass>GLACIER</StorageClass> </Transition> </Rule> </LifecycleConfiguration> Standard-IA Storage -> Amazon Glacier Standard-Infrequent Access storage
  • 35. Versioning S3 buckets • Protects from accidental overwrites and deletes • New version with every upload • Easy retrieval of deleted objects and roll back • Three states of an Amazon S3 bucket • Default – Unversioned • Versioning-enabled • Versioning-suspended Versioning Best Practice
  • 37. Expired object delete marker policy • Deleting a versioned object makes a delete marker the current version of the object • No storage charge for delete marker • Removing delete marker can improve list performance • Lifecycle policy to automatically remove the current version delete marker when previous versions of the object no longer exist Expired object delete marker
  • 38. Example lifecycle policy to remove current versions <LifecycleConfiguration> <Rule> ... <Expiration> <Days>60</Days> </Expiration> <NoncurrentVersionExpiration> <NoncurrentDays>30</NoncurrentDays> </NoncurrentVersionExpiration> </Rule> </LifecycleConfiguration> Expired object delete marker policy Leverage lifecycle to expire current and non-current versions S3 Lifecycle will automatically remove any expired object delete markers
  • 39. Example lifecycle policy for non-current version expiration Lifecycle configuration with NoncurrentVersionExpiration action removes all the noncurrent versions, <LifecycleConfiguration> <Rule> ... <Expiration> <ExpiredObjectDeleteMarker>true</ExpiredObjectDeleteMarker> </Expiration> <NoncurrentVersionExpiration> <NoncurrentDays>30</NoncurrentDays> </NoncurrentVersionExpiration> </Rule> </LifecycleConfiguration> Expired object delete marker policy ExpiredObjectDeleteMarker element removes expired object delete markers.
  • 40. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. DEMO Expired object delete marker policy
  • 41. Tip: Restricting deletes • Bucket policies can restrict deletes • For additional security, enable MFA (multi-factor authentication) delete, which requires additional authentication to: • Change the versioning state of your bucket • Permanently delete an object version • MFA delete requires both your security credentials and a code from an approved authentication device Best Practice
  • 43. Parallelizing PUTs with multipart uploads • Increase aggregate throughput by parallelizing PUTs on high-bandwidth networks • Move the bottleneck to the network where it belongs • Increase resiliency to network errors; fewer large restarts on error-prone networks Best Practice
  • 44. Multipart upload provides parallelism • Allows faster, more flexible uploads • Allows you to upload a single object as a set of parts • Upon upload, Amazon S3 then presents all parts as a single object • Enables parallel uploads, pausing and resuming an object upload and starting uploads before you know the total object size
  • 45. Incomplete multipart upload expiration policy • Multipart upload feature improves PUT performance • Partial upload does not appear in bucket list • Partial upload does incur storage charges • Set a lifecycle policy to automatically expire incomplete multipart uploads after a predefined number of days Incomplete multipart upload expiration
  • 46. Example lifecycle policy Abort incomplete multipart uploads seven days after initiation <LifecycleConfiguration> <Rule> <ID>sample-rule</ID> <Prefix>SomeKeyPrefix/</Prefix> <Status>rule-status</Status> <AbortIncompleteMultipartUpload> <DaysAfterInitiation>7</DaysAfterInitiation> </AbortIncompleteMultipartUpload> </Rule> </LifecycleConfiguration> Incomplete multipart upload expiration policy
  • 47. Parallelize your GETs • Use range-based GETs to get multithreaded performance when downloading objects • Compensates for unreliable networks • Benefits of multithreaded parallelism • Align your ranges with your parts!Best Practice
  • 48. Parallelizing LIST Parallelize LIST when you need a sequential list of your keys Secondary index to get a faster alternative to LIST • Sorting by metadata • Search ability • Objects by timestamp Best Practice
  • 49. SSL best practices to optimize performance • Use the SDKs!! • EC2 instance types • AES-NI hardware acceleration (cat /proc/cpuinfo) • Threads can work against you (finite network capacity) • Timeouts • Connection pooling • Perform keep-alives to avoid handshake Best Practice
  • 51. Distributing key names …because this is going to happen 1 2 N 1 2 N Partition Partition Partition Partition
  • 52. Distributing key names Add randomness to the beginning of the key name… <my_bucket>/521335461-2013_11_13.jpg <my_bucket>/465330151-2013_11_13.jpg <my_bucket>/987331160-2013_11_13.jpg <my_bucket>/465765461-2013_11_13.jpg <my_bucket>/125631151-2013_11_13.jpg <my_bucket>/934563160-2013_11_13.jpg <my_bucket>/532132341-2013_11_13.jpg <my_bucket>/565437681-2013_11_13.jpg <my_bucket>/234567460-2013_11_13.jpg <my_bucket>/456767561-2013_11_13.jpg <my_bucket>/345565651-2013_11_13.jpg <my_bucket>/431345660-2013_11_13.jpg
  • 53. Distributing key names …so your transactions can be distributed across the partitions 1 2 N 1 2 N Partition Partition Partition Partition Best Practice
  • 54. Other techniques for distributing key names Store objects as a hash of their name • add the original name as metadata • “deadmau5_mix.mp3”  0aa316fb000eae52921aab1b4697424958a53ad9 • prepend key name with short hash • 0aa3-deadmau5_mix.mp3 (reverse) • 5321354831-deadmau5_mix.mp3 Best Practice
  • 55. Recap • Amazon S3 Standard-Infrequent Access • Using big data on Amazon S3 for analysis • Amazon S3 management policies • Versioning for Amazon S3 • Best practices and performance optimization for Amazon S3