(STG406) Using S3 to Build and Scale an Unlimited Storage Service

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Using Amazon S3 to Build and Scale an Unlimited Storage Service for Millions of Consumers
Tarlochan Cheema
Kevin Christen
October 2015
STG406

What to expect from the session
• What is Amazon Cloud Drive?
• Key Challenges
• Services design & architecture
• Content store deep dive
• Lessons learned

What Is Amazon Cloud Drive?
• Unlimited cloud storage from Amazon for consumers
• Subscription-based storage plans
  • Unlimited Photos: unlimited photo storage, plus 5 GB for videos and files, for just $11.99 per year
  • Unlimited Everything: securely store all of your photos, videos, files, and documents for just $59.99 per year
https://www.amazon.com/clouddrive/

How do I use it from anywhere, on any device?
• Amazon apps for photos and files
  • Mobile
  • Computer (Mac and PC)
  • Web
https://www.amazon.com/clouddrive/apps

What’s in it for developers & partners?
• Reach millions of customers
• RESTful APIs
• Android & iOS SDKs
• Revenue sharing
https://developer.amazon.com/public/apis/experience/cloud-drive/

A growing partner ecosystem
• Access to millions of Amazon customers
• Revenue sharing for developers and partners (new!)
https://www.amazon.com/clouddrive/apps

Key challenges
• Unlimited storage
• Millions of users
• Billions of files
• Variety of content (photos/videos/docs)
• Variety of metadata
• Flexible indexing & querying
• Terabytes of logs

Key design goals
• Highly scalable
• Durable
• Reliable
• RESTful
• Low latency
• Near real-time queries
• Consistency
• Idempotency
• Low cost

Amazon Cloud Drive service architecture
[Diagram components: Apps/Users; Amazon ELB; Amazon Cloud Drive service (Amazon EC2); Content Store (Amazon S3); Metadata Store (Amazon DynamoDB); Asynchronous Pipeline (Amazon Kinesis stream, message queue); Indexing & Query; Analytics; Notifications; Content Processing (Amazon Elastic Transcoder)]

What does Cloud Drive store in Amazon S3?
• Customer content
• Derived content
  • Transcoded videos
  • Thumbnails of videos and documents
• Log files
• Dynamic configuration
• DynamoDB backups
• All access goes through the publicly available AWS Java SDK

Storing customer content
• Single Amazon S3 bucket per geographical region
• Billions of objects per content bucket
• Randomly generated keys
  • Keys are stored in Amazon DynamoDB
  • Avoids hot key prefixes
  • No list operations
• Amazon S3 server-side encryption
  • AES-256
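The key scheme above can be sketched as follows. This is a minimal illustration, not Cloud Drive's code (which used the AWS Java SDK). It assumes 128 random bits, hex-encoded, are enough to spread keys evenly. The in-memory `metadata_table` dict and the helper names `new_content_key` and `store_content` are hypothetical stand-ins for the DynamoDB table and service logic.

```python
import secrets

# Hypothetical stand-in for the DynamoDB table that maps a file (node) ID to
# its S3 object key. Not the real Cloud Drive schema.
metadata_table: dict = {}


def new_content_key() -> str:
    """Return a random S3 object key (128 random bits, hex-encoded).

    The leading characters of consecutive keys are uniformly distributed,
    so no single key prefix receives a disproportionate share of requests.
    """
    return secrets.token_hex(16)


def store_content(node_id: str) -> str:
    """Allocate a key for a new upload and record it in the metadata store.

    Every key is recorded here, so the service never needs to LIST the
    bucket to find an object.
    """
    key = new_content_key()
    metadata_table[node_id] = key
    return key
```

Because lookups always go through the metadata store, the bucket itself can stay a flat, randomly keyed namespace with billions of objects.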

Managing log files
• Cloud Drive runs on 800+ servers in 3 AWS regions
  • More during peak load times
• 200+ GB of logs per hour
• Logs are delivered to the Timber log archiving service
  • Timber encrypts them and stores them in Amazon S3
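As a rough check on the "terabytes of logs" challenge, here is the arithmetic. It assumes the 200 GB/hour figure holds around the clock as a lower bound, and it uses decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB):

```latex
200\ \tfrac{\text{GB}}{\text{h}} \times 24\ \tfrac{\text{h}}{\text{day}}
  = 4{,}800\ \tfrac{\text{GB}}{\text{day}} = 4.8\ \tfrac{\text{TB}}{\text{day}},
\qquad
4.8\ \tfrac{\text{TB}}{\text{day}} \times 365\ \tfrac{\text{days}}{\text{year}}
  \approx 1.75\ \tfrac{\text{PB}}{\text{year}}
```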

Log file types
• Application logs
  • Time-stamped, severity-tagged messages
• Service logs
  • Amazon-wide standard format
  • One record per service invocation
  • Source for metrics
• Wire logs

Log files
• All logs are archived in Amazon S3 by Timber
• Service logs are processed into Amazon Redshift load files
• The Amazon Redshift COPY command loads the files into the data warehouse in parallel
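The slides do not say how the load files are named or handed to COPY. One common approach is sketched below: split the rows into several files and list them in a COPY manifest. Redshift loads fastest when the number of files is a multiple of the cluster's slices. The bucket, prefix, table, role ARN, gzip compression, and tab delimiter are all hypothetical. The manifest's `entries`/`url`/`mandatory` shape and the `MANIFEST` option are Redshift COPY features.

```python
import json


def split_rows(rows: list, num_files: int) -> list:
    """Round-robin service-log rows into num_files roughly equal load files."""
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    files = [[] for _ in range(num_files)]
    for i, row in enumerate(rows):
        files[i % num_files].append(row)
    return files


def build_manifest(bucket: str, prefix: str, num_files: int) -> str:
    """Build a COPY manifest that lists every (hypothetically named) load file.

    mandatory=True makes COPY fail if any listed file is missing.
    """
    entries = [
        {"url": f"s3://{bucket}/{prefix}part-{i:04d}.gz", "mandatory": True}
        for i in range(num_files)
    ]
    return json.dumps({"entries": entries}, indent=2)


def copy_statement(table: str, manifest_url: str, iam_role_arn: str) -> str:
    """A COPY that loads all manifest files in parallel across slices."""
    return (
        f"COPY {table} FROM '{manifest_url}' "
        f"IAM_ROLE '{iam_role_arn}' MANIFEST GZIP DELIMITER '\\t';"
    )
```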

Coordinating dynamic configuration
• Dynamic values, such as feature toggles
  • Enable a feature for test customers
  • Dial capabilities up from 0% to 100%
• Configuration files stored in Amazon S3
• Servers poll for changes using HTTP HEAD (GetObjectMetadata)
  • A file is reloaded only if its ETag has changed
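The polling and dialing scheme above can be sketched as follows. This is a minimal version under stated assumptions, not Cloud Drive's implementation. The config file is assumed to be JSON. `head_etag` and `fetch_body` are hypothetical callables standing in for the S3 HEAD (GetObjectMetadata) and GET calls. The hash-based percentage dial is one common way to roll a feature out gradually. The slides do not say how Cloud Drive assigns customers to the enabled percentage.

```python
import hashlib
import json
from typing import Callable, Optional


class ConfigPoller:
    """Reload a config file only when its ETag changes."""

    def __init__(self, head_etag: Callable[[], str],
                 fetch_body: Callable[[], bytes]) -> None:
        self._head_etag = head_etag    # stand-in for a HEAD request
        self._fetch_body = fetch_body  # stand-in for a GET request
        self._etag: Optional[str] = None
        self.config: dict = {}
        self.reloads = 0

    def poll(self) -> bool:
        """Return True if the config was reloaded on this poll."""
        etag = self._head_etag()  # cheap: no object body is transferred
        if etag == self._etag:
            return False
        # If the object changes between HEAD and GET, the next poll sees a
        # newer ETag and reloads again, so the cached config still converges.
        self.config = json.loads(self._fetch_body())
        self._etag = etag
        self.reloads += 1
        return True


def feature_enabled(feature: str, customer_id: str, percent: int,
                    test_customers: frozenset = frozenset()) -> bool:
    """Deterministic 0-100% dial.

    Each customer hashes to a stable bucket from 0 to 99. Raising percent
    only adds customers, and test customers are always enabled.
    """
    if customer_id in test_customers:
        return True
    digest = hashlib.sha256(f"{feature}:{customer_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent
```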

Challenge 1/6: Upload size variation
• Uploads vary widely in size
  • From text files to VM images
  • Even images range from 10 KB GIFs to 20 MB RAW files
• Maintain reasonable performance for all file sizes
• Prevent large files from causing resource starvation

• Solution: size-aware upload logic
• Size < 15 MB: single PUT Object request
  • Upload performed by the request thread
• Size larger or unknown: multipart upload API
  • Parts uploaded by a thread pool fronted by a bounded (array-backed) blocking queue
  • Fixed-size 5 MB parts
  • 50 GB file size limit, because the multipart API allows at most 10,000 parts (10,000 × 5 MB = 50,000 MB)
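The size-based decision above can be sketched as a small planning function. This is a minimal illustration. It takes "MB" as MiB (S3's 5 MB minimum part size is 5 × 1024 × 1024 bytes). `plan_upload`, `part_ranges`, and `UploadPlan` are hypothetical names. The real uploads would be PUT Object, or CreateMultipartUpload / UploadPart / CompleteMultipartUpload, issued through the AWS SDK.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple

MB = 1024 * 1024
SINGLE_PUT_THRESHOLD = 15 * MB           # below this: one PUT Object request
PART_SIZE = 5 * MB                       # fixed part size for multipart uploads
MAX_PARTS = 10_000                       # S3 multipart upload part-count limit
MAX_OBJECT_SIZE = PART_SIZE * MAX_PARTS  # 50,000 MB with fixed 5 MB parts


@dataclass(frozen=True)
class UploadPlan:
    method: str                       # "put" or "multipart"
    part_count: Optional[int] = None  # None when the size is not known up front


def plan_upload(size: Optional[int]) -> UploadPlan:
    """Pick an upload strategy from the (possibly unknown) content length."""
    if size is None:
        # e.g. a chunked request body: stream 5 MB parts until end of input
        return UploadPlan("multipart")
    if size < SINGLE_PUT_THRESHOLD:
        return UploadPlan("put")
    if size > MAX_OBJECT_SIZE:
        raise ValueError(f"{size} bytes exceeds the {MAX_PARTS}-part limit")
    return UploadPlan("multipart", (size + PART_SIZE - 1) // PART_SIZE)


def part_ranges(size: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (part_number, offset, length); S3 part numbers start at 1."""
    for number, offset in enumerate(range(0, size, PART_SIZE), start=1):
        yield number, offset, min(PART_SIZE, size - offset)
```

For example, a 16 MB file becomes four parts: three full 5 MB parts and a final 1 MB part. Only the last part of a multipart upload may be smaller than 5 MB.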

Challenge 2/6: Rapid upload availability
• Content should be available as soon as possible
• But some content processing takes time
• Solution: a mix of synchronous, asynchronous, and optimistic synchronous processing

• Synchronous: metadata extraction from images and videos
  • Quick
  • Largely independent of file size
• Asynchronous: video transcoding
  • Necessary for playback on different devices
  • Time-consuming and size-dependent
  • We use the Amazon Elastic Transcoder service
• Optimistic synchronous: document transformation to PDF
  • Timing is unpredictable
  • Try synchronously with a timeout
  • On timeout, queue an Amazon SQS message for asynchronous processing
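The optimistic synchronous pattern for document conversion can be sketched as follows. This is a minimal sketch. A standard-library `queue.Queue` stands in for the Amazon SQS queue, and `convert` is a hypothetical conversion function supplied by the caller. The timeout value is an assumption, since the slides do not give one.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Optional

# Hypothetical stand-in for the Amazon SQS queue that feeds async converters.
async_conversion_jobs: queue.Queue = queue.Queue()
_sync_pool = ThreadPoolExecutor(max_workers=4)


def convert_to_pdf_optimistically(
    doc_id: str,
    convert: Callable[[str], bytes],
    timeout_s: float,
) -> Optional[bytes]:
    """Try to convert within timeout_s; otherwise defer to async processing.

    Returns the PDF bytes on the fast path, or None if the job was queued.
    """
    future = _sync_pool.submit(convert, doc_id)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The in-flight attempt is abandoned, not interrupted (Python threads
        # cannot be killed); the asynchronous worker redoes the conversion.
        async_conversion_jobs.put(doc_id)
        return None
```

Fast documents are available in the same request. Slow ones cost the client at most `timeout_s` of waiting before the work moves to the background.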

Challenge 3/6: Intermittent connections
• Clients may have slow and intermittent connections to our service
  • Especially mobile devices
• This makes uploading a large file in a single HTTP request difficult
• But multipart upload APIs are complex
  • Especially for the happy path, where a single request would have succeeded
• Solution: resumable uploads

• Client attempts a large upload
• If it fails mid-stream, Cloud Drive saves the transmitted bytes
  • Leveraging the existing Amazon S3 multipart upload API
• Client queries for the resumption point
• Client resumes the upload
  • Using the HTTP Content-Range header
• Cloud Drive completes the multipart upload
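The server side of the resume step might look like the sketch below. It parses a `Content-Range: bytes first-last/total` header (HTTP/1.1 range syntax) and checks that the client resumes exactly where the saved bytes end. The bookkeeping is an assumption. The server is taken to know the sizes of the parts already persisted to the S3 multipart upload, and the resumption point is their sum. `ByteRange`, `resumption_point`, and `check_resume` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional, Sequence

# "bytes first-last/total", where total may be "*" if the client does not know it
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


class ByteRange(NamedTuple):
    first: int
    last: int              # inclusive, as in HTTP
    total: Optional[int]   # None when the client sent "*"


def parse_content_range(header: str) -> ByteRange:
    m = _CONTENT_RANGE.match(header.strip())
    if not m:
        raise ValueError(f"malformed Content-Range: {header!r}")
    first, last = int(m.group(1)), int(m.group(2))
    total = None if m.group(3) == "*" else int(m.group(3))
    if last < first or (total is not None and last >= total):
        raise ValueError(f"invalid byte range: {header!r}")
    return ByteRange(first, last, total)


def resumption_point(persisted_part_sizes: Sequence[int]) -> int:
    """Offset the client should resume from: bytes already stored as S3 parts.

    Assumption: bytes that never made it into a stored part are discarded,
    because every S3 part except the last must be at least 5 MB.
    """
    return sum(persisted_part_sizes)


def check_resume(header: str, saved_bytes: int) -> ByteRange:
    """Accept a resumed upload only if it starts exactly at saved_bytes."""
    r = parse_content_range(header)
    if r.first != saved_bytes:
        raise ValueError(f"expected resume at byte {saved_bytes}, got {r.first}")
    return r
```

Rejecting a mismatched start offset keeps the stitched object byte-exact. A client that disagrees can simply query the resumption point again.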

• Problem: Can’t use instance profile credentials from different instances for a single multipart upload

• We used the AWS Security Token Service (STS) to provide consistent credentials for each step of the upload
• Amazon S3 presigned URLs are another option
  • http://amzn.to/1FLeoii

Challenge 4/6: Download size variation
• Like uploads, downloads vary widely in size
• Maintain reasonable performance for all file sizes
• Prevent large requests from causing resource starvation
• Solution: Size-aware download logic

• Small downloads (< 5 MB)
  • Single GET Object request
  • In the request thread
  • Retry once on failure
  • This covers 90% of our customers’ files

• Large downloads
  • Custom parallel download logic for large files
  • 5 MB parts (range requests)
  • Dedicated thread pool with a blocking queue, so large downloads do not starve uploads or small-file downloads
  • Connection reuse
  • Single retry on failure or timeout
  • Uses Apache HttpClient
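The size-aware download path can be sketched as follows. This is a minimal sketch; Cloud Drive's version used Apache HttpClient in Java. `get_range` is a hypothetical callable that issues a GET with the given `Range` header, or a plain GET when passed `None`. `BoundedExecutor` approximates a dedicated thread pool fronted by a bounded blocking queue: `submit()` blocks the caller once too many parts are pending. Timeouts would be configured on the HTTP client and are not shown.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional

PART_SIZE = 5 * 1024 * 1024             # 5 MB range requests
SMALL_DOWNLOAD_LIMIT = 5 * 1024 * 1024  # below this: single GET, request thread


def range_headers(size: int, part_size: int = PART_SIZE) -> List[str]:
    """HTTP Range header values covering bytes [0, size) in part_size chunks."""
    return [f"bytes={start}-{min(start + part_size, size) - 1}"
            for start in range(0, size, part_size)]


def with_one_retry(fn: Callable[[], bytes]) -> bytes:
    """Call fn; on failure retry exactly once and let a second failure propagate."""
    try:
        return fn()
    except Exception:
        return fn()


class BoundedExecutor:
    """Thread pool whose submit() blocks once workers + queue_size tasks are pending."""

    def __init__(self, workers: int, queue_size: int) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._slots = threading.BoundedSemaphore(workers + queue_size)

    def submit(self, fn: Callable[..., bytes], *args) -> Future:
        self._slots.acquire()  # block the caller instead of queueing without limit
        try:
            future = self._pool.submit(fn, *args)
        except BaseException:
            self._slots.release()
            raise
        future.add_done_callback(lambda _: self._slots.release())
        return future


def download(size: int, get_range: Callable[[Optional[str]], bytes],
             large_pool: BoundedExecutor) -> bytes:
    if size < SMALL_DOWNLOAD_LIMIT:
        # Small file: one GET in the calling (request) thread, retried once.
        return with_one_retry(lambda: get_range(None))
    # Large file: parallel range GETs on the dedicated pool, reassembled in order.
    futures = [large_pool.submit(with_one_retry, lambda h=h: get_range(h))
               for h in range_headers(size)]
    return b"".join(f.result() for f in futures)
```

Keeping large downloads on their own bounded pool means a burst of multi-gigabyte requests can only occupy that pool. The request threads that serve small files stay free.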

Challenge 5/6: Thumbnails of large images
• High traffic for image thumbnails
  • 3000+ requests per second
• Image thumbnails are generated on the fly
• Thumbnails of large images are expensive
  • Large object to download from Amazon S3
  • More time to generate the thumbnail

Solution: Create an intermediate JPEG thumbnail and cache it in Amazon S3
[Diagram: Content bucket, Cloud Drive, Thumbnail bucket]

• Cached in an S3 bucket with a 48-hour expiry
• Keyed on a hash of customer ID + image ID + image version
• 2K × 2K JPEG, ~1 MB
• Cache candidates:
  • JPEG, PNG, TIFF > 10 MB
  • All other images (primarily RAW)
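The cache rules above can be sketched as follows. This is a minimal sketch under stated assumptions. The slides do not name the hash, so SHA-256 over the three fields joined with a separator byte is an assumption; the separator keeps ("ab", "c") and ("a", "bc") from colliding. `cache_key`, `should_cache`, and the format names are hypothetical. The lifecycle rule shows one way to get the 48-hour expiry with the shape S3's PutBucketLifecycleConfiguration accepts. Note that S3 rounds lifecycle expiration to the next midnight UTC, so objects live at least 2 days.

```python
import hashlib

MB = 1024 * 1024
CACHE_SIZE_THRESHOLD = 10 * MB
COMMON_FORMATS = {"jpeg", "jpg", "png", "tiff", "tif"}

# One possible lifecycle configuration for the thumbnail bucket: expire every
# object 2 days (about 48 hours) after creation.
THUMBNAIL_LIFECYCLE = {
    "Rules": [{
        "ID": "expire-intermediate-thumbnails",
        "Filter": {"Prefix": ""},
        "Status": "Enabled",
        "Expiration": {"Days": 2},
    }]
}


def cache_key(customer_id: str, image_id: str, image_version: int) -> str:
    """Key derived from customer ID + image ID + image version.

    A new image version yields a new key, so stale thumbnails are never
    served; they simply age out. Hashing also spreads keys evenly.
    """
    material = "\x1f".join([customer_id, image_id, str(image_version)])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def should_cache(image_format: str, size_bytes: int) -> bool:
    """JPEG/PNG/TIFF only above 10 MB; all other formats (mostly RAW) always."""
    if image_format.lower() in COMMON_FORMATS:
        return size_bytes > CACHE_SIZE_THRESHOLD
    return True
```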

Challenge 6/6: Large direct downloads
• No on-the-fly transformations are applied to large files
• Downloading them to server disk first doesn’t make sense
• Instead, redirect the client to a short-lived Amazon S3 presigned URL

Takeaways
• Amazon S3 is flexible
  • Not just for big data, but also for caching and coordinating configuration
• Selection of Amazon S3 keys is important
• Upload and download strategies depend on file size and workflow
• First fallacy of distributed computing: the network is reliable
  • Retrying upload and download requests may be appropriate
  • Limit retries

Final Thoughts
Experience Amazon Cloud Drive
amazon.com/clouddrive
Build Apps with Amazon Cloud Drive API
developer.amazon.com/public/apis/experience/cloud-drive
Earn revenue & reach millions of Amazon customers
http://tinyurl.com/Cloud-drive-revenue

Remember to complete your evaluations!
