The document provides an overview of AWS storage services, including block storage, shared file systems, and object storage. It begins with an introduction to why AWS is chosen for storage and a tour of the global AWS infrastructure. It then covers block storage with Amazon EBS, file storage with Amazon EFS, and object storage with Amazon S3. Specific features of each service are described, such as durability, availability, and pricing, and example use cases are provided for each storage type.
Seton Hall reference - https://na32.salesforce.com/a3l500000000EOWAA2
Each storage option has a unique combination of performance, durability, cost, and interface.
AWS Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of the AWS cloud. Using Snowball addresses common challenges with large-scale data transfers, including high network costs, long transfer times, and security concerns. Transferring data with Snowball is simple, fast, and secure, and can be as little as one-fifth the cost of transfer over high-speed Internet.
AWS Snowmobile is new: a secure, exabyte-scale data transfer service used to transfer large amounts of data into and out of AWS. Each Snowmobile can transfer up to 100 PB. When you order a Snowmobile, it comes to your site and AWS personnel connect a removable, high-speed network switch from the Snowmobile to your local network, which makes the Snowmobile appear as a network-attached data store. Once it is connected, secure, high-speed data transfer begins. After your data is transferred to the Snowmobile, it is driven back to AWS, where the data is loaded into the AWS service you select, including S3, Glacier, Redshift, and others. Snowmobile lets customers with large amounts of data migrate to AWS much faster and more easily.
High-level description of EBS: network-based virtual disks, pay for what you provision, built-in redundancy (essentially RAID 10), optimized for random I/O
Network device
Data lifecycle is independent of the EC2 instance lifecycle (see the sketch after this list)
Each volume is like a hard drive on a physical server
Attach multiple volumes to an EC2 instance, but only one EC2 instance per volume
As a virtual disk, ideal for: OS boot devices; POSIX-compliant file systems
As a raw block device, ideal for: databases (e.g., Oracle Automatic Storage Management); other raw block device workloads
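A minimal boto3 sketch (all IDs, the region, and the AZ are placeholders) showing the two points above: the volume is created independently of any instance, and it attaches to exactly one instance at a time.

```python
# A sketch, not production code: create a 100 GiB gp2 volume and attach it
# to a single instance. The volume outlives the instance unless deleted.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # must match the instance's AZ
    Size=100,                       # GiB
    VolumeType="gp2",
)
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

# One instance per volume: attaching the same volume to a second instance
# would fail while it remains attached here.
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # placeholder
    Device="/dev/sdf",
)
```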
Amazon Web Services gives you reliable, durable backup storage without the up-front capital expenditure and complex capacity-planning burden of on-premises storage. Amazon storage services remove the need for complex and time-consuming capacity planning, ongoing negotiations with multiple hardware and software vendors, specialized training, and maintenance of offsite facilities or transportation of storage media to third-party offsite locations.
STORY BACKGROUND
University of Maryland University College (UMUC) is an open-access university serving working adult students pursuing higher education through on-site and online courses.
When its legacy applications were due for renewal, UMUC turned to AWS to run its analytics platform and several administrative workloads.
By using Amazon Redshift, UMUC has seen a twenty-fold increase in the performance of its analytics platform, allowing it to build more accurate predictive models and dashboards to improve student outcomes.
SOLUTION
[Main use case]. Big Data, Analytics and Business Intelligence (BI)
[Additional use cases]. Storage and Backup; Disaster Recovery & Archiving
[Keywords separated by commas]. Amazon Redshift, analytics, predictive, model, student outcome, university, education, public sector.
[List all AWS Services used by the customer]. Amazon EC2, Amazon RDS for Oracle, and Amazon Redshift
BENEFITS
The university built its new analytics platform on AWS leveraging Amazon Redshift and Amazon RDS for Oracle.
UMUC reports a 2x to 20x improvement in ETL performance for its analytics platform compared to its previous legacy applications
Using AWS enables UMUC engineers to focus on creating new applications instead of managing infrastructure
[Benefits Realized]. Better Performance, Lower Cost, Security
Describe EBS standard volumes as “best effort” and PIOPS (Provisioned IOPS) volumes as providing consistent performance. Mention that the most predictable performance comes from using EBS-Optimized instances to obtain dedicated storage throughput (a launch sketch follows the links below).
The chart on the left shows the expected throughput and maximum expected 16 KB IOPS for various instance sizes.
This table describes the use cases and performance characteristics for each volume type. Source: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html
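A minimal sketch of launching an instance with dedicated storage throughput; the AMI ID is a placeholder, and the instance type must support EBS optimization.

```python
# Sketch: launch an EBS-Optimized instance for dedicated throughput
# between the instance and EBS.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder
    InstanceType="m4.xlarge",
    MinCount=1,
    MaxCount=1,
    EbsOptimized=True,  # dedicated storage throughput
)
```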
http://aws.amazon.com/blogs/aws/enhanced-ebs-throughput/
About 256 KB I/O requests:
16 times as cost-effective as the previous 16 KB I/O size
Using multiple GP2 or PIOPS volumes, you can achieve up to 800 MB/s (a provisioning sketch follows the links below)
Referenced during “Innovation at Scale” by James Hamilton - https://www.youtube.com/watch?v=JIQETrFC_SQ
Also - http://aws.amazon.com/blogs/aws/larger-faster-ebs-ssd-volumes/
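A sketch of provisioning a set of gp2 volumes to stripe together at the OS level (e.g., RAID 0 with mdadm inside the instance); the AZ, size, and volume count are illustrative only.

```python
# Sketch only: create four gp2 volumes intended for an OS-level stripe set.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume_ids = [
    ec2.create_volume(
        AvailabilityZone="us-east-1a",  # must match the target instance's AZ
        Size=500,                       # GiB
        VolumeType="gp2",
    )["VolumeId"]
    for _ in range(4)
]
ec2.get_waiter("volume_available").wait(VolumeIds=volume_ids)
# Attach each volume (as in the earlier EBS example), then build the stripe
# in the OS; aggregate throughput scales with volume count up to instance limits.
```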
Describe how EBS snapshots work.
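One way to make the snapshot mechanics concrete is the API call itself; a minimal sketch, with a placeholder volume ID.

```python
# Sketch: create a point-in-time snapshot of a volume and wait for it to
# complete. Snapshots are incremental (only changed blocks are captured)
# and are stored in Amazon S3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",  # placeholder
    Description="nightly backup",
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])
```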
Free with your EC2 instance
SAS and SSD options
Size/type based on instance type
Zero network overhead; a local, direct-attached resource
Consistent performance for sequential reads and writes
Volatile: data does not survive instance stop or termination
Amazon EFS is currently (09/22/2015) in preview.
Describe how EFS works.
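A minimal boto3 sketch of the EFS workflow (the subnet and security group IDs are placeholders): create the file system, then create a mount target that instances in that subnet mount over NFSv4.

```python
# Sketch, not production code: create an EFS file system, wait until it is
# available, then expose it in one subnet via a mount target.
import time
import boto3

efs = boto3.client("efs", region_name="us-east-1")

fs = efs.create_file_system(
    CreationToken="demo-fs",           # idempotency token
    PerformanceMode="generalPurpose",
)

# Poll until available (boto3 provides no EFS waiter).
while efs.describe_file_systems(
    FileSystemId=fs["FileSystemId"]
)["FileSystems"][0]["LifeCycleState"] != "available":
    time.sleep(5)

efs.create_mount_target(
    FileSystemId=fs["FileSystemId"],
    SubnetId="subnet-0123456789abcdef0",      # placeholder
    SecurityGroups=["sg-0123456789abcdef0"],  # placeholder
)
```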
Athena detailed slide in Appendix
Highlight the customer architecture and how durability, availability, performance, and scalability relate to application type
Amazon Glacier provides three ways to retrieve your archives to meet varying access time and cost requirements: Expedited, Standard, and Bulk retrievals. Archives requested using Expedited retrievals are typically available within 1 – 5 minutes, allowing you to quickly access your data when occasional urgent requests for a subset of archives are required. With Standard retrievals, archives typically become accessible within 3 – 5 hours. Or you can use Bulk retrievals to cost-effectively access significant portions of your data, even petabytes, for just a quarter-of-a-cent per GB.
Updated pricing as of Dec 23, 2016
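A minimal sketch of requesting a retrieval at a chosen tier; the vault name and archive ID are placeholders.

```python
# Sketch: initiate an archive-retrieval job at a given tier.
# accountId "-" means the account that owns the credentials in use.
import boto3

glacier = boto3.client("glacier", region_name="us-east-1")
job = glacier.initiate_job(
    accountId="-",
    vaultName="my-archive-vault",           # placeholder
    jobParameters={
        "Type": "archive-retrieval",
        "ArchiveId": "EXAMPLE-ARCHIVE-ID",  # placeholder
        "Tier": "Expedited",                # or "Standard" / "Bulk"
    },
)
```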
The AWS Storage Gateway (SGW) is typically deployed in your existing storage environment as a VM.
You connect your existing applications, storage systems, or devices to the SGW. The SGW provides standard storage protocol interfaces so apps can connect to it without changes.
The gateway in turn connects to AWS so you can store data securely and durably in Amazon S3 and Amazon Glacier.
The gateway optimizes data transfer from on-premises to AWS. It also provides low-latency access through a local cache, so your apps can access frequently used data locally. The service is also integrated with CloudWatch, CloudTrail, IAM, and more, so you get an extension of AWS management services locally.
---
“Enable cloud storage on-premises as part of your AWS platform”
Native access
Industry standard protocols for file, block, and tape
Secure and durable storage in Amazon S3 and Glacier
Optimized data transfer from on-premises to AWS
Low-latency access to frequently used data
Integrated with AWS security and management services
The file gateway enables you to store and retrieve objects in Amazon S3 using industry-standard file protocols. Files are stored as objects in your S3 buckets, accessed through a Network File System (NFS) mount point. Ownership, permissions, and timestamps are durably stored in S3 in the user-metadata of the object associated with the file. Once objects are transferred to S3, they can be managed as native S3 objects, and bucket policies such as versioning, lifecycle management, and cross-region replication apply directly to objects stored in your bucket.
Customers use the file interface to migrate file data into S3 for use by object-based workloads, as a cost-effective storage target for traditional backup applications, and as a tier in the cloud for on-premises file storage.
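To illustrate the point above that bucket features apply directly to file-gateway objects, here is a hedged sketch adding a lifecycle rule that transitions them to Glacier; the bucket name and prefix are placeholders.

```python
# Sketch: files written through the file gateway are ordinary S3 objects,
# so a normal lifecycle rule can archive them after 90 days.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-file-gateway-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-files",
            "Filter": {"Prefix": "backups/"},  # placeholder prefix
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }]
    },
)
```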
The volume gateway presents your applications with disk volumes using the iSCSI block protocol. Data written to these volumes can be asynchronously backed up as point-in-time snapshots of your volumes, and stored in the cloud as Amazon EBS snapshots. You can set the schedule for when snapshots occur or create them via the AWS Management Console or service API. Snapshots are incremental backups that capture only changed blocks. All snapshot storage is also compressed to minimize your storage charges.
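A minimal sketch of triggering an ad hoc snapshot through the service API mentioned above; the volume ARN is a placeholder.

```python
# Sketch: request a gateway volume snapshot on demand (scheduled snapshots
# are configured separately). The resulting snapshot is an EBS snapshot.
import boto3

sgw = boto3.client("storagegateway", region_name="us-east-1")
sgw.create_snapshot(
    VolumeARN=(
        "arn:aws:storagegateway:us-east-1:123456789012:"
        "gateway/sgw-12345678/volume/vol-12345678"  # placeholder
    ),
    SnapshotDescription="pre-maintenance checkpoint",
)
```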
When connecting with the block interface, you can run the gateway in two modes: cached and stored.
In cached mode, you store your primary data in Amazon S3 and retain your frequently accessed data locally. With this mode, you can achieve substantial cost savings on primary storage, minimizing the need to scale your storage on-premises, while retaining low-latency access to your frequently accessed data. You can configure up to 32 volumes of up to 32 TB each, for a total of 1 PB of storage per gateway.
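A sketch of carving out a cached volume via the API; the gateway ARN, target name, and network interface are placeholders, and exact parameters may vary by SDK version.

```python
# Sketch: create a 1 TiB cached volume. Primary data lives in S3;
# hot data is served from the gateway's local cache.
import boto3

sgw = boto3.client("storagegateway", region_name="us-east-1")
sgw.create_cached_iscsi_volume(
    GatewayARN="arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-12345678",
    VolumeSizeInBytes=1024 ** 4,     # 1 TiB
    TargetName="app-data",           # becomes part of the iSCSI target IQN
    NetworkInterfaceId="10.0.0.25",  # gateway VM's local IP (placeholder)
    ClientToken="demo-volume-1",     # idempotency token
)
```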
In stored mode, you store your entire data set locally, while performing asynchronous backups of this data in Amazon S3. This mode provides durable and inexpensive offsite backups that you can recover locally or from Amazon EC2.
Example applications such as databases or computational workloads
… where low latency is critical and the working set of data is large, ill-defined, or constantly changing.
On a stored volume gateway you can configure up to 32 volumes
… up to 16 TB each
… for a total of 512 TB per gateway
The tape gateway presents the Storage Gateway to your existing backup application as an industry-standard iSCSI-based virtual tape library (VTL), consisting of a virtual media changer and virtual tape drives. You can continue to use your existing backup applications and workflows while writing to a nearly limitless collection of virtual tapes. Each virtual tape is stored in Amazon S3. When you no longer require immediate or frequent access to data contained on a virtual tape, you can have your backup application archive it from the virtual tape library into Amazon Glacier, further reducing storage costs.
Storage Gateway is currently compatible with most leading backup applications. The VTL interface eliminates large upfront tape automation capital expenses, multi-year maintenance contract commitments and ongoing media costs. You pay only for the capacity you use and scale as your needs grow. The need to transport storage media to offsite facilities and handle tape media manually goes away, and your archives benefit from the design and durability of the AWS cloud platform.
In your VTL you can configure up to 1,500 tapes
… up to 2.5 TB each (LTO-6 size)
… for a total of 1 PB per VTL (a tape-provisioning sketch follows)
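A sketch of provisioning virtual tapes via the API; the gateway ARN and barcode prefix are placeholders.

```python
# Sketch: add ten virtual tapes to the VTL. Tape size is in bytes;
# 2500 GiB approximates the 2.5 TB LTO-6 size noted above.
import boto3

sgw = boto3.client("storagegateway", region_name="us-east-1")
sgw.create_tapes(
    GatewayARN="arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-12345678",
    TapeSizeInBytes=2500 * 1024 ** 3,
    ClientToken="tape-batch-1",   # idempotency token
    NumTapesToCreate=10,
    TapeBarcodePrefix="DEMO",     # placeholder (1-4 uppercase letters)
)
```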
We see 3 broad categories of hybrid storage where SGW helps customers.
Let’s look in a little more detail at each of these.
(This is our service tenets said in a customer-facing way – our value prop)
Capabilities:
Standard storage protocols integrate with on-premises applications
Transparent local caching for low-latency access to frequently used data
Asynchronous upload to AWS for durable storage of changed data
Efficient data transfer with local buffering and bandwidth management
Direct storage in AWS storage services
Resilient stateless on-premises gateway
Integrated with AWS management and security services
The original Snowball had 50 TB of capacity; AWS Snowball Edge, like the original Snowball, is a petabyte-scale data transfer solution, but transports more data, up to 100 TB, and retains the same embedded cryptography and security as the original Snowball. However, Snowball Edge hosts a file server and an S3-compatible endpoint that allow you to use the NFS protocol, the S3 SDK, or the S3 CLI to transfer data directly to the device without specialized client software. Multiple units may be clustered together, forming a temporary data collection storage tier in your datacenter so you can work as data is generated without managing copies. As storage needs scale up and down, devices can easily be added to or removed from the local cluster and returned to AWS.
What is AWS Import/Export Snowball?
Snowball is a new AWS Import/Export offering that provides a petabyte-scale data transfer service using Amazon-provided storage devices for transport. Previously, customers purchased their own portable storage devices and used those devices to ship their data. With the launch of Snowball, customers are now able to use highly secure, rugged, Amazon-owned network-attached storage (NAS) devices, called Snowballs, to ship their data. Once a device is received and set up, customers can copy up to 50 TB of data from their on-premises file systems to the Snowball via the Snowball client software over a 10 Gbps network interface. Before transfer to the Snowball, all data is encrypted by the client with 256-bit GCM encryption. When customers finish transferring data to the device, they simply ship it back to an AWS facility, where the data is ingested at high speed into Amazon S3.
The Snowball service is driven entirely by the AWS console, like our other services. In the console, a customer accesses the Snowball service under the AWS Import/Export Snowball link. Once there, the customer simply creates a data transfer job, specifying the S3 bucket(s) to use, the KMS encryption keys, and the location the device should be shipped to. Once the device is received, the customer connects the Snowball to power and the network, providing an IP address either manually or via DHCP. From there, data is copied to the Snowball via the client software, a command-line tool loaded on a host in the environment which encrypts all data before it is transferred to the Snowball. Once the data transfer is complete, the customer simply powers down the device, and the return shipping information updates automatically on the E Ink display. Once the device is returned to Amazon, we complete the data transfer from the Snowball to the specified S3 buckets. Throughout this entire process the customer is notified at each step through the console, Amazon SNS, and/or text message.
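The console flow above can also be driven programmatically; a hedged sketch of creating an import job (the bucket ARN, address ID, KMS key, and IAM role are placeholders created beforehand, e.g., the address via create_address).

```python
# Sketch: create a Snowball import job targeting one S3 bucket.
import boto3

snowball = boto3.client("snowball", region_name="us-east-1")
job = snowball.create_job(
    JobType="IMPORT",
    Resources={"S3Resources": [{"BucketArn": "arn:aws:s3:::my-import-bucket"}]},
    AddressId="ADID00000000-0000-0000-0000-000000000000",        # placeholder
    KmsKeyARN="arn:aws:kms:us-east-1:123456789012:key/EXAMPLE",  # placeholder
    RoleARN="arn:aws:iam::123456789012:role/snowball-import",    # placeholder
    ShippingOption="SECOND_DAY",
    SnowballCapacityPreference="T50",
)
print(job["JobId"])
```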
AWS Snowmobile is a secure, exabyte-scale data transfer service used to transfer large amounts of data into and out of AWS. Each Snowmobile can transfer up to 100 PB. When you order a Snowmobile, it comes to your site and AWS personnel connect a removable, high-speed network switch from the Snowmobile to your local network, which makes the Snowmobile appear as a network-attached data store. Once it is connected, secure, high-speed data transfer begins. After your data is transferred to the Snowmobile, it is driven back to AWS, where the data is loaded into the AWS service you select, including S3, Glacier, Redshift, and others.
The team will consider and assess global requests as well
Snapshot benefit: gives a POSIX store multi-AZ redundancy; fail over to an alternate AZ during an AZ-local event; store only the actual blocks used rather than the full allocation during extended offline periods.
Metadata benefit: indexed metadata for rapid searching; more advanced selection than going direct to S3; faster response times for advanced queries compared to S3 list operations.
Caching benefit: reduced latency to objects for applications; reduced cost for subsequent GETs (reads are local).
Edge caching benefit: reduced latency to objects for customers; significant fault tolerance in object availability; serves RTMP streams without a running server (significant EC2 cost savings).
It is all about choice. Pick the technology that delivers the right performance at the right price. AWS allows you to consume one or multiple services as needed and to pay only for the capacity you use.