Raju Gulabani, vice president of Database Services at AWS, discusses the evolution of database services on AWS and the new database services and features we launched this year, and shares our vision for continued innovation in this space. We are witnessing unprecedented growth in the amount of data collected, in many different shapes and forms. Storage, management, and analysis of this data require database services that scale and perform in ways not possible before. AWS offers a collection of database and data services, such as Amazon Aurora, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon ElastiCache, Amazon Kinesis, and Amazon EMR, to process, store, manage, and analyze data. In this session, we provide an overview of AWS database services and discuss how our customers are using these services today.
2. What to Expect from the Session
• Learn our strategy and get an overview of our key services
• Get a sense of our scale and key customers per service
• Understand when to use which services for your apps
3. Strategy
• Start from the customer and work backwards
• Offer managed services
• Leverage the cloud architecture
• Support migration of apps and data from/to on-premises
• Multiple services, each optimized for a different use case
4. Comprehensive Product Portfolio
• Relational Databases (traditional apps): Amazon RDS, Amazon Aurora, AWS Database Migration Service
• NoSQL & In-Memory: Amazon DynamoDB, Amazon ElastiCache
• Big Data & Analytics: Amazon Redshift, Amazon EMR, AWS Data Pipeline, Amazon Athena, Amazon QuickSight, Amazon Elasticsearch Service, Amazon ML
5. Database Services Usage
• Amazon Aurora is the fastest growing service in AWS history
• More than 14,000 databases have been migrated using AWS
Database Migration Service
• DynamoDB served over 56 billion extra requests worldwide on
Prime Day compared to the same day the previous week.
8. Amazon RDS
• Multi-engine support: Aurora, MySQL, MariaDB, PostgreSQL, Oracle, SQL Server
• Automated provisioning, patching, scaling, backup/restore, failover
• Use with GP2 or Provisioned IOPS storage
• High availability with RDS Multi-AZ
– 99.95% SLA for Multi-AZ deployments
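The Multi-AZ deployment above can be requested at provisioning time. Below is a minimal sketch of the parameters you would hand to boto3's `rds.create_db_instance` call; the identifier, instance class, and credentials are placeholders, not values from the talk, and the live API call itself is left as a comment.

```python
# Sketch: kwargs for provisioning a Multi-AZ MySQL instance on Amazon RDS.
# Names like "my-app-db" and the instance class are illustrative placeholders.

def multi_az_mysql_params(instance_id, master_user, master_password):
    """Build create_db_instance kwargs for a Multi-AZ MySQL deployment."""
    return {
        "DBInstanceIdentifier": instance_id,
        "Engine": "mysql",                 # one of the six supported engines
        "DBInstanceClass": "db.r3.large",  # example instance class
        "AllocatedStorage": 100,           # GiB of storage
        "StorageType": "gp2",              # or "io1" for Provisioned IOPS
        "MultiAZ": True,                   # synchronous standby in a second AZ
        "MasterUsername": master_user,
        "MasterUserPassword": master_password,
    }

params = multi_az_mysql_params("my-app-db", "admin", "change-me")
# With credentials configured you would then run:
#   boto3.client("rds").create_db_instance(**params)
print(params["MultiAZ"], params["StorageType"])
```

Setting `MultiAZ=True` is the "one click" the deck refers to: RDS keeps a synchronously replicated standby in a second Availability Zone and fails over automatically.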
9. Key Insight: Relational Databases are Complex
• Our experience running Amazon.com taught us that
relational databases can be a pain to manage and
operate with high availability
• Poorly-managed relational databases are a leading
cause of lost sleep and downtime in the IT world!
10. We Made Things Cheaper, Easier, and Better
• Lower TCO because we manage the muck
• Get more leverage from your teams
• Focus on the things that differentiate you
• Built-in high availability and cross-region replication across multiple data centers
• Available on all engines, including base/standard editions, not just enterprise editions
• Now even a small startup can leverage multiple data centers to design highly available apps with over 99.95% availability
11. High Availability Multi-AZ Deployments
• Enterprise-grade fault tolerance solution for production databases
• Automatic failover
• Synchronous replication
• Inexpensive and enabled with one click
13. Airbnb – Amazon RDS for MySQL
• Airbnb moved its main MySQL database to Amazon RDS with only 15 minutes of downtime
• RDS simplifies many of the time-consuming administrative tasks associated with databases, so engineers can spend more time on features
• Uses asynchronous master-slave replication, launched via the RDS console or an API call, to improve website performance
• Leverages multi-Availability Zone (Multi-AZ) deployments for high availability
15. Key Questions We Asked
• What if we started from a clean sheet of paper, with the only constraint being that the
database was a relational database?
• Could we offer much better performance by leveraging the massive scale of our
cloud?
• Could we give you a database with designed durability indistinguishable from 100%
and availability of 99.99%?
• …And could we be better and cheaper than the 30-year old commercial databases in
use today?
16. Yes, We Can. Answer = Amazon Aurora
• A new relational database engine, built from the ground
up to leverage AWS
• For all new apps that require SQL, we recommend Amazon Aurora
• Commercial-grade performance and availability at open
source prices
• Retains compatibility with MySQL 5.6
17. Amazon RDS for Aurora
• MySQL compatible with up to 5x better performance on the
same hardware: 100,000 writes/sec & 500,000 reads/sec
• Scalable with up to 64 TB in single database, up to 15 read
replicas
• Highly available, durable, and fault-tolerant custom SSD
storage layer: 6-way replicated across 3 Availability Zones
• Transparent encryption for data at rest using AWS KMS
• Stored procedures in Amazon Aurora can invoke AWS
Lambda functions
19. Use case: Near real-time analytics and reporting
[Diagram: a master and multiple Aurora read replicas share one distributed storage volume; read traffic enters through a single reader endpoint]
A customer in the travel industry migrated to Aurora for
their core reporting application accessed by ~1,000
internal users.
Replicas can be created, deleted and scaled within
minutes based on load.
Read-only queries are load balanced across replica
fleet through a DNS endpoint – no application
configuration needed when replicas are added or
removed.
Low replication lag allows mining for fresh data with
no delays, immediately after the data is loaded.
Significant performance gains for core analytics
queries - some of the queries executing in 1/100th
the original time.
► Up to 15 promotable read replicas
► Low replica lag – typically < 10ms
► Reader end-point with load balancing
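The reader endpoint described above means the application only has to decide *which* endpoint a statement goes to, not which replica. A minimal routing sketch under that assumption, with purely illustrative endpoint hostnames:

```python
# Sketch: route read-only statements to the load-balanced Aurora reader
# endpoint and everything else to the cluster (writer) endpoint.
# Hostnames below are placeholders, not real DNS names.

WRITER = "mycluster.cluster-abc123.us-east-1.rds.amazonaws.com"
READER = "mycluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

def endpoint_for(sql: str) -> str:
    """Pick an endpoint by statement type: SELECTs go to the replica fleet."""
    first_word = sql.lstrip().split(None, 1)[0].upper()
    return READER if first_word == "SELECT" else WRITER

print(endpoint_for("SELECT * FROM bookings"))   # reader endpoint
print(endpoint_for("INSERT INTO bookings ..."))  # writer endpoint
```

Because the reader endpoint is a DNS name that Aurora keeps pointed at the healthy replica fleet, adding or removing replicas requires no change to this routing logic, which is the "no application configuration needed" point on the slide.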
20. Amazon Aurora is now PostgreSQL-compatible
• PostgreSQL 9.6 compatibility with support for PostGIS
• All the features you expect from Amazon Aurora
including 15 read replicas with <10ms lag, shared
storage, failover without data loss, 6-way replication
across 3 Availability Zones, encryption with AWS KMS
• Available now in preview
21. Performance Insights for Amazon RDS
• Easy: simplify monitoring from the AWS Management Console
• Powerful: the database load view identifies database bottlenecks and their source (top SQL, max CPU)
• Adjustable time frame: hour, day, week, and longer
22. AWS Database Migration Service
• Fully managed service for migration from on-premises to
the AWS Cloud with minimal downtime
• Migrates data to and from all widely used commercial and
open source DBs
• Schema Conversion Tool that converts source DB schemas,
stored procedures and application code to a different target
format
• Supports homogeneous and heterogeneous data replication
• A terabyte-sized DB can be migrated for as little as $3
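When creating a DMS replication task, you describe what to migrate with a table-mapping document (the `TableMappings` parameter). A minimal sketch of building such a document; the schema name is a placeholder:

```python
import json

# Sketch: a DMS table-mapping document with one selection rule that
# replicates every table in a given schema. "sales" is illustrative.

def include_schema(schema_name: str) -> str:
    """Build a DMS table-mapping JSON that includes all tables in a schema."""
    mapping = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-schema",
                "object-locator": {"schema-name": schema_name, "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return json.dumps(mapping)

table_mappings = include_schema("sales")
print(table_mappings)
```

The same document is used for both homogeneous and heterogeneous tasks; for the latter, the Schema Conversion Tool handles the schema and code conversion while DMS moves the data.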
23. Database Conversion Capabilities in SCT
Source Database → Target Database
Microsoft SQL Server → Amazon Aurora, MySQL, PostgreSQL
MySQL → PostgreSQL
Oracle → Amazon Aurora, MySQL, PostgreSQL
Oracle Data Warehouse → Amazon Redshift
PostgreSQL → Amazon Aurora, MySQL
Teradata, Netezza, Greenplum → Amazon Redshift
25. Heterogeneous Migration
• Oracle private DC to RDS PostgreSQL migration
• Used the AWS Schema Conversion Tool to convert their
database schema
• Used on-going replication (CDC) to keep databases in sync
until they reached the cutover window
• Benefits:
– Improved reliability of the cloud environment
– Savings on Oracle licensing costs
– SCT Assessment Report let them understand the scope of the migration
29. Key Questions We Asked
• Aurora was designed with a single constraint
• SQL compatibility and relational database semantics
• What if we said no to this constraint?
• No to SQL = NoSQL
• Could we eliminate the things we didn’t like about
relational databases?
30. Yes, We Can. Answer = Amazon DynamoDB
• Database that can scale beyond a single box without any changes to your app
• You can start small but know that there is no limit to how successful your app can be
• If your app is running fast today with 10 users, it will always run fast, even when you have 1M,
10M or 100M users using your app
• No need to spend time tuning queries and diagnosing why your app is running slow
• Deliver availability and durability indistinguishable from 100%.
• 99.99% availability and 60-second failover are not good enough
• You don’t have to manage anything. You don’t even need to know what a database instance is
• No schema. All you need to tell us is the number of reads/sec and writes/sec you want to execute.
We do the rest
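The "reads/sec and writes/sec" knobs map to DynamoDB's published provisioned-capacity arithmetic: one read capacity unit covers a strongly consistent read of up to 4 KB per second (eventually consistent reads cost half), and one write capacity unit covers a write of up to 1 KB per second. A worked sketch:

```python
import math

# Sketch of DynamoDB provisioned-capacity math.
# RCU: 1 strongly consistent read/sec of up to 4 KB (eventual = half).
# WCU: 1 write/sec of up to 1 KB.

def read_capacity_units(reads_per_sec, item_kb, strongly_consistent=True):
    units_per_read = math.ceil(item_kb / 4)
    total = reads_per_sec * units_per_read
    return total if strongly_consistent else math.ceil(total / 2)

def write_capacity_units(writes_per_sec, item_kb):
    return writes_per_sec * math.ceil(item_kb / 1)

# 500 strongly consistent reads/sec of 6 KB items -> 2 units each
print(read_capacity_units(500, 6))   # 1000
# 200 writes/sec of 3 KB items -> 3 units each
print(write_capacity_units(200, 3))  # 600
```

This is the whole capacity-planning exercise from the application's point of view: pick two numbers, and DynamoDB does the rest.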
32. Lyft Easily Scales Up its Ride Location Tracking System using DynamoDB
"It was so simple to scale out. We had two knobs. One was for reads and one was for writes."
– Chris Lambert, CTO, Lyft
• Lyft serves up to 8x more rides during peak times
• The GPS location for all rides was tracked in the ride location tracking system
• In June 2014, Lyft deployed DynamoDB in production
• Lyft has since moved many of its other data stores over to DynamoDB as well
36. RDS and ElastiCache are Behind Grab's Taxi-Booking App
"The latency of a cab call must be low, and remain low even in times of peak traffic of hundreds of thousands of cab requests per minute. We use ElastiCache for Redis in front of RDS MySQL to keep our systems' real-time performance at any scale."
– Ryan Ooi, Sr. DevOps Engineer, Grab
• Grab is a popular taxi-hailing app in Southeast Asia
• Average response time of the API layer is <40 ms, mandating an in-memory layer to achieve such performance
• Grab's small DevOps team had tried running Redis on EC2 before, but that was too much work; using both RDS and ElastiCache in Multi-AZ allowed them to outsource all the management to AWS
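"Redis in front of RDS MySQL" is the classic cache-aside pattern: check the in-memory store first, fall back to the database on a miss, and populate the cache with a TTL. The sketch below uses a plain dict in place of a real Redis client (which would expose equivalent get/set calls); the key names and TTL are illustrative.

```python
import time

# Sketch of cache-aside: an in-memory store (ElastiCache for Redis in
# Grab's setup) absorbs reads before they reach the relational database.
# A dict stands in for Redis here so the example is self-contained.

class CacheAside:
    def __init__(self, loader, ttl_seconds=30):
        self._loader = loader          # fallback: query the database
        self._ttl = ttl_seconds
        self._cache = {}               # {key: (expires_at, value)}

    def get(self, key):
        entry = self._cache.get(key)
        if entry and entry[0] > time.time():
            return entry[1]            # cache hit
        value = self._loader(key)      # cache miss: go to the database
        self._cache[key] = (time.time() + self._ttl, value)
        return value

db_calls = []
def query_db(key):                     # stand-in for an RDS MySQL query
    db_calls.append(key)
    return f"row-for-{key}"

cache = CacheAside(query_db)
print(cache.get("driver:42"))  # miss: hits the database
print(cache.get("driver:42"))  # hit: served from memory
print(len(db_calls))           # 1
```

At peak traffic the hit path never touches MySQL, which is how the <40 ms API latency stays flat as request volume spikes.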
38. Amazon Redshift
• Petabyte-scale, relational, MPP, data
warehousing
• Fully managed with SSD and HDD platforms
• Built-in end-to-end security, including
customer-managed keys
• $1,000/TB/year; start at $0.25/hour
39. Why we built Amazon Redshift
• Customers were generating data in the cloud but moving
it on-premises to analyze it using a data warehouse
• Customers had migrated everything to AWS except their
on-premises data warehouses.
• They wanted to shut down these data centers but could not until we offered
them a solution in the cloud
40. Key Insight: Most Data Falls on the Floor
[Chart: generated data vs. data available for analysis, 1990 to 2020; the gap widens every decade]
• 90% of the data in a company is never analyzed
• High costs and complexity of traditional DW systems make it hard to justify the capital expense
Sources: Gartner, "User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011"; IDC, "Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares"
41. Key Questions We Asked
• Could we design a system cheap and scalable enough
to let you analyze all your data?
• Could we build a service that was faster, cheaper, and
easier to use than traditional DW systems?
42. Yes, We Can. Answer = Amazon Redshift
• A massively parallel processing (MPP) system with up to 128 compute nodes to
store and process up to 2PB of compressed data
• At $1,000/TB/year, it's so cheap that you can analyze all your data
• You can provision a petabyte in under three minutes and pay for it by the hour
• 10x performance and 1/10 the price of other solutions
• Fully managed with automated provisioning, patching, securing, backup,
restore, and built-in fault tolerance
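The 128-node / 2 PB figures above are consistent with dense-storage sizing: a ds2.8xlarge node holds 16 TB of compressed data, so 128 nodes reach 2 PB. A small sizing sketch under that assumption (the sizing rule itself is illustrative, not an AWS tool):

```python
import math

# Sketch: Redshift dense-storage cluster sizing.
# A ds2.8xlarge node stores 16 TB; a single cluster tops out at 128 nodes,
# which is where the deck's "2 PB of compressed data" comes from.

NODE_TB = 16
MAX_NODES = 128

def nodes_needed(compressed_tb: float) -> int:
    n = math.ceil(compressed_tb / NODE_TB)
    if n > MAX_NODES:
        raise ValueError("exceeds a single cluster's 2 PB ceiling")
    return n

print(nodes_needed(100))    # 7 nodes for 100 TB of compressed data
print(nodes_needed(2048))   # 128 nodes = 2 PB
```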
44. NTT Docomo: Japan's Largest Mobile Provider
• 68 million customers; 10s of TB per day of data across the mobile network; 6 PB of total data (uncompressed)
• Data science for marketing operations, logistics, etc.
• Migrated from on-premises Greenplum to a 125-node DS2.8XL Redshift cluster (4,500 vCPUs, 30 TB RAM) holding 6 PB of uncompressed data
• Results: 10x faster analytic queries, 50% reduction in time to deploy new BI apps, and significantly less operational overhead
45. Amazon EMR
• Hadoop, Hive, Presto, Spark, Tez, Impala, etc.
• Release 5.2: Hadoop 2.7.3, Hive 2.1, Spark 2.0.2, Zeppelin, Presto, HBase 1.2.3 and HBase on S3, Phoenix, Tez, Flink
• New applications added within 30 days of their open source release
• Fully managed, automatically scaling clusters with support for On-Demand and Spot pricing
• Support for HDFS and S3 filesystems, enabling separated compute and storage; multiple clusters can run against the same data in S3
• HIPAA-eligible; support for end-to-end encryption, IAM/VPC, and S3 client-side encryption with customer-managed keys and AWS KMS
46. Why we built Amazon EMR
• Customers wanted to use the latest open source analytic
frameworks to analyze and transform their data
• Customers wanted to use technologies like Spark and
Presto in conjunction with AWS services like Amazon S3
and features like EC2 Spot Instances
• Customers wanted to benefit from the elasticity that
AWS offers
48. Amazon Athena
• Serverless query service for querying data in S3 with no
infrastructure to manage.
• No data loading required; query directly from Amazon S3
• Use standard ANSI SQL queries with support for joins,
JSON, and window functions.
• Support for multiple data formats, including text, CSV, TSV,
JSON, Avro, ORC, and Parquet
• Pay per query only when you’re running queries; $5/TB
scanned; if you compress your data, your queries cost less
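The pricing model above makes query cost a straight function of bytes scanned, which is why the slide highlights compression. A worked sketch of the $5/TB arithmetic (the 4:1 compression ratio is illustrative):

```python
# Sketch of Athena's pay-per-query arithmetic: $5 per TB scanned, so
# compressing or columnar-encoding data cuts cost in direct proportion
# to the bytes actually read.

PRICE_PER_TB = 5.00
BYTES_PER_TB = 1024 ** 4

def query_cost(bytes_scanned: int) -> float:
    return PRICE_PER_TB * bytes_scanned / BYTES_PER_TB

raw = 2 * BYTES_PER_TB      # a query scanning 2 TB of raw CSV
compressed = raw // 4       # same data at an assumed 4:1 compression ratio
print(f"${query_cost(raw):.2f} raw vs ${query_cost(compressed):.2f} compressed")
# $10.00 raw vs $2.50 compressed
```

Columnar formats like ORC and Parquet help twice over: they compress well and let Athena skip columns a query does not reference, shrinking bytes scanned further.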
49. Why we built Amazon Athena
• Customers wanted an easy way to run ad-hoc queries
on data in Amazon S3 with no infrastructure to manage
• Customers wanted a service that could complement their
use of Amazon Redshift and Amazon EMR
• Customers wanted to give this capability to anyone in
their company and only pay per query
52. Amazon QuickSight
Fast, easy-to-use business analytics service at 1/10th the cost of traditional BI solutions.
As a native cloud service, QuickSight combines the speed, scalability, and ease of deployment that our customers have come to depend on with the value and cost effectiveness you expect from AWS.
53. Amazon QuickSight
• Auto-Discover AWS data sources like Amazon Redshift, RDS, and S3
• Connect to third-party sources like Excel, Salesforce, and other
hosted/on-premises databases
• Super-fast performance with SPICE
• Instant visualizations with Autograph
• Securely share and collaborate on analyses, dashboards and stories
• Native iPhone experience and web based access from all other devices
• Governed datasets
• User access controls
• Active Directory Integration
54. QuickSight Providing Real-Time Insights at MLB Advanced Media
"QuickSight provides us with a real-time, 360-degree view of our business without being constrained by pre-built dashboards and metrics, expanding our use of data to make informed decisions."
– Brandon Sangiovanni, Sr. BI Development Manager
55. Amazon Elasticsearch Service
• Distributed search and analytics engine
• Managed service using Elasticsearch and Kibana
• Fully managed; zero admin
• Highly available and reliable
• Tightly integrated with other AWS services
56. Amazon Elasticsearch Service Leading Use-Cases
Log Analytics &
Operational Monitoring
• Monitor the performance of your
application, web servers, and
hardware
• Easy to use, yet powerful data
visualization tools to detect
issues in near real-time
• Ability to dig into your logs in an
intuitive, fine-grained way
• Kibana provides fast, easy
visualization
Traditional Search
• Application or website provides
search capabilities over diverse
documents
• Tasked with making this knowledge
base searchable and accessible
• Key search features including text
matching, faceting, filtering, fuzzy
search, auto complete, and
highlighting
• Query API to support application
search
58. Case Study: Adobe Developer Platform (Adobe I/O)
• Over 200,000 API calls per second at peak (destinations, response times, bandwidth)
• Log data is routed with Amazon Kinesis to Amazon Elasticsearch Service, then displayed using the service's built-in Kibana
• The Adobe team can easily see traffic patterns and error rates, quickly identifying anomalies and potential challenges
[Diagram: data sources → Amazon Kinesis Streams → Spark Streaming → Amazon Elasticsearch Service]
59. Which Service Should You Use?
Situation → Solution
Existing application → use your existing engine on RDS:
• MySQL → Amazon Aurora, RDS for MySQL
• PostgreSQL → RDS for PostgreSQL
• Oracle, SQL Server → RDS for Oracle, RDS for SQL Server
New application:
• If you can avoid relational features → DynamoDB
• If you need relational features → Amazon Aurora
Data warehouse & BI → Amazon Redshift and Amazon QuickSight
Ad hoc analysis of data in S3 → Amazon Athena and Amazon QuickSight
Spark, Hadoop, Hive, HBase → Amazon EMR
Log analytics, operational monitoring, and search → Amazon Elasticsearch Service
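The decision table above is mechanical enough to express as a lookup. A sketch with my own shorthand workload labels (the labels are not AWS terminology; the service mappings are the table's):

```python
# Sketch: the "Which Service Should You Use?" table as a lookup.
# Workload keys are illustrative shorthand, not official names.

RECOMMENDATIONS = {
    "existing-mysql": ["Amazon Aurora", "RDS for MySQL"],
    "existing-postgresql": ["RDS for PostgreSQL"],
    "existing-oracle-or-sqlserver": ["RDS for Oracle", "RDS for SQL Server"],
    "new-app-nonrelational": ["DynamoDB"],
    "new-app-relational": ["Amazon Aurora"],
    "data-warehouse-bi": ["Amazon Redshift", "Amazon QuickSight"],
    "adhoc-s3-analysis": ["Amazon Athena", "Amazon QuickSight"],
    "spark-hadoop-hive-hbase": ["Amazon EMR"],
    "log-analytics-search": ["Amazon Elasticsearch Service"],
}

def recommend(workload: str):
    """Return the service(s) the table suggests for a workload label."""
    return RECOMMENDATIONS.get(workload, ["(no match; consult the table)"])

print(recommend("new-app-relational"))  # ['Amazon Aurora']
```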