The world is creating more data in more ways than ever before. The average internet user in 2017 generates 1.5GB of data per day, with the rate doubling every 18 months. A single autonomous vehicle can generate 4TB per day. Each smart manufacturing plant generates 1PB per day. Storing, managing, and analyzing this data requires integrated database and analytic services that provide reliability and security at scale. AWS offers a range of managed data services that let customers focus on making data useful, including Amazon Aurora, RDS, DynamoDB, Redshift, Spectrum, ElastiCache, Kinesis, EMR, Elasticsearch Service, and Glue. In this session, we discuss these services, share our vision for innovation, and show how our customers use these services today. Learn More: https://aws.amazon.com/government-education/
2. AWS Data Services to Accelerate Your Move to the Cloud
Databases to Elevate your Apps
Relational: RDS Open Source, RDS Commercial, Aurora
Non-Relational & In-Memory: DynamoDB & DAX, ElastiCache
Analytics to Engage your Data
Categories: Inline, Data Warehousing, Reporting, Data Lake
Services: EMR, Amazon Redshift, Redshift Spectrum, Athena, Elasticsearch Service, Amazon QuickSight, Glue
Amazon AI to Drive the Future
Lex, Polly, Rekognition, Machine Learning, Deep Learning, MXNet
Migration for DB Freedom
Database Migration, Schema Conversion
5. Amazon RDS: Cheaper, Easier, and Better
Multi-engine support: Open Source, Commercial, Amazon Aurora
Automated provisioning, patching, scaling, backup/restore, failover
Use with General Purpose SSD or Provisioned IOPS SSD storage
High availability with RDS Multi-AZ
6. High Availability Multi-AZ Deployments
Enterprise-grade fault-tolerant solution for production databases
Automatic failover
Synchronous replication
Inexpensive & enabled with one click
7. Amazon Aurora: Speed and Availability of Commercial Databases, with Cost-Effectiveness of Open Source
Up To 5x Performance Of High-end MySQL
Highly Available and Durable
MySQL Compatible*
1/10th The Cost Of Commercial Grade Databases
Fastest Growing AWS Service, Ever
*PostgreSQL compatibility in Open Preview
8. Aurora, Faster Because it is Built for AWS
[Diagram: MySQL with Replica. A primary instance in DC 1 and a replica instance in DC 2, each with storage and a storage mirror.]
[Diagram: Amazon Aurora. A primary instance and replica instances across AZ 1, AZ 2, and AZ 3, with async, 4/6 quorum, distributed writes, and Amazon S3.]
Type of write: binlog, data, double-write, log, FRM files
MySQL I/O profile for a 30 min. Sysbench run
780K transactions
7,388K I/Os per million txns (excludes mirroring, standby)
Average 7.4 I/Os per transaction
Aurora I/O profile for a 30 min. Sysbench run
27,378K transactions (35X more)
0.95 I/Os per transaction (7.7X less)
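The "4/6 quorum" label on this slide can be illustrated with a small sketch. It assumes six storage copies and treats a write as durable once at least four of them acknowledge it. The `StorageNode` class and `quorum_write` function are hypothetical names for illustration only, and the loop is sequential for simplicity, whereas the slide describes the real writes as asynchronous and distributed.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """One copy of the data; healthy=False simulates an unreachable node."""
    name: str
    healthy: bool = True
    log: list = field(default_factory=list)

    def append(self, record: str) -> bool:
        # An unreachable node never acknowledges the write.
        if not self.healthy:
            return False
        self.log.append(record)
        return True


def quorum_write(nodes, record, write_quorum=4):
    """Send the record to every node and report success if enough nodes acknowledged."""
    acks = sum(1 for node in nodes if node.append(record))
    return acks >= write_quorum


# Six copies, two of them unreachable: four acks still satisfy the 4/6 quorum.
nodes = [StorageNode(f"node-{i}", healthy=(i >= 2)) for i in range(6)]
print(quorum_write(nodes, "txn-1"))  # True
```

With a third node marked unhealthy, only three acknowledgements arrive and the same call returns `False`.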
9. DynamoDB: Non-Relational
Managed NoSQL Database Service
Schemaless data model
Consistent low latency performance
Predictable provisioned throughput
Seamless scalability with no storage limits
High durability & availability (replication across 3 facilities)
Easy administration – we scale for you!
Low cost
[Diagram: App, DAX, DynamoDB]
DynamoDB Accelerator (DAX) offers caching without coding for sub-millisecond read latency and up to 10x throughput
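To make "predictable provisioned throughput" concrete, here is a rough sizing sketch. It relies on the capacity-unit rules documented for DynamoDB provisioned mode, which are not stated on the slide: one read unit covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one write unit covers one write per second of an item up to 1 KB. The function names are hypothetical, and treating 1 KB as 1024 bytes is an assumption.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # assumed: one read unit covers up to 4 KB
WRITE_UNIT_BYTES = 1 * 1024  # assumed: one write unit covers up to 1 KB


def read_capacity_units(item_bytes, reads_per_second, strongly_consistent=True):
    """Estimate read units: round item size up to 4 KB steps, halve if eventually consistent."""
    units_per_read = math.ceil(item_bytes / READ_UNIT_BYTES)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = total / 2
    return math.ceil(total)


def write_capacity_units(item_bytes, writes_per_second):
    """Estimate write units: round item size up to 1 KB steps."""
    return math.ceil(item_bytes / WRITE_UNIT_BYTES) * writes_per_second


print(read_capacity_units(6000, 100))                             # 200
print(read_capacity_units(6000, 100, strongly_consistent=False))  # 100
print(write_capacity_units(1500, 50))                             # 100
```

The rounding step is the part that usually surprises people: a 6,000-byte item costs two read units per read, not 1.46.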
10. DynamoDB at Amtrak
Built and deployed an operational database and data mart for near-real-time reporting of sales data
Developed and released the solution in six months
Used cloud native technologies: DynamoDB, Kinesis, Lambda, and S3
Benefits
Improved accuracy and single source of truth for sales data
Allows decommissioning of four legacy systems
Low maintenance and operational costs. No servers to manage.
11. Amazon ElastiCache Provides Sub-millisecond Caching and In-Memory Data
Make Almost Any Database Faster and Less Expensive
In-Memory Cache
Memcached and Redis
Fully managed
High Speed In-Memory Data Store
Persistent high availability
Clusters up to 3.5TB
Average read and write time of under 500µs (0.5ms)
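One common way an in-memory cache makes a database "faster and less expensive" is the cache-aside pattern, sketched below. This is an illustration, not ElastiCache's API: a plain Python dict stands in for Redis or Memcached, and `CacheAside`, `load_from_db`, and `lookup_fare` are hypothetical names.

```python
import time


class CacheAside:
    """Check the cache first; on a miss, load from the database and remember the result."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[1] > now:
            return entry[0]                  # cache hit: no database call
        value = self._load(key)              # miss or expired: go to the database
        self._entries[key] = (value, now + self._ttl)
        return value


db_calls = []


def lookup_fare(route):
    # Stand-in for a slow database query; records each call it receives.
    db_calls.append(route)
    return len(route) * 10


fares = CacheAside(lookup_fare, ttl_seconds=30)
fares.get("A-B")
fares.get("A-B")
print(db_calls)  # ['A-B']
```

The second `get` is served from memory, so the database sees one query instead of two. The TTL bounds how stale a cached value can become.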
13. Amazon EMR: the Hadoop and Spark Ecosystem, Without the Chaos
Design Patterns
Amazon S3 as HDFS
Core Nodes and Task Nodes
Elastic Clusters
Transient + Always On Clusters
Leverage the Hadoop ecosystem
Use Cases
Recommendation Engines
Personalization Engines
Semi-structured/unstructured data
Combine disparate data sets
Next generation ETL
Sentiment analysis
Batch analytics
Taming Big Data in the Cloud
Hadoop, Spark, Presto, Hive and more
Easy to use, fully managed
Launch a cluster in minutes
Baked in security features
Pay by the hour and save with Spot
14. Amazon Elasticsearch Service
Log Analytics & Operational Monitoring
Monitor the performance of your apps, web servers, and infrastructure
Easy to use, yet powerful data visualization tools to detect issues in near real-time
Ability to dig into your logs in an intuitive, fine-grained way
Kibana provides fast, easy visualization
Search
Application or website provides search capabilities over diverse documents
Tasked with making this knowledge base searchable and accessible
Key search features including text matching, faceting, filtering, fuzzy search, auto complete, and highlighting
Query API to support application search
15. Amazon Redshift: Cloud Data Warehousing
Leader Node
Simple SQL endpoint
Stores metadata
Optimizes query plan
Coordinates query execution
Compute Nodes
Local columnar storage
Parallel/distributed execution of all queries, loads, backups, restores, resizes
Up to 2 petabytes of managed data
Automated ingestion from S3, Kinesis, EMR and DynamoDB
[Diagram: a leader node in front of the compute nodes, with S3, EMR, DynamoDB, and EC2 as data sources]
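The leader/compute split on this slide follows a scatter-gather shape, which can be sketched in a few lines. This is an illustration under simplifying assumptions and not Redshift's implementation: threads stand in for compute nodes, lists of dicts stand in for local columnar storage, and `compute_node_sum` and `leader_sum` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_sum(rows, column):
    """Work done on one compute node: aggregate only its local slice."""
    return sum(row[column] for row in rows)


def leader_sum(slices, column):
    """Leader node: fan the query out to every slice, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(lambda rows: compute_node_sum(rows, column), slices))
    return sum(partials)


# Distribute 100 rows round-robin across four simulated compute nodes.
sales = [{"amount": n} for n in range(1, 101)]
slices = [sales[i::4] for i in range(4)]
print(leader_sum(slices, "amount"))  # 5050
```

Each node touches only a quarter of the rows, and the leader only adds four numbers, which is why adding nodes shortens queries of this kind.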
16. Large Data Lakes: PB and EB
Run SQL queries directly against data in S3
Fast @ exabyte scale
Elastic & highly available
On-demand, pay-per-query
High concurrency: Multiple clusters access same data
No ETL: Query data in-place using open file formats
Full SQL support
Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes
Amazon Athena
Serverless interactive query service
Query an Exabyte of data in under 3 minutes
17. AWS Glue for Automated, Serverless ETL
Data Catalog
Hive metastore compatible metadata repository of data sources
Crawls data source to infer table, data type, partition format
Job Execution
Runs jobs in Spark containers – automatic scaling based on SLA
Glue is serverless – only pay for the resources you consume
Job Authoring
Generates Python code to move data from source to destination
Edit with your favorite IDE; share code snippets using Git
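The crawler idea on this slide, inferring a table's column types from the data itself, can be sketched with the standard library. This is not Glue's algorithm or API. It is a simplified stand-in that reads a small CSV sample and picks the narrowest of three assumed type names; `infer_type`, `infer_schema`, and the sample columns are hypothetical.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of int, double, string that fits every value."""
    for type_name, caster in (("int", int), ("double", float)):
        try:
            for value in values:
                caster(value)
            return type_name
        except (ValueError, TypeError):
            continue
    return "string"


def infer_schema(csv_text):
    """Read a CSV sample with a header row and infer one type per column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = rows[0].keys() if rows else []
    return {name: infer_type([row[name] for row in rows]) for name in columns}


sample = "order_id,amount,region\n1,19.50,east\n2,7,west\n"
print(infer_schema(sample))
# {'order_id': 'int', 'amount': 'double', 'region': 'string'}
```

A real crawler also has to handle partitions, nested formats, and sampling, which this sketch leaves out.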
18. Amazon QuickSight: Fast Business Analytics
Data from Many Sources
AWS Managed Databases
Amazon S3
Databases on Amazon EC2
On-premises databases
Excel and CSV Files
Salesforce and other SaaS
Mobile and Web Access
iPhone, Android and Tablet
Most popular web browsers
Powered by SPICE
Super-fast, Parallel, In-memory Calculation Engine
Run fast interactive queries on large datasets
Low monthly cost per user
19. Unshackle From Hostile Database Vendors
Old-World Vendors and Old-World Policies…
You’ve Got Mail! AUDIT
Very Expensive
Proprietary
Lock-In
Punitive Licensing
20. Freedom Begins with Choice; Migrating Data and Schema
AWS Schema Conversion Tool
Automatically convert & move tables, views, stored procedures, metadata
Highlights and recommends custom actions as needed
AWS Database Migration Service
Start a migration in literally a few minutes
Keep apps running during the migration
Replicate from, within, or to Amazon EC2 or managed database services or on-premises
AWS Workload Qualification Framework
Assess workloads by complexity, technology, effort, and other factors
Recommends strategy and plans for migration
21. Heterogeneous Migration
Oracle private DC to RDS PostgreSQL migration
Used the AWS Schema Conversion Tool (AWS SCT) to convert their database schema
Used ongoing Change Data Capture (CDC) replication to keep databases in sync until they reached the cutover window
Benefits
Improved reliability of the cloud environment
Savings on Oracle licensing costs
SCT Assessment Report showed the scope of the migration
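The CDC step in this migration can be pictured as replaying an ordered stream of change events against the target until it matches the source. The sketch below is a minimal illustration, not how AWS DMS is implemented: the event layout (`seq`, `op`, `key`, `row`) and the function names are assumptions, and a dict stands in for the target table.

```python
def apply_change(target, change):
    """Apply one captured change event to the target table (a dict keyed by primary key)."""
    op, key = change["op"], change["key"]
    if op in ("insert", "update"):
        target[key] = change["row"]
    elif op == "delete":
        target.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def replicate(target, changes, last_applied=0):
    """Apply changes in sequence order, skipping any already applied; return the new position."""
    for change in sorted(changes, key=lambda c: c["seq"]):
        if change["seq"] <= last_applied:
            continue
        apply_change(target, change)
        last_applied = change["seq"]
    return last_applied


# Target after the initial full load, then changes captured while apps kept running.
target = {1: {"name": "alpha"}, 2: {"name": "beta"}}
captured = [
    {"seq": 2, "op": "delete", "key": 2},
    {"seq": 1, "op": "update", "key": 1, "row": {"name": "alpha-v2"}},
    {"seq": 3, "op": "insert", "key": 3, "row": {"name": "gamma"}},
]
position = replicate(target, captured)
print(position, target)
# 3 {1: {'name': 'alpha-v2'}, 3: {'name': 'gamma'}}
```

Tracking the last applied sequence number makes the replay safe to repeat, which matters when replication runs continuously up to the cutover window.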
24. FINRA: Data Sharing Pre-Cloud
Built a data hub to deal with the growing problem of point-to-point dependencies between databases in the data center.
[Diagrams: two views of the FINRA data center showing App 1 DB through App N DB, one of them with a HUB DB]
31. FINRA: Analytics Impacts
• Removed obstacles
“Before data analysis of this magnitude required intervention from technology.”
“We are now able to see underlying data and visual representation of summaries together with outliers and anomalies. This reduces our time to market on examinations.”
“We moved away from requesting raw reports to requesting dashboards that provide meaningful information and tell a story…”
• Lowered the cost of curiosity
“Analysts are able to quickly obtain a full picture of what happens to an order over time, helping to inform decision making as to whether a rule violation has occurred.”
“[W]ith a click we can now compare firms of our choice or defined peer groups. This helps us by reducing a lot of noise…”
“Using machine learning algorithms validates our assumptions and makes us data driven”
• Optimize batch and interactive workloads without compromise
• Greater innovation and more engaged staff