Understanding AWS Managed Database and Analytics Services | AWS Public Sector Summit 2017

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Dan Neault, AWS DB, Analytics, & AI Customer Programs
Scott Donaldson, Senior Director, FINRA
June 13, 2017
Understanding AWS Managed Databases and Analytics Services

AWS Data Services to Accelerate Your Move to the Cloud
Databases to Elevate your Apps
Relational: RDS Open Source, RDS Commercial, Aurora
Non-Relational & In-Memory: DynamoDB & DAX, ElastiCache
Analytics to Engage your Data
Inline, Data Warehousing, Reporting, Data Lake
EMR, Amazon Redshift, Redshift Spectrum, Athena, Elasticsearch Service, Amazon QuickSight, Glue
Amazon AI to Drive the Future
Lex, Polly, Rekognition, Machine Learning, Deep Learning, MXNet
Migration for DB Freedom
Database Migration, Schema Conversion

Public Sector Customers Use AWS Database, Analytic, and AI Services

Amazon RDS: Cheaper, Easier, and Better
Multi-engine support: Open Source, Commercial, Amazon Aurora
Automated provisioning, patching, scaling, backup/restore, failover
Use with General Purpose SSD or Provisioned IOPS SSD storage
High availability with RDS Multi-AZ

High Availability Multi-AZ Deployments
Enterprise-grade fault tolerant solution for production databases
Automatic failover
Synchronous replication
Inexpensive & enabled with one click
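
As a rough illustration of the Multi-AZ and storage options named on the RDS slides, the sketch below assembles keyword arguments of the kind boto3's RDS client accepts for create_db_instance. The identifier, instance class, storage size, and credentials are placeholder assumptions, not values from the talk. The function only builds a dictionary and makes no AWS call.

```python
def multi_az_instance_params(identifier, engine="postgres", iops=None):
    """Build keyword arguments for boto3's rds.create_db_instance.

    Every concrete value below is an illustrative placeholder.
    """
    params = {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": "db.m4.large",   # assumed instance size
        "AllocatedStorage": 100,            # GiB, assumed
        "MasterUsername": "admin_user",     # placeholder
        "MasterUserPassword": "change-me",  # placeholder, never hard-code
        "MultiAZ": True,                    # request a standby in another AZ
        "StorageType": "gp2",               # General Purpose SSD
    }
    if iops is not None:
        # Provisioned IOPS SSD storage takes an explicit IOPS figure.
        params["StorageType"] = "io1"
        params["Iops"] = iops
    return params


print(multi_az_instance_params("demo-db"))
```

With credentials configured, the dictionary could be passed as boto3.client("rds").create_db_instance(**params). Switching MultiAZ on is the API equivalent of the "one click" on the slide.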

Amazon Aurora: Speed and Availability of Commercial Databases, with Cost-Effectiveness of Open Source
Up To 5x Performance Of High-end MySQL
Highly Available and Durable
MySQL Compatible*
1/10th The Cost Of Commercial Grade Databases
Fastest Growing AWS Service, Ever
*PostgreSQL compatibility in Open Preview

Aurora, Faster Because it is Built for AWS
[Diagram: MySQL with Replica (DC 1 and DC 2, with primary and replica instances, storage, and storage mirrors) compared with Amazon Aurora (primary and replica instances across AZ 1, AZ 2, and AZ 3; async 4/6 quorum distributed writes; Amazon S3). Type of write: binlog, data, double-write, log, FRM files.]
MySQL IO profile for 30 min. Sysbench run
780K transactions
7,388K I/Os per million txns (excludes mirroring, standby)
Average 7.4 I/Os per transaction
Aurora IO profile for 30 min. Sysbench run
27,378K transactions (35X MORE)
0.95 I/Os per transaction (7.7X LESS)
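
The two multipliers on this slide follow from the quoted benchmark figures. A minimal Python check, using only the numbers above:

```python
# Figures quoted on the slide (30-minute Sysbench runs).
mysql_txns_k = 780                 # thousands of transactions, MySQL with replica
aurora_txns_k = 27_378             # thousands of transactions, Aurora
mysql_ios_per_million_k = 7_388    # thousands of I/Os per million transactions
aurora_ios_per_txn = 0.95

# 7,388K I/Os per 1,000K transactions is 7.388 I/Os per transaction.
mysql_ios_per_txn = mysql_ios_per_million_k / 1_000

txn_ratio = aurora_txns_k / mysql_txns_k
io_ratio = mysql_ios_per_txn / aurora_ios_per_txn

print(f"MySQL I/Os per transaction: {mysql_ios_per_txn:.3f}")
print(f"Aurora completed {txn_ratio:.1f}x the transactions")
print(f"Aurora used {io_ratio:.2f}x fewer I/Os per transaction")
```

The ratios come out at roughly 35x and a little under 7.8x, in line with the slide's "35X MORE" and "7.7X LESS".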

DynamoDB: Non-Relational
Managed NoSQL Database Service
Schemaless data model
Consistent low latency performance
Predictable provisioned throughput
Seamless scalability with no storage limits
High durability & availability (replication across 3 facilities)
Easy administration – we scale for you!
Low cost
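
To make "predictable provisioned throughput" concrete, here is a small sizing sketch. It assumes DynamoDB's documented provisioned-capacity rules: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half), and one write capacity unit covers one write per second of an item up to 1 KB. The workload numbers are hypothetical.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """Estimate the read capacity units (RCUs) a workload needs."""
    units_per_read = math.ceil(item_size_bytes / 4096)  # 4 KB per unit
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = total / 2  # eventually consistent reads cost half
    return math.ceil(total)


def write_capacity_units(item_size_bytes, writes_per_second):
    """Estimate the write capacity units (WCUs) a workload needs."""
    units_per_write = math.ceil(item_size_bytes / 1024)  # 1 KB per unit
    return units_per_write * writes_per_second


# Hypothetical workload: 6,000-byte items, 100 reads/s, 50 writes/s.
print(read_capacity_units(6000, 100))
print(read_capacity_units(6000, 100, strongly_consistent=False))
print(write_capacity_units(6000, 50))
```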
DynamoDB Accelerator (DAX) offers caching without coding for sub-millisecond read latency and up to 10x throughput
[Diagram: App, DAX, DynamoDB]

DynamoDB at Amtrak
Built and deployed an operational database and data mart for near-real-time reporting of sales data
Developed and released the solution in six months
Used cloud native technologies: DynamoDB, Kinesis, Lambda, and S3
Benefits
Improved accuracy and single source of truth for sales data
Allows decommissioning of four legacy systems
Low maintenance and operational costs. No servers to manage.

Amazon ElastiCache Provides Sub-millisecond Caching and In-Memory Data
Make Almost Any Database Faster and Less Expensive
In-Memory Cache
Memcached and Redis
Fully managed
High Speed In-Memory Data Store
Persistent high availability
Clusters up to 3.5TB
Average read and write time of under 500µs (0.5ms)
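
One common way an in-memory cache makes a database "faster and less expensive" is the cache-aside pattern: read from the cache first and fall back to the database only on a miss. The sketch below is a simplified stand-in, not ElastiCache itself. A plain dictionary plays the role of Redis or Memcached, and the class and parameter names are hypothetical.

```python
import time


class CacheAside:
    """Cache-aside reads with a time-to-live, backed by an in-process dict."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # function that queries the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1             # served from memory, no database call
            return entry[0]
        self.misses += 1
        value = self._load(key)        # slow path: go to the database
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Drop a key after the underlying row changes."""
        self._entries.pop(key, None)


# Hypothetical "database" lookup.
database = {"user:1": "Ada", "user:2": "Grace"}
cache = CacheAside(database.__getitem__, ttl_seconds=30)
cache.get("user:1")
cache.get("user:1")
print(cache.hits, cache.misses)
```

Every hit is a read the database never sees, which is where both the latency and the cost savings come from.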

Amazon EMR: the Hadoop and Spark Ecosystem, Without the Chaos
Design Patterns
Amazon S3 as HDFS
Core Nodes and Task Nodes
Elastic Clusters
Transient + Always On Clusters
Leverage the Hadoop ecosystem
Use Cases
Recommendation Engines
Personalization Engines
Semi-structured/unstructured data
Combine disparate data sets
Next generation ETL
Sentiment analysis
Batch analytics
Taming Big Data in the Cloud
Hadoop, Spark, Presto, Hive and more
Easy to use, fully managed
Launch a cluster in minutes
Baked in security features
Pay by the hour and save with Spot

Amazon Elasticsearch Service
Log Analytics & Operational Monitoring
Monitor the performance of your apps, web servers, and infrastructure
Easy to use, yet powerful data visualization tools to detect issues in near real-time
Ability to dig into your logs in an intuitive, fine-grained way
Kibana provides fast, easy visualization
Search
Application or website provides search capabilities over diverse documents
Tasked with making this knowledge base searchable and accessible
Key search features including text matching, faceting, filtering, fuzzy search, auto complete, and highlighting
Query API to support application search
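
Several of the search features listed above (text matching, fuzzy search, filtering, faceting, highlighting) can be combined in a single Elasticsearch query body. The sketch below builds one as a Python dictionary. The field names body and category are hypothetical and would depend on the index mapping.

```python
import json


def build_search_body(text, category=None, size=10):
    """Build an Elasticsearch query DSL body for a document search."""
    query = {
        "bool": {
            # Text matching, with fuzziness to tolerate small typos.
            "must": [{"match": {"body": {"query": text, "fuzziness": "AUTO"}}}]
        }
    }
    if category is not None:
        # Filtering: exact match that does not affect relevance scoring.
        query["bool"]["filter"] = [{"term": {"category": category}}]
    return {
        "query": query,
        # Faceting: document counts per category for the result set.
        "aggs": {"by_category": {"terms": {"field": "category"}}},
        # Highlighting: return matched snippets from the body field.
        "highlight": {"fields": {"body": {}}},
        "size": size,
    }


print(json.dumps(build_search_body("databse migration", category="guides"), indent=2))
```

The resulting JSON would be sent to the index's _search endpoint.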

Amazon Redshift: Cloud Data Warehousing
Leader Node
Simple SQL endpoint
Stores metadata
Optimizes query plan
Coordinates query execution
Compute Nodes
Local columnar storage
Parallel/distributed execution of all queries, loads, backups, restores, resizes
Up to 2 petabytes of managed data
Automated ingestion from S3, Kinesis, EMR and DynamoDB
[Diagram: Leader Node, Compute Nodes; S3, EMR, DynamoDB, EC2]
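
The split between a leader node and compute nodes can be pictured as scatter-gather: rows are spread across nodes, each node aggregates its own share, and the leader merges the partial results. The toy model below shows that idea only and is not how Redshift is implemented. It runs the per-node work in sequence where a real cluster would run it in parallel, and all function names are hypothetical.

```python
from collections import Counter


def distribute(rows, node_count, key):
    """Hash-distribute rows across compute nodes by a distribution key."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[hash(row[key]) % node_count].append(row)
    return nodes


def node_partial_sum(node_rows, group_col, value_col):
    """Work done on one compute node: aggregate only its local rows."""
    partial = Counter()
    for row in node_rows:
        partial[row[group_col]] += row[value_col]
    return partial


def leader_query(nodes, group_col, value_col):
    """Leader node: collect partial aggregates and merge them."""
    total = Counter()
    for node_rows in nodes:
        total.update(node_partial_sum(node_rows, group_col, value_col))
    return dict(total)


# Equivalent to: SELECT region, SUM(sales) FROM orders GROUP BY region
orders = [
    {"id": 1, "region": "east", "sales": 10},
    {"id": 2, "region": "west", "sales": 5},
    {"id": 3, "region": "east", "sales": 7},
    {"id": 4, "region": "west", "sales": 1},
]
print(leader_query(distribute(orders, node_count=2, key="id"), "region", "sales"))
```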

Large Data Lakes: PB and XB
Run SQL queries directly against data in S3
Fast @ exabyte scale
Elastic & highly available
On-demand, pay-per-query
High concurrency: Multiple clusters access same data
No ETL: Query data in-place using open file formats
Full SQL support
[Diagram: SQL, S3]
Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes
Amazon Athena
Serverless interactive query service
Query an Exabyte of data in under 3 minutes
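
Querying data in place comes down to submitting SQL plus a location for the results. The sketch below builds the arguments that boto3's Athena start_query_execution call expects. The database, table, and bucket names are hypothetical, and the function does not contact AWS.

```python
def athena_query_params(sql, database, output_s3_uri):
    """Build keyword arguments for boto3's athena.start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes result files to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


# Hypothetical table defined over files that already sit in S3.
sql = """
SELECT region, COUNT(*) AS orders
FROM sales_events
WHERE year = '2017'
GROUP BY region
""".strip()

params = athena_query_params(sql, "analytics_db", "s3://example-bucket/athena-results/")
print(params["QueryString"])
```

With credentials configured, the call would be boto3.client("athena").start_query_execution(**params).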

AWS Glue for Automated, Serverless ETL
Data Catalog
Hive metastore compatible metadata repository of data sources
Crawls data source to infer table, data type, partition format
Job Authoring
Generates Python code to move data from source to destination
Edit with your favorite IDE; share code snippets using Git
Job Execution
Runs jobs in Spark containers – automatic scaling based on SLA
Glue is serverless – only pay for the resources you consume
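
To give a feel for what "crawls data source to infer table, data type, partition format" involves, here is a much simplified sketch of schema and partition inference. It is not Glue's actual algorithm. The type names follow Hive conventions, and the sample data and S3 key are made up.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest of bigint, double, string that fits every value."""
    for type_name, caster in (("bigint", int), ("double", float)):
        try:
            for value in values:
                caster(value)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """Infer a column-name to type mapping from CSV text with a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    return {col: infer_type([row[col] for row in rows]) for col in columns}


def partitions_from_key(s3_key):
    """Extract Hive-style partitions such as year=2017/month=06 from a key."""
    partitions = {}
    for segment in s3_key.split("/")[:-1]:  # skip the file name
        if "=" in segment:
            name, value = segment.split("=", 1)
            partitions[name] = value
    return partitions


sample = "id,price,name\n1,9.5,pen\n2,3,ink\n"
print(infer_schema(sample))
print(partitions_from_key("sales/year=2017/month=06/part-0000.csv"))
```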

Amazon QuickSight: Fast Business Analytics
Data from Many Sources
AWS Managed Databases
Amazon S3
Databases on Amazon EC2
On-premises databases
Excel and CSV Files
Salesforce and other SaaS
Mobile and Web Access
iPhone, Android and Tablet
Most popular web browsers
Powered by SPICE
Super-fast, Parallel, In-memory Calculation Engine
Run fast interactive queries on large datasets
Low monthly cost per user

Old-World Vendors and Old-World Policies…
Very Expensive
Proprietary
Lock-In
Punitive Licensing
[Image: "You’ve Got Mail!" AUDIT]
Unshackle From Hostile Database Vendors

Freedom Begins with Choice; Migrating Data and Schema
AWS Schema Conversion Tool
Automatically convert & move tables, views, stored procedures, metadata
Highlights and recommends custom actions as needed
AWS Database Migration Service
Start a migration in literally a few minutes
Keep apps running during the migration
Replicate from, within, or to Amazon EC2 or managed database services or on-premises
AWS Workload Qualification Framework
Assess workloads by complexity, technology, effort, and other factors
Recommends strategy and plans for migration
[Chart: Workload Qualification Framework, scale 0 to 5]

Heterogeneous Migration
Oracle private DC to RDS PostgreSQL migration
Used the AWS Schema Conversion Tool (AWS SCT) to convert their database schema
Used ongoing Change Data Capture (CDC) replication to keep databases in sync until they reached the cutover window
Benefits
Improved reliability of the cloud environment
Savings on Oracle licensing costs
SCT Assessment Report showed the scope of the migration
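
A migration like the one described here, with a full load followed by ongoing CDC until cutover, corresponds to a DMS replication task with migration type full-load-and-cdc. The sketch below builds the arguments for boto3's DMS create_replication_task call, including a table-mapping rule that selects every table in one schema. The ARNs, task name, and schema name are placeholders. The function makes no AWS call.

```python
import json


def table_mappings(schema_name):
    """Selection rule that includes every table in one source schema."""
    rule = {
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-schema",
        "object-locator": {"schema-name": schema_name, "table-name": "%"},
        "rule-action": "include",
    }
    return json.dumps({"rules": [rule]})


def replication_task_params(task_id, source_arn, target_arn, instance_arn, schema_name):
    """Build keyword arguments for boto3's dms.create_replication_task."""
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,      # e.g. the on-premises Oracle endpoint
        "TargetEndpointArn": target_arn,      # e.g. the RDS PostgreSQL endpoint
        "ReplicationInstanceArn": instance_arn,
        # Copy existing data, then keep applying changes until cutover.
        "MigrationType": "full-load-and-cdc",
        "TableMappings": table_mappings(schema_name),
    }


task = replication_task_params(
    "oracle-to-postgres",
    "arn:aws:dms:placeholder:source",
    "arn:aws:dms:placeholder:target",
    "arn:aws:dms:placeholder:instance",
    "HR",
)
print(task["MigrationType"])
print(task["TableMappings"])
```

Schema conversion itself is done separately with the AWS Schema Conversion Tool. DMS moves the data.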

Amazon AI
Intelligent Services Powered By Deep Learning

FINRA: Data Sharing Pre-Cloud
Built a data hub to deal with the growing problem of point-to-point dependencies between databases in the data center.
[Diagrams: FINRA data center with App 1 DB, App 2 DB, App 3 DB, and App N DB, shown with and without a HUB DB]

FINRA: Data Replication Services on AWS

FINRA: Varied Analytic Use Cases

FINRA: Analytics Architecture
[Diagram: Validation, Data Management, Linkage, Data Analytics, Normalization]
[Services shown: Amazon EC2, Amazon S3, Amazon Glacier, Amazon Redshift, Amazon EMR, Amazon RDS, Amazon Machine Learning, AWS KMS, VPC]
Batch Analytics, Interactive & Visualizations, Data Science

FINRA: Universal Data Science Platform

FINRA: Evolution of the Analytics Portfolio

FINRA: Analytics Impacts
• Removed obstacles
“Before, data analysis of this magnitude required intervention from technology.”
“We are now able to see underlying data and visual representation of summaries together with outliers and anomalies. This reduces our time to market on examinations.”
“We moved away from requesting raw reports to requesting dashboards that provide meaningful information and tell a story…”
• Lowered the cost of curiosity
“Analysts are able to quickly obtain a full picture of what happens to an order over time, helping to inform decision making as to whether a rule violation has occurred.”
“[W]ith a click we can now compare firms of our choice or defined peer groups. This helps us by reducing a lot of noise…”
“Using machine learning algorithms validates our assumptions and makes us data driven”
• Optimize batch and interactive workloads without compromise
• Greater innovation and more engaged staff

Questions?

Thank You!
