SlideShare a Scribd company logo
1 of 58
Download to read offline
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
John Loughlin, AWS Solutions Architect
Kishore Raja, Boingo Wireless, VP Strategy
Ajit Zadgaonkar, Edmunds.com Executive
Director, Engineering Operations
October 2015
ISM303
Migrating Your Enterprise Data
Warehouse to Amazon Redshift
Relational data warehouse
Massively parallel; Petabyte scale
Fully managed
HDD and SSD Platforms
$1,000/TB/Year; starts at $0.25/hour
Amazon
Redshift
a lot faster
a lot simpler
a lot cheaper
Amazon Redshift works with your analysis tools
JDBC/ODBC
Amazon Redshift
Data loading options
• Parallel upload to Amazon S3
• AWS Direct Connect
• AWS Import/Export
• Amazon Kinesis
• Systems integrators
Data Integration Systems Integrators
Amazon Redshift architecture
Leader Node
Simple SQL end point
Stores metadata
Optimizes query plan
Coordinates query execution
Compute Nodes
Local columnar storage
Parallel/distributed execution of all queries, loads,
backups, restores, resizes
Start at $0.25/hour, grow to 2 PB (compressed)
DC1: SSD; scale from 160 GB to 326 TB
DS2: HDD; scale from 2 TB to 2 PB
10 GigE
(HPC)
Ingestion
Backup
Restore
JDBC/ODBC
Amazon Redshift is priced to analyze all your data
DS2 (HDD)
Price Per Hour for
DW1.XL Single Node
Effective Annual
Price per TB compressed
On-Demand $ 0.850 $ 3,725
1 Year Reservation $ 0.500 $ 2,190
3 Year Reservation $ 0.228 $ 999
DC1 (SSD)
Price Per Hour for
DW2.L Single Node
Effective Annual
Price per TB compressed
On-Demand $ 0.250 $ 13,690
1 Year Reservation $ 0.161 $ 8,795
3 Year Reservation $ 0.100 $ 5,500
Pricing is simple
Number of nodes x price/hour
No charge for leader node
No upfront costs
Pay as you go
Common migration patterns
• Data from a variety of relational online transaction
processing (OLTP) systems structure lends itself to SQL
schemas
• Data from logs, devices, sensors,…data is less
structured
Structured data loading
• Data is often being loaded into another warehouse from
an existing ETL process
• Temptation is to “lift and shift” workload
• Resist temptation; instead consider:
• What do I really want to do?
• What do I need?
Ingesting less-structured data
• Some data does not lend itself to a relational schema
• Common pattern is to use Amazon EMR to:
• Impose structure
• Import into Amazon Redshift
• Other solutions are often home-grown scripting
applications
Loading data
• Load to an empty Amazon Redshift database
• Load changes captured in the source system to Amazon
Redshift
Truncate and load
This is by far the easiest option:
• Move the data to Amazon S3
• Multi-part upload
• Import/export service
• AWS Direct Connect
• COPY the data into Amazon Redshift, a table at a time
Load changes
• Identify changes in source systems
• Move data to Amazon S3
• Load changes:
• ‘Upsert process’
• Partner ETL tools
Partner ETL
• Amazon Redshift is supported by a variety of ETL
vendors
• Many simplify the process of data loading
• A variety of vendors offer a free trial of their products,
allowing you to evaluate and choose the one that suits
your needs
• Visit http://aws.amazon.com/redshift/partners
Upsert
• The goal is to insert new rows into and update changed
rows in Amazon Redshift
• Load data into a temporary staging table
• Join the staging table with production and delete the
common rows
• Copy the new data into the production table
• See Updating and Inserting New Data in the Amazon
Redshift Database Developer Guide
COPY command
• Set COMPUPDATE to ON when running on an empty
table
• Use the COPY command
• Each slice can load one file at a time
• Partition input files so all slices can load in parallel
• Use a manifest file
Use multiple input files to maximize throughput
• Use the COPY command
• Each slice can load one file at
a time
• A single input file means only
one slice is ingesting data
• Instead of 100 MB/s, you’re
getting only 6.25 MB/s
Use multiple input files to maximize throughput
• Use the COPY command
• You need at least as many
input files as you have slices
• With 16 input files, all slices
are working so you maximize
throughput
• Get 100 MB/s per node; scale
linearly as you add nodes
Primary keys and manifest files
• Amazon Redshift doesn’t enforce primary key
constraints:
• If you load data multiple times, Amazon Redshift won’t complain
• If you declare primary keys in your data manipulation language
(DML), the optimizer expects the data to be unique
• Use manifest files to control exactly what is loaded and
how to respond if input files are missing:
• Define a JSON manifest on Amazon S3
• Ensures that the cluster loads exactly what you want
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kishore Raja
VP, Strategy
Boingo Wireless
October 7, 2015 | Las Vegas, NV
TCO and ROI for Migrating from
Enterprise Database to Amazon Redshift
ISM303
- Data Architecture
- Success Criteria
- Solutions Evaluated
- Additional Benefits
- Big data Agility
- Summary
Agenda
90+ M
Ad engagements/year
100
Operator partners
100+ Countries
6 Continents
Media Largest ad network
Engaging mobile audiences via Wi-Fi
Wi-Fi Largest operator
of airport wireless networks in the world
DAS
Largest operator
of independent indoor cellular networks
in the U.S.
Broadband
Largest provider
of wireless high-speed Internet & TV
for the military
1 Million+
Hotspots
Nearly
2000
Commercial locations
19
DAS Locations
Boingo: Reaching 1 Billion Consumers Annually
100+
Worldwide
Boingo on AWS
S3
Datawarehouse
Storage and
Content Delivery
Compute and
Networking Database
RDS
Admin and
Security Deployment App Services
Amazon EC2 AMI Elastic IP
VPC VPN Conn Gateway(s)
Route 53 Route
Table
ELB
Auto scaling ENI Lambda
EBS
Glacier
CloudFront
ElastiCache
MySQL DB
CloudWatch
Trusted Advisor
IAM
CloudTrail
Elastic Beanstalk
CloudFormation
OpsWorks
MFA Token
SQS
SQS
Oracle 11g(r2)
Data Architecture
SAP Data Services
Eng data
S3
Flat files
Database
Oracle RDS 11g(r2)
Front end Visualization
(Business Objects)
1. ETL 2. Data Storage 3. Reporting
Issues
• Data is growing which is making OLAP slow
• Inefficient Row based approach (mostly)
• Standard Oracle compression
• Mediocre IOPS
• Single DB server (no concurrency)
• Not enough memory (64GB)
• Administration
– Partitioning
– DB patches, updates, OS patches, updates
– Maintenance (backup, snapshots, replication)
– Recovery failure etc.
• Expensive (license, hardware, support etc.)
Success Criteria
What do we need?
• Memory (at least 256GB)
• Parallel Processing
• Plenty of IOPS
• Less Administration
• Low TCO
Growth rate:
• Currently at 15TB
• 2-3TB average growth per year
Nice to have
• Ingest any data type/store
• Realtime Streaming analysis
• Massive Parallel Processing
• Scale (up or down)
• Integrate any (& every) database
• Multiple levels of Security
• Smart Alerts and Monitoring
• Cost Effective
• Lesser (or zero) CAPEX
• Keep up with Industry
Security/Compliances
• Automated audit reporting
Solutions
Exadata
AWS Data Solutions
• Oracle
• SQL Server
• PostgreSQL
• MySQL
• Aurora (MySQL
compatible)
• Small and large scale
non-RDS
• Schemaless
• Using open source
memcached/Redis
• Works on any database
• Datawarehouse
• Petabyte scale
• Massive Parallel
processing
RDS NoSQL In Memory
DataWarehouse
Redshift
Fully Managed, No CAPEX, Highly secure, Scalable
• DAT202: Understanding Database Options on AWS (Wednesday, Oct 7, 11:00 AM - 12:00 PM, San Polo 3501B)
• DAT302 - Relational Database Management Systems in the Cloud: Deploying SQL Server on AWS (Thursday, Oct 8, 5:30 PM - 6:30 PM, San Polo 3501B)
• DAT303: Oracle on AWS and Amazon RDS: Secure, Fast and Scalable (Friday, Oct 9 9:00-10AM, Delfino 4102)
Redshift TCO
EaaS
Eng. Data
S3
Flat files
Redshift
Datawarehouse
Front end Visualization
(Business Objects)
1. ETL 2. Data storage 3. BI reports
- Cluster of 50 DB servers
- 100 CPU cores
- 8TB SSD storage
- 750GB Memory
- Self organizing Cluster(s)
- 160GB increments
Annual Cost: $48,500Annual Cost: ~ $6,500
Annual Cost: ~ $55,000
Database installs, patches, OS installs,
patches, backup, replication, server
maintenance, scaling, security etc.
Managed Service
TCO Comparison
0
50,000
100,000
150,000
200,000
250,000
300,000
350,000
400,000
Exadata SAP HANA Redshift
TCO Estimates
$400,000
$300,000
$55,000
Performance Results
7,200
2,700
15 15
Query Performance Data Load Performance
1 year of data
1 million records
Latencyinseconds
RedshiftExisting System
7,200
55,000
6500
Existing System Redshift
ETLannualcost
ETL
Migration and Ease Of Use
Database installs, patches, OS installs,
patches, backup, replication, server
maintenance, scaling, security etc.
Administration and Support
0 1 2 3 4
Other Systems
Redshift
Migration Time (in months)
2
4
TCO
Estimated Cluster
- Cluster of 50 DB servers
- 100 CPU cores
- 8TB SSD storage
- 750GB Memory
- Self organizing Cluster(s)
- 160GB increments
Actual Cluster
$48,500
$12,000
Savings:
• 40% for upto 1 year term
• 60% for upto 3 year term
Options:
• No upfront 20% *
• Partial upfront 41% - 73%
• All upfront 42% - 76%
Cancellation:
• Full refund within 7 days *
• Prorated refund within 30 days *
• Prorated refund within 90 days
Talend ($6500)
* For 1 year term RI
Python Scripts ($0)
Elasticity Reserved Instances ETL
- ISM208 - The Science of Saving with AWS Reserved Instances (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Delfino 4105)
3. Subnets
Additional Benefits
1. Access Control
• “Deny All” DB cluster
• Firewall rules
• IAM management
2. VPC
• BYOIP
• Ingress access
• Extend to corporate
data center
Cloud
• MFA
• Encryption
• Transit : SSL with TLS v1.2
• Storage : Encryption at rest
• Further isolation inside VPC
• IAM management
• SEC302 - IAM Best Practices to Live By (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Palazzo K)
• NET201 - Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Titian 2201B)
• ARC403 - From One to Many: Evolving VPC Design (Wednesday, Oct 7, 2:45 PM - 3:45 PM, Palazzo N)
AES 256-bit AES 256-bit AES 256-bit
AES 256-bit AES 256-bit AES 256-bit
AES 256-bit
AES 256-bit
AES 256-bit
AES 256-bit
Database Key
Cluster Master Key
Customer Master Key
HSM
(Data center)
Advanced Encryption
Monitoring and Alerts
Intrusion Detection
• DDoS
• MiTM
• IP Spoofing
• Packet Sniffing
• Port Monitoring
Service
• DVO303 - Scaling Infrastructure Operations with AWS Service Catalog, AWS Config, and AWS CloudTrail (Friday, Oct 9, 9:00 AM - 10:00 AM, Lido 3001B)
• ARC302 - Running Lean Architectures: How to Optimize for Cost Efficiency (Friday, Oct 9, 9:00 AM - 10:00 AM, Palazzo K)
Big Data Agility
Production Datawarehouse
- Cluster of 50 DB servers
- 100 CPU cores
- 8TB SSD storage
- 750GB Memory
- Self organizing Cluster(s)
- 160GB increments
Backup
QA Cluster
Predictive Analysis/Adhoc Cluster
Performance Cluster
< 30mins
< 5/hour
< $5/hour
< $5/hour
DAT311 - Large-Scale Genomic Analysis with Amazon Redshift (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Lando 4306)
DAT308 - How Yahoo! Analyzes Billions of Events a Day on Amazon Redshift (Thursday, Oct 8, 4:15 PM - 5:15 PM, Palazzo C)
BDT401 - Amazon Redshift Deep Dive: Tuning and Best Practices (Thursday, Oct 8, 2:45 PM - 3:45 PM, Marcello 4506)
Summary
• (Very) Cost Efficient
• (Highly) Secure (Enterprise grade Encryption)
• Managed service (Administration)
• Quick(er) Migration time
• 167+ Security and Compliancy features
• Proved to work (NASDAQ, NASA, Financial Times, Pinterest etc.)
• Faster with better performance
• Future proof (Ecosystem, security, new services etc.)
• 2+ years on AWS
• Ease of use
ROI
Related Sessions
• DAT311 - Large-Scale Genomic Analysis with Amazon Redshift (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Lando 4306)
• DAT308 - How Yahoo! Analyzes Billions of Events a Day on Amazon Redshift (Thursday, Oct 8, 4:15 PM - 5:15 PM,
Palazzo C)
• BDT401 - Amazon Redshift Deep Dive: Tuning and Best Practices (Thursday, Oct 8, 2:45 PM - 3:45 PM, Marcello 4506)
• DAT202: Understanding Database Options on AWS (Wednesday, Oct 7, 11:00 AM - 12:00 PM, San Polo 3501B)
• DAT302 - Relational Database Management Systems in the Cloud: Deploying SQL Server on AWS (Thursday, Oct 8, 5:30
PM - 6:30 PM, San Polo 3501B)
• DAT303: Oracle on AWS and Amazon RDS: Secure, Fast and Scalable (Friday, Oct 9 9:00-10AM, Delfino 4102)
• SEC302 - IAM Best Practices to Live By (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Palazzo K)
• NET201 - Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options (Wednesday, Oct 7, 1:30 PM -
2:30 PM, Titian 2201B)
• ARC403 - From One to Many: Evolving VPC Design (Wednesday, Oct 7, 2:45 PM - 3:45 PM, Palazzo N)
• DVO303 - Scaling Infrastructure Operations with AWS Service Catalog, AWS Config, and AWS CloudTrail (Friday, Oct 9,
9:00 AM - 10:00 AM, Lido 3001B)
• ISM208 - The Science of Saving with AWS Reserved Instances (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Delfino 4105)
• ARC302 - Running Lean Architectures: How to Optimize for Cost Efficiency (Friday, Oct 9, 9:00 AM - 10:00 AM, Palazzo K)
RedshiftDatabasesInfrastructureCost
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ajit Zadgaonkar, Executive Director
October 2015
Migration to Amazon Redshift
Edmunds.com
18 MILLION Monthly Visitors
OF CAR BUYERS INFLUENCED BY
EDMUNDS.COM
59%
*R. L. Polk & Co.
Edmunds.com
• 18M unique visitors a month
• 200M+ page views a month
• Over 10k dealer partners
• 14k+ API users
• Over 6M automotive
inventory
• Over 1M content pages
• Lots and lots of data
• Continuously growing data
• 24x7 real-time BI
• DWH in Amazon Redshift
• 32-node cluster
From unsustainable, painful operations to:
• Efficient, cost-effective cluster
• Squeak-free operations
• Happy customers
• Cost reduction (new system costs 1/5 of the old one)
Improvement
Challenges
• Painfully slow queries
• High system resource utilization
• Slow data loading
• Timeouts !
• …all in all, we were running into HUGE PROBLEMS
Lessons learned
• Know the system, the strengths, and the limitations
• Understand the end-to-end usage scenario
• Design the processes following Best Practices
• Invest in real-time monitoring
• Lift and shift may not be the best choice
• Let Enterprise Support and TAMs be your partners
• Monitor, monitor, and trend
The System, the infrastructure
• Syntactical differences (i.e., PostgreSQL 7 vs.
PostgreSQL 8)
• Architectural choices (i.e., columnar database)
• Transaction processing
• Historical data analysis, business intelligence
• Node type, cluster size
• Shared infrastructure vs. dedicated throughput
• The larger the cluster, the bigger the resizing effort
Make the up-front investment: Design
• Select the right sort key
• Timestamp, range filtering on column name, joins
• Compound sort key, interleaved sort key
• Measure query performance, system load, and vacuum
• Ensuring tables have a sort key alone helped us gain
20% performance
• Over 50% of our tables did not have a sort key
• Ensuring that the right sort key is assigned is the path to
winning
Make the upfront investment: Use cases
• Select the right distribution style
• Locate data faster
• Uniform load
• Less data movement
• A good distribution style ensures a healthy system
• Many of our tables did not have the right distribution style
Queries
• Select * is #1 performance killer
• Use WHERE clause on the primary sort column
• Watch out for queries that create “temporary tables”
• Long-running queries might impact downstream services
• Define constraints
VACUUM
• Run VACUUM frequently
• Run right after loading data
• Monitor vacuum time
Data loading
• Load data in sort key order
• Load using multiple files (1 MB to 1 GB)
• #files: Multiples of slices in cluster
• Use compression
• Use single COPY command
• S3 is your best friend
A closer look
• Each node is split into slices
• One slice per core
• Each slice is allocated
memory, CPU, and disk
space
• Each slice processes a
piece of the workload in
parallel
Monitoring commit queue
Monitoring commit time
Monitoring
• Console/Amazon CloudWatch monitoring
• CPU, memory, processes
• Data distribution across slices
• Space used per table
• WLM query count, queue wait time, execution time
• Commit stats, top time-consuming queries
In closing
• Amazon Redshift is a great data warehousing platform
• Parting advice: Make investment in Best Practices
• Check out Redshift Utils
Thank you!

More Related Content

What's hot

Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series
Getting Started with Amazon Redshift - AWS July 2016 Webinar SeriesGetting Started with Amazon Redshift - AWS July 2016 Webinar Series
Getting Started with Amazon Redshift - AWS July 2016 Webinar SeriesAmazon Web Services
 
AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722Amazon Web Services
 
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon RedshiftAmazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftAmazon Web Services
 
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Amazon Web Services
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftJie Li
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseAmazon Web Services
 
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...Amazon Web Services
 
AWS Webcast - Data Integration into Amazon Redshift
AWS Webcast - Data Integration into Amazon RedshiftAWS Webcast - Data Integration into Amazon Redshift
AWS Webcast - Data Integration into Amazon RedshiftAmazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData FlyData Inc.
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...Amazon Web Services
 
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Building your data warehouse with Redshift
Building your data warehouse with RedshiftBuilding your data warehouse with Redshift
Building your data warehouse with RedshiftAmazon Web Services
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSightAmazon Web Services
 

What's hot (20)

Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series
Getting Started with Amazon Redshift - AWS July 2016 Webinar SeriesGetting Started with Amazon Redshift - AWS July 2016 Webinar Series
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series
 
AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722
 
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
(BDT314) A Big Data & Analytics App on Amazon EMR & Amazon Redshift
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data Warehouse
 
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
 
AWS Webcast - Data Integration into Amazon Redshift
AWS Webcast - Data Integration into Amazon RedshiftAWS Webcast - Data Integration into Amazon Redshift
AWS Webcast - Data Integration into Amazon Redshift
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
Building a Modern Data Warehouse: Deep Dive on Amazon Redshift - SRV337 - Chi...
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Building your data warehouse with Redshift
Building your data warehouse with RedshiftBuilding your data warehouse with Redshift
Building your data warehouse with Redshift
 
Amazon Redshift Deep Dive
Amazon Redshift Deep Dive Amazon Redshift Deep Dive
Amazon Redshift Deep Dive
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSight
 
Masterclass - Redshift
Masterclass - RedshiftMasterclass - Redshift
Masterclass - Redshift
 

Viewers also liked

AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)Amazon Web Services
 
AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...
AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...
AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...Amazon Web Services
 
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesMigrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesAmazon Web Services
 
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...Amazon Web Services
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon RedshiftAmazon Web Services
 
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon GlacierAWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon GlacierAmazon Web Services
 
AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)
AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)
AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)Amazon Web Services
 
Getting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCacheGetting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCacheAmazon Web Services
 
AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...
AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...
AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...Amazon Web Services
 
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech Talks
Hands-on Labs: Getting Started with AWS  - March 2017 AWS Online Tech TalksHands-on Labs: Getting Started with AWS  - March 2017 AWS Online Tech Talks
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech TalksAmazon Web Services
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and consSaniya Khalsa
 
Amazon EC2 Systems Manager for Hybrid Cloud Management at Scale
Amazon EC2 Systems Manager for Hybrid Cloud Management at ScaleAmazon EC2 Systems Manager for Hybrid Cloud Management at Scale
Amazon EC2 Systems Manager for Hybrid Cloud Management at ScaleAmazon Web Services
 
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...Amazon Web Services
 
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...Amazon Web Services
 
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)Amazon Web Services
 
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWSAmazon Web Services
 
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...Amazon Web Services
 

Viewers also liked (20)

AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
 
Introduction to Amazon DynamoDB
Introduction to Amazon DynamoDBIntroduction to Amazon DynamoDB
Introduction to Amazon DynamoDB
 
AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...
AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...
AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...
 
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesMigrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon GlacierAWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
 
AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)
AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)
AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)
 
Getting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCacheGetting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCache
 
AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...
AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...
AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...
 
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech Talks
Hands-on Labs: Getting Started with AWS  - March 2017 AWS Online Tech TalksHands-on Labs: Getting Started with AWS  - March 2017 AWS Online Tech Talks
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech Talks
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and cons
 
Amazon EC2 Systems Manager for Hybrid Cloud Management at Scale
Amazon EC2 Systems Manager for Hybrid Cloud Management at ScaleAmazon EC2 Systems Manager for Hybrid Cloud Management at Scale
Amazon EC2 Systems Manager for Hybrid Cloud Management at Scale
 
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
 
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
 
(STG402) Amazon EBS Deep Dive
(STG402) Amazon EBS Deep Dive(STG402) Amazon EBS Deep Dive
(STG402) Amazon EBS Deep Dive
 
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
 
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS
 
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
 

Similar to (ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift

Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...Amazon Web Services
 
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介Amazon Web Services Japan
 
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...Amazon Web Services
 
Introduction to Database Services
Introduction to Database ServicesIntroduction to Database Services
Introduction to Database ServicesAmazon Web Services
 
Intro to database_services_fg_aws_summit_2014
Intro to database_services_fg_aws_summit_2014Intro to database_services_fg_aws_summit_2014
Intro to database_services_fg_aws_summit_2014Amazon Web Services LATAM
 
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Amazon Web Services
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Amazon Web Services
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...
Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...
Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...Datapolis
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Amazon Web Services
 
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...Amazon Web Services
 
London Redshift Meetup - July 2017
London Redshift Meetup - July 2017London Redshift Meetup - July 2017
London Redshift Meetup - July 2017Pratim Das
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAmazon Web Services
 

Similar to (ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift (20)

Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
 
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
 
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
 
Introduction to Database Services
Introduction to Database ServicesIntroduction to Database Services
Introduction to Database Services
 
Intro to database_services_fg_aws_summit_2014
Intro to database_services_fg_aws_summit_2014Intro to database_services_fg_aws_summit_2014
Intro to database_services_fg_aws_summit_2014
 
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...
Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...
Datapolis Guest Expert Presentation: Top 15 SharePoint Server Configuration M...
 
Bases de datos en la nube con AWS
Bases de datos en la nube con AWSBases de datos en la nube con AWS
Bases de datos en la nube con AWS
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
 
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
 
London Redshift Meetup - July 2017
London Redshift Meetup - July 2017London Redshift Meetup - July 2017
London Redshift Meetup - July 2017
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon Redshift
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Recently uploaded (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. John Loughlin, AWS Solutions Architect Kishore Raja, Boingo Wireless, VP Strategy Ajit Zadgaonkar, Edmunds.com Executive Director, Engineering Operations October 2015 ISM303 Migrating Your Enterprise Data Warehouse to Amazon Redshift
  • 2. Relational data warehouse Massively parallel; Petabyte scale Fully managed HDD and SSD Platforms $1,000/TB/Year; starts at $0.25/hour Amazon Redshift a lot faster a lot simpler a lot cheaper
  • 3. Amazon Redshift works with your analysis tools JDBC/ODBC Amazon Redshift
  • 4. Data loading options • Parallel upload to Amazon S3 • AWS Direct Connect • AWS Import/Export • Amazon Kinesis • Systems integrators Data Integration Systems Integrators
  • 5. Amazon Redshift architecture Leader Node Simple SQL end point Stores metadata Optimizes query plan Coordinates query execution Compute Nodes Local columnar storage Parallel/distributed execution of all queries, loads, backups, restores, resizes Start at $0.25/hour, grow to 2 PB (compressed) DC1: SSD; scale from 160 GB to 326 TB DS2: HDD; scale from 2 TB to 2 PB 10 GigE (HPC) Ingestion Backup Restore JDBC/ODBC
  • 6. Amazon Redshift is priced to analyze all your data DS2 (HDD) Price Per Hour for DW1.XL Single Node Effective Annual Price per TB compressed On-Demand $ 0.850 $ 3,725 1 Year Reservation $ 0.500 $ 2,190 3 Year Reservation $ 0.228 $ 999 DC1 (SSD) Price Per Hour for DW2.L Single Node Effective Annual Price per TB compressed On-Demand $ 0.250 $ 13,690 1 Year Reservation $ 0.161 $ 8,795 3 Year Reservation $ 0.100 $ 5,500 Pricing is simple Number of nodes x price/hour No charge for leader node No upfront costs Pay as you go
  • 7. Common migration patterns • Data from a variety of relational online transaction processing (OLTP) systems structure lends itself to SQL schemas • Data from logs, devices, sensors,…data is less structured
  • 8. Structured data loading • Data is often being loaded into another warehouse from an existing ETL process • Temptation is to “lift and shift” workload • Resist temptation; instead consider: • What do I really want to do? • What do I need?
  • 9. Ingesting less-structured data • Some data does not lend itself to a relational schema • Common pattern is to use Amazon EMR to: • Impose structure • Import into Amazon Redshift • Other solutions are often home-grown scripting applications
  • 10. Loading data • Load to an empty Amazon Redshift database • Load changes captured in the source system to Amazon Redshift
  • 11. Truncate and load This is by far the easiest option: • Move the data to Amazon S3 • Multi-part upload • Import/export service • AWS Direct Connect • COPY the data into Amazon Redshift, a table at a time
  • 12. Load changes • Identify changes in source systems • Move data to Amazon S3 • Load changes: • ‘Upsert process’ • Partner ETL tools
  • 13. Partner ETL • Amazon Redshift is supported by a variety of ETL vendors • Many simplify the process of data loading • A variety of vendors offer a free trial of their products, allowing you to evaluate and choose the one that suits your needs • Visit http://aws.amazon.com/redshift/partners
  • 14. Upsert • The goal is to insert new rows into and update changed rows in Amazon Redshift • Load data into a temporary staging table • Join the staging table with production and delete the common rows • Copy the new data into the production table • See Updating and Inserting New Data in the Amazon Redshift Database Developer Guide
  • 15. COPY command • Set COMPUPDATE to ON when running on an empty table • Use the COPY command • Each slice can load one file at a time • Partition input files so all slices can load in parallel • Use a manifest file
  • 16. Use multiple input files to maximize throughput • Use the COPY command • Each slice can load one file at a time • A single input file means only one slice is ingesting data • Instead of 100 MB/s, you’re getting only 6.25 MB/s
  • 17. Use multiple input files to maximize throughput • Use the COPY command • You need at least as many input files as you have slices • With 16 input files, all slices are working so you maximize throughput • Get 100 MB/s per node; scale linearly as you add nodes
  • 18. Primary keys and manifest files • Amazon Redshift doesn’t enforce primary key constraints: • If you load data multiple times, Amazon Redshift won’t complain • If you declare primary keys in your data manipulation language (DML), the optimizer expects the data to be unique • Use manifest files to control exactly what is loaded and how to respond if input files are missing: • Define a JSON manifest on Amazon S3 • Ensures that the cluster loads exactly what you want
  • 19. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Kishore Raja VP, Strategy Boingo Wireless October 7, 2015 | Las Vegas, NV TCO and ROI for Migrating from Enterprise Database to Amazon Redshift ISM303
  • 20. - Data Architecture - Success Criteria - Solutions Evaluated - Additional Benefits - Big data Agility - Summary Agenda
  • 21. 90+ M Ad engagements/year 100 Operator partners 100+ Countries 6 Continents Media Largest ad network Engaging mobile audiences via Wi-Fi Wi-Fi Largest operator of airport wireless networks in the world DAS Largest operator of independent indoor cellular networks in the U.S. Broadband Largest provider of wireless high-speed Internet & TV for the military 1 Million+ Hotspots Nearly 2000 Commercial locations 19 DAS Locations Boingo: Reaching 1 Billion Consumers Annually 100+ Worldwide
  • 22. Boingo on AWS S3 Datawarehouse Storage and Content Delivery Compute and Networking Database RDS Admin and Security Deployment App Services Amazon EC2 AMI Elastic IP VPC VPN Conn Gateway(s) Route 53 Route Table ELB Auto scaling ENI Lambda EBS Glacier CloudFront ElastiCache MySQL DB CloudWatch Trusted Advisor IAM CloudTrail Elastic Beanstalk CloudFormation OpsWorks MFA Token SQS SQS Oracle 11g(r2)
  • 23. Data Architecture SAP Data Services Eng data S3 Flat files Database Oracle RDS 11g(r2) Front end Visualization (Business Objects) 1. ETL 2. Data Storage 3. Reporting
  • 24. Issues • Data is growing which is making OLAP slow • Inefficient Row based approach (mostly) • Standard Oracle compression • Mediocre IOPS • Single DB server (no concurrency) • Not enough memory (64GB) • Administration – Partitioning – DB patches, updates, OS patches, updates – Maintenance (backup, snapshots, replication) – Recovery failure etc. • Expensive (license, hardware, support etc.)
  • 25. Success Criteria What do we need? • Memory (at least 256GB) • Parallel Processing • Plenty of IOPS • Less Administration • Low TCO Growth rate: • Currently at 15TB • 2-3TB average growth per year Nice to have • Ingest any data type/store • Realtime Streaming analysis • Massive Parallel Processing • Scale (up or down) • Integrate any (& every) database • Multiple levels of Security • Smart Alerts and Monitoring • Cost Effective • Lesser (or zero) CAPEX • Keep up with Industry Security/Compliances • Automated audit reporting
  • 27. AWS Data Solutions • Oracle • SQL Server • PostgreSQL • MySQL • Aurora (MySQL compatible) • Small and large scale non-RDS • Schemaless • Using open source memcached/Redis • Works on any database • Datawarehouse • Petabyte scale • Massive Parallel processing RDS NoSQL In Memory DataWarehouse Redshift Fully Managed, No CAPEX, Highly secure, Scalable • DAT202: Understanding Database Options on AWS (Wednesday, Oct 7, 11:00 AM - 12:00 PM, San Polo 3501B) • DAT302 - Relational Database Management Systems in the Cloud: Deploying SQL Server on AWS (Thursday, Oct 8, 5:30 PM - 6:30 PM, San Polo 3501B) • DAT303: Oracle on AWS and Amazon RDS: Secure, Fast and Scalable (Friday, Oct 9 9:00-10AM, Delfino 4102)
  • 28. Redshift TCO EaaS Eng. Data S3 Flat files Redshift Datawarehouse Front end Visualization (Business Objects) 1. ETL 2. Data storage 3. BI reports - Cluster of 50 DB servers - 100 CPU cores - 8TB SSD storage - 750GB Memory - Self organizing Cluster(s) - 160GB increments Annual Cost: $48,500Annual Cost: ~ $6,500 Annual Cost: ~ $55,000 Database installs, patches, OS installs, patches, backup, replication, server maintenance, scaling, security etc. Managed Service
  • 30. Performance Results 7,200 2,700 15 15 Query Performance Data Load Performance 1 year of data 1 million records Latencyinseconds RedshiftExisting System 7,200 55,000 6500 Existing System Redshift ETLannualcost ETL
  • 31. Migration and Ease Of Use Database installs, patches, OS installs, patches, backup, replication, server maintenance, scaling, security etc. Administration and Support 0 1 2 3 4 Other Systems Redshift Migration Time (in months) 2 4
  • 32. TCO Estimated Cluster - Cluster of 50 DB servers - 100 CPU cores - 8TB SSD storage - 750GB Memory - Self organizing Cluster(s) - 160GB increments Actual Cluster $48,500 $12,000 Savings: • 40% for upto 1 year term • 60% for upto 3 year term Options: • No upfront 20% * • Partial upfront 41% - 73% • All upfront 42% - 76% Cancellation: • Full refund within 7 days * • Prorated refund within 30 days * • Prorated refund within 90 days Talend ($6500) * For 1 year term RI Python Scripts ($0) Elasticity Reserved Instances ETL - ISM208 - The Science of Saving with AWS Reserved Instances (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Delfino 4105)
  • 33. 3. Subnets Additional Benefits 1. Access Control • “Deny All” DB cluster • Firewall rules • IAM management 2. VPC • BYOIP • Ingress access • Extend to corporate data center Cloud • MFA • Encryption • Transit : SSL with TLS v1.2 • Storage : Encryption at rest • Further isolation inside VPC • IAM management • SEC302 - IAM Best Practices to Live By (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Palazzo K) • NET201 - Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Titian 2201B) • ARC403 - From One to Many: Evolving VPC Design (Wednesday, Oct 7, 2:45 PM - 3:45 PM, Palazzo N)
  • 34. AES 256-bit AES 256-bit AES 256-bit AES 256-bit AES 256-bit AES 256-bit AES 256-bit AES 256-bit AES 256-bit AES 256-bit Database Key Cluster Master Key Customer Master Key HSM (Data center) Advanced Encryption
  • 35. Monitoring and Alerts Intrusion Detection • DDoS • MiTM • IP Spoofing • Packet Sniffing • Port Monitoring Service • DVO303 - Scaling Infrastructure Operations with AWS Service Catalog, AWS Config, and AWS CloudTrail (Friday, Oct 9, 9:00 AM - 10:00 AM, Lido 3001B) • ARC302 - Running Lean Architectures: How to Optimize for Cost Efficiency (Friday, Oct 9, 9:00 AM - 10:00 AM, Palazzo K)
  • 36. Big Data Agility Production Datawarehouse - Cluster of 50 DB servers - 100 CPU cores - 8TB SSD storage - 750GB Memory - Self organizing Cluster(s) - 160GB increments Backup QA Cluster Predictive Analysis/Adhoc Cluster Performance Cluster < 30mins < 5/hour < $5/hour < $5/hour DAT311 - Large-Scale Genomic Analysis with Amazon Redshift (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Lando 4306) DAT308 - How Yahoo! Analyzes Billions of Events a Day on Amazon Redshift (Thursday, Oct 8, 4:15 PM - 5:15 PM, Palazzo C) BDT401 - Amazon Redshift Deep Dive: Tuning and Best Practices (Thursday, Oct 8, 2:45 PM - 3:45 PM, Marcello 4506)
  • 37. Summary • (Very) Cost Efficient • (Highly) Secure (Enterprise grade Encryption) • Managed service (Administration) • Quick(er) Migration time • 167+ Security and Compliancy features • Proved to work (NASDAQ, NASA, Financial Times, Pinterest etc.) • Faster with better performance • Future proof (Ecosystem, security, new services etc.) • 2+ years on AWS • Ease of use ROI
  • 38. Related Sessions • DAT311 - Large-Scale Genomic Analysis with Amazon Redshift (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Lando 4306) • DAT308 - How Yahoo! Analyzes Billions of Events a Day on Amazon Redshift (Thursday, Oct 8, 4:15 PM - 5:15 PM, Palazzo C) • BDT401 - Amazon Redshift Deep Dive: Tuning and Best Practices (Thursday, Oct 8, 2:45 PM - 3:45 PM, Marcello 4506) • DAT202: Understanding Database Options on AWS (Wednesday, Oct 7, 11:00 AM - 12:00 PM, San Polo 3501B) • DAT302 - Relational Database Management Systems in the Cloud: Deploying SQL Server on AWS (Thursday, Oct 8, 5:30 PM - 6:30 PM, San Polo 3501B) • DAT303: Oracle on AWS and Amazon RDS: Secure, Fast and Scalable (Friday, Oct 9 9:00-10AM, Delfino 4102) • SEC302 - IAM Best Practices to Live By (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Palazzo K) • NET201 - Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Titian 2201B) • ARC403 - From One to Many: Evolving VPC Design (Wednesday, Oct 7, 2:45 PM - 3:45 PM, Palazzo N) • DVO303 - Scaling Infrastructure Operations with AWS Service Catalog, AWS Config, and AWS CloudTrail (Friday, Oct 9, 9:00 AM - 10:00 AM, Lido 3001B) • ISM208 - The Science of Saving with AWS Reserved Instances (Wednesday, Oct 7, 1:30 PM - 2:30 PM, Delfino 4105) • ARC302 - Running Lean Architectures: How to Optimize for Cost Efficiency (Friday, Oct 9, 9:00 AM - 10:00 AM, Palazzo K) RedshiftDatabasesInfrastructureCost
  • 39. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ajit Zadgaonkar, Executive Director October 2015 Migration to Amazon Redshift Edmunds.com
  • 40. 18 MILLION Monthly Visitors
  • 41. OF CAR BUYERS INFLUENCED BY EDMUNDS.COM 59% *R. L. Polk & Co.
  • 42.
  • 43. Edmunds.com • 18M unique visitors a month • 200M+ page views a month • Over 10k dealer partners • 14k+ API users • Over 6M automotive inventory • Over 1M content pages • Lots and lots of data • Continuously growing data • 24x7 real-time BI • DWH in Amazon Redshift • 32-node cluster
  • 44. From unsustainable, painful operations to: • Efficient, cost-effective cluster • Squeak-free operations • Happy customers • Cost reduction (new system costs 1/5 of the old one) Improvement
  • 45. Challenges • Painfully slow queries • High system resource utilization • Slow data loading • Timeouts ! • …all in all, we were running into HUGE PROBLEMS
  • 46. Lessons learned • Know the system, the strengths, and the limitations • Understand the end-to-end usage scenario • Design the processes following Best Practices • Invest in real-time monitoring • Lift and shift may not be the best choice • Let Enterprise Support and TAMs be your partners • Monitor, monitor, and trend
  • 47. The System, the infrastructure • Syntactical differences (i.e., PostgreSQL 7 vs. PostgreSQL 8) • Architectural choices (i.e., columnar database) • Transaction processing • Historical data analysis, business intelligence • Node type, cluster size • Shared infrastructure vs. dedicated throughput • The larger the cluster, the bigger the resizing effort
  • 48. Make the up-front investment: Design • Select the right sort key • Timestamp, range filtering on column name, joins • Compound sort key, interleaved sort key • Measure query performance, system load, and vacuum • Ensuring tables have a sort key alone helped us gain 20% performance • Over 50% of our tables did not have a sort key • Ensuring that the right sort key is assigned is the path to winning
  • 49. Make the upfront investment: Use cases • Select the right distribution style • Locate data faster • Uniform load • Less data movement • A good distribution style ensures a healthy system • Many of our tables did not have the right distribution style
  • 50. Queries • Select * is #1 performance killer • Use WHERE clause on the primary sort column • Watch out for queries that create “temporary tables” • Long-running queries might impact downstream services • Define constraints
  • 51. VACUUM • Run VACUUM frequently • Run right after loading data • Monitor vacuum time
  • 52. Data loading • Load data in sort key order • Load using multiple files (1 MB to 1 GB) • #files: Multiples of slices in cluster • Use compression • Use single COPY command • S3 is your best friend
  • 53. A closer look • Each node is split into slices • One slice per core • Each slice is allocated memory, CPU, and disk space • Each slice processes a piece of the workload in parallel
  • 56. Monitoring • Console/Amazon CloudWatch monitoring • CPU, memory, processes • Data distribution across slices • Space used per table • WLM query count, queue wait time, execution time • Commit stats, top time-consuming queries
  • 57. In closing • Amazon Redshift is a great data warehousing platform • Parting advice: Make investment in Best Practices • Check out Redshift Utils