Migrate Your Data Warehouse to Amazon
Redshift
Greg Khairallah, Business Development Manager, AWS
David Giffin, VP Technology, TrueCar
Sharat Nair, Director of Data, TrueCar
Blagoy Kaloferov, Data Engineer, TrueCar
September 21, 2016
Agenda
• Motivation for Change and Migration
• Migration patterns and Best Practices
• AWS Database Migration Service
• Use Case – TrueCar
• Questions and Answers
Relational data warehouse
Massively parallel; petabyte scale
Fully managed
HDD and SSD platforms
$1,000/TB/year; starts at $0.25/hour
Amazon
Redshift
a lot faster
a lot simpler
a lot cheaper
Amazon Redshift delivers performance
“[Amazon] Redshift is twenty times faster than Hive.” (5x–20x reduction in query times)
“Queries that used to take hours came back in seconds. Our analysts are orders of magnitude more productive.” (20x–40x reduction in query times)
“…[Amazon Redshift] performance has blown away everyone here (we generally see 50–100x speedup over Hive).”
“Team played with [Amazon] Redshift today and concluded it is awesome. Un-indexed complex queries returning in < 10s.”
“Did I mention it's ridiculously fast? We'll be using it immediately to provide our analysts an alternative to Hadoop.”
“We saw… 2x improvement in query times.”
Channel “We regularly process multibillion row datasets and we do that in a matter of hours.”
Amazon Redshift is cost optimized
DS2 (HDD), DS2.XLarge single node
                      Price per hour    Effective annual price per TB (compressed)
On-Demand             $0.850            $3,725
1 Year Reservation    $0.500            $2,190
3 Year Reservation    $0.228            $999

DC1 (SSD), DC1.Large single node
                      Price per hour    Effective annual price per TB (compressed)
On-Demand             $0.250            $13,690
1 Year Reservation    $0.161            $8,795
3 Year Reservation    $0.100            $5,500
Pricing is simple
Number of nodes x price/hour
No charge for leader node
No up front costs
Pay as you go
Prices shown for US East
Other regions may vary
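The pricing rule above (number of nodes x price/hour) and the "effective annual price per TB" column can be approximated with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, a year is taken as 8,760 hours, and the per-node storage figures (2 TB for DS2.XLarge, 0.16 TB for DC1.Large) come from the node specifications later in this deck. Because the hourly rates on the slide are rounded, the results land close to, but not exactly on, the slide's annual figures.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def annual_cluster_cost(nodes: int, price_per_hour: float) -> float:
    """Number of nodes x price/hour, for a cluster that runs all year.

    The leader node is not counted because it is not charged.
    """
    return nodes * price_per_hour * HOURS_PER_YEAR


def annual_price_per_tb(price_per_hour: float, tb_per_node: float) -> float:
    """Approximate effective annual price per compressed TB for one node."""
    return price_per_hour * HOURS_PER_YEAR / tb_per_node


if __name__ == "__main__":
    # DS2.XLarge, 1 Year Reservation: 0.500 * 8760 / 2 TB
    print(annual_price_per_tb(0.500, 2.0))
    # DC1.Large, On-Demand: 0.250 * 8760 / 0.16 TB
    print(annual_price_per_tb(0.250, 0.16))
    # A hypothetical 4-node DS2.XLarge On-Demand cluster
    print(annual_cluster_cost(4, 0.850))
```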
Considerations Before You Migrate
• Data is often being loaded into another warehouse
– existing ETL process with investment in code and process
• Temptation is to ‘lift & shift’ workload.
• Resist temptation. Instead consider:
– What do I really want to do?
– What do I need?
• Some data does not lend itself to a relational schema
• Common pattern is to use Amazon EMR:
– impose structure
– import into Amazon Redshift
Amazon Redshift architecture
• Leader Node
Simple SQL end point
Stores metadata
Optimizes query plan
Coordinates query execution
• Compute Nodes
Local columnar storage
Parallel/distributed execution of all queries, loads,
backups, restores, resizes
• Start at just $0.25/hour, grow to 2 PB
(compressed)
DC1: SSD; scale from 160 GB to 326 TB
DS2: HDD; scale from 2 TB to 2 PB
[Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore]
A deeper look at compute node architecture
Leader Node
Dense compute nodes
Large
• 2 slices/cores
• 15 GB RAM
• 160 GB SSD
8XL
• 32 slices/cores
• 244 GB RAM
• 2.56 TB SSD
Dense storage nodes
X-large
• 2 slices/4 cores
• 31 GB RAM
• 2 TB HDD
8XL
• 16 slices/ 36 cores
• 244 GB RAM
• 16 TB HDD
Amazon Redshift Migration Overview
[Diagram: Corporate data center (logs / files, source DBs) and AWS Cloud (Amazon DynamoDB, Amazon S3, Amazon Elastic MapReduce, Amazon RDS, Amazon Redshift, Amazon Glacier; label: Data Volume). Transfer paths and tools shown: VPN Connection, AWS Direct Connect, S3 Multipart Upload, Amazon Snowball, EC2 or On-Prem (using SSH), Database Migration Service, Kinesis, AWS Lambda, AWS Data Pipeline.]
Uploading Files to Amazon S3
• Ensure that your data resides in the same Region as your Redshift clusters
• Split the data into multiple files to facilitate parallel processing
• Optionally, you can encrypt your data using Amazon S3 Server-Side or Client-Side Encryption
• Files should be individually compressed using GZIP or LZOP
[Diagram: Client.txt in the corporate data center is split into Client.txt.1, Client.txt.2, Client.txt.3, and Client.txt.4, uploaded to mydata in the Region, and loaded into Amazon Redshift.]
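One way to prepare a file as described above is to split it by line and compress each part on its own. The following is a minimal sketch, not a prescribed tool: `split_and_gzip` is a hypothetical helper, the round-robin split and the `.N.gz` naming are assumptions, and it presumes newline-delimited rows so that no row is cut across two files.

```python
import gzip
from pathlib import Path
from typing import List


def split_and_gzip(src: Path, parts: int, out_dir: Path) -> List[Path]:
    """Split a newline-delimited file into `parts` gzip files.

    Lines are dealt out round-robin, so every row stays whole and the
    parts end up roughly equal in size.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"{src.name}.{i}.gz" for i in range(1, parts + 1)]
    writers = [gzip.open(p, "wt", encoding="utf-8") for p in paths]
    try:
        with src.open("r", encoding="utf-8") as f:
            for n, line in enumerate(f):
                writers[n % parts].write(line)
    finally:
        for w in writers:
            w.close()
    return paths
```

For example, `split_and_gzip(Path("Client.txt"), 4, Path("out"))` would write `out/Client.txt.1.gz` through `out/Client.txt.4.gz`, which can then be uploaded to S3 in the same Region as the cluster.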
Loading – Use multiple input files to maximize throughput
• Use the COPY command
• Each slice can load one file at a time
• A single input file means only one slice is ingesting data
• Instead of 100MB/s, you’re only getting 6.25MB/s
• You need at least as many input files as you have slices
• With 16 input files, all slices are working so you maximize throughput
• Get 100MB/s per node; scale linearly as you add nodes
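The figures above follow from a simple linear model: a 16-slice node that ingests 100MB/s when every slice is busy gets 100 / 16 = 6.25MB/s from each slice. The sketch below only restates that arithmetic; `estimated_load_rate` is a hypothetical name, the defaults are the slide's numbers, and real load rates depend on node type, file sizes, and data.

```python
def estimated_load_rate(
    input_files: int,
    nodes: int = 1,
    slices_per_node: int = 16,
    node_rate_mb_s: float = 100.0,
) -> float:
    """Estimate COPY throughput in MB/s under a linear per-slice model.

    Each slice loads one file at a time, so the number of busy slices
    is capped by both the file count and the total slice count.
    """
    total_slices = nodes * slices_per_node
    per_slice_rate = node_rate_mb_s / slices_per_node
    busy_slices = min(input_files, total_slices)
    return busy_slices * per_slice_rate


if __name__ == "__main__":
    print(estimated_load_rate(1))             # one file, one busy slice
    print(estimated_load_rate(16))            # all 16 slices busy
    print(estimated_load_rate(32, nodes=2))   # two nodes, 32 slices busy
```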
Loading Data with Manifest Files
• Use a manifest to load all required files
• Supply JSON-formatted text file that lists the files to be loaded
• Can load files from different buckets or with different prefix
{
"entries": [
{"url":"s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true},
{"url":"s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},
{"url":"s3://mybucket-beta/2013-10-04-custdata", "mandatory":true},
{"url":"s3://mybucket-beta/2013-10-05-custdata", "mandatory":true}
]
}
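A manifest like the one above is plain JSON, so it can be generated rather than written by hand. A minimal sketch, assuming a hypothetical helper named `build_manifest` and reusing the bucket and key names from the slide:

```python
import json
from typing import Iterable


def build_manifest(urls: Iterable[str], mandatory: bool = True) -> str:
    """Return a COPY manifest listing every S3 object to load."""
    entries = [{"url": url, "mandatory": mandatory} for url in urls]
    return json.dumps({"entries": entries}, indent=2)


if __name__ == "__main__":
    urls = [
        f"s3://{bucket}/{day}-custdata"
        for bucket in ("mybucket-alpha", "mybucket-beta")
        for day in ("2013-10-04", "2013-10-05")
    ]
    print(build_manifest(urls))
```

The resulting text would be uploaded to S3 and referenced by a COPY command that uses the MANIFEST option.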
Redshift COPY Command
• Loads data into a table from data files in S3 or from an Amazon DynamoDB table.
• The COPY command requires only three parameters:
– Table name
– Data Source
– Credentials
COPY table_name FROM data_source CREDENTIALS
'aws_access_credentials'
• Optional Parameters include:
– Column mapping options – mapping source to target
– Data Format Parameters – FORMAT, CSV, DELIMITER, FIXEDWIDTH, AVRO, JSON,
BZIP2, GZIP, LZOP
– Data Conversion Parameters – Data type conversion between source and target
– Data Load Operations – troubleshoot load times or reduce load times with parameters like
COMPROWS, COMPUPDATE, MAXERROR, NOLOAD, STATUPDATE
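To make the shape of the command concrete, the sketch below assembles a COPY statement from the three required parameters plus optional ones. It only builds a string; `build_copy` is a hypothetical helper, the table name, manifest key, and credentials placeholder are illustrative, and the values are interpolated without escaping, so it assumes trusted input.

```python
from typing import Sequence


def build_copy(
    table: str,
    source: str,
    credentials: str,
    options: Sequence[str] = (),
) -> str:
    """Assemble a COPY statement; options are appended as written."""
    parts = [
        f"COPY {table}",
        f"FROM '{source}'",
        f"CREDENTIALS '{credentials}'",
    ]
    parts.extend(options)
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    print(
        build_copy(
            "venue",                                # hypothetical target table
            "s3://mybucket-alpha/venue.manifest",   # hypothetical manifest key
            "<aws_access_credentials>",             # placeholder, not a real value
            ["MANIFEST", "GZIP", "DELIMITER '|'", "MAXERROR 10"],
        )
    )
```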
Loading JSON Data
• COPY uses a jsonpaths text file to parse JSON data
• JSONPath expressions specify the path to JSON name elements
• Each JSONPath expression corresponds to a column in the Amazon
Redshift target table
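The column mapping described above can be imitated locally to check a jsonpaths file before running a load. This is a rough sketch, not how COPY is implemented: `extract` and `to_row` are hypothetical helpers, only bracket notation such as `$['location'][0]` is handled, and the record is a VENUE-style sample.

```python
import json
import re
from typing import Any, List

# Matches one path step: ['name'] or [index]
_STEP = re.compile(r"\['([^']+)'\]|\[(\d+)\]")


def extract(record: Any, path: str) -> Any:
    """Evaluate a bracket-notation JSONPath against a parsed JSON value."""
    if not path.startswith("$"):
        raise ValueError(f"JSONPath must start with '$': {path}")
    value = record
    for name, index in _STEP.findall(path[1:]):
        value = value[name] if name else value[int(index)]
    return value


def to_row(record: dict, jsonpaths: dict) -> List[Any]:
    """Produce one value per target column, in jsonpaths order."""
    return [extract(record, p) for p in jsonpaths["jsonpaths"]]


if __name__ == "__main__":
    paths = json.loads(
        """{"jsonpaths": ["$['id']", "$['name']", "$['location'][0]",
                           "$['location'][1]", "$['seats']"]}"""
    )
    venue = {"id": 15, "name": "Gillette Stadium",
             "location": ["Foxborough", "MA"], "seats": 68756}
    print(to_row(venue, paths))
    # prints [15, 'Gillette Stadium', 'Foxborough', 'MA', 68756]
```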
Suppose you want to load the VENUE table with the following content:
{ "id": 15, "name": "Gillette Stadium", "location": [ "Foxborough", "MA" ], "seats": 68756 }
{ "id": 15, "name": "McAfee Coliseum", "location": [ "Oakland", "MA" ], "seats": 63026 }
You would use the following jsonpaths file to parse the JSON data:
{ "jsonpaths": [ "$['id']", "$['name']", "$['location'][0]", "$['location'][1]", "$['seats']" ] }
Loading Data in Avro Format
• Avro is a data serialization protocol. An Avro source file includes a schema that defines the structure
of the data. The Avro schema type must be record.
• COPY uses an avro_option to parse Avro data. Valid values for avro_option are as follows:
– 'auto' (default) - COPY automatically maps the data elements in the Avro source data to the
columns in the target table by matching field names in the Avro schema to column names in
the target table.
– 's3://jsonpaths_file' - To explicitly map Avro data elements to columns, you can use a
JSONPaths file.
Avro Schema
{
  "name": "person",
  "type": "record",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "guid", "type": "string"},
    {"name": "name", "type": "string"},
    {"name": "address", "type": "string"}
  ]
}
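Before relying on 'auto', it can help to confirm that the schema type is record and that every field has a matching target column. The sketch below checks both with the standard library only. The helper names and the column list are hypothetical, and it assumes exact, case-sensitive name matching.

```python
import json
from typing import Iterable, List

PERSON_SCHEMA = """
{
  "name": "person",
  "type": "record",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "guid", "type": "string"},
    {"name": "name", "type": "string"},
    {"name": "address", "type": "string"}
  ]
}
"""


def avro_field_names(schema_json: str) -> List[str]:
    """Return the field names of an Avro record schema, in order."""
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("The Avro schema type must be record")
    return [field["name"] for field in schema["fields"]]


def unmatched_fields(schema_json: str, table_columns: Iterable[str]) -> List[str]:
    """List schema fields that have no identically named target column."""
    columns = set(table_columns)
    return [n for n in avro_field_names(schema_json) if n not in columns]


if __name__ == "__main__":
    # Hypothetical target table that lacks an "address" column
    print(unmatched_fields(PERSON_SCHEMA, ["id", "guid", "name"]))
```

Fields reported here are candidates for an explicit JSONPaths mapping instead of 'auto'.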
Amazon Kinesis Firehose
Load massive volumes of streaming data into Amazon S3, Redshift
and Elasticsearch
• Zero administration: Capture and deliver streaming data into Amazon S3, Amazon
Redshift, and other destinations without writing an application or managing
infrastructure.
• Direct-to-data store integration: Batch, compress, and encrypt streaming data for
delivery into data destinations in as little as 60 secs using simple configurations.
• Seamless elasticity: Scales to match data throughput without intervention
Capture and submit streaming
data
Analyze streaming data using your
favorite BI tools
Firehose loads streaming data continuously
into Amazon S3, Redshift and Elasticsearch
Best Practices for Loading Data
• Use a COPY Command to load data
• Use a single COPY command per table
• Split your data into multiple files
• Compress your data files with GZIP or LZOP
• Use multi-row inserts whenever possible
• Bulk insert operations (INSERT INTO…SELECT and CREATE TABLE AS)
provide high performance data insertion
• Use Amazon Kinesis Firehose for Streaming Data direct load to S3 and/or
Redshift
Best Practices for Loading Data Continued
• Load your data in sort key order to avoid needing to vacuum
• Organize your data as a sequence of time-series tables, where each table is
identical but contains data for different time ranges
• Use staging tables to perform an upsert
• Run the VACUUM command whenever you add, delete, or modify a large
number of rows, unless you load your data in sort key order
• Increase the memory available to a COPY or VACUUM by increasing
wlm_query_slot_count
• Run the ANALYZE command whenever you’ve made a non-trivial number of
changes to your data to ensure your table statistics are current
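The maintenance guidance above reduces to a short decision: skip VACUUM when the data was loaded in sort key order, always refresh statistics, and optionally claim more query slots first. A minimal sketch that only produces the statements to run; `post_load_statements` and the table name are hypothetical, and whether to raise `wlm_query_slot_count` depends on your own WLM configuration.

```python
from typing import List


def post_load_statements(
    table: str,
    loaded_in_sort_key_order: bool,
    slot_count: int = 1,
) -> List[str]:
    """Return the maintenance statements to run after a large load."""
    statements = []
    if slot_count > 1:
        # Give the following statements more memory in this session
        statements.append(f"SET wlm_query_slot_count TO {slot_count};")
    if not loaded_in_sort_key_order:
        # Rows arrived out of order, so re-sort and reclaim space
        statements.append(f"VACUUM {table};")
    # Keep table statistics current for the query planner
    statements.append(f"ANALYZE {table};")
    return statements


if __name__ == "__main__":
    for stmt in post_load_statements("f_sales", loaded_in_sort_key_order=False, slot_count=3):
        print(stmt)
```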
Amazon Partner ETL
• Amazon Redshift is supported by a variety of ETL vendors
• Many simplify the process of data loading
• Visit http://aws.amazon.com/redshift/partners
• There are a variety of vendors offering a free trial of their products, allowing you
to evaluate and choose the one that suits your needs.
AWS Database Migration Service (DMS)
Benefits:
• Start your first migration in 10 minutes or less
• Keep your apps running during the migration
• Replicate within, to, or from Amazon EC2 or RDS
• Move data from commercial database engines to
open source engines
• Or…move data to the same database engine
• Consolidate databases and/or tables
Sources and Targets for AWS DMS
Sources:
On-premises and Amazon EC2 instance databases:
• Oracle Database 10g – 12c
• Microsoft SQL Server 2005 – 2014
• MySQL 5.5 – 5.7
• MariaDB (MySQL-compatible data source)
• PostgreSQL 9.4 – 9.5
• SAP ASE 15.7+
RDS instance databases:
• Oracle Database 11g – 12c
• Microsoft SQL Server 2008R2 - 2014. CDC operations
are not supported yet.
• MySQL versions 5.5 – 5.7
• MariaDB (MySQL-compatible data source)
• PostgreSQL 9.4 – 9.5. CDC operations are not
supported yet.
• Amazon Aurora (MySQL-compatible data source)
Targets:
On-premises and EC2 instance databases:
• Oracle Database 10g – 12c
• Microsoft SQL Server 2005 – 2014
• MySQL 5.5 – 5.7
• MariaDB (MySQL-compatible data source)
• PostgreSQL 9.3 – 9.5
• SAP ASE 15.7+
RDS instance databases:
• Oracle Database 11g – 12c
• Microsoft SQL Server 2008 R2 - 2014
• MySQL 5.5 – 5.7
• MariaDB (MySQL-compatible data source)
• PostgreSQL 9.3 – 9.5
• Amazon Aurora (MySQL-compatible data source)
Amazon Redshift
AWS Database Migration Service Pricing
• T2 for developing and periodic data migration tasks
• C4 for large databases and minimizing time
• T2 pricing starts at $0.018 per hour for T2.micro
• C4 pricing starts at $0.154 per hour for C4.large
• 50 GB GP2 storage included with T2 instances
• 100 GB GP2 storage included with C4 instances
• Data transfer inbound and within AZ is free
• Data transfer across AZs starts at $0.01 per GB
AWS Schema Conversion Tool
Resources on the AWS Big Data Blog
• Best Practices for Micro-Batch Loading on Amazon Redshift
• Using Attunity Cloudbeam at UMUC to Replicate Data to Amazon RDS and
Amazon Redshift
• A Zero-Administration Amazon Redshift Database Loader
• Best Practices for Designing Tables
• Best Practices for Designing Queries
• Best Practices for Loading Data
Best Practices References
This Is The Presentation Title Entered In Master Slide Footer Area
Amazon Redshift at TrueCar
Sep 21, 2016
About us
● About TrueCar
● David Giffin – VP Technology
● Sharat Nair – Director of Data
● Blagoy Kaloferov – Data Engineer
Agenda
● Amazon Redshift use case overview
● Architecture and migration process
● Tips and lessons learned
Amazon Redshift at TrueCar
Amazon Redshift at TrueCar
● Datasets that flow into Amazon Redshift
  ● Clickstream
  ● Transaction
  ● Sales
  ● Inventory
  ● Dealer
  ● Leads
● How we do analytics and reporting
  ● Redshift is our data store for BI tools and ad hoc queries
  ● Data that is loaded into Amazon Redshift is already processed
Architecture
[Diagram: Leads, Dealer, Transactions, Sales, Inventory, and Clickstream data; ETL (MR, Hive, Pig, Oozie, Talend); HDFS and Postgres; stages labeled Staging, DWH ETL, and Data Processing.]
Architecture
[Diagram: Leads, Dealer, Transactions, Sales, Inventory, and Clickstream data; ETL (MR, Hive, Pig, Oozie, Talend); HDFS and Postgres; Staging, DWH ETL; S3 and a loading utility; Amazon Redshift; MSTR, Tableau, and Ad Hoc; stages labeled Data Processing and Reporting.]
Schema design
● Schemas
  ● Our datasets are in a read-only schema for ad hoc and scheduled reporting
  ● Ad hoc and user tables are in separate schemas
  ● Makes it easy to separate final data from user-created data
● Simple table naming conventions
  ● F_ - facts
  ● D_ - dimensions
  ● AGG_ - aggregates
  ● V_ - views
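A prefix convention like F_, D_, AGG_, and V_ is easy to check mechanically, for instance when auditing what lives in a schema. The sketch below is illustrative only: `classify` and the sample table names are hypothetical, and matching is made case-insensitive as an assumption.

```python
# Prefixes from the naming convention, mapped to the kind of object they mark
PREFIXES = {
    "F_": "fact",
    "D_": "dimension",
    "AGG_": "aggregate",
    "V_": "view",
}


def classify(table_name: str) -> str:
    """Return the object kind implied by a table name's prefix."""
    upper = table_name.upper()
    for prefix, kind in PREFIXES.items():
        if upper.startswith(prefix):
            return kind
    return "unclassified"


if __name__ == "__main__":
    for name in ("F_SALES", "d_dealer", "AGG_DAILY_LEADS", "V_INVENTORY", "scratch"):
        print(name, "->", classify(name))
```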
Amazon Redshift learnings
Redshift loading process
● ETL is orchestrated through Talend and Oozie
● Processing tools: Talend, Hive, Pig, and MapReduce pushing data into HDFS and S3
● We built our own Amazon Redshift loading utility
● Handles all loading use cases:
  ● Load
  ● TruncateLoad
  ● DeleteAppend
  ● Upsert
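The deck does not show the loading utility itself, so the following is only a guess at what each load mode might translate to in SQL, not TrueCar's implementation. `load_statements`, the `_stage` suffix, the `key` column, and the reading of DeleteAppend as "delete rows matching a predicate, then append" are all assumptions. The Upsert branch follows the staging-table pattern: load into a temporary copy of the target, delete matching rows from the target, then insert the staged rows.

```python
from typing import List, Optional


def load_statements(
    mode: str,
    target: str,
    copy_template: str,
    key: str = "id",
    where: Optional[str] = None,
) -> List[str]:
    """Return the SQL statements for one load mode.

    `copy_template` is a COPY statement containing a {table} placeholder.
    """
    copy_target = copy_template.format(table=target)
    if mode == "Load":
        return [copy_target]
    if mode == "TruncateLoad":
        return [f"TRUNCATE {target};", copy_target]
    if mode == "DeleteAppend":
        if where is None:
            raise ValueError("DeleteAppend needs a WHERE predicate")
        return ["BEGIN;", f"DELETE FROM {target} WHERE {where};", copy_target, "END;"]
    if mode == "Upsert":
        stage = f"{target}_stage"
        return [
            "BEGIN;",
            f"CREATE TEMP TABLE {stage} (LIKE {target});",
            copy_template.format(table=stage),
            f"DELETE FROM {target} USING {stage} WHERE {target}.{key} = {stage}.{key};",
            f"INSERT INTO {target} SELECT * FROM {stage};",
            f"DROP TABLE {stage};",
            "END;",
        ]
    raise ValueError(f"Unknown load mode: {mode}")


if __name__ == "__main__":
    template = "COPY {table} FROM '<data_source>' CREDENTIALS '<aws_access_credentials>';"
    for stmt in load_statements("Upsert", "f_sales", template, key="sale_id"):
        print(stmt)
```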
Table design considerations
● Train developers on table design and Redshift best practices
● Column compression and encodings
  ● Analyze compression
  ● It makes a significant difference in space usage
● Sort and distribution keys
● Plan a workload management strategy
  ● As usage of the Redshift cluster grows, you need to ensure that critical jobs get bandwidth
Space considerations
● Retain pre-COPY data in S3
  ● It can easily be used by other tools (Spark, Pig, MapReduce)
● Offload historical datasets into separate tables on a rolling basis
● Pre-aggregate data when possible to reduce load on the system
Long-term usage tips
● Have a cluster resize strategy
● Use Reserved Instances for cost savings
● Plan on having enough space for long-term growth
● Plan your maintenance
  ● Vacuuming
● System tables are your friends
● Useful collection of utilities: https://github.com/awslabs/amazon-redshift-utils/
Thanks!
Questions?

[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介Amazon Web Services Japan
 
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹Amazon Web Services
 
Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...
Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...
Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...Amazon Web Services
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAmazon Web Services
 
Intro to database_services_fg_aws_summit_2014
Intro to database_services_fg_aws_summit_2014Intro to database_services_fg_aws_summit_2014
Intro to database_services_fg_aws_summit_2014Amazon Web Services LATAM
 
Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseAmazon Web Services
 
London Redshift Meetup - July 2017
London Redshift Meetup - July 2017London Redshift Meetup - July 2017
London Redshift Meetup - July 2017Pratim Das
 

Similar to Migrate Your Data Warehouse to Amazon Redshift (20)

(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
Amazon Redshift Deep Dive
Amazon Redshift Deep Dive Amazon Redshift Deep Dive
Amazon Redshift Deep Dive
 
AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Introduction to Database Services
Introduction to Database ServicesIntroduction to Database Services
Introduction to Database Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
AWS Analytics
AWS AnalyticsAWS Analytics
AWS Analytics
 
AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish...
AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish...AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish...
AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish...
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
 
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...
Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...
Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon Redshift
 
Intro to database_services_fg_aws_summit_2014
Intro to database_services_fg_aws_summit_2014Intro to database_services_fg_aws_summit_2014
Intro to database_services_fg_aws_summit_2014
 
Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data Warehouse
 
London Redshift Meetup - July 2017
London Redshift Meetup - July 2017London Redshift Meetup - July 2017
London Redshift Meetup - July 2017
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Recently uploaded (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Migrate Your Data Warehouse to Amazon Redshift

  • 1. Migrate Your Data Warehouse to Amazon Redshift
       Greg Khairallah, Business Development Manager, AWS
       David Giffin, VP Technology, TrueCar
       Sharat Nair, Director of Data, TrueCar
       Blagoy Kaloferov, Data Engineer, TrueCar
       September 21, 2016
  • 2. Agenda
       • Motivation for Change and Migration
       • Migration patterns and Best Practices
       • AWS Database Migration Service
       • Use Case – TrueCar
       • Questions and Answers
  • 3. Amazon Redshift: a lot faster, a lot simpler, a lot cheaper
       • Relational data warehouse
       • Massively parallel; petabyte scale
       • Fully managed
       • HDD and SSD platforms
       • $1,000/TB/year; starts at $0.25/hour
  • 4. Amazon Redshift delivers performance
       • “[Amazon] Redshift is twenty times faster than Hive.” (5x–20x reduction in query times)
       • “Queries that used to take hours came back in seconds. Our analysts are orders of magnitude more productive.” (20x–40x reduction in query times)
       • “…[Amazon Redshift] performance has blown away everyone here (we generally see 50–100x speedup over Hive).”
       • “Team played with [Amazon] Redshift today and concluded it is awesome. Un-indexed complex queries returning in < 10s.”
       • “Did I mention it's ridiculously fast? We'll be using it immediately to provide our analysts an alternative to Hadoop.”
       • “We saw… 2x improvement in query times.”
       • “We regularly process multibillion row datasets and we do that in a matter of hours.”
  • 5. Amazon Redshift is cost optimized
       DS2 (HDD)             Price per hour for          Effective annual price
                             DS2.XLarge single node      per TB (compressed)
       On-Demand             $0.850                      $3,725
       1 Year Reservation    $0.500                      $2,190
       3 Year Reservation    $0.228                      $999

       DC1 (SSD)             Price per hour for          Effective annual price
                             DC1.Large single node       per TB (compressed)
       On-Demand             $0.250                      $13,690
       1 Year Reservation    $0.161                      $8,795
       3 Year Reservation    $0.100                      $5,500

       Pricing is simple
       • Number of nodes x price/hour
       • No charge for leader node
       • No up-front costs
       • Pay as you go
       Prices shown for US East; other regions may vary.
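The pricing rule on this slide can be checked with a little arithmetic. The sketch below assumes the per-node storage sizes quoted on the compute node slide (2 TB for DS2.XLarge, 160 GB for DC1.Large) and a 365-day year; the function names are illustrative only. The slide's figures appear to be rounded, so some results differ from the table by a few dollars.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def effective_annual_price_per_tb(price_per_hour: float, tb_per_node: float) -> float:
    """Annual cost of one node divided by its compressed storage in TB."""
    return price_per_hour * HOURS_PER_YEAR / tb_per_node


def cluster_cost_per_hour(node_count: int, price_per_hour: float) -> float:
    """Number of compute nodes x price/hour; the leader node is not charged."""
    return node_count * price_per_hour


if __name__ == "__main__":
    # DS2.XLarge, 1 Year Reservation: 2 TB per node (assumed from the node specs).
    print(effective_annual_price_per_tb(0.500, 2.0))
    # DC1.Large, On-Demand: 0.16 TB per node (assumed from the node specs).
    print(effective_annual_price_per_tb(0.250, 0.16))
    # Hourly cost of a hypothetical four-node DC1.Large cluster.
    print(cluster_cost_per_hour(4, 0.250))
```

The DS2 one-year case reproduces the table's $2,190 exactly; the DC1 on-demand case comes out near $13,688, which the slide rounds to $13,690.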
  • 6. Considerations Before You Migrate
       • Data is often being loaded into another warehouse
         – existing ETL process with investment in code and process
       • Temptation is to ‘lift & shift’ workload.
       • Resist temptation. Instead consider:
         – What do I really want to do?
         – What do I need?
       • Some data does not lend itself to a relational schema
       • Common pattern is to use Amazon EMR:
         – impose structure
         – import into Amazon Redshift
  • 7. Amazon Redshift architecture
       • Leader Node
         – Simple SQL end point
         – Stores metadata
         – Optimizes query plan
         – Coordinates query execution
       • Compute Nodes
         – Local columnar storage
         – Parallel/distributed execution of all queries, loads, backups, restores, resizes
       • Start at just $0.25/hour, grow to 2 PB (compressed)
         – DC1: SSD; scale from 160 GB to 326 TB
         – DS2: HDD; scale from 2 TB to 2 PB
       Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion; Backup; Restore
  • 8. A deeper look at compute node architecture
       Diagram label: Leader Node

       Dense compute nodes
         Node       Slices/cores            RAM       Storage
         Large      2 slices/cores          15 GB     160 GB SSD
         8XL        32 slices/cores         244 GB    2.56 TB SSD

       Dense storage nodes
         Node       Slices/cores            RAM       Storage
         X-large    2 slices / 4 cores      31 GB     2 TB HDD
         8XL        16 slices / 36 cores    244 GB    16 TB HDD
  • 9. Amazon Redshift Migration Overview
       Diagram labels:
       • Corporate data center: logs / files; source DBs
       • Transfer options (by data volume): VPN Connection; AWS Direct Connect; S3 Multipart Upload; Amazon Snowball; EC2 or On-Prem (using SSH); Database Migration Service; Kinesis; AWS Lambda; AWS Data Pipeline
       • AWS Cloud: Amazon DynamoDB; Amazon S3; Amazon Elastic MapReduce; Amazon RDS; Amazon Redshift; Amazon Glacier
  • 10. Uploading Files to Amazon S3
       • Ensure that your data resides in the same Region as your Redshift clusters
       • Split the data into multiple files to facilitate parallel processing (for example, Client.txt becomes Client.txt.1, Client.txt.2, Client.txt.3, Client.txt.4)
       • Files should be individually compressed using GZIP or LZOP
       • Optionally, you can encrypt your data using Amazon S3 Server-Side or Client-Side Encryption
       Diagram labels: Corporate data center; Client.txt; Region; mydata; Amazon Redshift
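The split-and-compress step on this slide could be sketched as follows. This is a minimal illustration, not a prescribed tool: the function name and the `.N.gz` naming are assumptions, the file is read fully into memory for brevity, and the upload to Amazon S3 is left out.

```python
import gzip
from pathlib import Path
from typing import List


def split_and_compress(source: Path, parts: int, out_dir: Path) -> List[Path]:
    """Split a text file into `parts` contiguous chunks, each gzip-compressed.

    Output files are named <name>.1.gz, <name>.2.gz, ... (a hypothetical
    convention). Contiguous chunks preserve the original row order.
    """
    lines = source.read_text(encoding="utf-8").splitlines(keepends=True)
    chunk_size = -(-len(lines) // parts)  # ceiling division
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(parts):
        chunk = lines[i * chunk_size:(i + 1) * chunk_size]
        target = out_dir / f"{source.name}.{i + 1}.gz"
        # Each part is compressed individually, as the slide recommends.
        with gzip.open(target, "wt", encoding="utf-8") as handle:
            handle.writelines(chunk)
        written.append(target)
    return written
```

For example, `split_and_compress(Path("Client.txt"), 4, Path("out"))` would write `Client.txt.1.gz` through `Client.txt.4.gz` into `out/`. For very large extracts a streaming split would be preferable to reading the whole file.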
  • 11. Loading – Use multiple input files to maximize throughput
       • Use the COPY command
       • Each slice can load one file at a time
       • A single input file means only one slice is ingesting data
       • Instead of 100 MB/s, you’re only getting 6.25 MB/s
  • 12. Loading – Use multiple input files to maximize throughput
       • Use the COPY command
       • You need at least as many input files as you have slices
       • With 16 input files, all slices are working so you maximize throughput
       • Get 100 MB/s per node; scale linearly as you add nodes
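The numbers on these two loading slides follow from the slice counts on the compute node slide. The sketch below is a back-of-the-envelope helper under stated assumptions: the dictionary keys are illustrative labels for the four node sizes, and rounding the file count up to a multiple of the slice count is a refinement of the slide's "at least as many input files as slices", not something the slide itself states.

```python
# Slices per node as listed on the compute node slide; keys are illustrative labels.
SLICES_PER_NODE = {
    "dc1.large": 2,
    "dc1.8xlarge": 32,
    "ds2.xlarge": 2,
    "ds2.8xlarge": 16,
}


def total_slices(node_type: str, node_count: int) -> int:
    """Slices available for parallel loading across the whole cluster."""
    return SLICES_PER_NODE[node_type] * node_count


def recommended_file_count(node_type: str, node_count: int, planned_files: int) -> int:
    """Round a planned file count up to a multiple of the slice count."""
    slices = total_slices(node_type, node_count)
    batches = max(1, -(-planned_files // slices))  # ceiling division, at least one batch
    return batches * slices


def per_slice_throughput(node_mb_per_s: float, slices_per_node: int) -> float:
    """Load rate when only one slice of a node is busy."""
    return node_mb_per_s / slices_per_node
```

With one 16-slice node, `per_slice_throughput(100, 16)` gives the 6.25 MB/s quoted for a single input file, and `recommended_file_count("ds2.8xlarge", 1, 10)` rounds ten planned files up to 16 so that every slice has work.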
  • 13. Loading Data with Manifest Files
       • Use a manifest to load all required files
       • Supply a JSON-formatted text file that lists the files to be loaded
       • Can load files from different buckets or with different prefixes

       {
         "entries": [
           {"url":"s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true},
           {"url":"s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},
           {"url":"s3://mybucket-beta/2013-10-04-custdata", "mandatory":true},
           {"url":"s3://mybucket-beta/2013-10-05-custdata", "mandatory":true}
         ]
       }
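A manifest like the one on this slide is plain JSON, so it can be generated rather than written by hand. A minimal sketch follows; the function name is an assumption, and writing the result to S3 is left out.

```python
import json
from typing import Iterable


def build_manifest(urls: Iterable[str], mandatory: bool = True) -> str:
    """Return a COPY manifest document listing each S3 URL as one entry."""
    entries = [{"url": url, "mandatory": mandatory} for url in urls]
    return json.dumps({"entries": entries}, indent=2)


if __name__ == "__main__":
    # Bucket and object names are the ones shown on the slide.
    print(build_manifest([
        "s3://mybucket-alpha/2013-10-04-custdata",
        "s3://mybucket-beta/2013-10-04-custdata",
    ]))
```

Generating the manifest from the list of files actually uploaded keeps the two in step, which matters because every entry here is marked mandatory.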
  • 14. Redshift COPY Command
       • Loads data into a table from data files in Amazon S3 or from an Amazon DynamoDB table.
       • The COPY command requires only three parameters:
         – Table name
         – Data source
         – Credentials

         COPY table_name FROM data_source CREDENTIALS 'aws_access_credentials';

       • Optional parameters include:
         – Column mapping options: mapping source to target
         – Data format parameters: FORMAT, CSV, DELIMITER, FIXEDWIDTH, AVRO, JSON, BZIP2, GZIP, LZOP
         – Data conversion parameters: data type conversion between source and target
         – Data load operations: troubleshoot load times or reduce load times with parameters like COMPROWS, COMPUPDATE, MAXERROR, NOLOAD, STATUPDATE
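To make the three required parameters concrete, the sketch below assembles a COPY statement as a string. It is only an illustration: the helper name, the `venue.manifest` object, and the role ARN are hypothetical placeholders, and the helper does no SQL escaping beyond rejecting embedded single quotes, so it should not be fed untrusted input.

```python
from typing import Sequence


def build_copy_statement(
    table: str,
    data_source: str,
    credentials: str,
    options: Sequence[str] = (),
) -> str:
    """Compose a COPY statement from the three required parameters plus options."""
    for value in (data_source, credentials):
        if "'" in value:
            raise ValueError("single quotes are not supported in this sketch")
    parts = [
        f"COPY {table}",
        f"FROM '{data_source}'",
        f"CREDENTIALS '{credentials}'",
    ]
    parts.extend(options)  # e.g. MANIFEST, GZIP, DELIMITER '|'
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    print(build_copy_statement(
        "venue",
        "s3://mybucket-alpha/venue.manifest",  # hypothetical manifest object
        "aws_iam_role=arn:aws:iam::123456789012:role/MyRedshiftRole",  # placeholder ARN
        ["MANIFEST", "GZIP", "DELIMITER '|'"],
    ))
```

The resulting statement would be run through whatever SQL client connects to the cluster; the credentials string here uses the role-based form rather than access keys.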
  • 15. Loading JSON Data
       • COPY uses a jsonpaths text file to parse JSON data
       • JSONPath expressions specify the path to JSON name elements
       • Each JSONPath expression corresponds to a column in the Amazon Redshift target table

       Suppose you want to load the VENUE table with the following content:

       { "id": 15, "name": "Gillette Stadium", "location": [ "Foxborough", "MA" ], "seats": 68756 }
       { "id": 15, "name": "McAfee Coliseum", "location": [ "Oakland", "CA" ], "seats": 63026 }

       You would use the following jsonpaths file to parse the JSON data:

       {
         "jsonpaths": [
           "$['id']",
           "$['name']",
           "$['location'][0]",
           "$['location'][1]",
           "$['seats']"
         ]
       }
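The mapping from JSONPath expressions to columns can be illustrated outside Redshift. The sketch below is not how COPY is implemented; it is a small stand-in that resolves only the bracket notation used on this slide, with hypothetical function names.

```python
import re
from typing import Any, List

# Matches one step of bracket notation: ['name'] or [0].
_STEP = re.compile(r"\['([^']+)'\]|\[(\d+)\]")


def evaluate(path: str, record: Any) -> Any:
    """Resolve a bracket-notation JSONPath such as $['location'][0]."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path}")
    rest = path[1:]
    value = record
    position = 0
    for match in _STEP.finditer(rest):
        if match.start() != position:
            raise ValueError(f"unsupported path syntax: {path}")
        key, index = match.groups()
        value = value[key] if key is not None else value[int(index)]
        position = match.end()
    if position != len(rest):
        raise ValueError(f"unsupported path syntax: {path}")
    return value


def to_row(record: dict, jsonpaths: List[str]) -> List[Any]:
    """One value per JSONPath expression, in target-column order."""
    return [evaluate(path, record) for path in jsonpaths]
```

Applied to the first record on the slide with the five paths shown, `to_row` yields the id, name, city, state, and seat count in that order, which is the column order the target table would need.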
  • 16. Loading Data in Avro Format • Avro is a data serialization protocol. An Avro source file includes a schema that defines the structure of the data. The Avro schema type must be record. • COPY uses an avro_option to parse Avro data. Valid values for avro_option are as follows: – 'auto' (default) - COPY automatically maps the data elements in the Avro source data to the columns in the target table by matching field names in the Avro schema to column names in the target table. – 's3://jsonpaths_file' - To explicitly map Avro data elements to columns, you can use a JSONPaths file. Avro Schema: { "name": "person", "type": "record", "fields": [ {"name": "id", "type": "int"}, {"name": "guid", "type": "string"}, {"name": "name", "type": "string"}, {"name": "address", "type": "string"} ] }
  • 17. Amazon Kinesis Firehose: Load massive volumes of streaming data into Amazon S3, Redshift and Elasticsearch • Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift, and other destinations without writing an application or managing infrastructure. • Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 seconds using simple configurations. • Seamless elasticity: Seamlessly scales to match data throughput without intervention. Diagram captions: Capture and submit streaming data; Analyze streaming data using your favorite BI tools; Firehose loads streaming data continuously into Amazon S3, Redshift and Elasticsearch
  • 18. Best Practices for Loading Data • Use a COPY command to load data • Use a single COPY command per table • Split your data into multiple files • Compress your data files with GZIP or LZOP • If COPY is not an option, use multi-row inserts whenever possible • Bulk insert operations (INSERT INTO…SELECT and CREATE TABLE AS) provide high-performance data insertion • Use Amazon Kinesis Firehose to load streaming data directly into S3 and/or Redshift
  • 19. Best Practices for Loading Data Continued • Load your data in sort key order to avoid needing to vacuum • Organize your data as a sequence of time-series tables, where each table is identical but contains data for different time ranges • Use staging tables to perform an upsert • Run the VACUUM command whenever you add, delete, or modify a large number of rows, unless you load your data in sort key order • Increase the memory available to a COPY or VACUUM by increasing wlm_query_slot_count • Run the ANALYZE command whenever you’ve made a non-trivial number of changes to your data to ensure your table statistics are current
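The staging-table upsert mentioned on this slide follows a delete-then-insert pattern inside one transaction. The sketch below demonstrates the pattern with Python's built-in `sqlite3` module as a local stand-in, so the SQL dialect is SQLite's rather than Redshift's; on Redshift the delete step is typically written as `DELETE FROM target USING staging WHERE ...`. The table and column names are illustrative.

```python
import sqlite3


def upsert_via_staging(conn, rows):
    """Merge rows into venue: replace rows whose id matches, append the rest."""
    conn.execute("CREATE TEMP TABLE staging (id INTEGER, seats INTEGER)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    with conn:  # commits on success, rolls back if any statement fails
        # Step 1: remove target rows that are about to be replaced.
        conn.execute("DELETE FROM venue WHERE id IN (SELECT id FROM staging)")
        # Step 2: insert every staged row, new and updated alike.
        conn.execute("INSERT INTO venue SELECT id, seats FROM staging")
    conn.execute("DROP TABLE staging")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE venue (id INTEGER PRIMARY KEY, seats INTEGER)")
conn.executemany("INSERT INTO venue VALUES (?, ?)", [(1, 100), (2, 200)])
conn.commit()

# Row 2 is updated, row 3 is new, row 1 is untouched.
upsert_via_staging(conn, [(2, 250), (3, 300)])
result = conn.execute("SELECT id, seats FROM venue ORDER BY id").fetchall()
```

Running both steps in a single transaction means readers never see the table with the old rows deleted but the new rows not yet inserted.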
  • 20. Amazon Partner ETL • Amazon Redshift is supported by a variety of ETL vendors • Many simplify the process of data loading • Visit http://aws.amazon.com/redshift/partners • There are a variety of vendors offering a free trial of their products, allowing you to evaluate and choose the one that suits your needs.
  • 21. AWS Database Migration Service (DMS) Benefits: • Start your first migration in 10 minutes or less • Keep your apps running during the migration • Replicate within, to, or from Amazon EC2 or RDS • Move data from commercial database engines to open source engines • Or…move data to the same database engine • Consolidate databases and/or tables
  • 22. Sources and Targets for AWS DMS Sources: On-premises and Amazon EC2 instance databases: • Oracle Database 10g – 12c • Microsoft SQL Server 2005 – 2014 • MySQL 5.5 – 5.7 • MariaDB (MySQL-compatible data source) • PostgreSQL 9.4 – 9.5 • SAP ASE 15.7+ RDS instance databases: • Oracle Database 11g – 12c • Microsoft SQL Server 2008 R2 – 2014. CDC operations are not supported yet. • MySQL 5.5 – 5.7 • MariaDB (MySQL-compatible data source) • PostgreSQL 9.4 – 9.5. CDC operations are not supported yet. • Amazon Aurora (MySQL-compatible data source) Targets: On-premises and EC2 instance databases: • Oracle Database 10g – 12c • Microsoft SQL Server 2005 – 2014 • MySQL 5.5 – 5.7 • MariaDB (MySQL-compatible data source) • PostgreSQL 9.3 – 9.5 • SAP ASE 15.7+ RDS instance databases: • Oracle Database 11g – 12c • Microsoft SQL Server 2008 R2 – 2014 • MySQL 5.5 – 5.7 • MariaDB (MySQL-compatible data source) • PostgreSQL 9.3 – 9.5 • Amazon Aurora (MySQL-compatible data source) Other targets: • Amazon Redshift
  • 23. AWS Database Migration Service Pricing • T2 instances for development and periodic data migration tasks • C4 instances for large databases and minimizing migration time • T2 pricing starts at $0.018 per hour for T2.micro • C4 pricing starts at $0.154 per hour for C4.large • 50 GB GP2 storage included with T2 instances • 100 GB GP2 storage included with C4 instances • Data transfer inbound and within an AZ is free • Data transfer across AZs starts at $0.01 per GB
  • 25. Resources on the AWS Big Data Blog • Best Practices for Micro-Batch Loading on Amazon Redshift • Using Attunity Cloudbeam at UMUC to Replicate Data to Amazon RDS and Amazon Redshift • A Zero-Administration Amazon Redshift Database Loader • Best Practices for Designing Tables • Best Practices for Designing Queries • Best Practices for Loading Data Best Practices References
  • 26. This Is The Presentation Title Entered In Master Slide Footer Area Amazon Redshift at TrueCar Sep 21, 2016
  • 27. About us ● About TrueCar ● David Giffin – VP Technology ● Sharat Nair – Director of Data ● Blagoy Kaloferov – Data Engineer
  • 28. Agenda ● Amazon Redshift use case overview ● Architecture and migration process ● Tips and lessons learned
  • 29. Amazon Redshift at TrueCar
  • 30. Amazon Redshift at TrueCar ● Datasets that flow into Amazon Redshift ● Clickstream ● Transactions ● Sales ● Inventory ● Dealer ● Leads ● How we do analytics and reporting ● Redshift is our data store for BI tools and ad hoc queries ● Data that is loaded into Amazon Redshift is already processed
  • 31. Architecture [diagram; labels: datasets Leads, Dealer, Transactions, Sales, Inventory, Clickstream; ETL (MR, Hive, Pig, Oozie, Talend); Postgres; HDFS; Staging, DWH; Data Processing]
  • 32. Architecture [diagram; labels: datasets Leads, Dealer, Transactions, Sales, Inventory, Clickstream; ETL (MR, Hive, Pig, Oozie, Talend); Postgres; HDFS; Staging, DWH; S3; Loading utility; Amazon Redshift; MSTR; Tableau; Data Processing; Reporting; Ad Hoc]
  • 33. Schema design ● Schemas ● Our datasets are in a read-only schema for ad hoc and scheduled reporting ● Ad hoc and user tables are in separate schemas ● This makes it easy to separate final data from user-created data ● Simple table naming conventions ● F_ - facts ● D_ - dimensions ● AGG_ - aggregates ● V_ - views
  • 35. Redshift loading process ● ETL is orchestrated through Talend and Oozie ● Processing tools: Talend, Hive, Pig and MapReduce, pushing data into HDFS and S3 ● We built our own Amazon Redshift loading utility ● Handles all loading use cases: ● Load ● TruncateLoad ● DeleteAppend ● Upsert
  • 36. Table design considerations ● Train developers on table design and Redshift best practices ● Column compression and encodings ● Analyze compression ● It makes a significant difference in space usage ● Sort and distribution keys ● Plan a workload management strategy ● As usage of the Redshift cluster grows, you need to ensure that critical jobs get bandwidth
  • 37. Space considerations ● Retain pre-COPY data in S3 ● It can easily be used by other tools (Spark, Pig, MapReduce) ● Offload historical datasets into separate tables on a rolling basis ● Pre-aggregate data when possible to reduce load on the system
  • 38. Long-term usage tips ● Have a cluster resize strategy ● Use Reserved Instances for cost savings ● Plan on having enough space for long-term growth ● Plan your maintenance ● Vacuuming ● System tables are your friends ● Useful collection of utilities: https://github.com/awslabs/amazon-redshift-utils/

Editor's Notes

  1. Edits (11/12/2015) Kinesis Firehose (via S3) RDS Migration Tool Add related reference to Create Table as Select (see below). Lab - Need to right-size data set and cluster size (7 GB on Tiny cluster takes ~10 min; too long for a lab).