SQL Server to Redshift
Background
RealityMine provides digital behaviour
analytics.
Our applications passively measure the
activity of opt-in users on all digital
platforms.
This could be focused on
• how to direct marketing
• how to direct product development
• how to question individuals who undertake certain behaviour patterns
Starting State

• SQL Server DW on in-house server
• SQL Server 2008 R2 Enterprise Edition
• Single 4 core (8 thread) i7 w/ 16GB RAM
• 2 960GB PCIe SSDs for DBs
• 1 240GB PCIe SSD for TempDb

SQL Server to Redshift - @joeharris76
Data Environment

• ~20 billion rows in active use
• Largest table is also the widest
• Volume is doubling more than annually
• Data is in many languages
• Starts as JSON, ends as Star Schema DW

Pain Points

• Biggest cost is SQL Server license
• Biggest bottleneck is single threaded perf.
• Hand tuning needed to push CPU / disks
• SSD reliability is not perfect
• SSD performance degrades over time

Why Redshift

• Vertica wanted £45k per terabyte
• 16 SQL Server Enterprise cores even more!
• Teradata, Netezza, etc. don’t want <5TB sales
• SAP HANA not viable for this volume on AWS
• Infobright does not support incremental loads
• Hadoop/Impala slow & requires lots of learning
Data Processing Approach

• No ETL tool truly supports Redshift
– Requirement to load from S3 is a killer
– Tried SSIS, Pentaho, Talend and others
• You’re stuck with ELT
– Load data then transform as needed
– Keep data as raw as possible from source
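
The load-then-transform flow above starts with a COPY from S3. A minimal sketch of building that statement follows. The function name, table, bucket prefix and role ARN are all hypothetical, and IAM_ROLE is just one of the authorization options COPY accepts, not necessarily what this project used.

```python
def copy_statement(table, s3_prefix, iam_role, delimiter="|"):
    """Build a Redshift COPY statement for gzip-compressed, delimited files.

    All arguments are illustrative; names are interpolated as-is, so they
    must come from trusted configuration, never from user input.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"DELIMITER '{delimiter}'\n"
        f"GZIP;"
    )


if __name__ == "__main__":
    print(copy_statement(
        "staging.events",                                # hypothetical table
        "s3://example-bucket/events/part",               # hypothetical prefix
        "arn:aws:iam::123456789012:role/ExampleRole",    # hypothetical role
    ))
```

Once the raw rows are in a staging table, the transform step is ordinary SQL run inside Redshift.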
War of Encodings
The road to heaven goes
through ÜÑÎÇØDÈ hell

Redshift: UTF-8 Only
• Redshift has zero-tolerance for certain chars
– NUL/0x00 => Treated as EOR, documented
– DEL/0x7F => Treated as EOR, undocumented
– 0xEFBFBF (U+FFFF) => UTF-8 spec "guaranteed non-char"
– These must be removed before loading data
• Other control characters can be loaded by escaping
– You cannot escape a single column, all or nothing
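
Removing the zero-tolerance characters can be sketched in a few lines. This is an illustration only: the function name is made up, and the "guaranteed non-char" entry is taken here to mean U+FFFF, with U+FFFE added as a further assumption.

```python
# Code points removed before loading. NUL and DEL come from the slide;
# U+FFFE and U+FFFF are an assumed reading of "guaranteed non-char".
FORBIDDEN = {0x00: None, 0x7F: None, 0xFFFE: None, 0xFFFF: None}


def strip_forbidden(text: str) -> str:
    """Return text with NUL, DEL and the U+FFFE/U+FFFF non-characters removed."""
    return text.translate(FORBIDDEN)


if __name__ == "__main__":
    dirty = "caf\u00e9\x00|ol\x7f\u00e9|\uffffend"
    print(strip_forbidden(dirty))
```

Legitimate non-ASCII text passes through untouched; only the listed code points are dropped.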

SQL Server: UTF-16LE Only
• NVARCHAR takes 2x as much space as a VARCHAR
• Makes functions consistent across ASCII & Unicode
– SQL Server N/VARCHAR(32) = 32 chars / Redshift VARCHAR(32) = 32 bytes
• SQL Server tolerates anything in character columns
• Input and output is not sanitized against UTF-16 spec
– Invalid or "guaranteed non-chars" are stored as is
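
The chars-versus-bytes difference matters when sizing Redshift columns. A small sketch, with hypothetical helper names, of measuring how many bytes a value needs once it is UTF-8:

```python
def utf8_len(value: str) -> int:
    """Bytes the value occupies as UTF-8, which is what a byte-based VARCHAR(n) counts."""
    return len(value.encode("utf-8"))


def utf16_len(value: str) -> int:
    """Bytes the value occupies as UTF-16LE (2 bytes per BMP character)."""
    return len(value.encode("utf-16-le"))


def required_varchar_bytes(values) -> int:
    """Smallest byte length that fits every sample value as UTF-8."""
    return max((utf8_len(v) for v in values), default=0)


if __name__ == "__main__":
    for sample in ["abc", "\u00dc\u00f1\u00ee", "\u65e5\u672c"]:
        print(repr(sample), len(sample), "chars,", utf8_len(sample), "UTF-8 bytes")
```

A column that held 32 characters in SQL Server may therefore need a wider declaration in Redshift when the data is not plain ASCII.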

SQL Extract: The Hard Way
• BCP is the “standard” way to extract data
• Using BCP your process looks something like this:
– Extract data as a huge UTF-16LE file using bcp
– Convert to a new UTF-8 file using iconv
– Remove or escape problem chars using sed
– Compress the final file using gzip
– All steps are heavily constrained by disk speed
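
For comparison, the convert, clean and compress steps above can be done in one streaming pass, so no intermediate files are written. This is a sketch under assumptions, not the process used in the talk: the function name is hypothetical, the input is taken to be a UTF-16LE file already produced by bcp, and invalid sequences are replaced with U+FFFD.

```python
import gzip

# Code points removed before loading. NUL and DEL come from the slides;
# U+FFFE/U+FFFF are an assumed reading of "guaranteed non-char".
STRIP = {0x00: None, 0x7F: None, 0xFFFE: None, 0xFFFF: None}


def utf16le_to_utf8_gzip(src_path, dst_path, chunk_chars=1 << 20):
    """Transcode a UTF-16LE file to gzip-compressed UTF-8 in fixed-size chunks."""
    first = True
    with open(src_path, "r", encoding="utf-16-le", errors="replace",
              newline="") as src, \
         gzip.open(dst_path, "wt", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            if first:
                # Drop a leading byte order mark if the extract has one.
                chunk = chunk.lstrip("\ufeff")
                first = False
            dst.write(chunk.translate(STRIP))
```

The extract step itself is still disk bound; this only removes the extra read and write passes for iconv, sed and gzip.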

SQL Extract: The Easy Way

SQLCMD one-liner for extracts:

chcp 65001 &                          Set the cmd code page to UTF-8
sqlcmd -E -Q                          Interactive SQL terminal
"SET NOCOUNT ON;                      Prevent summary in output
SELECT * FROM Db.Schema.Table;"       Select from the table / view
-h-1                                  No column headers
-k1                                   Remove special characters
-s"|"                                 Delimit output with 1 ASCII char
-W                                    No padding in output
-u                                    Output in Unicode
| gzip > "C:file.gz"                  Pipe stdout to gzip

Data Encryption

• On SQL Server we use TDE
• Redshift offers AES encrypted data on disk
• Redshift can load client-side encrypted data
• Client side encryption only applies while on S3
• “Small performance penalty” for using AES

Security
• S3 Access => Create bucket(s) just for Redshift staging
• Redshift admin => Use IAM, create automation user(s)
• Redshift database =>
– Do not use admin; it’s like SQL Server ‘sa’
• Database objects =>
– Must actively GRANT access to each object
– Use groups to make management easier
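
Managing grants through groups can be sketched as a small statement generator. The function, group, user and table names below are hypothetical, and identifiers are interpolated unquoted on the assumption that they come from trusted configuration.

```python
def grant_statements(group, tables, users, privilege="SELECT"):
    """Return SQL that creates a group, adds users, and grants per-table access."""
    stmts = [f"CREATE GROUP {group};"]
    stmts += [f"ALTER GROUP {group} ADD USER {user};" for user in users]
    stmts += [f"GRANT {privilege} ON TABLE {table} TO GROUP {group};"
              for table in tables]
    return stmts


if __name__ == "__main__":
    for stmt in grant_statements("analysts",
                                 ["dw.fact_usage", "dw.dim_user"],
                                 ["report_user"]):
        print(stmt)
```

New tables only need one extra GRANT to the group rather than one per user.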

Sizing your cluster

• Redshift is over-provisioned on storage
• Redshift is super efficient at compression
– Compression not affected by the data model
• Redshift scale out is almost perfectly linear
– 2 nodes is twice as fast as 1 node
• You'll be sizing your cluster for speed!
Performance
• Redshift speed depends on node count
– A single node is not particularly fast
• Loading speed appears to be linked to S3 speed
– You must use multiple files for bulk loads
• Query speed appears to be CPU constrained
– Vacuum runs 250 MB/s, queries <20 MB/s
• Data modeling matters for complex query speed
– Use a star schema & well chosen distribution key
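
Splitting an extract into multiple files for a bulk load can be sketched as below. The function name and file naming scheme are hypothetical; the number of parts is left to the caller, and each input line is assumed to end with its own newline.

```python
import gzip


def split_into_gzip_parts(lines, prefix, parts):
    """Distribute lines round-robin into `parts` gzip files named <prefix>.NNNN.gz."""
    paths = [f"{prefix}.{i:04d}.gz" for i in range(parts)]
    outputs = [gzip.open(p, "wt", encoding="utf-8", newline="") for p in paths]
    try:
        for n, line in enumerate(lines):
            outputs[n % parts].write(line)
    finally:
        for out in outputs:
            out.close()
    return paths
```

Because the files share a common prefix, they can be uploaded under one S3 key prefix and loaded with a single COPY that names that prefix.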
Data Modeling

2 main concepts to learn
• Distribution key
– Where data is placed, which node & slice
– Needs to be common across most tables
• Sort key
– How data is ordered on disk within the slice
– Good sort keys simplify expensive joins
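
Both keys are declared when the table is created. A minimal sketch of generating that DDL follows; the function, table and column names are hypothetical and the key choices are only an example, not the talk's actual model.

```python
def create_table_ddl(table, columns, distkey, sortkeys):
    """Build CREATE TABLE DDL with a distribution key and a sort key.

    columns is a list of (name, type) pairs; keys must be existing columns.
    """
    names = [name for name, _ in columns]
    for key in [distkey, *sortkeys]:
        if key not in names:
            raise ValueError(f"{key!r} is not a column of {table}")
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"DISTKEY ({distkey})\n"
        f"SORTKEY ({', '.join(sortkeys)});"
    )


if __name__ == "__main__":
    print(create_table_ddl(
        "fact_usage",
        [("user_id", "BIGINT"), ("event_time", "TIMESTAMP")],
        distkey="user_id",
        sortkeys=["event_time"],
    ))
```

Using the same distribution key on the fact table and its largest dimensions keeps matching rows on the same slice.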
Database Maintenance
• Data loaded to non-empty tables is not sorted
• Data loaded to non-empty tables may kill their stats
• ANALYZE rebuilds the stats without making changes
• VACUUM re-sorts the physical data and rebuilds stats
– Needed to get the best performance
– Very similar to a REBUILD in SQL Server
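
The maintenance steps above can be scripted after each load. A sketch with a hypothetical function and table name; running ANALYZE after VACUUM is a cautious choice made here, not something the slide prescribes.

```python
def post_load_maintenance(table, resort=True):
    """Return the maintenance statements to run after loading into `table`."""
    stmts = []
    if resort:
        # Re-sort rows that were appended out of order.
        stmts.append(f"VACUUM {table};")
    # Refresh optimizer statistics.
    stmts.append(f"ANALYZE {table};")
    return stmts


if __name__ == "__main__":
    for stmt in post_load_maintenance("dw.fact_usage"):
        print(stmt)
```

VACUUM cannot run inside a transaction block, so these statements are normally sent on an autocommit connection.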

Database Backups
• Redshift ‘backups’ are snapshots of the system
• Taken very quickly, much slower to restore
• Redshift automatically takes intra-day snapshots
• Manual snapshots can be run using AWS cmd line
• Snapshot storage is free up to size of cluster storage
• Snapshots must be restored to an identical cluster
• Snapshots cannot be restored to a running cluster

Code Changes

Code changes required so far
• ROW_NUMBER() missing in Redshift
• We gain LAG() and LEAD() which helps
• But very difficult to persist an order value
• DATETIMEOFFSET (e.g. timezone) not avail.
• DATETIMEs now split into 2 columns
• Work in progress…
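
One plausible way to split a DATETIMEOFFSET value into two columns is a UTC timestamp plus the original offset in minutes. The slide does not say which two columns were used, so the layout and the function name here are assumptions.

```python
from datetime import datetime, timedelta, timezone


def split_datetimeoffset(value: datetime):
    """Return (UTC timestamp without tzinfo, original UTC offset in minutes)."""
    if value.tzinfo is None:
        raise ValueError("value must carry a UTC offset")
    offset_minutes = int(value.utcoffset().total_seconds() // 60)
    utc_naive = value.astimezone(timezone.utc).replace(tzinfo=None)
    return utc_naive, offset_minutes


if __name__ == "__main__":
    local = datetime(2013, 6, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
    print(split_datetimeoffset(local))
```

The local time can be recovered later by adding the offset back to the UTC column.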
That’s all folks!

Come Work With Me!
http://www.realitymine.com/careers/
• Currently trying to fill the following roles:
• Business Intelligence Architect (Redshift!)
• Business Intelligence Developer (Tableau!)
• Test Engineer (Quality!)
• Server Developer (C#!)
• Mobile App Developer (Android! iOS!)
• Project Manager
SQL Server to Redshift - @joeharris76

More Related Content

What's hot

Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar SeriesDeep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar SeriesAmazon Web Services
 
AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722Amazon Web Services
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationVolodymyr Rovetskiy
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features Amazon Web Services
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Amazon Web Services
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAmazon Web Services
 
AWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAmazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Amazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Deep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceDeep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceAmazon Web Services
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Web Services
 
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best PracticesAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Web Services
 

What's hot (20)

Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar SeriesDeep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
 
AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722
 
Amazon Redshift Masterclass
Amazon Redshift MasterclassAmazon Redshift Masterclass
Amazon Redshift Masterclass
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentation
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon Redshift
 
AWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing Performance
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Deep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceDeep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performance
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
 
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech Talks
 

Viewers also liked

(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon RedshiftAmazon Web Services
 
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)Amazon Web Services
 
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesMigrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesAmazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...Amazon Web Services
 
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...Amazon Web Services
 
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)Amazon Web Services
 
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013Amazon Web Services
 
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)Amazon Web Services
 
Delta Youth Support Link Society
Delta Youth Support Link SocietyDelta Youth Support Link Society
Delta Youth Support Link Societypizzastick
 
Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013
Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013
Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013Michael Bohlig
 
Learn How to Run Python on Redshift
Learn How to Run Python on RedshiftLearn How to Run Python on Redshift
Learn How to Run Python on RedshiftChartio
 
Aws meetup ssm
Aws meetup ssmAws meetup ssm
Aws meetup ssmAdam Book
 
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...Treasure Data, Inc.
 
Building a data warehouse with Amazon Redshift … and a quick look at Amazon ...
Building a data warehouse  with Amazon Redshift … and a quick look at Amazon ...Building a data warehouse  with Amazon Redshift … and a quick look at Amazon ...
Building a data warehouse with Amazon Redshift … and a quick look at Amazon ...Julien SIMON
 
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016Amazon Web Services
 
Como o Magazine Luiza inova suas operações utilizando as soluções de IoT e Bi...
Como o Magazine Luiza inova suas operações utilizando as soluções de IoT e Bi...Como o Magazine Luiza inova suas operações utilizando as soluções de IoT e Bi...
Como o Magazine Luiza inova suas operações utilizando as soluções de IoT e Bi...Amazon Web Services LATAM
 

Viewers also liked (20)

(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
 
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
 
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesMigrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
 
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
 
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)
 
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
Getting Maximum Performance from Amazon Redshift (DAT305) | AWS re:Invent 2013
 
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
 
Delta Youth Support Link Society
Delta Youth Support Link SocietyDelta Youth Support Link Society
Delta Youth Support Link Society
 
Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013
Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013
Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013
 
Learn How to Run Python on Redshift
Learn How to Run Python on RedshiftLearn How to Run Python on Redshift
Learn How to Run Python on Redshift
 
Começando com Amazon Redshift
Começando com Amazon RedshiftComeçando com Amazon Redshift
Começando com Amazon Redshift
 
REDSHIFT - Amazon
REDSHIFT - AmazonREDSHIFT - Amazon
REDSHIFT - Amazon
 
Aws meetup ssm
Aws meetup ssmAws meetup ssm
Aws meetup ssm
 
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
 
Building a data warehouse with Amazon Redshift … and a quick look at Amazon ...
Building a data warehouse  with Amazon Redshift … and a quick look at Amazon ...Building a data warehouse  with Amazon Redshift … and a quick look at Amazon ...
Building a data warehouse with Amazon Redshift … and a quick look at Amazon ...
 
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
 
Prince 2 project managment Document Lessons learned log
Prince 2 project managment Document Lessons learned logPrince 2 project managment Document Lessons learned log
Prince 2 project managment Document Lessons learned log
 
Como o Magazine Luiza inova suas operações utilizando as soluções de IoT e Bi...
Como o Magazine Luiza inova suas operações utilizando as soluções de IoT e Bi...Como o Magazine Luiza inova suas operações utilizando as soluções de IoT e Bi...
Como o Magazine Luiza inova suas operações utilizando as soluções de IoT e Bi...
 

Similar to Migration to Redshift from SQL Server

AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsKeeyong Han
 
Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]shuwutong
 
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...Denny Lee
 
Remote DBA Experts SQL Server 2008 New Features
Remote DBA Experts SQL Server 2008 New FeaturesRemote DBA Experts SQL Server 2008 New Features
Remote DBA Experts SQL Server 2008 New FeaturesRemote DBA Experts
 
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019Dave Stokes
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core ConceptsJon Haddad
 
Configuring Sage 500 for Performance
Configuring Sage 500 for PerformanceConfiguring Sage 500 for Performance
Configuring Sage 500 for PerformanceRKLeSolutions
 
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Spark Summit
 
Customer Education Webcast: New Features in Data Integration and Streaming CDC
Customer Education Webcast: New Features in Data Integration and Streaming CDCCustomer Education Webcast: New Features in Data Integration and Streaming CDC
Customer Education Webcast: New Features in Data Integration and Streaming CDCPrecisely
 
Cassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day TorontoCassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day TorontoJon Haddad
 
Analytical DBMS to Apache Spark Auto Migration Framework with Edward Zhang an...
Analytical DBMS to Apache Spark Auto Migration Framework with Edward Zhang an...Analytical DBMS to Apache Spark Auto Migration Framework with Edward Zhang an...
Analytical DBMS to Apache Spark Auto Migration Framework with Edward Zhang an...Databricks
 
SQL Server Reporting Services Disaster Recovery webinar
SQL Server Reporting Services Disaster Recovery webinarSQL Server Reporting Services Disaster Recovery webinar
SQL Server Reporting Services Disaster Recovery webinarDenny Lee
 
30334823 my sql-cluster-performance-tuning-best-practices
30334823 my sql-cluster-performance-tuning-best-practices30334823 my sql-cluster-performance-tuning-best-practices
30334823 my sql-cluster-performance-tuning-best-practicesDavid Dhavan
 
Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?
Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?
Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?Clustrix
 
iMobileMagic Teck Talk Scale Up
iMobileMagic Teck Talk Scale UpiMobileMagic Teck Talk Scale Up
iMobileMagic Teck Talk Scale UpPedro Machado
 
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of DutyCassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of DutyDataStax Academy
 
Migration from Redshift to Spark
Migration from Redshift to SparkMigration from Redshift to Spark
Migration from Redshift to SparkSky Yin
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introductionScott Miao
 

Similar to Migration to Redshift from SQL Server (20)

AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data Analytics
 
Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]
 
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
 
Remote DBA Experts SQL Server 2008 New Features
Remote DBA Experts SQL Server 2008 New FeaturesRemote DBA Experts SQL Server 2008 New Features
Remote DBA Experts SQL Server 2008 New Features
 
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Breaking data
Breaking dataBreaking data
Breaking data
 
Configuring Sage 500 for Performance
Configuring Sage 500 for PerformanceConfiguring Sage 500 for Performance
Configuring Sage 500 for Performance
 
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
 
Customer Education Webcast: New Features in Data Integration and Streaming CDC
Migration to Redshift from SQL Server

  • 1. SQL Server to Redshift
  • 2. Background
    RealityMine provides digital behaviour analytics. Our applications passively measure the activity of opt-in users on all digital platforms. This could be focused on
      • how to direct marketing
      • how to direct product development
      • questioning individuals who undertake certain behavior patterns
  • 3. Starting State
      • SQL Server DW on in-house server
      • SQL Server 2008 R2 Enterprise Edition
      • Single 4 core (8 thread) i7 w/ 16GB RAM
      • 2 × 960GB PCIe SSDs for DBs
      • 1 × 240GB PCIe SSD for TempDb
    SQL Server to Redshift - @joeharris76
  • 4. Data Environment
      • ~20 billion rows in active use
      • Largest table is also the widest
      • Volume is doubling more than annually
      • Data is in many languages
      • Starts as JSON, ends as Star Schema DW
  • 5. Pain Points
      • Biggest cost is SQL Server license
      • Biggest bottleneck is single threaded perf.
      • Hand tuning needed to push CPU / disks
      • SSD reliability is not perfect
      • SSD performance degrades over time
  • 6. Why Redshift
      • Vertica wanted £45k per terabyte
      • 16 SQL Server Enterprise cores even more!
      • Teradata, Netezza, etc. don’t want <5TB sales
      • SAP HANA not viable for this volume on AWS
      • Infobright does not support incremental loads
      • Hadoop/Impala slow & requires lots of learning
  • 7. Data Processing Approach
      • No ETL tool truly supports Redshift
          – Requirement to load from S3 is a killer
          – Tried SSIS, Pentaho, Talend and others
      • You’re stuck with ELT
          – Load data then transform as needed
          – Keep data as raw as possible from source
  • 8. War of Encodings
    The road to heaven goes through ÜÑÎÇØDÈ hell
  • 9. Redshift: UTF-8 Only
      • Redshift has zero-tolerance for certain chars
          – NUL/0x00 => Treated as EOR, documented
          – DEL/0x7F => Treated as EOR, undocumented
          – 0xBFEFEF => UTF-8 spec "guaranteed non-char"
          – These must be removed before loading data
      • Other control characters can be loaded by escaping
          – You cannot escape a single column, all or nothing
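The cleanup rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the author's tooling: the function name `strip_forbidden` is hypothetical, and the slide's "0xBFEFEF" entry is assumed here to mean the Unicode noncharacters U+FFFE and U+FFFF.

```python
# Code points to delete before loading. NUL and DEL come straight from the slide;
# the two noncharacters are an assumption about what "0xBFEFEF" refers to.
FORBIDDEN = {0x00: None, 0x7F: None, 0xFFFE: None, 0xFFFF: None}


def strip_forbidden(text: str) -> str:
    """Return text with every forbidden code point removed."""
    # str.translate deletes any character whose ordinal maps to None.
    return text.translate(FORBIDDEN)


cleaned = strip_forbidden("a\x00b\x7fc\uffffd")
```

Legitimate accented characters pass through untouched, so the cleanup is safe to apply to every column of every row.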
  • 10. SQL Server: UTF-16LE Only
      • NVARCHAR takes 2x as much space as a VARCHAR
      • Makes functions consistent across ASCII & Unicode
          – N/VARCHAR(32) = 32 chars / Redshift = 32 bytes
      • SQL Server tolerates anything in character columns
      • Input and output is not sanitized against UTF-16 spec
          – Invalid or "guaranteed non-chars" are stored as is
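The character-versus-byte distinction is easy to check for yourself. The sketch below, with the hypothetical helper `fits_redshift_varchar`, compares the character count of a string with its encoded size and tests whether it would fit a column declared in bytes. It assumes the accented letters are stored precomposed.

```python
def utf8_byte_length(text: str) -> int:
    """Size of text once encoded as UTF-8, which is what a byte-sized column counts."""
    return len(text.encode("utf-8"))


def fits_redshift_varchar(text: str, declared_bytes: int) -> bool:
    """True if text fits a VARCHAR whose declared length is in bytes."""
    return utf8_byte_length(text) <= declared_bytes


sample = "\u00dc\u00d1\u00ce\u00c7\u00d8D\u00c8"   # the string from the "War of Encodings" slide
char_count = len(sample)                            # what a character-sized column counts
utf16_bytes = len(sample.encode("utf-16-le"))       # two bytes per character here
utf8_bytes = utf8_byte_length(sample)               # six 2-byte letters plus one ASCII "D"
```

A value that exactly fills an NVARCHAR(32) can therefore overflow a 32-byte column, which is why target column widths need to be planned in bytes rather than characters.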
  • 11. SQL Extract: The Hard Way
      • BCP is the “standard” way to extract data
      • Using BCP your process looks something like this:
          – Extract data as a huge UTF-16LE file using bcp
          – Convert to a new UTF-8 file using iconv
          – Remove or escape problem chars using sed
          – Compress the final file using gzip
          – All steps are heavily constrained by disk speed
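For comparison, the convert and compress steps can be chained in memory so that no intermediate UTF-8 file is written. The sketch below is not the process the slides describe. It assumes the bcp output already exists as a UTF-16LE byte stream, and `transcode_to_gzip` is a hypothetical name.

```python
import codecs
import gzip
import io


def transcode_to_gzip(src, dst, chunk_size=1 << 20):
    """Stream UTF-16LE bytes from src into gzip-compressed UTF-8 written to dst."""
    # The incremental decoder copes with a chunk boundary that splits a code unit.
    decoder = codecs.getincrementaldecoder("utf-16-le")()
    with gzip.GzipFile(fileobj=dst, mode="wb") as gz:
        while True:
            chunk = src.read(chunk_size)
            text = decoder.decode(chunk, final=not chunk)
            if text:
                # Any character cleanup would go here, between decode and encode.
                gz.write(text.encode("utf-8"))
            if not chunk:
                break


original = "h\u00e9llo|w\u00f6rld\n"
source = io.BytesIO(original.encode("utf-16-le"))
target = io.BytesIO()
transcode_to_gzip(source, target, chunk_size=3)   # odd chunk size on purpose
round_trip = gzip.decompress(target.getvalue()).decode("utf-8")
```

This still has to read the large bcp output from disk once, so it reduces the disk pressure rather than removing it.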
  • 12. SQL Extract: The Easy Way
    SQLCMD one-liner for extracts:
      chcp 65001 & sqlcmd -E -Q "SET NOCOUNT ON; SELECT * FROM Db.Schema.Table;" -h-1 -k1 -s"|" -W -u | gzip > "C:file.gz"
    What each part does:
      • chcp 65001 => Set the cmd code page to UTF-8
      • sqlcmd => Interactive SQL terminal
      • SET NOCOUNT ON => Prevent summary in output
      • SELECT * FROM Db.Schema.Table => Select from the table / view
      • -h-1 => No column headers
      • -k1 => Remove special characters
      • -s"|" => Delimit output with 1 ASCII char
      • -W => No padding in output
      • -u => Output in Unicode
      • | gzip => Pipe stdout to gzip
  • 13. Data Encryption
      • On SQL Server we use TDE
      • Redshift offers AES encrypted data on disk
      • Redshift can load client-side encrypted data
      • Client side encryption only applies while on S3
      • “Small performance penalty” for using AES
  • 14. Security
      • S3 Access => Create bucket(s) just for Redshift staging
      • Redshift admin => Use IAM, create automation user(s)
      • Redshift database =>
          – Do not use admin; it’s like SQL Server ‘sa’
      • Database objects =>
          – Must actively GRANT access to each object
          – Use groups to make management easier
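Because every object needs its own GRANT, it can help to generate the statements rather than type them. The sketch below is a hypothetical helper, not the author's script. The group names "admin" and "readers" follow the two groups mentioned in the speaker notes, and the table names are made up for illustration.

```python
def grant_statements(tables, admin_group="admin", reader_group="readers"):
    """Build one GRANT per table for each of the two groups."""
    statements = []
    for table in tables:
        statements.append(f"GRANT ALL ON TABLE {table} TO GROUP {admin_group};")
        statements.append(f"GRANT SELECT ON TABLE {table} TO GROUP {reader_group};")
    return statements


# Hypothetical table names, for illustration only.
script = grant_statements(["dw.fact_events", "dw.dim_user"])
```

The resulting list can be joined with newlines and run by whatever automation already executes SQL against the cluster.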
  • 15. Sizing your cluster
      • Redshift is over-provisioned on storage
      • Redshift is super efficient at compression
          – Compression not affected by the data model
      • Redshift scale out is almost perfectly linear
          – 2 nodes is twice as fast as 1 node
      • You'll be sizing your cluster for speed!
  • 16. Performance
      • Redshift speed depends on node count
          – A single node is not particularly fast
      • Loading speed appears to be linked to S3 speed
          – You must use multiple files for bulk loads
      • Query speed appears to be CPU constrained
          – Vacuum runs 250 MB/s, queries <20 MB/s
      • Data modeling matters for complex query speed
          – Use a star schema & well chosen distribution key
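The "multiple files" advice can be followed by splitting an extract into several gzip parts that share a name prefix. The sketch below is illustrative only: `split_into_parts`, the part naming scheme, and the sample rows are all assumptions, and a sensible part count depends on the cluster.

```python
import gzip
import os
import tempfile


def split_into_parts(lines, out_dir, prefix, parts):
    """Write lines round-robin into `parts` gzip files and return their paths."""
    paths = [os.path.join(out_dir, f"{prefix}.part{i:02d}.gz") for i in range(parts)]
    handles = [gzip.open(p, "wt", encoding="utf-8", newline="") for p in paths]
    try:
        for n, line in enumerate(lines):
            handles[n % parts].write(line)
    finally:
        for handle in handles:
            handle.close()
    return paths


def read_part(path):
    """Read one gzip part back as a list of lines."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return handle.readlines()


# Hypothetical sample rows, pipe-delimited like the extract in the slides.
rows = [f"{i}|event-{i}\n" for i in range(10)]
with tempfile.TemporaryDirectory() as tmp:
    paths = split_into_parts(rows, tmp, "events", parts=4)
    part_names = [os.path.basename(p) for p in paths]
    part_sizes = [len(read_part(p)) for p in paths]
```

Keeping a common prefix means the parts can be uploaded to one S3 location and loaded together in a single bulk load.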
  • 17. Data Modeling
    2 main concepts to learn
      • Distribution key
          – Where data is placed, which node & slice
          – Needs to be common across most tables
      • Sort key
          – How data is ordered on disk within the slice
          – Good sort keys simplify expensive joins
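A toy model helps show why the distribution key should be common across tables. In the sketch below, rows are assigned to slices by hashing the key, so a fact row and its matching dimension row always land on the same slice and can be joined without moving data. It covers the distribution key only. CRC32 is a stand-in because Redshift's own hash function is internal, and all names and sample data are hypothetical.

```python
import zlib
from collections import defaultdict


def slice_for(key, slice_count):
    """Pick a slice from the key's hash (CRC32 is a stand-in, not Redshift's hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % slice_count


def distribute(rows, key, slice_count):
    """Group rows by the slice their distribution key hashes to."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[key], slice_count)].append(row)
    return slices


def local_join(fact_slices, dim_slices, key):
    """Join each fact row only against dimension rows on the same slice."""
    joined = []
    for slice_id, facts in fact_slices.items():
        lookup = {dim[key]: dim for dim in dim_slices.get(slice_id, [])}
        for fact in facts:
            if fact[key] in lookup:
                joined.append({**lookup[fact[key]], **fact})
    return joined


# Hypothetical star-schema fragments sharing user_id as the distribution key.
users = [{"user_id": i, "country": "GB" if i % 2 else "US"} for i in range(5)]
events = [{"user_id": n % 5, "event": f"e{n}"} for n in range(20)]
joined = local_join(distribute(events, "user_id", 4), distribute(users, "user_id", 4), "user_id")
```

Every event finds its user without looking outside its own slice. If the two tables were distributed on different keys, matching rows would usually sit on different slices and the join would need data movement first.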
  • 18. Database Maintenance
      • Data loaded to non-empty tables is not sorted
      • Data loaded to non-empty tables may kill their stats
      • ANALYZE rebuilds the stats without making changes
      • VACUUM re-sorts the physical data and rebuilds stats
          – Needed to get the best performance
          – Very similar to a REBUILD in SQL Server
  • 19. Database Backups
      • Redshift ‘backups’ are snapshots of the system
      • Taken very quickly, much slower to restore
      • Redshift automatically takes intra-day snapshots
      • Manual snapshots can be run using AWS cmd line
      • Snapshot storage is free up to size of cluster storage
      • Snapshots must be restored to an identical cluster
      • Snapshots cannot be restored to a running cluster
  • 20. Code Changes
    Code changes required so far
      • ROW_NUMBER() missing in Redshift
      • We gain LAG() and LEAD() which helps
      • But very difficult to persist an order value
      • DATETIMEOFFSET (e.g. timezone) not available
      • DATETIMEs now split into 2 columns
      • Work in progress…
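The slide does not say which two columns a DATETIMEOFFSET value is split into. One plausible layout, assumed here purely for illustration, is a UTC timestamp plus the original offset in minutes. The function name `split_offset_datetime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def split_offset_datetime(value):
    """Split a timezone-aware datetime into (naive UTC datetime, offset in minutes)."""
    offset = value.utcoffset()
    if offset is None:
        raise ValueError("value must carry a UTC offset")
    utc_naive = value.astimezone(timezone.utc).replace(tzinfo=None)
    return utc_naive, int(offset.total_seconds() // 60)


# Hypothetical sample value with a +05:30 offset.
local = datetime(2013, 11, 5, 9, 30, tzinfo=timezone(timedelta(hours=5, minutes=30)))
utc_value, offset_minutes = split_offset_datetime(local)
```

Storing the UTC value keeps rows comparable across time zones, while the offset column lets the original local time be reconstructed when needed.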
  • 21. That’s all folks!
  • 22. Come Work With Me!
    http://www.realitymine.com/careers/
      • Currently trying to fill the following roles:
      • Business Intelligence Architect (Redshift!)
      • Business Intelligence Developer (Tableau!)
      • Test Engineer (Quality!)
      • Server Developer (C#!)
      • Mobile App Developer (Android! iOS!)
      • Project Manager

Editor's Notes

  1. Data and Log are always on different disks. Criss-cross pattern used to balance wear. TempDb split across 8 files (1 per thread).
  2. TDE required for data encryption. Compression used to maximise SSD speed. A lot of tuning done to push CPU and disks harder. We've seen silent partial failures without any indication. Now have to regularly run DBCC to verify databases. So far we've seen a ~20% perf loss over a year.
  3. We’re actually using our existing SQL Server automation setup to run batch scripts that execute SQL on Redshift.
  4. Four byte character support was recently added and that makes things a little easier. SQL Server's REPLACE() function is **broken** and ***cannot remove any of these values***! Yes, really. I can't tell you how fun it was to figure that out. Because it wasn't fun at all. All escape sensitive data must be escaped in all columns. Embedded newlines **must** be escaped as '\n'.
  5. vs Oracle which has LENGTH() for characters and LENGTHB() for bytes. vs Redshift which has only LENGTH() and no way to get the byte length. SQL Server will tolerate _anything_ inside a character column. No sanitisation of inputs or outputs. UTF-16LE *compatible*, rather than *compliant*. I know this from painful experience.
  6. All web searches will suggest using BCP. All ETL tools actually wrap BCP to get data out. **Forget about BCP. BCP is the enemy.** BCP DOES NOT SUPPORT STDOUT!!!
  7. Voila! UTF-8 output from SQL Server directly to a gzip file.
  8. * On SQL Server we use TDE (transparent encryption)
         * Data on disk is AES encrypted, transparently.
     * Redshift offers AES encryption of the data on disk.
         * Not actively encrypted during use, same as SQL Server.
     * Redshift supports loading client-side 'envelope' encrypted data.
         * Good luck with that!
         * Slow: You'll have to land your data on disk and then reprocess it.
         * Custom: You'll have to write your own encrypter using OpenSSL or some such.
         * Client side encryption is somewhat moot as it only applies while data is on S3.
         * My 2p: Enable AES on both S3 and Redshift. Call it a day.
     * Amazon says there is a 'small performance penalty' for using AES.
         * In practice it seems to be acceptable.
         * I have *not actually tested* it without AES because I don't want to generate 10 billion rows of sample data.
  9. * Managing user and admin access is kind of a pain in Redshift
     1. Access to S3
         * Create bucket(s) just for Redshift staging data.
     2. Access to Redshift admin
         * Use IAM access controls to limit individuals' access.
         * Create users just for automation and enforce password rotation.
     3. Access to Redshift database
         * **Do not allow** use of the admin user - it's like SQL Server's `sa`.
         * Create 1:1 map of external users to Redshift users (no LDAP/AD support)
     4. Access to specific database objects
         * You must actively `GRANT` access to each object.
         * Use groups to make this task easier.
         * We have just 2 groups: "admin" (`GRANT ALL`) and "readers" (`GRANT SELECT`)
  10. * Redshift nodes are waaaaaay over-provisioned on storage
          * 2 TB of storage available per node
      * Redshift is suuuuuper efficient at compression
          * Our data in Redshift is roughly 2x the gzipped UTF8 input.
          * The size varies depending on how we sort the tables.
          * Therefore you'll be sizing the cluster for **speed**.
          * You add nodes to go faster _not when you run out of disk._
      * Tough to get your head around.
  11. Still faster than SQL Server on PCIe SSDs for our data. You must use multiple files for bulk loads.
  12. You cannot schedule these AFAICT. They are auto-deleted on a schedule you can set. Default auto-delete is 1 day. Priced same as S3 beyond cluster size.