© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
November 30, 2016
Migrating Your Data Warehouse to Amazon Redshift
DAT202
Pavan Pothukuchi, Sr. Manager PM, Amazon Redshift
Ali Khan, Director of BI and Analytics, Scholastic
Laxmikanth Malladi, Principal Architect, Northbay Solutions
“It’s our biggest driver of growth in our biggest markets, and is a feature of the
company” …on Data Mining in Redshift
– Chris Lambert, Lyft CTO
“The doors were blown wide open to create custom dashboards for anyone to
instantly go in and see and assess what is going in our ad delivery landscape,
something we have never been able to do until now.”
– Bryan Blair, Vevo’s VP of Ad Operations
“Analytical queries are 10 times faster in Amazon Redshift than they
were with our previous data warehouse.”
– Yuki Moritani, NTT Docomo Innovation Manager
“We have several petabytes of data and use a massive Redshift
cluster. Our data science team can get to the data faster and then
analyze that data to find new ways to reduce costs, market
products, and enable new business.”
– Yuki Moritani, NTT Docomo Innovation Manager
“We saw a 2x performance improvement on a wide variety of
workloads. The more complex the queries, the higher the
performance improvement.”
- Naeem Ali, Director of Software Development, Data
Science at Cablevision (Optimum)
“Over the last few years, we’ve tried all kinds of databases in search of more
speed, including $15k of custom hardware. Of everything we’ve tried,
Amazon Redshift won out each time.”
– Periscope Data, Analyst’s Guide to Redshift
“We took Amazon Redshift for a test run the moment it was
released. It’s fast. It’s easy. Did I mention it’s ridiculously fast?
We’re using it to provide our analysts an alternative to Hadoop.”
– Justin Yan, Data Scientist at Yelp
“The move to Redshift also significantly improved dashboard query
performance… Redshift performed ~200% faster than the
traditional SQL Server we had been using in the past.”
- Dean Donovan, Product Development at DiamondStream
“…[Redshift] performance has blown away everyone here (we
generally see 50-100x speedup over Hive)”
- Jie Li, Data Infrastructure at Pinterest
“450,000 online queries 98 percent faster than previous traditional data
center, while reducing infrastructure costs by 80 percent.”
- John O’Donovan, CTO, Financial Times
“We needed to load six months' worth of data, about 10 TB of data, for a
campaign. That type of load would have taken about 20 days with our previous
solution. By using Amazon Redshift, it only took six hours to load the data.”
- Zhong Hong, VP of Infrastructure, Vivaki (Publicis Groupe)
“We regularly process multibillion row datasets and we do that in a
matter of hours. We are heading to up to 10 times more data volumes in
the next couple of years, easily.”
- Bob Harris, CTO, Channel 4
“On our previous big data warehouse system, it took around 45
minutes to run a query against a year of data, but that number went
down to just 25 seconds using Amazon Redshift”
- Kishore Raja, Director of Strategic Programs and R&D, Boingo Wireless
“Most competing data warehousing solutions would have cost us up
to $1 million a year. By contrast, Amazon Redshift costs us just
$100,000 all-in, representing a total cost savings of around 90%”
- Joel Cumming, Head of Data, Kik Interactive
“Annual costs of Redshift are equivalent to just the annual
maintenance of some of the cheaper on-premises options for
data warehouses.”
- Kevin Diamond, CTO, HauteLook (Nordstrom)
“Our data volume keeps growing, and we can support that
growth because Amazon Redshift scales so well.. We wouldn’t
have that capability using the supporting on-premises hardware in
our previous solution.”
- Ajit Zadgaonkar, Director of Ops. and Infrastructure, Edmunds
“With Amazon Redshift and Tableau, anyone in the company can set up
any queries they like - from how users are reacting to a feature, to growth by
demographic or geography, to the impact sales efforts had in different areas”
- Jon Hoffman, Head of Engineering, Foursquare
Today’s agenda
• Amazon Redshift Overview
• Use cases and benefits
• Migration options
• Scholastic’s use case
• Architecture details
• Technical overview
• Key project learnings
Relational data warehouse
Massively parallel; petabyte scale
Fully managed
HDD and SSD platforms
$1,000/TB/year; starts at $0.25/hour
Amazon Redshift
a lot faster
a lot simpler
a lot cheaper
The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical
representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any
vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
Forrester Wave™ Enterprise Data Warehouse Q4 ’15
Selected Amazon Redshift customers
Why migrate to Amazon Redshift?
Transactional database:
• 100x faster
• Scales from GBs to PBs
• Analyze data without storage constraints
MPP database:
• 10x cheaper
• Easy to provision and operate
• Higher productivity
Hadoop:
• 10x faster
• No programming
• Standard interfaces and integration to leverage BI tools, machine learning, streaming
Migration from Oracle @ Boingo Wireless
2000+ Commercial Wi-Fi locations
1 million+ Hotspots
90M+ ad engagements
100+ countries
Legacy DW: Oracle 11g based DW
Before migration
Rapid data growth slowed
analytics
Mediocre IOPS, limited memory,
vertical scaling
Admin overhead
Expensive (license, h/w, support)
After migration
180x performance improvement
7x cost savings
Migration from Oracle @ Boingo Wireless
[Chart: cost comparison on an axis from 0 to 400,000. Exadata: $400,000; SAP HANA: $300,000; Redshift: $55,000. Caption: 7X cheaper than Oracle Exadata.]
[Chart: latency in seconds, existing system vs. Redshift, for query performance (1 year of data) and data load performance (1 million records). Bar labels: 7,200; 2,700; 15; 15. Caption: 180X faster than Oracle database.]
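The two headline ratios on this slide can be checked with a short calculation. This is only a sanity check on the published figures. It assumes that the "180X" caption compares the 2,700-second and 15-second bars; the slide text itself does not say which bars the caption refers to.

```python
# Figures as printed on the Boingo Wireless slide.
cost = {"Exadata": 400_000, "SAP HANA": 300_000, "Redshift": 55_000}

# Assumption: the 180X caption compares these two latency bars (seconds).
latency_seconds = {"existing": 2_700, "redshift": 15}

cost_ratio = cost["Exadata"] / cost["Redshift"]
speedup = latency_seconds["existing"] / latency_seconds["redshift"]

print(f"Exadata vs. Redshift cost: {cost_ratio:.1f}x")
print(f"Latency improvement: {speedup:.0f}x")
```

Under that assumption the ratios come out at roughly 7.3x and exactly 180x, which is consistent with the rounded captions.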
Migration from Greenplum @ NTT Docomo
68 million customers
10s of TBs per day of data across
mobile network
6PB of total data (uncompressed)
Data science for marketing
operations, logistics etc.
Legacy DW: Greenplum on-premises
After migration:
125 node DS2.8XL cluster
4,500 vCPUs, 30TB RAM
6 PB uncompressed
10x faster analytic queries
50% reduction in time for new BI
app. deployment
Significantly less ops. overhead
Migration from SQL on Hadoop @ Yahoo
Analytics for website/mobile events
across multiple Yahoo properties
On an average day
2B events
25M devices
Before migration: Hive – Found it to be
slow, hard to use, share and repeat
After migration:
21 node DC1.8XL (SSD)
50TB compressed data
100x performance improvement
Real-time insights
Easier deployment and
maintenance
Migration from SQL on Hadoop @ Yahoo
[Chart: query time in seconds (axis from 1 to 10,000), Amazon Redshift vs. Impala, for count distinct devices, count all events, filter clauses, and joins.]
Business Value and Productivity
Business Productivity Benefits
Analyze more data
Faster time to market
Get better insights
Match capacity with demand
ENGINE X Amazon Redshift
ETL Scripts
SQL in reports
Ad hoc queries
How to Migrate?
Schema Conversion Database Migration
Map data types
Choose compression
encoding, sort keys,
distribution keys
Generate and apply DDL
Schema & Data
Transformation
Data Migration
Convert SQL Code
Bulk Load
Capture updates
Transformations
Assess Gaps
Stored Procedures
Functions
Convert schema in a few clicks
Sources include Oracle, Teradata,
Greenplum and Netezza
Automatic schema optimization
Converts application SQL code
Detailed assessment report
AWS Schema
Conversion Tool
(AWS SCT)
AWS Schema Conversion Tool
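To make the "map data types" and "assess gaps" steps concrete, here is a minimal sketch of a type-mapping pass. It is not how AWS SCT works internally. The three Oracle-to-Redshift rules, the function names, and the sample columns are assumptions chosen for illustration, and a real migration needs many more rules (for example, Redshift VARCHAR lengths are in bytes).

```python
import re

def map_oracle_type(oracle_type):
    """Return a Redshift type for a source type, or None if no rule applies."""
    t = oracle_type.strip().upper()
    m = re.fullmatch(r"VARCHAR2\((\d+)\)", t)
    if m:
        return f"VARCHAR({m.group(1)})"
    m = re.fullmatch(r"NUMBER\((\d+)(?:\s*,\s*(\d+))?\)", t)
    if m:
        return f"DECIMAL({m.group(1)},{m.group(2) or 0})"
    if t == "DATE":
        # Oracle DATE carries a time component, so TIMESTAMP keeps it.
        return "TIMESTAMP"
    return None

def convert_columns(columns):
    """Split columns into converted types and gaps that need manual review."""
    converted, gaps = {}, []
    for name, source_type in columns.items():
        target = map_oracle_type(source_type)
        if target is None:
            gaps.append(name)
        else:
            converted[name] = target
    return converted, gaps

# Hypothetical source table definition.
source_columns = {
    "order_id": "NUMBER(12)",
    "amount": "NUMBER(10, 2)",
    "customer": "VARCHAR2(40)",
    "created": "DATE",
    "payload": "XMLTYPE",
}
converted, gaps = convert_columns(source_columns)
print(converted)
print("Needs manual review:", gaps)
```

Columns with no rule land in the gap list rather than being guessed at, which mirrors the idea of an assessment report.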
Start your first migration in a few minutes
Sources include: Aurora, Oracle, SQL
Server, MySQL and PostgreSQL
Bulk load and continuous replication
Migrate a TB for $3
Fault tolerant
(AWS DMS)
AWS DMS: Change data capture
[Diagram: Source, replication instance, Target. Transactions t1 and t2 (updates) on the source; change apply after bulk load.]
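The "change apply after bulk load" idea can be sketched as a two-phase replay. This is a toy model, not the DMS implementation: tables are plain dictionaries keyed by primary key, and the change-record format `(operation, key, row)` is an assumption made for this example.

```python
def migrate_with_cdc(source_snapshot, changes_during_load):
    """Bulk load a snapshot, then apply captured changes in commit order."""
    target = {}

    # Phase 1: bulk load the snapshot taken when the migration started.
    for key, row in source_snapshot.items():
        target[key] = dict(row)

    # Phase 2: apply changes that were captured while the bulk load ran.
    for operation, key, row in changes_during_load:
        if operation == "delete":
            target.pop(key, None)
        else:  # "insert" or "update"
            target[key] = dict(row)
    return target

# Hypothetical data: two rows at snapshot time, three changes captured later.
snapshot = {1: {"plan": "basic"}, 2: {"plan": "pro"}}
changes = [
    ("update", 1, {"plan": "pro"}),
    ("insert", 3, {"plan": "basic"}),
    ("delete", 2, None),
]
result = migrate_with_cdc(snapshot, changes)
print(result)
```

Applying the changes only after the bulk load finishes keeps the target consistent with the source without pausing writes on the source.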
Data integration partners
Data Integration Systems Integrators
Amazon Redshift
Beyond Amazon Redshift…
Scholastic, Established 1920
Where were we?
Platform
13+ years old. IBM AS/400 DB2 and Microsoft SQL Server are the primary data
warehouse platforms. BI Platform is primarily Microsoft (SSRS, SSAS, Excel, SharePoint)
500+ direct users across every LOB and business function
20+ TB. 5,500+ DB2 workloads, 350+ SQL Server workloads, 15 SSAS cubes, 150+
SSRS reports
Challenges
Inflexible, multi-layered architecture – slow time to market
Inability to meet internal SLAs due to performance of daily ETL processes
Scalability limitations with SQL Server Analysis Services (SSAS) for reports
Limited ability to perform self-service Business Intelligence
Moving forward: Key decision factors
• Improved performance, scalability, availability,
logging, security
• Enablement of self service business intelligence
• Leverage the skill set of current team (Relational DB
& SQL)
• Integration with existing technology stack
• Alignment with the tech strategy (devops model,
Cloud First)
• Ability to support Big Data initiatives
• Team up with an experienced consulting partner
Why we chose AWS and Amazon Redshift
AWS was chosen for its agility, scalability, elasticity, and
security
Redshift
• Scalable, fast
• Managed service, cost-optimization models,
elastic
• SQL/relational matched skillset of team
S3 was chosen as location for ingestion process
NorthBay was chosen as the implementation partner for
their expertise in Big Data and Redshift migrations
How the project unfolded
Goals
• 3-month pilot to migrate a Functional area in key LOB
• Demonstrate immediate business value
• Use AWS Stack & Open Source for Data Movement from DB2
(No CDC/ETL tool)
Outcomes
• Core Framework for Migration
• ELT Architecture and Validation
• Visualization/Self-service capability through Tableau
Scholastic data cloud: Technical architecture
[Architecture diagram. Sources: source DBs, AS400 / DB2 (staging), SQL Server EDW, SSAS cubes, SSRS reports. Ingestion: Data Pipeline, EMR cluster running a Sqoop script, S3 output bucket, EC2 instance running the COPY command. Warehouse: Redshift (staging), Redshift (Enterprise Data Repository). Reporting: Tableau (reporting tool). Supporting services: SNS topic (pipeline status, pipeline failure), SNS email notification, Lambda (save pipeline stats), RDS MySQL instance and DynamoDB (pipeline configurations).]
Core Framework
• Jobs and Job Groups are defined as metadata in DynamoDB
• Control-M scheduler, Custom Application and Data Pipeline for
Orchestration
• ELT Process with EMR/Sqoop for Extraction. Load and Transform
the data through Redshift SQL scripts
• Core Framework enables
• Restart capability from point of failure
• Capturing of operational statistics (# of rows updated, etc.)
• Audit capability (which feed caused the Fact to change, etc.)
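A restartable job group of the kind described above can be sketched as follows. This is an illustration only, not the Scholastic/NorthBay framework: plain dictionaries stand in for the DynamoDB metadata tables, and the job names and the simulated failure are hypothetical.

```python
import time

def run_job_group(jobs, state, stats):
    """Run jobs in order, skipping any that finished in an earlier attempt."""
    for name, job in jobs:
        if state.get(name) == "done":
            continue  # already completed; this is the restart point logic
        started = time.time()
        try:
            rows = job()
        except Exception:
            state[name] = "failed"
            raise
        state[name] = "done"
        # Operational statistics captured per job.
        stats[name] = {"rows": rows, "seconds": time.time() - started}

calls = []  # records every job invocation, for demonstration

def extract():
    calls.append("extract")
    return 100

def load_dim():
    calls.append("load_dim")
    if calls.count("load_dim") == 1:
        raise RuntimeError("simulated failure on first attempt")
    return 40

def load_fact():
    calls.append("load_fact")
    return 60

jobs = [("extract", extract), ("load_dim", load_dim), ("load_fact", load_fact)]
state, stats = {}, {}

try:
    run_job_group(jobs, state, stats)   # fails at load_dim
except RuntimeError:
    pass
run_job_group(jobs, state, stats)       # resumes at load_dim, skips extract
print(calls)
```

Because job state is kept outside the process, a rerun of the same group picks up at the failed job instead of repeating the extract.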
Extract
• Pre-create EMR resources at the start of Batch
• Achieve parallelism in Sqoop with mappers and Fair Scheduling
• Sqoop query to add additional fields like Batch_id, Updated_date etc
• Data extracts are split and compressed for optimized loading into Redshift
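The extract bullets map onto a handful of standard Sqoop import options. The sketch below only builds the command line; it does not run Sqoop. The JDBC URL, table, split column, and S3 path are hypothetical, and driver and credential options are left out on purpose.

```python
def build_sqoop_import(jdbc_url, table, split_by, batch_id, target_dir,
                       num_mappers=8):
    """Build a Sqoop import command that adds audit columns to each row."""
    # A free-form query must contain $CONDITIONS so Sqoop can split the work.
    query = (
        f"SELECT t.*, {batch_id} AS batch_id, "
        f"CURRENT_TIMESTAMP AS updated_date "
        f"FROM {table} t WHERE $CONDITIONS"
    )
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--query", query,
        "--split-by", split_by,              # column used to partition rows
        "--num-mappers", str(num_mappers),   # one output file per mapper
        "--target-dir", target_dir,
        "--compress",
        "--compression-codec", "org.apache.hadoop.io.compress.GzipCodec",
        "--fields-terminated-by", "|",
    ]

# Hypothetical values for illustration.
command = build_sqoop_import(
    jdbc_url="jdbc:as400://as400.example.com/SALESLIB",
    table="ORDERS",
    split_by="ORDER_ID",
    batch_id=20161130,
    target_dir="s3://example-bucket/extracts/orders/batch=20161130/",
    num_mappers=4,
)
print(" ".join(command))
```

Each mapper writes its own compressed part file, so the extract arrives in S3 already split for a parallel load.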
[Diagram: extract flow in six numbered steps across AS400 / DB2, EMR with Sqoop, S3, Metadata, KMS, and Data Pipeline, distinguishing control flow from data flow.]
Load
• Truncate and Load through Data Pipeline for Staging tables
• Dynamic Work Load Management (WLM) queues setup to allow maximum
resources during Loading/Transformation
• Check and terminate any locks on tables to allow truncation
• Capture metrics related to number of rows loaded, time taken, etc.
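A truncate-and-load step for one staging table could be generated along these lines. This is a sketch under assumptions: the table name, S3 prefix, and IAM role ARN are hypothetical, and the file format options (gzip, pipe-delimited) are an example choice rather than something the slides specify.

```python
def truncate_and_copy_sql(table, s3_prefix, iam_role_arn):
    """Return the SQL statements for a truncate-and-load of a staging table."""
    return [
        # In Redshift, TRUNCATE commits the transaction it runs in.
        f"TRUNCATE TABLE {table};",
        # COPY loads every file under the prefix, in parallel across slices.
        (
            f"COPY {table} FROM '{s3_prefix}' "
            f"IAM_ROLE '{iam_role_arn}' "
            "GZIP DELIMITER '|';"
        ),
        # Row count captured afterwards as a load metric.
        f"SELECT COUNT(*) FROM {table};",
    ]

# Hypothetical values for illustration.
statements = truncate_and_copy_sql(
    table="staging.orders",
    s3_prefix="s3://example-bucket/extracts/orders/batch=20161130/",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-copy",
)
for statement in statements:
    print(statement)
```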
[Diagram: load flow in four numbered steps across S3, Staging, KMS, Data Pipeline, and EC2, distinguishing control flow from data flow.]
Transform
• Custom Application for building Dimensions and Facts
• SQL Scripts are stored in S3 and executed by ELT process
• SQL scripts refactored from SQL Server and AS400 scripts
• Non-Functional Requirements are achieved through Custom App
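The idea of running stored SQL scripts to build a dimension from staging can be sketched with an in-memory SQLite database standing in for Redshift. Everything here is an assumption for illustration: the table names, the delete-then-insert script, the `:batch_id` parameter, and the naive split on semicolons. A real script would be fetched from S3 and run against the cluster.

```python
import sqlite3

def run_sql_script(conn, script, params):
    """Execute each statement in order and record the rows it affected."""
    results = []
    for statement in (part.strip() for part in script.split(";")):
        if not statement:
            continue
        cursor = conn.execute(statement, params)
        results.append((statement.split()[0].upper(), cursor.rowcount))
    conn.commit()
    return results

# SQLite stands in for Redshift in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_customer (customer_id TEXT, name TEXT)")
conn.execute(
    "CREATE TABLE dim_customer (customer_id TEXT, name TEXT, batch_id INTEGER)"
)
conn.executemany(
    "INSERT INTO stg_customer VALUES (?, ?)", [("C1", "Ann"), ("C2", "Bob")]
)
conn.execute("INSERT INTO dim_customer VALUES ('C1', 'Old Ann', 1)")
conn.commit()

# Hypothetical transform script: replace changed rows, then insert the batch.
script = """
DELETE FROM dim_customer
WHERE customer_id IN (SELECT customer_id FROM stg_customer);
INSERT INTO dim_customer (customer_id, name, batch_id)
SELECT customer_id, name, :batch_id FROM stg_customer;
"""
results = run_sql_script(conn, script, {"batch_id": 2})
rows = conn.execute(
    "SELECT customer_id, name, batch_id FROM dim_customer ORDER BY customer_id"
).fetchall()
print(results)
print(rows)
```

Recording the row count per statement is one simple way to feed the operational statistics and audit trail mentioned for the custom application.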
[Diagram: transform flow in steps 1 to 7b across S3, Staging, Facts, Dimensions, Metadata, and the custom App, distinguishing control flow from data flow.]
Schema Design
• Modified Star Schema
• Natural Keys instead of generating unique identifiers
• Commonly used columns from Dimensions are copied over to
Facts
• Surrogate keys are eliminated except for a few cases
• Compression
• Define appropriate Distribution and Sort Keys
• Define primary key and Foreign keys
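The schema design points can be shown in one generated table definition. This is a sketch, not Scholastic's actual schema: the fact table, its columns, and the choice of encodings, distribution key, and sort key are hypothetical. Note that Redshift treats primary and foreign keys as informational for the query planner and does not enforce them.

```python
def build_create_table(table, columns, distkey, sortkeys, primary_key):
    """Build Redshift DDL with per-column encoding, DISTKEY and SORTKEY."""
    lines = [
        f"  {name} {col_type} ENCODE {encoding}"
        for name, col_type, encoding in columns
    ]
    lines.append(f"  PRIMARY KEY ({', '.join(primary_key)})")
    body = ",\n".join(lines)
    return (
        f"CREATE TABLE {table} (\n{body}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({distkey})\n"
        f"SORTKEY ({', '.join(sortkeys)});"
    )

# Hypothetical fact table using a natural key and a column copied from a dimension.
ddl = build_create_table(
    table="fact_order",
    columns=[
        ("order_number", "VARCHAR(20)", "lzo"),    # natural key from the source
        ("customer_id", "VARCHAR(20)", "lzo"),
        ("customer_name", "VARCHAR(100)", "lzo"),  # copied from the dimension
        ("order_date", "DATE", "raw"),             # sort key left uncompressed
        ("amount", "DECIMAL(12,2)", "lzo"),
    ],
    distkey="customer_id",
    sortkeys=["order_date"],
    primary_key=["order_number"],
)
print(ddl)
```

Copying frequently used dimension columns into the fact table trades some storage for fewer joins at query time.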
Security
• AWS Key Management Service (KMS) is used for encrypting
access credentials to Source and Target databases
• Jenkins job to allow encrypting of credentials using KMS
directly by Database Administrators
• Amazon EMR, Jenkins resources are given KMS decrypt
permissions to allow connecting to Sources and Targets during
the ELT process
• Standard Security in Transit and at Rest throughout the process
• IAM federation through Enterprise Active Directory
Reporting
• Business users access to Facts/Dimensions through Tableau
• Power users access to Staging tables through Tableau
• Enable Data Analysts access to files in S3 using Hive/Presto
• Self-Service capability across business users
[Diagram: business analysts reach Facts/Dimensions, power users reach Staging, and data analysts reach S3 through EMR (Presto/Hive).]
Workstream Effort
• Define Jobs and Job Groups specific to each
Workstream
• Create Redshift tables (Staging, Facts, Dimensions)
based on mapping from AS400 and best practices
learned
• Create new SQL scripts (based on the logic from
AS400/SQL Server code) for transformation
• Develop, Test and Deploy in 2-week Agile sprints
Key Lessons - Technical
• Isolate core framework with project specific code repositories
• Consolidating logging solution across Amazon S3, Amazon
Redshift, Amazon DynamoDB etc., was a challenge
• Make appropriate schema changes when migrating to new
platform
• Custom Framework for gathering operational stats (e.g., # of
rows loaded)
• Start with Test Automation tools and Acceptance Test Driven
Development (ATDD) earlier in the project
Project timeline revisited
After the successful pilot:
• Executive Leadership accelerated timeline:
• Reduce project timeline by 50% (to 12 months) to
deliver value faster to LOBs
• Realize cost savings by eliminating the DB2 and
SQL Server platforms earlier
• Users wanted to be on the new platform!
• Scholastic & NorthBay partnered to create a
training curriculum to ensure a supply of skilled
staff would be available to our teams
Scaling up: 7 workstreams
• Developed a model for estimating effort and cost
(AWS costs & Labor per LOB migration)
• Running agile teams in parallel – employed Agile
coaches
• Enhanced the core framework to ensure it would
scale effectively when in use by multiple teams
simultaneously
• Building a Code repository for use by all teams
• Building CI / CD Frameworks
Where are we now?
• 4 of 7 LOBs migrated – framework enables complete migration of a
functional area within days/weeks as opposed to months. On track to
migrate and decommission entire legacy environment within next 6
months
• 10 weeks to migrate from an external vendor hosting data and providing
reports for one LoB
• Cost of Data Ingestion Framework is under $40/day (EC2, EMR, Data
Pipeline)
• First “Big Data” initiative in production, captures and processes an
average of 1.5 million e-reading events daily (peak: 7 million)
• Profile: LOB #1
• Loading ~5-6 Million rows/day (6-7GB/day)
• Processing over 1.5 billion rows within Redshift daily
• Complete ETL/ELT batch cycle performance improved by over 170%
Key lessons – project execution
• Essential to monitor and optimize AWS costs
• “Data Champion” / “Data Guide” partnership absolutely critical for
successful adoption of new platforms
• Importance of strong Agile coaches while scaling out Agile teams
• Criticality of choosing consulting partners (AWS & NorthBay)
who can ramp up and supply key resources fast and cycle off the
project when finished
• Creating new data platforms and migrating data into them is
easy, especially with AWS. Decommission of existing data
platforms is hard!
Thank you!
Remember to complete
your evaluations!
Related Sessions
Hear from other customers discussing their Amazon Redshift use cases:
• BDM402—Best Practices for Data Warehousing with Amazon Redshift (King.com)
• BDA304—What’s New with Amazon Redshift
• SVR308—Content and Data Platforms at Vevo: Rebuilding and Scaling from Zero in One Year
• GAM301—How EA Leveraged Amazon Redshift and AWS Partner 47Lining to Gather Meaningful
Player Insights
• BDA207—Fanatics: Deploying Scalable, Self-Service Business Intelligence on AWS
• BDM306— Netflix: Using Amazon S3 as the fabric of our big data ecosystem
• BDA203 — Billions of Rows Transformed in Record Time Using Matillion ETL for Amazon Redshift
(GE Power and Water)
• BDM206 — Understanding IoT Data: How to Leverage Amazon Kinesis in Building an IoT
Analytics Platform on AWS (Hello)
• STG307— Case Study: How Prezi Built and Scales a Cost-Effective, Multipetabyte Data Platform
and Storage Infrastructure on Amazon S3

More Related Content

What's hot

Mass Migration Strategy - A Key Step in the Enterprise Transformation - AWS C...
Mass Migration Strategy - A Key Step in the Enterprise Transformation - AWS C...Mass Migration Strategy - A Key Step in the Enterprise Transformation - AWS C...
Mass Migration Strategy - A Key Step in the Enterprise Transformation - AWS C...AWS Germany
 
AWS Cloud Adoption Framework and Workshops
AWS Cloud Adoption Framework and WorkshopsAWS Cloud Adoption Framework and Workshops
AWS Cloud Adoption Framework and WorkshopsTom Laszewski
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Amazon Web Services
 
AWS Cloud Cost Optimization
AWS Cloud Cost OptimizationAWS Cloud Cost Optimization
AWS Cloud Cost OptimizationYogesh Sharma
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaScyllaDB
 
Large-Scale AWS Migrations with CSC
Large-Scale AWS Migrations with CSCLarge-Scale AWS Migrations with CSC
Large-Scale AWS Migrations with CSCAmazon Web Services
 
An Overview of Best Practices for Large Scale Migrations - AWS Transformation...
An Overview of Best Practices for Large Scale Migrations - AWS Transformation...An Overview of Best Practices for Large Scale Migrations - AWS Transformation...
An Overview of Best Practices for Large Scale Migrations - AWS Transformation...Amazon Web Services
 
Considerations for your Cloud Journey
Considerations for your Cloud JourneyConsiderations for your Cloud Journey
Considerations for your Cloud JourneyAmazon Web Services
 
Perform a Cloud Readiness Assessment for Your Own Company
Perform a Cloud Readiness Assessment for Your Own CompanyPerform a Cloud Readiness Assessment for Your Own Company
Perform a Cloud Readiness Assessment for Your Own CompanyAmazon Web Services
 
Cloud Migration Checklist | Microsoft Azure Migration
Cloud Migration Checklist | Microsoft Azure MigrationCloud Migration Checklist | Microsoft Azure Migration
Cloud Migration Checklist | Microsoft Azure MigrationIntellika
 
Accelerating Your Cloud Migration Journey with MAP
Accelerating Your Cloud Migration Journey with MAPAccelerating Your Cloud Migration Journey with MAP
Accelerating Your Cloud Migration Journey with MAPAmazon Web Services
 
AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveCobus Bernard
 
Journey Through the AWS Cloud: Cost Optimisation
Journey Through the AWS Cloud: Cost OptimisationJourney Through the AWS Cloud: Cost Optimisation
Journey Through the AWS Cloud: Cost OptimisationAmazon Web Services
 

What's hot (20)

AWS Migration Planning Roadmap
AWS Migration Planning RoadmapAWS Migration Planning Roadmap
AWS Migration Planning Roadmap
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Mass Migration Strategy - A Key Step in the Enterprise Transformation - AWS C...
Mass Migration Strategy - A Key Step in the Enterprise Transformation - AWS C...Mass Migration Strategy - A Key Step in the Enterprise Transformation - AWS C...
Mass Migration Strategy - A Key Step in the Enterprise Transformation - AWS C...
 
AWS Cloud Adoption Framework and Workshops
AWS Cloud Adoption Framework and WorkshopsAWS Cloud Adoption Framework and Workshops
AWS Cloud Adoption Framework and Workshops
 
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
AWS Cloud Cost Optimization
AWS Cloud Cost OptimizationAWS Cloud Cost Optimization
AWS Cloud Cost Optimization
 
Cloud Migration Workshop
Cloud Migration WorkshopCloud Migration Workshop
Cloud Migration Workshop
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation Criteria
 
Large-Scale AWS Migrations with CSC
Large-Scale AWS Migrations with CSCLarge-Scale AWS Migrations with CSC
Large-Scale AWS Migrations with CSC
 
An Overview of Best Practices for Large Scale Migrations - AWS Transformation...
An Overview of Best Practices for Large Scale Migrations - AWS Transformation...An Overview of Best Practices for Large Scale Migrations - AWS Transformation...
An Overview of Best Practices for Large Scale Migrations - AWS Transformation...
 
Considerations for your Cloud Journey
Considerations for your Cloud JourneyConsiderations for your Cloud Journey
Considerations for your Cloud Journey
 
Perform a Cloud Readiness Assessment for Your Own Company
Perform a Cloud Readiness Assessment for Your Own CompanyPerform a Cloud Readiness Assessment for Your Own Company
Perform a Cloud Readiness Assessment for Your Own Company
 
AWS-Data-Migration-module3
AWS-Data-Migration-module3AWS-Data-Migration-module3
AWS-Data-Migration-module3
 
Cloud Migration Checklist | Microsoft Azure Migration
Cloud Migration Checklist | Microsoft Azure MigrationCloud Migration Checklist | Microsoft Azure Migration
Cloud Migration Checklist | Microsoft Azure Migration
 
Accelerating Your Cloud Migration Journey with MAP
Accelerating Your Cloud Migration Journey with MAPAccelerating Your Cloud Migration Journey with MAP
Accelerating Your Cloud Migration Journey with MAP
 
AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep Dive
 
Masterclass Live: Amazon EMR
Masterclass Live: Amazon EMRMasterclass Live: Amazon EMR
Masterclass Live: Amazon EMR
 
Amazon S3 Masterclass
Amazon S3 MasterclassAmazon S3 Masterclass
Amazon S3 Masterclass
 
Journey Through the AWS Cloud: Cost Optimisation
Journey Through the AWS Cloud: Cost OptimisationJourney Through the AWS Cloud: Cost Optimisation
Journey Through the AWS Cloud: Cost Optimisation
 

Viewers also liked

(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon RedshiftAmazon Web Services
 
Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...
Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...
Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...Amazon Web Services
 
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWSAmazon Web Services
 
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)Amazon Web Services
 
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech Talks
Hands-on Labs: Getting Started with AWS  - March 2017 AWS Online Tech TalksHands-on Labs: Getting Started with AWS  - March 2017 AWS Online Tech Talks
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech TalksAmazon Web Services
 
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...Amazon Web Services
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and consSaniya Khalsa
 
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...Amazon Web Services
 
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)Amazon Web Services
 
Understanding AWS Storage Options
Understanding AWS Storage OptionsUnderstanding AWS Storage Options
Understanding AWS Storage OptionsAmazon Web Services
 
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...Amazon Web Services
 
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesMigrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesAmazon Web Services
 
Amazon EC2 Systems Manager for Hybrid Cloud Management at Scale
Amazon EC2 Systems Manager for Hybrid Cloud Management at ScaleAmazon EC2 Systems Manager for Hybrid Cloud Management at Scale
Amazon EC2 Systems Manager for Hybrid Cloud Management at ScaleAmazon Web Services
 
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon GlacierAWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon GlacierAmazon Web Services
 
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...Amazon Web Services
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon RedshiftAmazon Web Services
 
Introduction to DevOps and the AWS Code Services
Introduction to DevOps and the AWS Code ServicesIntroduction to DevOps and the AWS Code Services
Introduction to DevOps and the AWS Code ServicesAmazon Web Services
 

Viewers also liked (20)

(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
 
Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...
Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...
Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...
 
(STG402) Amazon EBS Deep Dive
(STG402) Amazon EBS Deep Dive(STG402) Amazon EBS Deep Dive
(STG402) Amazon EBS Deep Dive
 
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS
 
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
 
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech Talks
Hands-on Labs: Getting Started with AWS  - March 2017 AWS Online Tech TalksHands-on Labs: Getting Started with AWS  - March 2017 AWS Online Tech Talks
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech Talks
 
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
Migrate Your Data Warehouse to Amazon Redshift

  • 1. Migrating Your Data Warehouse to Amazon Redshift (DAT202). November 30, 2016. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
      Presenters:
      • Pavan Pothukuchi, Sr. Manager PM, Amazon Redshift
      • Ali Khan, Director of BI and Analytics, Scholastic
      • Laxmikanth Malladi, Principal Architect, Northbay Solutions
      Customer quotes:
      • “It’s our biggest driver of growth in our biggest markets, and is a feature of the company” …on Data Mining in Redshift – Chris Lambert, Lyft CTO
      • “The doors were blown wide open to create custom dashboards for anyone to instantly go in and see and assess what is going in our ad delivery landscape, something we have never been able to do until now.” – Bryan Blair, Vevo’s VP of Ad Operations
      • “Analytical queries are 10 times faster in Amazon Redshift than they were with our previous data warehouse.” – Yuki Moritani, NTT Docomo Innovation Manager
      • “We have several petabytes of data and use a massive Redshift cluster. Our data science team can get to the data faster and then analyze that data to find new ways to reduce costs, market products, and enable new business.” – Yuki Moritani, NTT Docomo Innovation Manager
      • “We saw a 2x performance improvement on a wide variety of workloads. The more complex the queries, the higher the performance improvement.” – Naeem Ali, Director of Software Development, Data Science at Cablevision (Optimum)
      • “Over the last few years, we’ve tried all kinds of databases in search of more speed, including $15k of custom hardware. Of everything we’ve tried, Amazon Redshift won out each time.” – Periscope Data, Analyst’s Guide to Redshift
      • “We took Amazon Redshift for a test run the moment it was released. It’s fast. It’s easy. Did I mention it’s ridiculously fast? We’re using it to provide our analysts an alternative to Hadoop.” – Justin Yan, Data Scientist at Yelp
      • “The move to Redshift also significantly improved dashboard query performance… Redshift performed ~200% faster than the traditional SQL Server we had been using in the past.” – Dean Donovan, Product Development at DiamondStream
      • “…[Redshift] performance has blown away everyone here (we generally see 50-100x speedup over Hive)” – Jie Li, Data Infrastructure at Pinterest
      • “450,000 online queries 98 percent faster than previous traditional data center, while reducing infrastructure costs by 80 percent.” – John O’Donovan, CTO, Financial Times
      • “We needed to load six months' worth of data, about 10 TB of data, for a campaign. That type of load would have taken about 20 days with our previous solution. By using Amazon Redshift, it only took six hours to load the data.” – Zhong Hong, VP of Infrastructure, Vivaki (Publicis Groupe)
      • “We regularly process multibillion row datasets and we do that in a matter of hours. We are heading to up to 10 times more data volumes in the next couple of years, easily.” – Bob Harris, CTO, Channel 4
      • “On our previous big data warehouse system, it took around 45 minutes to run a query against a year of data, but that number went down to just 25 seconds using Amazon Redshift” – Kishore Raja, Director of Strategic Programs and R&D, Boingo Wireless
      • “Most competing data warehousing solutions would have cost us up to $1 million a year. By contrast, Amazon Redshift costs us just $100,000 all-in, representing a total cost savings of around 90%” – Joel Cumming, Head of Data, Kik Interactive
      • “Annual costs of Redshift are equivalent to just the annual maintenance of some of the cheaper on-premises options for data warehouses.” – Kevin Diamond, CTO, HauteLook (Nordstrom)
      • “Our data volume keeps growing, and we can support that growth because Amazon Redshift scales so well. We wouldn’t have that capability using the supporting on-premises hardware in our previous solution.” – Ajit Zadgaonkar, Director of Ops. and Infrastructure, Edmunds
      • “With Amazon Redshift and Tableau, anyone in the company can set up any queries they like - from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts had in different areas” – Jon Hoffman, Head of Engineering, Foursquare
  • 2. Today’s agenda • Amazon Redshift Overview • Use cases and benefits • Migration options • Scholastic’s use case • Architecture details • Technical overview • Key project learnings
  • 3. Amazon Redshift: a lot faster, a lot simpler, a lot cheaper • Relational data warehouse • Massively parallel; petabyte scale • Fully managed • HDD and SSD platforms • $1,000/TB/year; starts at $0.25/hour
  • 4. The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change. Forrester Wave™ Enterprise Data Warehouse Q4 ’15
  • 6. Why migrate to Amazon Redshift? • From a transactional database: 100x faster; scales from GBs to PBs; analyze data without storage constraints • From an MPP database: 10x cheaper; easy to provision and operate; higher productivity • From Hadoop: 10x faster; no programming; standard interfaces and integration to leverage BI tools, machine learning, streaming
  • 7. Migration from Oracle @ Boingo Wireless • 2000+ commercial Wi-Fi locations • 1 million+ hotspots • 90M+ ad engagements • 100+ countries • Legacy DW: Oracle 11g based DW • Before migration: rapid data growth slowed analytics; mediocre IOPS, limited memory, vertical scaling; admin overhead; expensive (license, h/w, support) • After migration: 180x performance improvement; 7x cost savings
  • 8. Migration from Oracle @ Boingo Wireless • 7X cheaper than Oracle Exadata • 180X faster than Oracle database • Cost chart (axis $0 to $400,000): Exadata $400,000; SAP HANA $300,000; Redshift $55,000 • Latency chart (seconds, existing system vs. Redshift): query performance on 1 year of data and data load performance for 1 million records; plotted values 7,200, 2,700, 15, 15
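The two headline ratios on this slide can be checked against the plotted figures. The short sketch below is only arithmetic on the numbers shown; it assumes the 180X claim compares the 2,700 s and 15 s values, which the flattened chart does not confirm.

```python
# Figures as they appear on the slide. Which latency bar each value belongs to
# is NOT recoverable from the extracted text, so the pairing below is an assumption.
exadata_cost = 400_000
redshift_cost = 55_000
cost_ratio = exadata_cost / redshift_cost      # roughly 7.3

existing_latency_s = 2_700                     # assumed to be the existing-system value
redshift_latency_s = 15
speedup = existing_latency_s / redshift_latency_s

print(f"cost ratio: {cost_ratio:.1f}x, speedup: {speedup:.0f}x")
```

Under that pairing the numbers are consistent with the "7X cheaper" and "180X faster" labels.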
  • 9. Migration from Greenplum @ NTT Docomo • 68 million customers • 10s of TBs per day of data across mobile network • 6 PB of total data (uncompressed) • Data science for marketing operations, logistics, etc. • Legacy DW: Greenplum on-premises • After migration: 125 node DS2.8XL cluster; 4,500 vCPUs, 30 TB RAM; 6 PB uncompressed; 10x faster analytic queries; 50% reduction in time for new BI app deployment; significantly less ops overhead
  • 10. Migration from SQL on Hadoop @ Yahoo • Analytics for website/mobile events across multiple Yahoo properties • On an average day: 2B events, 25M devices • Before migration: Hive, found to be slow and hard to use, share, and repeat • After migration: 21 node DC1.8XL (SSD); 50 TB compressed data; 100x performance improvement; real-time insights; easier deployment and maintenance
  • 11. Migration from SQL on Hadoop @ Yahoo • Chart: query time in seconds (log scale, 1 to 10,000), Amazon Redshift vs. Impala, for four query types: Count Distinct Devices, Count All Events, Filter Clauses, Joins
  • 12. Business Value and Productivity Business Productivity Benefits Analyze more data Faster time to market Get better insights Match capacity with demand
  • 13. How to Migrate? (Engine X to Amazon Redshift; covers ETL scripts, SQL in reports, ad hoc queries) • Database migration in four steps (1 to 4) • Schema Conversion: map data types; choose compression encoding, sort keys, distribution keys; generate and apply DDL • Schema & Data Transformation • Data Migration: bulk load; capture updates; transformations • Convert SQL Code • Assess Gaps: stored procedures; functions
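The "map data types" and "assess gaps" steps can be sketched as a small lookup. This is a minimal illustration only: the Oracle-to-Redshift mapping table and the function `map_type` are assumptions made for this example and do not reproduce the rule set of AWS SCT.

```python
import re

# Illustrative source-to-Redshift type map (an assumption, not AWS SCT's rules).
TYPE_MAP = {
    "VARCHAR2": "VARCHAR",
    "NVARCHAR2": "VARCHAR",
    "NUMBER": "NUMERIC",
    "DATE": "TIMESTAMP",        # Oracle DATE carries a time component
    "CLOB": "VARCHAR(65535)",   # largest VARCHAR Redshift accepts
}

# Matches e.g. "NUMBER", "VARCHAR2(50)", "number(10, 2)".
_TYPE_RE = re.compile(r"\s*(\w+)\s*(\(\s*\d+\s*(?:,\s*\d+\s*)?\))?\s*")


def map_type(source_type: str) -> str:
    """Return a Redshift type for a source type, or raise ValueError for a gap."""
    match = _TYPE_RE.fullmatch(source_type)
    if match is None:
        raise ValueError(f"unparseable type: {source_type!r}")
    base = match.group(1).upper()
    args = (match.group(2) or "").replace(" ", "")
    if base not in TYPE_MAP:
        raise ValueError(f"no mapping for {base}; review manually")
    target = TYPE_MAP[base]
    # Keep the source length/precision unless the target already fixes one.
    return target if "(" in target else target + args


mapped = [map_type(t) for t in ("VARCHAR2(50)", "number(10, 2)", "DATE", "CLOB")]
print(mapped)
```

Types that raise `ValueError` here correspond to the gaps that need manual assessment.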
  • 14. AWS Schema Conversion Tool (AWS SCT) • Convert schema in a few clicks • Sources include Oracle, Teradata, Greenplum and Netezza • Automatic schema optimization • Converts application SQL code • Detailed assessment report
  • 16. AWS DMS • Start your first migration in a few minutes • Sources include: Aurora, Oracle, SQL Server, MySQL and PostgreSQL • Bulk load and continuous replication • Migrate a TB for $3 • Fault tolerant
  • 17. AWS DMS: Change data capture • Diagram: source, replication instance, target; update transactions at t1 and t2 on the source are replayed on the target • Change apply after bulk load
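The "change apply after bulk load" idea can be modeled in a few lines. This is a conceptual sketch only, not how AWS DMS is implemented: the function `migrate_with_cdc`, the tuple format for changes, and the sample rows are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Change = Tuple[str, int, Optional[str]]  # (operation, primary key, new value)


def migrate_with_cdc(snapshot: Dict[int, str],
                     changes: Iterable[Change]) -> Dict[int, str]:
    """Bulk-load a snapshot, then apply changes captured while the load ran."""
    target = dict(snapshot)              # step 1: bulk load of the snapshot
    for op, key, value in changes:       # step 2: replay in commit order
        if op == "delete":
            target.pop(key, None)
        else:                            # "insert" and "update" both upsert
            target[key] = value
    return target


snapshot = {1: "a", 2: "b"}
captured = [("update", 1, "a2"), ("insert", 3, "c"), ("delete", 2, None)]
result = migrate_with_cdc(snapshot, captured)
print(result)
```

Replaying in commit order matters: applying the same changes out of order could resurrect a deleted row or overwrite a newer value.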
  • 18. Data integration partners Data Integration Systems Integrators Amazon Redshift
  • 21. Where were we? • Platform 13+ years old • IBM AS/400 DB2 and Microsoft SQL Server are the primary data warehouse platforms • BI platform is primarily Microsoft (SSRS, SSAS, Excel, SharePoint) • 500+ direct users across every LOB and business function • 20+ TB; 5,500+ DB2 workloads, 350+ SQL Server workloads, 15 SSAS cubes, 150+ SSRS reports Challenges • Inflexible, multi-layered architecture; slow time to market • Inability to meet internal SLAs due to performance of daily ETL processes • Scalability limitations with SQL Server Analysis Services (SSAS) for reports • Limited ability to perform self-service Business Intelligence
  • 22. Moving forward: Key decision factors • Improved performance, scalability, availability, logging, security • Enablement of self-service business intelligence • Leverage the skill set of current team (Relational DB & SQL) • Integration with existing technology stack • Alignment with the tech strategy (DevOps model, Cloud First) • Ability to support Big Data initiatives • Team up with an experienced consulting partner
  • 23. Why we chose AWS and Amazon Redshift • AWS was chosen for its agility, scalability, elasticity, and security • Redshift: scalable, fast; managed service, cost-optimization models, elastic; SQL/relational matched skillset of team • S3 was chosen as location for ingestion process • NorthBay was chosen as the implementation partner for their expertise in Big Data and Redshift migrations
  • 24. How the project unfolded Goals • 3-month pilot to migrate a Functional area in key LOB • Demonstrate immediate business value • Use AWS Stack & Open Source for Data Movement from DB2 (No CDC/ETL tool) Outcomes • Core Framework for Migration • ELT Architecture and Validation • Visualization/Self-service capability through Tableau
  • 25. Scholastic data cloud: Technical architecture • Diagram components: Source DBs; AS400 / DB2 (Staging); SQL Server EDW; SSAS Cubes; SSRS Reports; EMR Cluster running Sqoop Script; Output Bucket; EC2 Instance running Copy Command; Redshift (Staging); Redshift (Enterprise Data Repository); Data Pipeline; SNS Topic (Pipeline Status) (Pipeline Failure); SNS Email Notification; Lambda (Save Pipeline Stats); RDS MySQL Instance; Pipeline Configurations; DynamoDB; Tableau (Reporting Tool)
  • 26. Core Framework • Jobs and Job Groups are defined as metadata in DynamoDB • Control-M scheduler, Custom Application and Data Pipeline for Orchestration • ELT Process with EMR/Sqoop for Extraction. Load and Transform the data through Redshift SQL scripts • Core Framework enables • Restart capability from point of failure • Capturing of operational statistics (# of rows updated, etc.) • Audit capability (which feed caused the Fact to change, etc.)
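Restart from the point of failure and capture of operational statistics can be sketched with a status record per job. This is a minimal sketch, not Scholastic's framework: a plain dict stands in for the DynamoDB metadata table, and `run_job_group`, the job names, and the row counts are hypothetical.

```python
from typing import Callable, Dict, List


def run_job_group(jobs: List[str],
                  actions: Dict[str, Callable[[], int]],
                  state: Dict[str, dict]) -> bool:
    """Run jobs in order, skipping those already marked SUCCEEDED in `state`."""
    for name in jobs:
        if state.get(name, {}).get("status") == "SUCCEEDED":
            continue                      # finished in an earlier attempt
        try:
            rows = actions[name]()        # each job returns its row count
        except Exception as exc:
            state[name] = {"status": "FAILED", "error": str(exc)}
            return False                  # stop here; the next run resumes at this job
        state[name] = {"status": "SUCCEEDED", "rows": rows}
    return True


calls: List[str] = []
attempts = {"transform": 0}


def extract() -> int:
    calls.append("extract")
    return 100


def transform() -> int:
    calls.append("transform")
    attempts["transform"] += 1
    if attempts["transform"] == 1:        # simulate a failure on the first attempt
        raise RuntimeError("table lock timeout")
    return 100


def publish() -> int:
    calls.append("publish")
    return 100


jobs = ["extract", "transform", "publish"]
actions = {"extract": extract, "transform": transform, "publish": publish}
state: Dict[str, dict] = {}
first_run = run_job_group(jobs, actions, state)    # stops at transform
second_run = run_job_group(jobs, actions, state)   # resumes at transform
print(first_run, second_run, calls)
```

On the second run the extract job is not repeated, because its SUCCEEDED record is already in the state store.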
  • 27. Extract • Pre-create EMR resources at the start of Batch • Achieve parallelism in Sqoop with mappers and Fair Scheduling • Sqoop query to add additional fields like Batch_id, Updated_date, etc. • Data extracts are split and compressed for optimized loading into Redshift • Diagram: AS400 / DB2, EMR with Sqoop, S3, Metadata, KMS, Data Pipeline; steps 1 to 6 with control flow and data flow
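A Sqoop invocation along the lines of the Extract slide might be assembled as follows. This is a hedged sketch: the host, library, table, column, and bucket names are hypothetical, and the JDBC driver class is the JTOpen driver assumed for AS/400 access. The Sqoop options themselves (`--query` with `$CONDITIONS`, `--split-by`, `--num-mappers`, `--compress`) are standard.

```python
import shlex
from typing import List


def build_sqoop_import(jdbc_url: str, driver: str, username: str,
                       password_file: str, table: str, split_by: str,
                       num_mappers: int, target_dir: str,
                       batch_id: int) -> List[str]:
    """Build a parallel, compressed Sqoop import that adds audit columns."""
    # A free-form query must contain $CONDITIONS; Sqoop replaces it with the
    # range predicate for each mapper.
    query = (f"SELECT t.*, {batch_id} AS batch_id, "
             f"CURRENT_TIMESTAMP AS updated_date "
             f"FROM {table} t WHERE $CONDITIONS")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--driver", driver,
        "--username", username,
        "--password-file", password_file,
        "--query", query,
        "--split-by", split_by,             # column used to partition the work
        "--num-mappers", str(num_mappers),  # one output file per mapper
        "--target-dir", target_dir,
        "--fields-terminated-by", "|",
        "--compress",
        "--compression-codec", "org.apache.hadoop.io.compress.GzipCodec",
    ]


cmd = build_sqoop_import(
    jdbc_url="jdbc:as400://as400.example.internal/SALESLIB",   # hypothetical host
    driver="com.ibm.as400.access.AS400JDBCDriver",
    username="etl_user",
    password_file="/user/hadoop/.as400.password",
    table="SALESLIB.ORDERS",
    split_by="ORDER_ID",
    num_mappers=8,
    target_dir="s3://example-extract-bucket/orders/batch=42/",
    batch_id=42,
)
print(shlex.join(cmd))
```

Because each mapper writes its own gzip file, the extract arrives in S3 already split and compressed, which is the shape a parallel Redshift load prefers.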
  • 28. Load • Truncate and Load through Data Pipeline for Staging tables • Dynamic Workload Management (WLM) queues set up to allow maximum resources during Loading/Transformation • Check and terminate any locks on tables to allow truncation • Capture metrics related to number of rows loaded, time taken, etc. • Diagram: S3, Staging, KMS, Data Pipeline, EC2; steps 1 to 4 with control flow and data flow
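The truncate-and-load step for a staging table could look like the statement list below. This is a sketch under assumptions: the table name, S3 prefix, and IAM role ARN are hypothetical, and authorizing `COPY` with `IAM_ROLE` is one option rather than necessarily what this project used. `pg_last_copy_count()` is Redshift's function for the row count of the last `COPY` in the session.

```python
from typing import List


def build_staging_load(table: str, s3_prefix: str, iam_role_arn: str) -> List[str]:
    """Return the SQL statements for a truncate-and-load of one staging table."""
    return [
        # TRUNCATE needs an exclusive lock, hence the slide's lock check first.
        f"TRUNCATE TABLE {table};",
        # COPY loads every file under the prefix in parallel across slices.
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' GZIP DELIMITER '|';",
        # Row count of the COPY above, for the operational metrics.
        "SELECT pg_last_copy_count();",
    ]


statements = build_staging_load(
    table="staging.orders",
    s3_prefix="s3://example-extract-bucket/orders/batch=42/",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-copy",
)
for sql in statements:
    print(sql)
```

The statements would be run in order on one database session, so that the final query reports the count for this load.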
  • 29. Transform • Custom Application for building Dimensions and Facts • SQL Scripts are stored in S3 and executed by ELT process • SQL scripts refactored from SQL Server and AS400 scripts • Non-Functional Requirements are achieved through Custom App • Diagram: S3, Staging, Facts, Dimensions, Metadata, App; steps 1 to 7b with control flow and data flow
  • 30. Schema Design • Modified Star Schema • Natural Keys instead of generating unique identifiers • Commonly used columns from Dimensions are copied over to Facts • Surrogate keys are eliminated except for a few cases • Compression • Define appropriate Distribution and Sort Keys • Define Primary and Foreign keys
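These schema choices translate into Redshift DDL roughly as sketched below. The table, column names, encodings, and key choices are hypothetical and only illustrate the pattern (natural keys, a denormalized dimension column, compression, distribution and sort keys). Note that Redshift treats primary and foreign keys as informational: they help the planner but are not enforced.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Column:
    name: str
    type: str
    encode: Optional[str] = None   # None leaves the encoding to Redshift


def build_create_table(table: str, columns: Sequence[Column], dist_key: str,
                       sort_keys: Sequence[str],
                       primary_key: Optional[Sequence[str]] = None) -> str:
    """Render a CREATE TABLE with compression, distribution key, and sort keys."""
    lines: List[str] = []
    for col in columns:
        line = f"  {col.name} {col.type}"
        if col.encode:
            line += f" ENCODE {col.encode}"
        lines.append(line)
    if primary_key:
        lines.append(f"  PRIMARY KEY ({', '.join(primary_key)})")
    return (f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n)\n"
            f"DISTSTYLE KEY DISTKEY ({dist_key})\n"
            f"COMPOUND SORTKEY ({', '.join(sort_keys)});")


ddl = build_create_table(
    table="edr.fact_order_line",
    columns=[
        Column("order_date", "DATE"),                   # sort key, left unencoded
        Column("customer_number", "VARCHAR(20)", "lzo"),  # natural key
        Column("isbn", "VARCHAR(13)", "lzo"),             # natural key
        Column("title", "VARCHAR(200)", "lzo"),           # copied from the dimension
        Column("quantity", "INTEGER", "delta"),
    ],
    dist_key="customer_number",
    sort_keys=["order_date", "customer_number"],
    primary_key=["order_date", "customer_number", "isbn"],
)
print(ddl)
```

Distributing on a column that is also used in joins keeps matching rows on the same slice, and sorting on the date lets range-restricted queries skip blocks.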
  • 31. Security • AWS Key Management Service (KMS) is used for encrypting access credentials to Source and Target databases • Jenkins job to allow encrypting of credentials using KMS directly by Database Administrators • Amazon EMR and Jenkins resources are given KMS decrypt permissions to allow connecting to Sources and Targets during the ELT process • Standard Security in Transit and at Rest throughout the process • IAM federation through Enterprise Active Directory
  • 32. Reporting • Business users access Facts/Dimensions through Tableau • Power users access Staging tables through Tableau • Data Analysts access files in S3 using Hive/Presto • Self-Service capability across business users • Diagram: S3, Staging, Facts/Dimensions, EMR Presto/Hive; Business Analysts, Power Users, Data Analysts
  • 33. Workstream Effort • Define Jobs and Job Groups specific to each Workstream • Create Redshift tables (Staging, Facts, Dimensions) based on mapping from AS400 and best practices learned • Create new SQL scripts (based on the logic from AS400/SQL Server code) for transformation • Develop, Test and Deploy in 2-week Agile sprints
  • 34. Key Lessons - Technical • Isolate core framework with project-specific code repositories • Consolidating a logging solution across Amazon S3, Amazon Redshift, Amazon DynamoDB, etc. was a challenge • Make appropriate schema changes when migrating to a new platform • Custom Framework for gathering operational stats (e.g., # of rows loaded) • Start with Test Automation tools and Acceptance Test Driven Development (ATDD) earlier in the project
  • 35. Project timeline revisited After the successful pilot: • Executive Leadership accelerated timeline: • Reduce project timeline by 50% (to 12 months) to deliver value faster to LOBs • Realize cost savings by eliminating the DB2 and SQL Server platforms earlier • Users wanted to be on the new platform! • Scholastic & NorthBay partnered to create a training curriculum to ensure a supply of skilled staff would be available to our teams
  • 36. Scaling up: 7 workstreams • Developed a model for estimating effort and cost (AWS costs & Labor per LOB migration) • Running agile teams in parallel – employed Agile coaches • Enhanced the core framework to ensure it would scale effectively when in use by multiple teams simultaneously • Building a Code repository for use by all teams • Building CI / CD Frameworks
  • 37. Where are we now? • 4 of 7 LOBs migrated; the framework enables complete migration of a functional area within days/weeks as opposed to months. On track to migrate and decommission entire legacy environment within next 6 months • 10 weeks to migrate from an external vendor hosting data and providing reports for one LOB • Cost of Data Ingestion Framework is under $40/day (EC2, EMR, Data Pipeline) • First “Big Data” initiative in production, captures and processes an average of 1.5 million e-reading events daily (peak: 7 million) • Profile: LOB #1 • Loading ~5-6 million rows/day (6-7 GB/day) • Processing over 1.5 billion rows within Redshift daily • Complete ETL/ELT batch cycle performance improved by over 170%
  • 38. Key lessons – project execution • Essential to monitor and optimize AWS costs • “Data Champion” / “Data Guide” partnership absolutely critical for successful adoption of new platforms • Importance of strong Agile coaches while scaling out Agile teams • Criticality of choosing consulting partners (AWS & NorthBay) who can ramp up and supply key resources fast and cycle off the project when finished • Creating new data platforms and migrating data into them is easy, especially with AWS. Decommissioning existing data platforms is hard!
  • 41. Related Sessions Hear from other customers discussing their Amazon Redshift use cases: • BDM402—Best Practices for Data Warehousing with Amazon Redshift (King.com) • BDA304—What’s New with Amazon Redshift • SVR308—Content and Data Platforms at Vevo: Rebuilding and Scaling from Zero in One Year • GAM301—How EA Leveraged Amazon Redshift and AWS Partner 47Lining to Gather Meaningful Player Insights • BDA207—Fanatics: Deploying Scalable, Self-Service Business Intelligence on AWS • BDM306— Netflix: Using Amazon S3 as the fabric of our big data ecosystem • BDA203 — Billions of Rows Transformed in Record Time Using Matillion ETL for Amazon Redshift (GE Power and Water) • BDM206 — Understanding IoT Data: How to Leverage Amazon Kinesis in Building an IoT Analytics Platform on AWS (Hello) • STG307— Case Study: How Prezi Built and Scales a Cost-Effective, Multipetabyte Data Platform and Storage Infrastructure on Amazon S3