SlideShare a Scribd company logo
1 of 44
Download to read offline
©2015,  Amazon  Web  Services,  Inc.  or  its  affiliates.  All  rights  reserved©2015,  Amazon  Web  Services,  Inc.  or  its  affiliates.  All  rights  reserved
Building Your Data Warehouse with
Amazon Redshift
Ian Meyers, AWS (meyersi@amazon.com)
Guest Speaker: Toby Moore, Co-Founder & CTO, Space Ape Games
(toby@spaceapegames.com)
Data Warehouse - Challenges
Cost
Complexity
Performance
Rigidity
1990	
   2000	
   2010	
   2020	
  
Enterprise	
  Data	
   Data	
  in	
  Warehouse	
  
Petabyte scale; massively parallel
Relational data warehouse
Fully managed; zero admin
SSD & HDD platforms
As low as $1,000/TB/Year
Amazon
Redshift
Redshift powers Clickstream Analytics for Amazon.com
Web log analysis for Amazon.com
Over one petabyte workload
Largest table: 400TB
2TB of data per day
Understand customer behavior
Who is browsing but not buying
Which products / features are winners
What sequence led to higher customer
conversion
Solution
Best scale out solution – Query across 1 week
Hadoop – query across 1 month
Redshift Performance Realized
Performance
Scan 15 months of data: 14 minutes
2.25 trillion rows
Load one day worth of data: 10 minutes
5 billion rows
Backfill one month of data: 9.75 hours
150 billion rows
Pig à Amazon Redshift: 2 days to 1 hr
10B row join with 700M rows
Oracle à Amazon Redshift: 90 hours to 8 hrs
Reduced number of SQLs by a factor of 3
Cost
2PB cluster
100 node dw1.8xl (3yr RI)
$180/hr
Complexity
20% time of one DBA
Backup
Restore
Resizing
0	
   4	
   8	
   12	
   16	
   20	
   24	
   28	
   32	
  
128	
  	
  
Nodes	
  
16	
  	
  
Nodes	
  
	
  2	
  	
  
Nodes	
  
Dura%on	
  (Minutes)	
  
Time	
  to	
  Deploy	
  and	
  Manage	
  a	
  Cluster	
  
Time	
  spent	
  	
  
on	
  clicks	
  
Deploy	
  	
   Connect	
   Backup	
   Restore	
   Resize	
  	
  
(2	
  to	
  16	
  nodes)	
  
Simplicity
Who uses Amazon Redshift?
Common Customer Use Cases
•  Reduce costs by
extending DW rather than
adding HW
•  Migrate completely from
existing DW systems
•  Respond faster to
business
•  Improve performance by
an order of magnitude
•  Make more data
available for analysis
•  Access business data via
standard reporting tools
•  Add analytic functionality
to applications
•  Scale DW capacity as
demand grows
•  Reduce HW & SW costs
by an order of magnitude
Traditional Enterprise DW Companies with Big Data SaaS Companies
Selected Amazon Redshift Customers
Amazon Redshift Partners
Amazon Redshift Architecture
•  Leader Node
–  SQL endpoint
–  Stores metadata
–  Coordinates query execution
•  Compute Nodes
–  Local, columnar storage
–  Execute queries in parallel
–  Load, backup, restore via
Amazon S3; load from
Amazon DynamoDB or SSH
•  Two hardware platforms
–  Optimized for data processing
–  DW1: HDD; scale from 2TB to 1.6PB
–  DW2: SSD; scale from 160GB to 256TB
10 GigE
(HPC)
Ingestion
Backup
Restore
SQL Clients/BI Tools
128GB RAM
16TB disk
16 cores
Amazon S3 / DynamoDB / SSH
JDBC/ODBC
128GB RAM
16TB disk
16 coresCompute
Node
128GB RAM
16TB disk
16 coresCompute
Node
128GB RAM
16TB disk
16 coresCompute
Node
Leader
Node
Amazon Redshift dramatically reduces I/O
•  Data compression
•  Zone maps
•  Direct-attached storage
•  Large data block sizes
ID	
   Age	
   State	
   Amount	
  
123	
   20	
   CA	
   500	
  
345	
   25	
   WA	
   250	
  
678	
   40	
   FL	
   125	
  
957	
   37	
   WA	
   375	
  
Amazon Redshift dramatically reduces I/O
•  Data compression
•  Zone maps
•  Direct-attached storage
•  Large data block sizes
ID	
   Age	
   State	
   Amount	
  
123	
   20	
   CA	
   500	
  
345	
   25	
   WA	
   250	
  
678	
   40	
   FL	
   125	
  
957	
   37	
   WA	
   375	
  
Amazon Redshift dramatically reduces I/O
•  Column storage
•  Data compression
•  Zone maps
•  Direct-attached storage
•  Large data block sizes
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
Amazon Redshift dramatically reduces I/O
•  Column storage
•  Data compression
•  Direct-attached storage
•  Large data block sizes
•  Track of the minimum and
maximum value for each block
•  Skip over blocks that don’t
contain the data needed for a
given query
•  Minimize unnecessary I/O
Amazon Redshift dramatically reduces I/O
•  Column storage
•  Data compression
•  Zone maps
•  Direct-attached storage
•  Large data block sizes
•  Use direct-attached storage
to maximize throughput
•  Hardware optimized for high
performance data
processing
•  Large block sizes to make the
most of each read
•  Amazon Redshift manages
durability for you
Amazon Redshift has security built-in
•  SSL to secure data in transit
•  Encryption to secure data at rest
–  AES-256; hardware accelerated
–  All blocks on disks and in Amazon S3 encrypted
–  HSM Support
•  No direct access to compute nodes
•  Audit logging & AWS CloudTrail integration
•  Amazon VPC support
•  SOC 1/2/3, PCI-DSS Level 1, FedRAMP, others
10 GigE
(HPC)
Ingestion
Backup
Restore
SQL Clients/BI Tools
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
128GB RAM
16TB disk
16 cores
Amazon S3 / Amazon DynamoDB
Customer VPC
Internal
VPC
JDBC/ODBC
Leader
Node
Compute
Node
Compute
Node
Compute
Node
Amazon Redshift is 1/10th the Price of a Traditional Data
Warehouse
DW1 (HDD)
Price Per Hour for
DW1.XL Single Node
Effective Annual
Price per TB
On-Demand $ 0.850 $ 3,723
1 Year Reserved Instance $ 0.215 $ 2,192
3 Year Reserved Instance $ 0.114 $ 999
DW2 (SSD)
Price Per Hour for
DW2.L Single Node
Effective Annual
Price per TB
On-Demand $ 0.250 $ 13,688
1 Year Reserved Instance $ 0.075 $ 8,794
3 Year Reserved Instance $ 0.050 $ 5,498
Expanding Amazon Redshift’s
Functionality
New Dense Storage Instance
DS2, based on EC2’s D2, has twice the memory and CPU as DW1
Migrate from DS1 to DS2 by restoring from snapshot. We will help you migrate
your RIs
•  Twice the memory and compute power of DW1
•  Enhanced networking and 1.5X gain in disk throughput
•  40% to 60% performance gain over DW1
•  Available in the two node types: XL (2TB) and 8XL (16TB)
Custom ODBC and JDBC Drivers
•  Up to 35% higher performance than open source drivers
•  Supported by Informatica, Microstrategy, Pentaho, Qlik, SAS, Tableau
•  Will continue to support PostgreSQL open source drivers
•  Download drivers from console
Explain Plan Visualization
User Defined Functions
•  We’re enabling User Defined Functions (UDFs) so
you can add your own
–  Scalar and Aggregate Functions supported
•  You’ll be able to write UDFs using Python 2.7
–  Syntax is largely identical to PostgreSQL UDF Syntax
–  System and network calls within UDFs are prohibited
•  Comes with Pandas, NumPy, and SciPy pre-
installed
–  You’ll also be able import your own libraries for even more
flexibility
Scalar UDF example – URL parsing
CREATE FUNCTION f_hostname (VARCHAR url)
  RETURNS varchar
IMMUTABLE AS $$
  import urlparse
  return urlparse.urlparse(url).hostname
$$ LANGUAGE plpythonu;
Interleaved Multi Column Sort
•  Currently support Compound Sort Keys
–  Optimized for applications that filter data by one leading
column
•  Adding support for Interleaved Sort Keys
–  Optimized for filtering data by up to eight columns
–  No storage overhead unlike an index
–  Lower maintenance penalty compared to indexes
Compound Sort Keys Illustrated
Records in Redshift are
stored in blocks.
For this illustration, let’s
assume that four records
fill a block
Records with a given
cust_id are all in one
block
However, records with a
given prod_id are
spread across four
blocks
1
1
1
1
2
3
4
1
4
4
4
2
3
4
4
1
3
3
3
2
3
4
3
1
2
2
2
2
3
4
2
1
1  [1,1] [1,2] [1,3] [1,4]
2  [2,1] [2,2] [2,3] [2,4]
3  [3,1] [3,2] [3,3] [3,4]
4  [4,1] [4,2] [4,3] [4,4]
1 2 3 4
prod_id
cust_id
cust_id prod_id other columns blocks
1  [1,1] [1,2] [1,3] [1,4]
2  [2,1] [2,2] [2,3] [2,4]
3  [3,1] [3,2] [3,3] [3,4]
4  [4,1] [4,2] [4,3] [4,4]
1 2 3 4
prod_id
cust_id
Interleaved Sort Keys Illustrated
Records with a given
cust_id are spread across
two blocks
Records with a given
prod_id are also spread
across two blocks
Data is sorted in equal
measures for both keys
1
1
2
2
2
1
2
3
3
4
4
4
3
4
3
1
3
4
4
2
1
2
3
3
1
2
2
4
3
4
1
1
cust_id prod_id other columns blocks
How to use the feature
•  New keyword ‘INTERLEAVED’ when defining sort keys
–  Existing syntax will still work and behavior is unchanged
–  You can choose up to 8 columns to include and can query with any or
all of them
•  No change needed to queries
•  Benefits are significant
[ SORTKEY [ COMPOUND | INTERLEAVED ] ( column_name [, ...] ) ]
Amazon Redshift
Spend time with your data, not your database….
Gaming analytics with Redshift and AWS
Former CTO
Mind Candy / Moshi Monsters (2006-2012)
Introductions
Co-founder / CTO
Space Ape Games (2012-Present)
Toby Moore
Space Ape Games
12+ Million downloads
300k DAU
Coming soon!
Our games
Early needs + Approach
•  Highly empowered, analytical team
•  We hit a wall with 3rd party analytics tools
•  Big data is table stakes in games industry
•  We needed absolute flexibility on future tooling
•  No large capex spend
Pre-
Data
Not
Enough
Too
much
Tactical Predictive
Basic data
ANALYSIS
REPORTING
Data
Capture
A/B
Tests
Insights & Learning
Amazon S3
Amazon Redshift
Amazon EMR
Tactical data
ANALYSIS
REPORTING
CRM
Data
Capture
A/B
Tests
Amazon S3
Amazon Redshift
Amazon EMR
DATA MINING
ANALYSIS
REPORTING
CRM
Data
Capture
A/B
Tests
Insights & Learning
Amazon S3
Amazon Redshift
Amazon EMR
Predictive data
MODELLING
•  146Billion Rows
•  2 clusters: 1x 8 node 1 x 16 node dw1.xlarge
•  13TBOf Compressed Data
•  250m rows x 125 columns Rows Per Day
Today
Per	
  user,	
  Daily	
  Summary	
  (Over	
  200	
  Metrics)	
  
Spend Tier
In Game
Behaviour
Monetisation
Device
Tenure
Balances
Single	
  Player	
  View	
  
Platform
Spend BehaviourDevice
Retention
CountryLanguage
Acquisition Channel
Game Balances
Operating System
Castle Level
Retention
Modeling	
  and	
  predic%on	
  
Best offer bundle
Spend tier
Churn risk
Life time value
Spend propensity
Price optimisation
Never had to worry about:
•  Scalability
•  Backing up
•  Availability
•  Upgrades
•  Flexibility (ODBC etc.)
•  Performance
Summary
•  Move towards more real-time processing
•  Investigate machine learning
•  AWS Mobile analytics auto-export to Redshift
Next Steps
Thanks!
LONDON

More Related Content

What's hot

Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - DatalakeLam Le
 
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best PracticesAmazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Amazon Web Services
 
Getting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCacheGetting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCacheAmazon Web Services
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
농심 그룹 메가마트 : 온프레미스 Exadata의 AWS 클라우드 환경 전환 사례 공유-김동현, NDS Cloud Innovation Ce...
농심 그룹 메가마트 : 온프레미스 Exadata의 AWS 클라우드 환경 전환 사례 공유-김동현, NDS Cloud Innovation Ce...농심 그룹 메가마트 : 온프레미스 Exadata의 AWS 클라우드 환경 전환 사례 공유-김동현, NDS Cloud Innovation Ce...
농심 그룹 메가마트 : 온프레미스 Exadata의 AWS 클라우드 환경 전환 사례 공유-김동현, NDS Cloud Innovation Ce...Amazon Web Services Korea
 
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Riccardo Zamana
 
Data Migration Using AWS Snowball, Snowball Edge & Snowmobile
Data Migration Using AWS Snowball, Snowball Edge & SnowmobileData Migration Using AWS Snowball, Snowball Edge & Snowmobile
Data Migration Using AWS Snowball, Snowball Edge & SnowmobileAmazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Dynamodb Presentation
Dynamodb PresentationDynamodb Presentation
Dynamodb Presentationadvaitdeo
 
Amazon RDS with Amazon Aurora | AWS Public Sector Summit 2016
Amazon RDS with Amazon Aurora | AWS Public Sector Summit 2016Amazon RDS with Amazon Aurora | AWS Public Sector Summit 2016
Amazon RDS with Amazon Aurora | AWS Public Sector Summit 2016Amazon Web Services
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon RedshiftKel Graham
 
AWS Webcast - Cost and Performance Optimization in Amazon RDS
AWS Webcast - Cost and Performance Optimization in Amazon RDSAWS Webcast - Cost and Performance Optimization in Amazon RDS
AWS Webcast - Cost and Performance Optimization in Amazon RDSAmazon Web Services
 
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftBuilding a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftAmazon Web Services
 

What's hot (20)

Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
Amazon Redshift
Amazon Redshift Amazon Redshift
Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Getting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCacheGetting Started with Amazon ElastiCache
Getting Started with Amazon ElastiCache
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
농심 그룹 메가마트 : 온프레미스 Exadata의 AWS 클라우드 환경 전환 사례 공유-김동현, NDS Cloud Innovation Ce...
농심 그룹 메가마트 : 온프레미스 Exadata의 AWS 클라우드 환경 전환 사례 공유-김동현, NDS Cloud Innovation Ce...농심 그룹 메가마트 : 온프레미스 Exadata의 AWS 클라우드 환경 전환 사례 공유-김동현, NDS Cloud Innovation Ce...
농심 그룹 메가마트 : 온프레미스 Exadata의 AWS 클라우드 환경 전환 사례 공유-김동현, NDS Cloud Innovation Ce...
 
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
 
Data Migration Using AWS Snowball, Snowball Edge & Snowmobile
Data Migration Using AWS Snowball, Snowball Edge & SnowmobileData Migration Using AWS Snowball, Snowball Edge & Snowmobile
Data Migration Using AWS Snowball, Snowball Edge & Snowmobile
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Dynamodb Presentation
Dynamodb PresentationDynamodb Presentation
Dynamodb Presentation
 
Amazon RDS with Amazon Aurora | AWS Public Sector Summit 2016
Amazon RDS with Amazon Aurora | AWS Public Sector Summit 2016Amazon RDS with Amazon Aurora | AWS Public Sector Summit 2016
Amazon RDS with Amazon Aurora | AWS Public Sector Summit 2016
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon Redshift
 
AWS Webcast - Cost and Performance Optimization in Amazon RDS
AWS Webcast - Cost and Performance Optimization in Amazon RDSAWS Webcast - Cost and Performance Optimization in Amazon RDS
AWS Webcast - Cost and Performance Optimization in Amazon RDS
 
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftBuilding a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 

Similar to Building Your Data Warehouse with Amazon Redshift

Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseAmazon Web Services
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSAmazon Web Services
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAmazon Web Services
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features Amazon Web Services
 
Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Amazon Web Services
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksAmazon Web Services
 
Optimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesOptimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesAlexandra Sasha Blumenfeld
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Amazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftAmazon Web Services
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftSnapLogic
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 

Similar to Building Your Data Warehouse with Amazon Redshift (20)

Leveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data WarehouseLeveraging Amazon Redshift for Your Data Warehouse
Leveraging Amazon Redshift for Your Data Warehouse
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data Warehouse
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
Processing and Analytics
Processing and AnalyticsProcessing and Analytics
Processing and Analytics
 
AWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon RedshiftAWS June Webinar Series - Getting Started: Amazon Redshift
AWS June Webinar Series - Getting Started: Amazon Redshift
 
AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features AWS Webcast - Redshift Overview and New Features
AWS Webcast - Redshift Overview and New Features
 
Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...Getting Started with Managed Database Services on AWS - September 2016 Webina...
Getting Started with Managed Database Services on AWS - September 2016 Webina...
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
 
Amazon Redshift Deep Dive
Amazon Redshift Deep Dive Amazon Redshift Deep Dive
Amazon Redshift Deep Dive
 
Optimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesOptimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 Minutes
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 

Recently uploaded (20)

IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 

Building Your Data Warehouse with Amazon Redshift

  • 1. ©2015,  Amazon  Web  Services,  Inc.  or  its  affiliates.  All  rights  reserved©2015,  Amazon  Web  Services,  Inc.  or  its  affiliates.  All  rights  reserved Building Your Data Warehouse with Amazon Redshift Ian Meyers, AWS (meyersi@amazon.com) Guest Speaker: Toby Moore, Co-Founder & CTO, Space Ape Games (toby@spaceapegames.com)
  • 2. Data Warehouse - Challenges Cost Complexity Performance Rigidity 1990   2000   2010   2020   Enterprise  Data   Data  in  Warehouse  
  • 3. Petabyte scale; massively parallel Relational data warehouse Fully managed; zero admin SSD & HDD platforms As low as $1,000/TB/Year Amazon Redshift
  • 4. Redshift powers Clickstream Analytics for Amazon.com Web log analysis for Amazon.com Over one petabyte workload Largest table: 400TB 2TB of data per day Understand customer behavior Who is browsing but not buying Which products / features are winners What sequence led to higher customer conversion Solution Best scale out solution – Query across 1 week Hadoop – query across 1 month
  • 5. Redshift Performance Realized Performance Scan 15 months of data: 14 minutes 2.25 trillion rows Load one day worth of data: 10 minutes 5 billion rows Backfill one month of data: 9.75 hours 150 billion rows Pig à Amazon Redshift: 2 days to 1 hr 10B row join with 700M rows Oracle à Amazon Redshift: 90 hours to 8 hrs Reduced number of SQLs by a factor of 3 Cost 2PB cluster 100 node dw1.8xl (3yr RI) $180/hr Complexity 20% time of one DBA Backup Restore Resizing
  • 6. 0   4   8   12   16   20   24   28   32   128     Nodes   16     Nodes    2     Nodes   Dura%on  (Minutes)   Time  to  Deploy  and  Manage  a  Cluster   Time  spent     on  clicks   Deploy     Connect   Backup   Restore   Resize     (2  to  16  nodes)   Simplicity
  • 7. Who uses Amazon Redshift?
  • 8. Common Customer Use Cases •  Reduce costs by extending DW rather than adding HW •  Migrate completely from existing DW systems •  Respond faster to business •  Improve performance by an order of magnitude •  Make more data available for analysis •  Access business data via standard reporting tools •  Add analytic functionality to applications •  Scale DW capacity as demand grows •  Reduce HW & SW costs by an order of magnitude Traditional Enterprise DW Companies with Big Data SaaS Companies
  • 11. Amazon Redshift Architecture •  Leader Node –  SQL endpoint –  Stores metadata –  Coordinates query execution •  Compute Nodes –  Local, columnar storage –  Execute queries in parallel –  Load, backup, restore via Amazon S3; load from Amazon DynamoDB or SSH •  Two hardware platforms –  Optimized for data processing –  DW1: HDD; scale from 2TB to 1.6PB –  DW2: SSD; scale from 160GB to 256TB 10 GigE (HPC) Ingestion Backup Restore SQL Clients/BI Tools 128GB RAM 16TB disk 16 cores Amazon S3 / DynamoDB / SSH JDBC/ODBC 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node 128GB RAM 16TB disk 16 coresCompute Node Leader Node
  • 12. Amazon Redshift dramatically reduces I/O •  Data compression •  Zone maps •  Direct-attached storage •  Large data block sizes ID   Age   State   Amount   123   20   CA   500   345   25   WA   250   678   40   FL   125   957   37   WA   375  
  • 13. Amazon Redshift dramatically reduces I/O •  Data compression •  Zone maps •  Direct-attached storage •  Large data block sizes ID   Age   State   Amount   123   20   CA   500   345   25   WA   250   678   40   FL   125   957   37   WA   375  
  • 14. Amazon Redshift dramatically reduces I/O •  Column storage •  Data compression •  Zone maps •  Direct-attached storage •  Large data block sizes analyze compression listing; Table | Column | Encoding ---------+----------------+---------- listing | listid | delta listing | sellerid | delta32k listing | eventid | delta32k listing | dateid | bytedict listing | numtickets | bytedict listing | priceperticket | delta32k listing | totalprice | mostly32 listing | listtime | raw
  • 15. Amazon Redshift dramatically reduces I/O •  Column storage •  Data compression •  Direct-attached storage •  Large data block sizes •  Track of the minimum and maximum value for each block •  Skip over blocks that don’t contain the data needed for a given query •  Minimize unnecessary I/O
  • 16. Amazon Redshift dramatically reduces I/O •  Column storage •  Data compression •  Zone maps •  Direct-attached storage •  Large data block sizes •  Use direct-attached storage to maximize throughput •  Hardware optimized for high performance data processing •  Large block sizes to make the most of each read •  Amazon Redshift manages durability for you
  • 17. Amazon Redshift has security built-in •  SSL to secure data in transit •  Encryption to secure data at rest –  AES-256; hardware accelerated –  All blocks on disks and in Amazon S3 encrypted –  HSM Support •  No direct access to compute nodes •  Audit logging & AWS CloudTrail integration •  Amazon VPC support •  SOC 1/2/3, PCI-DSS Level 1, FedRAMP, others 10 GigE (HPC) Ingestion Backup Restore SQL Clients/BI Tools 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores 128GB RAM 16TB disk 16 cores Amazon S3 / Amazon DynamoDB Customer VPC Internal VPC JDBC/ODBC Leader Node Compute Node Compute Node Compute Node
  • 18. Amazon Redshift is 1/10th the Price of a Traditional Data Warehouse DW1 (HDD) Price Per Hour for DW1.XL Single Node Effective Annual Price per TB On-Demand $ 0.850 $ 3,723 1 Year Reserved Instance $ 0.215 $ 2,192 3 Year Reserved Instance $ 0.114 $ 999 DW2 (SSD) Price Per Hour for DW2.L Single Node Effective Annual Price per TB On-Demand $ 0.250 $ 13,688 1 Year Reserved Instance $ 0.075 $ 8,794 3 Year Reserved Instance $ 0.050 $ 5,498
  • 20. New Dense Storage Instance DS2, based on EC2’s D2, has twice the memory and CPU as DW1 Migrate from DS1 to DS2 by restoring from snapshot. We will help you migrate your RIs •  Twice the memory and compute power of DW1 •  Enhanced networking and 1.5X gain in disk throughput •  40% to 60% performance gain over DW1 •  Available in the two node types: XL (2TB) and 8XL (16TB)
  • 21. Custom ODBC and JDBC Drivers •  Up to 35% higher performance than open source drivers •  Supported by Informatica, Microstrategy, Pentaho, Qlik, SAS, Tableau •  Will continue to support PostgreSQL open source drivers •  Download drivers from console
  • 23. User Defined Functions •  We’re enabling User Defined Functions (UDFs) so you can add your own –  Scalar and Aggregate Functions supported •  You’ll be able to write UDFs using Python 2.7 –  Syntax is largely identical to PostgreSQL UDF Syntax –  System and network calls within UDFs are prohibited •  Comes with Pandas, NumPy, and SciPy pre- installed –  You’ll also be able import your own libraries for even more flexibility
  • 24. Scalar UDF example – URL parsing CREATE FUNCTION f_hostname (VARCHAR url)   RETURNS varchar IMMUTABLE AS $$   import urlparse   return urlparse.urlparse(url).hostname $$ LANGUAGE plpythonu;
  • 25. Interleaved Multi Column Sort •  Currently support Compound Sort Keys –  Optimized for applications that filter data by one leading column •  Adding support for Interleaved Sort Keys –  Optimized for filtering data by up to eight columns –  No storage overhead unlike an index –  Lower maintenance penalty compared to indexes
  • 26. Compound Sort Keys Illustrated Records in Redshift are stored in blocks. For this illustration, let’s assume that four records fill a block Records with a given cust_id are all in one block However, records with a given prod_id are spread across four blocks 1 1 1 1 2 3 4 1 4 4 4 2 3 4 4 1 3 3 3 2 3 4 3 1 2 2 2 2 3 4 2 1 1  [1,1] [1,2] [1,3] [1,4] 2  [2,1] [2,2] [2,3] [2,4] 3  [3,1] [3,2] [3,3] [3,4] 4  [4,1] [4,2] [4,3] [4,4] 1 2 3 4 prod_id cust_id cust_id prod_id other columns blocks
  • 27. 1  [1,1] [1,2] [1,3] [1,4] 2  [2,1] [2,2] [2,3] [2,4] 3  [3,1] [3,2] [3,3] [3,4] 4  [4,1] [4,2] [4,3] [4,4] 1 2 3 4 prod_id cust_id Interleaved Sort Keys Illustrated Records with a given cust_id are spread across two blocks Records with a given prod_id are also spread across two blocks Data is sorted in equal measures for both keys 1 1 2 2 2 1 2 3 3 4 4 4 3 4 3 1 3 4 4 2 1 2 3 3 1 2 2 4 3 4 1 1 cust_id prod_id other columns blocks
  • 28. How to use the feature •  New keyword ‘INTERLEAVED’ when defining sort keys –  Existing syntax will still work and behavior is unchanged –  You can choose up to 8 columns to include and can query with any or all of them •  No change needed to queries •  Benefits are significant [ SORTKEY [ COMPOUND | INTERLEAVED ] ( column_name [, ...] ) ]
  • 29. Amazon Redshift Spend time with your data, not your database….
  • 30. Gaming analytics with Redshift and AWS
  • 31. Former CTO Mind Candy / Moshi Monsters (2006-2012) Introductions Co-founder / CTO Space Ape Games (2012-Present) Toby Moore
  • 33. 12+ Million downloads 300k DAU Coming soon! Our games
  • 34. Early needs + Approach •  Highly empowered, analytical team •  We hit a wall with 3rd party analytics tools •  Big data is table stakes in games industry •  We needed absolute flexibility on future tooling •  No large capex spend Pre- Data Not Enough Too much Tactical Predictive
  • 35. Basic data ANALYSIS REPORTING Data Capture A/B Tests Insights & Learning Amazon S3 Amazon Redshift Amazon EMR
  • 37. DATA MINING ANALYSIS REPORTING CRM Data Capture A/B Tests Insights & Learning Amazon S3 Amazon Redshift Amazon EMR Predictive data MODELLING
  • 38. •  146Billion Rows •  2 clusters: 1x 8 node 1 x 16 node dw1.xlarge •  13TBOf Compressed Data •  250m rows x 125 columns Rows Per Day Today
  • 39. Per  user,  Daily  Summary  (Over  200  Metrics)   Spend Tier In Game Behaviour Monetisation Device Tenure Balances
  • 40. Single  Player  View   Platform Spend BehaviourDevice Retention CountryLanguage Acquisition Channel Game Balances Operating System Castle Level Retention
  • 41. Modeling  and  predic%on   Best offer bundle Spend tier Churn risk Life time value Spend propensity Price optimisation
  • 42. Never had to worry about: •  Scalability •  Backing up •  Availability •  Upgrades •  Flexibility (ODBC etc.) •  Performance Summary •  Move towards more real-time processing •  Investigate machine learning •  AWS Mobile analytics auto-export to Redshift Next Steps