Deep Dive on Amazon Redshift
Storage Subsystem and Query Life Cycle
Tony Gibbs, Data Warehousing Solutions Architect
June 2017
Deep Dive Overview
• Amazon Redshift History and Development
• Cluster Architecture
• Concepts and Terminology
• Storage Deep Dive
• Design Considerations
• Query Life Cycle
• Loading Best Practices
• New & Upcoming Features
• Open Q&A
Amazon Redshift History & Development
Columnar
MPP
OLAP
AWS IAM, Amazon VPC, Amazon SWF, Amazon S3, AWS KMS, Amazon Route 53, Amazon CloudWatch, Amazon EC2
PostgreSQL Amazon Redshift
February 2013
June 2017
> 100 Significant Patches
> 150 Significant Features
Amazon Redshift Cluster Architecture
Redshift Cluster Architecture
• Massively parallel, shared nothing
• Leader node
– SQL endpoint
– Stores metadata
– Coordinates parallel SQL processing
• Compute nodes
– Local, columnar storage
– Executes queries in parallel
– Load, backup, restore
[Diagram: SQL clients/BI tools connect over JDBC/ODBC to the leader node; the leader node and compute nodes (each labeled 128GB RAM, 16TB disk, 16 cores) are linked by 10 GigE (HPC); ingestion, backup, and restore paths connect the compute nodes to S3 / EMR / DynamoDB / SSH]
Leader Node
• Parser & Rewriter
• Planner & Optimizer
• Code Generator
– Input: Optimized plan
– Output: >=1 C++ functions
• Compiler
• Task Scheduler
• WLM
– Admission
– Scheduling
• PostgreSQL Catalog Tables
Compute Node
• Query execution processes
• Backup & restore processes
• Replication processes
• Local Storage
– Disks
– Slices
– Tables
– Columns
– Blocks
Concepts and Terminology
Designed for I/O Reduction
• Columnar storage
• Data compression
• Zone maps
aid loc dt
1 SFO 2016-09-01
2 JFK 2016-09-14
3 SFO 2017-04-01
4 JFK 2017-05-14
• Accessing dt with row storage:
– Need to read everything
– Unnecessary I/O
CREATE TABLE loft_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);
• Accessing dt with columnar storage:
– Only scan blocks for relevant column
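The difference between the two access patterns can be sketched in a few lines. This is a toy in-memory model of the four sample rows, not Redshift's on-disk format; the function names and the "values touched" cost measure are assumptions made for illustration.

```python
# Toy model of the loft_deep_dive sample rows (aid, loc, dt).
rows = [
    (1, "SFO", "2016-09-01"),
    (2, "JFK", "2016-09-14"),
    (3, "SFO", "2017-04-01"),
    (4, "JFK", "2017-05-14"),
]

def read_dt_row_store(table):
    """Row storage: every row is read in full to reach its dt value."""
    values_touched = 0
    dts = []
    for row in table:
        values_touched += len(row)  # aid, loc, and dt are all read
        dts.append(row[2])
    return dts, values_touched

# Columnar storage: each column is kept as its own sequence.
columns = {
    "aid": [r[0] for r in rows],
    "loc": [r[1] for r in rows],
    "dt": [r[2] for r in rows],
}

def read_dt_column_store(cols):
    """Columnar storage: only the dt column is read."""
    dts = list(cols["dt"])
    return dts, len(dts)

row_dts, row_cost = read_dt_row_store(rows)
col_dts, col_cost = read_dt_column_store(columns)
print(row_cost, col_cost)  # 12 4
```

Both reads return the same dt values; the row layout touches 12 values to get them, the columnar layout 4.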
• Columns grow and shrink independently
• Effective compression ratios due to like data
• Reduces storage requirements
• Reduces I/O
CREATE TABLE loft_deep_dive (
aid INT ENCODE LZO
,loc CHAR(3) ENCODE BYTEDICT
,dt DATE ENCODE RUNLENGTH
);
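Why "like data" compresses well can be seen with a generic run-length encoder. This is a textbook sketch, not Redshift's RUNLENGTH implementation, and the sample dt column is hypothetical.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out

# Hypothetical dt column: a column stores values of one type together,
# so repeated dates sit next to each other.
dt_column = ["2016-09-01"] * 3 + ["2016-09-14"] * 2 + ["2017-04-01"]

encoded = run_length_encode(dt_column)
print(encoded)
# [('2016-09-01', 3), ('2016-09-14', 2), ('2017-04-01', 1)]
```

Six stored values shrink to three pairs, and decoding restores the column exactly.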
• In-memory block metadata
• Contains per-block MIN and MAX value
• Effectively prunes blocks which cannot contain data for a given query
• Eliminates unnecessary I/O
Zone Maps
SELECT COUNT(*) FROM LOGS WHERE DATE = '09-JUNE-2013'
Unsorted Table
– Block 1: MIN 01-JUNE-2013, MAX 20-JUNE-2013
– Block 2: MIN 08-JUNE-2013, MAX 30-JUNE-2013
– Block 3: MIN 12-JUNE-2013, MAX 20-JUNE-2013
– Block 4: MIN 02-JUNE-2013, MAX 25-JUNE-2013
Sorted By Date
– Block 1: MIN 01-JUNE-2013, MAX 06-JUNE-2013
– Block 2: MIN 07-JUNE-2013, MAX 12-JUNE-2013
– Block 3: MIN 13-JUNE-2013, MAX 18-JUNE-2013
– Block 4: MIN 19-JUNE-2013, MAX 24-JUNE-2013
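The pruning decision can be replayed against the MIN/MAX values from the zone map figure. This is a minimal sketch: the function name and the zero-based block indexes are assumptions, and only the per-block range check is modeled.

```python
from datetime import date

def blocks_to_scan(zone_maps, target):
    """Return indexes of blocks whose [MIN, MAX] range could contain target."""
    return [i for i, (lo, hi) in enumerate(zone_maps) if lo <= target <= hi]

def june(day):
    return date(2013, 6, day)

# MIN/MAX pairs copied from the unsorted and sorted examples in the figure.
unsorted_blocks = [
    (june(1), june(20)),
    (june(8), june(30)),
    (june(12), june(20)),
    (june(2), june(25)),
]
sorted_blocks = [
    (june(1), june(6)),
    (june(7), june(12)),
    (june(13), june(18)),
    (june(19), june(24)),
]

target = june(9)  # WHERE DATE = '09-JUNE-2013'
print(blocks_to_scan(unsorted_blocks, target))  # [0, 1, 3]
print(blocks_to_scan(sorted_blocks, target))    # [1]
```

The unsorted table has to read three of its four blocks; the table sorted by date reads one.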
Terminology and Concepts: Data Sorting
• Goals:
– Physically order rows of table data based on certain column(s)
– Optimize effectiveness of zone maps
– Enable MERGE JOIN operations
• Impact:
– Enables range-restricted scans (rrscans) to prune blocks by leveraging zone maps
– Overall reduction in block I/O
• Achieved with the table property SORTKEY defined over one or more columns
• Optimal SORTKEY is dependent on:
– Query patterns
– Data profile
– Business requirements
Terminology and Concepts: Slices
• A slice can be thought of like a “virtual compute node”
– Unit of data partitioning
– Parallel query processing
• Facts about slices:
– Each compute node has either 2, 16, or 32 slices
– Table rows are distributed to slices
– A slice processes only its own data
Data Distribution
• Distribution style is a table property which dictates how that table’s data is distributed throughout the cluster:
– KEY: Value is hashed, same value goes to same location (slice)
– ALL: Full table data goes to first slice of every node
– EVEN: Round robin
• Goals:
– Distribute data evenly for parallel processing
– Minimize data movement during query processing
[Diagram: row placement on Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4) under KEY, ALL, and EVEN distribution]
Data Distribution: Example
CREATE TABLE loft_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE (EVEN|KEY|ALL);
[Diagram: compute nodes CN1 (Slice 0, Slice 1) and CN2 (Slice 2, Slice 3); table loft_deep_dive has user columns aid, loc, dt and system columns ins, del, row]
Data Distribution: EVEN Example
CREATE TABLE loft_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE EVEN;
INSERT INTO loft_deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
[Diagram: each of the four slices holds table loft_deep_dive with user columns aid, loc, dt and system columns ins, del, row]
Rows per slice before INSERT: 0, 0, 0, 0
(3 User Columns + 3 System Columns) x (4 slices) = 24 Blocks (24MB)
Rows per slice after INSERT: 1, 1, 1, 1
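The block arithmetic on these example slides follows one pattern, sketched here as a small helper. The function names are hypothetical; the constants (1MB blocks, three system columns) come from the deck.

```python
BLOCK_SIZE_MB = 1     # column data is persisted to 1MB blocks
SYSTEM_COLUMNS = 3    # ins, del, row

def minimum_blocks(user_columns, populated_slices):
    """One block per column (user + system) on every slice that holds rows."""
    return (user_columns + SYSTEM_COLUMNS) * populated_slices

def minimum_size_mb(user_columns, populated_slices):
    return minimum_blocks(user_columns, populated_slices) * BLOCK_SIZE_MB

# loft_deep_dive has 3 user columns; EVEN spreads the 4 rows over 4 slices.
print(minimum_blocks(3, 4), minimum_size_mb(3, 4))  # 24 24
# When only 2 slices receive rows, the footprint halves.
print(minimum_blocks(3, 2), minimum_size_mb(3, 2))  # 12 12
```

The takeaway is that even a four-row table occupies a full block per column on every slice it touches.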
Data Distribution: KEY Example #1
CREATE TABLE loft_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE KEY DISTKEY (loc);
INSERT INTO loft_deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
[Diagram: with DISTKEY (loc), only two of the four slices hold table loft_deep_dive (user columns aid, loc, dt; system columns ins, del, row); their row counts step from 0 to 1 to 2 as the INSERT is applied]
(3 User Columns + 3 System Columns) x (2 slices) = 12 Blocks (12MB)
Data Distribution: KEY Example #2
CREATE TABLE loft_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE KEY DISTKEY (aid);
INSERT INTO loft_deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
[Diagram: each of the four slices holds table loft_deep_dive with user columns aid, loc, dt and system columns ins, del, row]
Rows per slice before INSERT: 0, 0, 0, 0
(3 User Columns + 3 System Columns) x (4 slices) = 24 Blocks (24MB)
Rows per slice after INSERT: 1, 1, 1, 1
Data Distribution: ALL Example
CREATE TABLE loft_deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
) DISTSTYLE ALL;
INSERT INTO loft_deep_dive VALUES
(1, 'SFO', '2016-09-01'),
(2, 'JFK', '2016-09-14'),
(3, 'SFO', '2017-04-01'),
(4, 'JFK', '2017-05-14');
[Diagram: the first slice of each compute node holds a full copy of table loft_deep_dive (user columns aid, loc, dt; system columns ins, del, row); row counts step from 0 to 4 as the INSERT is applied]
(3 User Columns + 3 System Columns) x (2 slices) = 12 Blocks (12MB)
Terminology and Concepts: Data Distribution
• KEY: use when
– The key creates an even distribution of data
– Joins are performed between large fact/dimension tables
– Optimizing merge joins and group by
• ALL: use for
– Small and medium size dimension tables (< 2-3M rows)
• EVEN: use
– When key cannot produce an even distribution
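The three distribution styles can be compared with a toy placement model on the 2-node, 4-slice layout used in the examples. This is a sketch under stated assumptions: `zlib.crc32` stands in for Redshift's internal hash, which the deck does not describe, so the exact slice a key lands on will differ from a real cluster. All function names are hypothetical.

```python
import zlib
from itertools import cycle

NODES = 2
SLICES_PER_NODE = 2
SLICES = NODES * SLICES_PER_NODE

ROWS = [
    (1, "SFO", "2016-09-01"),
    (2, "JFK", "2016-09-14"),
    (3, "SFO", "2017-04-01"),
    (4, "JFK", "2017-05-14"),
]

def distribute_even(rows, slices=SLICES):
    """EVEN: round robin across all slices."""
    placement = {s: [] for s in range(slices)}
    for slice_id, row in zip(cycle(range(slices)), rows):
        placement[slice_id].append(row)
    return placement

def distribute_key(rows, key_index, slices=SLICES):
    """KEY: hash the key column; equal values always land on the same slice."""
    placement = {s: [] for s in range(slices)}
    for row in rows:
        # crc32 is an assumed stand-in hash, used only for illustration.
        slice_id = zlib.crc32(str(row[key_index]).encode()) % slices
        placement[slice_id].append(row)
    return placement

def distribute_all(rows, nodes=NODES, slices_per_node=SLICES_PER_NODE):
    """ALL: a full copy goes to the first slice of every node."""
    placement = {s: [] for s in range(nodes * slices_per_node)}
    for node in range(nodes):
        placement[node * slices_per_node] = list(rows)
    return placement

def rows_per_slice(placement):
    return [len(placement[s]) for s in sorted(placement)]

print(rows_per_slice(distribute_even(ROWS)))               # [1, 1, 1, 1]
print(rows_per_slice(distribute_all(ROWS)))                # [4, 0, 4, 0]
print(rows_per_slice(distribute_key(ROWS, key_index=1)))   # loc: at most 2 slices used
```

Distributing on loc can never use more than two slices here because the column has only two distinct values, which is the skew the KEY Example #1 slide illustrates.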
Storage Deep Dive
Storage Deep Dive: Disks
• Redshift utilizes locally attached storage devices
• Compute nodes have 2.5-3x the advertised storage capacity
• 1, 3, 8, or 24 disks depending on node type
• Each disk is split into two partitions
– Local data storage, accessed by local CN
– Mirrored data, accessed by remote CN
• Partitions are raw devices
– Local storage devices are ephemeral in nature
– Tolerant to multiple disk failures on a single node
Storage Deep Dive: Blocks
• Column data is persisted to 1MB immutable blocks
• Each block contains in-memory metadata:
– Zone Maps (MIN/MAX value)
– Location of previous/next block
• Blocks are individually compressed with 1 of 10 encodings
• A full block contains between 16 and 8.4 million values
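The "16 to 8.4 million values" range is consistent with simple arithmetic on a 1MB block. The reading below is an assumption, not something the deck spells out: the low end matches the widest VARCHAR Redshift allows (65535 bytes) and the high end matches one bit per value. Block header overhead is ignored.

```python
BLOCK_BYTES = 1024 * 1024      # 1MB block
MAX_VARCHAR_BYTES = 65535      # Redshift's maximum VARCHAR length in bytes

# Fewest values: every value is as wide as a VARCHAR can be.
min_values = BLOCK_BYTES // MAX_VARCHAR_BYTES

# Most values: each value compresses down to a single bit (assumed best case).
max_values = BLOCK_BYTES * 8

print(min_values, max_values)  # 16 8388608
```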
Storage Deep Dive: Columns
• Column: Logical structure accessible via SQL
• Physical structure is a doubly linked list of blocks
• These block chains exist on each slice for each column
• All sorted & unsorted block chains compose a column
• Column properties include:
– Distribution Key
– Sort Key
– Compression Encoding
• Columns shrink and grow independently, 1 block at a time
• Three system columns per table, per slice, for MVCC
Block Properties: Design Considerations
• Small writes:
– Batch processing system, optimized for processing massive amounts of data
– 1MB size + immutable blocks means that we clone blocks on write so as not to introduce fragmentation
– Small write (~1-10 rows) has similar cost to a larger write (~100 K rows)
• UPDATE and DELETE:
– Immutable blocks means that we only logically delete rows on UPDATE or DELETE
– Must VACUUM or DEEP COPY to remove ghost rows from table
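How ghost rows accumulate can be sketched with a toy table that only marks rows as deleted. The class and method names are hypothetical, and the model captures row visibility only; it does not model blocks, MVCC transaction IDs, or re-sorting.

```python
class LogicalDeleteTable:
    """Toy table: DELETE and UPDATE mark rows; only vacuum() reclaims them."""

    def __init__(self):
        self._rows = []  # each entry is [values, deleted_flag]

    def insert(self, values):
        self._rows.append([values, False])

    def delete(self, predicate):
        for entry in self._rows:
            if not entry[1] and predicate(entry[0]):
                entry[1] = True  # logical delete: the row stays in storage

    def update(self, predicate, new_values_fn):
        # Snapshot the matches first so newly appended rows are not revisited.
        matches = [e for e in self._rows if not e[1] and predicate(e[0])]
        for entry in matches:
            entry[1] = True                                     # old version becomes a ghost row
            self._rows.append([new_values_fn(entry[0]), False])  # new version is appended

    def visible_rows(self):
        return [values for values, deleted in self._rows if not deleted]

    def stored_rows(self):
        return len(self._rows)

    def vacuum(self):
        self._rows = [e for e in self._rows if not e[1]]


table = LogicalDeleteTable()
for row in [(1, "SFO"), (2, "JFK"), (3, "SFO"), (4, "JFK")]:
    table.insert(row)

table.update(lambda r: r[0] == 1, lambda r: (r[0], "LAX"))
table.delete(lambda r: r[0] == 4)
print(len(table.visible_rows()), table.stored_rows())  # 3 5

table.vacuum()
print(len(table.visible_rows()), table.stored_rows())  # 3 3
```

Until the vacuum step runs, the table stores five rows to serve three visible ones.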
Column Properties: Design Considerations
• Compression:
– COPY automatically analyzes and compresses data when loading into empty tables
– ANALYZE COMPRESSION checks existing tables and proposes optimal compression algorithms for each column
– Changing column encoding requires a table rebuild
• DISTKEY and SORTKEY significantly influence performance (orders of magnitude)
• Distribution Keys:
– A poor DISTKEY can introduce data skew and an unbalanced workload
– A query completes only as fast as the slowest slice completes
• Sort Keys:
– A sort key is only as effective as the data profile allows it to be
– Selectivity needs to be considered
Parallelism Deep Dive
Storage Deep Dive: Slices
• Each compute node has either 2, 16, or 32 slices
• A slice can be thought of like a “virtual compute node”
– Unit of data partitioning
– Parallel query processing
• Facts about slices:
– Table rows are distributed to slices
– A slice processes only its own data
– Within a compute node all slices read from and write to all disks
Leader Node
• Parser & Rewriter
• Planner & Optimizer
• Code Generator
– Input: Optimized plan
– Output: >=1 C++ functions
• Compiler
• Task Scheduler
• WLM
– Admission
– Scheduling
• PostgreSQL Catalog Tables
• Redshift System Tables (STV)
Query Execution Terminology
• Step: An individual operation needed during query execution. Steps need to be
combined to allow compute nodes to perform a join. Examples: scan, sort,
hash, aggr
• Segment: A combination of several steps that can be done by a single process.
The smallest compilation unit executable by a slice. Segments within a stream
run in parallel.
• Stream: A collection of combined segments which output to the next stream or
SQL client.
Visualizing Streams, Segments, and Steps
Stream 0
Segment 0
Step 0 Step 1 Step 2
Segment 1
Step 0 Step 1 Step 2 Step 3 Step 4
Segment 2
Step 0 Step 1 Step 2 Step 3
Segment 3
Step 0 Step 1 Step 2 Step 3 Step 4 Step 5
Stream 1
Segment 4
Step 0 Step 1 Step 2 Step 3
Segment 5
Step 0 Step 1 Step 2
Segment 6
Step 0 Step 1 Step 2 Step 3 Step 4
Stream 2
Segment 7
Step 0 Step 1
Segment 8
Step 0 Step 1
Time
Query Lifecycle
[Diagram: client (JDBC/ODBC) → leader node (parser, query planner, code generator that generates code for all segments of one stream, explain plans, final computations) → compute nodes (receive compiled code, run the compiled code, return results to leader) → leader node returns results to client]
• Segments in a stream are executed concurrently.
• Each step in a segment is executed serially.
Query Execution Deep Dive: Leader Node
1. The leader node receives the query and parses the SQL.
2. The parser produces a logical representation of the original query.
3. This query tree is input into the query optimizer (volt).
4. Volt rewrites the query to maximize its efficiency. Sometimes a single query will be
rewritten as several dependent statements in the background.
5. The rewritten query is sent to the planner which generates >= 1 query plans for the
execution with the best estimated performance.
6. The query plan is sent to the execution engine, where it’s translated into steps,
segments, and streams.
7. This translated plan is sent to the code generator, which generates a C++ function
for each segment.
8. This generated C++ is compiled with gcc to a .o file and distributed to the compute
nodes.
Query Execution Deep Dive: Compute Nodes
• Slices execute the query segments in parallel.
• Executable segments are created for one stream at a time. When the segments
of that stream are complete, the engine generates the segments for the next
stream.
• When the compute nodes are done, they return the query results to the leader
node for final processing.
• The leader node merges the data into a single result set and addresses any
needed sorting or aggregation.
• The leader node then returns the results to the client.
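The execution rules above (steps serial within a segment, segments parallel within a stream, streams one after another) imply a simple timing model. This is a sketch: the class names are hypothetical, and giving every step a duration of one time unit is an assumption made purely for illustration. The step counts mirror the streams/segments/steps figure.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    step_durations: List[int]  # hypothetical per-step durations

    def elapsed(self):
        # Steps in a segment are executed serially.
        return sum(self.step_durations)

@dataclass
class Stream:
    segments: List[Segment]

    def elapsed(self):
        # Segments in a stream run in parallel; the slowest one sets the pace.
        return max(segment.elapsed() for segment in self.segments)

def query_elapsed(streams):
    # The next stream starts only after the current stream's segments complete.
    return sum(stream.elapsed() for stream in streams)

def uniform_segment(step_count):
    return Segment([1] * step_count)

# Step counts per segment, as drawn in the figure.
plan = [
    Stream([uniform_segment(n) for n in (3, 5, 4, 6)]),  # Stream 0
    Stream([uniform_segment(n) for n in (4, 3, 5)]),     # Stream 1
    Stream([uniform_segment(n) for n in (2, 2)]),        # Stream 2
]

print([stream.elapsed() for stream in plan])  # [6, 5, 2]
print(query_elapsed(plan))                    # 13
```

Under this model the longest segment in each stream determines how soon the next stream can begin.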
Query Execution
[Diagram: the streams, segments, and steps from the previous figure, repeated once per slice (slices 0, 1, 2, 3) along a shared time axis]
Parallelism considerations with Redshift slices
DS2.8XL Compute Node
• Ingestion Throughput:
– Each slice’s query processors can load one file at a time:
• Streaming decompression
• Parse
• Distribute
• Write
• With a single input file, only 1 of 16 slices (6.25%) is active, so the node is only partially used
[Diagram: DS2.8XL compute node, slices 0-15, with one slice active]
Design considerations for Redshift slices
• Use at least as many input
files as there are slices in the
cluster
• With 16 input files, all slices
are working so you maximize
throughput
• COPY continues to scale
linearly as you add nodes
[Diagram: 16 input files feeding slices 0-15 of a DS2.8XL compute node]
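The utilization figures on these two slides come from one ratio. A minimal sketch, assuming each slice loads at most one file at a time as stated above; the function name is hypothetical.

```python
def active_slice_fraction(input_files, slices):
    """Fraction of slices kept busy when each slice loads one file at a time."""
    if slices <= 0:
        raise ValueError("slices must be positive")
    return min(input_files, slices) / slices

# One DS2.8XL compute node has 16 slices.
print(active_slice_fraction(1, 16))   # 0.0625 -> 6.25% of slices active
print(active_slice_fraction(16, 16))  # 1.0    -> all slices working
```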
Data Preparation
• Export Data from Source System
– CSV recommended (simple delimiter '|' or ',')
• Be aware of UTF-8 varchar columns (a UTF-8 character can take up to 4 bytes)
• Be aware of your NULL character (\N)
– GZIP Compress Files
– Split Files (1MB – 1GB after gzip compression)
• Useful COPY Options for PoC Data
– MAXERRORS
– ACCEPTINVCHARS
– NULL AS
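The split-and-compress step can be done with the standard library alone. This is a sketch under stated assumptions: the function names, file naming scheme, and round-robin split are hypothetical choices, and nothing here uploads to S3 or runs COPY. The second helper shows why VARCHAR sizing needs byte counts rather than character counts.

```python
import gzip
import os
import tempfile

def split_and_gzip(lines, parts, out_dir, prefix="part"):
    """Write lines round robin into `parts` gzip files and return their paths."""
    paths = [os.path.join(out_dir, f"{prefix}_{i:04d}.csv.gz") for i in range(parts)]
    handles = [gzip.open(p, "wt", encoding="utf-8", newline="") for p in paths]
    try:
        for i, line in enumerate(lines):
            handles[i % parts].write(line.rstrip("\n") + "\n")
    finally:
        for handle in handles:
            handle.close()
    return paths

def utf8_byte_length(text):
    """Bytes needed to store text as UTF-8 (1 to 4 bytes per character)."""
    return len(text.encode("utf-8"))

with tempfile.TemporaryDirectory() as tmp:
    sample = [f"{i}|SFO|2016-09-01" for i in range(8)]
    paths = split_and_gzip(sample, parts=4, out_dir=tmp)  # e.g. one file per slice
    print(len(paths))  # 4

print(utf8_byte_length("SFO"))          # 3
print(utf8_byte_length("\U0001F600"))   # 4
```

Choosing `parts` as a multiple of the cluster's slice count keeps every slice busy during the load.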
New & Upcoming Features
Run SQL queries directly against data in S3 using thousands of nodes
• Fast @ exabyte scale
• Elastic & highly available
• On-demand, pay-per-query
• High concurrency: Multiple clusters access same data
• No ETL: Query data in-place using open file formats
• Full Amazon Redshift SQL support
Recently Released Features: Amazon Redshift Spectrum
• Amazon Redshift Spectrum seamlessly integrates with your existing SQL & BI apps
• Support for complex joins, nested queries & window functions
• Support for data partitioned in S3 by any key
– Date, time, and any other custom keys
– e.g., year, month, day, hour
Recently Released Features: Amazon Redshift Spectrum
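[Diagram: a query (SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY…) reaches Amazon Redshift over JDBC/ODBC; a layer of nodes numbered 1, 2, 3, 4 ... N sits between Amazon Redshift and Amazon S3 (exabyte-scale object storage); a data catalog / Apache Hive Metastore is shown alongside]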
Recently Released Features
QMR - Query Monitoring Rules
• Apply rules to inflight queries
New Data Type - TIMESTAMPTZ
• Support for timestamp with time zone: the new TIMESTAMPTZ data type accepts complete timestamp values that include the date, the time of day, and a time zone. E.g.: 30 Nov 07:37:16 2016 PST
Multi-byte Object Names
• Support for Multi-byte (UTF-8) characters for tables, columns, and other database object names
User Connection Limits
• You can now set a limit on the number of database connections a user is permitted to have open concurrently
Automatic Data Compression for CTAS
• All newly created tables will leverage default encoding
New Column Encoding ZSTD
Recently Released Features
Performance Enhancements
• Vacuum (10x faster for deletes)
• Snapshot Restore (2x faster)
• Queries (Up to 5x faster)
Copy Can Extend Sorted Region on Single Sort Key
• No need to vacuum when loading in sorted order
Enhanced VPC Routing
• Restrict S3 Bucket Access
Schema Conversion Tool - One-Time Data Exports
• Oracle, Teradata, SQL Server
Schema Conversion Tool
• Vertica, SQL Server, Netezza and Greenplum
Coming Soon: IAM Authentication
[Diagram: single sign-on. On the client side, BI tools, SQL clients, and analytics tools use the new Redshift ODBC/JDBC drivers, which grab the ticket (userid) and get a SAML assertion from the identity providers (ADFS, corporate Active Directory); on the AWS side, IAM and Amazon Redshift, with user groups and individual users]
Coming Soon: Lots More …
Automatic and Incremental Background VACUUM
• Reclaims space and sorts when Amazon Redshift clusters are idle
• Vacuum is initiated when performance can be enhanced
• Improves ETL and query performance
Short Query Bias
• Prioritize interactive short running queries
Resources
• https://github.com/awslabs/amazon-redshift-utils
• https://github.com/awslabs/amazon-redshift-monitoring
• https://github.com/awslabs/amazon-redshift-udfs
• Admin scripts: Collection of utilities for running diagnostics on your cluster
• Admin views: Collection of utilities for managing your cluster, generating schema DDL, etc.
• ColumnEncodingUtility: Gives you the ability to apply optimal column encoding to an established schema with data already loaded
• Amazon Redshift Engineering’s Advanced Table Design Playbook: https://aws.amazon.com/blogs/big-data/amazon-redshift-engineerings-advanced-table-design-playbook-preamble-prerequisites-and-prioritization/
Tony Gibbs
aws.amazon.com/activate
Thank you!

More Related Content

What's hot

AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveCobus Bernard
 
Introduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftIntroduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftAmazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
Aws glue를 통한 손쉬운 데이터 전처리 작업하기
Aws glue를 통한 손쉬운 데이터 전처리 작업하기Aws glue를 통한 손쉬운 데이터 전처리 작업하기
Aws glue를 통한 손쉬운 데이터 전처리 작업하기Amazon Web Services Korea
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
DAT305_Amazon ElastiCache Deep Dive
DAT305_Amazon ElastiCache Deep DiveDAT305_Amazon ElastiCache Deep Dive
DAT305_Amazon ElastiCache Deep DiveAmazon Web Services
 
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...Amazon Web Services Japan
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSAmazon Web Services
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSAmazon Web Services
 
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...Amazon Web Services
 

What's hot (20)

AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep Dive
 
Introduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftIntroduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF Loft
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Amazon Redshift Masterclass
Amazon Redshift MasterclassAmazon Redshift Masterclass
Amazon Redshift Masterclass
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
 
Amazon S3 Masterclass
Amazon S3 MasterclassAmazon S3 Masterclass
Amazon S3 Masterclass
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
Amazon Aurora: Under the Hood
Amazon Aurora: Under the HoodAmazon Aurora: Under the Hood
Amazon Aurora: Under the Hood
 
Aws glue를 통한 손쉬운 데이터 전처리 작업하기
Aws glue를 통한 손쉬운 데이터 전처리 작업하기Aws glue를 통한 손쉬운 데이터 전처리 작업하기
Aws glue를 통한 손쉬운 데이터 전처리 작업하기
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
DAT305_Amazon ElastiCache Deep Dive
DAT305_Amazon ElastiCache Deep DiveDAT305_Amazon ElastiCache Deep Dive
DAT305_Amazon ElastiCache Deep Dive
 
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWS
 
Building-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSBuilding-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWS
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWS
 
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
 

Similar to Amazon Redshift deep dive on storage and query processing

Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftAmazon Web Services
 
Data Warehousing in the Era of Big Data: Deep Dive into Amazon Redshift
Data Warehousing in the Era of Big Data: Deep Dive into Amazon RedshiftData Warehousing in the Era of Big Data: Deep Dive into Amazon Redshift
Data Warehousing in the Era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
SRV405 Ancestry's Journey to Amazon Redshift
SRV405 Ancestry's Journey to Amazon RedshiftSRV405 Ancestry's Journey to Amazon Redshift
SRV405 Ancestry's Journey to Amazon RedshiftAmazon Web Services
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSCobus Bernard
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Amazon Web Services
 
Building your data warehouse with Redshift
Building your data warehouse with RedshiftBuilding your data warehouse with Redshift
Building your data warehouse with RedshiftAmazon Web Services
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftAmazon Web Services
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAmazon Web Services
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationVolodymyr Rovetskiy
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Michael Rys
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
SQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - AdvancedSQL Server 2014 Memory Optimised Tables - Advanced
SQL Server 2014 Memory Optimised Tables - AdvancedTony Rogerson
 
16119 - Get to Know Your Data Sets (1).pdf
16119 - Get to Know Your Data Sets (1).pdf16119 - Get to Know Your Data Sets (1).pdf
16119 - Get to Know Your Data Sets (1).pdf3operatordcslipiPeng
 






Amazon Redshift deep dive on storage and query processing

  • 1. Deep Dive on Amazon Redshift Storage Subsystem and Query Life Cycle Tony Gibbs, Data Warehousing Solutions Architect June 2017
  • 2. Deep Dive Overview • Amazon Redshift History and Development • Cluster Architecture • Concepts and Terminology • Storage Deep Dive • Design Considerations • Query Life Cycle • Loading Best Practices • New & Upcoming Features • Open Q&A
  • 3. Amazon Redshift History & Development
  • 4. [Diagram] PostgreSQL; Columnar, MPP, OLAP; AWS IAM, Amazon VPC, Amazon SWF, Amazon S3, AWS KMS, Amazon Route 53, Amazon CloudWatch, Amazon EC2; Amazon Redshift
  • 5. February 2013 to June 2017 • > 100 Significant Patches • > 150 Significant Features
  • 6. Amazon Redshift Cluster Architecture
  • 7. Redshift Cluster Architecture • Massively parallel, shared nothing • Leader node – SQL endpoint – Stores metadata – Coordinates parallel SQL processing • Compute nodes – Local, columnar storage – Executes queries in parallel – Load, backup, restore [Diagram: SQL Clients/BI Tools connect over JDBC/ODBC to the Leader Node; the Leader Node and three Compute Nodes (each 128GB RAM, 16TB disk, 16 cores) are linked by 10 GigE (HPC); Ingestion, Backup, and Restore go through S3 / EMR / DynamoDB / SSH]
  • 8. [Diagram: Leader Node and three Compute Nodes, each 128GB RAM, 16TB disk, 16 cores]
  • 9. Leader Node • Parser & Rewriter • Planner & Optimizer • Code Generator – Input: Optimized plan – Output: >=1 C++ functions • Compiler • Task Scheduler • WLM – Admission – Scheduling • PostgreSQL Catalog Tables [Diagram: Leader Node and three Compute Nodes, each 128GB RAM, 16TB disk, 16 cores]
  • 10. Compute Node • Query execution processes • Backup & restore processes • Replication processes • Local Storage – Disks – Slices – Tables – Columns – Blocks [Diagram: Leader Node and three Compute Nodes, each 128GB RAM, 16TB disk, 16 cores]
  • 13. Designed for I/O Reduction • Columnar storage • Data compression • Zone maps • Accessing dt with row storage: – Need to read everything – Unnecessary I/O • Example: CREATE TABLE loft_deep_dive ( aid INT /* audience_id */ ,loc CHAR(3) /* location */ ,dt DATE /* date */ ); with rows (aid, loc, dt): (1, SFO, 2016-09-01), (2, JFK, 2016-09-14), (3, SFO, 2017-04-01), (4, JFK, 2017-05-14)
  • 14. Designed for I/O Reduction • Columnar storage • Data compression • Zone maps • Accessing dt with columnar storage: – Only scan blocks for relevant column • Example: CREATE TABLE loft_deep_dive ( aid INT /* audience_id */ ,loc CHAR(3) /* location */ ,dt DATE /* date */ ); with rows (aid, loc, dt): (1, SFO, 2016-09-01), (2, JFK, 2016-09-14), (3, SFO, 2017-04-01), (4, JFK, 2017-05-14)
  • 15. Designed for I/O Reduction • Columnar storage • Data compression • Zone maps • Columns grow and shrink independently • Effective compression ratios due to like data • Reduces storage requirements • Reduces I/O • Example: CREATE TABLE loft_deep_dive ( aid INT ENCODE LZO ,loc CHAR(3) ENCODE BYTEDICT ,dt DATE ENCODE RUNLENGTH ); with rows (aid, loc, dt): (1, SFO, 2016-09-01), (2, JFK, 2016-09-14), (3, SFO, 2017-04-01), (4, JFK, 2017-05-14)
  • 16. Designed for I/O Reduction • Columnar storage • Data compression • Zone maps – In-memory block metadata – Contains per-block MIN and MAX value – Effectively prunes blocks which cannot contain data for a given query – Eliminates unnecessary I/O • Example: CREATE TABLE loft_deep_dive ( aid INT /* audience_id */ ,loc CHAR(3) /* location */ ,dt DATE /* date */ ); with rows (aid, loc, dt): (1, SFO, 2016-09-01), (2, JFK, 2016-09-14), (3, SFO, 2017-04-01), (4, JFK, 2017-05-14)
  • 17. Zone Maps • Query: SELECT COUNT(*) FROM LOGS WHERE DATE = '09-JUNE-2013' • Unsorted Table, per-block MIN/MAX: – 01-JUNE-2013 / 20-JUNE-2013 – 08-JUNE-2013 / 30-JUNE-2013 – 12-JUNE-2013 / 20-JUNE-2013 – 02-JUNE-2013 / 25-JUNE-2013 • Sorted By Date, per-block MIN/MAX: – 01-JUNE-2013 / 06-JUNE-2013 – 07-JUNE-2013 / 12-JUNE-2013 – 13-JUNE-2013 / 18-JUNE-2013 – 19-JUNE-2013 / 24-JUNE-2013
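The pruning on the zone map slide can be sketched in a few lines of Python. This is a minimal model, not Redshift's implementation: each block is reduced to its (MIN, MAX) pair from the slide, and the helper name `blocks_to_scan` is hypothetical.

```python
from datetime import date


def blocks_to_scan(zone_maps, target):
    """Return indexes of blocks whose [MIN, MAX] range could contain target."""
    return [i for i, (lo, hi) in enumerate(zone_maps) if lo <= target <= hi]


# Per-block MIN/MAX values copied from the slide.
unsorted_blocks = [
    (date(2013, 6, 1), date(2013, 6, 20)),
    (date(2013, 6, 8), date(2013, 6, 30)),
    (date(2013, 6, 12), date(2013, 6, 20)),
    (date(2013, 6, 2), date(2013, 6, 25)),
]
sorted_blocks = [
    (date(2013, 6, 1), date(2013, 6, 6)),
    (date(2013, 6, 7), date(2013, 6, 12)),
    (date(2013, 6, 13), date(2013, 6, 18)),
    (date(2013, 6, 19), date(2013, 6, 24)),
]

target = date(2013, 6, 9)  # WHERE DATE = '09-JUNE-2013'
print("unsorted:", blocks_to_scan(unsorted_blocks, target))  # [0, 1, 3]
print("sorted:  ", blocks_to_scan(sorted_blocks, target))    # [1]
```

With the unsorted layout three of the four blocks overlap the target date and must be read; with the sorted layout only one does, which is the I/O reduction the slide illustrates.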
  • 18. Terminology and Concepts: Data Sorting • Goals: • Physically order rows of table data based on certain column(s) • Optimize effectiveness of zone maps • Enable MERGE JOIN operations • Impact: • Enables rrscans to prune blocks by leveraging zone maps • Overall reduction in block I/O • Achieved with the table property SORTKEY defined over one or more columns • Optimal SORTKEY is dependent on: • Query patterns • Data profile • Business requirements
  • 19. Terminology and Concepts: Slices • A slice can be thought of like a “virtual compute node” – Unit of data partitioning – Parallel query processing • Facts about slices: – Each compute node has either 2, 16, or 32 slices – Table rows are distributed to slices – A slice processes only its own data
  • 20. Data Distribution • Distribution style is a table property which dictates how that table’s data is distributed throughout the cluster: – KEY: Value is hashed, same value goes to same location (slice) – ALL: Full table data goes to first slice of every node – EVEN: Round robin • Goals: – Distribute data evenly for parallel processing – Minimize data movement during query processing [Diagram: KEY, ALL, and EVEN placement across Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4)]
  • 21. Data Distribution: Example CREATE TABLE loft_deep_dive ( aid INT /* audience_id */ ,loc CHAR(3) /* location */ ,dt DATE /* date */ ) DISTSTYLE (EVEN|KEY|ALL); [Diagram: CN1 holds Slice 0 and Slice 1, CN2 holds Slice 2 and Slice 3; table loft_deep_dive has user columns aid, loc, dt and system columns ins, del, row]
  • 22. Data Distribution: EVEN Example CREATE TABLE loft_deep_dive ( aid INT /* audience_id */ ,loc CHAR(3) /* location */ ,dt DATE /* date */ ) DISTSTYLE EVEN; INSERT INTO loft_deep_dive VALUES (1, 'SFO', '2016-09-01'), (2, 'JFK', '2016-09-14'), (3, 'SFO', '2017-04-01'), (4, 'JFK', '2017-05-14'); [Diagram: CN1 (Slice 0, Slice 1) and CN2 (Slice 2, Slice 3); every slice holds the table's user columns aid, loc, dt and system columns ins, del, row, and each slice goes from Rows: 0 to Rows: 1] (3 User Columns + 3 System Columns) x (4 slices) = 24 Blocks (24MB)
  • 23. Data Distribution: KEY Example #1 CREATE TABLE loft_deep_dive ( aid INT /* audience_id */ ,loc CHAR(3) /* location */ ,dt DATE /* date */ ) DISTSTYLE KEY DISTKEY (loc); INSERT INTO loft_deep_dive VALUES (1, 'SFO', '2016-09-01'), (2, 'JFK', '2016-09-14'), (3, 'SFO', '2017-04-01'), (4, 'JFK', '2017-05-14'); [Diagram: CN1 (Slice 0, Slice 1) and CN2 (Slice 2, Slice 3); two slices hold the table's user columns aid, loc, dt and system columns ins, del, row, and each of them goes from Rows: 0 to Rows: 2; the other two slices stay at Rows: 0] (3 User Columns + 3 System Columns) x (2 slices) = 12 Blocks (12MB)
  • 24. Data Distribution: KEY Example #2 CREATE TABLE loft_deep_dive ( aid INT /* audience_id */ ,loc CHAR(3) /* location */ ,dt DATE /* date */ ) DISTSTYLE KEY DISTKEY (aid); INSERT INTO loft_deep_dive VALUES (1, 'SFO', '2016-09-01'), (2, 'JFK', '2016-09-14'), (3, 'SFO', '2017-04-01'), (4, 'JFK', '2017-05-14'); [Diagram: CN1 (Slice 0, Slice 1) and CN2 (Slice 2, Slice 3); every slice holds the table's user columns aid, loc, dt and system columns ins, del, row, and each slice goes from Rows: 0 to Rows: 1] (3 User Columns + 3 System Columns) x (4 slices) = 24 Blocks (24MB)
  • 25. Data Distribution: ALL Example CREATE TABLE loft_deep_dive ( aid INT /* audience_id */ ,loc CHAR(3) /* location */ ,dt DATE /* date */ ) DISTSTYLE ALL; INSERT INTO loft_deep_dive VALUES (1, 'SFO', '2016-09-01'), (2, 'JFK', '2016-09-14'), (3, 'SFO', '2017-04-01'), (4, 'JFK', '2017-05-14'); [Diagram: CN1 (Slice 0, Slice 1) and CN2 (Slice 2, Slice 3); two slices hold the table's user columns aid, loc, dt and system columns ins, del, row, and each of them goes from Rows: 0 to Rows: 4; the other two slices stay at Rows: 0] (3 User Columns + 3 System Columns) x (2 slices) = 12 Blocks (12MB)
  • 26. Terminology and Concepts: Data Distribution • KEY – The key creates an even distribution of data – Joins are performed between large fact/dimension tables – Optimizing merge joins and group by • ALL – Small and medium size dimension tables (< 2-3M) • EVEN – When key cannot produce an even distribution
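The three distribution styles and the block arithmetic from the preceding examples can be modeled with a small Python sketch. It rests on several assumptions that are not from the slides: the cluster shape is hard-coded to 2 nodes with 2 slices each, `zlib.crc32` stands in for Redshift's internal hash function (which the slides do not specify), and the names `distribute` and `min_blocks` are hypothetical.

```python
import zlib

# The four sample rows (aid, loc, dt) used throughout the slides.
ROWS = [
    (1, "SFO", "2016-09-01"),
    (2, "JFK", "2016-09-14"),
    (3, "SFO", "2017-04-01"),
    (4, "JFK", "2017-05-14"),
]
NODES = 2
SLICES_PER_NODE = 2
N_SLICES = NODES * SLICES_PER_NODE


def distribute(rows, style, key_index=None):
    """Return {slice_id: [rows]} for DISTSTYLE EVEN, KEY, or ALL."""
    placement = {s: [] for s in range(N_SLICES)}
    for i, row in enumerate(rows):
        if style == "EVEN":
            # Round robin across all slices.
            placement[i % N_SLICES].append(row)
        elif style == "KEY":
            # Assumption: crc32 is a stand-in, not Redshift's real hash.
            digest = zlib.crc32(str(row[key_index]).encode("utf-8"))
            placement[digest % N_SLICES].append(row)
        elif style == "ALL":
            # Full copy on the first slice of every node.
            for node in range(NODES):
                placement[node * SLICES_PER_NODE].append(row)
        else:
            raise ValueError(f"unknown distribution style: {style}")
    return placement


def min_blocks(placement, user_columns=3, system_columns=3):
    """(user + system columns) x (populated slices); each block is 1MB."""
    populated = sum(1 for rows in placement.values() if rows)
    return (user_columns + system_columns) * populated


for style, key in (("EVEN", None), ("KEY", 1), ("ALL", None)):
    placement = distribute(ROWS, style, key)
    counts = [len(placement[s]) for s in range(N_SLICES)]
    print(style, counts, min_blocks(placement), "blocks")
```

EVEN places one row on each slice (24 blocks) and ALL places all four rows on slices 0 and 2 (12 blocks), matching the slides. For KEY on loc, rows with the same loc always share a slice, but which slices they land on depends on the stand-in hash, so the result may differ from the diagram.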
  • 28. Storage Deep Dive: Disks • Redshift utilizes locally attached storage devices • Compute nodes have 2.5-3x the advertised storage capacity • 1, 3, 8, or 24 disks depending on node type • Each disk is split into two partitions – Local data storage, accessed by local CN – Mirrored data, accessed by remote CN • Partitions are raw devices – Local storage devices are ephemeral in nature – Tolerant to multiple disk failures on a single node
  • 29. Storage Deep Dive: Blocks • Column data is persisted to 1MB immutable blocks • Each block contains in-memory metadata: – Zone Maps (MIN/MAX value) – Location of previous/next block • Blocks are individually compressed with 1 of 10 encodings • A full block contains between 16 and 8.4 million values
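One way to read the "between 16 and 8.4 million values" range is as simple arithmetic on the 1MB block size. The sketch below assumes, without confirmation from the slides, that the upper bound corresponds to values compressed down to a single bit and the lower bound to values of 64KB each; the function name `values_per_block` is hypothetical.

```python
BLOCK_BYTES = 1024 * 1024  # 1MB block, as stated on the slide


def values_per_block(bits_per_value):
    """How many fixed-size values fit in one block at the given width."""
    return (BLOCK_BYTES * 8) // bits_per_value


# Assumption: the widest value occupies 64KB.
print(values_per_block(64 * 1024 * 8))  # 16
# Assumption: the best case compresses each value to 1 bit.
print(values_per_block(1))              # 8388608, roughly 8.4 million
```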
  • 30. Storage Deep Dive: Columns • Column: Logical structure accessible via SQL • Physical structure is a doubly linked list of blocks • These block chains exist on each slice for each column • All sorted & unsorted block chains compose a column • Column properties include: – Distribution Key – Sort Key – Compression Encoding • Columns shrink and grow independently, 1 block at a time • Three system columns per table, per slice, for MVCC
  • 31. Block Properties: Design Considerations • Small writes: • Batch processing system, optimized for processing massive amounts of data • 1MB size + immutable blocks means that we clone blocks on write so as not to introduce fragmentation • Small write (~1-10 rows) has similar cost to a larger write (~100 K rows) • UPDATE and DELETE: • Immutable blocks means that we only logically delete rows on UPDATE or DELETE • Must VACUUM or DEEP COPY to remove ghost rows from table
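The ghost-row behaviour described above can be illustrated with a toy model. This is a sketch, not Redshift's storage engine: the `ToyTable` class is hypothetical, and its per-row deleted flag merely stands in for the role of the system columns.

```python
class ToyTable:
    """Toy model: DELETE and UPDATE only mark rows, VACUUM removes them."""

    def __init__(self):
        self.rows = []  # each entry is [value, deleted_flag]

    def insert(self, value):
        self.rows.append([value, False])

    def delete(self, value):
        for row in self.rows:
            if row[0] == value and not row[1]:
                row[1] = True  # logical delete: the row stays on disk

    def update(self, old, new):
        # An update is a logical delete plus an insert of the new version.
        matches = sum(1 for v, deleted in self.rows if v == old and not deleted)
        self.delete(old)
        for _ in range(matches):
            self.insert(new)

    def visible(self):
        return [v for v, deleted in self.rows if not deleted]

    def stored_rows(self):
        return len(self.rows)

    def vacuum(self):
        # Rewrite the table without the ghost rows.
        self.rows = [row for row in self.rows if not row[1]]


table = ToyTable()
for value in (1, 2, 3):
    table.insert(value)
table.update(2, 20)
table.delete(3)
print(table.visible(), table.stored_rows())
```

After one UPDATE and one DELETE the query result has two rows but storage still holds four; only `vacuum()` brings the two numbers back together.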
  • 32. Column Properties: Design Considerations • Compression: • COPY automatically analyzes and compresses data when loading into empty tables • ANALYZE COMPRESSION checks existing tables and proposes optimal compression algorithms for each column • Changing column encoding requires a table rebuild • DISTKEY and SORTKEY significantly influence performance (orders of magnitude) • Distribution Keys: • A poor DISTKEY can introduce data skew and an unbalanced workload • A query completes only as fast as the slowest slice completes • Sort Keys: • A sortkey is only as effective as the data profile allows it to be • Selectivity needs to be considered
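To make the skew point concrete, the sketch below distributes rows to slices by hashing a candidate distribution key and compares the busiest slice to the average. The MD5-based hash is an assumption chosen only to be deterministic; Redshift's actual hash function is not described in this deck, and the function names are hypothetical.

```python
import hashlib


def rows_per_slice(keys, slice_count):
    """Count how many rows land on each slice when hashing the key."""
    counts = [0] * slice_count
    for key in keys:
        digest = hashlib.md5(str(key).encode("utf-8")).digest()
        counts[int.from_bytes(digest, "big") % slice_count] += 1
    return counts


def skew_ratio(counts):
    """Busiest slice relative to the mean; 1.0 means perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


# High-cardinality key (e.g. an id column) versus a two-value key.
high_cardinality = rows_per_slice(range(10_000), slice_count=4)
low_cardinality = rows_per_slice(["SFO", "JFK"] * 5_000, slice_count=4)
print(skew_ratio(high_cardinality), skew_ratio(low_cardinality))
```

A key with only two distinct values can populate at most two of the four slices, so the busiest slice holds at least twice the average and sets the pace for the whole query.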
  • 34. Storage Deep Dive: Slices • Each compute node has either 2, 16, or 32 slices • A slice can be thought of like a “virtual compute node” – Unit of data partitioning – Parallel query processing • Facts about slices: – Table rows are distributed to slices – A slice processes only its own data – Within a compute node all slices read from and write to all disks
  • 35. [Diagram: a leader node and three compute nodes, each with 128GB RAM, 16TB disk, 16 cores] Leader Node • Parser & Rewriter • Planner & Optimizer • Code Generator – Input: Optimized plan – Output: >=1 C++ functions • Compiler • Task Scheduler • WLM – Admission – Scheduling • PostgreSQL Catalog Tables • Redshift System Tables (STV)
  • 37. Query Execution Terminology • Step: An individual operation needed during query execution. Steps need to be combined to allow compute nodes to perform a join. Examples: scan, sort, hash, aggr • Segment: A combination of several steps that can be done by a single process. The smallest compilation unit executable by a slice. Segments within a stream run in parallel. • Stream: A collection of combined segments which output to the next stream or SQL client.
  • 38. Visualizing Streams, Segments, and Steps [Diagram, time on the horizontal axis] • Stream 0: Segment 0 (Steps 0-2), Segment 1 (Steps 0-4), Segment 2 (Steps 0-3), Segment 3 (Steps 0-5) • Stream 1: Segment 4 (Steps 0-3), Segment 5 (Steps 0-2), Segment 6 (Steps 0-4) • Stream 2: Segment 7 (Steps 0-1), Segment 8 (Steps 0-1)
  • 39. Query Lifecycle [Diagram] • Client (JDBC/ODBC) -> Leader Node • Leader Node: Parser -> Query Planner (Explain Plans) -> Code Generator (generate code for all segments of one stream) • Compute Nodes: Receive Compiled Code -> Run the Compiled Code -> Return results to Leader • Leader Node: Final Computations -> Return results to client • Segments in a stream are executed concurrently. Each step in a segment is executed serially.
  • 40. Query Execution Deep Dive: Leader Node 1. The leader node receives the query and parses the SQL. 2. The parser produces a logical representation of the original query. 3. This query tree is input into the query optimizer (volt). 4. Volt rewrites the query to maximize its efficiency. Sometimes a single query will be rewritten as several dependent statements in the background. 5. The rewritten query is sent to the planner which generates >= 1 query plans for the execution with the best estimated performance. 6. The query plan is sent to the execution engine, where it’s translated into steps, segments, and streams. 7. This translated plan is sent to the code generator, which generates a C++ function for each segment. 8. This generated C++ is compiled with gcc to a .o file and distributed to the compute nodes.
  • 41. Query Execution Deep Dive: Compute Nodes • Slices execute the query segments in parallel. • Executable segments are created for one stream at a time. When the segments of that stream are complete, the engine generates the segments for the next stream. • When the compute nodes are done, they return the query results to the leader node for final processing. • The leader node merges the data into a single result set and addresses any needed sorting or aggregation. • The leader node then returns the results to the client.
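The execution rules stated in this part of the deck (streams run one after another, segments within a stream run concurrently, steps within a segment run serially) suggest a simple timing model. The sketch below is a deliberate simplification: it assumes a hypothetical cost of one time unit per step and ignores compilation, data movement, and skew between slices.

```python
def segment_time(step_times):
    """Steps in a segment run serially, so their costs add up."""
    return sum(step_times)


def stream_time(segments):
    """Segments in a stream run concurrently; the slowest one dominates."""
    return max(segment_time(steps) for steps in segments)


def query_time(streams):
    """Streams run one after another."""
    return sum(stream_time(segments) for segments in streams)


# Shape of the diagram on slide 38, one time unit per step (hypothetical).
streams = [
    [[1] * 3, [1] * 5, [1] * 4, [1] * 6],  # Stream 0: Segments 0-3
    [[1] * 4, [1] * 3, [1] * 5],           # Stream 1: Segments 4-6
    [[1] * 2, [1] * 2],                    # Stream 2: Segments 7-8
]
print(query_time(streams))
```

Under these assumptions the 34 steps in the diagram cost 13 time units on a slice, because only the longest segment of each stream counts.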
  • 43. Query Execution [Diagram: the same three-stream timeline (Stream 0: Segments 0-3; Stream 1: Segments 4-6; Stream 2: Segments 7-8) shown once per slice, running in parallel on Slices 0, 1, 2, 3]
  • 44. Parallelism considerations with Redshift slices • Ingestion Throughput: – Each slice’s query processors can load one file at a time: Streaming decompression, Parse, Distribute, Write • With a single input file, only partial node usage is realized: 6.25% of slices (1 of 16) are active [Diagram: DS2.8XL compute node, slices 0-15, one slice active]
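The 6.25% figure follows from the rule that each slice loads one file at a time. A small sketch of that arithmetic, with a hypothetical function name:

```python
def active_slice_fraction(input_files: int, slices: int) -> float:
    """Fraction of slices kept busy when each slice loads one file at a time."""
    if input_files < 0 or slices <= 0:
        raise ValueError("input_files must be >= 0 and slices must be > 0")
    return min(input_files, slices) / slices


# One file on a 16-slice node versus one file per slice.
print(active_slice_fraction(1, 16))
print(active_slice_fraction(16, 16))
```

One file on a 16-slice node keeps 1/16 of the slices busy; sixteen files keep all of them busy.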
  • 45. Design considerations for Redshift slices • Use at least as many input files as there are slices in the cluster • With 16 input files, all slices are working so you maximize throughput • COPY continues to scale linearly as you add nodes [Diagram: 16 input files feeding slices 0-15 of a DS2.8XL compute node]
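One way to turn this guidance into a file count is sketched below. It combines the "at least as many files as slices" rule with the 1MB to 1GB compressed file size range given on the Data Preparation slide. Rounding the count up to a multiple of the slice count is an assumption of this sketch, made so that every slice receives the same number of files; the function name is hypothetical.

```python
import math

MAX_FILE_MB = 1024  # upper bound from the deck (1GB after gzip)
MIN_FILE_MB = 1     # lower bound from the deck (1MB after gzip)


def plan_file_count(total_compressed_mb: float, slices: int) -> int:
    """Pick a file count that is a multiple of the slice count where possible."""
    if total_compressed_mb <= 0 or slices <= 0:
        raise ValueError("size and slice count must be positive")
    # Enough files that none exceeds the upper size bound.
    files_for_size = math.ceil(total_compressed_mb / MAX_FILE_MB)
    # Round up to a whole number of files per slice (assumption of this sketch).
    count = max(1, math.ceil(files_for_size / slices)) * slices
    # Very small data sets: do not split below the lower size bound.
    if total_compressed_mb / count < MIN_FILE_MB:
        count = max(1, math.floor(total_compressed_mb / MIN_FILE_MB))
    return count


print(plan_file_count(100_000, slices=16))
print(plan_file_count(500, slices=16))
```

For roughly 100GB of compressed data on 16 slices this yields 112 files of about 893MB each, seven per slice.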
  • 46. Data Preparation • Export Data from Source System – CSV Recommended (Simple Delimiter '|' or ',') • Be aware of UTF-8 varchar columns (a UTF-8 character can take up to 4 bytes) • Be aware of your NULL character (\N) – GZIP Compress Files – Split Files (1MB – 1GB after gzip compression) • Useful COPY Options for PoC Data – MAXERROR – ACCEPTINVCHARS – NULL AS
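Because Redshift VARCHAR lengths are measured in bytes rather than characters, it can help to measure the UTF-8 encoded size of sample values before choosing column widths. A minimal sketch, with a hypothetical helper name and made-up sample data:

```python
def varchar_bytes_needed(values):
    """Largest UTF-8 encoded length, in bytes, among the sample values."""
    return max((len(v.encode("utf-8")) for v in values), default=0)


# Hypothetical samples: ASCII, a 2-byte character, 3-byte characters, a 4-byte emoji.
samples = ["SFO", "Z\u00fcrich", "\u6771\u4eac", "\U0001F600"]
for value in samples:
    print(len(value), "chars ->", len(value.encode("utf-8")), "bytes")
print(varchar_bytes_needed(samples))
```

The six-character value with one 2-byte character needs 7 bytes, and the single emoji needs 4, so sizing a column by character count alone can cause load errors.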
  • 47. New & Upcoming Features
  • 48. Recently Released Features: Amazon Redshift Spectrum • Run SQL queries directly against data in S3 using thousands of nodes • Fast @ exabyte scale • Elastic & highly available • On-demand, pay-per-query • High concurrency: Multiple clusters access same data • No ETL: Query data in-place using open file formats • Full Amazon Redshift SQL support
  • 49. Recently Released Features: Amazon Redshift Spectrum • Amazon Redshift Spectrum seamlessly integrates with your existing SQL & BI apps • Support for complex joins, nested queries & window functions • Support for data partitioned in S3 by any key – Date, time, and any other custom keys, e.g., year, month, day, hour
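  • 50. [Diagram: Redshift Spectrum query flow] Query (SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY…) -> JDBC/ODBC -> Amazon Redshift -> Spectrum nodes 1, 2, 3, 4 ... N -> Amazon S3 (exabyte-scale object storage); table metadata comes from the Data Catalog / Apache Hive Metastore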
  • 51. Recently Released Features • QMR - Query Monitoring Rules – Apply rules to in-flight queries • New Data Type - TIMESTAMPTZ – Support for timestamp with time zone: the new TIMESTAMPTZ data type accepts complete timestamp values that include the date, the time of day, and a time zone, e.g. 30 Nov 07:37:16 2016 PST • Multi-byte Object Names – Support for multi-byte (UTF-8) characters for tables, columns, and other database object names • User Connection Limits – You can now set a limit on the number of database connections a user is permitted to have open concurrently • Automatic Data Compression for CTAS – All newly created tables will leverage default encoding • New Column Encoding - ZSTD
  • 52. Recently Released Features • Performance Enhancements – Vacuum (10x faster for deletes) – Snapshot Restore (2x faster) – Queries (Up to 5x faster) • Copy Can Extend Sorted Region on Single Sort Key – No need to vacuum when loading in sorted order • Enhanced VPC Routing – Restrict S3 Bucket Access • Schema Conversion Tool - One-Time Data Exports – Oracle, Teradata, SQL Server • Schema Conversion Tool – Vertica, SQL Server, Netezza and Greenplum
  • 53. Coming Soon: IAM Authentication • New Redshift ODBC/JDBC drivers. Grab the ticket (userid) and get a SAML assertion. [Diagram: BI tools, SQL clients, and analytics tools connect through the Amazon Redshift ODBC/JDBC client; single sign-on via identity providers (ADFS, corporate Active Directory) and IAM maps user groups and individual users to Amazon Redshift]
  • 54. Coming Soon: Lots More … • Automatic and Incremental Background VACUUM – Reclaims space and sorts when Amazon Redshift clusters are idle – Vacuum is initiated when performance can be enhanced – Improves ETL and query performance • Short Query Bias – Prioritize interactive short running queries
  • 55. Resources • https://github.com/awslabs/amazon-redshift-utils • https://github.com/awslabs/amazon-redshift-monitoring • https://github.com/awslabs/amazon-redshift-udfs • Admin scripts: Collection of utilities for running diagnostics on your cluster • Admin views: Collection of utilities for managing your cluster, generating schema DDL, etc. • ColumnEncodingUtility: Gives you the ability to apply optimal column encoding to an established schema with data already loaded • Amazon Redshift Engineering’s Advanced Table Design Playbook: https://aws.amazon.com/blogs/big-data/amazon-redshift-engineerings-advanced-table-design-playbook-preamble-prerequisites-and-prioritization/