© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Sanjay Kotecha, Solution Architect
Eric Ferreira, Principal Database Engineer
July 21, 2015
Best Practices: Amazon Redshift
Optimizing Performance
Getting Started – June Webinar Series:
https://www.youtube.com/watch?v=biqBjWqJi-Q
Best Practices – July Webinar Series:
Optimizing Performance – July 21, 2015
Migration and Data Loading – July 22, 2015
Reporting and Advanced Analytics – July 23, 2015
Amazon Redshift – Resources
Architecture
Distribution
Sort Keys
Compression
DDL
Loading
Vacuum
Analyze
Workload Management
Agenda
Leader Node
• SQL endpoint
• Stores metadata
• Coordinates query execution
Compute Nodes
• Local, columnar storage
• Execute queries in parallel
• Load, backup, restore via S3
• Parallel load from DynamoDB or SSH
Hardware optimized for data processing
• DS2: HDD; scale from 2TB to 2PB
• DC1: SSD; scale from 160GB to 326TB
[Diagram labels: 10 GigE (HPC); Ingestion; Backup; Restore; JDBC/ODBC]
Amazon Redshift Architecture
– One slice per core
– DS2 – 2 slices on XL, 16 on 8XL
– DC1 – 2 slices on XL, 32 on 8XL
Architecture – Nodes and Slices
Table Distribution Styles
• Distribution Key: same key to same location
• All: all data on every node
• Even: round robin distribution
[Diagram: each style shown on two nodes with two slices each (Node 1: Slice 1, Slice 2; Node 2: Slice 3, Slice 4)]
[Diagram: sample records spread across Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4)]
• cloudfront: uri = /games/g1.exe, user_id=1234, …
• cloudfront: uri = /imgs/ad1.png, user_id=2345, …
• cloudfront: uri = /games/g10.exe, user_id=4312, …
• cloudfront: uri = /img/ad_5.img, user_id=1234, …
• user_profile: user_id=1234, name=janet, …
• user_profile: user_id=6789, name=fred, …
• user_profile: user_id=2345, name=bill, …
• user_profile: user_id=4312, name=fred, …
• order_line: order_line_id = 25693, …
Data Distribution with Distribution Keys
[Diagram: the same sample records on Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4); the two cloudfront records with user_id=1234 (uri = /games/g1.exe and uri = /img/ad_5.img) are shown together]
Distribution Keys determine which data resides on which slices
Records with same distribution key for a table are on the same slice
Data Distribution and Distribution Keys
[Diagram: Node 1 (Slice 1, Slice 2) holds the cloudfront and user_profile records for user_id=1234 and user_id=2345, plus order_line_id = 25693; Node 2 (Slice 3, Slice 4) holds user_profile user_id=6789 and the cloudfront and user_profile records for user_id=4312]
Records with same distribution key for a table are on the same slice
Records from other tables with the same distribution key value are also on the same slice
Distribution Keys help with data locality for join evaluation
Data Distribution and Distribution Keys
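The locality argument can be made concrete with a small simulation. The sketch below is only an illustration: `slice_for` is a hypothetical helper that hashes a key with `zlib.crc32`, since the slides do not say which hash Redshift uses. It distributes the sample cloudfront and user_profile records on user_id and then joins them slice by slice, without moving rows between slices.

```python
import zlib

NUM_SLICES = 4  # two nodes x two slices, as in the diagrams


def slice_for(dist_key):
    """Map a distribution key value to a slice (hypothetical hash, not Redshift's)."""
    return zlib.crc32(str(dist_key).encode()) % NUM_SLICES


def distribute_by_key(rows, key):
    """Place each row on the slice chosen by hashing its distribution key."""
    slices = {s: [] for s in range(NUM_SLICES)}
    for row in rows:
        slices[slice_for(row[key])].append(row)
    return slices


cloudfront = [
    {"uri": "/games/g1.exe", "user_id": 1234},
    {"uri": "/imgs/ad1.png", "user_id": 2345},
    {"uri": "/games/g10.exe", "user_id": 4312},
    {"uri": "/img/ad_5.img", "user_id": 1234},
]
user_profile = [
    {"user_id": 1234, "name": "janet"},
    {"user_id": 6789, "name": "fred"},
    {"user_id": 2345, "name": "bill"},
    {"user_id": 4312, "name": "fred"},
]

cf_slices = distribute_by_key(cloudfront, "user_id")
up_slices = distribute_by_key(user_profile, "user_id")

# Join slice by slice: each slice only looks at its own rows.
local_join = [
    (c["uri"], u["name"])
    for s in range(NUM_SLICES)
    for c in cf_slices[s]
    for u in up_slices[s]
    if c["user_id"] == u["user_id"]
]
```

Because both tables hash the same column with the same function, every matching pair is found locally, whichever slice a given user_id lands on.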
Example Query (TPC-H dataset)
Data Distribution - Comparison
Distribution Type    Query Time
Key                  14 seconds
Even                 39 seconds
Query against the tables with distribution key was 178% faster
Query plan for tables with distribution key
Data Distribution - Comparison
Query plan for tables without distribution key
Query Plan
http://docs.aws.amazon.com/redshift/latest/dg/c-query-processing.html
Tools – AdminScripts
Tools – AdminViews
[Diagram: cloudfront records spread across Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4)]
Records per slice: 2M, 5M, 1M, 4M
Poor key choices lead to uneven distribution of records…
Data Distribution and Distribution Keys
[Diagram: cloudfront records spread across Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4)]
Records per slice: 2M, 5M, 1M, 4M
Unevenly distributed data cause processing imbalances!
Data Distribution and Distribution Keys
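One way to put a number on the imbalance is to compare the busiest slice with the average slice. The sketch below is a simplification: it assumes each slice's work is proportional to its row count and that a query finishes only when the slowest slice does. `skew_ratio` is a hypothetical helper, and the evenly spread figures are illustrative.

```python
def skew_ratio(rows_per_slice):
    """Ratio of the busiest slice to the average slice (1.0 means perfectly even)."""
    average = sum(rows_per_slice) / len(rows_per_slice)
    return max(rows_per_slice) / average


uneven = [2_000_000, 5_000_000, 1_000_000, 4_000_000]  # counts from the slide
even = [3_000_000] * 4  # the same 12M rows spread evenly (illustrative)

uneven_ratio = skew_ratio(uneven)
even_ratio = skew_ratio(even)
```

Under these assumptions the uneven layout makes the busiest slice do about 1.67 times the average work, while the even layout has a ratio of 1.0.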
[Diagram: cloudfront records spread across Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4)]
Records per slice: 2M, 2M, 2M, 2M
Evenly distributed data improves query performance
select * from v_check_data_distribution where tablename = 'lineitem';
Data Distribution and Distribution Keys
KEY
• Large Fact tables
• Large dimension tables
ALL
• Medium dimension tables (1K – 2M)
EVEN
• Tables with no joins or group by
• Small dimension tables (<1000)
Data Distribution
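The three styles can be sketched as three placement functions. This is a minimal model, not Redshift's implementation: the `zlib.crc32` hash, the function names, and the sample rows are assumptions made for illustration.

```python
import zlib
from itertools import cycle

NUM_SLICES = 4
SLICES_PER_NODE = 2


def place_key(rows, key):
    """KEY: hash the distribution key so equal values share a slice (hypothetical hash)."""
    slices = [[] for _ in range(NUM_SLICES)]
    for row in rows:
        slices[zlib.crc32(str(row[key]).encode()) % NUM_SLICES].append(row)
    return slices


def place_even(rows):
    """EVEN: hand rows to slices in round-robin order."""
    slices = [[] for _ in range(NUM_SLICES)]
    for row, target in zip(rows, cycle(range(NUM_SLICES))):
        slices[target].append(row)
    return slices


def place_all(rows):
    """ALL: keep a full copy of the table on every node."""
    num_nodes = NUM_SLICES // SLICES_PER_NODE
    return [list(rows) for _ in range(num_nodes)]


rows = [{"user_id": uid} for uid in (1234, 2345, 4312, 1234, 6789, 2345, 1234, 4312)]
by_key = place_key(rows, "user_id")
by_even = place_even(rows)
by_all = place_all(rows)
```

KEY keeps equal values together at the risk of skew, EVEN balances row counts but ignores values, and ALL trades storage for having every row available on every node.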
Tools – Admin Scripts: table_info.sql
SELECT COUNT(*) FROM LOGS WHERE DATE = '09-JUNE-2015'
Unsorted Table (min/max per block)
• MIN: 01-JUNE-2015, MAX: 20-JUNE-2015
• MIN: 08-JUNE-2015, MAX: 30-JUNE-2015
• MIN: 12-JUNE-2015, MAX: 20-JUNE-2015
• MIN: 02-JUNE-2015, MAX: 25-JUNE-2015
• MIN: 06-JUNE-2015, MAX: 12-JUNE-2015
Sorted By Date (min/max per block)
• MIN: 01-JUNE-2015, MAX: 06-JUNE-2015
• MIN: 07-JUNE-2015, MAX: 12-JUNE-2015
• MIN: 13-JUNE-2015, MAX: 18-JUNE-2015
• MIN: 19-JUNE-2015, MAX: 24-JUNE-2015
• MIN: 25-JUNE-2015, MAX: 30-JUNE-2015
[Diagram: five blocks in total are marked READ]
Sort Keys – Zone Maps
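The pruning shown on this slide can be reproduced with a few lines. The sketch below copies the per-block min/max values from the slide and checks which blocks could contain 09-JUNE-2015. `blocks_to_read` is a hypothetical helper that models only the min/max test, not how Redshift stores zone maps.

```python
from datetime import date

# (min, max) per block, copied from the slide
UNSORTED = [
    (date(2015, 6, 1), date(2015, 6, 20)),
    (date(2015, 6, 8), date(2015, 6, 30)),
    (date(2015, 6, 12), date(2015, 6, 20)),
    (date(2015, 6, 2), date(2015, 6, 25)),
    (date(2015, 6, 6), date(2015, 6, 12)),
]
SORTED = [
    (date(2015, 6, 1), date(2015, 6, 6)),
    (date(2015, 6, 7), date(2015, 6, 12)),
    (date(2015, 6, 13), date(2015, 6, 18)),
    (date(2015, 6, 19), date(2015, 6, 24)),
    (date(2015, 6, 25), date(2015, 6, 30)),
]


def blocks_to_read(zone_map, value):
    """Return indexes of blocks whose [min, max] range could contain value."""
    return [i for i, (lo, hi) in enumerate(zone_map) if lo <= value <= hi]


target = date(2015, 6, 9)
unsorted_reads = blocks_to_read(UNSORTED, target)  # four of five blocks
sorted_reads = blocks_to_read(SORTED, target)      # one of five blocks
```

Four unsorted blocks overlap the target date but only one sorted block does, which accounts for the five READ markers in the diagram.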
Sort Keys - How to choose
Timestamp column
Frequent range filtering or equality filtering on one column
Join column:
create table customer (
c_custkey int8 not null,
c_name varchar(25) not null,
c_address varchar(40) not null,
c_nationkey int4 not null,
c_phone char(15) not null,
c_acctbal numeric(12,2) not null,
c_mktsegment char(10) not null,
c_comment varchar(117) not null
) distkey(c_custkey) sortkey(c_custkey) ;
Single Column
Compound
Interleaved
Sort Keys
Table is sorted by 1 column
[ SORTKEY ( date ) ]
Best for:
• Queries that use 1st column (e.g., date) as primary filter
• Can speed up joins and group bys
• Quickest to VACUUM
Date Region Country
2-JUN-2015 Oceania New Zealand
2-JUN-2015 Asia Singapore
2-JUN-2015 Africa Zaire
2-JUN-2015 Asia Hong Kong
3-JUN-2015 Europe Germany
3-JUN-2015 Asia Korea
Sort Keys – Single Column
• Table is sorted by 1st column, then 2nd column, etc.
[ SORTKEY COMPOUND ( date, region, country) ]
• Best for:
• Queries that use 1st column as primary filter, then other cols
• Can speed up joins and group bys
• Slower to VACUUM
Date Region Country
2-JUN-2015 Oceania New Zealand
2-JUN-2015 Asia Singapore
2-JUN-2015 Africa Zaire
2-JUN-2015 Asia Hong Kong
3-JUN-2015 Europe Germany
3-JUN-2015 Asia Korea
Sort Keys – Compound
• Equal weight is given to each column.
[ SORTKEY INTERLEAVED ( date, region, country) ]
• Best for:
• Queries that use different columns in filter
• Queries get faster the more columns used in the filter (up to 8)
• Slowest to VACUUM
Date Region Country
2-JUN-2015 Oceania New Zealand
2-JUN-2015 Asia Singapore
2-JUN-2015 Africa Zaire
2-JUN-2015 Asia Hong Kong
3-JUN-2015 Europe Germany
3-JUN-2015 Asia Korea
Sort Keys – Interleaved
Sort Keys – Comparing Styles
Single
create table cust_sales_date_single
sortkey (c_custkey)
as select * from cust_sales_date;
Compound
create table cust_sales_date_compound
compound sortkey (c_custkey, c_region, c_mktsegment, d_date)
as select * from cust_sales_date;
Interleaved
create table cust_sales_date_interleaved
interleaved sortkey (c_custkey, c_region, c_mktsegment, d_date)
as select * from cust_sales_date;
Query 1
select max(lo_revenue),
min(lo_revenue)
from cust_sales_date_single
where c_custkey < 100000;
select max(lo_revenue),
min(lo_revenue)
from cust_sales_date_compound
where c_custkey < 100000;
select max(lo_revenue),
min(lo_revenue) from
cust_sales_date_interleaved
where c_custkey < 100000;
Query 2
select max(lo_revenue),
min(lo_revenue)
from cust_sales_date_single
where c_region = 'ASIA'
and c_mktsegment = 'FURNITURE';
select max(lo_revenue),
min(lo_revenue)
from cust_sales_date_compound
where c_region = 'ASIA'
and c_mktsegment = 'FURNITURE';
select max(lo_revenue),
min(lo_revenue)
from cust_sales_date_interleaved
where c_region = 'ASIA'
and c_mktsegment = 'FURNITURE';
Query 3
select max(lo_revenue), min(lo_revenue)
from cust_sales_date_single
where d_date between '01/01/1996' and
'01/14/1996'
and c_mktsegment = 'FURNITURE'
and c_region = 'ASIA';
select max(lo_revenue), min(lo_revenue)
from cust_sales_date_compound
where d_date between '01/01/1996' and
'01/14/1996'
and c_mktsegment = 'FURNITURE'
and c_region = 'ASIA';
select max(lo_revenue), min(lo_revenue)
from cust_sales_date_interleaved
where d_date between '01/01/1996' and
'01/14/1996'
and c_mktsegment = 'FURNITURE'
and c_region = 'ASIA';
Sort Keys – Comparing Styles
Sort Style Query 1 Query 2 Query 3
Single 0.25 seconds 18.37 seconds 30.04 seconds
Compound 0.27 seconds 18.24 seconds 30.14 seconds
Interleaved 0.94 seconds 1.46 seconds 0.80 seconds
Sort Keys – Comparing Styles
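The pattern in the timings (compound wins on the leading column, interleaved wins when the leading column is not filtered) can be reproduced with a toy model. The sketch below is an assumption-laden illustration: it uses bit interleaving (Z-order) as a stand-in for an interleaved key, because the slides do not describe Redshift's actual ordering. It uses two 4-bit columns and blocks of 16 rows, and counts how many blocks a min/max check would have to scan.

```python
def z_value(a, b, bits=4):
    """Interleave the bits of a and b (Z-order); a stand-in for an interleaved key."""
    z = 0
    for i in range(bits):
        z |= ((a >> i) & 1) << (2 * i + 1)
        z |= ((b >> i) & 1) << (2 * i)
    return z


def blocks_scanned(rows, column, value, block_size=16):
    """Count blocks whose min/max for `column` could contain `value`."""
    scanned = 0
    for start in range(0, len(rows), block_size):
        values = [row[column] for row in rows[start:start + block_size]]
        if min(values) <= value <= max(values):
            scanned += 1
    return scanned


rows = [(a, b) for a in range(16) for b in range(16)]
compound = sorted(rows)  # order by column 0, then column 1
interleaved = sorted(rows, key=lambda r: z_value(r[0], r[1]))

# (blocks scanned filtering on column 0, blocks scanned filtering on column 1)
results = {
    "compound": (blocks_scanned(compound, 0, 5), blocks_scanned(compound, 1, 5)),
    "interleaved": (blocks_scanned(interleaved, 0, 5), blocks_scanned(interleaved, 1, 5)),
}
```

In this toy model the compound order scans 1 of 16 blocks for a filter on the first column but all 16 for the second, while the interleaved order scans 4 blocks either way.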
Increased load and vacuum times
More effective with large tables (100M+ rows)
Use Compound Sort Key when appending data in order
Sort Keys – Interleaved Considerations
Tools – Admin Scripts: table_info.sql
Raw encoding (RAW)
Byte-dictionary (BYTEDICT)
Delta encoding (DELTA / DELTA32K)
Mostly encoding (MOSTLY8 / MOSTLY16 / MOSTLY32)
Runlength encoding (RUNLENGTH)
Text encoding (TEXT255 / TEXT32K)
LZO encoding (
Average: 2-4x
Compression - Encodings
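To give a feel for why these encodings shrink data, the sketch below implements the general idea behind run-length and delta encoding on small lists. It illustrates the techniques only; the function names and sample values are assumptions, and it does not reproduce Redshift's on-disk format.

```python
from itertools import groupby


def runlength_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def runlength_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


def delta_encode(values):
    """Store the first value, then each value's difference from its predecessor."""
    return values[:1] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    out = []
    for d in deltas:
        out.append(d if not out else out[-1] + d)
    return out


regions = ["Asia", "Asia", "Asia", "Europe", "Europe", "Oceania"]
order_ids = [25693, 25694, 25696, 25700]

rle = runlength_encode(regions)  # [("Asia", 3), ("Europe", 2), ("Oceania", 1)]
deltas = delta_encode(order_ids)  # [25693, 1, 2, 4]
```

Run-length encoding pays off when values repeat consecutively, and delta encoding pays off when neighbouring values are close together, which suits sorted or sequential columns.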
COPY samples data automatically when loading into an empty table
• Samples up to 100,000 rows and picks optimal encoding
If you use temp tables or staging tables
• Turn off automatic compression
• Use ANALYZE COMPRESSION to determine the right encodings
• Bake those encodings into your DDL
COPY <tablename> FROM 's3://<bucket-name>/<object-prefix>'
CREDENTIALS 'aws_access_key_id=<AWS_ACCESS_KEY>;aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>'
DELIMITER ',' COMPUPDATE OFF MANIFEST;
Compression
Compression Encodings
Compression - Comparison
No Compression Encodings
Example Query (TPC-H dataset)
Table Type      Query Time
Compressed      14 seconds
Uncompressed    37 seconds
Query against the tables with compression was 164% faster
Compression - Comparison
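The slides do not say how the "% faster" figures were derived. The sketch below assumes the speedup is measured relative to the faster run; `percent_faster` is a hypothetical helper. Applied to the timings shown, it is consistent with both quoted figures once the result is truncated to a whole number.

```python
def percent_faster(slow_seconds, fast_seconds):
    """How much faster the quick run is, as a percentage of the quick run's time."""
    return (slow_seconds - fast_seconds) / fast_seconds * 100


compression_gain = percent_faster(37, 14)   # uncompressed vs compressed
distribution_gain = percent_faster(39, 14)  # even vs key distribution
```

These evaluate to roughly 164.3 and 178.6, matching the 164% and 178% quoted for compression and for distribution keys.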
• Zone maps store min/max per block
• Once we know which block(s) contain the
range, we know which row offsets to scan
• Highly compressed sort keys mean many rows per block
• You’ll scan more data blocks than you need
• If your sort keys compress significantly more than your data columns, you may want to skip compression on the sort key columns
Compression – Sort Keys
Tools – Admin Scripts: table_info.sql
CREATE TABLE orders (
orderkey int8 NOT NULL DISTKEY,
custkey int8 NOT NULL,
orderstatus char(1) NOT NULL ,
totalprice numeric(12,2) NOT NULL ,
orderdate date NOT NULL SORTKEY ,
orderpriority char(15) NOT NULL,
clerk char(15) NOT NULL ,
shippriority int4 NOT NULL,
comment varchar(79) NOT NULL
);
DDL
During queries and ingestion,
the system allocates buffers
based on column width
Wider than needed columns
mean memory is wasted
Fewer rows fit into memory;
increased likelihood of queries
spilling to disk
DDL – Make Columns as narrow as possible
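A rough calculation shows how declared width limits the rows held in memory. The sketch below is a simplified model, not Redshift's allocator: it assumes every column reserves its full declared width, and the 64 MiB buffer and `rows_per_buffer` helper are hypothetical.

```python
def rows_per_buffer(buffer_bytes, column_widths):
    """Rows that fit when each column reserves its full declared width."""
    return buffer_bytes // sum(column_widths)


BUFFER = 64 * 1024 * 1024  # hypothetical 64 MiB working buffer

# int8 plus two character columns, declared narrowly vs generously
narrow = rows_per_buffer(BUFFER, [8, 25, 40])   # e.g. varchar(25), varchar(40)
wide = rows_per_buffer(BUFFER, [8, 255, 255])   # same data declared as varchar(255)
```

Under this model the narrow declaration fits about seven times as many rows in the same buffer, so the wide declaration would spill to disk much sooner.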
Define Primary & Foreign Keys
Not enforced, but they help the optimizer with the query plan
DDL
Use the COPY command
Each slice can load one file at a
time
A single input file means only one
slice is ingesting data
Instead of 100MB/s, you’re only
getting 6.25MB/s
Loading – Use multiple input files to maximize
throughput
Use the COPY command
You need at least as many input
files as you have slices
With 16 input files, all slices are
working so you maximize
throughput
Get 100MB/s per node; scale
linearly as you add nodes
Loading – Use multiple input files to maximize
throughput
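The throughput figures on these two slides follow from dividing a node's load rate across its slices. The sketch below assumes 16 slices per node and 100MB/s per node, as on the slides, and that each slice loads one file at a time. `load_throughput_mb_s` is a hypothetical helper and ignores file size and network effects.

```python
def load_throughput_mb_s(num_files, slices_per_node=16, node_mb_s=100.0, nodes=1):
    """Estimate COPY throughput when each slice loads one file at a time."""
    total_slices = slices_per_node * nodes
    per_slice = node_mb_s / slices_per_node
    busy_slices = min(num_files, total_slices)  # extra files wait, extra slices idle
    return busy_slices * per_slice


single_file = load_throughput_mb_s(1)          # 6.25
sixteen_files = load_throughput_mb_s(16)       # 100.0
two_nodes = load_throughput_mb_s(32, nodes=2)  # 200.0
```

With fewer files than slices the idle slices cap throughput, which is why the slides recommend at least as many input files as slices.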
Tools – Use the AdminScripts
VACUUM reclaims space and re-sorts tables
VACUUM can be run in 4 modes:
• VACUUM FULL
• Reclaims space and re-sorts
• VACUUM DELETE ONLY
• Reclaims space but does not re-sort
• VACUUM SORT ONLY
• Re-sorts but does not reclaim space
• VACUUM REINDEX
• Used for INTERLEAVED sort keys.
• Re-Analyzes sort keys and then runs FULL VACUUM
Vacuum
VACUUM is an I/O intensive operation and can take time to run.
To minimize the impact of VACUUM:
• Run VACUUM on a regular schedule
• Use TRUNCATE instead of DELETE where possible
• TRUNCATE or DROP test tables
• Perform a Deep Copy instead of VACUUM
• Load Data in sort order and remove need for VACUUM
Vacuum
A deep copy:
• Is an alternative to VACUUM.
• Will remove deleted rows and also re-sort the table
• Is more efficient than VACUUM
• You can’t make concurrent updates to the table
Deep copy options:
• Use original table DDL and run INSERT INTO…SELECT
• Best option - Retains all table attributes
• Use CREATE TABLE AS
• New table does not inherit encoding, distkey, sortkey, primary keys, or foreign keys.
• Use CREATE TABLE LIKE
• New table inherits all attributes except primary and foreign keys
• Use a TEMP table to COPY data out and back in again
• Retains all attributes but requires two full inserts of the table
Vacuum – Deep Copy
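The CREATE TABLE LIKE option can be sketched as a short sequence of statements. The helper below only builds the SQL strings; the `_copy` suffix and the function name are hypothetical, and as noted above the table cannot take concurrent updates while the copy runs.

```python
def deep_copy_statements(table):
    """Build the SQL for a deep copy via CREATE TABLE LIKE (names are illustrative)."""
    copy = f"{table}_copy"  # hypothetical name for the new table
    return [
        f"CREATE TABLE {copy} (LIKE {table});",
        f"INSERT INTO {copy} (SELECT * FROM {table});",
        f"DROP TABLE {table};",
        f"ALTER TABLE {copy} RENAME TO {table};",
    ]


statements = deep_copy_statements("orders")
```

The new table inherits its attributes from the original except primary and foreign keys, so any key definitions would need to be added back separately.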
Redshift’s query optimizer relies on up-to-date statistics
Update stats on sort/dist key columns after every load
Analyze
Analyze – AdminScripts: missing_table_stats.sql
Workload Management
Workload management is about creating queues for different workloads
User Group A
Short-running queue
Long-running queue
Short
Query Group
Long
Query Group
Workload Management
Workload Management
Don’t set concurrency to more than you need
set query_group to allqueries;
select avg(l.priceperticket*s.qtysold) from listing l, sales s where l.listid <40000;
reset query_group;
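One reason to keep concurrency low is that a queue's memory is shared by its slots. The sketch below assumes the memory is split evenly across slots, which the slides do not state; the 10,000 MB figure and the `memory_per_slot_mb` helper are hypothetical.

```python
def memory_per_slot_mb(queue_memory_mb, concurrency):
    """Memory each query slot receives when a queue's memory is split evenly."""
    return queue_memory_mb / concurrency


QUEUE_MEMORY_MB = 10_000  # hypothetical memory assigned to one queue

per_slot = {slots: memory_per_slot_mb(QUEUE_MEMORY_MB, slots) for slots in (5, 15, 50)}
```

Under this assumption, raising concurrency from 5 to 50 cuts each query's share from 2,000 MB to 200 MB, making spills to disk more likely.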
Resources
Sanjay Kotecha | kotechas@amazon.com
Detail Pages
• http://aws.amazon.com/redshift
• https://aws.amazon.com/marketplace/redshift/
Best Practices
• http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
• http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html
• http://docs.aws.amazon.com/redshift/latest/dg/c-optimizing-query-performance.html
Deep Dive Webinar Series in July
• Migration and Loading Data – July 22nd, 2015
• Reporting and Advanced Analytics – July 23rd, 2015
Thank you!

Pumps, Compressors and Turbine Fault Frequency Analysis
 
SQL Windowing
SQL WindowingSQL Windowing
SQL Windowing
 
Masterclass - Redshift
Masterclass - RedshiftMasterclass - Redshift
Masterclass - Redshift
 
Performance tuning ColumnStore
Performance tuning ColumnStorePerformance tuning ColumnStore
Performance tuning ColumnStore
 
Tablas y almacenamiento en windows azure
Tablas y almacenamiento en windows azureTablas y almacenamiento en windows azure
Tablas y almacenamiento en windows azure
 
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
 
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sectoritnewsafrica
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...itnewsafrica
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 

Recently uploaded (20)

A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 

AWS July Webinar Series: Amazon Redshift Optimizing Performance

  • 1. Best Practices: Amazon Redshift – Optimizing Performance
      Sanjay Kotecha, Solution Architect
      Eric Ferreira, Principal Database Engineer
      July 21, 2015
      © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. Amazon Redshift – Resources
      Getting Started – June Webinar Series: https://www.youtube.com/watch?v=biqBjWqJi-Q
      Best Practices – July Webinar Series:
        Optimizing Performance – July 21, 2015
        Migration and Data Loading – July 22, 2015
        Reporting and Advanced Analytics – July 23, 2015
  • 4. Amazon Redshift Architecture
      Leader node: SQL endpoint; stores metadata; coordinates query execution
      Compute nodes: local, columnar storage; execute queries in parallel; load, backup, restore via S3; parallel load from DynamoDB or SSH
      Hardware optimized for data processing: DS2 (HDD) scales from 2TB to 2PB; DC1 (SSD) scales from 160GB to 356TB
      [Diagram labels: JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore]
  • 5. Architecture – Nodes and Slices
      One slice per core
      DS2: 2 slices on XL, 16 on 8XL
      DC1: 2 slices on XL, 32 on 8XL
  • 6. Table Distribution Styles
      Distribution Key: same key to same location
      All: all data on every node
      Even: round-robin distribution
      [Diagram: each style is drawn on Node 1 (Slices 1–2) and Node 2 (Slices 3–4)]
  • 7. Data Distribution with Distribution Keys
      [Diagram: Node 1 (Slices 1–2) and Node 2 (Slices 3–4) holding sample rows from three tables]
      cloudfront: (uri=/games/g1.exe, user_id=1234, …), (uri=/imgs/ad1.png, user_id=2345, …), (uri=/games/g10.exe, user_id=4312, …), (uri=/img/ad_5.img, user_id=1234, …)
      user_profile: (user_id=1234, name=janet, …), (user_id=6789, name=fred, …), (user_id=2345, name=bill, …), (user_id=4312, name=fred, …)
      order_line: (order_line_id=25693, …)
  • 8. Data Distribution and Distribution Keys
      [Diagram: the same sample cloudfront, user_profile and order_line rows as slide 7, on Slices 1–4]
      Distribution keys determine which data resides on which slices.
      Records with the same distribution key for a table are on the same slice (here, the two cloudfront rows with user_id=1234: /games/g1.exe and /img/ad_5.img).
  • 9. Data Distribution and Distribution Keys
      [Diagram: the same sample rows, with cloudfront and user_profile rows that share a user_id placed together]
      Records with the same distribution key for a table are on the same slice.
      Records from other tables with the same distribution key value are also on the same slice.
      Distribution keys help with data locality for join evaluation.
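
The placement rules on slides 6 to 9 can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the CRC32 hash, the four-slice layout, and the names `slice_for_key` and `slices_round_robin` are made up for illustration and are not how Amazon Redshift actually hashes rows.

```python
import zlib

NUM_SLICES = 4  # assumed: two nodes with two slices each, as in the diagrams


def slice_for_key(dist_key_value, num_slices=NUM_SLICES):
    """KEY style: the same key value always maps to the same slice (1-based)."""
    # CRC32 is a stand-in hash chosen for this sketch only.
    return zlib.crc32(str(dist_key_value).encode("utf-8")) % num_slices + 1


def slices_round_robin(num_rows, num_slices=NUM_SLICES):
    """EVEN style: rows are dealt to slices in turn, whatever their values."""
    return [i % num_slices + 1 for i in range(num_rows)]


cloudfront_user_ids = [1234, 2345, 4312, 1234]    # sample rows from the slides
user_profile_user_ids = [1234, 6789, 2345, 4312]

cloudfront_slices = [slice_for_key(u) for u in cloudfront_user_ids]
profile_slices = {u: slice_for_key(u) for u in user_profile_user_ids}

for user_id, slice_no in zip(cloudfront_user_ids, cloudfront_slices):
    print(f"cloudfront user_id={user_id} -> slice {slice_no}; "
          f"user_profile row on slice {profile_slices[user_id]}")
print("EVEN placement of four rows:", slices_round_robin(4))
```

Because both tables are hashed on user_id, a join on user_id finds its matching rows on the same slice. Under the round-robin placement nothing ties a cloudfront row to its user_profile row.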
  • 10. Data Distribution - Comparison
      Example query (TPC-H dataset)
      Distribution type    Query time
      Key                  14 seconds
      Even                 39 seconds
      The query against the tables with a distribution key was 178% faster.
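
The slide does not say how the 178% figure was computed. One plausible reading, assumed here, is the extra time of the slower run expressed as a percentage of the faster run. The helper name `percent_faster` is hypothetical.

```python
def percent_faster(slow_seconds, fast_seconds):
    """Extra time taken by the slow run, as a percentage of the fast run."""
    return (slow_seconds - fast_seconds) / fast_seconds * 100


key_seconds = 14   # tables with a distribution key (from the slide)
even_seconds = 39  # tables with EVEN distribution (from the slide)

print(f"{percent_faster(even_seconds, key_seconds):.1f}% faster")
print(f"{even_seconds / key_seconds:.2f}x the elapsed time")
```

Under that reading the result is about 178.6%, which agrees with the slide once the fraction is dropped.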
  • 11. Data Distribution - Comparison
      [Figure: query plan for tables with distribution key]
      [Figure: query plan for tables without distribution key]
  • 15. Data Distribution and Distribution Keys
      [Diagram: cloudfront rows spread over Node 1 (Slices 1–2) and Node 2 (Slices 3–4)]
      Record counts shown on the four slices: 2M, 5M, 1M, 4M
      Poor key choices lead to uneven distribution of records…
  • 16. Data Distribution and Distribution Keys
      [Diagram: cloudfront rows spread over Node 1 (Slices 1–2) and Node 2 (Slices 3–4)]
      Record counts shown on the four slices: 2M, 5M, 1M, 4M
      Unevenly distributed data causes processing imbalances!
  • 17. Data Distribution and Distribution Keys
      [Diagram: cloudfront rows spread over Node 1 (Slices 1–2) and Node 2 (Slices 3–4)]
      Record counts shown on the four slices: 2M, 2M, 2M, 2M
      Evenly distributed data improves query performance.
      select * from v_check_data_distribution where tablename = 'lineitem';
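
A rough way to put a number on the imbalance in slides 15 to 17 is to compare the busiest slice with the average slice. The sketch below assumes slices work in parallel and that a query step finishes only when the slice holding the most rows finishes. `skew_ratio` is a hypothetical helper, not part of the admin scripts or of the v_check_data_distribution view.

```python
def skew_ratio(rows_per_slice):
    """Rows on the busiest slice divided by the mean rows per slice."""
    mean_rows = sum(rows_per_slice) / len(rows_per_slice)
    return max(rows_per_slice) / mean_rows


uneven = [2_000_000, 5_000_000, 1_000_000, 4_000_000]  # slides 15 and 16
even = [2_000_000, 2_000_000, 2_000_000, 2_000_000]    # slide 17

print(f"uneven skew: {skew_ratio(uneven):.2f}")
print(f"even skew:   {skew_ratio(even):.2f}")
```

A ratio of 1.0 means every slice holds the same share. In this simplified model, the higher the ratio, the longer the other slices sit idle waiting for the busiest one.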
  • 18. Data Distribution
      KEY: large fact tables; large dimension tables
      ALL: medium dimension tables (1K – 2M rows)
      EVEN: tables with no joins or group by; small dimension tables (<1000 rows)
  • 19. Tools – Admin Scripts: table_info.sql
  • 20. Sort Keys – Zone Maps
      SELECT COUNT(*) FROM LOGS WHERE DATE = '09-JUNE-2015'
      Unsorted table (block MIN to MAX):
        01-JUNE-2015 to 20-JUNE-2015   READ
        08-JUNE-2015 to 30-JUNE-2015   READ
        12-JUNE-2015 to 20-JUNE-2015
        02-JUNE-2015 to 25-JUNE-2015   READ
        06-JUNE-2015 to 12-JUNE-2015   READ
      Sorted by date (block MIN to MAX):
        01-JUNE-2015 to 06-JUNE-2015
        07-JUNE-2015 to 12-JUNE-2015   READ
        13-JUNE-2015 to 18-JUNE-2015
        19-JUNE-2015 to 24-JUNE-2015
        25-JUNE-2015 to 30-JUNE-2015
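
The block-skipping idea on slide 20 can be sketched as follows, using the MIN/MAX values from the slide. The helper names `june` and `blocks_to_read` are made up for this example. The model keeps only a (min, max) pair per block and assumes a block is read whenever the filter value falls inside that range.

```python
from datetime import date


def june(day):
    """Shorthand for a date in June 2015, the month used on the slide."""
    return date(2015, 6, day)


def blocks_to_read(zone_map, wanted):
    """Return the indexes of blocks whose [min, max] range can contain `wanted`."""
    return [i for i, (low, high) in enumerate(zone_map) if low <= wanted <= high]


unsorted_zone_map = [
    (june(1), june(20)),
    (june(8), june(30)),
    (june(12), june(20)),
    (june(2), june(25)),
    (june(6), june(12)),
]
sorted_zone_map = [
    (june(1), june(6)),
    (june(7), june(12)),
    (june(13), june(18)),
    (june(19), june(24)),
    (june(25), june(30)),
]

wanted = june(9)
print("unsorted table reads blocks:", blocks_to_read(unsorted_zone_map, wanted))
print("sorted table reads blocks:  ", blocks_to_read(sorted_zone_map, wanted))
```

With the slide's values, four of the five unsorted blocks may hold 09-JUNE-2015 and must be read, while the sorted table needs only one.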
  • 21. Sort Keys - How to choose
      Timestamp column
      Frequent range filtering or equality filtering on one column
      Join column:
        create table customer (
          c_custkey     int8          not null,
          c_name        varchar(25)   not null,
          c_address     varchar(40)   not null,
          c_nationkey   int4          not null,
          c_phone       char(15)      not null,
          c_acctbal     numeric(12,2) not null,
          c_mktsegment  char(10)      not null,
          c_comment     varchar(117)  not null
        )
        distkey(c_custkey)
        sortkey(c_custkey);
  • 23. Sort Keys – Single Column
      Table is sorted by 1 column [ SORTKEY ( date ) ]
      Best for: queries that use the 1st column (i.e. date) as primary filter
      Can speed up joins and group bys
      Quickest to VACUUM
        Date         Region    Country
        2-JUN-2015   Oceania   New Zealand
        2-JUN-2015   Asia      Singapore
        2-JUN-2015   Africa    Zaire
        2-JUN-2015   Asia      Hong Kong
        3-JUN-2015   Europe    Germany
        3-JUN-2015   Asia      Korea
  • 24. Sort Keys – Compound
      Table is sorted by the 1st column, then the 2nd column, etc. [ COMPOUND SORTKEY ( date, region, country ) ]
      Best for: queries that use the 1st column as primary filter, then other columns
      Can speed up joins and group bys
      Slower to VACUUM
        Date         Region    Country
        2-JUN-2015   Oceania   New Zealand
        2-JUN-2015   Asia      Singapore
        2-JUN-2015   Africa    Zaire
        2-JUN-2015   Asia      Hong Kong
        3-JUN-2015   Europe    Germany
        3-JUN-2015   Asia      Korea
  • 25. Sort Keys – Interleaved
      Equal weight is given to each column. [ INTERLEAVED SORTKEY ( date, region, country ) ]
      Best for: queries that use different columns in the filter
      Queries get faster the more sort key columns are used in the filter (up to 8)
      Slowest to VACUUM
        Date         Region    Country
        2-JUN-2015   Oceania   New Zealand
        2-JUN-2015   Asia      Singapore
        2-JUN-2015   Africa    Zaire
        2-JUN-2015   Asia      Hong Kong
        3-JUN-2015   Europe    Germany
        3-JUN-2015   Asia      Korea
  • 26. Sort Keys – Comparing Styles
      Single:
        create table cust_sales_dt_single
          sortkey (c_custkey)
          as select * from cust_sales_date;
      Compound:
        create table cust_sales_dt_compound
          compound sortkey (c_custkey, c_region, c_mktsegment, d_date)
          as select * from cust_sales_date;
      Interleaved:
        create table cust_sales_dt_interleaved
          interleaved sortkey (c_custkey, c_region, c_mktsegment, d_date)
          as select * from cust_sales_date;
  • 27. Sort Keys – Comparing Styles
      Each query is run against the three tables created on slide 26 (cust_sales_dt_single, cust_sales_dt_compound, cust_sales_dt_interleaved).
      Query 1:
        select max(lo_revenue), min(lo_revenue) from cust_sales_dt_single where c_custkey < 100000;
        select max(lo_revenue), min(lo_revenue) from cust_sales_dt_compound where c_custkey < 100000;
        select max(lo_revenue), min(lo_revenue) from cust_sales_dt_interleaved where c_custkey < 100000;
      Query 2:
        select max(lo_revenue), min(lo_revenue) from cust_sales_dt_single where c_region = 'ASIA' and c_mktsegment = 'FURNITURE';
        select max(lo_revenue), min(lo_revenue) from cust_sales_dt_compound where c_region = 'ASIA' and c_mktsegment = 'FURNITURE';
        select max(lo_revenue), min(lo_revenue) from cust_sales_dt_interleaved where c_region = 'ASIA' and c_mktsegment = 'FURNITURE';
      Query 3:
        select max(lo_revenue), min(lo_revenue) from cust_sales_dt_single where d_date between '01/01/1996' and '01/14/1996' and c_mktsegment = 'FURNITURE' and c_region = 'ASIA';
        select max(lo_revenue), min(lo_revenue) from cust_sales_dt_compound where d_date between '01/01/1996' and '01/14/1996' and c_mktsegment = 'FURNITURE' and c_region = 'ASIA';
        select max(lo_revenue), min(lo_revenue) from cust_sales_dt_interleaved where d_date between '01/01/1996' and '01/14/1996' and c_mktsegment = 'FURNITURE' and c_region = 'ASIA';
  • 28. Sort Keys – Comparing Styles
        Sort Style    Query 1         Query 2          Query 3
        Single        0.25 seconds    18.37 seconds    30.04 seconds
        Compound      0.27 seconds    18.24 seconds    30.14 seconds
        Interleaved   0.94 seconds    1.46 seconds     0.80 seconds
  • 29. Sort Keys – Interleaved Considerations • Increased load and vacuum times • More effective with large tables (> 100M rows) • Use a compound sort key when appending data in order
  • 30. Tools – Admin Scripts: table_info.sql
  • 31. Compression - Encodings • Raw encoding (RAW) • Byte-dictionary (BYTEDICT) • Delta encoding (DELTA / DELTA32K) • Mostly encoding (MOSTLY8 / MOSTLY16 / MOSTLY32) • Runlength encoding (RUNLENGTH) • Text encoding (TEXT255 / TEXT32K) • LZO encoding (LZO) • Average: 2-4x
  • 32. Compression • COPY samples data automatically when loading into an empty table • Samples up to 100,000 rows and picks the optimal encoding • If you use temp tables or staging tables: • Turn off automatic compression • Use ANALYZE COMPRESSION to determine the right encodings • Bake those encodings into your DDL
    COPY <tablename> FROM 's3://<bucket-name>/<object-prefix>'
    CREDENTIALS 'aws_access_key_id=<AWS_ACCESS_KEY>;aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>'
    DELIMITER ',' COMPUPDATE OFF MANIFEST;
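The staging-table advice above can be sketched roughly as follows. The table names `orders_staging` and `orders_staging_encoded`, the column subset, and the encodings shown are all illustrative assumptions; in practice the encodings would come from the ANALYZE COMPRESSION report for your own data.

```sql
-- Ask Redshift to sample existing rows and suggest an encoding per column.
-- orders_staging is a hypothetical table that already holds representative data.
ANALYZE COMPRESSION orders_staging;

-- Write the suggested encodings into the DDL so that later loads can run
-- with COMPUPDATE OFF. The encodings below are placeholders, not
-- recommendations; substitute the ones reported by the statement above.
CREATE TABLE orders_staging_encoded (
    orderkey    int8          ENCODE delta    NOT NULL,
    orderstatus char(1)       ENCODE bytedict NOT NULL,
    totalprice  numeric(12,2) ENCODE lzo      NOT NULL,
    orderdate   date          ENCODE delta32k NOT NULL,
    comment     varchar(79)   ENCODE lzo      NOT NULL
);
```

With the encodings fixed in the table definition, COPY does not need to re-sample the data on each load.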
  • 33. Compression - Comparison [Figure: comparison of tables with compression encodings and with no compression encodings]
  • 34. Compression - Comparison • Example query (TPC-H dataset) • Compressed: 14 seconds • Uncompressed: 37 seconds • Query against the tables with compression was 164% faster
  • 35. Compression – Sort Keys • Zone maps store min/max per block • Once we know which block(s) contain the range, we know which row offsets to scan • Highly compressed sort keys mean many rows per block • You’ll scan more data blocks than you need • If your sort keys compress significantly more than your data columns, you may want to skip compression on the sort key columns
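One way to act on the last point is to leave the sort key column uncompressed while still encoding the other columns. This is a minimal sketch, assuming a hypothetical table `orders_raw_sortkey`; the encodings on the non-sort-key columns are placeholders.

```sql
-- Hypothetical variant of an orders table: the sort key column stays RAW
-- (uncompressed) so each block covers fewer rows, while the remaining
-- columns are compressed.
CREATE TABLE orders_raw_sortkey (
    orderkey   int8          ENCODE delta DISTKEY NOT NULL,
    orderdate  date          ENCODE raw   SORTKEY NOT NULL,
    totalprice numeric(12,2) ENCODE lzo           NOT NULL,
    comment    varchar(79)   ENCODE lzo           NOT NULL
);
```

Whether this helps depends on how much more the sort key compresses than the data columns, so it is worth comparing query times on your own workload before adopting it.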
  • 36. Tools – Admin Scripts: table_info.sql
  • 37. CREATE TABLE orders ( orderkey int8 NOT NULL DISTKEY, custkey int8 NOT NULL, orderstatus char(1) NOT NULL , totalprice numeric(12,2) NOT NULL , orderdate date NOT NULL SORTKEY , orderpriority char(15) NOT NULL, clerk char(15) NOT NULL , shippriority int4 NOT NULL, comment varchar(79) NOT NULL ); DDL
  • 38. DDL – Make Columns as narrow as possible • During queries and ingestion, the system allocates buffers based on column width • Wider-than-needed columns waste memory • Fewer rows fit into memory, which increases the likelihood of queries spilling to disk
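A quick check for over-wide character columns is to compare the declared width with the longest value actually stored. The sketch below assumes the `orders` table from slide 37 is on the current search_path (PG_TABLE_DEF only lists tables in schemas on the search path).

```sql
-- Longest value actually stored in the column, in bytes
-- (VARCHAR widths in Redshift are declared in bytes).
SELECT max(octet_length(comment)) AS max_comment_bytes
FROM orders;

-- Declared column types and widths, for comparison.
SELECT "column", type
FROM pg_table_def
WHERE tablename = 'orders';
```

If the stored maximum is far below the declared width, the column is a candidate for narrowing, with some headroom left for future data.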
  • 39. DDL • Define primary and foreign keys • They are not enforced, but they help the optimizer choose a query plan
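As a rough illustration, the constraints could be declared like this. The `customer` table and its `custkey` column are assumptions (only `orders` appears in the slides), and the key columns are assumed to be NOT NULL.

```sql
-- Informational constraints: Redshift does not enforce them, but the
-- planner can use them. customer is a hypothetical dimension table.
ALTER TABLE customer ADD PRIMARY KEY (custkey);
ALTER TABLE orders   ADD PRIMARY KEY (orderkey);
ALTER TABLE orders   ADD FOREIGN KEY (custkey) REFERENCES customer (custkey);
```

Because the constraints are not enforced, they should only be declared when the loading process guarantees the data really is unique and referentially consistent; otherwise queries that rely on them may return incorrect results.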
  • 40. Loading – Use multiple input files to maximize throughput • Use the COPY command • Each slice can load one file at a time • A single input file means only one slice is ingesting data • Instead of 100MB/s, you’re only getting 6.25MB/s (one of 16 slices working)
  • 41. Loading – Use multiple input files to maximize throughput • Use the COPY command • You need at least as many input files as you have slices • With 16 input files, all slices are working so you maximize throughput • Get 100MB/s per node; scale linearly as you add nodes
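A minimal sketch of a multi-file load follows. It assumes the input has already been split into gzip-compressed, pipe-delimited files whose S3 keys share the hypothetical prefix `orders/part-`; the bucket name and credentials are placeholders, as in the slides.

```sql
-- Number of slices in the cluster; aim for a file count that is a
-- multiple of this so that every slice has work to do.
SELECT count(*) AS slice_count FROM stv_slices;

-- COPY loads every object whose key begins with the given prefix,
-- spreading the files across slices in parallel.
COPY orders
FROM 's3://<bucket-name>/orders/part-'
CREDENTIALS 'aws_access_key_id=<AWS_ACCESS_KEY>;aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>'
DELIMITER '|'
GZIP;
```

Files of roughly equal size keep the slices finishing at about the same time.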
  • 42. Tools – Use the AdminScripts
  • 43. VACUUM reclaims space and re-sorts tables VACUUM can be run in 4 modes: • VACUUM FULL • Reclaims space and re-sorts • VACUUM DELETE ONLY • Reclaims space but does not re-sort • VACUUM SORT ONLY • Re-sorts but does not reclaim space • VACUUM REINDEX • Used for INTERLEAVED sort keys. • Re-Analyzes sort keys and then runs FULL VACUUM Vacuum
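The four modes map onto statements like the ones below. The `orders` table comes from slide 37; `sales_interleaved` is a hypothetical table with an interleaved sort key.

```sql
-- Reclaim space and re-sort.
VACUUM FULL orders;

-- Reclaim space from deleted rows without re-sorting.
VACUUM DELETE ONLY orders;

-- Re-sort without reclaiming space.
VACUUM SORT ONLY orders;

-- For interleaved sort keys: check how skewed the key distribution has
-- become before deciding whether a reindex is worth its cost.
SELECT tbl AS tbl_id, stv_tbl_perm.name AS table_name,
       col, interleaved_skew, last_reindex
FROM svv_interleaved_columns, stv_tbl_perm
WHERE svv_interleaved_columns.tbl = stv_tbl_perm.id
  AND interleaved_skew IS NOT NULL;

-- Re-analyze the interleaved sort key distribution, then run a full vacuum.
VACUUM REINDEX sales_interleaved;
```

VACUUM cannot run inside a transaction block, so each statement is issued on its own.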
  • 44. VACUUM is an I/O intensive operation and can take time to run. To minimize the impact of VACUUM: • Run VACUUM on a regular schedule • Use TRUNCATE instead of DELETE where possible • TRUNCATE or DROP test tables • Perform a Deep Copy instead of VACUUM • Load Data in sort order and remove need for VACUUM Vacuum
  • 45. Vacuum – Deep Copy • A deep copy is an alternative to VACUUM • It removes deleted rows and also re-sorts the table • It is more efficient than VACUUM • You can’t make concurrent updates to the table while it runs • Deep copy options: • Use the original table DDL and run INSERT INTO…SELECT • Best option: retains all table attributes • Use CREATE TABLE AS • New table does not inherit encoding, distkey, sortkey, primary keys, or foreign keys • Use CREATE TABLE LIKE • New table inherits all attributes except primary and foreign keys • Use a TEMP table to COPY data out and back in again • Retains all attributes but requires two full inserts of the table
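A deep copy using the CREATE TABLE LIKE option might look like the sketch below; `orders_copy` is a hypothetical name. To follow the "best option" instead, replace the first statement with the original CREATE TABLE DDL under the new name.

```sql
-- Create an empty copy that inherits column encodings, distkey and sortkey
-- (but not primary or foreign keys).
CREATE TABLE orders_copy (LIKE orders);

-- Rewrite all live rows into the copy; deleted rows are left behind and
-- the data lands sorted.
INSERT INTO orders_copy (SELECT * FROM orders);

-- Swap the copy in for the original.
DROP TABLE orders;
ALTER TABLE orders_copy RENAME TO orders;
```

The table should not receive updates while this runs, and the cluster needs enough free space to hold both copies until the original is dropped.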
  • 46. Analyze • Redshift’s query optimizer relies on up-to-date statistics • Update stats on sort/dist key columns after every load
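For the `orders` table from slide 37, where `orderkey` is the distribution key and `orderdate` is the sort key, a post-load statistics refresh could be sketched as:

```sql
-- After each load: refresh statistics on the dist key and sort key columns only.
ANALYZE orders (orderkey, orderdate);

-- Periodically: refresh statistics for every column in the table.
ANALYZE orders;

-- Optional check: stats_off indicates how stale a table's statistics are
-- (0 is current, 100 is fully out of date).
SELECT "table", stats_off
FROM svv_table_info
ORDER BY stats_off DESC;
```

Restricting the routine ANALYZE to key columns keeps it cheap enough to run after every load.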
  • 47. Analyze – AdminScripts: missing_table_stats.sql
  • 48. Workload Management • Workload management is about creating queues for different workloads • [Diagram labels: User Group A, Short Query Group, Long Query Group; Short-running queue, Long-running queue]
  • 50. Workload Management • Don’t set concurrency to more than you need
    set query_group to allqueries;
    select avg(l.priceperticket*s.qtysold) from listing l, sales s where l.listid <40000;
    reset query_group;
  • 51. Resources Sanjay Kotecha | kotechas@amazon.com Detail Pages • http://aws.amazon.com/redshift • https://aws.amazon.com/marketplace/redshift/ Best Practices • http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html • http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html • http://docs.aws.amazon.com/redshift/latest/dg/c-optimizing-query-performance.html Deep Dive Webinar Series in July • Migration and Loading Data – July 22nd, 2015 • Reporting and Advanced Analytics – July 23rd, 2015