SlideShare a Scribd company logo
1 of 54
Download to read offline
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Eric Ferreira, AWS
Ari Miller, TripAdvisor
October 2015
BDT401
Amazon Redshift Deep Dive
Tuning and Best Practices
What to Expect from the Session
Architecture Review
Ingestion
• COPY
• PKs and Manifest
• Data Hygiene
• Auto Compression /
Sort key Compression
Recent Features
• New Functions
• UDFs
• Interleaved Sort keys
Migration Tips
Workload Tuning
• Workload
• WLM
• Console
Sharing Amazon Redshift
• Custom Computation
Clusters
• Infra/Automation
• Monitoring
• Views
• Queue Management
Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/year
Amazon Redshift
Amazon Redshift Delivers Performance
“Redshift is twenty times faster than Hive” (5x – 20x reduction in query times) link
“Queries that used to take hours came back in seconds. Our analysts are orders of magnitude
more productive.” (20x – 40x reduction in query times) link
…[Redshift] performance has blown away everyone here (we generally see 50-100x speedup
over Hive). link
“Team played with Redshift today and concluded it is ****** awesome. Un-indexed complex
queries returning in < 10s.”
“Did I mention it's ridiculously fast? We'll be using it immediately to provide our analysts
an alternative to Hadoop.”
“We saw…2x improvement in query times”
Channel We regularly process multibillion row datasets and we do that in a matter of hours. link
Amazon Redshift System Architecture
10 GigE
(HPC)
Ingestion
Backup
Restore
JDBC/ODBC
A Deeper Look at Compute Node Architecture
Leader Node
Dense Compute Nodes
Large
• 2 slices/cores
• 15GB RAM
• 160GB SSD
8XL
• 32 slices/cores
• 244 GB RAM
• 2.56TB SSD
Dense Storage Nodes
X-Large
• 2 slices/ 4 cores
• 31GB RAM
• 2TB HDD
8XL
• 16 slices/ 36 cores
• 244 GB RAM
• 16TB HDD
Ingestion
Use Multiple Input Files to Maximize Throughput
COPY command
Each slice loads one file at a time
A single input file means
only one slice is ingesting data
Instead of 100MB/s,
you’re only getting 6.25MB/s
Use Multiple Input Files to Maximize Throughput
COPY command
You need at least as many input
files as you have slices
With 16 input files, all slices are
working so you maximize
throughput
Get 100 MB/s per node; scale
linearly as you add nodes
Primary Keys and Manifest Files
Amazon Redshift doesn’t enforce primary key constraints
• If you load data multiple times, Amazon Redshift won’t complain
• If you declare primary keys in your DML, the optimizer will
expect the data to be unique
Use manifest files to control exactly what is loaded and
how to respond if input files are missing
• Define a JSON manifest on Amazon S3
• Ensures that the cluster loads exactly what you want
Data Hygiene
Analyze tables regularly
• Every single load for popular columns
• Weekly for all columns
• Look SVV_TABLE_INFO(stats_off) for stale stats
• Look stl_alert_event_log for missing stats
Vacuum tables regularly
• Weekly is a good target
• Number of unsorted blocks as trigger
• Look SVV_TABLE_INFO(unsorted, empty)
• Deep copy might be faster for high percent unsorted
(20% unsorted usually is faster to deep copy)
Automatic Compression is a Good Thing (Mostly)
Better performance, lower costs
Samples data automatically when COPY into an empty table
• Samples up to 100,000 rows and picks optimal encoding
Regular ETL process using temp or staging tables:
Turn off automatic compression
• Use analyze compression to determine the right encodings
• Bake those encodings into your DML
Be Careful When Compressing Your Sort Keys
Zone maps store min/max per block
After we know which block(s) contain the range,
we know which row offsets to scan
Highly compressed sort keys means many rows
per block
You’ll scan more data blocks than you need
If your sort keys compress significantly more
than your data columns, you may want to skip
compression of sortkey column(s)
Check SVV_TABLE_INFO(skew_sortkey1)
COL1 COL2
Keep Your Columns as Narrow as Possible
• Buffers allocated based on declared
column width
• Wider than needed columns mean
memory is wasted
• Fewer rows fit into memory; increased
likelihood of queries spilling to disk
• Check
SVV_TABLE_INFO(max_varchar)
Recent Features
New SQL Functions
We add SQL functions regularly to expand Amazon Redshift’s query capabilities
Added 25+ window and aggregate functions since launch, including:
• LISTAGG
• [APPROXIMATE] COUNT
• DROP IF EXISTS, CREATE IF NOT EXISTS
• REGEXP_SUBSTR, _COUNT, _INSTR, _REPLACE
• PERCENTILE_CONT, _DISC, MEDIAN
• PERCENT_RANK, RATIO_TO_REPORT
We’ll continue iterating but also want to enable you to write your own
Scalar User Defined Functions
You can write UDFs using Python 2.7
• Syntax is largely identical to PostgreSQL UDF syntax
• System and network calls within UDFs are prohibited
Comes with Pandas, NumPy, and SciPy pre-installed
• You’ll also be able import your own libraries for even more
flexibility
Scalar UDF Example
CREATE FUNCTION f_hostname (VARCHAR url)
RETURNS varchar
IMMUTABLE AS $$
import urlparse
return urlparse.urlparse(url).hostname
$$ LANGUAGE plpythonu;
Scalar UDF Examples from Partners
http://www.looker.com/blog/amazon-redshift-
user-defined-functions
https://www.periscope.io/blog/redshift-user-
defined-functions-python.html
1-click deployment to launch, on
multiple regions around the world
Pay-as-you-go pricing with no long
term contracts required
Advanced Analytics Business IntelligenceData Integration
Interleaved Sort Keys
Compound Sort Keys
Records in Amazon
Redshift are stored in
blocks.
For this illustration, let’s
assume that four
records fill a block
Records with a given
cust_id are all in one
block
However, records with a
given prod_id are
spread across four
blocks
1
1
1
1
2
3
4
1
4
4
4
2
3
4
4
1
3
3
3
2
3
4
3
1
2
2
2
2
3
4
2
1
1 [1,1] [1,2] [1,3] [1,4]
2 [2,1] [2,2] [2,3] [2,4]
3 [3,1] [3,2] [3,3] [3,4]
4 [4,1] [4,2] [4,3] [4,4]
1 2 3 4
prod_id
cust_id
cust_id prod_id other columns blocks
Select sum(amt)
From big_tab
Where cust_id = (1234);
Select sum(amt)
From big_tab
Where prod_id = (5678);
1 [1,1] [1,2] [1,3] [1,4]
2 [2,1] [2,2] [2,3] [2,4]
3 [3,1] [3,2] [3,3] [3,4]
4 [4,1] [4,2] [4,3] [4,4]
1 2 3 4
prod_id
cust_id
Interleaved Sort Keys
Records with a given
cust_id are spread
across two blocks
Records with a given
prod_id are also spread
across two blocks
Data is sorted in equal
measures for both keys
1
1
2
2
2
1
2
3
3
4
4
4
3
4
3
1
3
4
4
2
1
2
3
3
1
2
2
4
3
4
1
1
cust_id prod_id other columns blocks
Usage
New keyword ‘INTERLEAVED’ when defining sort keys
• Existing syntax will still work and behavior is unchanged
• You can choose up to 8 columns to include and can query with any or all of them
No change needed to queries
We’re just getting started with this feature
• Benefits are significant; load penalty is higher than we’d like and we’ll fix that quickly
• Check SVV_INTERLEAVED_COLUMNS(interleaved_skew) to decide when to VACUUM REINDEX
• A value greater than 5 will indicate the need to VACUUM REINDEX
[[ COMPOUND | INTERLEAVED ] SORTKEY ( column_name [, ...] ) ]
Migrating Existing Workloads
Forklift = BAD
Typical ETL/ELT on Legacy DW
• One file per table, maybe a few if too big
• Many updates (“massage” the data)
• Every job clears the data, then load
• Count on PK to block double loads
• High concurrency of load jobs
• Small table(s) to control the job stream
Two Questions to Ask
Why you do what you do?
• Many times, they don’t even know
What is the customer need?
• Many times, needs do not match current practice
• You might benefit from adding other AWS services
On Amazon Redshift
Updates are delete + insert of the row
• Deletes just mark rows for deletion
Blocks are immutable
• Minimum space used is one block per column, per slice
Commits are expensive
• 4 GB write on 8XL per node
• Mirrors WHOLE dictionary
• Cluster-wide serialized
On Amazon Redshift
• Not all aggregations created equal
• Pre-aggregation can help
• Order on group by matters
• Concurrency should be low for better throughput
• Caching layer for dashboards is recommended
• WLM parcels RAM to queries. Use multiple queues for
better control.
Resultset Metadata
Using SQL or JDBC, you have access to column names.
ResultSet rs = stmt.executeQuery("SELECT * FROM emp");
ResultSetMetaData rsmd = rs.getMetaData();
String name = rsmd.getColumnName(1);
Unload does not provide columns names (yet).
• Use SELECT top 0…
• Instead of adding 0=1 to your WHERE clause
Open Source Tools
https://github.com/awslabs/amazon-redshift-utils
Admin Scripts
• Collection of utilities for running diagnostics on your cluster.
Admin Views
• Collection of utilities for managing your cluster, generating schema DDL, etc.
Column Encoding Utility
• Gives you the ability to apply optimal column encoding to an established schema with data already loaded.
Analyze and Vacuum Utility
• Gives you the ability to automate VACUUM and ANALYZE operations.
Unload and Copy Utility
• Helps you to migrate data between Amazon Redshift clusters or databases.
Tuning Your Workload
top_queries.sql
perf_alerts.sql
Workload Management (WLM)
Concurrency and memory can now be changed dynamically.
You can have distinct values for load time and query time.
Use wlm_apex_hourly.sql to monitor “queue pressure”
Using the Console for Query Tuning
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ari Miller, TripAdvisor
October 2015
Sharing Amazon Redshift
Amazon Redshift performance
as a shared resource
The Shared Resource Problem
TripAdvisor
• 375 million unique monthly visitors
• 8 ds2.8xlarge cluster
• Largest table 12 TB (1 month of data)
The Shared Resource Problem
• 460 engineers, data scientists, analysts, and product
managers with personal logins to Amazon Redshift.
• > 2500 tables
• If you don't build it, they will still come
• Company motto is "speed wins" -- no tolerance for delayed
analysis
• 293 million row table imported using INSERT statements
The Solution
Reduce contention (custom clusters)
Infrastructure/automation
Monitoring (tables/performance/usage)
Own all common processing
Custom Computation Clusters
stay out of the way
Custom Computation Clusters
Spin up specialized ephemeral clusters. Guidelines:
• Mighty mouse cluster (32 dc1.large nodes) -- great bang
for the buck ($8/hr)
• 4 hour job on 8 d1s.8xlarge -> 40 minutes on mighty
mouse
• Much smaller wins vs d2s
• Customize the configuration:
• 2 slots only, to maximize memory
• Rely on Amazon Redshift to parallelize
• Primary limitation is on the size of the copy
Custom Computation Clusters
Ephemeral Infrastructure (Java)
• sync_redshift_objects.sh -- create schema/tables into a
new cluster from an existing cluster
• unload.sh
• manifest file
• automated checks for Amazon S3 consistency
• copy.sh
• Parameters: schema, table
• mirrors unload
Infrastructure/Automation
Azkaban – batch workflow + UI (LinkedIn)
Azkaban calls data transfer infrastructure, SQL, etc.
Infrastructure/Automation
Custom UI for non engineers -- point and click.
• Import from file or Hive
• Auto-analyze source data, suggest table ddl (sort,
distribution, compression), schedule
Monitoring
• Long running query alerts
• Accounts created through a UI that maps to our LDAP
• Every account mapped to an email address
• Daily and peak usage reports, by user and type of
activity
• Daily table size, space taken up by deleted rows by table
report – deep_copy.sh instead of vacuum
• Initial monitoring cost can be minimal because of psql
compatibility
• psql --quiet --html --table-attr cellspacing='0' --file
$QUERY_FILE --output $RESULT_FILE
Monitoring
Tableau helps
Shared Development Cluster
• Calisi syntax allows for parallel development on a single
shared development cluster
• Each developer has their own personal schema
• Any workflow can be run against their personal schema
if the SQL is marked up in this way for substitution.
/*{*/sherpa/*}*/ -> jbezos
/*{*/sherpa/*}*/ -> wvogels
Shared Development Cluster
String.format("create schema %s %s", targetSchema,
authorizationSchema));
List<Table> tables =
syncRedshiftObjects.loadTablesForSchema(originalSche
ma);
for (Table table : tables)
{
String tableName = table.getTable();
String ddl =
table.getCreateTableStatement().replace(table.getSch
ema() + ".", targetSchema + ".");
View Performance
• Create tables by month, with views spanning the
months.
• Use Calisi syntax to surround SQL using the view:
/*{*/rio_flookback/*}*/ -> rio_flookback_aug
• Syntax is completely legitimate SQL against a view --
autocomplete columns in IDE, run.
• When run through a framework, can substitute
automatically to run against the underlying monthly table
where the query doesn't span multiple months.
Queue Management
• Keep it simple, the memory is reserved
• Mission critical
• DBA
• Quick
• Default
• Include a default timeout
• Groups allow you to dynamically switch default queues
• Tableau
• Can’t run set query group
• Use group membership to switch queues
Wrap Up
• Amazon Redshift has transformed analytics at
TripAdvisor
• Thousands of queries a day, hundreds of users
• Tableau dashboards built on Amazon Redshift
• Iterative real time exploration, hundreds of times faster
• Open LDAP access for any employee
• Open sourcing these solutions
• Feb 2016
• Find the link here: http://engineering.tripadvisor.com/
Related Sessions
Hear from other customers discussing their Amazon Redshift use cases:
• DAT201—Introduction to Amazon Redshift (RetailMeNot)
• ISM303—Migrating Your Enterprise Data Warehouse to Amazon Redshift (Boingo Wireless
and Edmunds)
• ARC303—Pure Play Video OTT: A Microservices Architecture in the Cloud (Verizon)
• ARC305—Self-Service Cloud Services: How J&J Is Managing AWS at Scale for Enterprise
Workloads
• BDT306—The Life of a Click: How Hearst Publishing Manages Clickstream Analytics with
AWS
• DAT308 - How Yahoo! Analyzes Billions of Events a Day on Amazon Redshift
• DAT311—Large-Scale Genomic Analysis with Amazon Redshift (Human Longevity)
• BDT314—Running a Big Data and Analytics Application on Amazon EMR and Amazon
Redshift with a Focus on Security (Nasdaq)
• BDT316—Offloading ETL to Amazon Elastic MapReduce (Amgen)
• Building a Mobile App using Amazon EC2, Amazon S3, Amazon DynamoDB, and Amazon
Redshift (Tinder)
Remember to complete
your evaluations!
Thank you!

More Related Content

What's hot

Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...RTTS
 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systemsDave Gardner
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Kevin Weil
 
RDS Postgres and Aurora Postgres | AWS Public Sector Summit 2017
RDS Postgres and Aurora Postgres | AWS Public Sector Summit 2017RDS Postgres and Aurora Postgres | AWS Public Sector Summit 2017
RDS Postgres and Aurora Postgres | AWS Public Sector Summit 2017Amazon Web Services
 
AWS 9월 웨비나 | Amazon Aurora Deep Dive
AWS 9월 웨비나 | Amazon Aurora Deep DiveAWS 9월 웨비나 | Amazon Aurora Deep Dive
AWS 9월 웨비나 | Amazon Aurora Deep DiveAmazon Web Services Korea
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and DeltaDatabricks
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkDongwon Kim
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginnersNeil Baker
 
How to Avoid Common Mistakes When Using Reactor Netty
How to Avoid Common Mistakes When Using Reactor NettyHow to Avoid Common Mistakes When Using Reactor Netty
How to Avoid Common Mistakes When Using Reactor NettyVMware Tanzu
 
AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!Chris Taylor
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 

What's hot (20)

Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systems
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)
 
Introduction to Amazon Aurora
Introduction to Amazon AuroraIntroduction to Amazon Aurora
Introduction to Amazon Aurora
 
RDS Postgres and Aurora Postgres | AWS Public Sector Summit 2017
RDS Postgres and Aurora Postgres | AWS Public Sector Summit 2017RDS Postgres and Aurora Postgres | AWS Public Sector Summit 2017
RDS Postgres and Aurora Postgres | AWS Public Sector Summit 2017
 
AWS 9월 웨비나 | Amazon Aurora Deep Dive
AWS 9월 웨비나 | Amazon Aurora Deep DiveAWS 9월 웨비나 | Amazon Aurora Deep Dive
AWS 9월 웨비나 | Amazon Aurora Deep Dive
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
 
Introduction to Amazon DynamoDB
Introduction to Amazon DynamoDBIntroduction to Amazon DynamoDB
Introduction to Amazon DynamoDB
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginners
 
How to Avoid Common Mistakes When Using Reactor Netty
How to Avoid Common Mistakes When Using Reactor NettyHow to Avoid Common Mistakes When Using Reactor Netty
How to Avoid Common Mistakes When Using Reactor Netty
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 

Viewers also liked

Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauSam Palani
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...
AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...
AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...Amazon Web Services
 
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...Amazon Web Services
 
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)Amazon Web Services
 
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesMigrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesAmazon Web Services
 
AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...
AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...
AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...Amazon Web Services
 
AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)
AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)
AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)Amazon Web Services
 
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon GlacierAWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon GlacierAmazon Web Services
 
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...Amazon Web Services
 
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...Amazon Web Services
 
(STG202) AWS Import/Export Snowball: Large-Scale Data Ingest into AWS
(STG202) AWS Import/Export Snowball: Large-Scale Data Ingest into AWS(STG202) AWS Import/Export Snowball: Large-Scale Data Ingest into AWS
(STG202) AWS Import/Export Snowball: Large-Scale Data Ingest into AWSAmazon Web Services
 
Amazon EC2 Systems Manager for Hybrid Cloud Management at Scale
Amazon EC2 Systems Manager for Hybrid Cloud Management at ScaleAmazon EC2 Systems Manager for Hybrid Cloud Management at Scale
Amazon EC2 Systems Manager for Hybrid Cloud Management at ScaleAmazon Web Services
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and consSaniya Khalsa
 
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...Amazon Web Services
 
Understanding AWS Storage Options
Understanding AWS Storage OptionsUnderstanding AWS Storage Options
Understanding AWS Storage OptionsAmazon Web Services
 
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)Amazon Web Services
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon RedshiftAmazon Web Services
 

Viewers also liked (20)

Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...
AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...
AWS Snowball: Accelerating Large-Scale Data Ingest Into the AWS Cloud | AWS P...
 
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
 
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)
 
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesMigrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
 
Introduction to Amazon DynamoDB
Introduction to Amazon DynamoDBIntroduction to Amazon DynamoDB
Introduction to Amazon DynamoDB
 
AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...
AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...
AWS Data Transfer Services - AWS Gateway, AWS Snowball, AWS Snowball Edge, an...
 
AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)
AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)
AWS re:Invent 2016: Deep Dive on Amazon Elastic File System (STG202)
 
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon GlacierAWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
 
Intro to AWS: Storage Services
Intro to AWS: Storage ServicesIntro to AWS: Storage Services
Intro to AWS: Storage Services
 
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
 
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
 
(STG202) AWS Import/Export Snowball: Large-Scale Data Ingest into AWS
(STG202) AWS Import/Export Snowball: Large-Scale Data Ingest into AWS(STG202) AWS Import/Export Snowball: Large-Scale Data Ingest into AWS
(STG202) AWS Import/Export Snowball: Large-Scale Data Ingest into AWS
 
Amazon EC2 Systems Manager for Hybrid Cloud Management at Scale
Amazon EC2 Systems Manager for Hybrid Cloud Management at ScaleAmazon EC2 Systems Manager for Hybrid Cloud Management at Scale
Amazon EC2 Systems Manager for Hybrid Cloud Management at Scale
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and cons
 
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
 
Understanding AWS Storage Options
Understanding AWS Storage OptionsUnderstanding AWS Storage Options
Understanding AWS Storage Options
 
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 

Similar to (BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices

Melhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftMelhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftAmazon Web Services LATAM
 
How to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftHow to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftAWS Germany
 
London Redshift Meetup - July 2017
London Redshift Meetup - July 2017London Redshift Meetup - July 2017
London Redshift Meetup - July 2017Pratim Das
 
AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722Amazon Web Services
 
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon RedshiftAmazon Web Services
 
Deep Dive: Amazon Redshift (March 2017)
Deep Dive: Amazon Redshift (March 2017)Deep Dive: Amazon Redshift (March 2017)
Deep Dive: Amazon Redshift (March 2017)Julien SIMON
 
Deep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceDeep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceAmazon Web Services
 
World-class Data Engineering with Amazon Redshift
World-class Data Engineering with Amazon RedshiftWorld-class Data Engineering with Amazon Redshift
World-class Data Engineering with Amazon RedshiftLars Kamp
 
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Amazon Web Services
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Aioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_featuresAioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_featuresAiougVizagChapter
 
Ms sql server architecture
Ms sql server architectureMs sql server architecture
Ms sql server architectureAjeet Singh
 
MySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMahesh Salaria
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAmazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 

Similar to (BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices (20)

Melhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftMelhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon Redshift
 
Amazon Redshift Deep Dive
Amazon Redshift Deep Dive Amazon Redshift Deep Dive
Amazon Redshift Deep Dive
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
How to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftHow to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon Redshift
 
London Redshift Meetup - July 2017
London Redshift Meetup - July 2017London Redshift Meetup - July 2017
London Redshift Meetup - July 2017
 
AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722AWS July Webinar Series: Amazon redshift migration and load data 20150722
AWS July Webinar Series: Amazon redshift migration and load data 20150722
 
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Deep Dive: Amazon Redshift (March 2017)
Deep Dive: Amazon Redshift (March 2017)Deep Dive: Amazon Redshift (March 2017)
Deep Dive: Amazon Redshift (March 2017)
 
Deep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceDeep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performance
 
Redshift deep dive
Redshift deep diveRedshift deep dive
Redshift deep dive
 
World-class Data Engineering with Amazon Redshift
World-class Data Engineering with Amazon RedshiftWorld-class Data Engineering with Amazon Redshift
World-class Data Engineering with Amazon Redshift
 
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 
Aioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_featuresAioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_features
 
Ms sql server architecture
Ms sql server architectureMs sql server architecture
Ms sql server architecture
 
MySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMySQL: Know more about open Source Database
MySQL: Know more about open Source Database
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Recently uploaded (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Eric Ferreira, AWS Ari Miller, TripAdvisor October 2015 BDT401 Amazon Redshift Deep Dive Tuning and Best Practices
  • 2. What to Expect from the Session Architecture Review Ingestion • COPY • PKs and Manifest • Data Hygiene • Auto Compression / Sort key Compression Recent Features • New Functions • UDFs • Interleaved Sort keys Migration Tips Workload Tuning • Workload • WLM • Console Sharing Amazon Redshift • Custom Computation Clusters • Infra/Automation • Monitoring • Views • Queue Management
  • 3. Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/year Amazon Redshift
  • 4. Amazon Redshift Delivers Performance “Redshift is twenty times faster than Hive” (5x – 20x reduction in query times) link “Queries that used to take hours came back in seconds. Our analysts are orders of magnitude more productive.” (20x – 40x reduction in query times) link …[Redshift] performance has blown away everyone here (we generally see 50-100x speedup over Hive). link “Team played with Redshift today and concluded it is ****** awesome. Un-indexed complex queries returning in < 10s.” “Did I mention it's ridiculously fast? We'll be using it immediately to provide our analysts an alternative to Hadoop.” “We saw…2x improvement in query times” Channel We regularly process multibillion row datasets and we do that in a matter of hours. link
  • 5. Amazon Redshift System Architecture 10 GigE (HPC) Ingestion Backup Restore JDBC/ODBC
  • 6. A Deeper Look at Compute Node Architecture Leader Node Dense Compute Nodes Large • 2 slices/cores • 15GB RAM • 160GB SSD 8XL • 32 slices/cores • 244 GB RAM • 2.56TB SSD Dense Storage Nodes X-Large • 2 slices/ 4 cores • 31GB RAM • 2TB HDD 8XL • 16 slices/ 36 cores • 244 GB RAM • 16TB HDD
  • 8. Use Multiple Input Files to Maximize Throughput COPY command Each slice loads one file at a time A single input file means only one slice is ingesting data Instead of 100MB/s, you’re only getting 6.25MB/s
  • 9. Use Multiple Input Files to Maximize Throughput COPY command You need at least as many input files as you have slices With 16 input files, all slices are working so you maximize throughput Get 100 MB/s per node; scale linearly as you add nodes
  • 10. Primary Keys and Manifest Files Amazon Redshift doesn’t enforce primary key constraints • If you load data multiple times, Amazon Redshift won’t complain • If you declare primary keys in your DML, the optimizer will expect the data to be unique Use manifest files to control exactly what is loaded and how to respond if input files are missing • Define a JSON manifest on Amazon S3 • Ensures that the cluster loads exactly what you want
  • 11. Data Hygiene Analyze tables regularly • Every single load for popular columns • Weekly for all columns • Look SVV_TABLE_INFO(stats_off) for stale stats • Look stl_alert_event_log for missing stats Vacuum tables regularly • Weekly is a good target • Number of unsorted blocks as trigger • Look SVV_TABLE_INFO(unsorted, empty) • Deep copy might be faster for high percent unsorted (20% unsorted usually is faster to deep copy)
  • 12. Automatic Compression is a Good Thing (Mostly) Better performance, lower costs Samples data automatically when COPY into an empty table • Samples up to 100,000 rows and picks optimal encoding Regular ETL process using temp or staging tables: Turn off automatic compression • Use analyze compression to determine the right encodings • Bake those encodings into your DML
  • 13. Be Careful When Compressing Your Sort Keys Zone maps store min/max per block After we know which block(s) contain the range, we know which row offsets to scan Highly compressed sort keys means many rows per block You’ll scan more data blocks than you need If your sort keys compress significantly more than your data columns, you may want to skip compression of sortkey column(s) Check SVV_TABLE_INFO(skew_sortkey1) COL1 COL2
  • 14. Keep Your Columns as Narrow as Possible • Buffers allocated based on declared column width • Wider than needed columns mean memory is wasted • Fewer rows fit into memory; increased likelihood of queries spilling to disk • Check SVV_TABLE_INFO(max_varchar)
  • 16. New SQL Functions We add SQL functions regularly to expand Amazon Redshift’s query capabilities Added 25+ window and aggregate functions since launch, including: • LISTAGG • [APPROXIMATE] COUNT • DROP IF EXISTS, CREATE IF NOT EXISTS • REGEXP_SUBSTR, _COUNT, _INSTR, _REPLACE • PERCENTILE_CONT, _DISC, MEDIAN • PERCENT_RANK, RATIO_TO_REPORT We’ll continue iterating but also want to enable you to write your own
  • 17. Scalar User Defined Functions You can write UDFs using Python 2.7 • Syntax is largely identical to PostgreSQL UDF syntax • System and network calls within UDFs are prohibited Comes with Pandas, NumPy, and SciPy pre-installed • You’ll also be able import your own libraries for even more flexibility
  • 18. Scalar UDF Example CREATE FUNCTION f_hostname (VARCHAR url) RETURNS varchar IMMUTABLE AS $$ import urlparse return urlparse.urlparse(url).hostname $$ LANGUAGE plpythonu;
  • 19. Scalar UDF Examples from Partners http://www.looker.com/blog/amazon-redshift- user-defined-functions https://www.periscope.io/blog/redshift-user- defined-functions-python.html
  • 20. 1-click deployment to launch, on multiple regions around the world Pay-as-you-go pricing with no long term contracts required Advanced Analytics Business IntelligenceData Integration
  • 22. Compound Sort Keys Records in Amazon Redshift are stored in blocks. For this illustration, let’s assume that four records fill a block Records with a given cust_id are all in one block However, records with a given prod_id are spread across four blocks 1 1 1 1 2 3 4 1 4 4 4 2 3 4 4 1 3 3 3 2 3 4 3 1 2 2 2 2 3 4 2 1 1 [1,1] [1,2] [1,3] [1,4] 2 [2,1] [2,2] [2,3] [2,4] 3 [3,1] [3,2] [3,3] [3,4] 4 [4,1] [4,2] [4,3] [4,4] 1 2 3 4 prod_id cust_id cust_id prod_id other columns blocks Select sum(amt) From big_tab Where cust_id = (1234); Select sum(amt) From big_tab Where prod_id = (5678);
  • 23. 1 [1,1] [1,2] [1,3] [1,4] 2 [2,1] [2,2] [2,3] [2,4] 3 [3,1] [3,2] [3,3] [3,4] 4 [4,1] [4,2] [4,3] [4,4] 1 2 3 4 prod_id cust_id Interleaved Sort Keys Records with a given cust_id are spread across two blocks Records with a given prod_id are also spread across two blocks Data is sorted in equal measures for both keys 1 1 2 2 2 1 2 3 3 4 4 4 3 4 3 1 3 4 4 2 1 2 3 3 1 2 2 4 3 4 1 1 cust_id prod_id other columns blocks
  • 24. Usage New keyword ‘INTERLEAVED’ when defining sort keys • Existing syntax will still work and behavior is unchanged • You can choose up to 8 columns to include and can query with any or all of them No change needed to queries We’re just getting started with this feature • Benefits are significant; load penalty is higher than we’d like and we’ll fix that quickly • Check SVV_INTERLEAVED_COLUMNS(interleaved_skew) to decide when to VACUUM REINDEX • A value greater than 5 will indicate the need to VACUUM REINDEX [[ COMPOUND | INTERLEAVED ] SORTKEY ( column_name [, ...] ) ]
  • 27. Typical ETL/ELT on Legacy DW • One file per table, maybe a few if too big • Many updates (“massage” the data) • Every job clears the data, then load • Count on PK to block double loads • High concurrency of load jobs • Small table(s) to control the job stream
  • 28. Two Questions to Ask Why you do what you do? • Many times, they don’t even know What is the customer need? • Many times, needs do not match current practice • You might benefit from adding other AWS services
  • 29. On Amazon Redshift Updates are delete + insert of the row • Deletes just mark rows for deletion Blocks are immutable • Minimum space used is one block per column, per slice Commits are expensive • 4 GB write on 8XL per node • Mirrors WHOLE dictionary • Cluster-wide serialized
  • 30. On Amazon Redshift • Not all aggregations created equal • Pre-aggregation can help • Order on group by matters • Concurrency should be low for better throughput • Caching layer for dashboards is recommended • WLM parcels RAM to queries. Use multiple queues for better control.
  • 31. Resultset Metadata Using SQL or JDBC, you have access to column names. ResultSet rs = stmt.executeQuery("SELECT * FROM emp"); ResultSetMetaData rsmd = rs.getMetaData(); String name = rsmd.getColumnName(1); Unload does not provide columns names (yet). • Use SELECT top 0… • Instead of adding 0=1 to your WHERE clause
  • 32. Open Source Tools https://github.com/awslabs/amazon-redshift-utils Admin Scripts • Collection of utilities for running diagnostics on your cluster. Admin Views • Collection of utilities for managing your cluster, generating schema DDL, etc. Column Encoding Utility • Gives you the ability to apply optimal column encoding to an established schema with data already loaded. Analyze and Vacuum Utility • Gives you the ability to automate VACUUM and ANALYZE operations. Unload and Copy Utility • Helps you to migrate data between Amazon Redshift clusters or databases.
  • 34. Workload Management (WLM) Concurrency and memory can now be changed dynamically. You can have distinct values for load time and query time. Use wlm_apex_hourly.sql to monitor “queue pressure”
  • 35. Using the Console for Query Tuning
  • 36. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ari Miller, TripAdvisor October 2015 Sharing Amazon Redshift Amazon Redshift performance as a shared resource
  • 37. The Shared Resource Problem TripAdvisor • 375 million unique monthly visitors • 8 ds2.8xlarge cluster • Largest table 12 TB (1 month of data)
  • 38. The Shared Resource Problem • 460 engineers, data scientists, analysts, and product managers with personal logins to Amazon Redshift. • > 2500 tables • If you don't build it, they will still come • Company motto is "speed wins" -- no tolerance for delayed analysis • 293 million row table imported using INSERT statements
  • 39. The Solution Reduce contention (custom clusters) Infrastructure/automation Monitoring (tables/performance/usage) Own all common processing
  • 41. Custom Computation Clusters Spin up specialized ephemeral clusters. Guidelines: • Mighty mouse cluster (32 dc1.large nodes) -- great bang for the buck ($8/hr) • 4 hour job on 8 d1s.8xlarge -> 40 minutes on mighty mouse • Much smaller wins vs d2s • Customize the configuration: • 2 slots only, to maximize memory • Rely on Amazon Redshift to parallelize • Primary limitation is on the size of the copy
  • 42. Custom Computation Clusters Ephemeral Infrastructure (Java) • sync_redshift_objects.sh -- create schema/tables into a new cluster from an existing cluster • unload.sh • manifest file • automated checks for Amazon S3 consistency • copy.sh • Parameters: schema, table • mirrors unload
  • 43. Infrastructure/Automation Azkaban – batch workflow + UI (LinkedIn) Azkaban calls data transfer infrastructure, SQL, etc.
  • 44. Infrastructure/Automation Custom UI for non engineers -- point and click. • Import from file or Hive • Auto-analyze source data, suggest table ddl (sort, distribution, compression), schedule
  • 45. Monitoring • Long running query alerts • Accounts created through a UI that maps to our LDAP • Every account mapped to an email address • Daily and peak usage reports, by user and type of activity • Daily table size, space taken up by deleted rows by table report – deep_copy.sh instead of vacuum • Initial monitoring cost can be minimal because of psql compatibility • psql --quiet --html --table-attr cellspacing='0' --file $QUERY_FILE --output $RESULT_FILE
  • 47. Shared Development Cluster • Calisi syntax allows for parallel development on a single shared development cluster • Each developer has their own personal schema • Any workflow can be run against their personal schema if the SQL is marked up in this way for substitution. /*{*/sherpa/*}*/ -> jbezos /*{*/sherpa/*}*/ -> wvogels
  • 48. Shared Development Cluster String.format("create schema %s %s", targetSchema, authorizationSchema)); List<Table> tables = syncRedshiftObjects.loadTablesForSchema(originalSche ma); for (Table table : tables) { String tableName = table.getTable(); String ddl = table.getCreateTableStatement().replace(table.getSch ema() + ".", targetSchema + ".");
  • 49. View Performance • Create tables by month, with views spanning the months. • Use Calisi syntax to surround SQL using the view: /*{*/rio_flookback/*}*/ -> rio_flookback_aug • Syntax is completely legitimate SQL against a view -- autocomplete columns in IDE, run. • When run through a framework, can substitute automatically to run against the underlying monthly table where the query doesn't span multiple months.
  • 50. Queue Management • Keep it simple, the memory is reserved • Mission critical • DBA • Quick • Default • Include a default timeout • Groups allow you to dynamically switch default queues • Tableau • Can’t run set query group • Use group membership to switch queues
  • 51. Wrap Up • Amazon Redshift has transformed analytics at TripAdvisor • Thousands of queries a day, hundreds of users • Tableau dashboards built on Amazon Redshift • Iterative real time exploration, hundreds of times faster • Open LDAP access for any employee • Open sourcing these solutions • Feb 2016 • Find the link here: http://engineering.tripadvisor.com/
  • 52. Related Sessions Hear from other customers discussing their Amazon Redshift use cases: • DAT201—Introduction to Amazon Redshift (RetailMeNot) • ISM303—Migrating Your Enterprise Data Warehouse to Amazon Redshift (Boingo Wireless and Edmunds) • ARC303—Pure Play Video OTT: A Microservices Architecture in the Cloud (Verizon) • ARC305—Self-Service Cloud Services: How J&J Is Managing AWS at Scale for Enterprise Workloads • BDT306—The Life of a Click: How Hearst Publishing Manages Clickstream Analytics with AWS • DAT308 - How Yahoo! Analyzes Billions of Events a Day on Amazon Redshift • DAT311—Large-Scale Genomic Analysis with Amazon Redshift (Human Longevity) • BDT314—Running a Big Data and Analytics Application on Amazon EMR and Amazon Redshift with a Focus on Security (Nasdaq) • BDT316—Offloading ETL to Amazon Elastic MapReduce (Amgen) • Building a Mobile App using Amazon EC2, Amazon S3, Amazon DynamoDB, and Amazon Redshift (Tinder)