Introducing Azure SQL Data Warehouse

Grant Fritchey | www.ScaryDBA.com
grant@scarydba.com

Goals
 Understand the basic infrastructure and architecture behind Azure SQL Data Warehouse
 Learn different methods of design, querying, and data migration in order to begin an implementation of Azure SQL Data Warehouse
 Investigate the tooling available in support of automation and monitoring around Azure SQL Data Warehouse

Get in touch
Grant Fritchey
scarydba.com
grant@scarydba.com
@gfritchey

 Analytics Platform System (APS)
 Not simply a database
» Massively parallel computing platform
 Platform as a Service (PaaS)
 Pay for what you use
» Pay for when you use it
 Connectivity dependent
 Just a database

ARCHITECTURE
Azure SQL Data Warehouse

 Built on a combination of Azure SQL Database and Analytics Platform System (APS)
 DBMS = Azure SQL Database
 Processing = APS
 Storage = Azure BLOB Storage
 Default storage is through columnstore
 It’s still SQL Server at its core

Architecture diagram
 Application
 APS: Massively Parallel Processing Engine
 Control Node: coordinates data movement and workload management
 Compute Nodes: provide processing mechanisms in parallel or individually
 Blob Storage: Read-Access Geo-Redundant Storage (RA-GRS) stores multi-terabyte data across Azure geo regions

Table Architecture
 Clustered columnstore by default
 Each “table” consists of 60 tables
 Tables consist of segments
» At least 100k rows per compressed row group improves performance
» 1 million rows per row group is the maximum
 Columnstore storage
» Compressed columnstore segments
» Delta store (standard clustered index)

Protection Features
 Locally Redundant Storage
 Geo-Redundant Storage
 Automated backups
» Every 8 hours
» Kept for 7 days
 Transparent Data Encryption

Security
 SQL Server logins
 Azure Active Directory
 Manage Resource Groups
 Firewall
 Built-in Auditing

DATABASE DESIGN

Actually, Table Design
 Define table distribution
 Partitioning
 Statistics
 General Tips
 Unsupported

Table Distribution
 Each table consists of 60 tables
» 60 distributions
 Round-robin
» One, then the next
 Hash
 For best performance, pick the distribution method deliberately for each table

Round-Robin Distribution
 Starting out
 No join key to other tables
 No good hash candidate
 Joins against this table aren’t significant
 Staging or temporary table

Hash Distribution
 Ensure
» No updates
» Even data distribution
» Minimal data movement
 Suggestions for Hash key
» Highly selective data
» Minimal nulls and duplicates
» Avoid dates
» Avoid columns with fewer than 60 distinct values
» Foreign key columns
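
The difference between round-robin and hash distribution, and the reason for avoiding keys with fewer than 60 values, can be illustrated with a small simulation. This is only a sketch: MD5 stands in for the service's internal hash function, which is not documented here, and the `skew` helper and the row counts are made up for illustration.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # every table is spread across 60 distributions


def round_robin(row_count):
    """Assign rows to distributions one after the next."""
    return Counter(i % DISTRIBUTIONS for i in range(row_count))


def hash_distribute(keys):
    """Assign each row to a distribution based on a hash of its key.

    MD5 is an assumption used for illustration only; the real hash
    function used by the service is internal.
    """
    counts = Counter()
    for key in keys:
        digest = hashlib.md5(str(key).encode("utf-8")).digest()
        counts[int.from_bytes(digest[:8], "big") % DISTRIBUTIONS] += 1
    return counts


def skew(counts):
    """Ratio of the fullest distribution to a perfectly even share."""
    total = sum(counts.values())
    return max(counts.values()) / (total / DISTRIBUTIONS)


rows = 60_000
selective = hash_distribute(range(rows))                         # unique key per row
low_cardinality = hash_distribute(i % 12 for i in range(rows))   # only 12 distinct values

print(f"round-robin skew:   {skew(round_robin(rows)):.2f}")
print(f"selective key skew: {skew(selective):.2f}")
print(f"12-value key skew:  {skew(low_cardinality):.2f}")
print(f"distributions used by the 12-value key: {len(low_cardinality)}")
```

A key with only 12 distinct values can never occupy more than 12 of the 60 distributions, so most of the compute capacity sits idle for that table.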

Ensuring Index Quality
 Avoid memory pressure when building indexes
» Balance memory with concurrency
 Avoid high volume DML operations
» Deleted rows are not physically removed until the table is rebuilt
» Inserts are added to the delta row group
» Updates are a logical delete followed by an insert (delta row group)
» Large DML operations behave differently: at 102,400 rows per distribution (6.144 million rows in one operation, spread evenly across 60 distributions), rows go directly to compressed columnstore storage
 Avoid small or trickle load operations
» Very small data loads always go to delta group
 Be cautious with the number of partitions
» Each partition is a new table
» Each table is 60 tables
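
The 102,400-row figure and the partition warning above combine into simple arithmetic. The sketch below assumes rows are spread evenly and that the threshold applies to each distribution within each partition (which follows from "each partition is a new table" but is stated here as an assumption); the function names are hypothetical.

```python
DISTRIBUTIONS = 60
DIRECT_COMPRESS_ROWS = 102_400  # rows per distribution, figure taken from the slide


def rows_per_target(load_rows, partitions_touched=1):
    """Average rows landing in each distribution/partition combination."""
    return load_rows / (DISTRIBUTIONS * partitions_touched)


def goes_direct_to_columnstore(load_rows, partitions_touched=1):
    """True when an evenly spread load is large enough to skip the delta row group."""
    return rows_per_target(load_rows, partitions_touched) >= DIRECT_COMPRESS_ROWS


print(DISTRIBUTIONS * DIRECT_COMPRESS_ROWS)        # 6144000
print(goes_direct_to_columnstore(6_144_000))       # True
print(goes_direct_to_columnstore(6_000_000))       # False
print(goes_direct_to_columnstore(6_144_000, 12))   # False: spread over 12 partitions
```

The last line shows why the partition count matters: the same load that compresses directly into an unpartitioned table falls back to the delta row group once it is split twelve ways.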

Table Tips
 Row Store
» < 60 million rows
» Frequent updates
» Small dimension tables
 Columnstore
» > 60 million rows
» Infrequent updates
» Fact tables & large dimension tables

Partitioning
 At least 60 million rows per partition to see benefits
 There can be too many partitions
 Partitioning can prevent row groups from reaching 1 million rows
 Partitioning can cause rows to go to the delta row group instead of a compressed row group
 Partition elimination must occur to see benefits
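
The 60 million figure is 60 distributions multiplied by roughly 1 million rows per row group. A rough sizing check, assuming evenly distributed data and a made-up table of 1.2 billion rows, might look like this (helper names are hypothetical):

```python
DISTRIBUTIONS = 60
ROW_GROUP_MAX = 1_000_000  # approximate maximum rows per row group, from the slides


def rows_per_distribution_per_partition(table_rows, partitions):
    """Average rows available to fill row groups in one partition of one distribution."""
    return table_rows // (DISTRIBUTIONS * partitions)


def max_useful_partitions(table_rows):
    """Largest partition count that still leaves about 1 million rows per row group."""
    return max(1, table_rows // (DISTRIBUTIONS * ROW_GROUP_MAX))


table_rows = 1_200_000_000  # hypothetical fact table
for partitions in (1, 12, 20, 120):
    per_group = rows_per_distribution_per_partition(table_rows, partitions)
    verdict = "ok" if per_group >= ROW_GROUP_MAX else "too many partitions"
    print(f"{partitions:>4} partitions -> {per_group:>10,} rows each: {verdict}")

print("suggested upper bound:", max_useful_partitions(table_rows))
```

Under these assumptions, monthly partitions over ten years (120 partitions) would leave each row group far short of full, even on a billion-row table.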

Statistics
 No automatic creation
 No automatic update
 Microsoft suggests creating statistics on every column as a starting point
» I don’t agree, but this is a better choice than no statistics
 Multi-column statistics supported
» Histogram is still only on first column
 Syntax is the same

General Tips
 Denormalization is actually viable
 Use minimum viable data size
 Heap tables for transient data

Unsupported
 Currently (these things change)
» Identity
» Primary key, foreign key, unique and check constraints
» Unique indexes
» Computed columns
» Sparse columns
» User-Defined types
» Sequence
» Triggers
» Indexed views
» Synonyms

And Memory
 Connection group setting
 More memory and more processing power as ADW size increases
 Still only 30 connections
 Fundamental to data loads as well as querying

D-SQL

New & Different
 CREATE TABLE AS SELECT
 GROUP BY differences
 Labels
 Stored procedures limitations
 View limitations
 General Notes

CREATE TABLE AS SELECT
 Must define distribution
 Uses parallel processing
 Uses
» Copy a table
» Change structure on a table
» Replace ANSI derived tables (unsupported)
» External data import

GROUP BY
 Unsupported
» ROLLUP
» GROUPING SETS
» CUBE

Labels
 Mark a query
 Useful for troubleshooting

Stored procedures limitations
 Unsupported
» Temporary stored procedures
» Numbered stored procedures
» Extended stored procedures
» CLR stored procedures
» Encryption
» Replication
» Table-valued parameters
» Read-only parameters
» Default parameters
» Execution contexts
» RETURN statement

View Limitations
 Schema binding
 No data manipulation through view
 No temporary tables
 No support for EXPAND/NOEXPAND
 No indexed views

General Notes
 Cursors are not supported
» Use WHILE
 Transaction isolation level is limited to READ UNCOMMITTED
 No SELECT or UPDATE for variable assignment
» Instead
SET @i = (SELECT COUNT(*) FROM dbo.[Table]);

DATA IMPORT MECHANISMS

Import Processes
 Azure Data Factory
 SSIS
 Polybase
 3rd Party

Azure Data Factory
 Currently single core through control node
» Can use Polybase
 Reads from
» Azure blob storage
» Azure SQL Database
» On-premises SQL Server
» SQL Server VM in Azure
 Requires local software installation for on-premises servers and VMs
 Second slowest method (unless Polybase is used)

SSIS
 Single core through control node only
 Include retry logic
 Increase timeout, radically
 Use “all or nothing” load processing
 Parallel loads from multiple SSIS can help
 Slowest method according to Microsoft
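
The "retry logic" and "all or nothing" advice fit together: a retry is only safe when a failed attempt leaves nothing behind. The pattern is sketched below in Python rather than in an SSIS package, purely to show the control flow. `load_with_retry`, `flaky_load`, and the delay values are hypothetical, and treating `ConnectionError` as the transient failure is an assumption.

```python
import time


def load_with_retry(load_batch, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call load_batch() until it succeeds or the attempts run out.

    load_batch is expected to be all-or-nothing: it either commits the
    whole batch or raises and leaves nothing behind, so retrying is safe.
    """
    for attempt in range(1, attempts + 1):
        try:
            return load_batch()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up and surface the last failure
            sleep(base_delay * 2 ** (attempt - 1))  # back off: 2s, 4s, 8s, ...


# Demonstration with a loader that fails twice before succeeding.
calls = {"n": 0}


def flaky_load():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "loaded"


delays = []  # record the waits instead of actually sleeping
result = load_with_retry(flaky_load, sleep=delays.append)
print(result, delays)
```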

Polybase
 Supports delimited files and Hadoop
 Supports compressed files
» Gzip, zlib, Snappy
 A compressed file is read by a single reader; for better performance, split the data into multiple compressed files, scaled to the DWU level
 Compressed files load slower, but upload faster
 Single operation
 Load speed increases with scale
» Readers increase
» Writers increase
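
Because each compressed file is handled by a single reader, one way to prepare data for a parallel load is to split a delimited extract into several gzip files before uploading. The sketch below shows the idea with made-up data; `split_to_gzip`, the file naming, and the choice of 8 parts are all hypothetical and would need to be matched to the actual reader count at a given DWU level.

```python
import gzip
import os
import tempfile


def split_to_gzip(lines, out_dir, parts, prefix="chunk"):
    """Write lines round-robin into `parts` gzip files and return their paths."""
    paths = [os.path.join(out_dir, f"{prefix}_{i:03d}.txt.gz") for i in range(parts)]
    handles = [gzip.open(p, "wt", encoding="utf-8", newline="") for p in paths]
    try:
        for n, line in enumerate(lines):
            handles[n % parts].write(line)
    finally:
        for handle in handles:
            handle.close()
    return paths


# Made-up pipe-delimited rows standing in for a real extract.
rows = [f"{i}|customer {i}|{i * 1.5}\n" for i in range(1000)]
out_dir = tempfile.mkdtemp()
paths = split_to_gzip(rows, out_dir, parts=8)  # 8 is a hypothetical reader count

# Read everything back to confirm no rows were lost in the split.
total = 0
for path in paths:
    with gzip.open(path, "rt", encoding="utf-8") as f:
        total += sum(1 for _ in f)
print(len(paths), total)
```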

3rd Party

Data Loading Tips
 Network bandwidth must be considered unless the load is all done within Azure
» ExpressRoute (paid access) can help
 Memory affects columnstore, so use more memory for load processes
 Fixed-length file format is not currently supported by Polybase
 Remember, it’s all a balancing act between upload speed and import speed
 Load in chunks of at least 100k rows to get data into compressed columnstore segments

TOOLING

Available Tools
 Azure Portal
 Visual Studio
 SQL Server Management Studio
 PowerShell

MAINTENANCE

SQL Server
 Index Maintenance
» But not for defragmentation
 Statistics maintenance
 Monitoring
 Backups
» Managed for you, just monitor

Statistics
 No automatic creation
 No automatic update
» Update after data loads
» Update after data modification
» If either of the above doesn’t change data distribution, don’t update the statistics
 Target columns
» JOIN
» GROUP BY
» ORDER BY
» WHERE
» HAVING
 Syntax is the same as SQL Server

DBCC SHOW_STATISTICS()
 Limits
» No undocumented features
» No stats_stream
» Square brackets not supported
» Cannot use column names to identify stats
— Must use the stats name

Monitoring
 Portal
 Dynamic Management Views
» sys.pdw_loader_backup_runs
» sys.dm_pdw_exec_sessions
» sys.dm_pdw_exec_requests
» sys.dm_pdw_request_steps
» sys.dm_pdw_sql_requests
» sys.dm_pdw_dms_workers
» sys.dm_pdw_waits
 DBCC
» PDW_SHOWEXECUTIONPLAN
» PDW_SHOWSPACEUSED

Microsoft Marketing Slide

Resources
 Microsoft Documentation
 Azure Data Platform Learning Resources
 Grant Fritchey
 Columnstore Architecture
 Troubleshooting
 Creating Artificial Key Values

Most useful docs
 https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-best-practices/
warehouse-tables-index/#causes-of-poor-columnstore-index-quality
warehouse-tables-distribute/
