This document provides an introduction to Azure SQL Data Warehouse. It discusses the architecture of ASDW including how it is built on Azure SQL Database and Analytics Platform System (APS). It covers various topics like database design, querying, data loading, tooling, and maintenance for ASDW. The goals are to understand the basic infrastructure, learn design/querying/migration methods, and investigate available tooling for automation and monitoring of ASDW.
Introducing Azure SQL Data Warehouse
1. Grant Fritchey | www.ScaryDBA.com
www.ScaryDBA.com
Introducing
Azure SQL Data Warehouse
Grant Fritchey
grant@scarydba.com
2. Goals
Understand the basic infrastructure and architecture behind Azure SQL Data Warehouse
Learn different methods of design, querying, and data migration in order to begin an implementation of Azure SQL Data Warehouse
Investigate the tooling available in support of automation and monitoring around Azure SQL Data Warehouse
3. Get in touch
Grant Fritchey
scarydba.com
grant@scarydba.com
@gfritchey
4. Azure SQL Data Warehouse
Analytics Platform System (APS)
Not simply a database
» Massively parallel computing platform
Platform as a Service (PaaS)
Pay for what you use
» Pay for when you use it
Connectivity dependent
Just a database
5. ARCHITECTURE
Azure SQL Data Warehouse
6. Azure SQL Data Warehouse
Built on a combination of Azure SQL Database and Analytics Platform System (APS)
DBMS = Azure SQL Database
Processing = APS
Storage = Azure Blob Storage
Default storage is through columnstore
It’s still SQL Server at its core
7. [Architecture diagram]
Application
APS (Massively Parallel Processing Engine)
» Control Node: coordinates data movement and workload management
» Compute Nodes: provide processing mechanisms in parallel or individually
Blob Storage
» Read Access Geo-Redundant Storage (RA-GRS): stores multi-terabyte data across Azure geo regions
8. Table Architecture
Clustered columnstore by default
Each “table” consists of 60 tables
Tables consist of segments
» At least 100k rows per compressed row group improves performance
» 1 million rows per row group is the maximum
Columnstore storage
» Compressed columnstore segments
» Delta store (standard clustered index)
9. Protection Features
Locally Redundant Storage
Geo-Redundant Storage
Automated backups
» Every 8 hours
» Kept for 7 days
Transparent Data Encryption
14. Table Distribution
Each table consists of 60 tables
» 60 distributions
Round-robin
» One, then the next
Hash
» Rows are placed by a hash of the distribution column
For best performance, pick the distribution method that suits each table
15. Round-Robin Distribution
Starting out
No join key to other tables
No good hash candidate
Joins against this table aren’t significant
Staging or temporary table
16. Hash Distribution
Ensure
» No updates to the distribution column
» Even data distribution
» Minimal data movement
Suggestions for Hash key
» Highly selective data
» Minimal nulls and duplicates
» Avoid date columns
» Avoid columns with fewer than 60 distinct values
» Foreign key columns
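To make the trade-off concrete, here is a minimal Python sketch of how the two methods spread rows over the 60 distributions. It is a model only: MD5 stands in for the service's internal hash function (an assumption), and the month-number key is a made-up example of a column with fewer than 60 distinct values.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # every table is spread over 60 distributions


def round_robin(row_count):
    """Assign rows to distributions one after the next."""
    return Counter(i % DISTRIBUTIONS for i in range(row_count))


def hash_distribute(keys):
    """Assign rows by hashing the distribution key.

    MD5 is only a stand-in; the real hash function is internal to the service.
    """
    counts = Counter()
    for key in keys:
        digest = hashlib.md5(str(key).encode("utf-8")).digest()
        counts[int.from_bytes(digest[:8], "big") % DISTRIBUTIONS] += 1
    return counts


def summarize(counts):
    """Return (distributions holding rows, size of the largest distribution)."""
    return len(counts), max(counts.values())


rows = 60_000
even = summarize(round_robin(rows))                              # perfectly even
months = summarize(hash_distribute(i % 12 for i in range(rows)))  # only 12 key values
unique = summarize(hash_distribute(range(rows)))                 # highly selective key
print(even, months, unique)
```

With only 12 distinct key values, at most 12 of the 60 distributions can receive rows, which is the skew the "fewer than 60 values" suggestion is warning about.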
17. Ensuring Index Quality
Avoid memory pressure when building indexes
» Balance memory with concurrency
Avoid high volume DML operations
» Deleted rows are not physically removed until the table is rebuilt
» Inserts are added to the delta row group
» Updates are a logical delete followed by an insert (delta row group)
» Different from large DML operations: 102,400 rows per distribution, or 6.144 million rows in one operation, go directly to compressed storage
Avoid small or trickle load operations
» Very small data loads always go to the delta row group
Be cautious with the number of partitions
» Each partition is a new table
» Each table is 60 tables
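The 6.144 million figure is simply the per-distribution threshold multiplied by the 60 distributions. A small Python sketch of that arithmetic follows; it assumes the load is spread evenly across distributions, which a skewed hash key would not guarantee, and the function names are illustrative only.

```python
DISTRIBUTIONS = 60
MIN_ROWS_PER_DISTRIBUTION = 102_400  # threshold quoted on the slide


def direct_load_threshold():
    """Rows needed in one operation so every distribution reaches the threshold."""
    return DISTRIBUTIONS * MIN_ROWS_PER_DISTRIBUTION


def lands_compressed(total_rows):
    """True when an evenly spread load gives each distribution enough rows
    to skip the delta row group (assumes perfectly even distribution)."""
    return total_rows // DISTRIBUTIONS >= MIN_ROWS_PER_DISTRIBUTION


print(direct_load_threshold())      # 6144000
print(lands_compressed(1_000_000))  # False: about 16,666 rows per distribution
```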
18. Table Tips
Row Store
» < 60 million rows
» Frequent updates
» Small dimension tables
Columnstore
» > 60 million rows
» Infrequent updates
» Fact tables & large dimension tables
19. Partitioning
60 million rows per partition to see benefits
There can be too many partitions
Partitioning can prevent row groups from reaching 1 million rows
Partitioning can cause rows to go to a delta row group instead of a compressed row group
Partition elimination must occur to see benefits
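The 60 million guideline follows from the layout: each partition is split across 60 distributions, so 60 million rows leaves 1 million rows per distribution per partition, which is one full row group. A rough Python sketch of the sizing check, assuming evenly distributed rows and evenly sized partitions (function names are illustrative):

```python
DISTRIBUTIONS = 60
MAX_ROWS_PER_ROW_GROUP = 1_000_000  # "1 million" as quoted on the slides


def rows_per_distribution_partition(total_rows, partitions):
    """Rows landing in each distribution of each partition (even spread assumed)."""
    return total_rows // (DISTRIBUTIONS * partitions)


def max_useful_partitions(total_rows):
    """Largest partition count that still allows full row groups."""
    return max(1, total_rows // (DISTRIBUTIONS * MAX_ROWS_PER_ROW_GROUP))


# A 600-million-row fact table split into 100 partitions:
print(rows_per_distribution_partition(600_000_000, 100))  # 100000, far from a full row group
print(max_useful_partitions(600_000_000))                 # 10
```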
20. Statistics
No automatic creation
No automatic update
Microsoft suggests creating statistics on every column as a starting point
» I don’t agree, but this is a better choice than no statistics
Multi-column statistics supported
» Histogram is still only on first column
Syntax is the same
21. General Tips
Denormalization is actually viable
Use minimum viable data size
Heap tables for transient data
23. And Memory
Connection group setting
More memory and more processing as ADW size increases
Still only 30 connections
Fundamental to data loads as well as querying
25. D-SQL
Azure SQL Data Warehouse
26. New & Different
CREATE TABLE AS SELECT
GROUP BY differences
Labels
Stored procedure limitations
View limitations
General Notes
27. CREATE TABLE AS SELECT
Must define distribution
Uses parallel processing
Uses
» Copy a table
» Change structure on a table
» Replace ANSI derived tables (unsupported)
» External data import
28. GROUP BY
Unsupported
» ROLLUP
» GROUPING SETS
» CUBE
29. Labels
Mark a query
Useful for troubleshooting
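As an illustration of how labels tend to be used from client code, the Python sketch below builds the SQL text only. The `OPTION (LABEL = '...')` query hint and the `sys.dm_pdw_exec_requests` view come from Microsoft's documentation rather than from this deck; the helper names, the table name `dbo.FactSales`, and the label text are hypothetical.

```python
def with_label(query, label):
    """Append OPTION (LABEL = '...') to a query.

    Assumes the query does not already carry an OPTION clause.
    """
    safe = label.replace("'", "''")  # double any single quotes in the literal
    body = query.rstrip().rstrip(";").rstrip()
    return f"{body} OPTION (LABEL = '{safe}');"


def find_by_label(label):
    """Build a lookup against the request DMV for a given label."""
    safe = label.replace("'", "''")
    return (
        "SELECT request_id, status, submit_time, total_elapsed_time "
        "FROM sys.dm_pdw_exec_requests "
        f"WHERE [label] = '{safe}';"
    )


print(with_label("SELECT COUNT(*) FROM dbo.FactSales;", "nightly load"))
print(find_by_label("nightly load"))
```

Labelling every statement in a load or report makes it much easier to pick that statement out of the request history when something runs slowly.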
31. View Limitations
Schema binding
No data manipulation through view
No temporary tables
No support for EXPAND/NOEXPAND
No indexed views
32. General Notes
Cursors are not supported
» Use WHILE
Transaction isolation level is limited to READ UNCOMMITTED
No SELECT or UPDATE for variable assignment
» Use SET instead:
SET @i = (SELECT COUNT(*) FROM dbo.[Table]);
33. DATA IMPORT MECHANISMS
Azure SQL Data Warehouse
34. Import Processes
Azure Data Factory
SSIS
Polybase
3rd Party
35. Azure Data Factory
Currently single core through control node
» Can use Polybase
Reads from
» Azure blob storage
» Azure SQL Database
» On-premises SQL Server
» SQL Server VM in Azure
Requires software installed locally for on-premises servers and VMs
Second slowest method (unless Polybase is used)
36. SSIS
Single core through control node only
Include retry logic
Increase timeout, radically
Use “all or nothing” load processing
Parallel loads from multiple SSIS can help
Slowest method according to Microsoft
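The retry advice is a general pattern rather than anything specific to SSIS. The Python sketch below shows one way to express it, with exponential backoff between attempts. `load_batch` is a hypothetical callable that performs one complete, all-or-nothing load and raises on failure; in a real package the same logic would live in the package itself or in whatever schedules it.

```python
import time


def load_with_retry(load_batch, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call load_batch() until it succeeds or the attempts run out.

    load_batch is a hypothetical callable: one complete all-or-nothing
    load that raises an exception on failure, so a retry starts clean.
    """
    for attempt in range(1, attempts + 1):
        try:
            return load_batch()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # wait 2s, 4s, 8s, ...
```

Because each attempt is all or nothing, a failed attempt leaves no partial rows behind and the next attempt can simply run the whole load again.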
37. Polybase
Supports delimited files and Hadoop
Supports compressed files
» Gzip, Zlib, Snappy
A single compressed file is read by a single reader; for better performance, split the data into multiple compressed files, scaled to the DWU
Compressed files load slower, but upload faster
Single operation
Load speed increases with scale
» Readers increase
» Writers increase
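Since one compressed file is handled by one reader, splitting an extract into several gzip files before upload gives the load something to parallelize. A minimal Python sketch of that preparation step follows. It assumes a delimited text file with no header row and no line breaks inside quoted fields; the function name and output naming scheme are made up, and the right number of parts for a given DWU is left as a parameter.

```python
import gzip
from pathlib import Path


def split_to_gzip(source, out_dir, parts):
    """Split a delimited text file into `parts` gzip files, whole lines only."""
    source, out_dir = Path(source), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [
        out_dir / f"{source.stem}_{i:03d}{source.suffix}.gz" for i in range(parts)
    ]
    writers = [gzip.open(p, "wt", encoding="utf-8", newline="") for p in paths]
    try:
        with source.open("r", encoding="utf-8", newline="") as lines:
            for n, line in enumerate(lines):
                writers[n % parts].write(line)  # deal lines out in turn
    finally:
        for writer in writers:
            writer.close()
    return paths
```

For example, `split_to_gzip("sales.csv", "out", 8)` would write `out/sales_000.csv.gz` through `out/sales_007.csv.gz`, ready to upload to blob storage.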
39. Data Loading Tips
Network bandwidth must be considered unless the load is all done within Azure
» ExpressRoute, a paid service, can help
Memory affects columnstore, so use more memory for load processes
Fixed length file format not currently supported by Polybase
Remember, it’s all a balancing act between upload speed & import speed
Load in chunks of at least 100k rows to get data into compressed segments in columnstore
40. TOOLING
Azure SQL Data Warehouse
41. Available Tools
Azure Portal
Visual Studio
SQL Server Management Studio
PowerShell
43. MAINTENANCE
Azure SQL Data Warehouse
44. SQL Server
Index Maintenance
» But not for defragmentation
Statistics maintenance
Monitoring
Backups
» Managed for you, just monitor
45. Statistics
No automatic creation
No automatic update
» Update after data loads
» Update after data modification
» If neither of the above changes the data distribution, don’t update the statistics
Target columns
» JOIN
» GROUP BY
» ORDER BY
» WHERE
» HAVING
Syntax is the same as SQL Server
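Because nothing is created automatically, statistics usually end up scripted. The Python sketch below generates single-column `CREATE STATISTICS` statements for the columns a workload joins, groups, sorts, or filters on, plus an `UPDATE STATISTICS` to run after a load. The table and column names and the `stats_<table>_<column>` naming scheme are hypothetical; only the statement syntax is taken from SQL Server.

```python
def create_statistics_statements(table, columns):
    """Build one single-column CREATE STATISTICS statement per column.

    `table` must be schema-qualified, e.g. "dbo.FactSales" (hypothetical).
    """
    schema, name = table.split(".")
    return [
        f"CREATE STATISTICS stats_{name}_{col} ON {schema}.{name} ({col});"
        for col in columns
    ]


def update_statistics_statement(table):
    """Build the statement to refresh all statistics on a table after a load."""
    return f"UPDATE STATISTICS {table};"


for statement in create_statistics_statements("dbo.FactSales", ["CustomerKey", "OrderDate"]):
    print(statement)
print(update_statistics_statement("dbo.FactSales"))
```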
46. DBCC SHOW_STATISTICS()
Limits
» No undocumented features
» No stats_stream
» Square brackets not supported
» Cannot use column names to identify stats; must use the stats name
49. Resources
Microsoft Documentation
Azure Data Platform Learning Resources
Grant Fritchey
Columnstore Architecture
Troubleshooting
Creating Artificial Key Values
52. Most useful docs
https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-best-practices/
https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-tables-index/#causes-of-poor-columnstore-index-quality
https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-tables-distribute/