SlideShare a Scribd company logo
1 of 59
Tuning and Optimizing
U-SQL queries
for maximum performance
Michael Rys, Principal Program Manager, Microsoft
@MikeDoesBigData, usql@microsoft.com
AD-315-MAD-400-M
Please silence
cell phones
Session Objectives And Takeaways
Session Objective(s):
• Understand the U-SQL Query execution
• Be able to understand and improve U-SQL job performance/cost
• Be able to understand and improve the U-SQL query plan
• Know how to write more efficient U-SQL scripts
Key Takeaways:
• U-SQL is designed for scale-out
• U-SQL provides scalable execution of user code
• U-SQL has a tool set that can help you analyze and improve your scalability, cost and performance
Agenda
• Job Execution Experience and Investigations
Query Execution
Stage Graph
Dryad crash course
Job Metrics
Resource Planning
• Job Performance Analysis
Analyze the critical path
Heat Map
Critical Path
Data Skew
• Tuning / Optimizations
Cost Optimizations
Data Partitioning
Partition Elimination
Predicate Pushing
Column Pruning
Some Data Hints
UDOs can be evil
INSERT optimizations
U-SQL Query Execution and Performance Tuning
Job Scheduler
& Queue
Front-EndService
6
Vertex Execution
Consume
Overall U-SQL Batch Job Execution Lifetime
Local
Storage
Data Lake
Store
Author
Plan
Compiler Optimizer
Vertexes
running in
YARN
Containers
U-SQL
Runtime
Optimized
Plan
Vertex Scheduling
On containers
Job Manager
USQL
Compiler
Service &
USQL Catalog
Expression-flow
Programming Style
Automatic "in-lining" of U-SQL expressions
– whole script leads to a single execution
model.
Execution plan that is optimized out-of-the-
box and w/o user intervention.
Per job and user driven level of
parallelization.
Detail visibility into execution steps, for
debugging.
Heatmap like functionality to identify
performance bottlenecks.
U-SQL Compilation Process
C#
C++
Algebra
Other files
(system files, deployed resources)
managed dll
Unmanaged dll
Compilation output (in job folder)
Compiler &
Optimizer
U-SQL Metadata
Service
Deployed to
Vertices
Analyzing a job
Parallelism
1000 (ADLAUs)
Work composed of
12K Vertices
1 ADLAU currently maps to a VM with 2 cores and 6 GB of memory
U-SQL Query Execution
Physical plans vs. Dryad stage graph…
Stage Details
252 Pieces of work
AVG Vertex
execution time
4.3 Billion rows
Data Read &
Written
Super Vertex = Stage
16
U-SQL Query Execution
Redefinition of big-data…
17
U-SQL Query Execution
Redefinition of big-data…
18
U-SQL Performance Analysis
Analyze the critical path, heat maps, playback, and runtime metrics on every vertex…
Tuning for
Cost Efficiency
Dips down to 1 active vertex at
these times
Smallest estimated time when
given 2425 ADLAUs
1410 seconds
= 23.5 minutes
Model with 100 ADLAUs
8709 seconds
= 145.5 minutes
Data Storage
• Files
• Tables
• Unstructured Data in files
• Files are split into 250MB extents
• 4 extents per vertex -> 1GB per vertex
• Different file content formats:
• Splittable formats are parallelizable:
• row-oriented (CSV etc)
• Where data does not span extents
• Non-splittable formats cannot be parallelized:
• XML/JSON
• Have to be processed in single vertex extractor with
atomicFileProcessing=true.
• Use File Sets to provide semantic partition pruning
• Tables
• Clustered Index (row-oriented) storage
• Vertical and horizontal partitioning
• Statistics for the optimizer (CREATE STATISTICS)
• Native scalar value serialization
Querying
unstructured data
// Unstructured Files (24 hours daily log impressions)
@Impressions =
EXTRACT ClientId int, Market string, OS string, ...
FROM @"wasb://ads@wcentralus/2015/10/30/{*}.nif"
FROM @"wasb://ads@wcentralus/2015/10/30/{Market}_{*}.nif"
;
// …
// Filter to by Market
@US =
SELECT * FROM @Impressions
WHERE Market == "en"
;
U-SQL Optimizations
Partition Elimination – Unstructured Files
Partition Elimination
• Even with unstructured files!
• Leverage Virtual Columns (Named)
• Avoid unnamed {*}
• WHERE predicates on named virtual columns
• That binds the PE range during compilation time
• Named virtual columns without predicate = warning
• Design directories/files with PE in mind
• Design for elimination early in the tree, not in the leaves
Extracts all files in the folder
Post filter = pay I/O cost to drop most data
PE pushes this predicate to the EXTRACT
EXTRACT now only reads “en” files!
en_10.0.nif
en_8.1.nif
de_10.0.nif
jp_7.0.nif
de_8.1.nif
../2015/10/30/
…
How many clicks per domain?
@rows = SELECT
Domain,
SUM(Clicks) AS TotalClicks
FROM @ClickData
GROUP BY Domain;
File
Read Read
Partition Partition
Full Agg
Write
Full Agg
Write
Full Agg
Write
Read
Partition
Partial Agg Partial Agg Partial Agg
CNN,
FB,
WH
EXTENT 1 EXTENT 2 EXTENT 3
CNN,
FB,
WH
CNN,
FB,
WH
U-SQL Table Distributed by Domain
Read Read
Full Agg Full Agg
Write Write
Read
Full Agg
Write
FB
EXTENT 1
WH
EXTENT 2
CNN
EXTENT 3
Expensive!
Scaling out with
Distributions
Data
Partitioning
Tables
Table Partitioning and Distribution
• Fine grained (horizontal) partitioning/distribution
• Distributes within a partition (together with clustering) to keep same
data values close
• Choose for:
• Join alignment, partition size, filter selectivity, partition elimination
• Coarse grained (vertical) partitioning
• Based on Partition keys
• Partition is addressable in language
• Query predicates will allow partition pruning
• Choose for data life cycle management, partition elimination
Distribution
Scheme
When to use?
HASH(keys) Automatic Hash for fast item lookup
DIRECT HASH(id) Exact control of hash bucket value
RANGE(keys) Keeps ranges together
ROUND ROBIN To get equal distribution (if others give skew)
Partitions,
Distributions
and Clusters
TABLE
T ( id …
, C …
, date DateTime, …
, INDEX i
CLUSTERED (id, C)
PARTITIONED BY
(date)
DISTRIBUTED BY
HASH(id) INTO 4)
PARTITION (@date1) PARTITION (@date2) PARTITION (@date3)
HASH DISTRIBUTION 1
HASH DISTRIBUTION 2
HASH DISTRIBUTION 3
HASH DISTRIBUTION 1
HASH DISTRIBUTION 1
HASH DISTRIBUTION 2
HASH DISTRIBUTION 3
HASH DISTRIBUTION 4 HASH DISTRIBUTION 3
C1
C2
C3
C1
C2
C4
C5
C4
C6
C6
C7
C8
C7
C5
C6
C9
C10
C1
C3
/catalog/…/tables/Guid(T)/
Guid(T.p1).ss Guid(T.p2).ss Guid(T.p3).ss
LOGICAL
PHYSICAL
Benefits of
Clustered
Index in
Distribution
Benefits
• Design for most frequent/costly queries
• Manage data skew in distribution bucket
• Provide locality of same data values
• Provide seeks and range scans for query predicates (index
lookup)
Clustered index in tables is mandatory, chose according to
desired benefits
Pro Tip:
Distribution keys should be prefix of Clustered Index keys:
Especially for RANGE distribution
Optimizer will make use of global ordering then:
If you make the RANGE distribution key a prefix of the index key, U-
SQL will repartition on demand to align any UNIONALLed or JOINed
tables or partitions!
Split points of table distribution partitions are choosen independently,
so any partitioned table can do UNION ALL in this manner if the data
is to be processed subsequently on the distribution key.
Benefits of
Distribution in
Tables
Benefits
• Design for most frequent/costly queries
• Manage data skew in partition/table
• Manage parallelism in querying (by number of
distributions)
• Manage minimizing data movement in joins
• Provide distribution seeks and range scans for query
predicates (distribution bucket elimination)
Distribution in tables is mandatory, chose according to
desired benefits
Benefits of
Partitioned
Tables
Benefits
• Partitions are addressable
• Enables finer-grained data lifecycle management at
partition level
• Manage parallelism in querying by number of partitions
• Query predicates provide partition elimination
• Predicate has to be constant-foldable
Use partitioned tables for
• Managing large amounts of incrementally growing
structured data
• Queries with strong locality predicates
• point in time, for specific market etc
• Managing windows of data
• provide data for last x months for processing
Partitioned
tables
Use partitioned tables
for querying parts of
large amounts of
incrementally growing
structured data
Get partition
elimination
optimizations with the
right query predicates
Creating partition table
CREATE TABLE PartTable(id int, event_date DateTime, lat float, long float
, INDEX idx CLUSTERED (vehicle_id ASC)
PARTITIONED BY(event_date) DISTRIBUTED BY HASH (vehicle_id) INTO 4);
Creating partitions
DECLARE @pdate1 DateTime = new DateTime(2014, 9, 14,
00,00,00,00,DateTimeKind.Utc);
DECLARE @pdate2 DateTime = new DateTime(2014, 9, 15,
00,00,00,00,DateTimeKind.Utc);
ALTER TABLE vehiclesP ADD PARTITION (@pdate1), PARTITION (@pdate2);
Loading data into partitions dynamically
DECLARE @date1 DateTime = DateTime.Parse("2014-09-14");
DECLARE @date2 DateTime = DateTime.Parse("2014-09-16");
INSERT INTO vehiclesP ON INTEGRITY VIOLATION IGNORE
SELECT vehicle_id, event_date, lat, long FROM @data
WHERE event_date >= @date1 AND event_date <= @date2;
• Filters and inserts clean data only, ignore “dirty” data
Loading data into partitions statically
ALTER TABLE vehiclesP ADD PARTITION (@pdate1), PARTITION (@baddate);
INSERT INTO vehiclesP ON INTEGRITY VIOLATION MOVE TO @baddate
SELECT vehicle_id, lat, long FROM @data
WHERE event_date >= @date1 AND event_date <= @date2;
@Impressions =
SELECT * FROM
searchDM.SML.PageView(@start, @end) AS PageView
OPTION(SKEWFACTOR(Query)=0.5)
;
// Q1(A,B)
@Sessions =
SELECT
ClientId,
Query,
SUM(PageClicks) AS Clicks
FROM
@Impressions
GROUP BY
Query, ClientId
;
// Q2(B)
@Display =
SELECT * FROM @Sessions
INNER JOIN @Campaigns
ON @Sessions.Query == @Campaigns.Query
;
U-SQL Optimizations
Distributions – Minimize (re)partitions
Input must be distributed on:
(Query)
Input must be distributed on:
(Query) or (ClientId) or (Query, ClientId)
Optimizer wants to distribute only once
But Query could be skewed
Data Distribution
• Re-Distributing is very expensive
• Many U-SQL operators can handle multiple distribution choices
• Optimizer bases decision upon estimations
Wrong statistics may result in worse query performance
// Unstructured (24 hours daily log impressions)
@Huge = EXTRACT ClientId int, ...
FROM
@"wasb://ads@wcentralus/2015/10/30/{*}.nif"
;
// Small subset (ie: ForgetMe opt out)
@Small = SELECT * FROM @Huge
WHERE Bing.ForgetMe(x,y,z)
OPTION(ROWCOUNT=500)
;
// Result (not enough info to determine simple Broadcast
join)
@Remove = SELECT * FROM Bing.Sessions
INNER JOIN @Small ON Sessions.Client ==
@Small.Client
;
U-SQL Optimizations
Distribution - Cardinality
Broadcast JOIN right?
Broadcast is now a candidate.
Wrong statistics may result in worse query performance
=> CREATE STATISTICS
Optimizer has no stats this is small...
What is Data Skew?
• Some data points are
much more common
than others
• data may be
distributed such that
all rows that match a
certain key go to a
single vertex
• imbalanced
execution, vertex
time out.
0
5,000,000
10,000,000
15,000,000
20,000,000
25,000,000
30,000,000
35,000,000
40,000,000
California
Florida
Ohio
NorthCarolina
Washington
Indiana
Maryland
Colorado
Louisiana
Oklahoma
Mississippi
Utah
Nebraska
Hawaii
RhodeIsland
SouthDakota
DistrictofColumbia
Population by State
Low Distinctiveness Keys
• Keys with small
selectivity can lead
to large vertices even
without skew
@rows =
SELECT Gender, AGG<MyAgg>(…) AS Result
FROM @HugeInput
GROUP BY Gender;
Gender==Male Gender==Female
@HugeInput
Vertex 0 Vertex 1
Why is this a problem?
Vertexes have a 5 hour runtime limit!
Your UDO or join may excessively allocate memory.
• Your memory usage may not be obvious due to garbage collection
Addressing Data Skew/Low distinctiveness
• Improve data partition sizes:
• Find more fine grained keys, eg, states and congressional districts or ZIP codes
• If no fine grained keys can be found or are too fine-grained: use ROUND ROBIN
distribution
• Write queries that can handle data skew:
• Use filters that prune skew out early
• Use Data Hints to identify skew and “low distinctness” in keys:
• SKEWFACTOR(columns) = x
provides hint that given columns have a skew factor x between
0 (no skew) and 1 (very heavy skew))
• DISTINCTVALUE(columns) = n
let’s you specify how many distinct values the given columns have (n>1)
• Implement aggregation/reducer recursively if possible
Non-Recursive vs Recursive SUM
1 2 3 4 5 6 7 8 36
1 2 3 4 5 6 7 8
6 15 15
36
U-SQL Partitioning during Processing
Data Skew
U-SQL Partitioning
Data Skew – Recursive Reducer
// Metrics per domain
@Metric =
REDUCE @Impressions ON UrlDomain
USING new Bing.TopNReducer(count:10)
;
// …
Inherent Data Skew
[SqlUserDefinedReducer(IsRecursive = true)]
public class TopNReducer : IReducer
{
public override IEnumerable<IRow>
Reduce(IRowset input, IUpdatableRow output)
{
// Compute TOP(N) per group
// …
}
}
Recursive
• Allow multi-stage aggregation trees
• Requires same schema (input => output)
• Requires associativity:
• R(x, y) = R( R(x), R(y) )
• Default = non-recursive
• User code has to honor recursive semantics
www.bing.com
brought to a single vertex
// Bing impressions
@Impressions = SELECT * FROM
searchDM.SML.PageView(@start, @end) AS PageView
;
// Compute sessions
@Sessions =
REDUCE @Impressions ON Client, Market
READONLY Market
USING new Bing.SessionReducer(range : 30)
;
// Users metrics
@Metrics =
SELECT * FROM @Sessions
WHERE
Market == "en-us"
;
// …
Microsoft Confidential
U-SQL Optimizations
Predicate pushing – UDO pass-through columns
Show me U-
SQL UDOs!
https://blogs.msdn.microsoft.com/azuredatalake/20
16/06/27/how-do-i-combine-overlapping-ranges-
using-u-sql-introducing-u-sql-reducer-udos/
// Bing impressions
@Impressions = SELECT * FROM
searchDM.SML.PageView(@start, @end) AS PageView
;
// Compute page views
@Impressions =
PROCESS @Impressions
READONLY Market
PRODUCE Client, Market, Header string
USING new Bing.HtmlProcessor()
;
@Sessions =
REDUCE @Impressions ON Client, Market
READONLY Market
USING new Bing.SessionReducer(range : 30)
;
// Users metrics
@Metrics =
SELECT * FROM @Sessions
WHERE
Market == "en-us"
;
Microsoft Confidential
U-SQL Optimizations
Predicate pushing – UDO row level processors
public abstract class IProcessor : IUserDefinedOperator
{
/// <summary/>
public abstract IRow Process(IRow input, IUpdatableRow output);
}
public abstract class IReducer : IUserDefinedOperator
{
/// <summary/>
public abstract IEnumerable<IRow> Reduce(IRowset input, IUpdatableRow output);
}
// Bing impressions
@Impressions = SELECT Client, Market, Html FROM
searchDM.SML.PageView(@start, @end) AS PageView
;
// Compute page views
@Impressions =
PROCESS @Impressions
PRODUCE Client, Market, Header string
USING new Bing.HtmlProcessor()
;
// Users metrics
@Metrics =
SELECT * FROM @Sessions
WHERE
Market == "en-us"
&& Header.Contains("microsoft.com")
AND Header.Contains("microsoft.com")
;
U-SQL Optimizations
Predicate pushing – relational vs. C# semantics
// Bing impressions
@Impressions = SELECT * FROM
searchDM.SML.PageView(@start, @end) AS PageView
;
// Compute page views
@Impressions =
PROCESS @Impressions
PRODUCE *
REQUIRED ClientId, HtmlContent(Header, Footer)
USING new Bing.HtmlProcessor()
;
// Users metrics
@Metrics =
SELECT ClientId, Market, Header FROM @Sessions
WHERE
Market == "en-us"
;
U-SQL Optimizations
Column Pruning and dependencies
C H M
C H M
C H M
Column Pruning
• Minimize I/O (data shuffling)
• Minimize CPU (complex processing, html)
• Requires dependency knowledge:
• R(D*) = Input ( Output )
• Default no pruning
• User code has to honor reduced columns
A B C D E F G J KH I … M … 1000
UDO Tips
and
Warnings
• Tips when Using UDOs:
• READONLY clause to allow pushing predicates through UDOs
• REQUIRED clause to allow column pruning through UDOs
• PRESORT on REDUCE if you need global order
• Hint Cardinality if it does choose the wrong plan
• Warnings and better alternatives:
• Use SELECT with UDFs instead of PROCESS
• Use User-defined Aggregators instead of REDUCE
• Learn to use Windowing Functions (OVER expression)
• Good use-cases for PROCESS/REDUCE/COMBINE:
• The logic needs to dynamically access the input and/or output
schema.
E.g., create a JSON doc for the data in the row where the columns
are not known apriori.
• Your UDF based solution creates too much memory pressure and
you can write your code more memory efficient in a UDO
• You need an ordered Aggregator or produce more than 1 row
per group
INSERT Multiple INSERTs into same table
• Generates separate file per insert in physical
storage:
• Can lead to performance degradation
• Recommendations:
• Try to avoid small inserts
• Rebuild table after frequent insertions with:
ALTER TABLE T REBUILD;
Future Items
GA and beyond • Tooling
• Resource planning based on $-cost
• Storage support
• Storage compression (available since this week!)
• Columnar Storage/Index
• Secondary Index
Additional
Resources
Blogs and community page:
• http://usql.io (U-SQL Github)
• http://blogs.msdn.microsoft.com/mrys/
• http://blogs.msdn.microsoft.com/azuredatalake/
• https://channel9.msdn.com/Search?term=U-SQL#ch9Search
Documentation and articles:
• http://aka.ms/usql_reference
• https://azure.microsoft.com/en-us/documentation/services/data-lake-
analytics/
• https://msdn.microsoft.com/en-us/magazine/mt614251
ADL forums and feedback
• http://aka.ms/adlfeedback
• https://social.msdn.microsoft.com/Forums/azure/en-
US/home?forum=AzureDataLake
• http://stackoverflow.com/questions/tagged/u-sql
Slide decks
• http://www.Slideshare.net/MichaelRys
Explore Everything PASS Has to Offer
FREE ONLINE WEBINAR EVENTS FREE 1-DAY LOCAL TRAINING EVENTS
LOCAL USER GROUPS
AROUND THE WORLD
ONLINE SPECIAL INTEREST
USER GROUPS
BUSINESS ANALYTICS TRAINING
VOLUNTEERING OPPORTUNITIES
PASS COMMUNITY NEWSLETTER
BA INSIGHTS NEWSLETTERFREE ONLINE RESOURCES
Session Evaluations
ways to access
Go to passSummit.com Download the GuideBook App
and search: PASS Summit 2016
Follow the QR code link displayed
on session signage throughout the
conference venue and in the
program guide
Submit by 5pm
Friday November 6th to
WIN prizes
Your feedback is
important and valuable. 3
Thank You
Learn more from
Michael Rys
usql@microsoft.com or follow @MikeDoesBigData

More Related Content

What's hot

U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...Michael Rys
 
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Michael Rys
 
U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)Michael Rys
 
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Michael Rys
 
Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Michael Rys
 
Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLMichael Rys
 
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLTaming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLMichael Rys
 
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...Michael Rys
 
Azure Data Lake and U-SQL
Azure Data Lake and U-SQLAzure Data Lake and U-SQL
Azure Data Lake and U-SQLMichael Rys
 
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)Michael Rys
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop StoryMichael Rys
 
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)Michael Rys
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Michael Rys
 
U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)Michael Rys
 
U-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningU-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningMichael Rys
 
Azure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep DiveAzure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep DiveIlyas F ☁☁☁
 
U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)Michael Rys
 
U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)Michael Rys
 
ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300Mark Kromer
 

What's hot (20)

U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
 
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
 
U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)U-SQL Meta Data Catalog (SQLBits 2016)
U-SQL Meta Data Catalog (SQLBits 2016)
 
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
 
Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)
 
Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQL
 
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLTaming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
 
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
 
Azure Data Lake and U-SQL
Azure Data Lake and U-SQLAzure Data Lake and U-SQL
Azure Data Lake and U-SQL
 
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop Story
 
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
 
Azure data lake sql konf 2016
Azure data lake   sql konf 2016Azure data lake   sql konf 2016
Azure data lake sql konf 2016
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
 
U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)
 
U-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningU-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance Tuning
 
Azure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep DiveAzure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep Dive
 
U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)U-SQL Intro (SQLBits 2016)
U-SQL Intro (SQLBits 2016)
 
U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)
 
ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300
 

Viewers also liked

Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Michael Rys
 
U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)Michael Rys
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersMichael Rys
 
FileTable and Semantic Search in SQL Server 2012
FileTable and Semantic Search in SQL Server 2012FileTable and Semantic Search in SQL Server 2012
FileTable and Semantic Search in SQL Server 2012Michael Rys
 
02 JavaScript Syntax
02 JavaScript Syntax02 JavaScript Syntax
02 JavaScript SyntaxYnon Perek
 
Web Programming Intro
Web Programming IntroWeb Programming Intro
Web Programming IntroYnon Perek
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeBizTalk360
 

Viewers also liked (9)

Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for Developers
 
Running SQL Queries on the Moodle Database
Running SQL Queries on the Moodle DatabaseRunning SQL Queries on the Moodle Database
Running SQL Queries on the Moodle Database
 
FileTable and Semantic Search in SQL Server 2012
FileTable and Semantic Search in SQL Server 2012FileTable and Semantic Search in SQL Server 2012
FileTable and Semantic Search in SQL Server 2012
 
02 JavaScript Syntax
02 JavaScript Syntax02 JavaScript Syntax
02 JavaScript Syntax
 
Web Programming Intro
Web Programming IntroWeb Programming Intro
Web Programming Intro
 
100 sql queries
100 sql queries100 sql queries
100 sql queries
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data Lake
 

Similar to Tuning and Optimizing U-SQL Queries (SQLPASS 2016)

Informix partitioning interval_rolling_window_table
Informix partitioning interval_rolling_window_tableInformix partitioning interval_rolling_window_table
Informix partitioning interval_rolling_window_tableKeshav Murthy
 
Optimizing Data Accessin Sq Lserver2005
Optimizing Data Accessin Sq Lserver2005Optimizing Data Accessin Sq Lserver2005
Optimizing Data Accessin Sq Lserver2005rainynovember12
 
Tech-Spark: Scaling Databases
Tech-Spark: Scaling DatabasesTech-Spark: Scaling Databases
Tech-Spark: Scaling DatabasesRalph Attard
 
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...Datavail
 
SQL Server 2008 Development for Programmers
SQL Server 2008 Development for ProgrammersSQL Server 2008 Development for Programmers
SQL Server 2008 Development for ProgrammersAdam Hutson
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Amazon Web Services
 
Be A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data PipelineBe A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data PipelineChester Chen
 
2017 AWS DB Day | Amazon Redshift 소개 및 실습
2017 AWS DB Day | Amazon Redshift  소개 및 실습2017 AWS DB Day | Amazon Redshift  소개 및 실습
2017 AWS DB Day | Amazon Redshift 소개 및 실습Amazon Web Services Korea
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationVolodymyr Rovetskiy
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
TCC14 tour hague optimising workbooks
TCC14 tour hague optimising workbooksTCC14 tour hague optimising workbooks
TCC14 tour hague optimising workbooksMrunal Shridhar
 
Ten tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache HiveTen tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache HiveWill Du
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sqlŁukasz Grala
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftAmazon Web Services
 

Similar to Tuning and Optimizing U-SQL Queries (SQLPASS 2016) (20)

Informix partitioning interval_rolling_window_table
Informix partitioning interval_rolling_window_tableInformix partitioning interval_rolling_window_table
Informix partitioning interval_rolling_window_table
 
Optimizing Data Accessin Sq Lserver2005
Optimizing Data Accessin Sq Lserver2005Optimizing Data Accessin Sq Lserver2005
Optimizing Data Accessin Sq Lserver2005
 
Tech-Spark: Scaling Databases
Tech-Spark: Scaling DatabasesTech-Spark: Scaling Databases
Tech-Spark: Scaling Databases
 
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...
SQL Pass Summit Presentations from Datavail - Optimize SQL Server: Query Tuni...
 
DB
DBDB
DB
 
SQL Server 2008 Development for Programmers
SQL Server 2008 Development for ProgrammersSQL Server 2008 Development for Programmers
SQL Server 2008 Development for Programmers
 
SQLServer Database Structures
SQLServer Database Structures SQLServer Database Structures
SQLServer Database Structures
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Be A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data PipelineBe A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data Pipeline
 
2017 AWS DB Day | Amazon Redshift 소개 및 실습
2017 AWS DB Day | Amazon Redshift  소개 및 실습2017 AWS DB Day | Amazon Redshift  소개 및 실습
2017 AWS DB Day | Amazon Redshift 소개 및 실습
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentation
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
TCC14 tour hague optimising workbooks
TCC14 tour hague optimising workbooksTCC14 tour hague optimising workbooks
TCC14 tour hague optimising workbooks
 
Ten tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache HiveTen tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache Hive
 
3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql3 CityNetConf - sql+c#=u-sql
3 CityNetConf - sql+c#=u-sql
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 

More from Michael Rys

Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Michael Rys
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Michael Rys
 
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Michael Rys
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Michael Rys
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Michael Rys
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Michael Rys
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Michael Rys
 

More from Michael Rys (7)

Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 

Recently uploaded

Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 

Recently uploaded (20)

Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 

Tuning and Optimizing U-SQL Queries (SQLPASS 2016)

  • 1. Tuning and Optimizing U-SQL queries for maximum performance Michael Rys, Principal Program Manager, Microsoft @MikeDoesBigData, usql@microsoft.com AD-315-MAD-400-M
  • 3. Session Objectives And Takeaways Session Objective(s): • Understand the U-SQL Query execution • Be able to understand and improve U-SQL job performance/cost • Be able to understand and improve the U-SQL query plan • Know how to write more efficient U-SQL scripts Key Takeaways: • U-SQL is designed for scale-out • U-SQL provides scalable execution of user code • U-SQL has a tool set that can help you analyze and improve your scalability, cost and performance
  • 4. Agenda • Job Execution Experience and Investigations Query Execution Stage Graph Dryad crash course Job Metrics Resource Planning • Job Performance Analysis Analyze the critical path Heat Map Critical Path Data Skew • Tuning / Optimizations Cost Optimizations Data Partitioning Partition Elimination Predicate Pushing Column Pruning Some Data Hints UDOs can be evil INSERT optimizations U-SQL Query Execution and Performance Tuning
  • 5.
  • 6. Job Scheduler & Queue Front-EndService 6 Vertex Execution Consume Overall U-SQL Batch Job Execution Lifetime Local Storage Data Lake Store Author Plan Compiler Optimizer Vertexes running in YARN Containers U-SQL Runtime Optimized Plan Vertex Scheduling On containers Job Manager USQL Compiler Service & USQL Catalog
  • 7. Expression-flow Programming Style Automatic "in-lining" of U-SQL expressions – whole script leads to a single execution model. Execution plan that is optimized out-of-the- box and w/o user intervention. Per job and user driven level of parallelization. Detail visibility into execution steps, for debugging. Heatmap like functionality to identify performance bottlenecks.
  • 8. U-SQL Compilation Process C# C++ Algebra Other files (system files, deployed resources) managed dll Unmanaged dll Compilation output (in job folder) Compiler & Optimizer U-SQL Metadata Service Deployed to Vertices
  • 9.
  • 11. Parallelism 1000 (ADLAUs) Work composed of 12K Vertices 1 ADLAU currently maps to a VM with 2 cores and 6 GB of memory
  • 12. U-SQL Query Execution Physical plans vs. Dryad stage graph…
  • 13. Stage Details 252 Pieces of work AVG Vertex execution time 4.3 Billion rows Data Read & Written Super Vertex = Stage
  • 16. 18 U-SQL Performance Analysis Analyze the critical path, heat maps, playback, and runtime metrics on every vertex…
  • 17.
  • 19. Dips down to 1 active vertex at these times
  • 20. Smallest estimated time when given 2425 ADLAUs 1410 seconds = 23.5 minutes
  • 21. Model with 100 ADLAUs 8709 seconds = 145.5 minutes
  • 22.
  • 23. Data Storage • Files • Tables • Unstructured Data in files • Files are split into 250MB extents • 4 extents per vertex -> 1GB per vertex • Different file content formats: • Splittable formats are parallelizable: • row-oriented (CSV etc) • Where data does not span extents • Non-splittable formats cannot be parallelized: • XML/JSON • Have to be processed in single vertex extractor with atomicFileProcessing=true. • Use File Sets to provide semantic partition pruning • Tables • Clustered Index (row-oriented) storage • Vertical and horizontal partitioning • Statistics for the optimizer (CREATE STATISTICS) • Native scalar value serialization
  • 25. // Unstructured Files (24 hours daily log impressions) @Impressions = EXTRACT ClientId int, Market string, OS string, ... FROM @"wasb://ads@wcentralus/2015/10/30/{*}.nif" FROM @"wasb://ads@wcentralus/2015/10/30/{Market}_{*}.nif" ; // … // Filter to by Market @US = SELECT * FROM @Impressions WHERE Market == "en" ; U-SQL Optimizations Partition Elimination – Unstructured Files Partition Elimination • Even with unstructured files! • Leverage Virtual Columns (Named) • Avoid unnamed {*} • WHERE predicates on named virtual columns • That binds the PE range during compilation time • Named virtual columns without predicate = warning • Design directories/files with PE in mind • Design for elimination early in the tree, not in the leaves Extracts all files in the folder Post filter = pay I/O cost to drop most data PE pushes this predicate to the EXTRACT EXTRACT now only reads “en” files! en_10.0.nif en_8.1.nif de_10.0.nif jp_7.0.nif de_8.1.nif ../2015/10/30/ …
  • 26.
  • 27. How many clicks per domain? @rows = SELECT Domain, SUM(Clicks) AS TotalClicks FROM @ClickData GROUP BY Domain;
  • 28. File Read Read Partition Partition Full Agg Write Full Agg Write Full Agg Write Read Partition Partial Agg Partial Agg Partial Agg CNN, FB, WH EXTENT 1 EXTENT 2 EXTENT 3 CNN, FB, WH CNN, FB, WH U-SQL Table Distributed by Domain Read Read Full Agg Full Agg Write Write Read Full Agg Write FB EXTENT 1 WH EXTENT 2 CNN EXTENT 3 Expensive!
  • 30. Data Partitioning Tables Table Partitioning and Distribution • Fine grained (horizontal) partitioning/distribution • Distributes within a partition (together with clustering) to keep same data values close • Choose for: • Join alignment, partition size, filter selectivity, partition elimination • Coarse grained (vertical) partitioning • Based on Partition keys • Partition is addressable in language • Query predicates will allow partition pruning • Choose for data life cycle management, partition elimination Distribution Scheme When to use? HASH(keys) Automatic Hash for fast item lookup DIRECT HASH(id) Exact control of hash bucket value RANGE(keys) Keeps ranges together ROUND ROBIN To get equal distribution (if others give skew)
  • 31. Partitions, Distributions and Clusters TABLE T ( id … , C … , date DateTime, … , INDEX i CLUSTERED (id, C) PARTITIONED BY (date) DISTRIBUTED BY HASH(id) INTO 4) PARTITION (@date1) PARTITION (@date2) PARTITION (@date3) HASH DISTRIBUTION 1 HASH DISTRIBUTION 2 HASH DISTRIBUTION 3 HASH DISTRIBUTION 1 HASH DISTRIBUTION 1 HASH DISTRIBUTION 2 HASH DISTRIBUTION 3 HASH DISTRIBUTION 4 HASH DISTRIBUTION 3 C1 C2 C3 C1 C2 C4 C5 C4 C6 C6 C7 C8 C7 C5 C6 C9 C10 C1 C3 /catalog/…/tables/Guid(T)/ Guid(T.p1).ss Guid(T.p2).ss Guid(T.p3).ss LOGICAL PHYSICAL
  • 32. Benefits of Clustered Index in Distribution Benefits • Design for most frequent/costly queries • Manage data skew in distribution bucket • Provide locality of same data values • Provide seeks and range scans for query predicates (index lookup) Clustered index in tables is mandatory, chose according to desired benefits Pro Tip: Distribution keys should be prefix of Clustered Index keys: Especially for RANGE distribution Optimizer will make use of global ordering then: If you make the RANGE distribution key a prefix of the index key, U- SQL will repartition on demand to align any UNIONALLed or JOINed tables or partitions! Split points of table distribution partitions are choosen independently, so any partitioned table can do UNION ALL in this manner if the data is to be processed subsequently on the distribution key.
  • 33. Benefits of Distribution in Tables Benefits • Design for most frequent/costly queries • Manage data skew in partition/table • Manage parallelism in querying (by number of distributions) • Manage minimizing data movement in joins • Provide distribution seeks and range scans for query predicates (distribution bucket elimination) Distribution in tables is mandatory, chose according to desired benefits
  • 34. Benefits of Partitioned Tables Benefits • Partitions are addressable • Enables finer-grained data lifecycle management at partition level • Manage parallelism in querying by number of partitions • Query predicates provide partition elimination • Predicate has to be constant-foldable Use partitioned tables for • Managing large amounts of incrementally growing structured data • Queries with strong locality predicates • point in time, for specific market etc • Managing windows of data • provide data for last x months for processing
  • 35. Partitioned tables Use partitioned tables for querying parts of large amounts of incrementally growing structured data Get partition elimination optimizations with the right query predicates Creating partition table CREATE TABLE PartTable(id int, event_date DateTime, lat float, long float , INDEX idx CLUSTERED (vehicle_id ASC) PARTITIONED BY(event_date) DISTRIBUTED BY HASH (vehicle_id) INTO 4); Creating partitions DECLARE @pdate1 DateTime = new DateTime(2014, 9, 14, 00,00,00,00,DateTimeKind.Utc); DECLARE @pdate2 DateTime = new DateTime(2014, 9, 15, 00,00,00,00,DateTimeKind.Utc); ALTER TABLE vehiclesP ADD PARTITION (@pdate1), PARTITION (@pdate2); Loading data into partitions dynamically DECLARE @date1 DateTime = DateTime.Parse("2014-09-14"); DECLARE @date2 DateTime = DateTime.Parse("2014-09-16"); INSERT INTO vehiclesP ON INTEGRITY VIOLATION IGNORE SELECT vehicle_id, event_date, lat, long FROM @data WHERE event_date >= @date1 AND event_date <= @date2; • Filters and inserts clean data only, ignore “dirty” data Loading data into partitions statically ALTER TABLE vehiclesP ADD PARTITION (@pdate1), PARTITION (@baddate); INSERT INTO vehiclesP ON INTEGRITY VIOLATION MOVE TO @baddate SELECT vehicle_id, lat, long FROM @data WHERE event_date >= @date1 AND event_date <= @date2;
  • 36. @Impressions = SELECT * FROM searchDM.SML.PageView(@start, @end) AS PageView OPTION(SKEWFACTOR(Query)=0.5) ; // Q1(A,B) @Sessions = SELECT ClientId, Query, SUM(PageClicks) AS Clicks FROM @Impressions GROUP BY Query, ClientId ; // Q2(B) @Display = SELECT * FROM @Sessions INNER JOIN @Campaigns ON @Sessions.Query == @Campaigns.Query ; U-SQL Optimizations Distributions – Minimize (re)partitions Input must be distributed on: (Query) Input must be distributed on: (Query) or (ClientId) or (Query, ClientId) Optimizer wants to distribute only once But Query could be skewed Data Distribution • Re-Distributing is very expensive • Many U-SQL operators can handle multiple distribution choices • Optimizer bases decision upon estimations Wrong statistics may result in worse query performance
  • 37. // Unstructured (24 hours daily log impressions) @Huge = EXTRACT ClientId int, ... FROM @"wasb://ads@wcentralus/2015/10/30/{*}.nif" ; // Small subset (ie: ForgetMe opt out) @Small = SELECT * FROM @Huge WHERE Bing.ForgetMe(x,y,z) OPTION(ROWCOUNT=500) ; // Result (not enough info to determine simple Broadcast join) @Remove = SELECT * FROM Bing.Sessions INNER JOIN @Small ON Sessions.Client == @Small.Client ; U-SQL Optimizations Distribution - Cardinality Broadcast JOIN right? Broadcast is now a candidate. Wrong statistics may result in worse query performance => CREATE STATISTICS Optimizer has no stats this is small...
  • 38.
  • 39. What is Data Skew? • Some data points are much more common than others • data may be distributed such that all rows that match a certain key go to a single vertex • imbalanced execution, vertex time out. 0 5,000,000 10,000,000 15,000,000 20,000,000 25,000,000 30,000,000 35,000,000 40,000,000 California Florida Ohio NorthCarolina Washington Indiana Maryland Colorado Louisiana Oklahoma Mississippi Utah Nebraska Hawaii RhodeIsland SouthDakota DistrictofColumbia Population by State
  • 40. Low Distinctiveness Keys • Keys with small selectivity can lead to large vertices even without skew @rows = SELECT Gender, AGG<MyAgg>(…) AS Result FROM @HugeInput GROUP BY Gender; Gender==Male Gender==Female @HugeInput Vertex 0 Vertex 1
  • 41. Why is this a problem? Vertexes have a 5 hour runtime limit! Your UDO or join may excessively allocate memory. • Your memory usage may not be obvious due to garbage collection
  • 42. Addressing Data Skew/Low distinctiveness • Improve data partition sizes: • Find more fine grained keys, eg, states and congressional districts or ZIP codes • If no fine grained keys can be found or are too fine-grained: use ROUND ROBIN distribution • Write queries that can handle data skew: • Use filters that prune skew out early • Use Data Hints to identify skew and “low distinctness” in keys: • SKEWFACTOR(columns) = x provides hint that given columns have a skew factor x between 0 (no skew) and 1 (very heavy skew)) • DISTINCTVALUE(columns) = n let’s you specify how many distinct values the given columns have (n>1) • Implement aggregation/reducer recursively if possible
  • 43. Non-Recursive vs Recursive SUM 1 2 3 4 5 6 7 8 36 1 2 3 4 5 6 7 8 6 15 15 36
  • 44. U-SQL Partitioning during Processing Data Skew
  • 45. U-SQL Partitioning Data Skew – Recursive Reducer // Metrics per domain @Metric = REDUCE @Impressions ON UrlDomain USING new Bing.TopNReducer(count:10) ; // … Inherent Data Skew [SqlUserDefinedReducer(IsRecursive = true)] public class TopNReducer : IReducer { public override IEnumerable<IRow> Reduce(IRowset input, IUpdatableRow output) { // Compute TOP(N) per group // … } } Recursive • Allow multi-stage aggregation trees • Requires same schema (input => output) • Requires associativity: • R(x, y) = R( R(x), R(y) ) • Default = non-recursive • User code has to honor recursive semantics www.bing.com brought to a single vertex
  • 46.
  • 47. // Bing impressions @Impressions = SELECT * FROM searchDM.SML.PageView(@start, @end) AS PageView ; // Compute sessions @Sessions = REDUCE @Impressions ON Client, Market READONLY Market USING new Bing.SessionReducer(range : 30) ; // Users metrics @Metrics = SELECT * FROM @Sessions WHERE Market == "en-us" ; // … Microsoft Confidential U-SQL Optimizations Predicate pushing – UDO pass-through columns
  • 48. Show me U- SQL UDOs! https://blogs.msdn.microsoft.com/azuredatalake/20 16/06/27/how-do-i-combine-overlapping-ranges- using-u-sql-introducing-u-sql-reducer-udos/
  • 49. // Bing impressions @Impressions = SELECT * FROM searchDM.SML.PageView(@start, @end) AS PageView ; // Compute page views @Impressions = PROCESS @Impressions READONLY Market PRODUCE Client, Market, Header string USING new Bing.HtmlProcessor() ; @Sessions = REDUCE @Impressions ON Client, Market READONLY Market USING new Bing.SessionReducer(range : 30) ; // Users metrics @Metrics = SELECT * FROM @Sessions WHERE Market == "en-us" ; Microsoft Confidential U-SQL Optimizations Predicate pushing – UDO row level processors public abstract class IProcessor : IUserDefinedOperator { /// <summary/> public abstract IRow Process(IRow input, IUpdatableRow output); } public abstract class IReducer : IUserDefinedOperator { /// <summary/> public abstract IEnumerable<IRow> Reduce(IRowset input, IUpdatableRow output); }
  • 50. // Bing impressions @Impressions = SELECT Client, Market, Html FROM searchDM.SML.PageView(@start, @end) AS PageView ; // Compute page views @Impressions = PROCESS @Impressions PRODUCE Client, Market, Header string USING new Bing.HtmlProcessor() ; // Users metrics @Metrics = SELECT * FROM @Sessions WHERE Market == "en-us" && Header.Contains("microsoft.com") AND Header.Contains("microsoft.com") ; U-SQL Optimizations Predicate pushing – relational vs. C# semantics
  • 51. // Bing impressions @Impressions = SELECT * FROM searchDM.SML.PageView(@start, @end) AS PageView ; // Compute page views @Impressions = PROCESS @Impressions PRODUCE * REQUIRED ClientId, HtmlContent(Header, Footer) USING new Bing.HtmlProcessor() ; // Users metrics @Metrics = SELECT ClientId, Market, Header FROM @Sessions WHERE Market == "en-us" ; U-SQL Optimizations Column Pruning and dependencies C H M C H M C H M Column Pruning • Minimize I/O (data shuffling) • Minimize CPU (complex processing, html) • Requires dependency knowledge: • R(D*) = Input ( Output ) • Default no pruning • User code has to honor reduced columns A B C D E F G J KH I … M … 1000
  • 52. UDO Tips and Warnings • Tips when Using UDOs: • READONLY clause to allow pushing predicates through UDOs • REQUIRED clause to allow column pruning through UDOs • PRESORT on REDUCE if you need global order • Hint Cardinality if it does choose the wrong plan • Warnings and better alternatives: • Use SELECT with UDFs instead of PROCESS • Use User-defined Aggregators instead of REDUCE • Learn to use Windowing Functions (OVER expression) • Good use-cases for PROCESS/REDUCE/COMBINE: • The logic needs to dynamically access the input and/or output schema. E.g., create a JSON doc for the data in the row where the columns are not known apriori. • Your UDF based solution creates too much memory pressure and you can write your code more memory efficient in a UDO • You need an ordered Aggregator or produce more than 1 row per group
  • 53.
  • 54. INSERT Multiple INSERTs into same table • Generates separate file per insert in physical storage: • Can lead to performance degradation • Recommendations: • Try to avoid small inserts • Rebuild table after frequent insertions with: ALTER TABLE T REBUILD;
  • 55. Future Items GA and beyond • Tooling • Resource planning based on $-cost • Storage support • Storage compression (available since this week!) • Columnar Storage/Index • Secondary Index
  • 56. Additional Resources Blogs and community page: • http://usql.io (U-SQL Github) • http://blogs.msdn.microsoft.com/mrys/ • http://blogs.msdn.microsoft.com/azuredatalake/ • https://channel9.msdn.com/Search?term=U-SQL#ch9Search Documentation and articles: • http://aka.ms/usql_reference • https://azure.microsoft.com/en-us/documentation/services/data-lake- analytics/ • https://msdn.microsoft.com/en-us/magazine/mt614251 ADL forums and feedback • http://aka.ms/adlfeedback • https://social.msdn.microsoft.com/Forums/azure/en- US/home?forum=AzureDataLake • http://stackoverflow.com/questions/tagged/u-sql Slide decks • http://www.Slideshare.net/MichaelRys
  • 57. Explore Everything PASS Has to Offer FREE ONLINE WEBINAR EVENTS FREE 1-DAY LOCAL TRAINING EVENTS LOCAL USER GROUPS AROUND THE WORLD ONLINE SPECIAL INTEREST USER GROUPS BUSINESS ANALYTICS TRAINING VOLUNTEERING OPPORTUNITIES PASS COMMUNITY NEWSLETTER BA INSIGHTS NEWSLETTERFREE ONLINE RESOURCES
  • 58. Session Evaluations ways to access Go to passSummit.com Download the GuideBook App and search: PASS Summit 2016 Follow the QR code link displayed on session signage throughout the conference venue and in the program guide Submit by 5pm Friday November 6th to WIN prizes Your feedback is important and valuable. 3
  • 59. Thank You Learn more from Michael Rys usql@microsoft.com or follow @MikeDoesBigData

Editor's Notes

  1. This slide is required. Do NOT delete. This should be the first slide after your Title Slide. This is an important year and we need to arm our attendees with the information they can use to Grow Share! Please ensure that your objectives are SMART (defined below) and that they will enable them to go in and win against the competition to grow share. If you have questions, please contact your Track PM for guidance. We have also posted guidance on writing good objectives, out on the Speaker Portal (https://www.mytechready.com).   This slide should introduce the session by identifying how this information helps the attendee, partners and customers be more successful. Why is this content important? This slide should call out what’s important about the session (sort of the why should we care, why is this important and how will it help our customers/partners be successful) as well as the key takeaways/objectives associated with the session. Call out what attendees will be able to execute on using the information gained in this session. What will they be able to walk away from this session and execute on with their customers. Good Objectives should be SMART (specific, measurable, achievable, realistic, time-bound). Focus on the key takeaways and why this information is important to the attendee, our partners and our customers. Each session has objectives defined and published on www.mytechready.com, please work with your Track PM to call these out here in the slide deck. If you have questions, please contact your Track PM. See slide 5 in this template for a complete list of Tracks and TPMs.
  2. https://github.com/Azure/usql/tree/master/Examples/AmbulanceDemos/AmbulanceDemos/5-Ambulance-StreamSets-PartitionedTables
  3. https://github.com/Azure/usql/tree/master/Examples/AmbulanceDemos/AmbulanceDemos/5-Ambulance-StreamSets-PartitionedTables
  4. https://github.com/Azure/usql/tree/master/Examples/AmbulanceDemos/AmbulanceDemos/5-Ambulance-StreamSets-PartitionedTables
  5. https://github.com/Azure/usql/tree/master/Examples/AmbulanceDemos/AmbulanceDemos/5-Ambulance-StreamSets-PartitionedTables