Accompanying slides for the ADF team live stream of ETL performance tuning, optimization, and troubleshooting with data flows: https://www.youtube.com/watch?v=5KUek4JfSYs
ETL Best Practices with ADF Data Flows
1. ETL Best Practices, Performance Tuning, and Troubleshooting with Azure Data Factory Data Flows
3. ADF Data Flows: UI Designer, Data Flow Script, and Monitoring View
• UI Designer: design, debug, and manage data transformation logic in a browser UI.
• Data Flow Script: the UI builds data transformation scripts that contain metadata defining your data flow logic. This script payload is combined with the ADF JSON definition of your pipeline activities. ADF spins up a JIT on-demand Databricks cluster and builds an execution plan for your data flow in Spark.
• Monitoring View: based on your Azure IR configuration, ADF spins up Azure Databricks clusters as VMs, and an executor job runs your data transformation logic on Spark. The results of the data ingest, transformation, data partitioning, data egress, and timings all appear in the monitoring view.
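As an illustration, a minimal data flow script of the kind the UI generates might look like the following sketch (the stream names, schema, and filter condition are hypothetical, and real scripts carry more metadata):

```
source(output(
        movieId as integer,
        title as string,
        year as integer
    ),
    allowSchemaDrift: true,
    validateSchema: false) ~> moviesSource
moviesSource filter(year > 1980) ~> recentMovies
recentMovies sink(allowSchemaDrift: true,
    validateSchema: false) ~> moviesSink
```

Each `~>` names an output stream, so transformations chain by referencing the stream name of the step before them; this is the payload that gets compiled into a Spark execution plan.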
6. https://docs.microsoft.com/en-us/azure/data-factory/data-flow-troubleshoot-guide
• Error code: DF-Executor-BroadcastTimeout
• Message: Broadcast join timeout error, make sure broadcast stream produces data within 60 secs in debug runs and 300
secs in job runs
• Causes: Broadcast has a default timeout of 60 secs in debug runs and 300 secs in job runs. The stream chosen for broadcast
is too large to produce data within this limit.
• Recommendation: Avoid broadcasting large data streams where the processing can take more than 60 secs. Choose a
smaller stream to broadcast instead. Large SQL/DW tables and source files are typically bad candidates.
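One way to follow this recommendation in Data Flow Script is to pin the broadcast side explicitly instead of leaving it on auto. The stream names and join key below are hypothetical, and it is assumed here that the right-hand stream is the small one (option values may vary slightly by version):

```
ordersStream, customersStream join(
    ordersStream@customerId == customersStream@customerId,
    joinType: 'inner',
    broadcast: 'right') ~> joinCustomers
```

If neither side is reliably small, switching broadcast off for the join avoids the timeout entirely at the cost of a shuffle join.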
• Error code: Hit unexpected exception and execution failed
• Message: During Data Flow activity execution: Hit unexpected exception and execution failed
• Causes: This is a back-end service error. You can retry the operation and also restart your debug session.
• Recommendation: If retry and restart do not resolve the issue, contact customer support.
• Error code: JSON single line (Corrupt_record)
• Causes: A JSON source file contains a single document that spans multiple lines, but it is read with the default line-delimited setting, so the parser surfaces a corrupt record.
• Recommendation: Enable the 'Single document' option in the JSON source settings.
• Error code: Job failed due to reason: DF-SYS-01 at Sink 'WriteToDatabase': java.sql.BatchUpdateException: String or binary
data would be truncated.
• Recommendation: Add data constraints to your data flow using a Conditional Split, routing rows that exceed the sink column length to an error path instead of the database sink.
• https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-error-rows
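For instance, a Conditional Split can divert rows that would overflow the sink column into a separate error sink. This is only a sketch: the stream names, the `title` column, the 50-character limit, and the error sink are all hypothetical:

```
source1 split(
    length(title) <= 50,
    disjoint: false) ~> lengthCheck@(fitsColumn, tooLong)
fitsColumn sink(allowSchemaDrift: true) ~> WriteToDatabase
tooLong sink(allowSchemaDrift: true) ~> errorRowsSink
```

Rows matching the first condition flow to the database sink; everything else lands in the error sink, so the batch write no longer fails on truncation.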