SlideShare a Scribd company logo
1 of 27
Cloud-First ETL in the Modern Data
Warehouse w/Azure Data Factory
Sr. Program Manager
Azure Data Management
https://github.com/kromerm/Azure-Data-Week-ADF
A fully-managed data integration service in the cloud
A Z U R E D ATA F A C T O R Y
H Y B R I D S C A L A B L EP R O D U C T I V E T R U S T E D
 Serverless scalability
with no infrastructure
to manage
 Drag & Drop UI
 Codeless Data
Movement
 Orchestrate where
your data lives
 Lift SSIS packages
to Azure
 Certified compliant
Data Movement
MODEL & SERVE
Azure Analysis ServicesAzure SQL Data
Warehouse
Power BI
Modernize your enterprise data warehouse at scale
A Z U R E D A T A F A C T O R Y
On-premises data
Oracle, SQL, Teradata,
fileshares, SAP
Cloud data
Azure, AWS, GCP
SaaS data
Salesforce, Workday,
Dynamics
INGEST STORE PREP & TRAIN
Azure Data Factory Azure Blob Storage
Azure Databricks
Polybase
Microsoft Azure also supports other Big Data services like Azure HDInsight, Azure SQL Database and Azure Data Lake to allow customers to tailor the above architecture to meet their unique needs.
Orchestrate with Azure Data Factory
Lift your SQL Server Integration Services (SSIS) packages to Azure
On-Premise data sources
SQL DB Managed Instance
SQL Server
VNET
Azure Data Factory
SSIS Cloud ETL
SSIS Integration Runtime
Cloud data sources
Cloud
On-premises
Microsoft
SQL Server
Integration Services
Control Flow in ADF Pipeline Builder
Coordinate pipeline activities into finite execution steps to enable looping,
conditionals and chaining while separating data transformations into
individual data flows
Activity 1 Activity 2
Activity 3
“On
Error”
Activity 1
Success,
params
Error,
param
s
My Pipeline 1
…
My Pipeline 2For Each…
Activity 4
Success,
params
Trigger
Event
Wall Clock
On Demand
Activity 1
Activity 2
…
Built-in source
control support
ADF Integration Runtime (IR)
 ADF compute environment with multiple capabilities:
- Activity dispatch & monitoring
- Data movement
- SSIS package execution
 To integrate data flow and control flow across the
enterprises’ hybrid cloud, customer can instantiate
multiple IR instances for different network environments:
- On premises (similar to DMG in ADF V1)
- In public cloud
- Inside VNet
 Bring a consistent provision and monitoring experience
across the network environments
Portal Application & SDK
Azure Data Factory Service
Data Movement & Activity
Dispatch on-prem, Cloud,
VNET
Data Movement & Activity
Dispatch In Azure Public
Network, SSIS
VNET coming soon
Self-Hosted IR Azure IR
Azure Data Factory Patterns
Azure SQL Database Change Tracking
Data Flow: Star Schema Fact &
Dimension Table Loading
Pipeline Master / Child Controller Pattern
Microsoft ETL/ELT Services in Azure
• Azure-SSIS IR: Managed cluster of Azure VMs
(nodes) dedicated to run your SSIS packages
and no other activities
• You can scale it up/out by specifying the node
size /number of nodes in the cluster
• You can bring your own Azure SQL Database
(DB)/Managed Instance (MI) server to host the
catalog of SSIS projects/packages (SSISDB) that
will be attached to it
• You can join it to a Virtual Network (VNet) that is
connected to your on-prem network to enable on-
prem data access
• Once provisioned, you can enter your Azure SQL
DB/MI server endpoint on SSDT/SSMS to deploy
SSIS projects/packages and configure/execute
them just like using SSIS on premises
Execute/Manage
Provision
SSDT SSMS
Cloud
On-
Premises Design/Deploy
SSIS Server
Design/Deploy Execute/Manage
ISVs
SSIS PaaS w/
ADF compute
resource
ADF
(PROVISIONING)
HDI
ML
Azure Data Factory
Visual Data Flow
Visual Data Flow Authoring
• Transform Data, At Scale, in the Cloud, Zero-Code
• Cloud-first, scale-out ELT
• Code-free dataflow pipelines
• Serverless scale-out transformation execution engine
• Maximum Productivity for Data Engineers
• Does NOT require understanding of Spark / Scala / Python / Java
• Resilient Data Transformation Flows
• Built for big data scenarios with unstructured data requirements
• Operationalize with Data Factory scheduling, control flow and monitoring
Code-free Data Transformation At Scale
• Does not require understanding of Spark, Big Data Execution
Engines, Clusters, Scala, Python …
• Focus on building business logic and data transformation
• Data cleansing
• Aggregation
• Data conversions
• Data prep
• Data exploration
ADF Data Flow Workstream
Data Sources Staging Transformation
s
Destination
Sort, Merge, Join,
Lookup …
• Explicit user action
• User places data
source(s) on design
surface, from toolbox
• Select explicit sources
• Implicit/Explicit
• Data Lake staging area
as default
• User does not need to
configure this manually
• Advanced feature to set
staging area options
• File Formats / Types
(Parquet, JSON, txt,
CSV …)
• Explicit user action
• User places
transformations on
design surface, from
toolbox
• User must set
properties for
transformation steps
and step connectors
• Explicit user action
• User chooses
destination
connector(s)
• User sets connector
property options
Simple Copy Flow
Build your logical data flows adding data
transformations in a guided experience
Switch to Debug Mode and select sample
data to work with for debugging
Debug mode provides row-level context and
visible results in inspector pane
Debug mode provides row-level context and
visible results in inspector pane
Interactive Expression Builder – Build data
transform expressions, not Spark code
Data Engineer derives columns using template expression
based on name and type matching. No need to define static
field names.

More Related Content

What's hot

What's hot (20)

Introduction to Azure Data Factory
Introduction to Azure Data FactoryIntroduction to Azure Data Factory
Introduction to Azure Data Factory
 
Azure datafactory
Azure datafactoryAzure datafactory
Azure datafactory
 
Core Concepts in azure data factory
Core Concepts in azure data factoryCore Concepts in azure data factory
Core Concepts in azure data factory
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMicrosoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview Slides
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
 
Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)
 
Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the Cloud
 
Intro to Azure Data Factory v1
Intro to Azure Data Factory v1Intro to Azure Data Factory v1
Intro to Azure Data Factory v1
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT Approach
 
ETL Made Easy with Azure Data Factory and Azure Databricks
ETL Made Easy with Azure Data Factory and Azure DatabricksETL Made Easy with Azure Data Factory and Azure Databricks
ETL Made Easy with Azure Data Factory and Azure Databricks
 
Data Migration to Azure
Data Migration to AzureData Migration to Azure
Data Migration to Azure
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 

Similar to Azure Data Factory for Azure Data Week

Similar to Azure Data Factory for Azure Data Week (20)

SQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the CloudSQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
 
Azure Data Factory for Redmond SQL PASS UG Sept 2018
Azure Data Factory for Redmond SQL PASS UG Sept 2018Azure Data Factory for Redmond SQL PASS UG Sept 2018
Azure Data Factory for Redmond SQL PASS UG Sept 2018
 
Azure SQL DB Managed Instances Built to easily modernize application data layer
Azure SQL DB Managed Instances Built to easily modernize application data layerAzure SQL DB Managed Instances Built to easily modernize application data layer
Azure SQL DB Managed Instances Built to easily modernize application data layer
 
ADF Mapping Data Flows Training Slides V1
ADF Mapping Data Flows Training Slides V1ADF Mapping Data Flows Training Slides V1
ADF Mapping Data Flows Training Slides V1
 
Cepta The Future of Data with Power BI
Cepta The Future of Data with Power BICepta The Future of Data with Power BI
Cepta The Future of Data with Power BI
 
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
 
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
 
Data Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptxData Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptx
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure Infrastructures
 
ME_Snowflake_Introduction_for new students.pptx
ME_Snowflake_Introduction_for new students.pptxME_Snowflake_Introduction_for new students.pptx
ME_Snowflake_Introduction_for new students.pptx
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
New capabilities for modern data integration in the cloud
New capabilities for modern data integration in the cloudNew capabilities for modern data integration in the cloud
New capabilities for modern data integration in the cloud
 
New capabilities for modern data integration in the cloud
New capabilities for modern data integration in the cloudNew capabilities for modern data integration in the cloud
New capabilities for modern data integration in the cloud
 
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsHow to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
 
Migrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration ServiceMigrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration Service
 
Migrating on premises workload to azure sql database
Migrating on premises workload to azure sql databaseMigrating on premises workload to azure sql database
Migrating on premises workload to azure sql database
 
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive sessionMicrosoft ignite 2018 SQL server 2019 big data clusters - deep dive session
Microsoft ignite 2018 SQL server 2019 big data clusters - deep dive session
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 

More from Mark Kromer

More from Mark Kromer (20)

Fabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptxFabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptx
 
Build data quality rules and data cleansing into your data pipelines
Build data quality rules and data cleansing into your data pipelinesBuild data quality rules and data cleansing into your data pipelines
Build data quality rules and data cleansing into your data pipelines
 
Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22
 
Data cleansing and prep with synapse data flows
Data cleansing and prep with synapse data flowsData cleansing and prep with synapse data flows
Data cleansing and prep with synapse data flows
 
Data cleansing and data prep with synapse data flows
Data cleansing and data prep with synapse data flowsData cleansing and data prep with synapse data flows
Data cleansing and data prep with synapse data flows
 
Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021
 
Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021
 
Data Lake ETL in the Cloud with ADF
Data Lake ETL in the Cloud with ADFData Lake ETL in the Cloud with ADF
Data Lake ETL in the Cloud with ADF
 
Azure Data Factory Data Wrangling with Power Query
Azure Data Factory Data Wrangling with Power QueryAzure Data Factory Data Wrangling with Power Query
Azure Data Factory Data Wrangling with Power Query
 
Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101
 
Data Quality Patterns in the Cloud with ADF
Data Quality Patterns in the Cloud with ADFData Quality Patterns in the Cloud with ADF
Data Quality Patterns in the Cloud with ADF
 
Data quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADFData quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADF
 
Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005
 
Data Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryData Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data Factory
 
ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300
 
ADF Mapping Data Flows Training V2
ADF Mapping Data Flows Training V2ADF Mapping Data Flows Training V2
ADF Mapping Data Flows Training V2
 
ADF Mapping Data Flow Private Preview Migration
ADF Mapping Data Flow Private Preview MigrationADF Mapping Data Flow Private Preview Migration
ADF Mapping Data Flow Private Preview Migration
 
Azure Data Factory Data Flow Limited Preview for January 2019
Azure Data Factory Data Flow Limited Preview for January 2019Azure Data Factory Data Flow Limited Preview for January 2019
Azure Data Factory Data Flow Limited Preview for January 2019
 
Microsoft Azure Data Factory Data Flow Scenarios
Microsoft Azure Data Factory Data Flow ScenariosMicrosoft Azure Data Factory Data Flow Scenarios
Microsoft Azure Data Factory Data Flow Scenarios
 
Azure Data Factory Data Flow Preview December 2019
Azure Data Factory Data Flow Preview December 2019Azure Data Factory Data Flow Preview December 2019
Azure Data Factory Data Flow Preview December 2019
 

Recently uploaded

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Recently uploaded (20)

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 

Azure Data Factory for Azure Data Week

  • 1. Cloud-First ETL in the Modern Data Warehouse w/Azure Data Factory Sr. Program Manager Azure Data Management https://github.com/kromerm/Azure-Data-Week-ADF
  • 2. A fully-managed data integration service in the cloud A Z U R E D ATA F A C T O R Y H Y B R I D S C A L A B L EP R O D U C T I V E T R U S T E D  Serverless scalability with no infrastructure to manage  Drag & Drop UI  Codeless Data Movement  Orchestrate where your data lives  Lift SSIS packages to Azure  Certified compliant Data Movement
  • 3. MODEL & SERVE Azure Analysis ServicesAzure SQL Data Warehouse Power BI Modernize your enterprise data warehouse at scale A Z U R E D A T A F A C T O R Y On-premises data Oracle, SQL, Teradata, fileshares, SAP Cloud data Azure, AWS, GCP SaaS data Salesforce, Workday, Dynamics INGEST STORE PREP & TRAIN Azure Data Factory Azure Blob Storage Azure Databricks Polybase Microsoft Azure also supports other Big Data services like Azure HDInsight, Azure SQL Database and Azure Data Lake to allow customers to tailor the above architecture to meet their unique needs. Orchestrate with Azure Data Factory
  • 4. Lift your SQL Server Integration Services (SSIS) packages to Azure On-Premise data sources SQL DB Managed Instance SQL Server VNET Azure Data Factory SSIS Cloud ETL SSIS Integration Runtime Cloud data sources Cloud On-premises Microsoft SQL Server Integration Services
  • 5. Control Flow in ADF Pipeline Builder Coordinate pipeline activities into finite execution steps to enable looping, conditionals and chaining while separating data transformations into individual data flows Activity 1 Activity 2 Activity 3 “On Error” Activity 1 Success, params Error, param s My Pipeline 1 … My Pipeline 2For Each… Activity 4 Success, params Trigger Event Wall Clock On Demand Activity 1 Activity 2 …
  • 7.
  • 8.
  • 9.
  • 10.
  • 11. ADF Integration Runtime (IR)  ADF compute environment with multiple capabilities: - Activity dispatch & monitoring - Data movement - SSIS package execution  To integrate data flow and control flow across the enterprises’ hybrid cloud, customer can instantiate multiple IR instances for different network environments: - On premises (similar to DMG in ADF V1) - In public cloud - Inside VNet  Bring a consistent provision and monitoring experience across the network environments Portal Application & SDK Azure Data Factory Service Data Movement & Activity Dispatch on-prem, Cloud, VNET Data Movement & Activity Dispatch In Azure Public Network, SSIS VNET coming soon Self-Hosted IR Azure IR
  • 12. Azure Data Factory Patterns Azure SQL Database Change Tracking Data Flow: Star Schema Fact & Dimension Table Loading Pipeline Master / Child Controller Pattern
  • 13. Microsoft ETL/ELT Services in Azure • Azure-SSIS IR: Managed cluster of Azure VMs (nodes) dedicated to run your SSIS packages and no other activities • You can scale it up/out by specifying the node size /number of nodes in the cluster • You can bring your own Azure SQL Database (DB)/Managed Instance (MI) server to host the catalog of SSIS projects/packages (SSISDB) that will be attached to it • You can join it to a Virtual Network (VNet) that is connected to your on-prem network to enable on- prem data access • Once provisioned, you can enter your Azure SQL DB/MI server endpoint on SSDT/SSMS to deploy SSIS projects/packages and configure/execute them just like using SSIS on premises Execute/Manage Provision SSDT SSMS Cloud On- Premises Design/Deploy SSIS Server Design/Deploy Execute/Manage ISVs SSIS PaaS w/ ADF compute resource ADF (PROVISIONING) HDI ML
  • 14.
  • 15.
  • 17. Visual Data Flow Authoring • Transform Data, At Scale, in the Cloud, Zero-Code • Cloud-first, scale-out ELT • Code-free dataflow pipelines • Serverless scale-out transformation execution engine • Maximum Productivity for Data Engineers • Does NOT require understanding of Spark / Scala / Python / Java • Resilient Data Transformation Flows • Built for big data scenarios with unstructured data requirements • Operationalize with Data Factory scheduling, control flow and monitoring
  • 18. Code-free Data Transformation At Scale • Does not require understanding of Spark, Big Data Execution Engines, Clusters, Scala, Python … • Focus on building business logic and data transformation • Data cleansing • Aggregation • Data conversions • Data prep • Data exploration
  • 19. ADF Data Flow Workstream Data Sources Staging Transformation s Destination Sort, Merge, Join, Lookup … • Explicit user action • User places data source(s) on design surface, from toolbox • Select explicit sources • Implicit/Explicit • Data Lake staging area as default • User does not need to configure this manually • Advanced feature to set staging area options • File Formats / Types (Parquet, JSON, txt, CSV …) • Explicit user action • User places transformations on design surface, from toolbox • User must set properties for transformation steps and step connectors • Explicit user action • User chooses destination connector(s) • User sets connector property options
  • 20.
  • 22. Build your logical data flows adding data transformations in a guided experience
  • 23. Switch to Debug Mode and select sample data to work with for debugging
  • 24. Debug mode provides row-level context and visible results in inspector pane
  • 25. Debug mode provides row-level context and visible results in inspector pane
  • 26. Interactive Expression Builder – Build data transform expressions, not Spark code
  • 27. Data Engineer derives columns using template expression based on name and type matching. No need to define static field names.

Editor's Notes

  1. 5
  2. SSIS PaaS requires VNet to enable on-prem data access for now, but we may also reuse ADF Data Management Gateway (DMG) to do the same in the near future.