
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with Azure Data Factory

Azure Data Factory is a hybrid data integration service that lets you create, manage, and operate data pipelines in Azure. It is a serverless orchestrator for pipelines that move, transform, and load data: in effect, a fully managed Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) service.

In this talk I'll cover the basics of Azure Data Factory and show you how you can create, manage & operate data pipelines.

Published in: Software

  1. Next Generation of Data Integration with Azure Data Factory
     Tom Kerkhove, Azure Consultant at Codit, Microsoft Azure MVP
  2. Expo Sponsors, Event Sponsors, Expo Light Sponsors
  3. Hi! Tom Kerkhove
     • Azure Consultant at Codit
     • Microsoft Azure MVP & Advisor
     • Belgian Azure User Group (AZUG)
     blog.tomkerkhove.be, @TomKerkhove, tomkerkhove
  4. Azure Serverless: Azure Logic Apps, Azure Functions, Azure Event Grid
  5. Azure Serverless: Azure Logic Apps, Azure Functions, Azure Event Grid, Azure Data Factory
  6. Disclaimer: Azure Data Factory 2.0 Preview
     https://bit.ly/adf-v1-vs-v2
  7. What is Azure Data Factory?
     ➔ Managed data orchestration service
     ➔ Allows you to run pipelines
     ➔ Execute SSIS packages
     ➔ Support for hybrid scenarios
     ➔ Data movement-as-a-service with 70+ connectors
     ➔ Visual tooling & programmability (.NET, Python, REST, ARM)
  8. What is Azure Data Factory?
     Diagram: trigger(s) start a pipeline made up of activities
  9. What is Azure Data Factory?
     ➔ A pipeline represents a business process with multiple “steps”, which are represented by activities, and is started by a trigger.
     ➔ Activities represent steps in a business process; each performs a specific action.
     ➔ Whether an activity runs depends on the outcome of the previous step: on success, failure, skipped, or completion.
  10. Triggers
      ➔ Different types of triggers
      ➔ On-Demand (via REST API, .NET, etc.)
         • Azure API Management can make this easier
      ➔ Scheduled / wall-clock
      ➔ Tumbling windows (aka “data slicing”)
      ➔ Event-based (a new file is added to blob storage)
      ➔ Support for passing parameters
  11. What is Azure Data Factory?
      Diagram: trigger(s) start a pipeline made up of activities
  12. Activities
      ➔ Data Movement: Azure, databases, NoSQL, file, SaaS, web, etc.
      ➔ Data Transformation: Pig, Hive, Stored Procedure, U-SQL, ML, Spark, MapReduce, etc.
      ➔ Control Flow: Web call, Lookup, Get Metadata, If, Wait, ForEach, Execute Pipeline, etc.
      ➔ Custom: run commands on an Azure Batch cluster, run R scripts on an HDInsight cluster
  13. What is Azure Data Factory?
      ➔ An activity can produce or consume a data set. A data set is a representation of a data structure in a data store that can be used as a source or sink.
      ➔ Linked Services define how an activity can connect to an external system. This external system can be a data store or a compute resource.
  14. What is Azure Data Factory?
      Diagram: an Activity produces and consumes a Data Set; a Data Set represents data stored in a Linked Service
  15. Integration Runtime (IR)
      ➔ Compute infrastructure used by Data Factory
      ➔ Azure, Azure-SSIS, or Self-Hosted (any cloud or on-prem)
      ➔ Core capabilities: data movement, pipeline activity execution, SSIS package execution
      ➔ Pipelines issue command & control; the integration runtime executes
      ➔ Data movement is from IR to IR
      ➔ All execution happens in sources & sinks
  16. Integration Runtime (IR)
  17. Running SSIS packages in Azure
      ➔ Stores SSISDB in Azure SQL DB or Managed Instance
      ➔ Azure-SSIS integration runtime as compute layer
      ➔ Compute part for running SSIS
      ➔ Managed cluster of Azure VMs
      ➔ Can be linked to a VNET for hybrid scenarios
      ➔ Lift & shift packages to the cloud
  18. Running SSIS packages in Azure
  19. Security
      ➔ Native support for Managed Service Identity (MSI)
      ➔ Native integration with Azure Key Vault
      ➔ Encrypted in transit via HTTPS
      ➔ Supports encryption at rest with data stores
  20. Show it to me!
  21. Using Azure Serverless to become GDPR compliant
      ➔ Every user should be able to request their data
      Diagram labels: User Profile information, StackExchange Data Set, Kerkhove.tom@gmail.com
  22. Using Azure Serverless to become GDPR compliant
  23. Monitoring
      ➔ Visual monitoring in the portal
      ➔ Monitoring per pipeline run
      ➔ Detailed information per activity
      ➔ Azure Monitor integration
      ➔ Diagnostic logs
      ➔ Metrics
      ➔ Alerts
  24. How is this different from Logic Apps?
      ➔ Serverless orchestration
      ➔ Pay for what you use
      ➔ Data-centric vs application-centric workflows
      ➔ Work together seamlessly
  25. Conclusion
      ➔ Azure Data Factory is a great way to orchestrate data processes and build data-integration pipelines
      ➔ Very powerful for data-centric workloads
      ➔ Unsung hero in the serverless space
      ➔ A perfect match with Azure Logic Apps
      ➔ Allows you to get to market very quickly with the built-in connectors
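
The slide on pipelines says an activity runs depending on the outcome of the previous step: success, failure, skipped, or completion. The sketch below is one way to picture that rule in plain Python. It is not the Azure Data Factory SDK; the `Activity` class, `run_pipeline`, and the reading of "completion" as "finished, whether it succeeded or failed" are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Outcome names mirror the four conditions listed on the slide.
SUCCEEDED, FAILED, SKIPPED, COMPLETED = "Succeeded", "Failed", "Skipped", "Completed"


@dataclass
class Activity:
    """Hypothetical stand-in for a pipeline activity (not an ADF type)."""
    name: str
    action: Callable[[], None]  # raises an exception to signal failure
    # Each dependency is (upstream activity name, required condition).
    depends_on: List[Tuple[str, str]] = field(default_factory=list)


def condition_met(condition: str, upstream_status: str) -> bool:
    """Assumption: 'Completed' matches success or failure; others match exactly."""
    if condition == COMPLETED:
        return upstream_status in (SUCCEEDED, FAILED)
    return condition == upstream_status


def run_pipeline(activities: List[Activity]) -> Dict[str, str]:
    """Run activities in list order, which is assumed to be dependency order."""
    status: Dict[str, str] = {}
    for activity in activities:
        ready = all(condition_met(cond, status[up]) for up, cond in activity.depends_on)
        if not ready:
            # A dependency condition was not satisfied, so the activity is skipped.
            status[activity.name] = SKIPPED
            continue
        try:
            activity.action()
            status[activity.name] = SUCCEEDED
        except Exception:
            status[activity.name] = FAILED
    return status


def failing_copy() -> None:
    raise RuntimeError("copy failed")


# Example pipeline: the copy step fails, so only the failure, completion,
# and skipped branches downstream of it actually run.
pipeline = [
    Activity("copy", failing_copy),
    Activity("transform", lambda: None, [("copy", SUCCEEDED)]),
    Activity("alert", lambda: None, [("copy", FAILED)]),
    Activity("cleanup", lambda: None, [("copy", COMPLETED)]),
    Activity("notify_skipped", lambda: None, [("transform", SKIPPED)]),
]
result = run_pipeline(pipeline)
```

Here `result` records `copy` as failed and `transform` as skipped, while `alert`, `cleanup`, and `notify_skipped` run because their conditions on the upstream outcome hold.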
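
The Triggers slide mentions tumbling windows (aka "data slicing"). As a rough illustration of the idea, the sketch below cuts a time range into fixed-size, contiguous, non-overlapping windows, each of which could be handed to one pipeline run as parameters. The function name `tumbling_windows` and its arguments are hypothetical; this shows the windowing concept only, not how Azure Data Factory implements or configures its trigger.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def tumbling_windows(
    start: datetime, end: datetime, size: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (window_start, window_end) pairs covering [start, end).

    Windows are back to back and never overlap; a trailing partial
    window that would extend past `end` is not emitted.
    """
    if size <= timedelta(0):
        raise ValueError("window size must be positive")
    window_start = start
    while window_start + size <= end:
        yield window_start, window_start + size
        window_start += size


# Example: slice three hours of data into hourly windows.
windows = list(
    tumbling_windows(
        start=datetime(2018, 1, 1, 0, 0),
        end=datetime(2018, 1, 1, 3, 0),
        size=timedelta(hours=1),
    )
)
for window_start, window_end in windows:
    # In a real pipeline these two values would be passed as run parameters.
    print(f"process slice {window_start.isoformat()} to {window_end.isoformat()}")
```

Because each window ends exactly where the next begins, every record with a timestamp in the range belongs to exactly one slice, which is what makes this style of trigger convenient for backfills and reprocessing.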
