Azure Data Lake Intro (SQLBits 2016)

Azure Data Lake Intro (SQLBits 2016 ADL/USQL Pre-Conference)
Data Lake concept, Azure Data Lake, HDINSIGHT, Azure Data Lake Storage and Analytics

Published in: Data & Analytics

  1. 1. SQLBits 2016 Azure Data Lake & U-SQL Michael Rys, @MikeDoesBigData http://www.azure.com/datalake {mrys, usql}@microsoft.com
  2. 2. The Data Lake Approach
  3. 3. CLOUD MOBILE
  4. 4. Traditional data warehousing approach. Understand Corporate Strategy. Gather Requirements: Business Requirements, Technical Requirements. Implement Data Warehouse: Dimension Modelling, Physical Design, ETL Design, ETL Development, Reporting & Analytics Design, Reporting & Analytics Development, Setup Infrastructure, Install and Tune. Flow: Data sources, ETL, Data warehouse, BI and analytics.
  5. 5. The Data Lake approach. Ingest all data regardless of requirements. Store all data in native format without schema definition. Do analysis using analytic engines like Hadoop: Interactive queries, Batch queries, Machine Learning, Data warehouse, Real-time analytics. Devices.
  6. 6. How Microsoft has used Big Data. We needed to better leverage data and analytics to win in search. We changed our approach: more experiments by more people! So we built an Exabyte-scale data lake for everyone to put their data, built tools approachable by any developer, and built machine learning tools for collaborating across large experiment models. [Chart: MICROSOFT DOUBLES SEARCH SHARE. US search share by year: 2009 9%, 2010 11%, 2011 15%, 2012 16%, 2013 18%, 2014 19%, 2015 20%. Source: ComScore 2009-2015 Search Report US.]
  7. 7. Introducing Azure Data Lake Big Data Made Easy
  8. 8. Cortana Analytics Suite Big Data & Advanced Analytics
  9. 9. Azure Data Lake. Analytics: HDInsight (“managed clusters”), Azure Data Lake Analytics. Storage: Azure Data Lake Storage.
  10. 10. Azure Data Lake Storage Service
  11. 11. Azure Data Lake Store (IN PREVIEW): A hyper scale repository for big data analytics workloads. No limits to SCALE. Store ANY DATA in its native format. HADOOP FILE SYSTEM (HDFS) for the cloud. ENTERPRISE GRADE access control, encryption at rest. Optimized for analytic workload PERFORMANCE.
  12. 12. Data Lake Store: Built for the cloud. Secure: Must be highly secure to prevent unauthorized access (especially as all data is in one place). Native format: Must permit data to be stored in its ‘native format’ to track lineage and for data provenance. Low latency: Must have low latency for high-frequency operations. Multiple analytic frameworks: Must support multiple analytic frameworks (Batch, Real-time, Streaming, Machine Learning, etc.); no one analytic framework can work for all data and all types of analysis. Details: Must be able to store data with all details; aggregation may lead to loss of details. Throughput: Must have high throughput for massively parallel processing via frameworks such as Hadoop and Spark. Reliable: Must be highly available and reliable (no permanent loss of data). Scalable: Must be highly scalable; when storing all data indefinitely, data volumes can quickly add up. All sources: Must be able to ingest data from a variety of sources: LOB/ERP, logs, devices, social networks, etc.
  13. 13. Four pillars of security and compliance
  14. 14. Social, Clickstream, Web
  15. 15. Azure HDInsight: Hadoop Platform as a Service on Azure. FULLY SUPPORTED Hadoop for the cloud. Available on LINUX and WINDOWS. Works on AZURE STORAGE or DATA LAKE STORE. 100% OPEN SOURCE Apache Hadoop (HDP 2.3). Clusters up and RUNNING IN MINUTES. Use familiar BI TOOLS FOR ANALYSIS like Excel.
  16. 16. Azure Data Lake Analytics Service
  17. 17. Azure Data Lake (Store, HDInsight, Analytics). Analytics: ADL Analytics (U-SQL), HDInsight (Hive). YARN. Storage: ADL Store (WebHDFS).
  18. 18. ADLA complements HDInsight. Both target the same scenarios, tools, and customers. HDInsight: For developers familiar with Open Source tools (Java, Eclipse, Hive, etc.). Clusters offer customization, control, and flexibility in a managed Hadoop cluster. ADLA: Enables customers to leverage existing experience with C#, SQL & PowerShell. Offers convenience, efficiency, automatic scale, and management in a “job service” form factor.
  19. 19. Azure Data Lake Analytics (IN PREVIEW): A distributed analytics service built on Apache YARN that dynamically scales to your needs. No limits to SCALE. Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C#. Optimized to work with ADL STORE. FEDERATED QUERY across Azure data sources. ENTERPRISE GRADE role-based access control and auditing. Pay PER QUERY and scale PER QUERY.
  20. 20. ADL and SQLDW
  21. 21. Work across all cloud data. Azure Data Lake Analytics: Azure SQL DW, Azure SQL DB, Azure Storage Blobs, Azure Data Lake Store, SQL DB in an Azure VM.
  22. 22. Simplified management and administration. Web-based management in Azure Portal. Automate tasks using PowerShell. Role-based access control with Azure AD. Monitor service operations and activity.
  23. 23. Get started (30 seconds). Step 1: Log in to Azure. Step 2: Create an ADLA account. Step 3: Write and submit an ADLA job with U-SQL (or Hive/Pig). Step 4: The job reads and writes data from storage (ADLS, Azure Blobs, Azure DB, …).
  24. 24. Azure Data Lake SDK/CLI
  25. 25. Account Management: Create new account, List accounts, Update account properties, Delete account. Transferring Data: Upload into store from local disk, Download from store to local disk. Files and Folders: List contents of folder, Create, Move, Delete, Does file exist. Security: Get ACLs, Update ACLs, Get Owner, Set Owner. File Content: Set file content, Append file content, Get file content, Merge files.
  26. 26. Account Management: Create new account, List accounts, Update account properties, Delete account. Data Sources: Add a data source, List data sources, Update data source, Delete data source. Compute: List jobs, Submit job, Cancel job. Catalog Items: List items in U-SQL catalog, Update item. Catalog Secrets: Create catalog secret, List catalog secrets, Delete catalog secrets.
  27. 27. Your application; ADL .NET SDKs; ADL PowerShell; ADL XPlat CLI; ADL Node.js SDK; ADL Java SDK; Azure and ADL REST APIs.
  28. 28. Analytics. Management: Create and manage ADLA accounts. Jobs: Submit and manage jobs. Catalog: Explore catalog items. Store. Management: Create and manage ADLS accounts. File System: Upload, download, list, delete, rename, append (WebHDFS).
  29. 29. SDKs (NuGet packages). Analytics .NET SDK: Management, Catalog, Jobs. Store .NET SDK: Management, Filesystem, Uploader.
  31. 31. http://aka.ms/AzureDataLake
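
Slide 5 describes storing all data in its native format without a schema definition and applying structure only when analysis runs. The following is a minimal Python sketch of that schema-on-read idea, not anything from the talk itself: the in-memory `RAW_STORE` dictionary stands in for the data lake, and the file names, sample records, and helper functions are all hypothetical.

```python
import csv
import io
import json

# Hypothetical stand-in for a data lake: raw payloads kept exactly as
# they arrived, with no schema declared at ingest time.
RAW_STORE = {
    "clicks.csv": "user,page\nann,/home\nbob,/cart\n",
    "devices.json": '{"user": "ann", "temp": 21.5}\n{"user": "bob", "temp": 19.0}\n',
}


def read_csv(name):
    """Interpret a raw payload as CSV only at query time."""
    return list(csv.DictReader(io.StringIO(RAW_STORE[name])))


def read_json_lines(name):
    """Interpret a raw payload as newline-delimited JSON only at query time."""
    return [json.loads(line) for line in RAW_STORE[name].splitlines() if line]


def pages_by_user():
    """One analysis over the clickstream data."""
    return {row["user"]: row["page"] for row in read_csv("clicks.csv")}


def average_temp():
    """A different analysis over the device data, with its own schema."""
    temps = [rec["temp"] for rec in read_json_lines("devices.json")]
    return sum(temps) / len(temps)
```

Each analysis chooses how to parse the raw bytes, so a new question can be asked of old data without re-ingesting it.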

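Slides 25 and 28 list file system operations that the store exposes through WebHDFS. As a rough illustration, the Python sketch below maps some of those operations to WebHDFS REST operation names and builds the request URL without sending anything. The endpoint shape `https://<account>.azuredatalakestore.net/webhdfs/v1/` and the `build_request` helper are assumptions for illustration, and the account name is hypothetical. A real call would also need an Azure AD bearer token, which is not shown.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint shape for a WebHDFS-compatible store account.
ENDPOINT = "https://{account}.azuredatalakestore.net/webhdfs/v1/{path}?{query}"

# Slide operation -> (HTTP method, WebHDFS operation name).
OPERATIONS = {
    "list": ("GET", "LISTSTATUS"),           # List contents of folder
    "exists": ("GET", "GETFILESTATUS"),      # Does file exist
    "create_folder": ("PUT", "MKDIRS"),      # Create
    "move": ("PUT", "RENAME"),               # Move
    "delete": ("DELETE", "DELETE"),          # Delete
    "get_content": ("GET", "OPEN"),          # Get file content
    "set_content": ("PUT", "CREATE"),        # Set file content
    "append_content": ("POST", "APPEND"),    # Append file content
    "get_acls": ("GET", "GETACLSTATUS"),     # Get ACLs
    "set_owner": ("PUT", "SETOWNER"),        # Set Owner
}


def build_request(account, operation, path, **params):
    """Return (HTTP method, URL) for one operation; nothing is sent."""
    method, op = OPERATIONS[operation]
    query = urlencode({"op": op, **params})
    url = ENDPOINT.format(
        account=account, path=quote(path.lstrip("/")), query=query
    )
    return method, url
```

For example, `build_request("myaccount", "list", "/logs/2016")` yields a `GET` against `.../webhdfs/v1/logs/2016?op=LISTSTATUS`, which any HTTP client could then send with the appropriate authorization header.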