SlideShare a Scribd company logo
1 of 24
A D N A N M A S O O D , P H D
S Y S T E M S A R C H I T E C T / D A T A S C I E N T I S T
A D N A N . M A S O O D @ O W A S P . O R G
( H T T P : / / B L O G . A D N A N M A S O O D . C O M )
G I T H U B ( G I T H U B . C O M / A D N A N M A S O O D ) ,
T W I T T E R ( @ A D N A N M A S O O D ) .
P R E S E N T E D A T M I C R O S O F T D A T A S C I E N C E G R O U P –
T A M P A B A Y D A T A S C I E N C E P R O F E S S I O N A L S
H T T P : / / W W W . M E E T U P . C O M / D A T A - S C I E N T I S T S - T A M P A - B A Y / E V E N T S / 2 3 1 2 9 3 0 7 7 /
Spark with Azure HDInsight
About the Speaker
Adnan Masood, Ph.D. is a developer, software architect, and researcher and specializes
in FinTech, machine learning and Bayesian belief networks. Before joining PDS
Health care, and GDC (a leading prepaid financial technology institution), he enjoyed
life as a principal engineer of a start-up and worked for a leading UK based nonprofit
organization as a solutions architect.
A strong believer in the development community, Adnan is an active member of the
Open Web Application Security Project (OWASP), an organization dedicated to
software security. In the .NET community, he is a cofounder and president of the
Pasadena .NET Developers group, which he has been successfully leading for 8 years.
He led a number of successful enterprise solutions and consulted for several Fortune
500 company projects.
Adnan devotes himself to his own continual, practical education. He holds
certifications in big data, machine learning, and systems architecture from
Massachusetts Institute of Technology; an Application Security certification from
Stanford University; an SOA Smarts certification from Carnegie Mellon University;
and certifications as a ScrumMaster, Microsoft Certified Trainer, Microsoft Certified
Solutions Developer, and Sun Certified Java Developer.
For more details, visit Adnan's blog (http://blog.adnanmasood.com), GitHub
repository (http://github.com/adnanmasood), and Twitter (@adnanmasood). Adnan
can be reached at adnan.masood@owasp.org.
Announcement: Apache Spark for Azure HDInsight now
generally available
Channel 9 Walk through of Apache Spark on Azure HDInsight
Spark 101
 Spark is a unified framework for big data
analytics. Spark provides one integrated API
for use by developers, data scientists, and
analysts to perform diverse tasks that would
have previously required separate
processing engines such as batch analytics,
stream processing and statistical modeling.
Spark supports a wide range of popular
languages including Python, R, Scala, SQL,
and Java. Spark can read from diverse data
sources and scale to thousands of nodes.
Spark with Azure HDInsight
Deployment Models
Big Data Deployment – Public Cloud
• Hadoop-as-a-Service
- Amazon Web Services EC2 and EMR
- Microsoft Azure HDInsight
- Google Cloud Dataproc
- IBM Bluemix ... and others
• Spark-as-a-Service
- All of the above
- Databricks
Big Data Deployment – On-Premises
• Bare-Metal
• Virtual Machines
- VMware Big Data Extensions
- OpenStack Sahara
• Containers
- BlueData
- Mesos
HDInsight as Part of Azure Portal
Spark - Benefits
Spark – Use Cases
Spark is Fast!
Spark is Fast!
Demo - Creating a HDInsight Spark Cluster
HDInsight Spark Streaming
“Along with traditional Hadoop technologies, HDInsight also provides
Spark as a cloud service. Spark is an integrated set of open source
technologies that can run on a Hadoop cluster. The Spark family
includes options for analyzing large amounts of operational data,
doing machine learning, and more. It also includes Spark Streaming, a
technology for working with streaming data.
Spark Streaming is similar to Storm in some ways. Like Storm, it’s a
general-purpose technology for processing streaming data. Unlike
Storm, Spark Streaming is implemented as an extension to the basic
Spark engine—it’s not an add-on technology. This tight connection can
make Spark applications faster, since there’s less need to move data
between components, and easier to create, since everything uses the
same core Spark technology. Because of this, Spark Streaming (and
Spark in general) are getting more popular by the day”
David Chappell
STREAMING SCENARIOS USING THE MICROSOFT DATA PLATFORM
A GUIDE FOR IT LEADERS
HDInsight Spark Streaming
• What is it?
- Distributed compute framework, an extension of the core Apache Spark API
- Allows users to integrate real-time data from disparate event streams (e.g. Kafka,
HDFS, Twitter) in event-driven, asynchronous, scalable, type-safe, and fault tolerant
applications
• When to use it?
- When organizations need realtime decision making
- When you are working with streams of continuous data
• Why Spark Streaming?
- Enables high-throughput and reliable processing of live data streams
- Batch, Iterative, and Streaming analysis on the same platform
- Easily add Machine Learning for streaming data pathways
Getting Started.
References & Further Reading
 Use MapReduce in Hadoop on HDInsight
https://azure.microsoft.com/en-
us/documentation/articles/hdinsight-use-mapreduce
 Get started: Create Apache Spark cluster on HDInsight
Linux and run interactive queries using Spark SQL
https://azure.microsoft.com/en-
us/documentation/articles/hdinsight-apache-spark-
zeppelin-notebook-jupyter-spark-sql/
 Azure Machine Learning -
https://azure.microsoft.com/en-us/services/machine-
learning/
References & Further Reading
 Announcing Apache Spark on Azure HDInsight
https://channel9.msdn.com/Shows/Azure-Friday/Announcing-Apache-
Spark-on-Azure-HDInsight
 Apache Zeppelin https://zeppelin.incubator.apache.org
 Project Jupyter http://jupyter.org/
 https://azure.microsoft.com/en-us/services/hdinsight/
 https://azure.microsoft.com/en-us/blog/apache-spark-for-azure-
hdinsight-now-generally-available/
 Microsoft expands its commitment to Apache Spark big-data framework
 https://azure.microsoft.com/en-us/documentation/articles/hdinsight-
apache-spark-use-zeppelin-notebook/
 https://channel9.msdn.com/Shows/Azure-Friday/Announcing-Apache-
Spark-on-Azure-HDInsight
 http://www.c-sharpcorner.com/UploadFile/aa700f/jumpstart-into-big-
data-with-hdinsight/
References & Further Reading
 Get started: Create Apache Spark cluster on HDInsight Linux and run
interactive queries using Spark SQL https://azure.microsoft.com/en-
us/documentation/articles/hdinsight-apache-spark-jupyter-spark-sql/
 EdX Course: Processing Big Data with Azure HDInsight Processing Big
Data with Azure HDInsight Learn how to use Hadoop technologies in
Microsoft Azure HDInsight to process big data in this five week, hands-on
course. https://www.edx.org/course/processing-big-data-azure-hdinsight-
microsoft-dat202-1x-0
 Apache Spark for Azure HDInsight https://azure.microsoft.com/en-
us/services/hdinsight/apache-spark/
 Build Machine Learning applications to run on Apache Spark clusters on
HDInsight Linux https://azure.microsoft.com/en-
us/documentation/articles/hdinsight-apache-spark-ipython-notebook-
machine-learning/
References & Further Reading
Questions

More Related Content

What's hot

Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceDataWorks Summit
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analyticsjoshwills
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for ArchitectsTomasz Kopacz
 
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...NoSQLmatters
 
Evolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsEvolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsDataWorks Summit
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
ETL big data with apache hadoop
ETL big data with apache hadoopETL big data with apache hadoop
ETL big data with apache hadoopMaulik Thaker
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...DataWorks Summit
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 DataWorks Summit
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudDataWorks Summit
 
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0   virtual - subhadeep duttaCWIN17 India / Insights platform architecture v1 0   virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep duttaCapgemini
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real WorldMark Kromer
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
 
Empowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningEmpowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningDataWorks Summit
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...DataWorks Summit/Hadoop Summit
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big DataDataWorks Summit
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformCaserta
 

What's hot (20)

Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analytics
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for Architects
 
A Continuously Deployed Hadoop Analytics Platform?
A Continuously Deployed Hadoop Analytics Platform?A Continuously Deployed Hadoop Analytics Platform?
A Continuously Deployed Hadoop Analytics Platform?
 
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
Evolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsEvolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data Applications
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
ETL big data with apache hadoop
ETL big data with apache hadoopETL big data with apache hadoop
ETL big data with apache hadoop
 
Big Data with Azure
Big Data with AzureBig Data with Azure
Big Data with Azure
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the Cloud
 
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0   virtual - subhadeep duttaCWIN17 India / Insights platform architecture v1 0   virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real World
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 
Empowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningEmpowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine Learning
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
 

Viewers also liked

Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight ServiceNeil Mackenzie
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureKhalid Salama
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureMark Kromer
 
Data science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionData science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionAdnan Masood
 
PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLJen Stirrup
 
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...Hortonworks
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)James Serra
 
[PL] Code Europe 2016 - Python and Microsoft Azure
[PL] Code Europe 2016 - Python and Microsoft Azure[PL] Code Europe 2016 - Python and Microsoft Azure
[PL] Code Europe 2016 - Python and Microsoft AzureMichał Smereczyński
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseHortonworks
 
Why Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataWhy Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataData Con LA
 
Dell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonDell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonGreg Kirchoff
 
BlueData Isilon Validation Brief
BlueData Isilon Validation BriefBlueData Isilon Validation Brief
BlueData Isilon Validation BriefBoni Bruno
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData, Inc.
 
Big Data & the Cloud
Big Data & the CloudBig Data & the Cloud
Big Data & the CloudDATAVERSITY
 
PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015Krishna-Kumar
 
BlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData, Inc.
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Jen Stirrup
 

Viewers also liked (20)

Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Realtime analytics with_hadoop
Realtime analytics with_hadoopRealtime analytics with_hadoop
Realtime analytics with_hadoop
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft Azure
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Data science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionData science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief Introduction
 
PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureML
 
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
 
[PL] Code Europe 2016 - Python and Microsoft Azure
[PL] Code Europe 2016 - Python and Microsoft Azure[PL] Code Europe 2016 - Python and Microsoft Azure
[PL] Code Europe 2016 - Python and Microsoft Azure
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSense
 
Why Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataWhy Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueData
 
Dell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonDell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with Isilon
 
BlueData Isilon Validation Brief
BlueData Isilon Validation BriefBlueData Isilon Validation Brief
BlueData Isilon Validation Brief
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for Hadoop
 
Big Data & the Cloud
Big Data & the CloudBig Data & the Cloud
Big Data & the Cloud
 
PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015
 
BlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData EPIC 2.0 Overview
BlueData EPIC 2.0 Overview
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?
 

Similar to Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD

Introduction To Data Science with Apache Spark
Introduction To Data Science with Apache Spark Introduction To Data Science with Apache Spark
Introduction To Data Science with Apache Spark ZaranTech LLC
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data ScienceDataWorks Summit
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data scienceAjay Ohri
 
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Edureka!
 
Coding software and tools used for data science management - Phdassistance
Coding software and tools used for data science management - PhdassistanceCoding software and tools used for data science management - Phdassistance
Coding software and tools used for data science management - PhdassistancephdAssistance1
 
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...phdAssistance1
 
Career opportunities in open source framework
Career opportunities in open source frameworkCareer opportunities in open source framework
Career opportunities in open source frameworkedunextgen
 
Career opportunities in open source framework
Career opportunities in open source framework Career opportunities in open source framework
Career opportunities in open source framework edunextgen
 
Big data processing with apache spark
Big data processing with apache sparkBig data processing with apache spark
Big data processing with apache sparksarith divakar
 
Big Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsBig Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsFredReynolds2
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with RGreat Wide Open
 
Data scientist versus big data
Data scientist versus big dataData scientist versus big data
Data scientist versus big dataPrwaTech
 
Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsScyllaDB
 

Similar to Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD (20)

Madhu
MadhuMadhu
Madhu
 
Introduction To Data Science with Apache Spark
Introduction To Data Science with Apache Spark Introduction To Data Science with Apache Spark
Introduction To Data Science with Apache Spark
 
BigData_Krishna Kumar Sharma
BigData_Krishna Kumar SharmaBigData_Krishna Kumar Sharma
BigData_Krishna Kumar Sharma
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
sudipto_resume
sudipto_resumesudipto_resume
sudipto_resume
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data science
 
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
 
Coding software and tools used for data science management - Phdassistance
Coding software and tools used for data science management - PhdassistanceCoding software and tools used for data science management - Phdassistance
Coding software and tools used for data science management - Phdassistance
 
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
 
Career opportunities in open source framework
Career opportunities in open source frameworkCareer opportunities in open source framework
Career opportunities in open source framework
 
Career opportunities in open source framework
Career opportunities in open source framework Career opportunities in open source framework
Career opportunities in open source framework
 
Big data processing with apache spark
Big data processing with apache sparkBig data processing with apache spark
Big data processing with apache spark
 
Apresentação Hadoop
Apresentação HadoopApresentação Hadoop
Apresentação Hadoop
 
Big Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsBig Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential Tools
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with R
 
Data scientist versus big data
Data scientist versus big dataData scientist versus big data
Data scientist versus big data
 
Aioug big data and hadoop
Aioug  big data and hadoopAioug  big data and hadoop
Aioug big data and hadoop
 
Apache spark
Apache sparkApache spark
Apache spark
 
INFO491FinalPaper
INFO491FinalPaperINFO491FinalPaper
INFO491FinalPaper
 
Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data Platforms
 

More from Adnan Masood

Restructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachRestructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachAdnan Masood
 
System Quality Attributes for Software Architecture
System Quality Attributes for Software ArchitectureSystem Quality Attributes for Software Architecture
System Quality Attributes for Software ArchitectureAdnan Masood
 
Agile Software Development
Agile Software DevelopmentAgile Software Development
Agile Software DevelopmentAdnan Masood
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationAdnan Masood
 
Bayesian Networks and Association Analysis
Bayesian Networks and Association AnalysisBayesian Networks and Association Analysis
Bayesian Networks and Association AnalysisAdnan Masood
 
Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...
Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...
Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...Adnan Masood
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionAdnan Masood
 
Web API or WCF - An Architectural Comparison
Web API or WCF - An Architectural ComparisonWeb API or WCF - An Architectural Comparison
Web API or WCF - An Architectural ComparisonAdnan Masood
 
SOLID Principles of Refactoring Presentation - Inland Empire User Group
SOLID Principles of Refactoring Presentation - Inland Empire User GroupSOLID Principles of Refactoring Presentation - Inland Empire User Group
SOLID Principles of Refactoring Presentation - Inland Empire User GroupAdnan Masood
 
Brief bibliography of interestingness measure, bayesian belief network and ca...
Brief bibliography of interestingness measure, bayesian belief network and ca...Brief bibliography of interestingness measure, bayesian belief network and ca...
Brief bibliography of interestingness measure, bayesian belief network and ca...Adnan Masood
 

More from Adnan Masood (10)

Restructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachRestructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality Approach
 
System Quality Attributes for Software Architecture
System Quality Attributes for Software ArchitectureSystem Quality Attributes for Software Architecture
System Quality Attributes for Software Architecture
 
Agile Software Development
Agile Software DevelopmentAgile Software Development
Agile Software Development
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian Classification
 
Bayesian Networks and Association Analysis
Bayesian Networks and Association AnalysisBayesian Networks and Association Analysis
Bayesian Networks and Association Analysis
 
Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...
Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...
Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief Introduction
 
Web API or WCF - An Architectural Comparison
Web API or WCF - An Architectural ComparisonWeb API or WCF - An Architectural Comparison
Web API or WCF - An Architectural Comparison
 
SOLID Principles of Refactoring Presentation - Inland Empire User Group
SOLID Principles of Refactoring Presentation - Inland Empire User GroupSOLID Principles of Refactoring Presentation - Inland Empire User Group
SOLID Principles of Refactoring Presentation - Inland Empire User Group
 
Brief bibliography of interestingness measure, bayesian belief network and ca...
Brief bibliography of interestingness measure, bayesian belief network and ca...Brief bibliography of interestingness measure, bayesian belief network and ca...
Brief bibliography of interestingness measure, bayesian belief network and ca...
 

Recently uploaded

How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfproinshot.com
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 

Recently uploaded (20)

Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 

Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD

  • 1. A D N A N M A S O O D , P H D S Y S T E M S A R C H I T E C T / D A T A S C I E N T I S T A D N A N . M A S O O D @ O W A S P . O R G ( H T T P : / / B L O G . A D N A N M A S O O D . C O M ) G I T H U B ( G I T H U B . C O M / A D N A N M A S O O D ) , T W I T T E R ( @ A D N A N M A S O O D ) . P R E S E N T E D A T M I C R O S O F T D A T A S C I E N C E G R O U P – T A M P A B A Y D A T A S C I E N C E P R O F E S S I O N A L S H T T P : / / W W W . M E E T U P . C O M / D A T A - S C I E N T I S T S - T A M P A - B A Y / E V E N T S / 2 3 1 2 9 3 0 7 7 / Spark with Azure HDInsight
  • 2. About the Speaker Adnan Masood, Ph.D. is a developer, software architect, and researcher and specializes in FinTech, machine learning and Bayesian belief networks. Before joining PDS Health care, and GDC (a leading prepaid financial technology institution), he enjoyed life as a principal engineer of a start-up and worked for a leading UK based nonprofit organization as a solutions architect. A strong believer in the development community, Adnan is an active member of the Open Web Application Security Project (OWASP), an organization dedicated to software security. In the .NET community, he is a cofounder and president of the Pasadena .NET Developers group, which he has been successfully leading for 8 years. He led a number of successful enterprise solutions and consulted for several Fortune 500 company projects. Adnan devotes himself to his own continual, practical education. He holds certifications in big data, machine learning, and systems architecture from Massachusetts Institute of Technology; an Application Security certification from Stanford University; an SOA Smarts certification from Carnegie Mellon University; and certifications as a ScrumMaster, Microsoft Certified Trainer, Microsoft Certified Solutions Developer, and Sun Certified Java Developer. For more details, visit Adnan's blog (http://blog.adnanmasood.com), GitHub repository (http://github.com/adnanmasood), and Twitter (@adnanmasood). Adnan can be reached at adnan.masood@owasp.org.
  • 3. Announcement: Apache Spark for Azure HDInsight now generally available
  • 4. Channel 9 Walk through of Apache Spark on Azure HDInsight
  • 5. Spark 101  Spark is a unified framework for big data analytics. Spark provides one integrated API for use by developers, data scientists, and analysts to perform diverse tasks that would have previously required separate processing engines such as batch analytics, stream processing and statistical modeling. Spark supports a wide range of popular languages including Python, R, Scala, SQL, and Java. Spark can read from diverse data sources and scale to thousands of nodes.
  • 6. Spark with Azure HDInsight
  • 8. Big Data Deployment – Public Cloud • Hadoop-as-a-Service - Amazon Web Services EC2 and EMR - Microsoft Azure HDInsight - Google Cloud Dataproc - IBM Bluemix ... and others • Spark-as-a-Service - All of the above - Databricks
  • 9. Big Data Deployment – On-Premises • Bare-Metal • Virtual Machines - VMware Big Data Extensions - OpenStack Sahara • Containers - BlueData - Mesos
  • 10. HDInsight as Part of Azure Portal
  • 11.
  • 13. Spark – Use Cases
  • 16. Demo - Creating a HDInsight Spark Cluster
  • 17. HDInsight Spark Streaming “Along with traditional Hadoop technologies, HDInsight also provides Spark as a cloud service. Spark is an integrated set of open source technologies that can run on a Hadoop cluster. The Spark family includes options for analyzing large amounts of operational data, doing machine learning, and more. It also includes Spark Streaming, a technology for working with streaming data. Spark Streaming is similar to Storm in some ways. Like Storm, it’s a general-purpose technology for processing streaming data. Unlike Storm, Spark Streaming is implemented as an extension to the basic Spark engine—it’s not an add-on technology. This tight connection can make Spark applications faster, since there’s less need to move data between components, and easier to create, since everything uses the same core Spark technology. Because of this, Spark Streaming (and Spark in general) are getting more popular by the day” David Chappell STREAMING SCENARIOS USING THE MICROSOFT DATA PLATFORM A GUIDE FOR IT LEADERS
  • 18. HDInsight Spark Streaming • What is it? - Distributed compute framework, an extension of the core Apache Spark API - Allows users to integrate real-time data from disparate event streams (e.g. Kafka, HDFS, Twitter) in event-driven, asynchronous, scalable, type-safe, and fault tolerant applications • When to use it? - When organizations need realtime decision making - When you are working with streams of continuous data • Why Spark Streaming? - Enables high-throughput and reliable processing of live data streams - Batch, Iterative, and Streaming analysis on the same platform - Easily add Machine Learning for streaming data pathways
  • 20. References & Further Reading  Use MapReduce in Hadoop on HDInsight https://azure.microsoft.com/en- us/documentation/articles/hdinsight-use-mapreduce  Get started: Create Apache Spark cluster on HDInsight Linux and run interactive queries using Spark SQL https://azure.microsoft.com/en- us/documentation/articles/hdinsight-apache-spark- zeppelin-notebook-jupyter-spark-sql/  Azure Machine Learning - https://azure.microsoft.com/en-us/services/machine- learning/
  • 21. References & Further Reading  Announcing Apache Spark on Azure HDInsight https://channel9.msdn.com/Shows/Azure-Friday/Announcing-Apache- Spark-on-Azure-HDInsight  Apache Zeppelin https://zeppelin.incubator.apache.org  Project Jupyter http://jupyter.org/  https://azure.microsoft.com/en-us/services/hdinsight/  https://azure.microsoft.com/en-us/blog/apache-spark-for-azure- hdinsight-now-generally-available/  Microsoft expands its commitment to Apache Spark big-data framework  https://azure.microsoft.com/en-us/documentation/articles/hdinsight- apache-spark-use-zeppelin-notebook/  https://channel9.msdn.com/Shows/Azure-Friday/Announcing-Apache- Spark-on-Azure-HDInsight  http://www.c-sharpcorner.com/UploadFile/aa700f/jumpstart-into-big- data-with-hdinsight/
  • 22. References & Further Reading  Get started: Create Apache Spark cluster on HDInsight Linux and run interactive queries using Spark SQL https://azure.microsoft.com/en- us/documentation/articles/hdinsight-apache-spark-jupyter-spark-sql/  EdX Course: Processing Big Data with Azure HDInsight Processing Big Data with Azure HDInsight Learn how to use Hadoop technologies in Microsoft Azure HDInsight to process big data in this five week, hands-on course. https://www.edx.org/course/processing-big-data-azure-hdinsight- microsoft-dat202-1x-0  Apache Spark for Azure HDInsight https://azure.microsoft.com/en- us/services/hdinsight/apache-spark/  Build Machine Learning applications to run on Apache Spark clusters on HDInsight Linux https://azure.microsoft.com/en- us/documentation/articles/hdinsight-apache-spark-ipython-notebook- machine-learning/

Editor's Notes

  1. Slides courtesy Microsoft Corporation - Scott talks to Asad Khan about the addition of Apache Spark on Azure HDInsight. Apache Spark is a unified, open source, parallel data processing framework for Big Data Analytics. Spark brings together batch processing, real-time processing, stream analytics, machine learning, and interactive SQL and Azure makes Spark "Software as a Service." It's a great time for you to jump into the world of Big Data on Azure.
  2. Slide Courtesy Neudesic