SlideShare a Scribd company logo
1 of 28
Global Sponsor:
Big Data on the Microsoft Platform
Andrew J. Brust, CEO, Blue Badge Insights
With Hadoop, MS BI and the SQL Server stack
CEO and Founder, Blue Badge Insights
Big Data blogger for ZDNet
Microsoft Regional Director, MVP
Co-chair VSLive! and 17 years as a speaker
Founder, Microsoft BI User Group of NYC
 http://www.msbigdatanyc.com
Co-moderator, NYC .NET Developers Group
 http://www.nycdotnetdev.com
“Redmond Review” columnist for
Visual Studio Magazine
brustblog.com, Twitter: @andrewbrust
Meet Andrew
Read all about it!
My New Blog (bit.ly/bigondata)
Agenda
Big Data, Hadoop and HDInsight
MapReduce
Hive ODBC, BI Stack
Hekaton, NoSQL
SQL Server Parallel Data Warehouse, MPP, PolyBase
What is Big Data?
100s of TB into PB and higher
Involving data from: financial data, sensors, web logs,
social media, etc.
Parallel processing often involved
 Hadoop is emblematic, but other technologies are Big Data too
Processing of data sets too large for transactional
databases
 Analyzing interactions, rather than transactions
 The three V’s: Volume, Velocity, Variety
•Big Data tech sometimes imposed on small data problems
What is Hadoop?
Open source implementation of Google’s MapReduce and
GFS (Google File System)
Allows for scale-out processing of petabyte scale data
 1 PB = 1,024 TB
Also distributed storage
Commodity hardware
Can work against flat files, or certain database formats
Native processing involves imperative Java code
Other languages supported through “Streaming”
7
What is HDInsight?
Microsoft’s Hadoop distribution, on Windows
 Most other distros on Linux
Based on Hortonworks Data Platform (HDP)
Runs on Azure, eventually on Windows Server, and as
sandbox on dev PC
For .NET devs: .NET SDK for Hadoop, LINQ provider
8
Global Sponsor:
Demo
HDInsight
The Hadoop Stack
MapReduce, HDFS
Database
RDBMS Import/Export
Query: HiveQL and Pig Latin
Machine Learning/Data Mining
Log file integration
MapReduce, in a Diagram
mapper
mapper
mapper
mapper
mapper
mapper
Input
reducer
reducer
reducer
Input
Input
Input
Input
Input
Input
Output
Output
Output
Output
Output
Output
Output
Input
Input
Input
K1
K2
K3
Output
Output
Output
A MapReduce Example
• Count by suite, on each floor
• Send per-suite, per platform totals to lobby
• Sort totals by platform
• Send two platform packets to 10th, 20th, 30th floor
• Tally up each platform
• Merge tallies into one spreadsheet
• Collect the tallies
MapReduce Options
Pig, Hive, Sqoop, Mahout also generate MapReduce code
13
Java
JavaScript (“Rhino”)
Other languages, especially Python, via Streaming
C# via Streaming
C# via .NET SDK
Hortonworks Data
Platform for
Windows
MRLib (NuGet
Package)
LINQ to Hive
OdbcClient +
Hive ODBC
Driver
Deployment
WebHDFS
client
MR code in C#,
HadoopJob,
MapperBase,
ReducerBase
Amenities for
Visual Studio/.NET
Global Sponsor:
Demo
MapReduce
Hive
Began as Hadoop sub-project
 Now top-level Apache project
Provides a SQL-like (“HiveQL”) abstraction over
MapReduce
Has its own HDFS table file format (and it’s fully schema-
bound)
Can also work over HBase
Acts as a bridge to many BI products which expect tabular
data
Hive ODBC Consumers
17
Excel 2010 or 2013 (including via add-in)
PowerPivot
SQL Server Analysis Services, Tabular Mode
SQL Server Reporting Services
ADO.NET OdbcClient provider
LINQ provider
xVelocity Technologies
Formerly known as VertiPaq
PowerPivot, SSAS Tabular, SQL Server columnar indexes
Implements BI Semantic Model (BISM)
Uses column store technology
 Compression
 In-memory
 Speed
Not a Big Data technology per se, but very useful for
analysis of job output
18
Power View
Reports on BISM models (PowerPivot, SSAS Tabular)
Hosted in SharePoint 2010, 2013 Enterprise
Also Excel 2013 (but not on ARM/Windows RT)
Interactive data exploration
19
Global Sponsor:
Demo
Hive ODBC + BI Stack
Project “Hekaton”
In-memory engine for SQL Server transactional workloads
Tables must be declared as in-memory explicitly
In-memory and standard tables can coexist in same db
Stored procs on in-mem tables are compiled to native code
Hekaton and xVelocity are separate
Hekaton ≠ PowerPivot/SSAS Tabular
Hekaton ≠ Columnstore indexes
Compare to SAP HANA
 In-memory, transactional, analytical, column store
21
NoSQL
NoSQL databases are non-relational and non- or loosely-
schematized
HBase is a NoSQL database, of the wide column variety
 Hive implements a SQL layer over it
 HBase not yet in HDInsight
HBase table = HDFS file
Three other NoSQL categories
 Key-value store, document store, graph database
 Azure Table Storage is a key-value store NoSQL database
Some of them aren’t really Big Data tools, but market
themselves that way anyway
22
SQL Parallel Data Warehouse (PDW)
SQL PDW is a Massively Parallel Processing (MPP)
database
Teradata, IBM Netezza, HP Vertica also in this category
It’s an array/cluster of SQL servers made to look like one
SQL Server
Available as appliance only
 Purchase from HP, Dell
 Server, storage and network all pre-built and configured
Many other MPP products based on PostgreSQL
PDW loosely based on acquired DATAllegro product
 Implemented MPP with Ingres, written in Java, running on Linux
23
MapReduce versus MPP
24
MapReduce MPP
 Splits preprocessing amongst mapper
nodes and aggregation amongst reducers
 Scales infinitely on commodity hardware
 Uses locally attached commodity disks on
nodes
 Uses imperative code
 Processes flat files, wide column tables
(HBase), relational tables (Hive)
 Divide-and-conquer approach, parallel,
distributed
 Splits query amongst nodes then unifies
result sets
 Scales to high-end assets in the
appliance cabinet
 Uses shared storage (can be more
network traffic)
 Uses SQL
 Works with relational tables only
 Divide-and-conquer approach, parallel,
distributed
PolyBase
To be included in next version of PDW
Mashup of SQL Server and Hadoop
Enables PDW to address Hadoop data nodes (HDFS)
directly
Parallelism managed by PDW
Tables are imported into SQL Server db
 They are EXTERNAL tables
 They can participate in joins with standard tables
25
Resources
MS Big Data/HDInsight
 http://www.microsoft.com/bigdata
Apache Hadoop
 http://hadoop.apache.org/
Apache HBase
 http://hbase.apache.org/
SQL PDW
 http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/pdw.aspx
PolyBase
 http://gsl.azurewebsites.net/Projects/Polybase.aspx
xVelocity
 http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/in-
memory.aspx
Column store
 http://en.wikipedia.org/wiki/Column-oriented_DBMS
Power View
 http://office.microsoft.com/en-us/excel-help/power-view-explore-visualize-and-present-your-
data-HA102835634.aspx
Hekaton
 http://blogs.technet.com/b/dataplatforminsider/archive/2012/11/08/breakthrough-performance-
with-in-memory-technologies.aspx
26
Global Sponsor:
Questions?
Global Sponsor:
Thank You for Attending

More Related Content

What's hot

Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherJanBask Training
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideDanairat Thanabodithammachari
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecasesudhakara st
 
Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop IntroductionDzung Nguyen
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1Abbas Maazallahi
 
Big Data technology Landscape
Big Data technology LandscapeBig Data technology Landscape
Big Data technology LandscapeShivanandaVSeeri
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analyticsjoshwills
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache HadoopAjit Koti
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaEdureka!
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overviewvhrocca
 
Hw09 Welcome To Hadoop World
Hw09   Welcome To Hadoop WorldHw09   Welcome To Hadoop World
Hw09 Welcome To Hadoop WorldCloudera, Inc.
 

What's hot (20)

Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guide
 
Big data hadoop rdbms
Big data hadoop rdbmsBig data hadoop rdbms
Big data hadoop rdbms
 
Big data & hadoop
Big data & hadoopBig data & hadoop
Big data & hadoop
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 
Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop Introduction
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1
 
Big Data Concepts
Big Data ConceptsBig Data Concepts
Big Data Concepts
 
Big Data technology Landscape
Big Data technology LandscapeBig Data technology Landscape
Big Data technology Landscape
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analytics
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Hadoop
HadoopHadoop
Hadoop
 
Apache Hadoop at 10
Apache Hadoop at 10Apache Hadoop at 10
Apache Hadoop at 10
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache Hadoop
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overview
 
Hw09 Welcome To Hadoop World
Hw09   Welcome To Hadoop WorldHw09   Welcome To Hadoop World
Hw09 Welcome To Hadoop World
 

Viewers also liked

Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
Understanding yarn - Pune apex meetup jan 06 2016
Understanding yarn - Pune apex meetup jan 06 2016 Understanding yarn - Pune apex meetup jan 06 2016
Understanding yarn - Pune apex meetup jan 06 2016 Priyanka Gugale
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Cloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
Cloud Computing and the Microsoft Developer - A Down-to-Earth AnalysisCloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
Cloud Computing and the Microsoft Developer - A Down-to-Earth AnalysisAndrew Brust
 
MongoDB Hadoop and Humongous Data
MongoDB Hadoop and Humongous DataMongoDB Hadoop and Humongous Data
MongoDB Hadoop and Humongous DataMongoDB
 
SQL Saturday Paris 2015 - Polybase
SQL Saturday Paris 2015 - PolybaseSQL Saturday Paris 2015 - Polybase
SQL Saturday Paris 2015 - PolybaseRomain Casteres
 
Bi303 data warehousing with fast track and pdw - Assaf Fraenkel
Bi303 data warehousing with fast track and pdw - Assaf FraenkelBi303 data warehousing with fast track and pdw - Assaf Fraenkel
Bi303 data warehousing with fast track and pdw - Assaf Fraenkelsqlserver.co.il
 
Sql server 2012_parallel_data_warehouse_breakthrough_platform_white_paper
Sql server 2012_parallel_data_warehouse_breakthrough_platform_white_paperSql server 2012_parallel_data_warehouse_breakthrough_platform_white_paper
Sql server 2012_parallel_data_warehouse_breakthrough_platform_white_paperWendy Frodyma
 
Versa Shore Microsoft APS PDW webinar
Versa Shore Microsoft APS PDW webinarVersa Shore Microsoft APS PDW webinar
Versa Shore Microsoft APS PDW webinarShawn Rao
 
PDW value proposition
PDW value propositionPDW value proposition
PDW value propositionWendy Frodyma
 
What's New in SQL Server 2016 for BI
What's New in SQL Server 2016 for BIWhat's New in SQL Server 2016 for BI
What's New in SQL Server 2016 for BITeo Lachev
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSStéphane Fréchette
 
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Martin Voigt | Streaming-based Text Mining using Deep Learning and SemanticsMartin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semanticssemanticsconference
 
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsWhat Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsCloudera, Inc.
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data AnalyticsAttunity
 
How to get started with R programming
How to get started with R programmingHow to get started with R programming
How to get started with R programmingRamon Salazar
 
Transitioning to a BI Role
Transitioning to a BI RoleTransitioning to a BI Role
Transitioning to a BI RoleJames Serra
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 DataWorks Summit
 
Best Practices to Deliver BI Solutions
Best Practices to Deliver BI SolutionsBest Practices to Deliver BI Solutions
Best Practices to Deliver BI SolutionsJames Serra
 

Viewers also liked (20)

Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Understanding yarn - Pune apex meetup jan 06 2016
Understanding yarn - Pune apex meetup jan 06 2016 Understanding yarn - Pune apex meetup jan 06 2016
Understanding yarn - Pune apex meetup jan 06 2016
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Cloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
Cloud Computing and the Microsoft Developer - A Down-to-Earth AnalysisCloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
Cloud Computing and the Microsoft Developer - A Down-to-Earth Analysis
 
MongoDB Hadoop and Humongous Data
MongoDB Hadoop and Humongous DataMongoDB Hadoop and Humongous Data
MongoDB Hadoop and Humongous Data
 
SQL Saturday Paris 2015 - Polybase
SQL Saturday Paris 2015 - PolybaseSQL Saturday Paris 2015 - Polybase
SQL Saturday Paris 2015 - Polybase
 
Bi303 data warehousing with fast track and pdw - Assaf Fraenkel
Bi303 data warehousing with fast track and pdw - Assaf FraenkelBi303 data warehousing with fast track and pdw - Assaf Fraenkel
Bi303 data warehousing with fast track and pdw - Assaf Fraenkel
 
Sql server 2012_parallel_data_warehouse_breakthrough_platform_white_paper
Sql server 2012_parallel_data_warehouse_breakthrough_platform_white_paperSql server 2012_parallel_data_warehouse_breakthrough_platform_white_paper
Sql server 2012_parallel_data_warehouse_breakthrough_platform_white_paper
 
Versa Shore Microsoft APS PDW webinar
Versa Shore Microsoft APS PDW webinarVersa Shore Microsoft APS PDW webinar
Versa Shore Microsoft APS PDW webinar
 
PDW value proposition
PDW value propositionPDW value proposition
PDW value proposition
 
What's New in SQL Server 2016 for BI
What's New in SQL Server 2016 for BIWhat's New in SQL Server 2016 for BI
What's New in SQL Server 2016 for BI
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APS
 
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Martin Voigt | Streaming-based Text Mining using Deep Learning and SemanticsMartin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
 
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data HubsWhat Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
What Comes After The Star Schema? Dimensional Modeling For Enterprise Data Hubs
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Modern Veri Ambarı_Cem Kubilay
Modern Veri Ambarı_Cem KubilayModern Veri Ambarı_Cem Kubilay
Modern Veri Ambarı_Cem Kubilay
 
How to get started with R programming
How to get started with R programmingHow to get started with R programming
How to get started with R programming
 
Transitioning to a BI Role
Transitioning to a BI RoleTransitioning to a BI Role
Transitioning to a BI Role
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Best Practices to Deliver BI Solutions
Best Practices to Deliver BI SolutionsBest Practices to Deliver BI Solutions
Best Practices to Deliver BI Solutions
 

Similar to Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack

Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Andrew Brust
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big DataAndrew Brust
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop StoryMichael Rys
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony NguyenThanh Nguyen
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight ServiceNeil Mackenzie
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010BOSC 2010
 
Srikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hydSrikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hydsrikanth K
 
Big Data Analytics 2014
Big Data Analytics 2014Big Data Analytics 2014
Big Data Analytics 2014Stratebi
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real WorldMark Kromer
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 

Similar to Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack (20)

Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop Story
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
מיכאל
מיכאלמיכאל
מיכאל
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010
 
Srikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hydSrikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hyd
 
Big Data Analytics 2014
Big Data Analytics 2014Big Data Analytics 2014
Big Data Analytics 2014
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real World
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
Hadoop intro
Hadoop introHadoop intro
Hadoop intro
 

More from Andrew Brust

Azure ml screen grabs
Azure ml screen grabsAzure ml screen grabs
Azure ml screen grabsAndrew Brust
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Andrew Brust
 
NoSQL: An Analysis
NoSQL: An AnalysisNoSQL: An Analysis
NoSQL: An AnalysisAndrew Brust
 
Hitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BIHitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BIAndrew Brust
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft PlatformAndrew Brust
 
NoSQL and The Big Data Hullabaloo
NoSQL and The Big Data HullabalooNoSQL and The Big Data Hullabaloo
NoSQL and The Big Data HullabalooAndrew Brust
 
A Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data HullabalooA Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data HullabalooAndrew Brust
 
Hadoop and its Ecosystem Components in Action
Hadoop and its Ecosystem Components in ActionHadoop and its Ecosystem Components in Action
Hadoop and its Ecosystem Components in ActionAndrew Brust
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big DataAndrew Brust
 
Big Data and NoSQL in Microsoft-Land
Big Data and NoSQL in Microsoft-LandBig Data and NoSQL in Microsoft-Land
Big Data and NoSQL in Microsoft-LandAndrew Brust
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystemAndrew Brust
 
SQL Server Workshop for Developers - Visual Studio Live! NY 2012
SQL Server Workshop for Developers - Visual Studio Live! NY 2012SQL Server Workshop for Developers - Visual Studio Live! NY 2012
SQL Server Workshop for Developers - Visual Studio Live! NY 2012Andrew Brust
 
Power View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s DataPower View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s DataAndrew Brust
 
Evolved BI with SQL Server 2012
Evolved BIwith SQL Server 2012Evolved BIwith SQL Server 2012
Evolved BI with SQL Server 2012Andrew Brust
 
Grasping The LightSwitch Paradigm
Grasping The LightSwitch ParadigmGrasping The LightSwitch Paradigm
Grasping The LightSwitch ParadigmAndrew Brust
 
SQL Server Denali: BI on Your Terms
SQL Server Denali: BI on Your Terms SQL Server Denali: BI on Your Terms
SQL Server Denali: BI on Your Terms Andrew Brust
 
Microsoft and its Competition: A Developer-Friendly Market Analysis
Microsoft and its Competition: A Developer-Friendly Market Analysis Microsoft and its Competition: A Developer-Friendly Market Analysis
Microsoft and its Competition: A Developer-Friendly Market Analysis Andrew Brust
 

More from Andrew Brust (17)

Azure ml screen grabs
Azure ml screen grabsAzure ml screen grabs
Azure ml screen grabs
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
 
NoSQL: An Analysis
NoSQL: An AnalysisNoSQL: An Analysis
NoSQL: An Analysis
 
Hitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BIHitchhiker’s Guide to SharePoint BI
Hitchhiker’s Guide to SharePoint BI
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
 
NoSQL and The Big Data Hullabaloo
NoSQL and The Big Data HullabalooNoSQL and The Big Data Hullabaloo
NoSQL and The Big Data Hullabaloo
 
A Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data HullabalooA Practical Look at the NOSQL and Big Data Hullabaloo
A Practical Look at the NOSQL and Big Data Hullabaloo
 
Hadoop and its Ecosystem Components in Action
Hadoop and its Ecosystem Components in ActionHadoop and its Ecosystem Components in Action
Hadoop and its Ecosystem Components in Action
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
Big Data and NoSQL in Microsoft-Land
Big Data and NoSQL in Microsoft-LandBig Data and NoSQL in Microsoft-Land
Big Data and NoSQL in Microsoft-Land
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystem
 
SQL Server Workshop for Developers - Visual Studio Live! NY 2012
SQL Server Workshop for Developers - Visual Studio Live! NY 2012SQL Server Workshop for Developers - Visual Studio Live! NY 2012
SQL Server Workshop for Developers - Visual Studio Live! NY 2012
 
Power View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s DataPower View: Analysis and Visualization for Your Application’s Data
Power View: Analysis and Visualization for Your Application’s Data
 
Evolved BI with SQL Server 2012
Evolved BIwith SQL Server 2012Evolved BIwith SQL Server 2012
Evolved BI with SQL Server 2012
 
Grasping The LightSwitch Paradigm
Grasping The LightSwitch ParadigmGrasping The LightSwitch Paradigm
Grasping The LightSwitch Paradigm
 
SQL Server Denali: BI on Your Terms
SQL Server Denali: BI on Your Terms SQL Server Denali: BI on Your Terms
SQL Server Denali: BI on Your Terms
 
Microsoft and its Competition: A Developer-Friendly Market Analysis
Microsoft and its Competition: A Developer-Friendly Market Analysis Microsoft and its Competition: A Developer-Friendly Market Analysis
Microsoft and its Competition: A Developer-Friendly Market Analysis
 

Recently uploaded

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack

  • 1. Global Sponsor: Big Data on the Microsoft Platform Andrew J. Brust, CEO, Blue Badge Insights With Hadoop, MS BI and the SQL Server stack
  • 2. CEO and Founder, Blue Badge Insights Big Data blogger for ZDNet Microsoft Regional Director, MVP Co-chair VSLive! and 17 years as a speaker Founder, Microsoft BI User Group of NYC  http://www.msbigdatanyc.com Co-moderator, NYC .NET Developers Group  http://www.nycdotnetdev.com “Redmond Review” columnist for Visual Studio Magazine brustblog.com, Twitter: @andrewbrust Meet Andrew
  • 4. My New Blog (bit.ly/bigondata)
  • 5. Agenda Big Data, Hadoop and HDInsight MapReduce Hive ODBC, BI Stack Hekaton, NoSQL SQL Server Parallel Data Warehouse, MPP, PolyBase
  • 6. What is Big Data? 100s of TB into PB and higher Involving data from: financial data, sensors, web logs, social media, etc. Parallel processing often involved  Hadoop is emblematic, but other technologies are Big Data too Processing of data sets too large for transactional databases  Analyzing interactions, rather than transactions  The three V’s: Volume, Velocity, Variety •Big Data tech sometimes imposed on small data problems
  • 7. What is Hadoop? Open source implementation of Google’s MapReduce and GFS (Google File System) Allows for scale-out processing of petabyte scale data  1 PB = 1,024 TB Also distributed storage Commodity hardware Can work against flat files, or certain database formats Native processing involves imperative Java code Other languages supported through “Streaming” 7
  • 8. What is HDInsight? Microsoft’s Hadoop distribution, on Windows  Most other distros on Linux Based on Hortonworks Data Platform (HDP) Runs on Azure, eventually on Windows Server, and as sandbox on dev PC For .NET devs: .NET SDK for Hadoop, LINQ provider 8
  • 10. The Hadoop Stack MapReduce, HDFS Database RDBMS Import/Export Query: HiveQL and Pig Latin Machine Learning/Data Mining Log file integration
  • 11. MapReduce, in a Diagram mapper mapper mapper mapper mapper mapper Input reducer reducer reducer Input Input Input Input Input Input Output Output Output Output Output Output Output Input Input Input K1 K2 K3 Output Output Output
  • 12. A MapReduce Example • Count by suite, on each floor • Send per-suite, per platform totals to lobby • Sort totals by platform • Send two platform packets to 10th, 20th, 30th floor • Tally up each platform • Merge tallies into one spreadsheet • Collect the tallies
  • 13. MapReduce Options Pig, Hive, Sqoop, Mahout also generate MapReduce code 13 Java JavaScript (“Rhino”) Other languages, especially Python, via Streaming C# via Streaming C# via .NET SDK
  • 14. Hortonworks Data Platform for Windows MRLib (NuGet Package) LINQ to Hive OdbcClient + Hive ODBC Driver Deployment WebHDFS client MR code in C#, HadoopJob, MapperBase, ReducerBase Amenities for Visual Studio/.NET
  • 16. Hive Began as Hadoop sub-project  Now top-level Apache project Provides a SQL-like (“HiveQL”) abstraction over MapReduce Has its own HDFS table file format (and it’s fully schema- bound) Can also work over HBase Acts as a bridge to many BI products which expect tabular data
  • 17. Hive ODBC Consumers 17 Excel 2010 or 2013 (including via add-in) PowerPivot SQL Server Analysis Services, Tabular Mode SQL Server Reporting Services ADO.NET OdbcClient provider LINQ provider
  • 18. xVelocity Technologies Formerly known as VertiPaq PowerPivot, SSAS Tabular, SQL Server columnar indexes Implements BI Semantic Model (BISM) Uses column store technology  Compression  In-memory  Speed Not a Big Data technology per se, but very useful for analysis of job output 18
  • 19. Power View Reports on BISM models (PowerPivot, SSAS Tabular) Hosted in SharePoint 2010, 2013 Enterprise Also Excel 2013 (but not on ARM/Windows RT) Interactive data exploration 19
  • 21. Project “Hekaton” In-memory engine for SQL Server transactional workloads Tables must be declared as in-memory explicitly In-memory and standard tables can coexist in same db Stored procs on in-mem tables are compiled to native code Hekaton and xVelocity are separate Hekaton ≠ PowerPivot/SSAS Tabular Hekaton ≠ Columnstore indexes Compare to SAP HANA  In-memory, transactional, analytical, column store 21
  • 22. NoSQL NoSQL databases are non-relational and non- or loosely- schematized HBase is a NoSQL database, of the wide column variety  Hive implements a SQL layer over it  HBase not yet in HDInsight HBase table = HDFS file Three other NoSQL categories  Key-value store, document store, graph database  Azure Table Storage is a key-value store NoSQL database Some of them aren’t really Big Data tools, but market themselves that way anyway 22
  • 23. SQL Parallel Data Warehouse (PDW) SQL PDW is a Massively Parallel Processing (MPP) database Teradata, IBM Netezza, HP Vertica also in this category It’s an array/cluster of SQL servers made to look like one SQL Server Available as appliance only  Purchase from HP, Dell  Server, storage and network all pre-built and configured Many other MPP products based on PostgreSQL PDW loosely based on acquired DATAllegro product  Implemented MPP with Ingres, written in Java, running on Linux 23
  • 24. MapReduce versus MPP 24 MapReduce MPP  Splits preprocessing amongst mapper nodes and aggregation amongst reducers  Scales infinitely on commodity hardware  Uses locally attached commodity disks on nodes  Uses imperative code  Processes flat files, wide column tables (HBase), relational tables (Hive)  Divide-and-conquer approach, parallel, distributed  Splits query amongst nodes then unifies result sets  Scales to high-end assets in the appliance cabinet  Uses shared storage (can be more network traffic)  Uses SQL  Works with relational tables only  Divide-and-conquer approach, parallel, distributed
  • 25. PolyBase To be included in next version of PDW Mashup of SQL Server and Hadoop Enables PDW to address Hadoop data nodes (HDFS) directly Parallelism managed by PDW Tables are imported into SQL Server db  They are EXTERNAL tables  They can participate in joins with standard tables 25
  • 26. Resources MS Big Data/HDInsight  http://www.microsoft.com/bigdata Apache Hadoop  http://hadoop.apache.org/ Apache HBase  http://hbase.apache.org/ SQL PDW  http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/pdw.aspx PolyBase  http://gsl.azurewebsites.net/Projects/Polybase.aspx xVelocity  http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/in- memory.aspx Column store  http://en.wikipedia.org/wiki/Column-oriented_DBMS Power View  http://office.microsoft.com/en-us/excel-help/power-view-explore-visualize-and-present-your- data-HA102835634.aspx Hekaton  http://blogs.technet.com/b/dataplatforminsider/archive/2012/11/08/breakthrough-performance- with-in-memory-technologies.aspx 26
  • 28. Global Sponsor: Thank You for Attending