SlideShare a Scribd company logo
1 of 19
10 Amazing Things 
To Do With a 
Hadoop-Based Data 
Lake 
Strata Conference New York 2014 
Greg Chase 
Director, Product Marketing, Pivotal Software 
© 2014 Pivotal Software, Inc. All rights reserved. 2
Pivotal Business Data Lake Architecture 
Sources Ingestion 
Action Tier 
Tier 
Insights 
Tier 
Unified Operations Tier 
Command Center 
Spring XD, Oozie 
Processing Tier 
GemFire XD 
HAWQ/Greenplum 
Distillation Tier 
Pivotal HD 
Unstructured and structured data 
GemFire XD 
Spring XD 
Spring XD 
GemFire XD 
Sqoop 
Flume 
Spring XD 
GemFire XD 
HAWQ 
HBase 
HAWQ 
GemFire XD 
HBase 
HAWQ 
MapReduce 
Hive 
Pig 
Query interfaces 
Clickstream 
Sensor Data 
Weblogs 
Network 
Data 
CRM Data 
ERP Data 
GemFire 
RabbitMQ 
Redis 
Pivotal CF 
© 2014 Pivotal Software, Inc. All rights reserved. 3
Pivotal Business Data Lake Architecture 
Sources Ingestion 
Action Tier 
Tier 
Insights 
Tier 
Unified Operations Tier 
Command Center 
Spring XD, Oozie 
Processing Tier 
GemFire XD 
HAWQ/Greenplum 
Distillation Tier 
Pivotal HD 
Unstructured and structured data 
GemFire XD 
Spring XD 
Spring XD 
GemFire XD 
Sqoop 
Flume 
Spring XD 
GemFire XD 
HAWQ 
HBase 
HAWQ 
GemFire XD 
HBase 
HAWQ 
MapReduce 
Hive 
Pig 
Query interfaces 
Clickstream 
Sensor Data 
Weblogs 
Network 
Data 
CRM Data 
ERP Data 
GemFire 
RabbitMQ 
Redis 
Pivotal CF 
© 2014 Pivotal Software, Inc. All rights reserved. 4
1. Store Massive Data Sets 
… 
Rack 1 Rack 2 Rack 3 Rack n 
Scale-out: 
use 
commodity 
hardware 
and storage 
© 2014 Pivotal Software, Inc. All rights reserved. 5
2. Mix Disparate Data Sources 
101010101010 
Sensor data 
CRM data 
Website click streams 
Schema 
flexibility: 
adsorb 
different 
data types 
from data 
sources 
© 2014 Pivotal Software, Inc. All rights reserved. 6
Pivotal Business Data Lake Architecture 
Sources Ingestion 
Action Tier 
Tier 
Insights 
Tier 
Unified Operations Tier 
Command Center 
Spring XD, Oozie 
Processing Tier 
GemFire XD 
HAWQ/Greenplum 
Distillation Tier 
Pivotal HD 
Unstructured and structured data 
GemFire XD 
Spring XD 
Spring XD 
GemFire XD 
Sqoop 
Flume 
Spring XD 
GemFire XD 
HAWQ 
HBase 
HAWQ 
GemFire XD 
HBase 
HAWQ 
MapReduce 
Hive 
Pig 
Query interfaces 
Clickstream 
Sensor Data 
Weblogs 
Network 
Data 
CRM Data 
ERP Data 
GemFire 
RabbitMQ 
Redis 
Pivotal CF 
© 2014 Pivotal Software, Inc. All rights reserved. 7
3. Ingest Bulk Data 
D … 
D … D 
Microbatch 
Scalable 
open source 
tools for 
batch 
loading data 
Batch 
Flume 
 Event driven 
 Any source 
Spring XD 
 Bulk load 
 With processing 
 With analytics 
 Any source 
Sqoop 
 Bulk load 
 RDBMS 
© 2014 Pivotal Software, Inc. All rights reserved. 8
4. Ingest High-Velocity Data 
Capture all 
volatile data. 
Apply 
structure. 
1010101010101010101 
1010101010101010101 
1010101010101010101 
Spring XD 
 Bulk load 
 Real-time ingest 
 With processing 
 With analytics 
 Any source 
Pivotal GemFire XD 
 Advanced DB operations 
 Consistency 
 Reliable persistence 
 Convert to structured 
Streaming data 
© 2014 Pivotal Software, Inc. All rights reserved. 9
Pivotal Business Data Lake Architecture 
Sources Ingestion 
Action Tier 
Tier 
Insights 
Tier 
Unified Operations Tier 
Command Center 
Spring XD, Oozie 
Processing Tier 
GemFire XD 
HAWQ/Greenplum 
Distillation Tier 
Pivotal HD 
Unstructured and structured data 
GemFire XD 
Spring XD 
Spring XD 
GemFire XD 
Sqoop 
Flume 
Spring XD 
GemFire XD 
HAWQ 
HBase 
HAWQ 
GemFire XD 
HBase 
HAWQ 
MapReduce 
Hive 
Pig 
Query interfaces 
Clickstream 
Sensor Data 
Weblogs 
Network 
Data 
CRM Data 
ERP Data 
GemFire 
RabbitMQ 
Redis 
Pivotal CF 
© 2014 Pivotal Software, Inc. All rights reserved. 10
5. Apply Structure to Unstructured / Semi- 
Structured Data 
Flexible 
processing 
of different 
data types 
101010101010 
1 
101010101010 
1 
101010101010 
1 
© 2014 Pivotal Software, Inc. All rights reserved. 11
6. Make Data Available for MPP SQL Analysis 
Name 
Node 
Fast 
processing 
for 
advanced 
analytics in 
many 
supported 
HDFS 
formats 
Resource 
Manager 
HAWQ 
Master 
Data 
Node 
Node 
Manager 
HAWQ 
Segment(s) 
Data 
Node 
Node 
Manager 
Data 
Node 
Node 
Manager 
Data 
Node 
Node 
Manager 
HAWQ 
Segment(s) 
HAWQ 
Segment(s) 
HAWQ 
Segment(s) 
Hadoop Cluster 
© 2014 Pivotal Software, Inc. All rights reserved. 12
7. Achieve Data Integration 
Create multi-dimensional 
analytical 
models. 
101010101010 
1 
101010101010 
1 
101010101010 
1 
© 2014 Pivotal Software, Inc. All rights reserved. 13
8. Improve Machine Learning & Predictive 
Analytics 
Richer, 
deeper data 
sets for 
accurate 
predictive 
analytics. 
HAWQ 
Master 
HAWQ 
Segment(s) 
HAWQ 
Segment(s) 
HAWQ 
Segment(s) 
© 2014 Pivotal Software, Inc. All rights reserved. 14
9. Deploy Real-Time Automation at Scale 
Respond in 
real-time, at 
scale. 
Archive 
history in 
Hadoop. 
Pivotal 
GemFire XD 
101010101010 
Web 
App 
Web 
App 
Web 
App 
101010101010 
In-Memory 
© 2014 Pivotal Software, Inc. All rights reserved. 15
10. Achieve Continuous Innovation at Scale 
HAWQ 
Master 
HAWQ 
Segment(s) 
HAWQ 
Segment(s) 
HAWQ 
Segment(s) 
In-Memory 
Web 
App 
Web 
App 
Web 
App 
101010101010 
Sensor data 
CRM data 
Website click streams 
Deploy automation 
At scale 
Capture and store all data 
Analyze to 
discover insights 
& algorithms 
© 2014 Pivotal Software, Inc. All rights reserved. 16
Increase Value Derived from Data With a Data 
Lake 
Store 
massive 
data sets 
Mix 
disparate 
data 
Ingest bulk 
data 
Ingest 
high-velocity 
data 
Apply 
structure 
Enable 
MPP 
analysis 
Achieve 
data 
integration 
Business Value 
Improve 
predictive 
analytics 
Deploy 
real-time 
automation 
at scale 
Achieve 
continuous 
innovation 
© 2014 Pivotal Software, Inc. All rights reserved. 17
For more information on 
Pivotal Big Data Suite 
Visit Pivotal.io/big-data 
© 2014 Pivotal Software, Inc. All rights reserved. 18
10 Amazing Things To Do With a Hadoop-Based Data Lake

More Related Content

What's hot

Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDataWorks Summit
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architectureMilos Milovanovic
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
The Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsThe Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsHortonworks
 
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Hortonworks
 
Data Governance for Data Lakes
Data Governance for Data LakesData Governance for Data Lakes
Data Governance for Data LakesKiran Kamreddy
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success DataWorks Summit/Hadoop Summit
 
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with AmbariAmbari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with AmbariHortonworks
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...DataWorks Summit
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHortonworks
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Hortonworks
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 DataWorks Summit
 
Oncrawl elasticsearch meetup france #12
Oncrawl elasticsearch meetup france #12Oncrawl elasticsearch meetup france #12
Oncrawl elasticsearch meetup france #12Tanguy MOAL
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecturemark madsen
 
Spark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun MurthySpark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun MurthySpark Summit
 

What's hot (20)

Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Solving Big Data Problems using Hortonworks
Solving Big Data Problems using Hortonworks Solving Big Data Problems using Hortonworks
Solving Big Data Problems using Hortonworks
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
 
Beyond TCO
Beyond TCOBeyond TCO
Beyond TCO
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architecture
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
The Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsThe Next Generation of Big Data Analytics
The Next Generation of Big Data Analytics
 
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
 
Data Governance for Data Lakes
Data Governance for Data LakesData Governance for Data Lakes
Data Governance for Data Lakes
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
 
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with AmbariAmbari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
 
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea...
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - Jaspersoft
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Oncrawl elasticsearch meetup france #12
Oncrawl elasticsearch meetup france #12Oncrawl elasticsearch meetup france #12
Oncrawl elasticsearch meetup france #12
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
 
Spark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun MurthySpark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun Murthy
 

Similar to 10 Amazing Things To Do With a Hadoop-Based Data Lake

Trafodion – an enterprise class sql based on hadoop
Trafodion – an enterprise class sql based on hadoopTrafodion – an enterprise class sql based on hadoop
Trafodion – an enterprise class sql based on hadoopKrishna-Kumar
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformEMC
 
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudBring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudDataWorks Summit/Hadoop Summit
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachDataWorks Summit
 
Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Inside Analysis
 
Hortonworks Data In Motion Webinar Series Pt. 2
Hortonworks Data In Motion Webinar Series Pt. 2Hortonworks Data In Motion Webinar Series Pt. 2
Hortonworks Data In Motion Webinar Series Pt. 2Hortonworks
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextHortonworks
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - finalHortonworks
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Hortonworks
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSHortonworks
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개Seungdon Choi
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Mac Moore
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
 
Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHortonworks
 
Business intelligence in the era of big data
Business intelligence in the era of big dataBusiness intelligence in the era of big data
Business intelligence in the era of big dataJC Raveneau
 
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方HortonworksJapan
 

Similar to 10 Amazing Things To Do With a Hadoop-Based Data Lake (20)

Trafodion – an enterprise class sql based on hadoop
Trafodion – an enterprise class sql based on hadoopTrafodion – an enterprise class sql based on hadoop
Trafodion – an enterprise class sql based on hadoop
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
 
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the CloudBring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
Bring your SAP and Enterprise Data to Hadoop, Apache Kafka and the Cloud
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
 
Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?Hadoop as an Analytic Platform: Why Not?
Hadoop as an Analytic Platform: Why Not?
 
Hortonworks Data In Motion Webinar Series Pt. 2
Hortonworks Data In Motion Webinar Series Pt. 2Hortonworks Data In Motion Webinar Series Pt. 2
Hortonworks Data In Motion Webinar Series Pt. 2
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - final
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Hp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar SlidesHp Converged Systems and Hortonworks - Webinar Slides
Hp Converged Systems and Hortonworks - Webinar Slides
 
Business intelligence in the era of big data
Business intelligence in the era of big dataBusiness intelligence in the era of big data
Business intelligence in the era of big data
 
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
 

More from VMware Tanzu

What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItVMware Tanzu
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023VMware Tanzu
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleVMware Tanzu
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023VMware Tanzu
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductVMware Tanzu
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready AppsVMware Tanzu
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And BeyondVMware Tanzu
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfVMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023VMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023VMware Tanzu
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptxVMware Tanzu
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchVMware Tanzu
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishVMware Tanzu
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVMware Tanzu
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - FrenchVMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023VMware Tanzu
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootVMware Tanzu
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerVMware Tanzu
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeVMware Tanzu
 
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsSpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsVMware Tanzu
 

More from VMware Tanzu (20)

What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About It
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at Scale
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a Product
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready Apps
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And Beyond
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptx
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - French
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - English
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - English
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - French
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software Engineer
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs Practice
 
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsSpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

10 Amazing Things To Do With a Hadoop-Based Data Lake

  • 1.
  • 2. 10 Amazing Things To Do With a Hadoop-Based Data Lake Strata Conference New York 2014 Greg Chase Director, Product Marketing, Pivotal Software © 2014 Pivotal Software, Inc. All rights reserved. 2
  • 3. Pivotal Business Data Lake Architecture Sources Ingestion Action Tier Tier Insights Tier Unified Operations Tier Command Center Spring XD, Oozie Processing Tier GemFire XD HAWQ/Greenplum Distillation Tier Pivotal HD Unstructured and structured data GemFire XD Spring XD Spring XD GemFire XD Sqoop Flume Spring XD GemFire XD HAWQ HBase HAWQ GemFire XD HBase HAWQ MapReduce Hive Pig Query interfaces Clickstream Sensor Data Weblogs Network Data CRM Data ERP Data GemFire RabbitMQ Redis Pivotal CF © 2014 Pivotal Software, Inc. All rights reserved. 3
  • 4. Pivotal Business Data Lake Architecture Sources Ingestion Action Tier Tier Insights Tier Unified Operations Tier Command Center Spring XD, Oozie Processing Tier GemFire XD HAWQ/Greenplum Distillation Tier Pivotal HD Unstructured and structured data GemFire XD Spring XD Spring XD GemFire XD Sqoop Flume Spring XD GemFire XD HAWQ HBase HAWQ GemFire XD HBase HAWQ MapReduce Hive Pig Query interfaces Clickstream Sensor Data Weblogs Network Data CRM Data ERP Data GemFire RabbitMQ Redis Pivotal CF © 2014 Pivotal Software, Inc. All rights reserved. 4
  • 5. 1. Store Massive Data Sets … Rack 1 Rack 2 Rack 3 Rack n Scale-out: use commodity hardware and storage © 2014 Pivotal Software, Inc. All rights reserved. 5
  • 6. 2. Mix Disparate Data Sources 101010101010 Sensor data CRM data Website click streams Schema flexibility: adsorb different data types from data sources © 2014 Pivotal Software, Inc. All rights reserved. 6
  • 7. Pivotal Business Data Lake Architecture Sources Ingestion Action Tier Tier Insights Tier Unified Operations Tier Command Center Spring XD, Oozie Processing Tier GemFire XD HAWQ/Greenplum Distillation Tier Pivotal HD Unstructured and structured data GemFire XD Spring XD Spring XD GemFire XD Sqoop Flume Spring XD GemFire XD HAWQ HBase HAWQ GemFire XD HBase HAWQ MapReduce Hive Pig Query interfaces Clickstream Sensor Data Weblogs Network Data CRM Data ERP Data GemFire RabbitMQ Redis Pivotal CF © 2014 Pivotal Software, Inc. All rights reserved. 7
  • 8. 3. Ingest Bulk Data D … D … D Microbatch Scalable open source tools for batch loading data Batch Flume  Event driven  Any source Spring XD  Bulk load  With processing  With analytics  Any source Sqoop  Bulk load  RDBMS © 2014 Pivotal Software, Inc. All rights reserved. 8
  • 9. 4. Ingest High-Velocity Data Capture all volatile data. Apply structure. 1010101010101010101 1010101010101010101 1010101010101010101 Spring XD  Bulk load  Real-time ingest  With processing  With analytics  Any source Pivotal GemFire XD  Advanced DB operations  Consistency  Reliable persistence  Convert to structured Streaming data © 2014 Pivotal Software, Inc. All rights reserved. 9
  • 10. Pivotal Business Data Lake Architecture Sources Ingestion Action Tier Tier Insights Tier Unified Operations Tier Command Center Spring XD, Oozie Processing Tier GemFire XD HAWQ/Greenplum Distillation Tier Pivotal HD Unstructured and structured data GemFire XD Spring XD Spring XD GemFire XD Sqoop Flume Spring XD GemFire XD HAWQ HBase HAWQ GemFire XD HBase HAWQ MapReduce Hive Pig Query interfaces Clickstream Sensor Data Weblogs Network Data CRM Data ERP Data GemFire RabbitMQ Redis Pivotal CF © 2014 Pivotal Software, Inc. All rights reserved. 10
  • 11. 5. Apply Structure to Unstructured / Semi- Structured Data Flexible processing of different data types 101010101010 1 101010101010 1 101010101010 1 © 2014 Pivotal Software, Inc. All rights reserved. 11
  • 12. 6. Make Data Available for MPP SQL Analysis Name Node Fast processing for advanced analytics in many supported HDFS formats Resource Manager HAWQ Master Data Node Node Manager HAWQ Segment(s) Data Node Node Manager Data Node Node Manager Data Node Node Manager HAWQ Segment(s) HAWQ Segment(s) HAWQ Segment(s) Hadoop Cluster © 2014 Pivotal Software, Inc. All rights reserved. 12
  • 13. 7. Achieve Data Integration Create multi-dimensional analytical models. 101010101010 1 101010101010 1 101010101010 1 © 2014 Pivotal Software, Inc. All rights reserved. 13
  • 14. 8. Improve Machine Learning & Predictive Analytics Richer, deeper data sets for accurate predictive analytics. HAWQ Master HAWQ Segment(s) HAWQ Segment(s) HAWQ Segment(s) © 2014 Pivotal Software, Inc. All rights reserved. 14
  • 15. 9. Deploy Real-Time Automation at Scale Respond in real-time, at scale. Archive history in Hadoop. Pivotal GemFire XD 101010101010 Web App Web App Web App 101010101010 In-Memory © 2014 Pivotal Software, Inc. All rights reserved. 15
  • 16. 10. Achieve Continuous Innovation at Scale HAWQ Master HAWQ Segment(s) HAWQ Segment(s) HAWQ Segment(s) In-Memory Web App Web App Web App 101010101010 Sensor data CRM data Website click streams Deploy automation At scale Capture and store all data Analyze to discover insights & algorithms © 2014 Pivotal Software, Inc. All rights reserved. 16
  • 17. Increase Value Derived from Data With a Data Lake Store massive data sets Mix disparate data Ingest bulk data Ingest high-velocity data Apply structure Enable MPP analysis Achieve data integration Business Value Improve predictive analytics Deploy real-time automation at scale Achieve continuous innovation © 2014 Pivotal Software, Inc. All rights reserved. 17
  • 18. For more information on Pivotal Big Data Suite Visit Pivotal.io/big-data © 2014 Pivotal Software, Inc. All rights reserved. 18

Editor's Notes

  1. This is an architecture of a Business Data Lake. It is centered around Hadoop-based storage. It includes tools and components for ingesting data from different kinds of data sources, processing data for analytics and insights, and for supporting applications that utilize data, implement insights, and contribute data back to the data lake. In this presentation, we will look at the various components of a business data lake architecture, and show how to use it to maximize the value of your company’s data.
  2. Let’s first look at why Hadoop and HDFS for a data lake makes a lot of sense.
  3. Hadoop, and its underlying Hadoop File System, or HDFS, is a distributed file system that supports arbitrarily large clusters. This means your data storage can theoretically be as large as needed to fit your needs. You simply add more clusters as you need more space.
  4. HDFS is schema-less, which means it can support files of any type and format. This is great for storing unstructured or semi-structured data, as well as non relational data formats such as binary streams from sensors, image data, machine logging. It’s also just fine for storing for structured, relational tabular data.
  5. When your data storage can take any kind of data from any kind of source, allowing this data to be loaded and stored can be a challenge. This is why a wide selection of tools for ingest is needed to implement a data lake.
  6. Batch loading can be achieved with a variety of tools, depending on additional sources needed. Sqoop, for example, is great for handling large data batch loading, and can even pull data from legacy databases. On the other hand, if your bulk loading operation needs some additional processing on it – such as you want to transform data from one format to another or create metadata, and if you want to be able to create analytics, then another open source tool, Spring XD, is available and provides scale and flexibility to handle your specific needs. Microbatch – in other words, smaller, but recurring batch loads, such as data change deltas or event-triggered updates, is handled well by Flume.
  7. Storing high-velocity data into Hadoop is a different challenge altogether. Considering that your source could be in any volume in addition to speed. If ensuring you store all the data is paramount, you need tools that can capture and queuedata in any scale or volume until the Hadoop cluster is able to store. A data lake based on Pivotal Big Data Suite has two tools built for these use cases. In fact they can work together: Spring XD can scale to handle data streaming at real time, and provide the same capabilities of processing and analyzing. Pivotal GemFire XD can work with Spring XD to provide advanced database operations such search for duplicates in a window of time, for example, and allows you to ensure consistency of data in writes. Since it it’s a SQL-based database, it’s also great for helping convert or add structure to ingested data.
  8. Once you have the ability to store and load data into your data lake, the next is deriving business value by processing, gaining insights, and taking action on the data.
  9. It’s great that one can can get any kind of data into an HDFS data store. However, to be able to conduct advanced analytics on it, you often need to make it accessible to structured-based analysis tools. This kind of processing may involve direct transformation of file types, or it might simply mean analyzing and creating meta data about the file type. This can be done on ingest with some of the tools described, or can be processed after being stored in Hadoop. Examples might be transforming binary image formats into RDBMS tables to enable large scale image processing, or even simple ETL processes on web logs so that it can later be turned into fact tables.
  10. Once you have structure applied to your data, its possible to leverage SQL-based tools to do fast processing on your data for advanced analytics and data science. Only HAWQ provides full analytic SQL support on Hadoop in massive parallel processing. This allows you enjoy very high performance leveraging advanced analytics functions in MADlib, as well as when using analytics applications such as SAS.
  11. With structure applied to your data, and the ability to deploy advanced analytics, now you can start doing some very powerful investigation, which is actually supported by Hadoop. By discovering relationships between otherwise seemingly unrelated data sets, its possible to discover correlations and potential causation, and create multi-dimensional analytical models that have higher precision in predictive analytics.
  12. Since HDFS allows you to store as much data as you want at a very cheap price, its possible to store larger detail data sets such as time series feeds, and application logs. In traditional data warehousing, ETL processes will aggregate and summarize this information, and lose detail for purposes of facilitating reporting. By saving the detail, its possible to run machine learning algorithms on the data to help build more accurate predictive analytics.
  13. Distributed in-memory databases such as Pivotal GemFire XD make it possible to deploy real-time data-driven automation at scale. This means you can deploy applications for responding to and processing incoming streaming data such as for Internet of Things applications, or support large scale mobile-web applications. You want to create intelligent user experiences, and provide smart automation and processing in the backend. You also want to be able to capture and store detailed logging of all interactions for further analysis.
  14. The ability to deploy automation at scale, capture and store all data, and analyze to discover insights and algorithms is an ongoing process of continuous improvement and innovation.
  15. Using the full capabilities of a data lake from storing massive data sets to achieving coninuous innovation allows your company to maximize the business value it generates off its data.