© Copyright 2016 EMC Corporation. All rights reserved.
EMC EMERGING TECHNOLOGIES
ROBERT HOUT - ADVISORY SYSTEMS ENGINEER
@rob_hout
ACCELERATING ANALYTICS
VALUE WITH A DATA LAKE
STAMPEDECON 2016
What is a Data Lake?
2020: A NEW DIGITAL WORLD
CONNECTED PEOPLE: 2.3B (2015) to 7B (2020), 3X
CONNECTED DEVICES: 4.9B (2015) to 30B (2020), 6X
DATA ON PLANET: 8ZB (2015) to 44ZB (2020), 5X
Why Ingest All The Data?
69%
83%
Source: “The Business of Data” and Economist Intelligence Report, Published Jan 2016
Every Organization Can Gain Insights
60% Generating revenue from data
Starting new BU developing data-related products / services
Used data to make existing products / services more profitable
Every Organization is a Data Organization
The Data Lake: Bringing Compute to Data
EDWs, Marts, Storage, Search, Servers, Documents, Archives
ERP, CRM, RDBMS, Machines
Files, Images, Video, Logs, Clickstreams
External Data Sources
4 Multi-workload analytic platform
•  Bring applications to data
•  Combine different workloads on common data (i.e. SQL + Search)
•  True BI agility
1 Active archive
•  Full fidelity original data
•  Indefinite time, any source
•  Lowest cost storage
2 Data management, transformations
•  One source of data for all analytics
•  Persisted state of transformed data
•  Significantly faster & cheaper
3 Self-service exploratory BI
•  Simple search + BI tools
•  “Schema on read” agility
•  Reduce BI user backlog requests
TO SUCCEED, SIMPLIFY TECHNOLOGY SO YOU
CAN SHIFT FOCUS TO BUSINESS OUTCOMES
KEY CAPABILITIES TO LOOK FOR IN A COMPREHENSIVE BIG
DATA SOLUTION
INGEST: Capture data from a wide range of sources, traditional and new
STORE: Store everything in one environment for cross data analysis
ANALYZE: Use advanced algorithms to discover new, predictive patterns
SURFACE: Share insights with business domain experts
ACT: Build data-driven applications to meet business needs
© Copyright 2014 EMC Corporation. All rights reserved.
Why a Data Lake
It delivers comprehensive data services, not a point solution
•  Our traditional IT customers solve the most pressing issues first, e.g.
building a physical Hadoop cluster
•  Customers are very good at building parts of a data lake that don’t always
align to one another
•  Customers struggle with integrating, managing, and deploying the various
platforms needed for business analytics
•  Customers have little or no overall data governance, but they need it in
order to establish a fully functional data lake
The Data Lake High-Level Vision
Self-Service BI and Analytics
• Business-led, cross-functional methodology focused on short, iterative release cycles
• Functional distinction between Data Preparation (IT) and Data Usage (Business)
• Enabling on-demand services - BI and Analytics sandboxes, tools, and data
Data as a Service
• The provisioning of data and services to the business independent of data end usage
• A key foundation of Self-Service BI (Data Preparation)
• Services can include publication, profiling, archiving, metadata, alerts, and notifications
Business Data Lake
• Alternative to traditional data warehousing focused on agility, flexibility and time to value
• Land data ‘as-is’ and transform on demand (‘schema on read’)
• Scale-out architecture that is adaptive to business cost/performance constraints
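The "land data as-is and transform on demand" idea can be illustrated with a minimal sketch. It assumes comma-delimited raw records and a hypothetical per-consumer schema (`SALES_SCHEMA`, `read_with_schema` and the sample rows are illustrative names and data, not part of the presentation). The point is only that no structure is enforced when the data lands; each reader applies the structure it needs.

```python
import csv
import io

# Raw data is landed exactly as received; no schema is enforced at write time.
# (Sample records are invented for illustration.)
RAW_LANDED = "2016-07-27,web,42\n2016-07-28,mobile,17\n"

# Hypothetical schema chosen by one consumer at read time: (column name, parser).
SALES_SCHEMA = [("day", str), ("channel", str), ("orders", int)]


def read_with_schema(raw_text, schema):
    """Apply a schema to raw delimited text while reading it."""
    rows = []
    for record in csv.reader(io.StringIO(raw_text)):
        row = {name: parse(value) for (name, parse), value in zip(schema, record)}
        rows.append(row)
    return rows


rows = read_with_schema(RAW_LANDED, SALES_SCHEMA)
total_orders = sum(row["orders"] for row in rows)
print(total_orders)
```

A second consumer could read the same landed text with a different schema (for example, parsing `day` as a date) without the data being rewritten.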
Process
Technology
Data Governance
Data Lake implementation strategy needs to…
•  Combine different data sources
•  Minimize data movement
•  Leverage the Apache ecosystem
•  Evolve seamlessly
•  Serve the Enterprise
Production Data
Web Logs
Public
Sales
Billing
CRM
SCM
Social Media
Location
Click Streams
Sensor Data
DATA LAKE
Security
Business Continuity
Compliance
Tools & Apps
Business Units
Data Migration
PRODUCTION HADOOP HAS SEVERAL CHALLENGES
Scalability
System Availability
Uptime                   Downtime (per year)
99.999% (AKA 5 nines)    5.26 minutes
99.99% (AKA 4 nines)     52.6 minutes
99.5%                    1.83 days
99% (AKA 2 nines)        3.65 days
95%                      18.25 days
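The downtime column follows directly from the availability percentage. A minimal sketch, assuming a 365-day year (the function and constant names are illustrative):

```python
DAYS_PER_YEAR = 365
MINUTES_PER_DAY = 24 * 60


def downtime_days_per_year(availability_percent):
    """Return the yearly downtime, in days, implied by an availability percentage."""
    return (1 - availability_percent / 100) * DAYS_PER_YEAR


for pct in (99.999, 99.99, 99.5, 99.0, 95.0):
    days = downtime_days_per_year(pct)
    minutes = days * MINUTES_PER_DAY
    print(f"{pct}% uptime: {days:.2f} days ({minutes:.1f} minutes) of downtime per year")
```

The same function can be used to compare a data warehouse SLA against a Hadoop SLA by plugging in each target.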
What is your Data Warehouse’s uptime SLA?
What is your Hadoop uptime SLA?
Why are they different?
APACHE Ecosystem Trends
•  Virtualization becoming more common
•  Enterprise data management, protection, security
•  SQL on Hadoop the norm
•  Spark exploding
–  Generally Lambda architecture, not Spark vs. M/R
•  Non-HDFS App Data Integration
–  ELK, MongoDB, Cassandra, etc.
•  High performance/ACID/Mem DBs with HDFS Backend
•  IoT data collection considerations (HWX Onyara/NiFi)
Traditional Hadoop For The Data Lake?
Direct-attached storage
Stand-alone Servers
Single purpose
All commodity environment
Traditional Hadoop
Efficiency, Agility, SLAs
Rapid deployment
Purpose Built Silos
Operational Complexity
Enterprise Challenges
Reintroduces challenges that Enterprise IT solved years ago
Hadoop HAS MULTIPLE WORKLOADS
“One size fits all” approach to Hadoop Infrastructure does
not scale for diverse production workloads
Hadoop Archive
Spark
HBase
SQL-on-Hadoop
Hive/Tez
MapReduce
Geo-Dist Hadoop
COLLECT, STORE, ANALYZE & USE
Traditional and Emerging Sources
Social Networks,
User Generated Content
Public records
Location Data
Internet Of Things
Emerging
Enterprise File Data
Machine Data
Traditional
Video Archive
COLLECT, STORE, ANALYZE & USE
Traditional and Emerging Sources
Emerging
Traditional
DAS
CLOUD
OBJECT
TAPE
SAN
NAS
Isilon
Scale-Out
Data Lake
Data Silos vs Consolidated Data Lake
•  One instance of the file services all dependent workloads simultaneously
FILE
EMC Isilon Next-Gen Access Methods
•  An access zone is:
–  A way to carve the cluster into smaller clusters
–  A way to control access based on individual authentication
–  OneFS’s Multi-Tenancy solution
NFS, SMB, HDFS and OpenStack Swift
Access Zones
Chez
NFS
Access Zone-1
System Zone
Access Zone-2
Kerberos-1
Domain Controller-2
LDAP-1 NIS - 1
Group Database - 1
Kerberos-2
Domain Controller-1
Group Database - 2
Data Sharing Across Access Zones
•  Same files can be accessed by
different access zone clients
•  Best for:
–  multi-group collaboration w/
untrusted Active Directories
–  multi-group data access
governed by IP subnet
–  Hadoop analytics over multi
access zone data
•  Uniquely solves the collaboration challenge; saves time and money
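The idea of several access zones, each with its own authentication provider and client subnet, reaching the same files can be modelled with a small sketch. This is not the OneFS API: the `AccessZone` class, the zone names, the subnets and the `/data/shared` path are all hypothetical, chosen only to illustrate "data access governed by IP subnet" over shared data.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessZone:
    """Hypothetical model of an access zone (not a real OneFS object)."""
    name: str
    auth_provider: str                 # label for the zone's own directory service
    subnet: ipaddress.IPv4Network      # clients in this subnet map to the zone
    base_path: str                     # directory tree the zone exposes


# Two zones with separate authentication providers exposing the same path.
ZONES = [
    AccessZone("zone-1", "Kerberos-1", ipaddress.ip_network("10.1.0.0/16"), "/data/shared"),
    AccessZone("zone-2", "Kerberos-2", ipaddress.ip_network("10.2.0.0/16"), "/data/shared"),
]


def zone_for_client(client_ip):
    """Return the zone whose subnet contains the client address, or None."""
    address = ipaddress.ip_address(client_ip)
    for zone in ZONES:
        if address in zone.subnet:
            return zone
    return None


zone = zone_for_client("10.2.8.15")
print(zone.name, zone.auth_provider, zone.base_path)
```

In this sketch, clients from either subnet authenticate against different providers yet resolve to the same `base_path`, which is the sharing pattern the slide describes.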
THE BIG DATA LANDSCAPE
DATA LAKE (HADOOP)
RDBMS
MACHINE
IOT
STATISTICAL MODELING/NLP
VISUALIZATION
TRANSFORM
BI
ORGANIZE
MANAGE/CATALOG
DATA WAREHOUSE
STREAM
CEP
NEAR REAL-TIME
MODELS MAY TAKE HOURS OR DAYS
QUERIES MAY RETURN IN SECONDS OR MINUTES
SECONDS
SEARCH/INDEX
ENTERPRISE LOG ANALYSIS
APPLICATIONS
3rd PARTY
EMAIL
SOCIAL MEDIA
SQL ON HADOOP
A Next Gen Data Lake Architecture
NEW SOURCES: Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs
EXISTING SOURCES: ERP, CRM
Commodity Compute
Hadoop Platform: HADOOP CORE, DATA SERVICES, OPERATIONAL SERVICES
Business Analytics
Visualization & Dashboards
IT Applications
Data Marts
Data Management
Isilon
NFS, SMB, HTTP, Swift
Callouts (1-5): ETL/ELT OFFLOAD, ACTIVE ARCHIVE, ENRICH WITH NEW DATA TYPES, MULTI-PROTOCOL ACCESS, ENTERPRISE-GRADE DATA MANAGEMENT
Legend: New Data Flow, Current Data Flow
OFFLOAD
The Data Lake Vision
Storage Layer
Data Store Manager
3rd Party
INGEST
MANAGER
STREAM
Exploratory Analytics
Isilon XtremIO ECS/ViPR DSSD
DATA
GOVERNOR
SECURITY
INDEXING
CATALOGING
POLICY
Modeling
Correlations
SQL NSQL BATCH
Interactive Analytics
Aggregates
OLAP
SQL NSQL
Realtime Analytics
Modeling
Scoring
SQL NSQL In MEM
Shared Store(s) Private Store(s)
FILE COLUMN DB RELATIONAL DB GRAPH DB KEY VALUE DOCUMENT
LOGS
FILE
BATCH
SQL
ETL
MARKETPLACE MANAGER
DATA SERVICES PORTAL
VNX
APPLICATIONS USERS
Analytics Platform Manager
VMware Openstack Docker Evo:Rails
Ingest Manager
Rapid collection of data from unlimited sources
Application Services Portal User Services Portal
Application Platform Manager
Data Ingest Manager
Ingest Application Provisioning
Ingest Management and Control
Catalog Connector Locality Manager
Indexing Connector Security Manager
Data Governor
The Data Governor
Enabling comprehensive data management
Data Catalog Management
Security Management
Security and Roles
LDAP AD BUILT-IN
Policy Management
Data Types
Shared Private
Data Sources
Consumer Access Rights
Compliant Data Sets
Encryption/Location Reqs.
Lineage Requirements
Index Management
Licensing
Resource Policies
Usage Limits
Index Management
Index Type
Index Usage
Indexing Resources
Index Engine(s)
Data Catalog Types
Public Private
Catalog Type
Catalog Usage
Catalog Resources
Catalog Engine(s)
Catalog Security and Roles
Catalog Operations
Collection Scavenging
Data Store Manager
Manage and provision storage for a variety of uses
Application Platform Manager
Data Store Manager
Storage Manager
Shared Stores
Private Stores
Compliant Stores
Temporary Stores
Data Governor
Isilon XtremIO ECS/ViPR DSSD VNX 3rd Party
Storage Configuration and Provisioning Manager
Application Provisioning
Application Platform Manager
Rapidly and seamlessly deploy applications and resources
Application Services Portal User Services Portal
Application Platform Manager
Application Provisioning
Compute Resource Provisioning
Platform Provisioning
VM Workflows
Provisioning Rules
Networking
Optimizations
Application Deployment Management
Package Manager App Store Manager
Data Store Manager
Data Governor
An example analytic workflow
From idea to action using the BDL
Application Provisioning
User Services Portal
Application Platform Manager
Platform Provisioning
Data Store Manager
Data Governor
Data Catalog Management
Security Management
Policy Management
Index Management
Optimization Engine
Recommendation Engine
WOULD YOU RATHER INTEGRATE OR INNOVATE?
THE DATA LAKE
One Customer’s Journey with Hadoop
Use Cases & Requirements
• As we evaluated the business use cases the platform would support, we determined that
we had a variety of workloads with different impacts on the platform
Use Cases
•  Enterprise Data Hub that can consolidate
disparate data sources to a common
platform (i.e. data types)
•  Migrate Enterprise Data Warehouse (EDW)
transient data to lower cost storage
platform
•  Enable data enrichment services to enable
in-record validation, data standardization
and analytic processing
•  Integrate and provision data to target
systems using Hadoop ecosystem
components (e.g. Pig, Hive)
Requirements
•  Ensure that the platform meets both
availability and recoverability targets
•  Align technology to internal skills and
competencies
•  Enable existing systems to interoperate with
the platform using native protocols or
services
•  Ability to test and certify commercial
products via a multi-distribution
environment
•  Enable co-resident processing of products to
optimize the use of deployed infrastructure
•  Ability to provide data protection and
isolation of client data within a single
instance of the platform (i.e. sub-tenancy)
Solution Approach
Support for a variety of acquisition channels
Common method for data types and formats
Orchestration framework that manages all job execution
Includes capabilities around data catalog, file validation, and schema evolution
Data integration and provisioning framework
Support for relational stores and exploration tools
Platform Approach
As we better defined and understood use cases and requirements,
it led us down a different path from a platform perspective
Data Warehouse Offload
Data Integration
Enterprise Data Hub
Enrichment, Validation and Quality
✓ The ability to independently scale storage and compute
✓ Provide data protection for critical business information
✓ Support backup and disaster recovery
✓ Centrally managed via intuitive user interface
✓ Leverage existing assets deployed in the enterprise
Example: Multi-protocol Support
One of our deployed use cases is multi-protocol support. This
enables us to leverage existing assets and talent in the enterprise
while still leveraging the compute paradigm of Hadoop
Example: Multi-distribution Support
Our organization sells a number of products to the market. Many of
these deployments are on-premises due to concerns around data
privacy or control, data transfer considerations, etc. To support this,
we needed a multi-distribution platform that could be used
for product certification across similar data sets
The Isilon Advantage for Hadoop
In-place analytics
•  No data ingest necessary; Isilon provides shared multi-protocol access
•  Native integration speeds time to insight
Enterprise data protection
•  Fast snapshots, backup, and data recovery
•  Simple, efficient data replication for disaster recovery
Lower costs
•  Eliminates the need for dedicated Hadoop infrastructure
•  Eliminates 3x mirroring for data protection
•  Much more efficient than DAS-based approach
Increase flexibility
•  Simultaneous support for any Apache-compliant Hadoop distribution
•  Collaborative engineering efforts with Cloudera, Hortonworks, and Pivotal
•  Ambari integration for management, monitoring, and provisioning
Scale-out storage with native Hadoop integration
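The "eliminates 3x mirroring" point is easiest to see as arithmetic. The sketch below compares the raw capacity needed under HDFS-style 3x replication with a parity-based protection scheme. The 8 data + 2 parity layout and the 100 TB figure are assumptions picked for illustration, not a statement of how any particular Isilon cluster is configured.

```python
def raw_capacity_for_replication(usable_tb, copies=3):
    """Raw capacity needed when every block is stored `copies` times."""
    return usable_tb * copies


def raw_capacity_for_parity(usable_tb, data_units, parity_units):
    """Raw capacity needed when each stripe holds data_units plus parity_units."""
    return usable_tb * (data_units + parity_units) / data_units


usable_tb = 100  # hypothetical amount of data to protect

replicated = raw_capacity_for_replication(usable_tb)
# Assumed 8+2 stripe layout, for illustration only.
parity_protected = raw_capacity_for_parity(usable_tb, data_units=8, parity_units=2)

print(f"3x replication: {replicated} TB raw")
print(f"8+2 parity:     {parity_protected} TB raw")
```

Under these assumptions the parity layout needs well under half the raw capacity of triple mirroring for the same usable data; the actual ratio depends on the protection level chosen.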
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016

More Related Content

What's hot

Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and DeploymentCisco Canada
 
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsHow to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsInformatica
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksAmazon Web Services
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesDenodo
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Lviv Startup Club
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationDatabricks
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHumza Naseer
 
Data lake – On Premise VS Cloud
Data lake – On Premise VS CloudData lake – On Premise VS Cloud
Data lake – On Premise VS CloudIdan Tohami
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseDataWorks Summit
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architectureMilos Milovanovic
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureMark Kromer
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016StampedeCon
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paperSupratim Ray
 
So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?David P. Moore
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive
 

What's hot (20)

Solving Big Data Problems using Hortonworks
Solving Big Data Problems using Hortonworks Solving Big Data Problems using Hortonworks
Solving Big Data Problems using Hortonworks
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsHow to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building Blocks
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing Architectures
 
Data lake – On Premise VS Cloud
Data lake – On Premise VS CloudData lake – On Premise VS Cloud
Data lake – On Premise VS Cloud
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architecture
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paper
 
So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
 

Similar to Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016

The Transformation of your Data in modern IT (Presented by DellEMC)
The Transformation of your Data in modern IT (Presented by DellEMC)The Transformation of your Data in modern IT (Presented by DellEMC)
The Transformation of your Data in modern IT (Presented by DellEMC)Cloudera, Inc.
 
2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data IntegrationJeffrey T. Pollock
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationDenodo
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockJeffrey T. Pollock
 
The Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data IntegrationThe Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data IntegrationEric Kavanagh
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database RoundtableEric Kavanagh
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Hortonworks
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Hortonworks
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata Hortonworks
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudDataWorks Summit
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data LakeMetroStar
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsDataWorks Summit
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
 

Similar to Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016 (20)

The Transformation of your Data in modern IT (Presented by DellEMC)
The Transformation of your Data in modern IT (Presented by DellEMC)The Transformation of your Data in modern IT (Presented by DellEMC)
The Transformation of your Data in modern IT (Presented by DellEMC)
 
2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff Pollock
 
The Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data IntegrationThe Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data Integration
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data Lake
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 

Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016

  • 1. © Copyright 2016 EMC Corporation. All rights reserved. EMC EMERGING TECHNOLOGIES ROBERT HOUT - ADVISORY SYSTEMS ENGINEER @rob_hout ACCELERATING ANALYTICS VALUE WITH A DATA LAKE STAMPEDECON 2016
  • 2. What is a Data Lake?
  • 3. 2020: A NEW DIGITAL WORLD. CONNECTED PEOPLE: 2.3B (2015) to 7B (2020), 3X. CONNECTED DEVICES: 4.9B (2015) to 30B (2020), 6X. DATA ON PLANET: 8ZB (2015) to 44ZB (2020), 5X.
  • 4. Why Ingest All The Data?
  • 5. 69% 83% Source: “The Business of Data” and Economist Intelligence Report, Published Jan 2016 Every Organization Can Gain Insights 60% Generating revenue from data Starting new BU developing data-related products / services Used data to make existing products / services more profitable Every Organization is A Data Organization
  • 6. The Data Lake: Bringing Compute to Data. EDWs, Marts, Storage, Search, Servers, Documents, Archives; ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources. 1. Active archive •  Full fidelity original data •  Indefinite time, any source •  Lowest cost storage 2. Data management, transformations •  One source of data for all analytics •  Persisted state of transformed data •  Significantly faster & cheaper 3. Self-service exploratory BI •  Simple search + BI tools •  “Schema on read” agility •  Reduce BI user backlog requests 4. Multi-workload analytic platform •  Bring applications to data •  Combine different workloads on common data (e.g. SQL + Search) •  True BI agility
  • 7. TO SUCCEED, SIMPLIFY TECHNOLOGY SO YOU CAN SHIFT FOCUS TO BUSINESS OUTCOMES. KEY CAPABILITIES TO LOOK FOR IN A COMPREHENSIVE BIG DATA SOLUTION: INGEST: Capture data from a wide range of sources, traditional and new. STORE: Store everything in one environment for cross data analysis. ANALYZE: Use advanced algorithms to discover new, predictive patterns. SURFACE: Share insights with business domain experts. ACT: Build data-driven applications to meet business needs.
  • 8. © Copyright 2014 EMC Corporation. All rights reserved. Why a Data Lake: It delivers comprehensive data services, not a point solution •  Our traditional IT customers solve the most pressing issues first, e.g. building a physical Hadoop cluster •  Customers are very good at building parts of a data lake that don’t always align to one another •  Customers struggle with integrating, managing, and deploying the various platforms needed for business analytics •  Customers have little or no overall data governance, but they need it in order to establish a fully functional data lake
  • 9. The Data Lake High-Level Vision. Self-Service BI and Analytics • Business-led, cross-functional methodology focused on short, iterative release cycles • Functional distinction between Data Preparation (IT) and Data Usage (Business) • Enabling on-demand services: BI and Analytics sandboxes, tools, and data. Data as a Service • The provisioning of data and services to the business independent of data end usage • A key foundation of Self-Service BI (Data Preparation) • Services can include publication, profiling, archiving, metadata, alerts, and notifications. Business Data Lake • Alternative to traditional data warehousing focused on agility, flexibility and time to value • Land data ‘as-is’ and transform on demand (‘schema on read’) • Scale-out architecture that is adaptive to business cost/performance constraints. Process Technology Data Governance
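
The "land data as-is and transform on demand" idea from slide 9 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample records, the field names, and the `read_with_schema` helper are hypothetical and do not come from the talk. The point is only that the raw lines are stored untouched and each reader applies its own schema when it reads them.

```python
import json
from typing import Callable, Dict, Iterable, Iterator

# Raw events landed "as-is": one JSON document per line, nothing enforced on write.
# The records and field names are invented for this example.
RAW_LINES = [
    '{"user": "a17", "amount": "19.99", "ts": "2016-07-27T10:00:00"}',
    '{"user": "b42", "amount": "5", "channel": "web"}',
]

# A schema here is just field name -> converter, chosen by whoever reads the data.
Schema = Dict[str, Callable[[str], object]]


def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[dict]:
    """Parse each raw line and project it onto the reader's schema.

    Fields missing from a record come back as None; fields the schema
    does not mention are ignored rather than rejected.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            name: convert(record[name]) if name in record else None
            for name, convert in schema.items()
        }


# Two consumers read the same landed data with different schemas.
billing_view = list(read_with_schema(RAW_LINES, {"user": str, "amount": float}))
marketing_view = list(read_with_schema(RAW_LINES, {"user": str, "channel": str}))
```

Because no structure is imposed at write time, a new consumer can be added later without reloading or reshaping the stored data; the trade-off is that type errors surface at read time instead of at ingest.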
  • 10. Data Lake implementation strategy needs to… •  Combine different data sources •  Minimize data movement •  Leverage the Apache ecosystem •  Evolve seamlessly •  Serve the Enterprise. DATA LAKE: Production Data, Web Logs, Public, Sales, Billing, CRM, SCM, Social Media, Location, Click Streams, Sensor Data
  • 11. PRODUCTION HADOOP HAS SEVERAL CHALLENGES: Security, Business Continuity, Compliance, Tools & Apps, Business Units, Data Migration, Scalability
  • 12. System Availability. Uptime / Downtime (per year): 99.999% (AKA 5 nines) / 5.26 minutes; 99.99% (AKA 4 nines) / 52.6 minutes; 99.5% / 1.83 days; 99% (AKA 2 nines) / 3.65 days; 95% / 18.25 days. What is your Data Warehouse’s uptime SLA? What is your Hadoop uptime SLA? Why are they different?
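
The downtime column on slide 12 follows from simple arithmetic: yearly downtime is the unavailable fraction multiplied by the length of a year. The sketch below is one way to reproduce it. The 365-day year and the function names are assumptions made for this example, not something stated in the talk.

```python
# Downtime per year implied by an availability percentage.
# Assumption: a 365-day year (525,600 minutes).
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the yearly downtime, in minutes, allowed by an uptime percentage."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (100 - availability_pct) / 100 * MINUTES_PER_YEAR


def describe(availability_pct: float) -> str:
    """Format the downtime in minutes when under a day, otherwise in days."""
    minutes = downtime_minutes_per_year(availability_pct)
    if minutes < MINUTES_PER_DAY:
        return f"{minutes:.2f} minutes"
    return f"{minutes / MINUTES_PER_DAY:.2f} days"


for pct in (99.999, 99.99, 99.5, 99, 95):
    print(f"{pct}% -> {describe(pct)}")
```

Running the same function against a data warehouse SLA and a Hadoop SLA makes the gap between the two concrete in minutes or days per year.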
  • 13. APACHE Ecosystem Trends •  Virtualization becoming more common •  Enterprise data management, protection, security •  SQL on Hadoop the norm •  Spark exploding –  Generally Lambda architecture, not Spark vs. M/R •  Non-HDFS App Data Integration –  ELK, MongoDB, Cassandra, etc. •  High performance/ACID/Mem DBs with HDFS Backend •  IoT data collection considerations (HWX Onyara/NiFi)
  • 14. Traditional Hadoop For The Data Lake? Traditional Hadoop: Direct-attached storage; Stand-alone Servers; Single purpose; All commodity environment. Enterprise Challenges: Efficiency, Agility, SLAs; Rapid deployment; Purpose Built Silos; Operational Complexity. Reintroduces challenges that Enterprise IT solved years ago
  • 15. HADOOP HAS MULTIPLE WORKLOADS. A “one size fits all” approach to Hadoop infrastructure does not scale for diverse production workloads: Hadoop Archive, Spark, HBase, SQL-on-Hadoop, Hive/Tez, MapReduce, Geo-Dist Hadoop
  • 16. COLLECT, STORE, ANALYZE & USE. Traditional and Emerging Sources: Social Networks, User Generated Content, Public records, Location Data, Internet Of Things (Emerging); Enterprise File Data, Machine Data (Traditional); Video Archive
  • 17. COLLECT, STORE, ANALYZE & USE. Traditional and Emerging Sources: Emerging, Traditional; DAS, CLOUD, OBJECT, TAPE, SAN, NAS
  • 18. Isilon Scale-Out Data Lake: Data Silos vs Consolidated Data Lake
  • 19. EMC Isilon Next-Gen Access Methods •  One instance of the file services all dependent workloads simultaneously
  • 20. NFS, SMB, HDFS and OpenStack Swift Access Zones •  An access zone is: –  A way to carve the cluster into smaller clusters –  A way to control access based on individual authentication –  OneFS’s Multi-Tenancy solution. Chez NFS; Access Zone-1, System Zone, Access Zone-2; Kerberos-1, Domain Controller-2, LDAP-1, NIS - 1, Group Database - 1; Kerberos-2, Domain Controller-1, Group Database - 2
  • 21. Data Sharing Across Access Zones •  Same files can be accessed by different access zone clients •  Best for: –  multi-group collaboration w/ untrusted Active Directories –  multi-group data access governed by IP subnet –  Hadoop analytics over multi access zone data •  Uniquely solves the collaboration challenge; saves time and money
  • 22. THE BIG DATA LANDSCAPE: DATA LAKE (HADOOP), RDBMS, MACHINE, IOT, STATISTICAL MODELING/NLP, VISUALIZATION, TRANSFORM, BI, ORGANIZE, MANAGE/CATALOG, DATA WAREHOUSE, STREAM, CEP, NEAR REAL-TIME, MODELS MAY TAKE HOURS OR DAYS, QUERIES MAY RETURN IN SECONDS OR MINUTES, SECONDS, SEARCH/INDEX, ENTERPRISE, LOG ANALYSIS, APPLICATIONS, 3rd PARTY, EMAIL, SOCIAL MEDIA, SQL ON HADOOP
  • 23. THE BIG DATA LANDSCAPE: DATA LAKE (HADOOP), RDBMS, MACHINE, IOT, STATISTICAL MODELING/NLP, VISUALIZATION, TRANSFORM, BI, ORGANIZE, MANAGE/CATALOG, DATA WAREHOUSE, STREAM, CEP, NEAR REAL-TIME, MODELS MAY TAKE HOURS OR DAYS, QUERIES MAY RETURN IN SECONDS OR MINUTES, SECONDS, SEARCH/INDEX, ENTERPRISE, LOG ANALYSIS, APPLICATIONS, 3rd PARTY, EMAIL, SOCIAL MEDIA, SQL ON HADOOP
  • 24. A Next Gen Data Lake Architecture. NEW SOURCES: Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs. EXISTING SOURCES: ERP, CRM. Commodity Compute; Hadoop Platform: DATA SERVICES, OPERATIONAL SERVICES, HADOOP CORE. Business Analytics, Visualization & Dashboards, IT Applications, Data Marts, Data Management. ETL/ELT OFFLOAD; ACTIVE ARCHIVE; ENRICH WITH NEW DATA TYPES; MULTI-PROTOCOL ACCESS; ENTERPRISE-GRADE DATA MANAGEMENT. Isilon: NFS, SMB, HTTP, Swift. Legend: New Data Flow, Current Data Flow
  • 25. The Data Lake Vision Storage Layer Data Store Manager 3rd Party INGEST MANAGER STREAM Exploratory Analytics Isilon XtremIO ECS/ViPR DSSD DATA GOVERNOR SECURITY INDEXING CATALOGING POLICY Modeling Correlations SQL NSQL BATCH Interactive Analytics Aggregates OLAP SQL NSQL Realtime Analytics Modeling Scoring SQL NSQL In MEM Shared Store(s) Private Store(s) FILE COLUMN DB RELATIONAL DB GRAPH DB KEY VALUE DOCUMENT LOGS FILE BATCH SQL ETL MARKETPLACE MANAGER DATA SERVICES PORTAL VNX APPLICATIONS USERS Analytics Platform Manager VMware Openstack Docker Evo:Rails
  • 26. Ingest Manager: Rapid collection of data from unlimited sources. Application Services Portal, User Services Portal, Application Platform Manager, Data Ingest Manager, Ingest Application Provisioning, Ingest Management and Control, Catalog Connector, Locality Manager, Indexing Connector, Security Manager, Data Governor
  • 27. The Data Governor: Enabling comprehensive data management. Data Catalog Management Security Management Security and Roles LDAP AD BUILT-IN Policy Management Data Types Shared Private Data Sources Consumer Access Rights Compliant Data Sets Encryption/Location Reqs. Lineage Requirements Index Management Licensing Resource Policies Usage Limits Index Management Index Type Index Usage Indexing Resources Index Engine(s) Data Catalog Types Public Private Catalog Type Catalog Usage Catalog Resources Catalog Engine(s) Catalog Security and Roles Catalog Operations Collection Scavenging
  • 28. Data Store Manager: Manage and provision storage for a variety of uses. Application Platform Manager, Data Store Manager, Storage Manager, Data Governor; Shared Stores, Private Stores, Compliant Stores, Temporary Stores; 3rd Party, Isilon, XtremIO, ECS/ViPR, DSSD, VNX; Storage Configuration and Provisioning Manager
  • 29. Application Platform Manager: Rapidly and seamlessly deploy applications and resources. Application Services Portal, User Services Portal, Application Platform Manager, Application Provisioning, Compute Resource Provisioning, Platform Provisioning, VM Workflows, Provisioning Rules, Networking Optimizations, Application Deployment Management, Package Manager, App Store Manager, Data Store Manager, Data Governor
  • 30. An example analytic workflow: From idea to action using the BDL. [Diagram labels] Application Provisioning; User Services Portal; Application Platform Manager; Platform Provisioning; Data Store Manager; Data Governor; Data Catalog Management; Security Management; Policy Management; Index Management; Optimization Engine; Recommendation Engine.
  • 32. THE DATA LAKE One Customer’s Journey with Hadoop
  • 33. Use Cases & Requirements • As we evaluated the business use cases the platform would support, we determined that we had a variety of workloads with different impacts on the platform. Use Cases: • Enterprise Data Hub that can consolidate disparate data sources onto a common platform (i.e., data types) • Migrate transient Enterprise Data Warehouse (EDW) data to a lower-cost storage platform • Provide data enrichment services for in-record validation, data standardization, and analytic processing • Integrate and provision data to target systems using Hadoop ecosystem components (e.g., Pig, Hive) Requirements: • Ensure that the platform meets both availability and recoverability targets • Align technology to internal skills and competencies • Enable existing systems to interoperate with the platform using native protocols or services • Ability to test and certify commercial products via a multi-distribution environment • Enable co-resident processing of products to optimize the use of deployed infrastructure • Ability to provide data protection and isolation of client data within a single instance of the platform (i.e., sub-tenancy)
  • 34. Solution Approach • Support for a variety of acquisition channels • Common method for data types and formats • Orchestration framework that manages all job execution • Includes capabilities around data catalog, file validation, and schema evolution • Data integration and provisioning framework • Support for relational stores and exploration tools
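The file validation and schema evolution capabilities named on this slide can be illustrated with a small sketch. This is not the customer's framework: the `CATALOG` structure, the `customer_feed` name, its columns, and the `validate_and_evolve` function are all hypothetical, chosen only to show one simple way a catalog of schema versions can drive validation of an incoming delimited file and upgrade older records to the newest schema.

```python
import csv
import io

# Hypothetical catalog entry: known schema versions for one feed.
# Feed name and column names are illustrative assumptions.
CATALOG = {
    "customer_feed": {
        1: ["id", "name"],
        2: ["id", "name", "email"],  # version 2 adds one column
    }
}


def validate_and_evolve(feed, text):
    """Validate a delimited file against the catalog and return its records
    upgraded to the newest schema version (new columns are filled with None)."""
    versions = CATALOG[feed]
    latest = versions[max(versions)]

    reader = csv.reader(io.StringIO(text))
    header = next(reader)

    # File validation, step 1: the header must match a known schema version.
    if header not in versions.values():
        raise ValueError(f"unknown schema for {feed}: {header}")

    records = []
    for line_no, row in enumerate(reader, start=2):
        # File validation, step 2: every row must have the expected field count.
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        record = dict(zip(header, row))
        # Schema evolution: project the record onto the newest schema.
        records.append({col: record.get(col) for col in latest})
    return records


rows = validate_and_evolve("customer_feed", "id,name\n1,Ada\n")
print(rows)
```

In this sketch an older version-1 file is accepted and padded to the version-2 layout, while a file whose header matches no cataloged version is rejected before any rows are loaded.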
  • 35. Platform Approach. As we better defined and understood our use cases and requirements, they led us down a different path from a platform perspective. Data Warehouse Offload; Data Integration; Enterprise Data Hub; Enrichment, Validation and Quality ✓ The ability to independently scale storage and compute ✓ Provide data protection for critical business information ✓ Support backup and disaster recovery ✓ Centrally managed via an intuitive user interface ✓ Leverage existing assets deployed in the enterprise
  • 36. Example: Multi-protocol Support • One of our deployed use cases is multi-protocol support. This lets us leverage existing assets and talent in the enterprise while still using the compute paradigm of Hadoop
  • 37. Example: Multi-distribution Support • Our organization sells a number of products to the market. Many of these deployments are on-premises due to concerns around data privacy or control, data transfer considerations, etc. To support this, we needed a multi-distribution platform that could be used for product certification across similar data sets
  • 38. The Isilon Advantage for Hadoop: Scale-out storage with native Hadoop integration. In-place analytics: • No data ingest necessary; Isilon provides shared multi-protocol access • Native integration speeds time to insight. Enterprise data protection: • Fast snapshots, backup, and data recovery • Simple, efficient data replication for disaster recovery. Lower costs: • Eliminates the need for dedicated Hadoop infrastructure • Eliminates 3x mirroring for data protection • Much more efficient than a DAS-based approach. Increased flexibility: • Simultaneous support for any Apache-compliant Hadoop distribution • Collaborative engineering efforts with Cloudera, Hortonworks, and Pivotal • Ambari integration for management, monitoring, and provisioning
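The cost point about eliminating 3x mirroring can be made concrete with simple arithmetic. The sketch below compares the raw capacity needed for a given usable capacity under three-copy replication (the HDFS default replication factor) and under a parity-based layout. The 100 TB figure and the 8 data + 2 parity layout are illustrative assumptions, not numbers from the slides or a statement of any product's actual protection settings.

```python
def raw_capacity_replicated(usable_tb, copies=3):
    """Raw capacity needed when every block is stored `copies` times."""
    return usable_tb * copies


def raw_capacity_parity(usable_tb, data_units, parity_units):
    """Raw capacity needed when each stripe holds `data_units` data units
    plus `parity_units` protection units."""
    return usable_tb * (data_units + parity_units) / data_units


usable = 100  # TB of usable data (assumed for illustration)

print(raw_capacity_replicated(usable))    # 300
print(raw_capacity_parity(usable, 8, 2))  # 125.0
```

Under these assumptions, three-copy replication carries 200% capacity overhead, while the hypothetical 8+2 layout carries 25%.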