© 2015 Impetus Technologies
Recorded version available at http://lf1.me/wNb/
We Implement Big Data
Webinar: Implementing the Enterprise Big Data Lake: Challenges, Strategies, Maximizing Benefits
Vineet Tyagi, CTO and Head of Labs, Impetus Technologies
Larry Pearson, Vice President, Marketing, Impetus Technologies
Agenda
• Overview
• What is a Data Lake?
• Drivers for Data Lake implementation
• Building a Data Lake
• Challenges of Data Lake implementation
• Strategies of building a Data Lake
• Q & A
Enterprise Data Warehouse – Current Environment
Optimize Existing DW/BI Infrastructure or Create New Capabilities
• Handle Big Data and the 3 V’s
  • Volume, Variety, Velocity
• Integrate Multiple Data Silos
  • ERP, CRM, HRM and others
• Reduce Cost
  ― ETL process
  ― Analytical process
  ― Mainframe process
  ― Cloud feasibility for data analytics
• Applying Science
  • Unstructured data for enhancing analytics
  • Data Science for advanced analytics
• Reduce Time to Market by Faster Processing/Analytics
What is a Data Lake?
A massive, easily accessible, flexible and scalable data repository
• Built on inexpensive computer hardware
• Designed for storing uncategorized pools of data “as is”, including the following:
– Data immediately of interest
– Data potentially of interest
– Data for which the intended usage is not yet known
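Storing data "as is" is commonly paired with applying structure only when the data is read (schema-on-read). The sketch below illustrates that idea under stated assumptions; it is not taken from the webinar. The record layout, the field names and the `read_with_schema` helper are all hypothetical.

```python
import json

# Raw events landed "as is": no schema was enforced at write time,
# so records may have missing or extra fields (sample data is made up).
RAW_LINES = [
    '{"device": "pump-1", "temp": "71.5", "ts": "2015-03-01T10:00:00"}',
    '{"device": "pump-2", "ts": "2015-03-01T10:00:05", "vibration": 0.02}',
    'not valid json at all',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time.

    `schema` maps a field name to a converter (a hypothetical convention).
    Missing fields become None; unparseable lines are skipped by this
    reader but remain untouched in storage.
    """
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # the raw line stays in the lake; this reader ignores it
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            row[field] = convert(value) if value is not None else None
        rows.append(row)
    return rows


# Two consumers read the same raw data with different schemas.
temperature_view = read_with_schema(RAW_LINES, {"device": str, "temp": float})
vibration_view = read_with_schema(RAW_LINES, {"device": str, "vibration": float})
```

The same raw lines serve both views, which is one way data "for which the intended usage is not yet known" can still be kept usable later.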
What Capabilities Does the Data Lake Bring?
A Data Lake brings newer capabilities and insights to business users, which include but are not limited to the following:
• Active Archive
• Self Service Exploratory BI
• Advanced Analytics at Scale (Moving from Analyst Intuition to Empirical Insights)
• Lower cost of transformation
Data Lake Architecture Benefits
• Allows the organization to create an "adjunct" to the EDW
― Offload relatively colder data and workloads
― Support for unstructured data
― Create a cultural shift towards "democratizing data access"
• Acquire the capability to run workloads based on cost/performance
Drivers for Data Lake Architecture
• Nature of the Data
― How much unstructured vs. structured data do you use for insights?
• Level of unification of Data
― Time analysis of information on source data, delta data detection
• Encourage experimentation
― Creation of Point Solutions by Line of Business
• Moving from “analyst intuition” & statistics to empirical, data-science-driven insights
• Scale is driven by demand
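The "delta data detection" driver above can be illustrated with a small sketch. This is one common approach (comparing per-row content hashes between two snapshots), not necessarily the method the presenters had in mind. The function names, the `id` key column and the sample rows are hypothetical.

```python
import hashlib


def row_fingerprint(row):
    """Stable hash of a row's content (columns sorted by name)."""
    canonical = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_delta(previous, current, key="id"):
    """Compare two snapshots (lists of dicts) and classify changes by key."""
    old = {row[key]: row_fingerprint(row) for row in previous}
    new = {row[key]: row_fingerprint(row) for row in current}
    return {
        "inserted": sorted(k for k in new if k not in old),
        "deleted": sorted(k for k in old if k not in new),
        "updated": sorted(k for k in new if k in old and new[k] != old[k]),
    }


# Made-up snapshots of the same source table on two consecutive days.
yesterday = [
    {"id": 1, "name": "Acme", "tier": "gold"},
    {"id": 2, "name": "Globex", "tier": "silver"},
    {"id": 3, "name": "Initech", "tier": "bronze"},
]
today = [
    {"id": 1, "name": "Acme", "tier": "gold"},
    {"id": 2, "name": "Globex", "tier": "gold"},
    {"id": 4, "name": "Umbrella", "tier": "silver"},
]

delta = detect_delta(yesterday, today)
```

Only the changed keys need to be re-ingested, which keeps repeated loads from the same source small.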
Building a Data Lake
Design Principles
• Discovery without limitations
• Low latency at any scale
• Reactive to predictive
• Affordable unlimited scale
• Elasticity in infrastructure
Data Lake (Big Data PaaS)
Figure: Reference Big Data Architecture
• Data Sources: Relational Data (PostgreSQL, Oracle, DB2, SQL Server…); Flat Files/XML/JSON/CSV; Existing Systems (ARS, PLC, Cimplicity, Active Plant); Machine Data
• Data Ingestion: Streaming; Kafka/Flume; Sqoop/Connectors; Existing DI Tools; REST; JDBC; SOAP; Custom
• Data Processing and Data Governance: Data Curation; Indexing; Data Quality; Data Classification; Information Policy; Lifecycle Management
• Query engines: Hive/Pig/Impala/Drill/Spark SQL
• Data Servicing: Data Store; Virtualization; Search; Federation; Access; Delivery
• HA; Provision; Security; Monitoring
• Business Intelligence; Machine Data Analysis; Predictive & Statistical Analytics; Data Discovery; Visualization & Reporting
Patterns for Implementing a Data Lake
• Data Reservoir
• Support Iterative Investigation
• Drive Analytical Applications
A comprehensive Data Lake strategy requires effective implementation patterns to be in place for systems and processes.
Building a Data Lake – Stages of Evolution
Staged Approach to Data Lake Roll-out
• Stage 1: Handle and ingest data at scale
• Stage 2: Building the analytical muscle, laying data pipelines, monitoring, supporting use cases
• Stage 3: Operational Impact - have EDW and Data Lake work in unison
• Stage 4: Enterprise capability in the lake
Stage 1: Handle and Ingest Data at Scale
• Landing and ingestion: Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming
• Enterprise Data Lake
The organization needs to determine the existing and new data sources that it can leverage. The data sources are integrated, and the variety of voluminous data is ingested at high velocity into Hadoop storage.
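As a rough illustration of the landing step, the sketch below writes incoming payloads unchanged into a directory tree partitioned by source and arrival date, and records each file in a manifest. It uses the local filesystem as a stand-in for Hadoop storage. The directory layout, the `manifest.jsonl` file and the `land_raw` helper are assumptions made for this example, not part of the presented architecture.

```python
import hashlib
import json
import tempfile
from datetime import date
from pathlib import Path


def land_raw(root, source, payload, filename, arrived):
    """Write `payload` (bytes) unchanged under a source/date partition."""
    partition = Path(root) / source / f"{arrived:%Y/%m/%d}"
    partition.mkdir(parents=True, exist_ok=True)
    target = partition / filename
    target.write_bytes(payload)

    # Record minimal metadata so the file can be found and verified later.
    entry = {
        "source": source,
        "path": target.relative_to(root).as_posix(),
        "bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    with open(Path(root) / "manifest.jsonl", "a", encoding="utf-8") as manifest:
        manifest.write(json.dumps(entry) + "\n")
    return entry


# Local temp directory standing in for the lake's landing zone.
lake_root = tempfile.mkdtemp(prefix="lake_")
day = date(2015, 3, 1)
first = land_raw(lake_root, "crm", b"id,name\n1,Acme\n", "accounts.csv", day)
second = land_raw(lake_root, "sensors", b'{"temp": 71.5}\n', "reading.json", day)
```

Structured and unstructured payloads take the same path here; nothing is parsed or transformed at landing time.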
Stage 2: Building the Analytical Muscle
• Landing and ingestion: Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming
• Enterprise Data Lake
• Provisioning, Workflow, Monitoring and Security
• Applications: Predictive applications, Exploration & discovery, Enterprise applications, Real-Time applications
Leveraging the enterprise Data Lake in Hadoop, the organization builds batch, mini-batch and real-time applications for enterprise usage, exploratory analytics and predictive use cases. Various tools and frameworks are utilized in this stage.
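To make the "mini-batch" processing style mentioned above concrete, here is a minimal sketch in plain Python. It is not tied to any particular framework. The event tuples and the `mini_batches` and `running_totals` helpers are hypothetical.

```python
from itertools import islice


def mini_batches(events, size):
    """Yield consecutive lists of at most `size` events from any iterable."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def running_totals(events, size):
    """Update per-key totals once per mini-batch and record each snapshot."""
    totals = {}
    snapshots = []
    for batch in mini_batches(events, size):
        for key, amount in batch:
            totals[key] = totals.get(key, 0) + amount
        snapshots.append(dict(totals))
    return snapshots


# Made-up (channel, amount) events arriving in order.
events = [("web", 3), ("store", 5), ("web", 2), ("web", 1), ("store", 4)]
snapshots = running_totals(events, size=2)
```

With `size` equal to the full input this behaves like a batch job, and with `size=1` it approaches per-event processing; mini-batch sits between the two.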
Stage 3: EDW and Data Lake Work in Unison
The enterprise data warehouse (EDW) and Hadoop-based Big Data Lake would co-exist to allow the enterprise to leverage the strengths of each architecture.
• Landing and ingestion: Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming
• Enterprise Data Lake
• Provisioning, Workflow, Monitoring and Security
• Applications: Predictive applications, Exploration & discovery, Enterprise applications, Real-Time applications
• Traditional data repositories: RDBMS, MPP
The Data Lake DILEMMA
D – Data
• Ingestion and Storage
• Governance
• Security & Compliance
IL – Information Lifecycle Management
• Lineage
EMM – Enterprise Metadata Management
• Metadata discovery
• Ontology
A – Access
• Query Performance
• Search data
Effective use of the Data Lake as a true enterprise data reservoir introduces new challenges. We call these the Data Lake “DILEMMA”. Addressing these will help avoid turning the lake into a “data swamp”, which would inhibit or slow enterprise adoption.
Stage 4: Enterprise Capability in the Lake
Broad adoption of unified Data Lake architectures will require information governance, metadata management and information lifecycle management capabilities.
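One way to picture information lifecycle management is as a policy that maps a dataset's classification and age to an action. The sketch below is illustrative only: the classifications, the retention periods in `POLICIES`, the three actions and the catalog entries are all assumptions, not recommendations from the webinar.

```python
from datetime import date

# Hypothetical retention rules: maximum age in days for each tier,
# keyed by data classification.
POLICIES = {
    "clickstream": {"hot_days": 30, "archive_days": 365},
    "financial": {"hot_days": 90, "archive_days": 2555},
}


def lifecycle_action(classification, created, today):
    """Return 'keep', 'archive' or 'purge' for a dataset of the given age."""
    policy = POLICIES[classification]
    age_days = (today - created).days
    if age_days <= policy["hot_days"]:
        return "keep"
    if age_days <= policy["archive_days"]:
        return "archive"
    return "purge"


# Made-up catalog entries; a real catalog would come from metadata management.
catalog = [
    {"name": "clicks_2015_02", "classification": "clickstream", "created": date(2015, 2, 20)},
    {"name": "clicks_2014_06", "classification": "clickstream", "created": date(2014, 6, 1)},
    {"name": "clicks_2013_01", "classification": "clickstream", "created": date(2013, 1, 1)},
    {"name": "ledger_2014_q4", "classification": "financial", "created": date(2014, 12, 31)},
]
today = date(2015, 3, 15)
plan = {
    entry["name"]: lifecycle_action(entry["classification"], entry["created"], today)
    for entry in catalog
}
```

The policy depends on a classification being recorded for every dataset, which is why governance and metadata management appear alongside lifecycle management in this stage.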
• Landing and ingestion: Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming
• Enterprise Data Lake
• Provisioning, Workflow, Monitoring and Security
• Applications: Predictive applications, Exploration & discovery, Enterprise applications, Real-Time applications
• Traditional data repositories: RDBMS, MPP
• Governance, Information Lifecycle, Enterprise Metadata Management
Summary
• Hadoop-based Big Data architectures have changed the face of the Data Warehouse/BI/Analytics world forever.
• Enterprise adoption of Big Data architectures is accelerating as a way to enable broad new opportunities across all industries.
• There is a growing acceptance of the concept of a “Data Lake” as a cornerstone component of an enterprise Big Data strategy.
• “Big Data Warehouse” architectures will complement rather than replace the enterprise data warehouses of today.
• Models and approaches are emerging to address the enterprise-class DILEMMA related to security, governance and operations.
• There are defined best practices and roadmaps for implementing an enterprise Data Lake architecture.
Q&A
(Use the chat/Q&A panel)
For general inquiries about our services and solutions, reach us at bigdata@impetus.com
Follow us on Twitter: @impetustech


Enterprise Big Data Lake: Challenges, Strategies, Maximizing Benefits - Impetus Webinar

  • 1. © 2015 Impetus Technologies1 Recorded version available at http://lf1.me/wNb/ We Implement Big Data Webinar: Implementing the Enterprise Big Data Lake: Challenges, Strategies, Maximizing Benefits Vineet Tyagi- CTO and Head of Labs, Impetus Technologies Larry Pearson- Vice President, Marketing, Impetus Technologies
  • 2. © 2015 Impetus Technologies2 Recorded version available at http://lf1.me/wNb/ Agenda • Overview • What is a Data Lake? • Drivers for Data Lake implementation • Building a Data Lake • Challenges of Data Lake implementation • Strategies of building a Data Lake • Q & A
  • 3. © 2015 Impetus Technologies3 Recorded version available at http://lf1.me/wNb/ Enterprise Data Warehouse – Current Environment Optimize Existing DW/BI Infrastructure or Create New Capabilities • Handle Big Data and the 3 V’s • Volume, Variety, Velocity • Integrate Multiple Data Silos • ERP, CRM, HRM and others • Reduce Cost ― ETL process ― Analytical process ― Mainframe process ― Cloud feasibility for data analytics • Applying Science • Unstructured data for enhancing analytics • Data Science for advanced analytics • Reduce Time to Market by Faster Processing/ Analytics
  • 4. © 2015 Impetus Technologies4 Recorded version available at http://lf1.me/wNb/ What is a Data Lake ? A massive, easily accessible, flexible and scalable data repository • Built on inexpensive computer hardware • Designed for storing uncategorized pools of data “as is”, including the following: – Data immediately of interest – Data potentially of interest – Data for which the intended usage is not yet known
  • 5. What Capabilities Does the Data Lake Bring? A Data Lake brings newer capabilities and insights to business users, which include but are not limited to the following: • Active Archive • Self Service Exploratory BI • Advanced Analytics at Scale (Moving from Analyst Intuition to Empirical Insights) • Lower cost of transformation
  • 6. Data Lake Architecture Benefits • Allows the organization to create the "Adjunct" to the EDW ― Offload relatively colder data and workloads ― Support for unstructured data ― Create a cultural shift towards "democratizing data access" • Acquire the capability of running workloads based on cost/performance
  • 7. Drivers for Data Lake Architecture • Nature of the Data ― How much unstructured vs. structured data do you use for insights? • Level of unification of Data ― Time analysis of information on source data, delta data detection • Encourage experimentation ― Creation of Point Solutions by Line of Business • Moving from “analyst intuition” & statistics to empirical, data-science-driven insights • Scale is driven by demand
  • 8. Building a Data Lake: Design Principles • Discovery without limitations • Low latency at any scale • Reactive to predictive • Affordable unlimited scale • Elasticity in infrastructure
  • 9. Reference Big Data Architecture (diagram labels) • Data Lake (Big Data PaaS) • Data Sources: Relational Data (PostgreSQL, Oracle, DB2, SQL Server…), Flat Files/XML/JSON/CSV, Existing Systems (ARS, PLC, Cimplicity, Active Plant), Machine Data • Data Ingestion: Streaming (Kafka/Flume), Sqoop/Connectors, Existing DI Tools, REST, JDBC, SOAP, Custom • Data Processing, Data Curation, Indexing • Data Governance: Data Quality, Data Classification, Information Policy Lifecycle Management • Query engines: Hive/Pig/Impala/Drill/Spark SQL • Data Servicing • Data Store Virtualization Search Federation Access Delivery • HA, Provision, Security, Monitoring • Business Intelligence, Machine Data Analysis, Predictive & Statistical Analytics, Data Discovery, Visualization & Reporting
  • 10. Patterns for Implementing a Data Lake: A comprehensive Data Lake strategy requires effective implementation patterns to be in place for systems and processes. • Data Reservoir • Support Iterative Investigation • Drive Analytical Applications
  • 11. Building a Data Lake – Stages of Evolution: Staged Approach to Data Lake Roll-out • Stage 1: Handle and ingest data at scale • Stage 2: Building the analytical muscle, laying data pipelines, monitoring, supporting use cases • Stage 3: Operational Impact - have EDW and Data Lake work in unison • Stage 4: Enterprise capability in the lake
  • 12. Stage 1: Handle and Ingest Data at Scale. The organization needs to determine the existing and new data sources that it can leverage. The data sources are integrated and the variety of voluminous data is ingested at high velocity into Hadoop storage. • Diagram labels: Landing and ingestion (Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming) • Enterprise Data Lake
  • 13. Stage 2: Building the Analytical Muscle. Leveraging the enterprise Data Lake in Hadoop, the organization builds batch, mini-batch and real-time applications for enterprise usage, exploratory analytics and predictive use cases. Various tools and frameworks are utilized in this stage. • Diagram labels: Landing and ingestion (Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming) • Provisioning, Workflow, Monitoring and Security • Enterprise Data Lake • Predictive applications • Exploration & discovery • Enterprise applications • Real-Time applications
  • 14. Stage 3: EDW and Data Lake Work in Unison. The enterprise data warehouse (EDW) and Hadoop-based Big Data Lake would co-exist to allow the enterprise to leverage the strengths of each architecture. • Diagram labels: Landing and ingestion (Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming) • Provisioning, Workflow, Monitoring and Security • Enterprise Data Lake • Predictive applications • Exploration & discovery • Enterprise applications • Real-Time applications • Traditional data repositories (RDBMS, MPP)
  • 15. The Data Lake DILEMMA. Effective use of the Data Lake as a true enterprise data reservoir introduces new challenges. We call these the Data Lake “DILEMMA”. Addressing these will help avoid turning the lake into a “data swamp”, which would inhibit or slow enterprise adoption. • D: Data ― Ingestion and Storage ― Governance ― Security & Compliance • IL: Information Lifecycle Management ― Lineage • EMM: Enterprise Metadata Management ― Meta data discovery ― Ontology • A: Access ― Query Performance ― Search data
  • 16. Stage 4: Enterprise Capability in the Lake. Broad adoption of unified Data Lake architectures will require information governance, meta data management and information lifecycle management capabilities. • Diagram labels: Landing and ingestion (Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming) • Provisioning, Workflow, Monitoring and Security • Enterprise Data Lake • Predictive applications • Exploration & discovery • Enterprise applications • Real-Time applications • Traditional data repositories (RDBMS, MPP) • Governance, Information Lifecycle, Enterprise Meta Data Management
  • 17. Summary • Hadoop-based Big Data architectures have changed the face of the Data Warehouse/BI/Analytics world forever. • Enterprise adoption of Big Data architectures is accelerating as a way to enable broad new opportunities across all industries. • There is a growing acceptance of the concept of a “Data Lake” as a cornerstone component of an enterprise Big Data strategy. • “Big Data Warehouse” architectures will complement rather than replace the enterprise data warehouses of today. • Models and approaches are emerging to address the enterprise-class DILEMMA related to security, governance and operations. • There are defined best practices and roadmaps for implementing an enterprise Data Lake architecture.
  • 18. Q&A (Use the chat/Q&A panel) • For general inquiries about our services and solutions, reach us at bigdata@impetus.com • Follow us on Twitter: @impetustech
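
Slide 7 lists "delta data detection" as part of unifying data, but the deck does not say how it is done. The sketch below is one common approach, offered only as an illustration: compare two snapshots of a source table by primary key and a content hash. The function names (`row_fingerprint`, `detect_delta`) and the sample rows are hypothetical and do not come from the webinar.

```python
import hashlib
import json


def row_fingerprint(row):
    """Stable hash of a row's content (keys sorted so field order does not matter)."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_delta(previous, current, key):
    """Classify primary keys as inserted, updated, or deleted between two snapshots."""
    prev = {r[key]: row_fingerprint(r) for r in previous}
    curr = {r[key]: row_fingerprint(r) for r in current}
    inserted = sorted(k for k in curr if k not in prev)
    deleted = sorted(k for k in prev if k not in curr)
    updated = sorted(k for k in curr if k in prev and curr[k] != prev[k])
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical snapshots of one source table on two consecutive loads.
yesterday = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Indore"},
    {"id": 3, "city": "Noida"},
]
today = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Bengaluru"},
    {"id": 4, "city": "San Jose"},
]

delta = detect_delta(yesterday, today, key="id")
print(delta)  # {'inserted': [4], 'updated': [2], 'deleted': [3]}
```

Only the changed keys then need to be re-ingested into the lake. At real volumes the same comparison would normally run inside the cluster rather than in memory; this sketch shows only the logic.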

Editor's Notes

  1. Amex-specific recommendations: discoverable meta data store; leaner data mart/BI delivery; data-driven business analytics.
  2. Distillation is key; otherwise data lakes can easily turn into data swamps.
  3. A data lake brings newer capabilities and insights to business users, which include but are not limited to the following. Active Archive: One place to store all your data, in any format, at any volume, for as long as you like, allowing you to address compliance requirements and deliver data on demand to satisfy internal and external regulatory demands. Because it is secure, you control who sees what; because it delivers governance and lineage services, you can trace access to, and the evolution of, your data over time. Transformation and Processing: ETL workloads that previously had to run on expensive systems can migrate to the enterprise data hub, where they run at very low cost, in parallel, much faster than before. Optimizing the placement of these workloads and the data on which they operate frees capacity on high-end analytic and data warehouse systems, making them more valuable by allowing them to concentrate on the business-critical OLAP and other applications that they run. Self-Service Exploratory BI: Users frequently want access to enterprise data for reporting, exploration, and analysis. Production enterprise data warehouse systems must often be protected from casual use so they can run the mission-critical financial and operational workloads they support. An enterprise data hub allows users to explore data, with full security, using traditional interactive business intelligence tools via SQL and keyword search. Advanced Analytics: Multiple computing frameworks that enable analytics, search, machine learning, and more unlock value in new and old data sources. Rather than examining samples of data, or snapshots from short time periods, all historical data, in full fidelity, can be combined in comprehensive analyses. Simple tabular data can mix with more complex and multi-structured data in ways that were never before possible.
  4. The key is what data you use or would like to use. If you are committed to gathering higher amounts of unstructured data, that is a big cultural change, and the data lake is the vehicle to implement the change. The cost angle: data movement through a costly ETL process. Hard-bound, locked silos and lack of integration mean it takes months to move data.
  5. Cannot assume and plan to store all data. Social media data is a good example: you cannot plan to keep all of it. Have tools to move and manage external sources of information on demand. Level of unification of data: time analysis of information on source data, delta data detection. Encourage experimentation: creation of point solutions by LOB. Scale is driven by demand. Moving from analyst intuition & statistics to empirical, data-science-driven insights.
  6. Some of the key patterns adopted by enterprises today are the following
  7. As the first stage, the organization needs to determine the existing and new data sources that it can leverage. The data sources are integrated and the variety of voluminous data is ingested at high velocity into Hadoop storage.
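
Slides 15 and 16 name meta data discovery and lineage as prerequisites for keeping the lake from becoming a swamp, and the notes above ask for a discoverable meta data store. The deck names no tool for this, so the following is only a minimal sketch of the idea: an in-memory catalog that refuses datasets with unregistered upstream sources and can answer "where did this come from?". The `DatasetEntry` and `Catalog` classes, the dataset names, and the owners are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical catalog record for one dataset landed in the lake."""
    name: str
    owner: str
    schema: dict  # column name -> type, as discovered at ingest time
    parents: list = field(default_factory=list)  # names of upstream datasets


class Catalog:
    """Illustrative in-memory metadata store; not a real product API."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Refuse datasets whose upstream sources were never catalogued,
        # so lineage cannot silently break.
        missing = [p for p in entry.parents if p not in self._entries]
        if missing:
            raise ValueError(f"unregistered parents: {missing}")
        self._entries[entry.name] = entry

    def lineage(self, name):
        """Return every upstream dataset of `name`, nearest first."""
        seen, queue = [], list(self._entries[name].parents)
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(self._entries[current].parents)
        return seen


catalog = Catalog()
catalog.register(DatasetEntry("crm_raw", "sales-it", {"id": "int", "email": "str"}))
catalog.register(DatasetEntry("web_logs_raw", "web-ops", {"ts": "str", "url": "str"}))
catalog.register(DatasetEntry("customer_360", "analytics",
                              {"id": "int", "visits": "int"},
                              parents=["crm_raw", "web_logs_raw"]))
catalog.register(DatasetEntry("churn_scores", "data-science",
                              {"id": "int", "score": "float"},
                              parents=["customer_360"]))

upstream = catalog.lineage("churn_scores")
print(upstream)  # ['customer_360', 'crm_raw', 'web_logs_raw']
```

Making registration a precondition for landing derived data is one way to enforce governance at ingest time rather than after the fact; a production lake would back this with a persistent metadata service instead of a Python dictionary.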