Best Practices for the Data Lake
Who is using it and how can you get the most out of it?
© Attunity, Hortonworks, and Teradata
Agenda
We’ll discuss the research findings, including:
• The two biggest Data Lake adoption issues
• Common use cases for Data Lakes
• Rethinking how data is used across an organization
• Data Lake best practices and pitfalls
• How business leaders gain C-level buy-in for Data Lake projects
Survey Demographics
Roles: IT/Business Manager, CXO/Executive, Academic
Company size: Large Enterprise, Midsize, SMB
Source: Data Lake Adoption and Maturity Survey Findings Report
High-level findings
• The Data Lake is increasingly recognized within a data strategy
• Clear early use cases exist for the Data Lake
• Governance and security are still top of mind as challenges and success factors for the Data Lake
Chart: What is your familiarity with the term “Data Lake”? (responses: currently researching and learning about it; actively involved with it; heard of it, but don't know what it is; have not heard of it)
Source: Data Lake Adoption and Maturity Survey Findings Report
What is a Data Lake?
A data lake is a collection of long-term data containers that capture, refine, and explore any form of raw data at scale, enabled by low-cost technologies, from which multiple downstream facilities may draw.
Diagram: data sources (sensors, email, transactions, machine logs, geolocation, media) feed the Data Lake, which serves downstream consumers (BI tools, IDW, data marts, analysis, apps, other).
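The definition's emphasis on raw data, together with the speaker note that keeping the original file means any light transformation can be repeated, can be illustrated with a minimal sketch. It assumes a local temporary directory standing in for the lake's raw zone; the function names `land_raw` and `light_refine` and the content-hash file naming are hypothetical, not part of any specific product.

```python
import hashlib
import tempfile
from pathlib import Path


def land_raw(raw_zone: Path, name: str, payload: bytes) -> Path:
    """Store the first version of a file exactly as received (write-once)."""
    digest = hashlib.sha256(payload).hexdigest()
    target = raw_zone / f"{digest[:12]}_{name}"
    if not target.exists():  # never overwrite: the raw copy stays immutable
        target.write_bytes(payload)
    return target


def light_refine(raw_file: Path) -> list[dict]:
    """Hypothetical light transformation: parse simple CSV lines into dicts."""
    lines = raw_file.read_text().strip().splitlines()
    header = lines[0].split(",")
    return [dict(zip(header, row.split(","))) for row in lines[1:]]


with tempfile.TemporaryDirectory() as tmp:
    zone = Path(tmp)  # stand-in for the raw zone (assumption)
    raw = land_raw(zone, "sensors.csv", b"id,temp\n1,20.5\n2,21.0\n")
    first = light_refine(raw)
    # Because the original bytes are kept, the refinement can be rerun at any time
    second = light_refine(raw)
    print(first == second, len(first))  # → True 2
```

The write-once check is the design point: since the raw bytes are never overwritten, every derived output can be regenerated from them, which is exactly what a chain of derivatives alone cannot offer.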
Value from Data Lakes
• New insights from unknown or underappreciated data
• New forms of analytics
• Expanded corporate memory retention
• Data integration optimization
Data Manufacturing
Diagram: DATA LAKE, DATA R&D, and DATA PRODUCTS
Data Manufacturing: Logical View of Workloads
DATA R&D
• Goal: analytic agility, flexibility
• Exploratory tools, algorithms, skills
• Finding new high-value questions
• Light governance, no SLAs
• Data scientists, data miners
DATA LAKE
• Goal: original raw data at low cost
• Refinery feeds data R&D, data products
• Medium governance, SLAs
• Low business value density
• Programmers and data scientists
DATA PRODUCTS
• Goal: consumable analytic results
• Integrated, cleansed, with metadata
• High governance, SLAs, cost
• High business value density
• Shared by many users, roles, skills
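The Data Manufacturing view separates low-value-density raw data in the lake from high-value-density data products. As a hedged sketch only, the promotion step below treats "business value density" as the fraction of raw records that survive a cleansing rule; that metric, the `DataProduct` type, and the cleansing rule are illustrative assumptions, not definitions from the survey.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Consumable result: integrated, cleansed, and carrying metadata."""
    name: str
    rows: list
    metadata: dict = field(default_factory=dict)


def refine_to_product(name: str, raw_rows: list, required: tuple) -> DataProduct:
    # Hypothetical cleansing rule: keep only rows with every required field populated
    clean = [r for r in raw_rows if all(r.get(k) not in (None, "") for k in required)]
    # Assumed value-density metric: share of raw records that survive into the product
    density = len(clean) / len(raw_rows) if raw_rows else 0.0
    return DataProduct(name, clean, {"source_rows": len(raw_rows), "value_density": density})


raw = [
    {"customer": "A", "amount": "10"},
    {"customer": "", "amount": "5"},     # noise: missing key field
    {"customer": "B", "amount": None},   # noise: missing amount
    {"customer": "C", "amount": "7"},
]
product = refine_to_product("daily_sales", raw, ("customer", "amount"))
print(len(product.rows), product.metadata["value_density"])  # → 2 0.5
```

The raw rows stay untouched in the lake; only the refined subset, with its metadata, is promoted to the product tier that carries higher governance and SLAs.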
Data Lake Architecture
Diagram: SOURCES (sensors, email, social, telemetry, mobile, tabular data, machine logs) flow through three stages, Acquisition, Preparation, and Access, built on distributed storage with cross-cutting security, metadata/lineage, and administration. Components shown include streams, message queues, feeds, cleansing, aggregations, governance, experiments, search, and access. ANALYTIC TOOLS & APPS (math and stats, data mining, business intelligence, applications, languages, marketing) serve USERS (marketing, executives, operational systems, frontline workers, customers, partners, engineers, data scientists, business analysts).
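The architecture lists metadata/lineage as a cross-cutting service spanning acquisition, preparation, and access. A minimal in-memory sketch of lineage tracking follows; the `LineageLog` class, its methods, and the dataset names are hypothetical, and a real lake would typically rely on a dedicated metadata catalog instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEntry:
    dataset: str
    stage: str            # "acquisition", "preparation", or "access"
    derived_from: tuple


class LineageLog:
    """Minimal in-memory lineage store (hypothetical; not a specific product's API)."""

    def __init__(self) -> None:
        self.entries: dict[str, LineageEntry] = {}

    def record(self, dataset: str, stage: str, derived_from=()) -> None:
        self.entries[dataset] = LineageEntry(dataset, stage, tuple(derived_from))

    def upstream(self, dataset: str) -> list[str]:
        """Walk back through parents (assumes every parent was recorded)."""
        seen: list[str] = []
        stack = list(self.entries[dataset].derived_from)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.append(parent)
                stack.extend(self.entries[parent].derived_from)
        return seen


log = LineageLog()
log.record("raw_machine_logs", "acquisition")
log.record("raw_telemetry", "acquisition")
log.record("sessions_cleansed", "preparation", ["raw_machine_logs", "raw_telemetry"])
log.record("ops_dashboard_feed", "access", ["sessions_cleansed"])
print(sorted(log.upstream("ops_dashboard_feed")))
# → ['raw_machine_logs', 'raw_telemetry', 'sessions_cleansed']
```

Recording the stage alongside the parents lets a consumer at the access layer trace any feed back to the raw sources it was refined from.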
Chart: Do you have budget for a Data Lake initiative? (responses: have an approved budget; have submitted a budget; still researching; already have an initiative)
Source: Data Lake Adoption and Maturity Survey Findings Report
Chart: What use cases are you primarily using Hadoop clusters for currently? (responses: data discovery/data science/Big Data; real-time analytics/operationalized insights; decentralized data acquisition or staging for other systems; offloading data from other systems)
Source: Data Lake Adoption and Maturity Survey Findings Report
Chart: What are the key challenges you have experienced in making the Data Lake concept a reality? (responses: governance; metadata; security; end user skills; ingest)
Source: Data Lake Adoption and Maturity Survey Findings Report
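Governance, metadata, and security top the reported challenges. One way to treat governance as a must-have from the beginning is to refuse ingestion of any dataset that lacks required metadata. The policy fields, classification levels, and the `ingest_allowed` function below are hypothetical assumptions chosen for illustration, not a prescribed framework.

```python
# Hypothetical governance policy (assumption): required descriptor fields and labels
REQUIRED_METADATA = ("owner", "classification", "retention_days")
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}


def ingest_allowed(descriptor: dict) -> tuple[bool, list[str]]:
    """Return (ok, problems) for a dataset descriptor before it may land in the lake."""
    problems = [f"missing {k}" for k in REQUIRED_METADATA if not descriptor.get(k)]
    cls = descriptor.get("classification")
    if cls and cls not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"unknown classification {cls!r}")
    # Security check: sensitive data must name who may access it
    if cls in ("confidential", "restricted") and not descriptor.get("access_roles"):
        problems.append("sensitive data requires access_roles")
    return (not problems, problems)


ok, issues = ingest_allowed({"owner": "finance-data", "classification": "restricted",
                             "retention_days": 2555})
print(ok, issues)  # → False ['sensitive data requires access_roles']
```

Returning the full list of problems, rather than failing on the first one, lets a data owner fix a descriptor in one pass.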
Chart: What are the obstacles for your company in achieving Data Lake goals? (responses: lack of agreed-upon definition and strategy; budget-oriented challenge; data integration challenge; organizational challenges; technology challenges)
Source: Data Lake Adoption and Maturity Survey Findings Report
Survey summary
• The Data Lake is a viable component of a data strategy
• Companies of all sizes are interested in Data Lakes
• Critical success factors (Hadoop skill sets, budget, and data integration) will persist as adoption increases
• Data Lake maturity will increase with additional use cases
Download the research today
To get your own copy of the “Data Lake Adoption and Maturity Survey Findings Report”, click here.
Thanks!
hortonworks.com
To view the recorded version of this webinar, click here.


Editor's Notes

  1. Attunity, Hortonworks, and Think Big sponsored research on Data Lake adoption and maturity. Working with Database Trends and Applications (DBTA), we had Radiant Advisors and Unisphere Research survey 385 IT practitioners and stakeholders at organizations in a variety of industries. Today, we’ll discuss the results of the survey and what we’re seeing from our customers who are using Data Lakes today.
  2. The survey respondents were highly technical. Approximately 60% were IT and database administrators, while the remainder held CXO or similar executive leadership roles. Fewer than 5% of respondents were from academia. By company size, 30% of respondents were from large enterprises (over 20,000 employees worldwide), 48% from mid-size companies (more than 250 but fewer than 20,000 employees), and 22% from small businesses (fewer than 250 employees). A broad spectrum of industry verticals was represented, with finance and software companies the most highly represented at 13% and 11%, respectively. Other well-represented industries included government, education, and manufacturing. Over 80% of respondents were from North America.
  3. At a high level, the research showed that: the Data Lake is increasingly recognized within a data strategy; clear early use cases exist for the Data Lake; and governance and security are still top of mind as challenges and success factors for the Data Lake.
  4. This chart shows that about 51% of respondents are familiar with the term “Data Lake” and 20% of respondents are actively involved with it.
  5. Drawing on many sources of definitions as well as on-site experience with customers, Teradata arrived at this definition. We locked 14 of our top experts in a room (including our CTO, VP of development, and the president of Think Big) to debate the definition, and we came to agreement sooner than we expected. Notice that it does NOT say Hadoop, but it does say low-cost technologies, which include hardware. Scalability is a crucial SLA. It also means that your quad-core Intel server with 10 terabytes is not a data lake; call it a puddle. Raw data is the key to the data lake’s goals. We want to keep the first version of a file in its native format. Yes, we will do light transformations, but if we have the original file, we can always repeat those transformations. If you only have the fifth derivative in a series, you cannot reproduce derivatives 2, 3, and 4. Note also that data extracted from ERP, CRM, and SCM systems is raw data too. Don’t try eating this stuff; cook it first. Note the emphasis on downstream services. This clarifies a huge role for the data lake.
  6. The Data Lake is independent of technology; it is technology neutral. A data lake holds raw data and initial light refinements. Think of this as ETL: these are the first stages of refinement. The data products often get their data from the data lake, and the data is further refined to a final state in the data products system. Data products are what business users consume with their applications or BI tools. Data R&D is research: looking for the questions that need to be asked as a regular business task. We tend to equate this with data scientists and Aster, but it’s not limited to these. Like everything else in this diagram, data R&D is an abstract concept, not a technology. Yes, there is some overlap where each of these systems can do the work of the others; for example, analytics can be applied in all three systems. But best-fit engineering drives most implementations to the system with the most capability for a specific task. Think of it like a manufacturing line: the data lake receives the raw ore dug from the ground and refines it into steel ingots. The data products division refines it further into useful consumer goods such as pans, lamps, and automobiles. The R&D division constantly examines the raw dirt and ingots for traces of new ores or clues about what can be done with these lightly refined materials.
  7. Each of these major subsystems has different service levels for availability, performance, data quality, and data freshness. The SLAs are different for a data product versus a data lake versus data R&D. For example, a data scientist cares less about high performance and data quality than about agility; they don’t want data models or ETL. Data products need high availability, high performance, and high data quality. The raw data lake holds all data (including non-record-oriented data) forever (there is some hype in those words), but it does mean we need an extremely low cost of storage and access. If we can reduce costs 10X, we can store 10X more data. A subset of the raw data is transformed by data scientists into data products; more often, data is refined and promoted to the data products area. High business value density means that every record stored has value to some sector of the user population. Low density means there is a lot of noise in the data that must be sifted and discarded to find the valuable data. In this logical abstraction, we must separate the tools and people from the logical workload.
  8. This illustrates the fundamental processing in iconic form.
  9. The research showed that 20% of respondents have an initiative underway and 35% have an approved budget for a Data Lake initiative.
  10. The largest share of respondents in the survey, 70%, are using Hadoop for data discovery, data science, and Big Data projects.
  11. ITAMAR: The survey results show that as adoption of the Data Lake continues, respondents are most concerned about governance, metadata management, and security. Additional areas of continued concern include the availability of skills and data ingest. It’s interesting to note that when it comes to governance, 62% of respondents said that governance was a “must have” from the beginning, while 31% said that governance can be added incrementally. When it comes to security, 42% said that a Data Lake strategy cannot be started without a security framework in place. Another 45% said that the security framework must be consistent with, or even more robust than, other IT database security policies. About half of the respondents also highlighted the lack of skills and data ingest as key challenges.
  12. The survey showed us that companies are using Data Lakes today, but that there are still obstacles towards achieving goals.
  13. We invite you to get your own copy of the research.
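Note 7’s cost argument (“If we can reduce costs 10X, we can store 10X more data”) is simple proportionality. As a worked form, assuming a fixed budget $B$ and a storage-and-access cost $c$ per terabyte that scales linearly (both assumptions, not survey findings), the affordable capacity $C$ in terabytes is:

```latex
C = \frac{B}{c}, \qquad
c' = \frac{c}{10} \;\Longrightarrow\;
C' = \frac{B}{c/10} = 10\,\frac{B}{c} = 10\,C
```

Under these assumptions, capacity is inversely proportional to unit cost, so a tenfold cost reduction yields tenfold capacity for the same budget.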