6. What is a Data Lake?
A data lake is a collection of long-term data containers that capture, refine, and explore any form of raw data at scale, enabled by low-cost technologies, from which multiple downstream facilities may draw.
[Slide diagram: data sources (sensors, email, transactions, machine logs, geolocation, media) feed the data lake, which in turn supplies downstream consumers (BI tools, IDW, data marts, analysis, apps, other).]
7. Value from Data Lakes
• New insights from unknown or under-appreciated data
• New forms of analytics
• Expanded corporate memory retention
• Data integration optimization
9. Data Manufacturing: Logical View of Workloads
DATA R&D
• Goal: analytic agility, flexibility
• Exploratory tools, algorithms, skills
• Finding new high value questions
• Light governance, no SLAs
• Data scientists, data miners
DATA LAKE
• Goal: original raw data at low cost
• Refinery feeds data R&D, data products
• Medium governance, SLAs
• Low business value density
• Programmers and data scientists
DATA PRODUCTS
• Goal: consumable analytic results
• Integrated, cleansed, + metadata
• High governance, SLAs, cost
• High business value density
• Shared by many users, roles, skills
10. Data Lake Architecture
[Slide diagram: sources (sensors, email, social, telemetry, mobile, tabular data, machine logs) flow through acquisition, preparation, and access layers (message queues, feeds, cleansing, streams, aggregations, search, experiments, governance) built on distributed storage, with security, metadata/lineage, and administration spanning the stack. Analytic tools and apps (math and stats, data mining, business intelligence, applications, languages, marketing) serve users including marketing executives, operational systems, frontline workers, customers, partners, engineers, data scientists, and business analysts.]
Attunity, Hortonworks and Think Big sponsored research on Data Lake adoption and maturity. Working with Database Trends and Applications (DBTA), we had Radiant Advisors and Unisphere Research survey 385 IT practitioners and stakeholders at organizations within a variety of industries. Today, we’ll be discussing the results of the survey and talking about what we’re seeing from our customers who are using Data Lakes today.
The survey respondents were highly technical. Approximately 60% were IT and database administrators, while the remainder held CXO or similar executive leadership roles. Less than 5% of respondents were from academia.
The company size of respondents was made up of 30% large enterprise (over 20,000 employees worldwide), 48% mid-size (fewer than 20,000 but more than 250 employees), and 22% small business (fewer than 250 employees).
There was a broad spectrum of industry verticals, with finance and software companies the most highly represented at 13% and 11%. Other well-represented industries included government, education, and manufacturing. Over 80% of respondents were from North America.
At a high level, the research showed that:
The Data Lake is increasingly recognized within a data strategy
Clear early use cases exist for the Data Lake
Governance and security are still top of mind as challenges and success factors for the Data Lake
This chart shows that about 51% of respondents are familiar with the term “Data Lake” and 20% of respondents are actively involved with it.
Drawing upon many sources of definitions as well as on-site experience with customers, Teradata came to this definition. We locked 14 of our top experts in a room, including our CTO, VP of development, and the president of Think Big, to debate the definition. We came to agreement sooner than we expected.
Notice it does NOT say Hadoop but it does say low cost technologies which includes hardware.
Scalability is a crucial SLA. It also means your quad core Intel server with 10 terabytes is not a data lake. Call it a puddle.
Raw data is the key to the data lake’s goals. We want to keep the first version of a file in its native format. Yes, we will do light transformations. But if we have the original file, we can always repeat those transformations. If you only have derivative 5 in a series, you cannot reproduce derivatives 2, 3, and 4. Note that data extracted from ERP, CRM, and SCM systems is also raw data. Don’t try eating this stuff, cook it first.
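The “keep the original, replay the refinements” idea can be sketched in a few lines. This is an illustrative toy, not anything from the survey: each derivative is a pure function of the one before it, so as long as the raw record survives, any intermediate can be rebuilt on demand.

```python
# Toy illustration: derivatives as pure functions of the raw record.
# The record and the transforms are hypothetical examples.

RAW = "  2024-01-15 ORDER 1042 usd 19.99  "

def derive1(text):
    """Light cleanup: trim surrounding whitespace."""
    return text.strip()

def derive2(text):
    """Normalize case."""
    return text.upper()

def derive3(text):
    """Tokenize into fields."""
    return text.split()

# Only derivative 3 may be in active use, but because RAW was kept,
# derivatives 1 and 2 can always be reproduced.
d1 = derive1(RAW)
d2 = derive2(d1)
d3 = derive3(d2)
```

If derivative 5 were the only thing stored, none of the earlier stages could be recovered; storing RAW makes the whole chain reproducible.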
Note the emphasis on downstream services. This clarifies a huge role for the data lake.
The Data Lake is technology neutral; it is not tied to any particular product.
A data lake holds raw data and initial light refinements. Think of this as ETL; these are the first stages of refinement.
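A minimal sketch of what “light refinement” means here, using only the standard library. The field names and records are made-up examples; the point is that the refined copy sits alongside the untouched raw text, and the refinement does nothing beyond trimming and restructuring (no integration, no business rules).

```python
# Light refinement sketch: keep the raw text, add a cleansed view beside it.
# Records and field names are hypothetical.
import csv
import io

raw_csv = "id, amount \n1, 10.5\n2,  3.0\n"

def light_refine(raw_text):
    """First-stage ETL: strip whitespace and restructure rows as dicts.
    Deliberately leaves values as strings; heavier typing and cleansing
    belong to the data-products stage."""
    rows = list(csv.reader(io.StringIO(raw_text)))
    header = [h.strip() for h in rows[0]]
    return [dict(zip(header, (c.strip() for c in row))) for row in rows[1:]]

# The lake keeps both: the original bytes and the lightly refined view.
lake = {"raw": raw_csv, "refined": light_refine(raw_csv)}
```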
The data products often get their data from the data lake. The data is further refined to a final state in the data products system. Data products are what the business user consumes with their applications or BI tools. Data R&D is research: looking for the questions that should become regular business tasks. We tend to equate this with data scientists and Aster, but it’s not limited to these. Like everything else in this diagram, data R&D is an abstract concept, not a technology.
Yes, there is some overlap where each of these systems can do the work of the other. For example, analytics can be applied in all 3 systems. But best fit engineering drives most implementations to the system with the most capability for a specific task.
Think of it like a manufacturing line: the data lake receives the raw ore from dirt in the ground and refines it into steel ingots. The data products division refines it further into consumer goods such as pans, lamps, and automobiles. The R&D division is constantly looking at the raw dirt and ingots for traces of new ores or clues about what can be done with these lightly refined materials.
Each of these major subsystems has different service levels for availability, performance, data quality, and data freshness. The SLAs differ for a data product versus a data lake versus data R&D. For example, a data scientist cares less about high performance and data quality than about agility; they don’t want data models or ETL. Data products need high availability, high performance, and high data quality.
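One way to make the differing service levels concrete is a per-workload SLA table. The tiers below are illustrative assumptions matching the slide’s contrast (light/medium/high governance), not figures from the survey.

```python
# Illustrative per-workload SLA table; values are assumptions, not survey data.
SLAS = {
    "data_rd": {
        "availability": "best effort",
        "quality": "raw acceptable",
        "governance": "light",
    },
    "data_lake": {
        "availability": "scheduled",
        "quality": "lightly refined",
        "governance": "medium",
    },
    "data_products": {
        "availability": "24x7",
        "quality": "cleansed + metadata",
        "governance": "high",
    },
}

def governance(workload):
    """Look up the governance tier promised for a workload."""
    return SLAS[workload]["governance"]
```

Writing the tiers down like this, even informally, is what turns “different SLAs” from a slogan into something each subsystem’s owners can be held to.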
The raw data lake holds all data (including non-record-oriented data) forever (there is some hype in those words). But it does mean we need an extremely low cost of storage and access. If we can reduce costs 10X, we can store 10X more data. A subset of the raw data is transformed by data scientists into data products. More often, data is refined and promoted to the data products area.
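The 10X claim is simple budget arithmetic, sketched here with made-up numbers: at a fixed budget, retainable volume scales inversely with cost per terabyte.

```python
# Back-of-envelope for the cost/retention trade-off; figures are illustrative.
def retainable_tb(budget_usd, cost_per_tb_usd):
    """Terabytes you can keep at a fixed annual budget."""
    return budget_usd / cost_per_tb_usd

before = retainable_tb(100_000, 1_000)  # hypothetical legacy cost per TB
after = retainable_tb(100_000, 100)     # hypothetical low-cost-storage price
```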
High business value density means that every record stored has value to some sector of the user population. Low density means there is a lot of noise in the data that must be sifted and discarded to find the valuable data. In this logical abstraction, we must separate the tools and people from the logical workload.
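“Value density” can be read literally as a ratio: valuable records over total records. In this sketch the classifier is a trivial stand-in (in practice, deciding what counts as valuable is exactly the analytic work the slide describes); the log lines are invented.

```python
# Value density as a ratio; the classifier and records are hypothetical.
def value_density(records, is_valuable):
    """Fraction of records some user population actually cares about."""
    hits = sum(1 for record in records if is_valuable(record))
    return hits / len(records)

logs = [
    "ERROR checkout failed",
    "DEBUG heartbeat",
    "DEBUG heartbeat",
    "ERROR payment declined",
    "DEBUG heartbeat",
]

# Machine logs: lots of noise, a few valuable signals -> low density.
density = value_density(logs, lambda record: record.startswith("ERROR"))
```

A data-products table would score near 1.0 on this measure; raw machine logs score far lower, which is why they belong in the cheap tier.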
This illustrates the fundamental processing in iconic form.
The research showed that 20% of respondents have an initiative underway and 35% have an approved budget for a Data Lake initiative.
A large majority of respondents in the survey – 70% – are using Hadoop for data discovery, data science, and Big Data projects.
ITAMAR: Survey results show that as adoption of the Data Lake continues, respondents are most concerned about governance, metadata management issues, and security. Additional areas of continued concern include availability of skills and data ingest.
It’s interesting to note that when it comes to governance, 62% of respondents said that governance was a “must have” from the beginning, while 31% said that governance can be added incrementally.
When it comes to security, 42% said that a Data Lake strategy cannot be started without a security framework in place. Another 45% said that the security framework must be consistent with, or even more robust than, other IT database security policies.
With that – about half of the respondents also highlighted the lack of skills as well as data ingest as key challenges.
The survey showed us that companies are using Data Lakes today, but that there are still obstacles to achieving their goals.
We invite you to get your own copy of the research.