The document discusses considerations for building an enterprise data lake. It notes that traditional data warehousing approaches do not scale well for new data sources like sensors and streaming data. It advocates adopting a data lake approach with separate systems for data acquisition, management, and access instead of a monolithic architecture. A data lake requires a distributed architecture and platform services to support various data flows, formats, and processing needs. The data architecture should not enforce models or limitations upfront but rather allow for evolution and change over time.
5. Events and sensors are a relatively new data source
Sensor data doesn’t fit well with current methods of modeling,
collection and storage, or with the technology to process and analyze it.
8. These sorts of things slow user requests down
Conclusion: any methodology built on the premise that you
must know and model all the data first is untenable
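One way to read this conclusion is as an argument for applying structure when data is read rather than when it is collected. The following is a minimal sketch of that idea, not something taken from the slides: the sample records, the field names and the `read_with_schema` helper are all hypothetical.

```python
import json

# Raw events are stored exactly as they arrived (hypothetical sample data).
# The second record carries a field nobody modeled in advance.
raw_events = [
    '{"device": "a1", "temp_c": 20.1}',
    '{"device": "a2", "temp_c": 22.4, "humidity": 0.41}',
]


def read_with_schema(lines, fields):
    """Project each stored record onto the fields the reader asks for.

    Fields that a record does not have come back as None, so a new
    attribute in the source does not break existing readers.
    """
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# Two readers apply two different "schemas" to the same stored data.
temps = list(read_with_schema(raw_events, ["device", "temp_c"]))
humidity = list(read_with_schema(raw_events, ["device", "humidity"]))
```

Nothing had to be remodeled or reloaded when the `humidity` field appeared; only the reader that cares about it changed.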
17. Schema
In the DW world both data and processing are bounded
No consideration for feedback loops and change
Processing only happens here
Carefully controlled access here
Nobody here creates new information
Sources few and well understood
Complex DI is controlled by IT
Schemas are few and designed
Tools are authorized, few in number and kind
One way flow
This is a monolithic, layered architecture
20. Data lake subsystems / components
The acquisition component allows any data to be collected at any latency. The
management component allows some data to be standardized and integrated. The
access component provides access at any latency and via any means an application
chooses. Processing can be done to any data at any time from any area.
Data Acquisition: Collect & Store (incremental, batch, one-time copy, real time)
Data Management: Process & Integrate
Data Access: Deliver & Use
Data Lake Platform Services
Data storage
In reality, you are building three systems, not one. Avoid the monolith.
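The separation into three systems can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the architecture from the slides: the class names, the `raw` and `managed` area names and the in-memory `Storage` class are all hypothetical stand-ins for real platform services.

```python
import json
from typing import Any, Callable, Dict, List


class Storage:
    """Shared data storage: named areas holding appended records (hypothetical)."""

    def __init__(self) -> None:
        self._areas: Dict[str, List[Dict[str, Any]]] = {}

    def append(self, area: str, record: Dict[str, Any]) -> None:
        self._areas.setdefault(area, []).append(record)

    def read(self, area: str) -> List[Dict[str, Any]]:
        return list(self._areas.get(area, []))


class Acquisition:
    """Collect & store: accept any payload as-is, with no model enforced."""

    def __init__(self, storage: Storage) -> None:
        self.storage = storage

    def collect(self, source: str, payload: str) -> None:
        self.storage.append("raw", {"source": source, "payload": payload})


class Management:
    """Process & integrate: standardize only the data that needs it."""

    def __init__(self, storage: Storage) -> None:
        self.storage = storage

    def standardize(self, source: str, parse: Callable[[str], Dict[str, Any]]) -> int:
        count = 0
        for record in self.storage.read("raw"):
            if record["source"] == source:
                self.storage.append("managed", parse(record["payload"]))
                count += 1
        return count


class Access:
    """Deliver & use: read from any area, raw or managed."""

    def __init__(self, storage: Storage) -> None:
        self.storage = storage

    def query(self, area: str, predicate: Callable[[Dict[str, Any]], bool] = lambda r: True) -> List[Dict[str, Any]]:
        return [r for r in self.storage.read(area) if predicate(r)]


storage = Storage()
acquisition, management, access = Acquisition(storage), Management(storage), Access(storage)

acquisition.collect("sensor", '{"id": "t1", "temp_c": 21.5}')
acquisition.collect("weblog", "GET /index.html 200")  # collected, never modeled
standardized = management.standardize("sensor", json.loads)
```

Each component depends only on the shared storage, not on the others, so the web log line can be collected and queried in its raw form even though nothing has standardized it.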
35. About Third Nature
Third Nature is a consulting and advisory firm focused on new and emerging technology
and practices in information strategy, analytics, business intelligence and data
management. If your question is related to data, analytics, information strategy and
technology infrastructure then you’re at the right place.
Our goal is to help organizations solve problems using data. We offer education, consulting
and research services to support business and IT organizations as well as technology
vendors.
We fill the gap between what the industry analyst firms cover and what IT needs. We
specialize in strategy and architecture, so we look at emerging technologies and markets,
evaluating how technologies are applied to solve problems rather than evaluating product
features.