This document discusses data virtualization and how it can help organizations leverage data lakes to access all their data from disparate sources through a single interface. It addresses how data virtualization can help avoid data swamps, prevent physical data lakes from becoming silos, and support use cases like IoT, operational data stores, and offloading. The document outlines the benefits of a logical data lake created through data virtualization and provides examples of common use cases.
Data Virtualization Fulfills Promise of Data Lakes
1. Data Virtualization: Fulfilling the Promise of Data Lakes
Dr. Christian Kurze
Principal Sales Engineer – DACH
ckurze@denodo.com
heiko.klarl@xdi360.com
2. Agenda
Key questions I want to answer today
What is Data Virtualization?
How to leverage Hadoop Data Lakes to support Internet of Things / Operational Data Store / Offloading / … use cases?
How to query Hadoop Data Lakes combined with any other structured, semi-structured and unstructured data sources using a single logical data lake? What about Cloud?
How to avoid Data Swamps via a lightweight data governance approach that helps enterprises maximize the value of their Data Lake?
How to use a logical data lake/data warehouse to prevent a physical data lake from becoming a silo?
3. Status Quo – Data Integration
The Requirement…
Access to all information
Consumers: Marketing, Sales, Executive, Support
Access to complete information (Cross-sell / Up-sell, Channel, Warranty, Product, Customer)
– from disparate silos and technologies
… in an economically meaningful way
… real-time and in high quality incl. monitoring, security and audit
… versus the current architecture
Sources: Database, Apps, Warehouse, Cloud, Big Data, Documents, NoSQL
Manual access to legacy systems and constantly new technologies – IoT, Big Data, Cloud
Point-to-point connections
Projects too slow for new initiatives
4. Status Quo – Data Integration
“My architecture works fine, but I am not able to access all my silos.”
- Enterprise Data Architect
• Different locations
• Different technologies
• Different data structures
• Datasets too large to move
• Different APIs and access methods
• Excessive use of ETL to copy data
• Synchronization issues
5. The Solution
Data Virtualization as a Data Abstraction Layer
Central repository to access all data
Abstracts the underlying technology of the data sources
Enables the definition of a semantic data model
Offers a metadata-rich catalog
Multiple access methods:
• SQL based
• Keyword based search (via index)
• RESTful navigation (hyperlinks)
Native support for nested document structures (XML, JSON, …)
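The abstraction-layer idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the Denodo Platform is implemented: the `AbstractionLayer` class, the view names and all sample data are hypothetical, and an in-memory SQLite table and a JSON string stand in for a relational source and a web service.

```python
import json
import sqlite3
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class AbstractionLayer:
    """Hypothetical registry that maps view names to source-specific fetchers."""

    def __init__(self) -> None:
        self._views: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, fetch: Callable[[], Iterable[Row]]) -> None:
        self._views[name] = fetch

    def query(self, name: str, **filters: object) -> List[Row]:
        # Consumers use the same call shape whatever the underlying technology is.
        rows = self._views[name]()
        return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]


# Source 1: a relational table (in-memory SQLite stands in for an RDBMS).
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.execute("CREATE TABLE customer (id INTEGER, name TEXT, country TEXT)")
db.executemany("INSERT INTO customer VALUES (?, ?, ?)",
               [(1, "Acme", "DE"), (2, "Globex", "CH")])

# Source 2: nested JSON documents, as a web service might return them.
incident_docs = json.loads(
    '[{"id": 10, "customer": {"id": 1}, "status": "open"},'
    ' {"id": 11, "customer": {"id": 2}, "status": "closed"}]'
)

layer = AbstractionLayer()
layer.register("customer",
               lambda: [dict(r) for r in db.execute("SELECT * FROM customer")])
# The nested document is flattened so both views look alike to the consumer.
layer.register("incident",
               lambda: [{"id": d["id"], "customer_id": d["customer"]["id"],
                         "status": d["status"]} for d in incident_docs])

german_customers = layer.query("customer", country="DE")
open_incidents = layer.query("incident", status="open")
```

The consumer never sees whether a view is backed by SQL or by JSON; swapping a source only changes the registered fetcher.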
6. Modelling in a Data Virtualization Solution
Sources: Oracle, SQL Server, Salesforce, Multidimensional, REST Web Service, Web Site, Hadoop
Base View (Source Abstraction): Client, Address, Client Type, Company, Invoicing, Service Usage, Product, Logs, Web Incidents
Combine, Transform & Integrate: Customer, Invoice, Product; Customer Invoicing, Service Usage, Incident
Publish: SQL, SOAP, REST, ODATA, Message Queues (JMS), etc.
Denodo’s Information Self Service
Independent of the access method – all views use the same metadata and access privileges
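The layering on this slide (sources, base views, combined views, published views) can be approximated with plain SQL views. The sketch below is an assumption-laden stand-in: it uses SQLite in a single in-memory database rather than a data virtualization server, and the table names, column names and figures are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Source tables (stand-ins for two separate source systems).
CREATE TABLE crm_client (client_id INTEGER, client_name TEXT, client_type TEXT);
CREATE TABLE billing_invoice (inv_id INTEGER, client_ref INTEGER, amount_eur REAL);
INSERT INTO crm_client VALUES (1, 'Acme', 'B2B'), (2, 'Jane Doe', 'B2C');
INSERT INTO billing_invoice VALUES (100, 1, 250.0), (101, 1, 150.0), (102, 2, 40.0);

-- Layer 1: base views abstract each source one-to-one.
CREATE VIEW bv_client  AS SELECT client_id, client_name, client_type FROM crm_client;
CREATE VIEW bv_invoice AS SELECT inv_id, client_ref, amount_eur FROM billing_invoice;

-- Layer 2: combine, transform and integrate into business entities.
CREATE VIEW customer AS
    SELECT client_id AS customer_id, client_name AS name, client_type AS segment
    FROM bv_client;
CREATE VIEW customer_invoicing AS
    SELECT c.customer_id, c.name, SUM(i.amount_eur) AS total_invoiced
    FROM customer c JOIN bv_invoice i ON i.client_ref = c.customer_id
    GROUP BY c.customer_id, c.name;
""")

# Layer 3: publish. Here consumers simply run SQL against the top-level view.
published = con.execute(
    "SELECT name, total_invoiced FROM customer_invoicing ORDER BY customer_id"
).fetchall()
```

Because consumers only touch `customer_invoicing`, a source table can be renamed or replaced by changing its base view alone.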
7. Common Data Virtualization Use Cases
Data Virtualization
BIG DATA, CLOUD INTEGRATION
Advanced Analytics
Data Warehouse Offloading
Big Data for Enterprise
Cloud / SaaS Integration
AGILE BUSINESS INTELLIGENCE
Logical Data Warehouse
Virtual Data Marts
Self-Service BI
Operational BI / Analytics
SINGLE VIEW APPLICATIONS
Single Customer View - Call Centers, Portals
Single Product View - Catalogs
Single Inventory View - Inventory Reconciliation
Vertical Specific - Single View of Wells
DATA SERVICES
Unified Data Services Layer
Logical Data Abstraction
Agile Application Development
Linked Data Services
8. Multiple platforms optimized for different Workloads
Additionally in a hybrid environment: OnPrem vs. Cloud
[Figure: platform categories – Streams, MDM, DWH & Marts, Advanced Analytics (structured), Advanced Analytics (multiple structures)]
[Figure: platforms – MDM (Cust, Prod; CRUD), EDW, Mart, DW Appliance, Data Lake (Hadoop / Spark / Hive / …), NoSQL / Graph DB]
[Figure: workloads – Real-time stream processing & decision management; Governed context information; Traditional query, reporting & analysis; Data mining, model development; Investigative analysis, data refinery; Graph analysis]
9. Business requires a combination of data
Sources: MDM (Cust, Prod; CRUD), EDW, Hadoop
Who are our customers?
What products do we sell?
What are the most popular navigational paths through our web site that led to high-fee products?
Who are our most loyal, low risk customers that generate low fees?
What is the online behavior of our loyal, low risk, low fee customers so that we can offer them higher fee products?
Where do I find this data?
How to combine this data?
How to share it with my colleagues? What about their access privileges?
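To make the "how to combine this data?" question concrete, here is a minimal Python sketch that answers the last business question by joining three sources. Everything in it is an assumption for illustration: the records are made up, the thresholds for "loyal" and "low fee" are arbitrary, and simple in-memory lists stand in for MDM, the EDW and Hadoop.

```python
from collections import Counter

# All data below is made up for illustration.
mdm_customers = [  # master data: who are our customers?
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
    {"customer_id": 3, "name": "Carol"},
]
edw_profiles = [  # warehouse: loyalty, risk and fee figures
    {"customer_id": 1, "years_active": 9, "risk": "low", "annual_fees": 40.0},
    {"customer_id": 2, "years_active": 1, "risk": "high", "annual_fees": 300.0},
    {"customer_id": 3, "years_active": 7, "risk": "low", "annual_fees": 55.0},
]
hadoop_clicks = [  # data lake: raw web-site navigation events
    {"customer_id": 1, "page": "/savings"},
    {"customer_id": 1, "page": "/premium-card"},
    {"customer_id": 2, "page": "/loans"},
    {"customer_id": 3, "page": "/premium-card"},
]


def loyal_low_risk_low_fee(profiles, min_years=5, max_fees=100.0):
    """Return ids of customers matching the (arbitrary) segment thresholds."""
    return {
        p["customer_id"]
        for p in profiles
        if p["years_active"] >= min_years
        and p["risk"] == "low"
        and p["annual_fees"] <= max_fees
    }


# Step 1: segment customers using warehouse figures.
target_ids = loyal_low_risk_low_fee(edw_profiles)
# Step 2: resolve ids to names using master data.
names = {c["customer_id"]: c["name"] for c in mdm_customers}
target_names = sorted(names[i] for i in target_ids)
# Step 3: summarize the online behavior of that segment from the data lake.
page_counts = Counter(
    c["page"] for c in hadoop_clicks if c["customer_id"] in target_ids
)
```

In a logical data lake the same three steps would be one query over virtual views, with the join pushed to wherever it runs most cheaply.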
10. Big Data Connectivity
Big Data and Cloud Databases Connectivity
■ Hadoop Ecosystem:
■ SQL on Hadoop: Hive, Impala, Presto,…
■ HDFS, Parquet, Avro, CSV…
■ Execution of map/reduce Jobs
■ Certified with major Hadoop distributions
■ In-memory platforms: Apache Spark SQL, Presto DB, HANA,…
■ Parallel DWs and Appliances: Vertica, Impala, Teradata, Greenplum,…
■ Cloud databases: Redshift, Snowflake, DynamoDB,…
■ NoSQL (MongoDB, CouchDB, Neo4J, Redis, Oracle NoSQL, Cassandra, etc.)
■ Streaming data (Spark streams, Splunk, IBM Streams, Kafka,…)
Enhanced Adapters for Big Data ecosystem
Delimited text files
Sequence files
Map files
Avro files
11. How to provide access by multiple tools and technologies?
Sources: DWH, MDM, Hadoop, Appliances, NoSQL, External Services
Consumers: Excel / MS BI, Tableau, Power BI, Composite Desktop, 360 Views, Cockpit, Other Applications
Complex Security Policies? RBAC?
Single Sign On (Kerberos)
Governance / Audit
Fast Prototyping?
Automated Processes?
Manual development of Service Layer?
Source Changes
New Attributes and Requirements
Accounting of source usage (cloud migration pending)
Refactoring of sources
New Sources
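One way to picture the security and audit questions above is a single access function that every tool goes through. The Python sketch below is a hypothetical illustration of role-based access control with column-level restrictions and an audit trail; the roles, the policy structure and the sample row are assumptions, not a description of any product's security model.

```python
from typing import Dict, List, Set

Row = Dict[str, object]

# Hypothetical policy: role -> view -> set of readable columns.
POLICY: Dict[str, Dict[str, Set[str]]] = {
    "analyst": {"customer": {"customer_id", "segment"}},
    "support": {"customer": {"customer_id", "name", "email"}},
}

# Hypothetical view contents.
VIEWS: Dict[str, List[Row]] = {
    "customer": [
        {"customer_id": 1, "name": "Acme", "email": "it@acme.example", "segment": "B2B"},
    ],
}

AUDIT_LOG: List[str] = []


def read_view(role: str, view: str) -> List[Row]:
    """Return the rows of a view, restricted to the columns the role may read."""
    allowed = POLICY.get(role, {}).get(view)
    if not allowed:
        AUDIT_LOG.append(f"DENY {role} {view}")
        raise PermissionError(f"role {role!r} may not read view {view!r}")
    AUDIT_LOG.append(f"ALLOW {role} {view}")
    return [{k: v for k, v in row.items() if k in allowed} for row in VIEWS[view]]
```

Because the policy is enforced once in the access layer, Excel, Tableau and custom applications all see the same restricted result for the same role.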
12. A Single Governed Logical Data Lake
Data Virtualization combines one or more physical data lakes with other enterprise data to create a “virtual” or “logical” data lake.
Denodo Platform Bridges Distinct Data Architectures
Sources: Data Lakes (Marketing, Research, Finance); Other Data Sources (MDM, Cloud Apps)
DATA VIRTUALIZATION (Logical Data Lake): Semantic Model, Data Discovery, Metadata Catalog, Security, Governance
Consumers: Self-Service Analytics, Operational Apps, BI/Analytical Tools, Excel Reports
Simplified Architecture
Single Point of Access
Lower TCO
Lower Operational Costs
Improved Agility
Improved Flexibility
Consistency and Integrity for multiple tools
13. Information Self Service
E/R diagram
1 Click on a view to navigate to the details
2 Hover on the arrows to show the details of the PK-FK relationships
14. Information Self Service
Browse Metadata Catalog
1 Browse and search virtual databases
2 Browse and search available views
3 Review metadata and descriptions
4 Query the view
15. Information Self Service
Search Metadata Catalog
1 Full-text search within view metadata (name, column names, descriptions)
2 Show additional view information and query data
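A full-text search over view metadata can be sketched as follows. This is a simplified illustration, assuming plain substring matching instead of a real search index; the `ViewMetadata` class, the catalog entries and the descriptions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ViewMetadata:
    name: str
    description: str
    columns: Dict[str, str] = field(default_factory=dict)  # column name -> description

    def searchable_text(self) -> str:
        # Name, column names and all descriptions are searched together.
        parts = [self.name, self.description, *self.columns.keys(), *self.columns.values()]
        return " ".join(parts).lower()


# Hypothetical catalog entries.
CATALOG: List[ViewMetadata] = [
    ViewMetadata("customer", "Master record per customer",
                 {"customer_id": "Unique key", "name": "Legal name"}),
    ViewMetadata("customer_invoicing", "Invoiced totals per customer",
                 {"customer_id": "Unique key",
                  "total_invoiced": "Sum of invoice amounts in EUR"}),
    ViewMetadata("service_usage", "Usage logs per product",
                 {"product_id": "Product key", "minutes": "Usage in minutes"}),
]


def search_catalog(catalog: List[ViewMetadata], query: str) -> List[str]:
    """Return names of views whose metadata contains every search term."""
    terms = query.lower().split()
    return [v.name for v in catalog if all(t in v.searchable_text() for t in terms)]
```

For example, `search_catalog(CATALOG, "invoice amounts")` finds the invoicing view through a column description even though neither word appears in the view name.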
16. Information Self Service
Querying Data
1 Access to the Denodo catalog
2 Query and filter for data
3 Click on the green arrows to drill down into related information
17. Information Self Service
Data Lineage
1 Select Data Lineage for the View
2 Select column to see lineage
3 Hover and click the icons to see details
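Column-level lineage amounts to walking a dependency graph from a published column back to its source columns. The Python sketch below shows that walk under stated assumptions: the `LINEAGE` mapping and all view and column names are invented, and the graph is assumed to be acyclic.

```python
from typing import Dict, List, Set

# Hypothetical lineage: "view.column" -> the upstream columns it is derived from.
LINEAGE: Dict[str, List[str]] = {
    "customer_invoicing.total_invoiced": ["bv_invoice.amount_eur"],
    "customer_invoicing.name": ["customer.name"],
    "customer.name": ["bv_client.client_name"],
    "bv_client.client_name": ["crm.client.client_name"],
    "bv_invoice.amount_eur": ["billing.invoice.amount_eur"],
}


def source_columns(column: str, lineage: Dict[str, List[str]] = LINEAGE) -> Set[str]:
    """Follow lineage edges upstream until columns with no parents are reached.

    Assumes the lineage graph has no cycles.
    """
    upstream = lineage.get(column)
    if not upstream:
        return {column}  # no recorded parents: treat as a physical source column
    result: Set[str] = set()
    for parent in upstream:
        result |= source_columns(parent, lineage)
    return result
```

Calling `source_columns("customer_invoicing.name")` follows the chain through the derived and base views down to the physical column in the source system.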
18. Telematics & Predictive Maintenance
Leading Construction Manufacturer
Sources: Dealer, Maintenance, Parts Inventory, OSI PI, Hadoop Cluster
Consumer: Tableau (Dealer / Customer Dashboard)
19. Business Benefits
Improved asset performance and proactive maintenance.
Reduced warranty costs due to proactive maintenance of parts, preventing parts failure.
Optimized pricing for services and parts among global service providers.
New business model opportunities based on real-time analysis of detailed sensor data.
20. How can I get started?
Read the new whitepaper by Rick F. Van der Lans: Developing a Bimodal Logical Data Warehouse Architecture Using Data Virtualization
Register at: http://bit.ly/2frs782
Get Started Today!
Download Denodo Express: www.denodoexpress.com
Access Denodo on AWS: www.denodo.com/en/denodo-platform/denodo-platform-for-aws