In an ever-growing data landscape, BI teams find it hard to navigate and comprehend complex movements of data between several systems as well as within each system.
Data lineage is often described as the horizontal or vertical movement of data, but what if the universe of data could be depicted as multidimensional?
With a complete understanding of where data originated, what happened to it, and where it is distributed around the BI environment, BI teams can respond to business needs more effectively. The result is an accelerated capability to provide in-depth reporting with a broad perspective of the data.
In this webinar you will learn about cross-system lineage, inner-system lineage, and end-to-end column lineage as part of a new approach to data lineage.
2. About the Speakers
David Loshin
President, Knowledge Integrity
David Loshin, Knowledge Integrity’s president, is globally recognized as an expert in business intelligence, data quality, and master data management, frequently contributing to Intelligent Enterprise, DM Review, and the Data Administration Newsletter TDAN (www.tdan.com).
David’s book, “Business Intelligence: The Savvy Manager’s Guide” (June 2003), has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing, and how all of the pieces work together.” His earlier book, “Enterprise Knowledge Management – The Data Quality Approach,” has been recognized across the information management industry as a key resource for both business and technical data quality professionals.
David Bitton
VP Global Sales, Octopai
David has extensive product knowledge coupled with creative ideas for product applications and a solid history of global sales and marketing management of SaaS (Software as a Service) and internet-driven products.
loshin@knowledge-integrity.com 2
3. How to Accelerate BI Responsiveness with Data Lineage
David Loshin
President, Principal Consultant, Knowledge Integrity, Inc.
Program Director, Master of Information Management Program, University of Maryland
May 25, 2021
4. Data, Information, and Informed Decision-Making
• “Ideally information is the sum total of data needed for decision-making” – Rudolph E. Hirsch, “The Value of Information,” Journal of Accountancy, June 1968
• We have known for decades that business decision-making relies on available, trustworthy, and current information
• The tightly-structured traditional data warehouse architecture for informed decision-making is gradually disintegrating
[Diagram labels: Information availability, Information awareness, Information format, Information trust, Information accessibility, Information currency, Information freshness]
5. Data, Information, and Informed Decision-Making
• The conventional data warehouse architecture is predicated on presumptions that:
– All data originate from known sources
– There are well-defined processing pipelines that transform data in its original format to one usable for reporting and BI
– There is some control over the data production processes
[Diagram: conventional data warehouse pipeline. Labels: Data Extraction (five sources); Staging Area (data standardization, data cleansing, data validation, transformation, reorganization into target model, preparation for loading into data warehouse); Scheduled batch loads; Data Warehouse; Senior Manager; Business Analyst; Corporate data center firewall]
6. Emerging Hybrid Multicloud Data Strategies
• Motivating factors for modernized data strategies:
– Cloud migration
– Data architecture renovation
– Data virtualization
– Increasing numbers of internal and external data sources
• Enterprise data environments are increasing in complexity
• Data awareness is eroding as authority becomes more distributed
• A growing number of sophisticated data consumers exercise control over their own data pipelines
[Diagram labels: Data Warehouse (two), On-Premises Data, Cloud Data Environment (four)]
7. Informed Decision-Making Requires Data Awareness
• Data decision-makers relying on business intelligence, reporting, and analytics have three key questions about their process:
– What source data sets can inform the process?
– How are these data sources combined to produce the information necessary for business decisioning?
– How are different business intelligence processes interdependent on the same information?
• Yet increasingly complex data architectures raise questions about data utility!
– Can I trust the data in the data warehouse?
– How is the report impacted by a change to a data source?
– Am I getting the data that I need at the right time?
8. Data Lineage to the Rescue
• Data lineage methods help to develop a map of the enterprise data landscape
• Data lineage provides a holistic description of each data object’s
– Sources
– Information pipelines
– Transformations
– Methods of access
– Controls
– All other fundamental aspects of information utility
• Data lineage combines three different aspects of corporate metadata:
– Production lineage data: the semantic aspects of tracing how data element values are produced
– Technical lineage data: the structural aspects of data element concepts and their use across the enterprise
– Procedural lineage data: a trace of data's journey through different systems and data stores, providing an audit trail of the changes along the way
9. Perspectives on Data Lineage are Changing…
• First-gen data lineage tools focused on (largely manually) documenting metadata
and capturing dependencies
• Second-gen data lineage tools had simple automation for
– Creating data inventories
– Harvesting metadata
– Inferring system-to-system dependencies (i.e., which systems read/write data)
– Visual representations
– Bridge to data catalog
• Emerging “bleeding edge” tools incorporate automation for inferring multiple
dimensions of lineage
– Lineage of data elements across different systems
– Transformations from source to target
– Between-system columnar dependencies
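The difference between system-to-system dependencies and the finer-grained lineage dimensions can be illustrated with a minimal Python sketch. It assumes lineage has already been captured as column-to-column edges; the system, table, and column names and the helper functions are hypothetical and are not taken from any particular tool.

```python
# Hypothetical column-level lineage edges: (source, target).
# Each name is written as "system.table.column"; all names are illustrative.
EDGES = [
    ("erp.orders.amount", "etl.stg_orders.amount"),
    ("etl.stg_orders.amount", "etl.stg_orders.amount_usd"),
    ("etl.stg_orders.amount_usd", "dw.fact_sales.revenue"),
    ("dw.fact_sales.revenue", "bi.sales_dashboard.total_revenue"),
]


def system_of(column):
    """Return the system name, i.e. the part before the first dot."""
    return column.split(".", 1)[0]


def split_by_scope(edges):
    """Separate edges that cross systems from edges inside one system."""
    cross, inner = [], []
    for source, target in edges:
        if system_of(source) == system_of(target):
            inner.append((source, target))
        else:
            cross.append((source, target))
    return cross, inner


def system_dependencies(edges):
    """Collapse column edges into system-to-system dependencies."""
    return sorted(
        {
            (system_of(source), system_of(target))
            for source, target in edges
            if system_of(source) != system_of(target)
        }
    )


cross_edges, inner_edges = split_by_scope(EDGES)
system_deps = system_dependencies(EDGES)
```

Here `system_deps` is the coarse view that simple automation can provide, while `cross_edges` and `inner_edges` keep the column-level detail for movement between systems and within a single system.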
10. Data Architecture Complexity and the Need for Automation
• Organizations continue to expand their data landscape across a variety of on-premises and
cloud platforms
• Increasing complexity of enterprise information strategies means that manual oversight and
management of data lineage will be difficult, if not impossible
• Organizations need tools that automatically infer, capture, manage, and present
multidimensional data lineage
[Diagram labels: Data Warehouse (two), On-Premises Data, Cloud Data Environment (four)]
• Manual capture and documentation of lineage is difficult, time-consuming, and error-prone
• Automated capture and management of lineage provides trustworthy details about data origin, transformations, and dependencies
11. Data Lineage Accelerates BI Responsiveness
• Data lineage informs many critical organizational processes
and requirements:
– Integrated auditing for regulatory compliance
– Impact analysis to assess how code or model modifications impact
data pipelines
– Assessment of replication of data pipeline segments for optimization
– Root cause analysis
• Access to the different dimensions of data lineage informs data consumers about:
– What report data elements are available
– How they were produced
– Dependencies on original sources
– Transformations applied across the pipelines
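As a rough illustration of how lineage supports root cause analysis and shows a report element's dependencies on original sources, the sketch below walks upstream from a report field. It assumes lineage is stored as a mapping from each derived field to its direct inputs; the `PARENTS` data and the `original_sources` helper are hypothetical.

```python
# Hypothetical mapping from each derived field to the fields it is built from.
# All names are illustrative.
PARENTS = {
    "report.margin.gross_margin": ["dw.fact_sales.revenue", "dw.fact_sales.cost"],
    "dw.fact_sales.revenue": ["erp.orders.amount"],
    "dw.fact_sales.cost": ["erp.purchases.unit_cost", "erp.orders.quantity"],
}


def original_sources(field, parents):
    """Return the upstream fields that have no recorded inputs of their own."""
    found = set()
    visited = set()
    stack = [field]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        inputs = parents.get(node, [])
        if not inputs and node != field:
            # No recorded inputs: treat this field as an original source.
            found.add(node)
        stack.extend(inputs)
    return found


sources = sorted(original_sources("report.margin.gross_margin", PARENTS))
```

If the report value looks wrong, `sources` gives the short list of original fields to inspect first, and the intermediate entries in `PARENTS` show where transformations were applied along the way.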
12. Example: Modification to a Data Privacy Law
• Protection against exposure of sensitive data is engineered into a collection of applications
• However, when the law changes, it may be difficult to determine which parts of application
code are impacted
• For example, imagine if the definition of “private data” were expanded to include a data
element that previously had not been included
– What processes are impacted?
– What reports are affected?
– What code needs to be reviewed and updated?
• Data lineage provides visibility for impact analysis:
– Cross-system lineage allows you to identify which systems are impacted by a modification to an
externally-defined policy modifying the use of a particular source data element
– Column lineage shows where direct dependencies need to be reviewed
– Inner-system lineage exposes where internal data dependencies might inadvertently create exposure
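The impact analysis described here can be sketched as a downstream traversal of a lineage graph. This is a minimal Python sketch under stated assumptions: the `LINEAGE` mapping, the column names, and both helper functions are hypothetical, and a real lineage tool would supply the graph from harvested metadata.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the columns or report
# fields listed as its value. Names are "system.table.column" and illustrative.
LINEAGE = {
    "crm.customers.birth_date": ["staging.customers.birth_date"],
    "staging.customers.birth_date": ["dw.dim_customer.age_band"],
    "dw.dim_customer.age_band": ["report.churn_by_age.age_band"],
    "crm.customers.region": ["dw.dim_customer.region"],
    "dw.dim_customer.region": ["report.sales_by_region.region"],
}


def downstream(column, lineage):
    """Return every field derived from `column`, in breadth-first order."""
    impacted = []
    visited = {column}
    queue = deque([column])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in visited:
                visited.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


def impacted_systems(column, lineage):
    """Return the systems (prefix before the first dot) touched downstream."""
    return sorted({node.split(".")[0] for node in downstream(column, lineage)})


# Suppose the definition of "private data" now includes the birth date.
impacted = downstream("crm.customers.birth_date", LINEAGE)
systems = impacted_systems("crm.customers.birth_date", LINEAGE)
```

In this toy graph, `impacted` lists the columns and report fields to review, and `systems` lists the systems whose code may need updating, while the unrelated `region` branch is left out.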
13. Considerations: What to Look For in a Data Lineage Solution
[Diagram: Data Lineage, with four considerations: Breadth, Automation, Visualization, Integration]
14. Questions & Suggestions
• www.knowledge-integrity.com
• If you have questions, comments, or
suggestions, please contact me
David Loshin
301-754-6350
dloshin@umd.edu
loshin@knowledge-integrity.com