Data lineage is about tracking the flow of information. It is necessary to guarantee the quality, usability and security of your data. For large organizations, it is also a key compliance requirement.
Read on to learn:
* why Neo4j and graph databases are perfect for data lineage;
* how to use data lineage to drive confidence in your data, agility and security;
* how everyone can explore data lineage with Linkurious.
1. Tracking data lineage with Neo4j and Linkurious.
SAS founded in 2013 in Paris | http://linkurio.us | @linkurious
2. French startup specialized in graph visualization.
Sébastien Heymann, CEO: created Gephi; PhD in CS and complex systems from UPMC
Jean Villedieu, CMO: >5 years in consulting; Sciences Po + Ecole de Guerre Economique
David Rapin, CTO: web-scale archiving; Université de Technologie de Compiègne
3. What is a graph?
PERSON (name: Séb, age: 29)
PERSON (name: Jean, age: 31)
LOCATION (name: Paris)
Relationships: "Lives in" (from each PERSON to the LOCATION), "Knows" (between the two PERSONs)
4. A graph is a set of nodes and relationships.
PERSON (name: Séb, age: 29)
PERSON (name: Jean, age: 31)
LOCATION (name: Paris)
Diagram callouts: "This is a node", "This is a relationship", "This is a property"
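The node, relationship and property vocabulary above can be sketched in a few lines of Python. This is only an illustration of the data model, not how Neo4j stores data: the `Node` and `Relationship` classes, the `LIVES_IN` and `KNOWS` type names, the direction of the `KNOWS` relationship and the `residents` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node carries a label and a dictionary of properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed link between two nodes."""
    start: Node
    type: str
    end: Node


# The three nodes from the slide.
seb = Node("PERSON", {"name": "Séb", "age": 29})
jean = Node("PERSON", {"name": "Jean", "age": 31})
paris = Node("LOCATION", {"name": "Paris"})

relationships = [
    Relationship(seb, "LIVES_IN", paris),
    Relationship(jean, "LIVES_IN", paris),
    Relationship(seb, "KNOWS", jean),  # direction is an assumption
]


def residents(location, rels):
    """Names of the nodes that have a LIVES_IN relationship to `location`."""
    return [
        r.start.properties["name"]
        for r in rels
        if r.type == "LIVES_IN" and r.end is location
    ]


print(residents(paris, relationships))
```

Answering "who lives in Paris?" is then a matter of following relationships rather than joining tables.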
5. What is data lineage?
“Data lineage is defined as a data life cycle that includes the data's origins and where it moves over time. It describes what happens to data as it goes through diverse processes”
- Wikipedia
7. Top 5 data lineage questions.
1. Where is this data coming from?
2. Who has access to that information?
3. Do we have sensitive data that’s being propagated unsafely?
4. Is my database still being used in an important company process or can I remove it?
5. What systems and reports would be impacted by a change in that particular process?
8. Traditional databases are not suited to data lineage.
Hard to query: querying connected data through SQL is a hard and error-prone process.
Slow: poor performance for questions that require looking up multiple connections.
Too rigid: hard to accommodate an evolving data model in a relational database.
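To make the "hard to query" point concrete, here is a minimal sketch of a multi-hop lineage lookup in SQL, run through Python's built-in `sqlite3` module. The `flows` table, its columns and the asset names are hypothetical; the point is only that walking an unknown number of hops requires a recursive common table expression rather than a simple join.

```python
import sqlite3

# Hypothetical lineage stored relationally: one row per "source feeds target" link.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flows (source TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO flows VALUES (?, ?)",
    [
        ("crm_db", "etl_job"),
        ("billing_db", "etl_job"),
        ("etl_job", "warehouse"),
        ("warehouse", "sales_report"),
    ],
)

# Each extra hop is another self-join, so the query has to recurse.
UPSTREAM_SQL = """
WITH RECURSIVE upstream(name) AS (
    SELECT source FROM flows WHERE target = ?
    UNION
    SELECT f.source FROM flows AS f JOIN upstream AS u ON f.target = u.name
)
SELECT name FROM upstream ORDER BY name
"""


def upstream_of(asset):
    """Every asset that directly or indirectly feeds `asset`."""
    return [row[0] for row in conn.execute(UPSTREAM_SQL, (asset,))]


print(upstream_of("sales_report"))
```

The query works, but it is easy to get wrong, and it has to be rewritten whenever the kinds of links stored in the table change.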
9. The cost of bad data lineage.
● A general lack of confidence in data;
● Potential legal exposure;
● Finding answers and making decisions becomes complex and time-consuming;
...it results in wasted time, money, opportunities, etc.
10. Graph DBs are perfect for data lineage.
● Easy to model the flow of data in a graph;
● Query relationships with ease and in real-time;
● Adapt your schema to accommodate new data and relationships;
● Popularity of graph databases has increased 500% in the last 2 years and our partner Neo4j is the leader.
11. Linkurious brings the ability to find answers.
● Tech and business users can search the data lineage intuitively and find answers;
● Visualization brings the ability to understand and communicate complex connections;
● Accelerate and improve decisions.
12. Unique ability to store and analyse your data lineage.
Neo4j: your data lineage is a large graph. Store and query it quickly with Neo4j.
Linkurious: search and find answers easily through a visual interface.
14. Question #1: what’s the data lineage of this report?
“Our business people need to know what data was used to generate this month’s sales report. I need to understand which metadata, which systems and which processes were involved.”
- IT Analyst
15. Question #1: visualize the data lineage of a report.
It only takes a few minutes to search for a report and analyse its lineage. No need to be an expert!
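Behind a visualization like this, the lineage of a report amounts to an upstream traversal of the graph. The sketch below shows the idea with a breadth-first search in plain Python; the asset names, the `FLOWS` and `KINDS` structures and the `lineage` and `by_kind` helpers are hypothetical and do not represent the Linkurious or Neo4j API.

```python
from collections import deque

# Hypothetical lineage: (source, target) means data flows from source to target.
FLOWS = [
    ("crm_system", "extract_orders"),
    ("extract_orders", "orders_table"),
    ("orders_table", "aggregate_sales"),
    ("pricing_system", "aggregate_sales"),
    ("aggregate_sales", "monthly_sales_report"),
]

# Hypothetical node types, matching the analyst's question.
KINDS = {
    "crm_system": "system",
    "pricing_system": "system",
    "extract_orders": "process",
    "aggregate_sales": "process",
    "orders_table": "dataset",
    "monthly_sales_report": "report",
}


def lineage(report, flows):
    """Breadth-first walk upstream from `report`, nearest ancestors first."""
    parents = {}
    for source, target in flows:
        parents.setdefault(target, []).append(source)
    visited, ordered, queue = {report}, [], deque([report])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in visited:
                visited.add(parent)
                ordered.append(parent)
                queue.append(parent)
    return ordered


def by_kind(names):
    """Group asset names by their type (system, process, dataset...)."""
    grouped = {}
    for name in names:
        grouped.setdefault(KINDS[name], []).append(name)
    return grouped


print(by_kind(lineage("monthly_sales_report", FLOWS)))
```

Grouping the result by type gives the analyst the systems, processes and datasets involved in the report.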
16. Question #2: what is this database used for?
“We’re relocating our datacenter and need to move a server on which a database is stored. Can we decommission it? I need to understand what processes and reports rely on this server.”
- IT Analyst
17. Question #2: visualize an impact analysis.
We can visualize and inspect the complex set of relationships involved in the impact analysis.
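An impact analysis is the same idea in the opposite direction: start from the server and follow the relationships downstream. The following is a rough sketch under assumed data; the asset names and the `impacted` and `can_decommission` helpers are hypothetical.

```python
# Hypothetical dependencies: (source, target) means target relies on source.
FLOWS = [
    ("server_42", "legacy_db"),          # the server hosts the database
    ("legacy_db", "nightly_export"),
    ("nightly_export", "finance_report"),
    ("legacy_db", "audit_report"),
    ("other_db", "hr_report"),
]


def impacted(asset, flows):
    """Everything that directly or indirectly relies on `asset`, sorted by name."""
    children = {}
    for source, target in flows:
        children.setdefault(source, []).append(target)
    found, stack = set(), [asset]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return sorted(found)


def can_decommission(asset, flows):
    """An asset is safe to remove only if nothing depends on it."""
    return not impacted(asset, flows)


print(impacted("server_42", FLOWS))
print(can_decommission("server_42", FLOWS))
```

In this made-up dataset the server cannot be decommissioned, because a database, an export process and two reports still depend on it.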