5. Network Structure is Highly Predictive of Pay, Promotions and Positive Reviews
• People Near Structural Holes
• Organizational Misfits
“Organizational Misfits and the Origins of Brokerage in Intrafirm Networks” A. Kleinbaum
“Structural Holes and Good Ideas” R. Burt
Photo by Helena Lopes on Unsplash
6. Relationships and Network Structure
Strongest Predictors of Behavior & Complex Outcomes
“Research into networks reveals that, surprisingly, the most connected people inside a tight group within a single industry are less valuable than the people who span the gaps ...”
“…jumping from ladder to ladder is a more effective strategy, and that lateral or even downward moves across an organization are more promising in the longer run . . .”
9. Network Structure and Predictions
Neo4j for Graph Data Science
Steps of Graph Data Science
Overview
10.
11. Relationships
The Strongest Predictors of Behavior!
“Increasingly we're learning that you can make better predictions about people by getting all the information from their friends and their friends’ friends than you can from the information you have about the person themselves”
15. Better Predictions with Graphs
Using the Data You Already Have
• Most current data science models ignore network structure
• Graphs add highly predictive features to ML models, increasing accuracy
• Graphs enable otherwise unattainable predictions based on relationships
Machine Learning Pipeline
17. Goals of Graph Data Science
• Better Decisions
• Higher Accuracy
• New Learning and more Trust
18. The Steps of Graph Data Science
Decision Support, Graph Based Predictions, Graph Native Learning
• Knowledge Graphs
• Graph Analytics
• Graph Feature Engineering
• Graph Embeddings
• Graph Networks
19. The Steps of Graph Data Science
• Knowledge Graphs: Graph search and queries. Support domain experts
• Graph Analytics
• Graph Feature Engineering
• Graph Embeddings
• Graph Networks
20. Knowledge Graph with Queries
Connecting the Dots
Multiple graph layers of financial information
Includes corporate data with cross-relationships and external news
21. Knowledge Graph with Queries
Connecting the Dots
Dashboards and tools
• Credit risk
• Investment risk
• Portfolio news recommendations
• Typical analyst portfolio is 200 companies
• Custom relative weights
1 Week Snapshot: 800,000 shortest path calculations for the ranked newsfeed. Each calculation optimized to take approximately 10 ms.
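As a rough illustration of how shortest path calculations could feed a ranked newsfeed, here is a minimal Python sketch. It is not the system described on the slide: the company names, the adjacency-list graph, and the `rank_news` helper are all hypothetical, and the ranking rule (fewer hops from a portfolio company means higher rank) is an assumption made for the example.

```python
from collections import deque

def shortest_path_length(graph, source, target):
    """Breadth-first search: hops on a shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

def rank_news(graph, portfolio_company, news_items):
    """Order headlines by hop distance to the company they mention.

    Assumption for this sketch: closer companies are more relevant,
    and items about unreachable companies are dropped.
    """
    scored = []
    for headline, company in news_items:
        hops = shortest_path_length(graph, portfolio_company, company)
        if hops is not None:
            scored.append((hops, headline))
    return [headline for hops, headline in sorted(scored)]

# Hypothetical company relationship graph (undirected adjacency lists).
graph = {
    "Acme": ["Bolt", "Crest"],
    "Bolt": ["Acme", "Dyne"],
    "Crest": ["Acme"],
    "Dyne": ["Bolt"],
    "Echo": [],
}
news = [("Dyne recall", "Dyne"), ("Crest earnings", "Crest"), ("Echo merger", "Echo")]
ranked = rank_news(graph, "Acme", news)
print(ranked)
```

In a graph database the same question would usually be asked with a path query rather than hand-written traversal code; the sketch only shows the shape of the computation.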
22. The Steps of Graph Data Science
• Knowledge Graphs
• Graph Analytics: Graph queries & algorithms for offline analysis. Understanding Structures
• Graph Feature Engineering
• Graph Embeddings
• Graph Networks
23. Local Patterns: Query (e.g. Cypher)
• Fast, local decisioning and pattern matching
• You know what you’re looking for and are making a decision
Global Computation: Graph Algorithms (e.g. Neo4j Algorithms Library)
• Global analysis and iterations
• You’re learning the overall structure of a network, updating data, and predicting
24. Graph Analytics via Queries
Detecting Financial Fraud
Improving existing pipelines to identify fraud via heuristics
Deceptively Simple Queries
• How many flagged accounts are in the applicant’s network 4+ hops out?
• How many login / account variables in common?
• Add these metrics to your approval process
Difficult for relational databases (RDBMS) beyond 3 hops
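The two fraud questions above can be sketched in plain Python to show what the queries compute. This is an illustrative sketch only: the account names, the adjacency-list graph, and the helper names `flagged_within_hops` and `shared_identifiers` are hypothetical, and the hop limit is passed as a parameter because the exact cut-off is a modeling choice.

```python
from collections import deque

def flagged_within_hops(graph, flagged, applicant, max_hops):
    """Count flagged accounts reachable from the applicant in at most max_hops hops."""
    seen = {applicant}
    queue = deque([(applicant, 0)])
    count = 0
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                if neighbor in flagged:
                    count += 1
                queue.append((neighbor, dist + 1))
    return count

def shared_identifiers(identifiers_a, identifiers_b):
    """Number of login / account variables two accounts have in common."""
    return len(set(identifiers_a) & set(identifiers_b))

# Hypothetical chain of accounts: a - b - c - d - e, with c and e flagged.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
flagged = {"c", "e"}

print(flagged_within_hops(graph, flagged, "a", 2))  # 1 (only c is within 2 hops)
print(flagged_within_hops(graph, flagged, "a", 4))  # 2 (c and e)
print(shared_identifiers({"ip:10.0.0.1", "device:42"}, {"device:42", "phone:555"}))  # 1
```

Each result is a single number per applicant, so it can be appended as a column to the data an existing approval model already uses.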
25. Graph Analytics via Algorithms
Generally Unsupervised
A subset of data science algorithms that come from network science, Graph Algorithms enable reasoning about network structure.
• Pathfinding and Search
• Centrality (Importance)
• Community Detection
• Heuristic Link Prediction
• Similarity
26. 45+ Graph Algorithms in Neo4j
Pathfinding and Search
• Parallel BFS
• Parallel DFS
• Shortest Path
• Single Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K-Shortest Path
• Random Walk
Centrality (Importance)
• Degree Centrality
• Closeness Centrality (inc. harmonic, Dangalchev, Wasserman & Faust)
• Betweenness Centrality
• Approx. Betweenness Centrality
• Page Rank
• Personalized Page Rank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count
• Clustering Coefficients
• Connected Components (aka Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity
• Balanced Triad
Heuristic Link Prediction
• Adamic Adar
• Common Neighbours
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbours
Similarity
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
• Approximate KNN
27. The Steps of Graph Data Science
• Knowledge Graphs
• Graph Analytics
• Graph Feature Engineering: Graph algorithms & queries for machine learning. Improve Prediction Accuracy
• Graph Embeddings
• Graph Networks
28. Graph Feature Engineering
Feature Engineering is how we combine and process the data to create new, more meaningful features, such as clustering or connectivity metrics.
Graph features add more dimensions to machine learning
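To make "clustering or connectivity metrics" concrete, the following minimal Python sketch derives two per-node features from a small graph: degree (a connectivity metric) and the local clustering coefficient. The graph and node names are hypothetical, and the plain dictionary representation is an assumption chosen to keep the example self-contained.

```python
from itertools import combinations

def degree(graph, node):
    """Connectivity feature: number of direct neighbors."""
    return len(graph[node])

def local_clustering(graph, node):
    """Clustering feature: fraction of a node's neighbor pairs that are themselves linked."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors means no pairs to check
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))

# Hypothetical undirected graph: a triangle a-b-c with d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

# One feature row per node, ready to join onto an existing ML feature table.
features = {
    node: {"degree": degree(graph, node), "clustering": local_clustering(graph, node)}
    for node in graph
}
print(features["b"])  # {'degree': 2, 'clustering': 1.0}
```

Each node ends up with extra numeric columns that describe its position in the network rather than its own attributes.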
29. Feature Engineering using Graph Queries
Telecom-churn prediction
Churn prediction research has found that simple hand-engineered features are highly predictive
• How many calls/texts has an account made?
• How many of their contacts have churned?
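The two hand-engineered features in the bullets above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method used in the cited research: call records are simplified to hypothetical (caller, receiver) pairs, and a "contact" is assumed to be anyone an account has called or been called by.

```python
from collections import defaultdict

def churn_features(call_records, churned):
    """Per-account features: calls made, and number of contacts who have churned."""
    calls_made = defaultdict(int)
    contacts = defaultdict(set)
    for caller, receiver in call_records:
        calls_made[caller] += 1
        # Assumption: contact relationships are treated as undirected.
        contacts[caller].add(receiver)
        contacts[receiver].add(caller)
    return {
        account: {
            "calls_made": calls_made[account],
            "churned_contacts": sum(1 for c in contacts[account] if c in churned),
        }
        for account in contacts
    }

# Hypothetical call detail records and churn labels.
records = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("ann", "bob")]
churned = {"cat"}

features = churn_features(records, churned)
print(features["ann"])  # {'calls_made': 3, 'churned_contacts': 1}
```

The second feature is the graph-specific one: it depends on the labels of an account's neighbors, which a per-row table of call statistics does not capture on its own.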
30. Feature Engineering using Graph Queries
Telecom-churn prediction
Add graph features based on graph queries to ML data
• Raw Data: Call Detail Records (Caller ID, Receiver ID, Time, Duration, Location, …)
• Random Sample Selection
• Input Data: CDR Sample (Caller ID, Receiver ID, Time, Duration, Location, …)
• Feature Engineering
• Test/Training Data: Call Stats by Incoming, Outgoing, Per day, Short durations, In-network, Centrality, SMS’s, …
• Output/Results
Identify Early Predictors: Select simple, interpretable metrics that are highly correlated w/churn
Churn Score: Supervised learning to predict binary & continuous measures of churn
31. Feature Engineering using Graph Queries
Telecom-churn prediction
89.4% Accuracy in Subscriber Churn Prediction
• Raw Data: Call Detail Records (Caller ID, Receiver ID, Time, Duration, Location, …)
• Random Sample Selection
• Input Data: CDR Sample (Caller ID, Receiver ID, Time, Duration, Location, …)
• Feature Engineering
• Test/Training Data: Call Stats by Incoming, Outgoing, Per day, Short durations, In-network, Centrality, SMS’s, …
• Output/Results
Identify Early Predictors: Select simple, interpretable metrics that are highly correlated w/churn
Churn Score: Supervised learning to predict binary & continuous measures of churn
Source: Behavioral Modeling for Churn Prediction by Khan et al, 2015
32. Feature Engineering using Graph Algorithms
Detecting Financial Fraud
Using Structure to Improve ML Predictions
• Connected components to identify disjoint groups sharing identifiers
• PageRank to measure influence and transaction volumes
• Louvain to identify communities that frequently interact
• Jaccard to measure account similarity
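Two of the algorithms listed above are small enough to sketch directly in Python: connected components (here via union-find) to group accounts that share identifiers, and Jaccard similarity between two accounts' identifier sets. The account names, identifier strings, and the `account_groups` helper are hypothetical; a production pipeline would more likely call a graph library than hand-rolled code like this.

```python
def account_groups(identifiers_by_account):
    """Group accounts into connected components linked by shared identifiers."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for account, identifiers in identifiers_by_account.items():
        find(("account", account))  # register accounts that have no identifiers
        for identifier in identifiers:
            union(("account", account), ("identifier", identifier))

    groups = {}
    for account in identifiers_by_account:
        groups.setdefault(find(("account", account)), []).append(account)
    return sorted(sorted(group) for group in groups.values())

def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical accounts and the identifiers (phone, device, email) seen on each.
accounts = {
    "A1": {"phone:555", "dev:9"},
    "A2": {"phone:555"},
    "A3": {"dev:7"},
    "A4": {"dev:7", "mail:x"},
}

print(account_groups(accounts))                # [['A1', 'A2'], ['A3', 'A4']]
print(jaccard(accounts["A1"], accounts["A2"]))  # 0.5
```

The component an account belongs to and its similarity to known-bad accounts can then be added as features alongside the existing inputs of a fraud model.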
33. The Steps of Graph Data Science
Decision Support, Graph Based Predictions, Graph Native Learning
• Knowledge Graphs
• Graph Analytics
• Graph Feature Engineering
• Graph Embeddings
• Graph Networks
FUTURE
34. Neo4j for Enterprise-Ready Graph Data Science
Neo4j Graph Algorithms: Practical, Scalable Graph Data Science
Harness the natural power of relationships and network structures to infer behavior
Neo4j Database: Native Graph Creation & Persistence
Get all the graph you can eat with an integrated database built to store and protect relationships
Neo4j Desktop and Browser, Neo4j Bloom: Graph Exploration & Prototyping
Explore results visually, quickly prototype and collaborate with different groups
35. A Neo4j Graph Data Science Library
Preview
Evolving the Neo4j Graph Algorithms Library to focus on Data Scientists
Data scientists are under pressure to add more value, faster. That means putting predictive models into production quickly with the data they already have.
• Practical, easy-to-use graph data science and analytics
• Use network structures to increase predictive accuracy
• Enterprise-grade features and scale
36. Graph Data Science: Typical Experience vs. Streamlined & Supported
Data Modeling
• Typical Experience: How do I represent my data as a graph? Which library?
• Streamlined & Supported: We’re a graph database, your data are already in the right shape.
Which Algorithms?
• Typical Experience: How do I know what this algorithm is telling me?
• Streamlined & Supported: We support high value algorithms that are well documented.
Learn Syntax
• Typical Experience: Pick library that seems easy, learn syntax and fight esoteric error messages.
• Streamlined & Supported: Our syntax is standardized and simplified across our library!
Reshape
• Typical Experience: What!? I have to convert my data into different format myself?
• Streamlined & Supported: Our graph loaders seamlessly reshape your data.
What Now?
• Typical Experience: Did I get it right? How the $#@! do I get it into production?
• Streamlined & Supported: It’s easy to write your results and move straight to production!
39. “AI is not all about Machine Learning. Context, structure, and reasoning are necessary ingredients, and Knowledge Graphs and Linked Data are key technologies for this.”
Wais Bashir
Managing Editor, Onyx Advisory