Linked Data in Production: Moving Beyond Ontologies
How Graphs are Changing AI
1. Graphs & AI
A Path for Enterprise Data Science
Amy Hodler @amyhodler
Director, Graph Analytics & AI Programs
Neo4j
3. Relationships
The Strongest Predictors of Behavior!
“Increasingly we're learning that you can make better predictions about people by getting all the information from their friends and their friends’ friends than you can from the information you have about the person themselves”
James Fowler
7. Better Predictions with Graphs
Using the Data You Already Have
• Current data science models ignore network structure
• Graphs add highly predictive features to ML models, increasing accuracy
• Otherwise unattainable predictions based on relationships
Machine Learning Pipeline
8. Goals of Graph Data Science
Better Decisions, Higher Accuracy, New Learning and More Trust
Decision Support, Graph Based Prediction, Graph Native Learning
9. The Path of Graph Data Science
Decision Support, Graph Based Prediction, Graph Native Learning
• Knowledge Graphs
• Graph Analytics
• Graph Feature Engineering
• Graph Embeddings
• Graph Neural Networks
10. The Path of Graph Data Science
• Knowledge Graphs: graph search and queries; support domain experts
• Graph Analytics
• Graph Feature Engineering
• Graph Embeddings
• Graph Neural Networks
11. Knowledge Graph with Queries
Connecting the Dots has become...
Multiple graph layers of financial information
Includes corporate data with cross-relationships and external news
12. Knowledge Graph with Queries
Connecting the Dots
Dashboards and tools
• Credit risk
• Investment risk
• Portfolio news recommendations
• Typical analyst portfolio is 200 companies
• Custom relative weights
1 Week Snapshot:
800,000 shortest path calculations for the ranked
newsfeed. Each calculation optimized to take
approximately 10 ms.
has become...
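At roughly 10 ms each, the 800,000 weekly shortest path calculations add up to about 8,000 seconds of compute, a little over two hours. A minimal sketch of one such calculation follows, assuming Dijkstra's algorithm over a weighted dict-of-dicts graph. The deck does not say which algorithm or data model is used, and the node names and weights here are hypothetical.

```python
import heapq

def shortest_path_cost(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts weighted graph.

    Returns the total weight of the cheapest path, or None if unreachable.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return None

# Hypothetical toy graph: a portfolio company linked to a news item
# through intermediaries, with custom relative weights on each link.
graph = {
    "PortfolioCo": {"SupplierCo": 1.0, "ParentCo": 4.0},
    "SupplierCo": {"NewsItem": 2.5},
    "ParentCo": {"NewsItem": 0.5},
    "NewsItem": {},
}
cost = shortest_path_cost(graph, "PortfolioCo", "NewsItem")
```

A lower path cost could then be read as a closer connection between a news item and a portfolio company when ranking the newsfeed.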
13. The Path of Graph Data Science
• Knowledge Graphs
• Graph Analytics: graph queries & algorithms for offline analysis; understanding structures
• Graph Feature Engineering
• Graph Embeddings
• Graph Neural Networks
14. Local Patterns
Query (e.g. Cypher/Python)
Fast, local decisioning and pattern matching
You know what you’re looking for and making a decision
Global Computation
Graph Algorithms (e.g. Neo4j library, GraphX)
Global analysis and iterations
You’re learning the overall structure of a network, updating data, and predicting
15. Deceptively Simple Queries
How many flagged accounts are in the applicant’s network 4+ hops out?
How many login / account variables in common?
Add these metrics to your approval process
Difficult for RDBMS systems over 3 hops
Graph Analytics via Queries
Detecting Financial Fraud
Improving existing pipelines to identify fraud via heuristics
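The hop-count question above can be sketched in plain Python as a bounded breadth-first search. This is an illustration only: the account names, the adjacency dict, and the `flagged_within_hops` helper are hypothetical, and in a graph database the same metric would typically come from a variable-length path query.

```python
from collections import deque

def flagged_within_hops(adjacency, start, flagged, max_hops):
    """Count flagged accounts reachable from `start` in at most `max_hops` hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    count = 0
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                if neighbor in flagged:
                    count += 1
                frontier.append((neighbor, depth + 1))
    return count

# Hypothetical network: a chain of accounts extending out from the applicant.
adjacency = {
    "applicant": ["a1", "a2"],
    "a1": ["applicant", "a3"],
    "a2": ["applicant"],
    "a3": ["a1", "a4"],
    "a4": ["a3", "a5"],
    "a5": ["a4"],
}
flagged = {"a3", "a5"}
near = flagged_within_hops(adjacency, "applicant", flagged, 2)
far = flagged_within_hops(adjacency, "applicant", flagged, 4)
```

The two counts differ because one flagged account sits four hops out, which is the kind of signal a shallow join would miss.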
16. Graph Analytics via Algorithms
Generally Unsupervised
A subset of data science algorithms that come from network science,
Graph Algorithms enable reasoning about network structure.
• Pathfinding and Search
• Centrality (Importance)
• Community Detection
• Heuristic Link Prediction
• Similarity
17. +45 Graph Algorithms in Neo4j
Pathfinding & Search
• Parallel Breadth First Search
• Parallel Depth First Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev, Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity
• Balanced Triad (identification)
Similarity
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
• Approximate KNN
Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors
There is significant demand for graph
algorithms. Neo4j will be the first
enterprise grade way to run them.
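To make the link prediction heuristics listed above concrete, here is a small sketch of four of them using their standard textbook definitions on an undirected graph stored as a dict of neighbor sets. This is not the Neo4j library's API: the function names and the toy graph are hypothetical.

```python
import math

def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])

def total_neighbors(adj, u, v):
    """Number of distinct neighbors of u or v."""
    return len(adj[u] | adj[v])

def preferential_attachment(adj, u, v):
    """Product of the two node degrees."""
    return len(adj[u]) * len(adj[v])

def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over shared neighbors; rare neighbors weigh more."""
    return sum(
        1.0 / math.log(len(adj[w]))
        for w in adj[u] & adj[v]
        if len(adj[w]) > 1
    )

# Hypothetical undirected graph with edges A-C, A-D, B-C, B-D, B-E, D-E.
adj = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": {"A", "B"},
    "D": {"A", "B", "E"},
    "E": {"B", "D"},
}
cn = common_neighbors(adj, "A", "B")
tn = total_neighbors(adj, "A", "B")
pa = preferential_attachment(adj, "A", "B")
aa = adamic_adar(adj, "A", "B")
```

Higher scores suggest that A and B are more likely to become connected; each score can also be added as a column in a tabular ML model.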
18. The Path of Graph Data Science
• Knowledge Graphs
• Graph Analytics
• Graph Feature Engineering: graph algorithms & queries for machine learning; improve prediction accuracy
• Graph Embeddings
• Graph Neural Networks
19. Graph Feature Engineering
Feature Engineering is how we combine and process the data to create
new, more meaningful features, such as clustering or connectivity
metrics.
Graph features add more dimensions to machine learning
20. Feature Engineering using Graph Queries
Telecom-churn prediction
Churn prediction research has found
that simple hand-engineered features
are highly predictive
• How many calls/texts has an
account made?
• How many of their contacts have
churned?
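The two hand-engineered features above can be sketched directly from call records. This is a minimal illustration, assuming records are simple (caller, receiver) pairs and that a set of churned accounts is already known. The account IDs and the `churn_features` helper are hypothetical.

```python
from collections import defaultdict

def churn_features(call_records, churned):
    """Build per-account features from (caller, receiver) call records."""
    outgoing = defaultdict(int)   # calls made by each account
    contacts = defaultdict(set)   # accounts each account has talked to
    for caller, receiver in call_records:
        outgoing[caller] += 1
        contacts[caller].add(receiver)
        contacts[receiver].add(caller)
    return {
        account: {
            "calls_made": outgoing[account],
            "churned_contacts": len(peers & churned),
        }
        for account, peers in contacts.items()
    }

# Hypothetical call detail records and churn labels.
records = [("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u1", "u2"), ("u4", "u1")]
churned = {"u3", "u4"}
features = churn_features(records, churned)
```

Each account's dictionary can then be joined onto existing tabular training data as extra columns.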
21. Feature Engineering using Graph Queries
Telecom-churn prediction
Add connected features based on graph queries to tabular data
Raw Data: Call Detail Records (Caller ID, Receiver ID, Time, Duration, Location, …)
Random Sample Selection
Input Data: CDR Sample (Caller ID, Receiver ID, Time, Duration, Location, …)
Feature Engineering
Call Stats by: Incoming, Outgoing, Per day, Short durations, In-network, Centrality, SMS’s, …
Test/Training Data
Identify Early Predictors: Select simple, interpretable metrics that are highly correlated with churn
Churn Score: Supervised learning to predict binary & continuous measures of churn
Output/Results
22. Feature Engineering using Graph Queries
Telecom-churn prediction
89.4% Accuracy in Subscriber
Churn Prediction
Source: Behavioral Modeling for Churn Prediction by Khan et al., 2015
23. Feature Engineering using Graph Algorithms
Detecting Financial Fraud
Using Structure to Improve ML Predictions
• Connected components to identify disjoint groups sharing identifiers
• PageRank to measure influence and transaction volumes
• Louvain to identify communities that frequently interact
• Jaccard to measure account similarity
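Two of these structural features can be sketched in a few lines: connected components via union-find, and Jaccard similarity over shared identifiers. This is an illustration under stated assumptions, not the Neo4j implementation. The accounts, identifiers, and helper names are hypothetical.

```python
def connected_components(edges):
    """Union-find over (node, node) edges; returns a list of node sets."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical account-to-identifier links (shared phone, separate emails).
edges = [
    ("acct1", "phone:555"),
    ("acct2", "phone:555"),
    ("acct2", "email:x"),
    ("acct3", "email:y"),
]
components = connected_components(edges)

identifiers = {
    "acct1": {"phone:555"},
    "acct2": {"phone:555", "email:x"},
    "acct3": {"email:y"},
}
similarity = jaccard(identifiers["acct1"], identifiers["acct2"])
```

Here the component id and the similarity score would become additional features for a fraud model, alongside PageRank and Louvain community labels.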
24. The Path of Graph Data Science
• Knowledge Graphs
• Graph Analytics
• Graph Feature Engineering
• Graph Embeddings: graph embedding algorithms for ML features; predictions on complex structures
• Graph Neural Networks
25. Graph Embeddings
Embedding transforms a graph into a feature vector, or set of vectors, describing the topology, connectivity, or attributes of nodes and relationships in the graph
• Node embeddings: describe connectivity of each node
• Path embeddings: traversals across the graph
• Graph embeddings: encode an entire graph into a single vector
Phases of the DeepWalk Approach
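DeepWalk, as commonly described, begins by sampling short random walks from every node and then treats those walks like sentences for a skip-gram model that learns the node embeddings. The sketch below covers only the walk-sampling phase. The toy graph, parameter names, and fixed seed are assumptions for illustration, and the skip-gram training step is not shown.

```python
import random

def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Sample fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end, stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Hypothetical undirected graph as adjacency lists.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = random_walks(adjacency, walk_length=5, walks_per_node=3)
```

Nodes that co-occur often in these walks end up with similar vectors once the walks are fed to a word-embedding style model.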
26. Graph Embeddings: Recommendations
Explainable Reasoning over Knowledge Graphs for Recommendations
[Figure: knowledge graph used to generate recommendations for Derek. Nodes include Derek, Tony, Ed Sheeran, the songs Castle on the Hill, I See Fire, and Shape of You, the ÷ Album, and the genres Pop and Folk. Relationship types: SungBy, IsSingerOf, Interact, Produce, WrittenBy. Scores shown range from 0.03 to 0.63.]
27. The Path of Graph Data Science
• Knowledge Graphs
• Graph Analytics
• Graph Feature Engineering
• Graph Embeddings
• Graph Neural Networks: ML within a graph; new learning methods
28. “Graphs bring an ability to generalize about structure that the individual neural nets don't have.”
Next Major Advancement in AI: Graph Native Learning
29. Next Major Advancement in AI: Graph Native Learning
Implements machine learning in a graph environment
• Input data as a graph
• Learns while preserving transient states
• Output as a graph
• Track and validate AI decision paths
• More accurate with less data and training
30. The Path of Graph Data Science
Decision Support, Graph Based Prediction, Graph Native Learning
• Knowledge Graphs
• Graph Analytics
• Graph Feature Engineering
• Graph Embeddings
• Graph Neural Networks
31. Resources
Business – AI Whitepaper: neo4j.com/use-cases/artificial-intelligence-analytics/
Data Scientists: neo4j.com/sandbox
Developers: neo4j.com/download
neo4j.com/graph-algorithms-book
33. “AI is not all about Machine Learning. Context, structure, and reasoning are necessary ingredients, and Knowledge Graphs and Linked Data are key technologies for this.”
Wais Bashir
Managing Editor, Onyx Advisory