1. Graphs add predictive power to machine learning models by incorporating network structure and relationships between entities.
2. Building graph machine learning models involves aggregating data from various sources to construct a graph, engineering graph features using algorithms and embeddings, and training predictive models that leverage the graph structure.
3. Graph algorithms, embeddings, and neural networks are increasingly being used to power applications in domains like financial services, healthcare, cybersecurity, and more by enabling novel and more accurate predictions based on relationships in data.
1. Leveraging Graphs for Better AI
Alicia Frame
Senior Data Scientist, Neo4j
alicia.frame@neo4j.com
Washington DC, May 2019
5. Graph Data Science Applications
• Financial Services
• Drug Discovery
• Recommendations
• Cybersecurity
• Predictive Maintenance
• Customer Segmentation
• Churn Prediction
• Search/MDM
6. Novel & More Accurate Predictions with the Data You Already Have
• Current data science models ignore network structure
• Graphs add highly predictive features to existing ML models
• Otherwise unattainable predictions based on relationships
[Figure: Machine Learning Pipeline]
7. “The idea is that graph networks are bigger than any one machine-learning approach. Graphs bring an ability to generalize about structure that the individual neural nets don't have.”
“Where do the graphs come from that graph networks operate over?”
8. Building a Graph ML Model
Stages: Data Sources (Parquet, JSON, and more…) → Native Graph Platform → Machine Learning (MLlib and more…)
• Aggregate Disparate Data and Cleanse
• Unify Graphs and Engineer Features
• Build Predictive Models
9. Example: Spark & Neo4j Workflow
• Spark Graph: Cypher 9 in Spark 3.0 to create non-persistent graphs
• Native Graph Platform (Graph Transactions, Graph Analytics): Native Graph Algorithms, Processing, and Storage
• Machine Learning: MLlib to Train Models
10. Explore Graphs
• Massively scalable
• Powerful data pipelining
• Robust ML Libraries
• Non-persistent, non-native graphs
Build Graph Solutions
• Persistent, dynamic graphs
• Graph native query and algorithm performance
• Constantly growing list of graph algorithms and embeddings
11. The Steps of Graph Data Science
[Figure: five steps plotted by Enterprise Maturity and Data Science Complexity]
• Query-Based Knowledge Graph
• Query-Based Feature Engineering
• Graph Algorithm Feature Engineering
• Graph Embeddings
• Graph Neural Networks
Groupings shown: Knowledge Graphs, Graph Feature Engineering, Graph Native Learning, Graph Persistence
12. Steps Forward in Graph Data Science
[Figure: Query-Based Knowledge Graph, Query-Based Feature Engineering, Graph Algorithm Feature Engineering, Graph Embeddings, and Graph Neural Networks plotted by Enterprise Maturity and Data Science Complexity]
13. Query-Based Knowledge Graphs
Connecting the Dots
• Many connected data sources: corporate data with cross-relationships, external news, and customized weighting
• Dashboards and tools
• Credit risk
• Investment risk
• Portfolio news recommendations
14. Steps Forward in Graph Data Science
[Figure: the five steps of graph data science plotted by Enterprise Maturity and Data Science Complexity]
15. Query-Based Feature Engineering
Mining Data for Drug Discovery
Hetionet is a knowledge graph integrating over 50 years of biomedical data
It has been used to predict new uses for drugs: the graph topology supplies features for a model that predicts new links
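As a rough illustration of turning graph topology into features for link prediction, the sketch below counts the paths between a compound and a disease that follow a fixed sequence of relationship types. The toy edges, the relationship names, and the helper `count_metapaths` are hypothetical and are not taken from Hetionet; in practice counts like these would come from queries against the graph database.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph: (source, relationship type, target).
EDGES = [
    ("drug_a", "BINDS", "gene_1"),
    ("drug_a", "BINDS", "gene_2"),
    ("drug_b", "BINDS", "gene_2"),
    ("gene_1", "ASSOCIATES", "disease_x"),
    ("gene_2", "ASSOCIATES", "disease_x"),
    ("gene_2", "ASSOCIATES", "disease_y"),
]


def build_index(edges):
    """Index outgoing neighbours by (node, relationship type)."""
    index = defaultdict(list)
    for source, rel, target in edges:
        index[(source, rel)].append(target)
    return index


def count_metapaths(index, start, end, rel_types):
    """Count paths from start to end that follow rel_types in order."""
    frontier = {start: 1}  # node -> number of partial paths reaching it
    for rel in rel_types:
        next_frontier = defaultdict(int)
        for node, n_paths in frontier.items():
            for neighbour in index.get((node, rel), []):
                next_frontier[neighbour] += n_paths
        frontier = next_frontier
    return frontier.get(end, 0)


index = build_index(EDGES)
pattern = ["BINDS", "ASSOCIATES"]

# One feature per (drug, disease) pair: the number of matching paths.
features = {
    (drug, disease): count_metapaths(index, drug, disease, pattern)
    for drug in ("drug_a", "drug_b")
    for disease in ("disease_x", "disease_y")
}
```

Each count becomes one column in a feature table; a downstream classifier would then learn which path patterns tend to indicate a missing drug-to-disease link.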
18. Knowledge Graphs: Getting Started Example with Spark
Spark Graph
• Merge distributed data into DataFrames
• Reshape your tables into graphs
• Explore Cypher queries
Native Graph Platform (Graph Transactions, Graph Analytics)
• Move to Neo4j to build expert queries
• Persist your graph
Machine Learning
• Bring query-based graph features to the ML pipeline
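The "reshape your tables into graphs" step can be pictured with a small sketch in plain Python rather than Spark. The two tables, their column names, and the helper `tables_to_graph` are hypothetical; the only point is that rows of an entity table become nodes and rows of a relationship table become edges, after which a query-style feature can be read off the graph.

```python
# Hypothetical tabular inputs: one row per customer, one row per transfer.
customers = [
    {"id": "c1", "name": "Ada"},
    {"id": "c2", "name": "Grace"},
    {"id": "c3", "name": "Edsger"},
]
transfers = [
    {"from_id": "c1", "to_id": "c2", "amount": 120.0},
    {"from_id": "c2", "to_id": "c3", "amount": 75.5},
    {"from_id": "c1", "to_id": "c3", "amount": 10.0},
]


def tables_to_graph(node_rows, edge_rows):
    """Turn an entity table and a relationship table into a property graph."""
    nodes = {row["id"]: {"name": row["name"]} for row in node_rows}
    adjacency = {node_id: [] for node_id in nodes}
    for row in edge_rows:
        # Skip rows that reference an unknown customer instead of failing.
        if row["from_id"] in nodes and row["to_id"] in nodes:
            adjacency[row["from_id"]].append(
                (row["to_id"], {"amount": row["amount"]})
            )
    return nodes, adjacency


nodes, adjacency = tables_to_graph(customers, transfers)

# A query-style feature: total amount sent by each customer.
total_sent = {
    node_id: sum(props["amount"] for _, props in out_edges)
    for node_id, out_edges in adjacency.items()
}
```

In the workflow on this slide the same reshaping would be done on DataFrames, and the resulting feature would be handed to the ML pipeline.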
19. Steps Forward in Graph Data Science
[Figure: the five steps of graph data science plotted by Enterprise Maturity and Data Science Complexity]
20. Graph Feature Engineering
Feature Engineering is how we combine and process the data to create new, more meaningful features, such as clustering or connectivity metrics.
Add More Descriptive Features:
- Influence
- Relationships
- Communities
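To make "clustering or connectivity metrics" concrete, here is a minimal sketch that computes two per-node features, degree and local clustering coefficient, on a hypothetical four-node graph. The graph and the function names are illustrative assumptions; a graph platform would normally compute these at scale.

```python
from itertools import combinations

# Hypothetical undirected graph stored as node -> set of neighbours.
GRAPH = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}


def degree(graph, node):
    """Connectivity metric: number of neighbours."""
    return len(graph[node])


def local_clustering(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return linked / (k * (k - 1) / 2)


# One row per node, ready to join onto an existing feature table.
feature_rows = [
    {
        "node": n,
        "degree": degree(GRAPH, n),
        "clustering": local_clustering(GRAPH, n),
    }
    for n in sorted(GRAPH)
]
```

Rows like these can be joined to the non-graph features a model already uses, which is the sense in which graph features are added to an existing pipeline.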
21. Graph Feature Categories & Algorithms
• Pathfinding & Search: Finds the optimal paths or evaluates route availability and quality
• Centrality / Importance: Determines the importance of distinct nodes in the network
• Community Detection: Detects group clustering or partition options
• Heuristic Link Prediction: Estimates the likelihood of nodes forming a relationship
• Similarity: Evaluates how alike nodes are
• Embeddings: Learned representations of connectivity or topology
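As one example from the Centrality / Importance category, the sketch below runs a basic power-iteration PageRank on a hypothetical directed graph. The damping factor of 0.85, the fixed iteration count, and the even redistribution of rank from nodes without outgoing links are common textbook choices assumed here, not details from the talk.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict of node -> list of successors.

    Every successor must also appear as a key of out_links.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for node in nodes:
            targets = out_links[node]
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Hypothetical directed graph in which every other node points at "hub".
LINKS = {
    "hub": [],
    "a": ["hub"],
    "b": ["hub", "a"],
    "c": ["hub"],
}

scores = pagerank(LINKS)
most_important = max(scores, key=scores.get)
```

The resulting score per node is a single numeric feature that summarizes how much influence flows into that node.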
22. Financial Crime: Detecting Fraud
Large financial institutions already have existing pipelines to identify fraud via heuristics and models
Graph-based features improve accuracy:
• Connected components to identify disjointed graphs sharing identifiers
• PageRank to measure influence and transaction volumes
• Louvain to identify communities that frequently interact
• Jaccard to measure account similarity based on relationships
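Two of the features listed on this slide can be sketched in a few lines: connected components over accounts and the identifiers they share, and Jaccard similarity between two accounts. The account and identifier names are invented for illustration, and the union-find below is a simplified stand-in for a production graph algorithm.

```python
def connected_components(edges):
    """Group nodes into components with a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical data: accounts linked to the identifiers they use.
USES = [
    ("acct_1", "phone_555"),
    ("acct_2", "phone_555"),
    ("acct_2", "addr_main_st"),
    ("acct_3", "addr_main_st"),
    ("acct_4", "phone_777"),
]

components = connected_components(USES)
ring = max(components, key=len)  # largest group of linked accounts

identifiers = {}
for account, identifier in USES:
    identifiers.setdefault(account, set()).add(identifier)
similarity = jaccard(identifiers["acct_1"], identifiers["acct_2"])
```

Component size and pairwise similarity can then be attached to each account as extra inputs to an existing fraud model.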
24. Graph Feature Engineering: Getting Started Example with Spark
Spark Graph
• Merge distributed data into DataFrames
• Reshape your tables into graphs
• Explore Cypher queries and simple algorithms
Native Graph Platform (Graph Transactions, Graph Analytics)
• Persist your graph
• Create rule-based features
• Run native graph algorithms and write to graph or stream
Machine Learning
• Bring graph features to the ML pipeline for training
25. Graph Algorithms in Neo4j
neo4j.com/docs/graph-algorithms/current/
Pathfinding & Search
• Parallel Breadth First Search
• Parallel Depth First Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev, Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity – 1 Step & Multi-Step
• Balanced Triad (identification)
Similarity
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocation
• Same Community
• Total Neighbors
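Several of the link prediction heuristics listed above reduce to simple set arithmetic on neighbourhoods. The sketch below scores one candidate pair on a hypothetical undirected graph using the standard textbook definitions; it is plain Python for illustration and does not call the Neo4j library.

```python
import math

# Hypothetical undirected graph stored as node -> set of neighbours.
GRAPH = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"b"},
}


def common_neighbors(graph, u, v):
    return len(graph[u] & graph[v])


def total_neighbors(graph, u, v):
    return len(graph[u] | graph[v])


def preferential_attachment(graph, u, v):
    return len(graph[u]) * len(graph[v])


def adamic_adar(graph, u, v):
    """Sum of 1 / log(degree) over shared neighbours."""
    return sum(
        1.0 / math.log(len(graph[w]))
        for w in graph[u] & graph[v]
        if len(graph[w]) > 1  # guard against log(1) == 0
    )


def resource_allocation(graph, u, v):
    """Sum of 1 / degree over shared neighbours."""
    return sum(1.0 / len(graph[w]) for w in graph[u] & graph[v])


# Score a pair of nodes that are not yet connected.
pair = ("a", "b")
scores = {
    "common_neighbors": common_neighbors(GRAPH, *pair),
    "total_neighbors": total_neighbors(GRAPH, *pair),
    "preferential_attachment": preferential_attachment(GRAPH, *pair),
    "adamic_adar": adamic_adar(GRAPH, *pair),
    "resource_allocation": resource_allocation(GRAPH, *pair),
}
```

Each score can be used directly as a ranking signal or fed to a classifier as one feature per candidate pair.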
26. Steps Forward in Graph Data Science
[Figure: the five steps of graph data science plotted by Enterprise Maturity and Data Science Complexity]
27. Graph Embeddings
An embedding transforms a graph into a vector, or set of vectors, describing topology, connectivity, or attributes of nodes and edges in the graph
• Vertex embeddings: describe connectivity of each node
• Path embeddings: traversals across the graph
• Graph embeddings: encode an entire graph into a single vector
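A very rough way to picture a vertex embedding is sketched below: sample random walks from each node and count which other nodes appear nearby. Random-walk methods such as DeepWalk feed walks like these to a skip-gram model to learn dense vectors; this sketch replaces that learning step with raw co-occurrence counts, so it is an illustration of the idea and not an implementation of any published algorithm. The graph and all parameter values are assumptions.

```python
import random

# Hypothetical undirected graph stored as node -> list of neighbours.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}


def random_walk(graph, start, length, rng):
    """Walk `length` nodes, choosing a uniform random neighbour each step."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk


def cooccurrence_embeddings(graph, walks_per_node=20, walk_length=5,
                            window=2, seed=0):
    """Vertex vectors: how often each other node appears near a node."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    order = sorted(graph)
    position = {node: i for i, node in enumerate(order)}
    vectors = {node: [0] * len(order) for node in order}
    for start in order:
        for _ in range(walks_per_node):
            walk = random_walk(graph, start, walk_length, rng)
            for i, node in enumerate(walk):
                for other in walk[max(0, i - window): i + window + 1]:
                    if other != node:
                        vectors[node][position[other]] += 1
    return vectors


embeddings = cooccurrence_embeddings(GRAPH)
```

Nodes that sit in similar parts of the graph end up with similar count vectors, which is the property a learned embedding is meant to capture in far fewer dimensions.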
29. Graph Embeddings - Recommendations
Explainable Reasoning over Knowledge Graphs for Recommendation
30. Graph Feature Engineering: Getting Started Example with Spark
Spark Graph
• Merge distributed data into DataFrames
• Reshape your tables into graphs
• Explore Cypher queries and simple algorithms
Native Graph Platform (Graph Transactions, Graph Analytics)
• Move to Neo4j to build expert queries
• Write to persist
• Stay tuned for DeepWalk and DeepGL algorithms
Machine Learning
• Bring graph features to the ML pipeline for training
31. Steps Forward in Graph Data Science
[Figure: the five steps of graph data science plotted by Enterprise Maturity and Data Science Complexity]
32. Graph Native Learning
Deep Learning refers to training multi-layer neural networks using gradient descent
33. Graph Native Learning
Graph Native Learning refers to deep learning models that take a graph as input, perform computations, and return a graph (Battaglia et al., 2018)
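The "graph in, computation, graph out" idea can be sketched as a single round of message passing: every node collects its in-neighbours' feature vectors, averages them, and mixes the result with its own state. The fixed 50/50 mixing rule below is an assumption made to keep the sketch short; in a trained model that update would be a neural network whose weights are learned by gradient descent, and this is not the formulation from the cited paper.

```python
def message_passing_step(node_features, edges):
    """Return updated node features; the edge list passes through unchanged."""
    incoming = {node: [] for node in node_features}
    for source, target in edges:
        incoming[target].append(node_features[source])

    updated = {}
    for node, own in node_features.items():
        messages = incoming[node]
        if messages:
            # Element-wise mean of the neighbours' feature vectors.
            mean = [sum(values) / len(messages) for values in zip(*messages)]
        else:
            mean = [0.0] * len(own)
        # Fixed (not learned) update: average of own state and mean message.
        updated[node] = [0.5 * x + 0.5 * m for x, m in zip(own, mean)]
    return updated, edges


# Hypothetical input graph: three nodes with two features each.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = [("a", "c"), ("b", "c"), ("c", "a")]

new_features, new_edges = message_passing_step(features, edges)
```

The output has the same structure as the input, so steps like this can be stacked, which is what lets information travel several hops across the graph.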
34. Graph Native Learning
Example: electron path prediction (Bradshaw et al., 2019)
Given reactants and reagents, what will the products be?
41. Graph Native Learning: Predicting Chemical Reactions
Example: electron path prediction (Bradshaw et al., 2019)
Given reactants and reagents, what will the products be?
45. Query-Based Knowledge Graphs
Connecting the Dots
“Using Neo4j someone from our Orion project found information from the Apollo project that prevented an issue, saving well over two years of work and one million dollars of taxpayer funds.”
David Meza, Chief Knowledge Architect – NASA 2015