This document provides an overview of graph algorithms and how they can be used with Neo4j. It discusses how graph algorithms can extract structure and infer behavior from networked data. It covers categories of graph algorithms like pathfinding, centrality measures, community detection, and similarity measures. The document demonstrates how these algorithms can be used through Neo4j to enhance applications, like using PageRank and personalized PageRank on a business reviews dataset. It provides examples of graph algorithms and discusses how they can be accessed and run through Neo4j.
3. Relationships
The Strongest Predictors of Behavior!
“Increasingly we're learning that you can make
better predictions about people by getting all the
information from their friends and their friends’
friends than you can from the information you
have about the person themselves”
James Fowler, David Burkus, Albert-László Barabási
5. Graph Platform
● Database management system (DBMS)
● Property Graph data model
● Cypher query language
● Graph analytics
● Data visualization
● Developer tool for building applications
What is Neo4j?
neo4j.com/
7. Query (e.g. Cypher/Python): Local Patterns
Real-time, local decisioning and pattern matching
You know what you’re looking for and making a decision
Graph Algorithms Libraries: Global Computation
Global analysis and iterations
You’re learning the overall structure of a network, updating data, and predicting
10. Using Graph Algorithms
Explore, Plan, Measure
Find significant patterns and plan
for optimal structures
Score outcomes and set a threshold
value for a prediction
Feature Engineering for Machine Learning
Use the measures as features to train a model
1st Node | 2nd Node | Common Neighbors | Preferential Attachment | label
1        | 2        | 4                | 15                      | 1
3        | 4        | 7                | 12                      | 1
5        | 6        | 1                | 1                       | 0
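As a rough illustration of how feature columns like these can be computed for a pair of nodes, here is a minimal Python sketch. The toy edge list and the function names are assumptions made for illustration; they are not the data behind the table or the Neo4j library's API.

```python
def common_neighbors(adj, u, v):
    """Number of neighbors that u and v share."""
    return len(adj[u] & adj[v])


def preferential_attachment(adj, u, v):
    """Product of the degrees of u and v."""
    return len(adj[u]) * len(adj[v])


# Hypothetical undirected toy graph, stored as node -> set of neighbors.
edges = [(1, 3), (1, 4), (2, 3), (2, 4), (1, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

# One feature row for the candidate pair (1, 2).
features = {
    "common_neighbors": common_neighbors(adj, 1, 2),
    "preferential_attachment": preferential_attachment(adj, 1, 2),
}
print(features)
```

Each candidate pair becomes one row, and the label column records whether the relationship actually exists (or later forms).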
11. More Accurate Predictions with the Data You Already Have
• Current data science models ignore network structure & complex relationships
• Graphs add highly predictive features to existing ML models
• Otherwise unattainable predictions based on relationships
Machine Learning Pipeline
15. Random Network
[Figure: distribution plot; x-axis: Number of RELATIONSHIPS per Node; y-axis: Number of NODES]
Average Distribution
Most nodes have the same number of relationships
“No Network in Nature that we know of that would be described by the Random network model.”
– Albert-László Barabási
17. [Figure: distribution plot; x-axis: Number of RELATIONSHIPS per Node; y-axis: Number of NODES]
Many approaches erroneously focus on
the average population where few
entities actually exist
Graphs help us invest
in populous areas
Find strategic
entities
Uncover structural
information
18. Graph Algorithms
Extract Structure and Infer Behavior
Source: “Communities, modules and large-scale structure in networks“ - Mark Newman
Source: “Hierarchical structure and the prediction of missing links in networks”; ”Structure and inference in annotated networks” - A. Clauset, C. Moore, and M.E.J. Newman.
20. Graph Algorithm Categories in Neo4j
neo4j.com/graph-algorithms-book/
Pathfinding & Search: Finds optimal paths or evaluates route availability and quality
Centrality / Importance: Determines the importance of distinct nodes in the network
Community Detection: Detects group clustering or partition options
Similarity: Evaluates how alike nodes are
Link Prediction: Estimates the likelihood of nodes forming a future relationship
21. Graph Algorithms in Neo4j
Pathfinding & Search
• Parallel Breadth First Search & DFS
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev, Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity – 1 Step & Multi-Step
• Balanced Triad (identification)
Similarity
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocation
• Same Community
• Total Neighbors
+35
neo4j.com/docs/graph-algorithms/current/
Updated June 2019
32. Overlap Similarity
Algorithm
Ideal choice for finding
hierarchy in data and
developing super and sub-
categories
Overlap similarity coefficient
represents the co-occurrence of
items between groups
$O(A, B) = \frac{|A \cap B|}{\min(|A|, |B|)}$
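The overlap coefficient divides the size of the intersection of two sets by the size of the smaller set. A minimal Python sketch follows; the category names and items are made up for illustration, and this is not the Neo4j procedure itself.

```python
def overlap_similarity(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0  # convention chosen here for empty sets
    return len(a & b) / min(len(a), len(b))


# Hypothetical item sets for two categories.
fantasy = {"dragons", "magic", "quests"}
genre_fiction = {"dragons", "magic", "quests", "spaceships", "detectives"}

score = overlap_similarity(fantasy, genre_fiction)
print(score)
```

A score of 1.0 means the smaller set is fully contained in the larger one, which is why the measure suits finding super- and sub-categories.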
33. Jaccard Similarity
Algorithm
Often used to find
recommendations of similar
items as well as part of link
prediction
Jaccard similarity measures the
similarity between sets
$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
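Jaccard similarity divides the size of the intersection of two sets by the size of their union. The sketch below is a plain-Python illustration; the user names and business identifiers are hypothetical.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical sets of businesses reviewed by two users.
alice = {"cafe", "bakery", "bookshop"}
bob = {"bakery", "bookshop", "cinema"}

similarity = jaccard_similarity(alice, bob)
print(similarity)
```

Here the two users share two of four distinct businesses, so the pair is a reasonable candidate for recommendations in either direction.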
40. PageRank & Personalized PageRank
Measures the transitive (directional) influence
of nodes and considers the influence of
neighbors and their neighbors
Personalized PageRank
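MATCH (siteA:Page {name: 'Site A'})  // assumed `name` property; 'Site A' is a placeholder
CALL algo.pageRank('Page', 'LINKS',
  {iterations: 20, dampingFactor: 0.85, sourceNodes: [siteA]})
YIELD nodes, iterations, dampingFactor, writeProperty
RETURN nodes, iterations, dampingFactor, writeProperty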
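To show what the personalization changes, here is a small power-iteration sketch in Python. It is a textbook-style variant written for illustration, not the Neo4j implementation: the function name, the toy link graph, and the handling of dangling nodes are all assumptions. The random jump always lands on the source nodes rather than on any node.

```python
def personalized_pagerank(out_links, source_nodes, damping=0.85, iterations=20):
    """Power iteration where the random jump always lands on source_nodes."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)

    # Restart distribution: uniform over the source nodes, zero elsewhere.
    restart = {n: (1.0 / len(source_nodes) if n in source_nodes else 0.0)
               for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            targets = out_links.get(n, [])
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: hand its rank back to the source nodes.
                for s in source_nodes:
                    nxt[s] += damping * rank[n] / len(source_nodes)
        rank = nxt
    return rank


# Hypothetical link graph: page -> pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = personalized_pagerank(links, source_nodes=["A"])
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Page D ends with a score of zero because nothing reachable from the source links to it, which is the intended effect: scores reflect influence as seen from the chosen starting point.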
41. Using Personalized PageRank To Surface Relevant Reviews
● Find influential reviewers in my network
● Find users who have reviewed the same businesses as me
48. Query Using TRUSTS Relationships
Order reviews by PageRank score on TRUSTS relationship
49. Personalized Recommendations
Content based vs collaborative filtering
● Photo based recommendations:
1) Similar photos using Jaccard similarity
2) Cluster similar photos using Label Propagation
3) Recommend businesses connected to photos in the same community
64. Game of Thrones (TV-Series)
• Based on Andrew Beveridge's script-to-graph work
• 400 Nodes (people)
• 3,550 Relationships (interactions)
65. Triangles and Clustering Coefficient
Communities
Triangle Count determines the number of triangles passing through a node in the graph
Clustering Coefficient is the probability that neighbors of a particular node are connected to each other
Measures can be counted/normalized globally
[Figure: two example graphs around a node u. First: Triangles = 2, CC = 0.2. Second: Triangles = 2, CC = 0.33]
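The two measures can be sketched for a single node in a few lines of Python. The graphs below are hypothetical reconstructions chosen so that node u has two triangles with either four or five neighbors, which gives coefficients of 0.33 and 0.2; the edges on the original slide are not known.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected graph as node -> set of neighbors."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def triangle_count(adj, u):
    """Number of connected pairs among the neighbors of u."""
    return sum(1 for a, b in combinations(adj[u], 2) if b in adj[a])


def clustering_coefficient(adj, u):
    """Triangles through u divided by the number of possible neighbor pairs."""
    k = len(adj[u])
    if k < 2:
        return 0.0
    return triangle_count(adj, u) / (k * (k - 1) / 2)


# Hypothetical graphs: u has 4 neighbors, then 5, with the same two triangles.
four = build_graph([("u", "a"), ("u", "b"), ("u", "c"), ("u", "d"),
                    ("a", "b"), ("c", "d")])
five = build_graph([("u", "a"), ("u", "b"), ("u", "c"), ("u", "d"), ("u", "e"),
                    ("a", "b"), ("c", "d")])

print(triangle_count(four, "u"), round(clustering_coefficient(four, "u"), 2))
print(triangle_count(five, "u"), round(clustering_coefficient(five, "u"), 2))
```

The triangle count stays at 2 in both graphs, but the coefficient drops as u gains a neighbor that is not connected to the others.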
66. Triangles and Clustering Coefficient
Communities
Use When
Basic network analysis, e.g.
does the network exhibit
small-world structures?
Estimating stability
Finding structural holes
Scoring for machine learning
Spam Classification
Semi-streaming web page analysis
(local triangle and CC)
I don’t like spam!
68. Communities
[Figure: example graph with nodes A, B, C, D, E, F, G]
Strongly Connected Components
All nodes can reach each other following direction, but not necessarily directly
Find cycles, collapse tight communities, estimate similarity in-group
Connected Components
All nodes can reach each other when disregarding direction
Find disconnected subgraphs or nodes in common, preprocess data
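The difference between the two definitions is easiest to see on one small directed graph. The sketch below uses union-find for connected components (direction ignored) and Kosaraju's two-pass search for strongly connected components. The edges are hypothetical, since the edges of the slide's figure are not known, and the code is an illustration rather than the Neo4j implementation.

```python
def connected_components(nodes, edges):
    """Union-find over the edges, ignoring direction."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm: DFS finish order, then search the reversed graph."""
    out = {n: [] for n in nodes}
    rev = {n: [] for n in nodes}
    for a, b in edges:
        out[a].append(b)
        rev[b].append(a)

    # Pass 1: record nodes in order of DFS completion (iterative DFS).
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(out[start]))]
        while stack:
            node, neighbors = stack[-1]
            for nxt in neighbors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(out[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in decreasing finish order.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


# Hypothetical directed edges over the nodes A to G.
nodes = list("ABCDEFG")
edges = [("A", "B"), ("B", "C"), ("C", "A"),  # cycle A -> B -> C -> A
         ("C", "D"), ("D", "E"), ("E", "D"),  # cycle D <-> E
         ("F", "G")]

weak = connected_components(nodes, edges)
strong = strongly_connected_components(nodes, edges)
print(sorted(sorted(c) for c in weak))
print(sorted(sorted(c) for c in strong))
```

Ignoring direction yields two groups, while following direction splits the graph further: F and G are connected but cannot reach each other both ways, so each forms its own strongly connected component.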
70. Betweenness Centrality
Influence
Tip / Caution
Computationally intensive: use the RA-Brandes approximation on large graphs.
Assumes all communication
between nodes happens along
the shortest path and with the
same frequency (not always
the case in real life)
The sum, over all pairs of nodes, of the fraction of shortest paths between the pair that pass through a node
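For small graphs the exact score can be computed with Brandes' algorithm, sketched below in Python for an unweighted, undirected graph. This is the exact method, not the RA-Brandes approximation, and the two-triangle example graph is hypothetical.

```python
from collections import deque


def betweenness_centrality(adj):
    """Exact betweenness for an unweighted, undirected graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Accumulate pair dependencies, starting from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]

    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in score.items()}


# Hypothetical graph: triangles A-B-C and D-E-F joined by the edge C-D.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

centrality = betweenness_centrality(adj)
print(centrality)
```

Only C and D receive a non-zero score, because every shortest path between the two triangles has to cross the edge between them.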
71. Betweenness Centrality - Uses
Influence
Use When
Identify bridges
Uncover control points
Find bottlenecks and
vulnerabilities
Network Resilience
Key points of cascading failure