Graph Algorithms for Developers

Graph Algorithms
For Developers
Amy Hodler & William Lyon, @Neo4j
Seattle Graph Tour
July 2019
bit.ly/seattlealgos

Relationships
The Strongest Predictors of Behavior!
“Increasingly we're learning that you can make
better predictions about people by getting all the
information from their friends and their friends’
friends than you can from the information you
have about the person themselves”
James Fowler David Burkus
Albert-László
Barabási
3

William Lyon
@lyonwj
4
Neo4j Labs Engineer
Amy Hodler
@amyhodler
Analytics & AI
Program Manager

Graph Platform
● Database management system (DBMS)
● Property Graph data model
● Cypher query language
● Graph analytics
● Data visualization
● Developer tool for building applications
What is Neo4j?
neo4j.com/

Query (e.g. Cypher/Python)
Real-time, local decisioning
and pattern matching
Graph Algorithms Libraries
Global analysis
and iterations
You know what you’re looking
for and making a decision
You’re learning the overall structure of a
network, updating data, and predicting
Local
Patterns
Global
Computation

What Do People
Do with Graph
Algorithms?

Understand & Predict Complex Behavior
Requires Understanding Relationships and Structures
Flow &
Dynamics
Interactions &
Resiliency
Propagation
Pathways

Using Graph Algorithms
Explore, Plan, Measure
Find significant patterns and plan
for optimal structures
Score outcomes and set a threshold
value for a prediction
Feature Engineering for
Machine Learning
The measures as features to train
1st
Node
2nd
Node
Common
Neighbors
Preferential
Attachment
label
1 2 4 15 1
3 4 7 12 1
5 6 1 1 0

• Current data science models ignore network structure & complex relationships
• Graphs add highly predictive features to existing ML models
• Otherwise unattainable predictions based on relationships
More Accurate Predictions
with the Data You Already Have
Machine Learning Pipeline

Financial
Crimes Recommendations
Cybersecurity
Predictive
Maintenance
Customer
Segmentation
Churn
Prediction
Search
& MDM
Graph Data Science Applications EXAMPLES
Drug
Discovery
12

Why Not Use
Other
Types of
Analytics?

15
NumberofNODES
Number of RELATIONSHIPS per Node
Average Distribution
Most nodes have the same
number of relationships
“No Network in Nature that we
know of that would be described
by the Random network model.”
–Albert-László
Barabási
Random
Network

16
NumberofNODES
Power-Law Distribution
Some nodes have a lot of
relationships but MOST
have very few
Structured
Network

17
NumberofNODES
Many approaches erroneously focus on
the average population where few
entities actually exist
Graphs help us invest
in populous areas
Find strategic
entities
Uncover structural
information

Graph Algorithms
Extract Structure and Infer Behavior
Source: “Communities, modules and large-scale structure in networks“ - Mark Newman
Source: “Hierarchical structure and the prediction of missing links in networks”; ”Structure and inference in annotated networks” - A. Clauset, C. Moore, and M.E.J. Newman.

Graph Algorithm Categories in Neo4j
neo4j.com/
graph-algorithms-
book/
Book Signing 3:15
Pathfinding
& Search
Centrality /
Importance
Community
Detection
Link
Prediction
Finds optimal paths
or evaluates route
availability and quality
Determines the
importance of distinct
nodes in the network
Detects group
clustering or partition
options
Evaluates how
alike nodes are
Estimates the likelihood
of nodes forming a
future relationship
Similarity

Graph Algorithms in Neo4j
• Parallel Breadth First Search &
DFS
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev,
Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity – 1 Step & Multi-
Step
• Balanced Triad (identification)
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
Pathfinding
& Search
Centrality /
Importance
Community
Detection
Similarity
neo4j.com/docs/
graph-algorithms/current/
Updated June 2019
Link
Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors
+35

Demo:
Business
Reviews
Application

Yelp Public Dataset (as a graph)
https://yelp.com/dataset

24
How To Use Graph Algos To Enhance An Application?
● Category hierarchy - knowledge graph
○ Overlap similarity algorithm
● Showing relevant reviews
○ Personalized PageRank
● Personalized recommendations
○ Jaccard similarity
○ Label propagation

25
Business Reviews Application
Neo4j JavaScript Driver

27
Cypher Query Language
average(5,4,2)
= 3.67

Neo4j Graph Algorithms Plugin
One-click install in Neo4j Desktop
… or grab plugin directly
https://neo4j.com/download-center/

1. Call as Cypher procedure
1. Pass in specification (Label, Prop, Query) and
configuration
1. stream variant returns (a lot) of results
CALL algo.<name>.stream('Label','TYPE',{conf})
YIELD nodeId, score
2. non-stream variant writes results to graph returns
statistics
CALL algo.<name>('Label','TYPE',{conf})
How To…
Pathfinding
& Search
Centrality /
Importance
Community
Detection
Link
Prediction
Similarity

Overlap Similarity
Algorithm
Ideal choice for finding
hierarchy in data and
developing super and sub-
categories
Overlap similarity coefficient
represents the co-occurrence of
items between groups
A B
A B

Jaccard Similarity
Algorithm
Often used to find
recommendations of similar
items as well as part of link
prediction
Jaccard similarity measures the
similarity between sets
A B
A B

35
Category Co-Occurence
What
categories
show up
together?

36
Running Overlap Similarity In Neo4j
1) Find category co-
occurences
1) Pass to overlap
similarity
procedure

38
Find Top Level Categories
Only return categories without an outgoiong
NARROWER_THAN relationship

PageRank & Personalized PageRank
Measures the transitive (directional) influence
of nodes and considers the influence of
neighbors and their neighbors
Personalized PageRank
CALL algo.pageRank('Page', 'LINKS',
{iterations:20, dampingFactor:0.85,
sourceNodes: [siteA]})

44
Using Personalized PageRank To Surface
Relevant Reviews
● Find influential reviewers
in my network
● Find users who have
reviewed the same
businesses as me

46
Projected User-User Graph
Leverage inferred relationships.
We can run graph algorithms on
these projected graphs.

47
Running Personalized PageRank

48

49

50

51
Query Using TRUSTS Relationships
Order reviews by PageRank score
on TRUSTS relationship

53
Personalized Recommendations
Content based vs collaborative filtering
● Photo based recommendations:
1) Similar photos using Jaccard
similarity
2) Cluster similar photos using
Label Propagation
3) Recommend businesses
connected to photos in the same
community

Label Propagation
Nodes adopt labels
based on neighbors
to infer clusters
IterateIterate
Community Detection

56
Photo Labels Labeled using Google’s Vision API
https://cloud.google.com/vision/ https://github.com/moxious/vision-api

59
How Similar Are These Photos?

Label Propagation
IterateIterate
Community Detection
Adds partition property to group photos by
community
Group by community and count
photos per community

Label Propagation
Community Detection

Photo Based Recommendations What photos do you like?

Enter the NEuler (Algorithms Playground)

install.graphapp.io neo4j.com/developer/
neo4j-desktop

69
Game of Thrones (TV-Series)
• Based on Andrew Beveridge's
script to graph work
• 400 Nodes (people)
• 3,550 Relationships (interactions)

Triangles and Clustering Coefficient
Communities
Measures can be counted/normalized globally
u
Triangles = 2
CC= 0.2
Triangle Count determines the
number of triangles passing
through a node in the graph
Clustering Coefficient is the
probability that neighbors of a
particular node are connected to
each other
u
Triangles = 2
CC= 0.33

Triangles and Clustering Coefficient
Communities
Use When
Basic network analysis, e.g.
does the network exhibit
small-world structures?
Estimating stability
Finding structural holes
Scoring for machine learning
Spam Classification
Semi-streaming web page analysis
(local triangle and CC)
I don’t like spam!

A C
D E
B
F
G
Strongly Connected
Components
All nodes can reach each other following
direction, but not necessarily directly
Find cycles, collapse tight communities,
estimate similarity in-group
Connected Components
All nodes can each other when
disregarding direction
Find disconnected subgraphs or nodes
in common, preprocess data
Communities

Betweenness Centrality
Influence
Tip / Caution
Computationally intensive: use
RA Brandes approximation on
large graphs.
Assumes all communication
between nodes happens along
the shortest path and with the
same frequency (not always
the case in real life)
The sum of the %
shortest paths that
pass through a node,
calculated by pairs

Betweenness Centrality - Uses
Influence
Use When
Identify bridges
Uncover control points
Find bottlenecks and
vulnerabilities
Network Resilience
Key points of cascading failure

81
Online Training!
Applied Graph Algorithms Data Science With Neo4j
https://neo4j.com/graphacademy/online-training/data-science
https://neo4j.com/graphacademy/online-training/applied-graph-algorithms

Free Digital Book
neo4j.com/
graph-algorithms-book
• Spark & Neo4j Examples
• Machine Learning Chapter

neo4j.com/online-summit/
30+ Sessions. Online. October 10.

Questions?
Amy Hodler
amy.hodler@neo4j.com
Will Lyon
will@neo4j.com
bit.ly/seattlealgos

Graph Algorithms for Developers

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Graph Algorithms for Developers

Similar to Graph Algorithms for Developers (20)

More from Neo4j

More from Neo4j (20)

Recently uploaded

Recently uploaded (20)

Graph Algorithms for Developers