1. Graphs for AI and ML
Dr. Jim Webber
Chief Scientist, Neo4j
@jimwebber
2. ● Some no-BS definitions
● Graphs and an accidental Skynet
● Graph theory
● Contemporary graph ML
● The future of graph AI
Overview
3. ● ML - Machine Learning
○ Finding functions from historical data to guide future
interactions within a given domain
● AI - Artificial Intelligence
○ The property of a system that it appears intelligent to its users
○ Often, but not always, using ML techniques
○ Or ML implementations that can be cheaply retrained to address
neighboring domains
A Bluffer’s Guide to AI-cronyms
4. ● Predictive analytics
○ Use past data to predict the future
● General purpose AI
○ ML with transfer learning such that learned experiences in one
domain can be applied elsewhere
● Human-like AI
Often conflated with
7. Extract all the features!
• What do we do? Turn it into
vectors and pump it through a
classification or regression
model
• That’s actually not a bad
thing
• But we can do so much before
we even get to ML…
• … if we have graph data
10. • Nodes with optional properties and optional labels
• Named, directed relationships with optional properties
• Relationships have exactly one start and end node
• Which may be the same node
Labeled Property graph model
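The four rules above are small enough to sketch in plain Python. The Doctor Who node and relationship names below are illustrative stand-ins, not a real dataset.

```python
# A minimal sketch of the labeled property graph model:
# nodes carry optional labels and properties; relationships are named,
# directed, connect exactly one start node to one end node (possibly the
# same node), and may carry properties of their own.

nodes = {
    1: {"labels": ["Character"], "props": {"name": "Rose Tyler"}},
    2: {"labels": ["Character"], "props": {"name": "The Doctor"}},
    3: {"labels": ["Episode"], "props": {"title": "Victory of the Daleks"}},
}

# (start, type, end, properties)
relationships = [
    (1, "LOVES", 2, {}),
    (2, "APPEARED_IN", 3, {}),
    (1, "APPEARED_IN", 3, {}),
]

def outgoing(node_id, rel_type=None):
    """All end nodes reachable from node_id, optionally filtered by type."""
    return [end for start, t, end, _ in relationships
            if start == node_id and (rel_type is None or t == rel_type)]

# Who appeared alongside node 1? Follow APPEARED_IN out, then back in.
episodes = outgoing(1, "APPEARED_IN")
co_stars = {s for s, t, e, _ in relationships
            if t == "APPEARED_IN" and e in episodes and s != 1}
```

A graph database does this traversal natively, but the shape of the model is exactly this simple.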
12. [Graph diagram: a Doctor Who property graph with character, species, episode, planet and prop nodes joined by named relationships (APPEARED IN, COMPANION, ENEMY, LOVES, STOLE FROM) across episodes such as "A Good Man Goes to War" and "Victory of the Daleks"]
23. Toolkit matures into
proper database
• Cypher and Neo4j server make
real time graph analytical
patterns simple to apply
• Amazing and humane to
implement
42. Graph Theory
• Rich knowledge of how graphs
operate in many domains
• Off the shelf algorithms to
process those graphs for
information, insight, predictions
• Low barrier to entry
• Amazingly powerful
If a node has strong relationships to two neighbours, then these
neighbours must have at least a weak relationship between them.
[Wikipedia]
Strong Triadic Closure
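The property above can be checked mechanically. A toy sketch, assuming a hand-labelled strong/weak tie map (the node names and graph are invented for illustration):

```python
# A node violates Strong Triadic Closure if it has strong ties to two
# neighbours that share no tie (strong or weak) between them.
from itertools import combinations

# Undirected ties: frozenset pair -> "strong" or "weak"
ties = {
    frozenset({"A", "B"}): "strong",
    frozenset({"A", "C"}): "strong",
    frozenset({"B", "C"}): "weak",   # closes the A-B-C triangle
    frozenset({"A", "D"}): "strong",
    # no C-D or B-D tie: A has strong ties to D and C with nothing between them
}

def neighbours(node):
    return {n for pair in ties for n in pair if node in pair and n != node}

def violates_stc(node):
    strong = {n for n in neighbours(node)
              if ties[frozenset({node, n})] == "strong"}
    return any(frozenset({u, v}) not in ties
               for u, v in combinations(sorted(strong), 2))
```

Here A violates the property (strong ties to C and D, but C and D are strangers), while B and D do not.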
64. • Relationships can have “strength” as well as intent
• Think: weighting on a relationship in a property graph
• Weak links play another super-important structural role in graph
theory
• They bridge neighbourhoods
Weak relationships
66. “If a node A in a network satisfies the Strong Triadic Closure Property
and is involved in at least two strong relationships, then any local
bridge it is involved in must be a weak relationship.”
[Easley and Kleinberg]
Local Bridge Property
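One common way to spot local bridges follows directly from the definition: an edge (u, v) is a local bridge when u and v have no neighbours in common, so deleting it would push their distance above 2. A sketch on an invented toy graph:

```python
# Find local bridges: edges whose endpoints share no common neighbour.

edges = [("a", "b"), ("b", "c"), ("a", "c"),   # one dense neighbourhood
         ("c", "d"),                            # the local bridge
         ("d", "e"), ("e", "f"), ("d", "f")]    # another dense neighbourhood

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def local_bridges():
    return [(u, v) for u, v in edges if not (adj[u] & adj[v])]
```

Only ("c", "d") qualifies: it is the sole conduit between the two triangles.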
68. • NP-hard problem
• Repeatedly remove the spanning links between dense regions
• Or recursively merge nodes into ever larger “subgraph” nodes
• Choose your algorithm carefully – some are better than others for
a given domain
• Can be used to (almost exactly) predict the
break-up of the karate club!
Graph Partitioning
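The "repeatedly remove the spanning links between dense regions" idea can be sketched in a few lines. This is not Girvan-Newman, Louvain or any other named algorithm, just a toy that deletes the least-embedded edge (fewest shared neighbours) until the graph splits, on an invented six-node graph:

```python
# Toy bipartitioning: remove weakly-embedded edges until two components appear.

edges = {("a", "b"), ("a", "c"), ("b", "c"),   # dense region 1
         ("d", "e"), ("d", "f"), ("e", "f"),   # dense region 2
         ("c", "d")}                            # spanning link

def components(edge_set):
    adj = {}
    for u, v in edge_set:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, comps = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def bipartition(edge_set):
    edge_set = set(edge_set)
    while len(components(edge_set)) < 2:
        adj = {}
        for u, v in edge_set:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        # delete the edge whose endpoints share the fewest neighbours
        weakest = min(sorted(edge_set),
                      key=lambda e: len(adj[e[0]] & adj[e[1]]))
        edge_set.remove(weakest)
    return components(edge_set)
```

The spanning link has zero shared neighbours, so it goes first and the two dense regions fall out as the communities. Real partitioners are far more careful, as the slide warns.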
75. Find and stop spammers
Extract graph structure over time
Not message content!
(Fakhraei et al, KDD 2015)
Learning to stop bad guys
Result: find and classify 70% of spammers with 90% accuracy
76. Much of modern graph ML is still about turning graphs to vectors
Graph2Vec and friends
Highly complementary techniques
Mixing structural data and features gives better results
Better data into the model, better results out
But we don’t have to always vectorize graphs...
Graph ML
77. Knowledge Graphs
• Semantic domain knowledge for
inference and understanding
• E.g. eBay Google Assistant
• What’s the next best question to ask
when a potential customer says they
want a bag?
• Price? Function? Colour?
• Depends on context! Demographic,
history, user journey.
• Richly connected data makes the
system seem intelligent
• But it’s “just” data and algorithms in
reality
78. Graph Convolutional
Neural Networks
A general architecture for
predicting node and relationship
attributes in graphs.
(Kipf and Welling, ICLR 2017)
Credit: Andrew Docherty (CSIRO), YowData 2017
https://www.youtube.com/watch?v=Gmxz41L70Fg
79. Graph Networks for
Structured Causal Models
• Position paper from Google,
MIT, Edinburgh
• Structured representations and
computations (graphs) are key
• Goal: generalize beyond direct
experience
• Like human infants can
https://arxiv.org/pdf/1806.01261.pdf
ML - this is what nerds do. Sometimes ML is so compelling that it seems intelligent, but in reality it’s data and algorithms all the way down.
AI - train a system to classify animals, might also work on shoes. See: hot dog; not hot dog!
GP-AI - systems like AlphaGo might be an architecture to support this in future, but we’re not there today
Here’s where we are mostly today. Row-oriented data.
Maybe some documents, maybe some columns, but mostly rows of data from arcane data models.
You already know graphs
People talk about Codd’s relational model being mature because it was proposed in 1969 – 49 years old.
Euler’s graph theory was proposed in 1736 – 282 years old.
Now we use the labelled property graph model. A very simple set of idioms that can build very sophisticated models.
Graphs are the most natural way to model most domains. You already know this because you draw graphs on a whiteboard, but you’ve never had the opportunity to take that down into the database before.
Nodes are a bit like documents, but they’re flat at present in Neo4j.
You pour data into your nodes and then connect them – easy peasy.
This enables high fidelity domain modeling because this is how your domains work.
And you don’t have to do this stuff in your application code – it’s right there in the database
Let’s prove it by exploring a fun domain…
If you want to know who followed Matt Smith, easy!
Traversing the regenerated (or any) relationship takes about 1/40 millionth of a second on this mac in a steady state database
What if you want to know who preceded Matt Smith?
Easy. Traverse the regenerated rels in the other way.
Cost? About 1/40 millionth of a second on this laptop in a steady state database.
Find all the paths to any doctor
OR
Just ask the database to find the shortest
Note this is a pretty loosely specified query: production queries name relationships and labels (and other predicates) to help narrow matches and lower latency.
But since we can traverse 40 million rels/sec, don’t be worried about those “joins”
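The shortest-path idea behind these queries can be sketched as a breadth-first search over an adjacency map. Node names here are abstract placeholders; a real Cypher `shortestPath` call also constrains relationship types and labels, as noted above.

```python
# Breadth-first search returns the first (hence shortest) path found
# in an unweighted graph.
from collections import deque

adj = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

def shortest_path(start, goal):
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(adj.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```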
My shortest path to Doctor Who?
All the way back to Autumn 2008
November 2007 met Emil at Øredev in Malmö Sweden
Java and Maven build-your-own-DBMS toolkit called Neo4j
Java Core API only
Long afternoon of loading data and writing a recommendation query...
Find the current customer
Find things they own
Find things that depend on the things they own
Sell
Repeat
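The steps above can be sketched as a dictionary traversal. The product names and dependency data are invented for illustration; the real system was a graph query over the product catalogue.

```python
# find customer -> find owned products -> find their dependents -> sell

owns = {"alice": {"broadband"}, "bob": {"tv-bundle"}}

# products that depend on (or extend) an owned product
depends_on = {"broadband": {"wifi-booster", "static-ip"},
              "tv-bundle": {"sports-channel"}}

def recommend(customer):
    recs = set()
    for product in owns.get(customer, ()):          # find things they own
        recs |= depends_on.get(product, set())      # find dependents
    return sorted(recs - owns.get(customer, set())) # never re-sell owned items
```

The last line is the whole trick from the anecdote: only compatible upsells, never something already owned.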
All we did at first was understand the dependencies between products and bundles.
We never tried to upsell something incompatible. Never tried to sell them something they already owned. Never undersold them.
And it opened a world of possibilities to combine other graphs: demographic, social, geographical, municipal, network...
The system made intelligent suggestions, but it was not ML or AI, just graph queries. It was good.
Unexpectedly Powerful
Solved a problem in a long afternoon that was meant to take years with OTS software
Applied same pattern to PoS retail recommendations, fraud detection… in subsequent months
Still amazed!
Effect: join Neo4j as Chief Scientist in 2010.
So let’s get into graphs.
Realtime retail recommendations.
Historical anecdote about beer and nappies.
We had a data model
Some of it taxonomical
Some of it stock-centric.
Some transactional
The insight here is that, simply by reducing the subgraph, we find a typical young father who buys beer, nappies and a game console
We have a pattern to search for
We knew it was young fathers, but I bet your model would classify them as lazy, drunken gamers, right?
Now we look for young fathers – implied by beer and nappies purchases – who haven’t bought a game console.
Turn it to text. And…
Neo4j 2.0:
MATCH (u:User), (n:ProductType), (b:ProductType), (x:ProductType)
WHERE
n.name = "nappies" AND
b.name = "Beer" AND
x.name = "Xbox" AND
(u)-[:BOUGHT]->()<-[:MEMBER_OF]-(n) AND
(u)-[:BOUGHT]->()<-[:MEMBER_OF]-(b) AND
NOT((u)-[:BOUGHT]->()<-[:MEMBER_OF]-(x))
RETURN u
This is fast: query latency is proportional to the amount of graph searched
Now called “network science”
First we need to talk about some local properties
A triadic closure is a local property of (social) graphs whereby if two nodes are connected via a path involving a third node, there is an increased likelihood that the two nodes will become directly connected in future.
This is a familiar enough situation for us in a social setting: if we happen to be friends with two people, there's an increased chance that those people will become direct friends too, since being our friend in the first place is an indication of social similarity and suitability.
It’s called triadic closure, because we try to close the triangle.
We see this all the time – it’s likely that if we have two friends, that they will also become at least acquaintances and potentially friends themselves!
In general, if a node A has relationships to B & C then the relationship between B&C is likely to form – especially if the existing relationships are both strong.
This is an incredibly strong assertion and will not be typically upheld by all subgraphs in a graph. Nonetheless it is sufficiently commonplace (particularly in social networks) to be trusted as a predictive aid.
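Used as a predictive aid, triadic closure reduces to a ranking problem: score each absent edge by how many neighbours its endpoints already share, and predict that the best-embedded pairs close first. A sketch on an invented friendship graph:

```python
# Common-neighbour link prediction, the simplest triadic-closure predictor.
from itertools import combinations

friends = {
    "ann": {"bea", "cal", "dan"},
    "bea": {"ann", "cal", "dan"},
    "cal": {"ann", "bea"},
    "dan": {"ann", "bea", "eve"},
    "eve": {"dan"},
}

def predicted_closures():
    scores = {}
    for u, v in combinations(sorted(friends), 2):
        if v not in friends[u]:                       # edge not yet present
            scores[(u, v)] = len(friends[u] & friends[v])
    # highest number of shared friends first
    return sorted(scores, key=lambda e: -scores[e])
```

Here cal and dan share two friends, so that triangle is predicted to close before any other.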
Sentiment plays a role in how closures form too – there is a notion of balance.
From a triadic closure perspective this is OK, but intuitively it seems odd.
Cartman’s friends shouldn’t be friends with his enemies. Nor should Cartman’s enemies be friends with his friends.
This makes sense – Cartman’s friend Craig is also an enemy of Cartman’s enemy Tweek
Two negative sentiments and one positive sentiment is a balanced structure – and it makes sense too since we gang up with our friends on our poor beleaguered enemy
Is this true?
Yes.
Is it nice?
No.
Is it realistic?
Oh yes.
Another balanced – and more pleasant – arrangement is for three positive sentiments, in this case mutual friends.
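The balance rule above boils down to one line of arithmetic: a sentiment triangle is balanced exactly when the product of its signs is positive (+++ or --+). A sketch using the Cartman/Craig/Tweek triangle from the slides:

```python
# Structural balance in signed triangles: +1 = friend, -1 = enemy.
from itertools import combinations

def balanced(s1, s2, s3):
    """Balanced when the product of the three signs is positive."""
    return s1 * s2 * s3 > 0

signs = {
    frozenset({"cartman", "craig"}): +1,   # friends
    frozenset({"cartman", "tweek"}): -1,   # enemies
    frozenset({"craig", "tweek"}):   -1,   # enemies
}

def unbalanced_triangles(signs):
    nodes = sorted({n for pair in signs for n in pair})
    return [(a, b, c) for a, b, c in combinations(nodes, 3)
            if all(frozenset(p) in signs for p in ((a, b), (a, c), (b, c)))
            and not balanced(signs[frozenset({a, b})],
                             signs[frozenset({a, c})],
                             signs[frozenset({b, c})])]
```

That triangle is --+ and therefore balanced: two enemies of Cartman's enemy ganging up, exactly as described above.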
A starting point for a network of friends and enemies 100 years on from the armistice
Red links indicate enemy of relationship
Black links indicate friend of relationship
The Three Emperors' League
Italy forms an alliance with Austria and Germany – a balanced +++ triadic closure
If Italy had made only a single alliance (or enemy) it would have been unstable and another relationship would be likely to form anyway!
Triple Alliance
Russia becomes hostile to Austria and Germany – a balanced --+ triadic closure –
and becomes agnostic towards France.
German-Russian Lapse
The French and Russians ally, forming a balanced --+ triadic closure with the UK
French-Russian Alliance
The UK and France enter into the famous
Entente Cordiale
This produces an unbalanced ++- triadic closure with Russia, and the graph doesn’t like it.
The British and Russians form an alliance, thereby changing their previously unbalanced triadic closure into a balanced one.
Other local pressures on the graph make other closures form.
Italy becomes hostile to Russia, forming a balanced --+ closure with France, and another balanced --+ closure with the UK.
Germany and the UK become hostile, forming a balanced --+ closure with Austria and another balanced --+ closure with Italy.
British-Russian Alliance
That WWI can be predicted without domain knowledge by iterating a graph and applying local structural constraints is nothing short of astonishing to me.
Note how the network slides into a balanced labeling — and into World War I.
A very surprising result: graphs don’t know about human conflicts.
In this case the strong triadic closure property still holds – though it is a weak link that characterises the relationship between Stan and Cartman.
Given a starting graph, we can apply this simple local principle to see how it would evolve.
A local bridge acts as a link – perhaps the only realistic link - between two otherwise distant (or separate) subgraphs.
Local bridges are semantically rich – they provide conduits for information flow between otherwise independent groups.
In this case DATING is a local bridge – it must also be a weak relationship according to our definition of a local bridge
Intuitively this makes sense – your girl/boyfriend is rather less important at age 8 than your regular friends, IIRC.
How do we identify local bridges?
Any weak link which would cause a component of the graph to become disconnected.
Being able to identify local bridges is important – in this case it's the only known conduit to allow the girls and boys to communicate.
In real life, local bridges are apparent in your organisation as experts (or managers), and appear as nexuses in fraud cases.
Zachary in the Journal of Anthropological Research 1977
Intuitively we can see “clumps” in this graph.
But how do we separate them out? It’s called minimum cut.
What’s interesting is that it’s mechanical – no domain knowledge is necessary.
There’s only one failure with the method Zachary chose to partition the graph: node 9 should have gone to the instructor’s club but instead went with the original president of the club (node 34).
Why? Because the student was three weeks away from completing a four-year quest to obtain a black belt, which he could only do with the instructor (node 1).
Other minimum cut approaches might deliver slightly different results, but on the whole it’s amazing you get such insight from an algorithm!
But is there enough information in the graph itself to predict the schism?
But is there enough information in the graph itself to predict the schism?
Actually Neo4j already has a bunch of these algorithms.
Call them easily from Cypher
Emergent intelligence from the graph!
Efficiency for graph operations is paramount.
You don’t need huge macho clusters to do this.
Large payment provider, transaction history
A 300M node, ~18B rel graph pageranked with 20 iterations in less than 2 hours using the graph algos.
On commodity hardware.
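The PageRank run described above can be sketched as plain power iteration over an adjacency map. A production run on 18B relationships uses a tuned library implementation; this toy just shows what the 20 iterations compute, on an invented four-node graph:

```python
# PageRank by power iteration: each step, every node passes a damped share
# of its rank along its outgoing links; the remainder is spread uniformly.

def pagerank(adj, damping=0.85, iterations=20):
    nodes = sorted(adj)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            out = adj[n]
            if not out:                       # dangling node: spread evenly
                for m in nodes:
                    nxt[m] += damping * rank[n] / len(nodes)
            else:
                for m in out:
                    nxt[m] += damping * rank[n] / len(out)
        rank = nxt
    return rank

# "hub" is linked to by every other node, so it should rank highest.
adj = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
```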
Contemporary AI
Graph structure itself is rich.
In this example we don’t need to know the content of the messages to know they’re spam at high confidence, just their position in the graph.
Mine a vector of graph features, feed it into the trained model.
Graphs have a key advantage: structural context. Where is the node in the graph? Who are its neighbours? Etc.
That richness feeds into the model and makes it better, more accurate, more dependable.
PageRank, Degree, Neighbourhood, Colour, etc are all features that improve your ML outcomes but are only available from graphs.
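Structural features of the kind listed above can be mined per node into an ordinary feature vector for any tabular model. A sketch, on an invented graph, of three such features (degree, two-hop neighbourhood size, local clustering coefficient):

```python
# Per-node structural features: only a graph can supply these columns.
from itertools import combinations

adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

def clustering(node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = sorted(adj[node])
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

def features(node):
    # [degree, two-hop neighbourhood size, local clustering coefficient]
    two_hop = set().union(*(adj[n] for n in adj[node])) - {node}
    return [len(adj[node]), len(two_hop), clustering(node)]
```

Rows of these vectors, one per node, are exactly the "better data" that feeds the classification or regression model.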
ICLR = International Conference on Learning Representations
Graph of movies that a user liked.
Feed into neural net
Graph of users who rated one of those movies.
Feed into neural net.
Recurse through the data until you get to all the movies and all the users which are just embedding vectors (fancy hashes that place like near like in a vector space).
[Can change these vectors for features to avoid cold-starts, without changing overall architecture.]
Graph of back-propagated trained neural nets.
Incremental: Scalable for both training and prediction.
Extensible: bring in other graph layers!
Better than collaborative filtering because it can work on any graph, not just bipartite user-likes-movies graphs. E.g. User likes actor in movies with genre – much richer!
A bipartite graph, also called a bigraph, is a set of graph vertices decomposed into two disjoint sets such that no two graph vertices within the same set are adjacent. I.e. Users don’t connect to users, only to movies.
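Bipartiteness can be verified mechanically by 2-colouring with a breadth-first search: users get one colour, movies the other, and any edge joining two same-coloured nodes disproves the property. The user/movie edges below are invented:

```python
# Bipartite check via BFS 2-colouring on a symmetric adjacency map.
from collections import deque

def is_bipartite(adj):
    colour = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            n = queue.popleft()
            for m in adj[n]:
                if m not in colour:
                    colour[m] = 1 - colour[n]
                    queue.append(m)
                elif colour[m] == colour[n]:
                    return False        # same-coloured neighbours: not bipartite
    return True

user_likes_movie = {
    "u1": {"Alien", "Heat"}, "u2": {"Heat"},
    "Alien": {"u1"}, "Heat": {"u1", "u2"},
}
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
```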
This is already happening - it’s YouTube’s recommender algorithm.
A growing realisation from leaders in the AI community: graph networks as the foundational building block for human-like AI.
Argue: combinatorial generalization must be a top priority for AI to achieve human-like abilities. Must be able to compose a finite set of elements in infinite ways (eg like language)
We draw analogies by aligning the relational structure between two domains and drawing inferences about one based on corresponding knowledge about the other (Gentner and Markman, 1997; Hummel and Holyoak, 2003). Hierarchies are critical.
Inductive bias: how the algorithm prioritises solutions.
Relational inductive biases to guide deep learning about entities, relations, and rules for composing them. I.e. the learning understands graphs
All this might seem hard at first – we’re used to tables, and our toolkits expect them.
Graphs change this for the better. Once you get graphs, all the other things seem hard
“a vast gap between human and machine intelligence remains, especially with respect to efficient, generalizable learning”
70% of graph ML today is still turning graphs to vectors
E.g. deep walk - random walk through graph, assign vector node when encountered based on neighborhood
30% is truly graph AI - "differentiable neural computer" -> discern patterns that users can’t; write sophisticated algorithms (fraud, shortest path, etc) from incentive declarations.
E.g. no longer need a human expert to discover the “young father” pattern in our data, the machine learns it’s a valuable query in some contexts.
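The DeepWalk idea mentioned above, sampling random walks whose steps become "sentences" for a word2vec-style embedder, can be sketched as follows. The graph is invented, and the embedding step itself is not shown:

```python
# DeepWalk-style corpus generation: fixed-length random walks from each node.
import random

adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"],
}

def random_walks(adj, walks_per_node=2, walk_len=4, seed=42):
    rng = random.Random(seed)        # seeded for reproducibility
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_len:
                walk.append(rng.choice(adj[walk[-1]]))  # step to a neighbour
            walks.append(walk)
    return walks
```

Each walk is a valid path, so nodes that co-occur in walks share a neighbourhood, which is what the downstream embedding captures.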
So enjoy using graphs for AI, but please remember graphs for good!