1. Graphs for AI and ML
(a personal journey)
Dr. Jim Webber
Chief Scientist, Neo4j
2. ● Some no-BS definitions
● Social history
● Accidental Skynet
● Diapers and beer
● Graph theory
● Contemporary graph ML
● The future of graph AI
Overview
3. ● ML - Machine Learning
○ Finding functions from historical data to guide future interactions within a given domain
● AI - Artificial Intelligence
○ The property of a system that it appears intelligent to its users
○ Often, but not always, using ML techniques
○ Or ML implementations that can be cheaply retrained to address neighboring domains
A Bluffer’s Guide to AI-cronyms
4. ● Predictive analytics
○ Use past data to predict the future
● General purpose AI
○ ML with transfer learning such that learned experiences in one domain can be applied elsewhere
● Human-like AI
Often conflated with
7. Extract all the features!
• What do we do? Turn it into vectors and pump it through a classification or regression model
• That’s actually not a bad thing
• But we can do so much before we even get to ML
• If we have graph data
14. Toolkit matures into proper database
• Cypher and Neo4j server make real-time graph analytical patterns simple to apply
• Amazing and humane to implement
32. Graph Theory
• Rich knowledge of how graphs operate in many domains
• Off-the-shelf algorithms to process those graphs for information, insight, predictions
• Low barrier to entry
• Amazingly powerful
48. • Relationships can have “strength” as well as intent
• Think: weighting on a relationship in a property graph
• Weak links play another important structural role in graph theory
• They bridge neighborhoods
Weak relationships
52. “If a node A in a network satisfies the Strong Triadic Closure Property
and is involved in at least two strong relationships, then any local
bridge it is involved in must be a weak relationship.”
[Easley and Kleinberg]
Local Bridge Property
54. • NP-hard problem
• Repeatedly remove the spanning links between dense regions
• Or recursively merge nodes into ever larger “subgraph” nodes
• Choose your algorithm carefully – some are better than others for a given domain
• Can use to (almost exactly) predict the break-up of the karate club!
Graph Partitioning
61. Find and stop spammers
Extract graph structure over time
Not message content!
(Fakhraei et al, KDD 2015)
Learning to stop bad guys
Result: find and classify 70% of spammers with 90% accuracy
62. Much of modern graph ML is still about turning graphs into vectors
Graph2Vec and friends
Highly complementary techniques
Mixing structural data and features gives better results
Better data into the model, better results out
But we don’t have to always vectorize graphs...
Graph ML
63. Knowledge Graphs
• Semantic domain knowledge for inference and understanding
• E.g. eBay ShopBot
• What’s the next best question to ask when a potential customer says they want a bag?
• Price? Function? Color?
• Depends on context! Demographic, history, user journey.
• Richly connected data makes the system seem intelligent
• But it’s “just” data and algorithms in reality
64. Graph Convolutional Neural Networks
A general architecture for predicting node and relationship attributes in graphs.
(Kipf and Welling, ICLR 2017)
Credit: Andrew Docherty (CSIRO), YowData 2017
https://www.youtube.com/watch?v=Gmxz41L70Fg
65. Graph Networks for Structured Causal Models
• Position paper from Google, MIT, Edinburgh
• Structured representations and computations (graphs) are key
• Goal: generalize beyond direct experience
• Like human infants can
https://arxiv.org/pdf/1806.01261.pdf
This is a talk about my history with graphs and AI.
It is peppered with surprises, and inflexion points, and anecdotes.
But what we derive from this is that we’ve had graph data and algorithms at our disposal for longer than we might think.
ML - this is what nerds do. Sometimes ML is so compelling that it seems intelligent, but in reality it’s data and algorithms all the way down.
AI - train a system to classify animals, might also work on shoes. See: hot dog; not hot dog!
GP-AI - systems like AlphaGo might be an architecture to support this in future, but we’re not there today
Here’s where we are mostly today. Row-oriented data.
Maybe some documents, maybe some columns, but mostly rows of data from arcane data models.
All the way back to Fall 2008
Perhaps some of you in finance remember that period, right?
In November 2007 I met Emil at Øredev in Malmö, Sweden
Java and Maven build-your-own-DBMS toolkit called Neo4j
Java Core API only
Long afternoon of loading data and writing a recommendation query...
Find the current customer
Find things they own
Find things that depend on the things they own
Sell
Repeat
All we did at first was understand the dependencies between products and bundles.
We never tried to upsell something incompatible. Never tried to sell them something they already owned. Never undersold them.
And it opened a world of possibilities to combine other graphs: demographic, social, geographical, municipal, network...
Unexpectedly Powerful
Solved a problem in a long afternoon that was meant to take years with OTS software
Applied same pattern to PoS retail recommendations, fraud detection… in subsequent months
Still amazed!
Effect: join Neo4j as Chief Scientist in 2010.
Realtime retail recommendations.
Historical anecdote about beer and nappies.
We had a data model
Some of it taxonomical
Some of it stock-centric.
Some transactional
The insight here is that we can characterise a typical young father who buys beer, nappies and a game console simply by reducing the data to a small subgraph
We have a pattern to search for
Now we look for young fathers – implied by beer and nappies purchases – who haven’t bought a game console.
Turn it into text. And…
Neo4j 2.0:
MATCH (u:User), (n:ProductType), (b:ProductType), (x:ProductType)
WHERE
n.name = "nappies" AND
b.name = "Beer" AND
x.name = "Xbox" AND
(u)-[:BOUGHT]->()<-[:MEMBER_OF]-(n) AND
(u)-[:BOUGHT]->()<-[:MEMBER_OF]-(b) AND
NOT((u)-[:BOUGHT]->()<-[:MEMBER_OF]-(x))
RETURN u
This is fast: query latency is proportional to the amount of graph searched
Now called “network science”
First we need to talk about some local properties
A triadic closure is a local property of (social) graphs whereby if two nodes are connected via a path involving a third node, there is an increased likelihood that the two nodes will become directly connected in future.
This is a familiar enough situation for us in a social setting whereby if we happen to be friends with two people, ultimately there's an increased chance that those people will become direct friends too, since by being our friend in the first place, it's an indication of social similarity and suitability.
It’s called triadic closure, because we try to close the triangle.
We see this all the time – it’s likely that if we have two friends, that they will also become at least acquaintances and potentially friends themselves!
In general, if a node A has relationships to B & C then the relationship between B&C is likely to form – especially if the existing relationships are both strong.
This is an incredibly strong assertion and will not be typically upheld by all subgraphs in a graph. Nonetheless it is sufficiently commonplace (particularly in social networks) to be trusted as a predictive aid.
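As a sketch of how triadic closure becomes a predictive aid, the small pure-Python function below (names are mine, not from the talk) lists pairs that share a common neighbour but are not yet directly connected – exactly the candidate edges the principle says are likely to form:

```python
from itertools import combinations

def predict_closures(adj):
    """Suggest edges that triadic closure says are likely to form:
    pairs (b, c) that share a common neighbour a but are not yet
    directly connected."""
    candidates = set()
    for a, neighbours in adj.items():
        for b, c in combinations(sorted(neighbours), 2):
            if c not in adj[b]:
                candidates.add((b, c))
    return candidates

# Tiny friendship graph: A knows B and C, but B and C don't know each other.
graph = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"A"},
}
print(predict_closures(graph))  # {('B', 'C')}
```

A real recommender would rank candidates, e.g. by the number of shared neighbours and by relationship strength, but the local principle is just this.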
Sentiment plays a role in how closures form too – there is a notion of balance.
From a triadic closure perspective this is OK, but intuitively it seems odd.
Cartman’s friends shouldn’t be friends with his enemies. Nor should Cartman’s enemies be friends with his friends.
This makes sense – Cartman’s friend Craig is also an enemy of Cartman’s enemy Tweek
Two negative sentiments and one positive sentiment is a balanced structure – and it makes sense too since we gang up with our friends on our poor beleaguered enemy
Another balanced – and more pleasant – arrangement is for three positive sentiments, in this case mutual friends.
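The balance rule above reduces to a one-liner: a signed triangle is balanced exactly when the product of its edge signs is positive. A minimal sketch (the function name is my own):

```python
def is_balanced(s_ab, s_bc, s_ca):
    """A signed triangle is balanced when the product of its edge
    signs is positive: +++ (mutual friends) or +-- (two friends
    ganging up on a common enemy)."""
    return s_ab * s_bc * s_ca > 0

# Cartman (+) Craig, Cartman (-) Tweek, Craig (-) Tweek: balanced.
print(is_balanced(+1, -1, -1))  # True
# Two friends where one befriends the other's enemy: unbalanced (++-).
print(is_balanced(+1, +1, -1))  # False
```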
A starting point for a network of friends and enemies
Red links indicate enemy of relationship
Black links indicate friend of relationship
The Three Emperors’ League
Italy forms the Triple Alliance with Austria and Germany – a balanced +++ triadic closure
If Italy had made only a single alliance (or enmity) it would have been unstable and another relationship would have been likely to form anyway!
Triple Alliance
Russia becomes hostile to Austria and Germany – a balanced --+ triadic closure – and agnostic towards France.
German-Russian Lapse
The French and Russians ally, forming a balanced --+ triadic closure with the UK
French-Russian Alliance
The UK and France enter into the famous
Entente Cordiale
This produces an unbalanced ++- triadic closure with Russia, and the graph doesn’t like it.
The British and Russians form an alliance, thereby changing their previously unbalanced triadic closure into a balanced one.
Other local pressures on the graph make other closures form.
Italy becomes hostile to Russia, forming a balanced --+ closure with France, and another balanced --+ closure with the UK.
Germany and the UK become hostile, forming a balanced --+ closure with Austria and another balanced --+ closure with Italy.
British-Russian Alliance
That WWI can be predicted without domain knowledge by iterating a graph and applying local structural constraints is nothing short of astonishing to me.
Note how the network slides into a balanced labeling — and into World War I.
In this case the strong triadic closure property still holds – though it is a weak link that characterises the relationship between Stan and Cartman.
Given a starting graph, we can apply this simple local principle to see how it would evolve.
A local bridge acts as a link – perhaps the only realistic link - between two otherwise distant (or separate) subgraphs.
Local bridges are semantically rich – they provide conduits for information flow between otherwise independent groups.
In this case DATING is a local bridge – it must also be a weak relationship according to our definition of a local bridge
Intuitively this makes sense – your girl/boyfriend is rather less important at age 8 than your regular friends, IIRC.
How do we identify local bridges?
An edge is a local bridge if its endpoints share no common neighbours, so removing it would leave no short detour between the two neighbourhoods.
Being able to identify local bridges is important – in this case it’s the only known conduit that allows the girls and boys to communicate.
In real life, local bridges appear in your organisation as experts (or managers), and as nexuses in fraud cases.
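Using Easley and Kleinberg's definition – an edge whose endpoints have no neighbours in common – local bridges can be found mechanically. A small illustrative sketch (graph and names are hypothetical):

```python
def local_bridges(adj):
    """An edge (a, b) is a local bridge when its endpoints share no
    common neighbours: removing it leaves no length-2 detour between
    a and b."""
    bridges = set()
    for a, neighbours in adj.items():
        for b in neighbours:
            # a < b so each undirected edge is checked once
            if a < b and not adj[a] & adj[b]:
                bridges.add((a, b))
    return bridges

# Two tight friendship groups joined only by a DATING-style edge C-X.
graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "X"},
    "X": {"C", "Y", "Z"}, "Y": {"X", "Z"}, "Z": {"X", "Y"},
}
print(local_bridges(graph))  # {('C', 'X')}
```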
Zachary in the Journal of Anthropological Research 1977
Intuitively we can see “clumps” in this graph.
But how do we separate them out? It’s called minimum cut.
What’s interesting is that it’s mechanical – no domain knowledge is necessary.
There’s only one failure in the method Zachary chose to partition the graph: the model placed node 9 with the original president of the club (node 34), but in reality he went to the instructor’s club.
Why? Because the student was three weeks away from completing a four-year quest to obtain a black belt, which he could only do with the instructor (node 1)
Other minimum cut approaches might deliver slightly different results, but on the whole it’s amazing you get such insight from an algorithm!
Student 9 was about to take their 1st dan under instructor 1. Though social pressure said they should defect, they stayed for practical reasons.
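The "recursively merge nodes into ever larger subgraph nodes" strategy mentioned on the partitioning slide can be sketched as Karger's contraction algorithm – a toy pure-Python illustration, not the method Zachary actually used:

```python
import random

def karger_min_cut(edges, seed=7):
    """One run of Karger's contraction: repeatedly merge the two
    endpoints of a random surviving edge until only two super-nodes
    remain; the edges still crossing between them form a cut."""
    rng = random.Random(seed)
    parent = {}

    def find(x):
        # walk up the merge forest to the current super-node
        while parent.get(x, x) != x:
            x = parent[x]
        return x

    nodes = {n for e in edges for n in e}
    remaining = list(edges)
    while len({find(n) for n in nodes}) > 2:
        u, v = rng.choice(remaining)
        parent[find(u)] = find(v)  # contract the edge u-v
        # drop edges that now start and end in the same super-node
        remaining = [(a, b) for a, b in remaining if find(a) != find(b)]
    return remaining

# Two triangles joined by a single link: the minimum cut is that link.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"),
         ("c", "x")]
# Karger is randomised, so keep the best cut over several trials.
best = min((karger_min_cut(edges, seed=s) for s in range(30)), key=len)
print(best)  # with enough trials, almost surely the single bridging edge
```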
Neo4j already has a bunch of these algorithms.
Call them easily from Cypher
Emergent intelligence from the graph!
Efficiency for graph operations is paramount.
You don’t need huge macho clusters to do this.
Large payment provider, transaction history.
A 300M-node, ~18B-relationship graph run through PageRank (20 iterations) in less than 2 hours using the graph algorithms.
On commodity hardware.
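For intuition, here is what those 20 PageRank iterations compute, sketched as plain power iteration over an adjacency dict – a toy illustration, not Neo4j's implementation:

```python
def pagerank(adj, damping=0.85, iterations=20):
    """Plain power-iteration PageRank: each node repeatedly shares a
    damped fraction of its rank equally among its out-neighbours."""
    nodes = list(adj)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, targets in adj.items():
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank

# Hypothetical payment flows: alice receives from both carol and dave.
follows = {"alice": ["bob"], "bob": ["carol"], "carol": ["alice"], "dave": ["alice"]}
ranks = pagerank(follows)
print(max(ranks, key=ranks.get))  # 'alice'
```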
Contemporary AI
Graph structure itself is rich.
In this example we don’t need to know the content of the messages to know they’re spam at high confidence, just their position in the graph.
Mine a vector of graph features, feed it into the trained model.
Graphs have a key advantage: structural context. Where is the node in the graph? Who are its neighbours? Etc.
That richness feeds into the model and makes it better, more accurate, more dependable.
But we’re still back in a vector! Can we do better?
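A minimal sketch of "mine a vector of graph features": compute a node's degree, triangle count and clustering coefficient, and hand that vector to any ordinary classifier. The feature choice and names here are mine, not the paper's:

```python
def graph_features(adj, node):
    """Hand-mined structural features for one node - the kind of
    vector you'd feed to a classifier alongside the node's own
    attributes."""
    neighbours = adj[node]
    degree = len(neighbours)
    # how many pairs of the node's neighbours are themselves connected
    triangles = sum(
        1 for a in neighbours for b in neighbours
        if a < b and b in adj[a]
    )
    possible = degree * (degree - 1) / 2
    clustering = triangles / possible if possible else 0.0
    return [degree, triangles, clustering]

# A spam-like node fans out to strangers who don't know each other.
graph = {
    "spammer": {"a", "b", "c"},
    "a": {"spammer"}, "b": {"spammer"}, "c": {"spammer"},
}
print(graph_features(graph, "spammer"))  # [3, 0, 0.0]
```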
ICLR = International Conference on Learning Representations
Graph of movies that a user liked.
Feed into neural net
Graph of users who rated one of those movies.
Feed into neural net.
Recurse through the data until you get to all the movies and all the users which are just embedding vectors (fancy hashes that place like near like in a vector space).
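One step of that recursion can be sketched as mean aggregation: replace a node's vector with the average of its own and its neighbours' vectors. Real GCNs add learned weight matrices and non-linearities; this stripped-down sketch shows only the message-passing step, with made-up embeddings:

```python
def aggregate(embeddings, adj, node):
    """One GCN-style layer for a single node: average the node's own
    embedding with its neighbours' embeddings. Stacking such layers
    gives the recursion described above."""
    vectors = [embeddings[node]] + [embeddings[n] for n in adj[node]]
    dim = len(embeddings[node])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# A user whose liked movies sit near each other in embedding space.
embeddings = {"user": [0.0, 0.0], "alien": [1.0, 0.0], "blade": [1.0, 0.2]}
adj = {"user": ["alien", "blade"]}
print(aggregate(embeddings, adj, "user"))  # roughly [0.667, 0.067]
```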
[Can change these vectors for features to avoid cold-starts, without changing overall architecture.]
Graph of back-propagated trained neural nets.
Incremental: Scalable for both training and prediction.
Extensible: bring in other graph layers!
Better than collaborative filtering because it can work on any graph, not just bipartite user-likes-movies graphs. User likes users who likes movies. Have you ever sat through some dull scifi or excruciating period drama for your partner? Of course you have!
A bipartite graph, also called a bigraph, is a set of graph vertices decomposed into two disjoint sets such that no two graph vertices within the same set are adjacent. I.e. Users don’t connect to users, only to movies.
This is already happening - it’s YouTube’s recommender algorithm.
A growing realisation from leaders in the AI community: graph networks as the foundational building block for human-like AI.
Argue: combinatorial generalization must be a top priority for AI to achieve human-like abilities. Must be able to compose a finite set of elements in infinite ways (eg like language)
We draw analogies by aligning the relational structure between two domains and drawing inferences about one based on corresponding knowledge about the other (Gentner and Markman, 1997; Hummel and Holyoak, 2003).
Inductive bias: how the algorithm prioritises solutions.
Relational inductive biases to guide deep learning about entities, relations, and rules for composing them. I.e. the learning understands graphs
All this might seem hard at first
The ML community runs on data, but it really hasn’t been good at exploiting advances in data: extracting features from rows is still commonplace.
Graphs change this for the better. Once you get graphs, going back to anything less seems hard.
“a vast gap between human and machine intelligence remains, especially with respect to efficient, generalizable learning”
70% of graph ML today is still turning graphs into vectors
E.g. DeepWalk – random walk through the graph, assigning each node a vector based on its neighbourhood as it is encountered
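DeepWalk's first stage can be sketched as follows: sample short random walks from each node and treat them as "sentences" for word2vec-style training, so nodes with similar neighbourhoods end up with nearby vectors. Parameters and names here are illustrative, and the embedding-training stage is omitted:

```python
import random

def random_walks(adj, walk_length=5, walks_per_node=2, seed=42):
    """Sample fixed-length random walks from every node; each walk
    becomes a 'sentence' for downstream word2vec-style training."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
for walk in random_walks(adj):
    print("-".join(walk))
```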
30% is truly graph AI – e.g. the “differentiable neural computer”, which can discern patterns that users can’t and write sophisticated algorithms (fraud, shortest path, etc.) from incentive declarations.
E.g. we no longer need a human expert to discover the “young father” pattern in our data; the machine learns that it’s a valuable query in some contexts.
Finally, ML is being applied to operations: Andy Pavlo’s “Peloton” at CMU tunes databases better than professional DBAs – making the DB self-driving. Neo4j will head in this direction too.