Graphs are versatile data structures that can model many kinds of data: molecular protein structures, social networks, pandemic-spreading models, and visually rich content such as websites and invoices. In recent years, graph neural networks have made a huge leap forward and become a powerful tool that every data scientist should know. In this talk, we will review their basic structure, show some example usages, and explore the existing Python tools.
26. Message Passing
For each node:
1. Aggregate the neighboring nodes into an intermediate representation
2. Transform the aggregated representation with a linear projection followed by a non-linearity (ReLU)
Mathematically (the GCN propagation rule):
H^(l+1) = ReLU( D^(-1/2) A D^(-1/2) H^(l) W^(l) )
H → network layer (node representations at layer l)
W → network weights
A → adjacency matrix (with self-loops added)
D → degree matrix
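The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration of a single message-passing layer (aggregate with the symmetrically normalized adjacency, then project and apply ReLU); the toy graph, feature sizes, and random weights are placeholders, not anything from the talk.

```python
import numpy as np

# Toy undirected graph: 3 nodes, edges 0-1 and 1-2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
A_hat = A + np.eye(3)                       # add self-loops
deg = A_hat.sum(axis=1)                     # degrees of A_hat

# Symmetric normalization: D^(-1/2) A D^(-1/2)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

H = np.random.rand(3, 4)                    # node features: 3 nodes, 4 dims
W = np.random.rand(4, 2)                    # layer weights: 4 -> 2 dims

# Step 1: aggregate neighbors; step 2: linear projection + ReLU
H_next = np.maximum(A_norm @ H @ W, 0)
print(H_next.shape)                         # (3, 2)
```

Stacking several such layers lets information propagate across multi-hop neighborhoods.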
27. Model types
● Graph classification
○ Chemical properties of a molecule
○ Comparing user preferences / activities
● Node classification - node label prediction
○ Malicious users in a social network
○ Visually inferred Named Entity Recognition (NER)
○ Node clustering
● Edge prediction
○ Recommendation system
○ Protein-protein interaction
○ “Friend” suggestion
29. NetworkX
● Store and mutate Graphs
● Graph algorithms (shortest path / Dijkstra, treewidth, clustering, centrality)
● Network analysis
● Node / edge data
● Visualization tools
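A small sketch of the NetworkX features listed above: storing node/edge data, running Dijkstra, and computing a centrality measure. The graph and attribute names here are made up for illustration.

```python
import networkx as nx

# Store a graph with node / edge data
G = nx.Graph()
G.add_edge("a", "b", weight=1.0)
G.add_edge("b", "c", weight=2.0)
G.add_edge("a", "c", weight=4.0)
G.nodes["a"]["role"] = "hub"

# Shortest path with Dijkstra, using the edge weights
path = nx.dijkstra_path(G, "a", "c", weight="weight")
print(path)  # ['a', 'b', 'c'] -- the detour via "b" is cheaper (1 + 2 < 4)

# Network analysis: degree centrality per node
centrality = nx.degree_centrality(G)
```

`nx.draw(G)` (with matplotlib installed) covers the basic visualization use case.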
30. DGL
● Building blocks
● Great tutorials
● Generative graphs
● Great for research and complicated tasks
31. PyTorch Geometric
● An extension library for PyTorch
● Officially part of the PyTorch ecosystem
● Easily extensible
● Papers are implemented directly in it
● Looooooooooooooong list of ready-to-use methods and algorithms:
○ TransformerConv (2020)
○ GCN2Conv (2020)
○ DeeperGCN (2020)
○ Top-K Pooling
○ PairNorm
35. Potential Pitfalls when going ‘deep’
● Vanishing gradient
● Overfitting
● Over-smoothing
○ Node vectors become too similar
● Bottleneck (over-squashing)
○ A single node vector contains data from too many nodes
https://arxiv.org/abs/2006.07107 - Effective Training Strategies for Deep Graph Neural Networks
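Over-smoothing is easy to demonstrate numerically: repeatedly applying pure neighborhood averaging (aggregation with no learned weights) drives all node vectors toward the same value. A NumPy sketch on a made-up 5-node path graph:

```python
import numpy as np

# Path graph on 5 nodes, with self-loops on the diagonal
A = np.eye(5)
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
A_norm = A / A.sum(axis=1, keepdims=True)   # row-normalized: mean aggregation

H = np.random.rand(5, 3)                    # random node vectors
spread_before = H.std(axis=0).mean()        # how different the nodes are

for _ in range(50):                         # 50 "layers" of pure averaging
    H = A_norm @ H

spread_after = H.std(axis=0).mean()
print(spread_after < spread_before)         # True: node vectors collapsed
```

Deep GNNs need countermeasures (residual connections, normalization layers such as PairNorm) precisely because each extra layer pushes the representations further in this direction.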