Predict Molecular Properties with Graph Neural Networks
1. Towards Predicting Molecular Properties with Graph Neural Networks
Shion HONDA
The Graduate School of Information Science and Technology,
The University of Tokyo
@ National Taiwan University
2. Contents
• Basics of molecular property prediction
• Basics of graph theory
• Introduction to graph convolution
• Graph convolutional networks
• Recent advancements
3. Who Am I?
• Studying cheminformatics, i.e., ML applications to drug discovery
• My interests: NLP, CV, GANs, RL, etc.
• SNS
@shion_honda
@shionhonda
4. Why Graphs?
• Many kinds of data can be represented as graphs
• Web
• Traffic network
• Social network
• Citation network
• Neuronal network
• Molecules
[Figures: Zachary’s karate club (social network); salicylic acid (molecule)]
6. Graphs Are Difficult
• Unaligned structure
• Graphs are not aligned like images/texts
• Different structures and different tasks
• Directed vs undirected
• Weighted vs unweighted etc.
• Link prediction vs graph embedding
• Scalability
• Some graphs (e.g., web, SNS) are huge
• Domain knowledge
• In the case of molecules, node degrees are at most around 5
• In the case of SNS, links are unevenly distributed
7. Preliminaries and Definitions
Symbol Meaning
V, E: Set of nodes, edges
N, M: Number of nodes, edges
G = (V, E): Graph
F^V, F^E: Feature vectors of nodes, edges
A: Adjacency matrix
D: Degree matrix
L = D - A: Laplacian matrix
H: Hidden vectors
L̃ = D^(-1/2) L D^(-1/2): Normalized Laplacian matrix
Θ: Learnable parameters
N_k(i): k-hop neighbors of node i
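To make the notation concrete, here is a minimal NumPy sketch that builds these matrices for a small, hypothetical example graph (the graph and variable names are only illustrative):

```python
import numpy as np

# A toy undirected graph with N = 4 nodes: a path 0-1-2-3 plus the edge 1-3.
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]

N = 4
A = np.zeros((N, N))                       # adjacency matrix
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

D = np.diag(A.sum(axis=1))                 # degree matrix
L = D - A                                  # (unnormalized) Laplacian
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_norm = D_inv_sqrt @ L @ D_inv_sqrt       # normalized Laplacian L~ = D^(-1/2) L D^(-1/2)

print(L_norm)
```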
8. Graph Fourier Transform
• Eigendecomposition of the normalized Laplacian
• U is a unitary (orthogonal) matrix, since the normalized Laplacian is a real symmetric (Hermitian) matrix
• The graph Fourier transform of a signal on a graph is defined by multiplying with U^T
• The inverse operation multiplies with U (see the sketch below)
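For reference, a sketch of the standard definitions these bullets refer to, assuming the normalized Laplacian L̃ from the previous slide and a graph signal x ∈ R^N:

```latex
\tilde{L} = U \Lambda U^{\top}
  \qquad \text{(eigendecomposition; } U \text{ orthogonal, } \Lambda \text{ diagonal)}

\hat{x} = U^{\top} x
  \qquad \text{(graph Fourier transform)}

x = U \hat{x}
  \qquad \text{(inverse graph Fourier transform)}
```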
9. Graph Convolution
• Convolution theorem: convolution in the node domain equals element-wise multiplication in the Fourier domain
• Convolution with a filter g, where the Fourier-domain coefficients are collected in a diagonal matrix g_θ = diag(θ)
• Graph convolution is then defined as a product of matrices (see the sketch below)
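Written out (a sketch of the usual spectral-convolution identity; g_θ = diag(θ) collects the Fourier-domain filter coefficients):

```latex
x \ast_G g
  = U \big( (U^{\top} g) \odot (U^{\top} x) \big)
  = U \, g_{\theta} \, U^{\top} x,
  \qquad g_{\theta} = \mathrm{diag}(\theta)
```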
10. Naïve GCN
• The diagonal matrix Θ of spectral coefficients acts as a filter
->Analogy of CNNs for images
• An arbitrary network can be constructed by repeating this operation (see the sketch after this list)
• Problems
• Eigendecomposition has at least O(N^2) time complexity
• Parameters cannot be shared across graphs of different sizes
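A sketch of the resulting layer in its simplest single-channel form (assuming H^(l) are the hidden vectors and Θ^(l) is a learnable diagonal spectral filter, per the notation on slide 7):

```latex
H^{(l+1)} = \sigma\!\left( U \, \Theta^{(l)} \, U^{\top} H^{(l)} \right)
```

Since Θ^(l) holds one parameter per eigenvalue, its size is tied to N, which is why parameters cannot be shared across graphs of different sizes.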
11. ChebNet
• Eigendecomposition can be avoided by using Chebyshev polynomials
• The learnable parameters are the coefficients θ_0, ..., θ_K
• Then graph convolution is defined as follows (see the sketch below):
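In equations (a sketch of the ChebNet parameterization; T_k are the Chebyshev polynomials, λ_max is the largest eigenvalue of L̃, and θ_0, ..., θ_K are the learnable coefficients):

```latex
g_{\theta}(\Lambda) \approx \sum_{k=0}^{K} \theta_k \, T_k(\tilde{\Lambda}),
  \qquad \tilde{\Lambda} = \frac{2}{\lambda_{\max}} \Lambda - I

x \ast_G g_{\theta} \approx \sum_{k=0}^{K} \theta_k \,
  T_k\!\left( \frac{2}{\lambda_{\max}} \tilde{L} - I \right) x
```

Because T_k(·)x can be computed by repeated sparse matrix-vector products, no eigendecomposition is needed, and the filter is localized to K-hop neighborhoods.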
12. GCN (Kipf & Welling)
• ChebNet with K = 1 and λ_max ≈ 2
• By assuming θ = θ_0 = -θ_1, graph convolution becomes θ (I + D^(-1/2) A D^(-1/2)) x
• By the renormalization trick Ã = A + I (with D̃_ii = Σ_j Ã_ij) and stacking over channels:
H = σ( D̃^(-1/2) Ã D̃^(-1/2) X Θ )
where Θ is the matrix of learnable filter parameters
-> Θ is independent of N, so parameters can be shared across graphs of different sizes
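A minimal NumPy sketch of this propagation rule (the function name gcn_layer is hypothetical; a practical implementation would use sparse matrices and an autodiff framework, so this is only to illustrate the formula):

```python
import numpy as np

def gcn_layer(A, H, W, activation=np.tanh):
    """One graph-convolution layer with the renormalization trick:
    H' = activation( D~^(-1/2) A~ D~^(-1/2) H W ), where A~ = A + I."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)                      # add self-loops
    d_tilde = A_tilde.sum(axis=1)                # degrees of A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt    # normalized adjacency
    return activation(A_hat @ H @ W)

# Toy usage: 4 nodes, 3 input features, 2 output features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = np.random.randn(4, 3)     # node feature matrix
W = np.random.randn(3, 2)     # learnable weights (Theta), independent of N
H1 = gcn_layer(A, X, W)
print(H1.shape)               # (4, 2)
```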
14. Recent Advances
• Autoencoder/VAE
• Decode in a non-parametric manner
• GAN + reinforcement learning
• Generate drug candidates
• e.g., MolGAN, GCPN
• Dynamic graphs
• In most real problems, graphs are dynamic
• e.g., web, electricity, SNS
• GCN+LSTM
15. Summary
• Graph convolution can be defined with graph
Fourier transform
• A Chebyshev polynomial approximation is used to avoid the computationally expensive eigendecomposition
• Several other GCNs have been proposed
• Next direction: Generative models and
dynamic graphs
16. References
[1] Z. Zhang et al., Deep Learning on Graphs: A Survey, arXiv, 2018.
[2] J. Zhou et al., Graph Neural Networks: A Review of Methods and Applications, arXiv, 2018.
[3] Z. Wu et al., A Comprehensive Survey on Graph Neural Networks, arXiv, 2019.
[4] D. I. Shuman et al., The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to Networks and Other Irregular Domains, IEEE Signal Processing Magazine, 2013.
[5] "Graph Laplacian" (グラフラプラシアン), 初級Mathマニアの寝言 (blog, in Japanese).
Editor's Notes
But first of all, let me introduce myself briefly.
I’m a master’s student, studying cheminformatics, i.e., ML applications to drug discovery.
My interests span ML broadly: NLP, CV, GANs, RL, etc.
Let’s move on to the topic.
Why are graph neural networks important? Why do they attract many people?
Because many kinds of data can be represented as graphs, and they have not benefited from deep learning as much as NLP and CV have.
One of the applications is chemistry: molecular property prediction, molecule generation, and so on.
And I’m working on these topics.
I’ll now briefly tell you about the background of molecular property prediction.
When predicting molecular properties with a model, vector representations, or embeddings, are useful.
These feature vectors are called fingerprints in chemistry.
Fingerprints have been hand-crafted for a long time. For example, MACCS keys are binary vectors that represent the presence of pre-defined substructures.
However, hand-crafted fingerprints are sparse and do not perform well as inputs to ML models.
Recent research tries to learn better fingerprints by applying seq2seq models to the string representation called SMILES.
Or by simply applying GNNs. Today I focus on the GNN approaches.
Generally speaking, graphs are more complicated than texts and images.
This is firstly due to their unaligned structure: there is no order among nodes, so even judging whether two given graphs are the same is difficult.
Second, there are several types of graphs, such as directed vs. undirected…
Scalability and domain knowledge are also issues.
Before moving on to graph convolution, let me give you definitions.
To follow the rest of the talk, you need to remember the adjacency matrix, the degree matrix, the Laplacian matrix, and the normalized Laplacian matrix.
Graph convolution is defined through the graph Fourier transform.
Consider the eigendecomposition of the normalized Laplacian.
Here, U is a unitary matrix because the Laplacian is a real symmetric matrix.
Now, the graph Fourier transform of a signal x on the graph (blue pillars) is defined by multiplying with the transpose of U.
The inverse operation is simple.
The convolution theorem holds just as it does for convolution on images and time-series data: convolution in the spatial domain is equal to the element-wise product in the Fourier domain.
Let theta be a diagonal matrix like this; then convolution with a filter can be written as follows.
Then, Graph convolution is defined as a product of matrices
There is a beautiful theory behind the formula, so if you are interested, please check it out afterwards.
Now, a “naïve” graph convolutional network can be designed by directly imitating CNNs for images.
You can compose an arbitrary network by repeating this operation.
However, there are two major problems.
First, eigendecomposition has at least O(N^2) time complexity.
Second, parameters cannot be shared across graphs of different sizes, because theta depends on the size of U, i.e., the number of nodes.
But don’t worry: Chebyshev saves the day!
ChebNet avoids eigendecomposition by using degree-K Chebyshev polynomials.
They set the learnable parameters as in this formula.
Then graph convolution is defined without eigendecomposition.
Kipf and Welling further approximate it by considering the K=1 case.
By assuming the thetas are tied, graph convolution becomes:
And lastly, by applying the renormalization trick and stacking over channels, theta becomes independent of the graph size N.
Subsequent work proposed variants of GCNs.
Here are some examples.
I didn’t have enough time to step into recent topics such as autoencoder/VAE, GANs, RL, and dynamic graphs.
If you are interested, I can give you some relevant literature.