
Overview of Machine Learning for Molecules and Materials Workshop @ NIPS2017

NIPS paper reading club 20180121 Oono @ PFN, Japan


  1. Overview of Machine Learning for Molecules and Materials Workshop @ NIPS 2017
     NIPS paper reading club @ PFN, Jan. 21st 2018. Preferred Networks, Inc., Kenta Oono
  2. Kenta Oono (@delta2323_)
     • Preferred Networks (PFN), Engineer
     • MSc. in mathematics
     • 2014.10 - present: PFN
     • Roles: biology project, Chainer developer, Chainer Chemistry developer
  3. Workshop overview
     • 15 invited talks, 22 posters, 3 sponsors
     • Session titles
       – Introduction to Machine Learning and Chemistry
       – Machine Learning Applications in Chemistry
       – Kernel Learning with Structured Data
       – Deep Learning Approaches
     • Areas of interest
       – ML + (quantum) chemistry / ML + quantum physics / material informatics
       – DL: Vinyals (DeepMind), Duvenaud (Google), Smola (Amazon)
  4. Why materials and molecules?
     • Material informatics
       – Materials Genome Initiative
       – MI2I project (NIMS)
     • Drug discovery
       – Big pharmas' investment
       – IPAB drug discovery contest
  5. Chemical prediction: two approaches
     • Quantum simulation
       – Theory-based approach, e.g. DFT (Density Functional Theory)
       – Pros: precision is guaranteed / Cons: high calculation cost
     • Machine learning
       – Data-based approach, e.g. graph convolution
       – Pros: low cost, high-speed calculation / Cons: hard to guarantee precision
     (Figure from "Neural Message Passing for Quantum Chemistry", Gilmer et al.)
  6. Hardness of learning with molecules
     • How to represent molecules?
       – Discrete and structured nature of molecules
       – 2D and 3D information
     • Vast search space (~10^60 candidate molecules)
  7. Topics
     • Molecule generation with VAE
     • Graph convolution
  9. Molecule generation
     (Figure contrasting the prediction and generation tasks; both marked as solvable.)
  10. SMILES
      A format for encoding molecules as text.
      Simple solution: treat a molecule as sequential data and apply NLP techniques.
      Example: OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)1
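As a sketch of the "treat SMILES as a sequence" idea, here is a minimal regex tokenizer that splits a SMILES string into the tokens an NLP-style model would consume. The token pattern is illustrative (it covers only common elements and syntax), not a complete SMILES lexer:

```python
import re

# Illustrative token pattern: bracket atoms first, then two-letter elements,
# single-letter atoms (upper = aliphatic, lower = aromatic), bonds,
# branches, and ring-closure digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|[BCNOSPFI]|[bcnos]|[=#/\\()@+\-]|\d)"
)

def tokenize(smiles):
    """Split a SMILES string into a token sequence."""
    return SMILES_TOKEN.findall(smiles)

print(tokenize("C#CC"))          # ['C', '#', 'C', 'C']
print(tokenize("OC[C@@H](O1)"))  # ['O', 'C', '[C@@H]', '(', 'O', '1', ')']
```

A sequence model (e.g. the RNN-VAE discussed next) then operates on these tokens exactly as it would on words in a sentence.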
  11. Variational AutoEncoder (VAE) [Kingma+13][Rezende+14]
      • Variational inference
      • Use a NN as the inference model.
      • Train end-to-end with backpropagation.
      • Extension to RNN encoder/decoder [Fabius+15]
      The inference model qφ(z | x) approximates the posterior of the generative model pθ(x | z).
      Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
      Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082.
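The VAE objective combines a reconstruction term with a KL regularizer. For a diagonal Gaussian inference model qφ(z | x) = N(μ, σ²) against a standard normal prior, the KL term has a well-known closed form, sketched here with NumPy (shapes are illustrative):

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))

# When q equals the prior (mu = 0, logvar = 0), the KL divergence is zero.
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)))  # 0.0
```

In training, this term is added to the (negative) reconstruction log-likelihood and both are minimized jointly by backpropagation.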
  12. Molecule generation with VAE (CVAE) [Gómez-Bombarelli+16]
      • Encode and decode molecules represented as SMILES with a VAE.
      • The latent representation can be used for semi-supervised learning.
      • A learned model can find molecules with a desired property by optimizing the representation in latent space and decoding it.
      • Cons: generated molecules are not guaranteed to be syntactically valid.
      Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., ... & Aspuru-Guzik, A. (2016). Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science.
  13. Grammar VAE (GVAE) [Kusner+17]
      • Represent the SMILES syntax as a CFG.
      • Encode: convert a molecule to a parse tree to get a sequence of production rules; feed the sequence to an RNN-VAE.
      • Decode: generate a sequence of production rules of the SMILES syntax.
      • Pros: generated molecules are guaranteed to be syntactically valid.
      • Cons: generated molecules are not guaranteed to be semantically valid.
      Kusner, M. J., Paige, B., & Hernández-Lobato, J. M. (2017). Grammar variational autoencoder. arXiv preprint arXiv:1703.01925.
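The decoding idea can be sketched with a toy grammar (far smaller than the real SMILES CFG; the rules here are purely illustrative). A sequence of production-rule indices is expanded leftmost-first into a string, so any decoded sequence that matches the grammar is syntactically valid by construction:

```python
# Toy CFG: each rule is (left-hand nonterminal, right-hand symbol list).
RULES = [
    ("S", ["A", "S"]),  # 0: extend the chain
    ("S", ["A"]),       # 1: end the chain
    ("A", ["C"]),       # 2: carbon atom
    ("A", ["#"]),       # 3: triple bond
]
NONTERMINALS = {"S", "A"}

def decode(rule_sequence):
    """Expand the leftmost nonterminal with each production rule in turn."""
    symbols = ["S"]
    for idx in rule_sequence:
        lhs, rhs = RULES[idx]
        pos = next(i for i, s in enumerate(symbols) if s in NONTERMINALS)
        assert symbols[pos] == lhs, "rule does not match leftmost nonterminal"
        symbols[pos:pos + 1] = rhs
    return "".join(symbols)

print(decode([0, 2, 0, 3, 0, 2, 1, 2]))  # C#CC
```

In GVAE the RNN decoder emits these rule indices (masked so only rules matching the current leftmost nonterminal are allowed), which is exactly what rules out syntactically invalid outputs.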
  14. Syntax-Directed VAE (SDVAE) (Best paper award)
      • Use attribute grammar to guarantee that generated molecules are both syntactically and semantically valid.
      • Generate attributes stochastically (stochastic lazy attributes) for on-the-fly semantic checks.
      (Simplified schematic view; bottom-up semantic check shown for explanation.)
  15. Discussion
      • Is SMILES appropriate as an input representation?
        – The input representation is not unique (e.g. CC#C and C#CC represent the same molecule).
        – The representation is not guaranteed to be invariant to relabeling (i.e. permutation of atom indexes).
        – SMILES is not natural language. Can we justify applying NLP techniques?
      • Synthesizability is not considered.
  16. Related papers
      • Extensions of VAE
        – Semi-supervised Continuous Representation of Molecules
        – Learning Hard Quantum Distributions With Variational Autoencoders
      • Seq2seq models
        – "Found in Translation": Predicting Outcomes of Complex Organic Chemistry Reactions Using Neural Sequence-to-sequence Models
      • Molecule generation
        – Learning a Generative Model for Validity in Complex Discrete Structures
        – ChemTS: de novo molecular generation with MCTS and RNN (for rollout)
  18. Extended Connectivity Fingerprint (ECFP)
      Converts a molecule into a fixed-length bit representation.
      • Pros
        – Calculation is fast
        – Shows the presence of particular substructures
      • Cons
        – Bit collision: two (or more) different substructure features can be represented by the same bit position
        – Task-independent featurizer
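The bit-collision issue can be illustrated without any chemistry: a hashed fingerprint maps substructure identifiers into a fixed-length bit vector, so two different substructures can land on the same position. Everything below (the 16-bit size, the string identifiers, the use of MD5 as the hash) is illustrative, not the actual ECFP algorithm:

```python
import hashlib

N_BITS = 16  # deliberately tiny so collisions are easy to provoke

def fingerprint(substructures):
    """Hash each substructure identifier into a fixed-length bit vector."""
    bits = [0] * N_BITS
    for sub in substructures:
        pos = int(hashlib.md5(sub.encode()).hexdigest(), 16) % N_BITS
        bits[pos] = 1  # a second substructure hashing here is silently merged
    return bits

fp = fingerprint(["C-C", "C=O", "C-N"])
print(sum(fp))  # number of set bits; fewer than 3 would mean a collision
```

Real ECFP uses thousands of bits, so collisions are rarer, but the featurizer remains fixed and task-independent, which is what motivates the learned alternative (NFP) below.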
  19. How graph convolution works
      Analogous to a CNN on images (fixed grid kernel → image class label), but the convolution kernel depends on the graph structure (molecular graph → chemical property).
  20. Unified view of graph convolution
      Many message-passing algorithms (NFP, GGNN, Weave) can be formulated as the iterative application of an Update function and a Readout function [Gilmer+17].
      • Update: aggregates neighborhood information and updates each node representation.
      • Readout: aggregates all node representations and produces the final output.
      Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. (2017). Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212.
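The Update/Readout template can be sketched in a few lines of NumPy. This is a toy instance of the [Gilmer+17] formulation (random weights, sum aggregation, tanh nonlinearity), not any specific published model:

```python
import numpy as np

rng = np.random.default_rng(0)

def mpnn_step(h, adj, W):
    """Update: each node sums its neighbors' states and applies a shared map."""
    messages = adj @ h  # m_v = sum over neighbors w of h_w
    return np.tanh((h + messages) @ W)

def readout(h):
    """Readout: aggregate all node states into one graph-level vector."""
    return h.sum(axis=0)

# 3-node path graph, 4-dimensional node states.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
h = rng.standard_normal((3, 4))
W = rng.standard_normal((4, 4))
y = readout(mpnn_step(h, adj, W))
print(y.shape)  # (4,)
```

Stacking several `mpnn_step` calls grows each node's receptive field by one hop per step, which is the graph analogue of stacking convolution layers in a CNN.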
  21. Neural Fingerprint (NFP) [Duvenaud+15]
      Atom feature embedding (one embedding per atom type: H, C, N, O, S).
      Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., Aspuru-Guzik, A., & Adams, R. P. (2015). Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems (pp. 2224-2232).
  22. Neural Fingerprint (NFP): Update
      h_new_3 = σ( W_2 (h_3 + h_2 + h_4) )
      h_new_7 = σ( W_3 (h_7 + h_6 + h_8 + h_9) )
      (W_d is the weight matrix for nodes of degree d.)
  23. Neural Fingerprint (NFP): Readout
      R = Σ_i softmax( W h_i )
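A numerical sketch of these two formulas, with degree-indexed weight matrices and the softmax readout. All shapes and values are illustrative; `W_by_degree` is a hypothetical name for the per-degree weight table:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def nfp_update(h_self, h_neighbors, W_by_degree):
    """h'_v = sigmoid( W_deg(v) (h_v + sum of neighbor states) )."""
    deg = len(h_neighbors)
    s = h_self + sum(h_neighbors)
    return 1.0 / (1.0 + np.exp(-(W_by_degree[deg] @ s)))

def nfp_readout(hs, W):
    """R = sum_i softmax(W h_i)."""
    return sum(softmax(W @ h) for h in hs)

# Degree-2 node with all-zero states: sigmoid(0) = 0.5 per component.
h_prime = nfp_update(np.zeros(3), [np.zeros(3), np.zeros(3)], {2: np.eye(3)})
print(h_prime)
R = nfp_readout([np.zeros(3), np.zeros(3)], np.eye(3))
```

Because each softmax sums to 1, the readout vector R always sums to the number of atoms, which keeps the "fingerprint" entries bounded and differentiable, unlike ECFP's hard bit-setting.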
  24. ECFP and NFP ([Duvenaud+15], Fig. 2)
  25. Comparison between graph convolution networks

                              NFP               GGNN              Weave                  SchNet
      Atom features           man-made/embed    man-made/embed    man-made/embed         man-made/embed
      Convolution strategy    adjacent atoms    adjacent atoms    all atom-atom pairs    all atom-atom pairs
      Connection information  degree            bond type         man-made pair features distance
                                                                  (bond type, distance, etc.)
  26. End-to-end Learning of Graph Neural Networks for Molecular Representation [Tsubaki+17] (Best paper award)
      1. Embed r-radius subgraphs.
      2. Update node and edge representations.
      3. Use an LSTM to capture long-term dependencies among vertices and edges.
      4. Read out the final output with a self-attention mechanism.
  27. Extension to semi-supervised learning [Hai+17] (Workshop paper)
      • Compute representations of subgraphs inductively with neural message passing.
      • Optimize the representations in an unsupervised manner, in the same way as Paragraph Vector.
      Nguyen, H., Maeda, S. I., & Oono, K. (2017). Semi-supervised learning of hierarchical representations of molecules using neural message passing. arXiv preprint arXiv:1711.10168.
  28. Chainer Chemistry: a Chainer extension library for biology and chemistry
      • Dataset: file parsers (SDF, CSV), loaders (QM9, Tox21), preprocessing
      • Model: graph convolution NNs (NFP, GGNN, SchNet, Weave), pretrained models (TBD), feature extractor
      • Layer: GraphLinear, EmbedAtomID
      • Example: multitask learning with QM9 / Tox21
      Basic information: release 12/14/2017, version v0.1.0, license MIT, language Python
  29. Discussion
      • Is the message passing neural network general enough to formulate many graph convolution algorithms?
      • How can we incorporate 3D information (e.g. chirality) into graph convolution algorithms?
  30. Other topics (DNN models)
      • CNN models
        – ChemNet: A Transferable and Generalizable Deep Neural Network for Small-molecule Property Prediction
        – Ligand Pose Optimization With Atomic Grid-based Convolutional Neural Networks
      • Other DNN models
        – Deep Learning for Prediction of Synergistic Effects of Anti-cancer Drugs
        – Deep Learning Yields Virtual Assays
        – Neural Network for Learning Universal Atomic Forces
  31. Other topics
      • Chemical synthesis
        – Automatically Extracting Action Graphs From Materials Science Synthesis Procedures
        – Marwin Segler's talk: Planning Chemical Syntheses with Neural Networks and Monte Carlo Tree Search
      • Bayesian optimization
        – Bayesian Protein Optimization
        – Constrained Bayesian Optimization for Automatic Chemical Design
      Segler, M. H., Preuss, M., & Waller, M. P. (2017). Learning to plan chemical syntheses. arXiv preprint arXiv:1708.04202.
  32. Summary
      • Data-driven approaches to understanding molecules are attracting attention in material informatics, quantum chemistry, and quantum physics.
      • Recent advances in:
        – Molecule generation with VAEs
        – Learning graph-structured data with graph convolution algorithms
  33. BACKUP
  34. Chainer Chemistry: a Chainer extension library for biology and chemistry
      Basic information: release 12/14/2017, version v0.1.0, license MIT, language Python
      Features
      • State-of-the-art deep neural network models (especially graph convolutions) for chemical molecules (NFP, GGNN, Weave, SchNet, etc.)
      • Preprocessors of molecules tailored for these models
      • Parsers for several standard file formats (CSV, SDF, etc.)
      • Loaders for several well-known datasets (QM9, Tox21, etc.)
  35. Example: HOMO prediction with the QM9 dataset

      # Dataset preprocessing (for the NFP network)
      preprocessor = preprocess_method_dict['nfp']()
      dataset = D.get_qm9(preprocessor, labels='homo')

      # Cache the dataset for later use
      NumpyTupleDataset.save('input/nfp_homo/data.npz', dataset)
      train, val = split_dataset_random(dataset, first_size=10000)

      # Build the model and use it as an ordinary Chain
      model = GraphConvPredictor(NFP(16, 16, 4), MLP(16, 1))
