2. Development history
• 6/12: v1.0
  – Basics of Variable/Function, FunctionSet & Optimizer, CUDA support
• 7/7: v1.1
  – Caffe reference model, type checking (forward/backward), Py3 support
• 8/19: v1.2
  – Many functions added; collect_parameters deprecated; type checking on backward removed
• 9/2: v1.3
  – CuPy; functions module reorganized
3. CuPy
• CUDA array implementation with a NumPy-subset API
• Custom elementwise and reduction kernels are still supported (with broadcasting)
• No dependence on PyCUDA or scikits.cuda
  – Cf. the sudden renaming of scikits.cuda to scikit-cuda
• NumPy API coverage is still incomplete
• Most operations are not yet supported at the Function/Variable level
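Because CuPy follows a NumPy-subset API, array code can be written against whichever module owns the data. A minimal sketch of that pattern (using NumPy here; with cupy installed, passing `cupy` and a GPU array would run the same function unchanged):

```python
import numpy

def softmax(xp, x):
    # xp is whichever array module owns the data (numpy here, cupy on GPU);
    # both expose the same NumPy-subset API, so this code is module-agnostic
    e = xp.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

x = numpy.array([[1.0, 2.0, 3.0]])
y = softmax(numpy, x)  # with cupy: softmax(cupy, cupy.asarray(x))
```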
4. Development history
• 6/12: v1.0
  – Basics of Variable/Function, FunctionSet & Optimizer, CUDA support
• 7/7: v1.1
  – Caffe reference model, type checking (forward/backward), Py3 support
• 8/19: v1.2
  – Many functions added; collect_parameters deprecated; type checking on backward removed
• 9/2: v1.3
  – CuPy; functions module reorganized
• 10/28: v1.4 (planned, delayed)
  – Some functions added?
5. The cause of the delay
• New model structure (#363)
• I've been working on this since the release of v1.3
• The design has turned out to be unexpectedly difficult
  – Still in the design phase
  – I'm planning to release this feature in v1.5
6. Objective
• Replacement of FunctionSet/Optimizer
• Goals:
  – Provide a solid way of sharing and reusing (sub)network definitions
  – Avoid the "to_cpu/to_gpu trap" between FunctionSet and Optimizer
  – Portable save/load
  – Make all functions pure, for more flexibility and reusability
7. Solution (current idea)
• Hierarchy of network definitions
• Example:
  – An autoencoder uses an encoder network and a decoder network
  – Each of these networks might be an MLP, a ConvNet, etc.
  – An MLP consists of several fully-connected layers
  – Each fully-connected layer defines a simple operation on the input variable
• Call each component a chain
• Modeling in Chainer then amounts to linking several chains into one big chain
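The hierarchy idea can be sketched with plain Python classes (toy names, not the actual Chainer API, which was still being designed at this point):

```python
class Chain:
    """Toy stand-in for the proposed chain concept: a named
    composition of child chains."""
    def __init__(self, **children):
        self.children = dict(children)

class Linear(Chain):
    """A leaf chain (a fully-connected layer) owning its own weights."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.W = [[0.0] * n_in for _ in range(n_out)]

def make_mlp():
    # an MLP is itself a chain of fully-connected layers
    return Chain(l1=Linear(4, 3), l2=Linear(3, 4))

# the autoencoder links an encoder chain and a decoder chain into one big chain
autoencoder = Chain(encoder=make_mlp(), decoder=make_mlp())
```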
8. Terminology
• Link
  – A minimal component of a chain (e.g. Linear, Convolution2D, etc.)
  – Called a "parameterized function" in previous versions
  – It combines parameter variables with input variables to compute the output variables
• Chain, ChainList
  – A composition of child chains (including links)
  – Chain manages its children in a dictionary, while ChainList manages them in a list
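The dictionary-vs-list distinction can be sketched roughly as follows (hypothetical mock classes, not the real API):

```python
class Chain:
    """Children are registered under names and accessed like attributes."""
    def __init__(self, **links):
        self._children = dict(links)
    def __getattr__(self, name):
        # called only when normal attribute lookup fails
        try:
            return self._children[name]
        except KeyError:
            raise AttributeError(name)

class ChainList:
    """Children are kept in order and accessed by index -- convenient
    for a variable number of repeated layers."""
    def __init__(self, *links):
        self._children = list(links)
    def __getitem__(self, i):
        return self._children[i]
    def __len__(self):
        return len(self._children)

net = Chain(layer1="Linear(784, 100)", layer2="Linear(100, 10)")
stack = ChainList("Linear(100, 100)", "Linear(100, 100)")
```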
9. Schematic of Link/Chain
Example of a classifier with a multi-layer perceptron
[Figure: a Classifier chain wraps an MLP chain registered as predictor. The MLP is built from three Linear links (layer1, layer2, layer3). Inputs x and t are fed in; the predictor's output and t go through a loss Function to produce loss.]
10. Schematic of Link/Chain
Example of a Variational AutoEncoder
[Figure: a VariationalAutoEncoder chain contains an encoder MLP and a decoder MLP(?), each built from Linear links. Input x passes through the encoder to the latent variable z, and the decoder reconstructs x; the kld and nll terms are added (+) to form loss.]
11. Define by Run
• Note that these diagrams do not mean the computational graph must be fixed at the definition of chains
  – The graph is dynamically constructed during the forward computation (define-by-run)
• A chain might implement multiple methods that construct different graphs
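A rough sketch of the "multiple methods, different graphs" idea with a plain NumPy class (hypothetical names; no real Chainer machinery):

```python
import numpy

class Regressor:
    """Hypothetical chain-like class: each method constructs its own
    computation when it runs (define-by-run); there is no fixed graph."""
    def __init__(self, w):
        self.w = numpy.asarray(w)

    def predict(self, x):
        # this call builds only the forward path
        return x @ self.w

    def loss(self, x, t):
        # this call builds a different graph: forward path plus a loss term
        y = self.predict(x)
        return float(((y - t) ** 2).mean())

model = Regressor([[1.0], [2.0]])
x = numpy.array([[1.0, 1.0]])
y = model.predict(x)                     # forward only
l = model.loss(x, numpy.array([[3.0]]))  # forward plus loss
```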
17. Planned features of Link/Chain/ChainList
• The hierarchy is directly mapped to the HDF5 format on serialization
  – Only the parameters and auxiliary variables (computed during learning) are saved
• Helper methods to traverse the hierarchy
  – Iterate over all subchains in the hierarchy
  – Iterate over all parameter variables in the hierarchy
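One way to picture the hierarchy-to-HDF5 mapping and the traversal helper: a recursive walk over a nested structure producing slash-separated paths, like HDF5 group/dataset names. A sketch over plain dicts (hypothetical layout; no h5py needed):

```python
def iter_params(chain, prefix=""):
    """Yield (path, value) pairs for every leaf parameter in a nested
    dict hierarchy -- mirroring how a chain hierarchy could map onto
    HDF5 groups and datasets."""
    for name, child in sorted(chain.items()):
        path = prefix + "/" + name
        if isinstance(child, dict):
            yield from iter_params(child, path)  # recurse into subchains
        else:
            yield path, child                    # leaf parameter

model = {
    "predictor": {
        "layer1": {"W": [[0.1]], "b": [0.0]},
        "layer2": {"W": [[0.2]], "b": [0.0]},
    },
}
paths = dict(iter_params(model))  # keys like "/predictor/layer1/W"
```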
18. New Optimizer
• Optimizer is also updated
• Optimizer will be aware of its target chain
  – It tracks the migration of the target chain between CPUs and GPUs
• Optimizer is also serializable (in HDF5 format)
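The "target-aware" idea, sketched as a toy SGD (hypothetical interface: `setup` registers the chain's parameters, `update` walks them in place, so the optimizer follows the parameters wherever they live instead of holding a detached copy):

```python
class SGD:
    """Toy optimizer aware of its target: it keeps a reference to the
    target's parameter dict and updates it in place."""
    def __init__(self, lr=0.1):
        self.lr = lr
        self.target = None

    def setup(self, target):
        # register the target; a real implementation would also follow
        # the target's migration between CPUs and GPUs
        self.target = target

    def update(self, grads):
        for name, g in grads.items():
            self.target[name] -= self.lr * g

params = {"W": 1.0, "b": 0.5}
opt = SGD(lr=0.1)
opt.setup(params)
opt.update({"W": 2.0, "b": 1.0})  # W -> 0.8, b -> 0.4
```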
19. Parallel work: introduction of Cython
• CuPy drawback: CPU-side manipulation is slow
• There is no single huge bottleneck: the causes of the slowdown are scattered
• The easiest point to fix: ctypes
  – ctypes is verrrrrrrrrrrry slow
  – Even retrieving the current device consumes non-negligible running time
  – @okuta san is working on replacing it with Cython
• Major impact on the Chainer package
  – The low-level interface will change
  – setup.py is drastically updated (a Cython extension requires Cython to build, while the package must remain installable in environments where Cython is not yet installed)
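The per-call overhead of ctypes is easy to observe directly. A micro-benchmark sketch comparing the builtin math.sqrt with the same libm function called through ctypes (assumes libm can be located, as on typical Linux/macOS systems; absolute timings vary by machine):

```python
import ctypes
import ctypes.util
import math
import timeit

# load libm through ctypes (falls back to the common Linux soname)
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

# same operation, two call paths: builtin vs ctypes foreign call;
# the ctypes path typically costs several times more per call
native = timeit.timeit(lambda: math.sqrt(2.0), number=100_000)
foreign = timeit.timeit(lambda: libm.sqrt(2.0), number=100_000)
```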
20. Future work
• Lazy computation
  – See the VAE example: it computes all intermediate variables in the __call__ operator, while a user might want only some of them
  – Chainer currently computes eagerly, which causes unneeded computations
  – Avoiding unneeded computations is one of the easiest graph optimizations
  – More generally, I believe the future lies in a fusion of the symbolic and dynamic paradigms
• Symbolic optimization of computations on Variables (loop fusion, etc.)
• Variable tags (or annotations)
  – Cf. Blocks
• Learning process abstraction, data loading abstraction, etc.
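The lazy-computation point can be sketched with a toy lazy node: each output wraps a thunk that runs only when the value is first requested, so terms nobody asks for (e.g. kld when only nll is wanted) are never computed. Hypothetical names, not a proposed API:

```python
class Lazy:
    """Toy lazy node: the thunk runs only when .value is first read."""
    def __init__(self, thunk):
        self._thunk = thunk
        self._value = None
        self.computed = False

    @property
    def value(self):
        if not self.computed:
            self._value = self._thunk()  # compute on first access, then cache
            self.computed = True
        return self._value

# a VAE-like forward could expose its terms lazily:
kld = Lazy(lambda: 0.5)
nll = Lazy(lambda: 1.5)
loss = Lazy(lambda: kld.value + nll.value)
```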