2. Development history
• 6/12: v1.0
  – Basics of Variable/Function, FunctionSet & Optimizer, CUDA support
• 7/7: v1.1
  – Caffe reference model, type checking (forward/backward), Py3 support
• 8/19: v1.2
  – Many functions added; collect_parameters deprecated; type checking on backward removed
• 9/2: v1.3
  – CuPy; functions module reorganized
3. CuPy
• CUDA array implementation with a NumPy-subset API
• Custom elementwise and reduction kernels are still supported (with broadcasting)
• No dependence on PyCUDA or scikits.cuda
  – Cf. the sudden renaming of scikits.cuda to scikit-cuda
• NumPy API coverage is still incomplete
• Most operations are not yet supported at the Function/Variable level
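Because CuPy follows a NumPy-subset API, array code can be written against whichever module owns the data. A minimal sketch of that pattern (using NumPy here; with cupy installed, passing `cupy` and a GPU array would run the same function unchanged):

```python
import numpy

def softmax(xp, x):
    # xp is whichever array module owns the data (numpy here, cupy on GPU);
    # both expose the same NumPy-subset API, so this code is module-agnostic
    e = xp.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

x = numpy.array([[1.0, 2.0, 3.0]])
y = softmax(numpy, x)  # with cupy: softmax(cupy, cupy.asarray(x))
```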
4. Development history
• 6/12: v1.0
  – Basics of Variable/Function, FunctionSet & Optimizer, CUDA support
• 7/7: v1.1
  – Caffe reference model, type checking (forward/backward), Py3 support
• 8/19: v1.2
  – Many functions added; collect_parameters deprecated; type checking on backward removed
• 9/2: v1.3
  – CuPy; functions module reorganized
• 10/28: v1.4 (planned, delayed)
  – Some functions added?
5. The cause of the delay
• New model structure (#363)
• I've been working on this since the release of v1.3
• The design has turned out to be unexpectedly difficult
  – Still in the design phase
  – I'm planning to release this feature in v1.5
6. Objective
• Replacement of FunctionSet/Optimizer
• Goals:
  – Provide a solid way of sharing and reusing (sub)network definitions
  – Avoid the "to_cpu/to_gpu trap" between FunctionSet and Optimizer
  – Portable save/load
  – Make all functions pure, for more flexibility and reusability
7. Solution (current idea)
• Hierarchy of network definitions
• Example:
  – An autoencoder uses an encoder network and a decoder network
  – Each of these networks might be an MLP, a ConvNet, etc.
  – An MLP consists of several fully-connected layers
  – Each fully-connected layer defines a simple operation on the input variable
• Call each component a chain
• Modeling in Chainer then amounts to linking several chains into one big chain
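The hierarchy idea can be sketched with plain Python classes (toy names, not the actual Chainer API, which was still being designed at this point):

```python
class Chain:
    """Toy stand-in for the proposed chain concept: a named
    composition of child chains."""
    def __init__(self, **children):
        self.children = dict(children)

class Linear(Chain):
    """A leaf chain (a fully-connected layer) owning its own weights."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.W = [[0.0] * n_in for _ in range(n_out)]

def make_mlp():
    # an MLP is itself a chain of fully-connected layers
    return Chain(l1=Linear(4, 3), l2=Linear(3, 4))

# the autoencoder links an encoder chain and a decoder chain into one big chain
autoencoder = Chain(encoder=make_mlp(), decoder=make_mlp())
```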
8. Terminology
• Link
  – A minimal component of a chain (e.g. Linear, Convolution2D, etc.)
  – Called a "parameterized function" in previous versions
  – It combines parameter variables with input variables to compute the output variables
• Chain, ChainList
  – A composition of child chains (including links)
  – Chain manages its children in a dictionary, while ChainList manages them in a list
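The dictionary-vs-list distinction can be sketched roughly as follows (hypothetical mock classes, not the real API):

```python
class Chain:
    """Children are registered under names and accessed like attributes."""
    def __init__(self, **links):
        self._children = dict(links)
    def __getattr__(self, name):
        # called only when normal attribute lookup fails
        try:
            return self._children[name]
        except KeyError:
            raise AttributeError(name)

class ChainList:
    """Children are kept in order and accessed by index -- convenient
    for a variable number of repeated layers."""
    def __init__(self, *links):
        self._children = list(links)
    def __getitem__(self, i):
        return self._children[i]
    def __len__(self):
        return len(self._children)

net = Chain(layer1="Linear(784, 100)", layer2="Linear(100, 10)")
stack = ChainList("Linear(100, 100)", "Linear(100, 100)")
```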
9. Schematic of Link/Chain
Example of a classifier with a multi-layer perceptron
[Figure: a Classifier chain wraps an MLP chain registered as predictor. The MLP is built from three Linear links (layer1, layer2, layer3). Inputs x and t are fed in; the predictor's output and t go through a loss Function to produce loss.]
10. Schematic of Link/Chain
Example of a Variational AutoEncoder
[Figure: a VariationalAutoEncoder chain contains an encoder MLP and a decoder MLP(?), each built from Linear links. Input x passes through the encoder to the latent variable z, and the decoder reconstructs x; the kld and nll terms are added (+) to form loss.]
11. Define by Run
• Note that these diagrams do not mean the computational graph must be fixed at the definition of chains
  – The graph is dynamically constructed during the forward computation (define-by-run)
• A chain might implement multiple methods that construct different graphs
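A rough sketch of the "multiple methods, different graphs" idea with a plain NumPy class (hypothetical names; no real Chainer machinery):

```python
import numpy

class Regressor:
    """Hypothetical chain-like class: each method constructs its own
    computation when it runs (define-by-run); there is no fixed graph."""
    def __init__(self, w):
        self.w = numpy.asarray(w)

    def predict(self, x):
        # this call builds only the forward path
        return x @ self.w

    def loss(self, x, t):
        # this call builds a different graph: forward path plus a loss term
        y = self.predict(x)
        return float(((y - t) ** 2).mean())

model = Regressor([[1.0], [2.0]])
x = numpy.array([[1.0, 1.0]])
y = model.predict(x)                     # forward only
l = model.loss(x, numpy.array([[3.0]]))  # forward plus loss
```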
17. Planned features of Link/Chain/ChainList
• The hierarchy is directly mapped to the HDF5 format on serialization
  – Only the parameters and auxiliary variables (computed during learning) are saved
• Helper methods to traverse the hierarchy
  – Iterate over all subchains in the hierarchy
  – Iterate over all parameter variables in the hierarchy
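One way to picture the hierarchy-to-HDF5 mapping and the traversal helper: a recursive walk over a nested structure producing slash-separated paths, like HDF5 group/dataset names. A sketch over plain dicts (hypothetical layout; no h5py needed):

```python
def iter_params(chain, prefix=""):
    """Yield (path, value) pairs for every leaf parameter in a nested
    dict hierarchy -- mirroring how a chain hierarchy could map onto
    HDF5 groups and datasets."""
    for name, child in sorted(chain.items()):
        path = prefix + "/" + name
        if isinstance(child, dict):
            yield from iter_params(child, path)  # recurse into subchains
        else:
            yield path, child                    # leaf parameter

model = {
    "predictor": {
        "layer1": {"W": [[0.1]], "b": [0.0]},
        "layer2": {"W": [[0.2]], "b": [0.0]},
    },
}
paths = dict(iter_params(model))  # keys like "/predictor/layer1/W"
```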
18. New Optimizer
• Optimizer is also updated
• Optimizer will be aware of its target chain
  – It tracks the migration of the target chain between CPUs and GPUs
• Optimizer is also serializable (in HDF5 format)
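The "target-aware" idea, sketched as a toy SGD (hypothetical interface: `setup` registers the chain's parameters, `update` walks them in place, so the optimizer follows the parameters wherever they live instead of holding a detached copy):

```python
class SGD:
    """Toy optimizer aware of its target: it keeps a reference to the
    target's parameter dict and updates it in place."""
    def __init__(self, lr=0.1):
        self.lr = lr
        self.target = None

    def setup(self, target):
        # register the target; a real implementation would also follow
        # the target's migration between CPUs and GPUs
        self.target = target

    def update(self, grads):
        for name, g in grads.items():
            self.target[name] -= self.lr * g

params = {"W": 1.0, "b": 0.5}
opt = SGD(lr=0.1)
opt.setup(params)
opt.update({"W": 2.0, "b": 1.0})  # W -> 0.8, b -> 0.4
```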
19. Parallel work: introduction of Cython
• CuPy drawback: CPU-side manipulation is slow
• There is no single huge bottleneck: the causes of the slowdown are scattered
• The easiest point to fix: ctypes
  – ctypes is verrrrrrrrrrrry slow
  – Even retrieving the current device consumes non-negligible running time
  – @okuta san is working on replacing it with Cython
• Major impact on the Chainer package
  – The low-level interface will change
  – setup.py is drastically updated (a Cython extension requires Cython to build, while the package must remain installable in environments where Cython is not yet installed)
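The per-call overhead of ctypes is easy to observe directly. A micro-benchmark sketch comparing the builtin math.sqrt with the same libm function called through ctypes (assumes libm can be located, as on typical Linux/macOS systems; absolute timings vary by machine):

```python
import ctypes
import ctypes.util
import math
import timeit

# load libm through ctypes (falls back to the common Linux soname)
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

# same operation, two call paths: builtin vs ctypes foreign call;
# the ctypes path typically costs several times more per call
native = timeit.timeit(lambda: math.sqrt(2.0), number=100_000)
foreign = timeit.timeit(lambda: libm.sqrt(2.0), number=100_000)
```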
20. Future work
• Lazy computation
  – See the VAE example: it computes all intermediate variables in the __call__ operator, while a user might want only some of them
  – Chainer currently computes eagerly, which causes unneeded computations
  – Avoiding unneeded computations is one of the easiest graph optimizations
  – More generally, I believe the future lies in a fusion of the symbolic and dynamic paradigms
• Symbolic optimization of computations on Variables (loop fusion, etc.)
• Variable tags (or annotations)
  – Cf. Blocks
• Learning process abstraction, data loading abstraction, etc.
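The lazy-computation point can be sketched with a toy lazy node: each output wraps a thunk that runs only when the value is first requested, so terms nobody asks for (e.g. kld when only nll is wanted) are never computed. Hypothetical names, not a proposed API:

```python
class Lazy:
    """Toy lazy node: the thunk runs only when .value is first read."""
    def __init__(self, thunk):
        self._thunk = thunk
        self._value = None
        self.computed = False

    @property
    def value(self):
        if not self.computed:
            self._value = self._thunk()  # compute on first access, then cache
            self.computed = True
        return self._value

# a VAE-like forward could expose its terms lazily:
kld = Lazy(lambda: 0.5)
nll = Lazy(lambda: 1.5)
loss = Lazy(lambda: kld.value + nll.value)
```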