4. Objective of this part
• List the design choices of NN frameworks
• Introduce the objective differences between existing frameworks on these choices
• Two or more choices for each topic
• Pros/cons of each choice
5. Outline
• Recall the steps of training NNs
• Quick comparison of existing frameworks
• Details of design choices
7. Steps for Training Neural Networks
Prepare the training dataset
Initialize the NN parameters
Repeat until meeting some criterion:
  Prepare the next (mini)batch
  Define how to compute the loss of this batch
  Compute the loss (forward prop)
  Compute the gradient (backprop)
  Update the NN parameters
Save the NN parameters
8. Training of Neural Networks
The same flow; once the loss computation for a batch is defined, the framework automates the steps inside the loop (the forward prop, the backprop, and the parameter updates).
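As a concrete reference for this flow, here is a minimal sketch of the loop in plain NumPy (a toy linear model with a squared loss; all names are illustrative, and the gradient is written by hand where a framework would automate it):

```python
import numpy as np

rng = np.random.RandomState(0)
X, T = rng.randn(1000, 10), rng.randn(1000, 1)  # prepare the training dataset

W = 0.01 * rng.randn(10, 1)                     # initialize the NN parameters
b = np.zeros(1)
lr = 0.01

for epoch in range(10):                         # repeat until meeting some criterion
    for i in range(0, len(X), 32):              # prepare the next (mini)batch
        x, t = X[i:i + 32], T[i:i + 32]
        y = x.dot(W) + b                        # compute the loss (forward prop)
        loss = ((y - t) ** 2).mean()
        gy = 2 * (y - t) / len(x)               # compute the gradient (backprop);
        gW, gb = x.T.dot(gy), gy.sum(axis=0)    # this is the step frameworks automate
        W -= lr * gW                            # update the NN parameters
        b -= lr * gb

np.savez('model.npz', W=W, b=b)                 # save the NN parameters
```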
10. Framework Design Choices
• The most crucial parts of an NN framework are
  • how to define the parameters, and
  • how to define the loss function in terms of the parameters (= how to write computational graphs)
• These choices also shape the APIs for forward prop, backprop, and parameter updates (i.e., numerical optimization)
• All of these are determined by how computational graphs are implemented
• Other parts are also important, but they are mostly common to implementations of other types of machine learning methods
12. List of Frameworks (not exhaustive)
• Torch.nn
• Theano and frameworks on top of it (Keras, Blocks, Lasagne, etc.)
  • We omit introducing each of these frameworks individually, since 1) there are too many frameworks on top of Theano, and 2) most of them share characteristics derived from Theano
• Caffe
• autograd (NumPy, Torch)
• Chainer
• MXNet
• TensorFlow
13. Torch.nn
• MATLAB-like environment built on LuaJIT
• Fast scripting, CPU/GPU support with unified array backend
14. Theano (and ones on top of it)
• Python package to build computational graphs
• Supports computational graph optimization and compilation
15. Caffe
• Fast implementation of NNs in C++
• Mainly focusing on computer vision applications
16. autograd (NumPy, Torch)
• The original version adds automatic differentiation on top of the NumPy API
• It has also been ported to Torch
17. Chainer
• Supports backprop through dynamically constructed graphs
• Also provides a NumPy-compatible GPU array backend
18. MXNet
• Mixed paradigm support (symbolic/imperative computations)
• It also supports distributed computations
19. TensorFlow
• Fast execution via distributed computation
• Also supports some control-flow operations on top of the graphs
20. Framework Comparison: Basic information*

Viewpoint           | Torch.nn** | Theano*** | Caffe | autograd (NumPy, Torch) | Chainer | MXNet | TensorFlow
GitHub stars        | 4,719 | 3,457 | 9,590 | N: 654 / T: 554 | 1,295 | 3,316 | 20,981
Started from        | 2002 | 2008 | 2013 | 2015 | 2015 | 2015 | 2015
Open issues/PRs     | 97/26 | 525/105 | 407/204 | N: 9/0 / T: 3/1 | 95/25 | 271/18 | 330/33
Main developers     | Facebook, Twitter, Google, etc. | Université de Montréal | BVLC (U.C. Berkeley) | N: HIPS (Harvard Univ.) / T: Twitter | Preferred Networks | DMLC | Google
Core languages      | C/Lua | C/Python | C++ | Python/Lua | Python | C++ | C++/Python
Supported languages | Lua | Python | C++/Python/MATLAB | Python/Lua | Python | C++/Python/R/Julia/Go, etc. | C++/Python

* Data was taken on Apr. 12, 2016
** Includes statistics of Torch7
*** There are many frameworks on top of Theano; we omit them due to space constraints
21. List of Important Design Choices
Programming paradigms
1. How to write NNs in text format
2. How to build computational graphs
3. How to compute backprop
4. How to represent parameters
5. How to update parameters
Performance improvements
6. How to achieve the computational performance
7. How to scale the computations
22. Framework Comparison: Design Choices

Design choice           | Torch.nn | Theano-based | Caffe | autograd (NumPy, Torch) | Chainer | MXNet | TensorFlow
1. NN definition        | Script (Lua) | Script* (Python) | Data (protobuf) | Script (Python, Lua) | Script (Python) | Script (many) | Script (Python)
2. Graph construction   | Prebuild | Prebuild | Prebuild | Dynamic | Dynamic | Prebuild** | Prebuild
3. Backprop             | Through graph | Extended graph | Through graph | Extended graph | Through graph | Through graph | Extended graph
4. Parameters           | Hidden in operators | Separate nodes | Hidden in operators | Separate nodes | Separate nodes | Separate nodes | Separate nodes
5. Update formula       | Outside of graphs | Part of graphs | Outside of graphs | Outside of graphs | Outside of graphs | Outside of graphs** | Part of graphs
6. Optimization         | - | Advanced optimization | - | - | - | - | Simple optimization
7. Parallel computation | Multi GPU | Multi GPU (libgpuarray) | Multi GPU | Multi GPU (Torch) | Multi GPU | Multi node / Multi GPU | Multi node / Multi GPU

* Some Theano-based frameworks use data (e.g. YAML)
** Dynamic dependency analysis and optimization are supported (no autodiff support)
25. How to write NNs in text format
Write NNs in declarative configuration files
The framework builds the layers of the NN as written in the file (e.g. prototxt, YAML).
E.g.: Caffe (prototxt), Pylearn2 (YAML)

Write NNs by procedural scripting
The framework provides scripting-language APIs for building NNs.
E.g.: most other frameworks
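To make the contrast concrete, here is a sketch of the same small definition in both styles. The declarative fragment follows Caffe's prototxt layer syntax; the procedural version uses a Keras-style API (exact names vary by framework and version, so treat both as illustrative):

```python
# Declarative style: the NN is data, parsed by the framework (Caffe-style prototxt).
PROTOTXT = """
layer { name: "fc1" type: "InnerProduct" bottom: "data" top: "fc1"
        inner_product_param { num_output: 128 } }
layer { name: "relu1" type: "ReLU" bottom: "fc1" top: "fc1" }
"""

# Procedural style: the NN is built by executing script code (Keras-style sketch).
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(128, input_dim=784))  # plays the role of the InnerProduct layer
model.add(Activation('relu'))
```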
26. How to write NNs in text format
Write NNs in declarative configuration files
High portability: the configuration files are easy to parse and to reuse in other frameworks.
Low flexibility: most static data formats do not support structured programming, so it is hard to write complex NNs.

Write NNs by procedural scripting
Low portability: it requires much effort to port NNs to other frameworks.
High flexibility: users can exploit the abstraction power of the scripting language when building NNs.
28. 2. How to build computational graphs
Build once, run several times:
  Define how to compute the loss (once, before the loop)
  Prepare the training dataset
  Initialize the NN parameters
  Repeat until meeting some criterion:
    Prepare the next (mini)batch
    Compute the loss (forward prop)
    Compute the gradient (backprop)
    Update the NN parameters
  Save the NN parameters

Build one at every iteration:
  Prepare the training dataset
  Initialize the NN parameters
  Repeat until meeting some criterion:
    Prepare the next (mini)batch
    Define how to compute the loss of this batch
    Compute the loss (forward prop)
    Compute the gradient (backprop)
    Update the NN parameters
  Save the NN parameters
29. 2. How to build computational graphs
Build once, run several times
Computational graphs are built once, before entering the loop.
E.g.: most frameworks (Torch.nn, Theano, Caffe, TensorFlow, MXNet, etc.)

Build one at every iteration
Computational graphs are rebuilt at every iteration.
E.g.: autograd, Chainer
30. 2. How to build computational graphs
Build once, run several times
Easy to optimize the computations: the framework can optimize the computational graph when constructing it.
Low flexibility and usability: users cannot build different graphs for different iterations using the language's own syntax.

Build one at every iteration
Hard to optimize the computations: optimizing the graph at every iteration is generally too costly.
High flexibility and usability: users can build different graphs for different iterations using the language's own syntax.
31. Flexibility and availability of runtime language syntaxes
Example: recurrent nets for variable-length sequences
[Figure: four mini-batches containing sequences of different lengths, each requiring a differently shaped graph]
In the "build once" approach, we must build all possible graphs beforehand, or use framework-specific "control flow operators".
In the "build every time" approach, we can use the for loops of the underlying language to build such graphs, with data-dependent termination conditions, as in the sketch below.
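A sketch of the "build every time" style, assuming a define-by-run API in the spirit of Chainer or autograd (`rnn_step` and the variable-like arrays are hypothetical stand-ins):

```python
def forward(xs, h0, rnn_step):
    """Run an RNN over a sequence whose length may differ every iteration."""
    h = h0
    for x in xs:            # an ordinary Python for loop over the sequence;
        h = rnn_step(x, h)  # each call appends nodes to this iteration's graph
    return h                # backprop later backtracks whatever was recorded
```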
33. 3. How to compute backprop
Backprop through graphs
The framework builds only the graph of the forward prop, and performs backprop by backtracking through that graph.
E.g.: Torch.nn, Caffe, MXNet, Chainer

Backprop as extended graphs
The framework builds graphs for the backprop as well as for the forward prop.
E.g.: Theano, TensorFlow
[Figure: the forward graph computes z = sub(mul(a, b), c); in the extended-graph approach, gradient nodes dz, dy, da, db, dc are added as further neg and mul nodes, starting from ∇z z = 1]
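The two approaches can be illustrated on the figure's z = a·b − c example. Below is a minimal sketch of the "backprop through graphs" side: each node records a local backward rule, and gradients are accumulated by backtracking from z (all names are illustrative, and a real implementation visits nodes in reverse topological order):

```python
class Node:
    def __init__(self, value, backward=None):
        self.value, self.backward, self.grad = value, backward, 0.0

def mul(a, b):
    return Node(a.value * b.value,
                lambda g: [(a, g * b.value), (b, g * a.value)])

def sub(a, b):
    return Node(a.value - b.value, lambda g: [(a, g), (b, -g)])

def backprop(z):
    z.grad = 1.0                 # ∇z z = 1, as in the figure
    stack = [z]
    while stack:
        node = stack.pop()
        if node.backward is None:
            continue
        for parent, g in node.backward(node.grad):
            parent.grad += g
            stack.append(parent)

a, b, c = Node(2.0), Node(3.0), Node(1.0)
z = sub(mul(a, b), c)            # the forward graph from the figure
backprop(z)                      # a.grad = 3.0, b.grad = 2.0, c.grad = -1.0
```

In the extended-graph approach, the neg and mul operations that compute da, db, and dc would instead be added to the graph itself, so they can be optimized or differentiated again.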
34. 3. How to compute backprop
Backprop through graphs
Easy and simple to implement: the backprop computation need not be defined as a graph.
Low flexibility: features available for graphs may not apply to the backprop computation (e.g., applying additional backprop through it, computational optimizations, etc.).

Backprop as extended graphs
Implementation gets complicated.
High flexibility: any feature available for graphs can also be applied to the backprop computation.
36. 4. How to represent parameters
Parameters as part of operator nodes
Parameters are owned by operator nodes (e.g., convolution layers) and do not appear directly in the graphs.
E.g.: Torch.nn, Caffe, MXNet

Parameters as separate nodes in the graphs
Parameters are represented as separate variable nodes.
E.g.: Theano, Chainer, TensorFlow

[Figure: left, x → Affine (which owns W and b) → y; right, x, W, and b are all input nodes to an Affine node producing y]
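A minimal sketch of the two representations for the Affine example above (plain NumPy, illustrative names):

```python
import numpy as np

# (a) Parameters hidden in the operator (Torch.nn / Caffe style):
# the layer object owns W and b; the graph only sees x -> Affine -> y.
class Affine:
    def __init__(self, n_in, n_out):
        self.W = 0.01 * np.random.randn(n_in, n_out)
        self.b = np.zeros(n_out)
    def __call__(self, x):
        return x.dot(self.W) + self.b

# (b) Parameters as separate variable nodes (Theano / TensorFlow style):
# W and b enter the graph like any other input variable.
def affine(x, W, b):
    return x.dot(W) + b
```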
37. 4. How to represent parameters
Parameters as part of operator nodes
Intuitiveness: this representation resembles the classical formulation of NNs.
Low flexibility and reusability: we cannot do the same things to parameters that can be done to variable nodes.

Parameters as separate nodes in the graphs
High flexibility and reusability: any operation that can be applied to variable nodes can also be applied to parameters.
38. 5. How to update parameters
Update parameters by own routines outside of the graphs
Update formulae are implemented directly using the backend array libraries.
E.g.: Torch.nn, Caffe, MXNet, Chainer

Represent update formulae as part of the graphs
Update formulae are built as part of the computational graphs.
E.g.: Theano, TensorFlow
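A sketch of both styles for plain SGD. Part (a) is framework-agnostic array code; part (b) uses Theano's `updates` mechanism, where the assignment to the shared variable W is itself part of the compiled graph (the model is a toy linear regression, chosen only for brevity):

```python
import numpy as np
import theano
import theano.tensor as T

# (a) Update outside the graph: ordinary array operations, run after
# the gradient gW has been computed:
#     W -= lr * gW

# (b) Update as part of the graph (Theano):
x, t = T.matrix('x'), T.matrix('t')
W = theano.shared(np.zeros((10, 1)), name='W')
loss = T.mean((x.dot(W) - t) ** 2)
gW = T.grad(loss, W)
train = theano.function([x, t], loss,
                        updates=[(W, W - 0.01 * gW)])  # the assignment node
```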
39. 5. How to update parameters
Update parameters by own routines outside of the graphs
Easy to implement: we can use any feature of the array backend when writing update formulae.
Poor integration: update formulae are not integrated into the computational graphs.

Represent update formulae as part of the graphs
Implementation gets complicated: the framework must support assign or update operations within the computational graphs.
Tight integration: optimizations, for example, can also be applied to the update formulae.
41. 6. How to achieve the computational performance

Transform the graphs to optimize the computations
There are many ways to optimize the computations. Theano supports various optimizations; TensorFlow does simple ones.

Provide easy ways to write custom operator nodes
Users can write their own operator nodes, optimized for their purposes. Torch, MXNet, and Chainer provide ways to write one piece of code that runs on both CPU and GPU, as in the sketch below. Chainer also provides ways to write custom CUDA kernels without manual compilation steps.
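As an example of the "one code for CPU and GPU" style, here is a sketch of a custom operation in the Chainer manner: `chainer.cuda.get_array_module` returns NumPy or CuPy depending on where the input array lives, so a single implementation covers both devices (the `leaky_relu` function itself is illustrative):

```python
from chainer import cuda

def leaky_relu(x, slope=0.1):
    xp = cuda.get_array_module(x)    # numpy for CPU arrays, cupy for GPU arrays
    return xp.maximum(x, slope * x)  # = x if x >= 0 else slope * x (slope < 1)
```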
42. 7. How to scale the computations
Multi-GPU parallelization
Nowadays, most popular frameworks support multi-GPU computation. Multi-GPU (one machine) is enough for most use cases today.

Distributed computation (i.e., multi-node parallelization)
Some frameworks also support distributed computation to scale learning further. MXNet uses a simple distributed key-value store. TensorFlow uses gRPC, and will also support easy-to-use cloud environments. CNTK uses simple MPI.
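The data-parallel pattern behind both columns can be sketched framework-agnostically: each worker (a GPU or a node) computes gradients on its shard of the minibatch, the gradients are combined (via all-reduce, a key-value store, or MPI), and all replicas apply the same update (all names here are illustrative):

```python
def train_step(params, shards, workers, lr):
    # each worker computes gradients on its own shard of the minibatch
    grads = [w.compute_grad(params, s) for w, s in zip(workers, shards)]
    # combine the gradients, e.g. by averaging (all-reduce / key-value store)
    avg = {k: sum(g[k] for g in grads) / len(grads) for k in params}
    # every replica applies the same update, keeping parameters in sync
    return {k: v - lr * avg[k] for k, v in params.items()}
```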
43. Ease and comfort of writing NNs
• So far, I have mainly explained the capabilities of each framework
  • But capability does not cover everything that matters in a framework comparison
• The choice of framework also depends on the ease and comfort of writing NNs in it
  • Many people choose Torch for research because Lua is simple and fast, so they do not have to care about performance (in most cases)
• Trial and error is important here, again (as it is in deep learning research itself)
• The choice of framework ultimately depends on your preference
  • The capabilities are still important, to satisfy your demands
44. Summary
• The important differences between frameworks lie in how computational graphs are defined and how they are used
• There are several design choices in framework development
• Each of them influences performance and flexibility (i.e., the range of easily representable NNs and their learning procedures)
• Once your demands are satisfied, choose the framework you feel comfortable with (this strongly depends on your own preferences!)
45. Conclusion
• We introduced the basics of NNs, typical designs of their implementations, and the pros/cons of various design choices
• Deep learning is an emerging field developing at increasing speed, so quick trial and error is crucial for research and development in this field
• In that sense, it is important to use frameworks, which provide highly reusable parts of NNs
• There is a growing number of frameworks, each with different characteristics, so it is also important to choose one appropriate for your purpose