The document introduces primitiv, a neural network toolkit based on computation graphs with dynamic construction and lazy evaluation. It discusses different strategies for constructing computation graphs: static, dynamic (define-by-run), and dynamic with lazy evaluation. primitiv takes the dynamic-with-lazy-evaluation approach, which allows interactive graph construction while still enabling just-in-time optimization. The document then gives an overview of primitiv's design goals: simplicity, compactness, device independence, implicit minibatching, and support for multiple languages.
2. Agenda
• Basics of neural networks with computation graphs
• Design details and examples of primitiv
• An example usage
14–17. Function and Variable of Graph
• A computation graph consists of Variables and Functions (in the figure: variables X1, X2, Y, each paired with its gradient gX1, gX2, gY, connected by a Function with forward/backward routines f_fw and f_bw).
• Variable: represents actual values and gradients.
• Function: specifies the forward/backward calculation.
• A Function takes 0 or more arguments and returns 1 or more results.
19–20. Combined Functions
• Any subgraph that starts and ends with Functions can be treated as one Function.
• Example (figure): the chain parameter(W) → matmul → parameter(b) → add → u → ReLU mapping X (with gradient gX) to Y (with gradient gY) can be wrapped as a single "Linear" Function. The "Linear" function in some toolkits owns its parameters itself and applies these 2–3 functions internally.
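A rough sketch of such a combined function, written with the primitiv API that appears later in this deck (the helper name `linear` and its parameters are illustrative, not part of the toolkit):
Node linear(Node &x, Parameter &pw, Parameter &pb) {
  Node w = F::parameter<Node>(pw);  // weight matrix W
  Node b = F::parameter<Node>(pb);  // bias vector b
  return F::matmul(w, x) + b;       // u = Wx + b; apply an activation such as ReLU outside
}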
21. 3 Strategies to Construct Computation Graphs
• Difference: when/how the graph is constructed and the results are calculated.
• Static construction
  • Caffe, Torch, TensorFlow, etc.
• Dynamic construction (define-by-run)
  • Chainer, PyTorch, etc.
• Dynamic construction with lazy evaluation
  • DyNet, PyTorch (partially), primitiv
27–31. Dynamic Construction (define-by-run)
• Graph construction and actual calculation are performed simultaneously: each operation is evaluated as soon as it is added to the graph.
• Example (Run 1): matmul(3, 5) = 15 → add 2 → 17 → tanh → 0.9… → add 3 → 3.9…
• Example (Run 2): a new graph is built with new inputs: matmul(9, 2) = 18 → add -1 → 17 → tanh → 0.9… → add 9 → 9.9…
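An eager sketch of Run 1 using primitiv's Tensor interface (introduced later in this deck), which likewise computes every value as soon as the operation is issued; the scalar Shapes rely on the equivalence rule (a scalar is a 1x1 matrix) described later, and Tensors do not record a graph, so this only illustrates the eager timing:
devices::Naive dev;                          // CPU device
Tensor x = F::input<Tensor>(Shape({}), {3}, dev);
Tensor w = F::input<Tensor>(Shape({}), {5}, dev);
Tensor b = F::input<Tensor>(Shape({}), {2}, dev);
Tensor u = F::matmul(w, x) + b;              // 15, then 17: each value is computed immediately
Tensor y = F::tanh(u) + x;                   // 0.9… + 3 = 3.9…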
32–34. Dynamic Construction with Lazy Evaluation
• Consists of 2 steps:
1. Construct the graph using only the types (shapes) of the values.
2. Perform the actual computation (forward/backward) along the graph when a result is queried.
• Example: the same graph (matmul → add → tanh → add) is first built with all values unknown; querying the final node then triggers the forward computation and fills in 15 → 17 → 0.9… → 3.9…
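The same example in primitiv's lazy Node style (the Graph/Node API appears later in this deck): the graph is built first, and the forward computation runs only when the value is queried. Device/Graph setup as in the later snippets is assumed:
devices::Naive dev;
Device::set_default(dev);
Graph g;
Graph::set_default(g);
Node x = F::input<Node>(Shape({}), {3});
Node w = F::input<Node>(Shape({}), {5});
Node b = F::input<Node>(Shape({}), {2});
Node y = F::tanh(F::matmul(w, x) + b) + x;   // only shapes are determined here
std::vector<float> ret = y.to_vector();      // query: forward pass runs now, ret[0] == 3.9…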
35. Pros/cons of each strategy
• Static
  • Capable of strong compile-time optimization
  • Difficult to construct graphs interactively
• Dynamic (define-by-run)
  • Capable of constructing graphs interactively
  • More overhead and harder to optimize
• Dynamic + Lazy
  • Also capable of interactive graph construction
  • Can apply just-in-time optimization
  • Requires a 2-pass traversal over the graph: shapes are always calculated, values only on demand
  • Whole-graph optimization is still difficult
37. primitiv: Dynamic+Lazy NN Toolkit
• Originally forked from DyNet
• The whole set of components has been restructured
• Concepts
  • Simple
  • Compact
  • Device/environment independent
  • Implicit minibatching
  • Multiple language support
38. primitiv: Simple
• Consists of only the essential functionality.
• Pointless features are mostly omitted.
• Lower learning cost.
• Even so, user code does not become long: an encoder-decoder model can be implemented in about 300 lines of C++ (see the examples in the repository).
39. primitiv: Compact
• For a minimal installation, you only need GCC/Clang and CMake:
$ git clone https://github.com/primitiv/primitiv
$ cd primitiv
$ cmake .
$ make
$ make install
$ echo "That's all."
• If you need to use specific hardware (e.g. CUDA), all you need to do is add the corresponding build switch:
$ cmake . -DPRIMITIV_USE_CUDA=ON
40–42. primitiv: Device/environment Independent
• Device-specific code and network structure are completely separated.
• Once the model is written, the code can be executed on any (even unknown) hardware with no modification: the same predict() runs on CPU, CUDA, OpenCL, or anything else.
#include <primitiv/primitiv.h>
using namespace primitiv;
namespace F = primitiv::functions;
Node predict(Node &x, Parameter &w, Parameter &b) {
  Node ww = F::parameter<Node>(w);
  Node bb = F::parameter<Node>(b);
  return F::tanh(F::matmul(ww, x) + bb) + x;
}
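For illustration, switching hardware only means constructing a different Device object; predict() itself is untouched. A minimal sketch using the Device/default-device API from the later slides (shapes and initializers chosen arbitrarily here):
devices::Naive cpu;          // CPU backend
// devices::CUDA gpu(0);     // swap this in (and set it as default) to run on CUDA
Device::set_default(cpu);
Graph g;
Graph::set_default(g);
Parameter w(Shape({2, 2}), initializers::XavierUniform());
Parameter b(Shape({2}), initializers::Constant(0));
Node x = F::input<Node>(Shape({2}), {1, -1});
Node y = predict(x, w, b);   // identical call regardless of the device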
43–46. primitiv: Implicit Minibatching
• Most networks can be applied to both single and minibatched data without modification.
Node predict(
  Node &x,
  Parameter &w,
  Parameter &b)
{
  Node ww = F::parameter<Node>(w);
  Node bb = F::parameter<Node>(b);
  return F::tanh(F::matmul(ww, x) + bb) + x;
}
• Single data: input 3 → output 3.9
• 3-minibatched data: inputs 3, 4, 5 → outputs 3.9, 4.9, 5.9
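A rough sketch of the corresponding calls: the minibatch size enters only through the input Shape, while predict() is unchanged. The parameters `w` and `b` are assumed to be set up so that the network computes x + tanh(wx + b), as in the figures above:
// Single data: one scalar input
Node x1 = F::input<Node>(Shape({}), {3});
Node y1 = predict(x1, w, b);             // one result (3.9 in the example above)
// Minibatched data: three scalars packed into one Node; predict() is untouched
Node x3 = F::input<Node>(Shape({}, 3), {3, 4, 5});
Node y3 = predict(x3, w, b);             // three results (3.9, 4.9, 5.9 above)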
52. Core Components of primitiv
• Shape
• Device and Tensor
• Graph and Node
• Parameter and Optimizer
• Other functionalities
53–57. Shape
• A Shape represents the volume and the minibatch size of the data.
• A scalar: Shape({}) (1 value)
• A column vector: Shape({3}) (3 values)
• A matrix: Shape({3, 4}) (12 values)
• 5 matrices (a minibatch of 5): Shape({3, 4}, 5) (60 values)
58. Shape Equivalence Rule
• Trailing dimensions of size 1 are identical to omitting them:
Shape({3, 1}) == Shape({3})  (matrix == column vector)
Shape({1}) == Shape({})  (column vector == scalar)
Shape({2, 3, 4, 1, 1, 1}) == Shape({2, 3, 4})
• A minibatch of size 1 is identical to single data:
Shape({2, 3, 4}, 1) == Shape({2, 3, 4})
59. Minibatch Broadcasting Rule
• Arguments of n-ary (n ≥ 2) functions/operators with minibatch size 1 are implicitly broadcast.
x = data with Shape({2, 2}, 123);
y = data with Shape({2, 2}, 123);
z = data with Shape({2, 2});
w = data with Shape({2, 2}, 42);
F::matmul(x, y);   // result: Shape({2, 2}, 123)
                   // the operation is performed for each minibatch element separately
F::matmul(z, w);   // result: Shape({2, 2}, 42)
                   // `z` is implicitly broadcast
F::sum({x, y, z}); // result: Shape({2, 2}, 123)
                   // `z` is implicitly broadcast
F::sum({y, z, w}); // Error! Different minibatch sizes (123 vs 42) cannot be combined
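A small concrete sketch of the broadcasting rule, using the Tensor API from the following slides (values chosen arbitrarily for illustration):
devices::Naive dev;
// `a` has minibatch size 2, `c` has minibatch size 1
Tensor a = F::input<Tensor>(Shape({2}, 2), {1, 2, 3, 4}, dev);
Tensor c = F::input<Tensor>(Shape({2}), {10, 20}, dev);
Tensor s = a + c;  // `c` is broadcast over the minibatch:
                   // result has Shape({2}, 2) with values {11, 22, 13, 24}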
60–62. Device
• Device objects manage the actual subroutines and the memory management on specific hardware.
• All hardware-related code (e.g., CUDA) is encapsulated in the Device.
• (Diagram) Applications talk to a unified "Device" interface, which dispatches to CPU-specific routines, CUDA-specific routines, or routines for other hardware.
63. Tensor
• Tensor is the most elementary data interface.
• Each Tensor is bound to a Device and holds Device-specific memory and a Shape describing the layout of the data.
• Calculation is performed by eager evaluation: results are obtained immediately.
• (Diagram) A Tensor consists of a reference to its Device, the Device-specific memory, and a Shape.
64. Snippet:
Using Device and Tensor
#include <primitiv/primitiv.h>
using namespace primitiv;
// primitiv::functions has many functions for Tensor.
namespace F = primitiv::functions;
devices::Naive dev1; // Initializes CPU device
devices::CUDA dev2(0); // Initializes CUDA on GPU 0
devices::CUDA dev3(1); // Initializes CUDA on GPU 1
// `dev1` -- `dev3` have the same "Device" interface.
65. Snippet:
Using Device and Tensor
// Making a new Tensor on `dev1`
Shape s({2, 2});
std::vector<float> data {1, 2, 3, 4}; // column-major
Tensor x1 = F::input<Tensor>(s, data, dev1);
// Making a 2-dimensional identity matrix on `dev2`
Tensor x2 = F::identity<Tensor>(2, dev2);
// Move x1 onto `dev2`
Tensor x11 = F::copy(x1, dev2);
// Math
Tensor x3 = x11 + x2; // x3 == {2, 2, 3, 5}
Tensor xe = x1 + x2; // Error: different device
Tensor x4 = F::exp(x1);
std::vector<float> ret = x4.to_vector(); // {2.7,7.4,20.,55.}
66. Default Device
• The "Device" argument of each function can be
omitted using the default device.
devices::CUDA dev(0);
// Specifies `dev` as the default
Device::set_default(dev);
// Same as F::input<Tensor>(shape, data, dev);
Tensor x = F::input<Tensor>(shape, data);
67–68. Graph and Node
• A Graph object represents a computation graph and its states (in the figure: input/parameter nodes connected by matmul, add, tanh, add).
• A Node object represents a variable node in the Graph; it holds only a reference to the Graph and a variable ID.
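The snippets on the following slides assume that a Graph (and a Device) has been created and registered as the default, as done explicitly in the backpropagation snippet later:
devices::Naive dev;
Device::set_default(dev);  // values of new Nodes live on this device
Graph g;
Graph::set_default(g);     // newly created Nodes are added to `g`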
69–70. Adding new Nodes into the Graph
• Simply apply functions to add new calculations to the Graph.
• Node has a similar interface to Tensor: math functions and arithmetic operations.
Node x = F::input<Node>(shape, data); // adds an `input` node
Node y = F::exp(x);                   // adds an `exp` node connected to `x`
71–73. Lazy Evaluation through Nodes
• Unlike Tensor, a Node is just a placeholder for values and does not invoke any actual computation when it is created.
• When a value is explicitly queried, all required calculations are invoked and the result is returned.
std::vector<float> ret = y.to_vector(); // query: runs `input` and `exp`, then returns the values
74–75. Lazy Evaluation through Nodes
• Once a result is calculated, the Node caches the value, and the cache is reused by future queries.
• Unused values are never calculated: querying one node invokes only the calculations it depends on, and unrelated nodes stay unevaluated.
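A small sketch of both properties, using only the Node functions shown above (a default Graph and Device are assumed to be set as before):
Node x = F::input<Node>(Shape({2}), {1, 2});
Node y = F::exp(x);
Node z = F::tanh(x);                   // never queried below
std::vector<float> a = y.to_vector();  // invokes `input` and `exp`; `tanh` is never invoked
std::vector<float> b = y.to_vector();  // reuses the cached result, no recomputation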
76. Parameter
• A Parameter object represents a trainable parameter in the network.
• Its values can be used as a variable in a Graph, and its gradients are updated through the Graph.
• Initial values can be specified by hand or via an Initializer object.
• (Diagram) A Parameter consists of a reference to its Device, the values, the cumulative gradients, and other statistics.
77. Optimizer
• An Optimizer manages an update policy (SGD, Adam, etc.) for Parameters.
• It consumes the gradient information held by each Parameter to update its values.
• It also registers, in each Parameter, the statistics that the update policy requires.
  • I.e., all statistics about a Parameter are stored in the Parameter object itself; the Optimizer does not keep such information.
78. Snippet:
Initializing Parameter/Optimizer
// Device
devices::CUDA dev(0);
Device::set_default(dev);
// Parameter/Optimizer
Parameter p1(Shape({3}), {1, 2, 3});
Parameter p2(Shape({3}), initializers::Uniform(-1, 1));
// Uses a uniform distribution for the initial values.
// Optimizer
optimizers::SGD opt(0.1); // Initializes SGD with LR=0.1.
opt.add(p1, p2); // Registers `p1` and `p2` to the optimizer.
79. Backpropagation
• Backpropagation can be performed through Nodes by invoking the backward() function.
  • Tensors cannot perform backpropagation because they do not manage gradients or computation graphs.
• If the computation graph contains Parameters, their gradients are updated by backward().
80. Snippet: Backpropagation
Graph g;
Graph::set_default(g); // Make g as the default.
Parameter p(Shape({3}), {1, 2, 3});
optimizers::SGD opt(0.1);
opt.add(p);
Node w = F::parameter(p);
Node x = F::input(Shape({3}), {2, 3, 5});
Node y = w * x; // Elementwise multiplication
y.to_vector(); // {2, 6, 15}
opt.reset_gradients(); // Make all gradients of parameters 0.
y.backward(); // Performs the backpropagation.
p.gradient().to_vector(); // {2, 3, 5}
opt.update(); // Performs the SGD rule:
              // {1, 2, 3} - 0.1 * {2, 3, 5}
p.value().to_vector(); // {0.8, 1.7, 2.5}
86. Code 1: Initialization
• Including headers and declaring the main function
#include <iostream>
#include <vector>
#include <primitiv/primitiv.h>
using namespace std;
using namespace primitiv;
int main() {
devices::Naive dev; // uses CPU
Graph g;
Device::set_default(dev);
Graph::set_default(g);
// All code will be described here.
return 0;
}
87. Code 2: Parameter and Optimizer
• We have 4 parameters: W_hy, b_y, W_xh, b_h.
(in main function)
constexpr unsigned N = 8; // #hidden units
Parameter pw_xh({N, 2}, initializers::XavierUniform());
Parameter pb_h({N}, initializers::Constant(0));
Parameter pw_hy({1, N}, initializers::XavierUniform());
Parameter pb_y({}, initializers::Constant(0));
constexpr float learning_rate = 0.1;
optimizers::SGD opt(learning_rate);
opt.add(pw_xh, pb_h, pw_hy, pb_y);
88. Code 3: Writing The Network
• Using a lambda:
(in main function)
auto feedforward = [&](const Node &x) {
namespace F = primitiv::functions;
const Node w_xh = F::parameter<Node>(pw_xh); // Shape({N, 2})
const Node b_h = F::parameter<Node>(pb_h); // Shape({N})
const Node w_hy = F::parameter<Node>(pw_hy); // Shape({1, N})
const Node b_y = F::parameter<Node>(pb_y); // Shape({})
const Node h = F::tanh(F::matmul(w_xh, x) + b_h); // Shape({N}, B)
return F::tanh(F::matmul(w_hy, h) + b_y); // Shape({}, B)
};
89. Code 4: Loss Function
• Similar to the main network:
(in main function)
auto squared_loss = [](const Node &y, const Node &t) {
namespace F = primitiv::functions;
const Node diff = y - t; // Shape({}, B)
return F::batch::mean(diff * diff); // Shape({})
};
90. Code 5: Making The Minibatch
• This part is outside the toolkit; it depends only on the data source.
(in main function)
constexpr float data_sd = 1.0;
constexpr float noise_sd = 0.1;
DataSource data_source(data_sd, noise_sd);
auto next_data = [&](unsigned minibatch_size) {
std::vector<float> data;
std::vector<float> labels;
for (unsigned i = 0; i < minibatch_size; ++i) {
float x1, x2, t;
std::tie(x1, x2, t) = data_source();
data.emplace_back(x1);
data.emplace_back(x2);
labels.emplace_back(t);
}
namespace F = primitiv::functions;
return std::make_tuple(
F::input<Node>(Shape({2}, minibatch_size), data), // input data `x`
F::input<Node>(Shape({}, minibatch_size), labels)); // label data `t`
};
91–92. Code 6: Training Loop
(in main function)
for (unsigned epoch = 0; epoch < 100; ++epoch) {
g.clear();
// Initializes the computation graph
Node x, t;
std::tie(x, t) = next_data(1000); // Obtains the next data
const Node y = feedforward(x); // Calculates the network
const Node loss = squared_loss(y, t); // Calculates the loss
std::cout << epoch << ": train loss=" << loss.to_float() << std::endl;
// Performs backpropagation and updates parameters
opt.reset_gradients();
loss.backward();
opt.update();
}
• Compiling and running:
$ g++ -std=c++11 code.cc -lprimitiv
$ ./a.out
0: loss=1.17221
1: loss=1.07423
2: loss=1.06282
3: loss=1.04641
4: loss=1.00851
5: loss=1.01904
...
93. Code 7: Testing
(in main function)
for (unsigned epoch = 0; epoch < 100; ++epoch) {
(Training process written in the previous code block)
if (epoch % 10 == 9) {
namespace F = primitiv::functions;
const vector<float> test_x_data {1, 1, -1, 1, -1, -1, 1, -1};
const vector<float> test_t_data {1, -1, 1, -1};
const Node test_x = F::input<Node>(Shape({2}, 4), test_x_data);
const Node test_t = F::input<Node>(Shape({}, 4), test_t_data);
const Node test_y = feedforward(test_x);
const Node test_loss = squared_loss(test_y, test_t);
std::cout << "test results:";
for (float val : test_y.to_vector()) {
std::cout << ' ' << val;
}
std::cout << "ntest loss: " << test_loss.to_float() << std::endl;
}
}