Gated Feedback Recurrent Neural Networks
Matsuo lab. paper reading session
Jul.17 2015
School of Engineering, The University of Tokyo
Hiroki Kurotaki
kurotaki@weblab.t.u-tokyo.ac.jp
Contents
2
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Contents
3
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Paper Information
・Gated Feedback Recurrent Neural Networks
・Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio
Dept. IRO, Universite de Montreal, CIFAR Senior Fellow
・Proceedings of The 32nd International Conference on Machine Learning, pp.
2067–2075, 2015 (ICML 2015)
・First submitted to arXiv.org on 9 Feb 2015
・Cited by 9 (Google Scholar, Jul 17 2015)
・http://jmlr.org/proceedings/papers/v37/chung15.html
4
Contents
5
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Introduction 1/3
・They propose a novel recurrent neural network (RNN) architecture,
the Gated Feedback RNN (GF-RNN).
・GF-RNN allows connections from upper layers to lower layers,
and controls these signals with global gating units.
6
(Each circle represents a layer consisting of recurrent units (e.g., LSTM cells))
Introduction 2/3
・The proposed GF-RNN outperforms the baseline methods in these tasks.
7
1. Character-level Language Modeling
(given a prefix of structured data, predict the remaining characters)
Introduction 3/3
・The proposed GF-RNN outperforms the baseline methods in these tasks.
8
2. Python Program Evaluation
(predict the script's execution result from the input, given as a raw character sequence)
( [Zaremba 2014] Figure 1)
Contents
9
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Related works(Unit) : Long short-term memory
・An LSTM cell is just a single neuron,
・but it learns when to memorize, forget and expose its content value
(standard update equations below)
10
( [Zaremba 2014] Figure 1)
(The notation used in the figure is slightly different from this paper)
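For reference, the standard LSTM updates (in the common notation, as in [Zaremba 2014]; it may differ slightly from the paper's):

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{new memory content}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{forget / memorize}\\
h_t &= o_t \odot \tanh(c_t) && \text{expose}
\end{aligned}
```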
Related works(Unit) : Gated recurrent unit
・Cho et al. 2014
・Like LSTM, it adaptively resets (forgets) or updates (inputs) its memory content.
・Unlike LSTM, it has no output gate
・It adaptively balances the previous and new memory contents (equations below)
11
( [Cho 2014] Figure 2)
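In equations (the standard form of [Cho 2014]; notation again may differ slightly from the paper's):

```latex
\begin{aligned}
r_t &= \sigma(W_r x_t + U_r h_{t-1}) && \text{reset (forget) gate}\\
z_t &= \sigma(W_z x_t + U_z h_{t-1}) && \text{update (input) gate}\\
\tilde{h}_t &= \tanh(W x_t + U (r_t \odot h_{t-1})) && \text{new memory content}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{adaptive balance}
\end{aligned}
```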
Related works(Architecture) : Conventional Stacked RNN
・Each circle represents a layer consisting of many recurrent units
・Several hidden recurrent layers are stacked to model and capture hierarchical
structure across short- and long-term dependencies.
12
Related works(Architecture) : Clockwork RNN
・The i-th hidden module is updated only every 2^(i-1) timesteps
・Neurons in a faster module i receive connections from neurons in a slower
module j only if the clock periods satisfy T_i < T_j (update rule below)
13
( [Koutnik 2014] Figure 1)
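The update schedule can be written compactly (following [Koutnik 2014], with the exponential clock periods used above):

```latex
h_t^{(i)} =
\begin{cases}
f\left(W^{(i)} h_{t-1} + W_{in}^{(i)} x_t\right) & \text{if } t \bmod T_i = 0\\
h_{t-1}^{(i)} & \text{otherwise},
\end{cases}
\qquad T_i = 2^{\,i-1},
```

where the block-row W^(i) is masked so that module i only reads from modules j with T_j ≥ T_i.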
Contents
14
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Proposed Method : Gated Feedback RNN
・Generalizes the Clockwork RNN in both its connectivity and its update rates
・Signals flow back from the upper recurrent layers into the lower layers
・"Global reset gates" adaptively control when to connect each pair of layers.
(Small bullets on the edges)
15
h*_{t-1} : the concatenation of all the
hidden states from the previous
timestep (t-1)
g^{i→j} : the global reset gate
from layer i in timestep (t-1)
to layer j in timestep t
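Written out (following the paper's formulation; for j = 1 the lower input h_t^{j-1} is the external input x_t), the gate and the resulting hidden-state update for a tanh GF-RNN are:

```latex
\begin{aligned}
g^{i \to j} &= \sigma\left(\mathbf{w}_g^{i \to j}\, \mathbf{h}_t^{j-1} + \mathbf{u}_g^{i \to j}\, \mathbf{h}_{t-1}^{*}\right)\\
\mathbf{h}_t^{j} &= \tanh\left(W^{j-1 \to j}\, \mathbf{h}_t^{j-1} + \sum_{i=1}^{L} g^{i \to j}\, U^{i \to j}\, \mathbf{h}_{t-1}^{i}\right)
\end{aligned}
```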
Proposed Method : GF-RNN with LSTM unit
・The global reset gates are used only when computing the new memory content (sketch below)
16
( [Zaremba 2014] Figure 1)
(The notation used in the figure is slightly different from this paper)
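Concretely, the gated-feedback sum enters only the new memory content (the input, forget and output gates are computed as in a standard stacked LSTM); a sketch following the paper:

```latex
\tilde{c}_t^{\,j} = \tanh\left(W_c^{\,j-1 \to j}\, \mathbf{h}_t^{j-1} + \sum_{i=1}^{L} g^{i \to j}\, U_c^{\,i \to j}\, \mathbf{h}_{t-1}^{i}\right)
```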
Proposed Method : GF-RNN with GRU unit
・The global reset gates are used only when computing the new memory content (sketch below)
17
( [Cho 2014] Figure 2)
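Analogously for the GRU, the feedback sum appears only in the candidate activation, modulated by the unit's own reset gate r_t^j (a sketch following the paper):

```latex
\tilde{h}_t^{\,j} = \tanh\left(W^{\,j-1 \to j}\, \mathbf{h}_t^{j-1} + \mathbf{r}_t^{j} \odot \sum_{i=1}^{L} g^{i \to j}\, U^{\,i \to j}\, \mathbf{h}_{t-1}^{i}\right)
```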
Contents
18
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Experiment : Tasks (Language Modeling)
・Given a prefix of structured data, predict the remaining characters.
19
Experiment : Tasks (Language Modeling)
・Hutter dataset
・English Wikipedia; contains 100 MBytes of characters, including Latin
alphabets, non-Latin alphabets, XML markup and special characters
・Training set : the first 90 MBytes
Validation set : the next 5 MBytes
Test set : the last 10 MBytes
・Performance measure :
the average number of bits-per-character (BPC); see the formula below
20
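As a reminder, BPC is the average negative log-probability of the correct next character, in base 2 (lower is better):

```latex
\mathrm{BPC} = -\frac{1}{T}\sum_{t=1}^{T} \log_2 p\left(x_t \mid x_{<t}\right)
```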
Experiment : Models (Language Modeling)
・3 RNN architectures : single, (conventional) stacked, Gated-feedback
・3 recurrent units : tanh, LSTM, Gated Recurrent Unit (GRU)
・The number of parameters is constrained to be roughly 1000
・Detail
- RMSProp & momentum (update rule sketched below)
- 100 epochs
- learning rate : 0.001 (GRU, LSTM)
5×10^(-5) (tanh)
- momentum coef. : 0.9
- Each update is done using a
minibatch of 100 subsequences
of length 100 each.
21
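For reference, one common formulation of RMSProp combined with momentum (a sketch only; the paper's exact variant and ε are not given on the slide), with momentum coefficient μ = 0.9 and learning rate η as above:

```latex
\begin{aligned}
r_t &= 0.9\, r_{t-1} + 0.1\, g_t^{2} && \text{running average of squared gradients}\\
v_t &= \mu\, v_{t-1} - \frac{\eta}{\sqrt{r_t} + \epsilon}\, g_t && \text{momentum step}\\
\theta_t &= \theta_{t-1} + v_t && \text{parameter update}
\end{aligned}
```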
Experiment : Results and Analysis (Language Modeling)
22
・GF-RNN works well when used together with GRU and LSTM units
・but fails to improve performance with tanh units
・GF-RNN with LSTM is better than the non-gated variant (the bottommost row)
Experiment : Results and Analysis (Language Modeling)
23
・The stacked LSTM failed to close the tags with </username> and
</contributor> in both trials
・However, the GF-LSTM succeeded in closing both of them,
which suggests that it learned the structure of XML tags
Experiment : Additional results (Language Modeling)
24
・They trained another GF-RNN with LSTM with a larger number of
parameters, and obtained comparable results.
・(They write it is better than the previously reported best results,
but there is a non-RNN work that achieved 1.278 BPC)
Experiment : Tasks (Python Program Evaluation)
・Input : a Python program ending with a print statement; 41 symbols
Output : the result of the print statement; 13 symbols
・Scripts used in this task include addition, multiplication, subtraction,
for-loops, variable assignment, logical comparison and if-else statements
・Both the input & output are sequences of characters (example below)
・Nesting : [1, 5]
・Length : [1, 10]
25
( [Zaremba 2014] Figure 1)
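For concreteness, the example shown in [Zaremba 2014] Figure 1 (an input script and its target output):

```python
# One example input script ([Zaremba 2014], Figure 1);
# the model reads it character by character.
j = 8584
for x in range(8):
    j += 920
b = (1500 + j)
print((b + 7567))
# Target output (the character sequence the model must produce): 25011
```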
Experiment : Models (Python Program Evaluation)
・RNN encoder-decoder approach, previously used for the translation task
・Encoder RNN : the hidden state of the encoder RNN is unfolded for 50 timesteps
・Decoder RNN : its initial hidden state is initialized with the last hidden state of the
encoder RNN (sketch below)
・Detail
- GRU & LSTM, with and without gated feedback
- 3 hidden layers for each Encoder &
Decoder RNN
- hidden layer contains : 230 units (GRU)
200 units (LSTM)
- mixed curriculum strategy [Zaremba '14]
- Adam [Kingma '14]
- minibatch with 128 sequences
- 30 epochs
26
( [Cho 2014] Figure 1)
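A minimal sketch of the encoder side (hypothetical names, NumPy, and a plain tanh layer for brevity, not the paper's 3-layer gated model), showing how the encoder's last hidden state becomes the decoder's initial state:

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 41, 16                         # input vocabulary size, hidden size (illustrative)
W_in = 0.1 * rng.standard_normal((H, V))
W_hh = 0.1 * rng.standard_normal((H, H))

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def encode(token_ids):
    """Run the encoder over the input characters; return its last hidden state."""
    h = np.zeros(H)
    for t in token_ids:               # unfolded over the input (50 timesteps in the paper)
        h = np.tanh(W_in @ one_hot(t, V) + W_hh @ h)
    return h

h0_decoder = encode([3, 1, 4, 1, 5])  # the decoder RNN starts generation from this state
```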
Experiment : Results & Analysis (Python Program Evaluation)
・From the 3rd column, GF-RNN is better on almost all target scripts.
27
(Result tables: GRU, LSTM)
Contents
28
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Conclusion
・They proposed a novel architecture for deep stacked RNNs which uses gated-feedback
connections between different layers.
・The proposed method outperformed previous results in the tasks of character-level
language modeling and Python program evaluation.
・The gated-feedback architecture is faster and better (in performance) than the
standard stacked RNN even with the same amount of capacity.
・A more thorough investigation into the interaction between the gated-feedback
connections and the role of the recurrent activation function is required in the future
(because the proposed gated-feedback architecture works poorly with
the tanh activation function)
29
References
[Cho 2014] Cho, Kyunghyun, van Merrienboer, Bart, Gulcehre, Caglar, Bougares, Fethi, Schwenk, Holger,
and Bengio, Yoshua. Learning phrase representations using RNN encoder-decoder for statistical machine
translation. arXiv preprint arXiv:1406.1078, 2014.
[Koutnik 2014] Koutnik, Jan, Greff, Klaus, Gomez, Faustino, and Schmidhuber, Jürgen. A Clockwork RNN. In
Proceedings of the 31st International Conference on Machine Learning (ICML'14), 2014.
[Schmidhuber 1992] Schmidhuber, Jürgen. Learning complex, extended sequences using the principle of
history compression. Neural Computation, 4(2):234–242, 1992.
[Stollenga 2014] Stollenga, Marijn F, Masci, Jonathan, Gomez, Faustino, and Schmidhuber, Jürgen. Deep
networks with internal selective attention through feedback connections. In Advances in Neural
Information Processing Systems, pp. 3545–3553, 2014.
[Zaremba 2014] Zaremba, Wojciech and Sutskever, Ilya. Learning to execute. arXiv preprint arXiv:1410.4615,
2014.
30