Gated Feedback Recurrent Neural Networks
Matsuo lab. paper reading session
Jul.17 2015
School of Engineering, The University of Tokyo
Hiroki Kurotaki
kurotaki@weblab.t.u-tokyo.ac.jp
Contents
2
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Contents
3
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Paper Information
・Gated Feedback Recurrent Neural Networks
・Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio
Dept. IRO, Universite de Montreal, CIFAR Senior Fellow
・Proceedings of The 32nd International Conference on Machine Learning, pp.
2067–2075, 2015 (ICML 2015)
・First submitted to arXiv.org on 9 Feb 2015
・Cited by 9 (Google Scholar, Jul 17 2015)
・http://jmlr.org/proceedings/papers/v37/chung15.html
4
Contents
5
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Introduction 1/3
・They propose a novel recurrent neural network (RNN) architecture,
the Gated Feedback RNN (GF-RNN).
・GF-RNN allows connections from upper layers to lower layers,
and controls these signals with global gating units.
6
(Each circle represents a layer consisting of recurrent units (e.g., LSTM cells))
Introduction 2/3
・The proposed GF-RNN outperforms the baseline methods in these tasks.
7
1. Character-level Language Modeling
(given a prefix of structured data, predict the remaining characters)
Introduction 3/3
・The proposed GF-RNN outperforms the baseline methods in these tasks.
8
2. Python Program Evaluation
(predict the script's execution result from the input, given as a raw character sequence)
( [Zaremba 2014] Figure 1)
Contents
9
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Related works(Unit) : Long short-term memory
・An LSTM cell is just a single neuron,
・but it learns when to memorize, forget and expose its content value
(standard update equations below)
10
( [Zaremba 2014] Figure 1)
(The notation used in the figure is slightly different from this paper)
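For reference, the standard LSTM updates (in the common notation, as in [Zaremba 2014]; it may differ slightly from the paper's):

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{new memory content}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{forget / memorize}\\
h_t &= o_t \odot \tanh(c_t) && \text{expose}
\end{aligned}
```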
Related works(Unit) : Gated recurrent unit
・Cho et al. 2014
・Like LSTM, it adaptively resets (forgets) or updates (inputs) its memory content.
・Unlike LSTM, it has no output gate
・It adaptively balances the previous and new memory contents (equations below)
11
( [Cho 2014] Figure 2)
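In equations (the standard form of [Cho 2014]; notation again may differ slightly from the paper's):

```latex
\begin{aligned}
r_t &= \sigma(W_r x_t + U_r h_{t-1}) && \text{reset (forget) gate}\\
z_t &= \sigma(W_z x_t + U_z h_{t-1}) && \text{update (input) gate}\\
\tilde{h}_t &= \tanh(W x_t + U (r_t \odot h_{t-1})) && \text{new memory content}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{adaptive balance}
\end{aligned}
```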
Related works(Architecture) : Conventional Stacked RNN
・Each circle represents a layer consisting of many recurrent units
・Several hidden recurrent layers are stacked to model and capture hierarchical
structure across short- and long-term dependencies.
12
Related works(Architecture) : Clockwork RNN
・The i-th hidden module is updated only every 2^(i-1) timesteps
・Neurons in a faster module i receive connections from neurons in a slower
module j only if the clock periods satisfy T_i < T_j (update rule below)
13
( [Koutnik 2014] Figure 1)
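The update schedule can be written compactly (following [Koutnik 2014], with the exponential clock periods used above):

```latex
h_t^{(i)} =
\begin{cases}
f\left(W^{(i)} h_{t-1} + W_{in}^{(i)} x_t\right) & \text{if } t \bmod T_i = 0\\
h_{t-1}^{(i)} & \text{otherwise},
\end{cases}
\qquad T_i = 2^{\,i-1},
```

where the block-row W^(i) is masked so that module i only reads from modules j with T_j ≥ T_i.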
Contents
14
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Proposed Method : Gated Feedback RNN
・Generalizes the Clockwork RNN in both its connectivity and its update rates
・Signals flow back from the upper recurrent layers into the lower layers
・"Global reset gates" adaptively control when to connect each pair of layers.
(Small bullets on the edges)
15
h*_{t-1} : the concatenation of all the
hidden states from the previous
timestep (t-1)
g^{i→j} : the global reset gate
from layer i in timestep (t-1)
to layer j in timestep t
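Written out (following the paper's formulation; for j = 1 the lower input h_t^{j-1} is the external input x_t), the gate and the resulting hidden-state update for a tanh GF-RNN are:

```latex
\begin{aligned}
g^{i \to j} &= \sigma\left(\mathbf{w}_g^{i \to j}\, \mathbf{h}_t^{j-1} + \mathbf{u}_g^{i \to j}\, \mathbf{h}_{t-1}^{*}\right)\\
\mathbf{h}_t^{j} &= \tanh\left(W^{j-1 \to j}\, \mathbf{h}_t^{j-1} + \sum_{i=1}^{L} g^{i \to j}\, U^{i \to j}\, \mathbf{h}_{t-1}^{i}\right)
\end{aligned}
```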
Proposed Method : GF-RNN with LSTM unit
・The global reset gates are used only when computing the new memory content (sketch below)
16
( [Zaremba 2014] Figure 1)
(The notation used in the figure is slightly different from this paper)
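Concretely, the gated-feedback sum enters only the new memory content (the input, forget and output gates are computed as in a standard stacked LSTM); a sketch following the paper:

```latex
\tilde{c}_t^{\,j} = \tanh\left(W_c^{\,j-1 \to j}\, \mathbf{h}_t^{j-1} + \sum_{i=1}^{L} g^{i \to j}\, U_c^{\,i \to j}\, \mathbf{h}_{t-1}^{i}\right)
```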
Proposed Method : GF-RNN with GRU unit
・The global reset gates are used only when computing the new memory content (sketch below)
17
( [Cho 2014] Figure 2)
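Analogously for the GRU, the feedback sum appears only in the candidate activation, modulated by the unit's own reset gate r_t^j (a sketch following the paper):

```latex
\tilde{h}_t^{\,j} = \tanh\left(W^{\,j-1 \to j}\, \mathbf{h}_t^{j-1} + \mathbf{r}_t^{j} \odot \sum_{i=1}^{L} g^{i \to j}\, U^{\,i \to j}\, \mathbf{h}_{t-1}^{i}\right)
```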
Contents
18
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Experiment : Tasks (Language Modeling)
・Given a prefix of structured data, predict the remaining characters.
19
Experiment : Tasks (Language Modeling)
・Hutter dataset
・English Wikipedia; contains 100 MBytes of characters, including Latin
alphabets, non-Latin alphabets, XML markup and special characters
・Training set : the first 90 MBytes
Validation set : the next 5 MBytes
Test set : the last 10 MBytes
・Performance measure :
the average number of bits-per-character (BPC); see the formula below
20
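As a reminder, BPC is the average negative log-probability of the correct next character, in base 2 (lower is better):

```latex
\mathrm{BPC} = -\frac{1}{T}\sum_{t=1}^{T} \log_2 p\left(x_t \mid x_{<t}\right)
```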
Experiment : Models (Language Modeling)
・3 RNN architectures : single, (conventional) stacked, Gated-feedback
・3 recurrent units : tanh, LSTM, Gated Recurrent Unit (GRU)
・The number of parameters is constrained to be roughly 1000
・Detail
- RMSProp & momentum (update rule sketched below)
- 100 epochs
- learning rate : 0.001 (GRU, LSTM)
5×10^(-5) (tanh)
- momentum coef. : 0.9
- Each update is done using a
minibatch of 100 subsequences
of length 100 each.
21
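For reference, one common formulation of RMSProp combined with momentum (a sketch only; the paper's exact variant and ε are not given on the slide), with momentum coefficient μ = 0.9 and learning rate η as above:

```latex
\begin{aligned}
r_t &= 0.9\, r_{t-1} + 0.1\, g_t^{2} && \text{running average of squared gradients}\\
v_t &= \mu\, v_{t-1} - \frac{\eta}{\sqrt{r_t} + \epsilon}\, g_t && \text{momentum step}\\
\theta_t &= \theta_{t-1} + v_t && \text{parameter update}
\end{aligned}
```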
Experiment : Results and Analysis (Language Modeling)
22
・GF-RNN works well when used together with GRU and LSTM units
・but fails to improve performance with tanh units
・GF-RNN with LSTM is better than the non-gated variant (the bottommost row)
Experiment : Results and Analysis (Language Modeling)
23
・The stacked LSTM failed to close the tags with </username> and
</contributor> in both trials
・However, the GF-LSTM succeeded in closing both of them,
which suggests that it learned the structure of XML tags
Experiment : Additional results (Language Modeling)
24
・They trained another GF-RNN with LSTM with a larger number of
parameters, and obtained comparable results.
・(They write it is better than the previously reported best results,
but there is a non-RNN work that achieved 1.278 BPC)
Experiment : Tasks (Python Program Evaluation)
・Input : a Python program ending with a print statement; 41 symbols
Output : the result of the print statement; 13 symbols
・Scripts used in this task include addition, multiplication, subtraction,
for-loops, variable assignment, logical comparison and if-else statements
・Both the input & output are sequences of characters (example below)
・Nesting : [1, 5]
・Length : [1, 10]
25
( [Zaremba 2014] Figure 1)
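For concreteness, the example shown in [Zaremba 2014] Figure 1 (an input script and its target output):

```python
# One example input script ([Zaremba 2014], Figure 1);
# the model reads it character by character.
j = 8584
for x in range(8):
    j += 920
b = (1500 + j)
print((b + 7567))
# Target output (the character sequence the model must produce): 25011
```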
Experiment : Models (Python Program Evaluation)
・RNN encoder-decoder approach, previously used for the translation task
・Encoder RNN : the hidden state of the encoder RNN is unfolded for 50 timesteps
・Decoder RNN : its initial hidden state is initialized with the last hidden state of the
encoder RNN (sketch below)
・Detail
- GRU & LSTM, with and without gated feedback
- 3 hidden layers for each Encoder &
Decoder RNN
- hidden layer contains : 230 units (GRU)
200 units (LSTM)
- mixed curriculum strategy [Zaremba '14]
- Adam [Kingma '14]
- minibatch with 128 sequences
- 30 epochs
26
( [Cho 2014] Figure 1)
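A minimal sketch of the encoder side (hypothetical names, NumPy, and a plain tanh layer for brevity, not the paper's 3-layer gated model), showing how the encoder's last hidden state becomes the decoder's initial state:

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 41, 16                         # input vocabulary size, hidden size (illustrative)
W_in = 0.1 * rng.standard_normal((H, V))
W_hh = 0.1 * rng.standard_normal((H, H))

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def encode(token_ids):
    """Run the encoder over the input characters; return its last hidden state."""
    h = np.zeros(H)
    for t in token_ids:               # unfolded over the input (50 timesteps in the paper)
        h = np.tanh(W_in @ one_hot(t, V) + W_hh @ h)
    return h

h0_decoder = encode([3, 1, 4, 1, 5])  # the decoder RNN starts generation from this state
```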
Experiment : Results & Analysis (Python Program Evaluation)
・From the 3rd column, GF-RNN is better on almost all target scripts.
27
(Result tables: GRU, LSTM)
Contents
28
・Paper information
・Introduction
・Related Works
・Proposed Methods
・Experiments, Results and Analysis
・Conclusion
Conclusion
・They proposed a novel architecture for deep stacked RNNs which uses gated-feedback
connections between different layers.
・The proposed method outperformed previous results in the tasks of character-level
language modeling and Python program evaluation.
・The gated-feedback architecture is faster and better (in performance) than the
standard stacked RNN even with the same amount of capacity.
・A more thorough investigation into the interaction between the gated-feedback
connections and the role of the recurrent activation function is required in the future
(because the proposed gated-feedback architecture works poorly with
the tanh activation function)
29
References
[Cho 2014] Cho, Kyunghyun, van Merrienboer, Bart, Gulcehre, Caglar, Bougares, Fethi, Schwenk, Holger,
and Bengio, Yoshua. Learning phrase representations using RNN encoder-decoder for statistical machine
translation. arXiv preprint arXiv:1406.1078, 2014.
[Koutnik 2014] Koutnik, Jan, Greff, Klaus, Gomez, Faustino, and Schmidhuber, Jürgen. A Clockwork RNN. In
Proceedings of the 31st International Conference on Machine Learning (ICML'14), 2014.
[Schmidhuber 1992] Schmidhuber, Jürgen. Learning complex, extended sequences using the principle of
history compression. Neural Computation, 4(2):234–242, 1992.
[Stollenga 2014] Stollenga, Marijn F, Masci, Jonathan, Gomez, Faustino, and Schmidhuber, Jürgen. Deep
networks with internal selective attention through feedback connections. In Advances in Neural
Information Processing Systems, pp. 3545–3553, 2014.
[Zaremba 2014] Zaremba, Wojciech and Sutskever, Ilya. Learning to execute. arXiv preprint arXiv:1410.4615,
2014.
30