Memory Networks (End-to-End Memory Networks の Chainer 実装)

Memory Networks
2017/9/14 @shuyo

今日のお話
• Memory Networks
– 「記憶」のモデル化
– 「知識」を用いた推論
– 複数のタスクへの応用（汎用性?）
– 実装してみた

Long Short Term Memory
• これも記憶のモデル化

LSTM の記憶
• 固定⾧ベクトル
– 扱える情報量に限界がある
• 「文」をまたがない
– 文＝1つながりの入出力
– 複数の文を使った推論(対話、応答、……)に
応用するのが難しい

(End-to-End) Memory Networks
[Sukhbaatar 2015]

知識の記憶
番目の知識文の番目の単語(1-hot)
単語を分散表現ベクトルに変換する行列

質問との関連度
質問文の番目の単語(1-hot)
単語を分散表現ベクトルに変換する行列
質問と各との関連度、

応答の出力
実際には 𝑚 の代わりに出力用の分散表現 𝑐 を生成して使う
𝑾 =応答情報を回答(単語)に変換する行列

返答の生成に質問を
入力するのは?
• 質問によって、同じ知識から異なる返答

IGOR
I G
O
R
• [Weston 2014] によるモデル概要
• Input
• Generalization 記憶の更新(格納)
• Output
• Response
http://deeplearning.hatenablog
.com/entry/memory_networks
によれば「ちなみに I, G, O, R の
名前の由来はフランケンシュタイ
ン博士の助手イゴール (IGOR) で
ある」とのことだが、ソースが見
つからない
実際は記憶ストレージに
文ベクトルを追加格納してるだけ
多分ここをまじめにモデル化
しないと AGI への道は開かない

Dynamic Memory Networks
[Kumar 2016]
• “Most tasks in natural language processing
can be cast into question answering (QA)
problems over language input.”
– Abstract の第1文
– NLPのほとんどのタスクは質問応答に帰着できる
• Memory Networks はいろんな問題が解ける!
– 質問応答、テキスト分類()、品詞タグ付け、etc

End-to-End Memory Networks
• : 番目の知識文(単語列)
• : 質問文、 : 正解
• : 知識文ベクトル
• : 質問文ベクトル
• : 質問と知識の関連度
• : 応答情報を生成
• : 返答(単語)
データ
知識の関連度用、応答用、
質問用それぞれの
分散表現を個別に学習

Multi Layers
• 応答情報を質問にフィードバックし、記憶の
知識との関連度の推論を再度行う
– 複雑な推論を表現できることを期待
• 後述の three-supporting-facts など
– 3層重ねることを推奨

Position Encoding
• 素朴には単語ベクトルの総和を文ベクトルとする
– "Mary handed the football to John." と
"John passed the football to Mary." が区別できない
• ベクトルの要素に文中の位置の情報を持たせる
– は文⾧、は分散表現の次元、、は要素積
– 𝑚 = 1 − 𝑨 ∑ 𝒙 + − 1 𝑨 ∑ 𝒙 と変形すれば、固定⾧の処理に落とせる

Temporal Encoding と正則化
• 素朴なモデルは関連度の推定に記憶の時刻を用いない
– “Sandra moved to the garden.” と
“Sandra journeyed to the bathroom.” の順序を区別できない
• 各に時刻の重み情報を加える
– ∗ ∗ : 各時刻の重み、 : 質問時刻
– 特定の訓練データに過適合した
重みを推定してしまう可能性
• 10% の確率で記憶に0ベクトルを挿入
– 学習時のみ

Linear Start
• 中間のソフトマックス層を学習初期に取り除く
– 全て線形演算だけになり、速い?
• 確かに1割ほどは速くなる
– 局所解につかまりにくい?
• validation loss が下がらなくなったら元に戻すから、
初期値の選び方程度の影響しか無い
– 学習率<1e-5 でも inf に吹っ飛ぶ
• 全部線形だからランクが落ちて、逆行列が潰れてる?
• 勾配のノルムが 40 以下になるようにスカラー倍調整という
おかしなことに手を染めさせられる

bAbI dataset
https://research.fb.com/downloads/babi/
• Memory Networks 用に作成された
超超やさしい質問応答のデータセット(baby!)
– 語彙極小(10～40)、回答は1単語
– 否定文を含むのは qa9 のみ
[qa1_single-supporting-fact]
1 Mary moved to the bathroom.
2 John went to the hallway.
3 Where is Mary? bathroom 1
4 Daniel went back to the hallway.
5 Sandra moved to the garden.
6 Where is Daniel? hallway 4
質問、回答、
参照すべき知識
知識
※ End-to-End Memory
Networks では使わない
qa knowledge vocaburary answer
qa1 2000 19 6
qa2 4338 33 6
qa3 14796 34 6
qa4 2000 14 6
qa5 5038 39 7
qa6 2066 33 2
qa7 2638 39 4
qa8 2634 34 14
qa9 2000 22 2
qa10 2000 21 3
qa11 2000 26 6
qa12 2000 20 6
qa13 2000 26 6
qa14 2372 25 6
qa15 2000 17 4
qa16 9000 17 4
qa17 250 16 2
qa18 1230 16 2
qa19 5000 19 12
qa20 1000 35 7
knowledge は総知識数
vocaburary は知識＋質問の語彙数
answer は回答種類数

[qa9_simple-negation]
1 Mary is no longer in the bedroom.
2 Daniel moved to the hallway.
3 Is Mary in the bedroom? no 1
4 Sandra moved to the bedroom.
5 Sandra is in the bathroom.
6 Is Daniel in the bathroom? no 2
[qa20_agents-motivations]
1 Sumit is tired.
2 Where will sumit go? bedroom 1
3 Sumit went back to the bedroom.
4 Why did sumit go to the bedroom? tired 1
5 Sumit grabbed the pajamas there.
6 Why did sumit get the pajamas? tired 1
エスパーか!

[qa3_three-supporting-facts]
1 Mary moved to the bathroom.
2 Sandra journeyed to the bedroom.
3 Mary got the football there.
4 John went back to the bedroom.
5 Mary journeyed to the office.
6 John journeyed to the office.
7 John took the milk.
8 Daniel went back to the kitchen.
9 John moved to the bedroom.
10 Daniel went back to the hallway.
11 Daniel took the apple.
12 John left the milk there.
13 John travelled to the kitchen.
14 Sandra went back to the bathroom.
15 Daniel journeyed to the bathroom.
16 John journeyed to the bathroom.
17 Mary journeyed to the bathroom.
18 Sandra went back to the garden.
19 Sandra went to the office.
20 Daniel went to the garden.
21 Sandra went back to the hallway.
22 Daniel journeyed to the office.
23 Mary dropped the football.
24 John moved to the bedroom.
25 Where was the football before the bathroom? office 23 17 5
Mary が football を
持って office から
bathroom に移動した

実装
• End-to-End Memory Networks の Chainer 実装
– https://github.com/shuyo/iir/blob/master/dnn/e2emn.py
– 200行ちょい(モデル本体は50行)
– bAbI 1000件(質問数) の 100エポックの学習が5～10分
• Sparse Matrix が使えるライブラリなら多分もうちょい速くで
きる
– CPU/GPU 両対応だが、CPU の方が速い(苦笑
• データ積むオーバーヘッドがでかい?? うまく書けば解消で
きると思う

実験(qa1～20)
• 隠れユニット50, 100epoch
• 初期値を変えて各5回推論、一番良い
結果を採用
– 上述の実装は正解率を出す。論文に合わ
せて error rate に変換
• validation data = test data (手抜き)
– PE = Position Encoding
– LS = Linear Start
– RN = Random Noise (Temporal
Encoding の正則化)

[Sukhbaatar] の結果と比較

References
• Weston, Jason, Sumit Chopra, and Antoine Bordes.
"Memory networks." arXiv preprint
arXiv:1410.3916 (2014).
• Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus.
"End-to-end memory networks." Advances in neural
information processing systems. 2015.
• Kumar, Ankit, et al. "Ask me anything: Dynamic
memory networks for natural language
processing." International Conference on Machine
Learning. 2016.
• [赤本] 坪井,海野,鈴木, 深層学習による自然言語処理 (機械学
習プロフェッショナルシリーズ), 2017

Memory Networks (End-to-End Memory Networks の Chainer 実装)

Recommended

Recommended

More Related Content

More from Shuyo Nakatani

More from Shuyo Nakatani (20)

Memory Networks (End-to-End Memory Networks の Chainer 実装)