NIPS2013 Reading Group: Distributed Representations of Words and Phrases and their Compositionality

2014/01/23
NIPS2013 Reading Group @ The University of Tokyo

Distributed Representations of Words and Phrases and their Compositionality
Preferred Infrastructure, Inc.
Yuya Unno (@unnonouno)

About me

Yuya Unno (@unnonouno)
- Preferred Infrastructure (PFI)
  - Jubatus project leader
  - http://jubat.us
- Specialty
  - Natural language processing
  - Text mining

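Overview
- A sequel to Mikolov's ICLR 2013 paper (word2vec)
  - Berlin - Germany + France = Paris!!
- The story: cut corners in the computation to make it faster, and for some reason the results got better too
  - Before: training took on the order of days
  - After: 15 to 30 minutes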

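word2vec [Mikolov+13]
- A method for building a vector that represents the "meaning" of each word
- Searching for the word nearest to vec(Berlin) - vec(Germany) + vec(France) returned vec(Paris)
  - How the vectors are built is explained on the next slide

[Figure: the vectors for Berlin, Germany, France, and Paris!!]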
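The nearest-word search described on this slide can be sketched as follows. This is a minimal illustration, not the word2vec implementation: the 4-dimensional vectors are hand-made toy values (an assumption for illustration), and cosine similarity is assumed as the closeness measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, a, b, c):
    """Return the word nearest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))


# Hypothetical toy vectors; axes are (German, French, Japanese, capital city).
TOY_VECTORS = {
    "Berlin": [1.0, 0.0, 0.0, 1.0],
    "Germany": [1.0, 0.0, 0.0, 0.0],
    "France": [0.0, 1.0, 0.0, 0.0],
    "Paris": [0.0, 1.0, 0.0, 1.0],
    "Tokyo": [0.0, 0.0, 1.0, 1.0],
    "Japan": [0.0, 0.0, 1.0, 0.0],
}

print(analogy(TOY_VECTORS, "Berlin", "Germany", "France"))
```

With real embeddings the vectors are learned rather than hand-written, and the query words are usually excluded from the candidates, as done here.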

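Objective function of the Skip-gram model [Mikolov+13]
- Input corpus: w_1, w_2, …, w_T (each w_i is a word)

[Formula: the objective function; the goal is to maximize it]

c is the context size, around 5.

v_w is a vector (of some suitable dimension) that represents word w; these vectors are what we want to estimate.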
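The objective is an image on the slide and is missing from this text. For reference, the Skip-gram objective and the softmax that defines p, written in the notation of the paper under discussion, are given below. The output-side vectors v' and the vocabulary size W are the paper's symbols; of these, only W appears in the surviving slide text.

```latex
\frac{1}{T} \sum_{t=1}^{T} \; \sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) =
  \frac{\exp\!\left( {v'_{w_O}}^{\top} v_{w_I} \right)}
       {\sum_{w=1}^{W} \exp\!\left( {v'_{w}}^{\top} v_{w_I} \right)}
```

The denominator sums over the whole vocabulary, which is the cost the rest of the talk tries to avoid.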

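The problem

- The vocabulary is so large that computing the ∑ is expensive
  - W = 10^5 to 10^7
- How to cut corners in this computation efficiently is the main subject of the paper

(From [Mikolov+13])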

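Hierarchical Softmax (HS) [Morin+05]

[Figure: a binary tree with internal nodes n1, n2, n3 and leaf words "apple", "mandarin orange", "curry", "ramen". Annotations: "take the product over all nodes from the root to w"; "the vector of each node"]

σ(x) = 1/(1 + exp(-x))

- Build a tree over the words; for each node on the path from the root to a word, take the inner product with that node's vector, and multiply the sigmoids of those inner products together
- The computation becomes logarithmic in the number of words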
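The product of sigmoids along the root-to-word path can be sketched as below. The tree shape (n1 as root, n2 and n3 as its children), the node vectors, and the +1/-1 encoding of left/right branches are all assumptions made for illustration; the slide text does not preserve the actual tree layout.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical tree: each word maps to its path of (internal node, branch sign).
# +1 means "go left" at that node, -1 means "go right".
PATHS = {
    "apple": [("n1", +1), ("n2", +1)],
    "orange": [("n1", +1), ("n2", -1)],
    "curry": [("n1", -1), ("n3", +1)],
    "ramen": [("n1", -1), ("n3", -1)],
}

# Hypothetical vectors for the internal nodes.
NODE_VECTORS = {"n1": [0.5, -0.2], "n2": [0.1, 0.3], "n3": [-0.4, 0.6]}


def hs_probability(word, input_vector):
    """P(word | input word) as a product of sigmoids along the path."""
    p = 1.0
    for node, sign in PATHS[word]:
        p *= sigmoid(sign * dot(NODE_VECTORS[node], input_vector))
    return p


v_in = [1.0, 2.0]  # hypothetical vector of the input word
probs = {w: hs_probability(w, v_in) for w in PATHS}
print(probs, sum(probs.values()))
```

Because σ(x) + σ(-x) = 1 at every node, the leaf probabilities sum to 1 without ever normalizing over the whole vocabulary, and each word touches only as many nodes as its path is long.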

Noise Contrastive Estimation (NCE) [Gutmann+12]
- Skipped here, since it is off the main topic
- Apparently it approximates the distribution given by the softmax

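Negative Sampling (NEG) (proposed method 1)

log P(w_O | w_I) = log σ(v'_{w_O}^T v_{w_I}) + ∑_{i=1}^{k} E_{w_i ~ P(w)} [ log σ(-v'_{w_i}^T v_{w_I}) ]

(v' denotes the output-side vectors, as in the paper.)

- Use the formula above, which cuts even more corners than NCE
- The expectation inside the ∑ is approximated by drawing k samples
  - 5 to 20 samples are enough when the data is small, 2 to 5 when it is large
- For P(w), making it proportional to the unigram frequency raised to the 3/4 power worked best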
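The two ingredients of this slide, the noise distribution proportional to unigram frequency to the 3/4 power and the sampled objective, can be sketched as below. This is a simplified illustration under stated assumptions: the function names and toy counts are hypothetical, and the sampler does not exclude the positive word from the negatives.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def noise_distribution(counts, power=0.75):
    """P(w) proportional to count(w) ** power, normalized to sum to 1."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


def sample_negatives(noise, k, rng):
    """Draw k words from the noise distribution (with replacement)."""
    words = list(noise)
    return rng.choices(words, weights=[noise[w] for w in words], k=k)


def neg_objective(v_in, v_out_pos, v_out_negs):
    """log sigma(v_pos . v_in) + sum over sampled negatives of log sigma(-v_neg . v_in)."""
    value = math.log(sigmoid(dot(v_out_pos, v_in)))
    for v_neg in v_out_negs:
        value += math.log(sigmoid(-dot(v_neg, v_in)))
    return value


noise = noise_distribution({"the": 16, "cat": 1})
print(noise)  # "the" is 16x as frequent but only 8x as likely to be sampled
print(sample_negatives(noise, 5, random.Random(0)))
```

The 3/4 power flattens the distribution, so rare words are drawn as negatives more often than their raw frequency would suggest.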

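Subsampling of frequent words (proposed method 2)
- There is little point in modeling frequent words such as "a" and "the" well, so their frequency is discounted
- t is a suitable threshold (around 10^-5), and f(w) is the word frequency
- At this point, what was P even supposed to be...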
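The subsampling formula is an image on the slide and is missing from this text. In the paper, each occurrence of word w is discarded with probability P(w) = 1 - sqrt(t / f(w)). A minimal sketch under that formula follows; clipping negative values to 0 and the helper names are assumptions added for illustration.

```python
import math
import random


def discard_probability(freq, t=1e-5):
    """P(w) = 1 - sqrt(t / f(w)); clipped at 0 for words rarer than t (an assumption)."""
    return max(0.0, 1.0 - math.sqrt(t / freq))


def subsample(tokens, t=1e-5, rng=None):
    """Drop each token with its word's discard probability; f(w) is relative frequency."""
    rng = rng or random.Random()
    total = len(tokens)
    counts = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    return [
        tok
        for tok in tokens
        if rng.random() >= discard_probability(counts[tok] / total, t)
    ]


# A word making up 0.1% of the corpus is dropped about 90% of the time at t = 1e-5.
print(discard_probability(1e-3))
```

Words with frequency at or below t are always kept, while very frequent words are dropped almost every time they occur.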

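Experimental results
- Evaluated on the analogical reasoning task used in [Mikolov+13]
  - Find vec(Paris) by nearest-neighbor search around vec(Berlin) - vec(Germany) + vec(France)
- NEG is more accurate than Hierarchical Softmax and NCE
- Subsampling is also effective

[Results table from the paper; slide annotations on its columns: "smaller is better" and "larger is better"]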

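Experiments on phrases

- Word pairs with a high value of a suitable score function (the formula on the slide) are extracted as phrases (δ is a suitable discount coefficient)
- After that, the experiments proceed in the same way
  - How the objective function was designed from the word and phrase scores does not seem to be spelled out...?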
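The score formula is an image on the slide and is missing from this text. In the paper it is score(w_i, w_j) = (count(w_i w_j) - δ) / (count(w_i) × count(w_j)). A minimal sketch of that scoring follows; the function name and the toy sentence are hypothetical.

```python
from collections import Counter


def phrase_scores(tokens, delta):
    """Score each adjacent word pair: (bigram count - delta) / (count(a) * count(b))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        (a, b): (n - delta) / (unigrams[a] * unigrams[b])
        for (a, b), n in bigrams.items()
    }


tokens = "new york is big and new york is old".split()
scores = phrase_scores(tokens, delta=1)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, score)
```

The discount δ keeps pairs that were seen only once from scoring high just because both words are rare; pairs scoring above a chosen threshold would then be treated as single tokens.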

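Results of the phrase experiments

- Without subsampling NEG does better, but with subsampling HS suddenly becomes the better one
- Increasing the dataset size and the vector dimensionality keeps improving the results
  - In the end the accuracy reached 72%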

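Adding meanings

- Simply adding the vectors of two words finds a word whose meaning combines the two
- Perhaps this is because the sum amounts to searching for words that tend to occur frequently with both of the two words (it behaves like an AND search)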
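One way to read the "AND search" intuition: under a softmax-style model, the unnormalized score of a context word is the exponential of an inner product, so adding two word vectors multiplies their scores. The paper offers a similar explanation. The sketch below checks that identity on hypothetical toy vectors; it is an illustration of the arithmetic, not a claim about trained embeddings.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unnormalized_score(context_vec, word_vec):
    """exp(context . word): the numerator of a softmax over context words."""
    return math.exp(dot(context_vec, word_vec))


# Hypothetical vectors for two words and one candidate context word.
word_a = [0.2, -0.1]
word_b = [0.4, 0.3]
context = [1.0, 0.5]

summed = [x + y for x, y in zip(word_a, word_b)]
score_of_sum = unnormalized_score(context, summed)
product_of_scores = unnormalized_score(context, word_a) * unnormalized_score(context, word_b)

print(score_of_sum, product_of_scores)
```

A product is large only when both factors are large, which is why the summed vector favors contexts that fit both words.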

Discussion
- What do these vectors represent?
- What happens when we take the softmax?
- What do addition and subtraction of the vectors represent?
- A realization of the Distributional Hypothesis?
  - "words that occur in the same contexts tend to have similar meanings" (Wikipedia)

References
- [Mikolov+13] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. ICLR 2013.
- [Morin+05] Frederic Morin and Yoshua Bengio. Hierarchical probabilistic neural network language model. AISTATS 2005.
- [Gutmann+12] Michael U. Gutmann and Aapo Hyvärinen. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. JMLR 2012.
