NIPS 2013 Reading Group - DeViSE: A Deep Visual-Semantic Embedding Model

NIPS 2013 Reading Group @ University of Tokyo, 2014-01-23

DeViSE: A Deep Visual-Semantic Embedding Model

Seiya Tokui
Preferred Infrastructure

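About me

• Seiya Tokui (@beam2d)
• Preferred Infrastructure, Inc., researcher on the Jubatus project
• Mathematics → M.S. in Information Science and Technology (machine learning)
  – Specialty: nearest neighbor search via learning to hash (Nakagawa Lab)
• Recently: deep learning and video analysis
  – Speaking next week at the Whole Brain Architecture study group (2nd meeting)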

Paper presented

DeViSE: A Deep Visual-Semantic Embedding Model
Andrea Frome*, Greg S. Corrado*, Jonathon Shlens*, Samy Bengio, Jeffrey Dean, Marc'Aurelio Ranzato†, Tomas Mikolov. (Google, Inc.)
*These authors contributed equally.
†Current affiliation: Facebook, Inc.

• Zero-shot learning for image classification
• Combines a deep convolutional neural network with word2vec
• Yields a deep CNN that outputs a word embedding vector for an image

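Zero-shot learning

• Predict a label $y$ for an image $x$
• The training data and the test data are each a set of pairs $(x, y)$
• Label set of the training data: $Y_{\mathrm{train}}$
• Label set of the test data: $Y_{\mathrm{test}}$
• The setting assumes $Y_{\mathrm{train}} \cap Y_{\mathrm{test}} = \emptyset$
• That is, no training images exist for the labels that must be predicted at evaluation time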
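The disjointness condition on the label sets can be checked mechanically. The following is a minimal sketch, assuming toy data: the file names, the labels, and the helper `is_zero_shot_split` are hypothetical and do not come from the paper.

```python
def label_set(pairs):
    """Collect the set of labels y appearing in a list of (x, y) pairs."""
    return {y for _, y in pairs}


def is_zero_shot_split(train_pairs, test_pairs):
    """Return True when the training and test label sets share no label."""
    return label_set(train_pairs).isdisjoint(label_set(test_pairs))


# Toy data (hypothetical): x is an image file name, y is its label.
train = [("img001.jpg", "cat"), ("img002.jpg", "dog"), ("img003.jpg", "cat")]
test = [("img101.jpg", "zebra"), ("img102.jpg", "okapi")]

print(is_zero_shot_split(train, test))   # label sets are disjoint
print(is_zero_shot_split(train, train))  # same labels on both sides
```

A split that passes this check means every test label is unseen during training, which is why auxiliary information about the labels is needed.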

Zero-shot learning

• Clearly impossible without auxiliary information
• So semantic knowledge is exploited
• In this paper, the setting allows the use of Wikipedia text data
  – This text is not directly related to image classification
• Cf. R. Socher+, Zero-Shot Learning Through Cross-Modal Transfer, ICLR 2013
  – Zero-shot learning from word embedding vectors (probably) starts here
• The task itself seems to have appeared around 2008 (e.g. H. Larochelle+, Zero-data Learning of New Tasks, AAAI 2008)

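SuperVision

• In image recognition, SuperVision, the winner of ILSVRC 2012, is well known
  – Five convolutional layers + two fully connected layers + a softmax layer
  – Over the past year, many papers based on SuperVision have appeared
• DeViSE is also based on SuperVision
  – Implemented on DistBelief, known for the cat-recognition work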

word2vec

• A word embedding learner made by Google
  – Method: SkipGram, which trains a shallow MLP to predict the surrounding words from a given word
• Also a NIPS paper: T. Mikolov+, Distributed Representations of Words and Phrases and their Compositionality, NIPS 2013
• Became famous for the curious property that adding and subtracting the vectors still returns reasonably meaningful results
  – Cf. Taku Kudo's post: https://plus.google.com/107334123935896432800/posts/JvXrjzmLVW4
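To make "predict the surrounding words from a given word" concrete, the sketch below enumerates the (center word, context word) pairs on which a skip-gram style model could be trained. The window size, the toy sentence, and the function name `skipgram_pairs` are assumptions for illustration; this is not the word2vec implementation itself.

```python
def skipgram_pairs(tokens, window=2):
    """List (center, context) pairs within `window` positions of each word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Toy sentence (hypothetical); each pair is one prediction target.
sentence = "the quick brown fox".split()
for center, context in skipgram_pairs(sentence, window=1):
    print(center, "->", context)
```

Each pair asks the model to predict the context word from the center word; the learned input-side weights for each word are what serve as its embedding vector.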

DeViSE

[Figure: image → Deep CNN (SuperVision) → softmax layer → label]

DeViSE

[Figure: label → SkipGram (word2vec) → embedding vector]

DeViSE

[Figure: image → regression layer → embedding vector; label → SkipGram (word2vec) → embedding vector]

A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, M. Ranzato, T. Mikolov.
DeViSE: A Deep Visual-Semantic Embedding Model. NIPS 2013.

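DeViSE

[Figure: image → regression layer → embedding vector; label → SkipGram (word2vec) → embedding vector; the two embedding vectors are compared with a hinge rank loss]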
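The slide names the loss without writing it out. A minimal sketch of a hinge rank loss of this kind follows: it asks the image's projected vector to score higher (by dot product) against the correct label's embedding than against every other label's embedding, by at least a margin. The function names, the toy vectors, and the margin value are assumptions for illustration; the paper should be consulted for the exact formulation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hinge_rank_loss(image_vec, label_vecs, true_label, margin=0.1):
    """Sum of margin violations of wrong labels against the true label."""
    true_score = dot(label_vecs[true_label], image_vec)
    loss = 0.0
    for label, vec in label_vecs.items():
        if label == true_label:
            continue
        # Penalize a wrong label whose score comes within `margin` of the true one.
        loss += max(0.0, margin - true_score + dot(vec, image_vec))
    return loss


# Toy embeddings (hypothetical); image_vec stands in for the regression-layer output.
label_vecs = {"cat": [1.0, 0.0], "dog": [0.8, 0.6], "car": [0.0, 1.0]}
image_vec = [0.9, 0.1]

print(hinge_rank_loss(image_vec, label_vecs, "cat"))  # true label ranks first
print(hinge_rank_loss(image_vec, label_vecs, "car"))  # true label ranks last
```

The loss is zero only when the true label beats every other label by the margin, so minimizing it pushes the image vector toward its own label's embedding and away from the others.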

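DeViSE

[Figure: image → regression layer → embedding vector; the test labels are mapped to embedding vectors by word2vec; the label is determined by nearest neighbor search among them]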
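The prediction step in this figure can be sketched as a nearest neighbor search over the test labels' embedding vectors. The use of cosine similarity, the toy vectors, and the function names are assumptions for illustration; the slide only states that the label is chosen by nearest neighbor search.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_label(image_vec, label_vecs):
    """Return the label whose embedding is most similar to image_vec."""
    return max(label_vecs, key=lambda label: cosine(image_vec, label_vecs[label]))


# Toy word embeddings of the test labels (hypothetical values).
test_label_vecs = {
    "zebra": [0.9, 0.1, 0.2],
    "okapi": [0.7, 0.3, 0.5],
    "truck": [0.0, 1.0, 0.1],
}
predicted = [0.8, 0.2, 0.3]  # stands in for the regression-layer output

print(nearest_label(predicted, test_label_vecs))
```

Because the candidates are just word embeddings, labels with no training images can be added to `test_label_vecs` without retraining the image model.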

Experiments

Experiments

Results on ordinary (non-zero-shot) classification

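Follow-up: convex combination of semantic embeddings (ConSE)

• M. Norouzi+, Zero-Shot Learning by Convex Combination of Semantic Embeddings, arXiv:1312.5650v2
  – A preprint by almost the same group of authors
  – Under submission to ICLR 2014 (open review)
  – Instead of learning a regression layer, it takes SuperVision's top-k label scores and outputs the average of the corresponding embedding vectors, weighted by the predicted scores
  – Avoids overfitting (higher accuracy on zero-shot labels)
• DeViSE overfits to the training labels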
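The score-weighted average described above can be sketched in a few lines. This is a minimal illustration, assuming toy classifier scores and two-dimensional embeddings; the function name `conse_embedding` and all values are hypothetical, and renormalizing the top-k scores so the weights sum to 1 is what makes the combination convex.

```python
def conse_embedding(scores, label_vecs, k=2):
    """Average the embeddings of the top-k labels, weighted by their scores."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    total = sum(scores[label] for label in top)
    out = [0.0] * len(label_vecs[top[0]])
    for label in top:
        weight = scores[label] / total  # weights are non-negative and sum to 1
        for i, value in enumerate(label_vecs[label]):
            out[i] += weight * value
    return out


# Toy classifier scores over training labels and their embeddings (hypothetical).
scores = {"horse": 0.6, "tiger": 0.3, "car": 0.1}
train_label_vecs = {"horse": [1.0, 0.0], "tiger": [0.0, 1.0], "car": [1.0, 1.0]}

print(conse_embedding(scores, train_label_vecs, k=2))
```

The resulting vector lives in the same space as the word embeddings, so a test label can then be chosen by nearest neighbor search, with no regression layer to train.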

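Summary

• With zero-shot learning, unseen labels (for which only semantic knowledge is available) can be predicted
• DeViSE: zero-shot learning by combining a deep CNN with word2vec
• ConSE has been proposed as a successor (simpler, with better generalization)
• Interesting as a point where representation learning for images and for language intersect
  – Reinforce image recognition with language, incorporating common sense that does not appear directly in images
  – What about the reverse? Could image information be used to reinforce semantic knowledge in a similar way?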

References

1. A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, M. Ranzato, T. Mikolov.
   DeViSE: A Deep Visual-Semantic Embedding Model. NIPS 2013.
2. A. Krizhevsky, I. Sutskever, G. Hinton.
   ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.
3. H. Larochelle, D. Erhan, Y. Bengio.
   Zero-data Learning of New Tasks. AAAI 2008.
4. M. Norouzi, T. Mikolov, S. Bengio, Y. Singer, J. Shlens, A. Frome, G. S. Corrado, J. Dean.
   Zero-Shot Learning by Convex Combination of Semantic Embeddings. arXiv:1312.5650.
5. R. Socher, M. Ganjoo, H. Sridhar, O. Bastani, C. D. Manning, A. Y. Ng.
   Zero-Shot Learning Through Cross-Modal Transfer. ICLR 2013.
