10. Pascal Sentence Dataset (2)
■ Descriptive sentences added to the Pascal dataset using Amazon MTurk
2) Cyrus Rashtchian et al., “Collecting Image Annotations Using Amazon's Mechanical Turk,” NAACL HLT 2010 Workshop
A bike painted pink sitting on a sidewalk outside a building.
An old bicycle painted almost completely pink standing against a city building.
A pink bicycle is in front of a building
A pink bicycle is parked next to a brick and concrete building.
A pink bicycle with matching tires.
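As a rough illustration of the data on this slide, one image paired with its crowd-sourced sentences can be held in a small record, and the captions can be compared by counting how many of them mention each word. This is only a sketch: the class name `AnnotatedImage`, the identifier `pink_bicycle`, and the helper `caption_document_frequency` are hypothetical and do not reflect the dataset's actual distribution format.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedImage:
    image_id: str        # hypothetical identifier, not from the dataset
    captions: List[str]  # sentences written by MTurk workers


def caption_document_frequency(item: AnnotatedImage) -> Counter:
    """Count in how many captions each lowercase word appears."""
    counts = Counter()
    for caption in item.captions:
        # A set makes each word count at most once per caption.
        words = set(re.findall(r"[a-z]+", caption.lower()))
        counts.update(words)
    return counts


example = AnnotatedImage(
    image_id="pink_bicycle",
    captions=[
        "A bike painted pink sitting on a sidewalk outside a building.",
        "An old bicycle painted almost completely pink standing against a city building.",
        "A pink bicycle is in front of a building",
        "A pink bicycle is parked next to a brick and concrete building.",
        "A pink bicycle with matching tires.",
    ],
)

df = caption_document_frequency(example)
for word in ("pink", "bicycle", "building", "bike"):
    print(word, df[word])
```

In this example `pink` occurs in all five captions while `bike` occurs in only one, which shows how much the wording varies between annotators describing the same image.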
11. Pascal Sentence Dataset (2)
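A category that is quite difficult to handle with current technology
Mostly tangible concepts
Positional relationships between objects (next slide)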
12. Grounded Language Learning (3)
3) Haonan Yu et al., “Grounded Language Learning from Video Described with Sentences,” ACL 2013
Unlike prior computer-vision approaches that learn from videos with verb labels or images with noun labels, our labels are sentences containing nouns, verbs, prepositions, adjectives, and adverbs
Object and motion features are hand-crafted.
13. Deep visual-semantic alignment (4)
4) A. Karpathy and L. Fei-Fei, “Deep visual-semantic alignments for generating image descriptions,” CVPR 2015
Image description generation with a CNN + RNN
Can also output the likelihood of each individual word (left figure)
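To make the per-word likelihood idea concrete, the sketch below shows one common way such values are read off a sequence decoder: a softmax over the vocabulary at each position, from which the probability of the emitted word is taken. It is not the model from the cited paper. The vocabulary, the score values in `step_logits`, and the function names are all made up for illustration and simply stand in for what an RNN would produce.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def word_likelihoods(step_logits, vocab, sentence):
    """Return (word, probability) for each word of `sentence`."""
    index = {w: i for i, w in enumerate(vocab)}
    result = []
    for logits, word in zip(step_logits, sentence):
        probs = softmax(logits)
        result.append((word, probs[index[word]]))
    return result


# Hypothetical toy vocabulary and decoder scores (one row per position).
vocab = ["a", "pink", "bicycle", "building"]
step_logits = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 1.5, 0.5, 0.0],
    [0.0, 0.0, 2.5, 0.5],
]
sentence = ["a", "pink", "bicycle"]

per_word = word_likelihoods(step_logits, vocab, sentence)
# The sentence log-likelihood is the sum of the per-word log-probabilities.
sentence_log_likelihood = sum(math.log(p) for _, p in per_word)

for word, p in per_word:
    print(f"{word}: {p:.3f}")
print(f"log-likelihood: {sentence_log_likelihood:.3f}")
```

Keeping the per-word probabilities, rather than only the sentence total, is what allows each word to be inspected individually.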
14. Deep visual-semantic alignment (4)
Handles only adjectives, prepositions, and the like that are close to “visible”