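The Frontier of Image Captioning and Action Recognition: A Focus on Datasets
17th STAIR Lab AI Seminar, June 28, 2018
v.20180802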
• September 2015
• October 2015
• June 2018: AIP
• Image captioning (20 min)
• Action recognition (35 min)
3
16
Example caption: "man in black shirt is playing guitar."
Neural Image Caption (NIC) [Vinyals+ 2015]
1. Encode the image with a CNN
2. Decode a caption with an LSTM
Implementations are available in Chainer, TensorFlow, and PyTorch
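The decode step can be illustrated with a greedy decoding loop. This is a minimal sketch, not the NIC implementation: `greedy_caption`, `toy_step`, and `VOCAB` are hypothetical names, the opaque "state" stands in for the LSTM hidden/cell vectors initialised from the CNN image encoding, and `toy_step` is a scripted stand-in for a trained LSTM so the loop runs without a model.

```python
from typing import Callable, Dict, List, Tuple

# step(state, previous_word) -> (probabilities over the next word, new state).
# In an NIC-style model the state would be the LSTM hidden/cell vectors,
# initialised from the CNN encoding of the image; here it is left opaque.
StepFn = Callable[[object, str], Tuple[Dict[str, float], object]]


def greedy_caption(init_state: object, step: StepFn, max_len: int = 20) -> List[str]:
    """Emit the most probable word at each step until <eos> or max_len words."""
    words: List[str] = []
    state, prev = init_state, "<bos>"
    for _ in range(max_len):
        probs, state = step(state, prev)
        prev = max(probs, key=probs.get)  # greedy choice of the next word
        if prev == "<eos>":
            break
        words.append(prev)
    return words


# Hypothetical stand-in for a trained LSTM: the state is a position counter
# and the "model" always prefers the next word of one fixed sentence.
VOCAB = ["man", "in", "black", "shirt", "is", "playing", "guitar", "<eos>"]


def toy_step(state: int, prev: str) -> Tuple[Dict[str, float], int]:
    probs = {w: 0.01 for w in VOCAB}
    probs[VOCAB[state]] = 0.9
    return probs, state + 1


print(" ".join(greedy_caption(0, toy_step)))  # man in black shirt is playing guitar
```

The NIC paper decodes with beam search; greedy decoding is the special case with a beam width of 1.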
Flickr8k
• 8,092 images from Flickr, 5 captions each
• ” ”
[Hodosh+ 2013]
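Because each image carries several captions, loaders usually group them by image ID. A minimal sketch, assuming the common distribution format in which each line reads `<image file>#<caption index>`, a tab, then the caption (check the files you actually download); `load_flickr8k_captions` and the sample lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def load_flickr8k_captions(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group captions by image, assuming lines like 'img.jpg#0<TAB>A caption.'"""
    captions: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        key, caption = line.split("\t", 1)
        image_id = key.split("#", 1)[0]  # drop the '#k' caption index
        captions[image_id].append(caption)
    return dict(captions)


# Made-up sample lines in the assumed format
sample = [
    "img_a.jpg#0\tA dog runs on the grass.",
    "img_a.jpg#1\tA brown dog is running outside.",
    "img_b.jpg#0\tTwo children play in the snow.",
]
print({k: len(v) for k, v in load_flickr8k_captions(sample).items()})
# {'img_a.jpg': 2, 'img_b.jpg': 1}
```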
Flickr30k [Young+ 2014]
• An extension of Flickr8k: 31,783 images, 5 captions each
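Flickr30k Entities [Plummer+ 2016]
• Links noun phrases in Flickr30k captions to image regions (bounding boxes)
• Used, for example, to evaluate attention correctness in captioning models [Liu+ 2017]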
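Phrase-to-region annotations like these are typically used to score phrase localization: a predicted box counts as correct when its intersection-over-union (IoU) with the annotated box reaches a threshold, commonly 0.5. A minimal sketch with boxes as `(x_min, y_min, x_max, y_max)` tuples; the function names are illustrative, not from any dataset toolkit.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def phrase_localized(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """A phrase counts as localized when IoU reaches the threshold."""
    return iou(pred, gt) >= threshold


print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))  # 0.333
```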
MS COCO [Chen+ 2015]
• Images from Flickr, 5 captions each
Example (http://cocodataset.org/#explore?id=409091):
• a lady blowing out candles on a cake
• the woman is blowing out her birthday cake candles
• a woman blowing candles on a frosted cake.
• two people blowing out candles on a cake.
• a girl is blowing out candles on a birthday cake.
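The MS COCO caption annotations are distributed as JSON whose `annotations` list holds one entry per caption, with `image_id` and `caption` fields. A sketch that groups them per image; `captions_by_image` is a hypothetical helper, and the inline JSON is a trimmed, made-up example reusing two captions shown above, not an excerpt of the real file.

```python
import json
from collections import defaultdict
from typing import Dict, List


def captions_by_image(coco: dict) -> Dict[int, List[str]]:
    """Group caption strings by image_id."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[ann["image_id"]].append(ann["caption"].strip())
    return dict(grouped)


raw = """{"annotations": [
  {"image_id": 409091, "id": 1, "caption": "a lady blowing out candles on a cake"},
  {"image_id": 409091, "id": 2, "caption": "two people blowing out candles on a cake. "}
]}"""
print(captions_by_image(json.loads(raw))[409091])
```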
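MS COCO: caption collection on Amazon Mechanical Turk (AMT)
Instructions to workers included:
• Do not start sentences with “There is”
• Each caption must contain at least 8 words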
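The “There is” and 8-word rules from the AMT instructions are mechanical enough to check automatically. A minimal sketch; `caption_guideline_issues` is a hypothetical helper, and splitting on whitespace is a simplifying assumption about what counts as a word.

```python
from typing import List


def caption_guideline_issues(caption: str) -> List[str]:
    """Return which of the two mechanical rules a caption violates."""
    issues: List[str] = []
    if caption.strip().lower().startswith("there is"):
        issues.append("starts with 'There is'")
    if len(caption.split()) < 8:  # whitespace tokens as a proxy for words
        issues.append("fewer than 8 words")
    return issues


print(caption_guideline_issues("a lady blowing out candles on a cake"))  # []
print(caption_guideline_issues("There is a cake."))
```

The remaining guidelines are semantic and still need human review.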
Visual Genome [Krishna+ 2017]
Example region descriptions: “Park bench is made of gray weathered wood”, “The man is almost bald”
• Images come from MS COCO and YFCC100M
• 1
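Visual Genome: annotation types
• Object: an object in the image, localized by a bounding box
• Attribute: a property attached to an object
• Relationship: a relation between 2 objects
  • e.g. jumping_over(man, fire hydrant)
• Region graph: the objects, attributes, and relationships within one image region
• Scene graph: the region graphs of an image merged into a single graph
Figures: Region Graph; Scene Graph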
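The relation between region graphs and the scene graph can be sketched with plain sets. This is an illustrative data structure, not the Visual Genome file format: it assumes objects are identified by name (the real release uses object IDs, so two different men in one image would stay separate there but merge here), and `Graph` and `merge_region_graphs` are hypothetical names. The example attributes echo the region descriptions above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (predicate, subject, object)


@dataclass
class Graph:
    attributes: Dict[str, Set[str]] = field(default_factory=dict)  # object -> attributes
    relationships: Set[Triple] = field(default_factory=set)


def merge_region_graphs(regions: Iterable[Graph]) -> Graph:
    """Union region graphs into one scene graph, keyed by object name."""
    scene = Graph()
    for region in regions:
        for obj, attrs in region.attributes.items():
            scene.attributes.setdefault(obj, set()).update(attrs)
        scene.relationships |= region.relationships
    return scene


r1 = Graph({"man": {"bald"}, "fire hydrant": set()},
           {("jumping_over", "man", "fire hydrant")})
r2 = Graph({"bench": {"gray", "weathered"}, "man": {"bald"}})
scene = merge_region_graphs([r1, r2])
print(sorted(scene.attributes), sorted(scene.relationships))
```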
Paragraph captions [Krause+ 2017]
• One descriptive paragraph per image, for images from Visual Genome
Two children are sitting at a table in a restaurant.
The children are one little girl and one little boy. The
little girl is eating a pink frosted donut with white icing
lines on top of it. The girl has blonde hair and is
wearing a green jacket with a black long sleeve shirt
underneath. The little boy is wearing a black zip up
jacket and is holding his finger to his lip but is not
eating. A metal napkin dispenser is in between them
at the table. The wall next to them is white brick. Two
adults are on the other side of the short white brick
wall. The room has white circular lights on the ceiling
and a large window in the front of the restaurant. It is
daylight outside.
STAIR Captions [Yoshikawa+ 2017]
• Japanese captions for MS COCO images
http://captions.stair.center/explore/
STAIR Captions
• 2100
1. 15
2.
3.
4.
5.
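The first STAIR Captions guideline involves a 15-character threshold. When checking such a rule on Japanese text, count characters (Unicode code points) rather than bytes: each kana or kanji takes 3 bytes in UTF-8. A sketch assuming the rule means "more than 15 characters, ignoring whitespace" (the exact semantics are an assumption); `long_enough` is a hypothetical helper and the sample captions are made up.

```python
def long_enough(caption: str, min_chars: int = 15) -> bool:
    """True if the caption has more than min_chars non-whitespace characters."""
    return len("".join(caption.split())) > min_chars


ok = "男性がステージでギターを弾いている。"
short = "猫がいる。"
print(len(ok), len(ok.encode("utf-8")))  # 18 54
print(long_enough(ok), long_enough(short))  # True False
```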
STAIR Captions
MS COCO Google
( ) STAIR Captions ( )
STAIR Captions
• Available at http://captions.stair.center
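Other image caption datasets
• Pascal Sentence [Rashtchian+ 2010]
  • 1,000 PASCAL VOC2008 images, 5 captions each
  • [Funaki+ 2015]
• Abstract Scenes [Zitnick+ 2013]
  • Clip-art (abstract) scenes paired with descriptions
• YJ Captions [Miyazaki+ 2016]
  • Japanese captions for MS COCO images
• Multi30k [Elliott+ 2016]
  • English-German captions for Flickr30k and MS COCO images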
Image caption datasets compared
Dataset            Images     Captions per image
Pascal Sentence      1,000    5
MS COCO            123,287    5
Flickr8k             8,092    5
Flickr30k           31,783    5
Visual Genome      108,077    50 (region descriptions)
Krause et al.       19,551    1 (paragraph)
Multi30k            31,014    5
STAIR Captions     123,287    5
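Multiplying images by captions per image gives the nominal corpus sizes. A quick sketch using the table's figures for four English datasets; these are nominal totals only, and real annotation files can differ slightly (some MS COCO images, for instance, have more than five captions).

```python
# Nominal (images, captions per image) figures from the table above.
NOMINAL = {
    "Pascal Sentence": (1_000, 5),
    "Flickr8k": (8_092, 5),
    "Flickr30k": (31_783, 5),
    "MS COCO": (123_287, 5),
}

totals = {name: n_img * per_img for name, (n_img, per_img) in NOMINAL.items()}
for name, total in totals.items():
    print(f"{name}: {total:,} captions")
```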
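Action recognition tasks
• Classification (Recognition)
  • Assign an action label to a video clip
• Temporal Localization
  • Find the start and end times of each action in an untrimmed video
• Spatial-Temporal Localization
  • Find when and where (e.g. which person) each action occurs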
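Temporal localization is commonly scored with temporal IoU (tIoU): the overlap of a predicted and an annotated time segment divided by their union, compared against a threshold. A minimal sketch with segments as `(start_sec, end_sec)`; the function name is illustrative. Spatial-temporal localization additionally requires the per-frame boxes to overlap.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Length of the overlap divided by the length of the union."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # 0.5
```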
Classification: C3D (3D CNN) [Tran+ 2015]
• Stacks of 3D convolution (Conv) and 3D pooling (Pool) layers
• Conv: 3x3x3 kernels with stride 1
• Pool: 2x2x2
Input: 3 channels, 16 frames, 112x112 pixels → output
Figure: 3D convolution (Conv)
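The effect of these layer settings on a 16-frame, 112x112 input can be checked with the standard output-length formula. A sketch under stated assumptions: convolutions use padding 1 (so 3x3x3 kernels with stride 1 keep the size) and pooling uses stride 2 (not stated on the slide); the helper names are hypothetical, and channels are not tracked.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (frames, height, width)


def out_len(n: int, kernel: int, stride: int, pad: int) -> int:
    """Standard output-length formula for convolution/pooling along one axis."""
    return (n + 2 * pad - kernel) // stride + 1


def conv3d(shape: Shape, k: int = 3, s: int = 1, p: int = 1) -> Shape:
    return tuple(out_len(n, k, s, p) for n in shape)


def pool3d(shape: Shape, k: int = 2, s: int = 2) -> Shape:
    return tuple(out_len(n, k, s, 0) for n in shape)


x = (16, 112, 112)  # 16 frames of 112x112 pixels
x = conv3d(x)       # padding 1 keeps the size: (16, 112, 112)
x = pool3d(x)       # 2x2x2 pooling with stride 2 halves each axis: (8, 56, 56)
print(x)
```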
The "MNIST" of action recognition
• HMDB51 [Kuehne+ 2011]
  • 51 classes, 6,766 clips
  • Sources include the Prelinger archive and YouTube
• UCF101 [Soomro+ 2012]
  • 101 classes, 13,320 clips
  • Collected from YouTube
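In UCF101, clips in the same "group" are cut from the same longer source video, so train/test splits are defined by group to avoid near-duplicate leakage. A sketch that parses list-file entries and checks that two splits share no (class, group) pair; it assumes entries look like `ClassName/v_ClassName_gGG_cCC.avi` with an optional numeric label and class names without underscores. `parse_entry` and `split_groups` are hypothetical helpers.

```python
import re
from typing import Iterable, Set, Tuple

# Assumed entry format: 'ClassName/v_ClassName_gGG_cCC.avi [label]'
ENTRY = re.compile(r"^(?P<cls>[^/]+)/v_[^_]+_g(?P<group>\d+)_c(?P<clip>\d+)\.avi$")


def parse_entry(line: str) -> Tuple[str, int, int]:
    """Return (class, group, clip) from one list-file line."""
    path = line.split()[0]  # drop the optional numeric label
    m = ENTRY.match(path)
    if m is None:
        raise ValueError(f"unexpected entry: {line!r}")
    return m["cls"], int(m["group"]), int(m["clip"])


def split_groups(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Set of (class, group) pairs used by a split."""
    return {parse_entry(line)[:2] for line in lines}


train = ["ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1",
         "ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c02.avi 1"]
test = ["ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi"]
print(parse_entry(train[0]))                   # ('ApplyEyeMakeup', 8, 1)
print(split_groups(train) & split_groups(test))  # set(): no shared groups
```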
ActivityNet 200 [Heilbron+ 2015]
200 1.5 2.3
• CVPR2016 ActivityNet Challenge
• ActivityNet Challenge 2017
(Untrimmed Video Classification) 8.8%
1 YouTube
ActivityNet 200 (1/4)
(1)
• American Time Use Survey (ATUS)
2000 200
(American Time Use Survey Activity Lexicon 2016)
ActivityNet 200 (2/4)
(2)
• WordNet
YouTube
ActivityNet 200 (3/4)
(3)
• (AMT)
ActivityNet 200 (4/4)
(4)
Charades [Sigurdsson+ 2016]
Evaluation metric: mean average precision (mAP)
157 action classes, ~67K annotated action instances
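Because a Charades video can contain several actions, results are reported as mAP: average precision per class, averaged over classes. A minimal sketch of uninterpolated AP (mean precision at the rank of each positive); the official evaluation script may differ in details such as tie handling or classes without positives, and the scores below are made up.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP of one class: mean precision at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(scores: List[List[float]], labels: List[List[int]]) -> float:
    """Rows are videos, columns are classes (multi-label)."""
    n_classes = len(scores[0])
    aps = [average_precision([row[c] for row in scores], [row[c] for row in labels])
           for c in range(n_classes)]
    return sum(aps) / len(aps)


scores = [[0.9, 0.1], [0.8, 0.4], [0.7, 0.3], [0.6, 0.2]]
labels = [[1, 0], [0, 1], [1, 0], [0, 0]]
print(round(mean_average_precision(scores, labels), 4))  # 0.9167
```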
Charades (1/3)
(1)
• 1 5
5
• 2 2
Charades (2/3)
(2)
• 30
Charades (3/3)
(3)
•
• 5
• 5
Charades-Ego [Sigurdsson+ 2018]
• Paired first-person and third-person videos
• Charades
• 60%
157 classes, ~4,000 video pairs
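Kinetics-400 [Kay+ 2017]
• ~10-second clips cut from YouTube videos
• Each clip comes from a different YouTube video
400 classes, ~300K clips
Evaluation: Top-1/Top-5 accuracy
Extended to 600 classes as Kinetics-600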
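Top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes, so Top-5 is never lower than Top-1. A minimal sketch with made-up scores; `top_k_accuracy` is a hypothetical helper, not a library function.

```python
from typing import List, Sequence


def top_k_accuracy(scores: List[Sequence[float]], targets: Sequence[int], k: int) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    correct = 0
    for row, target in zip(scores, targets):
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        correct += target in top
    return correct / len(targets)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
targets = [1, 1, 0]
print(top_k_accuracy(scores, targets, 1), top_k_accuracy(scores, targets, 2))
```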
Kinetics
1.
•
AMT
2.
• YouTube
•
10
3.
•
AMT
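SOMETHING-SOMETHING (v1) [Goyal+ 2017]
• Class labels are templates in which "something" stands for an object
• 1
• Example templates:
  • Holding something
  • Dropping something into something
  • Something falling like a rock
• 88.5%
174 classes, ~100K videos, 2-6 seconds each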
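A template becomes a concrete caption once each "something" placeholder is replaced with the object actually handled in the video. A minimal sketch of that substitution; `fill_template` is a hypothetical helper and the object phrases are made up.

```python
import re
from typing import Sequence

PLACEHOLDER = re.compile(r"\b[Ss]omething\b")


def fill_template(template: str, objects: Sequence[str]) -> str:
    """Replace each 'something' placeholder, left to right, with an object phrase."""
    n = len(PLACEHOLDER.findall(template))
    if n != len(objects):
        raise ValueError(f"template needs {n} objects, got {len(objects)}")
    it = iter(objects)
    # A function replacement avoids backslash-escape processing of the objects.
    return PLACEHOLDER.sub(lambda _m: next(it), template)


print(fill_template("Dropping something into something", ["a coin", "a cup"]))
# Dropping a coin into a cup
```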
AVA [Gu+ 2017]
• 14 49 17
• Bounding box
80 classes, 430 videos, 15 minutes each
AVA (1/2)
(1) YouTube
• Minutes 15 to 30 of each video
• 15 1 3 900
(2) Bounding Box
• Faster-RCNN person detector
(3) Bounding Box
• Bounding Box
AVA (2/2)
(4)
1.
2.
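Sampling the 15-minute segment (minutes 15 to 30) once per second yields 15 x 60 = 900 keyframes, which matches the 900 on the slide. A sketch assuming one keyframe per second with a 3-second window centred on each keyframe; `ava_style_keyframes` is a hypothetical helper, and the released annotation files may not cover every one of these timestamps.

```python
from typing import List, Tuple


def ava_style_keyframes(start_s: int = 15 * 60, end_s: int = 30 * 60,
                        window_s: float = 3.0) -> List[Tuple[int, Tuple[float, float]]]:
    """One keyframe per second in [start_s, end_s), each with a centred window."""
    half = window_s / 2
    return [(t, (t - half, t + half)) for t in range(start_s, end_s)]


frames = ava_style_keyframes()
print(len(frames), frames[0])  # 900 (900, (898.5, 901.5))
```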
Moments in Time [Monfort+ 2018]
339 classes, ~1M videos, 3 seconds each
Moments in Time
• Top-1: 0.39, Top-5: 0.67
(The Moments in Time Recognition Challenge 2018)
STAIR Actions (v1.0) [Yoshikawa+ 2018]
• 100 everyday action classes
• YouTube
100 classes, ~90K videos, ~5 seconds each
STAIR Actions
• PC
Wiktionary: 1000
STAIR Actions (1/4)
1. YouTube
• 4
CC0
2. 5
• 5
5
3. 5
4.
STAIR Actions (2/4)
10
5 10
5
10
5
STAIR Actions (3/4)
3
3 2
STAIR Actions (4/4)
STAIR Lab
STAIR Actions vs. Kinetics: OpenPose
STAIR Actions 95.6%, Kinetics 55.5%
STAIR Actions
• Models: 2DCNN+LSTM (LRCN), Two-stream CNN, 3DCNN
STAIR Actions
• 76.5%
• cf. Kinetics: 61.0% (Two-stream CNN)
STAIR Actions
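A two-stream CNN runs one network on RGB frames and another on optical flow, then combines their class scores. A minimal sketch of one common fusion choice, a weighted average of the two streams' softmax probabilities; the equal weighting, the helper names, and the logits are assumptions for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits: Sequence[float], flow_logits: Sequence[float],
                w_rgb: float = 0.5) -> List[float]:
    """Weighted average of the per-stream class probabilities."""
    p_rgb, p_flow = softmax(rgb_logits), softmax(flow_logits)
    return [w_rgb * a + (1 - w_rgb) * b for a, b in zip(p_rgb, p_flow)]


fused = late_fusion([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
print([round(p, 3) for p in fused])
```

Here the RGB stream alone favours class 0, but the confident flow stream shifts the fused prediction to class 1.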
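Action recognition datasets compared
Dataset                    Source                      Classes   Scale   Bounding boxes
HMDB51                     Movies, YouTube, etc.       51        6K      No
UCF101                     YouTube                     101       13K     No
ActivityNet 200            YouTube                     200       15K     No
Charades                   Crowd-recorded              157       67K     No
Charades-Ego               Crowd-recorded              157       8K      No
Kinetics                   YouTube                     400       300K    No
SOMETHING-SOMETHING (v1)   Crowd-recorded              174       100K    No
AVA                        Movies on YouTube           80        430     Yes
Moments in Time            Web videos incl. YouTube    339       >1M     No
STAIR Actions (v1.0)       Crowd-recorded, YouTube     100       >90K    No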
Summary
• STAIR Captions
  • Japanese captions for MS COCO images, 5 per image
• STAIR Actions
  • 100 everyday action classes
References
• [Vinyals+ 2015] Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. IEEE, 2015.
• [Hodosh+ 2013] Hodosh, Micah, Peter Young, and Julia Hockenmaier. "Framing image description as a ranking task: Data, models and evaluation metrics." Journal of Artificial Intelligence Research 47 (2013): 853-899.
• [Young+ 2014] Young, Peter, et al. "From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions." Transactions of the Association for Computational Linguistics 2 (2014): 67-78.
• [Plummer+ 2016] Plummer, Bryan A., et al. "Flickr30k Entities: Collecting region-to-phrase correspondences for richer image-to-sentence models." Computer Vision (ICCV), 2015 IEEE International Conference on. IEEE, 2015.
• [Liu+ 2017] Liu, Chenxi, et al. "Attention correctness in neural image captioning." AAAI. 2017.
• [Chen+ 2015] Chen, Xinlei, et al. "Microsoft COCO captions: Data collection and evaluation server." arXiv preprint arXiv:1504.00325 (2015).
• [Krishna+ 2017] Krishna, Ranjay, et al. "Visual Genome: Connecting language and vision using crowdsourced dense image annotations." International Journal of Computer Vision 123.1 (2017): 32-73.
• [Krause+ 2017] Krause, Jonathan, et al. "A hierarchical approach for generating descriptive image paragraphs." 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017.
• [Yoshikawa+ 2017] Yoshikawa, Yuya, Yutaro Shigeto, and Akikazu Takeuchi. "STAIR Captions: Constructing a large-scale Japanese image caption dataset." arXiv preprint arXiv:1705.00823 (2017).
• [Rashtchian+ 2010] Rashtchian, Cyrus, et al. "Collecting image annotations using Amazon's Mechanical Turk." Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk. Association for Computational Linguistics, 2010.
• [Funaki+ 2015] Funaki, Ruka, and Hideki Nakayama. "Image-mediated learning for zero-shot cross-lingual document retrieval." Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015.
• [Zitnick+ 2013] Zitnick, C. Lawrence, and Devi Parikh. "Bringing semantics into focus using visual abstraction." Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013.
• [Miyazaki+ 2016] Miyazaki, Takashi, and Nobuyuki Shimizu. "Cross-lingual image caption generation." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016.
• [Elliott+ 2016] Elliott, Desmond, et al. "Multi30K: Multilingual English-German image descriptions." arXiv preprint arXiv:1605.00459 (2016).
• [Tran+ 2015] Tran, Du, et al. "C3D: Generic features for video analysis." CoRR abs/1412.0767 (2014).
• [Kuehne+ 2011] Kuehne, Hilde, et al. "HMDB51: A large video database for human motion recognition." High Performance Computing in Science and Engineering '12. Springer, Berlin, Heidelberg, 2013. 571-582.
• [Soomro+ 2012] Soomro, Khurram, Amir Roshan Zamir, and Mubarak Shah. "UCF101: A dataset of 101 human actions classes from videos in the wild." arXiv preprint arXiv:1212.0402 (2012).
• [Heilbron+ 2015] Caba Heilbron, Fabian, et al. "ActivityNet: A large-scale video benchmark for human activity understanding." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
• [Sigurdsson+ 2016] Sigurdsson, Gunnar A., et al. "Hollywood in homes: Crowdsourcing data collection for activity understanding." European Conference on Computer Vision. Springer, Cham, 2016.
• [Sigurdsson+ 2018] Sigurdsson, Gunnar A., et al. "Charades-Ego: A large-scale dataset of paired third and first person videos." arXiv preprint arXiv:1804.09626 (2018).
• [Kay+ 2017] Kay, Will, et al. "The Kinetics human action video dataset." arXiv preprint arXiv:1705.06950 (2017).
• [Goyal+ 2017] Goyal, Raghav, et al. "The 'something something' video database for learning and evaluating visual common sense." Proc. ICCV. 2017.
• [Gu+ 2017] Gu, Chunhui, et al. "AVA: A video dataset of spatio-temporally localized atomic visual actions." arXiv preprint arXiv:1705.08421 (2017).
• [Monfort+ 2018] Monfort, Mathew, et al. "Moments in Time dataset: One million videos for event understanding." arXiv preprint arXiv:1801.03150 (2018).
• [Yoshikawa+ 2018] Yoshikawa, Yuya, Jiaqing Lin, and Akikazu Takeuchi. "STAIR Actions: A video dataset of everyday home actions." arXiv preprint arXiv:1804.04326 (2018).

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 

Recently uploaded (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

Frontiers of Image Captioning and Action Recognition: A Focus on Datasets (17th STAIR Lab Artificial Intelligence Seminar)

  • 2. • September 2015 • October 2015 • June 2018: AIP
  • 6. “man in black shirt is playing guitar.”
  • 7. Neural Image Caption (NIC) [Vinyals+ 2015]: 1. encode the image with a CNN; 2. decode the caption with an LSTM. Implementations are available for Chainer, TensorFlow, and PyTorch.
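The encode/decode split on slide 7 can be illustrated with a minimal NumPy sketch of NIC-style greedy decoding. It assumes a precomputed feature vector in place of a real CNN and uses random, untrained weights, so the generated words are meaningless; `TinyNIC`, `greedy_caption`, and the toy vocabulary are hypothetical names. As in NIC, the projected image feature is fed to the LSTM once, before the first word.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyNIC:
    """Hypothetical toy NIC-style captioner: CNN feature in, word sequence out.

    Weights are random (untrained), so only the data flow is meaningful.
    """

    def __init__(self, vocab, feat_dim=8, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.vocab = list(vocab)
        self.H = hidden
        V = len(self.vocab)
        self.W_img = rng.normal(0.0, 0.1, (hidden, feat_dim))    # CNN feature -> LSTM input space
        self.E = rng.normal(0.0, 0.1, (V, hidden))               # word embeddings
        self.W = rng.normal(0.0, 0.1, (4 * hidden, 2 * hidden))  # gate weights over [x; h]
        self.b = np.zeros(4 * hidden)
        self.W_out = rng.normal(0.0, 0.1, (V, hidden))           # hidden state -> word logits

    def step(self, x, h, c):
        """One LSTM step; returns the new hidden and cell states."""
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.H
        i = sigmoid(z[:H])           # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        o = sigmoid(z[2 * H:3 * H])  # output gate
        g = np.tanh(z[3 * H:])       # candidate cell update
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

    def greedy_caption(self, cnn_feature, max_len=10):
        h = np.zeros(self.H)
        c = np.zeros(self.H)
        # Encode: feed the projected image feature once, before any word.
        h, c = self.step(self.W_img @ cnn_feature, h, c)
        token = self.vocab.index("<start>")
        words = []
        # Decode: take the highest-scoring next word until <end> or max_len.
        for _ in range(max_len):
            h, c = self.step(self.E[token], h, c)
            token = int(np.argmax(self.W_out @ h))
            if self.vocab[token] == "<end>":
                break
            words.append(self.vocab[token])
        return words


vocab = ["<start>", "<end>", "man", "in", "black", "shirt", "is", "playing", "guitar", "."]
model = TinyNIC(vocab)
fake_cnn_feature = np.random.default_rng(1).normal(size=8)  # stands in for a real CNN embedding
print(model.greedy_caption(fake_cnn_feature))
```

A trained model would learn these weights from image/caption pairs such as those in the datasets below; greedy decoding is the simplest search strategy, and beam search is a common alternative.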
  • 8. Flickr8k [Hodosh+ 2013] • 8,092 images from Flickr, 5 captions each
  • 10. Flickr30k Entities [Plummer+ 2016] [Liu+ 2017] • Extends Flickr30k with region-to-phrase correspondences
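To make "region-to-phrase correspondences" concrete, the sketch below extracts annotated phrases from a sentence. It assumes the inline `[/EN#<id>/<type> <phrase>]` markup used in the Flickr30k Entities sentence files; `parse_entities` is a hypothetical helper, and the example sentence and entity ids are made up.

```python
import re

# Assumed markup: "[/EN#<entity id>/<type>[/<type>...] <phrase>]"
ENTITY_RE = re.compile(r"\[/EN#(\d+)((?:/\w+)+) ([^\]]+)\]")


def parse_entities(sentence):
    """Return (plain sentence, list of annotated phrases)."""
    phrases = [
        {
            "entity_id": int(m.group(1)),
            "types": m.group(2).strip("/").split("/"),
            "phrase": m.group(3),
        }
        for m in ENTITY_RE.finditer(sentence)
    ]
    # Replace each bracketed span with its bare phrase text.
    plain = ENTITY_RE.sub(lambda m: m.group(3), sentence)
    return plain, phrases


# Hypothetical annotated sentence (entity ids are made up).
example = "[/EN#1/people A man] in [/EN#2/clothing a black shirt] is playing [/EN#3/instruments a guitar] ."
plain, phrases = parse_entities(example)
print(plain)
for p in phrases:
    print(p["entity_id"], p["types"], p["phrase"])
```

The entity id is what links a phrase to its bounding boxes in the image, so phrases in different captions of the same image can refer to the same region.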
  • 11. MS COCO [Chen+ 2015] • Images from Flickr, 5 captions each • Example (http://cocodataset.org/#explore?id=409091):
    • a lady blowing out candles on a cake
    • the woman is blowing out her birthday cake candles
    • a woman blowing candles on a frosted cake.
    • two people blowing out candles on a cake.
    • a girl is blowing out candles on a birthday cake.
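Because each MS COCO image has several captions, a common first step is grouping them by image. This is a minimal sketch assuming the layout of the COCO caption annotation files (an `images` list and an `annotations` list whose entries carry `image_id` and `caption`); the fragment is hand-written from the example above, and its annotation ids and the helper `captions_by_image` are hypothetical.

```python
from collections import defaultdict

# Hand-written fragment shaped like an MS COCO captions annotation file
# (real files would be read with json.load). Annotation ids are made up.
coco = {
    "images": [{"id": 409091}],
    "annotations": [
        {"id": 1, "image_id": 409091, "caption": "a lady blowing out candles on a cake"},
        {"id": 2, "image_id": 409091, "caption": "the woman is blowing out her birthday cake candles"},
        {"id": 3, "image_id": 409091, "caption": "a woman blowing candles on a frosted cake."},
        {"id": 4, "image_id": 409091, "caption": "two people blowing out candles on a cake."},
        {"id": 5, "image_id": 409091, "caption": "a girl is blowing out candles on a birthday cake."},
    ],
}


def captions_by_image(data):
    """Group caption strings by the image they describe."""
    grouped = defaultdict(list)
    for ann in data["annotations"]:
        grouped[ann["image_id"]].append(ann["caption"].strip())
    return dict(grouped)


by_image = captions_by_image(coco)
print(len(by_image[409091]))  # 5
```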
  • 12. MS COCO: captions collected on Amazon Mechanical Turk (AMT) • Do not start sentences with “There is” • Captions should contain at least 8 words
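Two of the AMT guidelines on slide 12 can be checked mechanically; the others need human judgment. This is a minimal sketch with a hypothetical helper `check_caption`, counting words by whitespace and ignoring a trailing period (both assumptions).

```python
def check_caption(caption, min_words=8):
    """Return a list of guideline violations (empty list = passes both checks)."""
    problems = []
    text = caption.strip()
    if text.lower().startswith("there is"):
        problems.append('starts with "There is"')
    # Count whitespace-separated words, ignoring a trailing period.
    if len(text.rstrip(".").split()) < min_words:
        problems.append(f"fewer than {min_words} words")
    return problems


for c in ["a lady blowing out candles on a cake", "There is a cake."]:
    print(c, "->", check_caption(c) or "ok")
```

A filter like this could flag submissions for review, but it cannot judge whether a caption actually describes the important parts of the scene.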
  • 13. Visual Genome [Krishna+ 2017] • Example region descriptions: “Park bench is made of gray weathered wood”, “The man is almost bald” • Images drawn from both MS COCO and YFCC100M
  • 14. Visual Genome annotations • Object • Attribute: a property of an object • Relationship: links 2 objects, e.g. jumping_over(man, fire hydrant) • Region graph: the objects, attributes, and relationships in one region • Scene graph: the region graphs of an image combined • Figure: Region Graph
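The object/attribute/relationship structure above maps naturally onto a small data structure. The sketch below is hypothetical: it identifies objects by their name strings (Visual Genome itself uses object ids), and the `made_of` predicate is an illustrative reading of the "park bench" description. Merging two region graphs yields a scene graph.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical region/scene graph: objects, attributes, and relationships."""
    objects: set = field(default_factory=set)
    attributes: dict = field(default_factory=dict)     # object -> set of attributes
    relationships: list = field(default_factory=list)  # (predicate, subject, object)

    def add_attribute(self, obj, attr):
        self.objects.add(obj)
        self.attributes.setdefault(obj, set()).add(attr)

    def add_relationship(self, predicate, subj, obj):
        self.objects.update((subj, obj))
        if (predicate, subj, obj) not in self.relationships:
            self.relationships.append((predicate, subj, obj))

    def merge(self, other):
        """Union of two graphs, e.g. region graphs combined into a scene graph."""
        merged = Graph()
        for g in (self, other):
            merged.objects |= g.objects
            for obj, attrs in g.attributes.items():
                for a in attrs:
                    merged.add_attribute(obj, a)
            for rel in g.relationships:
                merged.add_relationship(*rel)
        return merged


region1 = Graph()
region1.add_relationship("jumping_over", "man", "fire hydrant")
region2 = Graph()
region2.add_attribute("bench", "gray")
region2.add_attribute("bench", "weathered")
region2.add_relationship("made_of", "bench", "wood")

scene = region1.merge(region2)
print(sorted(scene.objects))  # ['bench', 'fire hydrant', 'man', 'wood']
print(scene.relationships)
```

Matching objects by name is the main simplification: two distinct men in one image would collapse into a single node here.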
  • 16. One paragraph per image [Krause+ 2017] • Built on Visual Genome images • Example: Two children are sitting at a table in a restaurant. The children are one little girl and one little boy. The little girl is eating a pink frosted donut with white icing lines on top of it. The girl has blonde hair and is wearing a green jacket with a black long sleeve shirt underneath. The little boy is wearing a black zip up jacket and is holding his finger to his lip but is not eating. A metal napkin dispenser is in between them at the table. The wall next to them is white brick. Two adults are on the other side of the short white brick wall. The room has white circular lights on the ceiling and a large window in the front of the restaurant. It is daylight outside.
  • 17. STAIR Captions [Yoshikawa+ 2017] • Japanese captions for MS COCO images • http://captions.stair.center/explore/
  • 19. STAIR Captions compared with MS COCO captions translated by Google
  • 21. Other caption datasets
    • Pascal Sentence [Rashtchian+ 2010] [Funaki+ 2015]: 1,000 PASCAL VOC2008 images, 5 captions each
    • Abstract Scenes [Zitnick+ 2013]
    • YJ Captions [Miyazaki+ 2016]: Japanese captions for MS COCO images
    • Multi30k [Elliott+ 2016]: English-German descriptions for Flickr30k images
  • 22. Caption dataset sizes (images; descriptions per image)
    • Pascal Sentence: 1,000; 5
    • MS COCO: 123,287; 5
    • Flickr8k: 8,092; 5
    • Flickr30k: 31,783; 5
    • Visual Genome: 108,077; 50
    • Krause et al.: 19,551; 1
    • Multi30k: 123,287; 5
    • STAIR Captions: 123,287; 5
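Multiplying the two columns of the slide 22 figures gives a rough total number of descriptions per dataset. This sketch treats the per-image counts as exact, which they are only nominally (Visual Genome's 50 is an average); `DATASETS` and `total_descriptions` are hypothetical names.

```python
# (images, descriptions per image) as listed on slide 22; per-image
# counts are nominal (Visual Genome's 50 is an average).
DATASETS = {
    "Pascal Sentence": (1_000, 5),
    "Flickr8k": (8_092, 5),
    "Flickr30k": (31_783, 5),
    "MS COCO": (123_287, 5),
    "STAIR Captions": (123_287, 5),
    "Visual Genome": (108_077, 50),
    "Krause et al.": (19_551, 1),
}


def total_descriptions(table):
    """Nominal total = number of images x descriptions per image."""
    return {name: images * per_image for name, (images, per_image) in table.items()}


totals = total_descriptions(DATASETS)
for name, total in totals.items():
    print(f"{name:16s} {total:>10,}")
```

For example, MS COCO comes to 123,287 × 5 = 616,435 captions, and Visual Genome to 108,077 × 50 = 5,403,850 region descriptions.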
  • 25. Classification with C3D (3D CNN) [Tran+ 2015] • 3D convolution (Conv) and 3D pooling (Pool) • Conv: 3x3x3 kernels with stride 1 • Pool: 2x2x2 • Input: 3 channels × 16 frames × 112x112 pixels • Output: class scores
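The slide 25 shapes can be checked with the standard output-size formula for convolution and pooling. This sketch assumes padding 1 for the 3x3x3 convolutions (so they preserve size) and stride 2 for the 2x2x2 pooling; neither value appears on the slide, and the 64 filters are only an illustrative choice. The helper names are hypothetical.

```python
def out_size(n, kernel, stride, pad=0):
    """Standard output-length formula for convolution/pooling (floor division)."""
    return (n + 2 * pad - kernel) // stride + 1


def conv3d_shape(shape, out_channels, k=3, s=1, p=1):
    """(C, D, H, W) after a k x k x k convolution; p=1 is an assumed 'same' padding."""
    _, d, h, w = shape
    return (out_channels, out_size(d, k, s, p), out_size(h, k, s, p), out_size(w, k, s, p))


def pool3d_shape(shape, k=2, s=2):
    """(C, D, H, W) after k x k x k max pooling with stride s."""
    c, d, h, w = shape
    return (c, out_size(d, k, s), out_size(h, k, s), out_size(w, k, s))


x = (3, 16, 112, 112)    # 3 channels, 16 frames, 112x112 pixels
x = conv3d_shape(x, 64)  # 64 filters: an illustrative choice
print(x)                 # (64, 16, 112, 112)
x = pool3d_shape(x)
print(x)                 # (64, 8, 56, 56)
```

Each 2x2x2 pooling step halves time as well as space, which is what distinguishes a 3D CNN from applying a 2D CNN frame by frame.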
  • 26. The “MNIST” of action recognition • HMDB51 [Kuehne+ 2011]: 51 classes, 6,766 clips from sources including the Prelinger archive and YouTube • UCF101 [Soomro+ 2012]: 101 classes, 13,320 clips from YouTube
  • 27. ActivityNet 200 [Heilbron+ 2015] • 200 classes • 1.5 / 2.3 • ActivityNet Challenge (since CVPR 2016) • ActivityNet Challenge 2017 (Untrimmed Video Classification): 8.8% • 1 • YouTube
  • 28. ActivityNet 200 (1/4) (1) • American Time Use Survey (ATUS) • 2000 • 200 • Figure: American Time Use Survey Activity Lexicon 2016
  • 29. ActivityNet 200 (2/4) (2) • WordNet • YouTube
  • 33. Charades (1/3) (1) • 1 5 5 • 2 2
  • 36. Charades-Ego [Sigurdsson+ 2018] • Paired first- and third-person videos • 1 3 • Charades • 60% • 157 classes, 4,000 video pairs
  • 37. Kinetics-400 [Kay+ 2017] • YouTube clips of about 10 seconds • 1 clip per YouTube video • 400 classes, 300,000 clips • Top-1/Top-5 accuracy • Extended to 600 classes as Kinetics-600
  • 39. SOMETHING-SOMETHING (v1) [Goyal+ 2017] • Class labels are templates in which “something” stands for an object, e.g.: • Holding something • Dropping something into something • Something falling like a rock • 88.5% • 174 classes, 100,000 clips of 2–6 seconds
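The template labels above contain "something" slots that are filled with concrete object phrases to produce a caption, while the template itself serves as the class. This is a minimal sketch of that filling step; `fill_template` is a hypothetical helper, and the case-insensitive match (so that "Something" at the start of a template is also a slot) is an assumption.

```python
import re

PLACEHOLDER = re.compile(r"\bsomething\b", re.IGNORECASE)


def fill_template(template, objects):
    """Replace each 'something' slot, left to right, with the given object phrases."""
    pieces = PLACEHOLDER.split(template)
    if len(pieces) - 1 != len(objects):
        raise ValueError(f"template has {len(pieces) - 1} slots, got {len(objects)} objects")
    out = pieces[0]
    for obj, rest in zip(objects, pieces[1:]):
        out += obj + rest
    return out


print(fill_template("Dropping something into something", ["a pen", "a cup"]))
print(fill_template("Something falling like a rock", ["A spoon"]))
```

Keeping the template as the label lets a classifier focus on the action while the filled-in objects vary freely across clips.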
  • 40. AVA [Gu+ 2017] • 80 classes: 14 pose, 49 person-object interaction, 17 person-person interaction • Bounding boxes • 430 videos, 15 minutes each
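  • 41. Building AVA (1/2) (1) Collect YouTube videos • Use minutes 15 to 30 of each video • Annotate the 15-minute segment at 1-second intervals with 3 seconds of context, giving 900 keyframes per video (2) Bounding boxes • Proposed by a Faster R-CNN person detector • (3) Bounding boxes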
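The keyframe count on slide 41 follows from sampling a 15-minute span once per second. This sketch assumes a half-open interval from minute 15 up to (not including) minute 30; `keyframe_times` is a hypothetical helper, and the 430-video total is a nominal product rather than a figure from the released annotations.

```python
SEGMENT_START_S = 15 * 60  # minute 15 of each video
SEGMENT_END_S = 30 * 60    # minute 30


def keyframe_times(start=SEGMENT_START_S, end=SEGMENT_END_S, step=1):
    """Keyframe timestamps in seconds, one every `step` seconds in [start, end)."""
    return list(range(start, end, step))


times = keyframe_times()
print(len(times), times[0], times[-1])  # 900 900 1799
print(430 * len(times))                 # 387000 (nominal, over 430 videos)
```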
  • 43. Moments in Time [Monfort+ 2018] • 339 classes, 1,000,000 clips of 3 seconds • Top-1: 0.39, Top-5: 0.67 (The Moments in Time Recognition Challenge 2018)
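The Top-1 and Top-5 figures quoted for Kinetics and Moments in Time are top-k accuracies: a prediction counts as correct if the true class is among the k highest-scoring classes. This is a minimal sketch with a hypothetical helper and made-up scores.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


scores = [[0.1, 0.7, 0.2],  # class scores for three clips (made-up numbers)
          [0.5, 0.3, 0.2],
          [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, 1))  # 0.333...
print(top_k_accuracy(scores, labels, 2))  # 0.666...
```

Top-5 is more forgiving than Top-1, which matters for datasets whose clips may plausibly fit several classes.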
  • 44. STAIR Actions (v1.0) [Yoshikawa+ 2018] • 100 everyday home actions • Includes YouTube videos • 100 classes, over 90,000 clips of about 5 seconds
  • 46. STAIR Actions (1/4) 1. YouTube • 4 • CC0 2. 5 • 5 5 3. 5 4.
  • 47. STAIR Actions (2/4)
  • 50. STAIR Actions vs. Kinetics, measured with OpenPose: STAIR Actions 95.6%, Kinetics 55.5%
  • 51. Baselines on STAIR Actions • 2D CNN + LSTM (LRCN), Two-stream CNN, 3D CNN • STAIR Actions: 76.5% • cf. Kinetics: 61.0% (Two-stream CNN)
  • 52. Action recognition datasets (source; classes; size)
    • HMDB51: YouTube; 51 classes; 6K
    • UCF101: YouTube; 101 classes; 13K
    • ActivityNet 200: YouTube; 200 classes; 15K
    • Charades: 157 classes; 67K
    • Charades-Ego: 157 classes; 8K
    • Kinetics: YouTube; 400 classes; 300K
    • SOMETHING-SOMETHING (v1): 174 classes; 100K
    • AVA: YouTube; 80 classes; 430
    • Moments in Time: YouTube; 339 classes; >1M
    • STAIR Actions (v1.0): YouTube; 100 classes; >90K
  • 53. Summary • STAIR Captions: Japanese captions, 5 per MS COCO image • STAIR Actions: 100 action classes
  • 54. References (1/5)
    • [Vinyals+ 2015] Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. IEEE, 2015.
    • [Hodosh+ 2013] Hodosh, Micah, Peter Young, and Julia Hockenmaier. "Framing image description as a ranking task: Data, models and evaluation metrics." Journal of Artificial Intelligence Research 47 (2013): 853-899.
    • [Young+ 2014] Young, Peter, et al. "From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions." Transactions of the Association for Computational Linguistics 2 (2014): 67-78.
    • [Plummer+ 2016] Plummer, Bryan A., et al. "Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models." Computer Vision (ICCV), 2015 IEEE International Conference on. IEEE, 2015.
    • [Liu+ 2017] Liu, Chenxi, et al. "Attention Correctness in Neural Image Captioning." AAAI. 2017.
    • [Chen+ 2015] Chen, Xinlei, et al. "Microsoft COCO captions: Data collection and evaluation server." arXiv preprint arXiv:1504.00325 (2015).
  • 55. References (2/5)
    • [Krishna+ 2017] Krishna, Ranjay, et al. "Visual genome: Connecting language and vision using crowdsourced dense image annotations." International Journal of Computer Vision 123.1 (2017): 32-73.
    • [Krause+ 2017] Krause, Jonathan, et al. "A hierarchical approach for generating descriptive image paragraphs." 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017.
    • [Yoshikawa+ 2017] Yoshikawa, Yuya, Yutaro Shigeto, and Akikazu Takeuchi. "STAIR Captions: Constructing a large-scale Japanese image caption dataset." arXiv preprint arXiv:1705.00823 (2017).
    • [Rashtchian+ 2010] Rashtchian, Cyrus, et al. "Collecting image annotations using Amazon's Mechanical Turk." Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk. Association for Computational Linguistics, 2010.
    • [Funaki+ 2015] Funaki, Ruka, and Hideki Nakayama. "Image-mediated learning for zero-shot cross-lingual document retrieval." Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015.
  • 56. References (3/5)
    • [Zitnick+ 2013] Zitnick, C. Lawrence, and Devi Parikh. "Bringing semantics into focus using visual abstraction." Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013.
    • [Miyazaki+ 2016] Miyazaki, Takashi, and Nobuyuki Shimizu. "Cross-lingual image caption generation." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016.
    • [Elliott+ 2016] Elliott, Desmond, et al. "Multi30k: Multilingual English-German image descriptions." arXiv preprint arXiv:1605.00459 (2016).
    • [Tran+ 2015] Tran, Du, et al. "C3D: generic features for video analysis." CoRR abs/1412.0767 (2014).
    • [Kuehne+ 2011] Kuehne, Hilde, et al. "HMDB51: A large video database for human motion recognition." High Performance Computing in Science and Engineering '12. Springer, Berlin, Heidelberg, 2013. 571-582.
    • [Soomro+ 2012] Soomro, Khurram, Amir Roshan Zamir, and Mubarak Shah. "UCF101: A dataset of 101 human actions classes from videos in the wild." arXiv preprint arXiv:1212.0402 (2012).
  • 57. References (4/5)
    • [Heilbron+ 2015] Caba Heilbron, Fabian, et al. "ActivityNet: A large-scale video benchmark for human activity understanding." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
    • [Sigurdsson+ 2016] Sigurdsson, Gunnar A., et al. "Hollywood in homes: Crowdsourcing data collection for activity understanding." European Conference on Computer Vision. Springer, Cham, 2016.
    • [Sigurdsson+ 2018] Sigurdsson, Gunnar A., et al. "Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos." arXiv preprint arXiv:1804.09626 (2018).
    • [Kay+ 2017] Kay, Will, et al. "The Kinetics human action video dataset." arXiv preprint arXiv:1705.06950 (2017).
    • [Goyal+ 2017] Goyal, Raghav, et al. "The 'something something' video database for learning and evaluating visual common sense." Proc. ICCV. 2017.
    • [Gu+ 2017] Gu, Chunhui, et al. "AVA: A video dataset of spatio-temporally localized atomic visual actions." arXiv preprint arXiv:1705.08421 (2017).
  • 58. References (5/5)
    • [Monfort+ 2018] Monfort, Mathew, et al. "Moments in Time Dataset: one million videos for event understanding." arXiv preprint arXiv:1801.03150 (2018).
    • [Yoshikawa+ 2018] Yoshikawa, Yuya, Jiaqing Lin, and Akikazu Takeuchi. "STAIR Actions: A Video Dataset of Everyday Home Actions." arXiv preprint arXiv:1804.04326 (2018).