Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved.
History and Latest Trends in Action Recognition
September 3, 2018
Katsunori Ohnishi
DeNA Co., Ltd.

■ Action recognition
■ Deep
■ Temporal Aggregation
■ Tips

■ Katsunori Ohnishi
    Twitter: @ohnishi_ka
■ April 2014 - September 2017: B4 to M2.5, Computer Vision
    • http://katsunoriohnishi.github.io/
      CVPR2016 (spotlight oral, acceptance rate=9.7%): egocentric vision (wrist-mounted camera)
      ACMMM2016 (poster, acceptance rate=30%): action recognition (state-of-the-art)
      AAAI2018 (oral, acceptance rate=10.9%): video generation (FTGAN)
■ October 2017 - : DeNA AI
    • DeNA → https://www.wantedly.com/projects/209980

Action Recognition
■ Image classification
    action recognition = human action recognition
    • fine-grained, egocentric
[Figure: example frames labeled Fine-grained, egocentric, Dog-centric, Action recognition, RGBD]
Evaluation of video activity localizations integrating quality and quantity measurements [C. Wolf+, CVIU14]
Recognizing Activities of Daily Living with a Wrist-mounted Camera [K. Ohnishi+, CVPR16]
A Database for Fine Grained Activity Detection of Cooking Activities [M. Rohrbach+, CVPR12]
First-Person Animal Activity Recognition from Egocentric Videos [Y. Iwashita+, ICPR14]
Recognizing Human Actions: A Local SVM Approach [C. Schuldt+, ICPR04]
HMDB: A Large Video Database for Human Motion Recognition [H. Kuehne+, ICCV11]
UCF101: A dataset of 101 human actions classes from videos in the wild [K. Soomro+, arXiv2012]

■ KTH, UCF101, HMDB51
    • UCF101: 101 classes, 13,320 videos …
■ ActivityNet, Kinetics, YouTube-8M
■ AVA, Moments in Time, SLAC
[Figure: UCF101]

■ YouTube-8M Video Understanding Challenge
    https://www.kaggle.com/c/youtube8m
    CVPR17 / ECCV18 workshop, Kaggle
    frame-level
    test
    • Kaggle, action recognition
■ ActivityNet Challenge
    http://activity-net.org/challenges/2018/
    ActivityNet 3
    • Temporal Proposal (T)
    • Temporal localization (T)
    • Video Captioning
    • Kinetics: classification (human action)
    • AVA: Spatio-temporal localization (XYT)
    • Moments-in-time: classification (event)

Before CNN
■ 2000s
    SIFT
    local descriptor → coding → global feature → classifier
■ STIP [I. Laptev, IJCV05]
    Dense Trajectory [H. Wang+, CVPR11]
    Improved Dense Trajectory [H. Wang+, ICCV13]
    • http://hirokatsukataoka.net/temp/presen/170121STAIRLab_slideshare.pdf
    • https://arxiv.org/pdf/1605.04988.pdf
On space-time interest points [I. Laptev, IJCV05]
Action Recognition by Dense Trajectories [H. Wang+, CVPR11]
Action Recognition with Improved Trajectories [H. Wang+, ICCV13]

Before CNN
■ Improved Dense Trajectories (iDT) [H. Wang+, ICCV13]
    Dense Trajectories [H. Wang+, CVPR11]
    • optical flow
    • foreground optical flow
[Figure: Improved dense trajectories (green); background dense trajectories (white)]

Before CNN
■ SIFT, Fisher Vector
    Fisher vector:
    http://www.isi.imi.i.u-tokyo.ac.jp/~harada/pdf/SSII_harada20120608.pdf
    https://www.slideshare.net/takao-y/fisher-vector
Pipeline: input → Local descriptor (iDT) → Video descriptor (Fisher Vector [F. Perronnin+, CVPR07]) → Classifier (SVM [F. Pedregosa+, JMLR11])
Fisher kernels on visual vocabularies for image categorization [F. Perronnin+, CVPR07]
Scikit-learn: Machine Learning in Python [F. Pedregosa+, JMLR11]

CNN action recognition
■ CNN
    Two-stream
    • Hand-crafted feature
    3D Convolution
    • C3D
    • C3D, Two-stream
    • 3D conv
    Optical flow

CNN action recognition: CNN
■ Spatio-temporal ConvNet [A. Karpathy+, CVPR14]
    CNN
    AlexNet: RGB ch → 10 frames ch (gray)
    multi-scale Fusion
    Sports1M pre-training, UCF101: 65.4% (iDT: 85.9%)
    • 10 frames, conv1 ch
    • RGB, gray, frame-by-frame score
Large-scale video classification with convolutional neural networks [A. Karpathy+, CVPR14]

CNN action recognition: Two-stream
■ Two-stream [K. Simonyan+, NIPS14]
    2D CNN*
    • Spatial-stream: RGB (input: RGB)
    • Temporal-stream: Optical flow (input: optical flow, 10 frames)
    • Frame-by-frame
    Hand-crafted feature, CNN
    • 2D CNN
    *ImageNet pre-trained

                              UCF101   HMDB51
    iDT                       85.9%    57.2%
    Spatio-temporal ConvNet   65.4%    -
    RGB-stream                73.0%    40.5%
    Flow-stream               83.7%    54.6%
    Two-stream                88.0%    59.4%

Two-stream convolutional networks for action recognition in videos [K. Simonyan+, NIPS14]
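
A minimal sketch of how the predictions of the spatial and temporal streams might be combined by late fusion. It assumes each stream outputs one logit per class and that the fused score is a weighted average of softmax probabilities. The names fuse_two_stream and flow_weight, and the toy logits, are illustrative and not taken from the slide.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_stream(rgb_logits, flow_logits, flow_weight=0.5):
    """Weighted average of per-class probabilities from the two streams."""
    rgb_p = softmax(rgb_logits)
    flow_p = softmax(flow_logits)
    return [(1.0 - flow_weight) * r + flow_weight * f
            for r, f in zip(rgb_p, flow_p)]


# Toy logits for 3 hypothetical classes (assumed values, not real model output).
rgb_logits = [2.0, 1.0, 0.1]   # spatial stream prefers class 0
flow_logits = [0.5, 3.0, 0.2]  # temporal stream prefers class 1
fused = fuse_two_stream(rgb_logits, flow_logits)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(predicted, [round(p, 3) for p in fused])
```

In this toy case the confident temporal stream outweighs the spatial stream, so the fused prediction follows the flow. Other fusion schemes (for example an SVM on stacked scores) are equally plausible.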

CNN action recognition: 3D convolution
■ C3D [D. Tran+, ICCV15]
    16-frame input, 3D convolution CNN
    • XYT 3D convolution
    UCF101, pre-training
    ICCV15 arxiv 2 reject

                  UCF101   HMDB51
    iDT           85.9%    57.2%
    Two-stream    88.0%    59.4%
    C3D (1 net)   82.3%    -

    3D conv
Learning Spatiotemporal Features with 3D Convolutional Networks [D. Tran+, ICCV15]

CNN action recognition: 3D convolution
■ P3D [Z. Qiu+, ICCV17]
    C3D
    3D conv → 2D conv (XY) + 1D conv (T)
    pre-training

                             UCF101   HMDB51
    iDT                      85.9%    57.2%
    Two-stream (AlexNet)     88.0%    59.4%
    P3D (ResNet)             88.6%    -
    Two-stream (ResNet152)   91.8%

[Figure: Spatial 2D conv, Temporal 1D conv]
    3D conv, again
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks [Z. Qiu+, ICCV17]
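
A rough sketch of why factorizing a 3D conv into a 2D conv (XY) plus a 1D conv (T) saves weights. It assumes the serial arrangement (spatial conv followed by temporal conv), 3x3x3 kernels, no bias, and 64 channels. The function names and channel counts are illustrative, and P3D itself defines several block variants not modeled here.

```python
def conv3d_params(c_in, c_out, k_t=3, k_h=3, k_w=3):
    """Weights in one full 3D convolution (bias ignored)."""
    return c_in * c_out * k_t * k_h * k_w


def p3d_params(c_in, c_out, k_t=3, k_h=3, k_w=3):
    """Weights in a 1 x k_h x k_w spatial conv followed by a k_t x 1 x 1 temporal conv."""
    spatial = c_in * c_out * k_h * k_w
    temporal = c_out * c_out * k_t
    return spatial + temporal


full = conv3d_params(64, 64)     # 64 * 64 * 27
factorized = p3d_params(64, 64)  # 64 * 64 * 9 + 64 * 64 * 3
print(full, factorized, round(factorized / full, 3))
```

With equal input and output channels, the factorized form needs 12 weights per channel pair instead of 27.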

CNN action recognition: 3D convolution
■ C3D, P3D
    3D conv
■ 3D conv [K. Hara+, CVPR18]
[Figure: timeline with years 2011, 2012, 2015, 2017; Kinetics! (2017)]
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? [K. Hara+, CVPR18]

CNN action recognition: 3D convolution
■ Kinetics
    human action dataset!
    3D conv
    • Pre-train, UCF101
    • YouTube-8M
The Kinetics human action video dataset [W. Kay+, arXiv17]

CNN action recognition: 3D convolution
■ I3D [J. Carreira+, CVPR17]
    Kinetics dataset, DeepMind
    3D conv Inception
    64 GPUs for training, 16 GPUs for predict
    state-of-the-art
    • RGB
    • Two-stream, optical flow score

                     UCF101   HMDB51
    RGB-I3D          95.6%    74.8%
    Flow-I3D         96.7%    77.1%
    Two-stream I3D   98.0%    80.7%

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J. Carreira+, CVPR17]

CNN action recognition: 3D convolution
■ I3D, Two-stream
    3D convolution
■ 3D conv: XY, T
    • XY, T
    3D conv
[Figure: time axis]

CNN action recognition: 3D convolution
■ 3D convolution [D.A. Huang+, CVPR18]
    • 3D CNN
    • Two-stream I3D, Optical flow, 3D conv
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets [D.A. Huang+, CVPR18]

CNN action recognition: 3D convolution
■ 3D conv
    CVPR18
    CVPR/ICCV/ECCV
    3D conv, 3D conv
    • GPU

CNN action recognition: Optical flow
■ Optical flow [L. Sevilla-Lara+, CVPR18]
    • Optical flow
    • Optical flow (EPE), action recognition
    • flow, action recognition
    • Optical flow, appearance
    • Optical flow
On the Integration of Optical Flow and Action Recognition [L. Sevilla-Lara+, CVPR18]

AVA [C. Gu+, CVPR18]
    XYT bounding box
    human action localization
Moments-in-time [M. Monfort+, arXiv2018]
    3-second videos
Kinetics-600 [W. Kay+, arXiv2017]
    Kinetics: 400 → 600 classes

Temporal Aggregation
■ 2D conv frame-by-frame, 3D conv
    (100 frames, 232 frames, 50 frames)
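
Videos differ in length (the slide lists 100, 232, and 50 frames), while a 3D conv network such as C3D takes a fixed 16-frame input. One simple workaround, sketched here under assumptions, is to cut the video into fixed-length clips and average the clip scores into a video-level score. The helper names, the non-overlapping stride, and the choice of averaging are illustrative and not taken from the slide.

```python
def clip_starts(num_frames, clip_len=16, stride=16):
    """Start indices of fixed-length clips that fit entirely inside the video."""
    if num_frames < clip_len:
        return []
    return list(range(0, num_frames - clip_len + 1, stride))


def average_scores(clip_scores):
    """Average per-class scores over clips to get one video-level score vector."""
    if not clip_scores:
        raise ValueError("need at least one clip")
    n = len(clip_scores)
    return [sum(col) / n for col in zip(*clip_scores)]


for n in (100, 232, 50):
    print(n, "frames ->", len(clip_starts(n)), "clips")
```

Averaging ignores the order of the clips, which is one motivation for the aggregation methods on the following slides.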

Temporal Aggregation
■ Score →
    LSTM →
    • FC ?
    • fencing → fencing → …
[Figure: per-frame CNN → LSTM → FC, repeated over time steps]
CVPR, ACMMM, AAAI …

Temporal Aggregation
■ Fisher Vector
    • CNN, SIFT, GMM
    • FV, VLAD [H. Jegou+, CVPR10]
Pipeline: input → Local descriptor (iDT) → Video descriptor (Fisher Vector [F. Perronnin+, CVPR07]) → Classifier (SVM [F. Pedregosa+, JMLR11])
Aggregating local descriptors into a compact image representation [H. Jegou+, CVPR10]

Temporal Aggregation
■ LCD [Z. Xu+, CVPR15]
    VGG16 pool5: a 512-dim feature at each XY position
    • 224x224 input → 7x7=49 features
    • VLAD → global feature
Pipeline: input → CNN → Pool5 (e.g. 2x2x512) → Local descriptors → VLAD → global feature → SVM
A discriminative CNN video representation for event detection [Z. Xu+, CVPR15]
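
A minimal sketch of VLAD encoding with hard assignment, to make the "local descriptors → VLAD → global feature" step concrete. It assumes the cluster centers are already given (in practice they would come from k-means) and uses tiny 2-dim toy descriptors instead of 512-dim pool5 features. The function name and the toy data are illustrative.

```python
import math


def vlad(descriptors, centers):
    """Encode a set of local descriptors as one L2-normalized VLAD vector."""
    dim = len(centers[0])
    residual_sums = [[0.0] * dim for _ in centers]
    for x in descriptors:
        # Index of the nearest cluster center (squared Euclidean distance).
        k = min(range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centers[j])))
        # Accumulate the residual between the descriptor and its center.
        for d in range(dim):
            residual_sums[k][d] += x[d] - centers[k][d]
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


centers = [[0.0, 0.0], [10.0, 10.0]]                   # assumed, e.g. from k-means
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]    # toy local descriptors
code = vlad(descriptors, centers)
print(len(code), [round(v, 3) for v in code])
```

The output length is (number of centers) x (descriptor dim), regardless of how many local descriptors the video produced, which is what makes it usable as a fixed-size global feature for an SVM.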

Temporal Aggregation
■ ActionVLAD [R. Girdhar+, CVPR17]
    NetVLAD [R. Arandjelović+, CVPR16]
    • NetVLAD: VLAD as an NN layer; cluster assignment → softmax assignment
    • VLAD, LCD, VLAD
    • End2end CNN!
ActionVLAD: Learning spatio-temporal aggregation for action classification [R. Girdhar+, CVPR17]
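
A hedged sketch of the softmax assignment idea: instead of assigning each descriptor to a single nearest cluster, every cluster receives a weight, which keeps the encoding differentiable. Here the weights come from a fixed softmax over negative squared distances with an assumed sharpness alpha; in NetVLAD the assignment parameters are learned, and the normalization steps are omitted in this sketch. All names are illustrative.

```python
import math


def soft_assign(x, centers, alpha=1.0):
    """Softmax over negative scaled squared distances to each cluster center."""
    logits = [-alpha * sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vlad(descriptors, centers, alpha=1.0):
    """Accumulate residuals weighted by soft assignment instead of a hard argmin."""
    dim = len(centers[0])
    out = [[0.0] * dim for _ in centers]
    for x in descriptors:
        weights = soft_assign(x, centers, alpha)
        for k, (w, c) in enumerate(zip(weights, centers)):
            for d in range(dim):
                out[k][d] += w * (x[d] - c[d])
    return [v for row in out for v in row]


centers = [[0.0, 0.0], [2.0, 0.0]]   # assumed toy centers
print(soft_assign([1.0, 0.0], centers))
print(soft_vlad([[1.0, 0.0]], centers))
```

A descriptor halfway between two centers contributes half of its residual to each, whereas hard assignment would have to pick one arbitrarily.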

Temporal Aggregation
■ TLE [A. Diba+, CVPR17]
    VLAD, Compact Bilinear Pooling [Y. Gao+, CVPR16]
    Temporal Aggregation
    VLAD
    • SVM, VLAD, NN
Deep Temporal Linear Encoding Networks [A. Diba+, CVPR17]

Tips
■ Two-stream (ResNet): 2D conv, Optical flow
■ Single model, State-of-the-art
    I3D + TLE, BA
    64 GPUs
■ Two-stream: optical flow, GPU
    • optical flow stream
    • RGB-stream
    Optical flow

Tips
■ CNN, TLE coding
    • TLE, ActionVLAD
    iDT
    • CNN
    • Fisher Vector, iDT
      Tips: PCA (dim=64), K=256, FV power norm
    • CPU
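
A small sketch of what the Fisher Vector settings above imply. It assumes the common formulation that keeps gradients with respect to the GMM means and variances (so the vector has 2 x K x D entries per descriptor type), and a power of 0.5 followed by L2 normalization. The function names and the power value are assumptions, not taken from the slide.

```python
import math


def fisher_vector_dim(num_gaussians, descriptor_dim):
    """Length of a Fisher Vector built from mean and variance gradients."""
    return 2 * num_gaussians * descriptor_dim


def power_l2_normalize(vec, power=0.5):
    """Signed power normalization followed by L2 normalization."""
    powered = [math.copysign(abs(v) ** power, v) for v in vec]
    norm = math.sqrt(sum(v * v for v in powered))
    return [v / norm for v in powered] if norm > 0 else powered


print(fisher_vector_dim(256, 64))          # K=256 Gaussians, PCA to 64 dims
normalized = power_l2_normalize([4.0, -9.0, 0.0])
print([round(v, 3) for v in normalized])
```

Power normalization dampens a few very large components so that they do not dominate the linear SVM, which is why it is usually paired with the Fisher Vector.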

Temporal Aggregation
■ Score →
    LSTM →
    • FC ?
    • fencing → fencing → …
[Figure: per-frame CNN → LSTM → FC, repeated over time steps]
CVPR, ACMMM, AAAI …
input
↓
Two-stream

■ LSTM
    3D conv
    Optical flow
    • [L. Sevilla-Lara+, CVPR18]
[Figure: per-frame CNN → LSTM → FC, repeated over time steps]

2D conv + LSTM / 3D conv / 3D conv, Two-stream, Optical flow
MoCoGAN [S. Tulyakov+, CVPR18]
VGAN [C. Vondrick+, NIPS16]
TGAN [M. Saito+, ICCV17]
FTGAN [K. Ohnishi+, AAAI18]
LRCN [J. Donahue+, CVPR15]
C3D [D. Tran+, ICCV15]
P3D [Z. Qiu+, ICCV17]
Two-stream [K. Simonyan+, NIPS14]
I3D [J. Carreira+, CVPR17]
VGAN

■ Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture
    K. Ohnishi+, AAAI 2018 (oral presentation)
    https://arxiv.org/abs/1711.09618
[Figure: Optical flow]

■ Action classification
    • Temporal action localization, Spatio-temporal localization
    3D conv
    Augmentation
■ Pose
    • pose
    • data distillation
■ Tips
    & optical flow
    Kinetics, YouTube

■ XY → XYT: O(n^2) → O(n^3)
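
A back-of-the-envelope sketch of the O(n^2) → O(n^3) growth when moving from images (XY) to video (XYT). It assumes a single-channel k x k (or k x k x k) convolution applied at every position and, purely for illustration, a temporal extent equal to the spatial size n. The function names are hypothetical.

```python
def conv2d_macs(n, k=3):
    """Multiply-accumulates for a k x k conv over an n x n single-channel map."""
    return n * n * k * k


def conv3d_macs(n, k=3):
    """Multiply-accumulates for a k x k x k conv over an n x n x n volume."""
    return n ** 3 * k ** 3


for n in (56, 112, 224):
    ratio = conv3d_macs(n) // conv2d_macs(n)
    print(n, conv2d_macs(n), conv3d_macs(n), ratio)
```

Under these assumptions the 3D cost is n x k times the 2D cost, so doubling the resolution multiplies the gap by two on top of the usual fourfold 2D increase.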

More Related Content

What's hot

What's hot (20)

【チュートリアル】動的な人物・物体認識技術 -Dense Trajectories-
【チュートリアル】動的な人物・物体認識技術 -Dense Trajectories-【チュートリアル】動的な人物・物体認識技術 -Dense Trajectories-
【チュートリアル】動的な人物・物体認識技術 -Dense Trajectories-
 
【チュートリアル】コンピュータビジョンによる動画認識
【チュートリアル】コンピュータビジョンによる動画認識【チュートリアル】コンピュータビジョンによる動画認識
【チュートリアル】コンピュータビジョンによる動画認識
 
文献紹介:TSM: Temporal Shift Module for Efficient Video Understanding
文献紹介:TSM: Temporal Shift Module for Efficient Video Understanding文献紹介:TSM: Temporal Shift Module for Efficient Video Understanding
文献紹介:TSM: Temporal Shift Module for Efficient Video Understanding
 
画像認識と深層学習
画像認識と深層学習画像認識と深層学習
画像認識と深層学習
 
SSII2019企画: 点群深層学習の研究動向
SSII2019企画: 点群深層学習の研究動向SSII2019企画: 点群深層学習の研究動向
SSII2019企画: 点群深層学習の研究動向
 
ConvNetの歴史とResNet亜種、ベストプラクティス
ConvNetの歴史とResNet亜種、ベストプラクティスConvNetの歴史とResNet亜種、ベストプラクティス
ConvNetの歴史とResNet亜種、ベストプラクティス
 
【DL輪読会】ViT + Self Supervised Learningまとめ
【DL輪読会】ViT + Self Supervised Learningまとめ【DL輪読会】ViT + Self Supervised Learningまとめ
【DL輪読会】ViT + Self Supervised Learningまとめ
 
SSII2022 [SS1] ニューラル3D表現の最新動向〜 ニューラルネットでなんでも表せる?? 〜​
SSII2022 [SS1] ニューラル3D表現の最新動向〜 ニューラルネットでなんでも表せる?? 〜​SSII2022 [SS1] ニューラル3D表現の最新動向〜 ニューラルネットでなんでも表せる?? 〜​
SSII2022 [SS1] ニューラル3D表現の最新動向〜 ニューラルネットでなんでも表せる?? 〜​
 
【論文読み会】Alias-Free Generative Adversarial Networks(StyleGAN3)
【論文読み会】Alias-Free Generative Adversarial Networks(StyleGAN3)【論文読み会】Alias-Free Generative Adversarial Networks(StyleGAN3)
【論文読み会】Alias-Free Generative Adversarial Networks(StyleGAN3)
 
【メタサーベイ】Vision and Language のトップ研究室/研究者
【メタサーベイ】Vision and Language のトップ研究室/研究者【メタサーベイ】Vision and Language のトップ研究室/研究者
【メタサーベイ】Vision and Language のトップ研究室/研究者
 
【DL輪読会】An Image is Worth One Word: Personalizing Text-to-Image Generation usi...
【DL輪読会】An Image is Worth One Word: Personalizing Text-to-Image Generation usi...【DL輪読会】An Image is Worth One Word: Personalizing Text-to-Image Generation usi...
【DL輪読会】An Image is Worth One Word: Personalizing Text-to-Image Generation usi...
 
[DL輪読会]Learning Transferable Visual Models From Natural Language Supervision
[DL輪読会]Learning Transferable Visual Models From Natural Language Supervision[DL輪読会]Learning Transferable Visual Models From Natural Language Supervision
[DL輪読会]Learning Transferable Visual Models From Natural Language Supervision
 
【メタサーベイ】数式ドリブン教師あり学習
【メタサーベイ】数式ドリブン教師あり学習【メタサーベイ】数式ドリブン教師あり学習
【メタサーベイ】数式ドリブン教師あり学習
 
深層学習によるHuman Pose Estimationの基礎
深層学習によるHuman Pose Estimationの基礎深層学習によるHuman Pose Estimationの基礎
深層学習によるHuman Pose Estimationの基礎
 
畳み込みニューラルネットワークの研究動向
畳み込みニューラルネットワークの研究動向畳み込みニューラルネットワークの研究動向
畳み込みニューラルネットワークの研究動向
 
12. Diffusion Model の数学的基礎.pdf
12. Diffusion Model の数学的基礎.pdf12. Diffusion Model の数学的基礎.pdf
12. Diffusion Model の数学的基礎.pdf
 
【メタサーベイ】Video Transformer
 【メタサーベイ】Video Transformer 【メタサーベイ】Video Transformer
【メタサーベイ】Video Transformer
 
Domain Adaptation 発展と動向まとめ(サーベイ資料)
Domain Adaptation 発展と動向まとめ(サーベイ資料)Domain Adaptation 発展と動向まとめ(サーベイ資料)
Domain Adaptation 発展と動向まとめ(サーベイ資料)
 
【チュートリアル】コンピュータビジョンによる動画認識 v2
【チュートリアル】コンピュータビジョンによる動画認識 v2【チュートリアル】コンピュータビジョンによる動画認識 v2
【チュートリアル】コンピュータビジョンによる動画認識 v2
 
【メタサーベイ】Neural Fields
【メタサーベイ】Neural Fields【メタサーベイ】Neural Fields
【メタサーベイ】Neural Fields
 

Similar to Action Recognitionの歴史と最新動向

"Using Deep Learning for Video Event Detection on a Compute Budget," a Presen...
"Using Deep Learning for Video Event Detection on a Compute Budget," a Presen..."Using Deep Learning for Video Event Detection on a Compute Budget," a Presen...
"Using Deep Learning for Video Event Detection on a Compute Budget," a Presen...
Edge AI and Vision Alliance
 
Daniel Bochicchio, Skybernetics - “Valuable Insights from On High: Drone use ...
Daniel Bochicchio, Skybernetics - “Valuable Insights from On High: Drone use ...Daniel Bochicchio, Skybernetics - “Valuable Insights from On High: Drone use ...
Daniel Bochicchio, Skybernetics - “Valuable Insights from On High: Drone use ...
Michael Hewitt, GISP
 

Similar to Action Recognitionの歴史と最新動向 (20)

動画像理解のための深層学習アプローチ Deep learning approaches to video understanding
動画像理解のための深層学習アプローチ Deep learning approaches to video understanding動画像理解のための深層学習アプローチ Deep learning approaches to video understanding
動画像理解のための深層学習アプローチ Deep learning approaches to video understanding
 
YolactEdge Review [cdm]
YolactEdge Review [cdm]YolactEdge Review [cdm]
YolactEdge Review [cdm]
 
How Deep Learning Could Predict Weather Events
How Deep Learning Could Predict Weather EventsHow Deep Learning Could Predict Weather Events
How Deep Learning Could Predict Weather Events
 
"Using Deep Learning for Video Event Detection on a Compute Budget," a Presen...
"Using Deep Learning for Video Event Detection on a Compute Budget," a Presen..."Using Deep Learning for Video Event Detection on a Compute Budget," a Presen...
"Using Deep Learning for Video Event Detection on a Compute Budget," a Presen...
 
Recent Progress on Single-Image Super-Resolution
Recent Progress on Single-Image Super-ResolutionRecent Progress on Single-Image Super-Resolution
Recent Progress on Single-Image Super-Resolution
 
Video complexity analyzer (VCA) for streaming applications
 Video complexity analyzer (VCA) for streaming applications Video complexity analyzer (VCA) for streaming applications
Video complexity analyzer (VCA) for streaming applications
 
Navigation-aware adaptive streaming strategies for omnidirectional video
Navigation-aware adaptive streaming strategies for omnidirectional videoNavigation-aware adaptive streaming strategies for omnidirectional video
Navigation-aware adaptive streaming strategies for omnidirectional video
 
Neural Architectures for Video Encoding
Neural Architectures for Video EncodingNeural Architectures for Video Encoding
Neural Architectures for Video Encoding
 
Deep Learningによる超解像の進歩
Deep Learningによる超解像の進歩Deep Learningによる超解像の進歩
Deep Learningによる超解像の進歩
 
Daniel Bochicchio, Skybernetics - “Valuable Insights from On High: Drone use ...
Daniel Bochicchio, Skybernetics - “Valuable Insights from On High: Drone use ...Daniel Bochicchio, Skybernetics - “Valuable Insights from On High: Drone use ...
Daniel Bochicchio, Skybernetics - “Valuable Insights from On High: Drone use ...
 
Data Summer Conf 2018, “How we build Computer vision as a service (ENG)” — Ro...
Data Summer Conf 2018, “How we build Computer vision as a service (ENG)” — Ro...Data Summer Conf 2018, “How we build Computer vision as a service (ENG)” — Ro...
Data Summer Conf 2018, “How we build Computer vision as a service (ENG)” — Ro...
 
Presentation NBMP and PCC
Presentation NBMP and PCCPresentation NBMP and PCC
Presentation NBMP and PCC
 
GRT Imaging for Seismic AVO/AVA Inversion
GRT Imaging for Seismic AVO/AVA InversionGRT Imaging for Seismic AVO/AVA Inversion
GRT Imaging for Seismic AVO/AVA Inversion
 
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
 
“Video Activity Recognition with Limited Data for Smart Home Applications,” a...
“Video Activity Recognition with Limited Data for Smart Home Applications,” a...“Video Activity Recognition with Limited Data for Smart Home Applications,” a...
“Video Activity Recognition with Limited Data for Smart Home Applications,” a...
 
"Separable Convolutions for Efficient Implementation of CNNs and Other Vision...
"Separable Convolutions for Efficient Implementation of CNNs and Other Vision..."Separable Convolutions for Efficient Implementation of CNNs and Other Vision...
"Separable Convolutions for Efficient Implementation of CNNs and Other Vision...
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?
 
Deep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & FutureDeep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & Future
 
Session6
Session6Session6
Session6
 
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 

Action Recognitionの歴史と最新動向

  • 1. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved.Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. Action Recognition September 3, 2018 Katsunori Ohnishi DeNA Co., Ltd. 1
  • 2. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. n n Action recognition n n n Deep Deep Temporal Aggregation n Tips n n 2
  • 3. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. n ( ) Twitter: @ohnishi_ka n 2014 4 -2017 9 : B4~M2.5 Computer Vision • ( ) : http://katsunoriohnishi.github.io/ CVPR2016 (spotlight oral, acceptance rate=9.7%): egocentric vision (wrist-mounted camera) ACMMM2016 (poster, acceptance rate=30%): action recognition ( state-of-the-art) AAAI2018 (oral, acceptance rate=10.9%): video generation (FTGAN) 2017 10 - : DeNA AI • DeNA → https://www.wantedly.com/projects/209980 3
  • 4. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. Action Recognition n Image classification action recognition = human action recognition • fine-grained egocentric 4 Fine-grained egocentric Dog-centric Action recognition RGBD Evaluation of video activity localizations integrating quality and quantity measurements [C. Wolf+, CVIU14] Recognizing Activities of Daily Living with a Wrist-mounted Camera [K. Ohnishi+, CVPR16] A Database for Fine Grained Activity Detection of Cooking Activities [M. Rohrbach+, CVPR12] First-Person Animal Activity Recognition from Egocentric Videos [Y. Iwashita+, ICPR14] Recognizing Human Actions: A Local SVM Approach [C. Schuldt+, ICPR04] HMDB: A Large Video Database for Human Motion Recognition [H. Kuehne+, ICCV11] Ucf101: A dataset of 101 human actions classes from videos in the wild [K. Soomro+, arXiv2012]
  • 5. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. n KTH, UCF101, HMDB51 • UCF101 101 13320 … n Activity-net, Kinetics, Youtube8M n AVA, Moments in times, SLAC 5 UCF101
  • 6. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. n YouTube-8M Video Understanding Challenge https://www.kaggle.com/c/youtube8m CVPR17 ECCV18 workshop , Kaggle frame-level test • kaggle , action recognition n ActivityNet Challenge http://activity-net.org/challenges/2018/ ActivityNet 3 • Temporal Proposal (T ) • Temporal localization (T ) • Video Captioning • Kinetics: classification (human action) • AVA: Spatio-temporal localization (XYT) • Moments-in-time: classification (event) 6
  • 7. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. CNN n 2000 SIFT local descriptor→coding global feature→ n STIP [I. Laptev, IJCV04] Dense Trajectory [H. Wang+, ICCV11] Improved Dense Trajectory [H. Wang+, ICCV13] 7 • http://hirokatsukataoka.net/temp/presen/170121STAIRLab_slideshar e.pdf • https://arxiv.org/pdf/1605.04988.pdf On space-time interest points [I. Laptev, IJCV04] Action Recognition by Dense Trajectories [H. Wang+, ICCV11] Action Recognition with Improved Trajectories [H. Wang+, ICCV13]
  • 8. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. CNN n Improved Dense Trajectories (iDT) [H. Wang+, ICCV13] Dense Trajectories [H. Wang+, ICCV11] 8 2 optical flow foreground optical flow Improved dense trajectories (green) (background dense trajectories (white))
  • 9. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. CNN n 9 SIFT Fisher Vector Fisher vector http://www.isi.imi.i.u-tokyo.ac.jp/~harada/pdf/SSII_harada20120608.pdf https://www.slideshare.net/takao-y/fisher-vector … input Local descriptor iDT Video descriptor Fisher Vector [F. Perronnin+, CVPR07] Classifier SVM Fisher kernels on visual vocabularies for image categorization [F. Perronnin, CVPR07] [F. Pedregosa+, JMLR11]
  • 10. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. CNN action recognition n CNN Two-stream • Hand-crafted feature ( ) 3D Convolution • C3D • C3D Two-stream • 3D conv Optical flow 10
  • 11. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. CNN action recognition: CNN n Spatio-temporal ConvNet [A. Karpathy+, CVPR 14] CNN AlexNet RGB ch → 10 frames ch (gray) multi scale Fusion Sports1M pre-training UCF101 65.4 (iDT 85.9%) 11 Large-scale video classification with convolutional neural network [A. Karpathy+, CVPR14] • 10 frames conv1 ch • RGB gray frame-by-frame score ( )
  • 12. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. CNN action recognition: Two-stream n Two-stream [K. Simonyan+, NIPS15] 2D CNN* , • Spatial-stream: RGB (input: RGB) • Temporal-stream: Optical flow (input: optical flow 10 frames) • Frame-by-frame Hand-crafted feature CNN 12 Two-stream convolutional networks for action recognition in videos [K. Simonyan+, NIPS15] UCF101 HMDB51 iDT 85.9% 57.2% Spatio-temporal ConvNet 65.4% - RGB-stream 73.0% 40.5% Flow-stream 83.7% 54.6% Two-steam 88.0% 59.4% • ( ) • 2DCNN *imagenet pre-trained
  • 13. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. CNN action recognition: 3D convolution n C3D [D. Tran +, ICCV15] 16frame 3D convolution CNN • XYT 3D convolution UCF101 pre-training ICCV15 arxiv 2 reject 13 Learning Spatiotemporal Features with 3D Convolutional Networks [D. Tran +, ICCV15] UCF101 HMDB51 iDT 85.9% 57.2% Two-steam 88.0% 59.4% C3D (1net) 82.3% - 3D conv
  • 14. Copyright (C) 2018 DeNA Co.,Ltd. All Rights Reserved. CNN action recognition: 3D convolution n P3D [Z. Qiu+, ICCV17] C3D , 3D conv → 2D conv (XY) + 1D conv (T) pre-training 14 Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks [Z. Qiu+, ICCV17] UCF101 HMDB51 iDT 85.9% 57.2% Two-steam (Alexnet) 88.0% 59.4% P3D (ResNet) 88.6% - Spatial 2D conv Temporal 1D conv
  • 15. CNN action recognition: 3D convolution. P3D [Z. Qiu+, ICCV17]: C3D; 3D conv → 2D conv (XY) + 1D conv (T); pre-training. Results (UCF101 / HMDB51): iDT 85.9% / 57.2%; Two-stream (AlexNet) 88.0% / 59.4%; P3D (ResNet) 88.6% / -; Two-stream (ResNet152) 91.8%. (Figure labels: Spatial 2D conv, Temporal 1D conv.) 3D conv again. Reference: Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks [Z. Qiu+, ICCV17].
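The factorization "3D conv → 2D conv (XY) + 1D conv (T)" can be made concrete by counting weights. This is a minimal arithmetic sketch; the channel count of 64 and the 3x3x3 kernel are assumed example values, not figures from the slides.

```python
def conv3d_params(c_in, c_out, k_t, k_h, k_w):
    """Number of weights in a 3D convolution (bias ignored)."""
    return c_in * c_out * k_t * k_h * k_w

# Full 3D convolution over XYT with a 3x3x3 kernel.
full = conv3d_params(64, 64, 3, 3, 3)

# Pseudo-3D factorization: spatial 1x3x3 conv followed by temporal 3x1x1 conv.
spatial = conv3d_params(64, 64, 1, 3, 3)
temporal = conv3d_params(64, 64, 3, 1, 1)
factorized = spatial + temporal

print(full, factorized, factorized / full)
```

Under these assumptions the factorized pair uses under half the weights of the full 3D kernel, and the spatial part has the same shape as a 2D kernel.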
  • 17. CNN action recognition: 3D convolution. C3D, P3D: 3D conv. 3D conv [K. Hara+, CVPR18]. (Figure: timeline with labels 2012, 2011, 2015, 2017; 2017: Kinetics!) Reference: Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? [K. Hara+, CVPR18].
  • 18. CNN action recognition: 3D convolution. Kinetics human action dataset! 3D conv. • Pre-train, UCF101 • Youtube8M. Reference: The Kinetics human action video dataset [W. Kay+, arXiv17].
  • 19. CNN action recognition: 3D convolution. I3D [J. Carreira+, CVPR17]: Kinetics dataset; DeepMind; 3D conv Inception; 64 GPUs for training, 16 GPUs for prediction; state-of-the-art. • RGB • Two-stream: optical flow, score. Results (UCF101 / HMDB51): RGB-I3D 95.6% / 74.8%; Flow-I3D 96.7% / 77.1%; Two-stream I3D 98.0% / 80.7%. Reference: Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J. Carreira+, CVPR17].
  • 21. CNN action recognition: 3D convolution. I3D, Two-stream, 3D convolution. 3D conv: XY, T. • XY, T, 3D conv. (Figure axis: time.)
  • 22. CNN action recognition: 3D convolution. 3D convolution [D.A. Huang+, CVPR18]: • 3D CNN • Two-stream I3D, Optical flow, 3D conv. Reference: What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets [D.A. Huang+, CVPR18].
  • 23. CNN action recognition: 3D convolution. 3D conv: CVPR18; CVPR/ICCV/ECCV; 3D conv. • GPU
  • 24. CNN action recognition: Optical flow. Optical flow [L. Sevilla-Lara+, CVPR18]: • Optical flow • Optical flow (EPE), action recognition • flow, action recognition • Optical flow, appearance • Optical flow. Reference: On the Integration of Optical Flow and Action Recognition [L. Sevilla-Lara+, CVPR18].
  • 25. AVA [C. Gu+, CVPR18]: XYZT bounding box, human action localization. Moments-in-time [M. Monfort+, arXiv2018]: 3-second videos. Kinetics-600 [W. Kay+, arXiv2017]: Kinetics, 400 → 600 classes.
  • 26. Temporal Aggregation. 2D conv: frame-by-frame. 3D conv. (100 frames, 232 frames, 50 frames)
  • 27. Temporal Aggregation. Score → LSTM → • FC? • fencing → fencing → … (Figure: CNN → LSTM → FC repeated per frame.) CVPR, ACMMM, AAAI, …
  • 28. Temporal Aggregation. Pipeline: input → local descriptor (iDT) → video descriptor (Fisher Vector [F. Perronnin+, CVPR07]) → classifier (SVM [F. Pedregosa+, JMLR11]). Fisher Vector: • CNN, SIFT, GMM • FV, VLAD [H. Jegou+, CVPR10]. Reference: Aggregating local descriptors into a compact image representation [H. Jegou+, CVPR10].
  • 29. Temporal Aggregation. LCD [Z. Xu+, CVPR15]: VGG16 pool5; 512-dim feature at each XY position. • 224x224 input: 7x7=49 features • VLAD → global feature. Pipeline: input → CNN → Pool5 (e.g. 2x2x512) → local descriptors → VLAD → global feature → SVM. Reference: A discriminative CNN video representation for event detection [Z. Xu+, CVPR15].
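The LCD pipeline on slide 29 (pool5 local descriptors → VLAD → global feature) can be sketched as follows. The random pool5 tensor and the random 16-center codebook are placeholders; a real codebook would normally come from k-means on training descriptors, and the function name `vlad_encode` is hypothetical.

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """Hard-assignment VLAD.

    descriptors: (n, d) local descriptors
    centers:     (k, d) codebook
    returns:     (k * d,) L2-normalized vector of summed residuals
    """
    # Squared distance from every descriptor to every center: (n, k).
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    k, d = centers.shape
    vlad = np.zeros((k, d))
    for j in range(k):
        members = descriptors[assign == j]
        if len(members):
            # Sum of residuals to the assigned center.
            vlad[j] = (members - centers[j]).sum(axis=0)
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

rng = np.random.default_rng(0)
n_frames = 8
# Placeholder for VGG16 pool5 outputs: 7x7 positions, 512 channels per frame.
pool5 = rng.normal(size=(n_frames, 7, 7, 512))
# Each XY position of each frame becomes one 512-dim local descriptor.
local_descriptors = pool5.reshape(-1, 512)
centers = rng.normal(size=(16, 512))  # assumed codebook
video_feature = vlad_encode(local_descriptors, centers)
```

The resulting fixed-length `video_feature` does not depend on the number of frames, which is what lets a single SVM handle videos of different lengths.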
  • 30. Temporal Aggregation. ActionVLAD [R. Girdhar+, CVPR17]: NetVLAD [R. Arandjelović+, CVPR16]. • NetVLAD: VLAD, NN; cluster assign → softmax assign • VLAD, LCD, VLAD • End2end CNN! Reference: ActionVLAD: Learning spatio-temporal aggregation for action classification [R. Girdhar+, CVPR17].
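The step from hard cluster assignment to softmax assignment can be illustrated with a short sketch. It uses fixed centers and a fixed sharpness `alpha`, whereas NetVLAD learns its assignment parameters end to end; the function name and the parameter values are assumptions for illustration only.

```python
import numpy as np

def soft_vlad(descriptors, centers, alpha=1.0):
    """VLAD with softmax (soft) cluster assignment.

    descriptors: (n, d), centers: (k, d)
    alpha: assumed sharpness; large values approach hard assignment
    returns: (k * d,) L2-normalized vector and the (n, k) assignment weights
    """
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)          # each row sums to 1
    # V[k] = sum_i a[i, k] * (x_i - c_k), written as two matrix terms.
    vlad = a.T @ descriptors - a.sum(axis=0)[:, None] * centers
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return (vlad / norm if norm > 0 else vlad), a

x = np.array([[0.0, 0.0], [10.0, 10.0]])
c = np.array([[0.0, 1.0], [10.0, 9.0]])
encoded, weights = soft_vlad(x, c, alpha=10.0)
```

Because every operation here is differentiable, the same computation can sit inside a network and be trained with backpropagation, which hard `argmin` assignment does not allow.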
  • 31. Temporal Aggregation. TLE [A. Diba+, CVPR17]: VLAD, Compact Bilinear Pooling [Y. Gao+, CVPR16]; Temporal Aggregation; VLAD. • SVM, VLAD, NN. Reference: Deep Temporal Linear Encoding Networks [A. Diba+, CVPR17].
  • 32. Tips. Two-stream (ResNet): 2D conv, Optical flow. Single model, state-of-the-art: I3D + TLE; BA; 64 GPUs. Two-stream: optical flow, GPU. • optical flow stream • RGB-stream, Optical flow
  • 33. Tips. CNN, TLE, coding. • TLE, ActionVLAD, iDT • CNN • Fisher Vector, iDT. Tips: PCA (dim=64), K=256, FV power norm. • CPU
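The Fisher Vector tips (PCA to 64 dimensions, K=256, power normalization) can be sketched with NumPy alone. The 96-dim raw descriptors and the GMM parameters below are random placeholders; in practice the GMM would be fit with EM on PCA-reduced training descriptors. All function names are hypothetical.

```python
import numpy as np

def pca_fit(x, dim):
    """Return the mean and a (d, dim) projection from an SVD of centered data."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:dim].T

def fisher_vector(x, weights, means, sigmas):
    """Fisher Vector (gradients w.r.t. means and std devs) for a diagonal GMM."""
    n = x.shape[0]
    diff = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]   # (n, k, d)
    # Log posterior up to a constant that cancels in the normalization.
    log_p = (np.log(weights)[None, :]
             - 0.5 * (diff ** 2).sum(axis=2)
             - np.log(sigmas).sum(axis=1)[None, :])
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                          # (n, k)
    g_mu = (gamma[:, :, None] * diff).sum(axis=0) / (n * np.sqrt(weights)[:, None])
    g_sigma = ((gamma[:, :, None] * (diff ** 2 - 1.0)).sum(axis=0)
               / (n * np.sqrt(2.0 * weights)[:, None]))
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])

def power_l2_normalize(v, alpha=0.5):
    """Signed power normalization followed by L2 normalization."""
    v = np.sign(v) * np.abs(v) ** alpha
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 96))          # placeholder local descriptors
mean, proj = pca_fit(raw, 64)             # PCA (dim=64)
x = (raw - mean) @ proj
K = 256                                   # K=256 Gaussians (placeholder parameters)
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, 64))
sigmas = np.ones((K, 64))
fv = power_l2_normalize(fisher_vector(x, weights, means, sigmas))
```

With these settings the video descriptor has 2 x 256 x 64 = 32768 dimensions, which is then passed to a linear classifier such as an SVM.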
  • 34. Temporal Aggregation. Score → LSTM → • FC? • fencing → fencing → … (Figure: CNN → LSTM → FC repeated per frame.) CVPR, ACMMM, AAAI, … input ↓ Two-stream
  • 35. LSTM, 3D conv, Optical flow. • [L. Sevilla-Lara+, CVPR18] (Figure: CNN → LSTM → FC repeated per frame.)
  • 36. Approaches: 2D conv + LSTM; 3D conv; 3D conv, Two-stream, Optical flow. Video generation: MoCoGAN [S. Tulyakov+, CVPR18], VGAN [C. Vondrick+, NIPS16], TGAN [M. Saito+, ICCV17], FTGAN [K. Ohnishi+, AAAI18]. Action recognition: LRCN [J. Donahue+, CVPR15], C3D [D. Tran+, ICCV15], P3D [Z. Qiu+, ICCV17], Two-stream [K. Simonyan+, NIPS14], I3D [J. Carreira+, CVPR17]. (Note: VGAN)
  • 38. Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture. K. Ohnishi+, AAAI 2018 (oral presentation). https://arxiv.org/abs/1711.09618 (Figure label: Optical flow)
  • 39. Action classification: • Temporal action localization, Spatio-temporal localization, 3D conv, Augmentation. Pose: • pose • data distillation. Tips & optical flow: Kinetics, Youtube
  • 40. XY → XYT: O(n^2) → O(n^3). • !
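The jump from O(n^2) to O(n^3) when going from XY to XYT can be checked by counting multiply-accumulates for one convolution layer. This is a rough sketch that assumes stride 1, same padding, and an n x n x n input; the layer sizes are example values, not figures from the slides.

```python
def conv_macs(spatial_size, kernel, c_in, c_out, frames=None):
    """Multiply-accumulates of one stride-1, same-padded conv layer.

    frames=None counts a 2D (XY) convolution; otherwise a 3D (XYT) one.
    """
    macs = spatial_size ** 2 * kernel ** 2 * c_in * c_out
    if frames is not None:
        # One extra input axis (T) and one extra kernel axis.
        macs *= frames * kernel
    return macs

n = 112
macs_2d = conv_macs(n, 3, 64, 64)
macs_3d = conv_macs(n, 3, 64, 64, frames=16)

# Doubling n: the 2D cost grows 4x, the 3D cost (with T = n) grows 8x.
growth_2d = conv_macs(2 * n, 3, 64, 64) / conv_macs(n, 3, 64, 64)
growth_3d = conv_macs(2 * n, 3, 64, 64, frames=2 * n) / conv_macs(n, 3, 64, 64, frames=n)
```

Under these assumptions a 16-frame 3D layer costs 48 times the matching 2D layer applied to a single frame.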