Pyramid Scene Parsing Network
Hengshuang Zhao¹, Jianping Shi², Xiaojuan Qi¹, Xiaogang Wang¹, Jiaya Jia¹
¹ The Chinese University of Hong Kong, ² SenseTime Group Limited
Presentation: Shunta Saito
Slide: Powered by Deckset
(c) Preferred Networks 1
Summary
• Introduces the Pyramid Pooling Module, which captures context better through sub-region awareness
Why did I choose this paper?
• Presented at CVPR 2017
• 1st place in the ImageNet Scene Parsing Challenge 2016 (ADE20K)
• Was 1st place on the Cityscapes leaderboard
• Now it is in 2nd place (I noticed this last week!)
Agenda
1. Common building blocks in semantic segmentation
2. Major Issue
3. Prior Work
4. Pyramid Pooling Module
5. Experiment results
Semantic Segmentation
• Predict pixel-wise labels from natural images
• Each pixel in an image belongs to an object class
• So it is not instance-aware!
Common Building Blocks (1)
Fully convolutional network (FCN) [1]
• A deep convolutional neural network that doesn't include any fully-connected layers
• Almost all recent methods are based on FCN
• Typically pre-trained on ImageNet under a classification problem setting
[1] "Fully Convolutional Networks for Semantic Segmentation", PAMI 2016
Common Building Blocks (2)
Dilated convolution [2]
• Widens the receptive field without reducing feature map resolution
• Important for leveraging the global context prior efficiently
[2] "Multi-Scale Context Aggregation by Dilated Convolutions", ICLR 2016
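To make the receptive-field claim concrete, here is a minimal pure-Python sketch of a 1-D dilated convolution. The function names and the toy signal are illustrative assumptions, not code from the paper or the slides.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D convolution (cross-correlation) with `dilation` gaps between taps."""
    span = (len(kernel) - 1) * dilation + 1  # input positions covered by one output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * dilation] for i, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 layers with the given dilation rates."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


signal = list(range(16))
kernel = [1, 1, 1]
dense = dilated_conv1d(signal, kernel, dilation=1)    # taps at offsets 0, 1, 2
dilated = dilated_conv1d(signal, kernel, dilation=4)  # taps at offsets 0, 4, 8

plain_stack = receptive_field(3, [1, 1, 1])    # three ordinary 3-tap layers
dilated_stack = receptive_field(3, [1, 2, 4])  # same depth, growing dilation
```

Both stacks keep stride 1, so the feature map is never downsampled; only the spacing between kernel taps changes.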
Common Building Blocks (3)
Multi-scale feature ensemble
• Higher-layer features contain more semantic meaning and less location information
• Combining multi-scale features can improve performance [3]
[3] "Hypercolumns for Object Segmentation and Fine-grained Localization", CVPR 2015
Common Building Blocks (4)
Conditional random field (CRF)
• Post-processing to refine the segmentation result (DeepLab [4])
• Some subsequent methods refine the network via end-to-end modeling (DPN [5], CRF as RNN [6], Detections and Superpixels [7])
[4] "Semantic image segmentation with deep convolutional nets and fully connected crfs", ICLR 2015
[5] "Semantic image segmentation via deep parsing network", ICCV 2015
[6] "Conditional random fields as recurrent neural networks", ICCV 2015
[7] "Higher order conditional random fields in deep neural networks", ECCV 2016
Common Building Blocks (5)
Global average pooling (GAP)
• ParseNet [8] showed that global average pooling with FCN can improve semantic segmentation results
• But the global descriptors used in that paper are not representative enough for some challenging datasets like ADE20K
[8] "Parsenet: Looking wider to see better", ICLR 2016
Major Issue (1)
Mismatched relationship
• Co-occurrent visual patterns imply some contexts
• e.g., an airplane is likely to be flying in the sky, not over a road
• Lacking the ability to collect contextual information increases the chance of misclassification
• In the figure on the right, FCN predicts the boat in the yellow box as a "car" based on its appearance
Major Issue (2)
Confusing Classes
• There are confusing classes in major datasets: field and earth; mountain and hill; wall, house, building, and skyscraper, etc.
• Even an expert human annotator makes 17.6% pixel error on ADE20K [9]
• FCN predicts the object in the box as part skyscraper and part building, but the whole object should be either skyscraper or building, not both
• Utilizing the relationship between classes is important
[9] "Semantic understanding of scenes through the ADE20K dataset", CVPR 2017
Major Issue (3)
Inconspicuous Classes
• Small objects like streetlights and signboards are inconspicuous and hard to find, though they may be important
• Predictions for big objects may be discontinuous; FCN could not correctly label the pillow, which has a similar appearance to the sheet
• To improve performance on small or very big objects, more attention should be paid to sub-regions
Summary of Issues
• Use co-occurrent visual patterns as context
• Consider relationship between classes
• Pay more attention to sub-regions
Prior Work
Global Average Pooling (GAP) [10]
• The receptive field of ResNet is already larger than the input image, so GAP sounds like a good way to summarize all the information
• But the pixels in an image may belong to various objects of different sizes, so directly fusing them into a single vector may lose the spatial relations and cause ambiguity
[10] "Parsenet: Looking wider to see better", ICLR 2016
Prior Work
Spatial Pyramid Pooling (SPP) [11]
• Apply pooling with different kernel/stride sizes to the feature maps
• Then flatten and concatenate the pooling results to make a fixed-length representation
• There is still a loss of context information
[11] "Spatial pyramid pooling in deep convolutional networks for visual recognition", ECCV 2014
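The fixed-length property can be illustrated with a small pure-Python sketch for a single-channel feature map. The pyramid levels (1, 2, 4), the use of max pooling, and all function names are assumptions chosen for illustration.

```python
def bin_edges(size, bins):
    """Split `size` positions into `bins` contiguous, non-empty [start, end) ranges."""
    return [((i * size) // bins, -(-((i + 1) * size) // bins)) for i in range(bins)]


def adaptive_max_pool(fmap, bins):
    """Pool one 2-D channel down to a bins x bins grid, whatever its input size."""
    rows, cols = bin_edges(len(fmap), bins), bin_edges(len(fmap[0]), bins)
    return [[max(fmap[y][x] for y in range(r0, r1) for x in range(c0, c1))
             for (c0, c1) in cols]
            for (r0, r1) in rows]


def spp_vector(fmap, levels=(1, 2, 4)):
    """Flatten and concatenate the pooled grids of every pyramid level."""
    vector = []
    for bins in levels:
        for row in adaptive_max_pool(fmap, bins):
            vector.extend(row)
    return vector


# Two feature maps of different spatial size give vectors of the same length.
small = [[y * 8 + x for x in range(8)] for y in range(6)]        # 6 x 8
large = [[(x * y) % 7 for x in range(9)] for y in range(13)]     # 13 x 9
small_vec = spp_vector(small)
large_vec = spp_vector(large)
```

Flattening discards where each bin was located, which is the context information loss the last bullet refers to.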
Pyramid Pooling Module
• A hierarchical global prior, containing information at different scales and varying among different sub-regions
• The Pyramid Pooling Module builds this global scene prior on top of the final-layer feature map
Pyramid Pooling Module
• Use a 1x1 conv to reduce the number of channels of each pooled feature map
• Then upsample them (bilinear) to the same size and concatenate them all
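The module described on these two slides can be sketched in pure Python on a tiny feature map stored as a list of 2-D channels. This is a simplified illustration under several assumptions: the 1x1 conv weights are fixed averages standing in for learned parameters, the bilinear resize uses corner alignment, and every function name is hypothetical.

```python
def bin_edges(size, bins):
    return [((i * size) // bins, -(-((i + 1) * size) // bins)) for i in range(bins)]


def avg_pool(fmap, bins):
    """Adaptive average pooling of one 2-D channel down to bins x bins."""
    rows, cols = bin_edges(len(fmap), bins), bin_edges(len(fmap[0]), bins)
    pooled = []
    for r0, r1 in rows:
        row = []
        for c0, c1 in cols:
            vals = [fmap[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(vals) / len(vals))
        pooled.append(row)
    return pooled


def conv1x1(channels, weights):
    """1x1 convolution: each output channel is a weighted sum of input channels."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[[sum(wt * ch[y][x] for wt, ch in zip(w_out, channels)) for x in range(w)]
             for y in range(h)]
            for w_out in weights]


def upsample_bilinear(fmap, out_h, out_w):
    """Bilinear resize with corner alignment (one of several common conventions)."""
    in_h, in_w = len(fmap), len(fmap[0])
    out = []
    for y in range(out_h):
        fy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1, ty = min(y0 + 1, in_h - 1), fy - y0
        row = []
        for x in range(out_w):
            fx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1, tx = min(x0 + 1, in_w - 1), fx - x0
            top = fmap[y0][x0] * (1 - tx) + fmap[y0][x1] * tx
            bottom = fmap[y1][x0] * (1 - tx) + fmap[y1][x1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out


def pyramid_pooling(channels, levels=(1, 2, 3, 6)):
    h, w = len(channels[0]), len(channels[0][0])
    n_out = max(1, len(channels) // len(levels))  # reduce each level to C / N channels
    # Hypothetical fixed weights; a real module learns these 1x1 conv parameters.
    weights = [[1.0 / len(channels)] * len(channels) for _ in range(n_out)]
    branches = list(channels)  # the original feature map is kept in the concatenation
    for bins in levels:
        pooled = [avg_pool(ch, bins) for ch in channels]
        reduced = conv1x1(pooled, weights)
        branches.extend(upsample_bilinear(ch, h, w) for ch in reduced)
    return branches


# Four constant 12 x 12 channels with values 0, 1, 2, 3.
features = [[[float(c)] * 12 for _ in range(12)] for c in range(4)]
output = pyramid_pooling(features)
```

With 4 input channels and 4 levels, each level is reduced to 1 channel, so the output has 4 + 4 = 8 channels. The same arithmetic gives 2048 + 4 x 512 = 4096 channels for a 2048-channel ResNet feature map.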
Implementation details (1)
• Average pooling has four levels: 1x1, 2x2, 3x3, and 6x6 bins (set via ksize and stride)
• A pre-trained ResNet model with dilated convolution is used as the feature extractor (the output size is 1/8 of the input image)
• They use two losses:
1. softmax loss between the final layer and the labels
2. softmax loss between an intermediate output of ResNet and the labels [12] (weighted by 0.4)
[12] "Relay backpropagation for effective learning of deep convolutional neural networks", ECCV 2016
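The two-loss objective can be written down in a few lines. The sketch below works on flattened pixels with per-pixel logit lists; the function names and the toy inputs are assumptions, while the 0.4 weight comes from the slide.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross entropy of softmax(logits) against an integer class label."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[label]


def mean_pixel_loss(logit_map, labels):
    """Average the per-pixel loss over a flattened list of pixels."""
    losses = [softmax_cross_entropy(l, t) for l, t in zip(logit_map, labels)]
    return sum(losses) / len(losses)


def total_loss(main_logits, aux_logits, labels, aux_weight=0.4):
    """Main loss on the final output plus the weighted auxiliary loss."""
    main = mean_pixel_loss(main_logits, labels)
    aux = mean_pixel_loss(aux_logits, labels)
    return main + aux_weight * aux


# Two pixels, two classes, uninformative logits on both heads.
labels = [0, 1]
main_logits = [[0.0, 0.0], [0.0, 0.0]]
aux_logits = [[0.0, 0.0], [0.0, 0.0]]
loss = total_loss(main_logits, aux_logits, labels)  # equals 1.4 * ln(2)
```

The auxiliary branch only shapes training; at test time the prediction comes from the final layer alone.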
Implementation details (2)
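Optimization
• MomentumSGD with weight decay
• Momentum: 0.9
• Weight decay: 0.0001
LR Scheduling
• "poly" policy from the paper: $lr = lr_{base} \cdot (1 - iter / max\_iter)^{power}$, where $lr_{base} = 0.01$ and $power = 0.9$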
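A minimal sketch of these optimizer settings follows. The "poly" policy with base rate 0.01 and power 0.9 follows the PSPNet paper; the exact form of the momentum update is one common convention and an assumption here, as are the function names.

```python
def poly_lr(iteration, max_iter, base_lr=0.01, power=0.9):
    """'poly' schedule: decays from base_lr at iteration 0 to 0 at max_iter."""
    return base_lr * (1.0 - iteration / max_iter) ** power


def momentum_sgd_step(weight, velocity, grad, lr, momentum=0.9, weight_decay=0.0001):
    """One scalar update; weight decay is added to the gradient (assumed convention)."""
    grad = grad + weight_decay * weight
    velocity = momentum * velocity - lr * grad
    return weight + velocity, velocity


MAX_ITER = 150_000  # ADE20K iteration count from the slides
lr_start = poly_lr(0, MAX_ITER)
lr_mid = poly_lr(MAX_ITER // 2, MAX_ITER)
lr_end = poly_lr(MAX_ITER, MAX_ITER)

# With a zero gradient, only weight decay moves the weight.
new_weight, new_velocity = momentum_sgd_step(1.0, 0.0, 0.0, lr_start)
```

Because the power is below 1, the rate stays above a linear decay for most of training and drops steeply near the end.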
Implementation details (3)
Training iterations
• ADE20K: 150K
• PASCAL VOC: 30K
• Cityscapes: 90K
Dataset augmentation
• Random mirror
• Random resize between 0.5 and 2
• Random rotation between -10 and 10 degrees
• Random Gaussian blur for ADE20K and PASCAL VOC
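The augmentation recipe can be sketched as a parameter sampler. The ranges come from the slide; uniform sampling, the 0.5 probabilities for mirroring and blurring, and all names are assumptions. Only mirroring is actually applied here, since it needs no image library.

```python
import random


def sample_augmentation(rng, use_blur):
    """Draw one set of augmentation parameters for a training sample."""
    return {
        "mirror": rng.random() < 0.5,              # assumed probability
        "scale": rng.uniform(0.5, 2.0),            # random resize factor
        "rotation_deg": rng.uniform(-10.0, 10.0),  # random rotation angle
        "gaussian_blur": use_blur and rng.random() < 0.5,  # assumed probability
    }


def mirror(image):
    """Horizontal flip of a 2-D list; apply it to the image and its label map alike."""
    return [row[::-1] for row in image]


rng = random.Random(0)
# Blur is used for ADE20K and PASCAL VOC but not for Cityscapes.
ade20k_samples = [sample_augmentation(rng, use_blur=True) for _ in range(1000)]
cityscapes_samples = [sample_augmentation(rng, use_blur=False) for _ in range(1000)]
flipped = mirror([[1, 2, 3], [4, 5, 6]])
```

Geometric transforms (mirror, resize, rotation) must be applied identically to the image and its label map, while blur affects the image only.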
Implementation details (4)
• An appropriately large "cropsize" can yield good performance
• The "batchsize" seen by the batch normalization layers is of great importance
Cropsize
• ADE20K: 473 x 473
• PASCAL VOC: 473 x 473
• Cityscapes: 713 x 713
Batchsize
• 16 for all datasets
Implementation details (5)
Multi-Node Batch Normalization
• To increase the "batchsize" in batch normalization layers, they used a custom BN layer applied to data gathered from multiple GPUs using OpenMPI
• We have Akiba-san's implementation of multi-node batch normalization!
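The idea behind batch normalization across GPUs can be simulated without any communication library: each worker reports its sample count, mean, and mean of squares, and the combined values match the statistics of the full batch. This is a sketch of the principle only, not the authors' layer or the ChainerMN implementation; in a real setup the sums would be exchanged with an all-reduce, and all names here are assumptions.

```python
def local_stats(values):
    """Statistics one worker computes on its own shard of the batch."""
    n = len(values)
    mean = sum(values) / n
    sq_mean = sum(v * v for v in values) / n
    return n, mean, sq_mean


def combine_stats(per_worker):
    """Merge per-worker statistics into the mean and variance of the full batch."""
    total = sum(n for n, _, _ in per_worker)
    mean = sum(n * m for n, m, _ in per_worker) / total
    sq_mean = sum(n * s for n, _, s in per_worker) / total
    return mean, sq_mean - mean * mean


def normalize(values, mean, var, eps=1e-5):
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Three simulated workers holding shards of one activation channel.
shards = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
global_mean, global_var = combine_stats([local_stats(s) for s in shards])
normalized = [normalize(s, global_mean, global_var) for s in shards]
```

Normalizing each shard with only its own statistics would instead treat every GPU as a separate, much smaller batch.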
ImageNet Scene Parsing Challenge 2016
• Dataset: ADE20K
• 150 classes and 1,038 image-level labels
• 20,000/2,000/3,000 pixel-level labels for train/val/test
Ablation Study for Pyramid Pooling Module
• Average pooling works better than max pooling in all settings
• Pyramid pooling outperforms global pooling
• With dimension reduction (DR; reducing the number of channels after pyramid pooling), the performance is further enhanced
Ablation Study for Auxiliary Loss
• Set the auxiliary loss weight to values between 0 and 1 and compared the final results
• A weight of 0.4 yields the best performance
Ablation Study for the Depth of ResNet
Deeper is better
More Detailed Performance Analysis
Additional processing | Improvement (% in mIoU)
Data augmentation (DA) | +1.54
Auxiliary loss (AL) | +1.41
Pyramid pooling module (PSP) | +4.45
Use deeper ResNet (50 to 269) | +2.13
Multi-scale testing (MS) | +1.13
• For multi-scale testing, they create predictions at 6 different scales (0.5, 0.75, 1, 1.25, 1.5, and 1.75) and average them.
Results on PASCAL VOC 2012
• Extended with the Semantic Boundaries Dataset (SBD) [13], they used 10,582, 1,449, and 1,456 images for train/val/test
• Mismatched relationship: For "aeroplane" and "sky" in the second and third rows, PSPNet finds the missing parts
• Confusing classes: For "cows" in row one, the baseline model treats them as "horse" and "dog", while PSPNet corrects these errors
• Inconspicuous objects: For "person", "bottle", and "plant" in the following rows, PSPNet performs well on these small-size-object classes compared to the baseline model
[13] "Semantic Contours from Inverse Detectors", ICCV 2011, http://home.bharathh.info/pubs/codes/SBD/download.html
Results on PASCAL VOC 2012
• Comparison of PSPNet with the previous best-performing methods on the test set under two settings, i.e., with or without pre-training on the MS-COCO dataset
Results on Cityscapes
• The Cityscapes dataset consists of 2,975, 500, and 1,525 train/val/test images (19 classes)
• 20,000 coarsely annotated images are also available (in the table below, ‡ means they are used)
Thank you for your attention
• The official repository doesn't include any training code
• My own implementation for both training and testing is ready:
• mitmul/chainer-pspnet: https://github.com/mitmul/chainer-pspnet
• Now I'm training a model to check reproducibility
• Once the reproduction work is finished, I'll send the code to ChainerCV
• In the semantic segmentation task,
• the input image is large (713 x 713 for PSPNet on Cityscapes)
• an appropriate batchsize, e.g., 16 or so, is important for batch normalization
• As the authors said, distributed batch normalization seems to be important in multi-GPU training
• So ChainerMN is now a necessary tool for such large-scale datasets and deep models
• It means that we need more GPU machines connected with InfiniBand
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

[unofficial] Pyramid Scene Parsing Network (CVPR 2017)

  • 1. Pyramid Scene Parsing Network • Hengshuang Zhao¹, Jianping Shi², Xiaojuan Qi¹, Xiaogang Wang¹, Jiaya Jia¹ • ¹The Chinese University of Hong Kong, ²SenseTime Group Limited • Presentation: Shunta Saito • Slide: Powered by Deckset (c) Preferred Networks 1
  • 2. Summary • Introduce the Pyramid Pooling Module, which captures context better through sub-region awareness
  • 3. Why did I choose this paper? • Presented at CVPR 2017 • 1st place in the ImageNet Scene Parsing Challenge 2016 (ADE20K) • Was 1st place on the Cityscapes leaderboard • Now it's in 2nd place (I noticed this last week!)
  • 4. Agenda 1. Common building blocks in semantic segmentation 2. Major Issue 3. Prior Work 4. Pyramid Pooling Module 5. Experiment results
  • 5. Semantic Segmentation • Predict pixel-wise labels from natural images • Each pixel in an image belongs to an object class • So it's not instance-aware!
  • 6. Common Building Blocks (1) Fully convolutional network (FCN) [1] • A deep convolutional neural network which doesn't include any fully-connected layers • Almost all recent methods are based on FCN • Typically pre-trained with ImageNet under the classification problem setting • [1] "Fully Convolutional Networks for Semantic Segmentation", PAMI 2016
  • 7. Common Building Blocks (2) Dilated convolution [2] • Widens the receptive field without reducing feature map resolution • Important for leveraging the global context prior efficiently • [2] "Multi-Scale Context Aggregation by Dilated Convolutions", ICLR 2016
  • 8. Common Building Blocks (3) Multi-scale feature ensemble • Higher-layer features contain more semantic meaning and less location information • Combining multi-scale features can improve performance [3] • [3] "Hypercolumns for Object Segmentation and Fine-grained Localization", CVPR 2015
  • 9. Common Building Blocks (4) Conditional random field (CRF) • Post-processing to refine the segmentation result (DeepLab [4]) • Some subsequent methods refined the network via end-to-end modeling (DPN [5], CRF as RNN [6], Detections and Superpixels [7]) • [4] "Semantic image segmentation with deep convolutional nets and fully connected crfs", ICLR 2015 • [5] "Semantic image segmentation via deep parsing network", ICCV 2015 • [6] "Conditional random fields as recurrent neural networks", ICCV 2015 • [7] "Higher order conditional random fields in deep neural networks", ECCV 2016
  • 10. Common Building Blocks (5) Global average pooling (GAP) • ParseNet [8] showed that global average pooling with FCN can improve semantic segmentation results • But the global descriptors used in the paper are not representative enough for some challenging datasets like ADE20K • [8] "Parsenet: Looking wider to see better", ICLR 2016
  • 11. Major Issue (1) Mismatched relationship • Co-occurrent visual patterns imply some contexts • e.g., an airplane is likely to be flying in the sky, not over a road • Lacking the ability to collect contextual information increases the chance of misclassification • In the right figure, FCN predicts the boat in the yellow box as a "car" based on its appearance
  • 12. Major Issue (2) Confusing Classes • There are confusing classes in major datasets: field and earth; mountain and hill; wall, house, building and skyscraper, etc. • Even an expert human annotator still makes 17.6% pixel error on ADE20K [9] • FCN predicts the object in the box as partly skyscraper and partly building, but the whole object should be either skyscraper or building, not both • Utilizing the relationship between classes is important • [9] "Semantic understanding of scenes through the ADE20K dataset", CVPR 2017
  • 13. Major Issue (3) Inconspicuous Classes • Small objects like streetlights and signboards are inconspicuous and hard to find, although they may be important • Predictions for big objects may be discontinuous; FCN couldn't correctly label the pillow, which has a similar appearance to the sheet • To improve performance for small or very big objects, more attention should be paid to sub-regions
  • 14. Summary of Issues • Use co-occurrent visual patterns as context • Consider the relationship between classes • More attention should be paid to sub-regions
  • 15. Prior Work Global Average Pooling (GAP) [10] • The receptive field of ResNet is already larger than the input image, so GAP sounds like a good way to summarize all the information • But pixels in an image may belong to various objects of different sizes, so directly fusing them into a single vector may lose the spatial relation and cause ambiguity • [10] "Parsenet: Looking wider to see better", ICLR 2016
  • 16. Prior Work Spatial Pyramid Pooling (SPP) [11] • Apply pooling with different kernel/stride sizes to the feature maps • Then flatten and concatenate the pooling results to make a fixed-length representation • There is still a loss of context information • [11] "Spatial pyramid pooling in deep convolutional networks for visual recognition", ECCV 2014
  • 17. Pyramid Pooling Module • A hierarchical global prior, containing information with different scales and varying among different sub-regions • Pyramid Pooling Module for global scene prior constructed on the top of the final-layer-feature-map
  • 18. Pyramid Pooling Module • Use 1x1 conv to reduce the number of channels • Then upsample (bilinear) them to the same size and concatenate all
  • 19. Implementation details (1) • Average pooling is applied at four levels: 1x1, 2x2, 3x3, and 6x6 (ksize, stride) • A pre-trained ResNet model with dilated convolution is used as the feature extractor (the output size will be 1/8 of the input image) • They use two losses: 1. softmax loss between the final layer and labels 2. softmax loss between an intermediate output of ResNet and labels [12] (weighted by 0.4) • [12] "Relay backpropagation for effective learning of deep convolutional neural networks", ECCV 2016
  • 20. Implementation details (2) • Optimization: MomentumSGD with weight decay (momentum: 0.9, weight decay: 0.0001) • LR scheduling: (equation not preserved in the transcript)
  • 21. Implementation details (3) • Training iterations: ADE20K: 150K; PASCAL VOC: 30K; Cityscapes: 90K • Dataset augmentation: random mirror; random resize between 0.5 and 2; random rotation between -10 and 10 degrees; random Gaussian blur for ADE20K and PASCAL VOC
  • 22. Implementation details (4) • An appropriately large "cropsize" can yield good performance • "batchsize" in the batch normalization layer is of great importance • Cropsize: ADE20K: 473 x 473; PASCAL VOC: 473 x 473; Cityscapes: 713 x 713 • Batchsize: 16 for all datasets
  • 23. Implementation details (5) MultiNode Batch Normalization • To increase the "batchsize" in batch normalization layers, they used a custom BN layer applied to data gathered from multiple GPUs using OpenMPI • We have Akiba-san's implementation of multi-node batch normalization!
  • 24. ImageNet Scene Parsing Challenge 2016 • Dataset: ADE20K • 150 classes and 1,038 image-level labels • 20,000/2,000/3,000 pixel-level labels for train/val/test
  • 25. Ablation Study for Pyramid Pooling Module • Average pooling works better than max pooling in all settings • Pooling with pyramid parsing outperforms global pooling • With dimension reduction (DR; reducing the number of channels after pyramid pooling), the performance is further enhanced
  • 26. Ablation Study for Auxiliary Loss • Set the auxiliary loss weight between 0 and 1 and compared the final results • A weight of 0.4 yields the best performance
  • 27. Ablation Study for the depth of ResNet • Deeper is better
  • 28. More Detailed Performance Analysis • Additional processing and improvement (% in mIoU): Data augmentation (DA): +1.54; Auxiliary loss (AL): +1.41; Pyramid pooling module (PSP): +4.45; Use deeper ResNet (50 to 269): +2.13; Multi-scale testing (MS): +1.13 • For multi-scale testing, they create predictions at 6 different scales (0.5, 0.75, 1, 1.25, 1.5, and 1.75) and take the average of them.
  • 29. Results on PASCAL VOC 2012 • Extended with the Semantic Boundaries Dataset (SBD) [13], they used 10582, 1449, and 1456 images for train/val/test • Mismatched relationship: For "aeroplane" and "sky" in the second and third rows, PSPNet finds missing parts. • Confusing classes: For "cows" in row one, the baseline model treats it as "horse" and "dog" while PSPNet corrects these errors • Inconspicuous objects: For "person", "bottle" and "plant" in the following rows, PSPNet performs well on these small-size-object classes compared to the baseline model • [13] "Semantic Contours from Inverse Detectors", ICCV 2011, http://home.bharathh.info/pubs/codes/SBD/download.html
  • 30. Results on PASCAL VOC 2012 • Comparing PSPNet with previous best-performing methods on the testing set based on two settings, i.e., with or without pre-training on the MS-COCO dataset
  • 31. Results on Cityscapes • The Cityscapes dataset consists of 2975, 500, and 1525 train/val/test images (19 classes) • 20000 coarsely annotated images are available (in the table below, ‡ means they are used)
  • 32. Thank you for your attention • The official repository doesn't include any training code • My own implementation for both training and testing is ready: • mitmul/chainer-pspnet: https://github.com/mitmul/chainer-pspnet • Now I'm training a model to ensure reproducibility • Once the reproduction work is finished, I'll send the code to ChainerCV • In the semantic segmentation task, • the input image is large (713 for PSPNet on Cityscapes) • an appropriate batchsize, e.g., 16 or so, is important for batch normalization • As the authors said, distributed batch normalization seems to be important in multi-GPU training • So ChainerMN is now a necessary tool for such large-scale datasets and deep models • It means that we need more GPU machines connected with InfiniBand
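
The pyramid pooling step described in slides 17 to 19 can be illustrated with a small pure-Python sketch. It is not the authors' or the presenter's implementation: it works on a single-channel feature map stored as nested lists, omits the 1x1 convolution, and uses nearest-neighbour upsampling as a stand-in for the bilinear upsampling named on the slide. The function names are hypothetical.

```python
# Sketch of pyramid pooling on one channel (assumptions: no 1x1 conv,
# nearest-neighbour upsampling instead of bilinear, nested-list feature map).

def average_pool_bins(fmap, bins):
    """Average-pool an H x W feature map into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        # Row range covered by the i-th bin (floor start, ceil end).
        r0, r1 = (i * h) // bins, -(-((i + 1) * h) // bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, h, w):
    """Resize a bins x bins grid back to H x W by repeating bin values."""
    bins = len(pooled)
    return [[pooled[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]


def pyramid_pooling(fmap, levels=(1, 2, 3, 6)):
    """Return the input map plus one upsampled pooled map per pyramid level."""
    h, w = len(fmap), len(fmap[0])
    channels = [fmap]
    for bins in levels:
        channels.append(upsample_nearest(average_pool_bins(fmap, bins), h, w))
    return channels  # "concatenation" along the channel axis


fmap = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
out = pyramid_pooling(fmap)
print(len(out), out[1][0][0])
```

The 1x1 level reduces to global average pooling, while the 6x6 level keeps the finest sub-region statistics; concatenating all of them with the original map is what gives each pixel access to context at several scales.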
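
The learning-rate formula on slide 20 did not survive in the transcript. The PSPNet paper describes a "poly" schedule, in which the base learning rate is multiplied by (1 - iter/max_iter)^power, with a base rate of 0.01 and power 0.9. The sketch below assumes that schedule is what the slide showed; the function name is hypothetical.

```python
# Sketch of a "poly" learning-rate schedule (assumed to match slide 20;
# base_lr=0.01 and power=0.9 are the values given in the PSPNet paper).

def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """Learning rate at a given iteration under polynomial decay."""
    return base_lr * (1.0 - iteration / max_iter) ** power


# Example with the ADE20K iteration count from slide 21.
for it in (0, 75000, 150000):
    print(it, poly_lr(0.01, it, 150000))
```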
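
The multi-scale testing on slide 28 averages predictions made at six input scales. A minimal sketch of that averaging follows; `predict_fn` is a hypothetical callable standing in for "resize the image, run the network, and resize the scores back", and scores are kept as a flat list of per-pixel class-score lists for simplicity.

```python
# Sketch of multi-scale test-time averaging. predict_fn(image, scale) is a
# hypothetical callable returning per-pixel class scores at the original
# resolution, as a list (pixels) of lists (classes).

def multi_scale_predict(predict_fn, image,
                        scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75)):
    summed = None
    for scale in scales:
        scores = predict_fn(image, scale)
        if summed is None:
            summed = [list(px) for px in scores]
        else:
            for px_sum, px in zip(summed, scores):
                for k, s in enumerate(px):
                    px_sum[k] += s
    # Average over scales, then pick the highest-scoring class per pixel.
    averaged = [[s / len(scales) for s in px] for px in summed]
    labels = [max(range(len(px)), key=px.__getitem__) for px in averaged]
    return averaged, labels


def fake_predict(image, scale):
    # Toy scores for two pixels and two classes, just to exercise the code.
    return [[scale, 1.0], [0.0, scale]]


averaged, labels = multi_scale_predict(fake_predict, image=None)
print(labels)
```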