SlideShare a Scribd company logo
1 of 30
Download to read offline
A Brief Introduction
to Recent Segmentation Methods
Shunta Saito
Researcher at Preferred Networks, Inc.
Semantic Segmentation?
• Classifying all pixels, so it’s also called “Pixel-labeling”
* Feature Selection and Learning for Semantic Segmentation (Caner
Hazirbas), Master's thesis, Technical University Munich, 2014.
C C
C C
B
B
B
C C
B
B
B
B B
SS
S S S S S S S
S S S
R
R
R
R R
R R
R R
• Tackling this problem with CNN, usually it’s formulated as:
Typical formulation
Image CNN Prediction Label
Cross entropy
• The loss is calculated for each pixel independently
• It leads to the problem: “How can the model consider the context to make a single
prediction for a pixel?”
• How to leverage context information
• How to use law-level features in upper layers to make detailed
predictions
• How to create dense prediction
Common problems
“Fully Convolutional Networks for Semantic Segmentation”, Jonathan Long and Evan Shelhamer et al. appeared in
arxiv on Nov. 14, 2014
Fully Convolutional Network
(1: Reinterpret classification as a coarse prediction)
• The fully connected layers in
classification network can be
viewed as convolutions with
kernels that cover their entire input
regions
• The spatial output maps of
these convolutionalized
models make them a natural
choice for dense problems
like semantic segmentation.
See a Caffe’s example “Net Surgery”: https://github.com/BVLC/caffe/blob/master/examples/net_surgery.ipynb
256-6x6
4096-1x1
4096-1x1
If input is 451x451, output is 8x8 of 1000ch
Fully Convolutional Network (2: Coarse to dense)
• 1 possible way:

“Shift-and-Stitch” trick proposed in OverFeat paper (21 Dec, 2013)

OverFeat is the winner at the localization task of ILSVRC 2013 (not detection)
Shift input and stitch
(=“interlace”) the outputs
プログレッシブとインターレース(改訂版)by 義裕⼤大⾒見見
https://www.youtube.com/watch?v=h785loU4bh4
Fully Convolutional Network (2: Coarse to dense)
To understand OverFeat’s “Shift-and-Stitch” trick
Fully Convolutional Network (2: Coarse to dense)
• Another way: Decreasing subsampling layer (like Max Pooling)
‣ It has tradeoff:
‣ The filters see finer information, but have smaller receptive fields
and take longer to compute (due to large feature maps)
movies and images are from: http://cs231n.github.io/convolutional-networks/
Fully Convolutional Network (2: Coarse to dense)
• Instead of all ways listed
above, finally, they employed
upsampling to make coarse
predictions denser
• In a sense, upsampling with
factor f is convolution with a
fractional input stride of 1/f
• So, a natural way to upsample
is therefore backwards
convolution (sometimes
called deconvolution) with an
output stride of f
Upsampling by deconvolution 74 74
'k '
x ,
.kz
ftp.YE?D ,
"
iEIII÷IiiIEE÷:#
in
,÷÷:±÷ei:#
a-
Output
stride .f
74k ,
l⇒I*.IE?IItiiIe#eEiidYEEEE.*ai
.
Deconvolution
Fully Convolutional Network
(3: Patch-wise training or whole image training)
• Whole image fully convolutional training is identical
to patchwise training where each batch consists of
all the receptive fields of the units below the loss for
an image
yajif
-
ED
yan⇒Ise###
Patchwise training
is loss sampling
• We performed spatial sampling of the loss by making an
independent choice to ignore each final layer cell with some
probability 1 − p
• To avoid changing the effec- tive batch size, we
simultaneously increase the number of images per batch by
a factor 1/p
Fully Convolutional Network (4: Skip connection)
Nov. 14, 2014
• Fuse coarse, semantic and local,
appearance information
: Deconvolution

(initialized as bilinear upsampling, and learned)
Added
: Bilinear upsampling (fixed)
Fully Convolutional Network (5: Training scheme)
1. Prepare a trained model for ILSVRC12 (1000-class image classification)
2. Discard the final classifier layer
3. Convolutionalizing all remaining fully-connected layers
4. Append a 1x1 convolution with the target class number of channels
• MomentumSGD (momentum: 0.9)
• batchsize: 20
• Fixed LR: 10^-4 for FCN-VGG-16 (Doubled LR for biases)
• Weight decay: 5^-4
• Zero-initialize the class scoring layer
• Fine-tuning was for all layers
Other training settings:
Fully Convolutional Network
1. Replacing all fully-connected layers with convolutions
2. Upsampling by backwards convolution, a.k.a. deconvolution (and
bilinear upsampling)
3. Applied skip connections to use local, appearance information in the
final layer
Summary
• “Learning Deconvolution Network for Semantic Segmentation”, Hyeonwoo Noh, et al. appeared in
arxiv on May. 17, 2015
Deconvolution Network
* “Unpooling” here is corresponding to Chainer’s “Upsampling2D” function
• Fully Convolutional Network (FCN) has limitations:
• Fixed-size receptive field yields inconsistent labels for large objects
➡ Skip connection can’t solve this because there is inherent trade-off between boundary
details and semantics
• Interpolating 16 x 16 output to the original input size makes blurred results
➡ The absence of a deep deconvolution network trained on a large dataset makes
it difficult to reconstruct highly non- linear structures of object boundaries accurately.
Deconvolution Network
Let’s do it to perform proper upsampling
Deconvolution Network
Feature extractor Shape generator
Deconvolution Network
Shape generator
14 × 14
deconvolutional
layer
28 × 28
unpooling layer
28 × 28
deconvolutional
layer
56 × 56
unpooling layer
56 × 56
deconvolutional
layer
112 × 112
unpooling layer
112 × 112
deconvolutional
layer
224 × 224
unpooling layer
224 × 224
deconvolutional
layer
Activations of each layer
Deconvolution Network
One can think: “Skip connection is missing…?”
U-Net
“U-Net: Convolutional Networks for Biomedical Image Segmentation”, Olaf Ronneberger,
Philipp Fischer, Thomas Brox, 18 May 2015
U-Net
SegNet
• “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image
Segmentation”, Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla, 2 Nov,
2015
SegNet
• The training procedure is a bit complicated
• Encoder-decorder “pair-wise” training
There’s a Chainer implementation:

pfnet-research/chainer-segnet:
https://github.com/pfnet-research/chainer-segnet
Dilated convolutions
• “Multi-Scale Context Aggregation by Dilated Convolutions”, Fisher Yu, Vladlen Koltun, 23 Nov, 2015
• a.k.a stroud convolution, convolution with holes
• Enlarge the size of receptive field without losing resolution
The figure is from “WaveNet: A Generative Model for Raw Audio”
Dilated convolutions
• For example, the feature maps of ResNet are
downsampled 5 times, and 4 times in the 5 are done by
convolutions with stride of 2 (only the first one is by
pooling with stride of 2)
1/4
1/8
1/16
1/32
1/2
Dilated convolutions
• By using dilated convolutions instead of vanilla
convolutions, the resolution after the first pooling can be
kept as the same to the end
1/4
1/8
1/8
1/8
1/2
Dilated convolutions
But, it is still 1/8…
1/4
1/8
1/8
1/8
1/2
RefineNet
• “RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic
Segmentation”, Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid, 20 Nov. 2016
RefineNet
• Each intermediate feature map is refined through “RefineNet module”
RefineNet
* Implementation has been done in Chainer, the codes will be public soon
Semantic Segmentation using Adversarial Networks
• “Semantic Segmentation using Adversarial Networks”, Pauline Luc, Camille Couprie,
Soumith Chintala, Jakob Verbeek, 25 Nov. 2016

More Related Content

What's hot

【DL輪読会】Perceiver io a general architecture for structured inputs & outputs
【DL輪読会】Perceiver io  a general architecture for structured inputs & outputs 【DL輪読会】Perceiver io  a general architecture for structured inputs & outputs
【DL輪読会】Perceiver io a general architecture for structured inputs & outputs Deep Learning JP
 
【DL輪読会】Where do Models go Wrong? Parameter-Space Saliency Maps for Explainabi...
【DL輪読会】Where do Models go Wrong? Parameter-Space Saliency Maps for Explainabi...【DL輪読会】Where do Models go Wrong? Parameter-Space Saliency Maps for Explainabi...
【DL輪読会】Where do Models go Wrong? Parameter-Space Saliency Maps for Explainabi...Deep Learning JP
 
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...Deep Learning JP
 
[DL輪読会]Focal Loss for Dense Object Detection
[DL輪読会]Focal Loss for Dense Object Detection[DL輪読会]Focal Loss for Dense Object Detection
[DL輪読会]Focal Loss for Dense Object DetectionDeep Learning JP
 
3D CNNによる人物行動認識の動向
3D CNNによる人物行動認識の動向3D CNNによる人物行動認識の動向
3D CNNによる人物行動認識の動向Kensho Hara
 
モデルアーキテクチャ観点からの高速化2019
モデルアーキテクチャ観点からの高速化2019モデルアーキテクチャ観点からの高速化2019
モデルアーキテクチャ観点からの高速化2019Yusuke Uchida
 
【DL輪読会】ViT + Self Supervised Learningまとめ
【DL輪読会】ViT + Self Supervised Learningまとめ【DL輪読会】ViT + Self Supervised Learningまとめ
【DL輪読会】ViT + Self Supervised LearningまとめDeep Learning JP
 
Deep Learningによる超解像の進歩
Deep Learningによる超解像の進歩Deep Learningによる超解像の進歩
Deep Learningによる超解像の進歩Hiroto Honda
 
[DL輪読会]Progressive Growing of GANs for Improved Quality, Stability, and Varia...
[DL輪読会]Progressive Growing of GANs for Improved Quality, Stability, and Varia...[DL輪読会]Progressive Growing of GANs for Improved Quality, Stability, and Varia...
[DL輪読会]Progressive Growing of GANs for Improved Quality, Stability, and Varia...Deep Learning JP
 
【DL輪読会】ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
【DL輪読会】ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation【DL輪読会】ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
【DL輪読会】ViTPose: Simple Vision Transformer Baselines for Human Pose EstimationDeep Learning JP
 
[DL輪読会]Discriminative Learning for Monaural Speech Separation Using Deep Embe...
[DL輪読会]Discriminative Learning for Monaural Speech Separation Using Deep Embe...[DL輪読会]Discriminative Learning for Monaural Speech Separation Using Deep Embe...
[DL輪読会]Discriminative Learning for Monaural Speech Separation Using Deep Embe...Deep Learning JP
 
モデルアーキテクチャ観点からのDeep Neural Network高速化
モデルアーキテクチャ観点からのDeep Neural Network高速化モデルアーキテクチャ観点からのDeep Neural Network高速化
モデルアーキテクチャ観点からのDeep Neural Network高速化Yusuke Uchida
 
[DL輪読会]Flow-based Deep Generative Models
[DL輪読会]Flow-based Deep Generative Models[DL輪読会]Flow-based Deep Generative Models
[DL輪読会]Flow-based Deep Generative ModelsDeep Learning JP
 
Semi supervised, weakly-supervised, unsupervised, and active learning
Semi supervised, weakly-supervised, unsupervised, and active learningSemi supervised, weakly-supervised, unsupervised, and active learning
Semi supervised, weakly-supervised, unsupervised, and active learningYusuke Uchida
 
[DL輪読会]Few-Shot Unsupervised Image-to-Image Translation
[DL輪読会]Few-Shot Unsupervised Image-to-Image Translation[DL輪読会]Few-Shot Unsupervised Image-to-Image Translation
[DL輪読会]Few-Shot Unsupervised Image-to-Image TranslationDeep Learning JP
 
深層学習の数理
深層学習の数理深層学習の数理
深層学習の数理Taiji Suzuki
 
[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image GeneratorsDeep Learning JP
 
backbone としての timm 入門
backbone としての timm 入門backbone としての timm 入門
backbone としての timm 入門Takuji Tahara
 
【DL輪読会】A Path Towards Autonomous Machine Intelligence
【DL輪読会】A Path Towards Autonomous Machine Intelligence【DL輪読会】A Path Towards Autonomous Machine Intelligence
【DL輪読会】A Path Towards Autonomous Machine IntelligenceDeep Learning JP
 

What's hot (20)

【DL輪読会】Perceiver io a general architecture for structured inputs & outputs
【DL輪読会】Perceiver io  a general architecture for structured inputs & outputs 【DL輪読会】Perceiver io  a general architecture for structured inputs & outputs
【DL輪読会】Perceiver io a general architecture for structured inputs & outputs
 
【DL輪読会】Where do Models go Wrong? Parameter-Space Saliency Maps for Explainabi...
【DL輪読会】Where do Models go Wrong? Parameter-Space Saliency Maps for Explainabi...【DL輪読会】Where do Models go Wrong? Parameter-Space Saliency Maps for Explainabi...
【DL輪読会】Where do Models go Wrong? Parameter-Space Saliency Maps for Explainabi...
 
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
【DL輪読会】Self-Supervised Learning from Images with a Joint-Embedding Predictive...
 
[DL輪読会]Focal Loss for Dense Object Detection
[DL輪読会]Focal Loss for Dense Object Detection[DL輪読会]Focal Loss for Dense Object Detection
[DL輪読会]Focal Loss for Dense Object Detection
 
3D CNNによる人物行動認識の動向
3D CNNによる人物行動認識の動向3D CNNによる人物行動認識の動向
3D CNNによる人物行動認識の動向
 
モデルアーキテクチャ観点からの高速化2019
モデルアーキテクチャ観点からの高速化2019モデルアーキテクチャ観点からの高速化2019
モデルアーキテクチャ観点からの高速化2019
 
【DL輪読会】ViT + Self Supervised Learningまとめ
【DL輪読会】ViT + Self Supervised Learningまとめ【DL輪読会】ViT + Self Supervised Learningまとめ
【DL輪読会】ViT + Self Supervised Learningまとめ
 
Deep Learningによる超解像の進歩
Deep Learningによる超解像の進歩Deep Learningによる超解像の進歩
Deep Learningによる超解像の進歩
 
[DL輪読会]Progressive Growing of GANs for Improved Quality, Stability, and Varia...
[DL輪読会]Progressive Growing of GANs for Improved Quality, Stability, and Varia...[DL輪読会]Progressive Growing of GANs for Improved Quality, Stability, and Varia...
[DL輪読会]Progressive Growing of GANs for Improved Quality, Stability, and Varia...
 
【DL輪読会】ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
【DL輪読会】ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation【DL輪読会】ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
【DL輪読会】ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
 
[DL輪読会]Discriminative Learning for Monaural Speech Separation Using Deep Embe...
[DL輪読会]Discriminative Learning for Monaural Speech Separation Using Deep Embe...[DL輪読会]Discriminative Learning for Monaural Speech Separation Using Deep Embe...
[DL輪読会]Discriminative Learning for Monaural Speech Separation Using Deep Embe...
 
モデルアーキテクチャ観点からのDeep Neural Network高速化
モデルアーキテクチャ観点からのDeep Neural Network高速化モデルアーキテクチャ観点からのDeep Neural Network高速化
モデルアーキテクチャ観点からのDeep Neural Network高速化
 
[DL輪読会]Flow-based Deep Generative Models
[DL輪読会]Flow-based Deep Generative Models[DL輪読会]Flow-based Deep Generative Models
[DL輪読会]Flow-based Deep Generative Models
 
Semi supervised, weakly-supervised, unsupervised, and active learning
Semi supervised, weakly-supervised, unsupervised, and active learningSemi supervised, weakly-supervised, unsupervised, and active learning
Semi supervised, weakly-supervised, unsupervised, and active learning
 
[DL輪読会]Few-Shot Unsupervised Image-to-Image Translation
[DL輪読会]Few-Shot Unsupervised Image-to-Image Translation[DL輪読会]Few-Shot Unsupervised Image-to-Image Translation
[DL輪読会]Few-Shot Unsupervised Image-to-Image Translation
 
Semantic segmentation
Semantic segmentationSemantic segmentation
Semantic segmentation
 
深層学習の数理
深層学習の数理深層学習の数理
深層学習の数理
 
[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
[DL輪読会]StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
 
backbone としての timm 入門
backbone としての timm 入門backbone としての timm 入門
backbone としての timm 入門
 
【DL輪読会】A Path Towards Autonomous Machine Intelligence
【DL輪読会】A Path Towards Autonomous Machine Intelligence【DL輪読会】A Path Towards Autonomous Machine Intelligence
【DL輪読会】A Path Towards Autonomous Machine Intelligence
 

Viewers also liked

強化学習入門
強化学習入門強化学習入門
強化学習入門Shunta Saito
 
DeepPose: Human Pose Estimation via Deep Neural Networks
DeepPose: Human Pose Estimation via Deep Neural NetworksDeepPose: Human Pose Estimation via Deep Neural Networks
DeepPose: Human Pose Estimation via Deep Neural NetworksShunta Saito
 
20170211クレジットカード認識
20170211クレジットカード認識20170211クレジットカード認識
20170211クレジットカード認識Takuya Minagawa
 
実世界の人工知能@DeNA TechCon 2017
実世界の人工知能@DeNA TechCon 2017 実世界の人工知能@DeNA TechCon 2017
実世界の人工知能@DeNA TechCon 2017 Preferred Networks
 
Building and road detection from large aerial imagery
Building and road detection from large aerial imageryBuilding and road detection from large aerial imagery
Building and road detection from large aerial imageryShunta Saito
 
視覚認知システムにおける知覚と推論
視覚認知システムにおける知覚と推論視覚認知システムにおける知覚と推論
視覚認知システムにおける知覚と推論Shunta Saito
 
動的輪郭モデル
動的輪郭モデル動的輪郭モデル
動的輪郭モデルArumaziro
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationNamHyuk Ahn
 
20160525はじめてのコンピュータビジョン
20160525はじめてのコンピュータビジョン20160525はじめてのコンピュータビジョン
20160525はじめてのコンピュータビジョンTakuya Minagawa
 
全脳アーキテクチャ若手の会20170131
全脳アーキテクチャ若手の会20170131全脳アーキテクチャ若手の会20170131
全脳アーキテクチャ若手の会20170131Hangyo Masatsugu
 
強化学習その2
強化学習その2強化学習その2
強化学習その2nishio
 
AutoEncoderで特徴抽出
AutoEncoderで特徴抽出AutoEncoderで特徴抽出
AutoEncoderで特徴抽出Kai Sasaki
 
Crfと素性テンプレート
Crfと素性テンプレートCrfと素性テンプレート
Crfと素性テンプレートKei Uchiumi
 
全脳アーキテクチャ若手の会 強化学習
全脳アーキテクチャ若手の会 強化学習全脳アーキテクチャ若手の会 強化学習
全脳アーキテクチャ若手の会 強化学習kwp_george
 
なにわTech20170218(tpu) tfug
なにわTech20170218(tpu) tfugなにわTech20170218(tpu) tfug
なにわTech20170218(tpu) tfugNatsutani Minoru
 
論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural NetworksSeiya Tokui
 
Convolutional Neural Netwoks で自然言語処理をする
Convolutional Neural Netwoks で自然言語処理をするConvolutional Neural Netwoks で自然言語処理をする
Convolutional Neural Netwoks で自然言語処理をするDaiki Shimada
 
Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2
Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2
Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2Daiki Shimada
 

Viewers also liked (20)

強化学習入門
強化学習入門強化学習入門
強化学習入門
 
DeepPose: Human Pose Estimation via Deep Neural Networks
DeepPose: Human Pose Estimation via Deep Neural NetworksDeepPose: Human Pose Estimation via Deep Neural Networks
DeepPose: Human Pose Estimation via Deep Neural Networks
 
20170211クレジットカード認識
20170211クレジットカード認識20170211クレジットカード認識
20170211クレジットカード認識
 
LT@Chainer Meetup
LT@Chainer MeetupLT@Chainer Meetup
LT@Chainer Meetup
 
実世界の人工知能@DeNA TechCon 2017
実世界の人工知能@DeNA TechCon 2017 実世界の人工知能@DeNA TechCon 2017
実世界の人工知能@DeNA TechCon 2017
 
Building and road detection from large aerial imagery
Building and road detection from large aerial imageryBuilding and road detection from large aerial imagery
Building and road detection from large aerial imagery
 
視覚認知システムにおける知覚と推論
視覚認知システムにおける知覚と推論視覚認知システムにおける知覚と推論
視覚認知システムにおける知覚と推論
 
動的輪郭モデル
動的輪郭モデル動的輪郭モデル
動的輪郭モデル
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
 
20160525はじめてのコンピュータビジョン
20160525はじめてのコンピュータビジョン20160525はじめてのコンピュータビジョン
20160525はじめてのコンピュータビジョン
 
全脳アーキテクチャ若手の会20170131
全脳アーキテクチャ若手の会20170131全脳アーキテクチャ若手の会20170131
全脳アーキテクチャ若手の会20170131
 
強化学習その2
強化学習その2強化学習その2
強化学習その2
 
AutoEncoderで特徴抽出
AutoEncoderで特徴抽出AutoEncoderで特徴抽出
AutoEncoderで特徴抽出
 
Crfと素性テンプレート
Crfと素性テンプレートCrfと素性テンプレート
Crfと素性テンプレート
 
全脳アーキテクチャ若手の会 強化学習
全脳アーキテクチャ若手の会 強化学習全脳アーキテクチャ若手の会 強化学習
全脳アーキテクチャ若手の会 強化学習
 
なにわTech20170218(tpu) tfug
なにわTech20170218(tpu) tfugなにわTech20170218(tpu) tfug
なにわTech20170218(tpu) tfug
 
LDA入門
LDA入門LDA入門
LDA入門
 
論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks
 
Convolutional Neural Netwoks で自然言語処理をする
Convolutional Neural Netwoks で自然言語処理をするConvolutional Neural Netwoks で自然言語処理をする
Convolutional Neural Netwoks で自然言語処理をする
 
Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2
Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2
Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2
 

Similar to A brief introduction to recent segmentation methods

Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Sergey Karayev
 
[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun YooJaeJun Yoo
 
Review-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningReview-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningTrong-An Bui
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningSujit Pal
 
build a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Pythonbuild a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in PythonKv Sagar
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxssuser3aa461
 
Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Yu Huang
 
Fundamental of deep learning
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learningStanley Wang
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)DonghyunKang12
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerPoo Kuan Hoong
 
Towards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networksTowards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networks曾 子芸
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processingYu Huang
 
Autoencoders for image_classification
Autoencoders for image_classificationAutoencoders for image_classification
Autoencoders for image_classificationCenk Bircanoğlu
 
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...Yan Xu
 
Deep Learning for Computer Vision - PyconDE 2017
Deep Learning for Computer Vision - PyconDE 2017Deep Learning for Computer Vision - PyconDE 2017
Deep Learning for Computer Vision - PyconDE 2017Alex Conway
 
04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptxZainULABIDIN496386
 
Intro to Neural Networks
Intro to Neural NetworksIntro to Neural Networks
Intro to Neural NetworksDean Wyatte
 

Similar to A brief introduction to recent segmentation methods (20)

Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
 
[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo
 
Review-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningReview-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learning
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
build a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Pythonbuild a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Python
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
 
Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling
 
Fundamental of deep learning
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learning
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
 
Towards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networksTowards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networks
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processing
 
Autoencoders for image_classification
Autoencoders for image_classificationAutoencoders for image_classification
Autoencoders for image_classification
 
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
 
Deep Learning for Computer Vision - PyconDE 2017
Deep Learning for Computer Vision - PyconDE 2017Deep Learning for Computer Vision - PyconDE 2017
Deep Learning for Computer Vision - PyconDE 2017
 
04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx
 
Intro to Neural Networks
Intro to Neural NetworksIntro to Neural Networks
Intro to Neural Networks
 

More from Shunta Saito

Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向Shunta Saito
 
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)Shunta Saito
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to ChainerShunta Saito
 
[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...
[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...
[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...Shunta Saito
 
Building detection with decision fusion
Building detection with decision fusionBuilding detection with decision fusion
Building detection with decision fusionShunta Saito
 
Automatic selection of object recognition methods using reinforcement learning
Automatic selection of object recognition methods using reinforcement learningAutomatic selection of object recognition methods using reinforcement learning
Automatic selection of object recognition methods using reinforcement learningShunta Saito
 
集合知プログラミングゼミ第1回
集合知プログラミングゼミ第1回集合知プログラミングゼミ第1回
集合知プログラミングゼミ第1回Shunta Saito
 

More from Shunta Saito (7)

Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向
 
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to Chainer
 
[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...
[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...
[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...
 
Building detection with decision fusion
Building detection with decision fusionBuilding detection with decision fusion
Building detection with decision fusion
 
Automatic selection of object recognition methods using reinforcement learning
Automatic selection of object recognition methods using reinforcement learningAutomatic selection of object recognition methods using reinforcement learning
Automatic selection of object recognition methods using reinforcement learning
 
集合知プログラミングゼミ第1回
集合知プログラミングゼミ第1回集合知プログラミングゼミ第1回
集合知プログラミングゼミ第1回
 

Recently uploaded

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Recently uploaded (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

A brief introduction to recent segmentation methods

  • 1. A Brief Introduction to Recent Segmentation Methods Shunta Saito Researcher at Preferred Networks, Inc.
  • 2. Semantic Segmentation? • Classifying all pixels, so it’s also called “Pixel-labeling” * Feature Selection and Learning for Semantic Segmentation (Caner Hazirbas), Master's thesis, Technical University Munich, 2014. C C C C B B B C C B B B B B SS S S S S S S S S S S R R R R R R R R R
  • 3. • Tackling this problem with CNN, usually it’s formulated as: Typical formulation Image CNN Prediction Label Cross entropy • The loss is calculated for each pixel independently • It leads to the problem: “How can the model consider the context to make a single prediction for a pixel?”
  • 4. • How to leverage context information • How to use law-level features in upper layers to make detailed predictions • How to create dense prediction Common problems
  • 5. “Fully Convolutional Networks for Semantic Segmentation”, Jonathan Long and Evan Shelhamer et al. appeared in arxiv on Nov. 14, 2014 Fully Convolutional Network (1: Reinterpret classification as a coarse prediction) • The fully connected layers in classification network can be viewed as convolutions with kernels that cover their entire input regions • The spatial output maps of these convolutionalized models make them a natural choice for dense problems like semantic segmentation. See a Caffe’s example “Net Surgery”: https://github.com/BVLC/caffe/blob/master/examples/net_surgery.ipynb 256-6x6 4096-1x1 4096-1x1 If input is 451x451, output is 8x8 of 1000ch
  • 6. Fully Convolutional Network (2: Coarse to dense) • 1 possible way:
 “Shift-and-Stitch” trick proposed in OverFeat paper (21 Dec, 2013)
 OverFeat is the winner at the localization task of ILSVRC 2013 (not detection) Shift input and stitch (=“interlace”) the outputs
  • 8. Fully Convolutional Network (2: Coarse to dense) • Another way: Decreasing subsampling layer (like Max Pooling) ‣ It has tradeoff: ‣ The filters see finer information, but have smaller receptive fields and take longer to compute (due to large feature maps) movies and images are from: http://cs231n.github.io/convolutional-networks/
  • 9. Fully Convolutional Network (2: Coarse to dense) • Instead of all ways listed above, finally, they employed upsampling to make coarse predictions denser • In a sense, upsampling with factor f is convolution with a fractional input stride of 1/f • So, a natural way to upsample is therefore backwards convolution (sometimes called deconvolution) with an output stride of f Upsampling by deconvolution 74 74 'k ' x , .kz ftp.YE?D , " iEIII÷IiiIEE÷:# in ,÷÷:±÷ei:# a- Output stride .f 74k , l⇒I*.IE?IItiiIe#eEiidYEEEE.*ai . Deconvolution
  • 10. Fully Convolutional Network (3: Patch-wise training or whole image training) • Whole image fully convolutional training is identical to patchwise training where each batch consists of all the receptive fields of the units below the loss for an image yajif - ED yan⇒Ise### Patchwise training is loss sampling • We performed spatial sampling of the loss by making an independent choice to ignore each final layer cell with some probability 1 − p • To avoid changing the effec- tive batch size, we simultaneously increase the number of images per batch by a factor 1/p
  • 11. Fully Convolutional Network (4: Skip connection) Nov. 14, 2014 • Fuse coarse, semantic and local, appearance information : Deconvolution
 (initialized as bilinear upsampling, and learned) Added : Bilinear upsampling (fixed)
  • 12. Fully Convolutional Network (5: Training scheme) 1. Prepare a trained model for ILSVRC12 (1000-class image classification) 2. Discard the final classifier layer 3. Convolutionalizing all remaining fully-connected layers 4. Append a 1x1 convolution with the target class number of channels • MomentumSGD (momentum: 0.9) • batchsize: 20 • Fixed LR: 10^-4 for FCN-VGG-16 (Doubled LR for biases) • Weight decay: 5^-4 • Zero-initialize the class scoring layer • Fine-tuning was for all layers Other training settings:
  • 13. Fully Convolutional Network 1. Replacing all fully-connected layers with convolutions 2. Upsampling by backwards convolution, a.k.a. deconvolution (and bilinear upsampling) 3. Applied skip connections to use local, appearance information in the final layer Summary
  • 14. • “Learning Deconvolution Network for Semantic Segmentation”, Hyeonwoo Noh, et al. appeared in arxiv on May. 17, 2015 Deconvolution Network * “Unpooling” here is corresponding to Chainer’s “Upsampling2D” function
  • 15. • Fully Convolutional Network (FCN) has limitations: • Fixed-size receptive field yields inconsistent labels for large objects ➡ Skip connection can’t solve this because there is inherent trade-off between boundary details and semantics • Interpolating 16 x 16 output to the original input size makes blurred results ➡ The absence of a deep deconvolution network trained on a large dataset makes it difficult to reconstruct highly non- linear structures of object boundaries accurately. Deconvolution Network Let’s do it to perform proper upsampling
  • 17. Deconvolution Network Shape generator 14 × 14 deconvolutional layer 28 × 28 unpooling layer 28 × 28 deconvolutional layer 56 × 56 unpooling layer 56 × 56 deconvolutional layer 112 × 112 unpooling layer 112 × 112 deconvolutional layer 224 × 224 unpooling layer 224 × 224 deconvolutional layer Activations of each layer
  • 18. Deconvolution Network One can think: “Skip connection is missing…?”
  • 19. U-Net “U-Net: Convolutional Networks for Biomedical Image Segmentation”, Olaf Ronneberger, Philipp Fischer, Thomas Brox, 18 May 2015
  • 20. U-Net
  • 21. SegNet • “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”, Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla, 2 Nov, 2015
  • 22. SegNet • The training procedure is a bit complicated • Encoder-decorder “pair-wise” training There’s a Chainer implementation:
 pfnet-research/chainer-segnet: https://github.com/pfnet-research/chainer-segnet
  • 23. Dilated convolutions • “Multi-Scale Context Aggregation by Dilated Convolutions”, Fisher Yu, Vladlen Koltun, 23 Nov, 2015 • a.k.a stroud convolution, convolution with holes • Enlarge the size of receptive field without losing resolution The figure is from “WaveNet: A Generative Model for Raw Audio”
  • 24. Dilated convolutions • For example, the feature maps of ResNet are downsampled 5 times, and 4 times in the 5 are done by convolutions with stride of 2 (only the first one is by pooling with stride of 2) 1/4 1/8 1/16 1/32 1/2
  • 25. Dilated convolutions • By using dilated convolutions instead of vanilla convolutions, the resolution after the first pooling can be kept as the same to the end 1/4 1/8 1/8 1/8 1/2
  • 26. Dilated convolutions But, it is still 1/8… 1/4 1/8 1/8 1/8 1/2
  • 27. RefineNet • “RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation”, Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid, 20 Nov. 2016
  • 28. RefineNet • Each intermediate feature map is refined through “RefineNet module”
  • 29. RefineNet * Implementation has been done in Chainer, the codes will be public soon
  • 30. Semantic Segmentation using Adversarial Networks • “Semantic Segmentation using Adversarial Networks”, Pauline Luc, Camille Couprie, Soumith Chintala, Jakob Verbeek, 25 Nov. 2016