FCN to DeepLab.v3+
Whi Kwon
TL;DR
1. FCN introduced a new paradigm for semantic segmentation: the Encoder (CNN)-Decoder structure.
2. U-Net followed. It added skip connections and gradual down/up sampling, and for whatever reason many papers now describe their segmentation network as a "U-Net architecture".
3. FCN (and U-Net) beat older algorithms, but their biggest problem (a.k.a. room for improvement) is pooling.
The role of pooling: exponential expansion of the receptive field.
The problems with pooling: a smaller feature map and loss of location information.
4. Let's replace what pooling does! → Dilated (atrous) convolution.
It achieves exponential expansion of the receptive field through a structural change, with no drop in performance.
The feature map shrinkage problem is solved!
5. Doesn't the loss of location information during pooling differ by filter size? Then let's pool at several sizes and merge the results. → Spatial Pyramid Pooling.
6. Let's use everything above together: skip connections, dilated convolution, and spatial pyramid pooling, plus a good encoder → DeepLab.v3+ (1st place on PASCAL VOC 2012 at the time of writing)
7. This deck covers only a small part of the topic, so please use it as a starting point and read many of the papers.
Outline
Part 1: What is an Encoder-Decoder?
Part 2: How can location information be preserved well?
Part 3: Ingredients for End-to-End Semantic Segmentation
Part 1. What is an Encoder-Decoder?
Outline - Part 1.
1. What is an Encoder-Decoder?
2. The CNN as an Encoder
3. Which Decoder recovers location information?
4. The arrival of the Fully Convolutional Network (FCN)
"hello world" → Encoder → [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100] → Decoder → "hello world"
The Encoder produces transformed data from the original data.
The Decoder recovers the original data from the transformed data.
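The "hello world" round trip can be reproduced in a few lines. This is only a minimal sketch: the function names `encode` and `decode` are illustrative, and character codes stand in for whatever transformation a real encoder learns.

```python
from typing import List


def encode(text: str) -> List[int]:
    """Transform the original text into a list of character codes."""
    return [ord(ch) for ch in text]


def decode(codes: List[int]) -> str:
    """Recover the original text from the character codes."""
    return "".join(chr(c) for c in codes)


codes = encode("hello world")
print(codes)          # [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]
print(decode(codes))  # hello world
```

Unlike this lossless toy, a CNN encoder discards information, which is why the decoder in segmentation has to work to recover locations.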
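Source: https://unsplash.com/photos/EcsCeS6haJ8
Figure: cat photo → Encoder (CNN) → Feature map → [0, 0, 0, 1]
When a cat photo is the input, the feature extraction step can be seen as encoding, and the Encoder here is a CNN. We cannot tell exactly what each value means, but together the values hold a transformed version of the information in the cat photo.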
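Source: https://cdn-images-1.medium.com/max/1600/1*bGTawFxQwzc5yV1_szDrwQ.png
Predictions made from data transformed by the CNN Encoder are very good. The data seems to be transformed well and to retain a lot of information about the photo!
Couldn't this information be put to good use?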
Fully Convolutional Network!
Figure: Encoder → Decoder
Since the CNN works well as an Encoder, suppose the feature map holds compressed information about each pixel.
If that compressed information passes through a Decoder, couldn't we recover per-pixel location information? Yes!
Long et al. Fully convolutional networks for semantic segmentation. CVPR, 2015.
Part 2. How can location information be preserved well?
Outline - Part 2.
1. What is wrong with FCN?
2. En--------coder De--------coder structure (U-Net)
3. Dilated Convolution (Dilated Net, DeepLab.v2)
4. Spatial Pyramid Pooling (PSPNet, DeepLab.v3, DeepLab.v3+)
Fully Convolutional Network?
Figure label: x32
Long et al. Fully convolutional networks for semantic segmentation. CVPR, 2015.
Each feature-map value corresponds to far too many pixels! It is hard to preserve fine-grained location information.
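To see how coarse a x32 prediction is, the sketch below blows up a tiny map of class ids by a factor of 32. Nearest-neighbor repetition is an assumption made for simplicity (FCN itself uses learned or bilinear upsampling), and the 2x2 `coarse` map is hypothetical.

```python
def upsample_nearest(grid, factor):
    """Repeat every cell `factor` times along both axes (nearest neighbor)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


coarse = [[0, 1],
          [1, 0]]                 # hypothetical 2x2 map of class ids
full = upsample_nearest(coarse, 32)

print(len(full), len(full[0]))    # 64 64
print(32 * 32)                    # 1024 output pixels share one coarse value
```

Every block of 1024 output pixels receives the same label, so object boundaries cannot be finer than the 32-pixel grid unless extra information is brought in.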
Problem: the pooling layers!
VGG-19 (FCN Encoder): Image → Conv/Pool → Conv/Pool → Conv/Pool → Conv/Pool → Conv/Pool → FC
The role of pooling:
- Exponential expansion of the receptive field
- Translation invariance
The problems with pooling:
- A smaller feature map
- Loss of location information
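The shrinkage caused by five pooling stages is easy to tabulate. The sketch assumes a 224-pixel input, stride-2 pooling, and padded convolutions that keep the spatial size; none of these numbers appear on the slide.

```python
def feature_map_sizes(input_size, num_pools, pool_stride=2):
    """Side length of the feature map after each pooling stage."""
    sizes = [input_size]
    for _ in range(num_pools):
        sizes.append(sizes[-1] // pool_stride)
    return sizes


print(feature_map_sizes(224, 5))  # [224, 112, 56, 28, 14, 7]
```

Five halvings give the 1/32 ratio that the FCN slide labels as x32.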
En--coder De--coder (a.k.a. U-Net architecture)
Gradual encoding, gradual decoding
Pass the earlier information forward! (skip connection)
Ronneberger et al. U-net: Convolutional networks for biomedical image segmentation. MICCAI, 2015.
These are good techniques, but the final feature map is still far too small compared with the original image.
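A skip connection in the U-Net style concatenates an encoder feature map with the upsampled decoder feature map along the channel axis. The sketch below models a feature map as a list of channels, each a 2D list; the nearest-neighbor `upsample2x` is a simplifying assumption standing in for U-Net's learned up-convolution, and all names are illustrative.

```python
def upsample2x(channel):
    """Double the height and width of one channel by repeating values."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def skip_connect(encoder_feat, decoder_feat):
    """Upsample the decoder channels 2x, then concatenate them with the
    encoder channels saved from the matching resolution."""
    upsampled = [upsample2x(ch) for ch in decoder_feat]
    height, width = len(encoder_feat[0]), len(encoder_feat[0][0])
    for ch in upsampled:
        if len(ch) != height or len(ch[0]) != width:
            raise ValueError("encoder and decoder resolutions do not match")
    return encoder_feat + upsampled


encoder_feat = [[[1] * 4 for _ in range(4)]]            # 1 channel, 4x4
decoder_feat = [[[2, 3], [4, 5]], [[6, 7], [8, 9]]]     # 2 channels, 2x2
merged = skip_connect(encoder_feat, decoder_feat)
print(len(merged), len(merged[0]), len(merged[0][0]))   # 3 4 4
```

The merged map carries both the fine-grained encoder detail and the coarse decoder context, which later convolutions can combine.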
Dilated(Atrous) Convolution
Perone et al. Spinal cord gray matter segmentation using deep dilated convolutions. ArXiv, 2017
Dilated Convolution?
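A dilated convolution applies the same kernel weights but leaves gaps of `dilation` positions between the taps. The 1D sketch below (no padding, stride 1, a hypothetical all-ones kernel) shows that the same three weights cover five input positions when the dilation is 2.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1D correlation with `dilation` spacing between taps."""
    span = (len(kernel) - 1) * dilation + 1   # input positions seen by one output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]
print(dilated_conv1d(signal, kernel, dilation=1))  # [6, 9, 12, 15, 18]
print(dilated_conv1d(signal, kernel, dilation=2))  # [9, 12, 15]
```

The parameter count is unchanged; only the sampling pattern over the input widens.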
Dilated(Atrous) Convolution
Yu and Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. ICLR, 2016.
Receptive field comparison (Normal vs Dilated)

Normal convolution
Layer             1      2      3
Convolution       3x3    3x3    3x3
Dilation          1      1      1
Receptive field   3x3    5x5    7x7

Dilated convolution
Layer             1      2      3
Convolution       3x3    3x3    3x3
Dilation          1      2      4
Receptive field   3x3    7x7    15x15

Exponential expansion of receptive field!
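The receptive field of stacked stride-1 convolutions grows by (kernel size - 1) x dilation per layer. A minimal sketch of that arithmetic, assuming stride 1 throughout:

```python
def receptive_fields(kernel_size, dilations):
    """Receptive field side length after each stacked stride-1 layer."""
    rf = 1
    sizes = []
    for d in dilations:
        rf += (kernel_size - 1) * d
        sizes.append(rf)
    return sizes


print(receptive_fields(3, [1, 1, 1]))  # [3, 5, 7]
print(receptive_fields(3, [1, 2, 4]))  # [3, 7, 15]
```

With dilations that double each layer the field roughly doubles as well, while three plain 3x3 layers only reach 7x7.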
Dilated(Atrous) Convolution
Feature map comparison (Normal vs Dilated)
Normal: final feature map / input = 1/32
Dilated: final feature map / input = 1/8
The final feature map stays 4x larger per side than before!
Chen et al. Rethinking atrous convolution for semantic image segmentation. arXiv, 2017.
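One way to read the 1/32 versus 1/8 comparison: keep the stride-2 stages until the target output stride is reached, then set the remaining stages to stride 1 and multiply their dilation instead. The helper below is a hypothetical planning sketch, not code from any DeepLab release; it assumes five stride-2 stages.

```python
def convert_to_dilated(stage_strides, target_stride):
    """Return (stride, dilation) per stage so that the total stride stops
    at `target_stride` and later stages use dilation instead."""
    stride, dilation = 1, 1
    plan = []
    for s in stage_strides:
        if stride * s <= target_stride:
            stride *= s
            plan.append((s, dilation))
        else:
            dilation *= s          # replace the lost stride with dilation
            plan.append((1, dilation))
    return plan


print(convert_to_dilated([2, 2, 2, 2, 2], 32))
# [(2, 1), (2, 1), (2, 1), (2, 1), (2, 1)]
print(convert_to_dilated([2, 2, 2, 2, 2], 8))
# [(2, 1), (2, 1), (2, 1), (1, 2), (1, 4)]
print(512 // 32, 512 // 8)   # 16 64
```

For a hypothetical 512-pixel input the final map is 64 pixels per side instead of 16, while the dilations 2 and 4 keep the receptive field growing.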
Spatial Pyramid Pooling
He et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. ECCV, 2014.
Doesn't the location information lost during pooling differ by filter size?
Let's extract information at each filter size and then merge it to minimize the loss of location information.
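The idea of pooling at several sizes and merging can be sketched on a single-channel feature map. The version below max-pools into 1x1, 2x2, and 4x4 grids and concatenates the results into one vector, in the spirit of He et al.; the pyramid levels and function names are assumptions, and segmentation variants such as PSPNet instead upsample the pooled maps and concatenate them spatially. The map must be at least as large as the finest grid.

```python
import math


def pool_to_bins(fmap, bins):
    """Max-pool a 2D map into a bins x bins grid, flattened row by row."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0, r1 = (i * h) // bins, math.ceil((i + 1) * h / bins)
        for j in range(bins):
            c0, c1 = (j * w) // bins, math.ceil((j + 1) * w / bins)
            pooled.append(max(fmap[r][c]
                              for r in range(r0, r1)
                              for c in range(c0, c1)))
    return pooled


def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Concatenate the pooled outputs of every pyramid level."""
    out = []
    for bins in levels:
        out.extend(pool_to_bins(fmap, bins))
    return out


fmap = [[r * 4 + c for c in range(4)] for r in range(4)]   # 4x4, values 0..15
vec = spatial_pyramid_pool(fmap)
print(len(vec))   # 21
print(vec[:5])    # [15, 5, 7, 13, 15]
```

The coarse levels summarize global context and the fine levels keep more of the layout, so the merged vector holds both.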
Atrous Convolution + Spatial Pyramid Pooling!
Spatial Pyramid Pooling!
Chen et al. Rethinking atrous convolution for semantic image segmentation. ArXiv, 2017.
Zhao et al. Pyramid scene parsing network. CVPR, 2017.
DeepLab.v3+
Encoder/Decoder, Atrous Conv, Spatial Pyramid Pooling
Chen et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv, 2018.
Let's find what we have learned in the figure!
PASCAL VOC2012 Leaderboard
Model                   Mean Average Precision (%)   Base CNN model
DeepLab.v3+             87.8                         Xception
DeepLab.v3              85.7                         ResNet-101
PSPNet                  85.4                         ResNet-101
DeepLab.v2-CRF          79.7                         ResNet-101
FCN-2s-Dilated_VGG19    69.0                         VGG-19
FCN-8s                  62.2                         VGG-19
SegNet                  59.9                         VGG-19
VOC Score: http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?cls=mean&challengeid=11&compid=6&submid=6103
Figure annotations on the table: Encoder improvements, SPP, Dilated Conv, Encoder/Decoder
Part 3. Additional Ingredients for End-to-End Semantic Segmentation
Outline - Part 3.
1. Data preparation and preprocessing
2. Model selection
3. Choosing the loss and optimizer
4. Evaluation (Metrics)
Data preprocessing
- Preprocessing is nothing special compared with classification. However, augmentation must be applied to the image and its mask as a pair!
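Paired augmentation means one random decision drives the transform for both the image and the mask. A minimal sketch with a horizontal flip on nested lists; the function names and toy data are illustrative.

```python
import random


def hflip(grid):
    """Mirror a 2D grid left to right."""
    return [list(reversed(row)) for row in grid]


def random_hflip_pair(image, mask, rng, p=0.5):
    """Flip image and mask together so pixels and labels stay aligned."""
    if rng.random() < p:          # a single draw decides for both
        return hflip(image), hflip(mask)
    return image, mask


image = [[10, 20, 30], [40, 50, 60]]
mask = [[0, 0, 1], [0, 1, 1]]
rng = random.Random(0)
aug_image, aug_mask = random_hflip_pair(image, mask, rng, p=1.0)
print(aug_image)  # [[30, 20, 10], [60, 50, 40]]
print(aug_mask)   # [[1, 0, 0], [1, 1, 0]]
```

Drawing the random number separately for the image and the mask would silently misalign labels about half the time.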
Loss
- Cross Entropy Loss
Optimizer
- SGD with momentum (+ Nesterov)
Learning rate
- Poly learning rate policy
(PSPNet, DeepLab.v2~v3+)
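The poly policy decays the learning rate as base_lr x (1 - iter / max_iter)^power. The sketch assumes power = 0.9, the value commonly reported for PSPNet and DeepLab; the base learning rate and iteration count are hypothetical.

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """Learning rate at `iteration` under the poly decay policy."""
    return base_lr * (1 - iteration / max_iter) ** power


for it in (0, 250, 500, 750, 1000):
    print(it, round(poly_lr(0.01, it, 1000), 6))
```

The rate starts at `base_lr`, falls almost linearly for most of training, and reaches zero at `max_iter`.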
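Evaluation (Pixel)
- IoU: B / (A + C - B)
- Pixel accuracy: B / A
Figure: two overlapping regions A and C (the prediction and the ground truth); their overlap B is the correctly predicted area.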
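Both pixel metrics follow directly from counting pixels in binary masks. Which of A and C is the ground truth cannot be recovered from the slide, so the sketch assumes A is the ground-truth region, making B / A the fraction of ground-truth pixels predicted correctly; the masks are toy data.

```python
def iou_and_pixel_accuracy(pred, target):
    """Return (IoU, pixel accuracy) for two binary masks of equal shape."""
    a = sum(t for row in target for t in row)               # A: ground truth (assumed)
    c = sum(p for row in pred for p in row)                 # C: prediction
    b = sum(p and t for pr, tr in zip(pred, target)         # B: overlap
            for p, t in zip(pr, tr))
    union = a + c - b
    iou = b / union if union else 0.0
    accuracy = b / a if a else 0.0
    return iou, accuracy


pred = [[1, 1, 0],
        [1, 0, 0]]
target = [[1, 0, 0],
          [1, 1, 0]]
iou, acc = iou_and_pixel_accuracy(pred, target)
print(iou)             # 0.5
print(round(acc, 3))   # 0.667
```

IoU penalizes both missed and spurious pixels, whereas B / A alone ignores spurious predictions outside the ground truth.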
Evaluation (Object)
- Precision/Recall: a prediction counts as correct when IoU >= 0.5
- AP: area under the Precision/Recall curve with respect to the IoU threshold (0 to 1.0)
- mAP: mean of the AP over all classes
Figure: two example pairs of regions; IoU = 0.7 → Success (TP), IoU = 0.2 → Fail (FN). AP, AP → mAP.
Source: https://github.com/Cartucho/mAP
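A simplified AP computation can make the object-level metrics concrete. The sketch assumes the common VOC-style procedure: rank detections by confidence, count a detection as a true positive when its IoU with its matched ground truth is at least 0.5, and integrate the precision envelope over recall. Matching and duplicate handling are skipped, and every number in the example is made up.

```python
def average_precision(detections, num_gt, iou_threshold=0.5):
    """detections: list of (confidence, iou_with_matched_ground_truth)."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = 0
    precisions, recalls = [], []
    for rank, (_, iou) in enumerate(ranked, start=1):
        if iou >= iou_threshold:
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p     # area of one recall step
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP: plain mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


ap = average_precision([(0.9, 0.7), (0.8, 0.2), (0.6, 0.6)], num_gt=2)
print(round(ap, 4))                                   # 0.8333
print(round(mean_average_precision([ap, 0.5]), 4))    # 0.6667
```

The detection with IoU = 0.2 lowers precision at its rank, which is what pulls the AP below 1.0.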
What is missing
1. Post-processing: CRF, ...
2. A detailed understanding of dilated convolution and upsampling
3. Research that combines segmentation with other fields (e.g. pix2pix)
… please fill these in!
Reference
1. Models
- He et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. ECCV, 2014.
- Long et al. Fully convolutional networks for semantic segmentation. CVPR, 2015.
- Ronneberger et al. U-net: Convolutional networks for biomedical image segmentation. MICCAI, 2015.
- Yu and Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. ICLR, 2016.
- Zhao et al. Pyramid scene parsing network. CVPR, 2017.
- Chen et al. Rethinking atrous convolution for semantic image segmentation. arXiv, 2017.
- Chen et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv, 2018.
2. Further reading
- FCN through PSPNet PyTorch implementations (https://github.com/ZijunDeng/pytorch-semantic-segmentation)
- Evaluation metrics, Python implementation (https://github.com/martinkersner/py_img_seg_eval)
- DeepLab PyTorch implementation (https://github.com/doiken23/DeepLab_pytorch)
- Deconvolution explained, Distill (https://distill.pub/2016/deconv-checkerboard/)
- Blog post summarizing FCN to DeepLab.v3 (http://blog.qure.ai/notes/semantic-segmentation-deep-learning-review)
- PASCAL VOC 2012 Semantic Segmentation evaluation results (http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?cls=mean&challengeid=11&compid=6&submid=8284)
- Dilated convolution explained (https://stackoverflow.com/questions/41178576/whats-the-use-of-dilated-convolutions)
- Spatial Pyramid Pooling explained (https://www.quora.com/What-is-the-difference-between-simple-max-Pooling-and-spatial-pyramid-pooling-Im-seeing-these-terms-a-lot-lately-In-papers-where-the-authors-need-to-get-a-feature-vector)
- Receptive field explained (https://medium.com/mlreview/a-guide-to-receptive-field-arithmetic-for-convolutional-neural-networks-e0f514068807)
- Performance with and without dilated convolution, and a fix for the resulting gridding artifact (https://arxiv.org/abs/1705.09914)
