Image Segmentation with Deep Learning
Xavier Giro-i-Nieto
UPC & BSC Barcelona
Carles Ventura
UOC Barcelona
Xavier Giro-i-Nieto
Associate Professor at Universitat Politècnica de Catalunya (UPC) in Barcelona, Catalonia.
IDEAI Center for Intelligent Data Science & Artificial Intelligence
@DocXavi
xavier.giro@upc.edu
https://sites.google.com/view/dlbcn2018/home https://sites.google.com/view/dlbcn2019/home
Deep Learning Barcelona Symposium
Foundations
● MSc course [2017] [2018] [2019]
● BSc course [2018] [2019] [2020]
Multimedia Applications
Vision: [2016] [2017][2018][2019]
Language & Speech: [2017] [2018] [2019]
Reinforcement Learning
● [2020 Spring] [2020 Autumn]
Deep Learning @ UPC TelecomBCN
4th (face-to-face) & 5th edition (online) start November 2020. Sign up here.
Online Postgraduate Course
Àgata Lapedriza (UOC)
Xavier Giró (UPC-BSC)
Xavier Suau (Apple)
Marta Ruiz (UPC)
Carles Ventura (UOC)
Jordi Pons (Dolby)
Jordi Torres (BSC)
Elisenda Bou (Vilynx)
Daniel Fojo (Glovo)
Acknowledgements
6
Amaia Salvador
amaia.salvador@upc.edu
PhD Candidate
Universitat Politècnica de Catalunya
[DLCV 2016]
Verónica Vilaplana
veronica.vilaplana@upc.edu
Associate Professor
Universitat Politècnica de Catalunya
[DLCV 2017]
Míriam Bellver
miriam.bellver@bsc.edu
PhD Candidate
Barcelona Supercomputing Center
[DLCV 2018] [DLCV 2018]
From image to pixels classification (segmentation)
7
Slide inspired by cs231n lecture from Stanford University.
Figure: the same scene processed as Image Classification, Object Detection and Image Segmentation (labels: “chair”, “bin”).
Segmentation
Segmentation: Delineate the accurate boundaries of all objects in an image by
predicting a class for each pixel (a class map).
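A minimal sketch of what a per-pixel class map looks like in practice. The class list and the random score tensor are hypothetical stand-ins for the output of a real network; the only point is that the map is an argmax over the class axis at every pixel.

```python
import numpy as np

# Hypothetical class list and C x H x W score tensor (a real model would produce the scores).
CLASSES = ["background", "chair", "bin"]
rng = np.random.default_rng(0)
scores = rng.normal(size=(len(CLASSES), 4, 6))

def class_map(scores):
    """Return an H x W map holding the index of the best-scoring class at each pixel."""
    return scores.argmax(axis=0)

seg = class_map(scores)
print(seg.shape)  # one label per pixel
```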
8
Segmentation Applications
● Autonomous driving
● Medical imaging (Image source: DRIVE Digital Retinal Image Vessel Extraction)
● Robotic applications
● Scene understanding
Outline
From Global to Local-scale Image Classification
Semantic Segmentation
● Deconvolution (or transposed convolution)
● Dilated Convolution
● Skip Connections
Instance Segmentation
● Proposal-Based
● Recurrent
● Instance Embedding
Panoptic Segmentation
13
14
Figure: Jeremy Jordan (2018)
From Image to Pixel Classification (Segmentation)
15
Naive approach: Train a sliding window classifier.
● Extract patch
● Run through a CNN
● Classify center pixel (e.g. “COW”)
● Repeat for every pixel
Slide: CS231n (Stanford University)
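The sliding window approach can be sketched as follows. The function toy_patch_classifier is a hypothetical stand-in for the CNN (it only thresholds the mean brightness of the patch); the loop structure is what matters: one classifier call per pixel.

```python
import numpy as np

def toy_patch_classifier(patch):
    """Hypothetical stand-in for a CNN: class 1 if the patch is bright on average, else 0."""
    return int(patch.mean() > 0.5)

def sliding_window_segmentation(image, patch_size=3, classify=toy_patch_classifier):
    """Label every pixel by classifying the patch centered on it."""
    half = patch_size // 2
    padded = np.pad(image, half, mode="edge")  # border pixels also get a full patch
    labels = np.zeros(image.shape, dtype=int)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            patch = padded[y:y + patch_size, x:x + patch_size]  # extract patch
            labels[y, x] = classify(patch)                      # classify center pixel
    return labels

image = np.zeros((6, 6))
image[:, 3:] = 1.0  # right half of the toy image is bright
labels = sliding_window_segmentation(image)
```

Even on this 6 x 6 image the classifier runs 36 times on heavily overlapping patches, which is the redundancy that a fully convolutional formulation avoids.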
Convolutionize: Run “fully convolutional” network to get all pixels at once.
Slide: CS231n (Stanford University)
19
Slide concept: CS231n (Stanford University)
From Global to Local-scale Image Classification
Convolutionize: Formulate each neuron in a fully connected (FC) layer as a
convolutional filter (kernel) of a convolutional layer:
20
Input: 3x2x2 tensor (RGB image of 2x2)
Fully connected view: 2 fully connected neurons, 3x2x2 * 2 weights
Convolutional view: 2 convolutional filters of 3 x 2 x 2 (same size as input tensor), 3x2x2 * 2 weights
From Global to Local-scale Image Classification
21
A model trained for image classification on low-definition images can provide local
response when fed with high-definition images.
Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." CVPR
2015. (original figure has been modified)
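The FC-to-convolution redefinition can be checked numerically. This is a sketch under stated assumptions: random weights, the 3x2x2 input and 2 neurons of the toy example, and a hand-written stride-1 convolution (conv_valid is a helper written for this illustration, not a library call). On the original input size the convolution reproduces the FC output; on a larger image it yields one response per 2x2 window.

```python
import numpy as np

rng = np.random.default_rng(0)
fc_weights = rng.normal(size=(2, 3 * 2 * 2))   # 2 FC neurons over a flattened 3x2x2 input
filters = fc_weights.reshape(2, 3, 2, 2)       # same weights viewed as 2 conv filters of 3x2x2

def conv_valid(x, filters):
    """Stride-1 'valid' cross-correlation of a C x H x W input with K x C x kh x kw filters."""
    k, _, kh, kw = filters.shape
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((k, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = (filters * window).sum(axis=(1, 2, 3))
    return out

small = rng.normal(size=(3, 2, 2))        # input size the FC layer was defined for
fc_out = fc_weights @ small.reshape(-1)   # classic fully connected output
conv_out = conv_valid(small, filters)     # shape (2, 1, 1), same values as fc_out

large = rng.normal(size=(3, 5, 5))        # larger image fed to the same weights
heatmap = conv_valid(large, filters)      # shape (2, 4, 4): a local response map per class
```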
From Global to Local-scale Image Classification
22
Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." CVPR
2015. (original figure has been modified)
From Global to Local-scale Image Classification
CNN
Convolutionize: Run “fully convolutional” network to get all pixels at once...
23
From Global to Local-scale Image Classification
Campos, V., Jou, B., & Giro-i-Nieto, X. From Pixels to Sentiment: Fine-tuning CNNs for Visual Sentiment Prediction.
Image and Vision Computing. (2017)
The FC to Conv redefinition allows generating heatmaps of the class prediction over
the input images.
24
From Global to Local-scale Image Classification
Limitation:
Pooling layers in the CNN will
decrease the spatial definition of the
output.
Figure: Alicja Kwasniewska (ISSonDL 2020)
Slide concept: CS231n (Stanford University)
Outline
From Global to Local-scale Image Classification
Semantic Segmentation
● Deconvolution (or transposed convolution)
● Skip Connections
● Dilated Convolutions
Instance Segmentation
● Proposal-Based
● Recurrent
● Instance Embedding
Panoptic Segmentation
26
Semantic Segmentation
● Label every pixel!
● Don’t differentiate instances (cows)
● Classic computer vision problem
27
Slide: CS231n (Stanford University)
Instance Segmentation
● Detect instances, give category, label pixels
● “simultaneous detection and segmentation” (SDS)
● Labels are class-aware and instance-aware
28
Slide: CS231n (Stanford University)
Outline
Semantic Segmentation
● Deconvolution (or transposed convolution)
● Dilated Convolution
● Skip Connections
Instance Segmentation Methods
● Proposal-Based
● Recurrent
● Instance Embedding
Panoptic Segmentation
29
30
Slide Credit: https://www.jeremyjordan.me/semantic-segmentation/
Semantic Segmentation
31
CNN
Limitation of convolutionizing CNNs for image classification:
Pooling layers in the CNN will decrease the spatial definition of the output.
Slide concept: CS231n (Stanford University)
Learnable upsampling
32
Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." CVPR
2015.
33
Slide: Alicja Kwasniewska (ISSonDL 2020)
Learnable Upsample: Transposed Convolution
Reminder: Convolutional Layer
Typical 3 x 3 convolution, stride 1 pad 1
Input: 4 x 4 Output: 4 x 4
Dot product between filter and input
Typical 3 x 3 convolution, stride 2 pad 1
Input: 4 x 4 Output: 2 x 2
Dot product between filter and input
Slide credit: CS231n (Stanford University)
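The two output sizes in the reminder follow from the usual formula floor((n + 2p - k) / s) + 1. A minimal single-channel sketch, assuming a square input and an all-ones filter chosen only for illustration:

```python
import numpy as np

def conv2d(x, w, stride=1, pad=0):
    """Single-channel 2D cross-correlation of a square input with zero padding."""
    k = w.shape[0]
    xp = np.pad(x, pad)
    out_size = (x.shape[0] + 2 * pad - k) // stride + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            window = xp[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = (window * w).sum()  # dot product between filter and input
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))
same = conv2d(x, w, stride=1, pad=1)  # 4 x 4 input -> 4 x 4 output
down = conv2d(x, w, stride=2, pad=1)  # 4 x 4 input -> 2 x 2 output
```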
40
Learnable Upsample: Transposed Convolution
3 x 3 “deconvolution”, stride 2 pad 1
Input: 2 x 2 Output: 4 x 4
● Input gives weight for filter values
● Sum where output overlaps
Slide credit: CS231n (Stanford University)
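A single-channel sketch of the transposed convolution described here: every input value scales a copy of the filter, copies are placed stride pixels apart, and overlaps are summed. One assumption is needed to reach exactly 4 x 4 from a 2 x 2 input with a 3 x 3 filter, stride 2 and pad 1: an extra row and column on one side, exposed here as output_padding in the style of PyTorch's ConvTranspose2d argument of the same name. All-ones inputs are used only to make the overlap visible.

```python
import numpy as np

def transposed_conv2d(x, w, stride=2, pad=1, output_padding=1):
    """Single-channel transposed convolution on a square input."""
    n, k = x.shape[0], w.shape[0]
    out_size = (n - 1) * stride - 2 * pad + k + output_padding
    full = np.zeros(((n - 1) * stride + k + output_padding,) * 2)
    for i in range(n):
        for j in range(n):
            # input gives weight for filter values; overlapping contributions are summed
            full[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * w
    return full[pad:pad + out_size, pad:pad + out_size]  # crop the padding

x = np.ones((2, 2))
w = np.ones((3, 3))
up = transposed_conv2d(x, w)  # 2 x 2 input -> 4 x 4 output
```

With these inputs, output positions covered by several filter copies hold larger values than positions covered by one, which is the "sum where output overlaps" step.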
42
Learnable Upsample: Transposed Convolution
Noh, H., Hong, S., & Han, B. (2015). Learning deconvolution network for semantic segmentation. ICCV 2015.
“Regular” VGG “Upside down” VGG
43
44
Limitation of upsampling from deep CNN layers: Deeper layers
are specialized for higher-level semantic tasks, not for capturing
the fine-grained details required for segmentation.
Highest activations along CNN depth
Learnable Upsample
Skip Connections
Solution: Combine predictions from features at different depths (“skip connections”).
45
Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." CVPR
2015.
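A rough sketch of combining predictions from two depths. The score maps are random placeholders, the resolutions are hypothetical, and nearest-neighbor upsampling stands in for the learned upsampling of the original models. The fusion here is an element-wise sum, as in FCN; U-Net instead concatenates feature maps along the channel axis.

```python
import numpy as np

def upsample2x(scores):
    """Nearest-neighbor 2x upsampling of a C x H x W score map (stand-in for learned upsampling)."""
    return scores.repeat(2, axis=1).repeat(2, axis=2)

def fuse(deep_scores, shallow_scores):
    """Combine coarse predictions from a deep layer with finer ones from a shallower layer."""
    return upsample2x(deep_scores) + shallow_scores

rng = np.random.default_rng(0)
deep = rng.normal(size=(3, 2, 2))     # hypothetical class scores at a coarse resolution
shallow = rng.normal(size=(3, 4, 4))  # hypothetical class scores at twice that resolution
fused = fuse(deep, shallow)           # 3 x 4 x 4, ready for further upsampling
```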
46
#U-Net Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image
segmentation." MICCAI 2015
Skip connections to intermediate layers
47
Receptive Field
Receptive field: Part of the input data that is visible to a neuron.
It increases as we stack more convolutional layers (i.e. neurons in deeper layers
have larger receptive fields).
André Araujo, Wade Norris, Jack Sim, “Computing Receptive Fields of Convolutional Neural Networks”. Distill.pub
2019.
Problem: Receptive field may be limited, and pixel-wise predictions at
the deepest layer may not be aware of the whole image.
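The growth of the receptive field can be computed layer by layer. The sketch below uses the standard recurrence for a chain of layers (each layer adds (kernel_size - 1) times the accumulated stride); the two layer configurations are made-up examples.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, ordered from input to output."""
    rf, jump = 1, 1  # jump = distance in input pixels between neighboring outputs
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

three_convs = [(3, 1)] * 3                               # three 3x3 convolutions, stride 1
with_pooling = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]  # 3x3 convs with 2x2 poolings in between
print(receptive_field(three_convs))   # 7
print(receptive_field(with_pooling))  # 18
```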
48
Receptive Field: Dilated (atrous) convolutions
Slide: Alicja Kwasniewska (ISSonDL 2020)
Dilated Convolutions
● By adding more layers with exponentially increasing dilation rates (1, 2, 4, ...):
○ The receptive field grows exponentially.
○ The number of learnable parameters (filter weights) grows linearly.
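The two growth rates can be tabulated with a short sketch. It assumes 3 x 3 single-channel filters and a dilation that doubles at every layer (1, 2, 4, ...), which is the setting in which the receptive field grows exponentially while the weight count grows linearly.

```python
def dilated_stack_stats(num_layers, kernel_size=3):
    """Receptive field and weight count of stacked dilated convs with dilation 1, 2, 4, ..."""
    rf, params = 1, 0
    stats = []
    for layer in range(num_layers):
        dilation = 2 ** layer
        rf += (kernel_size - 1) * dilation
        params += kernel_size * kernel_size  # one single-channel filter per layer
        stats.append((layer + 1, dilation, rf, params))
    return stats

for layer, dilation, rf, params in dilated_stack_stats(4):
    print(f"layer {layer}: dilation {dilation}, receptive field {rf}x{rf}, weights {params}")
```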
49
Yu, F., & Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. ICLR 2016.
Dilated Convolutions
50
Source: https://github.com/vdumoulin/conv_arithmetic
Dilated Convolutions + Spatial Pyramid Pooling (SPP)
51
#SPP He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual
recognition. TPAMI 2015.
#PSPNet Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). Pyramid scene parsing network. CVPR 2017.
State-of-the-art models
52
● DeepLab v3+: Atrous Convolutions + Spatial Pyramid Pooling + Encoder-Decoder
#DeepLabv3+ Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder-decoder with atrous
separable convolution for semantic image segmentation. ECCV 2018
Outline
From Global to Local-scale Image Classification
Semantic Segmentation
● Deconvolution (or transposed convolution)
● Skip Connections
● Dilated Convolution
Instance Segmentation
● Proposal-Based
● Recurrent
● Instance Embedding
Panoptic Segmentation
53
Proposal-based
54
Typical object detection/segmentation pipelines:
Object proposal → Refinement and Classification
Detections: Dog 0.85, Cat 0.80, Dog 0.75, Cat 0.90
NMS: Non-Maximum Suppression
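A minimal sketch of greedy non-maximum suppression. The box coordinates and the 0.5 IoU threshold are assumptions chosen for illustration; the scores reuse the four detections of the example, and the suppression is class-agnostic for brevity.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all given as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

# Hypothetical boxes: two overlapping "dog" detections and two overlapping "cat" detections.
boxes = np.array([[10, 10, 60, 60], [12, 12, 62, 62],
                  [100, 20, 150, 70], [102, 18, 152, 68]], dtype=float)
scores = np.array([0.85, 0.75, 0.80, 0.90])
kept = nms(boxes, scores)  # indices of the surviving detections
```

Only the highest-scoring box of each overlapping pair survives, which is also why two distinct objects with nearly identical bounding boxes cannot both be kept.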
Output: one Binary Map per detection.
Proposal-based
Slide Credit: CS231n
Hariharan et al. Simultaneous Detection and Segmentation. ECCV 2014
External Segment proposals
Mask out background with mean image
Similar to R-CNN, but with segment proposals
57
Proposal based: Detection - Faster R-CNN
Figure: Conv layers (Conv5_3) → Region Proposal Network → RPN Proposals → RoI Pooling → FC6 → FC7 → FC8 → Class probabilities
58
Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
Learn proposals end-to-end sharing parameters with the classification network
Proposal-based Instance Segmentation: Mask R-CNN
Mask R-CNN extends Faster R-CNN to pixel-level segmentation by predicting masks in parallel with class labels.
59
Mask R-CNN
Figure: Object Detection vs. Object Detection and Segmentation
He et al. Mask R-CNN. ICCV 2017
Mask R-CNN: RoI Align
RoI Pool from Fast R-CNN
Hi-res input image: 3 x 800 x 600 with region proposal
Convolution and Pooling
Hi-res conv features: C x H x W with region proposal
Max-pool within each grid cell
RoI conv features: C x h x w for region proposal
Fully-connected layers expect low-res conv features: C x h x w
x/16 & rounding → misalignment! + not differentiable
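The misalignment caused by "x/16 & rounding" can be made concrete with one coordinate. The numbers below (a proposal edge at pixel 100, an 8 x 8 toy feature map) are hypothetical. The bilinear sampler is a simplified illustration of the idea behind RoI Align, which keeps fractional coordinates; the actual layer samples several points per bin and aggregates them.

```python
import numpy as np

def bilinear_sample(feature, y, x):
    """Sample a 2D feature map at a fractional location (y, x) by bilinear interpolation."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feature.shape[0] - 1), min(x0 + 1, feature.shape[1] - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature[y0, x0] + dx * feature[y0, x1]
    bottom = (1 - dx) * feature[y1, x0] + dx * feature[y1, x1]
    return (1 - dy) * top + dy * bottom

stride = 16                              # total downsampling of the conv features (the "x/16")
x_image = 100.0                          # hypothetical proposal coordinate in image pixels
x_feat = x_image / stride                # 6.25 on the feature map
x_pooled = round(x_feat)                 # RoI Pool snaps to cell 6
shift_px = (x_feat - x_pooled) * stride  # misalignment measured in image pixels

feature = np.arange(64, dtype=float).reshape(8, 8)     # toy feature map: value = 8 * row + col
aligned_value = bilinear_sample(feature, 2.0, x_feat)  # sampling keeps the fractional position
```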
61
62
Limitations of Proposal-based models
63
1. Two objects might share the same bounding box: only
one will be kept after the NMS step.
2. The choice of NMS threshold is application dependent.
3. The same pixel can be assigned to multiple instances.
4. The number of predictions is limited by the number of
proposals.
Single-shot Instance Segmentation
64
● Improving RetinaNet (single-shot object detector) in three ways:
○ Integrating instance mask prediction
○ Making the loss function adaptive and more stable
○ Including hard examples in training
#RetinaMask Fu et al. RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free.
ArXiv 2019
65
CNN Cat
A Krizhevsky, I Sutskever, GE Hinton “Imagenet classification with deep convolutional neural networks” NIPS 2012
66
Figure: CNN and RNN blocks predicting several labels for one image (Cat, Grass, Stone).
Recurrent Instance Segmentation
Romera-Paredes, B., & Torr, P. H. S. Recurrent Instance Segmentation. ECCV 2016
68
Sequential mask generation
Salvador, A., Bellver, M., Campos, V., Baradad, M., Marqués, F., Torres, J., & Giro-i-Nieto, X. (2018). From Pixels to Object
Sequences: Recurrent Semantic Instance Segmentation.
Recurrent Instance Segmentation
#RVOS Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques and Xavier Giro-i-Nieto.
“RVOS: End-to-End Recurrent Network for Video Object Segmentation”, CVPR 2019.
time
(frame sequence)
space
(object sequence)
Outline
Segmentation Datasets
Segmentation Applications
Semantic Segmentation
● Deconvolution (or transposed convolution)
● Dilated Convolution
● Skip Connections
Instance Segmentation
● Proposal-Based
● Recurrent
● DETR
Panoptic Segmentation
71
Semantic + Instance = Panoptic Segmentation
72
#PS Kirillov, A., He, K., Girshick, R., Rother, C., & Dollár, P. (2019). Panoptic segmentation. CVPR 2019.
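One way to picture "Semantic + Instance = Panoptic" is to merge a semantic map and an instance-id map into a single id per pixel. The encoding below (class id * 1000 + instance id), the two classes and the toy maps are all assumptions made for illustration, not a format prescribed by the paper.

```python
import numpy as np

LABEL_DIVISOR = 1000  # assumed encoding: panoptic id = class id * 1000 + instance id

def to_panoptic(semantic, instance, thing_classes):
    """Merge a semantic map and an instance-id map into one panoptic map."""
    is_thing = np.isin(semantic, list(thing_classes))
    instance_ids = np.where(is_thing, instance, 0)  # stuff pixels carry no instance id
    return semantic * LABEL_DIVISOR + instance_ids

# Hypothetical classes: 0 = grass (stuff), 1 = cow (thing)
semantic = np.array([[0, 1, 1, 0],
                     [0, 1, 1, 1]])
instance = np.array([[0, 1, 1, 0],
                     [0, 1, 2, 2]])
panoptic = to_panoptic(semantic, instance, thing_classes={1})
```

Every pixel ends up with exactly one id, so a pixel cannot belong to two instances at once.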
Panoptic Segmentation: methods
73
● UPSNet: A Unified Panoptic Segmentation Network
Mask R-CNN design
#UPSNET Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., & Urtasun, R. (2019). Upsnet: A unified panoptic segmentation
network. CVPR 2019.
Summary
Semantic Segmentation Methods
● Deconvolution (or transposed convolution)
● Dilated Convolution
● Skip Connections
Instance Segmentation Methods
● Proposal-Based
● Recurrent
● Instance Embedding
Panoptic Segmentation
75
Latest advances
● Bolya et al. YOLACT Real-time Instance Segmentation. ICCV 2019
● #Axial-DeepLab Wang, H., Zhu, Y., Green, B., Adam, H., Yuille, A., & Chen, L. C. (2020).
Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation. ECCV 2020.
● #SOLO Wang, X., Kong, T., Shen, C., Jiang, Y., & Li, L. (2019). Solo: Segmenting objects by locations.
ECCV 2020
● Fast Semantic Segmentation with MobileNet in PyTorch.
76
Segmentation Datasets
Pascal Visual Object Classes
● 20 categories
● +10,000 images
● Semantic segmentation GT
● Instance segmentation GT
Pascal Context
● 540 categories
● +10,000 images
● Dense annotations
● Semantic segmentation GT
● Objects + stuff
77
Segmentation Datasets
COCO (Common Objects in Context)
● Real indoor & outdoor scenes
● 80 categories
● +300,000 images
● 2M instances
● Partial annotations
● Semantic segmentation GT
● Instance segmentation GT
● Objects, but no stuff
78
Segmentation Datasets
ADE20K
● Real general scenes
● +150 categories
● +22,000 images
● Semantic segmentation GT
● Instance + parts segmentation GT
● Objects and stuff
79
Segmentation Datasets
Open Images V6
● Real general scenes
● 350 categories
● +950,000 images
● 2,700,000 instance segmentations
● Instance segmentation GT
● Objects
80
LVIS
● Real general scenes
● 1,000 categories
● 164,000 images
● 2,200,000 instance segmentations
● 11.2 object instances from 3.4 categories on average per image (more complex images than Open Images and MS COCO)
● Instance segmentation GT
● Objects
Segmentation Datasets
Cityscapes
● Real driving scenes
● 30 categories
● +25,000 images
● 20,000 partial annotations
● 5,000 dense annotations
● Semantic segmentation GT
● Instance segmentation GT
● Depth, GPS and other metadata
● Objects and stuff
Mapillary Vistas Dataset
● Real driving scenes covering 6 continents with variety of weather/season/time of day/camera/viewpoint
● 152 categories
● 25,000 images
● Semantic segmentation GT
● Instance + parts segmentation GT
● Objects and stuff
81
Our research
Hands on
Carles Ventura
cventuraroy@uoc.edu
Lecturer
Universitat Oberta de Catalunya

Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonDL 2020

Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Sergey Karayev
 
Pixel RNN to Pixel CNN++
Pixel RNN to Pixel CNN++Pixel RNN to Pixel CNN++
Pixel RNN to Pixel CNN++Dongheon Lee
 
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsShunta Saito
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 
Can AI say from our eyes when we read relevant information?
Can AI say from our eyes when we read relevant information?Can AI say from our eyes when we read relevant information?
Can AI say from our eyes when we read relevant information?Nilavra Bhattacharya
 
UNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptxUNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptxNoorUlHaq47
 
Visual Search Engine with MXNet Gluon
Visual Search Engine with MXNet GluonVisual Search Engine with MXNet Gluon
Visual Search Engine with MXNet GluonApache MXNet
 
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
CNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesCNNs: from the Basics to Recent Advances
CNNs: from the Basics to Recent AdvancesDmytro Mishkin
 
build a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Pythonbuild a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in PythonKv Sagar
 
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
Software Defined Visualization (SDVis): Get the Most Out of ParaView* with OS...
Software Defined Visualization (SDVis): Get the Most Out of ParaView* with OS...Software Defined Visualization (SDVis): Get the Most Out of ParaView* with OS...
Software Defined Visualization (SDVis): Get the Most Out of ParaView* with OS...Intel® Software
 
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...Jedha Bootcamp
 
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Universitat Politècnica de Catalunya
 

Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonDL 2020

  • 1. Image Segmentation with Deep Learning Xavier Giro-i-Nieto UPC & BSC Barcelona Carles Ventura UOC Barcelona
  • 2. Xavier Giro-i-Nieto Associate Professor at Universitat Politecnica de Catalunya (UPC) in Barcelona, Catalonia. IDEAI Center for Intelligent Data Science & Artificial Intelligence @DocXavi xavier.giro@upc.edu
  • 4. Foundations ● MSc course [2017] [2018] [2019] ● BSc course [2018] [2019] [2020] Multimedia Applications Vision: [2016] [2017][2018][2019] Language & Speech: [2017] [2018] [2019] Reinforcement Learning ● [2020 Spring] [2020 Autumn] Deep Learning @ UPC TelecomBCN
  • 5. 4th (face-to-face) & 5th edition (online) start November 2020. Sign up here. Online Postgraduate Course Àgata Lapedriza (UOC) Xavier Giró (UPC-BSC) Xavier Suau (Apple) Marta Ruiz (UPC) Carles Ventura (UOC) Jordi Pons (Dolby) Jordi Torres (BSC) Elisenda Bou (Vilynx) Daniel Fojo (Glovo)
  • 6. Acknowledgements: Amaia Salvador (amaia.salvador@upc.edu), PhD Candidate, Universitat Politècnica de Catalunya [DLCV 2016]. Verónica Vilaplana (veronica.vilaplana@upc.edu), Associate Professor, Universitat Politècnica de Catalunya [DLCV 2017]. Míriam Bellver (miriam.bellver@bsc.edu), PhD Candidate, Barcelona Supercomputing Center [DLCV 2018] [DLCV 2018].
  • 7. From image to pixel classification (segmentation). Image Classification, Object Detection and Image Segmentation, each illustrated with the labels “chair” and “bin”. Slide inspired by cs231n lecture from Stanford University.
  • 8. Segmentation: Define accurate boundaries for all objects in an image by predicting a class for each pixel (a class map).
  • 10. Segmentation Applications ● Medical imaging. Image source: DRIVE (Digital Retinal Images for Vessel Extraction).
  • 13. Outline: From Global to Local-scale Image Classification. Semantic Segmentation ● Deconvolution (or transposed convolution) ● Dilated Convolution ● Skip Connections. Instance Segmentation ● Proposal-Based ● Recurrent ● Instance Embedding. Panoptic Segmentation.
  • 14. From Image to Pixel Classification (Segmentation). Figure: Jeremy Jordan (2018).
  • 15. From Image to Pixel Classification (Segmentation).
  • 16. From Image to Pixel Classification (Segmentation). Naive approach: Train a sliding window classifier. Extract a patch; run it through a CNN; classify the center pixel (e.g. COW); repeat for every pixel. Slide: CS231n (Stanford University).
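The naive sliding-window approach can be sketched in a few lines. This is only an illustration under stated assumptions: `classify_patch` is a hypothetical stand-in for the CNN (it thresholds the patch mean), and the 3 x 3 patch size and edge padding are arbitrary choices, not taken from the slides.

```python
import numpy as np

def classify_patch(patch):
    """Hypothetical stand-in for a CNN: class 1 if the patch mean exceeds 0.5, else class 0."""
    return int(patch.mean() > 0.5)

def sliding_window_segmentation(image, patch_size=3):
    """Label every pixel by classifying the patch centred on it."""
    half = patch_size // 2
    # Replicate edge values so that border pixels also get a full patch.
    padded = np.pad(image, half, mode="edge")
    labels = np.zeros(image.shape, dtype=np.int64)
    for row in range(image.shape[0]):
        for col in range(image.shape[1]):
            patch = padded[row:row + patch_size, col:col + patch_size]
            labels[row, col] = classify_patch(patch)
    return labels

# Toy image: the right half is "object", the left half is background.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
labels = sliding_window_segmentation(image)
```

One classifier call is needed per pixel, and neighbouring patches overlap almost entirely, which is the redundancy that the fully convolutional formulation removes.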
  • 18. From Global to Local-scale Image Classification. Convolutionize: Run a “fully convolutional” network (CNN) to get all pixels at once. Slide: CS231n (Stanford University).
  • 20. From Global to Local-scale Image Classification. Convolutionize: Formulate each neuron in a fully connected (FC) layer as a convolutional filter (kernel) of a convolutional layer. Input: 3x2x2 tensor (RGB image of 2x2). FC view: 2 fully connected neurons, 3x2x2 * 2 weights. Convolutional view: 2 convolutional filters of 3x2x2 (same size as the input tensor), 3x2x2 * 2 weights.
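The equivalence on this slide can be checked numerically. The sketch below uses the slide's sizes (a 3x2x2 input and 2 neurons); the random weights, the helper `conv_valid` and the 5x5 test image are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 2, 2))         # 3x2x2 input tensor (RGB image of 2x2)
fc_weights = rng.normal(size=(2, 12))  # 2 FC neurons with 3*2*2 = 12 weights each

# Fully connected view: flatten the input and multiply by the weight matrix.
fc_out = fc_weights @ x.reshape(-1)

# Convolutional view: reshape each neuron's weights into a 3x2x2 filter.
filters = fc_weights.reshape(2, 3, 2, 2)

def conv_valid(image, kernels):
    """Stride-1 cross-correlation without padding: CxHxW image, NxCxKhxKw kernels."""
    n, _, kh, kw = kernels.shape
    out_h = image.shape[1] - kh + 1
    out_w = image.shape[2] - kw + 1
    out = np.zeros((n, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[:, i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(kernels, window, axes=([1, 2, 3], [0, 1, 2]))
    return out

conv_out = conv_valid(x, filters)       # shape (2, 1, 1): same values as fc_out
bigger = rng.normal(size=(3, 5, 5))     # a larger input than the layer was defined for
heatmap = conv_valid(bigger, filters)   # shape (2, 4, 4): one response per location
```

On the 2x2 input both views give the same two numbers. On the larger input, the same weights slide over the image and yield a spatial map of responses, which is what makes per-location predictions possible.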
  • 21. From Global to Local-scale Image Classification. A model trained for image classification on low-definition images can provide local response when fed with high-definition images. Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." CVPR 2015. (original figure has been modified)
  • 22. From Global to Local-scale Image Classification. Convolutionize: Run a “fully convolutional” network (CNN) to get all pixels at once. Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." CVPR 2015. (original figure has been modified)
  • 23. From Global to Local-scale Image Classification. The FC to Conv redefinition allows generating heatmaps of the class prediction over the input images. Campos, V., Jou, B., & Giro-i-Nieto, X. (2017). From Pixels to Sentiment: Fine-tuning CNNs for Visual Sentiment Prediction. Image and Vision Computing.
  • 24. From Global to Local-scale Image Classification. Limitation: Pooling layers in the CNN will decrease the spatial definition of the output. Figure: Alicja Kwasniewska (ISSonDL 2020).
  • 25. From Global to Local-scale Image Classification. Limitation: Pooling layers in the CNN will decrease the spatial definition of the output. Slide concept: CS231n (Stanford University).
  • 26. Outline: From Global to Local-scale Image Classification. Semantic Segmentation ● Deconvolution (or transposed convolution) ● Skip Connections ● Dilated Convolutions. Instance Segmentation ● Proposal-Based ● Recurrent ● Instance Embedding. Panoptic Segmentation.
  • 27. Semantic Segmentation. Label every pixel! Don’t differentiate instances (cows). Classic computer vision problem. Slide: CS231n (Stanford University).
  • 28. Instance Segmentation. Detect instances, give category, label pixels: “simultaneous detection and segmentation” (SDS). Labels are class-aware and instance-aware. Slide: CS231n (Stanford University).
  • 29. Outline: Semantic Segmentation ● Deconvolution (or transposed convolution) ● Dilated Convolution ● Skip Connections. Instance Segmentation Methods ● Proposal-Based ● Recurrent ● Instance Embedding. Panoptic Segmentation.
  • 31. Semantic Segmentation. Limitation of convolutionizing CNNs for image classification: Pooling layers in the CNN will decrease the spatial definition of the output. Slide concept: CS231n (Stanford University).
  • 32. Learnable upsampling. Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." CVPR 2015.
  • 33. Learnable Upsample: Transposed Convolution. Slide: Alicja Kwasniewska (ISSonDL 2020).
  • 34. Reminder: Convolutional Layer. Typical 3 x 3 convolution, stride 1, pad 1. Input: 4 x 4. Output: 4 x 4. Slide credit: CS231n (Stanford University).
  • 35. Reminder: Convolutional Layer. Typical 3 x 3 convolution, stride 1, pad 1. Input: 4 x 4. Output: 4 x 4. Dot product between filter and input. Slide credit: CS231n (Stanford University).
  • 37. Reminder: Convolutional Layer. Typical 3 x 3 convolution, stride 2, pad 1. Input: 4 x 4. Output: 2 x 2. Slide credit: CS231n (Stanford University).
  • 38. Reminder: Convolutional Layer. Typical 3 x 3 convolution, stride 2, pad 1. Input: 4 x 4. Output: 2 x 2. Dot product between filter and input. Slide credit: CS231n (Stanford University).
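The two reminder cases can be reproduced with a small single-channel convolution. This is a minimal sketch: the function `conv2d`, the all-ones filter and the toy input values are assumptions chosen for illustration, and the output size follows the usual formula floor((input + 2*pad - kernel) / stride) + 1.

```python
import numpy as np

def conv2d(image, kernel, stride=1, pad=0):
    """Single-channel 2D cross-correlation on a square image with zero padding."""
    k = kernel.shape[0]
    padded = np.pad(image, pad)  # zero padding on every side
    out_size = (image.shape[0] + 2 * pad - k) // stride + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            r, c = i * stride, j * stride
            # Dot product between the filter and the input window.
            out[i, j] = np.sum(padded[r:r + k, c:c + k] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)  # 4 x 4 input
kernel = np.ones((3, 3))               # 3 x 3 filter

same = conv2d(image, kernel, stride=1, pad=1)     # output 4 x 4
strided = conv2d(image, kernel, stride=2, pad=1)  # output 2 x 2
```

With stride 2 the filter visits every other position, so the strided output equals the stride-1 output sampled at every second row and column.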
  • 40. Learnable upsampling with Transposed Convolutions. 3 x 3 “deconvolution”, stride 2, pad 1. Input: 2 x 2. Output: 4 x 4. Slide credit: CS231n (Stanford University).
  • 41. Learnable Upsample: Transposed Convolution. 3 x 3 “deconvolution”, stride 2, pad 1. Input: 2 x 2. Output: 4 x 4. Input gives weight for filter values. Slide credit: CS231n (Stanford University).
  • 42. Learnable Upsample: Transposed Convolution. 3 x 3 “deconvolution”, stride 2, pad 1. Input: 2 x 2. Output: 4 x 4. Input gives weight for filter values. Sum where output overlaps. Slide credit: CS231n.
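The two rules on these slides (the input gives the weight for the filter values; sum where the output overlaps) can be written out directly. This is a hedged sketch rather than a reference implementation: the function name, the all-ones filter and the input values are assumptions. The uncropped result for a 2 x 2 input, 3 x 3 filter and stride 2 is 5 x 5, so the sketch drops one border row and column at the top and left to reach the 4 x 4 output quoted on the slide; that cropping choice is also an assumption.

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=2, crop=1, out_size=4):
    """Single-channel transposed convolution via overlap-add, then cropping."""
    k = kernel.shape[0]
    full_size = (x.shape[0] - 1) * stride + k
    full = np.zeros((full_size, full_size))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            r, c = i * stride, j * stride
            # Each input value scales a copy of the filter;
            # copies that overlap in the output are summed.
            full[r:r + k, c:c + k] += x[i, j] * kernel
    return full[crop:crop + out_size, crop:crop + out_size]

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])       # 2 x 2 input
kernel = np.ones((3, 3))        # 3 x 3 filter (learned in a real network)
up = transposed_conv2d(x, kernel)  # 4 x 4 output
```

With this toy filter, the output position covered by all four filter copies holds 1 + 2 + 3 + 4 = 10, which makes the "sum where output overlaps" rule visible.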
  • 43. Learnable Upsample: Transposed Convolution. “Regular” VGG followed by an “upside down” VGG. Noh, H., Hong, S., & Han, B. (2015). Learning deconvolution network for semantic segmentation. ICCV 2015.
  • 44. Learnable Upsample. Limitation of upsampling from deep CNN layers: Deeper layers are specialized for higher-level semantic tasks, not for capturing the fine-grained details required for segmentation. Figure: highest activations along CNN depth.
  • 45. Skip Connections. Solution: Combine predictions from features at different depths (“skip connections”). Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." CVPR 2015.
  • 46. Skip connections to intermediate layers. #U-Net Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." MICCAI 2015.
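A minimal sketch of combining predictions from two depths is shown here. It assumes summation of class score maps after upsampling the coarse one; nearest-neighbour upsampling stands in for a learned upsampling layer, and the random score maps, their sizes and the three classes are hypothetical.

```python
import numpy as np

def upsample_nearest(scores, factor):
    """Nearest-neighbour upsampling of a CxHxW score map (stand-in for learned upsampling)."""
    return scores.repeat(factor, axis=1).repeat(factor, axis=2)

rng = np.random.default_rng(0)
deep_scores = rng.normal(size=(3, 2, 2))     # coarse map from a deep layer, 3 classes
shallow_scores = rng.normal(size=(3, 4, 4))  # finer map from a shallower layer

# Skip connection: bring the deep prediction to the finer resolution and combine.
fused = upsample_nearest(deep_scores, 2) + shallow_scores
label_map = fused.argmax(axis=0)  # per-pixel class decision, shape (4, 4)
```

Summing score maps is one way to fuse depths; concatenating feature maps along the channel axis before further convolutions is another common option.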
  • 47. Receptive Field: Part of the input data that is visible to a neuron. It increases as we stack more convolutional layers (i.e. neurons in deeper layers have larger receptive fields). Problem: The receptive field may be limited, and pixel-wise predictions at the deepest layer may not be aware of the whole image. André Araujo, Wade Norris, Jack Sim, “Computing Receptive Fields of Convolutional Neural Networks”. Distill.pub 2019.
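How the receptive field grows with depth can be computed with a short recurrence: each layer adds (kernel size - 1) times the current spacing between neighbouring outputs, and that spacing is multiplied by the layer's stride. The function name and the example layer stacks below are assumptions for illustration.

```python
def receptive_field(layers):
    """Receptive field size (in input pixels) after a stack of (kernel_size, stride) layers."""
    rf, jump = 1, 1  # jump: distance in input pixels between adjacent outputs
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

three_convs = receptive_field([(3, 1), (3, 1), (3, 1)])  # three 3x3 convs, stride 1
with_pool = receptive_field([(3, 1), (2, 2), (3, 1)])    # conv, 2x2 pool with stride 2, conv
```

Three stacked 3x3 convolutions see a 7x7 input region, so growth is only linear in depth unless strides, pooling or dilation are used.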
  • 48. Receptive Field: Dilated (atrous) convolutions. Slide: Alicja Kwasniewska (ISSonDL 2020).
  • 49. Dilated Convolutions ● By adding more layers: ○ The receptive field grows exponentially. ○ The number of learnable parameters (filter weights) grows linearly. Yu, F., & Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. ICLR 2016.
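The exponential versus linear growth can be checked with a small calculation. The sketch assumes single-channel 3x3 filters with stride 1 and a dilation that doubles at every layer (1, 2, 4, ...); the function name and this schedule are assumptions for illustration.

```python
def dilated_stack_stats(num_layers, kernel_size=3):
    """Receptive field and weight count for stacked dilated convs with dilation 2**layer."""
    rf, params = 1, 0
    for layer in range(num_layers):
        dilation = 2 ** layer
        rf += (kernel_size - 1) * dilation   # each layer widens the field by (k - 1) * dilation
        params += kernel_size * kernel_size  # weight count per layer does not depend on dilation
    return rf, params

stats = [dilated_stack_stats(n) for n in range(1, 5)]
```

Under these assumptions the receptive field after n layers is 2^(n+1) - 1 pixels wide (3, 7, 15, 31), while the number of weights is 9n (9, 18, 27, 36).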
  • 51. Dilated Convolutions + Spatial Pyramid Pooling (SPP). #SPP He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. TPAMI 2015. #PSPNet Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). Pyramid scene parsing network. CVPR 2017.
  • 52. State-of-the-art models ● DeepLab v3+: Atrous Convolutions + Spatial Pyramid Pooling + Encoder-Decoder. #DeepLabv3+ Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. ECCV 2018.
  • 53. Outline: From Global to Local-scale Image Classification. Semantic Segmentation ● Deconvolution (or transposed convolution) ● Skip Connections ● Dilated Convolution. Instance Segmentation ● Proposal-Based ● Recurrent ● Instance Embedding. Panoptic Segmentation.
  • 54. Proposal-based. Typical object detection/segmentation pipelines: object proposal, then refinement and classification. Example detections: Dog 0.85, Cat 0.80, Dog 0.75, Cat 0.90.
  • 55. Proposal-based. Typical object detection/segmentation pipelines: object proposal, then refinement and classification. Example detections: Dog 0.85, Cat 0.80, Dog 0.75, Cat 0.90. NMS: Non-Maximum Suppression.
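The NMS step named on slide 55 can be sketched as a greedy loop: keep the highest-scoring box, discard the remaining boxes that overlap it too much, and repeat. The version below is a minimal class-agnostic illustration. The `[x1, y1, x2, y2]` box format, the 0.5 threshold and the function names are assumptions, and detectors typically run this per class.

```python
import numpy as np

def iou(box, boxes):
    """Intersection over union between one box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept by greedy non-maximum suppression."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Drop every remaining box that overlaps the kept one too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.85, 0.75, 0.90])
kept = nms(boxes, scores)
print(kept)
```

In this example the second box overlaps the first with an IoU of about 0.68, so it is suppressed even if it covers a different object. This is the first limitation listed for proposal-based models.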
  • 56. Proposal-based. Typical object detection/segmentation pipelines: object proposal, then refinement and classification. Example detections: Dog 0.85, Cat 0.80, Dog 0.75, Cat 0.90, each with a binary map.
  • 57. Proposal-based. External segment proposals. Mask out background with mean image. Similar to R-CNN, but with segment proposals. Hariharan et al. Simultaneous Detection and Segmentation. ECCV 2014. Slide credit: CS231n
  • 58. Proposal-based: Detection - Faster R-CNN. Learn proposals end-to-end, sharing parameters with the classification network. Diagram: Conv layers → Conv5_3 → Region Proposal Network (RPN) → RPN proposals → RoI Pooling → FC6 → FC7 → FC8 → class probabilities. Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
  • 59. Proposal-based Instance Segmentation: Mask R-CNN. Faster R-CNN for pixel-level segmentation, as a parallel prediction of masks and class labels. He et al. Mask R-CNN. ICCV 2017
  • 60. Mask R-CNN. Object Detection. Object Detection and Segmentation. He et al. Mask R-CNN. ICCV 2017
  • 61. Mask R-CNN: RoI Align. RoI Pool from Fast R-CNN: hi-res input image (3 x 800 x 600) with region proposal → convolution and pooling → hi-res conv features (C x H x W) with region proposal → max-pool within each grid cell → RoI conv features (C x h x w) for region proposal → fully-connected layers, which expect low-res conv features (C x h x w). Problem: x/16 & rounding → misalignment! + not differentiable. He et al. Mask R-CNN. ICCV 2017
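The misalignment caused by "x/16 & rounding" can be illustrated by comparing a rounded feature-map coordinate with bilinear interpolation at the exact fractional coordinate, which is the sampling idea behind RoI Align. This is only a sketch of the sampling step, not the full Mask R-CNN operator (which samples several points per bin and aggregates them). The function names and the toy feature map are assumptions.

```python
import numpy as np

def quantize_coord(c, stride=16):
    """RoI Pool style: map an image coordinate to a feature cell by rounding."""
    return int(np.floor(c / stride + 0.5))

def bilinear_sample(feat, y, x):
    """Interpolate a 2D feature map at a real-valued (y, x) location."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feat.shape[0] - 1)
    x1 = min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feat[y0, x0] + dx * feat[y0, x1]
    bottom = (1 - dx) * feat[y1, x0] + dx * feat[y1, x1]
    return (1 - dy) * top + dy * bottom

feat = np.arange(16.0).reshape(4, 4)  # toy feature map at stride 16
img_y, img_x = 24, 36                 # a proposal corner in image pixels
pooled = feat[quantize_coord(img_y), quantize_coord(img_x)]
aligned = bilinear_sample(feat, img_y / 16, img_x / 16)
print(pooled, aligned)
```

The rounded lookup reads a neighbouring cell, while the interpolated value follows the true position and varies smoothly with the proposal coordinates.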
  • 63. Limitations of Proposal-based models. 1. Two objects might share the same bounding box: only one will be kept after the NMS step. 2. The choice of NMS threshold is application dependent. 3. The same pixel can be assigned to multiple instances. 4. The number of predictions is limited by the number of proposals.
  • 64. Single-shot Instance Segmentation ● Improving RetinaNet (single-shot object detector) in three ways: ○ Integrating instance mask prediction ○ Making the loss function adaptive and more stable ○ Including hard examples in training. #RetinaMask Fu et al. RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free. arXiv 2019
  • 65. CNN → “Cat”. A. Krizhevsky, I. Sutskever, G. E. Hinton, “ImageNet classification with deep convolutional neural networks”, NIPS 2012
  • 68. Recurrent Instance Segmentation. Sequential mask generation. Romera-Paredes, B., & Torr, P. H. S. Recurrent Instance Segmentation. ECCV 2016
  • 69. Recurrent Instance Segmentation. Salvador, A., Bellver, M., Campos, V., Baradad, M., Marqués, F., Torres, J., & Giro-i-Nieto, X. (2018). From Pixels to Object Sequences: Recurrent Semantic Instance Segmentation.
  • 70. Recurrent Instance Segmentation. Recurrence over time (frame sequence) and space (object sequence). #RVOS Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques and Xavier Giro-i-Nieto. “RVOS: End-to-End Recurrent Network for Video Object Segmentation”, CVPR 2019.
  • 71. Outline: Segmentation Datasets. Segmentation Applications. Semantic Segmentation ● Deconvolution (or transposed convolution) ● Dilated Convolution ● Skip Connections. Instance Segmentation ● Proposal-Based ● Recurrent ● DETR. Panoptic Segmentation.
  • 72. Semantic + Instance = Panoptic Segmentation. #PS Kirillov, A., He, K., Girshick, R., Rother, C., & Dollár, P. (2019). Panoptic segmentation. CVPR 2019.
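One simple way to picture "Semantic + Instance = Panoptic" is a merge that gives every pixel exactly one (class, instance) label. The sketch below is a heuristic illustration, not the method of the cited paper. It assumes instance masks arrive sorted by confidence, that earlier masks win on overlaps, and that labels are encoded as `class_id * 1000 + instance_id` with instance id 0 for "stuff"; all names and the encoding are hypothetical.

```python
import numpy as np

def merge_panoptic(semantic, instance_masks, instance_classes, label_divisor=1000):
    """Combine a semantic map and instance masks into one panoptic label map."""
    # Start from the semantic prediction: every pixel is "stuff" (instance id 0).
    panoptic = semantic.astype(np.int64) * label_divisor
    taken = np.zeros(semantic.shape, dtype=bool)
    pairs = zip(instance_masks, instance_classes)
    for instance_id, (mask, class_id) in enumerate(pairs, start=1):
        free = mask & ~taken  # pixels not yet claimed by a more confident instance
        panoptic[free] = class_id * label_divisor + instance_id
        taken |= free
    return panoptic

semantic = np.array([[0, 0, 1],
                     [0, 1, 1]])
masks = [np.array([[0, 0, 1], [0, 0, 1]], dtype=bool),
         np.array([[0, 0, 0], [0, 1, 1]], dtype=bool)]
panoptic = merge_panoptic(semantic, masks, instance_classes=[1, 1])
print(panoptic)
```

Because overlapping pixels are assigned to a single instance, the merged map avoids the "same pixel assigned to multiple instances" issue noted for proposal-based models.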
  • 73. Panoptic Segmentation: methods ● UPSNet: A Unified Panoptic Segmentation Network (Mask R-CNN design). #UPSNet Xiong, Y., Liao, R., Zhao, H., Hu, R., Bai, M., Yumer, E., & Urtasun, R. (2019). UPSNet: A unified panoptic segmentation network. CVPR 2019.
  • 74. Panoptic Segmentation: methods ● UPSNet: A Unified Panoptic Segmentation Network. Xiong et al. UPSNet: A Unified Panoptic Segmentation Network. CVPR 2019
  • 75. Summary. Semantic Segmentation Methods ● Deconvolution (or transposed convolution) ● Dilated Convolution ● Skip Connections. Instance Segmentation Methods ● Proposal-Based ● Recurrent ● Instance Embedding. Panoptic Segmentation.
  • 76. Latest advances ● Bolya et al. YOLACT: Real-time Instance Segmentation. ICCV 2019 ● #Axial-DeepLab Wang, H., Zhu, Y., Green, B., Adam, H., Yuille, A., & Chen, L. C. (2020). Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation. ECCV 2020. ● #SOLO Wang, X., Kong, T., Shen, C., Jiang, Y., & Li, L. (2020). SOLO: Segmenting objects by locations. ECCV 2020. ● Fast Semantic Segmentation with MobileNet in PyTorch.
  • 77. Segmentation Datasets. Pascal Visual Object Classes: ● 20 categories ● +10,000 images ● Semantic segmentation GT ● Instance segmentation GT. Pascal Context: ● 540 categories ● +10,000 images ● Dense annotations ● Semantic segmentation GT ● Objects + stuff
  • 78. Segmentation Datasets. COCO (Common Objects in Context): ● Real indoor & outdoor scenes ● 80 categories ● +300,000 images ● 2M instances ● Partial annotations ● Semantic segmentation GT ● Instance segmentation GT ● Objects, but no stuff. ADE20K: ● Real general scenes ● +150 categories ● +22,000 images ● Semantic segmentation GT ● Instance + parts segmentation GT ● Objects and stuff
  • 79. Segmentation Datasets. Open Images V6: ● Real general scenes ● 350 categories ● +950,000 images ● 2,700,000 instance segmentations ● Instance segmentation GT ● Objects
  • 80. Segmentation Datasets. LVIS: ● Real general scenes ● 1,000 categories ● 164,000 images ● 2,200,000 instance segmentations ● 11.2 object instances from 3.4 categories on average per image (more complex images than Open Images and MS COCO) ● Instance segmentation GT ● Objects
  • 81. Segmentation Datasets. Cityscapes: ● Real driving scenes ● 30 categories ● +25,000 images ● 20,000 partial annotations ● 5,000 dense annotations ● Semantic segmentation GT ● Instance segmentation GT ● Depth, GPS and other metadata ● Objects and stuff. Mapillary Vistas Dataset: ● Real driving scenes covering 6 continents with a variety of weather/season/time of day/camera/viewpoint ● 152 categories ● 25,000 images ● Semantic segmentation GT ● Instance + parts segmentation GT ● Objects and stuff