SSD: Single Shot MultiBox Detector
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed,
Cheng-Yang Fu, Alexander C. Berg
[arXiv][demo][code] (Mar 2016)
Slides by Míriam Bellver
Computer Vision Reading Group, UPC
28th October, 2016
Outline
▷ Introduction
▷ Related Work
▷ The Single-Shot Detector
▷ Experimental Results
▷ Conclusions
1.
Introduction
SSD: Single Shot MultiBox Detector
Introduction
Object detection
Introduction
Current object detection systems
Pipeline: image pixels → object proposal generation → resample pixels or features for each BBOX → high-quality classifier
Computationally too intensive and too slow for real-time
applications
Faster R-CNN: 7 FPS
Introduction
SSD: First deep network based object detector
that does not resample pixels or features for
bounding box hypotheses and is as accurate as
approaches that do.
Improvement in speed vs accuracy trade-off
Introduction
Contributions:
▷ A single-shot detector for multiple categories that is faster
than the previous state-of-the-art single-shot detector (YOLO) and as
accurate as Faster R-CNN
▷ Predicts category scores and box offsets for a fixed set of
default bounding boxes using small convolutional filters applied to
feature maps
▷ Produces predictions of different scales from feature maps of
different scales, and separates predictions by aspect ratio
▷ End-to-end training and high accuracy, improving the speed vs
accuracy trade-off
2.
Related Work
SSD: Single Shot MultiBox Detector
Object Detection prior to CNNs
Two different traditional approaches:
▷ Sliding Window: e.g. Deformable Part Model (DPM)
▷ Object proposals: e.g. Selective Search
Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008, June). A discriminatively trained, multiscale, deformable part model
Uijlings, J. R., van de Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition
Object detection with CNNs
▷ R-CNN
Girshick, R. (2015). Fast r-cnn
Leveraging the object proposals
bottleneck
▷ SPP-net
▷ Fast R-CNN
He, K., Zhang, X., Ren, S., & Sun, J. (2014, September). Spatial pyramid pooling in deep convolutional networks for visual recognition
Girshick, R. (2015). Fast r-cnn.
Improving the quality of proposals using
CNNs
Instead of object proposals built from low-level features,
proposals are generated directly from a DNN
e.g. MultiBox, Faster R-CNN
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks
Szegedy, C., Reed, S., Erhan, D., & Anguelov, D. (2014). Scalable, high-quality object detection.
Single-shot detectors
Instead of having two networks
(Region Proposal Network + classifier network),
single-shot architectures predict bounding boxes and confidences
for multiple categories directly with a single
network
e.g. OverFeat, YOLO
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2015). You only look once: Unified, real-time object detection
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2013). Overfeat: Integrated recognition, localization and detection using
convolutional networks.
Single-shot detectors
Main differences of SSD from YOLO and OverFeat:
▷ Small conv. filters predict object categories and offsets in bounding box
locations, with separate predictors for different aspect ratios, applied
to several feature maps to perform detection at multiple
scales
3.1
The Single Shot Detector (SSD)
Model
The Single Shot Detector (SSD)
Base network, with fc6 and fc7
converted to
convolutional layers
Multi-scale feature maps for detection: observe how the conv feature maps decrease in size and
allow predictions at multiple scales
Output: a collection of bounding boxes
and scores for each category
The Single Shot Detector (SSD)
Comparison to YOLO
The Single Shot Detector (SSD)
Convolutional predictors for detection: We apply on top of each conv
feature map a set of filters that predict detections for different aspect ratios
and class categories
The Single Shot Detector (SSD)
What is a detection?
▷ A bounding box, described by four parameters (center
x and y, width and height)
▷ A class category
With a score for every category, a detection needs a total of #classes
+ 4 values
The Single Shot Detector (SSD)
Detector for SSD:
Each detector will output a single value, so we need
(classes + 4) detectors for a detection
BUT there are different types of detections!
The Single Shot Detector (SSD)
Different “classes” of detections
aspect ratio 2:1
for cats
aspect ratio 1:2
for cats
aspect ratio 1:1
for cats
The Single Shot Detector (SSD)
Default boxes and aspect ratios
Similar to the anchors of Faster R-CNN, with the difference that SSD
applies them on several feature maps of different resolutions
The Single Shot Detector (SSD)
Detector for SSD:
Each detection needs (classes + 4) detectors; with #default boxes
per location, we need
(classes + 4) x #default boxes detectors
The Single Shot Detector (SSD)
For each feature layer of size m x n with p channels, we apply 3 x 3 x p kernels that
each produce either a score for a category or a shape offset relative to the
default bounding box coordinates
Example for conv6 (p = 1024), using 3 x 3 x 1024 filters:
▷ one filter outputs the shape offset x_o to the default box of aspect ratio 1 for category 1
▷ one filter outputs the score for category 1 for the default box of aspect ratio 1
▷ up to classes+4 filters for each default box
considered at that conv feature map
So, for each conv layer considered, there are
(classes + 4) x default boxes x m x n
outputs
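The output count above can be checked with a few lines of arithmetic. This is a minimal sketch: the feature-map sizes and the number of default boxes per location are assumptions taken from the commonly cited SSD300 configuration, not from these slides, and `NUM_CLASSES = 21` assumes the 20 PASCAL VOC categories plus a background class.

```python
def outputs_per_layer(m, n, num_default_boxes, num_classes):
    """(classes + 4) x default boxes x m x n, as stated on the slide."""
    return (num_classes + 4) * num_default_boxes * m * n


# Assumed configuration: (feature map side, default boxes per location).
ASSUMED_LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
NUM_CLASSES = 21  # assumption: 20 VOC categories + background

# Total number of default boxes over all prediction layers.
total_boxes = sum(side * side * k for side, k in ASSUMED_LAYERS)

# Total number of scalar outputs (scores + offsets) over all layers.
total_outputs = sum(
    outputs_per_layer(side, side, k, NUM_CLASSES) for side, k in ASSUMED_LAYERS
)

print(total_boxes, total_outputs)  # prints: 8732 218300
```

Under these assumptions the network scores several thousand boxes per image, which is why efficient post-processing matters at inference time.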
3.2
The Single Shot Detector (SSD)
Training
The Single Shot Detector (SSD)
SSD requires that ground-truth data
is assigned to specific outputs in the
fixed set of detector outputs
The Single Shot Detector (SSD)
Matching strategy:
For each ground-truth box we select, from all the
default boxes, the ones that fit best in terms of location, aspect
ratio and scale.
▷ We select the default box with the best Jaccard overlap, so
every ground-truth box has at least one match.
▷ Default boxes with a Jaccard overlap higher than 0.5 with any ground-truth box are
also selected
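The two matching rules can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the `(xmin, ymin, xmax, ymax)` box format and the function names `jaccard` and `match` are assumptions, and ties between ground-truth boxes that claim the same default box are resolved by simply letting the later one win.

```python
def jaccard(a, b):
    """Jaccard overlap (IoU) of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(default_boxes, gt_boxes, threshold=0.5):
    """Return {default box index: ground-truth index} for all positive matches."""
    matches = {}
    # Rule 2: a default box is matched to its best ground truth
    # when the overlap is higher than the threshold.
    for d, dbox in enumerate(default_boxes):
        best_g, best_iou = None, threshold
        for g, gbox in enumerate(gt_boxes):
            iou = jaccard(dbox, gbox)
            if iou > best_iou:
                best_g, best_iou = g, iou
        if best_g is not None:
            matches[d] = best_g
    # Rule 1: every ground truth gets its best default box, whatever the overlap.
    # Applied last so it overrides rule 2 (assumes default_boxes is non-empty).
    for g, gbox in enumerate(gt_boxes):
        best_d = max(
            range(len(default_boxes)),
            key=lambda d: jaccard(default_boxes[d], gbox),
        )
        matches[best_d] = g
    return matches


defaults = [(0, 0, 10, 10), (0, 0, 6, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
ground_truth = [(0, 0, 10, 10), (22, 22, 34, 34)]
print(match(defaults, ground_truth))
```

In the usage example the second ground-truth box overlaps no default box by more than 0.5, yet it still receives a match through the first rule.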
The Single Shot Detector (SSD)
Training objective:
Similar to MultiBox but handles multiple categories.
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
▷ L_conf: confidence loss (softmax loss)
▷ L_loc: localization loss (Smooth L1 loss)
▷ N: number of matched default boxes
▷ x: 1 if the default box is matched to a given ground-truth box, and 0 otherwise
▷ l: predicted bounding box parameters
▷ g: ground-truth bounding box parameters
▷ c: class confidences
▷ α: set to 1 by cross-validation
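The objective, a softmax confidence loss plus a Smooth L1 localization loss weighted by α and normalized by the number N of matched default boxes, can be sketched in plain Python. This is a minimal sketch under assumptions: the data layout of `positives` and `negative_scores` is hypothetical, box offsets are assumed to be already encoded relative to their default boxes, and class index 0 is assumed to be background.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def softmax_loss(scores, target):
    """Cross-entropy of a softmax over raw class scores (log-sum-exp form)."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[target]


def ssd_loss(positives, negative_scores, alpha=1.0, background=0):
    """Combine confidence and localization losses, normalized by N.

    positives: list of (class_scores, target_class, predicted_offsets,
               target_offsets), one entry per matched default box.
    negative_scores: list of class_scores for default boxes used as negatives.
    """
    n = len(positives)
    if n == 0:
        return 0.0  # no matched boxes: the loss is defined as 0
    conf = sum(softmax_loss(s, t) for s, t, _, _ in positives)
    conf += sum(softmax_loss(s, background) for s in negative_scores)
    loc = sum(
        smooth_l1(p - g)
        for _, _, pred, target in positives
        for p, g in zip(pred, target)
    )
    return (conf + alpha * loc) / n


example = [([0.0, 0.0], 1, [0.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0])]
print(ssd_loss(example, [[0.0, 0.0]]))
```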
The Single Shot Detector (SSD)
Choosing scales and aspect ratios for default boxes:
▷ Feature maps from different layers are used to handle scale variance
▷ Specific feature map locations learn to be responsive to specific areas
of the image and particular scales of objects
The Single Shot Detector (SSD)
Choosing scales and aspect ratios for default boxes:
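▷ If m feature maps are used for prediction, the scale of the default
boxes for each feature map is:
$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$
with s_min = 0.1 and s_max = 0.95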
The Single Shot Detector (SSD)
Choosing scales and aspect ratios for default boxes:
▷ At each scale, different aspect ratios a_r are considered, giving the
width and height of each default box:
$$w_k^a = s_k \sqrt{a_r}, \quad h_k^a = s_k / \sqrt{a_r}$$
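A short sketch of how default box shapes can be generated. It assumes the linear scale rule and the width/height rule of the SSD paper (scales spaced evenly between `s_min` and `s_max`, width `s * sqrt(a)` and height `s / sqrt(a)`); the aspect-ratio set `(1, 2, 3, 1/2, 1/3)` and the function names are assumptions, since the slide text does not list them.

```python
import math


def default_box_scales(m, s_min=0.1, s_max=0.95):
    """Scales for m feature maps, evenly spaced between s_min and s_max."""
    if m == 1:
        return [s_min]
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_box_shapes(scale, aspect_ratios=(1.0, 2.0, 3.0, 1 / 2, 1 / 3)):
    """(width, height) per aspect ratio, as fractions of the image size."""
    return [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in aspect_ratios]


scales = default_box_scales(6)
for s in scales:
    shapes = ", ".join(f"{w:.2f}x{h:.2f}" for w, h in default_box_shapes(s))
    print(f"scale {s:.2f}: {shapes}")
```

Every shape at a given scale covers the same area (the scale squared), so the aspect ratio changes the box proportions without changing its size.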
The Single Shot Detector (SSD)
Hard negative mining:
Significant imbalance between positive and negative training examples
▷ Sort the negative samples by confidence loss and keep those with the highest loss
▷ The ratio of negative to positive samples is then at most 3:1
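Hard negative mining can be sketched as a sort followed by a cut. This is a minimal sketch assuming, as in the SSD paper, that negatives are ranked by their confidence loss and that at most three negatives are kept per positive; the function name and argument layout are hypothetical.

```python
def mine_hard_negatives(negative_losses, num_positives, neg_pos_ratio=3):
    """Return indices of the negatives to keep, hardest (highest loss) first."""
    keep = min(len(negative_losses), neg_pos_ratio * num_positives)
    ranked = sorted(
        range(len(negative_losses)),
        key=lambda i: negative_losses[i],
        reverse=True,
    )
    return ranked[:keep]


losses = [0.1, 2.0, 0.5, 1.5, 0.05, 0.9, 0.3]
print(mine_hard_negatives(losses, num_positives=1))  # prints: [1, 3, 5]
```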
The Single Shot Detector (SSD)
Data augmentation:
Each training image is randomly sampled using one of the
following options:
▷ Use the original image
▷ Sample a patch with a minimum Jaccard overlap with the
objects
▷ Randomly sample a patch
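The three sampling options can be sketched as follows. This is an illustrative sketch only: the candidate overlap values, the patch size range (0.1 to 1 of the image side) and the aspect-ratio limits (1/2 to 2) are assumptions borrowed from the SSD paper rather than from these slides, boxes use normalized `(xmin, ymin, xmax, ymax)` coordinates, and all function names are hypothetical.

```python
import random

ASSUMED_OVERLAPS = (0.1, 0.3, 0.5, 0.7, 0.9)  # assumption, not from the slides
FULL_IMAGE = (0.0, 0.0, 1.0, 1.0)


def jaccard(a, b):
    """Jaccard overlap (IoU) of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_patch(objects, min_overlap, rng, max_trials=50):
    """Sample a patch; if min_overlap is set, require that overlap with an object.

    Returns None when no valid patch is found within max_trials.
    """
    for _ in range(max_trials):
        w = rng.uniform(0.1, 1.0)
        h = rng.uniform(0.1, 1.0)
        if not 0.5 <= w / h <= 2.0:
            continue  # reject extreme aspect ratios
        x = rng.uniform(0.0, 1.0 - w)
        y = rng.uniform(0.0, 1.0 - h)
        patch = (x, y, x + w, y + h)
        if min_overlap is None or any(
            jaccard(patch, obj) >= min_overlap for obj in objects
        ):
            return patch
    return None


def augment(objects, rng):
    """Pick one of the three options and return the crop region to use."""
    option = rng.choice(["original", "min_overlap", "random"])
    if option == "original":
        return FULL_IMAGE
    min_overlap = rng.choice(ASSUMED_OVERLAPS) if option == "min_overlap" else None
    # Fall back to the whole image if sampling fails.
    return sample_patch(objects, min_overlap, rng) or FULL_IMAGE


rng = random.Random(0)
print(augment([(0.2, 0.2, 0.8, 0.8)], rng))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which makes the augmentation easier to debug.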
4.
Experimental Results
SSD: Single Shot MultiBox Detector
Experimental Results
Base network:
▷ VGG16 (fc6 and fc7 converted to conv layers, pool5 changed from 2x2 to
3x3 using the atrous algorithm, fc8 and dropout removed)
▷ It is fine-tuned using SGD
▷ Training and testing code is built on Caffe
Database:
▷ Training: VOC2007 trainval and VOC2012 trainval (16551 images)
▷ Testing: VOC2007 test (4952 images)
Experimental Results
Mean Average Precision for PASCAL ‘07
Experimental Results
Model analysis
Experimental Results
Visualization for performance
Experimental Results
Sensitivity to different object characteristics with PASCAL
2007:
Experimental Results
Database:
▷ Training: VOC2007 trainval and test and VOC2012
trainval (21503 images)
▷ Testing: VOC2012 test (10991 images)
Mean Average Precision for PASCAL ‘12
Experimental Results
MS COCO Database:
▷ A total of 300k images
Test-dev results:
Experimental Results
Inference time :
Non-maximum suppression has to be efficient because of the
large number of boxes generated
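A greedy non-maximum suppression step can be sketched in plain Python. This is a minimal, unoptimized sketch for a single class: the overlap threshold of 0.45 and the cap of 200 kept detections are assumptions borrowed from the SSD paper, and the function names are hypothetical. A production version would be vectorized or run on the GPU.

```python
def jaccard(a, b):
    """Jaccard overlap (IoU) of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45, top_k=200):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(jaccard(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if len(keep) == top_k:
                break
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # prints: [0, 2]
```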
Visualizations
5.
Conclusions
SSD: Single Shot MultiBox Detector
Conclusions
▷ Single-shot object detector for multiple categories
▷ One key feature is to use multiple convolutional maps
to deal with different scales
▷ The more default bounding boxes, the better the results
obtained
▷ Comparable accuracy to state-of-the-art object
detectors, but much faster
▷ Future direction: use RNNs to detect and track objects
in video
Thank you for your
attention! Questions?

 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 

SSD: Fast and Accurate Object Detection in a Single Shot

  • 1. SSD: Single Shot MultiBox Detector Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C.Berg [arXiv][demo][code] (Mar 2016) Slides by Míriam Bellver Computer Vision Reading Group, UPC 28th October, 2016
  • 2. Outline ▷ Introduction ▷ Related Work ▷ The Single-Shot Detector ▷ Experimental Results ▷ Conclusions
  • 5. Introduction Current object detection systems resample pixels for each BBOX resample features for each BBOX high quality classifier Object proposals generation Image pixels
  • 6. Introduction Current object detection systems Computationally too intensive and too slow for real-time applications Faster R-CNN 7 FPS resample pixels for each BBOX resample features for each BBOX high quality classifier Object proposals generation Image pixels
  • 8. Introduction SSD: First deep network based object detector that does not resample pixels or features for bounding box hypotheses and is as accurate as approaches that do. Improvement in speed vs accuracy trade-off
  • 10. Introduction Contributions: ▷ A single-shot detector for multiple categories that is faster than state of the art single shot detectors (YOLO) and as accurate as Faster R-CNN ▷ Predicts category scores and boxes offset for a fixed set of default BBs using small convolutional filters applied to feature maps ▷ Predictions of different scales from feature maps of different scales, and separate predictions by aspect ratio ▷ End-to-end training and high accuracy, improving speed vs accuracy trade-off
  • 11. 2. Related Work SSD: Single Shot MultiBox Detector
  • 12. Object Detection prior to CNNs Two different traditional approaches: ▷ Sliding Window: e.g. Deformable Part Model (DPM) ▷ Object proposals: e.g. Selective Search Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008, June). A discriminatively trained, multiscale, deformable part model Uijlings, J. R., van de Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition
  • 13. Object detection with CNNs ▷ R-CNN Girshick, R. (2015). Fast R-CNN
  • 14. Reducing the object proposals bottleneck ▷ SPP-net ▷ Fast R-CNN He, K., Zhang, X., Ren, S., & Sun, J. (2014, September). Spatial pyramid pooling in deep convolutional networks for visual recognition. Girshick, R. (2015). Fast R-CNN.
  • 15. Improving quality of proposals using CNNs: from object proposals based on low-level features to proposals generated directly from a DNN, e.g. MultiBox, Faster R-CNN. Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Szegedy, C., Reed, S., Erhan, D., & Anguelov, D. (2014). Scalable, high-quality object detection.
  • 16. Single-shot detectors: Instead of having two networks (Region Proposals Network + Classifier Network), in single-shot architectures bounding boxes and confidences for multiple categories are predicted directly with a single network, e.g. Overfeat, YOLO. Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2015). You only look once: Unified, real-time object detection. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2013). Overfeat: Integrated recognition, localization and detection using convolutional networks.
  • 17. Single-shot detectors Main differences of SSD over YOLO and Overfeat: ▷ Small conv. filters to predict object categories and offsets in BBs locations, using separate predictors for different aspect ratios, and applying them on different feature maps to perform detection on multiple scales
  • 18. 3.1 The Single Shot Detector (SSD) Model
  • 19. The Single Shot Detector (SSD) base network fc6 and fc7 converted to convolutional collection of bounding boxes and scores for each category
  • 20. The Single Shot Detector (SSD) base network fc6 and fc7 converted to convolutional Multi-scale feature maps for detection: observe how conv feature maps decrease in size and allow predictions at multiple scales collection of bounding boxes and scores for each category
  • 21. The Single Shot Detector (SSD) Comparison to YOLO
  • 22. The Single Shot Detector (SSD) Convolutional predictors for detection: We apply on top of each conv feature map a set of filters that predict detections for different aspect ratios and class categories
  • 23. The Single Shot Detector (SSD) What is a detection? It is described by four parameters (x and y of the bounding box center, width and height) plus a class category. Covering all categories, a detection needs a total of #classes + 4 values
  • 24. The Single Shot Detector (SSD) Detector for SSD: Each detector will output a single value, so we need (classes + 4) detectors for a detection
  • 25. The Single Shot Detector (SSD) Detector for SSD: Each detector will output a single value, so we need (classes + 4) detectors for a detection BUT there are different types of detections!
  • 26. The Single Shot Detector (SSD) Different “classes” of detections aspect ratio 2:1 for cats aspect ratio 1:2 for cats aspect ratio 1:1 for cats
  • 27. The Single Shot Detector (SSD) Default boxes and aspect ratios Similar to the anchors of Faster R-CNN, with the difference that SSD applies them on several feature maps of different resolutions
  • 28. The Single Shot Detector (SSD) Detector for SSD: Each detector will output a single value, so we need (classes + 4) detectors for a detection. As we have #default boxes, we need (classes + 4) x #default boxes detectors
  • 29. The Single Shot Detector (SSD) Convolutional predictors for detection: We apply on top of each conv feature map a set of filters that predict detections for different aspect ratios and class categories
  • 30. The Single Shot Detector (SSD) For each feature layer of m x n with p channels we apply kernels of 3 x 3 x p to produce either a score for a category, or a shape offset relative to the default bounding box coordinates. Example for conv6 (kernels of 3 x 3 x 1024): one filter produces the shape offset xo to the default box of aspect ratio 1, another produces the score for category 1 for the default box of aspect ratio 1, and so on, up to classes + 4 filters for each default box considered at that conv feature map
  • 31. The Single Shot Detector (SSD) For each feature layer of m x n with p channels we apply kernels of 3 x 3 x p to produce either a score for a category, or a shape offset relative to a default bounding box coordinates So, for each conv layer considered, there are (classes + 4) x default boxes x m x n outputs
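The output count on slide 31 can be checked with a few lines of Python. This is only an arithmetic sketch: the function name `num_outputs` and the example sizes (a 19 x 19 feature map, 21 classes, 6 default boxes per location) are illustrative assumptions, not values taken from the slides.

```python
def num_outputs(m, n, num_classes, boxes_per_location):
    """Outputs of one prediction layer on an m x n feature map.

    Each default box needs (classes + 4) values: one score per class
    plus four box offsets, as stated on the slide.
    """
    filters = (num_classes + 4) * boxes_per_location  # 3 x 3 x p kernels needed
    return filters * m * n


# Hypothetical sizes chosen only for illustration.
print(num_outputs(19, 19, 21, 6))  # 54150
```

The number of 3 x 3 x p kernels is independent of m and n; only the number of outputs grows with the feature map size.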
  • 32. 3.2 The Single Shot Detector (SSD) Training
  • 33. The Single Shot Detector (SSD) SSD requires that ground-truth data is assigned to specific outputs in the fixed set of detector outputs
  • 34. The Single Shot Detector (SSD) Matching strategy: For each ground truth box we have to select from all the default boxes the ones that best fit in terms of location, aspect ratio and scale. ▷ We select the default box with the best Jaccard overlap, so every ground truth box has at least one correspondence. ▷ Default boxes with a Jaccard overlap higher than 0.5 are also selected
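The matching strategy on slide 34 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' Caffe implementation: the box format `(xmin, ymin, xmax, ymax)` and the function names `jaccard` and `match` are assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard overlap (intersection over union) of two boxes
    given as (xmin, ymin, xmax, ymax). The format is an assumption."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(gt_boxes, default_boxes, threshold=0.5):
    """Return a dict mapping default-box index -> ground-truth index."""
    if not gt_boxes or not default_boxes:
        return {}
    matches = {}
    # Default boxes whose best overlap exceeds the threshold are selected.
    for d, db in enumerate(default_boxes):
        overlap, g = max((jaccard(gt, db), g) for g, gt in enumerate(gt_boxes))
        if overlap > threshold:
            matches[d] = g
    # Each ground truth box also gets its single best default box,
    # so every ground truth box has at least one correspondence.
    for g, gt in enumerate(gt_boxes):
        best = max(range(len(default_boxes)),
                   key=lambda d: jaccard(gt, default_boxes[d]))
        matches[best] = g
    return matches


gt = [(0.0, 0.0, 0.4, 0.4), (0.8, 0.8, 1.0, 1.0)]
defaults = [(0.0, 0.0, 0.5, 0.5), (0.05, 0.0, 0.45, 0.4),
            (0.6, 0.6, 1.0, 1.0), (0.0, 0.0, 0.2, 0.2)]
print(match(gt, defaults))
```

In this example the second ground truth box overlaps no default box by more than 0.5, yet it is still assigned its best default box (index 2) by the first rule.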
  • 35. The Single Shot Detector (SSD) Training objective: Similar to MultiBox but handles multiple categories. $L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$, where the confidence loss $L_{conf}$ is a softmax loss and the localization loss $L_{loc}$ is a Smooth L1 loss. N: number of matched default BBs; x: is 1 if the default box is matched to a determined ground truth box, and 0 otherwise; l: predicted BB parameters; g: ground truth BB parameters; c: class confidences; $\alpha$ is set to 1 by cross-validation
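A small sketch of the training objective may help make the terms concrete. It assumes the weighted sum from the SSD paper, L = (L_conf + alpha * L_loc) / N, with a softmax confidence loss and a Smooth L1 localization loss. The function names, the convention that label 0 means background, and the assumption that the caller passes only the boxes selected for training are all illustrative choices.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def softmax_loss(scores, target):
    """Negative log-softmax of the target class (numerically stable)."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[target]


def ssd_loss(conf_scores, labels, loc_preds, loc_targets, alpha=1.0):
    """Loss over the default boxes selected for training.

    labels[i] == 0 marks a negative (background) box; only positive
    boxes contribute to the localization term.
    """
    n = sum(1 for y in labels if y > 0)  # N: number of matched default boxes
    if n == 0:
        return 0.0  # the paper sets the loss to 0 when nothing is matched
    l_conf = sum(softmax_loss(s, y) for s, y in zip(conf_scores, labels))
    l_loc = sum(smooth_l1(p - t)
                for y, pred, tgt in zip(labels, loc_preds, loc_targets) if y > 0
                for p, t in zip(pred, tgt))
    return (l_conf + alpha * l_loc) / n


# One positive box with uniform class scores and a single offset error of 0.5.
loss = ssd_loss([[0.0, 0.0]], [1], [[0.5, 0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0, 0.0]])
print(loss)  # log(2) + 0.125
```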
  • 36. The Single Shot Detector (SSD) Choosing scales and aspect ratios for default boxes: ▷ Feature maps from different layers are used to handle scale variance ▷ Specific feature map locations learn to be responsive to specific areas of the image and particular scales of objects
  • 37. The Single Shot Detector (SSD) Choosing scales and aspect ratios for default boxes: ▷ If m feature maps are used for prediction, the scale of the default boxes for each feature map is: $s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1)$, $k \in [1, m]$, with $s_{min} = 0.1$ and $s_{max} = 0.95$
  • 38. The Single Shot Detector (SSD) Choosing scales and aspect ratios for default boxes: ▷ At each scale, different aspect ratios $a_r$ are considered. Width and height of the default bbox: $w_k^a = s_k \sqrt{a_r}$, $h_k^a = s_k / \sqrt{a_r}$
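The scale and aspect-ratio rules can be tried out numerically. The sketch below assumes the linear rule from the SSD paper, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1), reads the slide's 0.1 and 0.95 as s_min and s_max, and uses w = s_k * sqrt(a_r), h = s_k / sqrt(a_r) for the box shape. The function names and the example aspect ratios are illustrative.

```python
import math


def scales(m, s_min=0.1, s_max=0.95):
    """Default-box scale for each of the m prediction feature maps."""
    if m == 1:
        return [s_min]
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def box_shapes(s_k, aspect_ratios=(1.0, 2.0, 0.5)):
    """(width, height) of the default boxes at scale s_k.

    Every box keeps the same area s_k ** 2; only the shape changes.
    """
    return [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in aspect_ratios]


for s in scales(6):
    print(round(s, 2), [(round(w, 3), round(h, 3)) for w, h in box_shapes(s)])
```

Sizes are relative to the input image, so the same shapes apply regardless of the feature map resolution.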
  • 39. The Single Shot Detector (SSD) Hard negative mining: Significant imbalance between positive and negative training examples ▷ Use the negative samples with the highest confidence loss ▷ Keep the ratio of negative to positive samples at most 3:1
  • 40. The Single Shot Detector (SSD) Data augmentation: Each training sample is randomly sampled by one of the following options: ▷ Use the original image ▷ Sample a patch with a minimum Jaccard overlap with objects ▷ Randomly sample a patch
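Hard negative mining can be sketched as a simple sort-and-truncate step. The example assumes the rule described in the SSD paper: negatives are ranked by confidence loss and the top ones are kept so that there are at most three negatives per positive. The function name `hard_negatives` and the sample values are hypothetical.

```python
def hard_negatives(conf_losses, is_positive, neg_pos_ratio=3):
    """Indices of the negative default boxes kept for training.

    conf_losses[i] is the confidence loss of default box i and
    is_positive[i] says whether it was matched to a ground truth box.
    """
    num_pos = sum(1 for p in is_positive if p)
    negatives = [i for i, p in enumerate(is_positive) if not p]
    # Hardest negatives first: highest confidence loss.
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return sorted(negatives[:neg_pos_ratio * num_pos])


losses = [0.1, 2.0, 0.3, 1.5, 0.05, 0.9, 0.2]
positive = [True, False, False, False, False, False, False]
print(hard_negatives(losses, positive))  # [1, 3, 5]
```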
  • 41. 4. Experimental Results SSD: Single Shot MultiBox Detector
  • 42. Experimental Results Base network: ▷ VGG16 (with fc6 and fc7 converted to conv layers and pool5 from 2x2 to 3x3 using atrous algorithm, removed fc8 and dropout) ▷ It is fine-tuned using SGD ▷ Training and testing code is built on Caffe Database: ▷ Training: VOC2007 trainval and VOC2012 trainval (16551 images) ▷ Testing: VOC2007 test (4952 images)
  • 43. Experimental Results Mean Average Precision for PASCAL ‘07
  • 46. Experimental Results Sensitivity to different object characteristics with PASCAL 2007 :
  • 47. Experimental Results Database: ▷ Training: VOC2007 trainval and test and VOC2012 trainval (21503 images) ▷ Testing: VOC2012 test (10991 images) Mean Average Precision for PASCAL ‘12
  • 48. Experimental Results MS COCO Database: ▷ A total of 300k images Test-dev results:
  • 49. Experimental Results Inference time : Non-maximum suppression has to be efficient because of the large number of boxes generated
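Greedy non-maximum suppression, mentioned on slide 49, can be sketched as below for the boxes of a single class. The box format `(xmin, ymin, xmax, ymax)`, the function names, and the default values `iou_threshold=0.45` and `top_k=200` are assumptions made for illustration; this quadratic pure-Python version is not meant to be the efficient implementation the slide calls for.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45, top_k=200):
    """Indices of the boxes kept, in order of decreasing score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if len(keep) == top_k:
                break
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

Filtering out low-confidence boxes before this step reduces the number of pairwise overlap computations, which matters given how many default boxes SSD evaluates.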
  • 52. 5. Conclusions SSD: Single Shot MultiBox Detector
  • 53. Conclusions ▷ Single-shot object detector for multiple categories ▷ One key feature is the use of multiple convolutional feature maps to deal with different scales ▷ The more default bounding boxes, the better the results ▷ Comparable accuracy to state-of-the-art object detectors, but much faster ▷ Future direction: use RNNs to detect and track objects in video
  • 54. Thank you for your attention! Questions?