auro@shatterline.com 1
How CNNs Localize Objects with Increasing Precision and Speed
Auro Tripathy
May 2017
[Figure: cartoon with callouts “How do I fine-tune the bounding box?” and “What Class?”]

Outline
•  Terms, concepts, and metrics for detection algorithms
•  Two-stage detectors
    •  Region-based Convolutional Neural Networks (R-CNN)
    •  Fast R-CNN
    •  Faster R-CNN
•  Unified (single-shot) detectors
    •  You Only Look Once (YOLO)
    •  Single-Shot Detector (SSD)

What is to Classification as Where is to Detection
“We’re in the midst of an Object Detection Renaissance”
– Ross Girshick
What? (Classification)
✓  Person, Probability=0.7
✓  Dog, Probability=0.8
✓  Horse, Probability=0.8
What & Where? (Detection)
✓  Person, Location=(x1, y1, w1, h1), Confidence=90%
✓  Dog, Location=(x2, y2, w2, h2), Confidence=80%
✓  Horse, Location=(x3, y3, w3, h3), Confidence=90%

CNN-Based Detection Performance at a Glance
Two-Stage Techniques versus Single-Shot Techniques
[Figure: scatter plot of mean Average Precision (mAP) on VOC against frames per second (fps); each point is (fps, mAP)]
Method                    fps     mAP (VOC)
Deformable Parts Model    0.5     34.3
R-CNN                     0.02    58.5
Fast R-CNN                0.4     70
Faster R-CNN              7       73.2
YOLO                      21      63.2
SSD300X300                58      77
SSD512X512                19      80

What’s Mean Average Precision (mAP)?
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
where TP, FP, and FN count true positives, false positives, and false negatives
High precision relates to low false-positives
High recall relates to low false-negatives
1. Compute the Average Precision (AP) of each class in your test set
2. Then take the mean of these per-class average precisions to get mean Average Precision (mAP)
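
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the official VOC evaluation code: it assumes detections are already sorted by descending confidence and already matched to ground truth, and it uses a plain (uninterpolated) area under the precision-recall curve. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision(is_true_positive: List[bool], num_ground_truth: int) -> float:
    """AP for one class as the area under its precision-recall curve.

    Assumption: `is_true_positive` lists the detections of this class in
    descending confidence order, already matched against ground truth.
    """
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_ground_truth
        # Rectangle rule: precision times the recall gained at this detection.
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """mAP is the mean of the per-class average precisions."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

Benchmarks such as VOC apply their own interpolation rules on top of this idea, so numbers from this sketch will not match published mAP exactly.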

Region-based CNN (R-CNN) Kick-started Detection
[Figure: (fps, mAP) chart: mAP on VOC versus frames per second for the Deformable Parts Model, R-CNN, Fast R-CNN, Faster R-CNN, YOLO, SSD300X300, and SSD512X512]

Region-Based CNN (R-CNN)
[Figure: R-CNN pipeline]
Image → Region Proposal Generator (2000 Regions) → CNN Feature Extractor Per Region → CNN Output: Feature Vector → Linear SVM Classifier for Region (Airplane: No, …, Dog: Yes, …, TV Monitor: No) and Bounding Box Regressor
Source: Rich feature hierarchies for accurate object detection and semantic segmentation, Tech report (v5)

Using CNNs Broke New Ground
The Downside: High Workloads for Train/Test
•  Training is a three-stage disjoint pipeline
    1.  Fine-tune a CNN on region proposals using log loss
    2.  Fit SVMs (acting as object detectors) to the CNN features, replacing softmax
    3.  Learn to regress bounding boxes with squared (L2) loss
•  Region proposals come from an external algorithm
•  No computation is shared between the 2000 region proposals
•  Volume of data mandates storing intermediate features on disk
http://videolectures.net/iccv2015_girshick_fast_r_cnn/

What’s Bounding-Box Regression?
Learn Transformation W that Maps Proposal P to Ground Truth G
[Figure: proposal box P, ground-truth box G, and the transformation d(P) between them, with center (x, y), width w, and height h marked]
$d_\star(P) = W_\star^{T} \phi_5(P)$,
where $\star$ is one of x, y, w, h and $\phi_5(P)$ are the Pool5 features of proposal P
Transformation d(P) is parameterized into four functions:
dx(P), dy(P), dw(P), dh(P)
dx(P), dy(P) are linear translations of the center of P’s bounding box, scaled by P’s width and height
dw(P), dh(P) are log-space translations of the width & height of P
We learn W by minimizing a standard least squares problem with Ridge Regression regularization
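
To make the parameterization concrete, here is a small sketch of how the four functions move a proposal toward the ground truth, and how the regression targets are formed. Boxes are assumed to be (center_x, center_y, width, height) tuples; the function names are illustrative and not taken from the R-CNN code.

```python
import math
from typing import Tuple

# Assumed box layout: (center_x, center_y, width, height).
Box = Tuple[float, float, float, float]


def regression_targets(proposal: Box, ground_truth: Box) -> Box:
    """Targets (tx, ty, tw, th) that the regressor is trained to predict."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    tx = (gx - px) / pw        # center shift in units of proposal width
    ty = (gy - py) / ph        # center shift in units of proposal height
    tw = math.log(gw / pw)     # log-space width change
    th = math.log(gh / ph)     # log-space height change
    return tx, ty, tw, th


def apply_deltas(proposal: Box, deltas: Box) -> Box:
    """Apply predicted (dx, dy, dw, dh) to a proposal to get the refined box."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (pw * dx + px, ph * dy + py, pw * math.exp(dw), ph * math.exp(dh))
```

Applying the exact targets to the proposal recovers the ground-truth box, which is a handy sanity check when wiring up a regressor.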

Learn to Only Regress Proposals that are “Nearby” to Ground Truth with Intersection over Union
[Figure: three example box pairs, labeled IoU Threshold = 0.9, IoU Threshold = 0.7, and IoU Threshold = 0.6]
A proposal is used only if the Intersection over Union (IoU) between its box and the ground truth box is greater than a threshold
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_localization_and_detection.html
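
A minimal sketch of the IoU test, assuming boxes are given as (x1, y1, x2, y2) corner coordinates. The helper `is_nearby` and its default threshold of 0.6 are illustrative choices (0.6 is one of the thresholds shown on the slide), not a prescribed setting.

```python
from typing import Tuple

# Assumed box layout: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Corners = Tuple[float, float, float, float]


def iou(box_a: Corners, box_b: Corners) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def is_nearby(proposal: Corners, ground_truth: Corners, threshold: float = 0.6) -> bool:
    """Keep a proposal for regression only if it overlaps the ground truth enough."""
    return iou(proposal, ground_truth) > threshold
```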

Fast R-CNN Improved Detection w/Single-Stage Training
[Figure: (fps, mAP) chart: mAP on VOC versus frames per second for the Deformable Parts Model, R-CNN, Fast R-CNN, Faster R-CNN, YOLO, SSD300X300, and SSD512X512]

Fast R-CNN
•  Runs the CNN over the entire image instead of over each region proposal
•  Shares convolution layers across proposals
•  Continues to use external region proposals
•  Projects region proposals onto the Conv5 feature map of VGG16
•  Simultaneously predicts
    •  Classes and
    •  Bounding Boxes via joint training
Network is designed with a classification “head” and a regression “head”
[Figure: cartoon with callouts “How do I fine-tune the bounding box?” and “What Class?”]
https://clipartfest.com/download/fb2cd25bdefb07cc8eb8cd28091ab62ea3519461.html

Fast R-CNN
[Figure: Fast R-CNN architecture]
Conv1 … Conv5 → RoI Projection (for each Region, from the Region Proposal Generator, 2000 Regions) → RoI Pooling Layer → Fully Connected (FC6 + FC7) → 1024 RoI Feature Vector → FC → Class Probability and Bounding Box Prediction

Fast R-CNN: Forward and Back-Prop Paths Using Multi-Task Loss
[Figure: Fast R-CNN network with the forward path and the back-prop path marked]
Conv1 … Conv5 → RoI Projection (for each Region, from the Region Proposal Generator, 2000 Regions) → FC-compatible RoI Pooling Layer → FC → two heads: Linear + Softmax (Class Probability) and Linear (Bounding Box Regressor)
Loss: Log Loss + Smooth L1 Loss
https://andrewliao11.github.io/object_detection/faster_rcnn/
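
Multi-Task Loss = Log Loss + Smooth L1 Loss
$$L_{\text{multi-task}} = L_{\text{classification}} + \lambda \, L_{\text{bounding-box regression}}$$
$$L_{\text{multi-task}} = -\log p_u + \lambda \sum_{i \in \{x, y, w, h\}} \text{Smooth-L1}(t_i - v_i)$$
where $p_u$ is the predicted probability of the true class $u$, $t$ are the predicted offsets, and $v$ is the ground-truth regression target
$$\text{Smooth-L1}(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$
Smooth-L1 Loss is less sensitive to outliers than L2 Loss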
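
The loss on this slide is small enough to write out directly. The sketch below evaluates it for a single RoI; the function names and the default λ = 1 are assumptions for illustration, and a real trainer would compute this over batches with an autodiff framework.

```python
import math
from typing import Sequence


def smooth_l1(x: float) -> float:
    """Quadratic near zero, linear for |x| >= 1, so outliers grow only linearly."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multi_task_loss(
    class_probs: Sequence[float],
    true_class: int,
    predicted_offsets: Sequence[float],
    target_offsets: Sequence[float],
    lam: float = 1.0,  # assumed weighting between the two terms
) -> float:
    """Log loss for the true class plus lambda times the summed Smooth-L1 box loss."""
    classification = -math.log(class_probs[true_class])
    regression = sum(smooth_l1(t - v) for t, v in zip(predicted_offsets, target_offsets))
    return classification + lam * regression
```

Compare smooth_l1(4.0) = 3.5 with the L2-style penalty 0.5 * 4.0**2 = 8.0: a badly-off box contributes far less gradient, which is the outlier robustness the slide refers to.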

Introduce Region-of-Interest (RoI) Pooling Layer
For Compatibility with the Fully-Connected Layer Above
•  RoI is a rectangular window into the feature map, defined by (r, c, h, w): top-left corner (r, c), height h, width w
•  The window is divided into an H x W grid of sub-windows (e.g., 7 x 7)
•  Each sub-window is approximately h/H x w/W
•  Max-pool the values in each sub-window into the corresponding output grid cell
•  Back-propagation routes derivatives through the RoI layer
[Figure: RoI of size h x w at (r, c), divided into sub-windows of size h/H x w/W]
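
A pure-Python sketch of RoI max pooling on a single-channel feature map, following the bullets above. The floor/ceil choice for sub-window boundaries is one common convention and is an assumption here; it lets neighbouring sub-windows overlap by a row or column when h or w is not divisible by H or W.

```python
from typing import List


def roi_max_pool(
    feature_map: List[List[float]],
    r: int, c: int, h: int, w: int,
    H: int, W: int,
) -> List[List[float]]:
    """Max-pool the RoI with top-left (r, c) and size h x w into an H x W grid."""
    pooled = []
    for i in range(H):
        # Rows covered by output row i: floor(i*h/H) up to ceil((i+1)*h/H).
        row_start = r + (i * h) // H
        row_end = r + -(-((i + 1) * h) // H)
        pooled_row = []
        for j in range(W):
            col_start = c + (j * w) // W
            col_end = c + -(-((j + 1) * w) // W)
            window_max = max(
                feature_map[y][x]
                for y in range(row_start, row_end)
                for x in range(col_start, col_end)
            )
            pooled_row.append(window_max)
        pooled.append(pooled_row)
    return pooled
```

Whatever the RoI size, the output is always H x W, which is what makes it compatible with the fixed-size fully-connected layers that follow.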

Benefits of Fast R-CNN over R-CNN
•  Higher mAP than R-CNN
•  Training is single-stage, using a multi-task loss
•  Training can update all network layers
•  No disk storage is required for feature caching

Faster R-CNN Subsumes Region Proposals
[Figure: (fps, mAP) chart: mAP on VOC versus frames per second for the Deformable Parts Model, R-CNN, Fast R-CNN, Faster R-CNN, YOLO, SSD300X300, and SSD512X512]

Faster R-CNN
•  Replace the use of external object proposals with a Region Proposal Network (RPN)
•  RPN reuses CNNs for object proposals!
    •  RPN shares convolutions with the detection side of the network
    •  Big benefit: the marginal cost of computing proposals becomes small
•  Reuse previously covered Fast R-CNN for detection!
•  Training regime alternates between
    •  First, fine-tuning for the region proposal task
    •  Then, fine-tuning for object detection, keeping the proposals fixed

Novel “Anchor” Boxes Serve as References at Multiple Scales and Aspect Ratios
[Figure: three ways to handle multiple scales]
•  Scaled images → feature maps: pyramids of feature maps are built & the classifier is run at all scales
•  Multiple filters → feature map: pyramids of filters of multiple scales and sizes are run on the feature map
•  ✓ New: pyramids of reference boxes in the regression functions; anchors = references at multiple scales and aspect ratios

Region Proposal Network
Training Classifies Objectness & Regresses Bounding Boxes
[Figure: a sliding window over the Conv5 feature map (Conv1 … Conv5) produces a 256-dimension vector at each location, which feeds an “Objectness” Score branch and a Bounding Box Regression branch; anchors are drawn at Scale 1, Scale 2, Scale 3 with aspect ratios 1:1, 2:1, 1:2]
•  k = 9 “anchor” boxes at each location, to address
    •  Three scales (128, 256, 512)
    •  Three aspect ratios (2:1, 1:1, 1:2)
•  “Objectness” Score: 2 * k = 18 Class Scores (object or background)
•  Bounding Box Regression: 4 * k = 36 Box Proposal values (x, y, w, h)
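
The k = 9 anchors at one sliding-window location can be enumerated as below. This is a sketch under two assumptions that the slide does not state: an aspect ratio a:b is read as width:height, and each anchor keeps an area of scale² regardless of its ratio. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Assumed anchor layout: (center_x, center_y, width, height).
Anchor = Tuple[float, float, float, float]


def make_anchors(
    center_x: float,
    center_y: float,
    scales: Sequence[float] = (128.0, 256.0, 512.0),
    aspect_ratios: Sequence[float] = (2.0, 1.0, 0.5),  # width / height
) -> List[Anchor]:
    """Return one anchor per (scale, aspect ratio) pair, centered at the location."""
    anchors = []
    for scale in scales:
        for ratio in aspect_ratios:
            width = scale * math.sqrt(ratio)    # width * height == scale ** 2
            height = scale / math.sqrt(ratio)   # width / height == ratio
            anchors.append((center_x, center_y, width, height))
    return anchors


def rpn_outputs_per_location(k: int) -> Tuple[int, int]:
    """2 objectness scores and 4 box coordinates per anchor."""
    return 2 * k, 4 * k
```

With the defaults, `len(make_anchors(0, 0))` is 9 and `rpn_outputs_per_location(9)` gives the 18 scores and 36 coordinates listed on the slide.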

Step 1: Train RPN Initialized w/ImageNet Weights to Output Region Proposals
[Figure: network diagram (Conv1 … Conv5, RPN Layers, FC, Linear + Softmax and Linear Bounding Box Regressor heads) with the RPN path producing RPN Proposals]
•  RPN is fine-tuned end-to-end, starting from ImageNet weights
https://andrewliao11.github.io/object_detection/faster_rcnn/

Step 2: Train Fast R-CNN with Learnt Region Proposals
[Figure: network diagram (Conv1 … Conv5, RPN Layers, FC, Linear + Softmax and Linear Bounding Box Regressor heads) with the detection path producing Object Class Probabilities]
•  Uses the RPN Proposals learned in Step 1
•  Fast R-CNN is fine-tuned end-to-end, starting from ImageNet weights

Step 3: Initialize RPN from Model Trained in Step 2 & Train RPN Again
[Figure: network diagram (Conv1 … Conv5, RPN Layers, FC, Linear + Softmax and Linear Bounding Box Regressor heads) with the RPN path producing RPN Proposals]
•  Share the weights from Step 2 but lock them (prevent updates)

Step 4: Fine-Tune FC Layers of Fast R-CNN Using the Shared Convolution Weights from Step 3
[Figure: network diagram (Conv1 … Conv5, RPN Layers, FC, Linear + Softmax and Linear Bounding Box Regressor heads) with the detection path producing Object Class Probabilities]
•  Uses the RPN Proposals learned in Step 3
•  Share the weights from Step 3 but prevent updates
•  Fine-tune the unique layers of Fast R-CNN

You Only Look Once (YOLO) Uses One Network, Runs Fast
[Figure: (fps, mAP) chart: mAP on VOC versus frames per second for the Deformable Parts Model, R-CNN, Fast R-CNN, Faster R-CNN, YOLO, SSD300X300, and SSD512X512]

You-Only-Look-Once (YOLO)
Do Away with Dual Networks (RPN + Classifier), Use a Single Network
•  Divide the image into an S x S grid of cells (S = 7)
•  Within each cell, predict
    1.  B = 2 Bounding Boxes
    2.  C = 20 Class Probabilities
•  Each Bounding Box predicts 5 parameters
    •  x, y, width, height, confidence
    •  x, y is the center of the box relative to the grid cell
•  Class probabilities are conditional (conditioned on the grid cell containing an object)
•  Output of Network:
    •  S * S * (5 * B + C)
    •  7 * 7 * (5 * 2 + 20) = 1470 values
[Figure: image grid with a Bounding Box + Confidence map and a Class Probability map]
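
The output-size arithmetic, and one way to slice a single cell's prediction, can be sketched as follows. The ordering inside a cell vector (B boxes of 5 values first, then C class probabilities) is an assumption made for illustration; actual implementations may lay the values out differently.

```python
from typing import List, Sequence, Tuple


def yolo_output_size(S: int = 7, B: int = 2, C: int = 20) -> int:
    """Number of values the network predicts: S * S * (5 * B + C)."""
    return S * S * (5 * B + C)


def split_cell(
    cell: Sequence[float], B: int = 2, C: int = 20
) -> Tuple[List[Tuple[float, ...]], List[float]]:
    """Split one grid cell's 5 * B + C values into boxes and class probabilities.

    Assumed layout: B groups of (x, y, width, height, confidence), then C
    conditional class probabilities.
    """
    boxes = [tuple(cell[5 * b: 5 * b + 5]) for b in range(B)]
    class_probs = list(cell[5 * B: 5 * B + C])
    return boxes, class_probs


def class_scores(box_confidence: float, class_probs: Sequence[float]) -> List[float]:
    """Class-specific score of a box: its confidence times each conditional class probability."""
    return [box_confidence * p for p in class_probs]
```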

YOLO: Very Fast Direct Prediction Using a CNN
[Figure: YOLO network architecture, layer groups in order with the feature-map sizes between them]
Input: 448 * 448 * 3
Convs, 7x7x64-s-2; MaxPool, 2x2-s-2 → 112 * 112 * 192
Convs, 3x3x192; MaxPool, 2x2-s-2 → 56 * 56 * 256
Convs, 1x1x128, 3x3x256, 1x1x256, 3x3x512; MaxPool, 2x2-s-2 → 28 * 28 * 512
Convs, (1x1x256, 3x3x512) x 4, 1x1x512, 3x3x1024; MaxPool, 2x2-s-2 → 14 * 14 * 1024
Convs, (1x1x512, 3x3x1024) x 2, 1x1x512, 3x3x1024, 3x3x1024-s-2 → 7 * 7 * 1024
Convs, 3x3x1024, 3x3x1024 → 7 * 7 * 1024
Fully Connected Layer → 4096
Fully Connected Layer → 7 * 7 * (5 * 2 + 20)
Output: S * S * (5 * B + C) = 7 * 7 * (5 * 2 + 20) = 1470 values

YOLO’s 1x1 Convolutions Reduce Parameters, Run Fast
Simple Example Shows Parameters Reduced from 4860 to 1440
[Figure: left, a 3x3 kernel maps an 18-channel input feature map (h x w) directly to a 30-channel output feature map; right, a 1x1 kernel first reduces the 18 channels to 5, then a 3x3 kernel maps the 5 channels to the 30-channel output feature map]
Direct 3x3 convolution:
    Parameter Size = 18 x (3 x 3) x 30 = 4860
1x1 reduction followed by 3x3 convolution:
    1x1 Kernel: Parameter Size = 18 x (1 x 1) x 5 = 90
    3x3 Kernel: Parameter Size = 5 x (3 x 3) x 30 = 1350
    Total Parameter Size = 90 + 1350 = 1440
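
The parameter counts on this slide follow from one formula: input channels x kernel area x output channels. A short sketch, with biases ignored as on the slide (the function name is illustrative):

```python
def conv_params(in_channels: int, kernel_h: int, kernel_w: int, out_channels: int) -> int:
    """Weight count of a convolution layer, ignoring biases."""
    return in_channels * kernel_h * kernel_w * out_channels


# Direct 3x3 convolution from 18 to 30 channels.
direct = conv_params(18, 3, 3, 30)

# 1x1 bottleneck down to 5 channels, then 3x3 convolution up to 30 channels.
bottleneck = conv_params(18, 1, 1, 5) + conv_params(5, 3, 3, 30)

print(direct, bottleneck)  # 4860 1440
```

The saving comes entirely from the 3x3 layer seeing 5 input channels instead of 18; the 1x1 layer that makes this possible costs only 90 weights.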

Non-Maximal Suppression via Intersection-over-Union
•  Confidence score reflects the Intersection-over-Union (IoU) between
    •  Predicted Box
    •  Ground Truth
IoU = Intersection Area / Union Area
[Figure: overlapping Predicted Box and Ground Truth box, with the intersection area and union area shaded]
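
A greedy non-maximal suppression pass can be sketched with the IoU ratio above. This is a generic textbook version, not YOLO's exact post-processing: boxes are assumed to be (x1, y1, x2, y2) corners, and the 0.5 threshold is an illustrative default.

```python
from typing import List, Sequence, Tuple

Corners = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def iou(a: Corners, b: Corners) -> float:
    """Intersection area divided by union area."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Corners], scores: Sequence[float], iou_threshold: float = 0.5
) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already-kept, higher-scoring box
    by more than `iou_threshold`.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept
```

In a multi-class detector this is typically run once per class, so that a dog box never suppresses an overlapping person box.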

Limitations of YOLO
•  “[YOLO] struggles with small objects that appear in groups, such as flocks of birds.”
•  “[YOLO] struggles to generalize to objects in new or unusual aspect ratios or configurations.”
•  “YOLO struggles to localize objects correctly.”
Source: You Only Look Once: Unified, Real-Time Object Detection, Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

YOLOv2 Catches Up to SSD
Provides Tradeoffs Between Speed and Accuracy
[Figure: scatter plot of mAP (VOC) against fps; each point is (fps, mAP)]
Method            fps    mAP (VOC)
YOLOv1 448x448    21     63.2
SSD512X512        19     80
YOLOv2 288x288    91     69
YOLOv2 352x352    81     73.7
YOLOv2 416x416    67     76.8
YOLOv2 480x480    59     77.8
YOLOv2 544x544    40     78.6
Source: YOLO9000: Better, Faster, Stronger, Joseph Redmon, Ali Farhadi, University of Washington, Allen Institute for AI

Single-Shot Detector (SSD), Faster than YOLO and as Accurate as Faster R-CNN
[Figure: (fps, mAP) chart: mAP on VOC versus frames per second for the Deformable Parts Model, R-CNN, Fast R-CNN, Faster R-CNN, YOLO, SSD300X300, and SSD512X512]

Uses Default Boxes at Multiple Aspect Ratios & Scales
•  Use six default boxes at each feature cell
    •  Similar to anchor boxes in Faster R-CNN
•  Six box shapes
    •  { 1, 2, 3, 1/2, 1/3 } aspect ratio boxes + 1 extra box with aspect ratio 1 at a larger scale
[Figure: default boxes drawn on an 8x8 Feature Map and a 4x4 Feature Map]
In a convolutional fashion, we evaluate six default boxes of different aspect ratios at each location in two feature maps with different scales (e.g. 8 × 8 and 4 × 4)
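
The six default-box shapes for one feature map can be generated as in the sketch below. It follows the width = s·√a, height = s/√a rule from the SSD paper, with the extra ratio-1 box at scale √(s_k · s_{k+1}); the function name and the example scales in the usage line are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def default_box_shapes(
    scale: float,
    next_scale: float,
    aspect_ratios: Sequence[float] = (1.0, 2.0, 3.0, 1 / 2, 1 / 3),
) -> List[Tuple[float, float]]:
    """Return (width, height) for the six default boxes of one feature map.

    `scale` is this layer's box scale and `next_scale` the next layer's,
    both as fractions of the image size.
    """
    shapes = [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in aspect_ratios]
    extra = math.sqrt(scale * next_scale)  # sixth box: aspect ratio 1, larger scale
    shapes.append((extra, extra))
    return shapes


# Example with hypothetical scales 0.2 (this layer) and 0.37 (next layer).
for width, height in default_box_shapes(0.2, 0.37):
    print(round(width, 3), round(height, 3))
```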

Single-Shot Detector Uses Feature Maps at Different Scales and Concatenates Them All at the Last Layer
[Figure: a 19x19 feature map and, after a Stride=2 Convolution, a 10x10 feature map; each feeds Multiclass Scores and Bounding Box Regression outputs, with the Forward Path and Back-Prop Path marked]
“…, by utilizing feature maps from several different layers in a single network for prediction we can mimic the same effect, while also sharing parameters across all object scales.”

SSD: Six Progressively Smaller Layers Concatenated
[Figure: SSD300 network; a 300 x 300 x 3 input passes through VGG-16 thru the Pool5 layer, then the layers below]
Layer         Feature Map       Default Boxes    Detections/Class
Conv4_3       38 x 38 x 512     3*               38 * 38 * 3
Conv6 (FC)    19 x 19 x 1024    -                -
Conv7 (FC)    19 x 19 x 1024    6                19 * 19 * 6
Conv8_2       10 x 10 x 512     6                10 * 10 * 6
Conv9_2       5 x 5 x 256       6                5 * 5 * 6
Conv10_2      3 x 3 x 256       6                3 * 3 * 6
Pool 11       1 x 1 x 256       6                1 * 1 * 6
* 3 Boxes to reduce computation
Concatenate Detections, Total Detections/Class: 7308, followed by Non-Maximum Suppression
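
The total of 7308 detections per class is simply the sum of (feature-map cells x default boxes) over the six prediction layers. A quick check, using the layer sizes and box counts listed on this slide:

```python
# (layer name, feature-map side length, default boxes per cell), as listed on the slide.
prediction_layers = [
    ("Conv4_3", 38, 3),
    ("Conv7 (FC)", 19, 6),
    ("Conv8_2", 10, 6),
    ("Conv9_2", 5, 6),
    ("Conv10_2", 3, 6),
    ("Pool 11", 1, 6),
]

detections_per_layer = {
    name: side * side * boxes for name, side, boxes in prediction_layers
}
total_detections = sum(detections_per_layer.values())

for name, count in detections_per_layer.items():
    print(f"{name}: {count}")
print("Total detections/class:", total_detections)  # 7308
```

Conv4_3 alone accounts for 4332 of the 7308 boxes, which is why that layer uses only 3 default boxes per cell.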

SSD has Many Tools that Progressively Improve mAP
•  Data augmentation adds 6.7% mAP
    •  Scaling and cropping
•  Additionally, using lower feature maps (Conv4_3) for prediction adds 4% mAP
•  Use a variety of default box shapes
    •  Similar to Faster R-CNN anchor boxes
    •  { 1, 2, 3, 1/2, 1/3 } aspect ratio boxes + 1 extra box with aspect ratio 1
    •  The {2, 1/2, 3, 1/3} aspect ratios contribute 2.9% mAP
•  Use the atrous algorithm of VGG16 (adds 0.7% mAP)
•  Use Hard Negative Mining to balance the ratio of positive samples to negative samples
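
Hard negative mining, the last tool in the list above, can be sketched as: rank the negative (background) default boxes by their confidence loss and keep only the hardest ones, up to a fixed multiple of the number of positives. The 3:1 ratio below is the value used in the SSD paper and does not appear on the slide; the function name is hypothetical.

```python
from typing import List, Sequence


def hard_negative_mining(
    negative_losses: Sequence[float],
    num_positives: int,
    neg_pos_ratio: int = 3,  # assumed ratio of negatives to positives
) -> List[int]:
    """Return indices of the negatives to keep for training.

    Keeps the highest-loss negatives, at most `neg_pos_ratio` per positive.
    """
    num_keep = min(len(negative_losses), neg_pos_ratio * num_positives)
    ranked = sorted(
        range(len(negative_losses)),
        key=lambda i: negative_losses[i],
        reverse=True,
    )
    return sorted(ranked[:num_keep])
```

Without this step the thousands of easy background boxes per image would dominate the loss and swamp the few positive matches.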

Summary
•  Single-shot methods are faster than two-stage methods
•  Single-shot mAP is comparable to Faster R-CNN, the best two-stage method
•  SSD is faster than YOLO, and just as accurate as Faster R-CNN
•  YOLOv2 provides tradeoffs between speed and accuracy
•  The building blocks of detection algorithms presented here can lead to higher precision and recall, i.e., more innovations to come

Links to Seminal Resources
Technique: Resource
R-CNN: Rich feature hierarchies for accurate object detection and semantic segmentation, Tech report (v5)
Fast R-CNN: Fast R-CNN
Faster R-CNN: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
YOLO: You Only Look Once: Unified, Real-Time Object Detection
YOLOv2: YOLO9000: Better, Faster, Stronger
SSD: SSD: Single Shot MultiBox Detector

2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projectssmsksolar
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"mphochane1998
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...HenryBriggs2
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Rums floating Omkareshwar FSPV IM_16112021.pdf
Rums floating Omkareshwar FSPV IM_16112021.pdfRums floating Omkareshwar FSPV IM_16112021.pdf
Rums floating Omkareshwar FSPV IM_16112021.pdfsmsksolar
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086anil_gaur
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxmaisarahman1
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptNANDHAKUMARA10
 

Recently uploaded (20)

Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Rums floating Omkareshwar FSPV IM_16112021.pdf
Rums floating Omkareshwar FSPV IM_16112021.pdfRums floating Omkareshwar FSPV IM_16112021.pdf
Rums floating Omkareshwar FSPV IM_16112021.pdf
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 

Auro Tripathy - Localizing with CNNs

  • 1. auro@shatterline.com. How CNNs Localize Objects with Increasing Precision and Speed. Auro Tripathy, May 2017. Figure callouts: “What class?” and “How do I fine-tune the bounding box?”
  • 2. Outline. Terms, concepts, and metrics for detection algorithms. Two-stage detectors: Region-based Convolutional Neural Networks (R-CNN), Fast R-CNN, Faster R-CNN. Unified (single-shot) detectors: You Only Look Once (YOLO), Single-Shot Detector (SSD).
  • 3. What is to Classification as Where is to Detection. “We’re in the midst of an Object Detection Renaissance” (Ross Girshick). What? ✓ Person, Probability=0.7 ✓ Dog, Probability=0.8 ✓ Horse, Probability=0.8. What & Where? ✓ Person, Location=(x1, y1, w1, h1), Confidence=90% ✓ Dog, Location=(x2, y2, w2, h2), Confidence=80% ✓ Horse, Location=(x3, y3, w3, h3), Confidence=90%
  • 4. CNN-Based Detection Performance at a Glance: Two-Stage Techniques versus Single-Shot Techniques. [Figure: scatter plot of mean Average Precision (mAP) on VOC against frames per second (fps). Points as (fps, mAP): Deformable Parts Model (0.5, 34.3); R-CNN (0.02, 58.5); Fast R-CNN (0.4, 70); Faster R-CNN (7, 73.2); YOLO (21, 63.2); SSD300X300 (58, 77); SSD512X512 (19, 80).]
  • 5. What’s Mean Average Precision (mAP)? With TP, FP, and FN the counts of true positives, false positives, and false negatives: $\text{Precision} = \frac{TP}{TP + FP}$ (high precision relates to few false positives) and $\text{Recall} = \frac{TP}{TP + FN}$ (high recall relates to few false negatives). 1. Compute the Average Precision of each class in your test set. 2. Then take the mean of these per-class average precisions to get the mean Average Precision (mAP).
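
A minimal sketch of the two steps on slide 5, assuming a simple non-interpolated Average Precision (the precision measured at each true positive in the confidence-ranked list, averaged over the ground-truth count). The PASCAL VOC benchmark uses its own interpolated variant, so this is an illustration of the idea rather than the benchmark protocol; all function names are hypothetical.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_precision(detections, num_ground_truth):
    """Non-interpolated AP for one class (an assumption, not the VOC protocol).

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of ground-truth objects of this class.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / num_ground_truth if num_ground_truth else 0.0


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (detections, num_ground_truth)."""
    aps = [average_precision(d, n) for d, n in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    example = {
        "dog": ([(0.9, True), (0.8, False), (0.7, True)], 2),
        "cat": ([(0.9, True)], 1),
    }
    print(mean_average_precision(example))
```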
  • 6. Region-based CNN (R-CNN) Kick-started Detection. [Figure: the (fps, mAP) chart from slide 4, repeated.]
  • 7. Region-Based CNN (R-CNN). [Diagram: Image → Region Proposal Generator (2000 Regions) → CNN Feature Extractor Per Region → CNN Output (Feature Vector) → Linear SVM Classifier for Region (Airplane: No, ..., Dog: Yes, ..., TV Monitor: No) and Bounding Box Regressor.] Source: Rich feature hierarchies for accurate object detection and semantic segmentation, Tech report (v5).
  • 8. Using CNNs Broke New Ground. The Downside: High Workloads for Train/Test. Training is a three-stage disjoint pipeline: 1. Fine-tune a CNN on region proposals using log loss. 2. Fit SVMs (acting as object detectors) to the CNN features, replacing Softmax. 3. Learn to regress bounding boxes with squared (L2) loss. The region proposal algorithm is external. No computation is shared between the 2000 region proposals. The volume of data requires intermediate features to be stored on disk. http://videolectures.net/iccv2015_girshick_fast_r_cnn/
  • 9. What’s Bounding-Box Regression? Learn a Transformation W that Maps Proposal P to Ground Truth G. $d_\star(P) = W_\star^{T} \phi_5(P)$, where $\star$ is one of x, y, w, h and $\phi_5(P)$ are the Pool5 features of P. The transformation d(P) is parameterized by four functions: $d_x(P)$, $d_y(P)$, $d_w(P)$, $d_h(P)$. x, y are linear translations of the center of P’s bounding box; w, h are log-space translations of the width and height of P. We learn W by solving a regularized least-squares problem (ridge regression). [Figure: proposal box P and ground-truth box G, with d(P) adjusting x, y, w, and h.]
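
To make the "linear translation" and "log-space translation" on slide 9 concrete, here is a small sketch of how regression targets can be derived from a proposal P and ground truth G, and how predicted offsets are applied back to P. It assumes boxes are given as (center x, center y, width, height) and follows the parameterization described in the R-CNN tech report; the function names are hypothetical, and the learned mapping W itself is not modeled here.

```python
import math


def regression_targets(proposal, ground_truth):
    """Targets (t_x, t_y, t_w, t_h) that the regressor d(P) is trained to predict."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return (
        (gx - px) / pw,      # center shift in x, scaled by proposal width
        (gy - py) / ph,      # center shift in y, scaled by proposal height
        math.log(gw / pw),   # log-space width change
        math.log(gh / ph),   # log-space height change
    )


def apply_offsets(proposal, offsets):
    """Map proposal P to a refined box using offsets (d_x, d_y, d_w, d_h)."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = offsets
    return (
        pw * dx + px,
        ph * dy + py,
        pw * math.exp(dw),
        ph * math.exp(dh),
    )


if __name__ == "__main__":
    P = (50.0, 50.0, 20.0, 40.0)
    G = (55.0, 48.0, 30.0, 36.0)
    # Applying the exact targets to P recovers G (up to floating-point error).
    print(apply_offsets(P, regression_targets(P, G)))
```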
  • 10. Learn to Only Regress Proposals that are “Nearby” to Ground Truth, with Intersection over Union (IoU). A proposal is used for regression training only if the IoU between it and the ground-truth box is greater than a threshold. [Figure: example box pairs at IoU Threshold = 0.9, 0.7, and 0.6.] https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_localization_and_detection.html
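
The IoU test on slide 10 can be sketched as follows, assuming axis-aligned boxes given as corner coordinates (x1, y1, x2, y2). The helper name `is_nearby` and the default threshold of 0.6 (one of the values shown on the slide) are illustrative choices.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_nearby(proposal, ground_truth, threshold=0.6):
    """Keep a proposal for regression training only if IoU exceeds the threshold."""
    return iou(proposal, ground_truth) > threshold


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
```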
  • 11. Fast R-CNN Improved Detection with Single-Stage Training. [Figure: the (fps, mAP) chart from slide 4, repeated.]
  • 12. Fast R-CNN. Runs the CNN over the entire image instead of over each region proposal. Shares convolution layers. Continues to use external region proposals. Projects region proposals on top of Conv5 of VGG16. Simultaneously predicts classes and bounding boxes via joint training. The network is designed with a classification “head” and a regression “head”. Figure callouts: “What class?” and “How do I fine-tune the bounding box?” (image: https://clipartfest.com/download/fb2cd25bdefb07cc8eb8cd28091ab62ea3519461.html)
  • 13. Fast R-CNN. [Diagram: Conv1 through Conv5 → RoI Projection (for each region, from the Region Proposal Generator, 2000 Regions) → RoI Pooling Layer → Fully Connected (FC6 + FC7) → RoI Feature Vector (1024) → FC → Class Probability and Bounding Box Prediction.]
  • 14. Fast R-CNN: Forward and Back-Prop Paths Using Multi-Class Loss. [Diagram: Conv1 through Conv5 → RoI Projection (for each region, from the Region Proposal Generator, 2000 Regions) → FC-compatible RoI Pooling Layer → FC → Linear + Softmax (Class Probability) and Linear (Bounding Box Regressor) → Log Loss + Smooth L1 Loss; arrows mark the forward path and the back-prop path.] https://andrewliao11.github.io/object_detection/faster_rcnn/
  • 15. Multiclass Loss = Log Loss + Smooth L1 Loss. $\text{Loss}_{\text{multiclass}} = \text{Loss}_{\text{classification}} + \lambda \cdot \text{Loss}_{\text{bounding box regression}} = -\log(p_u) + \lambda \sum \text{Smooth-L1}(\text{predicted offsets} - \text{ground truth regression target})$, where $p_u$ is the predicted probability of the true class u. $\text{Smooth-L1}(x) = 0.5x^2$ if $|x| < 1$, and $|x| - 0.5$ otherwise. Smooth-L1 loss is less sensitive to outliers than L2 loss.
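
A short sketch of the loss on slide 15, written in plain Python for a single RoI. The names `multitask_loss` and `lam` (standing in for λ) are hypothetical, and the sketch applies the box term unconditionally; a real trainer would typically skip it for background RoIs.

```python
import math


def smooth_l1(x):
    """0.5 * x^2 when |x| < 1, otherwise |x| - 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multitask_loss(class_probs, true_class, predicted_offsets, target_offsets, lam=1.0):
    """Log loss on the true class plus lam times the summed Smooth-L1 box loss."""
    classification = -math.log(class_probs[true_class])
    regression = sum(
        smooth_l1(p - t) for p, t in zip(predicted_offsets, target_offsets)
    )
    return classification + lam * regression


if __name__ == "__main__":
    # Quadratic near zero, linear for large errors: this is why outliers
    # contribute less than they would under an L2 loss.
    print(smooth_l1(0.5), smooth_l1(3.0))
    print(multitask_loss([0.2, 0.8], 1, [0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]))
```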
  • 16. Introduce a Region-of-Interest (RoI) Pooling Layer for Compatibility with the Fully-Connected Layer Above. An RoI is a rectangular window into the feature map, (r, c, h, w): top-left corner (r, c), height h, and width w. It is divided into an H x W grid of sub-windows (e.g., 7 x 7), each of size approximately h/H x w/W. Max-pool the values in each sub-window into the corresponding output grid cell. Back-propagation routes derivatives through the RoI layer.
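
The pooling step on slide 16 can be sketched on a single-channel feature map stored as a list of rows. The bin boundaries below use floor and ceiling rounding, which is one reasonable choice and an assumption here; the function name `roi_max_pool` is hypothetical, and the RoI is assumed to lie fully inside the feature map.

```python
import math


def roi_max_pool(feature_map, roi, out_h=7, out_w=7):
    """Max-pool an RoI (r, c, h, w) into a fixed out_h x out_w grid.

    (r, c) is the top-left corner of the RoI, h and w its height and width.
    Each output cell takes the max over a sub-window of roughly h/out_h x w/out_w.
    """
    r, c, h, w = roi
    pooled = []
    for i in range(out_h):
        top = math.floor(i * h / out_h)
        bottom = max(math.ceil((i + 1) * h / out_h), top + 1)  # at least one row
        row = []
        for j in range(out_w):
            left = math.floor(j * w / out_w)
            right = max(math.ceil((j + 1) * w / out_w), left + 1)  # at least one column
            row.append(
                max(
                    feature_map[r + y][c + x]
                    for y in range(top, bottom)
                    for x in range(left, right)
                )
            )
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    fmap = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    print(roi_max_pool(fmap, (0, 0, 4, 4), out_h=2, out_w=2))
```

Whatever the RoI size, the output has the same fixed shape, which is what makes it compatible with the fully-connected layers that follow.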
  • 17. Benefits of Fast R-CNN over R-CNN. Higher mAP than R-CNN. Training is single-stage, using a multi-class loss. Training can update all network layers. No disk storage is required for feature caching.
  • 18. Faster R-CNN Subsumes Region Proposals. [Figure: the (fps, mAP) chart from slide 4, repeated.]
  • 19. Faster R-CNN. Replaces external object proposals with a Region Proposal Network (RPN). The RPN reuses the CNN for object proposals. The RPN shares convolutions with the detection side of the network; the big benefit is that the marginal cost of computing proposals becomes small. Reuses the previously covered Fast R-CNN for detection. The training regime alternates between, first, fine-tuning for the region proposal task and, then, fine-tuning for object detection while keeping the proposals fixed.
  • 20. Novel “Anchor” Boxes Serve as References at Multiple Scales and Aspect Ratios. Three ways to handle multiple scales: (1) pyramids of scaled images, where feature maps are built and the classifier is run at all scales; (2) pyramids of filters of multiple scales and sizes, run on the feature map; (3) new: pyramids of reference boxes in the regression functions, where anchors are references at multiple scales and aspect ratios.
  • 21. Region Proposal Network Training Classifies Objectness and Regresses Bounding Boxes. A sliding window over the Conv5 feature map (Conv1 through Conv5) produces a 256-dimension vector at each location. k = 9 “anchor” boxes at each location cover three scales (128, 256, 512) and three aspect ratios (2:1, 1:1, 1:2). Outputs at each location: k = 9 * 2 class scores (object or background), the “objectness” score, and k = 9 * 4 box proposals (x, y, w, h) from bounding box regression.
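
As an illustration of the k = 9 anchors on slide 21, the sketch below enumerates three scales and three aspect ratios at one sliding-window location. It assumes each anchor keeps an area of scale squared and that the ratio is width to height; the function name and the (cx, cy, w, h) layout are hypothetical.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(2.0, 1.0, 0.5)):
    """Return k = len(scales) * len(ratios) anchors as (cx, cy, w, h).

    Assumes ratio = width / height and that every anchor of a given scale
    has area scale * scale.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale * math.sqrt(ratio)
            h = scale / math.sqrt(ratio)
            anchors.append((cx, cy, w, h))
    return anchors


if __name__ == "__main__":
    anchors = make_anchors(300, 200)
    k = len(anchors)
    # Per location the RPN emits 2 scores and 4 box coordinates per anchor.
    print(k, 2 * k, 4 * k)
```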
  • 22. Step 1: Train the RPN, Initialized with ImageNet Weights, to Output Region Proposals. [Diagram: Conv1 through Conv5 → RPN Layers → RPN Proposals; the FC, Linear + Softmax, and Bounding Box Regressor layers are also shown. Fine-tuned end-to-end from ImageNet weights.] https://andrewliao11.github.io/object_detection/faster_rcnn/
  • 23. Step 2: Train Fast R-CNN with the Learnt Region Proposals. [Diagram: Conv1 through Conv5, RPN Layers, FC, Linear + Softmax (Object Class Probabilities), and Bounding Box Regressor. Uses the RPN proposals learned in Step 1. Fine-tuned end-to-end from ImageNet weights.]
  • 24. Step 3: Initialize the RPN from the Model Trained in Step 2 and Train the RPN Again. [Diagram: Conv1 through Conv5 → RPN Layers → RPN Proposals; the FC, Linear + Softmax, and Bounding Box Regressor layers are also shown. Share the weights from Step 2 but lock them (prevent updates).]
  • 25. Step 4: Fine-Tune the FC Layers of Fast R-CNN Using the Shared Convolution Weights from Step 3. [Diagram: Conv1 through Conv5, RPN Layers, FC, Linear + Softmax (Object Class Probabilities), and Bounding Box Regressor. Uses the RPN proposals learned in Step 3. Share the weights from Step 3 but prevent updates; fine-tune the unique layers of Fast R-CNN.]
  • 26. You Only Look Once (YOLO) Uses One Network, Runs Fast. [Figure: the (fps, mAP) chart from slide 4, repeated.]
  • 27. You-Only-Look-Once (YOLO): Do Away with Dual Networks (RPN + Classifier), Use a Single Network. Divide the image into an S x S grid of cells (S = 7). Within each cell, predict 1. B = 2 bounding boxes (bounding box + confidence) and 2. C = 20 class probabilities. Each bounding box predicts 5 parameters: x, y, width, height, confidence; x, y is the center of the box relative to the grid cell. The class probabilities are conditional (conditioned on the grid cell containing an object). Output of the network: S * S * (5 * B + C) = 7 * 7 * (5 * 2 + 20) = 1470 values.
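
The output-size arithmetic on slide 27, together with one way to turn a cell-relative box center into image-relative coordinates, can be sketched as below. The helper names are hypothetical, and the conversion assumes x and y are offsets in [0, 1] within the cell at (row, col), with the result normalized to the image width and height.

```python
def yolo_output_size(S=7, B=2, C=20):
    """Number of values predicted for one image: S * S * (5 * B + C)."""
    return S * S * (5 * B + C)


def cell_to_image_center(row, col, x, y, S=7):
    """Convert a cell-relative center (x, y) to image-relative (cx, cy) in [0, 1]."""
    return ((col + x) / S, (row + y) / S)


if __name__ == "__main__":
    print(yolo_output_size())                      # 7 * 7 * 30 = 1470
    print(cell_to_image_center(2, 3, 0.5, 0.5))    # center of the cell at row 2, col 3
```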
  • 28. YOLO: Very Fast Direct Prediction Using a CNN. [Diagram of the network with a 448 * 448 * 3 input. Layer groups: Convs 7x7x64-s-2, MaxPool 2x2-s-2; Convs 3x3x192, MaxPool 2x2-s-2; Convs 1x1x128, 3x3x256, 1x1x256, 3x3x512, MaxPool 2x2-s-2; Convs (1x1x256, 3x3x512) x 4, 1x1x512, 3x3x1024, MaxPool 2x2-s-2; Convs (1x1x512, 3x3x1024) x 2, 1x1x512, 3x3x1024, 3x3x1024-s-2; Convs 3x3x1024, 3x3x1024; Fully Connected Layer (4096); Fully Connected Layer (7 * 7 * (5 * 2 + 20)). Feature-map sizes labelled on the diagram: 112 * 112, 56 * 56, 28 * 28, 14 * 14, 7 * 7.] Output: S * S * (5 * B + C) = 7 * 7 * (5 * 2 + 20) = 1470 values.
  • 29. YOLO’s 1x1 Convolutions Reduce Parameters and Run Fast. A simple example shows parameters reduced from 4860 to 1440. Direct 3x3 convolution from an 18-channel input feature map to a 30-channel output feature map: parameter size = 18 x (3 x 3) x 30 = 4860. With a 1x1 kernel first reducing the 18 channels to 5: parameter size = 18 x (1 x 1) x 5 = 90; the 3x3 kernel then maps 5 channels to 30: parameter size = 5 x (3 x 3) x 30 = 1350. Total parameter size = 90 + 1350 = 1440.
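
The parameter counts on slide 29 can be checked with a few lines. As on the slide, biases are ignored; the function name is hypothetical.

```python
def conv_params(in_channels, kernel_size, out_channels):
    """Weights in a square convolution: in * (k * k) * out, biases ignored."""
    return in_channels * kernel_size * kernel_size * out_channels


# Direct 3x3 convolution: 18 input channels to 30 output channels.
direct = conv_params(18, 3, 30)

# Bottleneck: 1x1 convolution down to 5 channels, then 3x3 up to 30.
bottleneck = conv_params(18, 1, 5) + conv_params(5, 3, 30)

if __name__ == "__main__":
    print(direct, bottleneck)  # 4860 1440
```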
  • 30. Non-Maximal Suppression via Intersection-Over-Union. The confidence score reflects the Intersection-over-Union (IoU) between the predicted box and the ground truth. $\text{IoU} = \frac{\text{Intersection Area}}{\text{Union Area}}$ [Figure: overlapping predicted box and ground-truth box.]
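
Slide 30 names non-maximal suppression without spelling it out. A common greedy formulation is sketched here under stated assumptions: boxes are (x1, y1, x2, y2) corners for a single class, the highest-scoring box is always kept, and any remaining box whose IoU with an already kept box exceeds a threshold is dropped. The 0.5 default is an illustrative value, not one taken from the slides.

```python
def iou(box_a, box_b):
    """Intersection area divided by union area for (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(boxes, scores))  # [0, 2]
```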
  • 31. Limitations of YOLO. “[YOLO] struggles with small objects that appear in groups, such as flocks of birds.” “[YOLO] struggles to generalize to objects in new or unusual aspect ratios or configurations.” “YOLO struggles to localize objects correctly.” Source: You Only Look Once: Unified, Real-Time Object Detection, Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi.
  • 32. YOLOv2 Catches Up to SSD and Provides Tradeoffs Between Speed and Accuracy. [Figure: mAP (VOC) against fps. Points as (fps, mAP): YOLOv1 448x448 (21, 63.2); SSD512X512 (19, 80); YOLOv2 288x288 (91, 69); YOLOv2 352x352 (81, 73.7); YOLOv2 416x416 (67, 76.8); YOLOv2 480x480 (59, 77.8); YOLOv2 544x544 (40, 78.6).] Source: YOLO9000: Better, Faster, Stronger, Joseph Redmon, Ali Farhadi, University of Washington, Allen Institute for AI.
  • 33. Single-Shot Detector (SSD), Faster than YOLO and as Accurate as Faster R-CNN. [Figure: the (fps, mAP) chart from slide 4, repeated.]
  • 34. Uses Default Boxes at Multiple Aspect Ratios & Scales. Use six default boxes at each feature cell, similar to anchor boxes in Faster R-CNN: boxes with aspect ratios {1, 2, 3, 1/2, 1/3} plus one additional box with aspect ratio 1. In a convolutional fashion, we evaluate the six default boxes at each location in two feature maps with different scales (e.g. 8 × 8 and 4 × 4). [Figure: default boxes on an 8x8 feature map and a 4x4 feature map.]
  • 35. Single-Shot Detector Uses Feature Maps at Different Scales and Concatenates Them All at the Last Layer. “…, by utilizing feature maps from several different layers in a single network for prediction we can mimic the same effect, while also sharing parameters across all object scales.” [Diagram: a 19x19 feature map and, after a stride=2 convolution, a 10x10 feature map; each feeds Multiclass Scores and Bounding Box Regression, with the forward path and back-prop path marked.]
  • 36. SSD: Six Progressively Smaller Layers Concatenated. Input 300 x 300 x 3, VGG-16 through the Pool5 layer. Prediction layers: Conv4_3 (38 x 38 x 512), default boxes: 3*, detections/class = 38 * 38 * 3; Conv7 (FC) (19 x 19 x 1024, preceded by Conv6 (FC), 19 x 19 x 1024), default boxes: 6, detections/class = 19 * 19 * 6; Conv8_2 (10 x 10 x 512), default boxes: 6, detections/class = 10 * 10 * 6; Conv9_2 (5 x 5 x 256), default boxes: 6, detections/class = 5 * 5 * 6; Conv10_2 (3 x 3 x 256), default boxes: 6, detections/class = 3 * 3 * 6; Pool 11 (1 x 1 x 256), default boxes: 6, detections/class = 1 * 1 * 6. The detections are concatenated (total detections/class: 7308) and passed to Non-Maximum Suppression. * 3 boxes to reduce computation.
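
The total of 7308 detections per class on slide 36 follows from summing, over the six prediction layers, the number of feature-map cells times the default boxes per cell. A quick check, with the layer table transcribed from the slide and a hypothetical function name:

```python
# (layer name, feature-map side length, default boxes per cell), from slide 36.
SSD300_PREDICTION_LAYERS = [
    ("Conv4_3", 38, 3),
    ("Conv7 (FC)", 19, 6),
    ("Conv8_2", 10, 6),
    ("Conv9_2", 5, 6),
    ("Conv10_2", 3, 6),
    ("Pool 11", 1, 6),
]


def detections_per_class(layers):
    """Sum of side * side * boxes over all prediction layers."""
    return sum(side * side * boxes for _, side, boxes in layers)


if __name__ == "__main__":
    for name, side, boxes in SSD300_PREDICTION_LAYERS:
        print(name, side * side * boxes)
    print(detections_per_class(SSD300_PREDICTION_LAYERS))  # 7308
```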
  • 37. SSD has Many Tools that Progressively Improve mAP. Data augmentation (scaling and cropping) adds 6.7% mAP. Additionally, using lower feature maps (Conv4_3) for prediction adds 4% mAP. Use a variety of default box shapes, similar to Faster R-CNN anchor boxes: {1, 2, 3, 1/2, 1/3} aspect ratio boxes plus one additional box with aspect ratio 1; the {2, 1/2, 3, 1/3} aspect ratios contribute 2.9% mAP. Use the atrous algorithm of VGG16 (adds 0.7% mAP). Use Hard Negative Mining to balance the ratio of positive samples to negative samples.
  • 38. Summary. Single-shot methods are faster than two-stage methods. Single-shot mAP is comparable to Faster R-CNN, the best two-stage method. SSD is faster than YOLO, and just as accurate as Faster R-CNN. YOLOv2 provides tradeoffs between speed and accuracy. The building blocks of detection algorithms presented here can lead to higher precision and recall, i.e., more innovations to come.
  • 39. Links to Seminal Resources (Technique: Resource). R-CNN: Rich feature hierarchies for accurate object detection and semantic segmentation, Tech report (v5). Fast R-CNN: Fast R-CNN. Faster R-CNN: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. YOLO: You Only Look Once: Unified, Real-Time Object Detection. YOLOv2: YOLO9000: Better, Faster, Stronger. SSD: SSD: Single Shot MultiBox Detector.