Tutorial on Object Detection (Faster R-CNN)

Tutorial
Faster R-CNN
Object Detection: Localization & Classification
Hwa Pyung Kim
Department of Computational Science and Engineering, Yonsei University
hpkim0512@yonsei.ac.kr

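Object Detection: Classification + Regression
Classification (recognition): What?
A one-hot class vector (1, 0, 0, ⋯) over the classes (Dog, Cat, ⋯, Person); here the class is "Dog".
Bounding box regression (localization): Where?
A box (𝑥, 𝑦, 𝑤, ℎ).
Classification + regression = object detection: a dog at (𝒙, 𝒚, 𝒘, 𝒉).
[Figure: input image → encoding (conv & pool) → feature map → combining features → class and bounding box (𝒙, 𝒚, w, h)]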
Bounding box information
• 𝒙, 𝒚 : top left corner position
• w = width
• h = height

CNN-based Object Detection:
There are clues of a dog (what) at a local position (where) in the convolutional feature map.
[Figure: input image [224, 224, 3] → VGG16 network (conv & pooling) → pool5 features [7, 7, 512] → fully-connected layers → classification (one-hot over Dog, Cat, Person, ⋯) and regression (𝑥, 𝑦, 𝑤, ℎ)]
Spatial size of pool5: $7 = 224 / 32$, where $32 = 2^5$ and 5 is the number of pooling layers.
The red boxes in the feature map contain clues of "dog at the bounding box (𝑥, 𝑦, 𝑤, ℎ)".
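The size arithmetic on this slide can be checked with a minimal sketch. It assumes that only the pooling layers change the spatial size (each one halves it, rounding down) and that the convolutions preserve it; the function name `feature_map_size` is a hypothetical helper, not part of any library.

```python
def feature_map_size(height, width, num_pools=5):
    """Spatial size after `num_pools` stride-2 poolings (assumed to round down)."""
    stride = 2 ** num_pools  # 2^5 = 32 for the five pooling layers of VGG16
    return height // stride, width // stride


print(feature_map_size(224, 224))  # (7, 7)
print(feature_map_size(360, 480))  # (11, 15)
```

The second call uses the 360 × 480 input that appears later in the deck, where the feature map is 11 × 15.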

Multiple Object Detection:
Localize and Classify all objects appearing in the image
How many objects are in there?
• Classify these multiple overlapping objects
• Identify their bounding boxes
PASCAL VOC2007

Region based CNN (R-CNN) method
• Extract "region proposals" using the selective search method.
• Affine image warping: compute a fixed-size CNN input from each region proposal, regardless of the region's shape.
• Feed each warped region (CNN input, fixed size) to the ConvNet, followed by a classifier & regressor.
[Figure: three region proposals, each passed through ConvNet → Classifier & Regressor, labeled Background, Person, and Dining table]

Fast R-CNN
[Figure: image → ConvNet → feature map → RoI pooling → Classifier & Regressor]
RoI pooling: Convert the features inside a valid RoI into a small feature map with a fixed spatial extent.

Faster R-CNN:
Towards Real-Time Object Detection with Region Proposal Networks
[Figure: image → ConvNet → feature map → Region Proposal Network → proposals → RoI pooling → Classifier & Regressor]
What is a Region Proposal Network?

Region Proposal Network (RPN)
[Figure: input image (360 × 480) → conv feature map (11 × 15 × 512) → RPN → region proposals]
Size of the conv feature map: $11 = \lfloor 360 / 32 \rfloor$ and $15 = 480 / 32$, where $32 = 2^5$, 5 is the number of pooling layers, and 512 is the number of filters.
The RPN outputs a set of rectangular object proposals, each with an objectness score.
How?

Region Proposals & Anchor Boxes
[Figure: a 3×3×512 sliding window (red cuboid) on the 11 × 15 × 512 conv feature map → fully-connected layers → $s^{obj}, s^{nobj}, t_x, t_y, t_w, t_h$]
Input: each sliding window (3×3×512)
For each sliding window (red cuboid), expressed as a 3 × 3 × 512 vector, the proposal is parametrized relative to an anchor:
$p_x = a_x + a_w \cdot t_x$
$p_y = a_y + a_h \cdot t_y$
$p_w = a_w \cdot \exp(t_w)$
$p_h = a_h \cdot \exp(t_h)$
Output:
• 4 coordinates: $p_x, p_y, p_w, p_h$
• 2 scores: $s^{obj}, s^{nobj}$, which estimate the probability of object or not object for each proposal
Anchor box information
• $a_x, a_y$: center position
• $a_w$ = width
• $a_h$ = height
Anchor box
For example, $a_w = a_h = 128$
• $a_w$ and $a_h$ are fixed.
• $(a_x, a_y)$ is determined by the position of the red box.
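As an illustration of the parametrization on this slide, the following sketch applies a predicted transformation (t_x, t_y, t_w, t_h) to one anchor. The function name `decode_proposal` and the example numbers are assumptions made for illustration only.

```python
import math


def decode_proposal(anchor, t):
    """Turn an anchor (a_x, a_y, a_w, a_h) and offsets (t_x, t_y, t_w, t_h) into a proposal."""
    a_x, a_y, a_w, a_h = anchor
    t_x, t_y, t_w, t_h = t
    p_x = a_x + a_w * t_x      # shift scaled by the anchor width
    p_y = a_y + a_h * t_y      # shift scaled by the anchor height
    p_w = a_w * math.exp(t_w)  # exp keeps the width positive
    p_h = a_h * math.exp(t_h)  # exp keeps the height positive
    return p_x, p_y, p_w, p_h


anchor = (240.0, 180.0, 128.0, 128.0)  # hypothetical anchor with a_w = a_h = 128
print(decode_proposal(anchor, (0.0, 0.0, 0.0, 0.0)))    # zero offsets return the anchor itself
print(decode_proposal(anchor, (0.25, -0.5, 0.0, 0.0)))  # shifted right and up, same size
```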

Region Proposals & Anchor Boxes
[Figure: a 3×3×512 sliding window (red cuboid) on the 11 × 15 × 512 conv feature map → fully-connected layers → $(s_i^{obj}, s_i^{nobj}, t_{x,i}, t_{y,i}, t_{w,i}, t_{h,i})$ for $i = 1, \dots, 9$]
9 anchor boxes = 3 ratios × 3 scales
For example,
$a_{w,1} = a_{h,1} = 128$, $a_{w,2} = a_{h,2} = 2 \times 128$, $a_{w,3} = a_{h,3} = 4 \times 128$,
$a_{w,4} = 2 \times a_{h,4} = 128, \dots$
$a_{w,7} = \tfrac{1}{2} \times a_{h,7} = 128, \dots$
• $a_{w,i}$ and $a_{h,i}$ are fixed.
• $(a_{x,i}, a_{y,i})$ is determined by the position of the red box.
Input: each sliding window (3×3×512)
For each sliding window (red cuboid), expressed as a 3 × 3 × 512 vector, the 9 proposals are parametrized relative to 9 anchors. For $i = 1, \dots, 9$:
$p_{x,i} = a_{x,i} + a_{w,i} \cdot t_{x,i}$
$p_{y,i} = a_{y,i} + a_{h,i} \cdot t_{y,i}$
$p_{w,i} = a_{w,i} \cdot \exp(t_{w,i})$
$p_{h,i} = a_{h,i} \cdot \exp(t_{h,i})$
Output: For $i = 1, \dots, 9$,
• 4 coordinates: $p_{x,i}, p_{y,i}, p_{w,i}, p_{h,i}$
• 2 scores: $s_i^{obj}, s_i^{nobj}$, which estimate the probability of object or not object for each proposal
Anchor box information
• $a_{x,i}, a_{y,i}$: center position
• $a_{w,i}$ = width
• $a_{h,i}$ = height
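The 3 ratios × 3 scales construction can be sketched as below. The sketch follows the slide's example literally (width set by the scale, height derived from the width-to-height ratio) and assumes that the anchors the slide elides follow the same pattern; the original Faster R-CNN paper instead keeps the anchor area fixed per scale. The name `make_anchor_shapes` is hypothetical.

```python
def make_anchor_shapes(base=128, scale_factors=(1, 2, 4), ratios=(1.0, 2.0, 0.5)):
    """Return 9 (a_w, a_h) pairs; each ratio is width / height."""
    shapes = []
    for ratio in ratios:               # 1:1, 2:1, 1:2
        for factor in scale_factors:   # 128, 2 x 128, 4 x 128
            a_w = base * factor
            a_h = a_w / ratio
            shapes.append((a_w, a_h))
    return shapes


for i, (a_w, a_h) in enumerate(make_anchor_shapes(), start=1):
    print(i, a_w, a_h)
```

With this ordering, anchors 1 to 3 are the squares, anchor 4 has $a_w = 2 \times a_h = 128$, and anchor 7 has $a_w = \tfrac{1}{2} \times a_h = 128$, matching the examples given on the slide.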

Extract 9 Proposals relative to 9 Anchors
[Figure: a 3×3×512 sliding window on the 11 × 15 × 512 conv feature map, with its anchor boxes → fully-connected layers → $(s_i^{obj}, s_i^{nobj}, t_{x,i}, t_{y,i}, t_{w,i}, t_{h,i})$ → proposals $(p_i, p_{x,i}, p_{y,i}, p_{w,i}, p_{h,i})$ for $i = 1, \dots, 9$]
For $i = 1, \dots, 9$,
$p_{x,i} = a_{x,i} + a_{w,i} \cdot t_{x,i}$
$p_{y,i} = a_{y,i} + a_{h,i} \cdot t_{y,i}$
$p_{w,i} = a_{w,i} \cdot \exp(t_{w,i})$
$p_{h,i} = a_{h,i} \cdot \exp(t_{h,i})$
$p_i = \dfrac{\exp(s_i^{obj})}{\exp(s_i^{obj}) + \exp(s_i^{nobj})}$
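The objectness probability p_i is a two-way softmax over the scores s_obj and s_nobj. A minimal sketch follows; subtracting the larger score before exponentiating is a standard numerical-stability step added here and is not shown on the slide. The function name `objectness` is hypothetical.

```python
import math


def objectness(s_obj, s_nobj):
    """p = exp(s_obj) / (exp(s_obj) + exp(s_nobj)), computed stably."""
    m = max(s_obj, s_nobj)          # shifting both scores leaves the ratio unchanged
    e_obj = math.exp(s_obj - m)
    e_nobj = math.exp(s_nobj - m)
    return e_obj / (e_obj + e_nobj)


print(objectness(0.0, 0.0))   # 0.5: equal scores
print(objectness(2.0, -1.0))  # high objectness
```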

Generate Region Proposals
[Figure: conv feature map (11 × 15 × 512) with 9 proposals per sliding window]
Total # of windows = 11 × 15
Total # of proposals = (total # of windows) × (# of proposals per window) = 11 × 15 × 9 = 1485
The proposals highly overlap each other!
Need to reduce redundancy.
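The count 11 × 15 × 9 = 1485 can be reproduced by enumerating one set of 9 anchors per sliding-window position. In this sketch the mapping from a feature-map cell to an image position (cell center times a stride of 32) is an assumption, as are the names `ANCHOR_SHAPES` and `all_anchors`.

```python
# 9 (a_w, a_h) shapes following the slide's example: 3 ratios x 3 scales.
ANCHOR_SHAPES = [(w, w / r) for r in (1.0, 2.0, 0.5) for w in (128, 256, 512)]


def all_anchors(rows=11, cols=15, stride=32):
    """One (a_x, a_y, a_w, a_h) tuple per window position and anchor shape."""
    anchors = []
    for row in range(rows):
        for col in range(cols):
            a_x = (col + 0.5) * stride  # assumed: center of the cell in image pixels
            a_y = (row + 0.5) * stride
            for a_w, a_h in ANCHOR_SHAPES:
                anchors.append((a_x, a_y, a_w, a_h))
    return anchors


print(len(all_anchors()))  # 1485
```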

Reduce redundancy by
Non-Maximum Suppression (NMS)
Step 1.
Take the most probable proposal from the 1485 proposals.
Proposal information
• $p_{x,i}, p_{y,i}$: top left corner position
• $p_{w,i}$ = width
• $p_{h,i}$ = height
• $p_i$ = objectness probability, sorted so that $p_1 \ge p_2 \ge \dots \ge p_{1485}$
[Figure: proposals 1, 2, ⋯, 173, ⋯, 1480, ⋯, 1485, each given as $(p_{x,i}, p_{y,i}, p_{w,i}, p_{h,i}, p_i)$; proposal 1 is the most probable proposal]

Step 2.
Compute the IoU between the most probable proposal and the other proposals, and discard the proposals having IoU > threshold (0.7).
$IoU = \dfrac{\text{area of intersection}}{\text{area of union}}$
[Figure: example IoU values between the most probable proposal and other proposals, including proposals 173 and 1480: 0.83, 0.71, 0.30, 0]
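A minimal sketch of the IoU computation used in Step 2, assuming boxes are given as (x, y, w, h) with (x, y) the top left corner, as in the proposal information above. The function name `iou` and the example boxes are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes with top-left (x, y)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle, clipped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # 1.0: identical boxes
print(iou((0, 0, 10, 10), (20, 20, 10, 10)))  # 0.0: disjoint boxes
```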

Reduce redundancy by NMS
Summary of steps 1-2 in NMS:
Given the most probable proposal, the blue proposals have IoU > threshold (0.7).
These 30 proposals having IoU > 0.7 are discarded.
Step 3.
Get the next most probable proposal among the remaining 1485 − 30 proposals and repeat the previous process.
For the next most probable proposal, 36 proposals having IoU > 0.7 are discarded.

Reduce redundancy by NMS
Repeat the previous procedure until…
Before NMS: 1,485 proposals
After NMS: 300 proposals
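Steps 1 to 3 amount to a greedy loop, sketched below under stated assumptions: proposals are (x, y, w, h, p) tuples with a top-left (x, y), and the `max_keep=300` cap is only a guess at how the 300 final proposals are obtained. The names `iou` and `nms` are illustrative helpers, not a library API.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes with top-left (x, y)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(proposals, threshold=0.7, max_keep=300):
    """Greedy NMS over (x, y, w, h, p) tuples; returns kept proposals, most probable first."""
    remaining = sorted(proposals, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining and len(kept) < max_keep:
        best = remaining.pop(0)  # Step 1: take the most probable proposal
        kept.append(best)
        # Step 2: discard every proposal that overlaps it too much
        remaining = [b for b in remaining if iou(best[:4], b[:4]) <= threshold]
        # Step 3: the loop repeats with the next most probable proposal
    return kept


proposals = [(0, 0, 10, 10, 0.9), (1, 0, 10, 10, 0.8), (50, 50, 10, 10, 0.7)]
print(nms(proposals))  # the second box overlaps the first (IoU about 0.82) and is dropped
```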

Summary of RPN
Inputs:
• Conv feature map
Outputs:
• Region proposal coordinates.
• Probabilities representing how likely each region proposal is to contain an object.

Classifier & Regressor
[Figure: image → ConvNet → feature map → Region Proposal Network → proposals → RoI pooling → Classifier & Regressor]
Now we are ready to explain the Classifier & Regressor.

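RoI pooling layer
Convert the features inside a valid RoI into a small feature map with a fixed spatial extent.
A proposal $(p_x, p_y, p_w, p_h)$ on the 360 × 480 image is mapped to $(p_x', p_y', p_w', p_h')$ on the 11 × 15 conv feature map:
$p_x' = p_x \cdot \dfrac{15}{480}$, $p_y' = p_y \cdot \dfrac{11}{360}$, $p_w' = p_w \cdot \dfrac{15}{480}$, $p_h' = p_h \cdot \dfrac{11}{360}$
Bilinear interpolation & max pooling then convert each RoI on the conv feature map into a fixed-size input for the Classifier & Regressor.
[Figure: RoIs of different sizes (labeled 5, 8 and 3, 9) on the 11 × 15 conv feature map, each converted to a 7 × 7 map]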
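The two stages of RoI pooling can be sketched as follows. This is a simplified illustration, not the slide's exact method: it uses a single-channel map, integer RoI coordinates, and plain max pooling in which small RoIs reuse cells instead of being bilinearly interpolated. The names `scale_to_feature_map` and `roi_max_pool` are hypothetical.

```python
def scale_to_feature_map(proposal, image_hw=(360, 480), fmap_hw=(11, 15)):
    """Map (p_x, p_y, p_w, p_h) from image pixels to feature-map cells."""
    p_x, p_y, p_w, p_h = proposal
    sx = fmap_hw[1] / image_hw[1]  # 15 / 480
    sy = fmap_hw[0] / image_hw[0]  # 11 / 360
    return p_x * sx, p_y * sy, p_w * sx, p_h * sy


def roi_max_pool(fmap, roi, out_size=7):
    """Max-pool an integer RoI (x, y, w, h) of a 2D map into out_size x out_size."""
    x, y, w, h = roi
    pooled = []
    for i in range(out_size):
        y0 = y + (i * h) // out_size
        y1 = max(y0 + 1, y + ((i + 1) * h) // out_size)  # each bin covers at least one row
        row = []
        for j in range(out_size):
            x0 = x + (j * w) // out_size
            x1 = max(x0 + 1, x + ((j + 1) * w) // out_size)  # and at least one column
            row.append(max(fmap[r][c] for r in range(y0, y1) for c in range(x0, x1)))
        pooled.append(row)
    return pooled


fmap = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4 x 4 map
print(roi_max_pool(fmap, (0, 0, 4, 4), out_size=2))       # [[5, 7], [13, 15]]
print(scale_to_feature_map((160, 90, 320, 180)))
```

Whatever the RoI size, the output always has `out_size × out_size` entries, which is what lets the fully-connected layers that follow accept every proposal.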

RoI pooling layer generates inputs for the Classifier & Regressor
[Figure: 300 RoI pooled feature maps, each of size 7 × 7 × 512]

Classification & Regression per each proposal
[Figure: proposal $(p_x, p_y, p_w, p_h)$ → RoI pooling → 7 × 7 × 512 → fully-connected layers (4096) → $(s_i, r_{x,i}, r_{y,i}, r_{w,i}, r_{h,i})$ → $(p_i, x_i, y_i, w_i, h_i)$ for $i = 0, \dots, 20$]
For each class $i = 0, \dots, 20$:
$x_i = p_x + p_w \cdot r_{x,i}$
$y_i = p_y + p_h \cdot r_{y,i}$
$w_i = p_w \cdot \exp(r_{w,i})$
$h_i = p_h \cdot \exp(r_{h,i})$
$p_i = \dfrac{\exp(s_i)}{\sum_{j=0}^{20} \exp(s_j)}$
Example: $p_0 = 0.0124$ (Background), $p_{15} = 0.9797$ (Person), $p_{20} = 0.0001$ (TV monitor)
Proposal classification & bounding-box regression:
Each of the 21 classes gets its own refined bounding-box prediction and an estimated probability.
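The per-proposal head can be illustrated with a short sketch: a softmax over the 21 class scores and one refined box per class. The scores and offsets below are made-up inputs, and `softmax` and `refine_boxes` are hypothetical helper names.

```python
import math


def softmax(scores):
    """p_i = exp(s_i) / sum_j exp(s_j), shifted by the max score for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_boxes(proposal, deltas):
    """Apply one (r_x, r_y, r_w, r_h) per class to the proposal (p_x, p_y, p_w, p_h)."""
    p_x, p_y, p_w, p_h = proposal
    return [
        (p_x + p_w * r_x, p_y + p_h * r_y, p_w * math.exp(r_w), p_h * math.exp(r_h))
        for r_x, r_y, r_w, r_h in deltas
    ]


scores = [0.0] * 21   # made-up scores for classes 0..20 (0 = background)
scores[15] = 5.0      # class 15 (Person on the slide) gets the highest score
probs = softmax(scores)
best = max(range(21), key=lambda i: probs[i])
print(best, round(probs[best], 4))

deltas = [(0.0, 0.0, 0.0, 0.0)] * 21  # zero offsets leave the proposal unchanged
print(refine_boxes((10.0, 20.0, 100.0, 50.0), deltas)[best])
```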

Summary of Classification & Regression
• Input: region proposals
• Regress & classify each class from the proposals (Background, Person, TV monitor, Dining table, ⋯)
• Discard bounding boxes (p < 0.6 or background; shown as "None")
• Reduce redundancy by NMS
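The discard rule (p < 0.6 or background) can be sketched as a filter over per-proposal outputs. The data layout (a list of class probabilities plus one box per class) and the name `filter_detections` are assumptions made for this example, which uses three classes instead of 21 to stay short.

```python
def filter_detections(detections, min_prob=0.6, background=0):
    """Keep (class_index, probability, box) for confident, non-background proposals."""
    kept = []
    for probs, boxes in detections:
        cls = max(range(len(probs)), key=lambda i: probs[i])  # most probable class
        if cls == background or probs[cls] < min_prob:
            continue  # these are the boxes marked "None" on the slide
        kept.append((cls, probs[cls], boxes[cls]))
    return kept


box = (0, 0, 10, 10)
detections = [
    ([0.1, 0.8, 0.1], [box, box, box]),    # confident foreground: kept
    ([0.9, 0.05, 0.05], [box, box, box]),  # background: discarded
    ([0.3, 0.3, 0.4], [box, box, box]),    # below 0.6: discarded
]
print(filter_detections(detections))  # [(1, 0.8, (0, 0, 10, 10))]
```

The surviving boxes would then go through NMS to remove the remaining duplicates.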

Summary of Classifier & Regressor
Inputs:
• Conv feature map
• Region proposals
Outputs:
• Bounding box coordinates of objects in the image.
• Classification of bounding boxes

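Training process for RPN
Input
• Image $X^{(k)}$
Ground-truth
• Bounding boxes $B^{(k)} = \{B_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$, where $B_i^{(k)} = (x_i^{(k)}, y_i^{(k)}, w_i^{(k)}, h_i^{(k)})$
• Classes $C^{(k)} = \{C_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$
Anchor boxes
$A^{(k)} = \{A_j^{(k)}\}_{j=1}^{N_{anc}^{(k)}}$, where $A_j^{(k)} = (a_{x,j}^{(k)}, a_{y,j}^{(k)}, a_{w,j}^{(k)}, a_{h,j}^{(k)})$
Ground-truth proposals associated with anchors $A_j^{(k)}$ (marked with a star)
Find the nearest bounding box for each anchor: $B_i^{(k)} = \arg\max_{B \in B^{(k)}} IoU(B, A_j^{(k)})$
• Ground-truth probability of objectness:
$p_j^{*(k)} := 1$ if $IoU(B_i^{(k)}, A_j^{(k)}) > 0.7$, and $p_j^{*(k)} := 0$ if $IoU(B_i^{(k)}, A_j^{(k)}) < 0.3$
• Ground-truth proposal transformation: $t_j^{*(k)} := (t_{x,j}^{*(k)}, t_{y,j}^{*(k)}, t_{w,j}^{*(k)}, t_{h,j}^{*(k)})$, where
$t_{x,j}^{*(k)} = (x_i^{(k)} - a_{x,j}^{(k)}) / a_{w,j}^{(k)}$, $t_{y,j}^{*(k)} = (y_i^{(k)} - a_{y,j}^{(k)}) / a_{h,j}^{(k)}$,
$t_{w,j}^{*(k)} = \log(w_i^{(k)} / a_{w,j}^{(k)})$, $t_{h,j}^{*(k)} = \log(h_i^{(k)} / a_{h,j}^{(k)})$
Predicted proposals
• Predicted probability of objectness: $p_j^{(k)}$
• Predicted proposal transformation: $t_j^{(k)} = (t_{x,j}^{(k)}, t_{y,j}^{(k)}, t_{w,j}^{(k)}, t_{h,j}^{(k)})$, where
$\{(t_j^{(k)}, p_j^{(k)})\}_{j=1}^{N_{anc}^{(k)}} = RPN(CNN(X^{(k)}; W_{CNN}); W_{RPN})$
$L_{RPN}(p_j^{(k)}, t_j^{(k)}, p_j^{*(k)}, t_j^{*(k)}; W_{CNN}, W_{RPN}) = \dfrac{1}{2} \sum_{j=1}^{N_{batch}} H(p_j^{*(k)}, p_j^{(k)}) + \lambda_{RPN} \dfrac{1}{N_{anc}^{(k)}} \sum_{j=1}^{N_{batch}} p_j^{*(k)} \, \mathrm{smooth}_{L1}(t_j^{(k)} - t_j^{*(k)})$
where $H$ is the cross-entropy function and $\mathrm{smooth}_{L1}$ is applied to each of the four components and summed:
$\mathrm{smooth}_{L1}(x) = 0.5 x^2$ if $|x| < 1$, and $|x| - 0.5$ otherwise.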
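The RPN training targets and the regression term of the loss can be sketched as below. Returning `None` for anchors whose best IoU lies between 0.3 and 0.7 is an assumption for the case the slide leaves undefined (in the original Faster R-CNN paper such anchors do not contribute to training). All function names are hypothetical.

```python
import math


def encode(box, anchor):
    """Ground-truth transformation of a box (x, y, w, h) relative to an anchor."""
    x, y, w, h = box
    a_x, a_y, a_w, a_h = anchor
    return ((x - a_x) / a_w, (y - a_y) / a_h, math.log(w / a_w), math.log(h / a_h))


def objectness_label(best_iou, high=0.7, low=0.3):
    """1 for IoU > 0.7, 0 for IoU < 0.3, None (assumed: ignored) in between."""
    if best_iou > high:
        return 1
    if best_iou < low:
        return 0
    return None


def smooth_l1(x):
    """0.5 * x^2 if |x| < 1, |x| - 0.5 otherwise."""
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1 else ax - 0.5


def regression_loss(t_pred, t_true):
    """Sum of smooth L1 over the four components of the transformation."""
    return sum(smooth_l1(p - q) for p, q in zip(t_pred, t_true))


t_true = encode((10.0, 20.0, 256.0, 64.0), (10.0, 20.0, 128.0, 128.0))
print(t_true)                                         # centers match, so t_x = t_y = 0
print(regression_loss((0.0, 0.0, 0.0, 0.0), t_true))  # penalty for predicting no change
```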

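Training process for Classifier & Regressor
Input
• Image $X^{(k)}$
Ground-truth
• Bounding boxes $B^{(k)} = \{B_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$, where $B_i^{(k)} = (x_i^{(k)}, y_i^{(k)}, w_i^{(k)}, h_i^{(k)})$
• Classes $C^{(k)} = \{C_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$
Region proposals associated with anchors $A_j^{(k)}$
$P^{(k)} := \{(P_j^{(k)}, p_j^{(k)})\}_{j=1}^{N_{anc}^{(k)}}$, $P_j^{(k)} = (p_{x,j}^{(k)}, p_{y,j}^{(k)}, p_{w,j}^{(k)}, p_{h,j}^{(k)})$, where
$p_{x,j}^{(k)} = a_{x,j}^{(k)} + a_{w,j}^{(k)} t_{x,j}^{(k)}$, $p_{y,j}^{(k)} = a_{y,j}^{(k)} + a_{h,j}^{(k)} t_{y,j}^{(k)}$,
$p_{w,j}^{(k)} = a_{w,j}^{(k)} \exp(t_{w,j}^{(k)})$, $p_{h,j}^{(k)} = a_{h,j}^{(k)} \exp(t_{h,j}^{(k)})$
$P^{(k)} \leftarrow NMS(P^{(k)}, N_{prop})$
Ground-truth classification & regression associated with proposals $P_j^{(k)}$ (marked with a star)
Find the nearest bounding box for each proposal: $B_i^{(k)} = \arg\max_{B \in B^{(k)}} IoU(B, P_j^{(k)})$
• Ground-truth classification: $c_j^{*(k)} := (c_{j,0}^{*(k)}, \dots, c_{j,N_{cls}}^{*(k)})$, with
$c_j^{*(k)} = (1, 0, \dots, 0)$ if $IoU(B_i^{(k)}, P_j^{(k)}) < 0.5$, and $c_j^{*(k)} = (0, \dots, 0, 1, 0, \dots, 0)$ otherwise, where the 1 is the $(C_i^{(k)} + 1)$-th component
• Ground-truth regression: $r_j^{*(k)} := (r_{x,j}^{*(k)}, r_{y,j}^{*(k)}, r_{w,j}^{*(k)}, r_{h,j}^{*(k)})$, where
$r_{x,j}^{*(k)} = (x_i^{(k)} - p_{x,j}^{(k)}) / p_{w,j}^{(k)}$, $r_{y,j}^{*(k)} = (y_i^{(k)} - p_{y,j}^{(k)}) / p_{h,j}^{(k)}$,
$r_{w,j}^{*(k)} = \log(w_i^{(k)} / p_{w,j}^{(k)})$, $r_{h,j}^{*(k)} = \log(h_i^{(k)} / p_{h,j}^{(k)})$
Predicted classification & regression
• Predicted classification: $c_j^{(k)} = (c_{j,0}^{(k)}, \dots, c_{j,N_{cls}}^{(k)})$
• Predicted regression: $r_j^{(k)} = (r_{x,j}^{(k)}, r_{y,j}^{(k)}, r_{w,j}^{(k)}, r_{h,j}^{(k)})$, where
$\{(r_j^{(k)}, c_j^{(k)})\}_{j=1}^{N_{prop}} = CR(CNN(X^{(k)}; W_{CNN}), P^{(k)}; W_{CR})$
$L_{CR}(r_j^{(k)}, c_j^{(k)}, r_j^{*(k)}, c_j^{*(k)}; W_{CNN}, W_{CR}) = \sum_{j=1}^{N_{prop}} H(c_j^{*(k)}, c_j^{(k)}) + \lambda_{CR} \sum_{j=1}^{N_{prop}} (1 - c_{j,0}^{*(k)}) \, \mathrm{smooth}_{L1}(r_j^{(k)} - r_j^{*(k)})$
where $H$ is the cross-entropy function and $\mathrm{smooth}_{L1}$ is applied to each of the four components and summed:
$\mathrm{smooth}_{L1}(x) = 0.5 x^2$ if $|x| < 1$, and $|x| - 0.5$ otherwise.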
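The ground-truth class vector and the classification term of the loss can be sketched as follows. The sketch assumes class labels run from 1 to the number of object classes, with index 0 reserved for background, and clamps probabilities away from zero before taking the logarithm. The names `class_target` and `cross_entropy` are hypothetical.

```python
import math


def class_target(best_iou, class_index, num_classes=20, min_iou=0.5):
    """One-hot vector of length num_classes + 1; component 0 is background."""
    target = [0] * (num_classes + 1)
    if best_iou < min_iou:
        target[0] = 1            # poor overlap: the proposal counts as background
    else:
        target[class_index] = 1  # class_index assumed to lie in 1..num_classes
    return target


def cross_entropy(target, probs, eps=1e-12):
    """H(target, probs) = -sum_i target_i * log(probs_i)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))


target = class_target(best_iou=0.8, class_index=15)
uniform = [1.0 / 21] * 21
print(target.index(1))                 # 15
print(cross_entropy(target, uniform))  # log(21) for a uniform prediction
```

Because the regression term is weighted by one minus the background component of the target, background proposals contribute only to the classification term.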

The History of object detection
in deep learning
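[Figure: timeline of networks and detectors]
Backbone networks: AlexNet (2012.12), VggNet & InceptionNet (2014.9), ResNet (15.12.10)
Detectors: RCNN, Fast RCNN, Faster RCNN, Mask RCNN, Yolo, Yolo v2, SSD, DSSD
Dates on the detector timeline: 2013.11.11, 2015.4.30, 2015.5.14, 15.6.8, 15.12.25, 15.12.08, 17.1.23, 17.3.20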

Application to Ultrasound-based Fetal biometry


Thank you
E-mail: hpkim0512@yonsei.ac.kr
Homepage: https://hpkim0512.blogspot.com
