Tutorial
Faster R-CNN
Object Detection: Localization & Classification
Hwa Pyung Kim
Department of Computational Science and Engineering, Yonsei University
hpkim0512@yonsei.ac.kr
Object Detection: Classification + Regression
Classification (recognition): What?
• A one-hot class vector $(1, 0, 0, \dots)$ over the classes Dog, Cat, ⋯, Person
Bounding box regression (localization): Where?
• A bounding box $(x, y, w, h)$
Classification + Regression = Object Detection: "A dog at $(x, y, w, h)$"
Pipeline: Encoding (conv & pool) → Feature map → Combining features
Bounding box information
• $x, y$: top left corner position
• $w$ = width
• $h$ = height
CNN-based Object Detection:
There are clues of a dog (What) at a local position (Where) in the convolutional feature map
VGG16 network: input image [224, 224, 3] → conv & pooling → pool5 features [7, 7, 512]
• $7 = 224 / 32$, where $32 = 2^5$ and 5 = # of pooling layers
Fully-connected layers on the pool5 features produce:
• Classification: a one-hot vector $(1, 0, 0, \dots)$ over Dog, Cat, Person, ⋯
• Regression: the bounding box $(x, y, w, h)$
[Figure: the red boxes in the feature map contain clues of "dog at the bounding box $(x, y, w, h)$".]
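The size arithmetic on this slide can be checked with a short sketch. The function name `feature_map_size` is hypothetical, and rounding down with floor division is an assumption for inputs that are not multiples of 32.

```python
def feature_map_size(height, width, num_pooling=5):
    """Spatial size of the conv feature map after `num_pooling` 2x2 poolings."""
    stride = 2 ** num_pooling  # 2^5 = 32 for the five pooling layers of VGG16
    # Assumption: sizes that are not multiples of the stride are rounded down.
    return height // stride, width // stride


print(feature_map_size(224, 224))  # (7, 7)
print(feature_map_size(360, 480))  # (11, 15)
```

The second call uses a non-square 360 × 480 input, where 360 / 32 = 11.25 is rounded down to 11.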
Multiple Object Detection:
Localize and Classify all objects appearing in the image
How many objects are there?
• Classify these multiple overlapping objects
• Identify their bounding boxes
PASCAL VOC2007
Region-based CNN (R-CNN) method
• Extract "region proposals" using the selective search method.
• Affine image warping: compute a fixed-size CNN input from each region proposal, regardless of the region's shape.
• Feed each fixed-size CNN input to the ConvNet.
• Apply a Classifier & Regressor to each region proposal.
[Figure: example detections labeled Background, Person, Dining table.]
Fast R-CNN
Pipeline: image → ConvNet → feature map → RoI pooling → Classifier & Regressor
RoI pooling: Convert the features inside each valid RoI into a small feature map with a fixed spatial extent.
Faster R-CNN:
Towards Real-Time Object Detection with Region Proposal Networks
Pipeline: image → ConvNet → feature map → Region Proposal Network → proposals → RoI pooling → Classifier & Regressor
What is the Region Proposal Network?
Region Proposal Network (RPN)
Input image: 360 × 480
Conv feature map: 11 × 15 × 512
• $11 \approx 360 / 32$ (rounded down), $15 = 480 / 32$, where $32 = 2^5$ and 5 = # of pooling layers
• 512 = # of filters
The RPN takes the conv feature map and outputs a set of rectangular object proposals (region proposals), each with an objectness score.
How?
Region Proposals & Anchor Boxes
Conv feature map: 11 × 15 × 512
Input: each sliding window, 3 × 3 × 512
Fully-connected layers map each sliding window to 2 scores $(s^{obj}, s^{nobj})$ and 4 offsets $(t_x, t_y, t_w, t_h)$.
For each sliding window (red cuboid), expressed as a 3 × 3 × 512 vector, the proposal is parametrized relative to an anchor:
$p_x = a_x + a_w \cdot t_x$
$p_y = a_y + a_h \cdot t_y$
$p_w = a_w \cdot \exp(t_w)$
$p_h = a_h \cdot \exp(t_h)$
Output:
• 4 coordinates: $p_x, p_y, p_w, p_h$
• 2 scores: $s^{obj}, s^{nobj}$, which estimate the probability of object or not object for each proposal
Anchor box information
• $a_x, a_y$: center position
• $a_w$ = width
• $a_h$ = height
Anchor box
• $a_w$ and $a_h$ are fixed. For example, $a_w = a_h = 128$.
• $(a_x, a_y)$ is determined by the position of the red box.
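As a minimal sketch of the parametrization above, the following function turns one anchor and one set of predicted offsets into a proposal. The name `decode_proposal` and the sample numbers are illustrative assumptions, not taken from the slides.

```python
import math


def decode_proposal(anchor, offsets):
    """Apply offsets (t_x, t_y, t_w, t_h) to an anchor (a_x, a_y, a_w, a_h)."""
    a_x, a_y, a_w, a_h = anchor
    t_x, t_y, t_w, t_h = offsets
    p_x = a_x + a_w * t_x        # shift, measured in units of the anchor width
    p_y = a_y + a_h * t_y        # shift, measured in units of the anchor height
    p_w = a_w * math.exp(t_w)    # exp keeps the width positive
    p_h = a_h * math.exp(t_h)    # exp keeps the height positive
    return p_x, p_y, p_w, p_h


anchor = (240.0, 180.0, 128.0, 128.0)  # hypothetical anchor with a_w = a_h = 128
print(decode_proposal(anchor, (0.0, 0.0, 0.0, 0.0)))   # zero offsets return the anchor
print(decode_proposal(anchor, (0.5, -0.25, 0.0, 0.0)))  # shifted, same size
```

Because the shifts are scaled by the anchor size and the sizes pass through `exp`, the same offsets describe the same relative change for small and large anchors.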
Region Proposals & Anchor Boxes
9 Anchor boxes = 3 ratios × 3 scales
For example,
$a_{w1} = a_{h1} = 128$, $a_{w2} = a_{h2} = 2 \times 128$, $a_{w3} = a_{h3} = 4 \times 128$,
$a_{w4} = 2 \times a_{h4} = 128$, ⋯
$a_{w7} = \frac{1}{2} \times a_{h7} = 128$, ⋯
• $a_{wi}$ and $a_{hi}$ are fixed.
• $(a_{xi}, a_{yi})$ is determined by the position of the red box.
Conv feature map: 11 × 15 × 512
Input: each sliding window, 3 × 3 × 512
Fully-connected layers map each sliding window to $(s_i^{obj}, s_i^{nobj}, t_{xi}, t_{yi}, t_{wi}, t_{hi})$ for $i = 1, \dots, 9$.
For each sliding window (red cuboid), expressed as a 3 × 3 × 512 vector, the 9 proposals are parametrized relative to 9 anchors.
For $i = 1, \dots, 9$:
$p_{xi} = a_{xi} + a_{wi} \cdot t_{xi}$
$p_{yi} = a_{yi} + a_{hi} \cdot t_{yi}$
$p_{wi} = a_{wi} \cdot \exp(t_{wi})$
$p_{hi} = a_{hi} \cdot \exp(t_{hi})$
Output: For $i = 1, \dots, 9$,
• 4 coordinates: $p_{xi}, p_{yi}, p_{wi}, p_{hi}$
• 2 scores: $s_i^{obj}, s_i^{nobj}$, which estimate the probability of object or not object for each proposal
Anchor box information
• $a_{xi}, a_{yi}$: center position
• $a_{wi}$ = width
• $a_{hi}$ = height
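The 9 anchor shapes can be enumerated with a short sketch. The slides give only anchors 1 to 4 and 7 explicitly, so the loop order and the remaining shapes here are an assumed completion of that pattern, and `anchor_shapes` is a hypothetical name.

```python
def anchor_shapes(base=128, scales=(1, 2, 4), ratios=(1.0, 2.0, 0.5)):
    """Return 9 (a_w, a_h) pairs: 3 width/height ratios x 3 scales."""
    shapes = []
    for ratio in ratios:            # ratio = a_w / a_h (assumed convention)
        for scale in scales:
            a_w = base * scale      # widths 128, 256, 512
            a_h = a_w / ratio       # height follows from the ratio
            shapes.append((a_w, a_h))
    return shapes


for index, shape in enumerate(anchor_shapes(), start=1):
    print(index, shape)
```

With this ordering, anchor 4 has width 128 and height 64 (so $a_{w4} = 2 \times a_{h4}$) and anchor 7 has width 128 and height 256 (so $a_{w7} = \frac{1}{2} \times a_{h7}$), which agrees with the examples above.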
Extract 9 Proposals relative to 9 Anchors
Each 3 × 3 × 512 sliding window of the 11 × 15 × 512 conv feature map passes through fully-connected layers, which output $(s_i^{obj}, s_i^{nobj}, t_{xi}, t_{yi}, t_{wi}, t_{hi})$ for $i = 1, \dots, 9$.
Using the anchor boxes, for $i = 1, \dots, 9$:
$p_{xi} = a_{xi} + a_{wi} \cdot t_{xi}$
$p_{yi} = a_{yi} + a_{hi} \cdot t_{yi}$
$p_{wi} = a_{wi} \cdot \exp(t_{wi})$
$p_{hi} = a_{hi} \cdot \exp(t_{hi})$
$p_i = \dfrac{\exp(s_i^{obj})}{\exp(s_i^{obj}) + \exp(s_i^{nobj})}$
Proposals: $(p_i, p_{xi}, p_{yi}, p_{wi}, p_{hi})$ for $i = 1, \dots, 9$
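The objectness probability is a two-way softmax over the two scores. A minimal sketch follows; the function name `objectness` is hypothetical, and subtracting the larger score before exponentiating is a standard numerical-stability step that does not change the result.

```python
import math


def objectness(s_obj, s_nobj):
    """Two-class softmax: probability that the proposal contains an object."""
    m = max(s_obj, s_nobj)            # subtract the max to avoid overflow
    e_obj = math.exp(s_obj - m)
    e_nobj = math.exp(s_nobj - m)
    return e_obj / (e_obj + e_nobj)


print(objectness(0.0, 0.0))   # equal scores give 0.5
print(objectness(3.0, -1.0))  # a higher object score gives a probability above 0.5
```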
Generate Region Proposals
Conv feature map: 11 × 15 × 512
• Total # of windows = 11 × 15
• # of proposals per window = 9
• Total # of proposals: 11 × 15 × 9 = 1485
The proposals highly overlap each other!
Need to reduce redundancy.
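To make the count concrete, the sketch below places 9 anchors at every position of the 11 × 15 feature map. Centering each anchor at `(index + 0.5) * 32` pixels and the exact list of shapes are assumptions made for illustration; `grid_anchors` is a hypothetical name.

```python
from itertools import product

STRIDE = 32                 # 2^5, from the five pooling layers
SCALES = (128, 256, 512)    # anchor widths
RATIOS = (1.0, 2.0, 0.5)    # width / height; assumed completion of the slide's examples


def grid_anchors(rows=11, cols=15):
    """Return every anchor (a_x, a_y, a_w, a_h) over the feature-map grid."""
    anchors = []
    for row, col in product(range(rows), range(cols)):
        # Assumption: the anchor center sits at the middle of the window's 32x32 image cell.
        a_x = (col + 0.5) * STRIDE
        a_y = (row + 0.5) * STRIDE
        for ratio, scale in product(RATIOS, SCALES):
            anchors.append((a_x, a_y, scale, scale / ratio))
    return anchors


anchors = grid_anchors()
print(len(anchors))  # 1485
```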
Reduce redundancy by Non-Maximum Suppression (NMS)
Step 1.
Take the most probable proposal from the 1485 proposals
Proposal information
• $p_{xi}, p_{yi}$: top left corner position
• $p_{wi}$ = width
• $p_{hi}$ = height
• $p_i$ = objectness probability, with the proposals sorted so that $p_1 \ge p_2 \ge \dots \ge p_{1485}$
[Figure: proposals 1, 2, ⋯, 173, ⋯, 1480, ⋯, 1485, each given as $(p_{xi}, p_{yi}, p_{wi}, p_{hi}, p_i)$; proposal 1 is the most probable proposal.]
Reduce redundancy by Non-Maximum Suppression (NMS)
Step 1.
Take the most probable proposal from the 1485 proposals
Step 2.
Compute the IoU between the most probable proposal and the other proposals, and discard proposals having IoU > threshold (0.7)
IoU = (area of intersection) / (area of union)
[Figure: the most probable proposal compared with proposals 2, 173, 1480, and 1485; the IoU values shown are 0.83, 0.71, 0.30, and 0.]
30 proposals having IoU > 0.7 are discarded.
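A minimal sketch of the IoU computation used in Step 2, assuming boxes are given as (x, y, w, h) with (x, y) the top left corner, as in the proposal information above. The function name `iou` is an illustrative choice.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes with top-left (x, y)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # identical boxes: 1.0
print(iou((0, 0, 10, 10), (20, 20, 10, 10)))  # disjoint boxes: 0.0
```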
Reduce redundancy by NMS
Summary of steps 1-2 in NMS:
Given the most probable proposal, the blue proposals have IoU > threshold (0.7). These 30 proposals are discarded.
Step 3.
Take the next most probable proposal among the remaining 1485 − 30 proposals and repeat the previous process.
[Figure: for the next most probable proposal, 36 proposals having IoU > 0.7 are discarded.]
Reduce redundancy by NMS
Repeat the previous procedure until…
• Before NMS: 1,485 proposals
• After NMS: 300 proposals
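Steps 1 to 3 can be sketched as a greedy loop. The slides do not spell out the stopping rule, so stopping when no proposals remain or when `max_keep` proposals have been kept is an assumption; `max_keep=300` simply mirrors the "After NMS" count. Boxes are assumed to be (x, y, w, h) with a top-left (x, y), and the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes with top-left (x, y)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(proposals, iou_threshold=0.7, max_keep=300):
    """Greedy NMS over proposals given as (p_x, p_y, p_w, p_h, p)."""
    remaining = sorted(proposals, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining and len(kept) < max_keep:
        best = remaining.pop(0)   # Step 1 / Step 3: most probable remaining proposal
        kept.append(best)
        # Step 2: discard proposals that overlap the chosen one too much.
        remaining = [b for b in remaining if iou(best[:4], b[:4]) <= iou_threshold]
    return kept


proposals = [
    (0, 0, 10, 10, 0.9),    # most probable
    (1, 0, 10, 10, 0.8),    # IoU with the first is 90 / 110, above 0.7
    (20, 20, 10, 10, 0.7),  # no overlap with the first
]
print(nms(proposals))  # the second proposal is suppressed
```

Sorting once and filtering the remainder each round keeps the sketch short; it costs quadratic time in the worst case, which is acceptable for a few thousand proposals.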
Summary of RPN
Inputs:
• Conv feature map
Outputs:
• Region proposal coordinates.
• Probabilities representing how likely each region proposal is to contain an object.
Pipeline: image → ConvNet → feature map → Region Proposal Network → proposals → RoI pooling → Classifier & Regressor
Now we are ready to explain the Classifier & Regressor.
Classifier & Regressor
RoI pooling layer
The RoI pooling layer generates the inputs for the Classifier & Regressor.
Each proposal $(p_x, p_y, p_w, p_h)$ on the 360 × 480 image is mapped to $(p_x', p_y', p_w', p_h')$ on the 11 × 15 conv feature map:
$p_x' = p_x \cdot \frac{15}{480}$, $p_y' = p_y \cdot \frac{11}{360}$, $p_w' = p_w \cdot \frac{15}{480}$, $p_h' = p_h \cdot \frac{11}{360}$
Bilinear interpolation & max pooling: Convert the features inside each valid RoI into a small feature map with a fixed spatial extent (7 × 7).
Input for Classifier & Regressor: fixed-size
• 300 RoI pooled feature maps, one per proposal
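The two operations of this layer can be sketched in plain Python. Note the swap: the slide mentions bilinear interpolation followed by max pooling, whereas this sketch uses the simpler quantized max pooling over bins (in the style of Fast R-CNN) on a single channel. The names `scale_roi` and `roi_max_pool` are hypothetical.

```python
import math


def scale_roi(p_x, p_y, p_w, p_h, image_hw=(360, 480), fmap_hw=(11, 15)):
    """Map a proposal from image coordinates to feature-map coordinates."""
    sy = fmap_hw[0] / image_hw[0]   # 11 / 360
    sx = fmap_hw[1] / image_hw[1]   # 15 / 480
    return p_x * sx, p_y * sy, p_w * sx, p_h * sy


def roi_max_pool(fmap, roi, out_size=7):
    """Max-pool one channel (2-D list) inside roi=(x, y, w, h) to out_size x out_size."""
    rows, cols = len(fmap), len(fmap[0])
    x, y, w, h = roi
    pooled = []
    for i in range(out_size):
        # Row range of bin i, clamped to the map and at least one cell tall.
        r0 = min(rows - 1, max(0, math.floor(y + i * h / out_size)))
        r1 = min(rows, max(r0 + 1, math.ceil(y + (i + 1) * h / out_size)))
        row = []
        for j in range(out_size):
            c0 = min(cols - 1, max(0, math.floor(x + j * w / out_size)))
            c1 = min(cols, max(c0 + 1, math.ceil(x + (j + 1) * w / out_size)))
            row.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


fmap = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4 x 4 channel
print(roi_max_pool(fmap, (0, 0, 4, 4), out_size=2))        # [[5, 7], [13, 15]]
print(scale_roi(96, 72, 160, 144))
```

Whatever the size of the RoI, the output always has `out_size × out_size` entries per channel, which is what lets fully-connected layers consume it.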
Classifier & Regressor
Classification & Regression per proposal
RoI pooling converts each proposal $(p_x, p_y, p_w, p_h)$ into a 7 × 7 × 512 feature map. Fully-connected layers map it to a score $s_i$ and offsets $(r_{xi}, r_{yi}, r_{wi}, r_{hi})$ for each class $i = 0, \dots, 20$.
For $i = 0, \dots, 20$:
$x_i = p_x + p_w \cdot r_{xi}$
$y_i = p_y + p_h \cdot r_{yi}$
$w_i = p_w \cdot \exp(r_{wi})$
$h_i = p_h \cdot \exp(r_{hi})$
$p_i = \dfrac{\exp(s_i)}{\sum_{j=0}^{20} \exp(s_j)}$
Example: $p_0 = 0.0124$ (Background), $p_{15} = 0.9797$ (Person), $p_{20} = 0.0001$ (TV monitor)
Output per proposal: $(p_i, x_i, y_i, w_i, h_i)$ for each class $i$
Proposal classification & bounding-box regression:
Each of the 21 classes gets its own refined bounding-box prediction and an estimated probability.
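The class probabilities are a softmax over the 21 scores. A small sketch with made-up scores follows; the function name `class_probabilities` and the score values are assumptions for illustration only.

```python
import math


def class_probabilities(scores):
    """Softmax over class scores s_0, ..., s_20 (index 0 is Background)."""
    m = max(scores)                               # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.0] * 21
scores[15] = 5.0  # hypothetical scores that favor class 15 (Person)
probs = class_probabilities(scores)
best = max(range(len(probs)), key=lambda i: probs[i])
print(best, probs[best])
```

The refined box for the winning class would then be read from that class's own offsets $(r_{xi}, r_{yi}, r_{wi}, r_{hi})$.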
Summary of Classification & Regression
• Region proposals → RoI pooling (7 × 7 × 512) → flattened vector (7×7×512) → fully-connected layers (4096)
• Regress & classify each class from the proposals
• Discard bounding boxes (p < 0.6 or background)
• Reduce redundancy by NMS
[Figure: example results per proposal, labeled Background, Person, TV monitor, Dining table, or None.]
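The discard rule can be sketched as a simple filter applied before NMS. The tuple layout `(class_index, probability, box)` and the name `filter_detections` are assumptions for this example.

```python
def filter_detections(detections, min_prob=0.6, background=0):
    """Keep detections that are not background and have probability >= min_prob."""
    return [
        (cls, prob, box)
        for cls, prob, box in detections
        if cls != background and prob >= min_prob
    ]


detections = [
    (0, 0.95, (10, 10, 50, 80)),    # background: discarded
    (15, 0.98, (12, 8, 48, 85)),    # Person, confident: kept
    (20, 0.40, (200, 40, 60, 60)),  # TV monitor, p < 0.6: discarded
]
print(filter_detections(detections))
```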
Summary of Classifier & Regressor
Inputs:
• Conv feature map
• Region proposals
Outputs:
• Bounding box coordinates of the objects in the image.
• Class of each bounding box.
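Training process for RPN
Input
• Image $X^{(k)}$
Ground-truth
• Bounding boxes $B^{(k)} = \{B_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$, where $B_i^{(k)} = (x_i^{(k)}, y_i^{(k)}, w_i^{(k)}, h_i^{(k)})$
• Classes $C^{(k)} = \{C_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$
Anchor boxes
• $A^{(k)} = \{A_j^{(k)}\}_{j=1}^{N_{anc}^{(k)}}$, where $A_j^{(k)} = (a_{xj}^{(k)}, a_{yj}^{(k)}, a_{wj}^{(k)}, a_{hj}^{(k)})$
Predicted proposals
• Predicted probability of objectness: $p_j^{(k)}$
• Predicted proposal transformation: $t_j^{(k)} = (t_{xj}^{(k)}, t_{yj}^{(k)}, t_{wj}^{(k)}, t_{hj}^{(k)})$
where $\{(t_j^{(k)}, p_j^{(k)})\}_{j=1}^{N_{anc}^{(k)}} = RPN(CNN(X^{(k)}; W_{CNN}); W_{RPN})$
Ground-truth proposals associated with anchors $A_j^{(k)}$ (ground-truth quantities are written with a $*$)
Find the nearest bounding box for each anchor: $B_i^{(k)} = \arg\max_{B \in B^{(k)}} IoU(B, A_j^{(k)})$
• Ground-truth probability of objectness:
$$p_j^{*(k)} := \begin{cases} 1, & \text{if } IoU(B_i^{(k)}, A_j^{(k)}) > 0.7 \\ 0, & \text{if } IoU(B_i^{(k)}, A_j^{(k)}) < 0.3 \end{cases}$$
• Ground-truth proposal transformation: $t_j^{*(k)} := (t_{xj}^{*(k)}, t_{yj}^{*(k)}, t_{wj}^{*(k)}, t_{hj}^{*(k)})$, where
$$t_{xj}^{*(k)} = \frac{x_i^{(k)} - a_{xj}^{(k)}}{a_{wj}^{(k)}}, \quad t_{yj}^{*(k)} = \frac{y_i^{(k)} - a_{yj}^{(k)}}{a_{hj}^{(k)}}, \quad t_{wj}^{*(k)} = \log\frac{w_i^{(k)}}{a_{wj}^{(k)}}, \quad t_{hj}^{*(k)} = \log\frac{h_i^{(k)}}{a_{hj}^{(k)}}$$
Loss
$$L_{RPN}\left(p_j^{(k)}, t_j^{(k)}, p_j^{*(k)}, t_j^{*(k)}; W_{CNN}, W_{RPN}\right) = \frac{1}{2} \sum_{j=1}^{N_{batch}} H\left(p_j^{*(k)}, p_j^{(k)}\right) + \lambda_{RPN} \frac{1}{N_{anc}^{(k)}} \sum_{j=1}^{N_{batch}} p_j^{*(k)} \, smooth_{L1}\left(t_j^{(k)}, t_j^{*(k)}\right)$$
where $H$ is the cross-entropy function, $smooth_{L1}(t, t^*)$ applies $smooth_{L1}$ to each component of $t - t^*$ and sums the results, and
$$smooth_{L1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}$$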
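The RPN training targets and the regression penalty can be sketched as three small functions. Returning `None` for anchors whose IoU falls between 0.3 and 0.7 (so that they are left out of the loss) is an assumption, since the slide leaves that case undefined; all function names are hypothetical.

```python
import math


def objectness_label(iou_value, positive=0.7, negative=0.3):
    """Ground-truth objectness of an anchor from its best IoU with a ground-truth box."""
    if iou_value > positive:
        return 1
    if iou_value < negative:
        return 0
    return None  # assumption: in-between anchors do not contribute to the loss


def encode_offsets(gt_box, anchor):
    """Ground-truth transformation that maps the anchor onto the ground-truth box."""
    x, y, w, h = gt_box
    a_x, a_y, a_w, a_h = anchor
    return ((x - a_x) / a_w, (y - a_y) / a_h, math.log(w / a_w), math.log(h / a_h))


def smooth_l1(x):
    """Quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def regression_loss(t_pred, t_true):
    """Sum of smooth L1 penalties over the four offset components."""
    return sum(smooth_l1(p - q) for p, q in zip(t_pred, t_true))


anchor = (240.0, 180.0, 128.0, 128.0)   # hypothetical anchor
gt_box = (272.0, 180.0, 256.0, 64.0)    # hypothetical ground-truth box
target = encode_offsets(gt_box, anchor)
print(target)
print(regression_loss((0.0, 0.0, 0.0, 0.0), target))
```

`encode_offsets` is the inverse of the proposal parametrization: decoding the returned offsets against the same anchor reproduces the ground-truth box.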
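Training process for Classifier & Regressor
Input
• Image $X^{(k)}$
Ground-truth
• Bounding boxes $B^{(k)} = \{B_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$, where $B_i^{(k)} = (x_i^{(k)}, y_i^{(k)}, w_i^{(k)}, h_i^{(k)})$
• Classes $C^{(k)} = \{C_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$
Region proposals associated with anchors $A_j^{(k)}$
$P^{(k)} := \{(P_j^{(k)}, p_j^{(k)})\}_{j=1}^{N_{anc}^{(k)}}$, $P_j^{(k)} = (p_{xj}^{(k)}, p_{yj}^{(k)}, p_{wj}^{(k)}, p_{hj}^{(k)})$, where
$$p_{xj}^{(k)} = a_{xj}^{(k)} + a_{wj}^{(k)} t_{xj}^{(k)}, \quad p_{yj}^{(k)} = a_{yj}^{(k)} + a_{hj}^{(k)} t_{yj}^{(k)}, \quad p_{wj}^{(k)} = a_{wj}^{(k)} \exp\left(t_{wj}^{(k)}\right), \quad p_{hj}^{(k)} = a_{hj}^{(k)} \exp\left(t_{hj}^{(k)}\right)$$
$P^{(k)} \leftarrow NMS(P^{(k)}, N_{prop})$
Predicted Classification & Regression
• Predicted classification: $c_j^{(k)} = (c_{j,0}^{(k)}, \dots, c_{j,N_{cls}}^{(k)})$
• Predicted regression: $r_j^{(k)} = (r_{xj}^{(k)}, r_{yj}^{(k)}, r_{wj}^{(k)}, r_{hj}^{(k)})$
where $\{(r_j^{(k)}, c_j^{(k)})\}_{j=1}^{N_{anc}^{(k)}} = CR(CNN(X^{(k)}; W_{CNN}), P^{(k)}; W_{CR})$
Ground-truth Classification & Regression associated with proposals $P_j^{(k)}$ (ground-truth quantities are written with a $*$)
Find the nearest bounding box for each proposal: $B_i^{(k)} = \arg\max_{B \in B^{(k)}} IoU(B, P_j^{(k)})$
• Ground-truth classification:
$$c_j^{*(k)} := \left(c_{j,0}^{*(k)}, \dots, c_{j,N_{cls}}^{*(k)}\right) = \begin{cases} (1, 0, \dots, 0), & \text{if } IoU(B_i^{(k)}, P_j^{(k)}) < 0.5 \\ (0, \dots, 0, 1, 0, \dots, 0), & \text{otherwise} \end{cases}$$
where the 1 in the second case is the $(C_i^{(k)} + 1)$-th component.
• Ground-truth regression: $r_j^{*(k)} := (r_{xj}^{*(k)}, r_{yj}^{*(k)}, r_{wj}^{*(k)}, r_{hj}^{*(k)})$, where
$$r_{xj}^{*(k)} = \frac{x_i^{(k)} - p_{xj}^{(k)}}{p_{wj}^{(k)}}, \quad r_{yj}^{*(k)} = \frac{y_i^{(k)} - p_{yj}^{(k)}}{p_{hj}^{(k)}}, \quad r_{wj}^{*(k)} = \log\frac{w_i^{(k)}}{p_{wj}^{(k)}}, \quad r_{hj}^{*(k)} = \log\frac{h_i^{(k)}}{p_{hj}^{(k)}}$$
Loss
$$L_{CR}\left(r_j^{(k)}, c_j^{(k)}, r_j^{*(k)}, c_j^{*(k)}; W_{CNN}, W_{CR}\right) = \sum_{j=1}^{N_{prop}} H\left(c_j^{*(k)}, c_j^{(k)}\right) + \lambda_{CR} \sum_{j=1}^{N_{prop}} \left(1 - c_{j,0}^{*(k)}\right) smooth_{L1}\left(r_j^{(k)}, r_j^{*(k)}\right)$$
where $H$ is the cross-entropy function, $smooth_{L1}(r, r^*)$ applies $smooth_{L1}$ to each component of $r - r^*$ and sums the results, and
$$smooth_{L1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}$$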
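A per-proposal sketch of this loss follows, assuming class indices run from 1 to 20 with index 0 reserved for Background, so that the $(C_i + 1)$-th component is list index `C_i`. The function names and the sample numbers are hypothetical.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def class_target(gt_class, iou_value, num_classes=20, threshold=0.5):
    """One-hot ground-truth vector of length num_classes + 1 (index 0 = Background)."""
    target = [0] * (num_classes + 1)
    if iou_value < threshold:
        target[0] = 1            # poor overlap: the proposal counts as background
    else:
        target[gt_class] = 1     # the (C_i + 1)-th component, counting from 1
    return target


def cross_entropy(target, probs):
    """H(target, probs) for a one-hot target."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def cr_loss_term(target, probs, r_pred, r_true, lam=1.0):
    """Loss contribution of one proposal: classification plus gated regression."""
    regression = sum(smooth_l1(a - b) for a, b in zip(r_pred, r_true))
    # (1 - target[0]) switches the regression term off for background proposals.
    return cross_entropy(target, probs) + lam * (1 - target[0]) * regression


probs = [0.5, 0.25, 0.25]  # hypothetical softmax output for Background + 2 classes
foreground = class_target(1, 0.8, num_classes=2)
background = class_target(1, 0.3, num_classes=2)
print(cr_loss_term(foreground, probs, (0.5, 0.0, 0.0, 2.0), (0.0, 0.0, 0.0, 0.0)))
print(cr_loss_term(background, probs, (0.5, 0.0, 0.0, 2.0), (0.0, 0.0, 0.0, 0.0)))
```

For the background proposal only the cross-entropy term remains, so its box offsets receive no gradient.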
The History of object detection in deep learning
[Timeline figure. Backbone networks: AlexNet (2012.12), VggNet & InceptionNet (2014.9), ResNet (15.12.10). Detectors shown: RCNN, Fast RCNN, Faster RCNN, Mask RCNN, Yolo, Yolo v2, SSD, DSSD. Dates shown for the detectors: 2013.11.11, 2015.4.30, 2015.5.14, 15.6.8, 15.12.25, 15.12.08, 17.1.23, 17.3.20.]
Application to Ultrasound-based Fetal biometry
References
[Gitbooks] Object Localization and Detection
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/object_localization_and_detection.html
[ICCV2015 Tutorial] Convolutional Feature Maps
https://courses.engr.illinois.edu/ece420/sp2017/iccv2015_tutorial_convolutional_feature_maps_kaiminghe.pdf
[Infographic] The Modern History of Object Recognition
https://github.com/Nikasa1889/HistoryObjectRecognition
[Tensorflow Code] tf-Faster-RCNN
https://github.com/kevinjliang/tf-Faster-RCNN
[Medium] A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN
https://blog.athelas.com/a-brief-history-of-cnns-in-image-segmentation-from-r-cnn-to-mask-r-cnn-34ea83205de4
[pyimagesearch] Intersection over Union (IoU) for object detection
https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
[Stanford c231n] Lecture 11: Detection and Segmentation
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Thank you
E-mail: hpkim0512@yonsei.ac.kr
Homepage: https://hpkim0512.blogspot.com

Tutorial on Object Detection (Faster R-CNN)

  • 1. Tutorial Faster R-CNN Object Detection: Localization & Classification Hwa Pyung Kim Department of Computational Science and Engineering, Yonsei University hpkim0512@yonsei.ac.kr
  • 2. Object Detection: Classification + Regression. Classification (recognition): What? The output is a one-hot class vector $(1, 0, 0, \ldots)$ over Dog, Cat, …, Person. Bounding box regression (localization): Where? The output is $(x, y, w, h)$. Combined: a dog at $(x, y, w, h)$. Pipeline: Encoding (conv & pool) → Feature map → Combining features. Bounding box information: $(x, y)$ = top-left corner position, $w$ = width, $h$ = height.
  • 3. CNN-based Object Detection with VGG16 networks: the input image $[224, 224, 3]$ is encoded into pool5 features $[7, 7, 512]$, where $7 = 224 / 32$ and $32 = 2^5$ (5 = # of pooling). There are clues of dog (What) at local position (Where) in the convolution feature map. These red boxes contain clues of “dog at the bounding box $(x, y, w, h)$”. Fully-connected layers then produce the Classification output (one-hot vector $(1, 0, 0, \ldots)$ over Dog, Cat, Person, …) and the Regression output $(x, y, w, h)$.
  • 4. Multiple Object Detection: Localize and Classify all objects appearing in the image How many objects are in there? • Classify these multiply overlapping objects • Identify their bounding boxes PASCAL VOC2007
  • 5. Region based CNN (R-CNN) method. Extract “region proposals” using the selective search method. Affine image warping: compute a fixed-size CNN input from each region proposal, regardless of the region’s shape. Each warped region passes through the ConvNet and its own Classifier & Regressor (example labels in the figure: Background, Person, Dining table).
  • 6. Fast R-CNN. Components: ConvNet, feature map, RoI pooling, Classifier & Regressor. RoI pooling: Convert the features inside valid RoI into a small feature map with a fixed spatial extent.
  • 7. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Pipeline: ConvNet → feature map → Region Proposal Network → proposals → RoI pooling → Classifier & Regressor. What is a Region Proposal Network?
  • 8. Region Proposal Network (RPN). For a $360 \times 480$ input image, the conv feature map is $11 \times 15 \times 512$, where $11 = \lfloor 360 / 32 \rfloor$, $15 = 480 / 32$, $32 = 2^5$ (5 = # of pooling), and 512 = # of filters. RPN outputs a set of rectangular object proposals (region proposals), each with an objectness score. How?
  • 9. Region Proposals & Anchor Boxes. Input to the Region Proposal Network: each $3 \times 3 \times 512$ sliding window on the $11 \times 15 \times 512$ conv feature map. For each sliding window (red cuboid), expressed as a $3 \times 3 \times 512$ vector, fully-connected layers output two scores $s^{obj}, s^{nobj}$ and four offsets $t_x, t_y, t_w, t_h$, and the proposal is parametrized relative to an anchor: $p_x = a_x + a_w \cdot t_x$, $p_y = a_y + a_h \cdot t_y$, $p_w = a_w \cdot \exp(t_w)$, $p_h = a_h \cdot \exp(t_h)$. Output: 4 coordinates $p_x, p_y, p_w, p_h$ and 2 scores $s^{obj}, s^{nobj}$ that estimate the probability of object or not object for each proposal. Anchor box information: $(a_x, a_y)$ = center position, $a_w$ = width, $a_h$ = height. For example, $a_w = a_h = 128$. $a_w$ and $a_h$ are fixed; $(a_x, a_y)$ is determined by the position of the red box.
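The anchor parametrization on slide 9 can be sketched in a few lines of Python. This is only an illustration: the function name `decode_proposal` and the sample numbers are assumptions, not part of the slides.

```python
import math

def decode_proposal(anchor, offsets):
    """Turn RPN offsets (t_x, t_y, t_w, t_h) into a proposal, relative to an anchor.

    anchor  = (a_x, a_y, a_w, a_h), with (a_x, a_y) the anchor center.
    offsets = (t_x, t_y, t_w, t_h), as predicted for one sliding window.
    """
    a_x, a_y, a_w, a_h = anchor
    t_x, t_y, t_w, t_h = offsets
    p_x = a_x + a_w * t_x        # shift in x, scaled by the anchor width
    p_y = a_y + a_h * t_y        # shift in y, scaled by the anchor height
    p_w = a_w * math.exp(t_w)    # width is rescaled multiplicatively
    p_h = a_h * math.exp(t_h)    # height is rescaled multiplicatively
    return (p_x, p_y, p_w, p_h)

# Hypothetical anchor with a_w = a_h = 128, as in the slide's example.
anchor = (240.0, 180.0, 128.0, 128.0)
print(decode_proposal(anchor, (0.0, 0.0, 0.0, 0.0)))   # zero offsets give back the anchor
print(decode_proposal(anchor, (0.5, -0.25, math.log(2.0), 0.0)))
```

Because widths and heights go through `exp`, any real-valued offset yields a positive size.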
  • 10. Region Proposals & Anchor Boxes. Input to the Region Proposal Network: each $3 \times 3 \times 512$ sliding window on the $11 \times 15 \times 512$ conv feature map. 9 anchor boxes = 3 ratios × 3 scales. For example, $a_{w1} = a_{h1} = 128$, $a_{w2} = a_{h2} = 2 \times 128$, $a_{w3} = a_{h3} = 4 \times 128$, $a_{w4} = 2 \times a_{h4} = 128$, ⋯, $a_{w7} = \frac{1}{2} \times a_{h7} = 128$, ⋯. $a_{wi}$ and $a_{hi}$ are fixed; $(a_{xi}, a_{yi})$ is determined by the position of the red box. For each sliding window (red cuboid), expressed as a $3 \times 3 \times 512$ vector, fully-connected layers output $s_i^{obj}, s_i^{nobj}, t_{xi}, t_{yi}, t_{wi}, t_{hi}$ for $i = 1, \cdots, 9$, and the 9 proposals are parametrized relative to 9 anchors: $p_{xi} = a_{xi} + a_{wi} \cdot t_{xi}$, $p_{yi} = a_{yi} + a_{hi} \cdot t_{yi}$, $p_{wi} = a_{wi} \cdot \exp(t_{wi})$, $p_{hi} = a_{hi} \cdot \exp(t_{hi})$. Output: for $i = 1, \cdots, 9$, 4 coordinates $p_{xi}, p_{yi}, p_{wi}, p_{hi}$ and 2 scores $s_i^{obj}, s_i^{nobj}$ that estimate the probability of object or not object for each proposal. Anchor box information: $(a_{xi}, a_{yi})$ = center position, $a_{wi}$ = width, $a_{hi}$ = height.
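A minimal sketch of how the 9 anchor shapes on slide 10 might be enumerated. The ordering (all scales of one ratio before moving to the next ratio) and the rule "width = base × scale, height = width / ratio" are assumptions chosen to reproduce the slide's examples for anchors 1, 2, 3, 4 and 7; the slide does not list the remaining anchors, and the names `anchor_shapes` and `anchors_at` are hypothetical.

```python
def anchor_shapes(base=128, scales=(1, 2, 4), ratios=(1.0, 2.0, 0.5)):
    """Return 9 (width, height) pairs = 3 ratios x 3 scales; ratio means width / height."""
    shapes = []
    for ratio in ratios:
        for scale in scales:
            width = base * scale
            height = width / ratio
            shapes.append((width, height))
    return shapes

def anchors_at(center_x, center_y):
    """Attach the 9 fixed shapes to one sliding-window position (the anchor center)."""
    return [(center_x, center_y, w, h) for (w, h) in anchor_shapes()]

for index, (w, h) in enumerate(anchor_shapes(), start=1):
    print(index, w, h)
```

The shapes are fixed in advance; only the center changes as the window slides over the feature map.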
  • 11. Extract 9 Proposals relative to 9 Anchors. For each $3 \times 3 \times 512$ sliding window on the $11 \times 15 \times 512$ conv feature map, the fully-connected layers of the Region Proposal Network output $(s_i^{obj}, s_i^{nobj}, t_{xi}, t_{yi}, t_{wi}, t_{hi})$ for $i = 1, \cdots, 9$. Using the anchor boxes, these are converted to proposals $(p_i, p_{xi}, p_{yi}, p_{wi}, p_{hi})$: $p_{xi} = a_{xi} + a_{wi} \cdot t_{xi}$, $p_{yi} = a_{yi} + a_{hi} \cdot t_{yi}$, $p_{wi} = a_{wi} \cdot \exp(t_{wi})$, $p_{hi} = a_{hi} \cdot \exp(t_{hi})$, and $p_i = \frac{\exp(s_i^{obj})}{\exp(s_i^{obj}) + \exp(s_i^{nobj})}$.
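The objectness probability on slide 11 is a two-way softmax over the scores $s^{obj}$ and $s^{nobj}$. A small sketch follows; the function name `objectness` is an assumption, and subtracting the larger score before exponentiating is a standard numerical-stability step that does not change the result.

```python
import math

def objectness(s_obj, s_nobj):
    """p = exp(s_obj) / (exp(s_obj) + exp(s_nobj)), computed without overflow."""
    largest = max(s_obj, s_nobj)
    e_obj = math.exp(s_obj - largest)
    e_nobj = math.exp(s_nobj - largest)
    return e_obj / (e_obj + e_nobj)

print(objectness(0.0, 0.0))   # equal scores: the two outcomes are equally likely
print(objectness(2.0, 0.0))
```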
  • 12. Generate Region Proposals. Total # of windows on the $11 \times 15 \times 512$ conv feature map $= 11 \times 15$; # of proposals per window $= 9$. Total # of proposals: $11 \times 15 \times 9 = 1485$. The proposals highly overlap each other! Need to reduce redundancy.
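The count on slide 12 follows directly from the image size and the number of pooling steps. The sketch below reproduces the arithmetic under the slides' assumptions (5 poolings, so a stride of $2^5 = 32$, with integer division); the function name `feature_map_size` is hypothetical.

```python
def feature_map_size(height, width, num_poolings=5):
    """Spatial size of the conv feature map after num_poolings 2x2 poolings."""
    stride = 2 ** num_poolings          # 32 for 5 poolings
    return height // stride, width // stride

rows, cols = feature_map_size(360, 480)
anchors_per_window = 9
total_proposals = rows * cols * anchors_per_window
print(rows, cols, total_proposals)
```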
  • 13. Reduce redundancy by Non-Maximum Suppression (NMS). Step 1. Take the most probable proposal from the 1485 proposals. Proposal information: $(p_{xi}, p_{yi})$ = top-left corner position, $p_{wi}$ = width, $p_{hi}$ = height, $p_i$ = objectness probability, sorted so that $p_1 \ge p_2 \ge \cdots \ge p_{1485}$. Each proposal $i$ is the tuple $(p_{xi}, p_{yi}, p_{wi}, p_{hi}, p_i)$; the figure shows proposals 1 (the most probable proposal), 2, 173, 1480, and 1485.
  • 14. Reduce redundancy by Non-Maximum Suppression (NMS). Step 1. Take the most probable proposal from the 1485 proposals. Step 2. Compute the $IoU$ between the most probable proposal and the other proposals, and discard proposals having $IoU > threshold$ (0.7). Figure: the other proposals (2, …, 173, …, 1480, …, 1485) annotated with their IoU values; the values shown are 0.83, 0.71, 0.30, and 0.
  • 15. Reduce redundancy by Non-Maximum Suppression (NMS). Step 1. Take the most probable proposal from the 1485 proposals. Step 2. Compute the $IoU$ between the most probable proposal and the other proposals, and discard proposals having $IoU > threshold$ (0.7). Figure: the same proposals and IoU values (0.83, 0.71, 0.30, 0) as on the previous slide, with an illustration of how $IoU$ is computed.
  • 16. Reduce redundancy by NMS. Summary of steps 1-2 in NMS: given the most probable proposal, the blue proposals have $IoU > threshold$ (0.7); these 30 proposals are discarded. Step 3. Get the next most probable proposal among the remaining $1485 - 30$ proposals and repeat the previous process; for the next most probable proposal, 36 proposals having $IoU > 0.7$ are discarded.
  • 17. Reduce redundancy by NMS. Repeat the previous procedure until… Before NMS: 1,485 proposals. After NMS: 300 proposals.
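Steps 1 to 3 of NMS can be sketched as a greedy loop. This is a minimal illustration, not the slides' implementation: boxes are taken as $(x, y, w, h)$ with a top-left corner as on slide 13, the threshold 0.7 comes from slide 14, and the cap of 300 kept proposals is an assumption about how the "After NMS: 300 proposals" figure on slide 17 arises. The names `iou` and `nms` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) boxes with top-left corners."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def nms(proposals, threshold=0.7, max_keep=300):
    """Greedy NMS over proposals given as (x, y, w, h, p) tuples."""
    # Sort so that the most probable proposal comes first.
    remaining = sorted(proposals, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining and len(kept) < max_keep:
        best = remaining.pop(0)                      # Step 1: most probable proposal
        kept.append(best)
        remaining = [b for b in remaining            # Step 2: drop heavy overlaps
                     if iou(best[:4], b[:4]) <= threshold]
        # Step 3: the loop repeats with the next most probable survivor.
    return kept

proposals = [(0, 0, 10, 10, 0.9), (1, 0, 10, 10, 0.8), (20, 20, 10, 10, 0.7)]
print(nms(proposals))
```

In the toy input, the second box overlaps the first with IoU 90/110 ≈ 0.82 and is discarded, while the third box does not overlap and is kept.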
  • 18. Summary of RPN. Inputs: Conv feature map. Outputs: Region proposal coordinates; probabilities representing how likely the image region in each proposal is to contain an object.
  • 19. Pipeline: ConvNet → feature map → Region Proposal Network → proposals → RoI pooling → Classifier & Regressor. Now we are ready to explain Classifier & Regressor.
  • 20. RoI pooling layer. Convert the features inside valid RoI into a small feature map with a fixed spatial extent. A proposal $(p_x, p_y, p_w, p_h)$ on the $360 \times 480$ image is mapped to $(p_x', p_y', p_w', p_h')$ on the $11 \times 15$ conv feature map: $p_x' = p_x \cdot \frac{15}{480}$, $p_y' = p_y \cdot \frac{11}{360}$, $p_w' = p_w \cdot \frac{15}{480}$, $p_h' = p_h \cdot \frac{11}{360}$. Bilinear interpolation & max pooling then produce the fixed-size ($7 \times 7$) input for the Classifier & Regressor.
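A rough sketch of the two RoI pooling steps on slide 20: projecting a proposal from image coordinates onto the feature map, then max pooling the RoI into a fixed grid. Several things here are assumptions: the scale factors 15/480 and 11/360 are inferred from the image and feature-map sizes, the pooling uses integer bins only (the bilinear interpolation mentioned on the slide is left out), the feature map is a single-channel 2D list, and the names `project_roi` and `roi_max_pool` are hypothetical.

```python
def project_roi(p_x, p_y, p_w, p_h, image_size=(360, 480), map_size=(11, 15)):
    """Scale a proposal from image coordinates to feature-map coordinates."""
    img_h, img_w = image_size
    map_h, map_w = map_size
    return (p_x * map_w / img_w, p_y * map_h / img_h,
            p_w * map_w / img_w, p_h * map_h / img_h)

def roi_max_pool(feature, x0, y0, w, h, out_size=7):
    """Max pool the RoI [y0, y0+h) x [x0, x0+w) of a 2D feature map to out_size x out_size.

    Bin edges use floor for the start and ceil for the end, so every bin
    covers at least one cell even when the RoI is smaller than out_size.
    """
    pooled = []
    for i in range(out_size):
        r0 = y0 + (i * h) // out_size
        r1 = y0 + ((i + 1) * h + out_size - 1) // out_size
        row = []
        for j in range(out_size):
            c0 = x0 + (j * w) // out_size
            c1 = x0 + ((j + 1) * w + out_size - 1) // out_size
            row.append(max(feature[r][c]
                           for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled

feature = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(project_roi(160, 90, 320, 180))
print(roi_max_pool(feature, 0, 0, 4, 4, out_size=2))
```

Whatever the RoI size, the output grid has the same shape, which is what lets the fully-connected layers that follow accept every proposal.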
  • 21. RoI pooling layer generates inputs for the Classifier & Regressor: 300 RoI pooled feature maps, each of size $7 \times 7 \times 512$.
  • 22. Proposal Classification & Bounding-box regression (per each proposal). After RoI pooling, the $7 \times 7 \times 512$ feature map of a proposal $(p_x, p_y, p_w, p_h)$ is passed through fully-connected layers (the figure labels a $7 \times 7 \times 512$ input and 4096 units), which output a score and four offsets $(s_i, r_{xi}, r_{yi}, r_{wi}, r_{hi})$ for each class $i = 0, \cdots, 20$ (e.g. 0 = Background, 15 = Person, 20 = TV monitor). These are converted to $(p_i, x_i, y_i, w_i, h_i)$: $x_i = p_x + p_w \cdot r_{xi}$, $y_i = p_y + p_h \cdot r_{yi}$, $w_i = p_w \cdot \exp(r_{wi})$, $h_i = p_h \cdot \exp(r_{hi})$, and $p_i = \frac{\exp(s_i)}{\sum_{j=0}^{20} \exp(s_j)}$. Example: $p_0 = 0.0124$, $p_{15} = 0.9797$, $p_{20} = 0.0001$. Each of the 21 classes gets its own refined bounding-box prediction and an estimated probability.
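The per-class outputs on slide 22 can be illustrated with a softmax over the 21 class scores and one refined box per class. This is a sketch under stated assumptions: the function names `softmax` and `refine_boxes` and all input values are made up for illustration.

```python
import math

def softmax(scores):
    """p_i = exp(s_i) / sum_j exp(s_j), shifted by the max score for stability."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def refine_boxes(proposal, class_offsets):
    """Apply per-class offsets (r_x, r_y, r_w, r_h) to one proposal (p_x, p_y, p_w, p_h)."""
    p_x, p_y, p_w, p_h = proposal
    return [(p_x + p_w * r_x, p_y + p_h * r_y,
             p_w * math.exp(r_w), p_h * math.exp(r_h))
            for (r_x, r_y, r_w, r_h) in class_offsets]

# Hypothetical scores: class 15 (Person) dominates, all others are equal.
scores = [0.0] * 21
scores[15] = 6.0
probabilities = softmax(scores)
best_class = max(range(21), key=lambda i: probabilities[i])
boxes = refine_boxes((10.0, 20.0, 100.0, 50.0), [(0.0, 0.0, 0.0, 0.0)] * 21)
print(best_class, round(probabilities[best_class], 4), boxes[best_class])
```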
  • 23. Summary of Classification & Regression. Steps shown, starting from the region proposals: regress & classify each class from proposals (Background, Person, TV monitor, …); discard bounding boxes ($p < 0.6$ or background); reduce redundancy by NMS. In the figure, one proposal ends up labeled Dining table and discarded proposals are marked None.
  • 24. Summary of Classifier & Regressor. Inputs: Conv feature map; region proposals. Outputs: Bounding-box coordinates of objects in the image; classification of the bounding boxes.
  • 25. Training process for RPN (ground-truth quantities are marked with $*$).
      Input: image $X^{(k)}$.
      Ground-truth: bounding boxes $B^{(k)} = \{B_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$, where $B_i^{(k)} = (x_i^{(k)}, y_i^{(k)}, w_i^{(k)}, h_i^{(k)})$; classes $C^{(k)} = \{C_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$.
      Anchor boxes: $A^{(k)} = \{A_j^{(k)}\}_{j=1}^{N_{anc}^{(k)}}$, where $A_j^{(k)} = (a_{xj}^{(k)}, a_{yj}^{(k)}, a_{wj}^{(k)}, a_{hj}^{(k)})$.
      Predicted proposals: predicted probability of objectness $p_j^{(k)}$ and predicted proposal transformation $t_j^{(k)} = (t_{xj}^{(k)}, t_{yj}^{(k)}, t_{wj}^{(k)}, t_{hj}^{(k)})$, where $\{t_j^{(k)}, p_j^{(k)}\}_{j=1}^{N_{anc}^{(k)}} = RPN(CNN(X^{(k)}; W_{CNN}); W_{RPN})$.
      Ground-truth proposals associated with anchors $A_j^{(k)}$: find the nearest bounding box for each anchor, $B_i^{(k)} = \arg\max_{B \in B^{(k)}} IoU(B, A_j^{(k)})$.
      Ground-truth probability of objectness: $p_j^{*(k)} := 1$ if $IoU(B_i^{(k)}, A_j^{(k)}) > 0.7$, and $p_j^{*(k)} := 0$ if $IoU(B_i^{(k)}, A_j^{(k)}) < 0.3$.
      Ground-truth proposal transformation: $t_j^{*(k)} := (t_{xj}^{*(k)}, t_{yj}^{*(k)}, t_{wj}^{*(k)}, t_{hj}^{*(k)})$, where $t_{xj}^{*(k)} = (x_i^{(k)} - a_{xj}^{(k)}) / a_{wj}^{(k)}$, $t_{yj}^{*(k)} = (y_i^{(k)} - a_{yj}^{(k)}) / a_{hj}^{(k)}$, $t_{wj}^{*(k)} = \log(w_i^{(k)} / a_{wj}^{(k)})$, $t_{hj}^{*(k)} = \log(h_i^{(k)} / a_{hj}^{(k)})$.
      Loss: $L_{RPN}(p_j^{(k)}, t_j^{(k)}, p_j^{*(k)}, t_j^{*(k)}; W_{CNN}, W_{RPN}) = \frac{1}{2} \sum_{j=1}^{N_{batch}} H(p_j^{*(k)}, p_j^{(k)}) + \lambda_{RPN} \frac{1}{N_{anc}^{(k)}} \sum_{j=1}^{N_{batch}} p_j^{*(k)} \, smooth_{L1}(t_j^{*(k)}, t_j^{(k)})$, where $H$ is the cross-entropy function and $smooth_{L1}(x) = 0.5 x^2$ if $|x| < 1$, $|x| - 0.5$ otherwise.
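The ground-truth construction for RPN training on slide 25 can be sketched as follows. Assumptions are flagged in the comments: all boxes are treated as $(x, y, w, h)$ with $(x, y)$ the center so that IoU and offsets use one convention, anchors with IoU between 0.3 and 0.7 get no label here because the slide assigns them none, and the names `iou`, `objectness_label`, `encode_target`, `smooth_l1` and `match_anchor` are hypothetical.

```python
import math

def iou(box_a, box_b):
    """IoU of two (x, y, w, h) boxes; (x, y) is assumed to be the box center."""
    ax0, ay0 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax1, ay1 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx0, by0 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx1, by1 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter = max(0.0, min(ax1, bx1) - max(ax0, bx0)) * max(0.0, min(ay1, by1) - max(ay0, by0))
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def objectness_label(best_iou, high=0.7, low=0.3):
    """1 if IoU > 0.7, 0 if IoU < 0.3, None otherwise (no label; an assumption)."""
    if best_iou > high:
        return 1
    if best_iou < low:
        return 0
    return None

def encode_target(box, anchor):
    """Ground-truth transformation of an anchor towards its nearest box."""
    x, y, w, h = box
    a_x, a_y, a_w, a_h = anchor
    return ((x - a_x) / a_w, (y - a_y) / a_h, math.log(w / a_w), math.log(h / a_h))

def smooth_l1(x):
    """0.5 x^2 if |x| < 1, |x| - 0.5 otherwise."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def match_anchor(anchor, gt_boxes):
    """Pick the ground-truth box with the largest IoU; return (label, target)."""
    nearest = max(gt_boxes, key=lambda b: iou(b, anchor))
    return objectness_label(iou(nearest, anchor)), encode_target(nearest, anchor)

gt_boxes = [(300.0, 300.0, 50.0, 50.0), (50.0, 50.0, 100.0, 100.0)]
print(match_anchor((50.0, 50.0, 100.0, 100.0), gt_boxes))
print(match_anchor((60.0, 50.0, 100.0, 100.0), gt_boxes))
```

The regression term of the loss would then sum `smooth_l1` over the four components of the difference between predicted and ground-truth offsets, for anchors labeled 1 only.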
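  • 26. Training process for Classifier & Regressor (ground-truth quantities are marked with $*$).
      Input: image $X^{(k)}$.
      Ground-truth: bounding boxes $B^{(k)} = \{B_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$, where $B_i^{(k)} = (x_i^{(k)}, y_i^{(k)}, w_i^{(k)}, h_i^{(k)})$; classes $C^{(k)} = \{C_i^{(k)}\}_{i=1}^{N_{obj}^{(k)}}$.
      Region proposals associated with anchors $A_j^{(k)}$: $P^{(k)} := \{P_j^{(k)}, p_j^{(k)}\}_{j=1}^{N_{anc}^{(k)}}$, $P_j^{(k)} = (p_{xj}^{(k)}, p_{yj}^{(k)}, p_{wj}^{(k)}, p_{hj}^{(k)})$, where $p_{xj}^{(k)} = a_{xj}^{(k)} + a_{wj}^{(k)} t_{xj}^{(k)}$, $p_{yj}^{(k)} = a_{yj}^{(k)} + a_{hj}^{(k)} t_{yj}^{(k)}$, $p_{wj}^{(k)} = a_{wj}^{(k)} \exp(t_{wj}^{(k)})$, $p_{hj}^{(k)} = a_{hj}^{(k)} \exp(t_{hj}^{(k)})$, followed by $P^{(k)} \leftarrow NMS(P^{(k)}, N_{prop})$.
      Predicted Classification & Regression: predicted classification $c_j^{(k)} = (c_{j,0}^{(k)}, \cdots, c_{j,N_{cls}}^{(k)})$ and predicted regression $r_j^{(k)} = (r_{xj}^{(k)}, r_{yj}^{(k)}, r_{wj}^{(k)}, r_{hj}^{(k)})$, where $\{r_j^{(k)}, c_j^{(k)}\}_{j=1}^{N_{anc}^{(k)}} = CR(CNN(X^{(k)}; W_{CNN}), P^{(k)}; W_{CR})$.
      Ground-truth Classification & Regression associated with proposals $P_j^{(k)}$: find the nearest bounding box for each proposal, $B_i^{(k)} = \arg\max_{B \in B^{(k)}} IoU(B, P_j^{(k)})$.
      Ground-truth Classification: $c_j^{*(k)} := (c_{j,0}^{*(k)}, \cdots, c_{j,N_{cls}}^{*(k)}) = (1, 0, \cdots, 0)$ if $IoU(B_i^{(k)}, P_j^{(k)}) < 0.5$, and $(0, \cdots, 0, 1, 0, \cdots, 0)$ otherwise, with the 1 in the $(C_i^{(k)} + 1)$-th component.
      Ground-truth Regression: $r_j^{*(k)} := (r_{xj}^{*(k)}, r_{yj}^{*(k)}, r_{wj}^{*(k)}, r_{hj}^{*(k)})$, where $r_{xj}^{*(k)} = (x_i^{(k)} - p_{xj}^{(k)}) / p_{wj}^{(k)}$, $r_{yj}^{*(k)} = (y_i^{(k)} - p_{yj}^{(k)}) / p_{hj}^{(k)}$, $r_{wj}^{*(k)} = \log(w_i^{(k)} / p_{wj}^{(k)})$, $r_{hj}^{*(k)} = \log(h_i^{(k)} / p_{hj}^{(k)})$.
      Loss: $L_{CR}(r_j^{(k)}, c_j^{(k)}, r_j^{*(k)}, c_j^{*(k)}; W_{CNN}, W_{CR}) = \sum_{j=1}^{N_{prop}} H(c_j^{*(k)}, c_j^{(k)}) + \lambda_{CR} \sum_{j=1}^{N_{prop}} (1 - c_{j,0}^{*(k)}) \, smooth_{L1}(r_j^{*(k)}, r_j^{(k)})$, where $H$ is the cross-entropy function and $smooth_{L1}(x) = 0.5 x^2$ if $|x| < 1$, $|x| - 0.5$ otherwise.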
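A minimal sketch of the per-proposal terms in the Classifier & Regressor loss on slide 26. It assumes 20 object classes plus background at index 0, takes class indices from 1 to 20, and applies smooth L1 component-wise to the difference between predicted and ground-truth offsets; the names `one_hot_label`, `cross_entropy`, `smooth_l1` and `proposal_loss` and the sample numbers are hypothetical.

```python
import math

def one_hot_label(best_iou, gt_class, num_classes=20, threshold=0.5):
    """Background one-hot if IoU < 0.5, otherwise one-hot at the ground-truth class."""
    label = [0] * (num_classes + 1)      # index 0 is background
    if best_iou < threshold:
        label[0] = 1
    else:
        label[gt_class] = 1              # the (gt_class + 1)-th component
    return label

def cross_entropy(target, probs, eps=1e-12):
    """H(target, probs) = -sum_i target_i * log(probs_i); eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))

def smooth_l1(x):
    """0.5 x^2 if |x| < 1, |x| - 0.5 otherwise."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def proposal_loss(target, probs, gt_offsets, pred_offsets, lam=1.0):
    """Classification term plus the regression term, which is switched off for background."""
    regression = sum(smooth_l1(g - p) for g, p in zip(gt_offsets, pred_offsets))
    return cross_entropy(target, probs) + lam * (1 - target[0]) * regression

uniform = [1.0 / 21] * 21
person = one_hot_label(0.8, 15)
background = one_hot_label(0.2, 15)
print(proposal_loss(person, uniform, (0.5, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)))
print(proposal_loss(background, uniform, (0.5, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)))
```

The factor $(1 - c_{j,0}^{*})$ means a proposal matched to background contributes only its classification term.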
  • 27. The History of object detection in deep learning (timeline figure). Detectors shown: Yolo, Yolo v2, SSD, RCNN, Fast RCNN, Faster RCNN, Mask RCNN, DSSD. Backbone networks shown: 2012.12 AlexNet; 2014.9 VggNet & InceptionNet; 15.12.10 ResNet. Dates on the timeline: 2013.11.11, 2015.4.30, 2015.5.14, 15.6.8, 15.12.25, 15.12.08, 17.1.23, 17.3.20.
  • 29. References
      [Gitbooks] Object Localization and Detection. https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/object_localization_and_detection.html
      [ICCV2015 Tutorial] Convolutional Feature Maps. https://courses.engr.illinois.edu/ece420/sp2017/iccv2015_tutorial_convolutional_feature_maps_kaiminghe.pdf
      [Infographic] The Modern History of Object Recognition. https://github.com/Nikasa1889/HistoryObjectRecognition
      [Tensorflow Code] tf-Faster-RCNN. https://github.com/kevinjliang/tf-Faster-RCNN
      [Medium] A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN. https://blog.athelas.com/a-brief-history-of-cnns-in-image-segmentation-from-r-cnn-to-mask-r-cnn-34ea83205de4
      [pyimagesearch] Intersection over Union (IoU) for object detection. https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/
      [Stanford cs231n] Lecture 11: Detection and Segmentation. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
  • 30. Thank you. E-mail: hpkim0512@yonsei.ac.kr / Homepage: https://hpkim0512.blogspot.com