PR-146
CornerNet: Detecting Objects as Paired Keypoints
Hei Law, Jia Deng. ECCV’18
visonNoob(Jaewon Lee)
Object Detection
(Figure: an image with multiple objects, labeled person, dog, dog)
https://youtu.be/8jfscFuP_9k
Many slides from https://heilaw.github.io/
Author’s page : https://heilaw.github.io/
Code : https://github.com/princeton-vl/CornerNet (PyTorch implementation)
ECCV’18 oral session : https://youtu.be/aJnvTT1-spc
Slides : https://heilaw.github.io/slides/CornerNet.pptx
Paper list for deep-learning-based object detection, from 2014 to now (2019):
https://github.com/hoya012/deep_learning_object_detection
1. Introduction
Main Contributions
• CornerNet: Detecting objects as pairs of top-left and bottom-right corners
• Corner pooling to help better localize corners
• State-of-the-art performance among single-stage detectors
CornerNet: Detecting Objects as Paired Keypoints
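Figure: for every location, a ConvNet answers four questions: "Top-Left Corner?" (with its class), "Whose Top-Left?", "Bottom-Right Corner?" (with its class), and "Whose Bottom-Right?". In the example, the person's top-left and bottom-right corners are answered "Yes, Person"; other locations are answered "No".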
Loss: distance
Loss: similarity
Video: https://www.youtube.com/watch?v=pW6nZXeWlGM (from 1:30: https://youtu.be/pW6nZXeWlGM?t=90)
Experiment: CornerNet versus Others (mAP on MS COCO)
One-stage: CornerNet 42.1, RefineDet 41.8, RetinaNet 39.1, DSSD 33.2, YOLOv2 21.6
Two-stage: D-RFCN + SNIP 45.7, Cascade R-CNN 42.8, Mask R-CNN 39.8
2. Related Works
Two-Stage Detector
[Girshick et al. CVPR’14] [He et al. ECCV’14] [He et al. ICCV’17] [Cai & Vasconcelos, CVPR’18] [Singh & Davis, CVPR’18]
(R-CNN, SPP, Mask R-CNN, Cascade R-CNN, SNIP, respectively)
Figure: a 1st network proposes regions of interest [Ren et al. NIPS’15]; region pooling [Girshick, ICCV’15] extracts features for each region, and a 2nd network classifies it (e.g., Person).
Faster R-CNN PR-012 : https://youtu.be/kcPAGIgBGRs
Mask R-CNN PR-057 : https://youtu.be/RtSZALC9DlU
One-stage Detector
Figure: a single ConvNet classifies every anchor directly (e.g., Person, Person, Background).
[Redmon & Farhadi, CVPR’17] [Shen et al. ICCV’17] [Liu et al. ECCV’16] [Fu et al. arXiv’17] [Lin et al. ICCV’17] [Zhang et al. CVPR’18]
(YOLO9000, DSOD, SSD, DSSD, RetinaNet, RefineDet, respectively)
Yolo PR-016 : https://youtu.be/eTDcoeqj1_w
Yolo9000 PR-023 : https://youtu.be/6fdclSGgeio
SSD PR-132 : https://youtu.be/ej1ISEoAK5g
Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks."
Advances in neural information processing systems. 2015. (https://arxiv.org/abs/1506.01497)
Anchor Boxes
https://medium.com/@andersasac/anchor-boxes-the-key-to-quality-object-detection-ddf9d612d4f9
Drawbacks of Anchor Boxes
1. Need a large number of anchors
   - Only a tiny fraction of anchors are positive examples
   - This slows down training [Lin et al. ICCV’17]
2. Extra hyperparameters: anchor sizes and aspect ratios
(Figure: anchors must be dense enough that at least one anchor sufficiently overlaps with each ground-truth box.)
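To make the first drawback concrete, here is a toy sketch that counts how many anchors become positives for a single object. The image size, stride, anchor sizes and aspect ratios, the 0.5 IoU threshold, and the ground-truth box are all hypothetical values chosen for illustration, not settings from the paper or any specific detector.

```python
import math


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def make_anchors(image_size=512, stride=16, sizes=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Hypothetical anchor grid: len(sizes) * len(ratios) anchors centred on every cell."""
    anchors = []
    for cy in range(stride // 2, image_size, stride):
        for cx in range(stride // 2, image_size, stride):
            for size in sizes:
                for ratio in ratios:
                    w = size * math.sqrt(ratio)  # area stays size**2 and w / h == ratio
                    h = size / math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def positive_fraction(anchors, gt_boxes, threshold=0.5):
    """Fraction of anchors whose IoU with some ground-truth box reaches the threshold."""
    positives = sum(1 for a in anchors if any(iou(a, g) >= threshold for g in gt_boxes))
    return positives / len(anchors)


anchors = make_anchors()
gt_boxes = [(100.0, 120.0, 180.0, 260.0)]  # one hypothetical ground-truth box
print(len(anchors), "anchors,", positive_fraction(anchors, gt_boxes), "positive")
```

Even in this small setting only a handful of the 9,216 anchors reach the threshold, which is the class imbalance the slide refers to.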
3. CornerNet
3.2 Detecting Corners
Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose
estimation." European Conference on Computer Vision. Springer, Cham, 2016.
Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE
international conference on computer vision. 2017.
Ground-Truth Annotation
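In the paper, the corner heatmaps are trained with a variant of focal loss [Lin et al. ICCV’17] with α = 2 and β = 4, normalized by N, the number of objects. Negative locations near a ground-truth corner are penalized less: the ground truth is an unnormalized 2D Gaussian centred on the corner, with σ equal to 1/3 of a radius. Below is a minimal single-class sketch in plain Python. The function names, the square window, and the fixed radius are illustrative assumptions; the paper derives the radius from the object's size.

```python
import math


def gaussian_target(height, width, cx, cy, radius):
    """Ground-truth heatmap for one corner: 1 at (cx, cy), an unnormalized 2D Gaussian
    with sigma = radius / 3 inside a (2*radius+1)^2 window, 0 elsewhere (window shape assumed)."""
    if radius < 1:
        raise ValueError("radius must be >= 1")
    sigma = radius / 3.0
    target = [[0.0] * width for _ in range(height)]
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            target[y][x] = math.exp(-d2 / (2 * sigma ** 2))
    return target


def corner_focal_loss(pred, target, num_objects, alpha=2.0, beta=4.0, eps=1e-12):
    """Focal-loss variant for one class's corner heatmap.
    pred, target: H x W lists of predicted probabilities and Gaussian targets."""
    total = 0.0
    for pred_row, target_row in zip(pred, target):
        for p, y in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            if y == 1.0:  # the ground-truth corner itself
                total += (1 - p) ** alpha * math.log(p)
            else:  # negatives near the corner are down-weighted by (1 - y) ** beta
                total += (1 - y) ** beta * p ** alpha * math.log(1 - p)
    return -total / max(num_objects, 1)


target = gaussian_target(8, 8, cx=3, cy=4, radius=2)
good = [[0.9 if (x, y) == (3, 4) else 0.05 for x in range(8)] for y in range(8)]
bad = [[0.05 if (x, y) == (3, 4) else 0.5 for x in range(8)] for y in range(8)]
print(corner_focal_loss(good, target, 1), corner_focal_loss(bad, target, 1))
```

A confident prediction at the true corner yields a much smaller loss than a diffuse one. The `(1 - y) ** beta` factor means a near-miss one pixel away costs far less than a false corner far from any object.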
Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE international conference on computer vision. 2015.
Faster R-CNN
Bounding-box regression
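$o_k = \left( \frac{x_k}{n} - \left\lfloor \frac{x_k}{n} \right\rfloor, \frac{y_k}{n} - \left\lfloor \frac{y_k}{n} \right\rfloor \right)$, trained with $L_{off} = \frac{1}{N} \sum_{k=1}^{N} \mathrm{SmoothL1Loss}(o_k, \hat{o}_k)$
$o_k$: offset for corner k
$n$: downsampling factor
$x_k, y_k$: coordinates of corner k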
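The offset recovers the precision lost when a corner at image position (x_k, y_k) is mapped onto a heatmap downsampled by n. The sketch below computes the offset, maps a heatmap cell plus offset back to image coordinates, and evaluates the smooth L1 loss used for offset regression. The helper names, the corner position, and n = 4 (roughly the 511 → 128 resolutions in the training details) are assumptions for illustration.

```python
import math


def heatmap_location(x, y, n):
    """Heatmap cell that an image coordinate (x, y) falls into after downsampling by n."""
    return math.floor(x / n), math.floor(y / n)


def corner_offset(x, y, n):
    """o_k = (x/n - floor(x/n), y/n - floor(y/n)): precision lost by the downsampling."""
    return x / n - math.floor(x / n), y / n - math.floor(y / n)


def recover_corner(hx, hy, offset, n):
    """Map a heatmap cell plus a (predicted) offset back to image coordinates."""
    return (hx + offset[0]) * n, (hy + offset[1]) * n


def smooth_l1(pred, target):
    """Smooth L1 as in Fast R-CNN: 0.5*d^2 if |d| < 1, else |d| - 0.5, summed over components."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1 else d - 0.5
    return total


n = 4  # assumed downsampling factor (511 x 511 input -> 128 x 128 output)
x, y = 130, 77  # hypothetical top-left corner in image coordinates
cell = heatmap_location(x, y, n)
offset = corner_offset(x, y, n)
print(cell, offset, recover_corner(*cell, offset, n))
```

Without the offset, the corner would be recovered at `cell * n = (128, 76)`, which is a few pixels off. That error matters most for small objects.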
3.3 Grouping Corners
Loss: distance
Loss: similarity
Associative Embedding [Newell et al. NIPS’17]
Newell, Alejandro, Zhiao Huang, and Jia Deng. "Associative embedding: End-to-end learning for joint detection and grouping."
Advances in Neural Information Processing Systems. 2017.
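$L_{pull} = \frac{1}{N} \sum_{k=1}^{N} \left[ (e_{t_k} - e_k)^2 + (e_{b_k} - e_k)^2 \right]$
$L_{push} = \frac{1}{N(N-1)} \sum_{k=1}^{N} \sum_{j=1, j \neq k}^{N} \max\left(0, \Delta - |e_k - e_j|\right)$
$e_{t_k}$: embedding for the top-left corner of object k
$e_{b_k}$: embedding for the bottom-right corner of object k
$e_k$: average of $e_{t_k}$ and $e_{b_k}$
$\Delta = 1$ (margin)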
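The grouping losses can be sketched in a few lines, assuming 1-D embeddings, one per corner, as in CornerNet. The pull loss draws each object's two corner embeddings toward their mean e_k. The push loss is a hinge that separates the means of different objects by at least Δ = 1. The function name `pull_push_losses` is hypothetical.

```python
def pull_push_losses(tl_emb, br_emb, margin=1.0):
    """Associative-embedding grouping losses for N objects with 1-D embeddings.

    tl_emb[k], br_emb[k]: embeddings of the top-left / bottom-right corner of object k.
    Pull: draws each object's two corner embeddings toward their mean e_k.
    Push: hinge loss separating the means of different objects by at least `margin`.
    """
    n = len(tl_emb)
    if n == 0:
        return 0.0, 0.0
    means = [(t + b) / 2 for t, b in zip(tl_emb, br_emb)]
    pull = sum((t - m) ** 2 + (b - m) ** 2 for t, b, m in zip(tl_emb, br_emb, means)) / n
    if n < 2:
        return pull, 0.0
    push = sum(
        max(0.0, margin - abs(means[k] - means[j]))
        for k in range(n)
        for j in range(n)
        if j != k
    ) / (n * (n - 1))
    return pull, push


# Two well-grouped, well-separated objects: small pull loss, zero push loss.
print(pull_push_losses([0.1, 2.0], [0.3, 2.2]))
# Two objects whose embedding means are only 0.5 apart: the push loss is positive.
print(pull_push_losses([0.0, 0.5], [0.0, 0.5]))
```

Only the distances between embeddings matter, not their absolute values. This is why the network can assign arbitrary "tags" to objects as long as corners of the same object agree.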
3.4 Corner Pooling
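Top-Left Corner Pooling: given two feature maps, max-pool the first over all locations below (looking vertically toward the bottom) and the second over all locations to the right (looking horizontally toward the right), then add the two pooled maps.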
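Below is a minimal sketch of the top-left pooling operation itself, using plain nested lists. It implements each direction as a running maximum scanned from the far edge. It covers only the pooling step. In the paper, the pooling sits inside a module with convolution and batch-norm layers, which are omitted here. The function name is hypothetical. Bottom-right pooling is the mirror image: scan upward and leftward.

```python
def top_left_corner_pool(f_t, f_l):
    """Top-left corner pooling on two H x W feature maps (lists of lists).

    top[i][j]  = max over k >= i of f_t[k][j]   (look vertically toward the bottom)
    left[i][j] = max over k >= j of f_l[i][k]   (look horizontally toward the right)
    Output: top + left, element-wise.
    """
    height, width = len(f_t), len(f_t[0])
    top = [[0.0] * width for _ in range(height)]
    left = [[0.0] * width for _ in range(height)]
    for j in range(width):  # scan each column from the bottom up
        running = float("-inf")
        for i in range(height - 1, -1, -1):
            running = max(running, f_t[i][j])
            top[i][j] = running
    for i in range(height):  # scan each row from right to left
        running = float("-inf")
        for j in range(width - 1, -1, -1):
            running = max(running, f_l[i][j])
            left[i][j] = running
    return [[top[i][j] + left[i][j] for j in range(width)] for i in range(height)]


f_t = [[0, 1, 0],
       [0, 0, 0],
       [2, 0, 0]]
f_l = [[0, 0, 3],
       [0, 0, 0],
       [0, 0, 0]]
for row in top_left_corner_pool(f_t, f_l):
    print(row)
```

The strong responses at the bottom of the first column and the right end of the first row both propagate to location (0, 0). This is how a corner that lies outside the object can still "see" the object's boundary evidence.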
4.4 Ablation Study
Corner Pooling
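Loss weights: α and β are set to 0.1 and γ to 1 in the full objective $L = L_{det} + \alpha L_{pull} + \beta L_{push} + \gamma L_{off}$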
4 Experiments
4.1 Training Details
- Implementation in PyTorch: https://github.com/princeton-vl/CornerNet
- The network is randomly initialized, with no pretraining on any external dataset
- Input resolution: 511 x 511; output resolution: 128 x 128
- Data augmentation: random horizontal flipping, scaling, cropping, and color jittering
- Batch size: 49 (10 Titan X GPUs in total: 4 images on the master GPU, 5 images on each of the other 9 GPUs)
- For the ablation study: 250k iterations with a learning rate of 2.5 × 10⁻⁴
- For comparison with other detectors: an extra 250k iterations, with the learning rate reduced to 2.5 × 10⁻⁵ for the last 50k iterations
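The batch split and learning-rate schedule in the list above can be checked with a small sketch. `learning_rate` and `total_batch_size` are hypothetical helpers, not functions from the CornerNet repository. The step at iteration 450k assumes the "extra 250k" iterations follow the first 250k, for 500k in total.

```python
def learning_rate(iteration, full_training=True):
    """Step schedule read off the training details (hypothetical helper).

    Ablation runs: 250k iterations at a constant 2.5e-4.
    Full model: 250k + an extra 250k = 500k iterations, with 2.5e-5 for the last 50k.
    """
    if not full_training:
        return 2.5e-4
    return 2.5e-4 if iteration < 450_000 else 2.5e-5


def total_batch_size(num_gpus=10, master_images=4, images_per_other_gpu=5):
    """Images per iteration when the master GPU holds fewer images than the others."""
    return master_images + (num_gpus - 1) * images_per_other_gpu


print(total_batch_size())  # 4 + 9 * 5
for it in (0, 449_999, 450_000, 499_999):
    print(it, learning_rate(it))
```

The master GPU gets fewer images, presumably because it also gathers outputs and computes the loss. The slide does not state the reason.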
4.2 Testing Details
A simple post-processing algorithm for generating bounding boxes:
1. Non-maximum suppression: apply a 3 x 3 max pooling layer to the corner heatmaps
2. Pick the top 100 top-left and top 100 bottom-right corners from the heatmaps
3. Adjust the corner locations by the corresponding offsets
4. Calculate the L1 distances between the embeddings of the top-left and bottom-right corners
5. Reject pairs whose distance is greater than 0.5 or whose corners belong to different categories
6. Use the average score of the top-left and bottom-right corners as the detection score
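The post-processing steps above can be sketched in plain Python under the following assumptions: one 1-D embedding per location, offsets stored as (dx, dy) tuples, and a downsampling factor n = 4. The function names are hypothetical, and this is not the official implementation. The check that drops boxes whose bottom-right corner is not below and to the right of the top-left corner is an extra assumption; it is not listed on the slide.

```python
def nms_3x3(heat):
    """Keep a score only where it is the maximum of its 3 x 3 neighbourhood, mimicking
    a 3 x 3 max pooling layer (stride 1, padding 1) followed by an equality mask."""
    height, width = len(heat), len(heat[0])
    kept = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            window = [heat[y][x]
                      for y in range(max(0, i - 1), min(height, i + 2))
                      for x in range(max(0, j - 1), min(width, j + 2))]
            if heat[i][j] == max(window):
                kept[i][j] = heat[i][j]
    return kept


def top_k_corners(heats, k):
    """heats[c] is the H x W heatmap of class c; returns up to k (score, c, i, j), best first."""
    candidates = []
    for c, heat in enumerate(heats):
        for i, row in enumerate(nms_3x3(heat)):
            for j, score in enumerate(row):
                if score > 0:
                    candidates.append((score, c, i, j))
    candidates.sort(reverse=True)
    return candidates[:k]


def decode(tl_heats, br_heats, tl_emb, br_emb, tl_off, br_off, n=4, k=100, max_dist=0.5):
    """Pair top-left and bottom-right corners into (score, class, (x1, y1, x2, y2)) boxes."""
    tl_corners = top_k_corners(tl_heats, k)
    br_corners = top_k_corners(br_heats, k)
    detections = []
    for s_tl, c_tl, i_tl, j_tl in tl_corners:
        for s_br, c_br, i_br, j_br in br_corners:
            if c_tl != c_br:  # corners from different categories
                continue
            if abs(tl_emb[i_tl][j_tl] - br_emb[i_br][j_br]) > max_dist:  # L1 distance of 1-D embeddings
                continue
            # Adjust heatmap locations by the predicted offsets, then scale back to the image.
            x1 = (j_tl + tl_off[i_tl][j_tl][0]) * n
            y1 = (i_tl + tl_off[i_tl][j_tl][1]) * n
            x2 = (j_br + br_off[i_br][j_br][0]) * n
            y2 = (i_br + br_off[i_br][j_br][1]) * n
            if x2 <= x1 or y2 <= y1:  # assumption: discard geometrically invalid pairs
                continue
            detections.append(((s_tl + s_br) / 2, c_tl, (x1, y1, x2, y2)))
    detections.sort(key=lambda d: d[0], reverse=True)
    return detections


H = W = 8
tl_heat = [[0.0] * W for _ in range(H)]
br_heat = [[0.0] * W for _ in range(H)]
tl_heat[1][1], tl_heat[1][2] = 0.9, 0.3  # 0.3 is suppressed by its 0.9 neighbour
br_heat[6][5], br_heat[7][7] = 0.8, 0.7
tl_emb = [[0.0] * W for _ in range(H)]
br_emb = [[0.0] * W for _ in range(H)]
tl_emb[1][1], br_emb[6][5], br_emb[7][7] = 1.0, 1.2, 3.0  # (7, 7) belongs to another object
tl_off = [[(0.0, 0.0)] * W for _ in range(H)]
br_off = [[(0.0, 0.0)] * W for _ in range(H)]
tl_off[1][1] = (0.25, 0.5)
dets = decode([tl_heat], [br_heat], tl_emb, br_emb, tl_off, br_off)
print(dets)
```

In the toy example, the bottom-right corner at (7, 7) is rejected because its embedding is 2.0 away from the only surviving top-left corner. This leaves a single box.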
4.5 Comparisons with state-of-the-art detectors
Conclusion
• CornerNet: Detecting objects as pairs of top-left and bottom-right corners
• Corner pooling to help better localize corners
• State-of-the-art performance among single-stage detectors
Further Discussion
• Other backbone?
• Occlusion between points?
• Corner Pooling
• Speed?
Corner pooling
Average inference time: 244 ms per image on a Titan X (PASCAL) GPU (AP: 42.1)
Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE international conference on computer vision. 2017.
REFERENCES
[1] Law, Hei, and Jia Deng. "CornerNet: Detecting objects as paired keypoints."
Proceedings of the European Conference on Computer Vision (ECCV). 2018.
[2] Lin, Tsung-Yi, et al. "Focal loss for dense object detection."
Proceedings of the IEEE international conference on computer vision. 2017.
[3] Newell, Alejandro, Zhiao Huang, and Jia Deng. "Associative embedding: End-to-end learning for joint
detection and grouping." Advances in Neural Information Processing Systems. 2017.
[4] Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE international conference on computer vision. 2015.
[5] Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose
estimation." European Conference on Computer Vision. Springer, Cham, 2016.

 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 

PR-146: CornerNet detecting objects as paired keypoints

  • 1. PR-146 CornerNet: Detecting Objects as Paired Keypoints. Hei Law, Jia Deng. ECCV’18. Presenter: visonNoob (Jaewon Lee)
  • 2. Object Detection: person, dog, dog (multiple objects) https://youtu.be/8jfscFuP_9k
  • 3. Many slides from https://heilaw.github.io/ Author’s page : https://heilaw.github.io/ Code : https://github.com/princeton-vl/CornerNet (PyTorch impl) ECCV’18 oral session : https://youtu.be/aJnvTT1-spc Slides : https://heilaw.github.io/slides/CornerNet.pptx
  • 4. List of deep-learning-based object detection papers from 2014 to now (2019): https://github.com/hoya012/deep_learning_object_detection
  • 6. Main Contributions • CornerNet: Detecting objects as pairs of top-left and bottom-right corners • Corner pooling to help better localize corners • State-of-the-art performance among single-stage detectors https://heilaw.github.io/ 2. Introduction
  • 7. CornerNet: Detecting Objects as Paired Keypoints https://heilaw.github.io/ 2. Introduction
  • 8. CornerNet: Detecting Objects as Paired Keypoints. [Figure: a ConvNet answers, at each location, "Top-Left Corner? (Class)" and "Whose Top-Left?", and likewise "Bottom-Right Corner? (Class)" and "Whose Bottom-Right?"; example class: Person.] 2. Introduction https://heilaw.github.io/
  • 9. CornerNet: Detecting Objects as Paired Keypoints. [Figure: the same diagram, with the yes/no answers filled in for one person.] 2. Introduction https://heilaw.github.io/
  • 10. CornerNet: Detecting Objects as Paired Keypoints. [Figure: the same diagram, with the training losses added: "Loss: distance" and "Loss: similarity".] 2. Introduction https://heilaw.github.io/
  • 12. Experiment: CornerNet versus Others (COCO mAP, bar chart). One-stage: CornerNet 42.1, RefineDet 41.8, RetinaNet 39.1, DSSD 33.2, YOLOv2 21.6. Two-stage: D-RFCN + SNIP 45.7, Cascade R-CNN 42.8, Mask R-CNN 39.8. 2. Introduction
  • 14. Two-Stage Detector: a 1st network proposes Regions of Interest [Ren et al. NIPS’15], and Region Pooling [Girshick, ICCV’15] feeds them to a 2nd network that classifies each one (e.g. Person). Examples: R-CNN [Girshick et al. CVPR’14], SPP [He et al. ECCV’14], Mask R-CNN [He et al. ICCV’17], Cascade R-CNN [Cai & Vasconcelos, CVPR’18], SNIP [Singh & Davis, CVPR’18]. Faster R-CNN PR-012: https://youtu.be/kcPAGIgBGRs; Mask R-CNN PR-057: https://youtu.be/RtSZALC9DlU https://heilaw.github.io/ 3. Related Works
  • 15. One-stage Detector: a single ConvNet classifies each anchor directly (e.g. Person, Person, Background). Examples: YOLO9000 [Redmon & Farhadi, CVPR’17], DSOD [Shen et al. ICCV’17], SSD [Liu et al. ECCV’16], DSSD [Fu et al. arXiv’17], RetinaNet [Lin et al. ICCV’17], RefineDet [Zhang et al. CVPR’18]. YOLO PR-016: https://youtu.be/eTDcoeqj1_w; YOLO9000 PR-023: https://youtu.be/6fdclSGgeio; SSD PR-132: https://youtu.be/ej1ISEoAK5g https://heilaw.github.io/ 3. Related Works
  • 16. 3. Related Works: Anchor Boxes. Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems. 2015. (https://arxiv.org/abs/1506.01497) Figure: https://medium.com/@andersasac/anchor-boxes-the-key-to-quality-object-detection-ddf9d612d4f9
  • 17. Drawbacks of Anchor Boxes: 1. A large number of anchors is needed so that at least one anchor sufficiently overlaps with each ground-truth box; as a result, only a tiny fraction of anchors are positive examples, which slows down training [Lin et al. ICCV’17]. 2. Extra hyperparameters: anchor sizes and aspect ratios. https://heilaw.github.io/ 3. Related Works
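  • 19. 3.2 Detecting Corners. Backbone: Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose estimation." European Conference on Computer Vision. Springer, Cham, 2016.
  • 20. 3.2 Detecting Corners. Heatmap loss, a variant of: Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE International Conference on Computer Vision. 2017. [Figure: ground-truth annotation]
  • 21. 3.2 Detecting Corners. Offsets, analogous to bounding-box regression in Faster R-CNN (Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015): $o_k = \left(\frac{x_k}{n} - \left\lfloor \frac{x_k}{n} \right\rfloor, \frac{y_k}{n} - \left\lfloor \frac{y_k}{n} \right\rfloor\right)$ and $L_{off} = \frac{1}{N}\sum_{k=1}^{N} \mathrm{SmoothL1Loss}(o_k, \hat{o}_k)$, where $o_k$ is the offset, $n$ the downsampling factor, and $x_k, y_k$ the coordinates of corner $k$.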
  • 23. 3.3 Grouping Corners. [Figure: the corner-detection diagram from the introduction, with "Loss: distance" and "Loss: similarity".] https://heilaw.github.io/
  • 24. Associative Embedding [Newell et al. NIPS’17] https://heilaw.github.io/
  • 25. Newell, Alejandro, Zhiao Huang, and Jia Deng. "Associative embedding: End-to-end learning for joint detection and grouping." Advances in Neural Information Processing Systems. 2017. 3.3 Grouping Corners
  • 26. 3.3 Grouping Corners. $e_{t_k}$: embedding of the top-left corner of object $k$; $e_{b_k}$: embedding of the bottom-right corner of object $k$; $e_k$: average of $e_{t_k}$ and $e_{b_k}$; $\Delta = 1$. $L_{pull} = \frac{1}{N}\sum_{k=1}^{N}\left[(e_{t_k} - e_k)^2 + (e_{b_k} - e_k)^2\right]$, $L_{push} = \frac{1}{N(N-1)}\sum_{k=1}^{N}\sum_{j=1, j \neq k}^{N}\max(0, \Delta - |e_k - e_j|)$.
  • 32. Loss weights: in $L = L_{det} + \alpha L_{pull} + \beta L_{push} + \gamma L_{off}$, α and β are set to 0.1 and γ to 1. 4 Experiments, 4.1 Training Details - Implemented in PyTorch: https://github.com/princeton-vl/CornerNet - The network is randomly initialized, with no pretraining on any external dataset - Input resolution: 511 × 511; output resolution: 128 × 128 - Data augmentation: horizontal flipping, random scaling, random cropping, and color jittering - Batch size: 49 on 10 Titan X GPUs in total (4 images on the master GPU, 5 images on each of the rest) - Ablation studies: 250k iterations with a learning rate of 2.5 × 10⁻⁴ - Comparison with other detectors: an extra 250k iterations, with the learning rate reduced to 2.5 × 10⁻⁵ for the last 50k iterations.
  • 33. 4 Experiments, 4.2 Testing Details: generating bounding boxes with a simple post-processing algorithm. 1. Non-maximum suppression: a 3 × 3 max-pooling layer on the corner heatmaps. 2. Pick the top 100 top-left and the top 100 bottom-right corners from the heatmaps. 3. Adjust the corner locations by the corresponding offsets. 4. Compute the L1 distances between the embeddings of the top-left and bottom-right corners. 5. Reject pairs whose distance is greater than 0.5 or whose corners come from different categories. 6. Use the average of the top-left and bottom-right corner scores as the detection score.
  • 35. 4.5 Comparisons with state-of-the-art detectors
  • 36. Conclusion • CornerNet: Detecting objects as pairs of top-left and bottom-right corners • Corner pooling to help better localize corners • State-of-the-art performance among single-stage detectors https://heilaw.github.io/
  • 37. Further Discussion • Other backbones? • Occlusion between points? • Corner pooling? • Speed? [Figure: corner pooling]
  • 38. Speed: average inference time of 244 ms per image on a Titan X (PASCAL) GPU (AP 42.1). Speed/accuracy plot from Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE International Conference on Computer Vision. 2017.
  • 39. REFERENCES [1] Law, Hei, and Jia Deng. "CornerNet: Detecting objects as paired keypoints." Proceedings of the European Conference on Computer Vision (ECCV). 2018. [2] Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE International Conference on Computer Vision. 2017. [3] Newell, Alejandro, Zhiao Huang, and Jia Deng. "Associative embedding: End-to-end learning for joint detection and grouping." Advances in Neural Information Processing Systems. 2017. [4] Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015. [5] Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose estimation." European Conference on Computer Vision. Springer, Cham, 2016.
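The offset on slide 21 can be made concrete with a small sketch. It is plain Python rather than PyTorch, written only for illustration: `corner_offset`, `smooth_l1`, and `offset_loss` are hypothetical helper names, not functions from the CornerNet repository, and the Smooth L1 transition point of 1 is the common Fast R-CNN form, assumed here.

```python
import math


def corner_offset(x, y, n):
    """Offset o_k lost when corner (x, y) in the input image is mapped onto
    the heatmap by dividing by the downsampling factor n and flooring."""
    return (x / n - math.floor(x / n), y / n - math.floor(y / n))


def smooth_l1(diff):
    """Smooth L1 on one difference: quadratic below 1, linear above (assumed form)."""
    d = abs(diff)
    return 0.5 * d * d if d < 1.0 else d - 0.5


def offset_loss(gt_offsets, pred_offsets):
    """L_off sketch: Smooth L1 summed over x and y, averaged over the corners given."""
    total = 0.0
    for (gx, gy), (px, py) in zip(gt_offsets, pred_offsets):
        total += smooth_l1(gx - px) + smooth_l1(gy - py)
    return total / max(len(gt_offsets), 1)  # guard against an empty list
```

For example, a corner at x = 37, y = 102 with n = 4 falls into heatmap cell (9, 25) and has offset (0.25, 0.5); without the offset, the box would be off by up to n − 1 input pixels.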
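A minimal sketch of the pull and push losses on slide 26 follows. It assumes 1-D embeddings (the speaker notes say CornerNet uses 1-D embeddings) given as plain Python floats, one top-left and one bottom-right value per ground-truth object. `pull_push_loss` is a hypothetical name. It returns the two terms separately so they can be weighted by α and β as on slide 32.

```python
def pull_push_loss(tl_emb, br_emb, delta=1.0):
    """Associative-embedding losses for N objects with 1-D embeddings.

    tl_emb[k], br_emb[k]: embeddings at the top-left / bottom-right corner
    of object k. Returns (L_pull, L_push).
    """
    n = len(tl_emb)
    if n == 0:
        return 0.0, 0.0
    means = [(t + b) / 2 for t, b in zip(tl_emb, br_emb)]  # e_k
    # L_pull: pull both corners of each object towards their mean e_k
    pull = sum((t - m) ** 2 + (b - m) ** 2
               for t, b, m in zip(tl_emb, br_emb, means)) / n
    if n < 2:
        return pull, 0.0  # push needs at least two objects
    # L_push: hinge loss pushing the means of different objects >= delta apart
    push = sum(max(0.0, delta - abs(means[k] - means[j]))
               for k in range(n) for j in range(n) if j != k)
    return pull, push / (n * (n - 1))
```

Because the push term is a hinge, objects whose mean embeddings are already at least Δ = 1 apart contribute nothing, so training only spends effort on pairs that could still be confused.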
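The six post-processing steps on slide 33 can be sketched as below. Several details are my assumptions, not the paper's. Heatmaps are nested Python lists indexed [class][y][x]. Embeddings and offsets are per-location [y][x] grids. `min_score` discards near-zero flat regions that 3 × 3 max pooling would otherwise keep as peaks. The `n = 4` rescaling back to input pixels approximates the 511 → 128 ratio. The check that the bottom-right corner lies below and to the right of the top-left corner is not one of the slide's steps. Function names are hypothetical.

```python
def local_peaks(heat, min_score):
    """Step 1 (3x3 max-pool NMS): keep (score, class, y, x) where a value equals
    the maximum of its 3x3 neighbourhood and is at least min_score."""
    peaks = []
    for c, plane in enumerate(heat):
        h, w = len(plane), len(plane[0])
        for y in range(h):
            for x in range(w):
                v = plane[y][x]
                if v < min_score:
                    continue
                window = [plane[yy][xx]
                          for yy in range(max(0, y - 1), min(h, y + 2))
                          for xx in range(max(0, x - 1), min(w, x + 2))]
                if v == max(window):
                    peaks.append((v, c, y, x))
    return peaks


def decode_boxes(tl_heat, br_heat, tl_emb, br_emb, tl_off, br_off,
                 k=100, max_dist=0.5, n=4, min_score=0.1):
    """Pair corners into (class, score, (x1, y1, x2, y2)) tuples, best first."""
    # Step 2: top-k corners of each type across all classes
    tl = sorted(local_peaks(tl_heat, min_score), reverse=True)[:k]
    br = sorted(local_peaks(br_heat, min_score), reverse=True)[:k]
    dets = []
    for s1, c1, y1, x1 in tl:
        for s2, c2, y2, x2 in br:
            if c1 != c2:  # step 5: different categories
                continue
            if abs(tl_emb[y1][x1] - br_emb[y2][x2]) > max_dist:  # steps 4-5
                continue
            # Step 3: add offsets, then rescale to input-image pixels
            bx1 = (x1 + tl_off[y1][x1][0]) * n
            by1 = (y1 + tl_off[y1][x1][1]) * n
            bx2 = (x2 + br_off[y2][x2][0]) * n
            by2 = (y2 + br_off[y2][x2][1]) * n
            if bx2 <= bx1 or by2 <= by1:  # assumption: drop degenerate boxes
                continue
            dets.append((c1, (s1 + s2) / 2, (bx1, by1, bx2, by2)))  # step 6
    return sorted(dets, key=lambda d: d[1], reverse=True)
```

The double loop is O(k²) per image, which is acceptable for k = 100 but is one reason a real implementation would vectorize this on the GPU.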

Editor's Notes

  1. OK, let me start the presentation. The paper I am presenting is CornerNet: Detecting Objects as Paired Keypoints, published at ECCV’18. As the title suggests, it is an object detection paper.
  2. Many object detection papers have already been covered in PR-12, so as many of you know, object detection tries to identify what each of the multiple objects in an image is and to localize where it is with a bounding box. Simply put, it draws boxes.
  3. This paper tackles object detection as paired keypoint detection. A bounding box can be represented by two corner points, top-left and bottom-right, so the goal is to find the top-left and bottom-right corners of each bounding box. You can think of it as applying almost the same approach as keypoint detection in human pose estimation. I chose this paper because most object detection papers rely on high-recall anchor boxes, and I found it interesting that this one tried something different. The authors released a lot of resources, and I drew heavily on them for these slides.
  4. This figure has been very useful in every presentation lately. Someone kindly compiled a list of deep-learning-based object detection algorithms, and you can see CornerNet around here. You can find it at the link.
  5. The main contributions of the paper are as follows. First, it proposes an object detection network called CornerNet. Its defining feature is that, unlike existing anchor-box-based networks, it predicts bounding boxes from top-left and bottom-right corner points. Second, it proposes a new pooling method, corner pooling, that helps the network localize corners better. With these, it reports state-of-the-art COCO mAP among single-stage detectors.
  6. Seen as a picture, the core idea of CornerNet is to predict a bounding box from two keypoints, as shown here, which is why it borrows most of its ideas from human pose estimation.
  7. Roughly, here is how it works. To predict a bounding box and its class from two points, we predict the top-left corner point and the bottom-right corner point, and of course the class probability as well. In practice these are combined into a W × H × numClass map. The part in the middle here is the embedding vector (next slide).
  8. I will come back to this later, but once the points are predicted, we must decide which top-left corner goes with which bottom-right corner. The two corners are paired using the similarity between these embedding vectors.
  9. So the two corners of the same object are trained to have the same value, and corners of different objects to have different values.
  10. Among the network outputs, the heatmaps look like this: the left one is for top-left corners and the right one for bottom-right corners. Here the person class and the tennis racket class are drawn in a single heatmap, but in reality there is one heatmap per class, so each corner type has W × H × C heatmaps, and person and tennis racket sit in different channels.
  11. Experiments are run on MS COCO, and CornerNet reportedly achieves state-of-the-art COCO mAP among one-stage detectors.
  12. Now for the story the paper tells: existing object detection algorithms can be broadly divided into one-stage and two-stage algorithms.
  13. Two-stage detectors are usually R-CNN-family algorithms with a Region Proposal Network at the front. Compared with one-stage detectors they are relatively slow but achieve better detection accuracy.
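  14. Conversely, in a one-stage detector a single network both refines the anchor boxes and predicts their classes. Its detection accuracy is somewhat lower than that of two-stage detectors, but it is relatively fast.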
  15. Whether one-stage or two-stage, these are usually anchor-box-based detectors. The approach creates a huge number of candidate bounding boxes like this and hopes that at least one of them hits each object.
  16. From here on is the paper's argument. The first problem with anchor boxes is that a huge number of them are created in advance, yet only a handful out of thousands actually match an object. This fundamentally causes an imbalance between positive and negative boxes and slows down training. Second, anchor boxes bring extra hyperparameters that again depend on human heuristics. So the authors propose using corner points instead of anchor boxes.
  17. The network looks like this. The backbone is an hourglass network, proposed in the human pose estimation paper "Stacked hourglass networks for human pose estimation"; as far as I know, it comes from the same lab as CornerNet. I am not very familiar with it and this paper does not go into it deeply, so I will cover the hourglass network if I get a chance to present that paper. As in other detection papers, the hourglass backbone extracts a feature representation. The network then splits into two branches, one for top-left corners and one for bottom-right corners, and each module predicts a heatmap of corners, embedding vectors, and offsets. We have already seen the heatmaps and embeddings. The offsets play almost the same role as bounding-box regression in existing detectors: they fine-tune the location of each point.
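  18. Now let's look at the loss function, starting with L_det. As mentioned, corner locations are stored as a W × H × numClass heatmap. When building the ground truth for this map, instead of marking a single point, a Gaussian centered at that point is applied to the ground-truth heatmap, so the target looks like this. The actual loss is a variant of focal loss; my guess is that corner detection also suffers from class imbalance, which is why focal loss is used. It is ultimately a variant of cross-entropy loss, but because the target is not binary, a (1 − y) term is included: with the Gaussian, negatives are not all exactly 0, and the (1 − y)^β weight reduces the penalty for negatives close to a ground-truth corner.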
  19. L_off is the loss for the offsets mentioned earlier. The offsets are needed because CornerNet's output is n times smaller than its input, so mapping heatmap locations back to the image loses precision and causes localization error; the offsets correct for it. The loss is Smooth L1, taken as-is from Faster R-CNN and other bounding-box regression work. As far as I know, using plain L1 or L2 does not train as well, but I would need to think about this in more detail.
  20. What remains is the loss for the embeddings: the pull and push losses.
  21. Put simply, corners of the same object are trained to have similar embeddings and corners of different objects to have different ones.
  22. This training scheme is also borrowed from keypoint detection papers: when connecting keypoints, embedding vectors are learned so that keypoints with similar embeddings are grouped together.
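  23. This is another example from keypoint detection. I expected the embeddings to be two-dimensional or higher, but this paper, like CornerNet, uses one-dimensional embeddings. There are nine people in this example, and one of them is grouped incorrectly. The paper shows that training the embeddings so that keypoints with similar embeddings are connected works well, as seen here.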
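  24. CornerNet takes this method directly and applies it to detection. e_{t_k} is the embedding of the top-left corner of the k-th object, e_{b_k} is the embedding of its bottom-right corner, and e_k is their average. L_pull trains the two corner embeddings to move toward that average, and L_push trains the average embeddings of different objects to move apart.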
  25. Finally, the paper proposes a technique called corner pooling. Recall that after the backbone the network splits into two branches: one is the top-left corner pooling module and the other the bottom-right corner pooling module. Corner pooling is what goes inside them.
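  26. The paper argues that a corner often lacks local visual evidence. Even in this example, the corner lies far outside the object itself. The idea is that in such cases it helps to look along the horizontal direction once and along the vertical direction once.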
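  27. For a top-left corner, for example, one feature map takes the largest value along this column below the location, and the other takes the largest value along this row to the right of it.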
  28. Looking again: this 3 is the largest value in one direction and that 3 is the largest value in the other, and adding the two element-wise gives 6. Producing an activation at the corner pixel this way is called corner pooling. Bottom-right corner pooling simply does the reverse.
  29. Here is an ablation study on corner pooling: with corner pooling, mAP improved considerably, especially for medium and large objects.
  30. It is implemented in PyTorch. One notable point is that it does not use a pretrained network; the paper does not say why.
  31. The network outputs heatmaps, embeddings, and offsets, and these must be turned into actual bounding boxes. The paper calls its post-processing algorithm simple, though it has quite a few steps. First, 3 × 3 max pooling extracts local maxima, and from those the top 100 top-left and top 100 bottom-right corners are selected. Then the offsets are added, and the L1 distance between each top-left and bottom-right embedding is computed. Pairs whose distance exceeds 0.5 or whose classes differ are removed, and bounding boxes are built from the remaining paired points.
  32. The qualitative results basically say "ours works well." It would have been nice to also see failure cases, so that part is a little disappointing.
  33. Compared with other state-of-the-art detectors, it scored highest among one-stage detectors and is reportedly competitive with two-stage detectors.
  34. To conclude: this paper proposed a network called CornerNet, which finds bounding boxes from top-left and bottom-right corners. Corner pooling improved performance further, and it achieved state-of-the-art COCO mAP among single-stage detectors.
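  35. In the oral session video, someone asked whether they had tried a backbone other than the hourglass network; the answer was that they tried models such as ResNet and ResNeXt, but performance was poor. To the question of what happens when corner points are occluded, the answer was that there is not much they can do and that it remains future work. Personally, I am a bit skeptical of corner pooling: in scenes like surveillance video with very many objects of the same person class, this kind of pooling might mix activations from different objects. For example, there is a 6 here, so everything along that line becomes 6, right? Something like that could be a problem. I also had a question about speed.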
  36. As I mentioned, one-stage detectors are usually compared by emphasizing their speed at slightly lower or similar accuracy, so it was odd that there was no table with fps. Instead, the paper reports an inference time of about 244 ms per image on a Titan X, which seems somewhat disappointing for a one-stage detector. This plot is taken from the RetinaNet paper; as far as I know, it was benchmarked on a GPU in the Titan X class. Compared against it, CornerNet may be more accurate than other one-stage detectors, but its inference time is far too slow. So I wondered whether reaching SOTA on COCO among one-stage detectors really means much.
  37. In this work, we propose CornerNet, a new one-stage detector that does away with anchor boxes. We reformulate object detection as detecting and grouping keypoints. In particular, we detect the top-left corners and bottom-right corners of bounding boxes, and pair them to form individual object instances.
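The Gaussian-smoothed ground truth described in note 18 can be sketched as follows. Per the CornerNet paper, the Gaussian is unnormalized, its peak sits at the true corner, and its σ is one third of the radius. How the radius is derived from the object size is left out here and taken as an input. Filling a square (2r+1) × (2r+1) window and keeping the element-wise maximum where objects overlap are my assumptions. `draw_gaussian` is a hypothetical name.

```python
import math


def draw_gaussian(heatmap, cx, cy, radius):
    """Splat an unnormalised 2-D Gaussian, peak 1.0 at (cx, cy), into heatmap.

    heatmap is an H x W nested list, modified in place and also returned.
    sigma = radius / 3 as described in the paper; radius is an input here.
    Assumptions: square (2r+1) x (2r+1) window; overlapping objects keep
    the element-wise maximum.
    """
    sigma = max(radius, 1) / 3.0  # avoid sigma = 0 when radius is 0
    h, w = len(heatmap), len(heatmap[0])
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            g = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            heatmap[y][x] = max(heatmap[y][x], g)
    return heatmap
```

Only the exact corner gets the value 1.0; its neighbours receive values between 0 and 1, which is what lets the (1 − y)^β weight in the loss penalize near misses less than far-off false positives.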
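A sketch of the focal-loss variant L_det from note 18 follows, using the form given in the CornerNet paper with exponents α = 2 and β = 4 (values from the paper, not from these slides). These focal exponents are distinct from the loss weights α and β used to combine L_pull and L_push. Heatmaps are C × H × W nested lists of probabilities, and N is the number of positive (ground-truth corner) locations, guarded against zero. `corner_focal_loss` is a hypothetical name.

```python
import math


def corner_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-12):
    """Variant focal loss L_det over C x H x W nested lists.

    pred[c][i][j]: predicted corner probability p_cij
    gt[c][i][j]:   Gaussian-smoothed target y_cij (exactly 1.0 at true corners)
    alpha, beta:   focal exponents (2 and 4 in the paper)
    """
    total, num_pos = 0.0, 0
    for p_map, y_map in zip(pred, gt):
        for p_row, y_row in zip(p_map, y_map):
            for p, y in zip(p_row, y_row):
                p = min(max(p, eps), 1.0 - eps)  # clamp so both logs stay finite
                if y == 1.0:
                    # positive location: standard focal term
                    num_pos += 1
                    total += (1.0 - p) ** alpha * math.log(p)
                else:
                    # negative location: (1 - y)^beta shrinks the penalty
                    # for predictions close to a true corner
                    total += (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    # N = number of positives; guard against heatmaps without any object
    return -total / max(num_pos, 1)
```

A negative location with target y = 0.5 is weighted by 0.5⁴ = 0.0625, so a confident prediction right next to a true corner costs far less than the same prediction in empty background.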
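The corner pooling described in notes 27 and 28 can be sketched with two running maxima. Following the paper, for a top-left corner one feature map is max-pooled over everything below the location (same column), the other over everything to its right (same row), and the two results are added; bottom-right pooling is the mirror image. This sketch shows only the pooling operation on nested lists, not the convolution layers of the full corner pooling module, and the function names are hypothetical.

```python
def top_left_corner_pool(f_down, f_right):
    """Top-left corner pooling on two H x W feature maps (nested lists).

    f_down is max-pooled downwards (rows i..H-1 of column j),
    f_right is max-pooled rightwards (columns j..W-1 of row i),
    and the two pooled maps are summed element-wise.
    """
    h, w = len(f_down), len(f_down[0])
    vert = [row[:] for row in f_down]
    for i in range(h - 2, -1, -1):  # scan bottom-up, keeping a running max
        for j in range(w):
            vert[i][j] = max(vert[i][j], vert[i + 1][j])
    horiz = [row[:] for row in f_right]
    for i in range(h):
        for j in range(w - 2, -1, -1):  # scan right-to-left, running max
            horiz[i][j] = max(horiz[i][j], horiz[i][j + 1])
    return [[vert[i][j] + horiz[i][j] for j in range(w)] for i in range(h)]


def bottom_right_corner_pool(f_up, f_left):
    """Mirror image: f_up is max-pooled upwards, f_left leftwards.
    Implemented by flipping both axes, pooling as top-left, and flipping back."""
    def flip(m):
        return [row[::-1] for row in m[::-1]]
    return flip(top_left_corner_pool(flip(f_up), flip(f_left)))


f_t = [[2, 1, 3], [0, 5, 1], [1, 0, 2]]
f_l = [[3, 1, 0], [1, 1, 1], [0, 2, 0]]
print(top_left_corner_pool(f_t, f_l))  # [[5, 6, 3], [2, 6, 3], [3, 2, 2]]
```

In this toy example the 5 at row 1, column 1 of `f_t` is copied upward into row 0 of the vertical result, which illustrates how one strong activation spreads along its whole column; that spreading is the concern raised in note 35 about crowded scenes.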