SlideShare a Scribd company logo
1 of 17
Focal Loss for Dense Object Detection
Motivation
• one-stage Network(YOLO,SSD 등) 의 Dense Object Detection 은
two-stage Network(R-CNN 계열) 에 비해 속도는 빠르지만 성능은 낮다.
• 극단적인 Class 간 unbalance에 기인
 # of Hard positives(object) << # of Easy negatives(back ground)
• 클래스 분류에 일반적으로 사용되는 cross entropy loss 함수를 조금 수
정한 Focal Loss 를 제안
 Easy sample에 대해서는 작은 가중치를 부여하는 반면 Hard
sample 에는 큰 가중치를 부여해서 학습을 어려운 예제에 집중
 RetinaNet 은 one-stage detector 만큼 빠르면서도
기존의 모든 최고 성능의 detector들을 능가하는 정확도
• One stage detector 는 영상 전체 위치에서 오브젝트의 위치, 크기, 비율
을 dense 하게 샘플링
 # of Hard positives(object) << # of Easy negatives(back ground)
 학습과정에서 loss가 너무나 많은 background examples에 의해 압도
• Detectors 는 한장의 이미지에서 104~105 개의 후보 위치를 제안하지만
실제로는 몇개의 object만 존재.
이러한 unbalance 현상은 2가지 문제를 발생
1) 분류하기 쉬운easy negative 들이 대부분인데 이들이 학습에
기여하는 것이 거의 없기 때문에 학습이 비효율적
2) 수많은 easy negative 들이 학습 과정을 압도하므로 비일반적인
모델이 학습
One stage detector의 문제점
Focal loss에 대한 제안
Scaling factor, (1- 𝑝𝑡 )ϒ 는 자동으로 학습과정에서 분류하기 쉬운
예제(easy example)들이 학습에 기여하는 정도를 낮춘다(down-weight).
반면 분류하기 어려운 예제(hard example)들에 대해서 매우 집중
대부분이 구분하기 쉬운 배경 예제들이 많은 경우 Focal loss를 사용하면
높은 성능을 갖는 dense object detector 를 학습시킬 수 있다.
hard easy
𝑝𝑡
Focal loss에 대한 제안
RetinaNet의 높은 성능을 보이는 이유는 네트워크 디자인이 아니라
새로운 loss 함수 때문
Focal Loss 는 one-stage object detection 에서 object 와 background
의 클래스간 unbalance가 극도로 심한 상황(예를 들면 1:1000)을
해결하기 위해 제안됨.
Binary classification을 위한 cross entropy(CE) loss
Focal loss
𝐶𝐸 𝑝𝑡 = -𝑙𝑜𝑔 𝑝𝑡
FL 𝑝𝑡 = - (1-𝑝𝑡 )ϒ 𝑙𝑜𝑔 𝑝𝑡 , ϒ ≥ 0
Modulating factor
Focusing parameter ϒ는 일반적으로 0~5 사이의 값(2 works best)
(1) 만약 example 을 잘못 분류하고, Pt 가 작은 값인 경우
(1-𝑝𝑡 )ϒ~1  loss함수에 영향을 주지않음
(2) 만약 example 을 잘 분류하고 Pt ~1 인 경우
(1-𝑝𝑡 )ϒ~0  loss for well classified examples
is down-weight
 Modulating factor 은 easy example들의 loss 에 대한 영향력을 줄임
예를 들어 ϒ=2 경우, Pt = 0.9 로 예측되었던 example 의 원래 CE 로스
에 비해서 FL 에서는 CE의 Pt=~0.968 일때 수준의 작은 loss를 받게 되므
로 상대적으로 1000x 작아 진다.
▶ 잘못 분류된 examples 의 중요도를 상대적으로 높이는 역활
Focal loss 작동원리
RetinaNet Detector
- Backbone : ResNet + Feature Pyramid Network(FPN)
Constructs pyramid with levels P3 through P7,
where l indicates pyramid level (Pl has resolution
2l lower than the input). All pyramid levels have C = 256 channels.
- Two task-specific subnetworks :
i) The first subnet performs convolutional object classification on the
backbone’s output;
ii) The second subnet performs convolutional bounding box regression.
Comparison to State of the Art
Applying Focal Loss on Cats-vs-dogs
Classification Task
https://shaoanlu.wordpress.com/2017/08/16/applying-focal-loss-on-cats-vs-
dogs-classification-task/#more-1302
The Focal Loss is designed to address the one-stage object detection
scenario in which there is an extreme imbalance between foreground
and background classes during training (e.g., 1:1000)”
Apply focal loss on toy experiment, which is very highly imbalance
problem in classification
Related paper : “A systematic study of the class imbalance problem in
convolutional neural networks” published on Oct. 2017
(https://arxiv.org/abs/1710.05381)
https://github.com/shaoanlu/expriment-with-focal-loss/tree/master
Experiment Results
Experiment 1: imbalanced data (10:1)
• Data: 11500 cat images and 1000 dog images
• Approach: Fine-tuning ResNet50 top FC layers using focal loss
• Input: Features extracted from ResNet50 (keras ImageNet pre-trained
model)
• Architecture: Input -> Dense512 -> BN -> Dropout -> Dense512 -> BN -
> Dropout -> sigmoid
• Optimizer: RMSProp
• Learning rate: 3e-5 -> 1e-5 (30 epochs for each learning rate)
Validation accuracy with different hyper-parameters of focal loss
Experiment Results
Experiment 2: imbalanced data (100:1)
• Data: 11500 cat images and 115 dog images
• Approach: Fine-tuning ResNet50 top FC layers using focal loss
• Input: Features extracted from ResNet50 (keras ImageNet pre-trained
model)
• Architecture: Input -> BN -> Dropout -> Dense512 -> BN -> Dropout ->
sigmoid
• Optimizer: adam
• Learning rate: 1e-4 ( divide by 3 every 20 epochs)
Experiment Results
α=0.5 and ϒ is gradually increasing from 0.1 to 2 over 60 epochs
Experiment Results
Confusion matrix
Left: model using focal loss (val_acc=0.949)
Right: model using standard cross entropy loss (val_acc=0.868)
Experiment Results
Comparing this two matrices,
model using default CE loss
predict more cats than dogs.
On the contrary, model using
focal loss predict evenly
among cats/dogs.
Observations:
- Focal loss did not really outperform standard CE loss on both
balanced/imbalanced data.
This is conceivable since the focal loss is designed for detection.
- The zigzagging curves are caused by high ϒ values
- The zigzagging curves have higher validation accuracy (at their peak) than
default binary_crossentropy loss function.
- If we carefully tuned α and ϒ, focal loss somehow handle imbalanced data well
(despite the oscillating valid. acc.)
Experiment Results
• Model using focal loss is able to achieve much higher valid.
accuracy than standard CE loss in this experiment. (again, if we
ignore the horrible fluctuation)
• The confusion matrix also indicate that the model classify really
well even though there are large imbalance in data.
* This is because focal loss increase loss contribution
from hard examples.
Experiment Results

More Related Content

What's hot

[226]대용량 텍스트마이닝 기술 하정우
[226]대용량 텍스트마이닝 기술 하정우[226]대용량 텍스트마이닝 기술 하정우
[226]대용량 텍스트마이닝 기술 하정우
NAVER D2
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
NAVER Engineering
 

What's hot (20)

Deep generative model.pdf
Deep generative model.pdfDeep generative model.pdf
Deep generative model.pdf
 
Dependency Parser, 의존 구조 분석기
Dependency Parser, 의존 구조 분석기Dependency Parser, 의존 구조 분석기
Dependency Parser, 의존 구조 분석기
 
Hands on machine learning with scikit-learn and tensor flow by ahmed yousry
Hands on machine learning with scikit-learn and tensor flow by ahmed yousryHands on machine learning with scikit-learn and tensor flow by ahmed yousry
Hands on machine learning with scikit-learn and tensor flow by ahmed yousry
 
Deep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningDeep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter Tuning
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
[2B7]시즌2 멀티쓰레드프로그래밍이 왜 이리 힘드나요
[2B7]시즌2 멀티쓰레드프로그래밍이 왜 이리 힘드나요[2B7]시즌2 멀티쓰레드프로그래밍이 왜 이리 힘드나요
[2B7]시즌2 멀티쓰레드프로그래밍이 왜 이리 힘드나요
 
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
 
[한글] Tutorial: Sparse variational dropout
[한글] Tutorial: Sparse variational dropout[한글] Tutorial: Sparse variational dropout
[한글] Tutorial: Sparse variational dropout
 
딥러닝 기반의 자연어처리 최근 연구 동향
딥러닝 기반의 자연어처리 최근 연구 동향딥러닝 기반의 자연어처리 최근 연구 동향
딥러닝 기반의 자연어처리 최근 연구 동향
 
[226]대용량 텍스트마이닝 기술 하정우
[226]대용량 텍스트마이닝 기술 하정우[226]대용량 텍스트마이닝 기술 하정우
[226]대용량 텍스트마이닝 기술 하정우
 
PR-395: Variational Image Compression with a Scale Hyperprior
PR-395: Variational Image Compression with a Scale HyperpriorPR-395: Variational Image Compression with a Scale Hyperprior
PR-395: Variational Image Compression with a Scale Hyperprior
 
Chapter 19 Variational Inference
Chapter 19 Variational InferenceChapter 19 Variational Inference
Chapter 19 Variational Inference
 
Optimization in Deep Learning
Optimization in Deep LearningOptimization in Deep Learning
Optimization in Deep Learning
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
 
L2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IL2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms I
 
Switch transformers paper review
Switch transformers paper reviewSwitch transformers paper review
Switch transformers paper review
 
머신러닝의 자연어 처리기술(I)
머신러닝의 자연어 처리기술(I)머신러닝의 자연어 처리기술(I)
머신러닝의 자연어 처리기술(I)
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models
 
Big Bird - Transformers for Longer Sequences
Big Bird - Transformers for Longer SequencesBig Bird - Transformers for Longer Sequences
Big Bird - Transformers for Longer Sequences
 

Viewers also liked

Viewers also liked (20)

알기쉬운 Variational autoencoder
알기쉬운 Variational autoencoder알기쉬운 Variational autoencoder
알기쉬운 Variational autoencoder
 
딥러닝을 이용한 자연어처리의 연구동향
딥러닝을 이용한 자연어처리의 연구동향딥러닝을 이용한 자연어처리의 연구동향
딥러닝을 이용한 자연어처리의 연구동향
 
Sk t academy lecture note
Sk t academy lecture noteSk t academy lecture note
Sk t academy lecture note
 
Deep learning text NLP and Spark Collaboration . 한글 딥러닝 Text NLP & Spark
Deep learning text NLP and Spark Collaboration . 한글 딥러닝 Text NLP & SparkDeep learning text NLP and Spark Collaboration . 한글 딥러닝 Text NLP & Spark
Deep learning text NLP and Spark Collaboration . 한글 딥러닝 Text NLP & Spark
 
Knowing when to look : Adaptive Attention via A Visual Sentinel for Image Cap...
Knowing when to look : Adaptive Attention via A Visual Sentinel for Image Cap...Knowing when to look : Adaptive Attention via A Visual Sentinel for Image Cap...
Knowing when to look : Adaptive Attention via A Visual Sentinel for Image Cap...
 
Visualizing data using t-SNE
Visualizing data using t-SNEVisualizing data using t-SNE
Visualizing data using t-SNE
 
Meta-Learning with Memory Augmented Neural Networks
Meta-Learning with Memory Augmented Neural NetworksMeta-Learning with Memory Augmented Neural Networks
Meta-Learning with Memory Augmented Neural Networks
 
Learning by association
Learning by associationLearning by association
Learning by association
 
Normalization 방법
Normalization 방법 Normalization 방법
Normalization 방법
 
Explanation on Tensorflow example -Deep mnist for expert
Explanation on Tensorflow example -Deep mnist for expertExplanation on Tensorflow example -Deep mnist for expert
Explanation on Tensorflow example -Deep mnist for expert
 
A neural image caption generator
A neural image caption generatorA neural image caption generator
A neural image caption generator
 
Convolution 종류 설명
Convolution 종류 설명Convolution 종류 설명
Convolution 종류 설명
 
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
 
Binarized CNN on FPGA
Binarized CNN on FPGABinarized CNN on FPGA
Binarized CNN on FPGA
 
Learning to remember rare events
Learning to remember rare eventsLearning to remember rare events
Learning to remember rare events
 
MNIST for ML beginners
MNIST for ML beginnersMNIST for ML beginners
MNIST for ML beginners
 
Single Shot MultiBox Detector와 Recurrent Instance Segmentation
Single Shot MultiBox Detector와 Recurrent Instance SegmentationSingle Shot MultiBox Detector와 Recurrent Instance Segmentation
Single Shot MultiBox Detector와 Recurrent Instance Segmentation
 
Q Learning과 CNN을 이용한 Object Localization
Q Learning과 CNN을 이용한 Object LocalizationQ Learning과 CNN을 이용한 Object Localization
Q Learning과 CNN을 이용한 Object Localization
 
텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016
텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016
텐서플로우 설치도 했고 튜토리얼도 봤고 기초 예제도 짜봤다면 TensorFlow KR Meetup 2016
 
책 읽어주는 딥러닝: 배우 유인나가 해리포터를 읽어준다면 DEVIEW 2017
책 읽어주는 딥러닝: 배우 유인나가 해리포터를 읽어준다면 DEVIEW 2017책 읽어주는 딥러닝: 배우 유인나가 해리포터를 읽어준다면 DEVIEW 2017
책 읽어주는 딥러닝: 배우 유인나가 해리포터를 읽어준다면 DEVIEW 2017
 

Similar to Focal loss의 응용(Detection & Classification)

Similar to Focal loss의 응용(Detection & Classification) (20)

Deep learning seminar_snu_161031
Deep learning seminar_snu_161031Deep learning seminar_snu_161031
Deep learning seminar_snu_161031
 
Alexnet paper review
Alexnet paper reviewAlexnet paper review
Alexnet paper review
 
neural network 기초
neural network 기초neural network 기초
neural network 기초
 
Coursera Machine Learning (by Andrew Ng)_강의정리
Coursera Machine Learning (by Andrew Ng)_강의정리Coursera Machine Learning (by Andrew Ng)_강의정리
Coursera Machine Learning (by Andrew Ng)_강의정리
 
[홍대 머신러닝 스터디 - 핸즈온 머신러닝] 4장. 모델 훈련
[홍대 머신러닝 스터디 - 핸즈온 머신러닝] 4장. 모델 훈련[홍대 머신러닝 스터디 - 핸즈온 머신러닝] 4장. 모델 훈련
[홍대 머신러닝 스터디 - 핸즈온 머신러닝] 4장. 모델 훈련
 
03.12 cnn backpropagation
03.12 cnn backpropagation03.12 cnn backpropagation
03.12 cnn backpropagation
 
MRC recent trend_ppt
MRC recent trend_pptMRC recent trend_ppt
MRC recent trend_ppt
 
인공지능, 기계학습 그리고 딥러닝
인공지능, 기계학습 그리고 딥러닝인공지능, 기계학습 그리고 딥러닝
인공지능, 기계학습 그리고 딥러닝
 
Chapter 6 Deep feedforward networks - 2
Chapter 6 Deep feedforward networks - 2Chapter 6 Deep feedforward networks - 2
Chapter 6 Deep feedforward networks - 2
 
Dl from scratch(7)
Dl from scratch(7)Dl from scratch(7)
Dl from scratch(7)
 
PR-203: Class-Balanced Loss Based on Effective Number of Samples
PR-203: Class-Balanced Loss Based on Effective Number of SamplesPR-203: Class-Balanced Loss Based on Effective Number of Samples
PR-203: Class-Balanced Loss Based on Effective Number of Samples
 
LeNet & GoogLeNet
LeNet & GoogLeNetLeNet & GoogLeNet
LeNet & GoogLeNet
 
2.supervised learning
2.supervised learning2.supervised learning
2.supervised learning
 
네이버 NLP Challenge 후기
네이버 NLP Challenge 후기네이버 NLP Challenge 후기
네이버 NLP Challenge 후기
 
Rdatamining
Rdatamining Rdatamining
Rdatamining
 
KOOC Ch8. k-means & GMM
KOOC Ch8. k-means & GMMKOOC Ch8. k-means & GMM
KOOC Ch8. k-means & GMM
 
Densely Connected Convolutional Networks
Densely Connected Convolutional NetworksDensely Connected Convolutional Networks
Densely Connected Convolutional Networks
 
Image net classification with deep convolutional neural networks
Image net classification with deep convolutional neural networks Image net classification with deep convolutional neural networks
Image net classification with deep convolutional neural networks
 
Lecture 4: Neural Networks I
Lecture 4: Neural Networks ILecture 4: Neural Networks I
Lecture 4: Neural Networks I
 
Gamma Interaction Position Estimation using Deep Neural Networks
Gamma Interaction Position Estimation using Deep Neural NetworksGamma Interaction Position Estimation using Deep Neural Networks
Gamma Interaction Position Estimation using Deep Neural Networks
 

More from 홍배 김

More from 홍배 김 (15)

Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...
Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...
Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...
 
Gaussian processing
Gaussian processingGaussian processing
Gaussian processing
 
Lecture Summary : Camera Projection
Lecture Summary : Camera Projection Lecture Summary : Camera Projection
Lecture Summary : Camera Projection
 
Learning agile and dynamic motor skills for legged robots
Learning agile and dynamic motor skills for legged robotsLearning agile and dynamic motor skills for legged robots
Learning agile and dynamic motor skills for legged robots
 
Robotics of Quadruped Robot
Robotics of Quadruped RobotRobotics of Quadruped Robot
Robotics of Quadruped Robot
 
Basics of Robotics
Basics of RoboticsBasics of Robotics
Basics of Robotics
 
Recurrent Neural Net의 이론과 설명
Recurrent Neural Net의 이론과 설명Recurrent Neural Net의 이론과 설명
Recurrent Neural Net의 이론과 설명
 
Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용Convolutional neural networks 이론과 응용
Convolutional neural networks 이론과 응용
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
Optimal real-time landing using DNN
Optimal real-time landing using DNNOptimal real-time landing using DNN
Optimal real-time landing using DNN
 
Machine learning applications in aerospace domain
Machine learning applications in aerospace domainMachine learning applications in aerospace domain
Machine learning applications in aerospace domain
 
Anomaly Detection and Localization Using GAN and One-Class Classifier
Anomaly Detection and Localization  Using GAN and One-Class ClassifierAnomaly Detection and Localization  Using GAN and One-Class Classifier
Anomaly Detection and Localization Using GAN and One-Class Classifier
 
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
 
Brief intro : Invariance and Equivariance
Brief intro : Invariance and EquivarianceBrief intro : Invariance and Equivariance
Brief intro : Invariance and Equivariance
 
Anomaly Detection with GANs
Anomaly Detection with GANsAnomaly Detection with GANs
Anomaly Detection with GANs
 

Focal loss의 응용(Detection & Classification)

  • 1. Focal Loss for Dense Object Detection
  • 2. Motivation • one-stage Network(YOLO,SSD 등) 의 Dense Object Detection 은 two-stage Network(R-CNN 계열) 에 비해 속도는 빠르지만 성능은 낮다. • 극단적인 Class 간 unbalance에 기인  # of Hard positives(object) << # of Easy negatives(back ground) • 클래스 분류에 일반적으로 사용되는 cross entropy loss 함수를 조금 수 정한 Focal Loss 를 제안  Easy sample에 대해서는 작은 가중치를 부여하는 반면 Hard sample 에는 큰 가중치를 부여해서 학습을 어려운 예제에 집중  RetinaNet 은 one-stage detector 만큼 빠르면서도 기존의 모든 최고 성능의 detector들을 능가하는 정확도
  • 3. • One stage detector 는 영상 전체 위치에서 오브젝트의 위치, 크기, 비율 을 dense 하게 샘플링  # of Hard positives(object) << # of Easy negatives(back ground)  학습과정에서 loss가 너무나 많은 background examples에 의해 압도 • Detectors 는 한장의 이미지에서 104~105 개의 후보 위치를 제안하지만 실제로는 몇개의 object만 존재. 이러한 unbalance 현상은 2가지 문제를 발생 1) 분류하기 쉬운easy negative 들이 대부분인데 이들이 학습에 기여하는 것이 거의 없기 때문에 학습이 비효율적 2) 수많은 easy negative 들이 학습 과정을 압도하므로 비일반적인 모델이 학습 One stage detector의 문제점
  • 4. Focal loss에 대한 제안 Scaling factor, (1- 𝑝𝑡 )ϒ 는 자동으로 학습과정에서 분류하기 쉬운 예제(easy example)들이 학습에 기여하는 정도를 낮춘다(down-weight). 반면 분류하기 어려운 예제(hard example)들에 대해서 매우 집중 대부분이 구분하기 쉬운 배경 예제들이 많은 경우 Focal loss를 사용하면 높은 성능을 갖는 dense object detector 를 학습시킬 수 있다. hard easy 𝑝𝑡
  • 5. Focal loss에 대한 제안 RetinaNet의 높은 성능을 보이는 이유는 네트워크 디자인이 아니라 새로운 loss 함수 때문 Focal Loss 는 one-stage object detection 에서 object 와 background 의 클래스간 unbalance가 극도로 심한 상황(예를 들면 1:1000)을 해결하기 위해 제안됨. Binary classification을 위한 cross entropy(CE) loss Focal loss 𝐶𝐸 𝑝𝑡 = -𝑙𝑜𝑔 𝑝𝑡 FL 𝑝𝑡 = - (1-𝑝𝑡 )ϒ 𝑙𝑜𝑔 𝑝𝑡 , ϒ ≥ 0 Modulating factor
  • 6. Focusing parameter ϒ는 일반적으로 0~5 사이의 값(2 works best) (1) 만약 example 을 잘못 분류하고, Pt 가 작은 값인 경우 (1-𝑝𝑡 )ϒ~1  loss함수에 영향을 주지않음 (2) 만약 example 을 잘 분류하고 Pt ~1 인 경우 (1-𝑝𝑡 )ϒ~0  loss for well classified examples is down-weight  Modulating factor 은 easy example들의 loss 에 대한 영향력을 줄임 예를 들어 ϒ=2 경우, Pt = 0.9 로 예측되었던 example 의 원래 CE 로스 에 비해서 FL 에서는 CE의 Pt=~0.968 일때 수준의 작은 loss를 받게 되므 로 상대적으로 1000x 작아 진다. ▶ 잘못 분류된 examples 의 중요도를 상대적으로 높이는 역활 Focal loss 작동원리
  • 7. RetinaNet Detector - Backbone : ResNet + Feature Pyramid Network(FPN) Constructs pyramid with levels P3 through P7, where l indicates pyramid level (Pl has resolution 2l lower than the input). All pyramid levels have C = 256 channels. - Two task-specific subnetworks : i) The first subnet performs convolutional object classification on the backbone’s output; ii) The second subnet performs convolutional bounding box regression.
  • 8. Comparison to State of the Art
  • 9. Applying Focal Loss on Cats-vs-dogs Classification Task https://shaoanlu.wordpress.com/2017/08/16/applying-focal-loss-on-cats-vs- dogs-classification-task/#more-1302
  • 10. The Focal Loss is designed to address the one-stage object detection scenario in which there is an extreme imbalance between foreground and background classes during training (e.g., 1:1000)” Apply focal loss on toy experiment, which is very highly imbalance problem in classification Related paper : “A systematic study of the class imbalance problem in convolutional neural networks” published on Oct. 2017 (https://arxiv.org/abs/1710.05381)
  • 11. https://github.com/shaoanlu/expriment-with-focal-loss/tree/master Experiment Results Experiment 1: imbalanced data (10:1) • Data: 11500 cat images and 1000 dog images • Approach: Fine-tuning ResNet50 top FC layers using focal loss • Input: Features extracted from ResNet50 (keras ImageNet pre-trained model) • Architecture: Input -> Dense512 -> BN -> Dropout -> Dense512 -> BN - > Dropout -> sigmoid • Optimizer: RMSProp • Learning rate: 3e-5 -> 1e-5 (30 epochs for each learning rate)
  • 12. Validation accuracy with different hyper-parameters of focal loss Experiment Results
  • 13. Experiment 2: imbalanced data (100:1) • Data: 11500 cat images and 115 dog images • Approach: Fine-tuning ResNet50 top FC layers using focal loss • Input: Features extracted from ResNet50 (keras ImageNet pre-trained model) • Architecture: Input -> BN -> Dropout -> Dense512 -> BN -> Dropout -> sigmoid • Optimizer: adam • Learning rate: 1e-4 ( divide by 3 every 20 epochs) Experiment Results
  • 14. α=0.5 and ϒ is gradually increasing from 0.1 to 2 over 60 epochs Experiment Results
  • 15. Confusion matrix Left: model using focal loss (val_acc=0.949) Right: model using standard cross entropy loss (val_acc=0.868) Experiment Results Comparing this two matrices, model using default CE loss predict more cats than dogs. On the contrary, model using focal loss predict evenly among cats/dogs.
  • 16. Observations: - Focal loss did not really outperform standard CE loss on both balanced/imbalanced data. This is conceivable since the focal loss is designed for detection. - The zigzagging curves are caused by high ϒ values - The zigzagging curves have higher validation accuracy (at their peak) than default binary_crossentropy loss function. - If we carefully tuned α and ϒ, focal loss somehow handle imbalanced data well (despite the oscillating valid. acc.) Experiment Results
  • 17. • Model using focal loss is able to achieve much higher valid. accuracy than standard CE loss in this experiment. (again, if we ignore the horrible fluctuation) • The confusion matrix also indicate that the model classify really well even though there are large imbalance in data. * This is because focal loss increase loss contribution from hard examples. Experiment Results