DEEP LEARNING JP
[DL Papers]
ReDet: A Rotation-equivariant Detector for Aerial
Object Detection
Yuting Lin, Kokusai Kogyo Co., Ltd.(国際航業)
http://deeplearning.jp/
Paper information
• Title
ReDet: A Rotation-equivariant Detector for Aerial Object Detection
• Authors
Jiaming Han, Jian Ding, Nan Xue, Gui-Song Xia (Wuhan University, China)
• Accepted to CVPR 2021
• Paper
https://arxiv.org/abs/2103.07733
• Code
https://github.com/csuhan/ReDet
Overview
• Object detection in aerial images
– Object rotation must be taken into account
– Also called the oriented object detection task
– Handled with Oriented Bounding Boxes (OBBs)
• Contributions of the proposed method
– Introduces a rotation-equivariant CNN into the backbone to encode rotation equivariance [1] and rotation invariance (the first use of rotation equivariance in oriented object detection)
– Proposes RiRoI Align, which extracts rotation-invariant features from the rotation-equivariant features
– Makes the network lighter while achieving SOTA
[1] Equivariance (possibly 同変 in Japanese) is a property that applying transformations to the input produces transformations of the feature in a predictable way: $\Phi(T_r I) = T_r \Phi(I)$
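The equivariance property in footnote [1] can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes $T_r$ is a 90-degree rotation, uses a single 3×3 cross-correlation with wrap-around padding as $\Phi$, and the helper names `correlate2d_circular` and `is_rot90_equivariant` are hypothetical. A filter that is itself symmetric under 90-degree rotation satisfies the property, while a direction-dependent (Sobel-like) filter does not, which is the situation of an ordinary CNN.

```python
import numpy as np

def correlate2d_circular(image, kernel):
    """3x3 cross-correlation with wrap-around (circular) padding."""
    out = np.zeros_like(image, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            # After the roll, position [y, x] holds image[y + dy, x + dx].
            out += kernel[dy + 1, dx + 1] * np.roll(image, (-dy, -dx), axis=(0, 1))
    return out

def is_rot90_equivariant(kernel, image):
    """Check Phi(T_r I) == T_r Phi(I) for T_r = rotation by 90 degrees."""
    lhs = correlate2d_circular(np.rot90(image), kernel)
    rhs = np.rot90(correlate2d_circular(image, kernel))
    return bool(np.allclose(lhs, rhs))

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))

# Unchanged by a 90-degree rotation of the filter itself.
symmetric = np.array([[0.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 0.0]])
# Sobel-like filter: responds to one edge direction only.
edge = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])

sym_ok = is_rot90_equivariant(symmetric, image)
edge_ok = is_rot90_equivariant(edge, image)
print(sym_ok, edge_ok)
```

Learned filters are rarely rotation-symmetric, so a plain convolution layer generally behaves like the second case.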
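Related work - oriented object detection
• Existing methods
– Regress bbox positions from anchors rotated to various angles → higher computational cost
– RoI Transformer converts ordinary RoIs into rotated RoIs, reducing the number of anchors
– Represent the object region in a different way (Gliding Vertex / mask)
– R3Det and S2A-Net align ordinary feature maps with rotated bboxes
– DRN selects features dynamically to detect rotated bboxes
– CSL learns angle estimation as an additional task
– CenterNet-based methods (good accuracy on small objects)
• Issues
– With an ordinary CNN, rotating the input image ≠ applying the same rotation to the feature map (no rotation equivariance) → varied orientations cannot be handled fully
– Rotation-equivariant networks do not target rotation-invariant features
• This method extracts rotation-equivariant features in the backbone and rotation-invariant features in the head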
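The rotated boxes that the methods above regress are commonly described by a centre, a size and an angle. The following is a minimal sketch of turning such a box into its four corner points. The parameterization `(cx, cy, w, h, theta)` with `theta` in radians, and the function name `obb_to_corners`, are assumptions for illustration; the slides do not fix an angle convention.

```python
import math

def obb_to_corners(cx, cy, w, h, theta):
    """Return the four corners of an oriented box as (x, y) tuples.

    Assumed convention: theta is in radians and rotates the box about
    its centre (cx, cy).
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets in the box's own frame, before rotation.
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by theta, then translate to the box centre.
    return [(cx + x * cos_t - y * sin_t, cy + x * sin_t + y * cos_t) for x, y in local]

corners = obb_to_corners(10.0, 20.0, 4.0, 2.0, math.pi / 2)
for x, y in corners:
    print(round(x, 6), round(y, 6))
```

With a quarter-turn the 4×2 box becomes a 2×4 box around the same centre, so the regression target changes even though the object itself has not.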
Related work - Rotation-equivariant Networks
• Convolve with group convolution / HexaConv
• Extract equivariant information with filters resampled by interpolation, or with circular harmonics used as filters
• This method introduces rotation-equivariant networks into an object detection backbone (a first)
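Related work - Rotation-invariant Object Detection
• Conventional object detectors need an extra mechanism (parameters) that encodes rotation information, plus training data augmentation
• Extend RoI warping to extract instance-level rotation-invariant information
– Rotated RoI warping
• Ordinary CNNs are not rotation-equivariant, so the extraction of rotation-invariant information is incomplete
• This method uses Rotation-invariant RoI Align (RiRoI Align) to extract rotation-invariant features from rotation-equivariant features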
Proposed method - Rotation-equivariant Detector
• Uses rotation-equivariant networks as the backbone
• Proposes Rotation-invariant RoI Align to extract rotation-invariant features for each RoI
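Proposed method - Rotation-equivariant Backbone (ReResNet)
• equivariance: $\Phi(T_r^X I) = T_r^Y \Phi(I)$
– where $T_r$ is a transformation from the transformation group, acting on the input space $X$ and on the output space $Y$
• translation-equivariance
– CNNs are translation-equivariant
– $[[T_t f] * \varphi](x) = [T_t [f * \varphi]](x)$
– where $T_t$ = translation group, $f$ = feature map, $\varphi$ = convolution filters, $*$ = convolution operation
• translation and rotation-equivariant convolution
– Recent work shows that both can be achieved at once by extending the CNN to a larger group
– $[[T_g f] * \varphi](g) = [T_g [f * \varphi]](g)$
– where $T_g$ is the action of a group element $g$, and $g$ belongs to the semidirect product of translations and rotations
• Rotation-equivariant Networks
– Built from multiple rotation-equivariant layers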
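Both equivariance relations on this slide can be illustrated with a small numerical sketch. It is not the ReResNet implementation: it assumes the four-element rotation group of 90-degree turns, a single 3×3 filter, and wrap-around padding, and the names `correlate2d_circular` and `c4_lifting_conv` are hypothetical. The layer correlates the image with four rotated copies of one filter; rotating the input then rotates every output map and cyclically shifts the orientation channels, which is the predictable transformation that equivariance asks for.

```python
import numpy as np

def correlate2d_circular(image, kernel):
    """3x3 cross-correlation with wrap-around (circular) padding."""
    out = np.zeros_like(image, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            # After the roll, position [y, x] holds image[y + dy, x + dx].
            out += kernel[dy + 1, dx + 1] * np.roll(image, (-dy, -dx), axis=(0, 1))
    return out

def c4_lifting_conv(image, kernel):
    """Correlate with the filter rotated by 0, 90, 180 and 270 degrees.

    Returns an array of shape (4, H, W): one map per orientation channel.
    """
    return np.stack([correlate2d_circular(image, np.rot90(kernel, r)) for r in range(4)])

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))

# Translation equivariance of an ordinary (circular) convolution.
shift = (2, 3)
translation_ok = bool(np.allclose(
    correlate2d_circular(np.roll(image, shift, axis=(0, 1)), kernel),
    np.roll(correlate2d_circular(image, kernel), shift, axis=(0, 1)),
))

# Rotation equivariance of the lifted features: rotating the input rotates
# each map and shifts the orientation channels by one position.
features = c4_lifting_conv(image, kernel)
features_of_rotated = c4_lifting_conv(np.rot90(image), kernel)
expected = np.roll(np.rot90(features, axes=(1, 2)), 1, axis=0)
rotation_ok = bool(np.allclose(features_of_rotated, expected))

print(translation_ok, rotation_ok)
```

The same weights are shared across all four orientations, which is one intuition for why such backbones can be smaller than an ordinary CNN of similar capacity.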
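Proposed method - Rotation-invariant RoI (RiRoI) Align
• Rotated RoI (RRoI) warping handles the spatial dimension but does not fully handle features in the orientation dimension
– Max pooling keeps only the strongest orientation response in the feature map
• In the spatial dimension, RiRoI Align warps and aligns features from the feature map in the same way as RRoI warping
• Feature alignment in the orientation dimension
• $\hat{f}_R = \mathrm{Int}(\mathrm{SC}(f_R, r), \theta), \quad r = \lfloor \theta N / 2\pi \rfloor$
• where SC = switching channels, Int = feature interpolation, r = index of the orientation channel, N = number of orientation channels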
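The orientation alignment formula can be sketched in a few lines. This is a hedged illustration rather than the authors' code: it assumes the RoI feature is stored as an `(N, H, W)` array with one slice per orientation channel, that SC is a cyclic shift bringing channel `r` to the front, and that Int is a linear blend of the two neighbouring channels weighted by the fractional part of $\theta N / 2\pi$. The function name `align_orientation` and the shift direction are assumptions.

```python
import math
import numpy as np

def align_orientation(f_r, theta):
    """Align the orientation channels of an RoI feature to angle theta.

    f_r:   array of shape (N, H, W), N orientation channels (assumed layout).
    theta: RRoI angle in radians.
    """
    n = f_r.shape[0]
    pos = (theta % (2 * math.pi)) * n / (2 * math.pi)  # theta * N / (2 * pi)
    r = int(math.floor(pos))                           # integer channel index
    alpha = pos - r                                    # fractional remainder
    switched = np.roll(f_r, -r, axis=0)                # SC: channel r moves to index 0
    neighbour = np.roll(switched, -1, axis=0)          # the next orientation channel
    # Int: linear interpolation between the two neighbouring channels.
    return (1.0 - alpha) * switched + alpha * neighbour

# Toy feature: channel i is filled with the value i.
f = np.arange(4, dtype=float)[:, None, None] * np.ones((4, 2, 2))

on_grid = align_orientation(f, math.pi / 2)   # angle matches channel 1 exactly
between = align_orientation(f, math.pi / 4)   # halfway between channels 0 and 1
print(on_grid[:, 0, 0], between[:, 0, 0])
```

When the angle coincides with one of the N discrete orientations, only the channel switch is active; otherwise the blend fills in the angle between two channels.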
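Proposed method - Rotation angle estimation
• Rotation-invariant Features
– Features are rotation-invariant if the output does not change when $T_r$ is applied to the input image
– They can be divided into image-level, instance-level, and pixel-level
– Feature map obtained from RiRoI Align
– $\Phi(I_R) = T_r' \Phi(T_r I_R)$
where the HRoI $I_R$ is regarded as the rotation-invariant representation of the RRoI $T_r I_R$
$\Phi$ is the representation in feature space
$T_r'$ is the inverse transform of $T_r$
$T_r = T(\theta)$; $\theta$ can be learned with ordinary object detection methods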
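The relation between the HRoI and RRoI features follows in one step from equivariance. A short derivation, under the assumptions that $T_r$ acts on features in the same way as on the input (as in the equivariance definition $\Phi(T_r I) = T_r \Phi(I)$) and that $T_r'$ exactly undoes $T_r$:

```latex
\begin{align*}
\Phi(T_r I_R) &= T_r\,\Phi(I_R) && \text{(equivariance of the backbone)} \\
T_r'\,\Phi(T_r I_R) &= T_r'\,T_r\,\Phi(I_R) && \text{(apply the inverse transform)} \\
&= \Phi(I_R) && \text{(since } T_r' T_r = \mathrm{id}\text{)}
\end{align*}
```

The left-hand side depends on the rotated input, the right-hand side does not, so the aligned feature is the same for every orientation of the object.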
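Experiments - Datasets
• DOTA: one of the largest aerial object detection datasets with orientation annotations
– Versions:
• v1.0: 2,806 images (800 to 4,000 pixels), 188,282 objects
• v1.5: adds small objects (<10 pixels), 402,089 objects; training is more stable than with v1.0
– Images are cropped into 1024×1024 patches (stride = 824) for training
– Train/test augmentation: random horizontal flip, multiscale = (0.5, 1.0, 1.5), random rotation
• HRSC2016: ship detection dataset
– 1,061 images (300 to 1,500 pixels)
– Resampled to 800×512 for training
– Augmentation: random horizontal flip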
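The DOTA patch setting (1024×1024 patches, stride 824) means neighbouring patches overlap by 1024 − 824 = 200 pixels. The following is a minimal sketch of the patch start offsets along one image axis. The function name `patch_origins` and the handling of the image border (adding one last patch flush with the edge) are assumptions for illustration; the slides do not describe how the border is treated.

```python
def patch_origins(length, patch=1024, stride=824):
    """Start offsets of sliding-window patches along one image axis."""
    if length <= patch:
        return [0]
    origins = list(range(0, length - patch + 1, stride))
    # Assumed border rule: add a final patch flush with the image edge
    # when the regular grid stops short of it.
    if origins[-1] + patch < length:
        origins.append(length - patch)
    return origins

xs = patch_origins(4000)
overlap = 1024 - 824
print(xs, overlap)
```

For a 4,000-pixel side this gives five offsets per axis, so a 4000×4000 image would yield 25 patches under these assumptions.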
Experiments - Implementation details
• baseline
– ResNet + FPN
– ResNet is pretrained on ImageNet
• Proposed method
– ReResNet: pretrained on ImageNet-1K
– Implemented with mmdetection
Results - Ablation Studies
• Rotation-equivariant backbone (ReResNet)
– Classification accuracy drops, but detection accuracy improves
– Model size is also greatly reduced
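Results - Ablation Studies
• RiRoI Align
– Confirms the effectiveness of the proposed method
– Interpolation works best when it uses the neighbouring rotation-equivariant feature maps
• Interpolating over too many makes the information ambiguous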
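Results - Ablation Studies
• rotation augmentation
– The proposed method can be seen as a special form of rotation augmentation inside the network
– Its effect is close to that of explicit rotation augmentation, but convergence is faster (at a comparable model size)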
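Results - Ablation Studies
• Generalization (experiments on another dataset)
– The effectiveness of the proposed method is confirmed on other datasets as well
– In particular, the AP75 results show a marked improvement in localization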
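AP75 counts a detection as correct only when its IoU with the ground truth is at least 0.75, so it rewards precise localization more than AP50 does. The following is a minimal sketch of that threshold effect. It uses axis-aligned boxes in `(x1, y1, x2, y2)` form for brevity, which is a simplification: oriented boxes need a polygon intersection instead. The function name `iou_xyxy` and the example boxes are made up for illustration.

```python
def iou_xyxy(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes get an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

ground_truth = (0.0, 0.0, 10.0, 10.0)
prediction = (2.0, 0.0, 12.0, 10.0)  # same size, shifted by 2 pixels

iou = iou_xyxy(ground_truth, prediction)
counts_for_ap50 = iou >= 0.50
counts_for_ap75 = iou >= 0.75
print(round(iou, 4), counts_for_ap50, counts_for_ap75)
```

A 2-pixel shift on a 10-pixel box already fails the 0.75 threshold while passing 0.5, which is why gains at AP75 point to better localization.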
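Results - Comparison with existing SOTA
Results - Comparison with existing SOTA
• The proposed method is even more effective on small objects
Results - Comparison with existing SOTA
• Effectiveness is also confirmed on a single-class dataset
Results - Sample results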
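Summary
• Introduces rotation-equivariant layers into the backbone to extract rotation-equivariant information
• Proposes RiRoI Align to extract rotation-invariant information from the rotation-equivariant information
• Reduces network size while improving detection accuracy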
