Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM: An Explanation

Unsupervised Collaborative Learning of
Keyframe Detection and Visual Odometry
Towards Monocular Deep SLAM [Sheng & Xu+, ICCV’19]
Aizawa Laboratory, The University of Tokyo
Masaya Kaneko (M2, second-year master's student)

This paper
• Unsupervised Collaborative Learning of Keyframe Detection
and Visual Odometry Towards Monocular Deep SLAM
– Authors: L. Sheng, D. Xu, W. Ouyang and X. Wang
– Affiliations: Beihang University, Oxford, SenseTime
– Venue: ICCV2019
– A paper driven by a strong wish to make monocular Deep SLAM a reality
– I can really relate to it

Introduction
• Visual SLAM
– 3D reconstruction + camera pose estimation
– Joint optimization of both (Bundle Adjustment)
Figure: Direct Sparse Odometry [Engel+, TPAMI’18]

Introduction
• Deep Learning for Visual SLAM (Deep SLAM)
Figure: methods placed between "End-to-end Deep SLAM" and "DL helps SLAM"
CNN-SLAM [Tateno+, CVPR’17]
CodeSLAM [Bloesch+, CVPR’18]
DeepTAM [Zhou+, ECCV’18]
SfMLearner [Zhou+, CVPR’17]
The goal of this paper is to build the best Deep SLAM in the end-to-end region of the figure, where SfMLearner sits
This figure is from Tombari’s presentation slides @ ICCVW.

Related works
• SfMLearner [Zhou+, CVPR’17]
– Builds on classical SfM to achieve Deep SLAM without supervision
– Training
• Trained to minimize the photometric error

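For reference, the photometric (view-synthesis) objective as formulated in the SfMLearner paper is shown below, simplified by dropping its explainability mask and multi-scale terms. Each target pixel $p_t$ is projected into a nearby source view $s$ using the predicted depth $\hat{D}_t$, the predicted relative pose $\hat{T}_{t \to s}$ and the camera intrinsics $K$; $\hat{I}_s$ is the source image sampled at those projected locations.

```latex
\mathcal{L}_{vs} = \sum_{s} \sum_{p} \left| I_t(p) - \hat{I}_s(p) \right|,
\qquad
p_s \sim K \, \hat{T}_{t \to s} \, \hat{D}_t(p_t) \, K^{-1} p_t
```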
Related works
– Inference
• CNNs regress the depth map of the input image and the camera pose between two views
How can we achieve a Deep SLAM that is closer to conventional VSLAM?

Related works
• VSLAM
– The most important component of VSLAM is Bundle Adjustment (BA)

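Related works
• VSLAM
Figure: a 3D point $\mathbf{X}_i$ is observed as the feature point $\mathbf{u}_{i,j}$ in image $Z_j$ (camera pose $[\mathbf{R}_j, \mathbf{t}_j]$) and as $\mathbf{u}_{i,j+1}$ in image $Z_{j+1}$ (camera pose $[\mathbf{R}_{j+1}, \mathbf{t}_{j+1}]$); its projected point in image $Z_j$ is $f(\mathbf{X}_i \mid \mathbf{R}_j, \mathbf{t}_j)$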

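The projection $f(\mathbf{X}_i \mid \mathbf{R}_j, \mathbf{t}_j)$ from the figure can be sketched as below. This assumes a pinhole camera with a known intrinsic matrix `K` (not shown on the slide) and the world-to-camera convention $\mathbf{X}_{cam} = \mathbf{R}_j \mathbf{X}_i + \mathbf{t}_j$; the function names are illustrative. The difference between an observed feature point $\mathbf{u}_{i,j}$ and this projection is the reprojection error that BA drives down.

```python
import numpy as np


def project(X, R, t, K):
    """Pinhole projection f(X | R, t): world points X (N, 3) -> pixels (N, 2).

    Assumes the world-to-camera convention X_cam = R @ X + t and a known
    intrinsic matrix K (both are assumptions for this sketch).
    """
    X_cam = X @ R.T + t              # world -> camera coordinates (row-wise R @ X + t)
    uvw = X_cam @ K.T                # apply intrinsics (homogeneous pixel coordinates)
    return uvw[:, :2] / uvw[:, 2:3]  # perspective division


def reprojection_residuals(u_obs, X, R, t, K):
    """Observed feature points minus projected points, shape (N, 2)."""
    return u_obs - project(X, R, t, K)


K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 40.0],
              [0.0, 0.0, 1.0]])
X = np.array([[0.0, 0.0, 2.0],
              [0.2, -0.1, 2.0]])
print(project(X, np.eye(3), np.zeros(3), K))  # pixels (50, 40) and (60, 35)
```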
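Related works
• VSLAM
– Optimization: BA jointly refines the 3D points $\mathbf{X}_i$ and the camera poses $[\mathbf{R}_j, \mathbf{t}_j]$ so that the projected points $f(\mathbf{X}_i \mid \mathbf{R}_j, \mathbf{t}_j)$ match the observed feature points $\mathbf{u}_{i,j}$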

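Written out in its plain least-squares form with the notation of the previous slides, BA solves the problem below, where $\mathcal{O}$ (a symbol introduced here) is the set of pairs $(i, j)$ such that point $\mathbf{X}_i$ is observed in image $Z_j$. Practical systems usually add robust kernels and per-observation weights, which are omitted here.

```latex
\min_{\{\mathbf{X}_i\},\,\{\mathbf{R}_j,\,\mathbf{t}_j\}}
\sum_{(i,j)\,\in\,\mathcal{O}}
\left\| \mathbf{u}_{i,j} - f\!\left(\mathbf{X}_i \mid \mathbf{R}_j, \mathbf{t}_j\right) \right\|_2^2
```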
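Related works
• VSLAM
• 3D points are registered in the 3D map together with the camera images in which they are observed
• These camera images are called Keyframes (KF)
Figure: the bundle adjustment diagram, with the camera images marked as keyframes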

Related works
• VSLAM
Figure: keyframes in ORB-SLAM2 [1]
[1] ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras [Mur-Artal+, T-RO’17]

Related works
• VSLAM
– How keyframes are selected
1. Keep them reasonably far apart to avoid redundancy
2. But enough of them are still needed, otherwise a sufficient map cannot be built
→ The insertion conditions have to be tuned by hand, almost like a craft
– Could a CNN perform this selection, so that it can be built into SfMLearner?

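To make the "hand-tuned insertion conditions" concrete, here is a hypothetical rule of the kind classical VSLAM systems use. The function name, its inputs (frames since the last KF, and the fraction of the last KF's map points still tracked) and every threshold are illustrative choices rather than values taken from any particular system; the point is that each number must be tuned by hand.

```python
def should_insert_keyframe(frames_since_last_kf: int,
                           tracked_ratio: float,
                           min_gap: int = 5,
                           max_gap: int = 30,
                           ratio_th: float = 0.9) -> bool:
    """Hypothetical hand-tuned keyframe insertion rule (thresholds are illustrative).

    frames_since_last_kf: frames elapsed since the last keyframe was inserted
    tracked_ratio: fraction of the last keyframe's map points still tracked (0..1)
    """
    # 1. Keep keyframes apart to avoid redundant, heavily overlapping views.
    if frames_since_last_kf < min_gap:
        return False
    # 2. Never go too long without a keyframe, or the map becomes too sparse.
    if frames_since_last_kf >= max_gap:
        return True
    # In between, insert once the overlap with the last keyframe has dropped.
    return tracked_ratio < ratio_th


print(should_insert_keyframe(3, 0.5))    # False: too soon after the last KF
print(should_insert_keyframe(12, 0.95))  # False: view still overlaps heavily
print(should_insert_keyframe(12, 0.6))   # True: overlap has dropped
print(should_insert_keyframe(40, 1.0))   # True: too long without a KF
```

Changing any of `min_gap`, `max_gap` or `ratio_th` shifts where keyframes land, which is exactly the tuning burden the slide refers to.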
Proposed method
• SfMLearner with KF selection
– A Deep SLAM that performs 3D reconstruction and camera pose estimation while selecting KFs

Proposed method
– Unsupervised collaborative learning
• 3D reconstruction + camera pose estimation + KF selection ← New!!!
Figure: KF selection network; Depth + Camera pose network (Visual Odometry)
KF selection network
- Regresses a similarity score between two images
- KFs are selected according to this score

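The slides do not describe the network's internals, so the following is only a stand-in that shows the interface of the KF selection network: two images in, one similarity score in [0, 1] out. A hypothetical `embed` (mean-centred 4×4 block averages) replaces the CNN, and the score is the cosine similarity of the two embeddings rescaled to [0, 1]; both are assumptions for illustration, not the paper's architecture.

```python
import numpy as np


def embed(img: np.ndarray, grid: int = 4) -> np.ndarray:
    """Hypothetical stand-in for a CNN encoder: mean-centred grid x grid block averages."""
    h, w = img.shape
    bh, bw = h // grid, w // grid
    blocks = img[:bh * grid, :bw * grid].reshape(grid, bh, grid, bw).mean(axis=(1, 3))
    v = blocks.ravel()
    return v - v.mean()


def similarity_score(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Cosine similarity of the two embeddings, mapped from [-1, 1] to [0, 1]."""
    a, b = embed(img_a), embed(img_b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom < 1e-12:
        return 0.5  # featureless image: no evidence either way
    return float(0.5 * (1.0 + a @ b / denom))


rng = np.random.default_rng(0)
frame = rng.random((32, 48))
print(similarity_score(frame, frame))        # ~1.0: identical frames
print(similarity_score(frame, 1.0 - frame))  # ~0.0: inverted intensities
```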
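Proposed method
• Training
– Data pair
• Sequential frames $\mathcal{I}_s = \{\mathbf{I}_{t-1}, \mathbf{I}_t, \mathbf{I}_{t+1}\}$
• Keyframes $\{\mathbf{I}_p, \mathbf{I}_n\}$: $\mathbf{I}_p$ is the nearest keyframe and $\mathbf{I}_n$ the 2nd nearest keyframe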

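One way to assemble a training sample from these two bullets is sketched below, with frames represented by indices. It assumes that "nearest" means the keyframes with the highest predicted similarity to the target $\mathbf{I}_t$ (the slide could also mean temporal proximity); the function name and the toy score are illustrative.

```python
from typing import Callable, List, Sequence, Tuple


def make_data_pair(num_frames: int,
                   t: int,
                   kf_pool: Sequence[int],
                   score: Callable[[int, int], float]) -> Tuple[Tuple[int, int, int], int, int]:
    """Return (I_s, p, n) as frame indices for the target frame t.

    I_s = (t-1, t, t+1) are the sequential frames; p and n are the keyframes
    with the highest and second-highest score against t (assumed meaning of
    "nearest" / "2nd nearest").
    """
    if not 1 <= t <= num_frames - 2:
        raise ValueError("target needs a previous and a next frame")
    if len(kf_pool) < 2:
        raise ValueError("need at least two keyframes")
    ranked: List[int] = sorted(kf_pool, key=lambda k: score(k, t), reverse=True)
    return (t - 1, t, t + 1), ranked[0], ranked[1]


def toy_score(a: int, b: int) -> float:
    """Toy similarity: frames closer in time score higher (illustrative only)."""
    return 1.0 / (1.0 + abs(a - b))


print(make_data_pair(100, 42, [0, 30, 45, 90], toy_score))  # ((41, 42, 43), 45, 30)
```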
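Proposed method
• Training
– Visual Odometry (≈ SfMLearner)
• Loss: photometric error + cycle consistency + depth smoothness term
Figure: the photometric error compares the target $\mathbf{I}_t$ with the reference $\mathbf{I}_r$ warped into the target view, $\mathbf{I}_{t\leftarrow r}$; the cycle consistency warps the target into the reference view, $\mathbf{I}_{r\leftarrow t}$, then back again, $\mathbf{I}_{t\leftarrow r\leftarrow t}$, and compares the result with $\mathbf{I}_t$ (the figure also shows the depth maps $\mathbf{D}_t$ and $\mathbf{D}_r$)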

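A minimal sketch of the three VO loss terms is given below under several stated assumptions: grayscale images of equal size, a pinhole intrinsic matrix `K`, a relative pose `(R, t)` that maps target-camera coordinates to reference-camera coordinates, nearest-neighbour sampling instead of bilinear, a simple first-order (not edge-aware) smoothness term, and hypothetical weights `w_cycle` and `w_smooth`. The paper's exact terms and weights may differ.

```python
import numpy as np


def inverse_warp(src, depth, K, R, t):
    """Render `src` into the view whose depth map is `depth`.

    (R, t) maps 3D points from that view's camera frame into src's camera frame.
    Nearest-neighbour sampling; returns (warped image, validity mask).
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)  # back-project with depth
    proj = K @ (R @ cam + t.reshape(3, 1))                 # move to src frame, project
    z = proj[2]
    valid = z > 1e-6
    safe_z = np.where(valid, z, 1.0)
    ui = np.round(np.where(valid, proj[0] / safe_z, -1.0)).astype(int)
    vi = np.round(np.where(valid, proj[1] / safe_z, -1.0)).astype(int)
    valid &= (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    out = np.zeros(h * w)
    out[valid] = src[vi[valid], ui[valid]]
    return out.reshape(h, w), valid.reshape(h, w)


def masked_l1(a, b, mask):
    return float(np.abs(a - b)[mask].mean()) if mask.any() else 0.0


def smoothness(depth):
    """First-order smoothness: mean absolute depth gradient (not edge-aware)."""
    return float(np.abs(np.diff(depth, axis=1)).mean() + np.abs(np.diff(depth, axis=0)).mean())


def vo_loss(I_t, I_r, D_t, D_r, K, R, t, w_cycle=1.0, w_smooth=0.1):
    """Photometric + cycle consistency + smoothness; (R, t) maps target -> reference."""
    # Photometric error: I_t vs. I_{t<-r}
    I_t_from_r, m = inverse_warp(I_r, D_t, K, R, t)
    photo = masked_l1(I_t, I_t_from_r, m)
    # Cycle consistency: I_t -> reference view (I_{r<-t}) -> back to target (I_{t<-r<-t})
    R_inv, t_inv = R.T, -R.T @ t
    I_r_from_t, m1 = inverse_warp(I_t, D_r, K, R_inv, t_inv)
    I_cycle, m2 = inverse_warp(I_r_from_t, D_t, K, R, t)
    m1_back, _ = inverse_warp(m1.astype(float), D_t, K, R, t)  # carry validity through the cycle
    cycle = masked_l1(I_t, I_cycle, m2 & (m1_back > 0.5))
    smooth = smoothness(D_t) + smoothness(D_r)
    return photo + w_cycle * cycle + w_smooth * smooth


rng = np.random.default_rng(0)
K = np.array([[5.0, 0.0, 5.0], [0.0, 5.0, 4.0], [0.0, 0.0, 1.0]])
img = rng.random((8, 10))
D = np.full((8, 10), 2.0)
# Identity pose, identical images and constant depth: every term is zero.
print(vo_loss(img, img, D, D, K, np.eye(3), np.zeros(3)))  # 0.0
```

The cycle term only scores pixels that stay inside both images along the whole round trip, which is why the first warp's mask is itself warped back into the target view.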
Proposed method
• Training
– Keyframe selection
• Triplet loss on the similarity scores, using the triplets $\langle \mathbf{I}_t, \mathbf{I}_s, \mathbf{I}_p \rangle$ and $\langle \mathbf{I}_t, \mathbf{I}_s, \mathbf{I}_n \rangle$
Figure: similarity scores between the target t and the frames s, p, n, marked large/small, with the values 0.1 and 0.8

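The slides give only the triplets, so here is a sketch of a margin-based triplet loss on similarity scores. It assumes the usual ⟨anchor, positive, negative⟩ reading: the adjacent frame $\mathbf{I}_s$ should score higher against the target $\mathbf{I}_t$ than the keyframe $\mathbf{I}_p$ (or $\mathbf{I}_n$) does, by at least a margin. The margins are hypothetical parameters; the values 0.1 and 0.8 in the figure may or may not be the paper's margins.

```python
def triplet_loss(score_pos: float, score_neg: float, margin: float) -> float:
    """Hinge loss on similarity scores: zero once score_pos >= score_neg + margin."""
    return max(0.0, margin - (score_pos - score_neg))


def kf_selection_loss(s_ts: float, s_tp: float, s_tn: float,
                      margin_p: float, margin_n: float) -> float:
    """Sum over the two triplets <I_t, I_s, I_p> and <I_t, I_s, I_n>.

    s_ts, s_tp, s_tn: predicted similarity of I_t to I_s, I_p and I_n.
    margin_p, margin_n: hypothetical margins for each triplet.
    """
    return triplet_loss(s_ts, s_tp, margin_p) + triplet_loss(s_ts, s_tn, margin_n)


print(triplet_loss(0.9, 0.5, 0.1))  # 0.0: already separated by more than the margin
print(triplet_loss(0.6, 0.6, 0.1))  # 0.1: no separation, full margin penalised
```

Working on scores rather than embedding distances flips the usual sign convention: the positive pair must have the *larger* value.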
Proposed method
• Training
– How, then, are the KFs selected?

Proposed method
• Training
– Keyframe update & management
Random KF initialization
for epoch:
    for iteration:
        Choose training pair {ℐ_s, I_p, I_n}
        Train the whole model
        if iteration > 200 and Similarity(I_p, I_t) > th:
            Insert target frame I_t as a KF
    Merge KF
Figure: Dataset, Model, and the KF pool 𝒫_K

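The loop above can be sketched in Python as follows. Frames are represented by indices, and `score_fn` and `train_step` are hypothetical callbacks standing in for the KF selection network and for one optimization step of the whole model. The insertion test keeps the slide's direction (`Similarity(I_p, I_t) > th`), "Merge KF" is assumed to run once per epoch, and the merge rule (drop a keyframe nearly identical to the previous kept one) is an illustrative assumption, since the slides do not describe it.

```python
import math
import random
from typing import Callable, List, Sequence


def merge_keyframes(pool: Sequence[int], score_fn: Callable[[int, int], float],
                    merge_th: float) -> List[int]:
    """Hypothetical merge: drop a KF whose score with the previous kept KF exceeds merge_th."""
    merged: List[int] = []
    for k in sorted(pool):
        if merged and score_fn(merged[-1], k) > merge_th:
            continue
        merged.append(k)
    return merged


def manage_keyframes(num_frames: int,
                     score_fn: Callable[[int, int], float],
                     train_step: Callable[[tuple, int, int], None],
                     epochs: int = 1, iters_per_epoch: int = 300,
                     warmup: int = 200, th: float = 0.9, merge_th: float = 0.98,
                     init_kfs: int = 5, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    pool = rng.sample(range(num_frames), init_kfs)          # random KF initialization
    for _ in range(epochs):
        for it in range(iters_per_epoch):
            t = rng.randrange(1, num_frames - 1)            # target frame I_t
            seq = (t - 1, t, t + 1)                         # sequential frames I_s
            ranked = sorted(pool, key=lambda k: score_fn(k, t), reverse=True)
            p, n = ranked[0], ranked[1]                     # I_p, I_n
            train_step(seq, p, n)                           # "Train all the model"
            # Insertion test as written on the slide, after the warm-up iterations.
            if it > warmup and t not in pool and score_fn(p, t) > th:
                pool.append(t)                              # insert I_t as a KF
        pool = merge_keyframes(pool, score_fn, merge_th)    # "Merge KF" (assumed once per epoch)
    return sorted(pool)


def toy_score(a: int, b: int) -> float:
    """Toy similarity that decays with temporal distance (illustrative only)."""
    return math.exp(-abs(a - b) / 10.0)


kfs = manage_keyframes(500, toy_score, train_step=lambda seq, p, n: None)
print(len(kfs), kfs[:5])
```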
Proposed method
• Training: the loop above, step by step
1. Choose a training pair: the sequential frames $\mathcal{I}_s$ (containing the target $\mathbf{I}_t$) are taken from the dataset and the keyframes $\{\mathbf{I}_p, \mathbf{I}_n\}$ from the KF pool, and all of them are fed to the model
2. Train: the loss is computed and the whole model is updated
3. Score: the model predicts the similarity score between $\mathbf{I}_p$ and $\mathbf{I}_t$; after the first 200 iterations, if Similarity($\mathbf{I}_p$, $\mathbf{I}_t$) > th, the target frame $\mathbf{I}_t$ is inserted into the KF pool as a KF
4. Merge KF: the keyframes in the pool are merged
– Repeating these steps optimizes the KF pool

Experimental results
• KITTI dataset
– Monocular depth estimation
Curating the training data through KF selection stabilizes training and also improves estimation accuracy

• KITTI dataset
– Absolute Trajectory Error (ATE)
KF selection has a data-augmentation effect, which in turn improves camera pose estimation accuracy

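As a reference for what the ATE number measures, here is a minimal sketch of one common variant for monocular methods: express both trajectories relative to their first position, fit a single least-squares scale factor (monocular VO has no absolute scale), and take the RMSE of the position errors. This protocol is an assumption; the exact evaluation used in the paper (for example, snippet length or a full similarity alignment) may differ.

```python
import numpy as np


def ate_rmse(gt_xyz, pred_xyz) -> float:
    """ATE (RMSE of positions) after start-point and least-squares scale alignment."""
    gt = np.asarray(gt_xyz, dtype=float)
    pred = np.asarray(pred_xyz, dtype=float)
    gt = gt - gt[0]        # both trajectories start at the origin
    pred = pred - pred[0]
    denom = np.sum(pred ** 2)
    scale = np.sum(gt * pred) / denom if denom > 0 else 1.0  # least-squares scale
    err = scale * pred - gt
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))


gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.1, 0.0]])
print(ate_rmse(gt, 3.0 * gt + 5.0))  # ~0.0: same shape, different scale and offset
```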
• KITTI dataset
– Average rotation errors
That said, the rotation estimation accuracy still falls short of ORB-SLAM [Mur-Artal+, T-RO’15]

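A common way to measure a single rotation error is the geodesic angle of the relative rotation, $\theta = \arccos\big((\operatorname{tr}(\mathbf{R}_{gt}^\top \mathbf{R}_{est}) - 1)/2\big)$; the sketch below averages it over corresponding pose pairs. The paper's "average rotation error" may follow a different protocol (the KITTI odometry benchmark, for instance, averages over sub-sequences of fixed length), so treat this only as an illustration of the quantity being compared.

```python
import numpy as np


def rotation_error_deg(R_gt: np.ndarray, R_est: np.ndarray) -> float:
    """Geodesic angle (degrees) of the relative rotation R_gt^T R_est."""
    cos = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))  # clip guards rounding


def average_rotation_error_deg(Rs_gt, Rs_est) -> float:
    """Mean geodesic rotation error over corresponding pose pairs."""
    return float(np.mean([rotation_error_deg(a, b) for a, b in zip(Rs_gt, Rs_est)]))


def rot_z(deg: float) -> np.ndarray:
    """Rotation about the z-axis by `deg` degrees."""
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a), np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])


print(average_rotation_error_deg([np.eye(3), np.eye(3)], [np.eye(3), rot_z(30.0)]))  # ~15.0
```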
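• KITTI dataset
• Where the camera mainly translates, KFs are selected at roughly even intervals
• Where the camera rotates, the view changes quickly, so KFs are selected more densely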

• KITTI dataset
– Ablation study
Depth estimation
Camera trajectory estimation

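Conclusion
– Added KF selection, one of the most important components of VSLAM, to the SfMLearner framework
– Proposed a method that learns KF selection without supervision
– Achieved more accurate depth estimation and camera pose estimation than previous methods (although rotation accuracy still trails ORB-SLAM)
• Impressions
– Learning KF selection, which previously required careful hand design, with a CNN and without supervision is novel and very interesting
– Adding other optimization components such as Bundle Adjustment, beyond KF selection, would likely bring this closer to a true Deep SLAM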
