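9. Overall pipeline
Large-scale image collection plus weakly supervised filtering is highly effective
– Build the dataset semi-automatically with an existing detector plus false-positive removal
– Pre-train any detector on the resulting large-scale data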
[Figure: five-step pipeline]
Step 1 (Image Collection): Collect images taken in the vicinity of 16 representative cities.
Step 2 (Person Detection): Faster R-CNN generates bboxes in areas considered to contain a person.
Step 3 (Bbox Refinement): Remove noisy bboxes by SVM binary classification; 2,886 training images.
Step 4 (Pre-training; SSD, M2Det; WSPD): To get a better representation for person detection, the person bboxes from Step 3 are used.
Step 5 (Fine-tuning; SSD, M2Det; e.g. Caltech Ped.): Fine-tuning for any person dataset.
10. Step 1. Image collection
– Collect images taken in 16 cities around the world
– Use YFCC100M (Flickr)
#img: 100M → 8.5M
#box: 0
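The geographic selection in Step 1 could be sketched roughly as follows. This is only an illustration: the slides do not name the 16 cities or say how "vicinity" is defined, so the two city centres, the 50 km radius, and the `lat`/`lon` record fields below are all assumptions.

```python
import math

# Hypothetical city centres (latitude, longitude); the slides do not list the 16 cities.
CITIES = {
    "Tokyo": (35.6762, 139.6503),
    "Paris": (48.8566, 2.3522),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def near_any_city(lat, lon, radius_km=50.0):
    """True if the point lies within radius_km of any listed city centre."""
    return any(
        haversine_km(lat, lon, city_lat, city_lon) <= radius_km
        for city_lat, city_lon in CITIES.values()
    )


def select_images(records, radius_km=50.0):
    """Keep metadata records whose geotag falls near one of the cities.

    Records without a geotag are dropped (an assumption of this sketch).
    """
    return [
        r for r in records
        if r.get("lat") is not None
        and r.get("lon") is not None
        and near_any_city(r["lat"], r["lon"], radius_km)
    ]
```

Calling `select_images` on a list of metadata dictionaries returns only the geotagged records that fall inside the assumed radius.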
11. Step 2. Labeling with an existing detector
– Attach detection boxes (bboxes) using an existing detector
– This work uses Faster R-CNN
#img: 8.5M
#box: 0 → 76M
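As a rough sketch of how the Step 2 detector output might be turned into candidate person boxes: the `Detection` record, the `"person"` label string, and the 0.5 score threshold are assumptions made for illustration; the slides only state that Faster R-CNN is the detector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detector output (hypothetical record layout)."""
    label: str
    score: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def person_boxes(detections, min_score=0.5):
    """Keep boxes labelled as person with a score at or above the threshold.

    The threshold value is an assumption, not taken from the slides.
    """
    return [
        d.box for d in detections
        if d.label == "person" and d.score >= min_score
    ]
```

A permissive threshold keeps more candidates at this stage, which fits a design where Step 3 is responsible for removing the false positives.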
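12. Step 3. False-positive removal
– An SVM decides whether each bbox contains a person
– A small amount of training data is prepared by hand
(1,443 person images and 1,443 negative images)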
#img: 8.5M → 2.8M
#box: 76M → 8.7M
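The Step 3 filtering could look roughly like the sketch below. It assumes a linear SVM that has already been trained, so its decision function is `w·x + b`; the kernel, the feature extractor, and the weights are not given in the slides and are hypothetical here. Dropping images that end up with no person bbox is also an assumption, though it would be consistent with the image count falling from 8.5M to 2.8M.

```python
def decision_value(weights, bias, features):
    """Linear SVM decision function w.x + b (weights are hypothetical)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def refine(image_boxes, weights, bias):
    """Keep only bboxes classified as person; drop images left with none.

    image_boxes maps an image id to a list of (box, feature_vector) pairs.
    """
    kept = {}
    for image_id, candidates in image_boxes.items():
        persons = [
            box for box, features in candidates
            if decision_value(weights, bias, features) >= 0.0
        ]
        if persons:  # assumption: images without any person bbox are discarded
            kept[image_id] = persons
    return kept
```

The function returns a new mapping, so the raw Step 2 detections stay available for inspection.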
13. Steps 4-5. Pre-training & fine-tuning
– Train the detector using the bboxes from Step 3
– SSD and M2Det are used as detectors
#img: 2.8M
#box: 8.7M
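The #img and #box counts quoted on the step slides can be turned into retention rates with a few lines of arithmetic. The sketch below uses only the rounded figures from the slides, so the ratios are approximate.

```python
# Rounded counts as quoted on the slides.
IMAGES_SOURCE = 100e6   # YFCC100M
IMAGES_STEP1 = 8.5e6    # after image collection
BOXES_STEP2 = 76e6      # after person detection
IMAGES_STEP3 = 2.8e6    # after bbox refinement
BOXES_STEP3 = 8.7e6     # after bbox refinement

images_kept_step1 = IMAGES_STEP1 / IMAGES_SOURCE      # share of source images collected
images_kept_step3 = IMAGES_STEP3 / IMAGES_STEP1       # share of images surviving refinement
boxes_kept_step3 = BOXES_STEP3 / BOXES_STEP2          # share of bboxes surviving refinement
boxes_per_image_step2 = BOXES_STEP2 / IMAGES_STEP1    # average bboxes per image before refinement
boxes_per_image_step3 = BOXES_STEP3 / IMAGES_STEP3    # average bboxes per image after refinement

print(f"images kept in Step 1: {images_kept_step1:.1%}")
print(f"images kept in Step 3: {images_kept_step3:.1%}")
print(f"boxes kept in Step 3:  {boxes_kept_step3:.1%}")
print(f"boxes per image: {boxes_per_image_step2:.1f} -> {boxes_per_image_step3:.1f}")
```

On these rounded figures, roughly 11% of the detected bboxes and about a third of the images survive refinement, and the average falls from about 8.9 to about 3.1 bboxes per image.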
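14. Q. What is weakly supervised learning? (In the context of this work only)
A. Answering a simple YES/NO question
– Does the Step 2 bbox contain the person's whole body?
– Improves the quality of the person images used as supervision for the Step 3 data cleaning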
YES: add to the positive examples of the binary classifier
NO: exclude from the training data
Continue until about 1,000 images are collected
(finishes in a few hours)
Before data cleaning (non-person crops mixed in)
After data cleaning (about 90% contain a person; see next slide)
Binary classification improves the quality of the training data!
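The YES/NO annotation procedure could be sketched as a simple loop. The `answer` callable stands in for the human annotator and is hypothetical, as is the default `target` of 1,000, which approximates the "about 1,000 images" stated on the slide.

```python
def collect_positives(candidates, answer, target=1000):
    """Ask a YES/NO question per candidate crop until enough positives exist.

    candidates: iterable of bbox crops from Step 2
    answer:     callable returning True for YES (whole person visible), False for NO
    target:     number of positives to collect before stopping
    """
    positives = []
    for crop in candidates:
        if len(positives) >= target:
            break
        if answer(crop):
            positives.append(crop)  # YES: becomes a positive example
        # NO: the crop is simply left out of the training data
    return positives
```

Because each decision is a single binary answer rather than a drawn bounding box, the annotation cost per crop stays low, which matches the slide's note that the process finishes within a few hours.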