This document discusses adapting an AI object identification (AOI) system to changes in domains. It proposes using attention and domain adaptation techniques. Specifically:
1) AOI is like fine-grained recognition, which can benefit from attention models that focus on discriminative regions.
2) Domain shift between different sensors/viewpoints can degrade performance, but techniques such as attention models and domain adversarial learning can help address it.
3) The paper proposes a method for unsupervised cross-city adaptation of road scene segmenters using global and class-wise domain alignment with an attention-based static object prior. This achieves state-of-the-art performance adapting models between cities.
3. Vision Science Lab (VSLab)
Research topics in computer vision & machine learning: analyzing street views, understanding personal videos, 3D & robot vision, human sensing, wearable camera applications, Make3D.
4. Challenges
p AOI is similar to fine-grained recognition.
"What kind of bird?" Attention should help.
(image source: http://yassersouri.github.io/pages/fast-bird-part.html)
p How to adapt to changes (e.g., due to different sensors/viewpoints)?
Domain shift. Domain adaptation should help.
(image source: http://vision.cs.uml.edu/adaptation.html)
11. Motivation
p Goal: use domain adaptation to mitigate the effect of domain shift.
p Approaches:
n Supervised fine-tuning: CAN access labels on the target domain.
• Straightforward, but time-consuming and expensive: pixel labeling of one Cityscapes image takes 90 minutes on average [4].
n Unsupervised adaptation: CANNOT access labels on the target domain.
• More challenging, but low cost. Practical in real life!
[4] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes dataset for semantic urban scene understanding," in CVPR, IEEE, 2016.
12. Data Collection
p Use the Google Street View API to download images of different cities.
n Randomly sample locations in each city to ensure sufficient variation in visual appearance.
p Use the Time-Machine feature to collect image pairs at the same location but at different times.
[Figure: unlabeled image pairs from Tokyo, Rome, Rio, and Taipei, captured at the same locations (A, B) at different times (T1, T2).]
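As an illustrative sketch of the collection step (not the authors' pipeline), a location can be randomly sampled inside a city's bounding box and turned into a Street View Static API request. `API_KEY` is a placeholder, and the Taipei bounding box coordinates are only illustrative:

```python
import random
from urllib.parse import urlencode

BASE = "https://maps.googleapis.com/maps/api/streetview"

def streetview_url(lat, lng, api_key, size="640x640"):
    """Build a Street View Static API request URL for one location."""
    params = {"size": size, "location": f"{lat},{lng}", "key": api_key}
    return f"{BASE}?{urlencode(params)}"

# Randomly sample a location inside a rough bounding box around Taipei
# (coordinates are illustrative, not from the paper).
lat = random.uniform(24.96, 25.21)
lng = random.uniform(121.45, 121.67)
url = streetview_url(lat, lng, "API_KEY")
```

Repeating this sampling per city yields images with sufficient appearance variation; the Time-Machine pairs additionally require querying the same location at two capture dates.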
13. Our Dataset
p We propose a new dataset of complex road scenes, with:
n Diverse appearance: includes 4 different cities across continents.
n Temporal information: each city includes 1,600 image pairs, which provide helpful supervision without any human interaction.
n Dense pixel annotations: each city includes 100 high-quality annotated images.
Please visit : https://yihsinchen.github.io/segmentation_adaptation/
17. Global Domain Alignment
p Our objective is to minimize the following adversarial loss by iteratively updating the domain classifier D and the feature extractor F:

L_global(I_S, I_T) = -(1/N) Σ_{n=1}^{N} [ log p_n(I_S) + log(1 - p_n(I_T)) ]

• I_S and I_T: the images from the source and target domains, respectively.
• N: the number of grids in each feature map.
• F(I_S) and F(I_T): the feature maps of the source- and target-domain images.
• p_n(x) = σ(D(F(x))_n): the probability that grid n of image x belongs to the source domain, where σ is the sigmoid function.
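As a minimal NumPy sketch of this objective (function names and the gradient-reversal framing are illustrative, not the authors' code): the domain classifier minimizes a binary cross-entropy over all grids, while the feature extractor is updated against the inverted objective.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def global_adversarial_loss(logits_src, logits_tgt):
    """Binary cross-entropy over all grids: the domain classifier D
    should output 1 for source grids and 0 for target grids.
    logits_* : (N,) array of D's raw outputs, one per grid n."""
    p_src = sigmoid(logits_src)   # p_n(I_S): P(grid n is from source)
    p_tgt = sigmoid(logits_tgt)   # p_n(I_T)
    loss_d = -(np.log(p_src).mean() + np.log(1.0 - p_tgt).mean())
    # The feature extractor F is trained on the inverted objective
    # (domain confusion), pushing p_tgt toward 1 and p_src toward 0.
    loss_f = -(np.log(1.0 - p_src).mean() + np.log(p_tgt).mean())
    return loss_d, loss_f
```

A well-separated classifier (confident logits of the correct sign) drives `loss_d` toward 0, while at chance (logits of 0) it sits at 2·log 2; alternating the two updates is the iterative scheme the slide describes.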
19. Class-wise Domain Alignment
p Let each class perform domain adversarial learning individually.
p But we must first address some problems:
n Under the unsupervised setting, we don't have any labels on the target domain to link with the source domain.
• Can't do domain adversarial learning against the source domain.
n In global domain adaptation, we define each grid n in the feature space as one instance.
• Can't directly use the labels, which live in the image (pixel) space.
[Figure: a pseudo label in pixel space and a grid-level soft label in feature space, related by up-sampling between the network prediction and the input image.]
20. Class-wise Domain Alignment --- Grid-Level Soft Label
p (In source domain)
n Calculate the grid-wise soft label Φ_n^c(I_S) as the probability of grid n belonging to class c:

Φ_n^c(I_S) = (1/|R(n)|) Σ_{i ∈ R(n)} 1[ y_i(I_S) = c ]

• i: the pixel index in image space.
• n: the grid index in feature space.
• R(n): the set of pixels that correspond to grid n.
• y_i(I_S): the ground-truth label of pixel i.
[Figure: pixel-level ground truth vs. grid-level soft label.]
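The grid-level soft label above can be sketched in NumPy (the function name and explicit loops are illustrative; each grid's soft label is just the fraction of its pixels carrying each ground-truth class):

```python
import numpy as np

def grid_soft_label(y, grid, n_classes):
    """y: (H, W) int ground-truth labels; grid: (H, W) int map assigning
    each pixel i to its grid n (i.e., i in R(n) iff grid[i] == n).
    Returns (N, C) soft labels with
    phi[n, c] = |{i in R(n) : y_i = c}| / |R(n)|."""
    n_grids = int(grid.max()) + 1
    phi = np.zeros((n_grids, n_classes))
    for n in range(n_grids):
        labels = y[grid == n]          # pixels belonging to grid n
        for c in range(n_classes):
            phi[n, c] = np.mean(labels == c)
    return phi
```

Each row of `phi` sums to 1, so it can be read as the probability that grid n belongs to each class.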
21. Class-wise Domain Alignment --- Pseudo Label
p (In target domain)
n Calculate the target-domain grid-wise soft pseudo label Φ_n^c(I_T) as the probability of grid n belonging to class c:

Φ_n^c(I_T) = (1/|R(n)|) Σ_{i ∈ R(n)} φ_i^c(I_T)

• i: the pixel index in image space.
• n: the grid index in feature space.
• R(n): the set of pixels that correspond to grid n.
• φ_i^c(I_T): the pixel-wise soft pseudo label of pixel i corresponding to class c.
[Figure: pixel-level pseudo label vs. grid-level soft label.]
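The target-domain version replaces the ground-truth indicator with the network's pixel-wise soft prediction; a NumPy sketch (function name illustrative):

```python
import numpy as np

def grid_soft_pseudo_label(prob, grid):
    """prob: (H, W, C) pixel-wise soft pseudo labels phi_i^c(I_T)
    (e.g., the segmenter's softmax output); grid: (H, W) int map
    assigning each pixel to its grid n. Returns (N, C) with
    Phi[n, c] = (1/|R(n)|) * sum over i in R(n) of phi_i^c."""
    n_grids = int(grid.max()) + 1
    n_classes = prob.shape[-1]
    Phi = np.zeros((n_grids, n_classes))
    for n in range(n_grids):
        Phi[n] = prob[grid == n].mean(axis=0)  # average over R(n)
    return Phi
```

Because each pixel's soft label sums to 1, each row of `Phi` also sums to 1, matching the source-domain grid-level soft label in form.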
22. Class-wise Domain Alignment
p Thanks to the pseudo labels and soft labels, we can "link" each class between the source and target domains.
p The same adversarial learning framework can then be applied per class.
[Figure: class-wise links (e.g., road, car) between source-domain ground truth and target-domain pseudo labels, with a probability bar from low to high.]
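One way to sketch this per-class linking (an assumption about the framework, not the authors' exact formulation): give each class its own domain classifier and weight each grid's contribution to class c's adversarial loss by its grid-level soft (pseudo) label.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classwise_adversarial_loss(phi_src, phi_tgt, logits_src, logits_tgt):
    """phi_*   : (N, C) grid-level soft labels (source) / soft pseudo
                 labels (target), linking grids to classes.
    logits_* : (N, C) raw outputs of C per-class domain classifiers.
    Each grid n contributes to class c's loss in proportion to phi[n, c]."""
    p_src = sigmoid(logits_src)   # P(source) per grid, per class
    p_tgt = sigmoid(logits_tgt)
    eps = 1e-8
    loss_src = -(phi_src * np.log(p_src + eps)).sum(0) / (phi_src.sum(0) + eps)
    loss_tgt = -(phi_tgt * np.log(1.0 - p_tgt + eps)).sum(0) / (phi_tgt.sum(0) + eps)
    return loss_src + loss_tgt    # shape (C,): one adversarial loss per class
```

Grids dominated by "road" then mostly drive the road classifier, and likewise for "car", which is exactly the class-wise link the figure depicts.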
27. Class-wise Domain Alignment --- Static-Object Prior
p Use the static-object prior to refine the pseudo labels.
p For each pixel that belongs to the static-object prior, we suppress its probability of corresponding to non-static objects.
• P_static(I_T): the set of pixels belonging to the static-object prior.
• C_static: the set of static-object classes.
• Static objects: building, road, sidewalk, etc.
• Non-static objects: person, car, motorbike, etc.
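A minimal NumPy sketch of this suppression step (the function name and the renormalization are assumptions; the prior mask and class indices are placeholders for P_static(I_T) and C_static):

```python
import numpy as np

def refine_with_static_prior(prob, static_mask, static_classes):
    """prob: (H, W, C) pixel-wise soft pseudo labels; static_mask: (H, W)
    bool, True on pixels in the static-object prior P_static(I_T);
    static_classes: indices in C_static (e.g., road, building, sidewalk).
    Zeros out non-static class probabilities at prior pixels, then
    renormalizes so each pixel's label still sums to 1."""
    out = prob.astype(float).copy()
    n_classes = out.shape[-1]
    for c in range(n_classes):
        if c not in static_classes:
            out[..., c][static_mask] = 0.0   # suppress non-static classes
    total = out.sum(axis=-1, keepdims=True)
    total[total == 0.0] = 1.0                # guard against all-zero pixels
    return out / total
```

Pixels outside the prior are left untouched, so the refinement only sharpens the pseudo labels where the static-object evidence applies.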
35. Recap
p AOI is similar to fine-grained recognition.
"What kind of bird?" Attention should help.
(image source: http://yassersouri.github.io/pages/fast-bird-part.html)
p How to adapt to changes (e.g., due to different sensors/viewpoints)?
Domain shift. Domain adaptation should help.
(image source: http://vision.cs.uml.edu/adaptation.html)