2. 1. Uncertainty-based methods
Try to find the samples that are hard for the model to learn.
• Using Bayesian inference to estimate uncertainty: "Deep Bayesian Active Learning with Image Data" (ICML'17)
• Using non-Bayesian methods to estimate uncertainty: "Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles" (NeurIPS'17)
• Using dropout to estimate uncertainty: "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning" (ICML'16)
• Using the entropy of the softmax to estimate uncertainty: "Cost-Effective Active Learning for Deep Image Classification" (2017)
• Using ensembles to estimate uncertainty: "The Power of Ensembles for Active Learning in Image Classification" (CVPR'18)
• Predicting the target losses of unlabeled samples: "Learning Loss for Active Learning" (CVPR'19)
3. 2. Representation-based methods
Try to find a diverse set of samples that optimally represents the distribution of the complete dataset.
• Density-based methods: "Active Learning for Convolutional Neural Networks: A Core-Set Approach" (ICLR'18)
• Learns a VAE-GAN hybrid network to select unlabeled samples that are not well represented in the labeled set: "Variational Adversarial Active Learning" (ICCV'19)
4. Active Learning Acquisition Functions
• Max Entropy: choose pool points that maximise the predictive entropy.
• BALD: choose pool points that are expected to maximise the mutual information between predictions and the model posterior.
• Variation Ratios: choose pool points that maximise the variation ratio (1 minus the frequency of the modal prediction).
• Mean STD: choose pool points that maximise the mean standard deviation of the predicted class probabilities.
(A NumPy sketch of all four scores follows the appendix note below.)
APPENDIX
BALD favours points on which the model is uncertain on average, yet there exist model parameters that produce disagreeing predictions with high certainty.
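A minimal NumPy sketch of these four acquisition scores, assuming a stack of stochastic predictions is already available (e.g. from MC dropout, covered on the next slides, or from an ensemble); the function name, array shapes, and toy example are illustrative rather than the papers' reference code:

```python
import numpy as np

def acquisition_scores(probs):
    """probs: (T, N, C) array of T stochastic forward passes (MC dropout samples
    or ensemble members) over N pool points and C classes."""
    eps = 1e-12
    mean_p = probs.mean(axis=0)                                   # (N, C) predictive distribution

    # Max Entropy: entropy of the mean prediction, H[y | x, D]
    max_entropy = -(mean_p * np.log(mean_p + eps)).sum(axis=1)

    # BALD: H[y | x, D] - E_w[ H[y | x, w] ]  (mutual information)
    mean_pass_entropy = -(probs * np.log(probs + eps)).sum(axis=2).mean(axis=0)
    bald = max_entropy - mean_pass_entropy

    # Variation Ratios: 1 - max_c p(y = c | x, D)  (taken here from the mean probabilities)
    variation_ratios = 1.0 - mean_p.max(axis=1)

    # Mean STD: std of each class probability across passes, averaged over classes
    mean_std = probs.std(axis=0).mean(axis=1)

    return max_entropy, bald, variation_ratios, mean_std

# Toy usage: score 1000 pool points from 20 stochastic passes and query the top 10 by BALD.
logits = np.random.randn(20, 1000, 10)
probs = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)
max_entropy, bald, variation_ratios, mean_std = acquisition_scores(probs)
query_idx = np.argsort(-bald)[:10]
```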
5. Deep Bayesian Active Learning with Image Data (ICML'17)
Bayesian deep learning:
• combines Bayesian deep learning with the active learning framework in a practical way.
Acquisition functions and their approximations:
• The key idea is to approximate the traditional acquisition functions (see the previous slide) under the approximate distribution q*_θ(ω).
• q*_θ(ω) minimises the Kullback-Leibler (KL) divergence to the true model posterior p(ω | D_train) given a training set D_train.
https://arxiv.org/pdf/1703.02910.pdf
https://arxiv.org/pdf/1901.02731.pdf
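Concretely, the expectation under q*_θ(ω) is estimated with T Monte Carlo dropout samples. A sketch of the resulting estimator for the BALD acquisition, writing p̂_c^t = p(y = c | x, ω̂_t) with ω̂_t ~ q*_θ(ω) (notation follows the paper; see the paper for the exact form):

```latex
\hat{\mathbb{I}}\big[y;\omega \mid x,\mathcal{D}_{\text{train}}\big]
  = -\sum_{c}\Big(\tfrac{1}{T}\sum_{t=1}^{T}\hat{p}_{c}^{\,t}\Big)
       \log\Big(\tfrac{1}{T}\sum_{t=1}^{T}\hat{p}_{c}^{\,t}\Big)
    + \tfrac{1}{T}\sum_{t=1}^{T}\sum_{c}\hat{p}_{c}^{\,t}\log\hat{p}_{c}^{\,t},
  \qquad \hat{\omega}_{t}\sim q^{*}_{\theta}(\omega)
```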
6. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (ICML'16)
Dropout approximates Bayesian inference:
• casts dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes.
Uncertainty can be obtained from a dropout NN:
• if the uncertainty envelope of a class intersects those of the other classes (as for the middle input image in the paper's figure), the softmax output uncertainty can be as large as the entire space.
http://proceedings.mlr.press/v48/gal16.pdf
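A minimal PyTorch sketch of MC dropout at test time, assuming the standard recipe of keeping dropout active and averaging T stochastic forward passes; the tiny model, T = 20, and the dummy pool are illustrative only:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Small classifier with dropout; the architecture is only a placeholder.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)

def mc_dropout_probs(model, x, T=20):
    """Return a (T, N, C) tensor of class probabilities from T stochastic passes.
    model.train() keeps dropout sampling active, which is what makes the passes differ."""
    model.train()
    with torch.no_grad():
        samples = [F.softmax(model(x), dim=1) for _ in range(T)]
    return torch.stack(samples)

x_pool = torch.randn(128, 784)                  # dummy unlabeled pool
probs = mc_dropout_probs(model, x_pool)         # feed into the acquisition sketch above
predictive_mean = probs.mean(dim=0)             # approximate predictive distribution
```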
7. Minority & Most Informative Samples:
• selected by the uncertainty-based method and added to the labeled set after active user labeling.
Majority & Clearly Classified Samples:
• automatically selected and iteratively assigned pseudo-labels.
Cost-Effective Active Learning for Deep Image Classification (2017)
https://arxiv.org/abs/1701.03551
(Figure: softmax output of the CNN)
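A hedged NumPy sketch of this split: high-entropy pool points go to the human annotator, low-entropy ones receive pseudo-labels. The function name, fixed entropy threshold, and batch size are illustrative; the paper's exact criteria (it also considers least-confidence and margin sampling, and decays the high-confidence threshold over iterations) differ in detail:

```python
import numpy as np

def ceal_split(probs, n_query=100, entropy_threshold=0.05):
    """probs: (N, C) softmax output of the CNN over the unlabeled pool.
    Returns indices to send to the annotator, plus (index, pseudo-label) pairs."""
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)

    # Minority & most informative: highest-entropy samples, queried for human labels.
    query_idx = np.argsort(-entropy)[:n_query]

    # Majority & clearly classified: low-entropy samples, assigned pseudo-labels.
    confident_idx = np.setdiff1d(np.where(entropy < entropy_threshold)[0], query_idx)
    pseudo_labels = probs[confident_idx].argmax(axis=1)
    return query_idx, list(zip(confident_idx, pseudo_labels))
```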
8. Approximate Acquisition Functions
• Approximate the traditional acquisition functions (see the previous slide) with an ensemble of N classifiers.
The power of ensembles for active learning
in image classification (CVPR’18)
http://openaccess.thecvf.com/content_cvpr_2018/papers/Beluch_The_Power_of_CVPR_2018_paper.pdf
• Approximate the entropy method with the ensemble of N classifiers.
• Approximate the BALD method with the ensemble of N classifiers.
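A sketch of ensemble-based acquisition, assuming the softmax outputs of the N independently trained members have already been collected into one array; the Dirichlet toy data stands in for real predictions:

```python
import numpy as np

def ensemble_acquisition(member_probs):
    """member_probs: (N_models, N_points, N_classes) softmax outputs of the ensemble.
    The ensemble members play the role of posterior samples in entropy and BALD."""
    eps = 1e-12
    mean_p = member_probs.mean(axis=0)                                    # ensemble average
    entropy = -(mean_p * np.log(mean_p + eps)).sum(axis=1)                # entropy of the average
    mean_member_entropy = -(member_probs * np.log(member_probs + eps)).sum(axis=2).mean(axis=0)
    bald = entropy - mean_member_entropy                                  # mutual information
    return entropy, bald

# Toy usage with N = 5 members, 1000 pool points, 10 classes.
member_probs = np.random.dirichlet(np.ones(10), size=(5, 1000))
entropy, bald = ensemble_acquisition(member_probs)
query_idx = np.argsort(-entropy)[:100]
```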
9. Loss prediction module:
• learned to predict the target losses of unlabeled inputs.
• The module can then suggest data on which the target model is likely to produce a wrong prediction.
Learning loss for active learning (CVPR’19)
https://arxiv.org/pdf/1905.03677.pdf
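A PyTorch sketch of the two pieces involved: a pairwise margin loss for training the loss-prediction module (a simplified version of the paper's ranking loss; the pairing scheme and margin are illustrative) and the query step that picks the pool points with the highest predicted loss:

```python
import torch

def loss_prediction_margin_loss(pred_loss, true_loss, margin=1.0):
    """Penalise the module when it orders a pair of samples differently from their true losses.
    pred_loss, true_loss: 1-D tensors of per-sample values for one mini-batch."""
    half = pred_loss.size(0) // 2
    pred_i, pred_j = pred_loss[:half], pred_loss[half:2 * half]
    true_i, true_j = true_loss[:half], true_loss[half:2 * half]
    sign = torch.sign(true_i - true_j)          # +1 if the first of the pair has larger true loss
    return torch.clamp(margin - sign * (pred_i - pred_j), min=0).mean()

def select_by_predicted_loss(pred_loss_pool, n_query=100):
    """Query the unlabeled points whose predicted loss is highest."""
    return torch.topk(pred_loss_pool, n_query).indices

# Toy usage with random values standing in for real predicted/true losses.
ranking_loss = loss_prediction_margin_loss(torch.rand(32), torch.rand(32))
query_idx = select_by_predicted_loss(torch.rand(1000), n_query=10)
```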
10. Active learning for convolutional neural networks: A
core-set approach (ICLR’18)
https://arxiv.org/abs/1708.00489
Core-set selection problem:
• find a small subset of a large labeled dataset such that a model learned over the small subset remains competitive over the whole dataset.
Selects a batch of samples:
• choose b center points such that the largest distance between any data point and its nearest center is minimised in feature space (the k-Center problem).
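A NumPy sketch of the greedy k-Center selection described above (the paper also gives a robust mixed-integer formulation); the feature matrix is assumed to come from the trained network's embedding layer, and the dummy inputs are illustrative:

```python
import numpy as np

def k_center_greedy(features, labeled_idx, b):
    """features: (N, d) embeddings of all points; labeled_idx: indices already labeled;
    b: number of new centers to pick (the query batch size)."""
    n = features.shape[0]
    # Distance from every point to its nearest already-chosen center.
    min_dist = np.full(n, np.inf)
    for i in labeled_idx:
        min_dist = np.minimum(min_dist, np.linalg.norm(features - features[i], axis=1))

    selected = []
    for _ in range(b):
        idx = int(np.argmax(min_dist))          # farthest point from all current centers
        selected.append(idx)
        min_dist = np.minimum(min_dist, np.linalg.norm(features - features[idx], axis=1))
    return selected

# Toy usage: 1000 points with 512-d features, 100 already labeled, query 10 more.
features = np.random.randn(1000, 512)
query_idx = k_center_greedy(features, labeled_idx=list(range(100)), b=10)
```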
11. The minimax game between the VAE and the Discriminator:
Discriminator:
• trained to discriminate between unlabeled and labeled data.
VAE:
• trained to trick the adversarial network into predicting that all data points are from the labeled pool.
The samples that the Discriminator predicts as unlabeled are the informative ones.
Variational adversarial active learning (ICCV’19)
http://openaccess.thecvf.com/content_ICCV_2019/papers/Sinha_Variational_Adversarial_Active_Learning_ICCV_2019_paper.pdf
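A PyTorch sketch of the selection step after the minimax training: rank pool points by how confident the discriminator is that they are unlabeled and query the least "labeled-looking" ones. The encoder/discriminator interfaces and the dummy linear modules are illustrative stand-ins, not the paper's architecture:

```python
import torch
import torch.nn as nn

def vaal_select(vae_encoder, discriminator, x_pool, n_query=100):
    """Score unlabeled points with the discriminator (assumed to output the logit of
    being *labeled*) on the VAE latent codes, and query the lowest-scoring points."""
    with torch.no_grad():
        z = vae_encoder(x_pool)                               # latent codes of the pool
        p_labeled = torch.sigmoid(discriminator(z)).squeeze(-1)
    # Lowest probability of being labeled == least represented == most informative.
    return torch.topk(-p_labeled, n_query).indices

# Dummy stand-ins so the sketch runs; in VAAL these are the trained VAE encoder
# and the labeled-vs-unlabeled discriminator.
vae_encoder = nn.Linear(784, 32)
discriminator = nn.Linear(32, 1)
x_pool = torch.randn(500, 784)
query_idx = vaal_select(vae_encoder, discriminator, x_pool, n_query=10)
```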