1. Do Adversarially Robust ImageNet Models Transfer Better?
2020/11/29
Ho Seong Lee (hoya012)
Cognex Deep Learning Lab, Research Engineer
PR-290 | Do Adversarially Robust ImageNet Models Transfer Better? 1
2. Contents
• Introduction
• Related Work
• Experiments
• Analysis & Discussion
• Conclusion
3. Introduction
Transfer learning is a widely used paradigm in deep learning (perhaps the default by now?)
• Models pre-trained on standard datasets (e.g. ImageNet) can be efficiently adapted to downstream tasks.
• Better pre-trained models yield better transfer results, suggesting that initial accuracy is a key aspect of
transfer learning performance.
Reference: “Do Better ImageNet Models Transfer Better?”, 2019 CVPR
4. Related Work
Transfer learning in various domains
• Medical imaging
• “Comparison of deep transfer learning strategies for digital pathology”, 2018 CVPRW
• Language modeling
• “Senteval: An evaluation toolkit for universal sentence representations”, 2018 arXiv
• Object Detection, Segmentation
• “Faster r-cnn: Towards real-time object detection with region proposal networks”, 2015 NIPS
• “R-fcn: Object detection via region-based fully convolutional networks”, 2016 NIPS
• “Speed/accuracy trade-offs for modern convolutional object detectors”, 2017 CVPR
• “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and
fully connected crfs”, 2017 TPAMI
5. Related Work
Transfer Learning with fine-tuning or frozen feature-based methods
• “Analyzing the performance of multilayer neural networks for object recognition”, 2014 ECCV
• “Return of the devil in the details: Delving deep into convolutional nets”, 2014 arXiv
• “Rich feature hierarchies for accurate object detection and semantic segmentation”, 2014 CVPR
• “How transferable are features in deep neural networks?”, 2014 NIPS
• “Factors of transferability for a generic convnet representation”, 2015 TPAMI
• “Bilinear cnn models for fine-grained visual recognition”, 2015 ICCV
• “What makes ImageNet good for transfer learning?”, 2016 arXiv
• “Best practices for fine-tuning visual classifiers to new domains”, 2016 ECCV
→ They show that fine-tuning outperforms frozen feature-based methods
6. Related Work
Adversarial robustness
• “Towards deep learning models resistant to adversarial attacks”, 2018 ICLR
• “Virtual adversarial training: a regularization method for supervised and semi-supervised learning”,
2018
• “Provably robust deep learning via adversarially trained smoothed classifiers”, 2019 NeurIPS
• Many papers have studied the features learned by these robust networks and suggested that they
improve upon those learned by standard networks.
• On the other hand, prior studies have also identified theoretical and empirical tradeoffs between
standard accuracy and adversarial robustness.
7. Related Work
Adversarial robustness and Transfer learning
• “Adversarially robust transfer learning”, 2019 arXiv
• Transfer learning can increase downstream-task adversarial robustness
• “Adversarially-Trained Deep Nets Transfer Better”, 2020 arXiv
• Investigate the transfer performance of adversarially robust networks. → Very similar work!
• Authors study a larger set of downstream datasets and tasks and analyze the effects of model
accuracy, model width, and data resolution.
8. Experiments
Motivation: Fixed-Feature Transfer Learning
• Basically, we use the source model as a feature extractor for the target dataset, then train a simple (often
linear) model on the resulting features
Reference: Stanford cs231n lecture note
9. Experiments
How can we improve transfer learning?
• Prior works suggest that accuracy on the source dataset is a strong indicator of performance on
downstream tasks.
• Still, it is unclear if improving ImageNet accuracy is the only way to improve performance.
• After all, the behavior of fixed-feature transfer is governed by models’ learned representations, which
are not fully described by source-dataset accuracy.
• These representations are, in turn, controlled by the priors that we put on them during training:
architectural components, loss functions, augmentations, etc.
10. Experiments
The adversarial robustness prior
• Adversarial robustness refers to a model’s invariance to small (often imperceptible) perturbations of its
inputs.
• Robustness is typically induced at training time by replacing the standard empirical risk minimization
objective with a robust optimization objective
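Concretely, adversarial training swaps the standard ERM objective for a min-max problem. A common formulation (following Madry et al., cited earlier, with an ℓ2 perturbation budget ε as used in this paper) is:

```latex
\min_{\theta}\; \mathbb{E}_{(x,y)\sim D}\left[\max_{\|\delta\|_{2}\le\varepsilon}\; \mathcal{L}(\theta;\, x+\delta,\, y)\right]
```

The inner maximization is typically approximated with projected gradient descent (PGD); ε controls the strength of the robustness prior, and ε = 0 recovers standard training.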
11. Experiments
Should adversarial robustness help fixed-feature transfer?
• In fact, adversarially robust models are known to be significantly less accurate than their standard
counterparts.
• This suggests that using adversarially robust feature representations should hurt transfer performance.
• On the other hand, recent work has found that the feature representations of robust models carry
several advantages over those of standard models.
• For example, adversarially robust representations typically have better-behaved gradients and thus
facilitate regularization-free feature visualization
12. Experiments
Experiments – Fixed Feature Transfer Learning
• To resolve these two conflicting hypotheses (robust representations hurt transfer performance vs.
robust representations carry several advantages over standard ones), the authors use a test bed of
12 standard transfer learning datasets.
• Use four ResNet-based architectures (ResNet-18, ResNet-50, WideResNet-50-2, WideResNet-50-4)
• The results indicate that robust networks consistently extract better features for transfer learning than
standard networks.
13. Experiments
Experiments – Fixed Feature Transfer Learning (cont.)
• Same setup as the previous slide: 12 standard transfer learning datasets and four ResNet-based architectures.
14. Experiments
Experiments – Full-Network Fine Tuning
• A more expensive but often better-performing transfer learning method uses the pre-trained model as a
weight initialization rather than as a feature extractor.
• In other words, update all of the weights of the pre-trained model (via gradient descent) to minimize loss
on the target task.
• Many previous works find that for standard models, performance on full-network transfer learning is
highly correlated with performance on fixed-feature transfer learning.
• Hope that the findings of the last section (fixed-feature) also carry over to this setting (full-network).
15. Experiments
Experiments – Full-Network Fine Tuning
• Robust models match or improve on standard models in terms of transfer learning performance.
16. Experiments
Experiments – Full-Network Fine Tuning
• Also, adversarially robust networks consistently outperform standard networks in Object Detection &
Instance Segmentation
17. Analysis & Discussion
4.1 ImageNet accuracy and transfer performance
• Take a closer look at the similarities and differences in transfer learning between robust networks and
standard networks.
• Hypothesis: robustness and accuracy have counteracting yet separate effects!
• That is, higher accuracy improves transfer learning for a fixed level of robustness, and higher
robustness improves transfer learning for a fixed level of accuracy
• The results (cf. Figure 5; similar results for full-network transfer in Appendix F) support this hypothesis.
• The previously observed linear relationship between accuracy and transfer performance is often violated
once the robustness aspect comes into play.
19. Analysis & Discussion
4.1 ImageNet accuracy and transfer performance
• In even more direct support of this hypothesis, the authors find that when the robustness level is held
fixed, the accuracy–transfer correlation observed by prior works for standard models holds for robust models too.
• These findings also indicate that accuracy is not a sufficient measure of feature quality or versatility.
20. Analysis & Discussion
4.2 Robust models improve with width
• Previous works find that although increasing network depth improves transfer performance, increasing
width hurts it.
• The results corroborate this trend for standard networks but indicate that it does not hold for robust
networks, at least in the regime of widths tested.
• As width increases, transfer performance plateaus and decreases for standard models, but continues to
steadily grow for robust models.
Not always!!
21. Analysis & Discussion
4.2 Robust models improve with width (cont.)
• As before: increasing width hurts transfer for standard networks but not for robust ones, in the regime of widths tested.
22. Analysis & Discussion
4.3 Optimal robustness levels for downstream tasks
• Although the best robust models often outperform the best standard models, the optimal choice of the
robustness parameter ε varies widely between datasets. For example, when transferring to CIFAR-10
and CIFAR-100, the optimal ε values were 3.0 and 1.0, respectively.
• In contrast, smaller values of ε (smaller by an order of magnitude) tend to work better for the rest of the
datasets.
• One possible explanation for this variability in the optimal choice of ε might relate to dataset granularity.
• Although we lack a quantitative notion of granularity (in reality, features are not simply singular pixels),
we consider image resolution as a crude proxy.
23. Analysis & Discussion
4.3 Optimal robustness levels for downstream tasks
• Since we scale target datasets to match ImageNet dimensions, each pixel in a low-resolution image
(e.g., from CIFAR-10) translates into several pixels after upscaling, thus inflating the dataset’s separability.
• The authors attempt to calibrate the granularities of the 12 image classification datasets used in this work
by first downscaling all the images to the size of CIFAR-10 (32×32), and then upscaling them to ImageNet
size once more.
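The down-then-up scaling step can be sketched as follows (a hypothetical helper; the 32 and 224 sizes follow the slide, and bilinear interpolation is an assumption):

```python
import torch
import torch.nn.functional as F

def calibrate_granularity(images, low=32, high=224):
    """Downscale to CIFAR-10 size, then upscale back to ImageNet input size,
    so that datasets share roughly the same effective granularity."""
    small = F.interpolate(images, size=(low, low),
                          mode="bilinear", align_corners=False)
    return F.interpolate(small, size=(high, high),
                         mode="bilinear", align_corners=False)

batch = torch.randn(2, 3, 224, 224)   # dummy high-resolution batch
out = calibrate_granularity(batch)    # same shape, but high-frequency detail removed
```

The output keeps ImageNet dimensions while discarding fine detail, which is exactly what makes the comparison across datasets fair.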
24. Analysis & Discussion
4.3 Optimal robustness levels for downstream tasks
• After controlling for original dataset dimension, the datasets’ epsilon vs. transfer accuracy curves all
behave almost identically to the CIFAR-10 and CIFAR-100 ones. (Similar results hold for full-network transfer.)
25. Analysis & Discussion
4.4 Comparing adversarial robustness to texture robustness
• Consider texture-invariant models, i.e., models trained on the texture-randomizing Stylized ImageNet
(SIN) dataset.
26. Analysis & Discussion
4.4 Comparing adversarial robustness to texture robustness
• Transfer learning from adversarially robust models outperforms transfer learning from texture-invariant
models on all considered datasets.
(Figure: results shown separately for full-network and fixed-feature transfer.)
27. Conclusion
• Propose using adversarially robust models for transfer learning.
• Compare transfer learning performance of robust and standard models on a suite of 12
classification tasks, object detection, and instance segmentation.
• Find that adversarially robust neural networks consistently match or improve upon the
performance of their standard counterparts, despite having lower ImageNet accuracy.
• Take a closer look at the behavior of adversarially robust networks, and study the interplay
between ImageNet accuracy, model width, robustness, and transfer performance.
28. Conclusion
• We can simply try these experiments ourselves! (https://github.com/Microsoft/robust-models-transfer)