1. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Yoonho Lee
Department of Computer Science and Engineering
Pohang University of Science and Technology
September 14, 2017
3. Meta-learning
RL methods take a long time to train, which motivates meta-learning.
The meta-train set for humans would be objects in real life,
experience playing different games, etc.
7. Previous Deep Meta-Learning Methods
RNNs as learners¹,²
Given a sufficiently expressive RNN, the search space
includes all conceivable ML algorithms
Moves the burden of innovation onto the RNN
Ignores advances achieved in ML by humans
Subpar results
¹ Adam Santoro et al. “One-shot Learning with Memory-Augmented Neural Networks”. In: ICML (2016).
² Yan Duan et al. “RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning”. In: (2016).
9. Previous Deep Meta-Learning Methods
Metric Learning³,⁴
Learn a metric in input space
Specialized to one/few-shot classification (Omniglot,
MiniImageNet, etc.)
Cannot be used for other problem settings (e.g. RL)
³ Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. “Siamese Neural Networks for One-shot Image Recognition”. In: ICML (2015).
⁴ Oriol Vinyals et al. “Matching Networks for One Shot Learning”. In: NIPS (2016).
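The metric-learning idea above can be sketched as nearest-neighbor classification in a learned embedding space. Below is a minimal illustration; the embedding function `embed` is a placeholder for a learned network, and all names and data are illustrative assumptions, not from any specific paper.

```python
import numpy as np

def one_shot_classify(embed, support_x, support_y, query_x):
    """Classify a query as the label of the most similar support example,
    where similarity is cosine similarity in embedding space."""
    s = np.stack([embed(x) for x in support_x])  # (N, d) support embeddings
    q = embed(query_x)                           # (d,) query embedding
    # Cosine similarity between the query and each support embedding.
    sims = s @ q / (np.linalg.norm(s, axis=1) * np.linalg.norm(q) + 1e-8)
    return support_y[int(np.argmax(sims))]

# Toy 3-way 1-shot task with an identity "embedding" standing in for a
# trained network.
embed = lambda x: np.asarray(x, dtype=float)
support_x = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
support_y = np.array([0, 1, 2])
print(one_shot_classify(embed, support_x, support_y, [0.9, 0.1]))  # → 0
```

In the real methods (Siamese/Matching Networks), `embed` is trained end-to-end so that this nearest-neighbor rule classifies held-out episodes well; the sketch makes clear why the approach is tied to classification.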
11. Previous Deep Meta-Learning Methods
Optimizer Learning⁵,⁶
Learn the parameter update given gradients (the search space
includes SGD, RMSProp, Adam, etc.)
Applicable to any architecture/task
Best performance on Omniglot, MiniImageNet
⁵ Sachin Ravi and Hugo Larochelle. “Optimization as a Model for Few-shot Learning”. In: ICLR (2017).
⁶ Marcin Andrychowicz et al. “Learning to learn by gradient descent by gradient descent”. In: NIPS (2016).
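To make "the search space includes SGD" concrete, here is a minimal sketch of an update rule parameterized by meta-learnable quantities. The parameterization (`alpha`, `beta`) is a deliberately simple assumption for illustration, not the form used in the cited papers, which learn richer update functions (e.g. an LSTM over gradients).

```python
import numpy as np

def learned_update(theta, grad, alpha, beta):
    """A parameter update that is itself a function of learned parameters.

    alpha: per-parameter learning rates; beta: weight on a sign-of-gradient
    term. With beta = 0 this reduces exactly to SGD, so plain SGD lies
    inside the search space of update rules.
    """
    return theta - alpha * grad + beta * np.sign(grad)

theta = np.array([1.0, -2.0])
grad = np.array([0.5, -0.5])
# beta = 0 recovers SGD with learning rate 0.1: theta - 0.1 * grad.
print(learned_update(theta, grad, alpha=0.1, beta=0.0))
```

In optimizer-learning methods, `alpha` and `beta` (or the weights of a small network replacing this formula) are trained across many tasks so that a few applications of the update rule yield low loss on a new task.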
13. ImageNet pre-training
Pretrain a network on ImageNet classification, then fine-tune it
on a new task.
Enables NNs to learn new vision tasks from relatively small
datasets.
This works because we have a huge labelled image dataset,
and the manifold of images has a somewhat consistent
structure even across different datasets and tasks.
How do we bring ‘initialization as meta-learning’ to non-vision
domains such as speech/NLP/RL?
22. Discussion
Parameter-space noise (as opposed to action-space noise) has
been shown to result in more consistent exploration⁸,⁹. This
supports MAML’s idea of adapting in parameter space.
Why does MAML not overfit when taking multiple gradient
steps?
Do we need to overwrite all weights during adaptation?
⁸ Matthias Plappert et al. “Parameter Space Noise for Exploration”. In: (2017).
⁹ Meire Fortunato et al. “Noisy Networks for Exploration”. In: (2017).
23. References I
[1] Marcin Andrychowicz et al. “Learning to learn by gradient descent by gradient descent”. In: NIPS (2016).
[2] Yan Duan et al. “RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning”. In: (2016).
[3] Chelsea Finn, Pieter Abbeel, and Sergey Levine. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”. In: ICML (2017).
[4] Meire Fortunato et al. “Noisy Networks for Exploration”. In: (2017).
[5] Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. “Siamese Neural Networks for One-shot Image Recognition”. In: ICML (2015).
[6] Zhenguo Li et al. “Meta-SGD: Learning to Learn Quickly for Few Shot Learning”. In: (2017).
24. References II
[7] Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel. “Meta-Learning with Temporal Convolutions”. In: (2017).
[8] Matthias Plappert et al. “Parameter Space Noise for Exploration”. In: (2017).
[9] Sachin Ravi and Hugo Larochelle. “Optimization as a Model for Few-shot Learning”. In: ICLR (2017).
[10] Adam Santoro et al. “One-shot Learning with Memory-Augmented Neural Networks”. In: ICML (2016).
[11] Oriol Vinyals et al. “Matching Networks for One Shot Learning”. In: NIPS (2016).
[12] JX Wang et al. “Learning to Reinforcement Learn”. In: (2016).