Efficient Neural Architecture Search via Parameters Sharing @ ICML2018 Reading Group

© DeNA Co., Ltd.
Paper introduction
Efficient Neural Architecture Search via Parameters Sharing
ICML2018 Reading Group
July 28, 2018
Tomohiro Kato
AI System Dept.
DeNA Co., Ltd.

About this deck
 This deck was presented at the ICML2018 Reading Group @ DeNA (2018/07/28)
⁃ https://connpass.com/event/92705/

Agenda
 Introduction
 Overview of Neural Architecture Search
 Paper introduction
 Summary and impressions

About me
 Tomohiro Kato
 AI System Dept., System Division, DeNA Co., Ltd.
 Works on computer vision projects
 Recent posts (links)
 Chainer → NNVM/TVM → Android
 Pruning PyTorch models with NN Distiller
 Making deep learning models lightweight (ICLR2018)
Twitter: @_tkato_

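Introduction
 Efficient Neural Architecture Search via Parameters Sharing (ENAS) [1]
Hieu Pham, Melody Guan, Barret Zoph, Quoc Le, Jeff Dean
⁃ 1000x less expensive than standard Neural Architecture Search
⁃ 450 GPUs for 3~4 days → 1 GPU for <16 hours
 Why I chose this paper
⁃ It is a NAS that can be run in a realistic amount of time
• I want to try it on my own tasks
⁃ I was drawn to the authors' official implementation (TensorFlow)
• The implementation of parameter sharing, the key idea, is interesting
• I wanted to decode the implementation and understand it deeply → reimplemented it in Chainer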

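Neural Architecture Search
 Goal: automatically design the best model architecture for a given objective
 Approaches: reinforcement learning (RL) based [1,3,4], evolutionary computation (GA) based, Progressive Search [5], etc.
 Challenge 1: computing the reward requires training many candidate models (children), which is slow
⁃ Use a model that predicts accuracy
⁃ Share weights among candidate models (ENAS, DARTS[2])
 Challenge 2: the search space is huge
⁃ Handcraft the candidates in advance, or search only part of the model and repeat it (NASNet[4])
(Figure: search space (layer types, number of parameters, ...) → selection → evaluation (accuracy, computational cost, inference speed, ...) → good model)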

NAS family
NASNet[4] (arXiv 1707.07012, CVPR2018): 450 GPUs (P100), 3~4 days
NAS[3] (arXiv 1611.01578, ICLR2017): 800 GPUs (K40), 21~28 days
PNAS[5] (arXiv 1712.00559): 100 GPUs (P100), 1.5 days
ENAS[1] (arXiv 1802.03268, ICML2018): 1 GPU (1080Ti), 0.45 days
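The "1000x less expensive" claim can be sanity-checked in GPU-days (number of GPUs × days of search). The Python sketch below uses only the numbers listed above; it ignores the differences between K40, P100 and 1080Ti, so the ratios are only rough.

```python
# Search cost in GPU-days, using the numbers listed on the slide.
# GPU models differ (K40, P100, 1080Ti), so ratios are only rough indications.
SEARCHES = {
    # name: (number of GPUs, min days, max days)
    "NAS": (800, 21.0, 28.0),
    "NASNet": (450, 3.0, 4.0),
    "PNAS": (100, 1.5, 1.5),
    "ENAS": (1, 0.45, 0.45),
}


def gpu_days(num_gpus, days):
    """Total compute = number of GPUs x wall-clock days."""
    return num_gpus * days


costs = {name: (gpu_days(g, lo), gpu_days(g, hi)) for name, (g, lo, hi) in SEARCHES.items()}
enas_cost = costs["ENAS"][0]

for name, (lo, hi) in costs.items():
    print(f"{name:7s} {lo:8.2f}-{hi:8.2f} GPU-days "
          f"({lo / enas_cost:,.0f}x-{hi / enas_cost:,.0f}x ENAS)")
```

With these numbers, NASNet comes out at roughly 3000-4000x the GPU-days of ENAS.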

Experimental results: CIFAR10
NAS family plotted from Table 2 of ENAS [1]
(Plot annotations: 800GPU(K40) 21~28days, 100GPU(P100) 1.5days, 1GPU(1080Ti) 0.45days, 450GPU(P100) 3~4days ★)

Experimental results: CIFAR10
 0.45 days on 1 GPU
 Accuracy is slightly lower than NASNet-A + CutOut (-0.34%)
⁃ Possibly due to the size of the search space?
From Table 2 of [1] (GPU column: K40, 1080Ti, 1080Ti, P100, P100)

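ENAS basic design
(Figure from [1]: Controller and Child, nodes 1 2 3 4) *1
The controller alternately outputs (softmax):
・the index of each node's input
・the Op type of each node
The search space is handcrafted
The meta-architecture*2 is chosen per task
*1 B. Zoph et al.'s RL-based NAS works essentially the same way (NAS[3], NASNet[4])
*2 Not defined in the paper; defined in these slides for explanation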

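ENAS basic design
 In addition to the above, two more meta-architectures are defined
⁃ CNN micro search space: a stack of 2 kinds of cells; the inside of each cell is searched
⁃ RNN cell: the controller predicts activation functions and inputs; the goal is a cell like LSTM or GRU
(Figures from [1]: Controller and Child for each)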

ENAS key contribution
 The search space is defined as a single DAG, and each child is a subgraph of it
 Weights are shared among children, so no child has to be trained from scratch
(Figure from [1]: the controller decides which nodes to connect at each step)
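A minimal Python sketch of the idea, under toy assumptions: a float stands in for each weight tensor, each searched node picks one earlier node and one op (a simplified, hypothetical connection rule), and `shared`, `sample_child` and `train_step` are illustrative names, not the ENAS code. Every child indexes into the same store, so a newly sampled child starts from weights that earlier children already trained.

```python
import random

OPS = ["sep3x3", "sep5x5", "max3x3", "avg3x3", "id"]
NUM_NODES = 4  # node 0 is the input; nodes 1..3 are searched (hypothetical size)

# One parameter store for the whole DAG: each edge (node, source node, op) that
# any child could use has exactly one entry. A float stands in for a weight tensor.
shared = {
    (node, src, op): 0.0
    for node in range(1, NUM_NODES)
    for src in range(node)
    for op in OPS
}


def sample_child(rng):
    """A child is a subgraph: every searched node picks one earlier node and one op."""
    return [(node, rng.randrange(node), rng.choice(OPS)) for node in range(1, NUM_NODES)]


def train_step(child, lr=0.1):
    """Stand-in for one SGD step: only the entries this child uses are updated."""
    for edge in child:
        shared[edge] += lr  # pretend every used weight had gradient -1


rng = random.Random(0)
child_a = sample_child(rng)
train_step(child_a)
child_b = sample_child(rng)
# Edges that child_b shares with child_a start from already-trained values.
reused = [edge for edge in child_b if shared[edge] != 0.0]
print(len(shared), "shared entries;", len(child_a), "trained;", len(reused), "reused by child_b")
```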

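ENAS training loop
 At every iteration, sample a child from the controller and train it
⁃ The controller's weights are fixed, but it outputs actions stochastically, so the child differs every time
⁃ Only part of one large graph is run forward/backward
⁃ The controller is trained with RL to maximize the child's validation accuracy
for epoch
    for batch
        sample one child from the controller
        train the child for 1 iteration
    train the controller for n iterations
After the search, the best-performing model is trained from scratch
(the meta-architecture changes slightly, since the constraints needed for weight sharing no longer apply)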
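The loop above can be sketched in Python as follows, assuming (as the pseudocode suggests) that the controller update runs once per epoch. All callback names are hypothetical stand-ins; the toy usage only counts how often each phase runs.

```python
import random


def enas_search(num_epochs, batches_per_epoch, controller_steps,
                sample_child, train_child, train_controller):
    """Skeleton of the alternating search loop (structure only).

    sample_child, train_child and train_controller are hypothetical callbacks
    standing in for the controller sampler, one SGD step on the shared DAG,
    and one policy-gradient step of the controller.
    """
    for _epoch in range(num_epochs):
        # Phase 1: controller weights fixed; a new child is sampled per batch,
        # and only that child's subgraph of the shared DAG is updated.
        for batch in range(batches_per_epoch):
            child = sample_child()
            train_child(child, batch)
        # Phase 2: shared weights fixed; the controller is trained n times.
        for _ in range(controller_steps):
            train_controller()


# Toy usage: count how often each phase runs.
rng = random.Random(0)
calls = {"child": 0, "controller": 0}
seen_children = set()


def sample_child():
    return tuple(rng.randrange(5) for _ in range(4))  # 4 hypothetical decisions


def train_child(child, batch):
    calls["child"] += 1
    seen_children.add(child)


def train_controller():
    calls["controller"] += 1


enas_search(3, 10, 2, sample_child, train_child, train_controller)
print(calls)  # {'child': 30, 'controller': 6}
```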

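CNN micro search space
 The model is designed as a stack of cells, and the inside of each cell is decided by the controller (NASNet[4])
 More accurate than the macro search space
All cells share the same architecture across layers, but each has its own independent weights
(Figure from [1]: Child and Controller)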
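How a stack of cells fits together can be sketched with shapes only. Assumptions (hypothetical, not taken from the paper): a CIFAR10-sized 32x32 input, 20 initial channels, and three stages separated by reduction cells that halve the spatial size and double the channels, as described for the Reduction Cell later in this deck.

```python
def cell_stack(cells_per_stage, num_stages=3, channels=20, size=32):
    """Return (position, cell_type, channels, height_and_width) for each cell.

    Assumed layout: `cells_per_stage` conv cells per stage, with a reduction
    cell between stages that halves height/width (stride 2) and doubles the
    channels. Every conv cell reuses the same sampled conv-cell architecture,
    but each position owns its own weights (identified here by `position`).
    """
    layers = []
    for stage in range(num_stages):
        if stage > 0:
            channels *= 2
            size //= 2
            layers.append((len(layers), "reduction", channels, size))
        for _ in range(cells_per_stage):
            layers.append((len(layers), "conv", channels, size))
    return layers


for layer in cell_stack(cells_per_stage=2):
    print(layer)
```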

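CNN micro search space: Child hierarchy
Layer > Cell > Node > Op
(Figure from [1]: each node adds two Ops chosen from sep3x3, sep5x5, max3x3, avg3x3, id; node outputs are gathered and concatenated)
* For explanation, some definitions here differ strictly from those in the papers/implementations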

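Cell
 Defines a Conv Cell and a Reduction Cell (NASNet[4])
Op types: sep3x3, sep5x5, max3x3, avg3x3, id
Connection rules
• Concatenate the outputs of 5 add(op1, op2) nodes
• Each Op takes as input one of
• the outputs of Ops preceding it in the cell
• the outputs of the previous 2 layers
Conv and Reduction cells have basically the same structure, except that the Reduction cell
• downsamples with stride 2 before the cell input
• doubles the number of output channels at the same time
(Figure from [1])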
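The connection rules can be written as a small sampler. This Python sketch draws every choice uniformly at random purely to show which choices exist; in ENAS the controller makes them. The input numbering and the concat of all 5 node outputs follow this slide; the function names are hypothetical.

```python
import random

OPS = ["sep3x3", "sep5x5", "max3x3", "avg3x3", "id"]


def sample_cell(rng, num_nodes=5):
    """Sample one cell under the connection rules above (uniformly, for illustration).

    Input index 0 is h[t-1] and 1 is h[t] (the previous two layers);
    index k >= 2 is the output of node k - 2 of this cell.
    Each node computes add(op1(x1), op2(x2)).
    """
    cell = []
    for node in range(num_nodes):
        num_candidates = 2 + node  # previous two layers + all earlier nodes
        x1 = rng.randrange(num_candidates)
        x2 = rng.randrange(num_candidates)
        op1, op2 = rng.choice(OPS), rng.choice(OPS)
        cell.append((x1, op1, x2, op2))
    return cell


def cell_output_channels(node_channels, num_nodes=5):
    """The cell output concatenates all node outputs along channels."""
    return num_nodes * node_channels


rng = random.Random(0)
for node, (x1, op1, x2, op2) in enumerate(sample_cell(rng)):
    print(f"node {node}: add({op1}(in[{x1}]), {op2}(in[{x2}]))")
print(cell_output_channels(20), "channels out")  # 100 channels out
```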

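(Figure: inputs h[t-1] and h[t], conv 1x1, weights W0 W1 W2 W3 W4, concat; all edges that can be connected are shown, and the controller selects among them)
Node inputs and outputs basically share the same interface (NCHW); when the channel counts differ, a conv1x1 aligns them
Weights are independent for each input and are selected according to the input
* Each node actually has 4 inputs coming from h[t] and h[t-1], omitted here for simplicity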
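What the conv1x1 does to align interfaces can be shown on a single CHW tensor (batch dimension dropped) in plain Python. This is an illustrative sketch, not the ENAS or Chainer code, which use framework convolution ops; the BN and ReLU that usually accompany it are omitted.

```python
def conv1x1(x, weight):
    """1x1 convolution of one C x H x W tensor stored as nested lists.

    weight has shape C_out x C_in: a 1x1 kernel only mixes channels at each
    pixel, so H and W are unchanged while the channel count becomes C_out.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if any(len(row) != c_in for row in weight):
        raise ValueError("weight must have shape C_out x C_in")
    return [
        [[sum(w_o[i] * x[i][r][c] for i in range(c_in)) for c in range(w)]
         for r in range(h)]
        for w_o in weight
    ]


# Align a 4-channel input to a node that expects 2 channels.
x = [[[float(i + 1)] * 2 for _ in range(2)] for i in range(4)]  # channel i is all (i + 1)
weight = [[1.0, 1.0, 1.0, 1.0],   # output channel 0: sum of all input channels
          [1.0, 0.0, 0.0, 0.0]]   # output channel 1: copy of input channel 0
y = conv1x1(x, weight)
print(len(y), len(y[0]), len(y[0][0]))  # 2 2 2
print(y[0][0][0], y[1][0][0])           # 10.0 1.0
```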

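Node, Op
(Figure: Op1 and Op2, each one of sep3x3, sep5x5, max3x3, avg3x3, id, are combined by Add; gather, concat; channel annotations C ch, 5*C ch, C ch)
 5 kinds of Ops are predefined, and the controller selects exactly one for each Op: separable conv, pooling, or identity (do nothing)
NASNet[4] had 12 variations, including sep7x7 and conv1x3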
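The Op choice boils down to a lookup table. This Python sketch works on a single H x W channel and implements only the parameter-free Ops (max3x3, avg3x3, id); the separable convs need learned weights and are left out. The average pool ignores out-of-bounds positions at the border, which is an assumption about padding, and the names are hypothetical.

```python
def pool3x3(x, reduce_fn):
    """3x3 pooling with stride 1 and 'same' output size on one H x W channel.

    Only in-bounds neighbours are used, so border windows are smaller.
    """
    h, w = len(x), len(x[0])
    return [
        [reduce_fn([x[i][j]
                    for i in range(max(r - 1, 0), min(r + 2, h))
                    for j in range(max(c - 1, 0), min(c + 2, w))])
         for c in range(w)]
        for r in range(h)
    ]


# Parameter-free Ops only; sep3x3 / sep5x5 need learned weights.
OPS = {
    "max3x3": lambda x: pool3x3(x, max),
    "avg3x3": lambda x: pool3x3(x, lambda v: sum(v) / len(v)),
    "id": lambda x: [row[:] for row in x],
}


def node(x1, op1, x2, op2):
    """A node computes add(op1(x1), op2(x2)); the controller only picks names."""
    a, b = OPS[op1](x1), OPS[op2](x2)
    return [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(node(x, "max3x3", x, "id"))  # [[6, 8, 9], [12, 14, 15], [15, 17, 18]]
```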

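Op implementation: sep3x3
(Figure: dw 3x3, pw 1x1, bn, relu; weight sets W0 W1 W2 selected by the input h0, h1, h2, ..., h5; tensor shapes c*h*w, 5c*h*w, c*h*w)
Weights are switched according to the input
(as many weight sets as there are possible inputs are allocated in advance)
*1 The separable conv is repeated in the same way
*2 No relu-bn between dw and pw (NASNet[4])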
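The weight switching can be sketched in plain Python on one CHW tensor. Assumptions: `SharedSepConv3x3` and its helpers are hypothetical names, the op order relu → depthwise → pointwise is assumed (no relu/bn between dw and pw, per *2), and BN plus the repetition in *1 are left out. This is not the authors' TensorFlow code or the Chainer reimplementation.

```python
import random


def relu(x):
    return [[[max(v, 0.0) for v in row] for row in ch] for ch in x]


def depthwise3x3(x, kernels):
    """Depthwise 3x3 conv (stride 1, zero padding): channel k uses only kernels[k]."""
    h, w = len(x[0]), len(x[0][0])

    def at(ch, r, c):
        return ch[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return [
        [[sum(k[dr + 1][dc + 1] * at(ch, r + dr, c + dc)
              for dr in (-1, 0, 1) for dc in (-1, 0, 1))
          for c in range(w)]
         for r in range(h)]
        for ch, k in zip(x, kernels)
    ]


def pointwise(x, weight):
    """1x1 conv: mixes channels at every pixel (weight is C_out x C_in)."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(w_o[i] * x[i][r][c] for i in range(len(x))) for c in range(w)]
         for r in range(h)]
        for w_o in weight
    ]


class SharedSepConv3x3:
    """sep3x3 Op holding one weight set per possible input (hypothetical sketch).

    weights[i] plays the role of W_i on the slide: all sets are allocated up
    front, and the forward pass picks the set belonging to the chosen input.
    Assumed order: relu -> depthwise -> pointwise; BN and the repeat are omitted.
    """

    def __init__(self, channels, num_inputs, seed=0):
        rng = random.Random(seed)
        self.weights = [
            {
                "dw": [[[rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(3)]
                       for _ in range(channels)],
                "pw": [[rng.gauss(0.0, 0.1) for _ in range(channels)]
                       for _ in range(channels)],
            }
            for _ in range(num_inputs)
        ]

    def __call__(self, x, input_index):
        w = self.weights[input_index]  # switch weights by input index
        return pointwise(depthwise3x3(relu(x), w["dw"]), w["pw"])


op = SharedSepConv3x3(channels=2, num_inputs=3)
x = [[[float(r + c + ch) for c in range(4)] for r in range(4)] for ch in range(2)]
y_from_h0, y_from_h1 = op(x, 0), op(x, 1)  # same Op, different weight sets
print(y_from_h0 != y_from_h1)
```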

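The whole child
Surprisingly large once unrolled
The meta-architecture needs many operations
(aligning interfaces, downsampling, etc.)
The whole DAG has 5.3M parameters (1/5 of ResNet-50)
A child has 4.6M parameters (Table 2 of [1])
Mostly 3x3 or 5x5 depthwise convs and 1x1 convs
Writing out one whole child gives... (note: this is one child, not the whole DAG)
*Chainer implementation (only Functions shown)
(Figure from [1])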
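Why a child made mostly of depthwise and 1x1 convs stays small can be checked with per-layer weight counts. Python sketch; the channel count 36 is hypothetical, and bias/BN parameters are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def sep_conv_params(k, c_in, c_out):
    """Depthwise k x k (one filter per input channel) plus a pointwise 1x1 conv."""
    return k * k * c_in + c_in * c_out


channels = 36  # hypothetical channel count
for k in (3, 5):
    std = conv_params(k, channels, channels)
    sep = sep_conv_params(k, channels, channels)
    print(f"{k}x{k}: standard {std}, separable {sep}, ratio {std / sep:.1f}")
```

For 36 channels this prints a ratio of 7.2 for 3x3 and 14.8 for 5x5: the larger the kernel, the more the separable form saves.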

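controller
 For each node, outputs the input indices and the Op types, two at a time, alternately
⁃ The output is stochastic, not deterministic (sampled from the softmax distribution)
⁃ The bias is initialized so that sep3x3 and sep5x5 are more likely to be output
No external input; its own embedding vector is the first input
(Figure from [1])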
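The sampling side of the controller can be sketched without the LSTM. Assumptions: the per-node order (two indices, then two Ops) follows this slide, the index logits are zero and the Op logits are the bias alone, and the values in `OP_BIAS` are hypothetical, not the ones used by the authors.

```python
import math
import random

OPS = ["sep3x3", "sep5x5", "max3x3", "avg3x3", "id"]
# Hypothetical bias init: higher logits for sep3x3 and sep5x5.
OP_BIAS = [1.0, 1.0, 0.0, 0.0, 0.0]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, rng):
    """Draw one choice from softmax(logits); return (choice, log-probability)."""
    probs = softmax(logits)
    choice = rng.choices(range(len(probs)), weights=probs)[0]
    return choice, math.log(probs[choice])


def sample_cell(rng, num_nodes=5):
    """For each node: two input indices, then two Op types.

    Stand-in for the LSTM controller: the real one computes logits from its
    hidden state, starting from its own learned embedding (no external input).
    Here the index logits are all zero and the Op logits are just OP_BIAS.
    """
    arc, log_prob = [], 0.0
    for node in range(num_nodes):
        for _ in range(2):  # input indices: h[t-1], h[t], or an earlier node
            idx, lp = sample([0.0] * (node + 2), rng)
            arc.append(idx)
            log_prob += lp
        for _ in range(2):  # Op types
            op, lp = sample(OP_BIAS, rng)
            arc.append(OPS[op])
            log_prob += lp
    return arc, log_prob


rng = random.Random(0)
arc, log_prob = sample_cell(rng)
print(arc)
print(f"log p(arc) = {log_prob:.2f}")  # the quantity REINFORCE differentiates
```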

Experimental setup (ENAS-specific)
 Child
⁃ Momentum SGD with Nesterov (cosine learning-rate schedule)
⁃ Trained while switching the child at every iteration
⁃ An auxiliary head is added to the loss to improve accuracy (strengthens gradients, regularizes)
⁃ ScheduledDropPath (NASNet[4]) in the from-scratch training after the search
 Controller
⁃ Adam / REINFORCE with baseline
⁃ The reward is the accuracy on one random batch of validation data (for speed)
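The controller update can be sketched as REINFORCE with a moving-average baseline on a single categorical choice. Assumptions: a fixed table of hypothetical validation accuracies replaces the child's minibatch accuracy, plain gradient ascent stands in for Adam, and the learning rate and baseline decay are illustrative values.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_with_baseline(reward_fn, num_actions, steps=3000, lr=0.5,
                            baseline_decay=0.95, seed=0):
    """REINFORCE with a moving-average baseline for one categorical decision.

    reward_fn(action) stands in for "validation accuracy of the sampled child".
    Plain gradient ascent replaces Adam to keep the sketch short.
    """
    rng = random.Random(seed)
    logits = [0.0] * num_actions
    baseline = None
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(num_actions), weights=probs)[0]
        reward = reward_fn(action)
        if baseline is None:
            baseline = reward
        advantage = reward - baseline  # the baseline reduces gradient variance
        # d log softmax(logits)[action] / d logits[j] = 1[j == action] - probs[j]
        for j in range(num_actions):
            indicator = 1.0 if j == action else 0.0
            logits[j] += lr * advantage * (indicator - probs[j])
        baseline = baseline_decay * baseline + (1 - baseline_decay) * reward
    return softmax(logits)


# Hypothetical validation accuracies of three candidate children.
VAL_ACC = [0.80, 0.90, 0.70]
probs = reinforce_with_baseline(lambda a: VAL_ACC[a], num_actions=3)
print([round(p, 3) for p in probs])  # most of the mass should move to index 1
```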

Experimental results: CNN micro architecture, CIFAR10
 Reproduced with the authors' public implementation https://github.com/melodyguan/enas
⁃ Ran ./scripts/cifar10_micro_{search or final}.sh without modification
 Learning curves during architecture search
(Plots: child train loss, child val/test acc, controller reward, lr)
From-scratch training after the search: test accuracy 96.22% (error 3.78%; the paper reports 3.54%)
Interesting that the loss decreases smoothly even though training keeps switching subgraphs

Summary and impressions
 Efficient Neural Architecture Search via Parameters Sharing (ENAS)
⁃ 1000x less expensive than standard Neural Architecture Search
⁃ Trains parts of a single computation graph as children
⁃ Weight sharing requires a complex graph, e.g., weights switched per input
 Impressions
⁃ A Chainer reimplementation (CNN micro search) will be released later

Reference
1. Hieu Pham et al. 2018. Efficient Neural Architecture Search via Parameters Sharing. ICML 2018.
⁃ http://proceedings.mlr.press/v80/pham18a.html https://github.com/melodyguan/enas
2. Hanxiao Liu et al. 2018. DARTS: Differentiable Architecture Search. arXiv:1806.09055.
⁃ https://arxiv.org/abs/1806.09055 https://github.com/quark0/darts
3. Barret Zoph and Quoc V. Le. 2017. Neural Architecture Search with Reinforcement Learning. ICLR 2017.
⁃ https://arxiv.org/abs/1611.01578
4. Barret Zoph et al. 2018. Learning Transferable Architectures for Scalable Image Recognition. CVPR 2018.
⁃ https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet
5. Chenxi Liu et al. 2017. Progressive Neural Architecture Search. arXiv:1712.00559.
