Does deep learning really perceive the world the way humans do?

Jaesik Yoon (윤재식)
jaesik817

About me
Graduated from [logo not preserved]
Currently working as an ML developer at [logo not preserved]
https://github.com/jaesik817
+ 10

Since ILSVRC 2012, deep learning has…
…

???
"Oh, the original confidence was probably just low to begin with."
"It does look kind of confusing(?), yeah."
An actual sample from the gibbon class

??
An actual sample from the ostrich class

??
One pixel attack for fooling deep neural networks (https://arxiv.org/pdf/1710.08864.pdf)

??
Success rate of inducing an error: 73.8%
???????

These are all image classification examples. Does the same thing show up in RNNs or reinforcement learning?

Yes, it does
• Adversarial Attacks on Neural Network Policies (Huang et al. 2017)

Yes, it does (2)
• Black-Box Attacks against RNN based Malware Detection Algorithms (Hu and Tan, 2017)

The deep learning architectures in common use today (MLP, CNN, RNN, various RL methods) malfunction easily, even under small noise

Noise crafted to cause such malfunctions is an Adversarial Attack
Defending against it is Adversarial Defense

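Mathematical definition
• Given a classifier $f(\mathbf{x}) : \mathbf{x} \in \mathbb{R}^{I} \to y \in \mathbb{Z}$,
• an Adversarial Attack is a perturbation $\xi$ that makes $f(\mathbf{x}) \neq f(\mathbf{x} + \xi)$ s.t. $\|\xi\|_p \le \epsilon$
=> noise that causes misclassification while staying within a bounded noise magnitude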
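The definition can be checked mechanically. Below is a minimal sketch, assuming a toy two-feature classifier; the names `lp_norm`, `is_adversarial`, and `toy_f` are made up here for illustration and do not come from the talk.

```python
import math


def lp_norm(v, p=math.inf):
    """L_p norm of a perturbation given as a list of floats."""
    if p == math.inf:
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def is_adversarial(f, x, xi, eps, p=math.inf):
    """True if xi stays within the eps budget and changes the prediction of f."""
    x_adv = [a + d for a, d in zip(x, xi)]
    return lp_norm(xi, p) <= eps and f(x_adv) != f(x)


# Hypothetical classifier: class 1 if the second feature is larger, else class 0.
def toy_f(x):
    return int(x[1] > x[0])


x = [0.6, 0.4]
print(is_adversarial(toy_f, x, [-0.15, 0.15], eps=0.15))  # within budget, label flips
print(is_adversarial(toy_f, x, [-0.05, 0.05], eps=0.15))  # within budget, label unchanged
```

Both conditions matter: a perturbation that flips the label but exceeds the budget does not count as an attack under this definition.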

Representative Adversarial Attacks
• Fast Gradient Sign Method (FGSM)
• Basic Iterative Method (basic iter.)
• Least-Likely Class Method (step l.l)
• Iterative Least-Likely Class Method (iter. l.l)

Commonly used as benchmarks

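FGSM (Goodfellow et al. 2014)
• Idea: training moves in the direction that lowers the loss, so if we build noise in the opposite direction (the one that raises the loss), all the way up to the noise limit, shouldn't that intuitively be the most effective way to cause a malfunction?
[Figure: loss function and gradient descent; the attack moves in the direction opposite to optimization]
• A simple method, but effective enough that it is widely used as a benchmark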
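The idea fits in a few lines. Here is a minimal sketch, assuming a toy linear softmax classifier so that the input gradient can be written by hand; the model, the weights, and the helper names are illustrative assumptions, not the code used in the talk.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]


def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)


def grad_ce_wrt_x(W, b, x, y):
    """Gradient of the cross entropy w.r.t. the input, for a linear softmax model."""
    p = softmax(logits(W, b, x))
    return [sum((p[k] - (k == y)) * W[k][i] for k in range(len(W)))
            for i in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(W, b, x, y, eps):
    """One step of size eps along the sign of the loss gradient (the loss goes up)."""
    g = grad_ce_wrt_x(W, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


# Toy model (an assumption for illustration): two classes, identity weights.
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
x, y = [0.6, 0.4], 0
x_adv = fgsm(W, b, x, y, eps=0.15)
print(predict(W, b, x), predict(W, b, x_adv))
```

Every coordinate moves by exactly `eps`, which is why the perturbation always sits on the boundary of the allowed L-infinity ball.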

• When FGSM is used for the adversarial training described later, it causes the label leaking problem
(Label leaking problem: the noise is generated using the true label, so the perturbation itself carries a hint about that label at classification time, and accuracy goes up as the noise level grows) (Kurakin et al. 2017)

Basic iter.: apply FGSM repeatedly
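A minimal sketch of the repeated version, again on a toy linear softmax model (an assumption for illustration). The per-step size `alpha` and the clipping back into the eps-ball follow the usual formulation in Kurakin et al.; the helper names are made up here.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]


def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)


def grad_ce_wrt_x(W, b, x, y):
    p = softmax(logits(W, b, x))
    return [sum((p[k] - (k == y)) * W[k][i] for k in range(len(W)))
            for i in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def basic_iter(W, b, x, y, eps, alpha, steps):
    """Repeat small FGSM steps, clipping back into the eps-ball around x each time."""
    x_adv = list(x)
    for _ in range(steps):
        g = grad_ce_wrt_x(W, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
    return x_adv


# Toy model (an assumption for illustration): two classes, identity weights.
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
x, y = [0.6, 0.4], 0
x_adv = basic_iter(W, b, x, y, eps=0.15, alpha=0.05, steps=5)
print(predict(W, b, x), predict(W, b, x_adv))
```

On a linear model the gradient sign never changes, so this ends up at the same point as single-step FGSM; the iterations only pay off on non-linear models, where the gradient is recomputed at each intermediate point.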

Iter. l.l: apply step l.l repeatedly

Step l.l (Kurakin et al. 2017)
• On datasets with few classes, such as MNIST or CIFAR-10, showing that an adversarial attack causes misclassification is meaningful
• However, on a dataset like ImageNet, which has many similar classes, being misclassified as a similar class may not look like a big problem (e.g. cocker spaniel -> poodle)
• So, instead of just keeping the model from predicting the true label, wouldn't it be fun if an attack could push the prediction to the class that seems least likely?
• And it is not just fun: it works well
• On top of that, it has no label leaking problem
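A minimal sketch of step l.l, assuming a toy three-class linear softmax model (weights and names are illustrative assumptions). Note that the true label never appears: the target is read off the model's own prediction, which is why label leaking does not arise.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]


def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)


def grad_ce_wrt_x(W, b, x, y):
    p = softmax(logits(W, b, x))
    return [sum((p[k] - (k == y)) * W[k][i] for k in range(len(W)))
            for i in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def step_ll(W, b, x, eps):
    """Move toward the least-likely class by descending its cross entropy."""
    z = logits(W, b, x)
    y_ll = min(range(len(z)), key=z.__getitem__)  # least-likely class under the model
    g = grad_ce_wrt_x(W, b, x, y_ll)
    # Minus sign: we lower the loss for y_ll instead of raising it for the true label.
    return [xi - eps * sign(gi) for xi, gi in zip(x, g)], y_ll


# Toy three-class model (an assumption for illustration).
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.5]
x = [0.3, 0.2]
x_adv, y_ll = step_ll(W, b, x, eps=0.15)
print(predict(W, b, x), y_ll, predict(W, b, x_adv))
```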

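Representative Adversarial Attacks (Summary)
Name | Equation | Description
FGSM | [equation image not preserved] | Generate noise in the direction opposite to training, up to the noise limit
Basic iter. | [equation image not preserved] | Repeat FGSM n times
Step l.l | [equation image not preserved] | Generate noise so that the input is classified as the least likely class
Iter. l.l | [equation image not preserved] | Repeat step l.l n times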
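For reference, the four update rules in the equation column can be written in the notation of the mathematical definition ($\mathbf{x}$ for the input, $\epsilon$ for the noise limit). This follows the standard formulation in Kurakin et al.; the loss $J$, the per-step size $\alpha$, the clipping operator, and the iterate index $n$ are notation assumed here, since the original equation images did not survive.

```latex
% FGSM: one step that raises the loss for the true label
\mathbf{x}^{adv} = \mathbf{x} + \epsilon \,\mathrm{sign}\big(\nabla_{\mathbf{x}} J(\mathbf{x}, y_{true})\big)

% Basic iter.: repeated small steps, clipped to the epsilon-ball around x
\mathbf{x}^{adv}_{0} = \mathbf{x}, \qquad
\mathbf{x}^{adv}_{n+1} = \mathrm{Clip}_{\mathbf{x},\epsilon}\Big\{ \mathbf{x}^{adv}_{n} + \alpha \,\mathrm{sign}\big(\nabla_{\mathbf{x}} J(\mathbf{x}^{adv}_{n}, y_{true})\big) \Big\}

% Step l.l: one step that lowers the loss for the least-likely class
y_{LL} = \arg\min_{y} \, p(y \mid \mathbf{x}), \qquad
\mathbf{x}^{adv} = \mathbf{x} - \epsilon \,\mathrm{sign}\big(\nabla_{\mathbf{x}} J(\mathbf{x}, y_{LL})\big)

% Iter. l.l: repeated step l.l with clipping
\mathbf{x}^{adv}_{0} = \mathbf{x}, \qquad
\mathbf{x}^{adv}_{n+1} = \mathrm{Clip}_{\mathbf{x},\epsilon}\Big\{ \mathbf{x}^{adv}_{n} - \alpha \,\mathrm{sign}\big(\nabla_{\mathbf{x}} J(\mathbf{x}^{adv}_{n}, y_{LL})\big) \Big\}
```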

Recent Adversarial Attack research
• Universal adversarial perturbations (Moosavi-Dezfooli et al. 2017)
• Adversarial Transformation Networks (ATN, Baluja et al. 2017)
• NIPS 2017 nontargeted/targeted attack competition

Universal adversarial perturbations
• Building noise for each image individually works well, but could there be a single noise pattern that causes malfunction regardless of the image (image-agnostic)?
• There is
• Th...then does it also work regardless of the model (model-agnostic)?
• To some extent, yes
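To make "image-agnostic" concrete, here is a deliberately simplified sketch: one shared perturbation is taken as the sign of the loss gradient summed over a small dataset, and its fooling rate is measured. This is a swapped-in simplification, not the DeepFool-based procedure of Moosavi-Dezfooli et al.; the toy linear model, the data, and the function names are assumptions for illustration.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]


def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)


def grad_ce_wrt_x(W, b, x, y):
    p = softmax(logits(W, b, x))
    return [sum((p[k] - (k == y)) * W[k][i] for k in range(len(W)))
            for i in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def universal_perturbation(W, b, X, Y, eps):
    """One shared perturbation: eps * sign of the loss gradient summed over all inputs."""
    total = [0.0] * len(X[0])
    for x, y in zip(X, Y):
        g = grad_ce_wrt_x(W, b, x, y)
        total = [t + gi for t, gi in zip(total, g)]
    return [eps * sign(t) for t in total]


def fooling_rate(W, b, X, v):
    """Fraction of inputs whose predicted label changes when v is added."""
    flipped = sum(
        predict(W, b, x) != predict(W, b, [a + d for a, d in zip(x, v)]) for x in X
    )
    return flipped / len(X)


# Toy model and data (assumptions for illustration).
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
X = [[0.6, 0.4], [0.7, 0.5], [0.55, 0.5], [0.3, 0.6]]
Y = [0, 0, 0, 1]
v = universal_perturbation(W, b, X, Y, eps=0.15)
print(v, fooling_rate(W, b, X, v))
```

The same `v` is added to every input, so it cannot fool inputs that already sit on the side it pushes toward; that is one reason a universal perturbation's fooling rate is usually reported rather than assumed to be 100%.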

Adversarial Transformation Networks
• Proposes a learning-based adversarial attacker: a deconv-conv model encodes and decodes the image to produce the noise
• Proposes a reranking function r(y, t) so that the attacker is trained to get the input classified as a label other than the true one
• Designed so that it can also be trained against multiple classifiers
But…

NIPS 2017 nontargeted/targeted attack competition
• An adversarial attack/defense competition was held at a NIPS 2017 workshop, and a variety of methods were proposed.
[Figure: nontargeted and targeted leaderboards, with our entry (Jaesik Yoon & Hoyong Jang) marked]
Most of the top-ranked methods were designed to work against multiple classifiers (even the ones that are not learning-based)

• Discovering Adversarial Examples with Momentum (Dong et al. 2017, 1st in nontargeted/targeted)
• Adds a momentum term when computing basic iter.
• Computes an ensemble momentum iterative FGSM (basic iter.) over multiple models
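A minimal sketch of momentum iterative FGSM against an ensemble. The update (accumulate the L1-normalized gradient with decay `mu`, then step along its sign) and the averaging of logits across models follow the published description by Dong et al.; the two toy linear models, the step schedule `alpha = eps / steps`, and the helper names are assumptions for illustration.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def model_logits(W, b, x):
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]


def predict(W, b, x):
    z = model_logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)


def ensemble_logits(models, x):
    """Average the logits of all models with equal weights."""
    per_model = [model_logits(W, b, x) for W, b in models]
    return [sum(z[k] for z in per_model) / len(models) for k in range(len(per_model[0]))]


def grad_ce_wrt_x(models, x, y):
    """Input gradient of the cross entropy on the averaged logits (linear models)."""
    p = softmax(ensemble_logits(models, x))
    grad = []
    for i in range(len(x)):
        w_bar = [sum(W[k][i] for W, _ in models) / len(models) for k in range(len(p))]
        grad.append(sum((p[k] - (k == y)) * w_bar[k] for k in range(len(p))))
    return grad


def sign(v):
    return (v > 0) - (v < 0)


def mi_fgsm(models, x, y, eps, steps=10, mu=1.0):
    alpha = eps / steps
    g = [0.0] * len(x)
    x_adv = list(x)
    for _ in range(steps):
        grad = grad_ce_wrt_x(models, x_adv, y)
        l1 = sum(abs(v) for v in grad) or 1.0  # avoid dividing by zero
        g = [mu * gi + v / l1 for gi, v in zip(g, grad)]  # momentum accumulation
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
    return x_adv


# Two toy models (assumptions for illustration) that agree on the clean input.
model_a = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
model_b = ([[2.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
x, y = [0.6, 0.4], 0
x_adv = mi_fgsm([model_a, model_b], x, y, eps=0.3)
print(predict(*model_a, x_adv), predict(*model_b, x_adv))
```

Attacking the averaged logits yields a single perturbation that has to work for every member of the ensemble, which matches the observation that the top entries targeted multiple classifiers at once.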

• https://github.com/sangxia/nips-2017-adversarial (sangxia, 2017, 2nd in nontargeted/targeted)
• Starts from the image with a random perturbation added
• Computes an ensemble iterative FGSM over multiple models

• https://github.com/jaesik817/nips17_adv_attack (Jaesik and Hoyong, 2017, 18th in nontargeted)
• Trains a learning-based adversarial attacker built on a deconv-conv architecture
• Uses a reverse CE (-CE) loss function, similar to FGSM
• Computes ensemble noise over multiple models

Hold on a second, so why did we flop?
C...come on, 18th place isn't exactly a flop, is it?

• Let's try attacking
• Related work survey: the only methods out there use the gradient coming from the model's loss? -> then a learned attacker will surely beat those! + it will have novelty, too!!
Survey failure: ATN was already on arXiv

• Right, so let's train an attacker
• Something that takes an image and produces noise should do it, right?
• Hey, that looks a bit like this
(Hyeonwoo Noh et al. 2015)
The output is constrained to $\{-\epsilon, \epsilon\}$, as in FGSM:
$y = \epsilon \cdot \mathrm{normalize}(\tanh(o))$

• Loss function?
• Make it as wrong as possible -> -CE (Cross Entropy)? 1/CE?
1/CE has a strange(?) mathematical meaning, so -CE it is
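A small numerical comparison of the two candidates, as a sketch. Both get smaller as the classifier becomes more wrong, so an attacker minimizing either one is pushed toward misclassification. One possible reading of the "strange" remark (an interpretation, not something the slides state) is that 1/CE blows up when the classifier is confidently correct and flattens out once CE is large. Here `p_true` is the probability the classifier assigns to the true label, and the function names are made up for illustration.

```python
import math


def ce(p_true):
    """Cross entropy for a single sample, given the probability of the true label."""
    return -math.log(p_true)


def neg_ce(p_true):
    """Attacker loss candidate 1: -CE."""
    return -ce(p_true)


def inv_ce(p_true):
    """Attacker loss candidate 2: 1/CE."""
    return 1.0 / ce(p_true)


for p in (0.999, 0.9, 0.5, 0.1):
    print(f"p_true={p:<5}  -CE={neg_ce(p):9.3f}  1/CE={inv_ce(p):9.3f}")
```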

• Result?
A quick run looked not bad -> this is mark1
The learning-based method underperforms FGSM at small noise levels

Experiments, experiments, experiments, and more experiments

• Rather than restricting the noise to $\{-\epsilon, \epsilon\}$, wouldn't allowing $[-\epsilon, \epsilon]$ produce more varied noise patterns? -> this is mark2
$y = \epsilon \cdot \tanh(o)$
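The difference between the two output heads is easy to see on a few raw activations. A minimal sketch: the slides do not say what `normalize` does in mark1, so it is assumed here to be a sign function (that is the only way to land in {-eps, eps}); `o` stands for the attacker network's raw output and the function names are made up.

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def mark1_output(o, eps):
    """Noise restricted to {-eps, +eps}; `normalize` is ASSUMED to be sign()."""
    # An activation of exactly 0 would give 0 noise under this assumption.
    return [eps * sign(math.tanh(v)) for v in o]


def mark2_output(o, eps):
    """Noise anywhere in [-eps, +eps], scaled smoothly by tanh."""
    return [eps * math.tanh(v) for v in o]


o = [2.0, -0.1, 0.5]  # hypothetical raw outputs of the attacker network
print(mark1_output(o, eps=16.0))
print(mark2_output(o, eps=16.0))
```

mark1 always spends the full budget on every pixel, like FGSM, while mark2 lets the network choose smaller magnitudes where that helps.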

• Again
• This time against an ensemble of targets as well
• But can simply raising the CE value directly increase misclassification? (I agree there should be a positive relationship, but making the model unsure about the original answer is different from making it wrong)
• So I mixed in the concept of step l.l
+ batch normalization as well
-> this is mark3

• The result: it works well. The maximum score is 13000
It shows overwhelming performance even at low noise levels
D-3 for final submission
??
By the time everything was sorted out, 3 days were left
And all we had was a single 1080 GPU
• In the end, we submitted mark2, with FGSM (the baseline algorithm) for small noise

So why did we flop?
L...lack of resources (when it doesn't work, blame something else, right)
Throwing half of it away and still finishing 18th isn't a flop (or so we tell ourselves)

Back to the main topic…

So a wide variety of adversarial attacks have been studied
…

Then what methods are there for countering these adversarial attacks?

Adversarial Defenses
• Adversarial Training
• Adversarial attack detection
• NIPS 2017 adversarial defense competition

Adversarial Training (Goodfellow et al. 2015)
• If adversarial examples are not classified correctly, put them into the training data and train on them as well
[Equation image not preserved: loss function term for the adversarial attack]

• Adjusting its ratio against the original loss function makes it act as a regularizer
It works (Kurakin et al. 2017)
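A minimal sketch of the mixed objective, in the form given by Goodfellow et al.: a weight `alpha` on the clean loss and `1 - alpha` on the loss at the FGSM-perturbed input. The toy linear softmax model and the helper names are assumptions for illustration; a real training loop would take gradients of this quantity with respect to the model parameters.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]


def ce_loss(W, b, x, y):
    return -math.log(softmax(logits(W, b, x))[y])


def grad_ce_wrt_x(W, b, x, y):
    p = softmax(logits(W, b, x))
    return [sum((p[k] - (k == y)) * W[k][i] for k in range(len(W)))
            for i in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(W, b, x, y, eps):
    g = grad_ce_wrt_x(W, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def adv_training_loss(W, b, x, y, eps, alpha=0.5):
    """alpha * clean loss + (1 - alpha) * loss on the FGSM example."""
    x_adv = fgsm(W, b, x, y, eps)
    return alpha * ce_loss(W, b, x, y) + (1.0 - alpha) * ce_loss(W, b, x_adv, y)


# Toy model (an assumption for illustration).
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
x, y = [0.6, 0.4], 0
print(ce_loss(W, b, x, y), adv_training_loss(W, b, x, y, eps=0.15))
```

With `alpha = 1.0` this reduces to ordinary training, and lowering `alpha` shifts weight onto the adversarial term, which is the ratio that the bullet above refers to.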

Adversarial Example Detection
• If adversarial noise is produced by a specific procedure such as FGSM, it must have a pattern, and so it should be detectable!
• On Detecting Adversarial Perturbations (Metzen et al. 2017)
An adversarial detector is trained at each layer to detect adversarial attacks.
In this case, however, the label of a detected attack cannot be estimated

• Tactics of Adversarial Attack on Deep Reinforcement Learning Agents (Lin et al. 2017)
A model that predicts the next state (a video prediction model) is trained; if the incoming next state is judged to be an adversarial attack, the predicted state is used instead of the state that came in as input
Unlike Metzen's paper, this is RL, so even when an input is detected as an attack and dropped, the agent keeps working, although with reduced performance

NIPS adversarial defense competition

• https://github.com/lfz/Guided-Denoise (TsAIL team, 1st in the competition)
• Trains a denoiser to reduce the perturbation
• Trains the denoiser using multiple models

• Mitigating Adversarial Effects Through Randomization (Xie et al. 2017, 2nd in the competition)
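The slides only give the title, so as a hedged illustration: the paper's defense applies a random resize followed by random zero padding before the image reaches the classifier, so that a fixed perturbation no longer lines up with the pixels it was tuned for. The sketch below does this on a tiny 2D list with nearest-neighbour resizing; the function name, the sizes, and the use of nearest-neighbour interpolation are assumptions, not the paper's exact settings.

```python
import random


def random_resize_pad(img, out_size, rng):
    """Randomly resize a 2D image (nearest neighbour), then zero-pad at a random offset.

    Assumes out_size is at least as large as both image dimensions.
    """
    h, w = len(img), len(img[0])
    new = rng.randint(max(h, w), out_size)  # random side length, inclusive bounds
    resized = [[img[r * h // new][c * w // new] for c in range(new)] for r in range(new)]
    top = rng.randint(0, out_size - new)    # random vertical offset
    left = rng.randint(0, out_size - new)   # random horizontal offset
    out = [[0.0] * out_size for _ in range(out_size)]
    for r in range(new):
        for c in range(new):
            out[top + r][left + c] = resized[r][c]
    return out


img = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical 2x2 "image"
for row in random_resize_pad(img, out_size=5, rng=random.Random(0)):
    print(row)
```

Because the resize factor and the padding offsets are drawn fresh for every input, an attacker without knowledge of the draw faces a different effective model on each query.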

The arrival of Capsule Nets
• The Capsule Network was proposed, with the claim that it is more robust to natural variance than conventional CNNs

So I tried attacking it myself

Adversarial attack on CapsNet
• Uses InnerPeace-Wu's CapsNet implementation
• https://github.com/InnerPeace-Wu/CapsNet-tensorflow
• White-box attack experiments using the gradient from the CapsNet loss function
• Tested FGSM, basic iter., step l.l, and iter. l.l at noise levels from 0 to 50
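The shape of such an experiment is a sweep: fix the attack, vary the noise level, and record accuracy at each level. A minimal sketch with FGSM on a toy linear softmax model standing in for CapsNet; the model, the data, the eps grid (on a 0 to 1 feature scale, unlike the 0 to 50 range above), and the function names are all assumptions for illustration.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]


def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)


def grad_ce_wrt_x(W, b, x, y):
    p = softmax(logits(W, b, x))
    return [sum((p[k] - (k == y)) * W[k][i] for k in range(len(W)))
            for i in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(W, b, x, y, eps):
    g = grad_ce_wrt_x(W, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def accuracy_under_fgsm(W, b, X, Y, eps):
    """White-box accuracy: fraction of inputs still classified correctly after FGSM."""
    correct = sum(predict(W, b, fgsm(W, b, x, y, eps)) == y for x, y in zip(X, Y))
    return correct / len(X)


# Toy stand-in model and data (assumptions for illustration).
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
X = [[0.6, 0.4], [0.9, 0.1], [0.2, 0.8], [0.45, 0.55]]
Y = [0, 0, 1, 1]
for eps in (0.0, 0.08, 0.2, 0.5):
    print(eps, accuracy_under_fgsm(W, b, X, Y, eps))
```

Swapping `fgsm` for an iterative or least-likely-class attack gives the other three curves of the same experiment.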

• FGSM

• Basic iter.

• step l.l

• iter. l.l

• This experiment used the architecture from the Dynamic Routing Between Capsules paper
• Matrix Capsules with EM Routing, currently under review for ICLR 2018, reports experimental results in which Capsule Networks trained with EM routing are more robust to adversarial attacks than conventional CNNs
• So the effect of adversarial attacks on Capsule Networks needs further study

Back to where we started…

Does deep learning really perceive the world the way humans do?

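The answer is no
To sum up,
1. Current deep learning architectures and training methods are not free from adversarial attacks
2. (Although in a short time) a variety of research results have come out, and they mitigate the problem by training on adversarial examples as well, or by detecting or denoising them
3. The Capsule Network was proposed to improve on current architectures, which are not robust to natural variance, but it has not been validated yet
4. Even if the adversarial attack problem is solved, other weaknesses may still be discovered (there is still a lot to do)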

Does deep learning really perceive the world the way humans do?
