Small and Fast Deep Learning, and Edge Computing

—
Stella Yang (양서연)
Researcher, LG Electronics Robot Business Center

AI for Newbie
Do you have a GPU?

AI Cloud Service
No, I don't have a GPU
Poor / Has money

AI Cloud Service
A wealth of cloud services to enjoy, if you have the money

Latency Problem
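Is cloud computing always a good thing?
The added latency from routing through the cloud is
a critical problem for autonomous driving, robotics, and other
applications that must run in real time.
In an era when personal data matters,
routing through the cloud also raises security concerns.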

Embedded System
Limited storage, compute resources, and battery

Edge Computing
Nvidia Jetson TX2
Nvidia Jetson Nano
Google Coral TPU
Qualcomm RB3
Intel Movidius
Google Coral
GPU / TPU USB Accelerator
High-performance computation is becoming possible even on embedded devices

Edge vs Remote
Do Edge and Cloud really have to be an either-or choice between conflicting advantages?

Complementary Concepts
Edge and Cloud are complementary

Distributed Learning
Preprocessing, distributed processing, training, data

On-device AI?
- Lightweight
- Fast inference (low latency)
- Power efficient
- Privacy protected

How ?
- Model Compression
- Compact Networks Design
- Low Computation
- Hardware Aware Distributed Computing
- Memory Bottleneck, I/O Pipeline Optimized
- Federated Learning

Compact Networks Design
Residual Connection
Bottleneck, Fire module
Depthwise Conv
Pointwise Conv
Grouped Conv
Dense Conv
Pointwise Grouped Conv
EfficientNet
MNASNet
Model Compression
Pruning
Quantization
Tensor factorization
Distillation
AutoML (AML)
Model compression: techniques that compress an already-built model
Compact network design: techniques that aim for an optimal model from the design stage

Pruning
Optimal Brain Damage
Prune nodes that fall below a threshold
Retrain after pruning
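
The prune-then-retrain loop on this slide can be sketched as below. This is a minimal illustration, not the method from the talk: it thresholds on weight magnitude, whereas Optimal Brain Damage ranks weights by a second-derivative saliency. The function names, the threshold, and the sample values are all assumptions made for the example.

```python
def magnitude_prune(weights, threshold):
    """Zero out weights whose absolute value is below `threshold`.

    Returns the pruned weights and a 0/1 mask marking the survivors.
    """
    mask = [1 if abs(w) >= threshold else 0 for w in weights]
    pruned = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned, mask


def masked_update(weights, grads, mask, lr=0.1):
    """One retraining step in which pruned weights are kept at zero."""
    return [(w - lr * g) if m else 0.0 for w, g, m in zip(weights, grads, mask)]


# Hypothetical layer weights and gradients, for illustration only.
weights = [0.8, -0.05, 0.3, 0.01, -0.6]
pruned, mask = magnitude_prune(weights, threshold=0.1)
updated = masked_update(pruned, [1.0] * len(pruned), mask)

print(pruned)
print(mask)
print(updated)
```

Keeping the mask around during retraining is what stops the optimizer from quietly regrowing the connections that were removed.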
Small and Fast Deep Learning, and Edge Computing / September 6, 2019 / © 2019 IBM Corporation

Pruning: Dense-Sparse-Dense
If we revive the pruned connections
and train them again,
might performance improve?

Quantization
Pruning
Generate codebook
Quantization
Huffman encoding
Share weights through a codebook
Shrink the model through quantization
Deep Compression: Compressing Deep Neural Networks with Pruning,
Trained Quantization and Huffman Coding
https://arxiv.org/abs/1510.00149

Quantization, Weight Sharing
K-means clustering
Objective: within-cluster sum of squares (WCSS)
Generate codebook
Quantized weights
Retrain: fine-tune with quantized gradients
Binarized Neural Network
Binarize weights, biases, and activations
Deep Compression
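
The codebook step above can be sketched with a tiny one-dimensional k-means. This is a hedged illustration only: the linear centroid initialization, the helper names, and the sample weights are assumptions for the example, and a real layer would cluster thousands of weights rather than seven.

```python
def nearest(value, centroids):
    """Index of the centroid closest to `value`."""
    return min(range(len(centroids)), key=lambda j: (value - centroids[j]) ** 2)


def kmeans_1d(values, k, iters=20):
    """Cluster scalar weights into k shared values (k >= 2)."""
    lo, hi = min(values), max(values)
    # Linear initialization: spread centroids evenly over the weight range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        assign = [nearest(v, centroids) for v in values]
        for j in range(k):
            members = [v for v, a in zip(values, assign) if a == j]
            if members:  # leave empty clusters where they are
                centroids[j] = sum(members) / len(members)
    return centroids, [nearest(v, centroids) for v in values]


def wcss(values, centroids, assign):
    """Within-cluster sum of squares, the quantity k-means minimizes."""
    return sum((v - centroids[a]) ** 2 for v, a in zip(values, assign))


# Hypothetical layer weights, for illustration only.
weights = [-1.0, -0.9, -0.1, 0.0, 0.1, 0.9, 1.1]
codebook, indices = kmeans_1d(weights, k=3)
quantized = [codebook[i] for i in indices]

print(codebook)   # the shared values
print(indices)    # what is stored per weight instead of a float
print(wcss(weights, codebook, indices))
```

After clustering, each weight is stored as a small index into the codebook, which is where the size reduction comes from.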

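Tensor Factorization
5x5 conv (applied once) = 3x3 conv (applied at 9 positions) -> 3x3 conv (applied once)
Number of weights: 25 vs. 9 + 9 = 18
3x3 conv (applied once) = 1x3 conv (applied at 3 positions) -> 3x1 conv (applied once)
Number of weights: 9 vs. 3 + 3 = 6
n x n = 1 x n followed by n x 1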
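
The weight counts on this slide can be checked with a short sketch. It assumes a single input and output channel, stride 1, and no dilation, as the slide's arithmetic does; the helper names are made up for the example.

```python
def stacked_weights(kernel_shapes):
    """Total weights of a stack of single-channel kernels given as (height, width)."""
    return sum(h * w for h, w in kernel_shapes)


def receptive_field(kernel_shapes):
    """Receptive field of the stack, assuming stride 1 and no dilation.

    Each layer widens the field by (kernel size - 1) along each axis.
    """
    rf_h = 1 + sum(h - 1 for h, _ in kernel_shapes)
    rf_w = 1 + sum(w - 1 for _, w in kernel_shapes)
    return rf_h, rf_w


two_3x3 = [(3, 3), (3, 3)]   # replaces one 5x5
asymmetric = [(1, 3), (3, 1)]  # replaces one 3x3

print(stacked_weights([(5, 5)]), stacked_weights(two_3x3), receptive_field(two_3x3))
print(stacked_weights([(3, 3)]), stacked_weights(asymmetric), receptive_field(asymmetric))
```

The factorized stacks cover the same receptive field as the kernel they replace while using fewer weights, although they are not numerically identical to it in general.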

Tensor Factorization
Inception Module

Knowledge Distillation
The softmax output of a relatively large teacher network
is softened into soft labels,
which are used to teach a relatively small student network;
the student is then retrained with hard labels
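
"Softening" the softmax is commonly done by dividing the logits by a temperature before normalizing. The sketch below assumes that formulation; the logits, the temperature of 4.0, and the function names are illustrative values, not taken from the talk.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by `temperature` (higher = softer)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


# Hypothetical logits for one training example with three classes.
teacher_logits = [8.0, 2.0, 0.5]
student_logits = [5.0, 3.0, 1.0]
T = 4.0

sharp = softmax(teacher_logits)            # nearly one-hot
soft_labels = softmax(teacher_logits, T)   # keeps the class similarities visible

# Stage 1: the student matches the teacher's soft labels at the same temperature.
soft_loss = cross_entropy(soft_labels, softmax(student_logits, T))
# Stage 2: the student is trained on the hard (one-hot) label.
hard_loss = cross_entropy([1.0, 0.0, 0.0], softmax(student_logits))

print(sharp)
print(soft_labels)
print(soft_loss, hard_loss)
```

The soft labels carry information about how the teacher ranks the wrong classes, which a one-hot label throws away.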

https://blog.lunit.io/2018/03/22/distilling-the-knowledge-in-a-neural-network-nips-2014-workshop/

Model Compression
Pruning: cut off the branches!
Quantization: lower the bit resolution!
Tensor factorization: factorize!
Knowledge distillation: pour from a big bowl into a small one!

Residual Connection
Passes information from an earlier layer forward.
Makes even deep networks easier to optimize.
Same parameter count; almost no added computation beyond the addition.
ResNet

Bottleneck Convolution
Dimension reduction
Parameter reduction
Bottleneck
ResNet
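
To make the parameter reduction concrete, the sketch below compares a plain block of two 3x3 convolutions with a 1x1 -> 3x3 -> 1x1 bottleneck. The 256 and 64 channel widths are assumed for illustration, biases are ignored, and `conv_params` is a helper invented for this example.

```python
def conv_params(in_ch, out_ch, k):
    """Weights in a k x k convolution (bias omitted)."""
    return in_ch * out_ch * k * k


# Plain block: two 3x3 convolutions at full width.
plain = 2 * conv_params(256, 256, 3)

# Bottleneck block: reduce with 1x1, convolve at low width, restore with 1x1.
bottleneck = (
    conv_params(256, 64, 1)
    + conv_params(64, 64, 3)
    + conv_params(64, 256, 1)
)

print(plain, bottleneck, plain / bottleneck)
```

Most of the saving comes from running the expensive 3x3 convolution on 64 channels instead of 256.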

Dimension Reduction Conv
Depthwise convolution: convolves each channel using only
that channel's spatial information.
Pointwise convolution: a weighted linear combination
across channels
- Dimension reduction
Standard Convolution / Depthwise Convolution / Pointwise Convolution

Depthwise Separable Convolution
Depthwise convolution
Pointwise convolution
MobileNet, Xception
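
A rough cost comparison shows why splitting a convolution into depthwise and pointwise steps pays off. The sketch counts multiply-accumulates for a stride-1 layer with same-size output; the 14x14x512 layer shape is an assumed example and the function names are made up here.

```python
def standard_conv_macs(h, w, in_ch, out_ch, k):
    """Multiply-accumulates of a standard k x k convolution."""
    return h * w * in_ch * out_ch * k * k


def depthwise_separable_macs(h, w, in_ch, out_ch, k):
    """Depthwise k x k per channel, then a 1x1 pointwise mix across channels."""
    depthwise = h * w * in_ch * k * k
    pointwise = h * w * in_ch * out_ch
    return depthwise + pointwise


# Hypothetical layer: 14x14 feature map, 512 -> 512 channels, 3x3 kernel.
std = standard_conv_macs(14, 14, 512, 512, 3)
sep = depthwise_separable_macs(14, 14, 512, 512, 3)
ratio = sep / std

print(std, sep, ratio)
```

The ratio works out to 1/out_ch + 1/k^2, so with a 3x3 kernel the separable form needs roughly one ninth of the computation.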

Depthwise Separable Convolution
https://www.slideshare.net/NaverEngineering/designing-more-efficient-convolution-neural-network-122869307

Fire module
SqueezeNet
Squeeze layer:
a 1x1 conv performs pointwise convolution
= channel reduction
Expansion layer:
1x1 kernels do not capture
spatial features well,
so they are mixed with 3x3 kernels. Padding makes
the two kernels' outputs the same size so they can be stacked

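Grouped Convolution
Grouping the filters lets each group learn from its own subset of channels
A 1x1 conv connects all channels, whereas a group conv connects only specific channels
Each group can learn highly correlated information
The connections become sparse, so there are fewer parameters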
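
The sparsity described above can be sketched by counting weights and listing which input channels each output channel sees. The channel counts are assumptions for the example, biases are ignored, and both helpers are hypothetical names.

```python
def grouped_conv_params(in_ch, out_ch, k, groups):
    """Weights in a grouped k x k convolution (bias omitted)."""
    if in_ch % groups or out_ch % groups:
        raise ValueError("channel counts must be divisible by groups")
    return groups * (in_ch // groups) * (out_ch // groups) * k * k


def connected_inputs(out_index, in_ch, out_ch, groups):
    """Input channels that feed a given output channel."""
    group = out_index // (out_ch // groups)
    per_group = in_ch // groups
    return list(range(group * per_group, (group + 1) * per_group))


for g in (1, 2, 4):
    print(g, grouped_conv_params(64, 128, 3, g))

# With 8 -> 8 channels in 2 groups, each output only sees half of the inputs.
print(connected_inputs(0, 8, 8, 2))
print(connected_inputs(5, 8, 8, 2))
```

The weight count falls by a factor equal to the number of groups, at the price of cutting the connections between groups.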

Grouped Convolution
AlexNet
The number of groups also affects accuracy
(it is a hyperparameter)
The network trained in each group can
extract different features

Channel Shuffle
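ShuffleNet
Depthwise convolution is efficient,
but the 1x1 convolutions are still costly.
Connecting only specific channels with group conv
is more efficient, but connectivity between
groups degrades.
Channel shuffle restores
the connectivity between groups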
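
Channel shuffle is usually described as reshaping the channel axis to (groups, channels per group), transposing, and flattening. The sketch below does the same on a plain list of channel labels; the labels and the function name are illustrative.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so the next group conv sees every previous group.

    Equivalent to reshaping to (groups, per_group), transposing, and flattening.
    """
    n = len(channels)
    if n % groups:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


# Two groups, 'a' and 'b', each with three channels.
channels = ["a0", "a1", "a2", "b0", "b1", "b2"]
shuffled = channel_shuffle(channels, groups=2)
print(shuffled)  # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

The operation only permutes channels, so it adds no parameters and no multiply-accumulates.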

Pointwise Grouped Convolution
ShuffleNet
Uses group conv and
channel shuffle in the
bottleneck structure
Shortcut path with stride=2 3x3 average
pooling:
halves the output size

Dense Convolution
Concatenate
ResNet / DenseNet
Instead of expanding with 1x1 in the bottleneck layer,
features are concatenated so that they grow by the growth rate each time.

ShiftNet
Applying a kernel matrix in which only one entry is 1
has the effect of shifting the feature map
Beyond this spatial change, channels are connected with
pointwise convolution
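
The shift effect can be seen by sliding a one-hot 3x3 kernel over a small feature map. This is a sketch under assumptions: zero padding, stride 1, and the cross-correlation convention that deep learning frameworks call convolution. The function name and the sample map are made up.

```python
def conv2d_same(image, kernel):
    """Single-channel cross-correlation with zero padding and same-size output."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


feature_map = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
# Only the left-centre entry is 1, so each output pixel copies its left neighbour.
shift_right = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
]
shifted = conv2d_same(feature_map, shift_right)
for row in shifted:
    print(row)
```

Because the kernel is fixed and one-hot, a real implementation can replace the convolution with a memory copy, which is why the shift itself needs no multiplications or learned weights.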

All You Need is a Few Shifts
Grouped Shift: shift channels in groups
Active Shift: might there be an optimal shift for each channel?
Let's find it through learning!
Shifts happen in whole-pixel (integer) steps
Integers are not differentiable
So a real value is learned first, and
bilinear interpolation is used to estimate the integer value
https://www.youtube.com/watch?v=c3xLY7EuikU&feature=youtu.be

Compound Scaling
EfficientNet
https://tykimos.github.io/warehouse/2019-7-4-ISS_2nd_Deep_Learning_Conference_All_Together_jwlee_file.pdf

Compound Scaling
EfficientNet
Rather than scaling one dimension at a time, scale all dimensions
together, with the scaling found by grid search
https://tykimos.github.io/warehouse/2019-7-4-ISS_2nd_Deep_Learning_Conference_All_Together_jwlee_file.pdf
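
The scaling rule itself did not survive on the slide. As a hedged sketch, the EfficientNet paper scales depth, width, and resolution by alpha^phi, beta^phi, and gamma^phi, and reports alpha = 1.2, beta = 1.1, gamma = 1.15 from a grid search under the constraint alpha * beta^2 * gamma^2 ≈ 2. The function names below are made up for the example.

```python
# Constants reported in the EfficientNet paper for the B0 baseline.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Multipliers for (depth, width, resolution) at compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_multiplier(phi):
    """Approximate FLOPS growth: linear in depth, quadratic in width and resolution."""
    depth, width, resolution = compound_scale(phi)
    return depth * width ** 2 * resolution ** 2


for phi in range(4):
    d, w, r = compound_scale(phi)
    print(phi, round(d, 3), round(w, 3), round(r, 3), round(flops_multiplier(phi), 3))
```

With these constants the cost grows by about 1.92x for each unit increase of phi, so one knob trades accuracy against compute in roughly predictable steps.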

AutoML
AMC, MNASNet
Aware of latency on real-world platforms
Uses reinforcement learning through AutoML to tune
pruning and quantization

Residual Connection, DenseNet: carry information through to the deep layers
Bottleneck, Fire module: Dimension Reduction
Depthwise Conv, Grouped Conv, Shift: Increase Useful Correlation
Few Shift, EfficientNet, MNASNet, AMC: finding the optimum is not a job for humans

Other compact network designs
Structured Sparsity
Multi-bit quantization
Variational Dropout
Variational Information Bottleneck

Compile Optimization, Pipeline Optimization

Compiler: Vendor SDK dependency
http://blog.naver.com/PostView.nhn?blogId=mesa_&logNo=221466719724&parentCategoryNo=&categoryNo=10&viewDate=&isShowPopularPosts=false&from=postView

Compiler Standardization
Efforts to reduce framework, compiler, and hardware
dependencies through standardization.
http://blog.naver.com/PostView.nhn?blogId=mesa_&logNo=221466719724&parentCategoryNo=&categoryNo=10&viewDate=&isShowPopularPosts=false&from=postView

AI future with Edge
Distributed
Mobile, Ubiquitous

AI Robotics KR
https://github.com/ai-robotics-kr/nnq_cnd_study/blob/master/AwesomePapers.md
https://www.facebook.com/groups/airoboticskr/
October 3 (National Foundation Day!)
Robotis Maker Space, Magok
AI x Robotics The First Meetup
NNQ_CND study

LG Electronics Robot Business Center

Thank you.
Stella Yang (양서연)
LG Electronics Robot Business Center
—
howtowhy@gmail.com
Reviewer: 앤드류엉
