Autoencoders
A way for Unsupervised Learning of Nonlinear Manifold
이활석
Slide Design : https://graphicriver.net/item/simpleco-simple-powerpoint-template/13220655
The parts written in Korean are often personal opinions; please keep that in mind.
The sources of the materials used in each slide are listed at the bottom of that slide.
Autoencoder in Wikipedia
FOUR KEYWORDS 1 / 5
[KEYWORDS]
Unsupervised learning
Representation learning
= Efficient coding learning
Dimensionality reduction
Generative model learning
INTRODUCTION
Nonlinear dimensionality reduction
FOUR KEYWORDS 2 / 5
[KEYWORDS]
Unsupervised learning
Nonlinear Dimensionality reduction
= Representation learning
= Efficient coding learning
= Feature extraction
= Manifold learning
Generative model learning
INTRODUCTION
Representation learning
FOUR KEYWORDS 3 / 5
[KEYWORDS]
Unsupervised learning
Nonlinear Dimensionality reduction
= Representation learning
= Efficient coding learning
= Feature extraction
= Manifold learning
Generative model learning
http://videolectures.net/kdd2014_bengio_deep_learning/
INTRODUCTION
ML density estimation
FOUR KEYWORDS 4 / 5
[KEYWORDS]
Unsupervised learning
Nonlinear Dimensionality reduction
= Representation learning
= Efficient coding learning
= Feature extraction
= Manifold learning
Generative model learning
ML density estimation
http://www.iangoodfellow.com/slides/2016-12-04-NIPS.pdf
INTRODUCTION
5 / 5
[4 MAIN KEYWORDS]
1. Unsupervised learning
2. Manifold learning
3. Generative model learning
4. ML density estimation
When training an autoencoder:
the training procedure is unsupervised, and
the loss can be interpreted as a negative maximum likelihood.
In a trained autoencoder:
the encoder performs dimensionality reduction, and
the decoder acts as a generative model.
Summary
FOUR KEYWORDS INTRODUCTION
Unsupervised learning
ML density estimation
Manifold learning
Generative model learning
Input → Encoder → Decoder → Output
CONTENTS
01. Revisit Deep Neural Networks
• Machine learning problem
• Loss function viewpoints I : Back-propagation
• Loss function viewpoints II : Maximum likelihood
• Maximum likelihood for autoencoders
03. Autoencoders
• Autoencoder (AE)
• Denoising AE (DAE)
• Contractive AE (CAE)
02. Manifold Learning
• Four objectives
• Dimension reduction
• Density estimation
04. Variational Autoencoders
• Variational AE (VAE)
• Conditional VAE (CVAE)
• Adversarial AE (AAE)
05. Applications
• Retrieval
• Generation
• Regression
• GAN+VAE
01. Revisit Deep Neural Networks
02. Manifold Learning
03. Autoencoders
04. Variational Autoencoders
05. Applications
• Machine learning problem
• Loss function viewpoints I : Back-propagation
• Loss function viewpoints II : Maximum likelihood
• Maximum likelihood for autoencoders
KEYWORD : ML density estimation
The loss function used to train a deep neural network can be interpreted from several angles.
One interpretation asks whether the back-propagation algorithm can work well
(i.e., whether the gradient-vanishing problem is less likely to occur).
Another views the loss as a negative maximum likelihood, so that a particular
form of loss amounts to assuming a particular form of probability distribution.
Training an autoencoder can likewise be viewed as optimization from the maximum
likelihood perspective.
Classic Machine Learning
ML PROBLEM 1 / 17
01. Collect training data
Input data 𝑥 = {𝑥1, 𝑥2, … , 𝑥 𝑁}, output labels 𝑦 = {𝑦1, 𝑦2, … , 𝑦 𝑁}
𝒟 = {(𝑥1, 𝑦1), (𝑥2, 𝑦2), … , (𝑥 𝑁, 𝑦 𝑁)}
02. Define functions
Model 𝑓𝜃(∙)
• Output : 𝑓𝜃(𝑥)
• Loss : 𝐿(𝑓𝜃(𝑥), 𝑦), measuring how different the model output and the label are
03. Learning/Training
Find the optimal parameter (the model that best explains the given data):
𝜃∗ = argmin_𝜃 𝐿(𝑓𝜃(𝑥), 𝑦)
04. Predicting/Testing
Compute the optimal function output (prediction): 𝑦_𝑛𝑒𝑤 = 𝑓𝜃∗(𝑥_𝑛𝑒𝑤)
Fixed input, fixed output
REVISIT DNN
Deep Neural Networks
2 / 17
01. Collect training data
02. Define functions
03. Learning/Training
04. Predicting/Testing
Input data 𝑥, output labels 𝑦, model 𝑓𝜃(∙), loss 𝐿(𝑓𝜃(𝑥), 𝑦)
For a Deep Neural Network, the parameters are the weights and biases: 𝜃 = {𝑊, 𝑏}
Assumption 1.
Total loss of DNN over training samples is the sum of the loss for each training sample:
𝐿(𝑓𝜃(𝑥), 𝑦) = Σ𝑖 𝐿(𝑓𝜃(𝑥𝑖), 𝑦𝑖)
Assumption 2.
Loss for each training example is a function of the final output of the DNN.
These are the conditions needed to train a DNN with backpropagation.
ML PROBLEM REVISIT DNN
Questions | Strategies
How to update 𝜃 → 𝜃 + ∆𝜃 ? | Only if 𝐿(𝜃 + ∆𝜃) < 𝐿(𝜃)
When do we stop searching? | When 𝐿(𝜃 + ∆𝜃) == 𝐿(𝜃)
Deep Neural Networks
3 / 17
01. Collect training data
02. Define functions
03. Learning/Training
04. Predicting/Testing
Gradient Descent (an iterative method):
𝜃∗ = argmin_{𝜃∈Θ} 𝐿(𝑓𝜃(𝑥), 𝑦) = argmin_{𝜃∈Θ} 𝐿(𝜃)
Keep moving in the direction that decreases the loss, and stop when moving no longer changes the loss.
ML PROBLEM REVISIT DNN
Deep Neural Networks
4 / 17
01. Collect training data
02. Define functions
03. Learning/Training
04. Predicting/Testing
𝜃∗ = argmin_{𝜃∈Θ} 𝐿(𝑓𝜃(𝑥), 𝑦) : Gradient Descent
Taylor Expansion : 𝐿(𝜃 + ∆𝜃) = 𝐿(𝜃) + 𝛻𝐿 ∙ ∆𝜃 + (second-order term) + (third-order term) + ⋯
Approximation : 𝐿(𝜃 + ∆𝜃) ≈ 𝐿(𝜃) + 𝛻𝐿 ∙ ∆𝜃
𝐿(𝜃 + ∆𝜃) − 𝐿(𝜃) = ∆𝐿 = 𝛻𝐿 ∙ ∆𝜃
If ∆𝜃 = −𝜂𝛻𝐿, then ∆𝐿 = −𝜂‖𝛻𝐿‖² < 0, where 𝜂 > 0 is called the learning rate.
𝛻𝐿 is the gradient of 𝐿 and indicates the steepest increasing direction of 𝐿.
Using more terms of the expansion would approximate a wider region with small error.
Updating the parameters in small steps via the learning rate is necessary because only the first-order term of the loss is used, so the descent direction is accurate only in a very small neighborhood.
ML PROBLEM REVISIT DNN
Questions | Strategies
How to update 𝜃 → 𝜃 + ∆𝜃 ? | Only if 𝐿(𝜃 + ∆𝜃) < 𝐿(𝜃)
When do we stop searching? | When 𝐿(𝜃 + ∆𝜃) == 𝐿(𝜃)
How to find ∆𝜃 so that 𝐿(𝜃 + ∆𝜃) < 𝐿(𝜃) ? | ∆𝜃 = −𝜂𝛻𝐿, where 𝜂 > 0
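The first-order argument above can be checked numerically. The sketch below uses a hypothetical one-parameter quadratic loss (not from the slides) and applies the update ∆𝜃 = −𝜂𝛻𝐿, confirming that the loss decreases at every step:

```python
# Toy loss L(theta) = (theta - 3)^2 with gradient 2*(theta - 3).
def L(theta):
    return (theta - 3.0) ** 2

def grad_L(theta):
    return 2.0 * (theta - 3.0)

theta = 0.0
eta = 0.1  # learning rate (> 0)
losses = [L(theta)]
for _ in range(100):
    theta = theta - eta * grad_L(theta)  # delta_theta = -eta * grad(L)
    losses.append(L(theta))

# The loss never increases, as the first-order analysis predicts,
# and theta approaches the minimizer theta* = 3.
assert all(l1 >= l2 for l1, l2 in zip(losses, losses[1:]))
```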
Deep Neural Networks
5 / 17
01. Collect training data
02. Define functions
03. Learning/Training
04. Predicting/Testing
𝜃∗ = argmin_{𝜃∈Θ} 𝐿(𝑓𝜃(𝑥), 𝑦) : Gradient Descent
DB 𝒟 → DNN 𝜃 𝑘 = {𝑤 𝑘, 𝑏 𝑘} → LOSS 𝐿(𝜃 𝑘, 𝒟)
𝐿(𝜃 𝑘, 𝒟) = Σ𝑖 𝐿(𝜃 𝑘, 𝒟𝑖)
𝛻𝐿(𝜃 𝑘, 𝒟) = Σ𝑖 𝛻𝐿(𝜃 𝑘, 𝒟𝑖)
Redefinition : 𝛻𝐿(𝜃 𝑘, 𝒟) ≜ Σ𝑖 𝛻𝐿(𝜃 𝑘, 𝒟𝑖)/𝑁
Stochastic Gradient Descent : 𝛻𝐿(𝜃 𝑘, 𝒟) ≈ Σ𝑗 𝛻𝐿(𝜃 𝑘, 𝒟𝑗)/𝑀, where 𝑀 < 𝑁 (M : batch size)
𝜃 𝑘+1 = 𝜃 𝑘 − 𝜂𝛻𝐿(𝜃 𝑘, 𝒟)
Because the total loss over the data is a sum of per-sample losses, its derivative can be computed efficiently.
If it were a product instead, we would have to keep every sample's result in memory in order to differentiate.
In principle the parameters should be updated after summing the loss gradients over all the data, but instead the gradients are summed over only a batch-sized subset before each update.
ML PROBLEM REVISIT DNN
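A minimal sketch of the SGD update 𝜃 𝑘+1 = 𝜃 𝑘 − 𝜂𝛻𝐿(𝜃 𝑘, 𝒟) with batch size M < N, on a made-up least-squares problem (the dataset, sizes, and learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear least squares: per-sample loss (x_i . theta - y_i)^2, averaged per batch.
N, d = 1000, 5
X = rng.normal(size=(N, d))
true_theta = rng.normal(size=d)
y = X @ true_theta + 0.1 * rng.normal(size=N)

def grad(theta, idx):
    # Average per-sample gradient over an index set (a batch of size M)
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ theta - yb) / len(idx)

theta = np.zeros(d)
eta, M = 0.05, 32  # learning rate and batch size M < N
for step in range(2000):
    batch = rng.choice(N, size=M, replace=False)
    theta -= eta * grad(theta, batch)  # theta_{k+1} = theta_k - eta * grad_L
```

Each noisy batch gradient is an unbiased estimate of the full gradient, so the iterates still converge close to the true parameters.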
Deep Neural Networks
6 / 17
01. Collect training data
02. Define functions
03. Learning/Training
04. Predicting/Testing
𝜃∗ = argmin_{𝜃∈Θ} 𝐿(𝑓𝜃(𝑥), 𝑦) : Gradient Descent + Backpropagation
DB 𝒟 → DNN 𝜃 𝑘 = {𝑤 𝑘, 𝑏 𝑘} → LOSS 𝐿(𝜃 𝑘, 𝒟)
𝜃 𝑘+1 = 𝜃 𝑘 − 𝜂𝛻𝐿(𝜃 𝑘, 𝒟)
Per-layer parameter update rules:
𝑤 𝑘+1^𝑙 = 𝑤 𝑘^𝑙 − 𝜂𝛻_{𝑤 𝑘^𝑙} 𝐿(𝜃 𝑘, 𝒟)
𝑏 𝑘+1^𝑙 = 𝑏 𝑘^𝑙 − 𝜂𝛻_{𝑏 𝑘^𝑙} 𝐿(𝜃 𝑘, 𝒟)
[ Backpropagation Algorithm ]
1. Error at the output layer : 𝛿^𝐿 = 𝛻𝑎 𝐶 ⨀ 𝜎′(𝑧^𝐿)
2. Error relationship between two adjacent layers : 𝛿^𝑙 = 𝜎′(𝑧^𝑙) ⨀ (𝑤^{𝑙+1})^𝑇 𝛿^{𝑙+1}
3. Gradient of C in terms of bias : 𝛻_{𝑏^𝑙} 𝐶 = 𝛿^𝑙
4. Gradient of C in terms of weight : 𝛻_{𝑊^𝑙} 𝐶 = 𝛿^𝑙 (𝑎^{𝑙−1})^𝑇
• 𝐶 : Cost (Loss)
• 𝑎 : final output of DNN
• 𝜎(∙) : activation function
http://neuralnetworksanddeeplearning.com/chap2.html
ML PROBLEM REVISIT DNN
The derivative of the loss function is the most important quantity in DNN training!!
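The four backpropagation equations above can be sketched directly. The toy two-layer sigmoid network below is an illustration (its sizes and data are arbitrary assumptions), with one analytic gradient entry checked against a numerical derivative:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# A 2-layer sigmoid network with quadratic cost C = ||a2 - y||^2 / 2.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x, y = rng.normal(size=3), rng.normal(size=2)

# Forward pass
z1 = W1 @ x + b1; a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

# 1. Error at the output layer: delta_L = grad_a C (.) sigma'(z_L)
delta2 = (a2 - y) * a2 * (1 - a2)
# 2. Error between adjacent layers: delta_l = sigma'(z_l) (.) W_{l+1}^T delta_{l+1}
delta1 = a1 * (1 - a1) * (W2.T @ delta2)
# 3./4. Gradients in terms of bias and weight
grad_b2, grad_W2 = delta2, np.outer(delta2, a1)
grad_b1, grad_W1 = delta1, np.outer(delta1, x)

# Check one entry against a numerical derivative of C w.r.t. W1[0, 0]
def cost(W1_):
    a1_ = sigmoid(W1_ @ x + b1)
    a2_ = sigmoid(W2 @ a1_ + b2)
    return 0.5 * np.sum((a2_ - y) ** 2)

eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
num = (cost(W1p) - cost(W1)) / eps
assert abs(num - grad_W1[0, 0]) < 1e-4
```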
View-Point I : Backpropagation
LOSS FUNCTION 7 / 17
Type 1 : Mean Square Error / Quadratic loss
Setup : a single sigmoid neuron x → ∑(w, b) → z → a, with Input x = 1.0, target Output y = 0.0, 𝜎(∙) : sigmoid
𝐶 = (𝑎 − 𝑦)²/2 = 𝑎²/2
𝛻𝑎 𝐶 = (𝑎 − 𝑦)
𝛿 = 𝛻𝑎 𝐶 ⨀ 𝜎′(𝑧) = (𝑎 − 𝑦)𝜎′(𝑧)
𝜕𝐶/𝜕𝑤 = 𝑥𝛿 = 𝛿, 𝜕𝐶/𝜕𝑏 = 𝛿
𝑤 = 𝑤 − 𝜂𝛿, 𝑏 = 𝑏 − 𝜂𝛿
Initialization w0=+0.6, b0=+0.9, a0=+0.82 reaches w=-1.28, b=-0.98, a=+0.09, while initialization w0=+2.0, b0=+2.0, a0=+0.98 only reaches w=-0.68, b=-0.68, a=+0.20.
http://neuralnetworksanddeeplearning.com/chap2.html
REVISIT DNN
View-Point I : Backpropagation
LOSS FUNCTION 8 / 17
Type 1 : Mean Square Error / Quadratic loss
Learning slowly means 𝜕𝐶/𝜕𝑤 and 𝜕𝐶/𝜕𝑏 are small!! Why are they small???
Same setup (single sigmoid neuron, Input x = 1.0, Output y = 0.0):
𝐶 = (𝑎 − 𝑦)²/2 = 𝑎²/2, 𝛿 = (𝑎 − 𝑦)𝜎′(𝑧)
𝜕𝐶/𝜕𝑤 = 𝑥𝛿 = 𝑥𝑎𝜎′(𝑧) = 𝑎𝜎′(𝑧)
𝜕𝐶/𝜕𝑏 = 𝛿 = 𝑎𝜎′(𝑧)
For the initialization w=+2, b=+2 the sigmoid is saturated, so 𝜎′(𝑧) is close to 0 and both gradients are small; for w=+0.6, b=+0.9 it is not.
http://neuralnetworksanddeeplearning.com/chap2.html
REVISIT DNN
View-Point I : Backpropagation
9 / 17
Type 2 : Cross Entropy
LOSS FUNCTION
Same setup (single sigmoid neuron, Input x = 1.0, Output y = 0.0, 𝜎(∙) : sigmoid)
𝐶 = −[𝑦 ln 𝑎 + (1 − 𝑦) ln(1 − 𝑎)]
𝛻𝑎 𝐶 = −[𝑦/𝑎 − (1 − 𝑦)/(1 − 𝑎)] = −(1 − 𝑎)𝑦/[(1 − 𝑎)𝑎] + (1 − 𝑦)𝑎/[(1 − 𝑎)𝑎] = (−𝑦 + 𝑎𝑦 + 𝑎 − 𝑎𝑦)/[(1 − 𝑎)𝑎] = (𝑎 − 𝑦)/[(1 − 𝑎)𝑎]
𝜎′(𝑧) = 𝜕𝑎/𝜕𝑧 = (1 − 𝜎(𝑧))𝜎(𝑧) = (1 − 𝑎)𝑎
𝛿_𝐶𝐸 = 𝛻𝑎 𝐶 ⨀ 𝜎′(𝑧^𝐿) = [(𝑎 − 𝑦)/((1 − 𝑎)𝑎)] (1 − 𝑎)𝑎 = 𝑎 − 𝑦
𝛿_𝑀𝑆𝐸 = (𝑎 − 𝑦)𝜎′(𝑧)
Unlike MSE, with CE the error at the output layer is not multiplied by the derivative of the activation function, so it suffers less from the gradient vanishing problem (training proceeds faster).
However, when several layers are stacked, the activation derivative is still multiplied in repeatedly, so CE is not completely free from gradient vanishing.
ReLU, whose derivative is either 1 or 0, is an excellent activation function from this viewpoint.
http://neuralnetworksanddeeplearning.com/chap2.html
REVISIT DNN
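The contrast above can be checked numerically using the slide's single-neuron setup (x = 1.0, y = 0.0) at the saturated initialization w = b = 2.0: the MSE delta carries the factor 𝜎′(𝑧) while the CE delta does not.

```python
import math

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

# Single sigmoid neuron from the slide: x = 1.0, target y = 0.0.
x, y = 1.0, 0.0
w, b = 2.0, 2.0            # saturated initialization (a close to 1)
z = w * x + b
a = sigmoid(z)
sigma_prime = a * (1 - a)  # tiny when the sigmoid saturates

delta_mse = (a - y) * sigma_prime  # output error scaled by sigma'(z)
delta_ce = a - y                   # sigma'(z) cancels for cross-entropy

# MSE's gradient is tiny where the sigmoid saturates; CE's is not.
assert delta_ce > 10 * delta_mse
```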
View-Point I : Backpropagation
10 / 17
Type 2 : Cross Entropy
LOSS FUNCTION
Same setup (single sigmoid neuron, Input x = 1.0, Output y = 0.0, 𝜎(∙) : sigmoid)
𝐶 = −[𝑦 ln 𝑎 + (1 − 𝑦) ln(1 − 𝑎)]
𝛿 = 𝑎 − 𝑦
𝜕𝐶/𝜕𝑤 = 𝑥𝛿 = 𝛿, 𝜕𝐶/𝜕𝑏 = 𝛿, 𝑤 = 𝑤 − 𝜂𝛿, 𝑏 = 𝑏 − 𝜂𝛿
With cross-entropy both initializations now train well:
w0=+0.6, b0=+0.9, a0=+0.82 : MSE reaches w=-1.28, b=-0.98, a=+0.09; CE reaches w=-2.37, b=-2.07, a=+0.01
w0=+2.0, b0=+2.0, a0=+0.98 : MSE reaches w=-0.68, b=-0.68, a=+0.20; CE reaches w=-2.20, b=-2.20, a=+0.01
http://neuralnetworksanddeeplearning.com/chap2.html
REVISIT DNN
View-Point II : Maximum Likelihood
11 / 17
Back to the Machine Learning Problem
01. Collect training data
Input data 𝑥 = {𝑥1, 𝑥2, … , 𝑥 𝑁}, output labels 𝑦 = {𝑦1, 𝑦2, … , 𝑦 𝑁}
𝒟 = {(𝑥1, 𝑦1), (𝑥2, 𝑦2), … , (𝑥 𝑁, 𝑦 𝑁)}
02. Define functions
Model 𝑓𝜃(∙)
• Output : 𝑓𝜃(𝑥)
• Loss : − log 𝑝(𝑦|𝑓𝜃(𝑥)), the probability of the label under the assumed output distribution
03. Learning/Training
Find the optimal parameter (the model that best explains the given data):
𝜃∗ = argmin_𝜃 [− log 𝑝(𝑦|𝑓𝜃(𝑥))]
04. Predicting/Testing
Sample from the learned distribution (prediction): 𝑦_𝑛𝑒𝑤 ~ 𝑝(𝑦|𝑓𝜃∗(𝑥_𝑛𝑒𝑤))
Fixed input, fixed/varying output
LOSS FUNCTION REVISIT DNN
(Figure : for a target 𝑦 and two model outputs 𝑓𝜃1(𝑥) and 𝑓𝜃2(𝑥), the likelihoods satisfy 𝑝(𝑦|𝑓𝜃2(𝑥)) < 𝑝(𝑦|𝑓𝜃1(𝑥)).)
View-Point II : Maximum Likelihood
12 / 17
Back to the Machine Learning Problem
01. Collect training data / 02. Define functions / 03. Learning/Training / 04. Predicting/Testing
Input data 𝑥, output labels 𝑦, model 𝑓𝜃(∙), likelihood 𝑝(𝑦|𝑓𝜃(𝑥))
Backpropagation view:
Assumption 1. Total loss of DNN over training samples is the sum of the loss for each training sample.
Assumption 2. Loss for each training example is a function of the final output of the DNN.
Maximum-likelihood view, the i.i.d. condition on 𝑝(𝑦|𝑓𝜃(𝑥)):
Assumption 1 : Independence. All of our data is independent of each other: 𝑝(𝑦|𝑓𝜃(𝑥)) = ∏𝑖 𝑝_{𝒟𝑖}(𝑦|𝑓𝜃(𝑥𝑖))
Assumption 2 : Identical Distribution. Our data is identically distributed: 𝑝(𝑦|𝑓𝜃(𝑥)) = ∏𝑖 𝑝(𝑦|𝑓𝜃(𝑥𝑖))
− log 𝑝(𝑦|𝑓𝜃(𝑥)) = − Σ𝑖 log 𝑝(𝑦𝑖|𝑓𝜃(𝑥𝑖))
LOSS FUNCTION REVISIT DNN
View-Point II : Maximum Likelihood
13 / 17
Univariate cases − log 𝑝(𝑦𝑖|𝑓𝜃(𝑥𝑖))
Gaussian distribution → Mean Squared Error
𝑓𝜃(𝑥𝑖) = 𝜇𝑖, 𝜎𝑖 = 1
𝑝(𝑦𝑖|𝜇𝑖, 𝜎𝑖) = (1/(√(2𝜋)𝜎𝑖)) exp(−(𝑦𝑖 − 𝜇𝑖)²/(2𝜎𝑖²))
log 𝑝(𝑦𝑖|𝜇𝑖, 𝜎𝑖) = log(1/(√(2𝜋)𝜎𝑖)) − (𝑦𝑖 − 𝜇𝑖)²/(2𝜎𝑖²)
−log 𝑝(𝑦𝑖|𝜇𝑖) = −log(1/√(2𝜋)) + (𝑦𝑖 − 𝜇𝑖)²/2
−log 𝑝(𝑦𝑖|𝜇𝑖) ∝ (𝑦𝑖 − 𝜇𝑖)²/2 = (𝑦𝑖 − 𝑓𝜃(𝑥𝑖))²/2
Bernoulli distribution → Cross-entropy
𝑓𝜃(𝑥𝑖) = 𝑝𝑖
𝑝(𝑦𝑖|𝑝𝑖) = 𝑝𝑖^{𝑦𝑖} (1 − 𝑝𝑖)^{1−𝑦𝑖}
log 𝑝(𝑦𝑖|𝑝𝑖) = 𝑦𝑖 log 𝑝𝑖 + (1 − 𝑦𝑖) log(1 − 𝑝𝑖)
−log 𝑝(𝑦𝑖|𝑝𝑖) = −[𝑦𝑖 log 𝑝𝑖 + (1 − 𝑦𝑖) log(1 − 𝑝𝑖)]
LOSS FUNCTION REVISIT DNN
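The correspondence above can be confirmed numerically: for 𝜎 = 1, the Gaussian negative log-likelihood computed from the pdf equals squared error plus a constant, and the Bernoulli negative log-likelihood is exactly the cross-entropy. (A pure-Python sketch; the helper names are mine.)

```python
import math

# Gaussian NLL computed directly from the pdf (sigma = 1):
# it equals (y - mu)^2 / 2 plus the constant 0.5 * log(2*pi).
def gauss_nll(y, mu):
    pdf = math.exp(-(y - mu) ** 2 / 2) / math.sqrt(2 * math.pi)
    return -math.log(pdf)

const = 0.5 * math.log(2 * math.pi)
for y, mu in [(0.0, 1.0), (2.0, -1.0), (0.5, 0.5)]:
    assert abs(gauss_nll(y, mu) - ((y - mu) ** 2 / 2 + const)) < 1e-9

# Bernoulli NLL is exactly the cross-entropy.
def bern_nll(y, p):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

assert abs(bern_nll(1, 0.9) + math.log(0.9)) < 1e-12
```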
View-Point II : Maximum Likelihood
14 / 17
Multivariate cases − log 𝑝(𝑦𝑖|𝑓𝜃(𝑥𝑖))
Gaussian distribution → Mean Squared Error
𝑓𝜃(𝑥𝑖) = 𝜇𝑖, Σ𝑖 = 𝐼
𝑝(𝑦𝑖|𝜇𝑖, Σ𝑖) = (1/((2𝜋)^{𝑛/2} |Σ𝑖|^{1/2})) exp(−(𝑦𝑖 − 𝜇𝑖)^𝑇 Σ𝑖^{−1} (𝑦𝑖 − 𝜇𝑖)/2)
log 𝑝(𝑦𝑖|𝜇𝑖, Σ𝑖) = log(1/((2𝜋)^{𝑛/2} |Σ𝑖|^{1/2})) − (𝑦𝑖 − 𝜇𝑖)^𝑇 Σ𝑖^{−1} (𝑦𝑖 − 𝜇𝑖)/2
−log 𝑝(𝑦𝑖|𝜇𝑖) = −log(1/(2𝜋)^{𝑛/2}) + ‖𝑦𝑖 − 𝜇𝑖‖²/2
−log 𝑝(𝑦𝑖|𝜇𝑖) ∝ ‖𝑦𝑖 − 𝜇𝑖‖²/2 = ‖𝑦𝑖 − 𝑓𝜃(𝑥𝑖)‖²/2
Categorical distribution → Cross-entropy
(also called Generalized Bernoulli or Multinoulli distribution)
𝑓𝜃(𝑥𝑖) = 𝑝𝑖
𝑝(𝑦𝑖|𝑝𝑖) = ∏_{𝑗=1}^{𝑛} 𝑝𝑖,𝑗^{𝑦𝑖,𝑗} (1 − 𝑝𝑖,𝑗)^{1−𝑦𝑖,𝑗}
log 𝑝(𝑦𝑖|𝑝𝑖) = Σ_{𝑗=1}^{𝑛} [𝑦𝑖,𝑗 log 𝑝𝑖,𝑗 + (1 − 𝑦𝑖,𝑗) log(1 − 𝑝𝑖,𝑗)]
−log 𝑝(𝑦𝑖|𝑝𝑖) = −Σ_{𝑗=1}^{𝑛} [𝑦𝑖,𝑗 log 𝑝𝑖,𝑗 + (1 − 𝑦𝑖,𝑗) log(1 − 𝑝𝑖,𝑗)]
LOSS FUNCTION REVISIT DNN
View-Point II : Maximum Likelihood
15 / 17
Multivariate cases − log 𝑝(𝑦𝑖|𝑓𝜃(𝑥𝑖))
Distribution estimation : the network does not predict the likelihood value itself, it predicts the parameters of the likelihood 𝑝(𝑦𝑖|𝑥𝑖).
Gaussian distribution : 𝑓𝜃(𝑥𝑖) = 𝜇𝑖 → Mean Squared Error
Categorical distribution : 𝑓𝜃(𝑥𝑖) = 𝑝𝑖 → Cross-entropy
LOSS FUNCTION REVISIT DNN
View-Point II : Maximum Likelihood
16 / 17
Let's see Yoshua Bengio's slide
http://videolectures.net/kdd2014_bengio_deep_learning/
LOSS FUNCTION REVISIT DNN
View-Point II : Maximum Likelihood
17 / 17
Connection to Autoencoders
LOSS FUNCTION REVISIT DNN
Autoencoder : models the probability distribution 𝑝(𝑥|𝑥)
• Gaussian distribution → Mean Squared Error Loss
• Categorical distribution → Cross-Entropy Loss
Variational Autoencoder : models the probability distribution 𝑝(𝑥)
• Gaussian distribution → Mean Squared Error Loss
• Categorical distribution → Cross-Entropy Loss
01. Revisit Deep Neural Networks
02. Manifold Learning
03. Autoencoders
04. Variational Autoencoders
05. Applications
• Four objectives
• Dimension reduction
• Density estimation
KEYWORDS : Manifold learning, Unsupervised learning
One of the most important functions of an autoencoder is that it learns the manifold of the data.
We explain the four objectives of manifold learning: data compression, data visualization, avoiding the curse of dimensionality, and extracting useful features.
We also review existing methods related to the autoencoder's main functions, dimensionality reduction and probability-density estimation, and point out their limitations.
• A 𝑑-dimensional manifold ℳ is embedded in an 𝑚-dimensional space, and there is an explicit mapping 𝑓: ℛ^𝑑 → ℛ^𝑚 where 𝑑 ≤ 𝑚
• We are given samples 𝑥𝑖 ∈ ℛ^𝑚 with noise
• 𝑓(∙) is called the embedding function, 𝑚 is the extrinsic dimension, and 𝑑 is the intrinsic dimension or the dimension of the latent space
• Finding 𝑓(∙) or 𝜏𝑖 from the given 𝑥𝑖 is called manifold learning
• We assume 𝑝(𝜏) is smooth and distributed uniformly, and that the noise is small → Manifold Hypothesis
Definition
INTRODUCTION MANIFOLD LEARNING
1 / 20
https://math.stackexchange.com/questions/1203714/manifold-learning-how-should-this-method-be-interpreted
(Figure : the mapping 𝑓 from ℛ^𝑑 to ℛ^𝑚.)
What is it useful for?
INTRODUCTION MANIFOLD LEARNING
2 / 20
01. Data compression
02. Data visualization
03. Curse of dimensionality
04. Discovering most important features
• Reasonable distance metric
• Needs disentangling the underlying explanatory factors (making sense of the data)
Manifold Hypothesis
Dimensionality Reduction is an Unsupervised Learning Task!
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
Example : 𝑥𝑖 ∈ ℛ^𝑚 = (3.5, -1.7, 2.8, -3.5, -1.4, 2.4, 2.7, 7.5) → 𝜏𝑖 ∈ ℛ^𝑑 = (0.32, -1.3, 1.2)
Data compression
OBJECTIVES MANIFOLD LEARNING
3 / 20
http://theis.io/media/publications/paper.pdf
Example : Lossy Image Compression with Compressive Autoencoders, '17.03.01
Data visualization
OBJECTIVES MANIFOLD LEARNING
4 / 20
t-distributed stochastic neighbor embedding (t-SNE)
https://www.tensorflow.org/get_started/embedding_viz
http://vision-explorer.reactive.ai/#/?_k=aodf68
http://fontjoy.com/projector/
Curse of dimensionality
OBJECTIVES MANIFOLD LEARNING
5 / 20
As the dimensionality of the data increases, the volume of the space grows exponentially, so the density of a fixed number of data points thins out rapidly.
Consequently, the number of samples required to analyze the data distribution or estimate a model also grows exponentially with the dimension.
http://darkpgmr.tistory.com/145
http://videolectures.net/kdd2014_bengio_deep_learning/
Curse of dimensionality
OBJECTIVES MANIFOLD LEARNING
6 / 20
Manifold Hypothesis (assumption)
Natural data in high dimensional spaces concentrates close to lower dimensional manifolds.
Probability density decreases very rapidly when moving away from the supporting manifold.
Although high-dimensional data is sparse, there is a low-dimensional manifold that contains it, and the density drops sharply as soon as one leaves that manifold.
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
Curse of dimensionality
OBJECTIVES MANIFOLD LEARNING
7 / 20
Manifold Hypothesis (assumption)
• A 200x200 RGB image has 10^96329 possible states.
• A random image is just noise.
• Natural images occupy a tiny fraction of that space → suggests a peaked density.
• Realistic smooth transformations from one image to another form a continuous path along the manifold.
• Data density concentrates near a lower dimensional manifold.
• This can shift the curse from the high dimension 𝑚 to 𝑑 ≪ 𝑚.
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
http://www.freejapanesefont.com/category/calligraphy-2/
http://baanimotion.blogspot.kr/2014/05/animated-acting.html
https://medicalxpress.com/news/2013-03-people-facial.html
Discovering most important features
OBJECTIVES MANIFOLD LEARNING
8 / 20
The manifold follows naturally from continuous underlying factors (≈ intrinsic manifold coordinates).
Such continuous factors are part of a meaningful representation!
https://dmm613.wordpress.com/tag/machine-learning/
(Figure : thickness, rotation, and size factors, from InfoGAN and from VAE.)
To evaluate a learned manifold, one checks that as the manifold coordinates change little by little, the original data also changes little by little in a semantically meaningful way.
Discovering most important features
OBJECTIVES MANIFOLD LEARNING
9 / 20
Reasonable distance metric
Two samples that are semantically close are often far apart in the high-dimensional space, and two samples that are close in the high-dimensional space can be semantically very different.
Because of the curse of dimensionality, it is hard to find a meaningful way of measuring distance in high dimensions.
(Figure : points A1, A2, B compared by distance in high dimension vs. distance along the manifold.)
If we have found the important features, we should also be able to find the samples that share them.
Discovering most important features
OBJECTIVES MANIFOLD LEARNING
10 / 20
Reasonable distance metric : interpolation in high dimension
https://www.cs.cmu.edu/~efros/courses/AP06/presentations/ThompsonDimensionalityReduction.pdf
Discovering most important features
OBJECTIVES MANIFOLD LEARNING
11 / 20
Reasonable distance metric : interpolation in manifold
https://www.cs.cmu.edu/~efros/courses/AP06/presentations/ThompsonDimensionalityReduction.pdf
Discovering most important features
OBJECTIVES MANIFOLD LEARNING
12 / 20
Needs disentangling the underlying explanatory factors
In general, a learned manifold is entangled, i.e. encoded in the data space in a complicated manner.
When a manifold is disentangled, it is more interpretable and easier to apply to tasks.
(Figure : entangled manifold vs. disentangled manifold, MNIST Data → 2D manifold.)
Taxonomy
DIM. REDUCTION MANIFOLD LEARNING
13 / 20
Dimensionality Reduction
Linear
• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)
• etc.
Non-Linear
• Autoencoders (AE)
• t-distributed stochastic neighbor embedding (t-SNE)
• Isomap
• Locally-linear embedding (LLE)
• etc.
PCA
DIM. REDUCTION MANIFOLD LEARNING
14 / 20
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
• Finds the k directions in which the data has highest variance → principal directions (eigenvectors) 𝑊
• Projecting inputs 𝑥 on these vectors yields a reduced-dimension (and decorrelated) representation → principal components
• ℎ = 𝑓𝜃(𝑥) = 𝑊(𝑥 − 𝜇) with 𝜃 = {𝑊, 𝜇}
http://www.nlpca.org/fig_pca_principal_component_analysis.png
• Why mention PCA?
• Prototypical unsupervised representation learning algorithm
• Related to autoencoders
• Prototypical manifold modeling algorithm
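A minimal PCA sketch along the lines of ℎ = 𝑊(𝑥 − 𝜇), on synthetic correlated data (the data itself is an assumption made for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Correlated 2-D data; PCA finds the directions of highest variance.
n = 500
t = rng.normal(size=n)
X = np.stack([t, 0.5 * t + 0.05 * rng.normal(size=n)], axis=1)

mu = X.mean(axis=0)
cov = np.cov((X - mu).T)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
W = eigvecs[:, ::-1].T                  # principal directions, largest variance first

# Reduced (and decorrelated) representation: h = W (x - mu)
H = (X - mu) @ W.T
assert abs(np.cov(H.T)[0, 1]) < 1e-8    # components are decorrelated
```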
PCA
DIM. REDUCTION MANIFOLD LEARNING
15 / 20
http://www.astroml.org/book_figures/chapter7/fig_S_manifold_PCA.html
(Figure : on an entangled manifold, a linear manifold vs. nonlinear manifolds, with disentangled results from the nonlinear methods.)
Non-linear methods
DIM. REDUCTION MANIFOLD LEARNING
16 / 20
Isomap, LLE
https://www.slideshare.net/plutoyang/manifold-learning-64891420
Parzen Windows
DENSITY ESTIMATION MANIFOLD LEARNING
17 / 20
https://en.wikipedia.org/wiki/Density_estimation#/media/File:KernelDensityGaussianAnimated.gif
p̂(𝑥) = (1/𝑛) Σ_{𝑖=1}^{𝑛} 𝒩(𝑥; 𝑥𝑖, 𝜎𝑖²)
1D Example
• Demonstration of density estimation using kernel smoothing.
• The true density is a mixture of two Gaussians centered around 0 and 3, shown with a solid blue curve.
• In each frame, 100 samples are generated from the distribution, shown in red.
• Centered on each sample, a Gaussian kernel is drawn in gray.
• Averaging the Gaussians yields the density estimate shown in the dashed black curve.
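The kernel-smoothing estimate above can be sketched directly; the samples and bandwidth below are made up for illustration:

```python
import math

# Kernel density estimate: average of Gaussians centered on each sample.
def parzen(x, samples, sigma):
    n = len(samples)
    return sum(
        math.exp(-(x - xi) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        for xi in samples
    ) / n

samples = [-0.2, 0.0, 0.1, 2.9, 3.0, 3.2]  # two clusters, near 0 and 3
sigma = 0.5

# The estimate is peaked near the clusters and low in between.
assert parzen(0.0, samples, sigma) > parzen(1.5, samples, sigma)
assert parzen(3.0, samples, sigma) > parzen(1.5, samples, sigma)
```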
Parzen Windows
DENSITY ESTIMATION MANIFOLD LEARNING
18 / 20
2D Example
Classical Parzen Windows : isotropic Gaussian centered on each training point
p̂(𝑥) = (1/𝑛) Σ_{𝑖=1}^{𝑛} 𝒩(𝑥; 𝑥𝑖, 𝜎𝑖² 𝐼)
Manifold Parzen Windows : oriented Gaussian centered on each training point; use local PCA to get 𝐶𝑖 (high-variance directions from PCA on the k nearest neighbors)
p̂(𝑥) = (1/𝑛) Σ_{𝑖=1}^{𝑛} 𝒩(𝑥; 𝑥𝑖, 𝐶𝑖)
(Vincent and Bengio, NIPS 2003)
Non-local Manifold Parzen Windows : high-variance directions and center output by a neural network trained to maximize the likelihood of the k nearest neighbors
p̂(𝑥) = (1/𝑛) Σ_{𝑖=1}^{𝑛} 𝒩(𝑥; 𝜇(𝑥𝑖), 𝐶(𝑥𝑖))
(Bengio, Larochelle, Vincent, NIPS 2006)
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
Parzen Windows
DENSITY ESTIMATION MANIFOLD LEARNING
19 / 20
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
LIMITATION MANIFOLD LEARNING
20 / 20
Dimensionality Reduction
• Isomap
• Locally-linear embedding (LLE)
Non-parametric Density Estimation
• Isotropic Parzen window
• Manifold Parzen window
• Non-local manifold Parzen window
Neighborhood based training !!!
• They explicitly use distance-based neighborhoods.
• Training with k-nearest neighbors, or pairs of points.
• Typically Euclidean neighbors.
• But in high d, your nearest Euclidean neighbor can be very different from you.
The Euclidean distance between high-dimensional data points is likely not a meaningful notion of distance.
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
01. Revisit Deep Neural Networks
02. Manifold Learning
03. Autoencoders
04. Variational Autoencoders
05. Applications
We explain the autoencoder, and how it differs from the seemingly similar PCA and RBM.
We then explain the Denoising Autoencoder, which adds a stochastic perturbation to the autoencoder's input, and the Contractive Autoencoder, which replaces that perturbation with an analytic regularization term.
• Autoencoder (AE)
• Denoising AE (DAE)
• Contractive AE (CAE)
Terminology
INTRODUCTION AUTOENCODERS
1 / 24
• Code = Latent Variable = Feature = Hidden representation
• Encoding / Decoding
• Undercomplete / Overcomplete
Autoencoders = Auto-associators = Diabolo networks = Sandglass-shaped nets
(Figure : a diabolo, and an autoencoder 𝑥 → 𝑧 → 𝑦.)
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
Notations
INTRODUCTION 2 / 24
Encoder ℎ(∙) : 𝑥 → 𝑧, Decoder 𝑔(∙) : 𝑧 → 𝑦
• Make the output layer the same size as the input layer : 𝑥, 𝑦 ∈ ℝ^𝑑
• The loss encourages the output to be close to the input : 𝐿(𝑥, 𝑦)
• Unsupervised Learning → Supervised Learning
𝑧 = ℎ(𝑥) ∈ ℝ^{𝑑_𝑧}
𝑦 = 𝑔(𝑧) = 𝑔(ℎ(𝑥))
𝐿_𝐴𝐸 = Σ_{𝑥∈𝐷} 𝐿(𝑥, 𝑦)
A network whose output equals its input: the unsupervised problem is solved by recasting it as a supervised one.
The decoder becomes able to generate at least the training data → the generated data resembles the training data.
The encoder becomes able to represent at least the training data well as latent vectors → widely used for data abstraction.
AUTOENCODERS
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
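A minimal linear-autoencoder training sketch in the spirit of these slides (the data, layer sizes, and learning rate are my own assumptions): the encoder and decoder are trained by gradient descent on the reconstruction error Σ 𝐿(𝑥, 𝑔(ℎ(𝑥))).

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data lying near a 1-D line inside R^3 (a simple linear manifold).
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.01 * rng.normal(size=(200, 3))

d, dz = 3, 1
We, be = 0.1 * rng.normal(size=(dz, d)), np.zeros(dz)
Wd, bd = 0.1 * rng.normal(size=(d, dz)), np.zeros(d)

def forward(x):
    z = We @ x + be   # encoder h(x) = We x + be
    y = Wd @ z + bd   # decoder g(z) = Wd z + bd
    return z, y

def mean_recon_error():
    return np.mean([np.sum((forward(x)[1] - x) ** 2) for x in X])

err_before = mean_recon_error()
eta = 0.01
for epoch in range(100):
    for x in X:
        z, y = forward(x)
        r = y - x                      # d(0.5 ||x - y||^2) / dy
        g = Wd.T @ r                   # backpropagated error into the code z
        Wd -= eta * np.outer(r, z); bd -= eta * r
        We -= eta * np.outer(g, x); be -= eta * g

err_after = mean_recon_error()
assert err_after < err_before  # reconstruction improves during training
```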
Multi-Layer Perceptron
LINEAR AUTOENCODER 3 / 24
General Autoencoder
input 𝑥 ∈ ℝ^𝑑 → Encoder ℎ(∙) → latent vector 𝑧 = ℎ(𝑥) ∈ ℝ^{𝑑_𝑧} → Decoder 𝑔(∙) → output 𝑦 = 𝑔(ℎ(𝑥)) ∈ ℝ^𝑑
reconstruction error 𝐿(𝑥, 𝑦) : ‖𝑥 − 𝑦‖² or cross-entropy
Minimize 𝐿_𝐴𝐸 = Σ_{𝑥∈𝐷} 𝐿(𝑥, 𝑔(ℎ(𝑥)))
Linear Autoencoder : one hidden layer, fully-connected between layers
ℎ(𝑥) = 𝑊𝑒 𝑥 + 𝑏 𝑒
𝑔(ℎ(𝑥)) = 𝑊𝑑 𝑧 + 𝑏 𝑑
AUTOENCODERS
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
Connection to PCA & RBM
LINEAR AUTOENCODER 4 / 24
Principal Component Analysis
• For a bottleneck structure : 𝑑_𝑧 < 𝑑
• With linear neurons and squared loss, the autoencoder learns the same subspace as PCA
• Also true with a single sigmoidal hidden layer, if using linear output neurons with squared loss and untied weights
• Won't learn the exact same basis as PCA, but W will span the same subspace
Baldi, Pierre, & Hornik, Kurt. 1989. Neural networks and principal component analysis: Learning from examples without local minima. Neural networks, 2(1), 53–58.
Restricted Boltzmann Machine
• With a single hidden layer with sigmoid non-linearity and sigmoid output non-linearity
• Tie encoder and decoder weights : 𝑊𝑑 = 𝑊𝑒^𝑇
AUTOENCODERS
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
Autoencoder vs. RBM
Autoencoder : 𝑧𝑖 = 𝜎(𝑊𝑒𝑖 𝑥 + 𝑏 𝑒𝑖), 𝑦𝑗 = 𝜎(𝑊𝑒𝑗^𝑇 𝑧 + 𝑏 𝑑𝑗) : deterministic mapping, 𝑧 is a function of 𝑥
RBM : 𝑃(ℎ𝑖 = 1|𝑣) = 𝜎(𝑊𝑒𝑖 𝑣 + 𝑏 𝑒𝑖), 𝑃(𝑣𝑗 = 1|ℎ) = 𝜎(𝑊𝑒𝑗^𝑇 ℎ + 𝑏 𝑑𝑗) : stochastic mapping, 𝑧 is a random variable
(Recall from the PCA slide : ℎ = 𝑓𝜃(𝑥) = 𝑊(𝑥 − 𝜇) with 𝜃 = {𝑊, 𝜇})
RBM
PRETRAINING 5 / 24
AUTOENCODERS
Stacking RBMs → Deep Belief Network (DBN)
https://www.cs.toronto.edu/~hinton/science.pdf
Reducing the Dimensionality of Data with Neural Networks
(Figure : a 784-1000-1000-500-10 classifier is pre-trained layer by layer; the first stage trains a 784-1000-784 network with weights W1 and W1' to reconstruct the input 𝑥 as 𝑥̂.)
Autoencoder
PRETRAINING 6 / 24 to 9 / 24
AUTOENCODERS
Stacking Autoencoder
http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/auto.pptx
(Figures : greedy layer-wise pretraining of a 784-1000-1000-500-10 classifier.
6 / 24 : train a 784-1000-784 autoencoder on the input 𝑥 to learn W1.
7 / 24 : with W1 fixed, train a 1000-1000-1000 autoencoder on the hidden activation 𝑎1 to learn W2.
8 / 24 : with W1 and W2 fixed, train a 1000-500-1000 autoencoder on 𝑎2 to learn W3.
9 / 24 : randomly initialize the final layer W4, then fine-tune the whole network by backpropagation.)
Introduction
DAE 10 / 24
Denoising AutoEncoder
input 𝑥 ∈ ℝ^𝑑 → add random noise 𝑞(𝑥̃|𝑥) → corrupted input 𝑥̃ ∈ ℝ^𝑑 → Encoder ℎ(∙) → latent vector 𝑧 = ℎ(𝑥̃) ∈ ℝ^{𝑑_𝑧} → Decoder 𝑔(∙) → output 𝑦 = 𝑔(ℎ(𝑥̃)) ∈ ℝ^𝑑
reconstruction error 𝐿(𝑥, 𝑦)
Minimize 𝐿_𝐷𝐴𝐸 = Σ_{𝑥∈𝐷} 𝐸_{𝑞(𝑥̃|𝑥)}[𝐿(𝑥, 𝑔(ℎ(𝑥̃)))]
AUTOENCODERS
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
Adding noise
DAE 11 / 24
Denoising corrupted input
• will encourage a representation that is robust to small perturbations of the input
• yields similar or better classification performance than deep neural net pre-training
Possible corruptions
• Zeroing pixels at random (now called dropout noise)
• Additive Gaussian noise
• Salt-and-pepper noise
• etc.
Cannot compute the expectation exactly
• Use sampled corrupted inputs (replace the expectation by the average over L samples):
𝐿_𝐷𝐴𝐸 = Σ_{𝑥∈𝐷} 𝐸_{𝑞(𝑥̃|𝑥)}[𝐿(𝑥, 𝑔(ℎ(𝑥̃)))] ≈ Σ_{𝑥∈𝐷} (1/𝐿) Σ_{𝑖=1}^{𝐿} 𝐿(𝑥, 𝑔(ℎ(𝑥̃𝑖)))
AUTOENCODERS
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
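The L-sample Monte Carlo approximation of the DAE loss can be sketched as below, with an untrained toy encoder/decoder and zero-masking noise (all shapes and values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

def h(x, We):            # a fixed (untrained) toy encoder, for illustration
    return np.tanh(We @ x)

def g(z, Wd):            # a fixed toy decoder
    return Wd @ z

def corrupt(x, rate):    # zero-masking ("dropout") noise q(x~|x)
    mask = rng.random(x.shape) >= rate
    return x * mask

We = 0.5 * rng.normal(size=(2, 4))
Wd = 0.5 * rng.normal(size=(4, 2))
x = rng.normal(size=4)

# E_{q(x~|x)} L(x, g(h(x~))) approximated by an average over n_samples draws
n_samples = 1000
losses = [np.sum((x - g(h(corrupt(x, 0.25), We), Wd)) ** 2)
          for _ in range(n_samples)]
mc_estimate = np.mean(losses)
```

The average over draws replaces the intractable expectation, exactly as in the slide's approximation.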
Manifold interpretation
DAE 12 / 24
• Suppose the training data (×) concentrate near a low dimensional manifold.
• Corrupted examples (●) are obtained by applying the corruption process 𝑞(𝑥̃|𝑥) and will generally lie farther from the manifold.
• The model learns 𝑝(𝑥|𝑥̃) to "project them back" (via the autoencoder 𝑔(ℎ(𝑥̃))) onto the manifold.
• The intermediate representation 𝑧 = ℎ(𝑥) may be interpreted as a coordinate system for points 𝑥 on the manifold.
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
AUTOENCODERS
Performance – Visualization of learned filters
DAE 13 / 24
Filters in a 1-hidden-layer architecture must capture low-level features of images.
AutoEncoder with 1 hidden layer : 𝑧 = 𝜎𝑒(𝑊𝑥 + 𝑏 𝑒), 𝑦 = 𝜎𝑑(𝑊^𝑇 𝑧 + 𝑏 𝑑)
AUTOENCODERS
Performance – Visualization of learned filters
DAE 14 / 24
Natural image patches (12x12 pixels) : 100 hidden units
10% salt-and-pepper noise
• Mean Squared Error
• 100 hidden units
• Salt-and-pepper noise
Because the filters are initialized with random values, filters that still look like noise were trained poorly, while filters that look like edge filters were trained well.
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
AUTOENCODERS
Performance – Visualization of learned filters
DAE 15 / 24
MNIST digits (64x64 pixels)
25% corruption
• Cross Entropy
• 100 hidden units
• Zero-masking noise
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
AUTOENCODERS
Performance – Pretraining
DAE 16 / 24
Stacked Denoising Auto-Encoders (SDAE)
bgImgRot Data, Train/Valid/Test : 10k/2k/20k
Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
AUTOENCODERS
Performance – Pretraining
DAE 17 / 24
Stacked Denoising Auto-Encoders (SDAE) vs. SAE
bgImgRot Data, Train/Valid/Test : 10k/2k/20k, Zero-masking noise
Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
AUTOENCODERS
Performance – Generation
DAE 18 / 24
Bernoulli input
Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
AUTOENCODERS
Encouraging representation to be insensitive to corruption
DAE 19 / 24
DAE encourages the reconstruction to be insensitive to input corruption.
Alternative : encourage the representation itself to be insensitive.
Stochastic Contractive AutoEncoder (SCAE) :
𝐿_𝑆𝐶𝐴𝐸 = Σ_{𝑥∈𝐷} [𝐿(𝑥, 𝑔(ℎ(𝑥))) + 𝜆𝐸_{𝑞(𝑥̃|𝑥)}[‖ℎ(𝑥) − ℎ(𝑥̃)‖²]]
(Reconstruction Error + Stochastic Regularization)
Tied weights, i.e. 𝑊′ = 𝑊^𝑇, prevent 𝑊 from collapsing ℎ to 0.
The regularization term can also be made 0 by collapsing ℎ to 0, so tied weights are used to prevent this.
The DAE loss can be interpreted as saying that, of 𝑔 and ℎ, ℎ in particular must be trained so that a slightly changed input 𝑥 still maps to the same sample on the manifold.
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
AUTOENCODERS
From stochastic to analytic penalty
CAE 20 / 24
SCAE stochastic regularization term : 𝐸_{𝑞(𝑥̃|𝑥)}[‖ℎ(𝑥) − ℎ(𝑥̃)‖²]
For small additive noise : 𝑥̃|𝑥 = 𝑥 + 𝜖, 𝜖 ~ 𝒩(0, 𝜎²𝐼)
A Taylor series expansion yields : ℎ(𝑥̃) = ℎ(𝑥 + 𝜖) = ℎ(𝑥) + (𝜕ℎ/𝜕𝑥)𝜖 + ⋯
It can be shown that
𝐸_{𝑞(𝑥̃|𝑥)}[‖ℎ(𝑥) − ℎ(𝑥̃)‖²] ≈ ‖(𝜕ℎ/𝜕𝑥)(𝑥)‖_𝐹²
Stochastic Regularization (SCAE) → Analytic Regularization (CAE)
Contractive AutoEncoder (CAE)
Frobenius Norm : ‖𝐴‖_𝐹² = Σ_{𝑖=1}^{𝑚} Σ_{𝑗=1}^{𝑛} 𝑎𝑖𝑗²
The analytic contractive regularization term is the Frobenius norm of the Jacobian of the non-linear mapping.
Penalizing ‖𝐽ℎ(𝑥)‖_𝐹² encourages the mapping to the feature space to be contractive in the neighborhood of the training data.
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
AUTOENCODERS
Loss function
CAE 21 / 24
𝐿_𝐶𝐴𝐸 = Σ_{𝑥∈𝐷} [𝐿(𝑥, 𝑔(ℎ(𝑥))) + 𝜆‖(𝜕ℎ/𝜕𝑥)(𝑥)‖_𝐹²]
(Reconstruction Error + Analytic Contractive Regularization)
For training examples, this encourages both :
• small reconstruction error
• a representation insensitive to small variations around the example
‖(𝜕ℎ/𝜕𝑥)(𝑥)‖_𝐹² = Σ_{𝑖𝑗} (𝜕𝑧𝑗/𝜕𝑥𝑖 (𝑥))² = ‖𝐽ℎ(𝑥)‖_𝐹²
This highlights the advantages of representations that are locally invariant in many directions of change of the raw input.
Contractive Auto-Encoders: Explicit Invariance During Feature Extraction
AUTOENCODERS
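A numeric sketch of the SCAE-to-CAE connection above (toy sigmoid encoder; sizes and the noise level are assumptions): the stochastic penalty divided by 𝜎² approaches the squared Frobenius norm of the Jacobian as the noise gets small.

```python
import numpy as np

rng = np.random.default_rng(5)

W, b = rng.normal(size=(3, 4)), rng.normal(size=3)

def h(x):                       # encoder with sigmoid non-linearity
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

def jacobian(x, eps=1e-6):      # numerical Jacobian dh/dx
    J = np.zeros((3, 4))
    for i in range(4):
        e = np.zeros(4); e[i] = eps
        J[:, i] = (h(x + e) - h(x - e)) / (2 * eps)
    return J

x = rng.normal(size=4)
cae_penalty = np.sum(jacobian(x) ** 2)   # ||J_h(x)||_F^2

# Stochastic (SCAE) estimate: E ||h(x) - h(x + eps)||^2 ~ sigma^2 ||J||_F^2
sigma = 1e-3
diffs = [np.sum((h(x) - h(x + sigma * rng.normal(size=4))) ** 2)
         for _ in range(20000)]
scae_estimate = np.mean(diffs) / sigma ** 2

assert abs(scae_estimate - cae_penalty) / cae_penalty < 0.1
```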
Performance – Visualization of learned filters
CAE 22 / 24
MNIST digits (64x64 pixels) • 2000 hidden units
CIFAR-10 (32x32 pixels) • 4000 hidden units, Gaussian noise
http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
AUTOENCODERS
Performance – Pretraining
CAE 23 / 24
• DAE-g : DAE with Gaussian noise
• DAE-b : DAE with binary masking noise
• CIFAR-bw : gray-scale version
• Training/Validation/Test : 10k/2k/50k
• SAT : average fraction of saturated units per sample
• 1 hidden layer with 1000 units
Contractive Auto-Encoders: Explicit Invariance During Feature Extraction
AUTOENCODERS
Performance – Pretraining
CAE 24 / 24
• basic : smaller subset of MNIST
• rot : digits with added random rotation
• bg-rand : digits with random noise background
• bg-img : digits with random image background
• bg-img-rot : digits with rotation and image background
• rect : discriminate between tall and wide rectangles (white on black)
• rect-img : discriminate between tall and wide rectangular images on a different background image
http://www.iro.umontreal.ca/~lisa/icml2007
Contractive Auto-Encoders: Explicit Invariance During Feature Extraction
AUTOENCODERS
01. Revisit Deep Neural Networks
02. Manifold Learning
03. Autoencoders
04. Variational Autoencoders
05. Applications
We describe the Variational Autoencoder as a generative model.
We then describe the Conditional Variational Autoencoder, which adds a condition for controlling the generated output.
We also describe the Adversarial Autoencoder, which can work with a general prior distribution.
• Variational AE (VAE)
• Conditional VAE (CVAE)
• Adversarial AE (AAE)
KEYWORD: Generative model learning
Sample Generation
GENERATIVE MODEL VAE
1 / 49
Training Examples → Density Estimation → Sampling
https://github.com/mingyuliutw/cvpr2017_gan_tutorial/blob/master/gan_tutorial.pdf
Latent Variable Model
GENERATIVE MODEL VAE
2 / 49
Generator 𝑔𝜃(∙) : Latent Variable 𝑧 → Target Data 𝑥
𝑧 ~ 𝑝(𝑧) : random variable
𝑥 = 𝑔𝜃(𝑧) : random variable
𝑔𝜃(∙) : deterministic function parameterized by 𝜃
𝑝(𝑥|𝑔𝜃(𝑧)) = 𝑝𝜃(𝑥|𝑧)
The latent variable can be seen as a set of control parameters for the target data (the generated data).
For the MNIST example, our model can be trained to generate an image which matches a digit value z randomly sampled from the set [0, ..., 9].
Therefore, 𝑝(𝑧) should be easy to sample from.
We are aiming to maximize the probability of each x in the training set, under the entire generative process:
∫ 𝑝(𝑥|𝑔𝜃(𝑧)) 𝑝(𝑧) 𝑑𝑧 = 𝑝(𝑥)
Prior distribution p(z)
GENERATIVE MODEL 3 / 49
VAE
Question : Is it enough to model p(z) with a simple distribution like the normal distribution? Yes!!!
Recall that 𝑝(𝑥|𝑔𝜃(𝑧)) = 𝒩(𝑥|𝑔𝜃(𝑧), 𝜎² ∗ 𝐼). If 𝑔𝜃(𝑧) is a multi-layer neural network, then we can imagine the network using its first few layers to map the normally distributed z's to the latent values (like digit identity, stroke weight, angle, etc.) with exactly the right statistics. Then it can use later layers to map those latent values to a fully-rendered digit.
Tutorial on Variational Autoencoders : https://arxiv.org/pdf/1606.05908
When the generator uses several layers, the first few layers perform a possibly complicated but exact mapping into the latent space, and the remaining layers generate the image corresponding to that latent vector.
GENERATIVE MODEL 4 / 49
If p(x|g_θ(z)) = N(x | g_θ(z), σ²·I), the negative log probability of x is
proportional to the squared Euclidean distance between g_θ(z) and x.
x : Figure 3(a)
z_bad → g_θ(z_bad) : Figure 3(b)
z_good → g_θ(z_good) : Figure 3(c) (identical to x but shifted down and to the
right by half a pixel)
‖x − g_θ(z_bad)‖² < ‖x − g_θ(z_good)‖²
→ p(x|g_θ(z_bad)) > p(x|g_θ(z_good))
Solution 1: we should set the σ hyperparameter of our Gaussian
distribution such that this kind of erroneous digit does not
contribute to p(x) → hard..
Solution 2: we would likely need to sample many thousands of z's
before producing one like z_good → hard..
p(x) ≈ Σ_i p(x|g_θ(z_i)) p(z_i)
VAE
Question: Why don't we use maximum likelihood estimation directly?
With a Gaussian model for the generator, samples that are closer in the
MSE sense contribute more to p(x). But there are many images for which a
smaller MSE does not mean greater semantic similarity, so in practice it is
hard to obtain correct probability values this way.
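The pitfall can be reproduced numerically. In this hypothetical 8x8 example (made up for illustration, not from the tutorial), a one-pixel shift of a thin stroke, which is semantically the same digit, incurs a larger squared error than erasing part of the stroke, which is semantically broken:

```python
import numpy as np

# A thin vertical stroke, like a minimal "1".
x = np.zeros((8, 8))
x[1:7, 3] = 1.0

x_shifted = np.roll(x, 1, axis=1)   # same stroke, shifted right by one pixel
x_erased = x.copy()
x_erased[2:4, 3] = 0.0              # stroke with two pixels missing

mse_shifted = np.mean((x - x_shifted) ** 2)
mse_erased = np.mean((x - x_erased) ** 2)
# Under p(x|g_θ(z)) = N(x; g_θ(z), σ²I), a smaller MSE means a higher
# likelihood, so the broken image would wrongly look "more probable".
```

This is exactly the Figure 3 situation described above: MSE-based likelihood prefers the semantically wrong reconstruction.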
Tutorial on Variational Autoencoders : https://arxiv.org/pdf/1606.05908
pθ(x|z)
VARIATIONAL INFERENCE : ELBO (Evidence Lower Bound) 5 / 49
VAE
One way to make Solution 2 from the previous slide feasible: instead of
sampling z from the normal prior, sample from a distribution p(z|x) whose
samples are likely to be meaningfully similar to x. But since p(z|x) is
unknown, we pick one of the probability distributions we do know, q_φ(z|x),
and tune its parameters (λ) so that it becomes similar to p(z|x).
(Variational Inference)
https://www.slideshare.net/haezoom/variational-autoencoder-understanding-variational-autoencoder-from-various-perspectives
http://shakirm.com/papers/VITutorial.pdf
p(z|x) ≈ q_φ(z|x) → sample z → Generator g_θ(·) → x   (latent variable → target data)
One possible solution : sampling z from p(z|x)
[ Variational Inference ]
VARIATIONAL INFERENCE : ELBO (Evidence Lower Bound) 6 / 49
VAE
Relationship among p(x), p(z|x), q_φ(z|x) : Derivation 1
[ Jensen's Inequality ] For a concave function f(·), f(E[x]) ≥ E[f(x)];
f(·) = log(·) is concave.
log p(x) = log ∫ p(x|z) p(z) dz ≥ ∫ log p(x|z) p(z) dz
With variational inference, replace the sampling distribution p(z) by q_φ(z|x):
log p(x) = log ∫ p(x|z) · (p(z)/q_φ(z|x)) · q_φ(z|x) dz ≥ ∫ log[ p(x|z) · p(z)/q_φ(z|x) ] q_φ(z|x) dz
log p(x) ≥ ∫ log p(x|z) q_φ(z|x) dz − ∫ log( q_φ(z|x)/p(z) ) q_φ(z|x) dz
         = E_{q_φ(z|x)}[ log p(x|z) ] − KL( q_φ(z|x) ‖ p(z) )
         = ELBO(φ)   (variational lower bound = evidence lower bound)
Once we find the value φ* that maximizes the ELBO, log p(x) is approximated by
E_{q_{φ*}(z|x)}[ log p(x|z) ] − KL( q_{φ*}(z|x) ‖ p(z) ).
VARIATIONAL INFERENCE : ELBO (Evidence Lower Bound) 7 / 49
VAE
Relationship among p(x), p(z|x), q_φ(z|x) : Derivation 2
log p(x) = ∫ log p(x) · q_φ(z|x) dz                               ← ∫ q_φ(z|x) dz = 1
         = ∫ log [ p(x,z) / p(z|x) ] q_φ(z|x) dz                  ← p(x) = p(x,z)/p(z|x)
         = ∫ log [ (p(x,z)/q_φ(z|x)) · (q_φ(z|x)/p(z|x)) ] q_φ(z|x) dz
         = ∫ log [ p(x,z)/q_φ(z|x) ] q_φ(z|x) dz + ∫ log [ q_φ(z|x)/p(z|x) ] q_φ(z|x) dz
         = ELBO(φ) + KL( q_φ(z|x) ‖ p(z|x) )
KL( q_φ(z|x) ‖ p(z|x) ) ≥ 0 : a divergence between two distributions.
We want the φ of q_φ(z|x) that minimizes this KL, but p(z|x) is unknown;
since log p(x) is fixed with respect to φ, maximizing the ELBO over φ
minimizes the KL instead.
(Figure: log p(x) split into ELBO(φ) and KL(q_φ(z|x) ‖ p(z|x)) at φ1 and φ2.)
8 / 49
log p(x) = ELBO(φ) + KL( q_φ(z|x) ‖ p(z|x) )
q_{φ*}(z|x) = argmax_φ ELBO(φ)
ELBO(φ) = ∫ log [ p(x,z)/q_φ(z|x) ] q_φ(z|x) dz
        = ∫ log [ p(x|z) p(z)/q_φ(z|x) ] q_φ(z|x) dz
        = ∫ log p(x|z) q_φ(z|x) dz − ∫ log [ q_φ(z|x)/p(z) ] q_φ(z|x) dz
        = E_{q_φ(z|x)}[ log p(x|z) ] − KL( q_φ(z|x) ‖ p(z) )
VARIATIONAL INFERENCE : ELBO (Evidence Lower Bound) (VAE)
Note that this KL has different arguments from the KL on the previous slide.
Relationship among p(x), p(z|x), q_φ(z|x) : Derivation 2
9 / 49
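The decomposition log p(x) = ELBO(φ) + KL(q_φ(z|x) ‖ p(z|x)) can be checked numerically. A sketch with a made-up two-state latent variable, where every quantity is computable exactly:

```python
import numpy as np

# Toy model with discrete latent z ∈ {0, 1}; all numbers are arbitrary.
p_z = np.array([0.6, 0.4])               # prior p(z)
p_x_given_z = np.array([0.2, 0.7])       # likelihood p(x|z) for one fixed x

p_x = np.sum(p_x_given_z * p_z)          # evidence p(x) = Σ_z p(x|z)p(z)
p_z_given_x = p_x_given_z * p_z / p_x    # true posterior p(z|x)

q = np.array([0.5, 0.5])                 # some approximate posterior q_φ(z|x)

elbo = np.sum(q * np.log(p_x_given_z * p_z / q))
kl_q_posterior = np.sum(q * np.log(q / p_z_given_x))
# log p(x) decomposes exactly into ELBO + KL; since log p(x) is fixed,
# pushing the ELBO up over q is the same as pushing the KL down.
```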
LOSS FUNCTION : Derivation (VAE) 10 / 49
p(z|x) ≈ q_φ(z|x) → sample z → Generator g_θ(·) → x   (latent variable → target data)
log p(x) ≥ E_{q_φ(z|x)}[ log p(x|z) ] − KL( q_φ(z|x) ‖ p(z) ) = ELBO(φ)
Optimization Problem 1 on φ : Variational Inference
Optimization Problem 2 on θ : Maximum Likelihood
−Σ_i log p(x_i) ≤ Σ_i [ −E_{q_φ(z|x_i)}[ log p(x_i|g_θ(z)) ] + KL( q_φ(z|x_i) ‖ p(z) ) ]
Final Optimization Problem:
argmin_{φ,θ} Σ_i [ −E_{q_φ(z|x_i)}[ log p(x_i|g_θ(z)) ] + KL( q_φ(z|x_i) ‖ p(z) ) ]
LOSS FUNCTION : NeuralNet Perspective (VAE) 11 / 49
L_i(φ, θ, x_i) : argmin_{φ,θ} Σ_i [ −E_{q_φ(z|x_i)}[ log p(x_i|g_θ(z)) ] + KL( q_φ(z|x_i) ‖ p(z) ) ]
x → q_φ(·) → q_φ(z|x) → SAMPLING → z → g_θ(·) → x   (decoder likelihood g_θ(x|z))
Encoder = Posterior / Inference Network ; Decoder = Generator / Generation Network
"The mathematical basis of VAEs actually has relatively little to do with classical autoencoders"
Tutorial on Variational Autoencoders : https://arxiv.org/pdf/1606.05908
LOSS FUNCTION : Explanation 11 / 49
L_i(φ, θ, x_i) = −E_{q_φ(z|x_i)}[ log p(x_i|g_θ(z)) ] + KL( q_φ(z|x_i) ‖ p(z) )
                 (Reconstruction Error)                  (Regularization)
VAE
Reconstruction Error:
• the negative log likelihood under the current sampling function
• the reconstruction error for x_i (the AutoEncoder view)
• the likelihood of the original data
Regularization:
• an additional condition on the current sampling function
• conditions for ease of sampling and for control over the generated data
  are imposed on the prior, and q_φ(z|x_i) is required to stay similar to it
• p(z) : chosen among tractable probability distributions
• q_φ(z|x_i) : chosen from an approximation class for variational inference
argmin_{φ,θ} Σ_i [ −E_{q_φ(z|x_i)}[ log p(x_i|g_θ(z)) ] + KL( q_φ(z|x_i) ‖ p(z) ) ]
LOSS FUNCTION : Regularization 12 / 49
Assumption 1 [Encoder : approximation class] : multivariate Gaussian
distribution with a diagonal covariance, q_φ(z|x_i) ~ N(μ_i, σ_i² I)
Assumption 2 [Prior] : multivariate standard normal distribution, p(z) ~ N(0, I)
L_i(φ, θ, x_i) = −E_{q_φ(z|x_i)}[ log p(x_i|g_θ(z)) ] + KL( q_φ(z|x_i) ‖ p(z) )   ← Regularization
Assumptions: x_i → q_φ(z|x_i) → (μ_i, σ_i)
LOSS FUNCTION : Regularization 13 / 49
VAE
Regularization : KL divergence between the posterior q_φ(z|x_i) = N(μ_i, σ_i² I)
and the prior p(z) = N(0, I), with J = dim(z):
KL( q_φ(z|x_i) ‖ p(z) ) = ½ [ tr(σ_i² I) + μ_iᵀ μ_i − J + ln( 1 / ∏_{j=1..J} σ_{i,j}² ) ]
                        = ½ [ Σ_{j=1..J} σ_{i,j}² + Σ_{j=1..J} μ_{i,j}² − J − Σ_{j=1..J} ln σ_{i,j}² ]
                        = ½ Σ_{j=1..J} ( μ_{i,j}² + σ_{i,j}² − ln σ_{i,j}² − 1 )
Easy to compute!!
x_i → q_φ(z|x_i) → (μ_i, σ_i)
L_i(φ, θ, x_i) = −E_{q_φ(z|x_i)}[ log p(x_i|g_θ(z)) ] + KL( q_φ(z|x_i) ‖ p(z) )
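The closed-form KL above can be sanity-checked against a Monte-Carlo estimate; the toy μ and σ values here are arbitrary:

```python
import numpy as np

# Closed-form KL( N(μ, diag(σ²)) ‖ N(0, I) ) vs a Monte-Carlo estimate.
rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])
sigma = np.array([0.8, 1.2])

kl_closed = 0.5 * np.sum(mu ** 2 + sigma ** 2 - np.log(sigma ** 2) - 1.0)

# MC: draw z ~ q and average log q(z) − log p(z).
z = mu + sigma * rng.standard_normal((500_000, 2))
log_q = -0.5 * np.sum(((z - mu) / sigma) ** 2 + np.log(2 * np.pi * sigma ** 2), axis=1)
log_p = -0.5 * np.sum(z ** 2 + np.log(2 * np.pi), axis=1)
kl_mc = np.mean(log_q - log_p)
```

The agreement is what makes this term "easy to compute": no sampling is needed at training time, only μ_i and σ_i from the encoder.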
LOSS FUNCTION : Reconstruction error 14 / 49
VAE
L_i(φ, θ, x_i) = −E_{q_φ(z|x_i)}[ log p(x_i|g_θ(z)) ] + KL( q_φ(z|x_i) ‖ p(z) )
SAMPLING : x_i → q_φ(z|x_i) → (μ_i, σ_i) → z^{i,1}, z^{i,2}, …, z^{i,l}, …, z^{i,L}
Reconstruction Error:
E_{q_φ(z|x_i)}[ log p_θ(x_i|z) ] = ∫ log p_θ(x_i|z) q_φ(z|x_i) dz
                                 ≈ (1/L) Σ_l log p_θ(x_i|z^{i,l})   ← Monte-Carlo technique
(the mean of log p_θ(x_i|z^{i,1}), …, log p_θ(x_i|z^{i,L}) is the reconstruction error)
• L is the number of samples for the latent vector
• Usually L is set to 1 for convenience
https://home.zhaw.ch/~dueo/bbs/files/vae.pdf
LOSS FUNCTION : Reconstruction error 15 / 49
VAE
Reparameterization Trick
Sampling process : z^{i,l} ~ N(μ_i, σ_i² I)
Reparameterized  : z^{i,l} = μ_i + σ_i ⊙ ε,  ε ~ N(0, I)
Same distribution, but it makes backpropagation possible!!
https://home.zhaw.ch/~dueo/bbs/files/vae.pdf
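A numeric sketch of the trick: the reparameterized samples follow the intended Gaussian, while μ and σ now sit inside a deterministic expression (dz/dμ = 1, dz/dσ = ε per sample), which is what lets gradients propagate through the sampling step. The scalar μ and σ below are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.7

eps = rng.standard_normal(1_000_000)   # ε ~ N(0, I), independent of μ, σ
z = mu + sigma * eps                   # reparameterized samples of N(μ, σ²)

# The randomness lives entirely in ε; z is a differentiable function of μ, σ.
sample_mean, sample_std = z.mean(), z.std()
```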
LOSS FUNCTION : Reconstruction error 16 / 49
VAE
Assumption 3-1 [Decoder : likelihood] : multivariate Bernoulli distribution
p_θ(x_i|z^i) ~ Bernoulli(p_i), with z^i → g_θ(·) → p_i and D = dim(x)
Reconstruction Error (Monte-Carlo technique with L = 1):
E_{q_φ(z|x_i)}[ log p_θ(x_i|z) ] = ∫ log p_θ(x_i|z) q_φ(z|x_i) dz ≈ (1/L) Σ_l log p_θ(x_i|z^{i,l}) ≈ log p_θ(x_i|z^i)
log p_θ(x_i|z^i) = log ∏_{j=1..D} p_θ(x_{i,j}|z^i) = Σ_{j=1..D} log p_θ(x_{i,j}|z^i)
                 = Σ_{j=1..D} log [ p_{i,j}^{x_{i,j}} (1 − p_{i,j})^{1−x_{i,j}} ]   ← p_{i,j} ≐ network output
                 = Σ_{j=1..D} [ x_{i,j} log p_{i,j} + (1 − x_{i,j}) log(1 − p_{i,j}) ]   ← Cross entropy
L_i(φ, θ, x_i) = −E_{q_φ(z|x_i)}[ log p(x_i|g_θ(z)) ] + KL( q_φ(z|x_i) ‖ p(z) )
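With toy pixel values and decoder outputs (made up for illustration), the Bernoulli log-likelihood above is just the negative cross entropy:

```python
import numpy as np

x = np.array([1.0, 0.0, 1.0, 1.0])   # binary input pixels x_ij
p = np.array([0.9, 0.2, 0.8, 0.6])   # decoder (sigmoid) outputs p_ij

# log p_θ(x|z) = Σ_j [ x_j log p_j + (1 − x_j) log(1 − p_j) ]
log_lik = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
cross_entropy = -log_lik             # the reconstruction-error term to minimize
```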
LOSS FUNCTION : Reconstruction error 17 / 49
VAE
Assumption 3-2 [Decoder : likelihood] : multivariate Gaussian distribution
z^i → g_θ(·) → (μ_i, σ_i), with D = dim(x)
Reconstruction Error : E_{q_φ(z|x_i)}[ log p_θ(x_i|z) ] ≈ log p_θ(x_i|z^i)
log p_θ(x_i|z^i) = log N(x_i; μ_i, σ_i² I)
                 = −Σ_{j=1..D} [ ½ log σ_{i,j}² + (x_{i,j} − μ_{i,j})² / (2σ_{i,j}²) ]   (up to an additive constant)
For a Gaussian distribution with identity covariance:
log p_θ(x_i|z^i) ∝ −Σ_{j=1..D} (x_{i,j} − μ_{i,j})²   ← Squared Error
L_i(φ, θ, x_i) = −E_{q_φ(z|x_i)}[ log p(x_i|g_θ(z)) ] + KL( q_φ(z|x_i) ‖ p(z) )
STRUCTURE : Default : Gaussian Encoder + Bernoulli Decoder 18 / 49
VAE
x_i → q_φ(·) → (μ_i, σ_i) → SAMPLING (reparameterization trick: z_i = μ_i + σ_i ⊙ ε_i, ε_i ~ N(0, I)) → g_θ(·) → p_i
Gaussian Encoder with J = dim(z), Bernoulli Decoder with D = dim(x)
Reconstruction Error : −Σ_{j=1..D} [ x_{i,j} log p_{i,j} + (1 − x_{i,j}) log(1 − p_{i,j}) ]
Regularization : ½ Σ_{j=1..J} ( μ_{i,j}² + σ_{i,j}² − ln σ_{i,j}² − 1 )
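Putting the two terms together, a minimal numpy sketch of one loss evaluation for this default structure. The tiny linear "networks" with random weights are placeholders for real encoder/decoder MLPs, and the toy sizes are assumptions, not anything from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
D, J = 6, 2                                   # data dim, latent dim (toy sizes)
x = (rng.random(D) > 0.5).astype(float)       # one binarized input

W_mu = 0.1 * rng.normal(size=(J, D))          # encoder head for μ
W_logvar = 0.1 * rng.normal(size=(J, D))      # encoder head for log σ²
W_dec = 0.1 * rng.normal(size=(D, J))         # decoder weights

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def vae_loss(x):
    mu, logvar = W_mu @ x, W_logvar @ x                      # Gaussian encoder
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(J)   # reparameterization, L = 1
    p = np.clip(sigmoid(W_dec @ z), 1e-7, 1 - 1e-7)          # Bernoulli decoder
    recon = -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)) # cross entropy
    kl = 0.5 * np.sum(mu ** 2 + np.exp(logvar) - logvar - 1.0)  # closed-form KL
    return recon + kl, recon, kl

loss, recon, kl = vae_loss(x)
```

A real implementation would backpropagate through `loss` to update the weights; the structure of the two terms is the point here.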
STRUCTURE : Gaussian Encoder + Gaussian Decoder 19 / 49
VAE
x_i → q_φ(·) → (μ_i, σ_i) → SAMPLING (reparameterization trick: z_i = μ_i + σ_i ⊙ ε_i, ε_i ~ N(0, I)) → g_θ(·) → (μ_i', σ_i')
Gaussian Encoder with J = dim(z), Gaussian Decoder with D = dim(x)
Reconstruction Error : Σ_{j=1..D} [ ½ log σ'_{i,j}² + (x_{i,j} − μ'_{i,j})² / (2σ'_{i,j}²) ]
Regularization : ½ Σ_{j=1..J} ( μ_{i,j}² + σ_{i,j}² − ln σ_{i,j}² − 1 )
STRUCTURE : Gaussian Encoder + Gaussian Decoder with Identity Covariance 20 / 49
VAE
x_i → q_φ(·) → (μ_i, σ_i) → SAMPLING (reparameterization trick: z_i = μ_i + σ_i ⊙ ε_i, ε_i ~ N(0, I)) → g_θ(·) → μ_i'
Gaussian Encoder with J = dim(z), Gaussian Decoder with D = dim(x)
Reconstruction Error : Σ_{j=1..D} (x_{i,j} − μ'_{i,j})² / 2
Regularization : ½ Σ_{j=1..J} ( μ_{i,j}² + σ_{i,j}² − ln σ_{i,j}² − 1 )
RESULT : MNIST 21 / 49
VAE
Architecture: Gaussian Encoder + Bernoulli Decoder with the reparameterization trick
(x_i → q_φ(·) → (μ_i, σ_i); z_i = μ_i + σ_i ⊙ ε_i, ε_i ~ N(0, I); g_θ(·) → p_i)
Input 28 × 28, D = 784; encoder and decoder are MLPs with 2 hidden layers (500, 500)
MNISTRESULT 22 / 49
VAE
Reproduce
Input image J = |z| =2 J = |z| =5 J = |z| =20
https://github.com/hwalsuklee/tensorflow-mnist-VAE
MNISTRESULT 23 / 49
VAE
Denoising
Input image + zero-masking noise with 50% prob.
+ salt & pepper noise with 50% prob.
Restored image
https://github.com/hwalsuklee/tensorflow-mnist-VAE
RESULT : MNIST 24 / 49
VAE
Learned Manifold
https://github.com/hwalsuklee/tensorflow-mnist-VAE
AE vs VAE
• Shows where 5,000 of the test samples are mapped on the manifold.
• The results of 6 training runs are shown as an animation.
• From the generation viewpoint, it is good if the location of the manifold being modeled stays stable.
RESULT : MNIST 25 / 49
VAE
Learned Manifold
The better the training, the more the z's that generate the same digit should
cluster together in the 2-D space, while z's that generate different digits
should lie far apart.
z1
z2
https://github.com/hwalsuklee/tensorflow-mnist-VAE
A
A
B
B
C
C
D
D
IntroductionCVAE 26 / 49
VAE
Conditional VAE
𝑥
ℎ
ℎ
𝜇𝜎 𝑧
𝑥
ℎ
ℎ
𝜖
𝑞 𝜆 𝑧|𝑥 𝑝 𝜃 𝑥|𝑧
Vanilla VAE (M1) CVAE (M2) : supervised version
𝑥
ℎ
ℎ
𝜇𝜎 𝑧
𝑥
ℎ
ℎ
𝜖
𝑞 𝜆 𝑧|𝑥, 𝑦 𝑝 𝜃 𝑥|𝑧, 𝑦
𝑦
𝑦
Condition on latent space
Condition on output
M2CVAE 27 / 49
VAE
Summary
CVAE (M2) : supervised version
𝑥
ℎ
ℎ
𝜇𝜎 𝑧
𝑥
ℎ
ℎ
𝜖
𝑞 𝜆 𝑧|𝑥, 𝑦 𝑝 𝜃 𝑥|𝑧, 𝑦
𝑦
𝑦
Condition on latent space
Condition on output
log p_θ(x, y) = log ∫ p_θ(x, y|z) · (p(z)/q_φ(z|x, y)) · q_φ(z|x, y) dz
              ≥ ∫ log [ p_θ(x, y|z) · p(z)/q_φ(z|x, y) ] q_φ(z|x, y) dz
              = ∫ log [ p_θ(x|y, z) · p(y) · p(z)/q_φ(z|x, y) ] q_φ(z|x, y) dz
              = E_{q_φ(z|x,y)}[ log p_θ(x|y, z) + log p(y) ] − KL( q_φ(z|x, y) ‖ p(z) )
              = ELBO!! = −L(x, y)
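At the implementation level, the "condition on latent space" and "condition on output" in M2 are commonly realized by concatenating a one-hot label to the encoder and decoder inputs. A shape-level sketch; the dimensions (D = 784, J = 2, 10 classes) are example MNIST values, not requirements:

```python
import numpy as np

D, J, n_classes = 784, 2, 10

x = np.zeros(D)                              # a flattened image (placeholder)
y = np.eye(n_classes)[3]                     # one-hot label for digit 3

encoder_input = np.concatenate([x, y])       # feeds q_φ(z|x, y)
z = np.zeros(J)                              # pretend sample from q_φ(z|x, y)
decoder_input = np.concatenate([z, y])       # feeds p_θ(x|z, y)
```

Fixing z and varying y then yields the "same style, different digit" rows shown in the analogy results below.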
M3CVAE 28 / 49
VAE
Summary
CVAE (M2) : unsupervised version
𝑥
ℎ
ℎ
𝜇𝜎 𝑧
𝑥
ℎ
ℎ
𝜖
𝑦
ℎ
ℎ
CVAE (M3)
Train M1
Train M2
29 / 49
VAE
Architecture : M2 supervised version
MNIST resultsCVAE
𝑥
ℎ
ℎ
𝜇𝜎 𝑧
𝑥
ℎ
ℎ
𝜖
𝑞 𝜆 𝑧|𝑥, 𝑦 𝑝 𝜃 𝑥|𝑧, 𝑦
𝑦
𝑦
Label info
MLP with 2 hidden layers
(500, 500)
MLP with 2 hidden layers
(500, 500)
Label info
30 / 49
input
CVAE, epoch 1
VAE, epoch 1
CVAE, epoch 20
VAE, epoch 20
VAE
Reproduce |z| = 2
MNIST resultsCVAE
https://github.com/hwalsuklee/tensorflow-mnist-CVAE
31 / 49
CVAE, epoch 1
VAE, epoch 1
CVAE, epoch 20
VAE, epoch 20
input
VAE
Denoising |z| = 2
MNIST resultsCVAE
https://github.com/hwalsuklee/tensorflow-mnist-CVAE
32 / 49
VAE
Handwriting styles obtained by fixing the class label and varying z |z| = 2
y=[1,0,0,0,0,0,0,0,0,0] y=[0,1,0,0,0,0,0,0,0,0] y=[0,0,1,0,0,0,0,0,0,0] y=[0,0,0,1,0,0,0,0,0,0] y=[0,0,0,0,1,0,0,0,0,0]
y=[0,0,0,0,0,0,1,0,0,0]y=[0,0,0,0,0,1,0,0,0,0] y=[0,0,0,0,0,0,0,0,1,0]y=[0,0,0,0,0,0,0,1,0,0] y=[0,0,0,0,0,0,0,0,0,1]
MNIST resultsCVAE
https://github.com/hwalsuklee/tensorflow-mnist-CVAE
33 / 49
Z-sampling
For each row, z is held fixed and only the label information is changed when
generating the image (the style is preserved while only the digit changes).
VAE
MNIST resultsCVAE
Analogies : Result in paper
Semi-Supervised Learning with Deep Generative Models : https://arxiv.org/abs/1406.5298
34 / 49
𝑧1
𝑧2
𝑧3
𝑧4
𝑐0 𝑐1 𝑐2 𝑐3 𝑐4 𝑐5 𝑐6 𝑐7 𝑐8 𝑐9
Handwriting style for a given z must be preserved for all labels
VAE
Analogies |z| = 2
MNIST resultsCVAE
https://github.com/hwalsuklee/tensorflow-mnist-CVAE
𝑐0 𝑐1 𝑐2 𝑐3 𝑐4 𝑐5 𝑐6 𝑐7 𝑐8 𝑐9
Real
handwritten
image
A real handwritten '3' is fed into the CVAE together with its label; the
resulting latent vector is then fixed as the decoder input while only the
label information is changed.
35 / 49
Things are messy here, in contrast to VAE’s
Q(z|X), which nicely clusters z.
But if we look at it closely, we could see that
given a specific value of c=y, Q(z|X,c=y) is
roughly N(0,1)!
It’s because, if we look at our objective above,
we are now modeling P(z|c), which we infer
variationally with a N(0,1).
Q(z|X, c=y) is close to N(0, 1); since P(z|c) is N(0, 1) and Q(z|X, c=y) is
trained to minimize its KL-divergence to P(z|c), this is the desired behavior.
(That P(z|X, c=y) ≈ Q(z|X, c=y) was already confirmed by the image results.)
VAE
Learned Manifold |z| = 2
MNIST resultsCVAE
36 / 49
Running it myself with N = 100 labels gives accuracy 0.9514, i.e., a 4.86% error rate.
(Of the 50,000 training samples, only 100 labels are used; the remaining 49,900 are unlabeled.)
VAE
MNIST resultsCVAE
Classification : Result in paper
https://github.com/saemundsson/semisupervised_vae
Semi-Supervised Learning with Deep Generative Models : https://arxiv.org/abs/1406.5298
37 / 49
AAE : Introduction
L_i(φ, θ, x_i) = −E_{q_φ(z|x_i)}[ log p_θ(x_i|z) ] + KL( q_φ(z|x_i) ‖ p(z) )   ← Regularization
Conditions for q_φ(z|x_i), p(z):
1. Easily draw samples from the distribution
2. KL divergence can be calculated
Adversarial Autoencoder (AAE):
Conditions | q_φ(z|x_i) | p(z)
Easily draw samples from the distribution | O | O
KL divergence can be calculated | X | X
VAE
The KL divergence term is replaced by the discriminator of a GAN.
38 / 49
AAE : Architecture
Generative Adversarial Network
VAE
z ~ p_z(z) → Generator → G(z) ; real data x ~ p_data(x)
Discriminator outputs Yes / No : D(x) = 1 for real data; the discriminator is
trained toward D(G(z)) = 0, while the generator is trained toward D(G(z)) = 1.
Value function of GAN : V(D, G) = E_{x~p_data(x)}[ log D(x) ] + E_{z~p_z(z)}[ log(1 − D(G(z))) ]
Goal : D*, G* = min_G max_D V(D, G)
The goal of a GAN is to make G(z) ~ p_data(x).
39 / 49
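A quick check of the value function on a made-up discrete distribution: with the optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)), V(D*, G) attains its minimum −log 4 exactly when the generator matches the data distribution:

```python
import numpy as np

def value_fn(p_data, p_g):
    # V(D*, G) with the optimal discriminator plugged in.
    d_star = p_data / (p_data + p_g)
    return np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1 - d_star))

p_data = np.array([0.5, 0.3, 0.2])
v_matched = value_fn(p_data, p_data.copy())          # p_g = p_data
v_off = value_fn(p_data, np.array([0.2, 0.3, 0.5]))  # p_g ≠ p_data
```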
ArchitectureAAE
Overall
VAE
AutoEncoder
Prior Distribution
(Target Distribution)
Discriminator
Generator
Adversarial Autoencoders : https://arxiv.org/abs/1511.05644
40 / 49
AAE : Training
Loss Function
VAE
VAE loss : L_i(φ, θ, x_i) = −E_{q_φ(z|x_i)}[ log p_θ(x_i|z) ] + KL( q_φ(z|x_i) ‖ p(z) )
GAN loss : V(D, G) = E_{z~p(z)}[ log D(z) ] + E_{x~p(x)}[ log(1 − D(q_φ(x))) ]
Let's say G is defined by q_φ(·) and D is defined by d_λ(·):
V_i(φ, λ, x_i, z_i) = log d_λ(z_i) + log(1 − d_λ(q_φ(x_i)))
*The paper does not state the loss definition explicitly; this is my own formulation.
41 / 49
AAE : Training
Training Procedure
VAE
For samples x_i drawn from the training data set and z_i from the prior distribution p(z):
Training Step 1 : Update AE : update φ, θ according to the reconstruction error
  L_i(φ, θ, x_i) = −E_{q_φ(z|x_i)}[ log p_θ(x_i|z) ]
Training Step 2 : Update Discriminator : update λ according to the discriminator loss
  −V_i(φ, λ, x_i, z_i) = −log d_λ(z_i) − log(1 − d_λ(q_φ(x_i)))
Training Step 3 : Update Generator : update φ according to the generator loss
  −V_i(φ, λ, x_i, z_i) = −log d_λ(q_φ(x_i))
*The paper does not state the training procedure as equations; this is my own formulation.
42 / 49
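The three steps can be sketched as one training iteration. The `update_*` callbacks below are stand-ins for real gradient updates on the encoder/decoder, discriminator, and encoder-as-generator; here they only record the call order:

```python
def train_aae_step(x_batch, z_prior_batch, params, update_ae, update_disc, update_gen):
    params = update_ae(params, x_batch)                   # Step 1: recon error (φ, θ)
    params = update_disc(params, z_prior_batch, x_batch)  # Step 2: discriminator (λ)
    params = update_gen(params, x_batch)                  # Step 3: encoder as generator (φ)
    return params

calls = []

def update_ae(params, x_batch):
    calls.append("ae")     # would minimize -E[log p_θ(x|z)]
    return params

def update_disc(params, z_prior, x_batch):
    calls.append("disc")   # would minimize -log d_λ(z) - log(1 - d_λ(q_φ(x)))
    return params

def update_gen(params, x_batch):
    calls.append("gen")    # would minimize -log d_λ(q_φ(x))
    return params

params = train_aae_step(None, None, {}, update_ae, update_disc, update_gen)
```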
AAE : MNIST Results
VAE vs AAE
p(z) : mixture of 10 Gaussians ; p(z) : N(0, 5²I)
VAE emphasizes capturing the frequently occurring values, so occasional empty
regions appear; AAE emphasizes the shape of the distribution, so empty regions
are relatively rare.
Adversarial Autoencoders : https://arxiv.org/abs/1511.05644
43 / 49
AAE : MNIST Results
Incorporating Label Information in the Adversarial Regularization
VAE
Condition on latent space:
• When a sample drawn from the prior distribution enters the discriminator,
  the condition specifying which label that sample should correspond to is
  also fed to the discriminator.
• When a sample drawn from the posterior distribution enters the
  discriminator, the label of the corresponding image is fed to the
  discriminator.
• Images of a specific label are thus mapped to the intended region of the
  latent space.
Adversarial Autoencoders : https://arxiv.org/abs/1511.05644
44 / 49
MNIST ResultsAAE
Incorporating Label Information in the Adversarial Regularization
VAE
The same position within each Gaussian component carries the same style.
Results of sequentially reconstructing the samples along the spiral.
Adversarial Autoencoders : https://arxiv.org/abs/1511.05644
45 / 49
MNIST ResultsAAE
Supervised Adversarial Autoencoders
VAE
Condition on
generated data
𝑐0 𝑐1 𝑐2 𝑐3 𝑐4 𝑐5 𝑐6 𝑐7 𝑐8 𝑐9
Each row uses the same z value.
Adversarial Autoencoders : https://arxiv.org/abs/1511.05644
46 / 49
MNIST ResultsAAE
Semi-Supervised Adversarial Autoencoders
VAE
auxiliary classifier
Discriminator
• When no label is provided, an auxiliary classifier predicts the label, and
  an additional discriminator is trained to judge whether that prediction is
  correct.
Adversarial Autoencoders : https://arxiv.org/abs/1511.05644
47 / 49
MNIST ResultsAAE
Incorporating Label Information in the Adversarial Regularization
VAE
Actual experiment results : mixture of 10 Gaussians / swiss roll
https://github.com/hwalsuklee/tensorflow-mnist-AAE
48 / 49
MNIST ResultsAAE
Incorporating Label Information in the Adversarial Regularization
VAE
Actual experiment results : mixture of 10 Gaussians
Learned Manifold Generation
https://github.com/hwalsuklee/tensorflow-mnist-AAE
49 / 49
MNIST ResultsAAE
Incorporating Label Information in the Adversarial Regularization
VAE
Actual experiment results : swiss roll
Learned Manifold Generation
https://github.com/hwalsuklee/tensorflow-mnist-AAE
01. Revisit Deep Neural Networks
02. Manifold Learning
03. Autoencoders
04. Variational Autoencoders
05. Applications
This chapter introduces examples of retrieval, the main application of data
compression; examples using VAE as a generative model; and VAEs combined
with GAN, currently the most popular approach.
• Retrieval
• Generation
• GAN+VAE
Information Retrieval via AutoencodersRETRIEVAL 1 / 22
APPLICATIONS
• Text
• Semantic Hashing
(Link) http://www.cs.utoronto.ca/~rsalakhu/papers/semantic_final.pdf
• Dynamic Auto-Encoders for Semantic Indexing
(Link) http://yann.lecun.com/exdb/publis/pdf/mirowski-nipsdl-10.pdf
• Image
• Using Very Deep Autoencoders for Content-Based Image Retrieval
(Link) http://nuyoo.utm.mx/~jjf/rna/A6%20Using%20Very%20Deep%20Autoencoders%20for%20Content-
Based%20Image%20Retrieval.pdf
• Autoencoding the Retrieval Relevance of Medical Images
(Link) https://arxiv.org/pdf/1507.01251.pdf
• Sound
• Retrieving Sounds by Vocal Imitation Recognition
(Link)
http://www.ece.rochester.edu/~zduan/resource/ZhangDuan_RetrievingSoundsByVocalImitationRecognition_MLSP
15.pdf
Information Retrieval via AutoencodersRETRIEVAL 2 / 22
APPLICATIONS
• 3D model
• Deep Learning Representation using Autoencoder for 3D Shape Retrieval
(Link) https://arxiv.org/pdf/1409.7164.pdf
• Deep Signatures for Indexing and Retrieval in Large Motion Databases
(Link) http://web.cs.ucdavis.edu/~neff/papers/MIG_2015_DeepSignature.pdf
• DeepShape: Deep Learned Shape Descriptor for 3D Shape Matching and Retrieval
(Link) http://www.cv-
foundation.org/openaccess/content_cvpr_2015/papers/Xie_DeepShape_Deep_Learned_2015_CVPR_paper.pdf
• Multi-modal
• Cross-modal Retrieval with Correspondence Autoencoder
(Link) https://people.cs.clemson.edu/~jzwang/1501863/mm2014/p7-feng.pdf
• Effective multi-modal retrieval based on stacked autoencoders
(Link) http://www.comp.nus.edu.sg/~ooibc/crossmodalvldb14.pdf
Gray Face / Handwritten DigitsGENERATION 3 / 22
APPLICATIONS
http://vdumoulin.github.io/morphing_faces/online_demo.html
|z|=29
64
64
http://www.dpkingma.com/sgvb_mnist_demo/demo.html
|z|=12
24
24
Handwritten Digits GenerationGray Face Generation
Deep Feature Consistent Variational AutoencoderGENERATION 4 / 22
APPLICATIONS
https://arxiv.org/abs/1610.00291
celeba DB
BEGAN
Deep Feature Consistent Variational AutoencoderGENERATION 5 / 22
APPLICATIONS
https://arxiv.org/abs/1610.00291
Sketch RNNGENERATION 6 / 22
APPLICATIONS
https://magenta.tensorflow.org/sketch-rnn-demo
The model can also mimic your drawings and produce similar doodles. In the Variational Autoencoder Demo, you are to draw a complete drawing of a
specified object. After you draw a complete sketch inside the area on the left, hit the auto-encode button and the model will start drawing similar
sketches inside the smaller boxes on the right. Rather than drawing a perfect duplicate copy of your drawing, the model will try to mimic your drawing
instead.
You can experiment drawing objects that are not the category you are supposed to draw, and see how the model interprets your drawing. For example,
try to draw a cat, and have a model trained to draw crabs generate cat-like crabs. Try the Variational Autoencoder demo.
https://magenta.tensorflow.org/assets/sketch_rnn_demo/multi_vae.html
GAN+VAE : Introduction 7 / 22
Comparison between VAE and GAN

Model | Optimization | Image Quality | Generalization
VAE | • Stochastic gradient descent • Converges to a local minimum • Easier | • Smooth • Blurry | • Tends to remember input images
GAN | • Alternating stochastic gradient descent • Converges to saddle points • Harder (model collapsing, unstable convergence) | • Sharp • Artifacts | • Generates new unseen images

VAE : x → Encoder E → z → Decoder D → x
GAN : z → Generator G → G(z); x or G(z) → Discriminator D → 0~1
APPLICATIONS
IntroductionGAN+VAE 8 / 22
Comparison between VAE vs GAN
APPLICATIONS
VAE : maximum
likelihood approach
GAN
http://videolectures.net/site/normal_dl/tag=1129740/deeplearning2017_courville_generative_models_01.pdf
9 / 22
Regularized Autoencoders
x
Energy
G
G(z)
Discriminator
Generator
z
AE
GAN
𝑧 DE
Reconstruction
error
We argue that the energy function (the discriminator) in the EBGAN framework is also seen as
being regularized by having a generator producing the contrastive samples, to which the discriminator
ought to give high reconstruction energies.
We further argue that the EBGAN framework allows more flexibility from this perspective, because: (i)-the
regularizer (generator) is fully trainable instead of being handcrafted; (ii)-the adversarial training paradigm
enables a direct interaction between the duality of producing contrastive sample and learning the energy
function.
EBGAN : Energy-based Generative Adversarial Network ‘16.09
BEGAN : Boundary Equilibrium Generative Adversarial Networks ‘17.03
APPLICATIONS
EBGAN, BEGANGAN+VAE
10 / 22
StackGAN : Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks , ‘16.12
Multimodal Feature Learner
Find the photos generated by AI!
(1) This flower has overlapping pink pointed petals surrounding a ring of short yellow filaments
(2) This flower has upturned petals which are thin and orange with rounded edges
(3) A flower with small pink petals and a massive central orange and black stamen cluster
(1) (2) (3)
APPLICATIONS
StackGANGAN+VAE
11 / 22
Multimodal Feature Learner
StackGAN : Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks , ‘16.12
APPLICATIONS
StackGANGAN+VAE
12 / 22
Learning a Probabilistic latent Space of Object Shapes via 3D Generative-Adversarial Modeling (3D-GAN), ‘16.10
Multimodal Feature Learner
APPLICATIONS
3DGANGAN+VAE
13 / 22
Denoising
SEGAN: Speech Enhancement Generative Adversarial Network ‘17. 03. 28
Nothing is safe.
There will be no repeat of that
performance, that I can guarantee.
before after
before after
APPLICATIONS
SEGANGAN+VAE
14 / 22
Age Progression/Regression by Conditional Adversarial Autoencoder
https://zzutk.github.io/Face-Aging-CAAE/
APPLICATIONS
Papers in CVPR2017GAN+VAE
15 / 22
Age Progression/Regression by Conditional Adversarial Autoencoder
https://zzutk.github.io/Face-Aging-CAAE/
APPLICATIONS
Papers in CVPR2017GAN+VAE
16 / 22
Age Progression/Regression by Conditional Adversarial Autoencoder
https://zzutk.github.io/Face-Aging-CAAE/
APPLICATIONS
Papers in CVPR2017GAN+VAE
17 / 22
PaletteNet: Image Recolorization with Given Color Palette
http://tmmse.xyz/2017/07/27/palettenet/
APPLICATIONS
Papers in CVPR2017GAN+VAE
18 / 22
PaletteNet: Image Recolorization with Given Color Palette
http://tmmse.xyz/2017/07/27/palettenet/
APPLICATIONS
Papers in CVPR2017GAN+VAE
19 / 22
Hallucinating Very Low-Resolution Unaligned and Noisy Face Images by Transformative Discriminative Autoencoders
http://www.porikli.com/mysite/pdfs/porikli%202017%20-%20Hallucinating%20very%20low-
resolution%20unaligned%20and%20noisy%20face%20images%20by%20transformative%20discriminative%20autoencoders.pdf
APPLICATIONS
Papers in CVPR2017GAN+VAE
16x16 → 128x128
20 / 22
Hallucinating Very Low-Resolution Unaligned and Noisy Face Images by Transformative Discriminative Autoencoders
http://www.porikli.com/mysite/pdfs/porikli%202017%20-%20Hallucinating%20very%20low-
resolution%20unaligned%20and%20noisy%20face%20images%20by%20transformative%20discriminative%20autoencoders.pdf
APPLICATIONS
Papers in CVPR2017GAN+VAE
TUN loss
DL loss
TE loss
21 / 22
A Generative Model of People in Clothing
https://arxiv.org/abs/1705.04098
APPLICATIONS
Papers in ICCV2017GAN+VAE
22 / 22
A Generative Model of People in Clothing
APPLICATIONS
Papers in ICCV2017GAN+VAE
Conditional generation for test time
Condition on
human pose
Sketch info
for cloth
Famous pix2pix architecture
https://arxiv.org/abs/1705.04098
End of Document
 
Training DNN Models - II.pptx
Training DNN Models - II.pptxTraining DNN Models - II.pptx
Training DNN Models - II.pptx
 
AI Lesson 39
AI Lesson 39AI Lesson 39
AI Lesson 39
 
Lesson 39
Lesson 39Lesson 39
Lesson 39
 
EPFL_presentation
EPFL_presentationEPFL_presentation
EPFL_presentation
 
Neural networks with python
Neural networks with pythonNeural networks with python
Neural networks with python
 
Problem solving using computers - Chapter 1
Problem solving using computers - Chapter 1 Problem solving using computers - Chapter 1
Problem solving using computers - Chapter 1
 

More from NAVER Engineering

디자인 시스템에 직방 ZUIX
디자인 시스템에 직방 ZUIX디자인 시스템에 직방 ZUIX
디자인 시스템에 직방 ZUIXNAVER Engineering
 
진화하는 디자인 시스템(걸음마 편)
진화하는 디자인 시스템(걸음마 편)진화하는 디자인 시스템(걸음마 편)
진화하는 디자인 시스템(걸음마 편)NAVER Engineering
 
서비스 운영을 위한 디자인시스템 프로젝트
서비스 운영을 위한 디자인시스템 프로젝트서비스 운영을 위한 디자인시스템 프로젝트
서비스 운영을 위한 디자인시스템 프로젝트NAVER Engineering
 
BPL(Banksalad Product Language) 무야호
BPL(Banksalad Product Language) 무야호BPL(Banksalad Product Language) 무야호
BPL(Banksalad Product Language) 무야호NAVER Engineering
 
이번 생에 디자인 시스템은 처음이라
이번 생에 디자인 시스템은 처음이라이번 생에 디자인 시스템은 처음이라
이번 생에 디자인 시스템은 처음이라NAVER Engineering
 
날고 있는 여러 비행기 넘나 들며 정비하기
날고 있는 여러 비행기 넘나 들며 정비하기날고 있는 여러 비행기 넘나 들며 정비하기
날고 있는 여러 비행기 넘나 들며 정비하기NAVER Engineering
 
쏘카프레임 구축 배경과 과정
 쏘카프레임 구축 배경과 과정 쏘카프레임 구축 배경과 과정
쏘카프레임 구축 배경과 과정NAVER Engineering
 
플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기
플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기
플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기NAVER Engineering
 
200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)
200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)
200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)NAVER Engineering
 
200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드
200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드
200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드NAVER Engineering
 
200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기
200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기
200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기NAVER Engineering
 
200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활
200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활
200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활NAVER Engineering
 
200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출
200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출
200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출NAVER Engineering
 
200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우
200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우
200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우NAVER Engineering
 
200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...
200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...
200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...NAVER Engineering
 
200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법
200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법
200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법NAVER Engineering
 
200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며
200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며
200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며NAVER Engineering
 
200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기
200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기
200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기NAVER Engineering
 
200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기
200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기
200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기NAVER Engineering
 

More from NAVER Engineering (20)

React vac pattern
React vac patternReact vac pattern
React vac pattern
 
디자인 시스템에 직방 ZUIX
디자인 시스템에 직방 ZUIX디자인 시스템에 직방 ZUIX
디자인 시스템에 직방 ZUIX
 
진화하는 디자인 시스템(걸음마 편)
진화하는 디자인 시스템(걸음마 편)진화하는 디자인 시스템(걸음마 편)
진화하는 디자인 시스템(걸음마 편)
 
서비스 운영을 위한 디자인시스템 프로젝트
서비스 운영을 위한 디자인시스템 프로젝트서비스 운영을 위한 디자인시스템 프로젝트
서비스 운영을 위한 디자인시스템 프로젝트
 
BPL(Banksalad Product Language) 무야호
BPL(Banksalad Product Language) 무야호BPL(Banksalad Product Language) 무야호
BPL(Banksalad Product Language) 무야호
 
이번 생에 디자인 시스템은 처음이라
이번 생에 디자인 시스템은 처음이라이번 생에 디자인 시스템은 처음이라
이번 생에 디자인 시스템은 처음이라
 
날고 있는 여러 비행기 넘나 들며 정비하기
날고 있는 여러 비행기 넘나 들며 정비하기날고 있는 여러 비행기 넘나 들며 정비하기
날고 있는 여러 비행기 넘나 들며 정비하기
 
쏘카프레임 구축 배경과 과정
 쏘카프레임 구축 배경과 과정 쏘카프레임 구축 배경과 과정
쏘카프레임 구축 배경과 과정
 
플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기
플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기
플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기
 
200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)
200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)
200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)
 
200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드
200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드
200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드
 
200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기
200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기
200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기
 
200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활
200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활
200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활
 
200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출
200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출
200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출
 
200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우
200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우
200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우
 
200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...
200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...
200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...
 
200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법
200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법
200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법
 
200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며
200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며
200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며
 
200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기
200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기
200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기
 
200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기
200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기
200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기
 

Recently uploaded

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...itnewsafrica
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sectoritnewsafrica
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 

Recently uploaded (20)

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 

오토인코더의 모든 것

  • 1. Autoencoders — A way for Unsupervised Learning of Nonlinear Manifolds. 이활석. Slide design: https://graphicriver.net/item/simpleco-simple-powerpoint-template/13220655. Note: the parts originally written in Korean often reflect personal opinions, so please treat them accordingly. The sources for the materials used on each slide are listed at the bottom of that slide.
  • 2. Autoencoder in Wikipedia — FOUR KEYWORDS 1 / 5 — INTRODUCTION. [KEYWORDS] Unsupervised learning; Representation learning = Efficient coding learning; Dimensionality reduction; Generative model learning.
  • 3. Nonlinear dimensionality reduction — FOUR KEYWORDS 2 / 5 — INTRODUCTION. [KEYWORDS] Unsupervised learning; Nonlinear dimensionality reduction = Representation learning = Efficient coding learning = Feature extraction = Manifold learning; Generative model learning.
  • 4. Representation learning — FOUR KEYWORDS 3 / 5 — INTRODUCTION. [KEYWORDS] Unsupervised learning; Nonlinear dimensionality reduction = Representation learning = Efficient coding learning = Feature extraction = Manifold learning; Generative model learning. http://videolectures.net/kdd2014_bengio_deep_learning/
  • 5. ML density estimation — FOUR KEYWORDS 4 / 5 — INTRODUCTION. [KEYWORDS] Unsupervised learning; Nonlinear dimensionality reduction = Representation learning = Efficient coding learning = Feature extraction = Manifold learning; Generative model learning; ML density estimation. http://www.iangoodfellow.com/slides/2016-12-04-NIPS.pdf
  • 6. Summary — FOUR KEYWORDS 5 / 5 — INTRODUCTION. [4 MAIN KEYWORDS] 1. Unsupervised learning; 2. Manifold learning; 3. Generative model learning; 4. ML density estimation. When training an autoencoder: the training follows an unsupervised learning method, and the loss is interpreted as a negative maximum likelihood. In the trained autoencoder: the encoder performs dimensionality reduction, and the decoder acts as a generative model. Input → Encoder → Decoder → Output.
  • 7. CONTENTS. 01. Revisit Deep Neural Networks — • Machine learning problem • Loss function viewpoints I: Back-propagation • Loss function viewpoints II: Maximum likelihood • Maximum likelihood for autoencoders. 02. Manifold Learning — • Four objectives • Dimension reduction • Density estimation. 03. Autoencoders — • Autoencoder (AE) • Denoising AE (DAE) • Contractive AE (CAE). 04. Variational Autoencoders — • Variational AE (VAE) • Conditional VAE (CVAE) • Adversarial AE (AAE). 05. Applications — • Retrieval • Generation • Regression • GAN+VAE.
  • 8. [Agenda] 01. Revisit Deep Neural Networks / 02. Manifold Learning / 03. Autoencoders / 04. Variational Autoencoders / 05. Applications. This chapter — 01. Revisit Deep Neural Networks: • Machine learning problem • Loss function viewpoints I: Back-propagation • Loss function viewpoints II: Maximum likelihood • Maximum likelihood for autoencoders. KEYWORD: ML density estimation. The loss functions used to train deep neural networks can be interpreted from several angles. One interpretation concerns how well the back-propagation algorithm can work, i.e., whether the gradient-vanishing problem is less likely to occur. Another views the loss as a negative maximum likelihood, where a particular form of loss corresponds to assuming a particular form of probability distribution. Training an autoencoder can likewise be seen as an optimization from the maximum likelihood perspective.
  • 9. Classic Machine Learning — ML PROBLEM 1 / 17 — REVISIT DNN. 01. Collect training data: input data x = {x₁, x₂, …, x_N}, output labels y = {y₁, y₂, …, y_N}, D = {(x₁, y₁), (x₂, y₂), …, (x_N, y_N)}. 02. Define functions: a model f_θ(·) (the type of model), with output f_θ(x) and a loss L(f_θ(x), y) measuring how different the prediction and the label are. 03. Learning/Training: find the optimal parameter θ* = argmin_θ L(f_θ(x), y), i.e., the model that best explains the given data. 04. Predicting/Testing: compute the optimal function output y_new = f_θ*(x_new) (fixed input, fixed output; prediction).
  • 10. Deep Neural Networks — ML PROBLEM 2 / 17 — REVISIT DNN. For a deep neural network, the model is f_θ(·) with parameters θ = (W, b) — the weights and biases — and the loss is L(f_θ(x), y). Two assumptions are conditions for training a DNN via backpropagation: Assumption 1 — the total loss of the DNN over the training samples is the sum of the per-sample losses, L(f_θ(x), y) = Σᵢ L(f_θ(xᵢ), yᵢ). Assumption 2 — the loss for each training example is a function of the final output of the DNN.
  • 11. Deep Neural Networks — ML PROBLEM 3 / 17 — REVISIT DNN. 03. Learning/Training: θ* = argmin_{θ∈Θ} L(f_θ(x), y) = argmin_{θ∈Θ} L(θ), solved by gradient descent, an iterative method. Questions and strategies: How do we update θ → θ + Δθ? Only if L(θ + Δθ) < L(θ). When do we stop searching? When L(θ + Δθ) == L(θ). In short: keep moving in the direction that decreases the loss, and stop when moving no longer changes the loss value.
  • 12. Deep Neural Networks — ML PROBLEM 4 / 17 — REVISIT DNN. How do we find Δθ so that L(θ + Δθ) < L(θ)? Taylor expansion: L(θ + Δθ) = L(θ) + ∇L · Δθ + (second-order term) + (third-order term) + … (using more orders represents a wider region with smaller error). First-order approximation: L(θ + Δθ) ≈ L(θ) + ∇L · Δθ, so ΔL = L(θ + Δθ) − L(θ) = ∇L · Δθ. If Δθ = −η∇L with η > 0 (called the learning rate), then ΔL = −η‖∇L‖² < 0. Here ∇L is the gradient of L and indicates the steepest-increasing direction of L. The parameters are changed only a little at a time via the learning rate precisely because only the first-order derivative term of the loss is used: the descent direction is accurate only in a very small neighborhood.
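As a concrete illustration of the update rule θ ← θ − η∇L described above, here is a minimal sketch on a made-up one-dimensional loss L(θ) = (θ − 3)²; the loss, starting point, and learning rate are all hypothetical choices, not from the slides.

```python
# Minimal gradient-descent sketch on a hypothetical 1-D loss L(theta) = (theta - 3)^2,
# whose gradient is dL/dtheta = 2 * (theta - 3).

def gradient_descent(theta0, lr=0.1, steps=100):
    theta = theta0
    for _ in range(steps):
        grad = 2.0 * (theta - 3.0)   # gradient of L at the current theta
        theta -= lr * grad           # theta <- theta - eta * grad(L)
    return theta

theta_star = gradient_descent(theta0=0.0)
print(theta_star)  # converges toward the minimizer theta = 3
```

With a learning rate this small relative to the curvature, each step shrinks the distance to the minimizer by a constant factor, matching the first-order analysis above.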
  • 13. Deep Neural Networks — ML PROBLEM 5 / 17 — REVISIT DNN. Training loop: the DB feeds D into the DNN with parameters θ_k = (w_k, b_k), yielding the loss L(θ_k, D) = Σᵢ L(θ_k, Dᵢ) and gradient ∇L(θ_k, D) = Σᵢ ∇L(θ_k, Dᵢ). Because the total loss over the data is a sum of per-sample losses, its gradient can be computed efficiently; if it were a product, the outputs for every sample would have to be kept in memory in order to differentiate. Redefinition: ∇L(θ_k, D) ≜ Σᵢ ∇L(θ_k, Dᵢ)/N. Stochastic gradient descent: ∇L(θ_k, D) ≈ Σⱼ ∇L(θ_k, Dⱼ)/M, where M < N is the batch size, and θ_{k+1} = θ_k − η∇L(θ_k, D). Strictly, the parameters should be updated after summing the loss gradients over all the data; instead, they are updated after summing the loss gradients over only a batch.
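A minimal minibatch SGD sketch of the same idea, on made-up noiseless data y = 2x with hypothetical batch size, learning rate, and epoch count: each parameter update uses the gradient averaged over a batch of M < N samples rather than the full dataset.

```python
import random

# Minibatch SGD sketch: fit y ≈ w*x on toy noiseless data (true w = 2).
data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]

def sgd(batch_size=2, lr=0.05, epochs=200, seed=0):
    rng = random.Random(seed)
    samples = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(samples)
        for i in range(0, len(samples), batch_size):
            batch = samples[i:i + batch_size]
            # Gradient of the batch-averaged squared error (1/M) * sum (w*x - y)^2
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad  # update after each batch, not after the full pass
    return w

w_hat = sgd()
```

Shuffling each epoch and updating per batch is the standard SGD recipe the slide describes; with noiseless data the estimate converges to the true weight.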
  • 14. Deep Neural Networks — ML PROBLEM 6 / 17 — REVISIT DNN. Gradient descent + backpropagation: θ_{k+1} = θ_k − η∇L(θ_k, D); the parameter update rules at a given layer l are w^l_{k+1} = w^l_k − η ∇_{w^l_k} L(θ_k, D) and b^l_{k+1} = b^l_k − η ∇_{b^l_k} L(θ_k, D). [Backpropagation Algorithm] 1. Error at the output layer: δ^L = ∇_a C ⊙ σ′(z^L), where C is the cost (loss), a is the final output of the DNN, and σ(·) is the activation function. 2. Error relationship between two adjacent layers: δ^l = ((w^{l+1})^T δ^{l+1}) ⊙ σ′(z^l). 3. Gradient of C with respect to the biases: ∇_{b^l} C = δ^l. 4. Gradient of C with respect to the weights: ∇_{W^l} C = δ^l (a^{l−1})^T. The derivative of the loss function is what matters most in training a deep neural network! http://neuralnetworksanddeeplearning.com/chap2.html
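The four equations above can be checked numerically. The sketch below runs them on a hypothetical two-layer, one-neuron-per-layer sigmoid network with the quadratic cost C = (a − y)²/2; all inputs, targets, and weights are made-up values for illustration.

```python
import math

# Backprop sketch for a hypothetical 2-layer, single-neuron sigmoid network,
# quadratic cost C = (a2 - y)^2 / 2. Implements the four equations directly.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop(x, y, w1, b1, w2, b2):
    # Forward pass
    z1 = w1 * x + b1; a1 = sigmoid(z1)
    z2 = w2 * a1 + b2; a2 = sigmoid(z2)
    # Eq. 1: output error, delta^L = grad_a C ⊙ sigma'(z^L), with grad_a C = (a2 - y)
    d2 = (a2 - y) * a2 * (1.0 - a2)
    # Eq. 2: one layer back, delta^l = ((w^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l)
    d1 = (w2 * d2) * a1 * (1.0 - a1)
    # Eq. 3 and 4: gradients with respect to biases and weights
    return {"db2": d2, "dw2": d2 * a1, "db1": d1, "dw1": d1 * x}

grads = backprop(x=1.0, y=0.0, w1=0.6, b1=0.9, w2=0.5, b2=0.1)
```

Comparing these analytic gradients against finite differences of the cost is a standard sanity check on a backprop implementation.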
  • 15. View-Point I: Backpropagation — LOSS FUNCTION 7 / 17 — REVISIT DNN. Type 1: Mean squared error / quadratic loss. Setup: a single sigmoid neuron, z = wx + b, a = σ(z), with input x = 1.0 and target output y = 0.0. Cost C = (a − y)²/2 = a²/2, so ∇_a C = (a − y) and δ = ∇_a C ⊙ σ′(z) = (a − y)σ′(z); ∂C/∂w = xδ = δ, ∂C/∂b = δ; updates w ← w − ηδ, b ← b − ηδ. Starting from w₀ = +0.6, b₀ = +0.9 (a₀ = +0.82), training reaches w = −1.28, b = −0.98, a = +0.09; starting from w₀ = +2.0, b₀ = +2.0 (a₀ = +0.98), it reaches only w = −0.68, b = −0.68, a = +0.20. http://neuralnetworksanddeeplearning.com/chap2.html
  • 16. View-Point I: Backpropagation — LOSS FUNCTION 8 / 17 — REVISIT DNN. Type 1: Mean squared error / quadratic loss. "Learning slow" means ∂C/∂w and ∂C/∂b are small. Why are they small? For the same single sigmoid neuron (x = 1.0, y = 0.0): ∂C/∂w = xδ = xaσ′(z) = aσ′(z) and ∂C/∂b = δ = aσ′(z). When the neuron is saturated (e.g., initialization w = +2, b = +2, versus w = +0.6, b = +0.9), σ′(z) is tiny, so both gradients vanish. http://neuralnetworksanddeeplearning.com/chap2.html
  • 17. View-Point I: Backpropagation — LOSS FUNCTION 9 / 17 — REVISIT DNN. Type 2: Cross entropy, C = −[y ln a + (1 − y) ln(1 − a)]. Then ∇_a C = −[y/a − (1 − y)/(1 − a)] = (−y + ay + a − ay)/((1 − a)a) = (a − y)/((1 − a)a), and since σ′(z) = ∂a/∂z = (1 − σ(z))σ(z) = (1 − a)a, the output-layer error becomes δ_CE = ∇_a C ⊙ σ′(z^L) = [(a − y)/((1 − a)a)] · (1 − a)a = a − y, whereas δ_MSE = (a − y)σ′(z). Unlike MSE, with CE the error at the output layer is not multiplied by the derivative of the activation function, so it is freer from the gradient-vanishing problem (training proceeds faster). With many layers, however, the derivative of the activation function is still multiplied in repeatedly, so CE alone cannot fully escape the gradient-vanishing problem. ReLU, whose derivative is 1 or 0, is an excellent activation function from this point of view. http://neuralnetworksanddeeplearning.com/chap2.html
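A quick numeric check of this point, using the made-up saturated example w = 2, b = 2, x = 1, y = 0: the MSE error signal carries the extra σ′(z) factor and is far smaller than the cross-entropy one.

```python
import math

# Compare output-layer error signals for a saturated sigmoid neuron
# (hypothetical values: w = 2, b = 2, x = 1, target y = 0).

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 2.0 * 1.0 + 2.0                  # pre-activation, saturated regime
a, y = sigmoid(z), 0.0
sigma_prime = a * (1.0 - a)          # sigma'(z) = sigma(z) * (1 - sigma(z))
delta_mse = (a - y) * sigma_prime    # MSE keeps the sigma'(z) factor
delta_ce = a - y                     # cross entropy: sigma'(z) cancels out
print(delta_mse, delta_ce)
```

The cross-entropy error signal stays close to 1 while the MSE signal is an order of magnitude or more smaller, which is exactly the "learning slow" effect described above.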
  • 18. View-Point I: Backpropagation — LOSS FUNCTION 10 / 17 — REVISIT DNN. Type 2: Cross entropy, C = −[y ln a + (1 − y) ln(1 − a)], δ = a − y; ∂C/∂w = xδ = δ, ∂C/∂b = δ; updates w ← w − ηδ, b ← b − ηδ. For the same neuron (x = 1.0, y = 0.0): starting from w₀ = +0.6, b₀ = +0.9 (a₀ = +0.82), cross entropy reaches w = −2.37, b = −2.07, a = +0.01, where MSE reached only a = +0.09 (w = −1.28, b = −0.98); starting from w₀ = +2.0, b₀ = +2.0 (a₀ = +0.98), cross entropy reaches w = −2.20, b = −2.20, a = +0.01, where MSE reached only a = +0.20 (w = −0.68, b = −0.68). http://neuralnetworksanddeeplearning.com/chap2.html
  • 19. View-Point II: Maximum Likelihood — LOSS FUNCTION 11 / 17 — REVISIT DNN. Back to the machine learning problem. 01. Collect training data: D = {(x₁, y₁), (x₂, y₂), …, (x_N, y_N)}. 02. Define functions: model f_θ(·) with output f_θ(x) and loss −log p(y|f_θ(x)), the probability of the observed output under the assumed probability distribution whose parameters are given by f_θ(x). 03. Learning/Training: find the optimal parameter θ* = argmin_θ [−log p(y|f_θ(x))], i.e., the model that best explains the given data. 04. Predicting/Testing: sample y_new ~ p(y|f_θ*(x_new)) (fixed input, fixed or varying output; prediction). Between two candidate outputs f_θ1(x) and f_θ2(x), the one assigning higher probability to the observed y is preferred, e.g., p(y|f_θ2(x)) < p(y|f_θ1(x)).
  • 20. View-Point II: Maximum Likelihood — LOSS FUNCTION 12 / 17 — REVISIT DNN. Back to the machine learning problem; compare with the earlier assumptions (Assumption 1 — the total loss of the DNN over training samples is the sum of per-sample losses; Assumption 2 — the loss for each training example is a function of the final output of the DNN). The i.i.d. condition on p(y|f_θ(x)): Assumption 1 (Independence) — all of our data is independent of each other, p(y|f_θ(x)) = ∏ᵢ p_{Dᵢ}(yᵢ|f_θ(xᵢ)); Assumption 2 (Identical distribution) — our data is identically distributed, p(y|f_θ(x)) = ∏ᵢ p(yᵢ|f_θ(xᵢ)). Hence −log p(y|f_θ(x)) = −Σᵢ log p(yᵢ|f_θ(xᵢ)).
  • 21. View-Point II: Maximum Likelihood — LOSS FUNCTION 13 / 17 — REVISIT DNN. Univariate cases for −log p(yᵢ|f_θ(xᵢ)). Gaussian distribution: p(yᵢ|μᵢ, σᵢ) = (1/(√(2π)σᵢ)) exp(−(yᵢ − μᵢ)²/(2σᵢ²)); with f_θ(xᵢ) = μᵢ and σᵢ = 1, log p(yᵢ|μᵢ, σᵢ) = log(1/√(2π)) − (yᵢ − μᵢ)²/2, so −log p(yᵢ|μᵢ) = −log(1/√(2π)) + (yᵢ − μᵢ)²/2 ∝ (yᵢ − μᵢ)²/2 = (yᵢ − f_θ(xᵢ))²/2, the mean squared error. Bernoulli distribution: with f_θ(xᵢ) = pᵢ, p(yᵢ|pᵢ) = pᵢ^{yᵢ}(1 − pᵢ)^{1−yᵢ}, so −log p(yᵢ|pᵢ) = −[yᵢ log pᵢ + (1 − yᵢ) log(1 − pᵢ)], the cross-entropy.
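The correspondence above can be verified numerically. This sketch (with arbitrary made-up values for y and the network outputs μ and p) checks that the Gaussian negative log-likelihood with σ = 1 equals the squared error up to an additive constant, and that the Bernoulli negative log-likelihood is exactly the cross-entropy.

```python
import math

# Gaussian NLL (sigma = 1) vs squared error, and Bernoulli NLL vs cross-entropy.
# y and the "network outputs" mu, p are arbitrary illustrative values.

def gaussian_nll(y, mu):
    # -log N(y; mu, 1) = log sqrt(2*pi) + (y - mu)^2 / 2
    return 0.5 * math.log(2.0 * math.pi) + 0.5 * (y - mu) ** 2

def bernoulli_nll(y, p):
    # -log [ p^y * (1 - p)^(1 - y) ]
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

y, mu = 1.3, 0.7
squared_error_half = 0.5 * (y - mu) ** 2
gap = gaussian_nll(y, mu) - squared_error_half   # should be the additive constant

yb, p = 1.0, 0.8
cross_entropy = -(yb * math.log(p) + (1.0 - yb) * math.log(1.0 - p))
```

Since the constant does not depend on μ, minimizing the Gaussian NLL over the network output is the same as minimizing the MSE, which is the slide's point.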
  • 22. View-Point II: Maximum Likelihood — LOSS FUNCTION 14 / 17 — REVISIT DNN. Multivariate cases for −log p(yᵢ|f_θ(xᵢ)). Gaussian distribution: p(yᵢ|μᵢ, Σᵢ) = (1/((2π)^{n/2}|Σᵢ|^{1/2})) exp(−(yᵢ − μᵢ)^T Σᵢ^{−1}(yᵢ − μᵢ)/2); with f_θ(xᵢ) = μᵢ and Σᵢ = I, −log p(yᵢ|μᵢ) = −log(1/(2π)^{n/2}) + ‖yᵢ − μᵢ‖²/2 ∝ ‖yᵢ − μᵢ‖²/2 = ‖yᵢ − f_θ(xᵢ)‖²/2, the mean squared error. Categorical distribution (also called generalized Bernoulli or multinoulli distribution): with f_θ(xᵢ) = pᵢ, p(yᵢ|pᵢ) = ∏_{j=1}^{n} p_{i,j}^{y_{i,j}}(1 − p_{i,j})^{1−y_{i,j}}, so −log p(yᵢ|pᵢ) = −Σ_{j=1}^{n} [y_{i,j} log p_{i,j} + (1 − y_{i,j}) log(1 − p_{i,j})], the cross-entropy.
• 23. View-Point II : Maximum Likelihood 15 / 17 Gaussian distribution Categorical distribution f_θ(x_i) = μ_i f_θ(x_i) = p_i Distribution estimation x_i → p(y_i|x_i) The network does not predict the likelihood value itself; it predicts the parameters of the likelihood. LOSS FUNCTION REVISIT DNN Multivariate cases −log p(y_i|f_θ(x_i)) Mean Squared Error Cross-entropy
• 24. View-Point II : Maximum Likelihood 16 / 17 Let's see Yoshua Bengio's slide http://videolectures.net/kdd2014_bengio_deep_learning/ LOSS FUNCTION REVISIT DNN
  • 25. View-Point II : Maximum Likelihood 17 / 17 Connection to Autoencoders Autoencoder LOSS FUNCTION REVISIT DNN Variational Autoencoder Gaussian distribution Categorical distribution Probability distribution 𝑝(𝑥|𝑥) 𝑝(𝑥) Mean Squared Error Loss Cross-Entropy Loss Mean Squared Error Loss Cross-Entropy Loss
• 26. 01. Revisit Deep Neural Networks 02. Manifold Learning 03. Autoencoders 04. Variational Autoencoders 05. Applications • Four objectives • Dimension reduction • Density estimation KEYWORDS : Manifold learning, Unsupervised learning One of the most important functions of an autoencoder is that it learns the manifold of the data. We will cover the four objectives of manifold learning: data compression, data visualization, avoiding the curse of dimensionality, and extracting useful features. We will also review existing methods related to the autoencoder's main functions, dimensionality reduction and probability distribution estimation, and point out their limitations.
  • 27. • A 𝑑 dimensional manifold ℳ is embedded in an 𝑚 dimensional space, and there is an explicit mapping 𝑓: ℛ 𝑑 → ℛ 𝑚 𝑤ℎ𝑒𝑟𝑒 𝑑 ≤ 𝑚 • We are given samples 𝑥𝑖 ∈ ℛ 𝑚 with noise • 𝑓(∙) is called embedding function, 𝑚 is the extrinsic dimension, 𝑑 is the intrinsic dimension or the dimension of the latent space • Finding 𝑓(∙) or 𝜏𝑖 from the given 𝑥𝑖 is called manifold learning • We assume 𝑝(𝜏) is smooth, is distributed uniformly, and noise is small  Manifold Hypothesis DefinitionINTRODUCTION MANIFOLD LEARNING 1 / 20 https://math.stackexchange.com/questions/1203714/manifold-learning-how-should-this-method-be-interpreted ℛ 𝑑 ℛ 𝑚
• 28. What is it useful for?INTRODUCTION MANIFOLD LEARNING 2 / 20 01. Data compression 02. Data visualization 03. Curse of dimensionality 04. Discovering most important features Reasonable distance metric Needs disentangling the underlying explanatory factors (making sense of the data) Manifold Hypothesis Dimensionality Reduction is an Unsupervised Learning Task! http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder x_i ∈ ℛ^m → τ_i ∈ ℛ^d (3.5, -1.7, 2.8, -3.5, -1.4, 2.4, 2.7, 7.5) → (0.32, -1.3, 1.2)
  • 29. Data compressionOBJECTIVES MANIFOLD LEARNING 3 / 20 http://theis.io/media/publications/paper.pdf Example : Lossy Image Compression with compressive Autoencoders, ‘17.03.01
  • 30. Data visualizationOBJECTIVES MANIFOLD LEARNING 4 / 20 t-distributed stochastic neighbor embedding (t-SNE) https://www.tensorflow.org/get_started/embedding_viz http://vision-explorer.reactive.ai/#/?_k=aodf68 http://fontjoy.com/projector/
• 31. Curse of dimensionalityOBJECTIVES MANIFOLD LEARNING 5 / 20 As the dimensionality of the data increases, the size (volume) of the space grows exponentially, so the density of a fixed number of data points becomes sparse very quickly. Consequently, the number of samples needed to analyze the data distribution or estimate a model grows exponentially with the dimension. http://darkpgmr.tistory.com/145 http://videolectures.net/kdd2014_bengio_deep_learning/
• 32. Curse of dimensionalityOBJECTIVES MANIFOLD LEARNING 6 / 20 Natural data in high dimensional spaces concentrates close to lower dimensional manifolds. Probability density decreases very rapidly when moving away from the supporting manifold. Manifold Hypothesis (assumption) Although high-dimensional data is sparse, there is a low-dimensional manifold that contains it; as soon as you leave this low-dimensional manifold, the density drops sharply. http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
  • 33. Curse of dimensionalityOBJECTIVES MANIFOLD LEARNING 7 / 20 Manifold Hypothesis (assumption) • 200x200 RGB image has 10^96329 possible states. • Random image is just noisy. • Natural images occupy a tiny fraction of that space • suggests peaked density • Realistic smooth transformations from one image to another continuous path along manifold • Data density concentrates near a lower dimensional manifold • It can shift the curse from high d to d << m http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder http://www.freejapanesefont.com/category/calligraphy-2/ http://baanimotion.blogspot.kr/2014/05/animated-acting.html https://medicalxpress.com/news/2013-03-people-facial.html
• 34. Discovering most important featuresOBJECTIVES MANIFOLD LEARNING 8 / 20 Manifold follows naturally from continuous underlying factors (≈ intrinsic manifold coordinates) Such continuous factors are part of a meaningful representation! https://dmm613.wordpress.com/tag/machine-learning/ thickness rotation size rotation From InfoGAN From VAE To evaluate a learned manifold, we check that small changes in the manifold coordinates produce small, meaningful changes in the original data.
• 35. Discovering most important featuresOBJECTIVES MANIFOLD LEARNING 9 / 20 Two samples that are semantically close are often far apart in the high-dimensional space, and two samples that are close in the high-dimensional space can be semantically very different. Because of the curse of dimensionality, it is hard to find a meaningful distance metric in high dimensions. Reasonable distance metric A1 B A2 A2 A1 B Distance in high dimension Distance in manifold Once the important features have been found, we should also be able to find the samples that share those features.
  • 36. Discovering most important featuresOBJECTIVES MANIFOLD LEARNING 10 / 20 Reasonable distance metric https://www.cs.cmu.edu/~efros/courses/AP06/presentations/ThompsonDimensionalityReduction.pdf Interpolation in high dimension
  • 37. Discovering most important featuresOBJECTIVES MANIFOLD LEARNING 11 / 20 Reasonable distance metric Interpolation in manifold https://www.cs.cmu.edu/~efros/courses/AP06/presentations/ThompsonDimensionalityReduction.pdf
• 38. Discovering most important featuresOBJECTIVES MANIFOLD LEARNING 12 / 20 Needs disentangling the underlying explanatory factors In general, a learned manifold is entangled, i.e. encoded in the data space in a complicated manner. When a manifold is disentangled, it is more interpretable and easier to apply to tasks Entangled manifold Disentangled manifold MNIST Data → 2D manifold
• 39. TaxonomyDIM. REDUCTION MANIFOLD LEARNING 13 / 20 • Principal Component Analysis (PCA) • Linear Discriminant Analysis (LDA) • etc.. Dimensionality Reduction Linear Non-Linear • Autoencoders (AE) • t-distributed stochastic neighbor embedding (t-SNE) • Isomap • Locally-linear embedding (LLE) • etc..
  • 40. PCADIM. REDUCTION MANIFOLD LEARNING 14 / 20 http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder • Finds k directions in which data has highest variance • Principal directions (eigenvectors) 𝑊 • Projecting inputs 𝑥 on these vectors yields reduced dimension representation (&decorrelated) • Principal components • ℎ = 𝑓𝜃 𝑥 = 𝑊 𝑥 − 𝜇 𝑤𝑖𝑡ℎ 𝜃 = 𝑊, 𝜇 http://www.nlpca.org/fig_pca_principal_component_analysis.png • Why mention PCA? • Prototypical unsupervised representation learning algorithm • Related to autoencoders • Prototypical manifold modeling algorithm
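As a concrete illustration of the projection h = W(x − μ) on this slide, here is a minimal pure-Python PCA for 2-D points, using the closed-form eigen-decomposition of the 2x2 covariance (the function name and helper structure are my own):

```python
import math

def pca_1d(points):
    """Project 2-D points onto their first principal direction.
    Minimal sketch: center the data, eigen-decompose the 2x2
    covariance in closed form, return (direction w, projections h)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    xs = [p[0] - mx for p in points]
    ys = [p[1] - my for p in points]
    a = sum(v * v for v in xs) / n              # var(x)
    c = sum(v * v for v in ys) / n              # var(y)
    b = sum(u * v for u, v in zip(xs, ys)) / n  # cov(x, y)
    # largest eigenvalue of the covariance matrix [[a, b], [b, c]]
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # a matching eigenvector; fall back to an axis when b == 0
    vx, vy = (b, lam - a) if abs(b) > 1e-12 else ((1.0, 0.0) if a >= c else (0.0, 1.0))
    norm = math.hypot(vx, vy)
    w = (vx / norm, vy / norm)
    h = [u * w[0] + v * w[1] for u, v in zip(xs, ys)]  # h = W(x - mu)
    return w, h
```

For points lying on the line y = x, the recovered direction is (1/√2, 1/√2), matching the intuition that PCA finds the highest-variance direction.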
  • 41. PCADIM. REDUCTION MANIFOLD LEARNING 15 / 20 http://www.astroml.org/book_figures/chapter7/fig_S_manifold_PCA.html Entangled manifold Linear manifold Disentangled manifold Nonlinear manifold Disentangled manifold Nonlinear manifold
• 42. Nonlinear methodsDIM. REDUCTION MANIFOLD LEARNING 16 / 20 Isomap LLE https://www.slideshare.net/plutoyang/manifold-learning-64891420
• 43. Parzen WindowsDENSITY ESTIMATION MANIFOLD LEARNING 17 / 20 https://en.wikipedia.org/wiki/Density_estimation#/media/File:KernelDensityGaussianAnimated.gif p̂(x) = (1/n) Σ_{i=1}^n N(x; x_i, σ_i²) • Demonstration of density estimation using kernel smoothing • The true density is a mixture of two Gaussians centered around 0 and 3, shown with a solid blue curve. • In each frame, 100 samples are generated from the distribution, shown in red. • Centered on each sample, a Gaussian kernel is drawn in gray. • Averaging the Gaussians yields the density estimate shown in the dashed black curve. 1D Example
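The averaged-kernel estimate p̂(x) = (1/n) Σ_{i=1}^n N(x; x_i, σ²) can be written directly; a 1-D sketch in plain Python (the function name is mine):

```python
import math

def parzen_density(x, samples, sigma=1.0):
    """Classical Parzen-window estimate: the average of Gaussian
    kernels centered on each training sample (1-D sketch)."""
    k = 1.0 / (math.sqrt(2 * math.pi) * sigma)
    return sum(k * math.exp(-(x - xi) ** 2 / (2 * sigma ** 2))
               for xi in samples) / len(samples)
```

With a single sample at 0 and σ = 1, the estimate at x = 0 is simply the Gaussian peak 1/√(2π).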
• 44. Parzen WindowsDENSITY ESTIMATION MANIFOLD LEARNING 18 / 20 2D Example http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder Classical Parzen Windows - Isotropic Gaussian centered on each training point: p̂(x) = (1/n) Σ_{i=1}^n N(x; x_i, σ_i²I) Manifold Parzen Windows - Oriented Gaussian centered on each training point - Use local PCA to get C_i - High variance directions from PCA on k nearest neighbors: p̂(x) = (1/n) Σ_{i=1}^n N(x; x_i, C_i) Non-local Manifold Parzen Windows - High variance directions and center output by neural network trained to maximize likelihood of k nearest neighbors: p̂(x) = (1/n) Σ_{i=1}^n N(x; μ(x_i), C(x_i)) Bengio, Larochelle, Vincent NIPS 2006 Vincent and Bengio, NIPS 2003
• 45. Parzen WindowsDENSITY ESTIMATION MANIFOLD LEARNING 19 / 20 http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
• 46. LIMITATION MANIFOLD LEARNING 20 / 20 • Isomap • Locally-linear embedding (LLE) Dimensionality Reduction • Isotropic parzen window • Manifold parzen window • Non-local manifold parzen window Non-parametric Density Estimation • They explicitly use distance based neighborhoods. • Training with k-nearest neighbors, or pairs of points. • Typically Euclidean neighbors • But in high d, your nearest Euclidean neighbor can be very different from you Neighborhood based training !!! Euclidean distance between high-dimensional data points is unlikely to be a meaningful notion of distance. http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
• 47. 01. Revisit Deep Neural Networks 02. Manifold Learning 03. Autoencoders 04. Variational Autoencoders 05. Applications We explain the autoencoder and how it differs from the seemingly similar PCA and RBM. We then cover the Denoising Autoencoder, which adds a stochastic perturbation to the input, and the Contractive Autoencoder, which replaces the perturbation with an analytic regularization term. • Autoencoder (AE) • Denoising AE (DAE) • Contractive AE (CAE)
• 48. TerminologyINTRODUCTION AUTOENCODERS 1 / 24 • Code • Latent Variable • Feature • Hidden representation Encoding Undercomplete Decoding Overcomplete Autoencoders = Auto-associators = Diabolo networks = Sandglass-shaped net Diabolo x y z http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
• 49. NotationsINTRODUCTION 2 / 24 z Encoder h(.) x y • Make output layer same size as input layer x, y ∈ ℝ^d • Loss encourages output to be close to input L(x, y) • Unsupervised Learning → Supervised Learning Decoder g(.) L(x, y) z = h(x) ∈ ℝ^{d_z} y = g(z) = g(h(x)) L_AE = Σ_{x∈D} L(x, y) A network whose output equals its input: the unsupervised learning problem is recast and solved as a supervised one. The decoder can at least generate the training data, so generated samples resemble the training data. The encoder can at least represent the training data well as latent vectors, so it is widely used for data abstraction. AUTOENCODERS http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
• 50. Multi-Layer PerceptronLINEAR AUTOENCODER 3 / 24 L(x, y) x ∈ ℝ^d y ∈ ℝ^d z ∈ ℝ^{d_z} z = h(x) y = g(h(x)) h(∙) g(∙) input output reconstruction error Encoder Decoder latent vector Minimize L_AE = Σ_{x∈D} L(x, g(h(x))) h(x) = W_e x + b_e g(h(x)) = W_d z + b_d ‖x − y‖² or cross-entropy General Autoencoder Linear Autoencoder A structure with one hidden layer and fully-connected links between layers AUTOENCODERS http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
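A minimal sketch of the one-hidden-layer autoencoder above in plain Python, with a sigmoid encoder/decoder and squared-error reconstruction loss (weights are passed in explicitly; the function names are my own):

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def autoencode(x, We, be, Wd, bd):
    """One-hidden-layer autoencoder forward pass (minimal sketch):
    z = h(x) = sigmoid(We x + be), y = g(z) = sigmoid(Wd z + bd).
    Returns (latent vector z, reconstruction y, squared error L(x, y))."""
    z = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(We, be)]
    y = [sigmoid(sum(w * zi for w, zi in zip(row, z)) + b)
         for row, b in zip(Wd, bd)]
    err = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return z, y, err
```

Training would minimize the returned error summed over the dataset; only the forward pass is shown here.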
• 51. Connection to PCA & RBMLINEAR AUTOENCODER 4 / 24 Principal Component Analysis Restricted Boltzmann Machine • For bottleneck structure : d_z < d • With linear neurons and squared loss, autoencoder learns same subspace as PCA • Also true with a single sigmoidal hidden layer, if using linear output neurons with squared loss and untied weights. • Won't learn the exact same basis as PCA, but W will span the same subspace. Baldi, Pierre, & Hornik, Kurt. 1989. Neural networks and principal component analysis: Learning from examples without local minima. Neural networks, 2(1), 53–58. • With a single hidden layer with sigmoid non-linearity and sigmoid output non-linearity. • Tie encoder and decoder weights: W_d = W_e^T AUTOENCODERS http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder Autoencoder RBM z_i = σ(W_ei x + b_ei) P(h_i = 1|v) = σ(W_ei v + b_ei) y_j = σ(W_ej^T z + b_dj) P(v_j = 1|h) = σ(W_ej^T h + b_dj) Deterministic mapping: z is a function of x Stochastic mapping: z is a random variable h = f_θ(x) = W(x − μ) with θ = {W, μ} in PCA slide
• 52. RBMPRETRAINING 5 / 24 AUTOENCODERS Stacking RBM → Deep Belief Network (DBN) https://www.cs.toronto.edu/~hinton/science.pdf Reducing the Dimensionality of Data with Neural Networks
• 53.–56. AutoencoderPRETRAINING 6–9 / 24 AUTOENCODERS Stacking Autoencoder [Diagrams: greedy layer-wise pretraining of a 784-1000-1000-500-10 network. First train a 784→1000→784 autoencoder to learn W1; fix W1 and train a 1000→1000 autoencoder on its codes to learn W2; fix W2 and train a 1000→500 autoencoder to learn W3; finally add a randomly initialized output layer W4 and fine-tune the whole network by backpropagation.] http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/auto.pptx
• 57. IntroductionDAE 10 / 24 L(x, y) x̃ ∈ ℝ^d y ∈ ℝ^d z ∈ ℝ^{d_z} z = h(x̃) y = g(h(x̃)) h(∙) g(∙) corrupted input output reconstruction error Encoder Decoder latent vector Minimize L_DAE = Σ_{x∈D} E_{q(x̃|x)}[L(x, g(h(x̃)))] x ∈ ℝ^d input add random noise q(x̃|x) AUTOENCODERS Denoising AutoEncoder http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
• 58. Adding noiseDAE 11 / 24 Denoising corrupted input • will encourage representation that is robust to small perturbations of the input • Yields similar or better classification performance as deep neural net pre-training Possible corruptions • Zeroing pixels at random (now called dropout noise) • Additive Gaussian noise • Salt-and-pepper noise • Etc Cannot compute expectation exactly • Use sampling of corrupted inputs: L_DAE = Σ_{x∈D} E_{q(x̃|x)}[L(x, g(h(x̃)))] ≈ Σ_{x∈D} (1/L) Σ_{i=1}^L L(x, g(h(x̃_i))), i.e. the expectation is replaced by an average over L samples AUTOENCODERS http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder
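The corruption processes q(x̃|x) listed above can be sketched directly; a minimal illustration treating an image as a flat list of pixel values (the function and its argument names are mine, the noise types are those from the slide):

```python
import random

def corrupt(x, noise="masking", level=0.25, rng=None):
    """Sample a corrupted input x~ from q(x~|x) (minimal sketch):
    zero-masking, salt-and-pepper, or additive Gaussian noise."""
    rng = rng or random.Random(0)
    if noise == "masking":       # zero pixels at random (dropout noise)
        return [0.0 if rng.random() < level else v for v in x]
    if noise == "salt_pepper":   # flip corrupted pixels to 0 or 1
        return [float(rng.random() < 0.5) if rng.random() < level else v
                for v in x]
    # additive Gaussian noise with standard deviation `level`
    return [v + rng.gauss(0.0, level) for v in x]
```

During DAE training, each minibatch would pass `corrupt(x)` into the encoder while the loss still compares the reconstruction against the clean x.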
• 59. Manifold interpretationDAE 12 / 24 • Suppose training data (x) concentrate near a low dimensional manifold. • Corrupted examples (●) obtained by applying corruption process q(x̃|x) will generally lie farther from the manifold. • The model learns with p(x|x̃) to "project them back" (via autoencoder g(h(x̃))) onto the manifold. • Intermediate representation z = h(x) may be interpreted as a coordinate system for points x on the manifold. http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder AUTOENCODERS
• 60. DAE 13 / 24 Filters in a 1-hidden-layer architecture must capture low-level features of images. x z = σ_e(Wx + b_e) y = σ_d(W^T z + b_d) AutoEncoder with 1 hidden layer Performance – Visualization of learned filters AUTOENCODERS
• 61. DAE 14 / 24 Natural image patches (12x12 pixels) : 100 hidden units Because the filters are initialized with random values, filters that look like noise are poorly trained, while filters that look like edge detectors are well trained. 10% salt-and-pepper noise • Mean Squared Error • 100 hidden units • Salt-and-pepper noise Performance – Visualization of learned filters http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder AUTOENCODERS
• 62. Performance – Visualization of learned filtersDAE 15 / 24 MNIST digits (64x64 pixels) 25% corruption • Cross Entropy • 100 hidden units • Zero-masking noise http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder AUTOENCODERS
• 63. Performance – PretrainingDAE 16 / 24 Stacked Denoising Auto-Encoders (SDAE) bgImgRot Data Train/Valid/Test : 10k/2k/20k Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion AUTOENCODERS
• 64. Performance – PretrainingDAE 17 / 24 Stacked Denoising Auto-Encoders (SDAE) bgImgRot Data Train/Valid/Test : 10k/2k/20k Zero-masking noise SAE Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion AUTOENCODERS
• 65. Performance – GenerationDAE 18 / 24 Bernoulli input input Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion AUTOENCODERS
• 66. Encouraging representation to be insensitive to corruptionDAE 19 / 24 DAE encourages reconstruction to be insensitive to input corruption Alternative: encourage representation to be insensitive Tied weights i.e. W′ = W^T prevent W from collapsing h to 0. L_SCAE = Σ_{x∈D} L(x, g(h(x))) + λ E_{q(x̃|x)}[‖h(x) − h(x̃)‖²] Since the regularization term can also become 0 by collapsing h to 0, tied weights are used to prevent this. The DAE loss can be interpreted as requiring that h, in particular, map a slightly changed input x to the same point on the manifold. Reconstruction Error Stochastic Regularization Stochastic Contractive AutoEncoder (SCAE) http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder AUTOENCODERS
• 67. From stochastic to analytic penaltyCAE 20 / 24 SCAE stochastic regularization term : E_{q(x̃|x)}[‖h(x) − h(x̃)‖²] For small additive noise, x̃|x = x + ε, ε ~ N(0, σ²I) Taylor series expansion yields h(x̃) = h(x + ε) = h(x) + (∂h/∂x)ε + ⋯ It can be shown that E_{q(x̃|x)}[‖h(x) − h(x̃)‖²] ≈ ‖(∂h/∂x)(x)‖_F² Stochastic Regularization (SCAE) → Analytic Regularization (CAE) Contractive AutoEncoder (CAE) http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder AUTOENCODERS Frobenius Norm ‖A‖_F² = Σ_{i=1}^m Σ_{j=1}^n a_ij²
• 68. Analytic contractive regularization term is • Frobenius norm of the Jacobian of the non-linear mapping Penalizing ‖J_h(x)‖_F² encourages the mapping to the feature space to be contractive in the neighborhood of the training data. Loss functionCAE 21 / 24 L_CAE = Σ_{x∈D} L(x, g(h(x))) + λ ‖(∂h/∂x)(x)‖_F² Reconstruction Error Analytic Contractive Regularization For training examples, encourages both: • small reconstruction error • representation insensitive to small variations around example ‖(∂h/∂x)(x)‖_F² = Σ_{ij} (∂z_j/∂x_i (x))² = ‖J_h(x)‖_F² highlights the advantages for representations to be locally invariant in many directions of change of the raw input. Contractive Auto-Encoders: Explicit Invariance During Feature Extraction AUTOENCODERS
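For a sigmoid encoder the Jacobian penalty ‖∂h/∂x‖_F² has a simple closed form, since row j of the Jacobian is z_j(1 − z_j) · W_e[j]. A minimal sketch in plain Python (the function names are mine):

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def contractive_penalty(x, We, be):
    """||dh/dx||_F^2 for a sigmoid encoder h(x) = sigmoid(We x + be).
    Row j of the Jacobian is z_j * (1 - z_j) * We[j], so the squared
    Frobenius norm reduces to the closed form below (minimal sketch)."""
    z = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(We, be)]
    return sum((zj * (1 - zj)) ** 2 * sum(w * w for w in row)
               for zj, row in zip(z, We))
```

The CAE loss would add λ times this penalty to the usual reconstruction error.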
• 69. Performance – Visualization of learned filtersCAE 22 / 24 CIFAR-10 (32x32 pixels) MNIST digits (64x64 pixels) • 2000 hidden units • 4000 hidden units Gaussian noise http://videolectures.net/deeplearning2015_vincent_autoencoders/?q=vincent%20autoencoder AUTOENCODERS
• 70. Performance – PretrainingCAE 23 / 24 • DAE-g : DAE with gaussian noise • DAE-b : DAE with binary masking noise • CIFAR-bw : gray scale version • Training/Validation/test : 10k/2k/50k • SAT : average fraction of saturated units per sample • 1-hidden layer with 1000 units Contractive Auto-Encoders: Explicit Invariance During Feature Extraction AUTOENCODERS
• 71. Performance – PretrainingCAE 24 / 24 • basic: smaller subset of MNIST • rot: digits with added random rotation • bg-rand: digits with random noise background • bg-img: digits with random image background • bg-img-rot: digits with rotation and image background • rect: discriminate between tall and wide rectangles (white on black) • rect-img: discriminate between tall and wide rectangular image on a different background image http://www.iro.umontreal.ca/~lisa/icml2007 Contractive Auto-Encoders: Explicit Invariance During Feature Extraction AUTOENCODERS
• 72. 01. Revisit Deep Neural Networks 02. Manifold Learning 03. Autoencoders 04. Variational Autoencoders 05. Applications We describe the Variational Autoencoder as a generative model; the Conditional Variational Autoencoder, which adds a condition to control the generated output; and the Adversarial Autoencoder, which can work with an arbitrary prior distribution. • Variational AE (VAE) • Conditional VAE (CVAE) • Adversarial AE (AAE) KEYWORD: Generative model learning
  • 73. Sample GenerationGENERATIVE MODEL VAE 1 / 49 https://github.com/mingyuliutw/cvpr2017_gan_tutorial/blob/master/gan_tutorial.pdf Training Examples Density Estimation Sampling
• 74. Latent Variable ModelGENERATIVE MODEL VAE 2 / 49 z → Generator g_θ(.) → x Latent Variable, Target Data Latent variable can be seen as a set of control parameters for target data (generated data) For MNIST example, our model can be trained to generate an image which matches a digit value z randomly sampled from the set [0, ..., 9]. Hence it is convenient if p(z) is easy to sample from. z ~ p(z), x = g_θ(z) Random variable, Random variable g_θ(∙) Deterministic function parameterized by θ p(x|g_θ(z)) = p_θ(x|z) We are aiming to maximize the probability of each x in the training set, under the entire generative process, according to: ∫ p(x|g_θ(z)) p(z) dz = p(x)
• 75. Prior distribution p(z)GENERATIVE MODEL 3 / 49 Yes!!! Recall that p(x|g_θ(z)) = N(x; g_θ(z), σ²·I). If g_θ(z) is a multi-layer neural network, then we can imagine the network using its first few layers to map the normally distributed z's to the latent values (like digit identity, stroke weight, angle, etc.) with exactly the right statistics. Then it can use later layers to map those latent values to a fully-rendered digit. Tutorial on Variational Autoencoders : https://arxiv.org/pdf/1606.05908 VAE Question: Is it enough to model p(z) with a simple distribution like the normal distribution? When the generator has several layers, the first few layers can perform a mapping, possibly complicated but exactly right, into the latent space, and the remaining layers can generate the image that matches the latent vector.
• 76. GENERATIVE MODEL 4 / 49 If p(x|g_θ(z)) = N(x; g_θ(z), σ²·I), the negative log probability of X is proportional to the squared Euclidean distance between g_θ(z) and x. x : Figure 3(a) z_bad → g_θ(z_bad) : Figure 3(b) z_good → g_θ(z_good) : Figure 3(c) (identical to x but shifted down and to the right by half a pixel) ‖x − g_θ(z_bad)‖² < ‖x − g_θ(z_good)‖² → p(x|g_θ(z_bad)) > p(x|g_θ(z_good)) Solution 1: we should set the σ hyperparameter of our Gaussian distribution such that this kind of erroneous digit does not contribute to p(X) → hard.. Solution 2: we would likely need to sample many thousands of digits from z_good → hard.. p(x) ≈ Σ_i p(x|g_θ(z_i)) p(z_i) VAE Question: Why don't we use maximum likelihood estimation directly? When the generator's likelihood model is Gaussian, samples that are closer in the MSE sense contribute more to p(x). But many images have smaller MSE without being semantically closer, so in practice it is hard to compute a correct probability value this way. Tutorial on Variational Autoencoders : https://arxiv.org/pdf/1606.05908 p_θ(x|z)
• 77. ELBO : Evidence LowerBOundVARIATIONAL INFERENCE 5 / 49 VAE One way to make Solution 2 from the previous slide feasible is to sample z not from the normal prior but from a distribution p(z|x) that is likely to yield samples meaningfully similar to x. However, since we do not know what p(z|x) is, we pick one of the distributions we do know, q_φ(z|x), and adjust its parameters (λ) to make it resemble p(z|x). (Variational Inference) https://www.slideshare.net/haezoom/variational-autoencoder-understanding-variational-autoencoder-from-various-perspectives http://shakirm.com/papers/VITutorial.pdf p(z|x) ≈ q_φ(z|x) → z → Generator g_θ(.) → x Latent Variable, Target Data One possible solution : sampling z from p(z|x) [ Variational Inference ]
• 78. ELBO : Evidence LowerBOundVARIATIONAL INFERENCE 6 / 49 VAE [ Jensen's Inequality ] For concave functions f(.): f(E[x]) ≥ E[f(x)]; f(.) = log(.) is concave log p(x) = log ∫ p(x|z) p(z) dz ≥ ∫ log p(x|z) p(z) dz With variational inference: log p(x) = log ∫ p(x|z) (p(z)/q_φ(z|x)) q_φ(z|x) dz ≥ ∫ log( p(x|z) p(z)/q_φ(z|x) ) q_φ(z|x) dz log p(x) ≥ ∫ log p(x|z) q_φ(z|x) dz − ∫ log(q_φ(z|x)/p(z)) q_φ(z|x) dz = E_{q_φ(z|x)}[log p(x|z)] − KL(q_φ(z|x) ‖ p(z)) = ELBO(φ) Variational lower bound = Evidence lower bound (ELBO) Relationship among p(x), p(z|x), q_φ(z|x) : Derivation 1 Once we find the φ* that maximizes the ELBO, log p(x) ≈ E_{q_φ*(z|x)}[log p(x|z)] − KL(q_φ*(z|x) ‖ p(z)).
• 79. ELBO : Evidence LowerBOundVARIATIONAL INFERENCE 7 / 49 log p(x) = ∫ log p(x) q_φ(z|x) dz ← ∫ q_φ(z|x) dz = 1 = ∫ log( p(x, z)/p(z|x) ) q_φ(z|x) dz ← p(x) = p(x, z)/p(z|x) = ∫ log( (p(x, z)/q_φ(z|x)) ∙ (q_φ(z|x)/p(z|x)) ) q_φ(z|x) dz = ∫ log( p(x, z)/q_φ(z|x) ) q_φ(z|x) dz + ∫ log( q_φ(z|x)/p(z|x) ) q_φ(z|x) dz = ELBO(φ) + KL(q_φ(z|x) ‖ p(z|x)) VAE The KL term, the distance between the two distributions, is ≥ 0. We would like the φ of q_φ(z|x) that minimizes the KL, but since p(z|x) is unknown, instead of minimizing the KL we find the φ that maximizes the ELBO. Relationship among p(x), p(z|x), q_φ(z|x) : Derivation 2
• 80. 8 / 49 log p(x) = ELBO(φ) + KL(q_φ(z|x) ‖ p(z|x)) q_φ*(z|x) = argmax_φ ELBO(φ) ELBO(φ) = ∫ log( p(x, z)/q_φ(z|x) ) q_φ(z|x) dz = ∫ log( p(x|z) p(z)/q_φ(z|x) ) q_φ(z|x) dz = ∫ log p(x|z) q_φ(z|x) dz − ∫ log( q_φ(z|x)/p(z) ) q_φ(z|x) dz = E_{q_φ(z|x)}[log p(x|z)] − KL(q_φ(z|x) ‖ p(z)) ELBO : Evidence LowerBOundVARIATIONAL INFERENCE VAE Note that this KL has different arguments from the KL on the previous slide. Relationship among p(x), p(z|x), q_φ(z|x) : Derivation 2
• 81. 9 / 49 DerivationLOSS FUNCTION VAE p(z|x) ≈ q_φ(z|x) → z → Generator g_θ(.) → x Latent Variable, Target Data log p(x) ≥ E_{q_φ(z|x)}[log p(x|z)] − KL(q_φ(z|x) ‖ p(z)) = ELBO(φ) Optimization Problem 1 on φ: Variational Inference Optimization Problem 2 on θ: Maximum likelihood −Σ_i log p(x_i) ≤ −Σ_i ( E_{q_φ(z|x_i)}[log p(x_i|g_θ(z))] − KL(q_φ(z|x_i) ‖ p(z)) ) Final Optimization Problem: argmin_{φ,θ} Σ_i ( −E_{q_φ(z|x_i)}[log p(x_i|g_θ(z))] + KL(q_φ(z|x_i) ‖ p(z)) )
  • 82. 10 / 49 NeuralNet PerspectiveLOSS FUNCTION VAE 𝐿𝑖 𝜙, 𝜃, 𝑥𝑖 arg min 𝜙,𝜃 ෍ 𝑖 −𝔼 𝑞 𝜙 𝑧|𝑥 𝑖 log 𝑝 𝑥𝑖|𝑔 𝜃(𝑧) + 𝐾𝐿 𝑞 𝜙 𝑧|𝑥𝑖 |𝑝 𝑧 𝑥𝑥 𝑔 𝜃(∙)𝑞 𝜙 ∙ 𝑞 𝜙 𝑧|𝑥 ~𝑧 𝑔 𝜃 𝑥|𝑧 Encoder Posterior Inference Network Decoder Generator Generation Network SAMPLING The mathematical basis of VAEs actually has relatively little to do with classical autoencoders Tutorial on Variational Autoencoders : https://arxiv.org/pdf/1606.05908
• 83. ExplanationLOSS FUNCTION 11 / 49 L_i(φ, θ, x_i) = −E_{q_φ(z|x_i)}[log p(x_i|g_θ(z))] + KL(q_φ(z|x_i) ‖ p(z)) Reconstruction Error + Regularization VAE Reconstruction Error: • the negative log-likelihood under the current sampling distribution • the reconstruction error for x_i (the autoencoder view) • the likelihood of the original data Regularization: • an extra condition on the current sampling distribution • conditions for ease of sampling and for controllability of the generated data are imposed on the prior, and q is required to resemble it • the prior is chosen among tractable distributions; q is chosen from the approximation class for variational inference argmin_{φ,θ} Σ_i ( −E_{q_φ(z|x_i)}[log p(x_i|g_θ(z))] + KL(q_φ(z|x_i) ‖ p(z)) )
  • 84. RegularizationLOSS FUNCTION 12 / 49 𝑞 𝜙 𝑧|𝑥𝑖 ~𝑁(𝜇𝑖, 𝜎𝑖 2 𝐼) Assumption 1 [Encoder : approximation class] multivariate gaussian distribution with a diagonal covariance 𝑝(𝑧) ~𝑁(0, 𝐼) Assumption 2 [prior] multivariate normal distribution VAE 𝐿𝑖 𝜙, 𝜃, 𝑥𝑖 = −𝔼 𝑞 𝜙 𝑧|𝑥 𝑖 log 𝑥𝑖|𝑔 𝜃(𝑧) + 𝐾𝐿 𝑞 𝜙 𝑧|𝑥𝑖 |𝑝 𝑧 Regularization Assumptions 𝑥𝑖 𝑞 𝜙 𝑧|𝑥𝑖 𝜇𝑖 𝜎𝑖
• 85. RegularizationLOSS FUNCTION 13 / 49 VAE Regularization KL divergence KL(q_φ(z|x_i) ‖ p(z)) = ½ ( tr(σ_i²I) + μ_i^T μ_i − J + ln(1/∏_{j=1}^J σ_{i,j}²) ) = ½ ( Σ_{j=1}^J σ_{i,j}² + Σ_{j=1}^J μ_{i,j}² − J − Σ_{j=1}^J ln σ_{i,j}² ) = ½ Σ_{j=1}^J ( μ_{i,j}² + σ_{i,j}² − ln σ_{i,j}² − 1 ) posterior, prior x_i → q_φ(z|x_i) → μ_i, σ_i (dimension J) Easy to compute!! L_i(φ, θ, x_i) = −E_{q_φ(z|x_i)}[log p(x_i|g_θ(z))] + KL(q_φ(z|x_i) ‖ p(z))
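The closed-form regularizer above, KL(N(μ, σ²I) ‖ N(0, I)) = ½ Σ_j (μ_j² + σ_j² − ln σ_j² − 1), is a one-liner. A sketch parameterized by log σ², as implementations commonly do for numerical stability (the function name is mine):

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in closed form:
    0.5 * sum_j ( mu_j^2 + sigma_j^2 - ln sigma_j^2 - 1 )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1
                     for m, lv in zip(mu, log_var))
```

The KL is 0 exactly when μ = 0 and σ² = 1, i.e. when the posterior already equals the prior.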
• 86. Reconstruction errorLOSS FUNCTION 14 / 49 VAE Sampling x_i → q_φ(z|x_i) → μ_i, σ_i Reconstruction Error E_{q_φ(z|x_i)}[log p_θ(x_i|z)] = ∫ log p_θ(x_i|z) q_φ(z|x_i) dz ≈ (1/L) Σ_{l=1}^L log p_θ(x_i|z^{i,l}) ← Monte-carlo technique z^{i,1}, z^{i,2}, …, z^{i,l}, …, z^{i,L} → log p_θ(x_i|z^{i,1}), …, log p_θ(x_i|z^{i,L}) → mean → Reconstruction Error • L is the number of samples for latent vector • Usually L is set to 1 for convenience SAMPLING https://home.zhaw.ch/~dueo/bbs/files/vae.pdf L_i(φ, θ, x_i) = −E_{q_φ(z|x_i)}[log p(x_i|g_θ(z))] + KL(q_φ(z|x_i) ‖ p(z))
• 87. Reconstruction errorLOSS FUNCTION 15 / 49 VAE Reparameterization Trick Sampling Process z^{i,l} ~ N(μ_i, σ_i²I) is rewritten as z^{i,l} = μ_i + σ_i ⊙ ε, ε ~ N(0, I) Same distribution! But it makes backpropagation possible!! https://home.zhaw.ch/~dueo/bbs/files/vae.pdf
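A sketch of the trick: instead of sampling z ~ N(μ, σ²I) directly, sample ε ~ N(0, I) and compute z = μ + σ ⊙ ε, so that μ and σ sit on a deterministic, differentiable path (the function name is mine; σ is parameterized as log σ²):

```python
import math
import random

def reparameterize(mu, log_var, rng=None):
    """z = mu + sigma * eps with eps ~ N(0, I): distributed exactly as
    z ~ N(mu, sigma^2 I), but the randomness is isolated in eps, so
    gradients can flow through mu and sigma (minimal sketch)."""
    rng = rng or random.Random(0)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]
```

As σ → 0 the sample collapses onto μ, which matches the intuition that the noise scale is entirely carried by σ.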
• 88. Reconstruction errorLOSS FUNCTION 16 / 49 VAE Assumption Reconstruction Error E_{q_φ(z|x_i)}[log p_θ(x_i|z)] = ∫ log p_θ(x_i|z) q_φ(z|x_i) dz ≈ (1/L) Σ_l log p_θ(x_i|z^{i,l}) ≈ log p_θ(x_i|z^i) ← Monte-carlo technique with L = 1 z^i → g_θ(∙) → p_i Assumption 3-1 [Decoder, likelihood]: multivariate Bernoulli or Gaussian distribution p_θ(x_i|z^i) ~ Bernoulli(p_i): log p_θ(x_i|z^i) = log ∏_{j=1}^D p_θ(x_{i,j}|z^i) = Σ_{j=1}^D log p_θ(x_{i,j}|z^i) = Σ_{j=1}^D log p_{i,j}^{x_{i,j}} (1 − p_{i,j})^{1−x_{i,j}} ← p_{i,j} ≗ network output = Σ_{j=1}^D x_{i,j} log p_{i,j} + (1 − x_{i,j}) log(1 − p_{i,j}) ← Cross entropy L_i(φ, θ, x_i) = −E_{q_φ(z|x_i)}[log p(x_i|g_θ(z))] + KL(q_φ(z|x_i) ‖ p(z))
• 89. Reconstruction errorLOSS FUNCTION 17 / 49 VAE Assumption Reconstruction Error E_{q_φ(z|x_i)}[log p_θ(x_i|z)] ≈ log p_θ(x_i|z^i) z^i → g_θ(∙) → μ_i, σ_i Assumption 3-2 [Decoder, likelihood]: multivariate Bernoulli or Gaussian distribution log p_θ(x_i|z^i) = log N(x_i; μ_i, σ_i²I) = −Σ_{j=1}^D ( ½ log σ_{i,j}² + (x_{i,j} − μ_{i,j})²/(2σ_{i,j}²) ) For a Gaussian distribution with identity covariance, log p_θ(x_i|z^i) ∝ −Σ_{j=1}^D (x_{i,j} − μ_{i,j})² ← Squared Error L_i(φ, θ, x_i) = −E_{q_φ(z|x_i)}[log p(x_i|g_θ(z))] + KL(q_φ(z|x_i) ‖ p(z))
• 90. Default : Gaussian Encoder + Bernoulli DecoderSTRUCTURE 18 / 49 VAE x_i → q_φ(∙) → μ_i, σ_i → z^i = μ_i + σ_i ⊙ ε_i, ε_i ~ N(0, I) (SAMPLING, Reparameterization Trick) → g_θ(∙) → p_i Gaussian Encoder (dimension J) Bernoulli Decoder (dimension D) Reconstruction Error: −Σ_{j=1}^D x_{i,j} log p_{i,j} + (1 − x_{i,j}) log(1 − p_{i,j}) Regularization : ½ Σ_{j=1}^J ( μ_{i,j}² + σ_{i,j}² − ln σ_{i,j}² − 1 )
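Putting the two terms on this slide together, the per-example loss of the Gaussian-encoder + Bernoulli-decoder VAE is cross-entropy reconstruction plus the closed-form KL. A minimal sketch given the network outputs (the function name is mine):

```python
import math

def vae_loss(x, p, mu, log_var):
    """Per-example VAE loss for a Gaussian encoder + Bernoulli decoder:
    cross-entropy reconstruction term plus the closed-form KL regularizer.
    x: inputs in [0, 1]; p: decoder outputs p_i in (0, 1);
    mu, log_var: encoder outputs for q(z|x) (minimal sketch)."""
    recon = -sum(xi * math.log(pi) + (1 - xi) * math.log(1 - pi)
                 for xi, pi in zip(x, p))
    kl = 0.5 * sum(m * m + math.exp(lv) - lv - 1
                   for m, lv in zip(mu, log_var))
    return recon + kl
```

Minimizing this quantity averaged over the dataset is exactly minimizing the negative ELBO of the slides.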
  • 91. Gaussian Encoder + Gaussian DecoderSTRUCTURE 19 / 49 VAE 𝑥𝑖 𝑞 𝜙 ∙ 𝜇𝑖 𝜎𝑖 𝜇𝑖 ′ 𝑔 𝜃(∙) 𝑧 𝑖 𝜖𝑖 SAMPLING Reparameterization Trick Gaussian Encoder Gaussian Decoder Reconstruction Error: σ 𝑗=1 𝐷 1 2 log 𝜎𝑖,𝑗 ′2 + 𝑥 𝑖,𝑗−𝜇 𝑖,𝑗 ′ 2 2𝜎𝑖,𝑗 ′2 Regularization : 1 2 σ 𝑗=1 𝐽 𝜇𝑖,𝑗 2 + 𝜎𝑖,𝑗 2 − ln 𝜎𝑖,𝑗 2 − 1 𝐷 𝐽 𝜎𝑖 ′ 𝒩(0, 𝐼)
  • 92. Gaussian Encoder + Gaussian Decoder with Identity CovarianceSTRUCTURE 20 / 49 VAE 𝑥𝑖 𝑞 𝜙 ∙ 𝜇𝑖 𝜎𝑖 𝜇𝑖 𝑔 𝜃(∙) 𝑧 𝑖 𝜖𝑖 SAMPLING Reparameterization Trick Gaussian Encoder Gaussian Decoder Reconstruction Error: σ 𝑗=1 𝐷 𝑥 𝑖,𝑗−𝜇 𝑖,𝑗 ′ 2 2 Regularization : 1 2 σ 𝑗=1 𝐽 𝜇𝑖,𝑗 2 + 𝜎𝑖,𝑗 2 − ln 𝜎𝑖,𝑗 2 − 1 𝐷 𝐽 𝒩(0, 𝐼)
  • 93. MNISTRESULT 21 / 49 28 VAE 𝑥𝑖 𝑞 𝜙 ∙ 𝜇𝑖 𝜎𝑖 𝑝𝑖 𝑔 𝜃(∙) 𝑧 𝑖 𝜖𝑖 SAMPLING Reparameterization Trick Gaussian Encoder Bernoulli Decoder 𝐷 𝐽 28 𝐷=784 MLP with 2 hidden layers (500, 500) Architecture 𝒩(0, 𝐼)
  • 94. MNISTRESULT 22 / 49 VAE Reproduce Input image J = |z| =2 J = |z| =5 J = |z| =20 https://github.com/hwalsuklee/tensorflow-mnist-VAE
  • 95. MNISTRESULT 23 / 49 VAE Denoising Input image + zero-masking noise with 50% prob. + salt&peppr noise with 50% prob. Restored image https://github.com/hwalsuklee/tensorflow-mnist-VAE
• 96. MNISTRESULT 24 / 49 VAE Learned Manifold https://github.com/hwalsuklee/tensorflow-mnist-VAE AE VAE • Shows where 5,000 of the test samples are mapped on the manifold. • The results of six training runs are shown as an animation. • From the generation point of view, it is better if the manifold locations being handled stay stable.
• 97. MNISTRESULT 25 / 49 VAE Learned Manifold The better the training, the more the z's that generate the same digit cluster together in the 2D space, and the more the z's that generate different digits are separated. z1 z2 https://github.com/hwalsuklee/tensorflow-mnist-VAE A A B B C C D D
  • 98. IntroductionCVAE 26 / 49 VAE Conditional VAE 𝑥 ℎ ℎ 𝜇𝜎 𝑧 𝑥 ℎ ℎ 𝜖 𝑞 𝜆 𝑧|𝑥 𝑝 𝜃 𝑥|𝑧 Vanilla VAE (M1) CVAE (M2) : supervised version 𝑥 ℎ ℎ 𝜇𝜎 𝑧 𝑥 ℎ ℎ 𝜖 𝑞 𝜆 𝑧|𝑥, 𝑦 𝑝 𝜃 𝑥|𝑧, 𝑦 𝑦 𝑦 Condition on latent space Condition on output
• 99. M2CVAE 27 / 49 VAE Summary CVAE (M2) : supervised version x → h → h → μ, σ → z; ε; q_λ(z|x, y), p_θ(x|z, y); y; Condition on latent space, Condition on output log p_θ(x, y) = log ∫ p_θ(x, y|z) (p(z)/q_φ(z|x, y)) q_φ(z|x, y) dz ≥ ∫ log( p_θ(x, y|z) p(z)/q_φ(z|x, y) ) q_φ(z|x, y) dz = ∫ log( p_θ(x|y, z) p(y) p(z)/q_φ(z|x, y) ) q_φ(z|x, y) dz = E_{q_φ(z|x,y)}[log p_θ(x|y, z) + log p(y)] − KL(q_φ(z|x, y) ‖ p(z)) = −L(x, y) ← ELBO!!
  • 100. M3CVAE 28 / 49 VAE Summary CVAE (M2) : unsupervised version 𝑥 ℎ ℎ 𝜇𝜎 𝑧 𝑥 ℎ ℎ 𝜖 𝑦 ℎ ℎ CVAE (M3) Train M1 Train M2
  • 101. 29 / 49 VAE Architecture : M2 supervised version MNIST resultsCVAE 𝑥 ℎ ℎ 𝜇𝜎 𝑧 𝑥 ℎ ℎ 𝜖 𝑞 𝜆 𝑧|𝑥, 𝑦 𝑝 𝜃 𝑥|𝑧, 𝑦 𝑦 𝑦 Label info MLP with 2 hidden layers (500, 500) MLP with 2 hidden layers (500, 500) Label info
  • 102. 30 / 49 input CVAE, epoch 1 VAE, epoch 1 CVAE, epoch 20 VAE, epoch 20 VAE Reproduce |z| = 2 MNIST resultsCVAE https://github.com/hwalsuklee/tensorflow-mnist-CVAE
  • 103. 31 / 49 CVAE, epoch 1 VAE, epoch 1 CVAE, epoch 20 VAE, epoch 20 input VAE Denoising |z| = 2 MNIST resultsCVAE https://github.com/hwalsuklee/tensorflow-mnist-CVAE
  • 104. 32 / 49 VAE Handwriting styles obtained by fixing the class label and varying z |z| = 2 y=[1,0,0,0,0,0,0,0,0,0] y=[0,1,0,0,0,0,0,0,0,0] y=[0,0,1,0,0,0,0,0,0,0] y=[0,0,0,1,0,0,0,0,0,0] y=[0,0,0,0,1,0,0,0,0,0] y=[0,0,0,0,0,0,1,0,0,0]y=[0,0,0,0,0,1,0,0,0,0] y=[0,0,0,0,0,0,0,0,1,0]y=[0,0,0,0,0,0,0,1,0,0] y=[0,0,0,0,0,0,0,0,0,1] MNIST resultsCVAE https://github.com/hwalsuklee/tensorflow-mnist-CVAE
  • 105. 33 / 49 Z-sampling For each row, images are generated from a fixed z while only the label information is changed (the writing style is preserved and only the digit changes). VAE MNIST resultsCVAE Analogies : Result in paper Semi-Supervised Learning with Deep Generative Models : https://arxiv.org/abs/1406.5298
  • 106. 34 / 49 𝑧1 𝑧2 𝑧3 𝑧4 𝑐0 𝑐1 𝑐2 𝑐3 𝑐4 𝑐5 𝑐6 𝑐7 𝑐8 𝑐9 Handwriting style for a given z must be preserved for all labels VAE Analogies |z| = 2 MNIST resultsCVAE https://github.com/hwalsuklee/tensorflow-mnist-CVAE 𝑐0 𝑐1 𝑐2 𝑐3 𝑐4 𝑐5 𝑐6 𝑐7 𝑐8 𝑐9 Real handwritten image: the latent vector obtained by feeding an actual handwritten '3' together with its label into the CVAE is held fixed as the decoder input, while only the label information is varied.
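The analogy trick above can be sketched directly: hold one latent z fixed and sweep the one-hot label c0 … c9, so the decoder should keep the handwriting style while changing the digit. The decoder below is a hypothetical untrained linear stand-in, just to show the mechanics:

```python
import numpy as np

rng = np.random.default_rng(1)
x_dim, y_dim, z_dim = 784, 10, 2
W_out = rng.standard_normal((x_dim, z_dim + y_dim)) * 0.01  # stand-in decoder weights

def decode(z, y):
    """p(x|z,y): decoder conditioned on the label via concatenation."""
    h = np.concatenate([z, y])
    return 1.0 / (1.0 + np.exp(-(W_out @ h)))

z_fixed = rng.standard_normal(z_dim)   # the "style" code, held constant
row = []
for c in range(10):                    # sweep labels c0 ... c9
    y = np.zeros(y_dim)
    y[c] = 1.0
    row.append(decode(z_fixed, y))     # same style, different digit
row = np.stack(row)                    # one row of the analogy grid: 10 images
```

With a trained decoder, stacking such rows for several fixed z values reproduces the analogy grids shown on this slide.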
  • 107. 35 / 49 Things are messy here, in contrast to VAE’s Q(z|X), which nicely clusters z. But if we look at it closely, we could see that given a specific value of c=y, Q(z|X,c=y) is roughly N(0,1)! It’s because, if we look at our objective above, we are now modeling P(z|c), which we infer variationally with a N(0,1). Q(z|X,c=y) being close to N(0,1) is the desired behavior: P(z|c) is N(0,1), and Q(z|X,c=y) is trained to minimize its KL divergence to P(z|c). (That P(z|X,c=y) ≈ Q(z|X,c=y) was confirmed in the image results.) VAE Learned Manifold |z| = 2 MNIST resultsCVAE
  • 108. 36 / 49 Running it myself with N = 100 gave an accuracy of 0.9514, i.e. a 4.86% error rate (of the 50,000 training examples, only 100 labels were used; the remaining 49,900 were unlabeled). VAE MNIST resultsCVAE Classification : Result in paper https://github.com/saemundsson/semisupervised_vae Semi-Supervised Learning with Deep Generative Models : https://arxiv.org/abs/1406.5298
  • 109. 37 / 49 IntroductionAAE L_i(φ,θ,x_i) = −E_{q_φ(z|x_i)}[log p_θ(x_i|z)] + KL(q_φ(z|x_i) ∥ p(z)) Regularization Conditions on q_φ(z|x_i) and p(z): 1. Samples can easily be drawn from the distribution — still required of both (O / O). 2. The KL divergence can be calculated in closed form — no longer required of either (X / X). Adversarial Autoencoder (AAE): the KL divergence term is replaced by the discriminator of a GAN. VAE
  • 110. 38 / 49 ArchitectureAAE Generative Adversarial Network VAE p_z(z) Generator p_data(x) Yes / No Discriminator z G(z) Discriminator targets: D(x) = 1, D(G(z)) = 0; generator target: D(G(z)) = 1. Value function of GAN : V(D,G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))] Goal : D*, G* = arg min_G max_D V(D,G) The goal of a GAN is to make G(z) ~ p_data(x).
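As a small numeric check of the value function above, a sketch (the discriminator outputs here are assumed values, not a trained network): at the GAN optimum the discriminator cannot tell real from fake, so D(x) = D(G(z)) = 1/2 everywhere and V(D,G) = log(1/2) + log(1/2) = −log 4.

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of V(D,G) from samples of D(x) and D(G(z)).

    V(D,G) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))]
    """
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# At the optimum of min_G max_D V(D,G): D(x) = D(G(z)) = 0.5, so V = -log 4.
v_opt = gan_value(np.full(4, 0.5), np.full(4, 0.5))

# A confident, correct discriminator (D(x) -> 1, D(G(z)) -> 0) drives V up toward 0.
v_good_d = gan_value(np.full(4, 0.99), np.full(4, 0.01))
```

This makes concrete why the inner max over D pushes V up while the outer min over G pushes it back down toward −log 4.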
  • 111. 39 / 49 ArchitectureAAE Overall VAE AutoEncoder Prior Distribution (Target Distribution) Discriminator Generator Adversarial Autoencoders : https://arxiv.org/abs/1511.05644
  • 112. 40 / 49 TrainingAAE Loss Function VAE GAN loss: V(D,G) = E_{z~p(z)}[log D(z)] + E_{x~p(x)}[log(1 − D(q_φ(x)))] VAE loss: L_i(φ,θ,x_i) = −E_{q_φ(z|x_i)}[log p_θ(x_i|z)] + KL(q_φ(z|x_i) ∥ p(z)) Let’s say G is defined by q_φ(·) and D is defined by d_λ(·): V_i(φ,λ,x_i,z_i) = log d_λ(z_i) + log(1 − d_λ(q_φ(x_i))) *The paper does not state the loss definition explicitly; this is my own formulation.
  • 113. 41 / 49 TrainingAAE Training Procedure VAE For samples x_i drawn from the training set and z_i drawn from the prior distribution p(z): Training Step 1 : Update AE — update φ, θ according to the reconstruction error L_i(φ,θ,x_i) = −E_{q_φ(z|x_i)}[log p_θ(x_i|z)]. Training Step 2 : Update Discriminator — update λ according to the discriminator loss −V_i(φ,λ,x_i,z_i) = −log d_λ(z_i) − log(1 − d_λ(q_φ(x_i))). Training Step 3 : Update Generator — update φ according to the generator loss −log d_λ(q_φ(x_i)). *The paper does not give the training procedure as equations; this is my own formulation.
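The three alternating steps can be sketched as one training iteration. In this NumPy sketch the encoder, decoder, and discriminator are stand-in linear maps with hand-derived gradients, and a squared reconstruction error replaces the expected log-likelihood — all illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
x_dim, z_dim = 8, 2
enc = rng.standard_normal((z_dim, x_dim)) * 0.1   # q_phi: encoder (also the "generator")
dec = rng.standard_normal((x_dim, z_dim)) * 0.1   # p_theta: decoder
disc = rng.standard_normal(z_dim) * 0.1           # d_lambda: logistic discriminator on z

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train_step(x, z_prior, enc, dec, disc, lr=0.01):
    # Step 1: update AE (phi, theta) on reconstruction error 0.5*||dec(enc(x)) - x||^2
    z = enc @ x
    recon_err = dec @ z - x
    dec -= lr * np.outer(recon_err, z)             # dL/d(dec)
    enc -= lr * np.outer(dec.T @ recon_err, x)     # dL/d(enc), via chain rule

    # Step 2: update discriminator (lambda): push d(z_prior) -> 1, d(enc(x)) -> 0
    z_fake = enc @ x
    disc -= lr * ((sigmoid(disc @ z_prior) - 1.0) * z_prior
                  + sigmoid(disc @ z_fake) * z_fake)

    # Step 3: update generator (phi) to fool the discriminator: push d(enc(x)) -> 1
    z_fake = enc @ x
    enc -= lr * np.outer((sigmoid(disc @ z_fake) - 1.0) * disc, x)
    return enc, dec, disc

x = rng.standard_normal(x_dim)        # one training sample
z_prior = rng.standard_normal(z_dim)  # one sample from p(z) = N(0, I)
enc, dec, disc = train_step(x, z_prior, enc, dec, disc)
```

Note that φ (the encoder) is updated twice per iteration: once for reconstruction and once adversarially, which is exactly how the AAE replaces the VAE's analytic KL term.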
  • 114. 42 / 49 MNIST ResultsAAE VAE VS AAE VAE p(z) : mixture of 10 Gaussians p(z) : N(0, 5²I) The VAE focuses on capturing frequently occurring values, so occasional empty regions remain, whereas the AAE focuses on matching the shape of the prior distribution, so it has relatively fewer empty regions. Adversarial Autoencoders : https://arxiv.org/abs/1511.05644
  • 115. 43 / 49 MNIST ResultsAAE Incorporating Label Information in the Adversarial Regularization VAE Condition on latent space • When a sample drawn from the prior distribution is fed to the discriminator, the label that sample is supposed to carry is also provided to the discriminator as a condition. • When a sample drawn from the posterior distribution is fed to the discriminator, the label of the corresponding image is provided to the discriminator. • As a result, images of a given label are mapped to the intended region of the latent space. Adversarial Autoencoders : https://arxiv.org/abs/1511.05644
  • 116. 44 / 49 MNIST ResultsAAE Incorporating Label Information in the Adversarial Regularization VAE The same position within each Gaussian component carries the same style. The figure shows the result of sequentially reconstructing samples taken along the spiral. Adversarial Autoencoders : https://arxiv.org/abs/1511.05644
  • 117. 45 / 49 MNIST ResultsAAE Supervised Adversarial Autoencoders VAE Condition on generated data 𝑐0 𝑐1 𝑐2 𝑐3 𝑐4 𝑐5 𝑐6 𝑐7 𝑐8 𝑐9 Each row uses the same z value. Adversarial Autoencoders : https://arxiv.org/abs/1511.05644
  • 118. 46 / 49 MNIST ResultsAAE Semi-Supervised Adversarial Autoencoders VAE auxiliary classifier Discriminator • When no label is provided, an auxiliary classifier predicts the label, and an additional discriminator is trained to judge whether that prediction is plausible. Adversarial Autoencoders : https://arxiv.org/abs/1511.05644
  • 119. 47 / 49 MNIST ResultsAAE Incorporating Label Information in the Adversarial Regularization VAE Actual experimental results : mixture of 10 Gaussians Actual experimental results : swiss roll https://github.com/hwalsuklee/tensorflow-mnist-AAE
  • 120. 48 / 49 MNIST ResultsAAE Incorporating Label Information in the Adversarial Regularization VAE Actual experimental results : mixture of 10 Gaussians — Learned Manifold, Generation https://github.com/hwalsuklee/tensorflow-mnist-AAE
  • 121. 49 / 49 MNIST ResultsAAE Incorporating Label Information in the Adversarial Regularization VAE Actual experimental results : swiss roll — Learned Manifold, Generation https://github.com/hwalsuklee/tensorflow-mnist-AAE
  • 122. 01. Revisit Deep Neural Networks 02. Manifold Learning 03. Autoencoders 04. Variational Autoencoders 05. Applications This chapter introduces retrieval use cases, the main application of data compression; examples that use the VAE as a generative model; and VAEs combined with GANs, currently the most popular approach. • Retrieval • Generation • GAN+VAE
  • 123. Information Retrieval via AutoencodersRETRIEVAL 1 / 22 APPLICATIONS • Text • Semantic Hashing (Link) http://www.cs.utoronto.ca/~rsalakhu/papers/semantic_final.pdf • Dynamic Auto-Encoders for Semantic Indexing (Link) http://yann.lecun.com/exdb/publis/pdf/mirowski-nipsdl-10.pdf • Image • Using Very Deep Autoencoders for Content-Based Image Retrieval (Link) http://nuyoo.utm.mx/~jjf/rna/A6%20Using%20Very%20Deep%20Autoencoders%20for%20Content-Based%20Image%20Retrieval.pdf • Autoencoding the Retrieval Relevance of Medical Images (Link) https://arxiv.org/pdf/1507.01251.pdf • Sound • Retrieving Sounds by Vocal Imitation Recognition (Link) http://www.ece.rochester.edu/~zduan/resource/ZhangDuan_RetrievingSoundsByVocalImitationRecognition_MLSP15.pdf
  • 124. Information Retrieval via AutoencodersRETRIEVAL 2 / 22 APPLICATIONS • 3D model • Deep Learning Representation using Autoencoder for 3D Shape Retrieval (Link) https://arxiv.org/pdf/1409.7164.pdf • Deep Signatures for Indexing and Retrieval in Large Motion Databases (Link) http://web.cs.ucdavis.edu/~neff/papers/MIG_2015_DeepSignature.pdf • DeepShape: Deep Learned Shape Descriptor for 3D Shape Matching and Retrieval (Link) http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Xie_DeepShape_Deep_Learned_2015_CVPR_paper.pdf • Multi-modal • Cross-modal Retrieval with Correspondence Autoencoder (Link) https://people.cs.clemson.edu/~jzwang/1501863/mm2014/p7-feng.pdf • Effective multi-modal retrieval based on stacked autoencoders (Link) http://www.comp.nus.edu.sg/~ooibc/crossmodalvldb14.pdf
  • 125. Gray Face / Handwritten DigitsGENERATION 3 / 22 APPLICATIONS http://vdumoulin.github.io/morphing_faces/online_demo.html |z|=29 64 64 http://www.dpkingma.com/sgvb_mnist_demo/demo.html |z|=12 24 24 Handwritten Digits GenerationGray Face Generation
  • 126. Deep Feature Consistent Variational AutoencoderGENERATION 4 / 22 APPLICATIONS https://arxiv.org/abs/1610.00291 celeba DB BEGAN
  • 127. Deep Feature Consistent Variational AutoencoderGENERATION 5 / 22 APPLICATIONS https://arxiv.org/abs/1610.00291
  • 128. Sketch RNNGENERATION 6 / 22 APPLICATIONS https://magenta.tensorflow.org/sketch-rnn-demo The model can also mimic your drawings and produce similar doodles. In the Variational Autoencoder Demo, you are to draw a complete drawing of a specified object. After you draw a complete sketch inside the area on the left, hit the auto-encode button and the model will start drawing similar sketches inside the smaller boxes on the right. Rather than drawing a perfect duplicate copy of your drawing, the model will try to mimic your drawing instead. You can experiment drawing objects that are not the category you are supposed to draw, and see how the model interprets your drawing. For example, try to draw a cat, and have a model trained to draw crabs generate cat-like crabs. Try the Variational Autoencoder demo. https://magenta.tensorflow.org/assets/sketch_rnn_demo/multi_vae.html
  • 129. IntroductionGAN+VAE 7 / 22 Comparison between VAE and GAN — VAE — Optimization: stochastic gradient descent; converges to a local minimum; easier. Image quality: smooth but blurry. Generalization: tends to remember input images. GAN — Optimization: alternating stochastic gradient descent; converges to saddle points; harder (mode collapse, unstable convergence). Image quality: sharp but with artifacts. Generalization: generates new, unseen images. x 0~1D G G(z) Discriminator Generator z 𝑧 DE𝑥 𝑥 DecoderEncoderDiscriminator   Generator VAE GAN APPLICATIONS
  • 130. IntroductionGAN+VAE 8 / 22 Comparison between VAE vs GAN APPLICATIONS VAE : maximum likelihood approach GAN http://videolectures.net/site/normal_dl/tag=1129740/deeplearning2017_courville_generative_models_01.pdf
  • 131. 9 / 22 Regularized Autoencoders x Energy G G(z) Discriminator Generator z AE GAN 𝑧 DE Reconstruction error We argue that the energy function (the discriminator) in the EBGAN framework is also seen as being regularized by having a generator producing the contrastive samples, to which the discriminator ought to give high reconstruction energies. We further argue that the EBGAN framework allows more flexibility from this perspective, because: (i)-the regularizer (generator) is fully trainable instead of being handcrafted; (ii)-the adversarial training paradigm enables a direct interaction between the duality of producing contrastive sample and learning the energy function. EBGAN : Energy-based Generative Adversarial Network ‘16.09 BEGAN : Boundary Equilibrium Generative Adversarial Networks ‘17.03 APPLICATIONS EBGAN, BEGANGAN+VAE
  • 132. 10 / 22 StackGAN : Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks , ‘16.12 Multimodal Feature Learner Can you spot the AI-generated photos? (1) This flower has overlapping pink pointed petals surrounding a ring of short yellow filaments (2) This flower has upturned petals which are thin and orange with rounded edges (3) A flower with small pink petals and a massive central orange and black stamen cluster (1) (2) (3) APPLICATIONS StackGANGAN+VAE
  • 133. 11 / 22 Multimodal Feature Learner StackGAN : Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks , ‘16.12 APPLICATIONS StackGANGAN+VAE
  • 134. 12 / 22 Learning a Probabilistic latent Space of Object Shapes via 3D Generative-Adversarial Modeling (3D-GAN), ‘16.10 Multimodal Feature Learner APPLICATIONS 3DGANGAN+VAE
  • 135. 13 / 22 Denoising SEGAN: Speech Enhancement Generative Adversarial Network ‘17. 03. 28 Nothing is safe. There will be no repeat of that performance, that I can guarantee. before after before after APPLICATIONS SEGANGAN+VAE
  • 136. 14 / 22 Age Progression/Regression by Conditional Adversarial Autoencoder https://zzutk.github.io/Face-Aging-CAAE/ APPLICATIONS Papers in CVPR2017GAN+VAE
  • 137. 15 / 22 Age Progression/Regression by Conditional Adversarial Autoencoder https://zzutk.github.io/Face-Aging-CAAE/ APPLICATIONS Papers in CVPR2017GAN+VAE
  • 138. 16 / 22 Age Progression/Regression by Conditional Adversarial Autoencoder https://zzutk.github.io/Face-Aging-CAAE/ APPLICATIONS Papers in CVPR2017GAN+VAE
  • 139. 17 / 22 PaletteNet: Image Recolorization with Given Color Palette http://tmmse.xyz/2017/07/27/palettenet/ APPLICATIONS Papers in CVPR2017GAN+VAE
  • 140. 18 / 22 PaletteNet: Image Recolorization with Given Color Palette http://tmmse.xyz/2017/07/27/palettenet/ APPLICATIONS Papers in CVPR2017GAN+VAE
  • 141. 19 / 22 Hallucinating Very Low-Resolution Unaligned and Noisy Face Images by Transformative Discriminative Autoencoders http://www.porikli.com/mysite/pdfs/porikli%202017%20-%20Hallucinating%20very%20low-resolution%20unaligned%20and%20noisy%20face%20images%20by%20transformative%20discriminative%20autoencoders.pdf APPLICATIONS Papers in CVPR2017GAN+VAE 16x16 → 128x128
  • 142. 20 / 22 Hallucinating Very Low-Resolution Unaligned and Noisy Face Images by Transformative Discriminative Autoencoders http://www.porikli.com/mysite/pdfs/porikli%202017%20-%20Hallucinating%20very%20low-resolution%20unaligned%20and%20noisy%20face%20images%20by%20transformative%20discriminative%20autoencoders.pdf APPLICATIONS Papers in CVPR2017GAN+VAE TUN loss DL loss TE loss
  • 143. 21 / 22 A Generative Model of People in Clothing https://arxiv.org/abs/1705.04098 APPLICATIONS Papers in ICCV2017GAN+VAE
  • 144. 22 / 22 A Generative Model of People in Clothing APPLICATIONS Papers in ICCV2017GAN+VAE Conditional generation for test time Condition on human pose Sketch info for cloth Famous pix2pix architecture https://arxiv.org/abs/1705.04098