Generative Adversarial Network
and its Applications to Speech Processing
and Natural Language Processing
Hung-yi Lee
Schedule
• 14:00 – 15:10: Lecture
• 15:10 – 15:30: Break
• 15:30 – 16:40: Lecture
• 16:40 – 17:00: Break
• 17:00 – 18:00: Lecture
Yann LeCun’s comment
https://www.quora.com/What-are-some-recent-and-potentially-upcoming-breakthroughs-in-unsupervised-learning
Yann LeCun’s comment
https://www.quora.com/What-are-some-recent-and-potentially-upcoming-breakthroughs-in-deep-learning
……
Mihaela Rosca, Balaji Lakshminarayanan, David Warde-Farley, Shakir Mohamed, “Variational Approaches for Auto-
Encoding Generative Adversarial Networks”, arXiv, 2017
All Kinds of GAN … https://github.com/hindupuravinash/the-gan-zoo
GAN
ACGAN
BGAN
DCGAN
EBGAN
fGAN
GoGAN
CGAN
……
ICASSP Keyword search on session index page,
so session names are included.
Number of papers whose titles include the keyword
[Bar chart, 2012–2018: number of ICASSP papers per year whose titles include the keywords “generative”, “adversarial”, or “reinforcement”; y-axis from 0 to 45.]
GAN has become a very important technology.
Outline
Part I: General Introduction of Generative
Adversarial Network (GAN)
Part II: Applications to Speech and Text
Processing
Part III: Recent Progress at National
Taiwan University
Part I: General Introduction
Generative Adversarial Network
and its Applications to Speech Processing
and Natural Language Processing
Outline of Part 1
Generation by GAN
• Image Generation as Example
• Theory behind GAN
• Issues and Possible Solutions
• Evaluation
Conditional Generation
Feature Extraction
Unsupervised Conditional Generation
Relation to Reinforcement Learning
Outline of Part 1
Generation by GAN
• Image Generation as Example
• Theory behind GAN
• Issues and Possible Solutions
• Evaluation
Conditional Generation
Feature Extraction
Unsupervised Conditional Generation
Relation to Reinforcement Learning
Anime Face Generation
Draw
Generator
Examples
Basic Idea of GAN
Generator
It is a neural network
(NN), or a function.
Generator: high-dimensional vector → image
Example input vectors (each is one input to the generator):
[0.1, −3, …, 2.4, 0.9]
[3, −3, …, 2.4, 0.9]
[0.1, 2.1, …, 5.4, 0.9]
[0.1, −3, …, 2.4, 3.5]
Powered by: http://mattya.github.io/chainer-DCGAN/
Each dimension of the input vector represents some characteristic (e.g. longer hair, blue hair, open mouth).
Basic Idea of GAN
Discriminator: image → scalar
It is a neural network (NN), or a function.
Larger value means real, smaller value means fake.
Realistic images score high (e.g. 1.0); poorly generated images score low (e.g. 0.1).
Algorithm
• Initialize generator and discriminator
• In each training iteration:
Step 1: Fix generator G, and update discriminator D
Sample generated objects from G (labeled 0) and randomly sampled objects from the database (labeled 1).
Discriminator learns to assign high scores to real objects and low scores to generated objects.
Algorithm
• Initialize generator and discriminator
• In each training iteration:
Step 2: Fix discriminator D, and update generator G
Generator learns to “fool” the discriminator: treat NN Generator + Discriminator as one large network, fix the discriminator layers, and update the generator’s hidden layers by gradient ascent on the discriminator’s output score (e.g. 0.13).
Algorithm
• Initialize generator and discriminator
• In each training iteration:
Learning D: sample some real objects from the database (labeled 1) and generate some fake objects with G (labeled 0); update D with G fixed.
Learning G: update G so that D assigns high scores to the generated images, with D fixed.
Basic Idea of GAN
NN Generator v1 is trained against Discriminator v1 (which sees real images), then Generator v2 against Discriminator v2, Generator v3 against Discriminator v3, and so on.
This is where the term “adversarial” comes from.
Anime Face Generation
100 updates
Source of training data: https://zhuanlan.zhihu.com/p/24767059
Anime Face Generation
1000 updates
Anime Face Generation
2000 updates
Anime Face Generation
5000 updates
Anime Face Generation
10,000 updates
Anime Face Generation
20,000 updates
Anime Face Generation
50,000 updates
The faces generated by the machine.
The images are generated
by Yen-Hao Chen, Po-
Chun Chien, Jun-Chen Xie,
Tsung-Han Wu.
[Interpolation: images generated by G from input vectors interpolated between two codes, with mixing weights 0.0, 0.1, …, 0.9.]
(Variational) Auto-encoder
NN Encoder → code → NN Decoder, with the output as close as possible to the input.
Randomly generate a vector as code, feed it to the NN Decoder (= Generator): do we get an image?
Auto-encoder vs. GAN
Auto-encoder: the NN Decoder (= Generator) is trained to make its output as close as possible to the target image, so the results are fuzzy.
GAN: the Generator is trained through the Discriminator. If the discriminator does not simply memorize the images, the generator learns the patterns of faces rather than just copying an image.
FID [Martin Heusel, et al., NIPS, 2017]: smaller is better. [Mario Lucic, et al., arXiv, 2017]
Outline of Part 1
Generation by GAN
• Image Generation as Example
• Theory behind GAN
• Issues and Possible Solutions
• Evaluation
Conditional Generation
Feature Extraction
Unsupervised Conditional Generation
Relation to Reinforcement Learning
Generation
Generation
• We want to find the data distribution P_data(x)
Some regions of the image space have high probability; most regions have low probability.
x: an image (a high-dimensional vector)
Maximum Likelihood Estimation
• Given a data distribution P_data(x) (we can sample from it)
• We have a distribution P_G(x; θ) parameterized by θ
• We want to find θ such that P_G(x; θ) is close to P_data(x)
• E.g. P_G(x; θ) is a Gaussian Mixture Model, θ are the means and variances of the Gaussians
Sample x^1, x^2, …, x^m from P_data(x); we can compute P_G(x^i; θ).
Likelihood of generating the samples: L = ∏_{i=1}^m P_G(x^i; θ)
Find θ* maximizing the likelihood.
Maximum Likelihood Estimation = Minimize KL Divergence
θ* = arg max_θ ∏_{i=1}^m P_G(x^i; θ) = arg max_θ log ∏_{i=1}^m P_G(x^i; θ)
   = arg max_θ Σ_{i=1}^m log P_G(x^i; θ)   (x^1, x^2, …, x^m sampled from P_data(x))
   ≈ arg max_θ E_{x∼P_data}[log P_G(x; θ)]
   = arg max_θ { ∫_x P_data(x) log P_G(x; θ) dx − ∫_x P_data(x) log P_data(x) dx }
   = arg min_θ KL(P_data || P_G)
How to define a general P_G?
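The equivalence above can be checked numerically. A minimal sketch, assuming a toy 3-symbol distribution and a made-up one-parameter family P_G(x; θ) (all names illustrative): the θ maximizing the empirical average log-likelihood should approximately minimize KL(P_data || P_G).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy P_data over 3 symbols, and samples drawn from it.
p_data = np.array([0.5, 0.3, 0.2])
samples = rng.choice(3, size=100_000, p=p_data)

def p_g(theta):
    # A made-up 1-parameter family: mix a fixed distribution with uniform.
    base = np.array([0.7, 0.2, 0.1])
    return theta * base + (1 - theta) * np.ones(3) / 3

thetas = np.linspace(0.0, 1.0, 101)
log_lik = [np.mean(np.log(p_g(t)[samples])) for t in thetas]       # (1/m) Σ log P_G(x^i; θ)
kl = [np.sum(p_data * np.log(p_data / p_g(t))) for t in thetas]    # KL(P_data || P_G)

best_by_lik = thetas[int(np.argmax(log_lik))]
best_by_kl = thetas[int(np.argmin(kl))]
```

With enough samples the two arg-optima coincide up to sampling noise, which is the point of the derivation.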
Generator
• A generator G is a network. The network defines a probability distribution P_G.
z (sampled from a normal distribution) → Generator G → x = G(z)
We want P_G(x) and P_data(x) to be as close as possible:
G* = arg min_G Div(P_G, P_data)
where Div(P_G, P_data) is the divergence between distributions P_G and P_data.
How to compute the divergence? (x: an image, a high-dimensional vector)
Discriminator
G* = arg min_G Div(P_G, P_data)
Although we do not know the distributions P_G and P_data, we can sample from them:
sampling from P_G: sample a vector from the normal distribution and pass it through G;
sampling from P_data: sample from the database.
Discriminator
G* = arg min_G Div(P_G, P_data)
Training: D* = arg max_D V(D, G)
Example objective function for D (G is fixed):
V(G, D) = E_{x∼P_data}[log D(x)] + E_{x∼P_G}[log(1 − D(x))]
D has a sigmoid output. Using this example objective function is exactly the same as training a binary classifier: data sampled from P_data is positive, data sampled from P_G is negative. [Goodfellow, et al., NIPS, 2014]
The maximum objective value max_D V(G, D) is related to JS divergence.
max_D V(G, D)
• Given G, what is the optimal D* maximizing
V = E_{x∼P_data}[log D(x)] + E_{x∼P_G}[log(1 − D(x))]
  = ∫_x P_data(x) log D(x) dx + ∫_x P_G(x) log(1 − D(x)) dx
  = ∫_x [ P_data(x) log D(x) + P_G(x) log(1 − D(x)) ] dx
Assume that D(x) can be any function. Then, given x, the optimal D* maximizes
P_data(x) log D(x) + P_G(x) log(1 − D(x))
• Find D* maximizing f(D) = a log(D) + b log(1 − D), where a = P_data(x) and b = P_G(x):
df(D)/dD = a × (1/D) + b × (1/(1 − D)) × (−1) = 0
a × (1/D*) = b × (1/(1 − D*))
a × (1 − D*) = b × D*
a − a·D* = b·D*
a = (a + b)·D*
D* = a / (a + b)
Therefore D*(x) = P_data(x) / (P_data(x) + P_G(x)), with 0 < D*(x) < 1.
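The closed form D* = a/(a + b) can be verified numerically. A minimal sketch with made-up values standing in for P_data(x) and P_G(x) at a fixed x:

```python
import numpy as np

# Verify that f(D) = a*log(D) + b*log(1-D) peaks at D* = a / (a + b).
a, b = 0.7, 0.3                      # stand-ins for P_data(x) and P_G(x)
D = np.linspace(1e-4, 1 - 1e-4, 100_000)
f = a * np.log(D) + b * np.log(1 - D)

d_star_numeric = D[np.argmax(f)]     # maximizer found on a dense grid
d_star_closed = a / (a + b)          # maximizer from the derivation
```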
max_D V(G, D)
Plug D*(x) = P_data(x) / (P_data(x) + P_G(x)) into V = E_{x∼P_data}[log D(x)] + E_{x∼P_G}[log(1 − D(x))]:
max_D V(G, D) = V(G, D*)
  = E_{x∼P_data}[ log ( P_data(x) / (P_data(x) + P_G(x)) ) ] + E_{x∼P_G}[ log ( P_G(x) / (P_data(x) + P_G(x)) ) ]
  = ∫_x P_data(x) log [ P_data(x) / (P_data(x) + P_G(x)) ] dx + ∫_x P_G(x) log [ P_G(x) / (P_data(x) + P_G(x)) ] dx
  = −2 log 2 + ∫_x P_data(x) log [ P_data(x) / ((P_data(x) + P_G(x))/2) ] dx + ∫_x P_G(x) log [ P_G(x) / ((P_data(x) + P_G(x))/2) ] dx
  = −2 log 2 + KL( P_data || (P_data + P_G)/2 ) + KL( P_G || (P_data + P_G)/2 )
  = −2 log 2 + 2 JSD(P_data || P_G)   (Jensen-Shannon divergence)
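The identity max_D V(G, D) = −2 log 2 + 2 JSD(P_data || P_G) holds exactly, which a short numerical check makes concrete. A sketch with made-up discrete distributions standing in for P_data and P_G:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])   # stand-in for P_data
q = np.array([0.2, 0.2, 0.6])   # stand-in for P_G

def kl(a, b):
    return np.sum(a * np.log(a / b))

m = (p + q) / 2
jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)          # Jensen-Shannon divergence

# Plug the optimal discriminator D*(x) = p(x) / (p(x) + q(x)) into V(G, D):
d_star = p / (p + q)
v_at_d_star = np.sum(p * np.log(d_star)) + np.sum(q * np.log(1 - d_star))
```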
Discriminator
G* = arg min_G Div(P_G, P_data)
Training: D* = arg max_D V(D, G)
If data sampled from P_data and P_G is hard to discriminate, the divergence is small (D cannot make the objective large); if it is easy to discriminate, the divergence is large.
For different generators G1, G2, G3, the maximum of V(G_i, D) over D, e.g. V(G1, D1*), measures the divergence between P_G1 and P_data.
So max_D V(G, D) plays the role of Div(P_G, P_data), and
G* = arg min_G max_D V(G, D)
because the maximum objective value is related to JS divergence.
Algorithm
• Initialize generator and discriminator
• In each training iteration:
Step 1: Fix generator G, and update discriminator D: D* = arg max_D V(D, G)
Step 2: Fix discriminator D, and update generator G
[Goodfellow, et al., NIPS, 2014]
G* = arg min_G max_D V(G, D), so define the loss L(G) = max_D V(G, D) (θ_G defines G).
• To find the best G minimizing the loss function L(G):
θ_G ← θ_G − η ∂L(G)/∂θ_G
How to differentiate a max? For f(x) = max{f1(x), f2(x), f3(x)}:
df(x)/dx = df_i(x)/dx, where f_i(x) is the max one.
Algorithm
G* = arg min_G L(G), where L(G) = max_D V(G, D)
• Given G0
• Find D0* maximizing V(G0, D) (using gradient ascent)
  V(G0, D0*) is the JS divergence between P_data(x) and P_G0(x)
• θ_G ← θ_G − η ∂V(G, D0*)/∂θ_G, obtaining G1   (decrease JS divergence(?))
• Find D1* maximizing V(G1, D)
  V(G1, D1*) is the JS divergence between P_data(x) and P_G1(x)
• θ_G ← θ_G − η ∂V(G, D1*)/∂θ_G, obtaining G2   (decrease JS divergence(?))
• ……
Algorithm
G* = arg min_G L(G), where L(G) = max_D V(G, D)
• Given G0, find D0* maximizing V(G0, D); V(G0, D0*) is the JS divergence between P_data(x) and P_G0(x).
• θ_G ← θ_G − η ∂V(G, D0*)/∂θ_G, obtaining G1. Does this decrease the JS divergence(?)
Not necessarily: the update makes V(G1, D0*) smaller than V(G0, D0*), but the JS divergence of G1 is V(G1, D1*) with the re-maximized D1*, which may be larger. The reasoning assumes D0* ≈ D1*, so don’t update G too much.
In practice …
• Given G, how do we compute max_D V(G, D), where V = E_{x∼P_data}[log D(x)] + E_{x∼P_G}[log(1 − D(x))]?
• Sample x^1, x^2, …, x^m from P_data(x); sample x̃^1, x̃^2, …, x̃^m from the generator P_G(x).
Maximize Ṽ = (1/m) Σ_{i=1}^m log D(x^i) + (1/m) Σ_{i=1}^m log(1 − D(x̃^i))
This is the same as training a binary classifier by minimizing cross-entropy:
D is a binary classifier with sigmoid output (can be deep);
x^1, x^2, …, x^m from P_data(x) are positive examples;
x̃^1, x̃^2, …, x̃^m from P_G(x) are negative examples.
Algorithm
Initialize θ_d for D and θ_g for G
• In each training iteration:
Learning D (repeat k times):
• Sample m examples x^1, x^2, …, x^m from the data distribution P_data(x)
• Sample m noise samples z^1, z^2, …, z^m from the prior P_prior(z)
• Obtain generated data x̃^1, x̃^2, …, x̃^m, where x̃^i = G(z^i)
• Update discriminator parameters θ_d to maximize
  Ṽ = (1/m) Σ_{i=1}^m log D(x^i) + (1/m) Σ_{i=1}^m log(1 − D(x̃^i)),  θ_d ← θ_d + η ∇Ṽ(θ_d)
Learning G (only once):
• Sample another m noise samples z^1, z^2, …, z^m from the prior P_prior(z)
• Update generator parameters θ_g to minimize
  Ṽ = (1/m) Σ_{i=1}^m log D(x^i) + (1/m) Σ_{i=1}^m log(1 − D(G(z^i))),  θ_g ← θ_g − η ∇Ṽ(θ_g)
  (the first term does not depend on θ_g)
In this way we can only find a lower bound of max_D V(G, D).
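The alternating algorithm above can be sketched end-to-end on a toy problem. This is a minimal numpy illustration, not a real implementation: the generator is assumed to be a pure shift G(z) = z + θ, the discriminator a logistic unit D(x) = sigmoid(w·x + c), gradients are written by hand, and the generator uses the non-saturating variant of its objective.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_data = 3.0              # real data ~ N(3, 1)
theta = 0.0                # generator parameter: G(z) = z + theta
w, c = 0.1, 0.0            # discriminator parameters: D(x) = sigmoid(w*x + c)
m, lr_d, lr_g = 64, 0.05, 0.02
theta_hist = []

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for _ in range(2000):
    for _ in range(3):                              # learning D, repeated k times
        x = rng.normal(mu_data, 1.0, m)             # m examples from P_data
        x_fake = rng.normal(0.0, 1.0, m) + theta    # m samples from P_G
        d_real = sigmoid(w * x + c)
        d_fake = sigmoid(w * x_fake + c)
        # gradient ascent on (1/m) Σ log D(x^i) + (1/m) Σ log(1 - D(x~^i))
        w += lr_d * (np.mean((1 - d_real) * x) - np.mean(d_fake * x_fake))
        c += lr_d * (np.mean(1 - d_real) - np.mean(d_fake))
    z = rng.normal(0.0, 1.0, m)                     # learning G, only once
    d_fake = sigmoid(w * (z + theta) + c)
    # non-saturating generator update: ascend mean log D(G(z)) to fool D
    theta += lr_g * np.mean((1 - d_fake) * w)
    theta_hist.append(theta)
```

After training, θ should hover near the data mean 3: the generator’s distribution has been pulled onto P_data by nothing but the discriminator’s score.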
Objective Function for Generator in Real Implementation
Minimax GAN (MMGAN): V = E_{x∼P_G}[log(1 − D(x))]. This is slow at the beginning, because log(1 − D(x)) has a small gradient when D(x) is close to 0.
Non-saturating GAN (NSGAN): V = E_{x∼P_G}[−log D(x)]. Real implementation: label x from P_G as positive.
Can we use other divergences? Use the divergence you like :) [Sebastian Nowozin, et al., NIPS, 2016]
Outline of Part 1
Generation by GAN
• Image Generation as Example
• Theory behind GAN
• Issues and Possible Solutions
• Evaluation
Conditional Generation
Feature Extraction
Unsupervised Conditional Generation
Relation to Reinforcement Learning
More tips and tricks:
https://github.com/soumith/ganhacks
JS divergence is not suitable
• In most cases, P_G and P_data do not overlap.
• 1. The nature of data: both P_data and P_G are low-dimensional manifolds in a high-dimensional space, so the overlap can be ignored.
• 2. Sampling: even if P_data and P_G overlap, with limited samples the discriminator can still separate them.
What is the problem of JS divergence?
JS(P_G0, P_data) = log 2, JS(P_G1, P_data) = log 2, ……, JS(P_G100, P_data) = 0.
JS divergence is log 2 whenever two distributions do not overlap, so P_G0 and P_G1 are equally bad: the same objective value (same divergence) is obtained even though P_G1 is closer to P_data.
Intuition: if two distributions do not overlap, a binary classifier achieves 100% accuracy.
Wasserstein GAN (WGAN): Earth Mover’s Distance
• Consider one distribution P as a pile of earth, and another distribution Q as the target.
• The earth mover’s distance is the average distance the earth mover has to move the earth.
If P is a point mass at distance d from Q, the Wasserstein distance is W(P, Q) = d.
For general P and Q there are many possible “moving plans”, some with smaller average distance and some with larger.
Source of image: https://vincentherrmann.github.io/blog/wasserstein/
WGAN: Earth Mover’s Distance
Use the “moving plan” with the smallest average distance to define the earth mover’s distance.
A “moving plan” is a matrix: the value of element (x_p, x_q) is the amount of earth moved from position x_p to position x_q.
Average distance of a plan γ:  B(γ) = Σ_{x_p, x_q} γ(x_p, x_q) ‖x_p − x_q‖
Earth Mover’s Distance, over all possible plans Π:  W(P, Q) = min_{γ∈Π} B(γ)   (the best plan)
Source of image: https://vincentherrmann.github.io/blog/wasserstein/
Why Earth Mover’s Distance?
JS(P_G0, P_data) = log 2   W(P_G0, P_data) = d_0
JS(P_G50, P_data) = log 2   W(P_G50, P_data) = d_50
JS(P_G100, P_data) = 0   W(P_G100, P_data) = 0
Unlike the f-divergence D_f(P_data || P_G), the Wasserstein distance W(P_data, P_G) keeps decreasing as P_G moves toward P_data, so it provides a useful training signal.
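The contrast above can be computed directly. A sketch assuming point masses on a 1-D grid (a made-up discretization of the P_G0/P_G50/P_G100 picture): JS saturates at log 2 while W equals the separation d.

```python
import numpy as np

grid = np.arange(0, 101)

def jsd(p, q):
    m = (p + q) / 2
    def kl(a, b):
        mask = a > 0                      # 0*log(0/...) contributes 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def w1(p, q):
    # In 1-D, Wasserstein-1 = sum of |CDF_P - CDF_Q| over the grid.
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q)))

def point_mass(pos):
    v = np.zeros(len(grid))
    v[pos] = 1.0
    return v

p_data = point_mass(0)
js_values = [jsd(point_mass(d), p_data) for d in (50, 25, 1)]   # all log 2
w_values = [w1(point_mass(d), p_data) for d in (50, 25, 1)]     # 50, 25, 1
```

JS gives the same value no matter how close the generator gets (until exact overlap); W shrinks smoothly, which is why it can guide training.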
WGAN
V(G, D) = max_{D ∈ 1-Lipschitz} { E_{x∼P_data}[D(x)] − E_{x∼P_G}[D(x)] }
This evaluates the Wasserstein distance between P_data and P_G. [Martin Arjovsky, et al., arXiv, 2017]
D has to be smooth enough: without the constraint, training of D will not converge, because D(x) would be pushed toward +∞ on real data and −∞ on generated data. Keeping D smooth prevents D(x) from running off to ±∞.
WGAN
V(G, D) = max_{D ∈ 1-Lipschitz} { E_{x∼P_data}[D(x)] − E_{x∼P_G}[D(x)] }
D has to be smooth enough. How to fulfill this constraint?
Lipschitz function: ‖f(x1) − f(x2)‖ ≤ K‖x1 − x2‖ (output change bounded by K times the input change); K = 1 for “1-Lipschitz”. A 1-Lipschitz function does not change fast.
Weight Clipping
Force the parameters w to stay between −c and c: after each parameter update, if w > c, set w = c; if w < −c, set w = −c.
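The clipping rule above is a single projection per parameter. A minimal sketch on a raw weight array (the values are made up):

```python
import numpy as np

c = 0.01
w = np.array([0.5, -0.003, -2.0, 0.009])   # discriminator weights after an update
w_clipped = np.clip(w, -c, c)              # project every weight into [-c, c]
```

Weights already inside [−c, c] are untouched; only out-of-range weights are pulled to the boundary.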
Improved WGAN (WGAN-GP)
V(G, D) = max_{D ∈ 1-Lipschitz} { E_{x∼P_data}[D(x)] − E_{x∼P_G}[D(x)] }
A differentiable function is 1-Lipschitz if and only if it has gradients with norm less than or equal to 1 everywhere: ‖∇_x D(x)‖ ≤ 1 for all x.
So instead of D ∈ 1-Lipschitz, prefer ‖∇_x D(x)‖ ≤ 1 for all x:
V(G, D) ≈ max_D { E_{x∼P_data}[D(x)] − E_{x∼P_G}[D(x)] − λ ∫_x max(0, ‖∇_x D(x)‖ − 1) dx }
In practice, prefer ‖∇_x D(x)‖ ≤ 1 only for x sampled from a penalty distribution P_penalty:
V(G, D) ≈ max_D { E_{x∼P_data}[D(x)] − E_{x∼P_G}[D(x)] − λ E_{x∼P_penalty}[max(0, ‖∇_x D(x)‖ − 1)] }
Improved WGAN (WGAN-GP)
P_penalty: sample a point on the straight line between a sample from P_data and a sample from P_G. Only give the gradient constraint to the region between P_data and P_G, because that region influences how P_G moves toward P_data.
“Given that enforcing the Lipschitz constraint everywhere is intractable, enforcing it only along these straight lines seems sufficient and experimentally results in good performance.”
“Simply penalizing overly large gradients also works in theory, but experimentally we found that this approach converged faster and to better optima.”
Improved WGAN (WGAN-GP)
In the actual implementation the penalty is (‖∇_x D(x)‖ − 1)² rather than max(0, ‖∇_x D(x)‖ − 1): it pushes the gradient norm in the penalty region toward exactly 1.
More: improving the improved WGAN. Spectral Normalization keeps the gradient norm smaller than 1 everywhere. [Miyato, et al., ICLR, 2018]
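The penalty term can be sketched numerically. This is an illustration only: the critic D is an arbitrary toy function, the interpolation point x̂ plays the role of a sample from P_penalty, and the gradient is estimated by finite differences rather than backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

def critic(x, w):
    return np.tanh(x @ w)              # toy critic D(x); not a trained network

def grad_norm(x, w, eps=1e-5):
    # Finite-difference estimate of ||grad_x D(x)||.
    g = np.array([(critic(x + eps * e, w) - critic(x - eps * e, w)) / (2 * eps)
                  for e in np.eye(len(x))])
    return np.sqrt(np.sum(g ** 2))

w = np.array([0.6, -0.8])              # toy critic weights, ||w|| = 1
x_real = rng.normal(3.0, 1.0, 2)       # a sample standing in for P_data
x_fake = rng.normal(0.0, 1.0, 2)       # a sample standing in for P_G
t = rng.uniform()
x_hat = t * x_real + (1 - t) * x_fake  # point on the line: sample from P_penalty

penalty = (grad_norm(x_hat, w) - 1.0) ** 2   # the WGAN-GP penalty at x_hat
```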
Algorithm of Original GAN → WGAN
Modify the original GAN algorithm as follows:
Learning D (repeat k times):
• Sample m examples x^1, …, x^m from P_data(x); sample m noise samples z^1, …, z^m from the prior P_prior(z); obtain generated data x̃^i = G(z^i)
• Update θ_d to maximize Ṽ = (1/m) Σ_{i=1}^m D(x^i) − (1/m) Σ_{i=1}^m D(x̃^i)   (no sigmoid for the output of D)
• θ_d ← θ_d + η ∇Ṽ(θ_d), followed by weight clipping / gradient penalty …
Learning G (only once):
• Sample another m noise samples z^1, …, z^m from the prior
• Update θ_g to minimize Ṽ = −(1/m) Σ_{i=1}^m D(G(z^i)),  θ_g ← θ_g − η ∇Ṽ(θ_g)
Energy-based GAN (EBGAN)
• Use an autoencoder as discriminator D: the negative reconstruction error of the autoencoder determines the goodness of an image (0 for the best images). The generator is the same. [Junbo Zhao, et al., arXiv, 2016]
Benefit: the autoencoder can be pre-trained on real images without the generator.
Hard to reconstruct, easy to destroy: an autoencoder-based discriminator only gives a limited region large value, so a margin m caps the score of generated images (0 is for the best; scores do not have to be very negative).
Mode Collapse
: real data
: generated data
Training with too many iterations ……
Mode Dropping
BEGAN on CelebA
Generator
at iteration t
Generator
at iteration t+1
Generator
at iteration t+2
Generator switches mode during training
Ensemble
To generate an image: train a set of generators G1, G2, ⋯, G_N, randomly pick a generator G_i, and use G_i to generate the image.
Outline of Part 1
Generation by GAN
• Image Generation as Example
• Theory behind GAN
• Issues and Possible Solutions
• Evaluation
Conditional Generation
Feature Extraction
Unsupervised Conditional Generation
Relation to Reinforcement Learning
Likelihood
Generator P_G (with its prior distribution) generates data; the real data x^i is not observed during training and is used for evaluation.
Log likelihood: L = (1/N) Σ_i log P_G(x^i)
We cannot compute P_G(x^i); we can only sample from P_G.
Likelihood – Kernel Density Estimation
• Estimate the distribution P_G(x) from sampling: each sample is the mean of a Gaussian with the same covariance. Now we have an approximation of P_G, so we can compute P_G(x^i) for each real data point x^i, and then compute the likelihood.
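The estimation step above can be sketched in 1-D. Assumptions for illustration: the “generator samples” are just draws from N(0, 1), the shared bandwidth h is picked by hand, and the held-out “real” points are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
gen_samples = rng.normal(0.0, 1.0, 5000)   # pretend these came from G
h = 0.3                                     # shared bandwidth (std of each kernel)

def p_g_kde(x):
    # Average of Gaussians centered at each generated sample.
    return np.mean(np.exp(-0.5 * ((x - gen_samples) / h) ** 2)
                   / (h * np.sqrt(2 * np.pi)))

real_data = np.array([-0.5, 0.0, 1.2])      # held-out evaluation points
log_lik = np.mean([np.log(p_g_kde(x)) for x in real_data])
```

The estimate depends heavily on h and on how many samples are drawn, which is one reason likelihood computed this way is a fragile evaluation.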
Likelihood v.s. Quality
• Low likelihood, high quality? Consider a model generating good images with small variance: the generated data can miss the held-out data x^i used for evaluation, giving P_G(x^i) ≈ 0 even though the images look good.
• High likelihood, low quality? Compare a generator G1 with log likelihood L against a generator G2 that behaves like G1 with probability 0.01 and outputs junk with probability 0.99: its log likelihood is still about −log 100 + (1/N) Σ_i log P_G(x^i) = L − 4.6, so 99% junk output only costs 4.6 in log likelihood.
Objective Evaluation
Feed each generated image x into an off-the-shelf image classifier (e.g. Inception net, VGG, etc.) to obtain P(y|x), where y is the class (the output of the CNN). [Tim Salimans, et al., NIPS, 2016]
A concentrated distribution P(y^1|x^1) for a single image means higher visual quality.
A uniform class marginal P(y) = (1/N) Σ_n P(y^n|x^n) over many images means higher variety.
Objective Evaluation: Inception Score
Inception Score = Σ_x Σ_y P(y|x) log P(y|x) − Σ_y P(y) log P(y)
The first term is the negative entropy of P(y|x), large when each P(y|x) is concentrated; the second term is the entropy of P(y) = (1/N) Σ_n P(y^n|x^n), large when the class marginal is uniform.
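The two terms above can be computed from a batch of classifier outputs. A sketch with made-up softmax outputs over 3 classes, using the common exp(E_x KL(P(y|x) || P(y))) form of the score (equivalent to exponentiating the sum of the two terms above):

```python
import numpy as np

def inception_score(p_y_given_x):
    # p_y_given_x: (N images, K classes) softmax outputs of a classifier.
    p_y = p_y_given_x.mean(axis=0)                  # class marginal P(y)
    kl = np.sum(p_y_given_x * (np.log(p_y_given_x) - np.log(p_y)), axis=1)
    return np.exp(np.mean(kl))                      # exp(E_x KL(P(y|x) || P(y)))

# Sharp per-image distributions + uniform marginal -> high score.
sharp_diverse = np.array([[0.98, 0.01, 0.01],
                          [0.01, 0.98, 0.01],
                          [0.01, 0.01, 0.98]])
# Every P(y|x) uniform ("blurry" outputs) -> score of 1.
blurry = np.full((3, 3), 1 / 3)
```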
We don’t want memory GAN.
• Using k-nearest neighbor to check whether the
generator generates new objects
https://arxiv.org/pdf/1511.01844.pdf
Mode Dropping
Sanjeev Arora, Andrej Risteski, Yi Zhang, “Do GANs
learn the distribution? Some Theory and Empirics”, ICLR,
2018
If you sample 400 images,
with probability > 50%,
You would sample “identical” images.
There are approximately
0.16M different faces.
DCGAN
ALI
There are approximately
1M different faces.
Outline of Part 1
Generation by GAN
• Image Generation as Example
• Theory behind GAN
• Issues and Possible Solutions
• Evaluation
Conditional Generation
Feature Extraction
Unsupervised Conditional Generation
Relation to Reinforcement Learning
Text-to-Image
• Traditional supervised approach: an NN takes text (e.g. c1: “a dog is running”) as input and an image as output, trained so the output is as close as possible to the target of the NN output.
Since many different images match the same kind of text (“a dog is running”, “a bird is flying”), the network learns their average: a blurry image!
Conditional GAN
G: input condition c (“train”) and z from a normal distribution, output image x = G(c, z).
D (original): input x, output a scalar for whether x is a real image or not (real images: 1, generated images: 0).
But then the generator will learn to generate realistic images and completely ignore the input conditions. [Scott Reed, et al, ICML, 2016]
Conditional GAN
G: input condition c (“train”) and z from a normal distribution, output image x = G(c, z).
D (better): input both c and x, output a scalar for whether x is realistic AND c and x are matched.
True text-image pairs (train, train image): 1; (train, poor image): 0; (cat, train image): 0. [Scott Reed, et al, ICML, 2016]
• In each training iteration:
Learning D (x is realistic or not + c and x are matched or not):
• Sample m positive examples (c^1, x^1), (c^2, x^2), …, (c^m, x^m) from the database
• Sample m noise samples z^1, z^2, …, z^m from a distribution
• Obtain generated data x̃^1, x̃^2, …, x̃^m, where x̃^i = G(c^i, z^i)
• Sample m objects x̂^1, x̂^2, …, x̂^m from the database
• Update discriminator parameters θ_d to maximize
  Ṽ = (1/m) Σ_{i=1}^m log D(c^i, x^i) + (1/m) Σ_{i=1}^m log(1 − D(c^i, x̃^i)) + (1/m) Σ_{i=1}^m log(1 − D(c^i, x̂^i))
  θ_d ← θ_d + η ∇Ṽ(θ_d)
Learning G:
• Sample m noise samples z^1, z^2, …, z^m from a distribution
• Sample m conditions c^1, c^2, …, c^m from a database
• Update generator parameters θ_g to maximize
  Ṽ = (1/m) Σ_{i=1}^m log D(G(c^i, z^i)),  θ_g ← θ_g + η ∇Ṽ(θ_g)
Conditional GAN - Discriminator
Common design (almost every paper): one network takes both condition c and object x and outputs a single score covering “x is realistic AND c and x are matched”. [Takeru Miyato, et al., ICLR, 2018] [Han Zhang, et al., arXiv, 2017] [Augustus Odena et al., ICML, 2017]
Alternative design: one network scores whether x is realistic on its own; a second network takes c together with the first network’s representation of x and scores whether c and x are matched.
Conditional GAN
paired data
blue eyes
red hair
short hair
Collecting anime faces
and the description of its
characteristics
red hair,
green eyes
blue hair,
red eyes
Images generated by: Tsung-Han Wu, Jun-Chen Xie, Yen-Hao Chen, Po-Chun Chien
Stack GAN
Han Zhang, Tao Xu, Hongsheng
Li, Shaoting Zhang, Xiaogang
Wang, Xiaolei Huang, Dimitris Metaxas,
“StackGAN: Text to Photo-realistic Image
Synthesis with Stacked Generative Adversarial
Networks”, ICCV, 2017
Image-to-image
https://arxiv.org/pdf/1611.07004
G
𝑧
x = G(c,z)
𝑐
as close as
possible
Image-to-image
• Traditional supervised approach
NN Image
It is blurry because it is the
average of several images.
Testing:
input close
Image-to-image
• Experimental results
Testing:
input close GAN
G
𝑧
Image D scalar
GAN + close
Patch GAN
D
score
D D
score score
https://arxiv.org/pdf/1611.07004.pdf
Image super resolution
• Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew
Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes
Totz, Zehan Wang, Wenzhe Shi, “Photo-Realistic Single Image Super-Resolution
Using a Generative Adversarial Network”, CVPR, 2016
Image Completion
http://hi.cs.waseda.ac.jp/~iizu
ka/projects/completion/en/
Demo
https://www.youtube.com/watch?v=5Ua4NUKowPU
http://hi.cs.waseda.ac.jp/~iizuka/projects/com
pletion/en/
Waseda University
Video Generation
Generator
Discrimi
nator
Last frame is real
or generated
Discriminator thinks it is real
target
Minimize
distance
https://github.com/dyelax/Adversarial_Video_Generation
Outline of Part 1
Generation by GAN
• Image Generation as Example
• Theory behind GAN
• Issues and Possible Solutions
• Evaluation
Conditional Generation
Feature Extraction
Unsupervised Conditional Generation
Relation to Reinforcement Learning
InfoGAN
Regular
GAN
Modifying a specific dimension,
no clear meaning
What we expect Actually …
(The colors
represents the
characteristics.)
What is InfoGAN?
Split the input z = (z’, c). Generator: (c, z’) → x. Discriminator: x → scalar. Classifier: x → predicted code c.
The classifier predicts the code c that generated x, so the generator and classifier together form an “auto-encoder” (generator = decoder, classifier = encoder). The classifier and discriminator share parameters (only the last layer is different).
What is InfoGAN?
In this “auto-encoder” (generator = decoder, classifier = encoder), c must have a clear influence on x, so that the classifier can recover c from x.
https://arxiv.org/abs/1606.03657
Modifying Input Code
The input code determines the generator output: each dimension of the input vector represents some characteristic (e.g. longer hair, blue hair, open mouth). Understand the meaning of each dimension to control the output.
GAN+Autoencoder
• We have a generator (input z, output x).
• However, given x, how can we find z? Learn an encoder (input x, output z):
Encoder: x → z → Generator (Decoder, fixed) → output as close as possible to x.
Initialize the encoder from the discriminator (only modify the last layer); together with the generator it forms an autoencoder.
Attribute Representation
z_long = (1/N1) Σ_{x∈long} En(x) − (1/N2) Σ_{x'∉long} En(x')
z_long is the direction in the code space from short hair to long hair.
For an image x: En(x) + z_long = z', then generate Gen(z').
https://www.youtube.com/watch?v=kPEIJJsQr7U
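The attribute-vector formula above can be sketched with toy codes. Everything here is made up for illustration: the “encoder outputs” are random 4-dim vectors, and the hidden attribute direction is planted so the recovered z_long can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)
attr_direction = np.array([1.0, 0.0, 0.0, 0.0])    # planted ground-truth direction

# Pretend En(x) outputs: codes of images with / without the attribute.
codes_long = rng.normal(0, 0.1, (50, 4)) + attr_direction
codes_short = rng.normal(0, 0.1, (50, 4))

# z_long = mean code of "long" images minus mean code of the others.
z_long = codes_long.mean(axis=0) - codes_short.mean(axis=0)

x_code = np.array([0.0, 0.3, -0.2, 0.1])           # En(x) for some face
edited_code = x_code + z_long                       # z' = En(x) + z_long, then Gen(z')
```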
Photo
Editing
https://www.youtube.com/watch?v=9c4z6YsBGQ0
Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman and Alexei A. Efros. "Generative
Visual Manipulation on the Natural Image Manifold", ECCV, 2016.
Andrew Brock, Theodore Lim, J.M.
Ritchie, Nick Weston, Neural Photo Editing with
Introspective Adversarial Networks, arXiv preprint,
2017
Basic Idea
Move on the space of code z: editing in the code space lets the result fulfill the editing constraint while staying a realistic image, using the encoder and generator (decoder) above.
Back to z: given an image, how to find its code?
• Method 1: z* = arg min_z L(G(z), x_T), solved by gradient descent, where the difference L between G(z) and the target image x_T is pixel-wise or computed by another network.
• Method 2: use the encoder (input x, output z).
• Method 3: use the result of method 2 as the initialization of method 1.
Editing Photos
• z0 is the code of the input image
z* = arg min_z U(G(z)) + λ1 ‖z − z0‖² − λ2 D(G(z))
U(G(z)): does the image fulfill the constraint of editing?
λ1 ‖z − z0‖²: not too far away from the original image.
−λ2 D(G(z)): use the discriminator to check whether the image is realistic or not.
VAE-GAN
Encoder: x → z. Generator (Decoder): z → reconstruction, as close as possible to x. Discriminator: image → scalar.
VAE part: minimize reconstruction error; keep z close to normal.
Generator: minimize reconstruction error and cheat the discriminator.
Discriminator: discriminate real, generated and reconstructed images.
The discriminator provides the reconstruction loss.
[Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther, “Autoencoding beyond pixels using a learned similarity metric”, ICML, 2016]
Algorithm
• Initialize En, De, Dis
• In each iteration:
• Sample M images x^1, x^2, ⋯, x^M from the database
• Generate M codes z̃^1, z̃^2, ⋯, z̃^M from the encoder: z̃^i = En(x^i)
• Generate M images x̃^1, x̃^2, ⋯, x̃^M from the decoder: x̃^i = De(z̃^i)
• Sample M codes z^1, z^2, ⋯, z^M from the prior P(z)
• Generate M images x̂^1, x̂^2, ⋯, x̂^M from the decoder: x̂^i = De(z^i)
• Update En to decrease ‖x̃^i − x^i‖ and decrease KL(P(z̃^i|x^i) || P(z))
• Update De to decrease ‖x̃^i − x^i‖ and increase Dis(x̃^i) and Dis(x̂^i)
• Update Dis to increase Dis(x^i) and decrease Dis(x̃^i) and Dis(x̂^i)
Another kind of discriminator: given an image x, classify it as real / generated / reconstructed.
BiGAN
Encoder: image x (real) → code z (generated). Decoder: code z (from the prior distribution) → image x. Discriminator: given a pair (image x, code z), decide whether it comes from the encoder or from the decoder.
Algorithm
• Initialize encoder En, decoder De, discriminator Dis
• In each iteration:
• Sample M images x^1, x^2, ⋯, x^M from the database
• Generate M codes z̃^1, z̃^2, ⋯, z̃^M from the encoder: z̃^i = En(x^i)
• Sample M codes z^1, z^2, ⋯, z^M from the prior P(z)
• Generate M images x̃^1, x̃^2, ⋯, x̃^M from the decoder: x̃^i = De(z^i)
• Update Dis to increase Dis(x^i, z̃^i) and decrease Dis(x̃^i, z^i)
• Update En and De to decrease Dis(x^i, z̃^i) and increase Dis(x̃^i, z^i)
The discriminator evaluates the difference between the encoder’s joint distribution P(x, z) and the decoder’s joint distribution Q(x, z); at the optimum, P and Q would be the same.
BiGAN
Optimal encoder and decoder: En(x') = z' implies De(z') = x' for all x', and De(z'') = x'' implies En(x'') = z'' for all z''.
How about achieving the same optimum with two auto-encoders, x → En → De → x̃ and z → De → En → z̃?
Domain Adversarial Training
• Training and testing data are in different domains
Training
data:
Testing
data:
Generator
Generator
The same
distribution
feature
feature
Take digit
classification as example
blue points
red points
Domain Adversarial Training
Feature extractor (Generator): image → feature. Discriminator (Domain classifier): which domain does the feature come from?
Problem: the feature extractor can always output zero vectors, and the domain classifier fails.
Domain Adversarial Training
Feature extractor (Generator): image → feature. Discriminator (Domain classifier): which domain? Label predictor: which digits?
The feature extractor must not only cheat the domain classifier but also satisfy the label predictor at the same time.
Successfully applied on image classification [Ganin et al, ICML, 2015][Ajakan et al. JMLR, 2016]. More speech- and text-related applications in the following.
Domain-adversarial training
Yaroslav Ganin, Victor Lempitsky, Unsupervised Domain Adaptation by Backpropagation,
ICML, 2015
Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand,
Domain-Adversarial Training of Neural Networks, JMLR, 2016
Outline of Part 1
Generation by GAN
• Image Generation as Example
• Theory behind GAN
• Issues and Possible Solutions
• Evaluation
Conditional Generation
Feature Extraction
Unsupervised Conditional Generation
Relation to Reinforcement Learning
Domain X Domain Y
male female
It is good.
It’s a good day.
I love you.
It is bad.
It’s a bad day.
I don’t love you.
Unsupervised Conditional Generation
Transform an object from one domain to another
without paired data (e.g. style transfer)
GDomain X Domain Y
Unsupervised Conditional Generation
• Approach 1: Direct Transformation. G_X→Y maps domain X directly to domain Y; suitable for texture or color change.
• Approach 2: Projection to Common Space. EN_X (encoder of domain X) projects to a common code, DE_Y (decoder of domain Y) generates; suitable for larger change, keeping only the semantics (e.g. face → attribute → another domain).
Direct Transformation
G_X→Y: domain X → domain Y. D_Y outputs a scalar: does the input image belong to domain Y or not? The generator output becomes similar to domain Y.
Direct Transformation
But becoming similar to domain Y is not enough: the generator might ignore the input and produce any domain-Y image. Not what we want!
Direct Transformation
The issue can be avoided by network design: a simpler generator makes the input and output more closely related. [Tomer Galanti, et al. ICLR, 2018]
Direct Transformation
Another option: pass the input and the output through a pre-trained encoder network and make the two embeddings as close as possible. Baseline of DTN [Yaniv Taigman, et al., ICLR, 2017]
Direct Transformation
Cycle consistency: apply G_X→Y then G_Y→X and require the reconstruction to be as close as possible to the input; without it, G_X→Y may lack the information needed for reconstruction. D_Y scores whether the image belongs to domain Y. [Jun-Yan Zhu, et al., ICCV, 2017]
Cycle consistency is applied in both directions: G_X→Y then G_Y→X, and G_Y→X then G_X→Y, each as close as possible to the input, with discriminators D_X and D_Y for the two domains.
Cycle GAN – Silver Hair
• https://github.com/Aixile/chainer-cyclegan
Issue of Cycle Consistency
• CycleGAN: a Master of Steganography
[Casey Chu, et al., NIPS workshop, 2017]
𝐺Y→X𝐺 𝑋→𝑌
The information is hidden.
The same idea appears as Cycle GAN [Jun-Yan Zhu, et al., ICCV, 2017], Dual GAN [Zili Yi, et al., ICCV, 2017], and Disco GAN [Taeksoo Kim, et al., ICML, 2017].
StarGAN
For multiple domains,
considering starGAN
[Yunjey Choi, arXiv, 2017]
Unsupervised Conditional Generation
• Approach 1: Direct Transformation (for texture or color change).
• Approach 2: Projection to Common Space. EN_X (encoder of domain X) → common code → DE_Y (decoder of domain Y); larger change, keeping only the semantics (e.g. face → attribute → another domain).
Projection to Common Space – Target
EN_X and EN_Y encode images from domains X and Y into a common (face attribute) code; DE_X and DE_Y decode the code back into images of each domain.
Projection to Common Space – Training
Train the two auto-encoders (EN_X/DE_X and EN_Y/DE_Y) by minimizing reconstruction error.
Because we train the two auto-encoders separately, images with the same attribute may not project to the same position in the latent space.
Projection to Common Space – Training
Minimize reconstruction error, with discriminators D_X and D_Y on the two domains.
Solution 1: share (part of) the parameters of the two encoders EN_X, EN_Y and of the two decoders DE_X, DE_Y.
Couple GAN [Ming-Yu Liu, et al., NIPS, 2016]; UNIT [Ming-Yu Liu, et al., NIPS, 2017]
𝐸𝑁 𝑋
𝐸𝑁𝑌 𝐷𝐸 𝑌
𝐷𝐸 𝑋image
image
image
image
Minimizing reconstruction error
The domain discriminator forces the outputs of 𝐸𝑁𝑋 and
𝐸𝑁𝑌 to have the same distribution.
From 𝐸𝑁𝑋 or 𝐸𝑁𝑌
𝐷 𝑋
𝐷 𝑌
Discriminator
of X domain
Discriminator
of Y domain
Projection to Common Space
Training
Domain
Discriminator
𝐸𝑁𝑋 and 𝐸𝑁𝑌 fool the
domain discriminator
[Guillaume Lample, et al., NIPS, 2017]
𝐸𝑁 𝑋
𝐸𝑁𝑌 𝐷𝐸 𝑌
𝐷𝐸 𝑋image
image
image
image
𝐷 𝑋
𝐷 𝑌
Discriminator
of X domain
Discriminator
of Y domain
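The domain-discriminator objective above can be sketched with toy latent codes. A minimal example, assuming a linear discriminator and Gaussian stand-ins for the two encoder outputs (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy latent codes standing in for EN_X and EN_Y outputs.
z_x = rng.normal(loc=0.0, size=(64, 8))
z_y = rng.normal(loc=1.0, size=(64, 8))   # shifted: the distributions differ

w = rng.normal(size=8) * 0.1              # linear domain discriminator

def disc_loss(w):
    # D tries to label z_x as domain X (1) and z_y as domain Y (0).
    p_x, p_y = sigmoid(z_x @ w), sigmoid(z_y @ w)
    return -(np.log(p_x + 1e-9).mean() + np.log(1 - p_y + 1e-9).mean())

def encoder_fool_loss(w):
    # The encoders want the opposite: EN_Y tries to make D say "domain X",
    # which pushes the two latent distributions together.
    p_y = sigmoid(z_y @ w)
    return -np.log(p_y + 1e-9).mean()

print(disc_loss(w), encoder_fool_loss(w))
```

Alternating these two updates is what drives both encoders toward a shared latent space.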
Projection to Common Space
Training
Cycle Consistency:
Used in ComboGAN [Asha Anoosheh, et al., arXiv, 2017]
Minimizing reconstruction error
𝐸𝑁 𝑋
𝐸𝑁𝑌 𝐷𝐸 𝑌
𝐷𝐸 𝑋image
image
image
image
𝐷 𝑋
𝐷 𝑌
Discriminator
of X domain
Discriminator
of Y domain
Projection to Common Space
Training
Semantic Consistency:
Used in DTN [Yaniv Taigman, et al., ICLR, 2017] and
XGAN [Amélie Royer, et al., arXiv, 2017]
To the same
latent space
Outline of Part 1
Generation by GAN
• Image Generation as Example
• Theory behind GAN
• Issues and Possible Solutions
• Evaluation
Conditional Generation
Feature Extraction
Unsupervised Conditional Generation
Relation to Reinforcement Learning
Basic Components
EnvActor
Reward
Function
Video
Game
Go
Get 20 scores when
killing a monster
The rule
of GO
You cannot control
• Input of neural network: the observation of machine
represented as a vector or a matrix
• Output of neural network: each action corresponds to a
neuron in the output layer
…
…
NN as actor
pixels
fire
right
left
Score of an
action
0.7
0.2
0.1
Take the action
based on the
probability.
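The actor described above can be sketched as a tiny stochastic policy: score each action, turn scores into probabilities, then sample rather than take the argmax. A toy example, assuming a single linear layer and made-up feature sizes (a real actor is a deep network over game frames):

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

ACTIONS = ["left", "right", "fire"]
W = rng.normal(size=(16, 3))        # toy actor: one linear layer

def act(observation):
    probs = softmax(observation @ W)            # score of each action
    idx = rng.choice(len(ACTIONS), p=probs)     # sample by probability
    return ACTIONS[idx], probs

obs = rng.normal(size=16)           # stand-in for pixel features
name, probs = act(obs)
print(name, probs.round(2))
```

Sampling (instead of always taking the best-scored action) is what makes the policy stochastic, which matters for exploration during training.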
Neural network as Actor
Start with
observation 𝑠1 Observation 𝑠2 Observation 𝑠3
Example: Playing Video Game
Action 𝑎1: “right”
Obtain reward
𝑟1 = 0
Action 𝑎2 : “fire”
(kill an alien)
Obtain reward
𝑟2 = 5
Start with
observation 𝑠1 Observation 𝑠2 Observation 𝑠3
Example: Playing Video Game
After many turns
Action 𝑎 𝑇
Obtain reward 𝑟𝑇
Game Over
(spaceship destroyed)
This is an episode.
We want the total
reward to be maximized.
Total reward:
R = Σ_{t=1}^{T} r_t
Actor, Environment, Reward
𝜏 = 𝑠1, 𝑎1, 𝑠2, 𝑎2, ⋯ , 𝑠 𝑇, 𝑎 𝑇
Trajectory
Actor
𝑠1
𝑎1
Env
𝑠2
Env
𝑠1
𝑎1
Actor
𝑠2
𝑎2
Env
𝑠3
𝑎2
……
Reward Function → Discriminator
Reinforcement Learning v.s. GAN
Actor
𝑠1
𝑎1
Env
𝑠2
Env
𝑠1
𝑎1
Actor
𝑠2
𝑎2
Env
𝑠3
𝑎2
……
R(τ) = Σ_{t=1}^{T} r_t
Reward
𝑟1
Reward
𝑟2
“Black box”
You cannot use
backpropagation.
Actor → Generator Fixed
updated
Imitation Learning
We have demonstration of the expert.
Actor
𝑠1
𝑎1
Env
𝑠2
Env
𝑠1
𝑎1
Actor
𝑠2
𝑎2
Env
𝑠3
𝑎2
……
reward function is not available
Ƹ𝜏1, Ƹ𝜏2, ⋯ , Ƹ𝜏 𝑁
Each Ƹ𝜏 is a trajectory
of the expert.
Self driving: record
human drivers
Robot: grab the
arm of robot
Inverse Reinforcement Learning
Reward
Function
Environment
Optimal
Actor
Inverse Reinforcement
Learning
Using the reward function to find the optimal actor.
Modeling the reward can be easier: a simple reward
function can lead to a complex policy.
Reinforcement
Learning
Expert
demonstration
of the expert
Ƹ𝜏1, Ƹ𝜏2, ⋯ , Ƹ𝜏 𝑁
Framework of IRL
Expert ො𝜋
Actor 𝜋
Obtain
Reward Function R
Ƹ𝜏1, Ƹ𝜏2, ⋯ , Ƹ𝜏 𝑁
𝜏1, 𝜏2, ⋯ , 𝜏 𝑁
Find an actor based
on reward function R
By Reinforcement learning
Σ_{n=1}^{N} R(τ̂^n) > Σ_{n=1}^{N} R(τ^n)
Reward function
→ Discriminator
Actor
→ Generator
Reward
Function R
The expert is always
the best.
𝜏1, 𝜏2, ⋯ , 𝜏 𝑁
GAN
IRL
G
D
High score for real,
low score for generated
Find a G whose output
obtains large score from D
Ƹ𝜏1, Ƹ𝜏2, ⋯ , Ƹ𝜏 𝑁
Expert
Actor
Reward
Function
Larger reward for Ƹ𝜏 𝑛,
Lower reward for 𝜏
Find an actor that
obtains a large reward
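The IRL loop above (reward function as discriminator, actor as generator) can be sketched with toy trajectory features. All features, sizes, and the linear reward are illustrative assumptions, not the method of any specific paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Trajectories summarized as feature vectors phi(tau);
# real IRL works over full state-action sequences.
expert = rng.normal(loc=1.0, size=(16, 4))   # expert trajectories
actor  = rng.normal(loc=0.0, size=(16, 4))   # current actor's trajectories

w = np.zeros(4)                              # linear reward R(tau) = w . phi

def reward_gap(w):
    # The objective on the slide: sum R(expert) should exceed sum R(actor).
    return (expert @ w).sum() - (actor @ w).sum()

# "Discriminator" step: push the reward up on expert trajectories
# and down on the actor's trajectories.
for _ in range(50):
    grad = expert.sum(axis=0) - actor.sum(axis=0)
    w += 0.01 * grad

assert reward_gap(w) > 0     # the expert now obtains the larger total reward
print(reward_gap(w))
```

In the full framework this reward-update step alternates with a reinforcement-learning step that re-trains the actor against the current reward, just as G and D alternate in a GAN.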
Robot
http://rll.berkeley.edu/gcl/
Chelsea Finn, Sergey Levine, Pieter
Abbeel, "Guided Cost Learning: Deep
Inverse Optimal Control via Policy
Optimization", ICML, 2016
Concluding Remarks
Generation
Conditional Generation
Unsupervised Conditional Generation
Relation to Reinforcement Learning
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

[GAN by Hung-yi Lee] Part 1: General introduction of GAN

  • 1. Generative Adversarial Network and its Applications to Speech Processing and Natural Language Processing Hung-yi Lee
  • 2. Schedule • 14:00 – 15:10: Lecture • 15:10 – 15:30: Break • 15:30 – 16:40: Lecture • 16:40 – 17:00: Break • 17:00 – 18:00: Lecture
  • 5. Mihaela Rosca, Balaji Lakshminarayanan, David Warde-Farley, Shakir Mohamed, “Variational Approaches for Auto- Encoding Generative Adversarial Networks”, arXiv, 2017 All Kinds of GAN … https://github.com/hindupuravinash/the-gan-zoo GAN ACGAN BGAN DCGAN EBGAN fGAN GoGAN CGAN ……
  • 6. ICASSP Keyword search on session index page, so session names are included. Number of papers whose titles include the keyword 0 5 10 15 20 25 30 35 40 45 2012 2013 2014 2015 2016 2017 2018 generative adversarial reinforcement GAN becomes a very important technology.
  • 7. Outline Part I: General Introduction of Generative Adversarial Network (GAN) Part II: Applications to Speech and Text Processing Part III: Recent Progress at National Taiwan University
  • 8. Part I: General Introduction Generative Adversarial Network and its Applications to Speech Processing and Natural Language Processing
  • 9. Outline of Part 1 Generation by GAN • Image Generation as Example • Theory behind GAN • Issues and Possible Solutions • Evaluation Conditional Generation Feature Extraction Unsupervised Conditional Generation Relation to Reinforcement Learning
  • 10. Outline of Part 1 Generation by GAN • Image Generation as Example • Theory behind GAN • Issues and Possible Solutions • Evaluation Conditional Generation Feature Extraction Unsupervised Conditional Generation Relation to Reinforcement Learning
  • 12. Basic Idea of GAN Generator It is a neural network (NN), or a function. Generator 0.1 −3 ⋮ 2.4 0.9 imagevector Generator 3 −3 ⋮ 2.4 0.9 Generator 0.1 2.1 ⋮ 5.4 0.9 Generator 0.1 −3 ⋮ 2.4 3.5 high dimensional vector Powered by: http://mattya.github.io/chainer-DCGAN/ Each dimension of input vector represents some characteristics. Longer hair blue hair Open mouth
  • 13. Discri- minator scalar image Basic Idea of GAN It is a neural network (NN), or a function. Larger value means real, smaller value means fake. Discri- minator Discri- minator Discri- minator1.0 1.0 0.1 Discri- minator 0.1
  • 14. • Initialize generator and discriminator • In each training iteration: DG sample generated objects G Algorithm D Update vector vector vector vector 0000 1111 randomly sampled Database Step 1: Fix generator G, and update discriminator D Discriminator learns to assign high scores to real objects and low scores to generated objects. Fix
  • 15. • Initialize generator and discriminator • In each training iteration: DG Algorithm Step 2: Fix discriminator D, and update generator G Discri- minator NN Generator vector 0.13 hidden layer update fix Gradient Ascent large network Generator learns to “fool” the discriminator
  • 16. • Initialize generator and discriminator • In each training iteration: DG Learning D Sample some real objects: Generate some fake objects: G Algorithm D Update Learning G G D image 1111 image image image 1 update fix 0000vector vector vector vector vector vector vector vector fix
  • 17. Basic Idea of GAN NN Generator v1 Discri- minator v1 Real images: NN Generator v2 Discri- minator v2 NN Generator v3 Discri- minator v3 This is where the term “adversarial” comes from.
  • 18. Anime Face Generation 100 updates Source of training data: https://zhuanlan.zhihu.com/p/24767059
  • 25. The faces generated by machine. The images are generated by Yen-Hao Chen, Po- Chun Chien, Jun-Chen Xie, Tsung-Han Wu.
  • 27. (Variational) Auto-encoder As close as possible NN Encoder NN Decoder code NN Decoder code Randomly generate a vector as code Image ? = Generator = Generator
  • 28. Auto-encoder v.s. GAN As close as possibleNN Decoder code Genera- tor code = Generator Discri- minator If discriminator does not simply memorize the images, Generator learns the patterns of faces. Just copy an image 1 Auto-encoder GAN Fuzzy …
  • 29. FID[Martin Heusel, et al., NIPS, 2017]: Smaller is better [Mario Lucic, et al. arXiv, 2017]
  • 30. Outline of Part 1 Generation by GAN • Image Generation as Example • Theory behind GAN • Issues and Possible Solutions • Evaluation Conditional Generation Feature Extraction Unsupervised Conditional Generation Relation to Reinforcement Learning
  • 31. Generation • We want to find data distribution 𝑃𝑑𝑎𝑡𝑎 𝑥 𝑃𝑑𝑎𝑡𝑎 𝑥 High Probability Low Probability Image Space 𝑥: an image (a high- dimensional vector)
  • 32. Maximum Likelihood Estimation • Given a data distribution 𝑃𝑑𝑎𝑡𝑎 𝑥 (We can sample from it.) • We have a distribution 𝑃𝐺 𝑥; 𝜃 parameterized by 𝜃 • We want to find 𝜃 such that 𝑃𝐺 𝑥; 𝜃 close to 𝑃𝑑𝑎𝑡𝑎 𝑥 • E.g. 𝑃𝐺 𝑥; 𝜃 is a Gaussian Mixture Model, 𝜃 are means and variances of the Gaussians Sample 𝑥1 , 𝑥2 , … , 𝑥 𝑚 from 𝑃𝑑𝑎𝑡𝑎 𝑥 We can compute 𝑃𝐺 𝑥 𝑖; 𝜃 Likelihood of generating the samples 𝐿 = ෑ 𝑖=1 𝑚 𝑃𝐺 𝑥 𝑖; 𝜃 Find 𝜃∗ maximizing the likelihood
  • 33. Maximum Likelihood Estimation = Minimize KL Divergence: 𝜃∗ = arg max_𝜃 ∏ᵢ₌₁ᵐ P_G(xⁱ; 𝜃) = arg max_𝜃 log ∏ᵢ₌₁ᵐ P_G(xⁱ; 𝜃) = arg max_𝜃 Σᵢ₌₁ᵐ log P_G(xⁱ; 𝜃) ≈ arg max_𝜃 E_{x∼P_data}[log P_G(x; 𝜃)] = arg max_𝜃 { ∫ₓ P_data(x) log P_G(x; 𝜃) dx − ∫ₓ P_data(x) log P_data(x) dx } = arg min_𝜃 KL(P_data‖P_G), where x¹, x², …, xᵐ are sampled from P_data(x). How to define a general P_G?
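The MLE-to-KL connection on this slide can be verified numerically. Below is a minimal pure-Python sketch (not from the slides; it assumes a toy one-parameter model P_G(x; μ) = N(μ, 1)): the μ maximizing the average log-likelihood of samples from P_data is the sample mean, which is also the minimizer of the (estimated) KL divergence to the data.

```python
import math, random

random.seed(0)
# Samples from the (unknown) data distribution P_data = N(2, 1).
data = [random.gauss(2.0, 1.0) for _ in range(1000)]

def avg_log_lik(mu):
    # average log N(x; mu, 1) over the samples ~ E_{x~P_data}[log P_G(x; mu)]
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2
               for x in data) / len(data)

# Grid-search the parameter (here just the mean mu) maximizing the likelihood.
grid = [i / 100 for i in range(0, 401)]
mu_star = max(grid, key=avg_log_lik)
sample_mean = sum(data) / len(data)
print(mu_star, sample_mean)  # the MLE of mu lands on the sample mean
```

Because the likelihood only depends on the data through the sample average, the MLE coincides with the empirical minimizer of KL(P_data‖P_G).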
  • 34. Generator • A generator G is a network. The network defines a probability distribution 𝑃𝐺 generator G𝑧 𝑥 = 𝐺 𝑧 Normal Distribution 𝑃𝐺(𝑥) 𝑃𝑑𝑎𝑡𝑎 𝑥 as close as possible How to compute the divergence? 𝐺∗ = 𝑎𝑟𝑔 min 𝐺 𝐷𝑖𝑣 𝑃𝐺, 𝑃𝑑𝑎𝑡𝑎 Divergence between distributions 𝑃𝐺 and 𝑃𝑑𝑎𝑡𝑎 𝑥: an image (a high- dimensional vector)
  • 35. Discriminator 𝐺∗ = 𝑎𝑟𝑔 min 𝐺 𝐷𝑖𝑣 𝑃𝐺, 𝑃𝑑𝑎𝑡𝑎 Although we do not know the distributions of 𝑃𝐺 and 𝑃𝑑𝑎𝑡𝑎, we can sample from them. sample G vector vector vector vector sample from normal Database Sampling from 𝑷 𝑮 Sampling from 𝑷 𝒅𝒂𝒕𝒂
  • 36. Discriminator 𝐺∗ = 𝑎𝑟𝑔 min 𝐺 𝐷𝑖𝑣 𝑃𝐺, 𝑃𝑑𝑎𝑡𝑎 Discriminator : data sampled from 𝑃𝑑𝑎𝑡𝑎 : data sampled from 𝑃𝐺 train 𝑉 𝐺, 𝐷 = 𝐸 𝑥∼𝑃 𝑑𝑎𝑡𝑎 𝑙𝑜𝑔𝐷 𝑥 + 𝐸 𝑥∼𝑃 𝐺 𝑙𝑜𝑔 1 − 𝐷 𝑥 Example Objective Function for D (G is fixed) 𝐷∗ = 𝑎𝑟𝑔 max 𝐷 𝑉 𝐷, 𝐺Training: Using the example objective function is exactly the same as training a binary classifier. [Goodfellow, et al., NIPS, 2014] The maximum objective value is related to JS divergence. Sigmoid Output
  • 37. max 𝐷 𝑉 𝐺, 𝐷 • Given G, what is the optimal D* maximizing • Given x, the optimal D* maximizing 𝑉 = 𝐸 𝑥∼𝑃 𝑑𝑎𝑡𝑎 𝑙𝑜𝑔𝐷 𝑥 + 𝐸 𝑥∼𝑃 𝐺 𝑙𝑜𝑔 1 − 𝐷 𝑥 𝑃𝑑𝑎𝑡𝑎 𝑥 𝑙𝑜𝑔𝐷 𝑥 + 𝑃𝐺 𝑥 𝑙𝑜𝑔 1 − 𝐷 𝑥 = න 𝑥 𝑃𝑑𝑎𝑡𝑎 𝑥 𝑙𝑜𝑔𝐷 𝑥 𝑑𝑥 + න 𝑥 𝑃𝐺 𝑥 𝑙𝑜𝑔 1 − 𝐷 𝑥 𝑑𝑥 = න 𝑥 𝑃𝑑𝑎𝑡𝑎 𝑥 𝑙𝑜𝑔𝐷 𝑥 + 𝑃𝐺 𝑥 𝑙𝑜𝑔 1 − 𝐷 𝑥 𝑑𝑥 Assume that D(x) can be any function 𝑉 = 𝐸 𝑥∼𝑃 𝑑𝑎𝑡𝑎 𝑙𝑜𝑔𝐷 𝑥 +𝐸 𝑥∼𝑃 𝐺 𝑙𝑜𝑔 1 − 𝐷 𝑥
  • 38. max_D V(G,D): Given x, find the optimal D* maximizing P_data(x) log D(x) + P_G(x) log(1 − D(x)). Writing a = P_data(x), b = P_G(x), find D* maximizing f(D) = a log D + b log(1 − D): df(D)/dD = a × (1/D) + b × (1/(1−D)) × (−1) = 0 ⇒ a × (1/D*) = b × (1/(1 − D*)) ⇒ a(1 − D*) = b D* ⇒ a − aD* = bD* ⇒ a = (a + b)D* ⇒ D* = a/(a + b), i.e. D*(x) = P_data(x) / (P_data(x) + P_G(x)), with 0 < D* < 1. (Here V = E_{x∼P_data}[log D(x)] + E_{x∼P_G}[log(1 − D(x))].)
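The closed-form optimum D* = a/(a+b) derived on this slide can be checked by brute force. A small sketch, with hypothetical values standing in for a = P_data(x) and b = P_G(x) at one fixed point x:

```python
import math

a, b = 0.7, 0.3  # made-up values of P_data(x) and P_G(x) at a fixed x

def f(D):
    # pointwise objective from the slide: a log D + b log(1 - D)
    return a * math.log(D) + b * math.log(1 - D)

# Grid search over D in (0, 1) for the maximizer
grid = [i / 10000 for i in range(1, 10000)]
D_star = max(grid, key=f)
print(D_star, a / (a + b))  # both 0.7: the grid maximizer matches a/(a+b)
```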
  • 39. max 𝐷 𝑉 𝐺, 𝐷 = 𝐸 𝑥∼𝑃 𝑑𝑎𝑡𝑎 𝑙𝑜𝑔 𝑃𝑑𝑎𝑡𝑎 𝑥 𝑃𝑑𝑎𝑡𝑎 𝑥 + 𝑃𝐺 𝑥 +𝐸 𝑥∼𝑃 𝐺 𝑙𝑜𝑔 𝑃𝐺 𝑥 𝑃𝑑𝑎𝑡𝑎 𝑥 + 𝑃𝐺 𝑥 = න 𝑥 𝑃𝑑𝑎𝑡𝑎 𝑥 𝑙𝑜𝑔 𝑃𝑑𝑎𝑡𝑎 𝑥 𝑃𝑑𝑎𝑡𝑎 𝑥 + 𝑃𝐺 𝑥 𝑑𝑥 max 𝐷 𝑉 𝐺, 𝐷 + න 𝑥 𝑃𝐺 𝑥 𝑙𝑜𝑔 𝑃𝐺 𝑥 𝑃𝑑𝑎𝑡𝑎 𝑥 + 𝑃𝐺 𝑥 𝑑𝑥 2 2 1 2 1 2 +2𝑙𝑜𝑔 1 2 𝐷∗ 𝑥 = 𝑃𝑑𝑎𝑡𝑎 𝑥 𝑃𝑑𝑎𝑡𝑎 𝑥 + 𝑃𝐺 𝑥 = 𝑉 𝐺, 𝐷∗ 𝑉 = 𝐸 𝑥∼𝑃 𝑑𝑎𝑡𝑎 𝑙𝑜𝑔𝐷 𝑥 +𝐸 𝑥∼𝑃 𝐺 𝑙𝑜𝑔 1 − 𝐷 𝑥 −2𝑙𝑜𝑔2
  • 40. max_D V(G,D) = V(G,D*), with D*(x) = P_data(x)/(P_data(x) + P_G(x)): V(G,D*) = −2log2 + ∫ₓ P_data(x) log( P_data(x) / ((P_data(x)+P_G(x))/2) ) dx + ∫ₓ P_G(x) log( P_G(x) / ((P_data(x)+P_G(x))/2) ) dx = −2log2 + KL(P_data ‖ (P_data+P_G)/2) + KL(P_G ‖ (P_data+P_G)/2) = −2log2 + 2 JSD(P_data‖P_G), where JSD is the Jensen-Shannon divergence.
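The identity max_D V(G,D) = −2 log 2 + 2 JSD(P_data‖P_G) can be verified on small discrete distributions (the two distributions below are made up for illustration):

```python
import math

P_data = [0.5, 0.3, 0.2, 0.0]   # toy discrete distributions over 4 bins
P_G    = [0.1, 0.1, 0.4, 0.4]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

M = [(p + q) / 2 for p, q in zip(P_data, P_G)]
jsd = 0.5 * kl(P_data, M) + 0.5 * kl(P_G, M)

# Plug the optimal discriminator D*(x) = P_data(x)/(P_data(x)+P_G(x)) into V
V = (sum(p * math.log(p / (p + q)) for p, q in zip(P_data, P_G) if p > 0)
     + sum(q * math.log(q / (p + q)) for p, q in zip(P_data, P_G) if q > 0))

print(V, -2 * math.log(2) + 2 * jsd)  # equal: the slide's identity holds
```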
  • 41. Discriminator 𝐺∗ = 𝑎𝑟𝑔 min 𝐺 𝐷𝑖𝑣 𝑃𝐺, 𝑃𝑑𝑎𝑡𝑎 Discriminator : data sampled from 𝑃𝑑𝑎𝑡𝑎 : data sampled from 𝑃𝐺 train hard to discriminatesmall divergence Discriminator train easy to discriminatelarge divergence 𝐷∗ = 𝑎𝑟𝑔 max 𝐷 𝑉 𝐷, 𝐺 Training: (cannot make objective large)
  • 42. 𝐺1 𝐺2 𝐺3 𝑉 𝐺1 , 𝐷 𝑉 𝐺2 , 𝐷 𝑉 𝐺3 , 𝐷 𝐷 𝐷 𝐷 𝑉 𝐺1 , 𝐷1 ∗ Divergence between 𝑃𝐺1 and 𝑃𝑑𝑎𝑡𝑎 𝐺∗ = 𝑎𝑟𝑔 min 𝐺 𝐷𝑖𝑣 𝑃𝐺, 𝑃𝑑𝑎𝑡𝑎 The maximum objective value is related to JS divergence. 𝐷∗ = 𝑎𝑟𝑔 max 𝐷 𝑉 𝐷, 𝐺 max 𝐷 𝑉 𝐺, 𝐷
  • 43. 𝐺∗ = 𝑎𝑟𝑔 min 𝐺 𝐷𝑖𝑣 𝑃𝐺, 𝑃𝑑𝑎𝑡𝑎max 𝐷 𝑉 𝐺, 𝐷 The maximum objective value is related to JS divergence. • Initialize generator and discriminator • In each training iteration: Step 1: Fix generator G, and update discriminator D Step 2: Fix discriminator D, and update generator G 𝐷∗ = 𝑎𝑟𝑔 max 𝐷 𝑉 𝐷, 𝐺 [Goodfellow, et al., NIPS, 2014]
  • 44. Algorithm • To find the best G minimizing the loss function 𝐿 𝐺 , 𝜃 𝐺 ← 𝜃 𝐺 − 𝜂 Τ𝜕𝐿 𝐺 𝜕𝜃 𝐺 𝑓 𝑥 = max{𝑓1 𝑥 , 𝑓2 𝑥 , 𝑓3 𝑥 } 𝑑𝑓 𝑥 𝑑𝑥 =? 𝑓1 𝑥 𝑓2 𝑥 𝑓3 𝑥 Τ𝑑𝑓1 𝑥 𝑑𝑥 Τ𝑑𝑓2 𝑥 𝑑𝑥 Τ𝑑𝑓3 𝑥 𝑑𝑥 Τ𝑑𝑓𝑖 𝑥 𝑑𝑥 If 𝑓𝑖 𝑥 is the max one 𝐺∗ = 𝑎𝑟𝑔 min 𝐺 max 𝐷 𝑉 𝐺, 𝐷 𝐿 𝐺 𝜃 𝐺 defines G
  • 45. Algorithm • Given 𝐺0 • Find 𝐷0 ∗ maximizing 𝑉 𝐺0, 𝐷 • 𝜃 𝐺 ← 𝜃 𝐺 − 𝜂 Τ𝜕𝑉 𝐺, 𝐷0 ∗ 𝜕𝜃 𝐺 Obtain 𝐺1 • Find 𝐷1 ∗ maximizing 𝑉 𝐺1, 𝐷 • 𝜃 𝐺 ← 𝜃 𝐺 − 𝜂 Τ𝜕𝑉 𝐺, 𝐷1 ∗ 𝜕𝜃 𝐺 Obtain 𝐺2 • …… 𝑉 𝐺0, 𝐷0 ∗ is the JS divergence between 𝑃𝑑𝑎𝑡𝑎 𝑥 and 𝑃𝐺0 𝑥 𝑉 𝐺1, 𝐷1 ∗ is the JS divergence between 𝑃𝑑𝑎𝑡𝑎 𝑥 and 𝑃𝐺1 𝑥 𝐺∗ = 𝑎𝑟𝑔 min 𝐺 max 𝐷 𝑉 𝐺, 𝐷 𝐿 𝐺 Decrease JS divergence(?) Decrease JS divergence(?) Using Gradient Ascent
  • 46. Algorithm • Given 𝐺0 • Find 𝐷0 ∗ maximizing 𝑉 𝐺0, 𝐷 • 𝜃 𝐺 ← 𝜃 𝐺 − 𝜂 Τ𝜕𝑉 𝐺, 𝐷0 ∗ 𝜕𝜃 𝐺 Obtain 𝐺1 𝑉 𝐺0, 𝐷0 ∗ is the JS divergence between 𝑃𝑑𝑎𝑡𝑎 𝑥 and 𝑃𝐺0 𝑥 𝐺∗ = 𝑎𝑟𝑔 min 𝐺 max 𝐷 𝑉 𝐺, 𝐷 𝐿 𝐺 Decrease JS divergence(?) 𝑉 𝐺0 , 𝐷 𝐷0 ∗ 𝑉 𝐺1 , 𝐷 𝐷0 ∗ 𝑉 𝐺0 , 𝐷0 ∗ 𝑉 𝐺1 , 𝐷0 ∗ smaller 𝑉 𝐺1 , 𝐷1 ∗ …… Assume 𝐷0 ∗ ≈ 𝐷1 ∗ Don’t update G too much
  • 47. In practice … • Given G, how to compute max 𝐷 𝑉 𝐺, 𝐷 • Sample 𝑥1, 𝑥2, … , 𝑥 𝑚 from 𝑃𝑑𝑎𝑡𝑎 𝑥 , sample ෤𝑥1, ෤𝑥2, … , ෤𝑥 𝑚 from generator 𝑃𝐺 𝑥 ෨𝑉 = 1 𝑚 ෍ 𝑖=1 𝑚 𝑙𝑜𝑔𝐷 𝑥 𝑖 + 1 𝑚 ෍ 𝑖=1 𝑚 𝑙𝑜𝑔 1 − 𝐷 ෤𝑥 𝑖Maximize 𝑉 = 𝐸 𝑥∼𝑃 𝑑𝑎𝑡𝑎 𝑙𝑜𝑔𝐷 𝑥 +𝐸 𝑥∼𝑃 𝐺 𝑙𝑜𝑔 1 − 𝐷 𝑥 Minimize Cross-entropy Binary Classifier 𝑥1, 𝑥2, … , 𝑥 𝑚 from 𝑃𝑑𝑎𝑡𝑎 𝑥 ෤𝑥1, ෤𝑥2, … , ෤𝑥 𝑚 from 𝑃𝐺 𝑥 D is a binary classifier with sigmoid output (can be deep) Positive examples Negative examples =
  • 48. Algorithm: Initialize 𝜃_d for D and 𝜃_g for G. In each training iteration: (Learning D, repeat k times) Sample m examples x¹, x², …, xᵐ from the data distribution P_data(x); sample m noise samples z¹, z², …, zᵐ from the prior P_prior(z); obtain generated data x̃ⁱ = G(zⁱ); update discriminator parameters 𝜃_d to maximize Ṽ = (1/m) Σᵢ₌₁ᵐ log D(xⁱ) + (1/m) Σᵢ₌₁ᵐ log(1 − D(x̃ⁱ)), i.e. 𝜃_d ← 𝜃_d + η∇Ṽ(𝜃_d). (Learning G, only once) Sample another m noise samples z¹, z², …, zᵐ from the prior P_prior(z); update generator parameters 𝜃_g to minimize Ṽ = (1/m) Σᵢ₌₁ᵐ log D(xⁱ) + (1/m) Σᵢ₌₁ᵐ log(1 − D(G(zⁱ))), i.e. 𝜃_g ← 𝜃_g − η∇Ṽ(𝜃_g). This can only find a lower bound of max_D V(G,D).
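The "Learning D" half of this algorithm can be sketched end to end on 1-D data. Below, a toy logistic discriminator D(x) = σ(wx + c) is trained by gradient ascent on the empirical objective Ṽ; the data distributions, step size, and iteration count are hypothetical, and the closed-form gradients replace autodiff:

```python
import math, random

random.seed(1)

def sigmoid(t):  # numerically stable sigmoid
    return 1 / (1 + math.exp(-t)) if t >= 0 else math.exp(t) / (1 + math.exp(t))

def log_sig(t):  # numerically stable log sigmoid(t)
    return -math.log1p(math.exp(-t)) if t >= 0 else t - math.log1p(math.exp(t))

reals = [random.gauss(2.0, 0.5) for _ in range(200)]   # x^i ~ P_data
fakes = [random.gauss(0.0, 0.5) for _ in range(200)]   # x~^i = G(z^i), G fixed

def V(w, c):
    # V~ = (1/m) sum log D(x^i) + (1/m) sum log(1 - D(x~^i))
    return (sum(log_sig(w * x + c) for x in reals) / len(reals)
            + sum(log_sig(-(w * x + c)) for x in fakes) / len(fakes))

def grad(w, c):
    # closed-form d/dw and d/dc of V~ for the logistic D
    gw = (sum((1 - sigmoid(w * x + c)) * x for x in reals) / len(reals)
          - sum(sigmoid(w * x + c) * x for x in fakes) / len(fakes))
    gc = (sum(1 - sigmoid(w * x + c) for x in reals) / len(reals)
          - sum(sigmoid(w * x + c) for x in fakes) / len(fakes))
    return gw, gc

w, c, eta = 0.0, 0.0, 0.1
v0 = V(w, c)                       # = 2 log(1/2) before any update
for _ in range(200):               # theta_d <- theta_d + eta * grad V~
    gw, gc = grad(w, c)
    w, c = w + eta * gw, c + eta * gc
print(v0, V(w, c))                 # V~ rises as D separates real from fake
```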
  • 49. Objective Function for Generator in Real Implementation 𝑉 = 𝐸 𝑥∼𝑃 𝑑𝑎𝑡𝑎 𝑙𝑜𝑔𝐷 𝑥 𝑉 = 𝐸 𝑥∼𝑃 𝐺 −𝑙𝑜𝑔 𝐷 𝑥 Real implementation: label x from PG as positive −𝑙𝑜𝑔 𝐷 𝑥 𝑙𝑜𝑔 1 − 𝐷 𝑥 +𝐸 𝑥∼𝑃 𝐺 𝑙𝑜𝑔 1 − 𝐷 𝑥 𝐷 𝑥 Slow at the beginning Minimax GAN (MMGAN) Non-saturating GAN (NSGAN)
  • 50. Using the divergence you like  [Sebastian Nowozin, et al., NIPS, 2016] Can we use other divergence?
  • 51. Outline of Part 1 Generation by GAN • Image Generation as Example • Theory behind GAN • Issues and Possible Solutions • Evaluation Conditional Generation Feature Extraction Unsupervised Conditional Generation Relation to Reinforcement Learning More tips and tricks: https://github.com/soumith/ganhacks
  • 52. JS divergence is not suitable • In most cases, 𝑃𝐺 and 𝑃𝑑𝑎𝑡𝑎 are not overlapped. • 1. The nature of data • 2. Sampling Both 𝑃𝑑𝑎𝑡𝑎 and 𝑃𝐺 are low-dim manifold in high-dim space. 𝑃𝑑𝑎𝑡𝑎 𝑃𝐺 The overlap can be ignored. Even though 𝑃𝑑𝑎𝑡𝑎 and 𝑃𝐺 have overlap. If you do not have enough sampling ……
  • 53. What is the problem of JS divergence? JS(P_G0, P_data) = log2, JS(P_G1, P_data) = log2, ……, JS(P_G100, P_data) = 0. JS divergence is log2 if two distributions do not overlap, so P_G0 and P_G1 are equally bad: the same objective value (same divergence) is obtained even though P_G1 is closer to P_data. Intuition: if two distributions do not overlap, a binary classifier achieves 100% accuracy.
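The saturation at log 2 is easy to reproduce with discrete distributions on disjoint supports (the supports below are made up; only disjointness matters):

```python
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Disjoint supports: JSD saturates at log 2 no matter how far apart they are
P_G0 = [1.0, 0.0, 0.0, 0.0]   # far from the data
P_G1 = [0.0, 1.0, 0.0, 0.0]   # closer, but still disjoint
P_dt = [0.0, 0.0, 0.0, 1.0]
print(jsd(P_G0, P_dt), jsd(P_G1, P_dt), math.log(2))  # all equal log 2
```

Both generators get the same divergence, so the gradient gives G no signal to move P_G1 closer than P_G0.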
  • 54. Wasserstein GAN (WGAN): Earth Mover’s Distance • Considering one distribution P as a pile of earth, and another distribution Q as the target • The average distance the earth mover has to move the earth. 𝑃 𝑄 d 𝑊 𝑃, 𝑄 = 𝑑
  • 55. Wasserstein distance Source of image: https://vincentherrmann.github.io/blog/wasserstein/ 𝑃 𝑄 Using the “moving plan” with the smallest average distance to define the Wasserstein distance. There are many possible “moving plans”. Smaller distance? Larger distance?
  • 56. WGAN: Earth Mover’s Distance Source of image: https://vincentherrmann.github.io/blog/wasserstein/ 𝑃 𝑄 Using the “moving plan” with the smallest average distance to define the earth mover’s distance. There many possible “moving plans”. Best “moving plans” of this example
  • 57. A “moving plan” is a matrix: the value of each element is the amount of earth moved from one position x_p to another x_q. For a moving plan γ (from the set of all possible plans Π), the average distance is B(γ) = Σ_{x_p, x_q} γ(x_p, x_q) ‖x_p − x_q‖. The Earth Mover’s Distance is the cost of the best plan: W(P, Q) = min_{γ∈Π} B(γ).
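For 1-D empirical samples of equal size, the best moving plan simply matches sorted points, which gives a tiny closed-form sketch of W(P, Q) (toy point sets, not from the slides):

```python
def w1(xs, ys):
    # 1-D earth mover's distance between two equal-size empirical samples:
    # the optimal plan matches sorted points, so W = mean |x_(i) - y_(i)|
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

P = [0.0, 1.0, 2.0, 3.0]
Q = [x + 5.0 for x in P]   # Q is P shifted by d = 5
print(w1(P, Q))            # -> 5.0, exactly the shift distance d
```

Unlike JS divergence, this distance keeps decreasing as Q slides toward P, which is why it gives the generator useful gradients.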
  • 58. 𝑃𝑑𝑎𝑡𝑎𝑃𝐺0 𝑃𝑑𝑎𝑡𝑎𝑃𝐺50 𝐽𝑆 𝑃𝐺0 , 𝑃𝑑𝑎𝑡𝑎 = 𝑙𝑜𝑔2 𝑃𝑑𝑎𝑡𝑎𝑃𝐺100 …… …… 𝑑0 𝑑50 𝐽𝑆 𝑃𝐺50 , 𝑃𝑑𝑎𝑡𝑎 = 𝑙𝑜𝑔2 𝐽𝑆 𝑃𝐺100 , 𝑃𝑑𝑎𝑡𝑎 = 0 𝑊 𝑃𝐺0 , 𝑃𝑑𝑎𝑡𝑎 = 𝑑0 𝑊 𝑃𝐺50 , 𝑃𝑑𝑎𝑡𝑎 = 𝑑50 𝑊 𝑃𝐺100 , 𝑃𝑑𝑎𝑡𝑎 = 0 Why Earth Mover’s Distance? 𝐷𝑓 𝑃𝑑𝑎𝑡𝑎||𝑃𝐺 𝑊 𝑃𝑑𝑎𝑡𝑎, 𝑃𝐺
  • 59. WGAN 𝑉 𝐺, 𝐷 = max 𝐷∈1−𝐿𝑖𝑝𝑠𝑐ℎ𝑖𝑡𝑧 𝐸 𝑥~𝑃 𝑑𝑎𝑡𝑎 𝐷 𝑥 − 𝐸 𝑥~𝑃 𝐺 𝐷 𝑥 Evaluate wasserstein distance between 𝑃𝑑𝑎𝑡𝑎 and 𝑃𝐺 [Martin Arjovsky, et al., arXiv, 2017] D has to be smooth enough. real −∞ generated D ∞ Without the constraint, the training of D will not converge. Keeping the D smooth forces D(x) become ∞ and −∞
  • 60. WGAN 𝑉 𝐺, 𝐷 = max 𝐷∈1−𝐿𝑖𝑝𝑠𝑐ℎ𝑖𝑡𝑧 𝐸 𝑥~𝑃 𝑑𝑎𝑡𝑎 𝐷 𝑥 − 𝐸 𝑥~𝑃 𝐺 𝐷 𝑥 Evaluate wasserstein distance between 𝑃𝑑𝑎𝑡𝑎 and 𝑃𝐺 [Martin Arjovsky, et al., arXiv, 2017] How to fulfill this constraint?D has to be smooth enough. 𝑓 𝑥1 − 𝑓 𝑥2 ≤ 𝐾 𝑥1 − 𝑥2 Lipschitz Function K=1 for "1 − 𝐿𝑖𝑝𝑠𝑐ℎ𝑖𝑡𝑧" Output change Input change 1−Lipschitz not 1−Lipschitz Do not change fast Weight Clipping Force the parameters w between c and -c After parameter update, if w > c, w = c; if w < -c, w = -c
  • 61. Improved WGAN (WGAN-GP): V(G,D) = max_{D∈1-Lipschitz} { E_{x∼P_data}[D(x)] − E_{x∼P_G}[D(x)] } ≈ max_D { E_{x∼P_data}[D(x)] − E_{x∼P_G}[D(x)] − λ ∫ₓ max(0, ‖∇ₓD(x)‖ − 1) dx } ≈ max_D { E_{x∼P_data}[D(x)] − E_{x∼P_G}[D(x)] − λ E_{x∼P_penalty}[max(0, ‖∇ₓD(x)‖ − 1)] }. A differentiable function is 1-Lipschitz if and only if it has gradients with norm less than or equal to 1 everywhere (‖∇ₓD(x)‖ ≤ 1 for all x). Instead of preferring ‖∇ₓD(x)‖ ≤ 1 for all x, prefer ‖∇ₓD(x)‖ ≤ 1 for x sampled from P_penalty.
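For a linear critic D(x) = w·x the input gradient is w everywhere, which makes the penalty term easy to see in isolation. A sketch with hypothetical weights and sample points, using the one-sided max(0, ‖∇‖ − 1) form from this slide; a real implementation would get ∇ₓD via autodiff:

```python
import math, random

random.seed(2)
# Toy linear critic D(x) = w . x: its input gradient ∇x D(x) is w at every x,
# so the penalty is identical at every interpolated point x_hat.
w = [1.5, -2.0]
grad_norm = math.sqrt(sum(wi * wi for wi in w))

x_real, x_fake, lam = [1.0, 1.0], [-1.0, 0.0], 10.0
eps = random.random()
# Sample x_hat from P_penalty: on the line between a real and a fake sample
x_hat = [eps * r + (1 - eps) * f for r, f in zip(x_real, x_fake)]

gp = lam * max(0.0, grad_norm - 1.0) ** 2   # one-sided gradient penalty
print(x_hat, grad_norm, gp)   # grad norm 2.5 -> penalty 22.5, pushing D toward norm 1
```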
  • 62. Improved WGAN (WGAN-GP) 𝑃𝑑𝑎𝑡𝑎 𝑃𝐺 Only give gradient constraint to the region between 𝑃𝑑𝑎𝑡𝑎 and 𝑃𝐺 because they influence how 𝑃𝐺 moves to 𝑃𝑑𝑎𝑡𝑎 −𝜆𝐸 𝑥~𝑃 𝑝𝑒𝑛𝑎𝑙𝑡𝑦 𝑚𝑎𝑥 0, 𝛻𝑥 𝐷 𝑥 − 1 } ≈ max 𝐷 {𝐸 𝑥~𝑃 𝑑𝑎𝑡𝑎 𝐷 𝑥 − 𝐸 𝑥~𝑃 𝐺 𝐷 𝑥𝑉 𝐺, 𝐷 𝑃𝑝𝑒𝑛𝑎𝑙𝑡𝑦 “Given that enforcing the Lipschitz constraint everywhere is intractable, enforcing it only along these straight lines seems sufficient and experimentally results in good performance.”
  • 63. “Simply penalizing overly large gradients also works in theory, but experimentally we found that this approach converged faster and to better optima.” Improved WGAN (WGAN-GP) 𝑃𝑑𝑎𝑡𝑎 𝑃𝐺 −𝜆𝐸 𝑥~𝑃 𝑝𝑒𝑛𝑎𝑙𝑡𝑦 𝑚𝑎𝑥 0, 𝛻𝑥 𝐷 𝑥 − 1 } ≈ max 𝐷 {𝐸 𝑥~𝑃 𝑑𝑎𝑡𝑎 𝐷 𝑥 − 𝐸 𝑥~𝑃 𝐺 𝐷 𝑥𝑉 𝐺, 𝐷 𝑃𝑝𝑒𝑛𝑎𝑙𝑡𝑦 𝛻𝑥 𝐷 𝑥 − 1 2 𝐷 𝑥𝐷 𝑥 The gradient norm in this region = 1 More: Improved the improved WGAN
  • 64. Spectrum Norm Spectral Normalization → Keep gradient norm smaller than 1 everywhere [Miyato, et al., ICLR, 2018]
  • 65. • In each training iteration: • Sample m examples 𝑥1, 𝑥2, … , 𝑥 𝑚 from data distribution 𝑃𝑑𝑎𝑡𝑎 𝑥 • Sample m noise samples 𝑧1, 𝑧2, … , 𝑧 𝑚 from the prior 𝑃𝑝𝑟𝑖𝑜𝑟 𝑧 • Obtaining generated data ෤𝑥1 , ෤𝑥2 , … , ෤𝑥 𝑚 , ෤𝑥 𝑖 = 𝐺 𝑧 𝑖 • Update discriminator parameters 𝜃 𝑑 to maximize • ෨𝑉 = 1 𝑚 σ𝑖=1 𝑚 𝑙𝑜𝑔𝐷 𝑥 𝑖 + 1 𝑚 σ𝑖=1 𝑚 𝑙𝑜𝑔 1 − 𝐷 ෤𝑥 𝑖 • 𝜃 𝑑 ← 𝜃 𝑑 + 𝜂𝛻 ෨𝑉 𝜃 𝑑 • Sample another m noise samples 𝑧1, 𝑧2, … , 𝑧 𝑚 from the prior 𝑃𝑝𝑟𝑖𝑜𝑟 𝑧 • Update generator parameters 𝜃𝑔 to minimize • ෨𝑉 = 1 𝑚 σ𝑖=1 𝑚 𝑙𝑜𝑔𝐷 𝑥 𝑖 + 1 𝑚 σ𝑖=1 𝑚 𝑙𝑜𝑔 1 − 𝐷 𝐺 𝑧 𝑖 • 𝜃𝑔 ← 𝜃𝑔 − 𝜂𝛻 ෨𝑉 𝜃𝑔 Algorithm of Original GAN Repeat k times Learning D Learning G Only Once 𝐷 𝑥 𝑖 𝐷 ෤𝑥 𝑖− No sigmoid for the output of D WGAN Weight clipping / Gradient Penalty … 𝐷 𝐺 𝑧 𝑖−
  • 66. Energy-based GAN (EBGAN) • Using an autoencoder as discriminator D Discriminator 0 for the best images Generator is the same. - 0.1 EN DE Autoencoder X -1 -0.1 [Junbo Zhao, et al., arXiv, 2016] Using the negative reconstruction error of auto-encoder to determine the goodness Benefit: The auto-encoder can be pre-train by real images without generator.
  • 67. EBGAN realgen gen Hard to reconstruct, easy to destroy m 0 is for the best. Do not have to be very negative Auto-encoder based discriminator only gives limited region large value.
  • 68. Mode Collapse : real data : generated data Training with too many iterations ……
  • 69. Mode Dropping BEGAN on CelebA Generator at iteration t Generator at iteration t+1 Generator at iteration t+2 Generator switches mode during training
  • 70. Ensemble Generator 1 Generator 2 …… …… Train a set of generators: 𝐺1, 𝐺2, ⋯ , 𝐺 𝑁 Random pick a generator 𝐺𝑖 Use 𝐺𝑖 to generate the image To generate an image
  • 71. Outline of Part 1 Generation by GAN • Image Generation as Example • Theory behind GAN • Issues and Possible Solutions • Evaluation Conditional Generation Feature Extraction Unsupervised Conditional Generation Relation to Reinforcement Learning
  • 72. Likelihood : real data (not observed during training) : generated data 𝑥 𝑖 Log Likelihood: Generator PG 𝐿 = 1 𝑁 ෍ 𝑖 𝑙𝑜𝑔𝑃𝐺 𝑥 𝑖 We cannot compute 𝑃𝐺 𝑥 𝑖 . We can only sample from 𝑃𝐺. Prior Distribution
  • 73. Likelihood - Kernel Density Estimation • Estimate the distribution of PG(x) from sampling Generator PG Now we have an approximation of 𝑃𝐺, so we can compute 𝑃𝐺 𝑥 𝑖 for each real data 𝑥 𝑖 Then we can compute the likelihood. Each sample is the mean of a Gaussian with the same covariance.
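A minimal kernel density estimation sketch in the spirit of this slide (Gaussian kernels centered on generator samples, all with the same hypothetical bandwidth h):

```python
import math, random

random.seed(3)
samples = [random.gauss(0.0, 1.0) for _ in range(500)]   # draws from P_G

def kde_logp(x, samples, h=0.3):
    # Gaussian kernel of bandwidth h centered on every generator sample
    k = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return math.log(k / (len(samples) * h * math.sqrt(2 * math.pi)))

# Log-likelihood of held-out "real" points under the estimated P_G
real = [0.1, -0.5, 1.2]
L = sum(kde_logp(x, samples) for x in real) / len(real)
print(L)  # only roughly log N(x; 0, 1): KDE recovers the density approximately
```

The bandwidth h is a free choice, which is one reason KDE-based likelihoods are an unreliable way to score generators.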
  • 74. Likelihood v.s. Quality • Low likelihood, high quality? • High likelihood, low quality? Considering a model generating good images (small variance) Generator PG Generated data Data for evaluation 𝑥 𝑖 𝑃𝐺 𝑥 𝑖 = 0 G1 𝐿 = 1 𝑁 ෍ 𝑖 𝑙𝑜𝑔𝑃𝐺 𝑥 𝑖 G2 X 0.99 X 0.01 = −𝑙𝑜𝑔100 + 1 𝑁 ෍ 𝑖 𝑙𝑜𝑔𝑃𝐺 𝑥 𝑖 4.6100
  • 75. Objective Evaluation: Feed each generated image x into an off-the-shelf image classifier (e.g. Inception net, VGG, etc.) to get P(y|x), where y is the class. A concentrated distribution P(y|x) means higher visual quality; a uniform class marginal P(y) = (1/N) Σₙ P(yⁿ|xⁿ) means higher variety. [Tim Salimans, et al., NIPS, 2016]
  • 76. Objective Evaluation: Inception Score = Σₓ Σ_y P(y|x) log P(y|x) − Σ_y P(y) log P(y), i.e. the negative entropy of P(y|x) (low entropy ⇒ quality) plus the entropy of P(y) (high entropy ⇒ variety), where P(y) = (1/N) Σₙ P(yⁿ|xⁿ).
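The Inception Score combines these two terms as exp(E_x[KL(P(y|x)‖P(y))]). A short sketch; the class-probability rows below are made-up extremes showing the two cases the slide describes:

```python
import math

def inception_score(pyx_rows):
    n, k = len(pyx_rows), len(pyx_rows[0])
    p_y = [sum(row[j] for row in pyx_rows) / n for j in range(k)]   # marginal P(y)
    # IS = exp( mean_x KL( P(y|x) || P(y) ) )
    kl = sum(sum(p * math.log(p / p_y[j]) for j, p in enumerate(row) if p > 0)
             for row in pyx_rows) / n
    return math.exp(kl)

# Sharp and diverse: every image classified confidently, all classes covered
sharp = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Blurry: the classifier is uniform on every image
blurry = [[1 / 3] * 3] * 3
print(inception_score(sharp), inception_score(blurry))  # 3.0 vs 1.0
```

With k classes the score ranges from 1 (worst) to k (best), matching the 3-class example above.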
  • 77. We don’t want memory GAN. • Using k-nearest neighbor to check whether the generator generates new objects https://arxiv.org/pdf/1511.01844.pdf
  • 78. Mode Dropping Sanjeev Arora, Andrej Risteski, Yi Zhang, “Do GANs learn the distribution? Some Theory and Empirics”, ICLR, 2018 If you sample 400 images, with probability > 50%, You would sample “identical” images. There are approximately 0.16M different faces. DCGAN ALI There are approximately 1M different faces.
  • 79. Outline of Part 1 Generation by GAN • Image Generation as Example • Theory behind GAN • Issues and Possible Solutions • Evaluation Conditional Generation Feature Extraction Unsupervised Conditional Generation Relation to Reinforcement Learning
  • 80. Target of NN output Text-to-Image • Traditional supervised approach NN Image Text: “train” a dog is running a bird is flying A blurry image! c1: a dog is running as close as possible
  • 81. Conditional GAN D (original) scalar𝑥 G 𝑧Normal distribution x = G(c,z) c: train x is real image or not Image Real images: Generated images: 1 0 Generator will learn to generate realistic images …. But completely ignore the input conditions. [Scott Reed, et al, ICML, 2016]
  • 82. Conditional GAN D (better) scalar 𝑐 𝑥 True text-image pairs: G 𝑧Normal distribution x = G(c,z) c: train Image x is realistic or not + c and x are matched or not (train , ) (train , )(cat , ) [Scott Reed, et al, ICML, 2016] 1 00
  • 83. • In each training iteration: • Sample m positive examples 𝑐1 , 𝑥1 , 𝑐2 , 𝑥2 , … , 𝑐 𝑚 , 𝑥 𝑚 from database • Sample m noise samples 𝑧1, 𝑧2, … , 𝑧 𝑚 from a distribution • Obtaining generated data ෤𝑥1, ෤𝑥2, … , ෤𝑥 𝑚 , ෤𝑥 𝑖 = 𝐺 𝑐 𝑖, 𝑧 𝑖 • Sample m objects ො𝑥1, ො𝑥2, … , ො𝑥 𝑚 from database • Update discriminator parameters 𝜃 𝑑 to maximize • ෨𝑉 = 1 𝑚 σ𝑖=1 𝑚 𝑙𝑜𝑔𝐷 𝑐 𝑖, 𝑥 𝑖 + 1 𝑚 ෍ 𝑖=1 𝑚 𝑙𝑜𝑔 1 − 𝐷 𝑐 𝑖, ෤𝑥 𝑖 + 1 𝑚 ෍ 𝑖=1 𝑚 𝑙𝑜𝑔 1 − 𝐷 𝑐 𝑖, ො𝑥 𝑖 • 𝜃 𝑑 ← 𝜃 𝑑 + 𝜂𝛻 ෨𝑉 𝜃 𝑑 • Sample m noise samples 𝑧1 , 𝑧2 , … , 𝑧 𝑚 from a distribution • Sample m conditions 𝑐1, 𝑐2, … , 𝑐 𝑚 from a database • Update generator parameters 𝜃𝑔 to maximize • ෨𝑉 = 1 𝑚 σ𝑖=1 𝑚 𝑙𝑜𝑔 𝐷 𝐺 𝑐 𝑖, 𝑧 𝑖 , 𝜃𝑔 ← 𝜃𝑔 + 𝜂𝛻 ෨𝑉 𝜃𝑔 Learning D Learning G
  • 84. x is realistic or not + c and x are matched or not Conditional GAN - Discriminator [Takeru Miyato, et al., ICLR, 2018] [Han Zhang, et al., arXiv, 2017] [Augustus Odena et al., ICML, 2017] condition c object x Network Network Network score Network Network (almost every paper) condition c object x c and x are matched or not x is realistic or not
  • 85. Conditional GAN paired data blue eyes red hair short hair Collecting anime faces and the description of its characteristics red hair, green eyes blue hair, red eyes 圖片生成: 吳宗翰、謝濬丞、 陳延昊、錢柏均
  • 86. Stack GAN Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas, “StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks”, ICCV, 2017
  • 88. as close as possible Image-to-image • Traditional supervised approach NN Image It is blurry because it is the average of several images. Testing: input close
  • 89. Image-to-image • Experimental results Testing: input close GAN G 𝑧 Image D scalar GAN + close
  • 90. Patch GAN D score D D score score https://arxiv.org/pdf/1611.07004.pdf
  • 91. Image super resolution • Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi, “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network”, CVPR, 2016
  • 94. Video Generation Generator Discrimi nator Last frame is real or generated Discriminator thinks it is real target Minimize distance
  • 96. Outline of Part 1 Generation by GAN • Image Generation as Example • Theory behind GAN • Issues and Possible Solutions • Evaluation Conditional Generation Feature Extraction Unsupervised Conditional Generation Relation to Reinforcement Learning
  • 97. InfoGAN Regular GAN Modifying a specific dimension, no clear meaning What we expect Actually … (The colors represents the characteristics.)
  • 98. What is InfoGAN? Discrimi nator Classifier scalarGenerator=z z’ c x c Predict the code c that generates x “Auto- encoder” Parameter sharing (only the last layer is different) encoder decoder
  • 99. What is InfoGAN? Classifier Generator=z z’ c x c Predict the code c that generates x “Auto- encoder” encoder decoder c must have clear influence on x The classifier can recover c from x.
  • 101. Modifying Input Code Generator 0.1 −3 ⋮ 2.4 0.9 Generator 3 −3 ⋮ 2.4 0.9 Generator 0.1 −3 ⋮ 6.4 0.9 Generator 0.1 −3 ⋮ 2.4 3.5 Each dimension of input vector represents some characteristics. Longer hair blue hair Open mouth The input code determines the generator output. Understand the meaning of each dimension to control the output.
  • 102. GAN+Autoencoder • We have a generator (input z, output x) • However, given x, how can we find z? • Learn an encoder (input x, output z) Generator (Decoder) Encoder z xx as close as possible fixed Discriminator Init (only modify the last layer) Autoencoder
  • 103. Attribute Representation 𝑧𝑙𝑜𝑛𝑔 = 1 𝑁1 ෍ 𝑥∈𝑙𝑜𝑛𝑔 𝐸𝑛 𝑥 − 1 𝑁2 ෍ 𝑥′∉𝑙𝑜𝑛𝑔 𝐸𝑛 𝑥′ Short Hair Long Hair 𝐸𝑛 𝑥 + 𝑧𝑙𝑜𝑛𝑔 = 𝑧′𝑥 𝐺𝑒𝑛 𝑧′ 𝑧𝑙𝑜𝑛𝑔
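The attribute vector z_long = mean(En(long)) − mean(En(not long)) is plain vector averaging. A sketch with made-up 2-D codes standing in for encoder outputs (with a real model, En and Gen would be the trained networks):

```python
# Attribute-vector arithmetic in code space. The codes below are hypothetical
# encoder outputs En(x); with a real GAN these come from the trained encoder.
long_hair  = [[2.0, 5.0], [2.2, 5.4]]   # En(x) for long-hair images
short_hair = [[2.1, 1.0], [1.9, 1.2]]   # En(x) for the remaining images

def mean(vs):
    return [sum(col) / len(vs) for col in zip(*vs)]

# z_long = average code of long-hair faces minus average code of the rest
z_long = [a - b for a, b in zip(mean(long_hair), mean(short_hair))]

x_code = short_hair[0]           # En(x) for a short-hair face
z_new = [zi + d for zi, d in zip(x_code, z_long)]  # Gen(z_new): add long hair
print(z_long, z_new)
```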
  • 105. https://www.youtube.com/watch?v=9c4z6YsBGQ0 Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman and Alexei A. Efros. "Generative Visual Manipulation on the Natural Image Manifold", ECCV, 2016.
  • 106. Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston, Neural Photo Editing with Introspective Adversarial Networks, arXiv preprint, 2017
  • 107. Basic Idea space of code z Why move on the code space? Fulfill the constraint
  • 108. Generator (Decoder) Encoder z xx as close as possible Back to z • Method 1 • Method 2 • Method 3 𝑧∗ = 𝑎𝑟𝑔 min 𝑧 𝐿 𝐺 𝑧 , 𝑥 𝑇 𝑥 𝑇 𝑧∗ Difference between G(z) and xT  Pixel-wise  By another network Using the results from method 2 as the initialization of method 1 Gradient Descent
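Method 1 on this slide (gradient descent on z to minimize L(G(z), x_T)) can be demonstrated with a toy linear generator; A, z_true, and the step size below are hypothetical:

```python
# Recover the code z behind a target image x_T by gradient descent on
# L(G(z), x_T) = ||G(z) - x_T||^2, with a toy linear generator G(z) = A z.
A = [[2.0, 0.0], [0.0, 3.0]]           # hypothetical generator weights

def G(z):
    return [sum(A[i][j] * z[j] for j in range(2)) for i in range(2)]

z_true = [1.0, -2.0]
x_T = G(z_true)                        # the target "image"

z, eta = [0.0, 0.0], 0.05
for _ in range(500):
    r = [g - t for g, t in zip(G(z), x_T)]                 # residual G(z) - x_T
    grad = [sum(A[i][j] * r[i] for i in range(2)) for j in range(2)]  # A^T r
    z = [zj - eta * 2 * gj for zj, gj in zip(z, grad)]     # gradient step
print(z)  # converges to z_true = [1.0, -2.0]
```

Method 2 (a trained encoder) would give this z in one shot; method 3 uses it to initialize the loop above.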
  • 109. Editing Photos • z0 is the code of the input image 𝑧∗ = 𝑎𝑟𝑔 min 𝑧 𝑈 𝐺 𝑧 + 𝜆1 𝑧 − 𝑧0 2 − 𝜆2 𝐷 𝐺 𝑧 Does it fulfill the constraint of editing? Not too far away from the original image Using discriminator to check the image is realistic or notimage
  • 110. VAE-GAN Discrimi nator Generator (Decoder) Encoder z x scalarx as close as possible VAE GAN  Minimize reconstruction error  z close to normal  Minimize reconstruction error  Cheat discriminator  Discriminate real, generated and reconstructed images Discriminator provides the reconstruction loss Anders Boesen, Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther, “Autoencoding beyond pixels using a learned similarity metric”, ICML. 2016
  • 111. • Initialize En, De, Dis • In each iteration: • Sample M images 𝑥1, 𝑥2, ⋯ , 𝑥 𝑀 from database • Generate M codes ǁ𝑧1, ǁ𝑧2, ⋯ , ǁ𝑧 𝑀 from encoder • ǁ𝑧 𝑖 = 𝐸𝑛 𝑥 𝑖 • Generate M images ෤𝑥1 , ෤𝑥2 , ⋯ , ෤𝑥 𝑀 from decoder • ෤𝑥 𝑖 = 𝐷𝑒 ǁ𝑧 𝑖 • Sample M codes 𝑧1 , 𝑧2 , ⋯ , 𝑧 𝑀 from prior P(z) • Generate M images ො𝑥1 , ො𝑥2 , ⋯ , ො𝑥 𝑀 from decoder • ො𝑥 𝑖 = 𝐷𝑒 𝑧 𝑖 • Update En to decrease ෤𝑥 𝑖 − 𝑥 𝑖 , decrease KL(P( ǁ𝑧 𝑖|xi)||P(z)) • Update De to decrease ෤𝑥 𝑖 − 𝑥 𝑖 , increase 𝐷𝑖𝑠 ෤𝑥 𝑖 and 𝐷𝑖𝑠 ො𝑥 𝑖 • Update Dis to increase 𝐷𝑖𝑠 𝑥 𝑖 , decrease 𝐷𝑖𝑠 ෤𝑥 𝑖 and 𝐷𝑖𝑠 ො𝑥 𝑖 Algorithm Another kind of discriminator: Discriminator x real gen recon
  • 112. BiGAN Encoder Decoder Discriminator Image x code z Image x code z Image x code z from encoder or decoder? (real) (generated) (from prior distribution)
  • 113. Algorithm • Initialize encoder En, decoder De, discriminator Dis • In each iteration: • Sample M images 𝑥1, 𝑥2, ⋯ , 𝑥 𝑀 from database • Generate M codes ǁ𝑧1 , ǁ𝑧2 , ⋯ , ǁ𝑧 𝑀 from encoder • ǁ𝑧 𝑖 = 𝐸𝑛 𝑥 𝑖 • Sample M codes 𝑧1, 𝑧2, ⋯ , 𝑧 𝑀 from prior P(z) • Generate M codes ෤𝑥1, ෤𝑥2, ⋯ , ෤𝑥 𝑀 from decoder • ෤𝑥 𝑖 = 𝐷𝑒 𝑧 𝑖 • Update Dis to increase 𝐷𝑖𝑠 𝑥 𝑖, ǁ𝑧 𝑖 , decrease 𝐷𝑖𝑠 ෤𝑥 𝑖, 𝑧 𝑖 • Update En and De to decrease 𝐷𝑖𝑠 𝑥 𝑖 , ǁ𝑧 𝑖 , increase 𝐷𝑖𝑠 ෤𝑥 𝑖, 𝑧 𝑖
  • 114. Encoder Decoder Discriminator Image x code z Image x code z Image x code z from encoder or decoder? (real) (generated) (from prior distribution) 𝑃 𝑥, 𝑧 𝑄 𝑥, 𝑧 Evaluate the difference between P and QP and Q would be the same Optimal encoder and decoder: En(x’) = z’ De(z’) = x’ En(x’’) = z’’De(z’’) = x’’ For all x’ For all z’’
  • 115. BiGAN En(x’) = z’ De(z’) = x’ En(x’’) = z’’De(z’’) = x’’ For all x’ For all z’’ How about? Encoder Decoder x z ෤𝑥 Encoder Decoder x z ǁ𝑧 En, De En, De Optimal encoder and decoder:
  • 116. Domain Adversarial Training • Training and testing data are in different domains Training data: Testing data: Generator Generator The same distribution feature feature Take digit classification as example
  • 117. blue points red points Domain Adversarial Training feature extractor (Generator) Discriminator (Domain classifier) image Which domain? Always output zero vectors Domain Classifier Fails
  • 118. Domain Adversarial Training feature extractor (Generator) Discriminator (Domain classifier) image Label predictor Which digits? Not only cheat the domain classifier, but satisfying label predictor at the same time More speech- and text-related applications in the following. Successfully applied on image classification [Ganin et al, ICML, 2015][Ajakan et al. JMLR, 2016 ] Which domain?
  • 119. Domain-adversarial training Yaroslav Ganin, Victor Lempitsky, Unsupervised Domain Adaptation by Backpropagation, ICML, 2015 Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, Domain-Adversarial Training of Neural Networks, JMLR, 2016
  • 120. Outline of Part 1 Generation by GAN • Image Generation as Example • Theory behind GAN • Issues and Possible Solutions • Evaluation Conditional Generation Feature Extraction Unsupervised Conditional Generation Relation to Reinforcement Learning
  • 121. Domain X Domain Y male female It is good. It’s a good day. I love you. It is bad. It’s a bad day. I don’t love you. Unsupervised Conditional Generation Transform an object from one domain to another without paired data (e.g. style transfer) GDomain X Domain Y
  • 122. Unsupervised Conditional Generation • Approach 1: Direct Transformation: G_{X→Y}: Domain X → Domain Y (for texture or color change) • Approach 2: Projection to Common Space: EN_X (encoder of domain X) → DE_Y (decoder of domain Y); larger change, only keep the semantics (Domain X: face, Domain Y: attribute)
  • 123. Direct Transformation. G_{X→Y}: Domain X → Domain Y, become similar to domain Y. D_Y: trained on Domain Y vs. Domain X → scalar: input image belongs to domain Y or not
  • 124. Direct Transformation. G_{X→Y}: Domain X → Domain Y, become similar to domain Y. D_Y: scalar, input image belongs to domain Y or not. Problem: the generator may ignore its input, which is not what we want!
  • 125. Direct Transformation. G_{X→Y}: Domain X → Domain Y, become similar to domain Y. The ignore-input issue can be avoided by network design: a simpler generator makes the input and output more closely related [Tomer Galanti, et al., ICLR, 2018]
  • 126. Direct Transformation. G_{X→Y}: Domain X → Domain Y, become similar to domain Y. D_Y: scalar, input image belongs to domain Y or not. A pre-trained encoder network maps input and output to codes that should be as close as possible (baseline of DTN [Yaniv Taigman, et al., ICLR, 2017])
  • 127. Direct Transformation. G_{X→Y} then G_{Y→X}: the reconstruction should be as close as possible to the input (cycle consistency); without it, there is a lack of information for reconstruction. D_Y: Domain Y → scalar, input image belongs to domain Y or not [Jun-Yan Zhu, et al., ICCV, 2017]
  • 128. Direct Transformation (both directions). G_{X→Y} then G_{Y→X}: as close as possible; G_{Y→X} then G_{X→Y}: as close as possible. D_Y: scalar, belongs to domain Y or not; D_X: scalar, belongs to domain X or not
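  The two "as close as possible" constraints are the cycle-consistency losses. A minimal numpy sketch, with hypothetical invertible toy functions standing in for G_{X→Y} and G_{Y→X}:

```python
import numpy as np

def cycle_consistency_loss(x, y, G_XY, G_YX):
    """L1 cycle loss in both directions: x -> G_XY -> G_YX should
    recover x, and y -> G_YX -> G_XY should recover y."""
    loss_x = np.abs(G_YX(G_XY(x)) - x).mean()
    loss_y = np.abs(G_XY(G_YX(y)) - y).mean()
    return loss_x + loss_y

# Hypothetical "generators": a pair of exactly inverse affine maps.
G_XY = lambda x: 2.0 * x + 1.0
G_YX = lambda y: (y - 1.0) / 2.0

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])
loss = cycle_consistency_loss(x, y, G_XY, G_YX)  # exact inverses give zero loss
```

  In the real model the generators are neural networks and this loss is added to the two adversarial losses from D_X and D_Y.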
  • 129. Cycle GAN – Silver Hair • https://github.com/Aixile/chainer-cyclegan
  • 131. Issue of Cycle Consistency • CycleGAN: a Master of Steganography [Casey Chu, et al., NIPS workshop, 2017]: G_{X→Y} then G_{Y→X} can reconstruct the input even when the intermediate output looks wrong, because the information is hidden in the intermediate image.
  • 132. Cycle GAN [Jun-Yan Zhu, et al., ICCV, 2017], Dual GAN [Zili Yi, et al., ICCV, 2017], Disco GAN [Taeksoo Kim, et al., ICML, 2017]
  • 133. StarGAN • For transformation among multiple domains, consider StarGAN [Yunjey Choi, et al., arXiv, 2017]
  • 137. Unsupervised Conditional Generation • Approach 1: Direct Transformation: G_{X→Y}: Domain X → Domain Y (for texture or color change) • Approach 2: Projection to Common Space: EN_X (encoder of domain X) → DE_Y (decoder of domain Y); larger change, only keep the semantics (Domain X: face, Domain Y: attribute)
  • 138. Projection to Common Space. Target: image (Domain X) → EN_X → latent code → DE_Y → image (Domain Y), and image (Domain Y) → EN_Y → latent code → DE_X → image (Domain X); the common latent space represents the face attributes
  • 139. Projection to Common Space. Training: image (Domain X) → EN_X → DE_X → image; image (Domain Y) → EN_Y → DE_Y → image; minimizing reconstruction error
  • 140. Projection to Common Space. Training: two auto-encoders (EN_X/DE_X, EN_Y/DE_Y, with discriminators D_X and D_Y of each domain) minimizing reconstruction error. Because we train the two auto-encoders separately, images with the same attribute may not project to the same position in the latent space.
  • 141. Projection to Common Space. Training: sharing the parameters of the encoders (EN_X, EN_Y) and of the decoders (DE_X, DE_Y). Couple GAN [Ming-Yu Liu, et al., NIPS, 2016], UNIT [Ming-Yu Liu, et al., NIPS, 2017]
  • 142. Projection to Common Space. Training with a domain discriminator: given a latent code, it predicts whether the code came from EN_X or EN_Y, while EN_X and EN_Y learn to fool it; this forces the outputs of EN_X and EN_Y to have the same distribution [Guillaume Lample, et al., NIPS, 2017]. The auto-encoders still minimize reconstruction error; D_X and D_Y are the discriminators of each image domain.
  • 143. Projection to Common Space. Training with cycle consistency: minimize the reconstruction error of the cross-domain path image → EN_X → DE_Y → image → EN_Y → DE_X → image. Used in ComboGAN [Asha Anoosheh, et al., arXiv, 2017]
  • 144. Projection to Common Space. Training with semantic consistency: encode, decode into the other domain, then encode again; the two codes should map to the same point in the latent space. Used in DTN [Yaniv Taigman, et al., ICLR, 2017] and XGAN [Amélie Royer, et al., arXiv, 2017]
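  Semantic consistency can be written as a loss on latent codes: encode, decode into the other domain, re-encode, and penalize the distance between the two codes. A toy numpy sketch with hypothetical encoder/decoder functions sharing a 2-d latent space:

```python
import numpy as np

def semantic_consistency_loss(x, EN_X, DE_Y, EN_Y):
    """Encode x, decode into domain Y, re-encode; penalize the L2
    distance between the two latent codes (as in DTN / XGAN)."""
    z = EN_X(x)
    z2 = EN_Y(DE_Y(z))
    return ((z - z2) ** 2).mean()

# Hypothetical toy networks: keep the first two dims as the latent code.
EN_X = lambda x: x[:2]                                   # encode to latent
DE_Y = lambda z: np.concatenate([z, z.sum(keepdims=True)])  # decode to domain Y
EN_Y = lambda y: y[:2]                                   # recovers the latent part

x = np.array([0.3, -0.7, 5.0])
loss = semantic_consistency_loss(x, EN_X, DE_Y, EN_Y)  # consistent pair gives zero
```

  Unlike cycle consistency, this loss compares latent codes rather than images, so it only preserves the semantics, not the pixels.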
  • 145. Outline of Part 1 Generation by GAN • Image Generation as Example • Theory behind GAN • Issues and Possible Solutions • Evaluation Conditional Generation Feature Extraction Unsupervised Conditional Generation Relation to Reinforcement Learning
  • 146. Basic Components: Env, Actor, Reward Function. Video game: reward function, e.g., get 20 scores when killing a monster. Go: reward function follows the rule of Go. You cannot control the Env and the reward function.
  • 147. Neural network as Actor • Input of the neural network: the observation of the machine, represented as a vector or a matrix • Output of the neural network: each action corresponds to a neuron in the output layer (the score of an action). Example: pixels → NN as actor → left 0.7, right 0.2, fire 0.1. Take the action based on the probability.
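  The output layer described here is a softmax over actions, and the action is sampled from the resulting distribution rather than taken greedily. A minimal sketch with hypothetical logits:

```python
import numpy as np

# Output layer of a policy network: one neuron per action (left/right/fire).
logits = np.array([2.0, 0.5, -1.0])            # hypothetical pre-softmax scores
probs = np.exp(logits) / np.exp(logits).sum()  # softmax -> action probabilities

# Take the action based on the probability (stochastic policy).
rng = np.random.default_rng(0)
action = rng.choice(len(probs), p=probs)
```

  Sampling (rather than argmax) is what lets the actor explore and makes the policy-gradient updates well defined.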
  • 148. Example: Playing Video Game. Start with observation s_1 → Action a_1: "right", obtain reward r_1 = 0 → Observation s_2 → Action a_2: "fire" (kill an alien), obtain reward r_2 = 5 → Observation s_3
  • 149. Example: Playing Video Game. After many turns: Action a_T, obtain reward r_T, Game Over (spaceship destroyed). This is an episode. We want the total reward to be maximized. Total reward: R = Σ_{t=1}^{T} r_t
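  The episode return on this slide is just the sum of the per-step rewards; a discounted variant is also common in practice. A minimal sketch with hypothetical rewards:

```python
# Hypothetical per-step rewards r_1 .. r_T of one episode.
rewards = [0, 5, 0, 0, 10]

# Total (undiscounted) return R = sum_t r_t, as on the slide.
R = sum(rewards)

# Discounted return, a common variant: sum_t gamma^(t-1) * r_t.
gamma = 0.9
R_disc = sum(gamma ** t * r for t, r in enumerate(rewards))
```
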
  • 150. Actor, Environment, Reward. Trajectory τ = {s_1, a_1, s_2, a_2, ⋯, s_T, a_T}: Env → s_1 → Actor → a_1 → Env → s_2 → Actor → a_2 → Env → s_3 → ⋯
  • 151. Reinforcement Learning v.s. GAN. Env → s_1 → Actor → a_1 (reward r_1) → Env → s_2 → Actor → a_2 (reward r_2) → ⋯; R(τ) = Σ_{t=1}^{T} r_t. Actor → Generator (updated); Reward Function → Discriminator (fixed, unlike in GAN). The reward function is a "black box": you cannot use backpropagation through it.
  • 152. Imitation Learning. The reward function is not available, but we have demonstrations of the expert: τ̂_1, τ̂_2, ⋯, τ̂_N, where each τ̂ is a trajectory of the expert. Self-driving: record human drivers. Robot: grab the arm of the robot.
  • 153. Inverse Reinforcement Learning. Reinforcement Learning: Reward Function + Environment → Optimal Actor. Inverse Reinforcement Learning: Environment + demonstrations of the expert (τ̂_1, τ̂_2, ⋯, τ̂_N) → Reward Function, then use the reward function to find the optimal actor. Modeling reward can be easier: a simple reward function can lead to a complex policy.
  • 154. Framework of IRL. Expert π̂ provides τ̂_1, τ̂_2, ⋯, τ̂_N; actor π generates τ_1, τ_2, ⋯, τ_N. Obtain a reward function R under which the expert is always the best: Σ_{n=1}^{N} R(τ̂_n) > Σ_{n=1}^{N} R(τ_n). Find an actor based on reward function R by reinforcement learning, and iterate. Reward function → Discriminator; Actor → Generator
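  The constraint Σ R(τ̂_n) > Σ R(τ_n) suggests a simple update rule when the reward is linear in trajectory features, in the spirit of the discriminator-like step of this loop. A toy numpy sketch, assuming hypothetical trajectory feature vectors phi(τ):

```python
import numpy as np

def irl_reward_step(w, expert_feats, actor_feats, lr=0.1):
    """One gradient step on a linear reward R(tau) = w . phi(tau):
    raise the reward of expert trajectories, lower that of the
    actor's, so the expert scores higher."""
    grad = expert_feats.mean(axis=0) - actor_feats.mean(axis=0)
    return w + lr * grad

w = np.zeros(2)
expert = np.array([[1.0, 0.0], [0.8, 0.1]])  # hypothetical phi(tau_hat_n)
actor = np.array([[0.2, 0.9], [0.1, 1.0]])   # hypothetical phi(tau_n)
w = irl_reward_step(w, expert, actor)

# After the update, total expert reward exceeds total actor reward.
expert_R = (expert @ w).sum()
actor_R = (actor @ w).sum()
```

  In the full framework this step alternates with a reinforcement-learning step that re-optimizes the actor against the updated reward, mirroring the generator/discriminator alternation in GAN training.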
  • 155. GAN v.s. IRL. GAN: D gives a high score for real and a low score for generated; find a G whose output obtains a large score from D. IRL: expert provides τ̂_1, τ̂_2, ⋯, τ̂_N; actor generates τ_1, τ_2, ⋯, τ_N; the reward function gives larger reward for τ̂_n and lower reward for τ_n; find an actor that obtains large reward
  • 156. Robot http://rll.berkeley.edu/gcl/ Chelsea Finn, Sergey Levine, Pieter Abbeel, ” Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization”, ICML, 2016
  • 157. Concluding Remarks Generation Conditional Generation Unsupervised Conditional Generation Relation to Reinforcement Learning