Description
Generative adversarial networks (GANs) have recently emerged as a promising approach to generative modeling. A GAN consists of a generative network and a discriminative network; through the competition between the two, it learns to model the data distribution. In addition to modeling image/video distributions in computer vision problems, the framework can define visual concepts from examples, largely eliminating the need to hand-craft objective functions for various computer vision problems. In this tutorial, we present an overview of generative adversarial network research. We cover several recent theoretical studies and training techniques, as well as several vision applications of generative adversarial networks.
3. Before we start
• Due to the exponential growth of the field, we will miss several important GAN works.
• The GAN Zoo: https://deephunt.in/the-gan-zoo-79597dc8c347
Image from Deep Hunt, blog by Avinash Hindupur
4. Outline
1. Introduction
2. GAN objective
3. GAN training
4. Joint image distribution and video distribution
5. Computer vision applications
8. Step 1: observe a set of samples
Example: Gaussian mixture models
$p(x \mid \theta) = \sum_i \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i)$
Step 2: assume a GMM model
Step 3: perform maximum likelihood learning
$\max_\theta \sum_{x^{(j)} \in \text{Dataset}} \log p(x^{(j)} \mid \theta)$
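The three steps can be sketched numerically. Below is a minimal 1-D illustration (the function names and toy parameters are mine, not from the tutorial):

```python
import numpy as np

def gmm_density(x, weights, means, stds):
    """Step 2's model: p(x|theta) = sum_i pi_i N(x | mu_i, sigma_i^2), 1-D case."""
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    for w, m, s in zip(weights, means, stds):
        dens += w * np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    return dens

def log_likelihood(data, weights, means, stds):
    """Step 3's objective: sum_j log p(x^(j) | theta)."""
    return float(np.sum(np.log(gmm_density(data, weights, means, stds))))

# Step 1: observe samples (here drawn from two well-separated modes).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(2, 0.5, 100)])
theta = ([0.5, 0.5], [-2.0, 2.0], [0.5, 0.5])  # (pi, mu, sigma)

# Parameters that match the data score higher than a mismatched (shifted) model.
print(log_likelihood(data, *theta) > log_likelihood(data + 5.0, *theta))  # True
```

A full maximum-likelihood fit would optimize `theta` (e.g., with EM); the sketch only evaluates the objective.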
9. Example: Gaussian mixture models
$p(x \mid \theta) = \sum_i \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i)$
1. [Density Estimation]
2. [Sample Generation]
Good for low-dimensional data
10. Manifold assumption
• Data lie approximately on a manifold of much lower
dimension than the input space.
Latent space Data space
Images credit Ward, A. D. et al 2007
12. Generative modeling via autoencoders and GMM
• Two-stage process
1. Apply an autoencoder to embed images in a low-dimensional space.
2. Fit a GMM to the embeddings.
• Drawbacks
• Not end-to-end
• The embedding space can still be high-dimensional.
• Tends to memorize samples.
Encoder network E: input image → latent code $E(x^{(j)})$; Decoder network D: latent code → reconstructed image
$\min_{E,D} \sum_{x^{(j)}} \big\| x^{(j)} - D(E(x^{(j)})) \big\|^2$
Vincent et al. JMLR 2011
13. Variational autoencoders
• Derived from a variational framework
• Constrain the encoder to output a conditional Gaussian distribution
• The decoder reconstructs inputs from samples drawn from that conditional distribution
$E(x^{(j)}) = q_\theta(z \mid x^{(j)}) = \mathcal{N}(z \mid \mu(x^{(j)}), \Sigma(x^{(j)}))$
$D(z^{(j)}) = p_\theta(x^{(j)} \mid z^{(j)} \sim q_\theta(z \mid x^{(j)}))$
• Kingma and Welling, “Auto-encoding variational Bayes,” ICLR 2014
• Rezende, Mohamed, and Wierstra, “Stochastic back-propagation and variational inference in deep latent Gaussian models,” ICML 2014
14. Maximizing a variational lower bound
Ideally, we would like to maximize
$\max_\theta \; L(\theta \mid \text{Dataset}) \equiv \sum_{x^{(j)}} \log p(x^{(j)} \mid \theta)$
But for tractability we maximize a lower bound $L_V(\theta \mid \text{Dataset}) \le L(\theta \mid \text{Dataset})$:
$\max_\theta \; L_V(\theta \mid \text{Dataset}) \equiv \sum_{x^{(j)}} -\mathrm{KL}\big(q_\theta(z \mid x^{(j)}) \,\big\|\, \mathcal{N}(z \mid 0, I)\big) + \mathbb{E}_{z \sim q_\theta(z \mid x^{(j)})}\big[\log p_\theta(x^{(j)} \mid z)\big]$
15. Reparameterization trick
• Make sampling a differentiable operation.
• The variational distribution: $q_\theta(z \mid x) = \mathcal{N}(z \mid \mu(x), \Sigma(x))$
• Sampling: $z = \mu(x) + \Sigma(x)^{1/2} \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$
Image credit, Courville 2017
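The trick is tiny in code. A minimal sketch, assuming a diagonal Gaussian encoder that outputs a mean and a log-variance (a common convention, not something shown on the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var, eps=None):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, I).
    The randomness lives entirely in eps, so z is a differentiable
    function of the encoder outputs (mu, log_var)."""
    sigma = np.exp(0.5 * log_var)  # diagonal standard deviations
    if eps is None:
        eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps

# With eps held fixed, z responds deterministically to mu and log_var,
# which is what lets gradients flow through the sampling step.
mu, log_var = np.array([1.0, -1.0]), np.array([0.0, 0.0])
z = sample_latent(mu, log_var, eps=np.array([0.5, 0.5]))
print(z.tolist())  # [1.5, -0.5] (sigma == 1 here, so z = mu + eps)
```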
17. Why are generated samples blurry?
• Pixel values are modeled as a conditional Gaussian, which leads to Euclidean loss minimization:
$\log p_\theta(x^{(j)} \mid z) = -\big\| x^{(j)} - D(z) \big\|^2 + \text{Const}$
• Regression to the mean, which renders blurry images.
• It is difficult to hand-craft a good perceptual loss function.
18. Pitfall of Euclidean distance for image modeling
• Blue curve: the Euclidean distance between a reference image and its horizontal translation.
• Red curve: the Euclidean distance between and
20. PixelCNNs
• Oord, Aaron van den, et al. 2016
• Use masked convolutions to enforce the autoregressive relationship.
• Minimize a cross-entropy loss instead of a Euclidean loss.
Slide credit, Courville 2017
23. Parallel multiscale autoregressive density estimation
• Reed et al. ICML 2017
• Can we speed up the generation time of PixelCNNs?
• Yes, via multiscale generation.
Slide credit, Courville 2017
25. Generative adversarial networks
• Goodfellow et al. NIPS 2014
• Forget about designing a perceptual loss. Let's train a discriminator to differentiate real and fake images.
Discriminator network D: real input image → 1
Generator network G: random vector → generated image; Discriminator network D: generated image → 0
30. Taxonomy of generative models
Maximum likelihood
• Explicit density
  • Tractable density: PixelCNNs
  • Approximate density: VAEs
• Implicit density: GANs
Slide credit Goodfellow 2016
31. Quick recap
• Generative modeling
• Manifold assumption
• VAEs, PixelCNNs, GANs
• Taxonomy: maximum likelihood → explicit density (tractable: PixelCNN; approximate: VAEs) and implicit density (GAN)
32. Outline
1. Introduction
2. GAN objective
3. GAN training
4. Joint image distribution and video distribution
5. Computer vision applications
34. 2. GAN objective
1. GAN objective
2. Discriminator strategy
3. GAN theory
4. Effect of limited discriminator capacity
35. GAN objectives
Solving a minimax problem:
$\min_G \max_D \; \mathbb{E}_{x \sim p_X}[\log D(x)] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))]$
$p_X$: data distribution, usually represented by samples.
$p_{G(Z)}$: model distribution, where $Z$ is usually modeled as uniform or Gaussian.
36. Alternating gradient updates
• Step 1: Fix G and perform a gradient step on
$\max_D \; \mathbb{E}_{x \sim p_X}[\log D(x)] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))]$
• Step 2: Fix D and perform a gradient step on
(in theory) $\min_G \; \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))]$
(in practice) $\max_G \; \mathbb{E}_{z \sim p_Z}[\log D(G(z))]$
$\log D(G(z))$ is bounded above; $\log(1 - D(G(z)))$ is unbounded below, which is bad for minimization.
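A quick numerical check of why the "in practice" surrogate is preferred: compare the gradient magnitudes of the two generator losses with respect to the discriminator output when the discriminator is winning (a toy calculation of mine, not from the tutorial):

```python
def saturating_grad(d):
    """d/dD of log(1 - D), the loss minimized "in theory": -1 / (1 - D)."""
    return -1.0 / (1.0 - d)

def nonsaturating_grad(d):
    """d/dD of log D, the surrogate maximized "in practice": 1 / D."""
    return 1.0 / d

# Early in training the discriminator wins easily, so D(G(z)) is near 0.
d = 1e-3
print(abs(saturating_grad(d)))     # ~1: almost no signal reaches G
print(abs(nonsaturating_grad(d)))  # ~1000: G still receives a strong gradient
```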
38. Learning a Gaussian mixture model using GANs
• Red points: true samples, drawn from the data distribution.
• Blue points: fake samples, drawn from the model distribution.
• Z-space: a Gaussian distribution $\mathcal{N}(0, I)$.
• The model first learns 1 mode, then 2 modes, and finally 3 modes.
2D GMM learning
39. GAN theory
• Optimal (non-parametric) discriminator:
$D^*(x) = \dfrac{p_X(x)}{p_X(x) + p_{G(Z)}(x)}$
• Under an ideal discriminator, the generator minimizes the Jensen-Shannon divergence between $p_X$ and $p_{G(Z)}$. This also requires that D and G have sufficient capacity and a sufficiently large dataset.
$\mathrm{JS}(p_X \,\|\, p_{G(Z)}) = \tfrac{1}{2}\mathrm{KL}\Big(p_X \,\Big\|\, \tfrac{p_X + p_{G(Z)}}{2}\Big) + \tfrac{1}{2}\mathrm{KL}\Big(p_{G(Z)} \,\Big\|\, \tfrac{p_X + p_{G(Z)}}{2}\Big)$
where $\mathrm{KL}(p \,\|\, q) = \mathbb{E}_{p}\big[\log \tfrac{p(x)}{q(x)}\big]$
40. In practice
• Dataset sizes are limited.
• Network capacity is finite. (Capacity here means the number of trainable parameters.)
• For a discriminator with capacity $n$, the generator only needs to remember $O\!\big(\tfrac{n \log n}{\epsilon^2}\big)$ samples to be $\epsilon$ away from the optimal objective. (Sanjeev Arora et al. ICML 2017)
Consequences:
1. The generator does not need to generalize.
2. Larger training datasets have limited utility.
41. Effect of limited discriminator capacity
CIFAR-10 dataset
Image credit Arora and Zhang 2017
42. Quick recap
• Discriminator strategy
• GAN theory
• Effect of limited discriminator capacity
$D^*(x) = \dfrac{p_X(x)}{p_X(x) + p_{G(Z)}(x)}$
43. Outline
1. Introduction
2. GAN objective
3. GAN training
4. Joint image distribution and video distribution
5. Computer vision applications
45. 3. GAN Training
1. Non-convergence in GANs
2. Mode collapse
3. Lack of global structure
4. Tricks
5. New objective functions
6. Surrogate objective functions
7. Network architectures
46. Non-convergence in GANs
• GAN training is theoretically guaranteed to converge if we could modify the density functions directly, but:
• Instead, we modify G (the sample generation function).
• We represent G and D as highly non-convex parametric functions.
• "Oscillation": the model can train for a very long time, generating many different categories of samples, without clearly generating better samples.
• Mode collapse: the most severe form of non-convergence.
Slide credit Goodfellow 2016
52. Improve GAN training
Tricks
• Label smoothing
• Historical batches
• …
Surrogate or auxiliary objective
• UnrolledGAN
• WGAN-GP
• DRAGAN
• …
New objectives
• EBGAN
• LSGAN
• WGAN
• BEGAN
• fGAN
• …
Network architecture
• LAPGAN
• …
54. Tricks - one-sided label smoothing
$\mathbb{E}_{x \sim p_X}[0.9 \log D(x)] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))]$
• Does not reduce classification accuracy, only confidence.
• Prevents the discriminator from giving a very large gradient signal to the generator.
• Prevents extrapolation that would encourage extreme samples.
• Two-sided label smoothing: https://github.com/soumith/ganhacks
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, "Improved techniques for training GANs," NIPS 2016
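One common implementation replaces the real label 1 with a soft target such as 0.9 in the cross-entropy. Under that form the pointwise optimal discriminator has a closed form, which makes the "confidence, not accuracy" claim concrete (an illustrative sketch of mine; the helper name is not from the paper):

```python
def optimal_d(p_data, p_model, real_target=1.0):
    """Pointwise maximizer of p_data*(t*log d + (1-t)*log(1-d)) + p_model*log(1-d),
    i.e. the best discriminator value when the real label 1 is replaced by t."""
    return real_target * p_data / (p_data + p_model)

# Where the generator puts no mass, a smoothed discriminator tops out at 0.9
# instead of saturating at 1 (its confidence is capped) ...
print(optimal_d(1.0, 0.0, real_target=0.9))  # 0.9
print(optimal_d(1.0, 0.0, real_target=1.0))  # 1.0
# ... yet it still classifies correctly wherever p_data > p_model.
print(optimal_d(0.7, 0.3, real_target=0.9) > 0.5)  # True
```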
55. Tricks - historical generator batches
• Helps stabilize discriminator training in the early stage.
• The discriminator at iteration t sees real input images (label 1) together with generated images from iterations t, t-1, ..., t-K (label 0).
A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, R. Webb, "Learning from simulated and unsupervised images through adversarial training," CVPR 2017
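A minimal sketch of a history buffer for this trick (the class and method names are mine, not from the paper):

```python
import random
from collections import deque

class HistoryBuffer:
    """Keep generated batches from the last K iterations and mix them into
    the discriminator's fake-image minibatch (illustrative sketch)."""

    def __init__(self, k=3):
        self.batches = deque(maxlen=k)  # oldest batches fall off automatically

    def push(self, batch):
        self.batches.append(batch)

    def fake_minibatch(self, current_batch, n_old=2):
        """Current fakes plus up to n_old randomly chosen historical batches."""
        old = []
        for b in random.sample(list(self.batches), min(n_old, len(self.batches))):
            old.extend(b)
        return current_batch + old  # all of these carry label 0 for D

buf = HistoryBuffer(k=2)
buf.push(["fake_t-2"])
buf.push(["fake_t-1"])
mix = buf.fake_minibatch(["fake_t"], n_old=2)
print(sorted(mix))  # current batch plus the two historical batches
```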
56. Other tricks
• https://github.com/soumith/ganhacks
• Use labels.
• Normalize the inputs to [-1, 1].
• Use tanh as the generator's last layer.
• Use BatchNorm.
• Avoid sparse gradients: ReLU, MaxPool.
• …
• Salimans et al. NIPS 2016
• Use virtual BatchNorm.
• Minibatch discrimination.
• …
57. Improve GAN training
Tricks
• Label smoothing
• Historical batches
• …
Surrogate or auxiliary objective
• UnrolledGAN
• WGAN-GP
• DRAGAN
• …
New objectives
• EBGAN
• LSGAN
• WGAN
• BEGAN
• fGAN
• …
Network architecture
• LAPGAN
• …
58. EBGAN
Junbo Zhao, Michael Mathieu, Yann LeCun, "Energy-based generative adversarial networks," ICLR 2017
• Replace the discriminator's classifier with an energy function: real input images receive a low energy value; generated images receive a high energy value.
• The energy function can be modeled by an auto-encoder.
59. EBGAN
$En(x) = \| \text{Decoder}(\text{Encoder}(x)) - x \|$
Discriminator objective:
$\min_{En} \; \mathbb{E}_{x \sim p_X}[En(x)] + \mathbb{E}_{z \sim p_Z}\big[[m - En(G(z))]^+\big]$
Generator objective:
$\min_G \; \mathbb{E}_{z \sim p_Z}[En(G(z))]$
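The EBGAN objectives can be sketched on scalar toy energies (the helper names are mine; a real implementation computes the energy from an actual autoencoder):

```python
import numpy as np

def energy(x, encode, decode):
    """EBGAN energy: autoencoder reconstruction error En(x) = ||Dec(Enc(x)) - x||."""
    return float(np.linalg.norm(decode(encode(x)) - x))

def d_loss(en_real, en_fake, m=1.0):
    """Discriminator: push energy down on reals, up on fakes, hinged at margin m."""
    return en_real + max(m - en_fake, 0.0)

def g_loss(en_fake):
    """Generator: make its samples low-energy."""
    return en_fake

# A perfect toy autoencoder assigns zero energy.
print(energy(np.ones(4), encode=lambda x: 0.5 * x, decode=lambda z: 2.0 * z))  # 0.0
# Once a fake's energy exceeds the margin, the hinge term vanishes.
print(d_loss(en_real=0.1, en_fake=0.2))  # 0.9
print(d_loss(en_real=0.1, en_fake=1.5))  # 0.1
```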
61. LSGAN
X. Mao, Q. Li, H. Xie, R. Lau, Z. Wang, "Least squares generative adversarial networks," 2016
Still use a classifier, but replace the cross-entropy loss with a Euclidean loss.
Discriminator:
GAN: $\min_D \; \mathbb{E}_{x \sim p_X}[-\log D(x)] + \mathbb{E}_{z \sim p_Z}[-\log(1 - D(G(z)))]$
LSGAN: $\min_D \; \mathbb{E}_{x \sim p_X}[(D(x) - 1)^2] + \mathbb{E}_{z \sim p_Z}[D(G(z))^2]$
Generator:
GAN: $\min_G \; \mathbb{E}_{z \sim p_Z}[-\log D(G(z))]$
LSGAN: $\min_G \; \mathbb{E}_{z \sim p_Z}[(D(G(z)) - 1)^2]$
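The least-squares losses are easy to state on toy critic scores (a sketch of mine, not the authors' code):

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Discriminator: pull real scores toward 1 and fake scores toward 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    """Generator: pull fake scores toward 1."""
    return np.mean((d_fake - 1.0) ** 2)

d_real = np.array([0.8, 1.2])   # raw scores, not probabilities
d_fake = np.array([0.1, -0.1])
print(round(float(lsgan_d_loss(d_real, d_fake)), 6))  # 0.05
print(round(float(lsgan_g_loss(d_fake)), 6))          # 1.01
```

Note that unlike the sigmoid cross-entropy, the quadratic penalizes samples that are classified correctly but lie far from the decision boundary, which keeps gradients alive.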
62. LSGAN
GAN: minimize the Jensen-Shannon divergence between $p_X$ and $p_{G(Z)}$:
$\mathrm{JS}(p_X \,\|\, p_{G(Z)}) = \tfrac{1}{2}\mathrm{KL}\Big(p_X \,\Big\|\, \tfrac{p_X + p_{G(Z)}}{2}\Big) + \tfrac{1}{2}\mathrm{KL}\Big(p_{G(Z)} \,\Big\|\, \tfrac{p_X + p_{G(Z)}}{2}\Big)$
LSGAN: minimize the Pearson $\chi^2$ divergence between $p_X + p_{G(Z)}$ and $2 p_{G(Z)}$:
$\chi^2(p_X + p_{G(Z)} \,\|\, 2 p_{G(Z)}) = \int_X \frac{\big(2 p_{G(Z)}(x) - (p_{G(Z)}(x) + p_X(x))\big)^2}{p_{G(Z)}(x) + p_X(x)} \, dx$
63. LSGAN
64. Wasserstein GAN
M. Arjovsky, S. Chintala, L. Bottou, "Wasserstein GAN," 2016
Replace the classifier with a critic function. (The critic is constrained to be 1-Lipschitz, enforced by weight clipping in the original paper.)
Discriminator:
GAN: $\max_D \; \mathbb{E}_{x \sim p_X}[\log D(x)] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))]$
WGAN: $\max_D \; \mathbb{E}_{x \sim p_X}[D(x)] - \mathbb{E}_{z \sim p_Z}[D(G(z))]$
Generator:
GAN: $\max_G \; \mathbb{E}_{z \sim p_Z}[\log D(G(z))]$
WGAN: $\max_G \; \mathbb{E}_{z \sim p_Z}[D(G(z))]$
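A sketch of the WGAN critic loss, plus the weight clipping the original paper uses to keep the critic approximately 1-Lipschitz (function names are mine, not from the paper):

```python
import numpy as np

def critic_loss(d_real, d_fake):
    """Negated WGAN critic objective: minimize -(E[D(x)] - E[D(G(z))])."""
    return -(np.mean(d_real) - np.mean(d_fake))

def clip_weights(params, c=0.01):
    """Weight clipping: a crude way to bound the critic's Lipschitz constant."""
    return [np.clip(w, -c, c) for w in params]

# Critic outputs are unbounded scores, not probabilities.
print(float(critic_loss(np.array([2.0, 3.0]), np.array([-1.0, 0.0]))))  # -3.0
clipped = clip_weights([np.array([0.5, -0.2, 0.005])])[0]
print(clipped.tolist())  # [0.01, -0.01, 0.005]
```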
65. Wasserstein GAN
GAN: minimize the Jensen-Shannon divergence between $p_X$ and $p_{G(Z)}$.
WGAN: minimize the Earth Mover distance between $p_X$ and $p_{G(Z)}$:
$EM(p_X, p_{G(Z)}) = \inf_{\gamma \in \Pi(p_X, p_{G(Z)})} \mathbb{E}_{(x, y) \sim \gamma}\big[\| x - y \|\big]$
68. GAN vs WGAN
Jensen-Shannon divergence in this example:
$\mathrm{JS}(p_X \,\|\, p_{G(Z)}) = \tfrac{1}{2}\mathrm{KL}\Big(p_X \,\Big\|\, \tfrac{p_X + p_{G(Z)}}{2}\Big) + \tfrac{1}{2}\mathrm{KL}\Big(p_{G(Z)} \,\Big\|\, \tfrac{p_X + p_{G(Z)}}{2}\Big)$
Slide credit, Courville 2017
69. GAN vs WGAN
Earth Mover distance in this example:
$EM(p_X, p_{G(Z)}) = \inf_{\gamma \in \Pi(p_X, p_{G(Z)})} \mathbb{E}_{(x, y) \sim \gamma}\big[\| x - y \|\big] = |\theta|$
Slide credit, Courville 2017
70. GAN vs WGAN
• If we could directly change the density shape parameter, the Earth Mover distance would be the smoother objective.
• But we do not directly change the density shape parameter; we change the generation function.
71. Boundary Equilibrium GAN (BEGAN)
David Berthelot, Thomas Schumm, Luke Metz, "Boundary equilibrium generative adversarial networks," 2017
• Uses the EBGAN autoencoder-based discriminator (input images receive a low energy value; generated images receive a high energy value).
• Not based on minimizing a distance between $p_X$ and $p_{G(Z)}$.
• Based on minimizing a distance between $p_{En(X)}$ and $p_{En(G(Z))}$.
• That distance is a lower bound on the Wasserstein-1 (Earth Mover) distance.
72. Boundary Equilibrium GAN (BEGAN)
WGAN: $EM(p_X, p_{G(Z)}) = \inf_{\gamma \in \Pi(p_X, p_{G(Z)})} \mathbb{E}_{(x, y) \sim \gamma}\big[\| x - y \|\big]$
BEGAN: $EM(p_{En(X)}, p_{En(G(Z))}) = \inf_{\gamma \in \Pi(p_{En(X)}, p_{En(G(Z))})} \mathbb{E}_{(x_1, x_2) \sim \gamma}\big[| x_1 - x_2 |\big]$
Lower bound:
$\inf_\gamma \mathbb{E}_{(x_1, x_2) \sim \gamma}\big[| x_1 - x_2 |\big] \;\ge\; \inf_\gamma \big| \mathbb{E}_{(x_1, x_2) \sim \gamma}[x_1 - x_2] \big| = | \text{mean}(x_1) - \text{mean}(x_2) |$
Final objective function:
$\max_G \min_{En} \; \mathbb{E}_{x \sim p_X}[En(x)] - \mathbb{E}_{z \sim p_Z}[En(G(z))]$
73. Boundary Equilibrium GAN (BEGAN)
Discriminator objective: $\min_{En} \; En(x) - k_t \, En(G(z))$
Generator objective: $\min_G \; En(G(z))$
Equilibrium update: $k_{t+1} = k_t + \lambda_k \big(\gamma \, En(x) - En(G(z))\big)$
Hyperparameter: $\gamma = \frac{En(G(z))}{En(x)}$, a value between 0 and 1.
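The equilibrium update can be sketched as a scalar control loop (a sketch with made-up energies; the $\lambda_k$ step size and the clamp of $k$ to $[0, 1]$ follow the paper's description):

```python
def began_k_update(k, en_real, en_fake, gamma=0.5, lambda_k=0.001):
    """k_{t+1} = k_t + lambda_k * (gamma * En(x) - En(G(z))), clamped to [0, 1].
    k controls how much weight the discriminator puts on the fake term."""
    k = k + lambda_k * (gamma * en_real - en_fake)
    return min(max(k, 0.0), 1.0)

# If fakes are too low-energy relative to the target ratio gamma, k grows and
# the discriminator pushes harder on them; the loop self-balances.
k = 0.0
for _ in range(100):
    k = began_k_update(k, en_real=1.0, en_fake=0.2)
print(round(k, 3))  # 0.03
```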
74. Boundary Equilibrium GAN (BEGAN)
75. Improve GAN training
Tricks
• Label smoothing
• Historical batches
• …
Surrogate or auxiliary objective
• UnrolledGAN
• WGAN-GP
• DRAGAN
• …
New objectives
• EBGAN
• LSGAN
• WGAN
• BEGAN
• fGAN
• …
Network architecture
• LAPGAN
• …
76. UnrolledGAN
• Consider several future updates of the discriminator when updating the generator.
• The discriminator itself is updated in the usual way.
• Helps prevent the generator from producing samples from only a few modes.
L. Metz, B. Poole, D. Pfau, J. Sohl-Dickstein, "Unrolled generative adversarial networks," ICLR 2017
77. UnrolledGAN
• $D_K$ is a function of both D and G.
• Considers several future updates of the discriminator when updating the generator.
Surrogate objective: $\max_G \; \mathbb{E}_{z \sim p_Z}[\log D_K(G(z))]$
78. WGAN-GP
I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. Courville, "Improved training of Wasserstein GANs," 2017
$\min_G \max_D \; \mathbb{E}_{x \sim p_X}[D(x)] - \mathbb{E}_{z \sim p_Z}[D(G(z))] - \lambda \, \mathbb{E}_{y \sim p_Y}\big[(\| \nabla_y D(y) \|_2 - 1)^2\big]$
• $y$: imaginary samples, $y = u x + (1 - u) G(z)$.
• The optimal critic has unit gradient norm almost everywhere.
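A toy sketch of the gradient penalty. For a linear critic $D(y) = w \cdot y$ the gradient with respect to $y$ is just $w$, so the penalty can be computed analytically; a real implementation obtains $\nabla_y D(y)$ by automatic differentiation (function names are mine, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_penalty(w, x_real, x_fake, lam=10.0):
    """Penalty lam * E_y[(||grad_y D(y)|| - 1)^2] at interpolates
    y = u*x_real + (1-u)*x_fake. For the toy linear critic D(y) = w . y the
    gradient w.r.t. y equals w everywhere, so no autodiff is needed."""
    u = rng.uniform(size=(x_real.shape[0], 1))
    y = u * x_real + (1.0 - u) * x_fake        # the "imaginary" samples
    grads = np.tile(w, (len(y), 1))            # grad_y (w . y) = w at each y
    norms = np.linalg.norm(grads, axis=1)
    return float(lam * np.mean((norms - 1.0) ** 2))

x_real = rng.normal(size=(4, 2))
x_fake = rng.normal(size=(4, 2))
print(gradient_penalty(np.array([1.0, 0.0]), x_real, x_fake))  # 0.0: unit-norm critic
print(gradient_penalty(np.array([3.0, 4.0]), x_real, x_fake))  # 160.0 = 10 * (5 - 1)^2
```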
79. DRAGAN
N. Kodali, J. Abernethy, J. Hays, Z. Kira, "How to train your DRAGAN," 2017
$\min_G \max_D \; \mathbb{E}_{x \sim p_X}[\log D(x)] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))] - \lambda \, \mathbb{E}_{y \sim p_Y}\big[(\| \nabla_y D(y) \|_2 - 1)^2\big]$
• $y$: imaginary samples around true samples, $y = \alpha x + (1 - \alpha)(x + \delta)$.
80. Improve GAN training
Tricks
• Label smoothing
• Historical batches
• …
Surrogate or auxiliary objective
• UnrolledGAN
• WGAN-GP
• DRAGAN
• …
New objectives
• EBGAN
• LSGAN
• WGAN
• BEGAN
• fGAN
• …
Network architecture
• LAPGAN
• …
81. LAPGAN
E. Denton, S. Chintala, A. Szlam, R. Fergus “Deep Generative Image Models
using a Laplacian Pyramid of Adversarial Networks” NIPS 2015
Testing
82. LAPGAN
Training at each resolution is performed independently of the others.
83. Evaluation
• Popular metrics for GAN evaluation:
• Inception score
• Train an accurate classifier.
• Train a conditional image generation model.
• Check how accurately the classifier recognizes the generated images.
• Human evaluation
• Generative models are difficult to evaluate. (Theis et al. ICLR 2016)
• Every evaluation metric can be gamed.
• Still an open problem.
84. Outline
1. Introduction
2. GAN objective
3. GAN training
4. Joint image distribution and video distribution
5. Computer vision applications
86. 4. Joint image distribution and video
distribution
1. Joint image distribution learning
2. Video distribution learning
87. Joint distribution of multi-domain images
A scene projected through the camera onto the image plane yields a color image, a depth image (near/far), and a thermal image (cool/hot).
• $p(x_1, x_2, \ldots, x_N)$: where the $x_i$ are images of the scene in different modalities.
• Ex. $p(x_{\text{color}}, x_{\text{thermal}}, x_{\text{depth}})$
• In this presentation, "modality" = "domain".
88. Joint distribution of multi-domain images
• Define a domain by an attribute.
• Multi-domain images are views of an object with different attributes.
• $p(x_1, x_2, \ldots, x_N)$: where the $x_i$ are images of the object with different attributes.
Non-smiling / Smiling; No beard / Beard; Young / Senior; Summer / Winter; Images / Hand drawings; Font #1 / Font #2
89. Extending GAN for joint image distribution learning
Discriminator network D: real input image → 1
Generator network G: random vector → generated image; Discriminator network D: generated image → 0
90. What if we do not have paired images
from different domains for learning?
103. MoCoGAN
S. Tulyakov, M. Liu, X. Yang, J. Kautz, "MoCoGAN: Decomposing Motion and Content for Video Generation," 2017
104. MoCoGAN
Sampled content code $z_C$, repeated for every frame: $Z_C = [z_C, z_C, \ldots, z_C]$
Motion trajectory: $Z_M = [z_M^{(1)}, z_M^{(2)}, \ldots, z_M^{(K)}]$
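The content/motion decomposition can be sketched as follows: one content code shared across frames, one motion code per frame. In the paper the motion codes come from a recurrent network; plain Gaussian noise stands in for it here, and all names are mine:

```python
import numpy as np

rng = np.random.default_rng(0)

def mocogan_latents(dim_c, dim_m, n_frames):
    """One shared content code z_C plus a per-frame motion code z_M^(t).
    (Gaussian noise stands in for the paper's recurrent motion generator.)"""
    z_c = rng.standard_normal(dim_c)
    frames = []
    for _ in range(n_frames):
        z_m = rng.standard_normal(dim_m)           # stand-in for the RNN output
        frames.append(np.concatenate([z_c, z_m]))  # frame code [z_C, z_M^(t)]
    return np.stack(frames)

Z = mocogan_latents(dim_c=4, dim_m=2, n_frames=5)
print(Z.shape)                                # (5, 6)
print(bool(np.allclose(Z[:, :4], Z[0, :4])))  # True: content is fixed over time
```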
106. MoCoGAN
Training:
$\min_{D_I} \; \mathbb{E}_{v \sim p_V}[-\log D_I(S_1(v))] + \mathbb{E}_{\tilde v \sim p_{\tilde V}}[-\log(1 - D_I(S_1(\tilde v)))]$
$\min_{D_V} \; \mathbb{E}_{v \sim p_V}[-\log D_V(S_T(v))] + \mathbb{E}_{\tilde v \sim p_{\tilde V}}[-\log(1 - D_V(S_T(\tilde v)))]$
$\max_{G_I, R_M} \; \mathbb{E}_{\tilde v \sim p_{\tilde V}}[-\log(1 - D_I(S_1(\tilde v)))] + \mathbb{E}_{\tilde v \sim p_{\tilde V}}[-\log(1 - D_V(S_T(\tilde v)))]$
107. MoCoGAN
108. VGAN vs. MoCoGAN

| | VGAN | MoCoGAN |
|---|---|---|
| Video generation | Directly generates a 3D volume | Video frames are generated sequentially |
| Generator | Motion-dynamic video and static background | Image |
| Discriminator | One discriminator (3D CNN for clips) | Two discriminators (3D CNN for clips, 2D CNN for frames) |
| Variable length | No | Yes |
| Motion and content separation | No | Yes |
| User preference on generated face video quality | 5% | 95% |
109. Outline
1. Introduction
2. GAN objective
3. GAN Training
4. Joint image distribution and video distribution
5. Computer vision applications
111. Why are GANs useful for computer vision?
Hand-crafted features → deep networks
Hand-crafted objective functions → generative adversarial networks
112. Image-to-image translation
• Let $X_1$ and $X_2$ be two different image domains (e.g., the day-time and night-time image domains).
• Let $x_1 \in X_1$.
• Image-to-image translation concerns the problem of translating $x_1$ to a corresponding image $x_2 \in X_2$.
• Correspondence can mean different things in different contexts.
Input image → image translator → output image
113. Examples and use cases
• Low-res to high-res
• Blurry to sharp
• Noisy to clean
• LDR to HDR
• Thermal to color
• Image to painting
• Day to night
• Summer to winter
• Synthetic to real
• Bad weather to good weather
• Greyscale to color
• …
114. Prior works
• Image-to-image translation has been studied for decades in computer vision.
• Different approaches have been explored, including:
• Filtering-based approaches
• Optimization-based approaches
• Dictionary-learning-based approaches
• Deep-learning-based approaches
• GAN-based approaches
115. Generative Adversarial Networks (GANs)
Goodfellow et al. NIPS'14
• Training is done by applying alternating stochastic gradient updates to the following two learning problems:
$\max_D \; \mathbb{E}_{x \sim p_X}[\log D(x)] + \mathbb{E}_{z \sim p_\mathcal{N}}[\log(1 - D(G(z)))]$
$\max_G \; \mathbb{E}_{z \sim p_\mathcal{N}}[\log D(G(z))]$
• Effect: minimizing the JSD between $p_{G(z), z \sim \mathcal{N}}$ and $p_X$.
$z \sim \mathcal{N} \to G \to x = G(z) \to D$; real $x \in X \to D$
116. Conditional GANs
• Training is done by applying alternating stochastic gradient updates to the following two learning problems:
$\max_D \; \mathbb{E}_{x_2 \sim p_{X_2}}[\log D(x_2)] + \mathbb{E}_{x_1 \sim p_{X_1}}[\log(1 - D(F(x_1)))]$
$\max_F \; \mathbb{E}_{x_1 \sim p_{X_1}}[\log D(F(x_1))]$
• Effect: minimizing the JSD between $p_{F(x_1), x_1 \sim X_1}$ and $p_{X_2}$.
$x_1 \sim X_1 \to F \to x = F(x_1) \to D$; $x_2 \sim X_2 \to D$
117. Conditional GAN for image-to-image translation
• A conditional GAN alone is insufficient for image translation.
• There is no guarantee that the conditionally generated image is related to the source image in a desired way.
• In the supervised setting,
$\text{Dataset} = \{(x_1^{(1)}, x_2^{(1)}), (x_1^{(2)}, x_2^{(2)}), \ldots, (x_1^{(N)}, x_2^{(N)})\}$
• This can be easily fixed.
119. SRGAN
C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, W. Shi, "Photo-realistic single image super-resolution using a generative adversarial network," CVPR 2017
121. Pix2Pix
P. Isola, J. Zhu, T. Zhou, A. Efros, "Image-to-image translation with conditional adversarial networks," CVPR 2017
122. Unsupervised image-to-image translation
• Corresponding images can be expensive to obtain.
• In the unsupervised setting,
$\text{Dataset}_1 = \{x_1^{(n_1)}, x_1^{(n_2)}, \ldots, x_1^{(n_N)}\}$
$\text{Dataset}_2 = \{x_2^{(m_1)}, x_2^{(m_2)}, \ldots, x_2^{(m_M)}\}$
• With no correspondences, we need additional constraints/assumptions to learn image-to-image translation.
123. SimGAN
• Shrivastava et al. CVPR'17: add a cross-domain content loss.
$\max_F \; \mathbb{E}_{p_{X_1}}\big[\log D(F(x_1)) - \| F(x_1) - x_1 \|_1\big]$
$x_1 \sim X_1 \to F \to x = F(x_1) \to D$
124. Cycle constraint
• Learn a two-way translation.
• DiscoGAN by Kim et al. ICML'17 (arXiv:1703.05192)
• CycleGAN by Zhu et al. (arXiv:1703.10593)
$\max_{F_{12}, F_{21}} \; \mathbb{E}_{p_{X_1}}\big[\log D_2(F_{12}(x_1)) - \| F_{21}(F_{12}(x_1)) - x_1 \|_p^p\big] + \mathbb{E}_{p_{X_2}}\big[\log D_1(F_{21}(x_2)) - \| F_{12}(F_{21}(x_2)) - x_2 \|_p^p\big]$
$x_1 \sim X_1 \xrightarrow{F_{12}} \hat x_2 = F_{12}(x_1) \xrightarrow{F_{21}} F_{21}(F_{12}(x_1))$
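The cycle-consistency term can be sketched on a toy 1-D "domain pair" (the function names and toy maps are mine, not from either paper):

```python
import numpy as np

def cycle_loss(x1, f12, f21, p=1):
    """Cycle-consistency penalty ||F21(F12(x1)) - x1||_p^p: translating to the
    other domain and back should recover the input."""
    return float(np.sum(np.abs(f21(f12(x1)) - x1) ** p))

x1 = np.array([0.2, 0.4, 0.6])
# Exact inverses incur no penalty; a sloppy reverse map leaves a residual.
print(cycle_loss(x1, lambda x: 2.0 * x, lambda x: x / 2.0))            # 0.0
print(round(cycle_loss(x1, lambda x: 2.0 * x, lambda x: x / 2.5), 2))  # 0.24
```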
126. Shared latent space constraint
• CoGAN by Liu et al. NIPS'16
• Assume corresponding images in different domains share the same representation in a hidden space.
• $G_1 = G_{L_1} \circ G_H$ and $G_2 = G_{L_2} \circ G_H$ (weight sharing)
$\max_{G_H, G_{L_1}, G_{L_2}} \; \mathbb{E}_{p_\mathcal{N}}\big[\log D_1(G_{L_1}(G_H(z))) + \log D_2(G_{L_2}(G_H(z)))\big]$
$z \sim \mathcal{N}$; $\hat x_1 = G_1(z)$; $\hat x_2 = G_2(z)$
127. Shared latent space constraint
• Under the sufficient-capacity and descent-to-a-local-minimum assumptions of Goodfellow et al., CoGAN learns a joint distribution of multi-domain images.
• Image translation: $G_2\big(\arg\min_z \| G_1(z) - x_1 \|_p^p\big)$
128. The CoVAE-GAN framework
Ming-Yu Liu, Thomas Breuel, Jan Kautz, "Unsupervised image-to-image translation networks," arXiv 2017
• Augment CoGAN with VAEs that map images to the shared latent space.
• Apply weight-sharing constraints to the encoders $E_1$ and $E_2$.
• Image reconstruction streams: $G_1(E_1(x_1))$, $G_2(E_2(x_2))$; image translation streams: $G_2(E_1(x_1))$, $G_1(E_2(x_2))$.
130. Intuition
In the ideal case (supervised setting), we can enforce that corresponding images map to the same point in the embedding space:
$E_1(x_1^{(i)}) \equiv E_2(x_2^{(i)})$
131. Intuition
In the unsupervised setting, although we can use a GAN discriminator to make the images coming out of $G_2$ resemble real images, there is no guarantee that the output images from $G_1$ and $G_2$ form a corresponding pair.
132. Intuition
However, if we apply the shared latent space assumption via weight sharing, then the generated images from $G_1$ and $G_2$ will resemble a pair of corresponding images.
133. Intuition
Similarly, we can apply weight sharing to $E_1$ and $E_2$ to encourage a pair of corresponding images to have the same embedding.
140. Conclusions
• Family of generative models
• Theory of GANs
• Various training techniques
• Joint image distribution and video distribution
• Image translation applications