5. Why generative models?
• Unsupervised learning is the future
• Many applications: image compression, deblurring, synthetic image generation, frame prediction, text-to-image synthesis, and so on
6. Challenges of generative models
• Probabilistic dependencies on previously generated content, such as earlier pixels
• Complex, high-dimensional structures such as images
• Difficulty of training models that are expressive, tractable, and scalable at the same time
7. Generative models
• Latent variable models (VAE, DRAW¹)
• Adversarial models (GAN²)
• Autoregressive models (NADE³, MADE⁴, RIDE⁵)
¹Karol Gregor et al. “DRAW: A recurrent neural network for image generation”. In: arXiv preprint arXiv:1502.04623 (2015).
²Ian Goodfellow et al. “Generative adversarial nets”. In: NIPS. 2014.
³Hugo Larochelle and Iain Murray. “The Neural Autoregressive Distribution Estimator”. In: AISTATS. Vol. 1. 2011, p. 2.
⁴Mathieu Germain et al. “MADE: Masked Autoencoder for Distribution Estimation”. In: ICML. 2015.
⁵Lucas Theis and Matthias Bethge. “Generative Image Modeling Using Spatial LSTMs”. In: NIPS. 2015.
8. Comparison of generative models
Image Generation Models
Three image generation approaches dominate the field: Variational AutoEncoders (VAE), Generative Adversarial Networks (GAN), and autoregressive models.
[Figure: the VAE graphical model with prior z ~ p_θ(z), decoder x ~ p_θ(x|z), and encoder q_φ(z|x); the GAN setup, in which generator G produces fake samples and discriminator D classifies real vs. fake.]
(cf. https://openai.com/blog/generative-models/)
VAE
- Pros: efficient inference with approximate latent variables.
- Cons: generated samples tend to be blurry.
GAN
- Pros: generates sharp images; needs no Markov chain or approximate inference network during sampling.
- Cons: difficult to optimize due to unstable training dynamics.
Autoregressive models
- Pros: very simple and stable training process; currently gives the best log-likelihoods; tractable likelihood.
- Cons: relatively inefficient sampling.
This slide is from Yohei Sugawara
12. Generative image modeling with spatial LSTMs
MCGSM: mixtures of conditional Gaussian scale mixtures⁶
The figure is from RIDE⁷.
⁶Lucas Theis, Reshad Hosseini, and Matthias Bethge. “Mixtures of conditional Gaussian scale mixtures applied to multiscale image representations”. In: PLoS ONE (2012).
⁷Lucas Theis and Matthias Bethge. “Generative Image Modeling Using Spatial LSTMs”. In: NIPS. 2015.
13. Row LSTM
• Captures a roughly triangular context above each pixel
• Uses a 1-D convolution of kernel size k × 1 (k = 3)
• The convolution is masked
• The input-to-state component is parallelized (output feature map size is 4h × n × n)
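Because the input-to-state term depends only on the (already known) input map, it can be computed for all rows at once. A minimal NumPy sketch under assumed shapes (h input features, 4h gate pre-activations, kernel size k; the masking of the kernel is omitted here for brevity):

```python
import numpy as np

def row_lstm_input_to_state(x, w):
    """Compute the Row LSTM input-to-state term for every row in parallel.

    x : (h, n, n)  input feature map
    w : (4h, h, k) 1-D kernel applied along each row (k assumed odd)
    Returns a (4h, n, n) map of gate pre-activations; since it depends
    only on the input, all n rows are computed at once.
    """
    four_h, h, k = w.shape
    n = x.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad)))  # pad along the row axis
    out = np.zeros((four_h, n, n), dtype=x.dtype)
    for j in range(n):                   # output column position
        patch = xp[:, :, j:j + k]        # (h, n, k) window over all rows
        # contract over input features and kernel taps -> (4h, n)
        out[:, :, j] = np.tensordot(w, patch, axes=([1, 2], [0, 2]))
    return out
```

Only the state-to-state recurrence then remains sequential, one row at a time.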
15. Diagonal BiLSTM Skew Operation
• Parallelized by the skew operation
• n × n ←→ n × (2n − 1)
• The convolutional kernel is 2 × 1
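The skew operation can be sketched directly: row i of the n × n map is shifted right by i positions, so each diagonal of the original image becomes a column of the n × (2n − 1) map, which the Diagonal BiLSTM can then process column by column. A minimal NumPy sketch:

```python
import numpy as np

def skew(x):
    """Skew an n x n map into n x (2n - 1): row i is shifted right by i.

    A column of the skewed map corresponds to a diagonal of the original
    image, so the diagonal recurrence becomes a column-wise sweep.
    """
    n = x.shape[0]
    out = np.zeros((n, 2 * n - 1), dtype=x.dtype)
    for i in range(n):
        out[i, i:i + n] = x[i]
    return out

def unskew(y):
    """Inverse of skew: recover the original n x n map."""
    n = y.shape[0]
    return np.stack([y[i, i:i + n] for i in range(n)])
```

The padding positions introduced by the shift carry no information and are discarded again by `unskew`.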
16. PixelCNN
• A large but bounded receptive field replaces the PixelRNN’s unbounded dependency range
• Turns generation into a pixel-level classification problem
• Parallelizable at training time, but not at the test-time generation step
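Treating each 8-bit intensity as one of 256 classes means the network ends in a per-pixel 256-way softmax trained with cross-entropy. A minimal NumPy sketch of that per-pixel loss (the array shapes are illustrative):

```python
import numpy as np

def pixel_cross_entropy(logits, targets):
    """Per-pixel 256-way classification loss.

    logits  : (n, n, 256) unnormalized scores for each pixel
    targets : (n, n) integer intensities in [0, 255]
    Returns the mean negative log-likelihood in nats.
    """
    # numerically stable log-softmax over the 256 intensity classes
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    n = targets.shape[0]
    # pick the log-probability of each pixel's true intensity
    picked = log_probs[np.arange(n)[:, None], np.arange(n)[None, :], targets]
    return -picked.mean()
```

With uniform logits this reduces to log 256 per pixel, the loss of a model that has learned nothing.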
17. PixelRNN vs PixelCNN
Previous work: Pixel Recurrent Neural Networks.
“Pixel Recurrent Neural Networks” won a best paper award at ICML 2016.
The authors proposed two types of models, PixelRNN and PixelCNN
(two types of LSTM layers are proposed for PixelRNN).
[Figure: PixelRNN (two LSTM layer types: Row LSTM and Diagonal BiLSTM) vs. PixelCNN (masked convolution).]
PixelRNN
- Pros: effectively handles long-range dependencies ⇒ good performance.
- Cons: each state must be computed sequentially ⇒ computationally expensive.
PixelCNN
- Pros: convolutions are easier to parallelize ⇒ much faster to train.
- Cons: bounded receptive field ⇒ inferior performance; the blind spot caused by the masked convolution needs to be eliminated.
• LSTM-based models are a natural choice for dealing with autoregressive dependencies.
• The CNN-based model uses masked convolutions to ensure the model is causal.
[Figure: a 3 × 3 convolution kernel with weights w11 … w33.]
This slide is from Yohei Sugawara
18. Multi-scale PixelRNN
• One unconditional PixelRNN plus one or more conditional PixelRNNs
• First samples a small version of the original image
• The conditional network is similar to a PixelRNN but is biased by an upsampled version of the small image
21. Masked Convolution
• Masks are adopted to avoid capturing future context.
• Mask A is used only at the first convolutional layer; mask B is used in all subsequent input-to-state convolutional transitions.
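For a single channel, both mask types can be built by zeroing every kernel position that comes after the centre in raster-scan order; mask A additionally zeroes the centre itself. A minimal NumPy sketch (the per-channel R/G/B ordering within a pixel is omitted):

```python
import numpy as np

def build_mask(kernel_size, mask_type):
    """Build a PixelCNN-style mask for a k x k convolution kernel.

    Positions strictly after the centre pixel in raster-scan order are
    zeroed. Mask 'A' also zeroes the centre, so the first layer cannot
    see the pixel being predicted; mask 'B' keeps the centre, so later
    layers may reuse features already computed for it.
    """
    k = kernel_size
    mask = np.ones((k, k), dtype=np.float32)
    c = k // 2
    # zero the centre row from the centre (mask A) or just after it (mask B)
    mask[c, c + (1 if mask_type == 'B' else 0):] = 0.0
    # zero every row below the centre
    mask[c + 1:, :] = 0.0
    return mask
```

Multiplying a kernel elementwise by this mask before each forward pass is enough to keep the convolution causal.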
MADE: Masked Autoencoder for Distribution Estimation⁸
⁸Mathieu Germain et al. “MADE: Masked Autoencoder for Distribution Estimation”. In: ICML. 2015.
29. Summary
• Row and Diagonal LSTM, PixelCNN
• Uses a softmax layer over discrete pixel values
• Uses masked convolutions
• Uses residual connections
• New state of the art on MNIST and CIFAR-10; also evaluated on ImageNet
30. Useful resources
• Sergei Turukin’s PixelCNN post and implementation
• PixelRNN conference presentation
• PixelRNN review by Kyle Kastner
• Post about DRAW