From Pixel RNN
to Pixel CNN++
2020. 01. 16 (Thu)
Dongheon Lee
Contents
Taxonomy of Generative Models
(1) Pixel RNN (Google DeepMind, arXiv, 2016)
(2) Pixel CNN (Google DeepMind, arXiv, 2016)
(3) Gated Pixel CNN (Google DeepMind, NIPS, 2016)
(4) Pixel CNN++ (OpenAI, ICLR, 2017)
Taxonomy of Generative Models
Generative models can be summarized as models trained by maximum likelihood.
Various strategies exist depending on how the likelihood is handled
(for example, whether it is approximated or represented exactly).
Taxonomy of Generative Models
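The density (= prior distribution, model) is defined
(+) Relatively easy to handle, and the model's behavior is predictable to some degree
(-) Limited in that it cannot produce results beyond what we already know
No density is defined; the model only samples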
Taxonomy of Generative Models
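Samples are drawn from the distribution produced by the generator
(unlike a Markov chain, a sample is generated without an input)
If the sample x′ is drawn repeatedly, x′ eventually converges to a sample from p_model(x)
(+) Decent performance when the variance between samples is not high
(-) Performance degrades in high dimensions, and computation is slow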
Taxonomy of Generative Models
During training, the density can be computed mathematically (by calculus)
Neural Autoregressive →
: a model that predicts its current value from its own previous values
Taxonomy of Generative Models
• Encoder: obtains a latent code z from the data x
• Decoder: from a latent code z, produces a reconstructed sample that should be close to x, the data used to obtain the latent code
VAE: the likelihood is expressed as an integral of the joint distribution over z; because this integral cannot be computed directly, it is estimated with variational inference
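The formulas on this slide did not survive extraction, so the following is only a sketch of the standard VAE bound, not a reconstruction of the slide. The symbols $q_\phi$ (encoder), $p_\theta$ (decoder) and $p(z)$ (prior) are assumed notation:

```latex
\log p_\theta(x)
  = \log \int p_\theta(x \mid z)\, p(z)\, dz
  \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
```

The integral on the left is the part that cannot be evaluated directly; variational inference maximizes the lower bound on the right instead.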
(1) Pixel RNN
• The key to an autoregressive model is fixing the dependency order among the data!
• One effective approach to tractably model a joint distribution of the pixels in the
image is to cast it as a product of conditional distributions.
→ Proceeds in pixel order (1 to n²)
Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. "Pixel recurrent neural networks." arXiv preprint arXiv:1601.06759 (2016).
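The product-of-conditionals idea on this slide can be sketched in a few lines: the joint probability is p(x) = ∏ p(x_i | x_1, ..., x_{i-1}) over pixels in raster-scan order. This is a minimal illustration only. `toy_conditional` is a hypothetical stand-in for a trained network, and the binary 2x2 image is made up for the example.

```python
import math

def joint_log_likelihood(pixels, conditional):
    """Sum log p(x_i | x_1..x_{i-1}) over pixels given in raster-scan order."""
    total = 0.0
    for i, value in enumerate(pixels):
        probs = conditional(pixels[:i])  # the model may only look at earlier pixels
        total += math.log(probs[value])
    return total

def toy_conditional(context):
    """Hypothetical stand-in for a trained network over binary pixels:
    it prefers to repeat the previous pixel value."""
    if not context:
        return [0.5, 0.5]
    return [0.8, 0.2] if context[-1] == 0 else [0.2, 0.8]

image = [[0, 0], [1, 1]]                   # a 2x2 binary image
flat = [p for row in image for p in row]   # raster-scan order, pixels 1..n^2
log_p = joint_log_likelihood(flat, toy_conditional)
```

Because every factor is a normalized distribution, the likelihood is exact and tractable, which is what separates this family from the approximate-density models in the taxonomy.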
(1) Pixel RNN
Architecture
Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. "Pixel recurrent neural networks." arXiv preprint arXiv:1601.06759 (2016).
(1) Pixel RNN
• Channels are processed in R, G, B order
MASK
Mask A (first layer): each of the RGB channels is connected to previous
channels and to the context, but is not connected to itself.
Mask B (subsequent layers): the channels are also connected to themselves.
Multiple Residual Blocks (the number varies by model)
Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. "Pixel recurrent neural networks." arXiv preprint arXiv:1601.06759 (2016).
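As a rough sketch of the two mask types, the code below builds the spatial mask for a single-channel kernel and, separately, states the colour rule that applies at the centre position for RGB inputs. The function names are hypothetical and the pure-Python list layout is chosen for illustration; a real implementation would multiply a convolution's weights by such a mask.

```python
def spatial_mask(kernel_size, mask_type):
    """Return a kernel_size x kernel_size 0/1 mask for a single channel.

    Rows above the centre, and positions left of the centre in the centre row,
    are visible (the "context"). The centre itself is visible only for type "B".
    """
    assert mask_type in ("A", "B")
    centre = kernel_size // 2
    mask = [[0] * kernel_size for _ in range(kernel_size)]
    for r in range(kernel_size):
        for c in range(kernel_size):
            if r < centre or (r == centre and c < centre):
                mask[r][c] = 1
    if mask_type == "B":
        mask[centre][centre] = 1
    return mask

def centre_channel_visible(out_channel, in_channel, mask_type):
    """At the centre pixel, may output colour out_channel read input colour
    in_channel? Colours are indexed R=0, G=1, B=2."""
    if mask_type == "A":
        return in_channel < out_channel    # previous colours only
    return in_channel <= out_channel       # previous colours and itself

mask_a = spatial_mask(3, "A")
mask_b = spatial_mask(3, "B")
```

With mask A in the first layer, no output ever sees the value it is supposed to predict; mask B can then safely be used in later layers because their inputs are already "clean" features.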
(1) Pixel RNN
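Row LSTM
[Figure: input map and hidden state map, showing the input-to-state and state-to-state components]
Multiplication → Convolution (the LSTM's matrix multiplications are replaced by convolutions)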
https://www.slideshare.net/thinkingfactory/pr12-pixelrnn-jaejun-yoo?from_action=save
(1) Pixel RNN
Diagonal BiLSTM, 2x1 Conv
[Figure: input map and hidden state map, showing the input-to-state and state-to-state components]
• Diagonal convolution is hard to implement directly, so skew the feature maps
→ it can be parallelized
https://www.slideshare.net/thinkingfactory/pr12-pixelrnn-jaejun-yoo?from_action=save
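The skewing trick can be illustrated as follows: shifting row i to the right by i positions turns each diagonal of the original map into a column, so a whole diagonal can be processed in one step. This is a minimal sketch on nested lists. The helper names `skew` and `unskew` and the zero padding value are assumptions for the example.

```python
def skew(feature_map, pad=0):
    """Shift row i right by i positions: an n x n map becomes n x (2n - 1)."""
    n = len(feature_map)
    width = 2 * n - 1
    return [[pad] * i + list(row) + [pad] * (width - n - i)
            for i, row in enumerate(feature_map)]

def unskew(skewed):
    """Undo skew(): take n entries from row i starting at column i."""
    n = len(skewed)
    return [row[i:i + n] for i, row in enumerate(skewed)]

original = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
skewed = skew(original)
# Column 2 of the skewed map holds the diagonal 3, 5, 7 of the original map.
diagonal = [row[2] for row in skewed]
```

After skewing, a convolution that walks column by column is effectively walking diagonal by diagonal over the original image.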
(2) Pixel CNN
[Figure: input map and hidden state map, showing the input-to-state component]
Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. "Pixel recurrent neural networks." arXiv preprint arXiv:1601.06759 (2016).
Experiments
• Discrete Softmax Distribution
Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. "Pixel recurrent neural networks." arXiv preprint arXiv:1601.06759 (2016).
Experiments
• Negative log-likelihood (NLL)
Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. "Pixel recurrent neural networks." arXiv preprint arXiv:1601.06759 (2016).
Experiments
Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. "Pixel recurrent neural networks." arXiv preprint arXiv:1601.06759 (2016).
Experiments
Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. "Pixel recurrent neural networks." arXiv preprint arXiv:1601.06759 (2016).
(3) Gated Pixel CNN
• Improving Pixel CNN performance
1) ReLU → Gated Activation Unit → Conditional PixelCNN
<A single layer in the Gated PixelCNN architecture>
Condition
(Vk,g ∗ s is an unmasked 1 × 1 convolution, h=s)
Van den Oord, Aaron, et al. "Conditional image generation with pixelcnn decoders." Advances in neural information processing systems. 2016.
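The gated activation unit replaces ReLU with an elementwise product of a tanh branch and a sigmoid gate, and conditioning adds a term to each branch. The sketch below works on plain lists of floats and is illustrative only: `feature` and `gate` stand for the two masked-convolution outputs, and `cond_feature` and `cond_gate` stand for the conditioning contributions (for example the V ∗ s terms mentioned above). All of these names are hypothetical.

```python
import math

def sigmoid(a):
    """Numerically stable logistic function, written via tanh."""
    return 0.5 * (1.0 + math.tanh(0.5 * a))

def gated_activation(feature, gate, cond_feature=None, cond_gate=None):
    """Elementwise y = tanh(feature + cond_feature) * sigmoid(gate + cond_gate)."""
    n = len(feature)
    cond_feature = cond_feature or [0.0] * n   # no conditioning by default
    cond_gate = cond_gate or [0.0] * n
    return [math.tanh(f + cf) * sigmoid(g + cg)
            for f, g, cf, cg in zip(feature, gate, cond_feature, cond_gate)]

unconditional = gated_activation([1.0, -2.0], [0.0, 3.0])
conditional = gated_activation([0.0], [0.0], cond_feature=[1.0], cond_gate=[0.0])
```

The sigmoid branch acts as a soft switch on the tanh branch, which lets the layer model multiplicative interactions that a plain ReLU cannot.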
(3) Gated Pixel CNN
2) Stacks: removing the blind spot
[Figure: receptive field around the current pixel, PixelCNN vs. Gated PixelCNN]
1. Horizontal Stack: It conditions only on the current row and takes as input the output of the previous layer as
well as the output of the vertical stack.
2. Vertical Stack: It conditions on all the rows above the current pixel. It doesn't have any masking. Its output
is fed into the horizontal stack, and the receptive field grows in a rectangular fashion.
https://towardsdatascience.com/auto-regressive-generative-models-pixelrnn-pixelcnn-32d192911173
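The blind spot of the original PixelCNN can be checked with a small reachability computation. The sketch below stacks masked 3x3 convolutions and collects every offset (relative to the current pixel) that can influence the output. As a simplification, every layer is assumed to use the same mask with the centre included, and the function name is hypothetical.

```python
# Offsets (d_row, d_col) that a masked 3x3 convolution may read from.
MASKED_3X3 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 0)]

def receptive_field(num_layers, offsets=MASKED_3X3):
    """Offsets reachable from the current pixel through num_layers stacked
    masked convolutions."""
    reachable = {(0, 0)}
    for _ in range(num_layers):
        reachable = {(r + dr, c + dc)
                     for (r, c) in reachable
                     for (dr, dc) in offsets}
    return reachable

field = receptive_field(5)
# Pixels above the current one (so valid context) that are never reached.
blind = [(r, c) for r in range(-5, 0) for c in range(0, 6)
         if (r, c) not in field]
```

Moving one column to the right always costs one row upwards, so a pixel such as (-1, +2) stays invisible no matter how many layers are stacked. The vertical stack sees all rows above without this restriction, which is how the Gated PixelCNN removes the blind spot.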
(4) Pixel CNN++
1) Discretized logistic mixture likelihood
The softmax layer used to compute the conditional distribution of a pixel is effective, but it is very costly in terms of
memory. It also makes gradients sparse early in training.
→ To counter this, we assume a latent color intensity, akin to that used in variational autoencoders, with a continuous distribution.
It is rounded to its nearest 8-bit representation to give the observed pixel value. The intensity follows a mixture of logistic
distributions, so the probability of each discrete pixel value can be computed easily.
Salimans, Tim, et al. "Pixelcnn++: Improving the pixelcnn with discretized logistic mixture likelihood and other modifications." arXiv preprint arXiv:1701.05517 (2017).
→ This method is memory-efficient: the output has a lower dimension, which provides denser gradients, thus solving both problems.
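A minimal sketch of the discretized logistic mixture for a single 8-bit value: the probability of value x is the mixture-weighted difference of logistic CDFs at x + 0.5 and x − 0.5, with the two edge bins (0 and 255) absorbing the tails. The mixture parameters below are made-up numbers for illustration, and the function names are hypothetical.

```python
import math

def logistic_cdf(x, mu, s):
    """CDF of a logistic distribution with mean mu and scale s."""
    return 0.5 * (1.0 + math.tanh(0.5 * (x - mu) / s))

def discretized_logistic_mixture_pmf(x, weights, means, scales):
    """P(x) for an integer x in 0..255 under a mixture of discretized logistics."""
    total = 0.0
    for pi, mu, s in zip(weights, means, scales):
        upper = 1.0 if x == 255 else logistic_cdf(x + 0.5, mu, s)  # right tail goes to 255
        lower = 0.0 if x == 0 else logistic_cdf(x - 0.5, mu, s)    # left tail goes to 0
        total += pi * (upper - lower)
    return total

# Hypothetical mixture with two components (weights sum to 1).
weights, means, scales = [0.7, 0.3], [40.0, 200.0], [8.0, 15.0]
pmf = [discretized_logistic_mixture_pmf(x, weights, means, scales)
       for x in range(256)]
```

Each component needs only a weight, a mean and a scale, so the network outputs far fewer numbers per value than a 256-way softmax, and every parameter receives gradient from every training pixel.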
(4) Pixel CNN++
2) Other Modifications
• Conditioning on whole pixels: PixelCNN factorizes the model over the 3 sub-pixels according to color (RGB), which
complicates the model. The dependency between the color channels of a pixel is relatively simple and doesn't
require a deep network to model.
→ Therefore, it is better to condition on whole pixels instead of separate colors, and then output a joint distribution over
all 3 channels of the predicted pixel.
• Downsampling: PixelCNN struggles to capture long-range dependencies. This is one of the reasons
it cannot match the performance of PixelRNN. To overcome this, we downsample the layers by using
convolutions of stride 2. Downsampling reduces the input size and thus increases the relative size of the receptive field. This
leads to some loss of information, but that loss can be compensated for by adding extra short-cut connections.
https://towardsdatascience.com/auto-regressive-generative-models-pixelrnn-pixelcnn-32d192911173
(4) Pixel CNN++
2) Other Modifications
• Short-cut connections: The model follows the encoder-decoder structure of U-Net. Layers 2 and 3 are downsampled and then
layers 5 and 6 are upsampled. There are residual connections from the encoder to the decoder to provide the localised
information.
• Dropout: Since the PixelCNN and PixelCNN++ models are both very powerful, they are likely to overfit the data if not
regularized. So, we apply dropout on the residual path after the first convolution.
https://towardsdatascience.com/auto-regressive-generative-models-pixelrnn-pixelcnn-32d192911173
Experiments
Salimans, Tim, et al. "Pixelcnn++: Improving the pixelcnn with discretized logistic mixture likelihood and other modifications." arXiv preprint arXiv:1701.05517 (2017).
Thank you
