Categorical Reparameterization with Gumbel-Softmax
Understanding it together with PR12
Jaejun Yoo
Clova ML / NAVER
PR12
4th Mar, 2018
The Concrete Distribution: A Continuous Relaxation of
Discrete Random Variables
by C. J. Maddison, A. Mnih, Y. W. Teh
Nov. 2016: https://arxiv.org/abs/1611.00712
Today’s contents
NIPS 2016 workshop / ICLR 2017
Categorical Reparameterization with Gumbel-Softmax
by E. Jang, S. Gu, B. Poole
Nov. 2016: https://arxiv.org/abs/1611.01144
A brief lament before we begin…
“Trust me. It’s complicated….”
I jumped in thinking I would get through this paper quickly, and it ended up eating a huge amount of time. My weekend.. Orz…
Motivation
How do we deal with stochastic nodes with discrete
random variables?
Optimizing Stochastic Computation Graphs
Forward pass of SCG
Optimizing Stochastic Computation Graphs
Backward pass of SCG
Challenging part
1) Score Function Estimators
2) Reparameterization Trick
Score Function Estimators
Challenging part
“Still, there remains an issue of high variance.”
• This is NOT universally true. There is no proof.
• Good discussion in Section 3.1 of Yarin Gal’s thesis
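The variance remark can be made concrete with a toy problem that is not from the slides: estimating d/dθ E[x²] for x ~ N(θ, 1) with both estimators. This is a minimal sketch under that assumption; the function names are hypothetical, and it uses only the Python standard library.

```python
import random


def mean_and_variance(values):
    """Return the sample mean and unbiased sample variance of a list."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


def gradient_estimates(theta, n_samples, seed=0):
    """Per-sample estimates of d/dtheta E[x^2] for x ~ N(theta, 1)."""
    rng = random.Random(seed)
    score_fn, reparam = [], []
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        x = theta + eps  # a sample from N(theta, 1)
        # Score function: f(x) * d/dtheta log p(x; theta) = x^2 * (x - theta)
        score_fn.append(x * x * (x - theta))
        # Reparameterization: d/dtheta f(theta + eps) = 2 * (theta + eps)
        reparam.append(2.0 * x)
    return score_fn, reparam


# The true gradient is d/dtheta (theta^2 + 1) = 2 * theta, i.e. 2 at theta = 1.
sf, rp = gradient_estimates(theta=1.0, n_samples=100_000)
sf_mean, sf_var = mean_and_variance(sf)
rp_mean, rp_var = mean_and_variance(rp)
print(f"score function:     mean={sf_mean:.3f}  var={sf_var:.2f}")
print(f"reparameterization: mean={rp_mean:.3f}  var={rp_var:.2f}")
```

Both estimators target the same gradient, but in this particular toy case the per-sample variance of the score function estimator is noticeably larger. As the slide notes, this ordering is not guaranteed in general.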
Reparameterization Trick
Why do things go wrong in DISCRETE cases?
“Is this defined?”
“we cannot backpropagate the gradients through
discrete nodes in the computational graph”.
Discrete node
Gumbel Distribution Trick (Relaxation)
The main contribution of this work is
a reparameterization trick for the categorical distribution
Well, not quite – it’s actually a reparameterization trick
for a distribution that we can smoothly deform into
the categorical distribution.
Combine both ideas:
“reparameterization trick and smooth relaxation”
Gumbel Distribution Trick (Relaxation)
Gumbel-Max Trick
* Here, $\alpha$ and $\pi$ both denote unnormalized class probabilities. Since I draw on both papers interchangeably, the notation is a little mixed.
To sample from a discrete categorical distribution we draw a
sample of Gumbel noise, add it to $\log(\pi_i)$, and use $\arg\max$
to find the value of $i$ that produces the maximum.
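Written out, the procedure described above reads as follows. This is a compact restatement rather than a quote from either paper; $u_i$ and $g_i$ are names assumed here for the uniform draws and the resulting Gumbel noise.

```latex
u_i \sim \mathrm{Uniform}(0, 1), \qquad
g_i = -\log(-\log u_i), \qquad
z = \arg\max_i \left[ g_i + \log \pi_i \right], \qquad
P(z = k) = \frac{\pi_k}{\sum_j \pi_j}
```

The last identity is what makes the trick useful: the argmax index is distributed exactly according to the normalized class probabilities, even though $\pi$ itself is unnormalized.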
Gumbel Distribution Trick (Relaxation)
Gumbel-Softmax Trick
Smooth relaxation
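The relaxation replaces the argmax with a softmax at temperature τ, giving y_i = exp((log π_i + g_i)/τ) / Σ_j exp((log π_j + g_j)/τ). Below is a minimal standard-library sketch of that formula; the function names, the example probabilities, and the temperatures are assumptions made for illustration.

```python
import math
import random


def sample_gumbel(rng):
    """Draw standard Gumbel noise via g = -log(-log(u)), u ~ Uniform(0, 1)."""
    u = rng.uniform(1e-12, 1.0 - 1e-12)  # keep u away from 0 and 1
    return -math.log(-math.log(u))


def gumbel_softmax_sample(log_pi, tau, rng):
    """Relaxed sample y = softmax((log_pi + g) / tau)."""
    scaled = [(lp + sample_gumbel(rng)) / tau for lp in log_pi]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
log_pi = [math.log(p) for p in (0.1, 0.3, 0.6)]


def mean_peak(tau, n=2000):
    """Average largest coordinate of y; values near 1 mean nearly one-hot."""
    return sum(max(gumbel_softmax_sample(log_pi, tau, rng)) for _ in range(n)) / n


peak_cold = mean_peak(tau=0.1)  # low temperature: samples close to one-hot
peak_warm = mean_peak(tau=5.0)  # high temperature: samples close to uniform
print(f"mean max(y) at tau=0.1: {peak_cold:.3f}, at tau=5.0: {peak_warm:.3f}")
```

Lower temperatures push samples toward the corners of the simplex (closer to the discrete distribution), while higher temperatures flatten them. Every operation in the sample is differentiable with respect to `log_pi`, which is what allows gradients to flow through it.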
Advantages of the Gumbel Trick
• Biased but low-variance estimator
(biased w.r.t. the original discrete objective, but low-variance and unbiased
w.r.t. the continuous surrogate objective)
• Plug & play (easy to code and implement)
• Computational efficiency
• Better performance
Implementation (Super easy)
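import numpy as np
def gumbel_max_sample(x):
    # x: array of shape (batch, num_classes) holding unnormalized log-probabilities
    g = np.random.gumbel(loc=0, scale=1, size=x.shape)
    return (x + g).argmax(axis=1)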
Inverse Transform Sampling
Smoothing relaxation
$F(x) = \exp(-\exp(-x)) \implies X = -\log(-\log U)$
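One way to sanity-check a Gumbel-Max sampler is to compare the empirical class frequencies against the softmax of the logits. The sketch below is an independent standard-library version written for illustration; the logits, the number of draws, and the helper names are assumptions.

```python
import math
import random
from collections import Counter


def gumbel_max_sample(logits, rng):
    """Return the index i maximizing logits[i] + g_i, with g_i ~ Gumbel(0, 1)."""
    best_index, best_value = 0, -math.inf
    for i, logit in enumerate(logits):
        u = rng.uniform(1e-12, 1.0 - 1e-12)  # keep u away from 0 and 1
        value = logit - math.log(-math.log(u))  # logit plus Gumbel noise
        if value > best_value:
            best_index, best_value = i, value
    return best_index


logits = [1.0, 2.0, 0.5]
normalizer = sum(math.exp(x) for x in logits)
softmax_probs = [math.exp(x) / normalizer for x in logits]

rng = random.Random(0)
n_draws = 50_000
counts = Counter(gumbel_max_sample(logits, rng) for _ in range(n_draws))
empirical = [counts[i] / n_draws for i in range(len(logits))]

for i, (e, p) in enumerate(zip(empirical, softmax_probs)):
    print(f"class {i}: empirical={e:.3f}  softmax={p:.3f}")
```

With enough draws the two columns should agree to within sampling noise, which is the sense in which adding Gumbel noise and taking the argmax samples from the categorical distribution.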
Results
Structured Output Prediction
Is reporting NLL really meaningful as a measure of quantitative and qualitative performance or quality?
“we find that they are competitive—occasionally outperforming and occasionally
underperforming—all the while being implemented in an AD library without special casing.”
References
• https://www.youtube.com/watch?v=JFgXEbgcT7g (presentation, YouTube)
• https://github.com/ericjang/gumbel-softmax/blob/master/Categorical%20VAE.ipynb (code)
• https://blog.evjang.com/2016/11/tutorial-categorical-variational.html (blog)
• https://casmls.github.io/general/2017/02/01/GumbelSoftmax.html (blog)
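Inverse Transform Sampling
Universality of the uniform distribution, and building a random number generator
$U \sim \mathrm{Unif}(0, 1), \quad X = F^{-1}(U)$
What if we want to draw random numbers for a random variable $X$ that follows an arbitrary probability distribution?
If we know the inverse $F^{-1}$ of the cumulative distribution function (CDF) $F(x)$ of $X$,
we can build a random number generator for $X$ from a basic (uniform) random number generator.
In other words, the uniform distribution alone is enough to generate any other distribution.
e.g. Standard Gumbel: $F(x) = \exp(-\exp(-x)) \implies X = -\log(-\log U)$
http://www.boxnwhis.kr/2017/04/13/how_to_make_random_number_generator_for_any_probability_distribution.html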
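The standard Gumbel case can be checked numerically: push uniform draws through the inverse CDF and compare the result against the CDF itself. This is a small standard-library sketch; the helper names and the sample size are assumptions for illustration.

```python
import math
import random


def gumbel_cdf(x):
    """CDF of the standard Gumbel distribution, F(x) = exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


def gumbel_inverse_cdf(u):
    """Inverse of F: solving u = exp(-exp(-x)) for x gives -log(-log(u))."""
    return -math.log(-math.log(u))


def inverse_transform_sample(inverse_cdf, rng):
    """Draw X = F^{-1}(U) with U ~ Uniform(0, 1)."""
    u = rng.uniform(1e-12, 1.0 - 1e-12)  # keep u away from 0 and 1
    return inverse_cdf(u)


rng = random.Random(0)
samples = [inverse_transform_sample(gumbel_inverse_cdf, rng) for _ in range(50_000)]

# Fraction of samples at or below 0 should approximate F(0) = exp(-1).
frac_below_zero = sum(s <= 0.0 for s in samples) / len(samples)
# The mean of a standard Gumbel is the Euler-Mascheroni constant (about 0.5772).
sample_mean = sum(samples) / len(samples)
print(f"P(X <= 0): empirical={frac_below_zero:.3f}  exact={gumbel_cdf(0.0):.3f}")
print(f"sample mean={sample_mean:.3f}")
```

The same `inverse_transform_sample` helper works for any distribution whose inverse CDF is available in closed form; only the function passed in changes.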
