(Korean) Introduction to (paper1) Categorical Reparameterization with Gumbel Softmax and (paper2) The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
Video: https://youtu.be/ty3SciyoIyk
Paper1: https://arxiv.org/abs/1611.01144
Paper2: https://arxiv.org/abs/1611.00712
2. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
by C. J. Maddison, A. Mnih, Y. W. Teh
Nov. 2016: https://arxiv.org/abs/1611.00712
Today’s contents
NIPS 2016 workshop / ICLR 2017
Categorical Reparameterization with Gumbel-Softmax
by E. Jang, S. Gu, B. Poole
Nov. 2016: https://arxiv.org/abs/1611.01144
3. A brief lament before we begin…
“Trust me. It’s complicated….”
I dove in thinking I would get through these papers quickly, and they ended up eating a huge amount of time. My weekend.. Orz…
10. Score Function Estimators
Challenging part
“Still, there remains an issue of high variance.”
• This is NOT universally true. There is no proof.
• Good discussion in Section 3.1 of Yarin Gal’s thesis
12. Why do things go wrong in DISCRETE cases?
“Is this defined?”
“we cannot backpropagate the gradients through
discrete nodes in the computational graph”.
Discrete node
13. Gumbel Distribution Trick (Relaxation)
The main contribution of this work is
a reparameterization trick for the categorical distribution
Well, not quite – it’s actually a reparameterization trick
for a distribution that we can smoothly deform into
the categorical distribution.
Combine the ideas of both
“reparameterization trick and smooth relaxation”
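The "smooth deformation" can be illustrated with a few lines of Python. This is a minimal sketch, assuming the softmax-with-temperature form from the two papers, $y_i = \exp((\log \pi_i + g_i)/\tau) / \sum_j \exp((\log \pi_j + g_j)/\tau)$, where each $g_i$ is standard Gumbel noise. The function names `sample_gumbel` and `gumbel_softmax_sample`, the example weights, and the temperature values are illustrative choices, not taken from the slides.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) sample as -log(-log(U)), U ~ Unif(0, 1)."""
    u = max(rng.random(), 1e-12)  # random() may return 0.0; avoid log(0)
    return -math.log(-math.log(u))


def gumbel_softmax_sample(pi, tau, rng):
    """Relaxed one-hot sample: softmax((log(pi_i) + g_i) / tau).

    pi  : positive, possibly unnormalized class weights (assumed > 0)
    tau : temperature > 0; smaller tau gives samples closer to one-hot
    """
    scores = [(math.log(p) + sample_gumbel(rng)) / tau for p in pi]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    pi = [0.2, 0.3, 0.5]
    for tau in (5.0, 1.0, 0.1):
        rng = random.Random(0)  # reuse the same noise at every temperature
        y = gumbel_softmax_sample(pi, tau, rng)
        print(tau, [round(v, 3) for v in y])
```

With the noise held fixed, lowering `tau` pushes the largest component toward 1 while the output stays a differentiable function of `pi`. That is the sense in which the relaxed sample approaches a categorical (one-hot) sample.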
14. Gumbel Distribution Trick (Relaxation)
Gumbel-Max Trick
* Here, $\alpha$ and $\pi$ are both unnormalized class probabilities. Since I refer to both papers interchangeably,
the notation is a little mixed.
To sample from a discrete categorical distribution we draw a
sample of Gumbel noise, add it to $\log(\pi_i)$, and use $\arg\max$
to find the value of $i$ that produces the maximum.
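The Gumbel-Max recipe above can be checked empirically. The following is a minimal sketch using only the Python standard library. It assumes one independent standard Gumbel draw per class, and the helper names `gumbel_max_sample` and `empirical_frequencies` and the example weights are hypothetical, chosen for illustration.

```python
import math
import random
from collections import Counter


def gumbel_max_sample(pi, rng):
    """Return argmax_i [log(pi_i) + g_i] with g_i ~ Gumbel(0, 1) i.i.d."""
    best_index, best_value = -1, -math.inf
    for i, p in enumerate(pi):
        u = max(rng.random(), 1e-12)   # avoid log(0)
        g = -math.log(-math.log(u))    # standard Gumbel noise
        value = math.log(p) + g        # perturbed log-weight
        if value > best_value:
            best_index, best_value = i, value
    return best_index


def empirical_frequencies(pi, n, seed=0):
    """Sample n times and return the observed frequency of each class."""
    rng = random.Random(seed)
    counts = Counter(gumbel_max_sample(pi, rng) for _ in range(n))
    return [counts[i] / n for i in range(len(pi))]


if __name__ == "__main__":
    # Unnormalized weights; the normalized probabilities are 0.2, 0.3, 0.5.
    pi = [2.0, 3.0, 5.0]
    print(empirical_frequencies(pi, 20000))
```

The observed frequencies should land close to the normalized weights. Note that `pi` does not need to sum to 1, because adding a constant to every $\log(\pi_i)$ does not change the argmax.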
19. Results
Structured Output Prediction
Is reporting NLL really meaningful as a measure of quantitative and qualitative performance or quality?
“we find that they are competitive—occasionally outperforming and occasionally
underperforming—all the while being implemented in an AD library without special casing.”
22. Inverse Transform Sampling
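Universality of the uniform distribution and building a random number generator
$U \sim \mathrm{Unif}(0, 1)$, $X = F^{-1}(U)$
What if we want to draw random numbers for a random variable $X$ that follows an arbitrary probability distribution?
If we know the inverse $F^{-1}$ of the cumulative distribution function (CDF) $F(x)$ of $X$,
we can use a basic random number generator to build a random number generator for $X$.
In other words, with only the uniform distribution we can generate any other distribution.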
e.g. Standard Gumbel:
$F(x) = \exp(-\exp(-x)) \implies X = -\log(-\log(U))$
http://www.boxnwhis.kr/2017/04/13/how_to_make_random_number_generator_for_any_probability_distribution.html
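The inverse-transform recipe for the standard Gumbel can be sketched as follows. Solving $u = \exp(-\exp(-x))$ for $x$ gives $-\log u = \exp(-x)$ and hence $x = -\log(-\log u)$. The function names here (`gumbel_cdf`, `gumbel_inverse_cdf`, `inverse_transform_sample`) are illustrative and not from the slides. The check against the mean relies on the known fact that the standard Gumbel mean equals the Euler-Mascheroni constant, about 0.5772.

```python
import math
import random


def gumbel_cdf(x):
    """CDF of the standard Gumbel distribution: F(x) = exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


def gumbel_inverse_cdf(u):
    """Inverse CDF: solve u = exp(-exp(-x)) for x, giving -log(-log(u))."""
    return -math.log(-math.log(u))


def inverse_transform_sample(inverse_cdf, rng):
    """Draw U ~ Unif(0, 1) and return X = F^{-1}(U)."""
    u = rng.random()
    while u == 0.0:  # the Gumbel inverse CDF is undefined at 0; redraw
        u = rng.random()
    return inverse_cdf(u)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [inverse_transform_sample(gumbel_inverse_cdf, rng)
               for _ in range(50000)]
    # The sample mean should be near the Euler-Mascheroni constant.
    print(sum(samples) / len(samples))
```

Passing the inverse CDF in as an argument keeps the sampler generic: any distribution with a known $F^{-1}$ can reuse `inverse_transform_sample` unchanged.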