[ICLR/ICML2019読み会] A Wrapped Normal Distribution on Hyperbolic Space for Gradient Based Learning (ICML2019)
1. A Wrapped Normal Distribution on Hyperbolic Space
for Gradient Based Learning
ICML’19, Jun 12th, 2019
Yoshihiro Nagano1), Shoichiro Yamaguchi2), Yasuhiro Fujita2), Masanori Koyama2)
1) Department of Complexity Science, The University of Tokyo, Japan
2) Preferred Networks, Inc., Japan
Paper: proceedings.mlr.press/v97/nagano19a.html
Code: github.com/pfnet-research/hyperbolic_wrapped_distribution
ICLR/ICML2019 , Jul 21st, 2019
3. Motivation
ARTICLERESEARCH
Figure 3 | Monte Carlo tree search in AlphaGo. a, Each simulation
traverses the tree by selecting the edge with maximum action value Q,
plus a bonus u(P) that depends on a stored prior probability P for that
is evaluated
a rollout to
Selectiona b cExpansion Evaluation
p
p
Q + u(P)
Q + u(P)Q + u(P)
Q + u(P)
P P
P P
r
P
max
max
P
[Silver+2016]
Mammal
Primate
Human Monkey
Rodent
4. Motivation
Mammal
Primate
Human Monkey
Rodent
ARTICLECH
Monte Carlo tree search in AlphaGo. a, Each simulation
he tree by selecting the edge with maximum action value Q,
is evaluated in two ways: using the value network vθ
a rollout to the end of the game with the fast rollout
Selection b c dExpansion Evaluation Backup
p
p
Q + u(P)
Q + u(P)Q + u(P)
Q + u(P)
P P
P P
Q QQ
Q
rr r
P
max
max
P
[Silver+2016]
Hierarchical Datasets Hyperbolic Space
[Image: wikipedia.org]
[Nickel & Kiela, 2017]
5. Motivation
Mammal
Primate
Human Monkey
Rodent
ARTICLECH
Monte Carlo tree search in AlphaGo. a, Each simulation
he tree by selecting the edge with maximum action value Q,
is evaluated in two ways: using the value network vθ
a rollout to the end of the game with the fast rollout
Selection b c dExpansion Evaluation Backup
p
p
Q + u(P)
Q + u(P)Q + u(P)
Q + u(P)
P P
P P
Q QQ
Q
rr r
P
max
max
P
[Silver+2016]
Hierarchical Datasets Hyperbolic Space
Volume increases exponentially
with its radius
6. Motivation
Mammal
Primate
Human Monkey
Rodent
ARTICLECH
Monte Carlo tree search in AlphaGo. a, Each simulation
he tree by selecting the edge with maximum action value Q,
is evaluated in two ways: using the value network vθ
a rollout to the end of the game with the fast rollout
Selection b c dExpansion Evaluation Backup
p
p
Q + u(P)
Q + u(P)Q + u(P)
Q + u(P)
P P
P P
Q QQ
Q
rr r
P
max
max
P
[Silver+2016]
Hierarchical Datasets Hyperbolic Space
[Nickel+2017]
7. Motivation
Mammal
Primate
Human Monkey
Rodent
ARTICLECH
Monte Carlo tree search in AlphaGo. a, Each simulation
he tree by selecting the edge with maximum action value Q,
is evaluated in two ways: using the value network vθ
a rollout to the end of the game with the fast rollout
Selection b c dExpansion Evaluation Backup
p
p
Q + u(P)
Q + u(P)Q + u(P)
Q + u(P)
P P
P P
Q QQ
Q
rr r
P
max
max
P
[Silver+2016]
Hierarchical Datasets Hyperbolic Space
[Nickel+2017]
How can we extend these works to
probabilistic inference?
15. Hyperbolic Wrapped Distribution(b)
Figure 3: The heatmaps of log-likelihood of the pesudo-
hyperbolic Gaussians with various µ and Σ. We designate
the origin of hyperbolic space by the × mark. See Ap-
pendix B for further details.
Since the metric at the tangent space coincides with the Eu-
clidean metric, we can produce various types of Hyperbolic
distributions by applying our construction strategy to other
distributions defined on Euclidean space, such as Laplace
and Cauchy distribution.
to a
rep
gra
wor
β-V
a sc
In H
is i
cod
µ
As
allo
dien
of t
rep
4.2
We
bili
lum
tual
wor
on
ing
wri
Density:
Projection:
(910 1
(;2 2 ; 9120 +
) 0 2 9 2 92 (
≃ ℝ* 2
16. Numerical Evaluations: VAEs on Synthetic Data
Hyperbolic VAE
Yoshihiro Nagano 1
Shoichiro Yamaguchi 2
Yasuhiro Fujita 2
Masanori Koyama 2
Abstract
rbolic space is a geometry that is known to
ell-suited for representation learning of data
an underlying hierarchical structure. In this
r, we present a novel hyperbolic distribution
d pseudo-hyperbolic Gaussian, a Gaussian-
distribution on hyperbolic space whose den-
can be evaluated analytically and differen-
d with respect to the parameters. Our dis-
ion enables the gradient-based learning of
robabilistic models on hyperbolic space that
d never have been considered before. Also,
an sample from this hyperbolic probability
bution without resorting to auxiliary means
ejection sampling. As applications of our
bution, we develop a hyperbolic-analog of
tional autoencoder and a method of prob-
tic word embedding on hyperbolic space.
emonstrate the efficacy of our distribution
rious datasets including MNIST, Atari 2600
kout, and WordNet.
duction
hyperbolic geometry is drawing attention as a
geometry to assist deep networks in capturing
tal structural properties of data such as a hi-
Hyperbolic attention network (G¨ulc¸ehre et al.,
proved the generalization performance of neural
on various tasks including machine translation
ng the hyperbolic geometry on several parts of
(a) A tree representation of the
training dataset
(b) Normal VAE (β = 1.0) (c) Hyperbolic VAE
Figure 1: The visual results of Hyperbolic VAE applied to
an artificial dataset generated by applying random pertur-
bations to a binary tree. The visualization is being done
on the Poincar´e ball. The red points are the embeddings
of the original tree, and the blue points are the embeddings
of noisy observations generated from the tree. The pink
× represents the origin of the hyperbolic space. The VAE
was trained without the prior knowledge of the tree struc-
ture. Please see 6.1 for experimental details
determines the properties of the dataset that can be learned
from the embedding. For the dataset with a hierarchical
stribution on Hyperbolic Space for
sed Learning
2
Yasuhiro Fujita 2
Masanori Koyama 2
(a) A tree representation of the
training dataset
(b) Normal VAE (β = 1.0) (c) Hyperbolic VAE
Figure 1: The visual results of Hyperbolic VAE applied to
(