Wasserstein GAN
JIN HO LEE
2018-11-30
Contents
• 1. Introduction
• 2. Different Distances
• 3. Wasserstein GAN
• 4. Empirical Results
▷ 4.1 Experimental Procedure
▷ 4.2 Meaningful loss metric
▷ 4.3 Improved stability
• 5. Related Work
1. Introduction
• Main goal: learning a GAN by minimizing the Wasserstein distance W(Pr, Pg).
• In Section 2, we show how the Earth Mover (EM) distance behaves in comparison to the Total Variation (TV) distance, the Kullback-Leibler (KL) divergence, and the Jensen-Shannon (JS) divergence.
• In Section 3, we define the Wasserstein GAN and an efficient approximation of the EM distance.
• We empirically show that WGANs cure the main training problems of GANs.
2. Different Distances
• A σ-algebra Σ of subsets of X is a collection of subsets of X satisfying the following conditions:
(a) ∅ ∈ Σ
(b) if B ∈ Σ then B^c ∈ Σ
(c) if B1, B2, · · · is a countable collection of sets in Σ, then ∪_{n=1}^∞ Bn ∈ Σ
• Borel algebra : the smallest σ-algebra containing the open sets.
• A probability space consists of a sample space Ω, a set of events F, and a probability measure P, where the set of events F is a σ-algebra.
• A function µ on a measurable space (X, Σ) is a probability measure if
(a) µ(X) = 1, µ(∅) = 0, and µ(A) ∈ [0, 1] for every A ∈ Σ
(b) countable additivity : for every countable collection {Ei} of pairwise disjoint sets, µ(∪_i Ei) = Σ_i µ(Ei).
2. Different Distances
• The Total Variation (TV) distance:
δ(Pr, Pg) = sup_{A∈Σ} |Pr(A) − Pg(A)|.
• The Kullback-Leibler (KL) divergence:
KL(Pr||Pg) = ∫ log(Pr(x)/Pg(x)) Pr(x) dµ(x).
• The Jensen-Shannon (JS) divergence:
JS(Pr, Pg) = KL(Pr||Pm) + KL(Pg||Pm),
where Pm = (Pr + Pg)/2 is the mixture.
• The Earth-Mover (EM) distance, or Wasserstein-1:
W(Pr, Pg) = inf_{γ∈Π(Pr,Pg)} E_{(x,y)∼γ}[||x − y||],
where Π(Pr, Pg) denotes the set of all joint distributions γ(x, y) whose marginals are respectively Pr and Pg, that is, γ is a coupling of Pr and Pg.
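To make these four quantities concrete, here is a minimal NumPy sketch for distributions supported on a finite set of real points. All names and the Ber(0.3)/Ber(0.6) example are illustrative, and js follows this slide's convention (no 1/2 factor in front of the two KL terms):

```python
import numpy as np

def tv(p, q):
    # On a finite set, sup_A |P(A) - Q(A)| equals half the L1 distance.
    return 0.5 * np.abs(p - q).sum()

def kl(p, q):
    # KL(p||q); +inf as soon as q has a zero where p has mass.
    mask = p > 0
    if np.any(q[mask] == 0):
        return np.inf
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js(p, q):
    # JS as defined above: KL(p||m) + KL(q||m), with m the mixture.
    m = 0.5 * (p + q)
    return kl(p, m) + kl(q, m)

def wasserstein1(xs, p, q):
    # W1 on the real line: integrate |F_p - F_q| between support points,
    # which avoids searching over couplings entirely.
    order = np.argsort(xs)
    xs, p, q = xs[order], p[order], q[order]
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return np.sum(cdf_gap * np.diff(xs))

xs = np.array([0.0, 1.0])
p = np.array([0.7, 0.3])   # Ber(0.3)
q = np.array([0.4, 0.6])   # Ber(0.6)
print(tv(p, q), js(p, q), wasserstein1(xs, p, q))  # W1 = 0.6 - 0.3 = 0.3
```

The W1 value 0.3 here is exactly the p2 − p1 that the Bernoulli example on the following slides derives by hand.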
2. Different Distances Couplings
Couplings
• χ : a compact metric space
• Σ : the set of all Borel subsets of χ
• Prob(χ) : the set of probability measures on χ
Definition
Let µ and ν be probability measures on the same measurable space (S, Σ). A coupling of µ and ν is a probability measure γ on the product space (S × S, Σ × Σ) whose marginals coincide with µ and ν, i.e.,
γ(A × S) = µ(A) and γ(S × A) = ν(A) for all A ∈ Σ.
2. Different Distances Couplings
Example
For 0 ≤ p1 ≤ p2 ≤ 1 and qi = 1 − pi (i = 1, 2), consider the two joint distributions f (the independent coupling) and g (the monotone coupling):
f(0, 0) = q1q2, f(0, 1) = q1p2, f(1, 0) = p1q2, f(1, 1) = p1p2
g(0, 0) = q2, g(0, 1) = p2 − p1, g(1, 0) = 0, g(1, 1) = p1
Since X ∼ Ber(p1) and Y ∼ Ber(p2) under both f and g, both are couplings of Ber(p1) and Ber(p2).
2. Different Distances Example of Wasserstein Distance
Example
For the previous joint distributions f and g, suppose (counterfactually) that
Π[Ber(p1), Ber(p2)] = {f, g}.
Then we would have
W(Ber(p1), Ber(p2)) = min{q1p2 + p1q2, p2 − p1}.
Proof.
Since Π[Ber(p1), Ber(p2)] = {f, g}, we only consider two cases.
Case 1. f ∈ Π[Ber(p1), Ber(p2)]:
E_{(x,y)∼f}[||x − y||] = f(0, 0)||0 − 0|| + f(0, 1)||0 − 1|| + f(1, 0)||1 − 0|| + f(1, 1)||1 − 1|| = q1p2 + p1q2
2. Different Distances Example of Wasserstein Distance
Case 2. g ∈ Π[Ber(p1), Ber(p2)]:
E_{(x,y)∼g}[||x − y||] = g(0, 0)||0 − 0|| + g(0, 1)||0 − 1|| + g(1, 0)||1 − 0|| + g(1, 1)||1 − 1|| = p2 − p1
By cases 1 and 2, we have
W(Ber(p1), Ber(p2)) = inf_{γ∈Π[Ber(p1),Ber(p2)]} E_{(x,y)∼γ}[||x − y||] = inf_{γ∈{f,g}} E_{(x,y)∼γ}[||x − y||] = min{q1p2 + p1q2, p2 − p1}.
2. Different Distances An example of couplings
Lemma
For p1, p2 ∈ [0, 1], the set of all couplings Π[Ber(p1), Ber(p2)] of Ber(p1) and Ber(p2) is {p_a | max(0, q1 − p2) ≤ a ≤ min(q1, q2)}, where
p_a(0, 0) = a
p_a(0, 1) = q1 − a
p_a(1, 0) = q2 − a
p_a(1, 1) = p2 − q1 + a
(the bounds on a are exactly the nonnegativity constraints on the four entries).
Proof.
Let γ ∈ Π[Ber(p1), Ber(p2)]. Then the marginal constraints give the following table:
γ            Y = 0    Y = 1    Σ_y γ(x, y)
X = 0        ·        ·        q1
X = 1        ·        ·        p1
Σ_x γ(x, y)  q2       p2       1
2. Different Distances An example of couplings
For admissible a, setting γ(0, 0) = a completely determines the table:
γ            Y = 0     Y = 1            Σ_y γ(x, y)
X = 0        a         q1 − a           q1
X = 1        q2 − a    p2 − (q1 − a)    p1
Σ_x γ(x, y)  q2        p2               1
It means that every coupling γ of Ber(p1) and Ber(p2) is of the form p_a, namely with a = γ(0, 0) and max(0, q1 − p2) ≤ a ≤ min(q1, q2). This completes the proof.
2. Different Distances A computational result of Wasserstein Distance
Theorem
For p1 ≤ p2, we have
W(Ber(p1), Ber(p2)) = p2 − p1.
Proof.
From the previous Lemma, every coupling is of the form p_a with p_a(0, 0) = a. Then we obtain
E_{(x,y)∼p_a}[||x − y||] = p_a(0, 0)||0 − 0|| + p_a(0, 1)||0 − 1|| + p_a(1, 0)||1 − 0|| + p_a(1, 1)||1 − 1|| = (q1 − a) + (q2 − a) = 2 − p1 − p2 − 2a.
Since a cannot exceed either marginal probability of the outcome 0, we have a ≤ min{q1, q2}. From the assumption p1 ≤ p2, we have q1 ≥ q2, so min{q1, q2} = q2.
2. Different Distances A computational result of Wasserstein Distance
The function E_{(x,y)∼p_a}[||x − y||] = 2 − p1 − p2 − 2a is decreasing in a, and a ≤ q2 = 1 − p2, so
2 − p1 − p2 − 2a ≥ 2 − p1 − p2 − 2(1 − p2) = p2 − p1,
with equality when a = q2. Taking the infimum over all couplings p_a therefore gives W(Ber(p1), Ber(p2)) = p2 − p1.
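Both the Lemma and the Theorem can be checked numerically. A small sketch, with p1 = 0.3 and p2 = 0.6 as illustrative values:

```python
import numpy as np

def coupling(a, p1, p2):
    # The family p_a from the Lemma, as a 2x2 table indexed by (x, y).
    q1, q2 = 1 - p1, 1 - p2
    return np.array([[a,      q1 - a],
                     [q2 - a, p2 - q1 + a]])

p1, p2 = 0.3, 0.6
q1, q2 = 1 - p1, 1 - p2
# Nonnegativity of the four entries pins down the admissible range of a.
lo, hi = max(0.0, q1 - p2), min(q1, q2)
costs = []
for a in np.linspace(lo, hi, 1001):
    g = coupling(a, p1, p2)
    assert np.allclose(g.sum(axis=1), [q1, p1])  # x-marginal is Ber(p1)
    assert np.allclose(g.sum(axis=0), [q2, p2])  # y-marginal is Ber(p2)
    costs.append(g[0, 1] + g[1, 0])              # E[|x - y|] = off-diagonal mass
print(min(costs), p2 - p1)  # both 0.3: W(Ber(p1), Ber(p2)) = p2 - p1
```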
2. Different Distances Example 1
Example (1)
• We assume that
▷ Z ∼ U[0, 1], the uniform distribution on the unit interval;
▷ P0 is the distribution of (0, Z) ∈ R2, uniform on a straight vertical line passing through the origin;
▷ gθ(z) = (θ, z), with θ a single real parameter, and Pθ the distribution of gθ(Z).
Then we obtain the following.
• W(P0, Pθ) = |θ|
• JS(P0, Pθ) = log 2 if θ ≠ 0, and 0 if θ = 0
• KL(Pθ||P0) = KL(P0||Pθ) = +∞ if θ ≠ 0, and 0 if θ = 0
2. Different Distances Example 1
• δ(P0, Pθ) = 1 if θ ≠ 0, and 0 if θ = 0
• When θt → 0, the sequence (Pθt)t∈N converges to P0 under the EM distance, but does not converge at all under the JS, KL, reverse KL, or TV divergences.
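The contrast in Example 1 can be read off the closed forms above; the sketch below simply tabulates the values stated on this slide as θ → 0:

```python
import numpy as np

def distances(theta):
    # Closed-form values from Example 1 for P0 versus P_theta.
    w = abs(theta)
    js = 0.0 if theta == 0 else np.log(2)
    kl = 0.0 if theta == 0 else np.inf
    tv = 0.0 if theta == 0 else 1.0
    return w, js, kl, tv

for theta in [1.0, 0.1, 0.01, 0.0]:
    print(theta, distances(theta))
# Only the EM column decays smoothly to 0 as theta -> 0; JS, KL and TV jump
# at theta = 0, so they give a gradient-based learner no usable signal.
```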
2. Different Distances Theorem 1
Theorem (1)
Let Pr be a fixed distribution over χ. Let Z be a random variable (e.g. Gaussian) over another space Z. Let g : Z × R^d → χ be a function, denoted gθ(z) with z the first coordinate and θ the second. Let Pθ denote the distribution of gθ(Z). Then,
1. If g is continuous in θ, so is W(Pr, Pθ).
2. If g is locally Lipschitz and satisfies regularity assumption 1, then W(Pr, Pθ) is continuous everywhere, and differentiable almost everywhere.
3. Statements 1-2 are false for the Jensen-Shannon divergence JS(Pr, Pθ) and all the KLs.
2. Different Distances Theorem 1
The following corollary tells us that learning by minimizing the EM
distance makes sense (at least in theory) with neural networks.
Corollary
Let gθ be any feedforward neural network parameterized by θ, and p(z) a
prior over z such that Ez∼p(z)[||z||] < ∞ (e.g. Gaussian, uniform, etc.).
Then assumption 1 is satisfied and therefore W(Pr, Pθ) is continuous
everywhere and differentiable almost everywhere.
2. Different Distances Theorem 2
Theorem (2)
Let P be a distribution on a compact space χ and (Pn)n∈N be a sequence of distributions on χ. Then, considering all limits as n → ∞,
1. The following statements are equivalent:
• δ(Pn, P) → 0, with δ the total variation distance.
• JS(Pn, P) → 0, with JS the Jensen-Shannon divergence.
2. The following statements are equivalent:
• W(Pn, P) → 0.
• Pn →_D P, where →_D denotes convergence in distribution for random variables.
3. KL(Pn||P) → 0 or KL(P||Pn) → 0 implies the statements in (2).
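A concrete instance of the gap between (1) and (2): take Pn to be the point mass at 1/n and P the point mass at 0. A small sketch using SciPy's one-dimensional Wasserstein helper:

```python
from scipy.stats import wasserstein_distance

for n in [1, 10, 100, 1000]:
    # P_n is a point mass at 1/n, P a point mass at 0.
    w = wasserstein_distance([1.0 / n], [0.0])
    # sup_A |P_n(A) - P(A)| = 1 whenever the two atoms differ.
    print(n, w, 1.0)
# W(P_n, P) = 1/n -> 0 and P_n -> P in distribution, yet delta(P_n, P) = 1
# for every n: the statements in (2) hold while those in (1) fail.
```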
3. Wasserstein GAN
• Computing W(Pr, Pθ) directly from the definition is intractable. However, the Kantorovich-Rubinstein duality tells us that
W(Pr, Pθ) = sup_{||f||_L≤1} Ex∼Pr[f(x)] − Ex∼Pθ[f(x)],
where ||f||_L ≤ 1 means that f is 1-Lipschitz.
• Note that if we replace ||f||_L ≤ 1 with ||f||_L ≤ K for some K, we obtain
K · W(Pr, Pθ) = sup_{||f||_L≤K} Ex∼Pr[f(x)] − Ex∼Pθ[f(x)].
• If we have a parametrized family of functions {fw}w∈W that are all K-Lipschitz for some K, then
max_{w∈W} Ex∼Pr[fw(x)] − Ex∼Pθ[fw(x)] ≤ sup_{||f||_L≤K} Ex∼Pr[f(x)] − Ex∼Pθ[f(x)] = K · W(Pr, Pθ),
so maximizing over the family gives a lower bound on K · W(Pr, Pθ).
3. Wasserstein GAN Theorem 3
Theorem (3)
Let Pr be any distribution. Let Pθ be the distribution of gθ(Z) with Z a random variable with density p and gθ a function satisfying assumption 1. Then there is a solution f : χ → R to the problem
max_{||f||_L≤1} Ex∼Pr[f(x)] − Ex∼Pθ[f(x)],
and we have
∇θW(Pr, Pθ) = −Ez∼p(z)[∇θf(gθ(z))]
when both terms are well-defined.
• Objective functions:
L_D^WGAN = Ex∼Pr[fw(x)] − Ez∼p(z)[fw(gθ(z))]   (maximized over w)
L_G^WGAN = Ez∼p(z)[fw(gθ(z))]   (maximized over θ)
where the critic weights are clipped, w ← clip(w, −0.01, 0.01), after each update of L_D.
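These objectives translate almost line for line into code. The following is a minimal PyTorch sketch, not the paper's reference implementation; critic, gen, and the shapes of x_real and z are placeholders:

```python
import torch

def critic_loss(critic, gen, x_real, z):
    # L_D is maximized over w; flip the sign so a standard optimizer
    # can minimize it by gradient descent.
    return -(critic(x_real).mean() - critic(gen(z).detach()).mean())

def generator_loss(critic, gen, z):
    # L_G is maximized over theta, i.e. the generator descends -E[f_w(g_theta(z))].
    return -critic(gen(z)).mean()

def clip_weights(critic, c=0.01):
    # The K-Lipschitz surrogate: w <- clip(w, -c, c) after every critic update.
    with torch.no_grad():
        for w in critic.parameters():
            w.clamp_(-c, c)
```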
3. Wasserstein GAN Algorithm 1
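This slide reproduces Algorithm 1 of the paper as an image. As a stand-in, here is a hedged PyTorch sketch of that training loop under the defaults the paper reports (n_critic = 5, clipping constant c = 0.01, RMSProp with learning rate 5e-5); critic, gen, sample_real, and z_dim are assumed placeholders:

```python
import torch

def train_wgan(critic, gen, sample_real, z_dim, steps=10000,
               n_critic=5, c=0.01, lr=5e-5, batch=64):
    opt_c = torch.optim.RMSprop(critic.parameters(), lr=lr)
    opt_g = torch.optim.RMSprop(gen.parameters(), lr=lr)
    for _ in range(steps):
        for _ in range(n_critic):              # train the critic toward optimality
            x = sample_real(batch)
            z = torch.randn(batch, z_dim)
            loss_c = -(critic(x).mean() - critic(gen(z).detach()).mean())
            opt_c.zero_grad(); loss_c.backward(); opt_c.step()
            with torch.no_grad():              # weight clipping keeps f_w K-Lipschitz
                for w in critic.parameters():
                    w.clamp_(-c, c)
        z = torch.randn(batch, z_dim)
        loss_g = -critic(gen(z)).mean()        # generator step
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        # -loss_c is the running Wasserstein estimate, the "meaningful loss"
        # whose correlation with sample quality Section 4 reports.
    return critic, gen
```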
3. Wasserstein GAN Figure 2
In this paper, the authors call the discriminator a critic. In Figure 2, a GAN discriminator and a WGAN critic are trained until optimality. The discriminator learns very quickly to distinguish between fake and real and saturates, providing no reliable gradient. The critic, by contrast, cannot saturate and converges to a linear function, giving clean gradients everywhere.
4. Empirical Results
• We claim two main benefits:
▷ a meaningful loss metric that correlates with the generator’s
convergence and sample quality
▷ improved stability of the optimization process
4. Empirical Results 4.1 Experimental Procedure
• Training curves and the visualization of samples at different stages of training show a clear correlation between the Wasserstein estimate and the generated image quality.
Some knowledge to read Appendix
• Let χ ⊂ R^d be a compact set (that is, closed and bounded, by the Heine-Borel theorem) and Prob(χ) the set of probability measures over χ.
• We define
Cb(χ) = {f : χ → R | f is continuous and bounded}.
• For f ∈ Cb(χ), we can define a norm ||f||∞ = max_{x∈χ} |f(x)|, since f is bounded (and the maximum is attained because χ is compact).
• Then (Cb(χ), || · ||∞) is a normed vector space.
• The dual space
Cb(χ)* = {ϕ : Cb(χ) → R | ϕ is linear and continuous}
has norm ||ϕ|| = sup_{f∈Cb(χ), ||f||∞≤1} |ϕ(f)|.
Some knowledge to read Appendix
• Let µ be a signed measure over χ, and define the total variation norm
||µ||TV = sup_A |µ(A)|,
where A ranges over the Borel subsets of χ. For two probability distributions Pr and Pθ, the function
δ(Pr, Pθ) = ||Pr − Pθ||TV
is a distance on Prob(χ) (the Total Variation distance).
• We can consider
Φ : (Prob(χ), δ) → (Cb(χ)*, || · ||),
where Φ(P)(f) = Ex∼P[f(x)] is a linear functional on Cb(χ).
• By the Riesz Representation Theorem, Φ is an isometric immersion, that is, δ(P, Q) = ||Φ(P) − Φ(Q)||, and Φ is injective.