1) The document presents a new compression-based bound for analyzing the generalization error of large deep neural networks, even when the networks are not explicitly compressed.
2) It shows that if a trained network's weights and covariance matrices exhibit low-rank properties, then the network has a small intrinsic dimensionality and can be efficiently compressed.
3) This allows deriving a tighter generalization bound than existing approaches, providing insight into why overparameterized networks generalize well despite having more parameters than training examples.
ICLR 2020: Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network
1. Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network
Taiji Suzuki¹, Hiroshi Abe², Tomoaki Nishimura³
¹ University of Tokyo / AIP-RIKEN / Japan Digital Design
² iPride
³ NTT Data Corporation
https://openreview.net/forum?id=ByeGzlrKwH
3. Generalization error of DL
• Training data: $D_n = \{(x_i, y_i)\}_{i=1}^n$ (i.i.d.).
• Loss function $\ell$: 1-Lipschitz continuous w.r.t. $f$.
• Empirical risk (training error): $\widehat{L}(f) = \frac{1}{n}\sum_{i=1}^n \ell(y_i, f(x_i))$.
• Population risk (generalization error): $L(f) = \mathbb{E}_{(X,Y)}[\ell(Y, f(X))]$.
For an estimator $\hat{f}$ (DNN), we want to bound the generalization gap $L(\hat{f}) - \widehat{L}(\hat{f})$.
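As a concrete (hypothetical) illustration of estimating this gap in practice, here is a minimal PyTorch sketch; `model`, `train_loader`, `test_loader`, and `loss_fn` are placeholder names, not objects from the paper:

import torch

def empirical_risk(model, loader, loss_fn):
    # average loss over a data loader; on training data this is L_hat(f),
    # on held-out data it approximates the population risk L(f)
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            total += loss_fn(model(x), y).item() * y.shape[0]  # loss_fn assumed to return the batch mean
            count += y.shape[0]
    return total / count

# estimated generalization gap (placeholders assumed to exist):
# gap = empirical_risk(model, test_loader, loss_fn) - empirical_risk(model, train_loader, loss_fn)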
4. Naïve bound (VC-bound)
$L(\hat{f}) - \widehat{L}(\hat{f}) \lesssim \sqrt{\mathrm{VCdim}(\mathcal{F})/n}$ (up to log factors), where the VC-dimension of a depth-$L$ network satisfies $\mathrm{VCdim}(\mathcal{F}) = \tilde{O}\big(L \sum_{\ell=1}^{L} m_\ell m_{\ell+1}\big)$ [Harvey et al., 2017].
☹ The number of parameters $\sum_{\ell=1}^{L} m_\ell m_{\ell+1}$ appears in the bound.
☹ It does not explain the generalization ability of overparameterized networks.
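A quick back-of-the-envelope check (illustrative widths, not the exact VGG configuration) of why the parameter count makes this bound vacuous:

import math

# illustrative fully-connected widths m_1, ..., m_{L+1} (not the actual VGG-19 configuration)
widths = [27, 64, 128, 256, 512, 512, 10]
n = 50_000  # CIFAR-10 training set size

num_params = sum(widths[i] * widths[i + 1] for i in range(len(widths) - 1))
bound_scale = math.sqrt(num_params / n)  # a VC-type bound scales like sqrt(#params / n)
print(num_params, bound_scale)           # ~441k params, scale ~3 > 1: the bound is vacuous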
5. Compression based bound
Typical compression based bound [Arora et al., 2018; Zhou et al., 2019; Baykal et al., 2019; Suzuki et al., 2018]: the original network $f$ (widths $m_\ell$) is compressed to a smaller network $f^\#$ (widths $m_\ell^\# \ll m_\ell$); compressible ⇔ simple. Schematically,
\[
L(f^\#) \;\le\; \widehat{L}(f) \;+\; \underbrace{\text{(compression error)}}_{\text{bias}} \;+\; \underbrace{\tilde{O}\Big(\sqrt{\textstyle\sum_{\ell} m_\ell^\# m_{\ell+1}^\# / n}\Big)}_{\text{variance}}.
\]
6. Compression based bound (cont.)
The variance term is governed by the size of the compressed network, while the bias term grows as the compression becomes more aggressive: a bias-variance trade-off.
☹ This type of bound gives the generalization error of the compressed network $f^\#$, not of the original network $f$.
Q: What happens for the "non-compressed" network $f$?
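A minimal numerical sketch of this trade-off, using rank truncation of a synthetic spectrum as the compression scheme; the rank r stands in for the compressed width m_ℓ^#, and the variance term is only schematic:

import numpy as np

n = 50_000                          # number of training examples
m = 512                             # original layer width
s = np.arange(1.0, m + 1) ** -1.0   # polynomially decaying singular values (illustrative)

for r in [8, 32, 128, 512]:
    bias = np.sqrt((s[r:] ** 2).sum())  # compression error ||W - W^#||_F after rank-r truncation
    variance = np.sqrt(r * m / n)       # complexity of the rank-r compressed class (schematic)
    print(f"rank {r:3d}: bias {bias:.3f} + variance {variance:.3f} = {bias + variance:.3f}")

Increasing the rank r shrinks the bias but inflates the variance, so the best bound is attained at an intermediate compression level.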
7. Our new compression based bound
Assumption: the trained network $f$ can be compressed to a smaller one $f^\#$
($f \in \mathcal{F}$, $f^\# \in \mathcal{F}^\#$; $\mathcal{F}$ is a set of trained nets, $\mathcal{F}^\#$ is a set of compressed nets).
The compression scheme can be data dependent. (This assumption also restricts the training procedure.)
Our new compression based bound (main result), schematically:
\[
L(f) \;\le\; \widehat{L}(f) \;+\; \underbrace{\text{(compression error)}}_{\text{bias}} \;+\; \underbrace{\text{(complexity of the compressed class } \mathcal{F}^\#\text{)}}_{\text{variance}},
\]
i.e., unlike the existing bound, the generalization error of the original network $f$ itself is bounded.
8. Our new compression based bound (cont.)
Compared with the existing bound, the variance term can be smaller: it is determined by the size of the compressed network (widths $m_\ell^\#$, rank $r$) rather than by the original widths $m_\ell$. ⇒ Improved bias-variance trade-off.
9. More precise description
Theorem (compression based bound for the original net). Suppose the trained network $f \in \mathcal{F}$ can be compressed to a smaller one $f^\# \in \mathcal{F}^\#$, where the compression scheme can be data dependent (this assumption also restricts the training procedure). Then, with probability at least $1 - e^{-t}$, schematically,
\[
L(f) \;\le\; \widehat{L}(f) \;+\; \underbrace{\text{(compression error)}}_{\text{bias}} \;+\; \underbrace{\tilde{O}\big(\mathfrak{R}(\mathcal{F}^\#)\big)}_{\text{variance: main part, } O(1/\sqrt{n})} \;+\; \underbrace{\tilde{O}\big(r_* + t/n\big)}_{\text{fast part, } O(1/n)},
\]
where:
• $\mathfrak{R}(\mathcal{F}^\#)$ is the local Rademacher complexity of the compressed class $\mathcal{F}^\#$;
• $r_*$ is the fixed point of the local Rademacher complexity;
• the bias and variance terms play the same roles as in the previous slides.
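For reference, a standard definition of the local Rademacher complexity and its fixed point (a generic textbook form; the paper's exact definition may differ in constants and localization):
\[
\mathfrak{R}_n(\mathcal{F}^\#; r) \;=\; \mathbb{E}_{\sigma}\!\left[\sup_{f \in \mathcal{F}^\#,\, \mathbb{E}[f^2] \le r} \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(x_i)\right], \qquad \sigma_1,\dots,\sigma_n \ \text{i.i.d. uniform on } \{\pm 1\},
\]
and the fixed point $r_*$ is the smallest $r > 0$ such that $\mathfrak{R}_n(\mathcal{F}^\#; r) \le r$ (up to universal constants).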
11. Singular values of weight matrix
[Figure: eigenvalues of the covariance matrix (left) and singular values of the weight matrix (right) for the 7th layer of VGG-19 trained on CIFAR-10; both spectra show rapid decay. See also Martin & Mahoney, arXiv:1901.08276.]
Both the covariance matrix and the weight matrix show rapid decay of eigenvalues ⇒ small degree of freedom.
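A sketch of how such spectra can be inspected with torchvision's pretrained VGG-19 (ImageNet weights as a stand-in for the CIFAR-10-trained network in the figure; the layer index is an illustrative choice):

import torch
import torchvision

# pretrained ImageNet weights as a stand-in for the CIFAR-10-trained VGG-19 of the slide
vgg = torchvision.models.vgg19(weights="IMAGENET1K_V1")
W = vgg.features[14].weight.detach()    # an intermediate conv layer (illustrative choice)
s = torch.linalg.svdvals(W.flatten(1))  # singular values of the (out_ch x in_ch*k*k) matrix
print((s / s[0])[:10])                  # rapid decay of the normalized singular values

# the covariance spectrum would be obtained analogously from a batch A (n x d)
# of that layer's input activations:
# C = torch.cov(A.T); mu = torch.linalg.eigvalsh(C)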
12. Near low rank weight and covariance
• Near low rank weight matrix: the singular values of each weight matrix $W^{(\ell)}$ decay rapidly (e.g., polynomially in the index $j$).
• Both the weight matrices and the covariance matrices of the layer-wise activations are near low rank.
Theorem (informal). Under near low rankness of the weight and covariance matrices (plus other boundedness conditions), the network can be compressed efficiently, and the resulting generalization bound is governed by the degree of freedom of each layer.
This is much smaller than the VC-bound, which scales with the full parameter count $\sum_{\ell} m_\ell m_{\ell+1}$.
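One standard way to turn a decaying spectrum into an effective dimension is the ridge-type degree of freedom below; this is a generic definition used for illustration, and the paper's exact quantity may differ:

import numpy as np

def degree_of_freedom(eigvals, lam):
    """Effective dimension N(lam) = sum_j mu_j / (mu_j + lam) of a spectrum at scale lam."""
    mu = np.asarray(eigvals, dtype=float)
    return float((mu / (mu + lam)).sum())

mu = np.arange(1.0, 513.0) ** -2.0          # polynomially decaying eigenvalues (illustrative)
for lam in [1e-1, 1e-2, 1e-3]:
    print(lam, degree_of_freedom(mu, lam))  # stays far below the ambient dimension 512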
13. Comparison with existing work
[Figure: comparison of intrinsic dimensionality between our degree of freedom and that in Arora et al. (2018), computed on a VGG-19 network trained on CIFAR-10; our degree of freedom is smaller.]
[S. Arora, R. Ge, B. Neyshabur, and Y. Zhang. Stronger generalization bounds for deep nets via a compression approach. ICML 2018.]
14. Summary
Why can an overparameterized network generalize?
• If the network can be compressed to a smaller one, then it generalizes well.
• A general framework to obtain a compression based bound for the non-compressed net is derived.
• Our bound gives a better bias-variance trade-off.
• If the covariance and weight matrices are near low rank, then the network can be compressed efficiently ⇒ better generalization.
For more details, please see our paper:
https://openreview.net/forum?id=ByeGzlrKwH