4. VC Dimension & VC Bound
• Learning is feasible when ε̂(h) is small → ε(h) is small (provided the complexity term below is small)
• With probability at least 1 − δ, the following inequality (the VC bound) holds:
$$\epsilon(h) \;\le\; \hat{\epsilon}(h) + \sqrt{\frac{8}{n}\,\log\frac{4(2n)^d}{\delta}}$$

where n is the number of training instances and d is the VC dimension (model complexity).
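As a quick numeric check, here is a minimal sketch (the helper name vc_bound is hypothetical, not from the slides) that evaluates the right-hand side of this bound:

```python
import math

def vc_bound(train_err, n, d, delta=0.05):
    """Right-hand side of the VC bound:
    eps(h) <= eps_hat(h) + sqrt((8/n) * log(4 * (2n)^d / delta))."""
    complexity = math.sqrt((8.0 / n) * (math.log(4.0 / delta) + d * math.log(2.0 * n)))
    return train_err + complexity

# The gap shrinks as n grows and widens as the VC dimension d grows.
print(vc_bound(train_err=0.05, n=100_000, d=10))     # modest complexity term
print(vc_bound(train_err=0.05, n=100_000, d=1_000))  # much looser (near-vacuous) bound
```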
13. PAC-Bayesian Bound
• With probability at least 1 − δ, the following inequality (the PAC-Bayesian bound) holds:
$$\epsilon(Q) \;\le\; \hat{\epsilon}(Q) + \sqrt{\frac{\mathrm{KL}(Q\,\|\,P) + \log\frac{n}{\delta} + 2}{2n - 1}}$$

where n is the number of training instances and KL(Q‖P) is the KL divergence between the posterior Q and the prior P.
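A companion sketch (again a hypothetical helper, assuming KL(Q‖P) has already been computed) evaluating this bound's right-hand side:

```python
import math

def pac_bayes_bound(train_err, kl_qp, n, delta=0.05):
    """Right-hand side of the PAC-Bayesian bound:
    eps(Q) <= eps_hat(Q) + sqrt((KL(Q||P) + log(n/delta) + 2) / (2n - 1))."""
    gap = math.sqrt((kl_qp + math.log(n / delta) + 2.0) / (2.0 * n - 1.0))
    return train_err + gap

# Unlike the VC bound, complexity is measured per trained posterior Q,
# not over the whole hypothesis class.
print(pac_bayes_bound(train_err=0.05, kl_qp=50.0, n=100_000))
```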
14. PAC-Bayesian Bound
[Figure: the same PAC-Bayesian bound evaluated in three regimes, each drawn as a prior P and a posterior Q fit to training data and evaluated on testing data:
• low KL(Q‖P): gap |ε(Q) − ε̂(Q)| low, but ε̂(Q) high (underfitting)
• moderate KL(Q‖P): gap |ε(Q) − ε̂(Q)| moderate (appropriate fitting)
• high KL(Q‖P): gap |ε(Q) − ε̂(Q)| high, ε̂(Q) low but ε(Q) high (overfitting)]
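To put numbers on the figure's three regimes, the gap term can be tabulated for low, moderate, and high KL(Q‖P) (the specific values below are illustrative, not from the slides):

```python
import math

n, delta = 10_000, 0.05
for kl in (1.0, 50.0, 5_000.0):  # low / moderate / high KL(Q||P)
    gap = math.sqrt((kl + math.log(n / delta) + 2.0) / (2.0 * n - 1.0))
    print(f"KL(Q||P) = {kl:7.1f} -> bound on the gap eps(Q) - eps_hat(Q): {gap:.3f}")
```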
17. PAC-Bayesian Bound for Deep Learning
• With probability at least 1 − δ, the following inequality (the PAC-Bayesian bound) holds:
$$\mathrm{KL}\big(\hat{\epsilon}(Q)\,\big\|\,\epsilon(Q)\big) \;\le\; \frac{\mathrm{KL}(Q\,\|\,P) + \log\frac{n}{\delta}}{n - 1}$$

where Q is the stochastic neural network (SNN) after training (the posterior) and P is the SNN before training (the prior).
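Unlike the earlier forms, this bound constrains the binary KL between ε̂(Q) and ε(Q), so extracting a numeric bound on ε(Q) requires inverting that KL. A minimal bisection sketch (helper names hypothetical):

```python
import math

def binary_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def kl_inverse_upper(train_err, rhs, iters=60):
    """Largest eps with binary_kl(train_err, eps) <= rhs, found by bisection."""
    lo, hi = train_err, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if binary_kl(train_err, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return hi

# Example: eps_hat(Q) = 0.02, KL(Q||P) = 100, n = 50_000, delta = 0.05.
rhs = (100.0 + math.log(50_000 / 0.05)) / (50_000 - 1)
print(kl_inverse_upper(0.02, rhs))  # upper bound on the true error eps(Q)
```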
20. Tips for Training Deep Neural Networks
Traditional Machine Learning
• Reducing d (the VC dimension):
  • reducing the number of parameters
  • weight decay
  • early stopping
Modern Deep Learning
• Reducing KL(Q‖P):
  • reducing the number of parameters
  • weight decay?
  • early stopping?
  • starting from a good prior P (transfer learning); a sketch follows the bounds below
(VC bound, governing the traditional column, and PAC-Bayesian bound, governing the deep learning column:)

$$\epsilon(h) \;\le\; \hat{\epsilon}(h) + \sqrt{\frac{8}{n}\,\log\frac{4(2n)^d}{\delta}} \qquad \mathrm{KL}\big(\hat{\epsilon}(Q)\,\big\|\,\epsilon(Q)\big) \;\le\; \frac{\mathrm{KL}(Q\,\|\,P) + \log\frac{n}{\delta}}{n - 1}$$
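As one illustration of "starting from a good P", here is a short hypothetical PyTorch sketch (not from the slides) that decays weights toward the pretrained prior rather than toward zero, which keeps KL(Q‖P) small in the PAC-Bayesian picture:

```python
import torch
import torch.nn as nn

# Stand-in for a network initialized from a pretrained prior P.
model = nn.Linear(784, 10)
prior = {k: v.detach().clone() for k, v in model.named_parameters()}

def prior_weight_decay(model, prior, lam=1e-4):
    """L2 penalty pulling the posterior weights toward the prior weights,
    i.e. decay toward P instead of toward zero."""
    return lam * sum(((p - prior[k]) ** 2).sum() for k, p in model.named_parameters())

# Usage inside a training step:
# loss = task_loss(model(x), y) + prior_weight_decay(model, prior)
```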