PAC-Bayesian Bound for Deep Learning

License: CC Attribution License


- 1. PAC-Bayesian Bound for Deep Learning Mark Chang 2019/11/26
- 2. Outline • VC Dimension & VC Bound • Generalization in Deep Learning • PAC-Bayesian Bound • PAC-Bayesian Bound for Deep Learning • Tips for Training Deep Neural Networks
- 3. VC Dimension & VC Bound [Diagram: training data and testing data are sampled from the data distribution; the training algorithm updates the hypothesis h to minimize the training error $\hat{\epsilon}(h)$; the testing error $\epsilon(h)$ is measured on the testing data.]
- 4. VC Dimension & VC Bound • Learning is feasible when a small $\hat{\epsilon}(h)$ implies a small $\epsilon(h)$ • With probability 1-δ, the following inequality (VC bound) is satisfied: $\epsilon(h) \leq \hat{\epsilon}(h) + \sqrt{\frac{8}{n}\log\frac{4(2n)^d}{\delta}}$, where n = number of training instances and d = VC dimension (model complexity)
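The bound is easy to evaluate numerically. A minimal Python sketch (the function name `vc_bound` and the default δ = 0.05 are illustrative choices, not from the slides):

```python
import math

def vc_bound(train_err, n, d, delta=0.05):
    """VC bound: eps(h) <= eps_hat(h) + sqrt((8/n) * log(4 * (2n)^d / delta)).

    Expands the log to avoid overflow when (2n)^d is astronomically large.
    """
    penalty = math.sqrt((8.0 / n) * (math.log(4.0) + d * math.log(2 * n)
                                     - math.log(delta)))
    return train_err + penalty

# The MNIST toy example from slide 7: d >> n makes the bound vacuous.
print(vc_bound(0.0, n=50_000, d=26_000_000))  # ~ 219, far above 1
```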
- 5. VC Dimension & VC Bound • VC dimension of the 2D linear model is 3: n = 2, shatter; n = 3, shatter; n = 4, not shatter
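To make "shatter" concrete, here is a small sketch checking that all 2³ labelings of three points in general position are realizable by a 2D linear model; using the perceptron algorithm as the separability test is an illustrative choice:

```python
import itertools
import numpy as np

def linearly_separable(points, labels, steps=1000, lr=0.1):
    """Crude test: the perceptron converges when some (w, b) realizes
    the labeling y = sign(w . x + b) on these points."""
    w, b = np.zeros(2), 0.0
    for _ in range(steps):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * (w @ x + b) <= 0:  # misclassified (or on the boundary)
                w, b = w + lr * y * x, b + lr * y
                mistakes += 1
        if mistakes == 0:
            return True
    return False

# Three points in general position: every labeling is realizable, so the
# 2D linear model shatters n = 3 points (hence VC dimension >= 3).
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(all(linearly_separable(pts, lab)
          for lab in itertools.product([-1.0, 1.0], repeat=3)))
```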
- 6. VC Dimension & VC Bound • For a fixed n, the right-hand side of $\epsilon(h) \leq \hat{\epsilon}(h) + \sqrt{\frac{8}{n}\log\frac{4(2n)^d}{\delta}}$ grows with d. [Plot: error vs. d (VC dimension); past d = n the bound is dominated by the complexity term and the model over-fits.]
- 7. Generalization in Deep Learning • Consider an MNIST toy example: a 2-layer NN (input: 780, hidden: 600) with d = 26M, trained on the MNIST dataset with n = 50,000, so d >> n
- 8. Generalization in Deep Learning • Three variants of the task: original MNIST, random labels, random noise features • The 2-layer NN reaches training error $\hat{\epsilon}(h) \approx 0$ on all three, i.e. it can shatter the training set
- 9. Generalization in Deep Learning • Training error $\hat{\epsilon}(h) \approx 0$ on all three variants, but testing error differs: original MNIST $\epsilon(h) \approx 0.02$, random labels $\epsilon(h) \approx 0.5$, random noise features $\epsilon(h) \approx 0.5$ • With d >> n the VC bound is vacuous, so it cannot explain why the original task generalizes
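A minimal PyTorch sketch of this randomization experiment (torchvision supplies the data; the optimizer settings and epoch count are illustrative, not from the slides):

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

train = datasets.MNIST(".", train=True, download=True,
                       transform=transforms.ToTensor())

# The "random label" variant: permute the targets so labels carry no
# information about the images; the noise-features variant would replace
# the images instead.
RANDOM_LABELS = True
if RANDOM_LABELS:
    train.targets = train.targets[torch.randperm(len(train.targets))]

loader = torch.utils.data.DataLoader(train, batch_size=128, shuffle=True)

# 2-layer NN in the spirit of the slides: one hidden layer of 600 units.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 600), nn.ReLU(),
                      nn.Linear(600, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(50):  # enough passes to drive training error toward 0
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```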
- 10. PAC-Bayesian Bound • McAllester, COLT 1999
- 11. PAC-Bayesian Bound • P (prior): distribution over hypotheses before training • Q (posterior): distribution over hypotheses after training, produced by the training algorithm from the training data
- 12. PAC-Bayesian Bound [Diagram: as on slide 3, but the training algorithm updates a distribution Q over hypotheses rather than a single hypothesis h; training and testing data are sampled from the data distribution, and h is sampled from Q.] $\hat{\epsilon}(Q) = E_{h \sim Q}[\hat{\epsilon}(h)]$ (training error), $\epsilon(Q) = E_{h \sim Q}[\epsilon(h)]$ (testing error)
- 13. PAC-Bayesian Bound • With probability 1-δ, the following inequality (PAC-Bayesian bound) is satisfied: $\epsilon(Q) \leq \hat{\epsilon}(Q) + \sqrt{\frac{KL(Q\|P) + \log\frac{n}{\delta} + 2}{2n-1}}$, where n = number of training instances and $KL(Q\|P)$ = KL divergence between posterior Q and prior P
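A minimal sketch of evaluating this bound (the function name and the example KL value are illustrative):

```python
import math

def pac_bayes_bound(train_err_q, kl_qp, n, delta=0.05):
    """McAllester-style PAC-Bayesian bound from slide 13:
    eps(Q) <= eps_hat(Q) + sqrt((KL(Q||P) + log(n/delta) + 2) / (2n - 1))."""
    return train_err_q + math.sqrt(
        (kl_qp + math.log(n / delta) + 2.0) / (2 * n - 1))

# Unlike the VC bound, this stays non-vacuous whenever KL(Q||P) is small
# relative to n, regardless of the parameter count.
print(pac_bayes_bound(0.02, kl_qp=1_000.0, n=50_000))  # ~ 0.12
```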
- 14. PAC-Bayesian Bound • Reading $\epsilon(Q) \leq \hat{\epsilon}(Q) + \sqrt{\frac{KL(Q\|P) + \log\frac{n}{\delta} + 2}{2n-1}}$ across three regimes: low $KL(Q\|P)$ -> $|\epsilon(Q) - \hat{\epsilon}(Q)|$ low, under-fitting; moderate $KL(Q\|P)$ -> $|\epsilon(Q) - \hat{\epsilon}(Q)|$ moderate, appropriate fitting; high $KL(Q\|P)$ -> $|\epsilon(Q) - \hat{\epsilon}(Q)|$ high, over-fitting. [Diagram: prior P vs. posterior Q and the fit to training/testing data in each regime.]
- 15. PAC-Bayesian Bound for Deep Learning • Dziugaite & Roy, UAI 2017
- 16. PAC-Bayesian Bound for Deep Learning • deterministic neural networks: fixed w, b with y = wx + b • stochastic neural networks (SNN): w, b sampled from a distribution at every forward pass, then y = wx + b
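A minimal PyTorch sketch of one SNN layer, assuming a diagonal Gaussian over w and b (the log-σ parameterization and initialization constants are illustrative choices):

```python
import torch
import torch.nn as nn

class StochasticLinear(nn.Module):
    """Linear layer of an SNN: w and b are drawn from a diagonal Gaussian
    (mean mu, std exp(logsigma)) on every forward pass."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_mu = nn.Parameter(0.05 * torch.randn(out_dim, in_dim))
        self.w_logsigma = nn.Parameter(torch.full((out_dim, in_dim), -3.0))
        self.b_mu = nn.Parameter(torch.zeros(out_dim))
        self.b_logsigma = nn.Parameter(torch.full((out_dim,), -3.0))

    def forward(self, x):
        # Reparameterization: sample = mu + sigma * standard normal noise,
        # which keeps mu and logsigma trainable by SGD.
        w = self.w_mu + self.w_logsigma.exp() * torch.randn_like(self.w_mu)
        b = self.b_mu + self.b_logsigma.exp() * torch.randn_like(self.b_mu)
        return x @ w.t() + b
```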
- 17. PAC-Bayesian Bound for Deep Learning • With probability 1-δ, the following inequality (PAC-Bayesian bound) is satisfied: $KL(\hat{\epsilon}(Q)\,\|\,\epsilon(Q)) \leq \frac{KL(Q\|P) + \log\frac{n}{\delta}}{n-1}$, where P is the SNN before training (prior) and Q is the SNN after training (posterior)
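This form bounds the binary KL divergence between the error rates, so an upper bound on $\epsilon(Q)$ follows by numerically inverting kl(q‖p), e.g. by binary search. A minimal sketch (names and the example KL(Q||P) value are illustrative):

```python
import math

def kl_binary(q, p, eps=1e-12):
    """Binary KL divergence kl(q || p) between Bernoulli(q) and Bernoulli(p)."""
    return (q * math.log((q + eps) / (p + eps))
            + (1 - q) * math.log((1 - q + eps) / (1 - p + eps)))

def kl_inverse(train_err_q, rhs, tol=1e-9):
    """Largest p >= train_err_q with kl(train_err_q || p) <= rhs: the
    tightest upper bound on eps(Q) implied by the PAC-Bayes-kl bound."""
    lo, hi = train_err_q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_binary(train_err_q, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return lo

n, delta, kl_qp = 50_000, 0.05, 1_000.0
rhs = (kl_qp + math.log(n / delta)) / (n - 1)
print(kl_inverse(0.02, rhs))  # ~ 0.06: a non-vacuous bound on eps(Q)
```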
- 18. PAC-Bayesian Bound for Deep Learning $KL(\hat{\epsilon}(Q)\,\|\,\epsilon(Q)) \leq \frac{KL(Q\|P) + \log\frac{n}{\delta}}{n-1}$ • easy problem (original MNIST): small $KL(Q\|P)$ -> small $\epsilon(Q)$ • hard problems (random labels, random noise features): large $KL(Q\|P)$ -> large $\epsilon(Q)$
- 19. PAC-Bayesian Bound for Deep Learning [Figure: computed bounds for Random Label vs. Original MNIST, using one, two, or three hidden layers with 600~1200 neurons]
- 20. Tips for Training Deep Neural Networks • Traditional machine learning controls d in $\epsilon(h) \leq \hat{\epsilon}(h) + \sqrt{\frac{8}{n}\log\frac{4(2n)^d}{\delta}}$: reducing the number of parameters, weight decay, early stopping • Modern deep learning controls $KL(Q\|P)$ in $KL(\hat{\epsilon}(Q)\,\|\,\epsilon(Q)) \leq \frac{KL(Q\|P) + \log\frac{n}{\delta}}{n-1}$: reducing the number of parameters, weight decay (?), early stopping (?), starting from a good P (transfer learning)
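For "reducing KL(Q||P)" with Gaussian SNNs, the KL term has a closed form and can simply be added to the training loss; a minimal sketch continuing the illustrative StochasticLinear parameterization above:

```python
import torch

def kl_diag_gaussians(mu_q, logsigma_q, mu_p, logsigma_p):
    """Closed-form KL(Q || P) between diagonal Gaussians, summed over all
    coordinates of one weight tensor."""
    var_q = (2 * logsigma_q).exp()
    var_p = (2 * logsigma_p).exp()
    return (logsigma_p - logsigma_q
            + (var_q + (mu_q - mu_p) ** 2) / (2 * var_p) - 0.5).sum()

# Illustrative training objective: data loss plus the bound's complexity
# term, with the prior P frozen at its initialization (or at pretrained
# weights, for the transfer-learning tip):
#   loss = cross_entropy + kl_total / n
```

Freezing P at initialization (or at a pretrained model) is what makes "starting from a good P" actionable: the closer the trained posterior can stay to that prior, the smaller the complexity term in the bound.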
- 21. About the Speaker Mark Chang • Email: ckmarkoh at gmail dot com • Facebook: https://www.facebook.com/ckmarkoh.chang
