PAC-Bayesian Bound for Deep Learning

  1. PAC-Bayesian Bound for Deep Learning. Mark Chang, 2019/11/26
  2. Outline • VC Dimension & VC Bound • Generalization in Deep Learning • PAC-Bayesian Bound • PAC-Bayesian Bound for Deep Learning • Tips for Training Deep Neural Networks
  3. VC Dimension & VC Bound • (Diagram) Training and testing data are sampled from the data distribution; the training algorithm updates the hypothesis h to minimize the training error ε̂(h), while ε(h) denotes the testing error.
  4. VC Dimension & VC Bound • Learning is feasible when a small ε̂(h) implies a small ε(h). • With probability 1-δ, the following inequality (VC bound) is satisfied: $\epsilon(h) \le \hat{\epsilon}(h) + \sqrt{\frac{8}{n}\log\frac{4(2n)^d}{\delta}}$, where n is the number of training instances and d is the VC dimension (model complexity). (A numerical sketch of this bound appears after the slide list.)
  5. VC Dimension & VC Bound • The VC dimension of a 2D linear model is 3. (Figure: n = 2, shatter; n = 3, shatter; n = 4, not shatter.)
  6. VC Dimension & VC Bound • For a fixed n, the bound $\epsilon(h) \le \hat{\epsilon}(h) + \sqrt{\frac{8}{n}\log\frac{4(2n)^d}{\delta}}$ grows with d. (Figure: error vs. d (VC dimension), with over-fitting as d approaches n.)
  7. Generalization in Deep Learning • Consider an MNIST toy example: a 2-layer NN (input: 780, hidden: 600) has d = 26M, while the MNIST dataset has n = 50,000, so d >> n.
  8. Generalization in Deep Learning • (Figure: features and binary labels for three datasets.) The same 2-layer NN reaches training error ε̂(h) ≈ 0 on original MNIST, on MNIST with random labels, and on random noise features: it shatters the training data.
  9. Generalization in Deep Learning • Although ε̂(h) ≈ 0 in all three cases, the testing errors differ sharply: ε(h) ≈ 0.02 on original MNIST, but ε(h) ≈ 0.5 on random labels and on random noise features.
  10. PAC-Bayesian Bound • Introduced by McAllester (COLT 1999).
  11. PAC-Bayesian Bound • P (prior): the distribution over hypotheses before training. • Q (posterior): the distribution over hypotheses after the training algorithm has seen the training data.
  12. PAC-Bayesian Bound • (Diagram) Training and testing data are sampled from the data distribution; the training algorithm minimizes the training error to update the distribution Q over hypotheses, and hypotheses h are sampled from Q. The training error is $\hat{\epsilon}(Q) = \mathbb{E}_{h \sim Q}[\hat{\epsilon}(h)]$ and the testing error is $\epsilon(Q) = \mathbb{E}_{h \sim Q}[\epsilon(h)]$.
  13. PAC-Bayesian Bound • With probability 1-δ, the following inequality (PAC-Bayesian bound) is satisfied: $\epsilon(Q) \le \hat{\epsilon}(Q) + \sqrt{\frac{\mathrm{KL}(Q\|P) + \log\frac{n}{\delta} + 2}{2n - 1}}$, where n is the number of training instances and KL(Q||P) is the KL divergence between the posterior Q and the prior P. (A numerical sketch of this bound appears after the slide list.)
  14. PAC-Bayesian Bound • (Figure: three fits of the same training and testing data.) Under-fitting: low KL(Q||P), low |ε(Q) - ε̂(Q)|. Appropriate fitting: moderate KL(Q||P), moderate |ε(Q) - ε̂(Q)|. Over-fitting: high KL(Q||P), high |ε(Q) - ε̂(Q)|.
  15. PAC-Bayesian Bound for Deep Learning • Based on Dziugaite & Roy (UAI 2017).
  16. PAC-Bayesian Bound for Deep Learning • Deterministic neural network: fixed w, b compute y = wx + b. • Stochastic neural network (SNN): w, b are sampled from a distribution before computing y = wx + b. (See the stochastic-layer sketch after the slide list.)
  17. PAC-Bayesian Bound for Deep Learning • With probability 1-δ, the following inequality (PAC-Bayesian bound) is satisfied: $\mathrm{KL}(\hat{\epsilon}(Q)\,\|\,\epsilon(Q)) \le \frac{\mathrm{KL}(Q\|P) + \log\frac{n}{\delta}}{n - 1}$, where P is the SNN before training (prior), Q is the SNN after training (posterior), and the left-hand side is the KL divergence between Bernoulli distributions with parameters ε̂(Q) and ε(Q). (Inverting this into a bound on ε(Q) is sketched after the slide list.)
  18. PAC-Bayesian Bound for Deep Learning • Under $\mathrm{KL}(\hat{\epsilon}(Q)\,\|\,\epsilon(Q)) \le \frac{\mathrm{KL}(Q\|P) + \log\frac{n}{\delta}}{n - 1}$: original MNIST is an easy problem -> small KL(Q||P) -> small ε(Q); random labels and random noise features are hard problems -> large KL(Q||P) -> large ε(Q).
  19. PAC-Bayesian Bound for Deep Learning • (Figure: experimental results on original MNIST vs. random labels, for networks with one, two, or three hidden layers of 600-1200 neurons.)
  20. Tips for Training Deep Neural Networks • Traditional machine learning, governed by $\epsilon(h) \le \hat{\epsilon}(h) + \sqrt{\frac{8}{n}\log\frac{4(2n)^d}{\delta}}$, reduces d: fewer parameters, weight decay, early stopping. • Modern deep learning, governed by $\mathrm{KL}(\hat{\epsilon}(Q)\,\|\,\epsilon(Q)) \le \frac{\mathrm{KL}(Q\|P) + \log\frac{n}{\delta}}{n - 1}$, reduces KL(Q||P): fewer parameters, weight decay (?), early stopping (?), and starting from a good prior P (transfer learning).
  21. About the Speaker • Mark Chang • Email: ckmarkoh at gmail dot com • Facebook: https://www.facebook.com/ckmarkoh.chang
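
The VC bound on slide 4 is easy to evaluate numerically. Below is a minimal sketch (the function name `vc_bound` and the example numbers, taken loosely from slide 7, are my own illustration); it mainly shows how vacuous the bound becomes once d >> n.

```python
import math

def vc_bound(train_err, n, d, delta=0.05):
    # VC bound from slide 4: eps(h) <= eps_hat(h) + sqrt((8/n) * log(4*(2n)^d / delta)).
    # Expanding the log keeps the computation stable for large d.
    complexity = math.sqrt((8.0 / n) * (math.log(4.0 / delta) + d * math.log(2.0 * n)))
    return train_err + complexity

# Illustrative numbers only: with d >> n the bound is far above 1, i.e. vacuous.
print(vc_bound(train_err=0.02, n=50_000, d=26_000_000))
```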
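Slide 13's PAC-Bayesian bound can be evaluated the same way. This is a minimal sketch under my own naming (`pac_bayes_bound`) with illustrative KL values; it shows that the bound is controlled by KL(Q||P) relative to n rather than by a parameter count.

```python
import math

def pac_bayes_bound(train_err_q, kl_qp, n, delta=0.05):
    # PAC-Bayesian bound from slide 13:
    # eps(Q) <= eps_hat(Q) + sqrt((KL(Q||P) + log(n/delta) + 2) / (2n - 1)).
    return train_err_q + math.sqrt((kl_qp + math.log(n / delta) + 2.0) / (2.0 * n - 1.0))

# Illustrative only: the bound tightens as KL(Q||P) shrinks relative to n.
for kl in (10.0, 1_000.0, 100_000.0):
    print(f"KL={kl:>9.0f}  bound={pac_bayes_bound(0.02, kl, n=50_000):.3f}")
```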
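The stochastic neural network on slide 16 replaces fixed weights with a distribution that is sampled at every forward pass. The sketch below assumes a diagonal Gaussian posterior over a single linear layer; `snn_forward` and the toy shapes are my own illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def snn_forward(x, w_mean, w_logstd, b_mean, b_logstd):
    # Stochastic linear layer (slide 16): sample w, b from a diagonal Gaussian
    # posterior Q, then compute y = w x + b with the sampled hypothesis.
    w = w_mean + np.exp(w_logstd) * rng.standard_normal(w_mean.shape)
    b = b_mean + np.exp(b_logstd) * rng.standard_normal(b_mean.shape)
    return w @ x + b

# Toy 3 -> 2 layer: each call samples a different hypothesis h ~ Q.
x = np.ones(3)
w_mean, w_logstd = rng.standard_normal((2, 3)), np.full((2, 3), -3.0)
b_mean, b_logstd = np.zeros(2), np.full(2, -3.0)
print(snn_forward(x, w_mean, w_logstd, b_mean, b_logstd))
print(snn_forward(x, w_mean, w_logstd, b_mean, b_logstd))  # a different sample
```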
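Slide 17 bounds the binary KL divergence between ε̂(Q) and ε(Q) rather than their difference, so turning it into a bound on ε(Q) requires inverting that KL. The bisection below is one common way to do this; the helper names (`binary_kl`, `kl_inverse`, `pac_bayes_kl_bound`) and the example numbers are mine.

```python
import math

def binary_kl(q, p):
    # KL divergence between Bernoulli(q) and Bernoulli(p).
    eps = 1e-12
    q = min(max(q, eps), 1.0 - eps)
    p = min(max(p, eps), 1.0 - eps)
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))

def kl_inverse(train_err_q, rhs, tol=1e-9):
    # Largest eps(Q) >= eps_hat(Q) with KL(eps_hat(Q) || eps(Q)) <= rhs, by bisection.
    lo, hi = train_err_q, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(train_err_q, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return lo

def pac_bayes_kl_bound(train_err_q, kl_qp, n, delta=0.05):
    # Slide 17: KL(eps_hat(Q) || eps(Q)) <= (KL(Q||P) + log(n/delta)) / (n - 1),
    # inverted into an upper bound on eps(Q).
    rhs = (kl_qp + math.log(n / delta)) / (n - 1.0)
    return kl_inverse(train_err_q, rhs)

# Illustrative numbers only.
print(pac_bayes_kl_bound(train_err_q=0.02, kl_qp=1_000.0, n=50_000))
```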
