Minimax statistical learning with Wasserstein distances
by Jaeho Lee and Maxim Raginsky
January 26, 2019
Presenter: Kenta Oono @ NeurIPS 2018 Reading Club
Kenta Oono (@delta2323 )
Profile
• 2011.3: MSc. (Mathematics)
• 2011.4-2014.10: Preferred Infrastructure (PFI)
• 2014.10-current: Preferred Networks (PFN)
• 2018.4-current: Ph.D student @U.Tokyo
Interests
• Mathematics
• Bioinformatics
• Theory of Deep Learning
2/18
Summary
What this paper does.
• Develop a distributionally-robust risk minimization problem.
• Derive the excess-risk rate $O(n^{-1/2})$, same as the non-robust case.
• Application to domain adaptation.
Why did I choose this paper?
• Spotlight talk
• Wanted to learn statistical learning theory
• Especially minimax optimality of deep learning, but this paper turned out not to be about it.
• Wanted to learn about the Wasserstein distance
3/18
Problem Setting (Expected Risk)
Given
• $Z$: sample space
• $P$: (unknown) distribution over $Z$
• Dataset: $D = (z_1, \ldots, z_n) \sim P$ i.i.d.
For a hypothesis $f : Z \to \mathbb{R}$, we evaluate its expected risk by
• Expected risk: $R(P, f) = \mathbb{E}_{Z \sim P}[f(Z)]$
• Hypothesis space: $F \subset \{Z \to \mathbb{R}\}$
4/18
Problem Setting (Estimator)
Goal:
• Devise an algorithm $A : D \mapsto \hat{f} = \hat{f}(D)$
• We treat $D$ as a random variable, so $\hat{f}$ is random as well.
• If $A$ is a randomized algorithm (e.g. SGD), the randomness of $\hat{f}(D)$ comes from $A$, too.
• Evaluate the excess risk: $R(P, \hat{f}) - \inf_{f \in F} R(P, f)$
Typical form of theorems:
• $\mathbb{E}_{A,D}[R(P, \hat{f}) - \inf_{f \in F} R(P, f)] = O(g(n))$
• $R(P, \hat{f}) - \inf_{f \in F} R(P, f) = O(g(n, \delta))$ with probability at least $1 - \delta$ with respect to the choice of $D$ (and $A$)
5/18
Problem Setting (ERM Estimator)
Since we cannot compute the expected risk $R(P, f)$ (the distribution $P$ is unknown), we compute the empirical risk instead:
$\hat{R}_D(f) = \frac{1}{n} \sum_{i=1}^{n} f(z_i) = R(P_n, f)$ ($P_n$: empirical distribution).
The ERM (Empirical Risk Minimization) estimator for the hypothesis space $F$ is
$\hat{f} = \hat{f}(D) \in \operatorname{argmin}_{f \in F} R(P_n, f)$
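The ERM recipe above can be sketched in a few lines of Python. This is only a toy illustration under assumptions that are not in the slides: the data are drawn from a normal distribution with mean 1 and variance 1, the hypothesis space is the finite family f_c(z) = (z - c)^2 indexed by a grid of centers c, and the names `empirical_risk`, `erm`, and `expected_risk` are hypothetical.

```python
import random


def empirical_risk(f, data):
    """R(P_n, f): the average of f over the sample."""
    return sum(f(z) for z in data) / len(data)


def erm(hypotheses, data):
    """Return the key of a hypothesis that minimizes the empirical risk."""
    return min(hypotheses, key=lambda c: empirical_risk(hypotheses[c], data))


def expected_risk(c):
    """Closed form of E[(Z - c)^2] for Z ~ N(1, 1) (toy assumption)."""
    return 1.0 + (1.0 - c) ** 2


# Toy setup (assumption): Z ~ N(1, 1) and F = {z -> (z - c)^2} on a grid of c.
rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(2000)]
centers = [k / 10 for k in range(-20, 41)]
hypotheses = {c: (lambda z, c=c: (z - c) ** 2) for c in centers}

c_hat = erm(hypotheses, data)
# Excess risk: R(P, f_hat) - inf_{f in F} R(P, f), using the closed form.
excess = expected_risk(c_hat) - min(expected_risk(c) for c in centers)
print(c_hat, excess)
```

Because the hypothesis space here is finite, the argmin is found by exhaustive search; for a parametric class such as a neural network one would use an optimizer instead.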
6/18
Relation
7/18
Assumptions
[Figure: assumptions, combined by "+" and "OR"]
Ref. Lee and Raginsky (2018)
8/18
Example
Supervised learning
• $Z = X \times Y$, $X = \mathbb{R}^D$: input space, $Y = \mathbb{R}$: label space
• $\ell : Y \times Y \to \mathbb{R}$: loss function
• $H \subset \{X \to Y\}$: set of models
• $F = \{f_h(x, y) = \ell(h(x), y) \mid h \in H\}$
Regression
• $X = \mathbb{R}^D$, $Y = \mathbb{R}$, $\ell(\hat{y}, y) = (\hat{y} - y)^2$
• $H$ = functions realized by a neural network with a fixed architecture
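A minimal sketch of how a model h and a loss induce a hypothesis f_h on the sample space of (x, y) pairs. The linear model stands in for the neural network mentioned on the slide, and the names `squared_loss`, `induced_hypothesis`, and `linear_model` are hypothetical.

```python
def squared_loss(y_pred, y):
    """Squared loss: (y_pred - y)^2."""
    return (y_pred - y) ** 2


def induced_hypothesis(h, loss):
    """Map a model h: X -> Y to f_h(x, y) = loss(h(x), y)."""
    def f_h(z):
        x, y = z
        return loss(h(x), y)
    return f_h


def linear_model(w, b):
    """Toy stand-in for a fixed-architecture network: x -> <w, x> + b."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


h = linear_model([2.0, -1.0], 0.5)
f_h = induced_hypothesis(h, squared_loss)

# Each sample z is a pair (x, y); the empirical risk is the average of f_h.
sample = [((1.0, 0.0), 2.5), ((0.0, 1.0), 0.0)]
risk = sum(f_h(z) for z in sample) / len(sample)
print(risk)
```

With this wrapper, the general setting (a hypothesis is just a real-valued function on the sample space) covers supervised learning without any special treatment of inputs and labels.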
9/18
Classical Result
Typically, we have
$R(P, \hat{f}) - \inf_{f \in F} R(P, f) = O_P\left(\frac{\text{complexity of } F}{\sqrt{n}}\right)$
The model complexity measures how complex $F$ is (intuitively, how "large" $F$ is).
10/18
Covering number
Definition (Covering Number)
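For $F \subset F_0 := \{f : [-1, 1]^D \to \mathbb{R}\}$ and $\varepsilon > 0$, the (external) covering number of $F$ is
$N(F, \varepsilon) := \inf \{ N \in \mathbb{N} \mid \exists f_1, \ldots, f_N \in F_0 \text{ s.t. } \forall f \in F, \exists n \in [N] \text{ s.t. } \|f - f_n\|_\infty \le \varepsilon \}$.
• Intuition: the minimum number of balls (with radius $\varepsilon$) needed to cover the space $F$.
• Entropy integral: $C(F) := \int_0^\infty \sqrt{\log N(F, u)} \, du$.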
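To make the covering-number definition concrete, here is a rough sketch that builds an ε-cover greedily for a small finite class of functions on [-1, 1]. Two simplifications are assumed: the sup norm is approximated by a maximum over a finite grid, and the centers are chosen from the class itself, so the size of the cover is only an upper bound on the (external) covering number. The names `sup_dist` and `greedy_cover` are hypothetical.

```python
def sup_dist(f, g, grid):
    """Approximate the sup-norm distance by a maximum over grid points."""
    return max(abs(f(x) - g(x)) for x in grid)


def greedy_cover(functions, eps, grid):
    """Greedily pick centers so that every function is within eps of one."""
    centers = []
    for f in functions:
        # f becomes a new center only if no existing center covers it.
        if all(sup_dist(f, c, grid) > eps for c in centers):
            centers.append(f)
    return centers


# Toy class (assumption): linear functions x -> a * x with slopes 0, 0.1, ..., 1.
grid = [(i - 10) / 10 for i in range(21)]  # grid on [-1, 1]
functions = [(lambda x, a=k / 10: a * x) for k in range(11)]

cover = greedy_cover(functions, eps=0.25, grid=grid)
print(len(cover))
```

The greedy pass is not optimal, so it can return more centers than necessary; it is meant only to show what "covering F with ε-balls" means operationally.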
11/18
Distributionally Robust Framework
Minimize the worst-case risk over distributions close to the true distribution $P$.
minimize $R(P, f)$
↓
minimize $R_{\rho,p}(P, f) := \sup_{Q \in A_{\rho,p}(P)} R(Q, f)$
We consider the $p$-Wasserstein distance:
$A_{\rho,p}(P) = \{Q \mid W_p(P, Q) \le \rho\}$
Applications
• Adversarial attack: $\rho$ = noise level
• Domain adaptation: $\rho$ = discrepancy level between the train and test distributions
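As a small illustration of the ambiguity set, the sketch below computes the p-Wasserstein distance between two one-dimensional empirical distributions with the same number of equally weighted points, using the standard fact that in this special case the optimal coupling matches sorted samples. The restriction to one dimension and equal sample sizes, and the names `wasserstein_1d` and `in_ambiguity_set`, are assumptions made for the example.

```python
def wasserstein_1d(xs, ys, p=1):
    """W_p between two equal-size empirical distributions on the real line."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    n = len(xs)
    # In 1-D the optimal transport plan pairs the i-th smallest points.
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)


def in_ambiguity_set(xs, ys, rho, p=1):
    """Check whether the second sample lies in the rho-ball around the first."""
    return wasserstein_1d(xs, ys, p) <= rho


train = [0.0, 1.0, 2.0]
test = [3.0, 1.0, 2.0]  # same shape as train, shifted by 1 after sorting
print(wasserstein_1d(train, test, p=1), in_ambiguity_set(train, test, rho=0.5))
```

In the domain adaptation reading, `rho` plays the role of the tolerated discrepancy between the training and test distributions.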
12/18
Estimator
Correspondingly, we change the estimator to
$\hat{f} \in \operatorname{argmin}_{f \in F} R_{\rho,p}(P_n, f)$
We want to evaluate
$R_{\rho,p}(P, \hat{f}) - \inf_{f \in F} R_{\rho,p}(P, f)$
13/18
Main Theorems
Same excess-risk rate as the non-robust setting.
Ref. Lee and Raginsky (2018)
14/18
Strategy
From the authors' slides
Ref: https://nips.cc/media/Slides/nips/2018/517cd(05-09-45)-05-10-20-12649-Minimax_Statist.pdf
15/18
Key Lemmas
Ref. Lee and Raginsky (2018)
16/18
Why are these lemmas important?
(Complexity of $\Psi_{\Lambda,F}$) ≈ (Complexity of $F$) × (Complexity of $\Lambda$)
17/18
Impression
• The dual form of the risk ($R_\rho(P, f) = \inf_{\lambda \ge 0} \mathbb{E}[\psi_{\lambda,f}(Z)]$) may be useful in its own right.
• Mysterious Assumption 4 (an incredibly local property of $F$).
• Special structure of the $p = 1$ Wasserstein distance?
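The dual form mentioned in the first bullet can be evaluated numerically for an empirical distribution. The sketch below assumes that ψ_{λ,f}(z) = λρ^p + sup_{z'} (f(z') - λ|z - z'|^p), which is how duality results of this kind are usually stated; the slide itself does not spell out ψ. It also assumes a one-dimensional sample space, replaces the supremum over z' by a maximum over a finite candidate grid, and searches λ over a finite grid. The names `psi` and `robust_risk_dual` are hypothetical.

```python
def psi(lam, f, z, rho, p, candidates):
    """lam * rho^p + max over z' of f(z') - lam * |z - z'|^p (grid approximation)."""
    inner = max(f(zp) - lam * abs(z - zp) ** p for zp in candidates)
    return lam * rho ** p + inner


def robust_risk_dual(f, data, rho, p, candidates, lambdas):
    """Approximate inf over lam >= 0 of the sample average of psi."""
    return min(
        sum(psi(lam, f, z, rho, p, candidates) for z in data) / len(data)
        for lam in lambdas
    )


# Toy setup (assumption): sample space [0, 10], f(z) = z, three data points.
candidates = [i / 100 for i in range(1001)]
lambdas = [k / 10 for k in range(51)]
data = [0.2, 0.5, 0.8]

value = robust_risk_dual(lambda z: z, data, rho=0.5, p=1,
                         candidates=candidates, lambdas=lambdas)
print(value)
```

For this 1-Lipschitz f and p = 1, the worst case simply shifts mass to the right by a total transport cost of ρ, so the robust risk should be close to the empirical mean plus ρ; setting ρ = 0 should recover the ordinary empirical risk.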
18/18
