This paper proposes a classifier anonymization technique called FOCA to break co-adaptation between the feature extractor and the classifier in deep neural networks. FOCA trains the feature extractor to make weak classifiers strong, by optimizing it against many randomly generated weak classifiers, each obtained from a small batch of data. The paper theoretically proves that, under FOCA, the feature extractor learns to project data points of the same class onto a simple point-like distribution in feature space. Experiments on real datasets largely confirm the point-like property and show that FOCA allows the classifier to be trained with far fewer samples than standard end-to-end training.
1. Breaking Inter-Layer Co-Adaptation by Classifier Anonymization
Ikuro Sato†, Kohta Ishikawa†, Guoqing Liu†, Masayuki Tanaka‡
(ICML 2019)
2. Meta-reviewer's comment
"…This paper seems to me like a perfect example of a 'High Risk High Reward' paper, …"
Acceptance ratio of ICML 2019: 773/3424 = 22.6%
We took that as a compliment. That is what research is about!
3. What I'm going to talk about
Let's consider a classification task.
[Figure: an end-to-end DNN. Input $x$ → feature extractor $F_\phi(x)$ → feature $\xi$ → classifier $C_\theta(\xi)$ → output $\eta$]
[Figure: two feature spaces $\xi$ compared. Left: + and − samples scattered and overlapping. Right: + and − samples in tight, well-separated clusters. Left << right.]
Which is better? Why? How can we obtain good features?
4. Summary
About what? Breaking co-adaptation between the feature extractor and the classifier.
How? By a classifier anonymization technique.
Theory? Proved: features form a simple point-like distribution.
In reality? The point-like property is largely confirmed on real datasets.
5. What is co-adaptation?
Let's consider the classification task again.
[Figure: the same end-to-end DNN. Input $x$ → $F_\phi(x)$ → feature $\xi$ → $C_\theta(\xi)$ → output $\eta$]
Co-adaptation:
The feature extractor adapts to one particular classifier.
The classifier adapts to one particular feature extractor.
[Figure: left feature space $\xi$: scattered +/− samples fit by one particular decision boundary (co-adapted). Right feature space $\xi$: tight clusters that many different classifiers can separate.]
To break co-adaptation, the feature extractor should be trained for many classifiers.
6. Proposed algorithm: FOCA
[Figure: feature space $\xi$ in which the + samples and the − samples each collapse to almost a single point]
FOCA: Feature-extractor Optimization through Classifier Anonymization
FOCA can train the feature extractor to make any weak classifier, generated for the given feature extractor, strong.
Under several conditions, we theoretically proved that FOCA can train a feature extractor that projects all samples of a class onto a single point in feature space.
7. Message of FOCA
Traditional training: the feature extractor (a junior researcher) adapts to one strong classifier (a smart boss).
FOCA training: the feature extractor (a junior researcher) is trained against many weak classifiers (a variety of bosses).
The payoff shows in transfer learning (a new boss, a new domain): FOCA trains the feature extractor itself to be strong.
8. Weak classifier assumption
Definition:
A weak classifier performs only slightly better than random guessing.
Strong classifier: a classifier that is strong for the entire data,
$$\theta_\phi^{*} = \arg\min_{\theta}\; \mathbb{E}_{(x,t)\sim p(x,t)}\, L\big(C_\theta(F_\phi(x)),\, t\big).$$
Classifier optimized on a small sample set $B$ of the entire data:
$$\theta_\phi^{B} = \arg\min_{\theta} \sum_{(x,t)\in B} L\big(C_\theta(F_\phi(x)),\, t\big).$$
Weak classifier assumption:
We assume that a classifier that is strong for a small sample set $B$ acts as a weak classifier for the entire data.
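To make the second definition concrete, here is a minimal PyTorch sketch of computing $\theta_\phi^{B}$: a classifier head is fit on a small batch $B$ while $F_\phi$ stays frozen. The linear head, the SGD settings, and the function name are illustrative assumptions, not the paper's prescribed implementation.

```python
import torch
import torch.nn as nn

def fit_weak_classifier(feature_extractor, batch_x, batch_t,
                        feat_dim, num_classes, steps=50, lr=0.1):
    """Compute theta_phi^B: fit a classifier head on a small batch B
    while the feature extractor F_phi stays frozen (illustrative sketch)."""
    classifier = nn.Linear(feat_dim, num_classes)   # assumed linear head
    opt = torch.optim.SGD(classifier.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    with torch.no_grad():                 # F_phi is not updated here
        feats = feature_extractor(batch_x)
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(classifier(feats), batch_t)
        loss.backward()
        opt.step()
    return classifier
```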
9. Practical FOCA algorithm
[Figure: the practical FOCA loop. Training data → sampling → weak classifier generator → weak classifier $C_\theta(\xi)$; previous feature extractor $\bar{F}_\phi(x)$; mini-batch → feature extractor $F_\phi(x)$ → update]
Step 1 (weak classifier generator): optimize the classifier model $C_\theta(\xi)$ on a given small sample set, using the previous feature extractor $\bar{F}_\phi(x)$.
Step 2 (update): update the feature extractor $F_\phi(x)$ on a given mini-batch, with the weak classifier held fixed.
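Putting the two steps together, one FOCA iteration might look like the following sketch, reusing the fit_weak_classifier helper from the previous slide. The sample_batch helper, the batch sizes, and the learning rate are assumptions for illustration only.

```python
import torch
import torch.nn as nn

def foca_step(feature_extractor, prev_feature_extractor, sample_batch,
              feat_dim, num_classes, lr=0.01):
    """One FOCA iteration: (1) generate a weak classifier from a small
    random sample using the previous feature extractor, then (2) update
    the current feature extractor against that anonymized classifier."""
    # (1) Draw a small batch B and fit a weak classifier on it.
    bx, bt = sample_batch(64)              # assumed sampling helper
    weak_clf = fit_weak_classifier(prev_feature_extractor, bx, bt,
                                   feat_dim, num_classes)
    for p in weak_clf.parameters():
        p.requires_grad_(False)            # the weak classifier is fixed now

    # (2) Update the feature extractor on a mini-batch, with the weak
    #     classifier held fixed; plain SGD keeps no state across calls.
    mx, mt = sample_batch(128)             # assumed sampling helper
    opt = torch.optim.SGD(feature_extractor.parameters(), lr=lr)
    opt.zero_grad()
    loss = nn.CrossEntropyLoss()(weak_clf(feature_extractor(mx)), mt)
    loss.backward()
    opt.step()
```

A fresh weak classifier is generated at every iteration, and the previous feature extractor $\bar{F}_\phi$ is a snapshot of the current one, so the feature extractor never co-adapts to any single classifier.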
10. Experimental validation
Two-step training:
First train the feature extractor. Then train the classifier with the feature extractor fixed.
[Figure: left feature space $\xi$ (co-adaptation): scattered +/− samples, so many samples are required to train the classifier. Right feature space $\xi$ (point-like): tight clusters, so a few samples are good enough to train the classifier.]
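A minimal sketch of this two-step protocol, assuming a frozen feature extractor and a linear classifier head (the sample counts, optimizer settings, and function name are illustrative assumptions):

```python
import torch
import torch.nn as nn

def two_step_eval(feature_extractor, few_x, few_t, test_x, test_t,
                  feat_dim, num_classes, steps=200, lr=0.1):
    """Step 2 of the protocol: train a fresh classifier on a few labeled
    samples with the feature extractor fixed, then report test accuracy."""
    feature_extractor.eval()
    with torch.no_grad():                  # features are fixed in step 2
        few_feats = feature_extractor(few_x)
        test_feats = feature_extractor(test_x)
    clf = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(clf.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(clf(few_feats), few_t).backward()
        opt.step()
    with torch.no_grad():
        acc = (clf(test_feats).argmax(dim=1) == test_t).float().mean()
    return acc.item()
```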