Learning by Association:
A versatile semi-supervised training
method for neural networks
Masashi Yokota, Tokyo Univ.
Translation: 김홍배
Idea
Humans can think by associating between learning samples, and can therefore
answer accurately even from only a few samples.
→ Could a network likewise learn by associating its labeled training samples
with unlabeled data?
Idea
Labeled → Unlabeled → Labeled
Train so that suitable unlabeled data is associated between two labeled
samples of the same class.
Overview
(Diagram: walker paths between labeled samples of Label X / Label Y and unlabeled samples)
A walker moves along a path 「labeled → unlabeled → labeled」, and the network is
trained so that the start and end samples belong to the same class. The walker
moves according to transition probabilities computed from similarity.
→ Association
The network is trained to discover the features that are essential for
distinguishing the classes.
Key Contributions
- Semi-supervised end-to-end training of
arbitrary network architectures
* Surpasses state-of-the-art methods
when only a few labeled samples are available
- A very powerful domain adaptation method
Method
Assumption: if the network produces good embedding vectors, then vectors of
the same class have high similarity.
→ Optimize the network parameters so that it produces good embeddings,
using both the labeled and the unlabeled data.
Method
• A: Labeled data
• B: Unlabeled data
• Similarity Mij between Ai and Bj: inner product
• The walker's transition probabilities are computed from this similarity M.
Method
• Transition Probability
• Round Trip Probability
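For reference, a minimal NumPy sketch of these quantities, following the paper's softmax formulation (the array names and batch sizes are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable row-wise softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(10, 128))    # embeddings of labeled batch A
emb_b = rng.normal(size=(100, 128))   # embeddings of unlabeled batch B

M = emb_a @ emb_b.T                   # similarity M_ij = <A_i, B_j>
p_ab = softmax(M, axis=1)             # transition probability A_i -> B_j
p_ba = softmax(M.T, axis=1)           # transition probability B_j -> A_i
p_aba = p_ab @ p_ba                   # round-trip probability A_i -> B -> A_j
```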
Walker Loss
Make the probability of paths between samples of the same class uniformly
distributed, and the transition probability of paths between different
classes zero.
For hard-to-distinguish unlabeled images (e.g. a 7 that resembles a 1 in
MNIST), the transition probability is also pushed toward 0, so that only
easily distinguishable data effectively remain.
(※ H: cross entropy)
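A sketch of the walker loss under the definitions above, reusing p_aba from the previous snippet (the uniform same-class target follows the slide's description; names are illustrative):

```python
import numpy as np

def walker_loss(p_aba, labels_a, eps=1e-8):
    # Target: uniform over same-class pairs, zero for different-class pairs.
    same = (labels_a[:, None] == labels_a[None, :]).astype(float)
    target = same / same.sum(axis=1, keepdims=True)
    # Cross entropy H(target, p_aba), averaged over the labeled batch A.
    return -np.mean(np.sum(target * np.log(p_aba + eps), axis=1))
```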
Visit Loss
(Diagram: an ambiguous unlabeled sample lying between Label X and Label Y)
We would also like to make effective use of ambiguous unlabeled samples like
this one. Rather than associating only the easily distinguishable samples,
"visiting" all of the given samples generalizes the embedding better.
Visit Loss
Make the transition probabilities from Ai to all of B uniformly distributed
→ the transition probability rises not only for clear-cut data but also for
ambiguous data.
(※ H: cross entropy)
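A matching sketch of the visit loss: the column mean of p_ab is pushed toward a uniform distribution over the unlabeled batch B (again, names are illustrative):

```python
import numpy as np

def visit_loss(p_ab, eps=1e-8):
    # Probability that any labeled sample "visits" B_j, averaged over A.
    p_visit = p_ab.mean(axis=0)
    # Cross entropy against a uniform target over all unlabeled samples.
    uniform = np.full_like(p_visit, 1.0 / p_visit.shape[0])
    return -np.sum(uniform * np.log(p_visit + eps))
```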
Loss Function
• The classification term is the softmax cross entropy commonly used in
supervised learning.
• In practice, the visit loss has a strong regularizing effect, so weighting
it yields better results (discussed later).
Total Loss Function
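Putting the terms together, a schematic total loss; only the visit-loss weight is exposed, matching the weighting discussed on the slides (unit weights on the other terms are an assumption):

```python
def total_loss(l_walker, l_visit, l_classification, visit_weight=1.0):
    # Classification term: softmax cross entropy on the labeled batch A.
    return l_walker + visit_weight * l_visit + l_classification
```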
Experiment
Experiment
- Evaluation items
‣ Does the proposed method improve performance?
‣ Does it associate the unlabeled data well?
‣ Application to domain adaptation (SVHN → MNIST)
- Dataset
‣ MNIST: (labeled: 100 or 1000 or All,
unlabeled: the data not used as labeled)
‣ STL-10: (labeled: 5k, unlabeled: 100k)
‣ SVHN: (labeled: 0 or 1000 or 2000 or All,
unlabeled: the data not used as labeled)
✓ Only part of the training data is used as labeled; the rest is used as unlabeled data for training.
Setting
• Batch size: 100 for both the labeled batch A (10
samples per class) and the unlabeled batch B
• Optimizer: Adam
• Regularization: L2 norm (weight: 10⁻⁴)
CNN model for MNIST
C(32,3) → C(32,3) → P(2)
→ C(64,3) → C(64,3) → P(2)
→ C(128,3) → C(128,3) → P(2) → FC(128) → FC(10)
C(n,k): convolutional layer with n kernels of
size k×k and stride 1
P(k): a max-pooling layer with window size k×k
and stride 1
Activation function for all layers except the output layer: ELU
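As a rough illustration only, a hypothetical tf.keras sketch of this architecture. 'same' padding, a pooling stride of 2 (the slide says stride 1; stride 2 is the conventional downsampling choice), and returning both the FC(128) embedding and the FC(10) logits are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv(x, n):
    # C(n,3): 3x3 convolution, stride 1, ELU activation.
    return layers.Conv2D(n, 3, padding="same", activation="elu")(x)

def build_mnist_model():
    inp = tf.keras.Input(shape=(28, 28, 1))
    x = conv(conv(inp, 32), 32)
    x = layers.MaxPooling2D(2)(x)                  # P(2)
    x = conv(conv(x, 64), 64)
    x = layers.MaxPooling2D(2)(x)
    x = conv(conv(x, 128), 128)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    emb = layers.Dense(128, activation="elu")(x)   # FC(128): embedding for association
    logits = layers.Dense(10)(emb)                 # FC(10): class logits
    return tf.keras.Model(inp, [emb, logits])
```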
MNIST Result
Change in transition probabilities
(MNIST)
Before training
After training
MNIST error analysis
The authors argue that the errors occur because the test set contains
features absent from the labeled data (e.g. a 4 whose top is closed so
that it resembles a 9).
All labeled data
Misclassified test samples
Confusion matrix of
the test data
CNN model for STL-10
Training uses 100 randomly chosen samples per
class from the labeled and unlabeled training sets
Data augmentation: random cropping, brightness,
saturation and hue changes, and small rotations
C(32,3) → C(64,3, s=2) → P(3)
→ C(64,3) → C(128,3) → P(2)
→ C(128,3) → C(256,3) → P(2) → FC(128) → FC(10)
STL-10
Even when data from classes not present in the training set is given as
input, it is associated with relatively similar unlabeled data.
CNN model for SVHN
Data augmentation: random affine transformations,
Gaussian blurring
C(32,3) → C(32,3) → C(32,3) → P(2)
→ C(64,3) → C(64,3) → C(64,3) → P(2)
→ C(128,3) → C(128,3) → C(128,3) → P(2) → FC(128)
SVHN Result
Even with few labeled samples, accuracy is higher than in prior work.
SVHN
Effect of unlabeled data
Accuracy improves as the amount of unlabeled data increases.
Fully supervised
Minimum # of
labeled samples
Effect of the Visit Loss
Labeled data size: 1000
If the visit loss weight is too large, the model is regularized too strongly
and training becomes difficult. The weight should be tuned according to the
variance of the data (e.g. when the labeled and unlabeled data are not
similar, a smaller visit loss weight is appropriate).
Domain Adaptation
Method 1.
fine-tuning a network on the target domain
after training it on the source domain
Method 2.
designing a network with multiple outputs
for the respective domains
(“Learning without forgetting”)
Domain Adaptation
Key points in this paper
1. Train a network on the source domain
2. Only swap the unlabeled data set to the target domain and
train again
* No labels from the target domain are used
• Example
- Network trained on SVHN
- Train with labeled samples from SVHN (source) and unlabeled
samples from MNIST (target) → 18.56% error
- Train the network with both data sources with 0.5 as the weight for
the visit loss → 0.51% error
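A schematic sketch of the batch construction for this setup, under the assumption that the only change from ordinary training is where the unlabeled batch B is sampled from (array and function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_batches(x_src, y_src, x_tgt, n_labeled=100, n_unlabeled=100):
    # Labeled batch A still comes from the source domain (e.g. SVHN);
    # only the unlabeled batch B is swapped to the target domain (e.g. MNIST).
    idx_a = rng.choice(len(x_src), size=n_labeled, replace=False)
    idx_b = rng.choice(len(x_tgt), size=n_unlabeled, replace=False)
    return (x_src[idx_a], y_src[idx_a]), x_tgt[idx_b]
```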
Domain Adaptation
DA: Domain-Adversarial Training of Neural Networks [Ganin et al., 2016]
DS: Domain Separation Networks [Bousmalis et al., 2016]
(Result table: supervised learning vs. domain adaptation error rates)
Summary
• Trains so that unlabeled data and labeled data become associated with each other
• Learns relatively well even when labeled data is scarce
• The visit loss weight should be set with the variance of the data in mind
• Also works well when applied to domain adaptation