6. Self-Supervised Learning
How: Paradigm Overview

Generative / Predictive
• Loss is measured in the output space
• Learning to reconstruct/predict: better reconstruction/prediction leads to a better representation
• E.g., Auto-Encoders, BERT

Contrastive
• Loss is measured in the representation space
• Learning to distinguish: no need to reconstruct every detail, just focus on distinguishing samples

[Figure: generative pipeline (original → representation → generated) vs. contrastive pipeline (anchor vs. positive/negative → similar or not)]
7. Self-Supervised Learning
How: Paradigm Overview

Generative / Predictive
• Loss is measured in the output space
• Learning to reconstruct/predict: better reconstruction/prediction leads to a better representation
• E.g., Auto-Encoders, BERT

[Figure] Top: a drawing of a dollar bill from memory. Bottom: a drawing subsequently made with a dollar bill present. [Image source: Epstein, 2016]
9. Contrastive Learning
Point: distinguish features among different instances

• Anchor x, positive x⁺, and negatives x_j (the other samples)
• The anchor should be similar to the positive and dissimilar to the negatives
• Trained with the InfoNCE loss [Gutmann+, AISTATS'10]; one common form is given below
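A standard way to write the InfoNCE objective for an anchor x with positive x⁺ and negatives x_j (a common formulation, not copied from the slide; sim(·,·) is a similarity such as cosine or dot product, and τ is a temperature):

```latex
\mathcal{L}_{\text{InfoNCE}}
= -\log
\frac{\exp\big(\mathrm{sim}(x, x^{+})/\tau\big)}
     {\exp\big(\mathrm{sim}(x, x^{+})/\tau\big) + \sum_{j}\exp\big(\mathrm{sim}(x, x_{j})/\tau\big)}
```

Minimizing this pulls the anchor toward its positive and pushes it away from the negatives in the representation space.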
10. Contrastive Learning
Recent related works:
• MoCo: Momentum Contrast for Unsupervised Visual Representation Learning [CVPR'20] (increases the negatives)
• SimCLR: A Simple Framework for Contrastive Learning of Visual Representations [ICML'20] (increases the positives)
12. MoCo [CVPR'20]
End-to-end approach (sketched below):
• Negatives: all the other samples in the batch (excluding the anchor and its positive)
• Two encoders: q encodes the anchor (query); k encodes the positives/negatives (keys)
• Benefits from a large batch size
• Problem: memory grows with the batch size
He, Kaiming, et al. "Momentum Contrast for Unsupervised Visual Representation Learning." CVPR 2020.
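A minimal sketch of the end-to-end setup, assuming normalized embeddings and in-batch negatives (illustrative PyTorch, not the paper's code; names such as end_to_end_loss are ours):

```python
import torch
import torch.nn.functional as F

def end_to_end_loss(q, k, tau=0.07):
    """q: anchor embeddings (N, D); k: positive embeddings (N, D).
    For anchor i, k[i] is its positive; k[j], j != i, act as negatives."""
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    logits = q @ k.t() / tau                           # (N, N) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)
```

Because the negatives are simply the other samples in the batch, getting more negatives requires a larger batch, which is exactly the memory problem the slide points out.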
13. MoCo [CVPR'20]
Memory bank (MB) approach (sketched below):
• Negatives: embeddings stored in the MB, drawn by random sampling
• The memory bank has to be kept updated
• Problem: computing cost
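A hedged sketch of the memory-bank variant: one embedding per dataset sample lives in the bank and negatives are drawn by random sampling, trading the large batches of the end-to-end setup for the cost of keeping the bank updated (class and method names are ours):

```python
import torch
import torch.nn.functional as F

class MemoryBank:
    def __init__(self, num_samples, dim):
        # one stored (normalized) embedding per dataset sample
        self.bank = F.normalize(torch.randn(num_samples, dim), dim=1)

    def sample_negatives(self, num_negatives):
        # random sampling from the bank, as on the slide
        idx = torch.randint(0, self.bank.size(0), (num_negatives,))
        return self.bank[idx]

    def update(self, indices, embeddings, momentum=0.5):
        # mix fresh embeddings into the stored ones, then re-normalize
        mixed = momentum * self.bank[indices] + (1 - momentum) * embeddings
        self.bank[indices] = F.normalize(mixed, dim=1)
```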
14. MoCo [CVPR'20]
Momentum encoder approach (sketched below):
• The momentum encoder processes only the positive sample
• Negatives: past embeddings of positives, saved in a queue of embedded features
• Updating: the momentum encoder's weights are a moving average of the query encoder's
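A simplified sketch in the spirit of the MoCo paper's pseudocode: keys come from a momentum encoder and are kept in a fixed-size queue, so the number of negatives no longer depends on the batch size (the queue update itself is reduced to a comment):

```python
import torch
import torch.nn.functional as F

M = 0.999  # momentum coefficient for the key encoder

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=M):
    # key-encoder weights follow the query encoder as a slow moving average
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)

def moco_loss(q, k, queue, tau=0.07):
    """q: (N, D) query embeddings; k: (N, D) keys from the momentum encoder;
    queue: (K, D) past keys acting as negatives."""
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    l_pos = (q * k).sum(dim=1, keepdim=True)  # (N, 1) anchor-positive scores
    l_neg = q @ queue.t()                     # (N, K) anchor-negative scores
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    # after each training step, k is enqueued and the oldest keys are dequeued
    return F.cross_entropy(logits, labels)    # the positive is class 0
```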
15. SimCLR [ICML'20]
Points:
• Positive samples from data augmentation: random crops + color distortion (sketched below)
• Negative samples from a larger batch size (end-to-end)
Chen, Ting, et al. "A Simple Framework for Contrastive Learning of Visual Representations." ICML 2020.
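A sketch of SimCLR-style positive-pair creation with torchvision transforms: two independent augmentations of the same image (random crop + color distortion, per the slide) form a positive pair. The exact magnitudes below are illustrative, and the paper's full pipeline also includes Gaussian blur:

```python
import torchvision.transforms as T

simclr_augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.RandomApply([T.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    T.RandomGrayscale(p=0.2),
    T.ToTensor(),
])

# view_1, view_2 = simclr_augment(img), simclr_augment(img)  # one positive pair;
# every other sample in the (large) batch provides the negatives
```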
16. Effect of Recent Works
• Performance approaching supervised methods
• Less labeling cost
• High generalization (e.g., cross-domain)
• Limitation: more training time & parameters
Source: SimCLR [Chen+, ICML'20]
17. Summary of Recent Works
• Positive samples:
  • Multi-sampling methods: transformations, crops
• Negative samples:
  • Larger batch size
  • Saving previous features: memory bank, queue
• In short: intra-positives are the same instance (same class); inter-negatives are different instances
18. Inter-intra Contrastive Framework [MM'20]
Traditional contrastive learning:
• Intra-positives: same instance, same class
• Inter-negatives: different instances
Inter-intra contrastive (IIC) learning framework, which makes the most use of the data:
• Intra-positives: same instance, same class
• Intra-negatives: same instance, different class
• Inter-positives: different instances, same class
• Inter-negatives: different instances
IIC: L. Tao, X. Wang, and T. Yamasaki. "Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework." ACM MM 2020.
19. Proposed Concept (video task)
Constraints of our inter-intra contrastive learning method:
• Intra-positives: multi-view (CMC-based)
  • Optical flow
  • Frame difference (residual frames)
• Inter-negatives: different instances
• Intra-negatives: the same instance with its temporal information destroyed
CMC: Tian, Yonglong, et al. "Contrastive Multiview Coding." arXiv:1906.05849, 2019.
20. Generation of Intra-negative Samples
• Break temporal relations while keeping similar statistical information
• Two options (see the sketch below):
  • Frame repeating
  • Frame shuffling
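A minimal sketch of the two options above (our function names; clips assumed to be (T, C, H, W) tensors): both destroy the temporal order while leaving per-frame statistics largely intact.

```python
import torch

def frame_repeating(clip):
    """Repeat one randomly chosen frame across the whole clip."""
    t = torch.randint(0, clip.size(0), (1,)).item()
    return clip[t:t + 1].expand_as(clip).clone()

def frame_shuffling(clip):
    """Randomly permute the temporal order of the frames."""
    return clip[torch.randperm(clip.size(0))]
```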
23. Results: Video Recognition
• Self-supervised pretraining on UCF101 split 1
• Fine-tuned on:
  • UCF101 (3 splits)
  • HMDB51 (3 splits)
* indicates results using the same network backbone, R3D.
24. Summary of IIC
• Introduced intra-negative samples to encourage models to learn rich temporal information
• Achieved significant improvements over state-of-the-art methods on two video tasks
[Links: project page, GitHub repo]
25. Directions of Contrastive Self-supervised Learning
• Pretext task selection/design
• Pair sampling: finding and using effective positive/negative samples
• Combining with supervised learning:
  • Supervised contrastive learning
  • Task-related self-supervised learning