Project and paper summary of self-supervised learning [continuously updated; repository created 2022.8.12]
-
A Simple Framework for Contrastive Learning of Visual Representations
-
Pdf: /simclr/A Simple Framework for Contrastive Learning of Visual Representations.pdf
-
Abstract:
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100× fewer labels.
-
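The contrastive loss mentioned in the abstract can be sketched in plain Python. This is a minimal illustration of a normalized, temperature-scaled cross-entropy over a batch of paired views, not the authors' implementation: the function names, the temperature value of 0.5, and the toy list-of-lists inputs are all assumptions made for the example.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (guarding against a zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive loss over N image pairs (illustrative sketch).

    z1[i] and z2[i] are the projected embeddings of two augmented
    views of image i; every other embedding in the batch is a negative.
    """
    n = len(z1)
    z = [l2_normalize(v) for v in z1 + z2]  # 2N unit vectors
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # the other view of the same image
        sims = [sum(a * b for a, b in zip(z[i], z[k])) / temperature
                for k in range(2 * n)]
        # The denominator covers all embeddings except the anchor itself.
        logits = [s for k, s in enumerate(sims) if k != i]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_denominator - sims[positive]
    return total / (2 * n)

# Usage: matching views give a lower loss than mismatched views.
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[0.0, 1.0], [1.0, 0.0]]
print(nt_xent_loss(views_a, views_a) < nt_xent_loss(views_a, views_b))
```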
Related work: based on SimCLR
-
Pdf: /simclr/
-
[1] Speech SimCLR: Combining contrastive and reconstruction objective for self-supervised speech representation learning (https://arxiv.org/abs/2010.13991)
-
[2] Improved baselines with momentum contrastive learning (https://arxiv.org/abs/2003.04297)
-
[3] Semi-Supervising Learning, Transfer Learning, and Knowledge Distillation with SimCLR (https://arxiv.org/abs/2108.00587)
-
Bootstrap Your Own Latent-a New Approach to Self-supervised Learning
-
Pdf: /simclr/Bootstrap Your Own Latent-a New Approach to Self-supervised Learning.pdf
-
Abstract:
We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the-art methods intrinsically rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches 74.3% top-1 classification accuracy on ImageNet using the standard linear evaluation protocol with a standard ResNet-50 architecture and 79.6% with a larger ResNet. We also show that BYOL performs on par or better than the current state of the art on both transfer and semi-supervised benchmarks.
-
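The two mechanisms the abstract describes, a slow-moving average for the target network and a prediction objective for the online network, can be sketched as follows. This is a toy illustration under stated assumptions: parameters are flat lists of floats, the decay of 0.99 is an arbitrary example value, and the function names are hypothetical rather than taken from any BYOL codebase.

```python
import math

def ema_update(target_params, online_params, decay=0.99):
    # Slow-moving average: the target drifts toward the online weights.
    return [decay * t + (1.0 - decay) * o
            for t, o in zip(target_params, online_params)]

def byol_regression_loss(online_prediction, target_projection):
    """Squared error between L2-normalized vectors, i.e. 2 - 2 * cosine.

    In training, the target projection is treated as a constant:
    gradients flow only through the online network.
    """
    dot = sum(p * t for p, t in zip(online_prediction, target_projection))
    norm_p = math.sqrt(sum(p * p for p in online_prediction))
    norm_t = math.sqrt(sum(t * t for t in target_projection))
    return 2.0 - 2.0 * dot / (norm_p * norm_t)

# Usage: one target update step and two example loss values.
print(ema_update([0.0, 1.0], [1.0, 1.0], decay=0.9))
print(byol_regression_loss([1.0, 0.0], [2.0, 0.0]))  # same direction
print(byol_regression_loss([1.0, 0.0], [0.0, 3.0]))  # orthogonal
```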
Related work: based on BYOL
-
Pdf: /byol/
-
[1] BYOL for audio: Self-supervised learning for general-purpose audio representation (https://arxiv.org/abs/2103.06695)
-
[2] Run away from your teacher: Understanding BYOL by a novel self-supervised approach (https://arxiv.org/abs/2011.10944)
-
Barlow Twins: Self-Supervised Learning via Redundancy Reduction
-
Paper: [http://proceedings.mlr.press/v139/zbontar21a.html](http://proceedings.mlr.press/v139/zbontar21a.html)
-
Pdf: /simclr/Barlow Twins_Self-Supervised Learning via Redundancy Reduction.pdf
-
Code: [https://github.com/facebookresearch/barlowtwins](https://github.com/facebookresearch/barlowtwins)
-
Abstract:
Self-supervised learning (SSL) is rapidly closing the gap with supervised methods on large computer vision benchmarks. A successful approach to SSL is to learn embeddings which are invariant to distortions of the input sample. However, a recurring issue with this approach is the existence of trivial constant solutions. Most current methods avoid such solutions by careful implementation details. We propose an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. This causes the embedding vectors of distorted versions of a sample to be similar, while minimizing the redundancy between the components of these vectors. The method is called Barlow Twins, owing to neuroscientist H. Barlow's redundancy-reduction principle applied to a pair of identical networks. Barlow Twins does not require large batches nor asymmetry between the network twins such as a predictor network, gradient stopping, or a moving average on the weight updates. Intriguingly it benefits from very high-dimensional output vectors. Barlow Twins outperforms previous methods on ImageNet for semi-supervised classification in the low-data regime, and is on par with current state of the art for ImageNet classification with a linear classifier head, and for transfer tasks of classification and object detection.
-
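The objective described in the abstract pushes the cross-correlation matrix of the two networks' outputs toward the identity. A minimal plain-Python sketch is given below; the helper names, the off-diagonal weight of 0.005, and the tiny hand-made batches are assumptions for illustration only, and a real implementation would use a tensor library.

```python
import math

def standardize_columns(batch):
    # Zero mean and unit variance for each embedding dimension over the batch.
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0
        for i in range(n):
            out[i][j] = (col[i] - mean) / std
    return out

def cross_correlation(za, zb):
    # d x d matrix: entry (i, j) correlates dimension i of za with j of zb.
    a, b = standardize_columns(za), standardize_columns(zb)
    n, d = len(a), len(a[0])
    return [[sum(a[k][i] * b[k][j] for k in range(n)) / n
             for j in range(d)] for i in range(d)]

def barlow_twins_loss(za, zb, off_diagonal_weight=0.005):
    c = cross_correlation(za, zb)
    d = len(c)
    # Invariance term: diagonal entries should equal 1.
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    # Redundancy-reduction term: off-diagonal entries should equal 0.
    off_diag = sum(c[i][j] ** 2
                   for i in range(d) for j in range(d) if i != j)
    return on_diag + off_diagonal_weight * off_diag

# Usage: decorrelated dimensions score better than duplicated ones.
decorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
redundant = [[1.0, 1.0], [1.0, 1.0], [-1.0, -1.0], [-1.0, -1.0]]
print(barlow_twins_loss(decorrelated, decorrelated))
print(barlow_twins_loss(redundant, redundant))
```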
Related work: based on Barlow Twins
-
Pdf: /barlow twins/
-
[1] A note on connecting Barlow Twins with negative-sample-free contrastive learning ([https://arxiv.org/abs/2104.13712](https://arxiv.org/abs/2104.13712))
-
[2] Graph Barlow Twins: A self-supervised representation learning framework for graphs ([https://arxiv.org/abs/2106.02466](https://arxiv.org/abs/2106.02466))
-
Exploring Simple Siamese Representation Learning
-
Pdf: /simclr/Exploring Simple Siamese Representation Learning.pdf
-
Abstract:
Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for avoiding collapsing solutions. In this paper, we report surprising empirical results that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing. We provide a hypothesis on the implication of stop-gradient, and further show proof-of-concept experiments verifying it. Our "SimSiam" method achieves competitive results on ImageNet and downstream tasks. We hope this simple baseline will motivate people to rethink the roles of Siamese architectures for unsupervised representation learning. Code is made available. (https://github.com/facebookresearch/simsiam)
-
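The role of stop-gradient can be illustrated without an autograd framework by writing the gradient out by hand. In the sketch below, the loss is a symmetrized negative cosine similarity between a prediction `p` and a target `z`, and "stop-gradient" means the gradient is taken with respect to `p` only, with `z` held constant. All function names are hypothetical, and the vectors stand in for network outputs; in a framework such as PyTorch the same effect is usually obtained by detaching the target tensor.

```python
import math

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def negative_cosine(p, z):
    # D(p, z) = -cos(p, z); lower is better.
    return -sum(a * b for a, b in zip(p, z)) / (_norm(p) * _norm(z))

def simsiam_loss(p1, z1, p2, z2):
    # Symmetrized loss: each prediction is matched to the other view's
    # target. z1 and z2 are treated as constants (stop-gradient).
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)

def grad_wrt_prediction(p, z):
    """Gradient of negative_cosine(p, z) with respect to p only.

    Holding z fixed is what stop-gradient amounts to: no gradient
    is propagated into the branch that produced z.
    """
    norm_p, norm_z = _norm(p), _norm(z)
    cos = sum(a * b for a, b in zip(p, z)) / (norm_p * norm_z)
    return [-(zi / (norm_p * norm_z) - cos * pi / norm_p ** 2)
            for pi, zi in zip(p, z)]

# Usage: loss value and the gradient that reaches the prediction branch.
p, z = [1.0, 2.0], [3.0, -1.0]
print(simsiam_loss(p, z, p, z))
print(grad_wrt_prediction(p, z))
```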
Related work: based on SimSiam
-
Pdf: /simsam/
-
[1] Contrastive learning meets transfer learning: A case study in medical image analysis (https://arxiv.org/abs/2103.03166)
-
[2] How does SimSiam avoid collapse without negative samples? A unified understanding with self-supervised contrastive learning (https://arxiv.org/abs/2203.16262)
-
[3] SimTriplet: Simple triplet representation learning with a single GPU (https://arxiv.org/abs/2103.05585)