POA: Pre-training Once for Models of All Sizes #13

Dongwoo-Im opened this issue Nov 10, 2024

arXiv: https://arxiv.org/abs/2408.01031
GitHub: https://github.com/alipay/POA


Motivation: the weight-sharing strategy from NAS (neural architecture search)

Similar studies

  • Cosub: Co-training 2L Submodels for Visual Recognition (CVPR 2023)
    • supervised learning
  • Weighted Ensemble Self-Supervised Learning (ICLR 2023)
    • each head contains an identical number of prototypes
    • and averages the cross-entropy losses, weighted by the predictive entropy of each head


Cross-view distillation: the teacher's output on one view supervises both the intact student and the elastic student on the other view.
Same-view distillation: the intact student's output supervises the elastic student on the same view.
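
A minimal sketch of how these two terms could be combined, assuming DINO-style softmax targets and temperatures; the tensors and names below (`t_b`, `s_int_a`, `s_ela_a`) are illustrative placeholders, not the released code.

```python
import torch
import torch.nn.functional as F

def distill(student_logits, teacher_logits, temp_s=0.1, temp_t=0.04):
    # Soft cross-entropy of the student against a detached target distribution
    # (temperatures follow common DINO defaults; the paper's values may differ).
    target = F.softmax(teacher_logits.detach() / temp_t, dim=-1)
    return -(target * F.log_softmax(student_logits / temp_s, dim=-1)).sum(-1).mean()

# Dummy head outputs for two augmented views (batch of 4, 1024 prototypes).
t_b     = torch.randn(4, 1024)  # teacher on view B
s_int_a = torch.randn(4, 1024)  # intact student on view A
s_ela_a = torch.randn(4, 1024)  # elastic student on view A

# Cross-view: the teacher's output on the other view supervises both students.
loss_cross = distill(s_int_a, t_b) + distill(s_ela_a, t_b)
# Same-view: the intact student's output supervises the elastic student on the same view.
loss_same = distill(s_ela_a, s_int_a)
loss = loss_cross + loss_same
```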



Depth = the number of blocks
Width = the number of channels

=> combining the elastic widths and depths (plus the intact ones), a total of (N+1) x (M+1) distinct sub-networks can be generated
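
Just to make the counting concrete, a tiny sketch with made-up elastic options (the actual per-architecture choices are listed in the Model section below):

```python
from itertools import product

# Hypothetical elastic options for a ViT-L/16 teacher (24 blocks, 1024 channels).
intact_depth, intact_width = 24, 1024
elastic_depths = [12, 16, 20]     # N = 3 reduced depths
elastic_widths = [384, 512, 768]  # M = 3 reduced widths

depths = elastic_depths + [intact_depth]  # N + 1 depth choices
widths = elastic_widths + [intact_width]  # M + 1 width choices

subnetworks = list(product(depths, widths))
print(len(subnetworks))  # (N + 1) x (M + 1) = 4 x 4 = 16 distinct sub-networks
```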

For training, multiple projection heads (MPH) are used.
For each head, the distillation loss L_Si is calculated for both the intact and the elastic student.
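
A rough sketch of the MPH idea, assuming K independent linear heads and a per-head soft cross-entropy standing in for L_Si; the class and function names are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiProjectionHead(nn.Module):
    # K independent projection heads on top of a shared backbone feature (illustrative).
    def __init__(self, in_dim=768, out_dim=1024, num_heads=4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(in_dim, out_dim) for _ in range(num_heads))

    def forward(self, feat):
        return [head(feat) for head in self.heads]  # one set of logits per head

def mph_loss(student_logits_per_head, target_logits_per_head, temp=0.1):
    # One distillation loss per head (L_Si), then averaged over heads.
    losses = []
    for logits, target in zip(student_logits_per_head, target_logits_per_head):
        p_target = F.softmax(target.detach() / temp, dim=-1)
        losses.append(-(p_target * F.log_softmax(logits / temp, dim=-1)).sum(-1).mean())
    return torch.stack(losses).mean()

# Usage with dummy features: the same loss is computed once for the intact student
# and once for the elastic student, each against the teacher's per-head outputs.
heads_s, heads_t = MultiProjectionHead(), MultiProjectionHead()
feat_student, feat_teacher = torch.randn(4, 768), torch.randn(4, 768)
loss = mph_loss(heads_s(feat_student), heads_t(feat_teacher))
```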


Model

  • ViT (11 x 13)
  • Swin (3 x 13)
  • ResNet (3 x 155)
    • Probabilistic Sampling for Elastic Student
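
A small sketch of what per-iteration sampling of one elastic configuration could look like; the weighting below (favouring smaller sub-networks) is purely illustrative and not the paper's actual sampling distribution.

```python
import random

# Hypothetical candidate (depth, width) pairs for the elastic student.
candidates = [(d, w) for d in (12, 16, 20, 24) for w in (384, 512, 768, 1024)]

# Illustrative weights: smaller sub-networks are drawn more often here;
# the paper's probabilistic sampling may use a different distribution.
weights = [1.0 / (d * w) for d, w in candidates]

def sample_elastic_config():
    # Draw exactly one elastic configuration for the current training iteration.
    return random.choices(candidates, weights=weights, k=1)[0]

depth, width = sample_elastic_config()
print(f"elastic student for this iteration: depth={depth}, width={width}")
```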


We derive the sub-networks ViT-S/16 and ViT-B/16 from the teacher ViT-L/16 without any additional pre-training.
-> In effect this amounts to distilling the Large model, but the important point seems to be that this training scheme still gives stable performance for the Small and Base models.
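
Conceptually, extracting ViT-S/B amounts to keeping the first d blocks and the leading w channels of each weight tensor in the pre-trained teacher; the snippet below is only a schematic of that slicing (real layers such as QKV and MLP projections need per-layer rules), not the released extraction script.

```python
import torch

TEACHER_EMBED_DIM = 1024  # ViT-L/16 channel width

def extract_subnet(teacher_state: dict, depth: int, width: int) -> dict:
    # Schematic: keep the first `depth` blocks and the leading `width` channels
    # of every dimension that matches the teacher's embedding size.
    sub = {}
    for name, tensor in teacher_state.items():
        if name.startswith("blocks."):
            block_idx = int(name.split(".")[1])
            if block_idx >= depth:
                continue  # drop blocks beyond the target depth
        sliced = tensor
        for dim, size in enumerate(tensor.shape):
            if size == TEACHER_EMBED_DIM:
                sliced = sliced.narrow(dim, 0, width)
        sub[name] = sliced.clone()
    return sub

# e.g. a ViT-S-like sub-network: 12 blocks, 384 channels
# vit_s_state = extract_subnet(teacher.state_dict(), depth=12, width=384)
```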


Detection transfer results are also good.


How Does Elastic Student Facilitate Pre-training?

  • It acts as a training regularizer that stabilizes the training process.
  • Unlike existing self-distillation methods, the teacher in POA SSL integrates a series of sub-networks through an EMA update (see the sketch after this list).
    -> Digging into the code, several SSL recipes are used together: multi-crop, iBOT-style masked patch token prediction, DINO, and so on.
    -> It seems that only one elastic student is sampled from the candidates at each iteration (updating every elastic student would likely be inefficient, even if the sub-networks are small).
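
A minimal sketch of the EMA step, assuming the teacher is updated from the intact student's weights (which, through weight sharing, also contain every elastic sub-network); module names are illustrative.

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, intact_student: torch.nn.Module, momentum: float = 0.996):
    # teacher <- m * teacher + (1 - m) * student, parameter by parameter.
    # Because the elastic students share weights with the intact student, the
    # sampled sub-network's updates are folded into the teacher through this step.
    for p_t, p_s in zip(teacher.parameters(), intact_student.parameters()):
        p_t.mul_(momentum).add_(p_s.detach(), alpha=1.0 - momentum)
```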

