
Low Linear-Prob accuracy #65

Open
launchauto opened this issue Dec 16, 2021 · 6 comments


launchauto commented Dec 16, 2021

Dear author,
I have reproduced your code using 64 V100 GPUs. Every setting is the same as in the paper (batch size 4096). The end-to-end fine-tuning accuracy is almost the same as in the paper; however, the linear probing accuracy is lower than the paper reports. All of the experiments use normalized targets.

| Architecture | Epochs | End-to-end fine-tuning | Linear probing |
| --- | --- | --- | --- |
| MAE-ViT-Base | 1600 | top-1 83.186, top-5 93.486 | top-1 53.64, top-5 77.32 |
| MAE-ViT-Large | 800 | top-1 85.320, top-5 97.296 | top-1 67.45, top-5 87.13 |

**According to the paper, MAE-ViT-Large linear probing top-1 should be 73.9.**

By the way, I replaced the 1D sin-cos position embedding with the MoCo v3 2D position embedding, which may help (MAE ViT-Base: +0.3% in both end-to-end fine-tuning and linear probing).
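For reference, a minimal sketch of a MoCo v3-style fixed 2D sin-cos position embedding; the function name and defaults below are illustrative, not the official API:

```python
import torch

def build_2d_sincos_pos_embed(h, w, embed_dim, temperature=10000.0):
    # embed_dim must be divisible by 4: one sin/cos pair per spatial axis
    assert embed_dim % 4 == 0
    grid_w, grid_h = torch.meshgrid(
        torch.arange(w, dtype=torch.float32),
        torch.arange(h, dtype=torch.float32),
        indexing="ij",
    )
    pos_dim = embed_dim // 4
    omega = 1.0 / temperature ** (torch.arange(pos_dim, dtype=torch.float32) / pos_dim)
    out_w = grid_w.flatten()[:, None] * omega[None, :]
    out_h = grid_h.flatten()[:, None] * omega[None, :]
    # [h*w, embed_dim]: sin/cos of each axis concatenated along channels
    return torch.cat(
        [torch.sin(out_w), torch.cos(out_w), torch.sin(out_h), torch.cos(out_h)],
        dim=1,
    )
```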

I also tested your released 400-epoch MAE-ViT-Base model; its linear probing top-1 accuracy is 50.91.

Did I miss any details mentioned in the paper?

For the linear probing hyperparameters, I followed the settings in the appendix of the paper:
- optimizer LARS, lr=6.4, batch size=16384, weight_decay=0, momentum=0.9, cosine decay
- warmup epochs=10, total training epochs=90, only random resized crop as data augmentation
- replaced the last LayerNorm with BatchNorm (affine=False) before the classifier
- froze the backbone during linear probing, updating only the mean pooling + norm + fc in the classifier head (see the sketch after this list)
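For concreteness, a minimal PyTorch sketch of the head described above (the class name is illustrative; embed_dim=768 assumes ViT-Base):

```python
import torch
import torch.nn as nn

class LinearProbeHead(nn.Module):
    """Mean-pool patch tokens, BatchNorm with affine=False in place of the
    final LayerNorm, then a linear classifier. Only this head is trained;
    the backbone stays frozen (requires_grad=False on all its parameters)."""
    def __init__(self, embed_dim=768, num_classes=1000):
        super().__init__()
        self.bn = nn.BatchNorm1d(embed_dim, affine=False, eps=1e-6)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens):      # [B, N, D] patch tokens
        x = patch_tokens.mean(dim=1)      # global average pooling
        return self.fc(self.bn(x))

head = LinearProbeHead()
logits = head(torch.randn(8, 196, 768))  # e.g. ViT-B/16 at 224x224
```

Note that lr=6.4 is consistent with a base lr of 0.1 under the linear scaling rule: 0.1 × 16384 / 256 = 6.4.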

michuanhaohao commented

I got similar results. MAE + ViT-B + 400 epochs: the linear probing top-1 accuracy is 53.01.

- optimizer AdamW, lr=0.016, batch size=4096, weight_decay=0, cosine decay
- mixup=0.0, cutmix=0.0, label smoothing=0.0
- warmup epochs=5, total training epochs=100, only random resized crop and random flip as data augmentation (optimizer sketch below)
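If it helps comparison, a minimal sketch of this optimizer setup applied to a probe head only (`head` below is a hypothetical stand-in for the trainable BN + fc; warmup is omitted for brevity). The lr of 0.016 is consistent with a base lr of 1e-3 under the linear scaling rule: 1e-3 × 4096 / 256 = 0.016.

```python
import torch
import torch.nn as nn

# stand-in probe head: BatchNorm(affine=False) + linear classifier
head = nn.Sequential(nn.BatchNorm1d(768, affine=False), nn.Linear(768, 1000))

optimizer = torch.optim.AdamW(head.parameters(), lr=0.016, weight_decay=0.0)
# cosine decay over the 100 training epochs (no warmup shown here)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
```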


leeyegy commented Dec 31, 2021

Thanks for sharing.
According to your reproduction, the 1600-epoch pretrained ViT-B reaches only 83.2 end-to-end fine-tuning accuracy, a 0.4 gap from the paper's report. However, the 400-epoch pretrained ViT-B already achieves 83.1 end-to-end fine-tuning accuracy. It seems the extra 1200 epochs of pretraining bring negligible improvement, which is quite confusing.
Do you have any ideas about it?

launchauto (Author) commented

> negligible

Sorry, no idea.

launchauto (Author) commented

> I got similar results. MAE + ViT-B + 400 epochs: the linear probing top-1 accuracy is 53.01.
>
> optimizer AdamW, lr=0.016, batch size=4096, weight_decay=0, cosine decay, mixup=0.0, cutmix=0.0, label smoothing=0.0, warmup epochs=5, total training epochs=100, only random resized crop and random flip as data augmentation

Yeah, I used your linear probing settings and got about +0.33% when testing the MAE-Large model. However, it is still much lower than expected.

mts42000 commented

I also tried to reproduce the linear probing results, with no success. Interestingly, when I used the non-normalized loss during pretraining, the linear probing accuracy for the base config increased to 60% (still much lower than the expected 68%). With the normalized loss I also got 53.9% accuracy, as you did. Were you able to reproduce the linear probing results lately?
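For clarity, the normalized/non-normalized distinction above refers to whether the reconstruction targets are per-patch normalized (MAE's `norm_pix_loss` option). A minimal sketch of the loss, with illustrative names and shapes:

```python
import torch

def mae_recon_loss(pred, target, mask, norm_pix=True):
    # pred, target: [B, N, patch_dim]; mask: [B, N], 1 = masked patch
    if norm_pix:
        # normalize each target patch by its own mean and variance
        mean = target.mean(dim=-1, keepdim=True)
        var = target.var(dim=-1, keepdim=True)
        target = (target - mean) / (var + 1e-6).sqrt()
    loss = ((pred - target) ** 2).mean(dim=-1)  # per-patch MSE
    return (loss * mask).sum() / mask.sum()     # average over masked patches
```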

ShoufaChen commented

Hi @launchauto, @michuanhaohao, @mts42000,

Thanks for your efforts in reproducing the linear probing results.

I noticed that the official MAE repo has released the linear probing code, so it should no longer be hard to reproduce.

However, I was wondering whether you found out what caused the inconsistent performance of your original reproductions. I see little difference between your configurations and the official one, yet the performance gap is very large.

Any help would be appreciated.
