
Low Linear-Prob accuracy #65

Open
launchauto opened this issue Dec 16, 2021 · 6 comments


launchauto commented Dec 16, 2021

Dear author,
I have reproduced your code using 64 V100 GPUs. Every setting is the same as in the paper (batch size 4096). The end-to-end fine-tuning accuracy is almost the same as in the paper; however, the linear probing accuracy is lower than the paper reports. All of the experiments use normalized targets.

| Architecture | Epochs | End-to-end fine-tuning | Linear probing |
| --- | --- | --- | --- |
| MAE-ViT-Base | 1600 | top-1 83.186, top-5 93.486 | top-1 53.64, top-5 77.32 |
| MAE-ViT-Large | 800 | top-1 85.320, top-5 97.296 | top-1 67.45, top-5 87.13 |

**According to the paper, MAE-ViT-Large linear probing top-1 should be 73.9.**

By the way, I replaced the 1D sin-cos position embedding with the MoCo v3 2D position embedding, which may help (MAE ViT-Base: +0.3% in both end-to-end fine-tuning and linear probing).
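For reference, a minimal sketch of a MoCo v3-style fixed 2D sin-cos position embedding; the function name and defaults below are illustrative, not the official API:

```python
import torch

def build_2d_sincos_pos_embed(h, w, embed_dim, temperature=10000.0):
    # embed_dim must be divisible by 4: one sin/cos pair per spatial axis
    assert embed_dim % 4 == 0
    grid_w, grid_h = torch.meshgrid(
        torch.arange(w, dtype=torch.float32),
        torch.arange(h, dtype=torch.float32),
        indexing="ij",
    )
    pos_dim = embed_dim // 4
    omega = 1.0 / temperature ** (torch.arange(pos_dim, dtype=torch.float32) / pos_dim)
    out_w = grid_w.flatten()[:, None] * omega[None, :]
    out_h = grid_h.flatten()[:, None] * omega[None, :]
    # [h*w, embed_dim]: sin/cos of each axis concatenated along channels
    return torch.cat(
        [torch.sin(out_w), torch.cos(out_w), torch.sin(out_h), torch.cos(out_h)],
        dim=1,
    )
```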

I also tested your released 400-epoch MAE-ViT-Base model; its linear probing top-1 accuracy is 50.91.

Did I miss any details mentioned in the paper?

For the linear probing hyperparameters, I followed the settings in the appendix of the paper:
- optimizer LARS, lr=6.4, batch size=16384, weight_decay=0, momentum=0.9, cosine decay
- warmup epochs=10, total training epochs=90, only random resized crop as data augmentation
- replaced the last LayerNorm with BatchNorm (affine=False) before the classifier
- froze the backbone during linear probing, updating only the mean pooling + norm + fc in the classifier head (see the sketch after this list)
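For concreteness, a minimal PyTorch sketch of the head described above (the class name is illustrative; embed_dim=768 assumes ViT-Base):

```python
import torch
import torch.nn as nn

class LinearProbeHead(nn.Module):
    """Mean-pool patch tokens, BatchNorm with affine=False in place of the
    final LayerNorm, then a linear classifier. Only this head is trained;
    the backbone stays frozen (requires_grad=False on all its parameters)."""
    def __init__(self, embed_dim=768, num_classes=1000):
        super().__init__()
        self.bn = nn.BatchNorm1d(embed_dim, affine=False, eps=1e-6)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens):      # [B, N, D] patch tokens
        x = patch_tokens.mean(dim=1)      # global average pooling
        return self.fc(self.bn(x))

head = LinearProbeHead()
logits = head(torch.randn(8, 196, 768))  # e.g. ViT-B/16 at 224x224
```

Note that lr=6.4 is consistent with a base lr of 0.1 under the linear scaling rule: 0.1 × 16384 / 256 = 6.4.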

michuanhaohao commented

I got similar results. MAE + ViT-B + 400 epochs: the linear probing top-1 accuracy is 53.01.

- optimizer AdamW, lr=0.016, batch size=4096, weight_decay=0, cosine decay
- mixup=0.0, cutmix=0.0, label smoothing=0.0
- warmup epochs=5, total training epochs=100, only random resized crop and random flip as data augmentation (optimizer sketch below)
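If it helps comparison, a minimal sketch of this optimizer setup applied to a probe head only (`head` below is a hypothetical stand-in for the trainable BN + fc; warmup is omitted for brevity). The lr of 0.016 is consistent with a base lr of 1e-3 under the linear scaling rule: 1e-3 × 4096 / 256 = 0.016.

```python
import torch
import torch.nn as nn

# stand-in probe head: BatchNorm(affine=False) + linear classifier
head = nn.Sequential(nn.BatchNorm1d(768, affine=False), nn.Linear(768, 1000))

optimizer = torch.optim.AdamW(head.parameters(), lr=0.016, weight_decay=0.0)
# cosine decay over the 100 training epochs (no warmup shown here)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
```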


leeyegy commented Dec 31, 2021

Thanks for sharing.
According to your reproduction, the 1600-epoch pretrained ViT-B reaches only 83.2 end-to-end fine-tuning accuracy, a 0.4 gap from the paper's report. However, the 400-epoch pretrained ViT-B already achieves 83.1 end-to-end fine-tuning accuracy. It seems the extra 1200 epochs of pretraining bring negligible improvement, which is quite confusing.
Do you have any ideas about it?

launchauto (Author) commented

> negligible

Sorry, no idea.

launchauto (Author) commented

> I got similar results. MAE + ViT-B + 400 epochs: the linear probing top-1 accuracy is 53.01.
>
> optimizer AdamW, lr=0.016, batch size=4096, weight_decay=0, cosine decay, mixup=0.0, cutmix=0.0, label smoothing=0.0, warmup epochs=5, total training epochs=100, only random resized crop and random flip as data augmentation

Yeah, I used your linear probing settings and got about +0.33% when testing the MAE-Large model. However, it is still much lower than expected.

mts42000 commented

I also tried to reproduce the linear probing results, with no success. Interestingly, when I used the non-normalized loss during pretraining, the linear probing accuracy for the base config increased to 60% (still much lower than the expected 68%). With the normalized loss I also got 53.9% accuracy, as you did. Were you able to reproduce the linear probing results lately?
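For clarity, the normalized/non-normalized distinction above refers to whether the reconstruction targets are per-patch normalized (MAE's `norm_pix_loss` option). A minimal sketch of the loss, with illustrative names and shapes:

```python
import torch

def mae_recon_loss(pred, target, mask, norm_pix=True):
    # pred, target: [B, N, patch_dim]; mask: [B, N], 1 = masked patch
    if norm_pix:
        # normalize each target patch by its own mean and variance
        mean = target.mean(dim=-1, keepdim=True)
        var = target.var(dim=-1, keepdim=True)
        target = (target - mean) / (var + 1e-6).sqrt()
    loss = ((pred - target) ** 2).mean(dim=-1)  # per-patch MSE
    return (loss * mask).sum() / mask.sum()     # average over masked patches
```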

ShoufaChen commented

Hi @launchauto, @michuanhaohao, @mts42000,

Thanks for your efforts in reproducing the linear probing results.

I noticed that the official MAE repo has released the linear probing code, so it should no longer be hard to reproduce.

However, I was wondering whether you found out what caused the inconsistent performance of your original reproductions. I see little difference between your configurations and the official one, yet the performance gap is very large.

Any help would be appreciated.
