
Cannot reproduce the results for RoBERTa on SST-2 #21

TrueNobility303 opened this issue Sep 10, 2023 · 1 comment

@TrueNobility303

Hello,

Thank you for your fantastic work. When I ran the code for RoBERTa on SST-2, I could not reproduce the reported results.

For instance, for FT (full-parameter fine-tuning) with Adam, running

TASK=SST-2 K=16 SEED=42 BS=8 LR=5e-5 MODEL=roberta-large bash finetune.sh

gives val_acc = 0.90625 and test_acc = 0.84518.

When I try smaller learning rates, the results are worse.

However, Table 16 in the paper reports an accuracy of 91.9. Is that measured on the test set or the validation set?
If it is the test accuracy, it seems difficult to reproduce.

The results for MeZO are worse: I get eval_acc < 0.8 when I run

TASK=SST-2 K=16 SEED=42 BS=64 LR=1e-5 EPS=1e-3 MODEL=roberta-large bash mezo.sh

Could the authors provide more details on the best hyperparameters? I would be very grateful.

Best regards,

@gaotianyu1350
Member

Hi,

The example script we provided is not necessarily the best hyperparameter setting; it is just a setting that usually gets good results, so it is a good starting point. To reproduce the paper's results, please follow Appendix D.3 and run the complete grid search.
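
For reference, a minimal sketch of such a sweep, reusing the same variables as the example command above; the specific LR/EPS values below are illustrative placeholders, not the actual grid from Appendix D.3:

for LR in 1e-6 1e-7; do
  for EPS in 1e-3 1e-4; do
    # Placeholder grid; substitute the ranges listed in Appendix D.3 of the paper.
    TASK=SST-2 K=16 SEED=42 BS=64 LR=$LR EPS=$EPS MODEL=roberta-large bash mezo.sh
  done
done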
