
Cannot reproduce the results for RoBERTa on SST-2 #21

TrueNobility303 opened this issue Sep 10, 2023 · 1 comment

@TrueNobility303

Hello,

Thank you for your fantastic work. When I ran the code for RoBERTa on SST-2, I could not reproduce the reported results.

For instance, for FT (full-parameter fine-tuning) with Adam, running

TASK=SST-2 K=16 SEED=42 BS=8 LR=5e-5 MODEL=roberta-large bash finetune.sh

gives val_acc = 0.90625 and test_acc = 0.84518.

When I try smaller learning rates, the results are worse.

However, Table 16 in the paper reports an accuracy of 91.9. Is that measured on the test set or the validation set?
If it is the test accuracy, it seems difficult to reproduce.

The results for MeZO are worse: I get eval_acc < 0.8 when I run

TASK=SST-2 K=16 SEED=42 BS=64 LR=1e-5 EPS=1e-3 MODEL=roberta-large bash mezo.sh

Could the authors provide more details on the best hyperparameters? I would be very grateful.

Best regards,

@gaotianyu1350
Member

Hi,

The example script we provided is not necessarily the best hyperparameter setting; it is just a setting that usually gets good results, so it is a good starting point. To reproduce the paper's results, please follow Appendix D.3 and run the complete grid search.
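
For reference, a minimal sketch of such a sweep, reusing the same variables as the example command above; the specific LR/EPS values below are illustrative placeholders, not the actual grid from Appendix D.3:

for LR in 1e-6 1e-7; do
  for EPS in 1e-3 1e-4; do
    # Placeholder grid; substitute the ranges listed in Appendix D.3 of the paper.
    TASK=SST-2 K=16 SEED=42 BS=64 LR=$LR EPS=$EPS MODEL=roberta-large bash mezo.sh
  done
done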
