
Results of Trec dataset on Roberta-large(K=512) with MeZO(LoRA) #25

Open
Yanjun-Zhao opened this issue Dec 20, 2023 · 8 comments

Comments

@Yanjun-Zhao

I ran the grid search below but couldn't reproduce the result from the paper. (I have updated the code for WD and successfully reproduced the result on SST2.)

TASK=trec K=512 SEED=42 BS=64 WD=0.1 LR=1e-4/5e-5/1e-5 EPS=1e-3 MODEL=roberta-large EXTRA_TAG=lora bash mezo.sh --apply_lora --lora_r 8 --lora_alpha 16

Here are the results I obtained; the paper reports Accuracy=95.

LR | Accuracy
1e-4 | 57.4
5e-5 | 60
1e-5 | 58.2
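
The LR=1e-4/5e-5/1e-5 shorthand in the command above stands for three separate runs. A minimal sketch of how that grid could be expanded into one command per learning rate is shown here; the helper name build_commands is hypothetical and not part of the repository, and only the variables and flags quoted in the command above are used.

```python
import shlex

# Learning rates from the grid in the command above.
LEARNING_RATES = ["1e-4", "5e-5", "1e-5"]

# Environment variables copied from the command above (LR is filled in per run).
BASE_ENV = {
    "TASK": "trec",
    "K": "512",
    "SEED": "42",
    "BS": "64",
    "WD": "0.1",
    "EPS": "1e-3",
    "MODEL": "roberta-large",
    "EXTRA_TAG": "lora",
}

# Script invocation copied from the command above.
SCRIPT_ARGS = ["bash", "mezo.sh", "--apply_lora", "--lora_r", "8", "--lora_alpha", "16"]


def build_commands(learning_rates=LEARNING_RATES):
    """Return one shell command string per learning rate (hypothetical helper)."""
    commands = []
    for lr in learning_rates:
        env = dict(BASE_ENV, LR=lr)
        prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
        commands.append(f"{prefix} {shlex.join(SCRIPT_ARGS)}")
    return commands
```

Printing each entry of build_commands() gives three lines that can be pasted into a shell one at a time, which makes it easier to report exactly which configuration produced which accuracy.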

@Yanjun-Zhao changed the title from "Results on Trec dataset cannot be reproduced on Roberta-large(K=16/512) with MeZO(LoRA)" to "Results of Trec dataset on Roberta-large(K=512) with MeZO(LoRA)" on Dec 21, 2023
@gaotianyu1350
Member

Hi,

There was a small bug in incorporating weight decay in the code and now it is fixed. Please try again!

@Yanjun-Zhao
Author

Thanks for your reply!
But I have already used the updated code, and it still failed on the Trec dataset.

@gaotianyu1350
Member

Which commit are you using? Also, by "failed", do you mean that you were not able to reproduce the result, or that there was a runtime error?

@Yanjun-Zhao
Author

I use the code with param.data = param.data - self.args.learning_rate * (projected_grad * z + self.args.weight_decay * param.data).
I can't reproduce the result with TASK=trec K=512 SEED=42 BS=64 WD=0.1 LR=1e-4/5e-5/1e-5 EPS=1e-3 MODEL=roberta-large EXTRA_TAG=lora bash mezo.sh --apply_lora --lora_r 8 --lora_alpha 16. There is no runtime error.
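
For reference, the update quoted above can be written out as a small standalone step. This is only a sketch under stated assumptions: it works on a flat NumPy vector rather than the repository's trainer, the names mezo_step and loss_fn are hypothetical, and projected_grad is assumed to be the two-point finite-difference estimate along a random direction z. It also keeps z in memory for simplicity.

```python
import numpy as np


def mezo_step(params, loss_fn, lr, eps, weight_decay, seed):
    """One zeroth-order SGD step with weight decay on a flat parameter vector.

    Hypothetical helper: mirrors the quoted update
    param = param - lr * (projected_grad * z + weight_decay * param).
    """
    rng = np.random.default_rng(seed)
    # Random perturbation direction, one entry per parameter.
    z = rng.standard_normal(params.shape)

    # Assumption: projected_grad is the two-point finite-difference estimate.
    loss_plus = loss_fn(params + eps * z)
    loss_minus = loss_fn(params - eps * z)
    projected_grad = (loss_plus - loss_minus) / (2 * eps)

    # Weight decay is added to the gradient estimate before scaling by lr.
    return params - lr * (projected_grad * z + weight_decay * params)
```

With the same seed, the result for WD=0.1 and the result for WD=0 differ by exactly lr * 0.1 * params, which is a quick way to check that the weight decay term is actually being applied.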

@gaotianyu1350
Member

Hi,

Can you post the results you get with this experiment? Also, note that our reported results are averaged over five seeds, following this paper's setting. The five seeds are 13, 21, 42, 87, and 100.
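
Since the reported numbers are averages over those five seeds, a single-seed run is not directly comparable. A minimal sketch of the aggregation follows; the helper name summarize and the dictionary input format are assumptions for illustration, not something the repository provides.

```python
from statistics import mean, stdev

# The five seeds listed in the comment above.
SEEDS = [13, 21, 42, 87, 100]


def summarize(accuracy_by_seed):
    """Return (mean, sample standard deviation) of accuracy over the five seeds.

    Hypothetical helper: accuracy_by_seed maps each seed to the accuracy
    obtained with that seed.
    """
    missing = [seed for seed in SEEDS if seed not in accuracy_by_seed]
    if missing:
        # Refuse to average over an incomplete set of runs.
        raise ValueError(f"missing results for seeds: {missing}")
    values = [accuracy_by_seed[seed] for seed in SEEDS]
    return mean(values), stdev(values)
```

Passing a dictionary with one accuracy per seed returns the mean and the spread across seeds, so both numbers can be posted alongside the per-seed results.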

@fjxmlzn

fjxmlzn commented Dec 29, 2023

@Yanjun-Zhao We were not able to reproduce the results of Roberta-large either. You mentioned that "I have update the code for WD and successfully reproduce the result on SST2". Do you mean that you used the code after this fix 552cb1b, and would you mind sharing the command and parameters you used?

Thanks a lot!

@hxixixh

hxixixh commented Apr 3, 2024

@gaotianyu1350 Hi, we have the same issue. Would you mind sharing the code and configurations for reproduction?

@gaotianyu1350
Member

@hxixixh can you post the configuration you used and the results you got?
