Results on WSC and WIC datasets cannot be reproduced on OPT-13B with MeZO #15
Comments
Hi, could you provide more details on your run, for example the results you got? BTW, in our experiments we used only one A100 for OPT-13B.
Hi, I just realized that for WSC you should use LR=1e-7 and eps=1e-3 (1e-7/1e-3). Note that for all OPT-13B MeZO experiments we do a grid search over LR in {1e-6, 1e-7}.
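For anyone wanting to repeat that grid search, here is a minimal sketch that only builds the command lines for each task and learning rate; it does not launch anything. It assumes mezo.sh reads its settings from environment variables named MODEL, TASK, LR and EPS, and that the model identifier is facebook/opt-13b. Those names are assumptions for illustration, so please check them against the script before running.

```python
from itertools import product


def mezo_commands(tasks, lrs, eps_values, model="facebook/opt-13b"):
    """Return one shell command string per (task, lr, eps) combination.

    The environment variable names (MODEL, TASK, LR, EPS) and the default
    model identifier are assumptions, not confirmed in this thread.
    """
    commands = []
    for task, lr, eps in product(tasks, lrs, eps_values):
        commands.append(
            f"MODEL={model} TASK={task} LR={lr} EPS={eps} bash mezo.sh"
        )
    return commands


# Grid from the thread: LR in {1e-6, 1e-7}, eps fixed at 1e-3.
commands = mezo_commands(["WSC", "WIC"], ["1e-6", "1e-7"], ["1e-3"])
for cmd in commands:
    print(cmd)
```

Keeping the values as strings avoids any float formatting surprises (for example 1e-06 instead of 1e-6) in the generated commands.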
FYI, here are the results I got (LR/eps):
WSC 1e-6/1e-3
WSC 1e-7/1e-3
WIC 1e-6/1e-3
WIC 1e-7/1e-3
Hi, may I ask which seed was used for WSC and WIC? I see the default seed is 0, and with it I cannot reproduce the reported WIC result of 61.1.
The seed we used for the experiments is 0. Note that different hardware may lead to slightly different results, even if the random seeds are the same.
Hello,
Thank you for your fantastic work. When I run mezo.sh for WSC and WIC on OPT-13B with MeZO, I cannot reproduce the results reported in the paper.
I ran mezo.sh on 4 x A100 with per_device_batch = 4, lr = 1e-6, eps = 1e-3.
Could you share more details about the training settings needed for reproduction?
Thanks!
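One thing worth checking when comparing a multi-GPU run against a single-GPU run is the effective batch size. The sketch below is only the arithmetic, under the assumption that the four GPUs are used data-parallel, so each holds a full replica and processes its own per-device batch. The thread does not say how the GPUs were used, and the function name and the grad_accum_steps parameter are hypothetical.

```python
def effective_batch_size(per_device_batch, num_devices, grad_accum_steps=1):
    """Samples contributing to one optimizer step under data parallelism.

    Assumes every device processes its own per-device batch; this does not
    apply if the model is instead sharded across the devices.
    """
    return per_device_batch * num_devices * grad_accum_steps


# Setting described above: 4 x A100 with per_device_batch = 4.
four_gpu = effective_batch_size(per_device_batch=4, num_devices=4)

# The same per-device batch on a single A100.
one_gpu = effective_batch_size(per_device_batch=4, num_devices=1)

print(four_gpu, one_gpu)
```

If the two numbers differ from the setting used for the paper, a learning rate tuned for one batch size may not transfer directly to the other.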