question about MeZO-adam #37

Open
zhaoaustin opened this issue Aug 19, 2024 · 1 comment

Comments

@zhaoaustin

Hi! I found the MeZO-Adam code in the medium-size folder, but it uses Adam from torch.optim. This is unlike large_models, where the authors rewrote the inner_loop. Could you please explain the difference? Thank you very much for your time and effort.

@gaotianyu1350
Member

Hi,

In the medium-sized model experiments, we found that Adam performs only comparably to, or worse than, SGD. Hence we ran it only as an ablation (on the medium-sized models) and did not implement the efficient version.
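
As a rough illustration of the setup discussed in this thread (not the repository's actual code), the sketch below forms a two-point zeroth-order gradient estimate, materializes it in full, and hands it to a separate Adam update. Everything here is an assumption made for illustration: the toy loss, the names `loss_fn`, `spsa_gradient`, `Adam`, and `train`, the hyperparameters, and the hand-written `Adam` class, which stands in for an off-the-shelf optimizer so the example runs without PyTorch.

```python
import math
import random


def loss_fn(params):
    # Toy quadratic loss with its minimum at (1, -2, 3); a stand-in for a model loss.
    target = [1.0, -2.0, 3.0]
    return sum((p - t) ** 2 for p, t in zip(params, target))


def spsa_gradient(params, eps, rng):
    # Two-point zeroth-order estimate along a random Gaussian direction z.
    z = [rng.gauss(0.0, 1.0) for _ in params]
    loss_plus = loss_fn([p + eps * zi for p, zi in zip(params, z)])
    loss_minus = loss_fn([p - eps * zi for p, zi in zip(params, z)])
    projected = (loss_plus - loss_minus) / (2.0 * eps)
    # Materialize the full gradient estimate, as a generic optimizer expects.
    return [projected * zi for zi in z]


class Adam:
    # Hypothetical hand-written Adam (illustration only); keeps two moment buffers.
    def __init__(self, n, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n  # first-moment buffer
        self.v = [0.0] * n  # second-moment buffer
        self.t = 0

    def step(self, params, grad):
        # Standard bias-corrected Adam update, applied in place to params.
        self.t += 1
        for i, g in enumerate(grad):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            m_hat = self.m[i] / (1.0 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1.0 - self.beta2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def train(steps=2000, seed=0):
    # Each step: estimate the gradient with two loss evaluations, then let Adam update.
    rng = random.Random(seed)
    params = [0.0, 0.0, 0.0]
    optimizer = Adam(len(params))
    for _ in range(steps):
        grad = spsa_gradient(params, eps=1e-3, rng=rng)
        optimizer.step(params, grad)
    return params


if __name__ == "__main__":
    final = train()
    print("final params:", final, "loss:", loss_fn(final))
```

In this sketch, the materialized gradient estimate and Adam's two moment buffers each take memory proportional to the number of parameters.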
