question about MeZO-adam #37

zhaoaustin · 2024-08-19T19:56:50Z

Hi! I found the MeZO-Adam code in the medium-size models folder, but it uses the Adam optimizer from PyTorch's torch.optim. This differs from large_models, where the authors rewrote the inner_loop. Could you please explain why? Thank you very much for your time and effort.

gaotianyu1350 · 2024-08-19T20:04:43Z

Hi,

We found in the medium-sized model experiments that Adam was only comparable to, or worse than, SGD. Hence we ran it only as an ablation (on the medium-sized models) and did not implement the efficient version.
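
For readers wondering what "using the Adam from torch.optim" can look like with a zeroth-order gradient, here is a minimal sketch. It is not the repository's code: `zo_grad_estimate`, `eps`, and the toy quadratic loss are hypothetical names chosen for illustration. The assumption is that a two-point (SPSA-style) estimate is written into `p.grad` and a stock `torch.optim.Adam` then takes the step.

```python
import torch


def zo_grad_estimate(params, loss_fn, eps=1e-3, seed=0):
    """Two-point zeroth-order estimate; stores projected_grad * z in p.grad."""

    def perturb(scale):
        # Re-seeding regenerates the same z every time, so z is never stored.
        gen = torch.Generator().manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen)
            p.data.add_(scale * eps * z)

    with torch.no_grad():
        perturb(+1.0)               # theta + eps * z
        loss_plus = loss_fn()
        perturb(-2.0)               # theta - eps * z
        loss_minus = loss_fn()
        perturb(+1.0)               # back to theta (up to float rounding)
        projected_grad = (loss_plus - loss_minus) / (2 * eps)

        # Materialize the full gradient estimate so a stock optimizer can use it.
        gen = torch.Generator().manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen)
            p.grad = projected_grad * z
    return projected_grad


# Toy problem (hypothetical): minimize ||theta||^2 without calling backward().
theta = torch.nn.Parameter(torch.tensor([3.0, -2.0]))
optimizer = torch.optim.Adam([theta], lr=0.1)


def loss_fn():
    return (theta ** 2).sum()


initial_loss = loss_fn().item()
for step in range(200):
    zo_grad_estimate([theta], loss_fn, seed=step)
    optimizer.step()        # unmodified torch.optim.Adam update
    optimizer.zero_grad()
final_loss = loss_fn().item()
```

Note that in this sketch Adam keeps two moment buffers per parameter (`exp_avg` and `exp_avg_sq`) and the gradient estimate is materialized in `p.grad`, whereas a seed-based in-place SGD update only needs the scalar `projected_grad` and the seed.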
