Hi!
Thank you for the great repo and the models! I want to pretrain the model with a new tokenizer, but since 16 A100 GPUs are hard to come by, I was wondering if you could release further checkpoints in the <500B-tokens-seen range. As shown in the OLMo paper (see figure), performance on some tasks increases roughly linearly with the number of tokens seen by the model, so even a run with far fewer steps can give some idea of how a new setup behaves.
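For context, here is a minimal sketch of how such an intermediate checkpoint could be loaded for experiments, assuming it were published on the Hugging Face Hub under a revision branch (the revision name below is a hypothetical example; actual branch names would depend on what gets released):

```python
# Sketch: load a hypothetical intermediate OLMo checkpoint from the Hub.
# The revision string "step10000-tokens42B" is an assumed example name,
# not a confirmed released checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-1B",
    revision="step10000-tokens42B",  # hypothetical intermediate-checkpoint branch
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "allenai/OLMo-1B",
    trust_remote_code=True,
)
```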
Thanks!