Hi!
Thank you for the great repo and the models! I want to pretrain the model with a new tokenizer, but since 16 A100 GPUs are hard to come by, I was wondering if you could release further checkpoints in the <500B-tokens-seen range. As shown in the OLMo paper (see figure), performance on some tasks increases roughly linearly with the number of tokens seen by the model, so even a run with far fewer steps can give some idea of how a new setup behaves.
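For context, here is a minimal sketch of how such an intermediate checkpoint could be loaded for experiments, assuming it were published on the Hugging Face Hub under a revision branch (the revision name below is a hypothetical example; actual branch names would depend on what gets released):

```python
# Sketch: load a hypothetical intermediate OLMo checkpoint from the Hub.
# The revision string "step10000-tokens42B" is an assumed example name,
# not a confirmed released checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-1B",
    revision="step10000-tokens42B",  # hypothetical intermediate-checkpoint branch
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "allenai/OLMo-1B",
    trust_remote_code=True,
)
```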
Thanks!