v1.7.0

Latest
github-actions released this 27 Nov 22:31
· 24 commits to main since this release

What's new

Added 🎉

  • Added key_mapping argument to olmo_core.distributed.checkpoint.load_model_and_optim_state()
    for loading checkpoints with different key names.
  • Added a load_key_mapping field to the trainer, which serves the same purpose as the key_mapping argument above.
  • Added an implementation of nGPT called NormalizedTransformer.
  • Added an example showing how to convert a HuggingFace Llama 3.2 checkpoint into the right format for OLMo-core.
  • Added an API for scaling RoPE embeddings.
  • Added a ModelLadder API.
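
To illustrate the idea behind key_mapping, here is a minimal sketch in plain Python. It is not OLMo-core's implementation: the helper remap_checkpoint_keys and the toy key names are hypothetical, and the mapping direction (current model key to checkpoint key) is an assumption, so check the olmo_core documentation before relying on it.

```python
from typing import Any, Dict


def remap_checkpoint_keys(
    checkpoint_state: Dict[str, Any],
    key_mapping: Dict[str, str],
) -> Dict[str, Any]:
    """Return a copy of ``checkpoint_state`` re-keyed to the current model's names.

    ``key_mapping`` maps current_key -> checkpoint_key (assumed direction).
    Keys that do not appear in the mapping are kept unchanged.
    """
    # Invert the mapping so each checkpoint key can be looked up directly.
    checkpoint_to_current = {old: new for new, old in key_mapping.items()}
    return {
        checkpoint_to_current.get(key, key): value
        for key, value in checkpoint_state.items()
    }


# Toy example: the checkpoint calls the parameter "encoder.weight",
# while the current model calls it "backbone.weight".
checkpoint_state = {"encoder.weight": [1.0, 2.0], "bias": [0.5]}
key_mapping = {"backbone.weight": "encoder.weight"}
remapped = remap_checkpoint_keys(checkpoint_state, key_mapping)
```

After remapping, the state can be loaded into a model that uses the new names, while unmapped keys such as "bias" pass through untouched.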

Changed ⚠️

  • The w_out and norm top-level children of the Transformer model are now wrapped together in an lm_head module. Training scripts remain backwards compatible with older checkpoints through the load_key_mapping trainer field described above.
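
As a rough sketch of what such a mapping could look like for this change, the snippet below pairs each new lm_head-prefixed key with its older top-level name. The helper build_lm_head_key_mapping, the example parameter names, and the mapping direction (current key to checkpoint key) are all assumptions for illustration, not taken from the OLMo-core source.

```python
from typing import Dict, Iterable


def build_lm_head_key_mapping(current_keys: Iterable[str]) -> Dict[str, str]:
    """Map each "lm_head."-prefixed key to its pre-v1.7.0 top-level name.

    For example, "lm_head.w_out.weight" maps to "w_out.weight".
    Keys without the prefix need no mapping and are skipped.
    """
    prefix = "lm_head."
    return {
        key: key[len(prefix):]
        for key in current_keys
        if key.startswith(prefix)
    }


# Hypothetical parameter names for a model built with the new layout.
current_keys = [
    "embeddings.weight",
    "lm_head.norm.weight",
    "lm_head.w_out.weight",
]
mapping = build_lm_head_key_mapping(current_keys)
```

A dictionary like this is the kind of value one would expect to pass as load_key_mapping when resuming from a checkpoint saved before the lm_head change.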

Fixed ✅

  • (Optimization) Marked model input sizes as dynamic for torch.compile() to avoid recompilation during evals or when training with variable sequence lengths or batch sizes. This doesn't seem to hurt throughput.
  • Made HTTPS and GCS IO functions more robust.
  • Fixed a bug where we always got dolma2-tokenized validation data when generating a config with DataMix.v3_small_ppl_validation.

Commits

62d2c9e (chore) prepare for release v1.7.0
cb77039 mark model ladder as a beta feature
08c8073 Adapt conversion script to work with OLMo2 models (#116)
8e716b5 Add model ladder building blocks (#114)
1647f78 Add some more tests for nGPT (#113)
37e0e88 improve docs
d68d47a Make nn configs more flexible (#112)
0bcc840 RoPE scaling, document how to convert HuggingFace checkpoints (#111)
7655a3b Add template variable to ppl validation file manifest (#110)
ca44cf4 Implement nGPT (#108)
c47df7c make IO functions more robust (#109)
4f2c8ef Update README.md
57b38ad Mark model input as dynamically sized (#105)
776e235 remove duplicate script