yarn-mistral-flax

An implementation of Yarn-Mistral-7B in Flax, based on the PyTorch version uploaded to Hugging Face.

Nous-Yarn-Mistral-7b-128k is a state-of-the-art long-context language model, further pretrained on long-context data for 1500 steps using the YaRN extension method. It extends Mistral-7B-v0.1 and supports a 128k-token context window.
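
As a rough illustration of what the 128k extension means numerically, the sketch below computes the YaRN scale factor s = L'/L (extended context over original context) and the attention temperature term from the YaRN paper, sqrt(1/t) = 0.1 ln(s) + 1. It assumes that "128k" means 131072 tokens and that the original context window of Mistral-7B-v0.1 is 8192 tokens. The helper names are hypothetical and are not taken from this repository.

```python
import math


def yarn_scale_factor(target_ctx: int, original_ctx: int) -> float:
    """Hypothetical helper: context extension ratio s = L' / L."""
    return target_ctx / original_ctx


def yarn_mscale(scale: float) -> float:
    """Hypothetical helper: sqrt(1/t) = 0.1 * ln(s) + 1 from the YaRN paper.

    Applied to both queries and keys, this multiplies the attention logits
    by mscale**2 = 1/t. No scaling is applied when the context is not extended.
    """
    if scale <= 1.0:
        return 1.0
    return 0.1 * math.log(scale) + 1.0


# Assumed values: 128k = 131072 tokens, original window = 8192 tokens.
s = yarn_scale_factor(131072, 8192)
m = yarn_mscale(s)
print(f"scale factor s = {s}, mscale = {m:.4f}, logit factor 1/t = {m**2:.4f}")
```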

From the abstract of the original arXiv submission:

"Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, requiring 10x less tokens and 2.5x less training steps than previous methods. Using YaRN, we show that LLaMA models can effectively utilize and extrapolate to context lengths much longer than their original pre-training would allow, while also surpassing previous the state-of-the-art at context window extension. In addition, we demonstrate that YaRN exhibits the capability to extrapolate beyond the limited context of a fine-tuning dataset."

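The abstract takes RoPE as its starting point. The following is a minimal JAX sketch, not code from this repository, that applies rotary embeddings to a single query or key vector using the common "rotate-half" layout. It assumes frequencies theta_i = base^(-2i/d) with base = 10000. The check at the end illustrates the property that makes RoPE a relative encoding: the query-key dot product depends only on the distance between the two positions.

```python
import jax
import jax.numpy as jnp


def rope_freqs(dim: int, base: float = 10000.0) -> jnp.ndarray:
    """Rotation frequencies theta_i = base^(-2i/dim) for i = 0 .. dim/2 - 1."""
    return base ** (-jnp.arange(0, dim, 2, dtype=jnp.float32) / dim)


def apply_rope(x: jnp.ndarray, position: float, base: float = 10000.0) -> jnp.ndarray:
    """Rotate vector x (even length d) to `position`, pairing x[i] with x[i + d/2]."""
    d = x.shape[-1]
    angles = position * rope_freqs(d, base)  # one angle per 2-D pair, shape (d/2,)
    cos, sin = jnp.cos(angles), jnp.sin(angles)
    x1, x2 = x[..., : d // 2], x[..., d // 2 :]
    # Standard 2-D rotation of each (x1[i], x2[i]) pair by angles[i].
    return jnp.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


key_q, key_k = jax.random.split(jax.random.PRNGKey(0))
q = jax.random.normal(key_q, (64,))
k = jax.random.normal(key_k, (64,))

# Same relative distance (7 tokens) at two different absolute offsets.
score_near = jnp.dot(apply_rope(q, 10.0), apply_rope(k, 3.0))
score_far = jnp.dot(apply_rope(q, 110.0), apply_rope(k, 103.0))
print(score_near, score_far)  # equal up to float32 rounding
```
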
Visualizing the improvements of YaRN

Below, I compare the attention strength under RoPE with positional interpolation and under YaRN, assuming an original context length of 1000 tokens and an extended context length of 5000 tokens. With positional interpolation, the attention preactivation quickly drops close to 0 and shows severe oscillations at larger distances. With YaRN, by contrast, the preactivation decays much more smoothly and oscillates much less.
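
A comparison of this kind can be approximated with the sketch below. It is a reconstruction under stated assumptions, not the script used for the figure. The YaRN part follows the "NTK-by-parts" formulas from the YaRN paper:

- Each RoPE frequency theta_i has wavelength lambda_i = 2*pi/theta_i.
- r = L/lambda_i counts how many rotations that frequency completes within the original context L.
- A ramp gamma(r) blends between positional interpolation (theta_i/s) for low-frequency dimensions (r < alpha) and the unchanged frequency for high-frequency dimensions (r > beta).

The Hugging Face implementation computes this ramp over rounded dimension indices, so it may differ in detail. Assumed values are a head dimension of 128, base 10000, and alpha = 1, beta = 32 (the values the YaRN paper reports for LLaMA). Query and key are all-ones vectors, so the preactivation at relative distance D reduces to 2 * sum_i cos(D * theta_i'). The attention temperature is omitted because it only rescales the YaRN curve by a constant.

```python
import jax.numpy as jnp


def rope_freqs(dim: int, base: float = 10000.0) -> jnp.ndarray:
    """Original RoPE frequencies theta_i = base^(-2i/dim)."""
    return base ** (-jnp.arange(0, dim, 2, dtype=jnp.float32) / dim)


def pi_freqs(dim: int, scale: float, base: float = 10000.0) -> jnp.ndarray:
    """Positional interpolation: every frequency is divided by the scale factor s."""
    return rope_freqs(dim, base) / scale


def yarn_freqs(dim: int, scale: float, orig_len: int,
               alpha: float = 1.0, beta: float = 32.0,
               base: float = 10000.0) -> jnp.ndarray:
    """NTK-by-parts interpolation used by YaRN (temperature not included)."""
    theta = rope_freqs(dim, base)
    wavelength = 2.0 * jnp.pi / theta
    r = orig_len / wavelength  # rotations completed within the original context
    # gamma = 0 below alpha (fully interpolate), 1 above beta (keep as is), linear in between.
    gamma = jnp.clip((r - alpha) / (beta - alpha), 0.0, 1.0)
    return (1.0 - gamma) * theta / scale + gamma * theta


def preactivation(freqs: jnp.ndarray, max_dist: int) -> jnp.ndarray:
    """q.k for all-ones q and k at relative distances 0 .. max_dist - 1.

    Each 2-D pair of ones has squared norm 2, so it contributes 2 * cos(dist * theta_i).
    """
    dist = jnp.arange(max_dist, dtype=jnp.float32)
    return 2.0 * jnp.cos(dist[:, None] * freqs[None, :]).sum(axis=-1)


dim, orig_len, new_len = 128, 1000, 5000  # assumed head size; lengths from the text
scale = new_len / orig_len  # s = 5
pi_curve = preactivation(pi_freqs(dim, scale), new_len)
yarn_curve = preactivation(yarn_freqs(dim, scale, orig_len), new_len)
```

The two arrays can then be plotted against distance (for example with matplotlib) to compare the methods. The exact shape depends on the assumed head dimension and on the choice of query and key vectors. The design choice YaRN makes is visible in `yarn_freqs`: high-frequency dimensions, which resolve nearby tokens, keep their original frequency, and only the slow dimensions are interpolated.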

✉️ Contact Information

You can contact me at any of my social network profiles:

💼 LinkedIn: https://www.linkedin.com/in/konstantinos-pitas-lts2-epfl/
GitHub: https://github.com/konstantinos-p

Or via email at [email protected]

Repository contents

assets
.gitignore
README.md
added_tokens.json
config.json
configuration_mistral.py
generation_config.json
modeling_mistral_yarn.py
modeling_mistral_yarn_flax.py
pytorch_model.bin.index.json
special_tokens_map.json
test_modeling_mistral_yarn.py
test_modeling_mistral_yarn_flax.py
tokenizer.json
tokenizer_config.json