archit-spec/RoPE-scaling

Extending GPT-2 Context Length via RoPE Scaling

Note: I chose GPT-2 because it was the only small model I could find that I could fine-tune easily without running out of memory (OOM) and that was old enough not to ship with RoPE already implemented. QLoRA made things harder and produced too many errors, so I did not pursue it given the time constraints.

Training Runs

Demo

  • Try the model here: GPT-2 Long Demo
  • Try an input longer than 1k or 2k tokens: Demo

Evaluation

Approach

  • Use the rotary positional embedding implementation by lucidrains here
  • Change the model to use RoPE positional embeddings
  • Save and upload the modified model to Hugging Face (to avoid OOM); the model can be found here
  • Load the model and train it separately on long-alpaca12k
  • These steps can be seen in notebook and notebook
  • For logs and other findings or docs, check logs and this
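
As a rough illustration of what switching to RoPE means, here is a minimal framework-free sketch of the rotation that rotary embeddings apply to query and key vectors. It is not the code from the notebooks or from the lucidrains library. The function names (`rope_angles`, `apply_rope`, `dot`), the toy vectors, and the base of 10000 are assumptions chosen for illustration. It also shows the property that makes RoPE attractive: the attention score depends only on the relative distance between the two positions.

```python
import math

def rope_angles(position, head_dim, base=10000.0):
    """Return one rotation angle per pair of dimensions for a given position."""
    return [position * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

def apply_rope(vec, position, base=10000.0):
    """Rotate consecutive pairs (vec[2i], vec[2i+1]) by position-dependent angles."""
    out = []
    for i, theta in enumerate(rope_angles(position, len(vec), base)):
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(x * y for x, y in zip(a, b))

# Toy query and key vectors (hypothetical values, head_dim = 4).
q = [0.5, -1.0, 0.25, 2.0]
k = [1.5, 0.3, -0.7, 0.9]

# Both pairs of positions are 2 tokens apart, so the scores should match
# up to floating-point error.
score_near = dot(apply_rope(q, 3), apply_rope(k, 1))
score_far = dot(apply_rope(q, 103), apply_rope(k, 101))
```

In a real model the same rotation is applied per attention head to batched tensors, and GPT-2's learned absolute position embeddings are no longer the source of position information.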

Note:

  • I understand that the ideal way to apply patches to models would be something like this kaiokendev impl, but this was my first time doing this and time was limited, so I used whatever I could get working.
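
For context, one common form of RoPE scaling is linear position interpolation: positions are multiplied by a factor below 1 so that a longer sequence maps back into the position range seen during training. The sketch below only illustrates that idea and is not taken from the linked implementation or from this repository. The function name `scaled_rope_angles` and the target length of 4096 are hypothetical; 1024 is GPT-2's original context length.

```python
import math

def scaled_rope_angles(position, head_dim, scale=1.0, base=10000.0):
    """Rotation angles for a position after linear position interpolation."""
    pos = position * scale  # compress the position index before computing angles
    return [pos * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

train_ctx = 1024   # GPT-2's original context length
target_ctx = 4096  # hypothetical extended context length
scale = train_ctx / target_ctx  # 0.25

# The last position of the extended context, after scaling, stays inside
# the range of positions covered by the original context length.
max_angle = scaled_rope_angles(target_ctx - 1, 64, scale)[0]
```

With a scale of 0.25, position 8 in the extended context receives the same angles as position 2 in the unscaled scheme, which is why fine-tuning on long sequences is still needed for the model to adapt to the denser spacing.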

