Extending GPT-2 Context Length via RoPE Scaling

Note: I chose GPT-2 because it was the only small model I could find that I could fine-tune easily without running out of memory, and that is old enough not to have RoPE pre-implemented. (QLoRA was making things harder and throwing too many errors, so I skipped it given the time constraints.)

Training Runs

Demo

  • Try the model here: GPT-2 Long Demo
  • Try giving it an input of more than 1k or 2k tokens: Demo

Evaluation

Approach

  • Use the rotary positional embedding implementation by lucidrains here
  • Change the model to use RoPE positional embeddings (a minimal sketch of this patch is shown after this list)
  • Save and upload the patched model to Hugging Face (to avoid OOM); the model can be found here
  • Load it and fine-tune separately on LongAlpaca-12k
  • These steps can be seen in notebook and notebook
  • For logs and other findings or docs, check logs and this
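
As a rough illustration of the patching step, here is a minimal sketch (not the exact code from the notebooks) that monkey-patches GPT2Attention to rotate queries and keys with lucidrains' rotary-embedding-torch. It assumes a transformers version in which GPT2Attention still exposes an `_attn(query, key, value, ...)` method; the function name `patch_gpt2_with_rope` is illustrative.

```python
import torch
from transformers import GPT2LMHeadModel
from transformers.models.gpt2.modeling_gpt2 import GPT2Attention
from rotary_embedding_torch import RotaryEmbedding


def patch_gpt2_with_rope(model: GPT2LMHeadModel) -> GPT2LMHeadModel:
    head_dim = model.config.n_embd // model.config.n_head
    rotary = RotaryEmbedding(dim=head_dim)
    # Register the rotary module on the model so model.to(device) also moves
    # its frequency buffer.
    model.rotary_emb = rotary

    original_attn = GPT2Attention._attn

    def rope_attn(self, query, key, value, attention_mask=None, head_mask=None):
        # query/key arrive here as (batch, heads, seq_len, head_dim), the
        # layout rotate_queries_or_keys expects. Position offsets for the KV
        # cache during generation are omitted in this sketch.
        query = rotary.rotate_queries_or_keys(query)
        key = rotary.rotate_queries_or_keys(key)
        return original_attn(self, query, key, value, attention_mask, head_mask)

    GPT2Attention._attn = rope_attn

    # GPT-2's learned absolute position embeddings would otherwise be added on
    # top of RoPE; zeroing them lets rotary embeddings carry all position info.
    with torch.no_grad():
        model.transformer.wpe.weight.zero_()
    return model


model = patch_gpt2_with_rope(GPT2LMHeadModel.from_pretrained("gpt2"))
```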

Note:

  • I get that the ideal way to apply patches to models would be something like this kaiokendev impl, but this was my first time doing this and time was limited, so I just used whatever worked. A rough sketch of the position-interpolation idea behind that approach is shown below.
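
For reference, the core idea behind that kind of patch (kaiokendev-style RoPE scaling, i.e. position interpolation) is to divide the position indices by a scale factor before computing the rotary angles, so longer sequences map back into the position range seen during pretraining. A standalone sketch, not taken from this repo's code:

```python
import torch

def scaled_rope_angles(seq_len: int, head_dim: int, scale: float = 2.0,
                       base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies, one per pair of dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Compress positions by `scale`: with scale=2, 2048 tokens span the same
    # angle range that 1024 tokens did originally.
    positions = torch.arange(seq_len).float() / scale
    return torch.outer(positions, inv_freq)  # (seq_len, head_dim // 2) angles
```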