Extend context length with good contenders:
- TinyLLaMA 1B (4096 ctx length)
- OpenELM 1B (1024 ctx length)
- Danube 1B (8192 ctx length)
Generate a chain-of-thought (CoT) dataset leveraging the high context window.
- Mamba and Recurrent Memory Transformers (RMT) are also worth considering for extending context length.
- Mistral-7B and GPT-3.5-Turbo have context lengths of 32K and 16K tokens respectively.
- Generating the CoT dataset can take advantage of these long context windows (rough sketch below).
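A minimal sketch of what that generation loop could look like, assuming we bootstrap rationales from a stronger instruction-tuned model. The model id, prompt template, and record fields below are my own placeholders, not choices from these notes.

```python
# Hypothetical CoT bootstrapping loop; model id and prompt template are placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

questions = [
    "A train covers 60 km in 45 minutes. What is its average speed in km/h?",
]

cot_dataset = []
for q in questions:
    prompt = f"Question: {q}\nLet's think step by step."
    out = generator(prompt, max_new_tokens=512, do_sample=True, temperature=0.7)
    # Keep the full question + rationale pair; long rationales are the point,
    # since the extended context window lets us train on them unchunked.
    cot_dataset.append({"question": q, "rationale": out[0]["generated_text"]})
```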
Huge inspiration comes from this blog: kaiokendev's RoPE implementation.
- Actually, scrap the models above; they already use rotary position embeddings by default. We'll do it for GPT-2 instead.
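For reference, here's a minimal sketch of the piece the monkey patch has to add: rotary embeddings with kaiokendev-style linear position interpolation, to be applied to the query/key tensors inside GPT-2's attention. The class and function names, the scaling factor, and the 4096-position target are my own placeholders, not code from the repo.

```python
import torch

class ScaledRotaryEmbedding(torch.nn.Module):
    """RoPE cos/sin table with linear position interpolation."""
    def __init__(self, head_dim, max_positions=4096, base=10000, scaling_factor=4.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        # kaiokendev's trick: squeeze the longer position range back into the
        # range the base frequencies were designed for (e.g. 4096 -> 1024).
        positions = torch.arange(max_positions).float() / scaling_factor
        freqs = torch.outer(positions, inv_freq)   # (max_positions, head_dim / 2)
        emb = torch.cat((freqs, freqs), dim=-1)    # (max_positions, head_dim)
        self.register_buffer("cos_cached", emb.cos())
        self.register_buffer("sin_cached", emb.sin())

    def forward(self, seq_len):
        return self.cos_cached[:seq_len], self.sin_cached[:seq_len]

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q, k, cos, sin):
    # q, k: (batch, heads, seq_len, head_dim); cos/sin broadcast over batch and heads.
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin
```

In a patched GPT2Attention.forward, apply_rope would be called on the query/key tensors right before the attention scores are computed; whether to keep or zero out GPT-2's learned absolute position embeddings alongside it is a design choice the experiments would have to settle.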
- Source: We'll build our dataset from this.
- Colab Notebook: Current work.
- Using LIMA to train.
- After 5 tries at full fine-tuning, all ending in OOM, I'm going to try QLoRA.
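A hedged sketch of the QLoRA setup I'd reach for here: 4-bit NF4 quantization via bitsandbytes plus LoRA adapters on GPT-2's attention projection. The rank, alpha, dropout, and target-module choice are assumptions, not tuned values from these runs.

```python
# QLoRA setup sketch; hyperparameters are assumptions, not the values actually used.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "gpt2", quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused q/k/v projection
    fan_in_fan_out=True,        # GPT-2 uses Conv1D layers, not nn.Linear
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```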
The model fine-tuned, but the outputs are incoherent; maybe I should try LLaMA instead of GPT-2.
- Monkey Patch Script
- GitHub Repository
- Extending GPT-2's context length via RoPE scaling:
- Training Run: This supposedly failed for obvious reasons.
- Should try with the 8B-parameter model: Source.
- Apply the monkey patch for RoPE embeddings.
- Add a pad token (the GPT-2 tokenizer doesn't have one by default) and increase the lm_head size by one, since the vocab grows by 1.
- Format and tokenize LongAlpaca for training (see the sketch after these steps).
- Pass model and data to the trainer.
- Train?
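A rough end-to-end sketch of the last four steps (pad token, formatting/tokenization, Trainer), assuming the RoPE patch has already been applied to the model. The dataset id, column names, prompt format, and hyperparameters are placeholders.

```python
# Sketch of the pad-token / tokenize / Trainer steps; dataset id, column names,
# and hyperparameters are assumptions, not the actual run configuration.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")  # RoPE monkey patch applied beforehand

# GPT-2 ships no pad token; adding one grows the vocab by 1, so the embeddings
# (and the tied lm_head) have to be resized to match.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))

def format_and_tokenize(example):
    text = (f"### Instruction:\n{example['instruction']}\n\n"
            f"### Response:\n{example['output']}{tokenizer.eos_token}")
    return tokenizer(text, truncation=True, max_length=4096)

dataset = load_dataset("Yukang/LongAlpaca-12k", split="train")  # assumed dataset id
tokenized = dataset.map(format_and_tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gpt2-rope-longctx",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```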
- Monkey-patch, update the config, and upload the model to Hugging Face (override the PretrainedConfig class in config.py; rough sketch below).
- Directly download it and train with the Trainer API.
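A loose sketch of the config-override-and-upload idea: a GPT2Config subclass (the config.py that would ship with the checkpoint) carrying the RoPE-scaling fields so the patched model can be reloaded later. Class name, field names, and the repo id are placeholders.

```python
# Hypothetical config.py for the patched model; names and values are placeholders.
from transformers import GPT2Config, GPT2LMHeadModel

class RopeGPT2Config(GPT2Config):
    """GPT2Config plus the fields the RoPE monkey patch needs at load time."""
    def __init__(self, rope_scaling_factor=4.0, rope_theta=10000, **kwargs):
        self.rope_scaling_factor = rope_scaling_factor
        self.rope_theta = rope_theta
        super().__init__(**kwargs)

config = RopeGPT2Config.from_pretrained("gpt2", n_positions=4096)
model = GPT2LMHeadModel.from_pretrained(
    "gpt2", config=config, ignore_mismatched_sizes=True  # position table grows to 4096
)
# ...apply the RoPE monkey patch and fine-tune before uploading...
model.push_to_hub("your-username/gpt2-rope-4096")  # placeholder repo id
```

The extra fields end up in config.json, so the downloaded checkpoint should carry enough information for the patch to reconstruct the scaled RoPE when it's trained with the Trainer API.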