
Training script #31

Merged: 150 commits merged into main from the training-script branch on Mar 19, 2024
Conversation

jettjaniak (Contributor)

No description provided.

Review threads on scripts/train.py (outdated, resolved)

jettjaniak (Contributor, Author):

This should use argparse and take some arguments?
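
For what it's worth, a minimal argparse-based entry point for scripts/train.py might look something like the sketch below; the flag names and defaults are illustrative only, not what the script actually ended up using.

```python
# Hypothetical sketch of an argparse CLI for scripts/train.py.
# Flag names and defaults are made up for illustration.
import argparse


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Train a delphi model")
    parser.add_argument("--config", type=str, required=True, help="path to a training config file")
    parser.add_argument("--output-dir", type=str, default="out", help="where to write checkpoints")
    parser.add_argument("--resume", action="store_true", help="resume from the latest checkpoint")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    print(f"config={args.config} output_dir={args.output_dir} resume={args.resume}")
```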

Further review threads (outdated, resolved) on: scripts/upload_tokens.py, src/delphi/llama2.py, src/delphi/mamba.py, scripts/train.py, src/delphi/train/utils.py
jettjaniak (Contributor, Author) commented on Feb 16, 2024

float16 vs bfloat16 vs float32: can we just train everything on float32 to ensure reproducibility across all devices?
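
For reference, pinning everything to float32 in PyTorch is mostly a matter of setting the default dtype and not using autocast/GradScaler; a small sketch of the float32-everywhere option (not code from this PR):

```python
# Sketch: float32 end to end in PyTorch, for cross-device reproducibility.
# Illustrative only; not taken from the training script in this PR.
import torch

torch.set_default_dtype(torch.float32)          # new tensors and parameters default to fp32
torch.backends.cuda.matmul.allow_tf32 = False   # keep matmuls in true fp32 on Ampere+ GPUs
torch.backends.cudnn.allow_tf32 = False

model = torch.nn.Linear(512, 512)               # parameters are created in float32
x = torch.randn(8, 512)                         # float32 by default
loss = model(x).pow(2).mean()                   # no autocast / GradScaler anywhere
loss.backward()
```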

SrGonao (Contributor) commented on Feb 26, 2024

What can I do to finish this? Do you want me to take over, @jannik-brinkmann?

jettjaniak force-pushed the training-script branch 2 times, most recently from bc335cb to d67d6a1 on February 26, 2024 at 18:56
jannik-brinkmann (Contributor)

@SrGonao it would be great if someone could pick up the refactoring of the training - the functionality should be fine. @jettjaniak depending on your feedback, I can already start the training runs on the un-refactored version of the code, to provide new models to the evals team. If you want to wait for the refactoring to be done, I could work on it on Thursday, if no one else can do it before then :)

jettjaniak (Contributor, Author) commented on Feb 26, 2024

All the comments need to be resolved, but below are some of the top priorities:

  • make training independent of the tokenizer, PretokDataset etc. - just use a tokenized HF dataset
  • revamp TrainingConfig (waiting for "try hydra for training configs" #48)
  • remove things related to model compilation, DDP and anything else we don't need
  • remove AMP; everything should train on float32
  • define a function performing a single training step, with wandb kept outside of it (a rough sketch follows below)
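
To make the last point concrete, below is a rough sketch of what such a step function could look like. The names (train_step, TrainStepResult) and the assumption that the model returns raw logits are illustrative, not the PR's actual API.

```python
# Hedged sketch of a single-training-step function; wandb and other logging
# stay outside. Assumes the model maps (batch, seq) token ids to raw logits.
from dataclasses import dataclass

import torch
import torch.nn.functional as F


@dataclass
class TrainStepResult:
    loss: float


def train_step(
    model: torch.nn.Module,
    optimizer: torch.optim.Optimizer,
    batch: dict[str, torch.Tensor],
) -> TrainStepResult:
    """One forward/backward/optimizer step on a batch from a tokenized HF dataset."""
    model.train()
    tokens = batch["input_ids"]
    logits = model(tokens[:, :-1])  # next-token prediction: shift by one position
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
    )
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    return TrainStepResult(loss=loss.item())
```

Keeping wandb out of this function makes the step itself easy to unit test and reuse.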

Review thread on scripts/upload_tokens.py (outdated, resolved)

jettjaniak (Contributor, Author):

I don't think we need this script. We're hosting the dataset on HF, so we should host it in the llama2c github repo. How we generated it is perhaps more interesting and could be its own script, but whatever.
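
For context, once the tokens are in a datasets.Dataset, pushing them to the Hub is roughly a one-liner; a sketch with a placeholder repo id (not the actual upload script):

```python
# Sketch: uploading a tokenized dataset to the Hugging Face Hub.
# The repo id below is a placeholder; requires `huggingface-cli login` first.
from datasets import Dataset

tokenized = Dataset.from_dict({"tokens": [[1, 5, 9], [2, 7, 3]]})
tokenized.push_to_hub("your-org/your-tokenized-dataset")
```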

jaidhyani (Collaborator) left a review

I added a bunch of comments to try to make this obnoxiously huge PR slightly less of a pain to review.

Comment on lines +20 to +21:

    with:
      submodules: recursive

Collaborator:

We added this when we added the llama2c submodule. Ironically, we also removed llama2c in the course of developing this PR. Technically we don't need this anymore, but it's not a bad idea to have it around for any submodules we add in the future.

@@ -31,11 +33,11 @@ jobs:
      - name: dependencies
        run: |
          python -m pip install --upgrade pip
-         pip install -r requirements.txt
+         pip install -r requirements-nocuda.txt

Collaborator:

mamba really wants to run on CUDA, and there are optional-but-preferred dependencies for doing so. We want to use those when we can, because otherwise training mamba models would take foreverrrrrrrrr - but only on platforms that support it. We split the requirements into those that don't require CUDA (requirements-nocuda.txt) and those that do (still in requirements.txt, which automatically includes the -nocuda requirements). GitHub CI doesn't run in a CUDA env because why would Microsoft give away that much GPU compute? So we use requirements-nocuda.txt for CI.

          pip install -e .
      - name: black
        run: black --check .
      - name: isort
-       run: isort --profile black --check .
+       run: isort --check .

Collaborator:

The isort `--profile black` config is now implicit in pyproject.toml.

Comment on lines +9 to +12:

    bin
    include
    lib64
    pyvenv.cfg

Collaborator:

I think these are llama2c artifacts? Also technically no longer needed, but on the other hand these are generally things we'd want to exclude from git if anything with these names ever showed up.

Comment on lines +167 to +171:

    # ignore wandb files
    **/wandb/*
    **/*.wandb
    **/wandb-summary.json
    **/wandb-metadata.json

Collaborator:

Debugging the wandb integration involved a lot of wandb artifacts being created, and I was too lazy to change directories.

Review threads (outdated, resolved) on: src/delphi/train/config/model/delphi_mamba_config.py, src/delphi/train/config/model/delphi_model_config.py

Collaborator:

This is a weird one. Iteration parameters are always derivable from the config and the dataset size, but it's convenient to store them anyway, if only to cut down on how many parameters we're passing around. Originally implemented while trying to break the original script into functional pieces, and then never revisited because that seemed like too much effort.
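
A minimal illustration of the "derivable but convenient to store anyway" point; the names below are hypothetical, not the ones in the config module.

```python
# Hypothetical illustration: iteration parameters derived once from the config
# and dataset size, then passed around instead of being recomputed everywhere.
from dataclasses import dataclass


@dataclass
class IterationParams:
    steps_per_epoch: int
    total_steps: int


def derive_iteration_params(dataset_len: int, batch_size: int, epochs: int) -> IterationParams:
    steps_per_epoch = dataset_len // batch_size
    return IterationParams(
        steps_per_epoch=steps_per_epoch,
        total_steps=steps_per_epoch * epochs,
    )
```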

Collaborator:

The main training function! Given a GigaConfig, it does setup and runs the training loop. Most of the actual logic lives in train_step.
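
In outline, that setup-then-loop shape might look like the sketch below; plain arguments stand in for GigaConfig and step_fn stands in for train_step, so only the structure is meant to match the real code.

```python
# Rough structural sketch: setup, then a loop that delegates per-step logic to
# a step function, with logging kept at this outer level. Names are stand-ins.
from typing import Callable, Iterable

import torch


def run_training(
    model: torch.nn.Module,
    batches: Iterable[dict[str, torch.Tensor]],
    step_fn: Callable[[torch.nn.Module, torch.optim.Optimizer, dict[str, torch.Tensor]], float],
    lr: float = 1e-3,
) -> torch.nn.Module:
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)  # setup
    for step, batch in enumerate(batches):
        loss = step_fn(model, optimizer, batch)
        # wandb logging / checkpointing would go here, outside of step_fn
        print(f"step={step} loss={loss:.4f}")
    return model
```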

Review thread on src/delphi/train/utils.py (outdated, resolved)
jettjaniak mentioned this pull request on Mar 18, 2024
jaidhyani linked an issue on Mar 19, 2024 that may be closed by this pull request
jaidhyani (Collaborator) left a review comment

[meme image]

jaidhyani merged commit 35f0c69 into main on Mar 19, 2024
1 check passed
jettjaniak deleted the training-script branch on May 22, 2024 at 10:01