Training script #31
Conversation
scripts/upload_stories.py (Outdated)
this should use argparse and take some arguments?
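For reference, a minimal sketch of what an argparse interface for the script could look like (all flag names here are hypothetical, not the script's actual interface):

```python
# Hypothetical CLI sketch for upload_stories.py; argument names are illustrative.
import argparse


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Upload the stories dataset to the Hugging Face Hub."
    )
    parser.add_argument("--input-path", required=True, help="Local path to the dataset files.")
    parser.add_argument("--repo-id", required=True, help="Target dataset repo, e.g. org/name.")
    parser.add_argument("--private", action="store_true", help="Create the repo as private.")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    print(args)
```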
float16 vs bfloat16 vs float32 - Can we just train everything in float32 to ensure reproducibility across all devices?
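For illustration, a minimal PyTorch sketch of pinning everything to float32 (the linear layer is just a toy stand-in for the real model):

```python
# Illustrative only: keep tensors, weights, and compute in float32 so results
# don't depend on a device's float16/bfloat16 support.
import torch

torch.set_default_dtype(torch.float32)                # new tensors default to float32
model = torch.nn.Linear(512, 512).to(torch.float32)   # toy stand-in for the real model
x = torch.randn(8, 512)

# No autocast context, so nothing gets downcast to float16/bfloat16.
out = model(x)
print(out.dtype)  # torch.float32
```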
What can I do to finish this? Do you want me to take over, @jannik-brinkmann?
bc335cb to d67d6a1
@SrGonao it would be great if someone could pick up the refactoring of the training - the functionality should be fine. @jettjaniak depending on your feedback, I can already start the trainings on the un-refactored version of the code, to provide new models to the evals team. If you want to wait for the refactoring to be done, I could work on it on Thursday, if no one else can do it before then :)
All the comments need to be resolved, but below are some of the top priorities:
scripts/upload_stories.py (Outdated)
I don't think we need this script. We're hosting the dataset on HF, so we should host it in the llama2c github repo. How we generated it is perhaps more interesting and could be its own script, but whatever
I added a bunch of comments to try to make this obnoxiously-huge PR slightly less of a pain to review.
with:
  submodules: recursive
We added this when we added the llama2c submodule. Ironically, we also removed llama2c in the course of developing this PR. Technically we don't need this anymore, but it's not a bad idea to have it around for any submodules we add in the future.
@@ -31,11 +33,11 @@ jobs:
      - name: dependencies
        run: |
          python -m pip install --upgrade pip
-         pip install -r requirements.txt
+         pip install -r requirements-nocuda.txt
mamba really wants to run on CUDA, and there are optional-but-preferred dependencies for doing so. We want to use those when we can, because otherwise training mamba models would take foreverrrrrrrrr - but only on platforms that support it. We split our requirements into those that don't require CUDA (requirements-nocuda.txt) and those that do (still in requirements.txt, which automatically includes the -nocuda requirements). GitHub CI doesn't run in a CUDA env, because why would Microsoft give away that much GPU compute? So we use requirements-nocuda.txt for CI.
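As a rough sketch of that layout (the CUDA-only package names below are illustrative, not necessarily the real ones), requirements.txt pulls in the non-CUDA file via pip's -r include:

```
# requirements.txt (sketch): include the non-CUDA set, then add CUDA-only extras
-r requirements-nocuda.txt
mamba-ssm        # illustrative CUDA-only packages; the real names may differ
causal-conv1d
```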
          pip install -e .
      - name: black
        run: black --check .
      - name: isort
-       run: isort --profile black --check .
+       run: isort --check .
The --profile black config is now implicit in pyproject.toml.
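That is, isort now picks up its settings from pyproject.toml, roughly like this:

```toml
# pyproject.toml (excerpt): isort reads this section, so --profile black
# no longer needs to be passed on the command line.
[tool.isort]
profile = "black"
```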
bin
include
lib64
pyvenv.cfg
I think these are llama2c artifacts? Also technically no longer needed, but on the other hand these are generally things we'd want to exclude from git if anything with these names ever showed up.
# ignore wandb files
**/wandb/*
**/*.wandb
**/wandb-summary.json
**/wandb-metadata.json
Debugging the wandb integration involved a lot of wandb artifacts being created, and I was too lazy to change directories.
This is a weird one. Iteration parameters are always derivable from config and dataset size, but it's convenient to store them anyway, if only to cut down on how many parameters we're passing around. Originally implemented while trying to break the original script into functional pieces, and then never revisited because that seemed like too much effort.
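A hypothetical sketch of the kind of derivation meant here (the field and parameter names are made up for illustration):

```python
# Hypothetical sketch: iteration parameters derived once from the config and
# dataset size, then passed around as a single object. Names are illustrative.
from dataclasses import dataclass


@dataclass
class IterationParams:
    num_batches: int   # optimizer steps per epoch
    total_iters: int   # optimizer steps across the whole run


def derive_iteration_params(dataset_size: int, batch_size: int, num_epochs: int) -> IterationParams:
    num_batches = dataset_size // batch_size
    return IterationParams(num_batches=num_batches, total_iters=num_batches * num_epochs)


print(derive_iteration_params(dataset_size=100_000, batch_size=64, num_epochs=2))
# IterationParams(num_batches=1562, total_iters=3124)
```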
The main training function! Given a GigaConfig, it does setup and runs the training loop. Most of the actual logic lives in train_step.
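For orientation, a self-contained toy sketch of that shape (a stand-in config, toy model, and toy data; not the project's actual GigaConfig or train_step):

```python
# Toy sketch of the described structure: train() does setup, then loops,
# delegating the per-batch logic to train_step(). Not the project's real code.
from dataclasses import dataclass

import torch


@dataclass
class ConfigSketch:            # stand-in for the real GigaConfig
    lr: float = 1e-3
    num_steps: int = 20
    log_interval: int = 5


def train_step(model, optimizer, batch):
    """One forward/backward/update step; returns the scalar loss."""
    x, y = batch
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def train(config: ConfigSketch):
    model = torch.nn.Linear(4, 1)                       # toy model
    optimizer = torch.optim.AdamW(model.parameters(), lr=config.lr)
    for step in range(config.num_steps):
        batch = (torch.randn(8, 4), torch.randn(8, 1))  # toy data
        loss = train_step(model, optimizer, batch)
        if step % config.log_interval == 0:
            print(f"step {step}: loss {loss:.4f}")
    return model


train(ConfigSketch())
```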
…ixed preset args; set default logging to INFO