
Generate llama instead of downloading it #250

Merged: 3 commits merged into staging on Sep 5, 2024

Conversation

@satyaog (Member) commented Aug 13, 2024

No description provided.

Review comment on benchmarks/llm/prepare.py (outdated, resolved)
@satyaog force-pushed the feature/generate_llama branch 2 times, most recently from cc8a7c3 to de3ffda on August 14, 2024 21:51
@satyaog (Member, Author) commented Aug 14, 2024

This should work for the 8B and 70B llama3 models. To test it, though, there are two points where I would need tips:

  • Where can I find the resources (8 GPUs) to test the 70B llama3? Do we have a dedicated machine on the Mila cluster?
  • The 70B llama3 benches all use *.safetensors, whereas the generated files are original/*.pth files. Are safetensors needed? How could I change the config to use the original/*.pth instead?

@Delaunay (Collaborator) commented

We could try to change the config to use the original checkpoints:

checkpointer:
  _component_: torchtune.utils.FullModelMetaCheckpointer

instead of

checkpointer:
  _component_: torchtune.utils.FullModelHFCheckpointer
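
For context, a fuller sketch of what the two stanzas could look like side by side; the directories, file names, and surrounding keys are assumptions based on the usual llama3 checkpoint layout, not taken from this PR:

    # Meta-format checkpointer: loads the single consolidated original/*.pth file
    checkpointer:
      _component_: torchtune.utils.FullModelMetaCheckpointer
      checkpoint_dir: /path/to/Meta-Llama-3-8B/original  # assumed path
      checkpoint_files:
        - consolidated.00.pth
      model_type: LLAMA3

    # HF-format checkpointer: loads the sharded *.safetensors files
    checkpointer:
      _component_: torchtune.utils.FullModelHFCheckpointer
      checkpoint_dir: /path/to/Meta-Llama-3-70B  # assumed path
      checkpoint_files:
        - model-00001-of-00030.safetensors
        # ... one entry per shard ...
        - model-00030-of-00030.safetensors
      model_type: LLAMA3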

@satyaog force-pushed the feature/generate_llama branch from 42f3aed to 6da9faa on August 20, 2024 13:50
@Delaunay mentioned this pull request on Aug 20, 2024
@Delaunay (Collaborator) commented

Could you check whether we are able to load the original/*.pth on 4 GPUs, or whether it requires 8 GPUs?

@satyaog (Member, Author) commented Aug 20, 2024

Sure, I'll try, but with the safetensors, since torchtune unfortunately does not support multiple *.pth files for now:

Currently we support reading from a single checkpoint file only. Support for reading from
sharded checkpoints is WIP.

https://github.com/pytorch/torchtune/blob/main/torchtune/utils/_checkpointing/_checkpointer.py#L650-L656

But I think it should work, as there's an option to offload to CPU:

fsdp_cpu_offload: True

https://github.com/mila-iqia/milabench/pull/250/files#diff-1072365d60b45fad1a39661bb5e7a99bc355fc167d08476e83f8b97484eecd6aL98
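
For reference, in the torchtune distributed recipes this flag sits at the top level of the recipe config; a minimal sketch, where the surrounding keys are illustrative assumptions:

    # sketch of a distributed fine-tune recipe config (surrounding keys assumed)
    device: cuda
    dtype: bf16
    fsdp_cpu_offload: True  # offload FSDP-sharded parameters to CPU to fit on fewer GPUs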

@satyaog force-pushed the feature/generate_llama branch from 6da9faa to eca81da on August 20, 2024 20:13
@satyaog (Member, Author) commented Aug 20, 2024

Tested with 8 GPUs, and currently testing with 4 GPUs and ~600G of RAM (so far so good, with 18/30 steps completed). Both are working. So llama3 8B will generate/load the original/*.pth format of the model, and the 70B will generate/load the safetensors format.

@satyaog (Member, Author) commented Aug 20, 2024

Would you use the cloud-ci to test this?

@satyaog force-pushed the feature/generate_llama branch from 8b35c3d to 282e9c2 on August 27, 2024 17:55
* rename the Hugging Face token env var to MILABENCH_* so it is automatically forwarded to a remote in such cases
@satyaog force-pushed the feature/generate_llama branch from 282e9c2 to 39222b8 on September 5, 2024 14:30
@Delaunay changed the base branch from master to staging on September 5, 2024 15:28
@Delaunay merged commit ea44ea6 into staging on Sep 5, 2024 (3 of 6 checks passed)
Review thread on the checkpoint file list in the config diff:

    model-00028-of-00030.safetensors,
    model-00029-of-00030.safetensors,
    model-00030-of-00030.safetensors,
    model-00001-of-00062.safetensors,
@Delaunay (Collaborator) commented:

It would be nice if the weights were interchangeable.

@satyaog (Member, Author) replied:

You mean we should generate the .yaml file after generating the .safetensors files?

@Delaunay (Collaborator) replied:

That the generation step produces the 30 safetensors files, so that we can switch between pretrained & generated without any problem.

@satyaog (Member, Author) replied:

Done here: #278
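
For illustration (not the actual #278 implementation), a minimal Python sketch of sharding a checkpoint into a fixed number of safetensors files plus a Hugging Face-style index, so that generated weights could stand in for the pretrained ones; the shard count, paths, and helper name are assumptions:

    # shard_safetensors.py -- hypothetical helper, not from this PR
    import json
    from pathlib import Path

    from safetensors.torch import save_file

    def shard_state_dict(state_dict, out_dir, num_shards=30):
        """Split state_dict into num_shards safetensors files and write an
        HF-style model.safetensors.index.json mapping each tensor to its shard."""
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        keys = list(state_dict)
        per_shard = -(-len(keys) // num_shards)  # ceiling division
        weight_map = {}
        for i in range(num_shards):
            shard_keys = keys[i * per_shard : (i + 1) * per_shard]
            if not shard_keys:
                break
            fname = f"model-{i + 1:05d}-of-{num_shards:05d}.safetensors"
            # save_file requires contiguous, non-shared tensors
            save_file({k: state_dict[k].contiguous() for k in shard_keys},
                      str(out / fname))
            weight_map.update({k: fname for k in shard_keys})
        # The index file tells HF-style loaders which shard holds each tensor.
        index = {"metadata": {}, "weight_map": weight_map}
        (out / "model.safetensors.index.json").write_text(json.dumps(index, indent=2))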
