
Slow Dataset Preprocessing due to CPU affinity (?) issues #118

Open
mgolub2 opened this issue May 2, 2024 · 5 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments


mgolub2 commented May 2, 2024

🐛 Bug

I'm attempting to train a model using litgpt and the openwebtext dataset. I launch the run as usual following the litgpt examples, and the dataset preprocessing starts:
[screenshot]
However, the workers are all running on a single core!
[screenshot]
Checking the PID, the affinity appears to be set incorrectly (?):
[screenshot]

I don't know why that would be, though. This is fairly new behavior; running the prepare_data portion of openwebtext was quite fast a few weeks ago.
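
For reference, a minimal psutil sketch for confirming the affinity of the worker processes (assuming psutil is installed and the workers show up as pt_main_thread, as they do here):

```python
import psutil

# Print the set of CPUs each preprocessing worker is allowed to run on.
for proc in psutil.process_iter(["pid", "name"]):
    if proc.info["name"] == "pt_main_thread":
        print(proc.info["pid"], proc.cpu_affinity())
```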

To Reproduce

litgpt pretrain --config config_hub/pretrain/tinyllama

Expected behavior

Data preparation should use multiple cores

Environment

  • PyTorch Version (e.g., 1.0): 2.4.0a0+gitc82fcb7 (cxx11 abi build)
  • OS (e.g., Linux): Rocky Linux 9.3
  • How you installed PyTorch (conda, pip, source): source
  • Build command you used (if compiling from source): python setup.py develop
  • Python version: 3.11
  • CUDA/cuDNN version: 12.4
  • GPU models and configuration: 2x 4090s
  • Any other relevant information: You Rock!

Additional context

I'm thinking this is possibly some weird interplay between threading, affinity, and the cxx11 ABI? Will test in a more normal configuration soon.

mgolub2 added the bug (Something isn't working) and help wanted (Extra attention is needed) labels on May 2, 2024

github-actions bot commented May 2, 2024

Hi! Thanks for your contribution, great first issue!

mgolub2 (Author) commented May 2, 2024

> I'm thinking this is possibly some weird interplay between threading, affinity, and the cxx11 ABI? Will test in a more normal configuration soon.

Wow, it might actually be cxx11 abi related (edit: or python 3.9 vs 3.11?) - running the pretraining example in a non cxx11 environment uses all the cores:

[screenshots]

I double-checked that tokenizers and transformers were both on the same version. They were not, but fixing that changed nothing; both are now on 0.19.1 and 4.40.1 respectively.

tchaton (Collaborator) commented May 7, 2024

That's odd indeed. I recommend using Lightning Studio to prepare your dataset.

mgolub2 (Author) commented May 15, 2024

I can paper over the issue by setting the CPU affinity manually (fish shell incoming): sudo pgrep pt_main_thread | while read -l pid; taskset --all-tasks -p ffff $pid; end
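
Roughly the same workaround from Python instead of fish/taskset (a sketch, again assuming psutil and that the workers are named pt_main_thread; like the taskset version, it likely needs to run as root):

```python
import psutil

# Re-allow every preprocessing worker to run on all CPUs.
all_cpus = list(range(psutil.cpu_count()))
for proc in psutil.process_iter(["pid", "name"]):
    if proc.info["name"] == "pt_main_thread":
        proc.cpu_affinity(all_cpus)
```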

> That's odd indeed. I recommend using Lightning Studio to prepare your dataset.

I'd rather not, thanks.

tchaton (Collaborator) commented May 15, 2024

Hey @mgolub2, PyTorch Geometric supports CPU affinity mapping for their dataloader: https://github.com/pyg-team/pytorch_geometric/blob/e9648df16dcb6dde0e09b5736b1b2da5d68db2ad/docs/source/advanced/cpu_affinity.rst#L80.

If you are interested, you could take inspiration from it and contribute native CPU affinity support to litdata, so you don't need to hack around it.
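
To illustrate the idea (not litdata's or PyG's actual API, just a sketch of per-worker pinning via a PyTorch worker_init_fn, Linux only):

```python
import os

def pin_worker_to_core(worker_id: int) -> None:
    # Assumes one core per worker, starting at core 0.
    os.sched_setaffinity(0, {worker_id})

# loader = DataLoader(dataset, num_workers=8, worker_init_fn=pin_worker_to_core)
```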

> I'd rather not, thanks.
The main advantage is that data file manipulations are faster in the cloud, so it scales better. Plus, you get a free machine to use whenever you need it.
