When providing a local path to the optimize method, make it work in a distributed setting for Jobs #193
I tested this in Lightning Studio and it worked (as you've also stated):

```python
import os

from litdata import optimize, Machine


def compress(index):
    return (index, index ** 2)


optimize(
    fn=compress,
    inputs=list(range(100)),
    num_workers=2,
    output_dir="./output_dir",
    chunk_bytes="64MB",
    mode="overwrite",
    num_nodes=1,
    machine=Machine.DATA_PREP,
)
```

But I don't get it.
What do you mean by:
Sorry, it was a typo. I meant multi-machine jobs. If you put a local path, both machines are going to store the data locally but never merge it.
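To illustrate the failure mode described above: with a local `output_dir`, each node resolves the same relative path against its own filesystem, so no node ever ends up holding the full dataset. A minimal sketch (this is an illustration, not litdata internals):

```python
import os

# Hypothetical illustration: in a multi-machine job, each node resolves the
# same relative output_dir against its OWN local filesystem.
def resolve_output_dir(output_dir: str) -> str:
    # Every node computes the same-looking absolute path, but it names that
    # node's local disk, so chunks written on node 1 never reach node 0.
    return os.path.abspath(output_dir)

# Both "machines" compute an identical string...
node_0_dir = resolve_output_dir("./output_dir")
node_1_dir = resolve_output_dir("./output_dir")
# ...yet the string refers to two different physical disks, and the two
# partial chunk sets are never merged into one index.
```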
Got it. I'll try fixing this.
Please clarify this:
Let's say … So, what should `_output_dir` be modified to? I tried making it: …
Hey @deependujha. It needs to be this one: https://github.com/Lightning-AI/litdata/blob/main/src/litdata/processing/utilities.py#L182. That gets translated into `/teamspace/jobs/{job_name}/{rank_0}/{user_folder}`.
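A hedged sketch of that translation: the function below maps a user-provided local path to the shared node-0 location `/teamspace/jobs/{job_name}/{rank_0}/{user_folder}`. The `JOB_NAME` environment variable is a placeholder assumption for illustration; the real resolution logic lives in the linked `utilities.py`.

```python
import os

def to_node0_artifacts_path(user_output_dir: str) -> str:
    """Hypothetical sketch: rewrite a local output_dir into the shared
    node-0 artifacts path /teamspace/jobs/{job_name}/rank_0/{user_folder}.

    JOB_NAME is a placeholder; litdata resolves the job name internally
    (see utilities.py#L182 linked above).
    """
    job_name = os.environ.get("JOB_NAME", "my-job")  # placeholder source
    # Strip any leading "./" and trailing "/" from the user-provided folder.
    user_folder = user_output_dir.lstrip("./").rstrip("/")
    return f"/teamspace/jobs/{job_name}/rank_0/{user_folder}"

shared_path = to_node0_artifacts_path("./output_dir")
```

With this rewrite, every node in the job would target the same shared folder instead of its own local disk.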
🚀 Feature
Motivation
Right now, it is possible to run `optimize` with a local `output_dir` in a Lightning Studio.
However, when running this code in a multi-machine job, it won't work properly: each node writes to its own local disk.
Instead, we should convert the `output_dir` to an S3 path pointing to the node 0 artifacts path plus the user-provided `output_dir`.
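The proposed conversion could be sketched as follows. The bucket layout, function name, and parameters are assumptions for illustration, not litdata's actual API:

```python
def to_node0_s3_path(output_dir: str, cluster_bucket: str, job_id: str) -> str:
    """Hypothetical sketch of the proposed fix: rewrite a local output_dir
    into an S3 URI under the node-0 artifacts prefix, so every node in a
    multi-machine job uploads its chunks to the same shared location."""
    if output_dir.startswith("s3://"):
        return output_dir  # already remote: nothing to convert
    # Strip any leading "./" and trailing "/" from the user-provided folder.
    user_folder = output_dir.lstrip("./").rstrip("/")
    # Node-0 artifacts prefix + user-provided folder (layout is illustrative).
    return f"s3://{cluster_bucket}/projects/{job_id}/rank_0/{user_folder}"

remote_dir = to_node0_s3_path("./output_dir", "my-bucket", "job-123")
```

Chunks from all nodes would then land under one prefix, where they can be merged into a single index.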
Pitch
Alternatives
Additional context