When providing a local path to the optimize method, make it work in a distributed settings for Jobs #193

tchaton · 2024-06-27T18:48:02Z

🚀 Feature

Motivation

Right now, it is possible to do this in a Lightning Studio:

optimize(
	output_dir="./optimized_data"
)

However, when running this code in a multi-machine job, it won't work properly.

Instead, we should convert the output_dir to an S3 path pointing to the node 0 artifacts path + the user-provided output_dir.

Pitch

Alternatives

Additional context

deependujha · 2024-07-05T04:55:19Z

I tested this in a Lightning Studio and it worked (as you've also stated):

from litdata import optimize, Machine

# Toy processing function: returns the index and its square.
def compress(index):
    return (index, index ** 2)

optimize(
    fn=compress,
    inputs=list(range(100)),
    num_workers=2,
    output_dir="./output_dir",  # local, relative output path
    chunk_bytes="64MB",
    mode="overwrite",
    num_nodes=1,  # single machine, so the local path works
    machine=Machine.DATA_PREP,
)

But I don't get this part:

However, when running this code in a machine machine jobs, this won't properly work.

What do you mean by "machine machine jobs"?

tchaton · 2024-07-05T06:09:07Z

Sorry, that was a typo. I meant multi-machine jobs, for example if you set num_nodes=2.

Both machines are going to store the data locally but never merge it.
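
To make the failure mode concrete, here is a toy simulation, not litdata code. It assumes each node handles a strided share of the inputs; `run_node` and the directory names are hypothetical stand-ins. Two "nodes" that each resolve the same relative path on their own disk end up with half of the output each, while writing to one shared location gives the full set.

```python
import tempfile
from pathlib import Path


def run_node(rank, num_nodes, inputs, output_dir):
    """Hypothetical stand-in for one machine: write only this node's share of the inputs."""
    output_dir.mkdir(parents=True, exist_ok=True)
    # Assumption for illustration: inputs are split across nodes by striding.
    for item in inputs[rank::num_nodes]:
        (output_dir / f"item-{item}.txt").write_text(str(item ** 2))


inputs = list(range(10))
num_nodes = 2

with tempfile.TemporaryDirectory() as root:
    root = Path(root)

    # Case 1: each node resolves "./optimized_data" on its own local disk.
    local_dirs = [root / f"node_{rank}" / "optimized_data" for rank in range(num_nodes)]
    for rank, local_dir in enumerate(local_dirs):
        run_node(rank, num_nodes, inputs, local_dir)
    local_counts = [len(list(d.iterdir())) for d in local_dirs]

    # Case 2: every node writes to one shared location.
    shared_dir = root / "shared" / "optimized_data"
    for rank in range(num_nodes):
        run_node(rank, num_nodes, inputs, shared_dir)
    shared_count = len(list(shared_dir.iterdir()))

print("files per node (local):", local_counts)
print("files in shared dir:", shared_count)
```

In the local case no single machine ever holds the complete dataset, which is why the output path has to point somewhere all nodes can reach.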

deependujha · 2024-07-05T06:22:53Z

Got it. I'll try fixing this.

deependujha · 2024-07-07T12:31:13Z

Please clarify this:

Instead, we should convert the output_dir to an s3 path pointing to the node 0 artifacts path + the user provided output_dir.

Let's say output_dir="./optimized_data" and resolve_dir returns _output_dir=Dir(path='/teamspace/studios/this_studio/optimized_data', url=None).

So, what should _output_dir be changed to?

I tried setting it to /teamspace/studios/{STUDIO_NAME}/optimized_data, but it fails with OSError: [Errno 30] Read-only file system: '/teamspace/studios/local-path-in-distributed-optimize/optimized_data'.

tchaton · 2024-07-08T07:15:48Z

Hey @deependujha. It needs to be this one: https://github.com/Lightning-AI/litdata/blob/main/src/litdata/processing/utilities.py#L182

That gets translated into /teamspace/jobs/{job_name}/{rank_0}/{user_folder}
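
A rough sketch of that mapping, for illustration only. This is not the code at the link above; `to_job_artifacts_path` is a hypothetical helper, and the `job_name` and `rank_0` values in the usage line are made-up placeholders. It only shows how a user-provided relative folder could be placed under the /teamspace/jobs/{job_name}/{rank_0}/ prefix.

```python
import posixpath


def to_job_artifacts_path(output_dir: str, job_name: str, rank_0: str) -> str:
    """Hypothetical helper: place a relative output_dir under the rank-0 job folder."""
    # "./optimized_data" -> "optimized_data"
    user_folder = posixpath.normpath(output_dir)

    # Only relative paths that stay inside the job folder make sense here.
    if posixpath.isabs(user_folder) or user_folder == ".." or user_folder.startswith("../"):
        raise ValueError(f"expected a relative path inside the job folder, got {output_dir!r}")

    base = posixpath.join("/teamspace/jobs", job_name, rank_0)
    # "." means the job folder itself.
    return base if user_folder == "." else posixpath.join(base, user_folder)


# Placeholder values, chosen only for the example.
print(to_job_artifacts_path("./optimized_data", job_name="my-job", rank_0="node-0"))
```

With every node resolving the user's path to the same rank-0 folder, the chunks end up in one place instead of being scattered across machines.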

tchaton added the enhancement and help wanted labels Jun 27, 2024

deependujha mentioned this issue Jul 8, 2024

Fix: local path issue in distributed optimize method #214

Merged

4 tasks

tchaton closed this as completed in #214 Jul 8, 2024
