Enhance Dataloader Configuration #60

surajpaib · 2023-04-20T21:30:18Z

🚀 Feature Request

Many dataloader arguments, for example batch_size and drop_last_batch, are currently specified as system parameters.

It would be good to have a way to set other dataloader parameters, such as prefetch_factor and persistent_workers, as well as any that get added in the future.

🛰 Alternatives

Maybe we could add a partial dataloader to the system config and give it the dataset and sampler later?
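A minimal sketch of the partial-dataloader idea using `functools.partial`: the config fixes the loader options up front, and the system supplies the dataset and sampler later. To keep the example runnable without PyTorch, `StubDataLoader` is a hypothetical stand-in that only records its arguments; in practice `torch.utils.data.DataLoader` would take its place (it does accept `prefetch_factor` and `persistent_workers` as keyword arguments).

```python
from functools import partial
from typing import Any, Optional


class StubDataLoader:
    """Hypothetical stand-in for torch.utils.data.DataLoader that just stores its arguments."""

    def __init__(self, dataset: Any, sampler: Optional[Any] = None, **kwargs: Any) -> None:
        self.dataset = dataset
        self.sampler = sampler
        self.kwargs = kwargs


# Built from the config: every loader option is fixed, but dataset and sampler are not.
partial_loader = partial(
    StubDataLoader,
    batch_size=4,
    num_workers=4,
    prefetch_factor=2,
    persistent_workers=True,
)

# Later, the system fills in the missing pieces when it creates the loader.
loader = partial_loader(dataset=list(range(10)), sampler=None)
print(loader.kwargs["persistent_workers"])  # True
```

The config layer then never needs to know about datasets or samplers; it only has to produce the partial object.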


ibro45 · 2024-03-27T22:48:28Z

Discussed with @john-zielke-snkeos:

Get rid of the batch_size, num_workers, samplers, and collate_fns arguments.
Introduce a dataloaders section. A user shouldn't need to define the whole dataloader; for example, to set the batch size and number of workers, this should be sufficient:

dataloaders:
    train:
        batch_size: 4
        num_workers: 8

and the rest would already be set by default. If a user needs a completely different DataLoader, they can define _target_: ..., but in that case we must ensure that the other default arguments aren't passed to it.
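The defaults-plus-overrides behaviour described above could look roughly like this sketch. The values in `DATALOADER_DEFAULTS` are placeholders rather than agreed-upon defaults, and `resolve_dataloader_config` is a hypothetical helper. The `_target_` key follows MONAI Bundle's convention for naming the class to instantiate; when a user sets it, the sketch passes only the user's own arguments, as proposed.

```python
import copy
from typing import Any, Dict

# Placeholder defaults; the actual values were left for a future PR.
DATALOADER_DEFAULTS: Dict[str, Any] = {
    "_target_": "torch.utils.data.DataLoader",
    "batch_size": 1,
    "num_workers": 0,
    "pin_memory": True,
    "drop_last": False,
}


def resolve_dataloader_config(user_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a partial user config for one split (e.g. `train`) with the defaults."""
    if "_target_" in user_cfg:
        # A custom DataLoader: use the user's config as-is, without injecting defaults.
        return copy.deepcopy(user_cfg)
    merged = copy.deepcopy(DATALOADER_DEFAULTS)
    merged.update(user_cfg)  # user values take precedence over defaults
    return merged


dataloaders = {"train": {"batch_size": 4, "num_workers": 8}}
resolved = {split: resolve_dataloader_config(cfg) for split, cfg in dataloaders.items()}
print(resolved["train"]["batch_size"], resolved["train"]["drop_last"])  # 4 False
```

Copying the defaults on every call keeps one split's overrides from leaking into another.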

surajpaib · 2024-04-04T08:16:42Z

@ibro45 Agree with this.

This would now bring us into the territory of templates where we set some default object for dataloaders.

If we do this for dataloaders and define a default behaviour that users can rely on, shouldn't we do the same for other items in the config?

For instance, the trainer could default to pytorch_lightning.Trainer with benchmark=True, precision="16-mixed", etc.

surajpaib · 2024-04-04T08:21:37Z

We can also extend this templating and have several templates for different workflows.

Say we want a classification workflow: we can set up templates for a few different models and losses. We can then provide a simple CLI that lets the user choose between these templates and spits out a final config in which they only have to configure their data. (Unlike the dataloader defaults, these templates won't be applied automatically.)

We can use something like Cookiecutter (https://github.com/cookiecutter/cookiecutter) to map the user's CLI choices to pre-set templates.
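Cookiecutter is one option. As a stdlib-only sketch of the same idea, the snippet below maps CLI choices to pre-set templates with `argparse` and prints the resulting config. The template registries, their contents, and the `system`/`model`/`criterion`/`datasets` layout are hypothetical placeholders, and the output is JSON only to avoid a YAML dependency.

```python
import argparse
import copy
import json
from typing import Any, Dict, List, Optional

# Hypothetical template registries for a classification workflow.
MODEL_TEMPLATES: Dict[str, Dict[str, Any]] = {
    "resnet18": {"_target_": "torchvision.models.resnet18", "num_classes": 2},
    "densenet121": {
        "_target_": "monai.networks.nets.DenseNet121",
        "spatial_dims": 2,
        "in_channels": 1,
        "out_channels": 2,
    },
}
LOSS_TEMPLATES: Dict[str, Dict[str, Any]] = {
    "ce": {"_target_": "torch.nn.CrossEntropyLoss"},
    "focal": {"_target_": "monai.losses.FocalLoss"},
}


def build_config(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Parse CLI choices and assemble a starter config from the selected templates."""
    parser = argparse.ArgumentParser(description="Generate a starter config from templates.")
    parser.add_argument("--model", choices=sorted(MODEL_TEMPLATES), default="resnet18")
    parser.add_argument("--loss", choices=sorted(LOSS_TEMPLATES), default="ce")
    args = parser.parse_args(argv)
    return {
        "system": {
            # Deep copies so edits to the generated config never touch the templates.
            "model": copy.deepcopy(MODEL_TEMPLATES[args.model]),
            "criterion": copy.deepcopy(LOSS_TEMPLATES[args.loss]),
            # Left empty for the user to fill in with their own data.
            "datasets": {"train": None, "val": None},
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_config(), indent=2))
```

For example, `python make_config.py --model densenet121 --loss focal` would print a config with only the datasets left to fill in (the script name is illustrative).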

This would be a separate feature, of course, and should go in a separate issue if we agree to do it, but templating could give us a lot of extra features without compromising the dynamism of the library.

ibro45 · 2024-04-04T19:00:02Z

Seems like pydantic could be the way to go here. I will attempt the refactor soon, hopefully over the weekend. This combination of pydantic and MONAI Bundle will somewhat resemble Hydra's integration with dataclasses.
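For comparison, Hydra's structured configs use Python dataclasses as typed schemas with defaults. A rough stdlib sketch of that idea for the dataloader section follows; a pydantic `BaseModel` would fill the same role with richer validation. The field names mirror `torch.utils.data.DataLoader` arguments, while `DataLoaderConfig`, `from_user_config`, and the default values are hypothetical placeholders.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class DataLoaderConfig:
    """Hypothetical typed schema; field names match torch.utils.data.DataLoader kwargs."""

    batch_size: int = 1
    num_workers: int = 0
    drop_last: bool = False
    pin_memory: bool = False
    prefetch_factor: Optional[int] = None
    persistent_workers: bool = False

    def __post_init__(self) -> None:
        # Mirrors a check in recent PyTorch versions: both options need worker processes.
        if self.num_workers == 0 and (self.prefetch_factor is not None or self.persistent_workers):
            raise ValueError("prefetch_factor and persistent_workers require num_workers > 0")


def from_user_config(raw: Dict[str, Any]) -> DataLoaderConfig:
    """Reject unknown keys (e.g. typos), then fill everything else from the defaults."""
    known = {f.name for f in fields(DataLoaderConfig)}
    unknown = set(raw) - known
    if unknown:
        raise KeyError(f"Unknown dataloader options: {sorted(unknown)}")
    return DataLoaderConfig(**raw)


cfg = from_user_config({"batch_size": 4, "num_workers": 8})
print(asdict(cfg))
```

Validating at config-load time surfaces a misspelled option immediately, instead of when the DataLoader is finally constructed.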

Let's discuss the defaults in the future PR.

surajpaib added the enhancement (New feature or request) label on Apr 20, 2023

surajpaib changed the title from "Propagate parameters through system" to "Enhance Dataloader Configuration" on Apr 20, 2023
