-
Refactoring of losses

Deprecate usage of tensordict functional calls
As noted in #1613, functional calls will be replaced by the pattern

    with params.to_module(module):
        output = module(input)

which (1) works well with tensordict and (2) allows us to call other methods than forward (eg, …).

What you should expect: in theory this won't be bc-breaking; all changes will occur within the losses. We'll keep …

Modularizing losses
Some losses (eg, TD3) already benefit from methods like …
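For readers less familiar with the tensordict API, here is a minimal, self-contained sketch of the to_module pattern mentioned above; the module and shapes are placeholders, not TorchRL's loss code:

```python
# Minimal sketch of the parameter-swapping call referenced above; the module and
# tensors are illustrative placeholders, not TorchRL's actual loss code.
import torch
from torch import nn
from tensordict import TensorDict

module = nn.Linear(3, 4)

# Capture the module's parameters as a TensorDict (e.g. target or vmapped weights).
params = TensorDict.from_module(module)

# Temporarily swap `params` into `module` for the duration of the block; inside it,
# any method of the module can be called, not just forward.
with params.to_module(module):
    output = module(torch.randn(2, 3))
```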
-
On the data collector side, we could make things run faster by allowing more flexible device handling. Currently, the device of data collectors is poorly handled IMO: it's not super clear what your env and policy run on if you pass or don't pass a device.

Will this be bc-breaking
I believe not.

What to expect from these changes
The main thing I want to resolve is that executing envs that naturally live on CPU inside a parallel env on cuda is slow because there is a lot of io. cc @matteobettini this is something we talked about, I believe.
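A hedged sketch of how the more granular device arguments could look on a collector; the env_device / policy_device keyword names come from the roadmap items further down, and their exact semantics may change before v0.3.0:

```python
# Sketch of the proposed per-component device handling for collectors; the
# env_device / policy_device keyword names are taken from the roadmap items
# and may change before v0.3.0.
import torch
from torchrl.collectors import SyncDataCollector
from torchrl.envs import GymEnv

policy_device = "cuda:0" if torch.cuda.is_available() else "cpu"

collector = SyncDataCollector(
    lambda: GymEnv("Pendulum-v1"),  # the env itself naturally lives on CPU
    policy=None,                    # None falls back to a random policy over the action spec
    frames_per_batch=64,
    total_frames=256,
    env_device="cpu",               # proposed: device the env steps on
    policy_device=policy_device,    # proposed: device the policy runs on
)
for batch in collector:
    pass  # cross-device transfers are handled by the collector, not the user
collector.shutdown()
```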
-
Thanks a lot for this discussion. All the features listed are going to be incredibly helpful for our lab as users of TorchRL, and I agree with all the roadmaps set. A question I had is about the defaults for the new device arguments passed to collectors. I imagine the policy and env devices will default to the current device of these components. But what will be the default for …
-
What is the plan for improving RLHF primitives? Can one do distributed RLHF with TorchRL across multiple nodes already?
-
When will v0.3.0 be released? And will it come with the same requirement on PyTorch >=2.2.0 that …
-
Now that v0.2.0 is out, let's talk v0.3.0!
Refactoring
Removing trailing dim in done, reward, etc.
The extra singleton dim is annoying, and it's unclear whether we should use it for _reset etc.
The main motivation for having it was to avoid silent bugs like

    reward + (1 - terminated) * next_val * gamma

but these things are internal and we already check the shapes, so we can expect that where things break, they will be easy to spot. This will be bc-breaking; I don't see a clear path to change it smoothly (maybe a flag like old_api or something, but that isn't great as we would need to make a lot of assumptions in losses and similar).
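For context, a small torch illustration (not from the original post) of the silent broadcasting bug that the trailing singleton dim guards against:

```python
import torch

T = 5
reward = torch.randn(T)            # shape [T]  (no trailing singleton dim)
terminated = torch.zeros(T, 1)     # shape [T, 1]
next_val = torch.randn(T, 1)       # shape [T, 1]
gamma = 0.99

# Broadcasting silently produces a [T, T] tensor instead of raising an error:
target = reward + (1 - terminated) * next_val * gamma
print(target.shape)  # torch.Size([5, 5]) -- the bug goes unnoticed

# With the trailing singleton dim on reward, shapes line up and the result is [T, 1]:
target_ok = reward.unsqueeze(-1) + (1 - terminated) * next_val * gamma
print(target_ok.shape)  # torch.Size([5, 1])
```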
env.step_and_maybe_reset for faster data collection (see the sketch below)
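A sketch of what a rollout loop built on step_and_maybe_reset could look like; the return convention shown here is an assumption based on this roadmap item and the existing step/reset API, not a settled interface:

```python
# Sketch of a data-collection loop using env.step_and_maybe_reset, which is meant to
# fuse the step and the conditional reset into one call to reduce per-step overhead.
# The (stepped_td, next_td) return convention is an assumption, not a settled API.
import torch
from torchrl.envs import GymEnv

env = GymEnv("Pendulum-v1")
td = env.reset()
frames = []
for _ in range(200):
    td["action"] = env.action_spec.rand()
    # Step the env; the second tensordict is the root for the next iteration,
    # already reset wherever a done/terminated flag was hit.
    td, next_td = env.step_and_maybe_reset(td)
    frames.append(td)
    td = next_td
rollout = torch.stack(frames)  # a stack of 200 transitions
```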
New algos and fixes
Improvements
set_gym_backend and similar functions (see the sketch after this list)
policy_device, env_device in collector (see below)
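For the first item, a small usage sketch of set_gym_backend as it can be used today (assuming both gym and gymnasium are installed); the roadmap point is about polishing this and similar backend-selection helpers:

```python
# Usage sketch of set_gym_backend: select which gym implementation GymEnv binds to.
# Assumes both `gym` and `gymnasium` are installed.
from torchrl.envs.libs.gym import GymEnv, set_gym_backend

# As a context manager: only the code inside the block sees the chosen backend.
with set_gym_backend("gymnasium"):
    env = GymEnv("CartPole-v1")

# As a decorator: the wrapped function always runs against the chosen backend.
@set_gym_backend("gym")
def make_legacy_env():
    return GymEnv("CartPole-v1")
```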