-
Refactoring of losses

Deprecate usage of tensordict functional calls
As noted in #1613, functional calls will be replaced by the pattern

    with params.to_module(module):
        output = module(input)

which (1) works well with tensordict and (2) allows us to call other methods than forward (eg, …).

What you should expect: in theory this won't be bc-breaking; all changes will occur within the losses. We'll keep …

Modularizing losses
Some losses (eg, TD3) already benefit from methods like …
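For readers less familiar with the tensordict API, here is a minimal, self-contained sketch of the to_module pattern mentioned above; the module and shapes are placeholders, not TorchRL's loss code:

```python
# Minimal sketch of the parameter-swapping call referenced above; the module and
# tensors are illustrative placeholders, not TorchRL's actual loss code.
import torch
from torch import nn
from tensordict import TensorDict

module = nn.Linear(3, 4)

# Capture the module's parameters as a TensorDict (e.g. target or vmapped weights).
params = TensorDict.from_module(module)

# Temporarily swap `params` into `module` for the duration of the block; inside it,
# any method of the module can be called, not just forward.
with params.to_module(module):
    output = module(torch.randn(2, 3))
```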
-
On the data collector side, we could make things run faster by allowing more flexible device handling. Currently, the device of data collectors is poorly handled IMO: it's not super clear what your env and policy run on if you pass or don't pass a device.

Will this be bc-breaking
I believe not.

What to expect from these changes
The main thing I want to resolve is that executing envs that naturally live on CPU inside a parallel env on cuda is slow because there is a lot of io. cc @matteobettini this is something we talked about, I believe.
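A hedged sketch of how the more granular device arguments could look on a collector; the env_device / policy_device keyword names come from the roadmap items further down, and their exact semantics may change before v0.3.0:

```python
# Sketch of the proposed per-component device handling for collectors; the
# env_device / policy_device keyword names are taken from the roadmap items
# and may change before v0.3.0.
import torch
from torchrl.collectors import SyncDataCollector
from torchrl.envs import GymEnv

policy_device = "cuda:0" if torch.cuda.is_available() else "cpu"

collector = SyncDataCollector(
    lambda: GymEnv("Pendulum-v1"),  # the env itself naturally lives on CPU
    policy=None,                    # None falls back to a random policy over the action spec
    frames_per_batch=64,
    total_frames=256,
    env_device="cpu",               # proposed: device the env steps on
    policy_device=policy_device,    # proposed: device the policy runs on
)
for batch in collector:
    pass  # cross-device transfers are handled by the collector, not the user
collector.shutdown()
```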
-
Thanks a lot for this discussion. All the features listed are going to be incredibly helpful for our lab as users of TorchRL, and I agree with all the roadmaps set. A question I had is about the defaults for the new device arguments passed to collectors. I imagine the policy and env devices will default to the current device of these components. But what will be the default for …
-
What is the plan for improving RLHF primitives? Can one do distributed RLHF with TorchRL across multiple nodes already?
-
When will v0.3.0 be released? And will it come with the same requirement on PyTorch >=2.2.0 that …
-
Now that v0.2.0 is out, let's talk v0.3.0!
Refactoring
Removing trailing dim in done, reward, etc.
The extra singleton dim is annoying, and it's unclear whether we should use it for _reset etc.
The main motivation for having it was to avoid silent bugs like

    reward + (1 - terminated) * next_val * gamma

but these things are internal and we already check the shapes, so we can expect that where things break, they will be easy to spot. This will be bc-breaking; I don't see a clear path to change it smoothly (maybe a flag like old_api or something, but that isn't great as we would need to make a lot of assumptions in losses and similar).
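For context, a small torch illustration (not from the original post) of the silent broadcasting bug that the trailing singleton dim guards against:

```python
import torch

T = 5
reward = torch.randn(T)            # shape [T]  (no trailing singleton dim)
terminated = torch.zeros(T, 1)     # shape [T, 1]
next_val = torch.randn(T, 1)       # shape [T, 1]
gamma = 0.99

# Broadcasting silently produces a [T, T] tensor instead of raising an error:
target = reward + (1 - terminated) * next_val * gamma
print(target.shape)  # torch.Size([5, 5]) -- the bug goes unnoticed

# With the trailing singleton dim on reward, shapes line up and the result is [T, 1]:
target_ok = reward.unsqueeze(-1) + (1 - terminated) * next_val * gamma
print(target_ok.shape)  # torch.Size([5, 1])
```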
env.step_and_maybe_reset for faster data collection (see the sketch below)
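A sketch of what a rollout loop built on step_and_maybe_reset could look like; the return convention shown here is an assumption based on this roadmap item and the existing step/reset API, not a settled interface:

```python
# Sketch of a data-collection loop using env.step_and_maybe_reset, which is meant to
# fuse the step and the conditional reset into one call to reduce per-step overhead.
# The (stepped_td, next_td) return convention is an assumption, not a settled API.
import torch
from torchrl.envs import GymEnv

env = GymEnv("Pendulum-v1")
td = env.reset()
frames = []
for _ in range(200):
    td["action"] = env.action_spec.rand()
    # Step the env; the second tensordict is the root for the next iteration,
    # already reset wherever a done/terminated flag was hit.
    td, next_td = env.step_and_maybe_reset(td)
    frames.append(td)
    td = next_td
rollout = torch.stack(frames)  # a stack of 200 transitions
```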
New algos and fixes
Improvements
set_gym_backend and similar functions (see the sketch after this list)
policy_device, env_device in collector (see below)
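For the first item, a small usage sketch of set_gym_backend as it can be used today (assuming both gym and gymnasium are installed); the roadmap point is about polishing this and similar backend-selection helpers:

```python
# Usage sketch of set_gym_backend: select which gym implementation GymEnv binds to.
# Assumes both `gym` and `gymnasium` are installed.
from torchrl.envs.libs.gym import GymEnv, set_gym_backend

# As a context manager: only the code inside the block sees the chosen backend.
with set_gym_backend("gymnasium"):
    env = GymEnv("CartPole-v1")

# As a decorator: the wrapped function always runs against the chosen backend.
@set_gym_backend("gym")
def make_legacy_env():
    return GymEnv("CartPole-v1")
```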