This is a bit of a tougher one. We'd probably benefit from having at least one example of a simple RL workflow.
There's already a proof-of-concept for this that you could base your work on, but it's probably best to start fresh and ignore it (it's over-engineered and not ideal). If you're stuck and need some ideas, though, you can draw inspiration from it.
User Research
Consult some RL folks at Mila and ask them what frameworks/tools they use for A) the environments and B) the baseline algorithms. Make a doc on Confluence listing some of the options and briefly describing their pros and cons.
I (@lebrice) would be partial to a jax-based environment library such as Gymnax and/or Brax, but that isn't strictly necessary.
Design
Take a look at different RL baseline algorithm implementations and choose the one that best fits these criteria:
- Relevance: the algorithm should be something that people might actually want to run (e.g. PPO, DQN, or similar).
- Ease of implementation: this is a bit trickier. Some baselines are great, but it's almost impossible to separate the "getting some interaction data from the environment or replay buffer" portion of the code from the "updating the agent using that experience data" portion. This separation will become important for the next part (see the sketch after this list).
- Note: there could definitely be some benefits to using something like PyTorch-Lightning for the RL algorithm, even though that framework makes the most sense for supervised-learning-style workflows.
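To make that separation concrete, here is a minimal sketch of what a cleanly separated baseline could look like. The function names (`collect_episode`, `update_agent`) and the REINFORCE-style update are illustrative assumptions, not taken from any existing implementation:

```python
# Hypothetical sketch: environment interaction and agent updates live in
# two independent functions, so either half can later be swapped out
# (e.g. replaced by a replay buffer or a jax-based environment).
import gymnasium as gym
import torch


def collect_episode(env: gym.Env, policy: torch.nn.Module) -> list[dict]:
    """Roll out one episode and return the transitions (no learning here)."""
    transitions = []
    obs, _ = env.reset()
    done = False
    while not done:
        with torch.no_grad():
            logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        action = torch.distributions.Categorical(logits=logits).sample().item()
        next_obs, reward, terminated, truncated, _ = env.step(action)
        transitions.append({"obs": obs, "action": action, "reward": reward})
        obs, done = next_obs, terminated or truncated
    return transitions


def update_agent(policy, optimizer, transitions) -> float:
    """Update the policy from collected experience (REINFORCE-style)."""
    returns = 0.0
    loss = torch.zeros(())
    for step in reversed(transitions):
        returns = step["reward"] + 0.99 * returns  # discounted return
        logits = policy(torch.as_tensor(step["obs"], dtype=torch.float32))
        log_prob = torch.distributions.Categorical(logits=logits).log_prob(
            torch.as_tensor(step["action"])
        )
        loss = loss - log_prob * returns
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A baseline structured this way makes the data-collection half easy to replace with a replay buffer or a `LightningDataModule` later on.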
(optional) Design a LightningDataModule for RL.
Consider carefully how we want to deal with the environment/agent interaction and/or replay buffers.
(There's already a proof of concept for this on the rl branch; you can take some inspiration from there. Ask @lebrice for clarification if needed. A rough sketch of one possible design follows.)
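Here is a minimal sketch of one way such a DataModule might look, wrapping the environment/agent interaction loop in an `IterableDataset` so Lightning's usual dataloader machinery can consume experience. The class names (`ExperienceDataset`, `RLDataModule`) and the design are assumptions for illustration, not the design from the rl branch:

```python
# Hypothetical sketch (NOT the design from the `rl` branch): wrap the
# environment/agent interaction loop in an IterableDataset so that
# Lightning's `train_dataloader` machinery can consume experience.
import gymnasium as gym
import pytorch_lightning as pl
import torch
from torch.utils.data import DataLoader, IterableDataset


class ExperienceDataset(IterableDataset):
    """Yields transitions by stepping the environment with the current policy."""

    def __init__(self, env: gym.Env, policy: torch.nn.Module, steps_per_epoch: int):
        self.env, self.policy, self.steps_per_epoch = env, policy, steps_per_epoch

    def __iter__(self):
        obs, _ = self.env.reset()
        for _ in range(self.steps_per_epoch):
            with torch.no_grad():
                logits = self.policy(torch.as_tensor(obs, dtype=torch.float32))
            action = torch.distributions.Categorical(logits=logits).sample().item()
            next_obs, reward, terminated, truncated, _ = self.env.step(action)
            done = terminated or truncated
            yield obs, action, reward, next_obs, done
            obs = self.env.reset()[0] if done else next_obs


class RLDataModule(pl.LightningDataModule):
    """Hypothetical DataModule: owns the env; the algorithm supplies the policy."""

    def __init__(self, env_id: str = "CartPole-v1", steps_per_epoch: int = 1000):
        super().__init__()
        self.env_id, self.steps_per_epoch = env_id, steps_per_epoch
        self.policy = None  # set by the algorithm before training starts

    def train_dataloader(self) -> DataLoader:
        dataset = ExperienceDataset(
            gym.make(self.env_id), self.policy, self.steps_per_epoch
        )
        return DataLoader(dataset, batch_size=32)
```

One open design question with this approach is how the dataset gets a reference to the up-to-date policy (here it's just handed in before training), which is exactly the environment/agent coupling mentioned above.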
Implementation
(optional, recommended): Write some tests for the algorithm before writing the code for it. (It would also be a good idea to crowdsource ideas for good RL-algorithm tests from RL researchers at Mila; a starting point is sketched below.)
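As a starting point for that discussion, here is a small sketch of two common checks: reproducibility under a fixed seed, and improvement over a random policy. The imported helpers (`train`, `evaluate_random_policy`) are hypothetical names, not existing project functions:

```python
# Hypothetical test sketch: `train` and `evaluate_random_policy` are assumed
# helpers that run the algorithm / a random policy end-to-end and return the
# mean episode return. They do not exist in the project yet.
from project.rl_example import evaluate_random_policy, train  # hypothetical


def test_reproducibility():
    """Two runs with the same seed should give identical results."""
    assert train(seed=123) == train(seed=123)


def test_improves_over_random_policy():
    """Training should beat a random policy on an easy env like CartPole."""
    assert train(seed=123) > evaluate_random_policy(seed=123)
```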
(optional) Write the tests for the RLDataModule and implement it.
Implement the algorithm
Validation
Make sure that all tests pass, that the algorithm is reproducible, and that it uses resources effectively.
Show the algorithm to some RL researchers at Mila and make sure everything looks good to them. (Ideally we'd depend as much as possible on external libraries for the algorithm, but our portion of it should still be checked.)