This is a bit of a tougher one. We'd probably benefit from having at least one example of a simple RL workflow.
There's already a proof-of-concept for this that you could base your work on, but it's probably best to start fresh and ignore it (it's over-engineered and not ideal). If you're stuck and need some ideas, though, you can draw inspiration from it.
User Research
Consult some RL folks at Mila and ask them what frameworks/tools they use for A) the environments and B) the baseline algorithms. Make a doc on Confluence listing some of the options and briefly describing their pros and cons.
I (@lebrice) would be partial to a jax-based environment library such as Gymnax and/or Brax, but that isn't strictly necessary.
Design
Take a look at different RL baseline algorithm implementations and choose the one that best fits these criteria:
- Relevance: the algorithm should be something that people might actually want to run (e.g. PPO, DQN, or similar).
- Ease of implementation: this is a bit trickier. Some baselines are great, but it's almost impossible to separate the "getting some interaction data from the environment or replay buffer" portion of the code from the "updating the agent using that experience data" portion. This separation will become important for the next part (see the sketch after this list).
- Note: there could definitely be some benefits to using something like PyTorch-Lightning for the RL algorithm, even though that framework makes the most sense for supervised-learning-style workflows.
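To make that separation concrete, here is a minimal sketch of what a cleanly separated baseline could look like. The function names (`collect_episode`, `update_agent`) and the REINFORCE-style update are illustrative assumptions, not taken from any existing implementation:

```python
# Hypothetical sketch: environment interaction and agent updates live in
# two independent functions, so either half can later be swapped out
# (e.g. replaced by a replay buffer or a jax-based environment).
import gymnasium as gym
import torch


def collect_episode(env: gym.Env, policy: torch.nn.Module) -> list[dict]:
    """Roll out one episode and return the transitions (no learning here)."""
    transitions = []
    obs, _ = env.reset()
    done = False
    while not done:
        with torch.no_grad():
            logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        action = torch.distributions.Categorical(logits=logits).sample().item()
        next_obs, reward, terminated, truncated, _ = env.step(action)
        transitions.append({"obs": obs, "action": action, "reward": reward})
        obs, done = next_obs, terminated or truncated
    return transitions


def update_agent(policy, optimizer, transitions) -> float:
    """Update the policy from collected experience (REINFORCE-style)."""
    returns = 0.0
    loss = torch.zeros(())
    for step in reversed(transitions):
        returns = step["reward"] + 0.99 * returns  # discounted return
        logits = policy(torch.as_tensor(step["obs"], dtype=torch.float32))
        log_prob = torch.distributions.Categorical(logits=logits).log_prob(
            torch.as_tensor(step["action"])
        )
        loss = loss - log_prob * returns
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A baseline structured this way makes the data-collection half easy to replace with a replay buffer or a `LightningDataModule` later on.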
(optional) Design a LightningDataModule for RL.
Consider carefully how we want to deal with the environment/agent interaction and/or replay buffers.
(There's already a proof of concept for this on the rl branch; you can take some inspiration from there. Ask @lebrice for clarification if needed. A rough sketch of one possible design follows.)
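Here is a minimal sketch of one way such a DataModule might look, wrapping the environment/agent interaction loop in an `IterableDataset` so Lightning's usual dataloader machinery can consume experience. The class names (`ExperienceDataset`, `RLDataModule`) and the design are assumptions for illustration, not the design from the rl branch:

```python
# Hypothetical sketch (NOT the design from the `rl` branch): wrap the
# environment/agent interaction loop in an IterableDataset so that
# Lightning's `train_dataloader` machinery can consume experience.
import gymnasium as gym
import pytorch_lightning as pl
import torch
from torch.utils.data import DataLoader, IterableDataset


class ExperienceDataset(IterableDataset):
    """Yields transitions by stepping the environment with the current policy."""

    def __init__(self, env: gym.Env, policy: torch.nn.Module, steps_per_epoch: int):
        self.env, self.policy, self.steps_per_epoch = env, policy, steps_per_epoch

    def __iter__(self):
        obs, _ = self.env.reset()
        for _ in range(self.steps_per_epoch):
            with torch.no_grad():
                logits = self.policy(torch.as_tensor(obs, dtype=torch.float32))
            action = torch.distributions.Categorical(logits=logits).sample().item()
            next_obs, reward, terminated, truncated, _ = self.env.step(action)
            done = terminated or truncated
            yield obs, action, reward, next_obs, done
            obs = self.env.reset()[0] if done else next_obs


class RLDataModule(pl.LightningDataModule):
    """Hypothetical DataModule: owns the env; the algorithm supplies the policy."""

    def __init__(self, env_id: str = "CartPole-v1", steps_per_epoch: int = 1000):
        super().__init__()
        self.env_id, self.steps_per_epoch = env_id, steps_per_epoch
        self.policy = None  # set by the algorithm before training starts

    def train_dataloader(self) -> DataLoader:
        dataset = ExperienceDataset(
            gym.make(self.env_id), self.policy, self.steps_per_epoch
        )
        return DataLoader(dataset, batch_size=32)
```

One open design question with this approach is how the dataset gets a reference to the up-to-date policy (here it's just handed in before training), which is exactly the environment/agent coupling mentioned above.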
Implementation
(optional, recommended): Write some tests for the algorithm before writing the code for it. (It would also be a good idea to crowdsource ideas for good RL-algorithm tests from RL researchers at Mila; a starting point is sketched below.)
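As a starting point for that discussion, here is a small sketch of two common checks: reproducibility under a fixed seed, and improvement over a random policy. The imported helpers (`train`, `evaluate_random_policy`) are hypothetical names, not existing project functions:

```python
# Hypothetical test sketch: `train` and `evaluate_random_policy` are assumed
# helpers that run the algorithm / a random policy end-to-end and return the
# mean episode return. They do not exist in the project yet.
from project.rl_example import evaluate_random_policy, train  # hypothetical


def test_reproducibility():
    """Two runs with the same seed should give identical results."""
    assert train(seed=123) == train(seed=123)


def test_improves_over_random_policy():
    """Training should beat a random policy on an easy env like CartPole."""
    assert train(seed=123) > evaluate_random_policy(seed=123)
```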
(optional) Write the tests for the RLDataModule and implement it.
Implement the algorithm
Validation
Make sure that all tests pass, that the algorithm is reproducible, and that it uses resources effectively.
Show the algorithm to some RL researchers at Mila and make sure everything looks good to them. (Ideally we'd depend as much as possible on external libraries for the algorithm, but our portion of it should still be checked.)