Add an example for RL #13

Open · 8 tasks
lebrice opened this issue Jun 26, 2024 · 0 comments
lebrice commented Jun 26, 2024

This is a bit of a tougher one. We'd probably benefit from having at least one example of a simple RL workflow.

There's already a proof-of-concept for this that you could build on, but it's probably best to start fresh and ignore it (it's over-engineered and not ideal). If you get stuck and need ideas, you can still draw some inspiration from it.

User Research

  • Consult some RL folks at Mila and ask them which frameworks / tools they use for A) the environments, and B) the baseline algorithms. Write a doc on Confluence listing some of the options and briefly describing their pros/cons.

Design

  • Take a look at different RL baseline algorithm implementations. Choose one that is best according to these criteria:
    - Relevance - This algo should be something that people might actually want to run (e.g. PPO/DQN/other)
    - Ease of implementation: This is a bit trickier. Some baselines are great, but it's almost impossible to separate the "getting some interaction data from the environment or replay buffer" portion of the code from the "updating the agent using that experience data" portion. This separation will become important for the next part (see the sketch after this list).
    - Note: There could definitely be some benefits to using something like PyTorch-Lightning for the RL algorithm, even though that framework makes the most sense for supervised learning-type workflows.
  • (optional) Design a LightningDataModule for RL.
    • Consider carefully how we want to deal with the environment/agent interaction and/or replay buffers.
    • (There's already a proof of concept for this on the rl branch. You can take some inspiration from there. Ask @lebrice for more details if needed.)
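
To make the separation above concrete, here's a rough sketch of what I have in mind (hypothetical names, not the actual design): experience collection and agent updates as two independent pieces of code. It uses `gymnasium`'s `CartPole-v1` and plain PyTorch, with a REINFORCE-style update standing in for whatever baseline we end up picking:

```python
from __future__ import annotations

import gymnasium as gym
import torch
from torch import Tensor, nn


def collect_episode(env: gym.Env, policy: nn.Module) -> dict[str, Tensor]:
    """Roll out one episode and return the transitions as tensors."""
    observations, actions, rewards = [], [], []
    obs, _info = env.reset()
    done = False
    while not done:
        obs_tensor = torch.as_tensor(obs, dtype=torch.float32)
        logits = policy(obs_tensor)
        action = torch.distributions.Categorical(logits=logits).sample()
        next_obs, reward, terminated, truncated, _info = env.step(action.item())
        observations.append(obs_tensor)
        actions.append(action)
        rewards.append(torch.as_tensor(reward, dtype=torch.float32))
        obs = next_obs
        done = terminated or truncated
    return {
        "observations": torch.stack(observations),
        "actions": torch.stack(actions),
        "rewards": torch.stack(rewards),
    }


def update_agent(
    policy: nn.Module, optimizer: torch.optim.Optimizer, episode: dict[str, Tensor]
) -> float:
    """REINFORCE-style update from one episode of experience."""
    # Undiscounted returns-to-go: reverse, cumulative-sum, reverse back.
    returns = torch.flip(torch.cumsum(torch.flip(episode["rewards"], [0]), 0), [0])
    logits = policy(episode["observations"])
    log_probs = torch.distributions.Categorical(logits=logits).log_prob(episode["actions"])
    loss = -(log_probs * returns).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    env = gym.make("CartPole-v1")
    policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(10):
        episode = collect_episode(env, policy)
        print(update_agent(policy, optimizer, episode))
```

In a Lightning setup, the `collect_episode` half is roughly what an RLDataModule (or a replay buffer wrapped in one) would own, and the `update_agent` half is roughly what would live in the LightningModule's `training_step`.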

Implementation

  • (optional, recommended): Write some tests for such an algorithm before writing the code for it (see the sketch after this list). (Also, it would be a good idea to crowdsource some ideas for good tests for RL algorithms from RL researchers at Mila.)
  • (optional) Write the tests for the RLDataModule and implement it.
  • Implement the algorithm
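
A rough sketch of the kind of tests I mean (pytest style). The import path `project.algorithms.rl_example` is made up, as are `collect_episode` / `update_agent` (from the sketch in the Design section); the point is the properties being checked, not the exact API:

```python
import copy

import gymnasium as gym
import pytest
import torch
from torch import nn

# Hypothetical module; adjust the import to wherever the real implementation lives.
from project.algorithms.rl_example import collect_episode, update_agent


@pytest.fixture
def env() -> gym.Env:
    return gym.make("CartPole-v1")


@pytest.fixture
def policy() -> nn.Module:
    torch.manual_seed(123)
    return nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))


def test_collected_episode_has_consistent_shapes(env: gym.Env, policy: nn.Module):
    # All collected tensors should cover the same number of steps.
    episode = collect_episode(env, policy)
    num_steps = episode["rewards"].shape[0]
    assert num_steps > 0
    assert episode["observations"].shape == (num_steps, 4)
    assert episode["actions"].shape == (num_steps,)


def test_update_changes_the_policy_parameters(env: gym.Env, policy: nn.Module):
    # A single update on real experience should actually move the weights.
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    params_before = copy.deepcopy(list(policy.parameters()))
    episode = collect_episode(env, policy)
    update_agent(policy, optimizer, episode)
    assert any(
        not torch.equal(before, after)
        for before, after in zip(params_before, policy.parameters())
    )
```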

Validation

  • Make sure that all tests pass, that the algo is reproducible, and that it uses resources effectively (see the reproducibility sketch below).
  • Show the algo to some RL researchers at Mila and make sure that everything looks good to them. (Ideally we'd depend as much as possible on external libraries for the algo, but our portion of it should still be checked.)
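
For the reproducibility part, a minimal check could look something like this (again reusing the hypothetical `collect_episode` / `update_agent` names): two short training runs with the same seed should produce identical policies.

```python
import gymnasium as gym
import torch
from torch import nn

# Hypothetical module; adjust the import to wherever the real implementation lives.
from project.algorithms.rl_example import collect_episode, update_agent


def train_once(seed: int, num_episodes: int = 5) -> list[torch.Tensor]:
    """Run a short training loop with a fixed seed and return the final weights."""
    torch.manual_seed(seed)
    env = gym.make("CartPole-v1")
    env.reset(seed=seed)
    policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(num_episodes):
        episode = collect_episode(env, policy)
        update_agent(policy, optimizer, episode)
    return [p.detach().clone() for p in policy.parameters()]


def test_training_is_reproducible():
    # Same seed -> bitwise-identical final parameters (on CPU).
    run_1 = train_once(seed=42)
    run_2 = train_once(seed=42)
    assert all(torch.equal(a, b) for a, b in zip(run_1, run_2))
```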