Develop new learning algorithm: PPO #239

nick-harder · 2023-11-06T09:14:15Z

We should start working on a new DRL algorithm based on the MA PPO algorithm. It promises significant speed improvements and would address the criticism of the centralized critic approach.

nick-harder · 2024-06-20T07:26:13Z

This task has been given low priority, as other issues need to be addressed first.

kim-mskw · 2024-11-06T09:46:56Z

The general structure is ready on the PPO branch and runnable with one gradient step. However, the convergence for a single agent seems to get stuck at extreme values, so nothing too valuable is learned.

nick-harder assigned nick-harder and kim-mskw Nov 6, 2023

nick-harder added the feature and enhancement labels Nov 6, 2023

kim-mskw linked a pull request Oct 28, 2024 that will close this issue

Algorithmic Expansion to PPO #462

Draft

10 tasks
