We should start working on a new DRL algorithm based on the multi-agent (MA) PPO algorithm. It promises significant speed improvements and would address the criticism of the centralized critic approach.
The general structure is ready on the PPO branch and runs with one gradient step. However, convergence for a single agent seems to get stuck at extreme values, so nothing useful is learned yet.
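For reference while debugging the single gradient step, here is a minimal sketch of the standard PPO clipped surrogate loss in plain Python. It is not the code on the PPO branch: the function name `ppo_clip_loss`, its arguments, and the default `clip_eps=0.2` are assumptions made for illustration only.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (a value to minimize) over a batch.

    All names and the default clip_eps are illustrative assumptions,
    not taken from the PPO branch.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic choice between the unclipped and clipped objective.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize while PPO maximizes the surrogate.
    return -total / len(advantages)
```

Because the ratio is clipped, a sample with positive advantage stops contributing gradient once the ratio exceeds `1 + clip_eps`. Checking that the ratio equals 1 on the first gradient step (new and old log-probabilities identical) is a cheap sanity test before looking for other causes of the extreme values.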