Nice, I finally found a project that updates the policy using the reward from the discriminator, matching the algorithm in the GAIL paper. Many other libraries just use the reward from the environment instead. I was wondering why they do that, and whether optimizing the policy with a reward detached from the discriminator can really maximize the objective function.
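The core point, that the policy's reward signal in GAIL comes from the discriminator rather than the environment, can be sketched as below. This is an illustrative NumPy sketch of the common −log(1 − D(s, a)) surrogate reward, not this project's actual code; the function name and shapes are assumptions.

```python
import numpy as np

def gail_reward(d_logits: np.ndarray) -> np.ndarray:
    """Surrogate reward from discriminator logits: r = -log(1 - D(s, a)).

    D(s, a) = sigmoid(logit) is the probability the discriminator assigns
    to (s, a) being an expert transition. Higher D yields higher reward,
    so the policy is pushed toward expert-like behavior; the environment
    reward is never used.
    """
    d = 1.0 / (1.0 + np.exp(-d_logits))            # D(s, a) in (0, 1)
    return -np.log(np.clip(1.0 - d, 1e-8, None))   # clip for numerical stability

# Transitions the discriminator rates as more expert-like get larger rewards.
logits = np.array([-2.0, 0.0, 2.0])
rewards = gail_reward(logits)
assert rewards[0] < rewards[1] < rewards[2]
```

The policy optimizer (e.g. TRPO or PPO, as in the GAIL paper) then treats these values as the per-step rewards when estimating advantages.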