Question on using reward from discriminator instead for policy update #10

guoyangqin · 2022-11-22T12:38:04Z

Nice, I finally found a project that updates the policy using the reward from the discriminator, in line with the algorithm in the GAIL paper. Many other libraries just use the reward from the environment. I was wondering why they do that, and whether optimizing the policy with an environment reward that is detached from the discriminator can really maximize the GAIL objective.

Charlesyyun · 2023-04-14T02:14:44Z

Exactly, that's why we need GAIL: we don't know the real reward function.
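
For readers following the thread, here is a minimal sketch of what "reward from the discriminator" can look like in practice. It is not this repository's code: the linear discriminator, the names `discriminator_prob` and `gail_reward`, and the random data are all hypothetical, and it assumes the convention where D(s, a) is the probability that a pair came from the expert, with surrogate reward -log(1 - D). Sign and labeling conventions differ between implementations, so treat this only as an illustration of the idea.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def discriminator_prob(weights, states, actions):
    """Hypothetical linear discriminator: P(expert | s, a) for each pair."""
    features = np.concatenate([states, actions], axis=1)
    return sigmoid(features @ weights)


def gail_reward(d_prob, eps=1e-8):
    """Surrogate reward -log(1 - D), assuming D = P(expert | s, a).

    eps keeps the logarithm finite when D is very close to 1.
    """
    return -np.log(1.0 - d_prob + eps)


# Stand-in rollout data (random, for illustration only).
rng = np.random.default_rng(0)
states = rng.normal(size=(5, 3))
actions = rng.normal(size=(5, 2))
weights = rng.normal(size=5)        # assumed discriminator parameters
env_rewards = rng.normal(size=5)    # what env.step would return; never used below

# The policy update would consume these rewards instead of env_rewards.
d_prob = discriminator_prob(weights, states, actions)
rewards = gail_reward(d_prob)
```

The point of the sketch is that `env_rewards` never enters the reward passed to the policy update: pairs the discriminator scores as more expert-like get a larger surrogate reward, which is the signal the policy is trained on when the true reward function is unknown.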
