Question on using reward from discriminator instead for policy update #10

Open
guoyangqin opened this issue Nov 22, 2022 · 1 comment

Comments


guoyangqin commented Nov 22, 2022

Nice, I finally found a project that updates the policy using the reward from the discriminator, which matches the algorithm in the GAIL paper. Many other libraries just use the reward from the environment. I was wondering why they do that, and whether optimizing the policy with a reward that is detached from the discriminator can really maximize the GAIL objective function.

@Charlesyyun

Exactly, that's why we need GAIL: we don't know the real reward function.
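To make the point above concrete, here is a minimal sketch of how a discriminator-derived reward can stand in for the environment reward during the policy update. It is not taken from this repository: the logistic discriminator, the function names, and the toy rollout are all hypothetical, and `-log(1 - D)` is only one common surrogate-reward convention (sign and labeling conventions vary between implementations).

```python
import math

def discriminator_prob(weights, bias, features):
    """Hypothetical logistic discriminator: P((s, a) came from the expert)."""
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

def gail_reward(d_expert_prob, eps=1e-8):
    """Surrogate reward -log(1 - D): larger when the pair looks expert-like."""
    return -math.log(max(1.0 - d_expert_prob, eps))

def discounted_returns(rewards, gamma=0.99):
    """Discounted return-to-go for each step of a single rollout."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

# Toy rollout: each entry is an assumed (state, action) feature vector.
weights, bias = [1.0, -0.5], 0.0
rollout = [[0.2, 0.1], [1.5, 0.3], [-1.0, 0.8]]
env_rewards = [0.0, 1.0, 0.0]  # may be logged for evaluation, never used for the update

# The policy update consumes only the discriminator-derived signal.
surrogate = [gail_reward(discriminator_prob(weights, bias, f)) for f in rollout]
returns = discounted_returns(surrogate)
```

In a full training loop, `returns` (or advantages computed from `surrogate`) would feed the policy-gradient step, while the discriminator is trained separately on expert versus policy samples.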
