Potential bug during training? #11
Did that work out for you? I found my actor loss unable to converge.
Yes, it did, although I was running it on environments with discrete states and actions. Which env are you using?
@liubaoryol It is great to hear that you got it working with a discrete action space! Could you please share your code? I think it would be valuable, since several people here have already asked about discrete action support. Thanks in advance!
Of course! Let me clean it up and I'll share it next week :)
I'm interested in the implementation for discrete action support too. :)
reward = -logsigmoid(-logits) = -log[1 - sigmoid(logits)] = -log(1 - D), so maximizing this reward corresponds to the generator G's objective of minimizing log(1 - D).
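A quick numerical check of that identity, as a plain-Python sketch. The helper names (sigmoid, logsigmoid, reward_from_logits) are illustrative and not taken from the repository; in PyTorch the same quantity would be written with torch.nn.functional.logsigmoid as `-F.logsigmoid(-logits)`.

```python
import math


def sigmoid(x: float) -> float:
    """D = sigmoid(logits): the discriminator's probability that a transition is expert data."""
    return 1.0 / (1.0 + math.exp(-x))


def logsigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # Split on the sign of x so math.exp never overflows.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_from_logits(logits: float) -> float:
    """Hypothetical helper: reward = -logsigmoid(-logits) = -log(1 - D)."""
    return -logsigmoid(-logits)


for logits in (-3.0, 0.0, 2.5):
    d = sigmoid(logits)
    r = reward_from_logits(logits)
    # Both forms of the reward agree: -logsigmoid(-logits) == -log(1 - D)
    assert math.isclose(r, -math.log(1.0 - d), rel_tol=1e-9)
    print(f"logits={logits:+.1f}  D={d:.4f}  reward={r:.4f}")
```

Since 0 < D < 1, this reward is always positive and grows as the discriminator grows more confident that a transition came from the expert. Computing it through logsigmoid rather than log(1 - sigmoid(...)) avoids taking the log of a value that rounds to zero when D is close to 1.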
Is there a reason you calculate the reward the way you do in line 69 of gail-airl-ppo.pytorch/gail_airl_ppo/algo/airl.py (commit 4e13a23)?
My models were able to learn after I changed that line to
This gives the unshaped rewards.
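The replacement line itself is not preserved in this thread. The following is therefore only a sketch of what "unshaped" means under the standard AIRL formulation, where the discriminator logit is f(s, s') - log π(a|s) and f(s, s') = g(s) + γh(s') - h(s). It compares three candidate policy rewards for a single transition. All function and argument names are hypothetical rather than taken from airl.py, and masking h(s') at terminal states is an assumption.

```python
import math


def logsigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def airl_f(g_s: float, h_s: float, h_next: float, gamma: float, done: bool) -> float:
    """Shaped term f = g(s) + gamma * h(s') - h(s).

    Assumption: h(s') is dropped at terminal transitions (done=True).
    """
    return g_s + gamma * (0.0 if done else h_next) - h_s


def reward_candidates(g_s, h_s, h_next, gamma, done, log_pi):
    """Hypothetical helper returning three choices of policy reward."""
    f = airl_f(g_s, h_s, h_next, gamma, done)
    logits = f - log_pi  # D = sigmoid(f - log pi)
    return {
        # -log(1 - D): the shaped form derived in the earlier comment
        "neg_log_one_minus_d": -logsigmoid(-logits),
        # log D - log(1 - D), which simplifies to f - log pi
        "log_d_minus_log_one_minus_d": logsigmoid(logits) - logsigmoid(-logits),
        # g(s) alone: the unshaped reward estimate
        "unshaped_g": g_s,
    }


# Usage with made-up values for one non-terminal transition
r = reward_candidates(g_s=0.5, h_s=1.0, h_next=1.2, gamma=0.99, done=False, log_pi=-1.0)
print(r)
```

Using g(s) alone removes both the shaping term γh(s') - h(s) and the -log π term from the signal the policy is trained on. That may explain why learning behaved differently after the change, but this thread does not confirm it.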