Wrong value in call to F.softmax #2

Open
marioyc opened this issue Sep 14, 2020 · 4 comments
Comments

marioyc commented Sep 14, 2020

Should F.softmax(Q_targets_next, dim=1) be F.softmax(Q_targets_next / entropy_tau, dim=1) instead?

Owner

BY571 commented Sep 14, 2020

For DQN it's only Q_targets_next:

[image]

but for IQN you are right :)

Author

marioyc commented Sep 15, 2020

Oh, I didn't notice that. It seems to contradict equation 2, and it would also change the logsumexp calculations, since those assume the Q-values are divided by entropy_tau.
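
To make the consistency point concrete, here is a minimal pure-Python sketch (not the repository's code). It assumes the policy is a softmax over Q-values divided by entropy_tau, and that the scaled log-policy is computed with a max-subtracted logsumexp. The helper names `softmax` and `scaled_log_policy`, the Q-values, and the temperature 0.5 are all made up for illustration.

```python
import math

def softmax(values, temperature=1.0):
    """Softmax over values / temperature, stabilised by subtracting the max."""
    scaled = [v / temperature for v in values]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_log_policy(values, temperature):
    """temperature * log softmax(values / temperature), via logsumexp."""
    v_max = max(values)
    lse = math.log(sum(math.exp((v - v_max) / temperature) for v in values))
    return [v - v_max - temperature * lse for v in values]

q_next = [1.0, 2.0, 3.0]  # hypothetical Q-values for a single next state
entropy_tau = 0.5         # illustrative temperature, not a recommended setting

pi_plain = softmax(q_next)             # analogue of F.softmax(Q_targets_next, dim=1)
pi_tau = softmax(q_next, entropy_tau)  # analogue of F.softmax(Q_targets_next / entropy_tau, dim=1)
tau_log_pi = scaled_log_policy(q_next, entropy_tau)

# The logsumexp form only matches the policy that was divided by entropy_tau.
for p, tlp in zip(pi_tau, tau_log_pi):
    assert math.isclose(entropy_tau * math.log(p), tlp, abs_tol=1e-9)
```

With these inputs the temperature-scaled policy puts more mass on the largest Q-value than the plain softmax does, and only the scaled version agrees with the logsumexp term, so mixing the two would make the target inconsistent.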

Author

marioyc commented Sep 18, 2020

Confirmed with the author that it is a typo: the values should be divided by entropy_tau.
There is also a TF implementation here: https://github.com/google-research/google-research/tree/master/munchausen_rl

Owner

BY571 commented Sep 18, 2020

@marioyc Thank you! I'll fix it :)
