Bugfix policy gradient reinforce tf2 #29
Open
Hi,
there is a bug in policy_gradient_reinforce_tf2.py at line 39:
loss = network.train_on_batch(states, discounted_rewards)
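For context, discounted_rewards in a REINFORCE script usually holds the discounted return G_t of each step in an episode. The sketch below shows one common way to compute it. The helper name discount_rewards, the default gamma = 0.99 and the normalisation step are assumptions for illustration, not necessarily what policy_gradient_reinforce_tf2.py does.

```python
# Hypothetical helper (assumption): per-step discounted returns for REINFORCE.
# gamma and the normalisation step are illustrative choices only.


def discount_rewards(rewards, gamma=0.99, normalize=True):
    """Return G_t = r_t + gamma * r_{t+1} + gamma**2 * r_{t+2} + ... for each t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses the already computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    if normalize and len(returns) > 1:
        # Standardise to zero mean and unit std to reduce gradient variance.
        mean = sum(returns) / len(returns)
        std = (sum((g - mean) ** 2 for g in returns) / len(returns)) ** 0.5
        if std > 0:
            returns = [(g - mean) / std for g in returns]
    return returns


# CartPole gives a reward of 1 per step; with gamma=0.5 and no normalisation:
print(discount_rewards([1.0, 1.0, 1.0], gamma=0.5, normalize=False))
# → [1.75, 1.5, 1.0]
```

Normalising the returns is optional, but it keeps the per-sample weights on a similar scale across episodes of different lengths.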
To fix this, I made two changes:
one_hot_encode = np.array([[1 if a==i else 0 for i in range(2)] for a in actions])
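A common way to express the REINFORCE update with a Keras model compiled with a categorical cross-entropy loss is to train on one-hot action targets and pass the discounted returns as per-sample weights, e.g. network.train_on_batch(states, one_hot_encode, sample_weight=discounted_rewards). Whether that is exactly the second change in this PR is an assumption. The pure-Python sketch below (hypothetical helper names, made-up probabilities, no TensorFlow) shows why this works: with a one-hot target, cross-entropy reduces to -log pi(a_t | s_t), and weighting it by G_t gives the REINFORCE loss -mean_t[G_t * log pi(a_t | s_t)].

```python
import math

# Sketch (assumption): REINFORCE loss as return-weighted cross-entropy on
# one-hot action targets. Helper names and numbers are illustrative only.


def one_hot(actions, n_actions=2):
    # Same encoding as the one_hot_encode line, generalised to n_actions.
    return [[1 if a == i else 0 for i in range(n_actions)] for a in actions]


def categorical_crossentropy(target, probs, eps=1e-7):
    # -sum_i y_i * log p_i; with a one-hot y only the taken action contributes.
    return -sum(y * math.log(max(p, eps)) for y, p in zip(target, probs))


def reinforce_loss(action_probs, actions, returns):
    # Weight each step's cross-entropy by its discounted return, then average.
    targets = one_hot(actions, n_actions=len(action_probs[0]))
    per_step = [
        g * categorical_crossentropy(y, p)
        for y, p, g in zip(targets, action_probs, returns)
    ]
    return sum(per_step) / len(per_step)


probs = [[0.6, 0.4], [0.3, 0.7]]  # pi(. | s_t) from the policy network
actions = [0, 1]                  # actions actually taken
returns = [1.5, 1.0]              # discounted returns G_t
print(reinforce_loss(probs, actions, returns))
```

Minimising this loss raises the log-probability of actions that led to high returns, which is the policy-gradient direction; regressing the network's outputs directly onto the returns does not have that interpretation.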
I think it also resolves issues #26, #27 and #28.
I tested it on gym.make("CartPole-v0"), and it converged in 2000 episodes!