Problem Description
In the Q-learning implementations for Atari (DQN, C51, and QDAgger DQN, in both the JAX and PyTorch versions), the final epsilon value used during training (e.g. 0.01) differs from the epsilon value used during evaluation at the end (e.g. 0.05).
I believe this makes the Atari evaluations underestimate the agents' true performance, since the higher evaluation epsilon injects more random actions than the policy was trained to act with.
I don't think this affects the training curves, as we mostly compare episodic rewards rather than evaluation results, but we should fix it for users who compare the evaluation results.
This bug appears to have been introduced by copying code from the DQN agent, where 0.05 is the final epsilon.
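To illustrate the mismatch, here is a minimal sketch; the `linear_schedule` helper and the hyperparameter values are assumptions modeled on typical CleanRL-style code, not a verbatim quote. Training anneals epsilon linearly down to 0.01, while the end-of-training evaluation acts with a hardcoded 0.05:

```python
import random

def linear_schedule(start_e: float, end_e: float, duration: int, t: int) -> float:
    """Linearly anneal epsilon from start_e to end_e over `duration` steps."""
    slope = (end_e - start_e) / duration
    return max(slope * t + start_e, end_e)

def epsilon_greedy(q_values: list[float], epsilon: float) -> int:
    """Take a random action with probability epsilon, else the greedy action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# During training: epsilon ends at 0.01 (hypothetical hyperparameters).
TOTAL_TIMESTEPS = 10_000_000
train_epsilon = linear_schedule(1.0, 0.01, TOTAL_TIMESTEPS // 10, TOTAL_TIMESTEPS)
print(train_epsilon)  # 0.01

# During evaluation: a different, higher epsilon is hardcoded,
# apparently copied from the DQN agent's 0.05 value.
eval_epsilon = 0.05
```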
Checklist
I have installed dependencies via `poetry install` (see CleanRL's installation guideline).
Current Behavior
Agent policies are evaluated at a different and higher epsilon than the final epsilon used during training.
Expected Behavior
Agent policies should be evaluated with the final epsilon used during training.
Possible Solution
Modify all Q-learning agents so that the evaluation epsilon equals the final training epsilon, as sketched below.
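A minimal sketch of the fix, assuming a Gymnasium-style environment; the names `evaluate`, `model`, and `args.end_e` are illustrative, not CleanRL's exact API. Evaluation takes epsilon as an explicit argument, and the call site passes the final training epsilon instead of a hardcoded 0.05:

```python
import random

def evaluate(model, env, num_episodes: int, epsilon: float) -> list[float]:
    """Run epsilon-greedy evaluation episodes at the given epsilon."""
    episodic_returns = []
    for _ in range(num_episodes):
        obs, _ = env.reset()
        done, episodic_return = False, 0.0
        while not done:
            if random.random() < epsilon:
                action = env.action_space.sample()
            else:
                q_values = model(obs)  # hypothetical forward pass returning Q-values
                action = int(max(range(len(q_values)), key=lambda a: q_values[a]))
            obs, reward, terminated, truncated, _ = env.step(action)
            episodic_return += float(reward)
            done = terminated or truncated
        episodic_returns.append(episodic_return)
    return episodic_returns

# Call site: reuse the schedule's final value (e.g. args.end_e == 0.01)
# instead of the hardcoded 0.05:
# returns = evaluate(model, env, num_episodes=10, epsilon=args.end_e)
```

Making epsilon a required parameter, rather than a defaulted constant, forces each agent's call site to state which epsilon it evaluates at, which should prevent this kind of copy-paste mismatch from recurring.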