Is the expert data the real expert? #13

Open
Ericonaldo opened this issue Oct 20, 2022 · 4 comments

@Ericonaldo

I find that the expert dataset has some problems. For example, for the game 'asterix', I split the trajectories using the terminal flags, and the maximum return is only around 260. Could you please check this?

import gym
import d4rl_atari  # registers the Atari offline datasets with gym
import numpy as np

env = gym.make('asterix-expert-v0', stack=True)
dataset = env.get_dataset()

# Split trajectories at the terminal flags.
# Each trajectory covers [start, end), with its terminal step included.
traj_ends = np.where(dataset['terminals'] == 1)[0] + 1
traj_starts = np.concatenate(([0], traj_ends[:-1]))

# Sum the rewards of each trajectory to get its return
returns = [np.sum(dataset['rewards'][start:end])
           for start, end in zip(traj_starts, traj_ends)]

print(np.mean(returns), np.std(returns), np.max(returns))
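
A quick way to sanity-check terminal-based splitting is to run it on a tiny hand-made array where the returns are known in advance. The sketch below is only an illustration: `episode_returns` is a hypothetical helper (not part of d4rl-atari), and the toy arrays are made up. It assumes that a trailing run of steps with no terminal flag is an incomplete episode and drops it.

```python
import numpy as np


def episode_returns(rewards, terminals):
    """Sum rewards per episode, splitting after every terminal flag.

    Hypothetical helper for illustration. Steps after the last terminal
    flag are treated as an incomplete episode and ignored.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    terminals = np.asarray(terminals)
    # Exclusive end index of each episode (terminal step is included).
    ends = np.flatnonzero(terminals == 1) + 1
    # Each episode starts where the previous one ended; the first starts at 0.
    starts = np.concatenate(([0], ends[:-1]))
    return np.array([rewards[s:e].sum() for s, e in zip(starts, ends)])


# Toy data: two complete episodes followed by one unfinished step.
toy_rewards = [1, 0, 1, 1, 0, 1]
toy_terminals = [0, 0, 1, 0, 1, 0]
toy_returns = episode_returns(toy_rewards, toy_terminals)
print(toy_returns)
```

On this toy input the first episode spans steps 0 to 2 and the second spans steps 3 to 4, so both episode boundaries and the final complete episode can be checked by hand.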
@Ericonaldo
Author

It seems the RL Unplugged dataset uses clipped rewards.

@KeLiChloe

Hi, I ran into the same issue. Do you know how to scale the clipped rewards back to the real rewards? Thanks!

@Ericonaldo
Author

Not to my knowledge. No.

@takuseno
Owner

Sorry for the super late response. But yes, the rewards are clipped. Also, let me redirect you to this publication, since this repository simply relies on the dataset provided by Google.
https://arxiv.org/abs/1907.04543
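
To illustrate why clipped rewards cannot simply be scaled back to raw scores, here is a minimal sketch. It assumes the common DQN convention of clipping each reward to [-1, 1]; the raw reward values are invented for the example and are not actual Asterix scores. Two different raw reward sequences collapse to the same clipped sequence, so the raw return is not recoverable from the clipped data alone.

```python
import numpy as np

# Hypothetical raw per-step rewards for two different episodes.
raw_a = np.array([50, 0, 100, 50])
raw_b = np.array([500, 0, 50, 1000])

# Assumed clipping convention: every reward is clipped to [-1, 1].
clip_a = np.clip(raw_a, -1, 1)
clip_b = np.clip(raw_b, -1, 1)

print(raw_a.sum(), raw_b.sum())    # raw returns differ: 200 vs 1550
print(clip_a.sum(), clip_b.sum())  # clipped returns are both 3

# The clipped sequences are identical, so no per-reward scaling
# can tell the two episodes apart.
same_after_clipping = np.array_equal(clip_a, clip_b)
print(same_after_clipping)
```

Under this assumption, a clipped return only counts how many steps had a positive reward (minus those with a negative one), which would explain expert returns in the low hundreds rather than the raw game score.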
