I think the expert dataset has a problem. For example, for the game 'asterix', I split the trajectories using the terminal flags, and the maximum return is only around 260. Could you please check this?
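import gym
import numpy as np
import d4rl_atari  # assumed: the package that registers the '*-expert-v0' environments

game = 'asterix'
env = gym.make('{}-expert-v0'.format(game), stack=True)
dataset = env.get_dataset()
# Split trajectories at the terminal flags; the terminal step belongs to its own trajectory
traj_ends = np.where(dataset['terminals'] == 1)[0]
traj_start_ends = [(0, traj_ends[0] + 1)]
for i in range(len(traj_ends) - 1):
    traj_start_ends.append((traj_ends[i] + 1, traj_ends[i + 1] + 1))
rewards_list = list()
for traj_start, traj_end in traj_start_ends:
    rewards_list.append(np.array(dataset['rewards'][traj_start:traj_end]))
# Per-trajectory returns: report mean, standard deviation, and maximum
returns = [np.sum(r) for r in rewards_list]
print(np.mean(returns), np.std(returns), np.max(returns))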
Sorry for the super late response. But yes, the rewards are clipped. Also, let me redirect you to this publication, since this repository simply relies on the dataset provided by Google: https://arxiv.org/abs/1907.04543
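To illustrate why clipping makes the returns look so small, here is a minimal sketch. It assumes the clipping is the usual DQN-style clip to [-1, 1], which this thread does not confirm, and the raw reward values below are made up for illustration rather than taken from the dataset. The helper name `clipped_return` is hypothetical.

```python
import numpy as np


def clipped_return(raw_rewards, low=-1.0, high=1.0):
    """Sum of rewards after clipping each step's reward to [low, high].

    The [-1, 1] default is an assumption (standard DQN preprocessing),
    not something read from the dataset itself.
    """
    return float(np.sum(np.clip(raw_rewards, low, high)))


# Made-up per-step game scores for one short episode (illustrative only).
raw_rewards = np.array([50.0, 0.0, 50.0, 100.0, 0.0, 50.0])

raw_return = float(np.sum(raw_rewards))     # sum of the raw game scores
clip_return = clipped_return(raw_rewards)   # number of rewarded steps

print(raw_return, clip_return)
```

Under this assumption, a clipped return roughly counts how many steps gave a positive reward, so a value around 260 would not be comparable to the in-game score.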