Feat: Support latest Jumanji version #1134
base: develop
Thanks for this @WiemKhlifi! Just a few questions, but it looks mostly good to me.
As a sanity check can you please do a few test runs to just check that the system performance is unaffected?
Thanks Wiem, couple small things, mostly removing the git stuff from requirements.txt where possible
Thanks @WiemKhlifi. Some suggestions from my side.
mava/wrappers/jumanji.py
Outdated
# The environment returns a list of individual rewards and these are used as is.
return timestep.replace(observation=modified_observation)
# Whether or not aggregate the list of individual rewards.
reward = aggregate_rewards(timestep.reward, self.num_agents, self._use_individual_rewards)
I must admit I am not a massive fan of the `not` here, but I prefer it over having the conditional in the aggregation function. What do you think of just having the config option be `aggregate_rewards` instead of `use_individual_rewards`? Then we could change the conditional here to `if self._aggregate_rewards`.
- reward = aggregate_rewards(timestep.reward, self.num_agents, self._use_individual_rewards)
+ if not self._use_individual_rewards:
+     reward = aggregate_rewards(timestep.reward, self.num_agents)
eh either way I prefer this 😄
I chose the second option, using `aggregate_rewards` instead of the `not`, which is less confusing 😅
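For reference, a minimal sketch of the option settled on in this thread: a positively named `aggregate_rewards` flag, so the conditional needs no `not`. The `aggregate_rewards` helper below is hypothetical and assumes that aggregation means summing the per-agent rewards and giving every agent the team total. `RewardWrapper` and `process_rewards` are illustrative names, not the actual classes or methods in `mava/wrappers/jumanji.py`.

```python
from typing import List


def aggregate_rewards(rewards: List[float], num_agents: int) -> List[float]:
    """Hypothetical helper: sum per-agent rewards and give each agent the team total."""
    team_reward = sum(rewards)
    return [team_reward] * num_agents


class RewardWrapper:
    """Illustrative stand-in for the environment wrapper (not the real mava class)."""

    def __init__(self, num_agents: int, aggregate_rewards: bool = False) -> None:
        self.num_agents = num_agents
        # Positive flag name, so the check below reads without a `not`.
        self._aggregate_rewards = aggregate_rewards

    def process_rewards(self, rewards: List[float]) -> List[float]:
        if self._aggregate_rewards:
            return aggregate_rewards(rewards, self.num_agents)
        # Otherwise the individual rewards are used as is.
        return list(rewards)
```

Keeping the conditional in the wrapper rather than inside the helper means the helper does one thing only, which matches the preference expressed in the review comment.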
What?
Upgrade to the latest Jumanji version, `1.0.1` instead of `0.3.1`, and pin to the original and latest `Jumanji` and `Matrax`.
How?
`requirements.txt`: use the original versions instead of a fork.
Extra:
`super().__init__(env)` (the `self.__getattr__(env,name)` in the parent class can't get the attribute from env if it's defined with a different name in the env wrapper class). For example: