Env crashes with IndexError when using random actions #9
Running ray.tune.run(..) with max_failures=-1 helps, but it spends a lot of time on failed runs :(
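A minimal sketch of that workaround with the older ray.tune.run API (the trainer and env name below are illustrative placeholders, not taken from the actual run):

```python
# Sketch of the max_failures workaround (illustrative placeholders only).
import ray
from ray import tune

ray.init()
tune.run(
    "PPO",                        # any RLlib trainer
    config={"env": "jss_env"},    # hypothetical registered env name
    max_failures=-1,              # keep restarting a trial after worker crashes
    stop={"training_iteration": 100},
)
```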
Hi, the behavior you observed is normal. As the environment contains illegal actions depending on the state, you have to sample from the legal action vector. Using a parametric action space with RLlib requires a network that can mask such actions: https://docs.ray.io/en/latest/rllib/rllib-models.html#variable-length-parametric-action-spaces
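As a rough illustration of sampling only from the legal action vector (this assumes the observation is a dict with an "action_mask" entry; the exact keys may differ):

```python
import numpy as np

def sample_legal_action(obs):
    """Pick a random action among those flagged as legal in the mask."""
    mask = np.asarray(obs["action_mask"])   # 1 = legal, 0 = illegal (assumed layout)
    legal_indices = np.flatnonzero(mask)
    return int(np.random.choice(legal_indices))
```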
Hey, I didn't see that discussion, thanks for the hint. It looks like it can help me. I tried to make the environment more internally stable, so that it can handle any random action. (I'd like to use action masking as a speed-up, not as the only way to make it run.)

Nevertheless: the IndexOutOfBounds error is indeed a result of setting the action_space too large. It should be equal to self.jobs, not self.jobs + 1 ([0, self.jobs - 1] are the jobs, self.jobs is the Nope action, right?). I avoided the other error by adding

PS: Good work with the paper and the code! I'm very grateful that you went out of your way to publish your work here.
Thanks a lot for the interest and the compliment ;)
I've included this in the README; I hope it clarifies how to use it.
I don't recommend doing so, mainly because if you allow the agent to take random actions, it makes the problem harder.
Indeed, the environment allows for a Nope action. You're correct: [0, self.jobs - 1] are the jobs, and the last action is the Nope action.
In theory, this shouldn't have any impact, as the environment checks actions against the legal action vector before allowing them.
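A small sketch of the indexing and the legal-action check being described (attribute names are hypothetical, not the repository's exact code):

```python
from gym.spaces import Discrete

class SketchEnv:
    """Illustrative skeleton, not the repository's actual implementation."""

    def __init__(self, jobs):
        self.jobs = jobs
        # Actions 0 .. jobs-1 schedule a job; action == jobs is the Nope action.
        self.action_space = Discrete(jobs + 1)
        # Assumed: the env maintains a boolean legal-action vector of the same size.
        self.legal_actions = [True] * (jobs + 1)

    def step(self, action):
        # Reject illegal actions before indexing any job-sized arrays.
        assert self.legal_actions[action], f"action {action} is illegal in this state"
        # ... actual scheduling logic would go here ...
```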
Occasionally my environment (and thus my Ray workers) crashes at the beginning of training.
I observed two cases so far:
Obviously the random actions steer the environment into a bad place.
How did you handle this during your own training? Currently I can't train my agents because they crash when the env crashes. (I'm using Ray[rllib].)
Code to reproduce the error:
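Roughly, a random-action loop of this shape (the env id and instance path below are assumptions, not the exact snippet):

```python
# The kind of loop that triggers the crash; the env id and instance path
# are placeholders, not the exact values from the report.
import gym

env = gym.make("JSSEnv:jss-v1", env_config={"instance_path": "instances/ta01"})  # assumed id/path
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # purely random, may be illegal
    obs, reward, done, info = env.step(action)
```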