Hi, thank you so much for offering this code.
I read hybrid_sac_platform.py, and the paper says:
"If the continuous action ac must depend on the discrete action chosen by the agent, then ad can be used as input when computing µc and σc."
I think the continuous action does depend on the discrete action, but I could not find where ad is used to compute µc and σc.
Could you point me to where this dependency between the actions is implemented?
Thank you very much in advance!
Hi @Chenhait, I'm glad you're interested in the code.
I agree that for the Platform environment, the continuous actions depend on discrete actions. Namely, there are 3 discrete actions (run, hop, leap), each associated with 1 continuous component ("how much" to run/hop/leap).
Consider the case where, in the policy network, we have just one common set of μc and σc for all discrete actions. In this case, we would definitely need to condition it on the chosen discrete action.
However, in the implementation here, each discrete action has its own μc and σc head. Selecting a discrete action therefore implicitly selects its continuous parameters, so there is no need to feed ad back in as an input.
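To make the "separate heads" idea concrete, here is a minimal sketch (not the actual code from hybrid_sac_platform.py; the linear heads, random weights, and uniform discrete policy are placeholders) showing how indexing the per-action heads already makes (μc, σc) depend on ad:

```python
import numpy as np

# Illustrative sketch only: a hybrid policy with one (mu_c, sigma_c) head
# per discrete action. Weights are random placeholders; in practice they
# would come from a trained network.
rng = np.random.default_rng(0)

n_discrete = 3   # e.g. run, hop, leap in the Platform environment
feat_dim = 8     # size of a shared state embedding (placeholder)

# Separate continuous-parameter heads, one row per discrete action
W_mu = rng.normal(size=(n_discrete, feat_dim))
W_log_sigma = rng.normal(size=(n_discrete, feat_dim))

def sample_hybrid_action(state_features):
    """Pick a discrete action, then read off *its own* mu/sigma row.

    Because each discrete action has a dedicated (mu_c, sigma_c) head,
    the continuous parameters depend on a_d through indexing alone;
    a_d never needs to be fed back into the network as an input.
    """
    # All heads are computed from the same shared state features
    mu_all = W_mu @ state_features                 # shape (n_discrete,)
    sigma_all = np.exp(W_log_sigma @ state_features)

    a_d = int(rng.integers(n_discrete))            # placeholder discrete policy
    mu_c, sigma_c = mu_all[a_d], sigma_all[a_d]    # indexing = conditioning
    a_c = rng.normal(mu_c, sigma_c)                # continuous part for a_d
    return a_d, a_c, mu_c, sigma_c

a_d, a_c, mu_c, sigma_c = sample_hybrid_action(rng.normal(size=feat_dim))
```

With a single shared head instead, mu_all and sigma_all would collapse to one row, and the only way to recover the dependency would be to concatenate ad to the network input, which is the case the paper's sentence covers.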