High frequency in RPMs when including the action buffer in the observation space can cause problems on real hardware #212

piratax007 · 2024-05-10T10:20:09Z

First of all, congratulations on this wonderful repo.

I'm now training a policy to control a real Crazyflie. I'm using RPM as the action space instead of ONE_DIM_RPM, and it works in simulation except for what the RPM plot shows.

As you can see here:

[Figure: RPM plot]

the RPM signal has high-frequency oscillations that make the policy useless on a real drone.

I understand what you said in #180: "The main thing to note is that the observation contains the actions of the last .5 seconds, so increasing the ctrl freq will increase the obs space." and "The idea of the action buffer is that the policy might be better guided by knowing what the controller had done just before, the proportionality to the control frequency makes it dependent on the wall-clock only, and not the type of controller (but it might be appropriate to change that, depending on application).". Nevertheless, adding the action buffer to the observation space results in the high-frequency RPM signal shown above. If I remove the buffer and use only the states (12 inputs) as the observation space, the drone reaches the target position and orientation (I'm also controlling yaw) and the RPMs don't show the high-frequency oscillations reported.

Questions:

1. What is the difference between transferring the trained policy to a real drone with the action buffer in the observation space and without it?
2. I'm trying to add a low-pass filter to reduce the high frequency in the RPMs. Can you help me work out the best cut-off and sample frequencies for the filter?
3. In the SB3 documentation that you refer to, I cannot find anything about using this action buffer in the observation space, and I have some questions about it, such as how to determine the size of the buffer. As you said, the buffer's size is related to CTRL_FREQUENCY, but why? What does CTRL_FREQUENCY mean, and what is the relation between CTRL_FREQUENCY, PYB_FREQUENCY, and the time step? (I know that in BaseAviary.py line 481 you define the time step using PYB_FREQUENCY.)
4. At what frequency does the policy interact with the drone (send actions and receive observations and rewards)?
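To make the filter question concrete, here is a minimal sketch of a first-order (exponential) low-pass filter applied to a vector of RPM commands. It assumes the filter runs once per policy action, so the sample frequency is the rate at which actions are issued. The class name `LowPassFilter` and the numeric values in the usage note are hypothetical placeholders, not values from the repository; the only hard constraint shown is that the cut-off must stay below half the sample frequency (Nyquist).

```python
import math


class LowPassFilter:
    """First-order low-pass filter for a list of RPM commands (illustrative sketch)."""

    def __init__(self, cutoff_hz, sample_hz):
        # The cut-off must lie strictly below the Nyquist frequency (sample_hz / 2).
        if not 0.0 < cutoff_hz < sample_hz / 2.0:
            raise ValueError("cutoff_hz must be in (0, sample_hz / 2)")
        dt = 1.0 / sample_hz                      # time between two actions
        rc = 1.0 / (2.0 * math.pi * cutoff_hz)    # filter time constant
        self.alpha = dt / (rc + dt)               # smoothing factor in (0, 1)
        self.state = None                         # last filtered output

    def __call__(self, rpms):
        if self.state is None:
            # Initialise on the first sample to avoid a start-up transient.
            self.state = list(rpms)
        else:
            self.state = [s + self.alpha * (x - s)
                          for s, x in zip(self.state, rpms)]
        return list(self.state)
```

For example, `LowPassFilter(cutoff_hz=5.0, sample_hz=30.0)` (placeholder numbers) would be called once per action on the four RPM values. A lower cut-off smooths more but adds more lag between the policy's command and the motors, which is one reason a filter can change closed-loop behaviour.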

Thanks for your time.


JacopoPan · 2024-06-02T15:02:48Z

Hi @piratax007, apologies for the late answer.

1. The buffer is something I introduced only later to make the training examples faster (and because it's a feature you see in other similar drone RL works); it's not something intended to facilitate sim2real transfer.
2. I would start attempting sim2real with the smoother policy first. I'm a bit skeptical about deploying a very noisy controller plus a filter on real hardware.
3. What I refer to as the action buffer has nothing to do with SB3 per se: it is simply the concatenation of the actions commanded to the environment in the last N seconds to the observation the environment returns. Its size depends on CTRL_FREQUENCY because the higher the CTRL_FREQUENCY, the more actions will have been sent in the last N seconds.
4. CTRL_FREQUENCY is the frequency at which the policy interacts with the drone; PYB_FREQUENCY is the frequency at which Bullet is called to update the state of the simulation (it must be a multiple of CTRL_FREQUENCY).
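The relationships described above can be sketched as a few lines of arithmetic. This is an illustration only, not the repository's code: the function names are hypothetical, the 0.5 s window and the 12 state inputs come from this thread, and the 4 action values per step (one RPM per motor) and the example frequencies are assumptions.

```python
def action_buffer_length(ctrl_freq_hz, window_s):
    """Number of past actions stored when the buffer spans `window_s` seconds."""
    return int(ctrl_freq_hz * window_s)


def observation_size(state_dim, action_dim, ctrl_freq_hz, window_s):
    """State inputs plus every buffered action, flattened into one vector."""
    return state_dim + action_dim * action_buffer_length(ctrl_freq_hz, window_s)


def physics_steps_per_action(pyb_freq_hz, ctrl_freq_hz):
    """How many simulation updates happen between two policy actions."""
    if pyb_freq_hz % ctrl_freq_hz != 0:
        raise ValueError("PYB_FREQUENCY must be a multiple of CTRL_FREQUENCY")
    return pyb_freq_hz // ctrl_freq_hz


# Assumed example values: 30 Hz control, 240 Hz physics, 0.5 s buffer window.
buffer_len = action_buffer_length(30, 0.5)
obs_dim = observation_size(12, 4, 30, 0.5)
steps = physics_steps_per_action(240, 30)
physics_dt = 1.0 / 240  # simulation time step in seconds
```

Under these assumptions, doubling the control frequency doubles the buffer length and therefore grows the observation vector, while the physics time step stays fixed by PYB_FREQUENCY.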

piratax007 · 2024-06-04T13:19:03Z

Thanks for your time and answer.

I removed the action buffer from the observation space and trained using curriculum learning; the training process takes a couple of hours for a complicated task.

Now I'm trying sim2real using a Crazyflie 2.x and a Vicon system. I'm having issues because the observations from Vicon are noisy, while the policy was trained in a perfect (noise-free) environment. Any suggestions for the sim2real transfer part?
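To illustrate the mismatch described above, one commonly used idea (not something proposed by the maintainer in this thread) is to perturb the simulated observations with noise during training so the policy does not rely on perfect state estimates. The sketch below is a hedged, library-free illustration: `add_observation_noise` and the per-component standard deviations are hypothetical and would need to be matched to the measured noise of the real tracking system.

```python
import random


def add_observation_noise(obs, sigma, rng=random):
    """Return a copy of `obs` with zero-mean Gaussian noise added per component.

    `sigma` holds one standard deviation per observation component (assumed values).
    """
    if len(obs) != len(sigma):
        raise ValueError("obs and sigma must have the same length")
    return [x + rng.gauss(0.0, s) for x, s in zip(obs, sigma)]


# Usage: a seeded generator keeps training runs reproducible.
rng = random.Random(0)
clean_obs = [0.0] * 12      # 12 state inputs, as in this thread
sigma = [0.01] * 12         # placeholder noise levels, not measured values
noisy_obs = add_observation_noise(clean_obs, sigma, rng)
```

In a training loop this function would be applied to each observation before it is passed to the policy, leaving the simulator's internal state untouched.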

Best regards.

JacopoPan added the "question" label (Further information is requested) on Jun 2, 2024
