Robot Rolling in New Gazebo Environment and TensorBoard Axis Labels Issue #157

zahra370 opened this issue Jul 29, 2024 · 18 comments

@zahra370

I've started training in a different Gazebo environment, but my robot slips and rolls uncontrollably and doesn't reach the goal position. What could be the issue? Also, how can I set the x-axis and y-axis labels for a TensorBoard plot?

@reiniscimurs
Owner

Hi,

Please provide a more detailed description of your issue.
In any case, check that your robot has spawned properly and can actually move, as in: #116

AFAIK there is no built-in way to do that, but support for that is better asked on the TensorBoard forums.
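For what it's worth, a common workaround (outside TensorBoard itself) is to export the logged scalars and plot them with matplotlib, where axis labels can be set freely. A minimal sketch, assuming the event files live under `./runs` and the scalar was logged under the tag `loss` (both hypothetical names):

```python
import matplotlib.pyplot as plt
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Load the TensorBoard event files from the (hypothetical) log directory.
acc = EventAccumulator("./runs")
acc.Reload()

# Pull out the scalar series logged under the (hypothetical) tag "loss".
events = acc.Scalars("loss")
steps = [e.step for e in events]
values = [e.value for e in events]

# Plot with matplotlib so the axes can be labeled however you like.
plt.plot(steps, values)
plt.xlabel("Training iteration")
plt.ylabel("Loss")
plt.title("TD3 loss")
plt.show()
```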

@zahra370
Author

Thank you for your quick response. I have resolved the previous problems. Now I'm facing a few issues. I have trained your model and observed that the loss function graph does not exhibit the ideal trend we typically aim for. What could be the potential causes of this? Please help me achieve the desired downward slope of the loss function.

Below is the loss curve from training TD3. It initially decreases toward zero, but then it starts to increase again.
[image: TD3 training loss curve]

Normally we see an ideal downward trend for the loss function, like this:
[image: example of an ideal loss curve]

I would appreciate your help and time.

@reiniscimurs
Owner

That kind of loss curve is correct for other training methods, but it does not work that way here. See the discussion in: #89 (comment)

@zahra370
Author

zahra370 commented Aug 1, 2024

Thank you, I understand. I attempted to train a TD3 model in different environments, but the reward is diverging.

Gazebo world:
[image: Gazebo world]
[image]
[image]

@reiniscimurs
Owner

@zahra370
Author

I’m encountering an issue where my robot pauses for about 15 to 16 seconds at the end of each episode before the environment resets. I discovered that transformation errors are causing this delay, which in turn increases the training time. I’m unsure how to reduce these errors.

@reiniscimurs
Owner

Could it be that it is the other way around? That the delay of 13 seconds causes the tf issues due to de-syncing?

My only other guess is to check that your gmapping (I suggest using slam_toolbox, as it has better SLAM performance) is not affecting anything here. I would start there, as that is the variable that has changed and could have a significant impact on the tf tree and transforms.

@zahra370
Author

Thanks for the advice. I'll look into it.

@zahra370
Author

I've found that the delays at the end of each episode are due to the train function taking 30 to 50 seconds to complete before the Gazebo world resets. Could you help me figure out how to reduce this time?

@reiniscimurs
Owner

After every episode we run training. This is the actual deep learning part; before that we simply collect samples. Usually it would take a couple of seconds if you are using CUDA, so 30 to 50 seconds seems quite long to me. However, during this time Gazebo is paused and no desync should happen. Have you increased your state size, batch size, or number of training iterations such that 50 seconds would be necessary? Are you using a GPU to train, and does your computer have enough resources to maintain sync and train at the same time?

If I recall correctly, you would have to call SLAM as a separate service. If you are not pausing the SLAM service while pausing everything else, this could lead to timing issues.
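For reference, a minimal sketch of the pause/train/unpause pattern described above, using the standard gazebo_ros pause services; the `network.train(...)` call, `replay_buffer`, and iteration counts are hypothetical placeholders rather than this repo's exact API:

```python
import rospy
from std_srvs.srv import Empty

# Standard gazebo_ros services for pausing/unpausing simulation physics.
pause = rospy.ServiceProxy("/gazebo/pause_physics", Empty)
unpause = rospy.ServiceProxy("/gazebo/unpause_physics", Empty)

def train_between_episodes(network, replay_buffer, iterations=500, batch_size=40):
    """Pause the simulation, run the gradient updates, then resume."""
    rospy.wait_for_service("/gazebo/pause_physics")
    pause()  # nothing moves (and tf should not drift) while we train
    try:
        network.train(replay_buffer, iterations, batch_size)  # hypothetical signature
    finally:
        rospy.wait_for_service("/gazebo/unpause_physics")
        unpause()  # resume simulation for the next episode
```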

@zahra370
Author

I've noticed that the TD3 model tends to rotate excessively near the goal position and struggles to reach it, as if it's getting stuck in a local minimum. How can this behavior be prevented?

@reiniscimurs
Owner

You should provide a full description of the problem you are facing. It is difficult to say anything without extra information.

@zahra370
Author

I have trained the TD3 model with a dense reward function and SLAM on your TD3.world and tested it in a narrow corridor environment. In most cases, it reaches the goal position easily, but it struggles in two situations:
1- If the goal position is close to the walls, the robot rotates in place and doesn't reach the goal. I think this is because, in the TD3.world, there is a significant safe distance around the walls. During testing, if this distance is reduced, the robot is unable to reach the goal.

TD3_rotation_near_goal.1.mp4

2- I have trained the TD3 model for only 14k epochs and tested it in local-minima environments. Its performance in narrow corridors was quite good, but it faced significant difficulty in the local-minima environments.

TD3_local_minima-2024-09-06_10.03.36.online-video-cutter.com.1.mp4

Could you explain why the robot rotates in place? Is it because the robot gets stuck in local minima? How can I reduce this issue?

Also, how can I generalize this model to work effectively in unseen environments?

Narrow corridor world:
[image: narrow corridor world]

Local minima world:
[image: local minima world]

@reiniscimurs
Owner

reiniscimurs commented Sep 25, 2024

> 1- If the goal position is close to the walls, the robot rotates in place and doesn't reach the goal. I think this is because, in the TD3.world, there is a significant safe distance around the walls. During testing, if this distance is reduced, the robot is unable to reach the goal.

What do you mean by "safe distance around the walls"? We do not specify any safety distance. We do have a term in the reward function (r3) that gives a slight negative reward if the robot is too close to an object. This creates a bit of a potential field around obstacles. However, changing anything in the reward will not do anything at the testing stage, as the reward is not used for anything there anymore. Test-time behavior is entirely based on the Q-value estimation, which is trained from the reward during training.
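For context, the immediate reward in this repo is, as far as I recall, roughly of the following shape (a sketch from memory, not guaranteed to match the current code line for line); the r3 term is the slight penalty for small laser readings mentioned above:

```python
def get_reward(target, collision, action, min_laser):
    """Immediate reward: large bonus/penalty on reaching the goal or colliding,
    otherwise a shaped term that rewards forward motion and penalizes spinning
    and getting close to obstacles (the r3 'potential field' term)."""
    if target:
        return 100.0
    if collision:
        return -100.0
    # r3 penalty grows as the closest laser reading drops below 1 m
    r3 = (1.0 - min_laser) if min_laser < 1.0 else 0.0
    return action[0] / 2 - abs(action[1]) / 2 - r3 / 2
```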

The reason goals are not reached near walls is that the Q-value for going forward in such states is lower than for staying still. Think of how the Q-value is learned. Let's assume the robot has collected 100 experiences where it is 1 meter away from a wall. In only one of these experiences was there a goal point between the robot and the wall. If the robot goes forward, it will hit the wall 99 times out of 100 and receive a reward of -100, and only once a reward of +100. So your Q-value for such a state will be (-100 * 99 + 100 * 1) / 100 = -98. If the robot stands still, it will get a reward of (0 * 100) / 100 = 0. Since 0 > -98, the robot chooses the best Q-value for the state-action pair and chooses to stand still. I am heavily simplifying here, but it should give an overview: how the model chooses to act in this situation is entirely based on the learned Q-value.
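The back-of-the-envelope calculation above, written out (the counts and rewards are just the illustrative numbers from the paragraph, not anything measured):

```python
# Hypothetical experience counts near a wall (illustrative numbers only).
n_total = 100
n_goal = 1                     # goal happened to be between robot and wall
n_crash = n_total - n_goal     # all other forward moves ended in a collision

# Expected return of "go forward" vs. "stand still" in that state.
q_forward = (n_crash * -100 + n_goal * 100) / n_total   # -> -98.0
q_stand_still = (n_total * 0) / n_total                  # -> 0.0

# The greedy policy picks the action with the higher Q-value.
best_action = "stand still" if q_stand_still > q_forward else "go forward"
print(q_forward, q_stand_still, best_action)  # -98.0 0.0 stand still
```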

So the issue is not that the model is unsuitable for unseen environments; it actually is suitable, and it is solving the problem in an unseen environment exactly as it has learned to. It is just that your scenario does not suit the current model, and you need to find a way to shape the Q-value so that it can solve environments where goals are very close to the walls.
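One hedged idea along these lines (not something implemented in this repo) is a curriculum over goal placement: gradually allow training goals closer to obstacles as training progresses, so the replay buffer also contains successful near-wall goal experiences. In the sketch below, `sample_free_position` and `distance_to_nearest_obstacle` are hypothetical helpers you would have to provide for your own map:

```python
def sample_goal(epoch, max_epochs, sample_free_position, distance_to_nearest_obstacle):
    """Curriculum-style goal sampling: the required clearance to the nearest
    obstacle shrinks from 1.0 m to 0.3 m over training (illustrative values)."""
    min_clearance = max(0.3, 1.0 - 0.7 * epoch / max_epochs)
    while True:  # keep sampling until a goal with enough clearance is found
        x, y = sample_free_position()                    # hypothetical helper
        if distance_to_nearest_obstacle(x, y) >= min_clearance:
            return x, y
```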

@zahrakhan010

After training the TD3 model with this reward function for 424 epochs, the reward reached only 91.504. Given the number of epochs, I expected it to approach 200. I suspect that the lower reward might be due to the reduced weight assigned to certain components within the reward function. Could you suggest some ways to help increase the reward?
[image: custom reward function with weighted terms]

@reiniscimurs
Owner

Which reported reward do you mean here? The average reward at the end of the evaluation?

@zahrakhan010

Yes, I mean the average reward at the end of the evaluation.

@reiniscimurs
Owner

You would have to look at the actual individual values of each reward element. If the accumulated immediate reward is around 0, or even just slightly positive, then this would make sense.

Your w1 reward term could be around 0 on average. The w2 term will be 0 or negative. For w3, I don't know what dist_reward is, but the term should be negative (otherwise you would be encouraging the robot to stay away from the goal), so this term will be negative, and it even has a very high weight. I have no idea what the values for w4 would be.

So it looks to me that the immediate reward would not accumulate many positive values over the episode. Then, if you have some collisions, the average reward would be below 100.

I suggest simply debugging into this and checking the actual values you get.
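A minimal sketch of that kind of debugging, assuming your reward is a weighted sum of four terms named w1-w4 (the term names, weights, and log directory below are hypothetical placeholders); logging each component separately to TensorBoard makes it obvious which term dominates:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/reward_debug")  # hypothetical log directory

def get_reward_debug(terms, weights, step):
    """terms/weights: dicts keyed by 'w1'..'w4' holding the raw term values
    and their weights. Logs every component so divergence is easy to trace."""
    total = 0.0
    for name, value in terms.items():
        weighted = weights[name] * value
        total += weighted
        writer.add_scalar(f"reward/{name}_raw", value, step)
        writer.add_scalar(f"reward/{name}_weighted", weighted, step)
    writer.add_scalar("reward/total", total, step)
    return total
```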
