Robot Rolling in New Gazebo Environment and TensorBoard Axis Labels Issue #157

zahra370 opened this issue Jul 29, 2024 · 18 comments

@zahra370

I've started training in a different Gazebo environment, but my robot slips and rolls uncontrollably and doesn't reach the goal position. What could be the issue? Also, how can I set the x-axis and y-axis labels for a TensorBoard plot?

@reiniscimurs
Owner

Hi,

Please provide a more detailed description of your issue.
In any case, check that your robot has spawned properly and can actually move, as in: #116

AFAIK there is no built-in way to do that, but support for that is better asked on the TensorBoard forums.
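For what it's worth, a common workaround (outside TensorBoard itself) is to export the logged scalars and plot them with matplotlib, where axis labels can be set freely. A minimal sketch, assuming the event files live under `./runs` and the scalar was logged under the tag `loss` (both hypothetical names):

```python
import matplotlib.pyplot as plt
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Load the TensorBoard event files from the (hypothetical) log directory.
acc = EventAccumulator("./runs")
acc.Reload()

# Pull out the scalar series logged under the (hypothetical) tag "loss".
events = acc.Scalars("loss")
steps = [e.step for e in events]
values = [e.value for e in events]

# Plot with matplotlib so the axes can be labeled however you like.
plt.plot(steps, values)
plt.xlabel("Training iteration")
plt.ylabel("Loss")
plt.title("TD3 loss")
plt.show()
```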

@zahra370
Author

Thank you for your quick response. I have resolved the previous problems. Now I'm facing a few issues. I have trained your model and observed that the loss function graph does not exhibit the ideal trend we typically aim for. What could be the potential causes of this? Please help me achieve the desired downward slope of the loss function.

Below is the loss curve from training TD3. It initially decreases toward zero, but then it starts to increase again.
[image: TD3 training loss curve]

Normally we see an ideal downward trend for the loss function, like this:
[image: example of an ideal loss curve]

I would appreciate your help and time.

@reiniscimurs
Owner

That kind of loss curve is correct for other training methods, but it does not work that way here. See the discussion in: #89 (comment)

@zahra370
Author

zahra370 commented Aug 1, 2024

Thank you, I understand. I attempted to train a TD3 model in different environments, but the reward is diverging.

Gazebo world:
[image: Gazebo world]
[image]
[image]

@reiniscimurs
Owner

@zahra370
Author

I’m encountering an issue where my robot pauses for about 15 to 16 seconds at the end of each episode before the environment resets. I discovered that transformation errors are causing this delay, which in turn increases the training time. I’m unsure how to reduce these errors.

@reiniscimurs
Owner

Could it be that it is the other way around? That the delay of 13 seconds causes the tf issues due to de-syncing?

My only other guess is to check that your gmapping (I suggest using slam_toolbox, as it has better SLAM performance) is not affecting anything here. I would start there, as that is the variable that has changed and could have a significant impact on the tf tree and transforms.

@zahra370
Author

Thanks for the advice. I'll look into it.

@zahra370
Author

I've found that the delays at the end of each episode are due to the train function taking 30 to 50 seconds to complete before the Gazebo world resets. Could you help me figure out how to reduce this time?

@reiniscimurs
Owner

After every episode we run training. This is the actual deep learning part; before that we simply collect samples. Usually it would take a couple of seconds if you are using CUDA, so 30 to 50 seconds seems quite long to me. However, during this time Gazebo is paused and no desync should happen. Have you increased your state size, batch size, or number of training iterations such that 50 seconds would be necessary? Are you using a GPU to train, and does your computer have enough resources to maintain sync and train at the same time?

If I recall correctly, you would have to call SLAM as a separate service. If you are not pausing the SLAM service while pausing everything else, this could lead to timing issues.
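For reference, a minimal sketch of the pause/train/unpause pattern described above, using the standard gazebo_ros pause services; the `network.train(...)` call, `replay_buffer`, and iteration counts are hypothetical placeholders rather than this repo's exact API:

```python
import rospy
from std_srvs.srv import Empty

# Standard gazebo_ros services for pausing/unpausing simulation physics.
pause = rospy.ServiceProxy("/gazebo/pause_physics", Empty)
unpause = rospy.ServiceProxy("/gazebo/unpause_physics", Empty)

def train_between_episodes(network, replay_buffer, iterations=500, batch_size=40):
    """Pause the simulation, run the gradient updates, then resume."""
    rospy.wait_for_service("/gazebo/pause_physics")
    pause()  # nothing moves (and tf should not drift) while we train
    try:
        network.train(replay_buffer, iterations, batch_size)  # hypothetical signature
    finally:
        rospy.wait_for_service("/gazebo/unpause_physics")
        unpause()  # resume simulation for the next episode
```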

@zahra370
Author

I've noticed that the TD3 model tends to rotate excessively near the goal position and struggles to reach it, as if it's getting stuck in a local minimum. How can this behavior be prevented?

@reiniscimurs
Owner

You should provide a full description of the problem you are facing. It is difficult to say anything without extra information.

@zahra370
Author

I have trained the TD3 model with a dense reward function and SLAM on your TD3.world and tested it in a narrow corridor environment. In most cases, it reaches the goal position easily, but it struggles in two situations:
1- If the goal position is close to the walls, the robot rotates in place and doesn't reach the goal. I think this is because, in the TD3.world, there is a significant safe distance around the walls. During testing, if this distance is reduced, the robot is unable to reach the goal.

TD3_rotation_near_goal.1.mp4

2- I have trained the TD3 model for only 14k epochs and tested it in local-minima environments. Its performance in narrow corridors was quite good, but it faced significant difficulty in the local-minima environments.

TD3_local_minima-2024-09-06_10.03.36.online-video-cutter.com.1.mp4

Could you explain why the robot rotates in place? Is it because the robot gets stuck in local minima? How can I reduce this issue?

Also, how can I generalize this model to work effectively in unseen environments?

Narrow corridor world:
[image: narrow corridor world]

Local minima world:
[image: local minima world]

@reiniscimurs
Owner

reiniscimurs commented Sep 25, 2024

> 1- If the goal position is close to the walls, the robot rotates in place and doesn't reach the goal. I think this is because, in the TD3.world, there is a significant safe distance around the walls. During testing, if this distance is reduced, the robot is unable to reach the goal.

What do you mean by "safe distance around the walls"? We do not specify any safety distance. We do have a term in the reward function (r3) that gives a slight negative reward if the robot is too close to an object. This creates a bit of a potential field around obstacles. However, changing anything in the reward will not do anything at the testing stage, as the reward is not used for anything there anymore. Test-time behavior is entirely based on the Q-value estimation, which is trained from the reward during training.
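For context, the immediate reward in this repo is, as far as I recall, roughly of the following shape (a sketch from memory, not guaranteed to match the current code line for line); the r3 term is the slight penalty for small laser readings mentioned above:

```python
def get_reward(target, collision, action, min_laser):
    """Immediate reward: large bonus/penalty on reaching the goal or colliding,
    otherwise a shaped term that rewards forward motion and penalizes spinning
    and getting close to obstacles (the r3 'potential field' term)."""
    if target:
        return 100.0
    if collision:
        return -100.0
    # r3 penalty grows as the closest laser reading drops below 1 m
    r3 = (1.0 - min_laser) if min_laser < 1.0 else 0.0
    return action[0] / 2 - abs(action[1]) / 2 - r3 / 2
```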

The reason goals are not reached near walls is that the Q-value for going forward in such states is lower than for staying still. Think of how the Q-value is learned. Let's assume the robot has collected 100 experiences where it is 1 meter away from a wall. In only one of these experiences was there a goal point between the robot and the wall. If the robot goes forward, it will hit the wall 99 times out of 100 and receive a reward of -100, and only once a reward of +100. So your Q-value for such a state will be (-100 * 99 + 100 * 1) / 100 = -98. If the robot stands still, it will get a reward of (0 * 100) / 100 = 0. Since 0 > -98, the robot chooses the best Q-value for the state-action pair and chooses to stand still. I am heavily simplifying here, but it should give an overview: how the model chooses to act in this situation is entirely based on the learned Q-value.
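The back-of-the-envelope calculation above, written out (the counts and rewards are just the illustrative numbers from the paragraph, not anything measured):

```python
# Hypothetical experience counts near a wall (illustrative numbers only).
n_total = 100
n_goal = 1                     # goal happened to be between robot and wall
n_crash = n_total - n_goal     # all other forward moves ended in a collision

# Expected return of "go forward" vs. "stand still" in that state.
q_forward = (n_crash * -100 + n_goal * 100) / n_total   # -> -98.0
q_stand_still = (n_total * 0) / n_total                  # -> 0.0

# The greedy policy picks the action with the higher Q-value.
best_action = "stand still" if q_stand_still > q_forward else "go forward"
print(q_forward, q_stand_still, best_action)  # -98.0 0.0 stand still
```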

So the issue is not that the model is unsuitable for unseen environments; it actually is suitable, and it is solving the problem in an unseen environment exactly as it has learned to. It is just that your scenario does not suit the current model, and you need to find a way to shape the Q-value so that it can solve environments where goals are very close to the walls.
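One hedged idea along these lines (not something implemented in this repo) is a curriculum over goal placement: gradually allow training goals closer to obstacles as training progresses, so the replay buffer also contains successful near-wall goal experiences. In the sketch below, `sample_free_position` and `distance_to_nearest_obstacle` are hypothetical helpers you would have to provide for your own map:

```python
def sample_goal(epoch, max_epochs, sample_free_position, distance_to_nearest_obstacle):
    """Curriculum-style goal sampling: the required clearance to the nearest
    obstacle shrinks from 1.0 m to 0.3 m over training (illustrative values)."""
    min_clearance = max(0.3, 1.0 - 0.7 * epoch / max_epochs)
    while True:  # keep sampling until a goal with enough clearance is found
        x, y = sample_free_position()                    # hypothetical helper
        if distance_to_nearest_obstacle(x, y) >= min_clearance:
            return x, y
```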

@zahrakhan010

After training the TD3 model with this reward function for 424 epochs, the reward reached only 91.504. Given the number of epochs, I expected it to approach 200. I suspect that the lower reward might be due to the reduced weight assigned to certain components within the reward function. Could you suggest some ways to help increase the reward?
[image: custom reward function with weighted terms]

@reiniscimurs
Owner

Which reported reward do you mean here? The average reward at the end of the evaluation?

@zahrakhan010

Yes, I mean the average reward at the end of the evaluation.

@reiniscimurs
Owner

You would have to look at the actual individual values of each reward element. If the accumulated immediate reward is around 0, or even just slightly positive, then this would make sense.

Your w1 reward term could be around 0 on average. The w2 term will be 0 or negative. For w3, I don't know what dist_reward is, but the term should be negative (otherwise you would be encouraging the robot to stay away from the goal), so this term will be negative, and it even has a very high weight. I have no idea what the values for w4 would be.

So it looks to me that the immediate reward would not accumulate many positive values over the episode. Then, if you have some collisions, the average reward would be below 100.

I suggest simply debugging into this and checking the actual values you get.
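A minimal sketch of that kind of debugging, assuming your reward is a weighted sum of four terms named w1-w4 (the term names, weights, and log directory below are hypothetical placeholders); logging each component separately to TensorBoard makes it obvious which term dominates:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/reward_debug")  # hypothetical log directory

def get_reward_debug(terms, weights, step):
    """terms/weights: dicts keyed by 'w1'..'w4' holding the raw term values
    and their weights. Logs every component so divergence is easy to trace."""
    total = 0.0
    for name, value in terms.items():
        weighted = weights[name] * value
        total += weighted
        writer.add_scalar(f"reward/{name}_raw", value, step)
        writer.add_scalar(f"reward/{name}_weighted", weighted, step)
    writer.add_scalar("reward/total", total, step)
    return total
```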
