Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible memory leak (after dis-, then reconnect of Agent) #35

Open
gavanderhoorn opened this issue Jun 1, 2023 · 8 comments
Open

Possible memory leak (after dis-, then reconnect of Agent) #35

gavanderhoorn opened this issue Jun 1, 2023 · 8 comments
Labels
bug Something isn't working rv:0.1.0 Reported in MotoROS2 0.1.0 yrc1000

Comments

@gavanderhoorn
Copy link
Collaborator

As mentioned in the Known issues & limitations section of the README (here), MotoROS2 currently appears to suffer from a small memory leak.

Testing shows the following values for used and free memory after several consecutive connect and disconnect cycles with a Galactic micro-ROS Agent:

Initialization complete. Memory available: (962992) bytes. Memory in use: (85584) bytes
Initialization complete. Memory available: (962992) bytes. Memory in use: (85584) bytes
Initialization complete. Memory available: (962552) bytes. Memory in use: (86024) bytes
Initialization complete. Memory available: (962280) bytes. Memory in use: (86296) bytes
Initialization complete. Memory available: (962280) bytes. Memory in use: (86296) bytes
Initialization complete. Memory available: (962280) bytes. Memory in use: (86296) bytes
Initialization complete. Memory available: (962256) bytes. Memory in use: (86320) bytes
Initialization complete. Memory available: (962224) bytes. Memory in use: (86352) bytes

As a table:

# Free Delta In use Delta
0 962992 85584
1 962992 0 85584 0
2 962552 -440 86024 440
3 962280 -272 86296 272
4 962280 0 86296 0
5 962280 0 86296 0
6 962256 -24 86320 24
7 962224 -32 86352 32

No changes to the config file between reboots dis-and-re-connects.

It's unclear what causes this at this point.

@gavanderhoorn gavanderhoorn added bug Something isn't working rv:0.1.0 Reported in MotoROS2 0.1.0 yrc1000 labels Jun 1, 2023
@gavanderhoorn
Copy link
Collaborator Author

Summary of earlier findings/comments:

  • we could be handling message/service/action memory incorrectly. The micro-ROS documentation specifically states:

    If the init function is called twice for the same message without calling fini in between previously allocated memory will be leaked

    There's *_create()/*_destroy() and micro_ros_utilities_create_message_memory(..)/micro_ros_utilities_destroy_message_memory(..) in micro_ros_utils. Calling *_create() and then *_init() on sequence fields might also leak memory.

  • our libmicroros might be missing some fixes upstream already has: we're using micro-ROS/rmw_microxrcedds@a2a7484 in our Humble builds, but according to the commit history, a couple commits later Fix memory release on entities creation micro-ROS/rmw_microxrcedds#280 gets merged (as micro-ROS/rmw_microxrcedds@6df0003). From the description, it sounds like this could have something to do with the memory leak we're seeing.

@jimmy-mcelwain
Copy link
Collaborator

jimmy-mcelwain commented Nov 11, 2024

Just as an update to this, I am running iron, which includes the previously mentioned PR (micro-ROS/rmw_microxrcedds#280). But it seems to still have the same memory issues.

Here's a table showing the memory free and delta on my YRC1000 with Iron

# Free Delta
0 797368
1 796848 -520
2 796800 -48
3 796712 -88
4 796712 0
5 796752 +40
6 796752 0
7 796800 +48
8 796568 -232
9 796448 -120
10 796184 -264
11 796072 -112
12 796072 0
13 795976 -96
14 795920 -56
15 795928 +8
16 796000 +72
17 795888 -112
18 795792 -96
19 795744 -48

It is still unclear to me where the problem is. But it seems like at the very least, the problem is not solved by the linked PR. I can try to track it down. But with how inconsistent it is, it might be tough to tell what exactly is going wrong.

@jimmy-mcelwain
Copy link
Collaborator

jimmy-mcelwain commented Nov 13, 2024

On my very ugly branch, I have fixed seemingly all of the memory leaks from disconnect/reconnect cycles except for one. I believe that I have narrowed down the remaining leak to g_microRosNodeInfo.support. When it gets initialized, it allocates 368 bytes, but when it is freed it only frees 344 bytes. I don't know what is happening to the other 24 bytes. But every single disconnect/reconnect cycle (except for the very first one, where some memory is permanently allocated), the total loss is exactly 24 bytes across the whole program, so I am almost certain that this is the only remaining problem. I don't know if this is a problem upstream or a problem with how we are using it.

My branch is a branch off of iron_wip. There is no particular reason for that, I was just using iron at the time that I decided to try to fix this issue. The memory leak exists for both humble and iron, though.

If anybody has any ideas on how to fix this last leak, please let me know.

@jimmy-mcelwain
Copy link
Collaborator

I created a new branch off of main and copied over the relevant fix from my fork and made a PR. This will make it more clear what changes I made, and decrease the impact of the memory leak for the time being. The other leak that I mentioned in my previous comment is still present, though.

@jimmy-mcelwain
Copy link
Collaborator

I believe that I have found the second memory leak. The problem is seemingly this line in rcl/init.c. Memory is allocated by strdup, but that memory is never freed. I don't know whose responsibility it is to free that memory, but I added a single line that freed it to our rcl fork. I made a new branch with the change. Using libmicroros built with this change, the memory loss for every cycle other than the first one is 0.

I talked to Ted, and it seems to be an upstream problem, so I'm not suggesting that we merge the branch that I just made into our fork's main, but I wanted to share what I did to get rid of the leak. I think that I will create an issue upstream.

@gavanderhoorn
Copy link
Collaborator Author

Yes, that is an issue upstream. Seeing as the same code is in ros2/rcl, I would suggest posting there (if there isn't already an issue tracking this). Please post a comment here linking to it after you've opened it.

I would however suggest to check whether it's been perhaps already fixed in upstream's Humble, Jazzy or Rolling branches and we / micro-ROS is just lagging those. Just to avoid posting about an already resolved issue.

I'm not suggesting that we merge the branch that I just made

indeed, I don't believe that'd be necessary. The leak is very small and it doesn't affect functionality in MotoROS2. I would suggest we wait until upstream patches it and then we can update to those versions.

@jimmy-mcelwain
Copy link
Collaborator

I created an issue here ros2/rcl#1198

@jimmy-mcelwain
Copy link
Collaborator

Apparently it is the responsibility of the rmw to free that memory. So I opened a new issue here: micro-ROS/rmw_microxrcedds#311.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working rv:0.1.0 Reported in MotoROS2 0.1.0 yrc1000
Projects
None yet
Development

No branches or pull requests

2 participants