Fix redis memory leak issue in PhysicalEntityCacheUpdater #343
Merged
SuvarnaMeenakshi merged 2 commits into sonic-net:master from yejianquan:jianquanye/fix_memory_leak on Dec 18, 2024
Conversation
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
yejianquan force-pushed the jianquanye/fix_memory_leak branch from 5463f95 to a8c0cfb on December 17, 2024 at 04:19
yejianquan force-pushed the jianquanye/fix_memory_leak branch from d77c535 to 1aff8bf on December 17, 2024 at 08:38
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Removed all the sonicbld comments due to adding new test cases.
Hi @Junchao-Mellanox @liuh-80 @SuvarnaMeenakshi, could you help review this PR?
@qiluo-msft for viz
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
SuvarnaMeenakshi approved these changes on Dec 18, 2024
mssonicbld pushed a commit to mssonicbld/sonic-snmpagent that referenced this pull request on Dec 18, 2024
Cherry-pick PR to 202405: #344
mssonicbld pushed a commit that referenced this pull request on Dec 18, 2024
mssonicbld added the Included in 202405 Branch label and removed the Created PR to 202405 Branch label on Dec 18, 2024
- What I did
Fixes the redis memory leak bug: #342
There is a chance that the physical_entity_updater creates subscriptions to redis and never consumes the messages due to exceptions.
The output memory buffer (omem) of the redis client then grows without bound, and redis memory leaks.
The reason is that all 5 physical entity cache updaters inherit from PhysicalEntityCacheUpdater.
On the first update_data call, they initialize the pattern subscription (psub) to the redis database:
sonic-snmpagent/src/sonic_ax_impl/mibs/ietf/rfc2737.py, line 667 in c5301b2:
    self.pub_sub_dict[db_index] = mibs.get_redis_pubsub(db, db.STATE_DB, self.get_key_pattern())
Every time update_data is called again, it gets a message from the psub and processes it:
sonic-snmpagent/src/sonic_ax_impl/mibs/ietf/rfc2737.py, line 678 in c5301b2:
    msg = pubsub.get_message()
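For readers unfamiliar with this flow, here is a minimal sketch of the lazy subscribe-then-poll pattern described above. Only mibs.get_redis_pubsub, pub_sub_dict, get_key_pattern and get_message come from the snippets in this PR; the class name, the db_index argument and the key pattern value are illustrative assumptions, not the actual rfc2737.py code.

```python
# Illustrative sketch only -- the real implementation lives in
# src/sonic_ax_impl/mibs/ietf/rfc2737.py and differs in detail.
from sonic_ax_impl import mibs


class CacheUpdaterSketch:
    def __init__(self):
        self.pub_sub_dict = {}

    def get_key_pattern(self):
        return "PHYSICAL_ENTITY_INFO|*"  # assumed pattern, for illustration

    def update_data(self, db, db_index):
        if db_index not in self.pub_sub_dict:
            # First call: create the pattern subscription against STATE_DB.
            self.pub_sub_dict[db_index] = mibs.get_redis_pubsub(
                db, db.STATE_DB, self.get_key_pattern())
        pubsub = self.pub_sub_dict[db_index]
        while True:
            msg = pubsub.get_message()
            if not msg:
                break  # subscription queue drained
            # ... update the cached entity data from msg here ...
```

The important point is that the subscription is created inside update_data, so once update_data stops being called nothing ever drains it again.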
Outside, in the MIBUpdater logic, update_data is called more frequently than reinit_data.
A side effect is that if reinit_data fails forever, update_counter is never reset, so update_data is never called again:
sonic-snmpagent/src/ax_interface/mib.py, line 43 in c5301b2:
    self.update_counter = 0
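To make the starvation concrete, here is a minimal sketch of that scheduling pattern, assuming a frequency threshold alongside the update_counter shown above; the real MIBUpdater in src/ax_interface/mib.py may differ in detail.

```python
# Assumed sketch of the scheduler described above, not the real MIBUpdater.
class UpdaterLoopSketch:
    def __init__(self, updater, frequency=10):
        self.updater = updater
        self.frequency = frequency
        self.update_counter = frequency + 1  # force a reinit on the first tick

    def tick(self):
        try:
            if self.update_counter > self.frequency:
                self.updater.reinit_data()  # keeps raising once the DB entry is broken
                self.update_counter = 0     # never reached, counter stays above threshold
            else:
                self.update_counter += 1
                self.updater.update_data()  # therefore never reached again either
        except Exception:
            pass  # the outer agent loop keeps ticking, so the stuck state persists
```

Once reinit_data starts failing, every subsequent tick retries reinit_data and never reaches update_data, which is exactly the condition that leaves the psubscription unconsumed.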
So the problem is: at the beginning, the psub is created on the first update_data call and everything works, until an unrecoverable issue happens, such as
PHYSICAL_ENTITY_INFO|PSU *
missing from the database (a pmon issue). This causes both reinit_data and update_data to fail, because both ultimately call _update_per_namespace_data, which tries to cast an empty string '' to int and raises ValueError.
update_data is then never called again, because reinit_data never succeeds.
But the previously established psubscription is still there with nothing consuming it (update_data is blocked), so Redis database memory slowly leaks.
- How I did it
Catch exceptions in the loop over reinit_data, making sure reinit_data is called for every physical_entity_updater.
Clear pending messages and cancel the subscription in reinit_data, so messages do not accumulate in the redis subscription queue.
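A minimal sketch of these two changes, assuming redis-py pubsub objects in pub_sub_dict; this is not the exact PR diff, and the class names and logging details are illustrative.

```python
# Illustrative sketch of the fix described above, not the exact diff.
import logging


class PhysicalTableMIBUpdaterSketch:
    def __init__(self, physical_entity_updaters):
        self.physical_entity_updaters = physical_entity_updaters

    def reinit_data(self):
        # Change 1: one failing updater must not prevent the others from
        # reinitializing, so each reinit runs in its own try/except.
        for updater in self.physical_entity_updaters:
            try:
                updater.reinit_data()
            except Exception:
                logging.exception("reinit_data failed for %s",
                                  type(updater).__name__)


class PhysicalEntityCacheUpdaterSketch:
    def __init__(self):
        self.pub_sub_dict = {}

    def reinit_data(self):
        # Change 2: drain pending messages and drop the subscription so the
        # redis client's output buffer (omem) cannot keep growing.
        for pubsub in self.pub_sub_dict.values():
            while pubsub.get_message():
                pass
            pubsub.punsubscribe()
        self.pub_sub_dict.clear()
        # ... rebuild the cache from STATE_DB here ...
```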
- How to verify it
Tested on a Cisco chassis; the memory is no longer leaking.
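As an additional check (my assumption, not part of the original verification), the leak can be observed through the output-buffer size (omem) of the redis clients that hold pattern subscriptions, for example with redis-py:

```python
# Hypothetical monitoring snippet: print omem for clients holding pattern
# subscriptions; connection details depend on the platform (SONiC often
# exposes redis on a unix socket rather than TCP).
import redis

r = redis.Redis(host="127.0.0.1", port=6379)
for client in r.client_list():
    if int(client.get("psub", 0)) > 0:
        print(client["addr"], "omem =", client["omem"])
```

A steadily growing omem on the snmp-subagent connection would indicate the unconsumed subscription described in #342; after the fix it should stay flat.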
- Description for the changelog