redis omem leaking issue on T2 supervisor #20680
Comments
The issue is not seen in the last few runs on the Cisco and MSFT testbeds. Is there a way to map this client data to the client process? The id / fd from here were not very helpful in pinpointing the client.
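One way to map the `CLIENT LIST` entries to a process is to join the `addr` field against the kernel's socket table rather than relying on id/fd. A minimal sketch below, assuming redis-py and psutil are available, redis listens on TCP 127.0.0.1:6379, and the clients share the host network namespace (these are assumptions, not something stated in this issue):

```python
# Minimal sketch, not verified on the DUT: join the peer address reported by
# CLIENT LIST against the kernel socket table to find the owning process.
# Assumptions: redis-py and psutil are installed, redis listens on TCP
# 127.0.0.1:6379, clients share the host network namespace, and the script
# runs as root so psutil can see other processes' pids.
import psutil
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

# Map (local ip, local port) of every TCP socket to its owning pid.
endpoint_to_pid = {}
for conn in psutil.net_connections(kind="tcp"):
    if conn.laddr and conn.pid:
        endpoint_to_pid[(conn.laddr.ip, conn.laddr.port)] = conn.pid

for client in r.client_list():
    # 'addr' is the client's side of the connection as seen by the server.
    ip, _, port = client["addr"].rpartition(":")
    pid = endpoint_to_pid.get((ip, int(port)))
    proc = psutil.Process(pid).name() if pid else "unknown"
    print(f"addr={client['addr']} omem={client['omem']} "
          f"cmd={client.get('cmd')} pid={pid} proc={proc}")
```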
Quick update: the client connections that are leaking memory are from the snmp docker. I see 100+ client connections from snmp, and restarting a process like thermalctld in pmon causes the omem increase on the snmp connections.
@SuvarnaMeenakshi: can you help look into this?
Hi @SuvarnaMeenakshi, did you get a chance to look into this issue? Thanks
I have confirmed and know the RCA of the memory leak, and am drafting the fix PR.
and narrowed down to the command
Can you confirm the command is the only, shared trigger of the 3 test modules?
Synced offline with Chenyang and Shawn.
The redis memory leak is caused by 2 issues; more details are in the 2 issues and fixes below.
- snmpagent has a memory leak issue; it is triggered when a never-auto-recovered exception happens
  - Issue: Redis memory leak risk in PhysicalEntityCacheUpdater #342
  - Fix: Fix redis memory leak issue in PhysicalEntityCacheUpdater #343
- pmon on chassis will enter a wrong state that won't auto-recover, which triggers the memory leak
  - Issue: [chassis] PSU keys (generated by psud) got removed by the restart of thermalctld and won't auto recover. #575
  - Fix: [chassis][psud] Move the PSU parent information generation to the loop run function from the initialization function #576
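For reference, the snmpagent side of the RCA boils down to a subscriber connection that stops draining its pubsub/keyspace messages after the unrecovered exception, so redis keeps buffering the pending replies in that client's output buffer, which is exactly what omem measures. A minimal, standalone illustration of that mechanism (hypothetical channel name and message sizes, against a scratch redis on localhost; this is not the sonic-snmpagent code):

```python
# Standalone illustration of the leak mechanism: a subscriber that never reads
# its messages makes the server-side output buffer (omem) of that connection
# grow. Assumes redis-py and a scratch redis on 127.0.0.1:6379 whose pubsub
# client-output-buffer-limit does not disconnect the client first.
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

# Simulate the stuck updater thread: subscribe, then never call get_message().
stuck = r.pubsub()
stuck.subscribe("ENTITY_UPDATE")  # hypothetical channel name

# Simulate the rest of the system publishing updates the subscriber never reads.
for _ in range(2000):
    r.publish("ENTITY_UPDATE", "x" * 1024)

# The stuck subscriber's omem keeps growing because nothing consumes it.
for client in r.client_list():
    if int(client["omem"]) > 0:
        print(f"id={client['id']} addr={client['addr']} omem={client['omem']}")
```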
The issue can be closed after the 2 fix PRs are merged.
Description
We are seeing a memory leak issue on the T2 Supervisor when running the nightly test: the redis memory keeps increasing until it fails sanity_check in sonic-mgmt.
Following is one of the logs where the memory exceeded the sanity_check threshold.
The memory leak was seen after running one of the following 3 modules. Once the total_omem becomes non-zero, it keeps increasing until it goes over the threshold.
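For context, the total_omem referred to here is the sum of the omem field over all redis clients; a rough watcher for it can be put together from `CLIENT LIST`, as in the sketch below (assuming redis-py is available on the device; the actual sonic-mgmt sanity_check may collect this metric differently):

```python
# Rough watcher for the metric discussed above: sums the omem field over all
# redis clients and lists the clients holding output-buffer memory. A sketch
# only -- the real sanity_check in sonic-mgmt may gather this differently.
import time
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

while True:
    clients = r.client_list()
    total_omem = sum(int(c["omem"]) for c in clients)
    offenders = [c for c in clients if int(c["omem"]) > 0]
    print(f"total_omem={total_omem} clients_with_omem={len(offenders)}")
    for c in offenders:
        print(f"  id={c['id']} addr={c['addr']} cmd={c.get('cmd')} omem={c['omem']}")
    time.sleep(30)
```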
Steps to reproduce the issue:
Describe the results you received:
The testbed fails the sanity check due to omem over the threshold after running the nightly test on the T2 testbed.
Describe the results you expected:
Redis omem should be released after usage and should not keep increasing.
Output of `show version`:
Output of `show techsupport`:
When running the `system_health/test_system_health.py` test:
At the beginning of the test:
At the end of the test:
This symptom is observed for all 3 test cases so far:
Additional information you deem important (e.g. issue happens only occasionally):