[Mellanox] Remove pmon delay for certain platforms #19190

Merged: 2 commits merged Jun 6, 2024


@stepanblyschak (Collaborator) commented Jun 4, 2024

Why I did it

To support fast-reboot with CMIS active modules.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Removed the PMON startup delay for SKUs that support CMIS active modules.

How to verify it

Run the fast-reboot test.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305
  • 202311

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@stepanblyschak stepanblyschak requested a review from lguohan as a code owner June 4, 2024 07:18
@keboliu keboliu requested a review from prgeor June 6, 2024 06:31
@liat-grozovik liat-grozovik changed the title [nvidia] Remove pmon delay for certain platforms [Mellanox] Remove pmon delay for certain platforms Jun 6, 2024
@liat-grozovik (Collaborator)

@yxieca can you help handle ms_conflict and re-trigger?

@liat-grozovik (Collaborator)

@stepanblyschak please check #18926 and whether this PR needs to be updated with it now.

@yxieca (Contributor) commented Jun 6, 2024

/azpw ms_conflict

@yxieca yxieca merged commit 6968aaa into sonic-net:master Jun 6, 2024
19 of 20 checks passed
@yuazhe yuazhe mentioned this pull request Jun 11, 2024
11 tasks
@bingwang-ms (Contributor)

@vaibhavhd @prgeor Is this a must-have for 202405? The change applies to all Mellanox platforms.

arun1355492 pushed a commit to arun1355492/sonic-buildimage that referenced this pull request Jul 26, 2024
* [nvidia] Remove pmon delay for certain platforms

Signed-off-by: Stepan Blyschak <[email protected]>
@dprital (Collaborator) commented Jul 31, 2024

@bingwang-ms, @vaibhavhd, @prgeor - Can you please cherry-pick this to 202405?

@bingwang-ms (Contributor)

Discussed offline; we are good to cherry-pick it.

mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Jul 31, 2024
* [nvidia] Remove pmon delay for certain platforms

Signed-off-by: Stepan Blyschak <[email protected]>
@mssonicbld (Collaborator)

Cherry-pick PR to 202405: #19754

mssonicbld pushed a commit that referenced this pull request Aug 1, 2024
* [nvidia] Remove pmon delay for certain platforms

Signed-off-by: Stepan Blyschak <[email protected]>
@chiourung (Contributor)

This only works for Mellanox platforms. Other platforms still always delay pmon, which causes a problem (see sonic-net/sonic-platform-daemons#531).
How about removing the pmon delay and including #18907 to improve performance?

@stepanblyschak (Collaborator, Author)

@chiourung The issue is known; it is a day-one xcvrd issue (#17943).
Please do not remove the pmon delay for NVIDIA platforms unless it is reviewed by NVIDIA.

Even if we remove the delay, it does not fix the problem; it only makes it very unlikely. The underlying race condition needs to be fixed.

liat-grozovik pushed a commit that referenced this pull request Nov 20, 2024
- Why I did it
After pull request #19190, pmon was added to the start list in fast/warm reboot scenarios. However, certain non-critical pmon daemons can be delayed, saving approximately 1 second in the reboot process. This matters for performance: the current fast-reboot time is already close to the 30-second limit, so this change eases that pressure.

- How I did it
Added a script that acts as a fast/warm reboot monitor, along with related supervisord rules.
Once the script exits, the reboot process has ended, and the other delayed daemons are then initialized.

- How to verify it
Check the fast/warm reboot time.

Signed-off-by: Yuanzhe, Liu <[email protected]>
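The commit above gates non-critical pmon daemons on a monitor script that exits once fast/warm reboot has finished. The following is a minimal Python sketch of that startup ordering only, under stated assumptions: the daemon names, the split into critical and deferred daemons, the `is_reboot_finished` check, the timeout, and the fallback of starting deferred daemons anyway on timeout are all hypothetical. None of this is taken from the actual SONiC script or its supervisord rules.

```python
import time
from typing import Callable, List


def run_reboot_monitor(is_reboot_finished: Callable[[], bool],
                       poll_interval: float = 0.1,
                       timeout: float = 180.0) -> bool:
    """Poll a hypothetical completion check; returning models the monitor script exiting.

    Returns True if the reboot was seen to finish, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_reboot_finished():
            return True
        time.sleep(poll_interval)
    return False


def start_pmon(critical: List[str],
               deferred: List[str],
               is_reboot_finished: Callable[[], bool],
               start: Callable[[str], None],
               poll_interval: float = 0.1,
               timeout: float = 180.0) -> bool:
    """Start critical daemons at once and defer the rest until the monitor returns."""
    # Daemons assumed to be needed during fast/warm reboot start immediately.
    for name in critical:
        start(name)
    # Block on the monitor, analogous to supervisord waiting for the monitor program to exit.
    finished = run_reboot_monitor(is_reboot_finished, poll_interval, timeout)
    # Deferred (non-critical) daemons start afterwards; this sketch also starts them on timeout.
    for name in deferred:
        start(name)
    return finished


if __name__ == "__main__":
    started: List[str] = []
    # Simulate a reboot that is reported finished on the third check.
    checks = iter([False, False, True])
    start_pmon(["xcvrd"], ["thermalctld", "ledd"],
               lambda: next(checks), started.append, poll_interval=0.01)
    print(started)  # ['xcvrd', 'thermalctld', 'ledd']
```

Starting the deferred daemons even when the monitor times out is a design choice of this sketch, so that a stuck reboot check cannot leave non-critical daemons permanently stopped.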