[eventd] Add incremental polling when waiting for capture service to start #18138
@zbud-msft Can you please check whether we can cherry-pick this fix to 202305? If not, can you please create a separate PR?
@@ -547,8 +548,9 @@ capture_service::set_control(capture_control_t ctrl, event_serialized_lst_t *lst
        case INIT_CAPTURE:
            m_thr = thread(&capture_service::do_capture, this);
            for(int i=0; !m_cap_run && (i < CAPTURE_SERVICE_POLLING_RETRIES); ++i) {
                /* Wait max a second for thread to init */
                this_thread::sleep_for(chrono::milliseconds(CAPTURE_SERVICE_POLLING_DURATION));
                /* Poll to see if thread has been init, if so exit early. Add delay on every attempt */
I added logic so that eventd does not exit when the capture service fails to initialize; warning logs are emitted, and the main proxy/zmq service continues to run.
closed.
Hi @qiluo-msft @zbud-msft, could you update the ADO? Is this a must-have for 202305?
@zbud-msft This PR has a conflict with 202305. Can you please check and, if required, create a separate PR for 202305?
@dgsudharsan Currently depending on #18249
@yxieca Can you please help to cherry-pick this fix to 202311?
…start (sonic-net#18138)
### Why I did it
Addresses sonic-net#17350
### How I did it
Instead of a 1 second delay, we poll to check that the thread is available and after each poll increment the delay. There were situations where if there was less memory available, fixed polling would not be effective for starting zmq capture service. Add an incremental delay such that eventd can wait longer to start up capture service if system is too busy or overloaded, but still keep a max duration/retry limit so that we do not wait forever.
#### How to verify it
UT
Cherry-pick PR to 202311: #18552
Why I did it
Addresses #17350
Work item tracking
How I did it
Instead of waiting with a fixed delay for up to 1 second, eventd now polls to check whether the capture thread has started and increases the delay after each poll. When little memory was available, the fixed polling interval did not give the zmq capture service enough time to start. The incremental delay lets eventd wait longer when the system is busy or overloaded, while a maximum retry count keeps the total wait bounded so that we do not wait forever.
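The polling scheme described above could look roughly like the sketch below. It is not the code from this PR: the helper names, the linear growth of the delay (base delay times attempt number), and the constant values are assumptions for illustration. The real constants are CAPTURE_SERVICE_POLLING_DURATION and CAPTURE_SERVICE_POLLING_RETRIES, whose values are not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical values for illustration only; the actual values of
// CAPTURE_SERVICE_POLLING_DURATION / CAPTURE_SERVICE_POLLING_RETRIES
// are not shown in this PR excerpt.
constexpr int kPollBaseMs = 10;
constexpr int kPollRetries = 20;

// Poll `ready` up to `max_retries` times. After each unsuccessful poll,
// sleep for base_ms * (attempt number), so the delay grows linearly.
// (Linear growth is an assumption; the PR only says the delay is incremented.)
// Returns true as soon as `ready` is observed, false if the limit is reached.
bool wait_with_incremental_delay(const std::atomic<bool>& ready,
                                 int base_ms = kPollBaseMs,
                                 int max_retries = kPollRetries)
{
    assert(base_ms >= 0 && max_retries > 0);
    for (int i = 0; i < max_retries; ++i) {
        if (ready.load()) {
            return true;  // thread is up, exit early
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(base_ms * (i + 1)));
    }
    return ready.load();  // final check after the last sleep
}

// Worst-case total wait in milliseconds:
// base_ms * (1 + 2 + ... + max_retries) = base_ms * n * (n + 1) / 2.
constexpr int max_total_wait_ms(int base_ms, int max_retries)
{
    return base_ms * max_retries * (max_retries + 1) / 2;
}
```

With these assumed values the caller gives up after at most about 2.1 seconds, yet returns within roughly one poll interval on a healthy system. A caller would treat a `false` return as "capture service did not start" and decide whether to log a warning and continue.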
How to verify it
UT