Allow AsyncTasks to indicate if they should not be scheduled during a shutdown #10860
Closed
Commits (11)
- 3d9ea0b Silently ignore rejections when threadpools are terminating (peternied)
- 805a98f Update unit tests (peternied)
- 1613000 Changelog entry (peternied)
- 1b3df2d Fix spotless issues (peternied)
- cbb8d8d Revert "Fix spotless issues" (peternied)
- f820030 Revert "Changelog entry" (peternied)
- 5816f8b Revert "Update unit tests" (peternied)
- 5709c45 Revert "Silently ignore rejections when threadpools are terminating" (peternied)
- 0ffc227 Allow AsyncTasks to indicate if they should not be scheduled during a… (peternied)
- 6b06067 Fix mocked threadpools + add unit tests (peternied)
- 7f2b997 Supress loggerUsageCheck on test method (peternied)
Conversations
reta commented:
@peternied we should not do that: the listeners will be called on rejection to notify the caller that the submission failed; with this change that won't happen anymore, leaving some callbacks in a dangling state.
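The contract being described can be sketched as follows. This is a minimal illustration, not OpenSearch's actual `ThreadPool` API: the `Listener` interface and `submitWithListener` helper are hypothetical names, but they show why a caller-supplied callback must still fire when the pool rejects a task during shutdown. Silently swallowing the `RejectedExecutionException` would leave that callback dangling.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class RejectionNotify {

    // Hypothetical listener, standing in for the real submission callbacks.
    interface Listener {
        void onRejection(Exception e);
    }

    static void submitWithListener(ExecutorService pool, Runnable task, Listener listener) {
        try {
            pool.execute(task);
        } catch (RejectedExecutionException e) {
            // Contract: the caller must still hear about the failure, even
            // during shutdown, so it can clean up (e.g. release resources).
            listener.onRejection(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.shutdown(); // a pool that is terminating rejects new work
        submitWithListener(pool, () -> {}, e ->
            System.out.println("rejected: " + e.getClass().getSimpleName()));
        // prints "rejected: RejectedExecutionException"
    }
}
```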
peternied commented:
@reta Thanks for the feedback. I picked an ugly issue to attempt to root-cause and fix in the flaky-test space, so I might be making assumptions that are not well founded.
I understand that, from a purity standpoint, this goes against executor conventions; however, aren't the only cases where we shut down executors the ones where we are shutting down nodes? Is there a better way to determine that the cluster is going down, so these sources of failure can be ignored?
My goal with this change is to stabilize the test infrastructure. Assuming it passes CI, don't we have sufficient coverage to accept it?
reta commented:
It may be the case, but even then we need a clean shutdown. Tracing comes to mind immediately as an example: shutting down the node should gracefully flush the data to the collector, but with this change the trace spans won't be flushed, since they will be left in a dangling state.
We should have coverage, but we may well end up in a flaky state. This is not really about coverage, though: we break the contract by not calling the callbacks when we should.
peternied commented:
To revisit: I've got two ways forward that I'll pursue:
1. The stale-state detection task in seg-rep is not useful while a node is shutting down; maybe I can dequeue or swallow the scheduling from that system.
2. Failing that, I'm going to see if there is a way to keep honoring the contract where there are callbacks, but not in other scenarios.
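The first path above could look something like this sketch. The names (`ShutdownAwareTask`, `shuttingDown`) are illustrative, not the actual seg-rep task: the idea is that the periodic task itself checks a shutdown flag and simply declines to re-arm, so nothing is ever submitted to a terminating pool and the pool's rejection behaviour is never altered.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownAwareTask {
    private final ScheduledExecutorService scheduler;
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

    ShutdownAwareTask(ScheduledExecutorService scheduler) {
        this.scheduler = scheduler;
    }

    void scheduleNext() {
        if (shuttingDown.get()) {
            return; // dequeue at the source: no submission, hence no rejection
        }
        scheduler.schedule(this::runOnce, 10, TimeUnit.MILLISECONDS);
    }

    private void runOnce() {
        // ... perform the periodic check (e.g. stale-state detection) ...
        scheduleNext(); // re-arm only if the node is still running
    }

    void close() {
        shuttingDown.set(true); // called before the pool is shut down
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService s = Executors.newSingleThreadScheduledExecutor();
        ShutdownAwareTask task = new ShutdownAwareTask(s);
        task.scheduleNext();
        Thread.sleep(50);   // let a few iterations run
        task.close();       // task stops re-arming itself
        s.shutdown();
        System.out.println("terminated: " + s.awaitTermination(1, TimeUnit.SECONDS));
        // prints "terminated: true"
    }
}
```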
reta commented:
In some places there are lifecycle checks: when a node goes down, the lifecycle check may prevent the tasks from even proceeding (I would strongly advocate not altering the pool behaviour; it is a low-level tool).
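The lifecycle-check pattern being referred to can be sketched like this. The names here are illustrative (OpenSearch has its own `Lifecycle` component state; this is not its API): the task consults component state before doing any work, so during shutdown it becomes a no-op without touching the pool itself.

```java
public class LifecycleGuardedTask implements Runnable {
    enum State { STARTED, STOPPED }

    private volatile State state = State.STARTED;

    @Override
    public void run() {
        if (state != State.STARTED) {
            // Lifecycle check: the node is going down, so skip the work
            // instead of altering the executor's rejection behaviour.
            System.out.println("skipped: node is shutting down");
            return;
        }
        System.out.println("ran the task");
    }

    void stop() {
        state = State.STOPPED;
    }

    public static void main(String[] args) {
        LifecycleGuardedTask task = new LifecycleGuardedTask();
        task.run();  // prints "ran the task"
        task.stop(); // node begins shutting down
        task.run();  // prints "skipped: node is shutting down"
    }
}
```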
peternied commented:
I believe I found a way forward using the first path, which is much cleaner in my mind; I'm not 100% sure how I can validate it other than getting a number of test runs in. Let me know what you think.