DEVPROD-5498 Do not consider elapsed communication time during group teardown #8506

Open: wants to merge 3 commits into main

hadjri (Contributor) commented Nov 22, 2024

DEVPROD-5498

Description

Currently, a host can be flagged by the idle timeout for lack of communication with the Evergreen app server if its teardown group runs long enough. The teardown group runs after the task has completed, so no heartbeat signals are sent while it runs.

Since the change introduced in #7635 already adds protections against long-running teardown groups, it should be sufficient for the idle host check to ignore a host's last communication time while that host is actively tearing down a task group.

Testing

Tested in staging (compare executions 3 and 4). Without the change, a host that had just run a task group with a long teardown group could not pick up further tasks, because the idle termination job flagged it immediately. With the change, the same host continues to pick up tasks.

hadjri requested a review from a team on November 22, 2024 20:21