Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ps cmd should ignore dead runners #1050

Merged
merged 4 commits into from
Mar 12, 2024
Merged

Conversation

matt2e
Copy link
Collaborator

@matt2e matt2e commented Mar 8, 2024

fixes #1046

Runners that were found to have died (rather than cleanly being killed) end up with deployment_id & module_name not set to null.
Changes:

  • removes all dead runners from showing in the ps command
  • correctly updates deployment_id and module_name to null in the following cases:
    • Runner terminates unexpectedly
    • Runner is stale

@alecthomas
Copy link
Collaborator

Are these runners getting reaped eventually, and if so does the reaping correctly set the module_name to NULL eventually? If not it seems like we should fix that?

@@ -160,7 +160,7 @@ SELECT d.min_replicas,
r.endpoint,
r.labels AS runner_labels
FROM deployments d
LEFT JOIN runners r on d.id = r.deployment_id
LEFT JOIN runners r on d.id = r.deployment_id AND r.state = 'assigned'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm actually if we just want to ignore dead runners, this probably isn't the right solution. We probably do want != 'dead' so that it will also pick up runners that are reserved.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure I'm right there, so it would be worth verifying if that does what we want...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, will fix

@matt2e
Copy link
Collaborator Author

matt2e commented Mar 8, 2024

Are these runners getting reaped eventually, and if so does the reaping correctly set the module_name to NULL eventually? If not it seems like we should fix that?

Looking into why it's not being set to null.
I don't think the reap job touches these runners because they are set to dead immediately, and the SQL for that filters by <> dead

@matt2e
Copy link
Collaborator Author

matt2e commented Mar 8, 2024

It's the deregister runner code that is setting it to dead in the db without updating deployment_id and module_name. I've updated the sql to amend that.

But I think the reap case will also end up those columns not nulled out... I'll set this PR back to draft and try recreating that case next week

@matt2e matt2e marked this pull request as draft March 8, 2024 06:08
@matt2e matt2e force-pushed the matt2e/dead-runners-in-ps-cmd branch from d6f9b32 to ef7ba66 Compare March 11, 2024 23:53
@matt2e
Copy link
Collaborator Author

matt2e commented Mar 11, 2024

Handled the above cases and updated PR description

@matt2e matt2e marked this pull request as ready for review March 11, 2024 23:57
@matt2e matt2e requested a review from alecthomas March 11, 2024 23:58
@matt2e matt2e force-pushed the matt2e/dead-runners-in-ps-cmd branch from ef7ba66 to 5cd2373 Compare March 11, 2024 23:58
@matt2e matt2e merged commit 11a63b1 into main Mar 12, 2024
11 checks passed
@matt2e matt2e deleted the matt2e/dead-runners-in-ps-cmd branch March 12, 2024 04:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ps cmd can show inaccurate state
2 participants