Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make shutdown action return meaningful exit code #336

Merged
merged 7 commits into from
Nov 30, 2023

Conversation

benhoyt
Copy link
Contributor

@benhoyt benhoyt commented Nov 27, 2023

This PR makes Pebble shutdowns due to on-failure: shutdown return exit code 10 instead of always returning exit code 0. The code 10 was chosen to keep it "a few away" from other Pebble exit codes.

Similarly, if on-check-failure is used with shutdown, return exit code 11 instead of always returning code 0. (The 11 to distinguish a "check failure" shutdown from a "service failure" shutdown.

It also adds support for on-failure: success-shutdown (also for on-check-failure) and on-success: failure-shutdown, which reverses the "polarity" of the exit code for those cases.

Both of these are as per spec DA062.

Fixes #324.

Also add support for "on-failure: success-shutdown" and
"on-success: failure-shutdown".

Fixes canonical#324. Implements spec DA062.
@benhoyt benhoyt requested review from cjdcordeiro, pedronis, flotter and hpidcock and removed request for pedronis and flotter November 27, 2023 00:36
Copy link
Collaborator

@cjdcordeiro cjdcordeiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested it and it works as expected. The error 10 is propagated to the container runtime. I also have no comments on the code, so it LGTM. thanks

@benhoyt benhoyt requested a review from flotter November 27, 2023 21:52
Copy link
Member

@hpidcock hpidcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial code read looks promising. Still need to perform some manual QA.

internals/plan/plan.go Outdated Show resolved Hide resolved
Copy link
Contributor

@flotter flotter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me. I am not tracking all the use-cases of Pebble, but this PR according to me will have some breaking changes (as pointed out already) with a varying degree of impact depending on the Pebble surrounding environment.

logger.Noticef("WARNING: cannot stop daemon: %v", err)
} else {
return err
}
}

if restartSystem {
if restartType == restart.RestartSystem {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the previous behaviour was that both a RestartDaemon and RestartSystem resulted in issuing a doReboot here (both set the restartSystem flag). I believe this meant that previously any service that had a shutdown action would have caused shutdown -r command to run (and would block here with time.Sleep(waitTimeout) for that to issue a system reboot). Just checking this is intended, because without the reboot action the RestartDaemon restart type would need help from systemd (and therefore needs it) to restart the daemon.

After the change this is now again aligned with how snapd uses RestartDaemon, but of course they render on systemd for everything.

(For KernOS this means that as PID1 we will actually now exit the process, and as a result cause a kernel panic. This is not necessarily a problem, since our kernels are configured to reboot on any PID1 exit).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the previous behaviour was that both a RestartDaemon and RestartSystem resulted in issuing a doReboot here (both set the restartSystem flag)

I believe this is incorrect, as Go's case statement doesn't fall through by default (like C's). That said, I've shuffled the code a little bit to be a bit more like snapd's and as a result it's a bit clearer they're different cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no Fred. Well this was the only worry I had, while not understanding why the code does not match my experience. Merge away!

README.md Outdated Show resolved Hide resolved
internals/daemon/daemon.go Outdated Show resolved Hide resolved
@@ -36,6 +36,7 @@ const (
RestartSystemHaltNow
// RestartSystemPoweroffNow will shutdown --poweroff the system asap
RestartSystemPoweroffNow
RestartServiceFailure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just sharing my mind here - maybe something we can boldly try to clean up a bit in the future ?

The RestartType name has evolved over time it seems to no longer work very well. The fact that we start the type with Restart prefix also does not help make this easy to follow. Seems like the meaning of many of these options are overloaded depending if you take the wider system effect into place (where systemd exists, and will restart Pebble on exit, or not).

My view is this list tries to sometimes include 3 things:

Action, Scope and now also Reason (while having a prefix called Restart)

However, I cannot see an easy way to clean this up without mayor surgery (I read your spec considerations). In an ideal world, I would have referred to this group of actions and the package as shutdown (like the command), and perhaps a simpler list of options:

type ShutdownType int
const (
    ShutdownHaltSystem ShutdownType = iota
    ShutdownRebootSystem
    ShutdownPoweroffSystem
    ShutdownExitDaemon                      // supports exit code
)

type Handler interface {
	HandleShutdown(t ShutdownType, immediate bool, exitcode int)
	RebootIsFine(st *state.State) error
	RebootIsMissing(st *state.State) error
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree this is quite messy! However, I'm going to leave it for future work.

I've made a couple of tweaks to make the code a bit more like snapd's (which is even messier as it has more cases :-), for example renaming the field to requestedRestart.

- rename requestType -> requestedRestart
- structure slightly closer to what snapd has now
And also allow "success-shutdown" for on-check-failure
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
internals/daemon/daemon.go Show resolved Hide resolved
Copy link
Member

@hpidcock hpidcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for applying the feedback.

@benhoyt benhoyt merged commit 05fe0d6 into canonical:master Nov 30, 2023
15 checks passed
@benhoyt benhoyt deleted the shutdown-exit-code branch November 30, 2023 01:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pebble shutting down on error return 0 exit code
4 participants