"RebootConfig = always" logic in pre and post-patch seems wrong #755

Open
chrisboulton opened this issue Dec 16, 2024 · 0 comments
We recently had an incident where an old version of a daemon was running in memory while the new version had been installed on disk (during an OS patch run). This was odd to me, because we configure all of our OS patch jobs with RebootConfig = always (https://cloud.google.com/compute/docs/osconfig/rest/v1/PatchConfig#rebootconfig):

ALWAYS: Always reboot the machine after the update completes.

Here's the transcript of OSConfigAgent:

OSConfigAgent[517]: 2024-12-03T02:43:03.9463Z OSConfigAgent Info: Beginning ExecStepTask
OSConfigAgent[517]: 2024-12-03T02:43:23.9832Z OSConfigAgent Info: Command exit code: 0, out:
OSConfigAgent[517]: 2024-12-03T02:43:24.0294Z OSConfigAgent Info: Successfully completed ApplyConfigTask
OSConfigAgent[517]: 2024-12-03T02:43:34.4493Z OSConfigAgent Info: Beginning ApplyPatchesTask
OSConfigAgent[517]: 2024-12-03T02:43:34.4845Z OSConfigAgent Info: System indicates a reboot is required.
OSConfigAgent[517]: 2024-12-03T02:44:04.6225Z OSConfigAgent Warning: Error waiting for task (attempt 1 of 10): rpc error: code = Canceled desc = context canceled
OSConfigAgent[517]: 2024-12-03T02:44:04.6227Z OSConfigAgent Info: OSConfig Agent (version 20240926.03-g1) shutting down.

-- reboot happens here --

OSConfigAgent[518]: 2024-12-03T02:44:41.2684Z OSConfigAgent Info: OSConfig Agent (version 20240926.03-g1) started.
OSConfigAgent[518]: 2024-12-03T02:44:44.5830Z OSConfigAgent Info: Beginning ApplyPatchesTask
OSConfigAgent[518]: 2024-12-03T02:45:02.8319Z OSConfigAgent Info: Updating 2 packages: ["tzdata all 2024b-0+deb12u1" "vault x86_64 1.18.2-1"]
OSConfigAgent[518]: 2024-12-03T02:45:28.1150Z OSConfigAgent Info: Success. Updated 2 packages: ["tzdata all 2024b-0+deb12u1" "vault x86_64 1.18.2-1"]
OSConfigAgent[518]: 2024-12-03T02:45:28.1151Z OSConfigAgent Info: System indicates a reboot is not required.
OSConfigAgent[518]: 2024-12-03T02:45:28.1544Z OSConfigAgent Info: Successfully completed ApplyPatchesTask
OSConfigAgent[518]: 2024-12-03T02:45:35.4709Z OSConfigAgent Info: Beginning ExecStepTask
OSConfigAgent[518]: 2024-12-03T02:45:36.7917Z OSConfigAgent Info: Writing inventory to guest attributes
OSConfigAgent[518]: 2024-12-03T02:46:04.4407Z OSConfigAgent Info: Command exit code: 0, out:

You can see what's happening here: the OS-managed reboot-required flag was already set, so the agent rebooted the system before patching. The system came back and performed the requested patching operation (two packages updated). Afterwards, the OS flag for requiring a reboot was not set, which is fine: these packages are not kernel (and friends) updates, so they don't set that flag.

There is no reboot after the instance is patched.

My expectation based on the GCP docs for our OS patch job configuration is that the instance must ALWAYS be rebooted post-patch.

The problem appears to be with the logic here:

if r.Task.GetPatchConfig().GetRebootConfig() == agentendpointpb.PatchConfig_ALWAYS && !prePatch && r.RebootCount == 0 {

I'm struggling with r.RebootCount == 0 here. It seems that if the system indicated a reboot was required prior to patching, one is performed (which sets RebootCount += 1), and then during the post-patch reboot check the system won't be rebooted, even though the reboot config is set to always. Either the documentation is wrong (in which case I am wondering how we configure patching to always reboot), or the logic here should more appropriately be:

if r.Task.GetPatchConfig().GetRebootConfig() == agentendpointpb.PatchConfig_ALWAYS && !prePatch {

(i.e., it should not consider whether a reboot was already performed).
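
To make the difference concrete, here is a minimal sketch that evaluates both conditions for the state seen in the transcript (post-patch check, one reboot already performed). The rebootConfig type and the currentCheck / proposedCheck helpers are hypothetical stand-ins written for illustration; they are not the agent's actual types or functions.

```go
package main

import "fmt"

// rebootConfig is a hypothetical stand-in for the agent's
// agentendpointpb.PatchConfig reboot setting.
type rebootConfig int

const (
	rebootDefault rebootConfig = iota
	rebootAlways
	rebootNever
)

// currentCheck mirrors the condition quoted from the agent source:
// ALWAYS only triggers a post-patch reboot if no reboot has happened yet.
func currentCheck(cfg rebootConfig, prePatch bool, rebootCount int) bool {
	return cfg == rebootAlways && !prePatch && rebootCount == 0
}

// proposedCheck is the variant suggested in this issue: it ignores
// how many reboots were already performed.
func proposedCheck(cfg rebootConfig, prePatch bool) bool {
	return cfg == rebootAlways && !prePatch
}

func main() {
	// State after the pre-patch reboot: post-patch phase, RebootCount is 1.
	fmt.Printf("current=%t proposed=%t\n",
		currentCheck(rebootAlways, false, 1),
		proposedCheck(rebootAlways, false))
}
```

With a pre-patch reboot already counted, the current condition is false and the proposed one is true. One caveat, as an assumption about intent rather than something confirmed by the source: the RebootCount term may exist to stop the agent from rebooting repeatedly when the task resumes after a post-patch reboot, so a real fix might need to track pre-patch and post-patch reboots separately.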

Steps to Reproduce

  1. Create an OS patch job with RebootConfig = always
  2. On a targeted instance, set the instance level need restart flag (/var/run/reboot-required on Linux)
  3. Perform a patch run
  4. Observe the instance is rebooted pre-patch
  5. Observe the instance is not rebooted post-patch