From 02d7034ea53fb3e9c06b3ecd92e9fd0660158120 Mon Sep 17 00:00:00 2001 From: Frederik Lotter Date: Thu, 2 May 2024 08:24:00 +0200 Subject: [PATCH] fix(checkstate): ignore and abort carryover changes (#415) Addresses https://github.com/canonical/pebble/issues/414. Running checks have changes that get persisted by the state-engine. This means that following a reboot, changes and tasks that are not complete (not ready) will be resumed. In the case of some managers, such as checkstate, carryover changes and tasks are an unwanted side effect. Currently the plan manager will perform the first plan load very early during startup (before the state-engine is ready, and before StartUp hooks have been called). The result is that a PlanChanged propagation will take place at plan load, and force checkstate to inspect the running checks prematurely. Checkstate discovers a running change from a previous boot context, and tries to load its data from cached state, which does not exist. Gracefully ignore changes for which cached state does not exist, and use this to identify changes that should be aborted on the first ensure pass. Test: ``` 59 Hold today at 23:29 SAST today at 23:37 SAST Recover exec check "internet-online" 60 Error today at 23:37 SAST today at 23:37 SAST Perform exec check "internet-online" : 66 Doing today at 23:37 SAST - Recover exec check "internet-online" : 66 Hold today at 23:37 SAST today at 23:41 SAST Recover exec check "internet-online" : 69 Error today at 23:41 SAST today at 23:42 SAST Perform exec check "internet-online" : 73 Doing today at 23:42 SAST - Recover exec check "internet-online" ``` As the test demonstrates, following a reboot, the change (66) is aborted, and as a result the status is now ```Hold```. A new change is created which starts as a ```Perform```, and then fails (as expected) after a while and changes to a ```Recover```. --- internals/overlord/checkstate/manager.go | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/internals/overlord/checkstate/manager.go b/internals/overlord/checkstate/manager.go index efdcf71ac..45fcf54d3 100644 --- a/internals/overlord/checkstate/manager.go +++ b/internals/overlord/checkstate/manager.go @@ -122,7 +122,14 @@ func (m *CheckManager) PlanChanged(newPlan *plan.Plan) { } else { configKey = recoverConfigKey{change.ID()} } - oldConfig := m.state.Cached(configKey).(*plan.Check) // panic if key not present (always should be) + v := m.state.Cached(configKey) + if v == nil { + // Pebble restarted, and this change is a carryover. + change.Abort() + shouldEnsure = true + continue + } + oldConfig := v.(*plan.Check) existingChecks[oldConfig.Name] = true newConfig, inNew := newPlan.Checks[details.Name]