[teleport-update] Add support for systemd process management #49102
This PR adds more complete systemd process management to the `teleport-update` command.

This includes automatically writing and updating a timer and service file for the updater during upgrades, using the new `teleport-update` binary. This validates that the new updater is a valid executable on the target platform, and ensures that the timer and service files match the new updater. Note that the hidden `--self-setup` flag can be passed to exec the same binary, if necessary.

Additionally, this PR adds support for detecting various failures during upgrades. `teleport-update` will roll back the agent immediately in these cases. This is accomplished by monitoring `/run/teleport.pid` for changes that indicate different failure modes. For example, if Teleport crashes after a soft reload, systemd is unaware of the crash, and a stale PID remains in the file with no running process. Alternatively, if Teleport crashes after a hard restart, the PID file is rapidly created and removed with different PID values. Other cases, such as hanging on quit, are covered as well. This catches fatal errors in new versions, as well as client-too-new errors.

Notably, connection failures, including clients rejected by the server for being outdated, do not trigger a revert.
This is the seventh in a series of PRs implementing `teleport-update`:
- Link Command: #48712
- Update Command: #48244
- Reloading with rollbacks: #47929
- Linking: #47879
- Enable Command: #47565
- Initial scaffolding PR: #46418
The `teleport-update` binary will be used to enable, disable, and trigger automatic Teleport agent updates. The new auto-updates system manages a local installation of the cluster-specified version of Teleport stored in `/var/lib/teleport/versions`.

RFD: #47126
Goal (internal): https://github.com/gravitational/cloud/issues/10289
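To illustrate the versioned installation under `/var/lib/teleport/versions` mentioned above, here is a small Go sketch that maps a cluster-specified version to an install directory. The `<root>/<version>` layout and the `versionDir` helper are assumptions for illustration; the directory structure below `versions` is not described here. Because the version string comes from the cluster, the sketch rejects values that could escape the root.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// versionsDir is the installation root named in the PR description.
const versionsDir = "/var/lib/teleport/versions"

// versionDir returns the install directory for one version, assuming a
// hypothetical <root>/<version> layout. Empty, "." and ".." versions, and any
// containing a path separator, are rejected so the result stays under root.
func versionDir(root, version string) (string, error) {
	if version == "" || version == "." || version == ".." || strings.ContainsAny(version, `/\`) {
		return "", fmt.Errorf("invalid version %q", version)
	}
	return filepath.Join(root, version), nil
}

func main() {
	for _, v := range []string{"17.0.0", "../../etc"} {
		dir, err := versionDir(versionsDir, v)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println(v, "->", dir)
	}
}
```

Keeping each version in its own directory is what makes an immediate rollback cheap: the previous version is still on disk, so reverting only means pointing the agent back at it.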
Example: Upgrading to v17 on a v16 cluster, with successful rollback.