[teleport-update] Add support for systemd process management #49102

Closed
wants to merge 14 commits into from


@sclevine sclevine commented Nov 15, 2024

This PR adds more complete systemd process management to the teleport-update command.

This includes automatically writing or updating a timer and service file for the updater during upgrades, using the new teleport-update binary. Doing so validates that the new updater is a valid executable on the target platform and ensures that the timer and service files match the new updater. Note that the hidden --self-setup flag can be passed to exec the same binary, if necessary.
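As a rough illustration of what writing a timer and service file can involve, the sketch below renders a oneshot service unit and a timer unit from templates. It is not the code in this PR: the unitParams type, the renderUnit helper, the update subcommand in ExecStart, the timer description, and the interval values are all assumptions made for the example. Only the service description string is taken from the log output in this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// unitParams is a hypothetical parameter set for this sketch.
type unitParams struct {
	UpdaterPath string // assumed absolute path to the teleport-update binary
	Interval    string // assumed systemd time span between runs, e.g. "5m"
}

// serviceTmpl is an assumed oneshot service that runs the updater once.
const serviceTmpl = `[Unit]
Description=Teleport auto-update service

[Service]
Type=oneshot
ExecStart={{.UpdaterPath}} update
`

// timerTmpl is an assumed timer that re-triggers the service periodically.
const timerTmpl = `[Unit]
Description=Teleport auto-update timer unit

[Timer]
OnActiveSec=1m
OnUnitActiveSec={{.Interval}}

[Install]
WantedBy=timers.target
`

// renderUnit fills a unit template with the given parameters.
func renderUnit(tmpl string, p unitParams) (string, error) {
	t, err := template.New("unit").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, p); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	p := unitParams{UpdaterPath: "/usr/local/bin/teleport-update", Interval: "5m"}
	for _, tmpl := range []string{serviceTmpl, timerTmpl} {
		out, err := renderUnit(tmpl, p)
		if err != nil {
			fmt.Println("render failed:", err)
			return
		}
		fmt.Print(out)
	}
}
```

Rendering from the new binary, rather than the old one, is what keeps the unit files in step with the updater version that will actually run them.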

Additionally, this PR adds support for detecting various failures during upgrades. teleport-update rolls back the agent immediately in these cases.

This is accomplished by monitoring /run/teleport.pid for changes that indicate different failure modes. For example, if Teleport crashes after a soft reload, systemd is unaware, and a stale PID is present with no running process. Alternatively, if Teleport crashes after a hard restart, the PID file is rapidly created/removed with different PID values. Other cases, such as hanging on quit, are covered as well. This catches fatal errors in new versions, as well as client-too-new errors.

Notably, connection failures, including clients rejected by the server for being outdated, do not trigger a revert.
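The two PID file failure modes described above can be sketched as a classifier over periodic samples. This is a simplified illustration, not the implementation in this PR: the sample type, the classify function, the error variables, and the maxChanges threshold are hypothetical, and a real monitor would likely wait for a stale PID to persist before acting rather than returning on the first stale sample.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values for this sketch.
var (
	errStalePID = errors.New("detected stale PID")
	errCrashing = errors.New("detected crashing process")
)

// sample is one reading of the PID file taken at a fixed interval.
type sample struct {
	PID   int  // 0 means the PID file was missing or empty
	Alive bool // whether a process with this PID existed when sampled
}

// classify scans samples in order. It reports a stale PID when the file
// names a process that is not running, and a crashing process when the
// PID changes more than maxChanges times (an assumed tolerance; a single
// change is expected after a graceful reload).
func classify(samples []sample, maxChanges int) error {
	changes, prev := 0, 0
	for _, s := range samples {
		if s.PID == 0 {
			continue // no PID file at this tick; keep watching
		}
		if !s.Alive {
			return errStalePID
		}
		if prev != 0 && s.PID != prev {
			changes++
			if changes > maxChanges {
				return errCrashing
			}
		}
		prev = s.PID
	}
	return nil
}

func main() {
	// A hard-restart crash loop: the PID file is recreated with new PIDs.
	crashLoop := []sample{{100, true}, {0, false}, {101, true}, {0, false}, {102, true}}
	fmt.Println(classify(crashLoop, 1))
}
```

Because the classifier only looks at local process state, a running agent that cannot connect to the cluster still counts as healthy here, which matches the note that connection failures do not trigger a revert.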

This is the seventh in a series of PRs implementing teleport-update:
Link Command: #48712
Update Command: #48244
Reloading with rollbacks: #47929
Linking: #47879
Enable Command: #47565
Initial scaffolding PR: #46418

The teleport-update binary will be used to enable, disable, and trigger automatic Teleport agent updates. The new auto-updates system manages a local installation of the cluster-specified version of Teleport stored in /var/lib/teleport/versions.

RFD: #47126
Goal (internal): https://github.com/gravitational/cloud/issues/10289

Example: Upgrading to v17 on a v16 cluster, with successful rollback.

Nov 19 02:44:44 legendary-mite systemd[1]: Starting teleport-update.service - Teleport auto-update service...
Nov 19 02:44:45 legendary-mite teleport-update[595947]: 2024-11-19T02:44:45Z INFO [UPDATER]   Update available. Initiating update. target_version:17.0.1 active_version:16.4.7 agent/updater.go:475
Nov 19 02:45:46 legendary-mite teleport-update[595947]: 2024-11-19T02:45:46Z INFO [UPDATER]   Version already present. version:17.0.1 agent/installer.go:153
Nov 19 02:45:46 legendary-mite teleport-update[595947]: 2024-11-19T02:45:46Z INFO [UPDATER]   Executing new teleport-update binary to update configuration. agent/updater.go:185
Nov 19 02:45:46 legendary-mite teleport-update[596859]: 2024-11-19T02:45:46Z INFO [UPDATER]   Systemd configuration synced. agent/process.go:253
Nov 19 02:45:46 legendary-mite teleport-update[596859]: 2024-11-19T02:45:46Z INFO [UPDATER]   Service enabled. unit:teleport-update.timer agent/process.go:270
Nov 19 02:45:46 legendary-mite teleport-update[595947]: 2024-11-19T02:45:46Z INFO [UPDATER]   Finished executing new teleport-update binary. agent/updater.go:187
Nov 19 02:45:47 legendary-mite teleport-update[595947]: 2024-11-19T02:45:47Z INFO [UPDATER]   Systemd configuration synced. agent/process.go:253
Nov 19 02:45:47 legendary-mite teleport-update[595947]: 2024-11-19T02:45:47Z INFO [UPDATER]   Target version successfully installed. target_version:17.0.1 agent/updater.go:568
Nov 19 02:45:47 legendary-mite teleport-update[595947]: 2024-11-19T02:45:47Z INFO [UPDATER]   Gracefully reloaded. unit:teleport.service agent/process.go:110
Nov 19 02:45:47 legendary-mite teleport-update[595947]: 2024-11-19T02:45:47Z INFO [UPDATER]   Monitoring PID file to detect crashes. unit:teleport.service agent/process.go:113
Nov 19 02:45:51 legendary-mite teleport-update[595947]: 2024-11-19T02:45:51Z WARN [UPDATER]   Detected stale PID. unit:teleport.service pid:597038 agent/process.go:194
Nov 19 02:45:57 legendary-mite teleport-update[595947]: 2024-11-19T02:45:57Z ERRO [UPDATER]   Reverting symlinks due to failed restart. agent/updater.go:578
Nov 19 02:45:57 legendary-mite teleport-update[595947]: 2024-11-19T02:45:57Z INFO [UPDATER]   Systemd configuration synced. agent/process.go:253
Nov 19 02:45:57 legendary-mite teleport-update[595947]: 2024-11-19T02:45:57Z ERRO [UPDATER]   [stderr] Job for teleport.service failed. agent/process.go:362
Nov 19 02:45:57 legendary-mite teleport-update[595947]: 2024-11-19T02:45:57Z ERRO [UPDATER]   [stderr] See "systemctl status teleport.service" and "journalctl -xeu teleport.service" for details. agent/process.go:368
Nov 19 02:45:57 legendary-mite teleport-update[595947]: 2024-11-19T02:45:57Z ERRO [UPDATER]   Error running systemctl. args:[reload teleport.service] code:1 agent/process.go:298
Nov 19 02:45:57 legendary-mite teleport-update[595947]: 2024-11-19T02:45:57Z WARN [UPDATER]   Service ungracefully restarted. Connections potentially dropped. unit:teleport.service agent/process.go:108
Nov 19 02:45:57 legendary-mite teleport-update[595947]: 2024-11-19T02:45:57Z INFO [UPDATER]   Monitoring PID file to detect crashes. unit:teleport.service agent/process.go:113
Nov 19 02:46:11 legendary-mite teleport-update[595947]: 2024-11-19T02:46:11Z WARN [UPDATER]   Teleport updater encountered a configuration error and successfully reverted the installation. agent/updater.go:586
Nov 19 02:46:11 legendary-mite teleport-update[595947]: ERROR: failed to start new version "17.0.1" of Teleport: detected crashing process
Nov 19 02:46:11 legendary-mite systemd[1]: teleport-update.service: Main process exited, code=exited, status=1/FAILURE
Nov 19 02:46:11 legendary-mite systemd[1]: teleport-update.service: Failed with result 'exit-code'.
Nov 19 02:46:11 legendary-mite systemd[1]: Failed to start teleport-update.service - Teleport auto-update service.
^C
ubuntu@legendary-mite:~$ ls -la /usr/local/bin/
total 12
drwxr-xr-x  2 root root 4096 Nov 19 02:45 .
drwxr-xr-x 11 root root 4096 Nov 13 01:41 ..
lrwxrwxrwx  1 root root   53 Nov 19 02:45 fdpass-teleport -> /var/lib/teleport/versions/16.4.7/bin/fdpass-teleport
lrwxrwxrwx  1 root root   42 Nov 19 02:45 tbot -> /var/lib/teleport/versions/16.4.7/bin/tbot
lrwxrwxrwx  1 root root   42 Nov 19 02:45 tctl -> /var/lib/teleport/versions/16.4.7/bin/tctl
lrwxrwxrwx  1 root root   46 Nov 19 02:45 teleport -> /var/lib/teleport/versions/16.4.7/bin/teleport
lrwxrwxrwx  1 root root   65 Nov 17 22:29 teleport-update -> /home/ubuntu/mounts/teleport/
ubuntu@legendary-mite:~$ systemctl status teleport
● teleport.service - Teleport Service
     Loaded: loaded (/usr/lib/systemd/system/teleport.service; enabled; preset: enabled)
     Active: active (running) since Tue 2024-11-19 03:09:20 UTC; 4min 42s ago

@sclevine

Split:
#49174
#49175

@sclevine sclevine closed this Nov 19, 2024