Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[teleport-update] Add enable command #47565

Merged
merged 24 commits into from
Oct 21, 2024
Merged

Conversation

sclevine
Copy link
Member

@sclevine sclevine commented Oct 15, 2024

This PR adds the enable subcommand to the teleport-update binary specified by #40190.

The teleport-update binary will be used to enable, disable, and trigger automatic Teleport agent updates. The new auto-updates system manages a local installation of the cluster-specified version of Teleport stored in /var/lib/teleport/versions.

This is the second in a series of PRs extracted and tested from this proof-of-concept: #46357
Initial scaffolding PR: #46418

This PR is missing a number of features necessary for the teleport-update binary to ship, including:

  • Old version cleanup
  • Restarting the Teleport agent

RFD: #47126
Goal (internal): https://github.com/gravitational/cloud/issues/10289


root@7e41707a19a4:/teleport# ./tool/teleport-update/teleport-update enable -h
usage: teleport-updater enable [<flags>]

Enable agent auto-updates and perform initial update.

Flags:
  -d, --[no-]debug  Verbose logging to stdout.
      --data-dir    Teleport data directory. Access to this directory should be limited.
      --log-format  Controls the format of output logs. Can be `json` or `text`. Defaults to `text`.
  -p, --proxy       Address of the Teleport Proxy.
  -g, --group       Update group for this agent installation.
  -t, --template    Go template used to override Teleport download URL.
  -f, --version     Force the provided version instead of querying it from the Teleport cluster.

Example:

root@7e41707a19a4:/teleport# ./tool/teleport-update/teleport-update enable --data-dir=./temp/ --proxy=levine.teleport.sh
2024-10-15T01:19:20Z INFO [UPDATER]   Downloading Teleport tarball. url:https://cdn.teleport.dev/teleport-v16.3.0-linux-arm64-bin.tar.gz size:153895772 autoupdate/installer.go:240
2024-10-15T01:19:25Z INFO [UPDATER]   Extracting Teleport tarball. path:temp/versions/16.3.0 size:615014400 autoupdate/installer.go:277
2024-10-15T01:19:29Z INFO [UPDATER]   Target version successfully installed. version:16.3.0 autoupdate/agent.go:242
2024-10-15T01:19:29Z INFO [UPDATER]   Configuration updated. autoupdate/agent.go:251
root@7e41707a19a4:/teleport# ./tool/teleport-update/teleport-update enable --data-dir=./temp/ --proxy=levine.teleport.sh
2024-10-15T01:19:33Z INFO [UPDATER]   Target version already installed. version:16.3.0 autoupdate/agent.go:244
2024-10-15T01:19:33Z INFO [UPDATER]   Configuration updated. autoupdate/agent.go:251
root@7e41707a19a4:/teleport# ls temp/versions/
16.3.0  update.yaml
root@7e41707a19a4:/teleport# cat temp/versions/update.yaml
version: v1
kind: update_config
spec:
    proxy: levine.teleport.sh
    group: ""
    url_template: ""
    enabled: true
status:
    active_version: 16.3.0
root@7e41707a19a4:/teleport# ./tool/teleport-update/teleport-update disable --data-dir=./temp/
root@7e41707a19a4:/teleport# cat temp/versions/update.yaml
version: v1
kind: update_config
spec:
    proxy: levine.teleport.sh
    group: ""
    url_template: ""
    enabled: false
status:
    active_version: 16.3.0

@sclevine sclevine added the no-changelog Indicates that a PR does not require a changelog entry label Oct 15, 2024
@sclevine sclevine requested review from vapopov and hugoShaka October 15, 2024 01:28
lib/autoupdate/agent.go Outdated Show resolved Hide resolved
lib/autoupdate/agent.go Outdated Show resolved Hide resolved
lib/autoupdate/agent.go Outdated Show resolved Hide resolved
lib/autoupdate/agent.go Outdated Show resolved Hide resolved
lib/autoupdate/installer.go Outdated Show resolved Hide resolved
lib/autoupdate/agent.go Outdated Show resolved Hide resolved
lib/autoupdate/agent.go Outdated Show resolved Hide resolved
lib/autoupdate/installer.go Outdated Show resolved Hide resolved
@sclevine sclevine requested a review from hugoShaka October 16, 2024 21:19
@sclevine sclevine requested a review from vapopov October 16, 2024 21:19
Copy link
Contributor

@vapopov vapopov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

)

var (
featureFlag int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would probably need to move this one to lib/autoupdate level, since both agent and tools packages use this variable.
Also updateVersionEnvVar = "TELEPORT_UPDATE_VERSION" kind of common
Can do it in mine PR after merging this one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good for now. Eventually I think the agent updater should query the cluster for this.

Copy link
Contributor

@bl-nero bl-nero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great change, I like how easy it is to understand everything because of appropriate use of comments and overall clear structure.

// cdnURITemplate is the default template for the Teleport tgz download.
cdnURITemplate = "https://cdn.teleport.dev/teleport{{if .Enterprise}}-ent{{end}}-v{{.Version}}-{{.OS}}-{{.Arch}}{{if .FIPS}}-fips{{end}}-bin.tar.gz"
// reservedFreeDisk is the minimum required free space left on disk during downloads.
reservedFreeDisk = 10_000_000 // 10 MB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did we choose this value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arbitrarily 🙂

It might be better to use a percentage of the total space along with a maximum. I'll expand on the comment and add a TODO.

lib/autoupdate/agent/updater.go Outdated Show resolved Hide resolved
// UpdateStatus describes the status field in update.yaml.
type UpdateStatus struct {
// ActiveVersion is the currently active Teleport version.
ActiveVersion string `yaml:"active_version"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we prevent this from drifting away from the actual state? I guess we shouldn't rely on a running teleport binary to report its own version, since if an agent crashes, there's nothing to query, but I'd recommend making an effort to update this every time an agent starts (unless it's something you've already thought about.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR, active_version is the only source of truth. This will become a problem in the next PR, which links the installation into /usr/local/bin, creating two definitions of active.

This is tricky to solve, since there's no way to link multiple binaries into /usr/local/bin atomically in a way that's resilient to, e.g., power loss. The strategy in the next PR will be to write active_version only after the install and linking is successful. That way, drift should only happen where the installed version is newer (or partially newer) than the active_version, which will get corrected by the updater on the next invocation (by checking the symlink paths).

This file is only written by teleport-update. teleport, tbot, and other agent-like processes that run concurrently with teleport-update are only intended to read from it.

// Installer provides an API for installing Teleport agents.
type Installer interface {
// Install the Teleport agent at version from the download template.
Install(ctx context.Context, version, template string) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to pass the template from outside? If we create an installer knowing the existing config, perhaps the installer itself could already be aware of the template string it should use.

Copy link
Member Author

@sclevine sclevine Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The download URL template is also persisted into update.yaml (in the case that it is overridden by the user). It could be confusing to pass the template to Enable for storage, but not use Enable's copy of the value during the installation that Enable initiates.

Remove(ctx context.Context, version string) error
}

// UserConfig contains user overrides for installing Teleport agents.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use "command line flags" instead of "user overrides" in this comment? It may be even better to rename the struct name to something like that.

Unless, of course, this structure can/will be filled from something other than CLI flags.

Copy link
Member Author

@sclevine sclevine Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, meant to rename this after refactoring. Trying to keep all of the CLI-related terminology in tool/teleport-update, but this should be named something like OverrideConfig

lib/autoupdate/agent/updater_test.go Outdated Show resolved Hide resolved
package agent

func init() {
featureFlag |= flagEnt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the first time I see something like this. How exactly does it work? Is there some sort of conditional compilation built into our makefiles?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this from #47466. Both the client and agent packages will use the same flag set from lib/autoupdate in an upcoming PR. The build flags should be set when compiled via teleport.e. See the top of the file:

//go:build webassets_ent
// +build webassets_ent

@vapopov I'm noticing now that this flag is only used in webassets_embed_ent.go, and the webassets prefix is confusing in this non-web-asset context. Would it make sense to look into making this flag more generic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I previously mentioned about this one #46587 (comment) since we don't have proper build flag for ent

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the modules package for this instead of making new flags? This package can already tell you if the binary is OSS, Enterprise, FIPS, etc.

Copy link
Member Author

@sclevine sclevine Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we open a separate PR that renames the build flag so that it can be used in a wider context? (race condition on the Github comments)

Copy link
Member Author

@sclevine sclevine Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think modules.GetModules().IsEnterpriseBuild() will work as-is for teleport-update, since it doesn't have a separate tool/teleport-update target in teleport.e that runs modules.SetModules().

That said, we actually want the edition of the agent download to come from the cluster via /webapi/find before this ships, so I'll just default it to false and add a TODO for now.

@vapopov you should be able to modules for tsh and tctl, but tbot may not work for the reason mentioned.

lib/autoupdate/agent/installer.go Outdated Show resolved Hide resolved
return trace.Errorf("mismatched checksum, download possibly corrupt")
}
// Get uncompressed size of the tgz
n, err := uncompressedSize(f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: since we don't use n anywhere else, perhaps checking for uncompressed size should belong to the extract function. Up to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on this, but ended up leaving it out to simplify the contract between Install and helpers like extract. If we calculate the uncompressed size inside extract, we need an io.ReadSeeker, and I wanted to keep all of the seeking in the same function.

lib/autoupdate/agent/installer.go Outdated Show resolved Hide resolved
@bl-nero
Copy link
Contributor

bl-nero commented Oct 18, 2024

@sclevine This comment is probably the only serious one here, the rest of it is minor stuff.

@sclevine sclevine requested a review from bl-nero October 18, 2024 18:20
return nil
}
li.Log.WarnContext(ctx, "Removing version that does not match checksum.", "version", version)
if err := li.Remove(ctx, version); err != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this will remove the active installation if the checksum does not match. In the next PR, Remove will error if we try to remove a linked (and therefore active) installation.

@public-teleport-github-review-bot public-teleport-github-review-bot bot removed the request for review from probakowski October 21, 2024 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-changelog Indicates that a PR does not require a changelog entry size/lg
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants