-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[teleport-update] Add enable command #47565
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
lib/autoupdate/agent/updater.go
Outdated
) | ||
|
||
var ( | ||
featureFlag int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would probably need to move this one to lib/autoupdate
level, since both agent and tools packages use this variable.
Also updateVersionEnvVar = "TELEPORT_UPDATE_VERSION"
kind of common
Can do it in mine PR after merging this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good for now. Eventually I think the agent updater should query the cluster for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great change, I like how easy it is to understand everything because of appropriate use of comments and overall clear structure.
// cdnURITemplate is the default template for the Teleport tgz download. | ||
cdnURITemplate = "https://cdn.teleport.dev/teleport{{if .Enterprise}}-ent{{end}}-v{{.Version}}-{{.OS}}-{{.Arch}}{{if .FIPS}}-fips{{end}}-bin.tar.gz" | ||
// reservedFreeDisk is the minimum required free space left on disk during downloads. | ||
reservedFreeDisk = 10_000_000 // 10 MB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did we choose this value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arbitrarily 🙂
It might be better to use a percentage of the total space along with a maximum. I'll expand on the comment and add a TODO.
// UpdateStatus describes the status field in update.yaml. | ||
type UpdateStatus struct { | ||
// ActiveVersion is the currently active Teleport version. | ||
ActiveVersion string `yaml:"active_version"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we prevent this from drifting away from the actual state? I guess we shouldn't rely on a running teleport
binary to report its own version, since if an agent crashes, there's nothing to query, but I'd recommend making an effort to update this every time an agent starts (unless it's something you've already thought about.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this PR, active_version
is the only source of truth. This will become a problem in the next PR, which links the installation into /usr/local/bin
, creating two definitions of active.
This is tricky to solve, since there's no way to link multiple binaries into /usr/local/bin
atomically in a way that's resilient to, e.g., power loss. The strategy in the next PR will be to write active_version
only after the install and linking is successful. That way, drift should only happen where the installed version is newer (or partially newer) than the active_version
, which will get corrected by the updater on the next invocation (by checking the symlink paths).
This file is only written by teleport-update
. teleport
, tbot
, and other agent-like processes that run concurrently with teleport-update
are only intended to read from it.
lib/autoupdate/agent/updater.go
Outdated
// Installer provides an API for installing Teleport agents. | ||
type Installer interface { | ||
// Install the Teleport agent at version from the download template. | ||
Install(ctx context.Context, version, template string) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to pass the template from outside? If we create an installer knowing the existing config, perhaps the installer itself could already be aware of the template string it should use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The download URL template is also persisted into update.yaml
(in the case that it is overridden by the user). It could be confusing to pass the template to Enable
for storage, but not use Enable
's copy of the value during the installation that Enable
initiates.
lib/autoupdate/agent/updater.go
Outdated
Remove(ctx context.Context, version string) error | ||
} | ||
|
||
// UserConfig contains user overrides for installing Teleport agents. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use "command line flags" instead of "user overrides" in this comment? It may be even better to rename the struct name to something like that.
Unless, of course, this structure can/will be filled from something other than CLI flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, meant to rename this after refactoring. Trying to keep all of the CLI-related terminology in tool/teleport-update
, but this should be named something like OverrideConfig
lib/autoupdate/agent/feature_ent.go
Outdated
package agent | ||
|
||
func init() { | ||
featureFlag |= flagEnt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the first time I see something like this. How exactly does it work? Is there some sort of conditional compilation built into our makefiles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this from #47466. Both the client
and agent
packages will use the same flag set from lib/autoupdate
in an upcoming PR. The build flags should be set when compiled via teleport.e
. See the top of the file:
//go:build webassets_ent
// +build webassets_ent
@vapopov I'm noticing now that this flag is only used in webassets_embed_ent.go
, and the webassets
prefix is confusing in this non-web-asset context. Would it make sense to look into making this flag more generic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I previously mentioned about this one #46587 (comment) since we don't have proper build flag for ent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the modules
package for this instead of making new flags? This package can already tell you if the binary is OSS, Enterprise, FIPS, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we open a separate PR that renames the build flag so that it can be used in a wider context? (race condition on the Github comments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think modules.GetModules().IsEnterpriseBuild()
will work as-is for teleport-update
, since it doesn't have a separate tool/teleport-update
target in teleport.e that runs modules.SetModules()
.
That said, we actually want the edition of the agent download to come from the cluster via /webapi/find
before this ships, so I'll just default it to false and add a TODO for now.
@vapopov you should be able to modules
for tsh
and tctl
, but tbot
may not work for the reason mentioned.
return trace.Errorf("mismatched checksum, download possibly corrupt") | ||
} | ||
// Get uncompressed size of the tgz | ||
n, err := uncompressedSize(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: since we don't use n
anywhere else, perhaps checking for uncompressed size should belong to the extract
function. Up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back and forth on this, but ended up leaving it out to simplify the contract between Install
and helpers like extract
. If we calculate the uncompressed size inside extract
, we need an io.ReadSeeker
, and I wanted to keep all of the seeking in the same function.
@sclevine This comment is probably the only serious one here, the rest of it is minor stuff. |
return nil | ||
} | ||
li.Log.WarnContext(ctx, "Removing version that does not match checksum.", "version", version) | ||
if err := li.Remove(ctx, version); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this will remove the active installation if the checksum does not match. In the next PR, Remove
will error if we try to remove a linked (and therefore active) installation.
This PR adds the
enable
subcommand to theteleport-update
binary specified by #40190.The
teleport-update
binary will be used to enable, disable, and trigger automatic Teleport agent updates. The new auto-updates system manages a local installation of the cluster-specified version of Teleport stored in/var/lib/teleport/versions
.This is the second in a series of PRs extracted and tested from this proof-of-concept: #46357
Initial scaffolding PR: #46418
This PR is missing a number of features necessary for the
teleport-update
binary to ship, including:RFD: #47126
Goal (internal): https://github.com/gravitational/cloud/issues/10289
Example: