Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve client tools host resolution #50175

Merged
merged 1 commit into from
Dec 20, 2024
Merged

Conversation

rosstimothy
Copy link
Contributor

@rosstimothy rosstimothy commented Dec 12, 2024

Host resolution performed because labels, fuzzy search, or predicate expressions were supplied to commands that establish connections to a single host has historically been performed client side in tsh. While that works in most cases, it can prevent correctly resolving hosts in some situations, i.e. when there are ambiguous hosts and tsh is unaware that the cluster routing strategy is set to ROUTE_TO_MOST_RECENT.

To improve the experience, a new ResolveSSHTarget was added to Auth to allow host resolution to be performed server side. The resolution works in a similar manner to, and was inspired by GetSSHTargets. In the event that the new RPC is not implemented, because the client is newer than Auth, tsh has also been updated to pull the cluster networking config and address any host ambiguity if allowed.

As a result tsh scp and tsh proxy ssh should be much more tolerant to, and still permit access in situations where ambiguous hosts are present for some amount of time. Prior to this the only way to connect in these situations was to find the UUID of the correct target instance and try again after seeing an ambiguous host error.

Changelog: Honor the cluster routing strategy when client initiated host resolution via proxy templates or label matching is ambiguous.

@rosstimothy rosstimothy force-pushed the tross/host_resolution branch 7 times, most recently from 8cdb983 to 84d6013 Compare December 16, 2024 18:30
@rosstimothy rosstimothy marked this pull request as ready for review December 16, 2024 19:42
@github-actions github-actions bot added size/md tsh tsh - Teleport's command line tool for logging into nodes running Teleport. labels Dec 16, 2024
@github-actions github-actions bot requested review from creack and hugoShaka December 16, 2024 19:43
@rosstimothy rosstimothy changed the title Improve tsh host resolution Improve client tools host resolution Dec 16, 2024
@rosstimothy rosstimothy force-pushed the tross/host_resolution branch from b85e58f to f964532 Compare December 18, 2024 13:50
lib/auth/auth_with_roles.go Outdated Show resolved Hide resolved
lib/auth/auth_with_roles.go Outdated Show resolved Hide resolved
Comment on lines +828 to 837
// ErrNodeIsAmbiguous serves as an identifying error string indicating that
// the proxy subsystem found multiple nodes matching the specified hostname.
var ErrNodeIsAmbiguous = &trace.NotFoundError{Message: "ambiguous host could match multiple nodes"}

const (
// NodeIsAmbiguous serves as an identifying error string indicating that
// the proxy subsystem found multiple nodes matching the specified hostname.
// TODO(tross) DELETE IN v20.0.0
// Deprecated: Prefer using ErrNodeIsAmbiguous
NodeIsAmbiguous = "err-node-is-ambiguous"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I might be missing something super obvious, but I'm not seeing where/how we're preserving the meaningfulness of ErrNodeIsAmbiguous across network boundaries/ssh. Is there some magic error handling I'm missing that is making sure that errors.Is(...) is working correctly when ErrNodeIsAmbiguous is generated during routing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ErrNodeIsAmbiguous should be unpacked after traversing network boundaries automagically by our interceptors which make use of trail.FromGRPC. The only thing that maybe won't work is the wrapping I added to include NodeIsAmbiguous for backwards compatibility. I might have to actually include that in the error message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, I follow now. gravitational/trace has an implementation of the Is method that takes message string equality into account. Depending on entire message string equality feels a lot more brittle than depending on the presence of a single unique identifier within the message. It basically locks us into this exact error message forever. Also, string equality doesn't seem to usually be something that errors.Is cares about, and I think most people touching an error that is intended to be used with errors.Is wouldn't think of string equality as a potential compatibility issue. My preference would be to leave this the way it was.

api/proto/teleport/legacy/client/proto/authservice.proto Outdated Show resolved Hide resolved
api/proto/teleport/legacy/client/proto/authservice.proto Outdated Show resolved Hide resolved
lib/auth/auth_with_roles.go Outdated Show resolved Hide resolved
lib/auth/auth_with_roles.go Outdated Show resolved Hide resolved
})
switch {
//TODO(tross): DELETE IN v20.0.0
case trace.IsNotImplemented(err):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this case to a separate method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to avoid this "legacy" behavior being in it's own method because I don't think it should ever be called directly. Even with warnings and disclaimers added to the separate method it could still be abused. I'd rather force all callers to use a single unified way of retrieving hosts than each implement something slightly different that doesn't respect proxy templates or the routing strategy.

}
case err == nil:
if resp.GetServer() == nil {
return nil, trace.NotFound("no matching SSH hosts found")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a version mismatch or a bug, not a NotFound IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't quite sure what would be the best error type to return here. I was trying to avoid anything that would trigger our automatic reauthentication as that is likely not going to help in this scenario.

lib/client/api.go Outdated Show resolved Hide resolved
lib/tbot/service_ssh_multiplexer.go Show resolved Hide resolved
switch {
//TODO(tross): DELETE IN v20.0.0
case trace.IsNotImplemented(err):
resources, err := client.GetAllUnifiedResources(ctx, clt, &proto.ListUnifiedResourcesRequest{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you move this section back into its own method it'll just be a renaming of the old method.

@rosstimothy rosstimothy force-pushed the tross/host_resolution branch from f964532 to 05b337b Compare December 19, 2024 22:39
Host resolution performed because labels, fuzzy search, or predicate
expressions were supplied to commands that establish connections to
a single host has historically been performed client side in tsh.
While that works in most cases, it can prevent correctly
resolving hosts in some situations, i.e. when there are ambiguous
hosts and tsh is unaware that the cluster routing strategy is set
to ROUTE_TO_MOST_RECENT.

To improve the experience, a new ResolveSSHTarget was added to
Auth to allow host resolution to be performed server side. The
resolution works in a similar manner to, and was inspired by
GetSSHTargets. In the event that the new RPC is not implemented,
because the client is newer than Auth, tsh has also been updated
to pull the cluster networking config and address any host
ambiguity if allowed.

As a result tsh scp and tsh proxy ssh should be much more
tolerant to, and still permit access in situations where
ambiguous hosts are present for some amount of time. Prior to
this the only way to connect in these situations was to find
the UUID of the correct target instance and try again after
seeing an ambiguous host error.
@rosstimothy rosstimothy force-pushed the tross/host_resolution branch from 6cf0529 to 67136e4 Compare December 20, 2024 15:08
@rosstimothy rosstimothy added this pull request to the merge queue Dec 20, 2024
Merged via the queue into master with commit c61461b Dec 20, 2024
43 checks passed
@rosstimothy rosstimothy deleted the tross/host_resolution branch December 20, 2024 16:16
@public-teleport-github-review-bot

@rosstimothy See the table below for backport results.

Branch Result
branch/v16 Failed
branch/v17 Failed

rosstimothy added a commit that referenced this pull request Jan 6, 2025
Host resolution performed because labels, fuzzy search, or predicate
expressions were supplied to commands that establish connections to
a single host has historically been performed client side in tsh.
While that works in most cases, it can prevent correctly
resolving hosts in some situations, i.e. when there are ambiguous
hosts and tsh is unaware that the cluster routing strategy is set
to ROUTE_TO_MOST_RECENT.

To improve the experience, a new ResolveSSHTarget was added to
Auth to allow host resolution to be performed server side. The
resolution works in a similar manner to, and was inspired by
GetSSHTargets. In the event that the new RPC is not implemented,
because the client is newer than Auth, tsh has also been updated
to pull the cluster networking config and address any host
ambiguity if allowed.

As a result tsh scp and tsh proxy ssh should be much more
tolerant to, and still permit access in situations where
ambiguous hosts are present for some amount of time. Prior to
this the only way to connect in these situations was to find
the UUID of the correct target instance and try again after
seeing an ambiguous host error.
rosstimothy added a commit that referenced this pull request Jan 6, 2025
Host resolution performed because labels, fuzzy search, or predicate
expressions were supplied to commands that establish connections to
a single host has historically been performed client side in tsh.
While that works in most cases, it can prevent correctly
resolving hosts in some situations, i.e. when there are ambiguous
hosts and tsh is unaware that the cluster routing strategy is set
to ROUTE_TO_MOST_RECENT.

To improve the experience, a new ResolveSSHTarget was added to
Auth to allow host resolution to be performed server side. The
resolution works in a similar manner to, and was inspired by
GetSSHTargets. In the event that the new RPC is not implemented,
because the client is newer than Auth, tsh has also been updated
to pull the cluster networking config and address any host
ambiguity if allowed.

As a result tsh scp and tsh proxy ssh should be much more
tolerant to, and still permit access in situations where
ambiguous hosts are present for some amount of time. Prior to
this the only way to connect in these situations was to find
the UUID of the correct target instance and try again after
seeing an ambiguous host error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/branch/v16 backport/branch/v17 size/md tsh tsh - Teleport's command line tool for logging into nodes running Teleport.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants