Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split user keys for k8s access #44779

Merged
merged 6 commits into from
Aug 13, 2024
Merged

Split user keys for k8s access #44779

merged 6 commits into from
Aug 13, 2024

Conversation

nklaassen
Copy link
Contributor

This PR is part of the implementation of RFD 136.

The main change here is that tsh now uses a unique private key every time it gets a new k8s cert issued. This new key will use a signature algorithm according to the cluster's currently configured signature_algorithm_suite.

This is very similar to the parent PR #44718

@github-actions github-actions bot requested review from atburke and strideynet July 29, 2024 22:27
@github-actions github-actions bot added machine-id size/sm tsh tsh - Teleport's command line tool for logging into nodes running Teleport. labels Jul 29, 2024
@nklaassen nklaassen added the no-changelog Indicates that a PR does not require a changelog entry label Jul 29, 2024
@gravitational gravitational deleted a comment from github-actions bot Jul 30, 2024
@zmb3 zmb3 requested review from AntonAM and tigrato July 31, 2024 14:57
// - $TSH_HOME/keys/$PROXY/$USER-kube/$TELEPORT_CLUSTER/$KUBE_CLUSTER-x509.pem
// - $TSH_HOME/keys/$PROXY/$USER
// - $TSH_HOME/keys/$PROXY/$USER-kube/$TELEPORT_CLUSTER/$KUBE_CLUSTER.crt
// - $TSH_HOME/keys/$PROXY/$USER-kube/$TELEPORT_CLUSTER/$KUBE_CLUSTER.key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this means that we generate a private key per kube cluster?
Is there any reason why we do not generate a single pk per teleport cluster given that it satisfies the cluster signature algo

seems a bit slow to generate a new rsa pk every time we switch to a different cluster

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this means that we generate a private key per kube cluster?

Yes

Is there any reason why we do not generate a single pk per teleport cluster given that it satisfies the cluster signature algo

seems a bit slow to generate a new rsa pk every time we switch to a different cluster

Two reasons to use unique private keys:

  1. better isolation in case a single protocol or connection compromises the key somehow
  2. better flexibility to possibly use a different key algorithm for different clusters in the future, without another breaking change to the file layout

With ECDSA keys I'd like to keep it this way, but if you think rsa keygen will be too slow client side, I can make a change to reuse a single RSA TLS private key. We could still save that same key at this file path for each cluster, just duplicated, so that we won't need another breaking change to file paths

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think I should reuse RSA keys I'd do it for all protocols, I can handle it in another PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we reuse RSA keys, considering how slow they are to generate.

Additionally, you need to handle a special case when multiple commands are executed in parallel. Currently, we use the private key (PK) generated during tsh login, and each time someone invokes kubectl ..., it calls tsh kube credentials.

tsh kube credentials is responsible for asking auth to sign the certificate for a given cluster, which it then stores on disk. If multiple kubectl calls happen in parallel and all generate certificates, it's fine because the PK remains the same, and only the certificate is rewritten to disk (which ends up being the same for all signing requests). Although this behavior is racy, it has no downsides.

However, with a unique PK per cluster, you might end up writing a PK and a signed certificate that aren't a cert-key pair, preventing TLS from being established. The resulting error will be difficult to understand.

Note that tsh kube login doesn't generate keys; only tsh kube credentials does, and it is invoked with each call to kubectl. If you have a script or run a tool like Kubelens or k9s, they end up invoking tsh kube credentials several times per second, almost certainly causing this racy behavior to manifest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points

I'm reusing RSA keys in the latest

I also added file locking when reading and writing key/cert pairs now so that processes won't do interleaved reads/writes of non-matching keys/certs. Whenever reading or writing the 2 files, the process will grab an OS file lock on the key file.

The file locking makes LoadKeysToKubeFromStore about 30% slower due to the extra syscalls, not ideal but not sure it's avoidable. On the upside, working with ECDSA keys will be much faster than RSA in other areas. It's about 4% faster just to read the (smaller) files, and in my experience cert signing and mTLS dials will be ~10x faster

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be crazy and store the cert and key in the same file instead of separate files? That would solve the lock issue and make it faster to load

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For kube we probably can if never have other clients load the key/cert. Couldn't do it for apps/dbs though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that for kube it's probably worth it. We had customers in the past that had more than 30 kubectl invocations per second and it was really slow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack I'll put that in on monday

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tigrato done in the latest

LoadKeysToKubeFromStore is now ~39% faster here than it is on master

$ benchstat master.txt split.txt unified.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/client
                                                   │  master.txt  │               split.txt               │             unified.txt             │
                                                   │    sec/op    │    sec/op     vs base                 │   sec/op     vs base                │
LoadKeysToKubeFromStore/LoadKeysToKubeFromStore-12   162.34µ ± 0%   211.42µ ± 0%   +30.24% (p=0.000 n=10)   99.22µ ± 0%  -38.88% (p=0.000 n=10)
LoadKeysToKubeFromStore/GetKeyRing-12                 1.030m ± 0%    2.256m ± 0%  +119.05% (p=0.000 n=10)   1.174m ± 0%  +13.97% (p=0.000 n=10)
geomean                                               408.9µ         690.7µ        +68.91%                  341.3µ       -16.54%

GetKeyRing is still a bit slower because it reads the TLS key cert, and I added a file lock here

lib/client/kube/kube.go Outdated Show resolved Hide resolved
api/utils/keypaths/keypaths.go Outdated Show resolved Hide resolved
@nklaassen nklaassen force-pushed the nklaassen/split-db-keys branch from 326dbba to f43a861 Compare August 6, 2024 19:03
@nklaassen nklaassen force-pushed the nklaassen/split-k8s-keys branch from 4c234d9 to 83e594b Compare August 6, 2024 19:03
@nklaassen nklaassen force-pushed the nklaassen/split-db-keys branch from f43a861 to a57db58 Compare August 6, 2024 23:00
@nklaassen nklaassen force-pushed the nklaassen/split-k8s-keys branch from 83e594b to f698725 Compare August 6, 2024 23:00
@nklaassen nklaassen force-pushed the nklaassen/split-db-keys branch from a57db58 to 951fe64 Compare August 7, 2024 23:09
@nklaassen nklaassen force-pushed the nklaassen/split-k8s-keys branch from f698725 to 953530d Compare August 7, 2024 23:10
Base automatically changed from nklaassen/split-db-keys to master August 8, 2024 01:29
@nklaassen nklaassen force-pushed the nklaassen/split-k8s-keys branch from 953530d to 3590903 Compare August 8, 2024 15:57
@nklaassen nklaassen requested a review from tigrato August 13, 2024 15:41
@@ -111,13 +114,16 @@ const (
// │ ├── foo-kube --> Kubernetes certs for user "foo"
// │ | ├── root --> Kubernetes certs for Teleport cluster "root"
// │ | │ ├── kubeA-kubeconfig --> standalone kubeconfig for Kubernetes cluster "kubeA"
// │ | │ ├── kubeA-x509.pem --> TLS cert for Kubernetes cluster "kubeA"
// │ | │ ├── kubeA.crt --> TLS cert for Kubernetes cluster "kubeA"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update these docs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thanks for catching this, done

@nklaassen nklaassen force-pushed the nklaassen/split-k8s-keys branch from 31a2f9e to 779f22e Compare August 13, 2024 15:58
@nklaassen nklaassen requested a review from tigrato August 13, 2024 16:36
Copy link
Contributor

@tigrato tigrato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for handling it

@nklaassen nklaassen added this pull request to the merge queue Aug 13, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 13, 2024
@nklaassen nklaassen added this pull request to the merge queue Aug 13, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 13, 2024
@nklaassen nklaassen added this pull request to the merge queue Aug 13, 2024
Merged via the queue into master with commit 6eb6b98 Aug 13, 2024
38 checks passed
@nklaassen nklaassen deleted the nklaassen/split-k8s-keys branch August 13, 2024 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
machine-id no-changelog Indicates that a PR does not require a changelog entry size/sm tsh tsh - Teleport's command line tool for logging into nodes running Teleport.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants