-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KEP-267] Rotate Kubelet Server Certificate Retrospective KEP #4411
[KEP-267] Rotate Kubelet Server Certificate Retrospective KEP #4411
Conversation
kannon92
commented
Jan 17, 2024
- One-line PR description: Open a retrospective KEP to document existing design and mark what work will be needed to eventually GA this feature.
- Issue link: Kubelet Server TLS Certificate Rotation #267
- Other comments: Taking over @SergeyKanzhelev KEP as I will be leading this effort.
/hold |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: kannon92 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
8ae93de
to
31e9394
Compare
/hold cancel |
/cc @tallclair |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
Here's some feedback. I think we need more detail, even if we already have shipped code.
|
||
## Alternatives | ||
|
||
N/A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why “N/A”? Maybe:
N/A | |
_None identified_ |
### Goals | ||
|
||
- ability to configure kubelet to get a server certificate for the kubelet from | ||
the Certificate Signing Request API instead of generating one self signed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit)
the Certificate Signing Request API instead of generating one self signed | |
the CertificateSigningRequest API instead of generating one self signed |
expiration approaches. | ||
|
||
## Motivation | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cluster operators want TLS between their nodes and the control plane. However, managing the | |
certificates at scale is problematic. Kubernetes provides APIs to define X.509 certificates, signing | |
requests for those certificates, and even trust anchors. | |
As a cluster operator, automating the rotation of kubelet X.509 certificates has the potential to replace | |
significant toil with an easy and secure ongoing process. | |
### Background | |
Kubernetes v1.7 introduced kubelet server certificate rotation as alpha. |
This is retrospective KEP description for the feature `RotateKubeletServerCertificate`. | ||
|
||
The feature gate `RotateKubeletServerCertificate` allows to configure kubelet to | ||
get a server certificate for the kubelet from the Certificate Signing Request | ||
API instead of generating one self signed and auto rotates the certificate as | ||
expiration approaches. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is retrospective KEP description for the feature `RotateKubeletServerCertificate`. | |
The feature gate `RotateKubeletServerCertificate` allows to configure kubelet to | |
get a server certificate for the kubelet from the Certificate Signing Request | |
API instead of generating one self signed and auto rotates the certificate as | |
expiration approaches. | |
This KEP proposes a mechanism to allow the kubelet to request and then use a certificate | |
using the Kubernetes CertificateSigningRequest (CSR) API. A kubelet that can authenticate to the | |
cluster's control plane can use that mechanism to support automatic certificate rotation | |
before the existing certificate expires. Certificates are only issued to nodes if their CSR is | |
explicitly approved. |
I'd add a Background heading later in the KEP that explains why it's retrospective.
|
||
### Non-Goals | ||
|
||
- built-in support for approving CSRs nodes request (not currently done because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- built-in support for approving CSRs nodes request (not currently done because | |
- built-in support for approving CSRs that nodes have requested (omitted because |
<!-- | ||
Describe the metrics themselves and the reasons why they weren't added (e.g., cost, | ||
implementation difficulties, etc.). | ||
--> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section needs answering.
|
||
###### Will enabling / using this feature result in increasing size or count of the existing API objects? | ||
|
||
No |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No | |
Yes, CertificateSigningRequests. |
|
||
### Troubleshooting | ||
|
||
Failure to get certificate can be troubleshooted using logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need any more detail here?
- "server_expiration_renew_errors" | ||
- "certificate_manager_server_rotation_seconds" | ||
- "certificate_manager_server_ttl_seconds" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this right?
- "server_expiration_renew_errors" | |
- "certificate_manager_server_rotation_seconds" | |
- "certificate_manager_server_ttl_seconds" | |
- `server_expiration_renew_errors` (counter) | |
- `certificate_manager_server_rotation_seconds` (histogram) | |
- `certificate_manager_server_ttl_seconds` (histogram) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you add the help text (metric descriptions) here?
Discussing this feature in sig-auth it was suggested that integration tests should be sufficient | ||
|
||
NA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussing this feature in sig-auth it was suggested that integration tests should be sufficient | |
NA | |
We won't do any end to end tests; this is because SIG Auth feel that integration tests provide sufficient | |
coverage. |
?
I think an optional end to end test would be nice to have, even so.
ServerTLSBootstrap bool `json:"serverTLSBootstrap,omitempty"` | ||
``` | ||
|
||
The corresponding CLI (which CLI for Kubelet are deprecated) is `--rotate-server-certificates`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation also states that the command line flag is deprecated, but we don't actually mark it as deprecated in the code. I don't think it is deprecated?
// the 'certificates.k8s.io' API. This requires an approver to approve the | ||
// certificate signing requests (CSR). The RotateKubeletServerCertificate feature | ||
// must be enabled when setting this field. | ||
// Default: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any path to enabling this by default? It wouldn't make sense for standalone kubelets, but could we enable it by default for Kubelets with a kubeconfig?
|
||
The corresponding CLI (which CLI for Kubelet are deprecated) is `--rotate-server-certificates`. | ||
|
||
When the feature gate `RotateKubeletServerCertificate` is true, then `ServerTLSBootstrap` is also required to be true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the feature gate `RotateKubeletServerCertificate` is true, then `ServerTLSBootstrap` is also required to be true. | |
To enable this feature, both the feature gate `RotateKubeletServerCertificate` and configuration option `ServerTLSBootstrap` must be enabled. |
- "certificate_manager_server_rotation_seconds" | ||
- "certificate_manager_server_ttl_seconds" | ||
|
||
```golang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just link to the implementation, rather than inlining it here?
These metrics should be promoted to [beta](https://github.com/kubernetes/kubernetes/pull/122834 | ||
) before GA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not Stable? This line seems to conflict with the previous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly because they were alpha so I would think they should go alpha -> beta -> ga?
- "server_expiration_renew_errors" | ||
- "certificate_manager_server_rotation_seconds" | ||
- "certificate_manager_server_ttl_seconds" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you add the help text (metric descriptions) here?
##### e2e tests | ||
|
||
Discussing this feature in sig-auth it was suggested that integration tests should be sufficient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as the apiserver is configured to validate Kubelet serving certificates, and this feature is enabled, then it's functionality will be covered by a number existing E2E tests (anything that touches the kubelet serving path, e.g. exec/attach/portforward/logs).
Can you confirm whether any E2E test suites match this configuration?
- [ ] There are conformance tests for the certificates API endpoint, functionality | ||
itself do not require covering with conformance tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: don't make this a check box. It sounds like this is rationalizing the fact that conformance tests aren't part of the GA criteria?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the template for PRR.
|
||
###### Does this feature depend on any specific services running in the cluster? | ||
|
||
No |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this depend on a CSR approver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What usually approves CSRs? I'm not very well versed in this area so I'm interested to learn more.
- If it is a rotation, kubelet will become non-functional unless will be able to | ||
renew the certificate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume it will keep using the old cert until expiration? And I also assume kubelet would continue to function, but it's serving cert would be invalid? API server can be configured to ignore kubelet serving cert verification.
Sorry I didn't address these comments. I realize that I don't have the domain knowledge to really answer these questions. I brought this up to sig-node hoping that someone else could spearhead this feature. Our hope was to get this KEP out of the permanent beta state but the requirement is that we need to document the design of this feature. I will be on leave for 1.31 so this won't make much progress unless someone else takes it. |
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
/close |
@kannon92: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@aojea is taking this over. Closing my attempt. |