KEP-267: kubelet server certificate bootstrap and rotation #4848

Open

Member

@aojea aojea commented Sep 12, 2024

This feature has been in beta and enabled by default since 1.12. There is not much concern about its stability; however, using it requires an external component. Kubernetes must be "batteries included" for all its built-in features. Graduate the existing feature to GA by providing a controller in the cloud-controller-manager that users can opt in to, which auto-approves CSRs from the nodes.

Note for reviewers: please take into consideration that this is a personal effort to remove technical debt by addressing the technical gaps identified by the sig-auth leaders. Please be constructive so we can get this to the finish line, and avoid derailing from the main topic, which is moving this feature to GA (it has been beta since 1.12).

Discussed in sig-auth on 2022-12-07 - https://docs.google.com/document/d/1woLGRoONE3EBVx-wTb4pvp4CI7tmLZ6lS26VTbosLKM/edit#bookmark=id.52okchz28cmr
the functionality that exists is stable, in use, working successfully, but requires bringing your own CSR approver; it's a little weird to have a GA feature with no project-provided approver, but since kubernetes is agnostic about how nodes get IPs/DNS names, it also currently has to be agnostic about how to verify a given node owns a given IP/DNS name; I would +1 marking the current functionality stable and deferring a project-provided node address validation / serving CSR approver to a separate effort; would be good to capture the design and production implications of the current approach in a KEP and note the remaining/future possible related work

/sig auth
/assign @liggitt @deads2k

This feature has been in beta and enabled by default since 1.12. There is
not much concern about its stability; however, it requires an external
component to be used.

Kubernetes must be "batteries included" for all its built-in
features.

Graduate the existing feature to GA by providing a controller in the
cloud-controller-manager that users can opt in to, which auto-approves
CSRs from the nodes.

Change-Id: I7500f4cb6582fdff423e430518d370bcd08f144a
Signed-off-by: Antonio Ojea <[email protected]>
@k8s-ci-robot k8s-ci-robot added sig/auth Categorizes an issue or PR as relevant to SIG Auth. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 12, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 12, 2024
#### GA

- Real world usage
- Opt-in built-in Node CSR approver in the cloud-controller-manager, so users do not have to depend on external components to use this feature
Member Author

@liggitt @deads2k this is the only change required AFAIK

Member

I think the past discussion was saying that it is not needed. See #267 (comment)

Member Author

I read it as saying this is the missing part: #267 (comment)

##### Prerequisite testing updates


##### Unit tests
Member

The hardest part of testing, when I looked at GA-ing this, was testing the edge cases: how we retry, how we start a kubelet that failed to get certs, etc.

Member Author

This feature has been running in production for more than 20 releases, and this KEP adds no new functionality. It only adds the missing requirement per sig-auth: a built-in component that can provide the same functionality.

Member Author

> How we retry, how we start kubelet that failed to get certs, etc.

What do you mean by "start" in "how we start kubelet that failed to get certs"?


Any cluster running e2e tests with the feature enabled will be exercising the feature.

A job that uses the built-in CSR approver and exercises all the Conformance e2e tests will be added.
Member

Do we need to run all tests? Once the kubelet has bootstrapped, nothing new is exercised when we run conformance. Would it be better to concentrate on covering edge cases, such as handling of various failures and certificate rotation?

Member Author

After 20 releases, what regressions or errors do we plan to uncover?

Member Author

I mean, the test is to run the existing jobs with the built-in cloud provider; everything should pass.

Member Author

We can set the certificate rotation time very low, so the job guarantees that the certificates are rotated while the tests are running.


- Real world usage

#### GA
Member

The corresponding metrics need to be reviewed and GA-ed.

Member Author

is this a new requirement?

We have most of the metrics in alpha

kubernetes$ grep -r StabilityLevel pkg/ | cut -d\: -f3 | sort | uniq -c
     11  compbasemetrics.ALPHA,
      2     metrics.ALPHA,
    179  metrics.ALPHA,
      1  metrics.BETA,
      1  metrics.INTERNAL,
      1     metrics.STABLE,
     14  metrics.STABLE,

Contributor

I asked Han about this the other day because I also wanted to know and he said that metrics can stay in alpha even when features graduate. There's no process that ensures that metrics mature either.

Member

stlaz commented Sep 30, 2024

per triage:
@deads2k would you please take a PRR look?

Member Author

aojea commented Oct 31, 2024

/reopen

It was not my intention to close it.

@aojea aojea reopened this Oct 31, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aojea
Once this PR has been reviewed and has the lgtm label, please ask for approval from deads2k. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: In Progress
7 participants