Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSD-24275: Validate machineCIDR is contained in default ingresscontro… #318

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

nephomaniac
Copy link

@nephomaniac nephomaniac commented Aug 14, 2024

This PR intends to address OSD-24275 adding checks to help ensure at least one of the provided 'allowedSourceRanges' subnets contains the install-config's machineCIDR.
This includes a new role and rolebinding in the syncselectorset allowing the validation-webhook service account get() access to the install-config configmap in the kube-system namespace.
This PR attempts to include a set of unit tests intended to check different 'allowedSourceRanges' values across different machineCIDR values across update() and create() requests.

This ingresscontroller validation/check is intended to have the following limitations:

  • Should only apply to Create, and/or Update request operations
  • Should only apply to update/create requests for the "default" ingresscontroller (namespace: openshift-ingress-operator)
  • This validation should not apply to requests which do not contain the "allowedSourceRanges" param/attr, or contain an empty value in the request. (Note: This does not prevent users from removing the 'allowedSourceRanges' values from an ingress controller which can result in a controller get stuck in progressing state).
  • At this time this webhook is not enabled for Hypershift, and is enabled for Classic only.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Aug 14, 2024

@nephomaniac: This pull request references OSD-24275 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

This PR intends to address OSD-24275 adding checks to help insure at least one of the provided 'allowedSourceRanges' subnets contains the install-config's machineCIDR.
This includes a new role and rolebinding in the syncselectorset allowing the validation-webhook service account get() access to the install-config configmap in the kube-system namespace.
This PR attempts to include a set of unit tests intended to check different 'allowedSourceRanges' values across different machineCIDR values across update() and create() requests.

This ingresscontroller validation/check is intended to have the following limitations:

  • Should only apply to Create, and/or Update request operations
  • Should only apply to update/create requests for the "default" ingresscontroller (namespace: openshift-ingress-operator)
  • This validation should not apply to requests which do not contain the "allowedSourceRanges" param/attr, or contain an empty value in the request. (Note: This does not prevent users from removing the 'allowedSourceRanges' values from an ingress controller which can result in a controller get stuck in progressing state).
  • At this time this webhook is not enabled for HCP, and is enabled for Classic only.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 14, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Aug 14, 2024

@nephomaniac: This pull request references OSD-24275 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

This PR intends to address OSD-24275 adding checks to help insure at least one of the provided 'allowedSourceRanges' subnets contains the install-config's machineCIDR.
This includes a new role and rolebinding in the syncselectorset allowing the validation-webhook service account get() access to the install-config configmap in the kube-system namespace.
This PR attempts to include a set of unit tests intended to check different 'allowedSourceRanges' values across different machineCIDR values across update() and create() requests.

This ingresscontroller validation/check is intended to have the following limitations:

  • Should only apply to Create, and/or Update request operations
  • Should only apply to update/create requests for the "default" ingresscontroller (namespace: openshift-ingress-operator)
  • This validation should not apply to requests which do not contain the "allowedSourceRanges" param/attr, or contain an empty value in the request. (Note: This does not prevent users from removing the 'allowedSourceRanges' values from an ingress controller which can result in a controller get stuck in progressing state).
  • At this time this webhook is not enabled for HCP, and is enabled for Classic only.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Aug 14, 2024

@nephomaniac: This pull request references OSD-24275 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

This PR intends to address OSD-24275 adding checks to help insure at least one of the provided 'allowedSourceRanges' subnets contains the install-config's machineCIDR.
This includes a new role and rolebinding in the syncselectorset allowing the validation-webhook service account get() access to the install-config configmap in the kube-system namespace.
This PR attempts to include a set of unit tests intended to check different 'allowedSourceRanges' values across different machineCIDR values across update() and create() requests.

This ingresscontroller validation/check is intended to have the following limitations:

  • Should only apply to Create, and/or Update request operations
  • Should only apply to update/create requests for the "default" ingresscontroller (namespace: openshift-ingress-operator)
  • This validation should not apply to requests which do not contain the "allowedSourceRanges" param/attr, or contain an empty value in the request. (Note: This does not prevent users from removing the 'allowedSourceRanges' values from an ingress controller which can result in a controller get stuck in progressing state).
  • At this time this webhook is not enabled for Hypershift, and is enabled for Classic only.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@nephomaniac
Copy link
Author

Noting that this PR currently has all the ingressController validations under the single IngressController webhook. This seemed to be the pattern? However, if we want to extend this and possibly support different validations between Classic vs HCP, etc would it then make sense to separate this out? Would we potentially want additional IngressController webhooks, and/or build time flags to indicate cluster environments such as hcp vs classic, etc?

@wgordon17
Copy link

wgordon17 commented Aug 19, 2024

@nephomaniac that's definitely a good call out, and probably an improvement worth exploring. I do believe that's probably outside the scope of OSD-24275. For this card, perhaps its "good enough" to just focus on targeting Classic for the time being

@nephomaniac nephomaniac marked this pull request as draft August 21, 2024 04:19
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 21, 2024
@nephomaniac
Copy link
Author

Converted this to a draft as it seems likely this should be applied to HCP as well and changes the implementation. In my limited HCP testing the cluster admin can edit the default ingressController's allowedSourceRanges excluding the machineCidr network causing the same/similar errors (ie blocking the console, etc).

@nephomaniac nephomaniac marked this pull request as ready for review August 26, 2024 16:54
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 26, 2024
@openshift-ci openshift-ci bot requested review from lisa and rogbas August 26, 2024 16:55
@nephomaniac
Copy link
Author

After some discussion I believe this PR can reviewed and tested for merge. The current intention is that additional Hypershift work , if desired, is to be tracked as part of a separate effort.

@nephomaniac
Copy link
Author

Ready for review.
Updated to load + cache machineCidr, or fail fast during webhook initialization. Moved shared test args specific to webhooks into the webhooks package allowing unit tests and make target tests to run without having to access a live cluster's config.

@nephomaniac
Copy link
Author

/test

Copy link
Contributor

openshift-ci bot commented Sep 27, 2024

@nephomaniac: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

  • /test images

Use /test all to run all jobs.

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@nephomaniac
Copy link
Author

/retest

@nephomaniac
Copy link
Author

Will need another dry-run or equiv webhook flag to allow additional make targets, build routines to iterate through hooks w/o requiring an actual cluster runtime env.

@tnierman
Copy link
Member

tnierman commented Oct 9, 2024

/approve

This looks good to me, but lets get a second opinion too

Copy link
Contributor

openshift-ci bot commented Oct 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nephomaniac, tnierman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 9, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Oct 9, 2024

@nephomaniac: This pull request references OSD-24275 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

This PR intends to address OSD-24275 adding checks to help ensure at least one of the provided 'allowedSourceRanges' subnets contains the install-config's machineCIDR.
This includes a new role and rolebinding in the syncselectorset allowing the validation-webhook service account get() access to the install-config configmap in the kube-system namespace.
This PR attempts to include a set of unit tests intended to check different 'allowedSourceRanges' values across different machineCIDR values across update() and create() requests.

This ingresscontroller validation/check is intended to have the following limitations:

  • Should only apply to Create, and/or Update request operations
  • Should only apply to update/create requests for the "default" ingresscontroller (namespace: openshift-ingress-operator)
  • This validation should not apply to requests which do not contain the "allowedSourceRanges" param/attr, or contain an empty value in the request. (Note: This does not prevent users from removing the 'allowedSourceRanges' values from an ingress controller which can result in a controller get stuck in progressing state).
  • At this time this webhook is not enabled for Hypershift, and is enabled for Classic only.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@iamkirkbater
Copy link
Contributor

Just wondering offhand - what are the steps to test this? Assuming we have a cluster running with this webhook code - what things do I need to do in order to force a failure and then change to view a success?

Basically - if we merge this and get this into integration/staging and we were asking QA to test this - what would they have to do/modify/create in order to test this?

Not a blocker to merge this - but we'd want to know how to test this after it gets into staging to validate that it's ready to be promoted to prod at some point :)

@nephomaniac
Copy link
Author

nephomaniac commented Oct 16, 2024

Thanks @iamkirkbater,
The first tests will need to be things outside of the unit tests. General init/install/bootstrapping. My biggest concern is race conditions on install/init as I've only tested this by replacing the webhook daemonset on a running cluster.

For pointed tests targeted specifically at this validation I'd recommend the following minimum set of tests, somewhat similar to the unit tests:
Assuming a default machine cidr of 10.0.0.0/16

  • Test a request believed to be valid, including the machine cidr in one of the provided allowed subnets...
    oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge --patch='{"spec":{"endpointPublishingStrategy": {"loadBalancer":{"allowedSourceRanges":["10.128.0.0/16", "10.129.0.0/16", "10.69.0.0/16", "10.0.0.0/8", "172.30.0.0/16"]}}}}'

  • Test an invalid request where none of the subnets contain the machine cidr...
    oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge --patch='{"spec":{"endpointPublishingStrategy": {"loadBalancer":{"allowedSourceRanges":["10.128.0.0/16", "10.129.0.0/16", "10.69.0.0/16", "172.30.0.0/16"]}}}}'

  • Test a valid request which does not contain an allowedSourceRanges field or is empty. This should be tested before and after a cluster has had the allowedSourceRanges field populated.
    _Note: https://docs.openshift.com/container-platform/4.13/networking/configuring_ingress_cluster_traffic/configuring-ingress-cluster-traffic-load-balancer-allowed-source-ranges.html
    // Note: From docs it appears a missing ASR value/attr allows all. However...
    // once ASR values have been added to an ingresscontroller, later deleting all the ASRs can expose an issue
    // where the IGC will remaining in progressing state indefinitely
    oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge --patch='{"spec":{"endpointPublishingStrategy": {"loadBalancer":{"allowedSourceRanges":[]}}}}'

  • Repeat the above tests using a cluster with a non-default machine cidr (ie machineCidr != 10.0.0.0/16).

  • Upgrade with and without a valid ingresscontroller allowedSourceRange config (is this WH in the path?).

  • Likely outside the scope of this PR, but additional testing for IGC requests that degrade the cluster that are missed by this webhook.

@bmeng
Copy link
Contributor

bmeng commented Oct 25, 2024

It seems we implemented several functions within the webhook. Does it mean those functions will be called for each validation? Would it be any potential performance degradation?

@nephomaniac
Copy link
Author

It seems we implemented several functions within the webhook. Does it mean those functions will be called for each validation? Would it be any potential performance degradation?

Thanks @bmeng,
Currently the intention is to attempt the client/fetch work, which could have potential request delay/overhead, only once during the init/newWebhook() portion.
The machineCIDR value should be cache'd for use during later ingresscontroller request subnet validations. The new subnet validation functions should be limited to 'create' & 'update' requests, and only for the 'default' ingresscontroller. Currently not enabled for hypershift.
Worth noting that if the WH's client fails to fetch the machineCIDR during init, the WH will exit, retrying on the next init.

Copy link
Contributor

openshift-ci bot commented Oct 29, 2024

@nephomaniac: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants