Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(SliceGwReconciler): Add PodDisruptionBudget logic to SliceGwReconciler (#308) #334

Conversation

Bhargav-InfraCloud
Copy link
Contributor

@Bhargav-InfraCloud Bhargav-InfraCloud commented Feb 19, 2024

Description

A PodDisruptionBedget is required that matches the slice gateway pods, and to specify a minimum availability of 1 pod in case of disruptions.

The SliceGwReconciler handles the lifecycle of this PodDisruptionBudget object.

Added RBAC permissions for SliceGwReconciler to maintain PodDisruptionBudget.

Fixes #308

How Has This Been Tested?

Checklist:

  • The title of the PR states what changed and the related issues number (used for the release note).
  • Does this PR requires documentation updates?
    • Since the change is user facing, i.e., the PDB is one extra resource that will be visible to users, I think it is better to a add some explanation in the documentation.
  • I've updated documentation as required by this PR.
  • I have ran go fmt
  • I have updated the helm chart as required by this PR.
    • Updated the RBAC permissions for CRUD on PodDisruptionBudget objects. However, a PR for the same has to be created once this is approved.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have tested it for all user roles.
  • I have added all the required unit test cases.
  • I have verified the E2E test cases with new code changes.
  • I have added all the required E2E test cases.
    • Added 3 subtest cases under spoke tests.

Does this PR introduce a breaking change?


Steps to test

  1. Deploy the whole setup controller and 2 workers. Check if there are slice gateway pods similar to this in both the worker clusters:
water-kind-kubeslice-worker-2-kind-kubeslice-worker-1-0-0-6tqrk   3/3     Running   0          61m
water-kind-kubeslice-worker-2-kind-kubeslice-worker-1-1-0-h4vsp   3/3     Running   0          61m
  1. With this change, there should also be a PodDisruptionBudget created in the same namespace
water-kind-kubeslice-worker-2-kind-kubeslice-worker-1-pdb   1               N/A               1                     62m

with labels similar to:

labels:
  kubeslice.io/slice: water
  kubeslice.io/slice-gw: water-kind-kubeslice-worker-2-kind-kubeslice-worker-1
  1. When trying to disrupt the node, say one of the worker cluster's worker nodes (if having multi-node setup), using the command:
kubectl drain --ignore-daemonsets --delete-emptydir-data <worker-node-name-here>

it should be able to evict one pod but fail to evict the other:

pod/water-kind-kubeslice-worker-2-kind-kubeslice-worker-1-0-0-6tqrk evicted

error when evicting pods/"water-kind-kubeslice-worker-2-kind-kubeslice-worker-1-1-0-h4vsp" -n "kubeslice-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

@Bhargav-InfraCloud Bhargav-InfraCloud force-pushed the br_add_pdb_for_slicegw_deployments branch 10 times, most recently from 33983c1 to ea6e856 Compare February 22, 2024 15:53
@Bhargav-InfraCloud
Copy link
Contributor Author

The issue #335 impacts this PR. Hence holding this until it is resolved.

@Bhargav-InfraCloud Bhargav-InfraCloud force-pushed the br_add_pdb_for_slicegw_deployments branch 3 times, most recently from 210cff2 to d429902 Compare February 23, 2024 12:48
@mridulgain
Copy link
Contributor

Thanks for bringing this to our attention @Bhargav-InfraCloud . The issue #335 has been resolved.

@Bhargav-InfraCloud Bhargav-InfraCloud marked this pull request as ready for review February 23, 2024 19:02
@Bhargav-InfraCloud Bhargav-InfraCloud changed the title WIP: feat(SliceGwReconciler): Add PodDisruptionBudget Logic (#308) feat(SliceGwReconciler): Add PodDisruptionBudget Logic (#308) Feb 23, 2024
@Bhargav-InfraCloud Bhargav-InfraCloud changed the title feat(SliceGwReconciler): Add PodDisruptionBudget Logic (#308) feat(SliceGwReconciler): Add PodDisruptionBudget Logic (#308) Feb 23, 2024
@Bhargav-InfraCloud
Copy link
Contributor Author

Thanks @mridulgain! Changes in this PR are now based on the latest master.

@Bhargav-InfraCloud Bhargav-InfraCloud changed the title feat(SliceGwReconciler): Add PodDisruptionBudget Logic (#308) feat(SliceGwReconciler): Add PodDisruptionBudget logic to SliceGwReconciler (#308) Feb 23, 2024
@Bhargav-InfraCloud Bhargav-InfraCloud force-pushed the br_add_pdb_for_slicegw_deployments branch from d429902 to 845b305 Compare February 25, 2024 15:16
controllers/slicegateway/slicegateway.go Outdated Show resolved Hide resolved
controllers/slicegateway/pod_disruption_budget.go Outdated Show resolved Hide resolved
controllers/slicegateway/pod_disruption_budget.go Outdated Show resolved Hide resolved
controllers/slicegateway/slicegateway.go Outdated Show resolved Hide resolved
controllers/slicegateway/slicegateway.go Outdated Show resolved Hide resolved
@narmidm narmidm assigned narmidm and Bhargav-InfraCloud and unassigned narmidm Feb 27, 2024
@Bhargav-InfraCloud Bhargav-InfraCloud force-pushed the br_add_pdb_for_slicegw_deployments branch from 845b305 to 048a014 Compare February 28, 2024 05:48
Copy link
Contributor

@Rahul-D78 Rahul-D78 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! thank you for adding this @Bhargav-InfraCloud 👍

@Bhargav-InfraCloud
Copy link
Contributor Author

The E2E is failing as the image is missing in the Docker hub:

Unable to find image 'aveshadev/kubeslice-e2e:latest' locally
docker: Error response from daemon: pull access denied for aveshadev/kubeslice-e2e, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.

Was it moved to somewhere else?

@Bhargav-InfraCloud
Copy link
Contributor Author

@narmidm @bharath-avesha @gourishkb Can you please review this PR? Thanks!

@mridulgain
Copy link
Contributor

mridulgain commented Mar 6, 2024

The E2E is failing as the image is missing in the Docker hub:

Unable to find image 'aveshadev/kubeslice-e2e:latest' locally
docker: Error response from daemon: pull access denied for aveshadev/kubeslice-e2e, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.

Was it moved to somewhere else?

@Bhargav-InfraCloud we have resolved the image issue & re triggered the pipeline.

@Bhargav-InfraCloud Bhargav-InfraCloud force-pushed the br_add_pdb_for_slicegw_deployments branch from 048a014 to b17c49f Compare March 6, 2024 11:33
@Bhargav-InfraCloud
Copy link
Contributor Author

@mridulgain Great! Thanks for the update.
I've rebased with the latest master. Please trigger the E2E again.

@NishantSingh10
Copy link
Contributor

report link 'https://kubeslice.github.io/e2e-allure-reports/Kind-worker-operator-2024-03-08T08:57:25-master-437/index.html'

controllers/slicegateway/slicegateway.go Outdated Show resolved Hide resolved
controllers/slicegateway/slicegateway.go Outdated Show resolved Hide resolved
controllers/slicegateway/slicegateway.go Outdated Show resolved Hide resolved
A PodDisruptionBedget is required that matches the slice gateway pods,
and to specify a minimum availability of 1 pod in case of disruptions.

The SliceGwReconciler handles the lifecycle of this PodDisruptionBudget
object.

Added RBAC permissions for SliceGwReconciler to maintain
PodDisruptionBudget.

Fixes kubeslice#308

Signed-off-by: Bhargav Ravuri <[email protected]>
@Bhargav-InfraCloud Bhargav-InfraCloud force-pushed the br_add_pdb_for_slicegw_deployments branch from b17c49f to bd0283b Compare March 9, 2024 14:07
@narmidm
Copy link
Member

narmidm commented Mar 12, 2024

started E2E pipeline - https://github.com/kubeslice/worker-operator/actions/runs/8244374883/job/22546499751?pr=334

@NishantSingh10
Copy link
Contributor

@narmidm narmidm merged commit 535616d into kubeslice:master Mar 12, 2024
8 checks passed
@Bhargav-InfraCloud
Copy link
Contributor Author

Thanks, all! 😊

@Bhargav-InfraCloud Bhargav-InfraCloud deleted the br_add_pdb_for_slicegw_deployments branch March 12, 2024 10:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pod Disruption Budget for slice gateway pods
6 participants