
upgrading a cluster with xpk also upgrades kueue but does so improperly #296

logicalhan opened this issue Dec 14, 2024 · 3 comments

logicalhan commented Dec 14, 2024

I think there is a bug in your upgrade process.

Looking at your code, it doesn't look like the old kueue CRDs are deleted before the new ones are applied, which the upgrade instructions prescribe.

And you guys wait for the kueue deployment to be available (https://github.com/AI-Hypercomputer/xpk/blob/main/src/xpk/core/kueue.py#L201-L202), which is fine I guess, but it turns out that when you force-apply a new deployment of the controller-manager, it does this:

Name:                   kueue-controller-manager
Namespace:              kueue-system
CreationTimestamp:      SOME-DATE
Labels:                 app.kubernetes.io/component=controller
                        app.kubernetes.io/name=kueue
                        control-plane=controller-manager
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               control-plane=controller-manager
Replicas:               1 desired | 1 updated | 2 total | 1 available | 1 unavailable
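
For comparison, a rollout check along the lines of kubectl rollout status would have flagged this state, because it compares the replica counters against the spec instead of trusting the Available condition alone. A minimal sketch of what I mean (my own helper, not xpk's code, assuming the official kubernetes Python client):

from kubernetes import client, config

def rollout_complete(name: str = "kueue-controller-manager",
                     namespace: str = "kueue-system") -> bool:
    # The Available condition can be true while an old replica keeps serving,
    # so check that the latest generation was observed and that every replica
    # is updated AND available, with no stale replicas left over (the
    # "2 total | 1 available | 1 unavailable" state above fails this).
    config.load_kube_config()  # or load_incluster_config() inside a pod
    dep = client.AppsV1Api().read_namespaced_deployment(name, namespace)
    want = dep.spec.replicas or 0
    status = dep.status
    return ((status.observed_generation or 0) >= dep.metadata.generation
            and (status.updated_replicas or 0) == want
            and (status.available_replicas or 0) == want
            and (status.replicas or 0) == want)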

It literally just force-overwrites the existing object. The deployment now owns two replicasets when it only really wants one.

The replicasets are now bifurcated, each with its own version, and both are owned by the new deployment, which overwrote the old one due to --force-conflicts:

Replicaset One:

Name:           kueue-controller-manager-REPLICASET-ID
Namespace:      kueue-system
Selector:       control-plane=controller-manager,pod-template-hash=ID
Labels:         control-plane=controller-manager
                pod-template-hash=ID
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/kueue-controller-manager

Replicaset Two:

Name:           kueue-controller-manager-REPLICASET-ID2
Namespace:      kueue-system
Selector:       control-plane=controller-manager,pod-template-hash=ID2
Labels:         app.kubernetes.io/component=controller
                app.kubernetes.io/name=kueue
                control-plane=controller-manager
                pod-template-hash=ID2
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 2
Controlled By:  Deployment/kueue-controller-manager

Note that the owner references just point at the deployment, which has no identifier other than the uniqueness of its name.

Controlled By: Deployment/kueue-controller-manager

Because of this, the deployment controller now owns two replicasets, each running a different version in its pods:

Image: registry.k8s.io/kueue/kueue:v0.9.1
Image: registry.k8s.io/kueue/kueue:v0.8.1

and then you end up with these pods:

kueue-system      kueue-controller-manager-1       2/2     Running            0              3d6h
kueue-system      kueue-controller-manager-2       1/2     CrashLoopBackOff   69 (43s ago)   5h28m

We're quite fortunate that the upgrade did fail (despite xpk thinking it succeeded), because jobs continued to be processed by the old controller-manager pod while the new one crash-looped.
But if you look at the configuration of the controller manager, you can see that it is indeed leader-elected, and I'm not sure leader election would work across different versions of the CRDs. It's probably tied to the replicaset, is my guess.

apiVersion: config.kueue.x-k8s.io/v1beta1
...
leaderElection:
  leaderElect: true
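
For what it's worth, the leader lock for a controller-runtime-based manager like kueue is a Lease object in its namespace, and the holder identity is typically derived from the pod name, so it is effectively held by a pod from one of those replicasets. A quick sketch to see who currently holds it (assuming the kubernetes Python client; I don't hardcode the lease name, so it just lists all leases in the namespace):

from kubernetes import client, config

def print_lease_holders(namespace: str = "kueue-system") -> None:
    # Print every Lease in the namespace and its current holder; the holder
    # identity is usually "<pod-name>_<random-id>", which tells you which
    # replicaset's pod won the election.
    config.load_kube_config()
    for lease in client.CoordinationV1Api().list_namespaced_lease(namespace).items:
        print(f"{lease.metadata.name} -> {lease.spec.holder_identity}")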

logicalhan (Author) commented

Yeah, it looks like xpk.py create is also an update, and if the cluster exists you still go through the entire xpk installation process. That means kueue is getting upgraded every time as well, as long as it doesn't bomb out before it gets there.
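
One way to make that idempotent would be to compare the running kueue image against the version xpk wants and skip the install when they already match, instead of force-applying the manifests on every create. A hypothetical sketch (the function and constant names are mine, not xpk's):

from kubernetes import client, config

TARGET_KUEUE_IMAGE = "registry.k8s.io/kueue/kueue:v0.9.1"  # hypothetical target version

def kueue_already_at_target(namespace: str = "kueue-system") -> bool:
    # If the controller-manager is already running the target image, the
    # install/upgrade step could be skipped; a 404 means kueue isn't
    # installed yet, so the install should proceed.
    config.load_kube_config()
    try:
        dep = client.AppsV1Api().read_namespaced_deployment(
            "kueue-controller-manager", namespace)
    except client.exceptions.ApiException as e:
        if e.status == 404:
            return False
        raise
    return TARGET_KUEUE_IMAGE in [c.image for c in dep.spec.template.spec.containers]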

fedebongio commented

/cc

mbobrovskyi (Collaborator) commented

@logicalhan Please take a look at how to upgrade kueue 0.8 -> 0.9 in the "Upgrading steps" section of https://github.com/kubernetes-sigs/kueue/releases/tag/v0.9.0.
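
On the CRD side of those steps, one sanity check before and after the upgrade is which API versions each kueue CRD still serves and stores, since stale stored versions are exactly the kind of leftover that breaks a cross-version rollout. A small sketch (mine, not from the release notes), assuming the kubernetes Python client:

from kubernetes import client, config

def print_kueue_crd_versions() -> None:
    # For each kueue CRD, show the served versions, the storage version, and
    # the versions still recorded in status.storedVersions; old entries there
    # suggest the previous CRDs were never cleaned up before the new
    # manifests were applied.
    config.load_kube_config()
    for crd in client.ApiextensionsV1Api().list_custom_resource_definition().items:
        if not crd.spec.group.endswith("kueue.x-k8s.io"):
            continue
        versions = [(v.name, v.served, v.storage) for v in crd.spec.versions]
        print(crd.metadata.name, versions, "stored:", crd.status.stored_versions)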
