
Adding/removing providers in management object results in deletion of managed clusters. #703

Closed
a13x5 opened this issue Dec 2, 2024 · 4 comments · Fixed by #706

a13x5 (Collaborator) commented Dec 2, 2024

Issue

When providers are added to or removed from the Management object, the HelmReleases connected to managed clusters are deleted by the management controller and then immediately recreated by the managedcluster controller.

Logs:

2024-12-02T20:05:55Z	INFO	Reconciling Management	{"controller": "management", "controllerGroup": "hmc.mirantis.com", "controllerKind": "Management", "Management": {"name":"hmc"}, "namespace": "", "name": "hmc", "reconcileID": "5d0f6d25-dc34-4144-80c5-2d6798e928e9"}
2024-12-02T20:05:55Z	INFO	Found component to remove	{"controller": "management", "controllerGroup": "hmc.mirantis.com", "controllerKind": "Management", "Management": {"name":"hmc"}, "namespace": "", "name": "hmc", "reconcileID": "5d0f6d25-dc34-4144-80c5-2d6798e928e9", "component_name": "vsphere-dev"}
2024-12-02T20:05:55Z	INFO	Removed HelmRelease	{"controller": "management", "controllerGroup": "hmc.mirantis.com", "controllerKind": "Management", "Management": {"name":"hmc"}, "namespace": "", "name": "hmc", "reconcileID": "5d0f6d25-dc34-4144-80c5-2d6798e928e9", "reference": "hmc-system/vsphere-dev"}
2024-12-02T20:05:55Z	INFO	Reconciling ManagedCluster	{"controller": "managedcluster", "controllerGroup": "hmc.mirantis.com", "controllerKind": "ManagedCluster", "ManagedCluster": {"name":"vsphere-dev","namespace":"hmc-system"}, "namespace": "hmc-system", "name": "vsphere-dev", "reconcileID": "1046080e-c2a0-437d-ad5a-554072de12c1"}
2024-12-02T20:05:55Z	INFO	Downloading Helm chart	{"controller": "managedcluster", "controllerGroup": "hmc.mirantis.com", "controllerKind": "ManagedCluster", "ManagedCluster": {"name":"vsphere-dev","namespace":"hmc-system"}, "namespace": "hmc-system", "name": "vsphere-dev", "reconcileID": "1046080e-c2a0-437d-ad5a-554072de12c1"}
2024/12/02 20:05:55 [DEBUG] GET http://source-controller.hmc-system.svc.cluster.local./helmchart/hmc-system/vsphere-standalone-cp-0-0-3/vsphere-standalone-cp-0.0.3.tgz
2024-12-02T20:05:55Z	INFO	Initializing Helm client	{"controller": "managedcluster", "controllerGroup": "hmc.mirantis.com", "controllerKind": "ManagedCluster", "ManagedCluster": {"name":"vsphere-dev","namespace":"hmc-system"}, "namespace": "hmc-system", "name": "vsphere-dev", "reconcileID": "1046080e-c2a0-437d-ad5a-554072de12c1"}
2024-12-02T20:05:55Z	INFO	Validating Helm chart with provided values	{"controller": "managedcluster", "controllerGroup": "hmc.mirantis.com", "controllerKind": "ManagedCluster", "ManagedCluster": {"name":"vsphere-dev","namespace":"hmc-system"}, "namespace": "hmc-system", "name": "vsphere-dev", "reconcileID": "1046080e-c2a0-437d-ad5a-554072de12c1"}

Repro steps

  1. Create a managed cluster (I created a vSphere cluster) and ensure that the HelmRelease is installed and the Cluster object is created properly.
  2. Edit the Management object and add or remove an unused provider (I removed and re-added cluster-api-provider-aws).
  3. Check the HelmRelease related to the managed cluster: its creation time has changed, i.e. the release was recreated.

Additional note: when installed in an airgapped environment, a number of public cloud controllers were deliberately left uninitialized by not passing the airgap flag. This caused the deletion and recreation to happen every 10 seconds for all ManagedClusters. Unfortunately I couldn't reproduce it reliably.

Conclusions

This is most probably caused by #584. It looks like the selector is too broad; we should avoid deleting HelmReleases related to managed clusters.
Also, most probably this does not affect managed clusters created in namespaces other than hmc-system (not tested).
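For context, here is a minimal Go sketch of what such an overly broad cleanup could look like, assuming the management controller simply lists every HelmRelease in the system namespace and deletes any whose name does not match an enabled component. The function and parameter names are illustrative, not the actual code from #584.

```go
package controller

import (
	"context"

	helmv2 "github.com/fluxcd/helm-controller/api/v2beta2"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// cleanupOrphanedReleases is a hypothetical reconstruction of the suspected
// cleanup: every HelmRelease in the system namespace whose name is not in
// the set of enabled components gets deleted.
func cleanupOrphanedReleases(ctx context.Context, c client.Client, systemNamespace string, enabled map[string]struct{}) error {
	var releases helmv2.HelmReleaseList
	// Broad selector: this lists *all* HelmReleases in hmc-system, including
	// the ones backing ManagedClusters such as vsphere-dev.
	if err := c.List(ctx, &releases, client.InNamespace(systemNamespace)); err != nil {
		return err
	}
	for i := range releases.Items {
		hr := &releases.Items[i]
		if _, ok := enabled[hr.Name]; ok {
			continue // release name matches an enabled provider or core component
		}
		// "vsphere-dev" is not a provider name, so it gets deleted here and is
		// then immediately recreated by the managedcluster controller.
		if err := c.Delete(ctx, hr); err != nil {
			return err
		}
	}
	return nil
}
```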

a13x5 added the bug label Dec 2, 2024
zerospiel (Contributor) commented Dec 3, 2024

The repro steps suggest that an unused provider has been deleted. According to the mentioned #584 and the unmentioned #574 PR, where the validation occurs, this is the expected behavior. If the removed provider has no ClusterTemplates, or only ones that are used by no ManagedCluster objects, then the provider is allowed to be removed, and the HelmReleases whose names do not match the remaining components' (providers') names are dropped (excluding the core releases). There are integration tests for this particular feature of the management-ctrl.

UPD: the assumption regarding the hmc-system namespace is correct; only releases in the system namespace are selected during the removal check, so there is no need to test it.

In this particular case, I suppose vsphere-dev is not included in the management.providers list, hence it is being removed. There are no labels on the HelmReleases installed by the template-ctrl that could narrow the selector.

At the time of implementation there was no airgap feature, just in case that matters.

As a suggestion: either add the new component to the providers list (I believe that is why the section exists in the first place), or have the template-ctrl add specific labels during create/update of HelmRelease objects that could then be used during the removal check in the mgmt-ctrl.
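A minimal Go sketch of the label-based option, assuming a controller-runtime client; the label key hmc.mirantis.com/managed-by-mgmt and the helper names are hypothetical, not necessarily what the eventual fix introduced.

```go
package controller

import (
	"context"

	helmv2 "github.com/fluxcd/helm-controller/api/v2beta2"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Hypothetical label key; the real fix may use a different name.
const managedByMgmtLabel = "hmc.mirantis.com/managed-by-mgmt"

// labelMgmtRelease would be called wherever provider/core HelmReleases are
// created or updated, marking them as owned by the management reconciler.
func labelMgmtRelease(hr *helmv2.HelmRelease) {
	if hr.Labels == nil {
		hr.Labels = map[string]string{}
	}
	hr.Labels[managedByMgmtLabel] = "true"
}

// listMgmtReleases narrows the removal check to labeled releases only, so
// HelmReleases backing ManagedClusters are never candidates for deletion.
func listMgmtReleases(ctx context.Context, c client.Client, systemNamespace string) (*helmv2.HelmReleaseList, error) {
	var releases helmv2.HelmReleaseList
	err := c.List(ctx, &releases,
		client.InNamespace(systemNamespace),
		client.MatchingLabels{managedByMgmtLabel: "true"})
	return &releases, err
}
```

With a label like this in place, the mgmt-ctrl only ever considers releases it created itself, regardless of what else lives in hmc-system.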

a13x5 (Collaborator, Author) commented Dec 3, 2024

I disagree.
I am removing the AWS provider, and my cluster, created with the vSphere provider, gets deleted and then recreated. How can this be expected?
Removing a provider shouldn't result in the removal of all clusters in the hmc-system namespace (for all providers). This behavior is confusing, to say the least.
And adding ManagedClusters as providers in the Management spec is even more confusing.
If we think this UX is OK, we should explicitly document the behavior, since it is not standard and not something that could be expected.

zerospiel (Contributor) commented
I do not quite understand what you disagree with. I've checked it myself, and the only suggestion left from the ones already mentioned is to add extra labels.

zerospiel self-assigned this Dec 3, 2024
zerospiel moved this from Todo to In Progress in Project 2A Dec 3, 2024
zerospiel added a commit to zerospiel/hmc that referenced this issue Dec 3, 2024
* add helm-release label to the helmreleases
  installed by the HMC management part
* cleanup helmreleases only if removed and
  the new label value equals to the mgmt part
* small refactor to fix linter

Closes k0rdent#703
a13x5 (Collaborator, Author) commented Dec 3, 2024

@zerospiel Sorry, I probably misread the original message; I had the impression that you were saying this is normal, expected behavior.

Kshatrix pushed a commit that referenced this issue Dec 6, 2024
github-project-automation bot moved this from In Progress to Done in Project 2A Dec 6, 2024
alex-shl added this to k0rdent Jan 3, 2025
alex-shl moved this to Done in k0rdent Jan 3, 2025