kwasm-operator is creating issue with node in AKS #187

Open
ishaq786tb opened this issue Nov 28, 2024 · 0 comments


Context of the issue:
I deployed kwasm-operator in an AKS cluster following the instructions. The operator pod came up and is running. After I added the kwasm annotation to a node, the required configuration was applied to that node. The operator then worked as expected in the AKS cluster, and the node could handle WASM workloads. In short, the node handled both WASM and runc workloads (validated).

The issue appears on the annotated node after some time, roughly one day. On investigation, I found that Microsoft has a Node Auto-Remediation mechanism that, after some interval, reverts the node configuration applied by kwasm-operator in AKS back to the default configuration.
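One way to confirm on a node that the revert has happened is to check which runtime handlers are still declared in the containerd configuration. The following is only a rough sketch under several assumptions: that the node uses containerd, that its config is a TOML file (commonly `/etc/containerd/config.toml`, but the path may differ on AKS), and that runtimes appear as `[plugins."<cri plugin>".containerd.runtimes.<name>]` tables. The function names and the runtime names passed in are hypothetical placeholders, not part of kwasm-operator.

```python
import re

# Assumption: runtimes are declared as TOML table headers of the form
#   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
# The capture group is the runtime handler name ("runc" above).
# Sub-tables such as "...runtimes.runc.options" are deliberately not matched.
_RUNTIME_HEADER = re.compile(
    r'^\s*\[plugins\."[^"]+"\.containerd\.runtimes\.([A-Za-z0-9_-]+)\]\s*$',
    re.MULTILINE,
)


def configured_runtimes(config_text):
    """Return the set of runtime handler names declared in the config text."""
    return set(_RUNTIME_HEADER.findall(config_text))


def missing_runtimes(config_text, expected):
    """Return the expected handler names that are absent from the config text."""
    return set(expected) - configured_runtimes(config_text)
```

Reading the node's config file into a string and calling `missing_runtimes(text, ["runc", "spin"])` (with whatever handler names the cluster actually uses) right after annotation and again once pods get stuck would show whether the WASM entries have disappeared.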

Consequences of the issue:

  • The node stops working as expected and cannot handle either runc or WASM workloads.
  • Normal pods that were running fine before the operator was installed end up in the 'ContainerCreating' state, and WASM pods end up in 'ContainerCreating' as well.
  • If a pod that was running on the node is restarted or deleted, the new pod gets scheduled on the annotated node but stays in the 'ContainerCreating' state.

In short, once Azure reverts the configuration to the default after the operator has been installed, the AKS node can no longer handle runc or WASM workloads.

Questions:

  1. The official page states that this operator works with AKS. Why does this issue appear on the node a few hours after the operator is deployed?
  2. Is there any way to prevent Azure from resetting the node configuration to the default after a specific interval?
  3. Has this issue already been observed on AKS with the operator?
  4. Is there any prerequisite that must be set up on the AKS cluster before installing this operator to prevent this issue?

Thanks in advance for your help.
