Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unmountKubeletDirectory should return err instead of nil if unmounting fails #2923

Closed
SituationNormal opened this issue Jul 31, 2023 · 9 comments · Fixed by kubernetes/kubernetes#122530
Labels
area/kubelet kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Milestone

Comments

@SituationNormal
Copy link

SituationNormal commented Jul 31, 2023

https://github.com/kubernetes/kubernetes/blob/9e644106593f3f4aa98f8a84b23db5fa378900bd/cmd/kubeadm/app/cmd/phases/reset/unmount_linux.go#L52

unmountKubeletDirectory currently returns nil if it encounters an error unmounting. However, runCleanupNode relies on this code to return an error if there's a problem so it doesn't wipe out the kubelet run directory. In this case, if there's an error unmounting since unmountKubeletDirectory returns nil, runCleanupNode will wipe out the kubelet run directory, including any mounted data that may be pointing to an NFS/persistent storage or something from the kubelet/pods directory.

/sig k8s-infra

@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 31, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@SituationNormal
Copy link
Author

/sig k8s-infra

@k8s-ci-robot k8s-ci-robot added sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 31, 2023
@SituationNormal
Copy link
Author

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Aug 31, 2023
@neolit123
Copy link
Member

/remove-sig node k8s-infra
/transfer kubeadm

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Sep 1, 2023
@k8s-ci-robot k8s-ci-robot transferred this issue from kubernetes/kubernetes Sep 1, 2023
@neolit123 neolit123 added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. area/kubelet and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. labels Sep 1, 2023
@neolit123
Copy link
Member

https://github.com/kubernetes/kubernetes/blob/9e644106593f3f4aa98f8a84b23db5fa378900bd/cmd/kubeadm/app/cmd/phases/reset/unmount_linux.go#L52

unmountKubeletDirectory currently returns nil if it encounters an error unmounting. However, runCleanupNode relies on this code to return an error if there's a problem so it doesn't wipe out the kubelet run directory. In this case, if there's an error unmounting since unmountKubeletDirectory returns nil, runCleanupNode will wipe out the kubelet run directory, including any mounted data that may be pointing to an NFS/persistent storage or something from the kubelet/pods directory.

i think you have a point.
if we are going to refactor this we have to try to keep reset as a "best practice". thatt is the principle it tries to follow.

please send a PR if you want. also let's see if others have an opinion.

@SataQiu
Copy link
Member

SataQiu commented Sep 1, 2023

unmountKubeletDirectory is a NOOP on all but linux

So unmountKubeletDirectory is just best-effort, and users performing kubeadm reset usually means that they want to clean up the existing data on the node, right?
If users want to keep the data, I think they need to make a backup before executing kubeadm reset.

@huww98
Copy link

huww98 commented Sep 4, 2023

So unmountKubeletDirectory is just best-effort, and users performing kubeadm reset usually means that they want to clean up the existing data on the node, right? If users want to keep the data, I think they need to make a backup before executing kubeadm reset.

kubeadm reset can be a routine process of removing a node from cluster. It can also be part of an automated process. I think the user would not expect their (network attached) persistent volume to be wiped just because it happens mounted on this node.

@pacoxu
Copy link
Member

pacoxu commented Sep 4, 2023

unmountKubeletDirectory is a NOOP on all but linux

So unmountKubeletDirectory is just best-effort, and users performing kubeadm reset usually means that they want to clean up the existing data on the node, right? If users want to keep the data, I think they need to make a backup before executing kubeadm reset.

+1 for this point.

However, the change in kubernetes/kubernetes#120377 will make the reset safer, which is acceptable to me.

@neolit123 neolit123 added kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Sep 8, 2023
@neolit123 neolit123 added this to the v1.29 milestone Sep 8, 2023
@chendave
Copy link
Member

I think the user would not expect their (network attached) persistent volume to be wiped just because it happens mounted on this node.

+1, user might not aware how the PV is mounted in a pods, just wipe it might not appropriate.

@neolit123 neolit123 modified the milestones: v1.29, v1.30 Nov 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kubelet kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Projects
None yet
7 participants