Hard rebooting Kubernetes nodes leads to "volume already mounted at more than one place" #153
Another hard reboot, another stuck pod, now with a slightly different error:
The "special file" it complains as not existing is there, which makes this even more mysterious:
|
I happen to be evaluating openebs' dynamic-nfs-provisioner against other options right now, and I was curious about this problem you ran into. A few questions: can you clarify where the workload pod and the control plane pods were, and where they ended up? A table like the following would help:
The control plane pods ("OpenEBS pod") aren't strictly necessary, but I'm curious. It's unclear exactly which pods are failing and how wide the failure is -- when you say "NFS volume pods", I assume you mean your workloads, but you must also mean the NFS server pod(s) as well, correct?

What exactly happened in the node failure? Did the node crash and come back up? Was the Jiva control plane running on the node that went down? Was the Jiva data plane running on the node that went down?

It seems like you had a Jiva failure which caused the NFS server pod to not be able to access its own PVC, which means it can't serve the drive for your actual workload. What's weird is that if the node went down but came back up (so the identical mount was available, which you saw), then maybe the Jiva data plane pod went to a different node in the meantime? That shouldn't be possible (it's been a while since I ran Jiva, but it should pin controllers/data-plane managers to nodes)...

Since Jiva is Longhorn underneath, you should be able to check the dashboard/Longhorn UI. I can't remember how hard the Longhorn UI was to get to, but you should be able to find where Longhorn is running in the Jiva control plane pods and port-forward to get at the UI (assuming the port is exposed; you may have to edit the pod). That will tell you if the drive is failing at the Jiva level.
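Roughly what I have in mind, as a sketch only -- the namespace, labels, pod name, and port below are assumptions and will likely differ in your install:

```sh
# Where did the workload, the NFS server pod(s), and the Jiva pods end up?
kubectl get pods -A -o wide | grep -Ei 'nfs|jiva'

# Find the Jiva controller (control plane) pod for the backing volume;
# the openebs namespace and the "jiva-ctrl" naming are guesses, adjust to your install.
kubectl -n openebs get pods -o wide | grep jiva-ctrl

# Port-forward to the controller to reach the embedded Longhorn API.
# 9501 is the port Jiva controllers conventionally listen on; verify the
# container ports with: kubectl -n openebs get pod <jiva-ctrl-pod> -o yaml
kubectl -n openebs port-forward pod/<jiva-ctrl-pod> 9501:9501
```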
I don't think I am able to reproduce this problem in my cluster anymore, because I forced all the Dynamic NFS Provisioner server pods onto the same node, so they no longer switch between nodes after hard crashes. Since they stay on the same node and mount their Jiva volumes ReadWriteOnce, it doesn't matter if some ghost containers are left over on that node, as those won't prevent a remount on the same node. However, it would be nice if there was a way to diagnose what exactly is keeping a Jiva volume mounted, or for Dynamic NFS Provisioner to somehow get over this.
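For reference, the pinning workaround amounted to something like the following sketch. The node label and the deployment name are placeholders, and the provisioner may also offer a cleaner node-affinity setting that I haven't verified:

```sh
# Label the node the NFS server pods should stay on (label key/value are arbitrary).
kubectl label node curie nfs-server=pinned

# The provisioner creates one nfs-pvc-<uuid> deployment per exported PVC
# (in the openebs namespace in my install); list them to find the right one:
kubectl -n openebs get deployments | grep nfs-pvc

# Patch a nodeSelector into each server deployment so its pod always lands on
# the labelled node (<nfs-server-deployment> is a placeholder):
kubectl -n openebs patch deployment <nfs-server-deployment> --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"nfs-server":"pinned"}}}}}'
```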
Ah OK, well if the problem is gone with Jiva being constrained that way, it seems like it might be a Jiva-level problem -- maybe this ticket is worth closing then? What else do you think would make it an NFS-provisioner-specific issue?
No matter what backing storage is used for NFS provisioner, it would generally be ReadWriteOnce. That means that if the NFS pod moves to another node, it should continue to be able to use these volumes (if they aren't node specific like hostpath), as long as it is the only one using them.

However, in some hard reboot cases, something is left hanging around, and the zombie NFS provisioner pods keep the volumes reserved from some other node.

It would be nice if there was a way to see what is keeping these volumes tangled, but I wonder if there might be some way NFS provisioner could unmount the volume automatically in this error case.

I don't know if that is possible, as I don't quite understand what exactly is left hanging to keep the volume mounted: it's not visible on the Kubernetes level, and I couldn't find it on the node mount level either.

Perhaps you're right that this is a Jiva-level problem. Maybe Jiva is miscounting the mount locations somehow, still imagining that some already-terminated container holds a mount. OK, it can be closed, since you could be right; I am personally not affected by this problem anymore due to the workaround, and if someone has the same problem they can find the workaround in this closed issue.
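For anyone hitting this later, these are roughly the node-level checks I would run. It's only a sketch: the paths assume a stock kubelet layout (on MicroK8s the kubelet data lives under /var/snap/microk8s/common/ rather than /var/lib/kubelet, if I remember correctly), and <pv-name> stands for the pvc-... name of the backing volume:

```sh
# Run these on each node that might still be holding the volume.

# Any live mount that references the PV?
findmnt | grep <pv-name>
grep <pv-name> /proc/mounts

# Leftover per-pod volume directories in the kubelet tree can outlive a pod
# after a hard reboot (path shown for a stock kubelet; MicroK8s differs).
ls -d /var/lib/kubelet/pods/*/volumes/*/<pv-name> 2>/dev/null

# Jiva attaches volumes over iSCSI, so a stale session here would explain the
# volume still counting as "mounted" from this node.
iscsiadm -m session
```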
Yeah, if only someone had gotten to this sooner and we had been able to debug it when you ran into it!
Maybe someone will have this issue and ask for the ticket to be reopened.
openebs/openebs#3632 seems like possibly the same issue.
NFS provisioner binds to a backing persistent volume claim in ReadWriteOnce mode. This is otherwise all well and good, but after a hard reboot of a node, starting these NFS volume pods fails because they cannot get the volume mount.
Specifically with these events:
At least in microk8s I have found no way to find out what is mounting the volume behind the scenes exactly, or maybe the accounting is simply wrong. I suppose some weird ghost container could in principle be the one keeping the volume reserved, but I haven't managed to find out what and how.
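For what it's worth, the Kubernetes-side accounting can at least be inspected roughly like this (a sketch; the resource names in angle brackets are placeholders):

```sh
# Which node does the control plane think the volume is attached to?
kubectl get volumeattachments | grep <pv-name>

# The events on the stuck pod and on the backing PVC carry the FailedMount
# events, including the "volume already mounted at more than one place" message.
kubectl -n openebs describe pod <stuck-nfs-server-pod>
kubectl -n openebs describe pvc <backing-pvc>
```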
What I have tried: checking the node's mounts. Nothing special there.

Steps to reproduce the bug:
Have several NFS persistent volume claims which use ReadWriteOnce volumes behind them active and reboot a Kubernetes node.
Expected:
What happens:
I have no clue how to investigate further, and due to the manual surgery I did to try to get the cluster up and running again after this problem, the whole cluster is now past the point of no return and I need to rebuild it from scratch.
Environment details:
- OpenEBS version: openebs.io/version=3.3.0
- Kubernetes version (kubectl version):
- Kernel (uname -a): Linux curie 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

I'm not sure if this is an NFS Provisioner bug, an OpenEBS Jiva bug, or a MicroK8s bug.
This happens to me about weekly, so if anyone has suggestions on how to debug what happens, I'd be glad to hear them.
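One place worth looking when it next happens might be the kubelet log on the affected node. A sketch, with the caveat that the MicroK8s unit name depends on the version:

```sh
# Recent MicroK8s bundles kubelet into the kubelite daemon; older releases run a
# separate kubelet service, so adjust the unit name to your install.
journalctl -u snap.microk8s.daemon-kubelite --since "1 hour ago" | grep -iE 'mount|volume'

# If unsure which unit carries the kubelet logs, list the MicroK8s units:
systemctl list-units 'snap.microk8s.*'
```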