
Unable to restore PVCs from backup #160

Open
ryshoooo opened this issue Jun 19, 2023 · 2 comments

Comments


ryshoooo commented Jun 19, 2023

Describe the bug: I'm currently using Velero as a backup service for my Kubernetes cluster. When I try to restore OpenEBS NFS volumes, they are not restored with the data from the snapshot; they come back empty instead.

Expected behaviour: I'd expect the restoration to work and restore all of the data present at the time of the backup.

Steps to reproduce the bug:
Assuming we have a running cluster (a rough sketch of the equivalent commands follows the list):

  • Install NFS dynamic provisioner
  • Install Velero
  • Create an NFS volume and put some data in it
  • Create a Velero backup
  • Destroy the volume
  • Restore from the backup
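
For illustration, a rough sketch of the equivalent commands. All names (namespace demo, claim demo-nfs-pvc, deployment demo-app) are placeholders, and the exact Helm chart and Velero install flags depend on the environment:

  # 1. Install the dynamic NFS provisioner and Velero
  helm repo add openebs-nfs https://openebs.github.io/dynamic-nfs-provisioner
  helm install openebs-nfs openebs-nfs/nfs-provisioner -n openebs --create-namespace
  # velero install ...   (provider/bucket/plugin flags depend on the object store)

  # 2. Create an NFS-backed PVC and write some data into it
  kubectl apply -f demo-nfs-pvc.yaml
  kubectl exec -n demo deploy/demo-app -- sh -c 'echo hello > /data/hello.txt'

  # 3. Create a Velero backup of the namespace
  velero backup create demo-backup --include-namespaces demo

  # 4. Destroy the volume (delete the whole namespace here)
  kubectl delete namespace demo

  # 5. Restore from the backup
  velero restore create demo-restore --from-backup demo-backup

  # 6. The PVC comes back bound, but the volume is empty
  kubectl exec -n demo deploy/demo-app -- ls /data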

Anything else we need to know?:
The reason this happens is that the NFS dynamic provisioner depends on the PVCs' UIDs and uses them to create the backend volumes. Upon Velero restoration, the UIDs are not preserved (it is actually impossible to preserve them, as Kubernetes does not allow specifying the UID of an object when it is created). This means the volumes are actually restored for a short time, until the provisioner's garbage collector treats them as orphaned and deletes them again.

So what happens is that Velero restores the NFS volume claim and also the backend persistent volume claims. Because the restored NFS volume claim has a different UID, the provisioner creates new backend volumes, and the garbage collector then collects the old (just restored) ones and deletes them.
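
The UID dependence is easy to see by hand. A rough sketch, where the backend claim naming nfs-pvc-<uid> and the openebs namespace are assumptions about a default install rather than something guaranteed above:

  # UID of the application-facing NFS PVC; Velero cannot carry this over on restore
  kubectl get pvc -n demo demo-nfs-pvc -o jsonpath='{.metadata.uid}'

  # Backend claim created by the provisioner; its name embeds the NFS PV name
  # (pvc-<uid>), so a restored claim with a new UID no longer matches it
  kubectl get pvc -n openebs | grep nfs-pvc-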

Environment details:

  • OpenEBS version (use kubectl get po -n openebs --show-labels): 0.10.0
  • Kubernetes version (use kubectl version): 1.26.3
  • Cloud provider or hardware configuration: Azure/AWS
  • OS (e.g: cat /etc/os-release): Ubuntu
  • kernel (e.g: uname -a): Linux 2023 x86_64 x86_64 x86_64 GNU/Linux
ryshoooo (Author) commented

Actually, I got this to work; the key is to set the reclaim policy to Retain instead of Delete.
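
Concretely, something like this (names are placeholders; note that reclaimPolicy on an existing StorageClass is immutable, so the class may need to be recreated, while an already-provisioned PV can be patched directly):

  # Check what the NFS storage class currently uses
  kubectl get sc <nfs-storage-class> -o jsonpath='{.reclaimPolicy}'

  # Switch an existing PV from Delete to Retain so a restore doesn't wipe it
  kubectl patch pv <pv-name> \
    -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'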

It's still not a smooth restoration, though: after the restore, the garbage collector constantly logs complaints about mismatched UIDs. Also, I had to manually patch the PVs to point to the new cluster IPs of the NFS services, since the restore does not preserve those either.
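
The manual PV patch looks roughly like this (service and PV names are placeholders; whether running pods pick the change up without a remount will depend on the mount state):

  # New cluster IP of the restored backend NFS service
  kubectl get svc -n openebs <nfs-service-name> -o jsonpath='{.spec.clusterIP}'

  # Point the restored PV at it
  kubectl patch pv <pv-name> --type merge \
    -p '{"spec":{"nfs":{"server":"<new-cluster-ip>"}}}'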

Could this be something the provisioner/garbage collector does as well? I.e. if the NFS server field of the PV does not match the service's cluster IP, automatically patch it. I'm willing to make a PR for this if anybody here thinks it's a good idea.
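
For reference, the mismatch such a check would look for can be spotted by hand with something like the loop below (it assumes the backend service is named nfs-<pv-name> in the openebs namespace, which may not hold for every install):

  for pv in $(kubectl get pv -o jsonpath='{.items[*].metadata.name}'); do
    server=$(kubectl get pv "$pv" -o jsonpath='{.spec.nfs.server}' 2>/dev/null)
    [ -z "$server" ] && continue   # skip PVs that are not NFS-backed
    svc_ip=$(kubectl get svc -n openebs "nfs-$pv" -o jsonpath='{.spec.clusterIP}' 2>/dev/null)
    [ "$server" != "$svc_ip" ] && echo "$pv: PV points at $server, service IP is $svc_ip"
  done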

AndreiCojocaru96 commented

I'm dealing with the same problem. You are right that you can restore with Velero that way. But I wanted to try a full disaster recovery, so I also deleted the Helm release, the namespace with the BDs and CVRs, and the volume with the configs. Upon creating the restore I ran into even more problems: the PV was not created by Velero at all. I fixed that by creating it manually, but that caused another kind of problem, with the NFS pod not even starting:

  Warning  FailedMount  5m19s (x11 over 11m)  kubelet                                MountVolume.MountDevice failed for volume "pvc-7953bc86-83a1-4359-9d6f-43ffc27e3c53" : rpc error: code = Internal desc = Volume pvc-7953bc86-83a1-4359-9d6f-43ffc27e3c53 still mounted on node gke-staging-storage-f0076abe-5nd9

At this point I gave up trying to do a disaster recovery.
