
Unable to restore PVCs from backup #160

Open
ryshoooo opened this issue Jun 19, 2023 · 2 comments

Comments


ryshoooo commented Jun 19, 2023

Describe the bug: I'm currently using Velero as a backup service for my Kubernetes cluster. When I try to restore OpenEBS NFS volumes, they are not restored with the data from the snapshot; they come back empty instead.

Expected behaviour: I'd expect the restoration to work and restore all of the data present at the time of the backup.

Steps to reproduce the bug:
Assuming we have a running cluster (a rough sketch of the equivalent commands follows the list):

  • Install NFS dynamic provisioner
  • Install Velero
  • Create an NFS volume and put some data in it
  • Create a Velero backup
  • Destroy the volume
  • Restore from the backup
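
For illustration, a rough sketch of the equivalent commands. All names (namespace demo, claim demo-nfs-pvc, deployment demo-app) are placeholders, and the exact Helm chart and Velero install flags depend on the environment:

  # 1. Install the dynamic NFS provisioner and Velero
  helm repo add openebs-nfs https://openebs.github.io/dynamic-nfs-provisioner
  helm install openebs-nfs openebs-nfs/nfs-provisioner -n openebs --create-namespace
  # velero install ...   (provider/bucket/plugin flags depend on the object store)

  # 2. Create an NFS-backed PVC and write some data into it
  kubectl apply -f demo-nfs-pvc.yaml
  kubectl exec -n demo deploy/demo-app -- sh -c 'echo hello > /data/hello.txt'

  # 3. Create a Velero backup of the namespace
  velero backup create demo-backup --include-namespaces demo

  # 4. Destroy the volume (delete the whole namespace here)
  kubectl delete namespace demo

  # 5. Restore from the backup
  velero restore create demo-restore --from-backup demo-backup

  # 6. The PVC comes back bound, but the volume is empty
  kubectl exec -n demo deploy/demo-app -- ls /data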

Anything else we need to know?:
The reason this happens is that the NFS dynamic provisioner depends on the PVCs' UIDs and uses them to create the backend volumes. Upon Velero restoration, the UIDs are not preserved (it is actually impossible to preserve them, as Kubernetes does not allow specifying the UID of an object when it is created). This means the volumes are actually restored for a short time, until the provisioner's garbage collector treats them as orphaned and deletes them again.

So what happens is that Velero restores the NFS volume claim and also the backend persistent volume claims. Because the restored NFS volume claim has a different UID, the provisioner creates new backend volumes, and the garbage collector then collects the old (just restored) ones and deletes them.
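
The UID dependence is easy to see by hand. A rough sketch, where the backend claim naming nfs-pvc-<uid> and the openebs namespace are assumptions about a default install rather than something guaranteed above:

  # UID of the application-facing NFS PVC; Velero cannot carry this over on restore
  kubectl get pvc -n demo demo-nfs-pvc -o jsonpath='{.metadata.uid}'

  # Backend claim created by the provisioner; its name embeds the NFS PV name
  # (pvc-<uid>), so a restored claim with a new UID no longer matches it
  kubectl get pvc -n openebs | grep nfs-pvc-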

Environment details:

  • OpenEBS version (use kubectl get po -n openebs --show-labels): 0.10.0
  • Kubernetes version (use kubectl version): 1.26.3
  • Cloud provider or hardware configuration: Azure/AWS
  • OS (e.g: cat /etc/os-release): Ubuntu
  • kernel (e.g: uname -a): Linux 2023 x86_64 x86_64 x86_64 GNU/Linux
ryshoooo (Author) commented

Actually, I got this to work; the key is to set the reclaim policy to Retain instead of Delete.
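
Concretely, something like this (names are placeholders; note that reclaimPolicy on an existing StorageClass is immutable, so the class may need to be recreated, while an already-provisioned PV can be patched directly):

  # Check what the NFS storage class currently uses
  kubectl get sc <nfs-storage-class> -o jsonpath='{.reclaimPolicy}'

  # Switch an existing PV from Delete to Retain so a restore doesn't wipe it
  kubectl patch pv <pv-name> \
    -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'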

It's still not a smooth restoration, though: after the restore, the garbage collector constantly logs complaints about mismatched UIDs. Also, I had to manually patch the PVs to point to the new cluster IPs of the NFS services, since the restore does not preserve those either.
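
The manual PV patch looks roughly like this (service and PV names are placeholders; whether running pods pick the change up without a remount will depend on the mount state):

  # New cluster IP of the restored backend NFS service
  kubectl get svc -n openebs <nfs-service-name> -o jsonpath='{.spec.clusterIP}'

  # Point the restored PV at it
  kubectl patch pv <pv-name> --type merge \
    -p '{"spec":{"nfs":{"server":"<new-cluster-ip>"}}}'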

Could this be something the provisioner/garbage collector does as well? I.e. if the NFS server field of the PV does not match the service's cluster IP, automatically patch it. I'm willing to make a PR for this if anybody here thinks it's a good idea.
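
For reference, the mismatch such a check would look for can be spotted by hand with something like the loop below (it assumes the backend service is named nfs-<pv-name> in the openebs namespace, which may not hold for every install):

  for pv in $(kubectl get pv -o jsonpath='{.items[*].metadata.name}'); do
    server=$(kubectl get pv "$pv" -o jsonpath='{.spec.nfs.server}' 2>/dev/null)
    [ -z "$server" ] && continue   # skip PVs that are not NFS-backed
    svc_ip=$(kubectl get svc -n openebs "nfs-$pv" -o jsonpath='{.spec.clusterIP}' 2>/dev/null)
    [ "$server" != "$svc_ip" ] && echo "$pv: PV points at $server, service IP is $svc_ip"
  done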

AndreiCojocaru96 commented

I'm dealing with the same problem. You are right that you can restore with Velero that way. But I wanted to try a full disaster recovery, so I also deleted the Helm release, the namespace with the BDs and CVRs, and the volume with the configs. Upon creating the restore I ran into even more problems: the PV was not created by Velero at all. I fixed that by creating it manually, but that caused another kind of problem, with the NFS pod not even starting:

  Warning  FailedMount  5m19s (x11 over 11m)  kubelet                                MountVolume.MountDevice failed for volume "pvc-7953bc86-83a1-4359-9d6f-43ffc27e3c53" : rpc error: code = Internal desc = Volume pvc-7953bc86-83a1-4359-9d6f-43ffc27e3c53 still mounted on node gke-staging-storage-f0076abe-5nd9

At this point I gave up trying to do a disaster recovery.
