ceph-csi-rbd provisioner tries to handle delete for rbd-provisioner PVs, creating stuck PVs #4488

Closed
dragoangel opened this issue Mar 12, 2024 · 9 comments

@dragoangel
Contributor

Describe the bug

If the ceph-csi-rbd provisioner is deployed alongside rbd-provisioner, it will try to delete PVs created and managed by rbd-provisioner. The result is that the RBD image gets deleted from Ceph by rbd-provisioner, but the PV stays blocked from deletion in Kubernetes due to errors in its finalizers.

Environment details

  • Image/version of Ceph CSI driver : 3.10.1
  • Helm chart version : 3.10.1
  • Kernel version : any
  • Mounter used for mounting PVC : krbd
  • Kubernetes cluster version : 1.24.x
  • Ceph cluster version : any

Steps to reproduce

Steps to reproduce the behavior:

  1. Set up rbd-provisioner: https://artifacthub.io/packages/helm/kubesphere-test/rbd-provisioner
  2. Create a PVC using an rbd-provisioner storage class with reclaimPolicy: Delete (see the StorageClass sketch after this list)
  3. Set up ceph-csi-rbd via the official Helm chart
  4. Create a PVC using a ceph-csi-rbd storage class with reclaimPolicy: Delete
  5. Delete both PVCs
  6. Check the PVs - the one from rbd-provisioner stays in Released status indefinitely
  7. Check the logs of both rbd-provisioner and ceph-csi-rbd - ceph-csi-rbd tried to delete a PV it does not own
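
For illustration, the two storage classes could look roughly like this (a minimal sketch only: the class names are hypothetical, the provisioner names follow the pv.kubernetes.io/provisioned-by annotations shown further below, and the pool/monitor/secret values are placeholders taken from the PV example in this issue):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-old                    # hypothetical name for the rbd-provisioner class
provisioner: kubernetes.io/rbd     # matches the provisioned-by annotation of the old PVs
reclaimPolicy: Delete
parameters:
  monitors: x:6789
  pool: rbds
  userId: rbds
  userSecretName: ceph-key
  ...
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-csi                    # hypothetical name for the ceph-csi-rbd class
provisioner: rbd.csi.ceph.com
reclaimPolicy: Delete
parameters:
  clusterID: <ceph-cluster-id>     # placeholder for the real cluster ID
  pool: rbds
  ...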

Actual results

rbd-provisioner tried to delete the PV but failed because ceph-csi-rbd also tried to manipulate it, which it should not.

Expected behavior

ceph-csi-rbd should not try to delete PVs it does not own when in-tree migration is not enabled; for example, it should not touch:

...
  annotations:
    kubernetes.io/createdby: rbd-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: 'yes'
    pv.kubernetes.io/provisioned-by: kubernetes.io/rbd
...
spec:
  capacity:
    storage: 5Gi
  rbd:
    monitors:
      - x:6789
    image: kubernetes-dynamic-pvc-be78a32a-ec64-499d-a08b-f187a3796505
    fsType: ext4
    pool: rbds
    user: rbds
    keyring: /etc/ceph/keyring
    secretRef:
      name: ceph-key
      namespace: kube-system
...

and should work only with PVs like:

...
  annotations:
    pv.kubernetes.io/provisioned-by: rbd.csi.ceph.com
    volume.kubernetes.io/provisioner-deletion-secret-name: ceph-csi-rbd-vault-secret
    volume.kubernetes.io/provisioner-deletion-secret-namespace: ceph-csi-rbd
...
spec:
  csi:
    driver: rbd.csi.ceph.com
    fsType: xfs
    volumeAttributes:
...

Logs

rbd-provisioner logs:

Common labels: {"app":"rbd-provisioner","container":"rbd-provisioner","environment":"sandbox","image":"quay.io/external_storage/rbd-provisioner:v2.1.1-k8s1.11","job":"kube-system/rbd-provisioner","namespace":"kube-system","pod":"rbd-provisioner-64f5998589-n8fl8","stream":"stderr"}
2024-03-11 21:32:02.250	I0311 21:32:02.249955       1 controller.go:1158] delete "pvc-1ce738ef-cc2c-45b2-b0ef-6249fe56afa4": started
2024-03-11 21:32:02.255	E0311 21:32:02.255468       1 controller.go:1181] delete "pvc-1ce738ef-cc2c-45b2-b0ef-6249fe56afa4": volume deletion failed: identity annotation not found on PV
2024-03-11 21:32:02.255	E0311 21:32:02.255492       1 event.go:259] Could not construct reference to: '&v1.PersistentVolume{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"pvc-1ce738ef-cc2c-45b2-b0ef-6249fe56afa4", GenerateName:"", Namespace:"", SelfLink:"", UID:"da52ca4a-997f-4d3e-bd0d-a5d7e653f94e", ResourceVersion:"424345582", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63845787671, loc:(*time.Location)(0x1bc94e0)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"kubernetes.io/createdby":"rbd-dynamic-provisioner", "pv.kubernetes.io/bound-by-controller":"yes", "pv.kubernetes.io/provisioned-by":"kubernetes.io/rbd"}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string{"kubernetes.io/pv-protection", "external-provisioner.volume.kubernetes.io/finalizer"}, ClusterName:""}, Spec:v1.PersistentVolumeSpec{Capacity:v1.ResourceList{"storage":resource.Quantity{i:resource.int64Amount{value:8589934592, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"", Format:"BinarySI"}}, PersistentVolumeSource:v1.PersistentVolumeSource{GCEPersistentDisk:(*v1.GCEPersistentDiskVolumeSource)(nil), AWSElasticBlockStore:(*v1.AWSElasticBlockStoreVolumeSource)(nil), HostPath:(*v1.HostPathVolumeSource)(nil), Glusterfs:(*v1.GlusterfsVolumeSource)(nil), NFS:(*v1.NFSVolumeSource)(nil), RBD:(*v1.RBDPersistentVolumeSource)(0xc420a7ac80), ISCSI:(*v1.ISCSIPersistentVolumeSource)(nil), Cinder:(*v1.CinderPersistentVolumeSource)(nil), CephFS:(*v1.CephFSPersistentVolumeSource)(nil), FC:(*v1.FCVolumeSource)(nil), Flocker:(*v1.FlockerVolumeSource)(nil), FlexVolume:(*v1.FlexPersistentVolumeSource)(nil), AzureFile:(*v1.AzureFilePersistentVolumeSource)(nil), VsphereVolume:(*v1.VsphereVirtualDiskVolumeSource)(nil), Quobyte:(*v1.QuobyteVolumeSource)(nil), AzureDisk:(*v1.AzureDiskVolumeSource)(nil), PhotonPersistentDisk:(*v1.PhotonPersistentDiskVolumeSource)(nil), PortworxVolume:(*v1.PortworxVolumeSource)(nil), ScaleIO:(*v1.ScaleIOPersistentVolumeSource)(nil), Local:(*v1.LocalVolumeSource)(nil), StorageOS:(*v1.StorageOSPersistentVolumeSource)(nil), CSI:(*v1.CSIPersistentVolumeSource)(nil)}, AccessModes:[]v1.PersistentVolumeAccessMode{"ReadWriteOnce"}, ClaimRef:(*v1.ObjectReference)(0xc4203be2a0), PersistentVolumeReclaimPolicy:"Delete", StorageClassName:"rbd-sb", MountOptions:[]string(nil), VolumeMode:(*v1.PersistentVolumeMode)(0xc4207c4570), NodeAffinity:(*v1.VolumeNodeAffinity)(nil)}, Status:v1.PersistentVolumeStatus{Phase:"Released", Message:"", Reason:""}}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Warning' 'VolumeFailedDelete' 'identity annotation not found on PV'
2024-03-11 21:32:02.255	W0311 21:32:02.255658       1 controller.go:787] Retrying syncing volume "pvc-1ce738ef-cc2c-45b2-b0ef-6249fe56afa4" because failures 0 < threshold 15
2024-03-11 21:32:02.255	E0311 21:32:02.255696       1 controller.go:802] error syncing volume "pvc-1ce738ef-cc2c-45b2-b0ef-6249fe56afa4": identity annotation not found on PV

csi-provisioner logs:

Common labels: {"app":"ceph-csi-rbd","component":"provisioner","container":"csi-provisioner","environment":"sandbox","image":"registry.k8s.io/sig-storage/csi-provisioner:v3.6.2","job":"ceph-csi-rbd/ceph-csi-rbd","namespace":"ceph-csi-rbd","pod":"ceph-csi-rbd-provisioner-5f6656fd89-4jj4v","stream":"stderr"}
2024-03-11 21:32:02.251	I0311 21:32:02.251604       1 controller.go:1509] delete "pvc-1ce738ef-cc2c-45b2-b0ef-6249fe56afa4": started
2024-03-11 21:32:02.577	E0311 21:32:02.577701       1 controller.go:1519] delete "pvc-1ce738ef-cc2c-45b2-b0ef-6249fe56afa4": volume deletion failed: rpc error: code = Internal desc = missing configuration for cluster ID "cd6a5ce846b013ef9091b8dfe0b02f9d"
2024-03-11 21:32:02.577	W0311 21:32:02.577773       1 controller.go:989] Retrying syncing volume "pvc-1ce738ef-cc2c-45b2-b0ef-6249fe56afa4", failure 0
2024-03-11 21:32:02.577	E0311 21:32:02.577835       1 controller.go:1007] error syncing volume "pvc-1ce738ef-cc2c-45b2-b0ef-6249fe56afa4": rpc error: code = Internal desc = missing configuration for cluster ID "cd6a5ce846b013ef9091b8dfe0b02f9d"
2024-03-11 21:32:02.577	I0311 21:32:02.577873       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-1ce738ef-cc2c-45b2-b0ef-6249fe56afa4", UID:"da52ca4a-997f-4d3e-bd0d-a5d7e653f94e", APIVersion:"v1", ResourceVersion:"424345582", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' rpc error: code = Internal desc = missing configuration for cluster ID "cd6a5ce846b013ef9091b8dfe0b02f9d"

Additional context

looks similar to #4242

@Madhu-1
Collaborator

Madhu-1 commented Mar 12, 2024

@dragoangel if you are planning to use 2 CSI drivers, you need to keep the CSI driver names unique. May I know the reason for deploying 2 CSI drivers in the same cluster?

@dragoangel
Contributor Author

@dragoangel if you are planning to use 2 CSI drivers, you need to keep the CSI driver names unique. May I know the reason for deploying 2 CSI drivers in the same cluster?

One is the old rbd-provisioner, which I need to keep to handle existing old PVCs, and ceph-csi is the new one to handle everything the new way.

@dragoangel
Contributor Author

I don't understand why it should try to touch PVs that don't belong to it.

@dragoangel
Contributor Author

If I add the cluster to the csi-rbd provisioner config keyed by the md5 hash and give permissions to my CSI user, it deletes the PVs fine.

@dragoangel
Contributor Author

dragoangel commented Mar 12, 2024

Okay, not obvious, but the solution is:

  1. Set rbd-provisioner replicas to 0

  2. Add configuration for the cluster ID mentioned in the logs (a sketch is shown below)

After that, create/delete/etc. works out of the box.
Important: the user referenced in the old kube StorageClass must still exist, as well as its secret in k8s.
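
For reference, the added configuration might look roughly like this (a minimal sketch only, assuming the chart's default ceph-csi-config ConfigMap: the first entry is the normal one for new CSI volumes, the second is keyed by the cluster ID reported in the csi-provisioner error above, which is derived from the md5 hash of the old PV's monitors per the migration doc linked below; all monitor and fsid values are placeholders):

apiVersion: v1
kind: ConfigMap
metadata:
  name: ceph-csi-config            # default ConfigMap name used by the ceph-csi-rbd chart
  namespace: ceph-csi-rbd
data:
  config.json: |-
    [
      {
        "clusterID": "<ceph-fsid>",
        "monitors": ["x:6789"]
      },
      {
        "clusterID": "cd6a5ce846b013ef9091b8dfe0b02f9d",
        "monitors": ["x:6789"]
      }
    ]

With that extra entry, ceph-csi can resolve the cluster for the old in-tree PVs; the user and secret from the old StorageClass still need to exist, as noted above.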

@Madhu-1
Collaborator

Madhu-1 commented Mar 12, 2024

One is the old rbd-provisioner, which I need to keep to handle existing old PVCs, and ceph-csi is the new one to handle everything the new way.

This is possible only when you use different CSI driver names; if you want to use the same driver name for both, the solution you provided above is the one to use.

@dragoangel
Contributor Author

One is the old rbd-provisioner, which I need to keep to handle existing old PVCs, and ceph-csi is the new one to handle everything the new way.

This is possible only when you use different CSI driver names; if you want to use the same driver name for both, the solution you provided above is the one to use.

The provisioner for the old one is kubernetes.io/rbd, yeah, and that's why I think it should not be touched by ceph-csi.

@Madhu-1
Collaborator

Madhu-1 commented Mar 12, 2024

I think it's happening due to the migration: https://github.com/ceph/ceph-csi/blob/devel/docs/intree-migrate.md#in-tree-storage-plugin-to-csi-driver-migration

@dragoangel
Contributor Author

Okay, I think so too; then this is not a bug, but a feature. Not an obvious one, but still 😁. I think I can close it then. Hope other people will find this issue in the future when they hit the same problem.
