CSI driver troubleshooting guide

Case#1: volume create/delete issue

  • locate csi driver pod
kubectl get po -o wide -n kube-system | grep csi-blob-controller
NAME                                       READY   STATUS    RESTARTS   AGE     IP             NODE
csi-blob-controller-56bfddd689-dh5tk       4/4     Running   0          35s     10.240.0.19    k8s-agentpool-22533604-0
csi-blob-controller-56bfddd689-sl4ll       4/4     Running   0          35s     10.240.0.23    k8s-agentpool-22533604-1
  • get csi driver logs
kubectl logs csi-blob-controller-56bfddd689-dh5tk -c blob -n kube-system > csi-blob-controller.log

note: there could be multiple controller pods; logs can be taken from all of them simultaneously, including in follow (real-time) mode: kubectl logs deploy/csi-blob-controller -c blob -f -n kube-system
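
As a minimal sketch of collecting logs from every controller pod, the helper below writes one log file per pod. The function name `collect_controller_logs` and its output-directory argument are illustrative assumptions, not part of the driver; it only assumes the pod names contain `csi-blob-controller`, as in the listing above.

```shell
# Sketch: save logs from every csi-blob-controller pod into its own file.
# collect_controller_logs is a hypothetical helper; $1 is an output directory (default: current dir).
collect_controller_logs() {
  outdir="${1:-.}"
  mkdir -p "$outdir"
  # "-o name" prints entries such as "pod/csi-blob-controller-56bfddd689-dh5tk"
  kubectl get po -n kube-system -o name | grep csi-blob-controller | while read -r pod; do
    name="${pod#pod/}"   # strip the "pod/" prefix
    kubectl logs "$name" -c blob -n kube-system > "$outdir/$name.log"
  done
}
```

Usage: `collect_controller_logs ./csi-logs` leaves one `<pod-name>.log` file per controller pod in `./csi-logs`.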

Case#2: volume mount/unmount failed

  • locate the csi driver pod and identify which pod performs the actual volume mount/unmount (the csi-blob-node pod on the same node as the workload pod)
kubectl get po -o wide -n kube-system | grep csi-blob-node
NAME                                       READY   STATUS    RESTARTS   AGE     IP             NODE
csi-blob-node-cvgbs                        3/3     Running   0          7m4s    10.240.0.35    k8s-agentpool-22533604-1
csi-blob-node-dr4s4                        3/3     Running   0          7m4s    10.240.0.4     k8s-agentpool-22533604-0
  • get csi driver logs
kubectl logs csi-blob-node-cvgbs -c blob -n kube-system > csi-blob-node.log

note: to watch logs in realtime from multiple csi-blob-node DaemonSet pods simultaneously, run the command:

kubectl logs daemonset/csi-blob-node -c blob -n kube-system -f
  • check blobfuse mount inside driver
kubectl exec -it csi-blob-node-9vl9t -c blob -n kube-system -- mount | grep blobfuse
blobfuse on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-efce16db-bf15-4634-b82b-068385019d7c/globalmount type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
blobfuse on /var/lib/kubelet/pods/e73d0984-a253-4203-9e8c-9237ae5c55d5/volumes/kubernetes.io~csi/pvc-efce16db-bf15-4634-b82b-068385019d7c/mount type fuse (rw,relatime,user_id=0,group_id=0,allow_other)
  • check nfs mount inside driver
kubectl exec -it csi-blob-node-9vl9t -n kube-system -c blob -- mount | grep nfs
accountname.file.core.windows.net:/accountname/pvcn-46c357b2-333b-4c42-8a7f-2133023d6c48 on /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-46c357b2-333b-4c42-8a7f-2133023d6c48/globalmount type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.244.0.6,local_lock=none,addr=20.150.29.168)
accountname.file.core.windows.net:/accountname/pvcn-46c357b2-333b-4c42-8a7f-2133023d6c48 on /var/lib/kubelet/pods/7994e352-a4ee-4750-8cb4-db4fcf48543e/volumes/kubernetes.io~csi/pvc-46c357b2-333b-4c42-8a7f-2133023d6c48/mount type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.244.0.6,local_lock=none,addr=20.150.29.168)
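
To identify which csi-blob-node pod handled the mount for a given workload, one approach is to look up the workload pod's node and then list the driver pod scheduled there. The sketch below assumes a hypothetical helper name `find_node_driver_pod` and takes the workload pod name and namespace as arguments; neither comes from the driver itself.

```shell
# Sketch: print the csi-blob-node pod running on the same node as a workload pod.
# find_node_driver_pod is a hypothetical helper.
#   $1 = workload pod name, $2 = workload namespace (default: "default")
find_node_driver_pod() {
  app_pod="$1"
  app_ns="${2:-default}"
  # Node on which the workload pod is scheduled
  node="$(kubectl get po "$app_pod" -n "$app_ns" -o jsonpath='{.spec.nodeName}')"
  # Driver pods on that node only
  kubectl get po -n kube-system -o wide --field-selector "spec.nodeName=$node" | grep csi-blob-node
}
```

The pod name printed by this helper is the one to pass to the `kubectl logs` and `kubectl exec` commands in this case.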

Update driver version quickly by editing driver deployment directly

  • update controller deployment
kubectl edit deployment csi-blob-controller -n kube-system
  • update node daemonset
kubectl edit ds csi-blob-node -n kube-system

change the following fields in the config, e.g.

        image: mcr.microsoft.com/k8s/csi/blob-csi:v1.4.0
        imagePullPolicy: Always
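
As a non-interactive alternative to `kubectl edit`, the image can also be changed with `kubectl set image`. The sketch below assumes the driver container is named `blob` (the name used with `-c blob` elsewhere in this guide); the helper name `update_driver_image` is hypothetical. It does not touch `imagePullPolicy`, which still needs `kubectl edit` if it must change.

```shell
# Sketch: point both the controller deployment and the node daemonset at a new image.
# update_driver_image is a hypothetical helper; $1 is the full image reference.
# Assumes the driver container is named "blob" in both workloads.
update_driver_image() {
  image="$1"
  kubectl set image deployment/csi-blob-controller blob="$image" -n kube-system
  kubectl set image daemonset/csi-blob-node blob="$image" -n kube-system
  # Wait until both rollouts finish
  kubectl rollout status deployment/csi-blob-controller -n kube-system
  kubectl rollout status daemonset/csi-blob-node -n kube-system
}
```

Usage: `update_driver_image mcr.microsoft.com/k8s/csi/blob-csi:v1.4.0`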

get blobfuse driver version

kubectl exec -it csi-blob-node-fmbqw -n kube-system -c blob -- sh
blobfuse -v
blobfuse 1.2.4
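
To compare versions across nodes without opening a shell in each pod, the same `blobfuse -v` call can be run in every csi-blob-node pod. This is a sketch only; the helper name `blobfuse_versions` is hypothetical.

```shell
# Sketch: print the blobfuse version reported by each csi-blob-node pod.
# blobfuse_versions is a hypothetical helper.
blobfuse_versions() {
  kubectl get po -n kube-system -o name | grep csi-blob-node | while read -r pod; do
    name="${pod#pod/}"   # strip the "pod/" prefix
    printf '%s: ' "$name"
    # stdin is redirected so the command cannot consume the pod list being read by the loop
    kubectl exec "$name" -n kube-system -c blob -- blobfuse -v < /dev/null
  done
}
```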

check blobfuse mount on the agent node

mount | grep blobfuse | uniq
  • Troubleshooting blobfuse mount failure on the agent node

troubleshooting connection failure on agent node

  • blobfuse

A blobfuse mount fails if the storage account name, key, or container name is incorrect. Run the commands below to check whether a blobfuse mount works on the agent node:

mkdir test
export AZURE_STORAGE_ACCOUNT=
export AZURE_STORAGE_ACCESS_KEY=
# only for sovereign cloud
# export AZURE_STORAGE_BLOB_ENDPOINT=accountname.blob.core.chinacloudapi.cn
blobfuse test --container-name=CONTAINER-NAME --tmp-path=/tmp/blobfuse -o allow_other --file-cache-timeout-in-seconds=120
  • NFSv3
mkdir /tmp/test
mount -t nfs -o sec=sys,vers=3,nolock accountname.blob.core.windows.net:/accountname/container-name /tmp/test
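
Before running the blobfuse test above, it can help to confirm that the two credential variables are actually set, since an empty value leads to the same mount failure. The check below is a minimal sketch; the helper name `check_blobfuse_env` is hypothetical.

```shell
# Sketch: verify that the variables used by the blobfuse test are non-empty.
# check_blobfuse_env is a hypothetical helper; it returns 0 if both are set, 1 otherwise.
check_blobfuse_env() {
  missing=0
  for var in AZURE_STORAGE_ACCOUNT AZURE_STORAGE_ACCESS_KEY; do
    eval "val=\${$var:-}"   # read the variable whose name is stored in $var
    if [ -z "$val" ]; then
      echo "$var is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_blobfuse_env && blobfuse test --container-name=CONTAINER-NAME --tmp-path=/tmp/blobfuse -o allow_other --file-cache-timeout-in-seconds=120`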