CSI driver troubleshooting guide

 

Case#1: volume create/delete issue

 

  • Symptoms
    • PVC does not reach Bound status
    • User workload pod does not reach Running status
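To confirm the first symptom quickly, the PVC list can be filtered for claims that are not Bound. This is a minimal sketch: `unbound_pvcs` is a hypothetical helper name, and it assumes the default column layout of `kubectl get pvc --no-headers` (NAME first, STATUS second) within a single namespace.

```shell
# Hypothetical helper: read `kubectl get pvc --no-headers` rows on stdin
# and print the names of claims whose STATUS column is not "Bound".
unbound_pvcs() {
    awk '$2 != "Bound" { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pvc --no-headers | unbound_pvcs
```

Running `kubectl describe pvc` on any name it prints shows the provisioning events for that claim.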

 

  • Locate the CSI driver controller pods
$ kubectl get po -o wide -n kube-system -l app=csi-azurelustre-controller
NAME                                              READY   STATUS    RESTARTS   AGE     IP             NODE
csi-azurelustre-controller-56bfddd689-dh5tk       3/3     Running   0          35s     10.240.0.19    k8s-agentpool-22533604-0
csi-azurelustre-controller-56bfddd689-sl4ll       3/3     Running   0          35s     10.240.0.23    k8s-agentpool-22533604-1

 

  • Get the CSI driver controller logs
$ kubectl logs csi-azurelustre-controller-56bfddd689-dh5tk -c azurelustre -n kube-system > csi-lustre-controller.log

note:

  • add --previous to retrieve logs from the previous instance of a container (for example, after it restarted)

  • there can be multiple controller pods; logs can be collected from all of them at once

$ kubectl logs -n kube-system -l app=csi-azurelustre-controller -c azurelustre --tail=-1 --prefix 
  • follow the logs in real time
$ kubectl logs deploy/csi-azurelustre-controller -c azurelustre -f -n kube-system
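Once the controller log is saved to a file, it can help to look only at the warning and error lines. The sketch below assumes the driver logs in klog format, where each line starts with a severity letter followed by the month and day (for example `E0511 ...`); that format is an assumption about this driver, and `klog_problems` is a hypothetical helper name.

```shell
# Hypothetical helper: print only warning (W), error (E) and fatal (F) lines
# from a saved log file. Assumes klog-style headers such as
# "E0511 10:25:16.000000 1 file.go:42] message".
klog_problems() {
    grep -E '^[EWF][0-9]{4} ' "$1"
}

# Usage:
#   klog_problems csi-lustre-controller.log
```

Logs collected with --prefix carry a pod/container prefix on every line, so the `^` anchor would need to be dropped for those files.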

 

Case#2: volume mount/unmount issue

  • Locate the CSI driver node pod that performs the actual volume mount/unmount operation (the one running on the same node as the workload pod)
$ kubectl get po -o wide -n kube-system -l app=csi-azurelustre-node
NAME                           READY   STATUS    RESTARTS   AGE     IP             NODE
csi-azurelustre-node-9ds7f     3/3     Running   0          7m4s    10.240.0.35    k8s-agentpool-22533604-1
csi-azurelustre-node-dr4s4     3/3     Running   0          7m4s    10.240.0.4     k8s-agentpool-22533604-0
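Picking the right node pod by eye gets tedious on larger clusters. As a rough sketch, the pod list can be reduced to NAME and NODE columns and filtered by node name; `pods_on_node` is a hypothetical helper, and the node name in the usage line is taken from the sample output above.

```shell
# Hypothetical helper: read "NAME NODE" rows on stdin and print the pod
# names scheduled on the node given as the first argument.
pods_on_node() {
    awk -v node="$1" '$2 == node { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get po -n kube-system -l app=csi-azurelustre-node --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
#     | pods_on_node k8s-agentpool-22533604-1
```

The node of the affected workload pod itself is shown by `kubectl get po <workload-pod> -o wide`.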

 

  • Get the CSI driver node logs
$ kubectl logs csi-azurelustre-node-9ds7f -c azurelustre -n kube-system > csi-azurelustre-node.log

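note: to watch logs in real time from all csi-azurelustre-node DaemonSet pods at once, select the pods by label (referencing daemonset/csi-azurelustre-node follows only one of its pods); raise --max-log-requests if there are more than 5 pods:

$ kubectl logs -l app=csi-azurelustre-node -c azurelustre -n kube-system -f --prefix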

 

  • Check Lustre mounts inside the driver container
$ kubectl exec -it csi-azurelustre-node-9ds7f -n kube-system -c azurelustre -- mount | grep lustre
172.18.8.12@tcp:/lustrefs on /var/lib/kubelet/pods/6632349a-05fd-466f-bc8a-8946617089ce/volumes/kubernetes.io~csi/pvc-841498d9-fa63-418c-8cc7-d94ec27f2ee2/mount type lustre (rw,flock,lazystatfs,encrypt)
172.18.8.12@tcp:/lustrefs on /var/lib/kubelet/pods/6632349a-05fd-466f-bc8a-8946617089ce/volumes/kubernetes.io~csi/pvc-841498d9-fa63-418c-8cc7-d94ec27f2ee2/mount type lustre (rw,flock,lazystatfs,encrypt)
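The mount listing can be condensed to the volume names it contains, which makes it easier to match mounts against PVs. The following is a sketch only: `lustre_volumes` is a hypothetical helper, and it assumes the kubelet path layout shown in the output above (`.../kubernetes.io~csi/<volume>/mount`).

```shell
# Hypothetical helper: read `mount` output on stdin and print the unique
# volume names of Lustre mounts under the kubelet CSI volume directory.
lustre_volumes() {
    sed -n 's|.*/kubernetes.io~csi/\([^/]*\)/mount type lustre.*|\1|p' | sort -u
}

# Usage against a live cluster (requires kubectl access):
#   kubectl exec csi-azurelustre-node-9ds7f -n kube-system -c azurelustre -- mount | lustre_volumes
```

Each name it prints can then be inspected with `kubectl get pv <name>`.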

   

Update driver version quickly by editing driver deployment directly

 

  • Update controller deployment
$ kubectl edit deployment csi-azurelustre-controller -n kube-system

 

  • Update node daemonset
$ kubectl edit ds csi-azurelustre-node -n kube-system

 

  • Change the Lustre CSI driver container image settings
image: mcr.microsoft.com/k8s/csi/azurelustre-csi:v0.1.0
imagePullPolicy: Always
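As a non-interactive alternative to `kubectl edit`, the image can also be changed with `kubectl set image`. The sketch below assumes the driver container is named `azurelustre` in both workloads (the name used with `-c` elsewhere in this guide); `set_driver_image` and the `KUBECTL` override are hypothetical conveniences, not part of the driver.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the
# commands without touching the cluster.
KUBECTL="${KUBECTL:-kubectl}"
IMAGE_REPO="mcr.microsoft.com/k8s/csi/azurelustre-csi"

# Hypothetical helper: point the controller deployment and the node
# daemonset at the image tag given as the first argument.
# Assumes the container is named "azurelustre" in both workloads.
set_driver_image() {
    tag="$1"
    $KUBECTL set image deployment/csi-azurelustre-controller "azurelustre=${IMAGE_REPO}:${tag}" -n kube-system
    $KUBECTL set image daemonset/csi-azurelustre-node "azurelustre=${IMAGE_REPO}:${tag}" -n kube-system
}

# Usage:
#   set_driver_image v0.1.0
```

This changes only the image; `imagePullPolicy` still has to be adjusted through `kubectl edit` if needed.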

   

Get Azure Lustre driver version

$ kubectl exec -it csi-azurelustre-node-9ds7f -n kube-system -c azurelustre -- /bin/bash -c "./azurelustreplugin -version"
Build Date: "2022-05-11T10:25:15Z"
Compiler: gc
Driver Name: azurelustre.csi.azure.com
Driver Version: v0.1.0
Git Commit: 43017c96b7cecaa09bc05ce9fad3fb9860a4c0ce
Go Version: go1.18.1
Platform: linux/amd64
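When only the version string is needed, for instance to compare it with the image tag after an update, it can be extracted from the output above. This is a small sketch; `driver_version` is a hypothetical helper that assumes the `Key: value` layout shown.

```shell
# Hypothetical helper: read the plugin's -version output on stdin and
# print the value of the "Driver Version" field.
driver_version() {
    awk -F': ' '$1 == "Driver Version" { sub(/\r$/, "", $2); print $2 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl exec csi-azurelustre-node-9ds7f -n kube-system -c azurelustre -- \
#     /bin/bash -c "./azurelustreplugin -version" | driver_version
```

The `sub` call strips a trailing carriage return, which can appear when the command is run with a TTY (`-it`).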

   

Collect logs for Lustre CSI Driver Product Team for further investigation

 

  • Get the utility from /utils/azurelustre_log.sh, run it, and share the output file lustre.logs with us
$ chmod +x ./azurelustre_log.sh
$ ./azurelustre_log.sh > lustre.logs 2>&1