- Symptoms
- PVC does not reach Bound status
- User workload pod does not reach Running status
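The two symptoms above can be checked across all namespaces with a small filter. This is a minimal sketch, not part of the driver: the helper names `unbound_pvcs` and `not_running_pods` are hypothetical, and the column positions assume the default `kubectl get ... -A --no-headers` layout.

```shell
# Print PVCs that are not Bound.
# Reads `kubectl get pvc -A --no-headers` output on stdin;
# assumed columns: NAMESPACE NAME STATUS ...
unbound_pvcs() {
  awk '$3 != "Bound" { print $1 "/" $2 " (" $3 ")" }'
}

# Print pods that are neither Running nor Completed.
# Reads `kubectl get po -A --no-headers` output on stdin;
# assumed columns: NAMESPACE NAME READY STATUS ...
not_running_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " (" $4 ")" }'
}

# Usage (requires cluster access):
#   kubectl get pvc -A --no-headers | unbound_pvcs
#   kubectl get po  -A --no-headers | not_running_pods
```

For any PVC or pod that is listed, `kubectl describe` on that object usually shows the events that explain why it is stuck.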
- Locate csi driver pod
$ kubectl get po -o wide -n kube-system -l app=csi-azurelustre-controller
NAME                                          READY   STATUS    RESTARTS   AGE   IP            NODE
csi-azurelustre-controller-56bfddd689-dh5tk   3/3     Running   0          35s   10.240.0.19   k8s-agentpool-22533604-0
csi-azurelustre-controller-56bfddd689-sl4ll   3/3     Running   0          35s   10.240.0.23   k8s-agentpool-22533604-1
- Get csi driver logs
$ kubectl logs csi-azurelustre-controller-56bfddd689-dh5tk -c azurelustre -n kube-system > csi-lustre-controller.log
note:
  - add `--previous` to retrieve logs from the previous instance of the container (for example, after a crash or restart)
  - there may be multiple controller pods; the following command retrieves logs from all of them at once
$ kubectl logs -n kube-system -l app=csi-azurelustre-controller -c azurelustre --tail=-1 --prefix
- Retrieve logs in `follow` (realtime) mode
$ kubectl logs deploy/csi-azurelustre-controller -c azurelustre -f -n kube-system
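When the logs need to be attached to an issue, it can help to save them per pod instead of as one merged stream. The following is a minimal sketch under that assumption: the function name `collect_controller_logs` and the output file names are hypothetical, while the namespace, label and container name are the ones used in the commands above.

```shell
# Save current and previous logs of the azurelustre container from every
# controller pod, one file per pod, under the directory given as $1.
collect_controller_logs() {
  out_dir=${1:-.}
  mkdir -p "$out_dir" || return 1
  # `-o name` prints one "pod/<name>" entry per line.
  kubectl get po -n kube-system -l app=csi-azurelustre-controller -o name |
  while read -r pod; do
    name=${pod#pod/}
    kubectl logs "$pod" -c azurelustre -n kube-system \
      > "$out_dir/$name.log"
    # --previous fails when the container has never restarted;
    # in that case drop the empty file instead of keeping it.
    kubectl logs "$pod" -c azurelustre -n kube-system --previous \
      > "$out_dir/$name.previous.log" 2>/dev/null ||
      rm -f "$out_dir/$name.previous.log"
  done
}

# Usage (requires cluster access):
#   collect_controller_logs ./controller-logs
```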
- Locate the csi driver pod that performs the actual volume mount/unmount operation
$ kubectl get po -o wide -n kube-system -l app=csi-azurelustre-node
NAME                         READY   STATUS    RESTARTS   AGE    IP            NODE
csi-azurelustre-node-9ds7f   3/3     Running   0          7m4s   10.240.0.35   k8s-agentpool-22533604-1
csi-azurelustre-node-dr4s4   3/3     Running   0          7m4s   10.240.0.4    k8s-agentpool-22533604-0
- Get csi driver logs
$ kubectl logs csi-azurelustre-node-9ds7f -c azurelustre -n kube-system > csi-azurelustre-node.log
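note: to watch logs in realtime from multiple `csi-azurelustre-node` DaemonSet pods simultaneously, select the pods by label (`kubectl logs daemonset/csi-azurelustre-node` follows only one of the pods), and raise `--max-log-requests` if there are more than 5 pods:
$ kubectl logs -n kube-system -l app=csi-azurelustre-node -c azurelustre --prefix -f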
- Check Lustre mounts inside driver
$ kubectl exec -it csi-azurelustre-node-9ds7f -n kube-system -c azurelustre -- mount | grep lustre
172.18.8.12@tcp:/lustrefs on /var/lib/kubelet/pods/6632349a-05fd-466f-bc8a-8946617089ce/volumes/kubernetes.io~csi/pvc-841498d9-fa63-418c-8cc7-d94ec27f2ee2/mount type lustre (rw,flock,lazystatfs,encrypt)
172.18.8.12@tcp:/lustrefs on /var/lib/kubelet/pods/6632349a-05fd-466f-bc8a-8946617089ce/volumes/kubernetes.io~csi/pvc-841498d9-fa63-418c-8cc7-d94ec27f2ee2/mount type lustre (rw,flock,lazystatfs,encrypt)
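A plain `grep lustre` also matches any line that merely contains the word, so a stricter filter on the filesystem type can be useful. A minimal sketch, assuming the usual `<source> on <target> type <fstype> (<options>)` layout of `mount` output and paths without spaces; the helper name `lustre_mounts` is hypothetical.

```shell
# Read `mount` output on stdin and print "<source> <target>" for every
# mount whose filesystem type is exactly "lustre".
lustre_mounts() {
  awk '$2 == "on" && $4 == "type" && $5 == "lustre" { print $1, $3 }'
}

# Usage (requires cluster access; -it is omitted because the output is piped):
#   kubectl exec csi-azurelustre-node-9ds7f -n kube-system -c azurelustre -- mount | lustre_mounts
```

An empty result on the node that hosts the workload pod suggests the volume was never mounted there, in which case the node driver logs are the next place to look.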
- Update controller deployment
$ kubectl edit deployment csi-azurelustre-controller -n kube-system
- Update node DaemonSet
$ kubectl edit ds csi-azurelustre-node -n kube-system
- Change the Lustre CSI driver image config
image: mcr.microsoft.com/k8s/csi/azurelustre-csi:v0.1.0
imagePullPolicy: Always
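As a non-interactive alternative to `kubectl edit`, the image can be changed with `kubectl set image`. This is a sketch, not the documented procedure: it assumes the driver container is named `azurelustre` in both workloads (the name used with `-c` elsewhere in this guide), and the function name `set_driver_image` is hypothetical. It does not touch `imagePullPolicy`, which still has to be edited separately if needed.

```shell
# Point the azurelustre container of the controller Deployment and the
# node DaemonSet at a new image, then wait for both rollouts to finish.
# $1 is the full image reference.
set_driver_image() {
  image=$1
  [ -n "$image" ] || { echo "usage: set_driver_image <image>" >&2; return 1; }
  kubectl set image deployment/csi-azurelustre-controller \
    "azurelustre=$image" -n kube-system &&
  kubectl set image daemonset/csi-azurelustre-node \
    "azurelustre=$image" -n kube-system &&
  kubectl rollout status deployment/csi-azurelustre-controller -n kube-system &&
  kubectl rollout status daemonset/csi-azurelustre-node -n kube-system
}

# Usage (requires cluster access):
#   set_driver_image mcr.microsoft.com/k8s/csi/azurelustre-csi:v0.1.0
```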
$ kubectl exec -it csi-azurelustre-node-9ds7f -n kube-system -c azurelustre -- /bin/bash -c "./azurelustreplugin -version"
Build Date: "2022-05-11T10:25:15Z"
Compiler: gc
Driver Name: azurelustre.csi.azure.com
Driver Version: v0.1.0
Git Commit: 43017c96b7cecaa09bc05ce9fad3fb9860a4c0ce
Go Version: go1.18.1
Platform: linux/amd64
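After an image change it can be worth confirming that the expected version is actually running. A minimal sketch, assuming the plugin prints one `Key: value` field per line as in the sample output; the helper name `driver_version` is hypothetical.

```shell
# Read the output of `azurelustreplugin -version` on stdin and print
# only the value of the "Driver Version" field.
driver_version() {
  awk -F': ' '$1 == "Driver Version" { print $2 }'
}

# Usage (requires cluster access):
#   kubectl exec csi-azurelustre-node-9ds7f -n kube-system -c azurelustre -- \
#     /bin/bash -c "./azurelustreplugin -version" | driver_version
```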
- Get the utility from /utils/azurelustre_log.sh, run it, and share the output file lustre.logs with us
$ chmod +x ./azurelustre_log.sh
$ ./azurelustre_log.sh > lustre.logs 2>&1