CSI Driver CrashLoop #840

Closed
pscloud-patrick opened this issue Jan 9, 2025 · 5 comments
Labels
bug Something isn't working


@pscloud-patrick

pscloud-patrick commented Jan 9, 2025

TL;DR

CSI Driver in K8s is crashlooping

Expected behavior

No crashloops

Observed behavior

The CSI driver is constantly crashlooping, and the sidecar containers can't connect to the CSI socket (see the log output below).

Minimal working example

I have the following values set:

  helm_values_csi_driver:
    replicaCount: 3
    controller:
      extraEnvVars:
        - name: HCLOUD_DEBUG
          value: "1"
        - name: LOG_LEVEL
          value: trace

Log output

Stream closed EOF for kube-system/hcsi-hcloud-csi-controller-7bf7d7bdc7-f5k88 (hcloud-csi-driver)
liveness-probe I0109 23:44:09.477775       1 connection.go:253] "Still connecting" address="unix:///run/csi/socket"
csi-attacher I0109 23:44:09.689376       1 connection.go:253] "Still connecting" address="unix:///run/csi/socket"
csi-resizer I0109 23:44:09.766386       1 connection.go:253] "Still connecting" address="unix:///run/csi/socket"
csi-provisioner I0109 23:44:09.837618       1 connection.go:253] "Still connecting" address="unix:///run/csi/socket"
liveness-probe I0109 23:44:19.477200       1 connection.go:253] "Still connecting" address="unix:///run/csi/socket"
csi-attacher I0109 23:44:19.689581       1 connection.go:253] "Still connecting" address="unix:///run/csi/socket"
csi-resizer I0109 23:44:19.766279       1 connection.go:253] "Still connecting" address="unix:///run/csi/socket"
csi-provisioner I0109 23:44:19.837818       1 connection.go:253] "Still connecting" address="unix:///run/csi/socket"
liveness-probe I0109 23:44:29.477530       1 connection.go:253] "Still connecting" address="unix:///run/csi/socket"
csi-attacher I0109 23:44:29.689548       1 connection.go:253] "Still connecting" address="unix:///run/csi/socket"
csi-attacher E0109 23:44:29.689810       1 main.go:149] "Failed to connect to the CSI driver" err="context deadline exceeded" csiAddress="/run/csi/socket"
csi-resizer I0109 23:44:29.765364       1 connection.go:253] "Still connecting" address="unix:///run/csi/socket"
csi-resizer E0109 23:44:29.765934       1 main.go:153] "Failed to create CSI client" err="failed to connect to CSI driver: context deadline exceeded"
csi-provisioner I0109 23:44:29.837542       1 connection.go:253] "Still connecting" address="unix:///run/csi/socket"
csi-provisioner E0109 23:44:29.837720       1 csi-provisioner.go:215] context deadline exceeded
Stream closed EOF for kube-system/hcsi-hcloud-csi-controller-7bf7d7bdc7-f5k88 (csi-attacher)
Stream closed EOF for kube-system/hcsi-hcloud-csi-controller-7bf7d7bdc7-f5k88 (csi-resizer)
Stream closed EOF for kube-system/hcsi-hcloud-csi-controller-7bf7d7bdc7-f5k88 (csi-provisioner)
liveness-probe I0109 23:44:39.477696       1 connection.go:253] "Still connecting" address="unix:///run/csi/socket"
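
Most of the lines above are informational "Still connecting" messages from the sidecars; the actual failures are the klog lines with severity letter E. As a rough sketch for picking those out of a long multi-container log, the filter below keeps only error-severity lines. The function name `klog_errors` is hypothetical and not part of any tool mentioned in this issue.

```shell
# Hypothetical helper: read log text on stdin and keep only klog
# error-severity lines, i.e. lines containing "E<MMDD> <hh:mm:ss.micros>".
klog_errors() {
  # "|| true" keeps the exit status at 0 when no error lines are found.
  grep -E '(^|[^[:alnum:]])E[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+' || true
}

# Possible usage (pod and container names taken from the log above):
#   kubectl logs -n kube-system hcsi-hcloud-csi-controller-7bf7d7bdc7-f5k88 \
#     --all-containers --prefix | klog_errors
#
# The sidecars only report that the socket never appeared. To see why the
# driver container itself exited, its previous instance can be inspected:
#   kubectl logs -n kube-system hcsi-hcloud-csi-controller-7bf7d7bdc7-f5k88 \
#     -c hcloud-csi-driver --previous
```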


Additional information

Kubernetes version 1.31.4, hccm version 1.21.0, CSI driver version 2.11.0, installed via Helm

pscloud-patrick added the bug label Jan 9, 2025
@lukasmetzner
Contributor

Hey,

could you check if the Hetzner API and the metadata service are available from inside the pod?

If you cannot access the pod because it is crashing, you can also spawn a test pod with the following command:

kubectl run temp-pod --image=curlimages/curl --restart=Never -it -- sh

Best Regards
Lukas

@pscloud-patrick
Author

pscloud-patrick commented Jan 10, 2025

Hm, there seem to be DNS issues:

~ $ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXX" https://api.hetzner.cloud/v1/servers
curl: (6) Could not resolve host: api.hetzner.cloud
~ $ curl https://ipv4.syseleven.de
curl: (6) Could not resolve host: ipv4.syseleven.de
~ $ curl http://169.254.169.254/hetzner/v1/metadata
availability-zone: fsn1-dc14
hostname: terraform
instance-id: 58530508
local-ipv4: ''
network-config:
  config:
  - mac_address: 96:00:03:fa:59:20
    name: eth0
    subnets:
    - ipv4: true
      type: dhcp
    type: physical
  version: 1
public-ipv4: 138.199.XXX.XXX
public-keys:
- 'ssh-ed25519 AAAAC3NzaXXXXXXXXX
  '
region: eu-central
vendor_data: "Content-Type: multipart/mixed; boundary=\"===============2513752762708167408==\"\
  \nMIME-Version: 1.0\n\n--===============2513752762708167408==\nContent-Type: text/cloud-config;\
  \ charset=\"us-ascii\"\nMIME-Version: 1.0\nContent-Transfer-Encoding: 7bit\nContent-Disposition:\
  \ attachment; filename=\"cloud-config\"\n\n#cloud-config\ndisable_root: false\n\
  fqdn: terraform.local.test.fsn1-dc14.worker02\nmanage_etc_hosts: true\nrandom_seed:\n\
  \  data: !!binary |\n    enVCY1AwRnVZelVqY3lONGE0c1FoaHpwU3pxYjhvb0YwdVlSMUF3RTdDT3UrUjJrTDRqZmFYeXZl\n\
  \    MnNSZ3FxNnRsNVpVK05mLzdmbFh2ajRPdUtRMFc3ZVJBU2liN2I2Tm1EblB1MVQ5RE0rV1JoQWpS\n\
  \ [SHORTENED]
  \    N0Z3K3lkTWlldmR4L1ZpUVJEN0ZvVXJ3TzVkd2ttSVJFSklmaUhHUjBJa2plL2pjM2Zhbz0=\n\
  \  encoding: base64\n  file: /dev/urandom\nruncmd:\n- udevadm trigger -c add -s\
  \ block -p ID_VENDOR=HC --verbose -p ID_MODEL=Volume\nsystem_info:\n  default_user:\n\
  \    lock_passwd: true\n    name: root\n    shell: /bin/bash\n\n--===============2513752762708167408==--\n"
~ $ curl https://google.de
curl: (6) Could not resolve host: google.de
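
In the output above, every request by hostname fails with curl exit code 6 (could not resolve host), while the request to the metadata IP address succeeds. That pattern points at DNS inside the pod rather than at general connectivity. A minimal sketch of how this distinction could be scripted is shown below; the function names `classify_curl_exit` and `probe` are hypothetical, and the 5 second timeout is an arbitrary choice.

```shell
# Hypothetical helper: map a curl exit code to the layer that most likely failed.
# Codes 6, 7 and 28 are curl's documented exit codes for these conditions.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "tcp: failed to connect to host" ;;
    28) echo "timeout: operation timed out" ;;
    *)  echo "other: curl exit code $1" ;;
  esac
}

# Hypothetical wrapper: probe a URL quietly and print the classification.
probe() {
  curl -sS -o /dev/null --max-time 5 "$1"
  classify_curl_exit "$?"
}

# Possible usage from inside the test pod:
#   probe https://api.hetzner.cloud/v1/servers
#   probe http://169.254.169.254/hetzner/v1/metadata
```

Note that `probe` reports "ok" whenever curl completes the request, even if the API answers with an HTTP error such as 401 for a missing token, so it only checks reachability.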

@lukasmetzner
Contributor

If you use kube-hetzner you might want to check out this comment: #742 (comment)

@pscloud-patrick
Author

pscloud-patrick commented Jan 10, 2025

Thanks for the hint, but I don't think that's the point. I'm not using kube-hetzner; I run self-managed k8s. When I look at the routes in the Hetzner Networks tab, I think routes for the service subnet 10.96.0.0 are missing?

[Screenshot 2025-01-10 at 16:19:48]

@pscloud-patrick
Author

Resolved the issue by changing the flannel backend from vxlan to host-gw in the kube-flannel-cfg ConfigMap.
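
The comment does not show the change itself. As a rough sketch, assuming the standard flannel manifest in which the backend type sits under `Backend.Type` in the `net-conf.json` key of the ConfigMap, the edit could be previewed like this. The helper name `set_flannel_backend` is hypothetical, and the namespace `kube-flannel` and DaemonSet name `kube-flannel-ds` are assumptions that depend on how flannel was installed (older manifests use `kube-system`).

```shell
# Hypothetical helper: read a flannel net-conf.json on stdin and print it
# with the value of "Type" replaced by the backend given as the first argument.
set_flannel_backend() {
  sed -E 's/("Type"[[:space:]]*:[[:space:]]*")[^"]*"/\1'"$1"'"/'
}

# Possible usage (namespace and DaemonSet name are assumptions):
#   # Preview the rewritten config without changing anything in the cluster:
#   kubectl -n kube-flannel get configmap kube-flannel-cfg \
#     -o jsonpath='{.data.net-conf\.json}' | set_flannel_backend host-gw
#
#   # Apply the same change by hand, then restart flannel so it rereads the config:
#   kubectl -n kube-flannel edit configmap kube-flannel-cfg
#   kubectl -n kube-flannel rollout restart daemonset kube-flannel-ds
```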
