Skip to content

Commit

Permalink
fix: reverted NodeGetInfo response as it breaks Nomad clusters (#776)
Browse files Browse the repository at this point in the history
We modified the response for `NodeGetInfo` to return an additional
Topology Segment. We assumed that this only “adds” new info, but in
practice it breaks the spec.

When trying to schedule a volume to nodes, the container orchestration
systems should verify that the Node fulfills at least one Accessible
Topology of the Node, where “fulfills” means that all supplied segments
match.

This is not implemented in the same way between Kubernetes and Nomad.

- **Kubernetes**: requirements are fulfilled if the volume specifies a
subset of the Nodes topology
- **Nomad**: requirements are fulfilled if the volume specifies all of
the Nodes topology

We made these changes to work around a bug in the Kubernetes scheduler
([here](kubernetes-csi/external-provisioner#544))
where nodes without the CSI Plugin would still be considered for
scheduling, but then creating and attaching the volume fails with no
automatic reconciliation of this error.
  • Loading branch information
lukasmetzner authored Nov 12, 2024
1 parent c84ca4a commit 417d201
Show file tree
Hide file tree
Showing 5 changed files with 2 additions and 23 deletions.
7 changes: 0 additions & 7 deletions chart/templates/core/storageclass.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -10,13 +10,6 @@ provisioner: csi.hetzner.cloud
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: {{ $val.reclaimPolicy | quote }}
{{- if $val.allowedTopologyCloudServer }}
allowedTopologies:
- matchLabelExpressions:
- key: instance.hetzner.cloud/provided-by
values:
- "cloud"
{{- end }}
---
{{- end }}
{{- end }}
2 changes: 0 additions & 2 deletions chart/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -728,5 +728,3 @@ storageClasses:
- name: hcloud-volumes
defaultStorageClass: true
reclaimPolicy: Delete
## @param storageClass.allowedTopologyCloudServer Prevents pods from being scheduled on nodes, specifically Robot servers, where Hetzner volumes are unavailable
allowedTopologyCloudServer: false
14 changes: 2 additions & 12 deletions docs/kubernetes/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -209,19 +209,9 @@ $ kubectl apply -f https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.

## Integration with Root Servers

Root servers can be part of the cluster, but the CSI plugin doesn't work there and the current behaviour of the scheduler can cause Pods to be stuck in `Pending`.
Root servers can be part of the cluster, but the CSI plugin doesn't work there.

In the Helm Chart you can set `allowedTopologyCloudServer` to true to prevent pods from being scheduled on nodes, specifically Robot servers, where Hetzner volumes are unavailable. This value can not be changed after the initial creation of a storage class.

```yaml
storageClasses:
- name: hcloud-volumes
defaultStorageClass: true
reclaimPolicy: Delete
allowedTopologyCloudServer: true # <---
```

To ensure proper topology evaluation, labels are needed to indicate whether a node is a cloud VM or a dedicated server from Robot. If you are using the `hcloud-cloud-controller-manager` version 1.21.0 or later, these labels are added automatically. Otherwise, you will need to label the nodes manually.
Labels are needed to indicate whether a node is a cloud VM or a dedicated server from Robot. If you are using the `hcloud-cloud-controller-manager` version 1.21.0 or later, these labels are added automatically. Otherwise, you will need to label the nodes manually.

### Adding labels manually

Expand Down
1 change: 0 additions & 1 deletion internal/driver/driver.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,5 +9,4 @@ const (
DefaultVolumeSize = MinVolumeSize

TopologySegmentLocation = PluginName + "/location"
ProvidedByLabel = "instance.hetzner.cloud/provided-by"
)
1 change: 0 additions & 1 deletion internal/driver/node.go
Original file line number Diff line number Diff line change
Expand Up @@ -178,7 +178,6 @@ func (s *NodeService) NodeGetInfo(_ context.Context, _ *proto.NodeGetInfoRequest
AccessibleTopology: &proto.Topology{
Segments: map[string]string{
TopologySegmentLocation: s.serverLocation,
ProvidedByLabel: "cloud",
},
},
}
Expand Down

0 comments on commit 417d201

Please sign in to comment.