Kubelet/etcd uses wrong IPv6 Address #9725

Open
trevex opened this issue Nov 14, 2024 · 15 comments · May be fixed by #9749 or #9847

@trevex

trevex commented Nov 14, 2024

Bug Report

Description

When Talos runs in an IPv6 single-stack environment and is assigned multiple IPs via DHCPv6 and RA (this most likely applies to dual-stack as well), the kubelet uses the wrong address.

In our case Talos is running in KubeVirt with the Passt network binding plugin and gets an IP via RA, followed by a /128 IP from DHCPv6. Only the latter has full bi-directional connectivity.

The preferred /128 address has the flag permanent while the RA address has the flag mngmtmpaddr.

The permanent address should be preferred.

Logs

Relevant excerpts from omnictl support:

AddressStatuses

# cat fd01:cafe::5054:ff:fe1f:c7bd/resources/addressstatuses.net.talos.dev.yaml
metadata:
    namespace: network
    type: AddressStatuses.net.talos.dev
    id: eth0/fd01:cafe::5054:ff:fe1f:c7bd/64
    version: 1
    owner: network.AddressStatusController
    phase: running
    created: 2024-11-14T15:52:53Z
    updated: 2024-11-14T15:52:53Z
spec:
    address: fd01:cafe::5054:ff:fe1f:c7bd/64
    linkIndex: 8
    linkName: eth0
    family: inet6
    scope: global
    flags: mngmtmpaddr
---
metadata:
    namespace: network
    type: AddressStatuses.net.talos.dev
    id: eth0/fd01:cafe::f14c:9fa1:8496:557f/128
    version: 1
    owner: network.AddressStatusController
    phase: running
    created: 2024-11-14T15:52:55Z
    updated: 2024-11-14T15:52:55Z
spec:
    address: fd01:cafe::f14c:9fa1:8496:557f/128
    linkIndex: 8
    linkName: eth0
    family: inet6
    scope: global
    flags: permanent
---
metadata:
    namespace: network
    type: AddressStatuses.net.talos.dev
    id: eth0/fe80::5054:ff:fe1f:c7bd/64
    version: 2
    owner: network.AddressStatusController
    phase: running
    created: 2024-11-14T15:52:50Z
    updated: 2024-11-14T15:52:52Z
spec:
    address: fe80::5054:ff:fe1f:c7bd/64
    linkIndex: 8
    linkName: eth0
    family: inet6
    scope: link
    flags: permanent
---
metadata:
    namespace: network
    type: AddressStatuses.net.talos.dev
    id: lo/127.0.0.1/8
    version: 1
    owner: network.AddressStatusController
    phase: running
    created: 2024-11-14T15:52:48Z
    updated: 2024-11-14T15:52:48Z
spec:
    address: 127.0.0.1/8
    linkIndex: 1
    linkName: lo
    family: inet4
    scope: host
    flags: permanent
---
metadata:
    namespace: network
    type: AddressStatuses.net.talos.dev
    id: lo/169.254.116.108/32
    version: 1
    owner: network.AddressStatusController
    phase: running
    created: 2024-11-14T15:52:55Z
    updated: 2024-11-14T15:52:55Z
spec:
    address: 169.254.116.108/32
    linkIndex: 1
    linkName: lo
    family: inet4
    scope: host
    flags: permanent
---
metadata:
    namespace: network
    type: AddressStatuses.net.talos.dev
    id: lo/::1/128
    version: 1
    owner: network.AddressStatusController
    phase: running
    created: 2024-11-14T15:52:49Z
    updated: 2024-11-14T15:52:49Z
spec:
    address: ::1/128
    linkIndex: 1
    linkName: lo
    family: inet6
    scope: host
    flags: permanent
---
metadata:
    namespace: network
    type: AddressStatuses.net.talos.dev
    id: siderolink/fdae:41e4:649b:9303:2972:b262:4fad:b458/64
    version: 1
    owner: network.AddressStatusController
    phase: running
    created: 2024-11-14T15:52:53Z
    updated: 2024-11-14T15:52:53Z
spec:
    address: fdae:41e4:649b:9303:2972:b262:4fad:b458/64
    linkIndex: 9
    linkName: siderolink
    family: inet6
    scope: global
    flags: permanent

NodeAddresses:

# cat fd01:cafe::5054:ff:fe1f:c7bd/resources/nodeaddresses.net.talos.dev.yaml
metadata:
    namespace: network
    type: NodeAddresses.net.talos.dev
    id: accumulative
    version: 4
    owner: network.NodeAddressController
    phase: running
    created: 2024-11-14T15:52:47Z
    updated: 2024-11-14T15:52:55Z
spec:
    addresses:
        - fd01:cafe::5054:ff:fe1f:c7bd/64
        - fd01:cafe::f14c:9fa1:8496:557f/128
        - fdae:41e4:649b:9303:2972:b262:4fad:b458/64
---
metadata:
    namespace: network
    type: NodeAddresses.net.talos.dev
    id: accumulative-no-k8s
    version: 2
    owner: network.NodeAddressController
    phase: running
    created: 2024-11-14T15:52:55Z
    updated: 2024-11-14T15:52:55Z
spec:
    addresses:
        - fd01:cafe::5054:ff:fe1f:c7bd/64
        - fd01:cafe::f14c:9fa1:8496:557f/128
        - fdae:41e4:649b:9303:2972:b262:4fad:b458/64
---
metadata:
    namespace: network
    type: NodeAddresses.net.talos.dev
    id: accumulative-only-k8s
    version: 1
    owner: network.NodeAddressController
    phase: running
    created: 2024-11-14T15:52:55Z
    updated: 2024-11-14T15:52:55Z
spec:
    addresses: []
---
metadata:
    namespace: network
    type: NodeAddresses.net.talos.dev
    id: current
    version: 4
    owner: network.NodeAddressController
    phase: running
    created: 2024-11-14T15:52:47Z
    updated: 2024-11-14T15:52:55Z
spec:
    addresses:
        - fd01:cafe::5054:ff:fe1f:c7bd/64
        - fd01:cafe::f14c:9fa1:8496:557f/128
        - fdae:41e4:649b:9303:2972:b262:4fad:b458/64
---
metadata:
    namespace: network
    type: NodeAddresses.net.talos.dev
    id: current-no-k8s
    version: 2
    owner: network.NodeAddressController
    phase: running
    created: 2024-11-14T15:52:55Z
    updated: 2024-11-14T15:52:55Z
spec:
    addresses:
        - fd01:cafe::5054:ff:fe1f:c7bd/64
        - fd01:cafe::f14c:9fa1:8496:557f/128
        - fdae:41e4:649b:9303:2972:b262:4fad:b458/64
---
metadata:
    namespace: network
    type: NodeAddresses.net.talos.dev
    id: current-only-k8s
    version: 1
    owner: network.NodeAddressController
    phase: running
    created: 2024-11-14T15:52:55Z
    updated: 2024-11-14T15:52:55Z
spec:
    addresses: []
---
metadata:
    namespace: network
    type: NodeAddresses.net.talos.dev
    id: default
    version: 1
    owner: network.NodeAddressController
    phase: running
    created: 2024-11-14T15:52:53Z
    updated: 2024-11-14T15:52:53Z
spec:
    addresses:
        - fd01:cafe::5054:ff:fe1f:c7bd/64
---
metadata:
    namespace: network
    type: NodeAddresses.net.talos.dev
    id: routed
    version: 3
    owner: network.NodeAddressController
    phase: running
    created: 2024-11-14T15:52:47Z
    updated: 2024-11-14T15:52:55Z
spec:
    addresses:
        - fd01:cafe::5054:ff:fe1f:c7bd/64
        - fd01:cafe::f14c:9fa1:8496:557f/128
---
metadata:
    namespace: network
    type: NodeAddresses.net.talos.dev
    id: routed-no-k8s
    version: 2
    owner: network.NodeAddressController
    phase: running
    created: 2024-11-14T15:52:55Z
    updated: 2024-11-14T15:52:55Z
spec:
    addresses:
        - fd01:cafe::5054:ff:fe1f:c7bd/64
        - fd01:cafe::f14c:9fa1:8496:557f/128
---
metadata:
    namespace: network
    type: NodeAddresses.net.talos.dev
    id: routed-only-k8s
    version: 1
    owner: network.NodeAddressController
    phase: running
    created: 2024-11-14T15:52:55Z
    updated: 2024-11-14T15:52:55Z
spec:
    addresses: []

NodeIPs:

# cat fd01:cafe::5054:ff:fe1f:c7bd/resources/nodeips.kubernetes.talos.dev.yaml
metadata:
    namespace: k8s
    type: NodeIPs.kubernetes.talos.dev
    id: kubelet
    version: 1
    owner: k8s.NodeIPController
    phase: running
    created: 2024-11-14T15:52:55Z
    updated: 2024-11-14T15:52:55Z
spec:
    addresses:
        - fd01:cafe::5054:ff:fe1f:c7bd

Environment

  • Talos version: v1.8.2
  • Kubernetes version: v1.30.1
  • Platform: Omni+KubeVirt
@trevex
Author

trevex commented Nov 14, 2024

A potential solution could be to sort the IPs by preferred flags here: https://github.com/siderolabs/talos/blob/e26d0043e022eccf5ea9c9d9b4a57e4bff1f80cc/internal/app/machined/pkg/controllers/network/node_address.go#L154C1-L155C1

However, this would mean that addresses in NodeAddress objects are sorted by preference rather than alphabetically.

If this is a valid solution I could draft up a PR.
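The sorting idea above can be sketched roughly as follows. This is only an illustration in Python, not the Talos controller code: the `FLAG_RANK` table and both key functions are hypothetical, and the numeric ordering stands in for the existing ordering. The addresses and flags are copied from the AddressStatuses dump.

```python
import ipaddress

# (address, flags) pairs copied from the AddressStatuses dump in this issue.
ADDRESSES = [
    ("fd01:cafe::5054:ff:fe1f:c7bd/64", "mngmtmpaddr"),
    ("fd01:cafe::f14c:9fa1:8496:557f/128", "permanent"),
]

# Hypothetical ranking (lower is better); not taken from the Talos source.
FLAG_RANK = {"permanent": 0, "mngmtmpaddr": 1}


def sort_key_plain(entry):
    """Order by numeric address only (a stand-in for the existing ordering)."""
    addr, _flags = entry
    return int(ipaddress.ip_interface(addr).ip)


def sort_key_preferred(entry):
    """Order by flag preference first, then by numeric address."""
    addr, flags = entry
    rank = FLAG_RANK.get(flags, len(FLAG_RANK))  # unknown flags sort last
    return (rank, int(ipaddress.ip_interface(addr).ip))


plain_first = sorted(ADDRESSES, key=sort_key_plain)[0][0]
preferred_first = sorted(ADDRESSES, key=sort_key_preferred)[0][0]

print(plain_first)      # fd01:cafe::5054:ff:fe1f:c7bd/64
print(preferred_first)  # fd01:cafe::f14c:9fa1:8496:557f/128
```

With the plain ordering the RA-derived address comes first, which matches the NodeIPs output above; with the flag-aware key the permanent /128 address wins.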

@smira
Member

smira commented Nov 14, 2024

I agree it might be better for IPv6, but you can also use https://www.talos.dev/v1.8/introduction/prodnotes/#multihoming

@trevex
Author

trevex commented Nov 14, 2024

I am not sure how this helps here. Both addresses are from the same subnet.

KubeVirt's Passt network binding (which is currently the only fully functional IPv6 option supporting the primary pod network) announces the Pod Subnet (of the hosting cluster) as Prefix via RA and Talos will derive a SLAAC/Temp and follow it up with DHCPv6.

This means the SLAAC and DHCPv6 assigned IP are in the same subnet. I don't see a reasonable subnet filter to specify.
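To make the point concrete, here is a small check using Python's standard `ipaddress` module and the two addresses from the dump above. It is only an illustration; the variable names are made up for this sketch.

```python
import ipaddress

# Both global addresses reported on eth0 in the AddressStatuses dump.
slaac = ipaddress.ip_interface("fd01:cafe::5054:ff:fe1f:c7bd/64")
dhcpv6 = ipaddress.ip_interface("fd01:cafe::f14c:9fa1:8496:557f/128")

# The RA-announced prefix, as implied by the /64 on the SLAAC address.
prefix = slaac.network

# Both addresses fall inside that prefix, so a filter on the /64
# cannot tell them apart.
both_in_prefix = slaac.ip in prefix and dhcpv6.ip in prefix

print(prefix)          # fd01:cafe::/64
print(both_in_prefix)  # True
```

Only a /128 entry would separate the two, and that requires knowing the address in advance.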

The SLAAC address itself is not reachable by the underlying pod network of the KubeVirt hosting cluster. Using it for etcd or kubelet will break connectivity. This is stopping Talos from scaling beyond a single node in an IPv6 KubeVirt environment, as an unreachable IP will be advertised.

Is sorting the IPs alphanumerically and by preference based on flags a suitable solution (on top of the existing filtering)? If so, the changes required should be minimal and I might be able to draft up a PR.

@trevex
Author

trevex commented Nov 14, 2024

It might be worth mentioning that the kubelet will choose the correct IP if no node IP is specified. This is the case with a kubeadm setup based on KubeVirt. From my understanding the Kubelet is using https://github.com/kubernetes/apimachinery/blob/v0.31.2/pkg/util/net/interface.go#L468 under the hood to choose the address.

@smira
Member

smira commented Nov 15, 2024

I understand the issue, but I'd like to make sure we have a proper solution ground up for IPv6, so I don't want to rush into fixing this until we have a proper testbed for IPv6 we can use to ensure proper operations going forward.

I know it doesn't sound too much fun, but the proper IP can be selected with /128 match if the IP is known beforehand.

@trevex
Author

trevex commented Nov 15, 2024

I know it doesn't sound too much fun, but the proper IP can be selected with /128 match if the IP is known beforehand.

Unfortunately the VM's IP is a Pod IP, so for KubeVirt IPv6 (omni-infra-provider-kubevirt) use-cases this is not an option and is blocking adoption, but I understand the desire to find the best solution.

@smira
Member

smira commented Nov 15, 2024

I think it does make sense to prefer IPv6 addresses based on flags (not sure if we can omit mngmtmpaddr completely from NodeAddresses ?)

@sbrivio-rh

KubeVirt's Passt network binding (which is currently the only fully functional IPv6 option supporting the primary pod network) announces the Pod Subnet (of the hosting cluster) as Prefix via RA and Talos will derive a SLAAC/Temp and follow it up with DHCPv6.

By the way, passt does this because you can't "turn off SLAAC" while sending router advertisements (the M flag is set, but it doesn't tell a node to skip SLAAC). You can disable router advertisements with passt's --no-ra option, but then you'd be missing the route.

But passt also does this because it works with Linux, as addresses with the longest prefixes are preferred as source addresses, see __ipv6_dev_get_saddr() and ipv6_get_saddr_eval() (rule #8) in net/ipv6/addrconf.c for details.

Now, without making this as generic as the Linux kernel, I guess it would be anyway reasonable to pick the longest matching prefix as preferred address.
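A rough sketch of that simpler rule, in Python. It compares only the configured prefix lengths of the candidate addresses and is not the kernel's source address selection. The helper name `pick_longest_prefix` and the tie-break on the lowest address are assumptions made for this illustration.

```python
import ipaddress

# Candidate global addresses on eth0, taken from the dump in this issue.
candidates = [
    "fd01:cafe::5054:ff:fe1f:c7bd/64",
    "fd01:cafe::f14c:9fa1:8496:557f/128",
]


def pick_longest_prefix(addresses):
    """Return the interface address with the longest configured prefix.

    Ties are broken by the numerically lowest address so the result
    stays deterministic (an arbitrary choice for this sketch).
    """
    interfaces = [ipaddress.ip_interface(a) for a in addresses]
    return max(interfaces, key=lambda i: (i.network.prefixlen, -int(i.ip)))


chosen = pick_longest_prefix(candidates)
print(chosen)  # fd01:cafe::f14c:9fa1:8496:557f/128
```

For the addresses in this report, the /128 from DHCPv6 is picked over the /64 derived from the RA.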

@trevex
Author

trevex commented Nov 15, 2024

I think it does make sense to prefer IPv6 addresses based on flags (not sure if we can omit mngmtmpaddr completely from NodeAddresses ?)

Funny enough in our bare-metal Talos setup we do not use DHCPv6 so the SLAAC address is used. A preference based on longest matching prefix sounds like a reasonable approach.

@sbrivio-rh

Funny enough in our bare-metal Talos setup we do not use DHCPv6 so the SLAAC address is used.

The main reason why passt implements a (minimalistic) DHCPv6 server is that, I've been told, having the same exact address inside and outside the guest is convenient for integration with some container-oriented service meshes that assume "host networking" (hence, addressing).

@trevex
Author

trevex commented Nov 15, 2024

Funny enough in our bare-metal Talos setup we do not use DHCPv6 so the SLAAC address is used.

The main reason why passt implements a (minimalistic) DHCPv6 server is that, I've been told, having the same exact address inside and outside the guest is convenient for integration with some container-oriented service meshes that assume "host networking" (hence, addressing).

Yes, and it is also a necessity to run Kubernetes Clusters in KubeVirt either through CAPI or Omni/Talos.

@smira Does Talos have a "feature gate" functionality allowing us to hide the changed behaviour behind a feature gate?

trevex changed the title from "Kubelet uses wrong IPv6 Address" to "Kubelet/etcd uses wrong IPv6 Address" on Nov 18, 2024
@smira
Member

smira commented Nov 18, 2024

Yes, and it is also a necessity to run Kubernetes Clusters in KubeVirt either through CAPI or Omni/Talos.

@smira Does Talos have a "feature gate" functionality allowing us to hide the changed behaviour behind a feature gate?

Yes, we do have feature gates. If you could open a proposed PR, we can make a feature gate and even enable it by default for new clusters on 1.9.

@trevex
Author

trevex commented Nov 18, 2024

Over the weekend I figured there might be a (dirty) workaround for the KubeVirt use-case (it will not help for bare-metal IPv6 use-cases involving DHCPv6):
Spoofing the MAC address of VMs allows us to predict the SLAAC IP, so we can blacklist it.
Unfortunately, blacklisting does not seem to be supported anymore. The documentation mentions the use of !, but this is not handled in code (anymore).

This will leave the node in a non-functional state:

 # cat fdae:41e4:649b:9303:9cd5:e54b:8120:4adb/resources/nodeipconfigs.kubernetes.talos.dev.yaml
metadata:
    namespace: k8s
    type: NodeIPConfigs.kubernetes.talos.dev
    id: kubelet
    version: 1
    owner: k8s.NodeIPConfigController
    phase: running
    created: 2024-11-18T11:29:36Z
    updated: 2024-11-18T11:29:36Z
spec:
    validSubnets:
        - '!fd01:cafe::dcad:ff:fe00:beaf/128'
    excludeSubnets:
        - fd90:cafe::/64
        - fd95:cafe::/108

Either the documentation is outdated or this warrants a separate bug report.

I'll start working on a PR to establish a preference for IPv6 IPs ASAP.
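For reference, the MAC-based prediction mentioned in this comment can be sketched as below. It assumes the guest forms its SLAAC address with the modified EUI-64 scheme, which is consistent with the `ff:fe` pattern in the addresses shown in this thread. The helper name `eui64_address` is hypothetical, and the MAC addresses are not stated anywhere in the issue; they were inferred by reversing the addresses.

```python
import ipaddress


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the modified EUI-64 interface ID of a MAC."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("modified EUI-64 needs a /64 prefix")

    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")

    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])

    return ipaddress.IPv6Address(
        int(network.network_address) | int.from_bytes(interface_id, "big")
    )


# MAC inferred from the address in the validSubnets entry above.
print(eui64_address("fd01:cafe::/64", "de:ad:00:00:be:af"))
# fd01:cafe::dcad:ff:fe00:beaf

# MAC inferred from the RA-derived address in the original report.
print(eui64_address("fd01:cafe::/64", "52:54:00:1f:c7:bd"))
# fd01:cafe::5054:ff:fe1f:c7bd
```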

@smira
Member

smira commented Nov 29, 2024

This will leave the node in a non-functional state:

This is because there are no positive matches, you need to include ::/0 before the ! statement.
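The behaviour described here can be modelled with a short Python sketch. It is a simplified model based only on this explanation, not the Talos implementation: the function `select_node_ips` is hypothetical, it ignores the order of entries, and the negated entry is adapted to the addresses from the original report.

```python
import ipaddress


def select_node_ips(addresses, valid_subnets):
    """Keep addresses that match a positive entry and no '!' entry."""
    include = [ipaddress.ip_network(s) for s in valid_subnets
               if not s.startswith("!")]
    exclude = [ipaddress.ip_network(s[1:]) for s in valid_subnets
               if s.startswith("!")]

    selected = []
    for text in addresses:
        ip = ipaddress.ip_address(text)
        matches_positive = any(ip in net for net in include)
        matches_negative = any(ip in net for net in exclude)
        if matches_positive and not matches_negative:
            selected.append(text)
    return selected


addresses = [
    "fd01:cafe::5054:ff:fe1f:c7bd",    # RA-derived address
    "fd01:cafe::f14c:9fa1:8496:557f",  # DHCPv6 address
]

# Only a negated entry: nothing matches positively, so nothing is selected.
only_negation = select_node_ips(
    addresses, ["!fd01:cafe::5054:ff:fe1f:c7bd/128"])
print(only_negation)  # []

# A catch-all entry plus the negation leaves the DHCPv6 address.
with_catch_all = select_node_ips(
    addresses, ["::/0", "!fd01:cafe::5054:ff:fe1f:c7bd/128"])
print(with_catch_all)  # ['fd01:cafe::f14c:9fa1:8496:557f']
```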

smira added a commit to smira/talos that referenced this issue Nov 29, 2024
smira linked a pull request on Nov 29, 2024 that will close this issue
@trevex
Author

trevex commented Nov 30, 2024

Thanks @smira, including a range before adding an ignore statement fixes it.
