Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: prefer longer prefixes in IPv6 only environments as default Nod… #9749

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

trevex
Copy link

@trevex trevex commented Nov 18, 2024

Pull Request

What? (description)

Prefers longer prefixes in IPv6 only environments as default address, but keeps current behaviour everywhere else.

Why? (reasoning)

Talos IP preference strategy does not well in IPv6 Single-Stack environments as RA SLAAC addresses will be preferred over DHCPv6. In environments such as KubeVirt with Passt the SLAAC address is not routable by other Pods. If a DHCPv6 address is available it should be preferred.

The current filtering methods do not work as SLAAC and DHCPv6 address can be from the same prefix or in the case of KubeVirt a non-predictable pod IP.

This change only changes the preference in IPv6 only environments by adapting the default NodeAddress accordingly.
In the future it might be desirable to do a larger rewrite or even introduce CEL or expr to let the user define filtering and expression functions. As this is my first PR to Talos I kept the changes as minimal as possible.

Fixes #9725.

Acceptance

Please use the following checklist:

  • you linked an issue (if applicable)
  • you included tests (if applicable)
  • you ran conformance (make conformance)
  • you formatted your code (make fmt)
  • you linted your code (make lint)
  • you generated documentation (make docs)
  • you ran unit-tests (make unit-tests)

See make help for a description of the available targets.

Unfortunately make fmt changed lots of *.proto files, so I did not include the changes.
make unit-tests did not run (might be a NixOS issue), but running relevant tests with go test passed.
make docs did not include relevant updates to the documentation.

…eAddress

Fixes behavior with DHCPv6 available and prefers the assigned address.

Signed-off-by: Niklas Voss <[email protected]>
@trevex
Copy link
Author

trevex commented Nov 19, 2024

This is my first PR for Talos and I am not overly familiar with the codebase, but my assumption is changing the default NodeAddress should fix #9725.

@@ -127,9 +127,17 @@ func (ctrl *NodeAddressController) Run(ctx context.Context, r controller.Runtime
continue
}

// in IPv6 only environments we want to prefer longer prefixes, in other environments we keep old behavior
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the proper fix is to re-sort the addresses list in the first place. Today the order comes from the alphabetical order of address resource IDs which look like <link>/address/bits so that the IPv6 addresses for the same link are sorted with longest prefix first?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way it will not only affect defaultAddress, but all other list of addresses which are used to pick up etcd, and other listening addresses

Copy link
Author

@trevex trevex Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not overly familiar with the Talos codebase, is the defaultAddress not used by etcd as well? Quick glance through the codebase this is also true for kubelet.

Sorting the addresses was my initial thought, but I shied away from that approach as I was uncertain how to handle Dual-Stack environments properly. As a first step I could however change the sorting for IPv6 Single-Stack environments or sort only IPv6 addresses and keep the preference of IPv4 addresses due to sorting by bits.
Would this be the preferred approach?

Another third option, but larger change might be to introduce CEL or expr to allow users to customize filtering and IP selection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me think about this more, but my gut feeling is that sorting by string value (as it is today) is not quite good

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have another look this evening and/or tomorrow as I also believe sorting might be the better approach. Grepping the codebase once more I am not certain I even understand what the purpose of defaultAddress is as it is only used in the hostname and etcfile controller (today, might be a relic of the past).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about https://datatracker.ietf.org/doc/html/rfc6724 and the algorithm here, as RFC doesn't directly map (as we try to sort source addresses without ever knowning destination address), so given two addresses, A and B, if one is a prefix of another, then we prefer the longer prefix, but if they don't have a common prefix, I'm not sure if this matters (?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so what I'm thinking about - introduce a feature gate (and enable it by default for 1.9) to introduce new sorting algorithm for addresses in general, implement the required changes; simply changing the algorithm might have consequences of Talos choosing different addresses by default which doesn't sound like a good approach.

…y prefix length if one contains the other

A new feature flag, enabled by default in 1.9, controls sorting
behavior. If a prefix contains the other the longest is preferred.

Signed-off-by: Niklas Voss <[email protected]>
@trevex
Copy link
Author

trevex commented Nov 26, 2024

Finally found some time to work on resorting IPs. I am planning to improve the test coverage, but before doing that, it would be great to some feedback, whether I am on the right track.

(Also feel free to suggest alternative names for the feature.)

@smira smira self-assigned this Nov 28, 2024
@smira
Copy link
Member

smira commented Nov 29, 2024

Finally found some time to work on resorting IPs. I am planning to improve the test coverage, but before doing that, it would be great to some feedback, whether I am on the right track.

I think you're on the right track, I just feel we need to change sorting algorithm a bit more dramatically, and disable it by default.

I will try to get to it today and get it before we cut 1.9 beta.

smira added a commit to smira/talos that referenced this pull request Nov 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kubelet/etcd uses wrong IPv6 Address
2 participants