[Bug] subnet router failover not working properly when the subnet router node is not there at client authentication time ? #2228

Open
3 of 4 tasks
codingtony-candid opened this issue Nov 6, 2024 · 0 comments
Labels
bug Something isn't working no-stale-bot

Is this a support request?

  • This is not a support request

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

If a router node is not online when the Tailscale client logs in, the client does not route traffic to that router node on failover. In that situation, when the node that is advertising the routes goes offline, its advertised routes are still flagged as primary.

Expected Behavior

If the primary subnet router node disappears, failover should kick in immediately, even if the failover node came online after the client authenticated with Headscale.
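The expected behavior can be sketched as a small model. This is only an illustration of the rule described above, not Headscale's actual implementation; the `Route` class, the `elect_primaries` function, and the node names `router-a` and `router-b` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Route:
    node: str
    prefix: str
    enabled: bool
    primary: bool = False


def elect_primaries(routes, online_nodes):
    """Mark one enabled route on an online node as primary for each prefix.

    The current primary is kept while its node stays online; otherwise the
    first remaining usable route takes over.
    """
    by_prefix = {}
    for route in routes:
        by_prefix.setdefault(route.prefix, []).append(route)

    for candidates in by_prefix.values():
        usable = [r for r in candidates if r.enabled and r.node in online_nodes]
        current = next((r for r in usable if r.primary), None)
        chosen = current or (usable[0] if usable else None)
        for r in candidates:
            r.primary = r is chosen
    return routes


routes = [
    Route("router-a", "192.168.168.1/32", enabled=True, primary=True),
    Route("router-b", "192.168.168.1/32", enabled=True),
]
elect_primaries(routes, online_nodes={"router-b"})  # router-a went offline
print([(r.node, r.primary) for r in routes])
# prints [('router-a', False), ('router-b', True)]
```

In this model, the order in which nodes came online relative to the client plays no role, which is the behavior the report expects from the client as well.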

Steps To Reproduce

Topology

We have one Headscale server and two router nodes, both advertising the same routes.
Our router nodes are subnet routers; we do not route internet traffic through them.
We have a Tailscale client running on macOS that needs to access an IP routed by the router nodes.

Case 1: router node comes online after the Tailscale client

On the Tailscale client (macOS, 1.76.1 standalone variant), a ping is running to a routed IP: 192.168.168.1

Nodes:

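ID | Hostname                        | Name                            | MachineKey | NodeKey | User           | IP addresses                  | Ephemeral | Last seen           | Expiration          | Connected | Expired
39 | userM1Pro                 | userm1pro                 | [IytP7]    | [uiEpB] | user | 100.64.0.7, fd7a:115c:a1e0::7 | false     | 2024-11-06 16:20:19 | 2024-11-12 18:56:13 | online    | no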
63 | router-node-i-0380d27217348885b | router-node-i-0380d27217348885b | [CnYak]    | [T6Q47] | router-node    | 100.64.0.4, fd7a:115c:a1e0::4 | false     | 2024-11-06 15:29:04 | 0001-01-01 00:00:00 | online    | no

Routes:

ID  | Node                            | Prefix           | Advertised | Enabled | Primary
447 | router-node-i-0380d27217348885b | 0.0.0.0/0        | true       | false   | -
448 | router-node-i-0380d27217348885b | ::/0             | true       | false   | -
446 | router-node-i-0380d27217348885b | 192.168.168.1/32 | true       | true    | true

We start a second router node (router-node-i-01cd8b42c5599852d)

Nodes

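ID | Hostname                        | Name                            | MachineKey | NodeKey | User           | IP addresses                  | Ephemeral | Last seen           | Expiration          | Connected | Expired
39 | userM1Pro                 | userm1pro                 | [IytP7]    | [uiEpB] | user | 100.64.0.7, fd7a:115c:a1e0::7 | false     | 2024-11-06 16:20:19 | 2024-11-12 18:56:13 | online    | no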
63 | router-node-i-0380d27217348885b | router-node-i-0380d27217348885b | [CnYak]    | [T6Q47] | router-node    | 100.64.0.4, fd7a:115c:a1e0::4 | false     | 2024-11-06 15:29:04 | 0001-01-01 00:00:00 | online    | no
65 | router-node-i-01cd8b42c5599852d | router-node-i-01cd8b42c5599852d | [F3wqN]    | [Nu1nL] | router-node    | 100.64.0.8, fd7a:115c:a1e0::8 | false     | 2024-11-06 16:22:34 | 0001-01-01 00:00:00 | online    | no

Routes

ID  | Node                            | Prefix           | Advertised | Enabled | Primary
452 | router-node-i-01cd8b42c5599852d | ::/0             | true       | false   | -
454 | router-node-i-01cd8b42c5599852d | 0.0.0.0/0        | true       | false   | -
453 | router-node-i-01cd8b42c5599852d | 192.168.168.1/32 | true       | true    | false
447 | router-node-i-0380d27217348885b | 0.0.0.0/0        | true       | false   | -
448 | router-node-i-0380d27217348885b | ::/0             | true       | false   | -
446 | router-node-i-0380d27217348885b | 192.168.168.1/32 | true       | true    | true

From the macOS Tailscale client, we see both router nodes:
[image]

We stop router-node-i-0380d27217348885b (which has the primary route).
On the macOS Tailscale client, the ping stops working.

Nodes

ID | Hostname                        | Name                            | MachineKey | NodeKey | User           | IP addresses                  | Ephemeral | Last seen           | Expiration          | Connected | Expired
39 | userM1Pro                 | userm1pro                 | [IytP7]    | [uiEpB] | user | 100.64.0.7, fd7a:115c:a1e0::7 | false     | 2024-11-06 16:20:19 | 2024-11-12 18:56:13 | online    | no
63 | router-node-i-0380d27217348885b | router-node-i-0380d27217348885b | [CnYak]    | [T6Q47] | router-node    | 100.64.0.4, fd7a:115c:a1e0::4 | false     | 2024-11-06 16:24:49 | 0001-01-01 00:00:00 | offline   | no
65 | router-node-i-01cd8b42c5599852d | router-node-i-01cd8b42c5599852d | [F3wqN]    | [Nu1nL] | router-node    | 100.64.0.8, fd7a:115c:a1e0::8 | false     | 2024-11-06 16:22:34 | 0001-01-01 00:00:00 | online    | no

Routes

ID  | Node                            | Prefix           | Advertised | Enabled | Primary
452 | router-node-i-01cd8b42c5599852d | ::/0             | true       | false   | -
454 | router-node-i-01cd8b42c5599852d | 0.0.0.0/0        | true       | false   | -
447 | router-node-i-0380d27217348885b | 0.0.0.0/0        | true       | false   | -
448 | router-node-i-0380d27217348885b | ::/0             | true       | false   | -
446 | router-node-i-0380d27217348885b | 192.168.168.1/32 | true       | true    | false
453 | router-node-i-01cd8b42c5599852d | 192.168.168.1/32 | true       | true    | true

router-node-i-01cd8b42c5599852d is listed as primary in the routes table above, yet the ping does not recover.

The macOS client still shows 2 nodes:
[image]

We disconnect and reconnect the macOS client, and the ping comes back.

Nodes

ID | Hostname                        | Name                            | MachineKey | NodeKey | User           | IP addresses                  | Ephemeral | Last seen           | Expiration          | Connected | Expired
39 | userM1Pro                 | userm1pro                 | [IytP7]    | [uiEpB] | user | 100.64.0.7, fd7a:115c:a1e0::7 | false     | 2024-11-06 16:20:19 | 2024-11-12 18:56:13 | online    | no
63 | router-node-i-0380d27217348885b | router-node-i-0380d27217348885b | [CnYak]    | [T6Q47] | router-node    | 100.64.0.4, fd7a:115c:a1e0::4 | false     | 2024-11-06 16:24:49 | 0001-01-01 00:00:00 | offline   | no
65 | router-node-i-01cd8b42c5599852d | router-node-i-01cd8b42c5599852d | [F3wqN]    | [Nu1nL] | router-node    | 100.64.0.8, fd7a:115c:a1e0::8 | false     | 2024-11-06 16:22:34 | 0001-01-01 00:00:00 | online    | no

Routes

ID  | Node                            | Prefix           | Advertised | Enabled | Primary
447 | router-node-i-0380d27217348885b | 0.0.0.0/0        | true       | false   | -
448 | router-node-i-0380d27217348885b | ::/0             | true       | false   | -
446 | router-node-i-0380d27217348885b | 192.168.168.1/32 | true       | true    | false
452 | router-node-i-01cd8b42c5599852d | ::/0             | true       | false   | -
454 | router-node-i-01cd8b42c5599852d | 0.0.0.0/0        | true       | false   | -
453 | router-node-i-01cd8b42c5599852d | 192.168.168.1/32 | true       | true    | true
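To compare the server-side view across the steps above, the pasted tables can be checked mechanically. The sketch below is a rough helper, not part of Headscale: `parse_table` and `offline_primaries` are hypothetical names, the sample tables are shortened to the relevant columns with placeholder node names, and it assumes the `Node` column of the routes table matches the `Hostname` column of the nodes table.

```python
def parse_table(text):
    """Parse a pipe-delimited table with a header row into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    return [
        dict(zip(header, (cell.strip() for cell in ln.split("|"))))
        for ln in lines[1:]
    ]


def offline_primaries(routes_text, nodes_text):
    """Return (prefix, node) pairs whose primary route sits on a node that is not online."""
    connected = {n["Hostname"]: n["Connected"] for n in parse_table(nodes_text)}
    return [
        (r["Prefix"], r["Node"])
        for r in parse_table(routes_text)
        if r["Primary"] == "true" and connected.get(r["Node"]) != "online"
    ]


# Shortened sample mirroring the state after the first router was stopped.
ROUTES = """
ID  | Node     | Prefix           | Advertised | Enabled | Primary
446 | router-a | 192.168.168.1/32 | true       | true    | false
453 | router-b | 192.168.168.1/32 | true       | true    | true
"""
NODES = """
ID | Hostname | Connected
63 | router-a | offline
65 | router-b | online
"""
print(offline_primaries(ROUTES, NODES))  # prints []
```

An empty result means every primary route sits on an online node from the server's point of view, which matches the tables in Case 1: the primary flag moves to the second router while the client keeps failing until it reconnects.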

Case 2: both router nodes are up when the macOS Tailscale client logs in

A ping is running continuously on the macOS Tailscale client.

Nodes

ID | Hostname                        | Name                            | MachineKey | NodeKey | User           | IP addresses                  | Ephemeral | Last seen           | Expiration          | Connected | Expired
39 | userM1Pro                 | userm1pro                 | [IytP7]    | [uiEpB] | user | 100.64.0.7, fd7a:115c:a1e0::7 | false     | 2024-11-06 16:27:43 | 2024-11-12 18:56:13 | online    | no
63 | router-node-i-0380d27217348885b | router-node-i-0380d27217348885b | [CnYak]    | [T6Q47] | router-node    | 100.64.0.4, fd7a:115c:a1e0::4 | false     | 2024-11-06 16:36:32 | 0001-01-01 00:00:00 | online    | no
65 | router-node-i-01cd8b42c5599852d | router-node-i-01cd8b42c5599852d | [F3wqN]    | [Nu1nL] | router-node    | 100.64.0.8, fd7a:115c:a1e0::8 | false     | 2024-11-06 16:28:54 | 0001-01-01 00:00:00 | online    | no

Routes

ID  | Node                            | Prefix           | Advertised | Enabled | Primary
452 | router-node-i-01cd8b42c5599852d | ::/0             | true       | false   | -
454 | router-node-i-01cd8b42c5599852d | 0.0.0.0/0        | true       | false   | -
453 | router-node-i-01cd8b42c5599852d | 192.168.168.1/32 | true       | true    | true
447 | router-node-i-0380d27217348885b | 0.0.0.0/0        | true       | false   | -
448 | router-node-i-0380d27217348885b | ::/0             | true       | false   | -
446 | router-node-i-0380d27217348885b | 192.168.168.1/32 | true       | true    | false

We stop the primary: router-node-i-01cd8b42c5599852d

The ping only misses one beat:

64 bytes from 192.168.168.1: icmp_seq=1708 ttl=64 time=26.200 ms
64 bytes from 192.168.168.1: icmp_seq=1709 ttl=64 time=26.138 ms
64 bytes from 192.168.168.1: icmp_seq=1710 ttl=64 time=29.193 ms
Request timeout for icmp_seq 1711
64 bytes from 192.168.168.1: icmp_seq=1712 ttl=64 time=94.161 ms
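The length of an outage can be measured from a captured ping log instead of by eye. The following is a minimal sketch that assumes macOS-style `ping` output as shown above; `missed_sequences` is a hypothetical helper name.

```python
import re

SEQ_RE = re.compile(r"icmp_seq[= ](\d+)")


def missed_sequences(ping_output):
    """Return the icmp_seq numbers that got no reply in macOS-style ping output."""
    replied, timed_out = set(), set()
    for line in ping_output.splitlines():
        match = SEQ_RE.search(line)
        if not match:
            continue
        seq = int(match.group(1))
        if line.strip().startswith("Request timeout"):
            timed_out.add(seq)
        else:
            replied.add(seq)
    seen = replied | timed_out
    if not seen:
        return []
    # Sequence numbers that never show up at all are counted as missed too.
    return [s for s in range(min(seen), max(seen) + 1) if s not in replied]


SAMPLE = """\
64 bytes from 192.168.168.1: icmp_seq=1708 ttl=64 time=26.200 ms
64 bytes from 192.168.168.1: icmp_seq=1709 ttl=64 time=26.138 ms
64 bytes from 192.168.168.1: icmp_seq=1710 ttl=64 time=29.193 ms
Request timeout for icmp_seq 1711
64 bytes from 192.168.168.1: icmp_seq=1712 ttl=64 time=94.161 ms
"""
print(missed_sequences(SAMPLE))  # prints [1711]
```

Running the same helper on a log from Case 1 would list every sequence number lost between stopping the primary router and reconnecting the client.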

Nodes

ID | Hostname                        | Name                            | MachineKey | NodeKey | User           | IP addresses                  | Ephemeral | Last seen           | Expiration          | Connected | Expired
39 | userM1Pro                 | userm1pro                 | [IytP7]    | [uiEpB] | user | 100.64.0.7, fd7a:115c:a1e0::7 | false     | 2024-11-06 16:42:35 | 2024-11-12 18:56:13 | online    | no
63 | router-node-i-0380d27217348885b | router-node-i-0380d27217348885b | [CnYak]    | [T6Q47] | router-node    | 100.64.0.4, fd7a:115c:a1e0::4 | false     | 2024-11-06 16:50:47 | 0001-01-01 00:00:00 | online    | no
65 | router-node-i-01cd8b42c5599852d | router-node-i-01cd8b42c5599852d | [F3wqN]    | [Nu1nL] | router-node    | 100.64.0.8, fd7a:115c:a1e0::8 | false     | 2024-11-06 16:52:18 | 0001-01-01 00:00:00 | offline   | no

Routes

ID  | Node                            | Prefix           | Advertised | Enabled | Primary
452 | router-node-i-01cd8b42c5599852d | ::/0             | true       | false   | -
454 | router-node-i-01cd8b42c5599852d | 0.0.0.0/0        | true       | false   | -
447 | router-node-i-0380d27217348885b | 0.0.0.0/0        | true       | false   | -
448 | router-node-i-0380d27217348885b | ::/0             | true       | false   | -
453 | router-node-i-01cd8b42c5599852d | 192.168.168.1/32 | true       | true    | false
446 | router-node-i-0380d27217348885b | 192.168.168.1/32 | true       | true    | true

The route fails over correctly (primary is now router-node-i-0380d27217348885b)!

We restart router-node-i-01cd8b42c5599852d

Nodes

ID | Hostname                        | Name                            | MachineKey | NodeKey | User           | IP addresses                  | Ephemeral | Last seen           | Expiration          | Connected | Expired
39 | userM1Pro                 | userm1pro                 | [IytP7]    | [uiEpB] | user | 100.64.0.7, fd7a:115c:a1e0::7 | false     | 2024-11-06 16:42:35 | 2024-11-12 18:56:13 | online    | no
63 | router-node-i-0380d27217348885b | router-node-i-0380d27217348885b | [CnYak]    | [T6Q47] | router-node    | 100.64.0.4, fd7a:115c:a1e0::4 | false     | 2024-11-06 16:50:47 | 0001-01-01 00:00:00 | online    | no
65 | router-node-i-01cd8b42c5599852d | router-node-i-01cd8b42c5599852d | [F3wqN]    | [Nu1nL] | router-node    | 100.64.0.8, fd7a:115c:a1e0::8 | false     | 2024-11-06 16:55:01 | 0001-01-01 00:00:00 | online    | no

(no effect on the ping)

Routes

ID  | Node                            | Prefix           | Advertised | Enabled | Primary
447 | router-node-i-0380d27217348885b | 0.0.0.0/0        | true       | false   | -
448 | router-node-i-0380d27217348885b | ::/0             | true       | false   | -
446 | router-node-i-0380d27217348885b | 192.168.168.1/32 | true       | true    | true
452 | router-node-i-01cd8b42c5599852d | ::/0             | true       | false   | -
454 | router-node-i-01cd8b42c5599852d | 0.0.0.0/0        | true       | false   | -
453 | router-node-i-01cd8b42c5599852d | 192.168.168.1/32 | true       | true    | false

We stop both router nodes

Nodes

ID | Hostname                        | Name                            | MachineKey | NodeKey | User           | IP addresses                  | Ephemeral | Last seen           | Expiration          | Connected | Expired
39 | userM1Pro                 | userm1pro                 | [IytP7]    | [uiEpB] | user | 100.64.0.7, fd7a:115c:a1e0::7 | false     | 2024-11-06 16:42:35 | 2024-11-12 18:56:13 | online    | no
63 | router-node-i-0380d27217348885b | router-node-i-0380d27217348885b | [CnYak]    | [T6Q47] | router-node    | 100.64.0.4, fd7a:115c:a1e0::4 | false     | 2024-11-06 16:58:26 | 0001-01-01 00:00:00 | offline   | no
65 | router-node-i-01cd8b42c5599852d | router-node-i-01cd8b42c5599852d | [F3wqN]    | [Nu1nL] | router-node    | 100.64.0.8, fd7a:115c:a1e0::8 | false     | 2024-11-06 16:58:36 | 0001-01-01 00:00:00 | offline   | no

As soon as we start router-node-i-01cd8b42c5599852d, the ping recovers as expected.

Environment

- OS: Amazon Linux 2023 for Headscale and the Tailscale router nodes, macOS Sonoma 14.6.1 for the Tailscale client (standalone variant)
- Headscale version: 0.23
- Tailscale version: 1.76.1

Runtime environment

  • Headscale is behind a (reverse) proxy
  • Headscale runs in a container

Anything else?

Other details for the setup

ACLs used :

{
  "acls": [
    {
      "action": "accept",
      "src": ["*"],
      "dst": ["192.168.168.1/32:*"]
    }
  ],
  "tagOwners": {
    "tag:application-router-node": ["router-node"]
  },
  "autoApprovers": {
    "routes": {
      "192.168.168.1/32": ["tag:application-router-node"]
    },
    "exitNode": []
  }
}
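As a sanity check on the policy above, the sketch below lists which tags auto-approve a given route and flags any approver tag that has no `tagOwners` entry. It assumes the policy is plain JSON, as pasted here; the helper names `auto_approvers_for` and `unowned_tags` are hypothetical and are not part of Headscale.

```python
import json

POLICY = """
{
  "acls": [
    {"action": "accept", "src": ["*"], "dst": ["192.168.168.1/32:*"]}
  ],
  "tagOwners": {
    "tag:application-router-node": ["router-node"]
  },
  "autoApprovers": {
    "routes": {
      "192.168.168.1/32": ["tag:application-router-node"]
    },
    "exitNode": []
  }
}
"""


def auto_approvers_for(policy_text, prefix):
    """Return the approvers configured for an advertised route prefix."""
    policy = json.loads(policy_text)
    return policy.get("autoApprovers", {}).get("routes", {}).get(prefix, [])


def unowned_tags(policy_text):
    """Return approver tags that do not appear under tagOwners."""
    policy = json.loads(policy_text)
    owners = set(policy.get("tagOwners", {}))
    routes = policy.get("autoApprovers", {}).get("routes", {})
    used = {a for approvers in routes.values() for a in approvers if a.startswith("tag:")}
    return sorted(used - owners)


print(auto_approvers_for(POLICY, "192.168.168.1/32"))
print(unowned_tags(POLICY))
```

For the policy in this report, the first call returns the single router tag and the second returns an empty list, so the policy itself looks consistent with both routers having the 192.168.168.1/32 route enabled.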

router-node tailscale parameters :

--login-server=https://********** --advertise-exit-node --advertise-routes=192.168.168.1/32 --accept-dns=true --advertise-tags=tag:application-router-node