Prefer routes with lowest rtt #235
Conversation
I tested it with WireGuard. It seems this strategy affects how tinc chooses the right endpoint for a node. I have three nodes: A, B and C. A has the public IP 1.2.3.4. I set the tinc CIDR to 10.0.0.0/24; A has the IPs 10.0.0.1, 10.0.1.1 and 1.2.3.4. WireGuard only supports forwarding packets from B to C via A, so B can ping C using 10.0.1.3, and this is stable. Tinc can do NAT hole punching via A, so C may get some public IP 2.3.4.5 with port 34562. C's endpoint 2.3.4.5:34562 is not stable due to NAT timeouts. With this strategy I find that tinc node C's endpoint becomes 10.0.1.3:655, which is what I want. However, I think this strategy should be controlled by configuration, not applied automatically; some situations may not fit it.
This shouldn't affect the current behaviour, only situations where more than one node announces the same subnet (what you're referring to as CIDR). For example: if C has a lower RTT to A than to B, it will choose to forward those packets to A; if B has a better RTT than A, C will forward the packets to B.
This PR does not affect the behavior you describe. Tinc already does this automatically. This PR only affects which node packets are sent to, not how.
Yeah, you are right, @iczero @wolfy1339. I should disable forwarding between 10.0.0.0/24 and 10.0.1.0/24. This will make C = 10.0.1.3 => A => 10.0.0.3 => B impossible.
@gsliepen, I think this pull request is an excellent feature.
There is an issue with this patch: the first time an address is looked up, the result of the lookup is cached. That means that if the RTT changes, packets will still go to the node found in the first attempt.
Another issue is that tinc only tries to establish UDP communication when there is traffic to a given node. This means there is a chicken-and-egg problem: in the beginning, all candidates for a given address might have a udp_ping_rtt of -1, so it will just pick whichever candidate happened to come last, and then it will try to establish UDP with that node, but not with any of the other candidates, so the other candidates will never be preferred.

Finally, the original loop quit when the first matching subnet was found, and subnets are sorted by their prefix length. However, with this change you will continue going through all the subnets until you find a matching one with the lowest RTT. That means that if you have two subnets, say 192.168.1.1/32 and 192.168.1.0/24, this might cause a lookup for 192.168.1.1 to choose the latter subnet, which is wrong.
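To make the last point concrete, here is a minimal, self-contained sketch (this is not tinc's actual code; `struct candidate`, `prefix_match` and `lookup_lowest_rtt` are hypothetical names) of a lookup that keeps the longest-prefix rule intact and only lets RTT break ties between subnets of the same prefix length:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct candidate {
	uint32_t network;    /* IPv4 network address, host byte order */
	int prefixlength;    /* bits in the netmask */
	int udp_ping_rtt;    /* measured RTT, -1 if not yet known */
};

static bool prefix_match(uint32_t address, uint32_t network, int prefixlength) {
	uint32_t mask = prefixlength ? ~0u << (32 - prefixlength) : 0;
	return (address & mask) == (network & mask);
}

/* Return the index of the preferred candidate, or -1 if none matches. */
static int lookup_lowest_rtt(uint32_t address, const struct candidate *subs, size_t n) {
	int best = -1, best_len = -1, best_rtt = INT_MAX;

	for(size_t i = 0; i < n; i++) {
		if(!prefix_match(address, subs[i].network, subs[i].prefixlength))
			continue;

		/* Treat an unknown RTT as worse than any measured one. */
		int rtt = subs[i].udp_ping_rtt < 0 ? INT_MAX - 1 : subs[i].udp_ping_rtt;

		/* A longer prefix always wins; RTT only decides between equal prefixes. */
		if(subs[i].prefixlength > best_len ||
		   (subs[i].prefixlength == best_len && rtt < best_rtt)) {
			best = (int)i;
			best_len = subs[i].prefixlength;
			best_rtt = rtt;
		}
	}

	return best;
}
```

With this ordering a lookup for 192.168.1.1 still picks the 192.168.1.1/32 entry over 192.168.1.0/24, regardless of their RTTs.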
Regarding the cache: I think it's perfectly acceptable to employ a heavy level of route caching. Not only is the route something that should not change frequently; route switches for small changes in network conditions should also be avoided, as many protocols (even those designed to handle such situations) will become unstable.
Frequent route switches should indeed be avoided, but there's no telling how long the route will stay cached, and thus it's rather unpredictable when, if ever, it will switch to a route with a lower RTT. A possible solution is to just flush the cache at regular intervals. Also, the second and third points I mentioned are still issues that should be solved.
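As an illustration only, the periodic flush could be as simple as time-stamping the cached result and discarding it once it is older than some interval (a sketch; `struct cached_route`, `cache_get`, `cache_put` and `ROUTE_CACHE_TTL` are made-up names, not tinc APIs):

```c
#include <time.h>

#define ROUTE_CACHE_TTL 60 /* seconds; would presumably be a configuration option */

struct cached_route {
	void *owner;      /* node the cached lookup resolved to, NULL if empty */
	time_t cached_at; /* when the entry was filled */
};

static void cache_put(struct cached_route *c, void *owner, time_t now) {
	c->owner = owner;
	c->cached_at = now;
}

static void *cache_get(struct cached_route *c, time_t now) {
	if(!c->owner)
		return NULL;

	if(now - c->cached_at > ROUTE_CACHE_TTL) {
		c->owner = NULL; /* stale: force the next lookup to re-evaluate RTTs */
		return NULL;
	}

	return c->owner;
}
```

A cache miss then triggers a fresh lowest-RTT lookup, bounding how long a stale choice can persist.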
How does this PR affect relayed connections? E.g. A has 10.0.0.1/24, A is not connected to D but B and C are, and A is connected to both B and C. Will this PR choose an improved route such as A->B->D if the route via B has a lower RTT than via C?
There still needs to be some way to update the cache when RTTs change. It doesn't have to be frequent.
Apparently GitHub ate my previous comment... I currently do not have time to work on this, but I will update the PR when I do. Regarding the current code, it seems it could work better if there were a separate table containing all routes announced by more than one node (anycast routes), with things like UDP probes and lookups for routes flagged as anycast then handled from there. @splitice this PR will not affect that case; it should only pick the lowest-RTT node between two nodes advertising the same subnet. Please ignore the rebases for now; they do not introduce changes.
@iczero Have you thought about approaching this in a way that could be used for general-purpose routing? You might be able to kill a few birds with one PR too. Perhaps each node advertises its known subnets (routes), with its best RTT to each, to its known nodes (the route cache can be used, and as of #270 it will not be cleared as frequently). On receipt, the receiving node simply updates the splay entry and, if there is no conflict or the subnet already exists in the cache, the route cache. The routing algorithm could then simply select the lowest RTT stored in the splay tree on a cache miss. This would be predictable in that each node would be expected to supply regular updates and to invalidate the cache accordingly. There would still be some room for improvement by further reducing unnecessary clearing of the subnet cache, and perhaps by linking the cache more closely with the splay tree to avoid regular splay searches on these updates. To rehash my example from before:

In this case node A, having received updates from both B and C, would know that its best path to D is via B. This would also work for your case of anycast. It would also resolve #45 (not of interest to me, but I do understand why it's interesting for other use cases), and the regular updates could be used to provide accurate RTT values.
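For illustration of what such an advertisement could carry, here is a rough sketch of a per-subnet record and a receiver that keeps whichever peer advertises the lowest RTT (none of these types or names exist in tinc; this is only a sketch of the proposal):

```c
#include <stdint.h>
#include <string.h>

/* One advertised route: "I can reach this subnet with this RTT". */
struct route_update {
	uint32_t network;       /* IPv4 subnet, host byte order */
	uint8_t prefixlength;
	uint32_t rtt_usec;      /* best RTT the advertising node has measured */
};

/* What the receiver remembers for a single subnet. */
struct learned_route {
	struct route_update adv;
	char via[64];           /* peer the best advertisement came from */
	int valid;              /* 0 until the first advertisement arrives */
};

/* Keep whichever peer advertises the lowest RTT for this subnet. */
static void learn_route(struct learned_route *r, const struct route_update *u,
                        const char *peer) {
	if(r->valid && u->rtt_usec >= r->adv.rtt_usec)
		return;

	r->adv = *u;
	r->valid = 1;
	strncpy(r->via, peer, sizeof(r->via) - 1);
	r->via[sizeof(r->via) - 1] = '\0';
}
```

On a cache miss the lookup would then simply return the `via` peer with the lowest stored RTT, which in the example above makes B node A's next hop towards D.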
This is going to cause a lot of traffic in large meshes. Also, finding the path with the lowest total RTT when direct communication with a node is not possible would require running the graph algorithms more frequently as well, and the sum of measured RTTs along a path might be a poor indication of the actual RTT, since it doesn't include the overhead of forwarding packets at all the intermediate nodes.

We have to balance some things here. Perfect path selection requires continuous measurement of RTT along all possible paths, which is intractable on meshes with many nodes, so we should forget about this. We currently have some knowledge of RTTs, but for paths not taken the RTT is not updated. So I think the approach should be that if we have traffic to a Subnet with multiple possible nodes, we have the one we are currently using, and then we regularly try to improve the selection by probing the RTT of one of the other possible nodes.
@gsliepen what would be too much traffic in your opinion? Perhaps: each node sends a packet to each of its directly connected nodes with the routes and RTTs it knows about. Assuming that each node has 200 active peers at a given time (that's a lot), that would be one full 1,500-byte packet every 30 seconds sent to each of them and received from each of those active peers. On a 30-second interval, and assuming all of these numbers, we would be looking at roughly 10 kB/s rx and tx at that large scale (200 × 1,500 bytes / 30 s = 10,000 bytes/s). That's on the same scale as the TCP keepalives (if TCP connections are in use) and likely just as negligible. For smaller networks the bandwidth cost of something like this would be minuscule and comparable to a regular tunnel check operation. Sending these from each node to all their active peers every, say, 30 seconds with jitter (best to be configurable) wouldn't be substantial. Sending these updates to all peers (not just active ones) would be a nice feature to also resolve #45 (either as a configuration option or an option to do this at a reduced rate) and could be looked at. As someone who operates a larger network I wouldn't have an issue with 10 kB/s of tunnel upkeep (or even much more); I'm much more irritated when my traffic is hauled around the world twice.
That sounds good, but on larger networks it could take a VERY long time to find the best candidate if 200 connections have to be tested randomly. I don't dislike the idea entirely (I do like its apparent simplicity). Some problems I see & thoughts:

Possibly difficult, but perhaps on each route add (including relayed routes) each tinc node pings the node through that relay and generates an RTT to use. This RTT can be cached until it's next updated.
We want to limit background traffic as much as possible. We need some keepalive packets for the meta-connections, and we have on-demand UDP probe packets (ie, probe packets are only sent while actual VPN packets are being exchanged between two nodes). Probing for better routes should also be on-demand.
Sure, but that's just exchanging already known information. The problem is that unless nodes are exchanging data with each other, there is no information to share, because RTT times are only measured via the UDP probes, which are only sent on demand. Consider nodes A, B, C and D, where A, B and C all announce the same Subnet: until D actually sends traffic to that Subnet, no probes are sent, so there is no RTT information to exchange.

Note that tinc tries to avoid hop-by-hop routing; it always prefers to send packets directly to their destination. So if you have nodes A <-> B <-> C, then A tries to send directly to C via UDP, and knowing the RTT between A and B or B and C is not going to help, unless it couldn't send packets directly to C. But that would be the exceptional case.
Uhm no, the PING/PONG requests are only sent once a minute by default and are more like 50 bytes in size. If you mean TCP's own keepalive packets, then those are disabled by default, and if enabled, the default time before it sends a packet on an idle connection is 2 hours. 10 kb/s is already quite a lot for some users, especially those that pay for bandwidth.
But we don't have to do that. If we send a packet for a Subnet that is advertised by multiple nodes, only then do we need to ensure we also probe the other nodes that advertise that Subnet; we don't need to test all nodes in the mesh.
The new routes should use the UDP probe packets first. These will also help punch holes through NAT. Actual data packets should go to the known working node unless we are sure the new route is viable and better.
We should detect when a Subnet is advertised by multiple nodes and set some flag, so when we send a packet to that Subnet and it's time to send a probe, we also probe one of the other candidates. That could be just a random pick, or round-robin (see the sketch after this comment).
You mean finding the best route in the first 10 seconds?
It shouldn't be passive, it should react to actual demand for VPN traffic.
Well if we probe other candidates regularly anyway, then we should have reasonably up-to-date RTT information for all candidates. So if one goes down or suddenly gets a bad RTT, then we just flush the bad entry from the cache, and then the cache will be repopulated with the remaining best candidate.
We could indeed do that.
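A hedged sketch of what that flag-plus-round-robin probing could look like (all of the names below are hypothetical, not tinc's; the idea is simply to rotate the extra probes over the other announcers of the same Subnet):

```c
#include <stddef.h>

struct owner {
	const char *name;
	int udp_ping_rtt;   /* refreshed whenever this owner is probed */
};

struct anycast_subnet {
	struct owner *owners;   /* every node announcing this subnet */
	size_t num_owners;
	size_t current;         /* owner currently receiving the traffic */
	size_t probe_cursor;    /* round-robin position for the extra probes */
};

/* Pick the next candidate to probe, skipping the one already in use.
 * Returns NULL when there is nothing else worth probing. */
static struct owner *next_probe_candidate(struct anycast_subnet *s) {
	if(s->num_owners < 2)
		return NULL;

	for(size_t tried = 0; tried < s->num_owners; tried++) {
		s->probe_cursor = (s->probe_cursor + 1) % s->num_owners;

		if(s->probe_cursor != s->current)
			return &s->owners[s->probe_cursor];
	}

	return NULL;
}
```

Each time a probe is due for the node currently in use, `next_probe_candidate()` would also yield one of the other announcers, so every candidate's RTT stays reasonably fresh while traffic flows.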
Wouldn't the liveness be covered simply by the route being in the cache?

I'm not suggesting one datapoint per packet; as many records as possible should be packed into a single packet. As you have already alluded to, there are existing interactions that would use significantly more traffic than this feature would.

You do if you want to use the same implementation to also handle smarter routing for relaying (which is where my interest lies); anycast improvement/support is just a byproduct for me. What about this as an alternative:

This way the backup route is also ready when it's needed (provided that the chosen relay node isn't also offline). That's something I would also like to cover, but with the limitations expressed (no active information transfer) I'm not certain how at this stage. The reason I would like more information transfer is that quite often, when a partial interruption occurs between two points, it's the result of a single transit provider or link, yet it may take out 50%+ of upstreams. Finding a working path quickly is currently something tinc does not do well, and I would love it if this were to improve that. $N can be set to 0 in the config if active "route optimization" (name?) is not desired; set to a non-zero value with a regular enough interval, it would also serve to keep hole-punched ports open. That is not important for my use cases, but for others it is. If someone was in a situation where the <1 MB/day (small network) to potentially tens of MB/day (big network) that something like this would use at a reasonable $T interval and $N nodes was a problem, they could disable the feature by setting the config option(s) to zero. Assuming that such nodes would also be non-relayers, there would be no difference for them, aside from the odd additional ping message (for their own subnet).
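Purely as an illustration, the knobs described above might look something like the following in a tinc configuration file; these option names are invented here and do not exist in tinc:

```
RouteOptimizationProbes = 3      # the "$N" above: candidates probed per round, 0 disables the feature
RouteOptimizationInterval = 30   # the "$T" above: seconds between probe rounds
```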
I think having a solution is better than having no solution. Personally, I wouldn't mind using more bandwidth overall; I would much prefer it if tinc could default to the route with the lowest RTT, even if that means using more bandwidth. We could have this feature implemented but turned off by default, so tinc behaves the same as before out of the box. Users then don't get an increase in bandwidth usage unless they manually turn this feature on.

If the identities of these users are not sensitive information, I would love to know who they are. I think we could do with a testimonial section on tinc's website; I feel people don't realise how useful and versatile tinc is compared to things like WireGuard and OpenVPN. The amount of bandwidth people have access to has changed a lot since tinc was originally written (20 years ago), and since tinc-1.1 was first released (10 years ago). Just to give some examples from my personal experience: 10 years ago, video chatting with my parents in China was a struggle probably around half of the time, and the resolution was probably 240p; these days I pretty much expect to see my parents' faces in 720p, or at least 480p. 20 years ago I was on dial-up, and "broadband" wasn't in my country yet... Even 5 years ago I wouldn't have been watching YouTube non-stop on my mobile phone's data plan.
Fixed the subnet selection issue. The cache seems to be invalidated with graph changes, which may be enough.
			break;
		}

		if(p->owner->status.reachable) {
			int rtt = INT_MAX - 1; // use INT_MAX - 1 as rtt for nodes not directly reachable
A node can be directly reachable via TCP only. That still might be faster than via UDP to another node. But tinc doesn't measure TCP RTT unfortunately.
	for splay_each(subnet_t, p, &subnet_tree) {
		if(!p || p->type != SUBNET_IPV4) {
			continue;
		}

		if(!maskcmp(address, &p->net.ipv4.address, p->net.ipv4.prefixlength)) {
			r = p;
I didn't notice this before, but by removing this line you've subtly changed how tinc handles packets routed to unreachable nodes. Before, lookup_subnet_*() would return a non-NULL result even if the subnet found wasn't reachable, and tinc would then send ICMP_NET_UNREACH packets back. But now you'll return NULL, so tinc will send back ICMP_NET_UNKNOWN instead.
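To illustrate the difference, here is a sketch (assuming the candidates all announce the same, most specific subnet for the address; `struct subnet` and `pick_subnet` are made-up names) of a selection that still returns a non-NULL result when no owner is reachable:

```c
#include <stdbool.h>
#include <stddef.h>

struct subnet {
	bool reachable;   /* owner->status.reachable in tinc terms */
	int rtt;          /* lower is better; only meaningful when reachable */
};

/* matches[] holds every candidate announcing the subnet that matched the
 * address. Returning a match even when no owner is reachable lets the
 * caller keep answering with "network unreachable" rather than
 * "network unknown". */
static struct subnet *pick_subnet(struct subnet *matches, size_t n) {
	struct subnet *best = NULL;

	for(size_t i = 0; i < n; i++) {
		if(!best) {
			best = &matches[i];   /* remember a match, reachable or not */
			continue;
		}

		/* Prefer a reachable owner with a lower RTT over the current pick. */
		if(matches[i].reachable &&
		   (!best->reachable || matches[i].rtt < best->rtt))
			best = &matches[i];
	}

	return best;
}
```

The caller can then check the returned entry's reachability and reply with ICMP_NET_UNREACH when it is false, preserving the old behaviour while still preferring the lowest-RTT reachable owner.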
Currently this patch is no longer being updated. It is, and likely will continue to be, a pile of hacks. I will close this PR for now. I apologize for the noise from the rebase.
In situations where two identical routes exist, prefer the node with the lowest rtt.
An implementation of "poor man's anycast" as originally proposed in #194.