Logic flaws (?) on PunchedUdpSocket error handling #33
Maybe. If we're having trouble writing to the socket then something has gone wrong. Is there any specific IO error that would be better to handle by continuing? Possibly I really wish the standard library used more precise error types instead of this crappy
In that case we've received an ACK so we know that the peer has received our packets. We could send back our own ACK before exiting but I don't think it would help.
I don't know what you mean by racy here but re-reading that code I think it should send the ACK more than twice and wait more than 100ms.
The sleep is the call to
Well, one of the addresses that we get may be unreachable, so a send can fail for just one of the addresses. We're trying to punch a hole, and this includes finding out which address is usable. If we find that one address is unusable, it doesn't make sense to abort the hole-punching operation.
I disagree. These Rust APIs are low-level and their behavior depends a lot on the operating system.
Sorry. I checked again and there is no problem with the current algorithm.
I missed that. Thanks. But shouldn't failing to decode the packet also increase the timeout (i.e. Also, the timeout calculation looks very inaccurate to me. If a packet is received before the timeout is reached, we should not increase the elapsed time by
What do we check for in this case? I still think a
They could have enough enum variants to cover all the possible failures on all operating systems, and make those variants more descriptive and still avoid having bogus variants like
If we receive a valid packet we never use the deadline again before exiting the function. If we receive an invalid packet we still don't exit the loop until the deadline is reached. Setting the deadline to
I didn't want to give examples because the behavior is system-dependent. The only sane cross-platform behavior is to ignore the error and try again (within a timeout, of course). The reason I didn't want to illustrate the problem is that you may conclude that we should ignore some specific errors and release a "fix" because it works on our machines; the error would not actually be fixed, and we might see random failures in the wild (without a chance to debug). You already started the sentence with "What do we check for...", suggesting that you might do exactly that. Anyway, I stumbled upon this kind of issue when I was working on rust-utp stabilization and one of the errors that can happen only on one of the addresses is

We promised to test several addresses to find one that works. But what we're doing is finding one that does NOT work and aborting the search operation. This is completely the opposite of what we promised.
I find this very unlikely, but it doesn't matter what I think here because this discussion is OFF-TOPIC for this thread/issue.
A misbehaving node could keep sending invalid packets that cause the CBOR packet decoding to fail, and we would never leave this loop. This type of timeout is very inaccurate, and it is a shame to have such a fragile implementation (even if the failure is very unlikely to happen).
You're right. I'm still digesting this code, so I'm still missing some parts of its behavior.
Agreed it's off-topic but anyway.. I don't share your pessimism. Version 0.0.1 of this design might panic when it sees Back on topic.. Okay I'll make the error handling a bit more detailed and not abort on unknown errors.
This couldn't happen though.
I checked again, and this time I realized my mistake. I was reading the code as if it used a "wait_for" primitive, but the primitive actually being used is "wait_until". I can see now that the timeout logic is pretty much correct.
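To make the "wait_until" point concrete, here is a minimal sketch, not the crate's actual code. The function recv_until and its closure parameter are hypothetical names used only for illustration. The deadline is fixed once, the time left is recomputed on every pass, and an undecodable packet (modelled here as None) never extends the deadline, so a peer that keeps sending garbage cannot hold the loop open.

```rust
use std::time::{Duration, Instant};

/// Polls `recv` until it yields a valid packet or `deadline` passes.
/// `recv` is a hypothetical stand-in for "receive with a read timeout and
/// try to decode": it is given the time left and returns `Some(packet)` for
/// a valid packet, or `None` for a timeout or an undecodable packet.
fn recv_until<T, F>(deadline: Instant, mut recv: F) -> Option<T>
where
    F: FnMut(Duration) -> Option<T>,
{
    loop {
        let now = Instant::now();
        if now >= deadline {
            // The deadline is absolute, so invalid packets cannot postpone it.
            return None;
        }
        // "wait_until" semantics: the remaining time shrinks on every pass.
        let remaining = deadline.saturating_duration_since(now);
        if let Some(packet) = recv(remaining) {
            return Some(packet);
        }
    }
}
```

With a real socket, the closure would typically set the read timeout to the remaining duration, call recv_from, and return None when decoding fails.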
#34 fixes "It should be an error if we fail to send to all addresses, not just a single address: https://github.com/maidsafe/nat_traversal/blob/8207018ff4332157975765b7fe1eaee893d020f4/src/punched_udp_socket.rs#L208"
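The rule quoted here (fail only when every address fails) could be sketched roughly as follows. This is an illustration under assumptions, not the change made in #34: send_to_any is a hypothetical helper, and the send operation is passed in as a closure so the logic can be shown without a real socket.

```rust
use std::io;
use std::net::SocketAddr;

/// Tries `send` on every candidate address. Returns the addresses that
/// accepted the datagram, or the last error if no address did.
/// (Hypothetical helper; the name and signature are assumptions.)
fn send_to_any<F>(addrs: &[SocketAddr], mut send: F) -> io::Result<Vec<SocketAddr>>
where
    F: FnMut(&SocketAddr) -> io::Result<usize>,
{
    let mut reachable = Vec::new();
    let mut last_err = None;
    for addr in addrs {
        match send(addr) {
            Ok(_) => reachable.push(*addr),
            // A failure on one address is recorded, not treated as fatal.
            Err(e) => last_err = Some(e),
        }
    }
    if reachable.is_empty() {
        // Every send failed (or the list was empty): only now report an error.
        Err(last_err.unwrap_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "no candidate addresses")
        }))
    } else {
        Ok(reachable)
    }
}
```

With a std::net::UdpSocket, the closure could be something like |addr| socket.send_to(&payload, addr).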
What should we do in this case? There's no point returning back to the loop because we know that we're trying to message the correct address.
We should try a few more times (e.g. until our timeout is reached or maybe after 10 attempts are made).
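A bounded retry along these lines might look like the sketch below. It is only an illustration of "until our timeout is reached or after 10 attempts": send_with_retries, its parameters, and the closure-based send are all assumptions, not code from the crate or from any PR. Note that the pause between attempts is an actual thread::sleep, not just arithmetic on a deadline value.

```rust
use std::io;
use std::thread;
use std::time::{Duration, Instant};

/// Calls `send` until it succeeds, `max_attempts` is used up, or there is
/// no time left before `deadline` for another pause. Always makes at least
/// one attempt. (Hypothetical helper; all names are assumptions.)
fn send_with_retries<F>(
    max_attempts: u32,
    delay: Duration,
    deadline: Instant,
    mut send: F,
) -> io::Result<usize>
where
    F: FnMut() -> io::Result<usize>,
{
    let mut attempt = 0;
    loop {
        attempt += 1;
        match send() {
            Ok(n) => return Ok(n),
            Err(e) => {
                // Give up with the last error once either bound is hit.
                if attempt >= max_attempts || Instant::now() + delay >= deadline {
                    return Err(e);
                }
                // Really pause between attempts.
                thread::sleep(delay);
            }
        }
    }
}
```

A caller would pass the attempt limit (for example 10), the resend delay, and the same deadline used by the rest of the hole-punching loop.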
mk, here's a PR: #38
We shouldn't abort the operation before we send at least one packet to the other peer (the current code looks very racy to me).
deadline += time::Duration::milliseconds(DELAY_BETWEEN_RESENDS_MS) is not thread::sleep (i.e. no delays between attempts are happening at all; removing PeriodicSender was a bad idea).