Adds uTP network test #454
vinipsmaker
commented
Dec 28, 2015
r? @inetic (maidsafe_highfive has picked a reviewer for you, use r? to override)
Continued work from #451
This test seems to be hanging on Windows, but actually the Windows support on I'll leave it to you whether to accept/reject this PR as is or to fix the Windows hanging issue prior to the addition of the
I changed my mind. The test is failing on Windows. Better to fix one problem at a time instead of accumulating them all.
9164a28
to
bb9695c
There is some issue under Windows:
The tracked down Windows error:
Probable origin: https://github.com/maidsafe/rust-utp/blob/ffa531c0c58e078baa60c36661fd185825739813/src/socket.rs#L576
I was wrong about that.
I'm confused. It's like I'm getting a different error every time.
Windows support seems to be much less robust than Linux/OSX.
I investigated the issue further and tracked it down to this function, which is weird. It's weird because the error is UDP is connection-less, so why this error? I found an explanation on StackOverflow. So I think the correct behaviour is to ignore these errors, as UDP is connection-less and not being able to receive now doesn't mean not being able to receive a few seconds later. I'll make a test and find out.
Just did a small test and it was not enough, but I didn't get much info. Now I'll investigate |
A part of the crust workflow in the introduced test is like (note that some parts only exist in crust and the test will indirectly and implicitly invoke such actions):
I can confirm the uTP connection is established as the multiplexer thread is created. I only got this far by making rust-utp ignore the error codes 10054 (WSAECONNRESET) and 10040 (WSAEMSGSIZE) on Windows. WSAECONNRESET looks safe to ignore, but WSAEMSGSIZE is really weird. For now, I'll research whether moving UDP socket descriptors among threads can cause any problems on Windows. Also, it's troublesome to debug an error without having direct access to the operating system. Maybe next year I'll get a new laptop and install Windows on my current/old laptop.
Before continuing to investigate this, I did one more test just to confirm that the errors only start after the objects are moved among threads. Now my suspicion has grown stronger.
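As a rough illustration of the workaround described above, the sketch below skips the two Winsock codes in a receive loop. It is not the actual rust-utp change: `is_ignorable_winsock_code`, `should_ignore` and `recv_skipping_spurious_errors` are hypothetical names, and only the numeric codes 10054 and 10040 come from the comment.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Winsock error codes named in the comment above.
const WSAECONNRESET: i32 = 10054;
const WSAEMSGSIZE: i32 = 10040;

/// Hypothetical helper: true for the two codes the workaround ignores.
pub fn is_ignorable_winsock_code(code: i32) -> bool {
    code == WSAECONNRESET || code == WSAEMSGSIZE
}

/// Hypothetical helper: the codes are only meaningful on Windows, so on
/// every other platform no error is treated as ignorable.
pub fn should_ignore(err: &io::Error) -> bool {
    cfg!(windows) && err.raw_os_error().map_or(false, is_ignorable_winsock_code)
}

/// Hypothetical receive loop: retry when the error is one of the ignored
/// codes, and hand every other result (success or failure) to the caller.
pub fn recv_skipping_spurious_errors(
    socket: &UdpSocket,
    buf: &mut [u8],
) -> io::Result<(usize, SocketAddr)> {
    loop {
        match socket.recv_from(buf) {
            Err(ref e) if should_ignore(e) => continue,
            other => return other,
        }
    }
}
```

Note that skipping WSAEMSGSIZE this way silently drops the oversized datagram, which may be one reason that code looked suspicious.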
After a long time debugging, I found something interesting.
Also, even if we're facing this bug now, it doesn't mean it is THE bug, as other changes are required to make crust work properly under Windows for UDP support (ignoring I think it's unlikely I'll finish the fix before Jan 4th, but maybe next week I can bury these bugs for good.
I found something interesting. First a summary of the important part of the test to understand my finding:
Now consider only a pair of nodes. We can have the following:
Some observations:
In short, there might be a race within |
@inetic mentioned that he was also getting out-of-order packets (at least that is how I understood it) and (implicitly) suggested that I try his fork: https://github.com/inetic/rust-utp/tree/master But using his fork I got this error (WSAEMSGSIZE) when calling Then I proceeded to merge my changes and these two other commits of @inetic's: And the test passed (but the CI machine hung). I'll do a few more tests before cleaning the code for release.
I got a new error, |
So... both commits are required to fix the out-of-order packets issue, but then we have a new issue to fix.
Just a quick correction. Actually, I was seeing an |
bb9695c
to
e47bda8
I checked again and saw that it actually still happens. It just happens to be very rare. Also, it doesn't prevent the test from succeeding. Therefore, I reverted the commit that increases the buffer size and tested again. The frequency of this error increased, but again, it didn't prevent the test from succeeding. The commit will remain disabled, as it represents no improvement.
My progress in this task is stalled until branch |
73118c1
to
7def39c
I fixed branch |
uTP seems stable when we use Peter's patchset. However, I went a little further to be sure just in case:
I'll investigate why the new changes make some of the old tests fail before merging the branch.
Just because I'm used to finding races, I've run the
So... I'll start investigating one of these. |
My work on the maidsafe-archive/rust-utp#16 PR is done. So, as soon as that PR is merged, this one can also be merged.
maidsafe-archive/rust-utp#16 was merged. I added a dummy commit to trigger a new CI build, but it seems the uTP test failed on Windows:
I'll have to investigate why.
It seems if the |
95acde4
to
7def39c
I was wrong. The CI builds on this PR failed because I forgot to release a new crate for CI tests are passing now. PR is ready to be merged. |
r? @dirvine
self.connection_map.clone()));
self.acceptors.push(acceptor);

// FIXME: Instead of hardcoded wrapping in loopback V4, the
Agreed, the return value of the bind should be used in these cases. OK for now, though.
@dirvine: all these pieces of code you're commenting on, related to making a uTP listener, just mirror the current TCP listener in I'll create a new merge/PR for the
Sounds good. We are still toiling with compilation in the new_refactor branch, but yes, good idea. Merging now.
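One way to read this suggestion: bind first, then ask the socket which address it actually got, rather than constructing a loopback V4 address by hand. A minimal sketch using only the standard library follows; `bind_and_report` is a hypothetical name, and crust's acceptor code is not shown here and may well differ.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Hypothetical helper: bind a UDP socket and return it together with the
/// address the operating system actually assigned to it.
pub fn bind_and_report(requested: &str) -> io::Result<(UdpSocket, SocketAddr)> {
    // Port 0 in `requested` asks the OS to pick any free port.
    let socket = UdpSocket::bind(requested)?;
    // `local_addr` reports the real address and port after binding.
    let actual = socket.local_addr()?;
    Ok((socket, actual))
}
```

For example, `bind_and_report("127.0.0.1:0")` yields a loopback address with a concrete, non-zero port that can be advertised to peers.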
This one took quite a while:
And at least three developers have worked on this issue. |