Added connection timeouts. #9
base: cloneable-socket

Conversation
Well, this is awkward. I also have a branch to handle the timeout cases I mentioned in #8, with commits I held back from that merge for further tests and refinement. I did so because I will have to focus exclusively on my thesis for at least a couple of days. My approach was slightly different from yours, based on the number of retries rather than the time, but either should be fine. After a quick read, I have a couple of concerns about some of the proposed changes:
dd60097 looks okay, apart from some minor style quibbles. As I said previously, it'll be a couple of days until I can review this pull request properly. Sorry.
Firstly, you should prioritise your schooling ahead of all else. I have no problem if you throw away everything I do; indeed, I would expect it, given my lack of experience in Rust. Besides, it's your library: you have the right to rewrite any pull request anyone sends you, or simply to refuse one. I regularly do so with pull requests against my own libraries and the Boost libraries I maintain. So long as the ability to time out a stale connection exists some time in the next six months, I don't think MaidSafe cares either.

Regarding the congestion timeout, there are really two issues here. The first is that a retry timeout is very different from a connection timeout. If I understand your code correctly, the congestion timeout is a retry timeout, and you were using a multiple of it as a connection timeout. I've simply replaced that with an absolute timeout, as the same loop can then also handle EINTR, so it solves both problems. If you look up maximum timeouts for TCP you should find there is a substantial difference in philosophy about when a connection should be considered dead, with Linux being particularly aggressive. I think your library should simply provide a settable option. You can default it to minutes if you like, so long as it can be overridden; in crust we'll need it in seconds to reduce the DDoS attack surface.

What I'm about to do next is change your use of select() to use a "cancellation fd", which I suppose should be a pipe. When I write a byte to the pipe, the blocking select will immediately exit. I'll use that to implement ungraceful connection shutdown as a way of immediately cancelling all blocking reads and listens. Once I have that in place, crust should supposedly work on UTP, and I suppose MaidSafe will reallocate me to new work. Anyway, I'll send that new code to you as additional pull requests, but honestly, don't worry about them until your studies are done.
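The cancellation-fd idea described here can be sketched in a few lines of Rust. This is an illustration only, not code from either branch: a socketpair stands in for the pipe (so the example needs only the standard library, Unix-only), and the select() loop that would watch both the data socket and the cancellation fd is omitted.

```rust
use std::io::{Read, Result, Write};
use std::os::unix::net::UnixStream;
use std::thread;

/// Sketch of a "cancellation fd": one thread blocks reading the
/// cancellation channel, another writes a single byte to wake it.
/// The real code would pass this fd to select() alongside the data
/// socket; a socketpair stands in for the pipe to stay std-only.
fn cancel_blocked_read() -> Result<u8> {
    let (mut cancel_rx, mut cancel_tx) = UnixStream::pair()?;

    let reader = thread::spawn(move || -> Result<u8> {
        let mut byte = [0u8; 1];
        // Blocks until the wake-up byte arrives.
        cancel_rx.read_exact(&mut byte)?;
        Ok(byte[0])
    });

    // Ungraceful shutdown: one written byte unblocks the reader.
    cancel_tx.write_all(&[b'!'])?;
    reader.join().expect("reader panicked")
}

fn main() {
    let byte = cancel_blocked_read().expect("cancellation failed");
    println!("reader unblocked by byte {:?}", byte as char); // '!'
}
```

In the full design, select() would return whenever either fd becomes readable, and the reader would treat readability of the cancellation fd as "connection force-closed".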
And feel absolutely free to ignore and/or rewrite anything I send. I am well aware that my Rust skills are poor; indeed, I still can't write lifetimes which actually compile yet, as I haven't had the time to practice.
Thank you for understanding. :) You are correct in your analysis that the congestion timeout is a retry timeout. According to the LEDBAT specification, this timeout doubles every time it is exceeded during a receive operation, and it has a ceiling of 60 seconds (which was the point I tried to make in my previous comment). I intentionally reused that same rule in the connection timeout (despite the difference in situations), admitting at most 5 timeouts in
I see your point. I hope your changes go smoothly and you don't have to fight the compiler too much. The compiler can be unforgiving at times, especially when lifetimes are involved (and they tend to proliferate like bacteria once you need them!).

[1]: A total of 6 attempts: 1 + 2 + 4 + 8 + 16 + 32 = 63 seconds.
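For what it's worth, the doubling schedule discussed above can be written out as a small illustrative Rust program. This is not rust-utp code: the 1-second initial timeout is inferred from the footnote's arithmetic, while the 60-second ceiling and six total attempts come from the comments above.

```rust
use std::time::Duration;

/// Illustrative retry schedule: the timeout doubles after each failed
/// attempt, capped at `cap`, allowing `max_retries` retries after the
/// first attempt (so six attempts total for max_retries = 5).
fn retry_schedule(initial: Duration, cap: Duration, max_retries: u32) -> Vec<Duration> {
    let mut timeout = initial;
    let mut schedule = Vec::new();
    for _ in 0..=max_retries {
        schedule.push(timeout);
        timeout = std::cmp::min(timeout * 2, cap);
    }
    schedule
}

fn main() {
    let schedule = retry_schedule(Duration::from_secs(1), Duration::from_secs(60), 5);
    let total: Duration = schedule.iter().sum();
    // 1 + 2 + 4 + 8 + 16 + 32 = 63 seconds across six attempts
    println!("{:?} -> total {:?}", schedule, total);
}
```

With a longer retry budget the 60-second ceiling would start to bite; with these numbers it never does, which is why the total comes out at exactly 63 seconds.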
…don't necessarily exit, rather they check to see if the connection got closed in another thread, and if so then they might exit.
…esult, and you need to check again :(. Anyway, this fixes a spurious hang on Linux.
@meqif Ricardo, it looks like my time with MaidSafe will be coming to an end in the next few weeks, so can I check what your schedule will be this coming month? When do you submit your thesis?
@ned14 My schedule is not entirely rigid, except for the meetings with my thesis adviser and deadlines related to paper submissions. The deadline for submitting the thesis is late September, although the plan is to finish it by late July. I will try to carve out as much time as I can (within reason) to make and integrate the needed changes in the library, so you can get crust working with "vanilla" rust-utp before your time is up.
@meqif It currently looks like my time with MaidSafe will end somewhere between mid-July and the end of July. I'd like to deliver working UTP support by then, so what I was going to do was keep monkey-patching my fork of rust-utp with temporary code to get crust and the rest of the MaidSafe codebase working on UTP. That should round out my time with MaidSafe nicely.

What I'd suggest to you is that you draw up a contract proposal for a refactoring of rust-utp that is threadsafe and reentrant, and send it along with an estimate of hours and your desired hourly rate to MaidSafe. That way, instead of making the needed changes for free, you get paid.

Regarding a suitable hourly rate to propose, it's negotiable based on the strength of your CV and how much the client wants your time. I'll assume you won't be familiar with this (apologies if you are and this is patronising), so, taking off any MaidSafe hat, I can tell you a fresh graduate working remotely from home on open source programming with no commercial programming experience might be worth maybe €20/hour in the EU. With three years of commercial programming experience that might reach €30-35/hour. With ten years and a strong open source portfolio in your CV it's about €50/hour. Remember that working remotely comes with about a 30-35% rate discount compared with being on-site, but equally there are no commuting costs. The big bucks arise only if you aren't writing open source and/or are on-site: if you're working for a US hedge fund on proprietary trading algorithms you would start negotiating from €150/hour upwards, but then your liability insurance and work enjoyment are substantially affected. Also, as a rule, American clients pay more than European clients, but you'll find markedly increased pressure to deliver code as quickly as possible, and they really dislike you taking time off. In Europe we tend to prefer a work-life balance, and you are not punished culturally for saying so; in the US, if you don't claim to live only to work, you're viewed as a slacker and treated accordingly.

Anyway, that's my recommendation to you. I would certainly advise looking at this situation as an opportunity to earn some money and get valuable commercial programming experience onto your CV, rather than working ad hoc for free. Besides, I'd assume MaidSafe will need someone to replace me :) and it might as well be you.
@meqif To keep you in the loop, I've spent today investigating and prototyping how to solve a problem in crust with UTP: we're running out of file descriptors during the unit testing. The problem is caused by my use of a pipe to work around the inability to cancel blocking reads, so I am causing 3x the normal file descriptor consumption. The crust unit tests configure two hundred connections, so that's 600 fds. I'll admit I have no idea how that then exceeds the 1024 maximum to cause the hard fault, but the real point is that three fds per UTP connection is unacceptable.

I've decided, after a bit of research, that on POSIX at least I'll combine pselect() and pthread_kill() to let one thread unblock reads occurring in other threads. That should dispense with the need for a pipe. On Windows none of this is a problem, as everything there is natively async, so cancelling reads is a cinch. I'll get to work on that on Monday.

An alternative for you, in a later ground-up refactor of rust-utp, is to add magic unblocking datagram support: to unblock reads happening in other threads, you ping them with the magic datagram, taking care that this doesn't become a DDoS attack vector. A null-sized datagram also works on some platforms if I remember correctly, but not all.
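The "magic unblocking datagram" alternative can be sketched with just the standard library. This is a hypothetical illustration, not rust-utp code: as noted above, whether an empty datagram wakes a blocked receiver is platform-dependent (it does on Linux), and a real implementation would have to authenticate the wake-up packet to address the DDoS concern.

```rust
use std::io::Result;
use std::net::UdpSocket;
use std::thread;

/// Sketch of the "magic unblocking datagram": a zero-sized datagram
/// sent at a socket blocked in recv_from wakes it with a zero-length
/// read. Works on Linux; not guaranteed on every platform.
fn wake_blocked_reader() -> Result<usize> {
    let reader = UdpSocket::bind("127.0.0.1:0")?;
    let addr = reader.local_addr()?;

    let handle = thread::spawn(move || -> Result<usize> {
        let mut buf = [0u8; 16];
        // Blocks until any datagram arrives, including an empty one.
        let (len, _from) = reader.recv_from(&mut buf)?;
        Ok(len)
    });

    // The wake-up: an empty datagram aimed at the blocked reader.
    let waker = UdpSocket::bind("127.0.0.1:0")?;
    waker.send_to(&[], addr)?;

    handle.join().expect("reader panicked")
}

fn main() {
    let len = wake_blocked_reader().expect("wake-up failed");
    println!("reader unblocked, received {} bytes", len);
}
```

A production version would carry a tag or nonce in the wake-up datagram and verify the sender, so that an attacker cannot wake (or flood) readers at will.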
…tion. Works very nicely, though there is obviously a spurious loop in rust-utp because it's occasionally timing out. I'll nail that tomorrow.
@ned14 Sorry for the late response. I'm very grateful for your advice. :) I'm sure you tried this, but I want to check for peace of mind: did you try calling `shutdown` on the socket?
@meqif I've rebooted into Windows to write the Windows support, but I am pretty sure that shutdown on UDP sockets is not portable. If you look at the POSIX spec for shutdown, http://pubs.opengroup.org/onlinepubs/009695399/functions/shutdown.html, you will note the language assumes a connection-based transport, and https://bugs.python.org/issue19530 suggests calling shutdown on UDP sockets is a nest of vipers.

Regarding the loop problem, it is due to two things: (i) I'm force-shutting the reader thread down too soon by breaking reads when the writer thread exits, thus occasionally losing packets, which causes the unit test to hang until timeout; and (ii) there is a race, due to lack of synchronisation, in the graceful close() sequencing, caused by me forcing the connection closed in (i) from the other side, so depending on timing it can hang. The force close is necessary because once the writer thread has exited there is no code available to pump the recv() function, as the reader thread is still blocked on reads. The existing code tries to work around that by "breaking reads", which really means the reader thread sets a socket timeout of 1 ns, i.e. it polls in a loop for new packets, constantly checking whether the connection state is closed, and thereby pumping recv for us.

But to be honest, I'm just grafting bad hack atop bad hack here, on a library not designed for concurrent read/write (no offence intended). What rust-utp really needs is a ground-up redesign around concurrent operations, and that is what I'd suggest you send MaidSafe a quote for. If you choose mio as your base you'd make your life much easier. I'd even wonder if porting mio to Windows IOCP is the least-cost path forwards; why mio isn't already on Windows IOCP is beyond me :( To be honest, if I were implementing mio I'd take Boost.ASIO and wrap it with the mio Rust API. mio is thus solved for all platforms quickly and easily, and you're not constantly hammering down weird async bugs, as ASIO has already invested a decade in doing that for you; indeed, ASIO is entering the official C++ standard next release. I suspect wrapping a C++ library isn't considered Rust-idiomatic though :(
@ned14 Carl (mio's author) mentioned he's working on Windows support in another library, but I can't seem to find it in his public repositories. There's also rust-iocp, which looks unmaintained. At a glance, Peter's winapi-rs looks promising for the Windows side of things. I'm not sure how idiomatic that approach (wrapping Boost.ASIO) would be, but I agree it's a viable path nonetheless.
@meqif Note that Boost.ASIO doesn't require the rest of Boost; it only needs a C++11 compiler, and one of those comes with LLVM :), so Rust is sorted. :) ASIO's enormous benefit is that it is debugged: all those weird Windows quirks, and the quirks across XP, Win7 and WinRT, are all handled and worked around. Any new Rust library would likely take at least three to five years to achieve similar reliability, as indeed ASIO originally did. Indeed, Microsoft themselves use ASIO internally, as it's easier than dealing with their own bugs, plus it targets all their platform stacks equally.

I also think such a wrapper would be highly Rust-idiomatic in practice, as Rust does most of what C++ does, so a C++ pattern such as `buffer_sequence bufferseq = ...;` would in Rust become the highly idiomatic `let bufferseq = /* some slice */;`. My single biggest concern would be how to maintain such wrappers. There is no tooling at all for automating Rust wrappers of C++, so you'd probably have to wrap ASIO in an automatically generated C API using SWIG/CppComponents and then build a Rust mirror on top of that C API.

Anyway, these are just fanciful musings on my part. One daydreams when one spends hours getting pthread signals to unblock reads!
…the non-blocking sockets emulation in Winsock2. Note that the Windows emulation of POSIX non-blocking sockets has the unfortunate habit of returning WSAEWOULDBLOCK everywhere, which is not correctly trapped by Rust's libc as WouldBlock, thus causing some of the rust-utp unit tests to fail. If you wrap all socket operations with a map_err() closure trapping and handling WSAEWOULDBLOCK, it should work perfectly. crust, however, now passes beautifully on Windows, so I'm pressing onwards without fixing up the rest of rust-utp (sorry!).
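The map_err() suggestion above might look something like the following sketch. The `normalise` helper is hypothetical (not rust-utp code); 10035 is WSAEWOULDBLOCK's documented Winsock value, hard-coded here so the sketch also compiles on non-Windows hosts. (Current Rust toolchains map this code correctly on Windows builds; the 2015-era toolchain under discussion apparently did not.)

```rust
use std::io::{Error, ErrorKind};

/// WSAEWOULDBLOCK's documented value in Winsock2.
const WSAEWOULDBLOCK: i32 = 10035;

/// Normalise an I/O error so the Winsock "would block" code is treated
/// as ErrorKind::WouldBlock, as the comment above suggests doing in a
/// map_err() wrapper around every socket operation.
fn normalise(err: Error) -> Error {
    if err.raw_os_error() == Some(WSAEWOULDBLOCK) {
        Error::new(ErrorKind::WouldBlock, err)
    } else {
        err
    }
}

fn main() {
    // Simulate the raw Winsock error; on a non-Windows host this code
    // is not recognised by the platform, which is exactly the problem.
    let raw = Error::from_raw_os_error(WSAEWOULDBLOCK);
    let fixed = normalise(raw);
    assert_eq!(fixed.kind(), ErrorKind::WouldBlock);
    println!("normalised kind: {:?}", fixed.kind());
}
```

A call site would then read something like `socket.send_to(buf, &addr).map_err(normalise)`, after which the usual `ErrorKind::WouldBlock` handling works on both platforms.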
Crust set read timeout
Your earlier code doesn't retry a socket read that receives EINTR, plus I need connections to die if they are not live. I therefore added a loop around all reads with a CONNECTION_TIMEOUT of 20 secs, the same as for TCP on Linux.
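A sketch of such a loop, under the assumptions stated in the commit message (retry on EINTR, give up after an absolute 20-second CONNECTION_TIMEOUT). The `read` closure stands in for the real socket receive; this is an illustration, not the actual patch.

```rust
use std::io::{Error, ErrorKind, Result};
use std::time::{Duration, Instant};

/// Absolute connection timeout, as described above (20 s, matching the
/// TCP behaviour on Linux cited in the commit message).
const CONNECTION_TIMEOUT: Duration = Duration::from_secs(20);

/// Retry a read while it keeps being interrupted (EINTR surfaces in
/// Rust as ErrorKind::Interrupted), but give up once the absolute
/// deadline passes. `read` stands in for the real socket receive.
fn read_with_deadline<F>(mut read: F) -> Result<usize>
where
    F: FnMut() -> Result<usize>,
{
    let deadline = Instant::now() + CONNECTION_TIMEOUT;
    loop {
        match read() {
            // EINTR: just retry, unless the connection deadline passed.
            Err(ref e) if e.kind() == ErrorKind::Interrupted => {
                if Instant::now() >= deadline {
                    return Err(Error::new(ErrorKind::TimedOut, "connection timed out"));
                }
            }
            other => return other,
        }
    }
}

fn main() {
    // Simulate two EINTRs followed by a successful 42-byte read.
    let mut interrupts = 2;
    let result = read_with_deadline(|| {
        if interrupts > 0 {
            interrupts -= 1;
            Err(Error::new(ErrorKind::Interrupted, "EINTR"))
        } else {
            Ok(42)
        }
    });
    println!("read result: {:?}", result); // Ok(42)
}
```

In real code a per-read socket timeout (e.g. via `set_read_timeout`) would also be needed, so a single blocked read cannot itself outlive the absolute deadline.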