
Re-connection Strategy #2

Open
ProbablePrime opened this issue Jan 30, 2018 · 10 comments

@ProbablePrime

Awesome Library.

I was taking a look at the re-connection handler and noticed it just takes a static delay before attempting to reconnect.

I usually like to use some form of re-connection strategy to control how a re-connection occurs, for example an exponential back-off.

Would this be a suitable feature addition?

To keep the module size down, the constructor could take a function:

new Sockette(..., {
  reconnection: (attempts, maxAttempts) => Number
});
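For illustration only, such a strategy could be a capped exponential back-off. Everything below is hypothetical: the reconnection option doesn't exist in Sockette today (it currently takes a fixed timeout plus maxAttempts), and the URL and numbers are made up.

// Hypothetical capped exponential back-off: 1s, 2s, 4s, ... up to 30s.
const backoff = (attempts, maxAttempts) => Math.min(1000 * 2 ** attempts, 30000);

new Sockette('wss://example.com/ws', {
  maxAttempts: 10,
  reconnection: backoff // hypothetical option, per the suggestion above
});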
@lukeed
Owner

lukeed commented Jan 30, 2018

Hey, thanks!

Yeah, maybe. It would have to support static numbers & functions, all while staying below 350 bytes. That's my magic number for this module 😇

Originally, I was using an exponential delay, but I actually found it kind of useless because even a simple exponential function can prevent you from ever reaching maxAttempts... unless you're okay with waiting huge amounts of time.

Now I much prefer a predictable interval, even if it means that the max is reached quickly. I can then display an "Unable to Reconnect. Click to Retry~" prompt and repeat the process on user action. It's a lot easier to handle and skips the extra 200-300 bytes of timeout logic.
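As a concrete sketch of that pattern, using Sockette's documented timeout, maxAttempts, and onmaximum options (the #retry button is assumed to exist in the page, and open() is assumed to start a fresh connection and a new round of attempts):

// Fixed interval + manual retry, as described above.
const ws = new Sockette('wss://example.com/ws', {
  timeout: 1000,   // predictable 1s gap between attempts
  maxAttempts: 5,
  onmaximum: () => {
    document.querySelector('#retry').hidden = false; // "Unable to Reconnect. Click to Retry"
  }
});

document.querySelector('#retry').addEventListener('click', () => {
  document.querySelector('#retry').hidden = true;
  ws.open(); // re-open on user action
});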


I'll keep this open for future discussion, but (atm) I've reminded myself why I left this out. 😆

@lukeed lukeed added the maybe label Jan 30, 2018
@lpinca

lpinca commented Jan 30, 2018

Another issue with the current approach is that all clients will reconnect at the same time, DDoSing the server.

@lukeed
Owner

lukeed commented Jan 30, 2018

That's true regardless of time delay.

It doesn't matter if you have a 1000ms delay or a functional (curr, attempts) => attempts * 1e3 + curr * 500 delay. When the server shuts off, all connected clients are starting from 0 ± 500ms.

All clients will still be spamming reconnections at roughly the same time:

  • 1000, 2000, 3000, 4000...
  • 500, 2000, 3500, 5000...

It's still the server's responsibility to handle a reboot, with appropriate load balancing & co. in place.

@lpinca

lpinca commented Jan 30, 2018

@lukeed not if you add a random factor in your delay generation.

@lukeed
Owner

lukeed commented Jan 30, 2018

If you have 10k clients trying to reconnect at the same time, it doesn't really matter if you have a random factor, unless you're willing to randomize up to 10s for some poor user. The difference of a few ms makes 0 difference. You still need to build for that case on the server side.

@bmcminn

bmcminn commented Jan 31, 2018

The DDoSing issue sounds like a documentation concern: something this library is aware of but has minimal capability to resolve.

As @lukeed mentioned, the mitigation strategies for this are devops/server-level concerns that the client shouldn't have to worry about.

@ThisIsMissEm

Hi! From someone who once built large-scale websockets-as-a-service infrastructure (many years ago): you'll definitely want to introduce some jitter on reconnection, otherwise you'll get a thundering herd problem, as described by the other commenters.

Even if this is only 100-700ms, it'll still give you time to process the connection attempt and then try to handle it. The alternative is that you end up overloading various aspects of your infrastructure, making it impossible to restart any single component. Additionally, a simple Math.random() * 500 can really make a difference. Sure, it might still be 1,000-3,000 clients reconnecting at once, but that's far more manageable than 10,000 trying at once. Even if those 3,000 don't all manage to reconnect successfully, it still cuts load on the subsequent reconnect attempt.

It's great to make a small module, but with caching and service workers, size becomes practically irrelevant if an intermittent issue makes it really hard to recover availability due to a thundering herd. People will generally reload your page less often than they will reconnect.
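To make the jitter above concrete, here is an illustrative sketch; the helper below is not part of Sockette, and the base/cap values are arbitrary:

// "Full jitter" back-off: pick a random delay between 0 and a capped exponential value.
function jitteredDelay(attempt, base = 1000, cap = 30000) {
  const exp = Math.min(cap, base * 2 ** attempt); // 1s, 2s, 4s, ... capped at 30s
  return Math.random() * exp; // spreads clients out instead of reconnecting in lockstep
}

// Or, keeping a fixed interval and only adding the jitter suggested above:
// const delay = 1000 + Math.random() * 500;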

@ghost

ghost commented Feb 3, 2018

If your server can't handle a couple of reconnects, then you have serious problems server-side. First off, even if you spam a server with tons and tons of SYN packets, the Linux kernel will just discard any such packet it cannot handle, fully according to the TCP spec. The user-space process will not even see those connection attempts, and the sender will just resend them with exponential backoff until the server can handle them. You don't need to reinvent the wheel by implementing that same logic again on top, because it doesn't make any difference; latency + sender backoff is already achieving this for you. Also, any well-written server can handle millions of connections with no problem at all. Don't let Socket.IO marketing buzz become what you call a "source of truth", because those kinds of projects rely on nothing but fancy words and marketing bullshit to make you think they actually achieve anything.

@spankyed

spankyed commented Nov 9, 2021

Exponential back-off with some randomness to prevent a thundering herd can only help, not hurt, this library, and it would be simple to implement. I am very much OK with waiting huge amounts of time to reach maxAttempts, as opposed to quickly reaching maxAttempts after 20 or so spammed requests.

Alternatively, an extension could be made that begins a reconnect strategy when onmaximum is fired.
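One possible shape for such an extension, sketched on the assumption (not verified here) that Sockette's open() starts a fresh connection and a new round of attempts:

// Once onmaximum fires, keep retrying outside Sockette with a jittered exponential back-off.
let rounds = 0;
const ws = new Sockette('wss://example.com/ws', {
  timeout: 1000,
  maxAttempts: 5,
  onopen: () => { rounds = 0; }, // back online, reset the back-off
  onmaximum: () => {
    const delay = Math.min(30000, 1000 * 2 ** rounds) * (0.5 + Math.random()); // back-off + jitter
    rounds += 1;
    setTimeout(() => ws.open(), delay); // assumes open() begins a fresh round of attempts
  }
});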

@dav1app

dav1app commented Nov 22, 2021

Should I open a PR?
