Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce tooBusy false-negative #17

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

sauvainr
Copy link

@sauvainr sauvainr commented Nov 8, 2016

In my company we realize that toobusy.js generate a large amount of false-positive (Our settings are highWater = 80 & smootingFactor = 1/5).
Event with this conservative factor, in a sudden load increase on the server would cause currentLag to jump from 0 to 200+ ms and cause all consecutive requests to be rejected, even the server has largely the resource to handle them.

This commit bring a proposal that solve this situation:

  1. Limit the maximum lag value:
    As highWater * 2 == (100% too busy) we want to avoid the current lag to suddenly jump to a state of rejection of all requests.
    Limiting the lag metric to highWater * 2 insure a smooth and coherent current Lag increase.

  2. Inverse smoothfactor for decrementing the currentLag value:
    In a situation of a quick punctual overload of the system the recovery should be fast to avoid false-negative rejections when the resources are already available.
    Inverting the smoothfactor when the lag measure is smaller than the current lag insure full resources usage.

In my company we realize that toobusy.js generate a large amount of false-positive (Our settings are highWater = 80 & smootingFactor = 1/5).
Event with this conservative factor, in a sudden load on the service would cause currentLag to jump from 0 to 200+ ms which triggered cause all consecutive requests to be rejected even the server has largely the resource to handle them.

This commit bring a proposal that solve this situation:
1. Limit the maximum lag value:
As highWater * 2 == (100% too busy) we want to avoid the current lag to suddenly jump to a stat of rejection all requests.
Limiting the lag metric to highWater * 2 insure a smooth and coherent current Lag increase.

2. Inverse smoothfactor for decrementing the currentLag value:
In a situation of a quick punctual overload of the system the recovery should be fast to avoid false-negative rejections when the resources are already available.
Inverting the smoothfactor when the lag measure is smaller than the current lag insure full resources usage.
@asilvas
Copy link

asilvas commented Nov 8, 2016

I've used this pattern for quite some time, and agree it has it's limitations. It's especially flaky with apps that can have bursts of cpu-intensive workloads, easily spiking lag times even during relatively low traffic. If your entire app/api is reliably light weight, toobusy pattern works quite well.

For more complex apps, I recommend concurrency monitoring, capping the number of concurrently processed requests, and responding too busy if threshold is exceeded. Or monitor the average request/response timings averaged over the last X requests, and responding too busy if threshold exceeds Y.

@knoxcard
Copy link

knoxcard commented Jan 28, 2018

Sounds critical, has this been merged in and pushed up?

@sauvainr
Copy link
Author

Hi, @knoxcard as the thread didn't have updates, I am still using our fork in my company -> https://github.com/exosite/node-toobusy Which is on production for more then a year now and provided the expected behavior for us.

@asilvas may want to have a look again and decides if he wants to integrates or document the limitation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants