This repository has been archived by the owner on Apr 19, 2024. It is now read-only.

Scheduling over-limit requests #172

Open
Dreamsorcerer opened this issue May 3, 2023 · 17 comments
Labels
help wanted Extra attention is needed

Comments

@Dreamsorcerer

Is there any way to schedule currently over-limit requests?

e.g. I send a request that I want to happen eventually, but the rate limit has currently been reached.
The service could add the request to a queue and then return the success response once the rate limit allows.
This could also take a timeout parameter for the individual request. As the service can calculate when it will resolve the request (from the queue size and rate limit), it can reject immediately if it won't be resolved within the timeout.
This means the requester will know that the request will be handled eventually within the timeout, or will get an immediate rejection if not.
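To make the idea concrete, here's a minimal sketch of the accept/reject decision I have in mind (all names are made up for illustration; none of this is existing Gubernator API):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// schedule is a hypothetical helper showing the calculation described above:
// given how many requests are already waiting and the allowed rate, either
// return how long the caller must wait, or reject immediately if that wait
// would exceed the caller's timeout.
func schedule(queued int, ratePerSec float64, timeout time.Duration) (time.Duration, error) {
	wait := time.Duration(float64(queued+1) / ratePerSec * float64(time.Second))
	if wait > timeout {
		return 0, errors.New("over limit: would not be served within the timeout")
	}
	return wait, nil
}

func main() {
	// 10 requests already queued against a 10-requests-per-minute limit.
	wait, err := schedule(10, 10.0/60.0, 30*time.Second)
	fmt.Println(wait, err) // rejected: the 11th slot is ~66s away, past the 30s timeout
}
```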

@Dreamsorcerer
Author

Dreamsorcerer commented May 5, 2023

The service could add the request to a queue and then return the success response once the rate limit allows.

I suppose it could actually return a response immediately, with a new response status that says you are allowed to send a request at X time.

The advantages of this over simply retrying at the next reset_time are:

  • First-come-first-served ordering is enforced, whereas a retry attempt might be beaten by newer requests, possibly leaving a stale request never getting through.
  • Only one request is needed, instead of several retry requests.
  • The caller knows immediately if the request will be allowed or rejected, rather than having to wait until a timeout has passed.

@Baliedge
Contributor

Baliedge commented May 5, 2023

@Dreamsorcerer There is a ResetTime in the return value of GetRateLimits(). You should be able to use this timestamp to determine when the next allowable hit can occur.

@Dreamsorcerer
Author

See the previous bullet points. You'd still need to retry the request at ResetTime, when there could be hundreds of other requests all attempting to retry at exactly the same time (as they will all have received the same ResetTime). So, none of the advantages in those bullet points hold if you simply use ResetTime.

@Baliedge
Contributor

Baliedge commented May 8, 2023

Gubernator is not designed to maintain a session beyond the request lifecycle, nor does it make assumptions about what the client will do at a later time. If the GetRateLimits call returns OverLimit: true, then you'll just have to try again later. I would suggest implementing a client helper that would do the automation that you're requesting.

@Dreamsorcerer
Author

Dreamsorcerer commented May 8, 2023

But, it must already remember the number of requests that have happened in the past, so I don't see how this would require maintaining any more information than it does currently. It just needs to count requests that have been approved for the future as well as the past.

e.g. There must be some internal counter that tracks how many requests it is under the limit. So, the change might look something like the following (see the sketch after this list):

  • Allow that counter to go to negative as it counts how many requests over the limit it is.
  • From the rate limit information and over-rate count, it can calculate when the next request would be allowed.
  • If that is over the timeout value of the request, it rejects with an OverLimit response.
  • Otherwise, it increments the over-limit count and responds with a positive response that says you can make the request at X time.
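A minimal sketch of what that might look like, using made-up names rather than Gubernator's real internals:

```go
package main

import (
	"fmt"
	"time"
)

// checkOverLimit sketches the proposal above; it is not Gubernator code.
// remaining is allowed to go negative: a negative value counts how many
// requests have already been promised beyond the current limit.
func checkOverLimit(remaining *int64, limit int64, duration, timeout time.Duration) (time.Time, bool) {
	// One slot frees up every duration/limit.
	perSlot := duration / time.Duration(limit)

	// If this request were accepted, it would run this far in the future.
	wait := time.Duration(-(*remaining)+1) * perSlot
	if *remaining > 0 {
		wait = 0 // still under the limit: run immediately
	}

	if wait > timeout {
		// Reject immediately with a normal over-limit response.
		return time.Time{}, false
	}

	// Accept: decrement (possibly below zero) and tell the client when to run.
	*remaining--
	return time.Now().Add(wait), true
}

func main() {
	remaining := int64(0) // already at the limit
	at, ok := checkOverLimit(&remaining, 10, time.Minute, 30*time.Second)
	fmt.Println(at, ok) // allowed, scheduled ~6 seconds from now; remaining is now -1
}
```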

@Baliedge
Contributor

Baliedge commented May 9, 2023

What you describe sounds like a "burst" mode. The user wants to force a hit to be allowed even if it's over limit, but then consider the burst activity within the rate limiting math. Am I understanding correctly?

Either way, could you explain your use case for this feature?

@Dreamsorcerer
Author

The user wants to force a hit to be allowed even if it's over limit, but then consider the burst activity within the rate limiting math. Am I understanding correctly?

But, specifically, we need to know when the request is allowed to be made. We want to use this to match our outgoing requests to third-party API rate limits.

Either way, could you explain your use case for this feature?

So, we want to make some requests within a reasonable timeframe (in some cases up to a few minutes), but still want to complete as fast as possible.

So, as mentioned in the above bullet points, the retry approach would mean that if the system is overloaded with requests, we would need to keep retrying until the end of the timeframe, not knowing if the request will go through in time. This leaves users with a lengthy loading time, just to get no results.

With the proposed approach, we'd know immediately if we are going to timeout and give the user an error immediately. Enforcing a first-come-first-served approach would also increase reliability that an individual request will get through. And this all reduces resource usage (bandwidth etc.) compared to the retry approach.

@Baliedge
Contributor

But, specifically, we need to know when the request is allowed to be made. We want to use this to match our outgoing requests to third-party API rate limits.

To find the next available opportunity in advance, you can call GetRateLimits() with Hits=0 and then observe ResetTime.
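For reference, a rough sketch of that peek call, assuming the v2 Go client (constructor and field names are from memory of the proto, so double-check against the version you're running):

```go
package main

import (
	"context"
	"fmt"
	"time"

	guber "github.com/mailgun/gubernator/v2"
)

func main() {
	client, err := guber.DialV1Server("localhost:81", nil)
	if err != nil {
		panic(err)
	}

	// Hits: 0 peeks at the rate limit without consuming anything.
	resp, err := client.GetRateLimits(context.Background(), &guber.GetRateLimitsReq{
		Requests: []*guber.RateLimitReq{{
			Name:      "third_party_api",
			UniqueKey: "account:1234",
			Hits:      0,
			Limit:     10,
			Duration:  guber.Minute,
			Algorithm: guber.Algorithm_TOKEN_BUCKET,
		}},
	})
	if err != nil {
		panic(err)
	}

	r := resp.Responses[0]
	// ResetTime is a unix timestamp in milliseconds: it tells you when the
	// bucket resets, but it does not reserve a slot for you.
	fmt.Printf("remaining=%d, reset at %s\n", r.Remaining, time.UnixMilli(r.ResetTime))
}
```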

@Dreamsorcerer
Author

But, that doesn't register it as a hit, right? We're just back to the retry logic.

Let me try and come up with an example.

Leaky bucket with 10 requests per minute:

  • We're at the limit, and send another 10 requests with a 30s timeout.
  • The expected behaviour is for the first 5 to get a response saying they can execute at, for example, +6, +12, +18, +24 and +30 seconds.
  • The remaining 5 requests get a standard over-limit response and cancel their API requests (the next free slot would be +36 seconds, past the 30s timeout).
  • 15 seconds later, another request comes in with a 30s timeout; this one is allowed to make a request at +21 seconds (6 seconds after the last approved request).

Token bucket with 10 requests per minute:

  • We're at the limit, and send another 25 requests with a 2 min timeout.
  • The first 10 get a response that says they can execute at +60 seconds (for example, it obviously depends on when the period resets).
  • The next 10 get a response for +120 seconds (60 seconds after the first 10).
  • The last 5 get an over-limit response and cancel (they could only run at +180 seconds, past the 2-minute timeout).

@Dreamsorcerer
Author

Dreamsorcerer commented May 10, 2023

If I use ResetTime, then my understanding is that the token bucket example would look like:

  • 25 requests get an overlimit response with a ResetTime of +60 seconds.
  • 25 requests retry at +60 seconds.
  • 10 random requests (i.e. not necessarily the first 10 that originally tried) get approved.
  • 15 random requests get an overlimit response with a ResetTime of +120 seconds (relative to the original request).
  • 15 requests retry at +120 seconds.
  • 10 random requests get approved.
  • 5 random requests get an overlimit response and cancel their API request after waiting 2 mins for no reason.

@Dreamsorcerer
Author

Dreamsorcerer commented May 10, 2023

If you're using retry logic yourselves, then I think my proposal would significantly improve the performance issues reported in #74.

i.e. In the example above, the retry logic results in 65 requests (25 + 25 + 15), compared with only 25 in the proposal.

@thrawn01
Contributor

It sounds like you want to queue rate limit requests.

In a queue-style rate limit system, requests for a rate limit are placed in a queue and each item is processed in first-in-first-out (FIFO) order until the rate limit is reached. Once the rate limit is reached, subsequent requests are placed in the queue. Requests in the queue wait until the rate limit expires (token bucket) or until there is space in the bucket (leaky bucket). When the rate limit has room, the next request in the queue is processed. If any request remains in the queue for longer than a predetermined time (say 30s), it is removed from the queue and the client is informed that the rate limit has been reached. The client is then advised to retry again X amount of time in the future.

Gubernator isn't currently designed for this, but it's not impossible to implement. If the implementation was simple enough, I would approve such a feature to be merged.

The request would have to include a new behavior, i.e. Behavior_QUEUE. In the ideal situation the client would cancel the request to Gubernator once 30s is reached. gRPC would notify Gubernator this happened (I think), which in code we should see as a context cancel. The implementation would have to watch for that context cancel and remove the request from the queue.

To implement this correctly, one would need to implement a new version of the pool (https://github.com/mailgun/gubernator/blob/30437ac270ec6bbaae146c614ae1f3cfe7d91476/gubernator_pool.go#L56) which would queue requests for a specific rate limit hash and only return once the context is cancelled or the request returns UNDER_LIMIT. Additionally, it would return if the request stays in the queue for some maximum, configurable amount of time, just so we don't hold on to requests forever when the client doesn't implement connection timeouts properly.
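A very rough shape of that queued pool worker, purely illustrative (none of these types exist in Gubernator; the real implementation would hook into the existing pool and hash routing):

```go
package sketch

import (
	"context"
	"time"
)

// waiter is a hypothetical queued rate limit request.
type waiter struct {
	ctx  context.Context
	done chan bool // true = UNDER_LIMIT granted, false = cancelled or timed out
}

// queueWorker drains waiters for a single rate limit hash in FIFO order.
// tryHit stands in for the real rate limit check, and maxWait caps how long a
// request may sit in the queue when the client never cancels its context.
func queueWorker(queue <-chan *waiter, tryHit func() bool, maxWait time.Duration) {
	for w := range queue {
		w.done <- serve(w.ctx, tryHit, maxWait)
	}
}

func serve(ctx context.Context, tryHit func() bool, maxWait time.Duration) bool {
	deadline := time.After(maxWait)
	tick := time.NewTicker(100 * time.Millisecond) // poll until a slot frees up
	defer tick.Stop()

	for {
		if tryHit() {
			return true // UNDER_LIMIT: let the request through
		}
		select {
		case <-ctx.Done():
			return false // client cancelled, e.g. its 30s timeout elapsed
		case <-deadline:
			return false // held too long; don't keep requests around forever
		case <-tick.C:
			// try again on the next tick
		}
	}
}
```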

@thrawn01 thrawn01 added the help wanted Extra attention is needed label May 10, 2023
@Dreamsorcerer
Author

Dreamsorcerer commented May 11, 2023

Yes, a queue is approximately what I meant. But I think the way I described a possible implementation earlier (#172 (comment)) would avoid the need to actually manage a queue, and has the benefit of being able to immediately cancel requests that would time out, instead of waiting until the end of the timeout period.

I think that behaviour could also be enabled simply by specifying the desired timeout in the request, rather than adding a Behavior_QUEUE. i.e. If no timeout value is given, then it would behave exactly the same as today.

@thrawn01
Contributor

  • Allow that counter to go to negative as it counts how many requests over the limit it is.
  • From the rate limit information and over-rate count, it can calculate when the next request would be allowed.
  • If that is over the timeout value of the request, it rejects with an OverLimit response.
  • Otherwise, it increments the over-limit count and responds with a positive response that says you can make the request at X time.

I think I follow what you are saying. However, what happens if X number of clients request the same rate limit at the same time and the condition you described is hit? In the positive response case, how do you know which of the client requests gets the positive response? Without some sort of queue, all clients would get a positive response.

The only way to fairly respond is by implementing a queue such that the first client request in the queue gets the positive response.

I hope I understand your suggestion and you follow my reasoning. Let me know if we are crossing wires here.

@Dreamsorcerer
Author

Dreamsorcerer commented May 30, 2023

In the positive response case, how do you know which of the client requests gets the positive response?

How is that different to the current case? i.e. Remaining is 1, and several requests come in. How do you ensure that only 1 gets the positive response? It seems like it should work exactly the same to me, we just allow remaining to go to a negative value rather than 0.

I think the specific bit of logic I'm proposing changing is here:
https://github.com/mailgun/gubernator/blob/master/algorithms.go#L179-L202

So, instead of checking for something like hits > remaining, it'd be checking for something along the lines of hits > remaining + (limit / duration * timeout) (that's obviously simplified, and closer to the logic of the leaky bucket, but hopefully you get the idea). Then, when remaining < 0, you'd give the new response with a timestamp of now + abs(remaining / (limit / duration)), which is the time that the client is allowed to make their request at.
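As a rough sketch of that modified check (illustrative only; not the real algorithms.go code, and the exact fields would differ):

```go
package main

import (
	"fmt"
	"time"
)

// allowWithin sketches the check described above. remaining may go negative;
// limit/duration is the refill rate, and timeout is how long the caller is
// willing to wait for its slot.
func allowWithin(remaining *int64, hits, limit int64, duration, timeout time.Duration) (time.Time, bool) {
	ratePerSec := float64(limit) / duration.Seconds()

	// Extra hits that can be absorbed by slots freeing up within the timeout.
	slack := int64(ratePerSec * timeout.Seconds())
	if hits > *remaining+slack {
		return time.Time{}, false // over limit even after waiting out the timeout
	}

	*remaining -= hits
	if *remaining >= 0 {
		return time.Now(), true // under the limit: go now
	}

	// Over the limit but within the timeout: tell the client when to go.
	wait := time.Duration(float64(-*remaining) / ratePerSec * float64(time.Second))
	return time.Now().Add(wait), true
}

func main() {
	remaining := int64(0) // bucket already exhausted
	at, ok := allowWithin(&remaining, 1, 10, time.Minute, 30*time.Second)
	fmt.Println(at, ok) // allowed, ~6s out (using the simplified, leaky-bucket-style refill)
}
```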

I'd call this "cooperative queueing", as it depends on the clients waiting until the designated time to send their API request. This obviously makes the assumption that the server and clients have their clocks synchronised, but given that most machines use NTP or similar, that seems reasonable to me.
The clock sync requirement is the only drawback to this method I've thought of. The advantages are immediate cancellation for timeouts and less resources needed (RAM and number of open connections).

@thrawn01
Contributor

thrawn01 commented Jun 6, 2023

Cooperative queueing and the functionality already supported by Gubernator sound similar. Gubernator already provides clients with the ability to calculate when to retry a rate limit request using the reset_time value returned by the rate limit request. This allows clients to estimate a future time when they are likely to receive an "under the limit" response.

If I understand your original request correctly,

The service could add the request to a queue and then return the success response once the rate limit allows.

You were looking for a mechanism where an "under the limit" response is guaranteed within a specific timeout, as long as the client is the first in line to receive such a response. In this case, implementing a queue is the only solution. By adding the rate limit request to a queue and returning a success response once the rate limit allows, we ensure that the client receives an "under the limit" response within the specified timeout.

On the other hand, if you are open to the possibility of knowing when a "likely" (but not guaranteed) "under the limit" response will be returned, then the client can make use of the reset_time value to calculate a future time when a rate limit request is expected to return an "under the limit" response. This approach provides flexibility but does not guarantee an "under the limit" response within a specific request timeout.

Again, we might be crossing wires here.

@Dreamsorcerer
Author

Gubernator already provides clients with the ability to calculate when to retry a rate limit request using the reset_time value returned by the rate limit request.

Yes, but in contrast to either queuing proposal, this doesn't guarantee a FIFO order of operations (or even guarantee that a particular request will be sent in a reasonable timeframe at all). It also can't tell you whether a request will go through or not until it either does or the timeout is reached.

In this case, implementing a queue is the only solution.

Except that a queue is not the only way to implement that: my suggestion above, using a count of how far over the limit we are and having the client wait until the designated time, would achieve the same result without needing to store a queue in memory or leave connections open for a long time.

If you want to be able to abort immediately when there's going to be a timeout, you'd need to calculate the timing based on the queue size. At that point, even if the request will complete within the timeout, you might as well just return that calculated time to the client immediately and let it make the request at that time on its own. And then you might as well get rid of the queue and just track its size (i.e. the number of requests over the limit), which can be done simply by using a negative value for remaining.

Again, we might be crossing wires here.

I fully understand everything you've said; I just don't see any issues with my other proposal yet, except for needing clock synchronisation, while it has several resource-usage advantages over a full queue implementation.
