Scheduling over-limit requests #172
Comments
I suppose it could actually return a response immediately, with a new response status that says you are allowed to send a request at X time. The advantages to this over having the service simply retrying at the next reset_time are:
- The client knows immediately whether its request can go through within its timeout, instead of retrying blindly until the deadline.
- It avoids hundreds of clients all retrying at exactly the same reset time.
- Requests are effectively served first-come-first-served, so an individual request is more likely to get through.
- It reduces the bandwidth and load of repeated retries.
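As a rough sketch of the shape such a response could take (the Scheduled status and ScheduledAt field are made-up names, not Gubernator's API; the other fields only roughly mirror the existing rate limit response):

```go
package sketch

// Hypothetical response shape for the proposal: the server still counts
// the hit, but tells the client when it is allowed to actually use it.
type Status int

const (
	UnderLimit Status = iota
	OverLimit
	Scheduled // proposed: the hit is granted, but only from ScheduledAt onwards
)

type RateLimitResp struct {
	Status      Status
	Limit       int64
	Remaining   int64 // may go negative under the proposal
	ResetTime   int64 // unix epoch milliseconds
	ScheduledAt int64 // proposed: unix epoch milliseconds at which the request may be sent
}
```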
@Dreamsorcerer There is a ResetTime in the response, which already tells you when the rate limit resets. |
See the previous bullet points. You'd still need to retry the request at ResetTime, when there could be hundreds of other requests all attempting to retry at exactly the same time (as they will all have received the same ResetTime). So, none of those bullet points hold if you simply use ResetTime. |
Gubernator is not designed to maintain a session beyond the request lifecycle, nor does it make assumptions about what the client will do at a later time. |
But it must already remember the number of requests that have happened in the past, so I don't see how this would require maintaining any more information than it does currently. It just needs to remember requests that have been approved for the future as well as for the past. e.g. there must be some internal counter tracking how many requests it is under the limit. So, the change might look something like:
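A minimal sketch of that kind of change (hypothetical names, not Gubernator's actual bucket code): keep decrementing the remaining count past zero and use the negative value to work out which future window the hit lands in.

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a toy token bucket that never rejects a hit outright; instead
// it tells the caller when its hit becomes valid.
type bucket struct {
	limit       int64
	duration    time.Duration
	remaining   int64     // limit minus hits granted; may go negative
	windowStart time.Time // start of the current window
}

// take grants one hit and returns the time at which the caller is
// allowed to actually send its request.
func (b *bucket) take(now time.Time) time.Time {
	if b.windowStart.IsZero() {
		b.windowStart = now
		b.remaining = b.limit
	}
	// Roll the window forward, refilling by `limit` each time but keeping
	// any debt from hits already scheduled into future windows.
	for now.Sub(b.windowStart) >= b.duration {
		b.windowStart = b.windowStart.Add(b.duration)
		b.remaining += b.limit
		if b.remaining > b.limit {
			b.remaining = b.limit
		}
	}
	b.remaining--
	if b.remaining >= 0 {
		return now // under the limit: send immediately
	}
	// Over the limit: the first `limit` over-limit hits land at the next
	// reset, the next `limit` one window later, and so on.
	over := -b.remaining
	windowsAhead := 1 + (over-1)/b.limit
	return b.windowStart.Add(time.Duration(windowsAhead) * b.duration)
}

func main() {
	b := &bucket{limit: 10, duration: time.Minute}
	now := time.Now()
	for i := 1; i <= 25; i++ {
		fmt.Printf("hit %2d allowed at %s\n", i, b.take(now).Format(time.Kitchen))
	}
}
```

With a 10/minute limit, hits 1-10 are allowed immediately, hits 11-20 come back scheduled for the next reset, and hits 21-25 for the window after that.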
What you describe sounds like a "burst" mode. The user wants to force a hit to be allowed even if it's over the limit, but then have that burst activity counted in the rate limiting math. Am I understanding correctly? Either way, could you explain your use case for this feature? |
But, specifically, we need to know when the request is allowed to be made. We want to use this to match our outgoing requests to third-party API rate limits.
So, we want to make some requests within a reasonable timeframe (in some cases up to a few minutes), but still complete them as fast as possible. As mentioned in the bullet points above, the retry approach means that if the system is overloaded with requests, we would need to keep retrying until the end of the timeframe, without knowing whether the request will go through in time. That leaves users with a lengthy loading time, only to get no results. With the proposed approach, we'd know immediately if we are going to time out and could give the user an error straight away. Enforcing a first-come-first-served order would also make it more likely that an individual request gets through. And all of this reduces resource usage (bandwidth etc.) compared to the retry approach. |
To find the next available opportunity in advance you can call GetRateLimits with Hits set to 0; the ResetTime in the response tells you when capacity will be available again. |
But, that doesn't register it as a hit, right? We're just back to the retry logic. Let me try and come up with an example. Leaky bucket with 10 requests per minute:
Token bucket with 10 requests per minute:
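For concreteness (illustrative numbers only, assuming 25 hits all arriving at t=0), the scheduling approach might look like:

```
Leaky bucket, 10/min (one slot drains every 6s), 25 hits at t=0:
  hits  1-10  allowed immediately
  hit   11    scheduled for t=6s, hit 12 for t=12s, ... hit 25 for t=90s

Token bucket, 10/min, 25 hits at t=0:
  hits  1-10  allowed immediately
  hits 11-20  scheduled for t=60s (the next reset)
  hits 21-25  scheduled for t=120s
```

Each client makes exactly one rate limit request and is told up front when its hit becomes valid.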
If I use ResetTime, then my understanding is that the token bucket example would look like:
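Again with illustrative numbers (25 hits at t=0, each over-limit client retrying once per ResetTime), the retry version would be roughly:

```
t=0s     25 requests  -> 10 UNDER_LIMIT, 15 OVER_LIMIT (ResetTime = 60s)
t=60s    15 retries   -> 10 UNDER_LIMIT,  5 OVER_LIMIT (ResetTime = 120s)
t=120s    5 retries   ->  5 UNDER_LIMIT
```

Every over-limit client has to come back and ask again, all at the same instant, and if clients poll more often than once per reset the total number of requests grows further.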
If you're using retry logic yourselves, then I think my proposal would significantly improve the performance issues reported in #74. i.e. in the example above, 65 requests are made with the retry logic, compared with only 25 under the proposal. |
It sounds like you want to queue rate limit requests. In a queue-style rate limit system, requests for a rate limit are placed in a queue and processed in first-in-first-out (FIFO) order until the rate limit is reached. Once the rate limit is reached, subsequent requests wait in the queue until the rate limit expires (token bucket) or until there is space in the bucket (leaky bucket). When the rate limit has room, the next request in the queue is processed. If any request remains in the queue for longer than a predetermined time (e.g. 30s), it is removed from the queue and the client is informed that the rate limit has been reached and advised to retry again in X amount of time.

Gubernator isn't currently designed for this, but it's not impossible to implement. If the implementation was simple enough, I would approve such a feature to be merged. The request would have to include a new behavior. To implement this correctly, one would need to implement a new version of the pool. |
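A very rough sketch of that queue-style behaviour (my own illustration, not Gubernator code), where callers block roughly FIFO until the window has room or a 30s-style deadline passes:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var ErrOverLimit = errors.New("over the limit, retry later")

type queueLimiter struct {
	slots chan struct{} // capacity == hits allowed per window
}

func newQueueLimiter(limit int, window time.Duration) *queueLimiter {
	q := &queueLimiter{slots: make(chan struct{}, limit)}
	refill := func() {
		for i := 0; i < limit; i++ {
			select {
			case q.slots <- struct{}{}:
			default: // already full
			}
		}
	}
	refill()
	go func() {
		for range time.Tick(window) {
			refill() // top the window back up to the limit
		}
	}()
	return q
}

// Wait blocks until a slot is available or maxWait (e.g. 30s) expires.
func (q *queueLimiter) Wait(maxWait time.Duration) error {
	select {
	case <-q.slots:
		return nil // under the limit: the caller may proceed
	case <-time.After(maxWait):
		return ErrOverLimit // queue did not drain in time: advise a retry
	}
}

func main() {
	q := newQueueLimiter(10, time.Minute)
	if err := q.Wait(30 * time.Second); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("request allowed")
}
```

The main cost is that every waiting client holds a connection (or at least server-side state) open for the duration of the wait.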
Yes, a queue is approximately what I meant. But, I think the way I described a possible implementation earlier (#172 (comment)) would avoid the need to actually manage a queue, and would have the benefit of being able to cancel requests that would time out immediately, instead of at the end of the timeout period. I think that behaviour could also be enabled simply by specifying the desired timeout in the request, rather than adding a new behavior. |
I think I follow what you are saying. However, what happens if X clients request the same rate limit at the same time and the condition you described is hit? In the positive response case, how do you know which of the client requests gets the positive response? Without some sort of queue, all clients would get a positive response. The only way to respond fairly is by implementing a queue such that the first client request in the queue gets the positive response. I hope I understand your suggestion and you follow my reasoning. Let me know if we are crossing wires here. |
How is that different to the current case? i.e. Remaining is 1, and several requests come in. How do you ensure that only one gets the positive response? It seems like it should work exactly the same to me; we just allow remaining to go to a negative value rather than stopping at 0. I think the specific bit of logic I'm proposing changing is the over-limit check: instead of rejecting once remaining hits 0, keep decrementing and use the negative value to compute the time to return.

I'd call this "cooperative queueing", as it depends on the clients waiting until the designated time to send their API request. This obviously assumes that the server and clients have their clocks synchronised, but given that most machines use NTP or similar, that seems reasonable to me. |
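On the client side, cooperative queueing amounts to something like this (a sketch, assuming the response carries the allowed time as in the earlier sketches):

```go
package main

import (
	"fmt"
	"time"
)

// sendWhenAllowed sketches the client side of "cooperative queueing": the
// server has already counted the hit and returned the time at which it may
// be used; the client simply waits until then before calling the
// third-party API. This relies on client and server clocks being
// reasonably in sync (NTP).
func sendWhenAllowed(allowedAt time.Time, send func() error) error {
	if wait := time.Until(allowedAt); wait > 0 {
		time.Sleep(wait)
	}
	return send()
}

func main() {
	// allowedAt would normally come from the rate limit response.
	allowedAt := time.Now().Add(2 * time.Second)
	_ = sendWhenAllowed(allowedAt, func() error {
		fmt.Println("calling the third-party API now")
		return nil
	})
}
```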
If I understand your original request correctly, you were looking for a mechanism where an "under the limit" response is guaranteed within a specific timeout, as long as the client is first in line to receive such a response. In this case, implementing a queue is the only solution. By adding the rate limit request to a queue and returning a success response once the rate limit allows, we ensure that the client receives an "under the limit" response within the specified timeout.

On the other hand, if you are open to knowing when a "likely" (but not guaranteed) "under the limit" response will be returned, then the client can make use of the ResetTime. Again, we might be crossing wires here. |
Yes, but in contrast to either of the queuing proposals, this doesn't guarantee a FIFO order of operations (or even guarantee that a particular request will be sent in a reasonable timeframe at all). It also can't tell you whether a request will go through until it either does or the timeout is reached.
Except that a queue is not the only way to implement that: my above suggestion of counting over the limit and having the client wait until the designated time would achieve the same result without needing to store a queue in memory or leave connections open for a long time. If you want to be able to abort immediately when a request is going to time out, you'd need to calculate the timing based on the queue size. At that point, even if it's going to happen within the timeout, you might as well just return that calculated time to the client immediately and let it make the request at that time on its own. And at that point, you might as well get rid of the queue and just track its size (i.e. the number of requests over the limit), which can be done just by using a negative value for Remaining.
I fully understand everything you've said; I just don't see any issues with my other proposal yet, except for needing clock synchronisation, while it has several advantages in resource usage over a full queue implementation. |
Is there any way to schedule currently over-limit requests?
e.g. I send a request that I want to happen eventually, but the rate limit has currently been reached.
The service could add the request to a queue and then return the success response once the rate limit allows.
This could also take a timeout parameter for the individual request. As the service can calculate when it will resolve the request (from the queue size and rate limit), it can reject immediately if it won't be resolved within the timeout.
This means the requester will know that the request will be handled eventually within the timeout, or will get an immediate rejection if not.
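A small sketch of that calculation (hypothetical names, not an existing Gubernator API): from the number of requests already waiting and the rate limit, the service can either return the time at which the new request would be served or reject it immediately.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var ErrTimeout = errors.New("request cannot be served within the timeout")

// schedule computes when a new request would be served, given how many
// requests are already waiting, and rejects immediately if that falls
// outside the caller's timeout.
func schedule(queued, limit int64, window time.Duration,
	nextReset time.Time, timeout time.Duration) (time.Time, error) {

	// The new request sits behind `queued` others; each window serves `limit`.
	windowsAhead := queued / limit
	servedAt := nextReset.Add(time.Duration(windowsAhead) * window)
	if time.Until(servedAt) > timeout {
		return time.Time{}, ErrTimeout // reject immediately
	}
	return servedAt, nil
}

func main() {
	// e.g. 23 requests already waiting on a 10/minute limit that resets in
	// 30s, and the caller is only willing to wait 2 minutes.
	servedAt, err := schedule(23, 10, time.Minute, time.Now().Add(30*time.Second), 2*time.Minute)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("request will be served at", servedAt)
}
```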