[RFC][QSB] Approaches to enforcement of system resource limits #11846
Labels
discuss
Issues intended to help drive brainstorming and decision making
enhancement
Enhancement or improvement to existing feature or request
RFC
Issues requesting major changes
Roadmap:Stability/Availability/Resiliency
Project-wide roadmap label
Search:Resiliency
Is your feature request related to a problem? Please describe
This is a meta request for the QSB feature request, intended to gather community feedback on possible approaches.
Describe the solution you'd like
Approaches to Enforce Resource Limits
I can think of two ways to enforce resource consumption limits. The first focuses on allocating and maintaining a fixed amount of resource usage for each sandbox, while the second is flexible and tries to make optimum use of the resources available.
Let's understand them with the help of some examples. For the sake of simplicity I am using a single value for each resource limit here, but there will be two limits for each system resource: low and high.
Constrained
Let's say we have 3 sandboxes in the system.
System-wide resource limit: 90
Let's capture the current resource usage of the sandboxes at different times.
Cancellation Case: sandbox limit breached
Sandbox2 will start rejecting new requests and will cancel some of its running requests.
Cancellation Case: system limit breached
Here the cells in bold will see cancellations because the cumulative usage breaches the system limit. This means that sandbox2 will face cancellations even though its sandbox-level limit is not breached.
Rejection Case:
In this case sandbox2 will face rejections because its sandbox-level limit is breached.
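The constrained cases above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the names `Decision` and `evaluate_constrained` are hypothetical, the per-sandbox limits and usage numbers are made up (only the system-wide limit of 90 comes from the example), and the choice that every sandbox with non-zero usage becomes eligible for cancellation on a system-level breach is my assumption.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    reject_new: bool       # stop admitting new requests for this sandbox
    cancel_running: bool   # cancel some in-flight requests of this sandbox


def evaluate_constrained(usage, sandbox_limits, system_limit):
    """Return a Decision per sandbox for one usage snapshot."""
    system_breached = sum(usage.values()) > system_limit
    decisions = {}
    for name, used in usage.items():
        sandbox_breached = used > sandbox_limits[name]
        decisions[name] = Decision(
            reject_new=sandbox_breached,
            # Assumption: on a system-level breach, every sandbox with
            # non-zero usage is eligible for cancellation.
            cancel_running=sandbox_breached or (system_breached and used > 0),
        )
    return decisions


# Hypothetical numbers; only the system-wide limit of 90 is from the example.
limits = {"sandbox1": 50, "sandbox2": 40, "sandbox3": 30}
snapshot = {"sandbox1": 30, "sandbox2": 35, "sandbox3": 30}  # total 95 > 90
decisions = evaluate_constrained(snapshot, limits, system_limit=90)
```

In this snapshot no sandbox exceeds its own limit, yet sandbox2 is still marked for cancellation because the cumulative usage exceeds the system-wide limit.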
Reserved
Let's say we have 3 sandboxes in the system.
The sandbox limits in this example are chosen so that they add up to 100, which is inherent in this approach.
Cancellation Case: sandbox limit breached
(1) At this point sandbox2 will start rejecting new incoming requests.
(2) At this point we will also start cancelling running requests from sandbox2 due to the sandbox-level resource limit breach.
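The two points (1) and (2) suggest a two-step escalation: rejection starts first and cancellation follows once usage grows further. A minimal sketch of that idea, assuming two per-sandbox thresholds that I call `reject_at` and `cancel_at` here (hypothetical names; they may or may not map onto the low and high limits mentioned earlier), could look like this:

```python
def sandbox_action(used, reject_at, cancel_at):
    """Classify one sandbox's usage against two escalating thresholds."""
    if reject_at > cancel_at:
        raise ValueError("reject_at must not exceed cancel_at")
    if used >= cancel_at:
        # Point (2): keep rejecting and also cancel running requests.
        return "reject_and_cancel"
    if used >= reject_at:
        # Point (1): only stop admitting new requests.
        return "reject"
    return "accept"


# Hypothetical thresholds for sandbox2, chosen only for illustration.
actions = [sandbox_action(u, reject_at=25, cancel_at=30) for u in (20, 27, 31)]
```

Whether a threshold counts as breached at equality (`>=`) or only above it is also an assumption of this sketch.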
Cancellation Case: system level limit breached
In this case sandbox2 will start cancelling requests because it is the lowest-priority sandbox.
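To make the system-level case concrete, here is a rough sketch of picking the cancellation target by priority. The function names, the numeric priorities (a lower number meaning lower priority), the system limit of 90, and all usage values are assumptions for illustration only; the one thing taken from the text is that the lowest-priority sandbox is cancelled first and that the reserved limits add up to 100.

```python
def limits_are_reserved(sandbox_limits, total=100):
    """Check the reserved-approach invariant: limits add up to the total."""
    return sum(sandbox_limits.values()) == total


def sandbox_to_cancel(usage, priority, system_limit):
    """Return the sandbox to cancel from, or None if the system is healthy."""
    if sum(usage.values()) <= system_limit:
        return None
    # Only sandboxes that are actually using resources can be cancelled from.
    active = [name for name, used in usage.items() if used > 0]
    # Assumption: a smaller priority number means a lower priority.
    return min(active, key=lambda name: priority[name])


# Hypothetical configuration and snapshot.
limits = {"sandbox1": 40, "sandbox2": 30, "sandbox3": 30}
priority = {"sandbox1": 3, "sandbox2": 1, "sandbox3": 2}
usage = {"sandbox1": 38, "sandbox2": 29, "sandbox3": 28}  # total 95

target = sandbox_to_cancel(usage, priority, system_limit=90)
```

Here no sandbox exceeds its own limit, but the total of 95 is above the assumed system limit, so the lowest-priority sandbox (sandbox2) is selected.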
Decision-driving factors for selecting one of the above approaches
Personal Verdict
Problems with the selected approach to enforce sys resource limits and possible solutions
The only open problem with this approach is keeping the cumulative resource limit at 100, since the user can supply an arbitrary value for a new sandbox.
To understand this with an example, let's say that at some point in time we have 3 sandboxes in the system.
Now let's say the user wants to create a new sandbox with a resource limit of 30. The new cumulative sum becomes 120 (> 100). This warrants either readjusting the existing sandbox limits or creating the new sandbox with a limit of 10.
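The arithmetic above can be sketched as follows. The individual existing limits (40, 30, 20) are hypothetical and chosen only so that they sum to 90, which is what the example implies (120 - 30). Proportional scaling is just one possible way to readjust existing limits and is my assumption, not something decided in this RFC; the function names are hypothetical as well.

```python
TOTAL = 100  # cumulative limit that the reserved approach must maintain


def remaining_capacity(existing_limits, total=TOTAL):
    """Limit still available for a new sandbox."""
    return total - sum(existing_limits.values())


def capped_limit(existing_limits, requested, total=TOTAL):
    """Create the new sandbox with at most the remaining capacity."""
    return min(requested, remaining_capacity(existing_limits, total))


def rescaled_limits(existing_limits, new_name, requested, total=TOTAL):
    """Assumed readjustment policy: scale all limits proportionally."""
    combined = {**existing_limits, new_name: requested}
    factor = total / sum(combined.values())
    return {name: value * factor for name, value in combined.items()}


# Hypothetical existing limits summing to 90.
existing = {"sandbox1": 40, "sandbox2": 30, "sandbox3": 20}

headroom = remaining_capacity(existing)           # 100 - 90
capped = capped_limit(existing, requested=30)     # limited by the headroom
rescaled = rescaled_limits(existing, "sandbox4", requested=30)
```

Capping leaves the existing sandboxes untouched but gives the user less than requested, while rescaling honours the relative sizes at the cost of silently shrinking every existing sandbox.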
How do we resolve this conflict? I can think of two ways.
Personally, I think the 2nd option provides a better user experience, but I look forward to hearing from folks on this.
I am using the keyword Sandbox because that is the name we started envisioning this feature with, but it is not the final name for the construct to be used in the implementation.
Main Issues
Related component
Search:Resiliency
Describe alternatives you've considered
No response
Additional context
No response