
http ratelimit: option to reduce budget on stream done #37548

Open
wants to merge 18 commits into base: main
Conversation

mathetake
Member

@mathetake mathetake commented Dec 6, 2024

Commit Message: ratelimit: option to execute action on stream done

Additional Description:
This adds a new option apply_on_stream_done to the Action
message of the http ratelimit. It allows configuring actions
that are executed in a response-content-aware way without
enforcing the rate limit (in other words, "fire-and-forget").
Since the addend can currently be controlled via the
envoy.ratelimit.hits_addend metadata, another filter, for
example Lua or Ext Proc, can set that value to reflect its intent.

This use case arises from LLM API services, which usually return
usage statistics in the response body. More specifically, they
offer "streaming" APIs whose response is a line-by-line event
stream, where the very last line contains the usage statistics.
The lazy nature of this action is perfectly fine in these use
cases, since the rate limit takes effect as "you are forbidden
from the next request onwards".

Beyond the LLM-specific case, I've also encountered this use case
in data center resource allocation, where operators want to "block
further computation from the next request, since this request used
this much resources".

Ref: envoyproxy/gateway#4756
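For illustration, here is a hedged sketch of the companion-filter idea described above, using the Lua filter. The ``x-tokens-used`` header name and the exact dynamic metadata namespace/key layout are assumptions for this sketch; check the ratelimit filter docs for the precise shape it consumes:

```yaml
# Hypothetical sketch: a Lua filter that records usage reported by the
# upstream (assumed here to arrive in an "x-tokens-used" response header
# for simplicity) as the envoy.ratelimit.hits_addend dynamic metadata,
# which the ratelimit filter can then use as the addend on stream done.
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    default_source_code:
      inline_string: |
        function envoy_on_response(response_handle)
          local used = response_handle:headers():get("x-tokens-used")
          if used ~= nil then
            -- Namespace/key layout below is an assumption for this sketch.
            response_handle:streamInfo():dynamicMetadata():set(
                "envoy.ratelimit.hits_addend", "hits_addend", tonumber(used))
          end
        end
```

In a real deployment the usage figure would typically be parsed from the last line of the event-stream body (e.g. via Ext Proc), not from a header; the header is used here only to keep the sketch short.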

Risk Level: low
Testing: done
Docs Changes: done
Release Notes: TODO
Platform Specific Features: n/a


As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

Caused by: #37548 was opened by mathetake.


CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @wbpcode
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).


@mathetake mathetake marked this pull request as ready for review December 6, 2024 22:10
@mathetake
Member Author

mathetake commented Dec 6, 2024

I guess the impl can be a bit large, so I might do that in separate PRs. Anyway, I will think about it after the API gets approved.

Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
@wbpcode
Member

wbpcode commented Dec 7, 2024

Wow, we have a similar requirement internally and I finally figured out a similar way. I am super surprised and happy to see this.

@mathetake
Member Author

Cool, glad to hear that you came to a similar idea!

Signed-off-by: Takeshi Yoneda <[email protected]>
…nd for future extension

Signed-off-by: Takeshi Yoneda <[email protected]>
@mathetake
Member Author

@wbpcode thank you for the valuable feedback offline! I think I will go ahead and try implementing the idea. I don't think the change will be that huge.

Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
@mathetake mathetake changed the title http ratelimit: option to excute action on stream done ratelimit: option to excute action on stream done Dec 9, 2024
@mathetake

This comment was marked as outdated.

Signed-off-by: Takeshi Yoneda <[email protected]>
@mathetake mathetake changed the title ratelimit: option to excute action on stream done http ratelimit: option to excute action on stream done Dec 9, 2024
@mathetake

This comment was marked as outdated.

@mathetake mathetake changed the title http ratelimit: option to excute action on stream done http ratelimit: option to reduce budget on stream done Dec 9, 2024
Contributor

@arkodg arkodg left a comment


From an API perspective, LGTM, thanks! This would also help users count response-based attributes into the same RL bucket.

@mathetake
Member Author

Now I am working on polishing the impl...

Signed-off-by: Takeshi Yoneda <[email protected]>
@mathetake
Member Author

@wbpcode I feel like the impl is ready for review; it's much simpler than I thought. Could you review it? Meanwhile I will add more tests.

Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
Member

@wbpcode wbpcode left a comment


Thanks for this great contribution and sorry for the delay. I added a comment to the API.

/wait

Comment on lines +137 to +156

// If true, rate limit requests will also be sent to the rate limit service when the stream completes.
// This is useful when the rate limit budget needs to reflect response context that is not available
// on the request path.
//
// On stream completion, the filter reuses the exact same descriptors matched during the request path.
// In other words, the descriptors are not recalculated on stream completion; the rate limit requests
// are sent with the same descriptors as the original request sent during the request path. For
// example, request header matching descriptors are still available on stream completion.
//
// As a concrete example, say the upstream service calculates usage statistics, returns them in the
// response body, and we want to use these numbers to apply the rate limit action to subsequent
// requests. Combined with another filter that can set ``envoy.ratelimit.hits_addend`` based on the
// response (e.g. the Lua filter), this can be used to subtract the usage statistics from the rate
// limit budget.
//
// The rate limit requests sent on stream completion are "fire-and-forget" by nature, and the rate
// limit is not enforced on the HTTP stream being completed. The filter only updates the budget for
// subsequent requests at that point. Hence the effect of the rate limit requests made on stream
// completion is not visible to the current request, only to subsequent ones.
bool apply_on_stream_done = 14;
Member

  1. Rather than a filter-level flag, I think it should be per-descriptor level (in route.v3.RateLimit). Then different routes could make different choices. More importantly, different resources could also make different choices: e.g., a QPS limit only needs to work on the request path, while an AI tokens limit only needs to work on the response.

  2. The semantics should be clear: request or response, but not both. When we populate the descriptors, we can tell populateDescriptors() which phase it is and let it populate only the related descriptors. This brings some complexity in that users need to configure two descriptors: one for the request with a 0 or 1 hits addend for check only, and one for the response with CUSTOM_HITS_ADDEND or CUSTOM_HITS_ADDEND - 1 for report. But the semantics are clearer and the behavior is more predictable. If we let the same descriptor work on both request and response, we must ensure we can get correct, different values for hits_addend from the same source, which is weird.
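For reference, a hedged sketch of how the flag as quoted (filter-level at this point in the review) might appear in a ratelimit filter configuration. The domain and cluster names are made up for illustration, and the flag's final placement in the config tree may differ after the review above:

```yaml
# Hypothetical configuration of the http ratelimit filter with the proposed
# flag; all names here are illustrative, not from the PR.
- name: envoy.filters.http.ratelimit
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
    domain: llm-usage                        # illustrative domain
    rate_limit_service:
      transport_api_version: V3
      grpc_service:
        envoy_grpc:
          cluster_name: rate_limit_cluster   # illustrative cluster
    # Proposed flag: on stream completion, send a fire-and-forget rate limit
    # request that reuses the descriptors matched on the request path.
    apply_on_stream_done: true
```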
