http ratelimit: option to reduce budget on stream done #37548

Open: wants to merge 18 commits into main.

CI (Envoy) / Mobile/Release validation: skipped Dec 12, 2024 in 0s

Check was skipped: this check was not triggered in this CI run.

Request (pr/37548/main@011a816)

mathetake (@mathetake): 011a816 (#37548), merge main@602a2b9

http ratelimit: option to reduce budget on stream done

Commit Message: ratelimit: option to execute action on stream done

Additional Description:
This adds a new option, apply_on_stream_done, to the Action
message of the HTTP rate limit configuration. It allows actions to be
configured so that they are executed when the stream is done, in a
response-content-aware way, without enforcing the rate limit (in other
words, "fire-and-forget"). Since the addend can already be controlled
via the envoy.ratelimit.hits_addend metadata, another filter can set
that value to reflect the intended cost of the request, for example a
Lua or Ext Proc filter.
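To illustrate the "fire-and-forget" semantics, here is a minimal in-memory sketch in Python. It is a toy model, not Envoy's implementation or the rate limit service's API: the class `UsageBudget` and its methods `allow_request` and `on_stream_done` are hypothetical names. It assumes the request path only checks whether the budget is already exhausted, while the stream-done path records the actual usage (the hits addend) without ever rejecting anything.

```python
class UsageBudget:
    """Hypothetical toy model of a per-key budget; not an Envoy API."""

    def __init__(self, limit: int) -> None:
        self.limit = limit              # allowed usage per key (e.g. tokens per window)
        self.used: dict[str, int] = {}  # key -> usage recorded so far

    def allow_request(self, key: str) -> bool:
        # Request path (assumption): only check whether the budget is already
        # exhausted; nothing is consumed here because the cost is not known yet.
        return self.used.get(key, 0) < self.limit

    def on_stream_done(self, key: str, hits_addend: int) -> None:
        # Stream-done path: record the actual usage reported in the response.
        # This never rejects anything, so a single request may overshoot.
        self.used[key] = self.used.get(key, 0) + hits_addend


budget = UsageBudget(limit=1000)
first = budget.allow_request("user-a")   # budget untouched, request allowed
budget.on_stream_done("user-a", 1200)    # response reports 1200 units; overshoot is accepted
second = budget.allow_request("user-a")  # the *next* request is now rejected
```

The design choice this models is that enforcement is deferred: the request that exhausts the budget still completes, and only subsequent requests are blocked.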

This use case comes from LLM API services, which usually return
usage statistics in the response body. More specifically, they offer
"streaming" APIs whose response is a line-by-line event stream in
which the very last line contains the usage statistics. The lazy
nature of this action is acceptable because, in these use cases, the
rate limit takes effect as "you are blocked starting with your next
request".
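As a sketch of how the addend could be derived from such a stream, the Python function below scans an event-stream body from the end and returns the usage count from the last event that carries one. The body format is an assumption (OpenAI-style server-sent events: `data: {...}` lines, optionally terminated by `data: [DONE]`, with usage in a `usage` object), and `usage_from_event_stream` is a hypothetical helper; in Envoy this logic would live in whatever filter sets envoy.ratelimit.hits_addend.

```python
import json
from typing import Optional


def usage_from_event_stream(body: str, field: str = "total_tokens") -> Optional[int]:
    """Return `usage[field]` from the last data event that reports usage.

    Assumes an OpenAI-style SSE body; this is an illustrative sketch only.
    """
    for line in reversed(body.splitlines()):
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            continue  # terminator sentinel carries no usage
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore malformed or partial lines
        usage = event.get("usage") if isinstance(event, dict) else None
        if isinstance(usage, dict) and field in usage:
            return int(usage[field])
    return None  # no usage statistics found in the stream


stream = (
    'data: {"choices": [{"delta": {"content": "Hi"}}], "usage": null}\n\n'
    'data: {"choices": [], "usage": {"prompt_tokens": 12, '
    '"completion_tokens": 30, "total_tokens": 42}}\n\n'
    "data: [DONE]\n\n"
)
tokens = usage_from_event_stream(stream)  # 42, the value to use as the hits addend
```

Scanning backwards matches the description that the usage statistics arrive at the very end of the stream, so intermediate chunks (whose `usage` is null here) are skipped.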

Beyond LLMs, I've also encountered this use case in data center
resource allocation, where operators want to "block further
computation starting with the next request, since you used this many
resources in this request".

Ref: envoyproxy/gateway#4756

Risk Level: low
Testing: TODO
Docs Changes: done (via comments in proto)
Release Notes: TODO
Platform Specific Features: n/a


Note: the description might not reflect the actual change, as it is still being discussed and developed; please refer to the diff for now.

Environment

Request variables

Key            Value
ref            f9e79c7
sha            011a816
pr             37548
base-sha       602a2b9
actor          mathetake (@mathetake)
message        http ratelimit: option to reduce budget on stream done ...
started        1733966086.098874
target-branch  main
trusted        false
Build image

Container image/s (as used in this CI run)

Key            Value
default        envoyproxy/envoy-build-ubuntu:f94a38f62220a2b017878b790b6ea98a0f6c5f9c
mobile         envoyproxy/envoy-build-ubuntu:mobile-f94a38f62220a2b017878b790b6ea98a0f6c5f9c
Version

Envoy version (as used in this CI run)

Key            Value
major          1
minor          33
patch          0
dev            true