docs(spec): Add specification for output rate limiting #15665
Closed
# Output rate limiting

## Objective

Allow controlling the metric rate sent by outputs.

## Keywords

output plugins, rate limit, buffer

## Overview

Output plugins send metrics to their corresponding services, respecting the
configured `metric_batch_size` and `flush_interval`. While this works well in
most situations, there are cases where an output sends a large number of
metrics in a short time span, e.g. when one or more inputs gather a large
number of metrics in a short amount of time, or when an output reconnects to
its service after a longer disconnect. In all of those cases a large number
of batches is prepared and sent via the output plugin, potentially
overwhelming the service with the number of metrics received.
Furthermore, use-cases exist where operators want to provision limited
resources to Telegraf and therefore want to control the data rate to a
service.

This specification intends to introduce an _optional_ rate-limiting feature,
configurable per output, to gain control over the sending rate of output
plugins. Therefore, a new `metric_rate_limit` setting is proposed, allowing
users to set the maximum number of metrics sent __per second__ via an output.
By default, the metric rate must be unlimited.
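
A minimal sketch of how such a configuration could look, assuming the
`metric_rate_limit` option is added per output as proposed above (the
`influxdb_v2` output and its URL are placeholders only):

```toml
[[outputs.influxdb_v2]]
  urls = ["http://localhost:8086"]

  ## Proposed option: maximum number of metrics this output may send per
  ## second. If not set, the rate is unlimited (the default).
  # metric_rate_limit = 1000
```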

In case the specified rate limit is reached, a smaller batch satisfying the
limit is sent or, if no metrics are left, the write cycle is skipped by the
output. The user should be informed in the logs whenever the rate limit
applies.
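
To illustrate the intended behavior, here is a small, self-contained Go sketch
of how a write cycle could trim its batch to the remaining per-second budget.
All names (`trimToBudget`, the `metric` placeholder type) are hypothetical and
not part of Telegraf's code base:

```go
package main

import "fmt"

// metric is a stand-in for telegraf.Metric in this illustrative sketch.
type metric struct{ name string }

// trimToBudget returns the metrics that may still be sent in the current
// one-second window. It returns nil when the per-second budget is already
// exhausted, in which case the write cycle would be skipped entirely.
func trimToBudget(batch []metric, rateLimit, sentThisSecond int) []metric {
	remaining := rateLimit - sentThisSecond
	if remaining <= 0 {
		return nil // budget exhausted: skip this write cycle
	}
	if len(batch) > remaining {
		return batch[:remaining] // send a smaller batch satisfying the limit
	}
	return batch
}

func main() {
	batch := make([]metric, 150)
	send := trimToBudget(batch, 100, 30) // limit of 100 metrics/s, 30 already sent
	fmt.Printf("sending %d of %d metrics this cycle\n", len(send), len(batch))
}
```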

## Caveats

It is important to note that setting a metric rate limit poses a severe
constraint on an output, so the feature should be used carefully. Please make
sure the configured metric rate limit exceeds the average rate at which
metrics are gathered by inputs.
In case the limit is set too low, i.e. below the average rate at which metrics
are gathered by inputs, the output might not be able to send the metrics fast
enough. In turn, the metric buffer will fill up and metrics will be dropped.
Telegraf might not be able to recover from this situation if the output rate
is permanently below the input rate.
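
For example (with purely illustrative numbers): if the inputs gather about
5,000 metrics per 10-second collection interval, the average input rate is
roughly 500 metrics per second, so a `metric_rate_limit` well above 500 would
be required to keep the buffer from filling up.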

## Related Issues

- [#15353](https://github.com/influxdata/telegraf/issues/15353) rate limiting processor proposal
My only concern is that in many cases a cloud or database or service will put limits in terms of "MB per second". Here is some prior art from InfluxDB: influxdata/influxdb#19660
The logical next question from a user is "how do I know how many metrics translate to N MB per second".
I think this must be done in the output plugin, as the message size depends on the serialization of the metrics and this is NOT known to the framework. The proposed solution limits the peak rate of metrics sent; it is not about server-side rate limiting.
I have no disagreement with this and certainly wasn't implying moving this out of the output plugin.
Correct, but one of the reasons people want this option is that the service they are sending to will have limits on how much it can accept. For example, using the influxdb output, we know some free plans are limited to 17 kb/s. How can a user take that limit and apply it to their Telegraf config?
I don't see customers or users asking for "limit to N metrics per second". Rather, it is a data rate per second.
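As a rough illustration of that translation (the per-metric size is an assumption, not a measured value): if the 17 kb/s plan allows about 17,000 bytes per second and an average metric serializes to roughly 200 bytes of line protocol, that budget corresponds to roughly 85 metrics per second. A user would therefore have to estimate their own average serialized metric size to derive a `metric_rate_limit` value from a byte limit.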
Maybe I am getting this convoluted by trying to solve two things at once:
Your proposal is clearly tackling the first, but I see users grabbing this option to also tackle 2 as well.
Can we consider 2 to avoid yet another option?