feat(processors.regex): Allow batch transforms using named groups #13971
Changes from all commits
bea17dd
683ac96
c47cf07
88621fe
1cb4a2b
65c5968
fc9e1cd
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,19 +1,12 @@ | ||
# Regex Processor Plugin | ||
|
||
The `regex` plugin transforms tag and field values with regex pattern. If | ||
`result_key` parameter is present, it can produce new tags and fields from | ||
existing ones. | ||
This plugin transforms tag and field _values_ and renames tags, fields | ||
and metrics using regex patterns. Tag and field _values_ can be transformed | ||
using named groups in a batch fashion. | ||
|
||
The regex processor **only operates on string fields**. It will not work on | ||
any other data types, like an integer or float. | ||
|
||
For tags transforms, if `append` is set to `true`, it will append the | ||
transformation to the existing tag value, instead of overwriting it. | ||
|
||
For metrics transforms, `key` denotes the element that should be | ||
transformed. Furthermore, `result_key` allows control over the behavior applied | ||
in case the resulting `tag` or `field` name already exists. | ||
|
||
## Global configuration options <!-- @/docs/includes/plugin_config.md --> | ||
|
||
In addition to the plugin-specific configuration settings, plugins support | ||
|
@@ -30,74 +23,221 @@ See the [CONFIGURATION.md][CONFIGURATION.md] for more details. | |
[[processors.regex]] | ||
namepass = ["nginx_requests"] | ||
|
||
# Tag and field conversions defined in a separate sub-tables | ||
## Tag value conversion(s). Multiple instances are allowed. | ||
[[processors.regex.tags]] | ||
## Tag to change, "*" will change every tag | ||
## Tag(s) to process with optional glob expressions such as '*'. | ||
Review comment: While unlikely... what happens if a user had * in their metric name?
Reply: We support what we always support in filtering... The asterisk is just an example and your comment applies to the other dozens of filter parameters. What do you expect here? And should we also add this information to all the other filter instances? |
||
key = "resp_code" | ||
## Regular expression to match on a tag value | ||
## Regular expression to match the tag value. If the value doesn't | ||
## match the tag is ignored. | ||
pattern = "^(\\d)\\d\\d$" | ||
## Matches of the pattern will be replaced with this string. Use ${1} | ||
## notation to use the text of the first submatch. | ||
## Replacement expression defining the value of the target tag. You can | ||
## use regexp groups or named groups e.g. ${1} references the first group. | ||
replacement = "${1}xx" | ||
|
||
## Name of the target tag defaulting to 'key' if not specified. | ||
## In case of wildcards being used in `key` the currently processed | ||
## tag-name is used as target. | ||
# result_key = "method" | ||
## Appends the replacement to the target tag instead of overwriting it when | ||
## set to true. | ||
# append = false | ||
|
||
## Field value conversion(s). Multiple instances are allowed. | ||
[[processors.regex.fields]] | ||
## Field to change | ||
## Field(s) to process with optional glob expressions such as '*'. | ||
key = "request" | ||
## All the power of the Go regular expressions available here | ||
## For example, named subgroups | ||
## Regular expression to match the field value. If the value doesn't | ||
## match or the field doesn't contain a string the field is ignored. | ||
pattern = "^/api(?P<method>/[\\w/]+)\\S*" | ||
## Replacement expression defining the value of the target field. You can | ||
## use regexp groups or named groups e.g. ${method} references the group | ||
## named "method". | ||
replacement = "${method}" | ||
## If result_key is present, a new field will be created | ||
## instead of changing existing field | ||
result_key = "method" | ||
## Name of the target field defaulting to 'key' if not specified. | ||
## In case of wildcards being used in `key` the currently processed | ||
## field-name is used as target. | ||
# result_key = "method" | ||
|
||
# Multiple conversions may be applied for one field sequentially | ||
# Let's extract one more value | ||
[[processors.regex.fields]] | ||
key = "request" | ||
pattern = ".*category=(\\w+).*" | ||
replacement = "${1}" | ||
result_key = "search_category" | ||
|
||
# Rename metric fields | ||
## Rename metric fields | ||
[[processors.regex.field_rename]] | ||
## Regular expression to match on a field name | ||
## Regular expression to match on the field name | ||
pattern = "^search_(\\w+)d$" | ||
## Matches of the pattern will be replaced with this string. Use ${1} | ||
## notation to use the text of the first submatch. | ||
## Replacement expression defining the name of the new field | ||
replacement = "${1}" | ||
## If the new field name already exists, you can either "overwrite" the | ||
## existing one with the value of the renamed field OR you can "keep" | ||
## both the existing and source field. | ||
# result_key = "keep" | ||
|
||
# Rename metric tags | ||
# [[processors.regex.tag_rename]] | ||
# ## Regular expression to match on a tag name | ||
# pattern = "^search_(\\w+)d$" | ||
# ## Matches of the pattern will be replaced with this string. Use ${1} | ||
# ## notation to use the text of the first submatch. | ||
# replacement = "${1}" | ||
# ## If the new tag name already exists, you can either "overwrite" the | ||
# ## existing one with the value of the renamed tag OR you can "keep" | ||
# ## both the existing and source tag. | ||
# # result_key = "keep" | ||
|
||
# Rename metrics | ||
# [[processors.regex.metric_rename]] | ||
# ## Regular expression to match on an metric name | ||
# pattern = "^search_(\\w+)d$" | ||
# ## Matches of the pattern will be replaced with this string. Use ${1} | ||
# ## notation to use the text of the first submatch. | ||
# replacement = "${1}" | ||
## Rename metric tags | ||
[[processors.regex.tag_rename]] | ||
## Regular expression to match on a tag name | ||
pattern = "^search_(\\w+)d$" | ||
## Replacement expression defining the name of the new tag | ||
replacement = "${1}" | ||
## If the new tag name already exists, you can either "overwrite" the | ||
## existing one with the value of the renamed tag OR you can "keep" | ||
## both the existing and source tag. | ||
# result_key = "keep" | ||
|
||
## Rename metrics | ||
[[processors.regex.metric_rename]] | ||
## Regular expression to match on a metric name | ||
pattern = "^search_(\\w+)d$" | ||
## Replacement expression defining the new name of the metric | ||
replacement = "${1}" | ||
``` | ||
|
||
Please note, you can use multiple `tags`, `fields`, `tag_rename`, `field_rename` | ||
and `metric_rename` sections in one processor. All of those are applied. | ||
|
||
### Tag and field _value_ conversions | ||
|
||
Conversions are only applied if a tag/field _name_ matches the `key`, which can | ||
contain glob statements such as `*` (asterisk), _and_ the `pattern` matches the | ||
tag/field _value_. For fields, the value must be of type `string` for the | ||
conversion to apply. If any of these criteria is not met, the conversion is | ||
not applied to the metric. | ||
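The three conditions above can be sketched in Go as follows. This is only an illustration, not the plugin's code: `shouldConvert` is a hypothetical helper, and `path.Match` from the standard library is an assumed stand-in for the glob matching the plugin actually uses, which may behave differently for unusual names.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
)

// shouldConvert reports whether a value conversion would apply.
// Hypothetical helper: path.Match is only a stand-in for the plugin's
// real glob matching.
func shouldConvert(name string, value interface{}, keyGlob, pattern string) bool {
	// 1. The tag/field name has to match the (glob) key.
	nameMatches, err := path.Match(keyGlob, name)
	if err != nil || !nameMatches {
		return false
	}
	// 2. Only string values are considered.
	s, isString := value.(string)
	if !isString {
		return false
	}
	// 3. The value has to match the regular expression.
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	fmt.Println(shouldConvert("request", "/api/search/", "req*", `^/api`)) // true
	fmt.Println(shouldConvert("resp_bytes", int64(270), "*", `^/api`))     // false: not a string
	fmt.Println(shouldConvert("referrer", "-", "*", `^/api`))              // false: value does not match
}
```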
|
||
The `replacement` option specifies the value of the resulting tag or field. It | ||
Review comment: Should there be another header here for replacement?
Reply: No it describes how the |
||
can reference capturing groups by index (e.g. `${1}` being the first group) or | ||
by name (e.g. `${mygroup}` being the group named `mygroup`). | ||
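To make the two reference styles concrete, here is a small sketch using Go's standard `regexp` package with the patterns from the sample configuration. The helper name `expand` is made up for this illustration, and the sketch only assumes that the plugin's `replacement` follows Go's `${...}` expansion rules.

```go
package main

import (
	"fmt"
	"regexp"
)

// expand replaces every match of pattern in value with replacement,
// resolving ${index} and ${name} group references the way Go's
// regexp package does. Hypothetical helper for illustration only.
func expand(pattern, replacement, value string) string {
	re := regexp.MustCompile(pattern)
	return re.ReplaceAllString(value, replacement)
}

func main() {
	// Reference by index: keep only the first digit of the status code.
	fmt.Println(expand(`^(\d)\d\d$`, "${1}xx", "200")) // 2xx

	// Reference by name: extract the API method from the request path.
	request := "/api/search/?category=plugins&q=regex&sort=asc"
	fmt.Println(expand(`^/api(?P<method>/[\w/]+)\S*`, "${method}", request)) // /search/
}
```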
|
||
By default, the currently processed tag or field is overwritten by the | ||
`replacement`. To create a new tag or field you can additionally specify the | ||
`result_key` option containing the new target tag or field name. In case the | ||
given tag or field already exists, its value is overwritten. For `tags` you | ||
might use the `append` flag to append the `replacement` value to an existing | ||
tag. | ||
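The interplay of `result_key` and `append` described above might look like the following sketch. `convertTag` and its parameters are hypothetical names chosen for this example. It models tags as a plain map and is not taken from the plugin's source.

```go
package main

import (
	"fmt"
	"regexp"
)

// convertTag applies one value conversion to a tag set.
// Hypothetical helper: an empty resultKey means "overwrite the
// processed tag", appendValue mirrors the `append` option.
func convertTag(tags map[string]string, key, pattern, replacement, resultKey string, appendValue bool) {
	re := regexp.MustCompile(pattern)
	value, ok := tags[key]
	if !ok || !re.MatchString(value) {
		return // tag missing or value not matching: nothing to do
	}
	target := key
	if resultKey != "" {
		target = resultKey
	}
	newValue := re.ReplaceAllString(value, replacement)
	if appendValue {
		// Append to whatever the target tag currently holds.
		newValue = tags[target] + newValue
	}
	tags[target] = newValue
}

func main() {
	tags := map[string]string{"verb": "GET", "resp_code": "200"}

	// Append " OK" to the existing "verb" tag for 2xx response codes.
	convertTag(tags, "resp_code", `^2\d\d$`, " OK", "verb", true)
	fmt.Println(tags["verb"]) // GET OK

	// Without result_key the processed tag itself is overwritten.
	convertTag(tags, "resp_code", `^(\d)\d\d$`, "${1}xx", "", false)
	fmt.Println(tags["resp_code"]) // 2xx
}
```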
|
||
### Batch processing using named groups | ||
|
||
In `tags` and `fields` sections it is possible to use named groups to create | ||
multiple new tags or fields respectively. To do so, _all_ capture groups have | ||
to be named in the `pattern`. Additional non-capturing groups or other | ||
expressions are allowed. Furthermore, neither `replacement` nor `result_key` | ||
can be set, because the resulting tag/field name is the name of the group and | ||
the value is the group's content. | ||
Review comment: Worth an example below here? Or even below each of these?
Reply: There is an example in |
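As a rough illustration of the batch idea, the sketch below collects every named group of a match into a name/value map using Go's `regexp.SubexpNames`. `namedGroups` is a hypothetical helper. Unlike the plugin, which requires all capture groups to be named, this sketch simply skips unnamed ones.

```go
package main

import (
	"fmt"
	"regexp"
)

// namedGroups returns the content of every named capture group of the
// first match, keyed by group name. The second result is false if the
// pattern does not match. Hypothetical helper for illustration only.
func namedGroups(pattern, value string) (map[string]string, bool) {
	re := regexp.MustCompile(pattern)
	match := re.FindStringSubmatch(value)
	if match == nil {
		return nil, false
	}
	groups := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if i == 0 || name == "" {
			continue // index 0 is the whole match, "" is an unnamed group
		}
		groups[name] = match[i]
	}
	return groups, true
}

func main() {
	request := "/api/search/?category=plugins&q=regex&sort=asc"
	pattern := `^/api/(?P<method>\w+)[/?].*category=(?P<category>\w+)&(?:.*)`

	groups, ok := namedGroups(pattern, request)
	fmt.Println(ok, groups["method"], groups["category"]) // true search plugins
}
```

Each entry of the returned map corresponds to one new field (or tag) named after the group.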
||
|
||
### Tag and field _name_ conversions | ||
|
||
You can batch-rename tags and fields using the `tag_rename` and `field_rename` | ||
sections. Contrary to the `tags` and `fields` sections, the rename operates on | ||
the tag or field _name_, not its _value_. | ||
|
||
A tag or field is renamed if the given `pattern` matches the name. The new name | ||
is specified via the `replacement` option. Optionally, `result_key` can be | ||
set to either `overwrite` or `keep` (default) to control the behavior in case | ||
the target tag/field already exists. With `overwrite`, the target tag/field is | ||
replaced by the source tag/field, and the source is removed in any case. With | ||
`keep` (default), both the target and the source tag/field are left unchanged | ||
and no renaming takes place. | ||
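The `overwrite` versus `keep` behavior can be sketched as below. `renameFields` is a hypothetical helper working on a plain string map. It is an interpretation of the paragraph above, not the plugin's implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// renameFields renames every field whose name matches pattern.
// Hypothetical helper: resultKey is "overwrite" or "keep" and is only
// consulted when the target name already exists.
func renameFields(fields map[string]string, pattern, replacement, resultKey string) {
	re := regexp.MustCompile(pattern)

	// Snapshot and sort the names so the map is not modified while
	// ranging over it and the order is deterministic.
	names := make([]string, 0, len(fields))
	for name := range fields {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		if !re.MatchString(name) {
			continue
		}
		target := re.ReplaceAllString(name, replacement)
		if target == name {
			continue
		}
		if _, exists := fields[target]; exists && resultKey != "overwrite" {
			continue // "keep": leave source and target untouched
		}
		fields[target] = fields[name]
		delete(fields, name)
	}
}

func main() {
	keep := map[string]string{"client_ip": "127.0.0.1", "ip": "10.0.0.1"}
	renameFields(keep, `^client_(\w+)$`, "${1}", "keep")
	fmt.Println(keep["client_ip"], keep["ip"]) // 127.0.0.1 10.0.0.1

	overwrite := map[string]string{"client_ip": "127.0.0.1", "ip": "10.0.0.1"}
	renameFields(overwrite, `^client_(\w+)$`, "${1}", "overwrite")
	_, sourceLeft := overwrite["client_ip"]
	fmt.Println(overwrite["ip"], sourceLeft) // 127.0.0.1 false
}
```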
|
||
### Metric _name_ conversions | ||
|
||
Similar to the tag and field renaming, `metric_rename` section(s) can be used | ||
to rename metrics matching the given `pattern`. The resulting metric name is | ||
given via the `replacement` option. If the metric name matches `pattern`, the | ||
conversion is always applied. The `result_key` option has no effect on metric | ||
renaming and should not be specified. | ||
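A minimal sketch of metric renaming, again using only Go's standard `regexp` package. `renameMetric` is a hypothetical helper and the metric name `cpu` is an invented example of a non-matching name.

```go
package main

import (
	"fmt"
	"regexp"
)

// renameMetric returns the new metric name if pattern matches and the
// unchanged name otherwise. Hypothetical helper for illustration only.
func renameMetric(name, pattern, replacement string) string {
	re := regexp.MustCompile(pattern)
	if !re.MatchString(name) {
		return name
	}
	return re.ReplaceAllString(name, replacement)
}

func main() {
	// Strip everything from the underscore onwards.
	fmt.Println(renameMetric("nginx_requests", `^(\w+)_.*$`, "${1}")) // nginx

	// No underscore, so the pattern does not match and the name is kept.
	fmt.Println(renameMetric("cpu", `^(\w+)_.*$`, "${1}")) // cpu
}
```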
|
||
## Tags | ||
|
||
No tags are applied by this processor. | ||
|
||
## Example | ||
|
||
In the following examples we are using this metric | ||
|
||
```text | ||
nginx_requests,verb=GET,resp_code=2xx request="/api/search/?category=plugins&q=regex&sort=asc",method="/search/",category="plugins",referrer="-",ident="-",http_version=1.1,agent="UserAgent",client_ip="127.0.0.1",auth="-",resp_bytes=270i 1519652321000000000 | ||
nginx_requests,verb=GET,resp_code=200 request="/api/search/?category=plugins&q=regex&sort=asc",referrer="-",ident="-",http_version=1.1,agent="UserAgent",client_ip="127.0.0.1",auth="-",resp_bytes=270i 1519652321000000000 | ||
``` | ||
|
||
### Explicit specification | ||
|
||
```toml | ||
[[processors.regex]] | ||
namepass = ["nginx_requests"] | ||
|
||
[[processors.regex.tags]] | ||
key = "resp_code" | ||
pattern = "^(\\d)\\d\\d$" | ||
replacement = "${1}xx" | ||
|
||
[[processors.regex.fields]] | ||
key = "request" | ||
pattern = "^/api(?P<method>/[\\w/]+)\\S*" | ||
replacement = "${method}" | ||
result_key = "method" | ||
|
||
[[processors.regex.fields]] | ||
key = "request" | ||
pattern = ".*category=(\\w+).*" | ||
replacement = "${1}" | ||
result_key = "search_category" | ||
|
||
[[processors.regex.field_rename]] | ||
pattern = "^client_(\\w+)$" | ||
replacement = "${1}" | ||
``` | ||
|
||
will result in | ||
|
||
```diff | ||
-nginx_requests,verb=GET,resp_code=200 request="/api/search/?category=plugins&q=regex&sort=asc",referrer="-",ident="-",http_version=1.1,agent="UserAgent",client_ip="127.0.0.1",auth="-",resp_bytes=270i 1519652321000000000 | ||
+nginx_requests,verb=GET,resp_code=2xx request="/api/search/?category=plugins&q=regex&sort=asc",method="/search/",search_category="plugins",referrer="-",ident="-",http_version=1.1,agent="UserAgent",ip="127.0.0.1",auth="-",resp_bytes=270i 1519652321000000000 | ||
``` | ||
|
||
### Appending | ||
|
||
```toml | ||
[[processors.regex]] | ||
namepass = ["nginx_requests"] | ||
|
||
[[processors.regex.tags]] | ||
key = "resp_code" | ||
pattern = '^2\d\d$' | ||
replacement = " OK" | ||
result_key = "verb" | ||
append = true | ||
``` | ||
|
||
will result in | ||
|
||
```diff | ||
-nginx_requests,verb=GET,resp_code=200 request="/api/search/?category=plugins&q=regex&sort=asc",referrer="-",ident="-",http_version=1.1,agent="UserAgent",client_ip="127.0.0.1",auth="-",resp_bytes=270i 1519652321000000000 | ||
+nginx_requests,verb=GET\ OK,resp_code=200 request="/api/search/?category=plugins&q=regex&sort=asc",referrer="-",ident="-",http_version=1.1,agent="UserAgent",client_ip="127.0.0.1",auth="-",resp_bytes=270i 1519652321000000000 | ||
``` | ||
|
||
### Named groups | ||
|
||
```toml | ||
[[processors.regex]] | ||
namepass = ["nginx_requests"] | ||
|
||
[[processors.regex.fields]] | ||
key = "request" | ||
pattern = '^/api/(?P<method>\w+)[/?].*category=(?P<category>\w+)&(?:.*)' | ||
``` | ||
|
||
will result in | ||
|
||
```diff | ||
-nginx_requests,verb=GET,resp_code=200 request="/api/search/?category=plugins&q=regex&sort=asc",referrer="-",ident="-",http_version=1.1,agent="UserAgent",client_ip="127.0.0.1",auth="-",resp_bytes=270i 1519652321000000000 | ||
+nginx_requests,verb=GET,resp_code=200 request="/api/search/?category=plugins&q=regex&sort=asc",method="search",category="plugins",referrer="-",ident="-",http_version=1.1,agent="UserAgent",client_ip="127.0.0.1",auth="-",resp_bytes=270i 1519652321000000000 | ||
``` | ||
|
||
### Metric renaming | ||
|
||
```toml | ||
[[processors.regex]] | ||
[[processors.regex.metric_rename]] | ||
pattern = '^(\w+)_.*$' | ||
replacement = "${1}" | ||
``` | ||
|
||
will result in | ||
|
||
```diff | ||
-nginx_requests,verb=GET,resp_code=200 request="/api/search/?category=plugins&q=regex&sort=asc",referrer="-",ident="-",http_version=1.1,agent="UserAgent",client_ip="127.0.0.1",auth="-",resp_bytes=270i 1519652321000000000 | ||
+nginx,verb=GET,resp_code=200 request="/api/search/?category=plugins&q=regex&sort=asc",referrer="-",ident="-",http_version=1.1,agent="UserAgent",client_ip="127.0.0.1",auth="-",resp_bytes=270i 1519652321000000000 | ||
``` |
Review comment: You removed these comments? Are they covered below now?
Reply: Yes as the behavior is now described in the `sample.conf` part in my view...