Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: sanitize structured metadata values during query time in pipeline code #15113

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

cstyan
Copy link
Contributor

@cstyan cstyan commented Nov 26, 2024

In an earlier PR (here) we made a change to sanitize structured metadata names of disallowed characters, since when SM is added to pipeline stages in a users query the SM is turned into labels and in turned passed to Prometheus' label parser. That then results in an error if there's disallowed characters.

What we missed at the time was the same sanitization of structured metadata values. This PR sanitizes the value in the same way, replacing the invalid character with a space . While this likely requires users to query their structured metadata with a regex filter, it's at least a workaround. Perhaps we can find a way to return a warning to the user that their query hit SM that had invalid characters, and therefore they should consider changing their MatchEqual to MatchRegex if they're not seeing results.

The queries in question are in the form of {some_label="x"} | app="app_name" | line_format {{.thing}} {{.message}} |= "look for something in the content". I don't fully understand all the cases we have that result in SM being treated as labels, but if we do stream processing we add the SM as labels (which in turn causes them to go through prometheus label parsing) and then by adding them to the line with line_format the contents of the SM value is then available to filter with |=.

This might also be a good time to revisit the topic of sanitizing or rejecting SM at ingestion time.

@cstyan cstyan requested a review from a team as a code owner November 26, 2024 00:10
@cstyan cstyan changed the title sanitize structured metadata values during query time in pipeline code fix: sanitize structured metadata values during query time in pipeline code Nov 26, 2024
@cstyan
Copy link
Contributor Author

cstyan commented Nov 26, 2024

@ashwanthgoli thanks for pointing out strings.Map, not sure where I got the bytes.Map usage anymore since I started these changes a few weeks ago + we're already using strings.Map for the other instances of replacing invalid characters.

The benchmark results don't look much different, but we should at least be consistent.

goos: linux
goarch: amd64
pkg: github.com/grafana/loki/v3/pkg/logql/log
cpu: AMD Ryzen 9 5950X 16-Core Processor
                                                                    │    main     │                 pr                 │            pr-stringMap            │
                                                                    │   sec/op    │   sec/op     vs base               │   sec/op     vs base               │
_Pipeline/pipeline_bytes-32                                           5.940µ ± 1%   5.898µ ± 1%       ~ (p=0.171 n=10)   6.026µ ± 3%       ~ (p=0.352 n=10)
_Pipeline/pipeline_string-32                                          5.974µ ± 1%   5.965µ ± 0%       ~ (p=0.493 n=10)   6.041µ ± 2%  +1.12% (p=0.001 n=10)
_Pipeline/pipeline_bytes_no_invalid_structured_metadata-32            6.014µ ± 1%   6.099µ ± 1%  +1.41% (p=0.001 n=10)   6.118µ ± 3%  +1.72% (p=0.000 n=10)
_Pipeline/pipeline_string_with_invalid_structured_metadata_name-32    6.413µ ± 1%   6.462µ ± 1%  +0.77% (p=0.017 n=10)   6.559µ ± 2%  +2.28% (p=0.000 n=10)
_Pipeline/line_extractor_bytes-32                                     6.563µ ± 1%   6.568µ ± 1%       ~ (p=0.630 n=10)   6.614µ ± 2%  +0.77% (p=0.008 n=10)
_Pipeline/line_extractor_string-32                                    6.572µ ± 1%   6.571µ ± 1%       ~ (p=0.644 n=10)   6.630µ ± 2%  +0.88% (p=0.000 n=10)
_Pipeline/label_extractor_bytes-32                                    6.693µ ± 1%   6.683µ ± 0%       ~ (p=0.811 n=10)   6.746µ ± 1%       ~ (p=0.123 n=10)
_Pipeline/label_extractor_string-32                                   6.654µ ± 1%   6.685µ ± 1%  +0.47% (p=0.041 n=10)   6.698µ ± 1%  +0.67% (p=0.023 n=10)
_Pipeline/pipeline_string_with_invalid_structured_metadata_value-32                 6.292µ ± 1%                          6.310µ ± 1%
geomean                                                               6.345µ        6.351µ       +0.21%                  6.409µ       +1.21%

                                                                    │    main    │                 pr                  │            pr-stringMap             │
                                                                    │    B/op    │    B/op     vs base                 │    B/op     vs base                 │
_Pipeline/pipeline_bytes-32                                           826.5 ± 0%   827.0 ± 0%       ~ (p=0.158 n=10)     827.0 ± 0%       ~ (p=0.450 n=10)
_Pipeline/pipeline_string-32                                          891.0 ± 0%   891.0 ± 0%       ~ (p=0.734 n=10)     891.0 ± 0%       ~ (p=0.861 n=10)
_Pipeline/pipeline_bytes_no_invalid_structured_metadata-32            552.0 ± 0%   552.5 ± 0%       ~ (p=0.141 n=10)     552.0 ± 0%       ~ (p=1.000 n=10)
_Pipeline/pipeline_string_with_invalid_structured_metadata_name-32    630.0 ± 0%   629.5 ± 0%       ~ (p=1.000 n=10)     630.0 ± 0%       ~ (p=0.628 n=10)
_Pipeline/line_extractor_bytes-32                                     519.0 ± 0%   519.0 ± 0%       ~ (p=1.000 n=10) ¹   519.0 ± 0%       ~ (p=1.000 n=10) ¹
_Pipeline/line_extractor_string-32                                    519.0 ± 0%   519.0 ± 0%       ~ (p=1.000 n=10) ¹   519.0 ± 0%       ~ (p=1.000 n=10) ¹
_Pipeline/label_extractor_bytes-32                                    519.0 ± 0%   519.0 ± 0%       ~ (p=1.000 n=10)     519.0 ± 0%       ~ (p=1.000 n=10) ¹
_Pipeline/label_extractor_string-32                                   519.0 ± 0%   519.0 ± 0%       ~ (p=1.000 n=10)     519.0 ± 0%       ~ (p=1.000 n=10)
_Pipeline/pipeline_string_with_invalid_structured_metadata_value-32                605.0 ± 0%                            605.0 ± 0%
geomean                                                               607.6        607.4       +0.01%                    607.4       +0.01%
¹ all samples are equal

                                                                    │    main    │                 pr                  │            pr-stringMap             │
                                                                    │ allocs/op  │ allocs/op   vs base                 │ allocs/op   vs base                 │
_Pipeline/pipeline_bytes-32                                           29.00 ± 0%   29.00 ± 0%       ~ (p=1.000 n=10) ¹   29.00 ± 0%       ~ (p=1.000 n=10) ¹
_Pipeline/pipeline_string-32                                          30.00 ± 0%   30.00 ± 0%       ~ (p=1.000 n=10) ¹   30.00 ± 0%       ~ (p=1.000 n=10) ¹
_Pipeline/pipeline_bytes_no_invalid_structured_metadata-32            29.00 ± 0%   29.00 ± 0%       ~ (p=1.000 n=10) ¹   29.00 ± 0%       ~ (p=1.000 n=10) ¹
_Pipeline/pipeline_string_with_invalid_structured_metadata_name-32    31.00 ± 0%   31.00 ± 0%       ~ (p=1.000 n=10) ¹   31.00 ± 0%       ~ (p=1.000 n=10) ¹
_Pipeline/line_extractor_bytes-32                                     28.00 ± 0%   28.00 ± 0%       ~ (p=1.000 n=10) ¹   28.00 ± 0%       ~ (p=1.000 n=10) ¹
_Pipeline/line_extractor_string-32                                    28.00 ± 0%   28.00 ± 0%       ~ (p=1.000 n=10) ¹   28.00 ± 0%       ~ (p=1.000 n=10) ¹
_Pipeline/label_extractor_bytes-32                                    28.00 ± 0%   28.00 ± 0%       ~ (p=1.000 n=10) ¹   28.00 ± 0%       ~ (p=1.000 n=10) ¹
_Pipeline/label_extractor_string-32                                   28.00 ± 0%   28.00 ± 0%       ~ (p=1.000 n=10) ¹   28.00 ± 0%       ~ (p=1.000 n=10) ¹
_Pipeline/pipeline_string_with_invalid_structured_metadata_value-32                31.00 ± 0%                            30.00 ± 0%
geomean                                                               28.86        29.09       +0.00%                    28.98       +0.00%
¹ all samples are equal

Copy link
Contributor

@ashwanthgoli ashwanthgoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants