Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability to disable verification by creating the BufferedTokenizer with 0 bytes. #16824

Closed
robbavey opened this issue Dec 19, 2024 · 2 comments · Fixed by #16882
Closed
Assignees

Comments

@robbavey
Copy link
Member

robbavey commented Dec 19, 2024

Enforcement of a maximum size to decode in the BufferedTokenizer was added, by the introduction of a sizeLimit parameter.

This parameter will be enforced, regardless of the size of the value, even if it is 0 or negative.

We should add the ability to skip the enforcement if the sizeLimit is set to 0.

@mashhurs mashhurs self-assigned this Dec 23, 2024
@mashhurs
Copy link
Contributor

TL;DR: I am not seeing a value of disabling BufferedTokenizer limit.

Bug history

  • Logstash had a bug which it failed to re-initiate the size marker when it flushes the BufferedTokenizer buffer. This was fixed in ensure inputSize state value is reset during buftok.flush #16760. This bug is applicable when using the size limit.
  • Plugins:
    • json_lines (3.2.0+) codec introduced a decode_size_limit_bytes parameter to set the limit for each JSON line when decoding. This change triggered Logstash core buffer flush-fail bug
    • For the LS-to-LS communications, ls-output plugins was sending all events without setting new line delimiter where LS core takes it as with a big size in BufferedTokenizer.
  • The problematic situation happens with 8.16.0, 8.16.1, or 8.17.0 versions which includes the json_lines-v3.2.0+

Current situation

  • We have a work around and docs for this issue (with Doc: Add json_lines known issue to release notes #16831)
  • LS core bug is fixed, correctly tracks the buffer marker.
  • LS output plugin correctly generates event-oriented ndjson-compatible payloads.
  • json_lines has a 512M limit for each JSON line which is pretty big and users can increase the limit as much as big they want.
  • No other plugins using the BufferedTokenizer with setting its limit.

From current situation, I am not seeing benefits of adding a feature to disable the BufferedTokenizer limit with its value (IMHO: 0 is invalid number for the size, intentionally sticking to negative -1 would make sense to as other products/services have similar ability)
CCing for feedback: @jsvd , @robbavey, @yaauie

@robbavey
Copy link
Member Author

robbavey commented Jan 8, 2025

My concern is specifically around here:

if (args.length == 2) {
this.sizeLimit = args[1].convertToInteger().getIntValue();
this.hasSizeLimit = true;

and the fact that we allow a value of 0 (or even negative values) to be set for this in the only plugin that currently uses this setting:

https://github.com/logstash-plugins/logstash-codec-json_lines/blob/main/lib/logstash/codecs/json_lines.rb#L50

Which, I believe will mean that the plugin can be instantiated correctly, but will blow up as soon as data is passed through.

I would like to see some guard against doing this, either by explicitly stating that we allow 0 or -1 as a way to avoid verification, or only allowing a positive value for this field, and throwing on an invalid value

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants