fix(inputs.kafka_consumer): Use per-message parser to avoid races #13886
Conversation
Hi,
Can you please file an issue and document what you were seeing?
Related to #13888. Let me know if you need more info.
While that might work, it will strictly serialize the parsing... Should we instead change the plugin to the parser-func interface and create a new parser each time (or use a sync.Pool of parsers)? My concern is that this will hamper performance on larger messages or slow parsers...
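For reference, a minimal sketch of the sync.Pool option, using toy stand-in types rather than Telegraf's real interfaces (the actual telegraf.Parser returns []telegraf.Metric). The point is just that each goroutine borrows its own parser instance instead of serializing all parsing behind one mutex:

```go
package main

import (
	"fmt"
	"sync"
)

// Parser stands in for telegraf's parser interface; a string result
// keeps the sketch self-contained.
type Parser interface {
	Parse(buf []byte) (string, error)
}

// statefulParser mimics a parser (like json_v2) that keeps per-call
// mutable state and is therefore unsafe to share across goroutines.
type statefulParser struct{ scratch []byte }

func (p *statefulParser) Parse(buf []byte) (string, error) {
	p.scratch = append(p.scratch[:0], buf...) // mutable state: the race source
	return string(p.scratch), nil
}

func main() {
	// The pool hands each goroutine its own parser instance.
	pool := sync.Pool{New: func() any { return &statefulParser{} }}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			p := pool.Get().(*statefulParser)
			defer pool.Put(p)
			out, _ := p.Parse([]byte(fmt.Sprintf("msg-%d", i)))
			fmt.Println(out)
		}(i)
	}
	wg.Wait()
}
```

Compared to a single mutex, this keeps parsing concurrent while still amortizing allocations across messages.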
Performance is not important if you get wrong results. That said, I agree that a mutex in this place may have an impact on other parsers, and the JSON parser might be one that folks do not use a lot. I am open to implementing this; however, I'd currently need to look into some of the suggested options to get a better understanding. So if someone can point me in the right direction, that would be awesome.
Force-pushed from f88cffa to 58eb3cf.
@srebhan, I implemented the ParserFunc for now. Looking forward to additional feedback.
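Roughly what the ParserFunc pattern looks like, sketched with stand-in types (Telegraf's real telegraf.ParserFunc has approximately the shape func() (telegraf.Parser, error); handleMessage and jsonLikeParser below are made up for illustration):

```go
package main

import "fmt"

// Parser stands in for telegraf's parser interface.
type Parser interface {
	Parse(buf []byte) (string, error)
}

// jsonLikeParser mimics a parser with per-instance mutable state.
type jsonLikeParser struct{ state []byte }

func (p *jsonLikeParser) Parse(buf []byte) (string, error) {
	p.state = append(p.state[:0], buf...)
	return string(p.state), nil
}

// ParserFunc mirrors the constructor shape: the plugin calls it to get
// a fresh, unshared parser for each unit of work.
type ParserFunc func() (Parser, error)

func handleMessage(newParser ParserFunc, msg []byte) (string, error) {
	p, err := newParser() // dedicated instance, so concurrent handlers can't race
	if err != nil {
		return "", err
	}
	return p.Parse(msg)
}

func main() {
	newParser := func() (Parser, error) { return &jsonLikeParser{}, nil }
	out, _ := handleMessage(newParser, []byte(`{"value": 1}`))
	fmt.Println(out)
}
```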
Force-pushed from 58eb3cf to cf7f511.
Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip. 👍 This pull request doesn't change the Telegraf binary size.
Thank you for the quick update and your contribution in general @toanju!
This makes using a schema registry impossible: the schema registry cache lives down in the Avro parser, and a new parser means an always-empty cache, which means we query the schema registry once per measurement. Since in normal operation there are many, many more measurements than schemas, this creates an enormous load on the schema registry, and it also means that parsing each measurement takes much too long, since it has to wait on a network call.

#14025 is what I'm working on to address this. @srebhan suggested an approach based on adding Clone() to the parser layer, and I'm trying to work through that; it looks like I would have to understand each parser, determine whether or not it is thread-safe, and implement the correct Clone() method for each one. That's a daunting task.

Another possibility might be to associate the schema registry with the Kafka input layer rather than the Avro parser layer, since you need to be using both Kafka and Avro (I think) to use a schema registry.
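A hypothetical sketch of the Clone() idea: copy only the per-parse scratch state while all clones keep pointing at the same registry cache. registryCache and avroLikeParser are invented names for illustration, not the actual Avro parser code:

```go
package main

import (
	"fmt"
	"sync"
)

// registryCache stands in for the schema registry cache: shared,
// lock-guarded state that every clone reuses rather than rebuilds.
type registryCache struct {
	mu      sync.Mutex
	schemas map[int]string
}

func (c *registryCache) lookup(id int) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	s, ok := c.schemas[id]
	if !ok {
		s = fmt.Sprintf("schema-%d-from-registry", id) // one network call, then cached
		c.schemas[id] = s
	}
	return s
}

// avroLikeParser holds per-parse scratch state (unsafe to share) plus
// a pointer to the shared cache (safe to share).
type avroLikeParser struct {
	cache   *registryCache
	scratch []byte
}

// Clone resets the per-parse state but keeps the same cache pointer,
// so clones are cheap and the registry is queried once per schema.
func (p *avroLikeParser) Clone() *avroLikeParser {
	return &avroLikeParser{cache: p.cache}
}

func main() {
	base := &avroLikeParser{cache: &registryCache{schemas: map[int]string{}}}
	a, b := base.Clone(), base.Clone()
	fmt.Println(a.cache.lookup(7), b.cache.lookup(7)) // second lookup hits the cache
}
```

With this shape, only each parser's scratch state needs auditing for thread-safety, while shared caches stay behind their own locks.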
Using parsers like json_v2 from multiple goroutines results in incorrect parsed output. This switches to the ParserFunc pattern to create a dedicated parser in each thread.
Required for all PRs
fixes #13888