
fix(inputs.kafka_consumer): Use per-message parser to avoid races #13886

Merged
merged 1 commit into influxdata:master on Sep 11, 2023

Conversation

@toanju (Contributor) commented Sep 7, 2023

Using parsers like json_v2 will result in undesired parser results, because a single parser instance is shared across consumer goroutines and races on its internal state. This switches to the ParserFunc pattern to create a dedicated parser in each thread.

Required for all PRs

fixes #13888
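The ParserFunc pattern described above can be sketched roughly as follows. This is a simplified illustration, not Telegraf's actual interfaces: the `Parser` type, its `Parse` method, and `ParserFunc` here are stand-ins. The point is that each consumer goroutine asks a factory for its own parser, so a stateful parser is never shared:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Parser carries internal working state (as json_v2 does), so sharing
// one instance across goroutines can race.
type Parser struct {
	scratch []string // per-parse buffer; not safe for concurrent use
}

func (p *Parser) Parse(msg string) string {
	p.scratch = p.scratch[:0]
	p.scratch = append(p.scratch, strings.Fields(msg)...)
	return strings.Join(p.scratch, ",")
}

// ParserFunc is a factory: each goroutine calls it to get a dedicated
// parser (names are illustrative, not Telegraf's real API).
type ParserFunc func() *Parser

func main() {
	newParser := ParserFunc(func() *Parser { return &Parser{} })

	msgs := []string{"a b", "c d", "e f"}
	out := make([]string, len(msgs))
	var wg sync.WaitGroup
	for i, m := range msgs {
		wg.Add(1)
		go func(i int, m string) {
			defer wg.Done()
			p := newParser() // fresh parser per goroutine: no shared state
			out[i] = p.Parse(m)
		}(i, m)
	}
	wg.Wait()
	fmt.Println(out) // prints [a,b c,d e,f]
}
```

The trade-off srebhan raises below is that creating a parser per message costs allocation; the pattern still avoids the alternative of serializing all parsing behind one mutex.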

@telegraf-tiger bot added the fix and plugin/input labels Sep 7, 2023
@powersj (Contributor) commented Sep 7, 2023

Hi,

Using parsers like json_v2 will result in undesired parser results.

Can you please file an issue and document what you were seeing?

@powersj added the waiting for response (waiting for response from contributor) label Sep 7, 2023
@toanju (Contributor, Author) commented Sep 8, 2023

related to #13888

let me know if you need more info

@telegraf-tiger bot removed the waiting for response label Sep 8, 2023
@srebhan (Member) left a comment

While that might work, it will strictly serialize the parsing... Should we instead change the plugin to the parser-func interface and create a new parser each time (or use a sync.Pool of parsers)?

My concern is that this will hamper performance on larger messages or slow parsers...
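The sync.Pool alternative mentioned here could look roughly like the sketch below. The `Parser` type and `parseMessage` helper are stand-ins, not Telegraf's real parser interface: parsers are checked out for exclusive use per message and returned for reuse, avoiding both a global mutex and a fresh allocation per message.

```go
package main

import (
	"fmt"
	"sync"
)

// Parser is a stand-in for a stateful parser whose internal buffer
// must not be shared between goroutines.
type Parser struct{ buf []byte }

func (p *Parser) Parse(msg []byte) string {
	p.buf = append(p.buf[:0], msg...) // reuse the internal buffer
	return string(p.buf)
}

// parserPool hands out parsers for exclusive use; idle parsers are
// reused instead of reallocated on every message.
var parserPool = sync.Pool{
	New: func() any { return &Parser{} },
}

func parseMessage(msg []byte) string {
	p := parserPool.Get().(*Parser) // exclusive while checked out
	defer parserPool.Put(p)         // return for reuse by any goroutine
	return p.Parse(msg)
}

func main() {
	var wg sync.WaitGroup
	for _, m := range []string{"one", "two", "three"} {
		wg.Add(1)
		go func(m string) {
			defer wg.Done()
			fmt.Println(parseMessage([]byte(m)))
		}(m)
	}
	wg.Wait()
}
```

Note that sync.Pool may drop idle objects at any GC, so this only helps with allocation pressure; correctness still comes from never sharing a checked-out parser.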

@srebhan srebhan self-assigned this Sep 8, 2023
@toanju (Contributor, Author) commented Sep 9, 2023

Performance is not important if you get wrong results. Though, I agree that the mutex in this place may have an impact on other parsers, and the JSON parser might be one that folks do not use a lot.

I am open to implementing this; however, I'd first need to look into some of the suggested options to get a better understanding. So if someone can point me in the right direction, that would be awesome.

@toanju force-pushed the kafka-call-parsers-synchronous branch from f88cffa to 58eb3cf on September 9, 2023 19:51
@toanju (Contributor, Author) commented Sep 9, 2023

@srebhan, implemented the ParserFunc for now. Looking forward to additional feedback.

Using parsers like json_v2 will result in undesired parser results. This
switches to the ParserFunc pattern to create a dedicated parser in each
thread.
@toanju force-pushed the kafka-call-parsers-synchronous branch from 58eb3cf to cf7f511 on September 10, 2023 08:09
@srebhan (Member) left a comment

Thank you for the quick update and your contribution in general @toanju!

@srebhan srebhan changed the title fix(inputs.kafka_consumer): use mutex for parser fix(inputs.kafka_consumer): Use per-message parser to avoid races Sep 11, 2023
@srebhan added the ready for final review label Sep 11, 2023
@srebhan srebhan assigned powersj and unassigned srebhan Sep 11, 2023
@powersj powersj merged commit 3fae643 into influxdata:master Sep 11, 2023
5 checks passed
@github-actions github-actions bot added this to the v1.28.0 milestone Sep 11, 2023
@toanju toanju deleted the kafka-call-parsers-synchronous branch September 11, 2023 15:26
@toanju (Contributor, Author) commented Sep 11, 2023

Many thanks @srebhan and @powersj 🎉

@athornton (Contributor) commented

This makes using a schema registry impossible: the schema-registry cache lives down in the Avro parser, so a new parser means an always-empty cache, which means we query the schema registry once per measurement. Since in normal operation there are many, many more measurements than schemas, this puts enormous load on the schema registry, and it also makes parsing each measurement far too slow, since each parse has to wait on a network call.

#14025 is what I'm working on to address this. @srebhan suggested an approach based on adding Clone() to the parser layer, and I'm trying to work through that; it looks like I have to understand each parser, determine whether or not it is thread-safe, and implement the correct Clone() method for each one. That's a daunting task. Another possibility might be to associate the schema registry with the Kafka input layer rather than the Avro parser layer, since you need to be using both Kafka and Avro (I think) to use a schema registry.
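One way to reconcile per-message parsers with a schema-registry cache is sketched below, under the assumption of a hypothetical Clone() method; none of these names are Telegraf's actual Avro parser API. Clone() copies only the non-thread-safe scratch state, while all clones share a single mutex-guarded schema cache, so the registry is queried once per schema rather than once per measurement:

```go
package main

import (
	"fmt"
	"sync"
)

// schemaCache is shared by all parser clones; access is mutex-guarded
// so clones may use it concurrently.
type schemaCache struct {
	mu      sync.Mutex
	schemas map[int]string
	fetches int // counts expensive registry round-trips
}

func (c *schemaCache) get(id int) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.schemas[id]; ok {
		return s
	}
	c.fetches++ // would be a network call to the schema registry
	s := fmt.Sprintf("schema-%d", id)
	c.schemas[id] = s
	return s
}

// AvroParser is a stand-in: scratch is per-clone, cache is shared.
type AvroParser struct {
	cache   *schemaCache // shared across clones
	scratch []byte       // not safe to share; each clone gets its own
}

// Clone gives a parser safe for one goroutine without losing the cache.
func (p *AvroParser) Clone() *AvroParser {
	return &AvroParser{cache: p.cache} // fresh scratch, same cache
}

func main() {
	root := &AvroParser{cache: &schemaCache{schemas: map[int]string{}}}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p := root.Clone()
			_ = p.cache.get(1) // same schema id: cached after first fetch
		}()
	}
	wg.Wait()
	fmt.Println("fetches:", root.cache.fetches) // prints fetches: 1
}
```

This avoids the per-measurement registry hit described above while keeping the per-message isolation the PR introduced; the hard part remains deciding, parser by parser, which state is shareable and which must be cloned.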

Labels
area/kafka; fix (pr to fix corresponding bug); plugin/input (1. Request for new input plugins 2. Issues/PRs that are related to input plugins); ready for final review (This pull request has been reviewed and/or tested by multiple users and is ready for a final review.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

inputs.kafka_consumer: parser fails to process data
4 participants