Make otel scheduler sync #2262

wildum · 2024-12-11T16:50:03Z

The fix in #2027 did not address all the race conditions that can occur in the scheduler. We noticed this when removing our OTel fork brought back flakiness in a batch processor test.

With this change, scheduling OTel components is no longer an asynchronous operation. This simplifies the logic and ensures that components are started before they can begin consuming.

ptodev

The OTel contract states that Start must return quickly, so I suppose sync would be fine. Do we know if the Collector does this async or sync? It'd be best to just do what it does, just to be on the safe side.

Regarding the issue with the batch processor - does it happen because something is wrong on the OTel side, or because #2027 maybe doesn't work for some edge case?

ptodev · 2024-12-11T18:41:55Z

It might be good for @thampiotr to opine here, given that this PR reverts changes in #2027.

wildum · 2024-12-12T08:46:35Z

> The OTel contract states that Start must return quickly, so I suppose sync would be fine. Do we know if the Collector does this async or sync? It'd be best to just do what it does, just to be on the safe side.

The OTel Collector does this synchronously. The service starts everything synchronously (https://github.com/open-telemetry/opentelemetry-collector/blob/main/service/service.go#L230), and that start is itself called synchronously here (https://github.com/open-telemetry/opentelemetry-collector/blob/main/otelcol/collector.go#L228).

> Regarding the issue with the batch processor - does it happen because something is wrong on the OTel side, or because #2027 maybe doesn't work for some edge case?

Piotr and I had a discussion about it here: https://raintank-corp.slack.com/archives/C02GSU8SHBN/p1733846257348439
TL;DR: on Update we create a new, unstarted underlying OTel component. Because we start it asynchronously, there is always a window in which the lazy consumer is unpaused and the new component can begin consuming before it has been started.
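To illustrate the ordering, here is a minimal Go sketch of the synchronous approach: pause the consumer, start the new component, and only then resume. The `component`, `lazyConsumer`, and `scheduleSync` names are hypothetical stand-ins written for this example. They are not the actual scheduler or lazy consumer code from this PR, and the real implementation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// component is a hypothetical stand-in for an OTel component: it must be
// started before it is allowed to consume data.
type component struct {
	mu      sync.Mutex
	started bool
}

func (c *component) Start() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.started = true
	return nil
}

func (c *component) Consume(item string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.started {
		return errors.New("consume called before start")
	}
	return nil
}

// lazyConsumer (hypothetical) forwards data to the current component and
// blocks callers while it is paused.
type lazyConsumer struct {
	mu     sync.Mutex
	cond   *sync.Cond
	paused bool
	next   *component
}

func newLazyConsumer() *lazyConsumer {
	lc := &lazyConsumer{paused: true}
	lc.cond = sync.NewCond(&lc.mu)
	return lc
}

func (lc *lazyConsumer) Pause() {
	lc.mu.Lock()
	lc.paused = true
	lc.mu.Unlock()
}

// Resume swaps in the next component and wakes up blocked callers.
func (lc *lazyConsumer) Resume(next *component) {
	lc.mu.Lock()
	lc.next = next
	lc.paused = false
	lc.mu.Unlock()
	lc.cond.Broadcast()
}

func (lc *lazyConsumer) Consume(item string) error {
	lc.mu.Lock()
	for lc.paused {
		lc.cond.Wait()
	}
	next := lc.next
	lc.mu.Unlock()
	return next.Consume(item)
}

// scheduleSync starts the component synchronously while the consumer is
// paused, so no data can reach the component before Start has returned.
func scheduleSync(lc *lazyConsumer, c *component) error {
	lc.Pause()
	if err := c.Start(); err != nil {
		return err // the consumer stays paused on failure
	}
	lc.Resume(c)
	return nil
}

func main() {
	lc := newLazyConsumer()
	results := make(chan error, 1)
	go func() { results <- lc.Consume("span") }() // waits while paused

	if err := scheduleSync(lc, &component{}); err != nil {
		fmt.Println("schedule failed:", err)
		return
	}
	fmt.Println("consume error:", <-results)
}
```

In this sketch the race described above would appear if `Resume` ran before `Start` had finished, for example when `Start` is handed off to a goroutine. Keeping `Start` on the same call path as `Pause` and `Resume` removes that window.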

ptodev

Do we need to add a changelog entry? If it's a problem end users could see, then mentioning the error in the changelog could indicate to them that they shouldn't see it again.

thampiotr · 2024-12-12T10:21:07Z

> It might be good for @thampiotr to opine here, given that this PR reverts changes in #2027.

It doesn't revert all the changes, it's reusing the approach.

thampiotr

Looks good, I like it more. Thanks for fixing @wildum 🚀

internal/component/otelcol/internal/scheduler/scheduler.go

ptodev · 2024-12-18T13:47:21Z

-		// A message is already queued for refreshing running components so we
-		// don't have to do anything here.
-	}
+	level.Debug(cs.log).Log("msg", "scheduling components", "count", len(cc))


Suggested change

-	level.Debug(cs.log).Log("msg", "scheduling components", "count", len(cc))
+	level.Debug(cs.log).Log("msg", "scheduling otelcol components", "count", len(cc))

It's quite an ambiguous log line, but if you think there is value in keeping it I'm happy to have it as a debug log.

make otel scheduler sync

b7bf45b

wildum requested a review from a team as a code owner December 11, 2024 16:50

ptodev reviewed Dec 11, 2024

ptodev approved these changes Dec 12, 2024

thampiotr approved these changes Dec 12, 2024

wildum mentioned this pull request Dec 12, 2024

Fix static converters #2270

Open

ptodev previously requested changes Dec 16, 2024

internal/component/otelcol/internal/scheduler/scheduler.go

ptodev mentioned this pull request Dec 18, 2024

Don't start components until Run is called #2296

Merged

ptodev reviewed Dec 18, 2024

ptodev requested a review from thampiotr December 18, 2024 13:48

Don't start components until Run is called (#2296)

b133971

* Don't start components until Run is called
* Update consumers after stopping the component
* Minor fixes
