Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: data race / nil channel read in pattern aggregation push #15410

Merged
merged 2 commits into from
Dec 13, 2024

Conversation

na--
Copy link
Member

@na-- na-- commented Dec 13, 2024

What this PR does / why we need it:

This fixes a few different issues:

  1. Setting p.quit = nil like that is actually a data race and the tests failed when executed with the -race flag:
    WARNING: DATA RACE
    Read at 0x00c000838508 by goroutine 82:
    github.com/grafana/loki/v3/pkg/pattern/aggregation.(*Push).run()
        github.com/grafana/loki/v3/pkg/pattern/aggregation/push.go:257 +0x184
    github.com/grafana/loki/v3/pkg/pattern/aggregation.Test_Push.func3.gowrap2()
        github.com/grafana/loki/v3/pkg/pattern/aggregation/push_test.go:172 +0x38
    
    Previous write at 0x00c000838508 by goroutine 80:
    github.com/grafana/loki/v3/pkg/pattern/aggregation.(*Push).Stop()
        github.com/grafana/loki/v3/pkg/pattern/aggregation/push.go:165 +0x1884
    github.com/grafana/loki/v3/pkg/pattern/aggregation.Test_Push.func3()
        github.com/grafana/loki/v3/pkg/pattern/aggregation/push_test.go:176 +0x1833
    testing.tRunner()
        testing/testing.go:1690 +0x226
    testing.(*T).Run.gowrap1()
        testing/testing.go:1743 +0x44
    
    This would be a problem not just in the tests, but in the actual execution of this module.
  2. However, even if this had succeeded and the channel had been set to nil, that is even worse! Trying to read from a nil channel blocks forever, so the run() method would have blocked on case <-p.quit: instead of finished gracefully 😬
  3. Finally, even if we somehow got to the desired result of Push.run() gracefully stopping, the Push.Stop() method didn't actually wait for that to happen, which might be breaking the expectation of upstream users of that method. It's generally a good idea to expect that whatever you have called the Stop() method of something, that thing has actually stopped once the method returns. It's much easier to reason about its behavior that way.

@na-- na-- requested a review from a team as a code owner December 13, 2024 14:42
Copy link
Collaborator

@trevorwhitney trevorwhitney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood in slack, that the test fails with the -race flag before this change, but doesn't fail afterwards. That's all the test I need! LGTM

@trevorwhitney trevorwhitney merged commit 5d8220c into main Dec 13, 2024
58 checks passed
@trevorwhitney trevorwhitney deleted the ned/fix-data-race-02 branch December 13, 2024 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants