Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bloom Compactor: Optimize check for fingerprint ownership #11389

Merged
merged 8 commits into from
Dec 12, 2023

Conversation

chaudum
Copy link
Contributor

@chaudum chaudum commented Dec 5, 2023

What this PR does / why we need it:

Calling c.sharding.OwnsFingerprint(tenant, uint64(fingerprint)) for each Series of a TSDB index is very expensive, because it not only creates the tenant's sub-ring but also needs to check the fingerprint against it.

Instead, we can pre-calculate the current instance's token ranges and check if the (uint32 converted) fingerprint is contained within these ranges.

Special notes for your reviewer:

The main change of this PR is 5b58326

@chaudum chaudum changed the title Chaudum/bloomcompactor fingerprint ownership Bloom Compactor: Optimize check for fingerprint ownership Dec 5, 2023
Copy link
Contributor

github-actions bot commented Dec 5, 2023

Trivy scan found the following vulnerabilities:

@chaudum chaudum force-pushed the chaudum/bloomcompactor-fingerprint-ownership branch from 5b58326 to cc5fe60 Compare December 5, 2023 14:07
@chaudum chaudum marked this pull request as ready for review December 6, 2023 10:37
@chaudum chaudum requested a review from a team as a code owner December 6, 2023 10:37
Copy link
Contributor

@vlad-diachenko vlad-diachenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chaudum
Copy link
Contributor Author

chaudum commented Dec 7, 2023

Summary from the offline discussion between @chaudum and @vlad-diachenko

The current implementation of getting the fingerprint ownership works generically no matter how many tokens an instance has in the ring, and therefore how many token ranges.
The problem, however, is that if an instance has multiple token ranges, the fingerprints from both ranges are added to the same job, resulting in a min and max fingerprint for the job that spans both the token ranges of the instance plus everything in between. This would result in unnecessary downloads of blocks.

At the moment, bloom compactors are configured with 1 token per instance, which means that there is always exactly 1 instance that has two token ranges (one from lastToken+1 to MaxUint32, and one from 0 to firstToken).

There are two options then:

  1. Create a job for each token/fingerprint range of the instance. This will work with any number of tokens per instance.
  2. Use the order of instances based on their first token in the ring and split the ring's token range into n equal ranges and assign them in order of the instances. This would create equal token ranges for each instance, even if the tokens in the ring are not evenly distributed.

@poyzannur
Copy link
Contributor

There are two options then:

  1. Create a job for each token/fingerprint range of the instance. This will work with any number of tokens per instance.
  2. Use the order of instances based on their first token in the ring and split the ring's token range into n equal ranges and assign them in order of the instances. This would create equal token ranges for each instance, even if the tokens in the ring are not evenly distributed.

Discussed offline. I also think the second approach is going against the nature of ring. It is fragile to the changes in the number of tokens and requires future operators of the service to know the ring implementation. One of the advantages of the first approach as a result will create a job per range, which will nicely reduce job's responsibility. In second approach we better introduce a further divide up for the job, otherwise the work may get quite big. I'm happy for us to try and see both approaches.
Implementation so far lgtm.

Handle ranges correctly when token is 0 or MaxUint32

Signed-off-by: Christian Haudum <[email protected]>
Divide full token range by the amount of instances to get equal token
ranges. Assign ranges based on the sort order of the first token of an
instance in the ring.

Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
@chaudum chaudum force-pushed the chaudum/bloomcompactor-fingerprint-ownership branch from be064b5 to 585c342 Compare December 12, 2023 07:55
@chaudum chaudum merged commit c4f5a57 into main Dec 12, 2023
7 checks passed
@chaudum chaudum deleted the chaudum/bloomcompactor-fingerprint-ownership branch December 12, 2023 08:11
rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024
)

Calling `c.sharding.OwnsFingerprint(tenant, uint64(fingerprint))` for
each Series of a TSDB index is very expensive, because it not only
creates the tenant's sub-ring but also needs to check the fingerprint
against it.

Instead, we can pre-calculate the current instance's token ranges and
check if the (uint32 converted) fingerprint is contained within these
ranges.
 

Signed-off-by: Christian Haudum <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants