Bloom Compactor: Optimize check for fingerprint ownership #11389

chaudum · 2023-12-05T13:52:05Z

What this PR does / why we need it:

Calling c.sharding.OwnsFingerprint(tenant, uint64(fingerprint)) for each Series of a TSDB index is very expensive, because it not only creates the tenant's sub-ring but also needs to check the fingerprint against it.

Instead, we can pre-calculate the current instance's token ranges and check if the (uint32 converted) fingerprint is contained within these ranges.
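As a rough illustration of the idea described above, the check could look like the following sketch. The `TokenRange` and `TokenRanges` types and the `Contains` method are hypothetical names chosen for this example, not the identifiers used in the PR, and the sketch assumes the "uint32 converted" fingerprint is a plain `uint32(fp)` truncation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// TokenRange is a hypothetical inclusive range of ring tokens.
type TokenRange struct {
	Min, Max uint32
}

// TokenRanges is assumed to be sorted by Min and non-overlapping.
type TokenRanges []TokenRange

// Contains reports whether token lies in one of the ranges.
// It binary-searches for the first range whose Max is >= token
// and then checks that range's lower bound.
func (rs TokenRanges) Contains(token uint32) bool {
	i := sort.Search(len(rs), func(i int) bool { return rs[i].Max >= token })
	return i < len(rs) && rs[i].Min <= token
}

func main() {
	// Ranges are computed once per tenant, not once per series.
	owned := TokenRanges{{0, 999}, {4_000_000_000, math.MaxUint32}}

	// Assumption: the fingerprint is truncated to its lower 32 bits.
	fp := uint64(0x1234_5678_0000_0100)
	fmt.Println(owned.Contains(uint32(fp))) // true (lower 32 bits are 256)
}
```

The per-series cost then drops to a binary search over a small slice instead of building the tenant's sub-ring on every call.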

Special notes for your reviewer:

The main change of this PR is 5b58326

github-actions · 2023-12-05T13:54:02Z

Trivy scan found the following vulnerabilities:

HIGH openssl: Incorrect cipher key and IV length processing in libcrypto3 v3.1.3-r0. Fixed in v3.1.4-r0
HIGH openssl: Incorrect cipher key and IV length processing in libssl3 v3.1.3-r0. Fixed in v3.1.4-r0

vlad-diachenko

LGTM

chaudum · 2023-12-07T15:41:12Z

Summary from the offline discussion between @chaudum and @vlad-diachenko

The current implementation of the fingerprint ownership check works generically, no matter how many tokens, and therefore how many token ranges, an instance has in the ring.
The problem, however, is that if an instance has multiple token ranges, the fingerprints from both ranges are added to the same job, resulting in a min and max fingerprint for the job that spans both the token ranges of the instance plus everything in between. This would result in unnecessary downloads of blocks.
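The effect can be shown with a small sketch. The `TokenRange` type and the `Bounds` and `GroupByRange` helpers are hypothetical and exist only for this example; the token values are made up and much smaller than real ring tokens.

```go
package main

import "fmt"

// TokenRange is a hypothetical inclusive range of ring tokens.
type TokenRange struct{ Min, Max uint32 }

// Bounds returns the smallest and largest value in fps.
// ok is false if fps is empty.
func Bounds(fps []uint32) (lo, hi uint32, ok bool) {
	if len(fps) == 0 {
		return 0, 0, false
	}
	lo, hi = fps[0], fps[0]
	for _, fp := range fps[1:] {
		if fp < lo {
			lo = fp
		}
		if fp > hi {
			hi = fp
		}
	}
	return lo, hi, true
}

// GroupByRange returns one slice of fingerprints per owned range.
// Fingerprints that fall outside every range are dropped.
func GroupByRange(ranges []TokenRange, fps []uint32) [][]uint32 {
	groups := make([][]uint32, len(ranges))
	for _, fp := range fps {
		for i, r := range ranges {
			if r.Min <= fp && fp <= r.Max {
				groups[i] = append(groups[i], fp)
				break
			}
		}
	}
	return groups
}

func main() {
	owned := []TokenRange{{0, 100}, {900, 1000}}
	fps := []uint32{10, 50, 920, 990}

	// One job for all fingerprints also covers 101..899, which is not owned.
	lo, hi, _ := Bounds(fps)
	fmt.Printf("single job: [%d, %d]\n", lo, hi) // single job: [10, 990]

	// One job per owned range keeps the bounds tight.
	for i, g := range GroupByRange(owned, fps) {
		if lo, hi, ok := Bounds(g); ok {
			fmt.Printf("job for range %d: [%d, %d]\n", i, lo, hi)
		}
	}
}
```

In this toy example the single job would have to consider blocks for the whole span from 10 to 990, while the per-range jobs only cover 10 to 50 and 920 to 990.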

At the moment, bloom compactors are configured with 1 token per instance, which means that there is always exactly 1 instance that has two token ranges (one from lastToken+1 to MaxUint32, and one from 0 to firstToken).
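A minimal sketch of how the ranges follow from a one-token-per-instance ring is shown below. It assumes that an instance owns the tokens from the previous instance's token plus one up to its own token, which matches the wrap-around case described above. `RangesForToken` is a hypothetical helper, not a function from the PR.

```go
package main

import (
	"fmt"
	"math"
)

// TokenRange is a hypothetical inclusive range of ring tokens.
type TokenRange struct{ Min, Max uint32 }

// RangesForToken returns the ranges owned by the instance whose token is
// sorted[idx]. sorted must hold the distinct tokens of all instances in
// ascending order, one token per instance.
func RangesForToken(sorted []uint32, idx int) []TokenRange {
	tok := sorted[idx]
	if idx > 0 {
		// Regular case: everything after the previous token up to our own.
		return []TokenRange{{sorted[idx-1] + 1, tok}}
	}
	// The instance with the first token owns 0..firstToken ...
	out := []TokenRange{{0, tok}}
	// ... and the wrap-around range lastToken+1..MaxUint32, unless the
	// last token is already MaxUint32 and that range would be empty.
	if last := sorted[len(sorted)-1]; last < math.MaxUint32 {
		out = append(out, TokenRange{last + 1, math.MaxUint32})
	}
	return out
}

func main() {
	tokens := []uint32{1000, 2000, 3000}
	for i := range tokens {
		fmt.Printf("instance %d: %v\n", i, RangesForToken(tokens, i))
	}
}
```

With the tokens above, only the first instance ends up with two ranges (0 to 1000 and 3001 to MaxUint32); the others own exactly one.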

There are two options then:

1. Create a job for each token/fingerprint range of the instance. This will work with any number of tokens per instance.
2. Use the order of instances based on their first token in the ring and split the ring's token range into n equal ranges and assign them in order of the instances. This would create equal token ranges for each instance, even if the tokens in the ring are not evenly distributed.
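The second option can be sketched as follows. `Instance`, `EqualRanges`, and `AssignEqualRanges` are hypothetical names for this example and are not taken from the PR. The sketch assumes the full token space is 0 to MaxUint32 and that any remainder from the division is added to the last range.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// TokenRange is a hypothetical inclusive range of ring tokens.
type TokenRange struct{ Min, Max uint32 }

// Instance is a hypothetical view of a ring member.
type Instance struct {
	Addr       string
	FirstToken uint32
}

// EqualRanges splits 0..MaxUint32 into n contiguous ranges of equal size.
// The last range absorbs the remainder. The math is done in uint64 so the
// size of the token space (2^32) does not overflow.
func EqualRanges(n int) []TokenRange {
	if n <= 0 {
		return nil
	}
	const total = uint64(math.MaxUint32) + 1
	step := total / uint64(n)
	out := make([]TokenRange, n)
	for i := range out {
		lo := uint64(i) * step
		hi := lo + step - 1
		if i == n-1 {
			hi = math.MaxUint32
		}
		out[i] = TokenRange{uint32(lo), uint32(hi)}
	}
	return out
}

// AssignEqualRanges orders instances by their first token and hands out
// the equal ranges in that order.
func AssignEqualRanges(instances []Instance) map[string]TokenRange {
	sorted := append([]Instance(nil), instances...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].FirstToken < sorted[j].FirstToken
	})
	ranges := EqualRanges(len(sorted))
	out := make(map[string]TokenRange, len(sorted))
	for i, inst := range sorted {
		out[inst.Addr] = ranges[i]
	}
	return out
}

func main() {
	instances := []Instance{
		{"a", 3000}, {"b", 10}, {"c", 2_000_000_000},
	}
	for addr, r := range AssignEqualRanges(instances) {
		fmt.Printf("%s: [%d, %d]\n", addr, r.Min, r.Max)
	}
}
```

Note that in this sketch the tokens only determine the order of the instances. Instance `a` is assigned the second third of the token space even though its own token, 3000, lies in the first third.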

poyzannur · 2023-12-11T14:17:50Z

> There are two options then:
>
> 1. Create a job for each token/fingerprint range of the instance. This will work with any number of tokens per instance.
> 2. Use the order of instances based on their first token in the ring and split the ring's token range into n equal ranges and assign them in order of the instances. This would create equal token ranges for each instance, even if the tokens in the ring are not evenly distributed.

Discussed offline. I also think the second approach goes against the nature of the ring. It is fragile to changes in the number of tokens and requires future operators of the service to know the ring implementation. One advantage of the first approach is that it creates a job per range, which nicely reduces each job's responsibility. With the second approach, we should subdivide the job further; otherwise the work may get quite big. I'm happy for us to try both approaches and see.
Implementation so far lgtm.


pull-request-size bot added the size/XL label Dec 5, 2023

chaudum changed the title ~~Chaudum/bloomcompactor fingerprint ownership~~ Bloom Compactor: Optimize check for fingerprint ownership Dec 5, 2023

chaudum force-pushed the chaudum/bloomcompactor-fingerprint-ownership branch from 5b58326 to cc5fe60 Compare December 5, 2023 14:07

chaudum marked this pull request as ready for review December 6, 2023 10:37

chaudum requested a review from a team as a code owner December 6, 2023 10:37

vlad-diachenko approved these changes Dec 7, 2023


chaudum added 8 commits December 12, 2023 08:55

Make compactTenant function more readable

d5d0706

Signed-off-by: Christian Haudum <[email protected]>

Move common code from bloomgateway into shared packages

ce4867e

Signed-off-by: Christian Haudum <[email protected]>

Bloom Compactor: Make fingerprint ownership check more efficient

318e3af

Signed-off-by: Christian Haudum <[email protected]>

fixup! Merge branch 'main' into chaudum/bloomcompactor-fingerprint-ow…

a0e6ed9

…nership Signed-off-by: Christian Haudum <[email protected]>

Fix serverAddressesWithTokenRanges function

3b41d19

Handle ranges correctly when token is 0 or MaxUint32 Signed-off-by: Christian Haudum <[email protected]>

fixup! Fix serverAddressesWithTokenRanges function

f2b443e

Signed-off-by: Christian Haudum <[email protected]>

Calculate token range base off first instance token

d6a94fb

Divide full token range by the amount of instances to get equal token ranges. Assign ranges based on the sort order of the first token of an instance in the ring. Signed-off-by: Christian Haudum <[email protected]>

Fix constant int overflow for ARM

585c342

Signed-off-by: Christian Haudum <[email protected]>

chaudum force-pushed the chaudum/bloomcompactor-fingerprint-ownership branch from be064b5 to 585c342 Compare December 12, 2023 07:55

chaudum merged commit c4f5a57 into main Dec 12, 2023
7 checks passed

chaudum deleted the chaudum/bloomcompactor-fingerprint-ownership branch December 12, 2023 08:11
