
[bloom-compactor] downloading chunks in batches #11649

Conversation

@vlad-diachenko (Contributor) commented Jan 11, 2024

What this PR does / why we need it:
Added a chunks-batches iterator that downloads chunks in batches instead of all at once. Otherwise, when a stream contains a lot of chunks, downloading all of them in one request can lead to OOM.

Special notes for your reviewer:

Checklist

- [x] Reviewed the [CONTRIBUTING.md](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required)
- [x] Documentation added
- [x] Tests updated
- [ ] CHANGELOG.md updated
  - [ ] If the change is worth mentioning in the release notes, add the add-to-release-notes label
- [ ] Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
- [ ] For Helm chart changes, bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. [Example PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. [Example PR](grafana@0d4416a)

added chunks batches iterator to download chunks in batches instead of downloading all of them at once

Signed-off-by: Vladyslav Diachenko <[email protected]>
@vlad-diachenko requested a review from a team as a code owner Jan 11, 2024 10:10
@github-actions bot added the type/docs label (Issues related to technical documentation; the Docs Squad uses this label across many repositories) Jan 11, 2024
Signed-off-by: Vladyslav Diachenko <[email protected]>
pkg/validation/limits.go (review thread: outdated, resolved)
pkg/bloomcompactor/chunksbatchesiterator.go (review thread: outdated, resolved)
Signed-off-by: Vladyslav Diachenko <[email protected]>
@vlad-diachenko enabled auto-merge (squash) Jan 11, 2024 14:08
@chaudum (Contributor) left a comment:

Overall lgtm

```diff
@@ -545,7 +545,7 @@ func (c *Compactor) runCompact(ctx context.Context, logger log.Logger, job Job,
 	// When already compacted metas exists, we need to merge all blocks with amending blooms with new series
 	level.Info(logger).Log("msg", "already compacted metas exists, use mergeBlockBuilder")

-	var populate = createPopulateFunc(ctx, logger, job, storeClient, bt)
+	var populate = createPopulateFunc(ctx, job, storeClient, bt, c.limits)
```
Contributor: Same here

Author (@vlad-diachenko): I would prefer to pass the limits because it's extendable...

```diff
@@ -536,7 +536,7 @@ func (c *Compactor) runCompact(ctx context.Context, logger log.Logger, job Job,
 	}

 	fpRate := c.limits.BloomFalsePositiveRate(job.tenantID)
-	resultingBlock, err = compactNewChunks(ctx, logger, job, fpRate, bt, storeClient.chunk, builder)
+	resultingBlock, err = compactNewChunks(ctx, logger, job, fpRate, bt, storeClient.chunk, builder, c.limits)
```
Contributor: Instead of passing down the limits, should we just pass the batchSize? We also resolve the fpRate per tenant just before.

Author (@vlad-diachenko): I believe it's better to pass the limits: we will keep adding parameters, and passing each one individually would force us to keep extending the functions' signatures... I will remove fpRate and resolve it inside the function.

pkg/bloomcompactor/chunksbatchesiterator.go (review thread: outdated, resolved)
pkg/bloomcompactor/chunkcompactor.go (review thread: resolved)
pkg/validation/limits_test.go (review thread: outdated, resolved)
pkg/bloomcompactor/chunksbatchesiterator.go (review thread: resolved)
```diff
@@ -22,7 +22,7 @@ import (
 )

 type compactorTokenizer interface {
-	PopulateSeriesWithBloom(bloom *v1.SeriesWithBloom, chunks []chunk.Chunk) error
+	PopulateSeriesWithBloom(bloom *v1.SeriesWithBloom, chunkBatchesIterator v1.Iterator[[]chunk.Chunk]) error
```
Contributor: nit: I think we can avoid modifying the argument of this method, as well as its implementation in bloom_tokenizer.go. We can call PopulateSeriesWithBloom for each call of the iterator's Next(), e.g. in createPopulateFunc:

```go
batchesIterator, err := newChunkBatchesIterator(ctx, storeClient.chunk, chunkRefs, limits.BloomCompactorChunksBatchSize(job.tenantID))
if err != nil {
	return fmt.Errorf("error creating chunks batches iterator: %w", err)
}
for batchesIterator.Next() {
	chunks := batchesIterator.At()
	if err := bt.PopulateSeriesWithBloom(&bloomForChks, chunks); err != nil {
		return err
	}
}
if err := batchesIterator.Err(); err != nil {
	return err
}
```

in place of the single `err = bt.PopulateSeriesWithBloom(&bloomForChks, batchesIterator)` call.

Contributor: This would affect the metrics we expose inside PopulateSeriesWithBloom, though.

So what about PopulateSeriesWithBloom receiving an iterator of chunks (Iterator[chunk.Chunk]) and doing the buffering inside the iterator implementation? That way the batching logic lives only in the iterator:

```go
// loadNextBatch downloads the next batch of up to batchSize chunks and makes
// it the current batch.
func (c *chunksBatchesIterator) loadNextBatch() error {
	batchSize := c.batchSize
	chunksToDownloadCount := len(c.chunksToDownload)
	if chunksToDownloadCount < batchSize {
		batchSize = chunksToDownloadCount
	}
	chunksToDownload := c.chunksToDownload[:batchSize]
	c.chunksToDownload = c.chunksToDownload[batchSize:]

	newBatch, err := c.client.GetChunks(c.context, chunksToDownload)
	if err != nil {
		return err
	}

	c.currentBatch = newBatch
	return nil
}

// Next advances to the next chunk, transparently downloading a new batch
// whenever the current one is exhausted.
func (c *chunksBatchesIterator) Next() bool {
	if len(c.currentBatch) == 0 {
		if len(c.chunksToDownload) == 0 {
			return false
		}
		if c.err = c.loadNextBatch(); c.err != nil {
			return false
		}
	}

	// Pop the first chunk from the current batch and set it as the current chunk.
	c.currentChunk = c.currentBatch[0]
	c.currentBatch = c.currentBatch[1:]

	return true
}

func (c *chunksBatchesIterator) At() chunk.Chunk {
	return c.currentChunk
}
```

Author (@vlad-diachenko): Yes, it would do the same... but I don't believe we win anything by changing it this way; it just increases the complexity of the iterator. Say we fail to download a batch: we would report an error, but that error is not connected to the particular chunk returned by At(), so it might confuse somebody in the future... Not really against it, but it looks almost the same.

Author (@vlad-diachenko): wdyt?

Contributor: Sure, not a strong opinion!

@vlad-diachenko merged commit a5aa8b3 into main Jan 12, 2024
9 checks passed
@vlad-diachenko deleted the vlad.diachenko/bloom-compactor_downloading-chunks-in-batches branch Jan 12, 2024 17:21
rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024
Labels: size/L, type/docs (Issues related to technical documentation; the Docs Squad uses this label across many repositories)

4 participants