Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BloomShipper: add cache for downloaded blocks #11394

Merged
merged 7 commits into from
Dec 13, 2023

Conversation

vlad-diachenko
Copy link
Contributor

@vlad-diachenko vlad-diachenko commented Dec 6, 2023

adapted embeddedcache.go to store downloaded bloom blocks on local filesystem.

What this PR does / why we need it:
This cache will be used by bloom-gateway to not download the blocks each time.

In the cache we store the objects(cachedBlock struct) that contain:

  1. blockDirectory: the path to the directory where the block was extracted
  2. activeQueriers: thread-safe counter of active block users. This field is important because we do not want the directory to be removed while it's in use.
  3. rest fields are needed for service needs

When the downloadingWorker receives this entity, it increases the counter activeQueriers and creates blockQuerier wrapper that has Close function that will decrease the counter once blockQuerier is not needed anymore.

When the cache entry is being removed from the cache, the cache calls the function removeDirectoryAsync which asynchronously removes the block's folder. This function checks every Xms if there are still active block queriers and once activeQueriers count is 0 the folder will be removed.
Also, there is a timeout in this function, and once the timeout is reached, the folder will be force removed.

Special notes for your reviewer:

  • If the cache is disabled, then the blocks will be downloaded each time the block is requested.
  • If the cache is used, the folder will be deleted by embeddedCache when it reaches memory size limit or items count limit. Otherwise, the block's folder will be deleted when Close function is called.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Dec 6, 2023
Copy link
Contributor

github-actions bot commented Dec 6, 2023

Trivy scan found the following vulnerabilities:

Signed-off-by: Vladyslav Diachenko <[email protected]>
@vlad-diachenko vlad-diachenko marked this pull request as ready for review December 6, 2023 11:37
@vlad-diachenko vlad-diachenko requested a review from a team as a code owner December 6, 2023 11:37
Copy link
Contributor

@chaudum chaudum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks quite good already. I think I would prefer a noop-cache implementation instead of the nil-check, because it would unify the code paths.

Comment on lines +321 to +323
Close: func() error {
return deleteFolder(blockDirectory)
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we want to delete the complete directory containing the block files every time we are done reading and call Close().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I understand now why you implemented it like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do not use the cache, then we can not control the total size of the files on FS, so, it's better to remove it once we complete the processing.
in our case, we will always use the cache, but some OSS users might have it disabled (default state)

pkg/storage/stores/shipper/bloomshipper/config/config.go Outdated Show resolved Hide resolved
… use the abstraction instead of using 'if' statements in the code.

Signed-off-by: Vladyslav Diachenko <[email protected]>
docs/sources/configure/_index.md Show resolved Hide resolved
# Conflicts:
#	pkg/storage/stores/shipper/bloomshipper/block_downloader.go
#	pkg/storage/stores/shipper/bloomshipper/block_downloader_test.go
@vlad-diachenko vlad-diachenko enabled auto-merge (squash) December 13, 2023 15:49
@vlad-diachenko vlad-diachenko merged commit 9b69190 into main Dec 13, 2023
8 checks passed
@vlad-diachenko vlad-diachenko deleted the vlad.diachenko/blocks-cache branch December 13, 2023 16:31
rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024
adapted embeddedcache.go to store downloaded bloom blocks on local
filesystem.

**What this PR does / why we need it**:
This cache will be used by bloom-gateway to not download the blocks each
time.

In the cache we store the objects(`cachedBlock` struct) that contain: 
1. `blockDirectory`: the `path` to the directory where the block was
extracted
2. `activeQueriers`: thread-safe counter of active block users. This
field is important because we do not want the directory to be removed
while it's in use.
3. rest fields are needed for service needs

When the downloadingWorker receives this entity, it increases the
counter `activeQueriers` and creates blockQuerier wrapper that has
`Close` function that will decrease the counter once blockQuerier is not
needed anymore.

When the cache entry is being removed from the cache, the cache calls
the function `removeDirectoryAsync` which asynchronously removes the
block's folder. This function checks every `Xms` if there are still
active block queriers and once `activeQueriers` count is `0` the folder
will be removed.
Also, there is a timeout in this function, and once the timeout is
reached, the folder will be force removed.

**Special notes for your reviewer**:
* If the cache is disabled, then the blocks will be downloaded each time
the block is requested.
* If the cache is used, the folder will be deleted by embeddedCache when
it reaches memory size limit or items count limit. Otherwise, the
block's folder will be deleted when `Close` function is called.


**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [x] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)

---------

Signed-off-by: Vladyslav Diachenko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XL type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants