
Bloom/fuse queries #11088

Merged: 11 commits merged Nov 1, 2023

owen-d (Member) commented Oct 31, 2023

Introduces a "Fuse" operation on a single block querier which multiplexes queries against a single block. This is advantageous for a few reasons:

  • A "query" contains one or more series, and each series needs to test one or more chunks for some search string(s).
  • Blocks are sorted by fingerprint, which is the output of a hash function. This means series are ~uniformly distributed across the hash keyspace (uint64 in our case), so the series for any given query are likely to be spread out across the block itself.
  • Blocks are organized into individually compressed "pages". In essence, querying them requires seeking to the correct page offset, decompressing the page, then seeking within the page to the actual offset for a particular series.
  • Since the series for any query are likely spread out across the block, performing n queries independently can decompress each page up to n times.
  • By multiplexing many queries into a single-pass iteration of the block, we amortize the cost of loading & decompressing each page across all of them.
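
To make the cost argument concrete, here is a minimal Go sketch under simplified assumptions: a block is modeled as sorted `uint64` fingerprints split into pages, "decompression" is only a counter, and the names `Block`, `scan`, `RunSeparately`, and `RunFused` are hypothetical, not the API this PR adds. Running each query as its own pass reloads every page it touches; fusing all queries into one fingerprint-sorted pass loads each page at most once.

```go
package main

import (
	"fmt"
	"sort"
)

// Block is a hypothetical stand-in for a bloom block: series fingerprints
// sorted ascending and split into pages. Real pages are compressed; here
// "decompressing" a page only increments a counter.
type Block struct {
	Pages          [][]uint64
	Decompressions int
}

func (b *Block) loadPage(i int) []uint64 {
	b.Decompressions++ // simulated seek + decompress
	return b.Pages[i]
}

// pageFor returns the first page whose last fingerprint is >= fp,
// or len(b.Pages) if fp lies past the end of the block.
func (b *Block) pageFor(fp uint64) int {
	return sort.Search(len(b.Pages), func(i int) bool {
		p := b.Pages[i]
		return p[len(p)-1] >= fp
	})
}

// request pairs a fingerprint with the (hypothetical) query that wants it.
type request struct {
	fp    uint64
	query int
}

// scan walks requests in fingerprint order, loading each page at most once
// per call, and records which requested fingerprints exist in the block.
func (b *Block) scan(reqs []request, found map[int][]uint64) {
	sort.Slice(reqs, func(i, j int) bool { return reqs[i].fp < reqs[j].fp })
	cur, page := -1, []uint64(nil)
	for _, r := range reqs {
		i := b.pageFor(r.fp)
		if i == len(b.Pages) {
			break // requests are sorted, so the rest are past the end too
		}
		if i != cur {
			cur, page = i, b.loadPage(i)
		}
		j := sort.Search(len(page), func(k int) bool { return page[k] >= r.fp })
		if j < len(page) && page[j] == r.fp {
			found[r.query] = append(found[r.query], r.fp)
		}
	}
}

// RunSeparately executes each query as its own pass over the block.
func RunSeparately(b *Block, queries [][]uint64) map[int][]uint64 {
	found := map[int][]uint64{}
	for q, fps := range queries {
		reqs := make([]request, 0, len(fps))
		for _, fp := range fps {
			reqs = append(reqs, request{fp, q})
		}
		b.scan(reqs, found)
	}
	return found
}

// RunFused multiplexes all queries into a single pass over the block.
func RunFused(b *Block, queries [][]uint64) map[int][]uint64 {
	found := map[int][]uint64{}
	var reqs []request
	for q, fps := range queries {
		for _, fp := range fps {
			reqs = append(reqs, request{fp, q})
		}
	}
	b.scan(reqs, found)
	return found
}

func main() {
	pages := [][]uint64{{10, 20}, {30, 40}, {50, 60}}
	queries := [][]uint64{{10, 30, 50}, {20, 40, 60}, {10, 60}}

	separate := &Block{Pages: pages}
	RunSeparately(separate, queries)
	fused := &Block{Pages: pages}
	RunFused(fused, queries)
	fmt.Println(separate.Decompressions, fused.Decompressions) // 8 3
}
```

With the three toy queries in `main`, the separate passes perform 8 page loads while the fused pass performs 3, one per page. Sorting the merged requests by fingerprint mirrors the block's own sort order, which is what lets a single forward pass serve every query.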

Misc:
Also introduces a bunch of generic-powered iterators. These can probably be optimized (I bet they use dynamic dispatch under the hood), but they're very convenient to prototype with and we can improve them in the future if it proves warranted.
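
For readers unfamiliar with the pattern, here is a minimal sketch of a generics-based iterator in Go. The `Iterator`, `SliceIter`, `MapIter`, and `Collect` names are hypothetical and not necessarily the iterators this PR introduces. Because `MapIter` holds its source as an `Iterator[A]` interface value, each `Next`/`At` call on the source is an interface method call (dynamic dispatch), which is likely the kind of overhead the note above alludes to.

```go
package main

import "fmt"

// Iterator is a hypothetical generic iterator shape: Next advances and
// reports whether a value is available; At returns the current value.
type Iterator[T any] interface {
	Next() bool
	At() T
}

// SliceIter yields the elements of a slice in order.
type SliceIter[T any] struct {
	xs  []T
	cur int
}

func NewSliceIter[T any](xs []T) *SliceIter[T] { return &SliceIter[T]{xs: xs, cur: -1} }

func (it *SliceIter[T]) Next() bool { it.cur++; return it.cur < len(it.xs) }
func (it *SliceIter[T]) At() T      { return it.xs[it.cur] }

// MapIter lazily applies f to each value of an underlying iterator.
// Calls on it.src go through the Iterator interface, i.e. dynamic dispatch.
type MapIter[A, B any] struct {
	src Iterator[A]
	f   func(A) B
}

func NewMapIter[A, B any](src Iterator[A], f func(A) B) *MapIter[A, B] {
	return &MapIter[A, B]{src: src, f: f}
}

func (it *MapIter[A, B]) Next() bool { return it.src.Next() }
func (it *MapIter[A, B]) At() B      { return it.f(it.src.At()) }

// Collect drains an iterator into a slice.
func Collect[T any](it Iterator[T]) []T {
	var out []T
	for it.Next() {
		out = append(out, it.At())
	}
	return out
}

func main() {
	fps := NewSliceIter([]uint64{1, 2, 3})
	doubled := NewMapIter[uint64, uint64](fps, func(x uint64) uint64 { return x * 2 })
	fmt.Println(Collect[uint64](doubled)) // [2 4 6]
}
```

The explicit type arguments on `NewMapIter` and `Collect` are needed because Go does not infer a type parameter from a concrete type that merely implements a generic interface.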

owen-d requested a review from a team as a code owner October 31, 2023 01:22
Signed-off-by: Owen Diehl <[email protected]>
slim-bean (Collaborator) left a comment


LGTM

owen-d merged commit 6bfd2ba into grafana:main Nov 1, 2023
3 checks passed
rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024