Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: incorrect table data disk cache key #16837

Merged
merged 2 commits into from
Nov 14, 2024

Conversation

dantengsky
Copy link
Member

@dantengsky dantengsky commented Nov 13, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Range returned by merged reading should not be used as the corresponding parts of table data cache key,
instead offset and len of column meta should be used.

range.start is not stable and can vary for the same column depending on the query's projection.

For instance:

  • SELECT col1, col2 FROM t;
  • SELECT col2 FROM t;

may result in different ranges for col2, this can lead to cache missing or INCONSISTENCIES

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Ensure consistent cache key through code refactoring.

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

@github-actions github-actions bot added the pr-bugfix this PR patches a bug in codebase label Nov 13, 2024
Copy link

what-the-diff bot commented Nov 13, 2024

PR Summary

  • Improved Stability of Cache Key Generation for Columns
    • The way the keys are created to store our data in a quicker-to-access format (cache) has been updated. Instead of using the less reliable start and end range, we now use a more consistent method involving 'offset' and 'length' details coming from our column information. This change enhances the reliability of storing and retrieving cached information.
    • To ensure this new approach is understood and maintained in the future, detailed comments have been added. This will help avoid inconsistencies in the quick-access data due to changing ranges based on different columns being used.

should use `offset` and `len` of column meta as corresponding parts of
table data cache key
@dantengsky dantengsky force-pushed the fix-incorrect-disk-cache-key branch from b234c56 to c368bad Compare November 13, 2024 14:32
@dantengsky dantengsky changed the title fix: incorrect table data disck cache key fix: incorrect table data disk cache key Nov 13, 2024
@dantengsky dantengsky marked this pull request as ready for review November 13, 2024 16:40
@dantengsky dantengsky requested a review from b41sh November 13, 2024 16:40
@wubx wubx added this pull request to the merge queue Nov 14, 2024
@BohuTANG BohuTANG removed this pull request from the merge queue due to a manual request Nov 14, 2024
@BohuTANG BohuTANG merged commit fb5dbcb into databendlabs:main Nov 14, 2024
77 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-bugfix this PR patches a bug in codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants