[Opt](multi-catalog)Improve performance by introducing cache of list directory files when getting split for each query. #43913
Open: kaka11chen wants to merge 1 commit into apache:master from kaka11chen:query_directory_list_cache
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
morningman reviewed on Nov 15, 2024:
fe/fe-core/src/main/java/org/apache/doris/fs/TransactionScopeCachingDirectoryListerFactory.java
fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/PhysicalPlanTranslator.java (outdated)
kaka11chen force-pushed the query_directory_list_cache branch from 65ef2e1 to ecb2339 on November 27, 2024 02:46
run buildall
TPC-H: Total hot run time: 39843 ms
TPC-DS: Total hot run time: 196094 ms
ClickBench: Total hot run time: 32.26 s
kaka11chen force-pushed the query_directory_list_cache branch from ecb2339 to 9c415d4 on November 27, 2024 11:20
run buildall
TPC-H: Total hot run time: 39811 ms
TPC-DS: Total hot run time: 190713 ms
ClickBench: Total hot run time: 31.61 s
kaka11chen force-pushed the query_directory_list_cache branch from 9c415d4 to 2156ce4 on November 27, 2024 14:58
run buildall
TPC-H: Total hot run time: 40191 ms
TPC-DS: Total hot run time: 191340 ms
ClickBench: Total hot run time: 32.37 s
kaka11chen force-pushed the query_directory_list_cache branch 2 times, most recently from adc3ee6 to d0f5f61 on November 28, 2024 03:02
run buildall
TPC-H: Total hot run time: 40086 ms
TPC-DS: Total hot run time: 191103 ms
ClickBench: Total hot run time: 32.49 s
kaka11chen force-pushed the query_directory_list_cache branch from d0f5f61 to 6be0114 on November 29, 2024 11:48
run buildall
TPC-H: Total hot run time: 40430 ms
TPC-DS: Total hot run time: 191182 ms
ClickBench: Total hot run time: 32.25 s
kaka11chen force-pushed the query_directory_list_cache branch from 6be0114 to 7333767 on December 3, 2024 06:01
TPC-DS: Total hot run time: 198081 ms
ClickBench: Total hot run time: 32.99 s
morningman added the dev/2.1.x-experimental and dev/3.0.x-experimental labels and removed the dev/3.0.x label on Dec 30, 2024
kaka11chen force-pushed the query_directory_list_cache branch from 7333767 to 5fd4a75 on December 30, 2024 07:06
run buildall
kaka11chen force-pushed the query_directory_list_cache branch from 5fd4a75 to c6e693c on December 30, 2024 07:36
run buildall
TPC-H: Total hot run time: 32568 ms
TPC-DS: Total hot run time: 197007 ms
ClickBench: Total hot run time: 32.17 s
starocean999 previously approved these changes on Jan 2, 2025
github-actions bot added the approved label ("Indicates a PR has been approved by one committer.") on Jan 2, 2025
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
morningman previously approved these changes on Jan 3, 2025
morningman force-pushed the query_directory_list_cache branch from c6e693c to 855a349 on January 3, 2025 08:54
run buildall
…directory files when getting split for each query.
kaka11chen dismissed stale reviews from morningman and starocean999 via 4aea992 on January 3, 2025 09:25
kaka11chen force-pushed the query_directory_list_cache branch from 855a349 to 4aea992 on January 3, 2025 09:25
github-actions bot removed the approved label on Jan 3, 2025
run buildall
TPC-H: Total hot run time: 32975 ms
TPC-DS: Total hot run time: 190500 ms
ClickBench: Total hot run time: 30.84 s
What problem does this PR solve?
Following Trino's approach, this PR adds a query-level cache of directory listings, shared across the Hive tables in a query, that is used when generating the file split list of each partition.
Files should have the same visibility throughout a query, so every scan of the same table within that query should see the same split list for each partition. Caching at query scope is therefore reasonable, and the cache should be enabled by default.
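The idea of listing each partition directory at most once per query can be sketched as follows. This is a minimal illustration, not the code in this PR: `DirectoryLister`, `QueryScopedCachingLister`, and the example paths are hypothetical names, and the real lister works with file system objects rather than plain strings.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class QueryScopedListingSketch {
    /** Hypothetical stand-in for a remote file system listing call. */
    interface DirectoryLister {
        List<String> listFiles(String partitionPath);
    }

    /** Caches listings for the lifetime of one query; a new instance is created per query. */
    static final class QueryScopedCachingLister implements DirectoryLister {
        private final DirectoryLister delegate;
        private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

        QueryScopedCachingLister(DirectoryLister delegate) {
            this.delegate = delegate;
        }

        @Override
        public List<String> listFiles(String partitionPath) {
            // The delegate is invoked only when the path has no cached listing yet.
            return cache.computeIfAbsent(partitionPath, delegate::listFiles);
        }
    }

    public static void main(String[] args) {
        AtomicInteger remoteCalls = new AtomicInteger();
        DirectoryLister remote = path -> {
            remoteCalls.incrementAndGet();
            return List.of(path + "/part-0.orc", path + "/part-1.orc");
        };

        // Two scans of the same partition inside one query share one listing.
        DirectoryLister perQuery = new QueryScopedCachingLister(remote);
        List<String> first = perQuery.listFiles("/warehouse/t/dt=2024-11-15");
        List<String> second = perQuery.listFiles("/warehouse/t/dt=2024-11-15");

        System.out.println(first.equals(second)); // prints true
        System.out.println(remoteCalls.get());    // prints 1
    }
}
```

Because the cache lives only as long as the per-query object, a later query lists the directory again and picks up new files, while scans within one query stay consistent with each other.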
In Trino the mechanism works at the transaction level: a transaction sees a consistent view of a table, so the class is named `TransactionScopeCachingDirectoryLister`. Doris keeps this name so that the cache can be extended to a transaction concept in the future.
In addition, the Caffeine cache currently used by Doris evicts in stages, so in this scenario an existing entry in the window region may be evicted immediately after its weight is updated. For that reason this PR introduces `EvictableCache`, which is based on Guava and evicts using segmented LRU.
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)
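The eviction behavior that the description asks for (an entry whose weight was just updated should not be the first one evicted) can be illustrated with a weight-bounded LRU. This is a simplified, single-segment sketch built on `LinkedHashMap`; it is not Guava's implementation and not the `EvictableCache` added by this PR, and `WeightedLruCache` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class WeightedLruSketch {
    /** Weight-bounded cache that evicts the least recently used entries first. */
    static final class WeightedLruCache<K, V> {
        private final long maxWeight;
        private long totalWeight;
        // accessOrder = true keeps iteration order from least to most recently used.
        private final LinkedHashMap<K, V> entries = new LinkedHashMap<>(16, 0.75f, true);
        private final Map<K, Long> weights = new HashMap<>();

        WeightedLruCache(long maxWeight) {
            this.maxWeight = maxWeight;
        }

        synchronized void put(K key, V value, long weight) {
            Long oldWeight = weights.put(key, weight);
            if (oldWeight != null) {
                totalWeight -= oldWeight;
            }
            totalWeight += weight;
            entries.put(key, value); // inserting or updating marks the key as most recently used
            evictIfNeeded();
        }

        synchronized V get(K key) {
            return entries.get(key); // a read also refreshes recency
        }

        synchronized boolean contains(K key) {
            return entries.containsKey(key); // does not change recency
        }

        private void evictIfNeeded() {
            Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
            while (totalWeight > maxWeight && it.hasNext()) {
                Map.Entry<K, V> eldest = it.next();
                totalWeight -= weights.remove(eldest.getKey());
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        WeightedLruCache<String, String> cache = new WeightedLruCache<>(10);
        cache.put("a", "listing-a", 4);
        cache.put("b", "listing-b", 4);
        cache.put("a", "listing-a-more-files", 5); // weight update, total is now 9
        cache.put("c", "listing-c", 3);            // total 12 exceeds 10, oldest entry goes

        System.out.println(cache.contains("a")); // prints true
        System.out.println(cache.contains("b")); // prints false
        System.out.println(cache.contains("c")); // prints true
    }
}
```

In this sketch the updated entry `a` survives because the update makes it the most recently used key, and the untouched entry `b` is evicted instead.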