[core] Introduce level0FileCount for partitions table #4074

Closed
wants to merge 1 commit into from

MonsterChenzhuo
Contributor

Purpose

Linked issue: close #xxx

Tests

PartitionsTableTest#testLevel0FileCountValue

API and Format

Documentation

@JingsongLi
Contributor

Thanks @MonsterChenzhuo for the contribution.

But what is the use case for the level-0 file count?

@MonsterChenzhuo
Contributor Author

MonsterChenzhuo commented Aug 29, 2024

When a table has deletion vectors (DelVector) enabled, users can check the level-0 file count to quickly determine whether data has been written and whether compaction has completed, especially when a query finds no data for the current partition.

With the $files system table, by contrast, the results are less intuitive: users usually have to write an aggregation query to interpret them.
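For comparison, here is a minimal sketch of the aggregation a user currently has to do by hand over `$files` rows to get an L0 file count per partition. Each row is modelled as a dict with `partition`, `bucket` and `level` keys, loosely following the `$files` system table columns; the dict shape, the sample values and the helper name `level0_count_per_partition` are assumptions for illustration, not Paimon API.

```python
from collections import Counter
from typing import Iterable, Mapping


def level0_count_per_partition(files: Iterable[Mapping]) -> dict:
    """Count level-0 data files per partition.

    Each row is a hypothetical stand-in for one `$files` row and is assumed
    to carry at least 'partition' and 'level' keys.
    """
    counts: Counter = Counter()
    for row in files:
        # Register every partition, so fully compacted ones report 0.
        counts.setdefault(row["partition"], 0)
        if row["level"] == 0:
            counts[row["partition"]] += 1
    return dict(counts)


# Example rows (made up for illustration).
files = [
    {"partition": "dt=2024-09-01", "bucket": 0, "level": 0},
    {"partition": "dt=2024-09-01", "bucket": 1, "level": 0},
    {"partition": "dt=2024-09-01", "bucket": 1, "level": 1},
    {"partition": "dt=2024-08-31", "bucket": 0, "level": 1},
]
print(level0_count_per_partition(files))
# {'dt=2024-09-01': 2, 'dt=2024-08-31': 0}
```

A partition reporting 0 has no files left in level 0, which is the signal the proposed column was meant to expose directly.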

@JingsongLi
Contributor

But doesn't this depend on each bucket? We would need to know maxLevel0FilesInBucket and avgLevel0FilesInBucket; maybe it is better to just expose them as metrics.
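The concern here is that the L0 file count is really a per-bucket property: a partition-level total can hide a single bucket with a long L0 backlog. Below is a minimal sketch of the two per-bucket aggregates named above, computed from the same kind of hypothetical file rows (dicts with `partition`, `bucket` and `level`). It is not how Paimon's metrics are implemented; the function name `level0_bucket_stats` and the choice to average only over buckets that appear in the file list are assumptions.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def level0_bucket_stats(files: Iterable[Mapping]) -> Tuple[int, float]:
    """Return (max, avg) number of level-0 files per bucket.

    Assumptions: a bucket is identified by (partition, bucket), and only
    buckets that appear in `files` are counted; a bucket whose files are
    all above level 0 contributes 0 to the average.
    """
    per_bucket: Counter = Counter()
    for row in files:
        key = (row["partition"], row["bucket"])
        per_bucket.setdefault(key, 0)
        if row["level"] == 0:
            per_bucket[key] += 1
    if not per_bucket:
        return 0, 0.0
    counts = list(per_bucket.values())
    return max(counts), sum(counts) / len(counts)


# Example: one bucket has a backlog of three L0 files.
files = [
    {"partition": "dt=2024-09-01", "bucket": 0, "level": 0},
    {"partition": "dt=2024-09-01", "bucket": 0, "level": 0},
    {"partition": "dt=2024-09-01", "bucket": 0, "level": 0},
    {"partition": "dt=2024-09-01", "bucket": 1, "level": 1},
    {"partition": "dt=2024-08-31", "bucket": 0, "level": 0},
]
max_l0, avg_l0 = level0_bucket_stats(files)
print(max_l0, round(avg_l0, 2))
# 3 1.33
```

The gap between the max (3) and the average (about 1.33) is exactly what a single per-partition count would not show.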

@MonsterChenzhuo
Contributor Author

MonsterChenzhuo commented Sep 2, 2024

For real-time writes to a Paimon table, we run compaction continuously and monitor maxLevel0FilesInBucket and avgLevel0FilesInBucket through metrics. For infrequently updated tables (such as T+1 batch loads) that need high throughput at low resource cost, however, we run compaction offline, and monitoring the number of L0 files through metrics is less convenient than querying a system table.

An operational workflow could look like this:
First, query the system table to check whether any L0 files remain in the partition:
SELECT * FROM default.T$partitions;
If there are, call the compaction stored procedure:
CALL sys.compact(table => 'default.T');
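The two-step workflow above reduces to a simple check. This sketch assumes each `$partitions` row carries a hypothetical `level0_file_count` field (the column proposed in this PR, which was not merged), and it only builds the SQL string rather than executing it; `compaction_statement` is an illustrative name, not part of Paimon.

```python
from typing import Iterable, Mapping, Optional


def compaction_statement(table: str, partitions: Iterable[Mapping]) -> Optional[str]:
    """Return a compaction CALL statement if any partition still has L0 files.

    `partitions` stands in for the rows of `SELECT * FROM <table>$partitions`.
    The 'level0_file_count' key is the column proposed in this PR, which was
    never merged, so it is hypothetical here.
    """
    if any(row["level0_file_count"] > 0 for row in partitions):
        # Spark SQL named-argument form of Paimon's compact procedure.
        return f"CALL sys.compact(table => '{table}')"
    return None


rows = [
    {"partition": "dt=2024-09-01", "level0_file_count": 2},
    {"partition": "dt=2024-08-31", "level0_file_count": 0},
]
print(compaction_statement("default.T", rows))
# CALL sys.compact(table => 'default.T')
```

Returning `None` when every partition is already compacted lets a scheduler skip the procedure call entirely for T+1 tables that need no work.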

@JingsongLi
Contributor

This seems like a fairly specific use case; let's wait for future requirements.

JingsongLi closed this on Oct 30, 2024