-
-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleaning up stale parts #107
Comments
No, we haven't do this. To be fixed |
In that case this chat with GPT that explains the naming might be useful: https://chat.openai.com/share/94a7fa2d-5f73-4f9f-b2ba-681ed324ae35 |
Seems like we can find the part with the highest LEVEL, check its MAX_BLOCK_NUMBER, and delete all parts with MAX_BLOCK_NUMBER less than that. |
There should be some code in clickhouse doing that in background thread pool. We just need to find it out, run it immediately on |
Yes, I'm interested and will try doing it today! :) |
The logic seems to be in I'm new to C++ so I wonder if there is a faster way to build ClickHouse / chdb just for tests? |
Hey @ruslandoga I'm afraid the first build is as painful as you're experiencing (our action takes > 5h to build) but if you add ccache to the mix the next compile and link rounds with minor modifications will be much faster. If you're on our discord feel free to ping on the dev channel and we'll try to assist. |
I tried to find your discord server but failed :) |
Imangani == qxip |
Thank you @auxten, but google doesn't return anything for "qxip discord" either. |
It doesn't seem like this link leads me anywhere :) I think it might be specific to your user account. Or maybe I need to be invited first. |
Hello @ruslandoga the link and invite to our discord is on the chdb readme in the contact section. Once you join you'll find us all at once on the chdb channels! Looking forward to you joining! |
A small note on the suggested implementation:
It seems like ClickHouse is cleaning old parts after ~eight minutes since they become inactive (i.e. stopped being referenced). Apparently, it's done this way to make sure the dirty pages have been fsynced (assuming dirty_writeback_centisecs = 5 minutes) and that in case of a crash (before fsync) the data could be restored. So I guess the custom implementation would need to call fsync on the new parts and only then clean the old parts. I also found in the docs that the old parts can be deleted with SQL, so technically, what I initially wanted could be achieved like this: INSERT INTO events FORMAT RowBinary <...rowbinary...>;
-- in the background process, every few minutes
SELECT * FROM system.parts WHERE table = 'events'; -- check how many parts there are, if too many, run OPTIMIZE
OPTIMIZE TABLE events;
-- also if there are inactive parts older than 8 minutes, drop them
ALTER TABLE events DROP PART '<part_id goes here>'; |
@ruslandoga Nice, I think the |
There are MergeTree settings which control usage of The relevant code is here |
There is also documentation on how to build ClickHouse - https://clickhouse.com/docs/en/development/build |
I don't see any related issues with the build process and we have chdb-builder helper |
Had no idea you have it. Just read your comment and thought that this should be covered by build documentation. Just tried to help |
Would this be safe to do? |
👋
Just wanted to quickly check if
chdb
automatically cleans stale MergeTree parts afteroptimize table
commands. And if so, how does it work?The text was updated successfully, but these errors were encountered: