
Cleaning up stale parts #107

Open
ruslandoga opened this issue Sep 7, 2023 · 20 comments
Labels
bug Something isn't working help wanted Extra attention is needed Session

Comments

@ruslandoga

ruslandoga commented Sep 7, 2023

👋

Just wanted to quickly check if chdb automatically cleans stale MergeTree parts after optimize table commands. And if so, how does it work?

@ruslandoga ruslandoga added the question Further information is requested label Sep 7, 2023
@auxten
Member

auxten commented Sep 7, 2023

No, we haven't done this yet. To be fixed.

@ruslandoga
Author

In that case this chat with GPT that explains the naming might be useful: https://chat.openai.com/share/94a7fa2d-5f73-4f9f-b2ba-681ed324ae35

@ruslandoga
Author

ruslandoga commented Sep 7, 2023

Seems like we can find the part with the highest LEVEL, check its MAX_BLOCK_NUMBER, and delete all parts with MAX_BLOCK_NUMBER less than that.
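For anyone following along, a rough (untested) sketch of that inspection against ClickHouse's `system.parts` table — `events` is just a placeholder table name:

```sql
-- The most-merged part per partition: highest level wins
SELECT name, partition_id, level, min_block_number, max_block_number
FROM system.parts
WHERE table = 'events' AND active
ORDER BY level DESC
LIMIT 1;

-- Candidate stale parts: inactive parts whose block range
-- is already covered by a merged (higher-level) part
SELECT name, max_block_number
FROM system.parts
WHERE table = 'events' AND active = 0;
```

Note that the safety of deleting by `max_block_number` alone is the open question discussed later in this thread — the comparison would at least need to be scoped per partition.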

@auxten
Member

auxten commented Sep 8, 2023

There should be some code in ClickHouse doing that in a background thread pool. We just need to find it and run it immediately on OPTIMIZE. Any interest in making a patch? @ruslandoga

@auxten auxten added bug Something isn't working help wanted Extra attention is needed and removed question Further information is requested labels Sep 8, 2023
@ruslandoga
Author

Yes, I'm interested and will try doing it today! :)

@ruslandoga
Author

ruslandoga commented Sep 9, 2023

The logic seems to be in MergeTreeData::clearOldPartsFromFilesystem, but I'm a bit stuck compiling ClickHouse — it's already taken over four hours and I'm at [7671/8349] Building CXX object src/CMakeFiles/dbms.dir/Interpreters/DatabaseAndTableWithAlias.cpp.o 😅

I'm new to C++ so I wonder if there is a faster way to build ClickHouse / chdb just for tests?

@lmangani
Contributor

lmangani commented Sep 9, 2023

Hey @ruslandoga I'm afraid the first build is as painful as you're experiencing (our action takes > 5h to build) but if you add ccache to the mix the next compile and link rounds with minor modifications will be much faster. If you're on our discord feel free to ping on the dev channel and we'll try to assist.

@ruslandoga
Author

I tried to find your discord server but failed :)

@auxten
Member

auxten commented Sep 11, 2023

> I tried to find your discord server but failed :)

lmangani == qxip

@ruslandoga
Author

Thank you @auxten, but google doesn't return anything for "qxip discord" either.

@auxten
Member

auxten commented Sep 11, 2023

@ruslandoga
Author

ruslandoga commented Sep 11, 2023

It doesn't seem like this link leads me anywhere :) I think it might be specific to your user account. Or maybe I need to be invited first.

@lmangani
Contributor

Hello @ruslandoga, the link and invite to our Discord are in the contact section of the chdb README. Once you join, you'll find us all on the chdb channels. Looking forward to seeing you there!

@ruslandoga ruslandoga changed the title Cleaning up stale blocks Cleaning up stale parts Oct 11, 2023
@ruslandoga
Author

ruslandoga commented Oct 11, 2023

A small note on the suggested implementation:

> We just need to find it out, run it immediately on Optimize.

It seems like ClickHouse cleans old parts ~eight minutes after they become inactive (i.e. stop being referenced). Apparently, it's done this way to make sure the dirty pages have been fsynced (assuming dirty_writeback_centisecs = 5 minutes) and that in case of a crash (before fsync) the data could be restored. So I guess a custom implementation would need to fsync the new parts and only then clean up the old ones.
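For context, the ~eight-minute delay corresponds to the MergeTree setting `old_parts_lifetime` (480 seconds by default). If one accepts the crash-recovery trade-off described above, it can be shortened per table — a sketch, with `events` as a placeholder:

```sql
-- Remove inactive parts after ~10 seconds instead of ~8 minutes;
-- trades away the crash-recovery window described above
ALTER TABLE events MODIFY SETTING old_parts_lifetime = 10;
```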

I also found in the docs that the old parts can be deleted with SQL, so technically, what I initially wanted could be achieved like this:

```sql
INSERT INTO events FORMAT RowBinary <...rowbinary...>;

-- in a background process, every few minutes:
SELECT * FROM system.parts WHERE table = 'events'; -- check how many parts there are; if too many, run OPTIMIZE
OPTIMIZE TABLE events;
-- also, if there are inactive parts older than 8 minutes, drop them
ALTER TABLE events DROP PART '<part_id goes here>';
```

@auxten
Member

auxten commented Oct 13, 2023

@ruslandoga Nice. I think the 8-minute delay is reasonable in ClickHouse, but not in an embedded database. I would patch chdb to run the cleanup automatically.

@devcrafter

There are MergeTree settings that control the usage of fsync(). Please check min_rows_to_fsync_after_merge and/or min_compressed_bytes_to_fsync_after_merge.

The relevant code is here
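If it helps, those are per-table MergeTree settings, so a merge could be forced to fsync its result like this (an untested sketch; `events` is a placeholder table):

```sql
-- Both settings default to 0 (disabled); setting them to 1
-- makes effectively every merge fsync the resulting part
ALTER TABLE events
    MODIFY SETTING min_rows_to_fsync_after_merge = 1,
                   min_compressed_bytes_to_fsync_after_merge = 1;
```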

@devcrafter

> Hey @ruslandoga I'm afraid the first build is as painful as you're experiencing (our action takes > 5h to build) but if you add ccache to the mix the next compile and link rounds with minor modifications will be much faster. If you're on our discord feel free to ping on the dev channel and we'll try to assist.

There is also documentation on how to build ClickHouse - https://clickhouse.com/docs/en/development/build

@lmangani
Contributor

lmangani commented Oct 26, 2023

> There is also documentation on how to build ClickHouse - https://clickhouse.com/docs/en/development/build

I don't see any related issues with the build process, and we have the chdb-builder helper.

@devcrafter

devcrafter commented Oct 26, 2023

> There is also documentation on how to build ClickHouse - https://clickhouse.com/docs/en/development/build

> I don't see any related issues with the build process and we have chdb-builder helper

Had no idea you have it. I just read the comment and thought that this should be covered by the build documentation. Just trying to help.

@poundifdef

> Seems like we can find the part with the highest LEVEL, check its MAX_BLOCK_NUMBER, and delete all parts with MAX_BLOCK_NUMBER less than that.

Would this be safe to do?
