Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core][Flink][Spark] Add deletedFileTotalLenInBytes in result of OrphanFilesClean #4545

Conversation

wwj6591812
Copy link
Contributor

@wwj6591812 wwj6591812 commented Nov 19, 2024

Purpose

In my company, after we run a RemoveOrphanFilesAction, we not only want to know the number of orphan files that have been cleared, but also want to know how much capacity has been cleared.

This pr add deletedFileTotalLenInBytes in result of OrphanFilesClean.

Linked issue: close #xxx

Tests

API and Format

Documentation

@wwj6591812 wwj6591812 force-pushed the introduce_CleanOrphanFilesResult_for_support_deletedFileTotalSizeInBytes_1119 branch from 10a4a45 to 94a868e Compare November 19, 2024 01:55
Copy link
Member

@xuzifu666 xuzifu666 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@wwj6591812 wwj6591812 force-pushed the introduce_CleanOrphanFilesResult_for_support_deletedFileTotalSizeInBytes_1119 branch 2 times, most recently from 0361a11 to da67df0 Compare November 21, 2024 02:39
@wwj6591812
Copy link
Contributor Author

@JingsongLi
Hi, please cc, Thx.
The failed UT was not caused by my modifications.

@wwj6591812 wwj6591812 changed the title [core] add deletedFileTotalSizeInBytes in result of OrphanFilesClean [Core][Flink][Spark] Add deletedFileTotalSizeInBytes in result of OrphanFilesClean Nov 24, 2024
deleteFiles::add,
p -> {
try {
deletedFilesSizeInBytes.addAndGet(fileIO.getFileSize(p));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getFileSize should always be together with delete, otherwise, I feel there may be issues with execution efficiency, getFileSize should be executed parallelismly too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getFileSize should always be together with delete, otherwise, I feel there may be issues with execution efficiency, getFileSize should be executed parallelismly too.

@JingsongLi Very thanks for your suggestions.

I would like to confirm with you the code improvement plan:
Are you means that getFileStatus execute only once in OrphanFilesClean#createFileCleaner, and fileCleaner can direct return the size of deleted file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just think about it, maybe we don't need to invoke getFileSize, the files come from listStatus, it already contains file size.

Copy link
Contributor Author

@wwj6591812 wwj6591812 Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, Good idea.
I will have a try.
Thx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just think about it, maybe we don't need to invoke getFileSize, the files come from listStatus, it already contains file size.

@JingsongLi
Hi,please cc, THX.

@wwj6591812 wwj6591812 force-pushed the introduce_CleanOrphanFilesResult_for_support_deletedFileTotalSizeInBytes_1119 branch 2 times, most recently from 366f382 to f97585e Compare November 25, 2024 10:35
@wwj6591812 wwj6591812 force-pushed the introduce_CleanOrphanFilesResult_for_support_deletedFileTotalSizeInBytes_1119 branch from f97585e to 66d1f25 Compare November 25, 2024 13:41
@wwj6591812 wwj6591812 changed the title [Core][Flink][Spark] Add deletedFileTotalSizeInBytes in result of OrphanFilesClean [Core][Flink][Spark] Add deletedFileTotalLenInBytes in result of OrphanFilesClean Nov 26, 2024
Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wwj6591812 , looks good to me!

@JingsongLi JingsongLi merged commit 8d57d3d into apache:master Nov 26, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants