Search before asking

I searched in the issues and found nothing similar.

Motivation

When using dedicated compaction jobs in production, we've found that the write-only job and the compaction job fail over every 2 or 3 days, even though the remote filesystem supports atomic rename.
The main cause is a FileAlreadyExistsException:
Checking the recent Hadoop rename implementation, we found that the rename API throws FileAlreadyExistsException by default instead of returning false.
https://github.com/apache/hadoop/blob/branch-3.3.6/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirRenameOp.java
IMHO, this can be improved by catching certain exceptions in tryCommitOnce and returning false to the upper caller.
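A minimal sketch of the idea, assuming the commit path boils down to a FileSystem.rename call; the class and method names below are illustrative, not Paimon's actual tryCommitOnce code:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileAlreadyExistsException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RenameCommitSketch {

    /**
     * Tries to commit a snapshot by renaming the temporary file to its final
     * location. If the destination already exists, treat it as a failed commit
     * attempt (return false) instead of letting the exception fail the job.
     */
    public static boolean commitByRename(FileSystem fs, Path tmp, Path target) throws IOException {
        try {
            // Some HDFS versions throw FileAlreadyExistsException here
            // rather than returning false when the target exists.
            return fs.rename(tmp, target);
        } catch (FileAlreadyExistsException e) {
            // Another writer already committed this snapshot; report a
            // conflicting commit rather than a fatal error.
            return false;
        }
    }
}
```

Returning false here would let the caller reuse its existing conflict-handling path, the same way it already handles a rename that plainly returns false.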
Solution
No response
Anything else?
No response
Are you willing to submit a PR?