
Improve duplicate process time - Safe vs Fast #141

Open
didierga opened this issue Dec 1, 2017 · 2 comments


didierga commented Dec 1, 2017

As I understand it, fslint currently does a double check for duplicates, using md5sum and then sha1sum, to avoid md5sum collisions.

This double check is time consuming, and in some cases, depending on the number and the "value" of the files, I would prefer a faster single-check mode with no sha1sum pass.

So I suggest implementing two modes: "Safe", the default, with the double check, and "Fast" with a single check.
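
For illustration only, here is a minimal sketch of what the two modes might look like; the function names and the size pre-grouping step are mine, not fslint's actual code. Files are grouped by size, then by md5sum; "Safe" confirms each md5 group with sha1 before reporting, while "Fast" trusts the md5 match alone.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, algo):
    """Hash a file in chunks so large files are not read into memory at once."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def group_by(paths, key):
    """Group paths by key and keep only groups with more than one member."""
    groups = defaultdict(list)
    for p in paths:
        groups[key(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]

def find_duplicates(paths, mode="safe"):
    duplicates = []
    # Cheap pre-filter: only files of identical size can be duplicates.
    for size_group in group_by(paths, os.path.getsize):
        for md5_group in group_by(size_group, lambda p: file_digest(p, "md5")):
            if mode == "fast":
                # "Fast": a single hash pass, no sha1 confirmation.
                duplicates.append(md5_group)
            else:
                # "Safe": confirm the match with an independent second hash.
                duplicates.extend(group_by(md5_group, lambda p: file_digest(p, "sha1")))
    return duplicates
```

The only difference between the two modes is the extra sha1 pass over files that already collided on md5, which is exactly the work the "Fast" mode would skip.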

pixelb (Owner) commented Dec 2, 2017

Agreed, the double checking could probably be done more cleverly


emergie commented Feb 18, 2018

I have the same problem.

Right now I'm in the process of sorting about 40T of data on spinning rust.
fslint is a great help, but for my needs md5 & sha1 verification is overkill.

I've created PR #145 with a change that allows the user to tune the accuracy/safety of duplicate verification to a suitable level.
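
Purely to illustrate the idea of a tunable level (this is not the actual change in PR #145; the level numbers and names are hypothetical), a sketch could look like this:

```python
import hashlib

# Hypothetical mapping from a verification level to the hash passes required.
HASHES_BY_LEVEL = {
    1: ["md5"],          # faster: a single hash pass
    2: ["md5", "sha1"],  # safer: the current double check
}

def same_content(path_a, path_b, level=2):
    """Compare two same-size candidate files at the chosen verification level."""
    for algo in HASHES_BY_LEVEL[level]:
        digests = []
        for path in (path_a, path_b):
            h = hashlib.new(algo)
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests.append(h.hexdigest())
        if digests[0] != digests[1]:
            return False
    return True
```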
