
Improve duplicate process time - Safe vs Fast #141

Open
didierga opened this issue Dec 1, 2017 · 2 comments


didierga commented Dec 1, 2017

As I understand it, fslint currently does a double check for duplicates, using md5sum and then sha1sum, to avoid md5sum collisions.

This double check is time consuming, and in some cases, depending on the number and the "value" of the files, I would prefer a faster single-check mode with no sha1sum pass.

So I suggest implementing two modes: "Safe", the default, with the double check, and "Fast" with a single check.
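
For illustration only, here is a minimal sketch of what the two modes might look like; the function names and the size pre-grouping step are mine, not fslint's actual code. Files are grouped by size, then by md5sum; "Safe" confirms each md5 group with sha1 before reporting, while "Fast" trusts the md5 match alone.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, algo):
    """Hash a file in chunks so large files are not read into memory at once."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def group_by(paths, key):
    """Group paths by key and keep only groups with more than one member."""
    groups = defaultdict(list)
    for p in paths:
        groups[key(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]

def find_duplicates(paths, mode="safe"):
    duplicates = []
    # Cheap pre-filter: only files of identical size can be duplicates.
    for size_group in group_by(paths, os.path.getsize):
        for md5_group in group_by(size_group, lambda p: file_digest(p, "md5")):
            if mode == "fast":
                # "Fast": a single hash pass, no sha1 confirmation.
                duplicates.append(md5_group)
            else:
                # "Safe": confirm the match with an independent second hash.
                duplicates.extend(group_by(md5_group, lambda p: file_digest(p, "sha1")))
    return duplicates
```

The only difference between the two modes is the extra sha1 pass over files that already collided on md5, which is exactly the work the "Fast" mode would skip.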

pixelb (Owner) commented Dec 2, 2017

Agreed, the double checking could probably be done more cleverly


emergie commented Feb 18, 2018

I have the same problem.

Right now I'm in the process of sorting about 40T of data on spinning rust.
fslint is a great help, but for my needs md5 & sha1 verification is overkill.

I've created PR #145 with a change that allows the user to tune the accuracy/safety of duplicate verification to a suitable level.
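
Purely to illustrate the idea of a tunable level (this is not the actual change in PR #145; the level numbers and names are hypothetical), a sketch could look like this:

```python
import hashlib

# Hypothetical mapping from a verification level to the hash passes required.
HASHES_BY_LEVEL = {
    1: ["md5"],          # faster: a single hash pass
    2: ["md5", "sha1"],  # safer: the current double check
}

def same_content(path_a, path_b, level=2):
    """Compare two same-size candidate files at the chosen verification level."""
    for algo in HASHES_BY_LEVEL[level]:
        digests = []
        for path in (path_a, path_b):
            h = hashlib.new(algo)
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests.append(h.hexdigest())
        if digests[0] != digests[1]:
            return False
    return True
```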
