Update video duplicate finder and more #1425

Open: wants to merge 9 commits into base: master

@qarmin qarmin commented Dec 29, 2024

  • Updated vid_dup_lib_finder to the latest version.
  • Used the jxl -> image-rs converter provided by the jxl library.
  • Used a bigger buffer size to speed up checking files on both HDD and SSD (the biggest gains are on HDD).

Performance comparison of using an Array and a Vector with specific buffer sizes for reading files from disk and calculating their hashes.
This is a fairly realistic scenario; it also uses rayon, which sometimes messes with the predictability of the results.
My computer has quite a good CPU but a cheap SATA SSD, so the results show disk

(Time to read files and calculate hashes in parallel, smaller is better)

| Name | 250000 files ~50 KB (SSD) | 170 files 5 MB-150 MB (SSD) | 1 file 0.9 GB (SSD) | 6200 files 50 KB-50 MB (HDD) | 1 file 671 MB (HDD) |
|---|---|---|---|---|---|
| Array 16KB | Base | Base | Base | Base | Base |
| Vector 16KB | 0% | 0% | 0% | 0% | 0% |
| Vector 1MB | -7% | -4% | -16% | -45% | 0% |
| Thread local Vector 1MB | -12% | -4% | -16% | -45% | 0% |
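For readers unfamiliar with the three buffer strategies compared above, here is a rough sketch of what each one might look like. It is not the code from this PR: every function name is hypothetical, the FNV-1a hash is only a stand-in for whatever hash the project uses, and the reader is generic so the sketch can be tried on in-memory data as well as files.

```rust
use std::cell::RefCell;
use std::fs::File;
use std::io::{self, ErrorKind, Read};
use std::path::Path;

// Buffer sizes taken from the table: 16 KB and 1 MB.
const SMALL_BUF: usize = 16 * 1024;
const LARGE_BUF: usize = 1024 * 1024;

thread_local! {
    // One 1 MB buffer per worker thread, allocated on first use and then reused.
    static THREAD_BUF: RefCell<Vec<u8>> = RefCell::new(vec![0u8; LARGE_BUF]);
}

// Reads `reader` to the end through `buf` and returns a 64-bit FNV-1a hash.
// FNV-1a is a placeholder hash chosen only to keep the sketch dependency-free.
fn hash_reader<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<u64> {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    loop {
        let n = match reader.read(buf) {
            Ok(0) => break,
            Ok(n) => n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        for &byte in &buf[..n] {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    Ok(hash)
}

// "Array 16KB": fixed-size buffer on the stack.
fn hash_with_array<R: Read>(reader: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; SMALL_BUF];
    hash_reader(reader, &mut buf)
}

// "Vector 16KB" / "Vector 1MB": heap buffer allocated for every call.
fn hash_with_vec<R: Read>(reader: &mut R, size: usize) -> io::Result<u64> {
    let mut buf = vec![0u8; size];
    hash_reader(reader, &mut buf)
}

// "Thread local Vector 1MB": heap buffer reused across calls on the same thread.
fn hash_with_thread_local<R: Read>(reader: &mut R) -> io::Result<u64> {
    THREAD_BUF.with(|cell| {
        let mut guard = cell.borrow_mut();
        hash_reader(reader, guard.as_mut_slice())
    })
}

// Convenience wrapper for a file on disk, using the thread-local buffer.
fn hash_file(path: &Path) -> io::Result<u64> {
    let mut file = File::open(path)?;
    hash_with_thread_local(&mut file)
}
```

All three variants produce the same hash for the same input; only the allocation pattern and the size of each read differ. With a thread pool such as rayon, the thread-local variant allocates once per worker thread instead of once per file.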

I tried using locks to read at most 1 file at a time from the HDD, but there were no performance gains.
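The PR does not show how the lock experiment was written, so the following is only one plausible shape for it: a global mutex held for the duration of each read, so that worker threads never read from the disk concurrently. `DISK_LOCK`, `read_serialized` and `total_bytes_parallel` are hypothetical names, plain `std::thread` stands in for rayon, and reading the whole input into memory is a simplification that would not suit very large files.

```rust
use std::io::{self, Read};
use std::sync::Mutex;
use std::thread;

// Hypothetical global lock: at most one thread reads at any moment.
static DISK_LOCK: Mutex<()> = Mutex::new(());

// Reads the whole input while holding the lock. The guard is dropped when the
// function returns, so later processing (e.g. hashing) can run in parallel.
fn read_serialized<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    // A poisoned lock guards no data here, so it is safe to keep using it.
    let _guard = DISK_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    let mut data = Vec::new();
    reader.read_to_end(&mut data)?;
    Ok(data)
}

// Spawns one worker per input; reads are serialized, the rest is parallel.
fn total_bytes_parallel(inputs: Vec<Vec<u8>>) -> usize {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            thread::spawn(move || {
                let data = read_serialized(&mut input.as_slice())
                    .expect("in-memory read cannot fail");
                // Hashing of `data` would happen here, outside the lock.
                data.len()
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .sum()
}
```

In a real run the in-memory slices would be replaced by `File` handles opened on the HDD.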
