Vhpi already creates snapshots using hard-links, but vhpi doesn't know when files are moved around within a source directory. Moved files are backed up into a new snapshot as if they were new files. The dedupe feature lets vhpi search for duplicate files among the snapshots of each backup source and replace them with hard-links, to keep the backup as slim as possible. I don't know (yet) whether the Pi can handle the overhead, though. I think it would make most sense to add config options that limit the amount of work needed to find dupes. E.g. filter out all files smaller than 10MB, then search for dupes. Or only search for dupes among snapshots that last a while, like 'monthly' and 'yearly' snapshots.
dedupe_min_file_size: xxx # Files smaller than this will be excluded from the dedupe process.
dedupe_snaps: ['monthly', 'yearly', ..] # Dedupe will only run on the snapshot types listed here.
dedupe_interval: 'weekly' # Define the dedupe interval.
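A minimal sketch of how the two limiting options could narrow the search, assuming Python and assuming snapshots are available as `(path, type)` pairs. The function names `filter_snapshots` and `select_dedupe_candidates` are hypothetical and not part of vhpi; the sketch only illustrates the filtering idea, not how vhpi actually stores or names its snapshots.

```python
import os


def filter_snapshots(snapshots, dedupe_snaps):
    """Return the paths of snapshots whose type is listed in dedupe_snaps.

    snapshots is assumed to be an iterable of (path, snapshot_type) pairs.
    """
    wanted = set(dedupe_snaps)
    return [path for path, snap_type in snapshots if snap_type in wanted]


def select_dedupe_candidates(snapshot_dirs, min_file_size):
    """Yield regular files of at least min_file_size bytes (size in bytes)."""
    for snap_dir in snapshot_dirs:
        for dirpath, _dirnames, filenames in os.walk(snap_dir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Skip symlinks and anything that is not a regular file.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                if os.path.getsize(path) >= min_file_size:
                    yield path
```

Filtering by snapshot type first keeps the directory walk, which is the expensive part on a Pi, restricted to the long-lived snapshots.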
This feature should be totally optional.
Brainstorming
Search for duplicate files across all snapshots via fdupes for each backup source.
Only absolutely identical files with the same permissions, timestamps, etc. are dupes.
Delete all duplicates and replace them with hard-links. Keep only one file for each dupe-group.
Add a config option to let the user set a custom interval for dupe removal.
Add a config option to define a minimum file size; only files larger than the set value are included in dupe removal. (Dupe removal does not make sense for small files.)
Add a config option to define which types of snapshots are to be included in dupe removal. (Dupe removal makes most sense for snapshots that last long.)
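The steps above could be sketched roughly as follows. This is not the fdupes-based approach named in the list: it swaps in a plain Python comparison using `os.lstat` metadata and SHA-256 content hashes, purely for illustration. The function names are hypothetical, and the choice of which metadata fields count as "identical" (size, mode, owner, group, modification time) is an assumption.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_duplicates(paths):
    """Group files that match in metadata and in content."""
    by_meta = defaultdict(list)
    for path in paths:
        st = os.lstat(path)
        # Assumed definition of "same permissions, timestamps, etc."
        key = (st.st_size, st.st_mode, st.st_uid, st.st_gid, st.st_mtime_ns)
        by_meta[key].append(path)

    groups = []
    for same_meta in by_meta.values():
        if len(same_meta) < 2:
            continue  # nothing to compare, so no hashing needed
        by_hash = defaultdict(list)
        for path in same_meta:
            by_hash[file_digest(path)].append(path)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups


def replace_with_hardlinks(group):
    """Keep the first file of a dupe-group and hard-link the rest to it.

    Returns the number of files that were replaced.
    """
    keeper, *dupes = group
    keeper_stat = os.stat(keeper)
    replaced = 0
    for dupe in dupes:
        st = os.stat(dupe)
        if (st.st_dev, st.st_ino) == (keeper_stat.st_dev, keeper_stat.st_ino):
            continue  # already a hard-link to the keeper
        tmp = dupe + ".dedupe-tmp"  # hypothetical temporary name
        os.link(keeper, tmp)        # fails if keeper is on another filesystem
        os.replace(tmp, dupe)       # swap the link in place of the duplicate
        replaced += 1
    return replaced
```

Comparing metadata before hashing means only files that already agree in size and attributes are read in full, which should help limit the load on the Pi. Linking to a temporary name and then renaming avoids a moment where the duplicate has been deleted but its replacement does not exist yet.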
Dupe Replacement Feature