TwinTrim is a powerful and efficient tool designed to find and manage duplicate files across directories. It provides a streamlined way to scan files, identify duplicates based on their content, and remove them automatically or with user guidance, helping you save storage space and keep your file system organized.
- Duplicate Detection: Scans directories to detect duplicate files based on file content rather than just filenames.
- Automatic or Manual Removal: Choose to handle duplicates automatically using the `--all` flag or manually select which files to delete.
- Customizable Filters: Set filters for minimum and maximum file sizes, file types, and specific filenames to exclude from the scan.
- Multi-Threaded Processing: Utilizes multi-threading to quickly scan and process large numbers of files concurrently.
- Deadlock Prevention: Implements locks to prevent deadlocks during multi-threaded operations, ensuring smooth and safe execution.
- User-Friendly Interface: Offers clear prompts and feedback via the command line, making the process straightforward and interactive.
- File Metadata Management:
  - Uses `AllFileMetadata` and `FileMetadata` classes to manage file information, such as modification time and file paths.
  - Maintains metadata in two dictionaries (`store` and `normalStore`) for handling different levels of duplicate management.
- File Hashing:
  - Generates a unique hash for each file using MD5 to identify duplicates by content.
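Content-based hashing can be sketched as below. The function name `get_file_hash` and the chunk size are illustrative, not necessarily TwinTrim's actual API; the key idea is hashing file contents in chunks so large files never need to fit in memory:

```python
import hashlib

def get_file_hash(file_path, chunk_size=8192):
    """Compute an MD5 digest of a file's contents, reading in
    fixed-size chunks to keep memory usage constant."""
    md5 = hashlib.md5()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

Two files with identical contents produce identical digests regardless of their names, which is what lets the scan catch renamed copies.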
- File Filtering:
  - The `FileFilter` class provides functionality to filter files based on size, type, and exclusions.
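A minimal sketch of such a filter is shown below; the constructor parameters and `matches` method are illustrative stand-ins for whatever interface the real `FileFilter` class exposes:

```python
import os

class FileFilter:
    """Illustrative size/type/name filter (field names are assumptions)."""

    def __init__(self, min_size=0, max_size=float("inf"),
                 file_type=None, exclude=None):
        self.min_size = min_size           # minimum size in bytes
        self.max_size = max_size           # maximum size in bytes
        self.file_type = file_type         # e.g. ".txt"; None = any type
        self.exclude = set(exclude or [])  # filenames to skip entirely

    def matches(self, path):
        """Return True if the file passes every active filter."""
        name = os.path.basename(path)
        if name in self.exclude:
            return False
        if self.file_type and not name.endswith(self.file_type):
            return False
        return self.min_size <= os.path.getsize(path) <= self.max_size
```

Files rejected by the filter are never hashed, which keeps the scan cheap when only a subset of the directory matters.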
- Duplicate Handling:
  - Duplicate files are identified by comparing their hashes.
  - Based on file modification time, the latest file is retained, and older duplicates are removed.
- Deadlock Prevention:
  - Uses locks within multi-threaded processes to ensure that resources are accessed safely, preventing deadlocks that could otherwise halt execution.
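The pattern can be sketched as a single lock guarding the shared metadata dictionary; the names `store`, `store_lock`, and `add_or_update_file` here are a simplified illustration, not TwinTrim's exact internals:

```python
import os
import threading

store = {}                     # hash -> path of the newest file seen so far
store_lock = threading.Lock()  # guards every access to `store`

def add_or_update_file(file_hash, file_path):
    """Record a file under its hash, keeping only the most recently
    modified copy. The lock makes the check-then-update atomic, so
    concurrent worker threads cannot interleave and corrupt the store."""
    with store_lock:
        existing = store.get(file_hash)
        if existing is None:
            store[file_hash] = file_path
        elif os.path.getmtime(file_path) > os.path.getmtime(existing):
            # Newer copy wins; the displaced path is the duplicate.
            store[file_hash] = file_path
```

Because every thread acquires the same single lock, there is no lock-ordering cycle, which is what rules out deadlock here.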
- `add_or_update_file`: Adds new files to the metadata store or updates existing entries if a duplicate is detected.
- `add_or_update_normal_file`: Similar to `add_or_update_file` but manages duplicates in a separate store.
- `handleAllFlag`: Handles duplicate removal automatically without user intervention.
- `find_duplicates`: Finds duplicate files in the specified directory and prepares them for user review or automatic handling.
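The core of duplicate discovery can be sketched as a walk-hash-group loop; this is a simplified single-threaded illustration (the real tool adds filtering, multi-threading, and progress reporting), and the function name mirrors but does not reproduce the project's implementation:

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(directory):
    """Walk a directory tree, hash every file by content, and return
    {digest: [paths]} for digests shared by more than one file."""
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            groups[digest].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Each returned group can then be resolved either automatically (keep the newest by modification time) or by prompting the user.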
Run the script using the following command:

```shell
python twinTrim.py <directory> [OPTIONS]
```
- `--all`: Automatically delete duplicates without asking for confirmation.
- `--min-size`: Specify the minimum file size to include in the scan (e.g., `10kb`).
- `--max-size`: Specify the maximum file size to include in the scan (e.g., `1gb`).
- `--file-type`: Specify the file type to include (e.g., `.txt`, `.jpg`).
- `--exclude`: Exclude specific files by name.
- Automatic Duplicate Removal:

  ```shell
  python twinTrim.py /path/to/directory --all
  ```

- Manual Review and Removal:

  ```shell
  python twinTrim.py /path/to/directory
  ```

- Filtered Scan by File Size and Type:

  ```shell
  python twinTrim.py /path/to/directory --min-size "50kb" --max-size "500mb" --file-type "txt"
  ```
- Python 3.6+
- `click` for command-line interaction
- `tqdm` for progress bars
- `concurrent.futures` for multi-threaded processing
- `beaupy` for interactive selection
Clone the repository and install the required dependencies using Poetry:

```shell
git clone https://github.com/Kota-Karthik/twinTrim.git
cd twinTrim
poetry install
```
If you haven't installed Poetry yet, you can do so by following the instructions on the Poetry website.
Contributions are welcome! Whether you have ideas for improving the internal workings of TwinTrim, such as optimizing performance or refining algorithms, or you want to enhance the user interface of the CLI tool for a better user experience, your input is valuable. Please fork the repository and submit a pull request with your improvements or new features.
This project is licensed under the MIT License - see the LICENSE file for details.