Add --dry-run Functionality to TwinTrim for Safe Simulated Duplicate Removal #74

Merged: 5 commits, Oct 7, 2024
69 changes: 43 additions & 26 deletions README.md
@@ -1,4 +1,3 @@

# TwinTrim

TwinTrim is a powerful and efficient tool designed to find and manage duplicate files across directories. It provides a streamlined way to scan files, identify duplicates based on their content, and remove them automatically or with user guidance, helping you save storage space and keep your file system organized.
@@ -9,29 +8,39 @@ TwinTrim is a powerful and efficient tool designed to find and manage duplicate
- **Automatic or Manual Removal**: Choose to handle duplicates automatically using the `--all` flag or manually select which files to delete.
- **Customizable Filters**: Set filters for minimum and maximum file sizes, file types, and specific filenames to exclude from the scan.
- **Multi-Threaded Processing**: Utilizes multi-threading to quickly scan and process large numbers of files concurrently.
- **Deadlock Prevention**: Implements locks to prevent deadlocks during multi-threaded operations, ensuring smooth and safe execution.
- **Dry Run**: Use the `--dry-run` option to simulate the process without making any changes, allowing you to review what will happen before executing.
- **User-Friendly Interface**: Offers clear prompts and feedback via the command line, making the process straightforward and interactive.

## How It Works

### Core Components

1. **File Metadata Management**:
   - Uses `AllFileMetadata` and `FileMetadata` classes to manage file information, such as modification time and file paths.
   - Maintains metadata in two dictionaries (`store` and `normalStore`) for handling different levels of duplicate management.

2. **File Hashing**:
   - Generates a unique hash for each file using MD5 to identify duplicates by content.
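Content-based MD5 hashing is usually done by reading each file in chunks so large files do not have to fit in memory. A minimal sketch of the idea (the helper name `get_file_hash` is an assumption, not necessarily TwinTrim's actual function):

```python
import hashlib

def get_file_hash(path, chunk_size=8192):
    """Hash a file's contents with MD5, reading in chunks to bound memory use."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

Two files with identical bytes produce the same digest, which is what lets duplicates be detected regardless of filename or location.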

3. **File Filtering**:
   - The `FileFilter` class provides functionality to filter files based on size, type, and exclusions.
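Such a filter typically answers one question per file: does it pass the size bounds, match the requested extension, and avoid the exclusion list? A sketch of that logic under assumed names (TwinTrim's real `FileFilter` API may differ):

```python
import os

class FileFilter:
    """Sketch of a size/type/name filter; the real class may use setter methods."""

    def __init__(self, min_size=0, max_size=float("inf"), file_type=None, exclude=()):
        self.min_size = min_size            # bytes
        self.max_size = max_size            # bytes
        self.file_type = file_type          # extension without the dot, e.g. "txt"
        self.exclude = set(exclude)         # filenames to skip entirely

    def accepts(self, path):
        """Return True if the file should be included in the scan."""
        name = os.path.basename(path)
        if name in self.exclude:
            return False
        if self.file_type and not name.endswith("." + self.file_type):
            return False
        size = os.path.getsize(path)
        return self.min_size <= size <= self.max_size
```

Checking the cheap conditions (name, extension) before the `stat` call avoids touching the filesystem for files that are excluded anyway.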

4. **Duplicate Handling**:
   - Duplicate files are identified by comparing their hashes.
   - Based on file modification time, the latest file is retained, and older duplicates are removed.
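The keep-latest policy described above boils down to: within a group of files sharing one hash, keep the file with the newest modification time and mark the rest for removal. A minimal sketch (the helper name is hypothetical):

```python
import os

def pick_files_to_remove(paths):
    """Given paths with identical content, keep the most recently modified one.

    Returns the older duplicates to delete; a sketch of the keep-latest
    policy, not TwinTrim's exact code.
    """
    newest = max(paths, key=os.path.getmtime)
    return [p for p in paths if p != newest]
```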

5. **Dry Run Mode**:
   - The `--dry-run` flag allows you to simulate the duplicate removal process without making any actual changes, giving you an opportunity to review potential actions before committing to them.

6. **Deadlock Prevention**:
   - Uses locks within multi-threaded processes to ensure that resources are accessed safely, preventing deadlocks that could otherwise halt execution.
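A common way to get this safety is to guard each shared store with a lock so concurrent workers never update it mid-read. A minimal sketch of the pattern, with hypothetical names (TwinTrim's actual locking may be structured differently):

```python
import threading

store_lock = threading.Lock()
store = {}  # file hash -> first path seen with that content

def record_file(file_hash, path):
    """Update the shared store under a lock so concurrent workers stay safe.

    Returns None for the first file with this content, or the path if it is
    a duplicate of something already recorded.
    """
    with store_lock:
        if file_hash not in store:
            store[file_hash] = path
            return None
        return path
```

Because every thread takes the same single lock for the whole check-then-update step, there is no lock ordering to get wrong, which is the simplest way to rule out deadlock.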

### Key Functions

@@ -59,23 +68,33 @@ python twinTrim.py <directory> [OPTIONS]
- `--exclude`: Exclude specific files by name.
- `--label-color`: Set the font color of the output label of the progress bar.
- `--bar-color`: Set the color of the progress bar.
- `--dry-run`: Simulate the duplicate removal process without making any changes.
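Size strings such as `"50kb"` or `"500mb"` have to be converted to a byte count before they can be compared against file sizes. One plausible implementation of a `parse_size`-style helper (the real TwinTrim function may differ in accepted units or spelling):

```python
import re

_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}

def parse_size(text):
    """Convert a human-readable size such as '50kb' or '500mb' into bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([kmg]?b)", text.strip().lower())
    if not match:
        raise ValueError(f"Unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])
```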

### Examples

1. **Automatic Duplicate Removal**:

```bash
python twinTrim.py /path/to/directory --all
```

2. **Manual Review and Removal**:

```bash
python twinTrim.py /path/to/directory
```

3. **Filtered Scan by File Size and Type**:

```bash
python twinTrim.py /path/to/directory --min-size "50kb" --max-size "500mb" --file-type "txt"
```

4. **Dry Run Simulation**:

```bash
python twinTrim.py /path/to/directory --dry-run
```

## Dependencies

@@ -114,5 +133,3 @@ By participating in **TwinTrim**, you agree to abide by these guidelines and hel
## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.


26 changes: 24 additions & 2 deletions twinTrim/flags.py
@@ -24,9 +24,10 @@
@click.option("--exclude", multiple=True, help="Files to exclude by name.")
@click.option("--label-color", default="yellow", type=str, help="Color of the label of progress bar.")
@click.option("--bar-color", default='#aaaaaa', type=str, help="Color of the progress bar.")
@click.option("--dry-run", is_flag=True, help="Simulate the process without deleting files.")
def cli(directory, all, min_size, max_size, file_type, exclude, label_color, bar_color, dry_run):
"""Find and manage duplicate files in the specified DIRECTORY."""

# Initialize the FileFilter object
file_filter = FileFilter()
file_filter.setMinFileSize(parse_size(min_size))
@@ -36,8 +37,14 @@ def cli(directory, all, min_size, max_size, file_type, exclude, label_color, bar
file_filter.addFileExclude(file_name)

    if all:
        if dry_run:
            click.echo(click.style("Dry run mode enabled: Skipping actual deletion.", fg='yellow'))
        else:
            logging.info("Deleting all duplicate files without asking.")
        handleAllFlag(directory, file_filter, label_color, bar_color, dry_run=dry_run)
        return

start_time = time.time()
@@ -83,6 +90,20 @@ def cli(directory, all, min_size, max_size, file_type, exclude, label_color, bar
files_to_delete = [duplicates_list[int(option.split(")")[0]) - 1] for option in selected_indices]

    for file_path in files_to_delete:
        if dry_run:
            click.echo(click.style(f"[Dry Run] Would delete: {file_path}", fg='yellow'))
        else:
            try:
                handle_and_remove(file_path)
                logging.info(f"Deleted duplicate file: {file_path}")
            except Exception as e:
                # original except body not shown in the diff; log and continue
                logging.error(f"Failed to delete {file_path}: {e}")

    if not dry_run:
        click.echo(click.style("Selected duplicate files removed!", fg='green'))
    else:
        click.echo(click.style("Dry run completed. No files were actually deleted.", fg='yellow'))

    click.echo(click.style(f"Time taken: {time_taken:.2f} seconds.", fg='green'))
@@ -98,3 +119,4 @@ def cli(directory, all, min_size, max_size, file_type, exclude, label_color, bar

if __name__ == "__main__":
cli()