Commit c61881d

First release
1 parent fe13627 commit c61881d

4 files changed, +547 -2 lines changed

README.md

Lines changed: 104 additions & 2 deletions
@@ -1,2 +1,104 @@
-# VideoDupChecker
-Check your video collection for duplicates!
+<p align="center">
+  <h1 align="center">VideoDupChecker</h1>
+  <p align="center">
+    Check your video collection for duplicates!
+    <br />
+  </p>
+</p>
+
+<br>
+
+## ℹ About
+
+VideoDupChecker is a command-line tool that detects duplicate videos in a folder structure and checks whether a smaller video is part of a larger video stream. It also detects whether a specified portion of a video file (see the `--threshold` option below) matches another file in the given folder.
+
+The program supports multiple modes for different use cases, including comparing videos across folders or within specific subfolders (such as "Extras").
+
+## Features
+- **Duplicate Detection**: Compare videos to identify duplicates based on file content.
+- **Flexible Modes**: Check a specific folder, focus on "Extras" subfolders, or scan complete movie directories.
+- **Customizable Threshold**: Adjust the match threshold to suit your needs (default is 95%).
+
+## Notes
+- The tool only checks whether the video stream of one file matches that of another. The audio tracks and included subtitles can still differ, so check them before deleting any file.
+
+## Requirements
+
+- **Video format**: Your video files need to be in the MKV format.
+
+- **MKVToolNix**: You need the `mkvextract.exe` tool for extracting video tracks. Either install MKVToolNix or place `mkvextract.exe` in the same directory as the `VideoDupChecker.exe` file.
+
+- **RAM**: The program requires as much RAM as the combined size of the two largest video files being compared. Make sure your system has enough available memory, especially when working with large video files.
+
+- **Temporary Storage**: The program requires temporary storage to process the video files. It creates a temp folder in the directory from which you run the program.
+  The storage requirements depend on the mode you are running (the modes are described under Usage):
+  - In `check_folder` mode: You will need temporary storage for the entire contents of the folder being processed.
+  - In `check_extras_folder` mode: You will need temporary storage for the largest `Extras` subfolder.
+  - In `check_movie_folder` mode: You will need temporary storage for the largest `movie` folder (including subfolders).
+
+  Ensure you have enough space in the directory from which you run the program. The program automatically cleans up the temporary files after processing.
+
+
+## Usage
+
+### Command-Line Arguments
+
+Run the program from a terminal with the following arguments:
+
+```
+VideoDupChecker.exe <folder_path> --mode <mode> [--threshold <percentage>]
+```
+<br/>
+
+The arguments are as follows:
+
+- **folder_path**: Path to the folder you want to process.
+- **--mode**: Specifies the mode of operation.
+  - `check_folder`: Compares all videos contained in this folder and all its subfolders for duplicates.
+  - `check_extras_folder`: Looks specifically for videos in 'Extras' subfolders. The program scans each folder directly inside the specified top-level directory and processes any 'Extras' subfolder it finds.
+    An example folder structure for this case:
+
+        Movies
+        ├── Movie 1
+        │   └── Extras
+        ├── Movie 2
+        │   └── Extras
+        └── Movie 3
+            └── Extras
+
+  - `check_movie_folder`: Compares all videos within each movie folder, including its subfolders.
+    The program iterates through all subdirectories of the specified top-level
+    directory and processes the videos in each folder and its subfolders.
+    An example folder structure for this case:
+
+        Movies
+        ├── Movie 1
+        │   ├── Video1.mkv
+        │   ├── Video2.mkv
+        │   └── Subfolder
+        ├── Movie 2
+        │   └── Video3.mkv
+        └── Movie 3
+            ├── Video4.mkv
+            └── Subfolder
+
+- **--threshold** (optional): Percentage threshold for considering partial matches (default: 95%). With the default, the program checks whether the first 95% or the last 95% of a video file matches another file. You can specify a value between 0 and 100.
+
+## Example Usage
+
+```
+VideoDupChecker.exe "C:\path\to\folder" --mode check_folder --threshold 90
+```
+<br/>
+For more information, you can also run:
+
+```
+VideoDupChecker.exe -h
+```
+
+## Self-Compilation with Nuitka
+
+If you want to compile VideoDupChecker yourself, you can do so with Nuitka using the following command:
+```
+nuitka VideoDupChecker.py --standalone --onefile --windows-icon-from-ico=icon.ico
+```
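
The RAM and temporary-storage requirements described in the README can be roughly estimated before a run. The following is a minimal sketch for `check_folder` mode only. It is not part of this commit: `estimate_requirements` is a hypothetical helper, and using the MKV file sizes as a stand-in for the extracted video tracks is an assumption, so treat the numbers as approximate.

```python
from pathlib import Path


def estimate_requirements(folder: str) -> tuple[int, int]:
    """Return (ram_bytes, temp_bytes) estimated for `check_folder` mode.

    Hypothetical helper, not part of VideoDupChecker. It assumes file sizes
    on disk approximate the sizes of the extracted video tracks.
    """
    # Sizes of all .mkv files in the folder and its subfolders, largest first.
    sizes = sorted(
        (p.stat().st_size for p in Path(folder).rglob("*.mkv") if p.is_file()),
        reverse=True,
    )
    ram_bytes = sum(sizes[:2])  # combined size of the two largest files
    temp_bytes = sum(sizes)     # entire contents of the folder
    return ram_bytes, temp_bytes
```

Comparing `temp_bytes` with the free space reported by `shutil.disk_usage` for the working directory would give an early warning before a long run.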

VideoDupChecker.py

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
+from comparevideosmodule import process_folder
+import os
+import time
+import argparse
+from collections import defaultdict
+
+
+def process_folders(base_folder, mode, threshold):
+    """
+    Processes folders based on the selected mode.
+
+    Args:
+        base_folder (str): Base folder to process.
+        mode (str): "check_extras_folder", "check_movie_folder", or "check_folder".
+        threshold (float): The percentage threshold for considering partial matches.
+
+    Returns:
+        defaultdict: Matches grouped by folder.
+    """
+    all_matches = defaultdict(list)
+
+    # For "check_extras_folder" and "check_movie_folder", subfolders are identical
+    subfolders = [
+        os.path.join(base_folder, subfolder)
+        for subfolder in os.listdir(base_folder)
+        if os.path.isdir(os.path.join(base_folder, subfolder))
+    ]
+
+    # For "check_folder", the subfolders list will just contain the base_folder
+    if mode == "check_folder":
+        subfolders = [base_folder]
+
+    total_folders = len(subfolders)
+
+    for index, folder_path in enumerate(subfolders, start=1):
+        print(f"\n\nProcessing folder: {folder_path} (Folder {index} of {total_folders})\n")
+
+        if mode == "check_extras_folder":
+            # Process "Extras" folder inside each movie folder
+            extras_folder = os.path.join(folder_path, "Extras")
+            if os.path.exists(extras_folder):
+                matches = process_folder(extras_folder, mode, threshold)
+                if matches:
+                    all_matches[extras_folder].extend(matches)
+        elif mode == "check_movie_folder" or mode == "check_folder":
+            # Process all video files in the given folder (and subfolders for check_movie_folder)
+            matches = process_folder(folder_path, mode, threshold)
+            if matches:
+                all_matches[folder_path].extend(matches)
+
+    return all_matches
+
+
+def validate_threshold(value):
+    """
+    Validates that the provided threshold is a float between 0 and 100.
+    """
+    try:
+        f_value = float(value)
+    except ValueError:
+        raise argparse.ArgumentTypeError(f"{value} is not a valid float.")
+    if not (0 <= f_value <= 100):
+        raise argparse.ArgumentTypeError(f"{value} is out of range. Must be between 0 and 100.")
+    return f_value
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Process video folders for duplicate detection and comparison.",
+        formatter_class=argparse.RawTextHelpFormatter  # Preserve formatting in help text
+    )
+    parser.add_argument(
+        "folder_path",
+        type=str,
+        help="Path to the folder to process."
+    )
+    parser.add_argument(
+        "--mode",
+        type=str,
+        choices=["check_folder", "check_extras_folder", "check_movie_folder"],
+        required=True,
+        help=(
+            "Specifies the mode of operation:\n\n"
+            "1. check_folder:\n"
+            "   Compares all videos contained in this folder and all its subfolders for duplicates.\n\n"
+            "2. check_extras_folder:\n"
+            "   Looks specifically for videos in 'Extras' subfolders within the specified folder structure.\n"
+            "   The program will scan each folder within the specified top-level directory and\n"
+            "   process any 'Extras' subfolders it finds.\n"
+            "   Example folder structure:\n\n"
+            "   Movies\n"
+            "   ├── Movie 1\n"
+            "   │   └── Extras\n"
+            "   ├── Movie 2\n"
+            "   │   └── Extras\n"
+            "   └── Movie 3\n"
+            "       └── Extras\n\n"
+            "3. check_movie_folder:\n"
+            "   Compares all videos directly within movie folders, including their subfolders.\n"
+            "   The program will iterate through all subdirectories within the specified top-level\n"
+            "   directory and process videos in each folder and its subfolders.\n"
+            "   Example folder structure:\n\n"
+            "   Movies\n"
+            "   ├── Movie 1\n"
+            "   │   ├── Video1.mkv\n"
+            "   │   ├── Video2.mkv\n"
+            "   │   └── Subfolder\n"
+            "   ├── Movie 2\n"
+            "   │   └── Video3.mkv\n"
+            "   └── Movie 3\n"
+            "       ├── Video4.mkv\n"
+            "       └── Subfolder\n\n"
+        ),
+    )
+    parser.add_argument(
+        "--threshold",
+        type=validate_threshold,
+        default=95.0,
+        help=(
+            "Optional: Percentage threshold for considering partial matches (default: 95%%). "
+            "The program checks if the first 95%% or the last 95%% of the video file matches with another file."
+        )
+    )
+
+    args = parser.parse_args()
+
+    start = time.time()
+
+    all_matches = process_folders(args.folder_path, args.mode, args.threshold)
+
+    if not all_matches:
+        print("\n\nNo duplicates or matches found in any folder.\n")
+    else:
+        print("\n\nCombined Matches:")
+        print("-------------------------\n")
+        for folder, matches in all_matches.items():
+            print(f"Matches found in {folder}:")
+            for small, large in matches:
+                print(f"- {small} is part of or matches {large} by more than {args.threshold}%.")
+            print()
+
+    end = time.time()
+    print(f"Time elapsed: {end - start}s.")
+
+
+if __name__ == "__main__":
+    main()
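
The `--threshold` help text says the program checks whether the first or last N% of a video file matches another file. The actual comparison lives in `comparevideosmodule`, which is not shown in this part of the commit. The following is only a minimal sketch of one way such a check could work, assuming both extracted video streams are held in memory as raw bytes. `is_partial_match` is a hypothetical name and is not the module's API.

```python
def is_partial_match(small: bytes, large: bytes, threshold: float = 95.0) -> bool:
    """Hypothetical check: does the first or last `threshold` percent of
    `small` occur somewhere inside `large`?

    This is an illustrative assumption, not the logic of comparevideosmodule.
    """
    keep = int(len(small) * threshold / 100)  # number of bytes that must match
    if keep == 0:
        # Nothing meaningful to compare (empty input or a 0% threshold).
        return False
    head = small[:keep]                # first `threshold` percent
    tail = small[len(small) - keep:]   # last `threshold` percent
    return head in large or tail in large


# The first 90% of the small stream occurs in the large one, the full stream does not.
small_stream = b"abcdefghij"
large_stream = b"--abcdefghiX--"
print(is_partial_match(small_stream, large_stream, threshold=90))   # True
print(is_partial_match(small_stream, large_stream, threshold=100))  # False
```

Holding both streams in memory for the substring search is consistent with the README's statement that RAM use scales with the two largest files being compared, though the real module may work differently.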
