A Tiny, High-Performance File Integrity and Comparison Tool
“A truly great program will always fit on a floppy disk. Hopefully, someone out there still remembers what those were… But it’s not about the floppies — it’s about quality software!”© :-D
Software author: Dennis V. Razumovsky
This program is distributed under the CC0 (Creative Commons Zero) public-domain dedication. The author is not responsible for any use of the source code or the entire program. Anyone who uses the code or the program does so at their own risk and responsibility.
precizer is a lightweight and blazing-fast command-line application written entirely in pure C. It is designed for file integrity verification and comparison, making it particularly useful for checking synchronization results. The program recursively traverses directories, generating a database of files and their checksums for quick and efficient comparisons.
Built for both embedded platforms and large-scale clustered mainframes, precizer helps detect synchronization errors by comparing files and their checksums across different sources. It can also be used to analyze historical changes by comparing databases generated at different points in time from the same source.
Consider a scenario where two machines have large mounted volumes at /mnt1 and /mnt2, respectively, containing identical data. The goal is to verify, byte by byte, whether the contents are truly identical or if discrepancies exist.
- Run precizer on the first machine (e.g., hostname host1):
precizer --progress /mnt1
This command recursively traverses all directories under /mnt1, creating a database file host1.db in the current directory. The --progress flag provides real-time progress updates, displaying the total traversed space and the number of processed files.
- Run precizer on the second machine (e.g., hostname host2):
precizer --progress /mnt2
This will generate a database file host2.db in the current directory.
- Copy host1.db and host2.db to one of the machines and run the following command to compare them:
precizer --compare host1.db host2.db
The output will display:
- Files that exist on host1 but are missing on host2, and vice versa.
- Files present on both hosts but with different checksums.
precizer stores only relative file paths in its database. For example, a file located at:
/mnt1/abc/def/aaa.txt
will be stored as:
abc/def/aaa.txt
without the /mnt1 prefix. Similarly, the corresponding file on /mnt2:
/mnt2/abc/def/aaa.txt
will also be stored as:
abc/def/aaa.txt
This ensures that even when files reside in different mount points or sources, they can still be compared accurately under the same relative paths and their respective checksums.
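The idea of keying checksums by relative path can be sketched in a few lines of Python. This is only an illustration of the concept, not precizer's implementation (which is written in C and stores its data in SQLite); the function names `sha512_of_file` and `scan` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha512_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root):
    """Map every regular file under root to its SHA512, keyed by relative path."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha512_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Because the root prefix is stripped from every key, `scan("/mnt1")` and `scan("/mnt2")` produce directly comparable mappings when both trees hold the same files.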
Consider a scenario where a primary storage system has a backup copy, for example a data center storage and its Disaster Recovery copy. Synchronization from the primary storage to the backup occurs periodically, but due to the massive data volumes, it is most likely not performed byte by byte but rather by detecting metadata changes within the file system. In such cases, file size and modification time are taken into account, but the actual content is not verified. This approach makes sense: although the primary data center and the Disaster Recovery site usually have high-speed communication channels, a full byte-by-byte synchronization would still take an unreasonably long time.
Tools like rsync support both types of synchronization (metadata-based and byte-by-byte), but they have one major drawback: state is not preserved between sessions. Let's analyze this issue with the following scenario:
- Given: Server "A" and Server "B" (Primary Data Center and Disaster Recovery)
- Some files have been modified on Server "A".
- The rsync algorithm detects them based on changes in size and modification time and synchronizes them to Server "B".
- Multiple connection failures occur during synchronization between the Primary Data Center and the Disaster Recovery site.
- To verify data integrity (i.e., ensuring that files on "A" and "B" are identical byte by byte), rsync is often used with byte-by-byte comparison. The process works as follows:
- rsync is launched on Server "A" with the --checksum mode, attempting to compute checksums sequentially on both "A" and "B" in a single session.
- This process takes an extremely long time for large-scale storage systems.
- Since rsync does not save computed checksums between sessions, it introduces several technical challenges:
- If the connection drops, rsync terminates the session, and on the next run, everything must start from scratch! Given the huge data volumes, performing a byte-by-byte verification for full data integrity becomes an impossible task.
- Storage subsystem failures can also lead to binary inconsistencies. In such cases, file system metadata cannot reliably determine whether file contents on "A" and "B" are truly identical.
- Over time, errors accumulate, increasing the risk of maintaining an inconsistent Disaster Recovery copy of system "A" on system "B", rendering the entire Disaster Recovery effort useless. Standard utilities do not detect these inconsistencies, and technical personnel may be completely unaware of data integrity problems in the Disaster Recovery storage.
- To overcome these limitations, precizer was developed. The program identifies exactly which files differ between "A" and "B" so that they can be resynchronized with the necessary corrections. The tool operates at maximum speed (pushing hardware performance to its limits) because it is written in pure C and utilizes high-performance algorithms optimized for efficiency. The program is designed to handle both small files and petabyte-scale data volumes, with no upper limits.
- The name precizer comes from the word precision, implying something that enhances accuracy.
- The program precisely analyzes directory contents, including subdirectories, computing checksums for every encountered file while storing metadata in an SQLite database (a regular binary file).
- precizer is fault-tolerant and can resume execution from the point of interruption. For example, if the program is terminated via Ctrl+C while analyzing a petabyte-scale file, it will NOT restart from the beginning but continue exactly where it left off using previously recorded data in the database. This significantly saves resources, time, and effort for system administrators.
- The program can be interrupted at any time using any method, and this is completely safe for both the scanned data and the database created by precizer.
- If the program is intentionally or accidentally stopped, there is no need to worry about losing progress. All results are fully preserved and can be used in subsequent runs.
- SHA512 is used for checksum computation, providing high speed and a negligible probability of collisions, even when analyzing petabyte-scale files. If two huge files differ by just a single byte, their SHA512 checksums will differ, whereas weaker hash functions (such as SHA1 or CRC32) offer much weaker protection against collisions.
- The precizer algorithms are designed so that it is easy to keep stored data up-to-date without recomputing everything from scratch. Simply run the program with the --update parameter, and the database will be updated with new files, while records of deleted files will be removed. For modified files whose sizes have changed, a new SHA512 checksum will be computed and updated in the database.
- An additional option allows database updates to consider not only file size changes but also status-change (ctime) or modification (mtime) timestamps. For example, if a file's ctime changes but its size remains the same, the checksum will NOT be recomputed if only --update is used. This is because ctime can change due to permission modifications, which does not indicate content changes. However, if the --watch-timestamps parameter is specified, any metadata change (e.g., mtime or ctime) will trigger a full SHA512 checksum recalculation and database update.
- By comparing databases from the same sources at different points in time, precizer can function as a security monitoring tool, detecting intrusion consequences by identifying unauthorized file modifications where content has changed, but metadata remains unchanged.
- The program never modifies, deletes, moves, or copies any files or directories it scans. Instead, it only builds lists of files, computes checksums, and stores them in a database. All modifications occur exclusively within the database, which is a simple binary file.
- The program’s performance is primarily limited by the disk subsystem. Every file is read byte by byte, and a SHA512 checksum is computed for each one.
- The program is extremely fast thanks to the use of SQLite and FTS (man 3 fts).
- Command-line argument parsing is implemented using the ARGP library.
- Regular expressions are handled via the PCRE2 library (libpcre2).
- The program is safe to use with massive numbers of files, directories, and deeply nested subdirectories. With FTS, recursion is avoided, preventing stack overflow issues even in cases of extreme directory depth.
- Due to its compact and portable code, the program can be used even on specialized NAS devices, embedded systems, or IoT platforms.
- The --help option is designed to be as detailed as possible, specifically to assist users who may not have advanced technical knowledge.
Questions & Bug Reports
- You can reach out to the author via:
- GitHub Discussions.
- You can also report a bug on GitHub.
- If you run into issues while using the program, feel free to ask a question on stackoverflow.com using the precizer tag. The author actively monitors such questions and will be happy to help with troubleshooting any problems.
A fully ready-to-use version can be downloaded here.
The prebuilt version is a statically linked ELF binary that can be run immediately on nearly any x64 Linux distribution. The binary is automatically built using GitHub's CI/CD pipeline, then compressed with UPX (an executable file packer). The final self-extracting compressed binary is then placed inside a zip archive for easier downloading. To use it, simply extract the zip file and run the executable.
- The author has set up an automated build system using GitHub Workflows and will continue maintaining new versions.
- However, the author is not willing to personally package and maintain precizer for all existing operating system distributions.
- If you are eager to create a package for a specific distribution but encounter significant challenges adapting the code, the author will gladly provide assistance in optimizing the program for that distribution or package manager. Contact details can be found in the “Questions & Bug Reports” section.
The build process produces a statically linked ELF binary with no external dependencies. This self-contained executable can run on nearly any modern Linux distribution.
Most required libraries are embedded into the binary, and by default, the program is built as a static executable. This approach enhances portability and eliminates dependency issues. Thanks to this setup, compiling the program on most modern platforms is straightforward — just follow these steps:
1. Install build and compile tools on Linux
# Arch Linux
sudo pacman -S --noconfirm base-devel
# Debian or Ubuntu
sudo apt -y install build-essential
# Alpine Linux
sudo apk add --update build-base fts-dev argp-standalone
2. Get the source code
git clone https://github.com/precizer/precizer.git
cd precizer
3. Build the project
make
4. Copy the compiled precizer binary to any directory listed in the system's $PATH to enable quick execution.
5. Clean up
# Remove build artifacts
make clean
# Remove all build files, including compiled libraries
make clean-all
6. Update
git pull
make
# Then proceed to step 4.
For a more portable build, repeat steps 1 and 2. Instead of step 3, run:
make portable
If you prefer not to install additional packages on your system, you can use a preconfigured Docker-based build environment.
To build the project, all you need is a working installation of Docker.
Running the simple make docker command:
git clone https://github.com/precizer/precizer.git
cd precizer
make docker
will generate a compiled precizer binary in the current directory. You can either run it from there or move it to a directory listed in $PATH.
If make is not installed, you can still build the application inside a container with these commands:
git clone https://github.com/precizer/precizer.git
cd precizer
docker build -t precizer .
docker create --name precizer precizer
docker cp precizer:/precizer/precizer precizer
docker rm -f precizer
This will produce a statically linked ELF binary in the current directory.
If you run into compatibility issues with the compiled binary across different systems, you can try increasing its portability:
git clone https://github.com/precizer/precizer.git
cd precizer
make docker-portable
or
git clone https://github.com/precizer/precizer.git
cd precizer
docker build --build-arg OS=ubuntu:18.04 --build-arg BUILD=portable -t precizer .
docker create --name precizer precizer
docker cp precizer:/precizer/precizer precizer
docker rm -f precizer
To evaluate the program's capabilities, you can use the test sets available in the tests/examples/ directory within the source code.
Run tests with the following commands:
git clone https://github.com/precizer/precizer.git
cd precizer
make debug
cd tests/
make debug
./testitall
Add files to two databases and compare them with each other:
precizer --progress --database=database1.db tests/examples/diffs/diff1
precizer --progress --database=database2.db tests/examples/diffs/diff2
precizer --compare database1.db database2.db
The comparison of database1.db and database2.db databases is starting…
Starting database file database1.db integrity check…
Database database1.db has been verified and is in good condition
Starting database file database2.db integrity check…
Database database2.db has been verified and is in good condition
These files are no longer in the database1.db but still exist in the database2.db
path1/AAA/BCB/CCC/b.txt
These files are no longer in the database2.db but still exist in the database1.db
path2/AAA/ZAW/D/e/f/b_file.txt
The SHA512 checksums of these files do not match between database1.db and database2.db
2/AAA/BBB/CZC/a.txt
3/AAA/BBB/CCC/a.txt
4/AAA/BBB/CCC/a.txt
path1/AAA/ZAW/D/e/f/b_file.txt
path2/AAA/BCB/CCC/a.txt
Comparison of database1.db and database2.db databases is complete
The precizer completed its execution without any issues
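The three groups in the comparison report above can be reproduced with plain set operations. The sketch below assumes each database has already been loaded into a dictionary of relative path to checksum; the `compare` function and the sample data are hypothetical and do not reflect how precizer queries its SQLite files.

```python
def compare(first, second):
    """Return (only_in_first, only_in_second, mismatched) as sorted lists of paths."""
    only_in_first = sorted(first.keys() - second.keys())
    only_in_second = sorted(second.keys() - first.keys())
    mismatched = sorted(
        path
        for path in first.keys() & second.keys()
        if first[path] != second[path]
    )
    return only_in_first, only_in_second, mismatched


# Shortened stand-ins for real SHA512 values
db1 = {"a.txt": "11", "b.txt": "22", "c.txt": "33"}
db2 = {"a.txt": "11", "b.txt": "99", "d.txt": "44"}
print(compare(db1, db2))  # (['c.txt'], ['d.txt'], ['b.txt'])
```

Files whose path and checksum match in both mappings (here `a.txt`) do not appear in any group, which mirrors the report listing only discrepancies.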
Database Update
Let's run the previous example again. The first attempt produces a warning message:
precizer --progress --database=database1.db tests/examples/diffs/diff1
Database file name: database1.db
The database database1.db has been created in the past and already contains data with files and their checksums. Use the --update option if there is full confidence that update the content of the database is really need and the information about those files which was changed, removed or added should be deleted or updated against DB.
The precizer unexpectedly finished due to an error.
The --update parameter must be included. This parameter is required to protect the database from data loss caused by accidental execution.
precizer --update --progress --database=database1.db tests/examples/diffs/diff1
Primary database file name: database1.db
Starting database file database1.db integrity check…
Database database1.db has been verified and is in good condition
File system traversal initiated to calculate file count and storage usage
Total size: 45B, total items: 58, dirs: 46, files: 12, symlnks: 0
The database file database1.db has NOT been modified since the program was launched
The precizer completed its execution without any issues
Now let's make some adjustments:
# Modify a file
echo -n " " >> tests/examples/diffs/diff1/1/AAA/BCB/CCC/a.txt
# Add a new file
touch tests/examples/diffs/diff1/1/AAA/BCB/CCC/c.txt
# Remove a file
rm tests/examples/diffs/diff1/path2/AAA/ZAW/D/e/f/b_file.txt
And run precizer again with the --update parameter:
precizer --update --progress --database=database1.db tests/examples/diffs/diff1
Primary database file name: database1.db
Starting database file database1.db integrity check…
Database database1.db has been verified and is in good condition
File system traversal initiated to calculate file count and storage usage
Total size: 43B, total items: 58, dirs: 46, files: 12, symlnks: 0
The --update option has been used, so the information about files will be updated against the database database1.db
File traversal started
These files have been added or changed and those changes will be reflected against the DB database1.db:
1/AAA/BCB/CCC/a.txt changed size & ctime & mtime rehashed
1/AAA/BCB/CCC/c.txt added
File traversal complete
Total size: 43B, total items: 58, dirs: 46, files: 12, symlnks: 0
These files are no longer exist or ignored and will be deleted against the DB database1.db:
path2/AAA/ZAW/D/e/f/b_file.txt
Start vacuuming the primary database…
The primary database has been vacuumed
The database file database1.db has been modified since the program was launched
The precizer completed its execution without any issues
Every time precizer runs, it traverses the file system and then checks whether a record for a specific file already exists in the database. In other words, the program prioritizes the current state of the file system on disk.
Directory traversal in precizer uses an algorithm similar to the one in rsync.
It's important to note that precizer will not recalculate SHA512 checksums for files that are already recorded in the database as long as their size remains unchanged. If the --watch-timestamps argument is specified, the program will also consider the status-change time (ctime) and modification time (mtime) in addition to the file size.
Any new, deleted, or modified files between application runs will be processed accordingly. All changes will be reflected in the database if the --update parameter is specified.
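The rehash rule described above can be expressed as a small decision function. This is a sketch of the documented behavior only: the `FileMeta` record and the `needs_rehash` name are hypothetical, and the real program works on its own C structures and database rows.

```python
from collections import namedtuple

# Hypothetical record of the metadata kept for each file
FileMeta = namedtuple("FileMeta", ["size", "mtime", "ctime"])


def needs_rehash(stored, current, watch_timestamps=False):
    """Decide whether a file's SHA512 should be recomputed during an update."""
    if stored is None:
        return True  # no database record yet: a new file
    if stored.size != current.size:
        return True  # a size change always triggers a rehash
    if watch_timestamps:
        # Only with --watch-timestamps do mtime/ctime changes count
        return (stored.mtime, stored.ctime) != (current.mtime, current.ctime)
    return False
```

For example, a file whose permissions were changed keeps its size but gets a new ctime: `needs_rehash` returns `False` by default and `True` when `watch_timestamps=True`.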
Using the --silent mode. When this mode is enabled, the program does not produce any output on the screen. This is useful when precizer is used in scripts.
Let's add the --silent parameter to the previous example:
precizer --silent --update --progress --database=database1.db tests/examples/diffs/diff1
As a result, nothing will be displayed on the screen.
Additional Information in --verbose mode. This mode can be useful for debugging.
Let's add the --verbose parameter to the previous example:
precizer --verbose --update --progress --database=database1.db tests/examples/diffs/diff1
2025-01-25 09:55:59:820 src/parse_arguments.c:442:parse_arguments:Configuration: rational_logger_mode=VERBOSE
paths=tests/examples/diffs/diff1; database=database1.db; db_file_name=database1.db; verbose=yes; maxdepth=-1; silent=no; force=no; update=yes; watch-timestamps=no; progress=yes; compare=no, db-clean-ignored=no, dry-run=no, check-level=FULL, rational_logger_mode=VERBOSE
2025-01-25 09:55:59:820 src/parse_arguments.c:558:parse_arguments:Arguments parsed
2025-01-25 09:55:59:820 src/detect_paths.c:025:detect_paths:Checking directory paths provided as arguments
2025-01-25 09:55:59:820 src/file_availability.c:034:file_availability:Verify that the path tests/examples/diffs/diff1 exists
2025-01-25 09:55:59:820 src/file_availability.c:053:file_availability:The path tests/examples/diffs/diff1 is exists and it is a directory
2025-01-25 09:55:59:821 src/detect_paths.c:036:detect_paths:Paths detected
2025-01-25 09:55:59:821 src/init_signals.c:034:init_signals:Set signal SIGUSR2 OK:pid:604770
2025-01-25 09:55:59:821 src/init_signals.c:043:init_signals:Set signal SIGINT OK:pid:604770
2025-01-25 09:55:59:821 src/init_signals.c:052:init_signals:Set signal SIGTERM OK:pid:604770
2025-01-25 09:55:59:821 src/init_signals.c:055:init_signals:Signals initialized
2025-01-25 09:55:59:821 src/determine_running_dir.c:018:determine_running_dir:Current directory: /tmp
2025-01-25 09:55:59:821 src/db_determine_name.c:099:db_determine_name:Primary database file name: database1.db
2025-01-25 09:55:59:821 src/db_determine_name.c:105:db_determine_name:Primary database file path: database1.db
2025-01-25 09:55:59:821 src/db_determine_name.c:109:db_determine_name:DB name determined
2025-01-25 09:55:59:821 src/file_availability.c:034:file_availability:Verify that the path . exists
2025-01-25 09:55:59:821 src/file_availability.c:053:file_availability:The path . is exists and it is a directory
2025-01-25 09:55:59:821 src/file_availability.c:034:file_availability:Verify that the path database1.db exists
2025-01-25 09:55:59:821 src/file_availability.c:044:file_availability:The path database1.db is exists and it is a file
2025-01-25 09:55:59:821 src/db_determine_mode.c:128:db_determine_mode:Final value for config->sqlite_open_flag: SQLITE_OPEN_READWRITE
2025-01-25 09:55:59:821 src/db_determine_mode.c:129:db_determine_mode:Final value for config->db_initialize_tables: false
2025-01-25 09:55:59:821 src/db_determine_mode.c:131:db_determine_mode:DB mode determined
2025-01-25 09:55:59:821 src/db_test.c:061:db_test:Starting database file database1.db integrity check…
2025-01-25 09:55:59:821 src/db_test.c:082:db_test:The database verification level has been set to FULL
2025-01-25 09:55:59:821 src/db_test.c:126:db_test:Database database1.db has been verified and is in good condition
2025-01-25 09:55:59:822 src/db_get_version.c:087:db_get_version:Version number 1 found in database
2025-01-25 09:55:59:822 src/db_check_version.c:032:db_check_version:The database1.db database file is version 1
2025-01-25 09:55:59:822 src/db_check_version.c:061:db_check_version:The database database1.db is on version 1 and does not require any upgrades
2025-01-25 09:55:59:822 src/db_init.c:030:db_init:Successfully opened database database1.db
2025-01-25 09:55:59:822 src/db_init.c:118:db_init:The primary database and tables have NOT been initialized
2025-01-25 09:55:59:822 src/db_init.c:150:db_init:The primary database named database1.db is ready for operations
2025-01-25 09:55:59:822 src/db_init.c:167:db_init:The in-memory runtime_paths_id database successfully attached to the primary database database1.db
2025-01-25 09:55:59:822 src/db_init.c:174:db_init:Database initialization process completed
2025-01-25 09:55:59:822 src/db_compare.c:136:db_compare:Database comparison mode is not enabled. Skipping comparison
2025-01-25 09:55:59:822 src/db_contains_data.c:086:db_contains_data:The database database1.db has already been created previously
2025-01-25 09:55:59:822 src/db_validate_paths.c:192:db_validate_paths:The paths written against the database and the paths passed as arguments are completely identical
2025-01-25 09:55:59:822 src/file_list.c:143:file_list:File system traversal initiated to calculate file count and storage usage
2025-01-25 09:55:59:823 src/file_list.c:038:show_status:Total size: 43B, total items: 58, dirs: 46, files: 12, symlnks: 0
2025-01-25 09:55:59:825 src/db_get_version.c:087:db_get_version:Version number 1 found in database
2025-01-25 09:55:59:825 src/db_consider_vacuum_primary.c:025:db_consider_vacuum_primary:No changes were made. The primary database doesn't require vacuuming
2025-01-25 09:55:59:825 src/status_of_changes.c:049:status_of_changes:The database file database1.db has NOT been modified since the program was launched
2025-01-25 09:55:59:825 src/exit_status.c:027:exit_status:The precizer completed its execution without any issues
Non-recursive traversal using the --maxdepth parameter
tree tests/examples/4
tests/examples/4
├── AAA
│   ├── BBB
│   │   ├── CCC
│   │   │   └── a.txt
│   │   └── uuu.txt
│   └── tttt.txt
└── sss.txt
3 directories, 4 files
The --maxdepth=0 parameter completely disables recursion.
precizer --maxdepth=0 tests/examples/4
Primary database file name: myhost.db
The path myhost.db doesn't exist or it is not a file
The primary DB file not yet exists. Brand new database will be created
Recursion depth limited to: 0
File traversal started
These files will be added against the myhost.db database:
sss.txt
File traversal complete
Total size: 2B, total items: 5, dirs: 4, files: 1, symlnks: 0
Start vacuuming the primary database…
The primary database has been vacuumed
The database myhost.db has been modified since the last check (files were added, removed, or updated)
The precizer completed its execution without any issues
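A depth-limited traversal of this kind can be sketched with an explicit stack instead of function recursion, which also avoids deep call stacks on heavily nested trees. The `list_files` function is hypothetical; treating -1 as "unlimited" follows the maxdepth=-1 value shown in the verbose log, while the handling of depths greater than 0 is an assumption.

```python
import os


def list_files(root, maxdepth=-1):
    """Yield relative file paths iteratively; maxdepth=-1 means no depth limit."""
    stack = [(root, 0)]
    while stack:
        directory, depth = stack.pop()
        with os.scandir(directory) as entries:
            for entry in sorted(entries, key=lambda e: e.name):
                if entry.is_dir(follow_symlinks=False):
                    # Descend only while the depth limit allows it
                    if maxdepth < 0 or depth < maxdepth:
                        stack.append((entry.path, depth + 1))
                elif entry.is_file(follow_symlinks=False):
                    yield os.path.relpath(entry.path, root)
```

On the tests/examples/4 tree shown above, `list_files(root, maxdepth=0)` yields only `sss.txt`, matching the single file added in the output.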
Example of a Path to Ignore. To specify a pattern for ignoring files or directories, you can use PCRE2 regular expressions. Note: All paths in the regular expression must be specified as relative.
You can test and validate PCRE2 regular expressions using https://regex101.com/.
To understand how a relative path looks, simply run a directory traversal without the --ignore option and check how the terminal displays the relative paths recorded in the database:
tree -L 3 tests/examples/diffs
tests/examples/diffs
├── diff1
│   ├── 1
│   │   └── AAA
│   ├── 2
│   │   └── AAA
│   ├── 3
│   │   └── AAA
│   ├── 4
│   │   └── AAA
│   ├── path1
│   │   └── AAA
│   └── path2
│       └── AAA
└── diff2
    ├── 1
    │   └── AAA
    ├── 2
    │   └── AAA
    ├── 3
    │   └── AAA
    ├── 4
    │   └── AAA
    ├── path1
    │   └── AAA
    └── path2
        └── AAA
26 directories, 0 files
precizer --ignore="diff1/1/.*" tests/examples/diffs
In this example, the initial traversal path is ./tests/examples/diffs, and the resulting ignored path is ./tests/examples/diffs/diff1/1/ along with everything beneath it (matched by the trailing .* in the pattern).
Primary database file name: myhost.db
The path myhost.db doesn't exist or it is not a file
The primary DB file not yet exists. Brand new database will be created
File traversal started
These files will be added against the myhost.db database:
diff1/1/AAA/BCB/CCC/a.txt ignored & not added
diff1/1/AAA/ZAW/A/b/c/a_file.txt ignored & not added
diff1/1/AAA/ZAW/D/e/f/b_file.txt ignored & not added
diff1/2/AAA/BBB/CZC/a.txt
diff1/3/AAA/BBB/CCC/a.txt
diff1/4/AAA/BBB/CCC/a.txt
diff1/path1/AAA/BCB/CCC/a.txt
diff1/path1/AAA/ZAW/A/b/c/a_file.txt
diff1/path1/AAA/ZAW/D/e/f/b_file.txt
diff1/path2/AAA/BCB/CCC/a.txt
diff1/path2/AAA/ZAW/A/b/c/a_file.txt
diff1/path2/AAA/ZAW/D/e/f/b_file.txt
diff2/1/AAA/BCB/CCC/a.txt
diff2/1/AAA/ZAW/A/b/c/a_file.txt
diff2/1/AAA/ZAW/D/e/f/b_file.txt
diff2/2/AAA/BBB/CZC/a.txt
diff2/3/AAA/BBB/CCC/a.txt
diff2/4/AAA/BBB/CCC/a.txt
diff2/path1/AAA/BCB/CCC/a.txt
diff2/path1/AAA/BCB/CCC/b.txt
diff2/path1/AAA/ZAW/A/b/c/a_file.txt
diff2/path1/AAA/ZAW/D/e/f/b_file.txt
diff2/path2/AAA/BCB/CCC/a.txt
diff2/path2/AAA/ZAW/A/b/c/a_file.txt
File traversal complete
Total size: 97B, total items: 114, dirs: 90, files: 24, symlnks: 0
Start vacuuming the primary database…
The primary database has been vacuumed
The database myhost.db has been modified since the last check (files were added, removed, or updated)
The precizer completed its execution without any issues
Enjoy your life!
Let's repeat the same example, but this time without the --ignore option to include the three previously ignored files:
precizer --update tests/examples/diffs
Primary database file name: myhost.db
Starting database file myhost.db integrity check…
Database myhost.db has been verified and is in good condition
The --update option has been used, so the information about files will be updated against the database myhost.db
File traversal started
These files have been added or changed and those changes will be reflected against the DB myhost.db:
diff1/1/AAA/BCB/CCC/a.txt add
diff1/1/AAA/ZAW/A/b/c/a_file.txt add
diff1/1/AAA/ZAW/D/e/f/b_file.txt add
File traversal complete
Total size: 97B, total items: 114, dirs: 90, files: 24, symlnks: 0
Start vacuuming the primary database…
The primary database has been vacuumed
The database file myhost.db has been modified since the program was launched
The precizer completed its execution without any issues
Continuation of the previous example.
Multiple regular expressions for ignoring files can be specified simultaneously by repeating the --ignore option.
The database will be cleaned of references to files matching the regular expressions provided via the --ignore arguments: "diff1/1/.*" and "diff2/1/.*".
The --db-clean-ignored parameter must be explicitly specified to remove database entries for files that match the patterns passed through the --ignore option.
No changes were made to the file system, but the ignored files will be removed from the database.
# Update the database by removing entries for files that were marked as ignored:
precizer \
--update \
--db-clean-ignored \
--ignore="diff1/1/.*" \
--ignore="diff2/1/.*" \
tests/examples/diffs
Primary database file name: myhost.db
Starting database file myhost.db integrity check…
Database myhost.db has been verified and is in good condition
The --update option has been used, so the information about files will be deleted against the database myhost.db
These files are no longer exist or ignored and will be deleted against the DB myhost.db:
diff1/1/AAA/BCB/CCC/a.txt clean ignored
diff1/1/AAA/ZAW/A/b/c/a_file.txt clean ignored
diff1/1/AAA/ZAW/D/e/f/b_file.txt clean ignored
diff2/1/AAA/BCB/CCC/a.txt clean ignored
diff2/1/AAA/ZAW/A/b/c/a_file.txt clean ignored
diff2/1/AAA/ZAW/D/e/f/b_file.txt clean ignored
Start vacuuming the primary database…
The primary database has been vacuumed
The database file myhost.db has been modified since the program was launched
The precizer completed its execution without any issues
Using --ignore together with --include
# Remove the old database and create a new one, then populate it with data:
rm -i "${HOST}.db"
precizer tests/examples/diffs
Let's complicate things by combining both kinds of regular expressions.
The --include option takes PCRE2 regular expressions for relative paths that need to be included. The specified relative paths will be included even if they were excluded using one or more --ignore parameters. Multiple regular expressions can be specified by repeating --include.
To check and test PCRE2 regular expressions, you can use https://regex101.com/.
The DB will be cleaned of references to files matching the regular expressions provided in the --ignore arguments: "^.*/path2/.*" and "diff2/.*", but paths matching the patterns in --include will remain in the database.
The --db-clean-ignored parameter must be specified additionally to remove references to files matching the regular expressions passed via the --ignore options from the database.
# Update the database, removing references to files that were marked as ignored,
# except for paths matching the --include patterns.
precizer --update --db-clean-ignored \
--ignore="^.*/path2/.*" \
--ignore="diff2/.*" \
--include="diff2/1/AAA/ZAW/A/b/c/.*" \
--include="diff2/path1/AAA/ZAW/.*" \
tests/examples/diffs
Primary database file name: myhost.db
Starting database file myhost.db integrity check…
Database myhost.db has been verified and is in good condition
The --update option has been used, so the information about files will be deleted against the database myhost.db
These files are no longer exist or ignored and will be deleted against the DB myhost.db:
diff1/path2/AAA/BCB/CCC/a.txt clean ignored
diff1/path2/AAA/ZAW/A/b/c/a_file.txt clean ignored
diff1/path2/AAA/ZAW/D/e/f/b_file.txt clean ignored
diff2/1/AAA/BCB/CCC/a.txt clean ignored
diff2/1/AAA/ZAW/D/e/f/b_file.txt clean ignored
diff2/2/AAA/BBB/CZC/a.txt clean ignored
diff2/3/AAA/BBB/CCC/a.txt clean ignored
diff2/4/AAA/BBB/CCC/a.txt clean ignored
diff2/path1/AAA/BCB/CCC/a.txt clean ignored
diff2/path1/AAA/BCB/CCC/b.txt clean ignored
diff2/path2/AAA/BCB/CCC/a.txt clean ignored
diff2/path2/AAA/ZAW/A/b/c/a_file.txt clean ignored
Start vacuuming the primary database…
The primary database has been vacuumed
The database file myhost.db has been modified since the program was launched
The precizer completed its execution without any issues
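The precedence of --include over --ignore seen in this output can be sketched as follows. Python's `re` module stands in for PCRE2 here, and unanchored search semantics are an assumption that happens to reproduce the result above; the `is_ignored` function is hypothetical.

```python
import re

IGNORE = [r"^.*/path2/.*", r"diff2/.*"]
INCLUDE = [r"diff2/1/AAA/ZAW/A/b/c/.*", r"diff2/path1/AAA/ZAW/.*"]


def is_ignored(rel_path, ignore_patterns, include_patterns):
    """Return True if a relative path is ignored; an include match always wins."""
    if any(re.search(pattern, rel_path) for pattern in include_patterns):
        return False
    return any(re.search(pattern, rel_path) for pattern in ignore_patterns)


for path in (
    "diff1/2/AAA/BBB/CZC/a.txt",         # matches nothing: kept
    "diff1/path2/AAA/BCB/CCC/a.txt",     # matches an ignore pattern: removed
    "diff2/1/AAA/BCB/CCC/a.txt",         # matches an ignore pattern: removed
    "diff2/1/AAA/ZAW/A/b/c/a_file.txt",  # ignored, but rescued by --include
):
    print(path, "ignored" if is_ignored(path, IGNORE, INCLUDE) else "kept")
```

Checking the include patterns first keeps the rule simple: a path is dropped only when at least one ignore pattern matches and no include pattern does.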