A Python-based fixed-length text compressor that encodes text using variable bit-length codes based on a custom alphabet in table.txt
, defaulting to a-z,space
(27 chars + EOS, 5 bits) for broad compatibility, or smaller alphabets (e.g., 8 chars + EOS, 4 bits) for better compression. It uses a dynamic end-of-stream (EOS) marker and no metadata, achieving a compression ratio of 1.38 for "hello world"
(11 bytes to 8 bytes) with the default a-z,space
table.
- Variable Bit Encoding: Uses
ceil(log2(N + 1))
bits per character, whereN
is the table size (e.g., 5 bits for 27 chars, 4 bits for 8 chars). - No Metadata: Output (
compressed.hex
) contains only the bitstream and EOS, minimizing size. - Table-Based: Requires
table.txt
(default:a,b,c,...,z,space
) for encoding/decoding, with the same table needed for both. - Compression Ratio: 1.38 for
"hello world"
(11 bytes → 8 bytes, 55-bit bitstream + 5-bit EOS with default table). - Stats: Reports original size, compressed size (bytes/bits), bitstream size, and ratio.
- No Dependencies: Uses only Python standard library.
- Install Python: Ensure Python 3.x is installed (python.org).
- Clone Repository:
git clone https://github.com/yourusername/QubitText-Compressor.git cd QubitText-Compressor
python QubitText-Compressor.py <mode> [--file <input_file>] [--text <input_text>] [--table <table_file>]
Encode the string "hello world"
using the default character table:
python QubitText-Compressor.py encode --text "hello world"
This will create a compressed.hex
file containing the encoded bitstream.
Specifies the operation mode:
- encode: Compresses input (text or file) into a hexadecimal bitstream saved as
compressed.hex
. - decode: Decompresses a hexadecimal bitstream from
compressed.hex
into readable text saved asdecompressed.txt
.
-
--file <input_file>
Input file path:- For encode: Path to a
.txt
file containing text to compress. - For decode: Path to a
.hex
file containing the compressed bitstream.
- For encode: Path to a
-
--text <input_text>
Direct string input:- For encode: Plain text to compress (e.g.,
"hello world"
). - For decode: A hexadecimal string to decompress (less commonly used).
- For encode: Plain text to compress (e.g.,
-
--table <table_file>
Path to character table file (default:table.txt
in current directory).
Must be used for both encoding and decoding if a custom table is involved.