LEA compressor #36
Nice, I figured there's a good community for experimental compressors but never looked for it. Any chance for the source of LEA, or at least a Linux binary?
I don't know, I mostly just read this forum.
Hi, I'm the author of LEA, I've just updated it to v0.3, which brings some nice speed improvements. Its purpose is diametrically opposite to that of QOI though: it seeks to be Pareto-optimal at reasonably high compression ratios. I highly recommend joining the community at the encode.su forum if anyone is interested in data compression.
@MarcioPais that's awesome news, thanks! Do you provide the source code anywhere (I couldn't find it)? What's the license?
Wine is fine as a last resort, but I'm getting some questionable (though repeatable) results; fair comparisons seem to be hindered by the lack of a native binary. Here are some results with LEA 0.3 and qic 1.demo under Wine:
The zen2 part is stronger, so it makes sense that my code runs quicker on it; qic and lea should run quicker too, but they don't, with qic being much slower. Wine overhead should be minimal and should hit both sets of hardware about the same; I've never seen a massive difference like this before. There are Wine version differences between the machines, which I'm looking into; otherwise the zen2 hardware is stronger in pretty much every respect. The skylake part is closer to the hardware that qic was (probably heavily) optimised for, which may explain some of it; the qic skylake result seems about right.
QOI is weak by compression standards, but a variant may have some utility as a preprocessor to a generic compressor. Nothing mind-blowing, but these early results suggest some refinement could snag a red highlight on the LPCB chart below QIC:
edit: An older Wine version on the zen2 part was the issue; what a difference a year makes.
@dumblob @chocolate42 |
t24 just does a lossless transform; p24 packs the transformed pixels into qoi-like delta ops of 1/2/3/4 bytes per pixel. p24 compresses slightly worse than t24 but feeds the entropy coder much less data. The p24 packing appears to be a good fit for ~2/3 of LPCB images, so a second stage that finds a better fit may improve efficiency (if fjxl's efficiency can be beaten, we're then competing with qic's speed for a red highlight). Alternatively, somehow more than double the speed (probably with vector instructions) for a red highlight below fjxl. Nothing innovative, but it's an interesting exercise in optimising basic memory management if nothing else. Any fast generic compressors worth trying as a replacement for zstd?
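The delta-op packing described above boils down to a smallest-fit test over the per-channel deltas. A minimal sketch in C; the 2-3-2 / 4-6-4 / 8-8-8 bit widths follow the op list mentioned later in the thread, and the tag layout and function names are assumptions, not the actual p24 format:

```c
#include <stdint.h>

/* Does a signed delta fit in the given number of bits? */
static int fits(int d, int bits) {
    int lim = 1 << (bits - 1);          /* signed range [-lim, lim-1] */
    return d >= -lim && d < lim;
}

/* Hypothetical op chooser: returns the op size in bytes for the
   smallest op whose per-channel bit widths hold all three deltas.
   1 byte -> 2-3-2 bits, 2 bytes -> 4-6-4 bits, 3 bytes -> 8-8-8 bits. */
int pick_op(int dr, int dg, int db) {
    if (fits(dr, 2) && fits(dg, 3) && fits(db, 2)) return 1;
    if (fits(dr, 4) && fits(dg, 6) && fits(db, 4)) return 2;
    return 3;
}
```

A fourth, 4-byte op would cover pixels where a raw copy beats any delta; it is omitted here to keep the sketch short.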
As I've said, no generic postcoder is going to help; at the particular zone of the Pareto frontier that you're targeting, you just need a good SIMD decorrelator followed by a SIMD entropy coder. I see you've joined the forum. You should probably make a new thread there for some crowdsourced ideas, as this is probably getting a bit off-topic with regard to the purpose of this repo. QOI is interesting not for being particularly good, but for its simplicity; it's a nice way to start learning about data compression.
Makes sense, I thought a postcoder was picking up the slack from a weak decorrelator, but after checking (what I believe is) the optimal entropy coding of handling each component as a separate stream, the existing preproc+entropy can beat fjxl on size in theory, assuming the maths holds up. ANS should do it but I'm not there yet.
I'll do that, when I've learned a bit more and have a chance of not coming off as a complete crank. There is a little juice left in a qoi-like preproc, but probably not enough to beat qic on size.
You seem to already have a pretty good grasp of this, I'm sure people there will appreciate your efforts. Beating QIC on size isn't the problem, doing it at about the same order of speed with a qoi-like approach is. From a quick glance at the issues posted here and on the original repo, it seems no one was even trying simple context modelling.
That is, using a fast pseudo-luminance approximation to select the hash table to use. You'd need to run some tests to find the best quantization on a large, representative test set so as not to dilute the contexts, but it will almost surely provide much better hit rates. And if you can get acceptable hit rates with only 32 entries per table, you get more freedom to design the opcodes.
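A minimal sketch of that context-selection idea, assuming a cheap shift-only pseudo-luma and four contexts; both the luma formula and the bucket count are placeholders to be tuned on a representative test set, as suggested above:

```c
#include <stdint.h>

#define N_CONTEXTS 4   /* each context would own its own 32-entry table */

/* Cheap integer pseudo-luminance: ~(r + 2g + b) / 4.
   A stand-in for true luminance, multiply-free. */
static inline int pseudo_luma(uint8_t r, uint8_t g, uint8_t b) {
    return (r + 2 * g + b) >> 2;       /* result in 0..255 */
}

/* Quantize luma into one of N_CONTEXTS buckets to pick a hash table. */
static inline int context_for(uint8_t r, uint8_t g, uint8_t b) {
    return pseudo_luma(r, g, b) >> 6;  /* 0..255 -> 0..3 */
}
```

Uniform bucketing is the simplest choice; non-uniform thresholds found by testing would likely dilute the contexts less.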
Got a basic ANS working (using ryg_rans), which confirms that a simple qoi-like codec can beat fjxl on size (p24ans); it's ~40% slower than QIC but has had no optimisation. 3 qoi ops (232b, 464, 888) with a stream per component, plus a control stream for the opcodes, makes 10 streams (only 4 are used per pixel; the streams are for efficiency, not speed). Now that byte alignment is out of the window and direct entropy coding is used, all the qoi-like encoding is really doing is partitioning the pixels so that more predictable pixels use more efficient entropy coding. We'd be better off discarding qoi-like ops in favour of a better partitioning scheme. Having one g stream, 9 r streams and 9 b streams (partitioned based on the bit depth of g) gets us to ~1307MB in theory (g24ans). Adding RLE (requiring an additional run stream and a control stream) gets us to ~1284MB in theory (g24ans_rle), possibly beating QIC on size depending on how close to optimal the ANS is. An adaptive (per-image) partitioning scheme is worth exploring but may be expensive.
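The bit-depth partitioning could look something like the following sketch; the mapping of signed green residuals to 9 buckets (0 to 8 bits, sign bit included) is an assumption based on the description above, not the actual g24ans code:

```c
#include <stdint.h>

/* Minimal bits needed to hold a signed residual in two's complement:
   0 -> 0 bits, -1 -> 1 bit, +1/-2 -> 2 bits, ..., +127/-128 -> 8 bits.
   For 8-bit residuals this yields 9 buckets (0..8), one r stream and
   one b stream per bucket: the pixel's r and b residuals go to the
   streams selected by the bit depth of its g residual. */
static int bit_depth(int d) {
    int v = d < 0 ? ~d : d;            /* magnitude part of the encoding */
    int bits = 0;
    while (v) { bits++; v >>= 1; }
    return d == 0 ? 0 : bits + 1;      /* +1 for the sign bit */
}
```

The intuition is that the green residual's magnitude predicts the red and blue residuals' magnitudes, so grouping by it gives each stream a tighter symbol distribution for the entropy coder.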
In the interest of having something that should be easier to SIMD, I've avoided index ops entirely for now. If and when a more efficient encoder is attempted, thank you for that rather large hint. Once g24* has been implemented and confirmed I'll probably make a thread on the forum. edit: Unfortunately, ANS (or at least the way I'm using it) doesn't come close enough to the entropy limit for g24rle to beat qic:
g24rle does two passes: one to gather frequency stats and one to statically encode with ANS. Adaptive ANS might have better efficiency and might also be quicker, as it would eliminate a pass; TODO.
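Pass one of that two-pass scheme might look like the sketch below: histogram the stream, then normalize the counts to a power-of-two total as a static rANS table requires. The naive normalization here is illustrative only; production coders (the ryg_rans examples included) redistribute rounding error more carefully, and this version assumes the bumped-to-1 rare symbols don't push the total over the scale:

```c
#include <stddef.h>
#include <stdint.h>

#define PROB_BITS  12
#define PROB_SCALE (1u << PROB_BITS)   /* all freqs must sum to this */

/* Scale raw counts (hist, summing to total) into a static frequency
   table for rANS. Every symbol that occurred must keep freq >= 1,
   and rounding slack is crudely dumped on the last seen symbol. */
void normalize_freqs(const size_t hist[256], uint16_t freq[256], size_t total) {
    size_t used = 0;
    int last = 0;
    for (int s = 0; s < 256; s++) {
        if (!hist[s]) { freq[s] = 0; continue; }
        size_t f = (hist[s] * PROB_SCALE) / total;
        freq[s] = f ? (uint16_t)f : 1;  /* seen symbols need p > 0 */
        used += freq[s];
        last = s;
    }
    freq[last] = (uint16_t)(freq[last] + PROB_SCALE - used);
}
```

Pass two would then encode each stream against its table; the table itself must also be stored, which is part of why the measured sizes fall short of the entropy-limit numbers above.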
https://encode.su/threads/3818-LEA-Lossless-image-compressor