-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Diff from average instead of previous pixel #34
Comments
Impressive compression numbers from a relatively simple change. There's a slight change in opcodes but roughly speaking most of the difference from the 420 of your previous variant to the 405 here is from diffing the average? |
The main downside of this is that it doesn't work well with a streaming encoder/decoder.. |
The other changes from opcodes are there only for better handling of images with an alpha channel at the cost of turning luma686 into luma676 which made the compression a bit worse for 'images' and better for 'images-lance'. The main downside would indeed be for a streaming encoder / decoder because you have to revisit previous pixels. |
A streaming encoder's memory requirements relying on the width of the image is the biggest issue IMO, I wonder if there's a way to have a well-performing fixed-size previous-line cache. Alternatively it would probably be good enough to set a max size of the previous-line cache to something like 4096 pixels, that way 99% of input benefits fully from averaging and the rest can at least benefit from averaging for the first 4096 pixels in every row (OTOH, it may not work in practice). |
There was also this tiling proposal - maybe just read the whole image in terms of a sequence of tiles instead of sequence of pixels would do the trick? |
that would work, but it is strictly harder for streaming encoders. |
Since you would need the pixel right above the current one you need to cache a full width of pixels. |
Is it? I mean, the simpliest solution (to ignore tile boundaries for optimization/compression and optimize/compress only each tile individually) could be viewed as a list of small unrelated pictures (of course some operations like dithering won't work this way without artifacts, but that's not what's being discussed in this thread, right?). That sounds darn streaming-oriented and quite easy to implement (not that I'd have time to try it out now - it's just my thinking aloud). |
My proposal fixes the cache to some size (lets say 8192 pixels which sounds reasonable as an upper bound of expected input size). This would mean the 7680x4320 image uses an averaged reference everywhere, as would a 7680x20,000,000 image. The 20,000,000 x 2 image would use averaging for the first 8192 pixels in each row, previous pixel for the rest where it has no upper reference. If it works it would allow averaging for 99% of input without screwing streaming encoders or forcing high memory requirements. The main drawback is introducing a condition for the boundary of high-width input. The branch predictor can probably be leaned on instead of complicating the implementation as for 99% of input a branch predictor should effectively nullify a branch that only ever goes one way(?). |
That should work. |
I thought about the rounding of the average a bit more. |
Here's my numbers for 'regular'
Compile with:
|
The Gamut library https://github.com/AuburnSounds/gamut provides both QOI format and "QOIX", an evolving format which is @wbd73 Would you disclose your real name in order to have proper copyright information? |
The way you did it now is fine with me. |
It may be coincidence, but the compression ratios improve slightly, consistent with the observation at nigeltao/qoi2-bikeshed#34 (comment) It is also more SIMD friendly (specifically, it matches SSE2's _mm_avg_epu8 semantics), which future commits will explore. After 0.2474 CmpRatio 160.08 EncMPixels/s 178.46 DecMPixels/s Before 0.2489 CmpRatio 167.12 EncMPixels/s 189.10 DecMPixels/s
My QOIR repo is a work-in-progress, but FYI when I change the predictor from (left) to ((1+left+above)/2),
|
If you use a union so you can access the four uint8 bytes of a pixel also as a 32 bit integer value, you can use SWAR (SIMD within a register). It's a bit of an in between solution that is more portable.
In this case if P0 is one pixel represented as a 32 bit integer value and P1 is another pixel, QOI_PAVGB returns a 32 bit integer where each byte is the average of the specific byte of the two pixels (rounded up as preferred). |
For the luma operations, I tried to use the average of the previous pixel and the pixel above the current one to compare against instead of comparing to the previous pixel alone. It resulted in significant better compression.
Together with the other tags I chose, I was able to get the compression numbers below.
qoi2avg.h.txt
The text was updated successfully, but these errors were encountered: