Diff from average instead of previous pixel #34

wbd73 · 2022-01-05T11:05:59Z

For the luma operations, I tried to use the average of the previous pixel and the pixel above the current one to compare against instead of comparing to the previous pixel alone. It resulted in significant better compression.
Together with the other tags I chose, I was able to get the compression numbers below.

# Grand total for images
        decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi:          2.6         3.0        177.46        155.06       463   28.2%
qoi2avg:      2.9         3.7        162.71        125.88       405   24.7%

# Grand total for images-lance
        decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi:         17.2        15.1        236.14        268.34      2109   13.6%
qoi2avg      17.2        18.7        236.05        217.03      1709   11.0%

qoi2avg.h.txt

The text was updated successfully, but these errors were encountered:

chocolate42 · 2022-01-05T14:05:51Z

Impressive compression numbers from a relatively simple change. There's a slight change in opcodes but roughly speaking most of the difference from the 420 of your previous variant to the 405 here is from diffing the average?

oscardssmith · 2022-01-05T14:20:30Z

The main downside of this is that it doesn't work well with a streaming encoder/decoder..

wbd73 · 2022-01-05T14:21:39Z

The other changes from opcodes are there only for better handling of images with an alpha channel at the cost of turning luma686 into luma676 which made the compression a bit worse for 'images' and better for 'images-lance'.
I believe 'images' was at 422 kb with the changes for the alpha channel and dropped to 405 kb by taking the diff from the average instead of the previous pixel.
So it was indeed a significant drop with a relative simple change.

The main downside would indeed be for a streaming encoder / decoder because you have to revisit previous pixels.
When encoding / decoding from buffer to buffer it's easy because previous pixels are easy accessible.
For streaming you would have to keep a cache of the previous line of pixels.

chocolate42 · 2022-01-05T14:36:03Z

A streaming encoder's memory requirements relying on the width of the image is the biggest issue IMO, I wonder if there's a way to have a well-performing fixed-size previous-line cache. Alternatively it would probably be good enough to set a max size of the previous-line cache to something like 4096 pixels, that way 99% of input benefits fully from averaging and the rest can at least benefit from averaging for the first 4096 pixels in every row (OTOH, it may not work in practice).

dumblob · 2022-01-05T14:39:49Z

There was also this tiling proposal - maybe just read the whole image in terms of a sequence of tiles instead of sequence of pixels would do the trick?

oscardssmith · 2022-01-05T14:44:38Z

that would work, but it is strictly harder for streaming encoders.

wbd73 · 2022-01-05T14:56:11Z

I wonder if there's a way to have a well-performing fixed-size previous-line cache. Alternatively it would probably be good enough to set a max size of the previous-line cache to something like 4096 pixels, that way 99% of input benefits fully from averaging and the rest can at least benefit from averaging for the first 4096 pixels in every row (OTOH, it may not work in practice).

Since you would need the pixel right above the current one you need to cache a full width of pixels.
The speed would probably acceptable since you could use a FIFO / ring buffer to cache the pixels.
You would need to allocate the memory for that so for an image with a resolution of 20,000,000 x 2 pixels it wouldn't be very useful but for the majority of images it would work fine.
An 8K image ( 7680x4320 pixels ) would require a cache of 22.5 KiB (caching alpha channel isn't required).

dumblob · 2022-01-05T15:44:54Z

that would work, but it is strictly harder for streaming encoders.

Is it? I mean, the simpliest solution (to ignore tile boundaries for optimization/compression and optimize/compress only each tile individually) could be viewed as a list of small unrelated pictures (of course some operations like dithering won't work this way without artifacts, but that's not what's being discussed in this thread, right?).

That sounds darn streaming-oriented and quite easy to implement (not that I'd have time to try it out now - it's just my thinking aloud).

chocolate42 · 2022-01-05T15:54:41Z

An 8K image ( 7680x4320 pixels ) would require a cache of 22.5 KiB (caching alpha channel isn't required).

My proposal fixes the cache to some size (lets say 8192 pixels which sounds reasonable as an upper bound of expected input size). This would mean the 7680x4320 image uses an averaged reference everywhere, as would a 7680x20,000,000 image. The 20,000,000 x 2 image would use averaging for the first 8192 pixels in each row, previous pixel for the rest where it has no upper reference. If it works it would allow averaging for 99% of input without screwing streaming encoders or forcing high memory requirements. The main drawback is introducing a condition for the boundary of high-width input. The branch predictor can probably be leaned on instead of complicating the implementation as for 99% of input a branch predictor should effectively nullify a branch that only ever goes one way(?).

wbd73 · 2022-01-05T18:03:20Z

The 20,000,000 x 2 image would use averaging for the first 8192 pixels in each row, previous pixel for the rest where it has no upper reference. If it works it would allow averaging for 99% of input without screwing streaming encoders or forcing high memory requirements. The main drawback is introducing a condition for the boundary of high-width input.

That should work.
With such an approach, it might make sense to use en outer loop for the height with an inner loop for the width instead of a single loop for all pixels.

wbd73 · 2022-01-06T06:53:22Z

I thought about the rounding of the average a bit more.
In my previous code I used (px1 + px2) >> 1 so a .5 value was rounded down.
Since the delta values can adjust down 1 more compared to up, it makes more sense to round in a normal way.
(px1 + px2 + 1) >> 1 so a .5 value is rounded up.
By making this small adjustment, 'images' drops from 405 kb to 404 and 'images-lance' from 1709 to 1706.
Not much of a difference but worth to change the rounding I think.
qoi2avg2.h.txt

nigeltao · 2022-01-09T05:01:49Z

Here's my numbers for 'regular' qoi2avg2.h.txt (which I've called qoi-reg-avg2). I also wrapped LZ4 around it (called qoi-lz4-avg2). For comparison, qoi-mas-c04 (commit c04a975 from the phoboslab/master branch, 2021-12-24) and qoi-lz4-d28 are from #28 (comment)

qoi-lz4-avg2 compresses better than libpng on the images test suite (with 2.5x faster decode, 20x faster encode)! Performance could probably be optimized, too.

        decode ms   encode ms   decode mpps   encode mpps   size kb    rate

## icon_512/apps-preferences-desktop-locale.png size: 512x512
libpng:       5.1        23.6         51.08         11.09        25    2.5%
qoi-mas-c04:  1.9         1.8        141.22        148.71       186   18.2%
qoi-reg-avg2: 2.1         2.4        127.80        110.18       171   16.8%
qoi-lz4-d28:  1.6         1.7        162.76        152.28        27    2.7%
qoi-lz4-avg2: 2.0         2.4        131.81        107.72        38    3.7%

## Total for images/icon_512
libpng:       4.9        33.7         52.98          7.78        51    5.0%
qoi-mas-c04:  1.4         1.4        182.31        182.77        85    8.4%
qoi-reg-avg2: 1.5         1.8        180.13        142.30        76    7.5%
qoi-lz4-d28:  0.9         1.6        289.36        163.47        50    4.9%
qoi-lz4-avg2: 1.5         2.0        175.10        128.33        55    5.4%

## Total for images/photo_kodak
libpng:      15.3       192.1         25.78          2.05       648   56.3%
qoi-mas-c04:  5.6         7.6         69.91         51.85       671   58.3%
qoi-reg-avg2: 7.3         9.6         53.84         41.08       600   52.2%
qoi-lz4-d28:  6.1         8.0         64.25         49.44       651   56.5%
qoi-lz4-avg2: 7.2         9.9         54.98         39.69       600   52.2%

## Total for images/screenshot_web
libpng:      92.6      1045.6         87.78          7.77      2402    7.6%
qoi-mas-c04: 50.1        47.1        162.16        172.58      2649    8.3%
qoi-reg-avg2:51.2        63.1        158.55        128.84      2302    7.3%
qoi-lz4-d28: 30.6        55.9        265.65        145.24      2080    6.6%
qoi-lz4-avg2:46.6        64.5        174.28        125.92      2070    6.5%

# Grand total for images
libpng:      13.5       153.7         34.44          3.02       398   24.2%
qoi-mas-c04:  4.7         5.6         99.25         82.46       463   28.2%
qoi-reg-avg2: 5.4         7.1         86.37         65.40       404   24.6%
qoi-lz4-d28:  4.2         6.0        110.38         77.11       416   25.4%
qoi-lz4-avg2: 5.4         7.5         85.54         61.86       391   23.8%

# Grand total for images-lance
libpng:     103.5       925.0         39.25          4.39      1395    9.0%
qoi-mas-c04: 35.2        32.5        115.26        124.91      2109   13.6%
qoi-reg-avg2:37.7        40.5        107.59        100.39      1706   11.0%
qoi-lz4-d28: 28.7        36.2        141.62        112.13      1530    9.8%
qoi-lz4-avg2:38.2        42.9        106.27         94.71      1590   10.2%

qoi-lz4-avg2.h.txt

Compile with:

gcc -I ../stb qoibench.c -llz4 -lpng -O3

p0nce · 2022-07-11T09:37:37Z

The Gamut library https://github.com/AuburnSounds/gamut provides both QOI format and "QOIX", an evolving format which is
currently qoi2avg.h from @wbd73 followed by LZ4 compression (ie, it is qoi-lz4-avg2 without the rounding fix).

@wbd73 Would you disclose your real name in order to have proper copyright information?

wbd73 · 2022-07-12T13:09:21Z

@wbd73 Would you disclose your real name in order to have proper copyright information?

The way you did it now is fine with me.
Some ideas (luma232 with a different bias and using the average of two pixels as a prediction) came from me but using a FIFO for the color index was someone else idea.

It may be coincidence, but the compression ratios improve slightly, consistent with the observation at nigeltao/qoi2-bikeshed#34 (comment) It is also more SIMD friendly (specifically, it matches SSE2's _mm_avg_epu8 semantics), which future commits will explore. After 0.2474 CmpRatio 160.08 EncMPixels/s 178.46 DecMPixels/s Before 0.2489 CmpRatio 167.12 EncMPixels/s 189.10 DecMPixels/s

nigeltao · 2022-07-13T13:18:29Z

My QOIR repo is a work-in-progress, but FYI when I change the predictor from (left) to ((1+left+above)/2),

I can reproduce the slow-down in encode and decode speed, but
I can claw almost all of it back by using SIMD SSE2 (encode, decode), but
SIMD is obviously less portable.

wbd73 · 2022-07-13T13:51:38Z

I can claw almost all of it back by using SIMD SSE2 (encode, decode), but

SIMD is obviously less portable.

If you use a union so you can access the four uint8 bytes of a pixel also as a 32 bit integer value, you can use SWAR (SIMD within a register). It's a bit of an in between solution that is more portable.

/* SWAR */
#define QOI_PAVGB(P0, P1) ((P0 | P1) - (((P0 ^ P1) >> 1) & 0x7f7f7f7f))
#define QOI_PSUBB(P0, P1) ((P0 | 0x80808080) - (P1 & 0x7f7f7f7f)) ^ ((P0 ^ ~P1) & 0x80808080)

In this case if P0 is one pixel represented as a 32 bit integer value and P1 is another pixel, QOI_PAVGB returns a 32 bit integer where each byte is the average of the specific byte of the two pixels (rounded up as preferred).
QOI_PSUBB works in a similar way but returns the difference.

chocolate42 mentioned this issue Jan 7, 2022

QOIP format #31

Open

nigeltao mentioned this issue Jan 11, 2022

Ignore checksums more often richgel999/fpng#9

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Diff from average instead of previous pixel #34

Diff from average instead of previous pixel #34

wbd73 commented Jan 5, 2022

chocolate42 commented Jan 5, 2022

oscardssmith commented Jan 5, 2022

wbd73 commented Jan 5, 2022 •

edited

Loading

chocolate42 commented Jan 5, 2022

dumblob commented Jan 5, 2022

oscardssmith commented Jan 5, 2022

wbd73 commented Jan 5, 2022

dumblob commented Jan 5, 2022 •

edited

Loading

chocolate42 commented Jan 5, 2022

wbd73 commented Jan 5, 2022

wbd73 commented Jan 6, 2022

nigeltao commented Jan 9, 2022 •

edited

Loading

p0nce commented Jul 11, 2022 •

edited

Loading

wbd73 commented Jul 12, 2022 •

edited

Loading

nigeltao commented Jul 13, 2022

wbd73 commented Jul 13, 2022 •

edited

Loading

Diff from average instead of previous pixel #34

Diff from average instead of previous pixel #34

Comments

wbd73 commented Jan 5, 2022

chocolate42 commented Jan 5, 2022

oscardssmith commented Jan 5, 2022

wbd73 commented Jan 5, 2022 • edited Loading

chocolate42 commented Jan 5, 2022

dumblob commented Jan 5, 2022

oscardssmith commented Jan 5, 2022

wbd73 commented Jan 5, 2022

dumblob commented Jan 5, 2022 • edited Loading

chocolate42 commented Jan 5, 2022

wbd73 commented Jan 5, 2022

wbd73 commented Jan 6, 2022

nigeltao commented Jan 9, 2022 • edited Loading

p0nce commented Jul 11, 2022 • edited Loading

wbd73 commented Jul 12, 2022 • edited Loading

nigeltao commented Jul 13, 2022

wbd73 commented Jul 13, 2022 • edited Loading

wbd73 commented Jan 5, 2022 •

edited

Loading

dumblob commented Jan 5, 2022 •

edited

Loading

nigeltao commented Jan 9, 2022 •

edited

Loading

p0nce commented Jul 11, 2022 •

edited

Loading

wbd73 commented Jul 12, 2022 •

edited

Loading

wbd73 commented Jul 13, 2022 •

edited

Loading