QOI_DELTA Proof of Concept #23

Open
chocolate42 opened this issue Dec 5, 2021 · 18 comments
Comments

@chocolate42

QOI_DELTA might be a viable replacement for QOI_DIFF_8. It stores -1..1 for each of r, g and b, whereas QOI_DIFF_8 stores -2..1. Narrowing the range gives 3*3*3 = 27 combinations, which pack into 5 bits behind a 3-bit tag. There are 5 unused values, which can be used as 8-bit tags. Assuming the way forward is to special-case RGB and RGBA as last-resort encodings instead of the wasteful QOI_COLOR op, we will probably need some space for 8-bit tags, so this is convenient. In the example below they've been filled with QOI_COLOR_RGB, QOI_COLOR_RGBA, QOI_COLOR_A, QOI_RUN_16 and QOI_RUN_24 (QOI_COLOR_A doesn't seem very useful). The example hasn't been tuned for efficiency or compression, and some of the other opcode choices are questionable. This is a WIP that is being iterated on and is only meant as an example of QOI_DELTA.

// opcodes
#define QOI_INDEX      0x80 // 1xxxxxxx i7
#define QOI_DIFF_16    0x40 // 01rrrrrg ggggbbbb : r5g5b4
#define QOI_DELTA      0x20 // 001xxxxx (where x is in the range 0..26)
#define QOI_DIFF_24    0x10 // 0001xxxx
#define QOI_RUN_8      0x00 // 0000xxxx
// special 8 bit opcodes stored in the upper range of delta
#define QOI_COLOR_RGB  0x3b // Following 3 bytes RGB
#define QOI_COLOR_RGBA 0x3c // Following 4 bytes RGBA
#define QOI_COLOR_A    0x3d // Following byte A
#define QOI_RUN_16     0x3e // 8 bit run follows
#define QOI_RUN_24     0x3f // 16 bit run follows

#define QOI_RLE_1      0x2d // run length 1, actually a delta op
#define QOI_RUN_0_MAXLEN 1 //max run encodable without an explicit run op
#define QOI_RUN_8_MAXLEN (QOI_RUN_0_MAXLEN+16)
#define QOI_RUN_16_MAXLEN (QOI_RUN_8_MAXLEN+256)
#define QOI_RUN_24_MAXLEN (QOI_RUN_16_MAXLEN+65536)
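
As a rough illustration of how 27 delta states can sit behind the 001 tag, here is a minimal sketch. It assumes the three diffs are packed as a base-3 number in r, g, b order; the attached code may order them differently. The assumption is at least consistent with QOI_RLE_1 above, since the all-zero diff lands on 0x20 + 13 = 0x2d. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define QOI_DELTA 0x20 /* 001xxxxx, x in 0..26 */

/* Packs three per-channel diffs in -1..1 as a base-3 number.
 * ASSUMPTION: r is the most significant digit, b the least.
 * Returns 1 and writes the op byte on success, 0 if a diff is out of range. */
static int qoi_delta_pack(int vr, int vg, int vb, uint8_t *op) {
    if (vr < -1 || vr > 1 || vg < -1 || vg > 1 || vb < -1 || vb > 1) {
        return 0;
    }
    *op = (uint8_t)(QOI_DELTA | ((vr + 1) * 9 + (vg + 1) * 3 + (vb + 1)));
    return 1;
}

/* Inverse of qoi_delta_pack. The caller must already have ruled out the
 * five special tags 0x3b..0x3f, which share the 001 prefix. */
static void qoi_delta_unpack(uint8_t op, int *vr, int *vg, int *vb) {
    int x = op & 0x1f;
    *vb = x % 3 - 1;
    x /= 3;
    *vg = x % 3 - 1;
    x /= 3;
    *vr = x - 1;
}

/* Round-trips all 27 diff combinations and checks they stay below 0x3b. */
static void qoi_delta_selfcheck(void) {
    for (int vr = -1; vr <= 1; vr++) {
        for (int vg = -1; vg <= 1; vg++) {
            for (int vb = -1; vb <= 1; vb++) {
                uint8_t op = 0;
                int r, g, b;
                int ok = qoi_delta_pack(vr, vg, vb, &op);
                assert(ok);
                (void)ok;
                assert(op >= 0x20 && op <= 0x3a);
                qoi_delta_unpack(op, &r, &g, &b);
                assert(r == vr && g == vg && b == vb);
            }
        }
    }
}
```

The five values 27..31 in the low bits are exactly the 0x3b..0x3f tags listed above, so a decoder would test for those before unpacking a delta.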

op count: 265215890
QOI_DIFF_16=30.78%, QOI_INDEX=30.22%, QOI_DELTA=20.50%, QOI_COLOR_RGB=6.53%, QOI_RUN_8=6.46%, QOI_DIFF_24=4.74%, QOI_RUN_16=0.60%, QOI_COLOR_RGBA=0.12%, QOI_COLOR_24=0.03%, QOI_COLOR_A=0.01%

## Benchmarking ../qoitests/images/tango512/*.png -- 1 runs
## Totals (AVG) size: 0x0
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:       3.1        20.9         83.95         12.57        51
stbi:         2.1        21.0        122.62         12.46        69
qoi:          0.7         0.9        367.64        284.78        77

## Benchmarking ../qoitests/images/kodak/*.png -- 1 runs
## Totals (AVG) size: 0x0
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:       9.9       184.7         39.55          2.13       717
stbi:        10.2       103.0         38.48          3.82       979
qoi:          3.4         5.5        116.04         71.56       777

## Benchmarking ../qoitests/images/misc/*.png -- 1 runs
## Totals (AVG) size: 0x0
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:      12.4       102.7         71.71          8.66       283
stbi:        10.5        96.9         84.60          9.18       415
qoi:          2.7         3.9        330.06        230.68       402

## Benchmarking ../qoitests/images/textures/*.png -- 1 runs
## Totals (AVG) size: 0x0
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:       3.0        42.7         42.92          3.04       163
stbi:         2.7        23.5         47.49          5.53       232
qoi:          0.7         1.2        175.07        111.13       176

## Benchmarking ../qoitests/images/screenshots/*.png -- 1 runs
## Totals (AVG) size: 0x0
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:      58.6       632.4        140.56         13.02      2219
stbi:        52.4       774.3        157.16         10.63      2821
qoi:         25.3        30.4        324.87        270.37      2509

## Benchmarking ../qoitests/images/wallpaper/*.png -- 1 runs
## Totals (AVG) size: 0x0
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:     194.9      2938.7         48.09          3.19      9224
stbi:       221.8      1794.5         42.26          5.22     13299
qoi:         75.8        97.6        123.67         95.99     10919
@chocolate42
Author

New iteration that's more competitive on compression. The code is a spider's web of ifdefs for prototyping, but it's attached anyway in case anyone wants to bench this WIP snapshot. A highlight is a GDIFF_16 definition with a 2-bit tag; the extra bits are spent on red and blue, which seems to be strong, and it makes the encoding neater. DIFF_24 looks like the next opcode for the chopping block.

// opcodes
#define QOI_1_INDEX_8       0x80 // 1xxxxxxx i7
#define QOI_2_GDIFF_16      0x40 // 01gggggg rrrrbbbb : (464 config)
                                 // Like vanilla GDIFF_16 but r and b extended to 4 bits
#define QOI_3_DELTA_8       0x20 // 001xxxxx (where x is in the range 0..26)
#define QOI_4_DIFF_24       0x10 // 0001xxxx

#define QOI_4_RUN_8         0x00 // 0000xxxx, rle 2..17
// special 8 bit opcodes
#define QOI_8_COLOR_RGB_32  0x3b // Change rgb, penultimate resort
#define QOI_8_COLOR_RGBA_40 0x3c // Change everything, last resort
#define QOI_8_LINEAR_RGB_16 0x3d // Apply diff to all rgb linearly.
                                 //used ~2.19% with 2_DIFF_16, ~0.32% with 2_GDIFF_16
#define QOI_8_RUN_16        0x3e // 8 bit run follows
#define QOI_8_RUN_24        0x3f // 16 bit run follows
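
To make the run-length tiers concrete, here is a hedged sketch of choosing a run op by length. It reuses the MAXLEN arithmetic from the first post (1, 17, 273, 65809) and the "rle 2..17" comment above. The bias of each stored value, the big-endian order of the 16-bit run, the use of the zero-diff delta (0x2d) for a run of 1, and the name emit_run are all assumptions, not taken from the attached code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QOI_4_RUN_8   0x00 /* 0000xxxx, rle 2..17 */
#define QOI_3_DELTA_8 0x20 /* 001xxxxx */
#define QOI_8_RUN_16  0x3e /* 8 bit run follows */
#define QOI_8_RUN_24  0x3f /* 16 bit run follows */

#define RUN_0_MAXLEN  1
#define RUN_8_MAXLEN  (RUN_0_MAXLEN + 16)
#define RUN_16_MAXLEN (RUN_8_MAXLEN + 256)
#define RUN_24_MAXLEN (RUN_16_MAXLEN + 65536)

/* Writes one run op into out (at least 3 bytes) and returns the number of
 * bytes written, or 0 if run is outside 1..RUN_24_MAXLEN.
 * ASSUMPTION: each tier stores (run - previous tier's max - 1). */
static size_t emit_run(uint32_t run, uint8_t *out) {
    if (run == 0 || run > RUN_24_MAXLEN) {
        return 0;
    }
    if (run <= RUN_0_MAXLEN) {
        out[0] = QOI_3_DELTA_8 | 13; /* zero diff doubles as run length 1 */
        return 1;
    }
    if (run <= RUN_8_MAXLEN) {
        out[0] = (uint8_t)(QOI_4_RUN_8 | (run - RUN_0_MAXLEN - 1));
        return 1;
    }
    if (run <= RUN_16_MAXLEN) {
        out[0] = QOI_8_RUN_16;
        out[1] = (uint8_t)(run - RUN_8_MAXLEN - 1);
        return 2;
    }
    uint32_t v = run - RUN_16_MAXLEN - 1;
    out[0] = QOI_8_RUN_24;
    out[1] = (uint8_t)(v >> 8);   /* ASSUMPTION: high byte first */
    out[2] = (uint8_t)(v & 0xff);
    return 3;
}

/* Checks the boundaries between tiers. */
static void emit_run_selfcheck(void) {
    uint8_t b[3] = {0, 0, 0};
    size_t n = emit_run(17, b);
    assert(n == 1 && b[0] == 0x0f);
    n = emit_run(18, b);
    assert(n == 2 && b[0] == QOI_8_RUN_16 && b[1] == 0);
    (void)n;
}
```

A run longer than RUN_24_MAXLEN would have to be split into several ops by the caller.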

op count: 265215890, 0x00=6.46%, 0x10=1.82%, 0x20=20.50%, 0x3B=3.47%, 0x3C=0.13%, 0x3D=0.32%, 0x3E=0.60%, 0x3F=0.03%, 0x40=36.44%, 0x80=30.22%,

## Benchmarking ../qoitests/images/tango512/*.png -- 1 runs
## Totals (AVG)
               decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:              3.3        21.5         80.57         12.21        51
stbi:                2.2        21.6        117.17         12.11        69
qoi-master:          0.7         1.0        352.33        265.32        80
qoi-experi:          0.8         1.0        311.68        250.70        79
qoi-demo10:          0.6         0.8        469.12        325.10        76
qoi-delta4:          0.7         1.1        366.57        244.10        75

## Benchmarking ../qoitests/images/kodak/*.png -- 1 runs
## Totals (AVG)
               decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:              9.9       183.7         39.67          2.14       717
stbi:               10.3       101.0         38.29          3.89       979
qoi-master:          3.6         5.1        108.28         77.29       771
qoi-experi:          3.7         6.1        105.94         64.60       700
qoi-demo10:          2.8         4.5        139.33         87.43       772
qoi-delta4:          3.0         6.0        130.90         65.63       689

## Benchmarking ../qoitests/images/misc/*.png -- 1 runs
## Totals (AVG)
               decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:             12.3       103.5         72.48          8.59       283
stbi:               10.9        97.8         81.52          9.08       415
qoi-master:          2.9         3.9        310.30        230.20       400
qoi-experi:          3.2         4.5        279.60        199.24       407
qoi-demo10:          1.8         3.1        483.77        288.60       400
qoi-delta4:          2.7         4.5        330.34        196.80       404

## Benchmarking ../qoitests/images/textures/*.png -- 1 runs
## Totals (AVG)
               decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:              3.0        41.9         42.92          3.10       163
stbi:                2.8        23.5         46.77          5.54       232
qoi-master:          0.9         1.2        149.84        108.77       184
qoi-experi:          0.9         1.5        141.97         88.26       179
qoi-demo10:          0.6         1.0        207.02        129.05       180
qoi-delta4:          0.8         1.4        173.04         94.66       169

## Benchmarking ../qoitests/images/screenshots/*.png -- 1 runs
## Totals (AVG)
               decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:             57.3       616.9        143.59         13.34      2219
stbi:               51.2       748.9        160.61         10.99      2821
qoi-master:         27.8        29.9        296.21        274.93      2582
qoi-experi:         32.1        33.8        256.12        243.65      2491
qoi-demo10:         19.9        25.8        414.44        318.81      2481
qoi-delta4:         24.5        33.8        336.35        243.25      2292

## Benchmarking ../qoitests/images/wallpaper/*.png -- 1 runs
## Totals (AVG)
               decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:            192.4      2943.3         48.71          3.18      9224
stbi:              223.4      1800.2         41.95          5.21     13299
qoi-master:         81.4        98.1        115.12         95.53     10640
qoi-experi:         80.9       104.3        115.86         89.88     10170
qoi-demo10:         76.8        89.2        121.97        105.07     10669
qoi-delta4:         72.9       108.3        128.57         86.56     10346

qoi-delta4.h.txt

@oscardssmith

what is QOI_8_LINEAR_RGB_16? I don't understand the description.

@chocolate42
Author

When the diffs of R, G and B are all the same, this tag can be used, followed by a diff byte that is applied to all three channels; maybe "lockstep" would be a better name. It helps with long-range matches and may particularly help with black and white images stored as RGB (TBD). It is less effective alongside GDIFF_16 because GDIFF_16 is pretty good at longer-range matches itself. In the definitions above, the first number in each name is the length of the tag in bits and the second is the length of the whole encoding in bits.
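
A minimal sketch of that lockstep idea, for illustration only. It assumes the diff is taken modulo 256 (wrapping 8-bit arithmetic) and stored as one raw byte after the tag; emit_lockstep and apply_lockstep are hypothetical names, not functions from the attached header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QOI_8_LINEAR_RGB_16 0x3d /* tag byte, followed by one diff byte */

/* If r, g and b all changed by the same amount (mod 256) relative to the
 * previous pixel, writes tag + diff into out and returns 2; otherwise 0. */
static size_t emit_lockstep(uint8_t pr, uint8_t pg, uint8_t pb,
                            uint8_t r, uint8_t g, uint8_t b, uint8_t *out) {
    uint8_t vr = (uint8_t)(r - pr);
    uint8_t vg = (uint8_t)(g - pg);
    uint8_t vb = (uint8_t)(b - pb);
    if (vr != vg || vg != vb) {
        return 0;
    }
    out[0] = QOI_8_LINEAR_RGB_16;
    out[1] = vr;
    return 2;
}

/* Decoder side: adds the same diff to all three channels, wrapping at 256. */
static void apply_lockstep(uint8_t diff, uint8_t *r, uint8_t *g, uint8_t *b) {
    *r = (uint8_t)(*r + diff);
    *g = (uint8_t)(*g + diff);
    *b = (uint8_t)(*b + diff);
}

/* Round-trips a grey step that wraps around 255. */
static void lockstep_selfcheck(void) {
    uint8_t out[2] = {0, 0};
    uint8_t r = 250, g = 250, b = 250;
    size_t n = emit_lockstep(250, 250, 250, 5, 5, 5, out);
    assert(n == 2 && out[0] == QOI_8_LINEAR_RGB_16);
    (void)n;
    apply_lockstep(out[1], &r, &g, &b);
    assert(r == 5 && g == 5 && b == 5);
}
```

Greyscale content stored as RGB always satisfies vr == vg == vb, which is why the op might help there.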

@nigeltao
Owner

nigeltao commented Dec 5, 2021

Interesting experiment! A couple of (not necessarily good) ideas:

  • How does a r3g8b3 QOI_DIFF_16 look, instead of r4g6b4?
  • We could replace QOI_8_LINEAR_RGB_16 with some sort of "mode toggle". One example could be switching QOI_DIFF_24 between r5g5b5a5 and r7g7b6.

@MrSmile

MrSmile commented Dec 6, 2021

There's 5 unused values

I had a similar idea, but with those values (plus the center zero) used for the permutations of (±2, 0, 0). If values are being reused, it's better to reuse ones from the larger diff that coincide with the smaller one.

@chocolate42
Author

We could replace QOI_8_LINEAR_RGB_16 with some sort of "mode toggle". One example could be switching QOI_DIFF_24 between r5g5b5a5 and r7g7b6.

That is interesting: basically a conditional >8-bit tag when needed (the condition would be expensive, particularly when switching, but it's worth exploring). One massive positive would be for RGB input when one op deals with alpha and the other doesn't, i.e. for the cost of one byte you free up bits otherwise wasted on handling alpha that will never come. There would be no need to handle RGB and RGBA differently; it'll sort itself out.

How does a r3g8b3 QOI_DIFF_16 look, instead of r4g6b4?

Changing QOI_GDIFF_16 to QOI_GDIFF2_16 (r3g8b3) with no other changes does this:

op count: 265215890, total size: 417281211, 0x00=6.46%, 0x10=6.81%, 0x20=20.50%, 0x3B=3.64%, 0x3C=0.13%, 0x3E=0.60%, 0x3F=0.03%, 0x40=31.60%, 0x80=30.22%,

## Benchmarking ../qoitests/images/kodak/*.png -- 1 runs
## Totals (AVG) size: 0x0
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:             10.6       192.2         37.20          2.05       717
qoi-deltaX:          3.3         6.4        117.56         61.47       716
qoi-delta4:          2.9         6.1        134.35         64.32       689

## Benchmarking ../qoitests/images/misc/*.png -- 1 runs
## Totals (AVG) size: 0x0
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:             12.9       110.0         68.85          8.08       283
qoi-deltaX:          2.9         4.7        301.81        189.91       410
qoi-delta4:          2.7         4.9        325.49        182.99       404

## Benchmarking ../qoitests/images/textures/*.png -- 1 runs
## Totals (AVG) size: 0x0
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:              3.2        44.9         40.16          2.89       163
qoi-deltaX:          0.9         1.6        147.58         79.80       187
qoi-delta4:          0.8         1.5        169.31         86.24       169

## Benchmarking ../qoitests/images/screenshots/*.png -- 1 runs
## Totals (AVG) size: 0x0
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:             61.8       665.6        133.10         12.37      2219
qoi-deltaX:         27.1        35.8        303.92        229.62      2284
qoi-delta4:         27.8        37.3        296.24        220.66      2292

## Benchmarking ../qoitests/images/wallpaper/*.png -- 1 runs
## Totals (AVG) size: 0x0
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:            202.9      3092.6         46.19          3.03      9224
qoi-deltaX:         79.6       116.1        117.75         80.71     10719
qoi-delta4:         75.5       113.9        124.12         82.26     10346

It eliminates the need for QOI_8_LINEAR_RGB_16 and makes the encoding neater, which should make an optimised version faster than QOI_GDIFF_16, all other things being equal. However, because compression takes a hit, so do encode/decode times.

I had a similar idea, but with those values (plus center zero) used for permutations of (±2, 0, 0). If it's reusing, better to reuse values from larger diff which coincide with smaller.

If I'm understanding correctly this means that for example 5 of the 9 GB states could have an R of +2. Is there a way to detect those 5 cases without tanking encode time?

@MrSmile

MrSmile commented Dec 6, 2021

If I'm understanding correctly this means that for example 5 of the 9 GB states could have an R of +2. Is there a way to detect those 5 cases without tanking encode time?

I mean, in addition to the 26 {−1..1} states (excluding full zero), there are also 6 more: {±2,0,0}, {0,±2,0}, {0,0,±2}. So if any two components are unchanged, the third can differ by 2. It can be detected with abs(vr) + abs(vg) + abs(vb) <= 2.
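
One way to read that state set, as a sketch rather than the code that was later benchmarked: a diff qualifies if every channel is within -1..1, or if the absolute values sum to at most 2. By enumeration, the sum test on its own covers only 24 non-zero states, because the eight corner cases such as (1, 1, 1) sum to 3, so this sketch combines it with the plain range check to reach all 26 + 6 = 32 states, which fill 5 bits exactly. The function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* All three diffs within -1..1 (the original QOI_DELTA range). */
static int in_delta1(int vr, int vg, int vb) {
    return abs(vr) <= 1 && abs(vg) <= 1 && abs(vb) <= 1;
}

/* The quoted test on its own: sum of absolute diffs at most 2. */
static int in_l1_ball(int vr, int vg, int vb) {
    return abs(vr) + abs(vg) + abs(vb) <= 2;
}

/* delta1 states plus the six single-channel +-2 states. */
static int in_delta2(int vr, int vg, int vb) {
    return in_delta1(vr, vg, vb) || in_l1_ball(vr, vg, vb);
}

/* Counts non-zero diffs in the -2..2 cube accepted by pred. */
static int count_states(int (*pred)(int, int, int)) {
    int n = 0;
    for (int vr = -2; vr <= 2; vr++) {
        for (int vg = -2; vg <= 2; vg++) {
            for (int vb = -2; vb <= 2; vb++) {
                if ((vr != 0 || vg != 0 || vb != 0) && pred(vr, vg, vb)) {
                    n++;
                }
            }
        }
    }
    return n;
}

/* Confirms the state counts quoted in the lead-in. */
static void delta2_selfcheck(void) {
    assert(count_states(in_delta1) == 26);
    assert(count_states(in_l1_ball) == 24);
    assert(count_states(in_delta2) == 32);
}
```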

@chocolate42
Author

I implemented your version as QOI_3_DELTA2_8 (shown in build qoi-d2run5). The delta2 build uses an 8 value RUN_8 because space was needed for the RUN_16 etc flags that normally reside in delta1. Two delta1 builds are used for comparison, one also with an 8 value RUN_8 which delta2 should strictly beat (qoi-d2run5, 9 unused ops), and one with a 16 value RUN_8 that takes advantage of the compactness of delta1 (qoi-d1run4, 1 unused op because the code has been reworked since delta4 and it's no longer convenient to have 8 bit tags increase RUN_8's range). tl;dr d2run5 marginally beats d1run4, but in general it depends on the strength and packing of the rest of the opcodes.

//d2run5
#define QOI_1_INDEX_8       0x80 // 1
#define QOI_2_GDIFF_16      0x40 // 01
#define QOI_3_DELTA2_8      0x20 // 001
#define QOI_4_DIFF_24       0x10 // 0001
#define QOI_5_RUN_8         0x08 // 00001
#define QOI_SPECIAL         0x00 // 00000

#define QOI_8_COLOR_RGB_32  (QOI_SPECIAL + 0)
#define QOI_8_COLOR_RGBA_40 (QOI_SPECIAL + 1)
#define QOI_8_LOCKSTEP_RGB_16 (QOI_SPECIAL + 2) // When vr==vg==vb==diff, store diff
#define QOI_8_RUN_16        (QOI_SPECIAL + 3)
#define QOI_8_RUN_24        (QOI_SPECIAL + 4)
## Benchmarking ../qoitests/images/tango512/*.png -- 1 runs
## Totals (AVG) size: 0x0
               decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:              3.1        20.5         84.31         12.79        51
stbi:                2.1        20.6        124.05         12.75        69
qoi-d1run4:          0.7         1.0        385.60        275.35        76
qoi-d1run5:          0.7         1.0        395.23        275.31        78
qoi-d2run5:          0.7         1.1        382.00        238.39        77

## Benchmarking ../qoitests/images/kodak/*.png -- 1 runs
## Totals (AVG) size: 0x0
               decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:              9.6       178.4         40.89          2.20       717
stbi:               10.1       100.2         38.93          3.92       979
qoi-d1run4:          2.7         5.3        145.99         74.06       689
qoi-d1run5:          2.7         5.3        146.35         74.80       689
qoi-d2run5:          2.8         5.9        139.10         66.32       686

## Benchmarking ../qoitests/images/misc/*.png -- 1 runs
## Totals (AVG) size: 0x0
               decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:             12.0       101.6         74.30          8.74       283
stbi:               10.5        97.4         84.46          9.12       415
qoi-d1run4:          2.7         4.0        325.47        220.48       404
qoi-d1run5:          2.7         4.0        332.07        220.50       405
qoi-d2run5:          2.8         4.5        313.58        199.36       403

## Benchmarking ../qoitests/images/textures/*.png -- 1 runs
## Totals (AVG) size: 0x0
               decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:              3.0        41.8         42.92          3.10       163
stbi:                2.7        23.2         47.91          5.59       232
qoi-d1run4:          0.7         1.2        191.43        105.12       169
qoi-d1run5:          0.7         1.2        194.97        106.23       169
qoi-d2run5:          0.7         1.4        190.64         94.06       168

## Benchmarking ../qoitests/images/screenshots/*.png -- 1 runs
## Totals (AVG) size: 0x0
               decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:             57.2       613.6        144.00         13.42      2219
stbi:               50.7       748.0        162.27         11.00      2821
qoi-d1run4:         26.0        31.5        316.21        261.31      2293
qoi-d1run5:         25.6        31.6        321.62        260.61      2315
qoi-d2run5:         26.1        37.4        315.10        220.27      2307

## Benchmarking ../qoitests/images/wallpaper/*.png -- 1 runs
## Totals (AVG) size: 0x0
               decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:            189.0      2877.1         49.58          3.26      9224
stbi:              218.6      1769.0         42.88          5.30     13299
qoi-d1run4:         82.8       111.5        113.16         84.09     10348
qoi-d1run5:         71.0       100.2        132.00         93.54     10383
qoi-d2run5:         75.5       110.0        124.16         85.16     10290

qoi-delta5.h.txt

@MrSmile

MrSmile commented Dec 7, 2021

So my intuition about value distribution is right. Maybe even better would be to use original 64 values and distribute them more symmetrically.

@chocolate42
Author

Even better to take a leaf from GDIFF's green-based deltas by creating a variant that fits in 8 bits (see delta7). delta6 stores vr, vg and vb in a distribution skewed towards green (vr=-2..1, vg=-3..3, vb=-2..1); delta7 is similar except that it stores r's and b's differences from green, giving green even more weight:

delta6
#define QOI_OP_DELTA   0x00 // 0 112 values, vr=-2..1, vg=-3..3, vb=-2..1
#define QOI_OP_RUN     0x70 // 0111 16 value RUN_8
#define QOI_OP_LUMA    0x80 // 10 GDIFF_16
#define QOI_OP_INDEX   0xc0 // 11 61 value index array
#define QOI_OP_RGB     0xfd // 11111101
#define QOI_OP_RUN16   0xfe // 11111110
#define QOI_OP_RGBA    0xff // 11111111
// QOI_OP_RUN resides in QOI_OP_DELTA, care is needed on decode
// Likewise RGB/RUN16/RGBA tags are at the end of QOI_OP_INDEX
delta7
#define QOI_OP_GDELTA  0x00 // 0 112 values, vg_r=-2..1, vg=-3..3, vg_b=-2..1
#define QOI_OP_RUN     0x70 // 0111
#define QOI_OP_LUMA    0x80 // 10
#define QOI_OP_INDEX   0xc0 // 11
#define QOI_OP_RGB     0xfd // 11111101
#define QOI_OP_RUN16   0xfe // 11111110
#define QOI_OP_RGBA    0xff // 11111111
// QOI_OP_RUN resides in QOI_OP_GDELTA, care is needed on decode
// Likewise RGB/RUN16/RGBA tags are at the end of QOI_OP_INDEX
## Benchmarking ../qoi_benchmark_suite/images/textures_pk01/*.png -- 5 runs
## Total for ../qoi_benchmark_suite/images/textures_pk01
           decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:          3.1        41.6         42.55          3.12       163   32.3%
stbi:            2.9        25.2         45.21          5.16       232   45.8%
qoi-demo10:      0.7         1.0        199.59        124.46       180   35.6%
qoi-exluma:      0.8         1.1        164.40        119.22       178   35.2%
qoi-delta4:      0.8         1.4        172.62         93.96       169   33.4%
qoi-delta6:      0.7         1.1        183.65        119.75       177   35.0%
qoi-delta7:      0.8         1.2        170.02        108.20       177   34.9%

## Benchmarking ../qoi_benchmark_suite/images/screenshot_game/*.png -- 5 runs
## Total for ../qoi_benchmark_suite/images/screenshot_game
           decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:         11.6       138.8         54.37          4.56       448   18.1%
stbi:           12.4       104.9         51.23          6.03       634   25.7%
qoi-demo10:      2.7         3.9        233.55        160.55       535   21.7%
qoi-exluma:      3.3         4.1        189.72        154.68       519   21.0%
qoi-delta4:      3.2         5.2        198.14        121.20       524   21.2%
qoi-delta6:      2.8         3.9        222.36        162.22       513   20.8%
qoi-delta7:      3.3         4.5        190.44        139.75       501   20.3%

## Benchmarking ../qoi_benchmark_suite/images/textures_photo/*.png -- 5 runs
## Total for ../qoi_benchmark_suite/images/textures_photo
           decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:         25.3       461.0         41.41          2.27      1977   48.3%
stbi:           28.5       263.3         36.78          3.98      2554   62.4%
qoi-demo10:      6.8        11.4        153.24         92.37      2506   61.2%
qoi-exluma:      8.7        11.0        120.73         95.34      1981   48.4%
qoi-delta4:      7.7        15.1        135.95         69.34      1956   47.8%
qoi-delta6:      6.9         9.8        151.67        107.07      1983   48.4%
qoi-delta7:      8.2        12.2        128.45         85.74      1920   46.9%

## Benchmarking ../qoi_benchmark_suite/images/photo_wikipedia/*.png -- 5 runs
## Total for ../qoi_benchmark_suite/images/photo_wikipedia
           decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:         27.8       567.6         39.03          1.91      2046   48.3%
stbi:           33.8       325.5         32.09          3.33      2893   68.3%
qoi-demo10:      6.6        12.3        165.04         88.35      2289   54.0%
qoi-exluma:      9.0        13.3        120.59         81.78      2102   49.6%
qoi-delta4:      8.5        17.4        128.33         62.21      2135   50.4%
qoi-delta6:      7.7        11.8        140.71         91.83      2077   49.0%
qoi-delta7:      8.5        13.4        127.58         80.85      2027   47.9%

## Benchmarking ../qoi_benchmark_suite/images/textures_pk/*.png -- 5 runs
## Total for ../qoi_benchmark_suite/images/textures_pk
           decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:          0.6        16.9         76.10          2.63        89   51.5%
stbi:            0.5        11.8         97.52          3.78       121   70.0%
qoi-demo10:      0.4         0.5        126.42         94.64        78   45.1%
qoi-exluma:      0.4         0.5        113.51         84.33        75   43.5%
qoi-delta4:      0.3         0.6        129.42         74.17        70   40.4%
qoi-delta6:      0.3         0.5        147.18         95.13        75   43.1%
qoi-delta7:      0.3         0.5        133.47         86.62        74   43.0%

## Benchmarking ../qoi_benchmark_suite/images/screenshot_web/*.png -- 5 runs
## Total for ../qoi_benchmark_suite/images/screenshot_web
           decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:         63.7       677.5        127.52         11.99      2402    7.6%
stbi:           59.7       899.0        136.10          9.04      3076    9.7%
qoi-demo10:     20.1        28.0        404.23        290.16      2680    8.4%
qoi-exluma:     29.5        28.8        275.56        281.73      2649    8.3%
qoi-delta4:     26.8        34.9        302.85        232.67      2469    7.8%
qoi-delta6:     23.6        26.5        344.90        306.34      2638    8.3%
qoi-delta7:     24.5        28.3        331.63        287.51      2592    8.2%

## Benchmarking ../qoi_benchmark_suite/images/icon_64/*.png -- 5 runs
## Total for ../qoi_benchmark_suite/images/icon_64
           decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:          0.1         0.6         34.90          6.78         3   23.6%
stbi:            0.1         0.6         45.75          6.71         4   27.9%
qoi-demo10:      0.0         0.0        303.54        186.71         4   27.6%
qoi-exluma:      0.0         0.0        246.36        205.12         5   31.3%
qoi-delta4:      0.0         0.0        232.79        137.14         4   26.5%
qoi-delta6:      0.0         0.0        301.49        207.53         4   28.9%
qoi-delta7:      0.0         0.0        288.41        192.03         4   28.6%

## Benchmarking ../qoi_benchmark_suite/images/textures_pk02/*.png -- 5 runs
## Total for ../qoi_benchmark_suite/images/textures_pk02
           decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:          7.7       132.3         39.51          2.30       427   36.1%
stbi:            7.6        70.9         39.98          4.28       623   52.5%
qoi-demo10:      1.8         2.7        164.41        112.85       492   41.5%
qoi-exluma:      2.2         2.8        139.27        109.28       479   40.4%
qoi-delta4:      2.1         3.7        144.76         83.22       475   40.1%
qoi-delta6:      1.9         2.7        158.67        111.74       475   40.1%
qoi-delta7:      2.1         3.0        147.71        100.64       472   39.8%

## Benchmarking ../qoi_benchmark_suite/images/photo_kodak/*.png -- 5 runs
## Total for ../qoi_benchmark_suite/images/photo_kodak
           decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:         10.1       191.3         38.76          2.06       717   46.7%
stbi:           10.7       112.9         36.59          3.48       979   63.8%
qoi-demo10:      2.8         4.4        141.33         88.58       772   50.3%
qoi-exluma:      3.0         4.2        129.04         93.76       671   43.7%
qoi-delta4:      2.9         5.8        136.35         67.23       689   44.9%
qoi-delta6:      2.8         4.1        138.76         96.08       667   43.5%
qoi-delta7:      3.3         5.1        118.55         76.94       650   42.4%

## Benchmarking ../qoi_benchmark_suite/images/photo_tecnick/*.png -- 5 runs
## Total for ../qoi_benchmark_suite/images/photo_tecnick
           decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:         29.9       767.3         48.18          1.88      2414   42.9%
stbi:           35.6       429.3         40.44          3.35      3533   62.8%
qoi-demo10:      8.8        15.9        163.63         90.49      2737   48.7%
qoi-exluma:     11.4        16.5        126.87         87.48      2527   44.9%
qoi-delta4:     11.0        22.1        130.85         65.16      2596   46.2%
qoi-delta6:     11.2        16.8        128.59         85.62      2489   44.3%
qoi-delta7:     11.3        17.4        127.84         82.85      2423   43.1%

## Benchmarking ../qoi_benchmark_suite/images/textures_plants/*.png -- 5 runs
## Total for ../qoi_benchmark_suite/images/textures_plants
           decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:         20.6       222.7         51.53          4.78       857   20.6%
stbi:           18.5       173.5         57.36          6.13      1191   28.7%
qoi-demo10:      2.9         6.2        365.23        172.32       957   23.0%
qoi-exluma:      4.3         6.1        245.88        175.50       922   22.2%
qoi-delta4:      4.1         8.4        257.48        126.98       907   21.8%
qoi-delta6:      4.1         6.2        260.79        170.39       907   21.8%
qoi-delta7:      3.8         6.4        282.93        166.72       896   21.6%

## Benchmarking ../qoi_benchmark_suite/images/pngimg/*.png -- 5 runs
## Total for ../qoi_benchmark_suite/images/pngimg
           decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:         36.9       350.7         48.96          5.16      1201   17.0%
stbi:           35.9       289.5         50.38          6.25      1751   24.8%
qoi-demo10:      5.9        10.4        306.23        174.24      1429   20.2%
qoi-exluma:      8.1        10.4        222.53        173.15      1445   20.5%
qoi-delta4:      7.9        13.9        227.92        130.11      1385   19.6%
qoi-delta6:      6.9         9.9        262.58        182.17      1420   20.1%
qoi-delta7:      7.4        10.8        245.53        167.93      1398   19.8%

## Benchmarking ../qoi_benchmark_suite/images/icon_512/*.png -- 5 runs
## Total for ../qoi_benchmark_suite/images/icon_512
           decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:          3.4        22.5         76.39         11.65        51    5.0%
stbi:            2.4        25.5        109.96         10.28        69    6.8%
qoi-demo10:      0.5         0.8        484.79        320.30        76    7.5%
qoi-exluma:      0.7         0.8        360.62        327.88       102   10.1%
qoi-delta4:      0.7         1.0        382.26        259.73        75    7.4%
qoi-delta6:      0.6         0.8        455.26        334.75        86    8.4%
qoi-delta7:      0.6         0.8        452.36        315.43        85    8.4%

# Grand total for ../qoi_benchmark_suite/images/
           decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:          8.7       123.0         53.24          3.77       423   23.3%
stbi:            8.9        86.3         51.97          5.38       601   33.2%
qoi-demo10:      2.0         3.2        235.23        145.79       484   26.7%
qoi-exluma:      2.5         3.3        182.84        141.44       465   25.7%
qoi-delta4:      2.4         4.3        191.79        108.69       460   25.4%
qoi-delta6:      2.2         3.1        210.36        147.37       458   25.3%
qoi-delta7:      2.4         3.5        193.94        133.58       450   24.8%

delta6/7 have been mildly optimised by re-ordering the if-else chains. delta7 takes notably longer than delta6, I think purely because GDELTA needs to generate vg_r and vg_b, not just LUMA (i.e. the compiler must have automatically moved vg_r and vg_b generation into LUMA for delta6).

Averaging less than 1 byte per pixel is a neat trick. Thank you RLE ;)

qoi-delta7.h.txt

@nigeltao
Owner

For delta7's QOI_OP_GDELTA encoding:

#define QOI_OP_GDELTA  0x00 // 0 112 values, vg_r=-2..1, vg=-3..3, vg_b=-2..1
...
bytes[p++] = QOI_OP_GDELTA | (((vg_b + 2) * 28) + ((vg_r + 2) * 7) + (vg + 3));

Changing that encoding line to:

bytes[p++] = QOI_OP_GDELTA | (((vg + 3) << 4) + ((vg_r + 2) << 2) + (vg_b + 2));

would make it easier to visually inspect the bytecode (in hex or binary): the 8-bit op becomes 0gggrrbb. It also lets you replace divisions (the % and /= operators below) with possibly-faster shifts and masks in the decode:

int vg = ((b1 % 7) - 3);
b1/=7;
px.rgba.r += vg - 2 + (b1 & 0x03);
b1/=4;
px.rgba.g += vg;
px.rgba.b += vg - 2 + (b1 & 0x03);
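
As a rough illustration of the shift-and-mask decode suggested above, here is a minimal sketch for the proposed 0gggrrbb layout. The `rgba_t` type and the `gdelta_pack` / `gdelta_apply` helper names are hypothetical and are not taken from qoi-delta7.h; only the bit layout and the value ranges come from the comment above.

```c
#include <stdint.h>

/* Hypothetical pixel type, for illustration only. */
typedef struct { uint8_t r, g, b, a; } rgba_t;

/* Pack vg in -3..3 and vg_r, vg_b in -2..1 into one byte as 0gggrrbb. */
static uint8_t gdelta_pack(int vg, int vg_r, int vg_b) {
    return (uint8_t)(((vg + 3) << 4) | ((vg_r + 2) << 2) | (vg_b + 2));
}

/* Undo the packing with shifts and masks only (no % or / by 7). */
static void gdelta_apply(rgba_t *px, uint8_t b1) {
    int vg   = ((b1 >> 4) & 0x07) - 3;
    int vg_r = ((b1 >> 2) & 0x03) - 2;
    int vg_b = (b1 & 0x03) - 2;
    /* Red and blue are stored relative to the green difference. */
    px->r = (uint8_t)(px->r + vg + vg_r);
    px->g = (uint8_t)(px->g + vg);
    px->b = (uint8_t)(px->b + vg + vg_b);
}
```

Each field sits in its own bit range, so the three values can be read straight off a hex dump of the op byte.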

@chocolate42
Author

You're right: because the ranges of vg_r and vg_b are powers of two (4 values each), it could be done with shifts. I discounted it because RUN8 takes up the last 16 slots and I thought not using ANS would scatter them (making RUN8 handling 16 special cases instead of one MASK_4). I was wrong: by putting green first, RUN8 is not scattered. One small refinement is to bias vg by 4 instead of 3, as in the following line, which puts RUN8 first so that RUN8 doesn't bisect GDELTA.

bytes[p++] = QOI_OP_GDELTA | (((vg + 4) << 4) + ((vg_r + 2) << 2) + (vg_b + 2));

The new encoding should be a nice speedup; I'm going to try it today. Dividing by 4 is already a right shift (which is probably why delta7 still has decent speed), but the g-r-b ordering allows the divide by 7 to be eliminated, which may be a big win.
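
To make the effect of the `(vg + 4) << 4` bias concrete, here is a small sketch. It assumes that ops with the top bit clear are shared between GDELTA and RUN8, as described above; the helper names `gdelta_pack_biased` and `is_run8` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: vg in -3..3 is biased by 4, so the green field is 1..7. */
static uint8_t gdelta_pack_biased(int vg, int vg_r, int vg_b) {
    return (uint8_t)(((vg + 4) << 4) | ((vg_r + 2) << 2) | (vg_b + 2));
}

/* With that bias GDELTA occupies 0x10..0x7f, which leaves the 16 values
 * 0x00..0x0f as one contiguous block that a single mask can detect. */
static int is_run8(uint8_t b1) {
    return (b1 & 0xf0) == 0x00;
}
```

Because the green field is never zero for a GDELTA op, a single mask test on the upper nibble separates the two ops.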

@chocolate42
Author

The results are in. Tried two variants of delta8 out of curiosity with slightly different encode logic (same format):

8a
bytes[p++] = QOI_OP_GDELTA | (((vg + 4) << 4) + ((vg_r + 2) << 2) + (vg_b + 2));
8b
bytes[p++] = QOI_OP_GDELTA | ((vg + 4) << 4) | ((vg_r + 2) << 2) | (vg_b + 2);

Surely the compiler is smart enough to range check and see they're equivalent, especially when using -O3 as always?

## Total for ../qoi_benchmark_suite/images/textures_pk01
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi-delta7:    0.7170      1.1415        181.19        113.80       177   34.9%
qoi-delta8a:   0.7235      1.0900        179.57        119.19       177   34.9%
qoi-delta8b:   0.7259      0.9599        178.97        135.33       177   34.9%
## Total for ../qoi_benchmark_suite/images/screenshot_game
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi-delta7:    3.0539      4.1659        207.28        151.95       501   20.3%
qoi-delta8a:   3.1623      3.9449        200.18        160.47       501   20.3%
qoi-delta8b:   3.1456      3.4471        201.24        183.64       501   20.3%
## Total for ../qoi_benchmark_suite/images/textures_photo
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi-delta7:    7.7406     11.6562        135.47         89.96      1920   46.9%
qoi-delta8a:   7.7053     11.5322        136.09         90.93      1920   46.9%
qoi-delta8b:   7.6953      9.9764        136.26        105.11      1920   46.9%
## Total for ../qoi_benchmark_suite/images/photo_wikipedia
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi-delta7:    8.5098     13.3810        127.44         81.05      2027   47.9%
qoi-delta8a:   8.3517     12.9374        129.85         83.83      2027   47.9%
qoi-delta8b:   8.3494     11.3034        129.89         95.94      2027   47.9%
## Total for ../qoi_benchmark_suite/images/textures_pk
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi-delta7:    0.3314      0.5125        134.31         86.85        74   43.0%
qoi-delta8a:   0.3239      0.4975        137.43         89.47        74   43.0%
qoi-delta8b:   0.3234      0.4410        137.62        100.92        74   43.0%
## Total for ../qoi_benchmark_suite/images/screenshot_web
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi-delta7:   23.4665     27.9811        346.26        290.39      2592    8.2%
qoi-delta8a:  27.0183     25.4473        300.74        319.31      2592    8.2%
qoi-delta8b:  27.2402     22.0119        298.29        369.14      2592    8.2%
## Total for ../qoi_benchmark_suite/images/icon_64
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi-delta7:    0.0129      0.0186        317.27        220.24         4   28.6%
qoi-delta8a:   0.0149      0.0183        274.53        224.12         4   28.6%
qoi-delta8b:   0.0151      0.0146        271.83        281.01         4   28.6%
## Total for ../qoi_benchmark_suite/images/textures_pk02
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi-delta7:    2.0237      2.9700        150.11        102.28       472   39.8%
qoi-delta8a:   2.0042      2.8525        151.57        106.49       472   39.8%
qoi-delta8b:   1.9885      2.4890        152.76        122.04       472   39.8%
## Total for ../qoi_benchmark_suite/images/photo_kodak
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi-delta7:    3.1438      4.8936        125.07         80.35       650   42.4%
qoi-delta8a:   3.1345      4.7418        125.45         82.93       650   42.4%
qoi-delta8b:   3.1235      4.1329        125.89         95.14       650   42.4%
## Total for ../qoi_benchmark_suite/images/photo_tecnick
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi-delta7:   11.4094     17.6423        126.21         81.62      2423   43.1%
qoi-delta8a:  10.9503     16.6066        131.50         86.71      2423   43.1%
qoi-delta8b:  10.8998     14.4799        132.11         99.45      2423   43.1%
## Total for ../qoi_benchmark_suite/images/textures_plants
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi-delta7:    3.7704      6.3602        282.17        167.27       896   21.6%
qoi-delta8a:   4.1386      6.0034        257.06        177.21       896   21.6%
qoi-delta8b:   4.1457      5.1593        256.62        206.21       896   21.6%
## Total for ../qoi_benchmark_suite/images/pngimg
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi-delta7:    7.1984     10.4799        251.29        172.60      1398   19.8%
qoi-delta8a:   7.8123      9.9036        231.54        182.65      1398   19.8%
qoi-delta8b:   7.8087      8.6207        231.65        209.83      1398   19.8%
## Total for ../qoi_benchmark_suite/images/icon_512
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi-delta7:    0.5676      0.8144        461.81        321.87        85    8.4%
qoi-delta8a:   0.6880      0.7320        381.00        358.14        85    8.4%
qoi-delta8b:   0.6932      0.6374        378.15        411.29        85    8.4%
# Grand total for ../qoi_benchmark_suite/images
            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi-delta7:    2.3134      3.3694        200.64        137.76       450   24.8%
qoi-delta8a:   2.3885      3.1942        194.34        145.32       450   24.8%
qoi-delta8b:   2.3829      2.7861        194.79        166.60       450   24.8%

Apparently not, and not having to sum the discrete terms unnecessarily helps the encoder big time (8b encodes noticeably faster than 8a, while decode times are essentially unchanged). Thanks for pointing out the optimisation, it's unlikely I'd ever have noticed.
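
For what it's worth, the 8a and 8b expressions do produce identical bytes over the valid input ranges, because the three fields occupy disjoint bits. The sketch below checks this exhaustively; the function names are made up for illustration, and it says nothing about why gcc generates different code for the two forms.

```c
#include <stdint.h>

/* 8a: fields combined with addition. */
static uint8_t pack_add(int vg, int vg_r, int vg_b) {
    return (uint8_t)(((vg + 4) << 4) + ((vg_r + 2) << 2) + (vg_b + 2));
}

/* 8b: fields combined with bitwise OR. */
static uint8_t pack_or(int vg, int vg_r, int vg_b) {
    return (uint8_t)(((vg + 4) << 4) | ((vg_r + 2) << 2) | (vg_b + 2));
}

/* Count inputs in vg=-3..3, vg_r=-2..1, vg_b=-2..1 where the two differ. */
static int count_pack_mismatches(void) {
    int mismatches = 0;
    for (int vg = -3; vg <= 3; vg++)
        for (int vg_r = -2; vg_r <= 1; vg_r++)
            for (int vg_b = -2; vg_b <= 1; vg_b++)
                if (pack_add(vg, vg_r, vg_b) != pack_or(vg, vg_r, vg_b))
                    mismatches++;
    return mismatches;
}
```

The equivalence only holds while each value stays inside its field; without range information the compiler cannot assume that, which may be why the two forms are not compiled identically.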

For reference all my testing is done on a skylake laptop, Linux 64 bit, gcc 9.3.0.

qoi-delta8.h.txt

@chocolate42
Author

This is a fresh attempt at a variant that focuses on compression, but not to the complete detriment of performance. The goal is to apply LUMA's modelling to a range of output sizes; LUMA is a good encoding, so let's see how far it can go. LUMA has been renamed LUMA464 to reflect the bit distribution between vg_r, vg, and vg_b, and the other LUMA ops are named the same way. Every delta9 build is an iteration of the previous build; the baseline has enough free opcodes to make iteration easy.

Builds

	ex2103: Latest experimental branch
	9a (234 ops): Baseline. luma232, luma464, run(8), index(31), rgb, rgba, run16
	9b (236 ops): Add luma575
	9c (242 ops): Replace luma575 with luma676
	9d (243 ops): Switch hash function back to experimental branch (3 5 7 11)
	9e (251 ops): Add luma4645
	9f (252 ops): Add luma6666
	9g (256 ops): Extend run8 to 12 to use remaining opcodes
	9h (256 ops): Replace luma6666 with op_a

Results

            decode ms   encode ms   decode mpps   encode mpps   size kb    rate
## Total for ../qoi_benchmark_suite/images/textures_pk01
qoi-ex2103:     0.769       1.141        168.86        113.84       178   35.2%
qoi-delta9a:    0.865       1.193        150.13        108.94       185   36.5%
qoi-delta9b:    0.921       1.339        141.07         96.99       175   34.6%
qoi-delta9c:    0.949       1.328        136.96         97.79       173   34.1%
qoi-delta9d:    0.806       1.172        161.09        110.85       172   34.0%
qoi-delta9e:    0.883       1.384        147.10         93.88       172   34.0%
qoi-delta9f:    0.832       1.232        156.08        105.43       172   34.0%
qoi-delta9g:    0.808       1.224        160.75        106.11       172   33.9%
qoi-delta9h:    0.856       1.263        151.75        102.89       172   33.9%
## Total for ../qoi_benchmark_suite/images/screenshot_game
qoi-ex2103:     3.061       4.077        206.81        155.25       519   21.0%
qoi-delta9a:    3.355       3.999        188.67        158.29       510   20.7%
qoi-delta9b:    3.486       4.319        181.61        146.56       494   20.0%
qoi-delta9c:    3.579       4.436        176.88        142.70       486   19.7%
qoi-delta9d:    3.134       3.974        201.97        159.29       488   19.8%
qoi-delta9e:    3.452       4.840        183.39        130.78       488   19.7%
qoi-delta9f:    3.242       4.256        195.23        148.75       488   19.7%
qoi-delta9g:    3.193       4.215        198.27        150.19       486   19.7%
qoi-delta9h:    3.407       4.135        185.81        153.07       486   19.7%
## Total for ../qoi_benchmark_suite/images/textures_photo
qoi-ex2103:     7.090       9.698        147.89        108.12      1981   48.4%
qoi-delta9a:    8.407      11.815        124.72         88.75      1916   46.8%
qoi-delta9b:    8.488      13.381        123.54         78.36      1841   45.0%
qoi-delta9c:    9.161      13.246        114.46         79.16      1838   44.9%
qoi-delta9d:    7.477      11.374        140.24         92.19      1837   44.9%
qoi-delta9e:    8.265      13.208        126.87         79.39      1837   44.9%
qoi-delta9f:    7.662      12.007        136.86         87.33      1837   44.9%
qoi-delta9g:    7.476      11.864        140.25         88.38      1837   44.9%
qoi-delta9h:    7.793      12.287        134.56         85.34      1837   44.9%
## Total for ../qoi_benchmark_suite/images/photo_wikipedia
qoi-ex2103:     8.096      11.952        133.96         90.74      2102   49.6%
qoi-delta9a:    9.075      13.202        119.50         82.15      2026   47.8%
qoi-delta9b:    9.491      14.878        114.26         72.90      1919   45.3%
qoi-delta9c:    9.987      14.582        108.59         74.37      1890   44.6%
qoi-delta9d:    8.343      12.841        129.98         84.46      1889   44.6%
qoi-delta9e:    9.159      15.208        118.41         71.31      1889   44.6%
qoi-delta9f:    8.703      13.416        124.61         80.84      1889   44.6%
qoi-delta9g:    8.368      13.373        129.60         81.10      1889   44.6%
qoi-delta9h:    8.620      14.034        125.81         77.27      1889   44.6%
## Total for ../qoi_benchmark_suite/images/textures_pk
qoi-ex2103:     0.333       0.480        133.70         92.75        75   43.5%
qoi-delta9a:    0.374       0.533        119.09         83.50        82   47.5%
qoi-delta9b:    0.404       0.587        110.08         75.82        76   43.7%
qoi-delta9c:    0.411       0.581        108.17         76.59        74   42.7%
qoi-delta9d:    0.349       0.511        127.39         87.14        74   42.9%
qoi-delta9e:    0.389       0.594        114.39         74.95        74   42.9%
qoi-delta9f:    0.363       0.538        122.54         82.80        74   42.9%
qoi-delta9g:    0.355       0.536        125.22         83.05        74   42.9%
qoi-delta9h:    0.371       0.560        120.09         79.52        74   42.9%
## Total for ../qoi_benchmark_suite/images/screenshot_web
qoi-ex2103:    26.172      28.693        310.47        283.19      2649    8.3%
qoi-delta9a:   27.634      24.295        294.05        334.46      2692    8.5%
qoi-delta9b:   28.357      26.221        286.55        309.89      2572    8.1%
qoi-delta9c:   29.006      28.268        280.13        287.45      2546    8.0%
qoi-delta9d:   26.956      27.210        301.44        298.62      2566    8.1%
qoi-delta9e:   28.484      36.125        285.27        224.93      2548    8.0%
qoi-delta9f:   27.662      30.462        293.75        266.75      2548    8.0%
qoi-delta9g:   27.194      30.206        298.80        269.01      2535    8.0%
qoi-delta9h:   29.665      25.328        273.91        320.82      2535    8.0%
## Total for ../qoi_benchmark_suite/images/icon_64
qoi-ex2103:     0.016       0.023        259.29        179.82         4   28.7%
qoi-delta9a:    0.019       0.022        219.86        183.37         4   30.7%
qoi-delta9b:    0.020       0.023        208.51        177.67         4   30.1%
qoi-delta9c:    0.020       0.024        208.60        167.47         4   29.8%
qoi-delta9d:    0.017       0.021        239.00        192.68         4   29.7%
qoi-delta9e:    0.019       0.029        212.19        140.61         4   27.9%
qoi-delta9f:    0.019       0.026        215.53        156.66         4   27.7%
qoi-delta9g:    0.018       0.026        227.71        156.28         4   27.6%
qoi-delta9h:    0.020       0.024        204.79        167.98         4   27.1%
## Total for ../qoi_benchmark_suite/images/textures_pk02
qoi-ex2103:     1.978       2.762        153.59        109.99       479   40.4%
qoi-delta9a:    2.403       3.206        126.41         94.76       491   41.4%
qoi-delta9b:    2.508       3.481        121.14         87.27       474   40.0%
qoi-delta9c:    2.605       3.501        116.61         86.77       467   39.4%
qoi-delta9d:    2.199       3.033        138.14        100.15       465   39.2%
qoi-delta9e:    2.468       3.513        123.10         86.46       463   39.1%
qoi-delta9f:    2.351       3.183        129.21         95.45       462   39.0%
qoi-delta9g:    2.263       3.167        134.24         95.91       462   39.0%
qoi-delta9h:    2.354       3.268        129.05         92.96       463   39.1%
## Total for ../qoi_benchmark_suite/images/photo_kodak
qoi-ex2103:     2.806       4.126        140.15         95.31       671   43.7%
qoi-delta9a:    3.307       4.789        118.89         82.11       649   42.3%
qoi-delta9b:    3.403       5.325        115.54         73.84       628   40.9%
qoi-delta9c:    3.560       5.346        110.46         73.55       624   40.7%
qoi-delta9d:    2.966       4.652        132.56         84.52       623   40.6%
qoi-delta9e:    3.299       5.344        119.21         73.57       623   40.6%
qoi-delta9f:    3.072       4.747        127.98         82.83       623   40.6%
qoi-delta9g:    3.051       4.777        128.89         82.32       623   40.6%
qoi-delta9h:    3.136       4.952        125.40         79.40       623   40.6%
## Total for ../qoi_benchmark_suite/images/photo_tecnick
qoi-ex2103:    10.515      15.669        136.95         91.90      2527   44.9%
qoi-delta9a:   11.769      16.721        122.36         86.12      2417   43.0%
qoi-delta9b:   12.266      18.797        117.40         76.61      2312   41.1%
qoi-delta9c:   12.817      18.440        112.35         78.09      2287   40.7%
qoi-delta9d:   10.714      16.232        134.40         88.71      2286   40.7%
qoi-delta9e:   11.776      19.179        122.28         75.08      2286   40.7%
qoi-delta9f:   11.103      16.851        129.70         85.45      2286   40.7%
qoi-delta9g:   10.840      16.856        132.84         85.43      2286   40.7%
qoi-delta9h:   11.196      17.574        128.62         81.94      2286   40.7%
## Total for ../qoi_benchmark_suite/images/textures_plants
qoi-ex2103:     3.928       5.876        270.85        181.06       922   22.2%
qoi-delta9a:    4.390       5.863        242.34        181.47       901   21.7%
qoi-delta9b:    4.492       6.484        236.84        164.07       869   20.9%
qoi-delta9c:    4.697       6.596        226.51        161.29       863   20.8%
qoi-delta9d:    4.046       6.103        262.97        174.31       863   20.8%
qoi-delta9e:    4.423       7.652        240.55        139.04       856   20.6%
qoi-delta9f:    4.214       6.631        252.49        160.45       851   20.5%
qoi-delta9g:    4.131       6.640        257.51        160.24       851   20.5%
qoi-delta9h:    4.378       6.258        242.99        170.01       855   20.6%
## Total for ../qoi_benchmark_suite/images/pngimg
qoi-ex2103:     7.476      10.296        241.95        175.69      1436   20.3%
qoi-delta9a:    8.108       9.773        223.09        185.08      1408   19.9%
qoi-delta9b:    8.545      10.776        211.68        167.86      1343   19.0%
qoi-delta9c:    8.718      10.911        207.48        165.79      1315   18.6%
qoi-delta9d:    7.730      10.086        234.01        179.34      1315   18.6%
qoi-delta9e:    8.341      12.581        216.87        143.78      1292   18.3%
qoi-delta9f:    7.906      10.974        228.79        164.84      1288   18.2%
qoi-delta9g:    7.870      10.934        229.83        165.43      1286   18.2%
qoi-delta9h:    8.320      10.327        217.42        175.16      1282   18.1%
## Total for ../qoi_benchmark_suite/images/icon_512
qoi-ex2103:     0.656       0.856        399.31        306.32        85    8.4%
qoi-delta9a:    0.704       0.712        372.12        368.11        96    9.5%
qoi-delta9b:    0.722       0.729        362.94        359.66        95    9.3%
qoi-delta9c:    0.728       0.789        359.92        332.07        95    9.3%
qoi-delta9d:    0.665       0.765        394.01        342.80        94    9.2%
qoi-delta9e:    0.724       1.058        361.84        247.71        85    8.4%
qoi-delta9f:    0.677       0.893        387.48        293.64        85    8.3%
qoi-delta9g:    0.672       0.881        390.02        297.43        83    8.2%
qoi-delta9h:    0.755       0.730        347.20        358.89        80    7.8%
# Grand total for ../qoi_benchmark_suite/images
qoi-ex2103:     2.310       3.192        200.98        145.42       463   25.6%
qoi-delta9a:    2.563       3.244        181.09        143.08       458   25.3%
qoi-delta9b:    2.675       3.568        173.53        130.09       439   24.2%
qoi-delta9c:    2.760       3.600        168.18        128.93       432   23.9%
qoi-delta9d:    2.390       3.232        194.21        143.61       433   23.9%
qoi-delta9e:    2.618       3.923        177.31        118.32       430   23.7%
qoi-delta9f:    2.474       3.450        187.66        134.56       429   23.7%
qoi-delta9g:    2.429       3.433        191.13        135.22       429   23.7%
qoi-delta9h:    2.563       3.391        181.12        136.87       428   23.6%

Notes

* Tests were done one after another as a block, skylake laptop, gcc 9.3.0, -O3
* 9a/b/c use px.v%31 as the hash function. (3,5,7,11) would've been better for performance
* 9h opcode stats:
	* LUMA464=35.76%
	* LUMA232=23.30%
	* INDEX=18.03%
	* LUMA676=10.36%
	* RUN8(12)=7.88%
		* RGB=2.59%
		* RUN16=0.84%
		* RGBA=0.70%
		* A=0.25%
	* LUMA4645=0.30%
* Minor optimisation attempt made with 9h (if-else ordering according to the above stats), not that it did anything because gcc turned it into an indirect jump table anyway. At the least, the decode function should be refactored (as a hint to the compiler), which might be a sizeable performance bump without going into the heavier optimisation methods.
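
The two hash choices mentioned in the notes can be sketched roughly as follows. The union mirrors the `px.v` / `px.rgba` naming used in this thread; reducing the (3, 5, 7, 11) hash with a modulo by the index size is an assumption made for illustration, and the function names are hypothetical.

```c
#include <stdint.h>

/* Pixel union in the style of the px.v / px.rgba usage in this thread. */
typedef union {
    struct { uint8_t r, g, b, a; } rgba;
    uint32_t v;
} qoi_rgba_t;

/* Multiplier hash with the (3, 5, 7, 11) weights. */
#define QOI_COLOR_HASH(C) (C.rgba.r * 3 + C.rgba.g * 5 + C.rgba.b * 7 + C.rgba.a * 11)

/* 9a/b/c style: the whole 32-bit pixel value modulo 31. */
static unsigned hash_mod31(qoi_rgba_t px) {
    return px.v % 31u;
}

/* 9d onwards: weighted channel sum, reduced to the index size (assumed modulo). */
static unsigned hash_3_5_7_11(qoi_rgba_t px, unsigned index_size) {
    return (unsigned)QOI_COLOR_HASH(px) % index_size;
}
```

Note that `px.v % 31` depends on how the four channels are laid out in memory, so its result for a given colour differs between little-endian and big-endian machines, whereas the channel-weighted hash does not.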

delta9h config

#define QOI_COLOR_HASH(C) (C.rgba.r * 3 + C.rgba.g * 5 + C.rgba.b * 7 + C.rgba.a * 11)
#define INDEX_SIZE 32
#define RUN_SIZE 12

#define QOI_OP_LUMA232  0x00 // 0
#define QOI_OP_LUMA464  0x80 // 10 The original LUMA
#define QOI_OP_INDEX    0xc0 // 110
#define QOI_OP_LUMA676  0xe0 // 11100
#define QOI_OP_LUMA4645 0xe8 // 11101
#define QOI_OP_RUN      0xf0 // 1111 QOI_OP_SPECIAL resides in QOI_OP_RUN, careful now
#define QOI_OP_SPECIAL  0xfc // Block of special ops
#define QOI_OP_RUN16    0xfc // 11111100
#define QOI_OP_A        0xfd // 11111101
#define QOI_OP_RGB      0xfe // 11111110
#define QOI_OP_RGBA     0xff // 11111111
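
As a reading aid for the tag layout above, here is a minimal sketch that classifies the first byte of an op by its prefix. The `op_kind` enum and `classify_op` function are hypothetical and are not part of qoi-delta9h.h; the masks are derived from the bit prefixes in the comments above, with RUN_SIZE 12 leaving 0xf0..0xfb for runs.

```c
#include <stdint.h>

#define QOI_OP_LUMA232  0x00 /* 0        */
#define QOI_OP_LUMA464  0x80 /* 10       */
#define QOI_OP_INDEX    0xc0 /* 110      */
#define QOI_OP_LUMA676  0xe0 /* 11100    */
#define QOI_OP_LUMA4645 0xe8 /* 11101    */
#define QOI_OP_SPECIAL  0xfc /* start of the special 8-bit ops */
#define QOI_OP_RUN16    0xfc
#define QOI_OP_A        0xfd
#define QOI_OP_RGB      0xfe

/* Hypothetical enum, for illustration only. */
typedef enum {
    OP_LUMA232, OP_LUMA464, OP_INDEX, OP_LUMA676, OP_LUMA4645,
    OP_RUN, OP_RUN16, OP_A, OP_RGB, OP_RGBA
} op_kind;

static op_kind classify_op(uint8_t b1) {
    if ((b1 & 0x80) == QOI_OP_LUMA232)  return OP_LUMA232;  /* 0x00..0x7f */
    if ((b1 & 0xc0) == QOI_OP_LUMA464)  return OP_LUMA464;  /* 0x80..0xbf */
    if ((b1 & 0xe0) == QOI_OP_INDEX)    return OP_INDEX;    /* 0xc0..0xdf */
    if ((b1 & 0xf8) == QOI_OP_LUMA676)  return OP_LUMA676;  /* 0xe0..0xe7 */
    if ((b1 & 0xf8) == QOI_OP_LUMA4645) return OP_LUMA4645; /* 0xe8..0xef */
    if (b1 < QOI_OP_SPECIAL)            return OP_RUN;      /* 0xf0..0xfb */
    if (b1 == QOI_OP_RUN16)             return OP_RUN16;
    if (b1 == QOI_OP_A)                 return OP_A;
    if (b1 == QOI_OP_RGB)               return OP_RGB;
    return OP_RGBA;                                         /* 0xff */
}
```

The ranges add up to 128 + 64 + 32 + 8 + 8 + 12 + 4 = 256 first-byte values, matching the "256 ops" figure given for 9g and 9h.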

qoi-delta9h.h.txt

@nigeltao
Owner

Here's demo9h, a little-endian (i.e. faster) version of delta9h. It's not as fast as demo10, but it's in a plausible spot on the simplicity / speed / compression trade-off.

master is commit 2ee2169 (2021-12-11).

        decode ms   encode ms   decode mpps   encode mpps   size kb    rate

## Total for images/icon_512
libpng:       5.0        33.9         52.74          7.74        51    5.0%
stbi:         3.9        32.8         66.47          8.00        69    6.8%
qoi-master:   1.5         1.3        171.27        198.88        85    8.4%
qoi-demo10:   0.8         1.2        309.23        218.27        76    7.5%
qoi-l61r64:   1.0         1.4        254.71        190.54        86    8.4%
qoi-delta8:   1.0         1.5        265.97        179.32        85    8.4%
qoi-delta9h:  1.6         1.4        164.56        190.03        80    7.8%
qoi-demo9h:   1.1         1.6        244.39        166.93        80    7.8%

## Total for images/icon_64
libpng:       0.2         0.9         23.26          4.35         3   23.6%
stbi:         0.1         0.9         28.05          4.45         4   27.9%
qoi-master:   0.0         0.0        104.20        100.77         4   28.7%
qoi-demo10:   0.0         0.0        148.43        110.04         4   27.6%
qoi-l61r64:   0.0         0.0        141.35        106.50         4   28.8%
qoi-delta8:   0.0         0.0        174.48        112.61         4   28.6%
qoi-delta9h:  0.0         0.0        106.58         96.26         4   27.1%
qoi-demo9h:   0.0         0.0        119.62        106.69         4   27.1%

## Total for images/photo_kodak
libpng:      15.2       295.7         25.88          1.33       717   46.7%
stbi:        16.9       157.8         23.33          2.49       979   63.8%
qoi-master:   6.1         6.7         64.95         58.81       671   43.7%
qoi-demo10:   4.1         7.1         95.72         55.76       772   50.3%
qoi-l61r64:   4.5         6.6         87.26         59.26       675   44.0%
qoi-delta8:   4.8         8.1         81.23         48.70       650   42.4%
qoi-delta9h:  6.4         8.3         61.04         47.30       623   40.6%
qoi-demo9h:   5.9         7.9         67.08         49.74       623   40.6%

## Total for images/photo_tecnick
libpng:      44.9      1143.8         32.08          1.26      2414   42.9%
stbi:        57.9       608.3         24.88          2.37      3533   62.8%
qoi-master:  22.6        26.3         63.73         54.84      2527   44.9%
qoi-demo10:  13.0        25.1        111.04         57.44      2737   48.7%
qoi-l61r64:  17.2        25.3         83.93         57.02      2529   45.0%
qoi-delta8:  15.9        28.2         90.37         51.02      2423   43.1%
qoi-delta9h: 22.1        29.5         65.29         48.74      2286   40.7%
qoi-demo9h:  18.5        28.5         78.02         50.58      2286   40.7%

## Total for images/photo_wikipedia
libpng:      42.9       865.8         25.28          1.25      2046   48.3%
stbi:        54.6       456.6         19.87          2.38      2893   68.3%
qoi-master:  17.0        20.2         63.66         53.81      2102   49.6%
qoi-demo10:   9.5        19.2        113.99         56.49      2289   54.0%
qoi-l61r64:  12.8        19.3         84.51         56.30      2103   49.7%
qoi-delta8:  12.2        22.0         88.71         49.29      2027   47.9%
qoi-delta9h: 17.1        23.3         63.60         46.51      1889   44.6%
qoi-demo9h:  13.1        21.8         82.49         49.83      1889   44.6%

## Total for images/pngimg
libpng:      56.4       533.3         32.08          3.39      1201   17.0%
stbi:        59.6       399.9         30.35          4.52      1751   24.8%
qoi-master:  16.4        16.7        110.41        108.00      1436   20.3%
qoi-demo10:   9.0        15.7        201.83        114.98      1429   20.2%
qoi-l61r64:  11.7        16.4        154.61        110.63      1437   20.3%
qoi-delta8:  11.4        17.5        158.92        103.20      1398   19.8%
qoi-delta9h: 16.9        18.1        106.82         99.98      1282   18.1%
qoi-demo9h:  11.8        17.9        153.10        101.20      1282   18.1%

## Total for images/screenshot_game
libpng:      18.6       216.7         34.08          2.92       448   18.1%
stbi:        20.9       150.9         30.24          4.19       634   25.7%
qoi-master:   6.7         6.5         93.95         97.27       519   21.0%
qoi-demo10:   4.1         6.1        156.06        104.53       535   21.7%
qoi-l61r64:   4.9         6.5        130.18         97.85       517   20.9%
qoi-delta8:   4.6         6.9        139.06         91.82       501   20.3%
qoi-delta9h:  6.7         7.1         94.31         88.67       486   19.7%
qoi-demo9h:   5.4         7.0        117.51         90.74       486   19.7%

## Total for images/screenshot_web
libpng:      92.0      1060.7         88.33          7.66      2402    7.6%
stbi:        78.8      1210.3        103.12          6.71      3076    9.7%
qoi-master:  55.7        44.8        145.98        181.17      2649    8.3%
qoi-demo10:  28.3        40.4        287.55        201.06      2680    8.4%
qoi-l61r64:  37.7        44.7        215.30        181.93      2649    8.3%
qoi-delta8:  35.8        46.5        227.25        174.76      2592    8.2%
qoi-delta9h: 56.8        48.1        143.08        169.07      2535    8.0%
qoi-demo9h:  35.1        51.7        231.47        157.08      2535    8.0%

## Total for images/textures_photo
libpng:      39.6       725.0         26.45          1.45      1977   48.3%
stbi:        46.6       370.7         22.49          2.83      2554   62.4%
qoi-master:  15.0        16.0         69.87         65.44      1981   48.4%
qoi-demo10:  10.1        18.3        104.22         57.40      2506   61.2%
qoi-l61r64:  11.2        15.1         93.39         69.39      1990   48.6%
qoi-delta8:  11.5        19.9         91.30         52.61      1920   46.9%
qoi-delta9h: 15.8        20.6         66.35         50.97      1837   44.9%
qoi-demo9h:  13.4        20.2         78.33         51.85      1837   44.9%

## Total for images/textures_pk
libpng:       0.9        24.6         47.24          1.81        89   51.5%
stbi:         0.8        17.0         55.03          2.62       121   70.0%
qoi-master:   0.8         0.8         59.21         57.16        75   43.5%
qoi-demo10:   0.6         0.7         77.52         60.26        78   45.1%
qoi-l61r64:   0.6         0.8         79.75         58.52        75   43.3%
qoi-delta8:   0.5         0.9         83.94         48.29        74   43.0%
qoi-delta9h:  0.8         1.0         56.99         43.95        74   42.9%
qoi-demo9h:   0.6         0.9         70.01         50.64        74   42.9%

## Total for images/textures_pk01
libpng:       5.3        66.5         24.65          1.95       163   32.3%
stbi:         5.1        37.1         25.61          3.50       232   45.8%
qoi-master:   1.6         1.7         80.68         75.78       178   35.2%
qoi-demo10:   1.0         1.6        136.54         81.54       180   35.6%
qoi-l61r64:   1.1         1.6        116.92         78.89       179   35.3%
qoi-delta8:   1.1         1.9        121.66         69.29       177   34.9%
qoi-delta9h:  1.7         2.1         77.60         62.51       172   33.9%
qoi-demo9h:   1.5         2.1         86.91         60.70       172   33.9%

## Total for images/textures_pk02
libpng:      11.7       205.0         25.97          1.48       427   36.1%
stbi:        12.1       100.1         25.07          3.03       623   52.5%
qoi-master:   4.5         4.5         67.71         67.38       479   40.4%
qoi-demo10:   2.8         4.3        108.01         70.79       492   41.5%
qoi-l61r64:   3.1         4.3         96.66         70.61       481   40.5%
qoi-delta8:   3.1         5.1         98.53         59.51       472   39.8%
qoi-delta9h:  4.7         5.7         64.54         53.58       463   39.1%
qoi-demo9h:   4.0         5.1         75.62         59.42       463   39.1%

## Total for images/textures_plants
libpng:      31.9       340.6         33.35          3.12       857   20.6%
stbi:        30.7       241.5         34.66          4.41      1191   28.7%
qoi-master:   8.6         9.9        123.62        107.79       922   22.2%
qoi-demo10:   4.3         9.5        247.12        111.57       957   23.0%
qoi-l61r64:   6.0         9.4        177.48        113.29       922   22.2%
qoi-delta8:   5.8        10.6        182.51        100.52       896   21.6%
qoi-delta9h:  9.0        11.1        118.50         95.51       855   20.6%
qoi-demo9h:   6.2        11.3        172.86         94.36       855   20.6%

# Grand total for images
libpng:      13.5       187.9         34.47          2.47       423   23.3%
stbi:        14.7       121.4         31.53          3.82       601   33.2%
qoi-master:   5.1         5.2         91.89         89.38       463   25.6%
qoi-demo10:   3.0         4.9        156.82         94.45       484   26.7%
qoi-l61r64:   3.7         5.1        127.03         91.58       463   25.6%
qoi-delta8:   3.5         5.6        133.24         82.77       450   24.8%
qoi-delta9h:  5.1         5.9         90.63         79.02       428   23.6%
qoi-demo9h:   4.0         5.7        117.16         81.46       428   23.6%

qoi-demo9h.h.txt

@chocolate42
Author

Thank you for working your black magic; taming that decompression performance in particular is awesome.

@dumblob

dumblob commented Jan 4, 2022

The results in the last table actually look very promising. Maybe QOI 2.0 😉?

@chocolate42
Author

Things have progressed a little: I've found a single combination that averages 401 kb on images, and that can probably be optimised to below 400 kb by exhaustively trying different LUMA ops. Effort level 0 of the QOIP thread shows the best single combination I can find to date.
