Some easy optimizations are available #46

Closed
steven-joruk opened this issue Nov 28, 2021 · 17 comments
@steven-joruk

steven-joruk commented Nov 28, 2021

Hey, thanks for the work you're putting into this.

I've written a Rust implementation of your image format (https://github.com/steven-joruk/qoi) and added some optimizations you might want to take as well.

The biggest gain comes from factoring out the writing of the QOI_RUN command, which lets you get rid of a bunch of redundant comparisons and a couple of branches: steven-joruk/qoi@3f3ee0a
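A minimal sketch of the idea, not the code from the linked commit: the run is counted in its own tight loop, so the per-pixel path never has to re-check run state, and the previous pixel is only stored when it actually changes. The `Op` enum, `MAX_RUN`, the starting pixel and `encode_ops` are hypothetical names and values for illustration; the token stream is not the real QOI byte layout.

```rust
/// Illustrative token stream; this is NOT the real QOI byte encoding.
#[derive(Debug, PartialEq)]
enum Op {
    /// `n` repeats of the previous pixel.
    Run(usize),
    /// A pixel that differs from the previous one.
    Pixel([u8; 4]),
}

/// Assumed cap on a single run; the real limit depends on the format revision.
const MAX_RUN: usize = 0x2020;

fn encode_ops(pixels: &[[u8; 4]]) -> Vec<Op> {
    let mut ops = Vec::new();
    let mut prev = [0, 0, 0, 255]; // assumed starting pixel
    let mut i = 0;
    while i < pixels.len() {
        // Count the whole run in one tight loop instead of re-checking
        // run state on every iteration of the main loop.
        let mut run = 0;
        while i + run < pixels.len() && run < MAX_RUN && pixels[i + run] == prev {
            run += 1;
        }
        if run > 0 {
            ops.push(Op::Run(run));
            i += run;
            continue;
        }
        // Only reached when the pixel changed, so `prev` is updated only here.
        let px = pixels[i];
        ops.push(Op::Pixel(px));
        prev = px;
        i += 1;
    }
    ops
}
```

Whether this layout is actually faster than the reference loop depends on the compiler and the image, so it needs measuring.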

You can reduce some more branches when writing QOI_COLOUR: https://github.com/steven-joruk/qoi/blob/3f3ee0ae7ecbb62a4b293f932d28580099989159/src/encode.rs#L158
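One way to cut branches here, sketched under assumptions rather than copied from the linked file: build the channel mask from comparisons cast to integers, then store every channel unconditionally and advance the write cursor only for channels that changed. The tag value `0xF0` and the mask bit order (r=8, g=4, b=2, a=1) are assumptions about the format revision under discussion, and `write_color` is a hypothetical helper.

```rust
/// Assumed tag for the "full colour" chunk: high nibble set, low nibble = channel mask.
const QOI_COLOR: u8 = 0xF0;

/// Appends a QOI_COLOR-style chunk for `px` relative to `prev`.
/// Returns the number of bytes written (1 tag byte + 1 per changed channel).
fn write_color(out: &mut Vec<u8>, prev: [u8; 4], px: [u8; 4]) -> usize {
    let changed = [
        px[0] != prev[0],
        px[1] != prev[1],
        px[2] != prev[2],
        px[3] != prev[3],
    ];
    // Build the mask from bool-to-integer casts instead of four `if`s.
    let mask = (changed[0] as u8) << 3
        | (changed[1] as u8) << 2
        | (changed[2] as u8) << 1
        | (changed[3] as u8);

    let mut buf = [0u8; 5];
    buf[0] = QOI_COLOR | mask;
    let mut n = 1;
    for c in 0..4 {
        buf[n] = px[c]; // always store the channel...
        n += changed[c] as usize; // ...but only keep it if it changed
    }
    out.extend_from_slice(&buf[..n]);
    n
}
```

An unchanged channel is simply overwritten by the next store, so the output contains only the changed channels, in r, g, b, a order.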

And I'm unsure if this has a real effect, but you only need to store the previous colour when it has changed (move the assignment into the px_prev != px block).

When I hacked those into my local qoi.h I saw improvements of around 16% for dice.png; I haven't measured other files. The Rust benchmark encodes dice.qoi (from raw) in around 2.3 ms, compared to qoibench's 3.7 ms (3.4 ms with the above changes). I haven't compared the assembly or profiles to see what else may be going on.

@aldanor
Contributor

aldanor commented Nov 29, 2021

@steven-joruk raises good points indeed, and his encoding implementation is on average around 5-10% faster (than qoi.h) on small 3-channel photo-like images from the kodak dataset and 1.9x faster on the screenshots dataset with 4-channel images.

However it's possible to make encoding 40-50% faster on kodak and 3-4x faster on screenshots (on screenshots in particular, this starts beating libpng by some absolutely obscure amounts), and I think it's not the end, it could probably be squeezed even further.

I've also been working on a Rust implementation lately (qoi-fast below), which now passes all roundtrips and checks on the test image set (the 300 MB PNG tarball), but I need to clean it up a bit and test/fuzz it before publishing. I'll share my work so that any useful tricks can be ported back to the C implementation too (e.g. wrapping ops for diffs without leaving the 8-bit space, and a few other things).
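To illustrate the "wrapping ops" remark, here is a minimal sketch, not taken from qoi-fast: the channel difference is computed with wrapping `u8` arithmetic and range-checked with a single unsigned comparison, so nothing is widened to a signed type. The range `-2..=1`, the bias of 2, and the helper names are assumptions for illustration; the ranges in the format revision under discussion may differ, and wraparound diffs are only valid if the decoder wraps the same way.

```rust
/// Returns the biased code (0..=3) if `cur - prev` lies in -2..=1 modulo 256,
/// otherwise `None`. Everything stays in `u8`.
fn small_diff(prev: u8, cur: u8) -> Option<u8> {
    // Adding the bias maps the accepted range -2..=1 onto 0..=3, so a single
    // unsigned comparison replaces the usual pair of signed comparisons.
    let biased = cur.wrapping_sub(prev).wrapping_add(2);
    if biased < 4 {
        Some(biased)
    } else {
        None
    }
}

/// Decoder-side inverse: removes the bias and wraps on overflow.
fn apply_small_diff(prev: u8, code: u8) -> u8 {
    prev.wrapping_add(code).wrapping_sub(2)
}
```

With wrapping, a step from 255 to 0 counts as a difference of +1 and is therefore encodable as a small diff, which a plain signed subtraction would reject.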

I have a benchmark suite comparing QOI implementations (compiling and linking the C library directly into Rust so as to exclude any I/O etc.), and I've temporarily added @steven-joruk's implementation to get some numbers; see below.
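The harness itself is not shown in this thread. As a rough sketch of the "exclude any I/O" part only, a timing helper can take a closure that works purely on in-memory buffers and report the best of several runs. `bench_min` is a hypothetical name, and taking the minimum rather than the mean is an assumption about methodology, not a description of the suite used here.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `runs` times and returns the fastest wall-clock time observed.
/// The closure should only touch in-memory buffers so that file I/O and
/// PNG decoding stay outside the measured region.
fn bench_min<F: FnMut()>(runs: u32, mut f: F) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..runs {
        let start = Instant::now();
        f();
        best = best.min(start.elapsed());
    }
    best
}
```

In a real benchmark the closure's result should also be kept observable (for example by returning it or storing it), so the optimizer cannot remove the work being timed.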

Benchmarks from kodak (first 5 only, they're all kind of the same):

../qoi-images/kodak/kodim11.png (768 x 512 x 3)
codec            decode (ms)   encode (ms)
qoi-c                   3.48          5.07
qoi-sj                  3.36          4.75
qoi-fast                2.57          3.49
../qoi-images/kodak/kodim05.png (768 x 512 x 3)
codec            decode (ms)   encode (ms)
qoi-c                   3.67          5.40
qoi-sj                  3.59          5.30
qoi-fast                2.73          3.65
../qoi-images/kodak/kodim04.png (512 x 768 x 3)
codec            decode (ms)   encode (ms)
qoi-c                   3.50          5.13
qoi-sj                  3.33          4.99
qoi-fast                2.56          3.51
../qoi-images/kodak/kodim10.png (512 x 768 x 3)
codec            decode (ms)   encode (ms)
qoi-c                   3.33          5.15
qoi-sj                  3.31          4.99
qoi-fast                2.59          3.50
../qoi-images/kodak/kodim06.png (768 x 512 x 3)
codec            decode (ms)   encode (ms)
qoi-c                   3.56          5.28
qoi-sj                  3.56          4.93
qoi-fast                2.67          3.61

Benchmarks from screenshots (all of them):

../qoi-images/screenshots/microsoft.com.png (1313 x 3328 x 4)
codec            decode (ms)   encode (ms)
qoi-c                  12.65         32.15
qoi-sj                 12.88         18.14
qoi-fast               10.28         11.45
../qoi-images/screenshots/stripe.com.png (1313 x 6603 x 4)
codec            decode (ms)   encode (ms)
qoi-c                  21.22         56.23
qoi-sj                 22.25         27.28
qoi-fast               18.11         16.31
../qoi-images/screenshots/news.ycombinator.com.png (1325 x 1450 x 4)
codec            decode (ms)   encode (ms)
qoi-c                   4.56         13.13
qoi-sj                  4.66          5.58
qoi-fast                3.85          3.71
../qoi-images/screenshots/amazon.com.png (1313 x 6097 x 4)
codec            decode (ms)   encode (ms)
qoi-c                  34.35         67.29
qoi-sj                 35.06         48.20
qoi-fast               28.14         31.95
../qoi-images/screenshots/phoboslab.org.png (1313 x 20667 x 4)
codec            decode (ms)   encode (ms)
qoi-c                  79.16        190.03
qoi-sj                 82.73        109.20
qoi-fast               66.71         67.13
../qoi-images/screenshots/apple.com.png (1313 x 4755 x 4)
codec            decode (ms)   encode (ms)
qoi-c                  19.72         45.64
qoi-sj                 20.39         26.30
qoi-fast               16.72         16.90
../qoi-images/screenshots/reddit.com.png (1313 x 8008 x 4)
codec            decode (ms)   encode (ms)
qoi-c                  31.42         74.56
qoi-sj                 32.79         43.39
qoi-fast               26.72         26.53
../qoi-images/screenshots/nytimes.com.png (1313 x 5780 x 4)
codec            decode (ms)   encode (ms)
qoi-c                  24.17         56.15
qoi-sj                 25.13         32.91
qoi-fast               20.51         20.69
../qoi-images/screenshots/imdb.com.png (1313 x 6441 x 4)
codec            decode (ms)   encode (ms)
qoi-c                  23.62         59.79
qoi-sj                 24.90         31.98
qoi-fast               20.13         19.76
../qoi-images/screenshots/duckduckgo.com.png (1313 x 2874 x 4)
codec            decode (ms)   encode (ms)
qoi-c                   8.12         23.00
qoi-sj                  6.90          9.44
qoi-fast                5.25          4.75
../qoi-images/screenshots/sublime.png (1008 x 9513 x 4)
codec            decode (ms)   encode (ms)
qoi-c                  23.49         63.99
qoi-sj                 23.90         29.08
qoi-fast               19.66         17.46
../qoi-images/screenshots/en.wikipedia.org.png (1313 x 2936 x 4)
codec            decode (ms)   encode (ms)
qoi-c                  12.46         27.41
qoi-sj                 10.98         13.91
qoi-fast                8.62          9.78
../qoi-images/screenshots/cnn.com.png (1313 x 5241 x 4)
codec            decode (ms)   encode (ms)
qoi-c                  23.91         51.49
qoi-sj                 22.27         30.62
qoi-fast               18.19         18.28

@aldanor
Contributor

aldanor commented Nov 29, 2021

Update, decoders can also be sped up 2x 😄

../qoi-images/screenshots/microsoft.com.png (1313 x 3328 x 4)
codec            decode (ms)   encode (ms)
qoi-c                  12.88         32.36
qoi-sj                 13.64         17.96
qoi-fast                7.69         11.47
../qoi-images/screenshots/stripe.com.png (1313 x 6603 x 4)
codec            decode (ms)   encode (ms)
qoi-c                  24.42         56.42
qoi-sj                 21.23         27.80
qoi-fast               12.14         15.62
../qoi-images/screenshots/news.ycombinator.com.png (1325 x 1450 x 4)
codec            decode (ms)   encode (ms)
qoi-c                   4.67         13.67
qoi-sj                  4.60          5.45
qoi-fast                2.38          3.50

@oscardssmith

With these modifications would little endian still be advantageous? #36

@aldanor
Contributor

aldanor commented Nov 29, 2021

> With these modifications would little endian still be advantageous? #36

Yes, it's completely orthogonal.

@nigeltao

nigeltao commented Nov 29, 2021

Heh, #47 has a similar theme for the original C code, although it's about decoding and, if I understand the OP correctly, the OP is about encoding. @aldanor's numbers show speedups for both (in Rust), although I haven't seen the code yet.

@aldanor
Contributor

aldanor commented Nov 29, 2021

@nigeltao I've tried applying a #47-like approach to my code but couldn't improve it (although I handle runs differently from both the original code and yours). I'll try to share it asap, maybe today, so you can take a look yourself.

@JohnPeel

Another Rust implementation: https://github.com/JohnPeel/qoi

@darleybarreto

> Another rust implementation https://github.com/JohnPeel/qoi

Also here.

@taotao54321

taotao54321 commented Nov 30, 2021

I found a minor compression improvement for the current implementation (fda5167).
When only one component of RGBA differs, QOI_COLOR is shorter than QOI_DIFF_24. So by preferring QOI_COLOR to QOI_DIFF_24 in such cases, you can slightly improve the compression ratio without modifying the encoding scheme.

My Rust implementation does this.
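The size comparison behind this can be sketched as follows. This is an illustration, not the code of the implementation mentioned above; it assumes a QOI_COLOR chunk costs one tag byte plus one byte per changed channel, that QOI_DIFF_24 always costs 3 bytes, and that the pixel has already been found encodable as QOI_DIFF_24 with no shorter op applicable. `Choice`, `color_bytes` and `choose` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Choice {
    Color,
    Diff24,
}

/// Assumed size of a QOI_DIFF_24 chunk in bytes.
const DIFF24_BYTES: usize = 3;

/// Assumed size of a QOI_COLOR chunk: 1 tag byte + 1 byte per changed channel.
fn color_bytes(prev: [u8; 4], px: [u8; 4]) -> usize {
    1 + prev.iter().zip(px.iter()).filter(|(a, b)| a != b).count()
}

/// Picks the shorter chunk, keeping QOI_DIFF_24 on a tie.
fn choose(prev: [u8; 4], px: [u8; 4]) -> Choice {
    if color_bytes(prev, px) < DIFF24_BYTES {
        Choice::Color
    } else {
        Choice::Diff24
    }
}
```

With exactly one changed channel the QOI_COLOR chunk is 2 bytes, one byte less than QOI_DIFF_24; with two or more changed channels it is no longer shorter.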

@oscardssmith

@taotao54321 can you check how often this occurs?

@taotao54321

> @taotao54321 can you check how often this occurs?

I quickly checked using Kodak image suite.

Converting all 24 files with the current qoiconv, the total is 18969467 Bytes.
Converting them with my implementation, the total is 18964862 Bytes.

So the difference is 4605 bytes. A single-entry QOI_COLOR is 1 byte shorter than QOI_DIFF_24, so this case occurs 4605 times.
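The arithmetic can be checked with a few lines; `saving` is a hypothetical helper, and the two totals are the ones quoted above.

```rust
/// Returns the absolute saving in bytes and the saving as a percentage of `before`.
fn saving(before: u64, after: u64) -> (u64, f64) {
    let bytes = before - after;
    (bytes, bytes as f64 / before as f64 * 100.0)
}
```

Calling `saving(18_969_467, 18_964_862)` yields 4605 bytes, which is a few hundredths of a percent of the total.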

@oscardssmith

It seems like we should be able to do something with the tag that's better than a .03% compression improvement, but it's still an improvement...

@zakarumych
Contributor

Hey, I'm also working on a Rust QOI implementation.

I've rewritten the benchmark in Rust too and included my crate rapid-qoi, along with qoi and qoi_rs.
Here are my results on the kodak set mentioned above:

Lines starting with qoi-c are copied from the reference benchmark results, run on the same set.

## Benchmarking ../../textures/kodak//*.png -- 10 runs

## ../../textures/kodak/kodim01.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          3.024       3.898       130.043       100.867       909
qoi_rs:       2.940       5.886       133.761        66.805       909
rapid_qoi:    2.961       3.378       132.806       116.414       909
qoi-c:        2.703       3.518       145.495       111.774       909

## ../../textures/kodak/kodim02.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.685       3.701       146.434       106.238       706
qoi_rs:       2.611       5.659       150.614        69.481       707
rapid_qoi:    2.546       3.078       154.467       127.733       706
qoi-c:        2.423       3.488       162.289       112.719       706

## ../../textures/kodak/kodim03.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.738       3.543       143.624       110.979       594
qoi_rs:       2.683       5.234       146.564        75.125       593
rapid_qoi:    2.512       2.965       156.535       132.602       594
qoi-c:        2.524       3.342       155.770       117.673       594

## ../../textures/kodak/kodim04.png size: 512x768
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.708       3.728       145.185       105.470       763
qoi_rs:       2.622       5.666       149.949        69.404       763
rapid_qoi:    2.589       3.056       151.854       128.691       763
qoi-c:        2.425       3.509       162.173       112.064       763

## ../../textures/kodak/kodim05.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.859       3.877       137.547       101.420       952
qoi_rs:       2.741       5.723       143.459        68.712       951
rapid_qoi:    2.787       3.266       141.103       120.381       952
qoi-c:        2.545       3.586       154.533       109.661       952

## ../../textures/kodak/kodim06.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.822       3.618       139.318       108.691       757
qoi_rs:       2.706       5.579       145.329        70.486       758
rapid_qoi:    2.666       3.097       147.480       126.956       757
qoi-c:        2.563       3.380       153.435       116.348       757

## ../../textures/kodak/kodim07.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.701       3.608       145.586       108.976       653
qoi_rs:       2.637       5.294       149.127        74.274       654
rapid_qoi:    2.504       2.935       157.042       133.972       653
qoi-c:        2.465       3.490       159.502       112.664       653

## ../../textures/kodak/kodim08.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.873       3.877       136.856       101.411       987
qoi_rs:       2.719       5.812       144.607        67.652       987
rapid_qoi:    2.759       3.225       142.505       121.928       987
qoi-c:        2.400       3.463       163.869       113.536       987

## ../../textures/kodak/kodim09.png size: 512x768
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.690       3.717       146.189       105.783       669
qoi_rs:       2.607       5.547       150.819        70.885       669
rapid_qoi:    2.508       3.030       156.782       129.775       669
qoi-c:        2.339       3.505       168.101       112.197       669

## ../../textures/kodak/kodim10.png size: 512x768
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.669       3.683       147.352       106.756       704
qoi_rs:       2.576       5.659       152.620        69.482       705
rapid_qoi:    2.495       3.038       157.593       129.430       704
qoi-c:        2.394       3.502       164.236       112.275       704

## ../../textures/kodak/kodim11.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.728       3.541       144.164       111.050       736
qoi_rs:       2.643       5.416       148.760        72.600       737
rapid_qoi:    2.562       2.990       153.484       131.509       736
qoi-c:        2.414       3.312       162.886       118.717       736

## ../../textures/kodak/kodim12.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.695       3.570       145.913       110.135       614
qoi_rs:       2.638       5.426       149.074        72.465       614
rapid_qoi:    2.445       3.000       160.852       131.088       614
qoi-c:        2.459       3.370       159.890       116.695       614

## ../../textures/kodak/kodim13.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          3.023       3.958       130.074        99.341      1064
qoi_rs:       2.873       5.827       136.861        67.481      1063
rapid_qoi:    2.963       3.420       132.690       114.992      1064
qoi-c:        2.688       3.582       146.276       109.787      1064

## ../../textures/kodak/kodim14.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.863       3.752       137.356       104.810       831
qoi_rs:       2.765       5.742       142.236        68.475       830
rapid_qoi:    2.746       3.245       143.211       121.183       831
qoi-c:        2.567       3.504       153.173       112.213       831

## ../../textures/kodak/kodim15.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.792       3.657       140.850       107.511       722
qoi_rs:       2.695       5.532       145.911        71.086       722
rapid_qoi:    2.619       3.095       150.167       127.068       722
qoi-c:        2.565       3.566       153.304       110.260       722

## ../../textures/kodak/kodim16.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.768       3.458       142.032       113.706       641
qoi_rs:       2.693       5.353       146.021        73.452       640
rapid_qoi:    2.560       2.999       153.620       131.137       641
qoi-c:        2.506       3.374       156.884       116.538       641

## ../../textures/kodak/kodim17.png size: 512x768
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.744       3.663       143.307       107.345       714
qoi_rs:       2.669       5.580       147.320        70.468       715
rapid_qoi:    2.574       3.051       152.757       128.902       714
qoi-c:        2.519       3.409       156.118       115.354       714

## ../../textures/kodak/kodim18.png size: 512x768
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.873       3.895       136.847       100.961       944
qoi_rs:       2.739       5.949       143.566        66.093       944
rapid_qoi:    2.783       3.293       141.291       119.399       944
qoi-c:        2.522       3.573       155.925       110.054       944

## ../../textures/kodak/kodim19.png size: 512x768
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.824       3.860       139.231       101.876       813
qoi_rs:       2.697       5.723       145.814        68.707       812
rapid_qoi:    2.673       3.269       147.122       120.297       813
qoi-c:        2.494       3.499       157.640       112.376       813

## ../../textures/kodak/kodim20.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.229       2.869       176.426       137.079       589
qoi_rs:       2.213       4.593       177.688        85.611       589
rapid_qoi:    2.042       2.491       192.609       157.867       589
qoi-c:        1.977       2.533       198.906       155.212       589

## ../../textures/kodak/kodim21.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.775       3.864       141.683       101.772       761
qoi_rs:       2.657       5.757       148.008        68.308       761
rapid_qoi:    2.655       3.145       148.113       125.027       761
qoi-c:        2.456       3.420       160.104       114.973       761

## ../../textures/kodak/kodim22.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.822       3.970       139.348        99.037       859
qoi_rs:       2.733       5.943       143.900        66.163       859
rapid_qoi:    2.741       3.195       143.475       123.086       859
qoi-c:        2.481       3.518       158.464       111.785       859

## ../../textures/kodak/kodim23.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.590       3.838       151.836       102.443       688
qoi_rs:       2.529       5.658       155.472        69.493       691
rapid_qoi:    2.421       2.883       162.404       136.398       688
qoi-c:        2.252       3.412       174.614       115.229       688

## ../../textures/kodak/kodim24.png size: 768x512
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.865       3.804       137.239       103.369       839
qoi_rs:       2.773       5.759       141.784        68.280       841
rapid_qoi:    2.743       3.234       143.361       121.572       839
qoi-c:        2.511       3.513       156.616       111.946       839

## Totals (AVG) size: 0x0
          decode ms   encode ms   decode mpps   encode mpps   size kb
qoi:          2.765       3.706       142.213       106.094       771
qoi_rs:       2.673       5.597       147.093        70.260       771
rapid_qoi:    2.619       3.099       150.149       126.884       771
qoi-c:        2.466       3.432       159.434       114.574       771

Interestingly, the original diff intervals were changed recently, which broke one of my encoding optimizations.
Before that change the total result was:

          decode ms   encode ms   decode mpps   encode mpps   size kb
rapid_qoi:    2.618       2.916       150.174       134.853       771
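For reading the tables, the mpps columns appear to be megapixels per second derived from the image size and the time in milliseconds. A small sketch of that relationship, with `mpps` as a hypothetical helper and the column meaning as an assumption:

```rust
/// Megapixels per second for a `width` x `height` image processed in `ms` milliseconds.
fn mpps(width: u32, height: u32, ms: f64) -> f64 {
    let pixels = f64::from(width) * f64::from(height);
    // pixels / (ms / 1000 s) / 1e6 simplifies to pixels / (ms * 1000).
    pixels / (ms * 1_000.0)
}
```

For kodim01 (768x512, 3.024 ms decode) this gives roughly 130 mpps, in line with the table; small differences in the last digits are expected because the printed times are rounded.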

@nigeltao

nigeltao commented Dec 3, 2021

> I need to clean it up a bit and test/fuzz before publishing.

@aldanor any update?

@darleybarreto

> > I need to clean it up a bit and test/fuzz before publishing.
>
> @aldanor any update?

I think this is the one.

@aldanor
Contributor

aldanor commented Jan 5, 2022

@darleybarreto Yes, I'm in the process of cleaning it up and updating the benchmarks (as they've only been run on half of the suite). I was planning to post it here as soon as it's done, and to add a PR in this repo to mention it in the readme.

This will also be renamed to just qoi as @steven-joruk has kindly agreed to hand over the crate name to host this project.

@aldanor
Contributor

aldanor commented Jan 6, 2022

Added https://github.com/aldanor/qoi-rust to the list of implementations in #164.


9 participants