Upcoming breaking changes & locking in the data format #28
On color spaces... I'm one of those 'experts' the HN thread is complaining about :) If the user is doing complex image operations in a color-accurate space, they're most likely going to be using OpenEXR, because the values are important, as is the colorimetric information, and there's a lot more information that needs to be stored beyond color space (as is pointed out in the HN thread). The intent of QOI is fast compress/decompress of data, obviously in a real-time context. The data QOI is going to be used with is therefore going to be either linear or sRGB. Drawing a line in the sand on those two seems like a clear, useful choice.

There's a rub though. It's normal to store linear alpha with sRGB data. Also, it's common to store linear data in color channels, for example, in a normal map or other material input such as "metalness" in a typical PBR setup. I propose that it makes sense to simply make the rule that individual channels are either linear-and-user-interpreted or sRGB-with-gamma, and have the format include a byte with a bitmask indicating which channel is which. I also propose that a bit of 0 would mean gamma and 1 linear, ordered such that the A channel lands in the LSB. A mask of 0 would therefore mean all gamma, and a mask of 1 would mean sRGB with linear alpha, regardless of the number of channels. That would make the two most common combinations result in a byte of zero or one, for any number of channels with alpha. The conciseness of only needing to care about zero or one in general appeals to me. |
👍 Also my three cents:
|
What's the use of this? It requires the encoder to either seek in the file, buffer the whole output, or do a first pass just to compute this number. |
Yeah, I had the same question. This requires the writer to support seeking (which would cause problems with compression encoders), or you have to encode everything into a buffer first. The only advantage, I guess, would be being able to skip qoi-encoded images if there were many of them streamed in a row, or something like that. |
I agree with the previous two writers in that I find the
I agree with this. I think additional fields like color space and such should also be purely informative, as otherwise the decoder might need to perform color space conversion. I would say that color space will only ever affect RGB, but QOI will always use linear alpha. I also agree with 1) and 2). We should not support premultiplied alpha, as it makes things more complex. I can also live with the current offset method for the signed deltas. One question I had about the file format: what is the purpose of the four padding bytes at the end of the file in the original implementation? 🤔 |
What's your reasoning for this? Maybe I'm a bit naive, but a 64k x 64k image seems plenty large to me. The 12-bit width/height in MPEG1 still holds up 30 years later (admittedly barely).
Uh, interesting! Didn't even think about it. I'll check if it makes a significant difference in compression. My hunch is that it does not - and the added complexity is not worth it.
True. I guess
Good question! I was thinking of this as an additional way to verify that we have a valid header. Streaming QOI images would also benefit from this. An alternative would be a special

Case in point: I've had very frustrating experiences with MPEG1 not having any field for the frame size or an indicator for a frame end, resulting in one extra frame of latency - you have to wait until the next frame arrives before you can be sure you have enough data to decode the current one. MPEG1 was conceived in a way where you could encode data and spit it out to the wire immediately. Same for receiving: as soon as you get any bits from the wire you can theoretically start decoding. Yet, none(?) of the (software) decoders take advantage of it. ffmpeg and libmpeg2 buffer data for a whole frame before they start decoding. It's non-trivial to have a streaming decoder.
Can you clarify what you mean by "restricting"?
Since the longest chunk we can have is 5 bytes (QOI_COLOR with RGBA set), having an extra 4 bytes that we can ignore at the end means we only have to check once per loop iteration whether we ran out of data. Without these extra 4 bytes, we would need to check for every decoded chunk separately. E.g.:

```c
else if ((b1 & QOI_MASK_3) == QOI_RUN_16) {
	// We need an extra byte here. Check if we have enough data
	if (p >= chunks_len) {
		free(pixels);
		return NULL;
	}
	int b2 = bytes[p++];
	run = (((b1 & 0x1f) << 8) | (b2)) + 32;
}
...
else if ((b1 & QOI_MASK_4) == QOI_COLOR) {
	// Calculate how many extra bytes we need
	// (bit 3 = r, bit 2 = g, bit 1 = b, bit 0 = a)
	int bytes_needed =
		((b1 >> 3) & 1) +
		((b1 >> 2) & 1) +
		((b1 >> 1) & 1) +
		((b1 >> 0) & 1);

	// Check if we have enough data
	if (p + bytes_needed > chunks_len) {
		free(pixels);
		return NULL;
	}
	if (b1 & 8) { px.rgba.r = bytes[p++]; }
	if (b1 & 4) { px.rgba.g = bytes[p++]; }
	if (b1 & 2) { px.rgba.b = bytes[p++]; }
	if (b1 & 1) { px.rgba.a = bytes[p++]; }
}
```

In contrast, the 4 bytes of padding seemed like a nicer solution to guard against malicious input. |
It's a significant difference in the decoder whether you check if the overflow/wraparound would happen and error out because of a format violation, or just let the wraparound happen and gain improved encoding space (as hard on/off transitions can be encoded in a single byte instead of a
QOI always encodes all pixels, so you know the end of frame is reached when you have decoded exactly
Agreed. Encoding a run as 2…33 would also make pictures a tad smaller. I think this improvement is worth it, as there isn't any difference in the decoder, and the encoder only gets slightly harder by introducing a single branch that checks if the run-length is 1 and emits a QOI_INDEX instead of a QOI_RUN8
Ah. My implementation checks for the overrun anyway, so I was wondering why it's there. Also note that malicious input will deliberately omit these 4 "safety" bytes, so we cannot rely on them being there. |
There's not much complexity here. Truncate the deltas in the encoder to 8 bits and document it. |
Right. That makes sense! I see now that allowing for wrapping is the better choice.
Doesn't matter. |
To clarify: you want to keep the biased encoding and just move the range by one? |
Sorry, yes. I just wanted to shift the range by one to be consistent with the range of a two's complement int. |
I propose having a |
Perhaps I am missing something obvious, but why is the following not an option? |
That sounds needlessly complex. I would just make it 32 bits (which is necessary for large images such as satellite photos, ...) and be done with it. Sure, we might have 24 additional bits in the header that are not necessary for small files, but imo that's an acceptable sacrifice. |
From the OP:
I've mentioned it before (#12), but for comparison, here's the 16-byte NIE header:
Like QOI, I also never expect to define a "version 2", so the version byte looks useless at first, but I've also learned over the years to never say never. Also, the version byte is 0xFF, which guarantees that the header is both invalid ASCII and invalid UTF-8. |
I would instead advise to make everything little endian. Not just the header fields, but also the bytecodes. Almost all CPUs used today are little-endian, and support unaligned loads. The current
By replacing the
Which might be faster. It certainly looks simpler. |
One downside of that implementation is that it would be harder for non-C languages to implement. If it's faster though, it still probably makes sense. |
The maximum RGB size with 16-bit width/height is 12GB and would already take around 30 seconds to decode (although the current implementation fails at 2GB). For these and larger images, you really want the format to support multi-threaded decoding (chunks/tiles), and probably zoom levels. This isn't "simple", so I'd keep the 16-bit size limit to stop people misusing the format. |
Yes, in practice this is all I would ever use having worked on various games and texture compression tools. Which channels are sRGB? Which are linear? That's it and gives me 99% of the value. |
Agreed on #1, #2, #3. Make it a DWORD. Every single time I've limited my texture compression tools to specific maximum sizes (like 8192x8192 or 16384x16384), somebody has yelled and I had to bump the limits. For some specific use cases 16K-32K are definitely a thing, and I don't see why >=64K shouldn't be supported. You need more than 16-bits for the width/height. QOI is so much faster than PNG that it's a compelling use case for 64K-128K images. 640k is not enough: |
We're all sitting here with what would have been viewed as personal supercomputers in the 80's/90's. I have 128 GB RAM and soon going to 256 GB. Give me 32-bit width & height please - we're ready now. |
True, but my supercomputer also has 16 cores, and I would much rather load the same image split into tiles in ~2 seconds than ~30 seconds. I guess you know this, and want 32-bit sizes anyway (e.g. to replace even worse performing PNGs of these sizes?), so go for it. (I've tried working with a 20268x11309 JPEG (die shot, still well under the 16-bit limit, less than 1GB of raw RGB data), and there's no question that my computer could do it, but it just makes most software unusable. I'd love to have better formats for large images, but it's not a domain suited to simplicity-first projects.) |
+1 for uint32 w/h +1 for colorspace as a bitfield Other color spaces will probably be better served by other formats. Maybe we should also consider having a Just a suggestion, probably a lil out of scope if you want to keep it simple and straightforward. |
Not all images are square. Especially in science it isn't unbelievable that you could want to encode a 4GB image that's 2^20 by 2^10 pixels (for example). Since the cost is only 32 bits extra, I think it's worth the extra flexibility. |
Could also be useful for LUTs that are 1xLARGEVALUE. |
I just wanted to write a proposal with which I'd be happy. It had one new idea, but scrap that. I'll go adjust my implementation. Can I expect this to be frozen? Then I'll happily start PRing an improved spec in
This isn't about portability. It's about possible compiler's optimization: https://godbolt.org/z/3brMonWrE |
We're talking about 1 ( |
Yes. If there aren't any severe mistakes in my previous post, this will be the final spec.
That's a good point, but I will not entertain it for QOI. |
As someone pointed out on HackerNews, it allows simply putting EXIF (or any other metadata format) at the end of the file; a library/user only needs to skip that many bytes past the image data to read the metadata. |
Yeah, but this is also possible by simply seeking to the end of the file and reading the header there, similar to how ID3 tags are encoded
Seconded. The truth is that little-endian "has won": everything performance-sensitive today should use little-endian, meaning it will be slightly faster on little-endian CPUs and slightly slower on big-endian CPUs, because practically everything today uses little-endian CPUs. (x86 and x86-64, Apple M1, iPhones, Samsung phones, the vast majority of smartphones: it's all little-endian. The last Apple big-endian system was released in 2005; they transitioned to little-endian in 2006 and have been little-endian ever since.)
|
Little-endian shows an approx 1.05x improvement in decode speed. The new |
What does a |
Bikeshedding alert: you could re-pack from 14 back to 12 bytes if you really wanted to:
More bikeshedding: I'd then change the magic to "qoi\xFF" to be non-ASCII and non-UTF-8. |
I do not :)
You can store different planes in one file, where some of those are sRGB (e.g. a specular map(?)) while others are interpreted linearly (e.g. bump maps, normal maps). QOI just provides this info in the header. What the user does with this info is outside of this spec.
I believe there's more gains to be had by rearranging/nesting the if-statements and other little tweaks. E.g. changing this range check Anyway, if we want to have the absolutely fastest encode/decode speed, this format would need to change a lot to accommodate SIMD instructions. It's certainly a worthwhile endeavor, but again, QOI is done! Thank you :) |
This (e.g. a 1-channel 'sRGB' specular map) still doesn't make sense to me. sRGB-ness (combined with your monitor color profile) tells you how to convert an RGB triple from the source image color space to the destination monitor color space. If only the red and green parts of the source image are 'sRGB' then you can't make the conversion without knowing the blue part too in sRGB space, and the file format doesn't tell you that. If you're storing specular maps, bump maps, normal maps, etc. then just tag the whole thing (all 4 channels) as not-sRGB. It's an all-or-nothing thing. |
Sure, but it's not "you can choose only one optimization". You can rearrange the if-statements (for e.g. +10%) and change to little endian (for e.g. +5%) and the two combine. The thing is you can rearrange the if-statements 3 months from now. You can't change to little endian once you freeze the file format.
It's not about having the absolutely fastest decode speed. It's about having a free lunch with no change in file format complexity. It's just a different bit packing into exactly the same number of bytes per opcode, with exactly the same opcode semantics. |
I wrote something regarding little- vs. big-endian here. If the highest bits of the bytecode determine the operation, the masking operation is unnecessary and a comparison with <= is sufficient.
Re: color difference (2), why not use wrapping ops instead of essentially using saturating ops? (please correct me if I'm wrong) For example,

Another point is that in addition to the diff op itself, you would only have to perform 2 ops on the diff: one comparison and one addition, instead of the 3 it currently takes (2 comparisons, one addition):
The potential compression disadvantage may be that some pixels further down the line with e.g. |
I think this is a bad idea. I have seen several file formats without a size and it always looked like a bad idea. The Entropy Coded Segment (ECS) of JPEG is such an example. All other JPEG segments have a length; the ECS does not. The ECS ends when a 0xff byte is not followed by a 0 byte, so in decoding, the byte sequence 0xff 0 needs to be converted to 0xff. Every other JPEG segment can be skipped easily because it has a length, but an ECS must be processed byte by byte to find its end. This is absolutely crazy.

If the size is missing, you need to process everything to get to the end of the data. The size opens the opportunity to have other information afterwards, e.g. another header with another image, or meta information. With a size field you can just jump to the information after the image without processing it. The size can also be used as redundant information (afterwards the file must end). I now think that the "channels" field is not needed, but a "size" field definitely makes sense. |
@ThomasMertes There's pros and cons of not having
Alternative middle ground: add an unused
Why not have this information before the image then? |
Is there a reason for using biased values instead of just two's complement there? For software implementations it might be kind of irrelevant, but for hardware implementations it requires extra circuitry to account for the correct bias. OTOH, sign extension is trivial in both software and hardware. |
I spent some more time thinking about how I could use qoi files in some of my applications and realized that the suggested 8 bits of custom information might not necessarily be enough for me. What if, instead of storing the custom information in the header itself, we store an offset between the end of the header and the start of the image itself? This way people could add their own information as some kind of secondary header, which would be completely ignored (besides a getter, maybe) by qoi. |
@phoboslab has just said that "the data format is now fixed!". Let's move the bikeshedding to: Note that GitHub is having some server issues right now. |
I'm working with 48-bit uncompressed scans of film. I might have to create a qoi-48 fork then. :-) |
Indeed. One option for "a sequence of QOI images", without an explicit size field, is to present a TAR file that contains QOI files. It's streaming / single-pass encoding friendly. |
Similarly, for people wanting to stuff EXIF or other metadata after the, uh, non-meta data, you could instead use something based on IFF / RIFF / TIFF. |
Saying that I'm surprised by the amount of attention this is getting would be an understatement. There's lots of discussion going on about how the data format and compression could be improved and what features could be added.
I want to give my views here and discuss how to go forward.
First and foremost, I want QOI to be simple. Please keep this in mind. I consider the general compression scheme to be done. There's lots of interesting ideas on how to improve compression. I want to tinker with these ideas - but not for QOI.
QOI will not be versioned. There will only be one version of QOI's data format. I'm hoping we will be able to strictly define what exactly that is in the coming days.
QOI will only support 24bit RGB and 32bit RGBA data. I acknowledge there's some need for fewer or more channels and also for higher bit depths or paletted color - QOI will not serve these needs.
So, with all that said, there's some breaking changes that are probably worthwhile. I want to discuss if and how to implement those.
Proposed changes
1) width, height and size in the header should be stored as big endian for consistency with the rest of the format (this change already happened in c03edb2)
2) Color differences (QOI_DIFF_*) should have the same range as two's-complement. That means:
   - -2..1 instead of the current range -1..2
   - -8..7 instead of the current range -7..8
   - -16..15 instead of the current range -15..16
3) New header fields:
   3a) number of channels (The number of channels should be encoded in the QOI header #16)
   3b) the colorspace (Consider adding a sRGB flag to the header #25 and this huge discussion on HN)
   3c) un-/premultiplied alpha (Clarify that RGBA means non-premultiplied alpha #13)
   3d) user-defined values
So, 1) is already implemented; 2) seems like the right thing to do (any objections?); 3) is imho worth discussing.
3a) Storing the number of channels (3 or 4) in the header would allow a user of this library to omit specifying whether they want RGB or RGBA, and files would be more descriptive of their contents. You would still be able to enforce 3 or 4 channels when loading. This is consistent with what stbi_load does.

It is my opinion that the channels header value should be purely informative. Meaning, the en-/decoder will do exactly the same thing regardless of the number of channels. The extra 5 bits for alpha in QOI_DIFF_24 will still be wasted for RGB files.
will still be wasted for RGB files.3b) I don't understand enough about the colorspace issue to gauge the significance. If we implement this however, I would suggest to give this a full byte in the header, where
0 = sRGB
and any non-zero value is another, user-defined(?) colorspace.3c) I'm against an option for premultiplied alpha, because it puts more burden on any QOI implementation to decode in the right pixel format. We should just specify that QOI images have un-premultiplied alpha.
3d) For simplicity's sake I'd like to put 3a) and 3b) as one byte each into the header. I'm uncertain if we then should "pad" the u32 size in the header with two more bytes. This would make the size 4-byte aligned again, but there's probably no need for it!? A u16 unused could also cause more confusion when other QOI libraries suddenly specify any of these bits to mean something.

With all this, the header would then be the following 16 bytes:
The one issue I have with this is how to give these extra header values to the user of this library. qoi_read("file.qoi", &w, &h, &channels_in_file, &colorspace, want_channels) looks like an ugly API. So maybe that would rather be implemented as qoi_read_ex() and qoi_read() stays as it is. I'm still not sure if I want that extended header...

What's the opinion of the other library authors?