RIFF container #27

Open
nigeltao opened this issue Dec 17, 2021 · 5 comments

@nigeltao
Owner

Here is a concrete suggestion for an extensible header / container format. There are reserved-zero bits that are effectively a version number (#2). It can hold metadata such as color profiles (#7). It allows parallelized decoding (#9). Etc.

It builds on RIFF and vanilla (headerless) QOI, heavily inspired by how WebP Extended builds on RIFF and VP8/VP8L. We could call this format "qoiriffic", file extension "qoir".

RIFF

A quick overview: RIFF is a 12 byte header then a sequence of chunks.

The header is:

  • 4 bytes "RIFF",
  • a u32le File Size (yeah, for streaming encoders, use vanilla QOI with a different header / container) and
  • 4 more bytes (in this case, "QOIR"). QOIR is this container format. QOI (or perhaps QOIC or QOIP) is the pixel-compression bytecode thing (Separate file format from compression scheme #8). The "upstream" stand-alone QOI format with a 14-byte header could be called QOIF.

Each chunk is:

  • a u32le Chunk FourCC (e.g. "EXIF"),
  • a u32le Chunk Size and
  • a variable sized Chunk Payload (padded to an even number of bytes).
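The layout above can be sketched as a small reader and writer. This is a minimal illustration, not a reference implementation: the function names are made up for this example, the FourCC is treated as 4 raw bytes, and the File Size field is assumed to follow the usual RIFF convention of counting everything after the first 8 bytes.

```python
import struct


def make_chunk(fourcc: bytes, payload: bytes) -> bytes:
    """Serialize one chunk: FourCC, u32le size, payload, optional pad byte."""
    pad = b"\x00" if len(payload) & 1 else b""
    return fourcc + struct.pack("<I", len(payload)) + payload + pad


def make_riff(form_type: bytes, chunks) -> bytes:
    """Serialize the 12-byte header followed by the given (fourcc, payload) pairs."""
    body = form_type + b"".join(make_chunk(f, p) for f, p in chunks)
    # Assumption: File Size counts the bytes after "RIFF" and the size field.
    return b"RIFF" + struct.pack("<I", len(body)) + body


def iter_riff_chunks(data: bytes, form_type: bytes = b"QOIR"):
    """Yield (fourcc, payload) pairs from an in-memory RIFF file."""
    if len(data) < 12:
        raise ValueError("too short for a RIFF header")
    magic, file_size, form = struct.unpack_from("<4sI4s", data, 0)
    if magic != b"RIFF" or form != form_type:
        raise ValueError("not a RIFF/%s file" % form_type.decode())
    end = min(len(data), 8 + file_size)
    pos = 12
    while pos + 8 <= end:
        fourcc, size = struct.unpack_from("<4sI", data, pos)
        pos += 8
        if pos + size > end:
            raise ValueError("truncated chunk")
        yield fourcc, data[pos:pos + size]
        pos += size + (size & 1)  # skip the pad byte after odd-sized payloads
```

For example, `list(iter_riff_chunks(make_riff(b"QOIR", [(b"EXIF", b"abc")])))` gives back the single `(b"EXIF", b"abc")` pair, with the pad byte skipped.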

QOIX

The first chunk is a "QOIX" chunk, which is pretty much exactly like WebP's "VP8X" chunk, except that the Alpha (L) and Animation (A) bits must be zero. Alpha is already part of vanilla QOI (unlike VP8). Animation doesn't seem necessary but we could add it (copying WebP/VP8X) if we wanted to.

This "QOIX" chunk has reserved-zero bits (the high 24 bits of the u32le after "QOIX") that can act as version number. The low 2 bits could indicate the presence of "COMP" and "TILE" chunks.

TBD: After that is an optional "COMP" chunk, for configuring custom compression schemes (e.g. xz, zstd).

TBD: After that is an optional "TILE" chunk, for cutting a larger image into uniform-sized tiles (and maybe specifying a background color for missing tiles).

After that is an optional "ICCP" chunk, again just like WebP/VP8X.

After that are optional "pre-QOIT" chunks.

After that comes one or more "QOIT" chunks: QOI tiles.

After that are optional "post-QOIT" chunks.

After that is an optional "EXIF" chunk, again just like WebP/VP8X.

After that is an optional "XMP " chunk, again just like WebP/VP8X.
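The chunk ordering described above could be checked with a small state machine. This is only a sketch under assumptions that the proposal does not spell out: the name `chunk_order_ok` is hypothetical, and the named FourCCs are assumed to be reserved, so they cannot reappear in the pre-QOIT or post-QOIT extension slots.

```python
# Assumption: these FourCCs are reserved and never used as extension chunks.
RESERVED = {b"QOIX", b"COMP", b"TILE", b"ICCP", b"QOIT", b"EXIF", b"XMP "}


def chunk_order_ok(fourccs) -> bool:
    """Return True if the FourCC sequence follows the proposed chunk order."""
    seq = list(fourccs)
    n = len(seq)
    if n == 0 or seq[0] != b"QOIX":
        return False
    i = 1
    for optional in (b"COMP", b"TILE", b"ICCP"):
        if i < n and seq[i] == optional:
            i += 1
    while i < n and seq[i] not in RESERVED:  # pre-QOIT extension chunks
        i += 1
    first_tile = i
    while i < n and seq[i] == b"QOIT":  # one or more QOI tiles
        i += 1
    if i == first_tile:
        return False
    while i < n and seq[i] not in RESERVED:  # post-QOIT extension chunks
        i += 1
    for optional in (b"EXIF", b"XMP "):
        if i < n and seq[i] == optional:
            i += 1
    return i == n
```

A real decoder would probably fold this check into its parsing loop rather than run it as a separate pass.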

QOIT

Each QOI tile chunk is:

  • a u32le Chunk FourCC ("QOIT"),
  • a u32le Chunk Size and
  • a variable sized Chunk Payload (padded to an even number of bytes, like all RIFF chunks are).

That Payload starts with a u32le Flags field. From LSB to MSB:

  • 2 bits Transparency:
    • 0: no alpha (equivalent to "3 channel BGR / RGB" or "4 channel BGRX / RGBX")
    • 1: non-premul alpha
    • 2: reserved
    • 3: premul alpha
  • 1 bit TileSubrectangle.
  • 2 bits Compression:
    • 0: no compression
    • 1: LZ4 block compression
    • 2: reserved
    • 3: Custom compression
  • 3 bits Lossiness (bit depth). Zero means lossless. Non-zero means lossy. For example, a value of 2 means that the 8-bit pixel values were right-shifted by 2 prior to encoding. Decoding undoes that (after executing all of the QOI bytecode) as best it can: each uint8 pixel value p is replaced by (((p&0x3F) << 2) | ((p&0x3F) >> 4)).
  • 24 bits reserved.
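As a rough illustration of the bit layout above, the Flags field could be unpacked as follows. The names `TileFlags`, `unpack_tile_flags` and `restore_pixel` are invented for this sketch. Rejecting non-zero reserved bits is an assumption, and so is the generalization of the reconstruction step to Lossiness values other than 2 (the proposal only gives the formula for 2).

```python
from typing import NamedTuple


class TileFlags(NamedTuple):
    transparency: int    # 0: none, 1: non-premul, 2: reserved, 3: premul
    sub_rectangle: bool  # TileSubrectangle bit
    compression: int     # 0: none, 1: LZ4 block, 2: reserved, 3: custom
    lossiness: int       # 0 means lossless


def unpack_tile_flags(flags: int) -> TileFlags:
    """Split the u32le Flags value into its fields, LSB first."""
    if flags >> 8:
        # Assumption: a decoder rejects non-zero reserved bits.
        raise ValueError("reserved bits must be zero")
    return TileFlags(
        transparency=flags & 0x3,
        sub_rectangle=bool((flags >> 2) & 0x1),
        compression=(flags >> 3) & 0x3,
        lossiness=(flags >> 5) & 0x7,
    )


def restore_pixel(p: int, lossiness: int) -> int:
    """Expand a right-shifted value back to 8 bits by repeating its bit pattern.

    For lossiness == 2 this equals ((p & 0x3F) << 2) | ((p & 0x3F) >> 4).
    Other values are handled by the same bit-replication idea (an assumption).
    """
    if lossiness == 0:
        return p & 0xFF
    bits = 8 - lossiness
    p &= (1 << bits) - 1
    out, filled = 0, 0
    while filled < 8:
        out = (out << bits) | p
        filled += bits
    return out >> (filled - 8)
```

With a Lossiness of 2, the largest stored value 0x3F maps back to 0xFF and 0 maps back to 0, so the full 0..255 range is still reachable after decoding.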

After the 4 byte Flags:

  • If the TileSubrectangle bit was set then 4 u32le values x0, y0, x1, y1 that define the top-left (inclusive) and bottom-right (exclusive) of this tile. TBD: some of this (e.g. if tiles have uniform width and height) could be factored out into a global "TILE" chunk.
  • If the Custom compression bits were set then a u32le Compression Configuration Size value and then CCS bytes to define the compression codec. For example CCS=4 may be followed by the 4 bytes "zstd" for Zstandard with no further configuration. Qoiriffic decoders must support LZ4 block compression (given in the header bit) but aren't required to support any custom compression codecs (the decoding fails as if it was an unsupported pre-QOIT chunk). TBD: again, some of this could be factored out into a global "COMP" chunk.
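The optional fields that follow the Flags could be read like this. It is a sketch only: `parse_tile_prefix` is a hypothetical name, the fields are assumed to appear in the order listed above (subrectangle first, then compression configuration), and the TBD factoring into global "TILE" / "COMP" chunks is ignored.

```python
import struct


def parse_tile_prefix(payload: bytes):
    """Return (flags, rect, codec, offset), where offset is the start of the QOI bytecode."""
    (flags,) = struct.unpack_from("<I", payload, 0)
    pos = 4
    rect = None
    if flags & 0x04:  # TileSubrectangle bit
        rect = struct.unpack_from("<4I", payload, pos)  # x0, y0, x1, y1
        pos += 16
    codec = None
    if (flags >> 3) & 0x3 == 3:  # Custom compression
        (ccs,) = struct.unpack_from("<I", payload, pos)
        pos += 4
        codec = payload[pos:pos + ccs]
        if len(codec) != ccs:
            raise ValueError("truncated compression configuration")
        pos += ccs
    return flags, rect, codec, pos
```

A decoder that does not recognize the returned `codec` bytes would then fail the decode, as described above for unsupported custom compression.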

After that, vanilla QOI (i.e. bytecode) without the 14-byte header (but with the 8-byte padding trailer).

Pre-QOIT and Post-QOIT Chunks

These are extensions - arbitrary chunks that not all decoders are required to support. But if you control all of the producer and consumer implementations (e.g. a video game's first party assets), feel free to put your custom extensions here.

Being before or after the QOIT chunks corresponds to being a critical or ancillary chunk in the PNG spec. Unsupported pre-QOIT chunks (e.g. some sort of Hilbert curve pixel traversal configuration #6 or subtract-green or palette transform) mean that the overall decoding fails, but unsupported post-QOIT chunks (e.g. some sort of thumbnail or modification-time representation) can be ignored.
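The critical / ancillary rule could look roughly like this in a decoder. The function name, the `supported_extensions` parameter and the example FourCCs `hilb` and `thmb` are all hypothetical; this proposal does not define any extensions.

```python
# FourCCs named in this proposal; everything else counts as an extension here.
KNOWN = {b"QOIX", b"COMP", b"TILE", b"ICCP", b"QOIT", b"EXIF", b"XMP "}


def ignorable_chunks(fourccs, supported_extensions=frozenset()):
    """Return the unsupported post-QOIT chunks, which are safe to skip.

    Raises ValueError if an unsupported extension appears before the first QOIT.
    """
    seen_tile = False
    ignored = []
    for fourcc in fourccs:
        if fourcc == b"QOIT":
            seen_tile = True
        elif fourcc in KNOWN or fourcc in supported_extensions:
            continue
        elif seen_tile:
            ignored.append(fourcc)  # ancillary: skip it
        else:
            raise ValueError("unsupported pre-QOIT chunk %r" % fourcc)  # critical
    return ignored
```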

This document does not define any extensions.

@chocolate42

I've always disliked RIFF thanks to avi files (and rarely wav files) hitting filesize limits back in the day (if only they had been a little more forward thinking and used 64 bit even if only in the main header), but stuffing QOI into a RIFF format is still appealing for automatic metadata support at least.

To get around limitations of RIFF the QOIX chunk could have a u32 field specifying how many following tile chunks the data is comprised of, allowing the stream data to exceed 4GiB assuming tools don't break when given a >4GiB file (put all metadata at the beginning if possible). Limiting a tile chunk to 4GiB (possibly 2GiB) is then enough to conform? The QOIX chunk could potentially also store the actual filesize as u64.

It wouldn't take too much thought from there to allow the format to represent any of the following:

  • A single image stored in a single tile
  • A single image stored in multiple tiles, whether by choice or thanks to the tile size limitation
  • Optionally group tiles so that bitstreams and compression can span tiles
  • Optionally have separate tiles for parallel processing
  • Multiple images, all of a sudden it's a lossless I-frame 4:4:4 video format (yes that's a lot of caveats)

@nigeltao
Owner Author

Well, RIFF / IFF is just a sequence of chunks and a chunk is (FourCC, Size32, Payload). It'd be easy (if non-standard) to define a "64 bit IFF" where Size was 8 bytes instead of 4. Something like:

  • Header ("IFF6", FourCC, Size64) and then
  • A sequence of chunks (FourCC, Size64, Payload).

That's it.

We could possibly drop RIFF's "payloads are padded to an even number of bytes", while we're there.

The first byte of "IFF6" might also change to something like 0xEE (Latin-1 "î") to avoid being ASCII or UTF-8, and also to avoid colliding with other formats, e.g. TIFF images, which can start with 0x49 "I".

We could also possibly define "a Size64 of 0xFFFF_FFFF_FFFF_FFFF" means indeterminate (i.e. read to EOF), for the streaming case. Or maybe not. Just thinking out loud.
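To make the "64 bit IFF" idea concrete, here is a thinking-out-loud sketch in the same spirit. Several details are assumptions, since none of this is specified: sizes are little-endian, the header Size64 counts the bytes after the 16-byte header, payloads are not padded, the magic stays as plain "IFF6", and the function names are made up.

```python
import struct

INDETERMINATE = 0xFFFF_FFFF_FFFF_FFFF  # "read to EOF" marker for streaming


def write_iff64(form: bytes, chunks, streaming: bool = False) -> bytes:
    """Serialize ("IFF6", FourCC, Size64) followed by (FourCC, Size64, Payload) chunks."""
    body = b"".join(f + struct.pack("<Q", len(p)) + p for f, p in chunks)
    size = INDETERMINATE if streaming else len(body)
    return b"IFF6" + form + struct.pack("<Q", size) + body


def read_iff64(data: bytes):
    """Return (form, [(fourcc, payload), ...]) from an in-memory file."""
    magic, form, size = struct.unpack_from("<4s4sQ", data, 0)
    if magic != b"IFF6":
        raise ValueError("bad magic")
    end = len(data) if size == INDETERMINATE else 16 + size
    if end > len(data):
        raise ValueError("truncated file")
    pos, chunks = 16, []
    while pos < end:
        if pos + 12 > end:
            raise ValueError("truncated chunk header")
        fourcc, n = struct.unpack_from("<4sQ", data, pos)
        pos += 12
        if pos + n > end:
            raise ValueError("truncated chunk payload")
        chunks.append((fourcc, data[pos:pos + n]))
        pos += n  # no pad byte in this variant
    return form, chunks
```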

@chocolate42

If we break the RIFF spec like that, does it make sense to use RIFF? Using 8 bytes in a RIFF chunk header will presumably break all the tools for metadata etc. that seem like the main reason to use RIFF. Even >4GiB may be a stretch; using RIFF might mean biting the 32 bit bullet by checking width*height*5 + overhead for overflow.
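That overflow check might be as simple as the following sketch. The function name and the `overhead` parameter are placeholders: the actual overhead depends on which headers, trailers and metadata chunks an encoder writes, so the caller has to supply it.

```python
RIFF_MAX_SIZE = 0xFFFF_FFFF  # largest value a u32 size field can hold


def fits_in_riff(width: int, height: int, overhead: int) -> bool:
    """Check the worst case of 5 bytes per pixel plus overhead against the u32 limit."""
    # Python integers do not overflow, so the product can be compared directly.
    return width * height * 5 + overhead <= RIFF_MAX_SIZE
```

In a language with fixed-width integers the multiplication itself would need to be done in 64 bits (or with checked arithmetic) before comparing.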

@nigeltao
Owner Author

You're probably right though that, barring big animations or pathological (noisy) images, 4GiB isn't going to bite in practice.

Still, even if it wouldn't be RIFF, it'd be a very straightforward RIFF-like type-length-value format.

Even if we stuck with official RIFF (with its 4GiB limitation), RIFF+QOI still wouldn't be an AVI, WAV or WEBP file. Are there common RIFF tools that just work on the generic format (as opposed to being e.g. specifically an AVI player or WAV editor)?

@BenBE

BenBE commented Apr 22, 2022

What about just avoiding the nesting of the main chunk of RIFF? This still limits per-chunk data to ~2/4GiB, but causes any implementation to expect reading to EOF. Thus, provided implementations can read larger files, arbitrary file sizes can be supported as long as individual parts of the file fit in 4GiB (which is a good requirement to have even with current amounts of memory available).

This also plays well with the tiling support mentioned above: multiple tiles come for free, as they are just multiple chunks of data.
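A reader for such a flat, un-nested chunk sequence might look like this. It is only a sketch of the idea: the name `iter_flat_chunks` is invented, and keeping RIFF's u32le sizes and even-byte padding is an assumption.

```python
import struct


def iter_flat_chunks(stream):
    """Yield (fourcc, payload) pairs from a binary stream until EOF.

    There is no outer RIFF header, so the total size is never stored in a
    32-bit field; only each individual chunk is limited to 4GiB.
    """
    while True:
        header = stream.read(8)
        if not header:
            return  # clean EOF between chunks
        if len(header) < 8:
            raise ValueError("truncated chunk header")
        fourcc, size = struct.unpack("<4sI", header)
        payload = stream.read(size)
        if len(payload) != size:
            raise ValueError("truncated chunk payload")
        if size & 1:
            stream.read(1)  # assumption: RIFF-style pad byte is kept
        yield fourcc, payload
```

Passing an open file object (or an `io.BytesIO`) works, since only `read` is used.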
