Should I bother adding qoi support? My gut feeling is that it's a pretty meh codec that doesn't push any of the existing boundaries. Does anyone use it at all?
It's actually quite annoying how much attention and widespread adoption it's gotten, when far more viable improvements are right there on the table, but no one's interested in them.
For example, I took my 16k×16k test image and saved it in gimp with compression level 0, resulting in a 1 GB file. This presumably applies the png filters, but skips deflate. Then I recompressed that file with general-purpose compressors:
gzip -9: 57 MB, 59.8 s to save, 2.6 s to load.
zstd -18: 43 MB, 25.9 s to save, 0.28 s to load.
xz -9: 39 MB, 17.3 s to save, 0.48 s to load.
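For what it's worth, the zstd side of that experiment needs nothing exotic; here's a minimal sketch in C using the one-shot API at level 18 as above (the function name and error handling are mine, not from any existing codebase):

```c
#include <stdlib.h>
#include <zstd.h>

/* Minimal sketch of the zstd -18 leg of the experiment above:
 * compress an in-memory buffer (e.g. the filtered-but-stored png
 * data) with zstd's one-shot API. Function name and error handling
 * are hypothetical. */
int compress_buffer(const void *src, size_t src_size,
                    void **dst, size_t *dst_size)
{
    size_t bound = ZSTD_compressBound(src_size);
    void *out = malloc(bound);
    if (!out)
        return -1;

    size_t written = ZSTD_compress(out, bound, src, src_size, 18);
    if (ZSTD_isError(written)) {
        free(out);
        return -1;
    }

    *dst = out;
    *dst_size = written;
    return 0;
}
```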
@wolfpld none of those have a specification that fits on one page
@wolfpld IIRC at the time of release it significantly beat all general-purpose compressors on compression speed.
It led to the creation of https://github.com/richgel999/fpng and https://github.com/veluca93/fpnge which are backwards compatible with png and faster than qoi... So it wasn't a waste of time, but I wouldn't support it.
FPNGE slides: https://www.lucaversari.it/FJXL_and_FPNGE.pdf
Old benchmarks (from https://x.com/jonsneyers/status/1483000547934449668):
@dougall If I remember correctly, Rich was toying with png (specifically replacing zlib with zstd) before he stumbled upon qoi.
Are fpnge/fjxl viable at all? Or do they need special support in the decompressor to be fast and not fall into the slow backwards compatibility mode?
The problem with new image formats is that they need widespread adoption to be useful, but somehow we are still stuck with png and jpeg (and maybe some webp here and there).
@wolfpld Yeah, it was topical...
I haven't looked at fjxl.
fpnge is an AVX2 proof-of-concept (no ARM support), so I don't consider it viable, but it is just a compressor. It's not really trying to make decompression faster.
fpng is similar but works on ARM, and has a fast decoder specifically for files it encoded itself. I think this made it strictly more competitive with qoi, but its decoder is only about 10% faster than Wuffs' general-purpose PNG decompression (so I wouldn't bother using it).
@dougall Yeah, the specific use case I'm most interested in is reading png files created with tools I have no control over, so these solutions are not particularly interesting to me. And png decoding is the bottleneck.
@wolfpld @dougall I was able to speed up standard zlib inflate by adding a faster pattern copy. The zlib code works one byte at a time, both to avoid unaligned access and to correctly copy short overlapping patterns. I replaced that with register-sized writes (32/64-bit) and smarter pattern replication. For well-compressed files (i.e., long runs of repeating patterns), decoding is dramatically faster.
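To illustrate the idea, here's a minimal sketch of such a wide match copy (my own reconstruction, not the actual patch): short overlapping patterns are first doubled in place until 8-byte chunks no longer overlap, after which memcpy lowers to single 64-bit loads and stores:

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the idea, not the actual patch: copy an inflate
 * length/distance match in 8-byte chunks instead of one byte at a
 * time. out: next write position, dist: match distance, len: match
 * length, safe_end: 8 bytes before the end of the output buffer, so
 * every wide store below stays in bounds. Returns the new write
 * position. */
static unsigned char *copy_match_wide(unsigned char *out, size_t dist,
                                      size_t len, unsigned char *safe_end)
{
    const unsigned char *src = out - dist;
    unsigned char *end = out + len;

    /* Overlapping match (dist < 8): the output repeats with period
     * dist, so duplicate the pattern until source and destination are
     * at least 8 bytes apart and wide copies become overlap-free. */
    while (dist < 8 && out <= safe_end) {
        memcpy(out, src, dist);        /* one full period, no overlap */
        out += dist;
        if (out >= end)
            return end;                /* slight overshoot lands inside
                                          the buffer and is overwritten
                                          by later output */
        dist = (size_t)(out - src);    /* available period has doubled */
    }

    /* Fast path: register-sized chunks; memcpy lowers to one 64-bit
     * load/store pair. May overshoot end by up to 7 bytes, again
     * staying inside the buffer thanks to safe_end. */
    while (out < end && out <= safe_end) {
        memcpy(out, src, 8);
        out += 8;
        src += 8;
    }

    /* Near the end of the buffer: fall back to the classic
     * byte-at-a-time copy, which handles any overlap. */
    while (out < end)
        *out++ = *src++;
    return end;
}
```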
@fast_code_r_us @dougall Have you looked at how it works in zlib-ng? They're doing SIMD all over the place, so I would imagine there would be no such problems there.
In Arch, you can install the zlib-ng-compat package to replace zlib system-wide, so now I don't even bother building zlib-ng in my projects, especially considering how problematic it is to get libpng to pick up the library you just built instead of the system one.
@wolfpld That code has the right idea, but I went a little further. For my own use, I did away with the "safe" part and instead require the output buffer to be 8 bytes larger than the decoded size, so writes can run past the end. A small speed bump, but I only wrote it for myself.
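In code, that trick amounts to nothing more than padding the allocation (hypothetical sketch, tied to the copy routine sketched earlier):

```c
#include <stdlib.h>

/* Hypothetical caller side of the trick described above: over-allocate
 * the decode buffer by 8 bytes so every wide store in the match copy
 * stays in bounds and the per-copy safe_end check can be dropped. */
unsigned char *alloc_decode_buffer(size_t decoded_size)
{
    return malloc(decoded_size + 8);  /* 8 bytes of write slack */
}
```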