2022.06.11 07:23 "Re: [Tiff] How to read JPEG-compressed YCbCr TIFFs", by Joris Van Damme

Amyspark,

> That implies a conversion to RGB. We need the fully decoded, uncompressed YCbCr data, hence why I've been investigating how to make it happen.

I get that, and I'm unable to help, but I can clarify what to expect from LibTiff in this case and how this came about.

It is certainly true that TIFF, as in, the file format itself, supports subsampling, independent of compression mode. In fact, with the later arrival of various flavours of CIE L*a*b* encoding, even though these aren't in the original specification, it would be correct to say that TIFF supports subsampling independent of colour space, as long as it makes some sense (as in, you really shouldn't be subsampling RGB, you do need a separate brightness channel followed by two chroma channels).

This being said, nobody was ever using that feature before the arrival of JPEG compression inside TIFF. That got implemented in LibTiff by delegating JPEG compression and decompression to LibJpeg. And so there was this extra layer of complication and confusion around subsampling, as the subsampling byte-level layout was very different in LibJpeg, inspired by the JPEG file format, compared to the subsampling byte-level layout in LibTiff as it would have been inspired by the TIFF format.

From an engineering perspective, what was required was conversion code to go between the two different layouts. That way, all further resolution of data organisation and colour conversion could proceed along a generalized path, and at least that way that generalized path would have been complete and there would have been incentive to maintain it. From a pragmatic software development perspective, though, it hardly seemed worth all that trouble. It was way easier to input RGB to the LibJpeg subcodec when compressing, and output RGB from it when decompressing, forcing LibJpeg to resolve all the subsampling/desubsampling issues and colour conversions. In LibJpeg, the code was already there.
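
To make the kind of conversion code meant here a bit more concrete, below is a minimal sketch in C. It is not taken from LibTiff or LibJpeg; the function name pack_ycbcr_units and its parameters are made up for illustration. It assumes the JPEG side delivers one separate plane per component with the chroma planes already downsampled, that samples are 8 bits, and that the image dimensions are exact multiples of the subsampling factors. It packs those planes into the data-unit organisation the TIFF specification describes for subsampled YCbCr: for each block, the h*v luma samples in row order, followed by one Cb and one Cr sample.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of LibTiff or LibJpeg.
 *
 * y      : full-resolution luma plane, width * height samples
 * cb, cr : chroma planes, (width / h) * (height / v) samples each
 * h, v   : horizontal and vertical subsampling factors
 * out    : receives (width / h) * (height / v) * (h * v + 2) bytes
 *
 * Assumes 8-bit samples and width/height that are multiples of h/v.
 * Returns the number of bytes written, or 0 if the dimensions do not
 * satisfy that assumption. */
size_t pack_ycbcr_units(const uint8_t *y, const uint8_t *cb, const uint8_t *cr,
                        size_t width, size_t height, size_t h, size_t v,
                        uint8_t *out)
{
    if (h == 0 || v == 0 || width % h != 0 || height % v != 0)
        return 0;

    size_t cw = width / h;   /* chroma plane width  = blocks per row */
    size_t ch = height / v;  /* chroma plane height = block rows     */
    size_t n = 0;

    for (size_t by = 0; by < ch; by++) {
        for (size_t bx = 0; bx < cw; bx++) {
            /* The h*v luma samples of this block, row by row. */
            for (size_t dy = 0; dy < v; dy++)
                for (size_t dx = 0; dx < h; dx++)
                    out[n++] = y[(by * v + dy) * width + (bx * h + dx)];
            /* Then the single Cb and Cr sample shared by the block. */
            out[n++] = cb[by * cw + bx];
            out[n++] = cr[by * cw + bx];
        }
    }
    return n;
}
```

For a 4x2 image with 2x2 subsampling this writes two data units of six bytes each. A real implementation would also have to deal with partial blocks at the right and bottom edges, and with the reverse direction for compression, which is part of why the effort was easy to postpone.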

Of course, that leads to inconsistencies on higher levels. Suddenly, the photometric interpretation value wasn't to be taken literally in some cases, depending on whether compression mode was JPEG, and neither were the subsampling values. Decompressed strip/tile access to subsampled data was never going to work in these cases the way it was supposed to in the general case, and so on. These inconsistencies in turn have led to many bugs and problems over the years.

At one point I designed the current old-style JPEG decoder (that next got maintained to death, and subsequently disabled by default, last I heard). I decided to do the right thing in that instance, especially since it was only supposed to decode anyway. I did not add support for the 'pseudotag' or any other special-case hacks, but added code that did the conversion from LibJpeg subsampled YCbCr output to the subsampled organisation LibTiff was expecting, insofar as there was any support for it at all. Of course some people complained, as suddenly they had to deal with YCbCr and subsampling, whereas before they got away with ignoring it. As LibTiff at that point didn't have much support for (de)subsampling, they were right to complain from a usability standpoint.

I sort of hoped my contribution could be the starting point for more generalized subsampling support, as well as YCbCr support, independent of one another and independent of compression mode as is the spirit of the TIFF file format. Turns out the application was way too niche though, and the work required to move forward from that point just wasn't worth it to anyone.

So there you are. I get your code design. I did the same at one point. You are correct in trying to get to that particular data the way you want it, given the way you're using LibTiff. But there's no way it's ever going to work. Your only option is to have a separate path for JPEG compressed subsampled data, using the "TIFFTAG_JPEGCOLORMODE" pseudotag, pretending it's actually non-subsampled RGB, as does LibTiff internally. The data you're looking for, the way you want it, is not actually there at any point in the inconsistent internal decoder design.

Best regards,

Joris Van Damme
AWare Systems