2006.09.22 11:14 "Re: [Tiff] Status of ISO JBIG and CIELAB JPEG support", by Joris Van Damme

Wandering LibTIFF User,

As a regular HylaFAX user, and someone who is regularly called on to do image/document processing more often than my job description calls for, I've been trying to standardise my libtiff installs to provide as wide-ranging support as I can.

To this end, I've studied the patches provided by Lee and others to provide the necessary added functionality to jpegsrc6b for CIE/ITULAB JPEG support, as well as the patches to bring that support into LibTIFF.

I'll state the obvious first, since nobody seems to notice it. The LibJpeg library includes a definition for an 'unknown' color space (JCS_UNKNOWN). If you set both the JPEG color space and the output color space to this value at the right point in the decompression cycle, LibJpeg returns exactly the color data that is in the file. This mechanism allows you to do any required color conversions outside of the library. Thus, you can retrieve any data in any color mode from a JPEG file; LibJpeg doesn't actually have to know ITULAB. This mechanism is well documented, too.

Imagine having 27 codecs around for 27 file formats. Are you going to manage 27 different color conversion modules? Debug all 27 of them? Only to find next that there are unavoidably inconsistencies? Only to find next that you're converting from A to B when decoding with one codec, merely to convert back from B to A when writing the data with another? Only to find next that a conversion from A to B, followed by a conversion from B to C to suit your purposes, accumulates conversion errors, where a direct conversion from A to C would do a lot better? When adding support for ITULAB to 10 of these 27 codecs, are you planning to do 10 separate difficult jobs? Only to find next that it's hard to maintain, as these 10 are modifications to open source libraries and you'll need to merge your changes and the public changes time and time again?

It seems clear that color conversion isn't really correctly placed inside a codec. Instead of having 27 such color conversion modules, codecs should normally be made to return exactly what is there, and any conversion should be managed by the image processing layer.

OK... I can feel another lengthy discussion coming, so disregard the 'should's in the above. Let's not make this religious, and return to the more practical level. My main point is that you don't actually need to patch LibJpeg to be able to retrieve and use the ITULAB color data, and that it's actually most practical and convenient to do this without any LibJpeg patch at all.

[cd]jpeg and jpegtran seem to work fine. Alas, I don't have any ITU/CIELAB JPEG files (that aren't in TIFFs) to fully test out the library - does anyone have ptrs to such files?

I've some, or at the very least I had some... I'll try and find them and get back to you later today.

<http://bugzilla.remotesensing.org/show_bug.cgi?id=736>

It is dangerous trying to build support for something based on a bugzilla attachment, as many a bugzilla attachment is totally berserk. Here's a tagdump.

SubFileType (1 Long): Page
ImageWidth (1 Short): 1728
ImageLength (1 Short): 2128
BitsPerSample (3 Short): 8, 8, 8
Compression (1 Short): JPEG Technote #2
Photometric (1 Short): CIELab
ImageDescription (11 ASCII): 3604278160

Make (62 ASCII): LT V.92 1.0 MT5634ZBA-V92 SERIAL DATA/FAX MOD...
Model (62 ASCII): LT V.92 1.0 MT5634ZBA-V92 Serial Data/Fax Mod...

StripOffsets (1 Long): 8
Orientation (1 Short): TopLeft
SamplesPerPixel (1 Short): 3
RowsPerStrip (1 Long): 4294967295
StripByteCounts (1 Long): 133746
XResolution (1 Rational): 204
YResolution (1 Rational): 196
PlanarConfig (1 Short): Contig
ResolutionUnit (1 Short): Inch
Software (27 ASCII): HylaFAX (tm) Version 4.2.1
DateTime (20 ASCII): 2005:01:21 13:48:59
HostComputer (16 ASCII): gollum.x101.com
YCbCrSubsampling (2 Short): 2, 2
FaxRecvParams (1 Long): 2203689
FaxRecvTime (1 Long): 89

According to the TIFF 6.0 specification, the YCbCrSubsampling tag is just for YCbCr, and should not be combined with any other color space. A decoder that follows this convention would ignore the tag in this case, and thus assume there is no subsampling inside the JPEG data. According to some, we ought instead to extend the subsampling support to any color space that consists of one brightness channel followed by two chromaticity channels. This is not far-fetched; after all, we need to 'logically extend' the TIFF 6.0 spec in many areas these days, so such an extension is not uncommon. A decoder that follows this second convention would expect [2,2] subsampling inside the JPEG data.

(The real tragedy is of course that the YCbCrSubsampling default is [2,2]. So a writer that feels no subsampling scheme applies to the color mode, and thus does not emit the tag, is misinterpreted by a reader that feels subsampling should apply and thus takes the missing tag to mean the default [2,2]. A more logical choice of default, i.e. [1,1], would have avoided this situation.)
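To make the two reader conventions concrete, here is a minimal Python sketch. The function name `effective_subsampling` and the `extend_to_lab` switch are hypothetical names chosen for illustration; they are not part of LibTIFF. The Photometric values 6 (YCbCr) and 8 (CIELab) and the [2,2] default come from the TIFF 6.0 specification.

```python
from typing import Optional, Tuple

PHOTOMETRIC_YCBCR = 6    # TIFF 6.0 PhotometricInterpretation value
PHOTOMETRIC_CIELAB = 8   # TIFF 6.0 PhotometricInterpretation value
TIFF_DEFAULT_SUBSAMPLING = (2, 2)  # default of the YCbCrSubsampling tag


def effective_subsampling(
    photometric: int,
    tag_value: Optional[Tuple[int, int]],
    extend_to_lab: bool,
) -> Tuple[int, int]:
    """Return the subsampling a reader would assume for the JPEG data.

    tag_value is the YCbCrSubsampling tag, or None when the tag is absent.
    extend_to_lab=False models the strict TIFF 6.0 reading (the tag only
    applies to YCbCr); extend_to_lab=True models the 'logical extension'
    to CIELab (hypothetical switch, for illustration only).
    """
    applies = photometric == PHOTOMETRIC_YCBCR or (
        extend_to_lab and photometric == PHOTOMETRIC_CIELAB
    )
    if not applies:
        return (1, 1)  # tag ignored, so no subsampling is assumed
    if tag_value is None:
        return TIFF_DEFAULT_SUBSAMPLING  # missing tag means the default
    return tag_value
```

A writer that omits the tag for a non-subsampled CIELab image is read as (1, 1) by the strict reader, but as (2, 2) by the extending reader, which is exactly the misinterpretation described above.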

So let's take a look. Here's a dump of the JPEG data in the single strip.

Offset 0 (2 bytes): SOI (Start of image)

Offset 2 (14 bytes): APP1 (Application segment)

Offset 16 (420 bytes): DHT (Define Huffman table(s))

Offset 436 (134 bytes): DQT (Define quantization table(s))

Offset 570 (19 bytes): SOF0 (Start of frame, non-differential, Huffman, Baseline DCT)

Offset 589 (14 bytes): SOS (Start of scan)

Offset 603 (133135 bytes): Entropy-coded data

Offset 133738 (6 bytes): DNL (Define number of lines)

Offset 133744 (2 bytes): EOI (End of image)
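A listing like the one above can be produced by walking the marker segments of the strip. The following is a minimal Python sketch, not the tool that generated the dump; `scan_segments` is a hypothetical name, and the name table only covers the markers that occur in this file. It assumes a single-scan baseline stream and reports sizes that include the two marker bytes, as the dump does.

```python
import struct

# Only the markers seen in this dump; others are reported by their hex code.
NAMES = {0xD8: "SOI", 0xD9: "EOI", 0xC0: "SOF0", 0xC4: "DHT",
         0xDB: "DQT", 0xDA: "SOS", 0xDC: "DNL", 0xE1: "APP1"}


def _is_marker(data: bytes, pos: int) -> bool:
    """True if a real marker starts at pos (not a stuffed 0xFF00, not a
    fill byte, and not a restart marker RST0..RST7)."""
    nxt = data[pos + 1]
    return data[pos] == 0xFF and nxt not in (0x00, 0xFF) and not 0xD0 <= nxt <= 0xD7


def scan_segments(data: bytes):
    """Return a list of (offset, size, name) tuples for a JPEG stream."""
    out = []
    pos = 0
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        code = data[pos + 1]
        name = NAMES.get(code, "0x%02X" % code)
        if code in (0xD8, 0xD9):  # SOI and EOI carry no length field
            out.append((pos, 2, name))
            pos += 2
            if code == 0xD9:
                break
            continue
        if pos + 4 > len(data):
            raise ValueError("truncated segment at offset %d" % pos)
        (length,) = struct.unpack_from(">H", data, pos + 2)
        size = 2 + length  # the length field counts itself, not the marker
        out.append((pos, size, name))
        pos += size
        if code == 0xDA:  # entropy-coded data follows the SOS header
            end = pos
            while end + 1 < len(data) and not _is_marker(data, end):
                end += 1
            if end + 1 >= len(data):
                end = len(data)  # stream ended without a further marker
            out.append((pos, end - pos, "Entropy-coded data"))
            pos = end
    return out
```

Feeding it the raw bytes of the strip (StripByteCounts bytes read at StripOffsets) should give offsets and sizes in the same shape as the listing above.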

The first obvious thing that is out of the ordinary is the DNL marker. This DNL is part of a mechanism that allows you to keep dumping scanlines sequentially into a JPEG file, without prior knowledge of how many scanlines you're about to process. It is thus popular in fax transmission. The length of the image is defined as 0 in the SOF marker that precedes the image data, and the actual image length is appended at the end, inside the DNL marker, when the image has been processed and its length is finally known.

LibJpeg doesn't support the DNL scheme, and it's not valid inside JPEG compressed data inside TIFF anyway. What Howard did at the time was 'patch up' the SOF marker to include the proper image length. Strictly speaking, this is now a totally invalid JPEG stream. But most implementations, LibJpeg included, can interpret the stream just fine and simply ignore the stray DNL marker.

(Same applies to the APP1 marker. It shouldn't be there inside JPEG compressed data inside TIFF. But it just gets ignored so there's no actual problem.)
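The SOF patch-up described above could look roughly like this in Python. This is only a sketch of the idea, not the code Howard wrote; `patch_height_from_dnl` is a hypothetical name. It assumes a baseline (SOF0) stream with a single scan, copies the line count from the DNL segment into the Y field of the SOF header when that field is 0, and leaves the DNL marker in place, as in the file analysed here.

```python
import struct


def patch_height_from_dnl(jpeg: bytes) -> bytes:
    """Copy the line count from DNL into a SOF0 header whose Y field is 0."""
    buf = bytearray(jpeg)
    if buf[:2] != b"\xff\xd8":
        raise ValueError("stream does not start with SOI")
    pos = 2          # skip SOI
    sof_y = None     # offset of the 16-bit Y field inside SOF0
    scan_start = None
    while pos + 4 <= len(buf):
        if buf[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        code = buf[pos + 1]
        (length,) = struct.unpack_from(">H", buf, pos + 2)
        if code == 0xC0:
            # Layout: marker (2), Lf (2), P (1), then Y (2)
            sof_y = pos + 5
        pos += 2 + length
        if code == 0xDA:   # entropy-coded data starts after the SOS header
            scan_start = pos
            break
    if sof_y is None or scan_start is None:
        raise ValueError("no SOF0 followed by SOS found")

    (height,) = struct.unpack_from(">H", buf, sof_y)
    if height != 0:
        return bytes(buf)  # height already known, nothing to patch

    # Inside entropy-coded data 0xFF is only followed by 0x00 or RSTn,
    # so the first 0xFF 0xDC after the scan header is the DNL marker.
    dnl = buf.find(b"\xff\xdc", scan_start)
    if dnl < 0 or dnl + 6 > len(buf):
        raise ValueError("height is 0 but no DNL segment was found")
    # Layout: marker (2), Ld (2), then NL (2)
    (lines,) = struct.unpack_from(">H", buf, dnl + 4)
    struct.pack_into(">H", buf, sof_y, lines)
    return bytes(buf)
```

The result has the same length as the input, so StripByteCounts in the IFD would not need to change.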

The same SOF marker also includes the subsampling values, so next, here's a dump of that marker's data.

P (Sample precision): 8
Y (Number of lines): 2128
X (Number of samples per line): 1728
Nf (Number of image components in frame): 3
Component 0
    C (Component identifier): 0
    H (Horizontal sampling factor): 2
    V (Vertical sampling factor): 2
    Tq (Quantization table destination selector): 0
Component 1
    C (Component identifier): 1
    H (Horizontal sampling factor): 1
    V (Vertical sampling factor): 1
    Tq (Quantization table destination selector): 1
Component 2
    C (Component identifier): 2
    H (Horizontal sampling factor): 1
    V (Vertical sampling factor): 1
    Tq (Quantization table destination selector): 1

This confirms my memory that Howard patched up the Y member of the SOF marker. It also indicates that subsampling is indeed used, and the subsampling values in the IFD are correct.
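For those who want to repeat this check on their own files, here is a minimal Python sketch that decodes a SOF0 segment and derives the subsampling pair to compare against the YCbCrSubsampling tag. The names `parse_sof0` and `chroma_subsampling` are hypothetical. The sketch assumes the first component is the brightness channel and that its sampling factors are whole multiples of the chromaticity ones.

```python
import struct
from typing import List, NamedTuple, Tuple


class Component(NamedTuple):
    ident: int   # C, component identifier
    h: int       # H, horizontal sampling factor
    v: int       # V, vertical sampling factor
    tq: int      # Tq, quantization table selector


class Frame(NamedTuple):
    precision: int
    lines: int
    samples_per_line: int
    components: List[Component]


def parse_sof0(segment: bytes) -> Frame:
    """Decode a SOF0 segment that starts at its 0xFFC0 marker."""
    marker, _length, precision, lines, width, count = struct.unpack_from(
        ">HHBHHB", segment, 0
    )
    if marker != 0xFFC0:
        raise ValueError("not a SOF0 segment")
    components = []
    for i in range(count):
        ident, hv, tq = struct.unpack_from(">BBB", segment, 10 + 3 * i)
        # H is the high nibble, V the low nibble of the same byte
        components.append(Component(ident, hv >> 4, hv & 0x0F, tq))
    return Frame(precision, lines, width, components)


def chroma_subsampling(frame: Frame) -> Tuple[int, int]:
    """Return (horizontal, vertical) subsampling of the two chroma channels."""
    first, *chroma = frame.components
    ratios = {(first.h // c.h, first.v // c.v) for c in chroma}
    if len(ratios) != 1:
        # Also raised for single-component (grayscale) frames
        raise ValueError("chroma components do not share one subsampling")
    return ratios.pop()
```

For the SOF data dumped above, the sampling factors 2x2, 1x1, 1x1 yield (2, 2), which is what the YCbCrSubsampling tag in the IFD claims.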

So... am I being bad here by trying to use these features in newer libraries than Lee originally patched? Which version of libtiff (and which patches) provide support for both the CIELAB JPEG files and the JBIG files? (I have the latest release of Markus's JBIG-KIT compiled and installed on the system.)

I'm unable to answer these questions; the best I can do is provide theoretical background and file analysis. I will also try and find the ITULAB JPEG images that at least at one time were part of my test image library.

I hope this was somewhat helpful, though. Perhaps someone else will jump in to answer your bottom-line questions.

Best regards,

Joris Van Damme
info@awaresystems.be
http://www.awaresystems.be/
Download your free TIFF tag viewer for windows here:
http://www.awaresystems.be/imaging/tiff/astifftagviewer.html