2009.01.09 14:45 "Re: [Tiff] TIFF standards and technical notes", by Gary McGath
I've written the following draft for a post on my File Formats Blog (http://fileformats.blogspot.com) on the status of JPEG encoding in TIFF. I'd like to run it by the people here for comments before making the post public. I copied the text from the preview, so the links to the cited documents, which will appear in the post, are missing here.
TIFF's Catch 22
Section 22 of the TIFF 6.0 specification (PDF), on JPEG compression, has been a subject of ongoing controversy. The problems with it are discussed in Draft Tech Note 2 at remotesensing.org.
The major problem cited is that the tags defined in Section 22 require detailed understanding of the JPEG encoding. Even an application which simply modifies tags in a TIFF file, without any expectation of decoding JPEG codestreams, must parse the data which the JPEG tags refer to in order to preserve it. For example, tag 521, JPEGACTables, points to a list of offsets to Huffman AC tables, whose format is given as follows:
16 BYTES of “BITS”, indicating the number of codes of lengths 1 to 16;
Up to 256 BYTES of “VALUES”, indicating the values associated with
those codes, in order of length.
There is a similarly incomplete description of tag 520 (JPEGDCTables). This isn't sufficient information to determine the length of the table. The best a JPEG-unaware application can do, if it has to move the table, is to allocate 272 bytes for the copy of an AC table or 33 bytes for a DC table. From a quick reading, it looks to me as if all problems with uncertain sizes for JPEG data blocks can be solved by assuming a maximum length, so the problem may be overstated.
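To make the maximum-length approach concrete, here is a minimal Python sketch. The function and constant names are hypothetical and belong to no TIFF library; the file is assumed to be held in memory as a bytes object. The second function goes one step further and relies on the JPEG convention that the 16 BITS counts sum to the number of VALUES bytes, which is an assumption beyond the text quoted above.

```python
# Sketch only: these names are illustrative, not part of any TIFF library.

BITS_LEN = 16                   # one count per code length 1..16
AC_TABLE_MAX = BITS_LEN + 256   # 272 bytes for a JPEGACTables (tag 521) entry
DC_TABLE_MAX = 33               # the figure cited above for a JPEGDCTables (tag 520) entry


def copy_table_conservatively(data: bytes, offset: int, max_len: int) -> bytes:
    """Copy max_len bytes starting at offset, clipped at the end of the file.

    The copy may carry trailing bytes that are not part of the table; that is
    harmless as long as a reader consumes only what the table needs.
    """
    if offset < 0 or offset >= len(data):
        raise ValueError("table offset lies outside the file")
    return data[offset:offset + max_len]


def table_length_from_bits(data: bytes, offset: int) -> int:
    """Exact table length, ASSUMING the BITS counts sum to the VALUES count."""
    bits = data[offset:offset + BITS_LEN]
    if len(bits) < BITS_LEN:
        raise ValueError("truncated BITS array")
    return BITS_LEN + sum(bits)
```

An application that only rewrites tags could call copy_table_conservatively for each offset in the tag's list and write the copies at new offsets. The conservative copy wastes at most a few hundred bytes per table, which is the trade-off for not having to interpret the table contents.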
What is clear, though, is that there's confusion in the TIFF world on how to handle JPEG compression. Tech Note 2 goes on to recommend an alternative specification. Adobe has adopted this alternative (or something very close to it) in its Photoshop TIFF Technical Notes (PDF), but without rescinding Section 22. Undoubtedly there are still Section 22-based TIFF files in use and at least a few still being created.
The basic problem is that Adobe hasn't revised the TIFF specification since 1992. Whatever problems it has look as if they'll remain part of the official standard forever. This is particularly a problem from the standpoint of digital preservation. In principle, "real" TIFF files should follow the standard, not a technical note which (at least in name) is intended for a particular application. But holding to that position isn't realistic when so many files exist that depend on the Photoshop Notes. The notes have to be considered part of the specification.
JHOVE, by the way, deals with both sets of tags.
Digital Library Software Engineer
Harvard University Library Office for Information Systems