2004.05.27 01:15 "[Tiff] large TIFF - two alternatives", by Steve Carlsen

2004.05.27 07:30 "Re: [Tiff] large TIFF - two alternatives", by Andrey Kiselev

Hello, Steve,

Alternative 1: Minimal changes for TIFF to support 8-byte addresses

  1. ID = 43 (or maybe 0x4242?)
  2. 8-byte offset to 0th IFD
  3. Value/Offset fields are 8 bytes
  4. 8-byte offset to the next IFD (does anyone use this?)

Multiple IFDs are used for multipage documents and images which contain overviews. Also when we need to update the directory contents we are just add the new directory at the end of the file. So this feature is in heavy use.

  1. add TIFFType of LONG8, an 8 byte (unsigned) int
  2. StripOffsets and TileOffsets and ByteCounts can be LONG8

I like this approach because of it simplicity. It can be implemented in libtiff without big changes in the code.

Alternative 2: A more modern and general approach

(in the following, any power of two can be used instead of 8)

  1. ID = the string "TIFF2"
  2. 8-byte pointer to 0th IFD/Dictionary
  3. Value/pointer fields are 8 (2**x) bytes
  4. 8-byte child pointers
  5. add an 8-byte TIFF integer type, but it's rarely used explicitly
  6. StripOffsets and TileOffsets and ByteCounts are 8 bytes
  7. support for 4 or 8 or 16 or ... 2**X byte addresses (header tells which one you're using)
  8. use of ASCII tags/keys instead of binary ones
  9. most values are ASCII strings, too: integers, floating point numbers, enums, yielding fewer data types

I think that we must avoid ASCII numbers. It is very easy to fall into localization nightmare. Also some numerical metadata we may want to save into the tags can be quite large (almost as large as image itself) and overhead of ASCII representation will be valuable.

  1. Dictionaries (IFDs) can be really large.

Do you mean that the directory entry can be large enough to hold the any data value?

  1. Cleaner hierarchical structure, with explicit inheritance from parents. top-level dictionary contains metadata only; it does not point to image data.
  2. Explicit support for layers
  3. All data is planar (though it can be interleaved by row or tile)
  4. So there are 4 levels to the hierarchy: Root, Image, Layer, and Channel. Only Channel dictionaries point to image data.
  5. Multi-byte numeric values are always big-endian, not that it matters very often in this more string-oriented approach.
  6. All other rules are normal TIFF rules.

The second approach looks nice in long term. But why not to implement both ones? The first technique will be powered by existing implementations, so we can achieve our goal (large files) shortly.

Andrey

Andrey V. Kiselev
Home phone: +7 812 5274898 ICQ# 26871517