2007.07.05 07:24 "[Tiff] ANN: first preliminary version BigTIFF upgrade of LibTiff", by Joris Van Damme

Folks,

The BigTIFF upgrade of LibTiff is available in CVS for the brave and curious.

At this stage, it's *only* for the brave and curious, though. It has not been sufficiently tested. A number of tools and contribs have not been updated yet.

Whilst backwards compatibility has been our holy grail, there are some extensions and even some changes you need to be aware of. Here are some of them:

The options parameter in the TIFFOpen and TIFFClientOpen funcs has been extended. When creating new files, you can add option '4' to specify you want to create a ClassicTIFF file, though that is the default and the option is not strictly necessary. (As such, old calling code will continue to function and create ClassicTIFF files.) Or you can add option '8' to specify you want to create a BigTIFF file instead. This new option is also reflected in some of the tools we already upgraded. For instance, you can use the -8 option on tiffcp to have tiffcp produce BigTIFF files instead of the default ClassicTIFF.
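As a rough illustration of the new option, the sketch below picks the mode string for a file that is about to be created. The helper name pick_write_mode, the size estimate, and the 16 MiB safety margin are assumptions made for this example and are not part of LibTiff; only the '8' option itself comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* ClassicTIFF stores file offsets in 32 bits, so a file cannot grow past 4 GiB. */
#define CLASSIC_TIFF_LIMIT UINT64_C(0xFFFFFFFF)

/*
 * Hypothetical helper: choose the mode string to hand to TIFFOpen when
 * creating a file. "w" gives the default ClassicTIFF; "w8" adds the new
 * '8' option and asks for BigTIFF. The margin reserved for the header and
 * tag data is an arbitrary value chosen for illustration.
 */
static const char *pick_write_mode(uint64_t estimated_pixel_bytes)
{
    const uint64_t margin = UINT64_C(16) * 1024 * 1024; /* assumed 16 MiB */

    assert(margin < CLASSIC_TIFF_LIMIT);
    if (estimated_pixel_bytes > CLASSIC_TIFF_LIMIT - margin)
        return "w8";
    return "w";
}
```

Calling code would then pass the result straight through, along the lines of TIFFOpen("out.tif", pick_write_mode(bytes)). Old code that always passes "w" keeps producing ClassicTIFF.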
Whilst one additional option is provided for version selection when creating new files, no such option is necessary when reading TIFF files. LibTiff reads both ClassicTIFF and BigTIFF, and the application does not need to know which TIFF version an opened file is.
Although the tag count in BigTIFF is 64-bit, we restricted the count in the implementation to a much more reasonable size. This is necessary in the current implementation because all tag data gets read automatically in the IFD reading stage, so half a dozen private tags with multiple gigabytes of data would cause considerable overhead even if the application is never interested in these tags. Our choice to ignore tags with data longer than a certain sanity value is much needed as things stand. We also recommend against writing tiles that are 8 kilobytes in their uncompressed form, or single-line strips, in really big files, as that results in millions of tiles or strips. It's much more efficient to choose bigger tile or strip sizes, up to several megabytes if needed, and have a few thousand tiles or strips instead.
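To make the strip size advice concrete, here is a minimal sketch of the arithmetic. Both function names and the idea of a fixed target size per strip are assumptions for illustration; LibTiff does not provide these helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick a rows-per-strip value so that one
 * uncompressed strip is roughly target_bytes, clamped to [1, image_rows].
 */
static uint32_t rows_per_strip_for(uint64_t bytes_per_row,
                                   uint32_t image_rows,
                                   uint64_t target_bytes)
{
    uint64_t rows;

    assert(bytes_per_row > 0 && image_rows > 0);
    rows = target_bytes / bytes_per_row;
    if (rows < 1)
        rows = 1;
    if (rows > image_rows)
        rows = image_rows;
    return (uint32_t)rows;
}

/* Number of strips needed to cover the image (ceiling division). */
static uint64_t strip_count(uint32_t image_rows, uint32_t rows_per_strip)
{
    assert(rows_per_strip > 0);
    return ((uint64_t)image_rows + rows_per_strip - 1) / rows_per_strip;
}
```

For an image of 1,000,000 rows at 300,000 bytes per row, single-line strips give 1,000,000 strips, whereas a 4 MiB target gives 13 rows per strip and 76,924 strips. The chosen value would typically be set through the RowsPerStrip tag before writing.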
Although it's rare, some application code does directly access file offsets. Some of this code is upgraded automatically because it used the toff_t type; other code needs to be aware that the datatype changed and needs to start using toff_t or uint64. This impacts access to tags like the EXIF IFD tag, for example, or the SubIFDs tag, or StripOffsets or TileOffsets.
Although it's rare, some application code does use structures like TIFFHeader or TIFFDirEntry that used to be an exact binary representation of TIFF structures. Such code needs to change. The old TIFFHeader structure is replaced by the new TIFFHeaderClassic, TIFFHeaderBig, and TIFFHeaderCommon structures, which are an exact binary representation of the ClassicTIFF header, the BigTIFF header, and the part that is common to both. There is no new equivalent for the old TIFFDirEntry structure (or more precisely, there is still a TIFFDirEntry structure, but it is changed, moved to a library-private definition, and no longer an exact binary representation of the tag structure of either TIFF version).
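For readers who used the old header structure to sniff files, the following sketch shows how the two header layouts differ, working on raw bytes rather than on the LibTiff structures. ParsedHeader, read_uint and parse_tiff_header are hypothetical names invented for this example; the layout it assumes is the published one, with version 42 and a 32-bit first IFD offset for ClassicTIFF, and version 43, an offset size of 8, a reserved zero word and a 64-bit first IFD offset for BigTIFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a LibTiff structure. */
typedef struct {
    int is_big;          /* 0 = ClassicTIFF, 1 = BigTIFF */
    uint64_t first_ifd;  /* file offset of the first IFD */
} ParsedHeader;

/* Read an n-byte unsigned integer in the byte order of the file. */
static uint64_t read_uint(const unsigned char *p, size_t n, int big_endian)
{
    uint64_t v = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Always consume the most significant byte first. */
        size_t idx = big_endian ? i : n - 1 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}

/* Returns 0 on success, -1 if the bytes do not look like a TIFF header. */
static int parse_tiff_header(const unsigned char *buf, size_t len,
                             ParsedHeader *out)
{
    int big_endian;
    uint64_t version;

    assert(buf != NULL && out != NULL);
    if (len < 8)
        return -1;
    if (buf[0] == 'I' && buf[1] == 'I')
        big_endian = 0;
    else if (buf[0] == 'M' && buf[1] == 'M')
        big_endian = 1;
    else
        return -1;

    version = read_uint(buf + 2, 2, big_endian);
    if (version == 42) {                 /* ClassicTIFF: 8-byte header */
        out->is_big = 0;
        out->first_ifd = read_uint(buf + 4, 4, big_endian);
        return 0;
    }
    if (version == 43) {                 /* BigTIFF: 16-byte header */
        if (len < 16)
            return -1;
        if (read_uint(buf + 4, 2, big_endian) != 8)  /* offset size */
            return -1;
        if (read_uint(buf + 6, 2, big_endian) != 0)  /* reserved word */
            return -1;
        out->is_big = 1;
        out->first_ifd = read_uint(buf + 8, 8, big_endian);
        return 0;
    }
    return -1;
}
```

Parsing field by field like this avoids depending on structure padding, which is one reason to prefer it over overlaying a structure on the file bytes.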
Although there is still a functional definition for types like toff_t (file offset), tstrip_t (strip index number), etc., we recommend against using these in newer code. We have learned that it is next to impossible to use these consistently and truly abstract away the binary format of these types. Instead, at a certain level we always end up doing casts anyway, and taking the exact binary format into account, so these types are nothing but dangerously misleading and obfuscating. You do not need to update calling code that uses them, as 99.9% of such code will continue to work. But we recommend against using them in newer calling code, and we started replacing them in the library with types whose binary size is explicit, like uint16 and uint32.
We do use and will continue to use one functional type that is an exception to the above rule, namely tmsize_t. This is a signed memory size type, i.e. it is int32 on 32-bit machines and int64 on 64-bit machines.
Sizer functions, like TIFFTileSize or TIFFScanlineSize and the like, return a tmsize_t value. This is because we figure 98% of the calling code uses the return value as a size in allocations and the like. So any overflow that is theoretically possible with BigTIFF when LibTiff is running on a 32-bit system is best detected inside the sizer functions, and it is best to return a type that makes sense as a memory size. If your calling code is the exception and is interested in the actual size in the file, you are better off using the newer TIFFTileSize64 or TIFFScanlineSize64 functions, which return a uint64 value.
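The idea of detecting overflow inside the sizer can be sketched as follows. This is not the LibTiff implementation: mem_size_t is a stand-in for tmsize_t that assumes ptrdiff_t has the right width, checked_tile_size is a hypothetical function, and returning 0 on overflow is an error convention assumed for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for tmsize_t: a signed memory size, assumed to match ptrdiff_t. */
typedef ptrdiff_t mem_size_t;

/*
 * Hypothetical sizer: compute width * rows * bytes_per_pixel in 64 bits,
 * then check that the result fits the signed memory size type before
 * handing it to the caller. Returns 0 when it does not fit.
 */
static mem_size_t checked_tile_size(uint32_t width, uint32_t rows,
                                    uint32_t bytes_per_pixel)
{
    uint64_t area;
    uint64_t total;

    assert(bytes_per_pixel > 0);
    area = (uint64_t)width * rows;      /* two 32-bit factors fit in 64 bits */
    if (area > UINT64_MAX / bytes_per_pixel)
        return 0;                       /* product would overflow 64 bits */
    total = area * bytes_per_pixel;
    if (total > (uint64_t)PTRDIFF_MAX)
        return 0;                       /* too large for a memory size here */
    return (mem_size_t)total;
}
```

A caller that allocates a buffer only has to test for a positive result, and never sees a size that was silently truncated on a 32-bit system.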
If you dive into the code, you'll see considerable change in some files like tif_dirread and tif_dirwrite. These changes don't impact backwards compatibility. They are mostly a clean rewrite that already allows somewhat more robust reading of the unexpected and will also serve future API extension, but that does not impact the current API or functionality in any negative way you need to know about.
Please let us know if you have any trouble at all. Please also let us know if you don't, so that we have an indication it's quite all right already, if indeed it is.

Best regards,

Joris Van Damme
info@awaresystems.be
http://www.awaresystems.be/