2004.09.16 19:12 "[Tiff] BigTIFF Tag Value Count issue", by Joris Van Damme

2004.09.17 01:54 "Re: [Tiff] BigTIFF Tag Value Count issue", by Bob Friesenhahn

Partially true. Well, completely true, I guess, if memory mapping is involved. However, memory mapping files of a format that is specifically designed to be > 4 gig might not be a good idea. In Windows, you can have a 'window' of a file

Stop that! We are discussing BigTIFF here!!! :-)

There was a time when 640K seemed like a hard limit, but that time is long past. Yesterday 4GB seemed like a hard limit, but now 64-bit PCs can be purchased for under $1000. In a couple of years, most PCs will come with 4GB of RAM. Years later, people may be cursing the inventors of BigTIFF because they only thought about problems based on past experience and didn't think ahead.

memory mapped, called a 'view' IIRC, limiting stress on the address space, but this, too, will not be a good idea in the BigTIFF case, since TIFF is inherently completely random access, and so with this scheme you might have to swap to another 'window' for just about every new read.
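A minimal sketch of the Windows 'view' mechanism just described, using CreateFileMapping and MapViewOfFile (the file name, window size, and reduced error handling are all illustrative):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    const DWORD windowSize = 64 * 1024 * 1024;  /* map a 64 MB window */

    HANDLE file = CreateFileA("huge.tif", GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return 1;

    /* The mapping object covers the whole file (sizes of 0 mean "entire file")... */
    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
    if (mapping == NULL)
        return 1;

    /* ...but only the window is wired into the address space. The offset
       must be a multiple of the system allocation granularity. */
    ULONGLONG offset = 0;
    const BYTE *view = (const BYTE *)MapViewOfFile(mapping, FILE_MAP_READ,
                                                   (DWORD)(offset >> 32),
                                                   (DWORD)(offset & 0xFFFFFFFFu),
                                                   windowSize);
    if (view == NULL)
        return 1;

    printf("first byte: %u\n", view[0]);

    /* Reading outside the window means unmapping and remapping elsewhere -
       the per-read 'swap' cost the text above worries about. */
    UnmapViewOfFile(view);
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}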

TIFF may be random access, but it can hardly be described as "completely random access". Raster image pixel data is usually ordered in a contiguous fashion.

64-bit CPUs are readily able to memory-map huge files, limited more by the design of the MMU than by anything else.

As a sidenote, slightly off topic as to this particular BigTIFF issue, but on topic as to the memory mapping... I've never quite understood why anyone would want to memory-map a TIFF file for use in e.g. LibTiff. I believe that there is no speed or system advantage. The main advantage of memory mapping, I believe, is merely the convenience of accessing memory instead of having to do I/O all the time, which makes for slightly less troublesome code logic... but if your codec is built to accommodate all this I/O, why would one then essentially wire I/O routines to accommodate memory mapping?
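(For reference: libtiff itself leaves this choice to the caller. If I recall the TIFFOpen mode letters correctly, 'M' requests and 'm' suppresses memory-mapped access for read-only files; a minimal sketch, with the file name illustrative:)

#include <tiffio.h>

int main(void)
{
    /* 'M' asks libtiff to memory-map the file for read-only access. */
    TIFF *mapped = TIFFOpen("image.tif", "rM");

    /* 'm' suppresses mapping, so all access goes through the regular
       read/seek I/O procs instead. */
    TIFF *unmapped = TIFFOpen("image.tif", "rm");

    if (mapped)
        TIFFClose(mapped);
    if (unmapped)
        TIFFClose(unmapped);
    return 0;
}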

With most modern operating systems, memory-mapped accesses are significantly faster than stdio-based or read/write access, particularly if the data is accessed more than once. The limitation on this is how effective the system is at forgetting mapped pages that are no longer needed. If unneeded pages are not purged when they should be, the system tends to choke and "thrash" once all physical memory has been consumed. Most OSs (even Windows) provide a call to tell the system that certain pages are no longer required. I have done quite a bit of investigation and benchmarking in this area and have learned that some OSs do tremendously better than others. Linux was among those that fared the worst.
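On POSIX systems, the "no longer required" call referred to above is madvise() (posix_madvise() in the standard); a minimal sketch under that assumption, with the file name and access pattern purely illustrative:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("huge.tif", O_RDONLY);
    if (fd < 0)
        return 1;

    struct stat st;
    if (fstat(fd, &st) != 0)
        return 1;

    unsigned char *base = mmap(NULL, (size_t)st.st_size, PROT_READ,
                               MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return 1;

    /* Touch the first page: the kernel faults it in from the page cache. */
    printf("first byte: %u\n", base[0]);

    /* Done with this region: hint that its pages may be discarded, so a
       long pass over the file does not slowly evict everything else. */
    long pagesize = sysconf(_SC_PAGESIZE);
    madvise(base, (size_t)pagesize, MADV_DONTNEED);

    munmap(base, (size_t)st.st_size);
    close(fd);
    return 0;
}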

Or am I wrong in thinking there is no actual speed/system/cache/management advantage?

There is always an advantage when the natural filesystem caching is used directly and data copies and function calls are minimized.

Option 0: stick with 4-byte tag count members (alignment, tag data <= 4 gig, Frank's choice)
Option 1: make it an 8-byte tag count (no alignment, tag data > 4 gig allowed)
Option 2, courtesy of Bob: make it an 8-byte tag count and add 4 padding bytes (alignment, tag data > 4 gig allowed)
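To make the three candidate layouts concrete, here is how the IFD entry might look for each (struct and field names are hypothetical, the offset comments describe the intended on-disk layout, the placement of the option 2 padding is one reading of the proposal, and a real reader/writer would serialize fields explicitly rather than rely on compiler struct layout):

#include <stdint.h>

/* Option 0: 4-byte count. 16 bytes per entry; the 8-byte value/offset
   field lands on an 8-byte boundary, but per-tag data is capped at 4 gig. */
struct BigTiffEntry_Opt0 {
    uint16_t tag;
    uint16_t type;
    uint32_t count;        /* at offset 4: values limited to 4 gig */
    uint64_t value_offset; /* at offset 8: aligned */
};

/* Option 1: 8-byte count. 20 bytes per entry; neither wide field sits
   on an 8-byte boundary. */
struct BigTiffEntry_Opt1 {
    uint16_t tag;
    uint16_t type;
    uint64_t count;        /* at offset 4: unaligned */
    uint64_t value_offset; /* at offset 12: unaligned */
};

/* Option 2: 8-byte count plus 4 padding bytes. 24 bytes per entry; both
   wide fields are aligned at the cost of 4 dead bytes per entry. */
struct BigTiffEntry_Opt2 {
    uint16_t tag;
    uint16_t type;
    uint32_t pad;          /* reserved, write as zero */
    uint64_t count;        /* at offset 8: aligned */
    uint64_t value_offset; /* at offset 16: aligned */
};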

Can I put you down as an option 2 vote, or were you merely signalling the option?

Sure, there should always be a second option so put me down for option 2. :-)

Bob

======================================
Bob Friesenhahn
bfriesen@simple.dallas.tx.us
http://www.simplesystems.org/users/bfriesen