2005.03.30 00:18 "[Tiff] RFC: fast 'copy free' tiff decoding", by Ron

2005.03.30 11:38 "Re: [Tiff] RFC: fast 'copy free' tiff decoding", by Ron

How to handle subsets of the available data is probably the most pressing one: without handling them more than necessary, and without losing data that I have incomplete knowledge about.

May I summarize your goal like this...?

LibTiff offers no generic access to TIFF (Tag) data as it is. Instead, it goes a long way in interpreting the data (checking tag datatype and count, for starters), and then offers tag-specific access that may be substantially different from the original tag data (as is the case with the Colormap tag, or even the count of the BitsPerSample tag). That is fine if your code at some point really needs a convenient single value for the BitsPerSample tag specifically, but it is a bridge too far if you need more generic access to the exact TIFF (Tag) data of any tag, in any TIFF.

What is needed is something like 'two layers' of data access here. The bottom one is the one you propose to add, which simply offers exactly what is there without doing any sort of interpreting. This layer does not even need tag definitions; it is tag ignorant. The next layer is the one we're used to in LibTiff: the layer that is friendly in checking the tag values and interpreting them, offering access to higher-level structures depending on the specific tag in question.
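
To make the 'bottom layer' idea concrete, here is a minimal sketch in Python (not libtiff code, and not a proposed API). It walks the first IFD of a classic TIFF and hands back each entry exactly as stored, without consulting any tag definitions. The names read_raw_entries and build_sample are hypothetical, and the sketch assumes a classic (non-BigTIFF) file and does no bounds checking.

```python
import struct

# Byte sizes of the TIFF 6.0 field types (1=BYTE ... 12=DOUBLE).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1,
              7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def read_raw_entries(buf):
    """Return (tag, type, count, raw_bytes) for each entry of the first IFD.

    Nothing is interpreted: raw_bytes stays in file byte order and the
    count is reported as stored, whatever the tag happens to be.
    """
    order = {b"II": "<", b"MM": ">"}[bytes(buf[:2])]
    magic, ifd_offset = struct.unpack_from(order + "HI", buf, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF")
    (n_entries,) = struct.unpack_from(order + "H", buf, ifd_offset)
    entries = []
    for i in range(n_entries):
        pos = ifd_offset + 2 + 12 * i
        tag, typ, count = struct.unpack_from(order + "HHI", buf, pos)
        unit = TYPE_SIZES.get(typ)
        if unit is None:
            # Unknown field type: hand back the 4-byte value field untouched.
            raw = bytes(buf[pos + 8:pos + 12])
        else:
            size = unit * count
            if size <= 4:
                start = pos + 8  # value is stored inline in the entry
            else:
                (start,) = struct.unpack_from(order + "I", buf, pos + 8)
            raw = bytes(buf[start:start + size])
        entries.append((tag, typ, count, raw))
    return entries


def build_sample():
    """Build a tiny little-endian TIFF with two IFD entries (illustration only)."""
    header = b"II" + struct.pack("<HI", 42, 8)
    ifd = struct.pack("<H", 2)
    ifd += struct.pack("<HHIH2x", 256, 3, 1, 16)  # ImageWidth, SHORT, inline
    ifd += struct.pack("<HHII", 258, 3, 3, 38)    # BitsPerSample, SHORT x3, by offset
    ifd += struct.pack("<I", 0)                   # no next IFD
    return header + ifd + struct.pack("<3H", 8, 8, 8)
```

Calling read_raw_entries(build_sample()) reports BitsPerSample with its stored count of 3 and all six raw bytes, rather than a single convenience value. The function only slices and unpacks from buf, so it should work the same on a bytes object read with plain file IO or on a memory-mapped region; the interpreting layer would sit on top of this output.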

This is a nice summary, thank you.

If this is your actual goal, I must say I find it a good and natural and correct thought... At the risk of sounding boring: I've done the same in my native Delphi TIFF codec. But it is a major rewrite to get it into LibTiff...

I believe that it would be less and better work than having these two layers implemented independently though. Enough that I'm prepared to actually do the grunt work on it if there is a reasonable consensus on this being the right approach to pursue and there is no killer argument to shoot it down. It is a major change certainly, but 'rewrite' to me means throwing away old broken code -- and I don't see much that would really need throwing away here. But if we want to handle tiff data at a lower level of abstraction than we currently do, I don't see how we can escape at least some refactoring.

If I'm interpreting your goals correctly, I'm afraid you might be confusing the issue with all this talk about memory maps and other implementation issues.

Indeed. I don't want people to think I'm on a slash and burn re-write mission here either. I have so far come at this from mostly a clean slate though, so others can better advise how well my ideas may fit in with the existing plans for libtiff. It just happened that the theoretically most efficient method was also the least code to implement a proof for, and so the one I can speak with the most confidence about right now. Everything else is speculation still to be proven :-)

I've personally always considered memory mapping TIFFs a bad idea. There's no telling what size files you might need to handle, especially with the new BigTIFF format. You could need to eat 8 gig of address space on a machine that is theoretically limited to 2 gig for memory maps, and shares this 2 gig with other applications, as I think is the case on common Windows systems.

This isn't a problem of memory mapping though, you'd have exactly the same problem if you tried to naively load it all at the same time however you did it. You can still mmap small chunks of it at a time, and indeed will have to if the VM on your OS isn't up to snuff in that respect.
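
As a rough illustration of mapping only a small chunk at a time, here is a sketch using Python's standard mmap module. The helper name read_window is hypothetical. It assumes the requested range lies entirely within the file, and for simplicity it copies the window out before unmapping, which a truly copy-free decoder would avoid.

```python
import mmap


def read_window(fileobj, offset, length):
    """Map only the region covering [offset, offset + length) and return it.

    The mapping offset has to be a multiple of the allocation granularity,
    so the start is rounded down and the difference is skipped afterwards.
    """
    if length <= 0:
        return b""
    gran = mmap.ALLOCATIONGRANULARITY
    base = (offset // gran) * gran  # aligned start of the mapping
    delta = offset - base           # bytes to skip inside the mapping
    with mmap.mmap(fileobj.fileno(), delta + length,
                   access=mmap.ACCESS_READ, offset=base) as window:
        return window[delta:delta + length]
```

The address space used at any moment is bounded by the window size plus at most one granule of alignment slack, regardless of how large the file is.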

Moreover, there seems to be very little benefit to abusing address space like this, as opposed to good ol' plain file IO, especially given the fact that you'll need to align and byteswap in separately allocated memory blocks anyhow.

I'm not quite so pessimistic about it, but I agree this is not an argument that we should get bogged down in. Nothing I have suggested so far falls apart without mmap (nor am I professing it alone to be a panacea), and it will certainly need to support both that and more conventional means of obtaining a buffer full of data too.

Your initial summary covers my biggest itch nicely. I'll happily defer to any better solution to it that exists, but this is the most promising I have on paper so far, and I don't see many things wrong with it that wouldn't also be a factor whatever else we do. Does anyone else?

Ron