2008.11.20 04:17 "Re: [Tiff] Memory leak (TIFFOpen, TIFFReadTile)?", by Ron

On Wed, Nov 19, 2008 at 08:56:32PM -0000, Andy Cave wrote:

FWIW We use our own file I/O with libtiff (like a lot of people - or so I get the impression) and that definitely does not use memory mapped IO; that causes too many issues. For anyone who does anything with image data, who therefore knows what they are doing (certainly more than the OS), reading & caching the data themselves is far more efficient (since they know what they are doing) than letting the OS 'guess'.
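For reference, here is a minimal sketch of the kind of custom I/O hookup described above, using libtiff's TIFFClientOpen with plain stdio callbacks (the callback names and the stdio choice are illustrative only, not anyone's actual code). Returning 0 from the map proc is what tells libtiff not to memory-map the file:

    #include <stdio.h>
    #include <tiffio.h>

    /* Plain stdio-backed callbacks; the thandle_t is a FILE*. */
    static tsize_t my_read(thandle_t h, tdata_t buf, tsize_t size)
    {
        return (tsize_t) fread(buf, 1, (size_t) size, (FILE *) h);
    }

    static tsize_t my_write(thandle_t h, tdata_t buf, tsize_t size)
    {
        return (tsize_t) fwrite(buf, 1, (size_t) size, (FILE *) h);
    }

    static toff_t my_seek(thandle_t h, toff_t off, int whence)
    {
        /* fseeko/ftello would be needed for files larger than 2GB. */
        fseek((FILE *) h, (long) off, whence);
        return (toff_t) ftell((FILE *) h);
    }

    static int my_close(thandle_t h)
    {
        return fclose((FILE *) h);
    }

    static toff_t my_size(thandle_t h)
    {
        FILE *f = (FILE *) h;
        long cur = ftell(f), end;
        fseek(f, 0, SEEK_END);
        end = ftell(f);
        fseek(f, cur, SEEK_SET);
        return (toff_t) end;
    }

    /* Returning 0 here tells libtiff the file cannot be mapped, so it
       falls back to the read/seek callbacks above. */
    static int my_map(thandle_t h, tdata_t *base, toff_t *size)
    {
        (void) h; (void) base; (void) size;
        return 0;
    }

    static void my_unmap(thandle_t h, tdata_t base, toff_t size)
    {
        (void) h; (void) base; (void) size;
    }

    TIFF *open_with_own_io(const char *path)
    {
        FILE *f = fopen(path, "rb");
        if (!f)
            return NULL;
        return TIFFClientOpen(path, "r", (thandle_t) f,
                              my_read, my_write, my_seek, my_close,
                              my_size, my_map, my_unmap);
    }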

Programmers are notoriously bad at guessing what will be optimal; most of the OS's 'guesses' (at least for Linux) are profiled and subject to constant efforts at improvement.

The minimal benchmarks posted here seem to show little difference, and the original 'problem' appears to have been mostly spurious. Is that really worth breaking the existing behaviour for all existing apps? I want to see something more concrete than speculation to justify taking that step.

If you use memory mapped IO you run the risk of running out of address space,

_Virtual_ address space. So until you are talking about Gb of allocation this is unlikely to be a problem.

the OS can end up paging other data out of memory which is more valuable

mmap'ing a file doesn't immediately page real memory in for every mapped page; that only happens when you actually access it. If your other data is empirically more valuable, then a good OS should page out some of the mmap'ed data that you are no longer accessing instead. Since mmap'ed data that hasn't been modified is already backed by the file, it doesn't need to be written to swap; the OS can simply drop the page and read it back from the file if it is needed again later. If you are short on real memory, this should be far more efficient than forcing the OS to page out dirty buffers that you allocated yourself.
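A small illustration of that behaviour, assuming POSIX mmap and nothing libtiff-specific: the mapping itself only reserves address space, a page is read from the file the first time it is touched, and a clean file-backed page can be dropped and read back later instead of being written to swap:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2)
            return 1;

        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0)
            return 1;

        /* Reserves virtual address space only; no data is read yet. */
        unsigned char *base = mmap(NULL, st.st_size, PROT_READ,
                                   MAP_PRIVATE, fd, 0);
        if (base == MAP_FAILED)
            return 1;

        /* Touching a byte faults exactly that page in from the file. */
        printf("first byte: %u\n", (unsigned) base[0]);

        /* An unmodified page is backed by the file itself, so under
           memory pressure the kernel can drop it and re-read it later,
           rather than writing it out to swap.  We can also hint that
           we are finished with the range: */
        madvise(base, st.st_size, MADV_DONTNEED);

        munmap(base, st.st_size);
        close(fd);
        return 0;
    }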

Again, this is not a problem most application-level programmers would be very good at guessing the best answer to. Better to let the VM experts deal with OS memory management (and report trouble cases to them when you hit them). My tiff-using application can't possibly know whether memory dedicated to other applications is in fact 'more valuable' to me at any instant in time. The OS kernel at least has the data to make such estimates and to allocate real memory to the pages that need it most.

(especially if you access the data only once), etc... If you access the data more than once, you can arrange to cache that which you use more than once, or do other clever stuff, etc... Memory mapping was touted years ago as the solution for memory problems, but personally I've never been convinced.
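As a hedged sketch of the 'cache it yourself' idea mentioned above (the cache size, structure and helper names are invented for illustration; none of this is part of libtiff): a tiny direct-mapped cache of decoded tiles sitting in front of TIFFReadEncodedTile:

    #include <stdlib.h>
    #include <tiffio.h>

    #define CACHE_SLOTS 16   /* arbitrary; tune for the access pattern */

    struct tile_cache {
        TIFF    *tif;
        tsize_t  tile_bytes;
        ttile_t  tag[CACHE_SLOTS];    /* which tile occupies each slot */
        void    *data[CACHE_SLOTS];   /* decoded tile data */
    };

    static struct tile_cache *cache_new(TIFF *tif)
    {
        struct tile_cache *c = calloc(1, sizeof(*c));
        if (!c)
            return NULL;
        c->tif = tif;
        c->tile_bytes = TIFFTileSize(tif);
        for (int i = 0; i < CACHE_SLOTS; i++)
            c->tag[i] = (ttile_t) -1;        /* mark every slot empty */
        return c;
    }

    /* Return the decoded tile, reading it from the file only on a miss. */
    static const void *cache_get(struct tile_cache *c, ttile_t tile)
    {
        int slot = (int) (tile % CACHE_SLOTS);
        if (c->tag[slot] != tile) {
            if (!c->data[slot])
                c->data[slot] = _TIFFmalloc(c->tile_bytes);
            if (!c->data[slot])
                return NULL;
            c->tag[slot] = (ttile_t) -1;     /* invalidate while refilling */
            if (TIFFReadEncodedTile(c->tif, tile, c->data[slot],
                                    c->tile_bytes) < 0)
                return NULL;
            c->tag[slot] = tile;
        }
        return c->data[slot];
    }

A caller would get the tile index to ask the cache for from something like TIFFComputeTile(tif, x, y, 0, 0).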

Actually my main complaint about mmap'ing tiffs in the current implementation is that we don't really use it to best effect. Given the structure of tiff, it would be possible to simply mmap a file and then point the internal data structures directly at the corresponding offsets in that file.
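To make that concrete, here is a hedged sketch of the idea (it is not what the current implementation does): with the file mapped into memory, the classic TIFF header gives the offset of the first IFD, and the directory can be interpreted in place from the mapping rather than copied into separate buffers:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Classic (non-BigTIFF) layout: a 2-byte byte-order mark, the magic
       number 42, then a 32-bit offset to the first IFD, which itself
       begins with a 16-bit entry count.  For brevity this only handles
       a little-endian ("II") file on a little-endian host; a real
       version would byte-swap as required. */
    int describe_first_ifd(const unsigned char *base, size_t len)
    {
        if (len < 8 || memcmp(base, "II", 2) != 0)
            return -1;

        uint16_t magic;
        uint32_t ifd_off;
        memcpy(&magic, base + 2, sizeof magic);
        memcpy(&ifd_off, base + 4, sizeof ifd_off);
        if (magic != 42 || (size_t) ifd_off + 2 > len)
            return -1;

        /* Read the entry count straight out of the mapping (memcpy
           copes with any misalignment); the entries themselves could
           be walked the same way, without a separate buffer. */
        uint16_t nentries;
        memcpy(&nentries, base + ifd_off, sizeof nentries);
        printf("first IFD at offset %u with %u entries\n",
               (unsigned) ifd_off, (unsigned) nentries);
        return 0;
    }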

I did some work on that a few years back (and there should be discussion of this in the archive), but it isn't something I've really needed for a while now.

But in the meantime, I'm inclined to cast my lot with Frank. Unless someone provides a concrete example of a problem or benefit that would warrant changing this, the best thing to do is retain compatibility with the existing behaviour.

People who want to override that already have the ability to do so. Ultimately, what will work best for them can only be determined by profiling _their_ real application, in realistic use scenarios, on a machine typical of what they expect it should need. Anything else is pretty much a coin toss. And what comes up heads for one user, may not be the right answer for any other.
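For instance, libtiff's open mode string already lets a caller make that choice per file: appending 'm' disables the use of memory-mapped files for that open, and 'M' enables it for read-only access.

    #include <tiffio.h>

    /* The trailing 'm' asks libtiff not to memory-map this file;
       "rM" would ask it to map the file instead where supported. */
    TIFF *open_unmapped(const char *path)
    {
        return TIFFOpen(path, "rm");
    }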

Cheers,
Ron