2017.06.28 14:36 "Re: [Tiff] Excessive memory allocation while chopping strips", by Even Rouault

For strip reading, there's the mechanism I already put in place and described in my previous email. Small files that would declare large strip byte count will no longer cause the full allocation to be done before beginning reading the strip content.

Looks like a reasonable fix but:

It's not that simple (should ideally have a few pair of eyes go over it).

Reviewing is always welcome. I'd be open to a more formal review process if that's needed but it would imply people would volunteer reviewing proposed patches in due time. My feeling is that the libtiff development community has not lately been sufficiently large and active for that.

It made reading the large strips a little slower.

Agreed. Although potential RAM recopying due to realloc() must be significantly faster than the typical I/O acquisition time.

It may have increased the peak amount of memory required (in case realloc needs to copy).

Agreed. To limitate a bit that issue, I've only enabled that on 64 bits builds, because the main issue might be the lack of virtual address space on 32 bits.

We could short-circuit all this non-trivial progressive allocation and reading logic in the case where we know the file size. I wanted to have something that would work without knowing the file size, but I agree my use cases where getting the file size is a slow operation are not common. As you suggest below a TIFFOpen() flag for something else, I'm wondering if we couldn't have a TIFFOpen() flag that would mean "calling TIFFGetFileSize() is a costly operation on this file handle, please do not do this unless strictly required".

It does not help with compressed strips, as I understand it.

Hum, it should apply both to uncompressed or compressed strips/tiles AFAIR. And this changeset doesn't apply to the mmap'ed case since that one knows the file size and already validates the strip size against it.

I don't know about that. Just because people like to use their shiny new fuzzer on everything they can get their hands on does not mean code will be written to use the new API. I might be wrong but I get the feeling that in many cases libtiff is being dragged along indirectly with a bunch of other dependencies, typically through some flavor of an abstract image handling library. Do you know better from reading the bug reports?

Most of the time it seems people use the TIFF utilities (tiffcp, tiffsplit, tiff2pdf, etc...) for the fuzzing. Most of the bugs are in the utilities themselves.

As far as I'm concerned, most issues I've fixed recently were found through the fuzzing of GDAL and its GTiff driver which uses libtiff ( the GDAL builds I use have an internal libtiff copy, that I sync with libtiff CVS head, so it benefits indirectly from the compilation instrumentation flags used by the fuzzing environment. GDAL builds libtiff with CHUNKY_STRIP_READ_SUPPORT)

If we're going to have this, I think it should be the default behavior across the board. We should first fix the cases needed by well formed large files (ie. progressive allocation for large tags, as you did for the strips). Then simply refuse strips/tiles larger than X (say, 1 GB or even 100 MB) and add a flag to TIFFOpen() to override that for the rare few projects which really need to deal with such files and don't care about the potential consequences. Making it the default requires that we make it work well and ensures it will be used. It's also more work, unfortunately.

I didn't dare suggesting setting stricter limits in libtiff by default (with a flag to retain current behaviour as you suggest), but that would certainly be a better protection than opt-in mechanisms. I'm certainly +1 for that.

--
Spatialys - Geospatial professional services
http://www.spatialys.com