2017.06.27 22:08 "Re: [Tiff] Excessive memory allocation while chopping strips", by Even Rouault
On mardi 27 juin 2017 22:51:43 CEST Nicolas RUFF wrote:
I saw in https://trac.osgeo.org/gdal/changeset/39082 that you try to defend against overly large memory allocations.
I have a pathological test case from fuzzing where header values are:
ImageWidth = 1060
ImageLength = 536871550
It will hit the following piece of code in tif_dirread.c:
- Some manufacturers make life difficult by writing
- large amounts of uncompressed data as a single strip.
- This is contrary to the recommendations of the spec.
- The following makes an attempt at breaking such images
- into strips closer to the recommended 8k bytes. A
- side effect, however, is that the RowsPerStrip tag
- value may be changed.
if ( !_TIFFFillStriles(tif) || !tif->tif_dir.td_stripbytecount )
ChopUpSingleUncompressedStrip() will compute the required number of strips and reach this point:
newcounts = (uint64*) _TIFFCheckMalloc(tif, nstrips, sizeof (uint64),
"for chopped \"StripByteCounts\" array");
In the my case, nstrips = 268435775 (and rowsperstrip = 2), which will try to allocate 268435775 * 8 bytes. This is already beyond 2**31, but we can make it even larger and beyond 2**32 by playing with ImageLength.
Rather than playing this cat-and-mouse game of assigning arbitrary limits to every possible tag, I would suggest the following patch in tif_aux.c:
_TIFFCheckRealloc(TIFF* tif, void* buffer,
tmsize_t nmemb, tmsize_t elem_size, const char* what)
void* cp = NULL;
tmsize_t bytes = nmemb * elem_size;
- * XXX: Check for integer overflow.
- if (nmemb && elem_size && bytes / elem_size == nmemb)
+ /* TIFF size is limited to 32-bit by design and 31-bit by implementation, so
+ * there should be no reason to ever allocate more than INT_MAX bytes.
+ */+ if ((nmemb <= INT_MAX) && (elem_size <= INT_MAX) && (bytes <=
cp = _TIFFrealloc(buffer, bytes);
Note: this patch assumes that tmsize_t is 64 bits, otherwise the check should be done in a different way.
What do you think?
Glad you raise this point. I wanted to raise this topic (and a bit beyond the particular case you mention), but haven't found yet the time.
BigTIFF may theoretically have strips that are larger than 4GB (although that would be insane to craft such a file), so that might be legit to allocate that amount of memory
And in your case that could even be a standard TIFF file with width=1 and height=2 billion that would fit on 2 GB. So either it has a single strip declared and ChopUpSingleUncompressedStrip() is right trying to create a large amount of strips that will require more than 2 GB of RAM, or either it has 2 billion strips declared in StripOffsets/StripByteCounts tags and you might have try allocating & reading that from disk (in which case BigTIFF would be needed since the size for the tags woud be enormous)...
But indeed you raise a more general problem. libtiff may allocate in various places enormous virtual memory for very short files. I've tried workarounding a few situations at application level in GDAL, but I failed in a few ocurences, for example on a OJPEG file where TIFFReadDirectory() will force a call to _TIFFFillStriles() due to how the OJPEG codec works.
A potential solution would be to check the allocation request w.r.t to the file size. One downside of this is that, in one of my use case, I want support reading TIFF (some particular formulations where the IFD is at the beginning) in a pure streaming way (from a pipe), so I have no idea of the file size. Or in other use cases, I read a gzipped TIFF in a streaming way from disk, but knowing the uncompressed size is super costly (requires to uncompress the whole file). But I admit those are marginal cases.
I can see several ways of addessing the issue, which might be complementary:
- having a way at the API level to specify to libitff the maximum amount of memory allowed for a single allocation. --> moderately easy to implement (not hard, but requires a path on the whole code base) I believe TIFFOpen() cannot allocate a insane amount of memory, so having this TIFFSetMaxMemoryForSingleAllocation() can be called just afterwards
- having a way at the API level to reject files they would consider insane, for example by specifying a maximum width * height. TIFFReadDirectory() would honour that to early exit and avoid later processing that might cause excessive memory allocation --> easy to implement
- before allocating a lot of memory before reading from file (reading the content of a tag typically), allocates a partial amound of memory, read that from the file. If successful, realloc() to a larger size, and read again, etc.. until your read the content. I've implemented this for strip content fetching lately: https://github.com/vadz/libtiff/commit/ec74e94dc1f6ac7d28113ea68cdd3ce028150477 Could be replicated for StripOffsets/StripByteCounts reading. There's the following ticket about that: http://bugzilla.maptools.org/show_bug.cgi?id=2675
--> moderately involved to implement
* as ChopUpSingleUncompressedStrip() may allocate memory unrelated to the file size, limit it to a mximum number of comupted strips, say 1 million. This strip chopping mechanism is a band-aid for malformed file. We don't have to support it for files of insane dimension, IMHO Besides allocating a lot of memory, this also can take a few hundred of milliseconds to compute the synthetic tag values.
--> easy to implement
* replicate in libtiff the lazy fetching of the strip offset and counts (read only the values for the current strip, actually a bit before/after and cache that) that I've implemented in GDAL so you don't need to fetch the whole tags (that would probably solve my OJPEG case that I can't workaround from GDAL); This would help a lot in situations where you switch back and forth among the IFDs of a same file (for example if you display a big image with overviews, and zoom in/zoom out often, you end up reading them all the time if you just have a single TIFF handle)
--> involved to implement. Requires auditing the codebase to no longer use td_stripbytecount / td_stripoffset. Retrieval of the whole content of the corresponding tags through TIFFGetField() would be left only for compatibility of external code.
Spatialys - Geospatial professional services