2017.09.16 19:50 "[Tiff] TIFF tile size limit", by Bob Friesenhahn

2017.09.17 17:00 "Re: [Tiff] TIFF tile size limit", by Roger Leigh

On 17/09/17 17:27, Even Rouault wrote:

On dimanche 17 septembre 2017 11:29:24 CEST Kemp Watson wrote:

"Tile sizes are already allowed to be larger than the image dimensions by the TIFF specification, since they can spill over the right and bottom of the image. A one pixel image could be in a 1k x 1k 'tile'."

Ugh. That's the root of the issue, for sure.

Another issue, more with the implementation of libtiff than with the spec itself, is that if you have a large number of tiles or strips, allocating the StripByteCount and StripOffset arrays can be very costly. Take the case of a 1 million x 1 million image with tiles of 16 x 16. The allocation cost for those arrays is (1e6 / 16) * (1e6 / 16) * sizeof(uint64) * 2 = 62 GB (actually libtiff would refuse to read the content of a tag larger than 2 GB), and currently they are allocated and read in full as soon as you read the first tile. libtiff should instead read from the file only the byte count and offset values for the tile it is about to read. Even with a more reasonable tile size of 256x256, that is still 244 MB you need to allocate and read when the file is opened.
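The arithmetic above can be checked with a short Python sketch. It counts tiles with ceiling division (partial tiles at the border still need an entry) and assumes 8-byte entries for both arrays. With a 16-pixel tile edge this reproduces the 62 GB figure; 32 x 32 tiles would need about 15.6 GB. The second function, `read_tile_entry`, is a hypothetical illustration of the lazy approach suggested above, not a libtiff API: it seeks to one tile's entry in each array and reads only those 16 bytes. It assumes a little-endian file with both arrays stored out of line at known positions taken from the IFD.

```python
import io
import struct

ENTRY_SIZE = 8  # assumption: 64-bit (LONG8) entries, as in BigTIFF

def offset_array_bytes(width, height, tile_w, tile_h, entry_size=ENTRY_SIZE):
    """Bytes needed to hold both the offset and byte-count arrays in memory."""
    tiles_across = (width + tile_w - 1) // tile_w  # ceiling division
    tiles_down = (height + tile_h - 1) // tile_h
    return tiles_across * tiles_down * entry_size * 2  # two arrays

def read_tile_entry(f, offsets_pos, counts_pos, tile_index):
    """Hypothetical helper: fetch one tile's offset and byte count by seeking
    straight to its entries instead of loading both arrays in full.
    Assumes little-endian uint64 entries at the given file positions."""
    f.seek(offsets_pos + tile_index * ENTRY_SIZE)
    (offset,) = struct.unpack("<Q", f.read(ENTRY_SIZE))
    f.seek(counts_pos + tile_index * ENTRY_SIZE)
    (count,) = struct.unpack("<Q", f.read(ENTRY_SIZE))
    return offset, count

# Array sizes for a 1e6 x 1e6 image at a few tile sizes.
for tile in (16, 32, 256):
    size = offset_array_bytes(1_000_000, 1_000_000, tile, tile)
    print(f"{tile}x{tile} tiles: {size / 1e6:,.1f} MB")

# A fake three-tile file: offsets array at position 0, byte counts at 24.
demo = io.BytesIO(struct.pack("<3Q", 100, 200, 300) + struct.pack("<3Q", 10, 20, 30))
print(read_tile_entry(demo, 0, 24, 1))  # (200, 20)
```

Rounding partial tiles up gives roughly 244.2 MB for 256 x 256 tiles, in line with the 244 MB estimate.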

This has definitely shown up in the (fairly crude) benchmarking I've done so far. Note that "small" is 2¹⁴×2¹⁴ pixels and "big" is 2¹⁶×2¹⁶ pixels.

File size variation: https://github.com/openmicroscopy/ome-files-performance/blob/master/analysis/tile-test-write-size.pdf -- This is synthetic but derived from real data written by libtiff.

It's mainly to show how space is wasted when tiles overlap the image border, but at the low end the metadata usage becomes significant; you can see a small increase at the smallest tile sizes due to the larger count/offset arrays.
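The border waste can be estimated directly from the tile grid. This sketch uses the TIFF 6.0 rule for tiles across and down, TilesAcross = (ImageWidth + TileWidth - 1) / TileWidth, and counts pixels before compression, so the real on-disk cost depends on how well the padding compresses. The helper names and the 10,000 x 10,000 example are illustrative choices, not taken from the benchmark.

```python
def tile_grid(width, height, tile_w, tile_h):
    """Tiles across and down: (ImageWidth + TileWidth - 1) / TileWidth."""
    across = (width + tile_w - 1) // tile_w
    down = (height + tile_h - 1) // tile_h
    return across, down

def padding_fraction(width, height, tile_w, tile_h):
    """Fraction of stored (uncompressed) tile pixels lying outside the image."""
    across, down = tile_grid(width, height, tile_w, tile_h)
    stored = across * down * tile_w * tile_h
    return (stored - width * height) / stored

# The one-pixel image in a single 1k x 1k tile from the quoted message.
print(tile_grid(1, 1, 1024, 1024))  # (1, 1)
print(padding_fraction(1, 1, 1024, 1024))
# An image whose edges are not a multiple of the tile size.
print(padding_fraction(10_000, 10_000, 256, 256))
```

For power-of-two images such as the 2¹⁴ and 2¹⁶ test cases, any power-of-two tile edge up to the image size divides evenly and the padding is zero; waste appears only when the edges are not multiples of the tile size.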

Writing performance: https://github.com/openmicroscopy/ome-files-performance/blob/master/analysis/tile-test-write-performance.pdf -- you can see an even more dramatic effect on write times at very small tile sizes, after which writing seems to become I/O bound and tile size ceases to have any appreciable effect. Something doesn't scale with vast numbers of small tiles, but it will need more investigation to determine exactly what. Since the effect is negligible at any tile size greater than 256, it's not a big priority for me right now, but fixing it (if possible) would have some small benefit.

Tile counts: https://github.com/openmicroscopy/ome-files-performance/blob/master/analysis/tile-test-count.pdf -- purely synthetic and fairly obvious, but demonstrates the scaling problems with small tile sizes.
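The tile counts follow from the same ceiling division, and each halving of the tile edge quadruples the count. A small sketch for the two benchmark image sizes, assuming square power-of-two tiles from 16 to 4096 (the tile sizes actually used in the benchmark may differ):

```python
# Image edges from the benchmark: "small" = 2**14, "big" = 2**16 pixels.
IMAGES = {"small": 2**14, "big": 2**16}

def tile_count(edge, tile):
    """Number of square tiles covering a square image (partial tiles rounded up)."""
    per_side = (edge + tile - 1) // tile
    return per_side * per_side

print(f"{'tile':>9}  {'small':>12}  {'big':>12}")
for tile in (16, 32, 64, 128, 256, 512, 1024, 2048, 4096):
    counts = "  ".join(f"{tile_count(edge, tile):>12,}" for edge in IMAGES.values())
    print(f"{tile:>4}x{tile:<4}  {counts}")
```

At 16 x 16 the "big" image already needs over 16 million tiles, each with its own offset and byte-count entry.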

Regards,
Roger