2019.10.01 07:39 "[Tiff] Support for multiple types in TIFFReadCustomDirectory", by Dmitry Semyonov

Hello,

We are using TIFFReadCustomDirectory routine to read GPS meta data from TIFF images, passing a TIFFFieldArray with a list of GPS tags and their types.

This works well for TIFF files which use correct types for GPS tags according to specification. Unfortunately sometimes we encounter TIFF images which use wrong types for some tags (i.e. DOUBLE instead of RATIONAL for Latitude/Longitude values). In such cases TIFFReadCustomDirectory prints a warning to the console and ignores the corresponding tag.

We would like to be more user friendly in such situations, and correctly load Latitude/Longitude values even when they are encoded using DOUBLE type. It seems that TIFFReadCustomDirectory has special support for this case, iterating through multiple tif_fields items with the same tag value. So the solution could be simple: include multiple items in the TIFFFieldArray for the same tag with different types we would like to support. Unfortunately we found that it doesn't work, as the _TIFFMergeFields routine ignores multiple definitions when adding them to tif_fields list:

               for (i = 0; i < n; i++) {

                              const TIFFField *fip =

                                             TIFFFindField(tif,
info[i].field_tag, TIFF_ANY);

                /* only add definitions that aren't already present */

                              if (!fip) {

                        tif->tif_fields[tif->tif_nfields] = (TIFFField *)
(info+i);

                        tif->tif_nfields++;

                }

               }

As the code supporting multiple types for the same tag in TIFFReadCustomDirectory seems to be incompatible with the duplicates removal code in the _TIFFMergeFields, we've got the feeling that there might be a bug in the _TIFFMergeFields implementation.

We also found an old commit that changed behavior of the _TIFFMergeFields routine:

<https://gitlab.com/libtiff/libtiff/commit/a6bea11162768ecdda2deb623ed676e02 2a119ee> https://gitlab.com/libtiff/libtiff/commit/a6bea11162768ecdda2deb623ed676e022 a119ee

This modification made it possible to perform partial merges of the TIFFFieldArray records. But it also changed the way duplicates are located: before this modification newly added tags were compared only to the previously existing ones. But after modification the tags are compared to all tags, including newly added in the previous iterations. We wonder if this second change was intended, or was introduced by mistake.

If it was really intended, than there still seems to be a bug in the duplicate filtering code: the TIFFFindField call expects tif_fields array to be sorted, as it uses a binary search to find a tag. But _TIFFMergeFields code performs sorting only after all new tags are added to the list. This means that in the process of adding new tags to the list the tif_fields may become not sorted, and TIFFFindField using a binary search can miss some tags.

In case the current behavior is incorrect, it can be easily fixed by incrementing tif_nfields only after all tags are added:

               size_t tif_nfields = tif->tif_nfields;

               for (i = 0; i < n; i++) {

                              const TIFFField *fip =

                                             TIFFFindField(tif,
info[i].field_tag, TIFF_ANY);

                /* only add definitions that aren't already present */

                              if (!fip) {

                        tif->tif_fields[tif_nfields] = (TIFFField *)
(info+i);

                        tif_nfields++;

                }

               }

               tif->tif_nfields = tif_nfields;

Can you please confirm if it is a bug?

Thanks,

Dmitry