AWARE SYSTEMS
TIFF and LibTiff Mail List Archive

Thread

2020.05.17 09:41 "[Tiff] fax2tiff adds extra line", by John Muccigrosso
2020.05.17 19:23 "[Tiff] fax2tiff adds extra line", by Thomas Bernard
2020.05.18 12:20 "Re: [Tiff] fax2tiff adds extra line", by John Muccigrosso
2020.05.18 21:10 "Re: [Tiff] Tiff Digest, Vol 19, Issue 2", by Richard Nolde
2020.05.19 16:45 "Re: [Tiff] Tiff Digest, Vol 19, Issue 2", by Bob Friesenhahn

2020.05.17 09:41 "[Tiff] fax2tiff adds extra line", by John Muccigrosso

I'm using pdfimages to process PDFs, which produces some ccitt files. Using fax2tiff I'm able to turn those into slightly more manageable TIFFs. Here's the problem: in some cases fax2tiff produces files that seem to be one row too long. Here's some example output when I use verbose mode on one such file:

> Fax4Decode: Warning, Premature EOL at line 5200 of strip 4294967295 (got 0, expected 3450).
> /path/to/file/filename.ccitt:
> 5201 rows in input
> 0 total bad rows
> 0 max consecutive bad rows

fax2tiff then goes on to produce a file with 5201 rows. Other, non-ccitt output from the same PDF suggests that this is incorrect: it should be 5200, as the warning seems to acknowledge.

So my question is: assuming the warning is correct, should fax2tiff conclude that the file has 5201 rows and produce such a file, or should it leave out that apparently zero-length row?
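To make the off-by-one concrete: if the decoder's line counter is zero-based (my assumption about libtiff's Fax4 decoder, not something I've verified in the source), then "Premature EOL at line 5200 ... (got 0, expected 3450)" describes the 5201st row, and "got 0" means that row has no pixels at all. Under that assumption, the real row count is just the reported line number. Here's a rough Python sketch of that reasoning; the function name is my own invention.

```python
import re

# Matches libtiff-style warnings such as
# "Premature EOL at line 5200 of strip 4294967295 (got 0, expected 3450)".
WARNING_RE = re.compile(
    r"Premature EOL at line (?P<line>\d+) of strip \d+ "
    r"\(got (?P<got>\d+), expected (?P<expected>\d+)\)"
)


def complete_rows(warning_text: str, rows_in_input: int) -> int:
    """Estimate how many rows actually contain image data.

    Assumption: the reported line number is zero-based, so
    "line N ... (got 0, ...)" on the last row (N == rows_in_input - 1)
    means that row is empty and only N rows are real.
    """
    match = WARNING_RE.search(warning_text)
    if match is None:
        return rows_in_input  # no premature EOL reported
    line = int(match.group("line"))
    got = int(match.group("got"))
    if got == 0 and line == rows_in_input - 1:
        return line  # trailing zero-length row: drop it
    return rows_in_input  # anything else: keep the decoder's count


fax2tiff_warning = (
    "Fax4Decode: Warning, Premature EOL at line 5200 of strip 4294967295 "
    "(got 0, expected 3450)."
)
print(complete_rows(fax2tiff_warning, 5201))  # 5200
```

Partial rows (got > 0) are deliberately kept, since they still carry pixels.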

As a check, once I know the row count, I can tell ImageMagick to process the ccitt file with a command like this:

> magick -size <width>x<height> g4:filename.ccitt info:
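The width for that command comes from the .params file that pdfimages writes next to each .ccitt file. Assuming the width appears there as a fax2tiff-style "-X <width>" option (check your own file; the exact contents are an assumption here), a small Python helper can pull it out and assemble the check command. Both function names are hypothetical.

```python
import shlex
from typing import List


def width_from_params(params_text: str) -> int:
    """Return the width given by a '-X <width>' option in params text.

    Assumes fax2tiff-style options, e.g. "-4 -X 3450 -W" (a hypothetical
    example); raises ValueError when no -X option is present.
    """
    tokens = shlex.split(params_text)
    for i, token in enumerate(tokens[:-1]):
        if token == "-X":
            return int(tokens[i + 1])
    raise ValueError("no '-X <width>' option found")


def magick_check_cmd(ccitt_path: str, width: int, height: int) -> List[str]:
    """Build 'magick -size WxH g4:FILE info:' as an argument list."""
    return ["magick", "-size", f"{width}x{height}", f"g4:{ccitt_path}", "info:"]


width = width_from_params("-4 -X 3450 -W")
print(" ".join(magick_check_cmd("alex-039.ccitt", width, 5200)))
# magick -size 3450x5200 g4:alex-039.ccitt info:
```

Passing the returned list to subprocess.run(cmd, capture_output=True, text=True) would run the check and let you look for the Premature EOL warning in stderr.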

If I use the width from the params file and the height that fax2tiff reported (5201), I get an error similar to that reported by fax2tiff:

> magick: Premature EOL at line 5200 of strip 0 (got 0, expected 3450). `Fax4Decode' @ warning/tiff.c/TIFFWarnings/1037.

Whereas if I use 5200 for the height, the line at which both tools report the error, all is well. Again, this is the row count in the non-ccitt output for the other pages in the PDF processed by pdfimages, which suggests to me that it's the correct count.
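One more detail: the two warnings report the same line and pixel counts but different strip numbers. ImageMagick says strip 0, while fax2tiff says strip 4294967295, which is 2^32 - 1, the value -1 takes when stored in an unsigned 32-bit integer. My unverified guess is that fax2tiff decodes the raw .ccitt data without a real strip index, so that number is probably a placeholder rather than a sign of corruption. The arithmetic itself is easy to confirm:

```python
UINT32_MASK = 0xFFFFFFFF  # all 32 bits set


def as_uint32(value: int) -> int:
    """Reinterpret a (possibly negative) int as an unsigned 32-bit value."""
    return value & UINT32_MASK


print(as_uint32(-1))               # 4294967295
print(as_uint32(-1) == 2**32 - 1)  # True
```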

I realize here that pdfimages may also be producing a badly formatted file.

The files I'm working with are here:

https://www.dropbox.com/s/vky6dz0eaepmuf2/alex-039.ccitt
https://www.dropbox.com/s/kpk97mzfi0xc5qj/alex-039.params

The image is of a page with image captions, but no images.

I'm using fax2tiff from the command line on macOS 10.13.6, LIBTIFF version 4.1.0.

Thanks in advance for any help.

John