2003.07.25 21:25 "[Tiff] Patch to expand MAXFILES limit in tiffsplit", by Andrew J. Montalenti

Hey there, everyone. This is my first patch for libtiff, so be gentle ;)

I ran into a problem with tiffsplit the other day. First, I want to say that the tool is very good and much faster than comparable TIFF-splitting tools, so kudos to the author.

It does have one very annoying limitation, though: a cap on the number of pages it can split out of a multipage TIFF.

Yesterday, while trying to convert a 4000-page PDF into 4000 single-page TIFFs (yes, I know, crazy!), I hit that cap. I got the 4000-page PDF into a 4000-page multipage TIFF (weighing in at 600 MB) easily enough, but when I tried to use tiffsplit to get 4000 individual TIFFs, I got a "too many files" error.

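Well, to scratch an itch, I took a look at the code. Nothing in the program actually prevents it from processing larger documents. The limit comes purely from the naming scheme the author chose, "xaa, xab, xac, xad... etc.": two suffix letters give 26 × 26 = 676 names per prefix letter, and the default prefix can only advance from x through z, so the program gives up after 3 × 676 = 2,028 files (or after 676 with a user-supplied prefix).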

To sane people who don't deal with OCR'ed documents regularly, this upper limit is reasonable. But to those of us whose joy in life is sucked out by watching progress bars crawl over large multipage TIFFs, 2,028 pages is merely "the tip of the iceberg!"

My patch just adds another letter (so that names run "xaaa, xaab, xaac... etc."), which raises the upper limit to something insane: 3 × 26 × 26 × 26 = 52,728 files. The truth is, not even God himself could help you if you had to deal with a multipage TIFF that big, never mind tiffsplit ;)

I chose to keep the same (general) naming scheme instead of switching to something more practical (like tiff_000001, tiff_000002, etc.) because I figured some people rely on the output TIFFs being named that way, and I didn't want to cause a little earthquake.

Thanks for the great tools and library,

-Andrew

P.S. I heard from another tiffsplit user that this limitation has come up on the mailing list quite a few times, so for those of you who can't wait for it to be committed to CVS, simply apply it to a vanilla 3.5.7 source tree:

cp tiffsplit.c.patch tiff-v3.5.7/
cd tiff-v3.5.7/
patch -p0 < tiffsplit.c.patch

Compile and enjoy :)

--- tools/tiffsplit.c	2000-04-28 09:55:39.000000000 -0400
+++ tools/tiffsplit.c	2003-07-25 16:48:43.000000000 -0400
@@ -79,6 +79,7 @@
 newfilename(void)
 {
 	static int first = 1;
+	static long lastTurn;
 	static long fnum;
 	static short defname;
 	static char *fpnt;
@@ -94,7 +95,7 @@
 		}
 		first = 0;
 	}
-#define	MAXFILES	676
+#define	MAXFILES	17576
 	if (fnum == MAXFILES) {
 		if (!defname || fname[0] == 'z') {
 			fprintf(stderr, "tiffsplit: too many files.\n");
@@ -103,8 +104,23 @@
 		fname[0]++;
 		fnum = 0;
 	}
-	fpnt[0] = fnum / 26 + 'a';
-	fpnt[1] = fnum % 26 + 'a';
+	if (fnum % 676 == 0) {
+		if (fnum != 0) {
+			/* advance to the next letter every 676 pages */
+			/* going past 'z' is prevented by the MAXFILES check above */
+			fpnt[0]++;
+		} else {
+			/* set to 'a' if we are on the very first file */
+			fpnt[0] = 'a';
+		}
+		/* remember where the last turning point was */
+		lastTurn = fnum;
+	}
+	/* restart the middle letter every 676 files (via lastTurn); */
+	/* this keeps it within a-z */
+	fpnt[1] = (fnum - lastTurn) / 26 + 'a';
+	/* cycle the last letter through a-z on every file, then repeat */
+	fpnt[2] = fnum % 26 + 'a';
 	fnum++;
 }
 