2010.05.12 16:47 "Re: [Tiff] Combining multiple G4 images into a single output image", by Bob Friesenhahn
When merging or overlapping two bilevel pictures where the image on top is not placed at a byte boundary, you need to mask and combine only the edges of the images; the fully overlapping part then only needs bit shifting. I'm not really sure what it means, performance-wise, to do such massive shifting operations. It really depends on the size of the images to be placed. Maybe it could be optimized by calculating which parts of each placed image will actually be seen later, and skipping full bytes that would be overwritten by images placed later. I can also think of some lookup tables to aid masking and shifting. One lookup table could take a byte value combined with a shift count to build an index, and just read back the precalculated value. This table could be built at runtime by a function. This way no real shifting would have to be done, only lookups, which may be much faster.
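The masking and shifting described above can be sketched in C as follows. This is only an illustration, not code from libtiff: the function name `blit_row` and its parameters are hypothetical, rows are assumed to be already decoded to packed 1-bit pixels with the most significant bit first, and the caller is assumed to guarantee that the destination row has room for `x_off + src_bits` pixels.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: overwrite src_bits pixels of dst, starting at pixel
 * x_off, with the pixels of src. Both rows are packed 1 bit per pixel,
 * MSB first. Only the first and last destination bytes need masking; the
 * bytes in between are pure shift-and-combine. */
static void blit_row(uint8_t *dst, const uint8_t *src,
                     size_t src_bits, size_t x_off)
{
    if (src_bits == 0)
        return;

    size_t first = x_off / 8;                   /* first dst byte touched */
    size_t last = (x_off + src_bits - 1) / 8;   /* last dst byte touched */
    unsigned shift = (unsigned)(x_off % 8);     /* bit offset inside a byte */
    unsigned end_bit = (unsigned)((x_off + src_bits) % 8);
    size_t src_bytes = (src_bits + 7) / 8;

    for (size_t d = first; d <= last; d++) {
        size_t i = d - first;
        /* Two neighbouring source bytes contribute to one dst byte. */
        uint8_t hi = (i < src_bytes) ? src[i] : 0;
        uint8_t lo = (i > 0) ? src[i - 1] : 0;
        uint8_t val = (uint8_t)((lo << (8 - shift)) | (hi >> shift));

        /* Mask is all ones except at the left and right edges. */
        uint8_t mask = 0xFF;
        if (d == first)
            mask &= (uint8_t)(0xFF >> shift);
        if (d == last && end_bit != 0)
            mask &= (uint8_t)(0xFF << (8 - end_bit));

        dst[d] = (uint8_t)((dst[d] & ~mask) | (val & mask));
    }
}
```

For example, placing 12 set pixels at pixel offset 3 of a cleared row touches two bytes and leaves them as 0x1F and 0xFE. When `x_off` is a multiple of 8 the shift is zero and the loop degenerates to a byte copy with a masked final byte.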
Unless you do something really silly, the Group4 compression in libtiff will dominate the time for what you will be doing since it is much more complex.
Lookup tables may be useful on older CPUs, but on modern CPUs a direct algorithmic approach is usually best, since it avoids memory access latency and cache thrashing.
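To make the comparison concrete, here is a minimal sketch of the two alternatives for shifting a single byte. The names `shift_lut`, `build_shift_lut`, `shift_via_lut` and `shift_direct` are made up for illustration. Both functions return the same value; the table version needs a 2 KB table and a memory load per byte, while the direct version is a single shift instruction. Which one is faster on a given machine would have to be measured.

```c
#include <stdint.h>

/* Hypothetical table: shift_lut[s][b] holds b >> s for s = 0..7. */
static uint8_t shift_lut[8][256];

/* Fill the table once at runtime, as suggested in the quoted text. */
static void build_shift_lut(void)
{
    for (unsigned s = 0; s < 8; s++)
        for (unsigned b = 0; b < 256; b++)
            shift_lut[s][b] = (uint8_t)(b >> s);
}

/* Table lookup: one memory access per byte. */
static inline uint8_t shift_via_lut(uint8_t b, unsigned s)
{
    return shift_lut[s][b];
}

/* Direct computation: no memory access beyond the operand itself. */
static inline uint8_t shift_direct(uint8_t b, unsigned s)
{
    return (uint8_t)(b >> s);
}
```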
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/