
Jason White <jason@jasonjgw.net> writes:
> How can I most effectively compress scanned page images in PDF files without unduly degrading the visual quality?
Do you have access to the documents that built the PDF, i.e. the foo.tex and foo-1.jpg? If so it should be easy -- just deal with the images before they enter the PDF: jpegoptim -m75, pngcrush, etc. I don't know how best to compress embedded vector images -- they're usually embedded as PDF (instead of EPS), but I guess you would do path simplification on the source in inkscape or whatever...
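To make that concrete, here is a sketch of the "fix the sources, then rebuild" approach. Filenames are hypothetical, and the quality/effort settings are just starting points to experiment with:

```shell
# Cap JPEG quality at 75 and strip metadata (jpegoptim edits in place).
jpegoptim -m75 --strip-all foo-1.jpg

# pngcrush writes an optimized copy; -brute tries many filter/strategy combos.
pngcrush -brute foo-2.png foo-2-crushed.png

# Then rebuild the PDF from the slimmed-down sources.
pdflatex foo.tex
```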
> I've been trying it with ImageMagick and a file that I want to compress, but so far without achieving much compression.
AFAICT ImageMagick operates on PDFs by calling gs (Ghostscript) to do all the work. You could ask #imagemagick on freenode.
> The ImageMagick identify command, applied to one of the original pages, shows:
>   PDF 595x842 595x842+0+0 16-bit Bilevel DirectClass 63.2KB 0.000u 0:00.009
> I've been experimenting a little with the -compress, -density and -quality options of the convert command, but without as much progress as I would prefer. In most cases the output is larger than the input.
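Given that identify reports the pages as Bilevel, one combination worth trying is Group4 (CCITT fax) compression, which is designed for 1-bit scans. A sketch, with placeholder filenames and a rasterization density you'd want to tune:

```shell
# -density sets the dpi at which the PDF pages are rasterized;
# -monochrome forces 1-bit output; Group4 is the fax codec for bilevel data.
convert -density 300 in.pdf -monochrome -compress Group4 out.pdf
```

Note that too low a -density will make the text fuzzy, and too high will bloat the file again, so it pays to bisect on a single page first.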
I don't think ImageMagick is the best tool for this. However, I did recently have success improving scanned receipts (which the scanner gave as JPEG-in-a-PDF) by extracting the pages with pdftoimage, then using ImageMagick to reduce each image to a quarter of its size and convert it to a monochrome PNG. Don't forget +repage when you resize.
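The workflow above might look something like this. I'm assuming poppler's pdftoppm as the extraction step (that, or pdfimages, is presumably what "pdftoimage" refers to); filenames and the 25% resize factor are illustrative:

```shell
# Rasterize each page to a PNG: page-1.png, page-2.png, ...
pdftoppm -r 300 -png scan.pdf page

# Shrink to a quarter size, reset the canvas (+repage), force monochrome.
for f in page-*.png; do
  convert "$f" -resize 25% +repage -monochrome "small-$f"
done

# Reassemble into a PDF, using Group4 compression for the bilevel images.
convert small-page-*.png -compress Group4 out.pdf
```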