ansaurus

Question

How to embed external OCR into existing PDF?

Answer 1

A:

If all you want to do is convert an existing PDF to grayscale, try ImageMagick:

convert foo.pdf -colorspace Gray -compress zip gray.pdf

I don't think this will change any other attributes in your PDF.

DaveParillo 2009-10-01 16:15:14

This does not seem to retain the hidden text layer in the PDF. (Tried with ImageMagick 6.4.5.)

Jukka Matilainen 2009-10-05 22:00:18

Odd, because ImageMagick uses Ghostscript to do its image conversion...

DaveParillo 2009-10-06 01:18:12

I also tried it, and also lost the text layer. I used ImageMagick 6.4.5, too.

kepler 2009-10-06 12:53:40

Answer 2

+1 A:

For your follow-up question about processing PDF files without losing the hidden layers: I believe Ghostscript is able to do this. For example, the following command should convert a PDF to grayscale:

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dColorConversionStrategy=/Gray -dProcessColorModel=/DeviceGray -sOutputFile=output.pdf input.pdf
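A minimal sketch of wrapping the command above in a shell function, plus a rough check of whether the hidden text layer survived. The names to_gray and text_words and the GS override variable are made up for illustration. pdftotext (from poppler-utils or Xpdf) is an assumption here and is not mentioned anywhere in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper around the Ghostscript command above.
# Usage: to_gray input.pdf output.pdf
# Set GS to point at a different gs binary if needed (illustrative variable).
to_gray() {
    in=$1
    out=$2
    # Same flags as the original command; only the file names are parameters.
    ${GS:-gs} -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
        -dColorConversionStrategy=/Gray -dProcessColorModel=/DeviceGray \
        -sOutputFile="$out" "$in"
}

# Hypothetical check: count the words pdftotext can extract from a PDF.
# A count of 0 suggests the text layer is missing.
text_words() {
    pdftotext "$1" - | wc -w
}
```

After running to_gray input.pdf output.pdf, comparing text_words input.pdf with text_words output.pdf gives a rough idea of whether the OCR text was kept. Matching counts do not prove the layer is identical, only that extractable text is still there.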

Jukka Matilainen 2009-10-05 22:28:48

Nice, it worked, but the output is not as clean as I wanted. If ImageMagick could convert the PDF without losing the text layer, I would like to process each page with something like: convert \( -white-threshold 50% \) -monochrome ... Maybe there is a way of telling IM how to use GS, like DaveParillo said. I'll check on this later.

kepler 2009-10-06 13:03:33
