ansaurus

Question

How can you find a problem with a programmatically generated PDF?

Answer 1

Validating PDF files can be tricky, primarily because the tools needed to do it properly are very expensive.

Acrobat has a tool (Advanced > Preflight > PDF Analysis > Report PDF syntax issues) that lets you scan a PDF for syntax issues, but that tool can't be accessed programmatically.

Appligent has a tool called pdfHarmony, which is powered by Adobe's PDF Library and can be accessed programmatically, but it is very expensive (US$2500+). This option would give you the best results if you can afford it.

Another option is 3-Heights PDF Analysis & Repair. I don't know what its quality is like, but it is similarly expensive.

The PDF Validator tool on SourceForge might interest you; however, it only analyzes the document's structure, not the content itself, so corrupt images or content streams won't be picked up.

Unfortunately, because analyzing PDF files in detail is so difficult, there aren't really any free tools that do it properly, but I suppose a tool that checks the document's structure is better than nothing.
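To illustrate how shallow a purely structural check can be, here is a minimal shell sketch. It is not the SourceForge tool and not a real validator: it only checks that the file starts with the %PDF- header and that the startxref keyword and the %%EOF marker appear in the last 1024 bytes. The function name pdf_framing_ok and the 1024-byte window are illustrative assumptions, and it assumes head, tail and grep are available.

```shell
# Shallow "framing" check for a PDF file: a sketch, not a validator.
# It only checks that
#   1. the file starts with the "%PDF-" header, and
#   2. "startxref" and "%%EOF" occur in the last 1024 bytes
#      (the window size is an assumption of this sketch).
# Objects, the xref table and content streams are NOT parsed, so a file
# can pass this check and still be badly broken.
pdf_framing_ok() {
    pdf=$1
    if [ ! -f "$pdf" ] || [ ! -r "$pdf" ]; then
        echo "$pdf: not a readable file" >&2
        return 2
    fi

    # 1. Header: the first five bytes should be exactly "%PDF-".
    if [ "$(head -c 5 "$pdf")" != "%PDF-" ]; then
        echo "$pdf: missing %PDF- header" >&2
        return 1
    fi

    # 2. Trailer: search only the tail of the file; LC_ALL=C makes grep
    #    match byte-wise, which is safer on binary PDF data.
    for marker in startxref '%%EOF'; do
        if ! tail -c 1024 "$pdf" | LC_ALL=C grep -q -e "$marker"; then
            echo "$pdf: no $marker near end of file" >&2
            return 1
        fi
    done
    return 0
}
```

For example, pdf_framing_ok report.pdf && echo OK prints OK only if both checks pass. A file that fails here does not even follow the basic layout of a PDF file, but passing says nothing about the objects, images or content streams inside it.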

Rowan 2010-09-03 07:36:57

Answer 2

The "cheapest" (and at the same time quite reliable!) way is to use Ghostscript. Let Ghostscript interpret the PDF and check the exit code it returns: if it reports no problem, the PDF file should be OK. On Windows:

 gswin32c.exe ^
       -o nul ^
       -sDEVICE=nullpage ^
        d:/path/to/file.pdf

The nullpage output device does not create any new file, but Ghostscript reports on stdout/stderr if it encounters an error. Check the value of the %errorlevel% pseudo-environment variable; 0 means "no problems". On Linux:

 gs \
       -o /dev/null \
       -sDEVICE=nullpage \
        /path/to/file.pdf

(Check the exit status with echo $?; a value of 0 means "no problems".)

In case of errors, Ghostscript prints messages that may help you locate the problem. Either way, you can at least positively identify the files that have NO problems: if Ghostscript can process them, Acrobat (Reader) will have no problem rendering them either.
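The Linux command above can be wrapped in a small shell script so that many generated files are checked in one go and Ghostscript's messages are kept for the files that fail. This is a minimal sketch assuming gs is on the PATH; the function names gs_check_pdf and gs_check_all and the <file>.gs.log naming are hypothetical. It also passes -dPDFSTOPONERROR (check that your Ghostscript version documents it): without it, Ghostscript repairs many damaged files, prints a warning and may still exit with 0, so the exit status alone can miss problems.

```shell
# Run Ghostscript's PDF interpreter over one file and report whether it
# succeeded. Assumes "gs" is on the PATH; function names and the
# "<file>.gs.log" naming are hypothetical choices for this sketch.
gs_check_pdf() {
    pdf=$1
    log="$pdf.gs.log"
    # -q                : suppress the startup banner
    # -o /dev/null      : discard output (also implies -dBATCH -dNOPAUSE)
    # -sDEVICE=nullpage : render nothing, just interpret the file
    # -dPDFSTOPONERROR  : stop on PDF errors instead of repairing and continuing
    if gs -q -o /dev/null -sDEVICE=nullpage -dPDFSTOPONERROR "$pdf" >"$log" 2>&1; then
        rm -f "$log"    # no problems reported: the log is not needed
        return 0
    fi
    return 1            # keep the log: Ghostscript's messages explain the failure
}

# Check every file passed as an argument, print one OK/FAIL line per file,
# and return 1 if any file failed (0 otherwise).
gs_check_all() {
    failed=0
    for f in "$@"; do
        if gs_check_pdf "$f"; then
            echo "OK   $f"
        else
            echo "FAIL $f (messages in $f.gs.log)"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, gs_check_all out/*.pdf prints one OK or FAIL line per file and returns 1 if any file failed, so it can be used as a step in a build or test script.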

pipitas 2010-09-06 23:42:57
