Is there a way for PHP to check whether a PDF file stored locally on the server is corrupted? We have a PHP application that deals with a lot of scanned documents converted to PDF, and it would be nice to detect which of them are corrupted so we can alert the user. I looked around but had no luck.
A:
There are versions of PDFlib available that can read PDFs; you could simply try to open and read each page with one of them.
Paul Dixon
2009-08-13 07:31:08
Thank you for the answer. I was looking for a pure PHP solution without third-party apps, but I'll give it a look anyway.
2009-08-13 08:59:01
It's not really a 'third-party app'... PDFlib pCOS, at least, is available as a PHP extension. I've used pCOS to analyse PDFs before (inspecting images, embedded fonts, etc.). I'm not sure how well it would detect 'corruption', but you can definitely test it out for free.
Narcissus
2009-08-13 11:24:58
A:
The problem is that there are many ways a PDF file can be corrupt.
Maybe your best solution would be to find a PDF reading library and try to extract the first word from each page, or something similar. That would at least catch some basic types of corruption.
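For the most basic kind of corruption (a truncated upload, or a file that is not a PDF at all), a pure PHP check with no extension might look like the sketch below. It is only a heuristic and an assumption on my part, not a replacement for parsing with a real PDF library: it looks for the `%PDF-` header near the start and for `startxref` followed by `%%EOF` near the end, and it will not notice damage inside page or image streams. The function name `looksLikeIntactPdf` is hypothetical.

```php
<?php
/**
 * Heuristic structural check (hypothetical helper, not a full validator).
 * Catches truncated or non-PDF files; does NOT detect damaged page content.
 */
function looksLikeIntactPdf(string $path): bool
{
    clearstatcache(true, $path);
    $size = @filesize($path);
    if ($size === false || $size < 16) {
        return false; // missing, unreadable, or far too small to be a PDF
    }

    $fh = @fopen($path, 'rb');
    if ($fh === false) {
        return false;
    }

    // Readers tolerate some junk before the header, so search the first
    // 1024 bytes instead of insisting on offset 0.
    $head = fread($fh, 1024);

    // The trailer ("startxref" then "%%EOF") sits at the end of the file.
    $tailLen = min($size, 1024);
    fseek($fh, -$tailLen, SEEK_END);
    $tail = fread($fh, $tailLen);
    fclose($fh);

    if ($head === false || $tail === false) {
        return false;
    }
    if (strpos($head, '%PDF-') === false) {
        return false;
    }

    $eofPos  = strrpos($tail, '%%EOF');
    $xrefPos = strrpos($tail, 'startxref');

    // Both markers must exist, with startxref before the final %%EOF.
    return $eofPos !== false && $xrefPos !== false && $xrefPos < $eofPos;
}
```

You could run this over the scan directory first and only hand the files that pass to a heavier library-based check (such as trying to read each page), since a file can pass this test and still fail to render.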
James Healy
2009-09-13 01:45:14