Our Django application needs to do a few things with uploaded PDF files:
- Verify that the file is a PDF and isn't corrupted
- Check that the file isn't encrypted
- Count the number of pages
We run into problems with one unfortunately popular application whose idea of an unencrypted PDF export is actually an encrypted PDF file with a blank password. We've been using PyPDF to date, which is unable to read those files because the encryption is non-standard. Many of our users rely on that application, which is a pain.
Another application exported files with a bad MIME type (something other than application/pdf), so whatever we end up using needs to be able to cope with silly choking points like that.
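For the MIME type problem, what we have in mind is ignoring the declared content type and looking at the file's own header instead. Below is a rough sketch of that idea using only the standard library. The function name `looks_like_pdf` and the `scan_bytes` parameter are our own invention, and scanning the first 1024 bytes rather than only the first line is a deliberately lenient choice on our part. It only confirms the `%PDF-` marker is present; it says nothing about corruption, encryption, or page count, which is where we still need a real library.

```python
import io

# Marker that opens a PDF file's header line, e.g. b"%PDF-1.4".
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(fileobj, scan_bytes=1024):
    """Return True if the PDF header marker appears near the start of the file.

    `fileobj` is assumed to be a seekable binary file object (we expect a
    Django upload to behave like one). The read position is restored so a
    PDF library can parse the same object afterwards.
    """
    start = fileobj.tell()
    try:
        head = fileobj.read(scan_bytes)
    finally:
        fileobj.seek(start)
    return PDF_MAGIC in head


if __name__ == "__main__":
    # The declared MIME type is never consulted, only the bytes themselves.
    print(looks_like_pdf(io.BytesIO(b"%PDF-1.4\n% rest of file")))
    print(looks_like_pdf(io.BytesIO(b"GIF89a not a pdf")))
```

A check like this would run before handing the upload to whichever library ends up doing the real parsing.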
Is there an actively maintained, robust PDF library that we could use? Even PDFtk, a CLI utility that a couple of people have recommended, was last updated in 2006.
Any help is appreciated.
Update: To clarify, it can be free or paid-for. Suggest whatever you think is the best option.