views:

632

answers:

3

Although the PDF specification is available from Adobe, it's not exactly the simplest document to read through. PDF allows documents to be encrypted so that either a user password and/or an owner password is required to do various things with the document (display, print, etc). A common use is to lock a PDF so that end users can read it without entering any password, but a password is required to do anything else.

I'm trying to parse PDFs that are locked in this way (to get the same privileges as you would get opening them in any reader). Using an empty string as the user password doesn't work, but it seems (section 3.5.2 of the spec) that there has to be a user password to create the hash for the admin password.

What I would like is either an explanation of how to do this, or any code that I can read (ideally Python, C, or C++, but anything readable will do) that does this so that I can understand what I'm meant to be doing. Standalone code, rather than reading through (e.g.) the gsview source, would be best.

+1  A: 

A plugin for GSview for viewing encrypted PDFs is here.

If this works for you, you may be able to look at the source.

DMC
+1  A: 

If I remember correctly, there is a fixed padding string of 32 (?) bytes to apply to any password. All passwords need to be 32 bytes at the start of computing the encryption key, either by truncating or adding some of those padding bytes.

If no user password was set you simply have to pad with all 32 bytes of the string, i.e. use the 32 padding bytes as the starting point for computing the encryption key.

I have to admit it's been a while since I've done this, I do remember that the encryption part of the PDF is an absolute mess as it got changed significantly in nearly every revision, requiring you to cope with a lot of cases to handle all PDF's.

Good luck.

Pieter
A: 

xpdf is probably a good reference implementation for this sort of problem. I have successfully used them to open encrypted pdfs before.

tfinniga