How can I convert pdf files from version 1.1 to 1.4 (or higher)?

Actually, I need some sort of command-line tool for batch converting, or some API so that I can convert several documents dynamically.

A: 

You most likely will need the full version of Adobe Acrobat. (As opposed to the free version, Adobe Reader.)

JYelton
Actually I need some sort of command line tool for batch converting or some API to be able to convert dynamically.
You should add that requirement to the details of your question.
JYelton
A: 

PDF 1.1 is forward compatible with PDF 1.4: everything in PDF 1.1 will work in PDF 1.4, as the spec guarantees. Let's assume that you have some justifiable reason why this is not good enough for you (for example, a non-spec-compliant tool that consumes PDF and explodes on any file version less than 1.4).

We can focus on the main syntactic differences between versions.

All PDF files have a header somewhere in the first 1024 bytes. In most cases, it's the very first line, but that's not guaranteed (I'm looking at you GhostScript!). The header looks like this in PDF 1.1:

%PDF-1.1

In PDF 1.4, it looks like this:

%PDF-1.4

So in theory, all you need is a tool that will look in the first 1024 bytes of a file for "%PDF-1.1" and change it to "%PDF-1.4". You could use sed, perl, etc. to do something like that for you. You could write it in C, and you would be tempted to do something like this:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define PDFHEADERSIZE 1024

bool ChangeFileToNewPdfVersion(const char *file)
{
    char buf[PDFHEADERSIZE + 1];
    char *replacePoint = NULL;
    size_t n;
    FILE *fp = fopen(file, "r+b"); /* "rw" is not a valid mode; open for binary update */
    if (fp == NULL) return false;
    n = fread(buf, 1, PDFHEADERSIZE, fp); /* a file shorter than 1024 bytes is still acceptable */
    buf[n] = '\0';
    if ((replacePoint = strstr(buf, "%PDF-1.1")) == NULL) { fclose(fp); return false; }
    replacePoint[7] = '4';
    /* a seek is required before switching from reading to writing */
    if (fseek(fp, 0, SEEK_SET) != 0 || fwrite(buf, 1, n, fp) != n) { fclose(fp); return false; }
    return fclose(fp) == 0; /* fclose flushes the pending write */
}

which will work in most sane cases. It will not work if, for example, the file has zero bytes before the header: strstr treats them as null terminators and stops searching at the first one.

A better choice (really) would be to cobble together a simple state machine that looks for %PDF-1. by reading one byte at a time until it either finds it or passes offset 1017 (1024 less the length of "%PDF-1."). It then reads the next byte and, if that byte is a '1', seeks back one byte and writes a '4'.
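A minimal sketch of that state machine, under a few assumptions: the names bump_pdf_version, bump_pdf_version_file and PDF_HEADER_WINDOW are made up for illustration, the scan window is taken to be the first 1024 bytes, and a return value of false simply means "nothing was changed" (no header found, a minor version other than 1, or an I/O error). Because it compares one byte at a time, zero bytes before the header do not stop the search.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define PDF_HEADER_WINDOW 1024 /* assumed scan window, in bytes */

/* Scan the start of a stream opened in binary update mode ("r+b") for
   "%PDF-1." and, if the byte that follows is '1', overwrite it with '4'.
   Returns true only if the byte was rewritten. */
bool bump_pdf_version(FILE *fp)
{
    static const char marker[] = "%PDF-1.";
    const size_t marker_len = sizeof marker - 1;
    size_t matched = 0; /* state: how many marker bytes have matched so far */
    long pos = 0;       /* number of bytes consumed from the stream */
    int c;

    assert(fp != NULL);
    if (fseek(fp, 0, SEEK_SET) != 0) return false;

    while (pos < PDF_HEADER_WINDOW && (c = fgetc(fp)) != EOF) {
        pos++;
        if (c == marker[matched]) {
            matched++;
        } else {
            /* '%' occurs only at the start of the marker, so after a
               mismatch the only possible partial match has length 1 */
            matched = (c == marker[0]) ? 1 : 0;
        }
        if (matched == marker_len) {
            /* pos is now the offset of the minor version digit */
            if (fgetc(fp) != '1') return false;
            /* a seek is required between reading and writing */
            if (fseek(fp, pos, SEEK_SET) != 0) return false;
            if (fputc('4', fp) == EOF) return false;
            return fflush(fp) == 0;
        }
    }
    return false;
}

/* Convenience wrapper that opens the file by name. */
bool bump_pdf_version_file(const char *path)
{
    FILE *fp = fopen(path, "r+b");
    bool ok;

    if (fp == NULL) return false;
    ok = bump_pdf_version(fp);
    if (fclose(fp) != 0) ok = false;
    return ok;
}
```

For batch use, you could call bump_pdf_version_file once per path from a small main that loops over argv. Only a single byte is rewritten in place, so the rest of the file, including its cross-reference offsets, is left untouched.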

The only other thing you would need to worry about is that PDF 1.4 suggests that the document catalog should contain a Version key with the file version. Since this is defined as optional in the spec, you are safe to ignore it.

So this will solve your problem. I do not, however, believe that you should need to do this. Really.

You should take some time to read part of the PDF spec, specifically section I.2 about version numbers and compatibility.

plinth
The Adobe spec contains the following implementation note: "14. Acrobat viewers also accept a header of the form %!PS-Adobe-N.n PDF-M.m". It may be worth looking for this form too.
Bobrovsky