I was wondering if anyone had any experience in working programmatically with .pdf files. I have a .pdf file and I need to crop every page down to a certain size.

After a quick Google search I found the pyPdf library for Python, but my experiments with it failed. When I changed the cropBox and trimBox attributes on a page object, the results were not what I expected and appeared to be quite random.

Has anyone had any experience with this? Code examples would be much appreciated, preferably in Python.

A: 

You are probably looking for a free solution, but if you have money to spend, PDFlib is a fabulous library. It has never disappointed me.

Ned Batchelder
A: 

You can convert the PDF to PostScript (pdftops or pdf2ps) and then use text processing on the PostScript file. After that you can convert the output back to PDF (pstopdf or ps2pdf).

This works nicely if the PDFs you want to process are all generated by the same application and are somewhat similar. If they come from different sources it is usually too hard to process the PostScript files because the structure varies too much. But even then you might be able to fix page sizes and the like with a few regular expressions.

mdorseif
A: 

The Acrobat JavaScript API has a setPageBoxes method, but Adobe doesn't provide any Python code samples, only C++, C# and VB.

A: 
danio
This code has the same effect as the code I was experimenting with; the pages of the resulting document were certainly cropped but all blank. Any ideas why that might be?
johannth
You've probably checked this, but all I can think of is that you are cropping to a small area of the PDF that is blank. If you have access to Acrobat Pro you can use the crop pages tool to show all the page boxes. I don't know of any free tools that can do this. Maybe Evince or Okular on Linux?
danio
I feel really stupid. I misread the API and assumed that the cropBox was specified as upperLeft, lowerRight. So I was indeed just cropping to a blank part of the page.
johannth
Easily done if you are used to working with screen co-ordinates with the origin at top-left. Took me a while to get used to having the origin at bottom-left in PDF but now I am so used to it I find it jarring to switch back to top-left for screen layout work!
danio
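
For anyone who hits the same blank-page problem, here is a minimal sketch of the coordinate flip discussed in the comments above. It is not the code from the answer. The function name `top_left_rect_to_pdf_box` and the example numbers are made up for illustration, and it assumes the page's media box starts at (0, 0) and that all values are in PDF points.

```python
def top_left_rect_to_pdf_box(left, top, width, height, page_height):
    """Convert a rectangle given in top-left-origin (screen-style) coordinates
    into PDF-style (lower_left, upper_right) corner points.

    PDF user space has its origin at the bottom-left of the page with y
    increasing upwards, so only the y values need to be flipped.
    Assumes the page's media box starts at (0, 0).
    """
    lower_left = (left, page_height - (top + height))
    upper_right = (left + width, page_height - top)
    return lower_left, upper_right


# Hypothetical example: a US Letter page (612 x 792 points), keeping a
# 400 x 300 point region whose top-left corner is 50 points in from the
# left edge and 100 points down from the top edge.
lower_left, upper_right = top_left_rect_to_pdf_box(
    50, 100, 400, 300, page_height=792
)
print(lower_left, upper_right)
```

With pyPdf, the two points would then presumably be assigned to `page.cropBox.lowerLeft` and `page.cropBox.upperRight` before the page is added to the writer. Check the attribute names against the version of the library you have installed.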