Hi,
I need to add some extra text to an existing PDF using Python. What is the best way to go about this, and what extra modules will I need to install?

Note: Ideally I would like to be able to run this on both Windows and Linux, but at a push Linux only will do.

Thanks in advance.
Richard.

Edit: pyPDF and ReportLab look good, but neither one will allow me to edit an existing PDF. Are there any other options?

A: 

You may have better luck breaking the problem down into converting PDF into an editable format, writing your changes, then converting it back into PDF. I don't know of a library that lets you directly edit PDF but there are plenty of converters between DOC and PDF for example.

Wahnfrieden
Problem is that I only have the source in PDF (from a 3rd party) and PDF -> DOC -> PDF will lose a lot in the conversion. Also I need this to run on Linux so DOC may not be the best choice.
Frozenskys
I believe Adobe keeps PDF editing capability pretty closed and proprietary so that they can sell licenses for their better versions of Acrobat. Maybe you can find a way to automate the usage of Acrobat Pro to edit it, using some kind of macro interface.
Wahnfrieden
If the parts you want to write to are form fields, there are XML interfaces for editing them. Otherwise I can't find anything.
Wahnfrieden
No I just wanted to add a few lines of text to each page.
Frozenskys
+1  A: 

Have you tried pyPdf?

Sorry, it doesn’t have the ability to modify a page’s content.

Looks like that might work. Has anyone used it? What's the memory usage like?
Frozenskys
It does have the ability to add a text watermark and if it was formatted properly it might work.
Frozenskys
A: 

If you're on Windows, this might work:

PDF Creator Pilot

There's also a whitepaper on a PDF creation and editing framework in Python. It's a little dated, but it may give you some useful info:

Using Python as PDF Editing and Processing Framework

thedz
The white paper looks good but is a little light on code, and I don't really have the resource to implement a whole PDF framework myself! ;)
Frozenskys
+7  A: 

I know this is an older post, but I spent a long time trying to find a solution. I came across a decent one using only ReportLab and PyPDF so I thought I'd share:

  1. Read your PDF using PdfFileReader(); we'll call this input.
  2. Create a new PDF containing the text to add using ReportLab, writing it to an in-memory string buffer (such as a StringIO object) rather than to a file.
  3. Read that buffer using PdfFileReader(); we'll call this text.
  4. Create a new PDF object using PdfFileWriter(); we'll call this output.
  5. Iterate through the pages of input and call .mergePage(text.getPage(0)) on each page you want the text added to, then use output.addPage() to add each modified page to the new document.

This works well for simple text additions. See PyPDF's sample for watermarking a document.

dwelch
"create a new pdf containing your text to add using ReportLab, save this as a string object" How do you do that? It's a canvas instance.
Lakshman Prasad