Hello,

I am after a pure Python solution (for the GAE) to convert webpages to pdf.

I had a look at reportlab, but its documentation focuses on generating PDFs from scratch rather than converting from HTML.

What do you recommend? - pisa?

Edit: My use case is that I have an HTML report that I want to make available as a PDF too. I will be updating the report's structure, so I don't want to maintain a separate PDF version but (hopefully) convert automatically.
Also, because I generate the report HTML myself, I can ensure it is well-formed XHTML to make the PDF conversion easier.

+3  A: 

Have you considered pyPdf? I doubt it has anywhere near the functional richness you require, but it IS a start, and it is pure Python. The PdfFileWriter class would be the one to generate PDF output; unfortunately, it requires PageObject instances and doesn't provide real ways to put those together, except by extracting them from existing PDF documents. Unfortunately, all the richer PDF page-generation packages I can find appear to depend on reportlab or other non-pure-Python libraries :-(.

Alex Martelli
What's not pure-Python about ReportLab? AFAIK the C extension is optional and for performance acceleration only.
Vinay Sajip
Alex Martelli
I had also heard reportlab was pure Python...pyPdf seems too low level for my need, because I'm not trying to create a PDF from scratch.
Plumo
Apparently reportlab has optional C modules to run faster. And PIL can be used on GAE: http://code.google.com/appengine/docs/python/images/installingPIL.html
Plumo
@Richard, your total misconception about PIL on GAE is very common, let me try once again to clear it up: with GAE in real service you get a microscopic image-manipulation API that's less than 1/100 the PIL functionality; the GAE SDK can emulate that tiny API based on local installs of PIL, that **DOESN'T** mean you'll get PIL when you run your GAE app on Google's servers. And freetype2 doesn't seem an "optional to run faster C module" to me: how are you going to deal with fonts when freetype2's not around, fast or slow as you may be?!
Alex Martelli
that's interesting about PIL - thanks! The official documentation is somewhat limited here...
Plumo
@Richard, you're welcome -- I don't think GAE's docs are missing anything in this regard (the URL you gave clearly mentions being about the **SDK**, and the images API's docs clearly identify the tiny set of functionality that's supported), but people's wishful thinking apparently overwhelms their ability to read clear, unambiguous docs stating that their heart's fondest wish is **not** fulfilled; I'm thinking of timber suppliers able to ship 2by4s in solid wood in quantity as the only remaining approach, cricket and baseball bats being too pricey in the needed volumes;-).
Alex Martelli
point taken ...
Plumo
+2  A: 

What you're asking for is a pure Python HTML renderer, which is a big task to say the least ('real' renderers like webkit are the product of thousands of hours of work). As far as I'm aware, there aren't any.

Instead of looking for an HTML to PDF converter, what I'd suggest is building your report in a format that's easily converted to both - for example, you could build it as a DOM (a set of linked objects), and write converters for both HTML and PDF output. This is a much more limited problem than converting HTML to PDF, and hence much easier to implement.
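As a rough illustration of that idea, here is a minimal sketch in plain Python. Every name in it (Heading, Paragraph, Table, render_html, render_pdf_ops) is hypothetical and invented for this example. The second renderer does not produce a PDF itself; it only emits a flat list of drawing instructions that a PDF backend of your choice would then have to turn into pages, and that last step is assumed rather than shown.

```python
from xml.sax.saxutils import escape

# Hypothetical document model: a report is just a list of these nodes.
class Heading(object):
    def __init__(self, text, level=1):
        self.text = text
        self.level = level

class Paragraph(object):
    def __init__(self, text):
        self.text = text

class Table(object):
    def __init__(self, header, rows):
        self.header = header  # list of column titles
        self.rows = rows      # list of rows, each a list of cell strings


def render_html(nodes):
    """Convert the document model to an HTML fragment."""
    parts = []
    for node in nodes:
        if isinstance(node, Heading):
            parts.append("<h%d>%s</h%d>" % (node.level, escape(node.text), node.level))
        elif isinstance(node, Paragraph):
            parts.append("<p>%s</p>" % escape(node.text))
        elif isinstance(node, Table):
            rows = ["<tr>%s</tr>" % "".join("<th>%s</th>" % escape(c) for c in node.header)]
            for row in node.rows:
                rows.append("<tr>%s</tr>" % "".join("<td>%s</td>" % escape(c) for c in row))
            parts.append("<table>%s</table>" % "".join(rows))
        else:
            raise TypeError("unknown node type: %r" % (node,))
    return "\n".join(parts)


def render_pdf_ops(nodes):
    """Convert the same model to backend-neutral drawing instructions.

    A real PDF backend (an assumption, not shown here) would map each
    tuple to its own heading, paragraph, or table object.
    """
    ops = []
    for node in nodes:
        if isinstance(node, Heading):
            ops.append(("heading", node.level, node.text))
        elif isinstance(node, Paragraph):
            ops.append(("paragraph", node.text))
        elif isinstance(node, Table):
            ops.append(("table", [list(node.header)] + [list(r) for r in node.rows]))
        else:
            raise TypeError("unknown node type: %r" % (node,))
    return ops


report = [
    Heading("Monthly report"),
    Paragraph("Totals are shown below."),
    Table(["Item", "Count"], [["Widgets", "12"], ["Gadgets", "7"]]),
]
html_output = render_html(report)
pdf_instructions = render_pdf_ops(report)
```

Because both renderers walk the same list of nodes, a change to the report structure only has to be made once; each output format only needs to know how to draw the handful of node types you actually use.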

Nick Johnson
that's a pity...
Plumo
+2  A: 

Pisa claims to support what I want to do:

pisa is a html2pdf converter using the ReportLab Toolkit, the HTML5lib and pyPdf. It supports HTML 5 and CSS 2.1 (and some of CSS 3). It is completely written in pure Python so it is platform independent. The main benefit of this tool is that a user with Web skills like HTML and CSS is able to generate PDF templates very quickly without learning new technologies. Easy integration into Python frameworks like CherryPy, KID Templating, TurboGears, Django, Zope, Plone, Google AppEngine (GAE) etc.

So I will investigate it further.

Plumo
Have you successfully integrated Pisa into your GAE project?
systempuntoout
This guy explains it better than I could: http://blog.notdot.net/2010/04/Generating-PDFs-on-App-Engine-Python-and-introducing-Mapvelopes
Plumo
+1  A: 

Our tool is implemented as a service, so you can call it from any language. You just pass it the URL of the XHTML that you want to convert, and it returns the PDF. It works with CSS and renders pretty much as you would see in a web browser.

Disclaimer: I work for the company that produces it, but it is pretty slick and easy.

http://fourpdf.com/

Regards, Jake.

Jake Liddell