I need to automatically generate a PDF file from an existing (X)HTML document. The input files (reports) use a rather simple, table-based layout, so support for really fancy JavaScript/CSS stuff is probably not needed.

As I am used to working in Java, a solution that can easily be used in a Java project is preferable. It only needs to work on Windows systems, though.

One way to do it that is feasible, but does not produce good-quality output (at least out of the box), is using CSS2XSLFO (http://re.be/css2xslfo/index.xhtml) and Apache FOP to create the PDF files. The problem I encountered was that while CSS attributes are converted nicely, the table layout is pretty messed up, with text flowing out of the table cells.

I also took a quick look at Jrex, a Java-API for using the Gecko rendering engine.

Is there maybe a way to grab the rendered page from the Internet Explorer rendering engine and send it to a PDF printer tool automatically? I have no experience in OLE programming on Windows, so I have no clue what's possible and what is not.

Do you have an idea?

Edit: The Flying Saucer/iText approach looks very promising. I will try to go with that.

Thanks for all the answers.

+1  A: 

If you have the funding, nothing beats Prince XML, as this video shows.

Ólafur Waage
A: 

You can use a headless Firefox with an extension. It's pretty annoying to get running, but it does produce good results.

Check out this answer for more info.

rojoca
+12  A: 

The Flying Saucer XHTML renderer project has support for outputting XHTML to PDF. Have a look at an example here.
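
Since Flying Saucer works on XHTML, the input has to be well-formed XML, so it can help to check the reports before handing them to the renderer. Below is a minimal sketch of such a pre-check using only the JDK's built-in XML parser. The class name XhtmlCheck and the method isWellFormed are made up for illustration and are not part of Flying Saucer; the actual rendering call is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: checks that a report is well-formed XML before rendering. */
final class XhtmlCheck {

    /** Returns true if the markup parses as XML, false if the parser rejects it. */
    static boolean isWellFormed(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without logging them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // Assumption: the input has no DOCTYPE pointing at an external DTD;
            // if it does, the parser may try to fetch that DTD.
            builder.parse(new InputSource(new StringReader(xhtml)));
            return true;
        } catch (SAXException e) {
            // Unclosed tags, bad nesting, undefined entities and similar problems.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("XML parser could not be used", e);
        }
    }
}
```

A report that fails this check would need to be cleaned up (for example, by closing unclosed tags) before an XHTML renderer can process it.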

Mark
+8  A: 

Check out iText; it is a pure Java PDF toolkit which has support for reading data from HTML. I used it recently in a project when I needed to pull content from our CMS and export as PDF files, and it was all rather straightforward. The support for CSS and style tags is pretty limited, but it does render tables without any problems (I never managed to set column width though).

Creating a PDF from HTML goes something like this:

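// Assumes iText 2.x (com.lowagie packages); `out` is an OutputStream, `html` is a String.
import java.io.StringReader;
import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.html.simpleparser.HTMLWorker;
import com.lowagie.text.pdf.PdfWriter;
Document doc = new Document(PageSize.A4);
PdfWriter.getInstance(doc, out);   // bind the document to the output stream
doc.open();
HTMLWorker hw = new HTMLWorker(doc);
hw.parse(new StringReader(html));  // convert the HTML and add it to the document
doc.close();                       // finish writing the PDF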
fred-o
+1  A: 

If you look at the side bar of your question, you will see many related questions...

In your context, the simplest method might be to install a PDF print driver like PDFCreator and just print the page to it.

PhiLho
A: 

There is a PHP class which can perform such an operation.

The web site is at http://www.rustyparts.com/pdf.php

Jon Winstanley
+1  A: 

Is there maybe a way to grab the rendered page from the Internet Explorer rendering engine and send it to a PDF printer tool automatically?

This is how ActivePDF works, which is good because it means you know what you'll get, and it actually has reasonable styling support.

It is also one of the few packages I found (when looking a few years back) that actually supports the various page-break CSS commands.


Unfortunately, the ActivePDF software is very frustrating: since it has to launch the IE browser in the background for conversions, it can be quite slow, and it is not particularly stable either.

There is a new version currently in beta which is supposed to be much better, but I've not actually had a chance to try it out, so I don't know how much of an improvement it is.

Peter Boughton
Thanks for the helpful answer. I don't think ActivePDF is really suitable because of the price, but it's good to know something like that exists.
panschk
A: 

If you have PHP installed, try FPDF. It's at http://fpdf.org

I already used FPDF for a web project. It is not useful for what I want to do, because you have to build your PDF document step by step. You can't just feed it an HTML document and get a PDF doc back.
panschk
+3  A: 

Did you try wkhtmltopdf?

It's an open-source tool built on the WebKit rendering engine. Both are free.

We've set up a small tutorial here.
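
Since wkhtmltopdf is a command-line tool, one way to use it from a Java project is to start it as an external process. The following is only a rough sketch under that assumption: the class WkhtmltopdfRunner and its methods are made-up names, and it relies on the basic invocation form `wkhtmltopdf <input> <output>` with the executable available on the PATH or passed in explicitly.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that shells out to the wkhtmltopdf command-line tool. */
final class WkhtmltopdfRunner {

    /** Builds the argument list: executable, input (file or URL), output PDF. */
    static List<String> buildCommand(String executable, String input, Path outputPdf) {
        return Arrays.asList(executable, input, outputPdf.toString());
    }

    /** Runs the conversion and returns the exit code of the process. */
    static int convert(String executable, String input, Path outputPdf)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(executable, input, outputPdf))
                .inheritIO() // forward the tool's console output to our own console
                .start();
        return process.waitFor();
    }
}
```

A call such as `WkhtmltopdfRunner.convert("wkhtmltopdf", "report.html", Paths.get("report.pdf"))` would then block until the tool exits; a non-zero exit code should be treated as a failed conversion.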

Mic
For a straight HTML-page-to-PDF conversion, this is better than anything else I've seen, free or commercial.
MGOwen
A: 

I believe dompdf hasn't been mentioned yet.