What is the easiest (and fastest) way to perform this kind of transformation: "Data in XML" to "Some MS Word 2003 supported format" to PDF, using Java?

My first guess was to fill a template with the XML data (using placeholders, for example), then save it and convert it to PDF. But I can't just put placeholders into DOC files, and I can't convert from the other Word formats to PDF...

My primary task is to convert XML data to PDF while allowing users to change the PDF on demand. The best way to change the PDF on demand seems to be to give the user some kind of MS Word-readable document and then convert it back.

There are three main constraints on this task: 1) I can't use OpenOffice for the conversion. 2) The system should be able to convert about one page of a table-based document per second on a 2 GHz core. 3) RTF does not provide enough styling, so a more complex format has to be used.

Thanks in advance.

+2  A: 

To convert XML data to PDF (among other output formats), have a look at Apache FOP. It is a processing engine that renders its output from XSL-FO, as defined by the W3C.

This solution is arbitrarily powerful in its styling and extremely easy to automate. The drawback, however, is that you'll need XSL stylesheets, which may not be easy for end users to manipulate.
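As a rough illustration of the stylesheet side, the sketch below turns a small XML input into an XSL-FO document using only the JDK's built-in XSLT support. The input vocabulary (`items`/`item`), the class name `XmlToFo` and the page geometry are assumptions made up for this example, and handing the resulting FO to FOP for rendering is deliberately left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlToFo {

    // Hypothetical stylesheet: one fo:block per <item> of the (made-up) input format.
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:fo='http://www.w3.org/1999/XSL/Format'>",
        "  <xsl:template match='/items'>",
        "    <fo:root>",
        "      <fo:layout-master-set>",
        "        <fo:simple-page-master master-name='page'",
        "            page-height='29.7cm' page-width='21cm' margin='2cm'>",
        "          <fo:region-body/>",
        "        </fo:simple-page-master>",
        "      </fo:layout-master-set>",
        "      <fo:page-sequence master-reference='page'>",
        "        <fo:flow flow-name='xsl-region-body'>",
        "          <xsl:for-each select='item'>",
        "            <fo:block font-size='12pt'><xsl:value-of select='.'/></fo:block>",
        "          </xsl:for-each>",
        "        </fo:flow>",
        "      </fo:page-sequence>",
        "    </fo:root>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Compile the stylesheet once; a Templates object is thread-safe and reusable.
    static final Templates TEMPLATES = compile();

    static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSLT)));
        } catch (TransformerException e) {
            throw new IllegalStateException("Stylesheet could not be compiled", e);
        }
    }

    // Applies the compiled stylesheet to the given XML and returns the FO document as text.
    static String toFo(String xml) {
        try {
            StringWriter out = new StringWriter();
            TEMPLATES.newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFo("<items><item>First row</item><item>Second row</item></items>"));
    }
}
```

Compiling the stylesheet once and reusing the `Templates` object avoids re-parsing it on every request, which matters when many small documents are generated.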

Does anyone here have an idea how to make editing XSL-FO stylesheets a task that end users are capable of?

mkoeller
Apache FOP does not meet the performance requirements. A 1-page document takes 2-3 seconds to process here, while 1 second is the maximum allowed time. Also, end users should have a way to edit the resulting document, not the template.
Max
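When checking a converter against the one-page-per-second target, keep in mind that JVM code usually runs slower on its first invocations because of class loading and JIT compilation, so a single cold run can overstate the steady-state cost. The harness below is a minimal sketch for measuring throughput after a warm-up phase. The class name `ThroughputCheck` and the placeholder workload are hypothetical, and it says nothing about how any particular library actually performs.

```java
public class ThroughputCheck {

    // Runs the conversion `warmup` times untimed, then `runs` times timed,
    // and returns the measured runs per second (one run = one page here).
    static double pagesPerSecond(Runnable conversion, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            conversion.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            conversion.run();
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return runs * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace it with the real XML-to-PDF call under test.
        Runnable fakeConversion = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("unexpected empty result");
            }
        };
        double rate = pagesPerSecond(fakeConversion, 5, 20);
        System.out.println(rate >= 1.0 ? "meets 1 page/sec" : "below 1 page/sec");
    }
}
```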
+1  A: 

Yeah, good luck with this one. I looked into this problem before for a project that thankfully fell through.

DOC is a difficult format to deal with, but I would recommend taking a look at Apache POI. For dealing with PDFs I recommend iText, though the previously mentioned FOP may be better at going from XML to PDF. I don't have any experience with it, so I couldn't say one way or the other.

James McMahon
A: 

I was looking at something similar about a year ago. The Apache FOP solution is good for XML to PDF. However, you might find it a little code-centric.

Personally, I went with the server control from TxTextControl, which allowed me to serve PDF certificates to website users.

Phil Hannent
A: 

The best way to change the PDF on demand seems to be to give the user some kind of MS Word-readable document and then convert it back.

You might want to consider Acrobat Forms instead, which gives users the ability to fill out fields directly in PDF files. See http://www.planetpdf.com/mainpage.asp?WebPageID=338 for example.

Ilja Preuß
Well, I need to generate the document first, for example to create a table whose rows share the same style but vary in number.
Max
I'm sure you can generate PDF forms programmatically, too.
Ilja Preuß
+3  A: 

You could use Word 2003's own XML-based file format (WordML) as an intermediate format.

Tools for creating such documents from XML input data are available in the Word 2003: XML Software Development Kit (SDK), which you can download from Microsoft:

http://www.microsoft.com/downloads/details.aspx?familyid=ca83cb4f-8dee-41a3-9c25-dd889aea781c&displaylang=en

One option would be to generate an XSLT stylesheet using the tools in the SDK (there should be a sample included) and run this XSL transformation from Java. Users can then edit the output document and create the PDF on demand.
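Running such a transformation from Java needs nothing beyond the JDK's `javax.xml.transform` API. The sketch below is only an illustration: the stylesheet is a hand-written stand-in for whatever the SDK tools would generate, the input format (`items`/`item` with `name` and `price` attributes) and the class name `XmlToWordMl` are invented for the example, and the output has not been validated against Word. The namespace URI and the `mso-application` processing instruction are the ones used by the Word 2003 XML format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlToWordMl {

    // Hypothetical stylesheet: emits one table row per <item> in the input.
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:w='http://schemas.microsoft.com/office/word/2003/wordml'>",
        "  <xsl:output method='xml' encoding='UTF-8'/>",
        "  <xsl:template match='/items'>",
        "    <xsl:processing-instruction name='mso-application'>"
            + "progid=\"Word.Document\"</xsl:processing-instruction>",
        "    <w:wordDocument>",
        "      <w:body>",
        "        <w:tbl>",
        "          <xsl:for-each select='item'>",
        "            <w:tr>",
        "              <w:tc><w:p><w:r><w:t><xsl:value-of select='@name'/></w:t></w:r></w:p></w:tc>",
        "              <w:tc><w:p><w:r><w:t><xsl:value-of select='@price'/></w:t></w:r></w:p></w:tc>",
        "            </w:tr>",
        "          </xsl:for-each>",
        "        </w:tbl>",
        "      </w:body>",
        "    </w:wordDocument>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the stylesheet to the input XML and returns the WordML document as text.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<items>"
                + "<item name='Widget' price='9.99'/>"
                + "<item name='Gadget' price='19.50'/>"
                + "</items>";
        System.out.println(transform(input));
    }
}
```

Because the rows come from an `xsl:for-each`, the same stylesheet handles tables with any number of identically styled rows.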

If Word 2007 is an option for you, you could use the PDF conversion functionality available for Office 2007.

0xA3
Hm, is there a way to automate PDF creation using Word 2003? Will it provide enough performance (1 page/sec)?
Max
Depends on what you need to be included in the PDF. You could have a PDF printer and print your document (rather fast but no PDF bookmarks and cross references) or use Adobe Acrobat/Distiller (with bookmarks/crossrefs but slow). Word 2007 has a good and fast PDF creator though.
0xA3
A: 

The XML Hacks book offers 2 solutions:

  1. hack # 47: a commercial tool to create PDF files from XML + CSS: Prince; see YesLogic and Prince XML. There's a "personal" version available for non-commercial use, which puts a logo on page one, but which is good enough to try out if it's something for you.

  2. hack # 48: XSL-FO, using Apache FOP. You can imagine XSL-FO as a page layout file format, in an XML file.

To me it looks like the first solution is the one that will give the quickest satisfactory results.

bart
None of these solutions provides a way to modify the PDF after it has been generated, so they won't do.
Max
A: 

DOC to PDF: http://www.dancrintea.ro/doc-to-pdf/

+5  A: 

You can use docx4j for both:

  • data in XML to a docx document
  • docx to pdf

docx4j currently supports 3 ways of doing docx to pdf:

  1. via HTML (using xhtmlrenderer)
  2. via XSL FO (using FOP)
  3. via iText
plutext
A: 

You can try Tweak Word to PDF converter ...

A: 

You can use Aspose.Words for Java.

Data

To populate a document with data, you can use Aspose.Words' reporting engine. You can design your report in a Microsoft Word document and use standard MERGEFIELD fields plus some Aspose.Words extensions for repeatable regions. Then you can generate a report from that document plus your data using Aspose.Words. It can accept a ResultSet or a completely custom MailMergeDataSource object, which you can easily implement to retrieve data from XML, for example.

A bit of info can be found in the Programmers Guide.

Alternatively, if you don't feel like using the reporting engine you can build a document programmatically via a rich API.

PDF

What's most exciting is that you can then convert that document to PDF with a high degree of fidelity, i.e. the document will most often look exactly as if it had been produced by MS Word.

Here is the catch: the sister product, Aspose.Words for .NET, has had conversion to PDF for several years already, but conversion to PDF in Aspose.Words for Java is only in beta at the moment. Hopefully it will come out of beta in a month or so.

Disclaimer: I'm a project lead on the Aspose.Words team.

romeok
A: 

Do the users have access to the web? If so, I think a better solution is XML -> HTML, with the HTML having a form to update details, and a link on the page which retrieves the (original and/or updated) PDF.

igor
A: 

I want to do the same thing on my news website...

What's the best "free" approach to this?

OutsideMMA.com

MMAMail.com
A: 

A frequently used document format is DocBook, which exists in both XML and SGML dialects. Such a document can easily be converted to RTF, PDF, and many other formats.

http://www.docbook.org/

Thorbjørn Ravn Andersen