I've found a superb HTML to PDF converter in Prince XML. Now I'm looking for something of similar quality to produce Word documents from HTML + CSS. This is on PHP/Linux.

A: 

It might be easier to look for a PDF -> DOC converter instead, if you already have one-half of the problem solved. That said, I don't know of any good PDF -> DOC converters either :(

nezroy
A: 

You might want to try sending the HTML file as a DOC (similar to the approach described at the link, but with the proper Content-Type header :-)) and let the end user's system do the conversion (AFAIK you can do something similar with Excel too).
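
A minimal sketch of that idea, written in Python for illustration (on PHP the same headers would be sent with header() calls). The function name html_as_word_response and the default filename are hypothetical, and application/msword is assumed to be the MIME type the client associates with Word:

```python
def html_as_word_response(html_body, filename="report.doc"):
    """Return (headers, body_bytes) for serving an HTML fragment as a .doc download."""
    # Wrap the fragment in a full document and declare the charset so that
    # non-ASCII text is decoded correctly by whatever opens the file.
    document = (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        "</head><body>" + html_body + "</body></html>"
    )
    body = document.encode("utf-8")
    headers = [
        # Tells the browser to treat the payload as a Word document.
        ("Content-Type", "application/msword"),
        # Suggests a .doc filename and forces a download instead of rendering.
        ("Content-Disposition", 'attachment; filename="%s"' % filename),
        ("Content-Length", str(len(body))),
    ]
    return headers, body
```

The payload is still plain HTML; only the headers change, so the quality of the result depends entirely on how well Word renders the markup.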

Cd-MaN
A: 

OpenOffice can be used in server/headless mode to produce documents in lots of formats.
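
As a rough sketch, a headless conversion could be driven from a script like this. It assumes a LibreOffice-style soffice binary on the PATH that supports the --headless, --convert-to and --outdir options; older OpenOffice builds may need a separate converter tool instead, and some versions need an explicit export filter name for HTML input. The function names are hypothetical:

```python
import subprocess


def build_convert_command(html_path, out_dir, binary="soffice"):
    """Build the argument list for a headless HTML -> DOC conversion."""
    return [
        binary,
        "--headless",            # run without a GUI
        "--convert-to", "doc",   # target format, chosen by file extension
        "--outdir", out_dir,     # directory where the converted file is written
        html_path,
    ]


def convert_html_to_doc(html_path, out_dir):
    """Run the conversion; raises CalledProcessError if soffice exits non-zero."""
    subprocess.run(build_convert_command(html_path, out_dir), check=True, timeout=120)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.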

Liam
A: 

Here is one alternative for PDF => DOC. I haven't tried it, so good luck!

asvela
A: 

I had to do this a few years ago and ended up rolling my own custom solution. I created a Word document in the format I wanted, saved it as HTML, and then added code where required to retrieve text from the database and format it the way MS Word likes it. I forced a header to make the client think it was receiving a Word document instead of an HTML file. Microsoft Word happily opened the file as if it were a regular Word document.

If it were feasible to output a DOCX file instead, you could do an XSL transform.

Scott
A: 

Three options depending on what you need to do:

  1. For simple cases, you can just write out the HTML to a .doc file. Sample here. That's limited, though, and prompts the user to save as HTML if they make updates.

  2. If you can require Word 2007, you can generate Office Open XML, which is basically a zip file that holds XML documents. I haven't found a library that can do that, but you can get started by renaming a sample .docx file to .zip, looking at what's included, and then generating the same structure from PHP. There's some info on that in this SO question.

  3. If you need to support Word 2003, you need to work with Word 2003's XML format. It's different from the 2007 format, but it is at least forward compatible (so it will work in Word 2007 as well). The simplest way is to save a document as Word 2003 XML from Word, open it in a text editor, and then get to work writing XSLT that converts your HTML to the correct XML. I've done it, and it worked, but it was a lot of work. There's info on the format here.
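
To illustrate option 2, here is a minimal sketch that writes the three parts a bare-bones .docx package needs into a zip archive. It is in Python for brevity (PHP's ZipArchive class could play the same role), the function name build_docx is hypothetical, and it only handles plain paragraphs, so mapping real HTML + CSS onto WordprocessingML is left out entirely:

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

# Points the package at its main document part.
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def build_docx(paragraphs):
    """Return the bytes of a minimal .docx with one plain paragraph per string."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(text)
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W_NS, body)
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", RELS)
        archive.writestr("word/document.xml", document)
    return buffer.getvalue()
```

Writing the result of build_docx(["First paragraph", "Second paragraph"]) to a file ending in .docx should give a document Word 2007 can open; comparing it with an unzipped sample from Word shows which optional parts (styles, settings, and so on) a fuller generator would add.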

None of those are all that easy, so it might be worth buying a software product that does the conversion for you.

More info on this topic is available in this SO question as well.

Jon Galloway