I would like to programmatically convert a Microsoft Word document into XHTML. My language of choice is PHP, so I would appreciate any PHP-based suggestions.

My initial idea is to convert the .doc file into ODT and then use the Odt2Xhtml PHP class to get it into XHTML format.

Is there a better way to do this?

+2  A: 

The most robust way is to use COM to let Word save the document as HTML.

I don't know whether Word can generate XHTML directly; if it can't, a web search turns up plenty of options for converting the resulting HTML to XHTML.
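A rough sketch of the COM route in PHP, assuming a Windows server with Word installed and PHP's com_dotnet extension enabled. The helper name `wordToHtml()` is hypothetical; `10` is Word's `wdFormatFilteredHTML` save format, which produces leaner HTML than the full "Web Page" format (`8`). Keep the linked KB caveat in mind before running Word unattended on a server.

```php
<?php
// Sketch only: needs Windows, Microsoft Word and the com_dotnet extension.
// wordToHtml() is a hypothetical helper name.

// WdSaveFormat value for "Web Page, Filtered" (wdFormatFilteredHTML).
const WD_FORMAT_FILTERED_HTML = 10;

function wordToHtml(string $docPath, string $htmlPath): void
{
    if (!class_exists('COM')) {
        throw new RuntimeException('COM is unavailable; this needs Windows and com_dotnet.');
    }

    // Word resolves relative paths itself, so hand it absolute ones.
    $source = realpath($docPath);
    if ($source === false) {
        throw new InvalidArgumentException("File not found: $docPath");
    }

    // Start a hidden Word instance through COM automation.
    $word = new COM('Word.Application');
    $word->Visible = false;
    $word->DisplayAlerts = 0; // wdAlertsNone: avoid dialogs that would hang a script

    try {
        $doc = $word->Documents->Open($source);
        // SaveAs(FileName, FileFormat); pass an absolute output path as well.
        $doc->SaveAs($htmlPath, WD_FORMAT_FILTERED_HTML);
        $doc->Close(false); // close without saving changes to the .doc
    } finally {
        $word->Quit(false);
        $word = null;
    }
}
```

The resulting HTML would still need an HTML-to-XHTML pass afterwards.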

ykaganovich
Not especially robust: http://support.microsoft.com/default.aspx?scid=kb;EN-US;257757
ChrisW
+4  A: 

If you're running Linux, one way to go would be to install OpenOffice on the server.

Example instructions for a 'headless' (i.e. no UI) install can be found here.

You could then use a command-line tool such as unoconv, executed from PHP via shell_exec, to do your conversions.
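A minimal sketch of calling unoconv from PHP, assuming unoconv and OpenOffice are installed on the server and that your unoconv build offers an `xhtml` export format (`unoconv --show` lists the formats it supports). The helper names are hypothetical; building the command separately keeps the escaping easy to check.

```php
<?php
// Sketch: assumes unoconv + OpenOffice on the server and an "xhtml" export
// format in this unoconv build. Helper names are hypothetical.

function buildUnoconvCommand(string $inputPath, string $outputDir): string
{
    // escapeshellarg() stops paths with spaces or quotes from breaking the command;
    // 2>&1 folds unoconv's error messages into what shell_exec() returns.
    return sprintf(
        'unoconv -f xhtml -o %s %s 2>&1',
        escapeshellarg($outputDir),
        escapeshellarg($inputPath)
    );
}

function convertWithUnoconv(string $inputPath, string $outputDir): string
{
    $output = shell_exec(buildUnoconvCommand($inputPath, $outputDir));
    // shell_exec() gives null when the command printed nothing (false if the
    // pipe could not be opened). Return any messages as a string; check that
    // the output file exists to confirm the conversion actually worked.
    return is_string($output) ? $output : '';
}
```

Usage would be something like `convertWithUnoconv('/uploads/report.doc', '/tmp/converted')`.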

Ciaran McNulty
A: 

See http://www.codeplex.com/OpenXMLViewer, which includes an XSLT you could adapt (that's what I did in docx4j). Note, however, that the XSLT is not for the faint of heart!
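A rough sketch of applying such an XSLT from PHP. It assumes PHP's zip, dom and xsl extensions, and an adapted stylesheet that takes the WordprocessingML in `word/document.xml` as its input (check what your stylesheet actually expects). This route only handles .docx files, so a legacy .doc would have to be converted first. `docxToXhtml()` is a hypothetical helper name.

```php
<?php
// Sketch: needs the zip, dom and xsl extensions, plus an XSLT that
// transforms word/document.xml. docxToXhtml() is a hypothetical name.

function docxToXhtml(string $docxPath, string $xslPath): string
{
    // A .docx is a zip package; the main body lives in word/document.xml.
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a zip archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }

    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('word/document.xml is not well-formed');
    }

    $xsl = new DOMDocument();
    if (!$xsl->load($xslPath)) {
        throw new RuntimeException("Cannot load stylesheet $xslPath");
    }

    // Run the stylesheet over the document body.
    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);
    $result = $proc->transformToXML($doc);
    if (!is_string($result)) {
        throw new RuntimeException('XSLT transformation failed');
    }
    return $result;
}
```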

plutext
A: 

phpLiveDocx offers a really easy way to convert Microsoft Word documents.

Learn more at the project web site:

http://www.phplivedocx.org

You can also use phpLiveDocx to merge textual data with MS Word templates and save the resulting document to DOC, DOCX, RTF, PDF or TXT.

The component is enterprise-ready and has been written for the Zend Framework.