I need a way to manipulate .docx (Microsoft Office 2007) documents from within PHP.

I need to:

  1. Read the internal text
  2. Convert it to .html
  3. View it inside a browser
  4. Replace text

I know I can use Word Automation by creating a COM object of Microsoft Word, but it's too slow and unstable, and it requires Word to be installed on the server.

Is there any library or code that can do it from PHP?

+2  A: 

There is PHPWord for that by the authors of PHPExcel.

Sarfraz
Unfortunately, PHPWord can only write Word2007 docx files at present. I'm working on a reader in my spare time, but it probably won't be available for a couple of months (too many other demands from PHPExcel at the moment).
Mark Baker
@Mark Baker: He seems to be talking about docx too, and as you say, that's the only format it supports at the moment :)
Sarfraz
Mark, is it only a writer, or is it only for docx?
aviv
@aviv - Currently, PHPWord is only a writer, and only supports .docx files. The long-term intent is to provide both read and write capability for .doc, .docx and even Open Office Writer .odt formats. The architecture allows for easy implementation of new readers and writers for different file formats, but they still need to be coded. Currently I have a partially-coded reader for .docx, but my priorities are still with the more extensively used PHPExcel library, so it may be some time before I get it finished and incorporated into the PHPWord release code.
Mark Baker
@aviv COM may be slow and unstable (actually it's not too bad), but you can also use COM with OpenOffice Writer (http://wiki.services.openoffice.org/wiki/Documentation/DevGuide/ProUNO/Bridge/Automation_Bridge) as an alternative to MSWord; and with XPCOM or CORBA it's possible to mimic COM on other platforms.
Mark Baker
A: 

Docx is just a ZIP file containing multiple XML files and embedded media files like images. Because of this, you can read and edit the document with ease. Just unzip it, open word/document.xml, do reading & writing, and repack the files.
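As a rough illustration of the unzip, edit, repack idea, here is a minimal sketch in PHP. It assumes the zip extension (ZipArchive) and the DOM extension are available; the function names docxReadText and docxReplaceText are made up for this example and are not part of any library.

```php
<?php
// Sketch only: hypothetical helper names; assumes ext-zip and ext-dom are loaded.

const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/** Load word/document.xml from the archive and return it as a DOMDocument. */
function docxLoadBody(ZipArchive $zip): DOMDocument
{
    $xml = $zip->getFromName('word/document.xml');
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return $doc;
}

/** Return the plain text of a .docx, one line per paragraph (w:p). */
function docxReadText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path");
    }
    $doc = docxLoadBody($zip);
    $zip->close();

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('w', W_NS);

    $lines = [];
    foreach ($xpath->query('//w:p') as $p) {
        $text = '';
        foreach ($xpath->query('.//w:t', $p) as $t) {
            $text .= $t->textContent;   // w:t elements hold the visible text
        }
        $lines[] = $text;
    }
    return implode("\n", $lines);
}

/** Copy $src to $dst and replace $search with $replace inside each text run. */
function docxReplaceText(string $src, string $dst, string $search, string $replace): void
{
    if (!copy($src, $dst)) {
        throw new RuntimeException("Cannot copy $src to $dst");
    }
    $zip = new ZipArchive();
    if ($zip->open($dst) !== true) {
        throw new RuntimeException("Cannot open $dst");
    }
    $doc = docxLoadBody($zip);

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('w', W_NS);
    foreach ($xpath->query('//w:t/text()') as $node) {
        // Editing the text node directly lets the DOM handle XML escaping.
        $node->data = str_replace($search, $replace, $node->data);
    }

    // Adding an entry under an existing name replaces it in the archive.
    $zip->addFromString('word/document.xml', $doc->saveXML());
    if (!$zip->close()) {
        throw new RuntimeException("Cannot write $dst");
    }
}
```

One caveat: Word often splits a sentence across several runs, so a search string that spans two w:t elements will not be matched by this per-run replacement.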

Converting to HTML may be difficult. But if the document was saved with a thumbnail, you'll find a preview of the first page in docProps/thumbnail.jpeg.
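A full conversion has to deal with styles, tables, lists and images, but a very crude HTML rendering is within reach. The sketch below is only that: it assumes ext-zip and ext-dom, the name docxToHtml is invented for this example, and it keeps nothing except paragraphs and bold runs.

```php
<?php
// Sketch only: hypothetical function; ignores tables, lists, images and styles.

function docxToHtml(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }

    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

    $html = '';
    // Only top-level paragraphs; paragraphs nested in tables are skipped.
    foreach ($xpath->query('//w:body/w:p') as $p) {
        $inner = '';
        foreach ($xpath->query('.//w:r', $p) as $run) {
            $text = '';
            foreach ($xpath->query('./w:t', $run) as $t) {
                $text .= $t->textContent;
            }
            if ($text === '') {
                continue;
            }
            $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
            // A run is bold when w:rPr contains w:b that is not switched off.
            $bold = $xpath->query(
                './w:rPr/w:b[not(@w:val="0") and not(@w:val="false")]',
                $run
            )->length > 0;
            $inner .= $bold ? "<strong>$text</strong>" : $text;
        }
        $html .= "<p>$inner</p>\n";
    }
    return $html;
}
```

The returned fragment can be echoed into a page for viewing in a browser; anything beyond plain paragraphs would need considerably more mapping work.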

Note that you'll have to familiarize yourself with the XML structure to do any complex edits. There's a summary XML file, docProps/app.xml, which holds some metadata for the document, so don't forget to update it. Read more on Wikipedia: http://en.wikipedia.org/wiki/Office_Open_XML
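To see what docProps/app.xml actually holds before deciding what needs updating, something like the following sketch can dump its simple properties. The function name docxReadAppProperties is hypothetical, and ext-zip and ext-dom are assumed; which properties appear depends on the application that wrote the file.

```php
<?php
// Sketch only: hypothetical function; returns the simple (text-only) entries
// of docProps/app.xml as name => value pairs.

function docxReadAppProperties(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path");
    }
    $xml = $zip->getFromName('docProps/app.xml');
    $zip->close();
    if ($xml === false) {
        return [];   // the part is optional, so a missing file is not an error
    }

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $props = [];
    foreach ($doc->documentElement->childNodes as $node) {
        // Skip whitespace nodes and entries with nested elements.
        if ($node instanceof DOMElement
            && $node->getElementsByTagName('*')->length === 0) {
            $props[$node->localName] = $node->textContent;
        }
    }
    return $props;
}
```

After a text replacement, values that count content (a word or character count, if present) may no longer match the document, which is the kind of entry the advice about updating refers to.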

jmz