I'm looking to develop a server-side application that will process documents. The source documents are mostly MS Word 2003 and 2007 files (the latter being Microsoft's .docx format). The server application should be able to run on both Linux and Windows. What is the best tool or library for reading and writing MS Word files under Linux? Compatibility is the most important consideration: the solution must preserve the source document's formatting, including tables.

I have seen a somewhat similar post here, but it was specific to Python. I don't care what language or libraries are used as long as they are available for Windows and Linux.
The solution must not require MS Word to read the Word files.
I am aware of OpenOffice, but I am looking for a solution that has a high degree of compatibility with MS Word files. I also just came across aspose.com, which looks promising. Has anyone had any experience using Aspose.Words for Java or similar third-party packages? It is pricey at over $2K for an OEM subscription. That said, if it delivers as advertised it may still be the best solution out there.

Thanks. There have been a couple of suggestions, but nothing so far that fits the bill (or the budget).

A: 

Ok, I'll have another go at an answer ;-)

What about using unoconv?

It can convert any document OpenOffice can read into any format OpenOffice can write. You should be able to use it to convert both to and from MS Word documents, provided they're not overly complicated (I've found OpenOffice doesn't handle complex documents very well).

The only caveat is that you need to have an instance of OpenOffice running on the Linux server for unoconv to interact with.
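
For what it's worth, here is a rough sketch of how a server process might shell out to unoconv. It assumes unoconv is on the PATH and that OpenOffice is available as described above. The function names `build_unoconv_command` and `convert` are made up for illustration; only the `-f` (output format) and `-o` (output location) options come from unoconv itself.

```python
import shutil
import subprocess


def build_unoconv_command(source, target_format="docx", output_dir=None):
    """Return the argument list for converting `source` with unoconv."""
    cmd = ["unoconv", "-f", target_format]
    if output_dir is not None:
        # -o sets the output location (a file name or a directory).
        cmd += ["-o", str(output_dir)]
    cmd.append(str(source))
    return cmd


def convert(source, target_format="docx", output_dir=None):
    """Run unoconv and raise if it is missing or exits with an error."""
    if shutil.which("unoconv") is None:
        raise RuntimeError("unoconv not found on PATH")
    cmd = build_unoconv_command(source, target_format, output_dir)
    subprocess.run(cmd, check=True)
```

Something like `convert("report.doc", "docx", "converted")` would then do one conversion. Whether the result keeps the formatting and tables intact still depends entirely on how well OpenOffice handles the particular document.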

Benj
Thanks again. I was aware that OO might be one solution. While I regularly use OO 3, it does have some problems dealing with the old binary Word files, i.e. Word 97-2003. I'm looking to see if there are any more compatible solutions available. Cheers
10ToedSloth
+3  A: 

Have you considered using b2xtranslator to convert binary .doc to .docx? (On Linux, you'd have to run it under Mono.)

You could then use POI or docx4j to manipulate the .docx. It's not a solution if you need to save as .doc, though (unless you use OO for that bit).

plutext
b2xtranslator depends on the .NET System.IO.Packaging namespace, which Mono has only recently supported, and which I think is still a bit flaky.
Charles Stewart
A: 

Mono has recently acquired support for the .NET System.IO.Packaging namespace, which allows some degree of manipulation of .docx files. If the kind of thing you want to do is add or remove resources and recurse over the text, it's probably the right thing.
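
To give an idea of what "resources" and "recursing over the text" mean at the package level, here is a minimal sketch. Note that it is written in Python with only the standard library, not with System.IO.Packaging. A .docx is a zip package of XML parts; the sketch assumes the main part sits at the conventional path `word/document.xml` (strictly, the package's relationships decide that). The helper names, including `make_demo_package`, are invented for the example, and the demo package is only a stand-in, not a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by elements such as w:p and w:t.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_parts(docx):
    """Return the names of all parts (resources) in the package."""
    with zipfile.ZipFile(docx) as package:
        return package.namelist()


def extract_paragraphs(docx):
    """Return the text of each paragraph in the main document part."""
    with zipfile.ZipFile(docx) as package:
        # Assumption: the main part is at the conventional location.
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs


def make_demo_package():
    """Build a tiny in-memory stand-in for a .docx, for trying things out."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>tables </w:t></w:r><w:r><w:t>too</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer
```

Both functions accept a file path or a file-like object, so `extract_paragraphs("report.docx")` works the same way as with the demo buffer. Because the walk is recursive, paragraphs nested inside table cells are picked up too. This only reads text; it does nothing to preserve or rewrite formatting.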

Charles Stewart