I am looking for a Java API that can compare two Microsoft Word documents.

We are running on a Linux server, so we can't install Microsoft Word on it.

EDIT: We want to compare two documents and highlight whatever is not common to both, with a color or in some other way. So I think we have to merge both documents and highlight the content that is not common.

+1  A: 

Apache POI is a Java API that can read Microsoft Word files, which is the first step toward doing this.

Example source code is here.

I found another article that does the same thing in Java, but it uses Windows COM to do so. If you are on Linux, it suggests using a remote Windows machine to do the work. The article contains a detailed explanation: Word from Java

Niyaz
Thanks Niyaz, but I want to compare two documents the way the built-in compare feature in Office 2007 does. When comparing two docs, we have to show the uncommon words and images in bold or in some color. Appreciate your help.
I think you will have to use another library for comparing the contents. So [a library for reading DOC files + a library for content comparison] will do the work for you.
Niyaz
Yes, I am searching the net for one. Anyway, thanks for the help.
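
To illustrate the "content comparison" half of the split suggested in the comments above, here is a minimal sketch in plain Java. It assumes the text of both documents has already been extracted to strings by some reader library (that step is not shown), and it only compares words, not images or formatting. The class name `WordDiff`, the method names, and the `[-removed-]` / `{+added+}` markers are made up for this example; a real solution would map the markers to a highlight color when writing the merged document.

```java
import java.util.ArrayList;
import java.util.List;

public class WordDiff {

    /**
     * Merges two texts word by word. Words found only in a are wrapped
     * as [-word-], words found only in b as {+word+}; common words are
     * emitted unchanged.
     */
    public static String diff(String a, String b) {
        String[] x = tokenize(a);
        String[] y = tokenize(b);

        // lcs[i][j] = length of the longest common subsequence
        // of x[i..] and y[j..], filled from the end backwards.
        int[][] lcs = new int[x.length + 1][y.length + 1];
        for (int i = x.length - 1; i >= 0; i--) {
            for (int j = y.length - 1; j >= 0; j--) {
                lcs[i][j] = x[i].equals(y[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk both word lists, following the table to decide whether
        // a word is common, removed (only in a) or added (only in b).
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.length && j < y.length) {
            if (x[i].equals(y[j])) {
                out.add(x[i]);
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("[-" + x[i++] + "-]");
            } else {
                out.add("{+" + y[j++] + "+}");
            }
        }
        while (i < x.length) out.add("[-" + x[i++] + "-]");
        while (j < y.length) out.add("{+" + y[j++] + "+}");
        return String.join(" ", out);
    }

    // Splits on whitespace; an empty or blank string yields no words.
    private static String[] tokenize(String s) {
        String t = s.trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(diff("the quick brown fox", "the slow brown fox jumps"));
        // prints: the [-quick-] {+slow+} brown fox {+jumps+}
    }
}
```

The table uses memory proportional to the product of the two word counts, so for large documents it would probably be better to diff paragraph by paragraph or to use an existing diff library.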
A: 

You can have a look at Aspose.Words for Java. It might be able to help you out.

Conrad
Thanks Conrad, but I'm looking for something open source.
+1  A: 

MS Word is not really supported in Java.

You can use POI, but you won't be able to compare everything. A COM control is your best chance of doing it (you might be able to use WINE on Linux to emulate it).

I think your best choice is to use RTF files and iText-RTF (in MS Word you can save a document as RTF). They have better support; however, from my own experience I can tell you that they sometimes render differently in MS Word 2003, OpenOffice, and MS Word 2007, so you should always check that.

You could also try the OpenOffice API (I've never tried it), but there aren't many resources out there that tell you how to use it.

01
A: 

If it's a docx, you could use docx4j (ASL v2).

See the CompareDocuments example

plutext
A: 

If Office 2007 supports a server mode, like OpenOffice does, you could send the document stream over the network and get the results back.

You might also be able to achieve what you need with a recent version of OpenOffice, using the UNO API.