I am looking for a Java API that can compare two Microsoft Word documents.

We want to compare two MS Word documents and highlight whatever is not common to both, with a color or in some other way. So I think we have to merge both documents and highlight the content that differs.

We are using a Linux server, so we can't install Microsoft Word on it.

A: 

Well, newer Word formats (DOCX at least) are XML-based, so they can be compared using an XML parser, but that's probably not the easy way out. You could take a look at http://www.aspose.com/categories/file-format-components/aspose.words-for-.net-and-java/default.aspx, which supports DOC.
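To illustrate the XML route: a .docx file is a ZIP archive whose main text lives in the entry word/document.xml, so the JDK alone can pull out the paragraph text. The following is only a minimal sketch under that assumption. The class name DocxParagraphs and its methods are made up for illustration, and it ignores styles, headers, footnotes and nested paragraphs (e.g. in text boxes).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of any library.
public class DocxParagraphs {
    // Namespace of the main WordprocessingML elements (w:p, w:t, ...).
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Reads a .docx stream and returns the plain text of each paragraph. */
    public static List<String> extract(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // readAllBytes() stops at the end of the current entry (Java 9+).
                    return parse(new ByteArrayInputStream(zip.readAllBytes()));
                }
            }
        }
        throw new IOException("word/document.xml not found in archive");
    }

    /** Collects the text of all w:t elements, grouped by enclosing w:p. */
    static List<String> parse(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(xml);

        List<String> result = new ArrayList<>();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            NodeList texts = paragraph.getElementsByTagNameNS(W_NS, "t");
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < texts.getLength(); j++) {
                sb.append(texts.item(j).getTextContent());
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

Calling DocxParagraphs.extract(new FileInputStream("a.docx")) on each file would give two lists of paragraphs that can then be compared. The older binary DOC format is not XML and needs a dedicated library.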

Jan
Thanks, but we are looking for open source ...
Bihag Raval
A: 

Try the OpenOffice API, but there aren't many resources out there to tell you how to use it.

01
Really? I've found the documentation (http://api.openoffice.org/) alright.
Matthew Flaschen
Thanks, Tagging Monkey and Matthew Flaschen, for your help.
Bihag Raval
A: 

POI (Poor Obfuscation Implementation), a Java toolkit for Office files, is generally useful for things like this. However, don't expect it to be trivial with any toolkit.

Matthew Flaschen
A: 

You might get some support from Apache POI, but nothing in this API does that out of the box and, actually, this will be a very tough task (the possible Word style modifications are just endless). But if you limit the diff to the content, you might get something satisfying.

There is an interesting thread on the POI users list that discusses this approach: How to compare 2 word doc (OLE2CDF or OpenXML). There is even a start of an implementation. (PS: I noticed that the OP did initiate that thread on the POI users list, I'm just mentioning it for other readers. It would have been nice to have the end of the story though.)
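For what a content-only diff could look like once the paragraph text of both documents has been extracted (with POI or otherwise), here is a rough sketch based on a longest-common-subsequence table. It is not taken from the thread mentioned above; the class name ParagraphDiff and the "  " / "- " / "+ " line prefixes are assumptions chosen for illustration, and real highlighting would still have to be written back into a document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: compares two documents paragraph by paragraph.
public class ParagraphDiff {

    /**
     * Returns one line per paragraph: "  " for common paragraphs,
     * "- " for paragraphs only in a, "+ " for paragraphs only in b.
     */
    public static List<String> diff(List<String> a, List<String> b) {
        int n = a.size();
        int m = b.size();

        // lcs[i][j] = length of the longest common subsequence
        // of a[i..] and b[j..], filled from the end backwards.
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table from the start and emit the edit script.
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                out.add("  " + a.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a.get(i));
                i++;
            } else {
                out.add("+ " + b.get(j));
                j++;
            }
        }
        while (i < n) {
            out.add("- " + a.get(i++));
        }
        while (j < m) {
            out.add("+ " + b.get(j++));
        }
        return out;
    }
}
```

The table needs memory proportional to the product of the two paragraph counts, which should be acceptable for typical documents. The lines marked "- " and "+ " are the ones that would need highlighting.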

Pascal Thivent
I only posted that question on the forum... anyway, thanks ...
Bihag Raval