How can you get the diff of two Word .doc documents programmatically, so that you can take the resulting output and generate an HTML file of the result (as you would expect to see in a normal GUI diff tool)?

I imagine that if you grabbed the docs via COM and converted the output to text, you could provide some diff functionality. Thoughts?

Is there a way to do this without Windows and COM?

(Preferably in Python, but I'm open to other solutions)

UPDATE

The original question, which asked about MS Word diff tools, was a duplicate of the following (thanks, Nathan):

http://stackoverflow.com/questions/90075/how-to-compare-two-word-documents/

+3  A: 

I am not sure whether this is the functionality you are looking for, but Microsoft itself provides a compare option in the Office suite. Please check http://support.microsoft.com/kb/306484

lakshmanaraj
A: 

Probably not relevant (because you already know this), but Word does have a change-tracking feature (which needs to be switched on beforehand). http://office.microsoft.com/en-us/word/HA012186901033.aspx

Christopher Edwards
+3  A: 

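It looks like if you have Word and win32com installed, it is relatively easy to get the text:

import win32com.client

# Start Word through COM (requires Windows, Word, and pywin32)
app = win32com.client.Dispatch('Word.Application')
try:
    doc = app.Documents.Open('c:\\files\\mydocument.doc')
    print(doc.Content.Text)  # plain text of the document body
    doc.Close(False)  # close without saving changes
finally:
    app.Quit()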

Source: http://win32com.goermezer.de/content/view/158/192/

You can then run a standard diff on the resulting text.
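
As a rough sketch of that last step, Python's standard difflib module can turn two extracted text strings into a side-by-side HTML page. The function name text_diff_to_html and the labels and sample strings below are made up for illustration; the sketch also assumes that Word's Content.Text separates paragraphs with '\r', which splitlines() handles.

```python
import difflib


def text_diff_to_html(old_text, new_text, old_label="old", new_label="new"):
    """Return a standalone HTML page showing a side-by-side diff."""
    # splitlines() treats '\r', '\n', and '\r\n' as line boundaries, so
    # each paragraph of the extracted text becomes one diff line.
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    differ = difflib.HtmlDiff(wrapcolumn=80)
    return differ.make_file(
        old_lines, new_lines, fromdesc=old_label, todesc=new_label
    )


if __name__ == "__main__":
    # Hypothetical sample text standing in for two doc.Content.Text values.
    old = "First paragraph.\rSecond paragraph.\rThird paragraph."
    new = "First paragraph.\rSecond paragraph, edited.\rThird paragraph."
    html = text_diff_to_html(old, new, "mydocument_v1.doc", "mydocument_v2.doc")
    with open("diff.html", "w", encoding="utf-8") as f:
        f.write(html)
```

This compares text only, so formatting changes (fonts, styles, tables) will not show up in the result.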

CTT
+3  A: 

I use Araxis Merge to compare a variety of source files, but it also extracts and compares various office document formats such as MS Word, PDF, OpenDocument, etc. I think this would be your best bet if you're willing to spend a bit of money.

http://www.araxis.com/merge/index.html

David Ma
Thanks! I wasn't aware that tools were available to do this.
monkut
+1  A: 

Use this option in Word 2003:

Tools | Compare and Merge Documents

Or this in Word 2007:

Review | Compare

It prompts you for a file with which to compare the file you're editing.


This question is a duplicate of How to compare two word documents?, and this answer is a duplicate of my answer there.

Nathan Fellman
Thanks! It looks like the xdocdiff tool mentioned in that question (http://freemind.s57.xrea.com/xdocdiff/e/index.html) can be used to generate the diff output programmatically.
monkut
A: 

If it's a docx, and you are happy with Java, you could use docx4j (ASL v2), which has diff functionality built in.

See the CompareDocuments example

If it's a doc, docx4j also has basic code for converting to docx (using POI), which you could do first.
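
For the Python side of the question, and only as a sketch that does not use docx4j: a .docx file is a zip archive whose main body lives in word/document.xml, so the paragraph text can be pulled out with the standard library alone, without Windows or COM. The helper name docx_paragraphs is made up here, and this simplified version ignores tabs, line breaks, headers, footers, and footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the text of each paragraph in the main body of a .docx file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join every w:t element.
        text = "".join(t.text or "" for t in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

The two resulting lists of paragraphs can then be passed to any line-based diff, such as Python's difflib.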

plutext