Hi,

I am trying to generate a checksum of a Word document by opening it in binary mode. I generate the checksum of the document, then copy the document to a different location. When I generate the checksum at the new location, I get a different value, even though I haven't changed the contents of the document. The checksum varies even if I copy the document back to the same location. This does not happen with other file types such as .txt or .pdf files, which suggests that the checksum generation itself is not at fault. My guess is that by opening a .doc file in binary mode, I am also checksumming the document's metadata, which varies. Am I right? Please enlighten me.

A: 

.doc files are OLE compound files (structured storage), and .docx files are zip-compressed XML files, so the short answer is: yes, all manner of metadata is attached to a Word document.

That said, simply copying any file to a new location (as opposed to copying the contents of the file into a new file) shouldn't modify it. How are you copying it?
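
To check whether the copy step itself changes anything, a minimal sketch along these lines may help. It assumes Python with only the standard library; `file_sha256` and `copy_and_compare` are hypothetical helper names, and SHA-256 stands in for whatever checksum you are actually using. A byte-for-byte copy should produce the same digest regardless of file type.

```python
import hashlib
import shutil


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in binary mode."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large documents are not loaded at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_compare(src, dst):
    """Copy src to dst byte for byte and report whether the digests match."""
    shutil.copyfile(src, dst)
    return file_sha256(src) == file_sha256(dst)
```

If `copy_and_compare("report.doc", "copy_of_report.doc")` (placeholder file names) returns `False`, something other than a plain file copy is rewriting the document.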

Matthew Scharley
I am copying the contents alone
Prabhu
If you are opening a new Word file and copy/pasting the contents across, that would explain the different checksums. Word puts all sorts of information into its files, some of which is timestamped, so even if you do exactly the same things, the timestamps will differ and produce a different checksum.
Matthew Scharley
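
The timestamp effect described above can be illustrated with a rough sketch for the zip-based .docx case only (it does not apply to the OLE-based .doc format). It assumes Python's standard `zipfile` module; `build_archive` and `content_digest` are hypothetical helpers, and the single-member archive is a stand-in for a real document, not a valid .docx. Two archives with identical payloads but different stored timestamps hash differently as raw bytes, while a digest over the decompressed members matches.

```python
import hashlib
import io
import zipfile


def build_archive(payload, timestamp):
    """Build an in-memory zip with one member whose stored time is `timestamp`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("word/document.xml", date_time=timestamp)
        info.compress_type = zipfile.ZIP_DEFLATED
        zf.writestr(info, payload)
    return buf.getvalue()


def content_digest(archive_bytes):
    """Hash member names and decompressed contents, ignoring zip-level metadata."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for name in sorted(zf.namelist()):
            digest.update(name.encode("utf-8"))
            digest.update(zf.read(name))
    return digest.hexdigest()


payload = b"<w:document>same text</w:document>"
first = build_archive(payload, (2009, 6, 1, 10, 0, 0))
second = build_archive(payload, (2009, 6, 1, 10, 5, 0))

# Raw bytes differ because the zip headers record different timestamps.
raw_digests_match = hashlib.sha256(first).digest() == hashlib.sha256(second).digest()
# Decompressed contents are identical.
content_digests_match = content_digest(first) == content_digest(second)
```

Note that a real .docx also records creation and modification times inside its own XML parts (for example `docProps/core.xml`), so a content-level digest like this one would still differ unless those parts are excluded.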