
I am trying to guarantee the integrity of a file after download. I store the MD5 hash of the file in a database and compare it to the MD5 of the file after it is downloaded. However, I always get a different MD5 when I hash the downloaded file. I am wondering whether the byte array being hashed contains metadata such as the last-modified time, and whether that is throwing off the hash. If anyone has done this before, your help would be greatly appreciated.
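For reference, here is a minimal Python sketch of the check described above. The question doesn't name a language, so Python is an assumption. The sketch also assumes the stored value is a hex digest like the one `hexdigest()` returns; `file_md5` and `verify_download` are hypothetical helper names. Only the bytes read from the file are fed to the hash, so the last-modified time and other metadata never enter the digest.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file's contents (bytes only, no metadata)."""
    h = hashlib.md5()
    with open(path, "rb") as f:  # "rb": read raw bytes, no newline translation
        # Read in chunks so large files don't have to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path, expected_md5):
    """Hypothetical helper: compare the downloaded file against the stored hex digest."""
    # Normalize the stored value in case it was saved with whitespace or upper case
    return file_md5(path) == expected_md5.strip().lower()
```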

A: 

If I'm not totally wrong here, the MD5 hash works only on the actual data, not on timestamps or other metadata. Maybe you are transferring text files over FTP; in that case the FTP client might rewrite the newline characters to fit your system, and then the hash will be different.
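To illustrate the newline point, here is a small Python sketch with made-up sample text. The same two lines with Unix (`\n`) and Windows (`\r\n`) line endings hash to different values, even though they look identical in an editor.

```python
import hashlib

unix_text = b"first line\nsecond line\n"
# Roughly what an ASCII-mode transfer onto a Windows machine might produce
windows_text = unix_text.replace(b"\n", b"\r\n")

print(hashlib.md5(unix_text).hexdigest())
print(hashlib.md5(windows_text).hexdigest())
# Different bytes, so different digests
print(hashlib.md5(unix_text).hexdigest() == hashlib.md5(windows_text).hexdigest())  # False
```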

Tjofras
A: 

If you are using FTP to download, the problem could be:

  • Using binary download mode instead of ASCII (or vice versa).

  • Transferring across platforms, e.g. from Windows to Unix, where end-of-line (EOL) characters are handled differently.
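If the download happens in Python (an assumption, since the question doesn't say), one way to rule out ASCII-mode translation is `ftplib`'s `retrbinary`. It requests binary mode and passes along the raw bytes. `retrlines`, by contrast, uses ASCII mode and strips line endings, so it is the wrong choice for files you intend to hash. This is a minimal sketch: `download_binary` is a hypothetical helper, and the host and file names in the usage comment are placeholders.

```python
from ftplib import FTP


def download_binary(ftp: FTP, remote_name: str, local_path: str) -> None:
    """Save remote_name to local_path byte-for-byte (hypothetical helper)."""
    with open(local_path, "wb") as out:  # "wb": write raw bytes, no newline translation
        # retrbinary switches the session to binary (TYPE I) before RETR
        # and hands each raw block to out.write unchanged
        ftp.retrbinary(f"RETR {remote_name}", out.write)


# Usage; the host, credentials and file name are placeholders:
# with FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     download_binary(ftp, "data.zip", "data.zip")
```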

nzpcmad
A: 

You could test your theory by hashing only a particular part of the file, say the middle 50%. If that part is also different, then you know it's not just a timestamp or something similar. That said, you really need to give us more information to get a better answer.
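Here is a sketch of that test in Python, where `md5_of_middle` is a hypothetical helper. Run it on both the original and the downloaded file. If even the middle portions hash differently, the content itself has changed, not just some header or trailing data.

```python
import hashlib
import os


def md5_of_middle(path, fraction=0.5):
    """Hex MD5 of the central `fraction` of a file's bytes (hypothetical helper)."""
    size = os.path.getsize(path)
    length = int(size * fraction)
    start = (size - length) // 2  # centre the window in the file
    with open(path, "rb") as f:
        f.seek(start)
        # Reads the whole window into memory; fine for a quick diagnostic
        return hashlib.md5(f.read(length)).hexdigest()
```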

dicroce
A:

A simple way to find out: run a diff (a binary diff, if the file isn't plain text) between two different downloads. This should quickly pinpoint the problem.
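On Unix, `cmp` reports the first differing byte out of the box. If no diff tool is at hand, a minimal Python equivalent could look like this (`first_difference` is a hypothetical name). Knowing the offset, and which bytes sit there (for example `\n` versus `\r`), usually tells you what changed.

```python
def first_difference(path_a, path_b, chunk_size=65536):
    """Return the offset of the first differing byte, or None if the files are identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                # Locate the exact byte inside this chunk...
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # ...or, if one chunk is a prefix of the other, where the shorter one ends
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:  # both files ended at the same point with no difference
                return None
            offset += len(chunk_a)
```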

dpp
A:

The MD5 hash is calculated on the file contents and is not affected by file metadata. It is a deterministic process that always produces the same result if you start with the same content (although MD5 is vulnerable to collisions, so two different inputs can be deliberately crafted to share the same hash).
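A quick Python experiment supports this. Rewriting a file's timestamps with `os.utime` leaves its MD5 unchanged, because only the bytes read from the file are fed to the hash. The temporary file and its content are throwaway values.

```python
import hashlib
import os
import tempfile


def md5_hex(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


# Write a temporary file, hash it, change its timestamps, and hash it again.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"same content")

before = md5_hex(path)
os.utime(path, (0, 0))  # set access and modification times to the epoch
after = md5_hex(path)
os.remove(path)

print(before == after)  # True: the timestamps are not part of the hashed data
```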

How are you creating the MD5 hash for the file? Have you tried using another tool to reproduce the problem?

If the MD5 hashes differ, then your files are different somehow.

The earlier suggestions (EOL conversion, or a binary file transferred in ASCII mode) are very likely reasons the files could have changed. A diff tool can help identify where and how the files differ; if your file is in a binary format, use a binary diff tool.

Mads Hansen
A: 

Make sure you are actually calculating the MD5 on the bytes of the file, not the filename or some other string.
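Here is a Python sketch of that pitfall, using a throwaway temporary file. Hashing the path string gives a digest unrelated to the file's contents. If the path changes between downloads (a temp name, for example), such a digest would also change every time, which could match the "always different" symptom.

```python
import hashlib
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"file contents")

wrong = hashlib.md5(path.encode("utf-8")).hexdigest()  # BUG: hashes the path string
with open(path, "rb") as f:
    right = hashlib.md5(f.read()).hexdigest()  # hashes the file's bytes

print(wrong == right)  # False
os.remove(path)
```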

Mark Stahler
A: 

You could use http://www.filemd5.net/API to get the MD5 of the file before you download it.

JohnnieWalker