Is there any simple way of generating (and checking) MD5 checksums of a list of files in Python? (I have a small program I'm working on, and I'd like to confirm the checksums of the files).

+1  A: 

There is a way that's pretty memory inefficient, since it reads each file into memory whole:

import hashlib
# Binary mode ('rb') matters: text mode can mangle bytes on some platforms.
[(fname, hashlib.md5(open(fname, 'rb').read()).digest()) for fname in fnamelst]

This will give you a list of tuples, each containing a file's name and its hash.
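
Since the question also asks about checking, the verification side is just a comparison against known digests. A minimal sketch, where expected_sums is a hypothetical dict mapping each filename to its known hex digest (it isn't part of the code above):

import hashlib

# expected_sums is a hypothetical {filename: known hex digest} mapping
for fname, expected in expected_sums.items():
    with open(fname, 'rb') as f:
        actual = hashlib.md5(f.read()).hexdigest()
    if actual != expected:
        print('%s: checksum mismatch (got %s, expected %s)' % (fname, actual, expected))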

I strongly question your use of MD5. You should at least be using SHA1. MD5 is known to be broken and shouldn't be used for any purpose, even if you don't think your purpose is security sensitive.

Here is a way that is more complex, but not memory inefficient:

import hashlib

def hashfile(afile, hasher, blocksize=65536):
    # Feed the file to the hasher in fixed-size blocks so the whole
    # file never has to be held in memory at once.
    buf = afile.read(blocksize)
    while len(buf) > 0:
        hasher.update(buf)
        buf = afile.read(blocksize)
    return hasher.digest()

[(fname, hashfile(open(fname, 'rb'), hashlib.md5())) for fname in fnamelst]
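
Because hashfile() takes the hasher as an argument, switching algorithms is just a matter of passing a different constructor, for example hashlib.sha1() (or hashlib.sha256()):

[(fname, hashfile(open(fname, 'rb'), hashlib.sha1())) for fname in fnamelst]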
Omnifarious
I'm only using MD5 to confirm the file isn't corrupted. I'm not so concerned about it being broken.
Alexander
@TheLifelessOne: And despite @Omnifarious' scary warnings, that is a perfectly good use of MD5.
GregS
@GregS, @TheLifelessOne - Yeah, and next thing you know someone finds a way to use this fact about your application to cause a file to be accepted as uncorrupted when it isn't the file you're expecting at all. No, I stand by my scary warnings. I think MD5 should be removed or come with deprecation warnings.
Omnifarious
+4  A: 

You can use hashlib.md5()

Note that sometimes you won't be able to fit the whole file in memory. In that case, you'll have to read it sequentially in chunks of 128 bytes (or any convenient multiple) and feed them to the md5 object's update() method. See this question.
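
A minimal sketch of that chunked approach (md5_for_file is just an illustrative name, and the 8192-byte block size is an arbitrary multiple of 128):

import hashlib

def md5_for_file(path, blocksize=8192):
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps yielding blocks until read() returns b''
        for chunk in iter(lambda: f.read(blocksize), b''):
            md5.update(chunk)
    return md5.hexdigest()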

quantumSoup
Well, if any of the files are larger than 1 MB, then I've got some problems. Thanks, though. I think that solves my problem.
Alexander