ansaurus

Question

How can I create multiple hashes of a file using only one pass?

Answer 1

+15 A:

Something like this perhaps?

>>> import hashlib
>>> hashes = (hashlib.md5(), hashlib.sha1())
>>> f = open('some_file', 'rb')
>>> for line in f:
...     for hash in hashes:
...         hash.update(line)
... 
>>> for hash in hashes:
...     print hash.name, hash.hexdigest()

Or loop over f.read(1024), or something similar, to read fixed-length blocks.

ʞɔıu 2009-02-11 16:25:24

That looks like it would work, but I would read bytes using a fixed block size rather than line by line (some binary files may not contain line breaks).

Jason S 2009-02-11 16:28:00

f.readlines() requires ~100MB, but a mere `f` works (a file object is an iterator over lines in Python)

J.F. Sebastian 2009-02-11 17:25:37

`for line in f` iterates over *lines* in the file. If a line is 1MB long, it doesn't matter what buffer size you use; len(line) will be 2**20. Therefore the 3rd parameter to `open()` is not useful in this case.

J.F. Sebastian 2009-02-11 20:26:07

You may be right; maybe I don't understand the meaning of that parameter correctly.

ʞɔıu 2009-02-11 22:56:51
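
The point made in the comments above, that iterating over a file object yields whole lines regardless of buffering, can be checked with a small experiment. This is only a sketch in Python 3: the temporary file and its 1 MiB newline-free payload are made-up test data, not anything from the original question.

```python
import os
import tempfile

# Hypothetical test data: 1 MiB of bytes containing no newline at all.
payload = b"\x00" * (2 ** 20)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Iterating over the file object yields lines; with no newline there is just one.
    with open(path, "rb") as f:
        line_lengths = [len(line) for line in f]

    # Reading fixed-size blocks bounds each piece regardless of the content.
    with open(path, "rb") as f:
        block_lengths = [len(block) for block in iter(lambda: f.read(4096), b"")]
finally:
    os.remove(path)

print(line_lengths)        # a single entry equal to 2**20
print(max(block_lengths))  # 4096
```

With line iteration the whole file arrives as one 1 MiB "line", whereas fixed-size reads never hand the hash more than 4096 bytes at a time.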

Answer 2

+3 A:

I don't know Python, but I am familiar with hash calculations.

If you handle the reading of files manually, just read in one block (of 256 bytes or 4096 bytes or whatever) at a time, and pass each block of data to update the hash of each algorithm. (You'll have to initialize the state at the beginning and finalize it at the end.)

Jason S 2009-02-11 16:26:27
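
The initialize / update / finalize cycle described above might look like this with Python's hashlib. It is a minimal Python 3 sketch that assumes an in-memory byte string as a stand-in for the file contents; the `data` value and the 256-byte block size are arbitrary choices for illustration.

```python
import hashlib

# Hypothetical stand-in for the contents of a file.
data = b"example payload " * 1000
block_size = 256  # arbitrary; any positive block size gives the same digest

h = hashlib.sha1()                              # initialize the state
for start in range(0, len(data), block_size):
    h.update(data[start:start + block_size])    # feed one block at a time
incremental = h.hexdigest()                     # finalize

# Hashing everything in one call should give the same result.
one_shot = hashlib.sha1(data).hexdigest()
print(incremental == one_shot)
```

Because feeding blocks one by one is equivalent to hashing their concatenation, the block size affects only memory use and speed, not the digest.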

Answer 3

+6 A:

Here's a modified version of @ʞɔıu's answer, using @Jason S's suggestion.

from __future__ import with_statement
from hashlib import md5, sha1

filename = 'hash_one-pass.py'

hashes = md5(), sha1()
chunksize = max(4096, max(h.block_size for h in hashes))
with open(filename, 'rb') as f:
    while True:
        chunk = f.read(chunksize)
        if not chunk:
            break
        for h in hashes:
            h.update(chunk)

for h in hashes:
    print h.name, h.hexdigest()

J.F. Sebastian 2009-02-11 17:38:04
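
The code in this answer uses Python 2 print statements. For Python 3, the same single-pass idea could be wrapped in a function roughly as follows. This is a sketch rather than part of the original answer: the `hash_file` name, its parameters, and the temporary demo file are all hypothetical, and it assumes the named algorithms are available in the local hashlib build.

```python
import hashlib
import os
import tempfile


def hash_file(path, algorithms=("md5", "sha1"), chunk_size=65536):
    """Return {algorithm name: hex digest} for one file, reading it only once."""
    hashers = [hashlib.new(name) for name in algorithms]
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers:
                h.update(chunk)
    return {name: h.hexdigest() for name, h in zip(algorithms, hashers)}


# Demo on a throwaway file (made-up content, for illustration only).
content = b"hello world\n" * 10000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
    demo_path = tmp.name
try:
    digests = hash_file(demo_path)
finally:
    os.remove(demo_path)

for name, digest in digests.items():
    print(name, digest)
```

The file is opened once in binary mode, and every chunk is handed to each hash object before the next read, so adding another algorithm costs no extra I/O.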
