I want to generate and store a CRC (or similar) value for each file in a given list, so that the values can be used for comparison at a later point. Writing a function to do this is simple enough, but is there a more standard way to do it within the Python standard library?

The generated value does not need to conform to any particular standard.

+4  A: 

I recommend hashlib. It implements a common interface to many different secure hash and message digest algorithms, including the FIPS secure hash algorithms (SHA1 and the SHA-2 family) as well as RSA's MD5. Some demo code:

import hashlib
m = hashlib.md5()
# open in binary mode so the bytes are hashed exactly as stored
with open('data.txt', 'rb') as f:
    for line in f:
        m.update(line)
print(m.hexdigest())
## output
1ab8ad413648c44aa9b90ce5abe50eea
sunqiang
So, a simple hashlib.md5(my_file.read())?
kjfletch
@kjfletch, I have updated the answer with some simple demo code. It updates the MD5 line by line to ease the system load, and you can combine it with os.walk (http://docs.python.org/library/os.html#os.walk has a sample) to calculate the MD5 of every file you want.
sunqiang
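The os.walk suggestion in the comment above could be sketched roughly as follows. This is only an illustration, assuming Python 3; the names `md5_of_file` and `md5_for_tree` are hypothetical and do not come from the answer.

```python
import hashlib
import os


def md5_of_file(path):
    """Return the MD5 hex digest of one file, updating line by line as in the answer."""
    m = hashlib.md5()
    with open(path, 'rb') as f:
        for line in f:
            m.update(line)
    return m.hexdigest()


def md5_for_tree(root):
    """Map every file path found under root to its MD5 hex digest (hypothetical helper)."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digests[path] = md5_of_file(path)
    return digests
```

The resulting dictionary can be stored (for example, pickled or written out as text) and compared against a later run to see which files changed.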
@sunqiang: `for line in open()` may (attempt to) return rather long "lines" from a binary file. It's probably a good idea to use `block = f.read(BLOCKSIZE); m.update(block)` for predictable and safe memory usage.
John Machin
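A minimal sketch of the block-wise reading suggested in the comment above, assuming Python 3. The function name `md5_in_blocks` and the 64 KiB value for `BLOCKSIZE` are assumptions made for illustration; the comment does not specify a size.

```python
import hashlib

BLOCKSIZE = 64 * 1024  # assumed value; the comment does not name a specific size


def md5_in_blocks(path, blocksize=BLOCKSIZE):
    """Return the MD5 hex digest of a file, reading at most blocksize bytes at a time."""
    m = hashlib.md5()
    with open(path, 'rb') as f:
        while True:
            block = f.read(blocksize)
            if not block:  # an empty bytes object signals end of file
                break
            m.update(block)
    return m.hexdigest()
```

Because the digest is updated incrementally, the result is the same as hashing the whole file at once, while memory use stays bounded by the block size regardless of the file's contents.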
@john, agreed. Excellent point.
sunqiang
This will produce the same output as the looped version, provided the file is opened in binary mode: with open(path, "rb") as f: m = hashlib.md5(f.read())
kjfletch
+1  A: 

If you don't need one-way (cryptographic) security, you could also use zlib.crc32 or zlib.adler32, both documented in the Python zlib module documentation.

Vinay Sajip
It's worth noting that adler32 runs faster than crc32 but is not as good at error detection. If the application is one where checksum(file) accompanies file, adler32 should not be used. It was quite appropriate for its targeted application, where checksum(UNcompressed file) accompanies the compressed file.
John Machin
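For reference, a rough sketch of computing either zlib checksum over a file in fixed-size blocks, assuming Python 3. The function name `file_checksum` and its `adler` and `blocksize` parameters are hypothetical; only `zlib.crc32` and `zlib.adler32` come from the answer.

```python
import zlib


def file_checksum(path, adler=False, blocksize=64 * 1024):
    """Return the CRC-32 (default) or Adler-32 checksum of a file as an unsigned int."""
    # Each function takes a running value as its second argument;
    # CRC-32 starts from 0 and Adler-32 starts from 1.
    func, value = (zlib.adler32, 1) if adler else (zlib.crc32, 0)
    with open(path, 'rb') as f:
        while True:
            block = f.read(blocksize)
            if not block:  # end of file
                break
            value = func(block, value)
    # Mask to 32 bits so the result is unsigned on every Python version
    return value & 0xffffffff
```

Formatting the result with `'%08x' % file_checksum(path)` gives a fixed-width hex string that is convenient to store and compare later.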