Python's hashlib module provides the following hash algorithm constructors: md5(), sha1(), sha224(), sha256(), sha384(), and sha512().

Assuming I don't want to use md5, is there a big difference between using, say, sha1 and sha512? I want to use something like hashlib.shaXXX(hashString).hexdigest(), but since it's just for caching, I'm not sure I need the (possible) extra overhead of sha512...

Does this overhead exist, and if so, how big is it?

+4  A: 

Why not just benchmark it?

>>> import hashlib, timeit
>>> def sha1(s):
...     return hashlib.sha1(s).hexdigest()
...
>>> def sha512(s):
...     return hashlib.sha512(s).hexdigest()
...
>>> t1 = timeit.Timer("sha1(b'asdf' * 100)", "from __main__ import sha1")
>>> t512 = timeit.Timer("sha512(b'asdf' * 100)", "from __main__ import sha512")
>>> t1.timeit()
3.2463729381561279
>>> t512.timeit()
6.5079669952392578

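So on my machine, sha512 takes about twice as long as sha1 (roughly 6.5 µs versus 3.2 µs per call, since timeit() runs one million iterations by default). But as GregS said, why would you use a secure hash for caching? Try the built-in hash() function, which should be really fast and well tuned: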

>>> s = "asdf"
>>> hash(s)
-618826466
>>> s = "xxx"
>>> hash(s)
943435
>>> hash("xxx")
943435
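One caveat worth keeping in mind: the integers above come from an older interpreter. On Python 3.3 and later, hash() of a str is salted per process by default (see PYTHONHASHSEED), so it is fine for in-memory lookups but not for keys that must survive a restart. A minimal sketch of the difference, where `stable_key` is a hypothetical helper name and not part of any library:

```python
import hashlib


def stable_key(text):
    """Return a hex digest that is identical across interpreter runs."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


s = "asdf"

# Within one process hash() is consistent, which is all a dict needs.
same_in_process = hash(s) == hash("as" + "df")

# For keys written to disk or shared between processes, a hashlib digest
# does not depend on the per-process hash seed.
key = stable_key(s)
```

If the cache only lives in memory for the lifetime of the process, the salted hash() is not a problem.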

Or better yet, use the built-in Python dictionaries. Maybe you can tell us more about what you plan to cache.

EDIT: I'm thinking that you are trying to achieve something like this:

hash = hashlib.sha1(object_to_cache_as_string).hexdigest()
cache[hash] = object_to_cache

What I was referring to by "use the built-in Python dictionaries" is that you can simplify the above:

cache[object_to_cache_as_string] = object_to_cache

In this way, Python takes care of the hashing so you don't have to!

Regarding your particular problem, you could refer to http://stackoverflow.com/questions/1151658/python-hashable-dicts in order to make a dictionary hashable. Then, all you'd need to do to cache the object is:

cache[object_to_cache] = object_to_cache
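If subclassing dict as in the linked question feels like too much, one possible alternative is to derive a hashable key from the dictionary's items. This is only a sketch under the assumption that the dictionary is flat and all of its values are themselves hashable; `freeze`, `cache`, and `cached_lookup` are hypothetical names chosen for illustration:

```python
def freeze(d):
    """Return a hashable, order-independent snapshot of a flat dict."""
    # frozenset ignores insertion order, so equal dicts give equal keys.
    return frozenset(d.items())


cache = {}


def cached_lookup(params, compute):
    """Return compute(params), reusing an earlier result for equal params."""
    key = freeze(params)
    if key not in cache:
        cache[key] = compute(params)
    return cache[key]
```

Two dictionaries with the same key/value pairs map to the same cache entry regardless of insertion order. Nested dicts or list values would raise TypeError here and would need to be converted first.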
sttwister
Thanks for taking the time to benchmark it. As many of you said, I probably don't need to use secure hashing for caching. Basically, I need to store a fingerprint of [the content of] a dictionary. As I can't use either `hashlib` or `hash()` directly on a dictionary, I was building a string containing the elements of that dictionary (I don't like this approach) and then using `hashlib` on it... But now you've intrigued me with "use the builtin Python dictionaries". What do you mean by that?
Emilien
See edit. I hope this solves your problem.
sttwister
After reading your comments (all of you), I realized I didn't need any secure hashing, so I implemented my own "hashing" algorithm. Since the dictionary always has specific elements, and each value has an ID, I create a string from those IDs and cache that. Thanks, all.
Emilien
A: 

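Perhaps a naive test... but it looks like it depends on how much you're hashing. sha512 works on 128-byte blocks and sha256 on 64-byte blocks, so 1024 bytes is 8 blocks of sha512 versus 16 blocks of sha256, which may be why sha512 comes out faster here?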

>>> import timeit
>>> import hashlib
>>> for sha in [ x for x in dir(hashlib) if x.startswith('sha') ]:
...   t = timeit.Timer("hashlib.%s(data).hexdigest()" % sha,"import hashlib; data=open('/dev/urandom','r').read(1024)")
...   print sha + "\t" + repr(t.timeit(1000))
...
sha1    0.0084478855133056641
sha224  0.034898042678833008
sha256  0.034902095794677734
sha384  0.01980900764465332
sha512  0.019846916198730469
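The session above is Python 2. To check how the ranking changes with input size on a current interpreter, something along these lines could be used. It is a rough sketch written for Python 3: `time_hashes` and `ALGORITHMS` are hypothetical names, the algorithm list is spelled out explicitly instead of being taken from dir(hashlib), and the numbers will differ from machine to machine:

```python
import hashlib
import os
import timeit

# The SHA-1/SHA-2 constructors named in the question.
ALGORITHMS = ("sha1", "sha224", "sha256", "sha384", "sha512")


def time_hashes(size, number=1000):
    """Return {algorithm: seconds} for hashing `size` random bytes `number` times."""
    data = os.urandom(size)
    results = {}
    for name in ALGORITHMS:
        constructor = getattr(hashlib, name)
        results[name] = timeit.timeit(
            lambda: constructor(data).hexdigest(), number=number
        )
    return results


if __name__ == "__main__":
    # Small, medium, and larger inputs to see where block size starts to matter.
    for size in (16, 1024, 16384):
        print("%d bytes" % size)
        for name, seconds in time_hashes(size).items():
            print("  %s\t%.6f" % (name, seconds))
```

For very short inputs the per-call overhead tends to dominate, so differences between algorithms only become visible as the input grows.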
MattH