views:

88

answers:

1

The online documentation says

Hash[expr] 
  gives an integer hash code for the expression expr.
Hash[expr,"type"]
  gives an integer hash code of the specified type for expr.

It also gives "possible hash code types":

  • "Adler32" Adler 32-bit cyclic redundancy check
  • "CRC32" 32-bit cyclic redundancy check
  • "MD2" 128-bit MD2 code
  • "MD5" 128-bit MD5 code
  • "SHA" 160-bit SHA-1 code
  • "SHA256" 256-bit SHA code
  • "SHA384" 384-bit SHA code
  • "SHA512" 512-bit SHA code

Yet none of these correspond to the default returned by Hash[expr].

So my questions are:

  • What method does the default Hash use?
  • Are there any other hash codes built in?
+4  A: 

The default hash algorithm is, more-or-less, a basic 32-bit hash function applied to the underlying expression representation, but the exact code is a proprietary component of the Mathematica kernel. It's subject to (and has) change between Mathematica versions, and lacks a number of desirable cryptographic properties, so I personally recommend you use MD5 or one of the SHA variants for any serious application where security matters. The built-in hash is intended for typical data structure use (e.g. in a hash table).

The named hash algorithms you list from the documentation are the only ones currently available. Are you looking for a different one in particular?

Michael Pilat
@Michael Do you know if there a way to "introspect" Mathematica to get this kind of info? And also: Why isn't the "type" param of the Hash function implemented as an "option" (->) ? Mmmm may be I should post a question ...
belisarius
@belisarius It is strange how Hash[] does not follow the option convention...
Simon
@Michael I'm actually using it to speed up a calculation. I have some very long expressions built out of `NonCommutativeMultiply` that I need to `Collect` the coefficients and solve for linear dependence. Hashing the `NonCommutativeMultiply`s speeds things up by about a factor of 2. So all I need is the uniqueness of the hash. The question was asked only out of curiosity.
Simon
I'm not sure why `Hash` uses a second argument and not options, nor is there any way to introspect out possible algorithms, although that's a great suggestion.
Michael Pilat
Also, @Simon, keep in mind most hash functions (including `Hash[]`, MD5 and SHA) aren't "perfect" (i.e., injective), and unfortunately, especially not `Hash`. Evaluate and consider this: `Length[DeleteDuplicates[Hash /@ Range[10^6]]]` =)
Michael Pilat
@Simon But `Length@DictionaryLookup[___] - Length@DeleteDuplicates@(Hash /@ DictionaryLookup[___]) == 1` :), so I think you should test the injective property for your app
belisarius
@belisarius It is interesting that the only two words in default English Dictionary that hash the same are "Occidentals" and "umped"...
Simon
Thanks for the warnings. It's not being used in a critical part of my program and the final results will be tested without the hash - so I don't expect any problems.
Simon
@Simon So derive the algorithm by knowing that :D
belisarius