The result is no surprise at all. In fact:
In [1]: -5768830964305142685L & 0xffffffff
Out[1]: 1934711907L
So if you want reliable results for ASCII strings, just take the lower 32 bits as an unsigned integer. The hash function for strings is 32-bit-safe and almost portable.
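A minimal sketch of that masking, written in Python 3 syntax (the session above is Python 2, hence the `L` suffixes). `low32` and `MASK32` are hypothetical helper names, not part of any library:

```python
MASK32 = 0xFFFFFFFF  # hypothetical name: selects the lower 32 bits


def low32(h):
    """Return the lower 32 bits of a hash value as an unsigned integer."""
    return h & MASK32


print(low32(-5768830964305142685))  # 1934711907, the 64-bit value from the session above
print(low32(-1))                    # 4294967295: a negative 32-bit hash becomes its unsigned form
```

Masking both sides (the 64-bit hash and a possibly negative 32-bit hash) is what makes the two comparable, since a 32-bit build returns the same bits interpreted as a signed number.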
On the other hand, you cannot rely at all on the hash() of any object whose __hash__ method you have not explicitly defined to be invariant: in CPython the default hash is derived from the object's id(), which can differ from one run to the next.
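As an illustration of an explicitly invariant `__hash__`, here is a sketch with a hypothetical `Point` class whose hash is computed only from its field values through `zlib.crc32`, a fixed checksum that does not depend on memory addresses or on the interpreter's own `hash()`. The class and the choice of CRC-32 are assumptions made for the example:

```python
import zlib


class Point:
    """Hypothetical value type whose hash depends only on its contents."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Derived from a canonical ASCII encoding of the fields via CRC-32,
        # so the same Point gives the same number in every run.
        return zlib.crc32(f"{self.x},{self.y}".encode("ascii"))


print(Point(1, 2).__hash__() == zlib.crc32(b"1,2"))  # True
print(len({Point(1, 2), Point(1, 2)}))               # 1
```

Note that `hash()` truncates the value returned by a custom `__hash__()` to the size of the platform's `Py_ssize_t`, so if you need the identical number on 32-bit and 64-bit builds, call `__hash__()` (or `zlib.crc32`) directly.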
For ASCII strings it works only because the hash is computed from the individual characters of the string, roughly like this:
class string:
    def __hash__(self):
        if not self:
            return 0 # empty
        value = ord(self[0]) << 7
        for char in self:
            value = c_mul(1000003, value) ^ ord(char)
        value = value ^ len(self)
        if value == -1:
            value = -2
        return value
where c_mul is C-style "cyclic" multiplication: the product wraps around modulo 2^32 or 2^64, depending on the platform's word size, instead of growing into a larger integer.
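To check the "lower 32 bits" claim, here is a runnable Python 3 model of the pseudocode above, assuming the classic unrandomized algorithm and ASCII input. `py2_str_hash`, `c_mul`'s `bits` parameter and `MASK32` are hypothetical names used to simulate a 32-bit or 64-bit build; this is a sketch, not CPython's source, and Python 3's own `hash()` (randomized since 3.3) will not return these numbers:

```python
MASK32 = 0xFFFFFFFF


def c_mul(a, b, bits):
    """Multiply like C signed integers `bits` wide: wrap around instead of growing."""
    modulus = 1 << bits
    result = (a * b) % modulus
    # Reinterpret the wrapped value as a signed two's-complement integer.
    if result >= modulus >> 1:
        result -= modulus
    return result


def py2_str_hash(s, bits=64):
    """Model of the classic string hash for an ASCII string on a `bits`-wide build."""
    data = s.encode("ascii")  # assumption: one byte per character
    if not data:
        return 0  # empty
    value = data[0] << 7
    for byte in data:
        value = c_mul(1000003, value, bits) ^ byte
    value ^= len(data)
    if value == -1:
        value = -2  # CPython uses -1 to signal an error, so it is remapped
    return value


h64 = py2_str_hash("a", bits=64)
h32 = py2_str_hash("a", bits=32)
print(h64, h32)                          # 12416037344 -468864544
print((h64 & MASK32) == (h32 & MASK32))  # True
```

Every step is either multiplication modulo 2**bits or XOR with a small value, so the lower 32 bits of the 64-bit result always match those of the 32-bit result. The one exception is the final -1 → -2 substitution, which can fire on one width but not on the other; that is why the string hash is only "almost" portable.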