views:

209

answers:

3
>>> hash("\x01")
128000384
>>> hash("\x02")
256000771
>>> hash("\x03")
384001154
>>> hash("\x04")
512001541

Interesting part is 128000384 x 2 is not 256000771, and also others

I am just wondering how that algorithm works and want to learn something on it.

Thanks in advance.

+4  A: 

You could always try Google.

I did that, and the first hit was: http://effbot.org/zone/python-hash.htm

The algorithm there matches your results.

truppo
I searched in Python Source codes because I thought it was written in C. No? I will check that out, thanks. +1
S.Mark
Yes they are. That page translates a few of the hashing algorithms to Python, just for fun apparently.
Jason Orendorff
+4  A: 

If you download the source code of Python, you will find it for sure! But bear in mind the hash function is implemented for each kind of objects differently.

For example, you will find the unicode hash function in Objects/unicodeobject.c in the function unicode_hash. You might have to look a bit more to find the string hash function. Find the structure defining the object you are interested in, and in the field tp_hash, you will find the function that compute the hash code of that object.

For the string object: The exact code is found in Objects/stringobject.c in the function string_hash:

static long
string_hash(PyStringObject *a)
{
register Py_ssize_t len;
register unsigned char *p;
register long x;

if (a->ob_shash != -1)
    return a->ob_shash;
len = Py_SIZE(a);
p = (unsigned char *) a->ob_sval;
x = *p << 7;
while (--len >= 0)
    x = (1000003*x) ^ *p++;
x ^= Py_SIZE(a);
if (x == -1)
    x = -2;
a->ob_shash = x;
return x;
}
PierreBdR
Thanks, I have source copy but there is many hits for the word hash, and I coundn't find it, thanks anyway. +1
S.Mark
I edited a bit to give you more information on how to find the hash you are interested in.
PierreBdR
Thanks a lot, I just found that now too.
S.Mark
+1  A: 

I recommend you to read the Wikipedia entry for hash functions ( http://en.wikipedia.org/wiki/Hash_function ) to understand better the hash functions. You'll get a lot of answers for the implementation!

To summarize some key points (not only for this specific function, but in general for all hash functions):

  • On a given entry (depending the needs will be a number, a string, and object, etc) can produce a output which is smaller and of fixed length. Usually (not to say always), is an integer.
  • Different inputs produce different outputs. As the output is smaller than the input, there ALWAYS will be different inputs that produce the same output. This is call 'hash collition" and should be rare if hash function is well designed.
  • The process should be efficient, so it's fast to get the hash of an input data.
  • For some types of hash functions is important that similar inputs produce not similar outputs. For others, it's not a requisite, but it's usually achieved. That's why hash("\x02") is not 2*hash("\x01")

Basically, hash functions are used to use an integer in the place of the complete object, which you can manage more easily and more efficiently.

Khelben
Mmm, I'm not sure that my answer will be useful. I get a little confused for the has(2x) != 2has(x) and I think a clarification of what a hash function is could be useful... But that not was the answer properly.Let me know if it's not relevant and I will delete my comment.
Khelben
+1, Thanks, those are very good points.
S.Mark