tags:

views:

276

answers:

5
+1  Q: 

Hash-algorithm

I am looking for a hash-algorithm, to create as close to a unique hash of a string (max len = 255) as possible, that produces a long integer (DWORD).

I realize that 26^255 >> 2^32, but also know that the number of words in the English language is far less than 2^32.

The strings I need to 'hash' would be mostly single words or some simple construct using two or three words.


The answer:

One of the FNV variants should meet your requirements. They're fast, and produce fairly evenly distributed outputs. (Answered by Arachnid)


+2  A: 

See here for a previous iteration of this question (and the answer).

Nick Johnson
Yes, that's what I'm looking for. I did search but the keywords I used does not appear in the link you gave. I added a comment there to add more relevant keywords.
slashmais
+1  A: 

One technique is to use a well-known hash algorithm (say, MD5 or SHA-1) and use only the first 32 bits of the result.

Be aware that the risk of hash collisions increases faster than you might expect. For information on this, read about the Birthday Paradox.

Greg Hewgill
+1  A: 

Ronny Pfannschmidt did a test with common english words yesterday and hasn't encountered any collisions for the 10000 words he tested in the Python string hash function. I haven't tested it myself, but that algorithm is very simple and fast, and seems to be optimized for common words.

Here the implementation:

static long
string_hash(PyStringObject *a)
{
    register Py_ssize_t len;
    register unsigned char *p;
    register long x;

    if (a->ob_shash != -1)
     return a->ob_shash;
    len = Py_SIZE(a);
    p = (unsigned char *) a->ob_sval;
    x = *p << 7;
    while (--len >= 0)
     x = (1000003*x) ^ *p++;
    x ^= Py_SIZE(a);
    if (x == -1)
     x = -2;
    a->ob_shash = x;
    return x;
}
Armin Ronacher
Can we have a link to the article/post you're referencing?
Nick Johnson
Was a discussion on IRC, no link. sorry
Armin Ronacher
A: 

H(key) = [GetHash(key) + 1 + (((GetHash(key) >> 5) + 1) % (hashsize – 1))] % hashsize

MSDN article on HashCodes

Catch22
A: 

Java's String.hash() can be easily viewed here, its algorithm is

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
Stephen Denne