tags:

views:

154

answers:

4

Hi, I'm working on a hash function which gets a string as input.

Right now I'm doing a loop and inside the hash (an int variable) is being multiplied by a value and then the ASCII code for the current character is added to the mix.

hash = hash * seed + string[i]

But sometimes, if the string is big enough there is an integer overflow there, what can I do to avoid it while maintaining the same hash structure? Maybe a bit operation included inside the loop?

A: 

Why not use long to store the result? You can then apply techniques such as this one to detect the overflow

DVK
A: 

If you have access to a larger data type, you can do something like this:

int32_t hash, seed;
int64_t temporary;

temporary = hash * seed + string[i];
hash = ( temporary >> 32 ) ^ ( temporary & 0xFFFFFFFF );

Otherwise you will have to manually multiply the hash and seed into two values, add string[i] with overflow, then ^ the two values.

Hashes are implicitly lossy, so it should be fine to just let the overflow bits go unless there is a specific reason you need them, like matching an existing algorithm.

drawnonward
+1  A: 

Hash functions like this one are supposed to overflow. You have to declare "hash" unsigned. If you really need an int than simply use hash & 0x7fffffff. Review the Fowler-Noll-Vo algorithm, you'll find links to source code there.

Hans Passant
+1  A: 

There are a number of possible interpretations of your question, and as noted by the comments, you may need to clarify.

The only sensible interpretation however is that you want to restrict the hash value to a specified range. Assuming that, then if the range were 0 to HASH_TABLE_SIZE - 1, then:

hash = (hash * seed + string[i]) % HASH_TABLE_SIZE ;

or if the table size is a power of two, use a mask:

#define HASH_TABLE_SIZE (0x01<<8)  // 2^8 (256) table
#define HASH_MODULO_MASK (HASH_TABLE_SIZE - 1)
...
hash = (hash * seed + string[i]) & HASH_MODULO_MASK ;
Clifford