views:

146

answers:

3

Hi guys,

I was wondering how to calculate the hash code for a given string by hand. I understand that in Java, you can do something like:

String me = "What you say what you say what?";
long whatever = me.hashCode();

That's all good and dandy, but I was wondering how to do it by hand. I know the given formula for calculating the hash code of a string is something like:

S0 X 31 ^ (n-1) + S1 X 31 ^ (n-2) + .... + S(n-2) X 31 + S(n-1)
Where S indicates the character in the string, and n is the length of the string. Using 16 bit unicode then, the first character from string me would be computed as:

87 X (31 ^ 34)

However, that creates an insanely large number. I can't imagine adding all the characters together like that. So, in order to calculate the lowest-order 32 bits result then, what would I do? Long whatever from above equals -957986661 and I'm not how to calculate that!

+4  A: 

Take a look at the source code of java.lang.String.

/**
 * Returns a hash code for this string. The hash code for a
 * <code>String</code> object is computed as
 * <blockquote><pre>
 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 * </pre></blockquote>
 * using <code>int</code> arithmetic, where <code>s[i]</code> is the
 * <i>i</i>th character of the string, <code>n</code> is the length of
 * the string, and <code>^</code> indicates exponentiation.
 * (The hash value of the empty string is zero.)
 *
 * @return  a hash code value for this object.
 */
public int hashCode() {
int h = hash;
    int len = count;
if (h == 0 && len > 0) {
    int off = offset;
    char val[] = value;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}
dty
@BalusC, thanks for improving my answer! :-)
dty
I get the basic idea (I can compute small strings) but when the string gets large I'm unsure as to what to do.
@user458346, the size of the string isn't important. This is the values of using a loop, it doesn't matter how long the loop is, it does get any more complicated.
Peter Lawrey
A: 

At every step you should divide the result modulo MAX_HASH_VALUE.

So imagine you wanna keep your hash below a 100.

result = 0
for each character in the string do
    result += (31 * result) + character # hash code calculation
    result = result % 100               # limiting it to a 100
end
glebm
I don't understand... so what would I do for the first two characters?
+2  A: 

Most hash functions of this sort calculate the hash value modulo some large number (e.g. a large prime). This avoids overflows and keeps the range of values returned by the function within a specified range. But this also means an infinite range of input values will get a hash value from a finite set of possible values (i.e. [0,modulus)), hence the problem of hash collisions.

In this case, the code would look something like this:

   public int hash(String x){
        int hashcode=0;
        int MOD=10007;
        int shift=29;
        for(int i=0;i<x.length();i++){
            hashcode=((shift*hashcode)%MOD+x.charAt(i))%MOD;
        }
        return hashcode; 
    }

Exercise for the reader:

See the code for the hashCode function for java.util.String. Can you see why it does not use a modulus explicitly?

MAK
I can't see... Could You explain it?
jjczopek
MAK