What hash algorithm does .net utilise? What about java?

+6 A:

The hash function is not built into the hash table; the hash table invokes a method on the key object to compute the hash. So, the hash function varies depending on the type of key object.

In Java, a List is not a hash table (that is, it doesn't extend the Map interface). One could implement a List with a hash table internally (a sparse list, where the list index is the key into the hash table), but such an implementation is not part of the standard Java library.

erickson 2009-05-21 18:46:04

Though Java's hashtable does include a SECONDARY hash function, of course, to protect against hash functions whose randomness isn't in the lower bits.

Neil Coffey 2009-05-21 19:21:00

+1 A:

Anything purporting to be a HashTable or something like it in .NET does not implement its own hashing algorithm: they always call the object-being-hashed's GetHashCode() method.

There is a lot of confusion though as to what this method does or is supposed to do, especially when concerning user-defined or otherwise custom classes that override the base Object implementation.

Yadyn 2009-05-21 18:54:35

+1 A:

For .NET, you can use Reflector to see the various algorithms. There is a different one for the generic and non-generic hash table, plus of course each class defines its own hash code formula.

Jonathan Allen 2009-05-21 18:55:14

+1 A:

The HASHING algorithm is the algorithm used to determine the hash code of an item within the HashTable.

The HASHTABLE algorithm (which I think is what this person is asking) is the algorithm the HashTable uses to organize its elements given their hash code.

Java happens to use a chained hash table algorithm.

Will Hartung 2009-05-21 18:56:00

For HashMap and Hashtable. IdentityHashMap (and ThreadLocal) use probing algorithms. (Although I don't think any of this is specified.)

Tom Hawtin - tackline 2009-05-22 00:30:12

+2 A:

I know nothing about .NET but I'll attempt to speak for Java.

In Java, the hash code is ultimately a combination of the code returned by a given object's hashCode() function, and a secondary hash function inside the HashMap/ConcurrentHashMap class (interestingly, the two use different functions). Note that Hashtable and Dictionary (the precursors to HashMap and AbstractMap) are obsolete classes. And a list is really just "something else".

As an example, the String class constructs a hash code by repeatedly multiplying the current code by 31 and adding in the next character. See my article on how the String hash function works for more information. Numbers generally use "themselves" as the hash code; other classes, e.g. Rectangle, that have a combination of fields often use a combination of the String technique of multiplying by a small prime number and adding in, but add in the various field values. (Choosing a prime number means you're unlikely to get "accidental interactions" between certain values and the hash code width, since they don't divide by anything.)

Since the hash table size-- i.e. the number of "buckets" it has-- is a power of two, a bucket number is derived from the hash code essentially by lopping off the top bits until the hash code is in range. The secondary hash function protects against hash functions where all or most of the randomness is in those top bits, by "spreading the bits around" so that some of the randomness ends up in the bottom bits and doesn't get lopped off. The String hash code would actually work fairly well without this mixing, but user-created hash codes may not work quite so well. Note that if two different hash codes resolve to the same bucket number, Java's HashMap implementations use the "chaining" technique-- i.e. they create a linked list of entries in each bucket. It's thus important for hash codes to have a good degree of randomness so that items don't cluster into a particular range of buckets. (However, even with a perfect hash function, you will still by law of averages expect some chaining to occur.)

Hash code implementations shouldn't be a mystery. You can look at the hashCode() source for any class you choose.

Neil Coffey 2009-05-21 20:06:29

You should add that hash collisions are handled through a linked list of entries with the same hash code.

Michael Borgwardt 2009-05-21 20:46:11

IIRC, HashMap is now size as a power of two, but didn't used to be. Hashtable traditionally started off with a prime size. Prime size is useful (only) if you hash codes are bad (and you don't rehash).

Tom Hawtin - tackline 2009-05-22 00:32:34

Tom-- yes it did, and in fact as I recall this was carried over to pre-1.4 versions of HashMap. But as you say, many consider that if you find you need a prime number of buckets that's essentially God's way of telling you your hash function isn't good enough.

Neil Coffey 2009-05-22 00:37:00

Michael -- I'll add something to mention that. N.B. It's specificall the same *bucket index* that leads to chaining. If the hashCode() of two objects returns the same value, then one will override the other.

Neil Coffey 2009-05-22 00:37:50

+1 A:

The .NET Dictionary<T> class uses an IEqualityComparer<T> to compute hash codes for keys and to perform comparisons between keys in order to do hash lookups. If you don't provide an IEqualityComparer<T> when constructing the Dictionary<T> instance (it's an optional argument to the constructor) it will create a default one for you, which uses the object.GetHashCode and object.Equals methods by default.

As for how the standard GetHashCode implementation works, I'm not sure it's documented. For specific types you can read the source code for the method in Reflector or try checking the Rotor source code to see if it's there.

Kevin Gadd 2009-05-21 20:40:13

ansaurus

tags:

views:

answers:

What hash algorithm does .net utilise? What about java?

related questions