views: 431
answers: 3

I am reading the code of the HashMap class provided by the Java 1.6 API and am unable to fully understand the need for the following operation (found in the bodies of the put and get methods):

int hash = hash(key.hashCode());

where the method hash() has the following body:

    private static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

This effectively recalculates the hash by performing bit operations on the supplied hash code. I'm unable to understand the need for this, even though the API documentation states the following:

This is critical because HashMap uses power-of-two length hash tables, that otherwise encounter collisions for hashCodes that do not differ in lower bits.

I do understand that the key-value pairs are stored in an array of data structures, and that the index of an item in this array is determined by its hash. What I fail to understand is how this function adds any value to the hash distribution.
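For context, here is a quick sketch (my own test class, not part of the JDK) that just prints a sample hash code before and after the transformation, so you can see the bit mixing I'm asking about:

    public class HashBitsDemo {
        // Supplemental hash copied from the java 1.6 HashMap source above.
        private static int hash(int h) {
            h ^= (h >>> 20) ^ (h >>> 12);
            return h ^ (h >>> 7) ^ (h >>> 4);
        }

        public static void main(String[] args) {
            int h = "example".hashCode();
            System.out.println("before: " + Integer.toBinaryString(h));
            System.out.println("after:  " + Integer.toBinaryString(hash(h)));
        }
    }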

Any help would be appreciated!

+1  A: 

I read somewhere that this is done to ensure a good distribution even if your hashCode implementation, well, err, sucks.

Helper Method
Right, and the default hashCode() implementation in java.lang.Object doesn't have much distribution between hashes.
Sam Barnum
This is true, however more explanation/citation/link would be nice...
pajton
What I don't understand is this: if each hash is already unique (and the method in question does not - and cannot - address uniqueness anyway), what problem does this mechanism actually face? It mentions something about collisions in the lower-order bits, but that's not very clear.
Varun Garde
pgras
+3  A: 

As Helper wrote, it is there just in case the existing hash function for the key objects is faulty and does not do a good-enough job of mixing the lower bits. According to the source quoted by pgras,

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

The hash is being ANDed with length-1, where length is a power of two (therefore, length-1 is guaranteed to be a sequence of 1s in its low bits). Because of this ANDing, only the lower bits of h are used; the rest of h is ignored. Imagine that, for whatever reason, the original hash only returned even numbers. If you used it directly, the odd-numbered positions of the hashmap would never be used, roughly doubling the number of collisions. In a truly pathological case, a bad hash function can make a hashmap behave more like a list than like an O(1) container.
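To make that concrete, here is a throwaway sketch (my own demo class, nothing from the JDK beyond the masking idiom) of what happens when every hash code is even and the table length is a power of two:

    public class EvenHashDemo {
        // Same masking trick HashMap uses; only valid when length is a power of two.
        static int indexFor(int h, int length) {
            return h & (length - 1);
        }

        public static void main(String[] args) {
            int length = 16; // power-of-two table length
            // Pretend our hashCode() implementation only ever returns even numbers.
            for (int h = 0; h < 20; h += 2) {
                System.out.println("hash " + h + " -> bucket " + indexFor(h, length));
            }
            // Every printed bucket index is even; the odd buckets are never used,
            // so half the table is wasted and collisions roughly double.
        }
    }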

Sun engineers must have run tests that show that too many hash functions are not random enough in their lower bits, and that many hashmaps are not large enough to ever use the higher bits. Under these circumstances, the bit operations in HashMap's hash(int h) can provide a net improvement over most expected use-cases (due to lower collision rates), even though extra computation is required.
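As a rough illustration (my own sketch reusing the 1.6 hash(int) from the question, not a benchmark), compare bucket indices with and without the supplemental hash for keys whose hash codes differ only in their upper bits:

    public class SpreadDemo {
        // Supplemental hash copied from the java 1.6 HashMap source in the question.
        static int hash(int h) {
            h ^= (h >>> 20) ^ (h >>> 12);
            return h ^ (h >>> 7) ^ (h >>> 4);
        }

        static int indexFor(int h, int length) {
            return h & (length - 1);
        }

        public static void main(String[] args) {
            int length = 16;
            for (int i = 0; i < 8; i++) {
                int raw = i << 16; // hash codes that differ only above the low 16 bits
                System.out.println("raw=" + raw
                        + "  rawBucket=" + indexFor(raw, length)           // always 0 without mixing
                        + "  mixedBucket=" + indexFor(hash(raw), length)); // spread across buckets
            }
        }
    }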

tucuxi
+1 Wow, really good answer, much much better than mine -,-
Helper Method
"just in case"? Actually, most hash codes in Java are going to be crappy. Just look at java.lang.Integer, for instance!But this actually makes sense. It's better to say "it's okay if everyone's Object.hashCode()s have crappy bit distribution, as long as they follow the equal-objects-have-equal-hashcodes rule, and try to avoid collisions as much as possible." Then only collection implementations like HashMap have the burden of passing those values through a secondary hash function, instead of it being everyone's problem.
Kevin Bourrillion
A: 

Are you sure this is still the case? I looked at the code in the link given above and the only operations I see that look like this are in the calculate capacity function.

http://www.dreamincode.net/forums/topic/189119-javautil%3Bhashmapcalculatecapacityin-x/

John Creighton