views:

52

answers:

2

I have a Java HashMap whose keys are instances of java.lang.Object, that is: the keys are of different types. The hashCode values of two key objects of different types are likely to be the same when they contain identical variable values.

In order to improve the performance of the get method for my HashMap, I'm inclined to mix the name of the Java type into the hashCode methods of my key objects. I have not seen examples of this elsewhere, and so my this-might-be-wacky alarm went off. Do you think mixing the type into hashCode is a good idea? Should I mix in the class name, or the hashCode of the relevant Class object?

+3  A: 

I wouldn't mix the type name in - but if you're controlling the hashCode algorithm already, why not just change it so that they won't clash? For example, if you're using the common "add and multiply" approach, you could start off with different base cases or use different multipliers.

Before you worry about this too much though, have you actually measured how often you're really getting collisions with real data? Is this definitely a problem, or are you just concerned that it might be a problem?

Jon Skeet
Good thinking, Jon. I know there will be collisions; but I don't know what impact those collisions will have on user-detectable performance. So...fair comment; maybe I should blow it off as a problem until it jumps up and yells 'fix me!'.
David
+1  A: 

I think your this-might-be-wacky alarm should have gone off when you decided to have keys of different types. But let's assume this is a case where Object is really the way to go.

You should try it without mixing in the type name and stress test the performance if you find that this particular lookup is determined to be a hotspot in the system. Chances are the performance doesn't matter that much.

Like Jon implied, the performance of the hash map is improved by reducing collisions. Mixing in the type name is just as likely to increase collisions as it is to reduce them. To keep your hashmap in peak condition, you want the likelihood of any particular hashcode to be about that same as any other over the domain of valid key values. So the probability of a hashcode of 10 should be about the same as the probability of 100 or any other number. That way the hash table buckets fill evenly (in all likelihood). so whether you have an object of type A or type B should not matter. just the probability distribution of the hashcodes of all occurring key values.

Choy