views:

792

answers:

4

I understand that the String class' hashCode() method is not guarantied to generate unique hash codes for distinct String-s. I see a lot of usage of putting String keys into HashMap-s (using the default String hashCode() method). A lot of this usage could result in significant application issues if a map put displaced a HashMap entry that was previously put onto the map with a truely distinct String key.

What are the odds that you will run into the scenario where String.hashCode() returns the same value for distinct String-s? How do developers work around this issue when the key is a String?

+2  A: 

I strongly suspect that the HashMap.put method does not determine whether the key is the same by just looking at String.hashCode.

There is definitely going to be a chance of a hash collision, so one would expect that the String.equals method will also be called to be sure that the Strings are truly equal, if there is indeed a case where the two Strings have the same value returned from hashCode.

Therefore, the new key String would only be judged to be the same key String as one that is already in the HashMap if and only if the value returned by hashCode is equal, and the equals method returns true.

Also to add, this thought would also be true for classes other than String, as the Object class itself already has the hashCode and equals methods.

Edit

So, to answer the question, no, it would not be a bad idea to use a String for a key to a HashMap.

coobird
The javadocs don't specify this. But it makes sense.
Marcus
Of course equals is also used, but as soon as you have multiple objects within the same bucket, your O(1) is more like O(n)...
Zed
@Zed: That's why it's important to have an appropriate size for collections which rely on a hash to determine where they physically store their values. I believe the Hash* implementations in Java go for a load factor of 0.75 for their storage.
coobird
@Marcus: javadocs does specify that equals is used: http://java.sun.com/javase/6/docs/api/java/util/HashMap.html#get%28java.lang.Object%29
Zed
+2  A: 

You are talking about hash collisions. Hash collisions are an issue regardless of the type being hashCode'd. All classes that use hashCode (e.g. HashMap) handle hash collisions just fine. For example, HashMap can store multiple objects per bucket.

Don't worry about it unless you are calling hashCode yourself. Hash collisions, though rare, don't break anything.

Keith Randall
However the Javadocs for the put method say "If the map previously contained a mapping for this key, the old value is replaced." This seems to contradict the suggestion that HashMap "can store multiple objects per bucket". No?
Marcus
@Marcus, No, a hash collision does not always mean a key collision. The value will be replaced only if the strings are exactly equivalent. If the strings have the same hashcode then it will still work. A bucket is just a linked list and can store many distinct strings with the same hash.
finnw
+13  A: 
CPerkins
Does your example assume that each object in the map correctly implements the equals() method which lets the HashMap differentiate between the objects?
Marcus
Be careful - this could be construed as "write your own version of String.hashCode()" which is probably not a good idea. I know that is not what you mean but I would recommend removing that part of your answer to avoid confusion.
finnw
@Marcus - correctly? Yes, there must be a "correct" equals method - either the one inherited from Object, which is an identity comparison, or build one yourself. Presence of an *incorrect* .equals method is a Bad Thing.
CPerkins
@finnw - thanks, but remember, I'm calling it a degenerate case. In fact, you can't write a version of String.hashCode, since String is final. Still, I don't like to appear to give bad advice. Do you have any suggestions for clarification?
CPerkins
@Marcus, yes, equals() must be implemented correctly for HashMap to work correctly. That is why tools like FindBugs will generate warnings if you override hashCode without overriding equals (or vice-versa.)
finnw
Sorry to get into this thread late, finnw could you respond to CPerkins and using string as a key?
Berlin Brown
+2  A: 

This is not an issue, it's just how hashtables work. It's provably impossible to have distinct hashcodes for all distinct strings, because there are far more distinct strings than integers.

As others have written, hash collisions are resolved via the equals() method. The only problem this can cause is degeneration of the hashtable, leading to bad performance. That's why Java's HashMap has a load factor, a ratio between buckets and inserted elements which, when exceeded, will cause rehashing of the table with twice the number of buckets.

This generally works very well, but only if the hash function is good, i.e. does not result in more than the statistically expected number of collisions for your particular input set. String.hashCode() is good in this regard, but this was not always so. Allegedly, prior to Java 1.2 it only inlcuded every n'th character. This was faster, but caused predictable collisions for all String sharing each n'th character - very bad if you're unluck enough to have such regular input, or if someone want to do a DOS attack on your app.

Michael Borgwardt