Getting into the gray area of "on/off topic", but necessary to eliminate confusion regarding Oscar Reyes suggestion that more hash collisions is a good thing because it reduces the number of elements in the HashMap. I may misunderstand what Oscar is saying, but I don't seem to be the only one: kdgregory, delfuego, Nash0, and I all seem to share the same (mis)understanding.
If I understand what Oscar is saying about the same class with the same hashcode, he's proposing that only one instance of a class with a given hashcode will be inserted into the HashMap. Eg, if I have an instance of SomeClass with a hashcode of 1 and a second instance of SomeClass with a hashcode of 1, only one instance of SomeClass is inserted.
The Java pastebin example at http://pastebin.com/f20af40b9 seems to indicate the above correctly summarizes what Oscar is proposing.
Regardless of any understanding or misunderstanding, what happens is different instances of the same class do not get inserted only once into the HashMap if they have the same hashcode - not until it's determined whether the keys are equal or not. The hashcode contract requires that equal objects have the same hashcode; however, it doesn't require that unequal objects have different hashcodes (although this may be desirable for other reasons)[1].
The pastebin.com/f20af40b9 example (which Oscar refers to at least twice) follows, but modified slightly to use JUnit assertions rather than printlines. This example is used to support the proposal that the same hashcodes cause collisions and when the classes are the same only one entry is created (eg, only one String in this specific case):
@Test
public void shouldOverwriteWhenEqualAndHashcodeSame() {
String s = new String("ese");
String ese = new String("ese");
// same hash right?
assertEquals(s.hashCode(), ese.hashCode());
// same class
assertEquals(s.getClass(), ese.getClass());
// AND equal
assertTrue(s.equals(ese));
Map map = new HashMap();
map.put(s, 1);
map.put(ese, 2);
SomeClass some = new SomeClass();
// still same hash right?
assertEquals(s.hashCode(), ese.hashCode());
assertEquals(s.hashCode(), some.hashCode());
map.put(some, 3);
// what would we get?
assertEquals(2, map.size());
assertEquals(2, map.get("ese"));
assertEquals(3, map.get(some));
assertTrue(s.equals(ese) && s.equals("ese"));
}
class SomeClass {
public int hashCode() {
return 100727;
}
}
However, the hashcode isn't the complete story. What the pastebin example neglects is the fact that both s
and ese
are equal: they are both the string "ese". Thus, inserting or getting the contents of the map using s
or ese
or "ese"
as the key are all equivalent because s.equals(ese) && s.equals("ese")
.
A second test demonstrates it is erroneous to conclude that identical hashcodes on the same class is the reason the key -> value s -> 1
is overwritten by ese -> 2
when map.put(ese, 2)
is called in test one. In test two, s
and ese
still have the same hashcode (as verified by assertEquals(s.hashCode(), ese.hashCode());
) AND they are the same class. However, s
and ese
are MyString
instances in this test, not Java String
instances - with the only difference relevant for this test being the equals: String s equals String ese
in test one above, whereas MyStrings s does not equal MyString ese
in test two:
@Test
public void shouldInsertWhenNotEqualAndHashcodeSame() {
MyString s = new MyString("ese");
MyString ese = new MyString("ese");
// same hash right?
assertEquals(s.hashCode(), ese.hashCode());
// same class
assertEquals(s.getClass(), ese.getClass());
// BUT not equal
assertFalse(s.equals(ese));
Map map = new HashMap();
map.put(s, 1);
map.put(ese, 2);
SomeClass some = new SomeClass();
// still same hash right?
assertEquals(s.hashCode(), ese.hashCode());
assertEquals(s.hashCode(), some.hashCode());
map.put(some, 3);
// what would we get?
assertEquals(3, map.size());
assertEquals(1, map.get(s));
assertEquals(2, map.get(ese));
assertEquals(3, map.get(some));
}
/**
* NOTE: equals is not overridden so the default implementation is used
* which means objects are only equal if they're the same instance, whereas
* the actual Java String class compares the value of its contents.
*/
class MyString {
String i;
MyString(String i) {
this.i = i;
}
@Override
public int hashCode() {
return 100727;
}
}
Based on a later comment, Oscar seems to reverse what he's said earlier and acknowledges the importance of equals. However, it still seems the notion that equals is what matters, not the "same class", is unclear (emphasis mine):
"Not really. The list is created only if the hash is the same, but the key is different. For instance if a String give hashcode 2345 and and Integer gives the same hashcode 2345, then the integer is inserted into the list because String.equals( Integer ) is false. But if you have the same class ( or at least .equals returns true ) then the same entry is used. For instance new String("one") and `new String("one") used as keys, will use the same entry. Actually this is the WHOLE point of HashMap in first place! See for yourself: pastebin.com/f20af40b9 – Oscar Reyes"
versus earlier comments that explicitly address the importance of identical class and same hashcode, with no mention of equals:
"@delfuego: See for yourself: pastebin.com/f20af40b9 So, in this question the same class is being used ( wait a minute, the same class is being used right? ) Which implies that when the same hash is used the same entry is used and there is not "list" of entries. – Oscar Reyes"
or
"Actually this would increase the performance. The more collisions eq less entries in the hashtable eq. less work to do. Is not the hash ( which looks fine ) nor the hashtable ( which works great ) I'd bet it is on the object creation where the performance is degrading. – Oscar Reyes"
or
"@kdgregory: Yes, but only if the collision happens with different classes, for the same class ( which is the case ) the same entry is used. – Oscar Reyes"
Again, I may misunderstand what Oscar was actually trying to say. However, his original comments have caused enough confusion that it seems prudent to clear everything up with some explicit tests so there are no lingering doubts.
[1] - From Effective Java, Second Edition by Joshua Bloch:
Whenever it is invoked on the same object more than once during an execution of an application, the hashCode method must consistently return the
same integer, provided no information used in equal s comparisons on the
object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equal s(Obj ect) method, then calling the hashCode method on each of the two objects must produce the same
integer result.
It is not required that if two objects are unequal according to the equal s(Object) method, then calling the hashCode method on each of the two objects
must produce distinct integer results. However, the programmer should be
aware that producing distinct integer results for unequal objects may improve
the performance of hash tables.