views:

935

answers:

5

I have this test code:

import java.util.*;

class MapEQ {

  public static void main(String[] args) {
   Map<ToDos, String> m = new HashMap<ToDos, String>();
   ToDos t1 = new ToDos("Monday");
   ToDos t2 = new ToDos("Monday");
   ToDos t3 = new ToDos("Tuesday");
   m.put(t1, "doLaundry");
   m.put(t2, "payBills");
   m.put(t3, "cleanAttic");
   System.out.println(m.size());
} }

class ToDos{

  String day;

  ToDos(String d) { day = d; }

  public boolean equals(Object o) {
      return ((ToDos)o).day == this.day;
 }

// public int hashCode() { return 9; }
}

When // public int hashCode() { return 9; } is uncommented m.size() returns 2, when it's left commented it returns three. Why?

+9  A: 

You have overidden equals without overriding hashCode. You must ensure that for all cases where equals returns true for two objects, hashCode returns the same value. The hash code is a code that must be equal if two objects are equal (the converse need not be true). When you put your hard-coded value of 9 in, you satisfy the contract again.

In your hash map, equality is only tested within a hash bucket. Your two Monday objects should be equal, but because they are returning different hash codes, the equals method isn't even called to determine their equality - they are put straight into different buckets, and the possibility that they are equal isn't even considered.

David M
It is also worth noting that the equals function is implemented incorrectly. It only works on this case because the two "Monday" string literals are interned and therefore have the same reference.
Mark Byers
Yes, well spotted! Quite correct.
David M
The equals method should also be able to cope with `null` and arguments of different classes (note subtypes). If you keep `equals` (and `hashCode`) to a standard form, then it is trivial to implement correctly (even a machine could code it!).
Tom Hawtin - tackline
Even if two entries end up in the same bucket they aren't necessarily compared. The (Sun) implementation keeps a record of the hash code so that it doesn't need to be recomputed. This is also compared as a quick check before calling equals.
Tom Hawtin - tackline
Does size() express only the number of keys used?
omgzor
With the objects returning the same hash code, and equality defined according to the string entry, the two "Monday" objects are recognised as the same, and the second is not added - it is already in the map. size() returns the number of objects contained, and so in this case returns 2. Hope that makes sense.
David M
+3  A: 

When you don't override the hashCode() method, your ToDos class inherits the default hashCode() method from Object, which gives every object a distinct hash code. This means that t1 and t2 have two different hash codes, even though were you to compare them, they would be equal. Depending on the particular hashmap implementation, the map is free to store them separately (and this is in fact what happens).

When you do correctly override the hashCode() method to ensure that equal objects get equal hash codes, the hashmap is able to find the two equal objects and place them in the same hash bucket.

A better implementation would give objects that are not equal different hash codes, like this:

public int hashCode() {
    return (day != null) ? day.hashCode() : 0;
}
Avi
Why do you say it's "better" to give objects that aren't equal different hash codes?
David M
The hash code is used by the HashMap to quickly find objects without having to compare every object. If all your hashcodes are the same, it will cause everything to go into the same bucket which will slow things down a lot.
Mark Byers
Because that reduces the number of hash collisions, which is vital for good performance. In the extreme case of hashCode() returning the same value for all objects, the HashMap degenerates to a list, and all those nice O(1) apperations are suddenly O(n).
Michael Borgwardt
@Michael - But the same is true at the opposite extreme - your list of items in a single has bucket becomes a list of one item hash buckets. You need to get a balance, and there is certainly no rule of thumb saying that the right balance is obtained by making sure that inequality gives different hash codes.
David M
@David - there is exactly that rule of thumb, by design of a hash table. From the hashCode() docs (http://java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode%28%29): the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Avi
@Avi - the word in that is "may" not "will".
David M
@David M, @Avi: It's a well known fact that you want your hash functions to have a nice uniform distribution. The reason for the "may" is because distinct hashCodes may map to the same value when you are using modulo arithmetic. For example. Hashcodes of 1 and 101 in a map with 100 buckets will map to the same bucket because 1 and 101 are congruent modulo 100. If however, your key space for hash values is the same as the number of buckets you have... then producing different values for different objects is in fact optimal.
Tom
In summary, the "may" just comes from the fact that you can always have a set of really bad inputs that have distinct hashcodes and still lead to the worst case scenario of all mapping to the same bucket.
Tom
does the size() method express the number of keys? or the number of keys-value pairs?
omgzor
@dmindreader: If you mean the size() method of the HashMap, it inherits its contract from java.util.Map, which means it expresses the number of items currently in the map.
Avi
what if I have two distinct keys to map three values? what would size() print then?
omgzor
@dmindreader: I'm not sure I understand your question... in a map, the number of distinct keys in the map is always the same as the number of key-value pairs. The only time those numbers will differ is if you are using a MultiMap of some sort (which allows the same key to exist in the map). How can two distinct keys map to three values? Can you give an example?
Tom
A: 

When hashCode is uncommented, HashMap sees t1 and t2 as being the same thing; thus, t2's value clobbers that of t1. To understand how this works, note that when hashCode returns the same thing for two instances, they end up going to the same HashMap bucket. When you try to insert a second thing into the same bucket (in this case t2 is being inserted when t1 is already present), HashMap scans the bucket for another key that is equals. In your case, t1 and t2 are equals because they have the same day. At that point, "payBills" clobbers "doLaundry". As for whether t2 clobbers t1 as the key, I believe this is undefined; thus, either behavior is allowed.

There are a few important things to think about here:

  1. Are two ToDos instances really equal just because they have the same day of the week?
  2. Whenever you implement equals, you should implement hashCode so that any two objects that are equals also have the same hashCode values. This is a fundamental assumption that HashMap makes. This is probably also true of anything else that relies the hashCode method.
  3. Design your hashCode method so that the hash codes are evenly distributed; otherwise, you won't get the performance benefits of hashing. From this perspective, returning 9 is one of the worst things you can do.
allyourcode
+2  A: 

I cannot emphasize enough that you should read Chapter 3 in Effective Java (warning: pdf link). In that chapter you will learn everything you need to know about overriding methods in Object, and in particular, about the equals contract. Josh Bloch has a great recipe for overriding the equals method that you should follow. And it will help you understand why you should be using equals and not == in your particular implementation of the equals method.

Hope this helps. PLEASE READ IT. (At least the first couple items... and then you will want to read the rest :-).

-Tom

Tom
Not to mention, a recipe for overriding hashCode() too!
Tom
A: 

The Implementing equals and Implementing hashCode topics on javapractices.com might be helpful as well.

John O