views:

4073

answers:

9

I have been trying to understand the internal implementation of java.util.HashMap and java.util.HashSet.

Following are the doubts popping in my mind for a while:

  1. Whats is the importance of the @Override public int hashcode() in a HashMap/HashSet? Where is this hash code used internally?
  2. I have generally seen the key of the HashMap be a String like myMap<String,Object>. Can I map the values against someObject (instead of String) like myMap<someObject, Object>? What all contracts do I need to obey for this happen successfully?

Thanks in advance !

EDIT:

  1. Are we saying that the hash code of the key (check!) is the actual thing against which the value is mapped in the hash table? And when we do myMap.get(someKey); java is internally calling someKey.hashCode() to get the number in the Hash table to be looked for the resulting value?

Answer: Yes.

EDIT 2:

  1. In a java.util.HashSet, from where is the key generated for the Hash table? Is it from the object that we are adding eg. mySet.add(myObject); then myObject.hashCode() is going to decide where this is placed in the hash table? (as we don't give keys in a HashSet).

Answer: The object added becomes the key. The value is dummy!

+3  A: 

Whats is the importance of the @Override public int hashcode() in a HashMap/HashSet?

This allows the instance of the map to produce a useful hash code depending on the content of the map. Two maps with the same content will produce the same hash code. If the content is different, the hash code will be different.

Where is this hash code used internally?

Never. This code only exists so you can use a map as a key in another map.

Can I map the values against someObject (instead of String) like myMap<someObject, Object>?

Yes but someObject must be a class, not an object (your name suggests that you want to pass in object; it should be SomeObject to make it clear you're referring to the type).

What all contracts do I need to obey for this happen successfully?

The class must implement hashCode() and equals().

[EDIT]

Are we saying that the hash code of the key (check!) is the actual thing against which the value is mapped in the hash table?

Yes.

Aaron Digulla
You're saying that hashcode of map is calculated based on content, which means that it can change during map lifetime. Later you write that map can be used as a key in another map. Having object whose hashcode can change as a key in hash colllection is highly risky and leads to memory leaks
Luno
@Luno - yes, but that's the responsibility of the person who designed the app. The fact is that the Set API *requires* that `equals` is overridden, so `hashcode` *must also* be overridden to match.
Stephen C
@Johannes: No, that's an external usage.
Aaron Digulla
+1  A: 

There is a intricate relationship between equals(), hashcode() and hash tables in general in Java (and .NET too, for that matter). To quote from the documentation:

public int hashCode()

Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.

The general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

The line

@Overrides public int hashCode()

just tells that the hashCode() method is overridden. This ia usually a sign that it's safe to use the type as key in a HashMap.

And yes, you can aesily use any object which obeys the contract for equals() and hashCode() in a HashMap as key.

Joey
"This ia usually a sign that it's safe to use the type as key in a HashMap." That answered my question 2 perfectly. Thanks a ton !
peakit
+2  A: 
  1. Any Object in Java must have a hashCode() method; HashMap and HashSet are no execeptions. This hash code is used if you insert the hash map/set into another hash map/set.
  2. Any class type can be used as the key in a HashMap/HashSet. This requires that the hashCode() method returns equal values for equal objects, and that the equals() method is implemented according to contract (reflexive, transitive, symmetric). The default implementations from Object already obey these contracts, but you may want to override them if you want value equality instead of reference equality.
Thomas
+1  A: 

Yes. You can use any object as the key in a HashMap. In order to do so following are the steps you have to follow.

  1. Override equals.

  2. Override hashCode.

The contracts for both the methods are very clearly mentioned in documentation of java.lang.Object. http://java.sun.com/javase/6/docs/api/java/lang/Object.html

And yes hashCode() method is used internally by HashMap and hence returning proper value is important for performance.

Here is the hashCode() method from HashMap

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

It is clear from the above code that hashCode of each key is not just used for hashCode() of the map, but also for finding the bucket to place the key,value pair. That is why hashCode() is related to performance of the HashMap

Varun
thanks Varun for this info..
peakit
"hashCode of each key is not just used for hashCode() of the map" could you please clarify on this..i thought.. it is **only** used to decide the bucket..
peakit
+5  A: 

The answer to question 2 is easy - yes you can use any Object you like. Maps that have String type keys are widely used because they are typical data structures for naming services. But in general, you can map any two types like Map<Car,Vendor> or Map<Student,Course>.

For the hashcode() method it's like answered before - whenever you overwrite equals(), then you have to overwrite hashcode() to obey the contract. On the other hand, if you're happy with the standard implementation of equals(), then you shouldn't touch hashcode() (because that could break the contract and result in identical hashcodes for unequal objects).

Practical sidenote: eclipse (and probably other IDEs as well can auto generate a pair of equals() and hashcode() implementation for your class, just based on the class members.

Edit

For you additional question: yes, exactly. Look at the source code for HashMap.get(Object key) calls key.hashcode to calculate the position (bin) in the internal hashtable and returns the value at that position (if there is one).

But be careful with 'handmade' hashcode/equals methods - if you use an object as a key, make sure, that the hashcode doesn't change afterwards, otherwise you won't find the mapped values anymore. In other words, the fields you use to calculate equals and hashcode should be final (or 'unchangeable' after creation of the object).

Assume, we have a contact with String name and String phonenumber and we use both fields to calculate equals() and hashcode(). Now we create "John Doe" with his mobile phone number and map him to his favorite Donut shop. hashcode() is used to calculate the index (bin) in the hash table and that's where the donut shop is stored.

Now we learn that he has a new phone number and we change the phone number field of the John Doe object. This results in a new hashcode. And this hashcode resolves to a new hash table index - which usually isn't the position where John Does Donut shop was stored.

The problem is clear: In this case we wanted to map "John Doe" to the Donut shop, and not "John Doe with a specific phone number". so we have to be careful with autogenerated equals/hashcode, because the might use unwanted fields and introduce trouble with HashMaps and HashSets.

Edit 2

If you add an object to a HashSet, the Object is the key for the internal hash table, the value is set but unused (just a static instance of Object). Here's the implementation from the openjdk 6 (b17):

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
private transient HashMap<E,Object> map;

public boolean add(E e) {
  return map.put(e, PRESENT)==null;
}
Andreas_D
"the value is set but unused (just a static instance of Object)." I did n't get this completely..pls explain..And secondly in HashSet if the value of the obj is changed after wards.. the problem u mentioned for HashMap (hashcode of key gets changed, not traceable) should not happen.. right? confirm...
peakit
marking this as done.. very nice explanation.. thanks
peakit
+1  A: 

Aaron Digulla is absolutely correct. An interesting additional note that people don't seem to realise is that the key object's hashCode() method is not used verbatim. It is, in fact, rehashed by the HashMap i.e. it calls hash(someKey.hashCode)), where hash() is an internal hashing method.

To see this, have a look at the source: http://kickjava.com/src/java/util/HashMap.java.htm

The reason for this is that some people implement hashCode() poorly and the hash() function gives a better hash distribution. It's basically done for performance reasons.

GaryF
nice point Gary..
peakit
+1  A: 

In answer to question 2, though you can have any class that can be used to as the key in Hashmap, the best practice is to use immutable classes as keys for the HashMap. Or at the least if your "hashCode", and "equals" implementation are dependent on some of the attributes of your class then you should take care that you don't provide methods to alter these attributes.

sateesh
"though you can have any class that can be used to as the key in Hashmap, the best practice is to use immutable classes as keys for the HashMap" Eye opener for me.. thanks Sateesh..
peakit
+1  A: 

Hashing containers like HashMap and HashSet provide fast access to elements stored in them by splitting their contents into "buckets".

For example the list of numbers: 1, 2, 3, 4, 5, 6, 7, 8 stored in a List would look (conceptually) in memory something like: [1, 2, 3, 4, 5, 6, 7, 8].

Storing the same set of numbers in a Set would look more like this: [1, 2] [3, 4] [5, 6] [7, 8]. In this example the list has been split into 4 buckets.

Now imagine you want to find the value 6 out of both the List and the Set. With a list you would have to start at the beginning of the list and check each value until you get to 6, this will take 6 steps. With a set you find the correct bucket, the check each of the items in that bucket (only 2 in our example) making this a 3 step process. The value of this approach increases dramatically the more data you have.

But wait how did we know which bucket to look in? That is where the hashCode method comes in. To determine the bucket in which to look for an item Java hashing containers call hashCode then apply some function to the result. This function tries to balance the numbers of buckets and the number of items for the fastest lookup possible.

During lookup once the correct bucket has been found each item in that bucket is compared one at a time as in a list. That is why when you override hashCode you must also override equals. So if an object of any type has both an equals and a hashCode method it can be used as a key in a Map or an entry in a Set. There is a contract that must be followed to implement these methods correctly the canonical text on this is from Josh Bloch's great book Effective Java: Item 8: Always override hashCode when you override equals

Tendayi Mawushe
Very nice explanation Tendayi.."During lookup once the correct bucket has been found each item in that bucket is compared one at a time as in a list." ..why would you do this comparison.. as we never know the object, we passed the key..
peakit
This explanation is mainly for when you are searching for an item in a Set or Map. However when you are adding an item to the container you still need to check the existing items. This because a Set item or Map key can only appear once, that is adding an item that is equal to one already in the collection (according to the implementation of the equals method) overwrites the existing item.
Tendayi Mawushe
A: 

"That is why when you override hashCode you must also override equals" it should be the other way.

test