views:

769

answers:

6

I was reading the Java api docs on Hashtable class and came across several questions. In the doc, it says "Note that the hash table is open: in the case of a "hash collision", a single bucket stores multiple entries, which must be searched sequentially. " I tried the following code myself

Hashtable<String, Integer> me = new Hashtable<String, Integer>();
me.put("one", new Integer(1));
me.put("two", new Integer(2));
me.put("two", new Integer(3));
System.out.println(me.get("one"));  
System.out.println(me.get("two"));

the out put was

1
3
  1. Is this what it means by "open"?
  2. what happened to the Integer 2? collected as garbage?
  3. Is there an "closed" example?
+1  A: 

A hash is a computed function that maps one object ("one" or "two" in your sample) to (in this case) an integer. This means that there may be multiple values that map to the same integer ( an integer has a finite number of permitted values while there may be an infinite number of inputs) . In this case "equals" must be able to tell these two apart. So your code example is correct, but there may be some other key that has the same hashcode (and will be put in the same bucket as "two")

krosenvold
+9  A: 

No, this is not what is meant by "open".

Note the difference between a key collision and a hash collision.

The Hashtable will not allow more than one entry with the same key (as in your example, you put two entries with the key "two", the second one (3) replaced the first one (2), and you were left with only the second one in the Hashtable).

A hash collision is when two different keys have the same hashcode (as returned by their hashCode() method). Different hash table implementations could treat this in different ways, mostly in terms of low-level implementation. Being "open", Hashtable will store a linked list of entries whose keys hash to the same value. This can cause, in the worst case, O(N) performance for simple operations, that normally would be O(1) in a hash map where the hashes mostly were different values.

Avi
How can I avoid different values hashed to one key?
derrdji
To aid in understanding: Look at the source to String.hashCode() and print the output of "one".hashCode() and "two".hashCode()
basszero
@derrdji: In general, hashCode() should be implemented in such a way as to make it unlikely that different values would hash to the same key. If you are implementing your own classes as keys, and especially if you override the equals() method, you should provide a good hashCode() method as well. For more information on the theory of hash functions, see: http://en.wikipedia.org/wiki/Hash_function
Avi
Note that whether two different values hash to the same hash code or to different hash codes has no FUNCTIONAL effect on your program. One of the things that HashTable and HashMap do for you is deal with this case behind the scenes. The only difference it makes is a performance difference. If, in the worst case, every key you added to the hash table had the same hash code, you'd end up sequentially searching the entire list on every access rather than jumping to the right place, and performance would suck. But the program would still work.
Jay
+1  A: 

It means that Hashtable uses open hashing (also known as separate chaining) to deal with hash collisions. If two separate keys have the same hashcode, both of them will be stored in the same bucket (in a list).

Bill the Lizard
I checked the source code, and both Hashtable and Hashmap use chaining. This is actually the opposite of "open addressing", and each hashcode can only occupy exactly one address in the array. An example of open addressing would be linear probing.FWIW, the terminology is confusing: open hashing is the exact opposite of open addressing.
Sean Reilly
@Sean: That is confusing! I fixed my error. Thank you for leaving a comment.
Bill the Lizard
+2  A: 

Open means that if two keys are not equal, but have the same hash value, then they will be stored in the same "bucket". In this case, you can think of each bucket as a linked list, so if many things are stored in the same bucket, search performance will decrease.

Bucket 0: Nothing
Bucket 1: Item 1
Bucket 2: Item 2 -> Item 3
Bucket 3: Nothing
Bucket 4: Item 4

In this case, if you search for a key that hashes to bucket 2, you have to then perform an O(n) search on the list to find the key that equals what you're searching for. If the key hashes to Bucket 0, 1, 3, or 4, then you get an O(1) search performance.

Ryan Ahearn
+3  A: 

It means that two items with different keys that have the same hashcode end up in the same bucket.

In your case the keys "two" are the same and so the second put overwrites the first one.

But assuming that you have your own class

class Thingy {
    private final String name;

    public Thingy(String name) {
         this.name = name;
    }

    public boolean equals(Object o) {
        ...

    }

    public int hashcode() {
       //not the worlds best idea
       return 1;
    }

}

And created multiple instances of it. i.e.

Thingy a = new Thingy("a"); 
Thingy b = new Thingy("b"); 
Thingy c = new Thingy("c");

And inserted them into a map. Then one bucket i.e. the bucket containing the stuff with hashcode 1 will contain a list (chain) of the three items.

Map<Thingy, Thingy> map = new HashMap<Thingy, Thingy>();
map.put(a, a);
map.put(b, b);
map.put(c, c);

So getting an item by any Thingy key would result in a lookup of the hashcode O(1) followed by a linear search O(n) on the list of items in the bucket with hashcode 1.

Also be careful to ensure that you obey the correct relationship when implementing hashcode and equals. Namely if two objects are equal then they should have the same hascode, but not necessarily the otherway round as multiple keys are likely to get the same hashcode.

Oh and for the full definitions of Open hashing and Closed hash tables look here http://www.c2.com/cgi/wiki?HashTable

pjp
An excellent answer, even without the ironically understated "not the worlds best idea". +1, but I'd vote twice if they let me. :)
CPerkins
A: 

Warning: there are contradictory definitions of "open hashing" in common usage:

Quoting from http://www.c2.com/cgi/wiki?HashTable cited in another answer:

Caution: some people use the term "open hashing" to mean what I've called "closed hashing" here! The usage here is in accordance with that in TheArtOfComputerProgramming and IntroductionToAlgorithms, both of which are recommended references if you want to know more about hash tables.

For example, the above page defines "open hashing" as follows:

There are two main strategies. Open hashing, also called open addressing, says: when the table entry you need for a new key/value pair is already occupied, find another unused entry somehow and put it there. Closed hashing says: each entry in the table is a secondary data structure (usually a linked list, but there are other possibilities) containing the actual data, and this data structure can be extended without limit.

By contrast, the definition supplied by Wikipedia is:

In the strategy known as separate chaining, direct chaining, or simply chaining, each slot of the bucket array is a pointer to a linked list that contains the key-value pairs that hashed to the same location. Lookup requires scanning the list for an entry with the given key. Insertion requires appending a new entry record to either end of the list in the hashed slot. Deletion requires searching the list and removing the element. (The technique is also called open hashing or closed addressing, which should not be confused with 'open addressing' or 'closed hashing'.)

If so-called "experts" cannot agree what the term "open hashing" means, it is best to avoid using it.

Stephen C