views:

387

answers:

7

The hash structures I am aware of - HashTable, HashSet & HashMap.

Do they all use the bucket structure - ie when two hashcodes are similar exactly the same one element does not overwrite the other, instead they are placed in the same bucket associated with that hashcode?

A: 

Except the HashSet. Set is by definition unique elements.

This was a mistake. Please see the comments below.

coolest_head
-1: Different elements may have same Hash - they will both be added
djna
That doesn't preclude hash code collisions. A set still has to deal with them for non-equal objects. Anyway, a HashSet is just a wrapper to a HashMap in implementation.
Yishai
The question wasn't about duplicate elements. It was about elements that have the same hash elements. A hashset can of course contain multiple objects that have the same hash code.
sepp2k
I misread the question that it was about the equal elements rather than the elements with same hashcode. My apologies. I'll delete the post now.
coolest_head
+1  A: 

I believe Java hash structures all use a form of chaining to deal with colisions when performing the hashing - which places the items that have the same hash into a list.

I do not believe that Java uses open addressing for it's hash based data structures (open addressing recomputes hashes based on retry sequences until it finds an open slit in the table)

James Conigliaro
I'd vote this up if I had enough rep!
ST
+5  A: 

A linked list is used at each bucket for dealing with hash collisions. Note that the java HashSet is actually implemented by a HashMap underneath (all keys being mapped to the same singleton value across all HashSets) and hence uses the same bucket structure.

If an element is added, its equality is checked against all items in the linked list (via .equals) before it is added at the end. Hence having hash collisions is particularly bad, as this could be an expensive check as the linked list becomes larger.

oxbow_lakes
A (singly) linked list, but not LinkedList!
Tom Hawtin - tackline
Yup - that's very true
oxbow_lakes
and is HashTable implemented similarly or is it independent of the two you describe above?
ST
+1  A: 

The apis define the behaviour, the internals of how Hash collisions are managed doesn't affect the guarantees of the API ... the performance impact of bad hash value computation is another story. Let's just hash everything to 42 and see how it behaves.

djna
+5  A: 

In Sun's current implementation of the Java library, IdentityHashMap and the internal implementation in ThreadLocal use probing structures.

The general problem with probing hash tables in Java is that hashCode and equals may be relatively expensive. Therefore you want to cache the hash value. You can't have an array that mixes references and primitives, so you'd need to do something relatively complicated. On the other hand, if you are using == to check matches, then you can check many references without a performance problem.

IIRC, Azul had a fast concurrent quadratic probing hash map.

Tom Hawtin - tackline
probably a great answer but over my head slightly - what is meant by probing structures? You mean when comparing hashes?
ST
Proving is a synonym for open addressing give in Adam Rosenfield's answer.
Tom Hawtin - tackline
+2  A: 

No -- open addressing is an alternate method of representing hash tables, where objects are stored directly in the table, instead of residing in a linked list. Only one object can be stored at a given index, so resolving collisions is more complicated.

When adding an object for which another object already resides at the same index, a probing sequence is used to determine the new index at which to store the new object. Removing objects is also more complicated, since you if you remove an object, you need to leave a marker that says "there used to be an object here"; for more details, see Wikipedia.

Open addressing is preferable when the objects being stored as small and will rarely be deleted. Open addressing has improved cache performance, since you don't need to go through an extra level of indirection walking a linked list.

The classes you mentioned -- HashTable, HashSet, and HashMap don't use open addressing, but you could easily create new classes that implemented open addressing and provided the same APIs as those classes.

Adam Rosenfield
A: 

Maps and Sets are the interfaces that determine the behavior of a HashSet or HashMap. A HashSet is a Set, and so it behaves like a Set (ie duplicates are not allowed). A HashMap acts like a Map - it will not overwrite a key with a similar hashcode, but it will overwrite a key, if the same exact key is used again. This will be the same regardless of what data structure is backing the Map internally. See the javadoc for Sets and HashMaps for more.

Did you mean to ask something about the specific implementation of one of these structures?

Peter Recore
so if I have two objects that are not identical (equals() fails) but return the same int from hashcode(), when both objects are placed in the same HashSet will one overwrite the other?
ST
yes, as stated in the javadoc for Set, (http://java.sun.com/javase/7/docs/api/java/util/Set.html#add(E)) the ultimate definition of sameness is equals(). Quote: "More formally, adds the specified element e to this set if the set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). "
Peter Recore