views:

2909

answers:

5
+3  Q: 

Hashset vs Treeset

I've always loved trees, that nice O(n*lg(n)) and the tidyness of them. However, every software engineer I've ever known has asked me pointedly why I would use a treeset. From a CS background, I don't think it matters all that much which you use, and I don't care to mess around with hash functions and buckets (in the case of Java).

In what cases would I want to use a hashset over a treeset?

+7  A: 

HashSet is O(1) to access elements, so it certainly does matter. But maintaining order of the objects in the Set isn't possible.

TreeSet is useful if maintaining an order matters to you. But, as you've noted, you're trading order for slower time to access an element: O(log n) for basic operations.

From the javadocs for TreeSet:

This implementation provides guaranteed log(n) time cost for the basic operations ({@code add}, {@code remove} and {@code contains}).

duffymo
+1  A: 

If you aren't inserting enough elements to result in frequent rehashings (or collisions, if your HashSet can't resize), a HashSet certainly gives you the benefit of constant time access. But on sets with lots of growth or shrinkage, you may actually get better performance with Treesets, depending on the implementation.

Amortized time can be close to O(1) with a functional red-black tree, if memory serves me. Okasaki's book would have a better explanation than I can pull off. (Or see his publication list)

JasonTrue
+1  A: 

HashSet implementations are, of course, much much faster -- less overhead because there's no ordering. A good analysis of the various Set implementations in Java is provided at http://java.sun.com/docs/books/tutorial/collections/implementations/set.html.

The discussion there also points out an interesting 'middle ground' approach to the Tree vs Hash question. Java provides a LinkedHashSet, which is a HashSet with an "insertion-oriented" linked list running through it, that is, the last element in the linked list is also the most recently inserted into the Hash. This allows you to avoid the unruliness of an unordered hash without incurring the increased cost of a TreeSet.

Joe
+1  A: 

The reason why most use HashSet is that the operations are (on average) O(1) instead of O(log n). If the set contains standard items you will not be "messing around with hash functions" as that has been done for you. If the set contains custom classes, you have to implement hashCode to use HashSet (although Effective Java shows how), but if you use a TreeSet you have to make it Comparable or supply a Comparator. This can be a problem if the class does not have a particular order.

I have sometimes used TreeSet (or actually TreeMap) for very small sets/maps (< 10 items) although I have not checked to see if there is any real gain in doing so. For large sets the difference can be considerable.

Now if you need the sorted, then TreeSet is appropriate, although even then if updates are frequent and the need for a sorted result is infrequent, sometimes copying the contents to a list or an array and sorting them can be faster.

Kathy Van Stone
A: 

Message Edit ( complete rewrite ) When order does not matter, that's when. Both should give Log(n) - it would be of utility to see if either is over five percent faster than the other. HashSet can give O(1) testing in a loop should reveal whether it is.

Nicholas Jordan