views:

493

answers:

5

In java, im creating a SortedSet from a list which is always going to be ordered (but is only of type ArrayList). I figure adding them one by one is going to have pretty poor performance (in the case for example of an AVL tree), as it will have to reorder the tree a lot.

my question is, how should i be creating this set? in a way that it is as fast as possible to build a balanced tree?

the specific implementation i was planning on using was either IntRBTreeSet or IntAVLTreeSet from http://fastutil.dsi.unimi.it/docs/it/unimi/dsi/fastutil/ints/IntSortedSet.html

after writing this up, I think the poor performance wont affect me too much anyway (too small amount of data), but im still interested in how it would be done in a general case.

+2  A: 

Red-Black trees are a good choice for the general case, and they have very fast inserts. See Chris Okasaki's paper for an elegant and fast implementation. The Functional Java library has a generic Set class that is backed by a red-black tree implemented according to this paper.

Apocalisp
+3  A: 

A set having a tree implementation would have the middle element from your list in the top. So the algorithm would be as following:

  1. find the middle element of the List
  2. insert it into set
  3. repeat for both sub-lists to the left and to the right of the middle element
Vladimir Dyuzhev
i think this is a good option. still has fast access to the (array)list to insert them, and what are the odds that the list elements will be sorted in that way (not very high).
gcrain
A: 

Do you have a performance problem with the simple approach of just inserting the elements as they come?

If not, don't optimize.

starblue
valid point. but for the sake of discussion let's assume he does have a performance problem.
Vladimir Dyuzhev
A: 

The built in TreeSet (http://java.sun.com/j2se/1.4.2/docs/api/java/util/TreeSet.html) class uses a red-black tree as it's backing tree (and, has been noted, red-black trees are quite fast for inserts). Here's good info on Red-Black trees (they don't have the problem of the typical binary tree implementation when inserting data that is mostly ordered already).

If you are dealing with huge data sets (big enough to require disk based backing, or significant paging file swap), then a B+Tree is a very good option (see JDBM for a Java based version of self-balancing B+Tree - it doesn't implement Set, but could be used that way if desired).

Depending on how your application is actually using this data, you might want to consider the GlazedLists library and make your lists 'live'. If all you are doing is static analysis, then this may be overkill, but it is an absolutely fantastic way of working with list based data. Definitely worth reading about.

Kevin Day
+1  A: 

With all the discussion of using a Set, it occurs to me that maybe the problem could be re-stated. Why use a Set at all? If you just want to check for membership, and your source list is sorted, then do a binary search for the object - this will be as fast (and probably faster) than any n-tree you can envision, and it's not that tough to code.

So, envision a OrderedListSet interface that just wraps the underling List object. As long as the comparator used to order the list is also used for the binary search, this should be pretty straight-forward.

All Set operations will start with a getIndex(Object ob) call, then the appropriate action is taken on the List.

Kevin Day
the problem is that the list IS sorted, but its not in a data structure that guarantees order. so i can assume that its in order, but cant be absolutely sure that my code will work
gcrain
Makes sense to me. I definitely recommend that you look at GlazedLists - it completely changed how I write code involving lists.
Kevin Day