Hello there.

I am trying to clear up the time complexity of some TreeSet operations. The Javadoc says:

"This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains)."

So far so good. My question is what happens with addAll(), removeAll(), etc. Here the Javadoc for Set says:

"If the specified collection is also a set, the addAll operation effectively modifies this set so that its value is the union of the two sets."

Is it just explaining the logical outcome of the operation, or is it giving a hint about the complexity? I mean, if the two sets are represented by, e.g., red-black trees, it would be better to somehow join the trees than to "add" each element of one to the other.

In any case, is there a way to combine two TreeSets into one with O(log n) complexity?

Thank you in advance. :-)

+4  A: 

You can imagine special cases that could be optimized to O(log n), but the worst case has to be O(m log n), where m is the number of elements being added and n is the number of elements in the receiving tree.

Edit:

http://net.pku.edu.cn/~course/cs101/resource/Intro2Algorithm/book6/chap14.htm

The page linked above describes a special-case algorithm that can join trees in O(log(m + n)), but note the restriction: all members of S1 must be less than all members of S2. This is what I meant by special optimizations for special cases.
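To make the restriction concrete, here is a minimal sketch of how one might check the join precondition (every member of S1 less than every member of S2) on two sorted sets. The class and method names are hypothetical, and the sketch assumes natural ordering rather than a custom Comparator. It only tests the precondition; java.util.TreeSet does not expose a tree-join operation.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

final class JoinPrecondition {
    // Hypothetical helper, not part of the JDK: returns true when every element
    // of s1 is strictly less than every element of s2 (natural ordering assumed).
    static <T extends Comparable<? super T>> boolean allLess(SortedSet<T> s1, SortedSet<T> s2) {
        if (s1.isEmpty() || s2.isEmpty()) {
            return true; // nothing to compare, so the condition holds trivially
        }
        // Comparing the largest element of s1 with the smallest of s2 is enough.
        return s1.last().compareTo(s2.first()) < 0;
    }

    public static void main(String[] args) {
        SortedSet<Integer> low = new TreeSet<>(Arrays.asList(1, 2, 3));
        SortedSet<Integer> high = new TreeSet<>(Arrays.asList(4, 5));
        SortedSet<Integer> mixed = new TreeSet<>(Arrays.asList(1, 5));

        System.out.println(allLess(low, high));   // prints "true"
        System.out.println(allLess(mixed, low));  // prints "false"
    }
}
```

Two disjoint sets can still fail this check (for example {1, 5} and {3}), so "no overlapping elements" alone is not the same as the restriction above.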

bshields
+1  A: 

According to this blog post:
http://rgrig.blogspot.com/2008/06/java-api-complexity-guarantees.html
it's O(n log n). Because the documentation gives no hints about the complexity, you might want to write your own algorithm if performance is critical for you.

dark_charlie
+3  A: 

Looking at the Java source for TreeSet, it looks like if the passed-in collection is a SortedSet, it uses an O(n) time algorithm. Otherwise it calls super.addAll, which I'm guessing will result in O(n log n).

EDIT - guess I read the code too fast: TreeSet can only use the O(n) algorithm if its backing map is empty.
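If both sets are non-empty, one way to stay linear is to merge the two sorted iterations by hand. The following is only a sketch under stated assumptions: the class and method names are hypothetical, natural ordering is assumed, and the result is a sorted list rather than a TreeSet (the cost of turning it back into a TreeSet depends on the JDK implementation).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class SortedMergeSketch {
    // Hypothetical helper, not part of the JDK: merges two sorted sets that use
    // natural ordering into one sorted, duplicate-free list in O(m + n) comparisons.
    // null is used as an "exhausted" marker, which is safe here because a
    // naturally ordered TreeSet cannot contain null.
    static <T extends Comparable<? super T>> List<T> mergeSorted(SortedSet<T> a, SortedSet<T> b) {
        List<T> out = new ArrayList<>(a.size() + b.size());
        Iterator<T> ia = a.iterator();
        Iterator<T> ib = b.iterator();
        T x = ia.hasNext() ? ia.next() : null;
        T y = ib.hasNext() ? ib.next() : null;
        while (x != null && y != null) {
            int c = x.compareTo(y);
            if (c <= 0) {
                out.add(x);
                x = ia.hasNext() ? ia.next() : null;
                if (c == 0) {
                    // Same element in both sets: skip the copy from b.
                    y = ib.hasNext() ? ib.next() : null;
                }
            } else {
                out.add(y);
                y = ib.hasNext() ? ib.next() : null;
            }
        }
        // At most one of the two loops below does any work.
        while (x != null) { out.add(x); x = ia.hasNext() ? ia.next() : null; }
        while (y != null) { out.add(y); y = ib.hasNext() ? ib.next() : null; }
        return out;
    }

    public static void main(String[] args) {
        SortedSet<Integer> a = new TreeSet<>(Arrays.asList(1, 3, 5));
        SortedSet<Integer> b = new TreeSet<>(Arrays.asList(2, 3, 4, 7));
        System.out.println(mergeSorted(a, b)); // prints [1, 2, 3, 4, 5, 7]
    }
}
```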

carnold
A: 

It is not possible to merge the trees or join the sets as in disjoint-set data structures, because you don't know whether the elements in the two trees are disjoint. Since the data structures have no knowledge about the contents of other trees, it is necessary either to check whether an element exists in the other tree before adding it, or at least to try to add it and abort if you find it on the way. So it should be O(M log N).
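The element-by-element approach described here can be sketched as follows. The class and method names are hypothetical; the only JDK behavior relied on is that TreeSet.add returns false when the element is already present, which is the "abort if you find it on the way" case.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

final class ElementwiseUnionSketch {
    // Hypothetical helper: adds every element of source to target one at a time
    // and returns how many elements were actually new. With M source elements,
    // this performs M separate tree insert attempts.
    static <T> int addEach(TreeSet<T> target, Collection<? extends T> source) {
        int added = 0;
        for (T e : source) {
            // add() searches the tree and returns false if e is already there.
            if (target.add(e)) {
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        TreeSet<Integer> target = new TreeSet<>(Arrays.asList(1, 3, 5));
        int added = addEach(target, Arrays.asList(2, 3, 4));
        System.out.println(added + " new, result " + target); // prints "2 new, result [1, 2, 3, 4, 5]"
    }
}
```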

ajay
I can't quite understand this. Suppose you have two SortedSets that have no overlapping elements and are represented by red-black trees. Why can't you join them, given that the "join" operation on red-black trees takes O(log(n+m)) time?
Andreas K.
Given 2 arbitrary TreeSets, how will you find out if that is the case?
ajay
Well, in the program I am currently writing, I can guarantee that the two TreeSets won't have any overlapping elements. However, it seems that I am not able to join them in O(log(n+m)), as pointed out in the rest of the answers...
Andreas K.