views:

206

answers:

2

In my code, Java TreeSet iteration is the dominant time factor. In looking at the system I believe that it is O(n) complexity. Can anyone verify this?

I am thinking that by providing links backward from child node to parent node I could improve the performance.

+3  A: 

TreeSet iteration is of course O(n), as can be expect from any sensible tree-walking algorithm.

I am thinking that by providing links backward from child node to parent node I could improve the performance.

TreeMap (which TreeSet is based on) already has such parent references. This is the method it all boils down to:

private Entry<K,V> successor(Entry<K,V> t) {
    if (t == null)
        return null;
    else if (t.right != null) {
        Entry<K,V> p = t.right;
        while (p.left != null)
            p = p.left;
        return p;
    } else {
        Entry<K,V> p = t.parent;
        Entry<K,V> ch = t;
        while (p != null && ch == p.right) {
            ch = p;
            p = p.parent;
        }
        return p;
    }
}
Michael Borgwardt
I think the key point is that it is not only the leaf nodes that have values attached. For `n` leafs you do have `O(n log n)` nodes but also `O(n log n)` values. In real cases, I would expect `O(n)` operations to dominate `O(n log n)` operations anyway.
Tom Hawtin - tackline
I thought it would be O(n)+O(logn) because if you are iterating in order you have to first use log(n) traversals to find the left or rightmost leaf, and then you have n traversals to walk the tree backwards. Thus you have O(n) ?Does my math make sense?
John Smith
@john: O(n)+O(log n) is the same thing as O(n) per definition.
Michael Borgwardt
@Tom: O(n*log n) is strictly greater than O(n), but it's not true that you have O(n*log n) nodes for n leafs. For a binary tree with n leafs, you have n/2 nodes at the second-to-lowest level, n/4 at the next higher one... That's a geometric series and adds up to 2n, i.e. O(n).
Michael Borgwardt
@Michael Borgwardt Second part is true (but it's mistake which could clearly have been made!). Although O(n log n) is more stuff than O(n), for actual values of n, doing O(n log n) and O(n) may take longer in the latter.
Tom Hawtin - tackline
+1  A: 

Have you considered taking a copy of the TreeSet when you alter it? If the dominate time is spent in TreeSet iteration (rather than modifying it) then copying the TreeSet to an array or ArrayList (only when altered) and only iterating over this array/ArrayList could almost elminate the cost of TreeSet iteration.

Peter Lawrey
Or Guava's `ImmutableSortedSet`, which is array-backed.
Kevin Bourrillion