I am working on a piece of 3D software that sometimes has to perform intersections between massive numbers of curves (~100,000 in some cases). The most natural way to do this is an N^2 bounding box check, and then the curves whose bounding boxes overlap get intersected.

I heard good things about octrees, so I decided to try implementing one to see if I would get improved performance.

Here's my design: Each octree node is implemented as a class with a list of subnodes and an ordered list of object indices.

When an object is added, it goes into the lowest node that entirely contains it, or into just those of that node's children that it overlaps if it doesn't span all of them.

Now, what I want to do is retrieve all objects that share a tree node with a given object. To do this, I traverse all tree nodes, and if they contain the given index, I add all of their other indices to an ordered list.

This is efficient because the indices within each node are already ordered, so checking whether an index is already in the list is fast. However, the list ends up having to be resized, and that resizing takes up most of the time in the algorithm. So what I need is some kind of tree-like data structure that will let me add ordered data efficiently while also being memory-efficient.
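To make this concrete, the retrieval currently looks roughly like the sketch below (heavily simplified, not my actual code; OctreeNode stands in for my real node class):

using System.Collections.Generic;

// Simplified sketch of the retrieval described above.
class OctreeNode
{
    public List<int> Indices = new List<int>();              // sorted object indices stored in this node
    public List<OctreeNode> Children = new List<OctreeNode>();

    // Collects every index that shares a node with 'target' into one sorted list.
    public static List<int> Neighbours(OctreeNode root, int target)
    {
        var result = new List<int>();
        Visit(root, target, result);
        return result;
    }

    static void Visit(OctreeNode node, int target, List<int> result)
    {
        if (node.Indices.BinarySearch(target) >= 0)           // this node contains the given object
        {
            foreach (int i in node.Indices)
            {
                if (i == target) continue;
                int pos = result.BinarySearch(i);             // fast, since 'result' stays sorted
                if (pos < 0)
                    result.Insert(~pos, i);                   // but Insert shifts elements: the slow part
            }
        }
        foreach (OctreeNode child in node.Children)
            Visit(child, target, result);
    }
}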

Any suggestions?

A: 

SortedDictionary (.NET 2+) or SortedSet (.NET 4 only) is probably what you want. They are tree structures.

SortedList is a dumb class which is no different from List structurally.

However, it is still not entirely clear to me why you need this list to be sorted. Maybe if you could elaborate on this we could find a solution where you don't need sorting at all. For example, a simple HashSet could do. It is faster at both lookups and insertions than SortedList or any of the tree structures, provided the hashing is done properly.
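Just to illustrate the idea: if the result doesn't actually have to stay sorted, a sketch like this would do (the nodeIndexLists parameter is only a placeholder for however you enumerate the index lists of the nodes containing the object):

using System.Collections.Generic;

static class HashSetExample
{
    // Sketch: gather all indices that share a node with 'target', unordered.
    public static HashSet<int> Neighbours(IEnumerable<List<int>> nodeIndexLists, int target)
    {
        var result = new HashSet<int>();
        foreach (List<int> indices in nodeIndexLists)   // one list per node containing 'target'
            foreach (int i in indices)
                if (i != target)
                    result.Add(i);                      // duplicates ignored, nothing shifted
        return result;
    }
}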

OK, now that it is clear to me that you want to merge sorted lists, I can try to write an implementation.

At first, I implemented the merge using a SortedDictionary to store the heads of all the arrays. At each iteration I removed the smallest element from the dictionary and added the next one from the same array. Performance tests showed that the overhead of SortedDictionary is huge, making it almost impossible to beat simple concatenation + sorting. It even struggles to match SortedList performance on small tests.

Then I replaced the SortedDictionary with a custom-made binary heap implementation. The performance improvement was tremendous (more than 6x). This Heap implementation even manages to beat .Distinct() (which is usually the fastest) in some tests.

Here is my code:

using System;
using System.Collections.Generic;

// Fixed-capacity binary min-heap, ordered by the supplied comparer.
class Heap<T>
{
    public Heap(int limit, IComparer<T> comparer)
    {
        this.comparer = comparer;
        data = new T[limit];
    }

    int count = 0;
    T[] data;

    public void Add(T t)
    {
        data[count++] = t;
        promote(count-1);
    }

    IComparer<T> comparer;

    public int Count { get { return count; } }

    // Removes and returns the smallest element.
    public T Pop()
    {
        T result = data[0];
        fill(0);
        return result;
    }

    bool less(T a, T b)
    {
        return comparer.Compare(a,b)<0;
    }

    // Fills the hole left at 'index': repeatedly pulls the smaller child up,
    // then drops the last heap element into the leaf hole and sifts it up.
    void fill(int index)
    {
        int child1 = index*2+1;
        int child2 = index*2+2;
        if(child1 >= Count)
        {
            data[index] = data[--count];
            if(index!=count)
                promote(index);
        }
        else
        {
            int bestChild = child1;
            if(child2 < Count && less(data[child2], data[child1]))
            {
                bestChild = child2;
            }

            data[index] = data[bestChild];
            fill(bestChild);
        }
    }

    // Sifts the element at 'index' up while it is smaller than its parent.
    void promote(int index)
    {
        if(index==0)
            return;
        int parent = (index-1)/2;
        if(less(data[index], data[parent]))
        {
            T tmp = data[parent];
            data[parent] = data[index];
            data[index] = tmp;
            promote(parent);
        }
    }
}

// Cursor over one sorted source array. It is a struct on purpose: Pop()
// returns a copy whose Index can be advanced before re-adding it to the heap.
struct ArrayCursor<T>
{
    public T [] Array {get;set;}
    public int Index {get;set;}
    public bool Finished {get{return Array.Length == Index;}}
    public T Value{get{return Array[Index];}}
}

// Orders cursors by the values they currently point at.
class ArrayComparer<T> : IComparer<ArrayCursor<T>>
{
    IComparer<T> comparer;
    public ArrayComparer(IComparer<T> comparer)
    {
        this.comparer = comparer;
    }

    public int Compare (ArrayCursor<T> a, ArrayCursor<T> b)
    {
        return comparer.Compare(a.Value, b.Value);
    }
}

static class HeapMerger
{
    // Merges the sorted arrays and skips consecutive duplicates
    // (using the default equality comparer).
    public static IEnumerable<T> MergeUnique<T>(this T[][] arrays)
    {
        bool first = true;
        T last = default(T);
        IEqualityComparer<T> eq = EqualityComparer<T>.Default;
        foreach(T i in Merge(arrays))
            if(first || !eq.Equals(last,i))
            {
                yield return i;
                last = i;
                first = false;
            }
    }

    // k-way merge: keep one cursor per array in the heap and always yield the smallest head.
    public static IEnumerable<T> Merge<T>(this T[][] arrays)
    {
        var map = new Heap<ArrayCursor<T>>(arrays.Length, new ArrayComparer<T>(Comparer<T>.Default));

        Action<ArrayCursor<T>> tryAdd = (a)=>
        {
            if(!a.Finished)
                map.Add(a);
        };

        for(int i=0;i<arrays.Length;i++)
            tryAdd(new ArrayCursor<T>{Array=arrays[i], Index=0});

        while(map.Count>0)
        {
            ArrayCursor<T> lowest = map.Pop();
            yield return lowest.Value;
            lowest.Index++;
            tryAdd(lowest);
        }
    }
}
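
And a small usage sketch (the jagged array here just stands in for the per-node sorted index lists you would get from your octree):

using System;

static class MergeDemo
{
    static void Main()
    {
        // Sorted index lists of three nodes that contain the object.
        int[][] nodeIndexLists =
        {
            new[] { 1, 4, 9, 12 },
            new[] { 2, 4, 7 },
            new[] { 3, 9, 30 }
        };

        // Prints 1 2 3 4 7 9 12 30: sorted, with duplicates dropped.
        foreach (int index in nodeIndexLists.MergeUnique())
            Console.WriteLine(index);
    }
}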
Rotsor
The reason why I chose to use sorted lists inside the nodes is that the number of entries in each node is usually going to be small, so a hashtable seemed unnecessary. But then, since I have the sorted lists, it seemed like I should be able to merge them more efficiently than unsorted lists. This is the problem I am asking about. As for traversing the nodes, I am obviously only looking at the ones which contain the object I want to intersect with. Each node has a min and max index; if the object's index is between the min and max, I check whether the node contains the object. Etc.
Reveazure
You are correct, it's the moving of the elements, not the actual reallocation, that is taking the time. I figured this out before, but then I forgot it.
Reveazure
Ok, you can take advantage of these arrays being sorted by merging them, but it is [a bit more complex](http://discuss.joelonsoftware.com/default.asp?interview.11.439577.9), and I believe that a hashtable may still be faster.
Rotsor
+1  A: 

Assuming you keep the size of the OctTree as a property of the tree, you should be able to preallocate a list that is larger than the number of things you could possibly put in it. Preallocating will keep the resize from happening as long as the capacity is larger than you need. I assume that you are using a SortedList to keep your ordered results.

var results = new SortedList<int, Node>( octTree.Count );
// now find the node and add the points
results.TrimExcess(); // reclaim unused capacity as needed

An alternative would be to augment your data structure by keeping the size of the subtree below each node in the node itself. Then you'd be able to find the node of interest and directly determine how big the list needs to be. All you'd have to do is modify the insert/delete operations so that, at the end of the operation, they update the size stored in each ancestor of the inserted/deleted node.
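
The bookkeeping for that is cheap. Here is a sketch of the idea (OctreeNode, Parent and the method names are stand-ins for whatever your tree actually has):

using System.Collections.Generic;

// Sketch: each node tracks how many objects live in its whole subtree, so the
// capacity for a result list can be read straight off the node of interest.
class OctreeNode
{
    public OctreeNode Parent;
    public List<OctreeNode> Children = new List<OctreeNode>();
    public int SubtreeCount;   // objects stored in this node and all descendants

    // Call once for each object actually stored in this node
    // (insertion into the node's own index list is omitted here).
    public void OnObjectAdded()
    {
        for (OctreeNode n = this; n != null; n = n.Parent)
            n.SubtreeCount++;
    }

    public void OnObjectRemoved()
    {
        for (OctreeNode n = this; n != null; n = n.Parent)
            n.SubtreeCount--;
    }
}

Then the SubtreeCount of the node of interest gives you an upper bound to preallocate for the results under it.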

tvanfosson
I could do that with a regular list as well, but doesn't it feel a bit dirty to preallocate such massive amounts of memory?
Reveazure
@Reveazure -- I suggested a mechanism to reduce the overallocation by augmenting your structure. If you have another heuristic to predict the necessary amount, that would work, too. Since the list is actually just keeping the reference, not a copy of the actual object, the amount of memory needed for even 100,000 objects is relatively small (~800K on a 32-bit machine) anyway.
tvanfosson
The problem with adding items to a SortedList is clearly not reallocation, but keeping the list sorted (shifting greater elements towards the end so that the new item can be inserted).
Rotsor
@Rotsor - I wasn't suggesting that he use SortedList; I was assuming that he was, and suggesting how he could make it more efficient by eliminating the need for resizing, which was his observation, not mine. If he's open to using something other than an ordered list, SortedDictionary is a definite possibility. It appears that the keys would be unique, so I don't know why it wouldn't work. On the other hand, maybe he knows that the items are inserted in key order anyway, so reordering isn't an issue.
tvanfosson
@tvanfosson Maybe he knows that and has some miraculous problem with resizing. Or maybe he doesn't, and has the usual problem of his algorithm's complexity being O(N*N). I don't even know which is more probable...
Rotsor
@Rotsor That's why I was thinking that perhaps a tree structure is called for.
Reveazure
@Reveazure - SortedDictionary uses a red/black tree implementation, which will work just fine if your keys are unique. Because it uses a linked implementation (rather than an array-indexed one), you don't need to, and can't, preallocate the capacity. If you don't need ordering, a plain Dictionary would have even better performance characteristics.
tvanfosson