I am working on a piece of 3D software that sometimes has to perform intersections between massive numbers of curves (~100,000 in some cases). The most natural way to do this is an N^2 bounding box check, and then the curves whose bounding boxes overlap get intersected.

I heard good things about octrees, so I decided to try implementing one to see if I would get improved performance.

Here's my design: Each octree node is implemented as a class with a list of subnodes and an ordered list of object indices.

When an object is added, it goes into the lowest node that entirely contains it, or into just those of that node's children that it overlaps if it doesn't span all of them.

Now, what I want to do is retrieve all objects that share a tree node with a given object. To do this, I traverse all tree nodes, and if they contain the given index, I add all of their other indices to an ordered list.

This is efficient because the indices within each node are already ordered, so checking whether an index is already in the list is fast. However, the list ends up having to be resized, and that resizing takes up most of the time in the algorithm. So what I need is some kind of tree-like data structure that will let me add ordered data efficiently while also being memory-efficient.
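To make this concrete, the retrieval currently looks roughly like the sketch below (heavily simplified, not my actual code; OctreeNode stands in for my real node class):

using System.Collections.Generic;

// Simplified sketch of the retrieval described above.
class OctreeNode
{
    public List<int> Indices = new List<int>();              // sorted object indices stored in this node
    public List<OctreeNode> Children = new List<OctreeNode>();

    // Collects every index that shares a node with 'target' into one sorted list.
    public static List<int> Neighbours(OctreeNode root, int target)
    {
        var result = new List<int>();
        Visit(root, target, result);
        return result;
    }

    static void Visit(OctreeNode node, int target, List<int> result)
    {
        if (node.Indices.BinarySearch(target) >= 0)           // this node contains the given object
        {
            foreach (int i in node.Indices)
            {
                if (i == target) continue;
                int pos = result.BinarySearch(i);             // fast, since 'result' stays sorted
                if (pos < 0)
                    result.Insert(~pos, i);                   // but Insert shifts elements: the slow part
            }
        }
        foreach (OctreeNode child in node.Children)
            Visit(child, target, result);
    }
}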

Any suggestions?

A: 

SortedDictionary (.NET 2+) or SortedSet (.NET 4 only) is probably what you want. They are tree structures.

SortedList is a dumb class which is no different from List structurally.

However, it is still not entirely clear to me why you need this list to be sorted. Maybe if you could elaborate on this we could find a solution where you don't need sorting at all. For example, a simple HashSet could do. It is faster at both lookups and insertions than SortedList or any of the tree structures, provided the hashing is done properly.
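Just to illustrate the idea: if the result doesn't actually have to stay sorted, a sketch like this would do (the nodeIndexLists parameter is only a placeholder for however you enumerate the index lists of the nodes containing the object):

using System.Collections.Generic;

static class HashSetExample
{
    // Sketch: gather all indices that share a node with 'target', unordered.
    public static HashSet<int> Neighbours(IEnumerable<List<int>> nodeIndexLists, int target)
    {
        var result = new HashSet<int>();
        foreach (List<int> indices in nodeIndexLists)   // one list per node containing 'target'
            foreach (int i in indices)
                if (i != target)
                    result.Add(i);                      // duplicates ignored, nothing shifted
        return result;
    }
}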

OK, now that it is clear to me that you want to merge sorted lists, I can try to write an implementation.

At first, I implemented the merge using a SortedDictionary to store the heads of all the arrays. At each iteration I removed the smallest element from the dictionary and added the next one from the same array. Performance tests showed that the overhead of SortedDictionary is huge, making it almost impossible to beat simple concatenation + sorting. It even struggles to match SortedList performance on small tests.

Then I replaced the SortedDictionary with a custom-made binary heap implementation. The performance improvement was tremendous (more than 6x). This Heap implementation even manages to beat .Distinct() (which is usually the fastest) in some tests.

Here is my code:

using System;
using System.Collections.Generic;

// Fixed-capacity binary min-heap, ordered by the supplied comparer.
class Heap<T>
{
    public Heap(int limit, IComparer<T> comparer)
    {
        this.comparer = comparer;
        data = new T[limit];
    }

    int count = 0;
    T[] data;

    public void Add(T t)
    {
        data[count++] = t;
        promote(count-1);
    }

    IComparer<T> comparer;

    public int Count { get { return count; } }

    // Removes and returns the smallest element.
    public T Pop()
    {
        T result = data[0];
        fill(0);
        return result;
    }

    bool less(T a, T b)
    {
        return comparer.Compare(a,b)<0;
    }

    // Fills the hole left at 'index': repeatedly pulls the smaller child up,
    // then drops the last heap element into the leaf hole and sifts it up.
    void fill(int index)
    {
        int child1 = index*2+1;
        int child2 = index*2+2;
        if(child1 >= Count)
        {
            data[index] = data[--count];
            if(index!=count)
                promote(index);
        }
        else
        {
            int bestChild = child1;
            if(child2 < Count && less(data[child2], data[child1]))
            {
                bestChild = child2;
            }

            data[index] = data[bestChild];
            fill(bestChild);
        }
    }

    // Sifts the element at 'index' up while it is smaller than its parent.
    void promote(int index)
    {
        if(index==0)
            return;
        int parent = (index-1)/2;
        if(less(data[index], data[parent]))
        {
            T tmp = data[parent];
            data[parent] = data[index];
            data[index] = tmp;
            promote(parent);
        }
    }
}

// Cursor over one sorted source array. It is a struct on purpose: Pop()
// returns a copy whose Index can be advanced before re-adding it to the heap.
struct ArrayCursor<T>
{
    public T [] Array {get;set;}
    public int Index {get;set;}
    public bool Finished {get{return Array.Length == Index;}}
    public T Value{get{return Array[Index];}}
}

// Orders cursors by the values they currently point at.
class ArrayComparer<T> : IComparer<ArrayCursor<T>>
{
    IComparer<T> comparer;
    public ArrayComparer(IComparer<T> comparer)
    {
        this.comparer = comparer;
    }

    public int Compare (ArrayCursor<T> a, ArrayCursor<T> b)
    {
        return comparer.Compare(a.Value, b.Value);
    }
}

static class HeapMerger
{
    // Merges the sorted arrays and skips consecutive duplicates
    // (using the default equality comparer).
    public static IEnumerable<T> MergeUnique<T>(this T[][] arrays)
    {
        bool first = true;
        T last = default(T);
        IEqualityComparer<T> eq = EqualityComparer<T>.Default;
        foreach(T i in Merge(arrays))
            if(first || !eq.Equals(last,i))
            {
                yield return i;
                last = i;
                first = false;
            }
    }

    // k-way merge: keep one cursor per array in the heap and always yield the smallest head.
    public static IEnumerable<T> Merge<T>(this T[][] arrays)
    {
        var map = new Heap<ArrayCursor<T>>(arrays.Length, new ArrayComparer<T>(Comparer<T>.Default));

        Action<ArrayCursor<T>> tryAdd = (a)=>
        {
            if(!a.Finished)
                map.Add(a);
        };

        for(int i=0;i<arrays.Length;i++)
            tryAdd(new ArrayCursor<T>{Array=arrays[i], Index=0});

        while(map.Count>0)
        {
            ArrayCursor<T> lowest = map.Pop();
            yield return lowest.Value;
            lowest.Index++;
            tryAdd(lowest);
        }
    }
}
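
And a small usage sketch (the jagged array here just stands in for the per-node sorted index lists you would get from your octree):

using System;

static class MergeDemo
{
    static void Main()
    {
        // Sorted index lists of three nodes that contain the object.
        int[][] nodeIndexLists =
        {
            new[] { 1, 4, 9, 12 },
            new[] { 2, 4, 7 },
            new[] { 3, 9, 30 }
        };

        // Prints 1 2 3 4 7 9 12 30: sorted, with duplicates dropped.
        foreach (int index in nodeIndexLists.MergeUnique())
            Console.WriteLine(index);
    }
}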
Rotsor
The reason why I chose to use sorted lists inside the nodes is that the number of entries in each node is usually going to be small, so a hashtable seemed unnecessary. But then, since I have the sorted lists, it seemed like I should be able to merge them more efficiently than unsorted lists. This is the problem I am asking about. As for traversing the nodes, I am obviously only looking at the ones which contain the object I want to intersect with. Each node has a min and max index; if the object's index is between the min and max, I check whether the node contains the object. Etc.
Reveazure
You are correct, it's the moving of the elements, not the actual reallocation, that is taking the time. I figured this out before, but then I forgot it.
Reveazure
Ok, you can take advantage of these arrays being sorted by merging them, but it is [a bit more complex](http://discuss.joelonsoftware.com/default.asp?interview.11.439577.9), and I believe that a hashtable may still be faster.
Rotsor
+1  A: 

Assuming you keep the size of the OctTree as a property of the tree, you should be able to preallocate a list that is larger than the number of things you could possibly put in it. Preallocating will keep the resize from happening as long as the capacity is larger than you need. I assume that you are using a SortedList to keep your ordered results.

var results = new SortedList<int, Node>( octTree.Count );
// now find the node and add the points
results.TrimExcess(); // reclaim unused capacity as needed

An alternative would be to augment your data structure by keeping the size of the subtree below each node in the node itself. Then you'd be able to find the node of interest and directly determine how big the list needs to be. All you'd have to do is modify the insert/delete operations so that, at the end of the operation, they update the size stored in each ancestor of the inserted/deleted node.
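
The bookkeeping for that is cheap. Here is a sketch of the idea (OctreeNode, Parent and the method names are stand-ins for whatever your tree actually has):

using System.Collections.Generic;

// Sketch: each node tracks how many objects live in its whole subtree, so the
// capacity for a result list can be read straight off the node of interest.
class OctreeNode
{
    public OctreeNode Parent;
    public List<OctreeNode> Children = new List<OctreeNode>();
    public int SubtreeCount;   // objects stored in this node and all descendants

    // Call once for each object actually stored in this node
    // (insertion into the node's own index list is omitted here).
    public void OnObjectAdded()
    {
        for (OctreeNode n = this; n != null; n = n.Parent)
            n.SubtreeCount++;
    }

    public void OnObjectRemoved()
    {
        for (OctreeNode n = this; n != null; n = n.Parent)
            n.SubtreeCount--;
    }
}

Then the SubtreeCount of the node of interest gives you an upper bound to preallocate for the results under it.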

tvanfosson
I could do that with a regular list as well, but doesn't it feel a bit dirty to preallocate such massive amounts of memory?
Reveazure
@Reveazure -- I suggested a mechanism to reduce the overallocation by augmenting your structure. If you have another heuristic to predict the necessary amount, that would work, too. Since the list is actually just keeping the reference, not a copy of the actual object, the amount of memory needed for even 100,000 objects is relatively small (~800K on a 32-bit machine) anyway.
tvanfosson
The problem with adding items to a SortedList is clearly not reallocation, but keeping the list sorted (shifting greater elements towards the end so that the new item can be inserted).
Rotsor
@Rotsor - I wasn't suggesting that he use SortedList; I was assuming that he was, and suggesting how he could make it more efficient by eliminating the need for resizing, which was his observation, not mine. If he's open to using something other than an ordered list, SortedDictionary is a definite possibility. It appears that the keys would be unique, so I don't know why it wouldn't work. On the other hand, maybe he knows that the items are inserted in key order anyway, so reordering isn't an issue.
tvanfosson
@tvanfosson Maybe he knows that and has some miraculous problem with resizing. Or maybe he doesn't, and has the usual problem of his algorithm's complexity being O(N*N). I don't even know which is more probable...
Rotsor
@Rotsor That's why I was thinking that perhaps a tree structure is called for.
Reveazure
@Reveazure - SortedDictionary uses a red/black tree implementation, which will work just fine if your keys are unique. Because it uses a linked implementation (rather than an array-indexed one), you don't need to, and can't, preallocate the capacity. If you don't need ordering, a plain Dictionary would have even better performance characteristics.
tvanfosson