views:

210

answers:

2

I have an adjacency list of objects (rows loaded from SQL database with the key and it's parent key) that I need to use to build an unordered tree. It's guaranteed to not have cycles.

This is taking wayyy too long (processed only ~3K out of 870K nodes in about 5 minutes). Running on my workstation Core 2 Duo with plenty of RAM.

Any ideas on how to make this faster?

public class StampHierarchy {
    private StampNode _root;
    private SortedList<int, StampNode> _keyNodeIndex;

    // takes a list of nodes and builds a tree
    // starting at _root
    private void BuildHierarchy(List<StampNode> nodes)
    {
        Stack<StampNode> processor = new Stack<StampNode>();
        _keyNodeIndex = new SortedList<int, StampNode>(nodes.Count);

        // find the root
        _root = nodes.Find(n => n.Parent == 0);

        // find children...
        processor.Push(_root);
        while (processor.Count != 0)
        {
            StampNode current = processor.Pop();

            // keep a direct link to the node via the key
            _keyNodeIndex.Add(current.Key, current);  

            // add children
            current.Children.AddRange(nodes.Where(n => n.Parent == current.Key));

            // queue the children
            foreach (StampNode child in current.Children)
            {
                processor.Push(child);
                nodes.Remove(child); // thought this might help the Where above
            }
        }
    }
}

    public class StampNode {
         // properties: int Key, int Parent, string Name, List<StampNode> Children
    }
+2  A: 
  1. Put the nodes into a sorted list or dictionary.

  2. Scan that list, pick up each node, find its parent node in the same list (binary search or dictionary lookup), add it to the Children collection of the parent node.

There's no need for a Stack to put this into a tree.

Hightechrider
It's worth to note that sorting the nodes by key before putting them into a sorted list makes a huge difference in speed. Go with dictionary is also another alternative if memory is not a primary constraint.
Codism
+1  A: 

SortedList is not a good container to use in this context. It is O(n) for insertion operations (the repeated calls to Add()), as it is internally represented as a flat list. Using Dictionary instead of SortedList will be a large improvement, as it is O(1) amortized insertion time.

Gary Linscott
Ah, also I missed the current.Children.AddRange line. You don't want to re-scan your entire node list searching for each parent. As Hightechrider suggested, putting the nodes into a Dictionary first would speed things up dramatically, as again, you change an O(n) operation into an O(1) operation.
Gary Linscott