views: 464
answers: 5

As a follow-up to my original question about a small piece of this code, I decided to ask again to see if you can do better than what we came up with so far.

The code below iterates over a binary tree (left/right = child/next). I do believe there is room for one less conditional in here (the down boolean). The fastest answer wins!

  1. The cnt statement can be multiple statements, so let's make sure it appears only once.
  2. The child() and next() member functions are about 30x as slow as the hasChild() and hasNext() operations.
  3. Keep it iterative <-- dropped this requirement, as the recursive solution presented was faster.
  4. This is C++ code.
  5. The visit order of the nodes must stay as it is in the example below (hit parents first, then the children, then the 'next' nodes).
  6. BaseNodePtr is a boost::shared_ptr and thus assignments are slow; avoid any temporary BaseNodePtr variables.

Currently this code takes 5897ms to visit 62200000 nodes in a test tree, calling this function 200,000 times.

void processTree (BaseNodePtr current, unsigned int & cnt )
{
    bool down = true;

    while ( true )
    {
        if ( down )
        {
            while ( true )
            {
                cnt++; // this can/will be multiple statements

                if (!current->hasChild()) break;
                current = current->child();
            }
        }

        if ( current->hasNext() )
        {
            down = true;
            current = current->next();
        }
        else
        {
            down = false;
            current = current->parent();
            if (!current)
                return; // done.
        }
    }
}
+5  A: 

Why not a recursive solution?

void processTree (const BaseNodePtr &current, unsigned int & cnt )
{
  cnt++;

  if (current->hasChild())
    processTree(current->child(), cnt);
  if (current->hasNext())
    processTree(current->next(), cnt);
}

Since shared_ptr seems to be your bottleneck, why not improve it? Are you using threads? If not, then undefine the symbol BOOST_HAS_THREADS. The shared_ptr reference count is guarded by a mutex, which is probably the cause of the slow performance.
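
If you go that route, here is a minimal sketch of the configuration; it uses the smart_ptr-specific BOOST_SP_DISABLE_THREADS macro as an alternative to touching BOOST_HAS_THREADS, and it must be defined before any Boost header is included (e.g. on the compiler command line):

// Sketch only: force single-threaded (non-atomic, non-mutex) reference
// counting for boost::shared_ptr. Only safe if no shared_ptr instance is
// ever touched from more than one thread.
#define BOOST_SP_DISABLE_THREADS
#include <boost/shared_ptr.hpp>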

Why not change your data structure to not use shared_ptr altogether? Manage the raw pointers yourself? Maybe use scoped_ptr instead?
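
For illustration, a rough sketch of the raw-pointer layout that suggestion implies; the BaseNode structure and member names here are assumptions, not your actual class:

// Sketch, not the poster's class: each node owns its first child and its
// next sibling through raw pointers, so deleting the root frees the tree.
struct BaseNode
{
    BaseNode *childPtr;   // first child (owned)
    BaseNode *nextPtr;    // next sibling (owned)
    BaseNode *parentPtr;  // back pointer (not owned)

    BaseNode() : childPtr(0), nextPtr(0), parentPtr(0) {}
    ~BaseNode() { delete childPtr; delete nextPtr; }

    bool hasChild() const { return childPtr != 0; }
    bool hasNext()  const { return nextPtr  != 0; }
    BaseNode* child()  const { return childPtr; }
    BaseNode* next()   const { return nextPtr; }
    BaseNode* parent() const { return parentPtr; }
};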

1800 INFORMATION
that is slower believe it or not. Passing the current variable to the next recursive call of processTree is slow. This takes about twice the time.
Ron
Have you measured it?
Johannes Schaub - litb
I made a simple modification that should improve the speed. Since you said the iterators are shared pointers, passing by reference will be faster, since we don't then have to copy and maintain the reference count on the actual parameter to the processTree function.
1800 INFORMATION
Yes I did measure that. I was comparing the shared_ptr implementation to a raw pointer implementation when I found it was 10x as slow with my recursive version of the tree walker. So I started my hunt for a faster piece of code.
Ron
This is very good. I can't believe I didn't think of this. This code only takes 4009ms. Of course it is recursive, but darn this is great. I am leaning towards accepting this answer even though it's recursive!
Ron
Yeah my application has threads, so I do need the mutex and I like not having to worry so much about memory management of the raw pointers. Thanks for your answer!
Ron
A: 

Create a "nextvisit" function and keep calling that, to simplify the code; in addition, use const references instead of value semantics for the shared pointers... this may save you valuable shared_ptr copies:

// define the order of visitation in here
// define the order of visitation in here
BaseNodePtr next( const BaseNodePtr& p ) {
    if( p->hasChild() ) return p->child();
    if( p->hasNext() ) return p->next();
    // climb until an ancestor has an unvisited 'next', then go there
    BaseNodePtr ancestor = p->parent();
    while( ancestor && !ancestor->hasNext() ) ancestor = ancestor->parent();
    return ancestor ? ancestor->next() : ancestor; // null when done
}

void processTree( BaseNodePtr p, unsigned int& cnt ) {
   while( p ) {
     ++cnt;
     p = next(p);
   }
}

But for readability, clarity, maintainability, ... for god's sake, use recursion. Unless your stack isn't big enough.

xtofl
that gets stuck in an infinite loop. :) child to parent, child to parent, child to parent, etc.
Ron
yes. It should return the parent's next... Sorry - fixed
xtofl
+1  A: 

I HATE when answers dismiss the question with a "don't do that" but here I go...

Say there was a way to remove the down bool... will that really make any REAL difference in execution time? We're talking about a small handful of CPU operations and a few extra bytes on the stack.

Focus on making the child() and parent() calls faster if you need speed. Otherwise you're wasting your time (IMHO).

EDIT: Maybe walk the tree (w/ this "slow" code) ONCE and build an array of pointers into the tree in the desired order. Use this "index" later.
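
A rough sketch of that idea, assuming the tree is not modified between the 200,000 passes; buildIndex, processFlat and g_order are made-up names, and BaseNode stands for whatever type BaseNodePtr points to:

#include <cstddef>
#include <vector>

// Sketch: flatten the tree into raw pointers once, in visit order, then
// reuse the flat list for every later pass. Raw pointers avoid the
// shared_ptr reference-count traffic in the hot loop.
std::vector<BaseNode*> g_order;

void buildIndex (const BaseNodePtr &node)
{
    g_order.push_back(node.get());
    if (node->hasChild()) buildIndex(node->child());
    if (node->hasNext())  buildIndex(node->next());
}

void processFlat (unsigned int & cnt)
{
    for (std::size_t i = 0; i < g_order.size(); ++i)
        cnt++;   // the real per-node work would use g_order[i]
}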

What I'm saying is I think you're approaching optimization from the wrong angle.

Aardvark
The cost in the child(), next() and parent() calls comes from the boost::shared_ptr implementation. I need that functionality, so I was trying to find a way around this.
Ron
sorry for the multiple edits...
Aardvark
+1  A: 

Here is how to have only one recursion call instead of two:

void processTree (BaseNodePtr current, unsigned int & cnt )
{
  for (bool gotNext = true; gotNext; ) {
    cnt++;
    if (current->hasChild())
      processTree(current->child(), cnt);
    gotNext = current->hasNext();
    if (gotNext)
      current = current->next();
  }
}
David Lehavi
+2  A: 

For the ultimate speed up what you need to do is order the nodes in memory so that they are stored in a contiguous block in the order that you visit them.

e.g If you have a tree defined as follows.

        1
       / \
      2   3
     / \  /\
    4   5 6 7
   /\    /  /\
  8  9  10 11 12
 / \           \
13 14          15

Then the visit function as described will visit the nodes in the following order

1
 2
  4
   8
    13
    14
   9
  5
 3
  6
   10
  7
   11
   12
    15

Now if you order the nodes in memory as a contiguous block of 15 allocations and store them in the order demonstrated above, then each visit generally moves to a node with "spatial locality". This can improve your cache hits, depending upon the size of your node structure, and thus make things run faster.
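
A rough sketch of one way to get that ordering, assuming you can afford to rebuild the tree once; FlatNode and flatten are made-up names and the node payload is elided:

#include <vector>

// Sketch: copy the tree into one contiguous vector in visit order, using
// indices instead of pointers, so a pre-order walk reads memory sequentially.
struct FlatNode
{
    int childIdx;   // index of first child in the vector, or -1
    int nextIdx;    // index of next sibling in the vector, or -1
    // ... payload copied from the original node ...
};

void flatten (const BaseNodePtr &node, std::vector<FlatNode> &out)
{
    const int me = static_cast<int>(out.size());
    out.push_back(FlatNode());

    out[me].childIdx = node->hasChild() ? static_cast<int>(out.size()) : -1;
    if (node->hasChild()) flatten(node->child(), out);

    out[me].nextIdx = node->hasNext() ? static_cast<int>(out.size()) : -1;
    if (node->hasNext()) flatten(node->next(), out);
}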

Here is a quick iterative method of visiting all the nodes in a tree only once and with no recursion:

const unsigned int MAX_STACK_DEPTH = 256; // must be at least the maximum tree depth + 1

unsigned int g_StackDepth = 0;
BaseNodePtr g_Stack[MAX_STACK_DEPTH];

void processTree (BaseNodePtr root, unsigned int & cnt )
{
    g_Stack[g_StackDepth++] = root;
    while( g_StackDepth > 0 )
    {
        BaseNodePtr curr = g_Stack[--g_StackDepth];
        cnt++;

        // push the sibling first so the child subtree is processed before it
        if ( curr->hasNext() )
        {
            g_Stack[g_StackDepth++] = curr->next();
        }

        if ( curr->hasChild() )
        {
            g_Stack[g_StackDepth++] = curr->child();
        }
    }
}

Combined with the above ordering you should get just about the best speed you CAN get, to my knowledge.

Obviously this has limitations, as you have to know how big your stack can possibly grow in advance. You could get around this by using a std::vector instead; using a std::vector, however, would eliminate all the advantages the iterative method above provides.
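
For reference, a minimal sketch of that std::vector variant (processTreeVec is a hypothetical name); the pop/push order is the same, so the visit order is unchanged, but every push and pop copies a shared_ptr, which is part of the cost being alluded to:

#include <vector>

void processTreeVec (BaseNodePtr root, unsigned int & cnt )
{
    std::vector<BaseNodePtr> stack;   // grows on demand instead of MAX_STACK_DEPTH
    stack.push_back(root);
    while ( !stack.empty() )
    {
        BaseNodePtr curr = stack.back();
        stack.pop_back();
        cnt++;

        if ( curr->hasNext() )
            stack.push_back(curr->next());
        if ( curr->hasChild() )
            stack.push_back(curr->child());
    }
}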

Hope that is some help :)

Goz