views: 464
answers: 5

As a follow-up to my original question about a small piece of this code, I decided to ask again to see if you can do better than what we came up with so far.

The code below iterates over a binary tree (left/right = child/next). I do believe there is room for one less conditional in here (the down boolean). The fastest answer wins!

  1. The cnt statement can be multiple statements, so let's make sure it appears only once.
  2. The child() and next() member functions are about 30x as slow as the hasChild() and hasNext() operations.
  3. Keep it iterative <-- dropped this requirement, as the recursive solution presented was faster.
  4. This is C++ code.
  5. The visit order of the nodes must stay as it is in the example below (hit parents first, then the children, then the 'next' nodes).
  6. BaseNodePtr is a boost::shared_ptr and thus assignments are slow; avoid any temporary BaseNodePtr variables.

Currently this code takes 5897ms to visit 62200000 nodes in a test tree, calling this function 200,000 times.

void processTree (BaseNodePtr current, unsigned int & cnt )
{
    bool down = true;

    while ( true )
    {
        if ( down )
        {
            while ( true )
            {
                cnt++; // this can/will be multiple statements

                if (!current->hasChild()) break;
                current = current->child();
            }
        }

        if ( current->hasNext() )
        {
            down = true;
            current = current->next();
        }
        else
        {
            down = false;
            current = current->parent();
            if (!current)
                return; // done.
        }
    }
}
+5  A: 

Why not a recursive solution?

void processTree (const BaseNodePtr &current, unsigned int & cnt )
{
  cnt++;

  if (current->hasChild())
    processTree(current->child(), cnt);
  if (current->hasNext())
    processTree(current->next(), cnt);
}

Since shared_ptr seems to be your bottleneck, why not improve it? Are you using threads? If not, then undefine the symbol BOOST_HAS_THREADS. The shared_ptr reference count is guarded by a mutex, which is probably the cause of the slow performance.
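
If you go that route, here is a minimal sketch of the configuration; it uses the smart_ptr-specific BOOST_SP_DISABLE_THREADS macro as an alternative to touching BOOST_HAS_THREADS, and it must be defined before any Boost header is included (e.g. on the compiler command line):

// Sketch only: force single-threaded (non-atomic, non-mutex) reference
// counting for boost::shared_ptr. Only safe if no shared_ptr instance is
// ever touched from more than one thread.
#define BOOST_SP_DISABLE_THREADS
#include <boost/shared_ptr.hpp>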

Why not change your data structure to not use shared_ptr altogether? Manage the raw pointers yourself? Maybe use scoped_ptr instead?
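
For illustration, a rough sketch of the raw-pointer layout that suggestion implies; the BaseNode structure and member names here are assumptions, not your actual class:

// Sketch, not the poster's class: each node owns its first child and its
// next sibling through raw pointers, so deleting the root frees the tree.
struct BaseNode
{
    BaseNode *childPtr;   // first child (owned)
    BaseNode *nextPtr;    // next sibling (owned)
    BaseNode *parentPtr;  // back pointer (not owned)

    BaseNode() : childPtr(0), nextPtr(0), parentPtr(0) {}
    ~BaseNode() { delete childPtr; delete nextPtr; }

    bool hasChild() const { return childPtr != 0; }
    bool hasNext()  const { return nextPtr  != 0; }
    BaseNode* child()  const { return childPtr; }
    BaseNode* next()   const { return nextPtr; }
    BaseNode* parent() const { return parentPtr; }
};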

1800 INFORMATION
that is slower believe it or not. Passing the current variable to the next recursive call of processTree is slow. This takes about twice the time.
Ron
Have you measured it?
Johannes Schaub - litb
I made a simple modification that should improve the speed. Since you said the iterators are shared pointers, passing by reference will be faster, since we don't then have to copy and maintain the reference count on the actual parameter to the processTree function.
1800 INFORMATION
Yes I did measure that. I was comparing the shared_ptr implementation to a raw pointer implementation when I found it was 10x as slow with my recursive version of the tree walker. So I started my hunt for a faster piece of code.
Ron
This is very good. I can't believe I didn't think of this. This code only takes 4009ms. Of course it is recursive, but darn this is great. I am leaning towards accepting this answer even though it's recursive!
Ron
Yeah my application has threads, so I do need the mutex and I like not having to worry so much about memory management of the raw pointers. Thanks for your answer!
Ron
A: 

Create a "nextvisit" function and keep calling that, to simplify the code; in addition, use const references instead of value semantics for the shared pointers... this may save you valuable shared_ptr copies:

// define the order of visitation in here
// define the order of visitation in here
BaseNodePtr next( const BaseNodePtr& p ) {
    if( p->hasChild() ) return p->child();
    if( p->hasNext() ) return p->next();
    // climb until an ancestor has an unvisited 'next', then go there
    BaseNodePtr ancestor = p->parent();
    while( ancestor && !ancestor->hasNext() ) ancestor = ancestor->parent();
    return ancestor ? ancestor->next() : ancestor; // null when done
}

void processTree( BaseNodePtr p, unsigned int& cnt ) {
   while( p ) {
     ++cnt;
     p = next(p);
   }
}

But for readability, clarity, maintainability, ... for god's sake, use recursion. Unless your stack isn't big enough.

xtofl
that gets stuck in an infinite loop. :) child to parent, child to parent, child to parent, etc.
Ron
yes. It should return the parent's next... Sorry - fixed
xtofl
+1  A: 

I HATE when answers dismiss the question with a "don't do that" but here I go...

Say there was a way to remove the down bool... will that really make any REAL difference in execution time? We're talking about a small handful of CPU operations and a few extra bytes on the stack.

Focus on making the child() and parent() calls faster if you need speed. Otherwise you're wasting your time (IMHO).

EDIT: Maybe walk the tree (w/ this "slow" code) ONCE and build an array of pointers into the tree in the desired order. Use this "index" later.
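
A rough sketch of that idea, assuming the tree is not modified between the 200,000 passes; buildIndex, processFlat and g_order are made-up names, and BaseNode stands for whatever type BaseNodePtr points to:

#include <cstddef>
#include <vector>

// Sketch: flatten the tree into raw pointers once, in visit order, then
// reuse the flat list for every later pass. Raw pointers avoid the
// shared_ptr reference-count traffic in the hot loop.
std::vector<BaseNode*> g_order;

void buildIndex (const BaseNodePtr &node)
{
    g_order.push_back(node.get());
    if (node->hasChild()) buildIndex(node->child());
    if (node->hasNext())  buildIndex(node->next());
}

void processFlat (unsigned int & cnt)
{
    for (std::size_t i = 0; i < g_order.size(); ++i)
        cnt++;   // the real per-node work would use g_order[i]
}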

What I'm saying is I think you're approaching optimization from the wrong angle.

Aardvark
The cost in the child(), next() and parent() calls comes from the boost::shared_ptr implementation. I need that functionality, so I was trying to find a way around this.
Ron
sorry for the multiple edits...
Aardvark
+1  A: 

Here is how to have only one recursion call instead of two:

void processTree (BaseNodePtr current, unsigned int & cnt )
{
  for (bool gotNext = true; gotNext; ) {
    cnt++;
    if (current->hasChild())
      processTree(current->child(), cnt);
    gotNext = current->hasNext();
    if (gotNext)
      current = current->next();
  }
}
David Lehavi
+2  A: 

For the ultimate speed up what you need to do is order the nodes in memory so that they are stored in a contiguous block in the order that you visit them.

e.g If you have a tree defined as follows.

        1
       / \
      2   3
     / \  /\
    4   5 6 7
   /\    /  /\
  8  9  10 11 12
 / \           \
13 14          15

Then the visit function as described will visit the nodes in the following order

1
 2
  4
   8
    13
    14
   9
  5
 3
  6
   10
  7
   11
   12
    15

Now if you order the nodes in memory as a contiguous block of 15 allocations and store them in the order demonstrated above, then each visit generally moves to a node with "spatial locality". This can improve your cache hits, depending upon the size of your node structure, and thus make things run faster.
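
A rough sketch of one way to get that ordering, assuming you can afford to rebuild the tree once; FlatNode and flatten are made-up names and the node payload is elided:

#include <vector>

// Sketch: copy the tree into one contiguous vector in visit order, using
// indices instead of pointers, so a pre-order walk reads memory sequentially.
struct FlatNode
{
    int childIdx;   // index of first child in the vector, or -1
    int nextIdx;    // index of next sibling in the vector, or -1
    // ... payload copied from the original node ...
};

void flatten (const BaseNodePtr &node, std::vector<FlatNode> &out)
{
    const int me = static_cast<int>(out.size());
    out.push_back(FlatNode());

    out[me].childIdx = node->hasChild() ? static_cast<int>(out.size()) : -1;
    if (node->hasChild()) flatten(node->child(), out);

    out[me].nextIdx = node->hasNext() ? static_cast<int>(out.size()) : -1;
    if (node->hasNext()) flatten(node->next(), out);
}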

Here is a quick iterative method of visiting all the nodes in a tree only once and with no recursion:

const unsigned int MAX_STACK_DEPTH = 256; // must be at least the maximum tree depth + 1

unsigned int g_StackDepth = 0;
BaseNodePtr g_Stack[MAX_STACK_DEPTH];

void processTree (BaseNodePtr root, unsigned int & cnt )
{
    g_Stack[g_StackDepth++] = root;
    while( g_StackDepth > 0 )
    {
        BaseNodePtr curr = g_Stack[--g_StackDepth];
        cnt++;

        // push the sibling first so the child subtree is processed before it
        if ( curr->hasNext() )
        {
            g_Stack[g_StackDepth++] = curr->next();
        }

        if ( curr->hasChild() )
        {
            g_Stack[g_StackDepth++] = curr->child();
        }
    }
}

Combined with the above ordering you should get just about the best speed you CAN get, to my knowledge.

Obviously this has limitations, as you have to know how big your stack can possibly grow in advance. You could get around this by using a std::vector instead; using a std::vector, however, would eliminate all the advantages the iterative method above provides.
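
For reference, a minimal sketch of that std::vector variant (processTreeVec is a hypothetical name); the pop/push order is the same, so the visit order is unchanged, but every push and pop copies a shared_ptr, which is part of the cost being alluded to:

#include <vector>

void processTreeVec (BaseNodePtr root, unsigned int & cnt )
{
    std::vector<BaseNodePtr> stack;   // grows on demand instead of MAX_STACK_DEPTH
    stack.push_back(root);
    while ( !stack.empty() )
    {
        BaseNodePtr curr = stack.back();
        stack.pop_back();
        cnt++;

        if ( curr->hasNext() )
            stack.push_back(curr->next());
        if ( curr->hasChild() )
            stack.push_back(curr->child());
    }
}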

Hope that is some help :)

Goz