views: 313
answers: 9
I have this sort of C function, which is being called a zillion times:

void foo ()
{
    if (/*condition*/)
    {

    }
    else if (/*another_condition*/)
    {

    }
    else if (/*another_condition_2*/)
    {

    }
    /* And so on; I have 4 of them, but we can generalize it */
    else
    {

    }
}

I have a good test-case that calls this function, causing certain if-branches to be called more than the others.

My goal is to figure out the best way to arrange the if statements to minimize the branching.

The only way I can think of is to write to a file for every if branch taken, thereby creating a histogram. This seems tedious. Is there a better way, or better tools?

I am building on AS3 Linux with gcc 3.4, using oprofile (opcontrol) for profiling.

+4  A: 

Try a profiler (gprof?) - it will tell you how much time is spent. I don't recall if gprof counts branches, but if not, just call a separate empty method in each branch.

DVK
A separate empty method in each branch: how can I tell the compiler not to inline the function if I am building an optimized binary?
vehomzzz
All you are trying to do is see which branch is executed most often. Build it unoptimized and get the data.
tster
@Andrei: or specify the *noinline* function attribute.
Michael Foukarakis
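Putting this answer and the noinline comment together, a minimal sketch of the marker idea (the hit_* names and the parameters of foo() are hypothetical):

__attribute__((noinline)) void hit_branch_a(void) { __asm__ volatile(""); }
__attribute__((noinline)) void hit_branch_b(void) { __asm__ volatile(""); }
__attribute__((noinline)) void hit_branch_other(void) { __asm__ volatile(""); }

/* noinline keeps the calls out of line so the profiler can count them;
   the empty asm discourages GCC from discarding the do-nothing calls. */
void foo (int condition, int another_condition)
{
    if (condition)
    {
        hit_branch_a();
        /* ... */
    }
    else if (another_condition)
    {
        hit_branch_b();
        /* ... */
    }
    else
    {
        hit_branch_other();
        /* ... */
    }
}

The call counts that gprof reports for the hit_* functions are then exactly the histogram you are after.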
A: 

Wrap the code in each branch into a function and use a profiler to see how many times each function is called.

Dima
A: 

Line-by-line profiling will give you an idea of which branches are taken more often.

Something like LLVM could perform this reordering automatically.
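For reference, GCC itself can do this reordering via profile-guided optimization; the flags below exist in GCC 3.4 and later (earlier versions use -fprofile-arcs and -fbranch-probabilities instead):

gcc -O2 -fprofile-generate foo.c -o foo    # instrumented build
./foo                                      # run the representative test case
gcc -O2 -fprofile-use foo.c -o foo         # rebuild; branch layout follows the profile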

Georg
+13  A: 

It's not portable, but many versions of GCC support a function called __builtin_expect() that can be used to tell the compiler what we expect a value to be:

if(__builtin_expect(condition, 0)) {
  // We expect condition to be false (0), so we're less likely to get here
} else {
  // We expect to get here more often, so GCC produces better code
}

The Linux kernel wraps these in macros to make them more intuitive, cleaner, and more portable (that is, the macros fall back to no-ops on non-GCC compilers):

#ifdef __GNUC__
   /* !!(x) normalizes any truthy value to exactly 1, so the hint matches */
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define likely(x)   (x)
#  define unlikely(x) (x)
#endif

With this, we can rewrite the above:

if(unlikely(condition)) {
  // we're less likely to get here
} else {
  // we expect to get here more often
}

Of course, this is probably unnecessary unless you're aiming for raw speed and/or you've profiled and found that this is a problem.

Chris Lutz
Nice explanation on the functions +1. I want to echo your "this is probably unnecessary", though. For any branch that occurs very often (and if the branch doesn't occur often you probably don't care about the performance hit of a misbranch) the processor's branch predictor is usually already doing a good enough job without these hints.
Falaina
For what it's worth, in something like the Linux kernel, the performance impact of a branch is something that needs to be thought about. For most projects, however, you are right: branches aren't going to be the bottleneck.
Chris Lutz
+1  A: 

Some counters may help. Once you see the counters and there are large differences, you can sort the conditions in decreasing order.

static int cond_1, cond_2, cond_3, cond_4;

void foo (){
    if (/*condition*/){
      cond_1++;
      ...
    }
    else if (/*another_condition*/){
      cond_2++;
      ...
    }
    else if (/*another_condition_2*/){
      cond_3++;
      ...
    }
    else{
      cond_4++;
      ...
    }
}

EDIT: a "destructor" can print the counters at the end of a test run:

#include <stdio.h>

void cond_print(void) __attribute__((destructor));

void cond_print(void){
  printf( "cond_1: %6i\n", cond_1 );
  printf( "cond_2: %6i\n", cond_2 );
  printf( "cond_3: %6i\n", cond_3 );
  printf( "cond_4: %6i\n", cond_4 );
}

I think it is enough to modify only the file that contains the foo() function.

sambowry
Sorting them in order doesn't necessarily guarantee that the compiler will output them in a given order, though I suppose it can't hurt. See my answer for an arguably better way to order them.
Chris Lutz
+1 well, you'd probably want to make cond_N a static?
vehomzzz
I actually used this technique....
vehomzzz
+3  A: 

Running your program under Callgrind will give you branch information. Also, I hope you have profiled and actually determined that this piece of code is problematic, as this seems like a micro-optimization at best. If it is able to, the compiler will generate a branch table from the if/else-if chain, which would require no branching (this depends on what the conditionals are, obviously), and even failing that, the branch predictor on your processor (assuming this is not for embedded work; if it is, feel free to ignore me) is pretty good at determining the target of branches.
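A typical invocation looks something like this (the binary name is a placeholder; --branch-sim=yes turns on Callgrind's branch-prediction simulation):

valgrind --tool=callgrind --branch-sim=yes ./foo
callgrind_annotate callgrind.out.<pid>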

Falaina
+2  A: 

It doesn't actually matter what order you rearrange them into, IMO. The branch predictor will store the most common branch and take it automatically anyway.

That said, there are some things you could try... You could maintain a set of job queues and then, based on the if statements, assign jobs to the correct queue before executing them one after another at the end.

This could be optimised further by using conditional moves and so forth (this does require assembler, though, AFAIK). The idea is to conditionally move a 1 into a register that is initialised to 0, based on condition a: place the pointer value in the slot just past the end of the queue, then advance the queue counter, or not, by adding that conditional 1 or 0 to it, as sketched below.
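A rough C rendition of that idea (all names here are hypothetical, and whether the compiler actually emits a conditional move is not guaranteed):

#include <stddef.h>

/* Append item to the queue only when cond is nonzero, without branching.
   The queue must always have one spare slot, since the write is unconditional. */
void enqueue_if(void *queue[], size_t *count, void *item, int cond)
{
    size_t take = (size_t)(cond != 0);  /* 1 when the condition holds, else 0 */
    queue[*count] = item;               /* unconditionally write the next slot */
    *count += take;                     /* advance only when cond was true */
}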

Suddenly you have eliminated all branches and it becomes immaterial how many branch mispredictions there are. Of course, as with any of these things, you are best off profiling because, though it seems like it would provide a win ... it may not.

Goz
+1 for a thoughtful answer, but frankly, doesn't it depend on the cycles spent in each condition, and the cycles spent in the branch that is finally chosen? Those would have to be awfully skinny before branch prediction would make a noticeable difference.
Mike Dunlavey
+2  A: 

We use a mechanism like this:

// pseudocode
class ProfileNode
{
public:
   inline ProfileNode( const char * name ) : m_name(name)
   {  }
   inline ~ProfileNode()
   {
      s_ProfileDict.Find(m_name).Value() += 1; // as if Value() returns a nonconst ref
   }

   static DictionaryOfNodesByName_t  s_ProfileDict;
   const char * m_name;
};

And then in your code

void foo ()
{
    if (/*condition*/)
    {
       ProfileNode node("Condition A"); // named, so it lives to the end of the block
       // ...
    }
    else if (/*another_condition*/)
    {
       ProfileNode node("Condition B");
       // ...
    } // etc..
    else
    {
       ProfileNode node("Condition C");
       // ...
    }
}

void dumpinfo()
{
  ProfileNode::s_ProfileDict.PrintEverything();
}

And you can see how it's easy to put a stopwatch timer in those nodes too and see which branches are consuming the most time.

Crashworks
Of course there is some overhead, and if you call your introduced function a million times, it can change the timing you see...
Quonux
Yes, in practice we use a constant integer symbol for the node identifiers rather than strings, to make the dictionary lookup actually be an O(1) array index.
Crashworks
A: 

As a profiling technique, what I rely on is stack sampling: pausing the program at random moments and examining the call stack.

What you want to know is: Is the time spent in evaluating those conditions a significant fraction of execution time?

The samples will tell you that, and if not, it just doesn't matter.

If it does matter (for example, if the conditions include function calls that are on the stack a significant fraction of the time), what you want to avoid is spending much time in comparisons that turn out false. The way you tell is this: if you often see a comparison function being called from, say, the first or second if statement, catch it in such a sample and step out of it to see whether it returns false or true. If it typically returns false, that condition should probably move farther down the list.
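Taking such a sample by hand can look like this under gdb (the PID is a placeholder):

gdb -p <pid>        # attach; the program stops at a random point
(gdb) bt            # is a comparison function from foo() on the stack?
(gdb) finish        # step out of it; gdb prints the value it returned
(gdb) continue      # resume, then repeat a handful of times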

Mike Dunlavey