I have a method like this:

bool MyFunction(int& i)
{
  switch(m_step)
  {
    case 1:
       if (AComplexCondition)
       {
         i = m_i;
         return true;
       }

    case 2:
      // some code

    case 3:
      // some code
  }
}

Since there are lots of case statements (more than 3) and the function is becoming large, I tried to extract the code in case 1 and put it in an inline function like this:

inline bool funct(int& i)
{
  if (AComplexCondition)
  {
    i = m_i;
    return true;
  }
  return false;
}
bool MyFunction(int& i)
{
  switch(m_step)
  {
    case 1:
       if (funct(i))
       {
         return true;
       }

    case 2:
      // some code

    case 3:
      // some code
  }
}

It seems this code is significantly slower than the original. I checked with -Winline and the function is inlined. Why is this code slower? I thought it would be equivalent. The only difference I see is there is one more conditional check in the second version, but I thought the compiler should be able to optimize it away. Right?

Edit: Some people suggested that I use gdb to step over every assembly instruction in both versions to see the differences. I did this.

The first version looks like this:

mov
callq (Call to AComplexCondition())
test
je (doesn't jump)
mov (i = m_i)
movl (m_step = 1)

The second version, which is a bit slower, seems simpler:

movl (m_step = 1)
callq (Call to AComplexCondition())
test
je (doesn't jump)
mov (i = m_i)
xchg %ax,%ax (This is a nop I think)

These two versions seem to do the same thing, so I still don't know why the second version is slower.

+2  A: 

This is very hard to track down. One problem could be code bloat pushing the majority of the loop out of the (small) CPU instruction cache. But that doesn't entirely make sense either, now that I think of it.

What I suggest doing:

Isolate the code and condition as much as possible while still being able to observe the slowdown.

Then, go profile it. Does the profiling make sense? Now (assuming you're up for the adventure), disassemble the code and look at what g++ is doing differently. Report those results back here.
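One way to isolate the two variants might look like the sketch below. It is only an illustration: `Stepper`, `complexCondition`, `direct`, `extracted`, `timeNs`, and `checkSameResult` are made-up names, the condition is a trivial placeholder for the real `AComplexCondition`, and the fall-through into the later cases is left out.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the class in the question; only the members
// that the snippet mentions are modelled.
struct Stepper {
    int m_step = 1;
    int m_i = 42;

    // Placeholder for AComplexCondition; the real condition is not shown.
    bool complexCondition() const { return (m_i % 2) == 0; }

    // Version 1: the condition is tested directly inside the switch.
    bool direct(int& i) {
        switch (m_step) {
            case 1:
                if (complexCondition()) {
                    i = m_i;
                    return true;
                }
                break;
            default:
                break;
        }
        return false;
    }

    // Helper extracted from case 1 (defined in-class, so implicitly inline).
    bool funct(int& i) {
        if (complexCondition()) {
            i = m_i;
            return true;
        }
        return false;
    }

    // Version 2: the switch delegates to the extracted helper.
    bool extracted(int& i) {
        switch (m_step) {
            case 1:
                if (funct(i)) {
                    return true;
                }
                break;
            default:
                break;
        }
        return false;
    }
};

// Runs `body` the given number of times and returns the elapsed nanoseconds.
template <typename F>
std::int64_t timeNs(F&& body, int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int n = 0; n < iterations; ++n) {
        body();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Confirms that both versions agree before any timing is trusted.
inline void checkSameResult(Stepper& s) {
    int a = 0;
    int b = 0;
    const bool ra = s.direct(a);
    const bool rb = s.extracted(b);
    assert(ra == rb && a == b);
    (void)ra;  // silence unused warnings when NDEBUG disables assert
    (void)rb;
}
```

Each variant can then be timed with something like `timeNs([&] { s.direct(out); }, 10000000)`. With optimizations on, the compiler may remove a loop whose result is unused, so accumulating the results into a `volatile` variable is probably needed for the numbers to mean anything.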

Earlz
+1  A: 

GMan is correct: inline doesn't guarantee that your function will be inlined. It is a hint to the compiler that inlining might be a good idea. If the compiler decides against inlining the function, you pay the overhead of a function call, which at the very least means two jumps being executed (one into the function and one back). The function's instructions are stored in a non-sequential location rather than right where the function was invoked, so execution moves to that new location, completes the function, and moves back to the point after the call.
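To rule the call overhead in or out, GCC's `always_inline` and `noinline` function attributes can force either outcome so that the two builds can be compared. The sketch below is a minimal illustration under assumptions: the macro names `FORCE_INLINE` and `NO_INLINE`, the function names, and the even-number condition are all hypothetical, and on other compilers the macros fall back to plain `inline` or nothing.

```cpp
// GCC (and Clang) understand these attributes; elsewhere fall back to
// the ordinary hint, which the compiler is free to ignore.
#if defined(__GNUC__)
#define FORCE_INLINE inline __attribute__((always_inline))
#define NO_INLINE __attribute__((noinline))
#else
#define FORCE_INLINE inline
#define NO_INLINE
#endif

// Placeholder for the complex condition from the question.
inline bool placeholderCondition(int value) { return (value % 2) == 0; }

// Variant the compiler is required to inline (on GCC/Clang).
FORCE_INLINE bool functInlined(int source, int& i) {
    if (placeholderCondition(source)) {
        i = source;
        return true;
    }
    return false;
}

// Variant that is always emitted as a real call (on GCC/Clang).
NO_INLINE bool functCalled(int source, int& i) {
    if (placeholderCondition(source)) {
        i = source;
        return true;
    }
    return false;
}
```

If timing the forced-inline variant against the never-inlined one shows no meaningful gap, the call overhead is unlikely to explain the slowdown.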

iaimtomisbehave
Look at my post. I verified and confirmed that the function is inlined by gcc.
Mathieu Pagé
Indeed, I was merely making sure he knew that. I'm just as confused as Mathieu why the performance would change; the compiler should be able to produce the same code.
GMan
Didn't see the comments when I posted that, sorry about that.
iaimtomisbehave
No problem, thanks anyway.
Mathieu Pagé
+1  A: 

Without seeing the ComplexCondition part, it's hard to say. If that condition is sufficiently complex, the compiler won't be able to pipeline it properly and it will interfere with the branch prediction in the chip. Just a possibility.

Joel
Yes, the condition is complex and implies a call to other functions (in some cases). However, ComplexCondition is the same in both versions of the code, so I thought it would have the same impact. Am I right?
Mathieu Pagé
Depends which compiler you're using. GCC does its instruction scheduling before inlining, while Visual Studio does it after, IIRC.
Joel
+1  A: 

Does the assembler tell you anything about what's happening? It might be easier to look at the disassembly than to have us guess, although I go along with iaimtomisbehave's jmp idea generally.

corprew
+3  A: 

Just step through it. Plant a breakpoint, go into the disassembly view, and start stepping.

All mysteries will vanish.

Mike Dunlavey
Good suggestion. I did that (see my edit of the OP), but it did not help, probably because I did not understand what I saw very well. Or maybe I did not use gdb correctly.
Mathieu Pagé
A: 

This is a good question. Let us know what you find. I do have a few thoughts mostly stemming from the compiler no longer being able to break up the code you have inlined, but no guaranteed answer.

  1. statement order. It makes sense that the compiler would put this statement with its complex code last. That means the other cases would be evaluated first and it would never get checked unless necessary. If you simplify the statement it might not do this, meaning your crazy conditional gets fully evaluated every time.

  2. creating extra cases. It should be possible to pull some of the conditionals out of the if statement and make an extra case statement in some circumstances. That could eliminate some checking.

  3. pipelining defeated. Even if it inlines, it won't be able to break up the code inside the actual inlined function. This is the basic issue with all three of these points, but with pipelining it obviously causes problems, since for pipelining you want to start executing before you get to the check itself.

Charles Eli Cheese