views:

835

answers:

14

I need a program to get the smaller of two numbers, and I'm wondering if using a standard "if x is less than y"

int a, b, low;
if (a < b) low = a;
else low = b;

is more or less efficient than this:

int a, b, low;
low = b + ((a - b) & ((a - b) >> 31));

(or the variation of putting `int delta = a - b` at the top and replacing instances of `a - b` with that).

I'm just wondering which one of these would be more efficient (or if the difference is too minuscule to be relevant), and the efficiency of if-else statements versus alternatives in general.

+5  A: 

Either way, the assembly will only be a few instructions and either way it'll take picoseconds for those instructions to execute.

I would profile the application and concentrate your optimization efforts on something more worthwhile.

Also, the time saved by this type of optimization will not be worth the time wasted by anyone trying to maintain it.

For simple statements like this, I find the ternary operator very intuitive:

low = (a < b) ? a : b;

Clear and concise.

Ben S
x86 can map a comparison result to 0/1 without a jump.
Mark Ransom
Where is the conditional jump in `low = b + ((a - b) & ((a - b) >> 31))`?
Graphics Noob
I must be missing something, why will there be a conditional jump in his second example?
Torlack
I read it as a logical and for some reason, disregard my conditional comment, editing...
Ben S
I made the same mistake too when I first read the message.
Torlack
Nanoseconds, not picoseconds. Most processors operate at only the GHz clock range.
Crashworks
+1  A: 

Don't worry about your computer, it will execute them both far faster than you should care for. Sometimes the compiler will generate identical code for both alternatives. The question you should really spend your time pondering is, 3 months from now, how fast can you read and understand the two snippets? Or, right now, how quickly can your (human) friend understand what you're doing in either of the two?

wilhelmtell
+20  A: 

(Disclaimer: the following deals with very low-level optimizations that are most often not necessary. If you keep reading, you waive your right to complain that computers are fast and there is never any reason to worry about this sort of thing.)

One advantage of eliminating an if statement is that you avoid branch prediction penalties.

Branch prediction penalties are generally only a problem when the branch is not easily predicted. A branch is easily predicted when it is almost always taken/not taken, or it follows a simple pattern. For example, the branch in a loop statement is taken every time except the last one, so it is easily predicted. However, if you have code like

a = random() % 10
if (a < 5)
  print "Less"
else
  print "Greater"

then this branch is not easily predicted, and will frequently incur the prediction penalty associated with flushing the pipeline and rolling back instructions that were executed down the wrong side of the branch.

One way to avoid these kinds of penalties is to use the ternary (?:) operator. In simple cases, the compiler will generate conditional move instructions rather than branches.

So

int a, b, low;
if (a < b) low = a;
else low = b;

becomes

int a, b, low;
low = (a < b) ? a : b;

and in the second case a branching instruction is not necessary. Additionally, it is much clearer and more readable than your bit-twiddling implementation.

Of course, this is a micro-optimization which is unlikely to have significant impact on your code.

danben
Finally, an answer that isn't bleating about premature optimization. Thank you.
Justicle
@BlueRaja - prove it! Answer the question and show your working.
Justicle
At `-O1` and higher, gcc produces identical code for the if statement and the ternary operator for the min() function, using a cmovg instruction in both cases. At `-O0`, it uses branches and labels for the if statement and cmovle for the ternary operator.
Adam Rosenfield
I agree that this is more readable, but it will certainly not be faster. See my answer.
BlueRaja - Danny Pflughoeft
"and in the second case a branching instruction is not necessary." +1 for bringing the branching overhead into the picture. Otherwise one has to keep in mind that modern CPUs (Intel ones included) have "conditional" operations which remove the need for branching in trivial cases like that.
Dummy00001
+1  A: 

Unless you're really trying to buckle down on efficiency, I don't think this is something you need to worry about.

My simple thought though is that the if would be quicker because it's comparing one thing, while the other code is doing several operations. But again, I imagine that the difference is minuscule.

Jesse J
+11  A: 

Simple answer: One conditional jump is going to be more efficient than two subtractions, one addition, a bitwise and, and a shift operation combined. I've been sufficiently schooled on this point (see the comments) that I'm no longer even confident enough to say that it's usually more efficient.

Pragmatic answer: Either way, you're not paying nearly as much for the extra CPU cycles as you are for the time it takes a programmer to figure out what that second example is doing. Program for readability first, efficiency second.

Bill the Lizard
Your simple answer is wrong for some processors.
nategoose
@nategoose: Which processors?
Bill the Lizard
@Bill: many processors have a long instruction pipeline which must be flushed whenever there's a mispredicted branch, taking perhaps 10 or 20 cycles. In this case, the branch is likely to be mispredicted half the time, so the conditional version might take an average of 5 or 10 cycles, while the squiggly version takes 4 or 5. (Of course, other processors have conditional instructions, short pipelines and other ways to avoid misprediction, and then the conditional version will be faster).
Mike Seymour
And on the processor I mostly use, the first version takes 2 cycles, and the second takes 3.
Mike Seymour
@nategoose, @Mike, @Crashworks: Well, that will teach me to make sweeping generalizations based on benchmarks from one machine. I stand corrected.
Bill the Lizard
@Bill the Lizard: The most extreme example would be most GPUs, I believe. On the other hand if memory serves there are some (but not all) conditional branch instructions on the AVR microcontrollers that take 3 cycles, which would take the same number of cycles as an 8 bit add, bitwise and, and shift by 1.
nategoose
+1  A: 

Why `low = a;` in the if and `low = b;` in the else? And, why 31? If 31 has anything to do with CPU word size, what if the code is to be run on a CPU of a different size?

The if..else way looks more readable. I like programs to be as readable to humans as they are to the compilers.

vpit3833
+2  A: 

One thing I will point out that I haven't noticed mentioned: an optimization like this can easily be overwhelmed by other issues. For example, if you are running this routine on two large arrays of numbers (or worse yet, pairs of numbers scattered in memory), the cost of fetching the values on today's CPUs can easily stall the CPU's execution pipelines.

Torlack
+5  A: 

The biggest problem is that your second example won't work on 64-bit machines.

However, even neglecting that, modern compilers are smart enough to consider branchless alternatives wherever possible and to compare the estimated speeds. So your second example will most likely actually be slower.

There will be no difference between the if statement and using a ternary operator, as even most dumb compilers are smart enough to recognize this special case.

BlueRaja - Danny Pflughoeft
I've looked at the assembly output of MSVC and GCC, and neither of them seem smart enough to emit branchless conditional moves half the time I want them to.
Crashworks
@Crashworks: That means the compiler decided the branchless conditional is actually slower (branchless conditionals require more clocks, but don't have the possibility of clearing the instruction pipeline)
BlueRaja - Danny Pflughoeft
The second example won't even work on all 32 bit machines, since the behaviour of the `>>` operator on negative values is implementation-defined (it's allowable for the compiler to implement it with a logical shift rather than arithmetic shift).
caf
Yes, but the compiler was wrong when it decided that. I've timed both pathways. My job consists of cramming more work into 16.6 milliseconds than the competing product can. In general, I have seen compilers emit many suboptimal code sequences. They are not perfect.
Crashworks
@Crashworks: In that case you should take your most critical portions and rewrite them in highly-optimized assembly
BlueRaja - Danny Pflughoeft
I sometimes do, but it's often easier to meet the compiler halfway and write code in such a way that it results in the code sequence I want; intrinsics in particular are an example of this. That's much easier to intermingle with other C++ code than inline assembly. It's a common practice in the embedded world; part of the job is learning what the compiler will emit for particular inputs.
Crashworks
@Crashworks: the scenario you're describing seems like it's one that's actually worthy of worrying about the optimization. The other 99+% of the cases there's no value in it, so the code should be written to be correct and readable (which may well still involve the conditional operator).
Michael Burr
In practice I wrote an `isel(a,b,c)` function which has the same effect as `return a >= 0 ? b : c`. We just use that. (It was named by analogy with the `fsel` intrinsic, which is the hardware's native floating-point conditional move.) It would be better if the compiler were just smart enough to emit the right code for `?:`, but we haven't got a smart compiler, just GCC.
Crashworks
+5  A: 

Like with any low-level optimization, test it on the target CPU/board setup.

On my compiler (gcc 4.5.1 on x86_64), the first example becomes

cmpl    %ebx, %eax
cmovle  %eax, %esi

The second example becomes

subl    %eax, %ebx
movl    %ebx, %edx
sarl    $31, %edx
andl    %ebx, %edx
leal    (%rdx,%rax), %esi

Not sure if the first one is faster in all cases, but I would bet it is.

Cubbi
+3  A: 

For something as simple as this, why not just experiment and try it out?

Generally, you'd profile first, identify this as a hotspot, experiment with a change, and view the result.

I wrote a simple program that compares both techniques passing in random numbers (so that we don't see perfect branch prediction) with Visual C++ 2010. The difference between the approaches on my machine for 100,000,000 iterations? Less than 50ms total, and the if version tended to be faster. Looking at the codegen, the compiler successfully converted the simple if to a cmovl instruction, avoiding a branch altogether.

Michael
+1  A: 

profile results with gcc -o foo -g -p -O0, Solaris 9 v240

 %Time Seconds Cumsecs  #Calls   msec/call  Name
  36.8    0.21    0.21 8424829      0.0000  foo2
  28.1    0.16    0.37       1    160.      main
  17.5    0.10    0.47 16850667     0.0000  _mcount
  17.5    0.10    0.57 8424829      0.0000  foo1
   0.0    0.00    0.57       4      0.      atexit
   0.0    0.00    0.57       1      0.      _fpsetsticky
   0.0    0.00    0.57       1      0.      _exithandle
   0.0    0.00    0.57       1      0.      _profil
   0.0    0.00    0.57    1000      0.000   rand
   0.0    0.00    0.57       1      0.      exit

code:

int
foo1 (int a, int b, int low)        
{
   if (a < b) 
      low = a;   
   else 
      low = b;         
   return low;
}

int                       
foo2 (int a, int b, int low)
{
   low = (a < b) ? a : b;
   return low;
}


int main()
{
    int low=0;
    int a=0;
    int b=0;
    int i=500;
    while (i--)
    {
       for(a=rand(), b=rand(); a; a--)
       {
           low=foo1(a,b,low);
           low=foo2(a,b,low);
       }        
    }
    return 0;
}

Based on this data, in the above environment, the exact opposite of several beliefs stated here was observed. Note the 'in this environment' caveat: the if construct was faster than the ternary `?:` construct.

jim mcnamara
However, compiling here, `gcc -O2 -S -o output.S input.c`, `foo1` and `foo2` compile to exactly the same 4 instructions. (Linux, gcc 4.3.4, amd64 (core 2 duo))
Thanatos
That was the whole point and why "bleating" about profiling is meaningful. Thanks.
jim mcnamara
A: 

If it is for Gnu C++, try this

int min = i <? j;

I have not profiled it but I think it is definitely the one to beat.

Syd
I don't know what Gnu C++ is, but I don't like its syntax.
Dennis Zickefoose
Gnu C++ is of course the C++ compiler from GCC (the Gnu Compiler Collection). IIRC they've deprecated this form. Just use `std::min(i,j)`. It's unlikely GCC's `std::min<int>` is slower than this.
MSalters
+5  A: 

Compiling this on gcc 4.3.4, amd64 (core 2 duo), Linux:

int foo1(int a, int b)
{
    int low;
    if (a < b) low = a;
    else low = b;
    return low;
}

int foo2(int a, int b)
{
    int low;
    low = b + ((a - b) & ((a - b) >> 31));
    return low;
}

I get:

foo1:
    cmpl    %edi, %esi
    cmovle  %esi, %edi
    movl    %edi, %eax
    ret

foo2:
    subl    %esi, %edi
    movl    %edi, %eax
    sarl    $31,  %eax
    andl    %edi, %eax
    addl    %esi, %eax
    ret

...which I'm pretty sure won't incur branch prediction penalties, since the code doesn't jump. Also, the non-if-statement version is 2 instructions longer. I think I will continue coding, and let the compiler do its job.

Thanatos
A: 

I'm just wondering which one of these would be more efficient (or if the difference is too minuscule to be relevant), and the efficiency of if-else statements versus alternatives in general.

Desktop/server CPUs are optimized for pipelining. The second is theoretically faster because the CPU doesn't have to branch and can utilize multiple ALUs to evaluate parts of the expression in parallel. More non-branching code with intermixed independent operations is best for such CPUs. (But even that is negated now by modern "conditional" CPU instructions, which allow the first code to be made branch-less too.)

On embedded CPUs, branching is often less expensive (relative to everything else), nor do they have many spare ALUs to evaluate operations out of order (that's if they support out-of-order execution at all). Less code/data is better: caches are small too. (I have even seen bubble sort used in embedded applications: the algorithm uses the least memory/code and is fast enough for small amounts of information.)

Important: do not forget about compiler optimizations. Using many tricks, compilers can sometimes remove the branching themselves: inlining, constant propagation, refactoring, etc.

But in the end I would say that yes, the difference is too minuscule to be relevant. In the long term, readable code wins.

The way things are going on the CPU front, it is more rewarding to invest time now in making the code multi-threaded and OpenCL-capable.

Dummy00001