views: 444
answers: 14

Hi there,

I've been working on a piece of code recently where performance is very important, and essentially I have the following situation:

int len = some_very_big_number;
int counter = some_rather_small_number;

for( int i = len; i >= 0; --i ){
    while( counter > 0 && costly other stuff here ){
        /* do stuff */
        --counter;
    }
    /* do more stuff */
}

So here I have a loop that runs very often. For a certain number of iterations the while block is executed as well, until the variable counter has been reduced to zero; after that the while body is never entered again, because its first condition is false.

The question now is whether there is a difference in performance between using
counter > 0 and counter != 0.
I suspect there is; does anyone know specifics about this?

+30  A: 

To measure is to know.

Jim Lewis
Best answer ever :)
zebrabox
I'd like to disagree, but I'd have to go find a better one first.
T.E.D.
I would have written that, "tias".
jrockway
Damn, is this an answer? There was a question! Zebrabox, do you measure, or could you measure, everything to know the quantity?
Narek
But the opposite is false. To know is not necessarily to measure...
SadSido
@SadSido: You're right with that. IIRC, it goes like this: To know is to err. `:)`
sbi
A: 

In general, they should be equivalent (both are usually implemented in single-cycle instructions/micro-ops). Your compiler may do some strange special-case optimization that is difficult to reason about from the source level, which may make either one slightly faster. Also, equality testing is more energy-efficient than inequality testing (>), though the system-level effect is so small as to not merit discussion.

Matt J
A: 

There may be no difference. You could try examining the assembly output for each.

That being said, the only way to tell if any difference is significant is to try it both ways and measure. I'd bet that the change makes no difference whatsoever with optimizations on.
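
For illustration, a minimal pair for doing exactly that (a sketch of mine, assuming GCC or Clang; MSVC users would pass /FA instead of -S, and the file and function names are placeholders):

// Two otherwise-identical functions, so the generated assembly for the two
// comparisons can be diffed side by side. Compile with e.g.
//   g++ -O2 -S compare.cpp      (GCC/Clang: writes compare.s)
//   cl  /O2 /FA compare.cpp     (MSVC: writes compare.asm)
bool greater_than_zero(int counter) { return counter > 0;  }
bool not_equal_to_zero(int counter) { return counter != 0; }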

David Thornley
+8  A: 

I think you're spending time optimizing the wrong thing. "costly other stuff here", "do stuff" and "do more stuff" are more important to look at. That is where you'll make the big performance improvements, I bet.

Cellfish
+3  A: 

Is there a difference between counter > 0 and counter != 0? It depends on the platform.

A very common type of CPU is the Intel x86 family found in our PCs. Both comparisons map to a single instruction on that CPU, and I assume they will execute at the same speed. However, to be certain you will have to perform your own benchmark.
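
For what it's worth, here is a rough benchmark skeleton along those lines (my own sketch, not part of the answer; an optimizing compiler may well emit identical code for both loops, which would itself settle the question):

#include <cstdio>
#include <ctime>

int main()
{
    const int iterations = 100000000;
    volatile long long sink = 0;   // volatile so the loop bodies are not optimized away

    std::clock_t t0 = std::clock();
    for (int counter = iterations; counter > 0; --counter) sink += counter;
    std::clock_t t1 = std::clock();
    for (int counter = iterations; counter != 0; --counter) sink += counter;
    std::clock_t t2 = std::clock();

    std::printf("counter > 0 : %.3f s\n", double(t1 - t0) / CLOCKS_PER_SEC);
    std::printf("counter != 0: %.3f s\n", double(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}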

Martin Liversage
+18  A: 

Do you think that this will solve your problem? :D

    if(x >= 0)
00CA1011  cmp         dword ptr [esp],0 
00CA1015  jl          main+2Ch (0CA102Ch) <----
...
    if(x != 0)
00CA1026  cmp         dword ptr [esp],0 
00CA102A  je          main+3Bh (0CA103Bh) <----
AraK
The output I got was jg and jne instead of jl and je, but they're a wash. Times to execute these instructions are the same.
KFro
@KFro What compiler are you using? I am using VC2008 (default release mode)
AraK
Just to expand a bit on this, in x86 both `je` and `jl` jump based on the flags. However, the flags are generated by the `cmp` in both cases, so the *calculation* has the same performance either way. The *decision*, while based on different flags, will also have the same latency, since the flags are already available, and checking the flags is done in parallel for all flags at once anyway.
Nathan Fellman
...and then you might start to wonder which one your CPU's branch predictor gets right more often.
T.E.D.
+1  A: 

OK, you can measure this, sure. However, these sorts of comparisons are so fast that you are probably going to see more variation from processor swapping and scheduling than from this single line of code.

This smells of unnecessary, and premature, optimization. Write your program, optimize what you can see. If you need more, profile, and then go from there.

windfinder
+17  A: 

In programming, the following statement is the sign designating the road to Hell:

I've been working on a piece of code recently where performance is very important

Write your code in the cleanest, easiest-to-understand way. Period.

Once that is done, you can measure its runtime. If it takes too long, measure the bottlenecks, and speed up the biggest ones. Keep doing that until it is fast enough.

The list of projects that failed or suffered catastrophic loss due to a misguided emphasis on blind optimization is large and tragic. Don't join them.

T.E.D.
Your advice might apply generally but in this case, either way of writing this makes no difference for clarity and readability. Furthermore, the code *is* very sensitive to performance (I happen to know it …), in fact the whole goal of this project is to squeeze every last ounce of performance from the CPU (and that includes micro-optimizations in tight loops).
Konrad Rudolph
I note that, while you assure us that this loop is performance-critical, you haven't written that you found this out through measuring. So how do you know?
sbi
If you know, fine, but you still have to measure. It could be that you are wasting time trying to figure out how to save 60 CPU cycles in a loop that has 60,000 of them to save somewhere else.
T.E.D.
+4  A: 

There will be a huge difference if the counter starts with a negative number. Otherwise, on every platform I'm familiar with, there won't be a difference.
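
To spell that out with a hypothetical starting value:

int counter = -5;

while (counter > 0)  { --counter; }   // condition is false immediately: zero iterations
while (counter != 0) { --counter; }   // counts down past INT_MIN; signed overflow is undefined
                                      // behaviour, and if it happens to wrap, this runs roughly
                                      // 2^32 times before counter reaches zero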

Mark Ransom
+3  A: 

As Jim said, when in doubt, see for yourself:

#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>
using namespace boost::posix_time;
using namespace std;

int main()
{
    ptime Before = microsec_clock::universal_time(); // UTC now
    // do stuff here
    ptime After = microsec_clock::universal_time();  // UTC now
    time_duration delta_t = After - Before;          // how much time has passed?
    cout << delta_t.total_seconds() << endl;         // whole seconds elapsed
    cout << delta_t.fractional_seconds() << endl;    // fractional part of the last second, in the clock's tick units (microseconds here)
    return 0;
}

Here's a pretty nifty way of measuring time. Hope that helps.

Maciek
The problem with this is that his "do stuff" is a boolean comparison. Most likely the two calls to universal_time() take far longer than the thing being measured. Even if you changed the calls to RDTSC instructions, that plus the math would probably take longer. Any measurement done this way is liable to be awash in noise.
T.E.D.
A: 

Assuming you are developing for the x86 architecture, when you look at the assembly output it will come down to jns vs. jne: jns checks the sign flag and jne checks the zero flag. Both operations should, as far as I know, be equally costly.

Lucas
+1  A: 

I would add that on modern CPUs the performance of this code will be dominated not by the comparison instruction but by whether the comparison is well predicted, since any mispredict wastes many more cycles than any integer operation.

As such, loop unrolling will probably be the biggest winner, but measure, measure, measure.
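
For illustration, a hypothetical unrolling of the outer loop from the question, with do_more_stuff() standing in for the real body:

// Handle four iterations per pass so the loop-control branch is evaluated a
// quarter as often; note that the compiler may already do this at higher
// optimization levels, so measure before and after.
int i = len;
for (; i >= 3; i -= 4) {
    do_more_stuff(i);
    do_more_stuff(i - 1);
    do_more_stuff(i - 2);
    do_more_stuff(i - 3);
}
for (; i >= 0; --i) {   // remaining 0-3 iterations
    do_more_stuff(i);
}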

ShuggyCoUk
+1  A: 

Thinking that the type of comparison is going to make a difference, without knowing it, is the definition of guessing.

Don't guess.

Mike Dunlavey
A: 

Clearly the solution is to use the correct data type.

Make counter an unsigned int. Then it can't be less than zero. Your compiler will obviously know this and be forced to choose the optimal solution.
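
A tiny sketch of that point (my own illustration, reusing the question's placeholder value): once the counter is unsigned, the two conditions are literally the same test, so the compiler can emit identical code for either spelling.

unsigned int counter = some_rather_small_number;

while (counter != 0) {   // for an unsigned value this is equivalent to counter > 0
    /* do stuff */
    --counter;
}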

Or you could just measure it.

You could also think about how it would be implemented...(here we go on a tangent)...

  • less than zero: the sign bit would be set, so only one bit needs checking
  • equal to zero : the whole value would be zero, so all the bits need checking

Of course, computers are funny things, and it may take longer to check a single bit than the whole value (however many bytes it is on your platform).

You could just measure it...

And you could find out that one is more optimal than the other (under the conditions you measured). But your program will still run like a dog, because you spent all your time optimising the wrong part of your code.

The best solution is to do what many large software companies do - blame the hardware for not running fast enough and encourage your customers to upgrade their equipment (which is clearly inferior, since your product doesn't run fast enough).

</rant>

PinkTriangles