Variable x is an int with possible values: -1, 0, 1, 2, 3. Which expression will be faster (in CPU ticks)?

1. (x < 0)
2. (x == -1)

Language: C/C++, but I suppose other languages will behave the same.

P.S. I personally think that answer is (x < 0).

More widely, for the gurus: what if x ranges from -1 to 2^30?

+1  A: 

The same; both operations are usually done in one clock cycle.

Artyom
It would be one instruction fetch-and-execute cycle, but not one clock cycle.
Raim
+7  A: 

Both operations can be done in a single CPU step, so they should be the same performance wise.

ghills
Arrrghh! While this is true on the vast majority of chips, you simply *can't* make a definitive statement without knowing the platform he's working on. All the world is not an x86.
dmckee
just most of it ;)
Matthew Whited
Well, I would assume that if he were asking this question about a specific, non-standard architecture, he would say so. If he is asking in general, I was trying to give a simple answer for most modern architectures.
ghills
Sure, I didn't think about any specific architecture. Usual x86.
Nikolay Vyahhi
+23  A: 

Why worry? Whichever you write, the compiler will optimize it for whatever platform you are compiling on.

If you need to check if it's -1, use (x == -1), if you want to know if it's less than zero, use that one instead. Write what you would read out loud.

Tiny things like this won't make anything faster, and you should be worried about readability and clean design rather than which tiny operation is faster.

And even if the compiler performs no transformation at all, chances are that on your platform both will execute in one CPU cycle.

GMan
Thank you. It's actually the bottleneck operation in a high-load program. Performance in these 1-2 lines is much more valuable than readability, and I can always add //comments.
Nikolay Vyahhi
If you gain significant performance from anything this small, chances are your design is in need of refactoring (more appropriate algorithms). But like others have said, profile it to find out for your current situation.
GMan
All bottlenecks are usually this small, even in a perfect design with perfect algorithms (though there is no such thing). I do high-load DNA processing and know my field and my algorithms quite well.
Nikolay Vyahhi
"All bottlenecks are usually this small"? That isn't really true. Most bottlenecks come from approaching the problem in the wrong manner, not from a potential saving of 1 CPU tick. :/
GMan
@nikolay: Wow. This is actually smaller than cache miss costs in your program?
bdonlan
+3  A: 

The important consideration, anyway, is which one accurately expresses your program's intent, and which just happens to produce the same result.

If x is actually an index or a value in an enum, then will -1 always be what you want, or will any negative value work? Right now, -1 is the only negative value it can take, but that could change.

CodexArcanum
+9  A: 

Try it and see! Do a million, or better, a billion of each and time them. I bet there will be no statistically significant difference in your results, but who knows: maybe on your platform and compiler you might find one.

This is a great experiment to convince yourself that premature optimization is probably not worth your time.

Alex Feinman
+2  A: 

You can't even answer this question out of context. If you try for a trivial microbenchmark, it's entirely possible that the optimizer will waft your code into the ether:

// Get time
int x = -1;
for (int i = 0; i < ONE_JILLION; i++) {
    int dummy = (x < 0); // Poof!  Dummy is ignored.
}
// Compute time difference - in the presence of good optimization
// expect this time difference to be close to useless.
Bob Cross
It will be optimized by the compiler into zero instructions. But I understood your idea, thanks.
Nikolay Vyahhi
Yes - that's what I was trying to say in a jolly way. If it wasn't clear on the first try, my fault.
Bob Cross
You can avoid this to an extent by allowing x and dummy to escape (ie, pass their pointers to a function in another translation unit) and introducing a compiler-specific memory barrier instruction such as gcc's __sync_synchronize(). This will force the compiler to emit code to evaluate (x<0) and set dummy - but it will also force memory accesses.
bdonlan
In the end, you're going to end up creating an elaborate construction to try to measure a difference that isn't there or isn't measurable without 100% context. For example, the OP tagged this question with "C++" and "C" - there's a dramatic difference between the two, much less between the various compilers on all the different platforms.
Bob Cross
In such a small piece of code adding measurement code could change the result because of caching, optimization and such.
Makis
@Makis, yes, that's exactly what I'm saying.
Bob Cross
+65  A: 

That depends entirely on the ISA you're compiling for, and the quality of your compiler's optimizer. Don't optimize prematurely: profile first to find your bottlenecks.

That said, on x86 you'll find that both are equally fast in most cases. In both cases, you'll have a comparison (cmp) and a conditional jump (jCC) instruction. However, for (x < 0), there may be some instances where the compiler can elide the cmp instruction, speeding up your code by one whole cycle.

Specifically, if the value x is stored in a register and was recently the result of an arithmetic operation (such as add or sub, but there are many more possibilities) that sets the sign flag SF in the EFLAGS register, then there's no need for the cmp instruction, and the compiler can emit just a js instruction. There's no jCC instruction that fires on x == -1 without a preceding cmp.

Adam Rosenfield
I don't believe that this is or was the "bottleneck" in any program. If you saw a difference in time, it's more likely that your code "jumped" over the == -1 condition, e.g. by setting x to -2, and thus did not terminate the loop (assuming that expression was part of a loop).
lothar
Don't forget that the cmp instruction might be replaced by an or instruction, which wouldn't reduce the number of cycles but might change the memory alignment. This might be helpful, or it might be counterproductive, which is why profiling is so important.
Mark Ransom
P.S. Don't look down on this question - I've had loops that were so tight that this kind of optimization would make a difference. Usually only a couple of percent, but every little bit helps sometimes!
Mark Ransom
On x86, TEST may be used to test reg == 0, and it's expected to be faster than CMP.
Bastien Léonard
And it doesn't even depend on the ISA alone, but on the actual implementation of the architecture too...
fortran
+6  A: 

It could be dependent on what operations precede or succeed the comparison. For example, if you assign a value to x just before doing the comparison, then it might be faster to check the sign flag than to compare to a specific value. Or the CPU's branch-prediction performance could be affected by which comparison you choose.

But, as others have said, this is dependent upon CPU architecture, memory architecture, compiler, and a lot of other things, so there is no general answer.

Kristopher Johnson
Thanks, good answer.
Nikolay Vyahhi
+5  A: 

x < 0 will be faster. If nothing else, it prevents fetching the constant -1 as an operand. Most architectures have special instructions for comparing against zero, so that will help too.

Phone Guy
How can you tell this, without knowing architecture and/or compiler?
kotlinski
Which architecture are you talking about? I believe most x86 instruction sets can do a compare against an immediate value. No need to fetch an operand. Here is a link to an intel instruction set reference: http://www.intel.com/Assets/PDF/manual/253666.pdf
jabbie
Sure, almost any architecture can do a compare against an immediate value. But even there the instruction is bigger (and therefore requires another fetch from memory). Not a huge deal, unless every ounce of performance is critical, which seemed to be the context here. I assume the questioner is writing a device driver or something.
Phone Guy
As to the first question: I've looked at architectures for a long time. After the first half dozen or so, patterns start to emerge. I also happen to know more than is healthy about the semantics of the x86 instruction set, which most folks tend to focus on these days. For instance, any time you do pretty much anything with a value on x86, the condition bits are set. So you can test for negative with a JS instruction after doing a computation, loading a value into a register, etc. Compilers generally try to take advantage of this, though some dumb ones don't.
Phone Guy
+1  A: 

It depends on the architecture, but x == -1 is more error-prone. x < 0 is the way to go.

No, this is not the way to go. To detect errors, use unit tests, not fancy code. To be less error-prone, give names to constants. It's usually better to go straight to the point. If the goal is to compare to -1, just write (x == -1); otherwise the next developer maintaining this code will have to figure out why we are comparing to 0 ("oh, ok, it's in fact to test against -1"), and then figure out what (the f...) -1 is.
Jem
Well, we are talking about an ideal case. As you say, nobody should use "magic numbers", only constants. You can compare with (x <= VALUE) that way. Usually you do this with counter variables, so it's a good way to be less error-prone. In the real world, unit tests can't always be done (due to time or other constraints). Obviously, in the special case where you ONLY want to check the '-1' value, (x == VALUE) is the way to go.
+1  A: 

As others have said there probably isn't any difference. Comparisons are such fundamental operations in a CPU that chip designers want to make them as fast as possible.

But there is something else you could consider: analyze the frequencies of each value and order the comparisons accordingly. This could save you quite a few cycles. Of course, you still need to compile your code to asm to verify this.
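For instance, ordering by frequency might look like this. The frequencies in the comments are made up; you'd get the real ones from profiling your own data.

```c
/* If profiling shows x == 1 dominates, test for it first so the
   common case costs a single comparison. Frequencies are
   hypothetical placeholders. */
int classify(int x) {
    if (x == 1) return 0;   /* say, ~70% of inputs */
    if (x == 0) return 1;   /* ~20% */
    if (x < 0)  return 2;   /* rare; only -1 is possible here */
    return 3;               /* 2 or 3 */
}
```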

Makis
+1  A: 

I'm sure you're confident this is a real time-taker.

I would suppose asking the machine would give a more reliable answer than any of us could give.

I've found, even in code like you're talking about, that my supposition about where the time was going was not quite correct. For example, if this is in an inner loop and there is any sort of function call, even an invisible one inserted by the compiler, the cost of that call will dominate by far.

Mike Dunlavey
+1  A: 

Nikolay, you write:

It's actually the bottleneck operation in a high-load program. Performance in these 1-2 lines is much more valuable than readability...

All bottlenecks are usually this small, even in a perfect design with perfect algorithms (though there is no such thing). I do high-load DNA processing and know my field and my algorithms quite well

If so, why not do the following:

  1. get a timer, set it to 0;
  2. compile your high-load program with (x < 0);
  3. start your program and the timer;
  4. when the program ends, look at the timer and remember result1;
  5. same as 1;
  6. compile your high-load program with (x == -1);
  7. same as 3;
  8. when the program ends, look at the timer and remember result2;
  9. compare result1 and result2.

You'll get the Answer.

zxcat