views:

164

answers:

9

I'm writing a loop in C, and I am just wondering on how to optimize it a bit. It's not crucial here as I'm just practicing, but for further knowledge, I'd like to know:

In a loop, for example the following snippet:

int i = 0;
while (i < 10) {
    printf("%d\n", i);
    i++;
}

Does the processor check both (i < 10) and (i == 10) for every iteration? Or does it just check (i < 10) and, if it's true, continue?

If it checks both, wouldn't:

int i = 0;
while (i != 10) {
    printf("%d\n", i);
    i++;
}

be more efficient?

Thanks!

+5  A: 

One of the interesting things about these optimization questions is that they often show why you should code for clarity/correctness before worrying about the performance impact of these operations (which oh-so often don't have any difference).

Your 2 example loops do not have the same behavior:

int i = 0;
/* this will print 11 lines (0..10) */
while (i <= 10) {
    printf("%d\n", i);
    i++;
}

And,

int i = 0;
/* This will print 10 lines (0..9) */
while (i != 10) {
    printf("%d\n", i);
    i++;
}

To answer your question though, it's nearly certain that the performance of the two constructs would be identical (assuming that you fixed the problem so the loop counts were the same). For example, if your processor could only check for equality and whether one value were less than another in two separate steps (which would be a very unusual processor), then the compiler would likely transform the (i <= 10) to an (i < 11) test - or maybe an (i != 11) test.

Michael Burr
Gah, a typo. Edited in the fix.
Wolfin
+8  A: 

Both will be translated in a single assembly instruction. Most CPUs have comparison instructions for LESS THAN, for LESS THAN OR EQUAL, for EQUAL and for NOT EQUAL.

mouviciel
Okay, that solves my problem. But something about it just doesn't seem right.. on a bitwise level, two comparisons can't be made at the same time, can they? I suppose it doesn't matter.
Wolfin
Of course they can. Two flags are involved in these comparisons. The Z flag indicates that a result is 0. N flag indicates that a result is negative. Logic glue for combining these two flags for building all comparison instructions is cheap. By the way, did you notice that adding two 64-bit integers is performed at the same time for all bits?
mouviciel
Of course can two comparisions made at the same time, why not? But its only one comparsion I see, "i<10".
drhirsch
@wolfin: It's not two comparisons. It's just a single comparison on a bitwise level. Actually, the circuit that does this just uses a subtract (the real circuit is simply an adder with a negative number) and then check if the result is negative. For `x<10` it checks against `10` and for `x<=10` it simply checks against `11`. This may be implemented in the hardware itself (in the case of x86 CPUs) or by the complier (for most modern CPUs).
slebetman
@wolfin: actually, see mouviciel's answer. His answer is better than mine.
slebetman
@Wolfin - now that's a good question. And the answer is that the CPUs typically set a series of several flags when an operation is performed, and the CPU can consult more than one of those flags to decide whether or not to branch. For example, when comparing i to 10, the compiler will typically do something like a subtract operation, which will set the carry flag if one operand is less than the other and will set the zero flag if the operands are equal. A less-than-or-equal decision can be made by jumping if either the carry or the zero flag is set, which can be done in a single instruction.
Michael Burr
@slebetman: It probably checks against 10 in both cases, but in one case uses "Jump if Negative" and in the other "Jump if Negative or Zero", like @Michael Burr said.
Vatine
@Vatine: yeah, I sort of said so myself in the comment immediately after *that* comment ^
slebetman
A: 

The operators <= and < are a single instruction in assembly, there should be no performance difference. Note that tests for 0 can be a bit faster on some processors than to test for any other constant, therefore it can be reasonable to make a loop run backward:

int i = 10;
while (i != 0) 
{
    printf("%d\n", i);
    i--;
}

Note that micro optimizations like these usually can gain you only very little more performance, better use your time to use efficient algorithms.

codymanix
A: 

Depends on the architecture and compiler. On most architectures, there is a single instruction for <= or the opposite, which can be negated, so if it is translated into a loop, the comparison will most likely be only one instruction. (On x86 or x86_64 it is one instruction)

The compiler might unroll the loop into a sequence of ten times i++, when only constant expressions are involved it will even optimize the ++ away and leave only constants.

And Ira is right, the comparison does vanish if there is a printf involved, which execution time might be millions of clock cycles.

drhirsch
+3  A: 

This a clear example of early optimization.... IMHO, that is something that programmers new to their craft are way to prone to worry about. If you must worry about it, learn to benchmark and profile your code so that your worries are based on evidence rather than supposition.

Speaking to your specific questions. First, a <= is not implemented as two operations testing for < and == separately in any C compiler I've met in my career. And that includes some monumentally stupid compilers. Notice that for integers, a <= 5 is the same condition as a < 6 and if the target architecture required that only < be used, that is what the code generator would do.

Your second concern, that while (i != 10) might be more efficient raises an interesting issue of defensive programming. First, no it isn't any more efficient in any reasonable target architecture. However, it raises a potential for a small bug to cause a larger failure. Consider this: if some line of code within the body of the loop modified i, say by making it greater than 10, what might happen? How long would it take for the loop to end, and would there be any other consequences of the error?

Finally, when wondering about this kind of thing, it often is worthwhile to find out what code the compiler you are using actually generates. Most compilers provide a mechanism to do this. For GCC, learn about the -S option which will cause it to produce the assembly code directly instead of producing an object file.

RBerteig
A: 

Does the processor check both (i < 10) and (i == 10) for every iteration? Or does it just check (i < 10) and, if it's true, continue?

Neither, it will most likely check (i < 11). The <= 10 is just there for you to give better meaning to your code since 11 is a magic number which actually means (10+1).

slebetman
Aha, this makes me smarter!
Wolfin
A: 
// Case I

int i = 0; 
while (i < 10) {
    printf("%d\n", i);
    i++;
    printf("%d\n", i);
    i++;
}

// Case II

int i = 0;
while (i < 10) {
    printf("%d\n", i);
    i++;
}

Case I code take more space but fast and Case II code is take less space but slow compare to Case I code. Because in programming space complexity and time complexity always proportional to each other. It means you must compromise either space or time.
So in that way you can optimize your time complexity or space complexity but not both.

And your both code are same.

Anand Kumar
A: 

I'm writing a loop in C, and I am just wondering on how to optimize it a bit.

If you compile with optimizations turned on, the biggest optimization will be from unrolling that loop.

It's going to be hard to profile that code with -O2, because for trivial functions the compiler will unroll the loop and you won't be able to benchmark actual differences in compares. You should be careful when profiling test cases that use constants that might make the code trivial when optimized by the compiler.

rox0r
A: 

disassemble. Depending on the processor, and optimization and a number of things this simple example code actually unrolls or does things that do not reflect your real question. Compiling with gcc -O1 though both example loops you provided resulted in the same assembler (for arm).

Less than in your C code often turns into a branch if greater than or equal to the far side of the loop. If your processor doesnt have a greater than or equal it may have a branch if greater than and a branch if equal, two instructions.

typically though there will be a register holding i. there will be an instruction to increment i. Then an instruction to compare i with 10, then equal to, greater than or equal, and less than are generally done in a single instruction so you should not normally see a difference.

dwelch