I understand most of the micro-optimizations out there, but are they really useful?

Exempli gratia: does doing ++i instead of i++, or while(1) instead of for(;;), really result in performance improvements (either in memory footprint or CPU cycles)?

So the question is, what micro-optimizations can be done in C? Are they really useful?

A: 
x ^= y
y ^= x
x ^= y
Viktor Klang
What is that? Could you give some explanation?
kogut
It swaps variables.
Balon
Swapping numbers x and y
laura
@Balon: That is implementation-defined behaviour; it may work on one platform and fail on another. The best course of action is to steer clear of these tricks and not depend on them.
tommieb75
It is about 3 times as slow as using a temporary variable when optimizations are turned on. http://stackoverflow.com/questions/1826159/swapping-two-variable-value-without-using-3rd-variable/1826175#1826175
Yacoby
What is implementation-defined about it?
Richard Pennington
This is awful. It is difficult to understand and will utterly confuse most programmers who have to maintain such code. This type of coding practice is one reason why many software projects fail.
Mick Sharpe
It's only implementation-defined if you try to do it within a single sequence point; as written, of course, it won't even compile. It's also probably worse than using a temp.
jk
Geeze. All Astor forgot was the smiley.
Richard Pennington
@Richard: http://stackoverflow.com/questions/36906/what-is-the-fastest-way-to-swap-values-in-c
tommieb75
In fact it's undefined if done within a single sequence point; there's nothing implementation-defined about it at all.
jk
@tommieb75: It is not undefined, and in this case x and y cannot be aliased to one another -- unless we are talking about C++ (which we are not) and they were references.
Richard Pennington
Given that the example wouldn't even compile, it's hard to say whether it would invoke UB or not.
jk
I got challenged when I was a kid to come up with this solution and I did; I was so proud of myself. Thank you for the memories!
Leonardo Herrera
Come on guys, we're talking about micro-optimizations here... We all know that you should probably focus on delivering business value instead ;-)
Viktor Klang
Astor nailed it (OK, the smiley is missing, but whatever). This is the perfect answer to demonstrate how something that looks like an "optimization" is only confusing AND also much slower (on modern CPUs, at least).
MaR
tommieb75: I know, I was just answering kogut's question :)
Balon
-1. This is the worst kind of micro-optimization: it makes your code slower (3 reads and 3 writes that cannot be reordered, versus 2 reads and 2 writes), less portable, and less readable.
tristopia
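
For reference, here is a minimal compilable sketch of what the comments above are comparing: the XOR trick (with its missing semicolons added) next to the plain temporary-variable swap that Yacoby's link benchmarks as roughly three times faster.

#include <stdio.h>

int main(void)
{
    int x = 3, y = 5;

    /* The XOR trick from the answer. As three separate statements it is
       well-defined, but it fails if x and y are the same object, and it
       is typically slower than a temporary on modern CPUs. */
    x ^= y;
    y ^= x;
    x ^= y;
    printf("%d %d\n", x, y);    /* prints: 5 3 */

    /* The plain swap the comments recommend: readable, and the compiler
       usually turns it into a simple register exchange. */
    int tmp = x;
    x = y;
    y = tmp;
    printf("%d %d\n", x, y);    /* prints: 3 5 */

    return 0;
}
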
+4  A: 

I think you do not need to think about these micro-optimizations, because most of them are done by the compiler. These things can only make code more difficult to read.

Remember, [edited] premature [/edited] optimization is evil.

kogut
Optimization isn't evil, but it is just one aspect of your code and, maybe, not the most important. It's sometimes important to not optimize to retain some other aspect -- such as readability or safety -- but if you can write code that takes half as much time, say by caching intermediate results, and is as clean and readable, you ought to do it. On the other hand, I'm not going to spend much time improving a program that takes 7 seconds to run -- unless it's supposed to run every 6 seconds.
tvanfosson
Knuth's actual quote is "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
Leonardo Herrera
*EARLY* optimization is an evil. "Make it work, make it work right, then make it work fast, in that order."
Mike DeSimone
I see your point; you're all right. I edited my answer.
kogut
+19  A: 

You should rely on your compiler to optimise this stuff. Concentrate on using appropriate algorithms and writing reliable, readable and maintainable code.

Mick Sharpe
+100. If you are in a position where micro-optimizations like these actually make a difference, you wouldn't be asking this question.
DevSolar
+3  A: 

To be honest, that question, while valid, is not relevant today -- why?

Compiler writers are a lot smarter than they were 20 years ago. Rewind back in time and these optimizations would have been very relevant: we were all working with old 80286/386 processors, and coders would often resort to tricks to squeeze even more bytes out of the compiled code.

Today, processors are fast and compiler writers know the intimate details of the instruction sets well enough to make everything work, taking into account pipelining, multi-core processors, and acres of RAM. Remember, with an 80386 processor, there would be 4MB of RAM, and if you were lucky, 8MB was considered superior!

The paradigm has shifted: it used to be about squeezing every byte out of the compiled code; now it is more about programmer productivity and getting the release out the door sooner.

In the above I have described the nature of the processors and compilers; I was talking about the Intel 80x86 processor family and Borland/Microsoft compilers.

Hope this helps, Best regards, Tom.

tommieb75
Actually, I suspect that compiler writers were probably better 20 years ago -- they had to be. Compilers, on the other hand, may be better today, but it's because the current compiler writers are standing on the shoulders of the work of the earlier compiler writers and researchers who have improved the state of the art.
tvanfosson
@tvanfosson: Agreed! From little acorns grows big oak trees :)
tommieb75
If you are writing in C today, there is a good chance it's embedded stuff, so you can't necessarily rely on x86-sized resources.
jk
The main reason compilers are better now is that they're so much *slower*, because we can afford 10x as many cycles to spend compiling... even more so for embedded systems where our development machines are 10x as fast as the target (or even 1000x, as I struck in one case recently).
Andrew McGregor
+4  A: 

There's a difference, I think, between a micro-optimization, a trick, and alternative means of doing something. It can be a micro-optimization to use ++i instead of i++, though I would think of it as merely avoiding a pessimization, because when you pre-increment (or decrement) the compiler need not insert code to keep track of the current value of the variable for use in the expression. If using pre-increment/decrement doesn't change the semantics of the expression, then you should use it and avoid the overhead.

A trick, on the other hand, is code that uses a non-obvious mechanism to achieve a result faster than a straightforward mechanism would. Tricks should be avoided unless absolutely needed. Gaining a small percentage of speed-up is generally not worth the damage to code readability unless that small percentage reflects a meaningful amount of time. Extremely long-running programs, especially calculation-heavy ones, or real-time programs are often candidates for tricks, because the amount of time saved may be necessary to meet the system's performance goals. Tricks should be clearly documented if used.

Alternatives are just that: there may be little or no performance gain; they just represent two different ways of expressing the same intent. The compiler may even produce the same code. In this case, choose the more readable expression. I would say to do so even if it results in some performance loss (though see the preceding paragraph).

tvanfosson
I don't think the compiler "insert[s] code to keep track of the current value of the variable", it's simply a matter of whether it sticks the increment instruction before or after the block of code generated for the expression. It should be the same amount of overhead (an increment if it's in a register, a load/increment/store if not) regardless.
TMN
This post is somewhat related. Check out the comments for info on using pre-increment/decrement over the post- versions: http://appdeveloper.intel.com/en-us/blog/2010/01/11/6-tips-efficient-netbook-c-code#comments
Xandy
It depends. It's certainly possible that, by carefully ordering the operations, the compiler may be able to use the same number of instructions. My experiments with C# show this to be the case for simple examples, but only because they use an evaluation stack and the intermediate values are stored on the stack. It must keep track of the previous value somewhere in order to use it in evaluating any expression it is part of. If the pre/post-increment **is** the expression, then you're correct. It's possible, though, that in a complex expression it may need a temporary variable.
tvanfosson
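
As a minimal sketch of the semantic difference discussed above (for plain C ints; when the value is discarded, any decent compiler emits identical code for both forms):

#include <stdio.h>

int main(void)
{
    int i = 0, j = 0;

    int a = i++;    /* post-increment: a gets the old value, 0 */
    int b = ++j;    /* pre-increment:  b gets the new value, 1 */
    printf("a=%d i=%d b=%d j=%d\n", a, i, b, j);    /* a=0 i=1 b=1 j=1 */

    /* With the value unused, i++ and ++i are interchangeable and
       compile to the same increment instruction: */
    for (i = 0; i < 10; ++i)
        ;

    return 0;
}
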
A: 

++i should be preferred over i++ in situations where you don't use the return value, because it better represents the semantics of what you are trying to do (increment i), rather than because of any possible optimisation (it might be slightly faster, and is probably not worse).

jk
+1  A: 

For a slightly more pragmatic take on the question of ++i vs. i++ (at least in a C++ context) see http://llvm.org/docs/CodingStandards.html#micro_preincrement.

If Chris Lattner says it, I've got to pay attention. ;-)

Richard Pennington
In C++, yes. In C, there will be no performance difference.
Stephen Canon
+5  A: 

The day tclhttpd, a webserver written in Tcl (one of the slowest scripting languages), managed to outperform Apache, a webserver written in C (supposedly one of the fastest compiled languages), was the day I was convinced that micro-optimizations pale in comparison to using a faster algorithm/technique*.

Never worry about micro-optimizations until you can prove in a debugger that they are the problem. Even then, I would recommend first coming here to SO and asking whether it is a good idea, in the hope that someone will convince you not to do it.

It is counter-intuitive, but very often code, especially tight nested loops or recursion, is optimized by adding code rather than removing it. The gaming industry has come up with countless tricks to speed up nested loops, using filters to avoid unnecessary processing. Those filters add significantly more instructions than the difference between i++ and ++i.

*note: We have learned a lot since then. The realization that a slow scripting language can outperform compiled machine code, because spawning threads is expensive, led to the development of lighttpd, NginX and Apache2.

slebetman
"Never worry about micro-optimizations" - +1 for this. Doesn't say "don't do them", just "don't worry about them". If a particular micro-optimization can be done without worrying, for instance because it's second nature to you, then you may as well just do it up front. Just don't expend any resources or stack up any future cost of `orrible code.
Steve Jessop
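
The "filters" slebetman mentions can be sketched in a few lines of C. This is a hypothetical example (the function and struct names are made up for illustration): before paying for a square root per pair of points, a cheap bounding-box test rejects most candidates. The filter adds far more instructions than the i++ vs ++i difference, yet the loop that calls it gets much faster.

#include <math.h>

struct point { double x, y; };

static int within_range(struct point a, struct point b, double r)
{
    double dx = a.x - b.x, dy = a.y - b.y;

    /* cheap filter: reject anything outside the bounding box */
    if (fabs(dx) > r || fabs(dy) > r)
        return 0;

    /* expensive exact test, only for the few survivors */
    return sqrt(dx * dx + dy * dy) <= r;
}
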
+1  A: 

You would do better to consider every program you write primarily as a language in which you communicate your ideas, intentions and reasoning to the other human beings who will have to bug-fix, reuse and understand it. They will spend more time decoding garbled code than any compiler or runtime system will spend executing it. To summarise, say what you mean in the clearest way, using the common idioms of the language in question.

For these specific examples in C, for(;;) is the idiom for an infinite loop, and "i++" is the usual idiom for "add one to i", unless you use the value in an expression, in which case it depends on whether the value with the clearest meaning is the one before or after the increment.

martinwguy
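
A tiny sketch of the loop idioms mentioned above; any common compiler emits the same unconditional jump for both, so the choice is purely one of convention:

/* The two common spellings of an infinite loop in C. The break is a
   placeholder so this sketch terminates; a real event loop would not. */
void serve_forever(void)
{
    for (;;) {          /* the traditional C idiom */
        break;
    }

    while (1) {         /* the more literal equivalent */
        break;
    }
}
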
+2  A: 

Keeping the fact that memory is the new disk in mind will likely improve your performance far more than applying any of those micro-optimizations.

Brian
and disk is the new tape.
janneb
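
A minimal sketch of what "memory is the new disk" tends to mean in practice (the array size is made up; the point is the traversal order, not the arithmetic):

#define N 1024

static double grid[N][N];

/* C arrays are row-major, so walking memory in order is rewarded.
   Both functions do the same N*N additions; the second one strides
   N*sizeof(double) bytes per step and misses cache far more often,
   which usually costs more than any micro-optimization saves. */
double sum_rows(void)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += grid[i][j];    /* sequential, cache-friendly */
    return s;
}

double sum_cols(void)
{
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += grid[i][j];    /* strided, cache-hostile */
    return s;
}
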
A: 

Generally, loops that count towards zero are faster than loops that count towards some other number. I can imagine a situation where the compiler can't make this optimization for you, but you can make it yourself.

Say that you have an array of length x, where x is some very big number, and that you need to perform some operation on each element of the array. Further, let's say that you don't care what order these operations occur in. You might do this...

int i;
for (i = 0; i < x; i++)
    doStuff(array[i]);

But you could get a little optimization by doing it this way instead:

int i;
for (i = x-1; i != 0; i--)
{
    doStuff(array[i]);
}
doStuff(array[0]);  /* the loop stops before 0, so handle element 0 here */

The compiler doesn't do it for you because it can't assume that order is unimportant.

MaR's example code is better. Consider this, assuming doStuff() returns an int:

int i = x;
while (i != 0)
{
    --i;
    printf("%d\n",doStuff(array[i]));
}

This is ok as long as printing the array contents in reverse order is acceptable, but the compiler can't decide that for you.

Whether this is an optimization is hardware-dependent. From what I remember about writing assembler (many, many years ago), counting up rather than counting down to zero requires an extra machine instruction each time you go through the loop.

If your test is something like (x < y), then evaluation of the test goes something like this:

  • subtract y from x, storing the result in some register r1
  • test r1, to set the n and z flags
  • branch based on the values of the n and z flags

If your test is ( x != 0), you can do this:

  • test x, to set the z flag
  • branch based on the value of the z flag

You get to skip a subtract instruction for each iteration.

There are architectures where you can have the subtract instruction set the flags based on the result of the subtraction, but I'm pretty sure x86 isn't one of them, so most of us aren't using compilers that have access to such a machine instruction.

Ed
If the compiler can prove that the order is unimportant, then it can absolutely do it for you. Compilers are actually pretty good at this transform.
Stephen Canon
int i=x; while (i!=0) {--i; doStuff(i); }
MaR
I think the reason for this is not that decrementing is faster, but that comparing against 0 doesn't waste a register since some (all common ones?) processors have hardwired 0 and 1 "registers". Then again, assuming fetching array[i] from memory dominates doStuff(), you're counting on the cache prefetcher working as well with the backwards access patterns as with forwards. Which might easily be a bigger issue than wasting a register if this turns out to be false.
janneb
@janneb: no, that's not the reason why decrementing is faster on some architectures. By decrementing, you don't need to compare explicitly against 0: the zero flag is automatically set when reaching 0. There can be another reason: many processors have built-in loop instructions (x86) which might be faster (not always: in Pentium Pro/II/III days, LOOP was significantly slower than a DEC/JNZ combination). That said, it is true that the cache load pattern is probably more important than the exact code combination.
tristopia
+3  A: 

If you can easily see that two different code sequences produce identical results, without making assumptions about the data other than what's present in the code, then the compiler can too, and generally will.

It's only when the transformation from one to the other is highly non-obvious, or requires assuming something that you may know to be true but the compiler has no way to infer (e.g. that an operation cannot overflow, or that two pointers will never alias even though they aren't declared with the restrict keyword), that you should spend time thinking about these things. Even then, the best thing to do is usually to find a way to inform the compiler about the assumptions that it can make.

If you do find specific cases where the compiler misses simple transformations, 99% of the time you should just file a bug against the compiler and get on with working on more important things.

Stephen Canon
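
A sketch of "informing the compiler about the assumptions it can make", using the restrict keyword (C99) that the answer mentions; the function is a made-up example:

/* Declaring dst restrict promises the compiler that nothing else in
   this function (src, scale) aliases the destination. It can then keep
   *scale in a register and vectorize the loop; without the promise, it
   must assume a store to dst[i] might change *scale and reload it on
   every iteration. */
void scale_array(double *restrict dst, const double *src,
                 const double *scale, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}
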
+1  A: 

Here's real optimization, in my experience.

Someone on SO once remarked that micro-optimization was like "getting a haircut to lose weight". On American TV there is a show called "The Biggest Loser" where obese people compete to lose weight. If they were able to get their body weight down to a few grams, then getting a haircut would help.

Maybe that's overstating the analogy to micro-optimization, because I have seen (and written) code where micro-optimization actually did make a difference, but when starting off there is a lot more to be gained by simply not solving problems you don't have.

Mike Dunlavey