views: 590
answers: 10
I'm writing some optimized C code that basically runs through an array and does something to each element. What it does depends on the current value of the element so something like:

for (i=0; i < a_len; i++) {
    if (a[i] == 0) {
        a[i] = f1(a[i]);
    } else if (a[i] % 2 == 0) {
        a[i] = f2(a[i]);
    } else {
        a[i] = 0;
    }
}

I'm returning to C after many years working in dynamic languages, where my practice has been to try to write straightforward code and not create lots of local variables for things that I can just refer to directly, like a[i] above. I am very much aware that best practices are to write readable code and trust that the compiler is smarter than you and will do good optimizations.

If I were writing the code above in assembler, I would load a[i] into a register once and then just use that value each time because I know that a[] is private memory and won't change between references. However, even a smart compiler might do a load every time because it can't be sure that the memory hasn't changed. (Or do I have to explicitly declare "a" volatile for the compiler to not make this optimization?).

So, my question is: should I expect better performance by rewriting with a local variable like so:

for (i=0; i < a_len; i++) {
    int val = a[i];
    if (val == 0) {
        a[i] = f1(val);
    } else if (val % 2 == 0) {
        a[i] = f2(val);
    } else {
        a[i] = 0;
    }
}

Or does stuff like -O3 take care of this automatically for me? The code I'm optimizing takes days to run, so even modest improvements will make a difference.

+13  A: 

The obvious answer is of course to first write it in the most readable/plain/understandable way, then compile it with as much optimization as you can get, and then benchmark and profile that.

There is no point in optimizing things before you even know if they are bottlenecks. If the compiler does that transform automatically, you're just making the code worse, spending time, and getting absolutely nothing in return. Except perhaps the feeling of being cool, but that fades with time. :)

unwind
A: 

Try and time it.

I guess that the compiler is smart enough to optimize it.

Georg
a normal profiler would struggle with this - we're talking individual instruction levels of profiling, not function level
Alnitak
A: 

An array in C is essentially a pointer.

Local variables are cheap.

I find the first example a tad simpler to read because I'm not questioning what "val" is for. If "val" and "a" had better names, I would venture to say the second example would improve readability.

An array is not 'essentially a pointer' it IS one.
Brock Woolf
brock, thanks for the pointer on pointers!
A: 
  1. Don't optimise unless you know you have to
  2. Your compiler will probably do the right thing
  3. I find the latter version easier to read
  4. (marginal) it's easier to isolate the latter version from side effects in a multi-threaded program
Alnitak
why the downvotes?
Alnitak
I guess someone doesn't agree with you.
Georg
I guess, but I'd like to understand why...
Alnitak
+7  A: 

Write it for readability first. Personally, I find that all the subscripting hurts my eyes, so I'd probably write it more like:

for (i=0; i < a_len; i++) {

    int val = a[i];  /* or whatever type */
    int result = 0;  /* default result */

    if (val == 0) {
        result = f1(val);
    } else if (val % 2 == 0) {
        result = f2(val);
    } 

    a[i] = result;
}

I'm guessing the compiler will generate similar code with optimizations cranked up. But I wouldn't be shocked if one or the other was slightly (only very slightly) better. And I'd bet that if one were, it would be the one using the locals.

Also, you might get a very slight improvement by changing from walking through the array with an index to walking through it with a pointer. Again, that's very compiler- and situation-dependent.

int *p;  /* pointer to the element type */
for (p = &a[0]; p < &a[a_len]; ++p) {

    int val = *p;    /* or whatever type */
    int result = 0;  /* default result */

    if (val == 0) {
        result = f1(val);
    } else if (val % 2 == 0) {
        result = f2(val);
    } 

    *p = result;
}

And, yes, I'm aware that these are micro-optimizations and generally should not even be worried about (please code for readability and correctness first) - I'm just pointing out some options for when the micro-optimization might be warranted (these suggestions have to be backed up with analysis of the particular situation).

As far as whether the compiler will repeatedly reload from something like a[i] or not, that depends on the flow of control and whether the object being accessed is a global or has had its address taken and passed to something else.

If the object is global or has had its address taken and you call a function, the compiler generally has to assume that the object could have been modified by the function and will have to reload it. Similar issues arise when pointers are used to pass information to functions. Using locals can help mitigate this issue, since a compiler can very easily determine that a local is not modified by a called function unless the address of the local is taken. Compilers can also try to solve this problem with some sort of global optimization (such as what MSVC does at link time).

Your example code probably isn't really hitting this problem even if array a is global, because you don't re-read the value from the array after you've called either of those functions (you only write to it).


I wonder why markdown is removing blank lines from the code-formatted blocks?

Michael Burr
Great answer. It's a shame you have to pander to the "no premature optimization" crowd instead of just answering the guy's question and assuming he knows what he's talking about, but I've seen good answers get downvoted into oblivion for not doing so.
Matt J
@Matt J - thanks, but be aware that I'm often in the "no premature optimization" crowd. Guess it depends on my mood. Here, I actually think that the more readable (to me) code is probably easier for the compiler to optimize too. But I doubt that this example code would actually see a difference.
Michael Burr
I certainly try not to optimize my own code prematurely, but when answering someone else's question, sometimes it is more appropriate to just answer the question, if there is no indication that the asker is incompetent.Often, the "do not optimize" answer is voted way above the helpful answer is all.
Matt J
+3  A: 

The functions f1 and f2 seem to share the same signature. How differently do they behave? Do you really need the check outside? Or can you embed the logic in one function?

If you have a if-else ladder instead of only two such functions, try to use an array of function pointers instead. Use the value of a[ i ] to index in to that array and call the correct function.

Hand-optimization often turns out to be error prone micro-optimization. It's best to leave this task to the compiler. If you really need to optimize, look at the big picture, think of algorithms, the design, layers etc.

As for your question: Yes, most compilers are likely to optimize out the memory read should a[ i ] be not declared volatile.

dirkgently
That was just an example. The question was really whether the memory read and offset arithmetic would be optimized out.
Nick
accepted for being the only answer so far to actually answer the question.
Nick
Sometimes the compiler will not optimize the code when it is dealing with a pointer that "may be aliased". In your case, if the compiler is getting function(int * a) then the compiler might assume the ptr to a is aliased and therefore won't optimize.
Trevor Boyd Smith
If you qualify the pointer as "int * restrict a" then the compiler will know that "a" is not being aliased and it will optimize.
Trevor Boyd Smith
restrict is a C99 addition. There are only a few C99-compatible implementations.
dirkgently
+4  A: 

Both versions generate exactly the same code in GCC, as long as -O or higher is turned on. So my suggestion is to do whichever way you like better (I prefer without the local variable).

Greg Rogers
A: 

Optimization hint

I would probably first look at using a pointer instead of a local variable copy; that way you also avoid storing the value twice.

int *val;  /* int or whatever type a[] is */
for (i=0; i < a_len; i++) {
    val = &a[i];
    if (*val == 0) {
        f1(val);  /* set the value inside f1 */
    } else if (*val % 2 == 0) {
        f2(val);  /* set the value inside f2 */
    } else {
        *val = 0;
    }
}

A hint for optimizing your code could be to avoid using '%' operator when you are not interested in the result. It depends on your compiler but this has proven way faster for me (using a macro for readability):

#define is_divisible(dividend, divisor) ((((dividend)/(divisor)) * (divisor))==(dividend))

use:

else if (is_divisible(val,2)) {

This is faster, at least in most cases I have tested.

Edit: it's true that the gain is not so large for modulo calculations using only '% 2'. But if you are ever looking at a value larger than 2 for the modulo operation and are only interested in whether the remainder is zero, then my macro has been faster with all the compilers I have used.

eaanon01
Matt J
Well, as I said, the ones I have worked on also calculate/return the remainder from the calculation. The macro does not look at anything other than zero, but I see your point when just using '% 2'.
eaanon01
A: 

You say "If I were writing the code above in assembler..." so I assume you know assembly language.

My advice: Have a look at the compiler output on critical sections of code, see what's really going on.

Jimmy J
+1  A: 

dirkgently's answer:

Yes, most compilers are likely to optimize out the memory read should a[ i ] be not declared volatile.

Sometimes the compiler will not optimize the code when it is dealing with a pointer that "may be aliased". In your case Nick, if you are passing "a" as a function parameter, function(int * a), then the compiler might assume the pointer "a" is aliased and therefore won't optimize.

If you qualify the pointer as "int * restrict a" then the compiler will know that "a" is not being aliased and it will optimize.

The only way to know 100% whether the compiler is optimizing is to check the assembly!

Trevor Boyd Smith