tags:
views: 274
answers: 11

In C:

Let's say the function myfunction() has 50 lines of code, in which other smaller functions also get called. Which of the following would be more efficient?

void myfunction(long *a, long *b);

long a, b;
int i;
for (i = 0; i < 8; i++)
    myfunction(&a, &b);

or

myfunction(&a, &b);
myfunction(&a, &b);
myfunction(&a, &b);
myfunction(&a, &b);
myfunction(&a, &b);
myfunction(&a, &b);
myfunction(&a, &b);
myfunction(&a, &b);  

Any help would be appreciated.

+2  A: 

It depends on so many things that your best bet is to do it both ways and measure.

n8wrl
+2  A: 

It would take less (of your) time to write out the for loop. I'd also say it's clearer to read with the loop. It would probably save a few instructions to write them out, but with modern processors and compilers it may amount to exactly the same result...

Kendrick
+14  A: 

That's premature optimization; you just shouldn't care...

Now, from a code maintenance point of view the first form (with the loop) is definitely better.

From a run-time point of view, if the function is inlined and defined in the same compilation unit, if the compiler does not unroll the loop itself, and if the code is already in the instruction cache (I can't speak for moon phases, but I still believe they shouldn't have any noticeable effect), the second one may be marginally faster.

As you can see, there are many conditions required for it to be faster, so you shouldn't do it. There are probably many other parameters to optimize in your program that would have a much greater effect on code speed than this one. Any change that affects the algorithmic complexity of the program will have a much greater effect. More generally speaking, any code change that does not affect algorithmic complexity is probably premature optimization.

If you really want to be sure, measure. On x86 you can use the kind of trick I used in this question to get a fairly accurate measurement. The trick is to read a processor register that counts the number of cycles spent. The question also illustrates how code optimization questions can become tricky, even for very simple problems.
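A minimal sketch of that kind of measurement, assuming an x86 target and the __rdtsc() intrinsic provided by GCC/Clang in <x86intrin.h>; the placeholder body for myfunction() is only there so the sketch links and runs:

#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() on GCC/Clang targeting x86; an assumption about the toolchain */

/* placeholder body so the sketch links; the real myfunction() goes here */
static void myfunction(long *a, long *b)
{
    *a += *b + 1;
}

int main(void)
{
    long a = 0, b = 1;
    unsigned long long start, end;
    int i;

    start = __rdtsc();                /* read the cycle counter before... */
    for (i = 0; i < 8; i++)
        myfunction(&a, &b);
    end = __rdtsc();                  /* ...and after */

    printf("elapsed cycles: %llu (a = %ld)\n", end - start, a);
    return 0;
}

Printing the result keeps the compiler from discarding the loop entirely, but for serious measurements you would still want to repeat the run and account for out-of-order execution.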

kriss
+1 for suggesting he actually measure; that is the only way to know for sure. But along the lines of what you said, in most cases it would be better to write clean code up-front and then measure to identify/optimize performance bottlenecks.
Justin Ethier
@Justin Ethier: I couldn't agree more, but *clean* is something subjective. I remember a piece of code I worked on where a function was called twice in a row with swapped parameters. The original coder used a loop to do that, but what the code did became clearer to me when I replaced the loop with two flat calls. The answer seems obvious in this example because it is so simple; the cleaner code - if there really is one - may not be as obvious in real programs.
kriss
Fair enough. When I say clean I really mean "maintainable". But I agree this will mean different things depending on who is actually maintaining the code...
Justin Ethier
+4  A: 

I'd assume the compiler will translate the first variant into the second.

Tim
I'd assume the compiler **COULD** translate the first variant into the second if it detects an advantage in doing so.
Martin York
+1  A: 

The first. It is easier to read.

swegi
+3  A: 

The first. Any half-decent compiler will optimize that for you. It's easier to read/understand and easier to write.

Secondly, write first, optimize second. Even if your compiler were completely brain-dead, writing the calls out would at best save you a few nanoseconds on a modern CPU. Chances are there are bigger bottlenecks in your application that could/should be optimized first.

dime
A: 

On modern processors the size of the compiled code becomes very important. If this loop could run entirely from the processor's cache, it would be the fastest solution. As n8wrl said, test it yourself.

Seba
A: 

First, are you sure you have a code execution performance problem? If you don't, then you're talking about making your code less readable and writable for no reason at all.

Second, have you profiled your program to see if this is in a place where it will take a significant amount of time? Humans are very bad at guessing the hot spots in programs, and without profiling you're likely to spend time and effort fiddling with things that don't make a difference.

Third, are you going to check the assembler code produced to see if there's a difference? If you're using an optimizing compiler with optimizations on, it's likely to produce what it sees fit for either. If you aren't, and you have a performance problem, get a better computer or turn on more optimizations.

Fourth, if there is a difference, are you going to test both ways to see which is better? On at least a representative sample of the systems your users will be running on?

And, to give you my best answer to which is more efficient: it depends. If they're in fact compiled to different code, the unrolled version might be faster because it doesn't have the loop overhead (which includes a conditional branch), and the rolled-up version might be faster because it's shorter code and will work better in the instruction cache. The usual wisdom was to unroll, but I once sped up a long-running section by rolling the execution up as tightly as I could.

David Thornley
A: 

I created a short test for this, with surprising results - at least for me; I would've thought it was the other way round.

So I wrote two versions of a program iterating over a function nothing() that did nothing interesting (an increment of a variable).

The first used proper loops (a million iterations of 1000 iterations, two nested for loops); the second did a million iterations of 1000 consecutive calls to nothing().

I used the time command to measure. The version with the proper loop took about 3.5 seconds on average, and the consecutive calling version took about 2.5 seconds on average.
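A rough reconstruction of the looped version of that test, based only on the description above (the name nothing() comes from the answer; the counter variable and everything else is an assumption). The unrolled variant would replace the inner loop with 1000 consecutive calls written out by hand:

static long counter = 0;

static void nothing(void)   /* does nothing interesting: just increments a variable */
{
    counter++;
}

int main(void)
{
    long i, j;

    for (i = 0; i < 1000000; i++)      /* a million iterations...     */
        for (j = 0; j < 1000; j++)     /* ...of 1000 iterations each  */
            nothing();

    return 0;   /* timed externally with the time command; with optimization
                   flags the compiler may eliminate all of this, as noted below */
}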

I then tried to compile with optimization flags, but gcc detected that the program did essentially nothing and execution was instantaneous on both versions =P. Didn't bother fixing that.

Edit: if you were actually thinking of writing 8 consecutive calls in your code, please don't. Remember the famous quote: "Programs must be written for people to read, and only incidentally for machines to execute."

Also note that my tests did nothing except call nothing() (=P) and are not proper benchmarks to consider for any actual program.

Santiago Lezica
A: 

Loop unrolling can make execution faster (otherwise Duff's Device wouldn't have been invented), but that's a function of so many variables (processor, cache size, compiler settings, what myfunction is actually doing, etc.) that you can't rely on it to always be true, or for whatever improvement to be worth the cost in readability and maintainability. The only way to know for sure if it makes a difference for your particular platform is to code up both versions and profile them.
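For reference, a rough sketch of the Duff's Device pattern mentioned above, adapted here to a plain memory copy (the original streamed to a single memory-mapped register); the name duff_copy is just for illustration, and this is shown to explain manual unrolling, not as a recommendation:

/* Copy 'count' shorts from 'from' to 'to', unrolling the loop
   eight ways inside a switch whose cases fall through. */
void duff_copy(short *to, const short *from, int count)
{
    if (count <= 0)
        return;
    int n = (count + 7) / 8;          /* number of passes through the do-while */
    switch (count % 8) {              /* jump into the middle for the remainder */
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}

Whether this beats a plain loop on a modern compiler and CPU is exactly the kind of thing you'd have to profile, as the answer says.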

Depending on what myfunction actually does, the difference could be so far down in the noise as to be undetectable.

This kind of micro-optimization should only be done if all of the following are true:

  1. You're failing to meet a hard performance requirement;
  2. You've already picked the proper algorithm and data structure for the problem at hand (e.g., in the average case a poorly optimized Quicksort will beat the pants off of a highly optimized bubble sort, and in the worst case they'll be equally bad);
  3. You're compiling with the highest level of optimization that the compiler offers;
John Bode
A: 

How much does myfunction(long *a, long *b) do?

If it does much more than *a = *b + 1;, the cost of calling the function can be so small compared to what goes on inside the function that you are really focusing on the wrong place.

On the other hand, in the overall picture of your application program, what percent of time is spent in these 8 calls? If it's not very much, then it won't make much difference no matter how tightly you optimize it.

As others say, profile, but that's not necessarily as simple as it sounds. Here's the method I and some others use.

Mike Dunlavey