views:

331

answers:

6

Hi,

I have programmed an embedded software (using C of course) and now I'm considering ways to improve the running time of the system. The most important single module in my system is one very large nested for loop module.

That module consists of two nested for loops that loops max 122500 times. That's not very much yet, but the problem is that inside that nested for loop I have a function call to a function that is in another source file. That specific function consists mostly of two another nested for loops which loops always 22500 times. So now I have to make a function call 122500 times.

I have made that function that is to be called a lot lighter and shorter (yet still works as it should) and now I started to think that would it be faster to rip off that function call and write that process directly inside those first two for loops?

The processor in that system is ARM7TDMI and its frequency is 55MHz. The system itself isn't very time critical so it doesn't have to be real time capable. However the faster it can process its duties the better.

Also would it be also faster to use while loops instead of fors? And any piece of advice about how to improve the running time is appreciated.

-zaplec

+17  A: 

TRY IT AND SEE!!

It'll almost certainly make a difference. Function call overhead isn't usually that much of an issue, but at over 100K repetitions it starts to add up.

...But whether or not it makes any real-world difference is something only you can answer, after trying it and timing the results.

As for for vs while... it shouldn't matter unless you actually change the behavior when changing the loop. If in doubt, make your compiler spit out assembler code for both and compare... or just change it and time it.

Shog9
I guess that's the only wise way to go as those changes can be made quite easily now that I pre-optimized the code. Converting the code to assembly and estimating the performance would probably be too much work compared to simple try and see solution. Thanks to everyone for many fast answers :)
zaplec
A: 

For function vs. inline, unfortunately there is no easy answer. I.e. it depends. See this FAQ. For "for" vs. "while", I wouldn't think there is any significant difference in performance.

grokus
+3  A: 

You need to be careful in the optimizations you make because you aren't always clear on which optimizations the compiler is making for you. Pre-optimization is a common mistake people make. Is it important that your code is readable and easily maintained or slightly faster? Like others have suggested, the best approach is to benchmark the different ways and see if there is a noticeable difference.

If you don't believe your compiler does much in the way of optimization I would look at some older concepts in optimizing C (searches on SO or google should provide some good links).

NebuSoft
+1  A: 

Run a profiler on your code. If you are just guessing at where you are spending your time, you are probably wrong. A profiler will show what function is taking the most time and you can focus on that. You could be doing something in the function that takes longer than the function call itself. Did you look to see if you can change floating operations to integer, or integer math to shifts? You can spend a lot of time fiddling with things that don't make much difference. Run a profiler on your code and know for sure that the things you are changing will make a difference.

Jim Tshr
I'm not very familiar with profilers, but at least I'll try one. I guess it's very valuable to learn about them :)
zaplec
+2  A: 

The ARM processor has an instruction pipeline (cache). When the processor encounters a branch (call) instruction, it must clear the pipeline and reload, thus wasting some time. One objective when optimizing for speed is to reduce the number of reloads to the instruction pipeline. This means reducing branch instructions.

As others have stated in SO, compile your code with optimization set for speed, and profile. I prefer to look at the assembly language listing as well (either printed from the compiler or displayed interwoven in the debugger). Use this as a baseline. If you can't profile, you can use assembly instruction counting as a rough estimate.

The next step is to reduce the number of branches; or the number times a branch is taken. Unrolling loops helps to reduce the number of times a branch is taken. Inlining helps reduce the number of branches. Before applying this fine-tuning techniques, review the design and code implementation to see if branches can be reduced. For example, reduce the number of "if" statements by using Boolean arithmetic or using Karnaugh Maps. My favorite is reducing requirements and eliminating code that doesn't need to be executed.

In the code implementation, move code that doesn't change outside of the for or while loops. Some loops may be reduce to equations (example, replacing a loop of additions with a multiplication). Also, reduce the quantity of iterations, by asking "does this loop really need to be executed this many times").

Another technique is to optimize for Data Oriented Design. Also check this reference.

Just remember to set a limit for optimizing. This is where you decide any more optimization is not generating any ROI or customer satisfaction. Also, apply optimizations in stages; which will allow you to have a deliverable when your manager asks for one.

Thomas Matthews
A: 

In general, a function call should have more overhead than inlining. You really should profile however, as this can be affected quite a bit by your compiler (especially the compile/optimization settings). Some compilers will automatically inline code for example.

Gregory Gelfond