views: 379
answers: 10

My function will be called thousands of times. If I want to make it faster, will changing the local function variables to static be of any use? My logic behind this is that, because static variables are persistent between function calls, they are allocated only the first time, so every subsequent call will skip the memory allocation step for them and therefore become faster.

Also, if the above is true, would using global variables instead of parameters be a faster way to pass information to the function every time it is called? I think space for parameters is also allocated on every function call, to allow for recursion (that's why recursion uses up more memory), but since my function is not recursive, and if my reasoning is correct, then removing the parameters should in theory make it faster.

I know these things I want to do are horrible programming habits, but please tell me whether it is wise. I am going to try it anyway, but please give me your opinion.

+8  A: 

The best way to find out is to actually run a profiler. This can be as simple as executing several timed tests using both methods and then averaging out the results and comparing, or you may consider a full-blown profiling tool which attaches itself to a process and graphs out memory use over time and execution speed.
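A minimal sketch of such a timed comparison, assuming invented stand-in variants (compute_local and compute_static are placeholder names; substitute the function you actually care about):

    #include <stdio.h>
    #include <time.h>

    /* Stand-in variants; replace these with the real function under test. */
    int compute_local(int x)  { int acc = x;             return acc * 2 + 1; }
    int compute_static(int x) { static int acc; acc = x; return acc * 2 + 1; }

    static double time_variant(int (*fn)(int), int iterations)
    {
        volatile int sink = 0;      /* keeps the calls from being optimised away */
        clock_t start = clock();
        for (int i = 0; i < iterations; i++)
            sink += fn(i);
        (void)sink;
        return (double)(clock() - start) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        const int iterations = 100000;
        printf("local  variant: %f s\n", time_variant(compute_local,  iterations));
        printf("static variant: %f s\n", time_variant(compute_static, iterations));
        return 0;
    }

Run it several times and average; a single run of a loop this short is dominated by noise.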

Do not perform random micro code-tuning because you have a gut feeling it will be faster. Compilers all have slightly different implementations of things and what is true on one compiler on one environment may be false on another configuration.

To tackle the comment about fewer parameters: the process of "inlining" functions essentially removes the overhead related to calling a function. Chances are a small function will be automatically inlined by the compiler, but you can also suggest that a function be inlined.
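For instance, a sketch of an inlining hint in C (the compiler is free to ignore it, or to inline without it):

    /* A function this small will usually be inlined automatically at -O2;
       'static inline' merely suggests it. */
    static inline int square(int x)
    {
        return x * x;
    }

    int sum_of_squares(int a, int b)
    {
        return square(a) + square(b);   /* no call overhead once inlined */
    }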

In a different language, C++, the new standard coming out supports perfect forwarding and move semantics with rvalue references, which removes the need for temporaries in certain cases and can reduce the cost of calling a function.

I suspect you're prematurely optimizing; you should not be this concerned with performance until you've discovered your real bottlenecks.

M2tM
+1 for being sensible and resisting the urge to hazard a guess. :)
quixoto
Thanks! I tried it, as I said I would. My program already had a piece of code that counted the seconds it took to do its thing; what took 60 seconds before the static/global change now takes 49 seconds. I still cannot say it was a good idea, but it did seem to work this time, giving consistent results :) I didn't know about the compiler optimizations or that the stack was also used for the local variables of functions (I am still much of a newbie). Also, I will for sure look into C++0x when it's here (all of its features: I think the rvalue thing and lambdas are already in GCC :D). Thanks!!
+3  A: 

Absolutely not! The only "performance" difference is in when the variables are initialised:

    int anint = 42;
 vs
    static int anint = 42;

In the first case the integer will be set to 42 every time the function is called; in the second case it will be set to 42 when the program is loaded.
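A small sketch of that difference (names invented for illustration):

    #include <stdio.h>

    void counters(void)
    {
        int local = 0;               /* re-initialised to 0 on every call */
        static int persistent = 0;   /* initialised once, before the program starts */
        local++;
        persistent++;
        printf("local = %d, persistent = %d\n", local, persistent);
    }

    int main(void)
    {
        counters();   /* prints: local = 1, persistent = 1 */
        counters();   /* prints: local = 1, persistent = 2 */
        return 0;
    }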

However, the difference is so trivial as to be barely noticeable. It's a common misconception that storage has to be allocated for "automatic" variables on every call. This is not so: C uses space on the stack that is already allocated for these variables.

Static variables may actually slow you down, as some aggressive optimisations are not possible on static variables. Also, as locals are in a contiguous area of the stack, they are easier to cache efficiently.

James Anderson
+1  A: 

Yes, using static variables will make a function a tiny bit faster. However, this will cause problems if you ever want to make your program multi-threaded. Since static variables are shared between function invocations, invoking the function simultaneously in different threads will result in undefined behaviour. Multi-threading is the type of thing you may want to do in the future to really speed up your code.
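For example, here is a sketch of the kind of function that becomes unsafe once a local is made static (next_id is an invented name):

    /* With 'static', every caller in every thread shares the same counter,
       and the read-modify-write below is not atomic, so two threads calling
       at once can race and hand out duplicate or skipped ids. */
    int next_id(void)
    {
        static int counter = 0;
        counter++;
        return counter;
    }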

Most of the things you mentioned are referred to as micro-optimizations. Generally, worrying about these kinds of things is a bad idea. It makes your code harder to read and harder to maintain. It's also highly likely to introduce bugs. You'll likely get more bang for your buck doing optimizations at a higher level.

As M2tM suggests, running a profiler is also a good idea. Check out gprof for one which is quite easy to use.

Michael Mior
+1  A: 

You can always time your application to truly determine what is fastest. Here is what I understand (all of this depends on the architecture of your processor, btw):

C functions create a stack frame, which is where passed parameters and local variables are put, as well as the return address back to where the caller called the function. There is no memory-management allocation here; it is usually a simple pointer adjustment and that's it. Accessing data off the stack is also pretty quick. Penalties usually come into play when you're dealing with pointers.

As for global or static variables, they're the same...from the standpoint that they're going to be allocated in the same region of memory. Accessing these may use a different method of access than local variables; it depends on the compiler.

The major difference between your scenarios is memory footprint, not so much speed.
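One way to see the different regions is to print a few addresses; exact values and layout vary by platform, so this is only a sketch:

    #include <stdio.h>

    int a_global;                       /* data/bss region */

    void show_addresses(void)
    {
        static int a_static;            /* same general region as the global */
        int a_local;                    /* stack */
        printf("global: %p\n", (void *)&a_global);
        printf("static: %p\n", (void *)&a_static);
        printf("local:  %p\n", (void *)&a_local);
    }

    int main(void)
    {
        show_addresses();
        return 0;
    }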

KFro
This is an important point - as long as your variables aren't initialised, allocating 100 automatic variables is just as fast as allocating one.
caf
It should be noted that the compiler is "allocating" the memory, not a memory management system.
KFro
+14  A: 

The overhead of local variables is zero. Each time you call a function, you are already setting up the stack for the parameters, return values, etc. Adding local variables means that you're adding a slightly bigger number to the stack pointer (a number which is computed at compile time).

Also, local variables are probably faster due to cache locality.

If you are only calling your function "thousands" of times (not millions or billions), then you should be looking at your algorithm for optimization opportunities after you have run a profiler.


Re: cache locality (read more here): Frequently accessed global variables probably have temporal locality. They also may be copied to a register during function execution, but will be written back into memory (cache) after a function returns (otherwise they wouldn't be accessible to anything else; registers don't have addresses).

Local variables will generally have both temporal and spatial locality (they get that by virtue of being created on the stack). Additionally, they may be "allocated" directly to registers and never be written to memory.
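A small example of locals that typically never touch memory once the optimiser is on (you can check by inspecting the assembly output, e.g. with gcc -S -O2):

    /* 'sum' and 'i' are ordinary locals; with optimisation on, compilers
       typically keep them in registers for the whole loop. If 'sum' were
       static, its final value would have to be stored back to memory. */
    int sum_array(const int *a, int n)
    {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }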

Seth
+1 in terms of modern CPU speeds, "a thousand times a second" is "once every few million cycles".
Jurily
+1 though it depends of course on how the compiler generates code. For intelligent compilers, the difference is between `sub sp, 20` and `sub sp, 24`, which is no difference at all.
paxdiablo
+1: I was just writing nearly the same response.
drewk
I don't really understand the stack. I thought only the parameters were pushed onto it on a function call; I thought the local variables were made with some sort of malloc. Now I did a little research and kind of understand the difference between dynamic and stack-based allocation. Thanks :D
Could you explain the "cache locality" thing further? I thought, with local variables there is nothing to cache between calls.
Jurily
Not quite zero: not all functions need to set up the stack and manage it; it depends on the code, how many locals there are, etc. (and the target processor). If all of your locals are static, for example (which is very similar to using globals from a compiler perspective), you might save a few instructions, so zero overhead is not a true statement. On average, though, the difference between "local globals" (locals with the word static) and local variables is minimal, and not necessarily the place you need to look for performance gains.
dwelch
A: 

I agree with the others' comments about profiling to find out stuff like that, but generally speaking, function static variables should be slower. If you want them, what you are really after is a global. Function statics insert code/data to check whether the variable has been initialized already, and that check runs every time your function is called.

Chris Marsh
A: 

Using static variables can actually make your code significantly slower. Static variables must exist in a 'data' region of memory. In order to use that variable, the function must execute a load instruction to read it from main memory, or a store instruction to write to it. If that region is not in the cache, you lose many cycles. A local variable that lives on the stack will almost surely have an address that is in the cache, and might even be in a CPU register, never appearing in memory at all.

TokenMacGuy
*every time the function gets called, it has to check to make sure that the static variable is [not] initialized yet* <- This is incorrect. Before main() runs, all of the static variables are initialized (in __start()). Globals are also initialized at this time.
reemrevnivek
Generally a load instruction will be used both for a local on the stack and for a 'local' in the data region. Getting the variable initialized the first time is a good point; good coding requires that if/then/else. Knowing whether your compiler/environment zeros that memory on program launch is a shortcut to this: risky, bad coding style, but it often works and is fast(er).
dwelch
@dwelch: the point is that a local may not appear in main memory at all, it can be optimized (safely) to live in a register only.
TokenMacGuy
Yes, I understand that; with or without the static you get the same optimization. With the static there is a memory location reserved in .data for that variable, which may never get used. Without the static there is sometimes a stack location reserved for that variable, which may never get used.
dwelch
+2  A: 

There is no one answer to this. It will vary with the CPU, the compiler, the compiler flags, the number of local variables you have, what the CPU's been doing before you call the function, and quite possibly the phase of the moon.

Consider two extremes: if you have only one or a few local variables, they might easily be stored in registers rather than allocated memory locations at all. If register "pressure" is sufficiently low, this may happen without executing any instructions at all.

At the opposite extreme there are a few machines (e.g., IBM mainframes) that don't have stacks at all. In this case, what we'd normally think of as stack frames are actually allocated as a linked list on the heap. As you'd probably guess, this can be quite slow.

When it comes to accessing the variables, the situation's somewhat similar -- access to a machine register is pretty well guaranteed to be faster than anything allocated in memory can possibly hope for. OTOH, it's possible for access to variables on the stack to be pretty slow -- it normally requires something like an indexed indirect access, which (especially with older CPUs) tends to be fairly slow. OTOH, access to a global (which a static is, even though its name isn't globally visible) typically requires forming an absolute address, which some CPUs penalize to some degree as well.

Bottom line: even the advice to profile your code may be misplaced -- the difference may easily be so tiny that even a profiler won't detect it dependably, and the only way to be sure is to examine the assembly language that's produced (and spend a few years learning assembly language well enough to be able to say anything when you do look at it). The other side of this is that when you're dealing with a difference you can't even measure dependably, the chances that it'll have a material effect on the speed of real code are so remote that it's probably not worth the trouble.

Jerry Coffin
Um, Jerry, notice
Will
+1  A: 

It looks like static vs non-static has been completely covered, but on the topic of global variables: often these will slow down a program's execution rather than speed it up.

The reason is that tightly scoped variables make it easy for the compiler to optimise heavily; if the compiler has to look all over your application for places where a global might be used, then its optimising won't be as good.

This is compounded when you introduce pointers. Say you have the following code:

    int myFunction()
    {
        SomeStruct *A, *B;   /* assume both point at valid SomeStruct objects */
        FillOutSomeStruct(B);
        memcpy(A, B, sizeof(*A));
        return A->result;
    }

The compiler knows that the pointers A and B can never overlap, and so it can optimise the copy. If A and B are global then they could possibly point to overlapping or identical memory; this means the compiler must 'play it safe', which is slower. The problem is generally called 'pointer aliasing' and can occur in lots of situations, not just memory copies.

http://en.wikipedia.org/wiki/Pointer_alias
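One way to hand the compiler that "can never overlap" guarantee explicitly is C99's restrict qualifier. A sketch (calling this with overlapping pointers would be undefined behaviour):

    #include <stddef.h>

    /* The 'restrict' qualifiers promise the compiler that dst and src do not
       overlap, so it is free to vectorise or reorder the copy aggressively. */
    void copy_ints(int * restrict dst, const int * restrict src, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }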

Daniel
A: 

Profiling may not show the difference; disassembling and knowing what to look for might.

I suspect you are only going to get a variation of a few clock cycles per loop (on average, depending on the compiler, etc.). Sometimes the change will be a dramatic improvement, or dramatically slower, and that won't necessarily be because the variable's home has moved to/from the stack. Let's say you save four clock cycles per function call for 10,000 calls on a 2 GHz processor. Very rough calculation: 40,000 cycles, which at 2 GHz is about 20 microseconds saved. Is 20 microseconds a lot or a little compared to your current execution time?

You will likely get more of a performance improvement by making all of your char and short variables into ints, among other things. Micro-optimization is a good thing to know about, but it takes lots of time experimenting, disassembling, and timing the execution of your code, and understanding that fewer instructions does not necessarily mean faster, for example.

Take your specific program and disassemble both the function in question and the code that calls it, with and without the static. If you gain only one or two instructions and this is the only optimization you are going to do, it is probably not worth it. You may not be able to see the difference while profiling; changes in where the cache lines hit could show up in profiling before changes in the code do, for example.
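For example, the pair to compare might look like the sketch below; compile each with something like gcc -O2 -S and diff the assembly:

    /* Version A: plain local */
    int scale_a(int x)
    {
        int factor = 3;
        return x * factor;
    }

    /* Version B: the same variable made static */
    int scale_b(int x)
    {
        static int factor = 3;
        return x * factor;
    }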

dwelch