views:

1433

answers:

10

A lot of literature talks about using inline functions to "avoid the overhead of a function call". However, I haven't seen any quantifiable data. What is the actual overhead of a function call, i.e. what sort of performance increase do we achieve by inlining functions?

+3  A: 

For very small functions inlining makes sense, because the (small) cost of the function call is significant relative to the (very small) cost of the function body. For most functions over a few lines it's not a big win.

Don Neufeld
+12  A: 

On most architectures, the cost consists of saving all (or some, or none) of the registers to the stack, pushing the function arguments to the stack (or putting them in registers), incrementing the stack pointer and jumping to the beginning of the new code. Then when the function is done, you have to restore the registers from the stack. This webpage has a description of what's involved in the various calling conventions.

Most C++ compilers are smart enough now to inline functions for you. The inline keyword is just a hint to the compiler. Some will even do inlining across translation units where they decide it's helpful.
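
As a rough illustration (a hypothetical sketch assuming x86-64 and the System V calling convention; the exact instructions depend on the compiler, flags and platform), here is roughly what the two cases boil down to:

    // Hypothetical sketch -- assumes x86-64 / System V; real output varies.
    int add(int a, int b) {
        return a + b;
    }

    int caller(int x) {
        // Out of line, the call roughly means:
        //   put the arguments in edi/esi (or push them on other ABIs),
        //   'call add' -- push the return address and jump,
        //   let add run its body (plus any frame setup/teardown) and 'ret'.
        // Inlined, the same line collapses to roughly one instruction:
        //   lea eax, [rdi + 5]
        return add(x, 5);
    }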

Eclipse
On x86 (and many other architectures) not ALL the registers need to be backed up, because some of them are expected to be clobbered by function calls anyway. The C calling convention on x86 typically doesn't preserve eax, ecx and edx.
Evan Teran
Pushing all function parameters to the stack is the C ABI. C++ does not specify a specific ABI as part of the standard (unlike C), thus allowing each compiler to optimize as required. Hence most C++ compilers don't push all parameters to the stack.
Martin York
@Martin York: C's ABI is not part of the standard -- it can't be, the standard is architecture agnostic, while the ABI depends on architecture. The standardized ABIs for C, which let it be used as a base interchange and glue language, are done by the OS or chip manufacturer. BeOS has a C++ ABI.
wnoise
Yup, and for Itanium Intel provided the C++ ABI.
MSalters
A: 

Each new function call requires a new local stack to be created. But the overhead of this would only be noticeable if you are calling a function on every iteration of a loop over a very large number of iterations.
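
For example (a hypothetical sketch), this is the kind of loop where it could show up; with optimization on, most compilers will simply inline the call and the overhead vanishes:

    // Hypothetical sketch: a tiny function called on every iteration of a hot loop.
    inline long long square(long long x) { return x * x; }

    long long sum_of_squares(int n) {
        long long sum = 0;
        for (int i = 0; i < n; ++i)
            sum += square(i);   // if not inlined, call/return overhead is paid n times
        return sum;
    }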

Ash
You mean "new stack frame", rather than "new stack", yes?
Andrew Edgecombe
A: 

For most functions, there is no additional overhead for calling them in C++ vs C (unless you count the "this" pointer as an unnecessary argument to every function; you have to pass state to a function somehow, though)...

For virtual functions, there is an additional level of indirection (equivalent to calling a function through a pointer in C)... But really, on today's hardware this is trivial.
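
A hypothetical sketch of that equivalence (the names are made up; the vtable layout itself is an implementation detail, but the shape of the call is the same):

    // C++: a virtual call is dispatched through the object's vtable.
    struct Shape {
        virtual double area() const = 0;
        virtual ~Shape() {}
    };

    double measure(const Shape& s) {
        return s.area();        // load the vptr, load the slot, indirect call
    }

    // Roughly the C equivalent: an explicit function pointer in the struct.
    struct shape_c {
        double (*area)(const struct shape_c*);
    };

    double measure_c(const struct shape_c* s) {
        return s->area(s);      // indirect call through the pointer
    }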

dicroce
The compiler will hardly ever inline a virtual function, so you're making a moot point. The only exception is when the object type is known at compile time; then the indirection can be skipped.
Mark Ransom
+2  A: 

The amount of overhead will depend on the compiler, CPU, etc. The percentage overhead will depend on the code you're inlining. The only way to know is to take your code and profile it both ways - that's why there's no definitive answer.
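
One minimal way to do that comparison (a hypothetical sketch; __attribute__((noinline)) is GCC/Clang-specific, MSVC spells it __declspec(noinline), and the numbers only mean anything on your own code and hardware):

    #include <cstdio>
    #include <ctime>

    // Hypothetical micro-benchmark sketch: the same loop, once calling a
    // function the compiler is forbidden to inline, once calling one it is
    // free to inline. Compile with optimization (e.g. -O2) and compare.
    __attribute__((noinline)) long long add_noinline(long long a, long long b) { return a + b; }
    inline long long add_inline(long long a, long long b) { return a + b; }

    int main() {
        const long long iterations = 100000000;
        volatile long long sink = 0;   // volatile keeps the loops from being optimized away

        std::clock_t t0 = std::clock();
        for (long long i = 0; i < iterations; ++i)
            sink = add_noinline(sink, i);
        std::clock_t t1 = std::clock();
        for (long long i = 0; i < iterations; ++i)
            sink = add_inline(sink, i);
        std::clock_t t2 = std::clock();

        std::printf("forced out of line: %.0f ms\n", 1000.0 * (t1 - t0) / CLOCKS_PER_SEC);
        std::printf("inlining allowed:   %.0f ms\n", 1000.0 * (t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }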

Mark Ransom
A: 

I don't have any numbers, either, but I'm glad you're asking. Too often I see people try to optimize their code starting with vague ideas of overhead, but not really knowing.

Andy Lester
A: 

There are a few issues here.

  • If you have a smart enough compiler, it will do some automatic inlining for you even if you did not specify inline. On the other hand, there are many things that cannot be inlined.

  • If the function is virtual, then of course you are going to pay the price that it cannot be inlined because the target is determined at runtime (a sketch follows after this list). Conversely, in Java, you might be paying this price unless you indicate that the method is final.

  • Depending on how your code is organized in memory, you may be paying a cost in cache misses and even page misses as the code is located elsewhere. That can end up having a huge impact in some applications.
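
A hypothetical sketch of the virtual-function point above:

    // Hypothetical sketch: a virtual call's target is chosen at runtime,
    // so the compiler generally cannot inline it.
    struct Codec {
        virtual int decode(int x) const = 0;
        virtual ~Codec() {}
    };

    int run(const Codec& c, int x) {
        return c.decode(x);   // dispatched through the vtable; not an inlining candidate here
    }

    // A non-virtual function has a fixed target and is an ordinary inlining
    // candidate -- the rough analogue of marking a Java method 'final'.
    int decode_fixed(int x) { return x * 2; }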

Uri
Sorry, the virtual bit is the wrong way around. If your object is on the stack, within that function the compiler will know the type at compile time. Hence, it can resolve calls immediately, and doesn't need to put in a vtable lookup.
MSalters
I'm not sure I see the point of this comment. Obviously, in situations where there is no dynamic dispatching this is not going to be an issue. However, in many cases you will be using pointers and potentially polymorphic code, and it is not uncommon to see code that upcasts or downcasts stack objects.
Uri
A: 

There is a great concept called 'register shadowing', which allows passing values (up to 6?) through registers (on the CPU) instead of the stack (memory). Also, depending on the function and the variables used within it, the compiler may just decide that frame management code is not required!

Also, even a C++ compiler may do a 'tail recursion optimization', i.e. if A() calls B(), and after calling B(), A just returns, the compiler will reuse the stack frame!

Of course, all this can be done only if the program sticks to the semantics of the standard (see pointer aliasing and its effect on optimizations).
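
A hypothetical sketch of the 'A() calls B(), then just returns' case (whether the call really becomes a jump depends on the compiler, the optimization level and the ABI):

    // Hypothetical sketch. On x86-64 (System V), 'x' arrives in a register
    // rather than on the stack, and because the call to B is the last thing
    // A does, an optimizing compiler may emit a plain 'jmp B' instead of a
    // call/return pair, reusing A's stack frame.
    int B(int x);       // assumed to be defined in some other translation unit

    int A(int x) {
        return B(x + 1);
    }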

Vardhan Varma
The optimization you describe is not 'tail recursion optimization'; what you describe is a real optimization, but tail recursion optimization is when a recursive function can be changed into a function which loops, because the recursion occurs at the end or 'tail' of the function.
Evan Teran
Both are actually 'tail call optimisation': tail recursion is simply the special case of tail calling that calls itself.
Simon Buchan
+4  A: 

There's the technical and the practical answer. The practical answer is that it will almost never matter, and in the very rare case it does, the only way you'll know is through actual profiled tests.

The technical answer, which your literature refers to, is generally not relevant due to compiler optimizations, but if you're still interested, it is well described by Josh.

As far as a "percentage" you'd have to know how expensive the function itself was. Outside of the cost of the called function there is no percentage because you are comparing to a zero cost operation. For inlined code there is no cost, the processor just moves to the next instruction. The downside to inling is a larger code size which manifests it's costs in a different way than the stack construction/tear down costs.

nedruod
A: 

It's worth pointing out that an inlined function increases the size of the calling function, and anything that increases the size of a function may have a negative effect on caching. If you're right at a boundary, "just one more wafer-thin mint" of inlined code might have a dramatically negative effect on performance.


If you're reading literature that's warning about "the cost of a function call," I'd suggest it may be older material that doesn't reflect modern processors. Unless you're in the embedded world, the era in which C is a "portable assembly language" has essentially passed. A large amount of the ingenuity of the chip designers in the past decade (say) has gone into all sorts of low-level complexities that can differ radically from the way things worked "back in the day."

Larry OBrien