Hey guys, I'm doing a bit of hands-on research into the speed benefits of making a function inline. I don't have the book with me, but one text I was reading suggested that function calls carry a fairly large overhead, and that whenever executable size is negligible or can be spared, a function should be declared inline for speed.

I've written the following code to test this theory, and from what I can tell, there is no speed benefit from declaring a function as inline. Both functions, when called 4294967295 times on my computer, execute in 196 seconds.

My question is, what would be your thoughts as to why this is happening? Is it modern compiler optimization? Would it be the lack of large calculations taking place in the function?

Any insight on the matter would be appreciated. Thanks in advance friends.

#include <iostream>
#include <stdio.h>  // getchar
#include <time.h>

// RESEARCH                                                   Jared Thomson 2010
////////////////////////////////////////////////////////////////////////////////
// Two functions that perform an identical arbitrary floating point calculation
// one function is inline, the other is not.

double test(double a, double b, double c);
double inlineTest(double a, double b, double c);

double test(double a, double b, double c){
    a = (3.1415 / 1.2345) / 4 + 5;
    b = 9.999 / a + (a * a);
    c = a *=b;
    return c;
}

inline
double inlineTest(double a, double b, double c){
    a = (3.1415 / 1.2345) / 4 + 5;
    b = 9.999 / a + (a * a);
    c = a *=b;
    return c;
}

// ENTRY POINT                                                Jared Thomson 2010
////////////////////////////////////////////////////////////////////////////////
int main(){
    const unsigned int maxUINT = -1;
    clock_t start = clock();

    //============================ NON-INLINE TEST ===============================//
    for(unsigned int i = 0; i < maxUINT; ++i)
        test(1.1,2.2,3.3);

    clock_t end = clock();
    std::cout << maxUINT << " calls to non inline function took " 
              << (end - start)/CLOCKS_PER_SEC << " seconds.\n";

    start = clock();

    //============================ INLINE TEST ===================================//
    for(unsigned int i = 0; i < maxUINT; ++i)
        test(1.1,2.2,3.3);

    end = clock();
    std::cout << maxUINT << " calls to inline function took " 
              << (end - start)/CLOCKS_PER_SEC << " seconds.\n";

    getchar(); // Wait for input.
    return 0;
} // Main.

Assembly Output

PasteBin

+8  A: 

The inline keyword is basically useless. It is a suggestion only. The compiler is free to ignore it and refuse to inline such a function, and it is also free to inline a function declared without the inline keyword.

If you are really interested in doing a test of function call overhead, you should check the resultant assembly to ensure that the function really was (or wasn't) inlined. I'm not intimately familiar with VC++, but it may have a compiler-specific method of forcing or prohibiting the inlining of a function (however the standard C++ inline keyword will not be it).

So I suppose the answer to the larger context of your investigation is: don't worry about explicit inlining. Modern compilers know when to inline and when not to, and will generally make better decisions about it than even very experienced programmers. That's why the inline keyword is often entirely ignored. You should not worry about explicitly forcing or prohibiting inlining of a function unless you have a very specific need to do so (as a result of profiling your program's execution and finding that a bottleneck could be solved by forcing an inline that the compiler has for some reason not done).

Re: the assembly:

; 30   :     const unsigned int maxUINT = -1;
; 31   :     clock_t start = clock();

    mov esi, DWORD PTR __imp__clock
    push    edi
    call    esi
    mov edi, eax

; 32   :     
; 33   :     //============================ NON-INLINE TEST ===============================//
; 34   :     for(unsigned int i = 0; i < maxUINT; ++i)
; 35   :         blank(1.1,2.2,3.3);
; 36   :     
; 37   :     clock_t end = clock();

    call    esi

This assembly is:

  1. Reading the clock
  2. Storing the clock value
  3. Reading the clock again

Note what's missing: calling your function a whole bunch of times.

The compiler has noticed that you don't do anything with the result of the function and that the function has no side-effects, so it is not being called at all.

You can likely get it to call the function anyway by compiling with optimizations off (in debug mode).

Tyler McHenry
__declspec(noinline) disables inlining for a function. http://msdn.microsoft.com/en-us/library/kxybs02x(VS.80).aspx
Nick
To see the assembly in VC++: Project Settings -> Configuration Properties -> C/C++ -> Output Files -> Assembler Output
Cogwheel - Matthew Orlando
@Nick that's for member functions. For static functions you need `#pragma auto_inline(...)`
tenpn
+1 for "check the resultant assembly".
Jon-Eric
link to assembly output added to the bottom of the post.
Xoorath
Thanks @Tyler, after building in release mode and getting an instantaneous execution, I figured that might be what's going on. I appreciate the feedback, and the time you took. I've certainly learned more about compiler optimization than inline functions, but I'm still pleased. Cheers
Xoorath
@Tyler McHenry: Compilers are good at making the smallest code, but they can't tell how often things will run. Reference parameters can be accessed much faster in inline code than non-inline code; who but a programmer would know whether a routine will be called often enough for that speed difference to matter?
supercat
@supercat A profiler will. Programmers are, in general, *not* very good at predicting in advance where their programs will spend the most time, except in the most obvious cases. That's why I said that manual inlining is only appropriate when you have a specific (read: demonstrated by measurement) need to do so.
Tyler McHenry
On modern PC-style systems, the only way to know what performance issues are going to arise is to profile things. Second-order issues such as caching have grown enormously in significance over the last decade, and will continue to do so. I do much of my work on embedded systems, which are much more predictable. To be fair, I often optimize things that turn out not to need it, because it's easy to do so. Why spend 80us reading a byte if it can be done in 40? The code's short and simple enough that it's worth learning how to get the best results.
supercat
A: 

Both the functions could be inlined. The definition of the non-inline function is in the same compilation unit as the usage point, so the compiler is within its rights to inline it even without you asking.

Post the assembly and we can confirm it for you.

EDIT: the MSVC compiler pragma for banning inlining is:

#pragma auto_inline(off)
    void myFunction() { 
        // ...
    }
#pragma auto_inline(on)
tenpn
+1  A: 

Two things could be happening:

  1. The compiler may be inlining either both functions or neither of them. Check your compiler documentation for how to control that.

  2. Your function may be complex enough that the overhead of doing the function call isn't big enough to make a big difference in the tests.

Inlining is great for very small functions but it's not always better. Code bloat can prevent the CPU from caching code.

In general, inline getter/setter functions and other one-liners. Then, during performance tuning, you can try inlining other functions if you think you'll get a boost.

Matt Edlefsen
+1  A: 

Um, shouldn't

//============================ INLINE TEST ===================================//
    for(unsigned int i = 0; i < maxUINT; ++i)
        test(1.1,2.2,3.3);

be

//============================ INLINE TEST ===================================//
    for(unsigned int i = 0; i < maxUINT; ++i)
         inlineTest(1.1,2.2,3.3);

?

But if that was just a typo, I would recommend that you look at a disassembler or reflector to see if the code is actually inlined or still makes a real call.

nonnb
@nonnb HAHAHA OH WOW. Very true, sir. I'll fix that, test it, and also read the assembly output as it's running. It takes forever in debug mode, but in release it always yields 0 seconds.
Xoorath
@Xoorath That's likely because with optimizations on (which is the difference between release and debug modes), the compiler is likely to notice that you aren't doing anything with the results of your functions, and to eliminate the calls to them entirely. And even if you did do something with the results, as a few others pointed out, since your function results don't depend on their arguments, it's likely that the compiler could do all the math at compile-time, leaving you with negligible runtime.
Tyler McHenry
A: 

If this test took 196 seconds for each loop, then you must not have turned optimizations on; with optimizations off, generally compilers don't inline anything.

With optimization on, however, the compiler is free to notice that your test function can be completely evaluated at compile time, and crush it down to "return [constant]" -- at which point, it may well decide to inline both functions since they're so trivial, and then notice that the loops are pointless since the function value is not used, and squash that out too! This is basically what I got when I tried it.

So either way, you're not testing what you thought you tested.


Function call overhead ain't what it used to be, compared to the overhead of blowing out the level-1 instruction cache, which is what aggressive inlining does to you. You can easily find reports online of gcc's -Os option (optimize for size) being a better default choice for large projects than -O2, and the big reason for that is that -O2 inlines a lot more aggressively. I would expect it is much the same with MSVC.

Zack
A: 

The only way I know of to guarantee a function is inline is to #define it

For example:

#define RADTODEG(x) ((x) * 57.29578)

That said, the only time I would bother with such a function would be in an embedded system. On a desktop/server the performance difference is negligible.

Doug
I'm a game developer, working on a math library. Speed is an asset. Thanks for the reply though, I've used defines in other applications like that before.
Xoorath
+1  A: 

Your code as posted contains a couple oddities.

1) The math and output of your test functions are completely independent of the function parameters. If the compiler is smart enough to detect that those functions always return the same value, that might give it incentive to optimize them out entirely, inline or not.

2) Your main function is calling test for both the inline and non-inline tests. If this is the actual code that you ran, then that would have a rather large role to play in why you saw the same results.

As others have suggested, you would do well to examine the actual assembly code generated by the compiler to determine that you're actually testing what you intended to.

TheUndeadFish
A: 

Run it in a debugger and have a look at the generated code to see if your function is always or never inlined. I think it's always a good idea to have a look at the assembler code when you want more knowledge about the optimization the compiler does.

IanH
+1  A: 

Apologies for a small flame ...

Compilers think in assembly language. You should too. Whatever else you do, just step through the code at the assembler level. Then you'll know exactly what the compiler did.

Don't think of performance in absolute terms like "fast" or "slow". It's all relative, percentage-wise. The way software is made fast is by removing, in successive steps, things that take too large a percent of the time.

Here's the flame: If a compiler can do a pretty good job of inlining functions that clearly need it, and if it can do a really good job of managing registers, I think that's just what it should do. If it can do a reasonable job of unrolling loops that clearly could use it, I can live with that. If it's knocking itself out trying to outsmart me by removing function calls that I clearly wrote and intended to be called, or scrambling my code sanctimoniously trying to save a JMP when that JMP occupies 0.000001% of running time (the way Fortran does), I get annoyed, frankly.

There seems to be a notion in the compiler world that there's no such thing as an unhelpful optimization. No matter how smart the compiler is, real optimization is the programmer's job, and nobody else's.

Mike Dunlavey
@Mike Dunlavey: Well put.
Xoorath