views: 170
answers: 4
At school, we learned about virtual functions in C++ and how they are resolved (or looked up, or dispatched; I'm not sure of the English terminology, since we don't study in English) at execution time instead of compile time. The teacher also told us that compile-time resolution is much faster than execution-time resolution (and it would make sense for it to be so). However, a quick experiment suggests otherwise. I built this small program:

#include <iostream>
#include <limits.h>

using namespace std;

class A {
    public:
    void f() {
        // do nothing
    }
};

class B: public A {
    public:
    void f() {
        // do nothing
    }
};

int main() {
    unsigned int i;
    A *a = new B;
    for (i=0; i < UINT_MAX; i++) a->f();
    return 0;
}

I compiled the program above and named the binary normal. Then, I modified A to look like this:

class A {
    public:
    virtual void f() {
        // do nothing
    }
};

Compiled and named it virtual. Here are my results:

[felix@the-machine C]$ time ./normal 

real    0m25.834s
user    0m25.742s
sys 0m0.000s
[felix@the-machine C]$ time ./virtual 

real    0m24.630s
user    0m24.472s
sys 0m0.003s
[felix@the-machine C]$ time ./normal 

real    0m25.860s
user    0m25.735s
sys 0m0.007s
[felix@the-machine C]$ time ./virtual 

real    0m24.514s
user    0m24.475s
sys 0m0.000s
[felix@the-machine C]$ time ./normal 

real    0m26.022s
user    0m25.795s
sys 0m0.013s
[felix@the-machine C]$ time ./virtual 

real    0m24.503s
user    0m24.468s
sys 0m0.000s

There seems to be a steady ~1 second difference in favor of the virtual version. Why is this?


Relevant or not: dual-core Pentium @ 2.80GHz, no extra applications running between tests. Arch Linux with gcc 4.5.0. Compiled normally, like:

$ g++ test.cpp -o normal

Also, compiling with -Wall doesn't produce any warnings.


Edit: I have separated my program into A.cpp, B.cpp and main.cpp. I also made f() (both A::f() and B::f()) actually do something (x = 0 - x, where x is a public int member of A, initialized to 1 in A::A()). I compiled this into six versions; here are my final results:

[felix@the-machine poo]$ time ./normal-unoptimized 

real    0m31.172s
user    0m30.621s
sys 0m0.033s
[felix@the-machine poo]$ time ./normal-O2

real    0m2.417s
user    0m2.363s
sys 0m0.007s
[felix@the-machine poo]$ time ./normal-O3

real    0m2.495s
user    0m2.447s
sys 0m0.000s
[felix@the-machine poo]$ time ./virtual-unoptimized 

real    0m32.386s
user    0m32.111s
sys 0m0.010s
[felix@the-machine poo]$ time ./virtual-O2

real    0m26.875s
user    0m26.668s
sys 0m0.003s
[felix@the-machine poo]$ time ./virtual-O3

real    0m26.905s
user    0m26.645s
sys 0m0.017s

The unoptimized version is still about 1 second faster when virtual, which I find a bit peculiar. But this was a nice experiment, and I would like to thank all of you for your answers!

+10  A: 

Once the vtable is in the cache, the performance difference between virtual and non-virtual functions that actually do something is very small. It's certainly not something you should normally concern yourself with when developing software using C++. And as others have pointed out, benchmarking unoptimised code in C++ is pointless.

anon
+1 in the real world, @Felix, worrying about things like this is called *micro-optimizing*, and is not something good programmers waste their time doing - programming large systems is hard enough as it is. The only time you really worry about the speed of virtual calls (and other micro-optimizations) is when you are calling that function millions of times a second.
BlueRaja - Danny Pflughoeft
+6  A: 

Profiling unoptimised code is pretty much meaningless. Use -O2 to produce a meaningful result. Using -O3 may result in even faster code, but it may not generate a realistic outcome unless you compile A::f and B::f separately to main (i.e., in separate compilation units).

Based on the feedback, perhaps even -O2 is too aggressive. The 2 ms result is because the compiler optimised the loop away entirely. Direct calls aren't that fast; in fact, it ought to be very difficult to observe any appreciable difference. Move the implementations of f into a separate compilation unit to get real numbers: define the classes in a .h, but define A::f and B::f in their own .cc files.

Marcelo Cantos
Holy crap! Both `-O2` and `-O3` reduced the `normal` version to .002 seconds, while the `virtual` stayed at ~27 seconds.
Felix
@Felix The optimiser has probably removed the function call (and maybe the loop) altogether - you have to make the function have a global side effect to prevent this.
anon
@Felix: Note that this is also not representative of the overhead of a normal virtual function call: In the `normal` case probably all the function calls and the whole loop got optimized away.
sth
If you want a more realistic scenario, move the definitions of `A::f` and `B::f` into a separate file. That way, the code generator won't know what those functions do and will have to generate actual calls (unless you have link-time/whole-program optimization enabled).
R Samuel Klatchko
@Felix: The optimizer optimized the non-virtual call, and even the for-loop, away completely. In actuality, virtual calls are not much slower than non-virtual calls (e.g. calling an empty virtual function should be faster than dividing an int by a float).
BlueRaja - Danny Pflughoeft
+2  A: 

Given how much the CPU does under the hood, reorganizing your code and interleaving computation with memory access, I wouldn't read too much into a 4% difference; you can't draw any sensible conclusions from a microbenchmark like this.

Try a real computation, with real memory access, to get a feel for how much the virtual method is costing you. The virtual call itself usually isn't the problem - a modern CPU will interleave the vtable pointer fetch with other work - it's the lack of inlining that kills performance.

mdma
+1  A: 

Given the simplicity of the program, there is a decent chance the compiler is optimizing certain things away. With this kind of experiment, you should aim to add complexity, or otherwise make the compiler compile exactly what you want (at runtime the difference is, I believe, only two extra dereferences, so less than the rest of the function call). One way to do this is, as @Marcelo said, to compile A and B in files separate from main; I would go a step further and compile each in its own file. I disagree with him, however, in that, for the reasons stated above, you should turn optimizations OFF, so that the compiler produces a literal translation of your code and doesn't remove things.

Jared P