If I call a virtual function 1000 times in a loop, will I suffer from the vtable lookup overhead 1000 times or only once?

+2  A: 

If the compiler can deduce that the object on which you're calling the virtual function doesn't change, then, in theory, it should be able to hoist the vtable lookup out of the loop.
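To make "hoisting the lookup" concrete, here is a rough sketch that models virtual dispatch by hand. The `Object`, `VTable` and `object_func` names are invented for illustration, and real compilers use their own ABI-specific layout, so treat this as a picture of the idea rather than of what any compiler emits.

```cpp
// Hand-rolled model of virtual dispatch (hypothetical names, not a real ABI).
struct Object;

struct VTable {
    int (*func)(Object* self);   // slot for the single "virtual" function
};

struct Object {
    const VTable* vptr;          // stands in for the hidden vtable pointer
    int value;
};

int object_func(Object* self) { return self->value; }

const VTable object_vtable = { &object_func };

// Naive form: load vptr, load the slot, then call, on every iteration.
long sum_naive(Object* obj, int n) {
    long total = 0;
    for (int i = 0; i < n; ++i) {
        total += obj->vptr->func(obj);
    }
    return total;
}

// Hoisted form: the two loads happen once, before the loop.
// The indirect call itself still happens n times.
long sum_hoisted(Object* obj, int n) {
    int (*fn)(Object*) = obj->vptr->func;
    long total = 0;
    for (int i = 0; i < n; ++i) {
        total += fn(obj);
    }
    return total;
}
```

The second form is only a valid rewrite if nothing inside the loop can change which function the slot refers to, which is exactly what the compiler has to prove.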

Whether your particular compiler actually does this is something you can only find out by looking at the assembly code it produces.

Martin B
... or by profiling. Write code that has to do 1000 lookups and compare.
Tadeusz A. Kadłubowski
But what would you compare it against? You can't compare it against a non-virtual function because that calls an absolute address as opposed to an indirect address. You also can't compare it against code that calls 1000 different objects because a) you have to get the addresses of those objects from somewhere, which takes extra time, and b) calling 1000 different objects is much less cache-friendly, so we would expect it to be slower anyway.
Martin B
Compare it against a volatile Foo*. Note that you first have to time 1000 non-virtual calls to see how much overhead you incur by reloading `this` on every call. Then compare volatile and non-volatile Foo*'s over 1000 virtual calls to see how much _additional_ overhead the vtable lookup incurs.
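A rough sketch of that four-way comparison follows. It assumes that "volatile Foo*" is meant as a volatile pointer (`Foo* volatile`), so that the pointer has to be reloaded before each call. The `Foo` members and the `time_ns` helper are made up for illustration.

```cpp
#include <chrono>

struct Foo {
    virtual ~Foo() = default;
    virtual int vfunc(int x) { return x + 1; }  // dispatched through the vtable
    int plain(int x) { return x + 1; }          // ordinary member function
};

// Each loop feeds its result back in, so the calls depend on one another.
int loop_plain(Foo* p, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i) acc = p->plain(acc);
    return acc;
}

int loop_plain_volatile(Foo* p, int n) {
    Foo* volatile vp = p;   // volatile pointer: must be re-read on every use
    int acc = 0;
    for (int i = 0; i < n; ++i) acc = vp->plain(acc);
    return acc;
}

int loop_virtual(Foo* p, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i) acc = p->vfunc(acc);
    return acc;
}

int loop_virtual_volatile(Foo* p, int n) {
    Foo* volatile vp = p;   // forces a fresh pointer load before each lookup
    int acc = 0;
    for (int i = 0; i < n; ++i) acc = vp->vfunc(acc);
    return acc;
}

// Times a single run of fn(p, n) in nanoseconds; the loop result goes to *out.
template <typename Fn>
long long time_ns(Fn fn, Foo* p, int n, int* out) {
    auto start = std::chrono::steady_clock::now();
    *out = fn(p, n);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A single run of 1000 calls is far too short to time reliably, so in practice you would repeat each measurement many times. An optimiser may also collapse the non-volatile loops entirely, so check the generated assembly before trusting the numbers.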
MSalters
That's a good point!
Martin B
+5  A: 

The compiler may be able to optimise it - for example, the following is (at least conceptually) easily optimised:

Foo * f = new Foo;
for ( int i = 0; i < 1000; i++ ) {
   f->func();
}

However, other cases are more difficult:

vector <Foo *> v;
// populate v with 1000 Foo (not derived) objects
// size_t matches the unsigned type returned by v.size()
for ( size_t i = 0; i < v.size(); i++ ) {
   v[i]->func();
}

The same conceptual optimisation is applicable, but it is much harder for the compiler to see.
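One way to see why it is harder: a function that only receives the vector has no proof that every element is exactly a Foo. The sketch below uses a hypothetical derived class `Bar` and a hypothetical helper `call_all` to show that the same loop has to work for mixed contents.

```cpp
#include <cstddef>
#include <vector>

struct Foo {
    virtual ~Foo() = default;
    virtual int func() { return 1; }
};

// Hypothetical derived class, added only for illustration.
struct Bar : Foo {
    int func() override { return 2; }
};

// Seen in isolation, this function cannot assume that all elements
// share one dynamic type, so each call may need its own lookup.
int call_all(const std::vector<Foo*>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i]->func();   // target depends on the dynamic type of *v[i]
    }
    return total;
}
```

Only when the compiler can see both the population of the vector and the loop does it have a chance of proving that every call lands on `Foo::func`.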

Bottom line - if you really care about it, compile your code with all optimisations enabled and examine the compiler's assembler output.

anon
+2  A: 

The Visual C++ compiler (at least through VS 2008) does not cache vtable lookups. Even more interestingly, it doesn't direct-dispatch calls to virtual methods where the static type of the object is sealed. However, the actual overhead of the virtual dispatch lookup is almost always negligible. The place where you sometimes do see a hit is in the fact that virtual calls in C++ cannot be replaced by direct calls like they can in a managed VM. This also means no inlining for virtual calls.
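For what "direct dispatch on a sealed type" would look like: standard C++ has no `sealed`, but since C++11 the closest equivalent is `final`. The sketch below uses made-up `Base` and `Leaf` types. Whether a given compiler actually turns the first call into a direct call is an assumption you would have to check in its output.

```cpp
struct Base {
    virtual ~Base() = default;
    virtual int func() { return 1; }
};

// 'final' guarantees that no class can derive from Leaf.
struct Leaf final : Base {
    int func() override { return 2; }
};

// Static type is Leaf*: the only possible target is Leaf::func,
// so a compiler is allowed to call it directly (and possibly inline it).
int call_leaf(Leaf* p) { return p->func(); }

// Static type is Base*: the target is not known here,
// so an indirect call through the vtable is normally required.
int call_base(Base* p) { return p->func(); }
```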

The only true way to establish the impact for your application is using a profiler.

Regarding the specifics of your original question: if the virtual method you are calling is trivial enough that the virtual dispatch itself is incurring a measurable performance impact, then that method is sufficiently small that the vtable will remain in the processor's cache throughout the loop. Even though the assembly instructions to pull the function pointer from the vtable are executed 1000 times, the performance impact will be much less than (1000 * time to load vtable from system memory).

280Z28
Thank you very much for your answer and comments. I'll check that on gcc when I get some time.
poulejapon
A: 

I would say this depends on your compiler as well as on the shape of the loop. Optimizing compilers can do a lot for you, and if the target of the virtual call is predictable, the compiler can help you. Your compiler's documentation may describe which of these optimizations it performs.

ManniAT
I know it "could", I don't know if it "does" help me.
poulejapon
+1  A: 

I think the problem is not the vtable lookup, since that is a very fast operation, especially in a loop where all the required values are already in cache (if the loop is not too complex; and if it is complex, the virtual call won't affect performance much anyway). The problem is that the compiler cannot inline the function at compile time.

This is especially a problem when the virtual function is very small (e.g. it only returns a value). The relative performance impact in this case can be huge, because you pay for a full function call just to return a value. If the function could be inlined, performance would improve considerably.
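A small sketch of the situation, with hypothetical `Shape`, `Triangle` and `PlainTriangle` types: both loops compute the same thing, but only the second one exposes the function body to the caller in a way that is trivially inlinable.

```cpp
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};

struct Triangle : Shape {
    int sides() const override { return 3; }   // tiny body behind a virtual call
};

struct PlainTriangle {
    int sides() const { return 3; }            // non-virtual, body visible to callers
};

// Through a Shape reference the target is not known at compile time,
// so each iteration normally pays for a real call.
int total_virtual(const Shape& s, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += s.sides();
    return total;
}

// Here the call can be inlined, after which the loop body is a plain addition.
int total_plain(const PlainTriangle& t, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += t.sides();
    return total;
}
```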

If the virtual function itself is expensive, then I wouldn't really worry about the vtable.

Aleksandar