tags:

views:

295

answers:

8

Hi, I have kept hearing this statement. Switch..Case is Evil for code maintenance, but it provides better performance(since compiler can inline stuffs etc..). Virtual functions are very good for code maintenance, but they incur a performance penalty of two pointer indirections.

Say i have a base class with 2 subclasses(X and Y) and one virtual function, so there will be two virtual tables. The object has a pointer, based on which it will choose a virtual table. So for the compiler, it is more like

switch( object's function ptr )
{

   case 0x....:

       X->call();

       break;

   case 0x....:

       Y->call();
};

So why should virtual function cost more, if it can get implemented this way, as the compiler can do the same in-lining and other stuff here. Or explain me, why is it decided not to implement the virtual function execution in this way?

Thanks, Gokul.

A: 

There is no branching in virtual dispatch. The vptr in your class points to a vtable, with a second pointer for the specific function at a constant offset.

rlbond
If branching can help inlining, why is it not done that way?
Gokul
Technically that *is* a branch; an assembly writer refers to a jump to an address stored in a register as an "indirect branch".
Crashworks
Branching in a switch-like statement requires you know what all the branches may be. With vtable pointers, you can add a new descendant class to the program (possibly via a shared library loaded years after the host program's code shipped) without having to recompile *everything* that could possibly use that descendant through a base-class pointer.
Rob Kennedy
But when i compile and link to produce an executable, am i not telling the compiler/linker that this is a final executable and no more shared lib can get added as a descendant? Or in other words, can linker perform these optimizations?
Gokul
Not necessarily. For example with a dynamic library, if you update the dll you don't need to relink.
rlbond
+2  A: 

The compiler can't do that because of the separate compilation model.

At the time the virtual function call is being compiled, there is no way for the compiler to know for sure how many different subclasses there are.

Consider this code:

// base.h
class base
{
public:
    virtual void doit();
};

and this:

// usebase.cpp
#include "base.h"

void foo(base &b)
{
    b.doit();
}

When the compiler is generating the virtual call in foo, it has no knowledge of which subclasses of base will exist at runtime.

R Samuel Klatchko
I have asked the question already below. But can linker do this optimization?
Gokul
You could have the linker do it, but that would bring it's own problems. The compiler would have to leave space in the instruction stream for the linker to patch in the code - but how does the compiler know how much space to leave? You could have the compiler call into a linker generated function but then you're back to branching so you haven't gained anything.
R Samuel Klatchko
This is precisely the kind of thing that link-time code generation and profile-guided optimization could do. Note I say *could* -- I do not claim to know the gory details of either.
Patrick Johnmeyer
@Patrick!! Thanks, that was useful..
Gokul
Can you explain how boost::variant fits in here? thanks.
Gokul
Wouldn't the linker need to know the addresses of the functions to do this (think shared objects)?
Mark B
A: 

Your statement about the branching when calling virtual function is wrong. There is not such thing in generated code. Take a look at the assembly code will give you a better idea.

In a nut shell, one general simplified implementation of C++ virtual function is: each class will have a virtual table (vbtl), and each instance of the class will have a virtual table pointer (vptr). The virtual table is basically a list of function pointers.

When you are calling a virtual function, say it is like:

class Base {};
class Derived {};
Base* pB = new Derived();
pB->someVirtualFunction();

The 'someVirtualFunction()' will have a corresponding index in the vtbl. And the call

pB->someVirtualFunction(); 

will be converted to something like:

pB->vptr[k](); //k is the index of the 'someVirtualFunction'.

In this way the function is actually called indirectly and it has the polymorphism.

I suggest you to read 'The C++ Object Model' by Stanley Lippman.

Also, the statement that virtual function call is slower than the switch-case is not accruate. It depends. As you can see above, a virtual function call is just 1 extra time of dereference compared to a regular function call. And with switch-case branching you'd have extra comparision logic (which introduce the chance of missing cache for CPU) which also consumes CPU cycles. I would say in most case, if not all, virutal function call should be faster than switch-case.

Findekano
Then why do folks in boost community recommend to use boost::variant in place of Virtual functions, if we know the subclasses involved? I think boost::variant uses switch..case internally.
Gokul
I am not sure why you are comparing virtual function with boost::variant. They are solving different problems.
Findekano
If i read the problem/motivation of boost::variant, they are saying that. You can even read this thread http://lists.boost.org/Archives/boost/2010/02/162072.php
Gokul
@Gokul: First of all, what's the problem that the virtual function tries to solve? (dynamic) Plymorphism, which allows different types to be handled by a uniform interface. A virtual function call is not bound until runtime. The key here is that the type remains UNKNOWN until runtime. In the link you gave, the type of the function call is actually already known at compile time, that why it can be implemented in a switch-case manner. We can call it compile time plymorphism. This is a technique used widely in libs like WTL. The key here is that the type is KNOWN at compile time.
Findekano
@Gokul: Take a look at http://www.slideshare.net/reachanil/static-and-dynamic-polymorphism-in-c for more info.
Findekano
A: 

Actually, if you'll have many virtual functions, switch-like branching will be slower than two pointer indirection. Performance of the current implementation doesn't depend on how many virtual functions you have.

code ex machina
Yep! if there is a case for optimization, with lesser switch..case, then the compiler can selectively work on it right?
Gokul
A: 

Saying definitively that switch/case is more or less performant than virtual calls is an over-generalization. The truth is that this will depend on many things, and will vary based on:

  • what compiler you are using
  • what optimizations are enabled
  • the overall characteristics of your program and how they impact those optimizations

If you are optimizing the code in your head as you write it, there is a good chance that you are making the wrong choice. Write the code in a human-readable and/or user-friendly way first, then run the entire executable through profiling tools. If this area of the code shows up as a hotspot, then try it both ways and see which is quantifiably better for your particular case.

Patrick Johnmeyer
+1  A: 

Your question rests on misunderstandings about the way switches and virtual functions work. Rather than fill up this box with a long treatise on code generation, I'll give a few bullet points:

  • Switch statements aren't necessarily faster than virtual function calls, or inlined. You can learn more about the way that switch statements are turned into assembly here and here.
  • The thing that is slow about virtual function calls isn't the pointer lookups, it's the indirect branch. For complicated reasons having to do with the internal electronics of the CPU, for most modern processors it is faster to perform a "direct branch", where the destination address is encoded in the instruction, than an "indirect branch", where the address is computed at runtime. Virtual function calls and large switch statements are usually implemented as indirect branches.
  • In your example above, the switch is completely redundant. Once an object's member function pointer has been computed, the CPU can branch straight to it. Even if the linker was aware of every possible member object that existed in the executable, it would still be unnecessary to add that table lookup.
Crashworks
can you explain me how boost::variant fits in here? Thanks..
Gokul
A: 

Here's some results from concrete tests. These particular results are from VC++ 9.0/x64:

Test Description: Time to test a global using a 10-way if/else if statement
CPU Time:        7.70  nanoseconds           plus or minus      0.385

Test Description: Time to test a global using a 10-way switch statement
CPU Time:        2.00  nanoseconds           plus or minus     0.0999

Test Description: Time to test a global using a 10-way sparse switch statement
CPU Time:        3.41  nanoseconds           plus or minus      0.171

Test Description: Time to test a global using a 10-way virtual function class
CPU Time:        2.20  nanoseconds           plus or minus      0.110

With sparse cases, the switch statement is substantially slower. With dense cases, the switch statement might be faster, but the switch and the virtual function dispatch overlap a bit, so while the switch is probably faster, the margin is so small we can't even be sure it is faster, not to mention being enough faster to care much about. If the cases in the switch statement are sparse at all, there's no real question that the virtual function call will be faster.

Jerry Coffin
Nice statistics. Can you explain me on why boost::variant is recommended over virtual functions in boost?
Gokul
@Gokul:About all I could do is read through their mailing list to try to figure out their reasoning. It might be out of date, or it might apply only in certain situations, or ...
Jerry Coffin
A: 

These kind of optimizations are possible only by a repatching linker which should run as a part of C++ runtime.

The C++ runtime is more complex that even a new DLL load (with COM) will add new function pointers to the vtable. (think about pure virtual fns?)

Then compiler or linker both cannot do this optimization. switch/case is obviously faster than indirect call since prefetch in CPU is deterministic and pipelining is possible. But it will not work out in C++ because of this runtime extension of object's vtable.

vprajan