I seem to recall reading somewhere that the cost of a virtual call in C# is not as high, relatively speaking, as in C++. Is this true? If so - why?

+2  A: 

The cost of a virtual call in C++ is that of a function call through a pointer (vtbl). I doubt that C# can do that any faster and still be able to determine the object type at runtime...

Edit: As Pete Kirkham pointed out, a good JIT might be able to inline the C# call, avoiding a pipeline stall; something most C++ compilers cannot do (yet). On the other hand, Ian Ringrose mentioned the impact on cache usage. Add to that the cost of the JIT itself running, and (strictly personally) I wouldn't bother unless profiling on the target machine under realistic workloads has proven one to be faster than the other. It's micro-optimization at best.

DevSolar
+3  A: 

I guess this assumption is based on the JIT compiler, meaning that C# probably converts a virtual call into a simple method call shortly before it is actually used.

But it's essentially theoretical and I would not bet on it!

Benoît
Even if this were the case, that "converting into a simple method call" is not free either, now is it?
DevSolar
No, of course not. But you would have nothing left to pay when doing the actual call (like paying in advance).
Benoît
Point is, the next time you make that call, you'd have to check the object again. In general, its type might change, so obj.foo() refers to a different foo each time. Note that C++ compilers can often convert the virtual call to a normal call as well, if the object type is known at compile time.
MSalters
+1  A: 

Not sure about the full framework, but in the Compact Framework it will be slower because CF has no virtual call tables, although it does cache the result. This means that a virtual call in CF will be slower the first time it is called, as it has to do a manual lookup. It may be slow every time it is called if the app is low on memory, as the cached lookup may be pitched.

Quibblesome
+1  A: 

For JIT compiled languages (I don't know if CLR does this or not, Sun's JVM does), it's a common optimisation to convert a virtual call which has only two or three implementations into a sequence of tests on the type and direct or inline calls.

The advantage of this is that modern pipelined CPUs can use branch prediction and prefetching of direct calls, but an indirect call (represented by a function pointer in high level languages) often results in the pipeline stalling.

In the limiting case, where there is only one implementation of the virtual call and the body of the call is small enough, the virtual call is reduced to purely inline code. This technique was used in the Self language runtime, which the JVM evolved from.

Most C++ compilers don't perform the whole-program analysis required for this optimisation, but projects such as LLVM are looking at whole-program optimisations such as this.
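To make the "sequence of tests on the type" idea concrete, here is a hand-written C++ sketch of the transformation. It is only an illustration: the Shape hierarchy and the function names are made up for this example, and a real JIT would compare a type pointer inside generated machine code rather than use typeid (which itself has to look at the object's type information).

```cpp
#include <typeinfo>

// Hypothetical hierarchy, invented for this sketch.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};

struct Triangle final : Shape {
    int sides() const override { return 3; }
};

struct Square final : Shape {
    int sides() const override { return 4; }
};

struct Pentagon : Shape {
    int sides() const override { return 5; }
};

// Ordinary virtual dispatch: an indirect call through the vtable.
int sides_virtual(const Shape& s) {
    return s.sides();
}

// Guarded version: test for the types assumed to be most common and
// make a direct (qualified, hence non-virtual) call that the compiler
// is free to inline. Anything else falls back to virtual dispatch.
int sides_guarded(const Shape& s) {
    if (typeid(s) == typeid(Triangle)) {
        return static_cast<const Triangle&>(s).Triangle::sides();
    }
    if (typeid(s) == typeid(Square)) {
        return static_cast<const Square&>(s).Square::sides();
    }
    return s.sides();  // rare or unknown type: normal virtual call
}
```

Both functions return the same result for every Shape; the guarded one just trades an indirect call for predictable conditional branches on the expected types. Whether that is actually a win would have to be measured.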

Pete Kirkham
Are you sure it always causes a pipeline stall? There's no reason a CPU can't prefetch an indirect call (and I imagine it's something Intel would target specifically...). If it can be prefetched then the overhead of a virtual function will be zero.
Jimmy J
It was a few years ago I last checked, so I may be wrong. The only reference I can find on Intel for predicting indirect branches is in their profile directed compilation; most of their docs just say they're 'very hard to predict', and other research says 99% of stalls are from indirect calls.
Pete Kirkham
I think the 2nd time the same indirect call is taken from the same call site, there will not be a stall. E.g. if a loop is making a virtual call on many objects of the same type, it will be OK.
Ian Ringrose
Intel VTune will detect such stalls, so if you have it you can run a test. In the meantime I've weakened the 'always' to 'often'.
Pete Kirkham
+5  A: 

A C# virtual call has to check for “this” being null and a C++ virtual call does not. So I can’t see why, in general, a C# virtual call would be faster. In special cases the C# compiler (or JIT compiler) may be able to inline the virtual call better than a C++ compiler, as a C# compiler has access to better type information. The call method instruction may sometimes be slower in C++, as the C# JIT may be able to use a quicker instruction that only copes with a small offset, since it knows more about the runtime memory layout and processor model than a C++ compiler.

However, we are talking about a handful of processor instructions at most here. On a modern superscalar processor, it is very possible that the “null check” instruction is run at the same time as the “call method” and therefore takes no time.

It is also very likely that all the processor instructions will already be in the level 1 cache if the call is made in a loop. The data is less likely to be cached, though, and reading a data value from main memory these days costs about as much as running hundreds of instructions from the level 1 cache. It is therefore unlikely that in real applications the cost of a virtual call is even measurable in more than a very few places.

The fact that the C# code uses a few more instructions will of course reduce the amount of code that can fit in the cache; the effect of this is impossible to predict.

Ian Ringrose
A: 

In C# it might be possible to convert a virtual function to non-virtual by analysing the code. In practice it won't happen often enough to make much difference.

Jimmy J
A: 

C# flattens the vtable and inlines ancestor calls so you don't chain up the inheritance hierarchy to resolve anything.

Peter Wone
A: 

It may not be exactly the answer to your question, but while the .NET JIT optimizes virtual calls as everyone said before, profile-guided optimization for C++ in Visual Studio 2005 and 2008 does virtual call speculation by inserting a direct call to the most likely target function and inlining it, so the cost may be the same.

macbirdie
+4  A: 

The original question says:

I seem to recall reading somewhere that the cost of a virtual call in C# is not as high, relatively speaking, as in C++.

Note the emphasis on "relatively speaking". In other words, the question might be rephrased as:

I seem to recall reading somewhere that in C#, virtual and non-virtual calls are equally slow, whereas in C++ a virtual call is slower than a non-virtual call...

So the questioner is not claiming that C# is faster than C++ under any circumstances.

Possibly a useless diversion, but this sparked my curiosity concerning C++ with /clr:pure, using no C++/CLI extensions. The compiler produces IL that gets converted to native code by the JIT, although it is pure C++. So here we have a way of seeing what a standard C++ implementation does if running on the same platform as C#.

With a non-virtual method:

struct Plain
{
    void Bar() { System::Console::WriteLine("hi"); }
};

This code:

Plain *p = new Plain();
p->Bar();

... causes the call opcode to be emitted with the specific method name, passing Bar an implicit this argument.

call void <Module>::Plain.Bar(valuetype Plain*)

Compare with an inheritance hierarchy:

struct Base
{
    virtual void Bar() = 0;
};

struct Derived : Base
{
    void Bar() { System::Console::WriteLine("hi"); }
};

Now if we do:

Base *b = new Derived();
b->Bar();

That emits the calli opcode instead, which jumps to a computed address - so there's a lot of IL before the call. By turning it back into C# we can see what is going on:

**(*((int*) b))(b);

In other words, cast b to a pointer to int (which happens to be the same size as a pointer) and take the value at that location, which is the address of the vtable. Then take the first item in the vtable, which is the address to jump to, dereference it and call it, passing it the implicit this argument.
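The same steps can be spelled out in portable C++ with a hand-rolled vtable. This is only a sketch of the mechanism under assumed names (Object, VTable, call_bar and friends are invented here); the layout a real compiler uses is implementation-defined, and Bar returns an int in this version purely so the result can be checked.

```cpp
struct Object;

// Hypothetical one-slot vtable: the first (and only) entry is Bar.
struct VTable {
    int (*bar)(Object* self);
};

// The vtable pointer is stored in the first word of the object,
// which is what the cast to int* in the decompiled code relies on.
struct Object {
    const VTable* vptr;
    int value;
};

// "Derived::Bar", written as a free function taking an explicit this.
int derived_bar(Object* self) {
    return self->value * 2;
}

const VTable derived_vtable = { &derived_bar };

Object make_derived(int value) {
    return Object{ &derived_vtable, value };
}

// Equivalent of b->Bar(): the three steps described above.
int call_bar(Object* b) {
    const VTable* table = b->vptr;        // value at the start of the object
    int (*target)(Object*) = table->bar;  // first item in the vtable
    return target(b);                     // indirect call, passing this
}
```

With these definitions, an Object created by make_derived(21) yields 42 from call_bar. The indirect call on the last line of call_bar is what calli does in the IL.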

We can tweak the virtual example to use C++/CLI extensions:

ref struct Base
{
    virtual void Bar() = 0;
};

ref struct Derived : Base
{
    virtual void Bar() override { System::Console::WriteLine("hi"); }
};

Base ^b = gcnew Derived();
b->Bar();

This generates the callvirt opcode, exactly as it would in C#:

callvirt instance void Base::Bar()

So when compiling to target the CLR, Microsoft's current C++ compiler doesn't have the same possibilities for optimization as C# does when using the standard features of each language; for a standard C++ class hierarchy, the C++ compiler generates code that contains hard-coded logic for traversing the vtable, whereas for a ref class it leaves it to the JIT to figure out the optimal implementation.

Daniel Earwicker
This is about C++ on top of the CLR, which isn't really fair play IMHO.
DevSolar
Also, what does "fair" mean, in this context?
Daniel Earwicker
Johann asked about virtual calls in C# being "cheaper" than in C++. I take "C++" to mean "C++ compiled to native". So MSC++ cannot compile a C++ virtual call into IL "callvirt". The "why not?" has to be aimed at the MS compiler, not the language, doesn't it?
DevSolar
The part where I said "Possibly a useless diversion, but this sparked my curiosity concerning..." is where you can stop reading if you don't care.
Daniel Earwicker
:-) I like this. I purposely kept the question phrasing a bit open-ended.
Johann Gerell
I didn't mean to offend. Perhaps my hindbrain didn't register that disclaimer long enough to disclaim the whole of your article. ;-)
DevSolar
No problem - I wasn't offended, actually more worried by that point that I had wasted other people's time as well as my own... :)
Daniel Earwicker