We all know what virtual functions are in C++, but how are they implemented at a deep level?

Can the vtable be modified or even directly accessed at runtime?

Does the vtable exist for all classes, or only those that have at least one virtual function?

Do abstract classes simply have a NULL for the function pointer of at least one entry?

Does having a single virtual function slow down the whole class, or only the call to the function that is virtual? And is the speed affected by whether the virtual function is actually overridden, or does this make no difference so long as it is virtual?

A: 

Usually with a VTable, an array of pointers to functions.

Lou Franco
+1  A: 

Each object has a vtable pointer that points to an array of pointers to the class's virtual member functions.

+27  A: 

How are virtual functions implemented at a deep level?

From "Virtual Functions in C++"

Whenever a program has a virtual function declared, a v-table is constructed for the class. The v-table consists of addresses to the virtual functions for classes that contain one or more virtual functions. The object of the class containing the virtual function contains a virtual pointer that points to the base address of the virtual table in memory. Whenever there is a virtual function call, the v-table is used to resolve to the function address. An object of the class that contains one or more virtual functions contains a virtual pointer called the vptr at the very beginning of the object in the memory. Hence the size of the object in this case increases by the size of the pointer. This vptr contains the base address of the virtual table in memory. Note that virtual tables are class specific, i.e., there is only one virtual table for a class irrespective of the number of virtual functions it contains. This virtual table in turn contains the base addresses of one or more virtual functions of the class. At the time when a virtual function is called on an object, the vptr of that object provides the base address of the virtual table for that class in memory. This table is used to resolve the function call as it contains the addresses of all the virtual functions of that class. This is how dynamic binding is resolved during a virtual function call.
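
A minimal sketch of what the quoted passage describes, assuming a typical implementation that stores one hidden vptr in each polymorphic object (the standard does not require this). The names Plain, Shape, Square, area_of and vptr_overhead are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical types, for illustration only.
struct Plain {                       // no virtual functions
    int x = 0;
    int area() const { return 0; }
};

struct Shape {                       // polymorphic: typically carries a hidden vptr
    int x = 0;
    virtual int area() const { return 0; }
    virtual ~Shape() {}
};

struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
};

// The static type here is Shape; which area() runs is decided at run time,
// on common implementations by looking it up through the object's vptr.
int area_of(const Shape& s) { return s.area(); }

// Extra bytes per object on this implementation
// (usually one pointer, possibly plus padding).
std::size_t vptr_overhead() { return sizeof(Shape) - sizeof(Plain); }
```

Passing a Square to area_of runs Square::area even though the parameter type is Shape, and vptr_overhead reports how much the hidden pointer costs per object on the compiler at hand.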

Can the vtable be modified or even directly accessed at runtime?

Portably, I believe the answer is "no". You could do some memory mangling to find the vtable but you still wouldn't know what the function signature looks like to call it. Anything that you would want to achieve with this ability (that the language supports) should be possible without accessing the vtable directly or modifying it at runtime. Also note that the C++ language spec does not require vtables; they are simply how most compilers implement virtual functions.

Does the vtable exist for all objects, or only those that have at least one virtual function?

I believe the answer here is "it depends on the implementation" since the spec doesn't require vtables in the first place. However, in practice, I believe all modern compilers only create a vtable if a class has at least 1 virtual function. There is a space overhead associated with the vtable and a time overhead associated with calling a virtual function vs a non-virtual function.

Do abstract classes simply have a NULL for the function pointer of at least one entry?

The answer is it is unspecified by the language spec so it depends on the implementation. Calling the pure virtual function results in undefined behavior if it is not defined (which it usually isn't) (ISO/IEC 14882:2003 10.4-2). In practice it does allocate a slot in the vtable for the function but does not assign an address to it. This leaves the vtable incomplete which requires the derived classes to implement the function and complete the vtable. Some implementations do simply place a NULL pointer in the vtable entry; other implementations place a pointer to a dummy method that does something similar to an assertion.

Note that an abstract class can define an implementation for a pure virtual function, but that function can only be called with a qualified-id syntax (ie., fully specifying the class in the method name, similar to calling a base class method from a derived class). This is done to provide an easy to use default implementation, while still requiring that a derived class provide an override.
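
A short sketch of a pure virtual function that nevertheless has a definition, called through a qualified-id as described above. Logger, FileLogger and describe are hypothetical names chosen for the example.

```cpp
#include <string>

// Hypothetical names, for illustration only.
struct Logger {
    virtual std::string prefix() const = 0;   // pure: Logger stays abstract
    virtual ~Logger() {}
};

// A pure virtual function may still have a definition, given outside the class.
std::string Logger::prefix() const { return "[log] "; }

struct FileLogger : Logger {
    // The override is still mandatory, but it can reuse the default
    // implementation through a qualified call.
    std::string prefix() const override { return Logger::prefix() + "file: "; }
};

std::string describe(const Logger& l) { return l.prefix(); }
```

Logger cannot be instantiated, yet describe(FileLogger()) yields "[log] file: " because the override explicitly calls Logger::prefix().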

Does having a single virtual function slow down the whole class or only the call to the function that is virtual?

This is getting to the edge of my knowledge, so someone please help me out here if I'm wrong!

I believe that only the functions that are virtual in the class experience the time performance hit related to calling a virtual function vs. a non-virtual function. The space overhead for the class is there either way. Note that if there is a vtable, there is only 1 per class, not one per object.

Does the speed get affected if the virtual function is actually overridden or not, or does this have no effect so long as it is virtual?

I don't believe that calling a virtual function that has been overridden is any faster or slower than calling the base virtual function. However, there is an additional space overhead associated with defining another vtable for the derived class in addition to the one for the base class.

Additional Resources:

http://www.codersource.net/published/view/325/virtual_functions_in.aspx

http://en.wikipedia.org/wiki/Virtual_table

http://www.codesourcery.com/public/cxx-abi/abi.html#vtable

Burly
note i changed objects to classes for the 3rd question
Brian R. Bondy
It would not be in line with Stroustrup's philosophy of C++ for a compiler to put an unnecessary vtable pointer in an object which doesn't need it. The rule is that you don't get overhead that isn't in C unless you ask for it, and it's rude for compilers to break that.
Steve Jessop
I agree that it would be foolish for any compiler that takes itself seriously to use a vtable when no virtual functions exist. However, I felt it important to point out that, to my knowledge, the C++ standard does not /require/ it, so be warned before depending on it.
Burly
Thanks Brian! (and thanks onebyone, I ran out of characters)
Burly
Actually, forget that, I briefly thought he was asking whether the objects point to the vtable, but on re-reading he's just asking whether the vtable exists for the class. Since it would be 0-size, the point is moot :-)
Steve Jessop
I have added my comment to the answer to the second question as a separate answer.
Andrew Stein
A great answer, an example for others!
Roel
Even virtual functions can be called non-virtually. This is in fact quite common: if the object is on the stack, within scope the compiler will know the exact type and optimize out the vtable lookup. This is especially true for the dtor, which must be called in the same stack scope.
MSalters
There are a number of tricks and gotchas that the compiler can use/introduce, depending on its implementation. I tried to stay away from implementation-specific answers as much as I could, but that was impossible since his questions are not answered by the language spec. I had to draw the line somewhere.
Burly
Good addition though, MSalters. Thanks!
Burly
I believe that when a class has at least one virtual function, every object has a vtable, and not one for the entire class.
Asaf R
Asaf R - I don't believe this is correct. You define functions on a per class basis, not on a per instance basis. Therefore there is no need for vtables on a per object basis. However, there is some special "magic" during object construction/destruction since the object is not fully formed yet.
Burly
Common implementation: Each object has a pointer to a vtable; the class owns the table. The construction magic simply consists of updating the vtable pointer in the derived ctor, after the base ctor has finished.
MSalters
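
The construction behaviour described in the comment above can be observed without touching the vtable at all. A minimal sketch, with hypothetical Base and Derived classes: a virtual call made inside the base constructor resolves to the base version, which is consistent with the vptr being switched to the derived table only after the base constructor has finished.

```cpp
#include <string>

// Hypothetical classes, for illustration only.
struct Base {
    std::string seen_in_ctor;
    // While Base's constructor runs, the object is still "a Base",
    // so this call reaches Base::name(), not the override.
    Base() { seen_in_ctor = name(); }
    virtual std::string name() const { return "Base"; }
    virtual ~Base() {}
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};
```

After constructing a Derived, seen_in_ctor holds "Base", while a later call to name() through a Base reference returns "Derived".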
+1  A: 

This answer has been incorporated into the Community Wiki answer

  • Do abstract classes simply have a NULL for the function pointer of at least one entry?

The answer for that is that it is unspecified - calling the pure virtual function results in undefined behavior if it is not defined (which it usually isn't) (ISO/IEC 14882:2003 10.4-2). Some implementations do simply place a NULL pointer in the vtable entry; other implementations place a pointer to a dummy method that does something similar to an assertion.

Note that an abstract class can define an implementation for a pure virtual function, but that function can only be called with a qualified-id syntax (ie., fully specifying the class in the method name, similar to calling a base class method from a derived class). This is done to provide an easy to use default implementation, while still requiring that a derived class provide an override.

Michael Burr
Also, I don't think that an abstract class can define an implementation for a pure virtual function. By definition, a pure virtual function has no body (e.g. bool my_func() = 0;). You can, however, provide implementations for regular virtual functions.
Burly
A pure virtual function can have a definition. See Scott Meyers' "Effective C++, 3rd Ed" Item #34, ISO 14882-2003 10.4-2, or http://bytes.com/forum/thread572745.html
Michael Burr
Ahh, right you are. Nice Mike!
Burly
A: 

Burly's answers are correct here except for the question:

Do abstract classes simply have a NULL for the function pointer of at least one entry?

The answer is that no virtual table is created at all for abstract classes. There is no need since no objects of these classes can be created!

In other words if we have:

class B { public: virtual ~B() = 0; }; // Abstract Base class
B::~B() {} // a pure virtual destructor still needs a definition
class D : public B { public: ~D() {} }; // Concrete Derived class

D* pD = new D();
B* pB = pD;

The vtbl pointer accessed through pB will be the vtbl of class D. This is exactly how polymorphism is implemented. That is, how D methods are accessed through pB. There is no need for a vtbl for class B.

In response to Mike's comment below...

If the B class in my description has a virtual method foo() that is not overridden by D and a virtual method bar() that is overridden, then D's vtbl will have a pointer to B's foo() and to its own bar(). There is still no vtbl created for B.

Andrew Stein
This is not correct for 2 reasons: 1) an abstract class may have regular virtual methods in addition to pure virtual methods, and 2) pure virtual methods may optionally have a definition that can be called with a fully qualified name.
Michael Burr
Yes, this is incorrect in the general case.
Burly
Right - on second thought I imagine that if all virtual methods were pure virtual the compiler might optimize the vtable away (it would need help from the linker to ensure there were no definitions as well).
Michael Burr
Mike and Burly: Think about it again.....
Andrew Stein
+1  A: 
  • Can the vtable be modified or even directly accessed at runtime?

Not portably, but if you don't mind dirty tricks, sure!

In most compilers I've seen, the vtbl * is the first pointer-sized field of the object (4 bytes on a 32-bit target, 8 on a 64-bit one), and the vtbl contents are simply an array of function pointers (generally in the order they were declared, with the base class's first). There are of course other possible layouts, but that's what I've generally observed.

class A {
  public:
  virtual int f1() = 0;
};
class B : public A {
  public:
  virtual int f1() { return 1; }
  virtual int f2() { return 2; }
};
class C : public A {
  public:
  virtual int f1() { return -1; }
  virtual int f2() { return -2; }
};

A *x = new B;
A *y = new C;
A *z = new C;

Now to pull some shenanigans...

Changing class at runtime:

std::swap(*(void **)x, *(void **)y);
// Now x is a C, and y is a B! Hope they used the same layout of members!

Replacing a method for all instances (monkeypatching a class)

This one's a little trickier, since the vtbl itself is probably in read-only memory.

int f3(A*) { return 0; }

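// mprotect (<sys/mman.h>) requires a page-aligned address, so round the vtbl
// address down to its page; this assumes the entry does not straddle a page
long pagesize = sysconf(_SC_PAGESIZE); // <unistd.h>
void *page = (void *)((uintptr_t)*(void **)x & ~(uintptr_t)(pagesize - 1)); // <cstdint>
mprotect(page,pagesize,PROT_READ|PROT_WRITE|PROT_EXEC);
// Or VirtualProtect on win32; this part's very OS-specific
(*(int (***)(A *))x)[0] = f3;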
// Now C::f1() returns 0 (remember we made x into a C above)
// so x->f1() and z->f1() both return 0

The latter is rather likely to make virus-checkers and the like wake up and take notice, due to the mprotect manipulations. In a process using the NX bit it may well fail.

puetzk
+1  A: 

You can recreate the functionality of virtual functions in C++ using function pointers as members of a class and static functions as the implementations, or using pointers to member functions and member functions for the implementations. The two methods differ only in notation... in fact virtual function calls are just a notational convenience themselves. In fact inheritance is just a notational convenience... it can all be implemented without using the language features for inheritance. :)

The code below is untested and probably buggy, but hopefully demonstrates the idea.

e.g.

class Foo
{
protected:
 void (*MyFunc)(Foo*); // pointer to the current implementation
public:
 Foo() { MyFunc = 0; }
 void ReplicatedVirtualFunctionCall()
 {
  MyFunc(this);
 }
...
};

class Bar : public Foo
{
private:
 static void impl1(Foo* f)
 {
  ...
 }
public:
 Bar() { MyFunc = impl1; }
...
};

class Baz : public Foo
{
private:
 static void impl2(Foo* f)
 {
  ...
 }
public:
 Baz() { MyFunc = impl2; }
...
};
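
For comparison, a complete version of the same idea that should compile as-is. This is a sketch with hypothetical names (Animal, Bird, Dog, count_legs), not what any compiler actually generates: each object stores a single function pointer directly, whereas a real implementation typically stores one pointer to a table shared by the whole class.

```cpp
// Hypothetical names; a hand-rolled stand-in for one virtual function.
class Animal {
protected:
    typedef int (*LegsFn)(const Animal*);
    LegsFn legs_impl;                              // plays the role of a one-entry vtable
    explicit Animal(LegsFn fn) : legs_impl(fn) {}
public:
    int legs() const { return legs_impl(this); }   // the "virtual" call
};

class Bird : public Animal {
    static int impl(const Animal*) { return 2; }
public:
    Bird() : Animal(&Bird::impl) {}                // install Bird's implementation
};

class Dog : public Animal {
    static int impl(const Animal*) { return 4; }
public:
    Dog() : Animal(&Dog::impl) {}                  // install Dog's implementation
};

// Works on any Animal without knowing its concrete type.
int count_legs(const Animal& a) { return a.legs(); }
```

count_legs returns 2 for a Bird and 4 for a Dog, even though Animal declares no virtual functions.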
jheriko