I have read a vague statement that virtual inheritance doesn't provide the memory layout required by COM, so we have to use normal inheritance. Virtual inheritance was invented to handle the diamond problem.

Could someone illustrate the difference in memory layout between these two inheritance approaches, and explain the key reason why virtual inheritance is not suitable for COM? A picture would be best.

Many thanks.

+4  A: 

First, in COM the behavior of virtual inheritance is always used: QueryInterface can't return a different value for, e.g., the IUnknown base pointer depending on which derived interface was used to obtain it.

But you're correct that this isn't the same mechanism as virtual inheritance in C++. C++ doesn't use a QueryInterface function to upcast, so it needs another way of getting the base class pointer.

The memory layout issue arises because COM requires that all methods of a base interface can be called directly through a derived interface pointer. AddRef is a good example: in COM, you can call AddRef and pass any derived interface pointer as the this pointer. In C++, the AddRef implementation would expect the this pointer to be of type IUnknown* const. The difference is that in C++ the caller finds the base pointer, while in COM the callee does the adjustment, so each derived interface needs a distinct implementation (of QueryInterface, at least) that is aware of the offset from the derived interface pointer passed in to the base pointer.
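To make the "callee does the adjustment" point concrete, here is a minimal sketch in the hand-written C style that early COM code often used. It is not the real Windows SDK declaration: IFoo, IBar, Object, FromFoo and FromBar are hypothetical names, only AddRef/Release are modeled (QueryInterface is omitted), and there are no HRESULTs or calling conventions. Each interface's entry points subtract that interface's own offset to recover the one underlying object, so callers simply pass whatever interface pointer they hold.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C-style, COM-like layout (NOT the real <unknwn.h> declarations).
// Each interface is a struct holding only a pointer to a table of function
// pointers; the interface pointer itself is passed as the explicit "this".
struct IFoo;
struct IBar;

struct IFooVtbl {
    unsigned long (*AddRef)(IFoo* self);   // IUnknown-style slot
    unsigned long (*Release)(IFoo* self);  // IUnknown-style slot
    int (*Foo)(IFoo* self);                // IFoo's own method
};
struct IBarVtbl {
    unsigned long (*AddRef)(IBar* self);
    unsigned long (*Release)(IBar* self);
    int (*Bar)(IBar* self);
};
struct IFoo { const IFooVtbl* vtbl; };
struct IBar { const IBarVtbl* vtbl; };

// The object embeds one interface pointer per implemented interface,
// plus a single shared reference count.
struct Object {
    IFoo foo;
    IBar bar;
    unsigned long refs;
};

// Callee-side adjustment: each entry point knows where "its" interface sits
// inside Object and subtracts that offset (the same idea as CONTAINING_RECORD).
Object* FromFoo(IFoo* p) {
    return reinterpret_cast<Object*>(reinterpret_cast<char*>(p) - offsetof(Object, foo));
}
Object* FromBar(IBar* p) {
    return reinterpret_cast<Object*>(reinterpret_cast<char*>(p) - offsetof(Object, bar));
}

// Distinct thunks per interface, all updating the same counter.
unsigned long FooAddRef(IFoo* p) { return ++FromFoo(p)->refs; }
unsigned long FooRelease(IFoo* p) { return --FromFoo(p)->refs; }
int FooFoo(IFoo*) { return 1; }
unsigned long BarAddRef(IBar* p) { return ++FromBar(p)->refs; }
unsigned long BarRelease(IBar* p) { return --FromBar(p)->refs; }
int BarBar(IBar*) { return 2; }

const IFooVtbl kFooVtbl = {FooAddRef, FooRelease, FooFoo};
const IBarVtbl kBarVtbl = {BarAddRef, BarRelease, BarBar};

// Usage: the caller passes whichever interface pointer it holds, unadjusted.
unsigned long AddRefThroughBoth(Object& obj) {
    IFoo* foo = &obj.foo;
    IBar* bar = &obj.bar;
    // Different "this" values, yet both thunks land on the same object.
    assert(FromFoo(foo) == FromBar(bar));
    foo->vtbl->AddRef(foo);
    return bar->vtbl->AddRef(bar);
}
```

Starting from `Object obj{{&kFooVtbl}, {&kBarVtbl}, 1};`, one AddRef through each interface leaves a single count of 3, because both thunks adjust to the same `Object`.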

At first glance, a C++ compiler could choose, as an implementation detail, to have the callee perform the adjustment just like COM. But the pointer-to-member-function rules aren't compatible with this implementation of virtual base classes.

Ben Voigt
+1  A: 

A COM coclass can implement multiple interfaces, but each individual interface must have a v-table with pointers to all of the methods brought in by its 'base' interfaces, at a minimum IUnknown. If it implements, say, IPersistFile, then it must provide implementations of the three IUnknown methods, of IPersist::GetClassID, and of the IPersistFile-specific methods.

This happens to match the behavior of most C++ compilers when they implement non-virtual multiple inheritance. The compiler sets up an individual v-table for each inherited (pure abstract) class and fills it with method pointers so that one common class method implements all the methods the interfaces have in common. In other words, no matter how many interfaces are implemented, methods like QueryInterface, AddRef and Release are each serviced by a single class method.

That is exactly how you'd want it to work. A single implementation of AddRef/Release keeps reference counting simple: the coclass object stays alive no matter how many different interface pointers you hand out. QueryInterface is trivial to implement, since a simple cast yields an interface pointer to a v-table with the correct layout.
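The arrangement described above can be sketched in portable C++. IUnknownLike, IFooLike, IBarLike, InterfaceId and CoClass are hypothetical stand-ins (no GUIDs, HRESULTs or calling conventions), so this only illustrates the inheritance pattern, not the real Windows SDK types. One class inherits non-virtually from two interfaces that share an IUnknown-like base, a single AddRef/Release serves both, and QueryInterface is just a cast.

```cpp
#include <cassert>

// Hypothetical stand-ins for IUnknown and two derived interfaces.
enum class InterfaceId { Unknown, Foo, Bar };

struct IUnknownLike {
    virtual bool QueryInterface(InterfaceId iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
};
struct IFooLike : IUnknownLike { virtual int Foo() = 0; };
struct IBarLike : IUnknownLike { virtual int Bar() = 0; };

// Non-virtual multiple inheritance: the IFooLike and IBarLike parts each get
// their own v-table, but the IUnknownLike slots in both point at the single
// QueryInterface/AddRef/Release defined here.
class CoClass final : public IFooLike, public IBarLike {
public:
    bool QueryInterface(InterfaceId iid, void** out) override {
        switch (iid) {
        case InterfaceId::Unknown:
            // IUnknownLike appears twice (via IFooLike and IBarLike), so a direct
            // cast is ambiguous; always use one path so identity stays stable.
            *out = static_cast<IUnknownLike*>(static_cast<IFooLike*>(this));
            break;
        case InterfaceId::Foo: *out = static_cast<IFooLike*>(this); break;
        case InterfaceId::Bar: *out = static_cast<IBarLike*>(this); break;
        default: *out = nullptr; return false;
        }
        AddRef();
        return true;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }
    int Foo() override { return 1; }
    int Bar() override { return 2; }

private:
    unsigned long refs_ = 1;  // the creator holds the first reference
};

// Usage: two interface pointers, one shared reference count.
unsigned long RefsAfterQueryingBar(IFooLike* foo) {
    void* p = nullptr;
    bool ok = foo->QueryInterface(InterfaceId::Bar, &p);
    assert(ok && p != nullptr);
    IBarLike* bar = static_cast<IBarLike*>(p);
    unsigned long refs = bar->AddRef();  // same counter foo->AddRef() would bump
    bar->Release();
    bar->Release();
    return refs;
}
```

Asking either interface for `InterfaceId::Unknown` returns the same pointer, which mirrors the COM identity rule mentioned in the first answer.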

Virtual inheritance is not required, and it would most likely break COM because the v-tables would no longer have the required layout. Getting that layout right is tricky for any compiler; review the /vm options of the MSVC compiler, for example. That COM is so uncannily compatible with the typical behavior of a C++ compiler is no accident.

By the way, this all hits the fan when a coclass wants to implement multiple interfaces that share a method name but aren't meant to do the same thing. That's a pretty big oops and hard to deal with. It's mentioned in ATL Internals (DAdvise?); I sadly forgot the solution.
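For the name-clash case, one commonly used C++ workaround (offered here as an assumption; it may or may not be what ATL Internals describes) is to put an intermediate "shim" class between each interface and the coclass. The shim overrides the clashing method in that interface's own v-table slot and forwards it to a uniquely named virtual. All names below (IGraphic, IHolster, GraphicShim, HolsterShim, Cowboy, RenderDraw, WeaponDraw) are hypothetical.

```cpp
#include <cassert>

// Hypothetical interfaces that both declare Draw() with unrelated meanings.
struct IGraphic { virtual int Draw() = 0; };
struct IHolster { virtual int Draw() = 0; };

// Intermediate shims: each overrides the clashing Draw() in its own interface
// and forwards to a uniquely named pure virtual for the final class.
struct GraphicShim : IGraphic {
    int Draw() override { return RenderDraw(); }
    virtual int RenderDraw() = 0;
};
struct HolsterShim : IHolster {
    int Draw() override { return WeaponDraw(); }
    virtual int WeaponDraw() = 0;
};

struct Cowboy final : GraphicShim, HolsterShim {
    int RenderDraw() override { return 1; }  // paints the picture
    int WeaponDraw() override { return 2; }  // pulls the gun
};

// Usage: which Draw() runs depends only on the interface the caller holds.
int DrawVia(IGraphic& g) { return g.Draw(); }
int DrawVia(IHolster& h) { return h.Draw(); }

void CheckBothDraws() {
    Cowboy c;
    assert(DrawVia(static_cast<IGraphic&>(c)) == 1);
    assert(DrawVia(static_cast<IHolster&>(c)) == 2);
}
```

This works because each interface has its own v-table and therefore its own Draw slot; the shim simply fills the two slots with different code.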

Hans Passant
+1. It's also worth noting that consumers can only use the designated layout for the interface they requested, and nothing more. It may be tempting to cheat if you are running single-apartment and control both the COM object and the consumer. However, cross-apartment/DCOM calls won't work and will fail horribly: interfaces subject to marshaling are actually proxies standing in for remote objects, and they almost never point to the original provider object.
meklarian
+1  A: 

COM interfaces are rather like Java interfaces in one respect: they don't have data members. This means that interface inheritance differs from class inheritance when multiple inheritance is used.

To start with, consider non-virtual inheritance with diamond-shaped inheritance patterns...

  • B inherits A
  • C inherits A
  • D inherits B and C

An instance of D contains two separate instances of the data members of A. That means that when a pointer-to-A points into an instance of D, it needs to identify which instance of A within D it means: the pointer is different in each case, so pointer casts are not simple relabellings of the type; the address changes too.
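A small sketch of the non-virtual diamond, using the A, B, C and D from the list above with hypothetical data members a, x, y and z, shows the two copies of A and the two different A addresses inside one D.

```cpp
#include <cassert>

// Non-virtual diamond: B and C each carry their own copy of A's data.
struct A { int a = 0; };
struct B : A { int x = 0; };
struct C : A { int y = 0; };
struct D : B, C { int z = 0; };

// Converting D* to A* directly is ambiguous, so the caller must choose a
// path, and the two paths yield two different A subobjects.
A* AViaB(D& d) { return static_cast<A*>(static_cast<B*>(&d)); }
A* AViaC(D& d) { return static_cast<A*>(static_cast<C*>(&d)); }

// Usage: writing through one path leaves the other A untouched.
void ShowTwoCopies() {
    D d;
    AViaB(d)->a = 1;
    AViaC(d)->a = 2;
    assert(d.B::a == 1 && d.C::a == 2);
}
```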

Now consider the same diamond with virtual inheritance. Instances of B, C and D all contain a single instance of A. If you think of B and C as having a fixed layout (including the A instance), this is a problem. If B's layout is [A, x] and C's layout is [A, y], then [B, C, z] is not valid for D, as it would contain two instances of A. What you have to use is something like [A, B', C', z], where B' is everything from B except the inherited A, and so on.
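The same diamond with virtual inheritance can be sketched the same way (members a, x, y, z are again hypothetical). Now there is exactly one A inside a D, both paths reach it, and the direct conversion from D* to A* is no longer ambiguous.

```cpp
#include <cassert>

// Virtual diamond: D holds exactly one A, shared by its B and C parts.
struct A { int a = 0; };
struct B : virtual A { int x = 0; };
struct C : virtual A { int y = 0; };
struct D : B, C { int z = 0; };

A* AViaB(D& d) { return static_cast<A*>(static_cast<B*>(&d)); }
A* AViaC(D& d) { return static_cast<A*>(static_cast<C*>(&d)); }

// Usage: a write through the B path is visible through the C path.
void ShowSharedCopy() {
    D d;
    AViaB(d)->a = 1;
    assert(AViaC(d)->a == 1);
    A* direct = &d;  // unambiguous now: there is only one A
    assert(direct == AViaB(d));
}
```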

This means that if you have a pointer-to-B, you don't have a single scheme for dereferencing the members inherited from A. Finding those members differs depending on whether the pointer points to a pure B, a B-within-D, or a B-within-something-else. The compiler needs some run-time clue (virtual tables) to find the inherited-from-A members. You end up needing several pointers to several virtual tables in the D instance, as there's a vtable for the inherited B, one for the inherited C, and so on, implying some memory overhead.
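Here is a sketch of that run-time lookup, reusing the virtual diamond with hypothetical members a, x, y and z. `ReadA` is compiled once, yet it must find A at whatever offset the complete object dictates. The helper `OffsetOfAInB` (also hypothetical) prints those offsets; they typically differ between a plain B and the B inside a D on mainstream compilers, but the standard does not guarantee any particular values, so no output is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct A { int a = 0; };
struct B : virtual A { int x = 0; };
struct C : virtual A { int y = 0; };
struct D : B, C { int z = 0; };

// One piece of code for every B, wherever its A happens to live. The compiler
// locates A at run time through hidden per-object data (a virtual-base offset
// in the vtable on Itanium-ABI compilers, a vbptr/vbtable on MSVC).
int ReadA(B& b) { return b.a; }

// Distance in bytes from the start of a B subobject to its A subobject.
std::size_t OffsetOfAInB(B& b) {
    A* inner = &b;  // implicit conversion to the virtual base
    auto start = reinterpret_cast<std::uintptr_t>(&b);
    auto target = reinterpret_cast<std::uintptr_t>(inner);
    return static_cast<std::size_t>(target - start);
}

// Usage: the same ReadA works for a plain B and for the B part of a D.
void ShowOffsets() {
    B plain;
    D whole;
    plain.a = 1;
    whole.a = 2;
    assert(ReadA(plain) == 1);
    assert(ReadA(whole) == 2);  // binds to the B part of whole
    // Typically different on mainstream ABIs; not guaranteed by the standard.
    std::printf("A in plain B: +%zu, A in B-within-D: +%zu\n",
                OffsetOfAInB(plain), OffsetOfAInB(whole));
}
```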

Single inheritance doesn't have these issues. The memory layout of instances is kept simple, and virtual tables are simpler too. That's why Java disallows multiple inheritance for classes. In interface inheritance there are no data members, so again these problems simply don't arise: there's no issue of which-inherited-A-within-D, nor of different ways to find A-within-B depending on what that particular B happens to be within. Both COM and Java can allow multiple inheritance of interfaces without having to handle these complications.

EDIT

I forgot to say: without data members, there is no real difference between virtual and non-virtual inheritance. However, with Visual C++ the layout is probably different even if there are no data members, since the compiler applies the same rules for each inheritance style consistently, irrespective of whether any data members are present.

Also, the COM memory-layout matches the Visual-C++ layout (for supported inheritance types) because it was designed to do that. There's no reason why COM couldn't have been designed to support multiple and virtual inheritance of "interfaces" with data members. Microsoft could have designed COM to support the same inheritance model as C++, but chose not to - and there's no reason why they should have done otherwise.

Early COM code was often written in C, meaning hand-written struct layouts that had to precisely match the Visual-C++ layout to work. Layouts for multiple and virtual inheritance - well, I wouldn't volunteer to do it manually. Besides, COM was always its own thing, meant to link code written in many different languages. It was never intended to be tied to C++.

YET MORE EDITING

I realised I missed a key point.

In COM, the only layout issue that matters is the virtual table, which only has to handle method dispatch. There are significant differences in layout depending on whether you take the virtual or non-virtual approach, similar to the layout of an object with data members...

  • For non-virtual, the D vtab contains an A-within-B vtab and an A-within-C vtab.
  • For virtual, the A only occurs once within D's vtable, but the object contains multiple vtables and pointer casts need address changes.

With interface-inheritance, this is basically implementation detail - there's only one set of method implementations for A.

In the non-virtual case, the two copies of the A virtual table would be identical (leading to the same method implementations). It's a slightly larger virtual table, but the per-object overhead is less and the pointer casts are just type-relabelling (no address change). It's a simpler and more efficient implementation.
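To illustrate the interface-only case, here is a sketch of the same non-virtual diamond built from pure abstract classes with no data members (A, B, C and D as above; the methods Id, FromB and FromC are hypothetical). D still contains two A subobjects, but a single final overrider fills both copies of A's slot, so the behavior is identical whichever path a caller uses.

```cpp
#include <cassert>

// Interface-only diamond: no data members anywhere, non-virtual inheritance.
struct A { virtual int Id() = 0; };
struct B : A { virtual int FromB() = 0; };
struct C : A { virtual int FromC() = 0; };

// D has two A subobjects (an A-within-B and an A-within-C), but one final
// overrider of Id() serves both.
struct D final : B, C {
    int Id() override { return 42; }
    int FromB() override { return 1; }
    int FromC() override { return 2; }
};

A& AViaB(D& d) { return static_cast<B&>(d); }
A& AViaC(D& d) { return static_cast<C&>(d); }

// Usage: same implementation via either path.
void ShowSameImplementation() {
    D d;
    assert(AViaB(d).Id() == AViaC(d).Id());
    // The two A subobjects are still distinct objects with distinct addresses,
    // which is why a COM object must pick one path when handing out IUnknown.
    assert(&AViaB(d) != &AViaC(d));
}
```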

COM can't detect the virtual case because there's no indicator in the object or vtable. Also, there's no point supporting both conventions when there are no data members. It just supports the one simple convention.

Steve314
I'm very uncomfortable with this answer being accepted; how do I go about getting the acceptance taken away? I answered while thinking it through, and edited as I continued thinking it through. There's something in here that probably explains a key aspect of why things were done this way in the early days of COM, but I have only a very limited and dated understanding of COM, and this answer simply isn't accurate. In DCOM it simply cannot be right, and even in the when-it-was-just-the-OLE2-back-end days, an implementation of QueryInterface could in principle do whatever it liked.
Steve314