It seems that my problem is a bug in MSVC. I'm using Visual Studio 2008 with Service Pack 1, and my code works with GCC (as tested on codepad.org).

Any official info on this bug? Any ideas how to work around it? Is the bug fixed in VS2010? All insights would be greatly appreciated.

The code:

struct Base {
    Base(int i = 0) : i(i) {}
    virtual ~Base() {}
    virtual Base *clone() const = 0;

protected:
    int i;
};

struct A : virtual public Base {
    A() {}
    virtual A *clone() const = 0;
};

struct B : public A {
    B() {}
    B *clone() const { return new B(*this); }

    /// MSVC debugger shows that 'b' is for some reason missing the Base
    /// portion of its object ("Error: expression cannot be evaluated")
    /// and trying to access 'b.i' causes an unhandled exception.
    ///
    /// Note: This only seems to occur with MSVC
    B(const B &b) : Base(b.i), A() {}
};

void foo(const A &elem) {
    A *a = elem.clone();
    if (a) delete a;
}

int main() {
    A *a = new B;
    foo(*a);
    delete a;
}
A: 

It looks as though the compiler is not correctly adjusting the this pointer when calling through A::clone. If you remove the declaration of A::clone then everything works fine.

Digging in deeper, when you have A::clone, the vtable looks like this:

    [0x0]   0x002f1136 [thunk]:B::`vector deleting destructor'`vtordisp{4294967292,0}' (unsigned int)   void *
    [0x1]   0x002f11e0 [thunk]:B::clone`vtordisp{4294967292,0}' (void)  void *
    [0x2]   0x002f12ad [thunk]:B::clone`vtordisp{4294967292,4}' (void)  void *
    [0x3]   0x002f12a3 B::clone(void)   void *

And foo calls elem.__vfptr[2], offsetting this incorrectly by -4 bytes. Without A::clone, the vtable looks like this:

    [0x0]   0x00ee1136 [thunk]:B::`vector deleting destructor'`vtordisp{4294967292,0}' (unsigned int)   void *
    [0x1]   0x00ee11e0 [thunk]:B::clone`vtordisp{4294967292,0}' (void)  void *
    [0x2]   0x00ee12a3 B::clone(void)   void *

And foo calls elem.__vfptr[1]. That does not adjust this at all (and the code assumes that this points to the Base subobject rather than to B).

So it looks like the compiler treats A::clone as a new virtual method, rather than as an override of Base::clone, when deciding whether A needs its own virtual table, but some other code later decides that A does not need one. You can verify this by comparing sizeof(B) with and without an additional new virtual function in A:

struct A : virtual public Base {
    A() {}
    virtual A *clone() const = 0;
}; //sizeof(B)==16

struct A : virtual public Base {
    A() {}
    virtual A *clone() const = 0;
    virtual const A *clone2() const { return this; }
}; //sizeof(B)==20

So it's a compiler bug.
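
If you want to reproduce the size comparison in a single file, something like the following should do. It is only a sketch: the namespace names and the two size_* helper functions are made up for illustration, and the hierarchy is copied from the question. The 16 and 20 quoted above are what I saw with MSVC; other compilers lay objects out differently and may report the same size for both.

```cpp
#include <cstddef>

struct Base {
    Base(int i = 0) : i(i) {}
    virtual ~Base() {}
    virtual Base *clone() const = 0;

protected:
    int i;
};

// Variant 1: A only redeclares clone() with a covariant return type.
namespace without_extra {  // hypothetical name
struct A : virtual public Base {
    A() {}
    virtual A *clone() const = 0;
};

struct B : public A {
    B() {}
    B(const B &b) : Base(b.i), A() {}
    B *clone() const { return new B(*this); }
};
}

// Variant 2: A additionally introduces a genuinely new virtual function.
namespace with_extra {  // hypothetical name
struct A : virtual public Base {
    A() {}
    virtual A *clone() const = 0;
    virtual const A *clone2() const { return this; }
};

struct B : public A {
    B() {}
    B(const B &b) : Base(b.i), A() {}
    B *clone() const { return new B(*this); }
};
}

// Hypothetical helpers that expose the two sizes for comparison.
std::size_t size_without_extra() { return sizeof(without_extra::B); }
std::size_t size_with_extra() { return sizeof(with_extra::B); }
```

Print both values with the compiler you are testing. If they differ, the compiler gave A its own virtual table pointer only in the second variant.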

MSN
A: 

It would seem (from some testing) that the bug is caused by the combination of a virtual base class with a pure virtual method using covariant return types.

Dropping the pure virtual method from the Base class, making Base a non-virtual base class, or making the clone() method non-covariant each seems to avoid the bug.
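
For reference, here is roughly what the non-covariant variant could look like. Treat it as a sketch rather than a verified fix: it is valid C++ on any compiler, but I can only say from my own testing that it seems to sidestep the problem in VS2008. The value() accessor, the B(int) constructor and the clone_as_A / cloned_value helpers are not in my original code; I added them so the result can be checked.

```cpp
#include <cassert>

struct Base {
    Base(int i = 0) : i(i) {}
    virtual ~Base() {}
    virtual Base *clone() const = 0;
    int value() const { return i; }  // hypothetical accessor, for checking only

protected:
    int i;
};

struct A : virtual public Base {
    A() {}
    // No covariant redeclaration of clone(): A inherits Base::clone unchanged.
};

struct B : public A {
    B(int i = 0) : Base(i), A() {}  // hypothetical int parameter, for checking only
    B(const B &b) : Base(b.i), A() {}
    // Non-covariant override: returns Base*, exactly like Base::clone.
    Base *clone() const { return new B(*this); }
};

// Hypothetical helper: clone through an A reference and cast back down.
// dynamic_cast is needed because Base is a virtual base of A.
A *clone_as_A(const A &elem) {
    return dynamic_cast<A *>(elem.clone());
}

// Hypothetical helper: returns the value held by a clone of a B built from i.
int cloned_value(int i) {
    B original(i);
    A *copy = clone_as_A(original);
    assert(copy != 0);
    int result = copy->value();
    delete copy;
    return result;
}
```

The price is the dynamic_cast at every call site that wants an A* back, which is exactly what the covariant return type was meant to avoid.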

I guess this one's solved for me (after I submit a bug report to MS), and I'm even left with a few options to circumvent it. :)

ntx