tags:

views:

216

answers:

3

Never turn your back on C++. It'll getcha.

I'm in the habit of writing unit tests for everything I do. As part of this I frequently define classes with names like A and B, in the .cxx of the test to exercise code, safe in the knowledge that i) because this code never becomes part of a library or is used outside of the test, name collisions are likely very rate and ii) the worst that could happen is that the linker will complain about multiply defined A::A() or what every and I'll fix that error. How wrong I was.

Here are two compilation units:

#include <iostream>
using namespace std;

// Fwd decl.
void runSecondUnit();

class A {
public:
   A() : version( 1 ) {
      cerr << this << "   A::A()  --- 1\n";
   }    
   virtual ~A()   {
      cerr << this << "   A::~A()  --- 1\n";
   }

   int version;    };

void runFirstUnit()  {
   A a;
   // Reports 1, correctly.
   cerr << "   a.version = " << a.version << endl;
   // If you uncomment these, you will call
   // secondCompileUnit: A::getName() instead of A::~A !
   //A* a2 = new A;
   //delete a2;
}

int main( int argc, char** argv )  {
   cerr << "firstUnit BEGIN\n";
   runFirstUnit();
   cerr << "firstUnit END\n";

   cerr << "secondUnit BEGIN\n";
   runSecondUnit();
   cerr << "secondUnit END\n";
}

and

#include <iostream>
using namespace std;

void runSecondUnit();

// Uncomment to fix all the errors:
//#define USE_NAMESPACE
#if defined( USE_NAMESPACE )
   namespace mySpace
   {
#endif

class A  {
   public:
   A() :  version( 2 )  {
      cerr << this << "   A::A()  --- 2\n";
   }

   virtual const char* getName() const {
      cerr << this << "   A::getName()  --- 2\n"; return "A";
   }

   virtual ~A()  {
      cerr << this << "   A::~A()  --- 2\n";
   }

   int version;
};


#if defined(USE_NAMESPACE )
   } // mySpace
   using namespace mySpace;
#endif

void runSecondUnit() {   
   A a;   
   // Reports 1. Not 2 as above!
   cerr << "   a.version = " << a.version << endl;
   cerr << "   a.getName()=='" << a.getName() << "'\n";    
}

Ok, ok. Obviously I shouldn't have declared two classes called A. My bad. But I bet you can't guess what happens next...

I compiled each unit, and linked the two object files (successfully) and ran. Hmm...

Here's the output (g++ 4.3.3):

firstUnit BEGIN
0x7fff0a318300   A::A()  --- 1
   a.version = 1
0x7fff0a318300   A::~A()  --- 1
firstUnit END
secondUnit BEGIN
0x7fff0a318300   A::A()  --- 1
   a.version = 1
0x7fff0a318300   A::getName()  --- 2
   a.getName()=='A'
0x7fff0a318300   A::~A()  --- 1
secondUnit END

So there are two separate A classes. In the second use, the destructor and constructor for the first on was used, even though only the second one was in visible in its compilation unit. Even more bizarre, if I uncomment the lines in runFirstUnit, instead of calling either A::~A, the A::getName is called. Clearly in the first use, the object gets the vtable for the second definition (getName is the second virtual function in the second class, the destructor the second in the first). And it even correcly gets the constructor from the first.

So my question is, why didn't the linker complain about the multiply defined symbols. It appears to choose the first match. Reordering the objects in the link step confirm.

The behavior is identical in Visual Studio, so I'm guessing that this is some standard-defined behavior. My question is, why? Clearly it would be easy for the linker to barf given the duplicate names. If I add,

 void f() {}

to both files it complains. Why not for my class constructors and destructors?

EDIT The problem isn't, "what should I have done to avoid this", or "how is the behavior explained". It is, "why don't linkers catch it?" Projects may have thousands of compile units. Sensible naming practices don't really solve this issue -- they only make the problem obscure and only then if you can train everyone to follow them.

The above example leads to ambiguous behavior that is easy and definitively solvable by compiler tools. So, why do they not? Is this simply a bug. (I suspect not.)

** EDIT ** See litb's answer below. I'm repeating is back to make sure my understanding's right:

Linkers only generate warnings for strong references. Because we have shared headers, inline function definitions (i.e. where declaration and definition is made at the same place, or template functions) are be compiled into multiple object files for each TU that sees them. Because there's no easy way to restrict the generation this code to a single object file, the linker has the job of choosing one of many definitions. So that errors are not generated by the linker, the symbols for these compiled definitions are tagged as weak references in the object file.

+3  A: 

A simple way to restrict each class to the current translation unit is to enclose it in an anonymous namespace:

// a.cpp
namespace {
  class A {
    // ...
  };
}

// b.cpp
namespace {
  class A {
    // ...
  };
}

is perfecetly legal. Because the two classes are in separate translation units, and are inside anonymous namespaces, they won't conflict.

jalf
That's a better pattern for my regression tests. Thanks. But what about the question -- why doesn't the linker generate an error?
+5  A: 

The compiler and linker relies on both classes to be exactly the same. In your case, they are different and so strange things happen. The one definition rule says that the result is undefined behavior - so behavior is not at all required to be consistent among compilers. . I suspect that in runFirstUnit, in the delete line, it puts a call to the first virtual table entry (because in its translation unit, the destructor may occupy the first entry).

In the second translation unit, this entry happens to point to A::getName, but in the first translation unit (where you execute the delete), the entry points to A::~A. Since these two are differently named (A::~A vs A::getName) you don't get a name clash (you will have code emitted for both the destructor and getName). But since their class name is the same, their v-tables will clash on purpose, because since both classes have the same name, the linker will think they are the same class and assume same contents.

Notice that all member functions were defined in-class, which means they are all inline functions. These functions can be defined multiple times in a program. In the case of in-class definitions, the rationale is that you may include the same class definition into different translation units from their header files. Your test function, however, isn't an inline function and thus including it into different translation units will triggers a linker error.

If you enable namespaces, there will be no clash what-so ever, because ::A and ::mySpace::A are different classes, and of course will get different v-tables.

Johannes Schaub - litb
But the constructors have the same name. Why no name clash for them? Seems perfectly possible and the bugs avoidable.Also, adding namespaces (except for the anonymous one) is no guarantee of avoiding name clashes if both files use the same namespace.
There will be a name clash too. But since the constructors are in-line, the linker will assume they are all defined the same way and will emit only one definition of them into the binary (the address of an inline function is required to be the same across translation units), discarding the other definitions. Try to define them outside the class definition and you will get linker errors too, likely
Johannes Schaub - litb
It's somewhat like template instantiations. If you do swap(a, b); with both a and b being integers from two translation units, you wouldn't expect a linker error either, although the compiler may have generated a function both times for doing the swap. The magic is, the linker will merge all instantiations so they end up as only one function instance in the end. In practice, it will usually throw away all but one of them, as far as i know.
Johannes Schaub - litb
In your test, the first translation unit didn't use the same namespace. So using `mySpace::A` in the other one, it would effectively solve the issue, of course.
Johannes Schaub - litb
Are the constructor's really inlined? I don't think so. Why? Because object receives the vtable pointer to the other. Doesn't the constructor set the vtable pointer? How is that possible without code from the other translation unit? Hmmm... let me try...
So... that's interesting. Moving the definition outside of the scope of the class *does* generate a linker error -- suggesting inlining. However, when left as is, the second unit calls the first unit's constructor -- this can only happen because of linking. Therefore both sets of constructor names must be available to linker. Which again leaves us with, why is it happy with that?
Hmm... the template analogy is plausible for an explanation. However, for non template code, it seems clear to me that multiple definitions of symbols should be disallowed. For template generated code, it is also suspect of different units contain different implementations (although linkers don't mind that either).
The constructors and all functions that are defined in-class are inline functions. But that doesn't mean that calls to them are actually inlined. It merely means that you can have multiple definitions of them in the program, provided all of them have the same content.
Johannes Schaub - litb
In GCC, inline functions are achieved by weak linking: If multiple symbols have weak linkage, the linker will pick one symbol of them. If there are multiple weak symbols, but *one* strong symbol, then that strong symbol is used. That's why the second translation unit will use the non-inline constructor definition provided in the first translation unit. But this behavior is not guaranteed by the standard: It's a result of undefined behavior. If a function is inline in one translation unit, it must be inline in every other translation unit too.
Johannes Schaub - litb
Ah wait - the second-calls-first happens with the inline constructors? If so, then this is not because the first TU contains strong constructor symbols, but because the linker just decided to drop the second'd TUs constructor definition instead of the first. There are many things the linker won't check, which includes many things that concern cross-TU definitions.
Johannes Schaub - litb
I gave a more theoretical description on it here: http://stackoverflow.com/questions/908830/isnt-cs-inline-totally-optional/910686#910686 . It doesn't concern linkers, but rather the language itself. An experiment on `weak` can be found here: http://stackoverflow.com/questions/617554/override-a-function-call-in-c/617588#617588
Johannes Schaub - litb
So I guess the answer is that linkers can only reasonably generate warnings for strong references. I'll add a summary of what I understood at the bottom of the question. Thanks litb.
A: 

The functions are defined as inline. inline functions can be defined multiple times in the program. See point 3 in the summary here:

http://en.wikipedia.org/wiki/One_Definition_Rule

The important point is:

For a given entity, each definition must be the same.

Try not defining the functions as inline. The linker should start to give duplicate symbol errors then.

brone
The functions are provably not inlined. Why? Because there are interactions between the code in each compile unit. The first unit may call the second destructor. The second unit calls the first's constructor. Only the linker can cause this behavior.
Also. The problem is easily solvable in many ways. This question is why it is a problem at all?
It doesn't matter if the functions are actually inlined; provided they are defined (or declared) inline, they count as inline, and multiple definitions are permitted. This is why there is no error message generated: it's not an error to have multiple definitions in this case.
brone