Never turn your back on C++. It'll getcha.
I'm in the habit of writing unit tests for everything I do. As part of this I frequently define classes with names like A and B in the .cxx of the test to exercise code, safe in the knowledge that i) because this code never becomes part of a library or is used outside of the test, name collisions are likely very rare, and ii) the worst that could happen is that the linker will complain about a multiply defined A::A() or whatever, and I'll fix that error. How wrong I was.
Here are two compilation units:
#include <iostream>
using namespace std;
// Fwd decl.
void runSecondUnit();
class A {
public:
    A() : version( 1 ) {
        cerr << this << " A::A() --- 1\n";
    }
    virtual ~A() {
        cerr << this << " A::~A() --- 1\n";
    }
    int version;
};
void runFirstUnit() {
    A a;
    // Reports 1, correctly.
    cerr << " a.version = " << a.version << endl;
    // If you uncomment these, you will call
    // secondCompileUnit: A::getName() instead of A::~A !
    //A* a2 = new A;
    //delete a2;
}
int main( int argc, char** argv ) {
    cerr << "firstUnit BEGIN\n";
    runFirstUnit();
    cerr << "firstUnit END\n";
    cerr << "secondUnit BEGIN\n";
    runSecondUnit();
    cerr << "secondUnit END\n";
}
and
#include <iostream>
using namespace std;
void runSecondUnit();
// Uncomment to fix all the errors:
//#define USE_NAMESPACE
#if defined( USE_NAMESPACE )
namespace mySpace
{
#endif
class A {
public:
    A() : version( 2 ) {
        cerr << this << " A::A() --- 2\n";
    }
    virtual const char* getName() const {
        cerr << this << " A::getName() --- 2\n";
        return "A";
    }
    virtual ~A() {
        cerr << this << " A::~A() --- 2\n";
    }
    int version;
};
#if defined( USE_NAMESPACE )
} // mySpace
using namespace mySpace;
#endif
void runSecondUnit() {
    A a;
    // Reports 1. Not 2 as above!
    cerr << " a.version = " << a.version << endl;
    cerr << " a.getName()=='" << a.getName() << "'\n";
}
Ok, ok. Obviously I shouldn't have declared two classes called A. My bad. But I bet you can't guess what happens next...
I compiled each unit, and linked the two object files (successfully) and ran. Hmm...
Here's the output (g++ 4.3.3):
firstUnit BEGIN
0x7fff0a318300 A::A() --- 1
a.version = 1
0x7fff0a318300 A::~A() --- 1
firstUnit END
secondUnit BEGIN
0x7fff0a318300 A::A() --- 1
a.version = 1
0x7fff0a318300 A::getName() --- 2
a.getName()=='A'
0x7fff0a318300 A::~A() --- 1
secondUnit END
So there are two separate A classes. In the second use, the constructor and destructor of the first one were used, even though only the second one was visible in its compilation unit. Even more bizarre, if I uncomment the lines in runFirstUnit, then instead of calling either A::~A, A::getName is called. Clearly in the first use, the object gets the vtable for the second definition (getName is the second virtual function in the second class, the destructor the second in the first). And it even correctly gets the constructor from the first.
So my question is, why didn't the linker complain about the multiply defined symbols? It appears to choose the first match. Reordering the objects in the link step confirms this.
The behavior is identical in Visual Studio, so I'm guessing that this is some standard-defined behavior. My question is, why? Clearly it would be easy for the linker to barf given the duplicate names. If I add,
void f() {}
to both files it complains. Why not for my class constructors and destructors?
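For comparison, here is a minimal sketch of a class whose constructor behaves like f() at link time. The class B and the helper makeBAndReadVersion are hypothetical names, not part of the two units above. The constructor is defined outside the class body without inline, so it is an ordinary non-inline function; the accessor is defined inside the class body and is therefore implicitly inline, like every member function of A in the original code.

```cpp
#include <cassert>

// Hypothetical class for illustration; not part of the two units above.
class B {
public:
    B();  // declared here, defined outside the class body below
    int getVersion() const { return version; }  // defined in-class: implicitly inline
private:
    int version;
};

// Not inline: an ordinary definition with external linkage, like f().
// If a second compilation unit also defined B::B() this way, the link
// would typically fail with a multiple-definition error.
B::B() : version( 1 ) {}

// Hypothetical helper so the sketch can be exercised.
int makeBAndReadVersion() {
    B b;
    assert( b.getVersion() == 1 );  // value set by the out-of-class constructor
    return b.getVersion();
}
```

Calling makeBAndReadVersion() from a test returns 1. The point of the sketch is only the contrast between the two kinds of definition; the exact linker diagnostic depends on the toolchain.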
EDIT The problem isn't, "what should I have done to avoid this", or "how is the behavior explained". It is, "why don't linkers catch it?" Projects may have thousands of compile units. Sensible naming practices don't really solve this issue -- they only make the problem obscure and only then if you can train everyone to follow them.
The above example leads to ambiguous behavior that is easily and definitively solvable by compiler tools. So, why do they not? Is this simply a bug? (I suspect not.)
** EDIT ** See litb's answer below. I'm repeating it back to make sure my understanding's right:
Linkers only report duplicate-definition errors for strong symbols. Because we have shared headers, inline function definitions (i.e. member functions defined where they are declared, inside the class body, or template functions) are compiled into the object file of every TU that sees them. Because there's no easy way to restrict the generation of this code to a single object file, the linker has the job of choosing one of the many definitions. So that the linker does not generate errors, the symbols for these compiled definitions are tagged as weak in the object file.
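Given that in-class definitions end up as weak symbols that the linker merges silently, one defensive option for helper classes that live only in a test file is an unnamed namespace, a variation on the USE_NAMESPACE switch in the second unit. This is a minimal sketch and not something taken from the answer paraphrased above; runSecondUnitSketch is a hypothetical stand-in for runSecondUnit that returns the value instead of printing it. Names declared in an unnamed namespace are distinct in each compilation unit, so another unit's A should not be merged with this one.

```cpp
#include <cassert>
#include <cstring>

namespace {  // unnamed namespace: this A is local to the compilation unit

class A {
public:
    A() : version( 2 ) {}
    virtual const char* getName() const { return "A"; }
    virtual ~A() {}
    int version;
};

}  // unnamed namespace

// Hypothetical stand-in for runSecondUnit(); returns the version
// instead of writing it to cerr so the result can be checked.
int runSecondUnitSketch() {
    A a;
    assert( std::strcmp( a.getName(), "A" ) == 0 );
    return a.version;
}
```

With this arrangement runSecondUnitSketch() returns 2, the value set by this unit's own constructor, regardless of what other classes named A exist elsewhere in the program. It relies on everyone remembering to use the namespace, so it avoids the collision rather than detecting it.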