I have read the existing questions on external/internal linkage over here on SO. My question is different - what happens if I have multiple definitions of the same variable with external linkage in different translation units under C and C++?

For example:

/*file1.c*/

typedef struct foo {
    int a;
    int b;
    int c;
} foo;

foo xyz;


/*file2.c*/

typedef struct abc {
    double x;
} foo;

foo xyz;

Compiled as C with Dev-C++, the program above compiles and links without complaint, whereas compiling the same code as C++ gives a multiple-definition error. Why does it work in C, and what is different in C++? Is this behavior undefined and compiler-dependent? How "bad" is this code, and what should I do if I want to refactor it? (I've come across a lot of old code written like this.)

A: 

The C program permits this and treats the memory a little like a union. It will run, but may not give you what you expected.

The C++ program (which is stronger typed) correctly detects the problem and asks you to fix it. If you really want a union, declare it as one. If you want two distinct objects, limit their scope.
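If sharing the storage really is the intent, a minimal sketch of the explicit-union version might look like the following. The names foo_ints, foo_double, xyz_storage, store_ints and store_double are made up for illustration; only the member layouts come from the question.

```c
#include <assert.h>
#include <stddef.h>

/* The two layouts from the question, renamed (hypothetically) so both
   can be visible in one translation unit. */
struct foo_ints   { int a; int b; int c; };
struct foo_double { double x; };

/* One object with two explicitly declared interpretations. */
union xyz_storage {
    struct foo_ints   ints;
    struct foo_double dbl;
};

union xyz_storage xyz;   /* the single definition of xyz */

/* Store through the int view and read back through the same view. */
int store_ints(int a, int b, int c)
{
    xyz.ints.a = a;
    xyz.ints.b = b;
    xyz.ints.c = c;
    return xyz.ints.a + xyz.ints.b + xyz.ints.c;
}

/* Store through the double view; this reuses the storage used by ints. */
double store_double(double x)
{
    xyz.dbl.x = x;
    return xyz.dbl.x;
}

/* Small self-check showing the intended usage. */
void demo_union(void)
{
    assert(store_ints(1, 2, 3) == 6);
    assert(store_double(0.5) == 0.5);
    assert(sizeof xyz >= sizeof(struct foo_ints));
}
```

With the union spelled out, the compiler knows about both views and sizes the object for the larger one, instead of leaving that to whatever the linker happens to do.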

Michael J
The C behaviour may be true on your implementation but it's not guaranteed by the language.
Charles Bailey
A variable name is just a label for a memory address. If you provide two definitions for how to interpret that label, that doesn't magically make the label refer to two different objects. Have you ever seen a linker that behaves differently from that?
Michael J
I don't deny that this is the usual linker behaviour, this behaviour is used by other languages and many C implementations. The implication from your answer, though, was that it is a well-defined behaviour. Allowing more than one external definition in a program is a common extension, according to the C standard Annex J, but even with this extension if the definitions don't agree it results in undefined behaviour.
Charles Bailey
+1  A: 

C++ does not allow a symbol to be defined more than once. I am not sure what the C linker is doing; a good guess is that it simply maps both definitions onto the same symbol, which would of course cause severe errors.

For porting I would try to put the contents of individual C-files into anonymous namespaces, which essentially makes the symbols different, and local to the file, so they don't clash with the same name elsewhere.
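Anonymous namespaces only exist in C++. If the files have to stay C, the rough counterpart is static at file scope, which gives the variable internal linkage. A sketch, assuming the two xyz objects were never meant to be shared (file1_set and file1_sum are hypothetical accessor names):

```c
#include <assert.h>

/* Stands in for file1.c. 'static' gives xyz internal linkage, so the
   linker never sees the name and it cannot clash with another file's xyz. */
typedef struct foo {
    int a;
    int b;
    int c;
} foo;

static foo xyz;

/* Only these functions are visible to other files. */
void file1_set(int a, int b, int c)
{
    xyz.a = a;
    xyz.b = b;
    xyz.c = c;
}

int file1_sum(void)
{
    return xyz.a + xyz.b + xyz.c;
}

/* Small self-check showing the intended usage. */
void demo_file1(void)
{
    file1_set(1, 2, 3);
    assert(file1_sum() == 6);
}
```

file2.c would get the same treatment for its own struct and its own static xyz, leaving two unrelated objects that merely share a name.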

tobing
Sure it can be defined more than once. The definitions have to be identical, though.
Potatoswatter
@Potatoswatter: Objects must be _defined_ only once, they may be _declared_ multiple times. `inline` functions are special in that they may be defined once per translation unit but other functions must be defined only once in each program.
Charles Bailey
Sorry, my bad :P
Potatoswatter
+1  A: 

This is caused by C++'s name mangling. From Wikipedia:

The first C++ compilers were implemented as translators to C source code, which would then be compiled by a C compiler to object code; because of this, symbol names had to conform to C identifier rules. Even later, with the emergence of compilers which produced machine code or assembly directly, the system's linker generally did not support C++ symbols, and mangling was still required.

With regards to compatibility:

In order to give compiler vendors greater freedom, the C++ standards committee decided not to dictate the implementation of name mangling, exception handling, and other implementation-specific features. The downside of this decision is that object code produced by different compilers is expected to be incompatible. There are, however, third party standards for particular machines or operating systems which attempt to standardize compilers on those platforms (for example C++ ABI); some compilers adopt a secondary standard for these items.

From http://www.cs.indiana.edu/~welu/notes/node36.html the following example is given:


For example, for the C code below:

int foo(double*);
double bar(int, double*);

int foo (double* d) 
{
    return 1;
}

double bar (int i, double* d) 
{
    return 0.9;
}

Its symbol table (as shown by dump -t) would be:

[4]  0x18        44       2     1 0 0x2 bar
[5]  0x0         24       2     1 0 0x2 foo

If the same file is compiled with g++, the symbol table would be:

[4]  0x0         24       2     1 0 0x2 _Z3fooPd
[5]  0x18        44       2     1 0 0x2 _Z3bariPd

_Z3bariPd means a function named bar whose first argument is an int and whose second argument is a pointer to double.


hlovdal
+4  A: 

Both C and C++ have a "one definition rule" which is that each object may only be defined once in any program. Violations of this rule cause undefined behaviour which means that you may or may not see a diagnostic message when compiling.

There is a language difference between the following declarations at file scope, but it does not directly concern the problem with your example.

int a;

In C this is a tentative definition. It may be amalgamated with other tentative definitions in the same translation unit to form a single definition. In C++ it is always a definition (you have to use extern to declare an object without defining it) and any subsequent definitions of the same object in the same translation unit are an error.
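A small sketch of what that means within one translation unit. This is valid only when compiled as C; a C++ compiler rejects the repeated definitions. The function name demo_tentative is made up for the example.

```c
#include <assert.h>

int a;        /* tentative definition */
int a;        /* another tentative definition of the same object: fine in C */
int a = 5;    /* an actual definition; the tentative ones collapse into it */

int b;        /* only tentative definitions in this file, so at the end of */
int b;        /* the translation unit b behaves as if 'int b = 0;' was written */

/* Checks that each name refers to exactly one object. */
void demo_tentative(void)
{
    assert(a == 5);
    assert(b == 0);
}
```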

In your example both translation units have a (conflicting) definition of xyz from their tentative definitions.
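If the two files were in fact meant to share one object, the usual cure is a single type, a single extern declaration in a header, and a single definition in one .c file. A single-file sketch of that arrangement, where the header name xyz.h, the initial values and the helper names are assumptions for illustration:

```c
#include <assert.h>

/* What a shared header (hypothetically xyz.h) would contain:
   one agreed type and a declaration that defines nothing. */
typedef struct foo {
    int a;
    int b;
    int c;
} foo;

extern foo xyz;

/* What exactly one .c file would contain: the single definition. */
foo xyz = { 1, 2, 3 };

/* What any other .c file would do after including the header. */
int sum_xyz(void)
{
    return xyz.a + xyz.b + xyz.c;
}

/* Small self-check showing the intended usage. */
void demo_single_definition(void)
{
    assert(sum_xyz() == 6);
}
```

Because every file then sees the same typedef, a mismatch like the one in the question becomes a compile-time error rather than something left to the linker.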

Charles Bailey
A: 

You have found the One Definition Rule. Clearly your program has a bug, since

  • There can only be one object named xyz once the program is linked.
  • If some source file includes all the header files, it will see two definitions of foo.

C++ compilers can get around #1 because of "name mangling": the name of your variable in the linked program may be different from the one you chose. In this case, it isn't required, but it's probably how your compiler detected the problem. #2, though, remains, so you can't do that.

If you really want to defeat the safety mechanism, you can disable mangling like this:

extern "C" struct abc xyz;

… other file …

extern "C" struct foo xyz;

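extern "C" tells the compiler to give the declared name C language linkage, which in practice means the symbol is emitted unmangled, as a C compiler would emit it.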

Potatoswatter
Oh, of course, as someone else mentioned, you should just use a `union` instead.
Potatoswatter