This might be a dumb question, but maybe someone can provide some insight.

I have some global variables defined in a header file (yes yes I know that's bad, but this is just a hypothetical situation). I include this header file in two source files, which are then compiled into two object files. The global symbols are not referenced anywhere in the code.

If the source files are C, then it looks like the compiler omits the global symbols and everything links without errors. If the source files are C++, the symbols are included in both object files and then I get linker errors. For C++ I am using extern "C" when I include the header.

I am using the Microsoft compiler from VS2005.

Here is my code:

Header file (test.h):

#ifndef __TEST_H
#define __TEST_H

/* declaration in header file */
void *ptr;

#endif

C Source files:

test1.c

#include "test.h"

int main( ) {
    return 0;
}

test2.c

#include "test.h"

C++ Source Files:

test1.cpp

extern "C" {
#include "test.h"
}

int main( ) {
    return 0;
}

test2.cpp

extern "C" {
#include "test.h"
}

For C, the object files look something like this:

Dump of file test1.obj

File Type: COFF OBJECT

COFF SYMBOL TABLE
000 006DC627 ABS    notype       Static       | @comp.id
001 00000001 ABS    notype       Static       | @feat.00
002 00000000 SECT1  notype       Static       | .drectve
    Section length   2F, #relocs    0, #linenums    0, checksum        0
004 00000000 SECT2  notype       Static       | .debug$S
    Section length  228, #relocs    7, #linenums    0, checksum        0
006 00000004 UNDEF  notype       External     | _ptr
007 00000000 SECT3  notype       Static       | .text
    Section length    7, #relocs    0, #linenums    0, checksum 96F779C9
009 00000000 SECT3  notype ()    External     | _main
00A 00000000 SECT4  notype       Static       | .debug$T
    Section length   1C, #relocs    0, #linenums    0, checksum        0

String Table Size = 0x0 bytes

And for C++ they look something like this:

Dump of file test1.obj

File Type: COFF OBJECT

COFF SYMBOL TABLE
000 006EC627 ABS    notype       Static       | @comp.id
001 00000001 ABS    notype       Static       | @feat.00
002 00000000 SECT1  notype       Static       | .drectve
    Section length   2F, #relocs    0, #linenums    0, checksum        0
004 00000000 SECT2  notype       Static       | .debug$S
    Section length  228, #relocs    7, #linenums    0, checksum        0
006 00000000 SECT3  notype       Static       | .bss
    Section length    4, #relocs    0, #linenums    0, checksum        0
008 00000000 SECT3  notype       External     | _ptr
009 00000000 SECT4  notype       Static       | .text
    Section length    7, #relocs    0, #linenums    0, checksum 96F779C9
00B 00000000 SECT4  notype ()    External     | _main
00C 00000000 SECT5  notype       Static       | .debug$T
    Section length   1C, #relocs    0, #linenums    0, checksum        0

String Table Size = 0x0 bytes

I notice that _ptr is listed as UNDEF when I compile the C source, and it is defined when I compile the C++ source, which results in linker errors.

I understand that this is not a good thing to do in real life, I am just trying to understand why this is different.

Thanks.

+11  A: 

In C, identifiers have three different types of "linkage":

  1. external linkage: roughly, this is what people mean by "global variables". In common terms, it refers to identifiers that are visible from every translation unit of the program.
  2. internal linkage: these are identifiers declared at file scope with the static keyword; they are visible only inside their own translation unit.
  3. no linkage: these are identifiers such as variables declared inside a function (commonly referred to as "local variables"); they are visible only in the block that declares them.

For objects with external linkage, you can have only one definition. Since your header file defines such an object and is included in two C files, the behavior is undefined (but see below). The fact that your C compiler doesn't complain does not mean it is OK to do so in C. (The converse is more reliable: assuming no bugs in your compiler, if it is invoked in a standards-compliant mode and it complains about something [gives a diagnostic], it probably means your program isn't compliant.)

In other words, you can't test what is allowed by the language by trying something and checking if your compiler allows it. For this, you must read the standard.

Note that there is a subtle difference between definition and tentative definition.

$ cat a.c
int x = 0;
$ cat b.c
#include <stdio.h>
int x = 0;
int main(void)
{
    printf("%d\n", x);
    return 0;
}
$ gcc -ansi -pedantic -W -Wall -c a.c
$ gcc -ansi -pedantic -W -Wall -c b.c
$ gcc -o def a.o b.o
b.o:(.bss+0x0): multiple definition of `x'
a.o:(.bss+0x0): first defined here
collect2: ld returned 1 exit status

Now, let's change a.c:

$ cat a.c
int x; /* Note missing " = 0", so tentative definition */

Now compile it:

$ gcc -ansi -pedantic -W -Wall -c a.c
$ gcc -o def a.o b.o
$ ./def
0

We can change b.c instead:

$ cat a.c
int x = 0;
$ cat b.c
#include <stdio.h>
int x; /* tentative definition */
int main(void)
{
    printf("%d\n", x);
    return 0;
}
$ gcc -ansi -pedantic -W -Wall -c a.c
$ gcc -ansi -pedantic -W -Wall -c b.c
$ gcc -o def a.o b.o
$ ./def
0

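A "tentative definition" becomes a "real definition" (as if initialized to 0) at the end of its translation unit if that translation unit contains no other definition of the identifier. Strictly speaking, each file above therefore ends up with its own definition of x, and the links succeed because the toolchain merges an uninitialized definition with any other definition of the same name, a common extension that GCC applied by default before version 10. With that extension, we could even have changed both files to contain int x;, and the program would still link.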

So, you may have a tentative definition in the header file. We need to see the actual code to be sure.

The C standard lists the following as undefined behavior (Annex J.2, paragraph 1):

An identifier with external linkage is used, but in the program there does not exist exactly one external definition for the identifier, or the identifier is not used and there exist multiple external definitions for the identifier.

C++ may have different rules.

Edit: As per this thread on comp.lang.c++, C++ does not have tentative definitions. The reason being:

This avoids having different initialization rules for built-in types and user-defined types.

(The thread deals with the same question, btw.)

Now I am almost sure that OP's code contains what C calls a "tentative definition" in the header file, which C compilers commonly accept (strictly, as an extension once the header is included in two source files) and which is illegal in C++. We will know for sure only when we see the code though.

More information on "tentative definitions" and why they are needed is in this excellent post on comp.lang.c (by Chris Torek).

Alok
It may. Does it?
Hans Passant
I am not sure. Someone with more C++ knowledge will hopefully answer that. I didn't want to claim something I don't know for sure. (Of course, I hope I got everything regarding C right in my reply - but if not, I will be happy to be corrected!)
Alok
@nobugz: Please see my edit.
Alok
I posted the code, and the rules about tentative definitions appear to be what I was looking for. Thank you for the comments.
bde
+1  A: 

Don't define variables in headers - that is evil.

Only ever declare variables in headers - with an explicit extern keyword.

The difference is that C++ expressly requires the one definition rule - there may only be one definition of any given variable with external linkage in a C++ program.

Strictly, the C standard has the same requirement. However, Annex J of the standard lists a common extension (J.5.11, "Multiple external definitions") that allows multiple uninitialized definitions to be treated as one. It is often called 'common definitions' because it is similar to the behaviour of COMMON blocks in (old school) Fortran (Fortran IV, aka Fortran 66, and Fortran 77, for example).


Caveat: yes, if you know enough not to need to ask such questions, there can occasionally, very seldom but just occasionally, be a reason for defining variables in a header. But such occasions are so few and far between that it is near enough accurate to say "don't define variables in headers".


Christoph raises an interesting point about 'static const variables'. One weasel-wording way around the issue is to claim that "a constant isn't a variable". However, Christoph is right: in C++ in particular, static const 'variables' are used. In C, I thought such constants would tend to elicit 'unused' warnings from compilers; however, GCC 4.4.2 does not give any warnings when given this minimal code:

static const int x = 3;
extern int p(void);
int p(void)
{
    return(3);
}

It does not complain about the unused x even under '-Wall -Wextra'. So, it is OK to define constants in headers, as in 'static const SomeType constName = InitialValue;'. At least, that holds if your compiler is GCC, though the code also compiles without warning under the Sun Studio compiler with 'cc -v':

C compiler: /compilers/v12/SUNWspro/bin/cc
cc: Sun C 5.9 SunOS_sparc Patch 124867-09 2008/11/25
acomp: Sun C 5.9 SunOS_sparc Patch 124867-09 2008/11/25
iropt: Sun Compiler Common 12 SunOS_sparc Patch 124861-13 2009/03/10
cg: Sun Compiler Common 12 SunOS_sparc Patch 124861-13 2009/03/10
ld: Software Generation Utilities - Solaris Link Editors: 5.10-1.486
Jonathan Leffler
there's nothing wrong with defining constants (ie `static const` variables) in headers; if you only declare them, compilers without link-time optimizations can't inline the values...
Christoph