When I write the following program:

file 1:

#include <stdio.h>    
int global;    
void print_global1() {
        printf("%p\n", &global);
}

file 2:

#include <stdio.h>
char global;    
void print_global2() {
        printf("%p\n", &global);
}

file 3:

void print_global1();
void print_global2();
int main()
{
        print_global1();
        print_global2();

        return 0;
}

output:

$ ./a.out
0x804a01c
0x804a01c

Here are my questions:

  • Why does the linker implement "int global" and "char global" as the same global variable?
  • Why doesn't the compiler complain (not even the smallest warning with -Wall -Wextra -ansi ...)?
  • How is the size of the global variable managed (the sizes of int and char are different)?

PS: The second question is architecture/compiler related, so let's assume gcc or Visual C++ (for C) with a 32-bit int.

EDIT: THIS IS NOT A QUESTION FOR C++ BUT for C!

I use gcc version 4.4.1 on Ubuntu 9.10. Here is the compilation console output:

$ ls
global_data1.c  global_data2.c  global_data.c

$ gcc -Wall -Wextra -ansi global_data*.c
$ ./a.out
0x804a01c
0x804a01c
or 
$ gcc -Wall -Wextra -ansi -c global_data*.c
$ gcc -Wall -Wextra -ansi global_data*.o
$ ./a.out
0x804a01c
0x804a01c
+1  A: 

Which compiler are you using? What is the platform? With g++ I get

/tmp/cc8Gnf4h.o:(.bss+0x0): multiple definition of `global'
/tmp/ccDQHZn2.o:(.bss+0x0): first defined here
/usr/bin/ld: Warning: size of symbol `global' changed from 4 in a.o to 1 in b.o

AFAIR, in C++ the variables in different translation units must have exactly the same declaration to work.

Amit Kumar
+1  A: 

The linker allows for having duplicate external data like this (although I'm surprised that the different types don't cause a problem). Which one you get depends upon the order of your object files on your link command line.

Scott Smith
+1 Is there a tool that will let me find the size of the variable "global"?
Phong
+7  A: 

gcc does not report any error/warnings. But g++ does.

EDIT:

Looks like C allows tentative definitions for a variable.

In your case both definitions of global are tentative, so the linker merges them into a single variable instead of reporting a conflict.

Change your file2 to:

char global = 1; /* no longer tentative, but explicit */

Now if you compile as before, the tentative definition in file1 is ignored.

Make both definitions explicit:

int global = 1; /* in file1 */

char global = 1; /* in file2 */

Now neither can be ignored, and we get the multiple definition error.
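
The same precedence can be seen inside a single translation unit, without involving the linker. This is only a minimal sketch: the variable flag and the helper read_flag are hypothetical names, not from the question. The tentative definitions act as plain declarations once an explicit definition is present.

```c
/* Tentative definition: no initializer, no storage-class specifier. */
char flag;

/* Explicit definition: the tentative ones now behave as declarations. */
char flag = 1;

/* Still legal: another tentative definition after the explicit one. */
char flag;

/* A second initializer would be a compile-time redefinition error:
   char flag = 2; */

/* Hypothetical helper, only here to make the value observable. */
char read_flag(void)
{
    return flag;
}
```

Calling read_flag() returns 1, the value from the one explicit definition.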

codaddict
I am not writing C++ but C (that's why I didn't put the C++ tag on my question)! It looks like the C standard allows this behavior (but I am not really sure). I can still use the C++ linker to check for multiple definitions, but I am not sure that is really safe...
Phong
@Phong: you are right..the C std allows this. I've updated my answer.
codaddict
@codaddict: +1, Thanks for the quick update! I now better understand how it is handled by the compiler.
Phong
+1 for *tentative definitions*
Scott Smith
It is not so much C that allows tentative definitions as it is recognized as a common extension in Appendix J of the C99 standard. See also: http://stackoverflow.com/questions/1987413/inclusion-of-unused-symbols-in-object-files-by-compiler-in-c-vs-c/1987675#1987675
Jonathan Leffler
@Jonathan: tentative definitions have always been a part of the ANSI C standard. Section 3.7.2 in the C89 standard describes them. See http://groups.google.com/group/comp.lang.c/msg/47ae65fdb11e7111 for a very good description.
Alok
@Alok: you're correct - what I was intending to refer to was the 'common extension' in J.5.11 'Multiple external definitions' where it says "There may be more than one external definition for the identifier of an object, with or without the explicit use of the keyword extern; if the definitions disagree, or more than one is initialized, the behavior is undefined".
Jonathan Leffler
+3  A: 

This has to do with something called "tentative definition" in C. First, if you initialize global in both file1 and file2, you will get an error in C. This is because global is then no longer tentatively defined in file1 and file2; it is actually defined.

From the C standard (emphasis mine):

A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static, constitutes a tentative definition. If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.

For your case, a "translation unit" is (basically) each source file.

About "composite types":

For an identifier with internal or external linkage declared in a scope in which a prior declaration of that identifier is visible, if the prior declaration specifies internal or external linkage, the type of the identifier at the later declaration becomes the composite type.
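
A small sketch of both quoted rules within one translation unit. The names counter, table and the two helper functions are made up for illustration. Repeated tentative definitions collapse into one zero-initialized object, and an incomplete array type is completed by a later declaration into the composite type.

```c
#include <stddef.h>

/* Three tentative definitions of the same object. At the end of the
   translation unit they behave like a single "int counter = 0;". */
int counter;
int counter;
int counter;

/* Tentative definitions with an incomplete and then a complete array
   type. The composite type at the end of the file is int[4]. */
int table[];
int table[4];

/* Hypothetical helpers, only here to make the results observable. */
int counter_initial_value(void)
{
    return counter;
}

size_t table_length(void)
{
    return sizeof table / sizeof table[0];
}
```

Here counter_initial_value() returns 0 and table_length() returns 4.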

For more on tentative definitions, see this question and its answers.

It seems that in your case the behavior should be undefined: global ends up defined at the end of each translation unit, so you get two definitions of global and, what's worse, they have different types. It looks like the linker does not complain about this by default, though.
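
One common way to stay out of this situation is to keep a single definition and let every other file see only an extern declaration, usually through a shared header. The sketch below squeezes that layout into one file so it can be compiled as is; the header name global.h and the two helper functions are assumptions for illustration. With such a header included, a stray "char global;" in another file is rejected by the compiler as a conflicting type instead of being merged silently. GCC also has the -fno-common option, which turns duplicate tentative definitions across files into a link error.

```c
/* What a shared header (a hypothetical global.h) would contain:
   a declaration only, so no storage is reserved here. */
extern int global;

/* What exactly one .c file would contain: the single definition. */
int global = 0;

/* Stand-ins for code in two different files that include the header.
   Both see the same object with the same type. */
void *address_seen_by_file1(void)
{
    return (void *)&global;
}

void *address_seen_by_file2(void)
{
    return (void *)&global;
}
```

Both helpers return the same address, and the object has the size of an int in every file.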

GNU ld has an option called --warn-common, which warns you about multiple tentative definitions (common symbol is the linker's name for a tentatively defined variable):

$ gcc -Wl,--warn-common file*.c
/tmp/ccjuPGcq.o: warning: common of `global' overridden by larger common
/tmp/ccw6nFHi.o: warning: larger common is here

From the manual:

If there are only (one or more) common symbols for a variable, it goes in the uninitialized data area of the output file. The linker merges multiple common symbols for the same variable into a single symbol. If they are of different sizes, it picks the largest size. The linker turns a common symbol into a declaration, if there is a definition of the same variable.

The --warn-common option can produce five kinds of warnings. Each warning consists of a pair of lines: the first describes the symbol just encountered, and the second describes the previous symbol encountered with the same name. One or both of the two symbols will be a common symbol.
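
The size rule quoted from the manual also answers the question about sizes. As a toy model only (this is not ld's actual code, and the function names are invented), merging common symbols amounts to taking the largest requested size:

```c
#include <stddef.h>

/* Toy model of the quoted rule: the merged common symbol gets the
   largest size requested by any of the object files. */
size_t merged_common_size(const size_t *sizes, size_t count)
{
    size_t largest = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (sizes[i] > largest)
            largest = sizes[i];
    }
    return largest;
}

/* The two requests from the question's example. */
size_t question_example(void)
{
    size_t requested[2];

    requested[0] = sizeof(int);   /* file 1: int global;  */
    requested[1] = sizeof(char);  /* file 2: char global; */
    return merged_common_size(requested, 2);
}
```

For the question's two files this gives sizeof(int), so enough storage is reserved for the int, and the char in file 2 refers to the first byte of that same storage.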

Alok
@alok: +1 Thanks for teaching me the -Wl,--warn-common gcc option.
Phong