Could someone explain exactly the concept of extern variables in C: the declaration, the exact use of extern, and its scope?
Extern is the keyword you use to declare that the variable itself resides in another translation unit.
So you can define a variable in one translation unit and access it from another: in the second one you declare it as extern, and the linker resolves the symbol.
If you don't declare it as extern, the second translation unit gets its own variable with the same name, unrelated to the first, and you will typically get a multiple-definition error from the linker.
An extern variable is a declaration (thanks to sbi for the correction) of a variable which is defined in another translation unit. That means the variable is defined in another file.
Say you have two .c files, test1.c and test2.c. If you define a global variable int test1_var; in test1.c and you'd like to access this variable in test2.c, you have to use extern int test1_var; in test2.c.
Complete sample:
$ cat test1.c
int test1_var = 5;
$ cat test2.c
#include <stdio.h>
extern int test1_var;
int main(void) {
    printf("test1_var = %d\n", test1_var);
    return 0;
}
$ gcc test1.c test2.c -o test
$ ./test
test1_var = 5
Adding extern turns a variable definition into a variable declaration (as long as there is no initializer). See this thread as to what's the difference between a declaration and a definition.
extern tells the compiler to trust you that the memory for this variable is defined elsewhere, so it doesn't try to allocate storage for it.
Therefore, you can compile a file that refers to an extern variable, but you cannot link the program if that variable is not defined somewhere.
Useful for global variables and libraries, but dangerous because the linker does not type check.
Using extern is only of relevance when the program you're building consists of multiple source files linked together, where some of the variables defined, for example, in source file file1.c need to be referenced in other source files, such as file2.c.
It is important to understand the difference between defining a variable and declaring a variable:
- A variable is defined when the compiler allocates the storage for the variable.
- A variable is declared when the compiler is informed that a variable exists (and this is its type); it does not allocate the storage for the variable at that point.
You may declare a variable multiple times (though once is sufficient); you may only define it once within a given scope.
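The distinction can be sketched in a single file. This is only an illustration: the names shared_count and get_shared_count are hypothetical, and in a real program the declarations would normally come from a header while the definition lives in exactly one source file.

```c
/* Declarations: they tell the compiler the name and type only.
   No storage is allocated, and repeating them is harmless. */
extern int shared_count;
extern int shared_count;

/* Definition: storage is allocated (and initialized) here.
   There must be exactly one of these in the whole program. */
int shared_count = 3;

/* Any code that has seen a declaration can use the variable. */
int get_shared_count(void)
{
    return shared_count;
}
```

Removing the definition would still let this file compile, but the link step would fail with an undefined reference to shared_count.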
Best way to declare and define global variables
Although there are other ways of doing it, the clean, reliable way to declare and define global variables is to use a header file file3.h to contain an extern declaration of the variable. The header is included by the one source file that defines the variable and by all the source files that reference the variable. For each program, one source file (and only one source file) defines the variable. Similarly, one header file (and only one header file) should declare the variable.
file3.h
extern int global_variable; /* Declaration of the variable */
file1.c
#include "file3.h" /* Declaration made available here */
/* Variable defined here */
int global_variable = 37; /* Definition checked against declaration */
int increment(void) { return global_variable++; }
file2.c
#include "file3.h"
#include <stdio.h>
void use_it(void)
{
printf("Global variable: %d\n", global_variable++);
}
That's the best way to use them.
Guidelines
Rules to be broken by experts only, and only with good reason:
- A header file only contains extern declarations of variables - never static or unqualified variable definitions.
- For any given variable, only one header file declares it (SPOT - Single Point of Truth).
- A source file never contains extern declarations of variables - source files always include the (sole) header that declares them.
- For any given variable, exactly one source file defines the variable, preferably initializing it too. (Although there is no need to initialize explicitly to zero, it does no harm and can do some good, because there can be only one initialized definition of a particular global variable in a program).
- The source file that defines the variable also includes the header to ensure that the definition and the declaration are consistent.
- A function should never need to declare a variable using extern.
- Avoid global variables whenever possible - use functions instead.
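As a rough sketch of the last guideline, the shared state can be hidden behind functions. The names counter, counter_increment and counter_value are hypothetical, made up for this illustration; the matching header would declare only the two functions, so no extern variable is needed at all.

```c
/* counter.c (hypothetical): the variable has internal linkage,
   so no other source file can name it or declare it extern. */
static int counter = 0;

/* The only ways to touch the state from outside this file. */
void counter_increment(void)
{
    counter++;
}

int counter_value(void)
{
    return counter;
}
```

Because every access goes through these functions, the type and invariants of the variable can change later without touching any other source file.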
Not so good way to define global variables
With some (indeed, many) C compilers, you can get away with what's called a 'common' definition of a variable too. 'Common', here, refers to a technique used in Fortran for sharing variables between source files, using a (possibly named) COMMON block. What happens here is that each of a number of files provides a tentative definition of the variable. As long as no more than one file provides an initialized definition, then the various files end up sharing a common single definition of the variable:
file10.c
int i; /* Do not do this in portable code */
void inc(void) { i++; }
file11.c
int i; /* Do not do this in portable code */
void dec(void) { i--; }
file12.c
#include <stdio.h>
int i = 9; /* Do not do this in portable code */
void put(void) { printf("i = %d\n", i); }
This technique does not conform to the letter of the C standard and the 'one definition rule', but the C standard lists it as a common variation on its one definition rule.
Because this technique is not always supported, it is best to avoid using it, especially if your code needs to be portable. Using this technique, you can also end up with unintentional type punning. If one of the files declared i as a double instead of as an int, C's type-unsafe linkers probably would not spot the mismatch. If you're on a machine with 64-bit int and double, you'd not even get a warning; on a machine with 32-bit int and 64-bit double, you'd probably get a warning about the different sizes - the linker would use the largest size, exactly as a Fortran program would take the largest size of any common blocks.
This is mentioned in the C standard in informative Annex J as a common extension:
J.5.11 Multiple external definitions
There may be more than one external definition for the identifier of an object, with or without the explicit use of the keyword extern; if the definitions disagree, or more than one is initialized, the behavior is undefined (6.9.2).
Warning
As noted in comments here, and as stated in my answer to a similar question, using multiple definitions for a global variable leads to undefined behaviour, which is the standard's way of saying "anything could happen". One of the things that can happen is that the program behaves as you expect; and J.5.11 says, approximately, "you might be lucky more often than you deserve". But a program that relies on multiple definitions of an extern variable - with or without the explicit 'extern' keyword - is not a strictly conforming program and not guaranteed to work everywhere. Equivalently: it contains a bug which may or may not show itself.
Violating the guidelines
Note 1: if the header defines the variable without the extern keyword:
faulty_header.h
int some_var; /* Do not do this in a header!!! */
Then each file that includes the header creates a tentative definition of the variable.
Note 2: if the header defines and initializes the variable, then only one source file in a given program can use the header:
broken_header.h
int some_var = 13; /* Only one source file in a program can use this */
Note 3: if the header defines a static variable (with or without initialization), then each source file ends up with its own private version of the 'global' variable.
seldom_correct.h
static int hidden_global = 3; /* Each source file gets its own copy */
When the variable is actually a complex array, this can lead to extreme duplication of code. It can, very occasionally, be a sensible way to achieve some effect, but that is rather unusual.
Use the header technique I showed first. It works reliably and everywhere. Note, in particular, that the header declaring the global_variable is included in every file that uses it - including the one that defines it. This ensures that everything is self-consistent.
Similar concerns arise with declaring and defining functions - analogous rules apply. But the question was about variables specifically, so I've kept the answer to variables only.
I like to think of an extern variable as a promise that you make to the compiler.
When encountering an extern, the compiler can only find out its type, not where it "lives", so it can't resolve the reference.
You are telling it, "Trust me. At link time this reference will be resolvable."
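One way to see how little the compiler knows is an extern array declared without a size. The sketch below uses hypothetical names (lookup_table, lookup) and keeps everything in one file for brevity; normally the definition at the bottom would sit in a different source file.

```c
#include <stddef.h>

/* The promise: the compiler learns the element type (int), but not
   the address and, in this case, not even the array length. */
extern int lookup_table[];

/* This compiles from the declaration alone; the actual address of
   lookup_table is filled in at link time. */
int lookup(size_t index)
{
    return lookup_table[index];
}

/* The promise kept: a definition that provides the storage. */
int lookup_table[] = { 10, 20, 30 };
```

If no definition existed anywhere in the program, lookup would still compile, and the broken promise would only surface as a linker error.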