When you run g++ -I. -l. ABC.cpp
you are asking the compiler to create an executable out of ABC.cpp
. But the code in this file relies on a function defined in XYZ.cpp
, so the executable cannot be created due to that missing function.
You have two options (depending on what it is that you want to do). Either you give the compiler all of the source files at once so that it has all the definitions, e.g.
g++ -I. -l. ABC.cpp XYZ.cpp
or, you use the -c
option to compile ABC.cpp to object code (.obj on Windows, .o on Linux), which can be linked later, e.g.
g++ -I. -l. -c ABC.cpp
This will produce ABC.o
which can be linked later with XYZ.o
to produce an executable.
Edit: What is the difference between #including and linking?
Understanding this fully requires understanding exactly what happens when you compile a C++ program, which unfortunately even many people who consider themselves to be C++ programmers do not. At a high level, the compilation of a C++ program goes through three stages: preprocessing, compilation, and linking.
Preprocessing
Every line that starts with #
is a preprocessor directive which is evaluated at the preprocessing stage. The #include
directive is literally a copy-and-paste. If you write #include "XYZ.h"
, the preprocessor replaces that line with the entire contents of XYZ.h
(including recursive evaluations of #include
within XYZ.h
).
The purpose of including is to make declarations visible. In order to use the function GetOneGaussianByBoxMuller
, the compiler needs to know that GetOneGaussianByBoxMuller
is a function, what (if any) arguments it takes, and what value it returns. To learn that, the compiler needs to see a declaration for it. Declarations go in header files, and header files are included to make declarations visible to the compiler before the point of use.
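As a rough sketch of what the compiler ends up seeing after preprocessing, here is a single stream of code with the declaration ahead of the use. The signature of GetOneGaussianByBoxMuller, the signature and body of ABC, and the stand-in function body are all assumptions made for illustration; your real code may differ.

```cpp
// What #include "XYZ.h" would paste in: a declaration only.
// (The signature is an assumption for this sketch.)
double GetOneGaussianByBoxMuller();

// The part that would live in ABC.cpp. This compiles because the
// declaration above tells the compiler the return type and parameters.
// (ABC's signature and body are hypothetical.)
double ABC() {
    return 2.0 * GetOneGaussianByBoxMuller();
}

// The part that would live in XYZ.cpp: the definition.
// Stand-in body only, NOT a real Box-Muller implementation.
double GetOneGaussianByBoxMuller() {
    return 0.5;
}
```

Removing the first declaration would make the call inside ABC fail to compile, even though the definition exists further down.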
Compiling
This is the part where the compiler runs and turns your source code into machine code. Note that machine code is not the same thing as executable code. An executable requires additional information about how to load the machine code and the data into memory, and how to bring in external dynamic libraries if necessary. That's not done here. This is just the part where your code goes from C++ to raw machine instructions.
Unlike Java, Python, and some other languages, C++ traditionally has no concept of a "module" (C++20 added modules, but the classic header-based model described here does not use them). Instead, C++ works in terms of translation units. In nearly all cases, a translation unit corresponds to a single (non-header) source code file, e.g. ABC.cpp
or XYZ.cpp
. Each translation unit is compiled independently (whether you run separate -c
commands for them, or you give them to the compiler all at once).
When a source file is compiled, the preprocessor runs first, and does the #include
copy-pasting, as well as macro expansion and the other things the preprocessor does. The result is one long stream of C++ code consisting of the contents of the source file and everything included by it (and everything included by what it included, etc.). This long stream of code is the translation unit.
When the translation unit is compiled, every function and every variable used must be declared. The compiler will not allow you to call a function for which there is no declaration or to use a global variable for which there is no declaration, because then it wouldn't know the types, parameters, return values, etc, involved and could not generate sensible code. That's why you need headers -- keep in mind that at this point the compiler is not even remotely aware of the existence of any other source files; it is only considering this stream of code produced by the processing of the #include
directives.
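The same rule applies to global variables. In this minimal sketch (all names are hypothetical), an extern declaration is enough for the compiler to generate code that uses the variable, and the definition that actually reserves storage can come later or live in another source file.

```cpp
// Declaration only: tells the compiler the name and type, reserves nothing.
// A header would typically carry this line. (Name is hypothetical.)
extern int g_sampleCount;

// Code using the variable compiles with just the declaration visible.
void RecordSample() {
    ++g_sampleCount;
}

// Definition: this is where storage is actually reserved. In a real
// project it could sit in a different .cpp file entirely.
int g_sampleCount = 0;
```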
In the machine code produced by the compiler, there are no such things as variable names or function names. Everything must become a memory address. Every global variable must be translated to a memory address where it is stored, and every function must have a memory address that the flow of execution jumps to when it is called. For things that are defined (i.e. for functions, implemented) in the translation unit, the compiler can assign an address. For things that are only declared (usually as a result of included headers) and not defined, the compiler does not at this point know what the memory address should be. These functions and global variables for which the compiler has only a declaration but not a definition/implementation, are called external symbols, and they are presumed to exist in a different translation unit. For now, their memory addresses are represented with placeholders.
For example, when compiling the translation unit corresponding to ABC.cpp
, it has a definition (implementation) of ABC
, so it can assign an address to the function ABC
and wherever in that translation unit ABC
is called, it can create a jump instruction to that address. On the other hand, although its declaration is visible, GetOneGaussianByBoxMuller
is not implemented in that translation unit, so its address must be represented with a placeholder.
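One way to see that a function name ends up as nothing more than an address is to take that address explicitly. This is only an illustration; the signature and body of ABC are assumed here.

```cpp
// Hypothetical function defined in this translation unit, so the
// compiler can assign it an address.
double ABC() {
    return 1.0;
}

// Calls ABC without using its name at the call site: the call goes
// through a pointer that holds the function's address.
double CallThroughAddress() {
    double (*address)() = &ABC;
    return address();
}
```

For a function that is only declared, the compiler emits the same kind of call but leaves the address to be filled in later.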
The result of compiling a translation unit is an object file (with the .o
suffix on Linux).
Linking
One of the main jobs of the linker is to resolve external symbols. That is, the linker looks through a set of object files, sees what their external symbols are, and then tries to find out what memory address should be assigned to them, replacing the placeholder.
In your case the function GetOneGaussianByBoxMuller
is defined in the translation unit corresponding to XYZ.cpp
, so inside XYZ.o
it has been assigned a specific memory address. In the translation unit corresponding to ABC.cpp
, it was only declared, so inside ABC.o
, it is only a placeholder (external symbol). The linker, if given both ABC.o
and XYZ.o
will see that ABC.o
needs an address filled in for GetOneGaussianByBoxMuller
, and it will find that address in XYZ.o
, and replace the placeholder in ABC.o
with it. Addresses for external symbols can also be found in libraries.
If the linker fails to find an address for GetOneGaussianByBoxMuller
(as it does in your example where it is only working on ABC.o
, as a result of not having passed XYZ.cpp
to the compiler), it will report an unresolved external symbol error, also described as an undefined reference.
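The resolution step can be pictured with a toy model. This is not how a real linker is implemented; the ObjectFile struct, the helper functions, and the addresses are all invented for illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an object file: the symbols it defines (with a
// made-up address) and the external symbols it still needs.
struct ObjectFile {
    std::string name;
    std::map<std::string, int> defined;  // symbol -> made-up address
    std::set<std::string> undefined;     // placeholders to fill in
};

// Returns every external symbol that none of the given object files
// defines. A real linker would report these as unresolved.
std::vector<std::string> UnresolvedSymbols(const std::vector<ObjectFile>& objects) {
    std::set<std::string> allDefined;
    for (const ObjectFile& obj : objects)
        for (const auto& entry : obj.defined)
            allDefined.insert(entry.first);

    std::vector<std::string> missing;
    for (const ObjectFile& obj : objects)
        for (const std::string& symbol : obj.undefined)
            if (allDefined.count(symbol) == 0)
                missing.push_back(symbol);
    return missing;
}

// Toy versions of the two object files discussed above.
ObjectFile MakeAbcObject() {
    return {"ABC.o", {{"ABC", 0x1000}}, {"GetOneGaussianByBoxMuller"}};
}

ObjectFile MakeXyzObject() {
    return {"XYZ.o", {{"GetOneGaussianByBoxMuller", 0x2000}}, {}};
}
```

Given only the toy ABC.o, UnresolvedSymbols returns GetOneGaussianByBoxMuller; given both object files, it returns an empty list, which mirrors the difference between your failing command and the two-file one.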
Finally, once the linker has resolved all external symbols, it combines all of the now-placeholder-free object code, adds in all the loading information that the operating system needs, and produces an executable. Tada!
Note that through all of this, the names of the files don't matter one bit. It's a convention that XYZ.h
should contain declarations for things that are defined in XYZ.cpp
, and it's good for maintainable code to organize things that way, but the compiler and linker don't care one bit whether that's true or not. The linker will look through all the object files it's given and only the object files it's given to try to resolve a symbol. It neither knows nor cares which header the declaration of the symbol was in, and it will not try to automatically pull in other object files or compile other source files in order to resolve a missing symbol.
... wow, that was long.