I guess I'm not linking something right?

I want to compile and run ABC.cpp, which needs XYZ.h and XYZ.cpp. All three files are in my current directory, and I've tried #include <XYZ.h> as well as #include "XYZ.h".

Running $ g++ -I. -l. ABC.cpp in the Ubuntu 10 terminal gives me:

/tmp/ccCneYzI.o: In function `ABC(double, double, unsigned long)':
ABC.cpp:(.text+0x93): undefined reference to `GetOneGaussianByBoxMuller()'
collect2: ld returned 1 exit status

Here's a summary of ABC.cpp:

#include "XYZ.h"
#include <iostream>
#include <cmath>

using namespace std;

double ABC(double X, double Y, unsigned long Z)
{
...stuff...
}

int main()
{
...cin, ABC(cin), return, cout...
}

Here's XYZ.h:

#ifndef XYZ_H
#define XYZ_H

double GetOneGaussianByBoxMuller();
#endif

Here's XYZ.cpp:

#include "XYZ.h"
#include <cstdlib>
#include <cmath>

// basic math functions are in std namespace but not in Visual C++ 6
//(comment's in code but I'm using GNU, not Visual C++)

#if !defined(_MSC_VER)
using namespace std;
#endif



double GetOneGaussianByBoxMuller()
{
...stuff...
}

I'm using GNU Compiler version g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3.

This is my first post; I hope I included everything that someone would need to know to help me. I have actually read the "Related Questions" and the Gough article listed in one of the responses, and I have searched around for the error message. However, I still can't figure out how it applies to my problem.

Thanks in advance!

+3  A: 

Try

g++ ABC.cpp XYZ.cpp

If you want to compile them separately, you need to build object files first and then link them:

g++ -c ABC.cpp
g++ -c XYZ.cpp
g++ ABC.o XYZ.o
Martin York
Wow ... _really_ fast response, thank you so much. Let me try that out and get back to you about how it worked out.
Lao Tzu
That worked! Thank you. Reading the definition of the -c flag from the GCC manual, I see: "Compile or assemble the source files, but do not link. The linking stage simply is not done. The ultimate output is in the form of an object file for each source file." So, I can read it, but I don't understand it. Anyway, thank you!
Lao Tzu
+3  A: 

When you run g++ -I. -l. ABC.cpp, you are asking the compiler to create an executable out of ABC.cpp. But the code in this file relies on a function defined in XYZ.cpp, so the executable cannot be created because that function's definition is missing.

You have two options (depending on what it is that you want to do). Either you give the compiler all of the source files at once so that it has all the definitions, e.g.

 g++ -I. -l. ABC.cpp XYZ.cpp

or you use the -c option to compile ABC.cpp to object code (.obj on Windows, .o on Linux), which can be linked later, e.g.

 g++ -I. -l. -c ABC.cpp

This will produce ABC.o, which can later be linked with XYZ.o (built the same way from XYZ.cpp) to produce an executable.

Edit: What is the difference between #including and linking?

Understanding this fully requires understanding exactly what happens when you compile a C++ program, which unfortunately even many people who consider themselves to be C++ programmers do not. At a high level, the compilation of a C++ program goes through three stages: preprocessing, compilation, and linking.

Preprocessing

Every line that starts with # is a preprocessor directive which is evaluated at the preprocessing stage. The #include directive is literally a copy-and-paste. If you write #include "XYZ.h", the preprocessor replaces that line with the entire contents of XYZ.h (including recursive evaluations of #include within XYZ.h).
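A minimal sketch of the copy-and-paste behaviour, and of why XYZ.h is wrapped in #ifndef XYZ_H: below is roughly what the compiler would receive if a guarded header were pasted in twice (for example, once directly and once via another header). The header name grid.h, the macro GRID_H and the struct Cell are hypothetical. A struct is used because a class definition may appear only once per translation unit, whereas a repeated function declaration like the one in XYZ.h would be harmless even without the guard.

```cpp
#include <cassert>

// --- First paste of a hypothetical header, grid.h ---
#ifndef GRID_H
#define GRID_H
struct Cell
{
    int row;
    int column;
};
#endif

// --- Second paste of the same header (e.g. pulled in via another header) ---
// GRID_H is already defined, so the preprocessor skips everything up to the
// matching #endif and Cell is not redefined.
#ifndef GRID_H
#define GRID_H
struct Cell
{
    int row;
    int column;
};
#endif

// The single surviving definition of Cell can be used normally.
int CellIndex(const Cell& c, int columnsPerRow)
{
    return c.row * columnsPerRow + c.column;
}
```

Without the #ifndef/#define/#endif lines, the second paste would be a redefinition of struct Cell and the translation unit would fail to compile.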

The purpose of including is to make declarations visible. In order to use the function GetOneGaussianByBoxMuller, the compiler needs to know that GetOneGaussianByBoxMuller is a function, what (if any) arguments it takes, and what value it returns; to know all of that, it needs to see a declaration for it. Declarations go in header files, and header files are included to make declarations visible to the compiler before the point of use.
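To illustrate that a declaration alone is enough for the compiler to accept a call, here is a sketch in a single file: the declaration line is exactly what XYZ.h provides, and the call in the hypothetical helper TwiceAGaussian compiles because of it. The definition is placed in the same file only so the sketch links on its own; in the question it lives in XYZ.cpp, and the fixed return value here is a stand-in, not a Box-Muller sampler.

```cpp
#include <cassert>

// Declaration: exactly what XYZ.h provides. It tells the compiler the name,
// the parameter list and the return type, and nothing more.
double GetOneGaussianByBoxMuller();

// Hypothetical helper. This call compiles because the declaration above is
// visible, even though no function body has been seen yet.
double TwiceAGaussian()
{
    return 2.0 * GetOneGaussianByBoxMuller();
}

// Definition (function body): normally in XYZ.cpp. A fixed value is returned
// purely so the sketch is deterministic.
double GetOneGaussianByBoxMuller()
{
    return 0.5;
}
```

Deleting the declaration line would make the call in TwiceAGaussian a compile error; deleting only the definition would let this file compile but fail at link time with the same kind of undefined reference as in the question.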

Compiling

This is the part where the compiler runs and turns your source code into machine code. Note that machine code is not the same thing as executable code. An executable requires additional information about how to load the machine code and the data into memory, and how to bring in external dynamic libraries if necessary. That's not done here. This is just the part where your code goes from C++ to raw machine instructions.

Unlike Java, Python, and some other languages, C++ has no concept of a "module". Instead, C++ works in terms of translation units. In nearly all cases, a translation unit corresponds to a single (non-header) source code file, e.g. ABC.cpp or XYZ.cpp. Each translation unit is compiled independently (whether you run separate -c commands for them, or you give them to the compiler all at once).

When a source file is compiled, the preprocessor runs first, and does the #include copy-pasting as well as macros and other things that the preprocessor does. The result is one long stream of C++ code consisting of the contents of the source file and everything included by it (and everything included by what it included, etc.). This long stream of code is the translation unit.

When the translation unit is compiled, every function and every variable used must be declared. The compiler will not allow you to call a function for which there is no declaration, or to use a global variable for which there is no declaration, because then it wouldn't know the types, parameters, return values, etc., involved and could not generate sensible code. That's why you need headers. Keep in mind that at this point the compiler is not even remotely aware of the existence of any other source files; it is only considering the stream of code produced by processing the #include directives.

In the machine code produced by the compiler, there are no such things as variable names or function names. Everything must become a memory address. Every global variable must be translated to a memory address where it is stored, and every function must have a memory address that the flow of execution jumps to when it is called. For things that are defined in the translation unit (i.e., for functions, implemented there), the compiler can assign an address. For things that are only declared (usually as a result of included headers) and not defined, the compiler does not yet know what the memory address should be. These functions and global variables, for which the compiler has only a declaration but no definition/implementation, are called external symbols, and they are presumed to exist in a different translation unit. For now, their memory addresses are represented with placeholders.
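The same declaration/definition split applies to global variables, where the extern keyword produces a declaration without a definition. A minimal sketch, with the hypothetical names samplesDrawn and RecordSample:

```cpp
#include <cassert>

// Declaration only: "an int named samplesDrawn exists somewhere".
// No storage is reserved by this line.
extern int samplesDrawn;

// Compiles using only the declaration above. If the definition below lived
// in another .cpp file, the compiler would emit a placeholder address here
// for the linker to fill in.
void RecordSample()
{
    ++samplesDrawn;
}

// Definition: this is what actually reserves storage for the variable.
// It should appear in exactly one translation unit.
int samplesDrawn = 0;
```

If the extern line were in a header and no object file handed to the linker contained the definition, you would get an undefined reference to samplesDrawn, just as the question gets one for GetOneGaussianByBoxMuller.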

For example, when compiling the translation unit corresponding to ABC.cpp, the compiler has a definition (implementation) of ABC, so it can assign an address to the function ABC, and wherever ABC is called in that translation unit, it can create a jump instruction to that address. On the other hand, although the declaration of GetOneGaussianByBoxMuller is visible, the function is not implemented in that translation unit, so its address must be represented with a placeholder.

The result of compiling a translation unit is an object file (with the .o suffix on Linux).

Linking

One of the main jobs of the linker is to resolve external symbols. That is, the linker looks through a set of object files, sees what their external symbols are, and then tries to find out what memory address should be assigned to them, replacing the placeholder.

In your case, the function GetOneGaussianByBoxMuller is defined in the translation unit corresponding to XYZ.cpp, so inside XYZ.o it has been assigned a specific memory address. In the translation unit corresponding to ABC.cpp, it was only declared, so inside ABC.o it is only a placeholder (an external symbol). The linker, if given both ABC.o and XYZ.o, will see that ABC.o needs an address filled in for GetOneGaussianByBoxMuller, find that address in XYZ.o, and replace the placeholder in ABC.o with it. Addresses for external symbols can also be found in libraries.

If the linker fails to find an address for GetOneGaussianByBoxMuller (as happens in your example, where it is working only on ABC.o because XYZ.cpp was never passed to the compiler), it will report an unresolved external symbol error, also described as an undefined reference.
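To make the linker's bookkeeping concrete, here is a deliberately simplified model of symbol resolution. It is not how ld is implemented: real object files also contain machine code, relocation entries and mangled symbol names, and the struct ObjectFile and the function UnresolvedSymbols are hypothetical. It only captures the rule described above: a needed symbol is resolved if, and only if, one of the object files handed to the linker defines it.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model of an object file: just the names of the symbols it defines
// and the external symbols it needs.
struct ObjectFile
{
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> needs;
};

// Returns every needed symbol that none of the given object files defines.
// Like the real linker, it looks only at the files it is handed; it never
// goes looking for XYZ.cpp or XYZ.o on its own.
std::set<std::string> UnresolvedSymbols(const std::vector<ObjectFile>& objects)
{
    std::set<std::string> defined;
    for (const ObjectFile& obj : objects)
        defined.insert(obj.defines.begin(), obj.defines.end());

    std::set<std::string> missing;
    for (const ObjectFile& obj : objects)
        for (const std::string& symbol : obj.needs)
            if (defined.count(symbol) == 0)
                missing.insert(symbol);
    return missing;
}
```

In this model, passing only ABC.o (which is effectively what g++ -I. -l. ABC.cpp does) leaves GetOneGaussianByBoxMuller() unresolved, and adding XYZ.o resolves it.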

Finally, once the linker has resolved all external symbols, it combines all of the now placeholder-free object code, adds in all the loading information that the operating system needs, and produces an executable. Tada!

Note that through all of this, the names of the files don't matter one bit. It's a convention that XYZ.h should contain declarations for things that are defined in XYZ.cpp, and organizing things that way is good for maintainable code, but the compiler and linker don't care whether it's true or not. The linker will look through all the object files it's given, and only the object files it's given, to try to resolve a symbol. It neither knows nor cares which header the declaration of the symbol was in, and it will not try to automatically pull in other object files or compile other source files in order to resolve a missing symbol.

... wow, that was long.

Tyler McHenry
Wow ... _really_ fast response; thank you so much. Let me try that out and get back about how it worked out.
Lao Tzu
Thanks for this explanation. It makes more sense now. I still don't understand the difference between linking and including. To me it seems like the compiler just has to have a list of the objects, and then it can go ahead and do what it needs with those definitions. Anyway, thanks! It works now.
Lao Tzu
@Lao Tzu: the difference is that including just copies and pastes the contents of the included file (XYZ.h) into your source file. That's all. To produce an executable, the compiler needs to see the *definition* of the function, i.e. the function body. That's in XYZ.cpp, not XYZ.h. You can give this information to the compiler by showing it all the source files at once, or by compiling the source files separately and then linking them together later. Which you do depends on the needs of your build procedures.
Steve Jessop
If you've used Java, you'll know that `javac` goes out hunting into the filesystem to find the source files of the dependencies of your code, and compiles them too, because it knows the name and location of the .java file from the name and package of the class you need. C++ compilers do not do that - at absolute best they can find any .h files named in `#include` statements on your include path, and any libraries named on the command line in your library path. They never assume for themselves that a function is to be found in `ABC.o`, let alone `ABC.cpp`, just because it's declared in `ABC.h`.
Steve Jessop
I haven't used Java, but about half of that makes sense anyway. You're saying it wouldn't make sense for the compiler to look in `QWERTY.cpp` for objects that are mentioned in `QWERTY.h`, because who knows, the object might be in `YTREWQ.cpp` instead. But I don't _exactly_ know what you mean when you say `include` copy-pastes the contents and `link` copy-pastes the definition. Do you mean that `link`ing pastes some kind of machine-readable code while `include` just pastes some more instructions? If so, what's the use of `include`ing anything?
Lao Tzu
I don't have enough room to explain here so I'm going to explain the difference between include and link in an edit to this answer.
Tyler McHenry