I've always wondered. I know that compilers convert the code you write into binaries but what do linkers do? They've always been a mystery to me.

I roughly understand what 'linking' is. It is when references to libraries and frameworks are added to the binary. I don't understand anything beyond that. For me it "just works". I also understand the basics of dynamic linking but nothing too deep.

Could someone explain the terms?

+6  A: 

In languages like 'C', individual modules of code are traditionally compiled separately into blobs of object code, which is ready to execute in every respect other than that all the references that module makes outside itself (i.e. to libraries or to other modules) have not yet been resolved (i.e. they're blank, pending someone coming along and making all the connections).

What the linker does is to look at all the modules together, look at what each module needs to connect to outside itself, and look at all the things it is exporting. It then fixes that all up, and produces a final executable, which can then be run.
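To make the "look at what each module needs and what it exports" step concrete, here is a minimal sketch of symbol resolution. It is only a toy model: the `struct module` layout, the fixed list size of 4, and the function names are made up for illustration and do not correspond to any real object file format.

```c
#include <stddef.h>
#include <string.h>

/* A toy "object file": names it defines and names it still needs.
   (Hypothetical layout, not a real object file format.) */
struct module {
    const char *name;
    const char *exports[4];  /* NULL-terminated list of defined symbols */
    const char *imports[4];  /* NULL-terminated list of unresolved references */
};

/* Return the index of the module that defines `symbol`, or -1 if none does. */
int find_definition(const struct module *mods, size_t count, const char *symbol)
{
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < 4 && mods[i].exports[j] != NULL; j++)
            if (strcmp(mods[i].exports[j], symbol) == 0)
                return (int)i;
    return -1;
}

/* Count imports that no module satisfies. A real linker would report each
   of these as an undefined reference and refuse to produce an executable. */
int count_unresolved(const struct module *mods, size_t count)
{
    int unresolved = 0;
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < 4 && mods[i].imports[j] != NULL; j++)
            if (find_definition(mods, count, mods[i].imports[j]) < 0)
                unresolved++;
    return unresolved;
}
```

If `count_unresolved` returns 0, every outside reference has a matching definition somewhere, which is the point at which a linker can go on to produce the final executable.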

Where dynamic linking is also going on, the output of the linker is still not capable of being run - there are still some references to external libraries not yet resolved, and they get resolved by the OS at the time it loads the app (or possibly even later during the run).
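The load-time part can be sketched in the same spirit. In this toy model the program keeps a table of import slots that start out empty, and a function standing in for the OS loader fills them from a library's export table. Everything here (`toy_library`, `import_slot`, `bind_imports`) is invented for the example; real loaders work on the platform's executable format, not on C structs.

```c
#include <stddef.h>
#include <string.h>

typedef int (*func_ptr)(int);

/* What a (hypothetical) shared library exports: names paired with code. */
struct export_entry { const char *name; func_ptr address; };

/* One slot in the program's import table; `address` is NULL until bound. */
struct import_slot { const char *name; func_ptr address; };

static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }

static const struct export_entry toy_library[] = {
    { "twice",  twice  },
    { "square", square },
};

/* Play the role of the OS loader: fill each import slot from the library's
   exports. Returns the number of slots that could not be bound. */
int bind_imports(struct import_slot *slots, size_t count)
{
    int missing = 0;
    size_t lib_size = sizeof toy_library / sizeof toy_library[0];
    for (size_t i = 0; i < count; i++) {
        slots[i].address = NULL;
        for (size_t j = 0; j < lib_size; j++)
            if (strcmp(slots[i].name, toy_library[j].name) == 0)
                slots[i].address = toy_library[j].address;
        if (slots[i].address == NULL)
            missing++;
    }
    return missing;
}
```

Until `bind_imports` has run, calling through a slot would dereference a null pointer, which mirrors why the linker's output cannot run on its own when dynamic linking is involved.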

Will Dean
It's worth noting that some assemblers or compilers can output an executable file directly if the compiler "sees" everything necessary (typically in a single source file plus anything it #includes). A few compilers, typically for small micros, have that as their only mode of operation.
supercat
Yes, I tried to give a middle-of-the-road answer. Of course, as well as your case the opposite is true too, in that some kinds of object file don't even have the full code-generation done; that's done by the linker (that's how MSVC whole-program-optimisation works).
Will Dean
A: 

Hmmmm... Wikipedia to the rescue?

vanza
+2  A: 

When the compiler produces an object file, it includes entries for symbols that are defined in that object file, and references to symbols that aren't defined in that object file. The linker takes those and puts them together so (when everything works right) all the external references from each file are satisfied by symbols that are defined in other object files.

It then combines all those object files together and assigns addresses to each of the symbols, and where one object file has an external reference to another object file, it fills in the address of each symbol wherever it's used by another object. In a typical case, it'll also build a table of any absolute addresses used, so the loader can/will "fix up" the addresses when the file is loaded (i.e., it'll add the base load address to each of those addresses so they all refer to the correct memory address).
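The "fix up" step done by the loader can be sketched roughly as follows. This is a simplification under stated assumptions: the image is modeled as an array of 32-bit words, and the relocation table holds word indices rather than byte offsets. The name `apply_relocations` is made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the load address to every word the relocation table points at.
   `relocs` holds indices into `image` (not byte offsets) to keep this simple.
   Words not listed in `relocs` are left untouched. */
void apply_relocations(uint32_t *image, const size_t *relocs,
                       size_t reloc_count, uint32_t load_base)
{
    for (size_t i = 0; i < reloc_count; i++)
        image[relocs[i]] += load_base;
}
```

For instance, with a load base of 0x400000, a word holding the linker-assigned address 0x10 becomes 0x400010, while words that are plain data and are not in the table keep their values.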

Quite a few modern linkers can also carry out some (in a few cases a lot) of other "stuff", such as optimizing the code in ways that are only possible once all of the modules are visible (e.g., removing functions that were included because it was possible that some other module might call them, but once all the modules are put together it's apparent that nothing ever calls them).
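One way to picture the unused-function removal is as a reachability walk over a call graph, starting from the entry point. The sketch below assumes a tiny call matrix of at most `MAX_FUNCS` functions; real linkers typically work on sections and relocation records rather than a matrix like this, so treat it purely as an illustration of the idea.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 8

/* calls[i][j] is true when function i contains a call to function j.
   Marks `func` and everything it can reach, directly or indirectly. */
void mark_reachable(bool calls[MAX_FUNCS][MAX_FUNCS], size_t count,
                    size_t func, bool reachable[MAX_FUNCS])
{
    if (reachable[func])
        return;                       /* already visited */
    reachable[func] = true;
    for (size_t callee = 0; callee < count; callee++)
        if (calls[func][callee])
            mark_reachable(calls, count, callee, reachable);
}

/* Return how many functions nothing reaches when starting from `entry`;
   those are the ones that could be dropped from the final executable. */
size_t count_removable(bool calls[MAX_FUNCS][MAX_FUNCS], size_t count,
                       size_t entry)
{
    bool reachable[MAX_FUNCS] = { false };
    size_t removable = 0;
    mark_reachable(calls, count, entry, reachable);
    for (size_t i = 0; i < count; i++)
        if (!reachable[i])
            removable++;
    return removable;
}
```

Note that a function called only by other unreachable functions is itself unreachable, which is why this needs the whole program in view and cannot be done one module at a time.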

Jerry Coffin
+6  A: 

To understand linkers, it helps to first understand what happens "under the hood" when you convert a source file (such as a C or C++ file) into an executable file (a file that can be executed on your machine or someone else's machine running the same architecture).

Under the hood, when a program is compiled, the compiler converts the source file into object code. Object code is machine code for your particular computer architecture, except that references to anything defined outside the file are left unresolved. Traditionally, these files have an .OBJ extension on DOS and Windows, and a .o extension on Unix-like systems.

After the object file is created, the linker comes into play. More often than not, a real program that does anything useful will need to reference other files. In C, for example, a simple program to print your name to the screen would contain:

printf("Hello Nick!\n");

When the compiler compiled your program into an OBJ file, it simply put in a reference to the printf function. The linker resolves this reference. Most programming languages have a standard library of routines to cover the basic stuff expected from that language. The linker links your OBJ file with this standard library. The linker can also link your OBJ file with other OBJ files. You can create other OBJ files that have functions that can be called by another OBJ file. The linker works almost like a word processor's copy and paste. It "copies" out all the necessary functions your program references and creates a single executable.
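As a small illustration of a reference that the compiler leaves for the linker, here is a sketch of a file that uses a standard library routine without defining it. The function name `make_greeting` is made up for the example, and it uses snprintf rather than printf so the result can be inspected in a buffer.

```c
#include <stdio.h>

/* Format a greeting into `buf`. This file declares snprintf (via <stdio.h>)
   but never defines it; the object file only records that it needs it.
   Returns the number of characters written, not counting the terminator. */
int make_greeting(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, "Hello %s!\n", name);
}
```

If you compile this on its own with something like `cc -c greet.c` (the file name is an assumption) and inspect the result with a tool such as `nm`, the library routine will typically show up as an undefined symbol, while `make_greeting` shows up as defined. The exact symbol name can vary with compiler settings.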

Note that not every program ends up as a single self-contained executable. Windows, for example, uses DLLs that keep shared functions together in a separate file. This reduces the size of your executable, but makes your executable dependent on those specific DLLs. DOS used to use things called overlays (.OVL files). These had many purposes, but one was to keep commonly used functions together in one file (another purpose, in case you're wondering, was to fit large programs into memory. DOS had limited memory, and overlays could be "unloaded" from memory so that other overlays could be "loaded" on top of that memory, hence the name "overlays"). Linux has shared libraries, which are basically the same idea as DLLs (hard-core Linux guys I know would tell me there are MANY BIG differences).

Hope this helps!

icemanind
Great answer. Additionally, most modern linkers will remove redundant code such as duplicate template instantiations.
Noah Roberts