tags:

views:

290

answers:

4

This question is based on a previous question: http://stackoverflow.com/questions/1917935/how-does-c-compilation-get-around-needing-header-files.

Confirmation that C# compilation makes use of multiple passes essentially answers my original question. Also, the answers indicated that C# uses type and method signature metadata stored in assemblies to check code syntax at compile time.

Q: how does C/C++/Objective-C know what code to load at run time that was linked at compile-time? And to tie it into a technology I'm familiar with, how does C#/CLR do this?

Correct me if I'm wrong, but for C#/CLR, my intuitive understanding is that certain paths are checked for assemblies upon execution, and basically all code is loaded and linked dynamically at run time.

Edit: Updated to include C++ and Objective-C with C.

Update: To clarify, what I really am curious about is how C/C++/Objective-C compilation matches an "externally defined" symbol in my source with the actual implementation of that code, what is the compilation output, and basically how the compilation output is executed by the microprocessor to seamlessly pass control into the library code (in terms of instruction pointer). I have done this with the CLR virtual machine, but am curious to know how this works conceptually in C++/Objective-C on an actual microprocessor.

+1  A: 

The C and C++ Standards have nothing to say about run-time loading - this is entirely OS-specific. In the case of Windows, one links the code with an export library (generated when a DLL is created) that contains the names of functions and the name of the DLL they are in. The linker creates stubs in the code containing this information. At run-time, these stubs are used by the C/C++ runtime together with the Windows LoadLibrary() and associated functions to load the function code into memory and execute it.

anon
To clarify, what I really am curious about is how a C/C++/ObjC program matches an "externally defined" symbol in one executable or library with the actual implementation of that code, and how that compares with C#/CLR. It's been a while since I compiled anything in C/C++/ObjC. So are you saying that for C/C++, but it sounds like all dependent library code is linked together at compile time and we end up with one monolothic executable? But Windows compilation does special stuff to achieve it's dynamic linking...
gWiz
We end up with one executable, but some of the executable code remains in the separate DLLs. And yes, there is "special stuff" in the Windows compile/link process, as there would be in the (for instance) Linux compile/link process. This special stuff is not specified by the programming language standards.
anon
A: 

By libraries you are referring to DLLs right?

There are certain patterns for OS to look for required files (usually start from application local path, then proceed to folder specify by environment variable <PATH>.)

YeenFei
Not necessarily. I might be using incorrect terminology. I am primarily a C# developer and a lot of this is abstracted away thanks to the CLR. By "libraries" I simply mean pre-existing code (in C/C++/ObjC that is pre-compiled object code; in CLR that's pre-compiled byte-code).
gWiz
+2  A: 

The linker plays an essential role in C/C++ building to resolve external dependencies. .NET languages don't use a linker.

There are two kinds of external dependencies, those whose implementation is available at link time in another .obj or .lib file offered as input to the linker. And those that are available in another executable module. A DLL in Windows.

The linker resolves the first ones at link time, nothing complicated happens since the linker will know the address of the dependency. The latter step is highly platform dependent. On Windows, the linker must be provided with an import library. A pretty simple file that merely declares the name of the DLL and a list of the exported definitions in the DLL. The linker resolves the dependency by entering a jump in the code and adding a record to the external dependency table that indicates the jump location so that it can be patched at runtime. The loading of the DLL and setting up the import table is done at runtime by the Windows loader. This is a bird's-eye view of the process, there are many boring details to make this happen as quickly as possible.

In managed code all of this is done at runtime, driven by the JIT compiler. It translates IL into machine code, driven by program execution. Whenever code executes that references another type, the JIT compiler springs into action, loads the type and translates the called method of the type. A side-effect of loading the type is loading the assembly that contains the type, if it wasn't loaded before.

Notable too is the difference for external dependencies that are available at build time. A C/C++ compiler compiles one source file at a time, the dependencies are resolved by the linker. A managed compiler normally takes all source files that create an assembly as input instead of compiling them one at a time. Separate compilation and linking is in fact supported (.netmodule and al.exe) but is not well supported by available tools and thus rarely done. Also, it cannot support features like extension methods and partial classes. Accordingly, a managed compiler needs many more system resources to get the job done. Readily available on modern hardware. The build process for C/C++ was established in an era where those resources were not available.

Hans Passant
Very interesting read
Chris Marisic
Awesome, thank you (accepted for including the CLR comparison).
gWiz
Cool. Didn't know about .netmodule.
luiscubal
+2  A: 

I believe the process you're asking about is the one called symbol resolution. In the common case, it works along these lines (I've tried to keep it pretty OS-neutral):

  1. The first step is compiling of individual source files to create object files. The source code is turned machine language instructions, and any symbols (ie. function or external variable names) that aren't defined in the source file itself result in placeholders being left in the compiled machine language code, wherever they are referenced. The unknown symbol is also added to a list in the object file - at the end of compilation, this list contains every unresolved symbol in the object file, cross-referenced with the location in the object file of all the placeholders that were added. Each object file also contains a list of the symbols exported by that object file - that is, the symbols defined in that object file that it wants to make visible to code outside that object file - along with the values of those symbols.

  2. The second step is static linking. This also happens at compile-time. During the static linking process, all of the object files created in the first step and any static library files (which are just a special kind of object file) are combined into a single executable. The static linker does a pass through the symbols exported by each object file and static library it has been told to link together, and builds a complete list of the exported symbols (and their values). It then does a pass through the unresolved symbols in each object file, and where the symbol is found in the master list, replaces all of the placeholders with the actual value of the symbol. For any symbols that still remain unresolved at the end of this process, the linker looks through the list of symbols exported by all dynamic libraries it knows about. It builds a list of dynamic libraries that are required, and stores this in the executable. If any symbols still haven't been found, the link process fails.

  3. The third step is dynamic linking, which happens at run time. The dynamic linker loads the dynamic libraries in the list contained in the executable, and replaces the placeholders for the remaining unresolved symbols with their corresponding values from the dynamic libraries. This can either be done "eagerly" - after the executable loads but before it runs - or "lazily", which is on-demand, when an unresolved symbol is first accessed.

caf
Thank you, very informative.
gWiz