views:

187

answers:

3

Hello. I am trying to fully understand the process pro writing code in some language to execution by OS. In my case, the language would be C and the OS would be Windows. So far, I read many different articles, but I am not sure, whether I understand the process right, and I would like to ask you if you know some good articles on some subjects I couldn´t find.

So, what I think I know about C (and basically other languages):

C compiler itself handles only data types, basic math operations, pointers operations, and work with functions. By work with functions I mean how to pass argument to it, and how to get output from function. During compilation, function call is replaced by passing arguments to stack, and than if function is not inline, its call is replaced by some symbol for linker. Linker than find the function definition, and replace the symbol to jump adress to that function (and of course than jump back to program).

If the above is generally true and I get it right, where to final .exe file actually linker saves the functions? After the main() function? And what creates the .exe header? Compiler or Linker?

Now, additional capabilities of C, today known as C standart library is set of functions and the declarations of them, that other programmers wrote to extend and simplify use of C language. But these functions like printf() were (or could be?) written in different language, or assembler. And there comes my next question, can be, for example printf() function be written in pure C without use of assembler?

I know this is quite big question, but I just mostly want to know, wheather I am right or not. And trust me, I read a lots of articles on the web, and I would not ask you, If I could find these infromation together on one place, in one article. Insted I must piece by piece gather informations, so I am not sure if I am right. Thanks.

+6  A: 

I think that you're exposed to some information that is less relevant as a beginning C programmer and that might be confusing you - part of the goal of using a higher level language like this is to not have to initially think about how this process works. Over time, however, it is important to understand the process. I think you generally have the right understanding of it.

The C compiler merely takes C code and generates object files that contain machine language. Most of the object file is taken by the content of the functions. A simple function call in C, for example, would be represented in the compiled form as low level operators to push things into the stack, change the instruction pointer, etc.

The C library and any other libraries you would use are already available in this compiled form.

The linker is the thing that combines all the relevant object files, resolves all the dependencies (e.g., one object file calling a function in the standard library), and then creates the executable.

As for the language libraries are written in: Think of every function as a black box. As long as the black box has a standard interface (the C calling convention; that is, it takes arguments in a certain way, returns values in a certain way, etc.), how it is written internally doesn't matter. Most typically, the functions would be written in C or directly in assembly. By the time they make it into an object file (or as a compiled library), it doesn't really matter how they were initially created, what matters is that they are now in the compiled machine form.

The format of an executable depends on the operating system, but much of the body of the executable in windows is very similar to that of the object files. Imagine as if someone merged together all the object files and then added some glue. The glue does loading related stuff and then invokes the main(). When I was a kid, for example, people got a kick out of "changing the glue" to add another function before the main() that would display a splash screen with their name.

One thing to note, though is that regardless of the language you use, eventually you have to make use of operating system services. For example, to display stuff on the screen, to manage processes, etc. Most operating systems have an API that is also callable in a similar way, but its contents are not included in your EXE. For example, when you run your browser, it is an executable, but at some point there is a call to the Windows API to create a window or to load a font. If this was part of your EXE, your EXE would be huge. So even in your executable, there are "missing references". Usually, these are addressed at load time or run time, depending on the operating system.

Uri
Thanks a lot. So, just to be sure, when I have call to function in main function, does linker add jump to that function and append that function to the end of exe file, or are the functions located on different parts of exe file, and than loader actually connects them together?
B.Gen.Jack.O.Neill
Becouse in 8051 assembler, I just added routines to the end of the program, and by jump and ret intructions entered to them and went back. But to use jump compiler need to know the adress of the first instruction in the routine. But when windows load progam into RAM, compiler cannot know what will be the adress of the first instruction of given routine.
B.Gen.Jack.O.Neill
I'm not sure how this currently works in Windows but the loader generally does a lot and can relocate things. For example, In the old days of MS DOS, there used to be a limit on the size of executables, so if you had a lot of extra code, special tricks were needed to load and remove code at runtime.
Uri
There's a good advanced article about how the linker and loader work for Linux: http://www.linuxjournal.com/article/6463. I think that with some googling you may find out how this works in recent versions of Windows, but generally speaking, as a C programmer, you really shouldn't worry about it past the link stage.
Uri
For 8051 code you are running directly on the processor. This is very similar to the mode that operating system kernels operate in on x86 (and many other) processors. Your application runs in a different mode which allows the operating system to remap memory to make the application "think" that its code and data are in different places than they actually are. Another way to get around this is with position independant code. Rather than calls and jumps to absolute addresses relative addresses (+- offset) are used. These, mixed with what a runtime linker can do is how that works.
nategoose
A: 

The compiler is responsible for translating all your functions written in C into assembly, which it saves in the object file (DLL or EXE, for example). So, if you write a .c file that has a main function and a few other function, the compiler will translate all of those into assembly and save them together in the EXE file. Then, when you run the file, the loader (which is part of the OS) knows to start running the main function first. Otherwise, the main function is just like any other function for the compiler.

The linker is responsible for resolving any references between functions and variables in one object file with the references in other files. For example, if you call printf(), since you do not define the function printf() yourself, the linker is responsible for making sure that the call to printf() goes to the right system library where printf() is defined. This is done at compile-time.

printf() is indeed be written in pure C. What it does is call a system call in the OS which knows how to actually send characters to the standard output (like a window terminal). When you call printf() in your program, at compile time, the linker is responsible for linking your call to the printf() function in the standard C libraries. When the function is passed at run-time, printf() formats the arguments properly and then calls the appropriate OS system call to actually display the characters.

RarrRarrRarr
+1  A: 

I am a new user and this system does not allow me to post more than one link. To get around that restriction, I have posted some idea at my blog http://zhinkaas.blogspot.com/2010/04/how-does-c-program-work.html. It took me some time to get all links, but in totality, those should get you started.

Zhinkaas