Hi, I'm trying to understand how these languages work under the hood. Unfortunately, I have only ever read very superficial material. I'll summarize what I already know; I would be really happy if you could correct me and, most of all, help me fill in my bits of half-knowledge.

C++:

The C++ compiler preprocesses all source files. This means that it actually inserts strings into the places where macros were originally. After that, it creates an .obj file for each source file containing machine-independent bytecode. The linker then links all external .obj files from libraries together with the custom-made .obj files and compiles them into an .exe.
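The macro part of that description can be pictured with a toy model. The Python sketch below (expand_object_macros is a made-up name) handles only object-like "#define NAME text" lines and swaps each later whole identifier NAME for its text. A real C++ preprocessor works on tokens and also deals with function-like macros, #include, conditional compilation, rescanning of expansions and string literals, none of which this sketch attempts.

```python
import re

# Matches an object-like macro definition: "#define NAME replacement text"
DEFINE = re.compile(r"\s*#define\s+([A-Za-z_]\w*)\s+(.*)$")
# Matches a whole identifier (not one that starts mid-word, e.g. inside 0x1F)
IDENT = re.compile(r"\b[A-Za-z_]\w*")


def expand_object_macros(source):
    """Toy model of object-like #define expansion: each macro name that
    appears later as a whole identifier is replaced by its text."""
    macros = {}
    out = []
    for line in source.splitlines():
        m = DEFINE.match(line)
        if m:
            macros[m.group(1)] = m.group(2).strip()
            continue  # the directive itself produces no output
        # Whole identifiers only, so SIZE2 is left alone when SIZE is a macro.
        out.append(IDENT.sub(lambda t: macros.get(t.group(0), t.group(0)), line))
    return "\n".join(out)


src = """#define SIZE 10
int buf[SIZE];
int SIZE2 = SIZE * 2;"""
print(expand_object_macros(src))
# int buf[10];
# int SIZE2 = 10 * 2;
```

The compiler proper only ever sees the expanded text, which is why macros have no scope or type of their own.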

Java:

Java code is compiled into machine-independent "bytecode", which sits in .class files, which in turn can be packaged in .jar files and run on the JRE. The virtual machine is then just doing garbage cleanup. Java code is compiled just-in-time like C#, but with HotSpot optimization developed by Sun.

C#:

Practically the same as Java? C# source code gets compiled into CIL (Common Intermediate Language) code, which is still human-readable. The CLR runs this code using just-in-time compilation: each method is turned into machine-specific code when it is first called.

I'm actually interested in pretty much every language... but Java and C# are almost the same, and I always wondered how they differ. And C++ is the "classic", so to speak: the father of both, without any kind of virtual machine. I appreciate the help!

Edit: I know that this is a broad subject, but I really couldn't find any solid information. If you have links or books that explain this sort of thing, I'm happy to get to work. I tried to read Sun's specifications/whitepapers for the Java virtual machine, but that is all a little too deep for me right now.

A: 

Pretty good.

C++'s .obj files are machine-dependent, but generally do not have memory addresses resolved. A linker just takes the .obj files, links them together, and resolves many of the addresses to absolute values.
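To make the linker's job concrete, here is a heavily simplified Python model. The ObjectFile layout, the opcode numbers and the base address 0x1000 are all invented for illustration; real object formats such as COFF or ELF are far richer. Each object file lists the symbols it defines and the slots that refer to symbols whose addresses are still unknown; the linker lays the files out, assigns absolute addresses, and patches those slots.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Hypothetical, heavily simplified object file."""
    name: str
    code: list                                         # code words (made-up opcodes)
    defines: dict = field(default_factory=dict)        # symbol -> offset in this file
    relocations: list = field(default_factory=list)    # (offset, symbol) to patch


def link(objects, base=0x1000):
    # Pass 1: lay the files out one after another and give every
    # defined symbol an absolute address.
    symtab = {}
    addr = base
    for obj in objects:
        for sym, off in obj.defines.items():
            if sym in symtab:
                raise ValueError(f"duplicate symbol {sym!r}")
            symtab[sym] = addr + off
        addr += len(obj.code)
    # Pass 2: copy the code and patch each relocation slot with the
    # now-known absolute address of the symbol it names.
    image = []
    for obj in objects:
        code = list(obj.code)
        for off, sym in obj.relocations:
            if sym not in symtab:
                raise ValueError(f"undefined reference to {sym!r} in {obj.name}")
            code[off] = symtab[sym]
        image.extend(code)
    return image, symtab


# main.o calls helper, which lives in util.o; slot 1 of main.o is a placeholder.
main_o = ObjectFile("main.o", code=[0xCA11, 0], defines={"main": 0},
                    relocations=[(1, "helper")])
util_o = ObjectFile("util.o", code=[0x0001, 0x0002, 0x00C3], defines={"helper": 0})
image, symtab = link([main_o, util_o])
print([hex(w) for w in image])  # ['0xca11', '0x1002', '0x1', '0x2', '0xc3']
```

Linking the same files in the opposite order puts helper at 0x1000 instead, and the patched slot changes with it, which is why the compiler alone cannot fill that address in.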

It's not really correct to say that the virtual machine is just doing garbage cleanup; I'm not even sure what that means. The VM reads the bytecode and decodes each instruction, so the VM is like a CPU. When it finds a piece of code that is executed repeatedly, it can replace that bytecode with real, highly optimized machine code; that is JIT compiling.
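The "VM as a CPU" picture and the hot-code JIT can be sketched in Python as follows. This is a toy under several assumptions: the five-instruction stack bytecode, the class name ToyVM and the call-count threshold are all invented, and the "machine code" is a generated Python function standing in for real native code. HotSpot's actual policy (invocation and loop back-edge counters, tiered compilation, deoptimization) is far more involved.

```python
# Toy stack bytecode: each instruction is a tuple (opcode, *operands).
#   ("push", n)  push constant n      ("load", i)  push argument i
#   ("add",)     pop b, a; push a+b   ("mul",)     pop b, a; push a*b
#   ("ret",)     pop and return the top of the stack

def interpret(code, args):
    """Fetch-decode-execute loop: the VM acting like a software CPU."""
    stack = []
    for op, *operands in code:
        if op == "push":
            stack.append(operands[0])
        elif op == "load":
            stack.append(args[operands[0]])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "ret":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise ValueError("bytecode ended without 'ret'")


def jit_compile(code):
    """Translate the bytecode once into a Python function so later calls
    skip decoding entirely (a stand-in for emitting native code)."""
    exprs = []  # symbolic stack: holds source text instead of values
    for op, *operands in code:
        if op == "push":
            exprs.append(repr(operands[0]))
        elif op == "load":
            exprs.append(f"args[{operands[0]}]")
        elif op in ("add", "mul"):
            b, a = exprs.pop(), exprs.pop()
            sym = "+" if op == "add" else "*"
            exprs.append(f"({a} {sym} {b})")
        elif op == "ret":
            namespace = {}
            exec(f"def compiled(*args):\n    return {exprs.pop()}\n", namespace)
            return namespace["compiled"]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise ValueError("bytecode ended without 'ret'")


class ToyVM:
    def __init__(self, threshold=3):
        self.threshold = threshold  # interpreted calls before compiling (invented policy)
        self.functions = {}         # name -> bytecode
        self.call_counts = {}       # name -> number of interpreted calls
        self.compiled = {}          # name -> compiled function ("machine code")

    def define(self, name, code):
        self.functions[name] = code

    def call(self, name, *args):
        if name in self.compiled:  # hot path: run the compiled version
            return self.compiled[name](*args)
        self.call_counts[name] = self.call_counts.get(name, 0) + 1
        result = interpret(self.functions[name], args)
        if self.call_counts[name] >= self.threshold:  # the method became "hot"
            self.compiled[name] = jit_compile(self.functions[name])
        return result


# f(x) = x * x + 1
vm = ToyVM(threshold=3)
vm.define("f", [("load", 0), ("load", 0), ("mul",), ("push", 1), ("add",), ("ret",)])
results = [vm.call("f", x) for x in range(5)]
print(results)             # [1, 2, 5, 10, 17]
print("f" in vm.compiled)  # True: only the first three calls were interpreted
```

Setting threshold=1 turns the same toy into compile-on-first-call, which is closer to how the question describes the CLR.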

I think the rest is pretty much correct, although I honestly can't say whether C#'s CIL is human-readable.

Bill K
You mean it resolves the machine-independent addresses (like an imaginary pointer to I/O) to something real on Windows, right? That would be on par with something I read earlier.
Blub
@Blub Actually, it resolves addresses that might point into another object file. The object files don't know about each other at compile time, so they don't know where the entry points will actually end up. When the linker puts all the object files together, it resolves addresses that it can't know about until then (what if the object files were linked in a different order?). C# and Java don't need this because they actually store the name of the method in the bytecode and let the runtime figure it all out.
Bill K
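The contrast in that comment, a call instruction that carries a method name rather than an address, can be sketched like this. CallSite and Runtime are hypothetical names for this Python toy. In a real JVM the name lives in the class file's constant pool, and the JVM specification allows such symbolic references to be resolved lazily or eagerly, with failures reported as linkage errors such as NoSuchMethodError. Here the lookup happens on the first execution of the call and the result is cached.

```python
class Runtime:
    """Holds the methods loaded so far, keyed by a qualified name."""

    def __init__(self):
        self.methods = {}
        self.lookups = 0  # how many name lookups were performed

    def load(self, qualified_name, fn):
        self.methods[qualified_name] = fn

    def lookup(self, qualified_name):
        self.lookups += 1
        if qualified_name not in self.methods:
            # Roughly where a JVM would report a linkage error.
            raise LookupError(f"no such method: {qualified_name}")
        return self.methods[qualified_name]


class CallSite:
    """A call instruction that stores the *name* of its target, not an address."""

    def __init__(self, target_name):
        self.target_name = target_name
        self.target = None  # unresolved until first executed

    def invoke(self, runtime, *args):
        if self.target is None:
            self.target = runtime.lookup(self.target_name)  # resolve once, then cache
        return self.target(*args)


rt = Runtime()
site = CallSite("Util.twice")           # "compiled" before Util.twice is even loaded
rt.load("Util.twice", lambda x: 2 * x)  # loaded later; no addresses to patch
print(site.invoke(rt, 21))  # 42
print(site.invoke(rt, 5))   # 10
print(rt.lookups)           # 1: the name was looked up only once
```

Because nothing is patched ahead of time, the order in which classes are loaded does not matter, only that the name exists by the time the call first runs.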
A:

The compilation of unmanaged C++ is very different from the compilation of managed C++, C# and Java.

Unmanaged C++

Unmanaged C++ (“traditional” C++) is compiled directly into machine code. The programmer invokes a compiler that targets a specific platform (processor and operating system), and the compiler outputs an executable that works only on that platform. The executable contains the machine code that the particular processor understands. When executed, the processor will directly execute the compiled code as is (modulo virtual memory address translation yadda yadda).

Managed C++, C# and Java

Managed code is compiled into an intermediate code (CIL in the case of .NET languages like C#, and Java bytecode in the case of Java). The compiler outputs an executable that contains code in this intermediate language. At this point, it is still platform-independent. When executed, a so-called just-in-time compiler kicks in, which translates the intermediate code into machine code just before it runs. The processor then executes the machine code generated by the JIT compiler. Most of the time, this machine code is kept in memory and discarded at the end of the program (so the JITting has to happen again the next time), but tools exist to do the compilation ahead of time and keep the result.

The benefit here of course is that the platform-independent executable can be run on any platform, but the downside is that you need an execution environment (including a JIT compiler) for that platform.

Timwi
Note there is nothing in the Java Language Specification that says you have to compile Java to JVML. And in fact, there are many implementations of Java that don't. They compile to CIL, native code, Parrot bytecode, ECMAScript, Dalvik bytecode or C. Some Java implementations are pure interpreters. Similarly for C#: there are compilers that target native code as well. And for C++, there are interpreters which don't compile at all, and there are compilers which compile to Java or ECMAScript. Really, what a compiler compiles to is that compiler's business and has nothing to do with the language.
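The point that the target is the implementation's choice, not the language's, can be illustrated with one front end feeding several back ends. In this Python toy (all names made up), a tiny expression AST is either evaluated directly by a tree-walking interpreter or translated to C or JavaScript source text; for small integers every route computes the same value.

```python
# AST nodes: ("num", n), ("var", name), ("add", l, r), ("mul", l, r)

def evaluate(node, env):
    """Back end 1: a pure interpreter, no compilation at all."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "add":
        return evaluate(node[1], env) + evaluate(node[2], env)
    if kind == "mul":
        return evaluate(node[1], env) * evaluate(node[2], env)
    raise ValueError(f"unknown node {kind!r}")


def emit_expr(node):
    """Shared helper: render an expression in C-like infix syntax."""
    kind = node[0]
    if kind == "num":
        return str(node[1])
    if kind == "var":
        return node[1]
    if kind in ("add", "mul"):
        op = "+" if kind == "add" else "*"
        return f"({emit_expr(node[1])} {op} {emit_expr(node[2])})"
    raise ValueError(f"unknown node {kind!r}")


def to_c(name, params, body):
    """Back end 2: source-to-source compilation to C."""
    args = ", ".join(f"int {p}" for p in params)
    return f"int {name}({args}) {{ return {emit_expr(body)}; }}"


def to_js(name, params, body):
    """Back end 3: source-to-source compilation to JavaScript."""
    return f"function {name}({', '.join(params)}) {{ return {emit_expr(body)}; }}"


expr = ("mul", ("add", ("var", "a"), ("num", 2)), ("var", "b"))  # (a + 2) * b
print(evaluate(expr, {"a": 1, "b": 5}))  # 15
print(to_c("f", ["a", "b"], expr))       # int f(int a, int b) { return ((a + 2) * b); }
print(to_js("f", ["a", "b"], expr))      # function f(a, b) { return ((a + 2) * b); }
```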
Jörg W Mittag
A: 

All three languages are pretty much the same (they're all imperative OO languages), the main differences being that

  • Java and C# support run-time type reflection (i.e., the program can examine itself) and make type casts run-time checked operations, whereas C++ has no reflection and only limited run-time type information (typeid, dynamic_cast), with most casts unchecked at run time;
  • you can't subvert the Java and C# type systems directly from within the languages themselves (C#'s unsafe code aside), although I suspect the compilers for all three languages just emit code with undefined semantics in the exception cases;
  • C++ doesn't have to check for null dereferences (dereferencing null is undefined behavior, and in practice the hardware usually traps it), whereas C# and Java must guarantee that dereferencing null raises an exception, which in the simplest implementation means a check on every dereference;
  • C# and Java have to provide stronger guarantees concerning the memory model (i.e., what happens in the case of concurrent reads/writes to the same variable), exception handling, and so forth;
  • C++ is usually compiled directly to the target machine or assembly language, whereas C# and Java are typically compiled to intermediate languages (CIL or JVM bytecode) for later JITting or interpretation. CIL and JVM bytecode are, essentially, abstractions of "the CPU".

The C++ compiler will go to greater lengths to optimise the code it generates because it can't pass the buck for low-level optimisation on to the JIT compiler.

Rafe