I was reading this question to find out the differences between the Java Virtual Machine and the .NET CLR and Benji's answer got me wondering why Virtual Machines are necessary in the first place.

From my understanding of Benji's explanation, the JIT compiler of a Virtual Machine translates the intermediate code into the actual machine code that runs on the CPU. It has to do this because CPUs often have different numbers of registers and, according to Benji, "some registers are special-purpose, and each instruction expects its operands in different registers." It makes sense, then, that an intermediary like the Virtual Machine is needed so that the same code can be run on any CPU.

But if that's the case, then what I don't understand is why C or C++ code compiled into machine code is able to run on any computer as long as it has the right OS. Why, then, would a C program I compiled on my Windows machine with a Pentium be able to run on my other Windows machine with an AMD processor?

If C code can run on any CPU, then what is the purpose of the Virtual Machine? Is it so that the same code can be run on any OS? I know Java has VM versions on pretty much any OS, but is there a CLR for other OS's besides Windows?

Or is there something else I'm missing? Does the OS do some other interpretation of assembly code it runs to adapt it to the particular CPU or something?

I'm quite curious about how all this works, so a clear explanation would be greatly appreciated.

Note: The reason I didn't just post my queries as comments in the JVM vs. CLR question is because I don't have enough points to post comments yet =b.

Edit: Thanks for all the great answers! So it seems what I was missing was that, although all processors have differences, there is a common standard, primarily the x86 architecture, which provides a large enough set of common features that C code compiled on one x86 processor will for the most part work on another x86 processor. This furthers the justification for Virtual Machines, not to mention I forgot about the importance of garbage collection.

+2  A: 

AMD and Intel processors both implement the x86 architecture. If you want to run a C/C++ program on a different architecture, you have to compile it with a compiler for that architecture; the same binary executable won't run across different processor architectures.
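
For illustration, here's a minimal sketch (assuming a GCC toolchain; the cross-compiler name is just an example):

    /* hello.c - the same source can be compiled for any architecture,
       but each resulting binary only runs on the architecture it was
       compiled for. */
    #include <stdio.h>

    int main(void) {
        printf("Hello from whatever CPU this binary targets\n");
        return 0;
    }

Built with "gcc hello.c -o hello", the resulting executable runs on any x86 processor, Intel or AMD alike. To run it on, say, ARM, you would recompile the same source with a cross-compiler such as arm-linux-gnueabi-gcc.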

Element
Ah I see, so that's the bit of info I was missing. So then the number of registers and other primary CPU details that are important to the assembly code are standard across all x86 processors?
Daniel
You won't need a different compiler, but you will still need a different virtual machine for each platform it runs on.
lubos hasko
A: 

You're right in your analysis: Java or C# could have been designed to compile directly to machine code, and would probably be faster if they did. But the virtual machine approach gives complete control of the environment in which your code runs. The VM creates a secure sandbox that only allows code with the right security access to perform potentially damaging operations - like changing a password, or updating an HD boot sector. There are many other benefits, but that's the killer reason. And you can't get a StackOverflow in C# ...

MrTelly
What about a method that calls itself recursively and blows the call stack?
mP
@mP: It won't overflow the stack - C# will cease execution before any other memory is overwritten, so a stack overflow becomes an out-of-stack error.
Adam Davis
Ok ok, I meant the classic virus hack where you overwrite the return address on the stack - is that possible in C#?
MrTelly
BTW, can someone educate me as to why my answer's not up to scratch?
MrTelly
Most operating systems do a good job of enforcing security controls on regular compiled code. On UNIX-like systems a C program can't change the boot sector unless it has permission to do so. Furthermore, you can avoid stack overflows etc. by using a compiled language that implements proper safeguards.
Artelius
+1  A: 

I know Java has VM versions on pretty much any OS but is there a CLR for other OS's besides Windows?

Mono

Tim
ooo nice, thanks!
Daniel
+2  A: 

Put very simply, that's because Intel and AMD implement the same assembly language, with the same number of registers, and so on.

So say your C compiler compiles code to work on Linux. The generated assembly uses the Linux ABI, so as long as the compiled program is run on Linux, on an x86 processor, with the right function signatures, all is dandy.

Now try taking that compiled code and sticking it on, say, Linux/PPC (e.g. Linux on an old iBook). It isn't going to work - whereas a Java program would, because the JVM has been implemented on the Linux/PPC platform.

Assembly language nowadays is basically just another interface that a programmer can program to. x86 (32-bit) lets you access eax, ebx, ecx, and edx as general-purpose integer registers, and st(0)-st(7) for floating point. Behind the scenes, the CPU actually has hundreds more registers, and juggles that stuff around to squeeze the performance out.
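
To illustrate programming against that register interface, here's a small sketch (assuming GCC's extended inline assembly on an x86 target):

    #include <stdio.h>

    int main(void) {
        int a = 2, b = 3, sum;
        /* Do the addition explicitly in eax, one of the general-purpose
           registers the x86 interface exposes to the programmer. */
        __asm__ ("movl %1, %%eax\n\t"
                 "addl %2, %%eax\n\t"
                 "movl %%eax, %0"
                 : "=r" (sum)
                 : "r" (a), "r" (b)
                 : "%eax");
        printf("2 + 3 = %d\n", sum);
        return 0;
    }

Whatever renaming the CPU does behind the scenes, the program only ever sees architectural registers like eax.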

Calyth
+3  A: 

Your assumption that C code can run on any processor is incorrect. Things like registers and endianness will make a compiled C program fail outright on one platform while it works fine on another.
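
For example, a small C sketch of the endianness difference - the same correct program observes a different byte order on different platforms:

    #include <stdio.h>

    int main(void) {
        /* Store a known 32-bit pattern and inspect its first byte.
           Little-endian CPUs (x86) store the low byte first;
           big-endian CPUs (e.g. older PowerPC) store the high byte first. */
        unsigned int value = 0x01020304;
        unsigned char *first = (unsigned char *)&value;

        if (*first == 0x04)
            printf("little-endian\n");
        else
            printf("big-endian\n");
        return 0;
    }

Any code that writes raw memory to disk or to a network socket and reads it back on another machine trips over exactly this.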

However, there are certain similarities that processors share. For example, Intel x86 processors and AMD processors share a large enough set of properties that most code compiled for one will run on the other. However, if you want to use processor-specific features, you need a compiler or a set of libraries that will handle that for you.

As for why you would want a virtual machine: beyond handling processor differences for you, virtual machines offer services to code that are not available to programs compiled in (unmanaged) C++ today.

The most prominent service offered is garbage collection, offered by the CLR and the JVM. Both of these virtual machines offer you this service for free. They manage the memory for you.

Things like bounds checking are also provided, and access violations, while still possible, are made extremely difficult.

The CLR also offers a form of code security for you.

None of these are offered as part of the basic runtime environment for a number of other languages which don't operate with a virtual machine.

You might get some of them by using libraries, but then that forces you into the library's pattern of use, whereas in .NET and Java the services offered through the CLR and JVM are consistent in how you access them.
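
For contrast, here's a minimal C sketch of the manual bookkeeping that the CLR's and JVM's garbage collectors take off your hands:

    #include <stdlib.h>
    #include <string.h>

    /* In C, every allocation must be paired with exactly one free. */
    char *duplicate(const char *s) {
        char *copy = malloc(strlen(s) + 1);
        if (copy != NULL)
            strcpy(copy, s);
        return copy;  /* the caller now owns this memory */
    }

    int main(void) {
        char *s = duplicate("hello");
        /* Forgetting this line leaks memory; calling it twice corrupts
           the heap. A garbage collector removes both failure modes. */
        free(s);
        return 0;
    }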

casperOne
Ah nice point, I overlooked the garbage collection feature. That alone is probably enough justification for me! Memory leaks are a pain...
Daniel
Garbage collection is not a monopoly of virtual machines. There are libraries that do that for compiled languages too.
Actually, garbage collection is not the main reason for having virtual machines... the main reasons are code access security and just-in-time compiling.
lubos hasko
casperOne
Too bad I can't mark second best answer... or third for that matter. Actually there were a lot of very good answers to this question, each with something the other answers didn't focus on. Anyways yours was best until I read Adam's detailed explanation. Still, great answer!
Daniel
+2  A: 

Essentially it allows for 'managed code', which means exactly what it says - the virtual machine manages the code as it runs. Three main benefits of this are just-in-time compilation, managed pointers/garbage collection, and security control.

For just-in-time compilation, the virtual machine watches the code execute, and as code is run more often it is reoptimised to run faster. You can't do this with native code.

Managed pointers are also easier to optimise because the virtual machine tracks them as they go around, managing them in different ways depending on their size and lifetime. It's difficult to do this in C++ because you can't really tell where a pointer is going to go just reading the code.

Security is a self-explanatory one: the virtual machine stops the code from doing things it shouldn't, because it's watching. Personally I think that's probably the biggest reason why Microsoft chose managed code for C#.

Basically my point is, because the virtual machine can watch the code as it happens, it can do things which make life easier on the programmer and make the code faster.
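
A small C sketch of the pointer-tracking problem mentioned above (nothing here is specific to any particular compiler):

    #include <stdio.h>

    /* The compiler cannot assume *p and *q are different objects, which
       blocks optimizations that a VM with managed references can make. */
    void scale(int *p, int *q) {
        *p = *p * 2;
        *q = *q * 2;   /* may or may not touch the same int as *p */
    }

    int main(void) {
        int x = 5;
        scale(&x, &x);      /* perfectly legal: both pointers alias x */
        printf("%d\n", x);  /* prints 20, not 10 */
        return 0;
    }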

Ray Hidayat
Wow interesting, I didn't know the JIT could optimize the code to run faster like that!
Daniel
+22  A: 

The AMD and Intel processors use the same instruction set and machine architecture (from the standpoint of execution of machine code).

C and C++ compilers compile to machine code, with headers appropriate to the OS they are targeted at. Once compiled, they cease to be associated in any way, shape, or form with the language they were written in and are merely binary executables. (There are artifacts that may show what language a program was compiled from, but that isn't the point here.)

So once compiled, they are tied to the machine (x86, the Intel and AMD instruction set and architecture) and the OS.

This is why they can run on any compatible x86 machine, and any compatible OS (Win95 through Windows Vista, for some software).

However, they cannot run on an OS X machine, even if it's running on an Intel processor - the binary isn't compatible unless you run additional emulation software (such as Parallels, or a VM running Windows).

Beyond that, if you want to run them on an ARM processor, or MIPS, or PowerPC, then you have to run a full machine-instruction-set emulator that interprets the x86 binary machine code for whatever machine you're running it on.

Contrast that with .NET.

The .NET virtual machine is fabricated as though there were much better processors out in the world - processors that understand objects, memory allocation and garbage collection, and other high level constructs. It's a very complex machine and can't be built directly in silicon now (with good performance) but an emulator can be written that will allow it to run on any existing processor.

Suddenly you can write one machine-specific emulator for any processor you want to run .NET on, and then ANY .NET program can run on it. No need to worry about the OS or the underlying CPU architecture - if there's a .NET VM, then the software will run.

But let's go a bit further - once you have this common language, why not make compilers that convert any other written language into it?

So now you can have a C, C#, C++, Java, JavaScript, Basic, Python, Lua, or any other language compiler that converts written code so it'll run on this virtual machine.

You've disassociated the machine from the language by two degrees, and with not too much work you enable anyone to write any code and have it run on any machine, as long as a compiler and a VM exist to map the two degrees of separation.

If you're still wondering why this is a good thing, consider early DOS machines, and what Microsoft's real contribution to the world was:

AutoCAD had to write drivers for each printer it could print to. So did Lotus 1-2-3. In fact, if you wanted your software to print, you had to write your own drivers. If there were 10 printers and 10 programs, then 100 different pieces of essentially the same code had to be written separately and independently.

What Windows 3.1 tried to accomplish (along with GEM and so many other abstraction layers) was to make it so that the printer manufacturer wrote one driver for their printer, and the programmer wrote one driver for the Windows printer class.

Now with 10 programs and 10 printers, only 20 pieces of code have to be written, and since the Microsoft side of the code was the same for everyone, examples from MS meant you had very little work to do.

Now a program wasn't restricted to just the 10 printers it chose to support, but could use any printer whose manufacturer provided a driver for Windows.

The same issue is occurring in application development. There are really neat applications I can't use because I don't use a Mac. There is a ton of duplication (how many world-class word processors do we really need?).

Java was meant to fix this, but it had many limitations, some of which aren't really solved.

.NET is closer, but no one is developing world-class VMs for platforms other than Windows (Mono is so close... and yet not quite there).

So... That's why we need VMs. Because I don't want to limit myself to a smaller audience simply because they chose an OS/machine combination different from my own.

Adam Davis
Aw, SO didn't notify me others were posting. I thought I had this question all to myself! ;-P
Adam Davis
Yeah, we need some type of AJAX window to let us know who we're racing against! haha
Daniel
Wow your answer is great! Casper's answer was great too but I'm going to give the best to you because yours was so detailed. Thanks, I learned a lot!
Daniel
@Adam: I have numerous programs of varying complexity written in Java that run equally well on numerous operating systems and hardware platforms - so I would contend that Java comes very close to solving this problem; far closer than CLI does (with only Windows and almost Linux).
Software Monkey
+1 for an excellent answer, though.
Software Monkey
@Software Monkey, Java is a great language, but the last time I used it I couldn't use serial ports reliably across architectures, sound and 3D graphics were problematic - even generic apps weren't as expressive as native apps. .NET is less portable, though. Native apps still win over either.
Adam Davis
+1  A: 

Firstly, machine code is not the lowest form of instructions for a CPU. Today's x86 CPUs themselves translate the x86 instruction set into another internal format using microcode. The only people who actually program microcode are the chip engineers, who faithfully and painlessly emulate the legacy x86 instruction set to achieve maximum performance using today's technologies.

Developers have always been adding additional layers of abstraction because of the power and features they bring. After all, better abstractions allow new applications to be written more quickly and reliably. Businesses don't care what the code looks like; they just want the job done reliably and quickly. Does it really matter if the C version of an application runs a few milliseconds faster but takes twice as long to develop?

The speed question is almost a non-argument, as many enterprise applications that serve millions of people are written on platforms/languages like Java - e.g. GMail, Google Maps. Forget about which language/platform is fastest. What's more important is that you use the correct algorithms, write efficient code, and get the job done.

mP
Hmm, thanks for the microcode explanation. I didn't realize that even the assembly is interpreted!
Daniel
A: 

I think the premise of your question is valid - you're certainly not the first to ask it. Check out http://llvm.org for an alternative approach (now a project run, or at least sponsored, by Apple).

ankushnarula
+1  A: 

Most compilers, even native code compilers, use some sort of intermediate language.

This is mainly done to reduce compiler construction costs. There are many (N) programming languages in the world, and many (M) hardware platforms. If compilers worked without an intermediate language, the total number of "compilers" needed to support all languages on all hardware platforms would be N*M.

However, by defining an intermediate language and breaking a compiler up into 2 parts, a front end and a back end, with the front end compiling source code into IL and the back end compiling IL into machine code, you can get away with writing only N+M compilers. This ends up being a huge cost savings.
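
To make the front end/back end split concrete, here's a toy sketch in C (the three-instruction IL and the little stack machine are invented purely for illustration):

    #include <stdio.h>

    /* A toy intermediate language: any front end can emit it, and any
       back end only has to understand the IL, not the source language. */
    typedef enum { IL_LOAD, IL_ADD, IL_PRINT } Op;
    typedef struct { Op op; int arg; } Instr;

    /* "Front end" output: the expression 2 + 3 compiled to IL. */
    static const Instr program[] = {
        { IL_LOAD, 2 }, { IL_LOAD, 3 }, { IL_ADD, 0 }, { IL_PRINT, 0 }
    };

    /* "Back end": here a tiny interpreter, but it could just as well
       emit x86 or ARM machine code from the same IL. */
    static void run(const Instr *code, int count) {
        int stack[16], top = 0;
        for (int i = 0; i < count; i++) {
            switch (code[i].op) {
            case IL_LOAD:  stack[top++] = code[i].arg; break;
            case IL_ADD:   top--; stack[top - 1] += stack[top]; break;
            case IL_PRINT: printf("%d\n", stack[top - 1]); break;
            }
        }
    }

    int main(void) {
        run(program, sizeof program / sizeof program[0]);  /* prints 5 */
        return 0;
    }

Supporting a new language then means writing one new front end that emits the same IL, and supporting a new processor means writing one new back end - N+M components instead of N*M.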

The big difference between CLR / JVM compilers and native code compilers is the way the front end and the back end compilers are linked to each other. In a native code compiler the two components are usually combined into the same executable, and both are run when the programmer hits "build" in the IDE.

With CLR / JVM compilers, the front end and the back end are run at different times. The front end is run at compile time, producing IL that is actually shipped to customers. The back end is then embodied in a separate component that is invoked at runtime.

So, this brings up the alternate question, "What are the benefits of delaying back end compilation until runtime"?

The answer is: "It depends".

By delaying back end compilation until runtime, it becomes possible to ship one set of binaries that can run on multiple hardware platforms. It also makes it possible for programs to take advantage of improvements in the back end compilation technology without being redeployed. It can also provide a foundation for efficiently implementing many dynamic language features. Finally, it offers the ability to introduce security and reliability constraints between separately compiled, dynamically linked libraries (dlls) that is not possible with upfront machine code compilation.

However, there are also drawbacks. The analysis necessary to implement extensive compiler optimizations can be expensive. This means that "JIT" back ends will often do fewer optimizations than up-front back ends do, which can hurt performance. Also, the need to invoke the compiler at runtime increases the time necessary to load programs. Programs generated with "up-front" compilers don't have those problems.

Scott Wisniewski
Thanks for your answer. Understanding that native code compilers and VMs both use intermediate languages, they just compile to machine language at different times, makes the overall picture much more clear.
Daniel