Hi,

I read a book that refers to the .NET CLR as a virtual machine. Can anyone justify this? Why do some development platforms need the concept of a virtual machine?

Isn't it possible to develop a native framework [one without a virtual machine] that is fully object-oriented and as powerful as .NET?

The book that refers to the CLR as a virtual machine is "Professional .Net Framework 2.0".

Thanks.

+3  A: 

The "Virtual Machine" part refers to the fact that .NET code is compiled into EXEs and DLLs as "Intermediate" assembly language (IL) to run on a virtual machine, as opposed to real CPU assembly language. Then, at runtime, the IL is converted into real CPU assembly for execution (referred to as just-in-time, or JIT, compiling).

Sure, you could write a .NET compiler that compiles straight into CPU assembly language instead of IL. However, the result would not be portable to all CPUs: you'd have to compile a different version for each OS/CPU pair. By compiling into IL, you let the "Virtual Machine" handle the CPU- and OS-specific stuff.
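
To make the idea of a portable instruction set concrete, here is a minimal sketch in Java (picked only for illustration). The three opcodes and the class name `TinyVm` are invented for this example and are not real CIL. Note also that this toy interprets its instructions, whereas the CLR JIT-compiles IL to machine code; the point is only that the program is written for an abstract machine rather than for a particular CPU.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical three-instruction "intermediate language" (not real CIL).
class TinyVm {
    static final int PUSH = 0; // push the next value in the program onto the stack
    static final int ADD = 1;  // pop two values, push their sum
    static final int MUL = 2;  // pop two values, push their product

    // Runs the program and returns the value left on top of the stack.
    static int run(int[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < program.length) {
            int op = program[pc++];
            switch (op) {
                case PUSH:
                    stack.push(program[pc++]);
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // (2 + 3) * 4, expressed in the portable instruction set
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL};
        System.out.println(run(program)); // prints 20
    }
}
```

The same `program` array runs unchanged wherever `TinyVm` itself runs, which is the portability argument in miniature.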

RichAmberale
I think you mean IL, or more correctly CIL - Common Intermediate Language.
dahlbyk
Ya thanks, fixing...
RichAmberale
+1  A: 

The advantage of the CLR is the freedom to write code in whatever programming language the developer chooses, since the code will be compiled down to IL before being JIT-compiled into native code. The .NET Framework uses this JIT compilation to treat everything uniformly and to output programs that work on the platform they are deployed on, which is something languages compiled directly to native code do not offer.

byte
A: 

I am a bit old school, so I call the CLR a virtual machine as well. My reasoning is that the CLR assembles the machine code from an intermediate bytecode, which is what a virtual machine also does.

The benefits of the CLR are mainly due to the way it assembles the machine code, which makes use of runtime type information.

You can develop a native framework as powerful as the .NET framework using just native types. The only flexibility you lose is the ability to reassemble the native code if you ever transport your program to another platform without recompiling.

Andrew Keith
+11  A: 

Similar to the Java Virtual Machine (JVM), the .net CLR is a byte-code interpreting virtual machine.

The JVM interprets programs which contain java byte codes and the .net CLR interprets programs which contain what Microsoft calls "Intermediate Language (IL)" instructions. There are differences between these byte codes, but the virtual machines are similar and aspire to provide similar features.

Both of these virtual machine implementations have the ability to compile their input bytecode to the machine language of the computer they are running on. This is called "Just In Time Compilation (JIT)" and the output code produced is called "JIT code." Because the JIT code contains sequences of instructions in the machine language of the computer's CPU, this code is sometimes referred to as "native" code.

However, JIT code is qualitatively and quantitatively different from native code, as explained below. For that reason, this article considers JIT code to be nothing more than a native implementation of the Virtual Machine while running a particular bytecode program.

One feature that both these Virtual Machines (VMs) aspire to provide is security in the form of preventing certain hazardous programming errors. For example, the title of this website forum, stackoverflow, is inspired by one such type of hazardous error that is possible in native code.

In order to provide safety and execution security, the VMs implement type safety at the "Virtual Machine level". Assignments to VM memory are required to store the type of data which is held in that memory location. For example, if an integer is pushed onto the stack, it is not possible to pop a double from the stack. C-style "unions" are prohibited. Pointers and direct access to memory are prohibited.
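
As a rough illustration of the int/double stack example, here is a minimal Java sketch of an operand stack that remembers the type of every slot. `TypedStack` and its methods are hypothetical names made up for this sketch. Real VMs establish most of this when they verify the bytecode, before it runs, rather than on every pop; the run-time check here is just the simplest way to show the rule.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical operand stack that remembers the type of every slot.
class TypedStack {
    private final Deque<Object> slots = new ArrayDeque<>();

    void pushInt(int v) { slots.push(v); }       // stored as an Integer
    void pushDouble(double v) { slots.push(v); } // stored as a Double

    int popInt() { return pop(Integer.class); }
    double popDouble() { return pop(Double.class); }

    // Pops the top slot only if it holds the requested type.
    private <T> T pop(Class<T> type) {
        Object top = slots.peek();
        if (top == null) {
            throw new IllegalStateException("stack is empty");
        }
        if (!type.isInstance(top)) {
            throw new IllegalStateException("expected " + type.getSimpleName()
                    + " but found " + top.getClass().getSimpleName());
        }
        return type.cast(slots.pop());
    }

    // Returns true if popping a double after pushing an int is rejected.
    static boolean rejectsIntAsDouble() {
        TypedStack s = new TypedStack();
        s.pushInt(42);
        try {
            s.popDouble();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsIntAsDouble()); // prints true
    }
}
```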

We could not get the same benefits by enforcing an object oriented language framework on developers if the result is a native binary such as an EXE file. In that case, we would not be able to distinguish between native binaries generated using the framework and EXEs generated by a malicious user employing sources other than the framework.

In the case of the VMs, the type-safety is enforced at the "lowest level" that the programmer is allowed to access. (Neglecting for a moment that it is possible to write managed native code, that is.) Therefore, no user will encounter an application which performs one of the hazardous operations which require direct access to memory locations and pointers.

In practice, the .net CLR implements a way to write native code which can be called by .net "managed" code. In this case, the burden is on the native code author not to make any of the pointer and memory mistakes.

As both the JVM and the .net CLR perform JIT compilation, either VM actually creates a native-compiled binary from the bytecode supplied. This "JIT code" runs more quickly than the VM's interpreter would, but even the machine language code produced by the JIT contains all the safety checks that the VM itself would perform. As a result, the JIT output is not as fast as native code, which ordinarily does not contain numerous run-time checks. However, this performance drawback is exchanged for improved reliability and security; in particular, use of uninitialized storage is prevented, type-safety of assignments is enforced, range-checking is performed (so stack- and heap-based buffer overflows are prevented), object lifetimes are managed by garbage collection, and dynamic allocation is type safe. An environment executing such run-time behavior checks is implementing the specification of a virtual machine and is little more than a machine language realization of a virtual machine.
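
The range-checking point can be seen in a few lines of Java (shown on the JVM only; the class and method names are made up for the example). Writing one element past the end of an array is rejected by the runtime instead of silently overwriting neighbouring memory:

```java
class BoundsCheckDemo {
    // Writes one element past the end of a 4-element buffer and reports what happened.
    static String writePastEnd() {
        int[] buffer = new int[4];
        try {
            buffer[4] = 123; // index 4 is out of range for length 4
            return "write succeeded";
        } catch (ArrayIndexOutOfBoundsException e) {
            return "rejected by the runtime";
        }
    }

    public static void main(String[] args) {
        System.out.println(writePastEnd()); // prints "rejected by the runtime"
    }
}
```

The check that produces the exception is exactly the kind of run-time cost described above: it is present even after JIT compilation unless the compiler can prove the index is in range.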

Heath Hunnicutt
Solid detailed informative answer but I hope you won't mind if I pick one nit. A stack overflow is absolutely possible in managed code: to reproduce this, create and call the following method: public void f() { f(); }. The difference tends to be that errors are more predictable and less exploitable (this ties in with type safety, one effect of which is to prevent data being accidentally run as code). For example dereferencing a bad pointer in C has unpredictable results; but dereferencing null on the JVM or CLR safely and predictably gives you an exception. Apologies for pedantry!
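
A minimal Java sketch of both cases, for the JVM side only (the class and helper names are invented for the example): runaway recursion ends in an error the program can observe, and dereferencing null raises an exception rather than doing something unpredictable.

```java
class PredictableFailures {
    // Unbounded recursion, as in the f() example above.
    static void f() {
        f();
    }

    // Returns true if the runaway recursion ends in a StackOverflowError.
    static boolean recursionOverflows() {
        try {
            f();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    // Returns true if dereferencing null raises a NullPointerException.
    static boolean nullDereferenceThrows() {
        String s = null;
        try {
            s.length();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(recursionOverflows());    // prints true
        System.out.println(nullDereferenceThrows()); // prints true
    }
}
```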
itowlson
That is a different kind of stack overflow, but I see what you mean. I was referring to the overflow of a stack buffer, not exhaustion of the stack itself. I think the name of this website also refers to the former and not the latter, but that could just be my perspective. Typically, stack-based buffer overflows are much more interesting than exhaustion of the stack (or of the VM stack), so that's usually what people mean when they write "stack overflow" although technically your interpretation could also be correct. But thanks for the comment. ;)
Heath Hunnicutt
This is just wrong. .Net languages are compiled to IL, which is similar to java byte code, and it's the IL that is distributed to user computers for execution. The difference is that instead of loading bytecode into a VM for execution, .Net IL is compiled to native code before execution. No virtual machine as such is ever loaded.
Joel Coehoorn
@Joel - surely its compilation to native code is an irrelevant detail? A future version of the CLR that had a lightning fast interpreter could switch to interpretation instead, and no-one would need to know that fact...
Damien_The_Unbeliever
@Damien: you could say the same thing about java switching to a native compiled mode. If they did that, you'd see a lot of marketing noise about "losing the VM". Plus, .Net gives you the option to pre-compile if you really want it (hint: it's slower on average over the life of a program), which kinda takes the IL out of the picture.
Joel Coehoorn
@Joel - I don't think you would. Developers still would target the Virtual Machine. I guess we're working from different definitions of Virtual Machine, but to my mind such details as JIT compilation are irrelevant.
Damien_The_Unbeliever
Another difference: the JIT stops running _before_ your program starts. People tend to think of a VM as a sandbox for your code, and with .Net there is no sandbox environment running while your code does.
Joel Coehoorn
@Joel - Just a bit of clarification... JITting is at the method level, so each method gets compiled right before it is called the first time. So technically, the JITter is called throughout program execution, each time a previously-uncalled method is invoked. The only time that the JITter would be "shut down" is if every method and type in your code has already been accessed once.
ProKiner
The JVM does something similar - it natively compiles "heavily used" code segments rather than all the code - http://java.sun.com/developer/onlineTraining/Programming/JDCBook/perf2.html - search for "Adaptive optimization" "Just-In-Time Compilers"
Nate
Joel: WHAT? Of course the IL is what's distributed to users and then JITed, same as JVM bytecode. In fact, that's my point -- it's functionally on-par (by design) to JVM bytecode. In fact, Sun Java had JIT prior to MS JVM, IIRC. Both had it before CLR existed. Remember MS JVM?
Heath Hunnicutt
"That is, on the computer which is running the 'interpreted' code, the VM actual creates a native-compiled binary from the bytecode supplied. " VM = CLR, bytecode=IL, supplied="Sent to user"
Heath Hunnicutt
+10  A: 

Brad Abrams: Is the CLR a Virtual Machine?

QrystaL
That article reflects Microsoft's beliefs about how people should speak, not what the CLR is. The CLR is a Virtual Machine and Microsoft wants people to call it an Execution Environment. Also, two legs are bad and four legs are good. ;)
Heath Hunnicutt
I'll give you the "article reflects how Microsoft wants us to speak" part, but the rest is wrong. The CLR is _**NOT**_ a Virtual Machine. See my answer for why.
Joel Coehoorn
+16  A: 

There are a lot of misconceptions here. I suppose you could think of .Net as a virtual machine if you really wanted, but let's look at how the .Net Framework really handles your code. The typical scenario looks like this:

  1. You write a .Net program in C#, VB.Net, F#, or some other compatible language.
  2. That code is compiled down to an Intermediate Language (IL), which is similar to Java's bytecode.
  3. The IL is packed into assemblies (ie: *.dll or *.exe files) for deployment.
  4. The IL in the assemblies is distributed to end user computers.
  5. The end user invokes the program for the first time on a computer with the right version of .Net installed.
  6. The computer sees this is a .Net assembly rather than "raw" machine code, and passes it off to the JIT compiler.
  7. The JIT compiler compiles the IL to fully-native machine code.
  8. The native code is saved in memory for the life of this program execution.
  9. The saved native code is invoked, and the IL no longer matters.
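
Steps 6 to 9 can be pictured with a small Java sketch of "compile once, keep the result in memory, reuse it for the rest of the run". Everything here is hypothetical: `LazyCompileCache`, the method names, and the use of lambdas as a stand-in for generated machine code. Caching per method on first call is an assumption of the sketch, not a statement about how much the real JIT compiles at a time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical model of "compile once, keep the result in memory, then reuse it".
class LazyCompileCache {
    private final Map<String, IntUnaryOperator> compiled = new HashMap<>();
    private int compileCount = 0;

    // Stand-in for the JIT: turns a method name into something directly callable.
    private IntUnaryOperator compile(String methodName) {
        compileCount++;
        switch (methodName) {
            case "doubleIt": return x -> x * 2;
            case "negate":   return x -> -x;
            default: throw new IllegalArgumentException("unknown method " + methodName);
        }
    }

    // Compiles the method on first use and reuses the cached result afterwards.
    int invoke(String methodName, int arg) {
        IntUnaryOperator code = compiled.computeIfAbsent(methodName, this::compile);
        return code.applyAsInt(arg);
    }

    int compileCount() { return compileCount; }

    public static void main(String[] args) {
        LazyCompileCache runtime = new LazyCompileCache();
        runtime.invoke("doubleIt", 21); // compiled on this first call
        runtime.invoke("doubleIt", 5);  // served from the in-memory cache
        System.out.println(runtime.compileCount()); // prints 1
    }
}
```

The cache lives only as long as the `LazyCompileCache` object, which mirrors step 8: the compiled code is kept in memory for the life of the program execution and is gone when the process ends.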

There are a couple of important points here, but the big one is that at no point is any code ever interpreted. Instead, you can see in step 7 that it is compiled to native code. This is hugely different from loading it into a virtual machine, for several reasons:

  1. The code is executed by the cpu directly rather than interpreted by an abstraction layer, which should be faster.
  2. The JIT compiler can take advantage of machine-specific optimizations, rather than settling for a lowest common denominator.
  3. If you want you can even pre-compile the code and in essence hide step 7 from the user completely.

I suppose you could call this a virtual machine, in the sense that the JITter abstracts away the details of the real machine from the developer. But personally I don't think that's really right. To many people, a virtual machine implies an abstraction away from native code that for .Net programs just doesn't exist.

One other key point about this whole process that really sets it apart from a "virtual machine" environment is that it's only the typical process. If you really want to, you can pre-compile a .Net assembly before distribution and deploy native code directly to end users (hint: it's slower on average over the life of the program). Of course, you still need the .Net runtime installed, but at this point it's really not much different from any other runtime API. It's almost like a collection of DLLs with a nice API you can link against, as you might have with the VB or C runtimes Microsoft also ships with Visual Studio. This kind of takes the IL out of the picture, making the VM moniker much harder to justify. (I say "kind of" because the IL is still deployed and used to verify the saved code, but it's never itself touched for execution.)

I think the most telling thing, though, is the lack of a VM process. When you run your app, there's no common "sandbox" process that runs. This is very different from Java, where if you open the task manager when a program is running you will see a process specifically for the Java VM, and the application's actual process is a thread inside of the sandbox created by the VM. In .Net, you see the application's process in the Windows task manager directly.

In summary: you could say that IL + CLR + JIT together somehow make up a virtual machine. Personally I don't think so, but I won't argue with you if you believe that. The point I want to make is that when you tell someone that .Net runs in a virtual machine with no further explanation, the idea you are communicating to that person is "interpreted bytecode in a host process." And that's just wrong.

Joel Coehoorn
Have you got a reference on item 3 from your second list? I've never seen anything that says that pre-JITted code is saved, except when NGEN has been invoked.
Damien_The_Unbeliever
Also, explain the need for Constraint Execution Regions if the JITting all happens once and only once... http://msdn.microsoft.com/en-us/library/ms228973.aspx
Damien_The_Unbeliever
JITted code is saved in memory, so it is lost when the app exits. The code is re-JITted the next time the app is run (or, more specifically, when a new process uses the code).
ProKiner
@prokiner - I agree, this is my understanding also
Damien_The_Unbeliever
You seem to be saying that if it doesn't do everything exactly the same way as Java does, then it's not a Virtual Machine. I think that's the biggest thing I disagree with you about, because it then tends to be a circular definition.
Damien_The_Unbeliever
Joel, from your comments I have the feeling that you believe the JIT compiler compiles the IL code to native code once and for all when the application starts, then suddenly stops. That's just wrong: the JIT compiles methods on the fly whenever the execution flow requires it. Also, there's a VM process. That's the CLR that is running your application. It's definitely a sandboxed environment, with a GC, a security model, a metadata service. The running application is just named after the .exe because Windows' PE loader knows how to delegate .net binaries to the CLR.
Jb Evain
Also, NGEN generated code runs inside a CLR - it doesn't create a native executable that uses CLR/.NET as a "collection (of) dlls with a nice API you can link against" - it just saves off the native code that would be generated by the JIT.
Nate
@Jb, I did believe that, yes. Further investigation shows it's only true if you use ngen _or_ if you install the assembly to the GAC. Where I'm at, we end up putting most things into the GAC and so this hasn't been very off, but I realize it's not true for everyone and so I'll update my answer to reflect that.
Joel Coehoorn
@Damien: no, I don't think it needs to do everything the same as Java to be a VM, but I do have certain expectations when I hear the term "VM" for this context that .Net just does not live up to. I think it's misleading to tell others that .Net runs "inside" a VM, especially as this is typically used to make a distinction from native code that for .Net does not exist. _IF_ there is a VM involved (something I already admitted you could make a case for), it's only from a pre-processing standpoint and not something your code runs "inside".
Joel Coehoorn
@Nate: I mostly agree with you, and that statement should be "it's _like_ a collection of dlls..." It's supposed to be an abstraction: a simple way to think of it. Will fix. I disagree, though, that anything ever runs "inside" the CLR. "On top of" would be more correct.
Joel Coehoorn
@Joel, still your whole last paragraph is wrong. Windows' PE loader starts the CLR, which itself starts the program within its control, memory and security wise. Inside is definitely usable. That's just a hack to avoid a launcher à la java, Rotor or mono. But the .net EE (execution engine) controls how the program runs.
Jb Evain
Yep. The PE Header redirection is a trick to bootstrap the CLR (mscoree) which is, by all means, running your .Net application. Moreover, like JB said, JIT is not done once and for all, and some JITted methods can even be discarded and rejitted at a later point. I guess that misconceptions are in the eye of the beholder.
Yann Schwartz
Joel, your answer is following the MS lead. The .net CLR is the evolution of the discontinued JVM. "IL + CLR + JIT" do equal a VM when the function of the JIT output is to replicate the CLR functionality. I don't appreciate your strange attack on a well-answered question, including making a bunch of assertions re: my answer that have not yet been answered and appear to be "wrong" on your part. The OP considers the JVM a VM, so the .Net CLR is also a VM. Either neither or both of them are Virtual Machines, because not only are they functionally identical, one derives from the other.
Heath Hunnicutt
You make point (1) about JIT code being "faster" due to not being interpreted further. While this is true, it is a little misleading. The performance of the code is closer to VM performance than, e.g., typical C projects. The discrepancy is that the JIT output contains many more runtime checks, type-safety checks, and especially many more dynamic allocations than a typical C project. My comparison is not intended to be a value comparison, but rather on this angle: JIT output performs like VM code and not like code written to the non-VM API, both speed and security-wise. It's like a VM.
Heath Hunnicutt
@Damien_The_Unbeliever: Regarding #3, this reference indicates that only NGEN'd code is saved: http://msdn.microsoft.com/en-us/magazine/cc163655.aspx "Note the subtle tradeoff here. Using NGEN means trading CPU consumption for more disk access, since the native image generated by NGEN is likely to be larger than the MSIL image. You might be wondering if this could hurt cold startup as it results in increased disk activity. Interestingly, the CLR Performance team has observed that if JIT compilation is completely eliminated, cold startup time typically improves."
Heath Hunnicutt
By "discontinued JVM" I mean the discontinued Microsoft Java Virtual Machine.
Heath Hunnicutt
"the idea you are communicating to that person is "interpreted bytecode in a host process." And that's just wrong". I couldn't agree more. That is not, in my view, the definition of a Virtual Machine.
Damien_The_Unbeliever
A: 

You've got many valuable answers, but I think one thing hasn't been mentioned yet: Modularity.

It's quite hard to export an OO class from a native DLL. Sure, you can tell the linker to export the class and import it somewhere else, but this is brittle: changing a single private member in a class will break binary compatibility, i.e. if you change one DLL without recompiling all the other modules, your program will crash horribly at runtime.

There are some ways around this: For example, you can define public abstract interfaces, derive from those and export global factory functions from your DLL. That way, you can change implementation details of a class. But you can't derive from that class in another DLL. And changing the interface also breaks binary compatibility, of course.

I'm not sure if there is a good solution for this in native code: If the compiler/linker creates native code at compile time, then it must know the exact memory layout of the classes/structures that are used in code. If the last compilation step (generating native code) is delayed until a method is called for the first time, this problem simply goes away: you can modify a class in an assembly, and as long as the JIT can resolve all the used members at runtime, everything will run fine.
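
A loose illustration of "resolve the members at runtime", sketched in Java. Reflection is used here only as a stand-in for name-based lookup; it is not how a JIT actually resolves members, and `LateBindingDemo`, `Point` and `readField` are names invented for the example. The caller asks for a field by name, so adding or reordering other members of the class does not break it.

```java
import java.lang.reflect.Field;

class LateBindingDemo {
    // Imagine this class lives in another module and gains private fields over time.
    static class Point {
        private int padding = 0; // a newly added member; callers never mention it
        public int x = 3;
        public int y = 4;
    }

    // Resolves a public int field by name at run time instead of by a fixed memory offset.
    static int readField(Object target, String name) {
        try {
            Field field = target.getClass().getField(name);
            return field.getInt(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot resolve field " + name, e);
        }
    }

    // Reads y from a fresh Point purely by name.
    static int demo() {
        return readField(new Point(), "y");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 4
    }
}
```

A natively compiled caller would instead have the offset of `y` baked in at compile time, which is why inserting `padding` would break it without a recompile.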

In a nutshell: If you create a monolithic single-executable program, you could probably have most of the powerful features of .NET with a compiler that creates native code. But the disadvantages of having a JIT compiler (framework installation, slightly longer startup times) really don't outweigh the benefits in most cases.

nikie