I've been looking into how programming languages work, and some of them have so-called virtual machines. I understand that this is some form of emulation of the programming language within another programming language, and that it works the way a compiled language would be executed, with a stack. Did I get that right?

With the proviso that I did, what bamboozles me is that many non-compiled languages allow variables with "liberal" type systems. In Python for example, I can write this:

x = "Hello world!"
x = 2**1000

Strings and big integers are completely unrelated and occupy different amounts of space in memory, so how can this code even be represented in a stack-based environment? What exactly happens here? Is x pointed to a new place on the stack and the old string data left unreferenced? Do these languages not use a stack? If not, how do they represent variables internally?

+4  A: 

Your question should probably be titled "How do dynamic languages work?"

That's simple: they store each variable's type information along with its value in memory. This is done not only in interpreted or JIT-compiled languages but also in natively compiled languages such as Objective-C.
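
To make that concrete, here is a minimal Python sketch (the names `first_type` and `second_type` are just for illustration). It only shows that the type information is attached to the value, so the same name can be bound to values of different types; it says nothing about how a particular implementation lays this out in memory.

```python
# The type travels with the value, not with the name x.
x = "Hello world!"
first_type = type(x)    # the string object reports its own type

x = 2**1000
second_type = type(x)   # same name, now bound to an integer object

print(first_type.__name__, second_type.__name__)
```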

Mehrdad Afshari
Aha, I see. But what happens during the reassignment? If the new data is larger than the old one, where is it placed and what becomes of the old data?
Martin
The actual contents of the object could be stored elsewhere and only a reference would be stored in place. There are plenty of techniques to implement a dynamic language. What happens really depends on the particular implementation of your language.
Mehrdad Afshari
http://en.wikipedia.org/wiki/Dynamic_typing#Dynamic_typing
Mehrdad Afshari
Right: the thing that is typed is the value, not the variable.
sean riley
+2  A: 

In most VM languages, variables can be conceptualized as pointers (or references) to memory in the heap, even if the variable itself is on the stack. For languages that have primitive types (int and boolean in Java, for example), those values may be stored directly on the stack, but such variables cannot be assigned a new type dynamically.

Ignoring primitive types, all variables that exist on the stack have their actual values stored in the heap. Thus, if you dynamically reassign a value to them, the original value is abandoned (and the memory cleaned up via some garbage collection algorithm), and the new value is allocated in a new bit of memory.
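
A rough way to watch this happen in Python, as a sketch only: `Payload` is a hypothetical class standing in for any heap-allocated value, and the weak reference lets us observe the old object without keeping it alive. The immediate cleanup relies on CPython's reference counting; other implementations may reclaim the object later, which is why `gc.collect()` is called as well.

```python
import gc
import weakref

class Payload:
    """Hypothetical stand-in for any heap-allocated value."""
    def __init__(self, text):
        self.text = text

x = Payload("Hello world!")
old_value = weakref.ref(x)   # observe the object without keeping it alive

x = 2**1000                  # rebind the name; the Payload is now unreferenced
gc.collect()                 # CPython frees it right away; this nudges other collectors

# Expected to be True on CPython; other runtimes may delay collection.
old_value_is_gone = old_value() is None
print(old_value_is_gone)
```

The name `x` itself never grew or shrank: it only ever held a reference, and the big integer was allocated as a separate object.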

Doug
Why is the stack used at all if it only contains pointers to the heap? Wouldn't it be smarter to store pointers internally and just skip the whole stack approach?
Martin
+1  A: 

The answer to many "how do VMs handle variables like this or that" questions comes down to metadata. The VM stores meta information (such as the type and size) alongside each value and updates it as the program runs, which tells it how to allocate memory and how to operate on the variable correctly.

In many cases this kind of overhead can really get in the way of performance. However, modern implementations have come a long way in reducing it.

As for your specific question: variables are treated as generic objects, and each new assignment updates the meta information associated with them. That's why x can look one way at one moment and another way the next.

Gabriel
+1  A: 

To answer part of your question, I'd recommend a Google Tech Talk about Python, where some of your questions concerning dynamic languages are answered; for example, what a variable is (it is not a pointer, nor a reference, but in the case of Python a label).
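
The "label" idea can be illustrated with a short sketch (the variable names are my own, not from the talk): two names attached to one list see the same changes, and reassigning one name just moves that label without touching the list.

```python
a = [1, 2, 3]
b = a                      # a second label on the same list object
b.append(4)                # mutate the object through label b

same_object = a is b       # both labels still point at one object
print(a)                   # the change is visible through label a too

b = "something else"       # move label b; the list itself is untouched
print(a, b)
```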

The MYYN
+2  A: 

The VM has nothing to do with the language. Any language can run on top of a VM (the Java VM has hundreds of languages already).

A VM enables a different kind of "assembly language" to be run, one that is easier for a compiler to target. Everything done in a VM could be done in a CPU, so think of the VM as a CPU. (Some are actually implemented in hardware.)

It's extremely low level, and in many cases heavily stack-based: instead of using registers, machine-level math operates on locations relative to the current stack pointer.

With normal compiled languages, many instructions are required for a single step. An addition (+) might look like: "Grab the item from a point relative to the stack pointer into reg a, grab another into reg b, add reg a and b, and put reg a into a place relative to the stack pointer."

The VM does all this with a single, short instruction, possibly one or two bytes. In machine language, each of those steps is a separate instruction of several bytes (x86 instructions are variable-length, anywhere from 1 to 15 bytes), which (guessing) should mean around 16 or 32 bytes of x86 for 1-2 bytes of VM code. (I could be wrong, my last x86 coding was in the 80286 era.)
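
To give a feel for what "a single, short instruction" means, here is a toy stack machine sketched in Python. The opcodes `PUSH`, `ADD` and `MUL` and the `run` function are made up for illustration; this is not real JVM or CPython bytecode, just the general shape of a stack-based interpreter loop.

```python
# Hypothetical opcodes for a toy stack machine (not a real VM's instruction set).
PUSH, ADD, MUL = 0, 1, 2

def run(program):
    """Execute a list of (opcode, operand) pairs and return the top of the stack."""
    stack = []
    for opcode, operand in program:
        if opcode == PUSH:
            stack.append(operand)
        elif opcode == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return stack.pop()

# (2 + 3) * 4 expressed as stack instructions
program = [(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 4), (MUL, None)]
result = run(program)
print(result)
```

Note that the stack here holds references to values of any type, so `run([(PUSH, "Hello "), (PUSH, "world!"), (ADD, None)])` works with the same `ADD` instruction. That is one way a stack-based VM can coexist with dynamic typing.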

Microsoft used (and probably still uses) VMs in their Office products to reduce the amount of code.

The procedure for creating the VM code is the same as creating machine language, just a different processor type essentially.

VMs can also implement their own security, error recovery and memory mechanisms that are very tightly related to the language.

Some of my description here is summary and from memory. If you want to explore the bytecode definition yourself, it's kinda fun:

http://java.sun.com/docs/books/jvms/second_edition/html/Instructions2.doc.html

Bill K