tags:

views:

1639

answers:

8

It's seems rare to read of a Python "virtual machine" while in Java "virtual machine" is used all the time. Both interpret byte codes, why call one a virtual machine and the other an interpreter?

A: 

There's no real difference between them, people just follow the conventions the creators have chosen.

Cody Brocious
+18  A: 

Probably one reason for the different terminology is that one normally thinks of feeding the python interpreter raw human-readable source code and not worrying about bytecode and all that.

In Java, you have to explicitly compile to bytecode and then run just the bytecode, not source code on the VM.

Even though Python uses a virtual machine under the covers, from a user's perspective, one can ignore this detail most of the time.

Mr Fooz
I agree. This difference in terminology really comes down to the end-user (developer, that is) experience. It has nothing to do with real technical differences, as the technical line is so incredibly blurred as to be damn-near nonexistent.
Cody Brocious
+1: And -- more importantly -- what's the point? What program can't you write because of this distinction? What stack traceback is confusing you? What library doesn't appear to work correctly?
S.Lott
+5  A: 

The term interpreter is a legacy term dating back to earlier shell scripting languages. As "scripting languages" have evolved into full featured languages and their corresponding platforms have become more sophisticated and sandboxed, the distinction between a virtual machine and an interpreter (in the Python sense), is very small or non-existent.

The Python interpreter still functions in the same way as a shell script, in the sense that it can be executed without a separate compile step. Beyond that, the differences between Python's interpreter (or Perl or Ruby's) and Java's virtual machine are mostly implementation details. (One could argue that Java is more fully sandboxed than Python, but both ultimately provide access to the underlying architecture via a native C interface.)

Daniel
+3  A: 

Don't forget that Python has JIT compilers available for x86, further confusing the issue. (See psyco).

A more strict interpretation of an 'interpreted language' only becomes useful when discussing performance issues of the VM, for example, compared with Python, Ruby was (is?) considered to be slower because it is an interpreted language, unlike Python - in other words, context is everything.

Arafangion
That's wrong. First, there is no such thing as an "interpreted language". Whether an implementation uses a compiler or an interpreter is not a trait of the language but of the implementation. Second, of the 13 or so Ruby implementations, exactly 1 is an interpreter, all the others have compilers.
Jörg W Mittag
Third, Ruby is not slow. No language is slow, because speed is not a trait of the language, but the language implementation. Of the 13 or so Ruby implementations, some are slower than some of the 7 Python implementations, some are faster.
Jörg W Mittag
I think he is comparing the standard implementations here Jörg. CPython and Ruby (I think the official implementation is just named Ruby).
James McMahon
While Arafangion might be referring to the "standard" implementations, he should have said so. I'm a Pythonista but I hate *any* statement of the form "Language X is slow", since I agree with Jörg on the matter of implementations.
ΤΖΩΤΖΙΟΥ
This is exactly why I said "was (is?)", and particularly the term "slower". Nowhere did I say that Ruby is in itself slow.
Arafangion
+7  A: 

A virtual machine is a virtual computing environment with a specific set of atomic well defined instructions that are supported independent of any specific language and it is generally thought of as a sandbox unto itself. The VM is analogous to an instruction set of a specific CPU and tends to work at a more fundamental level with very basic building blocks of such instructions (or byte codes) that are independent of the next. An instruction executes deterministically based only on the current state of the virtual machine and does not depend on information elsewhere in the instruction stream at that point in time.

An interpreter on the other hand is more sophisticated in that it is tailored to parse a stream of some syntax that is of a specific language and of a specific grammer that must be decoded in the context of the surrounding tokens. You can't look at each byte or even each line in isolation and know exactly what to do next. The tokens in the language can't be taken in isolation like they can relative to the instructions (byte codes) of a VM.

A Java compiler converts Java language into a byte-code stream no different than a C compiler converts C Language programs into assembly code. An interpreter on the other hand doesn't really convert the program into any well defined intermediate form, it just takes the program actions as a matter of the process of interpreting the source.

Another test of the difference between a VM and an interpreter is whether you think of it as being language independent. What we know as the Java VM is not really Java specific. You could make a compiler from other languages that result in byte codes that can be run on the JVM. On the other hand, I don't think we would really think of "compiling" some other language other than Python into Python for interpretation by the Python interpreter.

Because of the sophistication of the interpretation process, this can be a relatively slow process....specifically parsing and identifying the language tokens, etc. and understanding the context of the source to be able to undertake the execution process within the interpreter. To help accelerate such interpreted languages, this is where we can define intermediate forms of pre-parsed, pre-tokenized source code that is more readily directly interpreted. This sort of binary form is still interpreted at execution time, it is just starting from a much less human readable form to improve performance. However, the logic executing that form is not a virtual machine, because those codes still can't be taken in isolation - the context of the surrounding tokens still matter, they are just now in a different more computer efficient form.

Tall Jeff
I was under the impression that python did generate byte code, pyc, or is that what you are referring by "help accelerate such interpreted languages, this is where we can define intermediate forms of pre-parsed, pre-tokenized source code that is more readily directly interpreted."
James McMahon
@InSciTek Jeff: From your answer it's not clear whether you do know that Python uses a virtual machine too.
ΤΖΩΤΖΙΟΥ
@TZ - The popular Python implementation is a Python compiler with a back side VM. In interactive mode, it is a bit of hybrid with both an interpreter front end, and a compiler back end. However those are implementation choices. I tried to describe difference between concept of VM and Interpreter
Tall Jeff
`On the other hand, I don't think we would really think of "compiling" some other language other than Python into Python for interpretation by the Python interpreter.` It is possible to write a language that can be compiled into Python bytecode, just like Scala is compiled into Java bytecode. In interactive mode, Python's interactive shell compiles your typed command into bytecode and executes that bytecode. You can write your own shell using eval and exec, and you can use compile() built-in function to turn a string into bytecode.
Lie Ryan
A: 

In my humble opinion, terms like "Virtual Machine" and "Write once, run anywhere" are marketing buzz words, that just describe an interpreter environment.

However, one has to note that java doesn't come with a command line interpreter like python.

hasen j
+4  A: 

Interpreter, translates source code into some efficient intermediate representation (code) and immediately executes this.

Virtual Machine, explicitly executes stored pre-compiled code built by a compiler which is part of the interpreter system.

A very important characteristic of a virtual machine is that the software running inside, is limited to the resources provided by the virtual machine. Precisely, it cannot break out of its virtual world. Think of secure execution of remote code, Java Applets.

In case of python, if we are keeping pyc files, as mentioned in the comment of this post, then the mechanism would become more like a VM, and this bytecode executes faster. If we look at this as a whole, PVM is a last step of Python Interpreter.

The bottomline is, when refer Python Interpreter, it means we are referring it as a whole, and when we say PVM, that means we are just talking about a part of Python Interpreter, a runtime-environment. Similar to that of Java, we refer different parts differentyl, JRE, JVM, JDK, etc.

For more, Wikipedia Entry: Interpreter, and Virtual Machine. Yet another one here. Here you can find the Comparison of application virtual machines. It helps in understanding the difference between, Compilers, Interpreters, and VMs.

Adeel Ansari
+2  A: 

In this post, "virtual machine" refers to process virtual machines, not to system virtual machines like Qemu or Virtualbox. A process virtual machine is simply a program which provides a general programming environment -- a program which can be programmed.

Java has an interpreter as well as a virtual machine, and Python has a virtual machine as well as an interpreter. The reason "virtual machine" is a more common term in Java and "interpreter" is a more common term in Python has a lot to do with the major difference between the two languages: static typing (Java) vs dynamic typing (Python). In this context, "type" refers to primitive data types -- types which suggest the in-memory storage size of the data. The Java virtual machine has it easy. It requires the programmer to specify the primitive data type of each variable. This provides sufficient information for Java bytecode not only to be interpreted and executed by the Java virtual machine, but even to be compiled into machine instructions. The Python virtual machine is more complex in the sense that it takes on the additional task of pausing before the execution of each operation to determine the primitive data types for each variable or data structure involved in the operation. Python frees the programmer from thinking in terms of primitive data types, and allows operations to be expressed at a higher level. The price of this freedom is performance. "Interpreter" is the preferred term for Python because it has to pause to inspect data types, and also because the comparatively concise syntax of dynamically-typed languages is a good fit for interactive interfaces. There's no technical barrier to building an interactive Java interface, but trying to write any statically-typed code interactively would be tedious, so it just isn't done that way.

In the Java world, the virtual machine steals the show because it runs programs written in a language which can actually be compiled into machine instructions, and the result is speed and resource efficiency. Java bytecode can be executed by the Java virtual machine with performance approaching that of compiled programs, relatively speaking. This is due to the presence of primitive data type information in the bytecode. The Java virtual machine puts Java in a category of its own:

portable interpreted statically-typed language

The next closest thing is LLVM, but LLVM operates at a different level:

portable interpreted assembly language

The term "bytecode" is used in both Java and Python, but not all bytecode is created equal. bytecode is just the generic term for intermediate languages used by compilers/interpreters. Even C compilers like gcc use an intermediate language (or several) to get the job done. Java bytecode contains information about primitive data types, whereas Python bytecode does not. In this respect, the Python (and Bash,Perl,Ruby, etc.) virtual machine truly is fundamentally slower than the Java virtual machine, or rather, it simply has more work to do. It is useful to consider what information is contained in different bytecode formats:

  • llvm: cpu registers
  • Java: primitive data types
  • Python: user-defined types

To draw a real-world analogy: LLVM works with atoms, the Java virtual machine works with molecules, and The Python virtual machine works with materials. Since everything must eventually decompose into subatomic particles (real machine operations), the Python virtual machine has the most complex task.

Intepreters/compilers of statically-typed languages just don't have the same baggage that interpreters/compilers of dynamically-typed languages have. Programmers of statically-typed languages have to take up the slack, for which the payoff is performance. However, just as all nondeterministic functions are secretly deterministic, so are all dynamically-typed languages secretly statically-typed. Performance differences between the two language families should therefore level out around the time Python changes its name to HAL 9000.

The virtual machines of dynamic languages like Python implement some idealized logical machine, and don't necessarily correspond very closely to any real physical hardware. The Java virtual machine, in contrast, is more similar in functionality to a classical C compiler, except that instead of emitting machine instructions, it executes built-in routines. In Python, an integer is a Python object with a bunch of attributes and methods attached to it. In Java, an int is a designated number of bits, usually 32. It's not really a fair comparison. Python integers should really be compared to the Java Integer class. Java's "int" primitive data type can't be compared to anything in the Python language, because the Python language simply lacks this layer of primitives, and so does Python bytecode.

Because Java variables are explicitly typed, one can reasonably expect something like Jython performance to be in the same ballpark as cPython. On the other hand, a Java virtual machine implemented in Python is almost guaranteed to be slower than mud. And don't expect Ruby, Perl, etc., to fare any better. They weren't designed to do that. They were designed for "scripting", which is what programming in a dynamic language is called.

Every operation that takes place in a virtual machine eventually has to hit real hardware. Virtual machines contain pre-compiled routines which are general enough to to execute any combination of logical operations. A virtual machine may not be emitting new machine instructions, but it certainly is executing its own routines over and over in arbirtrarily complex sequences. The Java virtual machine, the Python virtual machine, and all the other general-purpose virtual machines out there are equal in the sense that they can be coaxed into performing any logic you can dream up, but they are different in terms of what tasks they take on, and what tasks they leave to the programmer.

Psyco for Python is not a full Python virtual machine, but a just-in-time compiler that hijacks the regular Python virtual machine at points it thinks it can compile a few lines of code -- mainly loops where it thinks the primitive type of some variable will remain constant even if the value is changing with each iteration. In that case, it can forego some of the incessent type determination of the regular virtual machine. You have to be a little careful, though, lest you pull the type out from under Psyco's feet. Pysco, however, usually knows to just fall back to the regular virtual machine if it isn't completely confident the type won't change.

The moral of the story is that primitive data type information is really helpful to a compiler/virtual machine.

Finally, to put it all in perspective consider this: a Python program executed by a Python interpreter/virtual machine implemented in Java running on a Java interpreter/virtual machine implemented in LLVM running in a qemu virtual machine running on an iPhone.

permalink

pooryorick