views:

755

answers:

12

What are the technical reasons why languages like Python and Ruby are interpreted (out of the box) instead of compiled? It seems to me that it should not be too hard for people knowledgeable in this domain to make these languages not be interpreted as they are today, and that we would see significant performance gains. So I am certainly missing something.

+1  A: 

By design.

The authors wanted something they could write scripts in.

Python does get compiled the first time it is executed, though.

OscarRyz
i.e. "Byte compiled" (to .pyc) the first time it is executed. This only speeds up load times of future executions. I think the asker is talking about being compiled to native machine code.
overthink
Perl too... sigh, nobody asks about Perl.
dlamblin
"The authors wanted something where they can write scripts into."? This statement confuses me. The suitability of purpose for given tasks of a given language has a lot less to do with its' write/(compile,link)/run cycle than the richness of a standard library, memory management, and type model. This would probably garner some support for writing scripts in Java as well, if there wasn't the other small consideration of how much overhead it takes to start the virtual machine...
Nick Bastin
+2  A: 

Well, isn't one of the strengths of these languages that they are so easily scriptable? They wouldn't be if they were compiled. And on the other hand, dynamic languages are easier to interpret than to compile.

Maximilian Mayerl
+6  A: 

I think the biggest reason these languages are interpreted is portability. As a programmer you write code that runs on an interpreter, not on a specific OS, so your programs behave more uniformly across platforms (more so than with compiled languages). Another advantage I can think of is that it's easier to implement a dynamic type system in an interpreted language. I think the creators of these languages believed that making programmers more productive, through automatic memory management, a dynamic type system, and metaprogramming, wins over any performance loss due to the language being interpreted. If you are concerned about performance, you can always compile the language to native machine code using a technique like JIT compilation.

neesh
I'm not sure the portability argument is sound. If you decouple your lexer/AST-builder from your code generation backend, you can vary the two independently (which is somewhat how gcc is architected). So I don't see it being the reason. Your point about a dynamic type system being easier to implement in an interpreted language is sound. In fact, any dynamic type system requires runtime support (either through an interpreter or through a runtime library) practically by definition.
Daniel Pryden
The portability argument is in fact not at all sound. Interpreted languages are no easier to port than compiled ones, generally. The only reason they might even *seem* easier to port is because you can write your interpreted language port without understanding the underlying machine, but that's just a question of competence, not difficulty.
Nick Bastin
You guys are right... this came out all wrong! I am going to update my answer to say what I was really thinking.
neesh
+2  A: 

In a compiled language, the loop you get into when making software is

  1. Make a change
  2. Compile changes
  3. Test changes
  4. goto 1

Interpreted languages tend to be faster to make stuff in because you get to cut out step two of that process (and when you're dealing with a large system where compile times can be upwards of two minutes, step two can add a significant amount of time).

This isn't necessarily the reason the Python/Ruby designers had in mind, but keep in mind that "How efficiently does the machine run this?" is only half of the software development problem.

It also seems like it would be easier to add compilation to a language that's naturally interpreted than to add an interpreter to a language that's compiled by default.

Inaimathi
+27  A: 
DigitalRoss
"Ever seen a Java script? I haven't." Ha! So your reputation points might be 30x more than mine, but even I have heard of JavaScript!
mtyaka
Hehe, that's good. Actually, I learn a lot on SO, and I know I still have a lot to learn. But, English puns notwithstanding, (1) by "script" I meant "a shell-executed script, like in bash or perl"; (2) I have never seen one of those in JS either, and (3) JS has pretty much nothing whatsoever to do with Java.
DigitalRoss
@mtyaka: Clearly he means "script written in Java", not "Javascript", which is an entirely different language. @DigitalRoss: actually, javascript is gaining popularity as a non-web scripting language with the advent of standalone interpreters like V8 and SquirrelFish.
Nick Bastin
I know, it was just a joke :) When I saw words "Java" and "script" next to each other in the same sentence, I couldn't resist. @DigitalRoss: By the way, thank you for a nice and informative answer.
mtyaka
@Nick, I'll give mtyaka credit for a good joke, though it's always hard to tell. And yeah, actually I have http://www.ossp.org/pkg/lib/js/ (OSSP js) installed on my system, and I know it's being embedded in non-browser applications or used stand-alone, but I've only personally used it to experiment with javascript one-liners and I have yet to see my first actual javascript script in the wild. :-)
DigitalRoss
Aha, it *was* a joke. Very good.
DigitalRoss
I've seen a house fly. And time flies like an arrow, while fruit flies like a banana.
Andrew Grimm
+5  A: 

Today, there is no longer a strong distinction between "compiled" and "interpreted" languages. Python is in fact compiled just as much as Java is; the only differences are:

  • The Python compiler is much faster than the Java compiler
  • Python automatically compiles source code as it is executed, there is no separate "compile" step required
  • Python bytecode is different from JVM bytecode

Python even has a built-in function called compile(), which is an interface to the compiler.
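
For instance, a minimal sketch: compile() turns source text into a code object, which exec() then runs on the bytecode VM:

code_obj = compile("print(6 * 7)", "<string>", "exec")  # source -> code object (bytecode)
exec(code_obj)  # the VM executes the compiled code: prints 42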

It sounds like the distinction you are making is between "dynamically typed" and "statically typed" languages. In dynamic languages such as Python, you can write code like:

def fn(x, y):
    return x.foo(y)

Notice that the types of x and y are not specified. At runtime, this function will look at x to see whether it has a member function named foo, and if so will call it with y. If not, it will throw a runtime error indicating that no such function was found. This sort of runtime lookup is much easier to represent using an intermediate representation like bytecode, where a runtime VM does the lookup instead of having to generate machine code to do the lookup itself (or call a function to do the lookup, which is what the bytecode will do anyway).
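
You can inspect that intermediate representation directly with the standard-library dis module (the exact opcodes vary between CPython versions):

import dis

def fn(x, y):
    return x.foo(y)

dis.dis(fn)  # shows the opcode (e.g. LOAD_ATTR) that looks up "foo" at runtime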

Python has projects such as Psyco, PyPy, and Unladen Swallow that take various approaches to compiling Python object code into something closer to native code. There is active research in this area but there is not (as yet) a simple answer.

Greg Hewgill
+14  A: 

Exactly like (in the typical implementation of) Java or C#, Python first gets compiled into some form of bytecode, depending on the implementation (CPython uses a specialized form of its own, Jython uses JVM bytecode just like typical Java, IronPython uses CLR bytecode just like typical C#, and so forth). That bytecode then gets further processed for execution by a virtual machine (AKA interpreter), which may also generate machine code "just in time" -- known as JIT -- if and when warranted (CLR and JVM implementations often do; CPython's own virtual machine typically doesn't, but can be made to, e.g. with psyco or Unladen Swallow).
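
A tiny CPython illustration of that first stage: every function object carries the bytecode the compiler produced for it.

def greet():
    return len("hello")

print(greet.__code__.co_code)   # the raw bytecode the virtual machine executes
print(greet.__code__.co_names)  # names the bytecode looks up symbolically: ('len',)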

JIT may pay for itself for sufficiently long-running programs (if memory's way cheaper than CPU cycles), but it may not (due to slower startup times and larger memory footprint), especially when the types also have to be inferred or specialized as part of the code generation. Generating machine code without type inference or specialization is easy if that's what you want, e.g. freeze does it for you, but it really doesn't present the advantages that "machine code fetishists" attribute to it. E.g., you get an executable binary of 1.5 to 2 MB in lieu of a tiny "hello world" .pyc -- not much point!-). That executable is stand-alone and distributable as such, but it will only work on a very specific narrow range of operating systems and CPU architectures, so the tradeoffs are quite iffy in most cases. And, the time it takes to prepare the executable is quite long indeed, so it would be a crazy choice to make that mode of operation the default one.
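
For the record, producing that tiny .pyc yourself is a one-liner with the standard library (a sketch assuming a hello.py exists; CPython also writes it automatically on import):

import py_compile

py_compile.compile("hello.py")  # writes hello.pyc (or __pycache__/... on newer versions)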

Alex Martelli
+8  A: 

Merely replacing an interpreter with a compiler won't give you as big a performance boost as you might think for a language like Python. When most of the time is actually spent doing symbolic lookups of object members in dictionaries, it doesn't really matter whether the call to the function performing such a lookup is interpreted or is native machine code - the difference, while not quite negligible, will be dwarfed by the lookup overhead.
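
To make "symbolic lookup" concrete, here is a rough sketch of what every x.foo(y) has to do at runtime (simplified - real lookup also consults the instance dictionary, descriptors, and so on):

def call_method(obj, name, *args):
    for klass in type(obj).__mro__:               # walk the class hierarchy
        if name in klass.__dict__:                # a dictionary probe per class
            return klass.__dict__[name](obj, *args)
    raise AttributeError(name)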

To really improve performance, you need optimizing compilers. And optimization techniques here are very different from what you have with C++, or even Java JIT - an optimizing compiler for a dynamically typed / duck typed language such as Python needs to do some very creative type inference (including probabilistic - i.e. "90% chance of it being T" and then generating efficient machine code for that case with a check/branch before it) and escape analysis. This is hard.
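
A hand-written illustration of that check/branch idea (the names here are made up, not taken from any real JIT): the compiler guesses the likely type, emits a cheap guard plus a specialized fast path, and falls back to generic dispatch otherwise:

def add_specialized(x, y):
    if type(x) is int and type(y) is int:  # guard for the "90% chance" case
        return x + y                       # fast path: would compile to a native add
    return generic_add(x, y)               # fallback: full dynamic dispatch

def generic_add(x, y):
    # stands in for the interpreter's general protocol (__add__, __radd__, ...)
    return x + y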

Pavel Minaev
+5  A: 

The effort required to create a good compiler to generate native code for a new language is staggering. Small research groups typically take 5 to 10 years (examples: SML/NJ, Haskell, Clean, Cecil, lcc, Objective Caml, MLton, and many others). And when the language in question requires type checking and other decisions to be made at run time, a compiler writer has to work much harder to get good native-code performance (for an excellent example, see work by Craig Chambers and later Urs Hoelzle on Self). The performance gains you might hope for are harder to realize than you might think. This phenomenon partly explains why so many dynamically typed languages are interpreted.

As noted, a decent interpreter is also instantly portable, while porting compilers to new machine architectures takes substantial effort (and is a problem I personally have been working on for over 20 years, with some time off for good behavior). So an interpreter is a way to reach a wide audience quickly.

Finally, although fast compilers and slow interpreters exist, it's usually easier to make the edit-translate-go cycle faster by using an interpreter. (For some nice examples of fast compilers see the aforementioned lcc as well as Ken Thompson's go compiler. For an example of a relatively slow interpreter, see GHCi.)

Norman Ramsey
+2  A: 

REPL. Don't knock it till you've tried it. :)
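
For instance, a quick Python session:

>>> def square(x):
...     return x * x
...
>>> square(7)
49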

Parappa
SML/NJ has offered a native-code compiled REPL for over 20 years... as have many Lisp systems.
Norman Ramsey
A: 

Compiling Ruby, at least, is notoriously hard. I'm working on a Ruby compiler, and as part of that I wrote a blog post enumerating some of the issues here.

Specifically, Ruby suffers from a very unclear (i.e. non-existent) boundary between the "read" and "execute" phases of the program, which makes it hard to compile efficiently. You could just emulate what the interpreter does, but then you're not going to see much of a speed-up, so it wouldn't be worth the effort. If you want to compile it efficiently, you then face a lot of additional complications to handle the extreme level of dynamism in Ruby.
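
(Python shares this blurred boundary, so a Python sketch can illustrate it: method definitions are ordinary statements executed while the class body runs, so the shape of a class can depend on runtime values. The names below are made up for illustration.)

import os

class Config:
    # which lookup() exists is decided while the class body *executes*
    if os.environ.get("FAST"):
        def lookup(self, key):
            return self._cache[key]   # hypothetical fast path
    else:
        def lookup(self, key):
            return self._load(key)    # hypothetical slow path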

The good news is that there are techniques for overcoming this. Self, Smalltalk, and Lisp/Scheme implementations have dealt quite successfully with most of the same issues. But it takes time to sift through that work and figure out how to make it work for Ruby. It also doesn't help that Ruby has a very convoluted grammar.

Vidar Hokstad
A: 

Raw compute performance is probably not a goal of most interpreted languages. Interpreted languages are typically more concerned with programmer productivity than with raw speed. In most cases these languages are plenty fast enough for the tasks they were designed to tackle.

Given that, and that just about the only advantages of a compiler are type checking (difficult to do in a dynamic language) and speed, there's not much incentive to write compilers for most interpreted languages.

Bryan Oakley