I have designed around 5 experimental languages and interpreters for them so far, for education, as a hobby and for fun.

One thing I noticed: the assembly-like language, featuring only subroutines and conditional jumps as control structures, was much slower than the high-level language featuring if, while, and so on. I developed both simultaneously, and both were interpreted. I wrote the interpreters in C++ and tried to make the code-execution part of each as fast as possible.

My hypothesis: in almost all cases, the performance of interpreted languages rises with the language's level of abstraction.

  • Am I basically right with this? (If not, why?)

EDIT: I didn't mention the word compiled here even once; this is about interpreted versus interpreted!

+4  A: 

The reality, of course, is slightly more complicated than that. As languages, interpreters, and compilers become more sophisticated, new opportunities arise for the machine to optimize performance. Further, the performance of any given program depends greatly on the quality of the code that the programmer has written.

Also, performance affects different types of programs in different ways. Line-of-business applications, for example, almost always take most of their speed hit looking up data from a database and sending it across the wire, whereas game programmers must deal with a completely different concept of performance, namely that of video card frame rates.

So the performance picture is not nearly as simple as it might seem.

Robert Harvey
It's worth noting that he's saying that his higher-level interpreted language is faster than his lower-level *interpreted* language, *not* that it's faster than compiled code. While the points you make are perfectly valid, I don't believe that's what he's suggesting.
Adam Robinson
@Adam: Yeah, I saw that. Whoever heard of an *interpreted assembly language?*
Robert Harvey
@Robert Harvey: I needed something easy to start with ;D It's pretty fast, though: **three times as fast as Python**, and five times as fast as Ruby. (Measured with various function-call, recursion, integer-processing and string-manipulation tests.)
immersion
@Robert: Two examples: Java bytecode, and .NET IL.
Martinho Fernandes
@Robert, while it's not assembly (and the compiled code gets cached), IL is very much an interpreted lower-level language ;)
Adam Robinson
@Adam: That's a bit of a stretch, given that it *is* compiled. You can even bake it to purely compiled native code by using NGEN.
Robert Harvey
@Robert: Every interpreted language is (in that sense) compiled, as all of them have to be turned into machine instructions *at some level*. The only difference is that the compiled version of the IL code is cached once it's compiled.
Adam Robinson
@Adam: Fair enough. But having a JIT compiler (rather than a straight interpreter) gives IL more of the performance characteristics of a compiled language than an interpreted one.
Robert Harvey
@Harvey: No. A valid comparison requires that both interpreters be written by people of similar skill, with like attention to performance. It's no surprise when someone who codes for performance gets it, potentially beating a better algorithm; but that only holds if the better algorithm isn't also coded for performance.
Ben Voigt
@Ben: Which is pretty much what I said in my answer.
Robert Harvey
@Adam: If, at runtime, the actual code being executed is not in object-code form, and it executes by way of a virtual machine reading in each command and processing it on the fly, then it's being interpreted. If, just before it is run, a JIT compiler converts it to object code, it's compiled.
Jimmeh
@Jimmeh: We're splitting hairs here. Those who dislike JIT code would call .NET code interpreted; those who like it would call it compiled. By the strictest definitions of both, it's *neither*. A case can be made for either side, but given that (on the first pass) the compilation, i.e. the transformation of the code into machine instructions/object code, takes place *immediately prior to execution*, it's not that big a stretch to argue that it's at least *similar* to interpretation.
Adam Robinson
Ok, but in this case "for each statement parsed, perform a complex operation" really is a better algorithm than "for each statement parsed, perform a simple operation, and combine multiple steps to perform a complex operation". Matlab is a really good example of this: code that calls vectorized built-in functions is several orders of magnitude faster than code that uses loops, and that's despite the presence of a JIT. The pre-compiled SIMD operations are optimized much better than just a collection of simple steps, above and beyond the win from using native code to do the looping.
Ben Voigt
The existence of a JIT for Java (on some platforms) does not mean there aren't interpreters. Java interpreters are a good example of the performance characteristics of interpreted low-level languages.
Ben Voigt
@Adam: Now that you say it, I suppose it'd be quite valid to say that JIT code is interpreted, just at a higher level of granularity than classically interpreted code - i.e. at a whole program level rather than at a statement level. Never really thought about it that way before :)
Jimmeh
+27  A: 

In both cases you are interpreting code. I would imagine that in the higher-level language you have fewer instructions to achieve the same task, so you spend less time interpreting instructions and more time doing something useful.
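
To make that concrete, here's a minimal sketch in C++ (the OP's implementation language); the opcodes, encoding and dispatch loop are invented for illustration, not taken from the OP's interpreters. Both programs compute z = x + y through the same loop, so the fixed fetch/decode cost is paid once per executed instruction: three times in the low-level encoding, once in the high-level one.

#include <vector>

// Toy VM. Opcode names and operand encoding are invented for this sketch.
enum Op { LOAD, ADD, STORE, ADD_VARS, HALT };
struct Insn { Op op; int a = 0, b = 0, c = 0; };

// One dispatch loop serves both programs; fetch + decode is a fixed cost
// paid on every executed instruction.
void run(const std::vector<Insn>& code, int* var, int* reg) {
    for (int pc = 0; code[pc].op != HALT; ++pc) {
        const Insn& i = code[pc];
        switch (i.op) {
        case LOAD:     reg[i.a] = var[i.b];            break;
        case ADD:      reg[i.a] += var[i.b];           break;
        case STORE:    var[i.a] = reg[i.b];            break;
        case ADD_VARS: var[i.a] = var[i.b] + var[i.c]; break; // fused "high-level" op
        case HALT:     break;
        }
    }
}

int main() {
    std::vector<Insn> low  = { {LOAD,0,0}, {ADD,0,1}, {STORE,2,0}, {HALT} };  // 3 dispatches
    std::vector<Insn> high = { {ADD_VARS,2,0,1}, {HALT} };                    // 1 dispatch
    int var[3] = {2, 3, 0};
    int reg[1] = {0};
    run(low, var, reg);
    run(high, var, reg);  // same result, a third of the interpretation overhead
}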

Mark Byers
I would suggest that the OP profile the interpreters for one of his high-level and one of his low-level languages. If this is the reason (likely), it will show up clearly in the profile.
Martinho Fernandes
@Martinho: I would accomplish the same thing by single-stepping the interpreter. I can tell how much time it's wasting by how tired I get.
Mike Dunlavey
@Mike: that's exactly what I said. You just picked up a profiler that you have to pedal :P
Martinho Fernandes
He may find that gap narrowed a bit if he puts some earnest work into optimizing the main loop of his interpreters. Googling "pipelined interpreter" kept me busy for weeks back when I fiddled with language design.
Jason
@Martinho: Don't get me started ;-) Most of those gas-powered profilers miss the whole point.
Mike Dunlavey
A: 

Summary: benchmarking is hard. You can't generalize from anecdotes.

This is not a rule supported by evidence.

How are you setting up your experiment from which you draw the "higher = faster" conclusion? Do you have example implementations of identical algorithms and test sets that allow you to objectively measure one interpreted language against another?

If you're just creating little mini-benchmarks, the probability is very high that you are drawing a conclusion from too few data points.

For example, if one interpreted language has a built-in quicksort operator, you might think "Aha! Faster!" than a language where you have to implement sorting by hand. However, quicksort has worst-case performance of O(n^2). If I hand your higher-level operator a worst-case data set, a hand-implemented merge sort is going to beat it.
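
To illustrate (the code is a hypothetical demo, not anything from the question): a textbook quicksort that always picks the first element as its pivot goes quadratic on already-sorted input, while a merge-sort-based routine doesn't care.

#include <algorithm>
#include <cstdio>
#include <ctime>
#include <vector>

// Textbook quicksort (Lomuto partition, first-element pivot). Library
// quicksorts choose pivots more carefully; this keeps the naive choice
// on purpose to expose the worst case.
void quicksort(std::vector<int>& v, int lo, int hi) {
    while (lo < hi) {
        int pivot = v[lo], i = lo;
        for (int j = lo + 1; j <= hi; ++j)
            if (v[j] < pivot) std::swap(v[++i], v[j]);
        std::swap(v[lo], v[i]);
        quicksort(v, lo, i - 1);  // recurse on the left part...
        lo = i + 1;               // ...iterate on the right (stays shallow on sorted input)
    }
}

int main() {
    std::vector<int> a(30000), b;
    for (int i = 0; i < (int)a.size(); ++i) a[i] = i;  // already sorted: the worst case
    b = a;

    std::clock_t t0 = std::clock();
    quicksort(a, 0, (int)a.size() - 1);    // O(n^2): every partition is maximally lopsided
    std::clock_t t1 = std::clock();
    std::stable_sort(b.begin(), b.end());  // merge-sort-based: O(n log n) on any input
    std::clock_t t2 = std::clock();

    std::printf("naive quicksort: %ld ticks, stable_sort: %ld ticks\n",
                (long)(t1 - t0), (long)(t2 - t1));
}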

Bob Cross
Well, you can still do optimization with interpreted languages. Examples: IronPython, IronRuby (and I'm sure the C implementations also have their own optimizations). You can also "compile as you go", the same way a JIT-compiler does with Java bytecode or CIL.
Martinho Fernandes
These examples are compiled to a byte code (and presumably optimized) and then the byte code is interpreted.
Joel
@Joel: I don't follow. The byte code *is an interpreted language*, and it *gets optimized* by the interpreter (a.k.a. the JIT compiler). Also, IronPython benefits from optimizations provided by the DLR, like call-site caching.
Martinho Fernandes
Again, it's not comparing interpreted to compiled; he's comparing an interpreted HLL with an interpreted LLL!
Brian Postow
@Brian, you're right. I changed my answer to address your point.
Bob Cross
+1  A: 

Well, obviously it'll depend on how you've implemented the different languages.

I would conjecture that more instructions need to be interpreted to do the same thing in a lower-level interpreted language, and that there's the overhead of interpreting every single low-level instruction as opposed to fewer higher-level statements.

Jimmeh
+11  A: 

In the end, parsing a line of text in your language is going to take roughly the same amount of time whether we're talking about assembly or The Next Big Thing Language (TNBL). This isn't true in a literal sense, but it is true in a Turing-esque, Big-O-notation sort of way.

If it takes (again, "roughly") the same amount of time to determine what this means:

mov $3, $4, $5

as this:

print "foo"

...then let's imagine writing Hello World in our two languages. The assembly interpreter has a much longer program to parse. Say it's n lines long, that is, n times as many lines as the TNBL Hello World. So your most basic overhead is a factor of n.

On top of that, you have all the code the interpreter executes in the simpler language's case to model the behavior of registers, operations, etc. It's a lot of work. In TNBL, there's nearly a one-to-one mapping between the semantics of the interpreted code and something you can do in the host language. This means a greatly reduced overhead in going from semantics to execution.
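
A sketch of that difference, with invented structures (this is the shape of the problem, not anyone's actual interpreter):

#include <cstdio>

struct MachineState {             // the assembly interpreter has to simulate all of this
    long regs[16];
    unsigned char mem[1 << 16];
    long pc;
    bool zero_flag;
};

// Low level: executing a mov means updating simulated machine state, plus
// the bookkeeping (program counter, flags) every instruction drags along.
void exec_mov(MachineState& m, int dst, int src) {
    m.regs[dst] = m.regs[src];
    m.pc += 1;
}

// High level: print maps nearly one-to-one onto a host-language facility.
void exec_print(const char* s) {
    std::puts(s);
}

int main() {
    MachineState m = {};
    m.regs[4] = 42;
    exec_mov(m, 3, 4);   // the simulated path
    exec_print("foo");   // the direct path
}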

I'm sure you'll see some Java programmers who would balk at this thesis, but I would point out that Java has a couple of advantages going for it: an intermediate bytecode that tries to get the code as close to the hardware as possible before executing it, and thousands of man-years sunk into developing compile-time and run-time optimization of that language. They are hardly on the continuum with a hobbyist language. =]

+1  A: 

I'd say you're about half right. If you graphed speed of interpretation, with language level on the X axis and speed of execution on the Y axis, you'd get a "bathtub" curve -- interpreting an extremely low-level language can be pretty fast, and interpreting an extremely high-level language can be pretty fast. In between the two, interpretation is substantially slower.

Given an extremely high-level input language (e.g., APL) you get minimal interpretation overhead because you can do a great deal of work based on parsing relatively little input code.

At the opposite extreme, you can get pretty decent speed simply because with a sufficiently low-level language, the interpretation becomes almost trivial. The inner interpreter of a Forth implementation is a prime example. These often represent a given program quite compactly, which tends to be cache-friendly, so at least in theory you could get execution as fast as (or even somewhat faster than) pure machine code.

The original intent was that most JVMs, the .NET "managed" environment, Smalltalk byte-code interpreters, etc., would fit the latter case. As most who've tried to develop these have quickly found, it's hard to keep the interpretation overhead low enough to accomplish this. Every one I know of that fits into this class has the inner loop written by hand in assembly language, usually by a good assembly-language programmer. I'd almost go so far as to say that if you try to use a higher-level language (even C), it's going to be slower, and probably significantly so: when you're adding overhead for every input instruction, even one extra instruction in the inner loop will almost certainly lead to a measurable speed penalty.
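
For flavor, here is a minimal direct-threaded inner interpreter in portable C++, in the spirit of a Forth inner loop. Real implementations hand-write this loop in assembly, as noted above, or use GCC's computed goto; function pointers are the portable stand-in, and all the names here are invented for illustration.

#include <cstdio>

struct Vm;
using Prim = void (*)(Vm&);

struct Vm {
    const Prim* ip;        // instruction pointer into the threaded code
    long stack[64];
    int  sp = 0;
    bool running = true;
};

void lit1(Vm& vm) { vm.stack[vm.sp++] = 1; }                          // push 1
void add (Vm& vm) { vm.sp--; vm.stack[vm.sp - 1] += vm.stack[vm.sp]; }
void dot (Vm& vm) { std::printf("%ld\n", vm.stack[--vm.sp]); }        // pop and print
void bye (Vm& vm) { vm.running = false; }

int main() {
    const Prim program[] = { lit1, lit1, add, dot, bye };  // "1 1 + ." in Forth terms
    Vm vm{program};
    // The entire inner interpreter: fetch, advance, execute. This is the
    // loop implementors fight to keep down to a handful of instructions.
    while (vm.running) (*vm.ip++)(vm);
}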

Jerry Coffin
A: 

How do you make a fair comparison?

I assume you are not counting parsing speed, and that you generate an intermediate byte-code instruction set, which is what you actually interpret.

Suppose you have a "benchmark" that is a loop to sum the elements of an array. If your higher-level language has a special byte-code instruction for this, so that there are fewer byte codes to interpret, that right there would explain why it is faster.
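
A rough accounting of why, with invented byte-code names (a sketch, not anyone's actual instruction set):

#include <vector>

// Summing 1,000 elements one byte code at a time might execute something like
//
//   loop:  LOAD_ELEM, ADD_ACC, INC_INDEX, CMP_LEN, JUMP_LT loop
//
// i.e. about five trips through the interpreter's fetch/decode loop per
// element, ~5,000 in all. A fused SUM_ARRAY byte code pays for one dispatch,
// and its handler is ordinary compiled C++:
long sum_array(const std::vector<long>& a) {
    long total = 0;
    for (long x : a) total += x;   // a native loop; the interpreter never runs here
    return total;
}

int main() {
    std::vector<long> a(1000, 1);
    return sum_array(a) == 1000 ? 0 : 1;   // 1,000 adds, one dispatch
}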

Mike Dunlavey
A: 

Assuming both are implemented in about the same way, with the low-level language unoptimized and the high-level language not expanded to lower-level intermediate code -- "probably".

In such a case, the high-level language would have the advantage of cramming more compiled (faster) code into each instruction.

In practice, though, a low-level interpreted language should be able to reduce the overhead per instruction, and it's typically much more straightforward to implement JIT compilation for. How well each is implemented will ultimately determine how they stack up against each other.

I will say that it's easier to implement a faster high-level language (in my limited experience).

pdehaan
A: 

If you had an interpreted language with one command, runMyEntireProgramNatively(), it would be faster than an interpreted language with more commands. The less each command does, the greater the ratio of time spent interpreting to time spent actually doing stuff.
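
To put hypothetical numbers on it: if a dispatch costs 20 ns, a command that does 5 ns of useful work spends 80% of its time being interpreted, while a command that does 2 µs of useful work spends under 1% of its time there.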

drawnonward