Hi. First of all, I don't want to start any kind of flame war here. My question is mostly theoretical, and it will include a few examples.

So, as I wrote, I cannot understand how an interpreted language can be even slightly efficient. And since it's modern, I'll take Java as an example.

Let's go back to the days when there were no JIT compilers. Java has its virtual machine, which is basically its hardware. You write code, and then it is compiled into bytecode to take at least some of the work off the virtual machine; that's fine. But considering how complex even a RISC instruction set can be in hardware, I can't even think of a way to do it on software-emulated hardware.

I have no experience writing virtual machines, so I don't know how it's done at the most efficient level, but I can't think of anything more efficient than testing every instruction for a match and then doing the appropriate action. You know, something like: if (instruction == "something") { /* do it */ } else if (instruction == "something_different") { /* do it */ } etc.
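
In C, that kind of naive dispatch loop might look roughly like this (a minimal sketch only; the opcodes and the tiny stack machine are made up for illustration):

    #include <stdint.h>

    /* Hypothetical opcodes for a toy stack machine. */
    enum { OP_HALT = 0x00, OP_PUSH = 0x01, OP_ADD = 0x02 };

    static void run(const uint8_t *pc)
    {
        int32_t stack[64];
        int sp = 0;

        for (;;) {
            uint8_t op = *pc++;
            if (op == OP_PUSH) {
                stack[sp++] = *pc++;          /* push the next byte as a value */
            } else if (op == OP_ADD) {
                sp--;
                stack[sp - 1] += stack[sp];   /* pop two values, push their sum */
            } else if (op == OP_HALT) {
                return;
            }
            /* ...one comparison per possible opcode, for every single instruction */
        }
    }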

But this has to be terribly slow. And still, even though there are articles saying Java was slow before JIT compilers, they also say it was not that slow. But emulation must take many clock cycles of real hardware to perform one bytecode instruction.

And still, entire platforms are based on Java. For example, Android. And the first versions of Android had no JIT compiler; they were interpreted. But shouldn't Android then be terribly slow? And yet it is not. I know that when you call some API function from the Android library, it is implemented in machine code, so it is efficient, and that helps a lot.

But imagine that you wrote your own game engine from scratch, using the API just for displaying images. You would need to do many array copy operations and many calculations, which would be terribly slow when emulated.

And now some examples, as I promised. Since I mainly work with MCUs, I found a JVM for the Atmel AVR MCU. They state that an 8 MHz MCU can do 20K Java opcodes per second. But since the AVR can do most instructions in one or two cycles, let's say 6,000,000 native instructions per second on average. This means the JVM without a JIT compiler is about 300 times slower than machine code. So why did Java become so popular without a JIT compiler? Isn't that too big a performance loss? I just cannot understand it. Thanks.

A: 

But shouldn't Android then be terribly slow?

Define "terribly slow". It's a phone. It has to process "Dial first digit" before you dial the second digit.

In any interactive application, the limiting factor is always human reaction time. It could be 100 times slower and still be faster than the user.

So, to answer your question: yes, interpreters are slow, but they are usually fast enough, particularly as hardware keeps getting faster.

Remember, when Java was introduced, it was sold as a web applet language (replacing, and now replaced by, JavaScript, which is also interpreted). It was only after JIT compilation that it became popular on servers.

Bytecode interpreters can be faster than a chain of if()s by using a jump table:

    void (*jmp_tbl[256])(void) = { /* ... */ };  /* array of 256 function pointers, one per opcode */
    unsigned char op = *program_counter++;       /* fetch the next opcode */
    jmp_tbl[op]();                               /* jump straight to its handler */
James Curran
Thanks, but what about games? Main rendering loops? 3D rendering? I know you wouldn't need to do atomic simulations on a phone, but Android is a rich platform. Even YouTube video processing...
B.Gen.Jack.O.Neill
Just because the developer side of the API is Java doesn't mean that the code being called is. The OS kernel is often written in native code (can't say about Android specifically).
James Curran
Well, I believe that's the case with Android. But I mentioned that; it's a huge performance gain...
B.Gen.Jack.O.Neill
"replaced by, Javascript --- which also interpreted" - although as it happens, purely interpreted Javascript implementations are dog-slow, and real browsers have compiling or tracing Javascript engines just like real Java implementations have JITs.
Steve Jessop
A: 

There are two different ways to approach this question.

(i) "why is it OK to run slow code"

As James already mentioned above, sometimes speed of execution is not all you're interested in. For lots of apps, running in interpreted mode can be "fast enough". You have to take into account how the code you're writing will be used.

(ii) "why is interpreted code inneficient"

There are many ways you can implement an interpreter. In your question you talk about the most naïve approach: basically a big switch, interpreting each JVM instruction as it's read.

But you can optimize that: for example, instead of looking at a single JVM instruction, you can look at a sequence of them and look for patterns for which you have more efficient interpretations available. Sun's JVM actually does some of these optimizations in the interpreter itself. In a previous job, a guy took some time to optimize the interpreter that way, and interpreted Java bytecode ran noticeably faster after his changes.
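
To give a rough idea of what that kind of optimization might look like (a sketch only; the opcodes, operand encoding, and toy stack machine here are invented, not the JVM's), the interpreter can peek ahead and run a single fused handler for a common instruction pair:

    #include <stdint.h>

    enum { OP_HALT = 0x00, OP_LOAD = 0x10, OP_ADD = 0x11, OP_STORE = 0x12 };

    /* LOAD <idx> pushes a variable; ADD pops two values and pushes their sum. */
    static void interpret(const uint8_t *pc, int32_t *vars, int32_t *stack, int sp)
    {
        for (;;) {
            switch (*pc++) {
            case OP_LOAD:
                if (pc[1] == OP_ADD) {             /* pattern: LOAD <idx> immediately followed by ADD */
                    stack[sp - 1] += vars[pc[0]];  /* fused handler: one dispatch does both steps */
                    pc += 2;                       /* consume the index byte and the ADD opcode */
                } else {
                    stack[sp++] = vars[*pc++];     /* plain LOAD <idx>: push the variable */
                }
                break;
            case OP_ADD:
                sp--;
                stack[sp - 1] += stack[sp];
                break;
            case OP_STORE:
                vars[*pc++] = stack[--sp];
                break;
            case OP_HALT:
            default:
                return;
            }
        }
    }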

But in modern JVMs that contain a JIT compiler, the interpreter is just a stepping stone until the JIT does its job, so people don't really spend that much time optimizing the interpreter.

vanza
+3  A: 

We've had byte code around for a long time. On the old Apple II, the UCSD p-System was very popular; it compiled Pascal into byte code, which would be interpreted by an 8-bit 6502 that might be running at 2 MHz. Those programs ran reasonably fast.

A bytecode interpreter would generally be based on a jump table rather than a chain of if/then/else statements. In C or C++, this could be done with a switch statement. Fundamentally, the interpreter has the equivalent of an array of processing code and uses the opcode in the byte code instruction as the index into that array.
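
For instance, a minimal self-contained sketch of that structure (a hypothetical three-instruction VM, not any particular byte code):

    #include <stdint.h>
    #include <stdio.h>

    typedef void (*handler_fn)(void);

    static void do_nop(void)   { /* do nothing */ }
    static void do_hello(void) { puts("hello from bytecode"); }
    static void do_halt(void)  { /* handled in the loop below */ }

    /* The opcode byte is used directly as the index into this table. */
    static handler_fn handlers[256] = {
        [0x00] = do_halt,
        [0x01] = do_nop,
        [0x02] = do_hello,
        /* ...one entry per opcode in a real instruction set */
    };

    static void run(const uint8_t *program)
    {
        for (const uint8_t *pc = program; *pc != 0x00; pc++)
            handlers[*pc]();   /* no chain of comparisons, just an indexed call */
    }

    int main(void)
    {
        const uint8_t program[] = { 0x01, 0x02, 0x00 };   /* NOP, HELLO, HALT */
        run(program);
        return 0;
    }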

It's also possible to have byte code that's higher-level than the machine instructions, so that one byte code instruction translates into several, sometimes numerous, machine code instructions. A byte code that was designed for a particular language can do this fairly easily, as it only has to match the control and data structures of that particular language. This amortizes the interpretation overhead over more work per instruction and makes the interpreter more efficient.
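
For example (a sketch with an invented opcode, not any real VM's instruction set), a single high-level "array copy" instruction pays the dispatch overhead once and then does all of its work in native code:

    #include <stdint.h>
    #include <string.h>

    /* Handler for a hypothetical high-level opcode: ARRAYCOPY dst, src, count.   */
    /* The interpreter dispatches once; the per-element work is ordinary native   */
    /* code, so the interpretation overhead is shared across the whole copy.      */
    static void op_arraycopy(int32_t *heap, uint32_t dst, uint32_t src, uint32_t count)
    {
        memmove(&heap[dst], &heap[src], count * sizeof heap[0]);
    }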

An interpreted language is likely to have some speed penalty when compared to a compiled language, but this is often unimportant. Many programs process input and output at human speed, and that leaves a tremendous amount of performance that can be wasted. Even a network-bound program is likely to have far more CPU power available than it needs. There are programs that can use all the CPU efficiency they can get, and for obvious reasons they tend not to be written in interpreted languages.

And, of course, there's the question of what you get for some inefficiency that may or may not make a difference. Interpreted language implementations tend to be easier to port than compiled implementations, and the actual byte code is often portable. It can be easier to put higher-level functionality in the language. It allows the compilation step to be much shorter, meaning that execution can start much faster. It may allow better diagnostics if something goes wrong.

David Thornley
By the time 2 MHz CPUs were common, UCSD was becoming passé. It was mostly used on Apple II+'s, with a clock speed of 1.023 MHz.
Jerry Coffin
@Jerry: Thank you. I was using Z80-based machines at the time, which had different timings.
David Thornley
@David: Yup -- I can remember when I lusted after a machine with a Z80H -- rated for a clock speed of 8 whole MHz! Probably just as well I never got one -- that kind of speed might well induce a heart attack. :-)
Jerry Coffin
A: 

12 MHz would be an ATtiny, which is an 8-bit microcontroller. That means (for example) that a native "add" instruction can only add two 8-bit numbers together to get a 9-bit result. The JVM is basically a virtual 32-bit processor. That means its add instruction adds two 32-bit numbers together to produce a 33-bit result.

As such, when you're comparing instruction rates, you should expect a 4:1 reduction in instruction rate as an absolute minimum. In reality, while it's easy to simulate a 32-bit add with 4 8-bit adds (with carries), some things don't scale quite like that. Just for example, according to Atmel's own app note, a 16x16 multiplication producing a 32-bit result executes in ~218 clock cycles. The same app note shows a 16/16 bit division (producing an 8-bit result) running in 255 cycles.
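
As a rough sketch of the word-size point (plain C standing in for what on an AVR would be an add followed by three add-with-carry instructions):

    #include <stdint.h>

    /* Simulate a 32-bit add on an 8-bit machine: four byte-wide adds, with the  */
    /* carry out of each byte propagated into the next (little-endian limbs).    */
    static void add32(uint8_t a[4], const uint8_t b[4])
    {
        unsigned carry = 0;
        for (int i = 0; i < 4; i++) {
            unsigned sum = (unsigned)a[i] + b[i] + carry;
            a[i]  = (uint8_t)sum;   /* keep the low 8 bits of this byte           */
            carry = sum >> 8;       /* carry into the next, more significant byte */
        }
    }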

Assuming those scale linearly, we can expect 32-bit versions of the multiplication to take ~425-450 clock cycles, and the division ~510 cycles. In reality, we should probably expect a bit of overhead, which would reduce speed still more -- adding at least 10% to those estimates probably makes them more realistic.

Bottom line: when you compare apples to apples, it becomes apparent that a whole lot of the speed difference you're talking about isn't real at all (or isn't attributable to JVM overhead, anyway).

Jerry Coffin