I'm doing a survey of features in preparation for a research project.

Name a mainstream language or language feature that is hard to optimize, and why the feature is or isn't worth the price paid; or instead, just debunk my theories below with anecdotal evidence. Before anyone flags this as subjective: I am asking for specific examples of languages or features, ideas for optimizing these features, or important features that I haven't considered. Also welcome are references to implementations that prove my theories right or wrong.

Top of my list of hard-to-optimize features, along with my theories (some of which are untested and based only on thought experiments):

1) Runtime method overloading (aka multi-method dispatch or signature-based dispatch). Is it hard to optimize only when combined with features that allow runtime recompilation or method addition, or is it just hard anyway? Call-site caching is a common optimization for many runtime systems, but multi-methods add additional complexity as well as making it less practical to inline methods.
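To make the call-site caching point concrete, here is a minimal C sketch of a per-call-site cache keyed on the pair of argument types. All names (Obj, SiteCache, dispatch, find_method) are hypothetical, and a real multi-method runtime would also have to invalidate these caches whenever a method is added or redefined at runtime, which is exactly the complication above.

```c
/* Hypothetical sketch of a per-call-site cache for two-argument multi-methods. */
#include <stdio.h>
#include <string.h>

typedef int TypeId;
enum { T_INT = 1, T_STR = 2 };

typedef struct { TypeId type; long ival; const char *sval; } Obj;
typedef Obj (*MethodFn)(Obj, Obj);

static Obj add_int_int(Obj a, Obj b) { return (Obj){ T_INT, a.ival + b.ival, NULL }; }

/* Global method table; in a real system this can grow at runtime,
   which is what forces cache invalidation. */
typedef struct { const char *name; TypeId a, b; MethodFn fn; } MethodEntry;
static MethodEntry methods[] = {
    { "add", T_INT, T_INT, add_int_int },
};

static MethodFn find_method(const char *name, TypeId a, TypeId b)   /* slow path */
{
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++)
        if (!strcmp(methods[i].name, name) && methods[i].a == a && methods[i].b == b)
            return methods[i].fn;
    return NULL;   /* a real runtime would signal "no applicable method" */
}

/* One record per call site: a monomorphic cache keyed on the *pair* of types. */
typedef struct { TypeId a, b; MethodFn target; } SiteCache;

static Obj dispatch(const char *name, SiteCache *site, Obj x, Obj y)
{
    if (site->target && site->a == x.type && site->b == y.type)
        return site->target(x, y);                        /* fast path: cache hit */
    MethodFn fn = find_method(name, x.type, y.type);
    site->a = x.type; site->b = y.type; site->target = fn; /* fill the cache */
    return fn(x, y);
}

int main(void)
{
    static SiteCache site;                                 /* one per textual call site */
    Obj r = dispatch("add", &site, (Obj){T_INT, 2, NULL}, (Obj){T_INT, 3, NULL});
    printf("%ld\n", r.ival);                               /* 5 */
    return 0;
}
```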

2) Type morphing / variants (aka value-based typing as opposed to variable-based). Traditional optimizations simply cannot be applied when you don't know whether the type of something can change within a basic block. Combined with multi-methods, inlining must be done carefully, if at all, and probably only below some threshold for the size of the callee; i.e. it is easy to consider inlining simple property fetches (getters/setters), but inlining complex methods may result in code bloat. The other issue is that I cannot just assign a variant to a register and JIT it to native instructions, because I have to carry the type info around, or every variable needs 2 registers instead of 1. On IA-32 this is inconvenient, even if improved by x64's extra registers. This is probably my favorite feature of dynamic languages, as it simplifies so many things from the programmer's perspective.
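To illustrate the "carry the type around" problem, here is a minimal C sketch of a boxed variant (all names hypothetical). Real VMs often shrink the representation with NaN-boxing or pointer tagging, but the tag test in front of every primitive op remains.

```c
/* Hypothetical sketch of a boxed variant: every value drags a type tag
   alongside its payload, so a "variable" is really two machine words,
   and every primitive op starts with a type test. */
#include <stdio.h>

typedef enum { V_INT, V_DOUBLE, V_STRING } VTag;

typedef struct {
    VTag tag;                 /* lives in one register (or memory slot)... */
    union {                   /* ...while the payload needs another        */
        long   i;
        double d;
        const char *s;
    } as;
} Variant;

/* Even integer addition becomes a branchy helper call instead of one ADD. */
static Variant var_add(Variant a, Variant b)
{
    if (a.tag == V_INT && b.tag == V_INT)
        return (Variant){ V_INT, { .i = a.as.i + b.as.i } };
    if (a.tag == V_DOUBLE || b.tag == V_DOUBLE) {
        double x = (a.tag == V_DOUBLE) ? a.as.d : (double)a.as.i;
        double y = (b.tag == V_DOUBLE) ? b.as.d : (double)b.as.i;
        return (Variant){ V_DOUBLE, { .d = x + y } };
    }
    /* strings etc. would dispatch further or raise an error */
    return (Variant){ V_INT, { .i = 0 } };
}

int main(void)
{
    Variant x = { V_INT, { .i = 41 } };
    x = var_add(x, (Variant){ V_DOUBLE, { .d = 1.0 } });   /* x silently morphs to a double */
    printf("%f\n", x.as.d);                                /* 42.000000 */
    return 0;
}
```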

3) First-class continuations. There are multiple ways to implement them, and I have done so with both of the most common approaches: one being stack copying, the other implementing the runtime to use continuation-passing style, cactus stacks, copy-on-write stack frames, and garbage collection. First-class continuations have resource-management issues, i.e. we must save everything in case the continuation is resumed, and I'm not aware of any language that supports leaving a continuation with "intent" (i.e. "I am not coming back here, so you may discard this copy of the world"). Having programmed in both the threading model and the continuation model, I know both can accomplish the same thing, but the elegance of continuations imposes considerable complexity on the runtime and may also hurt cache efficiency (the locality of the stack changes more with the use of continuations and coroutines). The other issue is that they just don't map to hardware. Optimizing continuations is optimizing for the less-common case, and as we know, the common case should be fast and the less-common cases should be correct.
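As a rough illustration of the CPS flavor of the second approach, here is a minimal C sketch (hypothetical names) in which "returning" is replaced by invoking an explicit continuation value. It glosses over the cactus-stack / copy-on-write machinery, but it shows why a captured continuation is just data that can be stored and resumed later, even more than once.

```c
/* Minimal CPS sketch: instead of returning, every step calls an explicit
   continuation, and a captured continuation is a (function pointer,
   environment) pair that can outlive the frame that created it. */
#include <stdio.h>

typedef void (*Kont)(long result, void *env);

/* fact_cps(n, k, env): compute n! and hand it to k instead of returning. */
static void fact_cps(long n, Kont k, void *env)
{
    long acc = 1;
    while (n > 1) acc *= n--;     /* iterative here; a recursive version would
                                     heap-allocate one frame per pending k */
    k(acc, env);                  /* "return" = invoke the continuation */
}

/* A captured continuation: first-class, storable, callable later. */
static Kont  saved_k;
static void *saved_env;

static void print_k(long result, void *env)
{
    printf("%s %ld\n", (const char *)env, result);
}

int main(void)
{
    /* capture "the rest of the program after fact" as a value */
    saved_k   = print_k;
    saved_env = "fact result:";

    fact_cps(5, saved_k, saved_env);   /* fact result: 120 */
    saved_k(999, saved_env);           /* invoke the same continuation again */
    return 0;
}
```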

4) Pointer arithmetic and the ability to mask pointers (storing them in integers, etc.). I had to throw this in, but I could actually live without it quite easily.
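For completeness, here is a minimal sketch of low-bit pointer tagging in C (hypothetical names), the kind of integer/pointer masking trick this item refers to. It assumes malloc'd objects are at least 2-byte aligned and that right-shifting a negative intptr_t is an arithmetic shift.

```c
/* Hypothetical sketch: small integers are stored directly in the "pointer"
   with the low bit set; real pointers keep a clear low bit thanks to
   alignment. A precise GC or a strict optimizer has to be told about this. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uintptr_t Value;                 /* either a tagged int or an Obj* */
typedef struct { long field; } Obj;

static Value box_int(long i)    { return ((uintptr_t)i << 1) | 1; }   /* tag = 1 */
static long  unbox_int(Value v) { return (long)((intptr_t)v >> 1); }  /* assumes arithmetic shift */
static int   is_int(Value v)    { return v & 1; }

static Value box_obj(Obj *p)    { return (uintptr_t)p; }              /* tag = 0 */
static Obj  *unbox_obj(Value v) { return (Obj *)v; }

int main(void)
{
    Obj *o = malloc(sizeof *o);          /* malloc results are suitably aligned */
    o->field = 7;

    Value a = box_int(21), b = box_obj(o);
    printf("%ld %ld\n", is_int(a) ? unbox_int(a) : 0,
                        is_int(b) ? 0 : unbox_obj(b)->field);          /* 21 7 */
    free(o);
    return 0;
}
```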

My feeling is that many of the high-level features, particularly in dynamic languages, just don't map to hardware. Microprocessor implementations have billions of dollars of research behind the optimizations on the chip, yet the choice of language feature(s) may marginalize many of them (features like caching, aliasing the top of stack to a register, instruction parallelism, return-address buffers, loop buffers and branch prediction). Macro-applications of micro-features don't necessarily pan out the way some developers like to think, and implementing many languages in a VM ends up mapping native ops into function calls (i.e. the more dynamic a language is, the more we must look up or cache at runtime; nothing can be assumed, so our instruction mix contains a higher percentage of non-local branching than traditional, statically compiled code). The only thing we can really JIT well is expression evaluation over non-dynamic types and operations on constant or immediate types. It is my gut feeling that bytecode virtual machines and JIT cores are perhaps not always justified for certain languages because of this.
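A toy dispatch loop makes the instruction-mix point visible: even an integer ADD goes through an indirect branch and a tag check instead of compiling down to one native add. The opcodes and names below are made up for illustration.

```c
/* Hypothetical bytecode loop: each op is reached through a dispatch
   (switch -> indirect jump), and the "real" add is guarded by a tag check. */
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

typedef struct { int tag; long i; } Val;        /* tag checked on every op */

int main(void)
{
    int  code[] = { OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_PRINT, OP_HALT };
    Val  stack[16];
    int  sp = 0, pc = 0;

    for (;;) {
        switch (code[pc++]) {                   /* indirect, hard-to-predict branch per op */
        case OP_PUSH:
            stack[sp++] = (Val){ 0, code[pc++] };
            break;
        case OP_ADD: {                          /* guard first, then the actual add */
            Val b = stack[--sp], a = stack[--sp];
            if (a.tag == 0 && b.tag == 0)
                stack[sp++] = (Val){ 0, a.i + b.i };
            else
                { /* fall back to a generic helper / method lookup */ }
            break;
        }
        case OP_PRINT:
            printf("%ld\n", stack[sp - 1].i);   /* 42 */
            break;
        case OP_HALT:
            return 0;
        }
    }
}
```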

I welcome your answers.

+1  A: 

A couple of comments:

  • All languages with dynamic dispatch, even based on a single object, seem pretty hard to implement efficiently. Look at all the effort that has gone into run-time optimization of Self (or more recently, JavaScript, with SpiderMonkey).

  • Don't overlook delimited continuations. The jury is still out, but they are significantly easier to optimize than classic undelimited continuations. Read the paper by Gasbichler and Sperber.

Norman Ramsey
+1 thanks. I agree on the point about all the effort that has gone into JavaScript, and it's still sub-par when debugging large programs.
mrjoltcola
+1  A: 

It is my gut feeling that bytecode virtual machines and JIT cores are perhaps not always justified for certain languages because of this.

Didn't IronPython get written to prove that a virtual machine could not do as well as the native implementation of the language (Python)? And then the author of IronPython got rather a shock when they found out that IronPython performed really well for a dynamic language on a bytecode VM.

Microsoft's own .NET internals group is on record as stating that they think the JITter will ultimately outperform a "normal" compiler/linker (for, say, C/C++).

I think the jury is still out on this. Hard to call it either way. Choose the language that best fits the job...

Stephen Kellett
"Choose the language that best fits the job..." - Thanks, I agree, but this is not the goal of my question. It is about optimization and features, specifically. As for IronPython, I don't know the answer to your question, but I am interested to read if you have references I can follow.
mrjoltcola
My question was rhetorical. IronPython was so good that it disproved the theory that dynamic languages were not suitable for bytecode virtual machines. Hence the follow-up paragraph about Microsoft, which supports that viewpoint. And that leads to the last paragraph: worrying about whether a dynamic language is suitable is missing the point. They are (assuming the VM is good enough; some of them are not), so you can just choose the best language for the job.
Stephen Kellett
This question is not about whether a language is suitable, so please keep it on topic. I'm a languages and VM researcher / implementer and I'm talking about specific optimizations of features and you are talking "choose the best tool for the job". I write tools for the job. Now, to IronPython, it is a useful example, but some of the features I've listed aren't present in Python. Python doesn't have real closures (variables are readonly) or first class continuations (CPS is not the same thing). IronPython proves that Python can be implemented well on CLR, not that all languages can.
mrjoltcola
+1 for lively debate, since only two brave souls have bothered to answer.
mrjoltcola
IronRuby is another dynamic language, which is also faster on the DLR than the native version: http://antoniocangiano.com/2009/08/03/performance-of-ironruby-ruby-on-windows/ You've also got JRuby, Groovy, Lua (and LuaJIT). I'm not going to try to debunk your ideas - I don't use dynamic languages enough to have a strong enough opinion about them, let alone write a VM implementing them. A chap called Mike Pall maintains the LuaJIT project. He is very approachable - you may find him a gold mine of experience and opinions on various techniques.
Stephen Kellett
IronRuby is another CLR/DLR based language that is not (yet) a full implementation of the reference edition, nor all of the features I mentioned above (at least as far as I know). But thanks for the comments.
mrjoltcola