ansaurus

Question

VM Design: More opcodes or less opcodes? What is better?

Answer 1

+2 A:

For software performance it's easier if all opcodes are the same length, so you can have one gigantic switch statement and not have to examine various option bits that might have been set by preceding modifier opcodes.

Two matters that I think you didn't ask about are ease of writing compilers that translate programming languages to your VM code and ease of writing interpreters that execute your VM code. Both of these are easier with fewer opcodes. (But not too few. For example if you omit a divide opcode then you get an opportunity to learn how to code good division functions. Good ones are far harder than simple ones.)

Windows programmer 2009-06-10 06:25:52

I didn't ask for compilers because the language that will be translated to VM code is not set. There could be hundreds of languages. I'm only interested for code that has already been compiled. Regarding how hard it is to write the VM, that is issue number 5 of my definition of "better". Same length is probably a good point (upvote for that), but it's not really answering how to find the trade off between having little and having plenty of opcodes.

Mecki 2009-06-10 08:54:05

"I'm only interested for code that has already been compiled." -- Huh? How could code be compiled before you've defined the set of opcodes that the code can be compiled to? The source code is Java or C++ or Haskell or whatever, the object code is your VM's machine language, and a compiler has to do that translation.

Windows programmer 2009-06-10 23:47:39

All initial code the VM is going to execut will be hand crafted (like assembly programming for a CPU). Compilers that can translate a high level language to that VM will come much later and here I don't really care how hard or easy it is to write such a compiler (that also depends very much on the high level language) - as long as the VM is "turning complete" there is no language that could not be compiled to it. And this completeness can already been achieved with only 8 opcodes.

Mecki 2009-06-11 13:32:26

Answer 2

+1 A:

To be honest, I think it's largely a matter of the purpose of the VM, similar to how the processor design is largely determined by how the processor is primarily meant to be used.

In other words, you'll preferably be able to determine common use case scenarios for your VM, so that you can establish features that are likely going to be required, and also establish those that are unlikely to be very commonly required.

Of course I do understand, that you are probably envisioning an abstract, very generic, Virtual Machine, that can be used as the internal/backend implementation for other programming languages?

However, I feel, it's important to realize and to emphasize that there really is no such thing as a "generic ideal" implementation of anything, i.e. once you keep things generic and abstract you will inevitably face a situation where you need to make compromises.

Ideally, these compromises will be based on real life use scenarios for your code, so that these compromises are actually based on well-informed assumptions and simplifications that you can make without going out on a limb.

In other words, I would think about what are the goals for your VM? How is it primarily going to be used in your vision? What are the goals you want to achieve?

This will help you come up with requirements and help you make simplifcations, so that you can design your instruction set based on reasonable assumptions.

If you expect your VM to be primarily used by programming languages for numbers crunching, you'll probably want to look for a fairly powerful foundation with maths operations, by providing lots of low level primitives, with support for wide data types.

If on the other hand, you'll server as the backend for OO languages, you will want to look into optimizing the corresponding low level instructions (i.e. hashes/dictionaries).

In general, I would recommend to keep the instruction set as simple and intuitive as possible in the beginning, and only add special instructions once you have proven that having them in place is indeed useful (i.e. profile & opcode dumps) and does cause a performance gain. So, this will be largely determine by the very first "customers" your VM will have.

If you are really eager to research more involved approaches, you could even look into dynamically optimizing the instruction set at runtime, using pattern matching to find common occurrences of opcodes in your bytecode, in order to derive more abstract implementations, so that your can transform your bytecode dynamically with custom, runtime-generated, opcodes.

none 2009-06-11 01:21:06

It should be like a CPU. A CPU runs code from low level C number crunching to high level C++. It should not be one with a very limited purpose, it should be as general and it should be possible to run pretty much everything that is today written in an common programming language. A bit like LLVM, but LLVM is designed for being first compiled and then executed, it's not suited for interpretation.

Mecki 2009-06-12 14:08:38

yes, but CPUs are obviously also specialized for certain uses, some more so than others - similarly, there are different philosophies to processor design (e.g. CISC vs. RISC) and you are basically trying to combine the best of both worlds with your question.

none 2009-06-14 04:35:44

Answer 3

A:

I prefer minimalistic instruction-sets because there can be combined into one opcode. For example an opcode consisting of two 4 bit instruction fields can be dispatched with an 256 entry jump-table. As dispatch overhead is the main bottleneck in interpretation perfomance increased by an factor ~ two because only every second instruction needs to be dispatched. One way to implement an minimalistic but effective instruction set would be an accumulator/store design.

Sian 2010-10-23 11:25:43

ansaurus

tags:

views:

answers:

VM Design: More opcodes or less opcodes? What is better?

Prerequisites

Mulitple Opcodes for the Same Operation

Combining Two Opcodes Into a Single One

Data Types vs Opcodes

Meta Opcodes

How to Find a Good Trade-Off???

related questions