Don't be shocked. This is a lot of text but I'm afraid without giving some detailed information I cannot really show what this is all about (and might get a lot of answers that don't really address my question). And this definitely not an assignment (as someone ridiculously claimed in his comment).
Prerequisites
Since this question can probably not be answered at all unless at least some prerequisites are set, here are the prerequisites:
- The Virtual Machine code shall be interpreted. It is not forbidden that there may be a JIT compiler, but the design should target an interpreter.
- The VM shall be register based, not stack based.
- The answer may neither assume that there is a fixed set of registers nor that there is an unlimited number of them, either one may be the case.
Further we need a better definition of "better". There are a couple of properties that must be considered:
- The storage space for the VM code on disk. Of course you could always scrap all optimizations here and just compress the code, but this has a negative effect on (2).
- Decoding speed. The best way to store the code is useless if it takes too long to transform that into something that can be directly executed.
- The storage space in memory. This code must be directly executable either with or without further decoding, but if there is further decoding involved, this encoding is done during execution and each time the instruction is executed (decoding done only once when loading the code counts to item 2).
- The execution speed of the code (taking common interpreter techniques into account).
- The VM complexity and how hard it is to write an interpreter for it.
- The amount of resources the VM needs for itself. (It is not a good design if the code the VM runs is 2 KB in size and executes faster than the wink of an eye, however it needs 150 MB to do this and its start up time is far above the run time of the code it executes)
Now examples what I actually mean by more or less opcodes. It may look like the number of opcodes is actually set, as you need one opcode per operation. However its not that easy.
Mulitple Opcodes for the Same Operation
You can have an operation like
ADD R1, R2, R3
adding the values of R1 and R2, writing the result to R3. Now consider the following special cases:
ADD R1, R2, R2
ADD R1, 1, R1
These are common operations you'll find in a lot of applications. You can express them with the already existing opcode (unless you need a different one because the last one has an int value instead of a register). However, you could also create special opcodes for these:
ADD2 R1, R2
INC R1
Same as before. Where's the advantage? ADD2 only needs two arguments, instead of 3, INC even only needs a single one. So this could be encoded more compact on disk and/or in memory. Since it is also easy to transform either form to the other one, the decoding step could transform between both ways to express these statements. I'm not sure how much either form will influence execution speed, though.
Combining Two Opcodes Into a Single One
Now let's assume you have an ADD_RRR (R for register) and a LOAD to load data into an register.
LOAD value, R2
ADD_RRR R1, R2, R3
You can have these two opcodes and always use constructs like this throughout your code... or you can combine them into a single new opcode, named ADD_RMR (M for memory)
ADD_RMR R1, value, R3
Data Types vs Opcodes
Assume you have 16 Bit integer and 32 Bit integer as native types. Registers are 32 Bit so either data type fits. Now when you add two registers, you could make the data type a parameter:
ADD int16, R1, R2, R3
ADD int32, R1, R2, R3
Same is true for a signed and unsigned integers for example. That way ADD can be a short opcode, one byte, and then you have another byte (or maybe just 4 Bit) telling the VM how to interpret the registers (do they hold 16 Bit or 32 Bit values). Or you can scrap type encoding and instead have two opcodes:
ADD16 R1, R2, R3
ADD32 R1, R2, R3
Some may say both are exactly the same - just interpreting the first way as 16 Bit opcodes would work. Yes, but a very naive interpreter might look quite different. E.g. if it has one function per opcode and dispatches using a switch statement (not the best way doing it, function calling overhead, switch statement maybe not optimal either, I know), the two opcodes could look like this:
case ADD16: add16(p1, p2, p3); break; // pX pointer to register
case ADD32: add32(p1, p2, p3); break;
and each function is centered around a certain kind of add. The second one though may look like this:
case ADD: add(type, p1, p2, p3); break;
// ...
// and the function
void add (enum Type type, Register p1, Register p2, Register p3)
{
switch (type) {
case INT16: //...
case INT32: // ...
}
}
Adding a sub-switch to a main switch or a sub dispatch table to a main dispatch table. Of course an interpreter can do either way regardless if types are explicit or not, but either way will feel more native to developers depending on opcode design.
Meta Opcodes
For lack of a better name I'll call them that way. These opcodes have no meaning at all on their own, they just change the meaning of the opcode following. Like the famous WIDE operator:
ADD R1, R2, R3
WIDE
ADD R1, R2, R3
E.g. in the second case the registers are 16 Bit (so you can addnress more of them), in the first one only 8. Alternatively you can not have such a meta opcode and have an ADD and an ADD_WIDE opcode. Meta opcodes like WIDE avoid having a SUB_WIDE, MUL_WIDE, etc. as you can always prepend every other normal opcode with WIDE (always just one opcode). Disadvantage is that an opcode alone becomes meaningless, you always must check the opcode before it if it was a meta opcode or not. Further the VM must store an extra state per thread (e.g. whether we are now in wide mode or not) and remove the state again after the next instruction. Even CPUs have such opcodes (e.g. x86 LOCK opcode).
How to Find a Good Trade-Off???
Of course the more opcodes you have, the bigger switches/dispatch-tables will become and the more bits you will need to express these codes on disk or in memory (though you can maybe store them more efficiently on disk where the data doesn't have to be directly executable by a VM); also the VM will become more complicated and have more lines of code - on the other hand the more powerful the opcodes are: You are getting closer to the point where every expression, even a complex one, will end up in one opcode.
Choosing little opcodes makes it easy to code the VM and will lead to very compact opcodes I guess - on the other hand it means you may need a very high number of opcodes to perform a simple task and every not extremely often used expression will have to become a (native) function call of some kind, as no opcode can be used for it.
I read a lot about all kind of VMs on the Internet, but no source was really making a good and fair trade-off going either way. Designing a VM is like designing a CPU, there are CPUs with little opcodes, they are fast, but you also need many of these. And there are CPUs with many opcodes, some are very slow, but you'll need much less of them to express the same piece of code. It looks like the "more opcodes are better" CPUs have totally won the consumer market and the "less opcodes are better" ones can only survive in some parts of the server market or super computer business. What about VMs?