views:

179

answers:

5

Hi, I'm writing a virtual machine in C just for fun. Lame, I know, but luckily I'm on SO so hopefully no one will make fun :)

I wrote a really quick'n'dirty VM that reads lines of (my own) ASM and does stuff. Right now, I only have 3 instructions: add, jmp, end. All is well and it's actually pretty cool being able to feed lines (doing it something like write_line(&prog[1], "jmp", regA, regB, 0); and then running the program:

while (machine.code_pointer <= BOUNDS && DONE != true)
{
    run_line(&prog[machine.cp]);
}

I'm using an opcode lookup table (which may not be efficient but it's elegant) in C and everything seems to be working OK.

My question is more of a "best practices" question but I do think there's a correct answer to it. I'm making the VM able to read binary files (storing bytes in unsigned char[]) and execute bytecode. My question is: is it the VM's job to make sure the bytecode is well formed or is it just the compiler's job to make sure the binary file it spits out is well formed?

I only ask this because what would happen if someone would edit a binary file and screw stuff up (delete arbitrary parts of it, etc). Clearly, the program would be buggy and probably not functional. Is this even the VM's problem? I'm sure that people much smarter than me have figured out solutions to these problems, I'm just curious what they are!

A: 

It makes sense to have the compiler do as much sanity checking as possible (since it only has to do it once), but there's always going to be issues that can't be detected by static analysis, like [cough] stack overflow, array range errors, and the like.

500 - Internal Server Error
+1  A: 

The same issue arises in Java, and as I recall, in that case the VM does have to do some checks to make sure the bytecode is well formed. In that situation, it's actually a serious issue because of the potential for security problems: if someone can alter a Java bytecode file to contain something that the compiler would never output (such as accessing a private variable from another class), it could potentially expose sensitive data being held in the application's memory, or could allow the application to access a website that it shouldn't be allowed to, or something. Java's virtual machine includes a bytecode verifier to make sure, to the extent possible, that these sorts of things don't happen.

Now, in your case, unless your homemade language takes off and becomes popular, the security aspect is something you don't have to worry about so much; after all, who's going to be hacking your programs, other than you? Still, I would say it's a good idea to make sure that your VM at least has a reasonable failure strategy for when the bytecode is invalid. At a minimum, if it encounters something it doesn't understand and can't process, it should detect that and fail with an error message, which will make debugging easier on your part.

David Zaslavsky
I was never under the impression that making members private is a security tool. My impressions is that it is merely a design tool to provide good abstractions, hiding implementation details, but not considered secure.
Chris Taylor
A: 

Virtual machines that interpret bytecode generally have some way of validating their input; for example, Java will throw a VerifyError if the class file is in an inconsistent state

However, it sounds like you're implementing a processor, and since they tend to be lower-level there's less ways you can manage to get things in a detectable invalid state -- giving it an undefined opcode is one obvious way. Real processors will signal that the process attempted to execute an illegal instruction, and the OS will deal with it (Linux kills it with SIGILL, for example)

Michael Mrozek
+8  A: 

Is it the VM's job to make sure the bytecode is well formed or is it just the compiler's job to make sure the binary file it spits out is well formed?

You get to decide.

Best practice is to have the VM do a single check before execution, cost proportional to the size of the program, which is sophisticated enought to guarantee that nothing wonky can happen during execution. Then during actual execution of the bytecode, you run with no checks. However, the check-before-running idea can require some very sophisticated analysis, and even the most performance-conscious VMs often have some checks at run time (example: array bounds).

For a hobby project, I'd keep things simple and have the VM check sanity every time you execute an instruction. The overhead for most instructions won't be too great.

Norman Ramsey
Perfect answer, thank you!
David Titarenco
A: 

I'd say it's legitimate for your VM to let the emulated processor catch fire, as long as the VM implementation itself doesn't crash. As the VM implementor, you get to set the rules. But if you want virtual hardware companies to virtually buy your virtual chip, you'll have to do something a little more forgiving of errors: good options might be to raise an exception (harder to implement) or reset the processor (much easier). Or maybe you just define every opcode to be valid, except that some are "undocumented" - they do something unspecified, other than crashing your implementation. Rationale: if (!) your VM implementation is to run several instances of the guest simultaneously, it would be very bad if one guest were able to cause others to fail.

Bernd Jendrissek