
Suppose that the purpose of an assignment is to write a compiler for a subset of C (or a subset of any language, as long as it supports basic scripting expressiveness without complex features such as objects).

What kind of intermediate code could be used to verify the correctness of the compiler? I was talking with a professor who mentioned that he didn't know what to give his students as the VM for running the "compiled code", so I wondered what a good solution would be.

Subset of C -> Compiler -> Code? -> VM

where Code? could be either a binary format or, preferably, a text (ASCII) format (something like pseudo-assembly).

I'm looking for something already made, not advice on how to design the intermediate code and the VM: just a simple, ready-to-use one for testing some compiled programs.

A: 

How about compiling to a scripting language (e.g. JavaScript)? It's human-readable and already made.
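As a rough illustration, here is a minimal Python sketch of a code generator that turns a toy expression AST into JavaScript source. The tuple-based AST and the `gen` function are hypothetical, invented for this example. One detail worth noting: C's integer `/` truncates toward zero, while JavaScript's `/` yields a floating-point result, so the sketch emits `Math.trunc(...)` for division (integer overflow is ignored).

```python
# Hypothetical AST node shapes (assumption, for illustration only):
#   ("num", 3)               integer literal
#   ("var", "x")             variable reference
#   ("bin", op, lhs, rhs)    binary operator: + - * /
#   ("assign", "x", expr)    assignment statement
def gen(node) -> str:
    """Return JavaScript source text for one AST node."""
    kind = node[0]
    if kind == "num":
        return str(node[1])
    if kind == "var":
        return node[1]
    if kind == "bin":
        _, op, lhs, rhs = node
        if op == "/":
            # C int division truncates toward zero; JS "/" does not.
            return f"Math.trunc({gen(lhs)} / {gen(rhs)})"
        return f"({gen(lhs)} {op} {gen(rhs)})"
    if kind == "assign":
        return f"{node[1]} = {gen(node[2])};"
    raise ValueError(f"unknown node kind: {kind!r}")


# x = 1 + 2 * 3;
print(gen(("assign", "x", ("bin", "+", ("num", 1), ("bin", "*", ("num", 2), ("num", 3))))))
# x = (1 + (2 * 3));
```

The generated text can then be run with any JavaScript engine.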

Amnon
The purpose is to compile a language that is already at JavaScript's level of complexity into something lower-level, such as a dummy VM with classic stack/heap management, but not as complex as real hardware.
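For a sense of scale, the core of such a dummy stack VM can be very small. The following Python sketch interprets a hypothetical text instruction set (`PUSH`, `LOAD`, `STORE`, `ADD`, `SUB`, `MUL`, `PRINT`, all invented for illustration). Named variables live in a dictionary standing in for memory; the sketch omits the heap and control flow.

```python
import operator

# Hypothetical instruction set (assumption): one instruction per line.
BINOPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run_stack(code: list[str]) -> list[int]:
    """Execute stack-machine text code and return the values printed by PRINT."""
    stack: list[int] = []
    variables: dict[str, int] = {}  # stands in for memory; no heap here
    output: list[int] = []
    for line in code:
        op, *args = line.split()
        if op == "PUSH":      # PUSH n    push an integer constant
            stack.append(int(args[0]))
        elif op == "LOAD":    # LOAD x    push the value of variable x
            stack.append(variables[args[0]])
        elif op == "STORE":   # STORE x   pop the top value into variable x
            variables[args[0]] = stack.pop()
        elif op in BINOPS:    # pop b, pop a, push (a op b)
            b, a = stack.pop(), stack.pop()
            stack.append(BINOPS[op](a, b))
        elif op == "PRINT":   # pop the top value and record it
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction: {line!r}")
    return output


# x = 2 + 3 * 4; print(x)
print(run_stack(["PUSH 2", "PUSH 3", "PUSH 4", "MUL", "ADD", "STORE x", "LOAD x", "PRINT"]))  # [14]
```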
Jack
You could compile to a simple subset of JavaScript.
Amnon
It seems to me that the largest factor in determining how complicated your intermediate code becomes is your compiler. If it optimizes aggressively, it can take advantage of the underlying machine, and the intermediate code will use a larger fraction of that machine's capabilities; if you want to keep it simple, your compiler might produce only simple LOAD, STORE, ADD, etc. three-address instructions for the underlying (virtual) machine.
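To make the "keep it simple" end concrete, here is a minimal Python sketch of an interpreter for a hypothetical text format with `LOAD`, `STORE`, `ADD`, `SUB`, `MUL`, plus `JLT`/`JMP` for loops. The mnemonics and operand order are assumptions, not an existing standard. Operands are integer literals or register names, and `name:` lines mark labels.

```python
import operator

ARITH = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run_tac(program: str) -> dict[str, int]:
    """Run hypothetical three-address text code; return the final memory."""
    code: list[list[str]] = []
    labels: dict[str, int] = {}
    for raw in program.splitlines():
        line = raw.split("#")[0].strip()       # drop comments and blank lines
        if not line:
            continue
        if line.endswith(":"):                 # "loop:" marks a jump target
            labels[line[:-1]] = len(code)
        else:
            code.append(line.split())

    mem: dict[str, int] = {}                   # named memory cells
    regs: dict[str, int] = {}                  # registers / temporaries
    pc = 0

    def val(x: str) -> int:                    # integer literal or register
        return int(x) if x.lstrip("-").isdigit() else regs[x]

    while pc < len(code):
        op, *a = code[pc]
        pc += 1
        if op == "LOAD":                       # LOAD r v     r := mem[v]
            regs[a[0]] = mem[a[1]]
        elif op == "STORE":                    # STORE v x    mem[v] := x
            mem[a[0]] = val(a[1])
        elif op in ARITH:                      # ADD r x y    r := x + y
            regs[a[0]] = ARITH[op](val(a[1]), val(a[2]))
        elif op == "JLT":                      # JLT x y L    if x < y goto L
            if val(a[0]) < val(a[1]):
                pc = labels[a[2]]
        elif op == "JMP":                      # JMP L        goto L
            pc = labels[a[0]]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return mem


SUM_1_TO_5 = """
    STORE i 1
    STORE s 0
loop:
    LOAD r1 s
    LOAD r2 i
    ADD r1 r1 r2        # s := s + i
    STORE s r1
    ADD r2 r2 1         # i := i + 1
    STORE i r2
    JLT r2 6 loop
"""
print(run_tac(SUM_1_TO_5))  # {'i': 6, 's': 15}
```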
JeSuisse
A: 

You could describe some abstract machine design and then give it an instruction set in list (s-expression) format. A small Lisp parser is about the simplest parser you can write.

(label add-two)
(init-stack-frame 2)
(load r1 0)
(load r2 1)
(add val r1 r2)
(goto cont)

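Writing a Lisp-style interpreter to read this in is just as easy: one pass records the position of each label, and interpretation then dispatches on the first symbol of each expression: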

load_labels (index, expr, env)
    if expr.first == 'label'
        env.set(expr.second, index)

interpret (machine, expr, env)
    return env.lookup(expr.first).eval(machine, expr.tail)
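A runnable Python version of the same two-step idea (record label positions, then dispatch each expression on its first symbol) might look like the sketch below. The instruction semantics are assumptions, since the answer does not define them: `init-stack-frame n` moves the top `n` values of a hypothetical argument stack into the current frame, `load` copies a frame slot into a register, `add` sums two registers, and `goto` jumps to a label, or to the label held in a register such as the continuation register `cont`. A trailing `(label done)` gives the program somewhere to return to.

```python
def parse(text: str) -> list:
    """Parse a sequence of s-expressions into nested Python lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1                               # consume ")"
            return items
        return int(tok) if tok.lstrip("-").isdigit() else tok

    exprs = []
    while pos < len(tokens):
        exprs.append(read())
    return exprs


class Machine:
    def __init__(self, source: str, args: list[int], cont: str):
        self.code = parse(source)
        # Pass 1 (like load_labels): map each label name to its index.
        self.labels = {e[1]: i for i, e in enumerate(self.code) if e[0] == "label"}
        self.args = list(args)                     # hypothetical argument stack
        self.frame: list[int] = []
        self.regs: dict = {"cont": cont}           # cont holds the return label
        self.pc = 0


def op_label(m: Machine, name: str) -> None:
    pass                                           # labels are no-ops at run time


def op_init_stack_frame(m: Machine, n: int) -> None:
    split = len(m.args) - n                        # top n values become the frame
    m.frame, m.args = m.args[split:], m.args[:split]


def op_load(m: Machine, reg: str, slot: int) -> None:
    m.regs[reg] = m.frame[slot]


def op_add(m: Machine, dst: str, a: str, b: str) -> None:
    m.regs[dst] = m.regs[a] + m.regs[b]


def op_goto(m: Machine, target: str) -> None:
    # A register name holding a label (e.g. cont), or a label itself.
    m.pc = m.labels[m.regs.get(target, target)]


# Pass 2 (like interpret): the head symbol selects the handler.
OPS = {"label": op_label, "init-stack-frame": op_init_stack_frame,
       "load": op_load, "add": op_add, "goto": op_goto}


def run(m: Machine) -> dict:
    while m.pc < len(m.code):
        head, *tail = m.code[m.pc]
        m.pc += 1                                  # goto may overwrite this
        OPS[head](m, *tail)
    return m.regs


PROGRAM = """
(label add-two)
(init-stack-frame 2)
(load r1 0)
(load r2 1)
(add val r1 r2)
(goto cont)
(label done)
"""
print(run(Machine(PROGRAM, args=[3, 4], cont="done"))["val"])  # 7
```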
Cheery
A: 

How about targeting the Java Virtual Machine? I'm not sure how simple it is, but it is very well documented, so if the students were curious, they could head over to amazon.com and get a book about what the intermediate code actually means and how the VM works.

You could also just generate real 80x86 or 68000 assembly, use an assembler to get machine code, and then run it in an emulator. Real hardware doesn't strike me as more complicated than some made-up VM once you've already gone through writing a compiler, and there are tons of debuggers and other utilities available for it already.

But I do like the LISP suggestion :-)

JeSuisse
I was thinking about x86 or 68000 assembly, which is quite easy, but it has the problem of forcing students to write an assembler that outputs binary, instead of working with a more suitable text format like "add a b / mul b c" and so on! The JVM would be nice but maybe a bit overkill.
Jack
Do the students really need to develop the whole toolchain down to machine code? I mean, can't you just show them on a few simple examples how assembly statements translate directly to machine code (INT 16 -> CD 10, etc.) and, once they've got that, announce to them in a great booming voice: "Lo and behold, here I have this great magical machine called an ASSEMBLER, which will translate your assembly code to machine code!" ;-)
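The kind of classroom demonstration described above can be sketched in a few lines of Python. This toy assembler knows only four real x86 encodings (`NOP` = 90, `HLT` = F4, `RET` = C3, and `INT imm8` = CD followed by the immediate byte); the function names are hypothetical, and a real assembler also has to handle registers, addressing modes and labels.

```python
# A few real one-byte x86 opcodes that take no operands.
NO_OPERAND = {"NOP": 0x90, "HLT": 0xF4, "RET": 0xC3}


def assemble_line(line: str) -> bytes:
    """Translate one assembly statement into machine-code bytes."""
    parts = line.split()
    mnemonic = parts[0].upper()
    if mnemonic in NO_OPERAND and len(parts) == 1:
        return bytes([NO_OPERAND[mnemonic]])
    if mnemonic == "INT" and len(parts) == 2:
        imm = int(parts[1], 0)             # accepts decimal "16" or hex "0x10"
        if not 0 <= imm <= 0xFF:
            raise ValueError("INT operand must fit in one byte")
        return bytes([0xCD, imm])          # INT imm8 is encoded as CD ib
    raise ValueError(f"unsupported statement: {line!r}")


def assemble(source: str) -> bytes:
    """Assemble every non-blank line and concatenate the bytes."""
    return b"".join(assemble_line(ln) for ln in source.splitlines() if ln.strip())


print(assemble("INT 16\nRET").hex(" ").upper())  # CD 10 C3
```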
JeSuisse
A: 

How about LLVM?
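LLVM IR has a readable text form (`.ll` files), so a student compiler can simply print it. Below is a minimal Python sketch, assuming a hypothetical tuple AST over 32-bit integers, that emits one function in SSA form with a fresh temporary for every operation. Once a `main` function is added, the resulting text can be run with LLVM's `lli`.

```python
import itertools

# Hypothetical AST (assumption): ("arg", name) | ("const", n) | (op, lhs, rhs)
LLVM_OPS = {"+": "add", "-": "sub", "*": "mul"}


def compile_function(name: str, params: list[str], body) -> str:
    """Emit a textual LLVM IR function returning i32."""
    lines: list[str] = []
    counter = itertools.count()

    def emit(node) -> str:
        if node[0] == "arg":
            return "%" + node[1]
        if node[0] == "const":
            return str(node[1])
        op, lhs, rhs = node
        a, b = emit(lhs), emit(rhs)
        tmp = f"%t{next(counter)}"             # each SSA value is assigned once
        lines.append(f"  {tmp} = {LLVM_OPS[op]} i32 {a}, {b}")
        return tmp

    result = emit(body)
    header = ", ".join(f"i32 %{p}" for p in params)
    return "\n".join([f"define i32 @{name}({header}) {{", "entry:",
                      *lines, f"  ret i32 {result}", "}"])


print(compile_function("add_two", ["a", "b"], ("+", ("arg", "a"), ("arg", "b"))))
```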

swegi
A: 

You can find many examples of intermediate code/bytecode in existing VMs. Depending on your definition, they may or may not be simple. Examples:

Corbin March