Possible Duplicate:
Learning to write a compiler

Hi Stack Overflow. Don't get me wrong: I don't intend to write a compiler for C++ (though I intend to write it in C++), Java, or some other complex high-level programming language. I just want to learn the basics of converting a basic instruction set into a Windows executable (say, a simple, completely custom language with 5-6 functions). Also, I don't want to download any libraries or header files. If you could link me to any very basic example source or tutorials, it would be greatly appreciated!

+2  A: 

Jack Crenshaw's Let's Build a Compiler is a good tutorial to start off with. He's a good writer and makes the subject easy to understand.

Gordon Brandly
+2  A: 

Here's what you need to write a basic compiler:

  1. Parser. You will need to parse your language and build an Abstract Syntax Tree. You may want to learn about writing parsers. You can either hand-code the parser or use parser generators, e.g. lex/yacc.
  2. Assembly. You will need to generate assembly instructions from the Syntax Tree.
  3. Instruction Set. You will need to translate the assembly into machine code in some specific instruction set (typical Intel and AMD CPUs use the x86 instruction set; alternatively, you can target the Java VM's instruction set or .NET's IL).
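
As a rough illustration of steps 2 and 3, here is a minimal C++ sketch that starts from a hand-built syntax tree (the parser is left out), emits mnemonics for a made-up stack machine, and then encodes them as bytes. The `Node` layout, the mnemonics, and the byte encoding are all invented for this example; a real x86 backend is considerably more involved.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST: a node is a number literal (op == 0) or a binary operation.
struct Node {
    char op = 0;                     // '+', '-', '*', or 0 for a literal
    int value = 0;                   // used only when op == 0
    std::unique_ptr<Node> lhs, rhs;  // children of a binary operation
};

std::unique_ptr<Node> num(int v) {
    auto n = std::make_unique<Node>();
    n->value = v;
    return n;
}

std::unique_ptr<Node> bin(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->op = op;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

// Step 2: walk the tree post-order and emit mnemonics for an invented stack machine.
void emit(const Node& n, std::vector<std::string>& out) {
    if (n.op == 0) {
        out.push_back("PUSH " + std::to_string(n.value));
        return;
    }
    emit(*n.lhs, out);
    emit(*n.rhs, out);
    switch (n.op) {
        case '+': out.push_back("ADD"); break;
        case '-': out.push_back("SUB"); break;
        case '*': out.push_back("MUL"); break;
        default: assert(false && "unknown operator"); break;
    }
}

// Step 3: translate mnemonics to bytes using an invented encoding
// (0x01 = PUSH imm8, 0x02 = ADD, 0x03 = SUB, 0x04 = MUL).
std::vector<unsigned char> assemble(const std::vector<std::string>& code) {
    std::vector<unsigned char> bytes;
    for (const std::string& line : code) {
        if (line.rfind("PUSH ", 0) == 0) {
            bytes.push_back(0x01);
            bytes.push_back(static_cast<unsigned char>(std::stoi(line.substr(5))));
        } else if (line == "ADD") {
            bytes.push_back(0x02);
        } else if (line == "SUB") {
            bytes.push_back(0x03);
        } else if (line == "MUL") {
            bytes.push_back(0x04);
        }
    }
    return bytes;
}
```

For example, building `bin('*', bin('+', num(1), num(2)), num(3))` and passing it to `emit` yields `PUSH 1`, `PUSH 2`, `ADD`, `PUSH 3`, `MUL`, which `assemble` turns into eight bytes.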
Lie Ryan
-1 Lex isn't a parser.
mathepic
@mathepic: I didn't claim it was a parser.
Lie Ryan
I meant to say parser generator.
mathepic
+3  A: 

To parse the input, you should read up on recursive descent parsing (those are probably the easiest parsers to hand-implement), although you will also need a lexer of some kind to produce tokens for your parser. They can be hand-coded (I've done it), although it's easier to use a lexer generator like lex or flex.
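
To make that concrete, here is a small hand-coded lexer and recursive descent parser in C++ for integer arithmetic. The grammar, the `Tok` and `Token` types, and the `Parser` class are assumptions made up for this sketch. It evaluates the expression directly to keep things short; a compiler would build a tree or emit code at the same points.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class Tok { Num, Plus, Minus, Star, LParen, RParen, End };

struct Token {
    Tok kind;
    int value;  // meaningful only for Tok::Num
};

// Hand-coded lexer: turns characters into tokens, skipping whitespace.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isdigit(c)) {
            int v = 0;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                v = v * 10 + (src[i++] - '0');
            out.push_back({Tok::Num, v});
            continue;
        }
        switch (c) {
            case '+': out.push_back({Tok::Plus, 0}); break;
            case '-': out.push_back({Tok::Minus, 0}); break;
            case '*': out.push_back({Tok::Star, 0}); break;
            case '(': out.push_back({Tok::LParen, 0}); break;
            case ')': out.push_back({Tok::RParen, 0}); break;
            default: throw std::runtime_error("unexpected character");
        }
        ++i;
    }
    out.push_back({Tok::End, 0});
    return out;
}

// Recursive descent parser: one function per grammar rule.
//   expr   := term (('+' | '-') term)*
//   term   := factor ('*' factor)*
//   factor := NUMBER | '(' expr ')'
class Parser {
public:
    explicit Parser(std::vector<Token> toks) : toks_(std::move(toks)) {}

    int parse() {
        int v = expr();
        if (peek().kind != Tok::End) throw std::runtime_error("trailing input");
        return v;
    }

private:
    const Token& peek() const { return toks_[pos_]; }
    Token next() { return toks_[pos_++]; }

    int expr() {
        int v = term();
        while (peek().kind == Tok::Plus || peek().kind == Tok::Minus) {
            Tok op = next().kind;
            int r = term();
            v = (op == Tok::Plus) ? v + r : v - r;
        }
        return v;
    }

    int term() {
        int v = factor();
        while (peek().kind == Tok::Star) {
            next();
            v *= factor();
        }
        return v;
    }

    int factor() {
        Token t = next();
        if (t.kind == Tok::Num) return t.value;
        if (t.kind == Tok::LParen) {
            int v = expr();
            if (next().kind != Tok::RParen) throw std::runtime_error("expected ')'");
            return v;
        }
        throw std::runtime_error("expected number or '('");
    }

    std::vector<Token> toks_;
    std::size_t pos_ = 0;
};
```

Usage is `Parser(lex("1 + 2 * 3")).parse()`. Because `term` sits below `expr` in the grammar, multiplication binds tighter than addition without any explicit precedence table.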

Once you've parsed the input, you will need to transform it into appropriate output. I can't help you much there, as I do not know the Windows toolchain very well. The "easy" way is to generate assembly and run it through NASM, MASM, or whatever assembler comes with your compiler environment. If your language is sufficiently simple, you can just generate the assembly as you go in the parser code.
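
A minimal C++ sketch of the "generate the assembly as you go" approach, assuming a toy grammar with only integers, `+`, `-`, and `*`. The `Emitter` class is hypothetical, and the output is just the instruction body in NASM-style 32-bit x86 text; a file an assembler would accept still needs section directives, an entry point, and a link step, none of which are shown here.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Single-pass sketch: each parse function appends assembly text as soon as
// it recognizes a construct, so no syntax tree is ever built.
class Emitter {
public:
    explicit Emitter(std::string src) : src_(std::move(src)) {}

    std::string compile() {
        expr();
        skipSpace();
        if (pos_ != src_.size()) throw std::runtime_error("trailing input");
        out_ += "    pop eax\n";  // leave the final result in eax
        return out_;
    }

private:
    void skipSpace() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_])))
            ++pos_;
    }

    bool accept(char c) {
        skipSpace();
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    void number() {
        skipSpace();
        std::size_t start = pos_;
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
            ++pos_;
        if (start == pos_) throw std::runtime_error("expected number");
        out_ += "    push " + src_.substr(start, pos_ - start) + "\n";
    }

    // Pop right operand into ebx, left into eax, combine, push the result.
    void binary(const std::string& insn) {
        out_ += "    pop ebx\n    pop eax\n    " + insn + "\n    push eax\n";
    }

    void term() {  // term := number ('*' number)*
        number();
        while (accept('*')) { number(); binary("imul eax, ebx"); }
    }

    void expr() {  // expr := term (('+' | '-') term)*
        term();
        for (;;) {
            if (accept('+')) { term(); binary("add eax, ebx"); }
            else if (accept('-')) { term(); binary("sub eax, ebx"); }
            else break;
        }
    }

    std::string src_;
    std::string out_;
    std::size_t pos_ = 0;
};
```

Calling `Emitter("1 + 2").compile()` returns the lines `push 1`, `push 2`, `pop ebx`, `pop eax`, `add eax, ebx`, `push eax`, `pop eax`. The code is deliberately naive (everything goes through the stack), which is the price of skipping the tree.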

Michael E
Thanks man. Personally, the reason I'm trying to learn this (and the reason I hate libraries) is that I love to write things myself, and this is a topic I want to understand. The lexing and such I understand; it's that mystifying point at which text goes from text to executable code. On the other hand, I would like to thank you for the link.
Cr15py
A: 

I would recommend www.antlr.org. I worked in C#, but it has support for C, Java, Python and more.

kenny
A: 

Actually, the most important thing you need is to figure out the binary format of .exe files (unless you are planning to use an existing linker, in which case I think you need to output obj files, which also have a binary format).

You also need to deal with a LOT of assembly. Unless you are already VERY familiar with the x86 instruction set, I'd try something else.

Here are a couple possibilities:

There used to be a thing called "Tiny C"--I'm guessing this is it: http://bellard.org/tcc/

It is a good enough compiler to build itself, but not so complex that it's hard to understand. It's a bare-bones "How-to build a compiler" lesson in a box. I messed with it on the 8088.

Output for an "Embedded" cpu. They tend to have simple assembly languages and very clearly defined executable formats.

Output C code. This is a cheat for sure, but you can concentrate on your language and not worry too much about the assembly language. (It has worked well elsewhere: the original Objective-C was implemented as a preprocessor that emitted C.)
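
A minimal C++ sketch of that idea, assuming a made-up toy language with just two statements, `let NAME = EXPR` and `print EXPR`, one per line. The `toC` function is hypothetical, and it passes expressions through verbatim on the assumption that the toy language's expression syntax is already valid C.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Translate a toy program into C source text. A C compiler then does the
// hard part of producing the executable.
std::string toC(const std::string& program) {
    std::istringstream in(program);
    std::string line;
    std::string out = "#include <stdio.h>\nint main(void) {\n";
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        if (line.rfind("let ", 0) == 0) {
            // "let x = 1 + 2"  ->  "int x = 1 + 2;"
            out += "    int " + line.substr(4) + ";\n";
        } else if (line.rfind("print ", 0) == 0) {
            // "print x * 3"  ->  printf("%d\n", x * 3);
            out += "    printf(\"%d\\n\", " + line.substr(6) + ");\n";
        } else {
            throw std::runtime_error("unknown statement: " + line);
        }
    }
    out += "    return 0;\n}\n";
    return out;
}
```

Feeding it the two lines `let x = 1 + 2` and `print x * 3` produces a complete C file that can be handed to any C compiler.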

Finally, if you really want to go the .exe route, first write an app that produces a "Hello world" exe. Don't bother having it "Compile" anything, just hand edit the code, get it into the exe format and run it--in doing this you will KNOW that you got all your bits lined up and into the right spots, then you can start on a compiler with some confidence.

After this, creating the language can be done through a lot of the procedures given here. But if you just want to see how it all works, I'd definitely do a few small iterations first; don't worry about what you will run into until you run into it.

Bill K
+1  A: 

For learning about how building a compiler is different in C++ than in, say, C or Pascal, try out the Boost Spirit parser framework.

This assumes familiarity with C++.

For learning about creating a compiler I suggest using a simpler language than C++, then perhaps advancing to C++.

Cheers & hth.,

Alf P. Steinbach