I have some experience with the phases of a compiler, and I am interested in the field of programming languages and compilers. Could somebody explain a good approach to writing a new compiler from scratch for a new programming language? (I mean the steps.)

+10  A: 

The first step is to read the Dragon Book.

It offers a good introduction to the whole field of compiler building, but also goes into enough detail to actually build your own.

As for the following steps I suggest following the chapters of the book. It's not written as a tutorial, but nevertheless offers much practical advice, which makes it an ideal hub for your own ideas and research.

DR
Please guys, you should *really* stop citing the Dragon Book... It's one of the worst compiler books out there. You can cite Appel's, Cooper's, etc. But *please* not the Dragon Book.
tonfa
"Modern Compiler Implementation" (Appel) is very badly written and "Engineering a Compiler" (Cooper) is not very suitable for beginners.
DR
It can't be worse than the arcane Dragon Book. Personally, I haven't found the Tiger Book badly written. Although I've never read it from cover to cover, I've found it has a lot of very good insights.
tonfa
"Engineering a Compiler" is an excellent book for beginners (though it holds its own at intermediate level), far more so than the Dragon Book. The Dragon Book is truly, truly an awful intro to compilers.
Paul Biggar
I liked the "Dragon Book"... ^_^
fortran
I found the Dragon book hard to read.
Amigable Clark Kant
+3  A: 

I would look at integrating your language/front end with GCC, the GNU Compiler Collection.

That way you only (ONLY!) need to write the parser and the translator to GCC's intermediate representation. You get the optimiser, object code generation for the chip of your choice, the linker, etc. for free.

Another alternative would be to target the JVM. The virtual machine is well documented, and the JVM instruction set is much more straightforward than x86 machine code.

James Anderson
Depends on what you want to do. If you really want to know how everything works from scratch, GCC is a very confusing way to learn it.
Amigable Clark Kant
+6  A: 

Please don't use the Dragon Book. It's old and mostly outdated (and uses weird names for most of the stuff).

For books, I'd recommend Appel's Tiger Book or Cooper's Engineering a Compiler. I'd strongly suggest you use a framework like LLVM so you don't have to re-implement a bunch of stuff for code generation etc.

Here is the tutorial for building your language with LLVM: http://llvm.org/docs/tutorial/

tonfa
+2  A: 

I managed to write a compiler without any particular book (though I had read some compiler books in the past, just not in any real detail).

The first thing you should do is play with any of the "compiler compiler" type tools (flex, bison, ANTLR, JavaCC) and get your grammar working. Grammars are mostly straightforward, but there are always nitty bits that get in the way and make a ruin of everything, especially things like expressions and precedence.

Some of the older, simpler languages are simpler for a reason: it makes the parsers "Just Work". Consider a Pascal variant that can be processed solely through recursive descent.
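To make the recursive descent and precedence point concrete, here is a minimal sketch in Python. The grammar (integers, `+ - * /`, parentheses) and the names `tokenize`, `Parser` and `parse` are my own assumptions for illustration, not taken from any particular language or tool. Each precedence level gets its own method, which is how the "expressions, precedence" headache is usually tamed by hand.

```python
import re

# Assumed toy grammar (hypothetical, for illustration only):
#   expr   := term   (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # Lowest precedence: '+' and '-', left-associative.
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        # Higher precedence: '*' and '/', left-associative.
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    """Parse an expression into nested (op, left, right) tuples."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("1 + 2 * 3"))    # ('+', 1, ('*', 2, 3))
print(parse("(1 + 2) * 3"))  # ('*', ('+', 1, 2), 3)
```

Notice that `*` binds tighter than `+` purely because `term` is called from inside `expr`. Adding another precedence level means adding another method in the chain.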

I mention this because without your grammar, you have no language. If you can't lex and parse it properly, you get nowhere very fast. And watching a dozen lines of sample code in your new language get turned into a slew of tokens and syntax nodes is actually really amazing, in a "wow, it really works" kind of way. It's almost literally an "it all works" or "none of it works" kind of thing, especially at the beginning. Once it actually works, you feel like you might be able to really pull it off.
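If you want to see the "slew of tokens" before committing to flex or ANTLR, a regex-based lexer is enough. This is only a sketch: the Pascal-like token set, the keyword list, and the names `Token`, `TOKEN_SPEC` and `lex` are assumptions I made up for the example, so adjust them to whatever your language looks like.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    line: int


# Hypothetical keyword set for a small Pascal-like language.
KEYWORDS = {"begin", "end", "if", "then", "while", "do"}

# Order matters: earlier alternatives win when several could match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP", r"[-+*/<>=]"),
    ("PUNCT", r"[();]"),
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),
    ("MISMATCH", r"."),  # anything else is an error
]
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def lex(source: str) -> Iterator[Token]:
    """Yield tokens with line numbers; whitespace and newlines are dropped."""
    line = 1
    for match in MASTER_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "NEWLINE":
            line += 1
        elif kind == "SKIP":
            continue
        elif kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} on line {line}")
        else:
            if kind == "IDENT" and text in KEYWORDS:
                kind = "KEYWORD"
            yield Token(kind, text, line)


sample = "x := 1 + 2;\nwhile x do\n  x := x - 1"
for token in lex(sample):
    print(token)
```

Keeping the line number on every token costs almost nothing here and pays off as soon as the parser has to report its first error.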

And to some extent that's true, because once you get that part done, you have to get your fundamental runtime going. Once you get "a = 1 + 1" compiled, the bulk of the new work is behind you, and now you just need to implement the rest of the operators. It basically becomes an exercise in managing lookup tables and references, and having some idea where you are at any one time in the process.
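Here is roughly what getting "a = 1 + 1" compiled can look like, as a sketch under some loud assumptions: Python's own `ast` module stands in for your front end (in a real project this would be your parser's output), and the stack-machine instruction names (`PUSH`, `LOAD`, `STORE`, `ADD`, ...) plus `compile_source` and `run` are invented for the example. The `slots` dictionary is the "lookup table" part: it maps each variable name to a storage slot.

```python
import ast

# Hypothetical opcode names for a made-up stack machine.
BINARY_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def compile_source(source):
    """Compile simple assignments to (instructions, symbol table)."""
    slots = {}  # symbol table: variable name -> slot index
    code = []   # list of (opcode, argument) pairs

    def emit_expr(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.Name):
            if node.id not in slots:
                raise NameError(f"undefined variable {node.id!r}")
            code.append(("LOAD", slots[node.id]))
        elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            # Post-order walk: operands first, then the operator.
            emit_expr(node.left)
            emit_expr(node.right)
            code.append((BINARY_OPS[type(node.op)], None))
        else:
            raise SyntaxError(f"unsupported expression: {ast.dump(node)}")

    for stmt in ast.parse(source).body:
        simple = (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        )
        if not simple:
            raise SyntaxError("only simple assignments are supported")
        emit_expr(stmt.value)
        # Allocate a slot the first time a variable is assigned.
        slot = slots.setdefault(stmt.targets[0].id, len(slots))
        code.append(("STORE", slot))
    return code, slots


def run(code, slots):
    """Tiny interpreter for the instructions, standing in for a runtime."""
    stack, memory = [], [0] * len(slots)
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(memory[arg])
        elif op == "STORE":
            memory[arg] = stack.pop()
        else:
            right, left = stack.pop(), stack.pop()
            results = {"ADD": left + right, "SUB": left - right, "MUL": left * right}
            stack.append(results[op])
    return {name: memory[slot] for name, slot in slots.items()}


code, slots = compile_source("a = 1 + 1")
print(code)              # [('PUSH', 1), ('PUSH', 1), ('ADD', None), ('STORE', 0)]
print(run(code, slots))  # {'a': 2}
```

Adding another operator is then one more entry in `BINARY_OPS` and one more entry in `results`, which matches the "implement the rest of the operators" phase described above.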

You can strike out on your own with a brand new syntax, an innovative runtime, etc. But if you have the time, it's probably best to first implement a language that's already been done, just to understand and work through all of the steps. While doing so, think about how you would do each step differently if you were writing the language you really want.

There are a lot of mechanics to compiler writing and just doing the process successfully once will give you a lot more confidence when you want to come back and do it again with your own, new language.

Will Hartung