I'm interested in learning - at depth - about compiler theory...

  • parsing
  • EBNF
  • LALR?

These are all terms I'm familiar with, but I don't really understand how to actually implement or use them.

I'm looking for links, tutorials, and other resources; all online, preferably all free...

I'm more interested in simple / complete implementations, than complex / incomplete or convoluted implementations.

Update:

I know there are other similar questions, but those didn't have this narrow focus on online and free links/resources.

Further Update

Any more high level answers?

+1  A: 

Start there and dig into the references you find. Each article has external links.

John at CashCommons
Yes, Wikipedia is good, especially if you know what you don't know... In my case, I know about those 3 terms, but I don't have a bigger-picture understanding of how they all fit together, or of what I don't know. Thanks!
John Weldon
+1  A: 

There's a WikiBook about compiler construction here. It covers many aspects of compiler construction. I would suggest you take a look at tools like flex and/or bison (if you're planning to write a compiler in C) or the equivalent tools for your language. These tools help you generate a parser for the grammar of your language.

Best wishes, Fabian

halfdan
Awesome, thanks! I'll dig into that.
John Weldon
+2  A: 

MIT has open course ware on this topic...

Jason Punyon
Ah!! I forgot about MIT's opencourseware... Thanks!
John Weldon
Yes! Structure and Interpretation of Computer Programs is available online at MIT - the course is dated and you need the first edition of the book (to follow the citations in the lectures), but it's very, very good on the topic. And you learn LISP!
codefool
@John Weldon: No Problem!
Jason Punyon
+1  A: 

Lex and Yacc are tools for deterministically turning source into a tree - i.e., you specify your language's grammar for the two tools, and they generate a program to help you structure the code into a syntax tree.

LALR and EBNF are compiler front-end related terms, and lex and yacc are the practical tools for that part of the job. Just by focusing your energy on learning how to use lex and yacc to build a parse tree, you should have a starting point for building a compiler. Dynamic languages are easier to compile since they either have lots of runtime machinery or are fully interpreted, so you could just stop your search here :D.

These are standard UNIX tools so you should find plenty of resources on them.
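To make the lexer/parser split concrete, here is a minimal hand-rolled sketch in Python (my own illustration of the idea, not the output of lex or yacc): the lexer turns characters into tokens, and the parser turns the token stream into a syntax tree for a toy grammar of additions and subtractions.

```python
import re

# Toy grammar:  expr -> NUM (('+' | '-') NUM)*
# The lexer produces (kind, value) tokens; the parser builds a
# left-leaning tree of (op, left, right) tuples.

TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(src):
    tokens = []
    for number, op in TOKEN_RE.findall(src):
        tokens.append(("NUM", int(number)) if number else ("OP", op))
    return tokens

def parse(tokens):
    pos = 0

    def num():
        nonlocal pos
        kind, value = tokens[pos]
        assert kind == "NUM", "expected a number"
        pos += 1
        return value

    tree = num()
    while pos < len(tokens) and tokens[pos][0] == "OP":
        op = tokens[pos][1]
        pos += 1
        tree = (op, tree, num())  # fold left: a - b + c = ((a - b) + c)
    return tree

print(parse(tokenize("1 + 2 - 3")))  # ('-', ('+', 1, 2), 3)
```

Lex and yacc generate far more capable versions of these two pieces (a table-driven lexer and an LALR parser) from declarative grammar specifications, but the division of labor is the same.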

I got this answer from someone about optimizing C++. If you're going in this direction then you can learn a lot here.

Hassan Syed
This is the kind of answer I'm looking for... Thanks Vainstah; more of this would be great.
John Weldon
+3  A: 

I still rely on the classics -- "Principles of Compiler Design" by Aho and Ullman (Green Dragon Book) and "Compilers: Principles, Techniques, and Tools" by Aho, Sethi, and Ullman (Red Dragon Book.)

Ah...memories...

codefool
very good book that
Hassan Syed
If they'd only release those books online and for free :)
John Weldon
This book is SO old that I expect you can get it for nothing on Amazon.
Hassan Syed
What are the differences between the two books? Which one should I start with? And does the datedness affect the content very much?
int3
No; like maths, compiler theory represents absolute truth. Compilers just transform one representation into another. Techniques and optimizations change, but not the basics, and if you ask me the new techniques and optimizations are very simple to understand; most research occurs on Java - at least when I last looked at the area.
Hassan Syed
I must add that the libraries used to perform things change... there are probably more elegant lexers and tokenizers now, and the intermediate languages improve -- e.g., the JVM and CLR are the targets of choice these days, and they are far more elegant than a programmatic in-memory data structure :D. But the gist of it lies in these old leviathan books :P Study them well.
Hassan Syed
I would start with the MIT online course - that will give you a fantastic foundation. Then move on to the Green Dragon Book, and then to the Red Dragon Book. Don't worry about them being dated - that's the thing with CS - it's all been done before - only now it's faster, smarter, and deals with a lot more. But the basics are always the basics until they finally come out with a ternary processor, and even then most of it will *still* apply.
codefool
+1  A: 

You need to understand parsing in general, and then how parsers are used. Recursive descent parsers are the easiest to understand, IMHO, with LALR/YACC/... being more difficult to grasp because the tools are complex.
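A tiny sketch of the idea in Python (my own illustration, not taken from the META II paper): in a recursive descent parser you write one function per grammar rule, and the call stack mirrors the parse tree. This version evaluates the expression directly instead of building a tree, which is the simplest way to see the technique.

```python
# Grammar:  expr   -> term   (('+' | '-') term)*
#           term   -> factor (('*' | '/') factor)*
#           factor -> NUMBER | '(' expr ')'

def evaluate(src):
    # Naive tokenizer: pad parentheses, then split on whitespace.
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():                      # one function per rule...
        value = term()
        while peek() in ("+", "-"):
            if eat() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():                      # ...which calls the rules it references
        value = factor()
        while peek() in ("*", "/"):
            if eat() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():
        if peek() == "(":
            eat()                    # consume '('
            value = expr()           # recursion handles nesting
            eat()                    # consume ')'
            return value
        return float(eat())          # a NUMBER token

    return expr()

print(evaluate("2 * (3 + 4)"))  # 14.0
```

Note how operator precedence falls out of the rule structure: `term` binds tighter than `expr` simply because `expr` is written in terms of `term`.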

I learned recursive descent parsing and code generation from one enormously fun technical paper. That paper has been turned into a website, which walks you through building a completely self-contained compiler system that can compile itself and other languages:

MetaII Compiler Tutorial

This is all based on an amazing little 10-page technical paper by Val Schorre: META II: A Syntax-Oriented Compiler Writing Language from honest-to-god 1964. I learned how to build compilers from this back in 1970. There's a mind-blowing moment when you finally grok how the compiler can regenerate itself....

I know the website author from my college days, but have nothing to do with the website.

Ira Baxter
+1  A: 

See this link:

http://www.stanford.edu/class/cs143/

See also this:

http://www.cs.princeton.edu/~appel/modern/java/ - you can also find the book on Google Books.

dan
+1  A: 

There's a free tutorial called "Let's Build a Compiler". Quote from the introduction:

"This series of articles is a tutorial on the theory and practice of developing language parsers and compilers. Before we are finished, we will have covered every aspect of compiler construction, designed a new programming language, and built a working compiler."

Dananjaya