views:

2852

answers:

10

After over a decade of C/C++ coding, I've noticed the following pattern - very good programmers tend to have detailed knowledge of the innards of the compiler.

I'm a reasonably good programmer, and I have an ad-hoc collection of compiler "superstitions", so I'd like to reboot my knowledge and start from the basics.

Can anyone recommend links to online resources or favorite books? I'm particularly interested in C/C++ compiling, optimization, GCC and LLVM.

BTW: At university I was interested in compilers, so I signed up for a subject called "Parsing and Translation". Bad move. It ended up being the Professor's vehicle for testing his advanced parsing and grammar theory textbook on unsuspecting 3rd year Comp Sci students. I was left with no practical knowledge, and even more confused than before.

+8  A: 
Brandon E Taylor
I was thumbing through the GCC internals manual, and it doesn't seem useful for "learning" how a compiler works. It's not a teaching document; it assumes that you already have knowledge of the subject.
NoMoreZealots
+9  A: 

If you want dead-tree edition, try The Art of Compiler Design: Theory and Practice.

J-16 SDiZ
+1  A: 

Depending on what exactly you want to know, you should have a look at the pipes-and-filters pattern, because as far as I know this (or something similar) has been used in a lot of compilers in recent years.

If my compiler knowledge is not too outdated, it works like this:

Parse the source code into a symbolic representation

Clean up the symbolic representation and do some normalization

Optimize the symbolic tree based on certain rules

Write out executable code based on the symbolic tree

Of course dependencies etc. have to be resolved too.

And of course, having a look at the gcc or javac source code may help in getting a more detailed understanding.
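The four stages listed above can be sketched in a few lines of Python. This is only an illustration, not how gcc or javac is organized: it borrows Python's standard `ast` module as the parser, handles only integer constants, names, `+`, `-`, `*` and unary minus, and targets a made-up stack machine. The instruction names (`PUSH`, `LOAD`, `ADD`, `SUB`, `MUL`) and the helpers `normalize`, `optimize`, `emit`, `compile_expr` and `run` are all assumptions made for this example.

```python
import ast
import operator

# Operators supported by this toy language (an assumption of the sketch):
# AST operator type -> (hypothetical instruction name, Python function).
OPS = {
    ast.Add: ("ADD", operator.add),
    ast.Sub: ("SUB", operator.sub),
    ast.Mult: ("MUL", operator.mul),
}

def parse(source):
    """Stage 1: source text -> tree (borrowing Python's own parser)."""
    return ast.parse(source, mode="eval").body

def normalize(node):
    """Stage 2: rewrite unary minus `-x` as `0 - x` so later stages
    only have to deal with binary operations."""
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return ast.BinOp(ast.Constant(0), ast.Sub(), normalize(node.operand))
    if isinstance(node, ast.BinOp):
        return ast.BinOp(normalize(node.left), node.op, normalize(node.right))
    return node

def optimize(node):
    """Stage 3: constant folding, one simple rule-based optimization."""
    if isinstance(node, ast.BinOp):
        left, right = optimize(node.left), optimize(node.right)
        if (type(node.op) in OPS
                and isinstance(left, ast.Constant)
                and isinstance(right, ast.Constant)):
            fold = OPS[type(node.op)][1]
            return ast.Constant(fold(left.value, right.value))
        return ast.BinOp(left, node.op, right)
    return node

def emit(node, out=None):
    """Stage 4: walk the tree post-order and write stack-machine code."""
    out = [] if out is None else out
    if isinstance(node, ast.Constant):
        out.append(("PUSH", node.value))
    elif isinstance(node, ast.Name):
        out.append(("LOAD", node.id))
    elif isinstance(node, ast.BinOp) and type(node.op) in OPS:
        emit(node.left, out)
        emit(node.right, out)
        out.append((OPS[type(node.op)][0],))
    else:
        raise SyntaxError(f"unsupported construct: {type(node).__name__}")
    return out

def compile_expr(source):
    """Chain the stages like filters in a pipe."""
    return emit(optimize(normalize(parse(source))))

def run(code, env):
    """Tiny interpreter for the emitted code, used to check the result."""
    funcs = dict(OPS.values())
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(funcs[instr[0]](left, right))
    return stack.pop()

if __name__ == "__main__":
    code = compile_expr("x * (2 + 3) - -4")
    print(code)
    print(run(code, {"x": 10}))
```

Each stage takes a tree and returns a tree (or, at the end, a list of instructions), which is what makes the stages easy to chain, reorder or test in isolation. In the demo, `2 + 3` and `0 - 4` are folded at compile time, so the emitted code contains only one multiply and one subtract.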

Patrick Cornelissen
+17  A: 

Start with the dragon book (with more stress on code optimization and code generation).

Go on to write a toy compiler for an educational programming language like Decaf or Cool. You may use lexer and parser generators (lex and yacc) for your front end, to make life easier and let you focus on the more important stuff.

Then read the gcc internals book while browsing the gcc source code.

sourabh jaiswal
Thanks, nice sequence. I take it the dragon book is: http://en.wikipedia.org/wiki/index.html?curid=188976
Justicle
Yes, that is the dragon book. I read the 1st edition. It had a much simpler dragon....
RBerteig
Gah. People keep recommending this. Not me. Start with a casual introduction---say "Let's build a compiler"---then look at a Computer Sciencey reference with all the math and theory.
dmckee
I'd recommend against trying to understand GCC. It's fairly unusual as far as compilers go, and its architecture is poor by design (as in, the design is crippled on purpose. Yes, I'm serious. No, I'm not just making a joke at GCC's expense).
Dietrich Epp
When it comes to understanding what you are doing, LEX and YACC just add an extra layer of technology that obscures your view of what's going on. If the goal is UNDERSTANDING how a compiler works, a recursive descent parser will give you a better understanding than using LEX and YACC. And generally speaking, if you're just doing it as a learning exercise, you are probably not going to write an optimising compiler in your free time without someone else helping you.
NoMoreZealots
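To make the recursive descent suggestion in the comment above concrete, here is a minimal hand-written parser for integer arithmetic. It is a sketch under assumptions, not taken from any of the books mentioned: the grammar, the `Parser` class and the tuple-based tree format are all invented for this example. The point is that each grammar rule becomes one method, so the call stack mirrors the structure of the input.

```python
import re

# Grammar assumed for this sketch:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the input into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it, or None at the end."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        """Consume and return the next token."""
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token: {token!r}")

def parse(text):
    """Parse a whole expression and reject trailing input."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input: {parser.peek()!r}")
    return tree
```

For example, `parse("1 + 2 * (3 - 4)")` returns `("+", 1, ("*", 2, ("-", 3, 4)))`. Operator precedence falls out of which method calls which: `expr` calls `term`, and `term` calls `factor`, so `*` binds tighter than `+` without any precedence table.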
If you are interested in compiler optimizations only then you can try SUIF
sourabh jaiswal
suif.stanford.edu
sourabh jaiswal
+1  A: 

It may also be valuable to pick up and read the source code to a compiler. I doubt that GCC is the best first choice, since it is burdened with full compatibility to more than 20 years of evolution of the language. But I'm also sure that a reading of its source, guided by one of the internal reference manuals, would be educational.

I'd seriously consider looking at the source to a scripting language that is internally compiled to a bytecode for a virtual machine. Several languages fit that description, but I would start with Lua. The language is small, and the VM is novel. The source code is also small and the bits I've looked at have been very clear although lightly commented.

RBerteig
+8  A: 

Compiler texts are good, but they are a bit heavy for teaching yourself. Jack Crenshaw has a "book", originally a series of articles you can download and read, called "Let's Build a Compiler." It follows a "learn by doing" methodology that is great if you didn't get anything out of taking formal classes on the subject, or if it's been WAY too many years since you took them (that's my case). It holds your hand and leads you through writing a compiler instead of smacking you around with lambda calculus and deep theoretical issues that only academia cares about. It was a good way to stir up those brain cells that only had a fuzzy memory of writing something on the VAX (YEAH, that's right, a VAX!) many, many moons ago at school. It's written very conversationally and is easy to just sit down and read, unlike most textbooks, which require several pots of coffee just to get past the first chapter.

Once you have a basis for understanding, more traditional texts such as the Dragon book are great references to expand on it. (And personally I like the dead-tree versions; I printed out Jack's, since it's much easier to read in a comfortable position than on a laptop. And the ebook readers are too expensive for something that doesn't actually feel like you're reading a real book yet.)

What some might call a "downside" is that it's written in Pascal, but I thought that just made me think about it more than if someone had given me a working C program to start with. Apart from that, it was written with the 68000 in mind, which is only used in embedded systems at this point in time. Again, for me this wasn't a problem: I knew 68000 asm, and 68000 asm is easier to read than some other asm.

NoMoreZealots
+4  A: 
Norman Ramsey
Thanks for the tip - I will check lcc out
Justicle
Brilliant engineers? Jack Crenshaw designed parts of the space shuttle, and home-made computers were a HOBBY of his. Not to dispute the intellect of the folks who wrote lcc, but you don't have to be brilliant to design a compiler. It's really not that hard.
NoMoreZealots
The reference was not to Crenshaw but to gcc. RMS is many things, but brilliant engineer is not one of them. Then add 1000 monkeys and stir well...
Norman Ramsey
+2  A: 

See Fabrice Bellard's otcc source code:

http://bellard.org/otcc/

plan9assembler
A: 

Have a look at Kaleidoscope. You can write your own compiler in just a few days with LLVM.

name