What are some good materials on ~practical~ compiler construction? I've already picked up a copy of the Dragon Book that I plan to read fairly thoroughly. It should cover most of the theory that we need.
You should consider looking into what motivated other languages to make the choices they did. Many of them have websites.
What language should we write a compiler for? A small subset of C? Our own language?
I think there are four broad families of languages. Each one has merits:
- Context Free(ish) Languages: Algol-like languages go here. C is one example, and so are Perl, Python, Ada, PHP, C++, Lua, bash, and SQL. Advantage: Pretty; Disadvantage: Hard to get working, even with compiler generators
- Lisp like Languages: These basically describe a human readable Abstract Syntax tree, usually with some kind of metaprogramming ability added on. Advantage: Easier to write than context free, probably in a few hours. Disadvantage: Guaranteed to look exactly like Lisp
- Forth Like: Forth, Postscript, dc, BrainF*ck: Stupid simple token-only languages. Advantages: even easier to parse than Lisp, you can probably get this going in a few minutes; Disadvantages: write-only language (for most folks)
- Misguided Hand Written: FORTRAN, Tcl, early BASIC variants. The common factor these have is that their parsers are hand written, because they have to be. Language features are hard coded into the parsers. Advantages: Can do some interesting, sometimes amazing things; Disadvantages: You won't finish the parser in one summer.
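To give a feel for why the Lisp-like family is quick to get going, here is a minimal sketch of a reader for parenthesized expressions in Python. The names `tokenize`, `parse`, and `read` are my own, and the "language" (integers, symbols, nested lists) is an assumption made purely for illustration. It has no strings, comments, or quoting.

```python
def tokenize(source):
    """Split source text into parentheses and atoms."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front of the list and return one expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        # Keep reading sub-expressions until the matching ")".
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # discard the closing ")"
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(token)
    except ValueError:
        return token  # anything that is not an integer is a symbol


def read(source):
    """Parse exactly one expression into nested Python lists."""
    tokens = tokenize(source)
    tree = parse(tokens)
    if tokens:
        raise SyntaxError("trailing tokens after expression")
    return tree
```

Calling `read("(define (sq x) (* x x))")` gives back nested lists that already are the syntax tree, which is the whole appeal: there is no grammar to debug before you can start on semantics.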
In some ways that's the easy part. The semantic aspect your new language provides will depend strongly on what task it is aimed at solving. This topic is too big for me to post about reasonably, so I have to defer to Wikipedia.
What language should we write the compiler in? Most of the resources I have found utilize C. This seems to be the standard but we are looking for suggestions.
Functional languages seem to be well suited to this, if you are hand-writing your front-end. Haskell and Ocaml seem to be the two favorites.
On the other hand, I've never used those for that. Parser generators are about as terse. In fact, lex/yacc work pretty well. Their output is not meant for mortals, however. If this is a problem, or you need some output language other than C, ANTLR may be your best bet.
For the back end: LLVM. Several sample languages are provided to get you up and running. You can target any plausible platform (including interpretation).
You actually don't need a back-end right away. There's going to be some front end work, resolving how the language actually has to work before you can even think about doing anything with parsed IR. This is particularly true for context free languages. If this aspect is less interesting to you, use a Lisp or Forth like syntax that you can write in an hour so that you can get straight to the semantic work.
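As a rough illustration of how little front end a Forth-like syntax needs, here is a sketch of a token-only stack evaluator in Python. The function name `run` and the tiny word set (`+`, `-`, `*`, `dup`, `swap`, `drop`) are assumptions chosen for the example, not a real Forth. There is no error handling for stack underflow or unknown words.

```python
def run(source, stack=None):
    """Evaluate whitespace-separated words against a data stack."""
    stack = [] if stack is None else stack
    binary_ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    for word in source.split():  # "parsing" is just splitting on whitespace
        if word in binary_ops:
            b = stack.pop()  # top of stack is the right-hand operand
            a = stack.pop()
            stack.append(binary_ops[word](a, b))
        elif word == "dup":
            stack.append(stack[-1])
        elif word == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif word == "drop":
            stack.pop()
        else:
            stack.append(int(word))  # everything else must be an integer literal
    return stack
```

With that in place, `run("2 3 + 4 *")` leaves `[20]` on the stack. Everything interesting you add from here (definitions, types, control flow) is semantic work rather than parsing work.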
Naturally, that semantic work is the meat of a compilers course, and really where most of the interesting (to me, anyway) content lies. This is the part that distinguishes, say, functional languages from object oriented languages or concurrent languages and so on.
Any other general suggestions, pitfalls, tips?
Why are you writing another language? I don't mean this in a pejorative sense. Writing yet another generic C-like language (or Lisp or Python or whatever) probably won't be very satisfying, even if it's just for learning. My favorite learning-oriented language, SPL, had very specific design goals that resulted in a distinct language that was quite orthogonal to any other (although this goal was at odds with usefulness). Decide early what your language is for.