+1  A: 

(1) The Dragon Book is very popular, but I don't like it; it's too old, and old material goes out of date. (I don't have personal experience with a better book, so I won't recommend one.)

(3) As for what language to use, see this question:

http://stackoverflow.com/questions/809710/what-is-the-best-language-to-write-a-compiler-in

(4) Regarding parsing tool chains, lex and yacc and equivalents are ok, but ANTLR is better and Parsec is even better, in my opinion.

Brian
I was initially wary of the Dragon Book as well, because of its age. However, most of what I have read about it recently says that its theory is still sound and that the portions that are largely out of date are the ones on compiler optimizations (which I won't be using for this project).
Simucal
FYI, the purple Dragon Book (second edition) was published in August 2006 (http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools#Second_edition)
Matthew Flaschen
@Matthew Flaschen, here is the #2 comment on amazon: "When I ordered this book I thought it is a second edition that includes new or revised chapters. I was wrong. The book itself is identical with the first edition (I wouldn't even say this is a second edition, it is a re-print), the only difference is that they give you a card with an access code to the www.aw-bc.com/dragonbook website where you can download some online chapters."
Simucal
A: 

The second edition of the Dragon Book contains a Java implementation of a compiler front end for a C-like language. It was all I needed to write my own compiler.

Ayman Hourieh
+1  A: 

Some ideas:

Lance Richardson
A: 

I wrote a compiler in Scheme for a subset of the Scheme language, though I am a student at IU, where Scheme is very popular among the Programming Languages Group. Still, I found it very simple and intuitive to write a compiler in that language.

Shiv
A: 

Tips: do some heavy thinking before you start coding. What do you want your OS to do that isn't already done in other OSes?

Do you want your project to be easy, or do you want it to be highly useful? Some languages make it fairly straightforward to write a compiler, but they are usually speed-limited or don't add much value. Maybe you can make it modular or social (open source) so others can add to your OS with their favorite modules/features?

Code it in a fast, compiled language (C#, C++) or, if you're sadistic, assembly (bad joke, please don't).

Answers merged from Brian and Ayman:

1 The Dragon Book is very popular, but I don't like it; it's too old, and old material goes out of date. (I don't have personal experience with a better book, so I won't recommend one.) The second edition of the Dragon Book contains a Java implementation of a compiler front end for a C-like language. It was all I needed to write my own compiler.

2 & 3 Code it in C# or C++ and provide a compiler in C/C++ to start out? Depending on your implementation, this could be easy or difficult.
As for what language to use, see this question: http://stackoverflow.com/questions/809710/what-is-the-best-language-to-write-a-compiler-in

4 (misc extras) See the design issues above. Regarding parsing tool chains, lex and yacc and equivalents are OK, but ANTLR is better and Parsec is even better, in my opinion.

Mark Essel
+1  A: 

I used Crafting a Compiler by Fischer when I was in school many, many years ago. It seems to have been updated in 1998, and a new edition is coming out this year. It has a companion, Crafting a Compiler in C. While the "dragon book" has been the standard for years, this might not be a bad text to review.

MikeJ
+2  A: 

What are some good materials on ~practical~ compiler construction? I've already picked up a copy of the Dragon Book that I plan to read fairly thoroughly. It should cover most of the theory that we need.

You should consider looking into the different things that motivated other languages to do what they do. Many have websites.

What language should we write a compiler for? A small subset of C? Our own language?

I think there are four kinds of languages. Each one has merits:

  • Context-Free(ish) Languages: Algol-like languages go here. C is one example, and so are Perl, Python, Ada, PHP, C++, Lua, bash, and SQL. Advantage: pretty. Disadvantage: hard to get working, even with compiler generators.
  • Lisp-Like Languages: These basically describe a human-readable abstract syntax tree, usually with some kind of metaprogramming ability added on. Advantage: easier to write than context-free, probably in a few hours. Disadvantage: guaranteed to look exactly like Lisp.
  • Forth-Like: Forth, Postscript, dc, BrainF*ck: stupid-simple, token-only languages. Advantage: even easier to parse than Lisp; you can probably get this going in a few minutes. Disadvantage: write-only language (for most folks).
  • Misguided Hand-Written: FORTRAN, Tcl, early BASIC variants. The common factor is that their parsers are hand-written, because they have to be. Language features are hard-coded into the parsers. Advantage: can do some interesting, sometimes amazing things. Disadvantage: you won't finish the parser in one summer.
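
To give a feel for why the Lisp-like option is quick to get going, here is a minimal sketch of an S-expression reader in Python. The function names (`tokenize`, `read`, `atom`, `parse`) are made up for this illustration, and it assumes a toy syntax with only parentheses, integers, and bare symbols (no strings, comments, or quoting), so treat it as a starting point rather than a real Lisp reader.

```python
def tokenize(source):
    """Split source text into parentheses and atoms."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def atom(token):
    """Integers become int; everything else stays a symbol (a plain str)."""
    try:
        return int(token)
    except ValueError:
        return token


def read(tokens):
    """Consume tokens from the front of the list and return one expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # discard the closing ")"
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return atom(token)


def parse(source):
    """Parse exactly one expression; the nested lists are the syntax tree."""
    tokens = tokenize(source)
    tree = read(tokens)
    if tokens:
        raise SyntaxError("trailing input after expression")
    return tree
```

For example, `parse("(+ 1 (* 2 3))")` returns `["+", 1, ["*", 2, 3]]`. The nested lists already are the abstract syntax tree, which is the point made in the Lisp bullet above.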

In some ways that's the easy part. The semantics your new language provides will depend strongly on what task it is aimed at solving. This topic is too big for me to post about reasonably, so I have to defer to Wikipedia.

What language should we write the compiler in? Most of the resources I have found utilize C. This seems to be the standard but we are looking for suggestions.

Functional languages seem to be well suited to this, if you are hand-writing your front-end. Haskell and Ocaml seem to be the two favorites.

On the other hand, I've never used those for that. Parser generators are about as terse. In fact, lex/yacc work pretty well. Their output is not meant for mortals, however. If this is a problem, or you need some output language other than C, ANTLR may be your best bet.

For the back end: LLVM. Several sample languages are provided to get you up and running. You can target any plausible platform (including interpretation).

You actually don't need a back end right away. There's going to be some front-end work, resolving how the language actually has to work, before you can even think about doing anything with parsed IR. This is particularly true for context-free languages. If this aspect is less interesting to you, use a Lisp- or Forth-like syntax that you can write in an hour so that you can get straight to the semantic work.
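
As a rough sketch of how little front end a Forth-like syntax needs, here is a tiny stack evaluator in Python. The word set (`+`, `-`, `*`, `dup`, `swap`, `drop`) and the names `run` and `BINARY_WORDS` are assumptions chosen for this example, not taken from any real Forth; error handling is left out on purpose.

```python
import operator

# Words that pop two values and push one result (hypothetical word set).
BINARY_WORDS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def run(source, stack=None):
    """Evaluate whitespace-separated words against a stack and return it."""
    stack = [] if stack is None else stack
    for word in source.split():  # "parsing" is just splitting on whitespace
        if word in BINARY_WORDS:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_WORDS[word](left, right))
        elif word == "dup":
            stack.append(stack[-1])
        elif word == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif word == "drop":
            stack.pop()
        else:
            stack.append(int(word))  # anything else must be an integer literal
    return stack
```

For example, `run("2 3 + 4 *")` returns `[20]`. The whole front end is the `source.split()` call, so everything you add from here on is semantic work.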

Naturally, that semantic work is the meat of a compilers course, and it is where most of the interesting (to me, anyway) content lies. This is the part that distinguishes, say, functional languages from object-oriented languages or concurrent languages and so on.

Any other general suggestions, pitfalls, tips?

Why are you writing another language? I don't mean this in a pejorative sense. Writing yet another generic C-like language (or Lisp or Python or whatever) probably won't be very satisfying, even if it's just for learning. My favorite learning-oriented language, SPL, had very specific design goals that resulted in a distinct language that was quite orthogonal to any other (although this goal was at odds with usefulness). Decide early what your language is for.

TokenMacGuy
Well thought out answer. Thanks!
Simucal