Preferred Languages: C/C++, Java, and Ruby

I am looking for some helpful books/tutorials on how to write your own compiler simply for educational purposes. I am most familiar with C/C++, Java, and Ruby so I prefer resources that involve one of those three, but any good resource is acceptable.

+4  A: 

Haven't read it but Writing a compiler in Ruby bottom up looks promising for you.

Peter Coulton
+319  A: 

Big List of Resources:

Michael Stum
It was one of my favorites. ;)
Vijesh VP
This post originally just had Dragon Book in it, but was modified to include the whole list. So Vijesh is referring to the Dragon Book.
Is the CPython source code something good to look at too? Or the Lua source?
I would add "Lisp In Small Pieces" by Christian Queinnec. Building a Lisp compiler (and environment) can be quite different from building a C or Pascal compiler. It would at least contribute to widen your understanding of the language implementation world. :-)
Great pointer to resources.
I've read the `Let's Build a Compiler` series; it is a really nice writeup and a good starting point.
Terence Parr (ANTLR creator) has also written a few books on using ANTLR - The Definitive ANTLR Reference: Building Domain-Specific Languages (Pragmatic Programmers) and Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers)
Mark Mullin
These are nice resources but when it's just a big list like this, I stop getting the Stack Overflow experience and feel I may as well just be reading Wikipedia.
+4  A: 

A while back I found this article on writing compilers which has links to two resources, Let's Build a Compiler and A Nanopass Framework for Compiler Education.

John Downey
+4  A: 

Here's an interesting paper: An Incremental Approach to Compiler Construction

Read it like a tutorial. It uses a subset of Scheme as the input language.

+10  A: 

"Let's Build a Compiler" is awesome, but it's a bit outdated. (I'm not saying it makes it even a little bit less valid)

I agree that this series is a bit outdated, although it is still useful. However, my biggest gripe with it is the fact that it tries to output straight to assembly language rather than building any type of parse tree, which means (contrary to what is stated in the first article) that it isn't very useful for writing an interpreter.
+24  A: 

I concur with the Dragon Book reference; IMO, it is the definitive guide to compiler construction. Get ready for some hardcore theory, though.

If you want a book that is lighter on theory, Game Scripting Mastery might be a better book for you. If you are a total newbie at compiler theory, it provides a gentler introduction. It doesn't cover more practical parsing methods (opting for non-predictive recursive descent without discussing LL or LR parsing), and as I recall, it doesn't even discuss any sort of optimization theory. Plus, instead of compiling to machine code, it compiles to a bytecode that is supposed to run on a VM that you also write.

It's still a decent read, particularly if you can pick it up for cheap on Amazon. If you only want an easy introduction into compilers, Game Scripting Mastery is not a bad way to go. If you want to go hardcore up front, then you should settle for nothing less than the Dragon Book.

Daniel F. Hanson
Game Scripting Mastery is a great learning resource because when you're done you will have a playable, scriptable 2D adventure game. This makes every exercise focused on a specific purpose, and keeps the reader motivated.
Dour High Arch
The Dragon Book is a bit too focused on grammar-based parsing. If you are not trying to parse something nearly impossible like C++ with parser generators, and can use e.g. a handcrafted LL grammar, you might want to look for something that devotes a higher percentage to compiler fields other than grammar transformation and proving.
Marco van de Voort
+13  A: 

If you're looking to use powerful, higher level tools rather than building everything yourself, going through the projects and readings for this course is a pretty good option. It's a languages course by the author of the Java parser engine ANTLR. You can get the book for the course as a PDF from the Pragmatic Programmers.

The course goes over the standard compiler stuff that you'd see elsewhere: parsing, types and type checking, polymorphism, symbol tables, and code generation. Pretty much the only thing that isn't covered is optimizations. The final project is a program that compiles a subset of C. Because you use tools like ANTLR and LLVM, it's feasible to write the entire compiler in a single day (I have an existence proof of this, though I do mean ~24 hours). It's heavy on practical engineering using modern tools, a bit lighter on theory.

LLVM, by the way, is simply fantastic. Many situations where you might normally compile down to assembly, you'd be much better off compiling to LLVM's Intermediate Representation instead. It's higher level, cross platform, and LLVM is quite good at generating optimized assembly from it.

Peter Burns
+2  A: 

The Parrot Foundation offers a 9-part tutorial on writing a compiler to target the Parrot Virtual Machine. The tutorial uses a simple Lua-like language, Squaak, but Parrot is flexible enough to handle modern OO languages as well.

Bruce Alderman
+6  A: 

The Dragon Book is definitely the "building compilers" book, but if your language isn't quite as complicated as the current generation of languages, you may want to look at the Interpreter pattern from Design Patterns.

The example in the book designs a regular expression-like language and is well thought through, but as they say in the book, it's good for thinking through the process but is effective really only on small languages. However, it is much faster to write an Interpreter for a small language with this pattern than having to learn about all the different types of parsers, yacc and lex, et cetera...
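
A minimal sketch of the Interpreter pattern in Python, not the book's own example: each grammar rule becomes a small class with an `interpret` method, and a sentence of the language is a tree of those objects. The boolean language and the class names (`Var`, `Not`, `And`, `Or`) are assumptions made up for illustration.

```python
from dataclasses import dataclass


class Expr:
    """Base class: every node knows how to interpret itself."""

    def interpret(self, env: dict) -> bool:
        raise NotImplementedError


@dataclass
class Var(Expr):
    name: str

    def interpret(self, env):
        # Look the variable up in the context passed down the tree.
        return env[self.name]


@dataclass
class Not(Expr):
    operand: Expr

    def interpret(self, env):
        return not self.operand.interpret(env)


@dataclass
class And(Expr):
    left: Expr
    right: Expr

    def interpret(self, env):
        return self.left.interpret(env) and self.right.interpret(env)


@dataclass
class Or(Expr):
    left: Expr
    right: Expr

    def interpret(self, env):
        return self.left.interpret(env) or self.right.interpret(env)


# (x and not y) or z, built by hand instead of parsed
tree = Or(And(Var("x"), Not(Var("y"))), Var("z"))
result = tree.interpret({"x": True, "y": False, "z": False})
print(result)
```

There is no parser here at all, which is why this approach is quick for small languages: the tree is assembled directly, and adding a rule means adding one class.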

Chris Bunch
+7  A: 

The MSDN article "Roll your own compiler in the .net framework" is a well written, concise and practical starting point.

Leon Bambrick
+19  A: 

I think Modern Compiler Implementation in ML is the best introductory compiler writing text. There's a Java version and a C version too, either of which might be more accessible given your languages background. The book packs a lot of useful basic material (scanning and parsing, semantic analysis, activation records, instruction selection, RISC and x86 native code generation) and various "advanced" topics (compiling OO and functional languages, polymorphism, garbage collection, optimization and static single assignment form) into relatively little space (~500 pages).

I prefer Modern Compiler Implementation to the Dragon Book because Modern Compiler Implementation surveys less of the field. Instead, it has really solid coverage of all the topics you would need to write a serious, decent compiler. After you work through this book you'll be ready to tackle research papers directly for more depth if you need it.

I must confess I have a serious soft spot for Niklaus Wirth's Compiler Construction. It is available online as a PDF. I find Wirth's programming aesthetic simply beautiful; however, some people find his style too minimal (for example, Wirth favors recursive descent parsers, but most CS courses focus on parser generator tools; Wirth's language designs are fairly conservative). Compiler Construction is a very succinct distillation of Wirth's basic ideas, so whether you like his style or not, I highly recommend reading this book.

Dominic Cooney
+5  A: 

Python comes bundled with a Python compiler written in Python. You can see the source code, and it includes all phases: parsing, building the abstract syntax tree, emitting code, etc. Hack it.
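
A short sketch of how those phases can be watched from inside Python. It assumes a current Python 3, where the standard `ast` and `dis` modules and the built-in `compile` expose the pipeline; that may not be the same package the answer had in mind, and the source string is made up for illustration.

```python
import ast
import dis

source = "answer = 6 * 7"

# Phase 1: parse source text into an abstract syntax tree.
tree = ast.parse(source)
print(ast.dump(tree))

# Phase 2: compile the tree into a code object (bytecode).
code = compile(tree, "<example>", "exec")

# Phase 3: inspect the generated bytecode instructions.
dis.dis(code)

# Running the code object shows the pipeline produced working code.
namespace = {}
exec(code, namespace)
print(namespace["answer"])
```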

+1  A: 

Check out this article: it profiles two papers on writing compilers.

+8  A: 

One book not yet suggested but very important is "Linkers and Loaders" by John Levine. If you're not using an external assembler, you'll need a way to output an object file that can be linked into your final program. Even if you're using an external assembler, you'll probably need to understand relocations and how the whole program loading process works to make a working tool. This book collects a lot of the random lore around this process for various systems, including Win32 and Linux.

Ben Combee
+2  A: 

Go to the Flipcode article archive and search for Implementing A Scripting Engine by Jan Niestadt, a nine-part series about writing a scripting engine, including a compiler and virtual machine.

Peter Stuifzand
+6  A: 

An easy way to create a compiler is to use bison and flex (or similar), build a tree (AST) and generate code in C, with generating C code being the most important step. By generating C code, your language will automatically work on all platforms that have a C compiler.

Generating C code is as easy as generating HTML (just use print, or equivalent), which in turn is much easier than writing a C parser or HTML parser.
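
A minimal sketch of that "just print it" style of code generation, written in Python for brevity. The nested-tuple tree format and the helper names `emit_expr` and `emit_program` are assumptions invented for this example; a real front end (for instance one built with bison and flex) would hand over its own AST.

```python
def emit_expr(node):
    """Turn a nested-tuple expression tree into C expression text."""
    if isinstance(node, (int, float)):
        return str(node)
    if isinstance(node, str):          # a variable name
        return node
    op, left, right = node             # e.g. ("+", 1, "x")
    return f"({emit_expr(left)} {op} {emit_expr(right)})"


def emit_program(assignments, result):
    """Emit a complete C program that evaluates and prints `result`."""
    lines = ["#include <stdio.h>", "", "int main(void) {"]
    for name, expr in assignments:
        lines.append(f"    double {name} = {emit_expr(expr)};")
    lines.append(f'    printf("%g\\n", {emit_expr(result)});')
    lines.append("    return 0;")
    lines.append("}")
    return "\n".join(lines)


c_source = emit_program(
    assignments=[("x", ("+", 1, 2)), ("y", ("*", "x", 4))],
    result=("-", "y", "x"),
)
print(c_source)
```

Fully parenthesizing every sub-expression keeps the generator free of precedence rules; the C compiler discards the redundant parentheses.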

Peter Stuifzand
+6  A: 

If you're willing to use LLVM, check this out: It teaches you how to write a compiler from scratch using LLVM's framework, and doesn't assume you have any knowledge about the subject.

The tutorial suggests you write your own parser and lexer etc., but I advise you to look into bison and flex once you get the idea. They make life so much easier.

+2  A: 

As a starting point, it will be good to create a recursive descent parser (RDP) (let's say you want to create your own flavour of BASIC and build a BASIC interpreter) to understand how to write a compiler. I found the best information in Herbert Schildt's C Power Users, chapter 7. This chapter refers to another book by H. Schildt, "C: The Complete Reference", where he explains how to create a calculator (a simple expression parser). I found both books on eBay very cheap. You can check the code for the book if you go to or check in. I found the same code, but for C#, in his latest book.
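
A minimal sketch of such a calculator as a recursive descent parser. This is not Schildt's code (which is in C); it is a Python illustration with one method per grammar rule, and the names `tokenize`, `Parser` and `evaluate` are assumptions chosen for this example.

```python
import re

TOKEN = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    # Grammar (one method per rule):
    #   expr   := term (('+' | '-') term)*
    #   term   := factor (('*' | '/') factor)*
    #   factor := NUMBER | '(' expr ')' | '-' factor
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.advance()
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token == "-":
            return -self.factor()
        if token is None or token in "+*/)":
            raise SyntaxError(f"unexpected token {token!r}")
        return float(token)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return value


print(evaluate("2 + 3 * (4 - 1)"))
```

Operator precedence falls out of the call structure: `expr` calls `term`, which calls `factor`, so multiplication binds tighter than addition without any precedence table.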

+7  A: 

"... Let's Build a Compiler ..."

I'd second the suggestion by @sasb. Forget buying more books for the moment.

Why? Tools & language.

The language required is Pascal and, if I remember correctly, it is based on Turbo Pascal. It just so happens that if you go to and download the Pascal compiler, all the examples work straight from the page. The beauty of Free Pascal is that you can use it on almost whatever processor or OS you care for.

Once you have mastered the lessons, then try the more advanced "Dragon Book".

+2  A: 

If you want to use Ruby, look at Treetop; if you want to use Java, look at ANTLR. Both are powerful libraries that make it easier and quicker to build parsers for your language.

+4  A: 

I liked the Crenshaw tutorial too, because it makes it absolutely clear that a compiler is just another program that reads some input and writes some output.

Read it.

Work it if you want, but then look at another reference on how bigger and more complete compilers are really written.

And read On Trusting Trust, to get a clue about the unobvious things that can be done in this domain.

+1  A: 

You might be interested in this ONLamp article where Dan Sugalski describes how he built a compiler to add modern features to a 1980s legacy programming language still used by his employer.

Bruce Alderman
+6  A: 

Another important chunk of knowledge can be found in this free PDF (the newest 2008 edition is non-free)

Parsing Techniques - A Practical Guide

[update] Another nice free resource to introduce you to compiler construction

Compiler Basics

+7  A: 

The LCC compiler (wikipedia) (project homepage) of Fraser and Hanson is described in their book "A Retargetable C Compiler: Design and Implementation". It is quite readable and explains the whole compiler, down to code generation.

+2  A: 

FWIW at the bottom of this page there is a link to a "C Like" interpreter written in C/C++ and using the lex and yacc tools. I think the C++ version has been updated to build using Microsoft Visual Studio.

NOTE: This was my first and last attempt at writing an interpreter so don't expect too much.

+4  A: 

There are a lot of good answers here, so I thought I'd just add one more to the list:

I got a book called Project Oberon more than a decade ago, which has some very well written text on the compiler. The book really stands out in the sense that the source and explanations are very hands-on and readable. The complete text (the 2005 edition) has been made available as a PDF, so you can download it right now. The compiler is discussed in chapter 12:

Niklaus Wirth, Jürg Gutknecht

(The treatment is not as extensive as his book on compilers)

I've read several books on compilers, and I can second the Dragon Book; time spent on this book is very worthwhile.

+3  A: 

If you are interested in writing a compiler for a functional language (rather than a procedural one), Simon Peyton Jones and David Lester's "Implementing functional languages: a tutorial" is an excellent guide.

The conceptual basics of how functional evaluation works are guided by examples in a simple but powerful functional language called "Core". Additionally, each part of the Core language compiler is explained with code examples in Miranda (a pure functional language very similar to Haskell).

Several different types of compilers are described but even if you only follow the so-called template compiler for Core you will have an excellent understanding of what makes functional programming tick.

Mark Reid
+2  A: 

The Dragon Book is too complicated, so ignore it as a starting point. It is good and makes you think a lot once you already have a starting point, but for starters, perhaps you should simply try to write a math/logical expression evaluator using RD, LL or LR parsing techniques with everything (lexing/parsing) written by hand in perhaps C/Java. This is interesting in itself and gives you an idea of the problems involved in a compiler. Then you can jump into your own DSL using some scripting language (since processing text is usually easier in these) and, like someone said, generate code in either the scripting language itself or C. You should probably use flex/bison/ANTLR etc. to do the lexing/parsing if you are going to do it in C/Java.

I wouldn't say "too complicated", I would say "badly written".
+5  A: 

You should check out Darius Bacon's "ichbins", which is a compiler for a small Lisp dialect, targeting C, in just over 6 pages of code. The advantage it has over most toy compilers is that the language is complete enough that the compiler is written in it. (The tarball also includes an interpreter to bootstrap the thing.)

There's more stuff about what I found useful in learning to write a compiler on my Ur-Scheme web page.

+2  A: 

I asked the same question of a friend of mine, and he pointed me to The Structure and Interpretation of Computer Programs. Any thoughts on this? I'm looking for a nice next step after working through a data structures and algorithms book.

This is a useful book to start thinking about how programs are evaluated by compilers, but it doesn't get into things like lexing, parsing, intermediate representations, or code generation.
Jay Conrod
+6  A: 


I am looking into the same concept, and found this promising article by Joel Pobar,

Create a Language Compiler for the .NET Framework

He discusses the high-level concept of a compiler and proceeds to invent his own language for the .NET Framework. Although it's aimed at the .NET Framework, many of the concepts should be reproducible elsewhere. The article covers:

  1. Language definition
  2. Scanner
  3. Parser (the bit I'm mainly interested in)
  4. Targeting the .NET Framework
  5. The Code Generator

There are other topics, but you get the gist.

It's aimed at people starting out, and written in C# (not quite Java).



What does "not quite Java" mean?
Haha, sorry, I meant it's written for .NET, which in principle is similar to Java. Both are JIT in style. :)
+2  A: 

I have written an online tutorial on compiler design, titled "Let's build a scripting Engine-Compiler", as well as a native code compiler called Bxbasm. The online docs are at:

The docs, support files and compiler, in zip form, are at:


Steve A.

+1  A: 

If you are like me, with no formal computer science education, and are interested in building a compiler or want to know how one works:

I recommend "Programming Language Processors in Java: Compilers and Interpreters", an amazing book for the self-taught programmer.

From my point of view, understanding basic language theory, automata, and set theory is not a big problem; the problem is how to turn those things into code. The above book tells you how to write a parser, analyze context, and generate code. If you cannot understand this book, then I have to say, give up on building a compiler. It is the best programming book I have ever read.

There is another book that is also good, Compiler Design in C. It has lots of code and tells you everything about how to build a compiler and lexer tools.

Building a compiler is a fun programming exercise and can teach you heaps of programming skills.

Do not buy the Dragon Book; it is a waste of money and time, and not for practitioners.

+6  A: 

Sorry it is in Spanish, but this is the bibliography of a course called "Compiladores e Intérpretes" (Compilers and Interpreters) in Argentina.

The course went from formal language theory to compiler construction, and those are the topics you need to build, at least, a simple compiler.

  • Compiler Design in C.
    Allen I. Holub.
    Prentice-Hall. 1990.

  • Compiladores. Teoría y Construcción.
    Sanchís Llorca, F.J., Galán Pascual, C.
    Editorial Paraninfo. 1988.

  • Compiler Construction.
    Niklaus Wirth.
    Addison-Wesley. 1996.

  • Lenguajes, Gramáticas y Autómatas. Un enfoque práctico.
    Pedro Isasi Viñuela, Paloma Martínez Fernández, Daniel Borrajo Millán.
    Addison-Wesley Iberoamericana (España). 1997.

  • The art of compiler design. Theory and practice.
    Thomas Pittman, James Peters.
    Prentice-Hall. 1992.

  • Object-Oriented Compiler Construction.
    Jim Holmes.
    Prentice Hall, Englewood Cliffs, N.J. 1995.

  • Compiladores. Conceptos Fundamentales.
    B. Teufel, S. Schmidt, T. Teufel.
    Addison-Wesley Iberoamericana. 1995.

  • Introduction to Automata Theory, Languages, and Computation.
    John E. Hopcroft, Jeffrey D. Ullman.
    Addison-Wesley. 1979.

  • Introduction to formal languages.
    György E. Révész.
    McGraw-Hill. 1983.

  • Parsing Techniques. A Practical Guide.
    Dick Grune, Ceriel Jacobs.
    Printed by the authors. 1995.

  • Yacc: Yet Another Compiler-Compiler.
    Stephen C. Johnson.
    Computing Science Technical Report Nº 32, 1975. Bell Laboratories. Murray Hill, New Jersey.

  • Lex: A Lexical Analyzer Generator.
    M. E. Lesk, E. Schmidt.
    Computing Science Technical Report Nº 39, 1975. Bell Laboratories. Murray Hill, New Jersey.

  • lex & yacc.
    John R. Levine, Tony Mason, Doug Brown.
    O’Reilly & Associates. 1995.

  • Elements of the theory of computation.
    Harry R. Lewis, Christos H. Papadimitriou.
    Second edition. Prentice Hall. 1998.

  • Un Algoritmo Eficiente para la Construcción del Grafo de Dependencia de Control.
    Salvador V. Cavadini.
    Final degree project for the title of Computer Engineer.
    Facultad de Matemática Aplicada. U.C.S.E. 2001.

+1  A: 

A PDF version of Crenshaw's tutorial (see first post, maybe it can be added there):

Marco van de Voort
+5  A: 
  1. This is a vast subject. Do not underestimate this point. And do not underestimate my point to not underestimate it.
  2. I hear the Dragon Book is a (the?) place to start. Edit: along with searching. :) Get better at searching, eventually it will be your life.
  3. Building your own programming language is absolutely a good exercise! But know that it will never be used for any practical purpose in the end. Exceptions to this are few and very far between.
If you haven't read the Dragon Book, please don't recommend it. In fact, have you ever implemented a compiler?
Yeah, as the name implies, the Dragon Book is a monster. Very in-depth, but a very good resource nonetheless. I wouldn't recommend it for beginners, though...
Zachary Murray
I wouldn't recommend it for anyone.
@Neil: You haven't google'd me, have you? lol. But no, I haven't read that book.
I'm reading it (the dragon book) presently, and also Lex/Yacc at the same time, I'm finding the book quite good. Personally.
Simeon Pilgrim
If you like it all well and good. My problem is with people that blindly recommend it whenever the word "compiler" is mentioned. Particularly if they haven't actually read it!
Neil, what do you find bad about the book? I haven't read it yet, I just keep hearing it's a good book, until now.
To be fair, I prefaced it with "I hear...". :) #1 and #3 are the points I feel are extremely important to know going in but aren't mentioned as often.
It's still worth reading the Dragon Book even if you disagree with its approach. Compiler design is a very sticky subject and it's important to understand all the strange issues one has to contend with.
+10  A: 

You might want to look into Lex/Yacc (or Flex/Bison, whatever you want to call them). Flex generates a lexical analyzer, which scans the input and identifies the lexical components ("tokens") of your language, and Bison generates a parser in which you define what happens when each grammar rule is matched. This could be, but is definitely not limited to, printing out C code, for a compiler that would compile to C, or dynamically running the instructions.
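
To give a feel for what the generated lexical analyzer does, here is a rough Python sketch using only the standard `re` module, not Flex itself. The token names and the rule table are assumptions invented for this example; in Flex the same idea is written as pattern/action pairs in a `.l` file.

```python
import re

# Each rule pairs a token name with a regular expression,
# much like the rules section of a Flex specification.
RULES = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def tokenize(text):
    """Yield (kind, lexeme) pairs, skipping whitespace."""
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        yield kind, match.group()


tokens = list(tokenize("total = price * (qty + 2)"))
print(tokens)
```

The parser then consumes this token stream instead of raw characters, which is the division of labour between Flex and Bison.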

This FAQ should help you, and this tutorial looks quite useful.

Zachary Murray
+38  A: 

This is a pretty vague question, I think, just because of the depth of the topic involved. A compiler can be decomposed into two separate parts, however: a top half and a bottom half. The top half generally takes the source language and converts it into an intermediate representation, and the bottom half takes care of the platform-specific code generation.

Nonetheless, one idea for an easy way to approach this topic (the one we used in my compilers class, at least) is to build the compiler in the two pieces described above. Specifically, you'll get a good idea of the entire process by just building the top-half.

Just doing the top half lets you get the experience of writing the lexical analyzer and the parser and go to generating some "code" (that intermediate representation I mentioned). So it will take your source program and convert it to another representation and do some optimization (if you want), which is the heart of a compiler. The bottom half will then take that intermediate representation and generate the bytes needed to run the program on a specific architecture. For example, the bottom half will take your intermediate representation and generate a PE executable.
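
As a rough illustration of what the top half hands to the bottom half, here is a Python sketch that lowers an expression tree into three-address code, one common style of intermediate representation. The nested-tuple tree format, the temporary names `t1`, `t2`, ... and the functions `lower` and `to_ir` are all assumptions made up for this example.

```python
import itertools


def lower(node, code, temps):
    """Append three-address instructions for `node`; return its result name."""
    if not isinstance(node, tuple):    # leaf: constant or variable name
        return str(node)
    op, left, right = node
    left_name = lower(left, code, temps)
    right_name = lower(right, code, temps)
    target = f"t{next(temps)}"         # fresh temporary for this result
    code.append((target, op, left_name, right_name))
    return target


def to_ir(tree):
    code = []
    result = lower(tree, code, itertools.count(1))
    return code, result


# Front-end output for: a + b * (c - 2)
tree = ("+", "a", ("*", "b", ("-", "c", 2)))
ir, result = to_ir(tree)
for target, op, lhs, rhs in ir:
    print(f"{target} = {lhs} {op} {rhs}")
```

Each instruction has at most one operator, so a bottom half can map it almost one-to-one onto machine instructions for whichever architecture it targets.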

One book on this topic that I found particularly helpful was Compilers: Principles, Techniques, and Tools (or the Dragon Book, due to the cute dragon on the cover). It's got some great theory and definitely covers context-free grammars in a really accessible manner. Also, for building the lexical analyzer and parser, you'll probably use the *nix tools lex and yacc. And uninterestingly enough, the book called "lex and yacc" picked up where the Dragon Book left off for this part.

+2  A: 
  • Start by making sure you can answer most of the questions tagged c++ here on StackOverflow.
  • After that you should make sure you understand how other compilers work and understand [parts of] their source code.
  • You'll notice you need assembler and will start learning assembler until you can answer many questions with that tag.
  • If you've come this far, you'll find that several years have passed and realize how big such a project is and possibly smile at your own question from back then (if this page still exists at that time) ...
Not to be rude but, it sounds like you probably haven't written a simple compiler.
+2  A: 

I'm surprised it hasn't been mentioned, but Donald Knuth's The Art of Computer Programming was originally penned as a sort of tutorial on compiler writing.

Of course, Dr. Knuth's propensity for going in-depth on topics has led to the compiler-writing tutorial being expanded to a planned seven volumes, only three of which have actually been published. It's a rather complete exposition on programming topics, and covers everything you would ever need to know about writing a compiler, in minute detail.

+2  A: 

Whenever I want to try out a new language idea, I just write a simple parser, and have it generate some language that's easy to get good compilers for, like C.

How do you think C++ was done?

Mike Dunlavey
+5  A: 

Generally speaking, there's no five-minute tutorial for compilers, because it's a complicated topic and writing a compiler can take months. You will have to do your own search.

Python and Ruby are usually interpreted. Perhaps you want to start with an interpreter as well. It's generally easier.

The first step is to write a formal language description, the grammar of your programming language. Then you have to transform the source code that you want to compile or interpret, according to the grammar, into an abstract syntax tree, an internal form of the source code that the computer understands and can operate on. This step is usually called parsing, and the software that parses the source code is called a parser. Often the parser is generated by a parser generator, which transforms a formal grammar into source or machine code. For a good, non-mathematical explanation of parsing I recommend Parsing Techniques - A Practical Guide. Wikipedia has a comparison of parser generators from which you can choose the one that is suitable for you. Depending on the parser generator you choose, you will find tutorials on the Internet, and for really popular parser generators (like GNU bison) there are also books.

Writing a parser for your language can be really hard, but this depends on your grammar. So I suggest keeping your grammar simple (unlike C++); a good example of this is LISP.

In the second step the abstract syntax tree is transformed from a tree structure into a linear intermediate representation. As a good example for this Lua's bytecode is often cited. But the intermediate representation really depends on your language.
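
A minimal sketch of that tree-to-linear step in Python. The instruction names (`PUSH`, `LOAD`, `ADD`, ...) and the nested-tuple tree format are assumptions invented for this example, not Lua's actual bytecode.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def flatten(node, out=None):
    """Post-order walk: emit operands first, then the operator."""
    if out is None:
        out = []
    if isinstance(node, tuple):
        op, left, right = node
        flatten(left, out)
        flatten(right, out)
        out.append((OPCODES[op],))
    elif isinstance(node, str):        # variable reference
        out.append(("LOAD", node))
    else:                              # numeric constant
        out.append(("PUSH", node))
    return out


# Tree for: (x + 1) * 3
bytecode = flatten(("*", ("+", "x", 1), 3))
for instruction in bytecode:
    print(*instruction)
```

The post-order walk is what removes the nesting: by the time an operator is emitted, the code for both of its operands is already in the list.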

If you are building an interpreter, you will simply have to interpret the intermediate representation. You could also just-in-time-compile it. I recommend LLVM and libjit for just-in-time-compilation. To make the language usable you will also have to include some input and output functions and perhaps a small standard library.
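
Interpreting a linear intermediate representation can be as small as a loop over instructions with an operand stack. The sketch below is in Python and uses an instruction format made up for this example (tuples such as `("PUSH", 1)`); `PRINT` stands in for the kind of output function mentioned above.

```python
import operator

BINARY_OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


def run(program, variables=None):
    """Execute a list of (opcode, *args) tuples on an operand stack."""
    variables = dict(variables or {})
    stack = []
    for opcode, *args in program:
        if opcode == "PUSH":
            stack.append(args[0])
        elif opcode == "LOAD":
            stack.append(variables[args[0]])
        elif opcode == "STORE":
            variables[args[0]] = stack.pop()
        elif opcode == "PRINT":        # stand-in for an output function
            print(stack.pop())
        elif opcode in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[opcode](left, right))
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return variables


# y = (x + 1) * 3; print y
program = [
    ("LOAD", "x"), ("PUSH", 1), ("ADD",),
    ("PUSH", 3), ("MUL",),
    ("STORE", "y"),
    ("LOAD", "y"), ("PRINT",),
]
final = run(program, {"x": 4})
```

This version has no jumps, so a plain `for` loop is enough; adding branches would mean tracking an explicit program counter instead.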

If you are going to compile the language, it will be more complicated. You will have to write backends for different computer architectures and generate machine code from the intermediate representation in those backends. I recommend LLVM for this task.

There are a few books on this topic, but I can recommend none of them for general use. Most of them are too academic or too practical. There's no "Teach yourself compiler writing in 21 days" and thus, you will have to buy several books to get a good understanding of this entire topic. If you search the Internet, you will come across some online books and lecture notes. Maybe there's a university library near you where you can borrow books on compilers.

I also recommend a good background knowledge in theoretical computer science and graph theory, if you are going to make your project serious. A degree in computer science will also be helpful.

++ You're right that it's good to know all those things, and it can be a big job, but I also learned from some experts how _not_ to make things a big deal. It's good to know things, and it's even better to know when not to use them, which is most of the time.
Mike Dunlavey
+1  A: 

In order to get a deeper understanding of parsing I recommend reading Parsing Techniques - A Practical Guide and a good book on theoretical computer science.

Already included in the Big List of Resources
Yes, but not the second edition, which is really better and up-to-date. However, as a parser author you may not need this new information.
+4  A: 

I remember asking this question about seven years ago when I was rather new to programming. I was very careful when I asked, and surprisingly I didn't get as much criticism as you are getting here. They did, however, point me in the direction of the "Dragon Book", which is, in my opinion, a really great book that explains everything you need to know to write a compiler. (You will of course have to master a language or two. The more languages you know, the merrier.)

And yes, many people say reading that book is crazy and you won't learn anything from it, but I disagree completely with that.

Many people also say that writing compilers is stupid and pointless. Well, there are a number of reasons why compiler development is useful:

  • Because it's fun.
  • It's educational: when learning how to write compilers you will learn a lot about computer science and other techniques that are useful when writing other applications.
  • If nobody wrote compilers, the existing languages wouldn't get any better.

I didn't write my own compiler right away, but after asking I knew where to start. And now, after learning many different languages and reading the Dragon Book, writing isn't that much of a problem. (I'm also studying Computer Engineering atm, but most of what I know about programming is self-taught.)

In conclusion: the Dragon Book is a great "tutorial", but spend some time mastering a language or two before attempting to write a compiler. Don't expect to be a compiler guru within the next decade or so, though.

Edit: The book is also good if you want to learn how to write parsers/interpreters.

+2  A: 

Not a book, but a technical paper and an enormously fun learning experience if you want to know more about compilers (and metacompilers)... this website walks you through building a completely self-contained compiler system that can compile itself and other languages:

This is all based on an amazing little 10-page technical paper:

Val Schorre META II: A Syntax-Oriented Compiler Writing Language

from honest-to-god 1964. I learned how to build compilers from this back in 1970. There's a mind-blowing moment when you finally grok how the compiler can regenerate itself....

I know the website author from my college days, but have nothing to do with the website.

Ira Baxter
+3  A: 

ANTLR isn't in here? Wow! Don't forget the book.

Parr has a new book as well named Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages.
Taylor Leese
+3  A: 

I've created a video tutorial for ANTLR 3.x at

It'll eventually cover creating the entire compiler; I just finished the recognizer section.

Enjoy! -- Scott

Scott Stanchfield
+1  A: 

Take a look at the book below. The author is the creator of ANTLR.

Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages.

Taylor Leese
+5  A: 

From the comp.compilers FAQ:

"Programming a Personal Computer" by Per Brinch Hansen Prentice-Hall 1982 ISBN 0-13-730283-5

This unfortunately-titled book explains the design and creation of a single-user programming environment for micros, using a Pascal-like language called Edison. The author presents all source code and explanations for the step-by-step implementation of an Edison compiler and simple supporting operating system, all written in Edison itself (except for a small supporting kernel written in a symbolic assembler for PDP 11/23; the complete source can also be ordered for the IBM PC).

The most interesting things about this book are: 1) its ability to demonstrate how to create a complete, self-contained, self-maintaining, useful compiler and operating system, and 2) the interesting discussion of language design and specification problems and trade-offs in Chapter 2.

"Brinch Hansen on Pascal Compilers" by Per Brinch Hansen Prentice-Hall 1985 ISBN 0-13-083098-4

Another light-on-theory heavy-on-pragmatics here's-how-to-code-it book. The author presents the design, implementation, and complete source code for a compiler and p-code interpreter for Pascal- (Pascal "minus"), a Pascal subset with boolean and integer types (but no characters, reals, subranged or enumerated types), constant and variable definitions and array and record types (but no packed, variant, set, pointer, nameless, renamed, or file types), expressions, assignment statements, nested procedure definitions with value and variable parameters, if statements, while statements, and begin-end blocks (but no function definitions, procedural parameters, goto statements and labels, case statements, repeat statements, for statements, and with statements).

The compiler and interpreter are written in Pascal* (Pascal "star"), a Pascal subset extended with some Edison-style features for creating software development systems. A Pascal* compiler for the IBM PC is sold by the author, but it's easy to port the book's Pascal- compiler to any convenient Pascal platform.

This book makes the design and implementation of a compiler look easy. I particularly like the way the author is concerned with quality, reliability, and testing. The compiler and interpreter can easily be used as the basis for a more involved language or compiler project, especially if you're pressed to quickly get something up and running.

joe snyder
+3  A: 

Missing from the list: Garbage Collection: Algorithms for Automatic Dynamic Memory Management, by Jones and Lins.

(Assuming you're writing the compiler and runtime system, and that you're implementing a garbage-collected language.)

+2  A: 

I found the Dragon book much too hard to read with too much focus on language theory that is not really required to write a compiler in practice.

I would add the Oberon book, which contains the full source of an amazingly fast and simple Oberon compiler: Project Oberon

+8  A: 

If you have little time, I recommend Niklaus Wirth's "Compiler Construction" (Addison-Wesley. 1996), a tiny little booklet that you can read in a day, but it explains the basics (including how to implement lexers, recursive descent parsers, and your own stack-based virtual machines). After that, if you want a deep dive, there's no way around the Dragon book as other commenters suggest.

Dr. Jochen L. Leidner
+1  A: 

I've heard good things about create your own programming lang; check it out.

I haven't personally read it yet, but have a look if you can get your hands on it.

$39.99 is pretty expensive for a book of 53 pages; I bought a graphics programming book that has about 1000 pages for double that price.
+3  A: 

I can't believe that list doesn't mention what is, IMHO, the most pragmatic book on compiler creation:

Crafting a compiler with C

It is a bit outdated, but it is complete and self-sufficient for starting to code a small compiler.