
I've been toying with the idea of building a new general purpose programming language lately, and I was wondering where to go for help. Does anyone have a favorite book? Tutorial? Tools? I see the primary benefit of the project being that I will learn more about language design. I have little hope that it will become the next Ruby or whatever. Still, I'd like to put real thought into it and understand the decisions I'm making.

I'm not looking for advice on the language itself here, although I may ask particular questions along the way.

Edit:

I hesitated to provide specifics at first, because I didn't want to bias the answers. I have had some experience with ANTLR in college, but that was a while back, and I didn't do so well in my compilers course. I still can't remember the difference between LALR and, um, the opposite of LALR.

That said, I'm thinking of targeting the Java VM because it has vast deployment in Enterprise America (which is where I'm likely to spend a good portion of my career), and it's very well supported. I may look at the Parrot VM as well.

A: 

A good starting point might be DSL Tools.

Will
+2  A: 

You can define a syntax using Backus-Naur Form (BNF).
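To make that concrete, here is a small arithmetic grammar written in BNF (as comments), followed by a hand-written recursive-descent parser in Python whose methods mirror the rules one-to-one. The grammar, the tuple-based tree, and names such as `Parser` and `parse_expr` are illustrative assumptions, not part of the answer. The left-recursive rules are implemented as loops, since a naive recursive-descent parser would recurse forever on them.

```python
import re

# Hypothetical grammar in BNF:
#   <expr>   ::= <term> | <expr> "+" <term> | <expr> "-" <term>
#   <term>   ::= <factor> | <term> "*" <factor> | <term> "/" <factor>
#   <factor> ::= <number> | "(" <expr> ")"


def tokenize(text):
    # A run of digits is one token; every other non-space character is its own token.
    return re.findall(r"\d+|\S", text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse(self):
        tree = self.parse_expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    # <expr>: the left recursion becomes a loop, keeping + and - left-associative.
    def parse_expr(self):
        node = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.parse_term())
        return node

    # <term>: same shape as <expr>, one precedence level tighter.
    def parse_term(self):
        node = self.parse_factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.parse_factor())
        return node

    # <factor>: a number or a parenthesized <expr>.
    def parse_factor(self):
        tok = self.advance()
        if tok.isdecimal():
            return ("num", int(tok))
        if tok == "(":
            node = self.parse_expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")
```

For example, `Parser("1 + 2 * 3").parse()` returns `("+", ("num", 1), ("*", ("num", 2), ("num", 3)))`: `*` binds tighter simply because `<term>` sits below `<expr>` in the grammar.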

I've found the book Brinch Hansen on Pascal Compilers (although the title says Pascal, the advice can be generalized to other languages) to be both readable and informative in explaining semantic considerations. These are rules to guarantee that your grammar is understandable to a compiler. I don't know if this book is still in print, but you can probably find a used copy somewhere.

The Amber Scripting Language website has lots of details on the design of the Amber language, and provides a good example to study.

You might want to look into compiler tools like Lex and Yacc to build a compiler for your language.
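Lex turns an ordered list of regular-expression rules into a scanner; by convention it takes the longest match at each position and, on a tie, the rule listed first. The Python sketch below imitates only that matching policy so you can see what a generated lexer does. It is not Lex's interface: real Lex compiles the rules into a single finite automaton instead of trying each pattern in turn, and the token names and patterns here are hypothetical.

```python
import re

# Hypothetical token rules in priority order; on equal-length matches the
# earlier rule wins, which is how the keyword "if" beats IDENT.
RULES = [
    ("IF", r"if"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"\d+"),
    ("OP", r"==|[=+\-*/<>(){};]"),
    ("SKIP", r"\s+"),  # whitespace is matched but not emitted
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def lex(text):
    """Split text into (token_type, lexeme) pairs: longest match first, then rule order."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_lexeme = None, ""
        for name, regex in COMPILED:
            m = regex.match(text, pos)
            # Strictly longer only, so an earlier rule keeps a tie.
            if m and len(m.group()) > len(best_lexeme):
                best_name, best_lexeme = name, m.group()
        if best_name is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if best_name != "SKIP":
            tokens.append((best_name, best_lexeme))
        pos += len(best_lexeme)
    return tokens
```

`lex("if iffy == 42")` returns `[("IF", "if"), ("IDENT", "iffy"), ("OP", "=="), ("NUMBER", "42")]`: the tie on `if` goes to the earlier `IF` rule, while `iffy` is claimed by `IDENT` because its match is longer. A Yacc-generated parser would then consume a token stream like this one.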

Finally, you might take a look at the Parrot Virtual Machine and the languages others have built for it.

These should at least give you some ideas on the way to go.

Bruce Alderman
+1  A: 

I like ANTLRWorks for prototyping and debugging grammars when working on languages with Java.

John the Statistician
+1  A: 

If you're completely new to compilers, I would strongly suggest that you get a book. For learning an entirely new subject, I almost always find books to be better than tutorials. The material is often better presented in books, and it definitely gets better coverage. A tutorial just can't compete with 400 dedicated pages. A good book will have exercises to help you learn the material as well.

I recommend Compiler Construction: Principles and Practice by Louden. It's well written, and has a very practical slant. It's less about theory and more about getting stuff built. There are also several free books online dedicated to compilers, including Wirth's Compiler Construction.

Edit: I have no idea why Markdown is mangling my links. They look fine in the edit preview. I've filed a bug report.

Derek Park
+8  A: 

I've lent out my hardcopy of Compiler Construction (free PDF!) by Niklaus Wirth — the creator of Pascal — several times as an "introduction to building a compiler" book. It's not the most comprehensive book on creating compilers, but it's a short, concise read and will get you started quickly.

Also, do not write your own JIT and consider very carefully whether you should even create your own virtual machine. LLVM, the Low-Level Virtual Machine, is a very well-designed Open Source framework with an extremely liberal license, and it's specifically for doing code generation and optimization for a wide variety of target CPUs given input in an abstract RISC instruction set.

Essentially, with LLVM, you generate LLVM "bitcode" and ask LLVM to perform optimizations and code generation for your target CPU based on that. The bitcode is itself a very efficient representation, and it has a property (static single assignment, or SSA, form) that makes it very straightforward to optimize well. The effort you would spend figuring out how to implement your own interpreter or virtual machine would almost certainly be better spent learning how to use LLVM for your code generation; after all, if you do that, you'll get both static and dynamic native code generation for a wide variety of platforms (including exotic ones like Cell).
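Static single assignment means every variable is assigned exactly once, so each use points at exactly one definition. For straight-line code with no branches, getting there is just renaming, as this Python sketch shows. The three-address tuple format and the `name_N` naming scheme are assumptions made for illustration; real SSA construction (LLVM's included) also inserts phi nodes where control-flow paths merge, which this sketch deliberately leaves out.

```python
def to_ssa(statements):
    """Rename straight-line three-address code so each variable is assigned once.

    Each statement is a tuple (target, op, left, right); operands are variable
    names (str) or integer constants. A variable read before any assignment is
    treated as an input and gets version 0.
    """
    version = {}  # latest version number of each source-level variable
    renamed = []

    def use(operand):
        # A read refers to the most recent definition of that variable.
        if isinstance(operand, str):
            return f"{operand}_{version.get(operand, 0)}"
        return operand

    for target, op, left, right in statements:
        # Rename operands *before* bumping the target's version, so that
        # "x = x * 2" reads the old x and defines a new one.
        new_left, new_right = use(left), use(right)
        version[target] = version.get(target, 0) + 1
        renamed.append((f"{target}_{version[target]}", op, new_left, new_right))
    return renamed
```

Given `x = a + 1; x = x * 2; y = x + a`, the result is `x_1 = a_0 + 1; x_2 = x_1 * 2; y_1 = x_2 + a_0`. Because `x_1` and `x_2` now have one definition each, a pass such as constant propagation can look up what a name holds without tracking which assignment reached it.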

Chris Hanson
A: 

There's Jack Crenshaw's Let's Build a Compiler.

I have other links at http://delicious.com/marxidad/langdev

Mark Cidade
+2  A: 

I have not read it, but many people tout the Dragon Book as THE book for compilers.

That said, I have also heard that it is far too much material to start out with, and that you should begin with something much simpler. Peruse a bit of Let's Build a Compiler and try some things out.

In fact, here is a good resource. It includes Let's Build a Compiler, linked above, as well as another interesting article called A Nanopass Framework for Compiler Education.

That brings me to A Nanopass Framework for Compiler Education [PDF] by Sarkar, Waddell, and Dybvig. The details of this paper aren't quite as important as the general concept: a compiler is nothing more than a series of transformations of the internal representation of a program. The authors promote using dozens or hundreds of compiler passes, each being as simple as possible. Don't combine transformations; keep them separate. The framework mentioned in the title is a way of specifying the inputs and outputs for each pass. The code is in Scheme, which is dynamically typed, so data is validated at runtime.
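A minimal illustration of that idea in Python, assuming a tiny expression language made of nested tuples: each pass does exactly one job, and the compiler is just the passes run in order. This is not the framework from the paper (which is in Scheme and formally declares each pass's input and output language); the passes, node shapes, and stack-machine instructions are hypothetical, and the `assert` in `fold_constants` stands in for the runtime check that its input no longer contains the construct the previous pass removed.

```python
import operator

# Hypothetical source language: ("num", n), ("var", name), ("neg", e),
# and binary nodes (op, left, right) with op in "+", "-", "*".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def desugar_neg(node):
    """Pass 1: rewrite unary negation as subtraction from zero."""
    kind = node[0]
    if kind in ("num", "var"):
        return node
    if kind == "neg":
        return ("-", ("num", 0), desugar_neg(node[1]))
    op, left, right = node
    return (op, desugar_neg(left), desugar_neg(right))


def fold_constants(node):
    """Pass 2: replace operators whose operands are both constants with the result."""
    kind = node[0]
    # Runtime check of this pass's input language: pass 1 must have removed "neg".
    assert kind != "neg", "fold_constants expects input without 'neg' nodes"
    if kind in ("num", "var"):
        return node
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if left[0] == "num" and right[0] == "num":
        return ("num", OPS[op](left[1], right[1]))
    return (op, left, right)


def to_stack_code(node):
    """Pass 3: flatten the tree into postfix instructions for a stack machine."""
    kind = node[0]
    if kind == "num":
        return [("push", node[1])]
    if kind == "var":
        return [("load", node[1])]
    op, left, right = node
    return to_stack_code(left) + to_stack_code(right) + [("apply", op)]


PIPELINE = [desugar_neg, fold_constants, to_stack_code]


def compile_expr(tree):
    """Run each pass in order; every pass's output is the next pass's input."""
    for compiler_pass in PIPELINE:
        tree = compiler_pass(tree)
    return tree
```

For `x + -(2 * 3)`, written as `("+", ("var", "x"), ("neg", ("*", ("num", 2), ("num", 3))))`, the pipeline yields `[("load", "x"), ("push", -6), ("apply", "+")]`. Adding an optimization means adding one more small function to `PIPELINE` rather than editing the existing ones.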

mk
The "Dragon Book" is THE Book for starting out when you know nothing. Once you have the basics there are many other books.
Martin York
The Dragon Book has no information whatsoever on language design. The Dragon Book is about compiler construction. Two different things.
Markus Schnell
+6  A: 

Books:

  • If you only had time to read one language design / implementation book, I'd recommend reading Programming Language Pragmatics for both inspiration and information.

  • If you really want to get serious with ANTLR (and it's not a bad choice), you should pick up The Definitive ANTLR Reference. The PDF version costs just $24 and is far superior to the documentation that can be found online.

  • For an entertaining and educational story about designing a real world language, I heartily recommend The Design and Evolution of C++. At least if statically typed, compiled-to-native-code languages are your cup of tea.

As for language design, I've found the following articles inspiring:

Discussion forums:

  • If you are looking for information regarding a particular design issue, it has probably been discussed in comp.lang.misc, although the group is not very active nowadays.

  • The D language newsgroup contains some good language design specific discussion, or at least it did when I frequented there a couple of years ago.

  • Lambda the Ultimate is mostly about theoretical functional programming stuff but occasionally touches upon language design issues (for example, Some words of advice on language design).

Interesting languages-under-design I've bumped into are Jolt, Heron and PicoC (linked from the Internet Archive, as the pages are no longer available as they were).

Link to Heron at the Internet Archive (the Markdown formatter doesn't support * in URLs):

http://web.archive.org/web/*/http://www.heron-language.com/

Antti Sykäri
+1 for the links to LtU and Yegge's and Graham's blogs.
Damien Pollet
A: 

I'd suggest looking at the reading list from this discussion on "Lambda the Ultimate" (a discussion site aimed at academic programming language research and advanced practitioners).

joel.neely
+2  A: 

Definitely, the Dragon Book is the resource you are looking for.

It covers a wide range of topics you need to pay attention to when building your compiler, but if you have never read about these topics it may be a little difficult. For help with formal language theory, you can get this book.

Also, see my resource list here for more information.

eKek0
+8  A: 

I'm rather surprised that most of the answers only consider compiler writing. But the compiler is only a small and rather simple part. A complete language is much more than that: it's the model, the syntax, the tools, etc. So,

Stand on the shoulders of giants

  • Learn Lisp. Virtually all known language features have been implemented in it.
  • Learn Smalltalk. The compiler and the debugger are both written in Smalltalk and their code is available for reading in any image.
  • Learn a functional language like Haskell or O'Caml. This is how to do static types seriously, if you're on that side of the static/dynamic argument.
  • Think of the syntax as a user interface, with all the affordance/usability questions this suggests.

Also, keep an open mind and a critical eye for technical choices. For example, the traditional parsing algorithms are indeed well documented in textbooks, but the whole family of PEG and packrat parsers is really simple and efficient, and much more debuggable than table-driven LALR algorithms. You also probably don't want a complicated compiler: Chrome's V8 compiler, for example, was designed to be extremely simple, for robustness, yet it still outperforms other JS implementations.
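To show why packrat parsing stays simple, here is a hedged Python sketch of a PEG for sums and products of integers (no whitespace, and only `+` and `*`, so the right-recursive rules still compute correct values). Ordered choice `/` tries the first alternative and backtracks to the second if it fails; memoizing every (rule, position) result is what makes the re-parse after a backtrack free and keeps the whole parse linear in the input length. The grammar and the `PackratParser` class are illustrative, not taken from any particular PEG library.

```python
class PackratParser:
    """PEG for integer sums and products, memoized per (rule, position).

    Expr   <- Term '+' Expr / Term
    Term   <- Factor '*' Term / Factor
    Factor <- [0-9]+ / '(' Expr ')'

    Each rule returns (value, next_position) on success or None on failure.
    """

    def __init__(self, text):
        self.text = text
        self.memo = {}  # (rule name, position) -> result of that rule there

    def apply(self, rule, pos):
        # The packrat part: each rule runs at most once per input position.
        key = (rule.__name__, pos)
        if key not in self.memo:
            self.memo[key] = rule(pos)
        return self.memo[key]

    def literal(self, s, pos):
        return pos + len(s) if self.text.startswith(s, pos) else None

    def expr(self, pos):
        first = self.apply(self.term, pos)
        if first is not None:
            after_op = self.literal("+", first[1])
            if after_op is not None:
                rest = self.apply(self.expr, after_op)
                if rest is not None:
                    return first[0] + rest[0], rest[1]
        # Ordered choice: the first alternative failed, so backtrack to `pos`
        # and try the second; Term at `pos` is already in the memo table.
        return self.apply(self.term, pos)

    def term(self, pos):
        first = self.apply(self.factor, pos)
        if first is not None:
            after_op = self.literal("*", first[1])
            if after_op is not None:
                rest = self.apply(self.term, after_op)
                if rest is not None:
                    return first[0] * rest[0], rest[1]
        return self.apply(self.factor, pos)

    def factor(self, pos):
        end = pos
        while end < len(self.text) and self.text[end] in "0123456789":
            end += 1
        if end > pos:
            return int(self.text[pos:end]), end
        after_open = self.literal("(", pos)
        if after_open is not None:
            inner = self.apply(self.expr, after_open)
            if inner is not None:
                after_close = self.literal(")", inner[1])
                if after_close is not None:
                    return inner[0], after_close
        return None

    def parse(self):
        result = self.apply(self.expr, 0)
        if result is None or result[1] != len(self.text):
            raise SyntaxError("input is not a complete expression")
        return result[0]
```

`PackratParser("2*(3+4)").parse()` returns `14`. When `Term '+' Expr` fails for lack of a `+`, the fallback `Term` alternative is answered from the memo table rather than parsed again, and each rule reads as a plain function you can step through in a debugger.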

Speaking of debuggers, design the tools along with the language. IMHO every language should come with a pretty-printer, a real debugger, refactoring support, etc.
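A pretty-printer is one of the easier tools to grow alongside the parser. As a sketch under stated assumptions (a tuple-based expression tree, four left-associative binary operators, and a precedence table chosen for illustration), this Python function prints a tree back to source text with only the parentheses needed to preserve its structure.

```python
# Hypothetical expression tree: ("num", n) or (op, left, right).
# Precedence levels are assumptions; all four operators are treated as left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def pretty(node, parent_prec=0, right_side=False):
    """Print an expression tree as source text with only the parentheses it needs."""
    if node[0] == "num":
        return str(node[1])
    op, left, right = node
    prec = PRECEDENCE[op]
    text = f"{pretty(left, prec)} {op} {pretty(right, prec, right_side=True)}"
    # Parenthesize when this operator binds more loosely than its parent, or when it
    # is the right operand of an equal-precedence parent, e.g. 8 - (3 - 2).
    if prec < parent_prec or (right_side and prec == parent_prec):
        return f"({text})"
    return text
```

`pretty(("-", ("num", 8), ("-", ("num", 3), ("num", 2))))` gives `"8 - (3 - 2)"`, while the left-nested tree prints as `"8 - 3 - 2"`. Checking that parsing the printed text gives back the same tree is also a cheap round-trip test for the parser itself.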

Damien Pollet
+1 for your final comment! If previous generations had kept these concerns in mind, we would today have automated refactoring tools for languages such as C++.
none
Actually I suspect we wouldn't have C++ in the first place, because it's more difficult to make refactoring tools work for it :)
Damien Pollet