tags:

views:

814

answers:

13

Inspired by Eric Sink's interview on the stackoverflow podcast I would like to build a full compiler in my spare time for the learning experience. My initial thought was to build a C compiler but I'm not sure whether it would take too much time.

I am wondering if there is a smaller general purpose language that would be more appropriate to implement as a first compiler effort? Or is a C implementation doable on a reasonable timescale (200 hrs)?

It is my intention to target the CLR.

+8  A: 

Write a Scheme compiler.

See: An incremental approach to compiler construction

squadette
I have to disagree on this one. The challenges involved in implementing a Scheme compiler are drastically different than that of a language like C, Python, etc. If you're wanting to learn the internals of a compiler for anything outside of the Lisp world, Scheme just isn't a good choice.
Cody Brocious
@squadette: Thanks. This paper looks awesome. While the Pascal suggestion is more what I was thinking of I'm going to have a look at doing this as well. If nothing else hopefully I would come out of it know Lisp a lot better.
Brownie
One thing that a Scheme compiler would "gloss over" is the lexing and parsing phase. Scheme doesn't have the common problems of more algebraic expression parsers (operator precendence, etc.) and the complicated grammars.
Will Hartung
+1 for Scheme. Fond memories of my SICP class
Tim
+4  A: 

My suggestion is to pick your favorite language. The knowledge you have going into it will outweigh the difficulty of writing a compiler for it, typically.

Cody Brocious
Except if your favorite language is C++ ;)
Juan Pablo Califano
+1  A: 

Whichever language you choose, you could consider compiling to intermediate language ( IL ) to target the Common Language Runtime ( CLR ). I presume targetting the Java Virtual Machine ( JVM ) would be similar for non-Windows, or prehaps the CLR implementation of in Mono? This would probably greatly simiplify the job and would let you have something that performed well from the off. You later re-target a specific architecture if you wanted to go further.

Tom Grove
Thanks Tom. It is my intention to target the CLR initially. I will add a note to the question.
Brownie
contrary to propaganda; CLR is heavily targeted to a subset of languages, mostly imperative. anything far enough from C, Java, Python, and you'll be missing interesting features best done at low level. ie: continuations, or even closures are non-optimal.
Javier
+9  A: 

You'll be happiest writing compilers for older, smaller languages. Pascal, for example, were designed as learning tools. The Pascal language is small and elegant; the compiler can be written fairly simply.

Even an Oberon or Modula-2 compiler is similar in complexity to Pascal; their design was driven by the same person, Niklaus Wirth.

Languages like C, which evolved organically, are too full of quirks to be good learning experiences.

S.Lott
+1 Heh, you beat me to Pascal by a couple of seconds. A good choice, IMO.
Shane MacLaughlin
Oberon is even simpler than Modula-2, Modula-3 or Pascal. (Modula-3 is from DEC, The rest are from Niklaus Wirth, who has a gift for simplification.)
bendin
Pascal was indeed designed as a teaching language, but Modula-2 was designed as a systems implementation language for the operating system and application software of a workstation (called Lilith) that was being developed under the lead of Wirth at ETH Zurich. It was not designed as a teaching language.
trijezdci
A: 

In terms of simplicity, FORTH is going to be one of the easier languages to develop. It's threaded interpretive rather than truly compiled, but you'll still be dealing with parsing, variable storage, etc..

For a compiler, I'd go with C or Pascal, both of which are quite compact and have source for compilers available.

Shane MacLaughlin
+1  A: 

I can't think of any one language that is simple enough to use as a first compiler-writing exercise. I don't think I'd try C for a first cut. Why not invent your own language? Maybe it'll be a real hit.

Adam Hawes
+4  A: 

If you want a compact tutorial, why not consider Wirth's Compiler Construction (pdf). The source language (Oberon-0) is simple enough to keep the compiler comprehensible. The implementation language (Oberon) should be readable to anyone who has done some programming.

As to which language to use to implement the compiler. Use something you are familiar with. When in doubt, choose a language that will not unnecessarily complicate the attempt: Something with garbage collection. Something that makes it easy to print or otherwise dump internal data structures for inspection. Python, Scheme and Lua all come to mind.

The final consideration is what to target with your compiler. The virtual machines JVM and CLR have been metioned, I'm sure. You could go that route. It might be easier, for a first attempt to use a simulator for a stripped-down RISC processor as your target. (Wirth's compiler book does this.)

I wouldn't recommend targeting x86 for your first compiler as it's hideous beyond words. I also wouldn't target a high(er) level language like C because you'll miss out on a lot of the interesting details, like how to implement short-circuit semantics for boolean operators and such like.

bendin
A: 

Write a brainfuck or forth compiler. BASIC is perhaps also such language not too rich in features. I think C would be moderately hard. Do not envy about the target arch. Use whatever you have.

If you don't want to implement an assembler then put your compiler output assembly code and push it to gas or nasm.

Cheery
+2  A: 

Whatever language you choose, remember you can define your own set of supported features to customize it to fit your learning goals. If you want to learn about compilers (which it sounds like you do), then you could write a C compiler but just drop support for some random feature, like pointers for example, or only implement a subset of the keywords, just to make it more manageable.

Of course, if your goal is to get really intimate with a particular language, you'll want to fully implement a compiler for that language.

Bill the Lizard
+1  A: 

Pascal has already been mentioned, but I'd like to add that Niklaus Wirth's book Algorithms + Data Structures = Programs contains a complete implementation of a small Pascal-like language using recursive descent. If you're looking for a theory-intensive discussion of parsing, look elsewhere; but if you want straightforward code that lets you learn by doing, then I'd recommend A + DP = P.

joel.neely
+2  A: 

Another point in favor of Scheme: it's practical for a beginner to write a self-hosting compiler for it, like Kragen Sitaker's Ur-Scheme, his first compiler. There are few other 'tutorial' compilers powerful enough to compile themselves (though there are some pointers at the link). This brings more realism and interest to the problem.

Darius Bacon
A: 

I'd recommend writing a brainf**k compiler. It is a very simple and good for a first compiler. And timescale would be more like 1 and a half hours. Some other good languages are Forth, Logo, and Lisp.

None
A: 

In a compiler course, we wrote compilers for a subset of C (I liked to think of it as C--). It wasn't that difficult since you knew where your boundaries were. You can always refactor and add more features later.

Matt