I've been getting into compiler creation. I've found some terrific beginner stuff and advanced stuff but nothing in the middle. I've created 3 different simple proof-of-concept compilers for toy languages but I want to expose myself to something real.

The most straightforward real language in terms of syntax seems to be C. Since the language I'm most comfortable with right now is C#, I'd love to study the source code of a real, non-tutorial C compiler written in C#. Does one (with source code available) exist?

Edit: to clear up some confusion:

Ideally I'd like a C compiler, not a .NET or C# compiler, but with the source code written in C#.
I know C# --> C feels a little backwards but it'll allow me to ease deeper into compilers starting with a familiar language before I go changing that too.

Although I'm not looking for C#/.NET compilers, here are some in case someone sees this question who is looking for that:

+4  A: 

Found this via Google.

http://blogs.msdn.com/jmstall/archive/2005/02/06/368192.aspx

EDIT: and this (not exactly C): http://msdn.microsoft.com/en-us/magazine/cc136756.aspx

Luiscencio
Good find and I know there's the Mono one as well but I'm really looking for a C compiler if possible.
Dinah
+6  A: 

You are going to have a hard time finding sample code. Compiler writers use bootstrapping: the first C compiler was written in B, C was then used to write the first C++ compiler, and C++ was used to write the C# compiler. C# in turn is very commonly used to write compilers for managed code.

This is not a process that ever goes backwards, although sideways was common: C compilers were often used to cross-compile a compiler for another operating system.

I think I used this book; it has terrific C compiler code in the appendices, written in C. I used parts of it when writing a Basic compiler I needed in a large project. The expression parser is hard to get right, and the book has an elegant solution for the operator precedence rules.
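For illustration only: the book's parser isn't reproduced here, but one well-known way to handle operator precedence is precedence climbing. Below is a minimal sketch in Python. The operator table (`BINARY_OPS`), the tuple-based tree shape, and the function names are all made up for this example and do not come from the book.

```python
import re

# Hypothetical operator table for this sketch: symbol -> (precedence, right_associative).
BINARY_OPS = {
    "+": (1, False),
    "-": (1, False),
    "*": (2, False),
    "/": (2, False),
    "^": (3, True),
}

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/^()])")


def tokenize(text):
    """Split the input into integer literals, operators and parentheses."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Build a tree of nested (op, left, right) tuples using precedence climbing."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_primary():
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = parse_expression(1)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

    def parse_expression(min_prec):
        left = parse_primary()
        while True:
            op = peek()
            # Stop when the next token is not an operator, or binds too loosely
            # to belong to the expression being built at this level.
            if op not in BINARY_OPS or BINARY_OPS[op][0] < min_prec:
                return left
            prec, right_assoc = BINARY_OPS[op]
            advance()
            # Left-associative operators require a strictly tighter right-hand side.
            right = parse_expression(prec if right_assoc else prec + 1)
            left = (op, left, right)

    tree = parse_expression(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

With this table, `parse(tokenize("1 + 2 * 3"))` gives `("+", 1, ("*", 2, 3))`, and `parse(tokenize("8 - 3 - 2"))` gives `("-", ("-", 8, 3), 2)`. A real C expression parser also has to deal with unary, postfix, ternary and assignment operators, which this sketch leaves out.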

Targeting a managed language is the easier way to get this going. The language shouldn't matter too much; getting it working is the real challenge, even though managed code is a lot easier to get working. If you want to target C, you'll need black-belt machine code skillz and deep insight into the object file format and the linker.

Hans Passant
Btw: not so sure about the book.
Hans Passant
The first C compiler was not written in BCPL - where did you get this from?
anon
@Neil: it is a tedious detail I think I read somewhere 15 years ago. I don't remember, who the hell cares. Well, obviously you think it is crucial enough to down vote a post that has *nothing* to do with what was on that punch tape. Edit the frikkin post if it is that important to you.
Hans Passant
@Hans Downvotes are intended to be used on technically incorrect answers.
anon
Please @Neil, share with us. What was it written in?
cciotti
@Neil: collaborative editing is intended to be used to improve answers. It takes changing one word in this post.
Hans Passant
To quote DMR from http://cm.bell-labs.com/cm/cs/who/dmr/chist.html: In 1971 I began to extend the B language by adding a character type and also rewrote its compiler to generate PDP-11 machine instructions instead of threaded code. Thus the transition from B to C was contemporaneous with the creation of a compiler capable of producing programs fast and small enough to compete with assembly language. I called the slightly-extended language NB, for `new B.'
anon
@Hans I almost never edit other users' answers, apart from trivial spelling corrections, and even then not often.
anon
@Neil, you're a trooper. Thanks for the 4 upvotes.
Hans Passant
+9  A: 

The most straightforward real language in terms of syntax seems to be C.

I'm not sure what you mean by "real language", but whatever "real language" means, I cannot agree that C has a "straightforward" lexical or syntactic grammar, and its semantics are underspecified. If you want an extremely straightforward language with pretty well-defined semantics, why not go for Scheme? Scheme has a very easy grammar but is certainly not trivial to get its semantics right.

Eric Lippert
In what way are the semantics underspecified?
Dinah
@Dinah: Many aspects of C and C++ are left up to the discretion of the compiler writer, and different compiler writers choose different semantics. The order in which function arguments are evaluated, for instance. (Though, to be fair, Scheme also leaves this underspecified. However, this is mitigated somewhat by the fact that Scheme discourages side-effecting code, which makes it less relevant which one goes first.)
Eric Lippert
I've never touched Scheme but I could give it a shot. Do you know of an open source Scheme compiler in C#?
Dinah
@Dinah: There are Scheme implementations for .NET but I do not know if they are open source.
Eric Lippert
If I recall correctly, C can be compiled in a single pass. So in some sense, C is very *straightforward* ;-)
Joren
+1  A: 

I don't know of one that exists, but there's no reason one couldn't, or shouldn't.

Writing a compiler for a C-like language is a classic project for one-semester college compiler courses. If you know C# already, it provides a lot of features which will make your job easier than it was when I was in college! There are plenty of libraries sitting around which will make the job easier without taking away the challenge, and you can always replace them with your own ad-hoc code if you need flexibility they don't provide.

The first C compiler grew out of the existing B compiler because that's what they had. Current C compilers are usually written in C because they aim to be portable. I don't think anyone would argue that C is a good language for writing compilers in. (C# isn't perfect but it's a lot better!) In a statically-compiled language like C, I don't think you get much benefit, if any, from using the target language to write the compiler.

A compiler in a high-level language (HLL) potentially has many advantages. It'd be shorter and simpler than one written in C. That alone could make a lot of things sufficiently easier that they could be pulled below the threshold of "so hard that nobody's ever going to do them". (GCC is kind of the poster-child for how a compiler written in a low-level language (LLL) can be so complex that it moves at a glacial rate.) Optimizations are basically graph transformations, which aren't exactly C's forte.
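To make the "graph transformations" point concrete, here is a small sketch of constant folding written as a tree rewrite in Python. The tree shape (an int literal, a variable name as a string, or an `(op, left, right)` tuple) and the function name `fold_constants` are assumptions made up for this example; real optimizers work on much richer graphs, such as control-flow and data-flow graphs.

```python
import operator

# Hypothetical tree shape for this sketch: an int literal, a str variable name,
# or a tuple (op, left, right).
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    if not isinstance(node, tuple):
        return node  # literals and variable names cannot be simplified further

    op, left, right = node
    # Rewrite the children first so that folding propagates bottom-up.
    left, right = fold_constants(left), fold_constants(right)

    # Both operands are known at compile time: compute the result now.
    if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
        return FOLDABLE[op](left, right)

    # A few algebraic identities: x * 1 == x and x + 0 == x.
    if op == "*" and right == 1:
        return left
    if op == "*" and left == 1:
        return right
    if op == "+" and right == 0:
        return left
    if op == "+" and left == 0:
        return right

    return (op, left, right)
```

For example, `fold_constants(("+", "x", ("*", 2, 3)))` returns `("+", "x", 6)`. The whole pass is one short recursive function, which is the kind of brevity the paragraph above is arguing for.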

I don't consider it "backwards" at all to use C# to compile C. Unless somebody's proposing to rewrite all their C code in a higher-level language, it still needs to be compiled somehow, and that means you need a compiler. Shouldn't that compiler be written with tools that enable it to offer the best reliability and performance?

Good luck! I look forward to seeing what you write!

Ken
A: 

Here's a list of compilers on Wikipedia, and many of them are open source: http://en.wikipedia.org/wiki/List_of_compilers

rashumaker