Possible Duplicate:
Learning to write a compiler

I looked around trying to find out more about programming language development, but couldn't find a whole lot online. I have found some tutorial videos, but not much in the way of text guides, FAQs, advice, etc. I am really curious about how to build my own programming language, which brings me to SO to ask:

How can you go about making your own programming language?

I would like to build a very basic language. I don't expect it to be a very good language, nor do I think anyone will use it. I simply want to make my own language to learn more about operating systems and programming, and to become better at everything.

Where does one start? Building the syntax? Building a compiler? What skills are needed? A lot of assembly and understanding of the operating system? What languages are most compilers and languages built in? I assume C.

+3  A: 

It entirely depends on what your programming language is going to be like.

  • Do you definitely want it to be compiled? There are interpreted languages as well... or you could implement compilation at execution time (just-in-time compilation)

  • What do you want the target platform to be? Some options:

    • Native code (which architectures and operating systems?)
    • JVM
    • Regular .NET
    • .NET using the Dynamic Language Runtime (like IronRuby/IronPython)
    • Parrot

Personally I would strongly consider targeting the JVM or .NET, just because then you get a lot of "safety" for free, as well as a huge set of libraries your language can use. (Obviously with native code there are plenty of libraries too, but I suspect that getting the interoperability between them right may be trickier.)

I see no reason why you'd particularly want to write a compiler (or other part of the system) in C, especially if it's only for educational purposes (so you don't need a 100-million-lines-a-second compiler). What language are you personally most productive in?

Jon Skeet
I can code in C and C++. I prefer C, though. I mostly want to get to a very low level and learn how a compiler works, how executables are made, and how code is loaded into memory and executed. I eventually want to make my own tiny operating system with my own little system/user programs. I don't intend to do anything on a large scale, just something very, very minimal. I would be working mostly within Linux, and would like the language to be usable on most *nix systems.
Google
+4  A: 

Take a look at ANTLR. It is an awesome compiler-compiler, the kind of tool you use to build a parser for a language.

Building a language is basically about defining a grammar and adding production rules to this grammar. Doing that by hand is not trivial, but a good compiler-compiler will help you a lot.
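To make "a grammar with production rules" a bit more concrete, here is a minimal hand-written sketch in Python. It is not ANTLR output, and the toy grammar and all names in it (`tokenize`, `Parser`, `evaluate`) are made up for illustration. Each production rule becomes one method, which is roughly the shape of code a parser generator produces for you:

```python
import re

# Hypothetical toy grammar, one method per production rule:
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'


def tokenize(text):
    """Return number and operator tokens; other characters are ignored in this sketch."""
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Look at the next token without consuming it (None at end of input)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        """Consume and return the next token."""
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            # Integer (floor) division keeps the toy language integer-only.
            value = value * rhs if op == "*" else value // rhs
        return value

    def factor(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token: {tok!r}")
        return int(tok)


def evaluate(text):
    """Parse and evaluate an arithmetic expression in one pass."""
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token: {parser.peek()!r}")
    return result
```

Because `term` is called from inside `expr`, multiplication binds tighter than addition without any extra precedence table; the structure of the grammar does that work.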

You might also want to have a look at the classic "Dragon Book" (a book about compilers that features a knight slaying a dragon on the front cover). (Google it).

Building domain-specific languages is a useful skill to master. A domain-specific language is typically not a full-featured programming language, but rather a set of business rules formulated in a custom language tailor-made for the project. Have a look at that topic too.

Holstebroe
Thanks, I found the book, and ANTLR looks very interesting and time-saving.
Google
If you want to dive into parsers and/or ANTLR, I can recommend Terence Parr's book. He is quite good at making the difficult topic of parser writing understandable.
Holstebroe
+2  A: 

I'd say that before you begin you might want to take a look at the Dragon Book and/or Programming Language Pragmatics. That will ground you in the theory of programming languages. The books cover both compilation and interpretation, and will enable you to build all the tools needed to make a basic programming language.

I don't know how much assembly language you know, but unless you're rather comfortable with some dialect of assembly language programming, I'd advise against trying to write a compiler that compiles down to assembly code, as it's quite a challenge. You mentioned earlier that you're familiar with both C and C++, so perhaps you can write a compiler that compiles down to C or C++ and then use gcc/g++ or any other C/C++ compiler to convert the code to a native executable. This is what the Vala programming language does (it converts Vala syntax to C code that uses the GObject library).
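As a rough sketch of what "compiling down to C" can look like, the Python snippet below turns a tiny expression tree into a complete C program as text. This is not how Vala does it; the nested-tuple tree format and the names `emit_expr` and `emit_program` are assumptions made up for this example:

```python
def emit_expr(node):
    """Translate a nested-tuple expression tree into a C expression string.

    A node is either an int or a tuple (op, left, right). This tree shape is
    a hypothetical format chosen for the sketch.
    """
    if isinstance(node, int):
        return str(node)
    op, left, right = node
    if op not in ("+", "-", "*", "/"):
        raise ValueError(f"unsupported operator: {op!r}")
    # Parenthesise every sub-expression so C precedence can never surprise us.
    return f"({emit_expr(left)} {op} {emit_expr(right)})"


def emit_program(node):
    """Wrap the translated expression in a C program that prints its value."""
    return (
        "#include <stdio.h>\n"
        "\n"
        "int main(void) {\n"
        f'    printf("%d\\n", {emit_expr(node)});\n'
        "    return 0;\n"
        "}\n"
    )


if __name__ == "__main__":
    # 1 + 2 * 3 written as a tree; prints the generated C source.
    print(emit_program(("+", 1, ("*", 2, 3))))
```

You would save the generated text to a file and hand it to gcc like any other C source. The appeal of this route is that register allocation, calling conventions and executable formats all stay the C compiler's problem.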

As for what you can use to write the compiler, you have a lot of options. You could write it by hand in C or C++, or, to simplify development, you could use a higher-level language so that you can focus on writing the compiler rather than on the memory allocation and the like that working with strings in C requires.

You could simply write the grammars and have Flex and Bison generate the lexical analyser and the parser. This is really useful because it allows iterative development, so you can get to a working compiler quickly.
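If it is unclear what the lexical analyser part actually does, here is a small sketch in Python rather than Flex. The idea is the same: you list token rules as regular expressions and the scanner chops the input into (kind, text) pairs. The rule names and the `lex` function are hypothetical, chosen only for this example:

```python
import re

# Hypothetical token rules: (token kind, regular expression).
# Earlier rules win when more than one could match at the same position.
TOKEN_RULES = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
]

# Combine all rules into one pattern with a named group per token kind.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES)
)


def lex(text):
    """Return a list of (kind, text) tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {text[pos]!r} at offset {pos}"
            )
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

The parser then works on this token list instead of raw characters, which is the same division of labour Flex and Bison use.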

Another option is to use ANTLR to generate your parser; the advantage is that ANTLR can generate parsers in many target languages. I've never used it, but I've heard a lot about it.

Furthermore, if you'd like a better grounding in the models that are used so frequently in compiler, scanner, and parser construction, you should get a book on models of computation. I'd recommend Introduction to the Theory of Computation.
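One of the simplest of those models is the deterministic finite automaton, which is what scanners are usually built on. As a small illustrative sketch (the states, the transition table and the function names are all made up here), this Python snippet simulates a DFA that accepts optionally signed decimal integers:

```python
# Hypothetical DFA for optionally signed decimal integers such as "42" or "-7".
# Keys are (current state, input class); missing keys mean "reject".
TRANSITIONS = {
    ("start", "sign"): "signed",
    ("start", "digit"): "number",
    ("signed", "digit"): "number",
    ("number", "digit"): "number",
}
ACCEPTING = {"number"}


def classify(ch):
    """Map a single character to the input class used by the table."""
    if ch in "0123456789":
        return "digit"
    if ch in "+-":
        return "sign"
    return "other"


def accepts(text):
    """Run the DFA over the text and report whether it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:
            return False
    return state in ACCEPTING
```

The automaton needs no memory beyond the current state, which is why this model fits tokens well but not nested structures such as balanced parentheses; those need a grammar and a parser.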

You also seem to show an interest in gaining an understanding of operating systems. I would say this is separate from programming language design, and should be pursued separately. The book Principles of Modern Operating Systems is a pretty good starting place for learning about that. You could start with small projects like creating a shell, or writing a programme that emulates the ls command, and then go into more low-level things, depending on how familiar you are with the system calls in C.
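To give an idea of how small such a starter project can be, here is a minimal sketch of an ls-like listing in Python. The function name `list_directory` and its `show_hidden` parameter are hypothetical; a C version would do the same thing with `opendir` and `readdir`:

```python
import os


def list_directory(path=".", show_hidden=False):
    """Return the sorted entry names in `path`, roughly like `ls -p`.

    Directories get a trailing "/" and dotfiles are skipped unless
    show_hidden is True (similar in spirit to `ls -A`).
    """
    names = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.name.startswith(".") and not show_hidden:
                continue
            suffix = "/" if entry.is_dir() else ""
            names.append(entry.name + suffix)
    return sorted(names)
```

Calling `print("\n".join(list_directory(".")))` prints one entry per line. From there you can grow it toward the real thing: file sizes and permissions via `os.stat`, columns, and so on.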

I hope that helps you.

Varun Madiath
Thanks, really insightful post! I will most certainly look up everything, great post!
Google
Thanks for marking this the right answer. By the time I had typed everything out, someone else's answer had already been marked right. It's the first answer I've posted on this site that was accepted.
Varun Madiath
A: 

There are various tutorials online, such as Write Yourself a Scheme in 48 Hours.

One place to start tho' might be with an "embedded domain specific language" (EDSL). This is a language that actually runs within the environment of another, but you have created keywords, operators, etc particularly suited to the subject (domain) that you want to work in.
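As a rough sketch of the idea, here is a tiny EDSL for filter rules that lives entirely inside Python. The host language's own operators are overloaded so that rules read almost like a little language of their own. The `Field` and `Condition` classes are invented for this example and not taken from any library:

```python
class Condition:
    """A rule that can be combined with & and | and applied to a dict."""

    def __init__(self, test):
        self.test = test

    def __and__(self, other):
        return Condition(lambda row: self.test(row) and other.test(row))

    def __or__(self, other):
        return Condition(lambda row: self.test(row) or other.test(row))

    def __call__(self, row):
        return self.test(row)


class Field:
    """A named field; comparing it builds a Condition instead of a bool."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Condition(lambda row: row[self.name] == value)

    def __gt__(self, value):
        return Condition(lambda row: row[self.name] > value)

    def __lt__(self, value):
        return Condition(lambda row: row[self.name] < value)


# Usage: parentheses are needed because & binds tighter than comparisons.
age = Field("age")
country = Field("country")
rule = (age > 18) & ((country == "DK") | (country == "SE"))
```

Calling `rule({"age": 30, "country": "DK"})` evaluates the rule against one record. You get the host language's parser, error messages and tooling for free, which is why this is a gentle first step before writing a standalone language.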

Gaius