What tips can you give a person who is looking to write a programming or scripting language? I am not asking how to program or how to design a compiler, but how to develop one quickly using tools and code generators.

The last time I tried, I coded it in C++, and the states and syntax took almost as long as writing the actual logic :(. I know the following tools would help.

I was thinking I could generate C++ code and have gcc compile that. Using the tools above, how long would you estimate it would take to write a programming or scripting language?


Variations on this question have been asked repeatedly, as far back as http://stackoverflow.com/questions/1669/learning-to-write-a-compiler. Here is an incomplete list of SO resources on the topic.

+8  A: 

Estimating how long something like that might take is dependent on many different factors. For example, an experienced programmer can easily knock out a simple arithmetic expression evaluator in a couple of hours, with unit tests. But a novice programmer may have to learn about parsing techniques, recursive descent, abstract representation of expression trees, tree-walking strategies, and so on. This could easily take weeks or more, just for arithmetic expressions.
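To give a feel for the size of the "simple arithmetic expression evaluator" mentioned above, here is a minimal recursive-descent sketch in Python. The grammar (expr, term, factor), the `Parser` class and the `evaluate` helper are my own illustrative choices, not something from the question, and it only handles numbers, `+ - * /`, unary minus and parentheses.

```python
import re

# A number, or any single character (operators and parentheses).
TOKEN = re.compile(r"\s*(?:(\d+\.?\d*)|(.))")

def tokenize(text):
    """Split the input into ("num", value) and ("op", char) tokens."""
    tokens = []
    for number, op in TOKEN.findall(text):
        if number:
            tokens.append(("num", float(number)))
        elif op.strip():  # ignore stray whitespace matches
            tokens.append(("op", op))
    return tokens

class Parser:
    """One method per grammar rule: expr -> term -> factor."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", None)

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (("+" | "-") term)*
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            _, op = self.next()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term := factor (("*" | "/") factor)*
        value = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            _, op = self.next()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # factor := number | "-" factor | "(" expr ")"
        kind, value = self.next()
        if kind == "num":
            return value
        if (kind, value) == ("op", "-"):
            return -self.factor()
        if (kind, value) == ("op", "("):
            result = self.expr()
            if self.next() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return result
        raise SyntaxError(f"unexpected token {value!r}")

def evaluate(text):
    parser = Parser(text)
    result = parser.expr()
    if parser.peek()[0] != "eof":
        raise SyntaxError("trailing input")
    return result
```

Calling `evaluate("2 * (3 + 4) - 5")` walks the three rules and returns a float. Each precedence level gets its own method, which is the part a novice usually has to learn first.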

However, don't let that discourage you. As Jeff and Joel were discussing with Eric Sink on a recent Stack Overflow podcast, writing a compiler is an excellent way to learn about many different aspects of programming. I've built a few compilers and they are among my most memorable programming projects.

Some classic books on building compilers are:

Greg Hewgill
+2  A: 

The classic books on compiler design are

"Principles of Compiler Design" by Alfred V. Aho and Jeffrey D. Ullman. It's been around quite some time now and its pink knight and green dragon are well known to at least a couple of generations of CS students.

Also...

"Compilers: Principles, Techniques, and Tools" by Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman

If you're interested in writing a compiler then these are undoubtedly the best places to start.

Cruachan
+4  A: 

As a person who knows C++ very well, what tips can you give a person who is looking to write a programming or script language?

Don't do it. (or at least think long and hard before you do!)

If you're trying to write a scripting language to expose the methods/properties of some custom-written objects, it would be better to implement those in Java (or .NET/VB or all those icky Microsoftisms) and then use one of the Bean Scripting Framework languages as your scripting language. (with whatever the equivalent is on the Microsoft end.)

Jason S
ok, fine. :p I'm just saying it 'cause I spent several weeks writing a parser for a crude scripting language, but figured out later that I could've just used Javascript or Python by writing my object model and exposing it to an existing scripting language. Who wants to learn a new language?
Jason S
+1, we already have way too many programming languages. If it's really necessary, there are frameworks to create domain specific languages.
Wim Coenen
+2  A: 

I'd strongly recommend looking at existing bytecode interpreters. If you can make your language fit into CIL (.NET) or Java (or even others such as Python or Parrot), you'll save yourself all the effort of making a workable supporting environment and can get on with experimenting with language concepts.
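As a rough illustration of fitting a language onto an existing bytecode interpreter, here is a sketch that lowers a toy expression tree onto CPython using the standard `ast` module and the built-in `compile`. The tuple-based tree format and the `lower` / `compile_expr` names are hypothetical, made up for this example; only the `ast` and `compile` calls are real library API.

```python
import ast

# Hypothetical toy tree: a number, a variable name (str), or (op, left, right).
OPS = {"+": ast.Add, "-": ast.Sub, "*": ast.Mult, "/": ast.Div}

def lower(node):
    """Translate a toy expression tree into a Python `ast` node."""
    if isinstance(node, (int, float)):
        return ast.Constant(value=node)
    if isinstance(node, str):
        return ast.Name(id=node, ctx=ast.Load())
    op, left, right = node
    return ast.BinOp(left=lower(left), op=OPS[op](), right=lower(right))

def compile_expr(node):
    """Return a CPython code object for the toy expression."""
    tree = ast.Expression(body=lower(node))
    ast.fix_missing_locations(tree)  # fill in line/column info compile() needs
    return compile(tree, "<toy>", "eval")

# x + 2 * 3, evaluated by the host VM with x bound to 4.
code = compile_expr(("+", "x", ("*", 2, 3)))
result = eval(code, {"x": 4})
```

The point is that the host VM supplies the runtime, memory management and error handling, so the only code you write is the translation step. The same idea applies to CIL or JVM bytecode, with different libraries.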

bobince
+3  A: 

Any question about compilers on SO gets a "go read the dragon book, read that book, this book..." answer within a few minutes, regardless of its content. So I'll skip that part (as you asked in the first place). Reading these books to learn how to use the tools you want is about as useful as reading about angular momentum to learn how to ride a bike.

So, to answer what you asked, without questioning your intention, I can easily recommend ANTLR and ANTLRWorks for starters. You can generate your AST easily (where the real magic happens, I think) and debug your grammar visually. It generates a good portion of a working compiler for you.

If you know your stuff and want more control, or don't like ANTLR, you can use the Lemon parser generator together with the Ragel state machine compiler (which has special support for lexing).

If you don't need too much performance and since you plan to generate C/C++ code, you can skip doing any optimizations yourself and leave that stuff to your C/C++ compiler.
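A minimal sketch of the "generate C/C++ and let the compiler optimize" approach, written in Python for brevity. The tuple-based tree and the `emit` / `emit_function` helpers are hypothetical; a real front end would produce the tree from its parser and handle types other than `double`.

```python
def emit(node):
    """Render a toy expression tree as a C++ expression string."""
    if isinstance(node, (int, float)):
        return repr(float(node))  # every literal becomes a double
    if isinstance(node, str):
        return node  # variable reference
    op, left, right = node
    # Parenthesize everything so C++ precedence never matters.
    return f"({emit(left)} {op} {emit(right)})"

def emit_function(name, params, body):
    """Wrap an expression in a C++ function taking and returning double."""
    args = ", ".join(f"double {p}" for p in params)
    return f"double {name}({args}) {{\n    return {emit(body)};\n}}\n"

# Toy program: area(w, h) = w * (h + 1)
source = emit_function("area", ["w", "h"], ("*", "w", ("+", "h", 1)))
```

Writing `source` to a `.cpp` file and compiling it with something like `g++ -O2` leaves constant folding, inlining and register allocation to the C++ compiler, so the generator itself can stay naive.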

If you can live with a slow runtime, you can shorten your development effort further by just interpreting the program, since dynamic features are often easier to implement that way.

artificialidiot
A: 

If you're planning on writing an interpreter or compiler, don't do it because you want to write the next big thing. Write it because you already have a purpose for it in mind or to learn. If you do this you may find that you've accidentally written the next big thing.

Jason Baker
+1  A: 

Dave Hanson, who with Chris Fraser spent 10 years building one of the world's most carefully crafted compilers, told me once that one of the main things he learned from the experience was not to try to write a compiler in C or C++.

If you want to develop something quickly, don't generate native code; target an existing virtual machine such as the CLR, JVM, or the Lua virtual machine. Generate code using maximal munch.

Another good option if you're writing an interpreter is just to use the memory management and other facilities of your underlying programming language. Parse to an AST and then interpret by tree walk of the AST. This will get you off the ground fast. Performance is not the greatest, but it's acceptable. (Using this technique I once wrote a PostScript interpreter in Modula-3. The first implementation took a week and although it later underwent some performance tuning, primarily in the lexer, it never had to be replaced.)
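Here is a small sketch of the "parse to an AST, then interpret by tree walk" approach, leaning on the host language for memory management and the variable store. The statement forms (`set`, `while`) and the `eval_expr` / `exec_stmt` names are invented for illustration, and the tree is written by hand where a parser would normally build it.

```python
import operator

# Map toy operators straight onto host-language functions.
BINOPS = {"+": operator.add, "-": operator.sub,
          "*": operator.mul, "<": operator.lt}

def eval_expr(node, env):
    """Evaluate a number, a variable name, or an (op, left, right) tuple."""
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):
        return env[node]  # the host dict is the variable store
    op, left, right = node
    return BINOPS[op](eval_expr(left, env), eval_expr(right, env))

def exec_stmt(stmt, env):
    """Execute one statement by walking its tree."""
    kind = stmt[0]
    if kind == "set":
        _, name, expr = stmt
        env[name] = eval_expr(expr, env)
    elif kind == "while":
        _, cond, body = stmt
        while eval_expr(cond, env):
            for inner in body:
                exec_stmt(inner, env)
    else:
        raise ValueError(f"unknown statement {kind!r}")

# Sum the integers 0..4 in the toy language.
program = [
    ("set", "i", 0),
    ("set", "total", 0),
    ("while", ("<", "i", 5), [
        ("set", "total", ("+", "total", "i")),
        ("set", "i", ("+", "i", 1)),
    ]),
]
env = {}
for stmt in program:
    exec_stmt(stmt, env)
```

There is no allocator, garbage collector or value representation to write here, because the host provides all of them. That is where most of the time savings come from.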

Avoid LALR parser generators; use something that saves your time, like ANTLR or the Elkhound GLR parser generator.

Norman Ramsey
+1  A: 

A good tool that I've used for LALR is the GOLD Parsing System. It's free, the grammar is written in Backus-Naur Form, and there are multiple examples, including engines written in C#, VB.NET, Java and others. This lets you write a grammar, compile the grammar to a file, and then use an engine to parse input against the compiled grammar.

As recommended above, I would target a bytecode of some kind, like IL. This will allow you to leverage the enormous number of existing frameworks.

Good Luck

PerryJ
A: 

If you don't want to get into writing a compiler that reduces your language to assembly or machine code, then your next option is to write a compiler that targets a bytecode virtual machine, such as the JVM, PVM or .NET.

Of course, if you don't even want to do that and just want to create your own "domain specific language", I would build it in Common Lisp. Lisp macros provide a rather straightforward method of creating whatever syntax you want and parsing it into Lisp. And you don't have to worry about bytecode or assembly. Of course, you need to learn Lisp.

Technical Bard