Have you ever implemented a programming language, either your own or an existing language and why did you decide to do it?

+5  A: 

Depends what you mean precisely but I've

  • Written a C compiler for the BBC Micro

(this was at work, but for fun)

  • Wrote the (back end of the) first-ever C compiler for the ARM processor

(this was "real" work and was needed to port UNIX to the ARM)

and written quite a few domain specific languages, mostly in Lisp.

+1  A: 

Yep, a simply basic interpreter, written in c, sometime in the 80s.

I did it for the educational aspects. I learned about parsing, and expression evaluation, and debugging.

It was a special project my senior year at college. I learned more doing that one project than the rest of my classes together.

+1  A: 

I wrote a C++ compiler. As to why... because it was a homework assignment in college. :)

Tom H.
you had an assignment which had you write a c++ compiler. what kind of a course was that. even graduate compiler designer courses at my uni make you write a compiler for a small subset of C.
Two possibilities:1. The assignment was meant to be done by a large group.2. He didn't implement C++, just a subset. And, let me guess, the subset didn't include templates.Ah, I forgot another one:3. 1 + 2
Eduardo León
+3  A: 

I've nearly done it, twice:

  • When I was taking A-level electronics, I decided to design and build a computer. This was a little ambitious, but I got some of the logic bits done, and fully designed and emulated the instruction set (which was very simple) on a PC.

  • I nearly had to implement the HelloWorld language just for the sake of honour.

EDIT: I can't believe I forgot to include Logo. When I was a boy I owned a Sinclair ZX Spectrum, but the schools all had BBC Micros. We had Logo at school, and I liked it - so I learned trigonometry from the Spectrum manual (which was wonderful) and wrote a Logo interpreter. I should have been overwhelmed by the idea of doing it, and rejected it as beyond my capabilities. Back then I didn't stop to think whether or not I would really be able to do something before I tried it. It worked, although it was obviously clunky (I was about 10 or 12). I even remember that line 7000 started the routine to draw the turtle (in an XOR fashion). Good old GOSUB 7000 - if the turtle isn't showing, just add another call...

Jon Skeet
+11  A: 

I invented a language and wrote a full compiler for it (from parsing source to generating X86 code). This was for a research project as an undergraduate. It took me about a year, working a few hours a week.

My main goal was really to learn how compilers are implemented. I really think the best way to learn about something is by doing it. My architecture wasn't the best, and I've since worked on a couple different compilers, so I know what to do better next time.

My second goal was just to explore programming language ideas. After I got the basic architecture, I spent most of my time playing with language features. I combined first class functions and classes in one language, and I added generics (both for functions and classes) and type inference.

The project is defunct now since I've moved onto other things, but it was definitely worthwhile. I learned a lot.

Jay Conrod
Dang, that's really cool.
Jon Smock
+2  A: 

Yes - back in the 80s. It was an assignment at University, as part of the compiler course - and was a minimal C language. Oh - the joys of working with Lex and Yacc. If you ever find yourself in this situation, either invest time in learning something like Antlr or GOLD, or get yourself a copy of Lex and Yacc.

Pete OHanlon
+2  A: 

I wrote the rule scripting language for a system at work. I came up with the syntax, runtime, function library, etc. out of whole cloth, and implemented the language in Java. It's primary benefit was that it nicely exposed our data model to developers of the rule sets.

This was all in Java.

If I were to do it again, I'm not sure that I would, but to be honest I'm not sure that I wouldn't either. The data model in this case was rather tricky, and the language offered good value in cleanly manipulating that model. If I could duplicate that readily in another language through it's normal extensibility, I'd do that. But frankly I can't think of a language extensible enough to let me really do it save for Lisp, and there's no way I'd be crafting a Lisp for the rule developers.

But I certainly would make an effort to try and leverage an existing language first (but I doubt I'd go as far as hacking an existing languages grammar/runtime). I'd want to work within the language itself.

Will Hartung
+2  A: 

I wrote a functional, interpreted language in Oz as a school project. I learned a lot about abstract syntax trees. The language was quite useless, but I still have the small programs I wrote in it lying around:

      if x==0 then 0 else
         if x==1 then 1 else
            call fib(x-1) + call fib(x-2)
   call fib(12)

That calculates Fibonacci number 12.

If I remember correctly, the language only supports two types; positive integers and boolean values, and there is no possibility of strings. Also, any program written in it will only ever print one number.

A few of these programs were part of the specification we worked from, and because of the requirement of positive integers, subtraction had frequent underflows. The professor for this course had not taken this into consideration, and as such, one of the specified programs had a bug where it would not run if you programmed your interpreter properly.

Vegard Larsen
"call fib(12) ... That calculates Fibonacci number 12." -- oh, is that what it does? :)
Dustin Getz
+2  A: 

Yes, three of them:

  1. One OOP TIL (like Fourth) for graphics to run a 2D CAD drawing engine [GIGL = GIGL Interactive Graphics Language] (1984),
  2. one OOPL in C for app dev [SOIL = Simplified Object Interaction Language] (1985), and
  3. an OOP extension in a macrocompiler as a (short-lived) commercial product [Flex_Ability_] (1986)

In all 3 cases I needed the language to vastly simplify the development process.

For example:

  • GIGL allowed a full 2D CAD engine to fit into the RAM of a C64 machine, but unfortunately executed too slowly to support the overly-ambitious product design (which had already run out of resources in straight Assembly Language prior to one-last-attempt with GIGL; paging commands from the disk drive got around the memory limits, but killed the drawing speed).
  • SOIL merged the Windows for Data C library with the Faircom C-Tree ISAM DB library in a "Form" class that allowed us to implement a complex order/inventory system as a series of drill-down screens, all based on the same basic Form code. It was fast, pretty, and extensible, and scared the heck out of the intern that was hired to maintain it. Then the company folded.
  • Flex_Ability_ extended the DataFlex v2.3 4GL for object-oriented programming, introducing the concept of nested/related DataSet objects and reducing the typical 400+ lines of code required for a header-and-line-items application to 14 lines of elegant OOP code. The technology was acquired by Data Access Corp and highly influenced the design of DataFlex v3.0 (now Visual DataFlex), which was (to the best of my knowledge) the first object-oriented 4GL.
Steven A. Lowe
+1  A: 

I've written a compiler for a subset of C as a class project. I've also designed and implemented small scripting/querying languages for personal projects I've worked on - mainly for fun. I find desiging and implementing languages one of the most interesting programming experiences I've had.

Nathaniel Flath
+5  A: 

Yes, I am currently busy with IronScheme.

+6  A: 

I implemented PostScript for use as a scripting language for a debugger written in Modula-3. This was in 1990 or thereabouts. Tcl was not expressive enough to be useful, and Lua hadn't been invented yet. PostScript was designed to run on printers which at the time were only slightly more powerful than a typical PC and a good deal less powerful than a departmental minicomputer. So the language was practically tailor-made to be implemented by a simple interpreter, and because Modula-3 provided the garbage collection, the first version only took a week to build. The embedded interpreter proved to be a very powerful tool; it was one of the best decisions we made on the project.

I have since implemented a compiler-target language called C-- for research, plus I have implemented a half a dozen "micro languages" (micro-Scheme, micro-ML, micro-Smalltalk, micro-Prolog, ...) to use in teaching classes on programming languages.

I've also worked on other people's compilers for C and Haskell.

It's a good exercise for any programmer to design and implement a simple language. You will learn a lot. Unfortunately there are a lot of bad books out there and not many good ones. Of those now in print, perhaps the one by Dan Friedman and Mitch Wand is the best, but it's rather formal.

Norman Ramsey
Is the book Essentials of Programming Languages?
Mark Stock
Yes, that's the book. It's a good book but I think the intended audience is rather different from most Stackoverflow members. But until my own book is published it's the best one I can recommend. There are *lots* of books to avoid...
Norman Ramsey

Not a programming language, but I implemented a small SQL interpreter some time ago, and a little database whose data all disappeared when the program terminated :)


No. Next question ? :-)

+2  A: 

Yes, I designed and implemented a couple of small languages.

One was the Imagine Staging Language (ISL), which was a narrowly focused commercial project for a few years. I reverse engineered the file format of Imagine staging files, designed a language that described the possible contents of such files, and implemented it using yacc and lex (well, bison and flex, technically). Imagine was a GUI-based 3D renderer for the Amiga and PC with no scripting capability. People used ISL for things like generating scenes using physics-based motion. I decided to do it because I wanted to use it myself - there was no other product at the time with similar capabilities.

The others were mini-languages for various jobs. Generally, I used yacc and lex and designed well-formed grammars intended to make it easy to write code for domain-specific problems in an error-resistant manner. The most elaborate was a major subset of C intended to allow for arbitrary scientific calculations, triggered by the receipt of telemetry data, in realtime. The runtime for this one had symbol tables and stack bounds checking and such and was fairly full-featured.

The major driving force behind these mini-languages was that it was the best way available at the time to solve custom problems in narrow problem domains.

With the modern widespread availability of options such as TCL and LUA, I would look closely at using one of those before rolling my own again.

John Grieggs
+2  A: 

As an undergrad in the early 1970s, I had to write a compiler for a subset of the PL/I language (which was enjoying some measure of popularity at the time, at least within academic circles). During the course, our professor spent almost an entire week teaching us about recursion, demonstrating how recursive techniques could be used in compilers for parsing, etc.

For some strange reason, I assumed that he expected us to actually make use of those things which he emphasized during class. Well, after many long and sleepless nights at the computer center, the day finally arrived for us to submit our final projects. After I proudly showed him the results of my operable, yet somewhat sluggish compiler, he remarked -- "Fine, but why did you decide to use recursion?" (!)

I didn't get quite the grade I had hoped for, either. But hey, at least my compiler WORKED.

+2  A: 

For my job I recently designed and implemented a simple extensible DSL so users can write short expressions to calculate text, numeric or boolean values, to control the appearance of business documents. If building a system from scratch I would probably have let the user choose any .NET language to do this in, eliminating a lot of work for me. But the requirement is to provide a convenient way of configuring a pre-existing runtime (not .NET), and also to mimic the simplicity of another pre-existing product, which created an opportunity to do a lot of interesting things.

Interesting things about it are:

  • It uses the Irony parser library to do much of the hard work - I recommend it highly.
  • It's type safe, supports overloading and uses type inference, e.g. to figure out what + means, it reasons from the required type and/or the input types as necessary. It could either reject or accept 4 + "2" at compile time depending on what language rules had been defined.
  • The core language-defining system uses C# generics so that it is independent of the runtime implementation, although it assumes a runtime tree representation.
  • Language features are defined in IOC components. There are no built-in language features. I can add new keywords and capabilities by dropping in new assemblies.
  • The user's expressions are persisted as executable trees (made up of objects in the pre-existing runtime system I'm targeting with a new UI), instead of as text source. This means I can change my mind about details of syntax in future versions without breaking backward compatibility. User's existing expressions will automatically appear written in the new syntax (this has proven invaluable as it has evolved during prototyping).
  • I've deliberately chosen descriptive terms in language features, instead of imperative ones. So instead of convert_to_number "42" (which is an instruction) we have "42" as number which is a descripion. It makes the expressions more readable. There are pragmatic exceptions to this rule for things people are very familiar with from other languages - if then else being an obvious example.
  • It supports generics. A language feature is partly defined by a pattern containing question marks to indicate where sub-expressions may appear, such as:

    if ? then ? else ?

    ? is number

    ? + ?

... along with a list that gives the types allowed for sub-expressions in a given overload, of which there can be several. It's possible to indicate that sub-expressions are of an unknown type. All subexpressions in a pattern with unknown type must be of the same type in any instance of the pattern. It's like generics but only allowing a single type parameter. This is what 'if' does - its 1st argument must be boolean and its return type and the types of the 2nd and 3rd arguments must all be the same as each other, but are otherwise unconstrained.

if ("4" + "2") as number > 30 then "seems to be working" 
                              else "oh dear"

And I threw in a syntax-highlighting editor as well. It was a lot of fun!

Daniel Earwicker
+2  A: 

I wrote a byte-code compiler and a virtual machine for a language used in a game, where each game level was stored as a script along with any media used in that level (graphics+sound). The game was entirely written in java, including the virtual machine, so the efficiency was horrible of course (virtual machine inside a virtual machine), but it was a simple game with mostly static graphics. The flexibility it gave the designers when they created the levels was very much appreciated.

Addendum: I was actually rather surprised at how not-horrible the efficiency was, but I wouldn't recommend it for a real project...

+21  A: 

An unreasonable number. Here are some, more recent first. In general most of these larger projects were for fun and learning more than use.

Squerm: An experimental Lisp dialect that's a kind of portmanteau of Scheme and Erlang. Meant to play with those ideas and grow into an image-based environment influenced by Squeak, but that part hasn't happened so far.

Wren: Meant to run interactively on small embedded systems with around 4k of RAM available. A friend had asked me if I could recommend a language for that to short-circuit the compile-and-download cycle, and I decided to hack up a new one.

ichbins: A self-hosting compiler from a Lisp dialect to C, in about 6 pages of non-obfuscated code. Written to learn the practicalities of bootstrapping and evolving a self-hosting system.

Consp: A toy Scheme dialect oriented around capability security. I wanted to test my understanding of the subject in a simpler context than E (see below).

Hypercode: A pretty weird system to try out some ideas about hypertext as source code and a live environment where the full representation of an executing program is there to edit/interact with as it runs. Written in Hypercard.

Hmph: Some related ideas, this time in a web context. Really hacky, but pointing in a direction I'd like to see worked out better.

Tusl: A small stack language for embedding in C programs to add runtime programmability to them and keep from having to write another config-file parser, etc. Sort of like Color Forth minus the colors. It's come in handy in several projects, though I guess it'll never replace Lua. :-)

Scheme wiki: an underpowered Scheme interpreter in Python as part of a wiki that ran literate Scheme code that anyone could edit. The general idea was to make a wiki with live programs instead of dead code snippets; I ran out of steam before making it really usable.

E: Allen Short greatly extended my early draft of a C-based implementation of the E programming language, itself starting from Mark Miller's work. I like E and would like to use it in anger.

'mlisp': Never-released systems-programming dialect of Lisp, compiling to the bare metal. For Brian Spilsbury's Vapour OS, which never got released either. We did write a few device drivers...

UTS: Bytecode interpreter for R4RS Scheme. I used it a lot for about 5 years for my personal programming, but it breaks when compiled with any modern release of gcc, and I'm too lazy to figure out why. (These days I mostly use Python instead.)

Plonk: Java port of UTS. This seems to have been the first reasonably-complete-and-standard Java implementation of Scheme. It was too slow and inconvenient for me to want to take it further, but it taught me Java.

MIXAL: I wrote this intending to use it while studying Knuth, which, well, still hasn't happened. Jose Ortega turned it into GNU MDK.

Awklisp: At work I had to use a horrible Basic implementation called MapBasic. It was so slow I bet you could write a faster interpreter in Awk. Yes.

yawk: A parser generator in and for Awk. Fun, but unreleasable since I wrote it at work.

req: My college data-structures class was taught by Dave Gillespie, who wrote GNU Calc; he organized the material around building a similar kind of little symbolic-math system using term rewriting. I rewrote it later, dropping the algebraic simplification part; I still use it regularly for random calculations.

They're on GitHub or my my old home page.

Darius Bacon
Awesome stuff! I plan to check it out.
Robert Harvey
+1  A: 

It's not a language nor a compiler; I tried to write a code analyzer for the D language, but that project died! I wrote a lexer and a parser, after that .. I wasn't sure how can I make use of the AST for further processing.

hasen j
+6  A: 

Over the years, I've designed and implemented a couple of Business Rules Engines for use in the public sector that had their own programming languages.

The first used a functional language and compiled down to a byte-code with a VM engine that ran on both PCs and ICL mainframes.

The second was somewhat grander in terms of scope and functionality (and succeeded the first). It fully supported two languages; the first being the functional language from the previous engine and the second was a Java-derivative. I also did some significant work on three other languages, COBOL (to allow legacy mainframe business rules to run directly in the rules engine), a graphical modelling language and Structured English. None of which found it into the final product, though. A feature of this engine is that the different languages could be combined in a rules set. Ultimately, they were used to derive a single abstract syntax tree and compiled down to IL to run under the .NET Framework.

Steve Morgan

I'm currently working on a Pascal compiler

+3  A: 

I've lost count of the number of languages I've implemented. Some are

  • A few Lisp interpreters

  • A language I called D around 1986, a variation on C that sported continuations.

  • When I was teaching, I had students write a Pascal-subset for compiler class.

  • Numerous DSLs.

Nowadays I usually implement a new language by translating it into an existing compiler language.

For lexers and parsers, I know all about lex and yacc, but personally I prefer to roll my own.

I've come to see that whenever you're programming you're building a language, in a sense, so knowing how to design and build languages is an invaluable skill for any programmer.

[Added] This ties into discussions of code generation and its alter-ego, partial evaluation. I've found that some problems, such as flexible report generation, can be simplified enormously by splitting them into two phases, a code generation phase and an execution phase, even if the code is only a simple interpreted byte code.

Mike Dunlavey
+1, D was not easy to implement.
Tim Post
The lemon parser is a good example of rolling your own, which I believe Blue is currently using (or will).
Tim Post
@tinkertim: My D parser way-predated the current D language. I don't think it was well-known enough to actually claim the letter.
Mike Dunlavey

I implemented a BrainFuck interpreter, but that's really easy. Really, really easy. The code could fit on one screen.

Was the interpreter written in BF?
+1  A: 


I wrote a compiler for a course in college and after that, I wrote:

A subroutine threaded FORTH compiler/interpreter (that line is blurry) for an independent study course.

An implementation of a lazy Scheme in C which included bit fiddling extensions for a seminar course in lazy evaluation.

A 6502 emulator.

The VM for a Prolog compiler.

A PostScript interpreter in C for fun.

A 6800 interpreter to emulate the sound boards in Williams Electronics video games. I had it running in 2x slower than real time on a slowish Macintosh and was intending to rewrite key instructions in assembly to speed it up, but then the PowerPC was released and after recompiling it ran better than real time, so I declared that I was done.

An interpreter with JIT hooks for javascript/VRMLScript for a VRML browser.

A compiler for a non-turing complete predicate language for browser scraping/form filling that compiled into a nice little VM machine (the non-turing completeness guaranteed that all programs would halt!)

A PostScript interpreter in Java which was used as a teaching tool for some high school honors students who were prepping for the AP computer science test - I handed them the module definitions and let them try do the implementation and glue code. If they couldn't work out bugs, I gave them the reference implementation.

+1  A: 

I'm working on a language that I call the Simple Parrot Test Language, or SPTL (pronounced "spittle"). My goal is to understand the Parrot VM and PIR bytecode. I'm blogging about it here.

Bruce Alderman

As a regular follower of the Ruby Quizes, I've implemented compilers/interpreters/VM's for

  • Chip8 [#88],
  • an arbitrary bytecode language [#100],
  • a stub of an Erlang-inspired language [#135]
  • a Turing Machine [#162]
  • Befunge-93 [#184]

Quite some, over the many years,

  • a pascal compiler (for the c64!),
  • a few forth interpreters,
  • a basic compiler (huh),
  • various lisps,
  • a dozen assemblers,
  • a c-compiler,
  • a compiler-compiler,
  • an sql engine,
  • a smalltalk (still makes my living),
  • a prolog,
  • half a dozen graphical languages and
  • even more dsl's embedded in various projects.

Hard to tell which was most fun to do - maybe the forth system for the 6809, as it was created on the naked hardware; not even an assembler avail.; entered hexcode of the first words via the hex-rom-debug-monitor...

Now mentally returning back to becoming a lisp lover again: pure semantics; no syntactic sugar-shit to fight against, the full power of though at your hands...


I wrote a domain specific (compiled) language for a game because it was the best way of building the product. It's was similar in nature to UnrealScript.

Why? Because it was the best solution to the problem of building stable core code that could be used by lesser skilled programmers (such as designers and artists) to promote stability and rapid iteration.

I.e. a bug in the "bullet" code would just cause the bullet to freeze in mid-air before being reaped by the VM rather than causing the game to lockup/crash.

I've also built various other languages for fun (e.g. a Common Lisp interpreter in javascript) because, well, building languages is fun!

Steve Lacey

I once wrote a mini-language by accident.

It started out as a simple state machine (in C) for driving a keyboard controller chip (HP-HIL if anyone cares). By the end of it I'd effectively turned that state machine table into a really bad programming language (having only one data type and having a fixed number of variables, if you will) for controlling the chip.

I was too inexperienced at the time to realize what I was doing, or I'd have made my life easier by making a real mini-language out of it. As it was, you had a VERY wide table with columns for all the possible things you might do (branch, jump, etc). It was actually quite readable, except that you had to keep track of which column was which command.

Michael Kohne