views:

1773

answers:

5

I find myself attached to a project to integerate an interpreter into an existing application. The language to be interpreted is a derivative of Lisp, with application-specific builtins. Individual 'programs' will be run batch-style in the application.

I'm surprised that over the years I've written a couple of compilers, and several data-language translators/parsers, but I've never actually written an interpreter before. The prototype is pretty far along, implemented as a syntax tree walker, in C++. I can probably influence the architecture beyond the prototype, but not the implementation language (C++). So, constraints:

  • implementation will be in C++
  • parsing will probably be handled with a yacc/bison grammar (it is now)
  • suggestions of full VM/Interpreter ecologies like NekoVM and LLVM are probably not practical for this project. Self-contained is better, even if this sounds like NIH.

What I'm really looking for is reading material on the fundamentals of implementing interpreters. I did some browsing of SO, and another site known as Lambda the Ultimate, though they are more oriented toward programming language theory.

Some of the tidbits I've gathered so far:

  • Lisp in Small Pieces, by Christian Queinnec. The person recommending it said it "goes from the trivial interpreter to more advanced techniques and finishes presenting bytecode and 'Scheme to C' compilers."

  • NekoVM. As I've mentioned above, I doubt that we'd be allowed to incorporate an entire VM framework to support this project.

  • Structure and Interpretation of Computer Programs. Originally I suggested that this might be overkill, but having worked through a healthy chunk, I agree with @JBF. Very informative, and mind-expanding.

  • On Lisp by Paul Graham. I've read this, and while it is an informative introduction to Lisp principles, is not enough to jump-start constructing an interpreter.

  • Parrot Implementation. This seems like a fun read. Not sure it will provide me with the fundamentals.

  • New! Scheme from Scratch. Peter Michaux is attacking various implementations of Scheme, from a quick-and-dirty Scheme interpreter written in C (for use as a bootstrap in later projects) to compiled Scheme code. Very interesting so far.

So how about it? Is there a good book that takes the neophyte by the hand and shows how to build an interpreter in C/C++ for a Lisp-like language? Do you have a preference for syntax-tree walkers or bytecode interpreters?

To answer @JBF:

  • the current prototype is an interpreter, and it makes sense to me as we're accepting a path to an arbitrary code file and executing it in our application environment. The builtins are used to affect our in-memory data representation.

  • it should not be hideously slow. The current tree walker seems acceptable.

  • The language is based on Lisp, but is not Lisp, so no standards compliance required.

  • As mentioned above, it's unlikely that we'll be allowed to add a full external VM/interpreter project to solve this problem.

To the other posters, I'll be checking out your citations as well. Thanks, all!

+2  A: 

Check out JScheme from Peter Norvig. I found this amazingly simple to understand and port to C++. Uh, dunno about using scheme as a scripting language though - teaching it to jnrs is cumbersome and feels dated (helloooo 1980's).

CVertex
+2  A: 

The Kamin Interpreters from Samuel Kamin's book Programming Languages, An Interpreter-Based Approach, translated to C++ by Timothy Budd. I'm not sure how useful the bare source code will be, as it was meant to go with the book, but it's a fine book that covers the basics of implementing Lisp in a lower-level language, including garbage collection, etc. (That's not the focus of the book, which is programming languages in general, but it is covered.)

Lisp in Small Pieces goes into more depth, but that's both good and bad for your case. There's a lot of material on compiling and such that won't be relevant to you, and its simpler interpreters are in Scheme, not C++.

SICP is good, definitely. Not overkill, but of course writing interpreters is only a small fraction of the book.

The JScheme suggestion is a good one, too (and it incorporates some code by me), but won't help you with things like GC.

I might flesh this out with more suggestions later.

Edit: A few people have said they learned from my awklisp. This is admittedly kind of a weird suggestion, but it's very small, readable, actually usable, and unlike other tiny-yet-readable toy Lisps it implements its own garbage collector and data representation instead of relying on an underlying high-level implementation language to provide them.

Darius Bacon
+5  A: 

Short answer:

The fundamental reading list for a lisp interpreter is SICP. I would not at all call it overkill, if you feel you are overqualified for the first parts of the book jump to chapter 4 and start interpreting away (although I feel this would be a loss since chapters 1-3 really are that good!).

Add LISP in Small Pieces (LISP from now on), chapters 1-3. Especially chapter 3 if you need to implement any non-trivial control forms.

See this post by Jens Axel Søgaard on a minimal self-hosting Scheme: http://www.scheme.dk/blog/2006/12/self-evaluating-evaluator.html .

A slightly longer answer:

It is hard to give advice without knowing what you require from your interpreter.

  • does it really really need to be an interpreter, or do you actually need to be able to execute lisp code?
  • does it need to be fast?
  • does it need standards compliance? Common Lips? R5RS? R6RS? Any SFRIs you need?

If you need anything more fancy than a simple syntax tree walker I would strongly recommend embedding a fast scheme subsystem. Gambit scheme comes to mind: http://dynamo.iro.umontreal.ca/~gambit/wiki/index.php/Main_Page .

If that is not an option chapter 5 in SICP and chapters 5-- in LISP target compilation for faster execution.

For faster interpretation I would take a look at the most recent JavaScript interpreters/compilers. There seem to be a lot of thought going into fast JavaScript execution, and you can probably learn from them. V8 cites two important papers: http://code.google.com/apis/v8/design.html and squirrelfish cites a couple: http://webkit.org/blog/189/announcing-squirrelfish/ .

There is also the canonical scheme papers: http://library.readscheme.org/page1.html for the RABBIT compiler.

If I engage in a bit of premature speculation, memory management might be the tough nut to crack. Nils M Holm has published a book "Scheme 9 from empty space" http://www.t3x.org/s9fes/ which includes a simple stop-the-world mark and sweep garbage collector. Source included.

John Rose (of newer JVM fame) has written a paper on integrating Scheme to C: http://library.readscheme.org/servlets/cite.ss?pattern=AcmDL-Ros-92 .

JBF
+3  A: 

Yes on SICP.

I've done this task several times and here's what I'd do if I were you:

Design your memory model first. You'll want a GC system of some kind. It's WAAAAY easier to do this first than to bolt it on later.

Design your data structures. In my implementations, I've had a basic cons box with a number of base types: atom, string, number, list, bool, primitive-function.

Design your VM and be sure to keep the API clean. My last implementation had this as a top-level API (forgive the formatting - SO is pooching my preview)

ConsBoxFactory &GetConsBoxFactory() { return mConsFactory; }
AtomFactory &GetAtomFactory() { return mAtomFactory; }
Environment &GetEnvironment() { return mEnvironment; }
t_ConsBox *Read(iostream &stm);
t_ConsBox *Eval(t_ConsBox *box);
void Print(basic_ostream<char> &stm, t_ConsBox *box);
void RunProgram(char *program);
void RunProgram(iostream &stm);

RunProgram isn't needed - it's implemented in terms of Read, Eval, and Print. REPL is a common pattern for interpreters, especially LISP.

A ConsBoxFactory is available to make new cons boxes and to operate on them. An AtomFactory is used so that equivalent symbolic atoms map to exactly one object. An Environment is used to maintain the binding of symbols to cons boxes.

Most of your work should go into these three steps. Then you will find that your client code and support code starts to look very much like LISP too:

t_ConsBox *ConsBoxFactory::Cadr(t_ConsBox *list)
{
    return Car(Cdr(list));
}

You can write the parser in yacc/lex, but why bother? Lisp is an incredibly simple grammar and scanner/recursive-descent parser pair for it is about two hours of work. The worst part is writing predicates to identify the tokens (ie, IsString, IsNumber, IsQuotedExpr, etc) and then writing routines to convert the tokens into cons boxes.

Make it easy to write glue into and out of C code and make it easy to debug issues when things go wrong.

plinth
+2  A: 

I would like to extend my recommendation for Programming Languages: Application and Interpretation. If you want to write an interpreter, that book takes you there in a very short path. If you read through writing the code you read and doing the exercise you end up with a bunch of similar interpreters but different (one is eager, the other is lazy, one is dynamic, the other has some typing, one has dynamic scope, the other has lexical scope, etc).

J. Pablo Fernández
Thanks. This is actually in my queue, behind SICP. At the rate I am progressing through SICP it will be awhile, but it will happen.
Don Wakefield