tags:

views:

2092

answers:

7

I am trying to build a Lisp grammar. Easy, right? Apparently not.

I present these inputs and receive errors...

( 1 1)
23 23 23 
ui ui

This is the grammar...

%%
sexpr: atom                 {printf("matched sexpr\n");}
    | list
    ;
list: '(' members ')'       {printf("matched list\n");}
    | '('')'                {printf("matched empty list\n");}
    ;
members: sexpr              {printf("members 1\n");}
    | sexpr members         {printf("members 2\n");}
    ;
atom: ID                    {printf("ID\n");}
    | NUM                   {printf("NUM\n");}
    | STR                   {printf("STR\n");}
    ;
%%

As near as I can tell, I need a single non-terminal defined as a program, upon which the whole parse tree can hang. But I tried it and it didn't seem to work.

edit - this was my "top terminal" approach:

program: slist;

slist: slist sexpr | sexpr;

But it allows problems such as:

( 1 1

Edit2: The FLEX code is...

%{
    #include <stdio.h>
    #include "a.yacc.tab.h"
    int linenumber;
    extern int yylval;
%}
%%
\n                         { linenumber++; }
[0-9]+                     { yylval = atoi(yytext); return NUM; }
\"[^\"\n]*\"               { return STR; }
[a-zA-Z][a-zA-Z0-9]*       { return ID; }
.
%%

An example of the over-matching...

(1 1 1)
NUM
matched sexpr
NUM
matched sexpr
NUM
matched sexpr
(1 1
NUM
matched sexpr
NUM
matched sexpr

What's the error here?

edit: The error was in the lexer.

A: 

Isn't the point of Lisp that it has no syntax, and that it's in effect already parsed for you? I've heard that writing Lisp is like writing directly in a compiler's AST.

dsimcha
Something still has to transform it into actual executing code, which is what I'm trying to do here.
Paul Nathan
You might say Lisp is to the AST what assembly is to machine code. It's not exactly the same thing, but there's a direct correspondence between them. Just as you need an assembler to get executable code, you need a parser for Lisp.
Chuck
+2  A: 

You are correct in that you need to define a non-terminal. That would be defined as a set of sexpr. I'm not sure of the YACC syntax for that. I'm partial to ANTLR for parser generators and the syntax would be:

program: sexpr*

Indicating 0 or more sexpr.

Update with YACC syntax:

program :  /* empty */
        | program sexpr
        ;

Not in YACC, but might be helpful anyway, here's a full grammar in ANTLR v3 that works for the cases you described(excludes strings in the lexer because it's not important for this example, also uses C# console output because that's what I tested it with):

program: (sexpr)*;

sexpr: list
    |  atom            {Console.WriteLine("matched sexpr");}
    ;

list:     
   '('')'              {Console.WriteLine("matched empty list");}
   | '(' members ')'   {Console.WriteLine("matched list");}

    ;

members: (sexpr)+      {Console.WriteLine("members 1");};

atom: Id               {Console.WriteLine("ID");}
    | Num              {Console.WriteLine("NUM");}
    ;


Num: ( '0' .. '9')+;
Id: ('a' .. 'z' | 'A' .. 'Z')+;
Whitespace : ( ' ' | '\r' '\n' | '\n' | '\t' ) {Skip();};

This won't work exactly as is in YACC because YACC generates and LALR parser while ANTLR is a modified recursive descent. There is a C/C++ output target for ANTLR if you wanted to go that way.

Bearddo
Updated with info about my top-level terminal overmatching.
Paul Nathan
+1  A: 

It's been a long time since I worked with YACC, but you do need a top-level non-terminal. Could you be more specific about "tried it" and "it didn't seem to work"? Or, for that matter, what the errors are?

I'd also suspect that YACC might be overkill for such a syntax-light language. Something simpler (like recursive descent) might work better.

David Thornley
Updated.Also, I've used the flex/bison tools before, so it makes my life easier.
Paul Nathan
I would definitely agree with the recursive descent parser for lisp - or even a push-down automaton
Eclipse
Okay, I'm stumped, it shouldn't be accepting unbalanced parentheses. I assume you're using lex/flex; how does lex/flex define ID? What you have looks good to me, so I'd like to see more of what you're doing.
David Thornley
+3  A: 

Lisp grammar can not be represented as context-free grammar, and yacc can not parse all lisp code. It is because of lisp features such as read-evaluation and programmable reader. So, in order just to read an arbitrary lisp code, you need to have a full lisp running. This is not some obscure, non-used feature, but it is actually used. E.g., CL-INTERPOL, CL-SQL.

If the goal is to parse a subset of lisp, then the program text is a sequence of sexprs.

dmitry_vk
Actually, that depends on the version of Lisp. If you're not targetting Common Lisp, or you're trying to bootstrap, a CFG can be a good place to start.
David Thornley
+1  A: 

You could try this grammar here.

Eclipse
+4  A: 

The error is really in the lexer. Your parentheses end up as the last "." in the lexer, and don't show up as parentheses in the parser.

Add rules like

\)     { return RPAREN; }
\(     { return LPAREN; }

to the lexer and change all occurences of '(', ')' to LPAREN and RPAREN respectively in the parser. (also, you need to #define LPAREN and RPAREN where you define your token list)

Note: I'm not sure about the syntax, could be the backslashes are wrong.

jpalecek
A: 

Do you neccesarily need a yacc/bison parser? A "reads a subset of lisp syntax" reader isn't that hard to implement in C (start with a read_sexpr function, dispatch to a read_list when you see a '(', that in turn builds a list of contained sexprs until a ')' is seen; otherwise, call a read_atom that collects an atom and returns it when it can no longer read atom-constituent characters).

However, if you want to be able to read arbritary Common Lisp, you'll need to (at the worst) implement a Common Lisp, as CL can modify the reader run-time (and even switch between different read-tables run-time under program control; quite handy when you're wanting to load code written in another language or dialect of lisp).

Vatine
I'm not shooting for common lisp. My desire is that eventually I can load this onto an embedded device and write/rewrite/compose control functions on the fly. I'm using flex/bison because I know how to use them. Not because they are the best option, per se.
Paul Nathan