views:

1034

answers:

4

Hello,

The following simple "calculator expression" grammar (BNF) can be easily parsed with the a trivial recursive-descent parser, which is predictive LL(1):

<expr>      :=  <term> + <term>
            |   <term> - <term>
            |   <term>
<term>      :=  <factor> * <factor>
                <factor> / <factor>
                <factor>
<factor>    :=  <number>
            |   <id>
            |   ( <expr> )
<number>    :=  \d+
<id>        :=  [a-zA-Z_]\w+

Because it is always enough to see the next token in order to know the rule to pick. However, suppose that I add the following rule:

<command>   :=  <expr>
            |   <id> = <expr>

For the purpose of interacting with the calculator on the command line, with variables, like this:

calc> 5+5
=> 10
calc> x = 8
calc> 6 * x + 1
=> 49

Is it true that I can not use a simple LL(1) predictive parser to parse <command> rules ? I tried to write the parser for it, but it seems that I need to know more tokens forward. Is the solution to use backtracking, or can I just implement LL(2) and always look two tokens forward ?

How to RD parser generators handle this problem (ANTLR, for instance) ?

Thanks in advance

+1  A: 

The problem is that the grammar:


<command>   :=  <expr>
            |   <id> = <expr>

is not a mutually-recursive procedure. For a recursive decent parser you will need to determine a non-recursive equivalent.

rdentato post's shows how to fix this, assuming you can play with the grammar. This powerpoint spells out the problem in a bit more detail and shows how to correct it: http://www.google.com/url?sa=t&amp;source=web&amp;ct=res&amp;cd=7&amp;url=http%3A%2F%2Fxml.cs.nccu.edu.tw%2Fcourses%2Fcompiler%2Fcp2006%2Fslides%2Flec3-Parsing%26TopDownParsing.ppt&amp;ei=-YLaSPrWGaPwhAK5ydCqBQ&amp;usg=AFQjCNGAFrODJxoxkgJEwDMQ8A8594vn0Q&amp;sig2=nlYKQVfakmqy_57137XzrQ

mmattax
Can you elaborate ? I don't recall any limitations of mutual-recursiveness, and the Dragon book doesn't mention it.
Eli Bendersky
+3  A: 

THe problem with

<command>   :=  <expr>
            |   <id> = <expr>

is that when you "see" <id> you can't tell if it's the beginning of an assignement (second rule) or it's a "<factor>". You will only know when you'll read the next token.

AFAIK ANTLR is LL(*) (and is also able to generate rat-pack parsers if I'm not mistaken) so it will probably handle this grammare considering two tokens at once.

If you can play with the grammar I would suggest to either add a keyword for the assignment (e.g. let x = 8) :

<command>   :=  <expr>
            |   "let" <id> "=" <expr>

or use the = to signify evaluation:

<command>   :=  "=" <expr>
            |   <id> "=" <expr>
Remo.D
+1  A: 

ANTLR 3 uses a "LL(*)" parser as opposed to a LL(k) parser, so it will look ahead until it reaches the end of the input if it has to, without backtracking, using a specially optimized determinstic finite automata (DFA).

Mark Cidade
A: 

I think there are two ways to solve this with a recursive descent parser: either by using (more) lookahead or by backtracking.

Lookahead

command() {
    if (currentToken() == id && lookaheadToken() == '=') {
        return assignment();
    } else {
        return expr();
    }
}

Backtracking

command() {
    savedLocation = scanLocation();
    if (accept( id )) {
         identifier = acceptedTokenValue();
         if (!accept( '=' )) {
             setScanLocation( savedLocation );
             return expr();
         }
         return new assignment( identifier, expr() );
    } else {
         return expr();
    }
}
Sven