views:

389

answers:

4

As is explained in http://stackoverflow.com/questions/2652060/removing-left-recursion , there are two ways to remove the left recursion.

  • Modify the original grammar to remove the left recursion using some procedure
  • Write the grammar originally not to have the left recursion

What people normally use for removing (not having) the left recursion with ANTLR? I've used flex/bison for parser, but I need to use ANTLR. The only thing I'm concerned about using ANTLR (or LL parser in genearal) is left recursion removal.

  • In practical sense, how serious of removing left recursion in ANTLR? Is this a showstopper in using ANTLR? Or, nobody cares about it in ANTLR community?
  • I like the idea of AST generation of ANTLR. In terms of getting AST quick and easy way, which method (out of the 2 removing left recursion methods) is preferable?

Added

I did some experiment with the following grammar.

E -> E + T|T
T -> T * F|F
F -> INT | ( E )

After left recursion removal, I get the following one

E -> TE'
E' -> null | + TE'
T -> FT'
T' -> null | * FT'

I could come up with the following ANTLR representation. Even though, It's relatively pretty simple and straightforward, it seems the grammar that doesn't have the left recursion should be the better way to go.

grammar T;

options {
    language=Python;
}

start returns [value]
   : e {$value = $e.value};
e returns [value]
   : t ep  
     {
       $value = $t.value
       if $ep.value != None:
         $value += $ep.value
     }
   ;
ep returns [value]
   : {$value = None}
   | '+' t r = ep 
     {
       $value = $t.value
       if $r.value != None:
            $value += $r.value
     }
   ;
t returns [value]
  : f tp 
    {
      $value = $f.value
      if $tp.value != None:
        $value *= $tp.value
    }
  ;
tp returns [value]
  : {$value = None}
  | '*' f r = tp 
    {
      $value = $f.value;
      if $r.value != None:
        $value *= $r.value
    }
  ;
f returns [int value]
  : INT {$value = int($INT.text)}
  | '(' e ')' {$value = $e.value}
  ;

INT :   '0'..'9'+ ;
WS: (' '|'\n'|'\r')+ {$channel=HIDDEN;} ;
+1  A: 

If you are writing the grammar, then of course you try to write it to avoid the pitfalls of your particular parser generator.

Usually, in my experience, I get some reference manual for the (legacy) language of interest, and it already contains a grammar or railroad diagrams, and it is what it is.

In that case, pretty much left recursion removal from a grammar is done by hand. There's no market for left-recursion-removal tools, and if you had one, it would be specialized to a grammar syntax that didn't match the grammar syntax you have.

Doing this removal is mostly a matter of sweat in many cases, and there isn't usually tons of it. So the usual approach is get out your grammar knife and have at it.

I don't think how you remove left recursion changes how ANTLR gets trees. You have to do the left recursion first, or ANTLR (whatever LL parser generator you are using) simply won't accept your grammar.

There are those of us that don't want the parser generator to put any serious constraints on what we can write for a context free grammar. In this case you want to use something like a GLR parser generator, which handles left- or right-recursion with ease. Unreasonable people can even insist on automated AST generation with no effort on the part of the grammar writer. For a tool that can do both, see DMS Software Reengineering Toolkit.

Ira Baxter
+1  A: 

In practical sense, how serious of removing left recursion in ANTLR? Is this a showstopper in using ANTLR?

I think that you have a misunderstanding of left-recursion. It is a property of the grammar, not of the parser generator or the interaction between the parser generator and the specification. It happens when the first symbol on the right side of a rule is equal to the nonterminal corresponding to the rule itself.

To understand the inherent problem here, you need to know something about how a recursive-descent (LL) parser works. In an LL parser, the rule for each nonterminal symbol is implemented by a function corresponding to that rule. So, suppose I have a grammar like this:

S -> A B
A -> a
B -> b

Then, the parser would look (roughly) like this:

boolean eat(char x) {
  // if the next character is x, advance the stream and return true
  // otherwise, return false
}

boolean S() {
  if (!A()) return false;
  if (!B()) return false;
  return true;
}

boolean A(char symbol) {
  return eat('a');
}

boolean B(char symbol) {
  return eat('b');
}

However, what happens if I change the grammar to be the following?

S -> A B
A -> A c | null
B -> b

Presumably, I want this grammar to represent a language like c*b. The corresponding function in the LL parser would look like this:

boolean A() {
  if (!A()) return false;  // stack overflow!  We continually call A()
                           // without consuming any input.
  eat('c');
  return true;
}

So, we can't have left-recursion. Rewrite the grammar as:

S -> A B
A -> c A | null
B -> b

and the parser changes as such:

boolean A() {
  if (!eat('c')) return true;
  A();
  return true;
}

(Disclaimer: this is my rudimentary approximation of an LL parser, meant only for demonstration purposes regarding this question. It has obvious bugs in it.)

danben
nicely written. I'm fairly new to antlr, this is quite helpful.
Steve B.
+2  A: 

Consider something like a typical parameter list:

parameter_list: parameter
              | parameter_list ',' parameter
              ;

Since you don't care about anything like precedence or associativity with parameters, this is fairly easy to convert to right recursion, at the expense of adding an extra production:

parameter_list: parameter more_params
              ;

more_params:
           | ',' parameter more_params
           ;

For the most serious cases, you might want to spend some time in the Dragon Book. Doing a quick check, this is covered primarily in chapter 4.

As far as seriousness goes, I'm pretty sure ANTLR simply won't accept a grammar that contains left recursion, which would put it into the "absolute necessity" category.

Jerry Coffin
A: 

I can't speak for ANTLR, but in general, the steps to eliminate a left recursion of the form:

A -> A B
  -> B

is to change it to be:

A -> B*

or, if ANTLR doesn't support the Kleene closure, you can do:

A -> B B'

B' -> B B'
   -> 

If you provide an example of your rules that are having conflicts, I can provide a better, more specific answer.

samoz