views:

62

answers:

2

So, I'm writing an LL(1) parser (for kicks) and I ran into a problem with the member selection operator ("." as in "var.function()"). Here's a simplified grammar I'm trying to analyze with the parser:

postfix_expression
    : postfix_expression '(' ')'
    | postfix_expression '.' identifier
    | identifier
    ;

The above grammar is left-recursive AND needs to be left-factored in order to be parsed by an LL(1) parser. Here is my attempt at solving this:

postfix_expression
    : identifier postfix_expression_tail
    ;

postfix_expression_tail
    | '(' ')' postfix_expression_tail
    | '.' postfix_expression
    | <empty>
    ;

I think that this grammar works, but is there an easier way to do this? EDIT: Does this grammar actually accept the same language as the first grammar?

PS:

I know recursive descent parsers can't handle left-recursion, i.e.,:

E -> E + T
E -> T

So instead you have to do some transformations to use right-recursion instead, and then process the resulting parse tree to make it left-associative again.

+1  A: 

What do you want to be simpler: the grammar you came up with, or do you want help with removing left recursion in general? I don't think the transformed grammar going to get any simpler than that. It makes perfect sense as is.

John Kugelman
Help with removing left-recursion in general. I'm not sure if the grammar I wrote is correct.
Matt Fichman
A: 

You can do some rewriting to eliminate direct left recursion, but eliminating indirect left recursion is harder. Especially if the grammar is cyclic, or contains epsilon rules.

They will accept the same language, but won't give the same parse tree. Top down parsing produces right associative trees and bottom up parsing produces left associative trees.

And yes, you have to we-write the parse tree afterwards to get some operators to have a left associative parse.

Alternatively, You can use operator precedence to force left or right associativity in the parse. If you are using recursive decent, you can implement this in a number of ways: http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm

Elminating left recursion is one way, but there are a number of other ways to deal with left recursion in top down parsers.

Memoization: Packrat parsers can use a modified memoization routine to enable left recursion (but without the linear time guarantee), and cancellation parsing uses memoization to prevent loops in top down parsing. You can also prevent loops by assigning a token cost to each left recursion, and stopping when there are no more tokens in the file.

Most of the existing left recursion supporting parsing combinator libraries use one of these approaches.

Another way is using the left corner of the grammar. A left corner of a grammar rule is a terminal or non terminal that can appear at the very front of a production. We can build an automata to represent how these left corners are linked,

I.e.

E = E + E
E = I
I = 1|0

1,0 are left corners of I, and I,1,0,E are left corners of E. The left spline of the grammar looks like this.

        E
       /
      E*
    /   \ ...[+ E]
   I
 /  \
1    0

I'm being really lazy and not putting all the state transformations in.

We can use this information in a top down or bottom up way.

In the top down method, we start at the top and descend until we parse successfully, on the ascent, we use the left recursive rules to rewrite the tree.

I.e 1 + 0 is parsed E --> E* --> I --> 1 for 1, then at E*, we look ahead for (1), and then try to parse E again at this point.

You can think of it as collapsing all left recursion into superposition, which you don't resolve until you've parsed the left hand side. I.e, the E* state represents, some E and then apply left recursive rules.

In a bottom up manner, we start at the bottom of the tree and climb it, and then descend when there are possible left recursive rules to apply. i.e we do 1 --> I --> E* and then look ahead for '+' before ascending to E.

Left corner parsers can be implemented in a number of ways, including a variant of a LR state machine, often called LC. It can also be implemented directly using a mixture of recursive ascent and descent.

And there is something called the left corner transform, which transforms left recursion into right recursive rules, but does this by having special parse rules which can take and argument and use it in it's production.

1729