views:

54

answers:

2

I'm building a parser for a language I've designed, in which type names start with an upper case letter and variable names start with a lower case letter, such that the lexer can tell the difference and provide different tokens. Also, the string 'this' is recognised by the lexer (it's an OOP language) and passed as a separate token. Finally, data members can only be accessed on the 'this' object, so I built the grammar as so:

%token TYPENAME
%token VARNAME
%token THIS

%%

start:
    Expression
    ;

Expression:
    THIS
    | THIS '.' VARNAME
    | Expression '.' TYPENAME
    ;
%%

The first rule of Expression allows the user to pass 'this' around as a value (for example, returning it from a method or passing to a method call). The second is for accessing data on 'this'. The third rule is for calling methods, however I've removed the brackets and parameters since they are irrelevant to the problem. The originally grammar was clearly much larger than this, however this is the smallest part that generates the same error (1 Shift/Reduce conflict) - I isolated it into its own parser file and verified this, so the error has nothing to do with any other symbols.

As far as I can see, the grammar given here is unambiguous and so should not produce any errors. If you remove any of the three rules or change the second rule to

Expression '.' VARNAME

there is no conflict. In any case, I probably need someone to state the obvious of why this conflict occurs and how to resolve it.

+3  A: 

The problem is that the grammar can only look one ahead. Therefore when you see a this then a . are you line 2 or line 3 (via a line 1).

The grammar could reduce this. to expression. and then look for a typename or shift it to this. and look for a varname, but it has to decide when it gets to the ..

Simeon Pilgrim
+1  A: 

I try to avoid y.output but sometimes it does help. I looked at the file it produced and saw.

state 1

    2 Expression: THIS.  [$end, '.']
    3           | THIS . '.' VARNAME

    '.'  shift, and go to state 4

    '.'       [reduce using rule 2 (Expression)]
    $default  reduce using rule 2 (Expression)

Basically it is saying it sees '.' and can reduce or it can shift. Reduce makes me anrgu sometimes because they are hard to fine. The shift is rule 3 and is obvious (but the output doesnt mention the rule #). The reduce where it see's '.' in this case is the line

| Expression '.' TYPENAME

When it goes to Expression it looks at the next letter (the '.') and goes in. Now it sees THIS | so when it gets to the end of that statement it expects '.' when it leaves or an error. However it sees THIS '.' while its between this and '.' (hence the dot in the out file) and it CAN reduce a rule so there is a path conflict. I believe you can use %glr-parser to allow it to try both but the more conflicts you have the more likely you'll either get unexpected output or an ambiguity error. I had ambiguity errors in the past. They are annoying to deal with especially if you dont remember what rule caused or affected them. it is recommended to avoid conflicts.

I highly recommend this book before attempting to use bison.

I cant think of a 'great' solution but this gives no conflicts

start:
    ExpressionLoop
    ;

ExpressionLoop:
      Expression
    | ExpressionLoop ';' Expression
    ;
Expression:
      rval 
    | rval '.' TYPENAME
    | THIS //trick is moving this AWAY so it doesnt reduce
rval:

    THIS '.' VARNAME

Alternative you can make it reduce later by adding more to the rule so it doesnt reduce as soon or by adding a token after or before to make it clear which path to take or fails (remember, it must know BEFORE reducing ANY path)

start:
    ExpressionLoop
    ;

ExpressionLoop:
      Expression
    | ExpressionLoop ';' Expression
    ;
Expression:
      rval 
    | rval '.' TYPENAME
rval:
      THIS '@'
    | THIS '.' VARNAME
%%

-edit- note if i want to do func param and type varname i cant because type according to the lexer func is a Var (which is A-Za-z09_) as well as type. param and varname are both var's as well so this will cause me a reduce/reduce conflict. You cant write this as what they are, only what they look like. So keep that in mind when writing. You'll have to write a token to differentiate the two or write it as one of the two but write additional logic in code (the part that is in { } on the right side of the rules) to check if it is a funcname or a type and handle both those case.

acidzombie24