views:

309

answers:

1

I've been using PLY to build up a parser for my language, however I've got a shift/reduce conflict that's causing me some trouble. My language has generic types with a syntax ala C++ templates. So right now I have rules like:

    expression : expression LESS expression %prec COMPARISON
    expression : template
    template : NAME
             | NAME LESS templates GREATER
    templates : template
              | templates COMMA template

However, I've found that it's unable to parse:

a < 2

(which is a problem for obvious reasons). The following is the debug output:

PLY: PARSE DEBUG START

State  : 0
Stack  : . <Token: 'NAME' 'a'>
Action : Shift and goto state 42

State  : 42
Stack  : NAME . <Token: 'LESS' '<'>
Action : Shift and goto state 81

State  : 81
Stack  : NAME LESS . <Token: 'NUMBER' '2'>
ERROR: Error  : NAME LESS . <Token: 'NUMBER' '2'>

If more of my parser is needed I can provide it. Thanks.

EDIT: One solution that was suggested to me was to make types their own token. This would require a little bit of work because my language doesn't use a preprocessor include system like C/C++ however I think it would still be possible, I'd prefer a solution restricted to the grammar however.

+1  A: 

Yacc parsers are not particularly powerful and attempting a context-free parse may be asking too much of it. I suggest using some kind of a trick to make yacc act like it is parsing a context-sensitive grammar, or, don't try to use the parser to enforce every syntax rule.

  • Add Context
    Recognize when you are parsing a type, set a flag or call a method to communicate this to the scanner, and then return a different terminal symbol for < and > in this case.
  • Simplify Grammar
    Alternatively, go ahead and just use a unified expression/template syntax for part of the template production, and error out anything other than template syntax in code. The parser is the least-capable part of your system, so push work into code where possible. (No limits to code, lots of limits to yacc.)

I'm not saying that these are your only choices. If you spent days puzzling over the state tables and tweaking the grammar to the point where yacc is happy with it, I imagine you would be "successful", but it's not worth it. At that point you may as well have just written a recursive descent parser. (RD is more lines of code, and you don't get to see the grammar neatly laid out in BNFish yacc, but at least you can parse anything and you never get bogged down with "it's not working" puzzles.)

Does Python have any equivalent to Ruby's Treetop? That would solve the problem. Bison's %glr-parser feature can also "solve" problems like this, albeit in a rather BFI way.

DigitalRoss
Unfortunately without using a specific template rule the grammar is for more ambiguous and fails on more trivial cases. I think you're right that I need to add context, which is easy for the simple cases (builtin types, or types defined in that file), but difficult for things included using my import system. Oh well :)
Alex Gaynor