Hi there. I'm using GNU Bison 2.4.2 to write a grammar for a new language I'm working on, and I have a question. Suppose I specify a rule such as:

statement : T_CLASS T_IDENT  '{' T_CLASS_MEMBERS '}' {
           // create a node for the statement ...
}

and then have a variation on that rule, for instance:

statement : T_CLASS T_IDENT T_EXTENDS T_IDENT_LIST  '{' T_CLASS_MEMBERS '}' {
           // create a node for the statement ...
}

where the tokens come from these flex scanner rules:

"class"                     return T_CLASS;
"extends"                   return T_EXTENDS;
[a-zA-Z_][a-zA-Z0-9_]*      return T_IDENT;

(T_IDENT_LIST is a rule for comma-separated identifiers.)

Is there any way to specify all of this in a single rule, somehow marking "T_EXTENDS T_IDENT_LIST" as optional? I've already tried:

 T_CLASS T_IDENT (T_EXTENDS T_IDENT_LIST)? '{' T_CLASS_MEMBERS '}' {
     // create a node for the statement ...
 } 

But Bison gave me an error.

Thanks

+1  A: 

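To make a long story short, no. Bison's rule syntax is plain BNF, with no grouping parentheses and no "?" operator, so an optional part has to become its own nonterminal with an empty alternative. (By default Bison generates an LALR(1) parser, which uses one symbol of lookahead, but that is not what rules out the "( ... )?" notation.) What you need is something like this: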

statement: T_CLASS T_IDENT extension_list '{' ...

extension_list: /* empty */
              | T_EXTENDS T_IDENT_LIST
              ;
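
As a rough sketch of how the actions could look with this layout (not taken from the question): make_class_node is a hypothetical helper, and the sketch assumes the semantic values are pointers, with the needed %union and %type declarations omitted. The empty alternative sets its value explicitly, since an empty rule has no $1 for the default action to copy, and the statement action can then test $3 for NULL:

statement: T_CLASS T_IDENT extension_list '{' T_CLASS_MEMBERS '}'
             { $$ = make_class_node($2, $3, $5); }  /* hypothetical helper; $3 is NULL when there is no "extends" */
         ;

extension_list: /* empty */             { $$ = NULL; }  /* no extends clause */
              | T_EXTENDS T_IDENT_LIST  { $$ = $2; }    /* pass the identifier list up */
              ;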

There are other parser generators that work with more general grammars though. If memory serves, some of them support optional elements relatively directly like you're asking for.

Jerry Coffin
That was the solution to write only one rule without the | :) Thanks!
Simone Margaritelli
A: 

I think the most you can do is

statement : T_CLASS T_IDENT  '{' T_CLASS_MEMBERS '}'
    | T_CLASS T_IDENT T_EXTENDS T_IDENT_LIST  '{' T_CLASS_MEMBERS '}' {
}
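
One thing to watch with this form: in Bison an action belongs to a single alternative, so the braces above apply only to the second variant, and the first falls back to the default action ($$ = $1). A sketch with one action per alternative, where make_class_node is a hypothetical helper and the semantic values are assumed to be pointers:

statement : T_CLASS T_IDENT '{' T_CLASS_MEMBERS '}'
              { $$ = make_class_node($2, NULL, $4); }  /* no extends clause */
          | T_CLASS T_IDENT T_EXTENDS T_IDENT_LIST '{' T_CLASS_MEMBERS '}'
              { $$ = make_class_node($2, $4, $6); }    /* $4 is the identifier list */
          ;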
Michael Krelin - hacker
A: 

Why don't you just split them using the choice (|) operator?

statement:
  T_CLASS T_IDENT T_EXTENDS T_IDENT_LIST  '{' T_CLASS_MEMBERS '}'
  | T_CLASS T_IDENT  '{' T_CLASS_MEMBERS '}'

I don't think you can do it in a single rule, because Bison's grammar syntax has no operator for optional parts. You would need a different tool with EBNF-style syntax, such as ANTLR (an LL parser generator), to write it the way you want.

Jack