views:

325

answers:

3

Using Flex and Bison, I have a grammar specification for a boolean query language, which supports logical "and", "or", and "not" operations, as well as nested subexpressions using "()".

All was well until I noticed that queries like "A and B or C and D" which I'd like parsed as "(A & B) | (C & D)" was actually being interpreted as "A & ( B | ( C & D ) )". I'm nearly certain this is an associativity issue, but can't seem to find a proper explanation or example anywhere - that or I'm missing something important.

Pertinent information from boolpars.y:

%token TOKEN
%token OPEN_PAREN CLOSE_PAREN
%right NOT
%left AND
%left OR

%%

query:      expression                          { ... }
            ;

expression: expression AND expression           { ... }
            | expression OR expression          { ... }
            | NOT expression                    { ... }
            | OPEN_PAREN expression CLOSE_PAREN { ... }
            | TOKEN                             { ... }
            ;

Can anyone find the flaw? I can't see why Bison isn't giving "or" appropriate precedence.

+3  A: 

From bison docs:

Operator precedence is determined by the line ordering of the declarations; the higher the line number of the declaration (lower on the page or screen), the higher the precedence.

So in your case OR is lower on the screen and has higher precedence. Change the order to

%left OR
%left AND

(I haven't tested it though)

marcin
That's the weird part - I've tried it, and the output is the same. (Good finding that particular sentence though)
Chris
Simeon Pilgrim
+1  A: 

Why not split up the productions, as in this snippet from a C-ish language

logical_AND_expression:
    inclusive_OR_expression
    | logical_AND_expression ANDAND inclusive_OR_expression
            {$$ = N2(__logand__, $1, $3);}
    ;

logical_OR_expression:
    logical_AND_expression
    | logical_OR_expression OROR logical_AND_expression
            {$$ = N2(__logor__, $1, $3);}
    ;
themis
+1  A: 

I've performed tests on my own implementation, and from my tests, marcin's answer is correct. If I define the precedence as:

%left OR
%left AND

Then the expression A&B|C&D will be reduced to ((A&B)|(C&D))

If I define the precedence as:

%left AND
%left OR

Then the expression A&B|C&D will be reduced to ((A&(B|C))&D)

One differentiating expression would be:

true & true | true & false

The former precedence definition would render this as true, whereas the latter would render it as false. I've tested both scenarios and both work as explained.

Double check your tests to make sure. Also note that it is the order of the %left, %right, etc. definitions in the header portion that define the precedence, not the order that you define your rules themselves. If it's still not working, maybe it's some other area in your code that's messing it up, or maybe your version of bison is different (I'm just shooting in the dark at this point).

Anthony Johnson
Simeon Pilgrim