views: 858

answers: 3

I'm using Flex and Bison to generate a parser, but I'm having problems with the start conditions (start states) in my scanner.

I'm using an exclusive start condition to handle comments, but the comment rules don't seem to consume quoted tokens such as "==":

%x COMMENT

"//"                  { BEGIN(COMMENT); }
<COMMENT>[^\n]        ;
<COMMENT>\n           { BEGIN(INITIAL); }

"=="                  { return EQUALEQUAL; }

.                     ;

In this simple example the line:

// a == b

isn't matched entirely as a comment, unless I include this rule:

<COMMENT>"=="             ;

How do I get around this without having to add every token to my exclusive state's rules?

+4  A: 

Matching C-style comments in Lex/Flex is well documented, both in the Flex manual and in many variations found around the Internet.

Here is a variation on the example in the Flex documentation, adapted for // comments:

%x IN_COMMENT

<INITIAL>{
"//"      BEGIN(IN_COMMENT);
}
<IN_COMMENT>{
\n        BEGIN(INITIAL);
[^\n]+    /* eat comment */
}
Aiden Bell
I'd rather not have to use inclusive states if it can be avoided, as I have a lot of rules. The problem is this 'eat comment' rule doesn't seem to match tokens with more than one character (such as ==).
Dan
Then I think you might be doing something wrong. You need to create a 'sub parser' for comments, which does not match your normal tokens.
Aiden Bell
+1  A: 

Try adding a "+" after the [^\n] rule. I don't know why the "==" rule is still matching inside an exclusive state, but apparently it is. Flex normally uses the rule that matches the most text, and adding the "+" makes the comment rule match at least as much text as "==" (usually more, since it runs to the end of the line). Putting the COMMENT rule first will cause it to be used in case of a tie.

Darryl
A: 

The clue is:

The problem is this 'eat comment' rule doesn't seem to match tokens with more than one character

so add a * to match zero or more non-newline characters. You want zero, otherwise an empty comment will not match.

%x COMMENT

"//"                  { BEGIN(COMMENT); }
<COMMENT>[^\n]*        ;
<COMMENT>\n           { BEGIN(INITIAL); }

"=="                  { return EQUALEQUAL; }

.                     ;
Simeon Pilgrim
An empty comment won't trigger a match for that rule either, nor does it need to.
Darryl
True, true: the \n rule below it catches that case, so you are safe to change the * to a +.
Simeon Pilgrim