ansaurus

Question

lex (flex) generated program not parsing whole input

Answer 1

A:

This rule

[-+]?([0-9*\.?[0-9]+|[0-9]+\.)([Ee][-+]?[0-9]+)? 
          |

seems to be missing a closing bracket just after the first 0-9. I added a | below where I think it should be. I couldn't begin to guess how flex would respond to that.

The rule I usually use for symbol names starts with [a-zA-Z$_]. This is like your unquoted strings, except that I usually allow digits inside symbols as long as the symbol doesn't start with a digit.

[a-zA-Z$_]([a-zA-Z$_]|[0-9])*

A character is just a short symbol. I don't think it needs its own rule, but if it does, then you need to ensure that the string rule requires at least 2 characters.

[a-zA-Z$_]([a-zA-Z$_]|[0-9])+

John Knoeller 2010-02-14 06:26:51

Fixed the mismatched square brackets, but no luck. However, I did manage to duplicate the issue with a shorter set of rules.

Zxaos 2010-02-14 15:47:09

Answer 2

+1 A:

When generating a standalone lexer (that is, not one whose tokens are defined in bison/yacc), you typically write an enum at the top of the file defining your tokens. However, the main loop of a lex program, including the main loop generated by default, looks something like this:

while( token = yylex() ){
    ...
}

This is fine until your lexer matches the rule whose token appears first in the enum, in this specific case CDR. Since enums start at zero by default, yylex() returns 0 for that token and the while loop ends. Renumbering your enum will solve the issue.

enum tokens {
    CDR = 1,
    CHARACTER,
    SET
};

Short version: when defining tokens by hand for a lexer, start with 1 not 0.

Zxaos 2010-02-14 17:10:38
