Suppose I have a lex regular expression like

[aA][0-9]{2,2}[pP][sS][nN]? { return TOKEN; }

If a user enters

A75PsN
A75PS

It will match

But if a user says something like

A75PKN

I would like it to error and say "Character K not recognized, expecting S"

What I am doing right now is just writing it like

let [a-zA-Z]
num [0-9]

{let}{num}{2,2}{let}{2,3}

and then essentially re-lexing the string in Yacc so that I can report meaningful errors.

How can I get around this?

The only thing I can think of is to use named groups?

+1  A: 

Wow! Interesting scheme.

If you're going to detect that in the lexical analyzer, you would have to have a catch-all rule that deals with 'any otherwise unrecognized string' and produces an error message.

Determining that it was the K that caused the trouble is going to be hell.

[^aA][0-9]{2,2}[pP][sS][nN]? { report_error(); return ERROR; }
[aA][0-9]{2,2}[^pP][sS][nN]? { report_error(); return ERROR; }
[aA][0-9]{2,2}[pP][^sS][nN]? { report_error(); return ERROR; }
[aA][0-9]{2,2}[pP][sS][^nN]  { report_error(); return ERROR; }

Note the placing of the carets, and the absence of the question mark! Dealing with non-digits, or too many digits, or too few digits - urgh!

Generally, you would be better off recognizing all 'identifiers' and then validating which ones are OK:

[a-zA-Z][0-9]{2,2}[a-zA-Z]{2,5} { return validate_id_string(); }

Choose your poison as to what you allow into the validation routine. The routine decides whether what was entered is OK, and its return value controls what the Lex rule returns to the grammar. This is one way of distinguishing keywords from identifiers, too.

Generalize and simplify the regular expression to suit what really goes on.
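A minimal sketch of what such a validation routine might look like, assuming the token shape from the question (A, two digits, P, S, optional N). The extra `msg` and `msglen` parameters, the `shape` template string, and the numeric values given to `TOKEN` and `ERROR` are assumptions made for illustration; in a real scanner the token codes would come from the Yacc-generated header.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes; a real scanner takes these from the Yacc-generated header. */
enum { TOKEN = 258, ERROR = 259 };

/*
 * Validate text against the shape: A, two digits, P, S, optional N
 * (letters are case-insensitive). On failure, describe the first
 * offending character in msg and return ERROR.
 */
int validate_id_string(const char *text, char *msg, size_t msglen)
{
    static const char shape[] = "A##PSN";      /* '#' stands for any digit */
    const size_t max_len = sizeof(shape) - 1;  /* longest valid token */
    const size_t min_len = max_len - 1;        /* the trailing N is optional */
    size_t len = strlen(text);
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)text[i];

        if (i >= max_len) {
            snprintf(msg, msglen,
                     "Character %c not recognized, expecting end of token", c);
            return ERROR;
        }
        if (shape[i] == '#') {
            if (!isdigit(c)) {
                snprintf(msg, msglen,
                         "Character %c not recognized, expecting a digit", c);
                return ERROR;
            }
        } else if (toupper(c) != shape[i]) {
            snprintf(msg, msglen,
                     "Character %c not recognized, expecting %c", c, shape[i]);
            return ERROR;
        }
    }
    if (len < min_len) {
        snprintf(msg, msglen,
                 "Token ended after %zu characters, expecting at least %zu",
                 len, min_len);
        return ERROR;
    }
    if (msglen > 0)
        msg[0] = '\0';                         /* no diagnostic on success */
    return TOKEN;
}
```

With a signature like this, the Lex action would pass `yytext` and a message buffer (say, a hypothetical `errbuf`) and return the result to the grammar. The lexer pattern stays loose while all the position-by-position checking, and the wording of the error, lives in one ordinary C function.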

Jonathan Leffler
Wow, great response. Shouldn't validate_id_string() be more like validate_id_string(yytext), passing in yytext to validate?
DevDevDev
@DevDevDev: passing yytext is optional since yytext is global. It depends in part on whether you'd use the function anywhere else. Yes, parameters are better than globals. But I was illustrating general technique, not the minute niceties of good coding style.
Jonathan Leffler
Thanks! I wasn't trying to comment on the coding style; I was just wondering whether you meant that validate_id_string would access yytext.
DevDevDev