ansaurus

Question

Handling error conditions in Lex rather than Yacc?

Answer 1

Wow! Interesting scheme.

If you're going to detect that in the lexical analyzer, you would have to have a catch-all rule that deals with 'any otherwise unrecognized string' and produces an error message.

Determining that it was the K that caused the trouble is going to be hell.

[^aA][0-9]{2,2}[pP][sS][nN]? { report_error(); return ERROR; }
[aA][0-9]{2,2}[^pP][sS][nN]? { report_error(); return ERROR; }
[aA][0-9]{2,2}[pP][^sS][nN]? { report_error(); return ERROR; }
[aA][0-9]{2,2}[pP][sS][^nN]  { report_error(); return ERROR; }

Note the placing of the carets, and the absence of the question mark! Dealing with non-digits, or too many digits, or too few digits - urgh!

Generally, you would be better off recognizing all 'identifiers' and then validating which ones are OK:

[a-zA-Z][0-9]{2,2}[a-zA-Z]{2,5} { return validate_id_string(); }

Choose your poison as to what you allow into the validation routine; it decides whether what was entered was OK or not, and its return value controls what the Lex rule returns to the grammar. This is also one way of distinguishing keywords from identifiers.

Generalize and simplify the regular expression to suit what really goes on.
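As a rough illustration of what such a validation routine might look like, here is a minimal sketch in C. It assumes the valid form is [aA][0-9][0-9][pP][sS][nN]? (inferred from the error rules above), takes the matched text as a parameter rather than reading the global yytext, and uses made-up token codes TOK_ID and TOK_ERROR; a real scanner would take its token codes from the Yacc-generated header.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes for illustration only; in a real project
 * these would come from the header that Yacc generates (e.g. y.tab.h). */
enum { TOK_ID = 258, TOK_ERROR = 259 };

/* Validate text against the assumed pattern [aA][0-9][0-9][pP][sS][nN]?.
 * Reports the first offending character (or a bad length) on stderr and
 * returns the token code that the Lex rule should pass on to the grammar. */
int validate_id_string(const char *text)
{
    /* Allowed characters for each position of the identifier. */
    static const char *expected[] = {
        "aA", "0123456789", "0123456789", "pP", "sS", "nN"
    };
    size_t len = strlen(text);
    size_t i;

    /* Check each character that falls within the six known positions. */
    for (i = 0; i < len && i < 6; i++) {
        if (strchr(expected[i], text[i]) == NULL) {
            fprintf(stderr, "invalid character '%c' at position %zu in \"%s\"\n",
                    text[i], i + 1, text);
            return TOK_ERROR;
        }
    }

    /* The trailing n/N is optional, so 5 or 6 characters are acceptable. */
    if (len < 5 || len > 6) {
        fprintf(stderr, "wrong length (%zu) for \"%s\"\n", len, text);
        return TOK_ERROR;
    }
    return TOK_ID;
}
```

With this in place, the action in the Lex rule would become something like { return validate_id_string(yytext); }, and the error message can name the exact character that caused the trouble without a separate rule for each position.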

Jonathan Leffler 2009-09-30 06:33:03

Wow, great response. Shouldn't validate_id_string() be more like validate_id_string(yytext)? That is, pass yytext in to be validated?

DevDevDev 2009-09-30 18:13:32

@DevDevDev: passing yytext is optional since yytext is global. It depends in part on whether you'd use the function anywhere else. Yes, parameters are better than globals. But I was illustrating general technique, not the minute niceties of good coding style.

Jonathan Leffler 2009-09-30 19:47:33

Thanks! I wasn't trying to comment on the coding style; I was just wondering whether you meant that validate_id_string would access yytext.

DevDevDev 2009-09-30 23:02:04
