From the Bison Manual:

In a simple interactive command parser where each input is one line, it may be sufficient to allow yyparse to return 1 on error and have the caller ignore the rest of the input line when that happens (and then call yyparse again).

This is pretty much what I want, but I am having trouble getting it to work. Basically, I want to detect an error in flex and, if an error is detected, have Bison discard the entire line. What I have right now isn't working quite right, because my commands still get executed:

kbsh: ls '/home
Error: Unterminated Single Quote
admin  kbrandt  tempuser
syntax error
kbsh:

In my Bison file:

commands:
     /*Empty*/ { prompt(); } |
     command { prompt(); }
    ;

command:
    error {return 1; } |
    chdir_command |
    pwd_command |
    exit_command |
    WORD arg_list {
        execute_command($1, $2);
        //printf("%s, %s\n", $1, $2); 
    } |
    WORD { execute_command($1, NULL); }
    ;

And in my Flex:

'   {BEGIN inQuote; }

<inQuote>\n {printf("Error: Unterminated Single Quote\n"); BEGIN(0); return(ERROR);}
+4  A: 

I don't think you'll find a simple solution to handling these types of parsing errors in the lexer.

I would keep the lexer (flex/lex) as dumb as possible: it should just provide a stream of basic tokens (identifiers, keywords, etc.) and leave the error detection to the parser (yacc/bison). In fact, bison is set up for exactly what you want, with a little restructuring of your approach.

In the lexer (parser.l), keep it simple, with no special error handling at end of line. Something like this (not the complete file):

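%}

/* The backslash is required before " (an unescaped " starts a literal string
   in a flex pattern) and optional before '. The negated classes keep a match
   from running past the first closing quote or past the end of the line. */
SINGLE_QUOTE_STRING \'[^'\n]*\'
DOUBLE_QUOTE_STRING \"[^\"\n]*\"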

%%
{SINGLE_QUOTE_STRING} {
    yylval.charstr = copy_to_tmp_buffer(yytext);  // implies a %union
    return STRING;
}
{DOUBLE_QUOTE_STRING} {
    yylval.charstr = copy_to_tmp_buffer(yytext);  // implies a %union
    return STRING;
}
\n   return NEWLINE;

Then in your parser.y file do all the real handling (again, not the complete file):

command:
    error NEWLINE
        { yyclearin; yyerrok; print_the_next_command_prompt(); }
    | chdir_command STRING NEWLINE
        { do_the_chdir($<charstr>2); print_the_next_command_prompt(); }
    | ... and so on ...

There are two things to note here:

  1. Tokens like NEWLINE now go to the yacc side, so the parser can tell when the user has finished a command, then clear things out and start over (assuming you have "int yywrap() {return 1;}" somewhere). If you try to detect the error too early, in flex, how do you know when to raise it?
  2. chdir is no longer a single command (unless it was sub-ruled and you just didn't show it); it is now chdir_command STRING, where STRING is the argument to chdir. This lets the parser figure out what went wrong, and you can then call yyerror if that directory doesn't exist, etc.
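
To make the first point concrete, here is a rough sketch in plain C of the effect the "error NEWLINE" rule has on the token stream: everything up to and including the next NEWLINE is thrown away, and parsing resumes with the following command. This is not how bison implements recovery internally; the token codes and the skip_past_newline function are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical token codes; a real parser would use the ones bison generates. */
enum token { WORD = 1, STRING, NEWLINE };

/* Return the index of the first token after the next NEWLINE found at or
 * after pos, or n if no NEWLINE remains in the array. */
size_t skip_past_newline(const enum token *toks, size_t n, size_t pos)
{
    while (pos < n && toks[pos] != NEWLINE)
        pos++;                      /* discard tokens of the broken command */
    return pos < n ? pos + 1 : n;   /* step over the NEWLINE itself */
}
```

The point is that the resynchronization token has to be visible to the parser. If the lexer swallows the newline, there is nothing for the error rule to stop at.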

This way you should get something like (guessing what chdir might look like):

cd 'some_directory
syntax error
cd 'some_directory'
you are in the some_directory dude!

And it is all handled by the yacc grammar, not by the tokenizer.

I have found that keeping flex as simple as possible gives you the most flexibility. :)

tim
Done tinkering with the edits now...
tim
I often find myself with a bullet in my foot because I wrote an overly-complex lexer.
Chris Lutz
Thanks a lot tim, going to try to work this into my project when I get a chance. I am just starting to learn this stuff, so assuming your advice about keeping flex simple is good advice, this answer is great!
Kyle Brandt
"unless it was sub ruled and you just didn't show it" -- Ya it is sub-ruled, I just didn't show it...
Kyle Brandt
So in this example, would copy_to_tmp_buffer strip the quote characters off?
Kyle Brandt
You could strip the quotes in the copy_to_tmp_buffer for sure. Or write a wrapper like copy_to_tmp_buffer(strip_the_quotes(yytext)). You could get fancy and move the quote parsing to the yacc side like "chdir_command '\'' IDENTIFIER '\'' NEWLINE" so long as you can isolate IDENTIFIER properly in the lexer.
tim
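
For the quote stripping discussed in the last two comments, a minimal sketch in C might look like the following. The name strip_the_quotes comes from the comment above, but the signature (caller-supplied destination buffer, int return code) is an assumption made here so the example stands alone; it is not the helper from the answer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst without one pair of matching
 * surrounding quotes (' or "). Text that is not fully quoted is copied
 * unchanged. Returns 0 on success, -1 if dst is too small. */
int strip_the_quotes(const char *src, char *dst, size_t dst_size)
{
    size_t len = strlen(src);

    if (len >= 2 && (src[0] == '\'' || src[0] == '"') && src[len - 1] == src[0]) {
        src += 1;   /* skip the opening quote */
        len -= 2;   /* drop both quote characters from the length */
    }
    if (len + 1 > dst_size)
        return -1;  /* no room for the text plus the terminating NUL */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}
```

In a flex action this would be called on yytext before storing the result in yylval, since yytext is overwritten on the next match.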