I have some bison grammar:

input: /* empty */
       | input command
;

command:
        builtin
        | external
;

builtin:
        CD { printf("Changing to home directory...\n"); }
        | CD WORD { printf("Changing to directory %s\n", $2); }
;

I'm wondering how I get Bison to not accept (YYACCEPT?) something as a command until it reads ALL of the input. So I can have all these rules below that use recursion or whatever to build things up, which either results in a valid command or something that's not going to work.

One simple test I'm doing with the code above is just entering "cd mydir mydir". Bison parses CD and WORD and goes "hey! this is a command, put it to the top!". Then the next token it finds is just WORD, which has no rule, and then it reports an error.

I want it to read the whole line and realize CD WORD WORD is not a rule, and then report an error. I think I'm missing something obvious and would greatly appreciate any help - thanks!

Also - I've tried using input command NEWLINE or something similar, but it still pushes CD WORD to the top as a command and then parses the extra WORD separately.

A: 

Usually, things aren't done the way you describe.

With Bison/Yacc/Lex, you usually design the syntax carefully so that it does exactly what you need. Lex is naturally greedy with its regular expressions (it always prefers the longest match), which should help you here.

So, how about this instead.

Since you are parsing whole lines at a time, I think we can use this fact to our advantage and revise the syntax.

input : /* empty */
      | input line
      ;

command-break : command-break SEMICOLON
              | SEMICOLON
              ;

line : commands NEWLINE
     | commands command-break NEWLINE
     ;

commands : commands command-break command
         | command
         ;

...

Where NEWLINE and SEMICOLON are tokens that your lex source returns for `\n` and `;`, respectively. This should give you the UNIX-style command syntax you are looking for. All sorts of variations are possible; this one is a little bloated because it allows multiple consecutive semicolons, and it doesn't take whitespace into consideration, but you should get the idea.
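To see what the lexer side has to do, here is a minimal sketch of a hand-written tokenizer in C, used here in place of a lex specification. The token codes (TOK_CD, TOK_WORD, TOK_SEMICOLON, TOK_NEWLINE) and the function name next_token are hypothetical; with Bison you would return the token constants from the generated header instead. The point it illustrates is that `\n` and `;` come back as real tokens instead of being skipped along with other whitespace.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; a real Bison setup would use the
   constants from the header Bison generates. */
enum token { TOK_EOF = 0, TOK_CD, TOK_WORD, TOK_SEMICOLON, TOK_NEWLINE };

/* Return the next token from *p and advance *p past it.
   For TOK_WORD and TOK_CD the text is copied (truncated if needed) into buf. */
enum token next_token(const char **p, char *buf, size_t bufsize)
{
    const char *s = *p;

    /* Skip blanks, but NOT '\n': the newline is a real token here. */
    while (*s == ' ' || *s == '\t')
        s++;

    if (*s == '\0') {
        *p = s;
        return TOK_EOF;
    }
    if (*s == ';') {
        *p = s + 1;
        return TOK_SEMICOLON;
    }
    if (*s == '\n') {
        *p = s + 1;
        return TOK_NEWLINE;
    }

    /* Anything else up to a blank, ';' or '\n' is one word. */
    const char *start = s;
    size_t n = 0;
    while (*s != '\0' && *s != ' ' && *s != '\t' && *s != ';' && *s != '\n') {
        if (n + 1 < bufsize)
            buf[n++] = *s;
        s++;
    }
    if (bufsize > 0)
        buf[n] = '\0';
    *p = s;

    /* The keyword "cd" gets its own token, like CD in the grammar. */
    if ((size_t)(s - start) == 2 && strncmp(start, "cd", 2) == 0)
        return TOK_CD;
    return TOK_WORD;
}
```

Fed the input `cd a b; cd\n`, it yields CD, WORD, WORD, SEMICOLON, CD, NEWLINE and then end of input, which is the token stream the grammar above expects.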

Lex and Yacc are powerful tools, and I find them quite enjoyable, at least when I'm not on a deadline.

rlb.usa
A:

Sometimes I deal with these cases by flattening my grammars.

In your case, it might make sense to add tokens to your lexer for newlines and command separators (;) so that you can put them explicitly in your Bison grammar. That way the parser expects a full line of input for a command before accepting it as a command.

sep:   NEWLINE | SEMICOLON
   ;

command:  CD  sep
   |  CD WORD sep
   ;

Or, for an arbitrary list of arguments like a real shell:

args:
    /* empty */
  | args WORD
  ;

command:
      CD args sep
   ;
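With the args version, the action for `CD args sep` only runs after CD, every WORD and the separator have been read, so the action itself can check how many arguments it got. Below is a minimal C sketch of the semantic value behind args. The type struct arglist and the helpers arglist_new, arglist_push, run_cd and arglist_free are hypothetical; in a real grammar the pointer type would be declared through %union and %type.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical semantic value for the `args` nonterminal. */
struct arglist {
    size_t count;  /* number of words collected so far */
    size_t cap;    /* allocated slots in words */
    char **words;
};

/* args: empty        { $$ = arglist_new(); } */
struct arglist *arglist_new(void)
{
    struct arglist *a = malloc(sizeof *a);
    if (a != NULL) {
        a->count = 0;
        a->cap = 0;
        a->words = NULL;
    }
    return a; /* NULL on allocation failure */
}

/* args: args WORD    { $$ = arglist_push($1, $2); }
   Returns the same list, or NULL on allocation failure. */
struct arglist *arglist_push(struct arglist *a, const char *word)
{
    if (a->count == a->cap) {
        size_t ncap = a->cap ? a->cap * 2 : 4;
        char **w = realloc(a->words, ncap * sizeof *w);
        if (w == NULL)
            return NULL;
        a->words = w;
        a->cap = ncap;
    }
    size_t len = strlen(word) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, word, len);
    a->words[a->count++] = copy;
    return a;
}

/* command: CD args sep  { run_cd($2); arglist_free($2); }
   The separator has already been read, so the argument count is final.
   Returns 0 on success, -1 if there are too many arguments. */
int run_cd(const struct arglist *a)
{
    if (a->count == 0) {
        printf("Changing to home directory...\n");
        return 0;
    }
    if (a->count == 1) {
        printf("Changing to directory %s\n", a->words[0]);
        return 0;
    }
    fprintf(stderr, "cd: too many arguments\n");
    return -1;
}

void arglist_free(struct arglist *a)
{
    if (a == NULL)
        return;
    for (size_t i = 0; i < a->count; i++)
        free(a->words[i]);
    free(a->words);
    free(a);
}
```

With this approach, "cd mydir mydir" is no longer a syntax error: it parses as one command, and run_cd rejects it with a specific message.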
mrjoltcola
This seems to work. It's a bummer, though, that I have to specifically mention that separator expression for each command. I might change over to arbitrary arguments at some point...but not yet! I'm still curious if there are other ways to do this...
chucknelson
Correction: this works with 2 words (cd hello hello), but at that point it pops the tokens off. Then it starts again for some reason. So "cd hello1 hello2 hello3" will pop off cd, hello1, and hello2, but then it will try to match a separate rule for hello3. I'm so confused...
chucknelson
If you use the "args" rule as in the 2nd portion above it should match an arbitrary number.
mrjoltcola
Still don't fully understand all of this, and I still get some wacky results in Bison, but this definitely helped.
chucknelson
A: 

Couldn't you just change your rule actions to append to a list of actions you want to perform if the whole thing works? Then, after the entire input has been processed, you decide whether to perform the actions in that list based on whether you saw any parse errors.
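A minimal C sketch of that idea, with hypothetical names (struct action_list, defer_cd, run_or_discard): rule actions queue a request instead of acting, and after yyparse() returns, the caller runs the queue only if parsing succeeded. yyparse() returns 0 on success; if your grammar uses error recovery, you would also want to check that no errors were reported.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One queued "cd" request; dir == NULL means the home directory. */
struct action {
    char *dir;
    struct action *next;
};

/* FIFO of pending actions, filled in by the grammar's rule actions. */
struct action_list {
    struct action *head;
    struct action *tail;
    size_t count;
};

/* Call from a rule action instead of acting immediately, e.g.
     builtin: CD WORD { defer_cd(&pending, $2); }
   Returns 0 on success, -1 if out of memory. */
int defer_cd(struct action_list *l, const char *dir)
{
    struct action *a = malloc(sizeof *a);
    if (a == NULL)
        return -1;
    a->dir = NULL;
    a->next = NULL;
    if (dir != NULL) {
        size_t len = strlen(dir) + 1;
        a->dir = malloc(len);
        if (a->dir == NULL) {
            free(a);
            return -1;
        }
        memcpy(a->dir, dir, len);
    }
    if (l->tail != NULL)
        l->tail->next = a;
    else
        l->head = a;
    l->tail = a;
    l->count++;
    return 0;
}

/* Call once after yyparse() returns. If parse_ok is nonzero, every queued
   action runs in order; otherwise they are only discarded. Either way the
   list ends up empty. Returns the number of actions that ran. */
size_t run_or_discard(struct action_list *l, int parse_ok)
{
    size_t ran = 0;
    struct action *a = l->head;
    while (a != NULL) {
        struct action *next = a->next;
        if (parse_ok) {
            if (a->dir != NULL)
                printf("Changing to directory %s\n", a->dir);
            else
                printf("Changing to home directory...\n");
            ran++;
        }
        free(a->dir);
        free(a);
        a = next;
    }
    l->head = l->tail = NULL;
    l->count = 0;
    return ran;
}
```

A driver would then look something like `run_or_discard(&pending, yyparse() == 0);`, so nothing from a line with a syntax error is ever executed.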

nategoose