Wikipedia's Interpolation Definition

I am just learning flex / bison and I am writing my own shell with it. I am trying to figure out a good way to do variable interpolation. My initial approach was to have flex scan for something like ~ for my home directory, or $myVar, and then set yylval.string to what a lookup function returns. My problem is that this doesn't help when the text to expand is only part of one token:

kbsh:/home/kbrandt% echo ~
/home/kbrandt
kbsh:/home/kbrandt% echo ~/foo
/home/kbrandt /foo
kbsh:/home/kbrandt%

The lex definition I have for variables:

\$[a-zA-Z/0-9_]+    {
    yylval.string = return_value(&variables, yytext + 1);  /* skip the '$' */
    return WORD;
}

Then in my Grammar, I have things like:

chdir_command:
    CD WORD { change_dir($2); }
    ;

Anyone know of a good way to handle this sort of thing? Am I going about this all wrong?

+1  A: 

Looks generally OK


I'm not sure what return_value is doing; hopefully it strdup(3)s the result, because yytext is just a buffer that flex will reuse.

If you are asking about the division of labor between the lexer and the parser, it's perfectly reasonable to push the macro processing and parameter substitution into the scanner and just have your grammar deal with WORDs, lists, commands, pipelines, redirections, etc. After all, it would be reasonable enough, albeit kind of out of style and possibly defeating the point of your exercise, to do everything in hand-written code.

I do think that making cd or chdir a terminal symbol and using that in a grammar production is...not the best design decision. Just because a command is a built-in doesn't mean it should appear as a rule. Go ahead and parse cd and chdir like any other command. Check for built-in semantics as an action, not a production.

After all, what if it's redefined as a shell procedure?

DigitalRoss
+2  A: 

The way 'traditional' shells deal with things like variable substitution is difficult to handle with lex/yacc. What they do is more like macro expansion, where AFTER expanding a variable, they then re-tokenize the input, without expanding further variables. So for example, an input like "xx${$foo}" where 'foo' is defined as 'bar' and 'bar' is defined as '$y' will expand to 'xx$y' which will be treated as a single word (and $y will NOT be expanded).

You CAN deal with this in flex, but you need a lot of supporting code. You need to use flex's yy_buffer_state mechanism to redirect expanded text into a buffer that the scanner then reads from, and use start states carefully to control when variables can and can't be expanded.
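A sketch of the buffer-switching idea, using flex's documented yy_scan_string / yy_switch_to_buffer / yy_delete_buffer calls. The expand() helper is hypothetical, and this handles only one level of nesting; a real shell would keep a stack of saved buffers (and start states to suppress re-expansion, as described above):

```lex
%{
/* expand() is a hypothetical lookup returning the variable's value. */
static YY_BUFFER_STATE saved_buf;   /* one level deep, for illustration */
%}
%%
\$[A-Za-z_][A-Za-z0-9_]*   {
    /* Remember where we were, then re-scan the expansion so it is
       re-tokenized; yy_scan_string switches to a new buffer itself. */
    saved_buf = YY_CURRENT_BUFFER;
    yy_scan_string(expand(yytext + 1));
}
<<EOF>>  {
    if (saved_buf) {
        yy_delete_buffer(YY_CURRENT_BUFFER);
        yy_switch_to_buffer(saved_buf);   /* resume the original input */
        saved_buf = NULL;
    } else
        yyterminate();
}
```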

It's probably easier to use a very simple lexer that returns tokens like ALPHA (one or more alphabetic chars), NUMERIC (one or more digits), or WHITESPACE (one or more spaces or tabs), and have the parser assemble them appropriately, ending up with rules like:

simple_command: wordlist NEWLINE ;

wordlist: word | wordlist WHITESPACE word ;

word: word_frag
    | word word_frag { $$ = concat_string($1, $2); }
;

word_frag: single_quote_string
         | double_quote_string
         | variable
         | ALPHA
         | NUMERIC
        ...more options...
;

variable: '$' name { $$ = lookup($2); }
        | '$' '{' word '}' { $$ = lookup($3); }
        | '$' '{' word ':' ....

As you can see, this gets complex quite fast.

Chris Dodd