I have tried something like this in my Bison file...

ReturnS: RETURN expression { printf(";"); }

...but the semicolon gets printed AFTER the next token, past this rule, instead of right after the expression. We're required to convert the input file to a C-like form. The original language doesn't require a semicolon after the expression in a return statement, but C does, so I thought I'd add it to the output manually with printf. The semicolon does get added, but only after the next token (outside the ReturnS rule) has been read, instead of as soon as the expression rule returns to ReturnS.

This rule also causes the same result:

loop_for: FOR var_name COLONEQUALS expression TO {printf("%s<=", $<chartype>2);} expression STEP {printf("%s+=", $<chartype>2);} expression {printf(")\n");} Code ENDFOR

Besides the first two printf's not working right (I'll post another question regarding that), the last printf is actually called AFTER the first token/literal of the "Code" rule has been parsed, resulting in something like this:

for (i=0; i<=5; i+=1
a)
=a+1;

instead of

for (i=0; i<=5; i+=1)
a=a+1;

Any ideas what I'm doing wrong?

A: 

Probably because the parser has to look ahead one token before it can decide to reduce by the rule you show.

The action is executed when the rule is reduced, and it is very common for the parser to have to read one more token before it knows that it can (or should) reduce the previous rule.

For example, if an expression can consist of an indefinite sequence of added terms, the parser has to read beyond the last term to know that there isn't another '+' to continue the expression.
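
To make that concrete, here is a toy C stand-in for a lexer and parser pair. It is not Bison's actual machinery, and every name in it (next_token, parse_return, g_out) is made up for the illustration. It assumes, as in the question, that the lexer echoes each token the moment it is read, and it handles only "return" followed by terms joined with "+".

```c
#include <string.h>

/* Toy model, all names hypothetical.
 * Grammar:  ReturnS: "return" expression ;  expression: term { "+" term } */
static const char *const *g_tokens;  /* NULL-terminated input token stream */
static size_t g_pos;                 /* index of the next unread token */
static char g_out[256];              /* everything "printed" so far */

/* Like a lexer rule that printf()s its text: echo the token when it is read. */
static const char *next_token(void) {
    const char *t = g_tokens[g_pos];
    if (t != NULL) {
        g_pos++;
        strncat(g_out, t, sizeof g_out - strlen(g_out) - 1);
    }
    return t;
}

/* Parses one return statement and gives back the accumulated output. */
const char *parse_return(const char *const *tokens) {
    g_tokens = tokens;
    g_pos = 0;
    g_out[0] = '\0';

    next_token();                     /* the "return" keyword */
    next_token();                     /* first term */
    const char *look = next_token();  /* lookahead: is it another "+"? */
    while (look != NULL && strcmp(look, "+") == 0) {
        next_token();                 /* term after the "+" */
        look = next_token();          /* look ahead again */
    }
    /* Only now is the expression known to be complete, so the rule's action
     * runs here, after the lookahead token has already been echoed. */
    strncat(g_out, ";", sizeof g_out - strlen(g_out) - 1);
    return g_out;
}
```

Fed the tokens "return ", "a", "+", "1", "x", this produces "return a+1x;": the "x" that belongs to the next statement is echoed before the semicolon, which is the same symptom as in the question.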


After seeing the Yacc/Bison grammar and Lex/Flex analyzer, some of the problems became obvious, and others took a little more sorting out.

  • Having the lexical analyzer do much of the printing meant that the grammar was not properly in control of what appeared when. The analyzer was doing too much.
  • The analyzer was also not doing enough work - making the grammar process strings and numbers one character at a time is possible, but unnecessarily hard work.
  • Handling comments is tricky if they need to be preserved. In a regular C compiler, the lexical analyzer throws the comments away; in this case, the comments had to be preserved. The rule handling this was moved from the grammar (where it was causing shift/reduce and reduce/reduce conflicts because of empty strings matching comments) to the lexical analyzer. This may not always be optimal, but it seemed to work OK in this context.
  • The lexical analyzer needed to ensure that it returned a suitable value for yylval when a value was needed.
  • The grammar needed to propagate suitable values in $$ to ensure that rules had the necessary information. Keywords for the most part did not need a value; things like variable names and numbers do.
  • The grammar had to do the printing in the appropriate places.

The prototype solution I sent back had a major memory leak because it used strdup() liberally and never called free(). Making sure that the leaks are fixed - possibly by using a char array rather than a char pointer for YYSTYPE - is left to the OP.
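
One possible shape for the char array idea, as a sketch only: C arrays can't be assigned directly, so this wraps the array in a struct that could then be used as the semantic value type (for example via #define YYSTYPE sval). The names sval, sval_from and SVAL_MAX are hypothetical, and the 64-byte limit is an assumption that silently truncates longer tokens.

```c
#include <stdio.h>

#define SVAL_MAX 64  /* assumed upper bound on token text, including the NUL */

/* A by-value semantic type: copying the struct copies the text,
 * so there is nothing to free(). */
typedef struct {
    char text[SVAL_MAX];
} sval;

/* What a lexer action could call instead of strdup(yytext). */
sval sval_from(const char *yytext) {
    sval v;
    /* snprintf truncates if needed and always NUL-terminates. */
    snprintf(v.text, sizeof v.text, "%s", yytext);
    return v;
}
```

The trade-off is that every value on the parser stack then occupies SVAL_MAX bytes and long tokens are cut short, which may or may not be acceptable for the assignment.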

Jonathan Leffler
I see. Thanks for the response. Have no idea how to fix that, but now at least I have a clue where to look.
Lefteris Aslanoglou
I've tried but can't seem to fix this. Any ideas on what to do to actually resolve this?
Lefteris Aslanoglou
@Leftos: drop me the code by email - see my profile. I'll see what I can make of it.
Jonathan Leffler
A: 

Comments aren't a good place to provide code samples, so I'm going to provide an example of code that works, after Jonathan (who replied above) did some work on my code. All credit goes to him; this isn't my work.

Instead of having FLEX print any recognized parts and letting BISON do the formatting afterwards, Jonathan suggested that FLEX print nothing and only return tokens to BISON, which should then handle all the printing itself.

So, instead of something like this...


FLEX

"FOR"   {printf("for ("); return FOR;}
"TO"    {printf("; "); return TO;}
"STEP"  {printf("; "); return STEP;}
"ENDFOR"    {printf("\n"); printf("}\n"); return ENDFOR;}
[a-zA-Z]+   {printf("%s",yytext); yylval.strV = yytext; return CHARACTERS;}
":="    {printf("="); lisnew=0; return COLONEQUALS;}

BISON

loop_for:   FOR var_name {strcpy(myvar, $<strV>2);} COLONEQUALS expression TO {printf("%s<=", myvar);} expression STEP {printf("%s+=", myvar);} expression {printf(")\n");} Code ENDFOR

...he suggested this:


FLEX

[a-zA-Z][a-zA-Z0-9]*    { yylval = strdup(yytext); return VARNAME;}
[1-9][0-9]*|0           { yylval = strdup(yytext); return NUMBER; }

BISON

loop_for:   FOR var_name COLONEQUALS NUMBER TO NUMBER STEP NUMBER
    { printf("for (%s = %s; %s <= %s; %s += %s)\n", $2, $4, $2, $6, $2, $8); }
var_name:   VARNAME
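
The same approach should carry over to the ReturnS rule from the question if expression passes its text up in $$ instead of printing it. Below is a rough C sketch of helpers that such actions could call. It assumes YYSTYPE is char * with heap-allocated strings (as with the strdup() calls above); join3 and emit_return are hypothetical names, not part of Bison or of Jonathan's code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for an action such as
 *   expression: expression '+' term   { $$ = join3($1, "+", $3); }
 * Returns a newly allocated "lhs op rhs" string (NULL if malloc fails)
 * and releases both operands. */
char *join3(char *lhs, const char *op, char *rhs) {
    size_t n = strlen(lhs) + strlen(op) + strlen(rhs) + 1;
    char *out = malloc(n);
    if (out != NULL)
        snprintf(out, n, "%s%s%s", lhs, op, rhs);
    free(lhs);
    free(rhs);
    return out;
}

/* Hypothetical body for
 *   ReturnS: RETURN expression   { emit_return(buf, sizeof buf, $2); }
 * Writes the whole statement, semicolon included, in one go, then frees
 * the expression text. */
void emit_return(char *dst, size_t cap, char *expr) {
    snprintf(dst, cap, "return %s;", expr);
    free(expr);
}
```

Because the statement is assembled inside the action and the lexer prints nothing, it no longer matters that the parser has already read the next token by the time the action runs. Freeing the operands as they are consumed also deals with part of the strdup() leak mentioned above.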
Lefteris Aslanoglou
Jonathan, if you'd like to add the above examples to your answer, I'll gladly delete mine.
Lefteris Aslanoglou
Also, if anyone's wondering about the $x's in printf and how they work, see also SO question 3539498: http://stackoverflow.com/questions/3539498/using-x-to-grab-string-from-rule
Lefteris Aslanoglou