
Is it true that in the following file, INT is taken literally, while ID is not? How can you tell?

```yacc
/* C-Minus BNF Grammar */

%token ELSE
%token IF
%token INT
%token RETURN
%token VOID
%token WHILE

%token ID
%token NUM

%token LTE
%token GTE
%token EQUAL
%token NOTEQUAL
%%

program : declaration_list ;

declaration_list : declaration_list declaration | declaration ;

declaration : var_declaration | fun_declaration ;

var_declaration : type_specifier ID ';'
                | type_specifier ID '[' NUM ']' ';' ;

type_specifier : INT | VOID ;

fun_declaration : type_specifier ID '(' params ')' compound_stmt ;

params : param_list | VOID ;

param_list : param_list ',' param
           | param ;

param : type_specifier ID | type_specifier ID '[' ']' ;

compound_stmt : '{' local_declarations statement_list '}' ;

local_declarations : local_declarations var_declaration
                   | /* empty */ ;

statement_list : statement_list statement
               | /* empty */ ;

statement : expression_stmt
          | compound_stmt
          | selection_stmt
          | iteration_stmt
          | return_stmt ;

expression_stmt : expression ';'
                | ';' ;

selection_stmt : IF '(' expression ')' statement
               | IF '(' expression ')' statement ELSE statement ;

iteration_stmt : WHILE '(' expression ')' statement ;

return_stmt : RETURN ';' | RETURN expression ';' ;

expression : var '=' expression | simple_expression ;

var : ID | ID '[' expression ']' ;

simple_expression : additive_expression relop additive_expression
                  | additive_expression ;

relop : LTE | '<' | '>' | GTE | EQUAL | NOTEQUAL ;

additive_expression : additive_expression addop term | term ;

addop : '+' | '-' ;

term : term mulop factor | factor ;

mulop : '*' | '/' ;

factor : '(' expression ')' | var | call | NUM ;

call : ID '(' args ')' ;

args : arg_list | /* empty */ ;

arg_list : arg_list ',' expression | expression ;
```
A:

Normally, yacc takes single-quoted characters literally; every other name must be either a nonterminal (defined by a production) or a token declared with `%token`.
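A minimal C sketch of how this distinction shows up on the lexer side, assuming Bison-style token numbering: a quoted character such as `';'` needs no declaration because its token code is the character's own value, while each `%token` name gets a number above 255 (Bison typically starts at 258, in declaration order) that is published in the generated header. The enum below is a hand-written stand-in for that header, and `punct_token` is a hypothetical helper.

```c
/* Hand-written stand-in for the token codes that `bison -d` would put in
   the generated header. Bison typically numbers named tokens from 258 in
   declaration order, so they never collide with character codes 0-255. */
enum yytokentype {
    ELSE = 258, IF, INT, RETURN, VOID, WHILE,
    ID, NUM, LTE, GTE, EQUAL, NOTEQUAL
};

/* Hypothetical helper: a single-quoted literal such as ';' in the grammar
   is matched by returning the character itself as the token code. */
int punct_token(char c)
{
    switch (c) {
    case ';': case ',': case '(': case ')': case '[': case ']':
    case '{': case '}': case '+': case '-': case '*': case '/':
    case '=': case '<': case '>':
        return c;   /* literal token: token code == character value */
    default:
        return -1;  /* not a single-character token in this grammar */
    }
}
```

Under this numbering `INT` is simply code 260, a name the lexer must return like any other token; only the quoted characters are literal from yacc's point of view.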

In this case, both INT and ID are declared with `%token`, so from yacc's point of view neither is taken literally: one presumes the lexer returns both, assigning any corresponding semantic value to the global `yylval`.
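To see how both kinds of token reach the parser, here is a hedged C sketch of the part of a hand-written lexer that classifies a scanned word, assuming the keywords are spelled as in C (`int`, `while`, ...). `word_token` and the `YYSTYPE` union are hypothetical stand-ins; in a real Bison build, `YYSTYPE` and `yylval` come from the generated parser.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the generated token codes (Bison typically starts at 258). */
enum yytokentype {
    ELSE = 258, IF, INT, RETURN, VOID, WHILE,
    ID, NUM, LTE, GTE, EQUAL, NOTEQUAL
};

/* Hypothetical semantic-value type and the global the parser reads. */
typedef union { int num; char *name; } YYSTYPE;
YYSTYPE yylval;

static const struct { const char *word; int token; } keywords[] = {
    { "else", ELSE },     { "if", IF },     { "int", INT },
    { "return", RETURN }, { "void", VOID }, { "while", WHILE },
};

/* Classify a scanned word. A reserved word maps to its own token and needs
   no semantic value; any other word is an ID, and a heap copy of its
   spelling is handed to the parser through yylval. */
int word_token(const char *text)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(text, keywords[i].word) == 0)
            return keywords[i].token;

    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, text, len);
    yylval.name = copy;  /* the parser takes ownership of this string */
    return ID;
}
```

The parser sees INT and ID the same way, as small integers; the only difference is that ID also carries a value in `yylval`.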

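There is a more elaborate mechanism for multiple-character literal tokens (Bison, for instance, lets a declaration such as `%token LTE "<="` give a token a string alias), but this grammar does not use it.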
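Without such aliases, the lexer itself has to recognize the two-character operators and return the named tokens. A minimal C sketch, assuming C-style spellings (`<=`, `>=`, `==`, `!=`) for LTE, GTE, EQUAL and NOTEQUAL; `operator_token` is a hypothetical helper, and the enum again stands in for the generated header.

```c
/* Stand-in for the generated token codes (Bison typically starts at 258). */
enum yytokentype {
    ELSE = 258, IF, INT, RETURN, VOID, WHILE,
    ID, NUM, LTE, GTE, EQUAL, NOTEQUAL
};

/* Scan a relational or assignment operator at the start of `s`.
   Two-character operators become named tokens; a lone '<', '>' or '='
   is returned as a single-character literal token. *len receives the
   number of characters consumed. Returns -1 if no operator matches. */
int operator_token(const char *s, int *len)
{
    *len = 2;
    if (s[0] == '<' && s[1] == '=') return LTE;
    if (s[0] == '>' && s[1] == '=') return GTE;
    if (s[0] == '=' && s[1] == '=') return EQUAL;
    if (s[0] == '!' && s[1] == '=') return NOTEQUAL;

    *len = 1;
    if (s[0] == '<' || s[0] == '>' || s[0] == '=')
        return s[0];  /* literal token: code == character value */

    *len = 0;
    return -1;
}
```

Checking the two-character forms first is what makes `<=` scan as one LTE token rather than `'<'` followed by `'='`.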

DigitalRoss
How does the lexer return a semantic value? What would it look like?
Phenom
In general, see http://www.gnu.org/software/bison/manual/ for details. Specifically, ID and INT are declared as tokens, so yylex() must return them. This grammar expects yylex() to recognize the keywords and return them as tokens; any other word should be returned as ID. The lexer returns a semantic value by assigning to `yylval`. This matters for `ID`, which could be any word, but not for the tokens that, as you say, represent specific literal symbols and keywords.
DigitalRoss
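To make the question above concrete, here is a minimal hand-written `yylex` in C. It is a sketch, not what a flex-generated scanner looks like internally. It assumes token codes and a `YYSTYPE` union like those Bison would generate from `%union { int num; char *name; }`, reads from a hypothetical global string `input` instead of a file, and omits two-character operators and comments for brevity.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the generated token codes (Bison typically starts at 258). */
enum yytokentype {
    ELSE = 258, IF, INT, RETURN, VOID, WHILE,
    ID, NUM, LTE, GTE, EQUAL, NOTEQUAL
};

/* Roughly what %union { int num; char *name; } gives you. */
typedef union { int num; char *name; } YYSTYPE;
YYSTYPE yylval;

/* Hypothetical input source; a flex scanner would read from yyin. */
const char *input = "";

static const struct { const char *word; int token; } keywords[] = {
    { "else", ELSE },     { "if", IF },     { "int", INT },
    { "return", RETURN }, { "void", VOID }, { "while", WHILE },
};

int yylex(void)
{
    while (isspace((unsigned char)*input))
        input++;
    if (*input == '\0')
        return 0;                      /* token 0 means end of input */

    if (isdigit((unsigned char)*input)) {
        char *end;
        yylval.num = (int)strtol(input, &end, 10);  /* value for NUM */
        input = end;
        return NUM;
    }

    if (isalpha((unsigned char)*input)) {
        const char *start = input;
        while (isalpha((unsigned char)*input))
            input++;
        size_t len = (size_t)(input - start);
        char *word = malloc(len + 1);
        if (word == NULL)
            abort();                   /* out of memory */
        memcpy(word, start, len);
        word[len] = '\0';
        for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
            if (strcmp(word, keywords[i].word) == 0) {
                free(word);            /* keywords carry no value */
                return keywords[i].token;
            }
        yylval.name = word;            /* value for ID: its spelling */
        return ID;
    }

    return (unsigned char)*input++;    /* single-character literal token */
}
```

The parser calls `yylex()` repeatedly and reads `yylval` right after each call. In the grammar file, tagging the tokens with `%token <num> NUM` and `%token <name> ID` would then let actions read these values, e.g. `$2` for the ID in `var_declaration`.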