I have to create a lexical and syntax analyzer for a C-like language. In this language, a comment is defined as "everything after the symbol % until the end of the line". Are the following declarations correct?

Flex
...
[%][^\n]*[\n]  { return T_COMMENT; }
[\n]   { return T_NEWLINE; }

Bison
...
comment:com text newline;
text: |name text|digit text;

...
com: T_COMMENT   { printf("%s",yytext); };
newline: T_NEWLINE  { printf("%s",yytext); };

I also need to define the quote symbol ". Is the following correct (flex)?

"\""   { return T_QUOTE; }

There are no compile errors in the flex and bison input files, but when I use a program written in this C-like language as test input, I get "lexical error in line 1". There is no lexical error in that line. My program has to start like this: PROGRAM name_of_program, followed by a compulsory newline. I made the following declarations:

Flex

"PROGRAM"  { return T_PROGRAM; }

Bison

%start programma
%token T_PROGRAM
...
programma:PROGRAM name newline function STARTMAIN dec_var command ENDMAIN eof;
...
PROGRAM: T_PROGRAM  { printf("%s",yytext); };
...

(Words in upper case are defined like PROGRAM, as they are part of the language.) Am I doing anything wrong? I think the problem is with the newline definition, but I am not sure.

Thank you in advance for any answer. Sorry for the long post.

A:

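Generally, comments are handled by the lexer and not passed to the parser. Note also that your comment pattern already consumes the trailing newline, so a grammar rule that expects a separate T_NEWLINE after T_COMMENT will not find one. If your language is truly C-like, then in most cases a newline should be treated like any other whitespace. Comments and quoted strings are the notable exceptions, since a newline ends a % comment and normally may not appear inside a quoted string. Quoted strings are usually captured by the lexer using start states and passed to the parser as a single token.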
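
As a rough illustration of the start-state approach, here is a minimal stand-alone flex sketch. The STRING state name, the fixed 1024-byte buffer, and the printf calls are assumptions made for demonstration; a scanner feeding bison would instead copy the text into yylval and return a token (say, a hypothetical T_STRING) from the closing-quote rule.

```lex
%{
#include <stdio.h>
#include <string.h>

static char buf[1024];   /* assumed maximum string length, for this sketch only */
static size_t len;       /* number of characters collected so far */
%}

%option noyywrap
%x STRING

%%

\"               { len = 0; BEGIN(STRING); }
<STRING>\"       { buf[len] = '\0'; BEGIN(INITIAL); printf("string: %s\n", buf); }
<STRING>\n       { fprintf(stderr, "unterminated string\n"); BEGIN(INITIAL); }
<STRING>[^\"\n]+ {
                   /* append this chunk if it still fits; longer text is dropped */
                   if (len + (size_t)yyleng < sizeof buf) {
                       memcpy(buf + len, yytext, (size_t)yyleng);
                       len += (size_t)yyleng;
                   }
                 }
.|\n             { /* ignore everything outside quoted strings */ }

%%

int main(void)
{
    yylex();
    return 0;
}
```

Saved under a hypothetical name such as strings.l, this should build with "flex strings.l" followed by "cc lex.yy.c -o strings". Because STRING is declared with %x (exclusive), only the rules prefixed with <STRING> are active between the quotes, which is what lets the lexer hand the whole string over in one piece.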

Your flex code overuses character classes. You don't need a class to match one particular character; just write the character, with a backslash escape if needed. Additionally, . matches any character except a newline, so .* does the same job as [^\n]*.

Also, you don't have any definition for the name_of_program token. Assuming it is a C-style identifier, you can declare an identifier pattern and token in flex and pass it up to bison.

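Finally, you might want to adopt the naming convention of using all caps for tokens passed from flex to bison, and lowercase for nonterminals defined within bison. With that convention you can use T_PROGRAM directly in a rule instead of wrapping it in a rule such as PROGRAM: T_PROGRAM.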

So, from what you've described, I have the following:

example.l:

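%{
#include <string.h>       /* strdup */
#include "example.tab.h"  /* token codes and yylval, as generated by bison -d example.y */
%}

%%

\%.*      { /* skip the comment; the newline is left for the next rule */ }
[ \t]+    { /* skip blanks and tabs */ }
\n        { return T_NEWLINE; }
\"        { return T_QUOTE; }
PROGRAM   { return T_PROGRAM; }
[A-Za-z_][A-Za-z0-9_]* { yylval.id = strdup(yytext); /* copy: yytext is overwritten by the next match */ return T_IDENTIFIER; }

%%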

example.y:

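%union { char *id; }   /* gives yylval the id member set in example.l */
%token T_PROGRAM T_NEWLINE T_QUOTE
%token <id> T_IDENTIFIER

%%

programma: T_PROGRAM T_IDENTIFIER T_NEWLINE function STARTMAIN dec_var command ENDMAIN eof;

text: /* empty */
    | name text
    | digit text;

%%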

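I'm not sure you need the eof token in there: yylex returns 0 at the end of the input, and bison already treats that as the end of the start rule.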
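
To check the scanner on its own, and to see why an explicit eof token is probably unnecessary, the rules can be run without bison. This is only a sketch under assumptions: the enum values stand in for the codes bison would generate in its header, the catch-all error rule is hypothetical, and main simply prints each token code it receives.

```lex
%{
#include <stdio.h>

/* Stand-ins for the token codes bison would normally define in its header. */
enum { T_PROGRAM = 258, T_IDENTIFIER, T_NEWLINE };
%}

%option noyywrap

%%

\%.*                    { /* comment: discard up to the newline */ }
[ \t]+                  { /* blanks and tabs: discard */ }
\n                      { return T_NEWLINE; }
PROGRAM                 { return T_PROGRAM; }
[A-Za-z_][A-Za-z0-9_]*  { return T_IDENTIFIER; }
.                       { fprintf(stderr, "lexical error: '%s'\n", yytext); }

%%

int main(void)
{
    int tok;

    /* yylex returns 0 once the input is exhausted, so no eof token is needed */
    while ((tok = yylex()) != 0)
        printf("token %d\n", tok);
    return 0;
}
```

Feeding the first line of a test program to this scanner shows exactly which characters fall through to the error rule, which can help narrow down a "lexical error in line 1" before the grammar is involved at all.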

I hope this helps.

Mike DeSimone