For example, I'm supposed to convert "int" to "INT". But if there's the word "integer", I don't think it's supposed to turn into "INTeger".

If I define the rule "int" printf("INT"); then substrings are matched as well. Is there a way to prevent this from happening?

+1  A: 

Well, here's how I did it:

("int"[a-zA-Z0-9]+)|([a-zA-Z0-9]+"int") ECHO;
"int" printf("INT");

Better suggestions welcome.

master chief
A: 

Lex chooses the rule with the longest possible match for the current input, and when two rules match the same length it picks the one listed first. To avoid substring matches, you need an additional rule that produces a longer match than int on input such as "integer". The easiest way to do this is to add a catch-all rule that matches any run of one or more letters, i.e. [a-zA-Z]+, placed after the int rule. The entire lex program would look like this:

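%%

[\t ]+          ECHO;               /* copy whitespace through unchanged */
int             { printf("INT"); }  /* listed first, so it wins the tie on exactly "int" */
[a-zA-Z]+       ECHO;               /* longer words such as "integer" match here instead */

%%

int main(int argc, char *argv[])
{
    yylex();
    return 0;
}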
Andrew O'Reilly
+2  A: 

I believe the following captures what you want.

%{
#include <stdio.h>
%}

ws                      [\t\n ]

%%

{ws}int{ws}         { printf ("%cINT%c", *yytext, yytext[4]); }
.                       { printf ("%c", *yytext); }

To expand this beyond word boundaries ({ws}, in this case), you will need to either add more characters to ws or add more specific checks.

ezpz
That will change "stint" so maybe you need the ws at both ends. Also what about things like int*x?
Kinopiko
This is the correct answer.
samoz