ansaurus

Question

Big problem with regular expression in Lex (lexical analyzer)

Answer 1

+2 A:

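The classical regex to match strings in double quotes, with backslash escapes, is:

\"([^\"\\]|\\.)*\"

The character class excludes the backslash as well as the quote, so a backslash can only be consumed together with the character it escapes.

In your case, you'll want something like this:

"title"\ *=\ *\"([^\"\\]|\\.)*\"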

PS: IMHO, you're putting too many quotes in your regexes, which makes them hard to read.

rz0 2010-03-26 23:53:40

Lex doesn't accept a bare space in a pattern; it needs `" "` to match a space. That's purely a Lex thing. I don't usually do this in other languages like PHP (where I'm most used to working with regexes).

Nazgulled 2010-03-27 00:20:33

You can also use `\ ` to match a space in most lex versions.

Chris Dodd 2010-03-27 00:42:18

I believe '\ ' is POSIX-compliant. See http://www.opengroup.org/onlinepubs/009695399/utilities/lex.html , Table: Escape Sequences in lex.

rz0 2010-03-27 02:21:09

It's just a matter of preference; it doesn't really matter in the end.

Nazgulled 2010-03-27 02:53:37

Answer 2

A:

You could use start conditions to simplify each separate pattern, for example:

%x title
%%
"title"\ *=\ *\"  { /* mark title start */
  BEGIN(title);
  fputs("found title = <|", yyout);
}

<title>[^"\\]+ { /* process title part; use ([^"\\]|\\.)* to grab it all at once */
  ECHO;
}

<title>\\. { /* process escapes inside title */
  char c = yytext[1]; /* the character after the backslash */
  fputc(c, yyout);    /* write it twice: escaped characters are doubled */
  fputc(c, yyout);
}

<title>\" { /* mark end of title */
  fputs("|>", yyout);
  BEGIN(INITIAL); /* continue as usual */
}

To make an executable:

$ flex parse_ini.y
$ gcc -o parse_ini lex.yy.c -lfl

Run it:

$ ./parse_ini < input.txt

Where input.txt is:

author = "Marjan\" Mernik  and Viljem Zumer",
title = "Imp\"lementation of multiple...",
year = 1999

Output:

author = "Marjan\" Mernik  and Viljem Zumer",
found title = <|Imp""lementation of multiple...|>,
year = 1999

It replaced the '"' around the title with '<|' and '|>'. Also, '\"' is replaced by '""' inside the title.

J.F. Sebastian 2010-03-27 03:23:26

I'm already using too many start conditions, so this complicates things a bit. Also, it's easier to catch everything in one regex because I need to pass the match to a C function.

Nazgulled 2010-03-27 04:12:47
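
For the single-regex route mentioned in the comment above, the C side could look roughly like the sketch below. It is only an illustration: title_value is a hypothetical helper name, and it assumes the whole match (what yytext would hold) contains exactly one quoted value whose last character is the closing quote.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: `match` is the full text of one match, for example
 *   title = "Imp\"lementation of multiple..."
 * Returns a malloc'd copy of the text between the outer quotes, with each
 * backslash escape reduced to the escaped character. Returns NULL if the
 * input has fewer than two quotes or if allocation fails.
 * The caller frees the result.
 */
char *title_value(const char *match)
{
    const char *open = strchr(match, '"');   /* first quote */
    const char *close = strrchr(match, '"'); /* last quote */
    const char *src;
    char *out, *dst;

    if (open == NULL || close == open)
        return NULL;

    /* The content is at most close - open - 1 bytes, plus the NUL. */
    out = malloc((size_t)(close - open));
    if (out == NULL)
        return NULL;

    dst = out;
    for (src = open + 1; src < close; src++) {
        if (*src == '\\' && src + 1 < close)
            src++;          /* drop the backslash, keep the next char */
        *dst++ = *src;
    }
    *dst = '\0';
    return out;
}
```

In a lex action this would be called as title_value(yytext), with the returned buffer released by free once it has been used.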
