Hi,

I am trying to parse FSM statements of the Gezel language (http://rijndael.ece.vt.edu/gezel2/) using Python and regular expressions:

import re

regex_cond = re.compile(r'.+((else\tif|else|if)).+')
line2 = '@s0 else if (insreg==1) then (initx,PING,notend) -> sinitx;'
match = regex_cond.match(line2)

I have trouble distinguishing if from else if. The else if in the example is recognized as an if.

/Markus

+3  A: 

A \t matches a tab character. It doesn't look like you have a tab character between "else" and "if" in line2; there is a space. You might try \s instead, which matches any whitespace character.
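To illustrate the difference, here is a minimal sketch that searches for just the keyword pair in the line2 string from the question. The variable names tab_hit and ws_hit are only for illustration.

```python
import re

line2 = '@s0 else if (insreg==1) then (initx,PING,notend) -> sinitx;'

# \t only matches a literal tab, and line2 has a space between the words
tab_hit = re.search(r'else\tif', line2)

# \s+ matches one or more whitespace characters (space, tab, ...)
ws_hit = re.search(r'else\s+if', line2)

print(tab_hit)          # None
print(ws_hit.group(0))  # else if
```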

Alex B
I might also suggest removing the double parentheses ((...)) and using a single set (...), since one set provides both the capture group and the alternation.
Alex B
Thank you, but it is still matching if and not else if.
markus
True, but not the only problem.
katrielalex
A: 

Correct me if I'm wrong, but regular expressions are not good for parsing, since they are only sufficient for Type-3 (regular) languages. For example, you can't decide whether or not ((())())) is a valid statement without "counting", which a regex can't do. Or, to take your example, if else else could not be detected as invalid. Maybe I'm mixing up scanner and parser; in that case, please tell me.
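As a rough illustration of the "counting" meant here, a sketch of a balance check that keeps a depth counter, which is the piece of state a plain regular expression lacks. The function name is_balanced is made up for this example.

```python
def is_balanced(text):
    """Return True if every '(' in text has a matching ')'."""
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                # a ')' appeared before its matching '('
                return False
    # anything left open means an unmatched '('
    return depth == 0

print(is_balanced('((())())'))   # True
print(is_balanced('((())()))'))  # False
```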

InsertNickHere
Parsing nested structures with regex was pretty well shot down in [this SO question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). That question is about HTML, but the point applies equally well to any nested structure.
NealB
+2  A: 

Don't do this; use pyparsing instead. You'll thank yourself later.


The problem is that .+ is greedy, so it eats up the else and leaves only the if for the group to match. Use .+? instead. Or rather, don't, because you're using pyparsing now.

regex_cond = re.compile( r'.+?(else\sif|else|if).+?' )
...
# else if
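To see the effect side by side, here is a small sketch that runs a greedy and a non-greedy variant of the pattern against the line2 string from the question. The names greedy and lazy are only for illustration.

```python
import re

line2 = '@s0 else if (insreg==1) then (initx,PING,notend) -> sinitx;'

# Greedy prefix: .+ grabs as much as it can, then backtracks only
# until the group can match, so the group ends up as the last "if".
greedy = re.compile(r'.+(else\sif|else|if).+')

# Non-greedy prefix: .+? grows one character at a time, so the group
# matches at the first position where any alternative fits.
lazy = re.compile(r'.+?(else\sif|else|if).+')

greedy_kw = greedy.match(line2).group(1)
lazy_kw = lazy.match(line2).group(1)

print(greedy_kw)  # if
print(lazy_kw)    # else if
```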
katrielalex
+1  A: 

Your immediate problem is that .+ is greedy and so it matches @s0 else instead of just @s0. To make it non-greedy, use .+? instead:

import re

regex_cond = re.compile(r'.+?(else\s+if|else|if).+')  
line2 = '@s0 else if (insreg==1) then (initx,PING,notend) -> sinitx;'
match = regex_cond.match(line2)
print(match.groups())
# ('else if',)

However, as others have suggested, using a parser such as Pyparsing is a better approach than using re here.
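If you stay with re for the moment, one possible variation (a sketch, not a full Gezel parser) drops the leading .+ altogether and lets re.search find the first keyword. The \b word boundaries keep it from matching an "if" inside a longer identifier, and listing else\s+if first makes the longer alternative win. The helper name first_keyword is made up for this example.

```python
import re

# Longest alternative first, so "else if" is preferred over "else"
KEYWORD = re.compile(r'\b(else\s+if|else|if)\b')

def first_keyword(line):
    """Return the first if / else / else if keyword in line, or None."""
    m = KEYWORD.search(line)
    return m.group(1) if m else None

line2 = '@s0 else if (insreg==1) then (initx,PING,notend) -> sinitx;'
print(first_keyword(line2))  # else if
```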

unutbu