tags:

views:

73

answers:

2

I have a regular expression which works perfectly well (although I am sure it is weak) in .NET/C#:

((^|\s))(?<tag>\@(?<tagname>(\w|\+)+))(?($|\s|\.))

I am trying to move it over to Python, but I seem to be running into a formatting issue (invalid expression exception).

It is a lame question/request, but I have been staring at this for a while, but nothing obvious is jumping out at me.

Note: I am simply trying

r = re.compile('((^|\s))(?<tag>\@(?<tagname>(\w|\+)+))(?($|\s|\.))')

Thanks, Scott

A: 

Is "(?" there to prevent creation of a separate group? In Python's re's, this is "(:?". Try this:

r = re.compile(r'((^|\s))(:?<tag>\@(:?<tagname>(\w|\+)+))(:?($|\s|\.))')

Also, note the use of a raw string literal (the "r" character just before the quotes). Raw literals suppress '\' escaping, so that your '\' characters pass straight through to re (otherwise, you'd need '\\' for every '\').

Paul McGuire
Noncapturing group in .net is (?:)
Rubens Farias
The tip on raw strings was very helpful. As delroth mentions the main issue was related to ?P<name> instead of ?<name>
Scott Watermasysk
+1  A: 

There are some syntax incompatibilities between .NET regexps and PCRE/Python regexps :

  • (?<name>...) is (?P<name>...)
  • (?...) does not exist, and as I don't know what it is used for in .NET I can't guess any equivalent. A Google codesearch do not give me any pointer to what it could be used for.

Besides, you should use Python raw strings (r"I am a raw string") instead of normal strings when expressing regexps : raw strings do not interpret escape sequences (like \n). But it is not the problem in your example as you're not using any known escape sequence which could be replaced (\s does not mean anything as an escape sequence, so it is not replaced).

Pierre Bourdon