I am trying to convert the following Perl regex, which I found in the Video::Filename Perl module, to a Python 2.5.4 regex to parse a filename:

# Perl > v5.10
re => '^(?:(?<name>.*?)[\/\s._-]*)?(?<openb>\[)?(?<season>\d{1,2})[x\/](?<episode>\d{1,2})(?:-(?:\k<season>x)?(?<endep>\d{1,2}))?(?(<openb>)\])(?:[\s._-]*(?<epname>[^\/]+?))?$',

I would like to use named groups too, and I know in Python the regex extension for named groups is different, but I am not 100% sure on the syntax.

This is what I tried:

# Python (not working)
r = re.compile(r'^(?:(?P<name>.*?)[\/\s._-]*)?(?P<openb>\[)?(?P<season>\d{1,2})[x\/](?P<episode>\d{1,2})(?:-(?:\kP<season>x)?(?P<endep>\d{1,2}))?(?(P<openb>)\])(?:[\s._-]*(?P<epname>[^\/]+?))?$')

The error I get:

   raise error, v # invalid expression
sre_constants.error: bad character in group name

For comparison, I managed to convert this one and it works, but I can't seem to get the one above right; it fails to compile in Python.

# Perl:
re => '^(?:(?<name>.*?)[\/\s._-]+)?(?:s|se|season|series)[\s._-]?(?<season>\d{1,2})[x\/\s._-]*(?:e|ep|episode|[\/\s._-]+)[\s._-]?(?<episode>\d{1,2})(?:-?(?:(?:e|ep)[\s._]*)?(?<endep>\d{1,2}))?(?:[\s._]?(?:p|part)[\s._]?(?<part>\d+))?(?<subep>[a-z])?(?:[\/\s._-]*(?<epname>[^\/]+?))?$',

# Python (working):
r = re.compile(r'^(?:(?P<name>.*?)[\/\s._-]+)?(?:s|se|season|series)[\s._-]?(?P<season>\d{1,2})[x\/\s._-]*(?:e|ep|episode|[\/\s._-]+)[\s._-]?(?P<episode>\d{1,2})(?:-?(?:(?:e|ep)[\s._]*)?(?P<endep>\d{1,2}))?(?:[\s._]?(?:p|part)[\s._]?(?P<part>\d+))?(?P<subep>[a-z])?(?:[\/\s._-]*(?P<epname>[^\/]+?))?$')

I am not sure where to start looking.

A: 

Those regexps are the product of a sick and twisted mind... :-)

Anyway, (?()) are conditions in both Python and Perl, and the Perl syntax above looks almost the same as the Python syntax, except that Python takes the bare group name, (?(openb)...), without the angle brackets; i.e., the condition evaluates as true if the named group matched.

Where to start looking? The documentation for the modules is here:
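A minimal sketch of how the conditional behaves in Python's re module. The pattern and the test strings are made up for illustration and are much simpler than the one in the question:

```python
import re

# Hypothetical toy pattern: a number that may be wrapped in square brackets.
# (?(openb)\]) requires a closing bracket only if the group "openb" matched.
bracketed = re.compile(r'^(?P<openb>\[)?\d+(?(openb)\])$')

for text in ['[12]', '12', '[12', '12]']:
    # Print whether each string matches the whole pattern.
    print(text, bool(bracketed.match(text)))
```

With balanced or absent brackets the pattern matches; with only one bracket it does not.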

http://docs.python.org/library/re.html http://www.perl.com/doc/manual/html/pod/perlre.html

Lennart Regebro
Reading someone else's regex always makes my brain hurt. Maybe I'm better off just rewriting them in Python.
Andre Miller
Unfortunately they're using regex extensions, which differ between the two implementations, so it doesn't work verbatim. I was hoping someone here is an expert in both and could show me the differences.
Andre Miller
+2  A: 

I found the offending part but can't figure out what exactly is wrong without wrapping my mind around the whole thing.

r = re.compile(r'^(?:(?P<name>.*?)[\/\s._-]*)?(?P<openb>\[)?(?P<season>\d{1,2})[x\/](?P<episode>\d{1,2})(?:-(?:\kP<season>x)?(?P<endep>\d{1,2}))?

(?(P<openb>)\]) // this part here causes the error message

(?:[\s._-]*(?P<epname>[^\/]+?))?$')

The problem seems to be that group names in Python must be valid Python identifiers (check the documentation), and the parentheses break that. Removing them gives

(?(P<openb>)\]) //with parentheses
(?P<openb>\])   //without parentheses

redefinition of group name 'openb' as group 6; was group 2
jitter
Yeah, I got that far too. I think it's because Python expects a named group to be (?P<name>...) and that syntax doesn't match, but I don't know what to put in its place.
Andre Miller
+6  A: 

There are two problems with your translation. First, the second mention of openb has extra parentheses around it, which make it a conditional expression rather than a named group; in Python that is written (?(openb)...).

Second, you didn't translate the \k<season> backreference; Python uses (?P=season) to match the same text. The following compiles for me:

r = re.compile(r'^(?:(?P<name>.*?)[\/\s._-]*)?(?P<openb>\[)?(?P<season>\d{1,2})[x\/](?P<episode>\d{1,2})(?:-(?:(?P=season)x)?(?P<endep>\d{1,2}))?(?(openb)\])(?:[\s._-]*(?P<epname>[^\/]+?))?$')
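As a quick sanity check, here is a small sketch that runs the pattern above against two filenames. The filenames and the helper name parse_episode are made up for illustration; they are not taken from Video::Filename:

```python
import re

# Same pattern as above, split into two adjacent string literals for readability.
pattern = re.compile(
    r'^(?:(?P<name>.*?)[\/\s._-]*)?(?P<openb>\[)?(?P<season>\d{1,2})[x\/](?P<episode>\d{1,2})'
    r'(?:-(?:(?P=season)x)?(?P<endep>\d{1,2}))?(?(openb)\])(?:[\s._-]*(?P<epname>[^\/]+?))?$'
)

def parse_episode(filename):
    """Return the named groups as a dict, or None if the filename does not match."""
    match = pattern.match(filename)
    return match.groupdict() if match else None

# Hypothetical sample filenames: one bracketed, one with an episode range.
single = parse_episode('Show.Name.[1x02].Pilot')
ranged = parse_episode('Show.Name.1x02-1x03.Two.Parter')
print(single)
print(ranged)
```

Groups that did not take part in the match, such as endep for a single episode, come back as None.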

If I were you, I'd use re.VERBOSE to split this expression over multiple lines and document it thoroughly, so that it stays understandable if it needs to remain maintainable.
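One possible layout with re.VERBOSE is sketched below. The comments are my reading of what each piece does, not documentation from the original Perl module, and the name episode_re and the sample filename are just for illustration:

```python
import re

# In VERBOSE mode, whitespace outside character classes is ignored and
# "#" starts a comment, so the pattern can be laid out piece by piece.
episode_re = re.compile(r'''
    ^
    (?: (?P<name> .*? ) [\/\s._-]* )?      # optional show name plus separators
    (?P<openb> \[ )?                       # optional opening bracket
    (?P<season> \d{1,2} )                  # season number
    [x\/]                                  # separator between season and episode
    (?P<episode> \d{1,2} )                 # episode number
    (?: - (?: (?P=season) x )?             # optional range, may repeat the season
        (?P<endep> \d{1,2} ) )?            # last episode of the range
    (?(openb) \] )                         # closing bracket only if one was opened
    (?: [\s._-]* (?P<epname> [^\/]+? ) )?  # optional episode name
    $
''', re.VERBOSE)

# Hypothetical filename, used only to show that the layout still matches.
match = episode_re.match('Show.Name.[1x02].Pilot')
print(match.groupdict())
```

The pattern itself is unchanged; only the layout and the comments are new.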

(edited after realising the second openb reference was a conditional expression, and to properly translate the backreference).

Martijn Pieters
Hmm, in Perl, \k<name> means a backreference to a previously defined named group, so it's normal (in this case) to have the name twice in the same regex.
Andre Miller
Ah, I see. Then that needs to be converted to a (?P=name) reference, I think. Updating..
Martijn Pieters
Well, as the perlre man page completely fails to mention named patterns or backreferences to them, and I am short on time right now, we'll leave that as an exercise for the reader.
Martijn Pieters
Actually, perlre (http://perldoc.perl.org/perlre.html) does mention named patterns and named backreferences, but only for Perl 5.10 and newer.
Brad Gilbert
Right! And now I see that my assertion that openb was used twice was incorrect as well; it's a conditional switch `(?(condition)yes-pattern|no-pattern)` that our question asker mistranslated. Updating my answer accordingly.
Martijn Pieters
Thanks Martijn, that does seem to work. Accepting your answer.
Andre Miller
A: 

I may be wrong, but you tried to get the backreference using:

(?:\k<season>x)

Isn't the syntax \g<name> in Python?
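Checking against the re documentation, \g<name> appears to be the form used in replacement strings passed to re.sub, while inside the pattern itself a named backreference is written (?P=name). A small sketch with made-up patterns:

```python
import re

# Inside a pattern, a named backreference is written (?P=name).
doubled = re.compile(r'(?P<digit>\d)x(?P=digit)')
same = bool(doubled.search('1x1'))       # same digit on both sides of the x
different = bool(doubled.search('1x2'))  # different digits
print(same, different)

# Inside a replacement string for re.sub, the group is referenced as \g<name>.
swapped = re.sub(r'(?P<season>\d+)x(?P<episode>\d+)',
                 r'S\g<season>E\g<episode>', '1x02')
print(swapped)
```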

e-satis