Hey guys,

When I use this expression:

    re.sub("([^\s\w])(\s*\1)+","\\1","...")

I checked the regex at gskinner, and it's supposed to return '.', but it doesn't work.

If I use

    re.sub(r"([^\s\w])(\s*\1)+","\\1","...")

it raises this error:

    raise error, v # invalid expression
    sre_constants.error: nothing to repeat

Can someone please explain? Thanks.

+2  A: 

Hi, this looks like a Python bug (the same pattern works perfectly in Vim). The source of the problem is the `(\s*...)+` bit. Basically, you can't do `(\s*)+`, which makes sense, because you are trying to repeat something that can match the empty string.

    >>> re.compile(r"(\s*)+")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 180, in compile
        return _compile(pattern, flags)
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 233, in _compile
        raise error, v # invalid expression
    sre_constants.error: nothing to repeat

However, `(\s*\1)` can never be empty, because `\1` refers to a group that always matches exactly one character. We only know that because we know what's in `\1`; apparently Python doesn't take it into account ... that's weird.
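
A side note on the first call in the question, the one without the `r` prefix: in a normal Python string literal, `"\1"` is an octal escape for the single character `chr(1)`, so the regex engine never sees a backreference and the pattern simply fails to match `"..."`. A minimal sketch of the difference (the variable names are only for illustration, and the non-raw pattern is spelled out with explicit escapes so it reads the same on any Python version):

```python
import re

# Without the r prefix, "\1" is an octal escape: one character, chr(1).
non_raw_backref = "\1"
# With the r prefix, the backslash survives: two characters, a real backreference.
raw_backref = r"\1"

print(len(non_raw_backref), repr(non_raw_backref))
print(len(raw_backref), repr(raw_backref))

# What the first call in the question effectively passed to the regex engine:
# group 1, then (whitespace followed by a literal chr(1)) repeated.
effective_pattern = "([^\\s\\w])(\\s*\x01)+"
unchanged = re.sub(effective_pattern, r"\1", "...")
print(repr(unchanged))
```

So the first call runs without an error but leaves the input untouched, while the raw-string version is the one that actually reaches the "nothing to repeat" check.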

mb14
Even weirder is that `([^\s\w])(\1)+` *does* work.
Alan Moore
@alan: yes, I have noticed that as well.
mb14
if that's the case, is there a workaround?
goh
@goh: I guess you need to do it in two steps: first remove all the whitespace between identical characters, then apply your original substitution, which no longer needs the `\s*` that causes the problem.
mb14
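
The two-step idea could look roughly like this. It is only a sketch: the helper name `collapse_repeats` is made up for illustration, and the lookahead in step 1 is one possible way to drop the whitespace, not necessarily what mb14 had in mind. It has not been checked against Python 2.5 specifically.

```python
import re

def collapse_repeats(text):
    """Collapse runs of one repeated punctuation character, ignoring whitespace between repeats."""
    # Step 1: drop whitespace that sits between two identical non-word,
    # non-space characters. The lookahead (?=\1) checks the next character
    # without consuming it, so chains like ". . ." are handled in one pass.
    tightened = re.sub(r"([^\s\w])\s+(?=\1)", r"\1", text)
    # Step 2: the original substitution, minus the \s* that triggered the error.
    return re.sub(r"([^\s\w])\1+", r"\1", tightened)

print(collapse_repeats("..."))
print(collapse_repeats("wait . . . what?!"))
```

Neither pattern repeats a group that starts with `\s*`, which is the construct the error message complains about.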
@mb14: yep, I suppose so. Thanks.
goh