Given a string of text, in Python:

s = "(((((hi abc )))))))"
s = "***(((((hi abc ***&&&&"

How do I replace all non-alphabetic symbols that occur more than 3 times with a blank string?

For both strings above, the result should be:

hi abc
+8  A: 

This should work: \W{3,} matches any run of non-alphanumeric characters that is _3 or more_ long:

>>> import re
>>> s = "***(((((hi abc ***&&&&"
>>> re.sub(r"\W{3,}", "", s)
'hi abc'
>>> s = "(((((hi abc )))))))"
>>> re.sub(r"\W{3,}", "", s)
'hi abc'
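
If "more than 3" is meant literally, the quantifier only needs its lower bound raised by one. The following is a minimal sketch under that assumption (the name `more_than_3` is just illustrative); it also shows a side effect worth knowing about: `\W` matches whitespace too, so a long run of spaces gets removed along with the symbols.

```python
import re

# Assumption: "more than 3" is read literally, so a run must be at least 4 long.
more_than_3 = re.compile(r"\W{4,}")

for s in ("(((((hi abc )))))))", "***(((((hi abc ***&&&&"):
    print(repr(more_than_3.sub("", s)))  # both print 'hi abc'

# \W also matches whitespace, so a run of plain spaces is removed as well.
print(repr(more_than_3.sub("", "hi     abc")))  # prints 'hiabc'
```

Both of the question's examples still give 'hi abc' with this variant, because every run in them is longer than 3 once the adjacent space is counted.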
Stephen
"more than 3" != "3 or more"
John Machin
@John: Correct. The examples included '***', so I took a guess that he wanted 3+... I was confident that, given this solution, he could figure out how to add one. (That's why I italicized _3 or more_)
Stephen
+4  A: 

If you want to replace any sequence of non-space non-alphamerics (e.g. '!?&' as well as your examples), @Stephen's answer is fine. But if you only want to replace sequences of three or more identical non-alphamerics, a backreference will help:

>>> import re
>>> r3 = re.compile(r'(([^\s\w])\2{2,})')
>>> r3.findall('&&&xxx!&?yyy*****')
[('&&&', '&'), ('*****', '*')]

So, for example:

>>> r3.sub('', '&&&xxx!&?yyy*****')
'xxx!&?yyy'
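
The same backreference idea can be wrapped in a small function so the minimum run length is a parameter. This is only a sketch: `strip_repeated_symbols` and `min_run` are hypothetical names, not part of the answer above. Note that whitespace is never matched here, so the question's examples keep a trailing space unless it is stripped separately.

```python
import re

def strip_repeated_symbols(text, min_run=3):
    """Hypothetical helper: remove runs of one repeated non-space,
    non-word character that are at least min_run characters long."""
    # One captured symbol followed by (min_run - 1) or more copies of itself.
    pattern = re.compile(r'([^\s\w])\1{%d,}' % (min_run - 1))
    return pattern.sub('', text)

print(repr(strip_repeated_symbols('&&&xxx!&?yyy*****')))             # 'xxx!&?yyy'
print(repr(strip_repeated_symbols('&&&xxx!&?yyy*****', min_run=4)))  # '&&&xxx!&?yyy'
print(repr(strip_repeated_symbols('(((((hi abc )))))))')))           # 'hi abc ' (trailing space kept)
print(repr(strip_repeated_symbols('(((((hi abc )))))))').strip()))   # 'hi abc'
```

Passing `min_run=4` gives the literal "more than 3" reading, in which a run of exactly three symbols such as '&&&' is left alone.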
Alex Martelli
"more than 3" != "3 or more"
John Machin
+1, I came back to add backreferences to my answer, but I'll let you have it... :)
Stephen
@John, yep, but as @Stephen already explained, it's more believable that the OP made a slight mistake in English than a total blooper in his example of desired behavior ;-).
Alex Martelli
A: 

You can't (easily, using regexes) replace that by a "blank string" that's the same length as the replaced text. You can replace it with an empty string "" or a single space " " or any other constant string of your choice; I've used "*" in the example so that it is easier to see what is happening.

>>> import re
>>> re.sub(r"(\W)\1{3,}", "*", "12345<><>aaaaa%%%11111<<<<..>>>>")
'12345<><>aaaaa%%%11111*..*'
>>>

Note carefully: it doesn't change "<><>" ... I'm assuming that "non-alphabetic symbols that occur more than 3 times" means the same symbol has to occur more than 3 times. I'm also assuming that you did mean "more than 3" and not "3 or more".
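
If a same-length blank really is what is wanted, one way around the constant-string limitation is that re.sub also accepts a function as the replacement and calls it with each match object. The sketch below assumes that reading of "blank string"; `blank_out` is a hypothetical name chosen for illustration.

```python
import re

def blank_out(match):
    # Hypothetical helper: return as many spaces as the matched run is long.
    return " " * len(match.group(0))

s = "12345<><>aaaaa%%%11111<<<<..>>>>"
result = re.sub(r"(\W)\1{3,}", blank_out, s)

print(repr(result))            # '12345<><>aaaaa%%%11111    ..    '
print(len(result) == len(s))   # True: the overall length is preserved
```

As in the example above, "<><>" and "%%%" are untouched because no single symbol repeats more than 3 times in a row there.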

John Machin