Given a string of text, in Python:
s = "(((((hi abc )))))))"
s = "***(((((hi abc ***&&&&"
How do I replace all non-alphabetic symbols that occur more than 3 times with a blank string?
For all the above, the result should be:
hi abc
This should work: \W{3,} matches a run of 3 or more non-alphanumeric characters:
>>> import re
>>> s = "***(((((hi abc ***&&&&"
>>> re.sub(r"\W{3,}", "", s)
'hi abc'
>>> s = "(((((hi abc )))))))"
>>> re.sub(r"\W{3,}", "", s)
'hi abc'
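As a minimal sketch, the same pattern can be wrapped in a small function. The name strip_symbol_runs is hypothetical and only for illustration. One thing to keep in mind: \W also matches whitespace, so a space that sits inside a run of symbols is counted and removed along with it, as the last call shows.

```python
import re

# Hypothetical helper name; the pattern is the one from the answer above.
RUN_OF_SYMBOLS = re.compile(r"\W{3,}")

def strip_symbol_runs(text):
    # Remove every run of 3+ non-word characters (whitespace counts as \W).
    return RUN_OF_SYMBOLS.sub("", text)

for s in ["(((((hi abc )))))))", "***(((((hi abc ***&&&&"]:
    print(repr(strip_symbol_runs(s)))   # 'hi abc' both times

# A mixed run that includes spaces is removed as a whole.
print(repr(strip_symbol_runs("a !? b")))  # 'ab'
```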
If you want to replace any sequence of non-alphanumerics (e.g. '!?&' as well as your examples), @Stephen's answer is fine; note that \W also matches whitespace. But if you only want to replace sequences of three or more identical non-space non-alphanumerics, a backreference will help:
>>> import re
>>> r3 = re.compile(r'(([^\s\w])\2{2,})')
>>> r3.findall('&&&xxx!&?yyy*****')
[('&&&', '&'), ('*****', '*')]
So, for example:
>>> r3.sub('', '&&&xxx!&?yyy*****')
'xxx!&?yyy'
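To make the difference between the two approaches concrete, here is a small side-by-side sketch on one input. The variable names any_run and same_run are just for illustration, and the backreference is \1 here because this version uses a single capture group.

```python
import re

any_run = re.compile(r"\W{3,}")            # 3+ symbols in a row, may differ
same_run = re.compile(r"([^\s\w])\1{2,}")  # one symbol, then 2+ copies of it

text = "!?&hello***"
print(any_run.sub("", text))   # hello
print(same_run.sub("", text))  # !?&hello
```

The mixed run '!?&' survives the second pattern because no single symbol in it repeats.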
You can't (easily, using regexes) replace that by a "blank string" that's the same length as the replaced text. You can replace it with an empty string "", a single space " ", or any other constant string of your choice; I've used "*" in the example so that it is easier to see what is happening.
>>> import re
>>> re.sub(r"(\W)\1{3,}", "*", "12345<><>aaaaa%%%11111<<<<..>>>>")
'12345<><>aaaaa%%%11111*..*'
Note carefully: it doesn't change "<><>". I'm assuming that "non-alphabetic symbols that occur more than 3 times" means the same symbol has to occur more than 3 times in a row. I'm also assuming that you did mean "more than 3" and not "3 or more".
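If a same-length blank really is wanted, one possible workaround is to pass a function instead of a constant string as the replacement, since re.sub accepts a callable and hands it each match. The sketch below also makes the "more than 3" versus "3 or more" assumption adjustable. The function name blank_repeats and its parameters min_run and fill are hypothetical, chosen only for this illustration.

```python
import re

def blank_repeats(text, min_run=4, fill=" "):
    """Replace runs of one repeated non-word symbol with same-length filler.

    min_run and fill are illustrative parameters, not part of the answer above.
    """
    # (\W) captures one symbol; \1{n,} requires at least n more copies of it,
    # so the whole run is at least min_run characters long.
    pattern = re.compile(r"(\W)\1{%d,}" % (min_run - 1))
    # The callable receives each match and sizes its output to the match.
    return pattern.sub(lambda m: fill * len(m.group(0)), text)

sample = "12345<><>aaaaa%%%11111<<<<..>>>>"
print(repr(blank_repeats(sample)))             # "more than 3": %%% is kept
print(repr(blank_repeats(sample, min_run=3)))  # "3 or more": %%% is blanked too
```

In both calls the result has the same length as the input, because each run is replaced character for character.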