ansaurus

Question

How do I do this replace regex in python?

Answer 1

+8 A:

This should work: \W{3,} matches any run of _3 or more_ consecutive non-alphanumeric characters (note that \W also matches whitespace):

>>> import re
>>> s = "***(((((hi abc ***&&&&"
>>> re.sub(r"\W{3,}", "", s)
'hi abc'
>>> s = "(((((hi abc )))))))"
>>> re.sub(r"\W{3,}", "", s)
'hi abc'
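
To make the "3 or more" versus "more than 3" distinction concrete, here is a minimal sketch that turns the minimum run length into a parameter. The helper name `strip_runs` and its `min_len` argument are made up for illustration; they are not part of the original answer.

```python
import re

def strip_runs(text, min_len=3):
    """Remove runs of at least min_len consecutive non-word characters.

    Hypothetical helper: min_len=3 means "3 or more",
    min_len=4 means strictly "more than 3".
    """
    pattern = r"\W{%d,}" % min_len
    return re.sub(pattern, "", text)

# A run of exactly three symbols is removed with min_len=3 ...
print(strip_runs("a***b", min_len=3))
# ... but survives with min_len=4.
print(strip_runs("a***b", min_len=4))
```

Because \W matches whitespace as well, a space that sits next to a run of symbols is counted as part of that run and removed along with it.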

Stephen 2010-07-23 00:28:27

"more than 3" != "3 or more"

John Machin 2010-07-23 00:47:34

@John: Correct. The examples included '***', so I took a guess that he wanted 3+... I was confident that, given this solution, he could figure out how to add one. (That's why I italicized _3 or more_.)

Stephen 2010-07-23 00:50:48

Answer 2

+4 A:

If you want to replace any sequence of non-space non-alphamerics (e.g. '!?&' as well as your examples), @Stephen's answer is fine. But if you only want to replace sequences of three or more identical non-alphamerics, a backreference will help:

>>> r3 = re.compile(r'(([^\s\w])\2{2,})')
>>> r3.findall('&&&xxx!&?yyy*****')
[('&&&', '&'), ('*****', '*')]

So, for example:

>>> r3.sub('', '&&&xxx!&?yyy*****')
'xxx!&?yyy'
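
The outer group in r3 is only there so that findall reports the whole run. As a sketch of an equivalent variant (the name `repeated` is just illustrative), the pattern can drop the outer group and read the full match from group(0) instead:

```python
import re

# Group 1 captures a single non-space, non-word symbol;
# \1{2,} then requires at least two more copies of that same symbol.
repeated = re.compile(r"([^\s\w])\1{2,}")

text = "&&&xxx!&?yyy*****"

# finditer exposes the whole run via group(0), plus its position.
runs = [(m.group(0), m.start(), m.end()) for m in repeated.finditer(text)]
print(runs)

# Substitution behaves the same as with r3.
cleaned = repeated.sub("", text)
print(cleaned)
```

Changing {2,} to {3,} would require four or more identical symbols, which is the strict "more than 3" reading.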

Alex Martelli 2010-07-23 00:38:26

"more than 3" != "3 or more"

John Machin 2010-07-23 00:47:51

+1, I came back to add backreferences to my answer, but I'll let you have it... :)

Stephen 2010-07-23 00:52:26

@John, yep, but as @Stephen already explained, it's more believable that the OP made a slight mistake in English than a total blooper in his example of desired behavior ;-).

Alex Martelli 2010-07-23 01:01:19

Answer 3

A:

You can't (easily, using regexes) replace that by a "blank string" that's the same length as the replaced text. You can replace it with an empty string "" or a single space " " or any other constant string of your choice; I've used "*" in the example so that it is easier to see what is happening.

>>> re.sub(r"(\W)\1{3,}", "*", "12345<><>aaaaa%%%11111<<<<..>>>>")
'12345<><>aaaaa%%%11111*..*'

Note carefully: it doesn't change "<><>" or "%%%". I'm assuming that "non-alphabetic symbols that occur more than 3 times" means the same symbol has to occur more than 3 times in a row. I'm also assuming that you did mean "more than 3" and not "3 or more".
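
If a same-length replacement really is what's wanted, one way to get it is to pass a function instead of a string as the replacement: re.sub calls it with each match object and uses its return value. The following is only a sketch under that assumption; `blank_out_runs`, `min_repeat`, and `fill` are hypothetical names, and `fill` is expected to be a single character.

```python
import re

def blank_out_runs(text, min_repeat=4, fill=" "):
    """Replace each run of min_repeat or more identical non-word
    characters with the same number of fill characters.

    Hypothetical helper; fill should be a single character so the
    overall length of the text is preserved.
    """
    pattern = re.compile(r"(\W)\1{%d,}" % (min_repeat - 1))
    # The callable receives the match and returns a same-length string.
    return pattern.sub(lambda m: fill * len(m.group(0)), text)

original = "12345<><>aaaaa%%%11111<<<<..>>>>"
blanked = blank_out_runs(original)
print(repr(blanked))
print(len(blanked) == len(original))
```

With the default min_repeat=4 this keeps the "more than 3" reading used above; min_repeat=3 would also blank out "%%%".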

John Machin 2010-07-23 00:46:27
