I need to find, process and remove (one by one) any substrings that match a rather long regex:

# p is a compiled regex
# s is a string  
while 1:
    m = p.match(s)
    if m is None:
        break
    process(m.group(0)) #do something with the matched pattern
    s = re.sub(m.group(0), '', s) #remove it from string s

The code above is not good for 2 reasons:

  1. It doesn't work if m.group(0) happens to contain any regex-special characters (like *, +, etc.).

  2. It feels like I'm duplicating the work: first I search the string for the regular expression, and then I have to kinda go look for it again to remove it.

What's a good way to do this?

+3  A: 

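re.sub (and the sub method of a compiled pattern) accepts a function as the replacement argument. The function is called with each match object, and whatever it returns replaces that match, so you can combine the processing and removal steps if you wish: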

def process_match(m):
    # Process the match here.
    return ''

s = p.sub(process_match, s)
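
For a version that runs end to end, here is a minimal sketch. The pattern, the input string, and the `processed` list are made up for illustration; appending to the list stands in for whatever process() does in the question.

```python
import re

# Hypothetical pattern and input, chosen only for illustration.
p = re.compile(r'\d+')
s = 'abc123def45gh6'

processed = []  # stand-in for the side effects of process()

def process_match(m):
    # Called once per non-overlapping match, left to right.
    processed.append(m.group(0))
    # Returning '' removes the matched text from the result.
    return ''

s = p.sub(process_match, s)
```

One difference from the loop in the question: sub scans the original string once, so it will not pick up a match that only appears after an earlier removal joins two pieces of text. Also, p.match only matches at the start of the string, while sub finds matches anywhere.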
Mark Byers
Thanks, forgot about that..
max
Ah, and I figured out what to do if I do want to replace a string that may contain regex symbols: re.escape(s) takes care of that.
max
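
A small sketch of the re.escape approach mentioned above, plus one way to remove a single match without searching for it a second time. The strings and the pattern are hypothetical; slicing by m.start() and m.end() is an alternative not mentioned in the thread.

```python
import re

# Hypothetical literal text containing regex metacharacters.
literal = 'a+b*'
s = 'x a+b* y a+b* z'

# Unescaped, 'a+b*' would be interpreted as a pattern rather than literal text.
s_clean = re.sub(re.escape(literal), '', s)

# Removing one match at a time without a second search:
# the match object already knows where the match sits in the string.
p = re.compile(r'\d+')  # hypothetical pattern
t = 'abc123def45'
m = p.search(t)
first = m.group(0)
t = t[:m.start()] + t[m.end():]
```

Slicing removes exactly the occurrence that was matched, whereas re.sub with the escaped text removes every occurrence of that text unless a count is passed.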