Is there a simple way to remove all characters from a given string that match a given regular expression? I know in Ruby I can use gsub:

>> key = "cd baz ; ls -l"
=> "cd baz ; ls -l"
>> newkey = key.gsub(/[^\w\d]/, "")
=> "cdbazlsl"

What would the equivalent function be in Python?

+4  A: 
import re
re.sub(pattern, '', s)

Docs
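To make this concrete with the string from the question, here is a minimal sketch. The helper name strip_matches is made up for illustration and is not part of the re module; it only wraps re.sub with an empty replacement.

```python
import re

def strip_matches(pattern, text):
    """Return text with every match of pattern removed (hypothetical helper)."""
    return re.sub(pattern, "", text)

key = "cd baz ; ls -l"
# Same character class as the Ruby gsub call in the question.
newkey = strip_matches(r"[^\w\d]", key)
print(newkey)  # cdbazlsl
```

The pattern is written as a raw string so the backslashes reach the regex engine unchanged.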

SilentGhost
+2  A: 

re.subn() is your friend:

>>> import re
>>> key = "cd baz ; ls -l"
>>> re.subn(r'\W', "", key)
('cdbazlsl', 6)
>>> re.subn(r'\W', "", key)[0]
'cdbazlsl'

re.subn() returns a tuple of (new_string, number_of_substitutions). Take the first element if you only want the resulting string. Or just call re.sub(), as SilentGhost notes. (Which is to say, his answer is more exact.)
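If the count is actually useful to you, unpacking the tuple reads better than indexing it. A small sketch, with variable names chosen only for illustration:

```python
import re

key = "cd baz ; ls -l"

# subn returns (new_string, number_of_substitutions_made)
cleaned, removed = re.subn(r"\W", "", key)

print(cleaned)  # cdbazlsl
print(removed)  # 6
```

When the count is not needed, re.sub gives the string directly.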

hughdbrown
Why call subn and then use [0] rather than just call the simpler sub?
Alex Martelli
I posted my answer when no other was visible. I subsequently found that it was not an ideal answer. I could have deleted my answer or edited it, possibly with attribution to others for the idea. What have you found answerers do when their answers are not quite on -- delete or edit?
hughdbrown
Empirical evidence is that it depends on how many up-votes (deserved or not!) have already been acquired :-(
John Machin
+2  A: 
import re
old = "cd baz ; ls -l"
regex = r"[^\w\d]"  # which is the same as \W, btw
pat = re.compile(regex)
new = pat.sub('', old)
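Compiling pays off mostly when the same pattern is applied to many strings. A rough sketch of that, where the list of sample commands is invented for the example:

```python
import re

# \W matches the same characters as [^\w\d], because \w already includes digits.
non_word = re.compile(r"\W")

# Hypothetical inputs, just to show the compiled pattern being reused.
commands = ["cd baz ; ls -l", "rm -rf /tmp/x", "echo 'hi'"]
cleaned = [non_word.sub("", c) for c in commands]

print(cleaned)  # ['cdbazlsl', 'rmrftmpx', 'echohi']
```

Note that \w also matches the underscore, so underscores survive this cleanup.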
THC4k
+2  A: 

The answers so far have focused on doing the same thing as your Ruby code, which is exactly the reverse of what you're asking in the English part of your question: the code removes characters that DO match, while your text asks for

a simple way to remove all characters from a given string that fail to match

For example, suppose your RE's pattern was r'\d{2,}', "two or more digits" -- so the non-matching parts would be all non-digits plus all single, isolated digits. Removing the NON-matching parts, as your text requires, is also easy:

>>> import re
>>> there = re.compile(r'\d{2,}')
>>> ''.join(there.findall('123foo7bah45xx9za678'))
'12345678'
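To see the two readings side by side, here is a sketch on the same sample string. It uses finditer with group(0) rather than findall as a precaution: findall returns the captured groups instead of the whole match once the pattern contains parentheses, which this particular pattern does not.

```python
import re

text = "123foo7bah45xx9za678"
runs = re.compile(r"\d{2,}")  # two or more digits

# Keep only the matching parts (drop everything that fails to match).
kept = "".join(m.group(0) for m in runs.finditer(text))

# Remove the matching parts (keep everything that fails to match).
dropped = runs.sub("", text)

print(kept)     # 12345678
print(dropped)  # foo7bahxx9za
```

The isolated digits 7 and 9 end up in the second result because a single digit does not satisfy the {2,} quantifier.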

Edit: OK, OP's clarified the question now (he did indeed mean what his code, not his text, said, and now the text is right too;-) but I'm leaving the answer in for completeness (the other answers suggesting re.sub are correct for the question as it now stands). I realize you probably mean what you "say" in your Ruby code, and not what you say in your English text, but, just in case, I thought I'd better complete the set of answers!-)

Alex Martelli
Ah, yes, you are correct. I changed the question to match up with what I was actually trying to say. Thanks!
Chris Bunch