Hello,

I need to replace part of a string. I was looking through the Python documentation and found re.sub.

import re
s = '<textarea id="Foo"></textarea>'
output = re.sub(r'<textarea.*>(.*)</textarea>', 'Bar', s)
print(output)  # prints: Bar

I expected this to print '<textarea id="Foo">Bar</textarea>', not 'Bar'.

Could anybody tell me what I did wrong?

+6  A: 

re.sub replaces the entire match, not just the part inside the capturing group. Instead of capturing the part you want to replace, capture the parts you want to keep and then refer to them with backreferences such as \1 and \2 in the replacement string.

Try this instead:

output = re.sub(r'(<textarea.*>).*(</textarea>)', r'\1Bar\2', s)
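A self-contained check of the fix, for anyone who wants to run it end to end. It also prints group(0), the full match, for the question's original pattern: that span is the whole input string, which is exactly the text that got swapped for 'Bar'. The names `original` and `fixed` are just illustrative.

```python
import re

s = '<textarea id="Foo"></textarea>'

# The question's pattern: its full match (group 0) is the entire string,
# so re.sub swaps the entire string for 'Bar'.
original = r'<textarea.*>(.*)</textarea>'
print(re.search(original, s).group(0))  # <textarea id="Foo"></textarea>
print(re.sub(original, 'Bar', s))       # Bar

# The answer's pattern: capture the pieces to keep and put them back
# around the new content with the backreferences \1 and \2.
fixed = r'(<textarea.*>).*(</textarea>)'
output = re.sub(fixed, r'\1Bar\2', s)
print(output)  # <textarea id="Foo">Bar</textarea>
```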

Also, assuming this is HTML, you should consider using an HTML parser such as Beautiful Soup for this task.
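If you do stay with a regex, note that the greedy .* inside the first group can overshoot: it stretches to the last '>' that still lets the rest of the pattern match, so content containing '>' or a second textarea on the same line gives surprising results, and '.' does not match newlines by default. A sketch of one tighter variant, assuming the tag's attributes never contain '>': [^>]* keeps the opening tag from running past its own '>', the lazy .*? stops at the first </textarea>, and re.DOTALL lets the content span several lines. The sample strings are made up for illustration.

```python
import re

loose = r'(<textarea.*>).*(</textarea>)'
# [^>]* stops the opening tag at its own '>', .*? stops at the first
# closing tag, and re.DOTALL lets '.' also match newlines in the content.
tight = re.compile(r'(<textarea[^>]*>).*?(</textarea>)', re.DOTALL)

one = '<textarea id="A">x > y</textarea>'
two = '<textarea id="A">1</textarea><textarea id="B">2</textarea>'
multi = '<textarea id="A">line 1\nline 2</textarea>'

print(re.sub(loose, r'\1Bar\2', one))  # <textarea id="A">x >Bar</textarea>
print(tight.sub(r'\1Bar\2', one))      # <textarea id="A">Bar</textarea>

print(re.sub(loose, r'\1Bar\2', two))  # <textarea id="A">1</textarea><textarea id="B">Bar</textarea>
print(tight.sub(r'\1Bar\2', two))      # <textarea id="A">Bar</textarea><textarea id="B">Bar</textarea>

# Without DOTALL the loose pattern finds no match, so nothing changes.
print(re.sub(loose, r'\1Bar\2', multi) == multi)  # True
print(tight.sub(r'\1Bar\2', multi))              # <textarea id="A">Bar</textarea>
```

Even this version breaks on things like '>' inside a quoted attribute value, which is one more reason to prefer a real parser.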

Mark Byers
I think you mean `r'\1Bar\3'`.
Nathon
@Nathon - there is no `\3` group. There are only two of them in parentheses...
eumiro
Aha, I see. Thanks a lot, Mark.
Pickels
@eumiro Ah, right. I misread it.
Nathon
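To confirm the point made in the comments: the pattern has only two capturing groups (pattern.groups reports the count), so a replacement string that mentions \3 is rejected with re.error ("invalid group reference") rather than silently ignored. A small check reusing the question's sample string; `error_message` is just an illustrative name.

```python
import re

s = '<textarea id="Foo"></textarea>'
pattern = re.compile(r'(<textarea.*>).*(</textarea>)')
print(pattern.groups)  # 2

try:
    pattern.sub(r'\1Bar\3', s)
    error_message = None
except re.error as exc:
    # The message names the missing group, e.g. "invalid group reference 3 ..."
    error_message = str(exc)
print(error_message)
```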