ansaurus

Question

Answer 1

A:

>>> text.replace("><", "<")
'<hi type="italic"> the</hi>'

ghostdog74 2009-07-30 03:05:21

This won't work, because in other cases the value of text might be "<tag>stuff</tag><tag>blah</tag>", where the "><" is legitimate and must be left alone.

Daniel 2009-07-30 03:11:35
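To make the objection in the comment above concrete, here is a minimal sketch that applies the blanket replace to the comment's own sample value. The variable names are hypothetical and chosen only for illustration.

```python
# Sample value taken from the comment above; variable names are illustrative.
well_formed = "<tag>stuff</tag><tag>blah</tag>"

# A blanket replace also rewrites the legitimate "><" between adjacent tags.
damaged = well_formed.replace("><", "<")
print(damaged)  # <tag>stuff</tag<tag>blah</tag>
```

The closing tag loses its ">", so the replace is only safe when "><" never occurs between well-formed tags.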

Answer 2

+2 A:

Two bugs in your code. First, you're not matching (and specifically, capturing) what you think you're matching and capturing -- insert after your call to .search:

>>> _.groups()
('',)

The unconstrained repetition of repetitions (a star after a capturing group that contains nothing but stars) matches once too many -- with the empty string at the end of what you think you're matching -- and that's what gets captured. Fix by changing at least one of the stars to a plus, e.g.:

>>> pat_error = re.compile(r">(\s*\w+)*>")
>>> pat_error.search(text)
<_sre.SRE_Match object at 0x83ba0>
>>> _.groups()
(' the',)
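For a side-by-side view, the following sketch runs both patterns as a script. The question text is not shown in this thread, so the value of `text` and the starred pattern are assumptions reconstructed from the outputs and the description above; the names `pat_star` and `pat_plus` are hypothetical.

```python
import re

# Assumed input, reconstructed from the outputs shown in this thread.
text = '<hi type="italic"> the></hi>'

# Assumed original pattern: a star applied to a group containing only stars.
pat_star = re.compile(r">(\s*\w*)*>")
# Fixed pattern from the answer: the inner \w* becomes \w+.
pat_plus = re.compile(r">(\s*\w+)*>")

# The answer reports ('',) for the starred pattern; print it rather than assume.
print(pat_star.search(text).groups())

# With the plus, every repetition must consume at least one word character,
# so the group cannot finish on an empty match.
print(pat_plus.search(text).groups())  # (' the',)
```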

Now THIS pattern matches and captures sensibly. Second, you're not using raw string literal syntax where you should, so you don't have a backslash where you think you have one -- you have an escape sequence \1, which is the same as chr(1). Fix by using raw string literal syntax, i.e., after the above snippet:

>>> pat_error.sub(r">\1", text)
'<hi type="italic"> the</hi>'

Alternatively, you could double up all of your backslashes to avoid them being taken as the start of escape sequences -- but raw string literal syntax is much more readable.
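The difference between the three spellings of the replacement string can be checked directly. This is a small sketch, again assuming the reconstructed value of `text`; the result variable names are hypothetical.

```python
import re

# Assumed input, reconstructed from the outputs shown in this thread.
text = '<hi type="italic"> the></hi>'
pat_error = re.compile(r">(\s*\w+)*>")

# Without the r prefix, "\1" is a single control character, not a backreference.
assert ">\1" == ">" + chr(1)
# The raw literal keeps both characters: a backslash followed by the digit 1.
assert len(r"\1") == 2

raw_result = pat_error.sub(r">\1", text)      # backreference to group 1
doubled_result = pat_error.sub(">\\1", text)  # same replacement, doubled backslash
plain_result = pat_error.sub(">\1", text)     # inserts chr(1) literally

print(raw_result)          # <hi type="italic"> the</hi>
print(repr(plain_result))  # '<hi type="italic">\x01</hi>'
```

The raw and doubled forms produce the same string; only the plain literal silently drops the captured text.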

Alex Martelli 2009-07-30 03:13:55

Python: \number Backreference in re.sub
