What makes your problem a little tricky is that you want to match inside a multiline string. You need the `re.MULTILINE` flag to make that work.
Then you need to capture some groups from your source string and use those groups in the final output. Here is code that solves your problem:
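```python
import re

# Raw string so that \s, \( and \) reach the regex engine unchanged
s_pat = r"^\s*REPLACE\(([^)]+)\)(.*)$"
pat = re.compile(s_pat, re.MULTILINE)

s_input = """\
Hello
REPLACE(str1) this is to replace
REPLACE(str2) this is to replace"""

def mksub(m):
    # m.groups() is a 2-tuple: (text inside the parentheses, rest of the line)
    return '<replace name="%s">%s</replace>' % m.groups()

s_output = re.sub(pat, mksub, s_input)
print(s_output)
```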
The only tricky part is the regular expression pattern. Let's look at it in detail.
`^` matches the start of the string. With `re.MULTILINE`, it also matches the start of each line within a multiline string; in other words, it also matches right after every newline in the string.
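A quick way to see the difference the flag makes is a minimal sketch like the following, which runs the same `^\w+` pattern (chosen only for illustration) on a two-line string with and without `re.MULTILINE`:

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ only matches at the very start of the string
without_flag = re.findall(r"^\w+", text)
# With MULTILINE, ^ also matches right after every newline
with_flag = re.findall(r"^\w+", text, re.MULTILINE)

print(without_flag)  # ['first']
print(with_flag)     # ['first', 'second']
```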
`\s*` matches optional whitespace (zero or more whitespace characters). Note that `\s` also matches newlines.
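One side effect worth knowing: because `\s` includes `\n`, `^\s*` can begin on an empty line and consume it. This does not affect the input in the question, which has no blank lines, but the sketch below shows it on a made-up input. The `[ \t]*` variant is a substitute suggested here, not part of the original pattern:

```python
import re

text = "Hello\n\nREPLACE(x) rest"

# \s also matches "\n", so ^\s* can start on the blank line and swallow it
greedy = re.sub(r"^\s*REPLACE", "R", text, flags=re.MULTILINE)
# [ \t]* only allows spaces and tabs, so the blank line survives
same_line = re.sub(r"^[ \t]*REPLACE", "R", text, flags=re.MULTILINE)

print(repr(greedy))     # 'Hello\nR(x) rest'
print(repr(same_line))  # 'Hello\n\nR(x) rest'
```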
`REPLACE` matches the literal string "REPLACE".
`\(` matches a literal "(". The backslash is needed because a bare `(` has a special meaning in a regular expression.
`(` begins a match group.
`[^)]` means "match any character except `)`".
`+` means "match one or more of the preceding pattern".
`)` closes the match group.
`\)` matches a literal ")".
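To see why the group uses `[^)]+` rather than something like `.+`, here is a small comparison on a made-up string containing two parenthesized items:

```python
import re

text = "f(a) g(b)"

# [^)]+ cannot cross a ")", so each group stops at its own closing parenthesis
negated = re.findall(r"\(([^)]+)\)", text)
# .+ is greedy and can run past the first ")" to the last one
greedy_dot = re.findall(r"\((.+)\)", text)

print(negated)     # ['a', 'b']
print(greedy_dot)  # ['a) g(b']
```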
`(.*)` is another match group, this one containing `.*`. Here `.` matches any character except a newline, and `*` means to match zero or more of the preceding pattern. Thus `.*` matches everything up to the end of the line.
`$` matches the end of the string. With `re.MULTILINE`, it also matches the end of each line within a multiline string; in other words, it matches the position just before every newline. It does not consume the newline character itself.
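The following sketch, on a two-line toy string, illustrates both points: that `$` matches a position rather than the newline, and that `.` stops at a newline:

```python
import re

text = "a\nb"

# With MULTILINE, $ matches just before each "\n" and at the end of the string;
# inserting "!" at those positions shows the newline itself is not consumed
marked = re.sub(r"$", "!", text, flags=re.MULTILINE)
print(repr(marked))  # 'a!\nb!'

# . does not match a newline, so .* stops at the end of the first line
first_line = re.match(r".*", text).group(0)
print(first_line)  # a
```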
So our pattern has two match groups. For each match, `re.sub()` creates a match object and passes it to `mksub()`. The match object's `.groups()` method returns the matched substrings as a tuple, and the `%` operator substitutes that tuple into the string template to build the replacement text.
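As an illustration, here is what the match object looks like for one line of the question's input (the `print` calls are only there to show the values):

```python
import re

pat = re.compile(r"^\s*REPLACE\(([^)]+)\)(.*)$", re.MULTILINE)
m = pat.search("Hello\nREPLACE(str1) this is to replace")

print(m.group(0))  # REPLACE(str1) this is to replace
print(m.group(1))  # str1
print(m.groups())  # ('str1', ' this is to replace')

# % with a tuple fills the %s placeholders in order
replacement = '<replace name="%s">%s</replace>' % m.groups()
print(replacement)  # <replace name="str1"> this is to replace</replace>
```

Note that group 2 keeps the space that follows the closing parenthesis.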
EDIT: You actually don't need a replacement function. You can put the special sequence `\1` inside the replacement text, and it will be replaced by the contents of match group 1. (Match groups are numbered from 1; the special group 0 corresponds to the entire string matched by the pattern.) The only tricky part of `\1` is that `\` is special in Python string literals. In a normal string, to get a single `\` you need to put two backslashes in a row, like so: `"\\1"`.
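A short check of how Python reads these string literals; the last lines show why a plain `"\1"` would not work as a backreference:

```python
# In a normal string literal, "\\" is one backslash, so "\\1" is two characters
escaped = "\\1"
assert escaped == r"\1"
print(len(escaped))  # 2

# A lone "\1" is an octal escape: it becomes the single character chr(1)
plain = "\1"
print(repr(plain))  # '\x01'
```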
But you can use a Python raw string to write the replacement pattern more conveniently. Doing so, you get this:
```python
import re

s_pat = r"^\s*REPLACE\(([^)]+)\)(.*)$"
pat = re.compile(s_pat, re.MULTILINE)
# \1 and \2 refer to match groups 1 and 2; the raw string keeps the backslashes
s_repl = r'<replace name="\1">\2</replace>'

s_input = """\
Hello
REPLACE(str1) this is to replace
REPLACE(str2) this is to replace"""

s_output = re.sub(pat, s_repl, s_input)
print(s_output)
```
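To confirm the two versions agree, here is a sketch that runs the function-based and template-based replacements side by side. It also shows the `\g<1>` spelling, which the `re` module accepts as an equivalent of `\1` and which avoids ambiguity when a digit follows the group reference:

```python
import re

pat = re.compile(r"^\s*REPLACE\(([^)]+)\)(.*)$", re.MULTILINE)
s_input = "Hello\nREPLACE(str1) this is to replace\nREPLACE(str2) this is to replace"

# Version 1: replacement function built from m.groups()
out_func = pat.sub(lambda m: '<replace name="%s">%s</replace>' % m.groups(), s_input)
# Version 2: template with numbered backreferences
out_tmpl = pat.sub(r'<replace name="\1">\2</replace>', s_input)
# Version 3: \g<N> form, same meaning as \N
out_g = pat.sub(r'<replace name="\g<1>">\g<2></replace>', s_input)

assert out_func == out_tmpl == out_g
print(out_tmpl)
```

All three produce the same text: the "Hello" line is left alone, and each `REPLACE(...)` line becomes a `<replace>` element.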