>>> s = 'foo: "apples", bar: "oranges"'
>>> pattern = 'foo: "(.*)"'

I want to be able to substitute into the group like this:

>>> re.sub(pattern, 'pears', s, group=1)
'foo: "pears", bar: "oranges"'

Is there a nice way to do this?

+2  A: 

Something like this works for me:

import re

# Capture the text before and after the value so both can be re-emitted.
rx = re.compile(r'(foo: ")(.*?)(".*)')
s_new = rx.sub(r'\g<1>pears\g<3>', s)
print(s_new)

Notice the ? in the regex: .*? is non-greedy, so group 2 ends at the first ". Also notice the " in groups 1 and 3; they are captured because they must appear in the output.
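To see why the ? matters, here is a small sketch comparing the greedy and non-greedy forms on the string from the question. The variable names greedy and lazy are only for illustration.

```python
import re

s = 'foo: "apples", bar: "oranges"'

# Greedy: (.*) runs on to the last double quote in the string.
greedy = re.search(r'foo: "(.*)"', s).group(1)

# Non-greedy: (.*?) stops at the first double quote that lets the match succeed.
lazy = re.search(r'foo: "(.*?)"', s).group(1)

print(greedy)  # apples", bar: "oranges
print(lazy)    # apples
```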

Instead of \g<1> (or \g<number>) you can use just \1, but remember to use "raw" strings. The \g<1> form is preferred because \1 can be ambiguous when the replacement text continues with a digit (look for examples in the Python docs).
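A short sketch of that ambiguity, assuming the new value happens to start with a digit (the value "0 pears" is made up for the example):

```python
import re

s = 'foo: "apples", bar: "oranges"'
rx = re.compile(r'(foo: ")(.*?)(".*)')

# \g<1> keeps the group number separate from the literal "0" that follows it.
safe = rx.sub(r'\g<1>0 pears\g<3>', s)
print(safe)  # foo: "0 pears", bar: "oranges"

# With the short form, \10 is read as a reference to group 10.
# This pattern has only three groups, so re rejects the template.
try:
    rx.sub(r'\10 pears\3', s)
    short_form_error = None
except re.error as exc:
    short_form_error = exc
    print('error:', exc)
```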

Michał Niklas
A: 
re.sub(r'(?<=foo: ")[^"]+(?=")', 'pears', s)

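The regex matches a sequence of chars that

  • follows the string foo: ",
  • doesn't contain double quotation marks and
  • is followed by "

(?<=...) and (?=...) are lookbehind and lookahead assertions: they check the surrounding text without making it part of the match, so only the value is replaced.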
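As a quick check of what the lookarounds leave in the match, this sketch inspects the match object on the string from the question. Only the value between the quotes is matched, so nothing around it has to be re-emitted.

```python
import re

s = 'foo: "apples", bar: "oranges"'
pattern = r'(?<=foo: ")[^"]+(?=")'

m = re.search(pattern, s)
print(m.group(0))  # apples
print(m.span())    # (6, 12)

# The prefix foo: " and the closing quote are asserted, not consumed.
result = re.sub(pattern, 'pears', s)
print(result)  # foo: "pears", bar: "oranges"
```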

This regex will fail if the value of foo contains escaped quotes. Use the following one to catch them too:

re.sub(r'(?<=foo: ")(\\"|[^"])+(?=")', 'pears', s)

Sample code

>>> s = 'foo: "apples \\\"and\\\" more apples", bar: "oranges"'
>>> print(s)
foo: "apples \"and\" more apples", bar: "oranges"
>>> print(re.sub(r'(?<=foo: ")(\\"|[^"])+(?=")', 'pears', s))
foo: "pears", bar: "oranges"
Amarghosh
This returns `'foo: "pears"'`
Iacopo
@lacopo That was a typo - try with the updated regex. Replaced `[^=]` with `[^"]`. Be warned that this will fail if the value of `foo` contains escaped quotation marks.
Amarghosh
@lacopo updated the regex to handle escaped double quotes too.
Amarghosh
Though now it would fail with a string like "a backslash:\\" -- there's no really robust way to match escaped strings with a regex unfortunately.
Ian Bicking
hmm.. I didn't think of that case.
Amarghosh
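The backslash case raised in the comments can be reproduced with the sketch below. The second pattern is not from the thread: it is a commonly used variant that consumes a backslash together with whatever character follows it, offered here only as an assumption about what might cope with this input. It still relies on the input being well-formed, so it is not a general parser for escaped strings.

```python
import re

# The value of foo ends in an escaped backslash, right before the closing quote.
s = r'foo: "a backslash:\\", bar: "oranges"'

# Pattern from the answer: the second backslash plus the closing quote
# is mistaken for an escaped quote, so the match runs on too far.
original = r'(?<=foo: ")(\\"|[^"])+(?=")'
broken = re.sub(original, 'pears', s)
print(broken)  # foo: "pears"oranges"

# Assumed alternative: treat a backslash and the next character as one unit,
# and exclude lone backslashes from the "ordinary character" class.
pairwise = r'(?<=foo: ")(?:\\.|[^"\\])+(?=")'
fixed = re.sub(pairwise, 'pears', s)
print(fixed)  # foo: "pears", bar: "oranges"
```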