Is it possible to write a single Python regular expression that can be applied to a multi-line string and change all occurrences of "foo" to "bar", but only on lines beginning with "#"?

I was able to get this working in Perl using the \G assertion, which matches at the position where the previous match ended. However, Python's re module doesn't appear to support it.

Here's the Perl solution, in case it helps:

my $x =<<EOF;
# foo
foo
# foo foo
EOF

$x =~ s{
        (            # begin capture
          (?:\G(?!\A)|^\#) # end of previous match (not at string start), or '#' at start of line
          .*?        # followed by anything, non-greedily
        )            # end capture
        foo
      }
      {$1bar}xmg;

print $x;

The proper output, of course, is:

# bar
foo
# bar bar

Can this be done in Python?


Edit: Yes, I know that it's possible to split the string into individual lines and test each line and then decide whether to apply the transformation, but please take my word that doing so would be non-trivial in this case. I really do need to do it with a single regular expression.
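The following is a minimal sketch. It assumes the existing function passes its replacement straight through to `re.sub`, which accepts either a template string or a function. Under that assumption, one call can select every line that starts with `#` and rewrite `foo` only inside those lines, with no `\G` needed. `foo_to_bar_on_hash_lines` is a hypothetical name.

```python
import re

# ^#.*$ with MULTILINE selects one whole line that starts with '#';
# '.' does not cross newlines, so each match stays on its own line.
HASH_LINE = re.compile(r"^#.*$", re.MULTILINE)


def foo_to_bar_on_hash_lines(text: str) -> str:
    # The replacement is a function: it receives each '#' line and returns it
    # with every "foo" changed, so several foos on one line are all handled.
    return HASH_LINE.sub(lambda m: m.group(0).replace("foo", "bar"), text)


print(foo_to_bar_on_hash_lines("# foo\nfoo\n# foo foo"))
```

The trade-off is that the replacement is no longer a plain string. This only helps if the surrounding code can hand a callable to `re.sub`.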

+3  A: 
lines = mystring.split('\n')
for i, line in enumerate(lines):
    if line.startswith('#'):
        lines[i] = line.replace('foo', 'bar')  # store the change back in the list
mystring = '\n'.join(lines)

No need for a regex.

Harley
Yes, but as I specifically said in the last line of the question, I'd like to do this without having to split the string and sift through it line by line.
mike
Why not split the string? I see Mat's provided a regex solution, but I find this one much easier to read.
John Fouhy
There's an existing function that takes a series of regexes and applies them to an input string, and it's politically infeasible to change this function since quite a lot depends upon it.
mike
Sorry, I missed that last line. I'm genuinely curious why splitting isn't an option, though; I think both methods load the entire string into memory.
Harley
Unfortunately, using regexes for solutions like this in Python is not... well... Pythonic. Text replacement with regexes is not as well supported in Python as it is in Perl, since Python is much more general-purpose. The for loop may be your best bet for a simple, concise implementation.
Robert P
+1  A: 

It looked pretty easy to do with a regular expression:

>>> import re
>>> text = """line 1
... line 2
... Barney Rubble Cutherbert Dribble and foo
... line 4
... # Flobalob, bing, bong, foo and brian
... line 6"""
>>> regexp = re.compile('^(#.+)foo', re.MULTILINE)
>>> print(re.sub(regexp, r'\g<1>bar', text))
line 1
line 2
Barney Rubble Cutherbert Dribble and foo
line 4
# Flobalob, bing, bong, bar and brian
line 6

But then trying your example text is not so good:

>>> text = """# foo
... foo
... # foo foo"""
>>> regexp = re.compile('^(#.+)foo', re.MULTILINE)
>>> print(re.sub(regexp, r'\g<1>bar', text))
# bar
foo
# foo bar
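In the session above, each match of `^(#.+)foo` starts at the beginning of the line, and `re.sub` never rescans text it has already consumed. At most one `foo` per line changes in one pass; the greedy `.+` picks the last one. One workaround is sketched below, assuming the caller is allowed to apply the same pattern more than once: repeat the substitution until `re.subn` reports no replacements. The lazy `.*?` is a change from `.+` so that a line like `#foo` also matches. `replace_until_stable` is a hypothetical helper.

```python
import re

# Lazy .*? reaches the first remaining "foo" on a line that starts with '#'.
PATTERN = re.compile(r"^(#.*?)foo", re.MULTILINE)


def replace_until_stable(text: str) -> str:
    # Each pass changes at most one "foo" per '#' line, so loop until a pass
    # makes no replacements. This terminates only because "bar" contains no "foo".
    while True:
        text, count = PATTERN.subn(r"\g<1>bar", text)
        if count == 0:
            return text


print(replace_until_stable("# foo\nfoo\n# foo foo"))
```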

So, try this:

>>> regexp = re.compile('(^#|\g.+)foo', re.MULTILINE)
>>> print(re.sub(regexp, r'\g<1>bar', text))
# foo
foo
# foo foo

That seemed to work, but I can't find \g in the documentation!
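The attempt above only appeared to work, because its output is identical to its input. In Python's `re`, `\g` is not an assertion; `\g<...>` has a meaning only in replacement strings. Python 2 read `\g` in a pattern as a plain letter `g`, so `(^#|\g.+)foo` never matched this text. Python 3.6 and later reject that escape outright. The following short check assumes Python 3.6+:

```python
import re

# In a pattern, \g is an unknown escape; Python 3.6+ raises re.error for it.
try:
    re.compile(r"(^#|\g.+)foo", re.MULTILINE)
    print("pattern accepted")
except re.error as exc:
    print("pattern rejected:", exc)

# In a replacement string, \g<1> inserts whatever group 1 captured.
print(re.sub(r"^(#.+)foo", r"\g<1>bar", "# Flobalob, bing, bong, foo and brian"))
```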

Moral: don't try to code after a couple of beers.

Mat
Wait, Python has a \g sigil that works like Perl's \G? I didn't notice that in the docs.
mike
Anyway, this doesn't work. Try feeding it "# foo foo"
mike
Yeah, I just realised that when I saw your example text. Darn!
Mat
That last one doesn't seem to work at all -- it's all foos and no bars! :) Anyway, I think I'm going to give up on this feature. It's probably not possible.
mike
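Python's `re` has no `\G`, but the behaviour the Perl solution relies on can be emulated by hand. That behaviour is that each continuation match must start exactly where the previous one ended. `Pattern.match(string, pos)` only tries to match at index `pos`, which gives the same effect. The sketch below assumes Python 3.8+ (for `:=`), and that `old` is non-empty and contains no newline. `replace_on_hash_lines` is a hypothetical name. It leaves `foo` alone on any line that does not start with `#`, including the first line.

```python
import re


def replace_on_hash_lines(text: str, old: str = "foo", new: str = "bar") -> str:
    """Replace `old` with `new` only on lines that start with '#'."""
    if not old or "\n" in old:
        raise ValueError("old must be non-empty and must not contain a newline")
    line_start = re.compile(r"^#", re.MULTILINE)
    # [^\n]*? stays on the current line, like Perl's .*? without /s.
    continuation = re.compile(r"[^\n]*?" + re.escape(old))

    pieces = []
    copied_up_to = 0  # text[:copied_up_to] has already been emitted
    for hash_mark in line_start.finditer(text):
        pos = hash_mark.start()
        # Pattern.match(text, pos) is anchored at pos: this plays the role of \G,
        # forcing each match to begin where the previous one ended.
        while (m := continuation.match(text, pos)) is not None:
            pieces.append(text[copied_up_to:m.end() - len(old)])
            pieces.append(new)
            copied_up_to = pos = m.end()
    pieces.append(text[copied_up_to:])
    return "".join(pieces)


print(replace_on_hash_lines("# foo\nfoo\n# foo foo"))
```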
A: 

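\g works in Python and is in the docs, although only in replacement strings: it inserts a captured group (much like Perl's $1), rather than marking the end of the previous match as Perl's \G does.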

"In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE."
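The quoted passage is about replacement strings. The example below uses a made-up two-character pattern to illustrate each form it mentions:

```python
import re

pair = re.compile(r"(?P<first>\w)(?P<second>\w)")

print(pair.sub(r"\g<second>\g<first>", "ab"))  # named groups: ba
print(pair.sub(r"\g<2>0", "ab"))               # group 2, then a literal 0: b0
print(pair.sub(r"[\g<0>]", "ab"))              # the entire match: [ab]

try:
    pair.sub(r"\20", "ab")  # read as a reference to group 20, which does not exist
except re.error as exc:
    print("error:", exc)
```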

Algorias