ansaurus

Question

How do I write a regex that performs multiple substitutions on each line, EXCEPT when the line starts with a certain string?

Answer 1

+4 A:

You should really use two regexes: one to identify the "commented-out" lines and one to modify the http:// URLs in the regular lines.

There might be a non-standard way to combine the two regexes, or to replace all of your multiple (http...)+ matches at once, but I wouldn't use it.

aib 2009-02-09 20:10:26

The regex is fed into a legacy function that operates on a big, multi-line blob of text. I wish I could split it into lines and do what you say, but that would require major regression testing.

mike 2009-02-09 20:13:18

Major refactoring and regression testing, I should say.

mike 2009-02-09 20:14:45

@Mike - if you need to match the beginning of multiple lines, consider the 'm' modifier. It causes ^ and $ to match the beginning or end of any line.

Chris Lutz 2009-02-09 20:19:13
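A minimal sketch of what the 'm' modifier changes, using a made-up three-line string (the sample text and variable names are assumptions for illustration only):

```perl
use strict;
use warnings;

# Hypothetical sample text with a commented-out middle line.
my $text = "first line\n// second line\nthird line";

# Without /m, ^ matches only at the very start of the string.
my @plain = $text =~ /^(\w+)/g;

# With /m, ^ also matches immediately after every newline.
my @multi = $text =~ /^(\w+)/mg;

print "without /m: @plain\n";
print "with /m:    @multi\n";
```

The middle line contributes nothing in either case, because it starts with "//" rather than a word character.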

Oh, in practice I do -- somehow that got wiped out when I was turning it into an SO question.

mike 2009-02-09 20:21:45

Ah, the joy of working with legacy code :)

aib 2009-02-11 11:52:07

Answer 2

+3 A:

You can't really do this in a single regex for an indefinite number of URLs per line. Try this:

s#(http://[^\s]+)#<$1>#g unless m#^//#;

This will replace all of the URLs in the line (held in $_), but only if the first two characters of the line aren't "//". Sure, it's a little more complicated, but it works (I think).

EDIT: My answer is the same as aib's, but I have code.

Chris Lutz 2009-02-09 20:10:49
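The statement in this answer works on one line at a time, so a multi-line blob would have to be split first. A rough sketch of that, assuming the text can be split on "\n" (the helper name wrap_urls and the sample input are hypothetical, not from the question):

```perl
use strict;
use warnings;

# Hypothetical helper: wrap URLs in <> on every line that is not commented out.
sub wrap_urls {
    my ($blob) = @_;
    my @lines = split /\n/, $blob, -1;   # -1 keeps trailing empty fields
    for (@lines) {
        # Same statement as in the answer, applied to one line via $_.
        s#(http://\S+)#<$1>#g unless m#^//#;
    }
    return join "\n", @lines;
}

print wrap_urls("see http://a.com and http://b.com\n// skip http://c.com\n");
```

This is only an option when the calling code can be changed to work line by line.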

Answer 3

+3 A:

Rewriting it a little, with my suggestions, and using the /x (extended whitespace) modifier so it's actually readable. :)

s{
    (?:^|\G)     # start of a line, or where the previous match ended (\G); non-capturing
    (?!//)       # not immediately followed by //
    (.*?)        # then as little as possible of anything
    (
        http://  # up to http://
        [^\s]+   # and non-spaces - you could also use \S+
    )
}
{$1<$2>}xmg;

Trying this in perl, we get:

sub test {
    my ($str, $expect) = @_;
    my $mod = $str;
    $mod =~ s{
            (?:^|\G)     # start of a line, or where the previous match ended
            (?!//)       # not immediately followed by //
            (.*?)        # followed by anything
            (
                http://  # with http://
                [^\s]+   # and non-spaces - you could also use \S
            )
          }
          {$1<$2>}xmg;
    print "Expecting '$expect' got '$mod' - ";
    print $mod eq $expect ? "passed\n" : "failed\n";
}

test("http://foo.com",    "<http://foo.com>");
test("// http://foo.com", "// http://foo.com");
test("foo\nhttp://a.com", "foo\n<http://a.com>");

# output is
# Expecting '<http://foo.com>' got '<http://foo.com>' - passed
# Expecting '// http://foo.com' got '// http://foo.com' - passed
# Expecting 'foo
# <http://a.com>' got 'foo
# <http://a.com>' - passed

Edit: A couple of changes: added the 'm' modifier so that ^ matches at the start of every line, and changed \G to (?:^|\G) so that matching can also begin at the start of a line.

Robert P 2009-02-09 20:19:40
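The tests in this answer each contain a single URL. A further sketch, using a made-up blob with two URLs on one line and a commented-out line in the middle (the sample data is an assumption for illustration), shows what \G is for: after a match, the next match may continue from where the previous one ended, while a line starting with // never gets a first match to continue from.

```perl
use strict;
use warnings;

# Hypothetical sample: two URLs on one line, a commented-out line, a plain line.
my $blob = join "\n",
    "a http://x.com b http://y.com",
    "// http://z.com http://w.com",
    "http://v.com";

(my $out = $blob) =~ s{
    (?:^|\G)       # line start, or where the previous match ended
    (?!//)         # not immediately followed by //
    (.*?)          # text before the URL
    (http://\S+)   # the URL itself
}{$1<$2>}xmg;

print "$out\n";
```

Under these assumptions the first and third lines get their URLs wrapped and the second line is left alone.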

That's really really good, and I might be able to figure out the last little problem on my own, but of course any input is appreciated: In practice it also has a /m modifier, since it operates on a big blob of text. This causes it to fail on "foo\nhttp://a.com"

mike 2009-02-09 20:25:13

...which should return "foo\n<http://a.com>" but actually returns "foo\nhttp://a.com"

mike 2009-02-09 20:25:54

In fact, I'm going to accept your answer anyway, since it's perfect for the question as originally asked.

mike 2009-02-09 20:26:29

ok, sure. I'll update the answer.

Robert P 2009-02-09 20:26:51

Hey, changing your \G to (^|\G) and your $1<$2> to $2<$3> seems to work!

mike 2009-02-09 20:28:50

ah yeah :) Figured that one out right as I was updating the question... give me a bit and I'll add it. :)

Robert P 2009-02-09 20:34:29

Also, I made the first group a non-capturing group. That way it's clear to others that you really don't care what the first part is.

Robert P 2009-02-09 20:39:41
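A small sketch of the group numbering discussed in the comments above, assuming a one-line sample string (the variable names are hypothetical). With a capturing (^|\G) the anchor group takes $1 and everything else shifts up by one; with (?:^|\G) the numbering stays at $1 and $2.

```perl
use strict;
use warnings;

my $line = "see http://a.com";

# Capturing (^|\G): the anchor group is $1, so the text and URL are $2 and $3.
(my $capturing = $line) =~ s{(^|\G)(?!//)(.*?)(http://\S+)}{$2<$3>}mg;

# Non-capturing (?:^|\G): the text and URL stay in $1 and $2.
(my $non_capturing = $line) =~ s{(?:^|\G)(?!//)(.*?)(http://\S+)}{$1<$2>}mg;

print "$capturing\n";
print "$non_capturing\n";
```

Both forms should produce the same result; the non-capturing one just avoids an unused group.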
