My previous post got the tags partially stripped, so here it is again:

My goal is to replace every trailing - with a trailing + inside a pair of tags. Let's assume the line to be changed looks like this:

<h> aa- aa- </h> <h> ba- ba- </h> 

and should afterwards look like

<h> aa+ aa+ </h> <h> ba+ ba+ </h>.

First I tried this expression:

s/<h>(.*?)-(.*?)<\/h>/<h>$1+$2<\/h>/g;

which yielded this output:

<h> aa+ aa- </h> <h> ba+ ba- </h>.

The g option does lead to more than one substitution per line, but only the first - inside each tag pair is replaced (and only if both parenthesized groups are non-greedy, i.e. contain the question mark).

To narrow down the problem, I then tried to achieve the substitution while disregarding the tags. The expression

s/(.*?)-(.*?)/$1+$2/g;

leads indeed to the desired result

<h> aa+ aa+ </h> <h> ba+ ba+ </h>.

This will substitute outside of the tag brackets as well, of course.

So what is the problem with my first expression, and how can I achieve my goal of complete substitution within tag brackets?

A: 

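1) The g option applies to the whole regular expression, including the tags. Each match consumes a complete <h>...</h> pair, so the next match starts after the closing tag and only one - per pair is replaced.

2) Apply the substitution twice, or repeat it until nothing changes.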
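A minimal sketch of "repeat until nothing changes", using the sample line from the question. The `(?:(?!</h>).)*?` group is my own addition rather than part of the original expression: it is assumed to be wanted so that the part before the - cannot run past a closing tag and touch text between two tag pairs.

```perl
use strict;
use warnings;

my $str = '<h> aa- aa- </h> <h> ba- ba- </h>';

# One pass replaces at most one '-' per <h>...</h> pair, because each match
# consumes the whole pair and /g resumes after the closing tag.
# s///g returns the number of substitutions made (false when there were
# none), so the loop ends once no pair contains a '-' any more.
1 while $str =~ s{<h>((?:(?!</h>).)*?)-(.*?)</h>}{<h>$1+$2</h>}g;

print "$str\n";
```

The number of passes equals the largest number of - characters found in any single pair, so for long elements one of the single-pass approaches may be preferable.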

A: 

Separate the tag-selection and the substitution operations:

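my $str = '<h> aa- aa- </h> <h> ba- ba- </h>';
# Loop while some <h>...</h> pair still contains a '-'.
# (?:(?!</h>).)*? stops the match from running past a closing tag, so a '-'
# between two tag pairs is never included in the matched span.
while ( $str =~ m!<h>(?:(?!</h>).)*?-.*?</h>! ) {
    # $-[0] and $+[0] are the start and end offsets of the whole match;
    # substr() is used as an lvalue so y/// edits $str in place.
    substr( $str, $-[0], $+[0] - $-[0] ) =~ y/-/+/;
}
print $str, "\n";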

(The @- and @+ arrays provide offset information about the last successful match.)
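To make the offset arrays concrete, here is a small sketch on a made-up string (the variable names and sample text are illustrative only). `$-[0]` is the offset where the whole match starts and `$+[0]` is the offset just past its end, so their difference is the match length.

```perl
use strict;
use warnings;

my $text = 'xx<h> aa- </h>yy';
my ( $start, $end, $matched );

if ( $text =~ m!<h>.*?</h>! ) {
    # Copy the offsets right away; the next successful match overwrites them.
    ( $start, $end ) = ( $-[0], $+[0] );
    $matched = substr( $text, $start, $end - $start );
    print "start=$start end=$end matched='$matched'\n";
}
```

For this input the match starts at offset 2 and ends at offset 14, so `substr` returns exactly the `<h> aa- </h>` element.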

Alternatively:

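# Return a copy of one <h>...</h> element with every '-' turned into '+'.
sub fixup_h_tag {
    my $tag = shift;
    $tag =~ y/-/+/;
    return $tag;
}
my $str = '<h> aa- aa- </h> <h> ba- ba- </h>';
# /e evaluates the replacement as code; /x allows whitespace in the pattern.
$str =~ s{ (<h>.*?</h>) }{ fixup_h_tag("$1") }gxe;
print $str, "\n";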
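The same idea can probably be written without the helper sub. This sketch assumes Perl 5.14 or later, where the /r modifier makes `tr///` and `s///` return a modified copy instead of changing their target; the extra `x-` between the tags is added only to show that text outside the tags is left alone.

```perl
use strict;
use warnings;
use 5.014;    # the /r modifier on tr/// and s/// needs Perl 5.14+

my $str = '<h> aa- aa- </h> x- <h> ba- ba- </h>';

# /e runs the replacement as code for each <h>...</h> element;
# tr/-/+/r returns a translated copy, so the read-only $1 is not modified.
my $fixed = $str =~ s{(<h>.*?</h>)}{ $1 =~ tr/-/+/r }ger;

print "$fixed\n";
```

Because the pattern starts at an opening tag and stops at the nearest closing tag, a single pass handles every - inside each element.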

Note that if your tag markup language allows comments or quoted strings of some kind (which may contain <h> or </h> as literal text rather than as tags), or optional whitespace or attributes in the h tag, regexes are not easily going to provide a robust solution.
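As a rough illustration of how far a regex can be stretched, the sketch below tolerates attributes in the opening tag, whitespace before the closing bracket, and content spanning several lines. The sample markup is invented, Perl 5.14+ is assumed for /r, and the pattern still breaks on comments, quoted strings, or attribute values that contain `>`.

```perl
use strict;
use warnings;
use 5.014;    # tr///r and s///r need Perl 5.14+

my $str = qq{<h id="a-b"> aa- </h > <hr-> <h>b-\nb-</h>};

# $1: opening tag with optional attributes (\b keeps <hr> from matching)
# $2: element content, the only part that is translated
# $3: closing tag, possibly with whitespace before '>'
# /s lets '.' match newlines so multi-line elements are handled.
my $fixed = $str =~ s{(<h\b[^>]*>)(.*?)(</h\s*>)}
                     { $1 . ( $2 =~ tr/-/+/r ) . $3 }gser;

print "$fixed\n";
```

Keeping the opening tag in its own capture group means a - inside an attribute value is not changed; for anything beyond this, a real parser for the markup is likely the safer route.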

ysth