Let's say I want to write a regular expression to change all <abc>, <def>, and <ghi> tags into <xyz> tags, and I also want to change their closing tags to </xyz>. This seems like a reasonable regex (ignore the backticks; StackOverflow has trouble with the less-than signs if I don't include them):

`s!<(/)?(abc|def|ghi)>!<${1}xyz>!g;`

And it works, too. The only problem is that for opening tags, the optional $1 variable gets assigned undef, and so I get a "Use of uninitialized value..." warning.
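
For anyone who wants to reproduce this, here is a minimal sketch. The sample text and the warning handler are my own assumptions, added only to make the warning visible; the exact wording of the warning may differ between Perl versions.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be inspected.
# (This handler is an illustration, not part of the original code.)
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Hypothetical sample input.
my $text = '<abc>one</abc> <def>two</def>';

# The substitution from the question: (/)? leaves $1 undefined
# whenever the tag has no slash.
(my $out = $text) =~ s!<(/)?(abc|def|ghi)>!<${1}xyz>!g;

print "$out\n";
print "warning: $_" for @warnings;
```

The output text is what I want; the collected warnings come from the opening tags, where the optional group did not take part in the match.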

What's an elegant way to fix this? I'd rather not make this into two separate regexes, one for opening tags and another for closing tags, because then there are two copies of the taglist that need to be maintained, instead of just one.

Edit: I know I could just turn off warnings in this region of the code, but I don't consider that "elegant".

+1  A: 

You could just make your first match be (</?), and get rid of the hard-coded < on the "replace" side. Then $1 would always have either "<" or "</". There may be more elegant solutions to address the warning issue, but this one should handle the practical problem.

kcrumley
+1  A: 

Here is one way:

   s!<(/?)(abc|def|ghi)>!<$1xyz>!g;

Update: Removed irrelevant comment about using (?:pattern).

jmcnamara
But I *do* want to capture.
raldi
I misread. I'll fix it ...
jmcnamara
A: 

Add

no warnings 'uninitialized';

or

s!<(/)?(abc|def|ghi)>! join '', '<', ${1}||'', 'xyz>' !ge;
tye
+2  A: 

How about:

`s!(</?)(abc|def|ghi)>!${1}xyz>!g;`
mitchnull
A: 

To make the regex capture $1 in either case, try:

  s!<(/|)?(abc|def|ghi)>!<${1}xyz>!g;
       ^
       note the pipe symbol, meaning '/' or ''

For '<abc>' this will capture the '' between '<' and 'abc>', and for '</abc>', capture '/' between '<' and 'abc>'.

Aaron
(/?) would be simpler.
cjm
And you should still drop the question mark from after the parens
tye
What tye meant is that it’s pointless to put a question mark behind a group that is guaranteed to always match something. So either you write it (/?) or (/|) – both will result in the same effect – but not (/|)? – since the question mark there is redundant.
Aristotle Pagaltzis
Doh, thx. Copy-paste without reading = bad.
Aaron
+10  A: 

Move the question mark inside the capturing bracket. That way $1 will always be defined, but may be a zero-length string.
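
A small sketch of the difference, in case it helps. The helper name `show_capture` and the two pattern variables are made up for this illustration.

```perl
use strict;
use warnings;

# Return a printable form of $1 after matching $tag against $re.
# (Hypothetical helper, for illustration only.)
sub show_capture {
    my ($re, $tag) = @_;
    return 'no match' unless $tag =~ $re;
    return defined $1 ? "'$1'" : 'undef';
}

my $optional_group   = qr!<(/)?abc>!;   # the group itself may be skipped
my $optional_content = qr!<(/?)abc>!;   # the group always matches, maybe zero characters

for my $tag ('<abc>', '</abc>') {
    printf "%-7s (/)? => %-6s (/?) => %s\n",
        $tag,
        show_capture($optional_group,   $tag),
        show_capture($optional_content, $tag);
}
```

With the question mark outside the group, $1 is undef for the opening tag; with it inside, $1 is an empty string, so interpolating it produces no warning.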

moonshadow
+1  A: 

s!<(/?)(abc|def|ghi)>!<${1}xyz>!g;

The only difference is changing "(/)?" to "(/?)". You have already identified several functional solutions. This one has the elegance you asked for, I think.

Rob Adams
A: 

I'd rather not make this into two separate regexes, one for opening tags and another for closing tags, because then there are two copies of the taglist that need to be maintained

Why? Put your taglist into a variable and interpolate that variable into as many regexes as you like. I'd consider this even with a single regex, because it makes a complicated regex much more readable (and what regex isn't complicated?).
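
Roughly like this, as a sketch. The variable names `@tags` and `$tag_re` and the sample text are my own; only the tag list comes from the question.

```perl
use strict;
use warnings;

# The tag list lives in exactly one place.
my @tags   = qw(abc def ghi);
my $tag_re = join '|', map { quotemeta } @tags;   # abc|def|ghi

# Hypothetical sample input; <jkl> is not in the list and stays untouched.
my $text = '<abc>one</abc> <ghi>two</ghi> <jkl>three</jkl>';

# Two separate substitutions, both interpolating the same list.
(my $out = $text) =~ s!<(?:$tag_re)>!<xyz>!g;
$out =~ s!</(?:$tag_re)>!</xyz>!g;

print "$out\n";
```

The quotemeta call is only a precaution in case a tag name ever contains a regex metacharacter; for plain names like these it changes nothing.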

innaM
A: 

Be careful: HTML is a bit harder than it looks at first glance. For example, do you want to change "<abc foo='bar'>" to "<xyz foo='bar'>"? Your regex won't. Do you want to change "<img alt='<abc>'>"? The regex will. Instead, you might want to do something like this:

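use strict;
use warnings;
use HTML::TreeBuilder;

# Parse the markup into a tree instead of pattern-matching the raw text.
my $tree = HTML::TreeBuilder->new_from_content("<abc>asdf</abc>");
for my $tag (qw<abc def ghi>) {
  # look_down returns every element whose tag name is $tag.
  for my $elem ($tree->look_down(_tag => $tag)) {
    $elem->tag('xyz');  # rename the element in place
  }
}
print $tree->as_HTML;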

That keeps you from having to do the fiddly bits of parsing HTML yourself.
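
The two pitfalls mentioned above are easy to check with a quick sketch. The helper name `retag` is made up here, and the pattern is the "(/?)" variant suggested in other answers.

```perl
use strict;
use warnings;

# Apply the regex-based rename to a string and return the result.
# (Hypothetical helper, for illustration only.)
sub retag {
    my ($html) = @_;
    $html =~ s!<(/?)(abc|def|ghi)>!<${1}xyz>!g;
    return $html;
}

# A tag with an attribute: the pattern expects '>' right after the name,
# so nothing is replaced.
print retag(q{<abc foo='bar'>}), "\n";

# A tag-like string inside an attribute value: the pattern cannot tell
# it apart from a real tag, so it is rewritten.
print retag(q{<img alt='<abc>'>}), "\n";
```

Whether either behaviour matters depends on your input, but a parser sidesteps both cases.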

theorbtwo