I have a bunch of files (in the hundreds) that have img tags like the following:

<img randomAttr1="randomVal" randomAttr2="valueRand" border="0" 
     randomAttr3="someRandValue">

I'm trying to do a search and replace operation in Visual Studio 2005 that will identify a tag as an <img>, but only match the border="0" portion of the string.

My belief is that I need a non-greedy portion of the regular expression to "match" (and I use the term loosely) the img tag and then actually match the border attribute, so that I can remove it.

I'm using regular expressions to do this because almost none of the markup is well formed.

My goal here is to remove the border attributes from all of the img tags.

I've tried the following regex, but I can't seem to get it to match only the border attribute:

(\<img)#.@border=\"[0-9]+\"

I believe '#' and '@' are the non-greedy matching characters, since that is what the VS 2005 documentation says, so I would not expect the pattern to match so many characters; however, it matches everything from the <img all the way to the end of the border="0" attribute.
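To see why the whole span gets selected, here is a minimal sketch in Python rather than in Visual Studio's own dialect, assuming that `.@` corresponds roughly to Python's lazy `.*?`. The sample tag is made up for illustration. A lazy quantifier only decides where a match stops; the match still begins at `<img`, so the matched text always includes everything in between.

```python
import re

# Hypothetical sample tag with two border attributes, to show where matching stops.
tag = '<img a="1" border="0" b="2" border="1">'

# Lazy: stops at the first border attribute it can reach.
lazy = re.search(r'<img.*?border="[0-9]+"', tag)
# Greedy: runs on to the last border attribute on the line.
greedy = re.search(r'<img.*border="[0-9]+"', tag)

print(lazy.group(0))    # <img a="1" border="0"
print(greedy.group(0))  # <img a="1" border="0" b="2" border="1"
print(lazy.start(), greedy.start())  # both matches start at index 0
```

Either way the match includes the `<img` prefix, which is why removing only the attribute needs a capture group (or some other way to put the prefix back).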

+1  A: 

Try the following: (Tested)

Find: {\<img.#}border=\"[0-9]+\"
Replace: \1

Note that this won't match tags that have a newline between the <img and the border attribute.
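As a rough cross-check outside Visual Studio, the same idea can be sketched in Python, assuming `{...}` maps to a capturing group `(...)` and `.#` to a lazy `.+?`. The sample markup and the names `strip_border`, `SAME_LINE` and `MULTI_LINE` are hypothetical. The second pattern is a different technique, not the one above: it swaps `.` for `[^>]`, which also matches newlines, so it covers the multi-line case while staying inside a single tag.

```python
import re

# Rough Python equivalent of the Find pattern: capture everything before border=.
SAME_LINE = re.compile(r'(<img.+?)border="[0-9]+"')

# Variant (an assumption, not part of the answer): [^>] matches newlines too,
# and never crosses the closing '>' of the tag.
MULTI_LINE = re.compile(r'(<img\b[^>]*?)\s+border="[0-9]+"')

def strip_border(html, pattern=SAME_LINE):
    """Replace each match with group 1, i.e. the tag text before the border attribute."""
    return pattern.sub(r'\1', html)

one_line = '<img randomAttr1="randomVal" border="0" randomAttr3="x">'
two_lines = '<img randomAttr1="randomVal"\n     border="0" randomAttr3="x">'

print(strip_border(one_line))
print(strip_border(two_lines))              # unchanged: '.' stops at the newline
print(strip_border(two_lines, MULTI_LINE))
```

The same-line pattern leaves a doubled space where the attribute used to be, which is harmless in HTML.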

SLaks
@SLaks Thanks, but that didn't match anything.
leeand00
I changed the regex; it should work now.
SLaks
Ah okay, so the \1 means replace with the second match, right? Because the first match (\0) is actually the {\<img} portion of the tag. Or wait, if it starts its matches at \1, that would mean it's replacing border with what it non-greedily matched on the tag (a.k.a. nothing). Which is it?
leeand00
`\0` means the entire `Find What` text. `\1` means the first group. (It's 1-based) You can click the little arrow on the right side of the replace box to see.
SLaks
He tagged everything except the border= attribute (curly braces), so that's in \1. So the find matches the img tag and all of its contents up to (and including) the border attribute, and replaces it with the img tag and all of its contents *except* the border attribute.
GalacticCowboy
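The numbering can be checked with a small Python sketch, assuming Python's group 0 and group 1 behave like `\0` and `\1` in the Replace box (the sample tag is made up):

```python
import re

tag = '<img randomAttr2="valueRand" border="0" randomAttr3="someRandValue">'
m = re.search(r'(<img.+?)border="[0-9]+"', tag)

print(repr(m.group(0)))  # whole match: '<img randomAttr2="valueRand" border="0"'
print(repr(m.group(1)))  # first group: '<img randomAttr2="valueRand" '

# Replacing the whole match with group 1 drops only the border attribute.
result = tag[:m.start()] + m.group(1) + tag[m.end():]
print(result)
```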
A: 

Don't be so quick to give up on real parsers. For example, given near-garbage input of

<TagSoup>lskdjfs
sdfkljs sdfalkjdfs
<img randomAttr1=randomVal randomAttr2="valueRand" border="0" 
     randomAttr3="someRandValue">
sdklfjsdflkj
<img randomAttr1="randomVal" randomAttr2="valueRand123"
     randomAttr3=someRandValue456>

the code below deletes the border attribute.

#! /usr/bin/perl

use warnings;
use strict;

use HTML::Parser;

sub start {
  my($tag,$attr,$attrseq,$text,$skipped) = @_;

  print $skipped;
  unless (lc($tag) eq "img") {  # case_sensitive is on, so fold case to catch <IMG> too
    print $text;
    return;
  }

  my $changed = 0;
  my @seq;
  for (@$attrseq) {
    if (lc($_) eq "border" && $attr->{$_} =~ /^\s*0+\s*$/) {
      delete $attr->{$_};
      $changed = 1;
    }
    else {
      push @seq => $_;
    }
  }

  if ($changed) {
    print "<$tag ",
            join(" " => map qq[$_="$attr->{$_}"], @seq),
          ">";
  }
  else {
    print $text;
  }
}

die "Usage: $0 html-file\n" unless @ARGV == 1;
my $p = HTML::Parser->new(
  api_version => 3,
  marked_sections => 1,
  case_sensitive => 1,
  start_h => [ \&start => "tag, attr, attrseq, text, skipped_text" ],
  end_h => [ sub { print @_ } => "skipped_text, text" ],
);

undef $/;
$p->parse(<>);
Greg Bacon
I'm trying to keep all of this within a Visual Studio macro (only because I have other regexes running in that macro to fix other problems, as all of those pages are quite similar...). Although, in light of no other replies, I may attempt to use your script by calling it from the VS macro (if that's possible).
leeand00
I presently do not have Perl installed...
leeand00